Temporally delayed linear modelling (TDLM) measures replay in both animals and humans
Abstract
There are rich structures in offtask neural activity which are hypothesized to reflect fundamental computations across a broad spectrum of cognitive functions. Here, we develop an analysis toolkit – temporal delayed linear modelling (TDLM) – for analysing such activity. TDLM is a domaingeneral method for finding neural sequences that respect a prespecified transition graph. It combines nonlinear classification and linear temporal modelling to test for statistical regularities in sequences of taskrelated reactivations. TDLM is developed on the noninvasive neuroimaging data and is designed to take care of confounds and maximize sequence detection ability. Notably, as a linear framework, TDLM can be easily extended, without loss of generality, to capture rodent replay in electrophysiology, including in continuous spaces, as well as addressing secondorder inference questions, for example, its temporal and spatial varying pattern. We hope TDLM will advance a deeper understanding of neural computation and promote a richer convergence between animal and human neuroscience.
Introduction
Human neuroscience has made remarkable progress in detailing the relationship between the representations of different stimuli during task performance (Haxby et al., 2014; Kriegeskorte et al., 2008; Barron et al., 2016). At the same time, it is increasingly clear that resting, offtask, brain activities are structurally rich (Smith et al., 2009; Tavor et al., 2016). An ability to study spontaneous activity with respect to taskrelated representation is important for understanding cognitive process beyond current sensation (Higgins et al., 2021). However, unlike the case for taskbased activity, little attention has been given to techniques that can measure representational content of resting brain activity in humans.
Unlike human neuroscience, representational content of resting activity is studied extensively in animal neuroscience. One seminal example is ‘hippocampal replay’ (Wilson and McNaughton, 1994; Skaggs and McNaughton, 1996; Louie and Wilson, 2001; Lee and Wilson, 2002): during sleep, and quiet wakefulness, place cells in the hippocampus (that signal selflocation during periods of activity) spontaneously recapitulate old, and explore new, trajectories through an environment. These internally generated sequences are hypothesized to reflect a fundamental feature of neural computation across tasks (Foster, 2017; Ólafsdóttir et al., 2018; Pfeiffer, 2020; Carr et al., 2011; Lisman et al., 2017). Numerous methods have been proposed to analyse hippocampal replay (Davidson et al., 2009; Grosmark and Buzsáki, 2016; Maboudi et al., 2018). However, they are not domain general in that they are designed to be most suited for specific needs, such as particular task design, data modality, or research question (van der Meer et al., 2020; Tingley and Peyrache, 2020). Most commonly, these methods apply to invasive electrophysiology signals, aiming to detect sequences in a linear track during spatial navigation task (Tingley and Peyrache, 2020). As a result, they cannot be directly adapted for analysing human resting activity collected using noninvasive neuroimaging techniques. Furthermore, in rodent neuroscience, it is nontrivial to adapt these algorithms to even small changes in tasks (such as 2D foraging). This may be a limiting factor in taking replay analyses to more interesting and complex tasks, such as complex mazes (Rosenberg et al., 2021).
Here, we introduce temporal delayed linear modelling (TDLM), a domaingeneral analysis toolkit, for characterizing temporal structure of internally generated neural representations in rodent electrophysiology as well as human neuroimaging data. TDLM is inspired by existing replay detection methods (Skaggs and McNaughton, 1996; Davidson et al., 2009; Grosmark and Buzsáki, 2016), especially those analysis of population of replay events (Grosmark and Buzsáki, 2016). It is developed based on the general linear modelling (GLM) framework and can therefore easily accommodate testing of ‘secondorder’ statistical questions (van der Meer et al., 2020), such as whether there is more forward than reverse replay, or is replay strength changing over time, or differs between behavioural conditions. This type of question is ubiquitous in cognitive studies, but is typically addressed ad hoc in other replay detection methods (van der Meer et al., 2020). In TDLM, such questions are treated naturally as linear contrasts of effects in a GLM.
Here, we show TDLM is suited to measure the average amount of replay across many events (i.e. replay strength) in linear modelling. This makes it applicable to both rodent electrophysiology and human neuroimaging. Applying TDLM on noninvasive neuroimaging data in humans, we, and others, have shown it is possible to measure the average sequenceness (propensity for replay) in spontaneous neural representations (Wimmer et al., 2020; Nour et al., 2021; Liu et al., 2019; Liu et al., 2021a). The results resemble key characteristics found in rodent hippocampal replay and inform key computational principles of human cognition (Liu et al., 2019).
In the following sections, we first introduce the logic and mechanics of TDLM in detail, followed by a careful treatment of its statistical inference procedure. We test TDLM in both simulation (see section ‘Simulating MEG data’) and real human MEG/EEG data (see section ‘Human replay dataset’). We then turn to rodent electrophysiology and compare TDLM to existing rodent replay methods, extending TDLM to work on a continuous state space. Lastly, using our approach we reanalyse rodent electrophysiology data from Ólafsdóttir et al., 2016 (see section ‘Rodent replay dataset’) and show what TDLM can offer uniquely compared to existing methods in rodent replay analysis.
To summarize, TDLM is a general, and flexible, tool for measuring neural sequences. It facilitates crossspecies investigations by linking largescale measurements in humans to singleneuron measurements in nonhuman species. It provides a powerful tool for revealing abstract cognitive processes that extend beyond sensory representation, potentially opening doors for new avenues of research in cognitive science.
Results
Temporal delayed linear modelling
Overview of TDLM
Our primary goal is to test for temporal structure of neural representations in humans. However, to facilitate crossspecies investigation (Barron et al., 2021), we also want to extend this method to enable measurement of sequences in other species (e.g. rodents). Consequently, this sequence detection method has to be domain general. We chose to measure sequences in a decoded state space (e.g. posterior estimated locations in rodents [Grosmark and Buzsáki, 2016] or time course of taskrelated reactivations in humans [Liu et al., 2019]) as this makes results from different data types comparable.
Ideally, a general sequence detection method should (1) uncover structural regularities in the reactivation of neural activity, (2) control for confounds that are not of interest, and (3) test whether this regularity conforms to a hypothesized structure. To achieve these goals, we developed the method under a GLM framework, and henceforth refer to it as temporal delayed linear modelling, that is, TDLM. Although TDLM works on a decoded state space, it still needs to take account of confounds inherent in the data where the state space is decoded from. This is a main focus of TDLM.
The starting point of TDLM is a set of n time series, each corresponding to a decoded neural representation of a task variable of interest. This is what we call the state space, X, with dimension of time by states. These time series could themselves be obtained in several ways, described in detail in a later section (‘Getting the states’). The aim of TDLM is to identify taskrelated regularities in sequences of these representations.
Consider, for example, a task in which participants have been trained such that n = 4 distinct sensory objects (A, B, C, and D) appear in a consistent order $:A\to B\to C\to D$ (Figure 1a, b). If we are interested in replay of this sequence during subsequent resting periods (Figure 1c, d), we might want to ask statistical questions of the following form: 'Does the existence of a neural representation of A, at time T, predict the occurrence of a representation of B at time T+ $\u2206t$?' and similarly for $B\to C$ and $C\to D$.
In TDLM, we ask such questions using a twostep process. First, for each of the n^{2} possible pairs of variables X_{i} and X_{j}, we find the linear relation between the X_{i} time series and the $\u2206t$shifted X_{j} time series. These n^{2} relations comprise an empirical transition matrix, describing how likely each variable is to be succeeded at a lag of $\u2206t$ by each other variable (Figure 1e). Second, we linearly relate this empirical transition matrix with a taskrelated transition matrix of interest (Figure 1f). This produces a single number that characterizes the extent to which the neural data follow the transition matrix of interest, which we call ‘sequenceness’. Finally, we repeat this entire process for all $\u2206t$ of interest, yielding a measure of sequenceness at each possible lag between variables and submit this for statistical inference (Figure 1g).
Note that, for now, this approach decomposes a sequence (such as $A\to B\to C\to D$) into its constituent transitions and sums the evidence for each transition. Therefore, it does not require that the transitions themselves are sequential: $A\to B$ and $B\to C$ could occur at unrelated times, so long as the withinpair time lag was the same. For interested readers, we address how to strengthen the inference by looking explicitly for longer sequences in Appendix 1: Multistep sequences.
Constructing the empirical transition matrix
In order to find evidence for statetostate transitions at some time lag $\u2206t$, we could regress a timelagged copy of one state, ${X}_{j}$, onto another, ${X}_{i}$ (omitting residual term ε in all linear equations):
Instead, TDLM chooses to include all states in the same regression model for important reasons, detailed in section ‘Moving to multiple linear regression’:
In this equation, the values of all states ${X}_{k}$ at time t are used in a single multilinear model to predict the value of the single state ${X}_{j}$ at time $t+\u2206t$.
The regression described in Equation 2 is performed once for each ${X}_{j}$, and these equations can be arranged in matrix form as follows:
Each row of X is a time point, and each of the n columns is a state. $X\left(\u2206t\right)$ is the same matrix as X, but with the rows shifted forwards in time by $\u2206t$. ${\beta}_{ij}$ is an estimate of the influence of ${X}_{i}\left(t\right)$ on ${X}_{j}\left(t+\u2206t\right)$. $\beta $ is an $n\times n$ matrix of weights, which we call the empirical transition matrix.
To obtain $\beta $, we invert Equation 3 by ordinary least squares regression:
This inversion can be repeated for each possible time lag ( $\u2206t=1,2,3,\dots $), resulting in a separate empirical transition matrix β at every time lag. We call this step the firstlevel sequence analysis.
Testing the hypothesized transitions
The firstlevel sequence analysis assesses evidence for all possible statetostate transitions. The next step in TDLM is to test for the strength of a particular hypothesized sequence, specified as a transition matrix,T. Therefore, we construct another GLM which relates T to the empirical transition matrix, β. We call this step the secondlevel sequence analysis:
As noted above, $\beta $ is the empirical transition matrix obtained from firststage GLM. It has dimension of $n$ by $n$, where $n$ is the number of states. Each entry in $\beta$ reflects the unique contribution of state i to state j at given time lag. Effectively, the above equation models this empirical transition matrix $\beta $ as a weighted sum of prespecified template matrices, $T}_{r$. Thus, $r$ is the number of regressors included in the secondstage GLM, and each scalar valued $Z\left(r\right)$ is the weight assigned to the $r$ th template matrix. Put in other words, $T}_{r$ constitutes the regressors in the design matrix, each of which has a prespecified template structure, for example, $T}_{auto$, $T}_{const$, $T}_{F$, and $T}_{B$ (Figure 1h).
$T}_{F$ and $T}_{B$ are the transpose of each other (e.g. red and blue entries in Figure 1b), indicating transitions of interest in forward and backward direction, respectively. In 1D physical space, $T}_{F$ and $T}_{B$ would be the shifted diagonal matrices with ones on the first upper and lower off diagonals. $T}_{const$ is a constant matrix that models away the average of all transitions, ensuring that any weight on $T}_{F$ and $T}_{B$ reflects its unique contribution. $T}_{auto$ is the identity matrix. $T}_{auto$ models selftransitions to control for autocorrelation (equivalently, we could simply omit the diagonal elements from the regression).
Z is the weights of the secondlevel regression, which is a vector with dimension of 1 by $r$. Each entry in Z reflects the strength of the hypothesized transitions in the empirical ones, that is, sequenceness. Repeating the regression of Equation 5 at each time lag ($\mathrm{\Delta}t=1,2,3,\dots$) results in time courses of the sequenceness as a function of time lag (e.g. the solid black line in Figure 1f). $Z}_{F$, $Z}_{B$ are the forward and backward sequenceness, respectively (e.g. red and blue lines in Figure 1g).
In many cases, Z_{F} and Z_{B} will be the final outputs of a TDLM analysis. However, it may sometimes also be useful to consider the quantity:
$D$ contrasts forward and backward sequences to give a measure that is positive if sequences occur mainly in a forward direction and negative if sequences occur mainly in a backward direction. This may be advantageous if, for example, $Z}_{F$ and $Z}_{B$ are correlated across subjects (due to factors such as subject engagement and measurement sensitivity). In this case, $D$ may have lower crosssubject variance than either $Z}_{F$ or $Z}_{B$ as the subtraction removes common variance.
Finally, to test for statistical significance, TDLM relies on a nonparametric permutationbased method. The null distribution is constructed by randomly shuffling the identities of the n states many times and recalculating the secondlevel analysis for each shuffle (Figure 1g). This approach allows us to reject the null hypothesis that there is no relationship between the empirical transition matrix and the taskdefined transition of interest. Note that there are many incorrect ways to perform permutations, which permute factors that are not exchangeable under the null hypothesis and therefore lead to false positives. We examine some of these later with simulations and real data. In some cases, it may be desirable to test slightly different hypotheses by using a different set of permutations; this is discussed later.
If the time lag $\mathrm{\Delta}t$ at which neural sequences exist is not known a priori, then we must correct for multiple comparisons over all tested lags. This can be achieved by using the maximum Z_{F} across all tested lags as the test statistic (see details in section 'Correcting for multiple comparisons'). If we choose this test statistic, then any values of Z_{F} exceeding the 95th percentile of the null distribution can be treated as significant at $\alpha =0.05$ (e.g. the grey dotted line in Figure 1g).
TDLM steps in detail
Getting the states
As described above, the input to TDLM is a set of time series of decoded neural representations or states. Here, we provide different examples of specific state spaces (X, with dimension of time by states) that we have worked with using TDLM.
States as sensory stimuli
The simplest case, perhaps, is to define a state in terms of a neural representation of sensory stimuli, for example, face, house. To obtain their associated neural representation, we present these stimuli in a randomized order at the start of a task and record wholebrain neural activity using a noninvasive neuroimaging method, for example, Magnetoencephalography (MEG) or Electroencephalography (EEG). We then train a model to map the pattern of recorded neural activity to the presented image (Figure 1—figure supplement 1). This could be any of the multitude of available decoding models. For simplicity, we used a logistic regression model throughout.
The states here are defined in terms of stimulievoked neural activity. The classifiers are trained at 200 ms poststimulus onset. For example, the stimuli are faces, buildings, body parts, and objects. Source localizing the evoked neural activity, we found that the activation patterns of stimuli in MEG signal are consistent with those reported in fMRI literature. For faces, activation peaked in a region roughly consistent with the fusiform face area (FFA) as well as the occipital face area (OFA). Activation for building stimuli was located between a parahippocampal place area (PPA) and retrosplenial cortex (RSC), a region also known to respond to scene and building stimuli. Activation for body part stimuli localized to a region consistent with the extrastriate body area (EBA). Activation for objects was in a region consistent with an objectassociated lateral occipital cortex (LOC) as well as an anterior temporal lobe (ATL) cluster that may relate to conceptual processing of objects. Those maps are thresholded to display localized peaks. The full unthresholded maps can be found at https://neurovault.org/collections/6088/. This is adapted from Wimmer et al., 2020.
In MEG/EEG, neural activity is recorded by multiple sensor arrays on the scalp. The sensor arrays record wholebrain neural activity at millisecond temporal resolution. To avoid a potential selection bias (given the sequence is expressed in time), we choose wholebrain sensor activity at a single time point (i.e. spatial feature) as the training data fed into classifier training.
Ideally, we would like to select a time point where the neural activity can be most truthfully read out. This can be indexed as the time point that gives the peak decoding accuracy. If the state is defined by the sensory features of stimuli, we can use a classical leaveoneout crossvalidation scheme to determine the ability of classifiers to generalize to unseen data of the same stimulus type (decoding accuracy) at each time point (see Appendix 2 for its algorithm box). In essence, this crossvalidation scheme is asking whether the classifier trained on this sensory feature can be used to classify the unseen data of the same stimuli (Figure 2a, b).
After we have identified the peak time point based on the crossvalidation, we can train the decoding models based on the multivariate sensor data at this given time.
Specifically, let us denote the training data, $M$, with dimension of number of observations, $b$, by number of sensors, $s$. The labels, Y, have dimension of $b$ by 1. The aim here is to obtain the classifier weights, W, so that $\mathrm{Y}\approx \sigma \left(\mathrm{M}\mathrm{W}\right)$. $\sigma$ is the logistic sigmoid function.
Normally we apply L1 regularization on the inference of weights (we will detail the reasons in section ‘Regularization’):
Next, we translate the data at testing time (e.g. during rest), R, from sensor space to the decoded state space:
where R is the testing data, with dimension of time by sensors, and X is the decoded state space, with dimension of time by states.
States as abstractions
As well as sequences of sensory representations, it is possible to search for replay of more abstract neural representations. Such abstractions might be associated with the presented image (e.g. mammal vs. fish), in which case analysis can proceed as above by swapping categories for images (Wimmer et al., 2020). A more subtle example, however, is where the abstraction pertains to the sequence or graph itself. In space, for example, grid cells encode spatial coordinates in a fashion that abstracts over the sensory particularities of any one environment, and therefore can be reused across environments (Fyhn et al., 2007). In human studies, similar representations have been observed for the location in a sequence (Liu et al., 2019; Dehaene et al., 2015). For example, different sequences have shared representations for their second items (Figure 2). These representations also replay (Liu et al., 2019). However, to measure this replay we need to train decoders for these abstract representations. This poses a conundrum as it is not possible to elicit the abstract representations in the absence of the concrete examples (i.e., the sensory stimuli). Care is required to ensure that the decoders are sensitive to the abstract code rather than the sensory representations (see Appendix 2 for algorithm box of selecting time point for training abstract code). Useful strategies include training classifiers to generalize across stimulus sets and ensuring the classifiers are orthogonal to sensory representations (Figure 2—figure supplement 1; details in Liu et al., 2019). One way that excludes the possibility of sensory contamination is if the structural representations can be shown to sequence before the subjects have ever seen their sensory correlates (Liu et al., 2019).
TDLM can also be used iteratively to ask questions about the ordering of different types of replay events (Figure 2d). This can provide for powerful inferences about the temporal organization of replay, such as the temporal structure between sequences, or the repeating pattern of the same sequence. This more sophisticated use of TDLM merits its own consideration and is discussed in Appendix 3: Sequences of sequences.
Controlling confounds and maximizing sensitivity in sequence detection
Here, we motivate the key features of TDLM.
Temporal correlations
In standard linear methods, unmodelled temporal autocorrelation can inflate statistical scores. Techniques such as autoregressive noise modelling are commonplace to mitigate these effects (Colclough et al., 2015; Deodatis and Shinozuka, 1988). However, autocorrelation is a particular burden for analysis of sequences, where it interacts with correlations between the decoded neural variables.
To see this, consider a situation where we are testing for the sequence $X}_{i}\to {X}_{j$. TDLM is interested in the correlation between $X}_{i$ and lagged $X}_{j$ (see Equation 1). But if the $X}_{i$ and $X}_{j$ time series contain autocorrelations and are also correlated with one another, then ${X}_{i}\left(t\right)$ will necessarily be correlated with ${X}_{j}\left(t+\mathrm{\Delta}t\right)$. Hence, the analysis will spuriously report sequences.
Correlations between states are commonplace. Consider representations of visual stimuli decoded from neuroimaging data. If these states are decoded using an nway classifier (forcing exactly one state to be decoded at each moment), then the n states will be anticorrelated by construction. On the other hand, if states are each classified against a null state corresponding to the absence of stimuli, then the n states will typically be positively correlated with one another.
Notably, in our case, because these autocorrelations are identical between forward and backward sequences, one approach for removing them is to compute the difference measure described above ($D={Z}_{F}{Z}_{B}$). This works well as shown in KurthNelson et al., 2016. However, a downside is it prevents us from measuring forward and backward sequences independently. The remainder of this section considers alternative approaches that allow for independent measurement of forward and backward sequences.
Moving to multiple linear regression
The spurious correlations above are induced because ${X}_{j}\left(t\right)$ mediates a linear relationship between ${X}_{i}\left(t\right)$ and ${X}_{j}\left(t+\mathrm{\Delta}t\right)$. Hence, if we knew ${X}_{j}\left(t\right),$ we can solve the problem by simply controlling for it in a linear regression, as in Granger causality (Eichler, 2007):
Unfortunately, we do not have access to the ground truth of $X$ because these variables have been decoded noisily from brain activity. Any error in ${X}_{j}\left(t\right)$ but not ${X}_{i}\left(t\right)$ will mean that the control for autocorrelation is imperfect, leading to spurious weight on $\beta}_{ij$, and therefore spurious inference of sequences.
This problem cannot be solved without a perfect estimate of X, but it can be systematically reduced until negligible. It turns out that the necessary strategy is simple. We do not know ground truth ${X}_{j}\left(t\right)$, but what if we knew a subspace that included estimated ${X}_{j}\left(t\right)$? If we control for that whole subspace, we would be on safe ground. We can get closer and closer to this by including further coregressors that are themselves correlated with estimated ${X}_{j}\left(t\right)$ with different errors from ground truth ${X}_{j}\left(t\right)$. The most straightforward approach is to include the other states of $X\left(t\right)$, each of which has different errors, leading to the multiple linear regression of Equation 2.
Figure 3a shows this method applied to the same simulated data whose correlation structure induces false positives in the simple linear regression of Equation 1, and by the same logic, so too in crosscorrelation. This is why previous studies based on a crosscorrelation (Eldar et al., 2018; KurthNelson et al., 2016) cannot look for sequenceness in forward and backward directions separately, but have to rely on their asymmetry. The multiple regression accounts for the correlation structure of the data and allows correct inference to be made. Unlike the simple subtraction method proposed above (Figure 3a, left panel), the multiple regression permits separate inference on forward and backward sequences.
Oscillations and long timescale autocorrelations
Equation 2 performs multiple regression, regressing each ${X}_{j}\left(t+\mathrm{\Delta}t\right)$ onto each ${X}_{i}\left(t\right)$ whilst controlling for all other state estimates at time t. This method works well when spurious relationships between ${X}_{i}\left(t\right)$ and ${X}_{j}\left(t+\mathrm{\Delta}t\right)$ are mediated by the subspace spanned by the other estimated states at time t (in particular, ${X}_{j}\left(t\right)$). One situation in which this assumption might be challenged is when replay is superimposed on a large neural oscillation. For example, during rest (which is often the time of interest in replay analysis), MEG and EEG data often express a large alpha rhythm, at around 10 Hz.
If all states experience the same oscillation at the same phase, the approach correctly controls false positives. The oscillation induces a spurious correlation between ${X}_{i}\left(t\right)$ and ${X}_{j}\left(t+\mathrm{\Delta}t\right)$, but, as before, this spurious correlation is mediated by ${X}_{j}\left(t\right)$.
However, this logic fails when states experience oscillations at different phases. This scenario may occur, for example, if we assume there are travelling waves in cortex (Lubenov and Siapas, 2009; Wilson et al., 2001) because different sensors will experience the wave at different times and different states have different contributions from each sensor. MEG sensors can be seen as measures of local field potential on the scalp, which contain background neural oscillations. In humans, this is dominantly alpha during rest.
In this case, ${X}_{i}\left(t\right)$ predicts ${X}_{j}\left(t+\mathrm{\Delta}t\right)$ over and above ${X}_{j}\left(t\right)$. To see this, consider the situation where $\mathrm{\Delta}t$ is $\frac{1}{4}\tau$ (where $\tau$ is the oscillatory period) and the phase shift between ${X}_{i}\left(t\right)$ and ${X}_{j}\left(t\right)$ is pi/2. Now every peak in ${X}_{j}\left(t+\mathrm{\Delta}t\right)$ corresponds to a peak in ${X}_{i}\left(t\right)$ but a zero of ${X}_{j}\left(t\right)$.
To combat this, we can include phaseshifted versions/more time points of $X\left(t\right)$. If dominant background oscillation is at alpha frequency (e.g. 10 Hz), neural activity at time T would be correlated with activity at time T + $\tau$. We can control for that by including $X\left(t+\tau \right)$, as well as $X\left(t\right)$, in the GLM (Figure 3b). Here, $\tau$ = 100 ms if assuming the frequency is 10 Hz. Applying this method to the real MEG data during rest, we see much diminished 10 Hz oscillation in sequence detection during rest (Liu et al., 2019).
Spatial correlations
As mentioned above, correlations between decoded variables commonly occur. The simplest type of decoding model is a binary classifier that maps brain activity to one of two states. These states will, by definition, be perfectly anticorrelated. Conversely, if separate classifiers are trained to distinguish each state’s representation from baseline (‘null’) brain data, then the states will often be positively correlated with each other.
Unfortunately, positive or negative correlations between states reduce the sensitivity of sequence detection because it is difficult to distinguish between states within the sequence: collinearity impairs estimation of β in Equation 2. In Figure 3c, we show in simulation that the ability to detect real sequences goes down as the absolute value of a spatial correlation goes up. We took the absolute value here because the direction of correlation is not important, only the magnitude of the correlation matters.
Ideally, the state decoding models should be as independent as possible. We have suggested the approach of training models to discriminate one state against a mixture of other states and null data (Liu et al., 2019; KurthNelson et al., 2016). This mixture ratio can be adjusted. Adding more null data causes the states to be positively correlated with each other, while less null data leads to negative correlation. We adjust the ratio to bring the correlation between states as close to zero as possible. In Figure 3d, we show in simulation the ensuing benefit for sequence detection. An alternative method is penalizing covariance between states in the classifier’s cost function (Weinberger et al., 1988).
Regularization
A key parameter in training highdimensional decoding models is the degree of regularization. In sequence analysis, we are often interested in spontaneous reactivation of state representations, as in replay. However, our decoding models are typically trained on taskevoked data because this is the only time at which we know the ground truth of what is being represented. This poses a challenge insofar as the models best suited for decoding evoked activity at training may not be well suited for decoding spontaneous activity at subsequent tests. Regularizing the classifier (e.g. with an L1 norm) is a common technique for increasing outofsample generalization (to avoid overfitting). Here, it has the added potential benefit of reducing spatial correlation between classifier weights.
During classifier training, we can impose L1 or L2 constraints over the inference of classifier coefficients, $W.$ This amounts to finding the coefficients, $W$, that maximize the likelihood of the data observations under the constraint imposed by the regularization term. L1 regularization can be phrased as maximizing the likelihood, subject to a regularization penalty on the L1 norm of the coefficient vector:
L2 regression can be viewed as a problem of maximizing the likelihood, subject to a regularization penalty on the L2 norm of the coefficient vector:
where M is the task data, with dimension of number of observations, $b$, by number of sensors, $s$. Y is the label of observations, a vector with dimension of $b$ by 1. $\mathrm{P}\left(\mathrm{Y}\mathrm{M},\mathrm{W}\right)=\sigma \left(\mathrm{M}\mathrm{W}\right)$, and $\sigma$ is the logistic sigmoid function.
We simulate data with varying numbers of true sequences at 40 ms lag and find that the beta estimate of sequence strength at 40 ms positively relates to the number of sequences. We also find that L1 weight regularization is able to detect sequences more robustly than L2 regularization, while L2 performs no better than an unregularized model (Figure 3e). The L1 models also have much lower spatial correlation, consistent with L1 achieving better sequence detection by reducing the covariances between classifiers.
In addition to minimizing spatial correlations, as discussed above, it can also be shown that L1induced sparsity encodes weaker assumptions about background noise distributions into the classifiers as compared to L2 regularization (Higgins, 2019). This might be of special interest to researchers who want to measure replay during sleep. Here, the use of sparse classifiers is helpful as background noise distributions are likely to differ more substantially from the (awake state) training data.
Statistical inference
So far, we have shown how to quantify sequences in representational dynamics. An essential final step is assessing the statistical reliability of these quantities.
All the tests described in this section evaluate the consistency of sequences across subjects. This is important because even in the absence of any real sequences of taskrelated representations spontaneous neural activity is not random but follows repeating dynamical motifs (Vidaurre et al., 2017). Solving this problem requires a randomized mapping between the assignment of physical stimuli to task states. This can be done across subjects, permitting valid inference at the group level.
At the group level, the statistical testing problem can be complicated by the fact that sequence measures do not in general follow a known distribution. Additionally, if a statetostate lag of interest ($\mathrm{\Delta}t$) is not known a priori, it is then necessary to perform tests at multiple lags, creating a multiple comparisons problem over a set of tests with complex interdependencies. In this section, we discuss inference with these issues in mind.
Distribution of sequenceness at a single lag
If a statetostate lag of interest ($\mathrm{\Delta}t$) is known a priori, then the simplest approach is to compare the sequenceness against zero, for example, using either a signedrank test or onesample t test (assuming Gaussian distribution). Such testing assumes the data are centred on zero if there were no real sequences. We show this approach is safe in both simulation (assuming no real sequences) and real MEG data where we know there are no sequences.
In simulation, we assume no real sequences, but state time courses are autocorrelated. At this point, there is no systematic structure in the correlation between the neuronal representations of different states (see later for this consideration). We then simply select the 40 ms time lag and compare its sequenceness to zero using either a signedrank test or onesample t test. We compare falsepositive rates predicted by the statistical tests with falsepositive rates measured in simulation (Figure 4a). We see the empirical false positives are well predicted by theory.
We have tested this also on real MEG data. In Liu et al., 2019, we had one condition where we measured resting activity before the subjects saw any stimuli. Therefore, by definition these sensory stimuli could not be replayed, we can use classifiers from these stimuli (measured later) to test a falsepositive performance of statistical tests on replay. Note, in our case, that each subject saw the same stimuli in a different order. They could not know the correct stimulus order when these resting data were acquired. These data provide a valid null for testing false positives.
To obtain many examples, we randomly permute the eight different stimuli 10,000 times and then compare sequenceness (at 40 ms time lag) to zero using either a signedrank test or onesample t test across subjects. Again, predicted and measured falsepositive rates match well (Figure 4b, left panel). This holds true across all computed time lags (Figure 4b, right panel).
An alternative to making assumptions about the form of the null distribution is to compute an empirical null distribution by permutation. Given that we are interested in the sequence of states over time, one could imagine permuting either state identity or time. However, permuting time uniformly will typically lead to a very high incidence of false positives as time is not exchangeable under the null hypothesis (Figure 4c, blue colour). Permuting time destroys the temporal smoothness of neural data, creating an artificially narrow null distribution (Liu et al., 2019; KurthNelson et al., 2016). This false positive also exists if we circular shift the time dimension of each state. This is because the signal is highly nonstationary. Replays come in bursts, as recently analysed (Higgins et al., 2021), and this will break a circular shift (Harris, 2020). State permutation, on the other hand, only assumes that state identities are exchangeable under the null hypothesis, while preserving the temporal dynamics of the neural data represents a safer statistical test that is well within 5% falsepositive rate (Figure 4c, purple colour).
Correcting for multiple comparisons
If the statetostate lag of interest is not known, we have to search over a range of time lags. As a result, we then have a multiple comparison problem. Unfortunately, we do not as yet have a good parametric method to control for multiple testing over a distribution. It is possible that one could use methods that exploit the properties of Gaussian random fields, as is common in fMRI (Worsley et al., 1996), but we have not evaluated this approach. Alternatively, we could use Bonferroni correction, but the assumption that each computed time lag is independent is likely false and overly conservative.
We recommend relying on state identitybased permutation. To control for the familywise error rate (assuming $\alpha =0.05$), we want to ensure there is a 5% probability of getting the tested sequenceness strength ($S}_{test$) or bigger by chance in *any* of the multiple tests. We therefore need to know what fraction of the permutations gives $S}_{test$ or bigger in any of their multiple tests. If any of the sequenceness scores in each permutation exceed $S}_{test$, then the maximum sequenceness score in the permutation will exceed $S}_{test$, so it is sufficient to test against the maximum sequenceness score in the permutation. The null distribution is therefore formed by first taking the peak of sequenceness across all computed time lags of each permutation. This is the same approach as used for familywise error correction for permutations tests in fMRI data (Nichols, 2012), and in our case it is shown to behave well statistically (Figure 4d).
What to permute
We can choose which permutations to include in the null distribution. For example, consider a task with two sequences, $Seq1:A\to B\to C\to D$ and $Seq2:E\to F\to G\to H$. We can form the null distribution either by permuting all states (e.g. one permutation might be E $\to F\to A\to B$, H $\to C\to E\to D$), as implemented in KurthNelson et al., 2016. Alternatively, we can form a null distribution which only includes transitions between states in different sequences (e.g. one permutation might be D $\to G\to A\to E$, H $\to C\to F\to B$), as implemented in Liu et al., 2019. In each case, permutations are equivalent to the test data under the assumption that states are exchangeable between positions and sequences. The first case has the advantage of many more possible permutations, and therefore may make more precise inferential statements in the tail. The second case may be more sensitive in the presence of a signal as the null distribution is guaranteed not to include permutations which share any transitions with the test data (Figure 4e). For example, in Figure 4e, the blue swaps are the permutations that only exchange state identity across sequences, as in Liu et al., 2019, while the red swaps are the permutations that permit all possible state identity permutations, as in KurthNelson et al., 2016. Note that there are many more different state permutations in red swaps than in blue swaps. We can make different levels of inferences by controlling the range of the null distributions in the state permutation tests.
Cautionary note on exchangeability of states after training
Until now, all nonparametric tests have assumed that state identity is exchangeable under the null hypothesis. Under this assumption, it is safe to perform state identitybased permutation tests on $Z}_{F$ and $Z}_{B$. In this section, we consider a situation where this assumption is broken.
More specifically, take a situation where the neural representation of states $A$ and $B$ is related in a systematic way or, in other words, the classifier on state $A$ is confused with state $B$, and we are testing sequenceness of $A\to B$. Crucially, to break the exchangeability assumption, representations of $A$ and $B$ have to be systematically more related than other states, for example, $A$ and $D$. This cannot be caused by lowlevel factors (e.g. visual similarity) because states are counterbalanced across subjects, so any such bias would cancel at the population level. However, such a bias might be induced by task training.
In this situation, it is, in principle, possible to detect sequenceness of $A\to B$ even in the absence of real sequences. In the autocorrelation section above, we introduced protections against the interaction of state correlation with autocorrelation. These protections may fail in the current case as we cannot use other states as controls (as we do in the multiple linear regression) because $A$ has systematic relationship with $B$, but not other states. State permutation will not protect us from this problem because state identity is no longer exchangeable.
Is this a substantive problem? After extensive training, behavioural pairing of stimuli can indeed result in increased neuronal similarity (Messinger et al., 2001; Sakai and Miyashita, 1991). These early papers involved long training in monkeys. More recent studies have shown induced representational overlap in human imaging within a single day (KurthNelson et al., 2015; Barron et al., 2013; Wimmer and Shohamy, 2012). However, when analysed across the whole brain, such representational changes tend to be localized to discrete brain regions (Schapiro et al., 2013; Garvert et al., 2017), and as a consequence may have limited impact on wholebrain decodeability.
Whilst we have not yet found a simulation regime in which false positives are found (as opposed to false negatives), there exists a danger in cases where, by experimental design, the states are not exchangeable.
Source localization
Uncovering temporal structure of neural representation is important, but it is also of interest to ask where in the brain a sequence is generated. Rodent electrophysiology research focuses mainly on the hippocampus when searching for replay. One advantage of wholebrain noninvasive neuroimaging over electrophysiology (despite many known disadvantages, including poor anatomical precision, low signalnoise ratio) is in its ability to examine neural activity in multiple other brain regions. Ideally, we would like a method that is capable of localizing sequences of more abstract representation in brain regions beyond hippocampus (Liu et al., 2019).
We want to identify the time when a given sequence is very likely to unfold, so we can construct averages of independent data over these times. We achieve this by transforming from the space of original states, $X}_{orig$, to the space of sequence events, $X}_{seq$. First, based on the transition of interest, $T$, we can obtain the projection matrix, $X}_{proj$:
If we know the state lag within sequence, $\mathrm{\Delta}t$ (e.g. the time lag give rise to the strongest sequenceness), or have it a priori, we can obtain the timelagged matrix, $X}_{lag$:
Then, we obtain state space with sequence event as states by elementwise multiply $X}_{proj$ and $X}_{lag$:
Each element in $X}_{seq$ indicates the strength of a (pairwise) sequence at a given moment in time. At this stage, $X}_{seq$ is a matrix with number of time points as rows (same as $X}_{orig$), and with number of pairwise sequences (e.g. A>B; B>C; etc.) as columns. Now on this matrix, $X}_{seq$, we can either look for sequences of sequences (see Appendix 3), or sum over columns (i.e. average over pairwise sequence events), and obtain a score at each time point reflecting how likely it is to be a sequence member (Figure 5a).
We can use this score to construct averages of other variables that might covary with replay. For example, if we choose time points when this score is high (e.g. 95th percentile) after being low for the previous 100 ms and construct an average timefrequency plot of the raw MEG data aligned to these times, we can reconstruct a timefrequency plot that is, on average, associated with replay onset (Figure 5b). Note that although this method assigns a score for individual replay events as an intermediary variable, it results in an average measure across many events.
This approach is similar to spiketriggered averaging (Sirota et al., 2008; Buzsáki et al., 1983). Applying this to real MEG data during rest, we can detect increased hippocampal power at 120–150 Hz, at replay onset (Figure 5b, c). Source reconstruction in the current analysis was performed using linearly constrained minimum variance (LCMV) beamforming, a common method for MEG source localization. This is known to suffer from distal correlated sources (Hincapié et al., 2017). A better method may be Empirical Bayesian Beamfomer for accommodating correlated neural source as a priori (O'Neill, 2021).
TDLM for rodent replay
So far, we have introduced TDLM in the context of analysing human MEG data. Relatedly, its application on human EEG data was also explored (Appendix 4: Apply TDLM to human wholebrain EEG data). Historically, replaylike phenomena have been predominantly studied in rodents with electrophysiology recordings in the hippocampal formation (Davidson et al., 2009; Grosmark and Buzsáki, 2016; Tingley and Peyrache, 2020). This raises interesting questions: how does TDLM compare to the existing rodent replay methods, can TDLM be applied to spiking data for detecting rodent replays, and what are the pros and cons? In this section, we address these questions.
Generality of graph vs. linebased replay methods
Given that TDLM works on the decoded state space, rather than sensor (with analogy to cell) level, we compared TDLM to rodent methods that work on the posterior decoded position (i.e., state) space, normally referred to as Bayesianbased methods (Tingley and Peyrache, 2020). (Note that these methods are typically Bayesian in how position is decoded from spikes [Zhang et al., 1998] but not in how replay is measured from decoded position.) Two commonly used methods are Radon transform (Davidson et al., 2009) and linear weighted correlation (Grosmark and Buzsáki, 2016).
Both methods proceed by forming a 2D matrix, where one dimension is the decoded state (e.g. positions on a linear track), and the other dimension is time (note that the decoded state is embedded in 1D). The methods then try to discover if an ordered line is a good description of the relationship between state and (parametric) time. For this reason, we call this family of approaches ‘line search’.
The radon method uses a discrete Radon transform to find the best line in the 2D matrix (Toft, 1996) and then evaluates the radon integral, which will be high if the data lie on a line (Figure 6a). It compares this to permutations of the same data where the states are reordered (Tingley and Peyrache, 2020). The linear weighted correlation method computes the average correlation between the time and estimated position in the 1D embedding (Figure 6b). The correlation is nonzero provided there is an orderly reactivation along the state dimension.
Both methods are applied to decoded positions, where they are sorted based on the order in a linearized state space. TDLM also works on the decoded position space, but instead of directly measuring the relationship between position and time, it measures the transition strength for each possible state to state transitions (Figure 6c).
This is a key difference between TDLM and these popular existing techniques. To reiterate, the latter rely on a continuous parametric embedding of behavioural states and time. TDLM is fundamentally different as it works on a graph and examines the statistical likelihood of some transitions happening more than others. This is therefore a more general approach that can be used for sequences drawn from any graph (e.g. 2D maze, Figure 6d), not just graphs with simple embeddings (like a linear track). For example, in a nonspatial decisionmaking task (KurthNelson et al., 2016), all states lead to two different states and themselves can be arrived at from two other different states (Figure 6e). Existing ‘line search’ methods will not work because there is no linear relationship between time and states (Figure 6f).
Multiscale TDLM
While continuous spaces can be analysed in TDLM by simply chunking the space into discrete states, TDLM in its original form may potentially be less sensitive for such analyses than techniques with builtin assumptions about the spatial layout of the state space, such as the linear relationship between time and reactivated states (Appendix 5 ‘Less sensitivity of TDLM to skipping sequences’). In essence, because TDLM works on a graph, it has no information about the Euclidean nature of the state space, while techniques that make assumptions about the linear relationship between space and time benefit from these assumptions. For example, detecting state 1 then state 5 then state 10 counts as replay in these techniques, but not in TDLM.
However, TDLM can be extended to address this problem. For continuous state spaces, we first need to decide how to best discretize the space. If we choose a large scale, we will miss replays that occur predominantly within a spatial bin. If we choose a small scale, we will miss transitions that jump spatial bins. A simple solution is to apply TDLM at multiple different scales and take an (varianceweighted) average of the sequenceness measures across different scales. For example, when measuring replay at the same speed, we can average events that travel 5 cm in 10 ms together with events that travel 10 cm in 20 ms.
Specifically, to perform multiscale TDLM, we discretize position bins at multiple widths. This generates rate maps at multiple scales (e.g. 5 cm, 10 cm, 20 cm, 40 cm), and hence a multiscale state space. For each replay speed of interest, we apply TDLM separately at each scale, and then take a varianceweighted average of replay estimates over all scales.
where $\beta}_{i$ is the sequence strength of given speed (i.e. statetostate lag) measured at scale $i$, $V}_{i$ is the variance of its $\beta}_{i$ estimator, and $n$ is the number of scales. In the end, statistical testing is performed on the precision weighted averaged sequence strength, $\beta}_{M$, in the same way as we do in the original TDLM.
It is easy to see why this addresses the potential concerns raised above as some scales will capture the 1 > 2 > 3 transitions, whilst others will capture the 1 > 10 > 20 transitions: because the underlying space is continuous, we can average results of the same replay speed together, and this will reinstate the Euclidean assumptions.
Applying multiscale TDLM to real rodent data (place cells in CA1)
We demonstrate the applicability of multiscale TDLM by analysing CA1 place cell spiking data from Ólafsdóttir et al., 2016. In Ólafsdóttir et al., 2016, rats ran multiple laps on a 600 cm Z maze and were then placed in a rest enclosure for 1.5 hr (Figure 7a). The Z maze consists of three tracks, with its ends and corners baited with sweetened rice to encourage running from one end to the other. The animal’s running trajectory was linearized, dwell time and spikes were binned into 2 cm bins and smoothed with a Gaussian kernel (σ = 5 bins). We generated rate maps separately for inbound (track 1 > track 2 > track 3) and outbound (track 3 > track 2 > track 1) running (see details in section ‘Rodent replay dataset’).
As in Ólafsdóttir et al., 2016, cells recorded in CA1 were classified as place cells if their peak firing field during track running was above 1 Hz with a width of at least 20 cm (see an example in Figure 7b). The candidate replay events were identified based on multiunit (MU) activity from place cells during rest time. Periods exceeding the mean rate by three standard deviations of MU activity were identified as possible replay events. Events less than 40 ms long, or which included activity from less than 15% of the recorded place cell ensemble, were rejected (see an example of putative replay event in Figure 7c), and the remaining events were labelled putative replay events.
We analysed data from one full recording session (track running for generating rate map, postrunning resting for replay detection) from Rat 2192 reported in Ólafsdóttir et al., 2016. Following the procedure described above, we identified 58 place cells and 1183 putative replay events. Replay analysis was then performed on the putative replay events, separately for inbound and outbound rate maps given the same position has a different decoded state depending on whether it was during an outbound or inbound run.
A forward sequence is characterized by states from the outbound map occurring in the outbound order or states from the inbound map occurring in the inbound order. Conversely, a backward sequence is when states from the inbound map occur in the outbound order or states from the outbound map occur in the inbound order. Candidate events were decoded based on a rate map, transforming the ncells * ntime to nstates * ntime. Each entry in this state space represents the posterior probability of being in this position at a given time. Replay analysis was performed solely on this decoded state space.
Note that TDLM is applied directly to the concatenated rather than individual replay events. This is because TDLM is a linear modelling framework. Applying TDLM on each single replay event and then averaging the beta estimates (appropriately weighted by the variances) is equivalent to running TDLM once on the concatenated replay events. It quantifies the average amount of replay across many events, which is different compared to existing replay methods that focus on single replay events. Because TDLM addresses statistical questions in linear modelling, it does not require secondary statistics to ask whether the ‘counts’ of individual events are more likely than chance or more likely in one situation than another.
During the whole sleep period, TDLM identified a significant forward sequence for the outbound map with a wide speed range around from 1 to 10 m/s (Figure 7d, left panel), consistent with recent findings from Denovellis, 2020 on varying replay speed (similar results were obtained for inbound map, not shown here for simplicity). In our analysis, the fastest speed is up to 10 m/s, which is around 20× faster than its free running speed, representing approximately half a trackarm in a typical replay event, consistent with previous work (Lee and Wilson, 2002; Davidson et al., 2009; Karlsson and Frank, 2009; Nádasdy et al., 1999).
Secondorder inferences
As pointed out by van der Meer et al., 2020, there are two types of statistical questions: a ‘firstorder’ sequence question, which concerns whether an observed sequenceness is different from random (i.e. do replays exist?); and a ‘secondorder’ question, which requires a comparison of sequenceness across conditions (i.e. do replays differ?). Because it is embedded in a linear regression framework, TDLM is ideally placed to address such questions. There are two ways of asking such questions in linear modelling: contrasts and interactions. We explain them with examples here.
Linear contrasts
After fitting a regression model, resulting in coefficients for different regressors, we can test hypotheses about these coefficients by constructing linear combinations of the coefficients that would be zero under the null hypothesis. For example, if we want to test whether effect A is greater than effect B, then we can compute the linear contrast A – B (which would be zero under the null hypothesis) and perform statistics on this new measure. If we want to test whether replay increases linearly over five conditions [A, B, C, D, E], we can compute the linear contrast −2*A – B + 0*C + D + 2*E (which would be zero under the null hypothesis) and perform statistics on this new measure. Statistics (within or across animals) can operate with these contrasts in exactly the same way as with the original coefficients from the linear model. Here, we demonstrate this by showing in our example dataset that there was a greater preponderance for forward than backward replay. We construct the contrast (forwards – backwards) and test it against zero using a multiplecomparisoncontrolled permutation test (Figure 7d, right panel, pink line). By constructing a different contrast (forwards + backwards), we can also show that the total replay strength across both types of replays was significant (Figure 7d, right panel, green line).
Interactions
A second method for performing secondorder tests is to introduce them into the linear regression as interaction terms, and then perform inference on the regression weights for these interactions. This means changing Equation 2 to include new regressors. For example, if interested in how reactivations change over time, one could build new regressors ($Xtim{e}_{k}\left(t\right)$), obtained by elementwise multiplying the state regressor, e.g. ${X}_{k}\left(t\right)$ with time indices ($Xtim{e}_{k}\left(t\right)={X}_{k}\left(t\right).\ast \mathrm{t}\mathrm{i}\mathrm{m}\mathrm{e}$). Now the firstlevel GLM is constructed as (omitting residual term ε, same as Equation 2):
Example regressors in the design matrix can be seen in Figure 7e. The first regressor, ${X}_{k}\left(t\right)$, is one of the state reactivation regressors used in standard TDLM. The second regressor, $Xtim{e}_{k}\left(t\right)$, is the same as ${X}_{k}\left(t\right)$ multiplied by time. (There are k regressors of each form in regressor matrix.) Here, we chose to demean the time regressor before the interaction, so the early half of the regressor is negative and the late half is positive. This has no effect on the regression coefficients of the interaction term, but, by rendering the interaction approximately orthogonal to ${X}_{k}\left(t\right)$, it makes it possible to estimate the main effect and the interaction in the same regression.
Note that the interaction regressor is orthogonal to the state reactivation regressor, so it will have no effect on the firstorder regression terms. If we include such regressors for all states, then we can get two measures for each replay direction (sequence effect and time effect). The first tells us the average amount of replay throughout the sleep period (first order). The second tells us whether replay increases or decreases as time progresses through the sleep period (second order).
Orthogonal tests in regions of interest
When examining forward–backward replay above, we did separate inference for each replay speed, and then performed multiple comparison testing using the maxpermutation method (see section 'Statistical inference'). We now take the opportunity to introduce another method common in human literature.
To avoid such multiple comparison correction, it is possible to select a ‘region of interest’ (ROI), average the measure in question over that ROI, and perform inference on this average measure. Because we are now only testing one measure, there is no multiple comparison problem. Critical in this endeavour, however, is that we do not use the measure under test or anything that correlates with that measure as a means to define the ROI. This will induce a selection bias (Kriegeskorte et al., 2009). In the example in Figure 7f, we have used the average replay (forwards + backwards) to select the ROI. We are interested in speeds in which there is detectable replay on average across both directions and the whole sleep period (Figure 7d, right panel, green shaded area). If we select our ROI in this way, we cannot perform unbiased inference on firstorder forward or backward replay because forward and backward regressors correlate with their sum (Figure 7f, statistical inference in the red rectangle is biased). However, we can perform unbiased inference on several secondorder effects (Figure 7f, statistical inference in the green rectangle). We can test (forwards – backwards) assuming the difference of terms is orthogonal to their sum (as it is in this case). Further, we can test any interaction with time because the ROI is defined on the average over time and the interaction looks for differences as a function of time. When we perform these tests in our example dataset (Figure 7f, green rectangle), we confirm that there are more forward than backward replay on average. We further show that forward replay is decreasing with time during sleep, and that backward replay is increasing with time. Their difference (forwards – backwards) is also significant.
In addition to the timevarying effect, we can also test the spatial modulation effect, that is, how replay strength (at the same replay speed) changes as a function of its spatial content. For example, is replay stronger for transitions in the start of track compared to the end of the track? As an illustrative example, we have used the same ROI defined above and test the spatial modulation effect on forward replay. Note that this test of spatial modulation effect is also unbiased from the overall strength of forward replay, and thereby no selection bias in this ROI, as well.
For visualization purposes, we have first plotted the estimated strength for each pairwise forward sequence (Figure 8a), separately within each scale (from 1 to 4, with increasing spatial scales). The pairwise sequences are ordered from the start of the track to the end of the track. Alongside the pairwise sequence plot, we have plotted the mean replay strength over all possible pairwise transitions (in red) in comparison to the mean of all control transitions (in grey; as expected, they are all around 0). Note that we cannot perform inference on the difference between the red and grey bars here because they have been selected from a biased ROI. It is simply for illustration purposes. We have therefore put them in red squares to match Figure 7f.
To formally test the spatial modulation effect, we can use the exact same approach as outlined above in section 'Linear contrasts'. Here, we test a linear increase or decrease across different transitions. We take the linear contrast weight vector, $c$ ([2,1,0,1,2] for the largest scale, [3:3] for the next scale, [5:5] for the next scale, and [12:12] for the smallest scale), and multiply these by the beta estimates of the transitions:
If this new measure, $contrast$, is different from zero, then there is a linear increase/decrease from one end of the track to the other. Note that this new contrast is no longer biased by the ROI selection as each transition contributed equally to the ROI selection, but we are now comparing between transitions. Inference on this contrast is therefore valid. We have therefore put them in green boxes to match Figure 7f (Figure 8b, c).
Within the larger two scales, these contrasts are significantly negative (tested against permutations in exactly the same way as the ‘mean’ contrasts). Since we are still in the linear domain, we can now just average these contrasts across the four scales and get a single measure for spatial modulation of replay. This average measure is significantly negative (Figure 8b). Hence, on average, forward replay is stronger at the beginning of the track.
We can do the same thing for backward replay. We found an opposite pattern, that is, strength of backward replay is stronger at the end of the track, and similarly, it is not significant in the smallest scale and becomes significant in the largest scale, and also significant on average across all scales (Figure 8c). Again, since we are in the linear domain, we can further contrast these contrasts, asking if this effect is different for forward and backward replay. We found that the difference is indeed significant (Figure 8d). This set of results is consistent with previous rodent literature (Diba and Buzsáki, 2007). Note that we would like to stress again that this analysis is not about a single replay event but is testing for average differences across all replay events.
Notably, extra care needs to be exercised for secondorder questions (compared to firstorder ones). Problems can emerge due to biases in secondorder inference, such as in behavioural sampling (e.g. track 1 may be experienced more than track 2 during navigation; this creates a bias when evaluating replay in tack 1 vs. track 2 during rest). Such issues are real but can be finessed by experimental design considerations of a sort commonly applied in the human literature. For example:
Ensure that biases that might occur within subjects will not occur consistently in the same direction across subjects (e.g. by randomizing stimuli across participants).
Compare across conditions in each subject.
Perform a random effects inference across the population by comparing against the betweensubject variance.
Such approaches are not yet common in rodent electrophysiology and may not be practical in some instances. In such cases, it remains important to be vigilant to guard against these biases with TDLM as with other techniques. If these approaches are feasible, the machinery for computing secondorder inferences is straightforward in a linear framework like TDLM.
Generality of TDLM
We have now discussed the applicability of TDLM in relation to human MEG, as well as in rodent electrophysiology (with comparisons to standard replay detection methods). A preliminary attempt at detecting replay in human EEG is also shown in Appendix 4. We believe that this establishes TDLM as a domaingeneral sequence analysis method: TDLM works at the level of decoded state space, rather than the sensor/cell level of the data. It can be applied to a wide range of data types and settings in both humans and rodents, stimulating crossfertilization across disciplines. It is based on the GLM framework, and this lends it flexibility for regressing out potential confounds while offering an intuitive understanding of the overall approach.
In this section, we discuss the generality of TDLM.
States
TDLM assesses the statistical likelihood of certain transitions on a graph. In its original form, TDLM works on discrete states (i.e. nodes in the graph). Continuous spaces can be incorporated by chunking them into discrete spaces. Furthermore, by averaging the same replay speeds measured at multiple scales of discretization (see section ‘TDLM for rodent replay’), the statistical benefits of an assumption of a Euclidean geometry can be recovered.
Time length
The longer the time length, the more accurate the estimates in TDLM. This is because TDLM assesses sequence evidence based on a GLM framework, where time length is the sample size. Higher sample size will lead to more accurate estimates. In the case of rodent analysis, we recommend applying TDLM to aggregated replay events rather than to a single event because this results in (1) more time samples for estimation and (2) more activated states in the analysis time framework. Unlike other techniques which search for a single replay in a single event, this aggregation can be implemented without losing generality as TDLM is able to handle multiple sequences in the same data with respect to different directions, contents, or speeds. Furthermore, by aggregating linearly across all replay events of the same condition, it provides a natural measure for comparing replay strength, speed, and direction across different experimental conditions.
TDLM has already proved important in human experiments where complex state spaces have been used (Wimmer et al., 2020; Liu et al., 2019; Liu et al., 2021a; KurthNelson et al., 2016). We expect this generality will also be important as rodent replay experiments move beyond 1D tracks, for example, to foraging in 2D, or in complex mazes.
Discussion
TDLM is a domaingeneral analysis framework for capturing sequence regularity of neural representations. It is developed on human neuroimaging data and can be extended to other data sources, including rodent electrophysiology recordings. It offers hope for crossspecies investigations on replay (or neural sequences in general) and potentially enable studies of complex tasks in both humans and animals.
TDLM adds a new analysis toolkit to the replay field. It is especially suited for summarizing replay strength across many events, comparing replay strength between conditions, and analysing replay strength in complex behavioural paradigms. Its linear modelling nature makes it amenable to standard statistical tests and thereby allows wide use across tasks, modalities, and species. Unlike alternative tools, we have not shown TDLM applied to individual replay events.
The temporal dynamics of neural states have been studied previously with MEG (Vidaurre et al., 2017; Baker et al., 2014). Normally such states are defined by common physiological features (e.g. frequency, functional connectivity) during rest and termed resting state networks (e.g. default mode network [Raichle et al., 2001]). However, these approaches remain agnostic about the content of neural activity. The ability to study the temporal dynamics of representational content permits richer investigations into cognitive processes (Higgins et al., 2021) as neural states can be analysed in the context of their roles with respect to a range of cognitive tasks.
Reactivation of neural representations has also been studied previously (Tambini and Davachi, 2019) using approaches similar to the decoding step of TDLM or multivariate pattern analysis (MVPA) (Norman et al., 2006). This has proven fruitful in revealing mnemonic functions (Wimmer and Shohamy, 2012), understanding sleep (Lewis and Durrant, 2011), and decisionmaking (Schuck et al., 2016). However, classification alone does not reveal the rich temporal structures of reactivation dynamics. We have described the application of TDLM mostly during offtask state in this paper. The very same analysis can be applied to ontask data to test for cued sequential reactivation (Wimmer et al., 2020) or sequential decisionmaking (Eldar et al., 2020). For example, the ability to detect sequences ontask allows us to tease apart clustered from sequential reactivation, where this may be important for dissociating decision strategies (Eldar et al., 2018) and their individual differences (Wimmer et al., 2020; Eldar et al., 2020). TDLM, therefore, may allow testing of neural predictions from process models such as reinforcement learning during task performance (Dayan and Daw, 2008), which have proved hard to probe previously (Wimmer et al., 2020; Nour et al., 2021; Liu et al., 2019; Liu et al., 2021a).
In the human neuroimaging domain, we have mainly discussed the application of TDLM with respect to MEG data. In Appendix 4, we show that TDLM also works well with EEG data. This is not surprising given EEG and MEG are effectively measuring the same neural signature, namely local field potential (or associated magnetic field) on the scalp. We do not have suitable fMRI data to test TDLM. However, related work has suggested that it might be possible to measure sequential reactivation using fMRI (Schuck and Niv, 2019), but particular methodological caveats need to be considered (e.g. a bias from last events due to slow haemodynamic response) (Wittkuhn and Schuck, 2021). We believe that TDLM can deal with this, given it models out nonspecific transitions, although further work is needed. In future, we consider it will be useful to combine the high temporal resolution available in MEG/EEG and the spatial precision available in fMRI to probe regionspecific sequential computations.
In the rodent electrophysiology domain, we show what TDLM (its multiscale version) has to offer uniquely compared to existing rodent replay methods. Most importantly, TDLM works on an arbitrary graph and its generality makes replay studies in complex mazes possible. Its linear framework makes the assessment of timevarying effect on replay (Figure 7) or other secondorder sequence questions straightforward. In future work, a promising direction will be to further separate process noise (e.g. intrinsic variability within sequences) and measurement noise (e.g. noise in MEG recording). This might be achieved by building latent statespace models as have been explored recently in rodent community (Maboudi et al., 2018; Denovellis, 2020).
Together, we believe that TDLM opens doors for novel investigations of human cognition, including language, sequential planning, and inference in nonspatial cognitive tasks (Eldar et al., 2018; KurthNelson et al., 2016), as well as complicated tasks in rodents, for example, forging in 2D mazes. TDLM is particularly suited to test specific neural predictions from process models, such as reinforcement learning. We hope that TDLM can promote an acrossspecies synthesis between experimental and theoretical neuroscience and, in so doing, shed novel light on neural computation.
Materials and methods
Simulating MEG data
We simulate the data so as to be akin to human MEG.
Task data for obtaining state patterns
Request a detailed protocolWe generate ground truth multivariate patterns (over sensors) of states. We then add random Gaussian noise on the ground truth state patterns to form the task data. We train a logistic regression classifier on the task data so as to obtain a decoding model for each of the state patterns. Later we use this decoding model to transform the restingstate data from sensor space (with dimension of time by sensors) to the state space (with dimension of time by states).
Rest data for detecting sequences
Request a detailed protocolFirst, to imitate temporal autocorrelations and spatial correlations commonly seen in human neuroimaging data, we generate the rest data using an autoaggressive model with multivariate (over sensors) Gaussian noise and add a dependence among sensors. In some simulations, we also add a rhythmic oscillation (e.g. 10 Hz).
Second, we inject a sequence of state patterns in the rest data. The sequences follow the ground truth of state transitions of interest. The statetostate time lag is assumed to follow a gamma distribution. We vary the number of sequences to be injected in the rest data to control the strength of sequences.
Lastly, we project the rest data to the decoding model of states obtained from the task data. TDLM will then work on the decoded state space.
An example of the MATLAB implementation is called ‘Simulate_Replay’ from the Github link: https://github.com/yunzheliu/TDLM (copy archived at swh:1:rev:015c0e90a14d3786e071345760b97141700d6c85), Liu, 2021b.
Human replay dataset
Task design
Request a detailed protocolParticipants were required to perform a series of tasks with concurrent MEG scanning (see details in Liu et al., 2019). The functional localizer task was performed before the main task and was used to train a sensory code for eight distinct objects. Note that the participants were provided with no structural information at the time of the localizer. These decoding models, trained on the functional localizer task, capture a sensorylevel neural representation of stimuli (i.e. stimulus code). Following that, participants were presented with the stimuli and were required to unscramble the ‘visual sequence’ into a correct order, that is, the ‘unscrambled sequence’ based on a structural template they had learned the day before. After that, participants were given a rest for 5 min. At the end, stimuli were presented again in random order, and participants were asked to identify the true sequence identity and structural position of the stimuli. Data in this session are used to train a structural code (position and sequence) for the objects.
MEG data acquisition, preprocessing, and source reconstruction
Request a detailed protocolWe follow the same procedure that has been reported in Liu et al., 2019. We have copied it here for references.
'MEG was recorded continuously at 600 samples/s using a wholehead 275channel axial gradiometer system (CTF Omega, VSM MedTech), while participants sat upright inside the scanner. Participants made responses on a button box using four fingers as they found most comfortable. The data were resampled from 600 to 100 Hz to conserve processing time and improve signaltonoise ratio. All data were then highpassfiltered at 0.5 Hz using a firstorder IIR filter to remove slow drift. After that, the raw MEG data were visually inspected, and excessively noisy segments and sensors were removed before independent component analysis (ICA). An ICA (FastICA, http://research.ics.aalto.fi/ica/fastica) was used to decompose the sensor data for each session into 150 temporally independent components and associated sensor topographies. Artefact components were classified by combined inspection of the spatial topography, time course, kurtosis of the time course, and frequency spectrum for all components. Eyeblink artefacts exhibited high kurtosis (>20), a repeated pattern in the time course and consistent spatial topographies. Mains interference had extremely low kurtosis and a frequency spectrum dominated by 50 Hz line noise. Artefacts were then rejected by subtracting them out of the data. All subsequent analyses were performed directly on the filtered, cleaned MEG signal, in units of femtotesla.
All source reconstruction was performed in SPM12 and FieldTrip. Forward models were generated on the basis of a single shell using superposition of basis functions that approximately corresponded to the plane tangential to the MEG sensor array. LCMV beamforming (Van Veen et al., 1997) was used to reconstruct the epoched MEG data to a grid in MNI space, sampled with a grid step of 5 mm. The sensor covariance matrix for beamforming was estimated using data in either broadband power across all frequencies or restricted to ripple frequency (120–150 Hz). The baseline activity was the mean neural activity averaged over −100 ms to −50 ms relative to sequence onset. All nonartefactual trials were baseline corrected at source level. We looked at the main effect of the initialization of sequence. Nonparametric permutation tests were performed on the volume of interest to compute the multiple comparison (wholebrain corrected) pvalues of clusters above 10 voxels, with the null distribution for this cluster size being computed using permutations (n = 5000 permutations)'.
Rodent replay dataset
Data description
Request a detailed protocolThis data is from Ólafsdóttir et al., 2016. We analysed one full recording session (track running for generating rate map, postrunning resting for replay detection) from Rat 2192.
Task description
Request a detailed protocolIn Ólafsdóttir et al., 2016, rats ran multiple laps on a Z maze and were then placed in a rest enclosure. The two parallel sections of the Z (190 cm each) were connected by a diagonal section (220 cm). Animals were pretrained to run on the track. At the recording session, rats were placed at one end of the Ztrack. The ends and corners of the track were baited with sweetened rice to encourage running from one end to the other. In each session, rats completed 20 full laps (30–45 min). Following the track session, rats were placed in the rest enclosure for 1.5 hr.
Preprocessing
Request a detailed protocolFollowing Ólafsdóttir et al., 2016, when generating rate maps we excluded data from both the ends and corners because the animals regularly performed nonperambulatory behaviours there. Periods when running speed was less than 3 cm/s were also excluded. Running trajectories were then linearized, dwell time and spikes were binned into 2 cm bins and smoothed with a Gaussian kernel (σ = 5 bins). We generated rate maps separately for inbound (track 1 > track 2 > track 3) and outbound (track 3 > track 2 > track 1) running.
As in Ólafsdóttir et al., 2016, cells recorded in CA1 were classified as place cells if their peak firing field during track running was above 1 Hz and at least 20 cm wide. The candidate replay events were identified based on MU activity from place cells during rest time. Only periods exceeding the mean rate by three standard deviations of MU activity were identified as putative replay events. Events less than 40 ms long or which included activity from less than 15% of the recorded place cell ensemble were rejected.
We analysed data from one full recording session (track running for generating rate map, postrunning resting for replay detection) of Rat 2192 reported in Ólafsdóttir et al., 2016. Following the procedure described above, we have identified 58 place cells and 1183 putative replay events. Replay analysis was then performed on the putative replay events, separately for inbound and outbound rate maps.
Code availability
Request a detailed protocolSource code of TDLM can be found at https://github.com/yunzheliu/TDLM.
Data availability
Request a detailed protocolData are also available at https://github.com/yunzheliu/TDLM.
Appendix 1
Multistep sequences
TDLM can be used iteratively. One extension of TDLM of particular interest is: multistep sequences. It asks about a consistent regularity among multiple states.
So far, we introduced methods for quantifying the extent to which the statetostate transition structure in neural data matches a hypothesized taskrelated transition matrix. An important limitation of these methods is that they are blind to hysteresis in transitions. In other words, they cannot tell us about multistep sequences. In this section, we describe a methodological extension to measure evidence for sequences comprising more than one transition: for example, $A\to B\to C$.
The key ingredient is controlling for shorter subsequences (e.g. $A\to B$ and $B\to C$) in order to find evidence unique to a multistep sequence of interest.
Assuming constant statetostate time lag, $\mathrm{\Delta}t$, between A and B, and between B and C. We can create a new state space AB by shifting B up $A\to B$, and elementwise multiply it with state A. This new state AB measures the reactivation strength of $A\to B$, with time lag $\mathrm{\Delta}t$. In the same way, we can create a new state space, BC, AC, etc. Then we can construct the same firstlevel GLM on the new state space. For example, if we want to determine the evidence of $A\to B\to C$ at time lag $\mathrm{\Delta}t$, we can regress AB onto state time course C at each $\mathrm{\Delta}t$ (Equation 1). But we want to know the unique contribution of AB to C. More specifically, we want to test if the evidence of $A\to B\to C$ is stronger than $X\to B\to C$, where X is any other state but not A. Therefore, similar to Equation 2, we need to control CB, DB, when looking for evidence of AB of C. Applying this method, we show that TDLM successfully avoids false positives arising out of strong evidence for shorter length (see simulation results in Appendix 1—figure 1a, and see results obtained on human neuroimaging data in Appendix 1—figure 1b). This process can be generalized to any number of steps.
TDLM, in its current form, assumes a constant intrasequence statetostate time lag. If there is a variability between state transitions TDLM can still cope, but not very elegantly. Assume there is a threestate sequence, $A\to B\to C$, with intrasequence variance. TDLM will need to test all possible combinations of statetostate time lags in $A\to B$ and $B\to C$. If there are $n$ number of time lag of interest in either of the two transitions, TDLM will then have to test $n}^{2$ possible time lag combinations. This is a large search space and one that increases exponentially as a function of the length of a sequence.
We note that this analysis is different from a typical rodent replay analysis which assesses the overall evidence for a sequence length (Davidson et al., 2009; Grosmark and Buzsáki, 2016). TDLM asks if there is more evidence for A > B > C, above and beyond evidence for B > C, for example. If the main question of interest is ‘do we have evidence of A > B > C in general’, as normally is the case in the rodent replay analysis (Davidson et al., 2009; Grosmark and Buzsáki, 2016), we should not control for shorter lengths. Instead, we can simply average the evidence together, as implemented in KurthNelson et al., 2016.
Appendix 2
Pseudocode of sensory code and abstract code crossvalidations
Appendix 3
Sequences of sequences
We have detailed the use of either sensory or abstract representations as the states in TDLM. We now take a step further and use sequences themselves as states. Using this kind of hierarchical analysis, we can search for sequences of sequences. This is useful because it can reveal temporal structure not only within sequence, but also between sequences. The organization between sequences is of particular interest for revealing neural computations. For example, the forward and backward search algorithms hypothesized in planning and inference (Penny et al., 2013) can be cast as sequences of sequences problem: the temporal structure of forward and backward sequence. This can be tested by using TDLM iteratively.
To look for sequences between sequences, we need first to define sequences as new states. To do so, the raw state course, for example, state B, needs to be shifted up by the empirical withinsequence time lag $\mathrm{\Delta}t$ (determined by the twolevel GLM), to align with the onset of state A, if assuming sequence $A\to B$ exist (at time lag $\mathrm{\Delta}t$). Then, we can elementwise multiply the raw state time course A with the shifted time course B, resulting in a new state AB. Each entry in this new state time course indicates the reactivation strength of sequence AB at a given time.
The general twolevel GLMs framework still applies, but now with one important caveat. The new sequence state (e.g. AB) is defined based on the original states (A and B), and where we are now interested in a reactivation regularity, that is, sequence, between sequences, rather than the original states. We need therefore to control for the effects of the original states. Effectively, this is like controlling for main effects (e.g. state A and shifted state B) when looking for their interaction (sequence AB). TDLM achieves this by including timelagged original state regressors A, B, in addition to AB, in the firstlevel GLM sequence analysis.
Specifically, let us assume that the sequence state matrix is $X}_{seq$, after transforming the original state space to sequence space based on the empirical withinsequence time lag $\mathrm{\Delta}{t}_{w}$. Each column at $X}_{seq$ is sequence state, denoted by $S}_{ij$, which indicates the strength of sequence i > j reactivation. The raw state i is $X}_{i$, and the shifted raw state j is $X}_{jw$ (by time lag $\mathrm{\Delta}{t}_{w}$).
In the first level GLM, TDLM ask for the strength of a unique contribution of sequence state $S}_{ij$ to $S}_{mn$ while controlling for original states ($X}_{i$ and $X}_{jw$). For each sequence state $ij$, at each possible time lag $\mathrm{\Delta}t$, TDLM estimated a separate linear model:
Repeat this process for each sequence state separately at each time lag, resulting in a sequence matrix $\beta}_{seq$.
At the secondlevel GLM, TDLM asks how strong the evidence for a sequence of interest is compared to sequences that have the same starting state or end state at each time lag. This secondlevel GLM will be the same as Equation 5, but with additional regressors to control for sequences that share the same start or end state. In simulation, we demonstrate, applying this method, that TDLM can uncover hierarchical temporal structure: state A is temporally leading state B with 40 ms lag, and the sequence A > B tends to repeat itself with a 140 ms gap (Appendix 3—figure 1a). One interesting application of this is to look for theta sequence (Mehta et al., 2002; McNaughton et al., 2006; Buzsáki and Moser, 2013). One can think of theta sequence, a welldocumented phenomenon during rodent spatial navigation, as a neural sequence repeating itself in theta frequency (6–12 Hz).
In addition to looking for temporal structure of the same sequence, the method is equally suitable when searching for temporal relationships between different sequences. For example, assuming two different types of sequences, one sequence type has a withinsequence time lag at 40 ms; while the other has a withinsequence time lag at 150 ms (Appendix 3—figure 1b, left and middle panel); and there is a gap of 200 ms between the two types of sequences (Appendix 3—figure 1b, right panel). These time lags are set arbitrarily for illustration purposes. TDLM can accurately capture the dynamics both within and between the sequences, supporting a potential for uncovering temporal relationships between sequences under the same framework.
Appendix 4
Apply TDLM to human wholebrain EEG data
An autocorrelation is commonplace in neuroimaging data, including EEG and fMRI. TDLM is designed to specifically take care of this confound, and, on this basis, we should be able to work with EEG and fMRI data. We do not have suitable fMRI data available to test TDLM but are interested to investigate this in more depth in our future work. We had collected EEG data from one participant to test whether TDLM would *just* work.
The task was designed to examine ontask sequential replay in decisionmaking by Dr. Toby Wise. This is a ‘Tmaze’ like task, where a participant needs to choose a left or right path based on the value received at the end of the path. We could decode seven objects well on the wholebrain EEG data using just raw amplitude feature (same with our MEGbased analysis) and could detect fast backward sequenceness (peaked at 30 ms time lag) during choice/planning time (Appendix 4—figure 1), similar to our previous MEG findings (KurthNelson et al., 2016). As this result is from one subject, we are cautious about making an excessive claim, but nevertheless we believe that the data show the TDLM approach is highly promising for EEG data.
Appendix 5
Less sensitivity of TDLM to skipping sequences
In a linear track where replays only go in a single direction, it is possible that TDLM is less sensitive compared to the linear correlation or the Radon method, given the latter assumes a parametric relationship between space and time. For example, if only the first and last states are activated, but not the intermediate states, the existing methods will report replay, but TDLM will not, because in existing methods space and time are parametric quantities (Appendix 5—figure 1). In contrast, TDLM only knows about transitions on a graph.
Data availability
No new data is used or generated in the current paper. Data relevant for current paper is available at https://github.com/YunzheLiu/TDLM (copy archived at https://archive.softwareheritage.org/swh:1:rev:015c0e90a14d3786e071345760b97141700d6c85). This dataset is from Ólafsdóttir et al., 2016.
References

Online evaluation of novel choices by simultaneous representation of multiple memoriesNature Neuroscience 16:1492–1498.https://doi.org/10.1038/nn.3515

Repetition suppression: a means to index neural representations using BOLD?Philosophical Transactions of the Royal Society B: Biological Sciences 371:20150355.https://doi.org/10.1098/rstb.2015.0355

Crossspecies neuroscience: closing the explanatory gapPhilosophical Transactions of the Royal Society B: Biological Sciences 376:20190633.https://doi.org/10.1098/rstb.2019.0633

Cellular bases of hippocampal EEG in the behaving ratBrain Research Reviews 6:139–171.https://doi.org/10.1016/01650173(83)900371

Memory, navigation and theta rhythm in the hippocampalentorhinal systemNature Neuroscience 16:130–138.https://doi.org/10.1038/nn.3304

Decision theory, reinforcement learning, and the brainCognitive, Affective, & Behavioral Neuroscience 8:429–453.https://doi.org/10.3758/CABN.8.4.429

Auto‐regressive model for nonstationary stochastic processesJournal of Engineering Mechanics 114:1995–2012.https://doi.org/10.1061/(ASCE)07339399(1988)114:11(1995)

Forward and reverse hippocampal placecell sequences during ripplesNature Neuroscience 10:1241–1242.https://doi.org/10.1038/nn1961

Granger causality and path diagrams for multivariate time seriesJournal of Econometrics 137:334–353.https://doi.org/10.1016/j.jeconom.2005.06.032

Replay comes of ageAnnual Review of Neuroscience 40:581–602.https://doi.org/10.1146/annurevneuro072116031538

Decoding neural representational spaces using multivariate pattern analysisAnnual Review of Neuroscience 37:435–456.https://doi.org/10.1146/annurevneuro062012170325

ThesisUncovering temporal structure in neural data with statistical machine learning modelsUniversity of Oxford.

Awake replay of remote experiences in the HippocampusNature Neuroscience 12:913–918.https://doi.org/10.1038/nn.2344

Representational similarity analysis  connecting the branches of systems neuroscienceFrontiers in Systems Neuroscience 2:4.https://doi.org/10.3389/neuro.06.004.2008

Circular analysis in systems neuroscience: the dangers of double dippingNature Neuroscience 12:535–540.https://doi.org/10.1038/nn.2303

Overlapping memory replay during sleep builds cognitive schemataTrends in Cognitive Sciences 15:343–351.https://doi.org/10.1016/j.tics.2011.06.004

Viewpoints: how the Hippocampus contributes to memory, navigation and cognitionNature Neuroscience 20:1434–1447.https://doi.org/10.1038/nn.4661

Path integration and the neural basis of the 'cognitive map'Nature Reviews Neuroscience 7:663–678.https://doi.org/10.1038/nrn1932

Replay and time compression of recurring spike sequences in the HippocampusThe Journal of Neuroscience 19:9497–9507.https://doi.org/10.1523/JNEUROSCI.192109497.1999

Beyond mindreading: multivoxel pattern analysis of fMRI dataTrends in Cognitive Sciences 10:424–430.https://doi.org/10.1016/j.tics.2006.07.005

Coordinated grid and place cell replay during restNature Neuroscience 19:792–794.https://doi.org/10.1038/nn.4291

The role of hippocampal replay in memory and PlanningCurrent Biology 28:R37–R50.https://doi.org/10.1016/j.cub.2017.10.073

Forward and backward inference in spatial cognitionPLOS Computational Biology 9:e1003383.https://doi.org/10.1371/journal.pcbi.1003383

Neural representations of events arise from temporal community structureNature Neuroscience 16:486–492.https://doi.org/10.1038/nn.3331

Awake reactivation of prior experiences consolidates memories and biases cognitionTrends in Cognitive Sciences 23:876–890.https://doi.org/10.1016/j.tics.2019.07.008

On the methods for reactivation and replay analysisPhilosophical Transactions of the Royal Society B: Biological Sciences 375:20190231.https://doi.org/10.1098/rstb.2019.0231

ThesisThe Radon Transform: Theory and ImplementationTechnical University of Denmark.

Progress and issues in secondorder analysis of hippocampal replayPhilosophical Transactions of the Royal Society B: Biological Sciences 375:20190238.https://doi.org/10.1098/rstb.2019.0238

Localization of brain electrical activity via linearly constrained minimum variance spatial filteringIEEE Transactions on Biomedical Engineering 44:867–880.https://doi.org/10.1109/10.623056

ConferenceAdvances in neural information processing systemsConference on Neural Information Processing Systems. pp. 1473–1480.

Episodic memory retrieval success is associated with rapid replay of episode contentNature Neuroscience 23:1025–1033.https://doi.org/10.1038/s415930200649z
Decision letter

Laura L ColginSenior Editor; University of Texas at Austin, United States

Caleb KemereReviewing Editor; Rice University, United States

Matthijs van der MeerReviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This paper represents a valuable new tool for detecting and quantifying sequential structure in neural activity. Spontaneously generated sequences (a.k.a. "replay") is thought to be an important way for the brain to make efficient use of experience, facilitating learning and consolidation of information. Replay has been studied in the rodent hippocampus for decades, but it has recently become possible to detect such activity in human MEG (and perhaps even fMRI) data, generating much current excitement and promise in bringing together these fields. The approach of this work enables investigators to assess the overall level of replay present in a dataset (rather than locating individual events). In particular, a strength of the general linear modeling framework is that both the primary hypothesis "is replay prevalent in this data set?" and secondary hypotheses (e.g., "is there more of one type of replay than another?", "does replay strength change over time?") can be assessed.
Decision letter after peer review:
[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]
Thank you for submitting your work entitled "Measuring Sequences of Representations with Temporally Delayed Linear Modelling" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Matthijs van der Meer (Reviewer #2), and Caleb Kemere (Reviewer #3).
Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that we have decided to reject the paper in its current form.
The reviewers all felt that the work is extremely valuable; a domaingeneral replay detection method would be of wide interest and utility. However, we felt that the paper was lacking context and comparisons to existing methods. Most critically, the paper would be more impactful if comparisons with standard replay methods were included, and the reviewers felt that would be too substantial a change to ask for as a revision. There were also concerns about lack of detail in the description of the methods and data that diminished enthusiasm. The authors would be welcome to make changes along these lines and submit the paper again as a new submission.
Reviewer #1:
This paper describes temporal delayed linear modelling (TDLM), a method for detecting sequential replay during awake rest periods in human neuroimaging data. The method involves first training a classifier to decode states from labeled data, then building linear models that quantify the extent to which one state predicts the next expected state at particular lags, and finally assessing reliability by running the analysis with permuted labels.
This method has already been fruitfully used in prior empirical papers by the authors, and this paper serves to present the details of the method and code such that others may make use of it. Based on existing findings, the method seems extremely promising, with potential for widespread interest and adoption in the human neuroimaging community. The paper would benefit, however, from more discussion of the scope of the applicability of the method and its relationship to methods already available in the rodent and (to a lesser extent) human literature.
1. TDLM is presented as a general tool for detecting replay, with special utility for noninvasive human neuroimaging modalities. The method is tested mainly on MEG data, with one additional demonstration in rodent electrophysiology. Should researchers expect to be able to apply the method directly to EEG or fMRI data? If not, what considerations or modifications would be involved?
2. How does the method relate to the state of the art methods for detecting replay in electrophysiology data? What precludes using those methods in MEG data or other noninvasive modalities? And conversely, do the authors believe animal replay researchers would benefit from adopting the proposed method?
3. It would be useful for the authors to comment on the applicability of the method to sleep data, especially as rodent replay decoding methods are routinely used during both awake rest and sleep.
4. How does the method relate to the Wittkuhn and Schuck fMRI replay detection method? What might be the advantages and disadvantages of each?
5. The authors make the point that spatial correlation as well as anticorrelation between state patterns reduces the ability to detect sequences. The x axis for Figure 3c begins at zero, demonstrating that lower positive correlation is better than higher positive correlation. Given the common practice of building one classifier to decode multiple states (as opposed to a separate classifier for each state), it would be very useful to provide a demonstration that the relationship in Figure 3c flips (more correlation is better for sequenceness) when spatial correlations are in the negative range.
6. In the Results, the authors specify using a single time point for spatial patterns, which would seem to be a potentially very noisy estimate. In the Methods, they explain that the data were downsampled from 600 to 100 Hz to improve SNR. It seems likely that downsampling or some other method of increasing SNR will be important for the use of single time point estimates. It would be useful for the authors to comment on this and provide recommendations in the Results section.
7. While the demonstration that the method works for detecting theta sequences in navigating rodents is very useful, the paper is missing the more basic demonstration that it works for simple replay during awake rest in rodents. This would be important to include to the extent that the authors believe the method will be of use in comparing replay between species.
8. The authors explain that they "had one condition where we measured resting activity before the subjects saw any stimuli. Therefore, by definition these stimuli could not replay, but we can use the classifiers from these stimuli (measured later) to test the false positive performance of statistical tests on replay." My understanding of the rodent preplay literature is that you might indeed expect meaningful "replay" prior to stimulus exposure, as existing sequential dynamics may be coopted to represent subsequent stimulus sequences. It may therefore be tricky to assume no sequenceness prior to stimulus exposure.
Reviewer #2:
This paper addresses the important overall issue of how to detect and quantify sequential structure in neural activity. Such sequences have been studied in the rodent hippocampus for decades, but it has recently become possible to detect them in human MEG (and perhaps even fMRI) data, generating much current excitement and promise in bringing together these fields.
In this paper, the authors examine and develop in more detail the method previously published in their groundbreaking MEG paper (Liu et al. 2019). The authors demonstrate that by aiming their method at the level of decoded neural data (rather than the sensorlevel data) it can be applied to a wide range of data types and settings, such as rodent ephys data, stimulating crossfertilization. This generality is a strength and distinguishes this work from the typically ad hoc (studyspecific) methods that are the norm; this paper could be a first step towards a more domaingeneral sequence detection method. A further strength is that the general linear modeling framework lends itself well to regressing out potential confounds such as autocorrelations, as the authors show.
However, enthusiasm for the paper is limited by several overall issues:
1. It seems a major claim is that the current method is somehow superior to other methods (e.g. from the abstract: "designed to take care of confounds" implying that other methods do not do this, and "maximize sequence detection ability" implying that other methods are less effective at detection). But there is very little actual comparison with other methods made to substantiate this claim, particularly for sequences of more than two states which have been extensively used in the rodent replay literature (see Tingley and Peyrache, Proc Royal Soc B 2020 for a recent review of the rodent methods; different shuffling procedures are applied to identify sequenceness, see e.g. Farooq et al. Neuron 2019 and Foster, Ann Rev Neurosci 2017). The authors should compare their method to some others in order to support these claims, or at a minimum discuss how their method relates to/improves upon the state of the art.
2. The scope or generality of the proposed method should be made more explicit in a number of ways. First, it seems the major example is from MEG data with a small number of discrete states; how does the method handle continuous variables and larger state spaces? (The rodent ephys example could potentially address this but not enough detail was provided to understand what was done; see specific comments below.) Second, it appears this method describes sequenceness for a large chunk of data, but cannot tell whether an individual event (such as a hippocampal sharp waveripple and associated spiking) forms a sequence not. Third, there is some inconsistency in the terminology regarding scope: are the authors aiming to detect any kind of temporal structure in neural activity (first sentence of "Overview of TDLM" section) which would include oscillations, or only sequences? These are not fatal issues but should be clearly delineated.
3. The inference part of the work is potentially very valuable because this is an area that has been well studied in GLM/multiple regression type problems. However, the authors limit themselves to asking "firstorder" sequence questions (i.e. whether observed sequenceness is different from random) when key questions – including whether or not there is evidence of replay – are actually "secondorder" questions because they require a comparison of sequenceness across two conditions (e.g. pretask and posttask; I'm borrowing this terminology from van der Meer et al. Proc Royal Soc B 2020). The authors should address how to make this kind of comparison using their method.
Reviewer #3:
The methods used by the authors seem like potentially really useful tools for research on neural activity related to sequences of stimuli. We were excited to see that a new toolbox might be available for these sorts of problems, which are widespread. The authors touch on a number of interesting scenarios and raise relevant issues related to crossvalidation and inference of statistical significance. However, given (1) the paucity of code that they've posted, and its specificity to specific exact data and (2) the large literature on latent variable models combined with surrogate data for significance testing, I would hesitate to call TDLM a "framework". Moreover, in trying to present it in this generic way, the authors have muddled the paper, making it difficult to understand exactly what they are doing.
Overall: This paper presents a novel approach for detecting sequential patterns in neural data however it needs more context. What's the contribution overall? How and why is this analysis technique better than say Bayesian template matching? Why is it so difficult to understand the details of the method?
The first and most important problem with this paper is that it is intended (it appears) to be a more detailed and enhanced retelling of the author's 2019 Cell paper. If this is the case, then it's important that it also be clearer and easier to read and understand than that one was. The authors should follow the normal tradition in computational papers:
1. Present a clear and thorough explanation of one use of the method (i.e., MEG observations with discrete stimuli), then present the next approach (i.e., sequences?) with all the details necessary to understand it.
2. The authors should start each section with a mathematical explanation of the X's – the equation(s) that describes how they are derived from specific data. Much of the discussion of cross validation actually refers to this mapping.
3. Equation 5 also needs a clearer explanation – it would be better to write it as a sum of matrices (because that is clearer) than with the strange "vec" notation. And TAUTO, TF and TR should be described properly – TAUTO is "the identity matrix", TF and TR are "shift matrices, with ones on the first upper and lower off diagonals".
4. The cross validation schemes need a clear description. Preferably using something like a LaTeX "algorithm" box so that they are precisely explained.
Recognizing the need to balance readability for a general reader and interest, perhaps the details could be given for the first few problems, and then for subsequent results, the detail could go into a Methods section. Alternatively, the methods section could be done away with (though some things, such as the MEG data acquisition methods are reasonably in the methods).
Usually, we think about latent variable model problems from a generative perspective. The approach taken in this paper seems to be similar to a Kalman filter with a multinomial observation (which would be equivalent to the logistic regression?), but it's unclear. Making the connection to the extensive literature on dynamical latent variable models would be helpful.
[Editors’ note: further revisions were suggested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Temporally delayed linear modelling (TDLM) measures replay in both animals and humans" for further consideration by eLife. Your revised article has been reviewed by 3 reviewers and the evaluation has been overseen by Laura Colgin as the Senior Editor, and a Reviewing Editor.
While impressed by the novelty and potential utility of the method, the reviewers had a specific critical concern. Namely, do high sequenceness scores truly capture activation of patterns that widely span the specified sequence space (i.e., many complete "ABCDE" sequences) or only a collection of pairwise correlations (i.e., "AB", "DE", "BC")? Presenting more examples from experimental data that demonstrate the former was perceived as critical to demonstrate the utility of TDLM as an approach for "replay detection". Ultimately, the reviewers reached a consensus that in the current presentation, what TLDM actually detects remains opaque, and the impact of the work is diminished. We considered requesting an action plan, but upon reflection, I think that the main issue is one of semantics. However, if you wish to describe TDLM as a method for detecting "replay", it needs to be critically addressed.
In addition to the reviews below, in our discussion, one reviewer noted: "even though they now explain in more detail what they did to analyze theta sequences, the result (~12 Hz) is still seemingly at odds with the ~89 Hz repetition one would expect from the literature. I'm actually not sure this adds a whole lot to the paper so I think it would be better to just take this part out."
Reviewer #1:
Overall, I find great value in the effort to provide researchers working with very different animal models and datasets a similar toolkit to apply and analyze reactivation and replay. But I also have significant concerns about the potential for these methods, if poorly understood and applied, to further confound the field. Fully understanding this paper and the described methods and its caveats is not easy for most experimentalists, yours truly included. I am concerned that investigators will misapply these tools and claim significant replay in instances where there is none. These concerns may be addressable by better diagnostics and related guidance for interpretation.
Nevertheless, an important caveat in the work is that it does not detect "replay" per se, but rather temporal biases in decoded activity. Thus I think the title should be amended. In some places, the authors describe this as "sequenceness", or "temporal delays" which are both preferable to "replay". Prior work (e.g. Skaggs and McNaughton) used a similar measure, but referred to it as "temporal bias". While this temporal bias is certainly related to replay and sequences, it is only an indirect measure of these, and more akin to "reactivation", as it's largely pairwise analyses. Clarity on such issues is particularly important in this area given the excessive ambiguity in terminology for the replay and reactivation phenomena.
My other major concern is that the analysis is rather opaque in an area where there is much need for transparency, especially considering the existing debates and controversy surrounding appropriate methodology and conclusions that can be drawn from it. For example, in most of the figures that the authors provide, it's unclear whether the sequenceness score is driven by one particular pair of stimuli, or equally so among most possible pairs. Perhaps a transition graph could be composed from the data, but I did not find this except in cartoon diagrams. I think it would be important for the authors to provide guidance regarding these details. A related question is whether these biased pairs typically appear in isolation, or as part of larger multiitem events? It's not clear if there is a straightforward way to extract this important information. Some sample events could be shown, but examples can also be cherrypicked and nonrepresentative. Probably a histogram of sequence bout lengths would be valuable.
Part of the claimed value in these methods is their possible application to spike trains, e.g. from rodents, particularly for detecting and describing individual events. The authors claim that this is possible. However, while they analyze two rodent datasets in different parts, they do not apply it to any real data, but only on rather contrived (low noise) simulated data. This raises the concern that TDLM is not sufficiently sensitive for detecting individual events. The theta sequence analysis results shown in Supplementary Figure 3d are also concerning to me. They show a very noisy signal that passes threshold in one particular bin. If such a figure were a major result in a new manuscript, and the current eLife manuscript was cited as validation for the methods, would reviewers be obliged to accept it for publication? If not, what additional criterion or diagnostic steps would be recommended?
Comments for the authors:
P2, Line 32: "and" seems misplaced.
P2, Line 34: "periods of activity".
P3, Line 4: perhaps "single neuron" rather than "cellular".
P4, Lines 3435: it is not clear here what the authors mean by "over and above variance that can be explained by other states at the same time." It gets more clear later in page 11, section "Moving to multiple linear regression". The authors might consider either referring to that section at the end of the sentence or removing the unnecessary details that might confuse the reader at this juncture.
P5, Line 31: This sentence is a bit confusing. It is not clear what "in which" refers to. It might be better to start a new sentence for describing Zf and Zb.
P6, Lines 78: The authors might refer to the section "Correcting for multiple comparisons" on page 16, where more details are provided.
P8, Lines 4246: the description of abstraction may benefit from additional background and exposition.
P9, Line 1824, I found this entire paragraph very confusing.
P10, Line 7: "that share the same abstract representation" is redundant.
P10, Line 12: "tested" should be corrected to test it.
P10, Lines 2324: Confusing sentence with multiple "whiches".
P12, Line 24: Is it possible for the autocorrelation structure of Xi(t) to generate false positives for Xj(t+dt), or would this necessarily be entirely accounted for by Xj(t)?
P13, Lines 2324: How the regularization (penalizing for the norm of W) is supposed to make the model "trained on stimulusevoked data" generalize better to offtask/rest data? The authors might add a sentence or two at the end of the paragraph to make the aim more clear and motivate the next paragraphs in the section. Based on the descriptions in the first paragraph of Page 14, the regularization seems to add more sparsity to the estimated W that minimizes spatial correlations between the states, etc. Something similar to these descriptions could be used here.
P13, Line 28: It is not exactly clear what authors mean by "the prior term"? Does it refer to the equation before adding any regularization or to some prior probability distribution over W? I think in this context, we should be cautious with using the words like prior, posterior, etc.
P16, Line 30, extra "as".
P17, section "Considerations on secondorder inferences". It seems that this section should be placed later in the manuscript, in the section "TDLM FOR RODENT REPLAY".
P23 Line 39, missing "that".
Supplementary Note 5: what shuffle was used?
Reviewer #2:
This paper addresses the important overall issue of how to detect and quantify sequential structure in neural activity. Spontaneously generated sequences (a.k.a. "replay") is thought to be an important way for the brain to make efficient use of experience, facilitating learning and consolidation of information. Replay has been studied in the rodent hippocampus for decades, but it has recently become possible to detect such activity in human MEG (and perhaps even fMRI) data, generating much current excitement and promise in bringing together these fields.
However, comparison and crossfertilization between different replay studies – even within the same species, let alone across species – has been hampered by a fragmented landscape of different analysis methods, which are often adhoc and taskspecific. In this study, the authors develop and extend the method previously published in their groundbreaking MEG paper (Liu et al. 2019), notably demonstrating that by aiming their method at the level of decoded neural data (rather than the sensorlevel data) it can be applied to a wide range of data types and settings, including human MEG, EEG, and rodent ephys data. A further strength is that the general linear modeling framework lends itself well to regressing out potential confounds and formally testing secondorder questions such as whether tthere is more forward vs. reverse replay, or if replay strength changes with time.
In this revised submission, the authors have made several major additions to the manuscript, including a substantive analysis of rodent replay data, and associated multiscale extension of the method to make it suitable for continuous state spaces. This addition is an important component of the paper, in that it demonstrates the generality of a framework that can be applied to different kinds of data and task state spaces across species. Another much appreciated change is the expanded and much more clear explanations throughout, such as those of the TDLM method itself and of the data analysis pipelines for the various kinds of data. With these additions, I think the paper really makes good on the promise of a domaingeneral method for the detection of sequences in spontaneous neural activity, and provides a timely impetus to the study of replay across tasks and species.
There is one important area that the paper still falls short on, but that should be straightforward to address with some simple additional analysis and discussion. A distinctive strength of the GLM framework is that it easily accommodates testing important and ubiquitous "secondorder" questions, such as whether there is more forward than reverse replay, is replay strength changing over time, and so on. However, the message in the current version that one could simply subtract sequenceness scores doesn't address how one would formally test for a difference, or test for some factor like time being important. For forward vs. reverse, because this is fit to the same data, this is a comparison between different betas in the secondlevel GLM (Figure 1f). I am not a statistician, but my textbooks say there are a few ways of doing this, for instance z = (β_{fwd} – β_{rev}) / σ_(β_{fwd} – β_{rev}) where the crucial variance term can be estimated as the sqrt(σ(β_{fwd})^{2} + σ(β_{rev})^{2}), matching the intuition that a formal test requires an estimate of variance of the difference between the betas, not just the means.
For early vs. late sleep, things are more complicated because you are not comparing models/betas fit to the same data. I suppose the above approach could also work as long as all the betas are standardized, but wouldn't a better test be to include time as a regressor in the secondlevel GLM and formally test for an interaction between time and T_{fwd} (and T_{bwd})?
In either case, there are two important points to make to demonstrate the utility of the GLM framework for sequence analysis: (1) these ubiquitous secondorder inference questions require a bit more consideration than just saying you can subtract the sequenceness scores; there is another level of statistical inference here, and (2) the TDLM framework, unlike many other approaches, is in fact very well suited to doing just that – it benefits from the existing machinery of linear statistical modeling and associated model comparison tools to make such comparisons.
These points are not currently clear from the "Considerations on secondorder inferences" section. If in addition the GLM framework allows for the regressing out of some potential confounds that can come up in rodent data that the authors now reference that's an additional benefit but not the main point I am asking the authors to speak to.
Reviewer #3:
This revised manuscript describes a method for detecting "offline replay" events in human and animal electrophysiology. I have found the TDLM method described in the manuscript to be very valuable for the field. Its detailed description in the present manuscript will be very useful, first because it presents its different components very clearly and in detail – something which is not possible in a manuscript focused on a particular finding obtained using this method. Second, the authors show that this method can be applied not only to human MEG data, but also to rodent electrophysiology data.
I found the manuscript to be well written: it describes clearly and adequately the different analytic steps that have to be applied to avoid confounds arising from known features of electrophysiological signals (autocorrelations, oscillations), and from task features themselves. The fact that they show results obtained using their method from simulated and real data is also a strength of the manuscript. The level of computational detail regarding the equations appears adequate and well balanced for a broad readership, and example code using the method has been posted online.
I have not reviewed the original version of the manuscript, but have first read the revised manuscript before consulting the initial reviews and the responses provided by the authors.
The authors' responses to the initial reviews are extensive and insightful. I wonder how much the presentation of replay detection in EEG data from a single subject is particularly convincing, but nevertheless I agree with the authors that it shows that their method indeed can work using EEG data instead of MEG data.
The authors' extensive response regarding the comparison between their method and other stateoftheart methods for detecting replay events in electrophysiology data was very clear and useful. The revised manuscript adequately includes a full section on the use of their method for rodent electrophysiology data. The authors also discuss their method in light of other efforts, from Wittkuhn and Schuck using fMRI data for example.
Regarding the more important reservations of Reviewer #2, I found that the authors provided adequate and convincing responses to each of them. Regarding the concerns of Reviewer #3, I found that the revised manuscript allowed understanding the method quite clearly relative to my limited understanding of the details of the method after reading recent empirical papers from the authors (e.g., Liu et al., 2019, Cell). The fact that the authors have posted code online on GitHub is also very useful, and I find that the level of detail regarding the equations is sufficient (and well balanced) for the broad readership of a journal like eLife.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Temporally delayed linear modelling (TDLM) measures replay in both animals and humans" for further consideration by eLife. Your revised article has been evaluated by Laura Colgin as the Senior Editor, and a Reviewing Editor.
We shared the revised manuscript with the reviewers. After some discussion, it was concluded that the manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:
1. The reviewers were confused by the data in Figure 7e. We finally concluded that it was an attempt to explain how the regression was formed, but it took lots of back and forth. Given that this is a tools paper, there seems to be no reason why each analysis figure can't be backed with equations that identify the regressions being done, and the variables being regressed.
2. Figure 5 appears to be about analyzing the MEG data when events are detected. (Isn't TDLM an approach for measuring sequenceness over a population of events rather than finding single ones?) Even though this is previously published work, the methods need significant expansion (see Point 1). The text refers to a section that appears to be missing? Here's the text: "We want to identify the time when a given sequence is very likely to unfold. We achieve this, by transforming from the space of states to the space of sequence events. This is the same computation as in the section "States as sequence events". " (Search for "sequence events" yields no results.) Perhaps this refers to Appendix 3, but the text there doesn't really help much.
3. One reviewer had previously asked "in most of the figures that the authors provide, it's unclear whether the sequenceness score is driven by one particular pair of stimuli, or equally so among most possible pairs". To clarify, it seems that the question is: if the model proposed is A→B→C→D, and the data are largely A→B, can that be detected? Or alternatively, can you give a proper way of comparing two models (in this case, A→B→C→D vs A→B)?
https://doi.org/10.7554/eLife.66917.sa1Author response
[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]
The reviewers all felt that the work is extremely valuable; a domaingeneral replay detection method would be of wide interest and utility. However, we felt that the paper was lacking context and comparisons to existing methods. Most critically, the paper would be more impactful if comparisons with standard replay methods were included, and the reviewers felt that would be too substantial a change to ask for as a revision. There were also concerns about lack of detail in the description of the methods and data that diminished enthusiasm. The authors would be welcome to make changes along these lines and submit the paper again as a new submission.
We thank all reviewers for their positive evaluation of our work, and more so, for pointing to areas for improvement, including comparisons to existing rodent methods and absence of relevant details in the original submission. In this revised manuscript, we have described our approach – temporal delayed linear modelling (TDLM) in more detail, and with more focus on a comparison to existing methods.
Importantly, we have extended TDLM to better cope with continuous state spaces (as is normally the case in rodent experiments where physical positions are states to be decoded, while in human experiments, the states are often discrete, as indexed by visual stimuli). We call this method multiscale TDLM.
We have also reanalysed rodent electrophysiology data from Ólafsdóttir, Carpenter, and Barry (2016) using our approach, and show TDLM can offer unique perspective compared to existing “line searching” methods in rodent replay analysis.
As a result, the paper is substantially rewritten to highlight its generality as a sequence detection method with applicability to both rodents and humans. It is general because it can test any transition of interest on an arbitrary graph, and this goes beyond “line searching” on a linearized state space. This has already proved important in human experiments where complex statespaces have been used. We expect this generality (by this method or others) will also be important as rodent replay experiments move beyond 1D tracks, for example to foraging in 2D, or in complex mazes.
Below we address all the reviewers’ comments in a pointbypoint manner.
Reviewer #1:
This paper describes temporal delayed linear modelling (TDLM), a method for detecting sequential replay during awake rest periods in human neuroimaging data. The method involves first training a classifier to decode states from labeled data, then building linear models that quantify the extent to which one state predicts the next expected state at particular lags, and finally assessing reliability by running the analysis with permuted labels.
This method has already been fruitfully used in prior empirical papers by the authors, and this paper serves to present the details of the method and code such that others may make use of it. Based on existing findings, the method seems extremely promising, with potential for widespread interest and adoption in the human neuroimaging community. The paper would benefit, however, from more discussion of the scope of the applicability of the method and its relationship to methods already available in the rodent and (to a lesser extent) human literature.
We thank the reviewer for this positive feedback and agree it will be useful to discuss TDLM method in the context of existing ones. We have done so in the revised paper. Below we address each of the reviewer’s specific concerns and questions.
1. TDLM is presented as a general tool for detecting replay, with special utility for noninvasive human neuroimaging modalities. The method is tested mainly on MEG data, with one additional demonstration in rodent electrophysiology. Should researchers expect to be able to apply the method directly to EEG or fMRI data? If not, what considerations or modifications would be involved?
Yes, we expect this same method can be applied to human EEG, fMRI, as well as rodent electrophysiology data. In the revision, we show how TDLM can be extended to work on rodent replay during sleep (as detailed later below), We also applied TDLM to real human EEG data to demonstrate its applicability. We did not have suitable fMRI data at hand to test TDLM, but we believe the same procedure should work on fMRI as well. In the revision, we include this point in the Discussion section, along with analysis results on real rodent data in the main text (detailed later when responding to Q2), and analysis on human EEG in the supplemental material under the section “Apply TDLM to human wholebrain EEG data”. Below we have copied the relevant text in discussion, and supplemental material for reference:
Main text – Discussion, page 2526, line 455
“In the human neuroimaging domain, we have mainly discussed the application of TDLM with respect to MEG data. […] In future, we consider it will be useful to combine the high temporal resolution available in M/EEG and the spatial precision available in fMRI to probe region specific sequential computations.”
Supplemental material – Supplementary Note 4: Apply TDLM to human wholebrain EEG data
“Supplementary Note 4: Apply TDLM to human wholebrain EEG data
An autocorrelation is commonplace in neuroimaging data, including EEG and fMRI. and[…] These EEG sequence results replicate our previous MEGbased findings based on analyses at planning/decision time (see Figure 3 in KurthNelson et al., 2016, and also see Figure 3f in Liu et al., 2019).”
2. How does the method relate to the state of the art methods for detecting replay in electrophysiology data? What precludes using those methods in MEG data or other noninvasive modalities? And conversely, do the authors believe animal replay researchers would benefit from adopting the proposed method?
This is a great question. We thank all three reviewers for bringing this to our attention. To answer this, we discuss TDLM in comparison to existing rodent methods, in both simulation and real data, including rodent hippocampal electrophysiology (place cells in CA1) and human wholebrain MEG. To show the benefit of detecting replay using TDLM in rodents, we have now reanalyzed the data from Ólafsdóttir et al. (2016).
Firstly, we would like to highlight a key difference between TDLM and popular existing techniques (both methods we compare against below, and all methods we know about). These existing techniques rely on a continuous parametric embedding of behavioural states and the relationship between this embedding time (parametrically encoded). As far as we know existing techniques only use 1D embeddings, but this could likely be generalised. Essentially, they are looking for a line/linear relationship between time and decoded positions. We will henceforth refer them as “line search” approaches.
TDLM is fundamentally different to this as it operates on a graph and tests the statistical likelihood of some transitions happening more than others. This is therefore a more general approach that can be used for sequences drawn from any graph, not just graphs with simple embeddings (like a linear track).
For example, in a nonspatial decisionmaking task (KurthNelson et al., 2016), all states lead to two different states and themselves can be arrived at from two other different states (Figure 6). Existing methods will not work here because there is no linear relationship between time and states.
TDLM vs. existing rodent methods
TDLM works on a decoded state space, rather than sensor (with analogy to cell) level. We compared TDLM to rodent methods that work on the posterior decoded position (i.e., state) space, normally referred to as Bayesianbased methods (Tingley and Peyrache, 2020). Note, the “Bayesian” part is the decoding of an animals’ ‘location’ during rest/sleep based on spike counts and mean firing rate (Zhang, Ginzburg, McNaughton, and Sejnowski, 1998), not replay detection. Two commonly used methods are Radon transform (Davidson, Kloosterman, and Wilson, 2009) and linear weighted correlation (Grosmark and Buzsáki, 2016).
Both methods proceed by forming a 2D matrix, where one dimension is the decoded state (e.g., positions on a linear track), and the other dimension is time (note that, as stated above, a decoded state is embedded in 1D). These approaches then endeavour to discover if an ordered line is a good description of the relationship between state and (parametric) time. For this reason, we call this family of approaches “line search”.
The radon method uses a discrete Radon transform to find the best line in the 2D matrix (Toft, 1996) and then evaluates the radon integral, which will be high if the data lie on a line (Author response image 1A). It compares this to permutations of the same data where the states are reordered (Tingley and Peyrache, 2020). The linear weighted correlation method computes the average correlation between the time and estimated position in the 1D embedding (Author response image 1B). The correlation is nonzero provided there is an orderly reactivation along the state dimension.
Both these methods are performed on decoded positions, where these are sorted based on their order in the linearized state space. TDLM also works on the decoded position space, but instead of directly measuring the relationship between position and time, it measures the transition strength for each possible state to state transitions (Author response image 1C).
Applying TDLM to candidate replay events
Single sequence in a candidate replay event
In a simple simulation of spiking data, all methods work equally well (Author response image 2). All replay analysis is performed on a decoded posterior position space (time* decoded positions). The permutation is implemented by shifting the rate map of each neuron. This is similar to our state identitybased permutation in TDLM. Effectively, they both invoke a null hypothesis that state identities (i.e., positions) are exchangeable. The results shown in TDLM is the sequenceness (red line) at a time lag of 2, which is the ground truth of the simulated sequence. The shuffled samples (blue, Author response image 2B) shown in TDLM is the sequenceness estimates on the time lag that gives rise to the strongest evidence over all computed time lags in the shuffled data (to control for multiple comparison). This is the same statistical approach performed in the current paper.
To be more specific on how TDLM is applied.
1. We follow the same procedure to obtain a rate map and position decoding as other methods. This decoded position matrix is a state space with dimension of number of time bins by number of position bins.
2. To use TDLM, we need to specify the transition matrix of interest. We generally use the pairwise transition matrix in the space, i.e., postion1→ position2, postion2→ position3, postion4→ position5, etc. We run TDLM at each time lag, normally from time bin 1 to time 10, which corresponding to 10 ms to 100 ms, where each time bin is 10 ms in this simulation.
3. We search for the time lag (over all computed time lags) that give rise to the highest sequenceness; we consider that as the sequenceness score for this ripple event.
4. To assess statistical significance, we take the peak time lag over all computed time lags in the simulation, and then take the 95% percentile on that peak (over permutation samples) as the significance threshold. This is the same procedure we outline in the current paper.
More than one sequences in a candidate replay event.
As we have indicated above, TDLM is a more general approach (allowing a broader range of experiments). Even in a 1D statespace, TDLM can be sensitive to replays that do not reflect a line in StateTime space.
To see this more clearly, we have simulated a regime where there are two sequences, but in perfect opposite directions, within one ripple event (Author response image 3). We use this simulation to demonstrate a situation where TDLM might be more suitable.
It is not surprising that the linear weighted correlation method would fail (Author response image 3), given it is looking for a general time and decoded space correlation (which would be near zero). The Radon method is still fine, if we ignore fitting lines exceed certain range (e.g., vertical lines here), but it will not be able to capture both sequences, as compared to TDLM. In situations with many candidate sequences, the Radon method will not be able to assess the evidence for each of them but will only focus on the best fitting line.
TDLM can capture both sequences because it looks for evidence of any transition of interest in an arbitrary graph, and it characterizes sequence strength as a function of direction and speed (not shown here, see more details in real rodent replay analysis below).
Extending TDLM
In linear track where replays only go in a single direction, it is possible that TDLM is less sensitive compared to the linear correlation or the Radon method, given the latter assumes a parametric relationship between space and time. For example, if the first and last state alone are activated, but not intermediate states, the existing methods will report replay but TDLM will not, because in existing methods space and time are parametric quantities (Appendix 5—figure 1). In contrast, TDLM only knows about transitions on a graph.
Linear embedding (multiscale) TDLM in physical space
To solve this problem, we propose to perform TDLM in a linear embedding manner. The idea is to measure the same replay speed multiple times at different scales. For example, the speed of replay of 5cm per 10 ms, is the same as 10 cm per 20 ms, and 20 cm per 40 ms. Therefore, we can measure replay in multiscale state spaces separately, and average the replay strength of the same speed across scales later. To take into account potential differences in signal to noise ratio between state spaces, we estimate not only transition strength but also the uncertainty in its estimate within each state space, so that at the end we can do precision weighted averaging across scales. This multiscale approach has the benefit of not missing out on meaningful transitions e.g., state 1 → state 5 in original state space, and therefore could capture the parametric relationship between reactivated positions and time.
Specifically, to perform multiscale TDLM, we discretise position bins as a function of width, for example, from 5 cm to 40 cm. This generates rate maps in different scales (e.g., 5 cm per bin, 10 cm per bin, 20 cm per bin, 40 cm per bin), and hence multiscale state space. We then apply TDLM separately in each state space. We estimate both the replay strength and its uncertainty within each state space. This uncertainty estimate becomes important later when averaging the strength of the same replay speed across scales. Essentially, we are measuring the same thing multiple times, and average the measurements together while minimizing the variance.
In equations, the transition strength – 𝛽 is estimated by regressing the decoded state space – X to its lag copy – 𝑋(∆𝑡). In ordinary least squares (OLS) solution, 𝛽 is given by Equation 4 in the paper.
𝛽 = (𝑋^{T}𝑋)^{1}𝑋^{T}𝑋(∆𝑡)
The covariance matrix of 𝛽 can be estimated by
V = MSEC _{*}(X^{T}X)^{1}
Where, MSEC is the calibrated mean squared error, given by:
$MSEC=\phantom{\rule{0.222em}{0ex}}\frac{\sum _{i=1}^{T}\phantom{\rule{0.222em}{0ex}}({X}_{i}\left(\mathrm{\text{\Delta t}}\right){X}_{i}\beta {)}^{2}}{Tdf}$ 𝑇 is the number of time samples, and 𝑑𝑓 denotes the degrees of freedom consumed by model parameters. For OLS, each parameter takes away one degree of freedom.
For each replay speed of interest, we implement this separately in each state space and end up with [𝛽_{1} 𝑉_{1}], [𝛽_{2}, 𝑉_{2}], [𝛽_{2}, 𝑉_{2}]… [𝛽_{n}, 𝑉_{n}], with 𝑛 being the number of scales. Precision weighted averaging of sequence strength for this replay speed can be performed as:
${\beta}_{M}=\phantom{\rule{0.222em}{0ex}}\frac{\sum _{i=1}^{n}{\beta}_{i}{V}_{i}}{\sum _{i=1}^{n}{1/V}_{i}}$ We do this for all replay speeds of interest, with statistical testing then performed on the precision weighted averaged sequence strength in a similar manner to what we do in original TDLM.
To render this concrete, we simulate a scenario where multiscale TDLM is more sensitive, e.g., when there are gaps on a trajectory of interest. The multiscale TDLM is more sensitive because it encompasses this path more often than the original TDLM (Author response image 4).
Apply multiscale TDLM to real rodent data (place cells in CA1)
Next, we apply multiscale TDLM to real rodent data to demonstrate what it can reveal. The unique benefit of TDLM is that it does not need to assume a parametric relationship between state and time, which make it ideal for detecting sequences in an open maze. But even in a linearized maze, it can provide unique perspectives. For example, it measures sequence strength as a function of speed, which is not the case in either Radon or linear correlation approach. More so, because TDLM is built on linear modelling, it has good statistical properties if one wants to ask a second order statistical question: e.g., Is the strength of forward sequence stronger than backward sequence? or is replay stronger in early vs. middle vs. late sleep stage. We illustrate this by applying multiscale TDLM to CA1 place cells spiking data from Ólafsdóttir et al. (2016).
In Ólafsdóttir et al. (2016), rats ran multiple laps on a Z maze, and were then placed in a rest enclosure for 1.5 hours (Figure 7A). The Z maze consisted of 3 tracks, with its ends and corners baited with sweetened rice to encourage running from one end to the other. Following Ólafsdóttir et al. (2016), we excluded both the ends and corners when generating the rate map, given an animal regularly performs nonperambulatory behaviors at these locations. Periods when running speed were less than 3 cm/s were also excluded. The running paths were then linearized, and dwell time and spikes were binned into 2 cm bins, smoothed with a Gaussian kernel (σ = 5 bins). We generated rate maps separately for inbound (track1→track2→track3) and outbound (track3→track2→track1) running.
As in Ólafsdóttir et al. (2016), cells recorded in CA1 were classified as place cell if their peak firing field during track running was above 1 Hz with at least 20 cm width (see an example in Figure 7B). Candidate replay events were identified based on multiunit (MU) activity from place cells during rest time. Only periods exceeding the mean rate by 3 stand deviation of MU activity were determined as putative replay events. Events less than 40 ms long, or which included activity from less than 15% of the recorded place cell ensemble, were rejected (see an example of putative replay event in Figure 7C).
We analyzed data from one full recording session (track running for generating rate map, postrunning resting for replay detection) from Rat 2192, as reported in Ólafsdóttir et al. (2016). Following the procedure described above, we identified 58 place cells, and 1183 putative replay events. Replay analysis was then performed on these putative replay events, separately for inbound and outbound rate maps. Critically, because there are separate rate maps for inbound and outbound runs, we can separate forward and backward replay (as the same position has a different decoded state depending on whether it was obtained during an outbound or inbound run).
A forward sequence is defined as when states from the outbound map occur in the outbound order, or states from the inbound map occur in the inbound order. A backward sequence is when states from the inbound map occur in the outbound order or states from the outbound map occur in the inbound order. Candidate events were decoded based on a rate map, transforming the ncells * ntime to nstates * ntime. Each entry in this state space represents the posterior probability of being in this position at any given time. Replay analysis was performed solely on this decoded state space.
During the rest/sleep period, TDLM identified significant forward and backward sequences for both outbound and inbound maps with a wide speed range of 100 – 1000 cm/s, consistent with recent findings from Denovellis et al. (2020) on replay speed. In our analysis, the fastest speed (while still have a stronger evidence than the random sequences), is up to 1000 cm/s, which is around 20X faster than its corresponding free running speed, representing approximately half a trackarm in a typical replay event (e.g., 100 cm in 100 ms), consistent with previous work (Davidson et al., 2009; Karlsson and Frank, 2009; Lee and Wilson, 2002; Nádasdy, Hirase, Czurkó, Csicsvari, and Buzsáki, 1999).
Furthermore, because TDLM is a linear method, it can straightforwardly assess differences between replay in different conditions. For example, we can ask questions of the following form, is there more forward replay in early vs late sleep time? To answer this, we simply divide sleep into early (first 1/3 sleep time), middle (2/3), and late (the last 1/3 sleep), and average sequence evidence separately for different sleep times. We can see that sequence strength is stronger in early compared to middle and late sleep time, especially so for an outbound rate map (Author response image 5C). Being able to perform such comparisons in a linear framework should be useful to the rodent research community.
The utility of existing rodent methods for human neuroimaging data?
As discussed above, both Radon transform, and the linear correlation methods are essentially “line search” methods. These work well on identifying best trajectory (or linear timeposition relationship) in a candidate replay event, but are less suitable at detecting multiple sequence, e.g., in varying speed, directions or contents. This makes it hard to apply these methods to human neuroimaging data, given for example it is unlikely that only one sequence will exist in a 5min resting period.
In addition, existing rodent methods treat the position estimates at each time bin separately and do not endeavour to control for coactivation and autocorrelation. But correlations in both time and space (e.g., voxels in fMRI, sensors in EEG/MEG) are common in neuroimaging data, and, if not controlled, they will lead to false positives (e.g., when compared to zero).
In sum, TDLM can be applied to rodent spiking data to detect replay and has flexibility to control for other confounding variables. These confoundcontrolling concerns, which may not necessarily be important for spiking data, are crucial when it comes to human neuroimaging data analysis. TDLM is also a more general sequence detection method given it does not require a linearized transition. Finally, by placing replay analyses in the GLM framework, it allows us to specify and test hypotheses (such as differential replay across conditions) with established powerful statistical inference procedures.
In the revision, we have devoted a whole new section titled “TDLM FOR RODENT REPLAY” in the main text to address this question. Under this section, there are four topics. They are “Generality of graph vs linebased replay methods”; “Comparisons in simulated data”; “Multiscale TDLM to deal with continuous state space”; and “Applying multiscale TDLM to real rodent data (place cells in CA1)”.
We have also included a supplemental note “Supplementary Note 5: Applying TDLM to candidate replay events” for detailed simulation results. There are three topics, namely “Single sequence in a candidate replay event”; “More than one sequence in a candidate replay event” and “Lesser sensitivity of TDLM to skipping sequences”.
3. It would be useful for the authors to comment on the applicability of the method to sleep data, especially as rodent replay decoding methods are routinely used during both awake rest and sleep.
This is a great suggestion. We have not worked with human sleep data before.
We think it should be possible, though there will be technical problems to solve along the way.
For example, multivariate classifiers trained on wholebrain data pattern are sensitive to both stimulusspecific patterns of activity and the specific background noise distribution. We would anticipate that this background noise distribution in particular would differ substantially between the awake and sleep state, potentially resulting in degraded classifier performance that may pose a challenge to replay detection.
In our own studies we have shown that classifiers with sparse L1 regularisation are more sensitive to replay (Liu et al., 2019; Wimmer, Liu, Vehar, Behrens, and Dolan, 2020). It is also worth noting that the L1induced sparsity encodes weaker assumptions about background noise distributions into the classifiers as compared to L2 (Higgins, 2019). We expect that the use of sparse classifiers will be of greater importance when applied to sleep data, where background noise distributions differ more substantially from (awake state) training data.
We have added this point in the revised manuscript:
Main text – Regularization, page 14, line 1015
“In addition to minimizing spatial correlations, as discussed above, it can also be shown that L1induced sparsity encodes weaker assumptions about background noise distributions into the classifiers, as compared to L2 regularization (Higgins, 2019). […] Here, the use of sparse classifiers is helpful as background noise distributions are likely to differ more substantially from the (awake state) training data.”
4. How does the method relate to the Wittkuhn and Schuck fMRI replay detection method? What might be the advantages and disadvantages of each?
Wittkuhn and Schuck applied a simple linear regression between the serial position of the stimuli and their classification probabilities at every TR. This is similar to the linear weighted correlation method applied in rodent studies. There is no issue with respect to statistical inference given they are implicitly comparing to the strengths of other sequences. The problem is with generality. As indeed noted by these authors, their approach does not permit an examination of forward and backward sequence separately (Wittkuhn and Schuck, 2020). We make this clear in the discussion.
5. The authors make the point that spatial correlation as well as anticorrelation between state patterns reduces the ability to detect sequences. The x axis for Figure 3c begins at zero, demonstrating that lower positive correlation is better than higher positive correlation. Given the common practice of building one classifier to decode multiple states (as opposed to a separate classifier for each state), it would be very useful to provide a demonstration that the relationship in Figure 3c flips (more correlation is better for sequenceness) when spatial correlations are in the negative range.
We believe there is a misunderstanding here, and we apologise for lack of clarity. The X axis in Figure 3c is the absolute value of the correlation, and indeed when there is a higher negative correlation this leads to lower sequence detection. We took the absolute value here because the direction of correlation is not important, only the degree of the shared covariance matters. We add this point in the revised manuscript.
Main text – Spatial correlations, page 13, line 510
“Unfortunately, positive or negative correlations between states reduces the sensitivity of sequence detection, because it is difficult to distinguish between states within the sequence: collinearity impairs estimation of β in Equation 2[…] We took the absolute value here because the direction of correlation is not important, only the magnitude of the correlation matters.”
6. In the Results, the authors specify using a single time point for spatial patterns, which would seem to be a potentially very noisy estimate. In the Methods, they explain that the data were downsampled from 600 to 100 Hz to improve SNR. It seems likely that downsampling or some other method of increasing SNR will be important for the use of single time point estimates. It would be useful for the authors to comment on this and provide recommendations in the Results section.
In this paper, we try to focus on the sequence detection for the reactivation of representations, the principal aim of TDLM. As such, we have not explored different combinations of preprocessing pipelines (e.g., down sampling, averaged or single time bin) or selection of the raining time point. It is possible some other ways of preprocessing and feature selection is better than just training classifiers on a single time point, but we have not explored this. For simplicity, and for the purposes of this paper, which is already long and detailed, we hope the reviewer will allow us to leave this issue for future work.
7. While the demonstration that the method works for detecting theta sequences in navigating rodents is very useful, the paper is missing the more basic demonstration that it works for simple replay during awake rest in rodents. This would be important to include to the extent that the authors believe the method will be of use in comparing replay between species.
This is a great suggestion. In answering Q2, we have demonstrated both in simulation and real data, that TDLM is able to detect replay in hippocampal ripple data.
8. The authors explain that they "had one condition where we measured resting activity before the subjects saw any stimuli. Therefore, by definition these stimuli could not replay, but we can use the classifiers from these stimuli (measured later) to test the false positive performance of statistical tests on replay." My understanding of the rodent preplay literature is that you might indeed expect meaningful "replay" prior to stimulus exposure, as existing sequential dynamics may be coopted to represent subsequent stimulus sequences. It may therefore be tricky to assume no sequenceness prior to stimulus exposure.
This is a good point. We would add that this is different to the preplay phenomena observed in rodent literature. In rodents, preplay happens before the rodent enters the novel maze, the authors of the relevant paper suggest this is due to a predefined canonical dynamic in the hippocampus (Dragoi and Tonegawa, 2011, 2014). Crucially, this is possible in physical space because the transitions and representations can contain information about these fixed relationships. One cell can always fire before another, and cellular relationship can determine which cells fire later at which location, and therefore what the decoder looks like.
The analogy in our experiment is that we might build a decoder at the time the sequence is experienced (as is done in rodent studies). Then the decoder might potentially rely on 2 pieces of information – the stimulus and the position in the sequence. These representations might indeed preplay, because preexisting position representations might be built into the representation of the stimulus in the sequence. We have shown this is true, and indeed that the position portion of this representation does replay (Liu et al., 2019).
However, the replay we are looking at here cannot preplay, as not only is the resting data acquired before seeing the sequence, but crucially also the stimulus classifiers are built before the subject sees the sequence. Thus, they do not contain any representation of sequence position. Furthermore, the order of images in the sequence is randomised across subjects. On this basis there is no means for a subject to know which sequence to preplay.
We have added this point to the section “Distribution of sequenceness at a single lag” in the revised manuscript.
“We have tested this also on real MEG data. […] Indeed, in our case, each subject saw the same stimuli in a different order. They could not know the correct stimulus order when these resting data were acquired.”
Reviewer #2:
[…] Enthusiasm for the paper is limited by several overall issues:
1. It seems a major claim is that the current method is somehow superior to other methods (e.g. from the abstract: "designed to take care of confounds" implying that other methods do not do this, and "maximize sequence detection ability" implying that other methods are less effective at detection). But there is very little actual comparison with other methods made to substantiate this claim, particularly for sequences of more than two states which have been extensively used in the rodent replay literature (see Tingley and Peyrache, Proc Royal Soc B 2020 for a recent review of the rodent methods; different shuffling procedures are applied to identify sequenceness, see e.g. Farooq et al. Neuron 2019 and Foster, Ann Rev Neurosci 2017). The authors should compare their method to some others in order to support these claims, or at a minimum discuss how their method relates to/improves upon the state of the art.
We thank the reviewer for the helpful suggestion. In the original manuscript we wrote “designed to take care of confounds”, “maximize sequence detection ability” with the crosscorrelation method (Eldar, Bae, KurthNelson, Dayan, and Dolan, 2018; KurthNelson et al., 2016) in mind, and showed comparison results with TDLM in Figure 3. But we agree we should also compare TDLM with other replay detection methods, especially from rodent literature. This is also requested by Reviewer 1 (Question 2). I
In response, we have performed detailed comparisons in simulation and real data with other techniques. Since the response covered many pages, we have not reproduced these here. We very much hope the reviewer will understand, and we refer the response to Reviewer 1 (Question 2).
In the revised manuscript. we provide a whole new section titled “TDLM for rodent replay” in the main text that addresses this question. We also include a supplemental note “Supplementary Note 5: Applying TDLM to candidate replay events” for detailed simulations.
In brief, TDLM is a more general method that assesses statistical likelihood of certain transitions on an arbitrary graph, rather than testing for a parametric relationship between time and decoded positions in a linearized space (Author response images 1 and 2). While TDLM, in its original form, is sensitive to skipping states (Author response image 4), its hierarchal version – multiscale TDLM can deal with continuous state space. To show this, we have applied multiscale TDLM to rodent replay data described in Ólafsdóttir et al. (2016) (Figure 7AC, Author response image 5). Moreover, its GLM framework makes it straightforward to answer secondorder statistical questions, e.g., is replay strength stronger in early vs. late sleep stage (Author response image 5C).
2. The scope or generality of the proposed method should be made more explicit in a number of ways. First, it seems the major example is from MEG data with a small number of discrete states; how does the method handle continuous variables and larger state spaces? (The rodent ephys example could potentially address this but not enough detail was provided to understand what was done; see specific comments below.) Second, it appears this method describes sequenceness for a large chunk of data, but cannot tell whether an individual event (such as a hippocampal sharp waveripple and associated spiking) forms a sequence not. Third, there is some inconsistency in the terminology regarding scope: are the authors aiming to detect any kind of temporal structure in neural activity (first sentence of "Overview of TDLM" section) which would include oscillations, or only sequences? These are not fatal issues but should be clearly delineated.
We thank the reviewer for helpful suggestions.
1. States:
In this paper, we used 4 or 8 discrete states that formed one or two sequences. The number of states is not a constraint of the method, although decoding accuracy might drop gradually as a function of the number of states to be decoded, which could make the sequence analysis nosier. But this is a signal to noise ratio problem, not a problem of the sequence detection method itself.
Continuous spaces are amenable to TDLM by simply chunking the space into discrete states. It is the case that TDLM in its original form may potentially be less sensitive for such analyses than techniques that buildin assumptions about the spatial layout of the state space. This is because TDLM works on a graph, and has no information about the Euclidean nature of the state space. Techniques that make assumptions about linear structure benefit from these prior assumptions. For example, detecting state 1 then state 5 then state 10 counts as replay in these techniques, but this is not so in TDLM.
This divergence can be alleviated by a simple adjustment to TDLM to render it multiscale – i.e., by chunking the continuous Euclidean space into a discretisation of different scales, computing TDLM sequenceness separately at each scale, and then taking the precision weighted average across scales to recover a single sequenceness measure. It is easy to see why this addresses the concerns hinted at above, as some scales will capture the 1→2→3 transitions, whilst others will capture the 1→10→20 transitions. Because the underlying space is continuous, we can average results of the same speed together, and this will reinstate Euclidean assumptions. Applying multiscale TDLM, we show it can deal with skipping states as well as a continuous state space in both simulation (Figure 7, Author response image 5) and in rodent data analysis (Author response image 6). Indeed, this enables us to measure the speed of replay in cm/s.
2. Time length: The TDLM method can be applied to a single ripple event, though it is true that the sequence estimate is more accurate if applied to aggregated ripple events. This is because we have more samples and consequently more states are likely be activated in the analysis time frame. We argue this is a strength rather than weakness compared to traditional methods. Most of the time, we care about whether there are significant sequences in general rather than within a specific ripple event. Most existing methods assess sequence strength within ripple events, because they either search for a best fitting line or for correlations between time and state. Those methods cannot deal with multiple sequences, while TDLM can, because TDLM is instead looking for the averaged evidence for certain transitions (Appendix 5—figure 1). Furthermore, many interesting questions rely on comparing replay across different situations (i.e., “secondorder questions”). We argue this is more naturally done in a linear framework, such as TDLM, which can compute an aggregate measure across all instances of each situation and simply compare these aggregates than the alternative, which requires counting events that cross threshold criteria and comparing the counts across conditions.
3. Application scope: TDLM is designed to detect sequences alone, not oscillations, and in some cases, we deliberately control for neural oscillations to enable sequence detection. We now make this clearer in the revised manuscript.
We have added this new section “generality of TDLM” in the revised manuscript.
“Generality of TDLM
We have so far discussed the applicability of TDLM in relation to human MEG, as well as in rodent electrophysiology (with comparisons to standard replay detection methods). and[…] We expect this generality will also be important as rodent replay experiments move beyond 1D tracks, for example to foraging in 2D, or in complex mazes.”
3. The inference part of the work is potentially very valuable because this is an area that has been well studied in GLM/multiple regression type problems. However, the authors limit themselves to asking "firstorder" sequence questions (i.e. whether observed sequenceness is different from random) when key questions – including whether or not there is evidence of replay – are actually "secondorder" questions because they require a comparison of sequenceness across two conditions (e.g. pretask and posttask; I'm borrowing this terminology from van der Meer et al. Proc Royal Soc B 2020). The authors should address how to make this kind of comparison using their method.
We thank the reviewer for appreciating our use of GLM framework for statistical inference. This firstorder and secondorder distinction is helpful. We are also grateful for pointing us towards the relevant literature. Based on van der Meer, Kemere, and Diba (2020), the secondorder questions concerns (1) the comparison between forward and backward replay of the same sequence, (2) sequence events in different time, and (3) sequence strengths between different representational contents. The potential problems in addressing the secondorder questions are: (a) Biases in cell sampling; (b) Biases in behavioural sampling; (c) Nonstationary tuning curves.
In the linear framework, the machinery for computing these secondorder inferences is simple. We can just subtract the sequenceness measure between conditions. Indeed, in the original version of this manuscript there were various examples of this procedure for quantifying (Forwards – Backwards) replay. In the new version, we show a similar thing in rodent data, comparing between early, mid and late sleep (Figure 7, Author response image 5).
Whilst the machinery is simple, biases may still exist. We would, however, like to take the liberty of bringing some insights from the human neuroscience world. In human experiments, this type of problem is addressed as follows:
1. Ensure that biases that might occur within subjects will not occur consistently in the same direction across subjects (e.g., by randomising stimuli across participants)
2. Compare across conditions in each subject, to give a summary measure of the comparison in each subject.
3. Perform random effects inference across the population. That is, infer against the betweensubject variance.
If this procedure is followed, then it can ensure that any biases that might be present at an individual experimental subject level will not cause false positives for a population inference.
We realise that such a solution might not always be possible in rodent studies which may not have the sample size necessary for random effects inference, or where a large proportion of the recorded cells may come from a small portion of animals. However, we think that as largescale recordings become cheaper, such practices will become more common.
In the meantime, it will be necessary to be vigilant to these biases in rodent experiments that use TDLM, as is the case with other techniques.
In the human data we present here, we believe that the analogous problems are all controlled for by the approach described above. We randomised stimuli and included strict controls (for example comparing rewarded and neutral sequences in Liu et al. 2019).
We have now added the following point in the revised manuscript:
“Considerations on secondorder inferences
We can consider making a distinction between levels of inference, following van der Meer et al. (2020).[…] We can subtract the sequenceness measure between conditions.”
Reviewer #3:
The methods used by the authors seem like potentially really useful tools for research on neural activity related to sequences of stimuli. We were excited to see that a new toolbox might be available for these sorts of problems, which are widespread. The authors touch on a number of interesting scenarios and raise relevant issues related to crossvalidation and inference of statistical significance. However, given (1) the paucity of code that they've posted, and its specificity to specific exact data and (2) the large literature on latent variable models combined with surrogate data for significance testing, I would hesitate to call TDLM a "framework". Moreover, in trying to present it in this generic way, the authors have muddled the paper, making it difficult to understand exactly what they are doing.
Overall: This paper presents a novel approach for detecting sequential patterns in neural data however it needs more context. What's the contribution overall? How and why is this analysis technique better than say Bayesian template matching? Why is it so difficult to understand the details of the method?
We thank the reviewer for the positive feedback. We are sorry for lack of context and code. We now provided greater context and addressed specific concerns in detail in the following section.
We would like to keep calling TDLM a “framework” given it is based on the general linear model that can allow flexible and iterative use to meet different aims. Many different questions about sequences can be embedded in the same simple setup. The setup accounts for common problems and gives general solutions to questions such as inference. As noted by both reviewer 1 and reviewer 2, this paper aims to develop a domaingeneral sequence detection method that is not tied to any specific data modality (in this revision, we will show its applicability to detect replay within ripple events in rodents, and human EEG data as well as the MEG and theta sequences that were present in the original manuscript).
We agree with the reviewer that the paper needs more context. We provide this in the revised manuscript (also detailed in the response to specific concerns).
We believe the overall contribution of this paper is our introduction of a domain general sequence detection method under a GLM framework.
We have now compared TDLM with the existing methods in the rodent research, and shown the applicability of TDLM in both simulation, and real data (both rodent ephys data and human MEG and EEG, as a response to Reviewer 1 Question 1).
To make our points concrete, we reanalysed rodent electrophysiology data from Ólafsdóttir et al. (2016) using our approach, and shown what TDLM can offer uniquely compared to existing “line searching” methods in rodent replay analysis. Please see details in section “TDLM for rodent replay” in the revised manuscript, also our response to reviewer 1 question 2.
We apologise for not being clear in the original submission. We have revised the paper accordingly based on reviewer’s suggestions.
The first and most important problem with this paper is that it is intended (it appears) to be a more detailed and enhanced retelling of the author's 2019 Cell paper. If this is the case, then it's important that it also be clearer and easier to read and understand than that one was.
We apologies for not being sufficiently clear. It is true we have dominantly used data from our previous paper as examples, but we emphasise that this current paper is not intended as an extension of our previous paper. Rather, we want to expand on the method, and develop a domain general sequence detection method for wider use. In the revision, we have now shown its applicability across multiple datasets and have extended TDLM to work on a continuous state space, e.g., physical space, as mostly used in rodent literature.
However, we understand where the reviewer is coming from and we hope we make our aim clearer in the revised manuscript. In the following section, we address specific concerns and follow the reviewer’s suggestion wherever we can in order to make the paper more accessible.
The authors should follow the normal tradition in computational papers:
1. Present a clear and thorough explanation of one use of the method (i.e., MEG observations with discrete stimuli), then present the next approach (i.e., sequences?) with all the details necessary to understand it.
Thanks for the suggestion. We have largely rewritten the paper, with the same order as per the description, but also are more precise and provide greater detail in each section. We are confident it is now better organized to explain our thinking at each step of TDLM, and its application scope.
2. The authors should start each section with a mathematical explanation of the X's – the equation(s) that describes how they are derived from specific data. Much of the discussion of cross validation actually refers to this mapping.
We thank the reviewer for this helpful suggestion. We now include both the mathematical explanations and concrete examples of states (i.e., X) for each definition wherever appropriate. We also include an algorithm box for the process of cross validation in the section “Supplementary Note 2: Pseudocode of sensory code and abstract code crossvalidations” in the revised manuscript.
The revised texts on this section reads:
“Getting the states
As described above, the input to TDLM is a set of time series of decoded neural representations, or states.[…] This more sophisticated use of TDLM merits its own consideration and is discussed in section “Sequences of sequences” in the supplemental material (Supplementary Note 1: Extension to TDLM).”
3. Equation 5 also needs a clearer explanation – it would be better to write it as a sum of matrices (because that is clearer) than with the strange "vec" notation. And TAUTO, TF and TR should be described properly – TAUTO is "the identity matrix", TF and TR are "shift matrices, with ones on the first upper and lower off diagonals".
Thanks for the suggestion. We have done so in the revised manuscript.
This part in the revised manuscript now reads:
“Testing the hypothesized transitions
The first level sequence analysis assesses evidence for all possible statetostate transitions. […] Repeating the regression of Equation 5 at each time lag (∆𝑡 = 1, 2, 3, …) results in time courses of the sequenceness as a function of time lag (e.g., the solid black line in Figure 1f), in which 𝑍_{F}, 𝑍_{B} are the forward and backward sequenceness respectively (e.g., red and blue lines in Figure 1g).”
4. The cross validation schemes need a clear description. Preferably using something like a LaTeX "algorithm" box so that they are precisely explained.
Thanks for the suggestion. We have included the algorithm box for the process of the cross validation in the section “Supplementary Note 2: Pseudocode for sensory code and abstract code crossvalidations” in the revised manuscript.
“In the consideration of the formatting, we have attached the Latexbased algorithm box in a picture form.”
Recognizing the need to balance readability for a general reader and interest, perhaps the details could be given for the first few problems, and then for subsequent results, the detail could go into a Methods section. Alternatively, the methods section could be done away with (though some things, such as the MEG data acquisition methods are reasonably in the methods).
Thanks. We have done so in the revised manuscript.
Usually, we think about latent variable model problems from a generative perspective. The approach taken in this paper seems to be similar to a Kalman filter with a multinomial observation (which would be equivalent to the logistic regression?), but it's unclear. Making the connection to the extensive literature on dynamical latent variable models would be helpful.
Thanks for this suggestion. It is not a standard latent variable model, but it is related. The relevant latent variable model is not a Kalman filter (which tracks continuous variables), but instead a Hidden Markov Model (which tracks the transitions between discrete states).
However, TDLM is different from a traditional HMM as it:
1. Does not need to estimate the emission matrix from the test data, as it gets it from independent training data.
2. Does not assign every timepoint to a state, as the classifiers do not sum to 1.
3. Needs to estimate many transition matrices – 1 for each potential time lag.
As the reviewer points out, we could use the Bayesian generative model that underlies an HMM and account for these changes. Indeed, we are proponents of Bayesian modelling in much of our other work. However, in this situation we think the costs of the Bayesian approach outweigh its potential benefits.
The benefits of doing so would be a small reduction in state uncertainty that would come with using the estimated transition dynamics as prior distributions on the states.
The costs of doing so would be to lose the linear framework. This would mean
a. We would lose access to wellstudied and wellbehaved test statistics for inference.
b. We would lose flexibility in the available strategies for controlling for autocorrelations.
c. We would dramatically increase computation time (which is important for point 3 as we are estimating the model many times).
We have thought carefully about this, and we are unsure whether making the analogy to an HMM is really useful, except for methodological experts. We have chosen not to do so in the paper, but if the reviewer feels strongly, we are happy to introduce a section. We note that this answer will be available to experts on bioRxiv (and on the eLife website with the paper if it is published).
References:
Buzsáki, G. (2002). Theta oscillations in the hippocampus. Neuron, 33(3), 325340.
Chadwick, A., van Rossum, M. C., and Nolan, M. F. (2015). Independent theta phase coding accounts for CA1 population sequences and enables flexible remapping. eLife, 4, e03542.
Davidson, T. J., Kloosterman, F., and Wilson, M. A. (2009). Hippocampal replay of extended experience. Neuron, 63(4), 497507.
Denovellis, E. L., Gillespie, A. K., Coulter, M. E., Sosa, M., Chung, J. E., Eden, U. T., and Frank, L. M. (2020). Hippocampal replay of experience at realworld speeds. bioRxiv.
Dragoi, G., and Tonegawa, S. (2011). Preplay of future place cell sequences by hippocampal cellular assemblies. Nature, 469(7330), 397.
Dragoi, G., and Tonegawa, S. (2014). Selection of preconfigured cell assemblies for representation of novel spatial experiences. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 369(1635), 20120522.
Eldar, E., Bae, G. J., KurthNelson, Z., Dayan, P., and Dolan, R. J. (2018).
Magnetoencephalography decoding reveals structural differences within integrative decision processes. Nature human behaviour, 2(9), 670681.
Fyhn, M., Hafting, T., Treves, A., Moser, M.B., and Moser, E. I. (2007). Hippocampal remapping and grid realignment in entorhinal cortex. Nature, 446, 190.
Grosmark, A. D., and Buzsáki, G. (2016). Diversity in neural firing dynamics supports both rigid and learned hippocampal sequences. Science, 351(6280), 14401443.
Harris, K. D. (2020). Nonsense correlations in neuroscience. bioRxiv.
Higgins, C. (2019). Uncovering temporal structure in neural data with statistical machine learning models. University of Oxford.
Higgins, C., Liu, Y., Vidaurre, D., KurthNelson, Z., Dolan, R., Behrens, T., and Woolrich, M. (2020). Replay bursts in humans coincide with activation of the default mode and parietal α networks. Neuron.
Karlsson, M. P., and Frank, L. M. (2009). Awake replay of remote experiences in the hippocampus. Nature neuroscience, 12(7), 913.
KurthNelson, Z., Economides, M., Dolan, Raymond J., and Dayan, P. (2016). Fast Sequences of Nonspatial State Representations in Humans. Neuron, 91(1), 194204.
Lee, A. K., and Wilson, M. A. (2002). Memory of sequential experience in the hippocampus during slow wave sleep. Neuron, 36(6), 11831194.
Liu, Y., Dolan, R. J., KurthNelson, Z., and Behrens, T. E. J. (2019). Human replay spontaneously reorganizes experience. Cell, 178(3), 640652.
Liu, Y., Mattar, M. G., Behrens, T. E. J., Daw, N. D., and Dolan, R. J. (2020). Experience replay supports nonlocal learning. bioRxiv. doi:10.1101/2020.10.20.343061
Maboudi, K., Ackermann, E., de Jong, L. W., Pfeiffer, B. E., Foster, D., Diba, K., and Kemere, C. (2018). Uncovering temporal structure in hippocampal output patterns. eLife, 7, e34467.
Nádasdy, Z., Hirase, H., Czurkó, A., Csicsvari, J., and Buzsáki, G. (1999). Replay and time compression of recurring spike sequences in the hippocampus. Journal of Neuroscience, 19(21), 94979507.
Ólafsdóttir, H. F., Carpenter, F., and Barry, C. (2016). Coordinated grid and place cell replay during rest. Nature neuroscience, 19(6), 792.
Schuck, N. W., and Niv, Y. (2019). Sequential replay of nonspatial task states in the human hippocampus. Science, 364(6447), eaaw5181.
Sun, C., Yang, W., Martin, J., and Tonegawa, S. (2020). Hippocampal neurons represent events as transferable units of experience. Nature neuroscience, 113.
Tingley, D., and Peyrache, A. (2020). On the methods for reactivation and replay analysis. Philosophical Transactions of the Royal Society B, 375(1799), 20190231.
Toft, P. A. (1996). The Radon transformtheory and implementation.
van der Meer, M. A., Kemere, C., and Diba, K. (2020). Progress and issues in secondorder analysis of hippocampal replay. Philosophical Transactions of the Royal Society B, 375(1799), 20190238.
Wimmer, G. E., Liu, Y., Vehar, N., Behrens, T. E. J., and Dolan, R. J. (2020). Episodic memory retrieval success is associated with rapid replay of episode content. Nature neuroscience.
Wittkuhn, L., and Schuck, N. W. (2020). Faster than thought: Detecting subsecond activation sequences with sequential fMRI pattern analysis. bioRxiv.
Zhang, K., Ginzburg, I., McNaughton, B. L., and Sejnowski, T. J. (1998). Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells. Journal of neurophysiology, 79(2), 10171044.
[Editors’ note: what follows is the authors’ response to the second round of review.]
While impressed by the novelty and potential utility of the method, the reviewers had a specific critical concern. Namely, do high sequenceness scores truly capture activation of patterns that widely span the specified sequence space (i.e., many complete "ABCDE" sequences) or only a collection of pairwise correlations (i.e., "AB", "DE", "BC")? Presenting more examples from experimental data that demonstrate the former was perceived as critical to demonstrate the utility of TDLM as an approach for "replay detection". Ultimately, the reviewers reached a consensus that in the current presentation, what TLDM actually detects remains opaque, and the impact of the work is diminished. We considered requesting an action plan, but upon reflection, I think that the main issue is one of semantics. However, if you wish to describe TDLM as a method for detecting "replay", it needs to be critically addressed.
In addition to the reviews below, in our discussion, one reviewer noted: "even though they now explain in more detail what they did to analyze theta sequences, the result (~12 Hz) is still seemingly at odds with the ~89 Hz repetition one would expect from the literature. I'm actually not sure this adds a whole lot to the paper so I think it would be better to just take this part out."
Thanks for the suggestion.
1. We have now carefully gone through the paper and ensure it is clear that we are using TDLM to measure the average replay strength over the whole measured time, rather than single replay events.
For example, in the introduction:
“Here we show TDLM is suited to measure the average amount of replay across many events (i.e., replay strength) in linear modelling. […] Applying TDLM on noninvasive neuroimaging data in humans, we, and others, have shown it is possible to measure the average sequenceness (propensity for replay) in spontaneous neural representations ^{14}.”
In the section on applying TDLM to rodent replay:
“Note, TDLM is applied directly to the concatenated rather than individual replay events. […] Because TDLM addresses statistical questions in linear modelling, it does not require secondary statistics to ask whether the “counts” of individual events are more likely than chance, or more likely in one situation than another.”
In the discussion:
“TDLM adds a new analysis toolkit to the replay field. It is especially suited for summarising replay strength across many events, for comparing replay strength between conditions, and for analysing replay strength in complex behavioural paradigms. […] Unlike alternative tools, we have not shown TDLM applied to individual replay events.”
2. Also following suggestion, we have deleted the section regarding single replay event analysis in both the main text and the supplementary note 5 (we assume this is what the reviewers suggested given the original supplementary figure 5 is replay analysis on human EEG data).
3. We have made it clear in the discussion that “TDLM is an addition to the analysis toolboxes of the field rather than a replacement”.
“TDLM adds a new analysis toolkit to the replay field. It is especially suited for summarising replay strength across many events, for comparing replay strength between conditions, and for analysing replay strength in complex behavioural paradigms. […] Unlike alternative tools, we have not shown TDLM applied to individual replay events.”
4. We have dropped the theta sequence analysis in the supplementary note 1 and have updated the figures accordingly.
Reviewer #1:
Overall, I find great value in the effort to provide researchers working with very different animal models and datasets a similar toolkit to apply and analyze reactivation and replay. But I also have significant concerns about the potential for these methods, if poorly understood and applied, to further confound the field. Fully understanding this paper and the described methods and its caveats is not easy for most experimentalists, yours truly included. I am concerned that investigators will misapply these tools and claim significant replay in instances where there is none. These concerns may be addressable by better diagnostics and related guidance for interpretation.
Nevertheless, an important caveat in the work is that it does not detect "replay" per se, but rather temporal biases in decoded activity. Thus I think the title should be amended. In some places, the authors describe this as "sequenceness", or "temporal delays" which are both preferable to "replay". Prior work (e.g. Skaggs and McNaughton) used a similar measure, but referred to it as "temporal bias". While this temporal bias is certainly related to replay and sequences, it is only an indirect measure of these, and more akin to "reactivation", as it's largely pairwise analyses. Clarity on such issues is particularly important in this area given the excessive ambiguity in terminology for the replay and reactivation phenomena.
My other major concern is that the analysis is rather opaque in an area where there is much need for transparency, especially considering the existing debates and controversy surrounding appropriate methodology and conclusions that can be drawn from it. For example, in most of the figures that the authors provide, it's unclear whether the sequenceness score is driven by one particular pair of stimuli, or equally so among most possible pairs. Perhaps a transition graph could be composed from the data, but I did not find this except in cartoon diagrams. I think it would be important for the authors to provide guidance regarding these details. A related question is whether these biased pairs typically appear in isolation, or as part of larger multiitem events? It's not clear if there is a straightforward way to extract this important information. Some sample events could be shown, but examples can also be cherrypicked and nonrepresentative. Probably a histogram of sequence bout lengths would be valuable.
Thank you for the great comments. The first two questions were the subject of a query we sent, which is copied below, and the response to this query is addressed above.
“Below is our take on the discussion. If you find this persuasive, then we are happy to do exactly as suggested by the reviewers on all other points. If you do not find this persuasive, then we are a bit stuck. So, we would like to understand the position of the reviewers before proceeding.
Here is what we think:
1. We are sorry if it seems opaque what the method is measuring. It is, in fact, mathematically defined, and we try to state it clearly here: We are measuring whether B is more likely to be activated at a certain time lag after A, compared to the average chance of B being activated in the rest of the data. We are testing against the null hypothesis that the frequency of B does not depend on the previous activation of A (Averaged across all pairs, AB, BC, CD ...).
2. If you want us to call the measure “sequenceness” instead of “replay” that is fine with us. However, we ask that you might consider the following points before coming to this conclusion (Note it is *not* the same as reactivation, as it does require a sequence of at least 2 elements).
The measure might be construed as sequenceness as opposed to replay because it only relies on AB, BC, CD as you say. However:
i. Importantly, every technique that we are aware of for detecting replay on the decoded state space has the same issue, or a more pronounced version of this issue. For example, the radon transform method will return a statistically significant result for “replay” if it detects AB (alone). It will also declare replay for AC alone, or AD alone or AE alone. We demonstrate this in supplementary figure 8. Any technique that asks, “Is there more replay than chance?” will return a positive for any of these situations. Nevertheless, they are described as “replay detection methods”.
ii.Like the radon transform, or linear correlation, or all other techniques, TDLM will return a stronger measure of replay if there is ABCD than if there is only AB. This is particularly true of multiscale TDLM.
iii. Uniquely, as far as we know TDLM does allow you to test for ABCD because, in the linear framework, it is possible to test for the interaction of AB, BC, CD. This asks “Does an AB pair precede a BC pair. Etc. We describe this in section “Multistep sequences”. Again, we can test this formally in the linear modelling framework. This section really *does* provide a method for detecting ABCD.
We therefore think, on this issue, there is no difference between TDLM and all other methods that have previously been described as detecting “Replay”, except that there is a setting in which TDLM will require the whole sequence, but there is not for other techniques.
However, there is a difference between what we have demonstrated with TDLM and other methods. Whilst it is conceptually possible, we have not demonstrated that TDLM can detect individual replay events. Instead, we have shown that TDLM can measure the average amount of replay across many events (for example all events in a sleep session, or in early sleep). We note, the sleep dataset is a real dataset in rodent hippocampal electrophysiology, it is from “Ólafsdóttir, H. F., Carpenter, F., & Barry, C. (2016). Coordinated grid and place cell replay during rest. Nature neuroscience”.
We’re afraid we are not going to change this. We believe that this is a more rigorous approach, more amenable to formal statistics and less likely to suffer the statistical problems that have plagued the replay literature to date. It does not require secondary statistics to ask whether the “Counts” of individual events are more likely than chance, or more likely in one situation than another (as highlighted by reviewer 2). We would therefore like to encourage the field to take this approach.
There is one other point we would like to raise. Reviewer 1 is concerned that we might further confound the experimental community with this method. We do not think this is true. Having developed a number of methods, this concern has been raised on many occasions. We have been advised that it is important to keep methods to the experts who understand them. We have instead found that by far the best way for a field to understand a method is for the authors to release it transparently to the experimental community, to try to support their work and to listen to their requests and criticisms. Then the strengths and weaknesses of a method become clear. We hope to do the same with this method.
Overall, therefore, we don’t think there is a strong reason, by comparison to other methods, for limiting the use of the word “replay”, but we are happy to do so if you disagree. We are happy to call it sequenceness. However, we are not going to analyse individual events.”
Part of the claimed value in these methods is their possible application to spike trains, e.g. from rodents, particularly for detecting and describing individual events. The authors claim that this is possible. However, while they analyze two rodent datasets in different parts, they do not apply it to any real data, but only on rather contrived (low noise) simulated data. This raises the concern that TDLM is not sufficiently sensitive for detecting individual events. The theta sequence analysis results shown in Supplementary Figure 3d are also concerning to me. They show a very noisy signal that passes threshold in one particular bin. If such a figure were a major result in a new manuscript, and the current eLife manuscript was cited as validation for the methods, would reviewers be obliged to accept it for publication? If not, what additional criterion or diagnostic steps would be recommended?
We hope that the changes in the manuscript demonstrate that we think there is value in an approach that measures replay across entire conditions and provides a natural framework for comparing between conditions, or between types of replays e.g., forward vs backward. We are now much clearer about how linear modelling can tackle these problems. See responses to Reviewer 2 below. We are also much clearer that we think that this method provides an additional tool, not replacing existing tools (see text above).
Comments for the authors:
P2, Line 32: "and" seems misplaced.
P2, Line 34: "periods of activity".
P3, Line 4: perhaps "single neuron" rather than "cellular".
P4, Lines 3435: it is not clear here what the authors mean by "over and above variance that can be explained by other states at the same time." It gets more clear later in page 11, section "Moving to multiple linear regression". The authors might consider either referring to that section at the end of the sentence or removing the unnecessary details that might confuse the reader at this juncture.
P5, Line 31: This sentence is a bit confusing. It is not clear what "in which" refers to. It might be better to start a new sentence for describing Zf and Zb.
P6, Lines 78: The authors might refer to the section "Correcting for multiple comparisons" on page 16, where more details are provided.
Thank you very much indeed, we have made the changes following your suggestions.
P8, Lines 4246: the description of abstraction may benefit from additional background and exposition.
P9, Line 1824, I found this entire paragraph very confusing.
Thanks very much indeed for these comments. This whole section was confusing and overly detailed. It was perhaps appropriate for the original paper where we were aiming predominantly at human researchers, but we agree it now adds confusion. We have dramatically shortened it (as pasted here), and referred to the supplement and the Liu et al. 2019 paper for details that may be important to a small minority of human researchers. We hope this new paragraph is clearer.
This paragraph now reads:
“As well as sequences of sensory representations, it is possible to search for replay of more abstract neural representations. […] One way that excludes the possibility of sensory contamination is if the structural representations can be shown to sequence before the subjects have ever seen their sensory correlates ^{4}.”
P10, Line 7: "that share the same abstract representation" is redundant.
P10, Line 12: "tested" should be corrected to test it.
P10, Lines 2324: Confusing sentence with multiple "whiches".
Thanks again, we have now deleted this whole section and replaced with the paragraph above.
P12, Line 24: Is it possible for the autocorrelation structure of Xi(t) to generate false positives for Xj(t+dt), or would this necessarily be entirely accounted for by Xj(t)?
We are not sure we understand this question. If any previous measurement from i predicts j(t+dt) better than all previous j(t), then this can’t be just because of autocorrelation (because the measurement we are predicting is j, not i). It needs to be (lagged) crosscorrelation. This lagged crosscorrelation is what we are trying to measure. Sorry if this does not quite get at your question, but we are struggling to understand.
P13, Lines 2324: How the regularization (penalizing for the norm of W) is supposed to make the model "trained on stimulusevoked data" generalize better to offtask/rest data? The authors might add a sentence or two at the end of the paragraph to make the aim more clear and motivate the next paragraphs in the section. Based on the descriptions in the first paragraph of Page 14, the regularization seems to add more sparsity to the estimated W that minimizes spatial correlations between the states, etc. Something similar to these descriptions could be used here.
Thanks, we have added a sentence at the end of the paragraph to make it clear. It now reads:
“A key parameter in training high dimensional decoding models is the degree of regularization. […] Here it has the added potential benefit of reducing spatial correlation between classifier weights.”
P13, Line 28: It is not exactly clear what authors mean by "the prior term"? Does it refer to the equation before adding any regularization or to some prior probability distribution over W? I think in this context, we should be cautious with using the words like prior, posterior, etc.
Thanks for the suggestion, we have now removed the “prior term”, and changed it to “regularization term”.
P16, Line 30, extra "as".
Thanks for pointing it out!
P17, section "Considerations on secondorder inferences". It seems that this section should be placed later in the manuscript, in the section "TDLM for rodent replay".
Thanks, we have made the change accordingly.
P23 Line 39, missing "that".
Thanks, we have added it in the next text.
Supplementary Note 5: what shuffle was used?
It is randomly shuffling the rate map. In the revised manuscript, we have deleted the single replay event analysis as per request.
Reviewer #2:
[…] There is one important area that the paper still falls short on, but that should be straightforward to address with some simple additional analysis and discussion. A distinctive strength of the GLM framework is that it easily accommodates testing important and ubiquitous "secondorder" questions, such as whether there is more forward than reverse replay, is replay strength changing over time, and so on. However, the message in the current version that one could simply subtract sequenceness scores doesn't address how one would formally test for a difference, or test for some factor like time being important. For forward vs. reverse, because this is fit to the same data, this is a comparison between different betas in the secondlevel GLM (Figure 1f). I am not a statistician, but my textbooks say there are a few ways of doing this, for instance z = (β_{fwd} – β_{rev}) / σ_(β_{fwd} – β_{rev}) where the crucial variance term can be estimated as the sqrt(σ(β_{fwd})^{2} + σ(β_{rev})^{2}), matching the intuition that a formal test requires an estimate of variance of the difference between the betas, not just the means.
For early vs. late sleep, things are more complicated because you are not comparing models/betas fit to the same data. I suppose the above approach could also work as long as all the betas are standardized, but wouldn't a better test be to include time as a regressor in the secondlevel GLM and formally test for an interaction between time and T_{fwd} (and T_{bwd})?
In either case, there are two important points to make to demonstrate the utility of the GLM framework for sequence analysis: (1) these ubiquitous secondorder inference questions require a bit more consideration than just saying you can subtract the sequenceness scores; there is another level of statistical inference here, and (2) the TDLM framework, unlike many other approaches, is in fact very well suited to doing just that – it benefits from the existing machinery of linear statistical modeling and associated model comparison tools to make such comparisons.
These points are not currently clear from the "Considerations on secondorder inferences" section. If in addition the GLM framework allows for the regressing out of some potential confounds that can come up in rodent data that the authors now reference that's an additional benefit but not the main point I am asking the authors to speak to.
We are delighted to know the reviewer appreciates our revision effort, and thanks again for all the insightful comments.
We agree that simply subtracting sequenceness scores does not constitute a formal statistical testing procedure. We also agree that we can rely on classical parametric tests, e.g., paired t test for comparison of replay strength between forward and backward replay, as long as there is no multiple comparison problem. If we do not have a replay speed of interest a priori, we will have to control for multiple comparisons (Figure 8A, right panel, pink line), as have discussed in section “Statistical inference”. The reviewer is also correct that we can directly model the time effect as TDLM is a linear framework. In the revised manuscript, we also used the test of “time varying effect on replay strength” as an example to illustrate how TDLM is approaching the secondorder questions.
New text is copied below:
“Second order inferences.
As pointed out by van der Meer, et al. ^{19}, there are two types of statistical questions: a "firstorder" sequence question, which concerns whether an observed sequenceness is different from random (i.e., do replays exist?); and a “secondorder” question, which requires a comparison of sequenceness across conditions (i.e., do replays differ?). […] There is no selection bias in performing statistics on the difference of sequence effects, or effects relating to time (green rectangle).”
References:
Wimmer, G. E., Liu, Y., Vehar, N., Behrens, T. E. J. and Dolan, R. J. Episodic memory retrieval success is associated with rapid replay of episode content. Nature Neuroscience 23, 1025–1033 (2020).
1. Nour, M. M., Liu, Y., Arumuham, A., KurthNelson, Z. and Dolan, R. Impaired neural replay of inferred relationships in schizophrenia. Cell in press (2021).
2. Liu, Y., Mattar, M. G., Behrens, T. E. J., Daw, N. D. and Dolan, R. J. Experience replay is associated with efficient nonlocal learning. Science in press (2021).
3. Liu, Y., Dolan, R. J., KurthNelson, Z. and Behrens, T. E. J. Human replay spontaneously reorganizes experience. Cell 178, 640652 (2019).
4. Fyhn, M., Hafting, T., Treves, A., Moser, M.B. and Moser, E. I. Hippocampal remapping and grid realignment in entorhinal cortex. Nature 446, 190 (2007).
5. Dehaene, S., Meyniel, F., Wacongne, C., Wang, L. and Pallier, C. The neural representation of sequences: from transition probabilities to algebraic patterns and linguistic trees. Neuron 88, 219 (2015).
6. Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. and Baker, C. I. Circular analysis in systems neuroscience: the dangers of double dipping. Nature neuroscience 12, 535 (2009).
7. Ólafsdóttir, H. F., Carpenter, F. and Barry, C. Coordinated grid and place cell replay during rest. Nature Neuroscience 19, 792 (2016).
8. Wilson, M. A. and McNaughton, B. L. Reactivation of hippocampal ensemble memories during sleep. Science 265, 676679 (1994).
9. Skaggs, W. E. and McNaughton, B. L. Replay of neuronal firing sequences in rat hippocampus during sleep following spatial experience. Science 271, 18701873 (1996).
10. Louie, K. and Wilson, M. A. Temporally structured replay of awake hippocampal ensemble activity during rapid eye movement sleep. Neuron 29, 145156 (2001).
11. Lee, A. K. and Wilson, M. A. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron 36, 11831194 (2002).
12. Foster, D. J. Replay comes of age. Annual Review of Neuroscience 40, 581602 (2017).
13. Ólafsdóttir, H. F., Bush, D. and Barry, C. The Role of Hippocampal Replay in Memory and Planning. Current Biology 28, R37R50 (2018).
14. Pfeiffer, B. E. The content of hippocampal “replay”. Hippocampus 30, 618 (2020).
15. Carr, M. F., Jadhav, S. P. and Frank, L. M. Hippocampal replay in the awake state: a potential substrate for memory consolidation and retrieval. Nature Neuroscience 14, 147 (2011).
16. Lisman, J. et al. Viewpoints: how the hippocampus contributes to memory, navigation and cognition. Nature Neuroscience 20, 14341447 (2017).
17. Davidson, T. J., Kloosterman, F. and Wilson, M. A. Hippocampal replay of extended experience. Neuron 63, 497507 (2009).
18. Grosmark, A. D. and Buzsáki, G. Diversity in neural firing dynamics supports both rigid and learned hippocampal sequences. Science 351, 14401443 (2016).
19. Maboudi, K. et al. Uncovering temporal structure in hippocampal output patterns. eLife 7, e34467 (2018).
20. van der Meer, M. A., Kemere, C. and Diba, K. Progress and issues in secondorder analysis of hippocampal replay. Philosophical Transactions of the Royal Society B: Biological Sciences 375, 20190238 (2020).
21. Tingley, D. and Peyrache, A. On the methods for reactivation and replay analysis. Philosophical Transactions of the Royal Society B: Biological Sciences 375, 20190231 (2020).
22. Rosenberg, M., Zhang, T., Perona, P. and Meister, M. Mice in a labyrinth: Rapid learning, sudden insight, and efficient exploration. bioRxiv (2021).
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
We shared the revised manuscript with the reviewers. After some discussion, it was concluded that the manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:
1. The reviewers were confused by the data in Figure 7e. We finally concluded that it was an attempt to explain how the regression was formed, but it took lots of back and forth. Given that this is a tools paper, there seems to be no reason why each analysis figure can't be backed with equations that identify the regressions being done, and the variables being regressed.
Thanks for the suggestion and the close reading. It is very valuable to know what is and what is not clear. We have now expanded the text and written the linear equation.
“Interactions
A second method for performing second order tests is to introduce them into the linear regression as interaction terms, and then perform inference on the regression weights for these interactions.[…] This has no effect on the regression coefficients of the interaction term but, by rendering the interaction approximately orthogonal to 𝑋_{k}(𝑡), it makes it possible to estimate the main effect and the interaction in the same regression.”
2. Figure 5 appears to be about analyzing the MEG data when events are detected. (Isn't TDLM an approach for measuring sequenceness over a population of events rather than finding single ones?)
Sorry for the misunderstanding. As with the rest of the manuscript Figure 5 is an average over likely reactivations. The difference is that to construct this average it gives each timepoint a score based on the instantaneous sequenceness. Some timepoints have low scores. Others have high scores. We then average the temporal frequency plot over those with high scores. We believe this is very much of the spirit of the rest of the paper. It creates one average measure for the whole timeseries. Not a measure for each event. We are sorry if this was confusing. We have amended in the text below.
Even though this is previously published work, the methods need significant expansion (see Point 1). The text refers to a section that appears to be missing? Here's the text: "We want to identify the time when a given sequence is very likely to unfold. We achieve this, by transforming from the space of states to the space of sequence events. This is the same computation as in the section "States as sequence events". " (Search for "sequence events" yields no results.) Perhaps this refers to Appendix 3, but the text there doesn't really help much.
Here is the expanded text.
“We want to identify the time when a given sequence is very likely to unfold, so we can construct averages of independent data over these times.[…] Note that although this method assigns a score for individual replay events as an intermediary variable, it results in an average measure across many events.”
3. One reviewer had previously asked "in most of the figures that the authors provide, it's unclear whether the sequenceness score is driven by one particular pair of stimuli, or equally so among most possible pairs". To clarify, it seems that the question is: if the model proposed is A→B→C→D, and the data are largely A→B, can that be detected? Or alternatively, can you give a proper way of comparing two models (in this case, A→B→C→D vs A→B)?
Thanks for the question. We realise now that there are two ways to interpret this question. In our query document, we gave an answer to the first interpretation (which we repeat below). However, maybe we got the interpretation wrong, so we also provide a new answer below. We are grateful to the reviewer for this as it has allowed us to discover something new about the rodent data (maybe others knew it already?).
Interpretation 1: Can we discern whether TDLM is measuring an average of short transitions (AB, BC, CD) or long sequences (ABCD).
Yes. This issue is dealt with in Appendix 1: Multistep sequences. As far as we know, other techniques do not do this. In TDLM, for a particular time lag, we can compute the interaction between A and B (AB) and ask if this new variable AB predicts C at the next time lag better than A and B alone, or by some other combination (XB). The procedure works for any length sequence (so includes ABCD). We have reread this section, and we believe it is clear, but if the reviewer has specific suggestions, we would be happy to incorporate them.
Interpretation 2: Can we discern whether the transitions that are contributing to the average are disproportionately some (e.g., AB), rather than others (CD)?
Yes. TDLM measures each transition independently, so by examining these transitions before averaging them, we can see what the underlying contributions are. To demonstrate this approach, we reexamine the rodent data. We show below (and in the revised paper), that forward replay disproportionately occurs at the beginning of the track (significantly), that this is not true of backward replay (which prefers the end of the track), and that the difference is significant.
We used the same ROI we have defined previously based on the significant replay speeds (forward + backward, cf. Figure 7d, right panel, green shading). For visualization purposes, we have plotted the estimated strength for each pairwise forward sequence (Figure 8A), separately within each scale (from 1 to 4, with increasing spatial scales). The pairwise sequences are ordered from the start of the maze to the end of the maze. Alongside the pairwise sequence plot, we have plotted the mean replay strength over all possible pairwise transitions (in red), in comparison to the mean of all control transitions (in grey. As expected, they are all around 0). Note that we cannot perform inference on the difference between the red and grey bars here because they have been selected from a biased ROI. It is simply for illustration purposes. We have therefore put them in red squares to match figure 7F.
It is evident from Figure 8 that the TDLM average is not dominated by a single transition. Many transitions contribute. However, it also appears that there is a tendency for replay strength to decrease going from the start of maze to the end of maze (especially in the larger scales, which measure bigger replay movements). To formally test this, we can adopt the linear contrast approach described in the last round of revisions.
Our previous text on linear contrast read as follows:
“If we want to test whether replay increases linearly over 5 conditions [A, B, C, D, E], we can compute the linear contrast 2*A – B + 0*C + D + 2*E, (which would be zero under the null hypothesis) and perform statistics on this new measure.”
Here, we can use this exact same approach to test a linear increase or decrease across different transitions. We take the linear contrast weight vector, 𝑐 ([2,1,0,1,2] for the largest scale, [3:3] for the next scale, [5:5] for the next scale, [12:12] for the smallest scale) and multiply these by the β estimates of the transitions:
𝑐𝑜𝑛𝑡𝑟𝑎𝑠𝑡 = 𝑐^{0}𝛽
If this new measure, 𝑐𝑜𝑛𝑡𝑟𝑎𝑠𝑡, is different from zero, then there is a linear increase/decrease from one end of the track to the other. Note that this new contrast is no longer biased by the ROI selection as each transition contributed equally to the ROI selection, but we are now comparing between transitions. Inference on this contrast is therefore valid. We have therefore put them in green boxes to match figure 7F (Figure 8B, C).
Within the larger two scales, these contrasts are significantly negative (tested against permutations in exactly the same way as the “mean” contrasts). Since we are still in the linear domain, we can now just average these contrasts across the 4 scales and get a single measure for spatial modulation of replay. This average measure is significantly negative (Figure 8B). Hence, on average, forward replay is stronger at the beginning of the track.
We can do the same thing for backward replay. We found a opposite pattern, i.e., strength of backward replay is stronger at the end of the track, and similarly, it is not significant in the smallest scale, and become significant in the largest scale, and also significant on average across all scales (Figure 8C). Again, since we are in the linear domain, we can further contrast these contrasts, asking if this effect is different for forwards and backward replay. We found the difference is indeed significant (Figure 8D).
Note we would like to stress again, that this analysis is not about a single replay event but is testing for average differences across all replay events.
We have added this point in the revised manuscript:
“In addition to the time varying effect, we can also test the spatial modulation effect, i.e., how replay strength (at the same replay speed) change as a function of its spatial content.[…] We found the difference is indeed significant (Figure 8d). Note we would like to stress again, that this analysis is not about a single replay event but is testing for average differences across all replay events.”
https://doi.org/10.7554/eLife.66917.sa2Article and author information
Author details
Funding
Wellcome (098362/Z/12/Z)
 Raymond J Dolan
Wellcome (104765/Z/14/Z)
 Timothy E Behrens
Wellcome (219525/Z/19/Z)
 Timothy E Behrens
James S. McDonnell Foundation (JSMF220020372)
 Timothy E Behrens
Wellcome (212281/Z/18/Z)
 Caswell Barry
Max Planck Society
 Yunzhe Liu
Wellcome (203139/Z/16/Z)
 Timothy E Behrens
Wellcome (091593/Z/10/Z)
 Raymond J Dolan
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Matthew A Wilson for help with rodent theta sequence analysis. We thank Elliott Wimmer and Toby Wise for helpful discussion and generous sharing of their data. We thank Matt Nour for helpful comments on a previous version of the manuscript. YL is also grateful for the unique opportunity provided by the Brains, Minds and Machines Summer Course. We acknowledge fundings from the Open Research Fund of the State Key Laboratory of Cognitive Neuroscience and Learning to YL, Wellcome Trust Investigator Award (098362/Z/12/Z) to RJD, Wellcome Trust Senior Research Fellowship (104765/Z/14/Z), and Principal Research Fellowship (219525/Z/19/Z), together with a James S McDonnell Foundation Award (JSMF220020372), to TEJB; and Wellcome Trust Senior Research Fellowship (212281/Z/18/Z) to CB. Both Wellcome Centres are supported by core funding from the Wellcome Trust: Wellcome Centre for Integrative Neuroimaging (203139/Z/16/Z), Wellcome Centre for Human Neuroimaging (091593/Z/10/Z). The Max Planck UCL Centre is a joint initiative supported by UCL and the Max Planck Society.
Ethics
Human subjects: The human dataset used in this study was reported in Liu et al 2019. All participants were recruited from the UCL Institute of Cognitive Neuroscience subject pool, had a normal or correctedtonormal vision, no history of psychiatric or neurological disorders, and had provided written informed consent prior to the start of the experiment, which was approved by the Research Ethics Committee at University College London (UK), under ethics number 9929/002.
Senior Editor
 Laura L Colgin, University of Texas at Austin, United States
Reviewing Editor
 Caleb Kemere, Rice University, United States
Reviewer
 Matthijs van der Meer
Publication history
 Received: January 26, 2021
 Accepted: June 6, 2021
 Accepted Manuscript published: June 7, 2021 (version 1)
 Version of Record published: July 28, 2021 (version 2)
Copyright
© 2021, Liu et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 3,187
 Page views

 618
 Downloads

 7
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
Artificial neural networks could pave the way for efficiently simulating largescale models of neuronal networks in the nervous system.

 Neuroscience
A key question in decision making is how humans arbitrate between competing learning and memory systems to maximize reward. We address this question by probing the balance between the effects, on choice, of incremental trialanderror learning versus episodic memories of individual events. Although a rich literature has studied incremental learning in isolation, the role of episodic memory in decision making has only recently drawn focus, and little research disentangles their separate contributions. We hypothesized that the brain arbitrates rationally between these two systems, relying on each in circumstances to which it is most suited, as indicated by uncertainty. We tested this hypothesis by directly contrasting contributions of episodic and incremental influence to decisions, while manipulating the relative uncertainty of incremental learning using a wellestablished manipulation of reward volatility. Across two large, independent samples of young adults, participants traded these influences off rationally, depending more on episodic information when incremental summaries were more uncertain. These results support the proposal that the brain optimizes the balance between different forms of learning and memory according to their relative uncertainties and elucidate the circumstances under which episodic memory informs decisions.