Abstract
Most approaches for assessing causality in complex dynamical systems fail when the interactions between variables are inherently non-linear and non-stationary. Here we introduce Temporal Autoencoders for Causal Inference (TACI), a methodology that combines a new surrogate data metric for assessing causal interactions with a novel two-headed machine learning architecture to identify and measure the direction and strength of time-varying causal interactions. Through tests on both synthetic and real-world datasets, we demonstrate TACI’s ability to accurately quantify dynamic causal interactions across a variety of systems. Our findings display the method’s effectiveness compared to existing approaches and also highlight our approach’s potential to build a deeper understanding of the mechanisms that underlie time-varying interactions in physical and biological systems.
Introduction
Rather than being static in time, interactions between parts of a complex system continuously ebb and flow, with one variable driving another at one point, only to see the relationship reverse, lessen, or disappear at a later point in time. Real-world signals are seldom stationary and well-behaved, and their causal linkages and interactions frequently appear, disappear, and reappear, possibly changing in strength over time. Examples of such systems abound in neuroscience [1, 2], ecology [3], finance [4–6], and climate [7–9].
Despite the ubiquity of these dynamically changing interactions, however, most methods for assessing causality in complex dynamical systems have difficulty measuring how the direction and extent of interactions between variables change in time. This difficulty arises from several inherent characteristics of complex systems and the limitations of existing causal assessment methodologies. Many of these methods, including Granger Causality (GC) [10], assume that the dynamical system is approximately stationary, meaning that its statistical properties do not change over time. Other common assumptions, such as linearity and time invariance, are also often violated in real-world complex systems. These constraints significantly limit the applicability and accuracy of these approaches in many scenarios [11–16].
In addition, in systems where variables are strongly coupled and synchronized, some of these causality inference methods struggle to accurately infer the coupling strength and direction of causality [16]. This issue extends to scenarios of intermediate coupling, where the variables are neither weakly nor strongly linked. Additionally, the presence of noise in the system leads to a decrease in cross-mapping fidelity, revealing further limitations [17, 18]. Although correlation is neither necessary nor sufficient to demonstrate causation [19, 20], it does play an important role in many statistical methods as the basis for hypothesis tests for causality. Mirage correlations can appear in even the simplest nonlinear systems [21]. Variables that are positively correlated at one point in time can become anti-correlated moments later or even lose all coherence. However, most causality methods do not adequately account for the fact that sudden changes in correlation between variables over time may indicate a change in the underlying causal relationships.
In an attempt to overcome some of these limitations, here we introduce a new methodology for probing time-varying causal interactions using a new metric for assessing causal interactions combined with a novel machine learning architecture for causal inference, which we call Temporal Autoencoders for Causal Inference (TACI). We show the method’s effectiveness on synthetic and real-world data sets, both in an absolute sense and in comparison to extant methods, focusing in particular on how to find time-varying causal structure in complex dynamical systems.
Overview of Methodology
In our methodology, we adopt a two-fold approach towards developing a causal inference method that accurately assesses causality between variables x(t) and y(t) in the Granger sense for nonlinear systems with time-varying interactions. The first aspect of our approach is to use a novel surrogate data comparison metric, the Comparative Surrogate Granger Index (CSGI), that measures the relative improvement in prediction accuracy when including both variables vs. one of them and a randomized version of the other. The other aspect is to use a two-headed Temporal Convolutional Network architecture to robustly capture the space of potential nonlinear mappings between variables across the entire time series. As will be observed, the CSGI with linear autoregressive models works well to identify causal interactions in situations where relatively straightforward mappings exist between variables, but the more complicated neural network model is more effective when the mappings are more strongly nonlinear.
Comparative Surrogate Granger Index (CSGI)
Informally, Granger Causality (GC) defines a causal interaction from x(t) to y(t) to be when knowing the full history of both x(t) and y(t) provides a better prediction about the future of y(t) than knowing just the history of y(t) alone. While there are many variations on this methodology [22–24], the typical form used is to compare two models of similar type (e.g., linear autoregressive models, feedforward neural networks, etc.) based on their ability to predict the future state of y(t). More explicitly, the comparison is between
and
Usually, an F-test is used to determine whether the latter model is preferred over the former.
This form, however, suffers from several limitations. First, the comparison is a binary one – the second model is “significantly” better than the first or it is not – thus, differences in the strength of coupling cannot be detected, just the presence or absence. Second, because the F-test and similar methods incorporate strong assumptions about the underlying dynamics of the system, statistical statements deriving from these tests are often not robust under resampling or re-parameterization. In addition, because the model complexities for the two models being compared are inevitably quite different, with one typically having twice as many parameters as the other, the F-test often fails to detect causal interactions properly.
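For orientation, the standard Granger comparison described above can be sketched in generic notation as follows. The functions f and g are the two predictive models named in the text and the residuals follow the ε notation used for the EGCI; the lag depth p is an assumption introduced here for illustration and is not taken from the original equations.

```latex
\[
\begin{aligned}
y(t+1) &= f\big(y(t),\, y(t-1),\, \ldots,\, y(t-p)\big) + \varepsilon_{y}(t), \\
y(t+1) &= g\big(x(t),\, \ldots,\, x(t-p),\; y(t),\, \ldots,\, y(t-p)\big) + \varepsilon_{xy}(t).
\end{aligned}
\]
```

In this reading, x(t) is said to Granger-cause y(t) when the second model predicts the future of y(t) significantly better than the first.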
A common strategy for ameliorating these limitations is to compare not
and
but rather
and
where x(s)(t) is a surrogate data set that shares similar statistical properties to x(t) but is shuffled in some manner (e.g., shuffling values to preserve the distribution of values or shuffling phases of the time series’ Fourier Transform to preserve the frequency profile), or even a completely randomized time series. Typically, this comparison is accomplished through the Extended Granger Causality Index (EGCI) [22, 25]. If εy(t) are the residuals for fitting the future of y(t) on the past of y(t) and εxy(t) are the residuals for fitting the future of y(t) on the pasts of both x(t) and y(t), then the EGCI is given by the relative reduction in residual variance when the past of x(t) is included in the model:
x(t) is thus said to cause y(t) if the EGCI using the actual values of x(t) is significantly higher than the EGCI found when substituting x(s)(t) for x(t).
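The two surrogate constructions mentioned above (shuffling values and shuffling Fourier phases) can be sketched as follows. This is a minimal NumPy illustration, not the surrogate generator used in this work; the function names are hypothetical.

```python
import numpy as np

def shuffle_surrogate(x, rng):
    """Randomly permute the samples: preserves the distribution of values,
    destroys temporal ordering."""
    return rng.permutation(np.asarray(x, dtype=float))

def phase_surrogate(x, rng):
    """Randomize Fourier phases: preserves the amplitude spectrum (frequency
    profile), destroys the phase relationships."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    randomized[0] = spectrum[0]        # keep the zero-frequency (mean) term
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]  # keep the real Nyquist term
    return np.fft.irfft(randomized, n=n)

rng = np.random.default_rng(0)
x = np.sin(0.1 * np.arange(256)) + 0.1 * rng.standard_normal(256)
x_shuffled = shuffle_surrogate(x, rng)
x_phase = phase_surrogate(x, rng)
```

Either surrogate can then stand in for x(t) in the comparison; which one is appropriate depends on which statistical properties of x(t) should be held fixed.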
Our approach attempts to assess directly the relative increase in variance explained for the predictive model when using x(t) vs. x(s)(t). Specifically, we compare the fraction of variance explained about the future of y(t) using the pasts of x(t) and y(t) in the model with the fraction of variance explained using x(s)(t) and y(t), and we define the Comparative Surrogate Granger Index (CSGI), χx→y, via
This metric’s advantages over the EGCI are that it is able to measure small changes in causal interactions and that it explicitly measures the difference in predictive power between using actual data and using surrogate data to predict the future. In this article, we will be measuring χx→y and χy→x for all pairs of variables to assess whether there is causal coupling between two variables, whether it is uni- or bi-directional, and the relative strength of the coupling.
Temporal Autoencoders for Causal Inference
While one advance in our methodology is the use of the CSGI in the previous section, the other novel contribution is the use of a new artificial neural network architecture to calculate the functions f and g that are used to predict the future of y(t). The original (and still most common) models [10] for f and g are linear autoregressive models of the form
where x(s)(t) can be substituted for x(t) when using the surrogate approach. While this relatively simple approach shows impressive performance in a variety of scenarios, these models fail to accurately predict known causal interactions for systems that have weak to moderate coupling and are governed by nonlinear dynamics that are not well approximated by linear models [26, 27]. This failure typically occurs because the systems do not satisfy separability, the requirement that all information about a causative factor be unique to that variable, so that it can be removed by omitting that variable from the model, as is the case for purely stochastic or linear systems [16]. For systems with strongly nonlinear deterministic components, however, this assumption fails, and, accordingly, so do the predictions from linear autoregressive GC [13, 27]. In addition, because these linear models have difficulty predicting information across multiple timescales, they often have difficulty detecting subtle shifts in causality as a function of time.
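To make the linear surrogate comparison concrete, the sketch below fits a linear autoregressive model by least squares, once with the true x(t) and once with a shuffled surrogate, and reports the fraction of variance explained in each case. The model order, the toy coupled process, and all function names are assumptions for illustration; the fit is evaluated in-sample for brevity, and the final difference is a simple contrast rather than the CSGI of Eqn. (4).

```python
import numpy as np

def lagged_design(series_list, order):
    """Columns are s(t-1), ..., s(t-order) for each series, for t = order..n-1."""
    n = len(series_list[0])
    cols = [s[order - k - 1 : n - k - 1] for s in series_list for k in range(order)]
    return np.column_stack(cols)

def r_squared(target, predictors, order=2):
    """In-sample fraction of variance of target(t) explained by lagged predictors."""
    design = lagged_design(predictors, order)
    design = np.column_stack([np.ones(len(design)), design])  # intercept
    y = target[order:]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return 1.0 - np.var(y - design @ coef) / np.var(y)

# Toy system (assumed): x drives y, y does not drive x.
rng = np.random.default_rng(1)
n = 5000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

x_surr, y_surr = rng.permutation(x), rng.permutation(y)
gain_x_to_y = r_squared(y, [x, y]) - r_squared(y, [x_surr, y])
gain_y_to_x = r_squared(x, [y, x]) - r_squared(x, [y_surr, x])
print(f"x->y gain: {gain_x_to_y:.3f}, y->x gain: {gain_y_to_x:.3f}")
```

In this toy case the gain from using the true x(t) is clearly positive in the x → y direction and close to zero in the reverse direction, which is the qualitative behavior the surrogate comparison is designed to expose.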
In recent years, a solution has been to replace the linear model in (5) with deep neural networks of varying architectures that, due to their expressive nature, excel in approximating complex functions [28]. These methods include the use of Variational Autoencoders to estimate causal effects [29], Causal Generative Neural Networks to learn functional causal models [30], Neural Granger to estimate nonlinear dependencies based on Granger causality principles [24], and the Temporal Causal Discovery Framework (TCDF) to address time-delayed causal relationships [31]. These methods, however, are often unwieldy to train, are prone to overfitting, and are susceptible to inaccuracies in the presence of a significant amount of noise.
In this paper, we introduce a novel neural network architecture for causality using a two-headed Temporal Convolutional Network (TCN) autoencoder (Fig. 1). TCNs, the primary building block of our approach, are a specialized type of neural network that builds temporal causality into convolutional architectures, so that each output depends only on current and past inputs. First introduced for video-based action segmentation [32], TCNs quickly became popular due to their ability to extend most of the convolutional advantages of regular CNNs – including sparsity and translational equivariance – into the time domain through a series of dilated and causal convolutional layers. Key characteristics of this framework are its simplicity, relatively long memory, and ability to outperform most convolutional architectures in autoregressive prediction tasks [33].
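The dilated, causal convolution at the heart of a TCN can be illustrated in a few lines of NumPy. This is a didactic sketch of a single one-channel layer, not the layer implementation used in TACI; the function names are hypothetical.

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x as zero before t = 0.
    The output at time t never depends on inputs later than t."""
    x = np.asarray(x, dtype=float)
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # left padding only
    y = np.zeros_like(x)
    for k, weight in enumerate(kernel):
        start = pad - k * dilation
        y += weight * padded[start : start + len(x)]
    return y

def receptive_field(kernel_size, dilations):
    """Number of past time steps visible after stacking one layer per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)

impulse = np.zeros(8)
impulse[0] = 1.0
response = causal_dilated_conv(impulse, [1.0, 2.0, 3.0], dilation=2)
```

Stacking layers with exponentially growing dilations (for example 1, 2, 4, 8) is what gives TCNs a long memory with few parameters: with a kernel size of 2, four such layers already see 16 time steps.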
Our approach, which we call Temporal Autoencoders for Causal Inference (TACI), is a neural network that consists of a two-headed TCN autoencoder, where two TCNs are used to encode time series x(t) and y(t), and a third is used for decoding an equivalently long time series describing the future trajectory of y(t) (shifted by some time, τ) from a relatively low-dimensional latent space that is derived from the outputs of the first two encoders. A more detailed description of our model and our training methodology can be found in Materials and methods. Code is available here: https://github.com/bermanlabemory/Temporal-Autoencoders-For-Causal-Inference-TACI.
For each comparison of interest, we train four versions of this network: one using x(t) and y(t) as input time series to predict the future of y(t), another that is the same except for replacing x(t) with the surrogate data x(s)(t), and another pair of networks that are structured the same except with x and y reversed in each case. Given these four trained networks, we can then make predictions for the future of the appropriate variable and calculate the fraction of variance explained over a moving window. From these values, we can then apply Eqn. (4) to calculate the CSGI values χx→y and χy→x, which we will use to assess causal interactions between these two variables.
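The moving-window fraction of variance explained can be sketched as below. The window length, the step, and the function name are assumptions for illustration; the resulting curves for the true and surrogate networks would then be combined through Eqn. (4), which is not reproduced in this sketch.

```python
import numpy as np

def windowed_explained_variance(actual, predicted, window, step=1):
    """Fraction of variance explained, 1 - Var(actual - predicted) / Var(actual),
    computed over consecutive windows of `window` samples."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    values = []
    for start in range(0, len(actual) - window + 1, step):
        a = actual[start : start + window]
        p = predicted[start : start + window]
        values.append(1.0 - np.var(a - p) / np.var(a))  # assumes Var(a) > 0
    return np.array(values)

rng = np.random.default_rng(2)
signal = rng.standard_normal(200)
perfect = windowed_explained_variance(signal, signal, window=50, step=10)
constant = windowed_explained_variance(signal, np.zeros(200), window=50, step=10)
```

A perfect prediction gives a value of one in every window, whereas a constant prediction gives zero, so the curve can be read directly as a time-resolved measure of predictive power.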
Other Methods We Compare Against
Alongside comparing against GC with linear autoregressive models, which we will refer to here as Surrogate Linear Granger Causality (SLGC), we will also test against two other commonly used methods: Convergent Cross Mapping and Transfer Entropy.
Convergent Cross Mapping (CCM)
Convergent Cross Mapping (CCM) was introduced to determine causation in systems that could be modeled as relatively noiseless deterministic dynamical systems [16]. The core concept of this approach is that according to Takens’ Embedding Theorem, if x(t) and y(t) are two variables of a deterministic dynamical system, one can reconstruct x(t) from a delay embedding of y(t) if and only if the time derivative of x(t) explicitly depends on y(t) [34–36]. Thus, if it is possible to predict x(t) from an embedding of y(t) alone, we would say that y has a causal interaction with x. Practically, these predictions are calculated by predicting x(t) from y(t) (and vice versa) and computing the correlation coefficient between the actual and predicted values [16], with correlations near one implying a strong causal influence and correlations near zero implying little or no influence. Here, we used Scikit Convergent Cross Mapping (skccm), a Python library implementing CCM for causal discovery [37].
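The cross-mapping step can be sketched in plain NumPy as follows. This is a simplified nearest-neighbor illustration, not the skccm API and not the exact procedure used here; the embedding dimension, lag, exponential weighting, and function names are all assumptions.

```python
import numpy as np

def delay_embed(series, dim, lag):
    """Rows are [s(t), s(t - lag), ..., s(t - (dim - 1) * lag)] for every valid t."""
    series = np.asarray(series, dtype=float)
    start = (dim - 1) * lag
    n = len(series)
    return np.column_stack([series[start - k * lag : n - k * lag] for k in range(dim)])

def cross_map_skill(source, target, dim=2, lag=1):
    """Correlation between `target` and its estimate built from the nearest
    neighbors of each point on the delay embedding of `source`."""
    manifold = delay_embed(source, dim, lag)
    aligned = np.asarray(target, dtype=float)[(dim - 1) * lag :]
    dists = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)                     # exclude each point itself
    neighbors = np.argsort(dists, axis=1)[:, : dim + 1]
    nearest = np.take_along_axis(dists, neighbors, axis=1)
    weights = np.exp(-nearest / np.maximum(nearest[:, :1], 1e-12))
    weights /= weights.sum(axis=1, keepdims=True)
    estimate = (weights * aligned[neighbors]).sum(axis=1)
    return np.corrcoef(aligned, estimate)[0, 1]

# Toy check on a chaotic logistic map (assumed example, not from this study).
x = np.empty(500)
x[0] = 0.4
for t in range(499):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])
noise = np.random.default_rng(3).standard_normal(500)
skill_self = cross_map_skill(x, x)
skill_noise = cross_map_skill(x, noise)
```

The full pairwise distance matrix makes this sketch quadratic in memory, so it is only suitable for short series; in practice the skill is also tracked as a function of library length to check for convergence.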
Transfer Entropy
Transfer entropy (TE) is a metric that quantifies a reduction in uncertainty in predicting the future of one variable given the past of another using formalism from information theory [38]. Specifically, we can measure the transfer entropy from y(t) to x(t) at a given distance in the future, τ, (TY→X (τ)) via
where H(X|Y) is the Shannon entropy of the conditional probability distribution p(X|Y). This quantity is zero if adding information about the past of y(t) results in no reduction in uncertainty about the future of x(t), and if it is non-zero, the quantity can be interpreted as the rate of information flowing from Y to X. Practically, we calculate TE for our systems using the Java Information Dynamics Toolkit (JIDT or Infodynamics Toolkit) [39].
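For discrete-valued series, transfer entropy can be estimated directly from counts. The sketch below uses a plug-in estimator with a history length of one sample, which is an assumption made for brevity; it is not the estimator provided by JIDT, and the function names are hypothetical.

```python
import numpy as np
from collections import Counter

def joint_entropy(*columns):
    """Plug-in Shannon entropy (in bits) of the joint distribution of discrete columns."""
    counts = np.array(list(Counter(zip(*columns)).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def transfer_entropy(source, target, tau=1):
    """H(target_future | target_past) - H(target_future | target_past, source_past),
    using a single past sample of each series and a prediction horizon of tau."""
    future, past_t, past_s = target[tau:], target[:-tau], source[:-tau]
    h_given_self = joint_entropy(future, past_t) - joint_entropy(past_t)
    h_given_both = joint_entropy(future, past_t, past_s) - joint_entropy(past_t, past_s)
    return h_given_self - h_given_both

# Toy example (assumed): y is a one-step delayed copy of a random binary x.
rng = np.random.default_rng(4)
x = rng.integers(0, 2, size=5000)
y = np.roll(x, 1)
y[0] = 0
te_x_to_y = transfer_entropy(x, y)
te_y_to_x = transfer_entropy(y, x)
```

Here the past of x removes essentially all uncertainty about the next value of y, giving about one bit of transfer, while the reverse direction is close to zero. Continuous-valued data would require binning or a different estimator.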
Results
Artificial Test Systems
To test the validity of our approach, we applied the methodology to a variety of different deterministic and stochastic dynamical system models with known causal interactions, finding that TACI performs well across all cases. In particular, we are interested in cases where the coupling changes in time, which we will explore in detail for the Coupled Hénon Maps system.
The Rössler-Lorenz System
Our first example case is a system of coupled chaotic attractors, where the Lorenz system [40] is driven by a Rössler oscillator [41]:
where the constant C controls the coupling strength of the system, and the driving severely distorts the behavior of the Lorenz attractor as C increases [42]. Synchronization between the two systems starts near C = 2.14, making the two systems’ behavior effectively coupled above this point despite the lack of an explicit coupling term, which makes traditional formal notions of causality ill-posed (Fig. 2A) [43]. The solutions to the differential equations were generated using a fourth-order Runge-Kutta method. For each value of C between 0 and 5, we computed a time series of length 300,000 (dt = 0.1) after a burn-in time of 30,000 time points.
As seen in Figure 2B-E, TACI is the only method of the four tried here that accurately predicts the unidirectional coupling from the Rössler system to the Lorenz system. SLGC (Fig. 2B) fails to predict any coupling whatsoever between the variables, and CCM and TE (Figs. 2C-D) predict bidirectional coupling (albeit with somewhat more information flowing in one direction than in the reverse). TACI, in contrast, predicts only unidirectional coupling until the point of synchronization (C ≈ 2.14), after which it predicts no effective causation in either direction.
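A generic fourth-order Runge-Kutta integrator with a burn-in period can be sketched as below. The example applies it to a simple linear decay equation rather than to the Rössler-Lorenz system, whose equations and parameter values are not reproduced in this sketch; the function names are hypothetical.

```python
import numpy as np

def rk4_step(f, state, t, dt):
    """One classical fourth-order Runge-Kutta step for d(state)/dt = f(t, state)."""
    k1 = f(t, state)
    k2 = f(t + dt / 2.0, state + dt / 2.0 * k1)
    k3 = f(t + dt / 2.0, state + dt / 2.0 * k2)
    k4 = f(t + dt, state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, state0, dt, n_steps, burn_in=0):
    """Integrate for burn_in + n_steps steps and return only the last n_steps states."""
    state = np.asarray(state0, dtype=float)
    trajectory = np.empty((n_steps, state.size))
    for i in range(burn_in + n_steps):
        state = rk4_step(f, state, i * dt, dt)
        if i >= burn_in:
            trajectory[i - burn_in] = state
    return trajectory

decay = lambda t, s: -s                    # dx/dt = -x, exact solution exp(-t)
path = integrate(decay, [1.0], dt=0.01, n_steps=100)
late = integrate(decay, [1.0], dt=0.01, n_steps=50, burn_in=50)
```

For a coupled system, `f` would return the concatenated derivatives of the driver and response variables, with the coupling strength C passed in through a closure.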
Coupled Bi-directional Two-Species Model
In contrast to the Rössler-Lorenz System, the bidirectional two-species model [44] is calculated in discrete time, and it exhibits (unsurprisingly) bi-directional coupling:
where C once again is the coupling strength, noting that the coupling strength is five times larger from x → y than in the reverse direction. In this system, separability is not satisfied (i.e., information about y is redundantly present in x and vice versa). Although this model is deterministic and dynamically coupled, it shows alternating periods of positive, negative, and zero correlation [16]. For values of C ∈ [0, 0.35], we created a bivariate time series of length 300,000 (after a burn-in of 30,000 time points). The initial conditions were drawn at random from the uniform distribution on (0.01, 0.99).
Applying the four methods to these data (Fig. 3), we find that both TE and TACI correctly identify both the bi-directional aspect of the coupling and the increased causal link from x → y compared to y → x. CCM identifies the bi-directionality correctly, but it does not identify the relative strength of the couplings, and SLGC is unable to identify any causal link from y → x.
Coupled Autoregressive Models
Coupled autoregressive models are an extension of basic autoregressive models, intended to represent the dynamics of systems where multiple time series influence each other. In these models, the value of a variable at a given time point is not only a function of its own previous values but also depends on the past values of other variables in the system. Here, we study the following system consisting of two bidirectionally coupled autoregressive processes of the first order:
where C is the strength of the coupling between x and y, and εx(t) and εy(t) are drawn from a normal (Gaussian) distribution with a mean of 0. Higher values of C represent stronger couplings from x → y, and for C = 0, the system is unidirectional (only the past of y has an impact on the future of x). We examined values of C ∈ [0, 0.6] and created sets of bivariate time series of length L = 300,000 for each value of C (after a burn-in time of 30,000 points). The initial conditions of the system were generated from the normal distribution with zero mean and unit variance.
Fig. 4 shows that SLGC does very well at identifying the onset of bi-directionality for C > 0, with the coupling of x → y monotonically increasing with C. This fact is perhaps not surprising, as SLGC is based on precisely such linear systems. TACI also does a comparable job at detecting bi-directionality, even roughly predicting the switchover between x → y and y → x coupling strengths at C = 0.5. CCM, however, does not predict any coupling from y → x at C = 0, and TE does not predict any significant coupling from y → x across all values of C.
Coupled Hénon Maps
Our last stationary example, the Hénon map, is a well-known discrete-time dynamical system that exhibits chaotic behavior. It was first developed as a simplified version of the Poincaré map of the Lorenz model [44], and in its chaotic regime, it is characterized by an attractor with a warped horseshoe shape. Here we consider a case of two Hénon maps, x and y, with unidirectional coupling [45]:
where C controls the strength of the coupling from x to y. For coupling strengths above C = 0.65, the systems start to show evidence of intermittent synchronization. This on-off behavior becomes a fully synchronized state for C > 0.7 [46]. For C ∈ [0, 0.9], we generated sequences of length 300,000 (after a burn-in period of 30,000) and analyzed data from x1 and y1 for each of the methods. TACI is the only method of the four that correctly identifies the unidirectional coupling from x to y (and not from y to x). SLGC comes very close, but the reverse coupling it reports is statistically significantly different from zero at intermediate values of C. TE and CCM both predict bi-directional interactions, albeit with weaker coupling in one direction than in the other.
Non-stationary Coupled Hénon Maps
TACI is the only method that performed well across all four artificial test cases, but the challenge remains as to whether it can identify patterns in data that change over time. To test this idea, we generated time series from the coupled Hénon maps in (10) but with time-varying couplings, Cxy(t) and Cyx(t):
Here, the two coupling terms are similar to the coupling term, C, in (10) but with time-varying values and potentially allowing for coupling from y to x.
We performed four different tests to see how TACI performs when causal interactions alter with time: (i) setting Cyx(t) = 0 and toggling Cxy (t) between 0 and 0.6, (ii) initially setting Cyx(t) = 0 and Cxy (t) = 0.6 and then switching the two half-way through the run, (iii) setting Cyx (t) = 0 and toggling Cxy(t) between 0 and 0.6 but with pulses of Cxy (t) = 0.6 being set to different time widths, and (iv) setting Cyx (t) = 0 and stepping Cxy (t) from 0 to 0.4 and back down again in steps of 0.1.
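The piecewise-constant coupling protocols in tests (i), (ii), and (iv) can be sketched as simple schedule arrays. The block lengths and function names below are hypothetical, since the durations are not specified in the text; test (iii) would use the same idea with blocks of unequal length.

```python
import numpy as np

def toggle_schedule(n, block, high=0.6, low=0.0):
    """Test (i): alternate between `low` and `high` every `block` samples, starting low."""
    return np.where((np.arange(n) // block) % 2 == 1, high, low)

def switch_schedule(n, first, second):
    """Test (ii): hold `first` for the first half of the run, then `second`."""
    schedule = np.full(n, second, dtype=float)
    schedule[: n // 2] = first
    return schedule

def staircase_schedule(block, top=0.4, step=0.1):
    """Test (iv): step from 0 up to `top` and back down, `block` samples per level."""
    up = np.round(np.arange(0.0, top + step / 2.0, step), 10)
    levels = np.concatenate([up, up[-2::-1]])
    return np.repeat(levels, block)

c_xy_toggle = toggle_schedule(6, block=2)
c_xy_switch = switch_schedule(10, first=0.6, second=0.0)
c_yx_switch = switch_schedule(10, first=0.0, second=0.6)
c_xy_stairs = staircase_schedule(block=2)
```

Each array gives the value of Cxy(t) or Cyx(t) at every time step and can be indexed directly inside the map iteration.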
Other than the coupling changes, all time series were generated in an identical manner to the previous section. It is important to note that the network for TACI was only trained once on the entire time series, not specifically for each testing window. Thus, by creating a robust model, our network is able to identify complex causal dynamics that change in time without having to constantly fit new models, as would be the case for SLGC, CCM, and TE.
In Fig. 6, we show that TACI performs well in the first three of these scenarios, ably identifying when causal interactions are eliminated, as well as when Cyx(t) and Cxy(t) switch. In addition, Fig. 7 shows that the TACI network is able to identify how coupling strengths change with time.
Summary of Results on Artificial Test Systems
Among the methods tested, only TACI is able to robustly infer known causal interactions between variables without incorrectly predicting non-existent interactions. TACI consistently differentiates between unidirectional and bidirectional coupling in low, moderate, and strong settings. Additionally, it accurately detects instances when the time series become synchronized in all tested scenarios. TACI excels in identifying complex causal dynamics that evolve over time, such as those observed in pulse systems with time-varying coupling. Given these successes in artificial systems, we will now apply the method to two real-world examples.
Jena Climate Dataset
The first data set we will test our model on is the “Jena Climate Dataset”, a detailed collection of weather measurements recorded by the Max Planck Institute for Biogeochemistry from a weather station located in Jena, Germany [47]. The dataset spans nearly eight years – from January 10, 2009, to December 31, 2016 – and includes 14 distinct meteorological features recorded every 10 minutes. These features include a wide range of atmospheric conditions, from temperature to relative humidity to vapor pressure deficit (see Table I for details). Several example time series are shown in Fig. 8.
A key advantage of these data is that some of the interactions are known already due to empirical models of atmospheric dynamics, providing a good test case for our method on real data. One example is the relationship between relative humidity (RH), the dew point (Tdew), and the temperature (T), which is given by
where Tdew and T are in degrees Celsius and RH is a percentage [48]. Calculating the partial derivative of RH with respect to T (keeping Tdew fixed), we find that we should expect stronger interactions to occur from T to RH at lower temperatures (Fig. 9A). After training our TACI model from each of the variables in the data set onto T, we indeed find that causal interactions peak during epochs when the temperature drops (Fig. 9B), showing that our method can accurately find temporal variations in causal interactions in messy real-world data.
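The temperature dependence of this sensitivity can be checked numerically. The sketch below assumes the widely used August-Roche-Magnus approximation with coefficients 17.625 and 243.04 °C; both the functional form and the coefficients are assumptions here and may differ from the relation used in this work.

```python
import numpy as np

A, B = 17.625, 243.04  # assumed Magnus coefficients (B in degrees Celsius)

def relative_humidity(t_air, t_dew):
    """Relative humidity (%) from air temperature and dew point, both in Celsius."""
    return 100.0 * np.exp(A * t_dew / (B + t_dew)) / np.exp(A * t_air / (B + t_air))

def d_rh_d_t(t_air, t_dew):
    """Analytic partial derivative of RH with respect to T at fixed dew point."""
    return -relative_humidity(t_air, t_dew) * A * B / (B + t_air) ** 2

cold = d_rh_d_t(0.0, -5.0)   # sensitivity at T = 0 C
warm = d_rh_d_t(30.0, -5.0)  # sensitivity at T = 30 C
print(f"dRH/dT at 0 C: {cold:.2f} %/C, at 30 C: {warm:.2f} %/C")
```

Under this approximation, and for a fixed dew point, the magnitude of the derivative is several times larger at 0 °C than at 30 °C, consistent with the expectation of stronger T to RH interactions at lower temperatures.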
Electrocorticography in Non-Human Primates
Lastly, we used electrocorticography (ECoG) data from non-human primates to test whether our methodology can detect time-varying interactions between brain regions from these electrophysiological signals. These data exhibit extraordinarily complex dynamics that shift in time as an animal changes its state: from sleep to wake, from satiated to hungry, from attending to one object to attending to another, and so on [2]. These alterations are often subtle, and, thus, understanding how different regions of the brain drive one another’s activity requires a method that can detect how slight variations in the relationship between variables lead to changing interactions across time.
Here, we analyzed publicly available ECoG data from a single monkey (Macaca fuscata) [49–51]. These recordings consisted of 128 channels of data that recorded activity from a hemisphere of the monkey’s brain that covered the visual, temporal, parietal, motor, prefrontal, and somatosensory cortices, sampling at 1 kHz (details can be found in [49]). Data were collected during both awake and anesthetized states to examine neural activity across different consciousness levels. To generate an anesthetized state, the monkey was chair-restrained and propofol was injected intravenously. The recording sessions were structured into four distinct phases: an initial phase where the monkey was awake with eyes open, a subsequent phase where the monkey was awake but with its eyes covered, a phase where the monkey was under deep anesthesia, induced to reach a state of loss of consciousness, and a final stage where the monkey recovered from anesthesia with its eyes covered. The depth of anesthesia was assessed by monitoring the monkey’s responsiveness to tactile stimulation and the presence of slow wave oscillations in the ECoG signal [49].
Previous studies have analyzed these data for changes in causal interactions using Spectral Granger Causality [49] or CCM [50], but each was only able to analyze data at the level of the four phases described in the previous paragraph (each requiring training a separate model, as well). Here, we trained TACI on data from one monkey (George in [51]) with a sequence length of 50 to account for the extended autocorrelation time observed in the time series (average of 53). Approximately 53 minutes of data corresponding to the four previously outlined phases were utilized for this purpose. The training was conducted over 300 epochs or until the point of convergence. Further details of the parameters used can be found in Table II.
Finally, to compare with these previous studies, while we calculated the causal interactions between each pair of electrodes, we will present many of the results as the average result between pairs of electrodes assigned to the same region of the cortex. Here, we will be using the eight coarse-grained regions defined in [50]: the medial prefrontal cortex (mPFC), lateral prefrontal cortex (lPFC), pre-motor cortex (PMC), motor and somatosensory cortex (MSC), temporal cortex (TC), parietal cortex (PC), higher visual cortex (HVC), and lower visual cortex (LVC).
Fig. 10 shows time-averaged values of correlation (A), TACI-derived causal interactions (B), and Directionality (C), which we define as the difference in CSGI values in one direction vs. the other, for epochs of time before, during, and after anesthetization. For correlation, we measure the average Pearson correlation coefficient between all the electrodes assigned to the various regions. Note that the diagonal terms do not necessarily have to be equal to one here, as electrodes within a region are not perfectly correlated with one another. There are only minimal changes in brain region interactions across the three time windows when measuring correlation, but large differences emerge when analyzing the data using TACI. Specifically, we see that almost all interactions disappear during the anesthetized period, with the interactions beginning to re-emerge during the recovery period. These results differ from the CCM results in [50], which reported that while some interactions decreased, others strengthened (this effect is seen in our Directionality measurements, however). Also interesting are the nearly vertical lines in Fig. 10B, implying that certain regions like the mPFC might be affected broadly by signals from various parts of the cortex, a finding that agrees with the commonly held notion that the mPFC’s role often involves higher-level cognitive function [52]. Again, it should be noted that only one TACI network was trained per pair of interactions across all time epochs, unlike the other methods we describe, which must find interactions separately during each measurement period.
Lastly, taking advantage of the aforementioned property of TACI, we took a finer-grained look at how interactions between a pair of regions might change with time during the experiment, specifically the mPFC and the PC. In Fig. 11, we show how these regions’ interactions alter with time. Using our approach, we observe how the coupling slowly decays upon administration of the propofol and how it rapidly increases a few minutes into the recovery period. Also interesting is that while during the awake periods, PC consistently has a causal interaction towards mPFC, the reverse interaction has significant temporal fluctuations whose study might lead to insights into how these brain regions drive each other during cognitive tasks.
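The region-level averaging and the Directionality measure can be sketched as follows. The matrix layout (rows as sources, columns as targets), the exclusion of self-pairs from within-region averages, and the function names are assumptions for illustration.

```python
import numpy as np

def region_average(pairwise, labels):
    """Average pairwise[i, j] (electrode i -> electrode j) over all electrode pairs
    belonging to each (source region, target region) combination."""
    labels = np.asarray(labels)
    regions = sorted(set(labels.tolist()))
    averaged = np.full((len(regions), len(regions)), np.nan)
    for a, source_region in enumerate(regions):
        for b, target_region in enumerate(regions):
            block = pairwise[np.ix_(labels == source_region, labels == target_region)]
            if source_region == target_region:
                block = block[~np.eye(block.shape[0], dtype=bool)]  # drop self-pairs
            if block.size:
                averaged[a, b] = block.mean()
    return regions, averaged

def directionality(chi):
    """Difference between the index in one direction and in the reverse direction."""
    return chi - chi.T

chi_pairs = np.arange(16, dtype=float).reshape(4, 4)   # toy 4-electrode example
regions, chi_regions = region_average(chi_pairs, ["A", "A", "B", "B"])
direction = directionality(chi_regions)
```

By construction the Directionality matrix is antisymmetric, so a positive entry for one ordered pair of regions implies an equal and opposite entry for the reverse pair.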
Discussion
In this article, we introduce a new methodology for probing time-varying causal interactions in complex dynamical systems using a novel machine learning architecture for causal inference, Temporal Autoencoders for Causal Inference (TACI), combined with a novel metric for assessing causal interactions using surrogate data. A particular advantage of our approach is being able to train a single model that captures the dynamics of the time series across all points in time, allowing for timevarying interactions to be found without retraining, a computationally expensive endeavor for most artificial neural networks. We found that our method performed well compared to other methods in the field on synthetic data sets with known causal interactions, including those with time-varying couplings between variables. We also found that our method was able to identify known interactions between variables in a climate data set and was able to discover subtle temporal fluctuations in coupling in non-human primate ECoG data.
Our approach, while novel, is not without its limitations. One of the primary concerns is the extensive training time and the resource-intensive nature of the model. Implementing TACI, especially on large datasets, requires significant computational power and time. We envision that several technical improvements in the network architecture and training will allow for the method to be sped-up considerably, however. Another concern is the potential for overfitting due to TACI’s considerable modeling capacity. While the framework is designed to capture the nuanced dynamics of causal relationships over time, like most other causal network models, this method can fit data too closely if not trained properly, resulting in models that perform exceptionally well on training data but generalize poorly to unseen data. Furthermore, TACI incorporates elements of the Granger causality approach, which means it also inherits some of its problems. Granger causality assumes that the causal variable contains unique information about the future values of the effect variable, which might not always hold true in complex systems where numerous latent factors influence outcomes. Lastly, but importantly, as our approach is based solely on observational data, TACI only attempts to provide hypotheses about causal relationships between variables or to infer important relationships between variables when perturbation experiments are impossible or unethical to perform.
These limitations notwithstanding, the results presented here provide evidence that our approach will be broadly applicable to complicated data sets with time-varying causal structure, with particular promise for neural data, where we hope to build our understanding of how parts of the brain shift their interactions as behavioral states and needs change.
Materials and Methods
At its core, TACI uses a two-headed autoencoder architecture, applied in a two-step process, to predict future states and infer causal relationships between different time series. In the first application, the two-headed autoencoder processes the original time series data, x(t) and y(t). The encoder segments of this autoencoder independently process x(t) and y(t), capturing and encoding their temporal dynamics and features into latent representations. These representations are then merged in the bottleneck, combining the distilled information from both time series into a unified latent space that encapsulates potential causal interactions. From this combined latent representation, the decoder works to reconstruct or predict the future trajectory of y(t), shifted by a time τ. The second application involves replacing x(t) with the surrogate data x^(s)(t). This surrogate data is generated to mimic the statistical properties of x(t) but is designed to break any potential causal link between x(t) and y(t).
This two-step process is essential for determining how these variables are linked to one another. The presence of a causal relationship can be assessed by comparing the predictive accuracy of the decoder when using the original x(t) versus the surrogate x^(s)(t). A significant drop in accuracy with the surrogate data suggests that the original x(t) contains specific information causally linked to the future states of y(t).
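The input and target sequences described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_pairs` and the parameters `seq_len` and `tau` are hypothetical, and it assumes both series are one-dimensional, equally long, and sampled on the same time grid.

```python
import numpy as np

def make_pairs(x, y, seq_len, tau):
    """Slice two aligned series into (x window, y window, future y window).

    The target is the y window shifted forward by tau samples.
    Names and layout are illustrative assumptions, not the paper's code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = len(y) - seq_len - tau + 1  # number of complete windows
    if n <= 0:
        raise ValueError("series too short for this seq_len and tau")
    # Row i holds the indices i, i+1, ..., i+seq_len-1.
    idx = np.arange(n)[:, None] + np.arange(seq_len)[None, :]
    return x[idx], y[idx], y[idx + tau]
```

For the second application, the same function would be called with the surrogate series in place of `x`, leaving the `y` inputs and targets unchanged.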
Architecture
In the TACI architecture, the concept of a two-headed encoder is employed to simultaneously process two distinct time series datasets, denoted as x(t) and y(t). This design allows for the independent yet parallel analysis of each time series, enabling the model to capture and encode their individual characteristics and temporal dynamics before merging their representations during the bottleneck process. The input sequences are selected to be greater in length than the autocorrelation time of each variable. This ensures that the sequences capture meaningful temporal dependencies and dynamics. A GaussianNoise layer is added to enhance the model’s ability to generalize and prevent overfitting.
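One way to check the sequence-length condition is to estimate the autocorrelation time of each variable first. The sketch below assumes a common convention (the first lag at which the normalized autocorrelation falls below 1/e); the text does not say which estimator was used, so both the function name and the threshold are assumptions.

```python
import numpy as np

def autocorrelation_time(series, threshold=1.0 / np.e):
    """First lag at which the normalized autocorrelation drops below threshold.

    The 1/e threshold is an assumed convention, not taken from the paper.
    """
    z = np.asarray(series, dtype=float)
    z = z - z.mean()
    # Keep the non-negative lags of the full autocorrelation.
    acf = np.correlate(z, z, mode="full")[len(z) - 1:]
    if acf[0] == 0:
        raise ValueError("series is constant")
    acf = acf / acf[0]
    below = np.nonzero(acf < threshold)[0]
    # If the autocorrelation never drops below threshold, report the series length.
    return int(below[0]) if below.size else len(z)
```

An input sequence length would then be chosen to exceed the larger of the two values returned for x(t) and y(t).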
The most important part of the encoder is a Temporal Convolutional Network (TCN) layer, which captures the long-term dependencies within each time series. This layer utilizes several key parameters: “nb_filters” sets the number of convolutional filters, “kernel_size” sets the temporal extent of each convolution, and “dilations” allows the model to efficiently gather information across various temporal distances. Additionally, “Dropout” layers are used to decrease overfitting by randomly dropping units during the training phase. Following the TCN, a Conv1D layer continues to process the data for each series, allowing the network to change dimensionality while preserving temporal resolution. An AveragePooling1D layer may then downsample the Conv1D layer’s output by pooling across the temporal dimension. This operation reduces the sequence length, emphasizing significant features and further decreasing data dimensionality. The data is subsequently processed by a series of Dense layers that compress it into a dense, lower-dimensional latent representation. The size of these layers decreases in each successive layer, concentrating the information into a more compact form.
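To make the role of “kernel_size” and “dilations” concrete, the sketch below computes how many past time steps a stack of dilated causal convolutions can see. It is a generic calculation under stated assumptions, not a description of the authors' configuration: the function name is hypothetical, the parameter values in the usage note are illustrative, and the number of convolutions per dilation level depends on the TCN implementation used.

```python
def receptive_field(kernel_size, dilations, convs_per_dilation=1):
    """Receptive field, in time steps, of stacked dilated causal convolutions.

    Each causal convolution with kernel size k and dilation d extends the
    receptive field by (k - 1) * d steps. convs_per_dilation is an assumed
    knob for implementations that apply several convolutions per level.
    """
    if kernel_size < 1 or convs_per_dilation < 1:
        raise ValueError("kernel_size and convs_per_dilation must be >= 1")
    if any(d < 1 for d in dilations):
        raise ValueError("dilations must be positive integers")
    return 1 + convs_per_dilation * (kernel_size - 1) * sum(dilations)
```

For example, `receptive_field(3, [1, 2, 4, 8])` gives 31 time steps, which can be compared against the chosen input sequence length.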
The bottleneck stage starts once the two-headed encoder has finished processing and compressing the input sequences into a lower-dimensional latent space representation. The Bottleneck merges these latent representations through an element-wise multiplication operation. By combining the representations in this manner, the model effectively captures the potential interactions and dependencies between the time series, which are essential for uncovering causal relationships.
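The merge step can be illustrated with a few lines of NumPy. This is only a sketch of element-wise multiplication on two latent vectors of equal size; the function name and the example values are hypothetical.

```python
import numpy as np

def merge_latents(z_x, z_y):
    """Element-wise product of two latent vectors of identical shape."""
    z_x = np.asarray(z_x, dtype=float)
    z_y = np.asarray(z_y, dtype=float)
    if z_x.shape != z_y.shape:
        raise ValueError("latent representations must have the same shape")
    return z_x * z_y
```

One consequence of a multiplicative merge is that a latent unit that is zero in one head silences the matching unit of the other head, so the combined code depends jointly on both inputs rather than on their sum.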
Once the latent representations are merged in the Bottleneck, this combined representation is forwarded to the Decoder. The Decoder’s task is to predict the future trajectory of the target time series. The first step in the Decoder is to progressively upscale the combined latent representation. This is achieved through a series of Dense layers, where each layer aims to increase the dimensionality of the data. The number and size of these layers are determined by the complexity of the data and the level of compression achieved by the Encoder. After the initial upscaling, an UpSampling1D layer is used to increase the sequence length to its original size, effectively reversing the pooling operation performed in the Encoder. A TCN layer is used to ensure that the reconstructed data maintains its temporal integrity and dynamics. This layer mirrors the TCN configuration in the Encoder, utilizing the same parameters for “nb_filters”, “kernel_size”, and “dilations” to capture the temporal dependencies and patterns necessary for accurate prediction. Lastly, a Dense output layer produces the final prediction of the future states of the target time series.
Training and Prediction
As discussed earlier, the training phase of the TACI model involves four distinct configurations of the network. Central to this phase is the use of the Mean Squared Error (MSE) as the loss function, which facilitates the optimization of predictions for future trajectories against actual observed values. The Adam optimizer [53] is employed for its adaptive learning rate capabilities. Training is performed across 300 epochs to give the model enough time for the parameters to adjust and converge toward optimal solutions. The parameters controlling the batch size and data shuffling are tuned to balance computational efficiency and the promotion of model generalization. Callbacks such as ReduceLROnPlateau, EarlyStopping, and ModelCheckpoint are employed in this phase to adjust learning rates, prevent overfitting, and preserve the best model state, respectively.
Surrogate data were created by drawing random values from a uniform distribution between zero and one until a time series the length of the original one was generated. An alternative method would have been to create a surrogate time series by first converting the original series into the frequency domain through a Fourier transform. Then, we could apply random phase shifts, making sure the amplitude spectrum remained unaltered. This randomness is crucial to breaking any specific temporal dependencies present in the original series. Following this process, an inverse Fourier transform could be employed to reconstruct the series back into the time domain. This step generates a new time series that mirrors the original in terms of its overall power distribution but only has random contingencies with its partner data set. In practice, however, we found that this latter methodology did not yield more accurate results when training TACI, so we focused on the initially described method for generating surrogate data in this study.
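Both surrogate constructions described above can be sketched in NumPy. The function names are hypothetical and the details (for example, leaving the zero-frequency and Nyquist terms untouched so the output stays real) are standard choices assumed here, not taken from the paper.

```python
import numpy as np

def uniform_surrogate(n, rng):
    """Surrogate of length n drawn from a uniform distribution on [0, 1)."""
    return rng.uniform(0.0, 1.0, size=n)

def phase_randomized_surrogate(series, rng):
    """Surrogate with the same amplitude spectrum but random Fourier phases."""
    z = np.asarray(series, dtype=float)
    n = len(z)
    spectrum = np.fft.rfft(z)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0  # leave the zero-frequency (mean) term unchanged
    if n % 2 == 0:
        phases[-1] = 0.0  # the Nyquist term must stay real for even n
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)
```

A reproducible generator can be created with `rng = np.random.default_rng(0)` and passed to either function. The first function matches the method used in this study; the second is the alternative that was considered.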
After training is completed, the model moves on to the prediction phase, where the focus shifts to evaluating the trained model. In the first step of the prediction phase, the pre-trained models are loaded, each representing a unique configuration designed in the training phase to capture and analyze the causal dynamics between the time series datasets x(t) and y(t). At the same time, the full original dataset is divided into sequences with the same length and structure as the models were trained on. The prediction process occurs over defined rolling windows to allow for a temporal exploration of the dataset, enabling the models to make predictions for future states of the time series within each window. The models’ accuracy in forecasting future time series states is quantitatively evaluated for each rolling window using the R² metric. To enhance the reliability and confidence of these assessments, 100 bootstrap samples are generated for each window. The causal inference for each rolling window can be determined using the CSGI (Eq. 4). Through this calculation, the model not only quantifies the strength and direction of the causal relationship but also shows its variation over time, providing a dynamic and temporal perspective on causal inference.
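The per-window evaluation can be sketched as below. The code computes the coefficient of determination R² in each rolling window; the function names and the `window` and `step` parameters are hypothetical, and the predictions are assumed to be already aligned sample by sample with the observed values.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot

def rolling_r2(y_true, y_pred, window, step):
    """R^2 in each rolling window; returns (window start indices, scores)."""
    starts = list(range(0, len(y_true) - window + 1, step))
    scores = [r2_score(y_true[s:s + window], y_pred[s:s + window]) for s in starts]
    return np.array(starts), np.array(scores)
```

Running this once with predictions from the model fed x(t) and once with predictions from the model fed the surrogate gives the two score sequences that enter the causal measure.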
For each interval, a bootstrap strategy is implemented. This strategy involves creating a set number of bootstrap samples by randomly resampling within the interval. These samples are then used to evaluate the model’s predictions, which are generated under two conditions: one using the actual interactions between the time series and another using the surrogate data. Applying Equation 4 to these predictions yields a score for each sample, from which we compute the mean and standard deviation. These statistics describe the average performance and variability of the model’s predictions across the bootstrap samples, and they set the error bars in the plotted figures, so that the reported uncertainty rests on a resampling-based estimate. By repeating this procedure across all intervals, the method provides a comprehensive view of how model performance fluctuates over time and under different conditions.
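The bootstrap procedure for a single interval can be sketched as follows. Two loud assumptions apply: the score used here is simply the difference in R² between the original and surrogate predictions, a stand-in chosen for illustration and not the CSGI of Equation 4, and individual time points are resampled with replacement, which may differ from how the authors resample. All names are hypothetical.

```python
import numpy as np

def _r2(y_true, y_pred):
    # Coefficient of determination; assumes y_true is not constant.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def bootstrap_score_gap(y_true, pred_orig, pred_surr, n_boot=100, seed=0):
    """Mean and standard deviation of a stand-in score over bootstrap samples.

    The score is R^2(original) - R^2(surrogate). It is NOT the paper's
    CSGI (Eq. 4); it only illustrates the resampling logic.
    """
    y_true = np.asarray(y_true, dtype=float)
    pred_orig = np.asarray(pred_orig, dtype=float)
    pred_surr = np.asarray(pred_surr, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample indices with replacement
        scores[b] = _r2(y_true[idx], pred_orig[idx]) - _r2(y_true[idx], pred_surr[idx])
    return float(scores.mean()), float(scores.std(ddof=1))
```

The returned mean and standard deviation play the role of the plotted value and its error bar for that interval; repeating the call for every interval traces the score over time.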
Acknowledgements
Both authors were supported by the Human Frontier Science Program (RGY0076/2018) and the Simons Foundation (707102 & 876207), and JC was supported by the NSF Physics of Living Systems Student Research Network (PHY-1806833). GJB would like to acknowledge the Aspen Center for Physics, where many of the initial ideas for this work were generated.
References
- [1] A Parametric Method to Measure Time-Varying Linear and Nonlinear Causality With Applications to EEG Data. IEEE Transactions on Biomedical Engineering 60:3141–3148. https://doi.org/10.1109/TBME.2013.2269766
- [2] Rhythms of the Brain. Oxford, UK: Oxford University Press.
- [3] Proceedings of the Joint Oceanographic Assembly 1982 General Symposia: Dalhousie University, Halifax, Nova Scotia, Canada. Canadian National Committee for the Scientific Committee on Oceanic Research.
- [4] Time-varying causality between crude oil and stock markets: What can we learn from a multiscale perspective? International Review of Economics & Finance 49:453–483. https://doi.org/10.1016/j.iref.2017.03.007
- [5] Time-varying causality between renewable and non-renewable energy consumption and real output: Sectoral evidence from the United States. Renewable and Sustainable Energy Reviews 149. https://doi.org/10.1016/j.rser.2021.111326
- [6] Relationship between green bonds and financial and environmental variables: A novel time-varying causality. Energy Economics 92. https://doi.org/10.1016/j.eneco.2020.104941
- [7] Greenhouse Warming, Decadal Variability, or El Niño? An Attempt to Understand the Anomalous 1990s. Journal of Climate 10:2221–2239. https://doi.org/10.1175/1520-0442(1997)010<2221:GWDVOE>2.0.CO;2
- [8] Decadal variability of twentieth-century El Niño and La Niña occurrence from observations and IPCC AR4 coupled models. Geophysical Research Letters 36. https://doi.org/10.1029/2009GL037929
- [9] Synchronization and causality across time scales in El Niño Southern Oscillation. NPJ Climate and Atmospheric Science 1. https://doi.org/10.1038/s41612-018-0043-7
- [10] Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica 37:424–438.
- [11] Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances 5. https://doi.org/10.1126/sciadv.aau4996
- [12] Causal Discovery with Attention-Based Convolutional Neural Networks. Machine Learning and Knowledge Extraction 1:312–340. https://doi.org/10.3390/make1010019
- [13] A study of problems encountered in Granger causality analysis from a neuroscience perspective. Proceedings of the National Academy of Sciences 114. https://doi.org/10.1073/pnas.1704663114
- [14] Granger causality and the sampling of economic processes. Journal of Econometrics 132:311–336. https://doi.org/10.1016/j.jeconom.2005.02.002
- [15] Data-driven causal analysis of observational biological time series. eLife 11. https://doi.org/10.7554/eLife.72518
- [16] Detecting Causality in Complex Ecosystems. Science 338:496–500. https://doi.org/10.1126/science.1227079
- [17] Causal inference from noisy time-series data: Testing the Convergent Cross-Mapping algorithm in the presence of noise and external influence. Future Generation Computer Systems 73:52–62. https://doi.org/10.1016/j.future.2016.12.009
- [18] Distinguishing time-delayed causal interactions using convergent cross mapping. Scientific Reports 5. https://doi.org/10.1038/srep14750
- [19] Distinguishing random environmental fluctuations from ecological catastrophes for the North Pacific Ocean. Nature 435:336–340.
- [20] Ecology for bankers. Nature 451:893–894.
- [21] Nonlinear effects of large-scale climatic variability on wild and domestic herbivores. Nature 410:1096–1099.
- [22] Analyzing multiple nonlinear time series with extended Granger causality. Physics Letters A 324:26–35. https://doi.org/10.1016/j.physleta.2004.02.032
- [23] Kernel Method for Nonlinear Granger Causality. Phys Rev Lett 100. https://doi.org/10.1103/PhysRevLett.100.144103
- [24] Neural Granger Causality. IEEE Transactions on Pattern Analysis and Machine Intelligence :1–1. https://doi.org/10.1109/tpami.2021.3065601
- [25] Extended Granger causality: a new tool to identify the structure of physiological networks. Physiological Measurement 36. https://doi.org/10.1088/0967-3334/36/4/827
- [26] Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica 37:424–438.
- [27] Causality: Models, Reasoning and Inference. USA: Cambridge University Press.
- [28] Deep Learning. MIT Press.
- [29] Causal Effect Inference with Deep Latent-Variable Models.
- [30] Causal Generative Neural Networks.
- [31] Causal Discovery with Attention-Based Convolutional Neural Networks. Machine Learning and Knowledge Extraction 1:312–340. https://doi.org/10.3390/make1010019
- [32] Temporal Convolutional Networks: A Unified Approach to Action Segmentation. Computer Vision – ECCV 2016 Workshops. Cham: Springer International Publishing :47–54.
- [33] An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv.
- [34] Episodic Fluctuations in Larval Supply. Science 283:1528–1530. https://doi.org/10.1126/science.283.5407.1528
- [35] Detecting strange attractors in turbulence. Dynamical Systems and Turbulence, Warwick 1980. Berlin, Heidelberg: Springer Berlin Heidelberg :366–381.
- [36] Generalized Theorems for Nonlinear State Space Reconstruction. PLOS ONE 6:1–8. https://doi.org/10.1371/journal.pone.0018295
- [37] skccm: State-space Reconstruction by k-Nearest Neighbors Convergent Cross Mapping. GitHub. https://github.com/nickc1/skccm/blob/master/skccm/skccm.py
- [38] Measuring Information Transfer. Phys Rev Lett 85:461–464. https://doi.org/10.1103/PhysRevLett.85.461
- [39] JIDT: An information-theoretic toolkit for studying the dynamics of complex systems. Frontiers in Robotics and AI 1. https://doi.org/10.3389/frobt.2014.00011
- [40] Deterministic Nonperiodic Flow. Journal of the Atmospheric Sciences 20:130–141. https://doi.org/10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2
- [41] An Equation for Continuous Chaos. Physics Letters A 57:397–398. https://doi.org/10.1016/0375-9601(76)90101-8
- [42] Nonlinear analyses of interictal EEG map the brain interdependences in human focal epilepsy. Physica D: Nonlinear Phenomena 127:250–266. https://doi.org/10.1016/S0167-2789(98)00258-9
- [43] Learning driver-response relationships from synchronization patterns. Phys Rev E 61:5142–5148. https://doi.org/10.1103/PhysRevE.61.5142
- [44] A two-dimensional mapping with a strange attractor. Communications in Mathematical Physics 50:69–77. https://doi.org/10.1007/BF01608556
- [45] Directionality of coupling from bivariate time series: How to avoid false causalities and missed connections. Phys Rev E 75. https://doi.org/10.1103/PhysRevE.75.056211
- [46] Detecting dynamical interdependence and generalized synchrony through mutual prediction in a neural ensemble. Phys Rev E 54:6708–6724. https://doi.org/10.1103/PhysRevE.54.6708
- [47] Jena Climate Dataset. https://www.bgc-jena.mpg.de/wetter/
- [48] The Relationship between Relative Humidity and the Dewpoint Temperature in Moist Air: A Simple Conversion and Applications. Bulletin of the American Meteorological Society 86:225–234. https://doi.org/10.1175/BAMS-86-2-225
- [49] Large-Scale Information Flow in Conscious and Unconscious States: an ECoG Study in Monkeys. PLOS ONE 8. https://doi.org/10.1371/journal.pone.0080845
- [50] Untangling Brain-Wide Dynamics in Consciousness by Cross-Embedding. PLoS Comput Biol 11. https://doi.org/10.1371/journal.pcbi.1004537
- [51] Multidimensional Recording (MDR) and Data Sharing: An Ecological Open Research and Educational Platform for Neuroscience. PLOS ONE 6:1–7. https://doi.org/10.1371/journal.pone.0022561
- [52] Circuit organization of the rodent medial prefrontal cortex. Trends in Neurosciences 44:550–563. https://doi.org/10.1016/j.tins.2021.03.006
- [53] Adam: A method for stochastic optimization. arXiv.
Copyright
© 2024, Josuan Calderon & Gordon J Berman
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.