iEEG implantation, behavioral task and computational modeling

(A) Anatomical location of intracerebral electrodes across the 16 epileptic patients: anterior insula (aINS, n=75), dorsolateral prefrontal cortex (dlPFC, n=70), lateral orbitofrontal cortex (lOFC, n=59) and ventromedial prefrontal cortex (vmPFC, n=44). (B) Number of pairwise connectivity links (i.e. within patients) within and across regions. (C) Example of a typical trial in the reward (top) and punishment (bottom) conditions. Participants had to select one of two abstract visual cues presented on either side of a central fixation cross and subsequently observed the outcome. Durations are given in milliseconds. (D) Number of trials in which participants received outcomes of +1€ (142±44, mean±std) vs. 0€ (93±33) in the rewarding condition (blue) and outcomes of 0€ (141±42) vs. −1€ (93±27) in the punishment condition (red). (E) Across-participant trial-wise reward PE (RPE, blue) and punishment PE (PPE, red), ± 95% confidence interval.
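This caption does not specify the computational model behind the trial-wise PEs. Purely as a hypothetical illustration, a delta-rule (Rescorla-Wagner) update produces trial-wise PEs of the kind plotted in panel E. The learning rate `alpha`, the initial value `q0` and the simplification to a single tracked cue value are our assumptions, not the authors' model.

```python
import numpy as np

def delta_rule_prediction_errors(outcomes, alpha=0.3, q0=0.0):
    """Trial-wise prediction errors from a delta-rule (Rescorla-Wagner) update.

    Hypothetical illustration: tracks the value of a single chosen cue.
    outcomes: received outcomes, e.g. +1/0 (reward) or 0/-1 (punishment).
    """
    q = q0
    pe = np.empty(len(outcomes))
    for t, r in enumerate(outcomes):
        pe[t] = r - q          # prediction error: outcome minus expectation
        q += alpha * pe[t]     # move the expectation toward the outcome
    return pe

rpe = delta_rule_prediction_errors([1, 1, 0], alpha=0.5)    # reward condition
ppe = delta_rule_prediction_errors([0, -1, -1], alpha=0.5)  # punishment condition
```

In a two-cue task, one such value would be kept per cue, and only the chosen cue's value would be updated on each trial.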

Local mixed encoding of reward and punishment prediction error signals

(A) Time courses of mutual information (MI, in bits) estimated between gamma power and the reward (blue) and punishment (red) PE signals. The solid line and the shaded area represent the mean and SEM of the across-contacts MI. Significant clusters of MI at the group level are plotted as horizontal bold lines (p<0.05, cluster-based correction, non-parametric randomization across epochs). (B) Instantaneous proportions of task-irrelevant (gray) and task-relevant bipolar derivations showing a significant relation with the RPE (blue), the PPE (red) or both the RPE and PPE (purple). Data are aligned to the outcome presentation (vertical line at 0 seconds).
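This caption does not name the MI estimator. A minimal sketch, assuming a Gaussian-copula estimator (one common choice for continuous signals such as gamma power and PE): each variable is rank-transformed to a standard normal, and the closed-form MI of a bivariate Gaussian, −0.5·log2(1 − r²), is computed independently at every time point across trials. The function names are hypothetical, and no bias correction is applied.

```python
import numpy as np
from scipy.special import ndtri
from scipy.stats import rankdata

def copnorm(x, axis=0):
    """Gaussian-copula normalization: ranks -> uniform (0, 1) -> standard normal."""
    ranks = np.apply_along_axis(rankdata, axis, x)
    return ndtri(ranks / (x.shape[axis] + 1))

def gcmi_time_course(power, pe):
    """MI (bits) between gamma power and PE at each time point.

    power: (n_trials, n_times); pe: (n_trials,). Trials are the samples.
    """
    cp = copnorm(np.asarray(power, dtype=float), axis=0)
    cs = copnorm(np.asarray(pe, dtype=float)[:, None], axis=0)[:, 0]
    cp = cp - cp.mean(axis=0)
    cs = cs - cs.mean()
    # Pearson correlation of the copula-normalized data, per time point
    r = (cp * cs[:, None]).sum(axis=0) / np.sqrt(
        (cp ** 2).sum(axis=0) * (cs ** 2).sum())
    return -0.5 * np.log2(1.0 - r ** 2)
```

Because trials serve as samples, the resulting MI time course is directly comparable across contacts. Significance would still require the cluster-based randomization described in the caption.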

Encoding of PE signals occurs with redundancy-dominated subsystems

Dynamic interaction information (II, in bits) within (A) and between (B) regions about the RPE (IIRPE) and PPE (IIPPE), plotted in blue and red respectively. Significant clusters of IIRPE and IIPPE are displayed as horizontal bold blue and red lines (p<0.05, cluster-based correction, non-parametric randomization across epochs). Significant differences between IIRPE and IIPPE are displayed in green. Shaded areas represent the SEM. The vertical gray line at 0 seconds represents the outcome presentation.
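Interaction information between two recordings X and Y about a PE signal S is II = I(X,Y;S) − I(X;S) − I(Y;S). It is positive when synergy dominates and negative when redundancy dominates. The sketch below assumes Gaussian-copula MI with covariance log-determinants, which the caption does not state. The function names and the test signals are hypothetical.

```python
import numpy as np
from scipy.special import ndtri
from scipy.stats import rankdata

def copnorm(x):
    """Column-wise Gaussian-copula normalization of a (n_trials, d) array."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    ranks = np.apply_along_axis(rankdata, 0, x)
    return ndtri(ranks / (x.shape[0] + 1))

def gauss_mi(a, b):
    """MI (bits) between Gaussian variables a (n, da) and b (n, db)."""
    da = a.shape[1]
    cov = np.cov(np.hstack([a, b]), rowvar=False)

    def logdet(m):
        return np.linalg.slogdet(m)[1]

    return 0.5 * (logdet(cov[:da, :da]) + logdet(cov[da:, da:]) - logdet(cov)) / np.log(2)

def interaction_information(x, y, s):
    """II = I(X,Y;S) - I(X;S) - I(Y;S): > 0 synergy, < 0 redundancy."""
    x, y, s = copnorm(x), copnorm(y), copnorm(s)
    return gauss_mi(np.hstack([x, y]), s) - gauss_mi(x, s) - gauss_mi(y, s)
```

Two noisy copies of S yield negative II (redundancy). Two independent variables whose sum is S yield positive II (synergy), because neither variable alone carries much information about the sum.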

Contextual modulation of information transfer

Time courses of transfer entropy (TE, in bits) from the aINS to the dlPFC (aINS→dlPFC) and from the vmPFC to the lOFC (vmPFC→lOFC), estimated during the rewarding condition (TERew in blue) and punishing condition (TEPun in red). Significant differences (p<0.05, cluster-based correction, non-parametric randomization across epochs) of TE between conditions are displayed with horizontal bold lines (blue for TERew > TEPun and red for TEPun > TERew). Shaded areas represent the SEM. The vertical gray line at 0 seconds represents the outcome presentation.

Synergistic interactions about the full PE signals between recordings of the dlPFC and vmPFC

(A) Dynamic interaction information (II, in bits) between the dlPFC and vmPFC about the full prediction error (IIdlPFC-vmPFC). Hot and cold colors indicate synergy- and redundancy-dominated II about the full PE. Significant clusters of II are displayed as a horizontal bold green line (p<0.05, cluster-based correction, non-parametric randomization across epochs). Shaded areas represent the SEM. The vertical gray line at 0 seconds represents the outcome presentation. (B) Dynamic IIdlPFC-vmPFC binned according to local specificity: PPE-RPE (IIPPE-RPE, pink) or mixed (IIMixed, purple). (C) Distributions of the mean IIPPE-RPE and IIMixed for each pair of recordings (IIPPE-RPE: one-sample t-test against 0; dof=34; P fdr-corrected=0.015*; T=2.86; CI(95%)=[6.5e-5, 3.9e-4]; IIMixed: dof=33; P fdr-corrected=0.015*; T=2.84; CI(95%)=[5.4e-5, 3.3e-4]).

Summary of findings

The four nodes represent the investigated regions: the anterior insula (aINS), the dorsolateral and ventromedial parts of the prefrontal cortex (dlPFC and vmPFC) and the lateral orbitofrontal cortex (lOFC). The outer disc represents the local mixed encoding, i.e. the proportions of contacts over time showing a significant relationship between gamma power and PE signals. Blue shows the proportion of contacts with a significant relation with the PE across rewarding trials (RPE-specific), and red the same for punishment trials (PPE-specific). Purple shows the proportion of contacts with a significant relationship with both the RPE and PPE, and gray the remaining proportion of non-significant contacts. Regarding interactions, we found that information transfer between the aINS and dlPFC carried redundant information about the PPE only, and information transfer between the vmPFC and lOFC about the RPE only. This information transfer occurred with a leading role of the aINS in the punishment context and of the vmPFC in the rewarding context. Finally, we found synergistic interactions between the dlPFC and the vmPFC about the full PE, without splitting into rewarding and punishing conditions.

Results of the one-sample t-test against 0

Single-subject anatomical distribution

(A) Number of unique subjects per brain region and per pair of brain regions. (B) Number of bipolar derivations per subject and per brain region.

Single-subject estimation of prediction errors

Single-subject trial-wise reward PE (RPE - blue) and punishment PE (PPE - red), ± 95% confidence interval.

Local encoding of prediction error signals within the gamma band

We searched for the most informative sub-band about the R/PPE within the broad gamma range. To this end, we estimated the PSD between 50 and 200 Hz during the first second following the outcome presentation, using multitapers with a 20 Hz bandwidth. We then estimated the amount of information carried by individual gamma frequencies about the R/PPE in the four brain regions. We found significant clusters of MI in all four regions for both RPE and PPE signals, mostly within the [50, 100]Hz range (Fig. S1A). We then estimated the density of information within the [50, 100]Hz, [100, 150]Hz and [150, 200]Hz bins (Fig. S1B): approximately 60% of the total information was concentrated within the [50, 100]Hz range.

(A) Distribution of information about the R/PPE in the frequency domain in the aINS, dlPFC, lOFC and vmPFC. The solid line and the shaded area respectively represent the mean and SEM of the across-contacts MI. Horizontal thick lines represent significant clusters of information (p<0.05, cluster-based correction, non-parametric randomization across epochs). (B) Density of information in the [50, 100]Hz, [100, 150]Hz and [150, 200]Hz bins.
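The two steps above (a multitaper PSD over 50–200 Hz, then the fraction of MI per 50 Hz bin) can be sketched as follows. This is a minimal sketch under stated assumptions: the 20 Hz bandwidth is treated as the full spectral bandwidth 2W, the conventional K = 2NW − 1 DPSS tapers are used, and `multitaper_psd` and `information_density` are hypothetical helper names. The per-frequency MI itself would come from an estimator such as the one sketched for Fig. 2.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, sfreq, bandwidth=20.0, fmin=50.0, fmax=200.0):
    """Multitaper PSD of a 1-D segment (e.g. the first second after outcome).

    `bandwidth` is interpreted as the full bandwidth 2W in Hz (assumption),
    so the time-half-bandwidth product is NW = T * W.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    nw = (n / sfreq) * bandwidth / 2.0
    n_tapers = max(int(2 * nw) - 1, 1)            # conventional K = 2NW - 1
    tapers = dpss(n, nw, Kmax=n_tapers)            # (n_tapers, n), unit L2 norm
    spectra = np.fft.rfft(tapers * (x - x.mean()), axis=-1)
    psd = (np.abs(spectra) ** 2).mean(axis=0) / sfreq  # average over tapers
    freqs = np.fft.rfftfreq(n, d=1.0 / sfreq)
    keep = (freqs >= fmin) & (freqs <= fmax)
    return freqs[keep], psd[keep]

def information_density(freqs, mi, edges=(50, 100, 150, 200)):
    """Fraction of the total MI falling in each frequency bin.

    Bins are half-open [lo, hi) except the last, which also includes its
    upper edge (np.histogram convention).
    """
    per_bin, _ = np.histogram(freqs, bins=edges, weights=mi)
    return per_bin / per_bin.sum()
```

With a 1 s window, a 20 Hz full bandwidth gives NW = 10 and 19 tapers. The result is a smooth spectrum that trades frequency resolution for lower variance.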

Inter-subjects reproducibility of local encoding of PE signals

To assess whether the local encoding of RPE and PPE signals was driven by a minority of subjects, we computed the mutual information between the gamma activity of each bipolar derivation and the R/PPE and performed statistical inference at the single-subject level (correction for multiple comparisons across time points and bipolar derivations). We then estimated the proportion of unique subjects per brain region with at least one significant bipolar derivation (Fig. S4). The lowest reproducibility, approximately 30%, was found in the lOFC for the encoding of the RPE and in the vmPFC for the encoding of the PPE. Conversely, 75-100% of the subjects with bipolar derivations in the aINS and dlPFC showed a significant encoding of the PPE.

Time courses of the proportion of subjects having at least one bipolar derivation with a significant encoding (p<0.05, cluster-based correction, non-parametric randomization across epochs) of the RPE (blue) or PPE (red). Data are aligned to the outcome presentation (vertical line at 0 seconds).
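The reproducibility measure described above reduces to a grouped "any" followed by a mean over subjects. A minimal sketch, assuming the corrected single-subject significance is already available as a boolean array per region. `subject_proportion` is a hypothetical name.

```python
import numpy as np

def subject_proportion(significant, subject_ids):
    """Fraction of unique subjects with >= 1 significant derivation per time point.

    significant: bool array (n_derivations, n_times) for one brain region,
    already corrected for multiple comparisons; subject_ids: (n_derivations,).
    """
    significant = np.asarray(significant, dtype=bool)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    # OR across each subject's derivations -> (n_subjects, n_times)
    per_subject = np.array(
        [significant[subject_ids == s].any(axis=0) for s in subjects])
    return per_subject.mean(axis=0)
```

Counting subjects rather than derivations prevents a single densely implanted patient from dominating the proportion.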

Inter-subjects reproducibility of redundant interactions about PE signals

Similarly, we quantified the reproducibility of the type of interactions the four brain regions had about the RPE and PPE. We computed the interaction information between all pairs of bipolar derivations per subject and performed statistical inference at the single-subject level (correction for multiple comparisons across time points and pairs of bipolar derivations). We then estimated the proportion of unique subjects per pair of brain regions having at least one significant pair of bipolar derivations with either redundant or synergistic interactions about either the RPE or the PPE (Fig. S5). For within-region interactions, approximately 60% of the subjects had redundant interactions about the R/PPE in the aINS and about the PPE in the dlPFC, and 40% about the RPE in the vmPFC. For across-region interactions, 60% of the subjects had redundant interactions between aINS-dlPFC and dlPFC-lOFC about the PPE, and 30% had redundant interactions between lOFC-vmPFC about the RPE.

Time courses of the proportion of subjects having at least one pair of bipolar derivations with significant interaction information (p<0.05, cluster-based correction, non-parametric randomization across epochs) about the RPE (blue) or PPE (red). Data are aligned to the outcome presentation (vertical line at 0 seconds). Proportions of subjects with redundant (solid) and synergistic (dashed) interactions are plotted downward and upward, respectively.

Optimal delay interval for maximising information transfer

We searched for the delay interval that maximises information transfer between regions. An important parameter to consider when estimating TE is the number of time points in the past of the target used for conditioning. An information flow from a source X to a target Y exists when including the past of X reduces the uncertainty about the future of Y, given the past of Y. We therefore investigated the influence of the delay on the TE. To this end, we estimated the TE across all pairs of contacts per participant, for rewarding and punishing trials, at every delay up to 350 ms (Fig. S6). Information flow was maximal at a delay of approximately 176 ms. Therefore, for the main text analyses, we used a range centred on 176±60 ms ([116, 236] ms).

Modulation of transfer entropy (TE, in bits) as a function of the delay between source and target areas. The TE computed across rewarding trials is shown in blue and across punishing trials in red. Shaded areas surrounding the time courses represent the 95% confidence interval estimated by bootstrapping.
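The definition above can be written as a conditional MI: TE(X→Y, d) = I(Y_t ; X_{t−d} | Y_{t−1}). The sketch below estimates it across trials at a single target time point under a Gaussian assumption, conditions on one past sample of the target, scans the delays and centres a ±60 ms window on the maximum. The Gaussian estimator, the one-sample target history and all function names are our assumptions, not the authors' exact pipeline.

```python
import numpy as np

def _logdet_cov(*cols):
    """Log-determinant of the covariance of 1-D columns stacked side by side."""
    cov = np.atleast_2d(np.cov(np.column_stack(cols), rowvar=False))
    return np.linalg.slogdet(cov)[1]

def gaussian_te(x, y, t, delay):
    """Transfer entropy x -> y (bits) at target sample t, estimated across trials.

    x, y: arrays (n_trials, n_times). TE = I(y_t ; x_{t-delay} | y_{t-1})
    under a Gaussian model, with a single past sample of the target.
    """
    a, b, c = y[:, t], x[:, t - delay], y[:, t - 1]
    # I(A;B|C) = H(A,C) + H(B,C) - H(C) - H(A,B,C); Gaussian constants cancel
    cmi = (_logdet_cov(a, c) + _logdet_cov(b, c)
           - _logdet_cov(c) - _logdet_cov(a, b, c))
    return 0.5 * cmi / np.log(2)

def best_delay_window(te_by_delay, delays_ms, half_width_ms=60.0):
    """Delay (ms) maximizing TE and a window of +/- half_width_ms around it."""
    best = float(delays_ms[int(np.argmax(te_by_delay))])
    return best, (best - half_width_ms, best + half_width_ms)

# Usage on synthetic data: y copies x with a 30-sample lag (150 ms at 200 Hz)
rng = np.random.default_rng(0)
sfreq = 200.0
x = rng.normal(size=(500, 100))
y = 0.5 * rng.normal(size=(500, 100))
y[:, 30:] += x[:, :-30]
delays = np.arange(1, 71)                      # up to 350 ms
te = np.array([gaussian_te(x, y, t=80, delay=d) for d in delays])
best, window = best_delay_window(te, delays * 1000.0 / sfreq)
```

Using a window rather than a single delay makes the TE estimate less sensitive to small inter-subject differences in conduction latency.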

Differential cortico-cortical directional interactions

We estimated the transfer entropy (TE) on the gamma power during the rewarding (TERew) and punishment (TEPun) conditions. As in the main text, we computed the TE for all pairs of brain regions with delays between 116 and 236 ms and detected temporal clusters where the TE significantly differed between conditions (TERew > TEPun or TEPun > TERew). Only two pairs of brain regions displayed statistically significant modulations of TE (Fig. S7): the TE from the aINS to the dlPFC (TEaINS→dlPFC), higher during the punishment condition, and the TE from the vmPFC to the lOFC (TEvmPFC→lOFC), higher during the rewarding condition. No other interaction was significant.

Time courses of transfer entropy (TE, in bits) estimated during the rewarding condition (TERew in blue) and punishing condition (TEPun in red). Significant differences (p<0.05, cluster-based correction, non-parametric randomization across epochs) of TE between conditions are displayed as horizontal bold lines (blue for TERew > TEPun and red for TEPun > TERew). Shaded areas represent the SEM. The vertical grey line at 0 seconds represents the outcome presentation.
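The "cluster-based correction, non-parametric randomization across epochs" used throughout can be sketched as a temporal cluster-mass permutation test. Condition labels are shuffled across trials, and the observed cluster masses are compared with the null distribution of the maximum cluster mass. This is a generic sketch, not the authors' implementation. The default statistic is a difference of trial means; for TE, `stat_fn` would instead have to compute TE per condition and subtract. The threshold value and all names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def _clusters(diff, threshold):
    """Contiguous supra-threshold runs (positive and negative separately)
    with their absolute cluster mass (sum of the statistic)."""
    out = []
    for mask in (diff > threshold, diff < -threshold):
        labels, n = ndimage.label(mask)
        for k in range(1, n + 1):
            idx = np.flatnonzero(labels == k)
            out.append((idx, abs(diff[idx].sum())))
    return out

def mean_difference(a, b):
    return a.mean(axis=0) - b.mean(axis=0)

def cluster_permutation_test(a, b, threshold, stat_fn=mean_difference,
                             n_perm=1000, seed=0):
    """a, b: per-condition trials stacked on axis 0, time on the last axis.
    Returns a list of (time_indices, p_value), one per observed cluster."""
    rng = np.random.default_rng(seed)
    data = np.concatenate([a, b], axis=0)
    n_a = len(a)
    observed = stat_fn(data[:n_a], data[n_a:])
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        shuffled = data[rng.permutation(len(data))]   # randomize across epochs
        masses = [m for _, m in
                  _clusters(stat_fn(shuffled[:n_a], shuffled[n_a:]), threshold)]
        null_max[i] = max(masses, default=0.0)        # max-statistic null
    return [(idx, (np.sum(null_max >= mass) + 1) / (n_perm + 1))
            for idx, mass in _clusters(observed, threshold)]
```

Comparing each cluster with the maximum cluster mass of every permutation controls the family-wise error across time points.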

Cortico-cortical interactions about the full PE signals

Dynamic interaction information (II, in bits) between regions about the full prediction error (IIPE). Hot and cold colours indicate synergy- and redundancy-dominated interactions about the full PE. Significant clusters of IIPE are displayed as a horizontal bold green line (p<0.05, cluster-based correction, non-parametric randomization across epochs). Shaded areas represent the SEM. The vertical grey line at 0 seconds represents the outcome presentation.

Interaction information binned according to the local specificity

We binned the II about the full PE (i.e. computed by concatenating the RPE and PPE) according to the local specificity of the bipolar derivations in the dlPFC and vmPFC, i.e. whether their gamma activity was modulated by the RPE only, the PPE only or both (Fig. 2B). This yielded four categories: IIRPE-RPE and IIPPE-PPE, the II estimated between two RPE-specific or two PPE-specific recordings respectively; IIPPE-RPE, the II between a PPE-specific and an RPE-specific recording; and IIMixed for the remaining combinations (i.e. RPE-Both, PPE-Both and Both-Both) (Fig. S9A). We found a significant cluster of II about the full PE approximately between 250 and 600 ms (Fig. 5). We therefore computed the mean II across time points between 250 and 600 ms within each of the four categories, for each pair of recordings (Fig. S9B), and performed a one-sample t-test against 0 (Table 1). Only the mean IIPPE-RPE (pink) and IIMixed (purple) differed significantly from 0. However, the numbers of pairs of recordings for IIRPE-RPE (red, 7 pairs) and IIPPE-PPE (blue, 6 pairs) were probably too small to detect a significant difference.

(A) Dynamic interaction information (II, in bits) between the dlPFC and vmPFC (IIdlPFC-vmPFC) binned according to the local specificity toward the RPE and PPE. Shaded areas represent the SEM. The vertical grey line at 0 seconds represents the outcome presentation. (B) Mean II between 250 and 600 ms after outcome presentation per category of local specificity. Each individual point represents one pair of recordings from the dlPFC and vmPFC.
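The binning and testing steps above can be sketched as follows: map each dlPFC-vmPFC pair to one of the four categories, average its II over 250–600 ms, run a one-sample t-test against 0 with a 95% t-based confidence interval per category, and FDR-correct across categories. Benjamini-Hochberg is assumed as the FDR procedure, and the category labels, function names and data layout are hypothetical.

```python
import numpy as np
from scipy import stats

def category(spec_a, spec_b):
    """Map the local specificities ('RPE', 'PPE', 'Both') of a pair to a category."""
    pair = {spec_a, spec_b}
    if pair == {"RPE"}:
        return "RPE-RPE"
    if pair == {"PPE"}:
        return "PPE-PPE"
    if pair == {"RPE", "PPE"}:
        return "PPE-RPE"
    return "Mixed"  # RPE-Both, PPE-Both, Both-Both

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

def ttest_ii_categories(ii, times, specs, window=(0.25, 0.6)):
    """ii: (n_pairs, n_times) in bits; times in s; specs: (spec_dlPFC, spec_vmPFC)."""
    in_win = (times >= window[0]) & (times <= window[1])
    mean_ii = ii[:, in_win].mean(axis=1)               # one value per pair
    labels = np.array([category(a, b) for a, b in specs])
    results = {}
    for cat in np.unique(labels):
        vals = mean_ii[labels == cat]
        t, p = stats.ttest_1samp(vals, 0.0)
        ci = stats.t.interval(0.95, len(vals) - 1,
                              loc=vals.mean(), scale=stats.sem(vals))
        results[str(cat)] = {"t": t, "p": p, "dof": len(vals) - 1, "ci": ci}
    for res, p_adj in zip(results.values(), bh_fdr([r["p"] for r in results.values()])):
        res["p_fdr"] = p_adj
    return results
```

Averaging within the window before testing reduces each pair to a single value, so the t-test's degrees of freedom reflect the number of recording pairs rather than the number of time points.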

Simulation study testing for synergistic effects

We performed a simulation to demonstrate that synergistic interactions can emerge between two regions with the same specificity. For example, consider one region that locally encodes the reward prediction error (RPE) on early trials and a second region that encodes the RPE on late trials. Combining the two with the interaction information (II) measure would lead to synergistic interactions, as each region carries information that the other does not.

To simulate this scenario, we initialized data for two brain regions, X and Y, and a 200-trial prediction error vector, all with random noise sampled from a uniform distribution. To simulate redundant interactions (Fig. S10A), both X and Y received a copy of the prediction error (one-to-all). To simulate synergy (Fig. S10B), X and Y received the early and the late prediction error trials, respectively (all-to-one).

In both scenarios, the local mutual information (MI) about the PE increased for regions X and Y around 1.5 seconds. However, in the first case this led to a negative II (redundancy), whereas in the second case it led to a positive II (synergy). This toy example illustrates that local specificity is not the only factor determining the type of interaction between regions.

Within-area local encoding of PE using the mutual information (MI, in bits) for regions X and Y and between-area interaction information (II, in bits) leading to (A) redundant interactions and (B) synergistic interactions about the PE.
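The simulation above can be reproduced in reduced form at a single time point rather than as a time course. The uniform range (−1, 1), the noise amplitude of 0.1 relative to the PE and the use of a plain parametric Gaussian MI estimator (no copula) are our assumptions, chosen so that the sign of the II is unambiguous with 200 trials.

```python
import numpy as np

def gauss_mi(a, b):
    """Gaussian MI (bits) from covariance log-determinants; columns are variables."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    da = a.shape[1]
    cov = np.cov(np.hstack([a, b]), rowvar=False)

    def logdet(m):
        return np.linalg.slogdet(m)[1]

    return 0.5 * (logdet(cov[:da, :da]) + logdet(cov[da:, da:]) - logdet(cov)) / np.log(2)

def interaction_information(x, y, s):
    """II = I(X,Y;S) - I(X;S) - I(Y;S): > 0 synergy, < 0 redundancy."""
    return gauss_mi(np.column_stack([x, y]), s) - gauss_mi(x, s) - gauss_mi(y, s)

rng = np.random.default_rng(42)
n_trials, noise_scale = 200, 0.1           # noise amplitude is an assumption
pe = rng.uniform(-1, 1, n_trials)

# Redundancy (one-to-all): X and Y both receive a copy of the PE
x_red = noise_scale * rng.uniform(-1, 1, n_trials) + pe
y_red = noise_scale * rng.uniform(-1, 1, n_trials) + pe

# Synergy (all-to-one): X receives early PE trials, Y receives late ones
half = n_trials // 2
x_syn = noise_scale * rng.uniform(-1, 1, n_trials)
y_syn = noise_scale * rng.uniform(-1, 1, n_trials)
x_syn[:half] += pe[:half]
y_syn[half:] += pe[half:]

ii_red = interaction_information(x_red, y_red, pe)   # negative: redundancy
ii_syn = interaction_information(x_syn, y_syn, pe)   # positive: synergy
```

In both cases X and Y individually carry information about the PE. Only the way that information is distributed across trials determines the sign of the II.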