Figures and data

Hierarchical decision-making task.
The task required decisions at two levels. (A) The higher-level task required selecting the appropriate stimulus-response mapping for the lower-level task, which flipped between two contexts. Visual cues – pairs of dots – below the horizontal midline indicated Context 1, and cues above the midline indicated Context 2. In the Instructed condition the cues were completely reliable. In the Inferred condition noise was added to the position of the cues. (B) At the lower level, participants used the selected stimulus-to-response mapping to report a simple orientation judgment of a visual grating stimulus. (C) Combined sequence of task events, consisting of an inter-trial interval (ITI), during which participants were presented with between 3 and 10 cues, and a trial period containing the response to the basic mapping task. Before each cue the context could potentially change. The context active at the end of one trial was the context active at the beginning of the next ITI. (D) Accuracy of participants in Instructed and Inferred conditions. Vertical and horizontal lines represent the mean accuracy across participants in the respective condition.

Modeling the internal context selection process in the Inferred condition.
(A) Schematic of the normative (Bayes-optimal) solution for the higher-level decision. The algorithm involves combining a prior belief with incoming evidence (in the form of a log-likelihood ratio, LLR) and non-linearly transforming the resulting posterior to form the prior before the next cue (Glaze et al., 2015). During trials, a noisy version of the posterior belief is used to determine the current context, and hence, how to map the presented stimulus to a response. (B) Individual performance in the Inferred condition in the data (hollow bars) and in the model fits for a variant of the normative model that fit the data well (shading; “Miscalibrated h and b normative”), along with the theoretical maximum performance of a noiseless, calibrated and normative observer (see main text). (C-E) Temporal profiles of evidence weighting for the internal decision (context selection), based on logistic regression (Inferred condition; see Methods). (C) Main effect of LLR on the internal decision. (D, E) The regression also included terms to assess the modulatory effect of change point probability (CPP) and uncertainty. Error bars, participant data; gray shading, model fits (“Miscalibrated h and b normative”). (C-E) Error bar and shading widths correspond to ±1 standard error of the mean across participants. Bold lines on x-axis, significant difference from zero (p <= 0.05; two-sided) of participant data points in a permutation cluster-based T-test with threshold-free cluster enhancement (see Methods).
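
The belief update sketched in (A) can be illustrated with a few lines of code. This is a minimal sketch based on the published form of the update in Glaze et al. (2015), not the fitting code used for this study. The function names and the argument `h` (the assumed probability of a context change before the next cue) are hypothetical labels, and decision noise and any further fitted parameters are omitted.

```python
import math

def propagate_prior(posterior, h):
    """Non-linearly transform a log-posterior ratio into the next log-prior ratio.

    `h` is the assumed probability that the context changes before the next cue
    (an illustrative parameter name, 0 < h < 1).
    """
    odds_stay = (1.0 - h) / h
    return (posterior
            + math.log(odds_stay + math.exp(-posterior))
            - math.log(odds_stay + math.exp(posterior)))

def update_belief(llrs, h, prior=0.0):
    """Return the log-posterior ratio after each cue in a sequence of LLRs."""
    posteriors = []
    for llr in llrs:
        posterior = prior + llr                 # combine prior belief with new evidence
        posteriors.append(posterior)
        prior = propagate_prior(posterior, h)   # prior before the next cue
    return posteriors

# Illustrative LLR sequence (made-up values): evidence switches sign halfway.
example_posteriors = update_belief([0.8, 1.1, 0.5, -1.2, -0.9], h=0.1)
```

With `h = 0.5` the transformed prior is always zero, so each cue is judged on its own; as `h` approaches zero the transform approaches the identity and evidence is accumulated almost perfectly.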

Latent behavioral model variables modulate pupil responses to individual cues.
(A) The effect of CPP on the derivative of the pupil diameter time course (preregistered), and (B) the effect of uncertainty on the pupil diameter time course, both time-locked to cue onset (Inferred condition) and quantified in terms of regression coefficients (Methods). Dashed lines, ±1 standard error of the mean. Bold lines on the axis, intervals in which the data were significantly different from zero. Significant differences from zero were assessed in (A) with a non-parametric permutation test with false discovery rate correction (two-sided, q <= 0.05). No points in (B) were significant using this preregistered test. The significance shown in (B) instead corresponds to an alternative test against zero that accounts for the temporal structure of the data: a permutation cluster-based T-test with threshold-free cluster enhancement (p <= 0.05; two-sided; see Methods).

Decoding of prior belief about the context, and incoming evidence regarding the context, relative to the onset of cues in the Inferred condition.
Decoding results are plotted separately for 22 cortical regions (homotopic regions from both hemispheres combined), and were produced using source activity estimates (Methods). Cross-validated performance was evaluated through the correlation between decoder output and the values of the target variables (prior or evidence). Regional decoding performance is shown on a reconstruction of the left cortical surface for visualization; decoding is always based on activity patterns from both hemispheres. (A) Decoding of log-prior ratio with effects of log-likelihood ratio (LLR) regressed out. (B) Decoding of evidence (i.e. LLR) with effects of log-prior ratio regressed out. (C) The difference in decoding performance for prior (residuals) and decoding performance for evidence (residuals). That is, the difference between (A) and (B). (D) The difference in decoding performance between the log-prior ratio after being updated with the current cue, and the log-prior ratio before being updated with the current cue. Positive values indicate that the updated prior is more strongly decoded than the old prior. (E, F) Temporal decoding generalization matrices for prior (residuals) and evidence (residuals) in the primary visual and inferior parietal regions, showing the cross-validated performance of decoders trained on data from a specific time point, evaluated on data from all other time points. In all panels, only regions (A-D) or time points (E, F) with decoding significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05; see Methods).
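
The two ingredients named in this caption, a decoding target with the effect of another variable regressed out and evaluation by correlation, can be sketched as follows. This is an illustration only: it assumes a simple least-squares regression with an intercept, and the function names and example values are hypothetical. The procedure actually used is described in Methods.

```python
import math

def residualize(target, nuisance):
    """Remove the linear effect of `nuisance` from `target` (least squares with intercept)."""
    n = len(target)
    mean_t = sum(target) / n
    mean_x = sum(nuisance) / n
    sxx = sum((x - mean_x) ** 2 for x in nuisance)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(nuisance, target))
    slope = sxt / sxx
    return [(t - mean_t) - slope * (x - mean_x) for x, t in zip(nuisance, target)]

def pearson_r(a, b):
    """Pearson correlation, used here as the decoding performance measure."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up values for five cues: log-prior ratio, LLR, and held-out decoder output.
log_prior = [1.0, 3.0, 2.0, 5.0, 4.0]
llr = [0.0, 1.0, 2.0, 3.0, 4.0]
decoder_output = [-0.3, 0.6, -0.8, 1.0, -0.5]

prior_residuals = residualize(log_prior, llr)            # target as in panel (A)
performance = pearson_r(decoder_output, prior_residuals)  # cross-validated score
```

By construction the residualized target is uncorrelated with the variable that was regressed out, so above-zero performance cannot be attributed to a linear dependence on that variable.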

Decoding of prior belief about the context, and incoming evidence regarding the context, relative to responses (i.e. button presses) in the Inferred condition.
Decoding was performed and evaluated analogously to that in Figure 4. (A-C) Decoding performance was evaluated through correlation. Only regions where decoding was significantly different from zero are shaded. (A) Decoding of log-posterior ratio (LPR) with effects of log-likelihood ratio (LLR) regressed out. (B) Decoding of evidence (i.e. LLR) with effects of LPR regressed out. (C) The difference in decoding performance for posterior (residuals) and evidence (residuals). That is, the difference between (A) and (B). (D, E) Decoding generalization matrices for posterior (residuals) and evidence (residuals) in the primary visual and inferior parietal regions. Only regions or time points with decoding significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05). Further details in Methods.

Decoding of prior in different frequency bands.
Decoding was performed relative to the onset of cues in the Inferred condition. It was performed separately for different brain regions, now using source estimates of spectral power in two frequency bands (5 ± 2.5 Hz and 10 ± 2.5 Hz) instead of the broadband time-domain signal. Otherwise, decoder training and evaluation were analogous to those used in Figure 4. The target of the decoding was log-prior ratio with effects of log-likelihood ratio (LLR) regressed out. (A) Decoding using alpha power (10 Hz). (B) Decoding using power at 5 Hz. (C) The difference in decoding performance when using alpha power and when using 5 Hz. That is, the difference between (A) and (B). Only regions with decoding significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05). Further details in Methods.

Decoding of change point probability (CPP) and uncertainty relative to the onset of cues in the Inferred condition.
Decoding was performed and evaluated analogously to that in Figure 4, using estimates of source activity without time-frequency analysis applied. (A) Decoding of CPP. (B) Decoding of uncertainty. (C) The difference in decoding performance for CPP and uncertainty. That is, the difference between (A) and (B). Only regions with decoding significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05). Further details in Methods.

The preregistered comparisons of the correlated variability analysis.
Our primary prediction was that when a specific context was active, spontaneous fluctuations in activity in visual representations would be associated with fluctuations in activity in motor representations, in a manner that mirrored the stimulus-to-response mapping rule in that context (see Methods; van den Brink et al., 2023). We measured this association by training decoders on visual and motor representations and correlating the outputs of visual decoders (“vis”) and action decoders (“act”). (A and B) Positive correlations indicate correlated variability consistent with Context 1, and negative values indicate correlated variability consistent with Context 2. (A) In the Instructed condition we did not observe the predicted difference between Context 1 and Context 2 in decoder output correlations (one-sided permutation T-test, N = 19, T = −1.48, p = 0.922), nor were the correlation values positive under Context 1 or negative under Context 2 (Table 2). (B) Similarly, in the Inferred condition we did not observe the predicted difference between the two contexts (one-sided permutation T-test, N = 19, T = 1.35, p = 0.0962), nor positive values under Context 1 or negative values under Context 2 (Table 2). A preregistered control analysis involved looking at the correlation between pairs of visual decoders and pairs of action decoders. In such pairs, both decoders are trained to decode the same target, so we expected the outputs to be positively correlated under all contexts and conditions (see Methods). These preregistered control tests were all significant (one-sided permutation T-tests against zero; p <= 0.05; Table 2).
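
For readers unfamiliar with one-sided permutation T-tests, the following is a minimal sketch of one common variant: a sign-flip test on per-participant differences (for example, the Context 1 correlation minus the Context 2 correlation). The sign-flip scheme, the function names, the number of permutations, and the example values are all assumptions made for illustration; the exact test used here is specified in Methods.

```python
import math
import random

def t_statistic(values):
    """One-sample T statistic against zero (assumes the values are not all identical)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)

def sign_flip_p_value(differences, n_permutations=2000, seed=0):
    """One-sided p-value for the hypothesis that the mean difference is > 0.

    Under the null, each participant's difference is equally likely to be
    positive or negative, so signs are flipped at random to build the null
    distribution of the T statistic.
    """
    rng = random.Random(seed)
    observed = t_statistic(differences)
    count = 0
    for _ in range(n_permutations):
        flipped = [d if rng.random() < 0.5 else -d for d in differences]
        if t_statistic(flipped) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)

# Made-up per-participant differences, for illustration only.
example_differences = [0.21, 0.35, 0.12, 0.40, 0.28, 0.19, 0.33, 0.25]
example_p = sign_flip_p_value(example_differences)
```

Because the test is one-sided, an effect in the direction opposite to the prediction yields a p-value close to 1, as in panel (A), where a negative T value is paired with p = 0.922.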

Bounds placed on model parameters during fitting, along with the initial range from which candidate start points for the fitting were drawn.

Behavioral model comparison (Inferred condition).
(A) Average accuracy in the Inferred condition across participants (solid lines represent ±1 standard error of the mean; SEM) and in the model fits (gray bars and error bars, which give +1 SEM across simulated participants). (B-E) Comparison of the goodness of fit of all models with plausible performance in (A) using AIC (B and C) and BIC (D and E). (B, D) Mean difference in AIC and BIC from the best fitting model and 95% confidence intervals on these differences. (C, E) The number of participants for whom each model was the best fitting model. Further details in Methods.
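
The two criteria follow their standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The sketch below shows how differences from the best fitting model could be computed for a single participant. The model names and numbers are hypothetical and are not taken from the fits reported here.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_observations) - 2 * log_likelihood

def differences_from_best(scores):
    """Subtract the lowest (best) score from every model's score."""
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}

# Hypothetical fits for one participant: (maximized log-likelihood, free parameters).
fits = {"model_a": (-410.2, 3), "model_b": (-405.9, 5)}
aic_scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
aic_differences = differences_from_best(aic_scores)
best_model = min(aic_scores, key=aic_scores.get)  # counted once per participant in (C, E)
```

Averaging such differences across participants gives the quantity plotted in (B, D), and tallying `best_model` across participants gives the counts in (C, E).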

Secondary preregistered statistical tests of the correlated variability analysis.
These tests were conducted on the correlations between decoder outputs and are described in the Methods. Figure 8 displays the associated data.

Alternative quantification of evidence weighting for higher-level decision in the Inferred condition.
Average evidence residuals favoring the context used, both in the participants (error bars), and in the miscalibrated h and b normative model (shading). Evidence residuals are a measure of evidence that is not correlated between time-points. When signed such that positive values indicate support for the used context, average evidence residuals will be positive at time-points used to determine the context, and zero at time-points that do not influence the choice (see Methods; Resulaj et al., 2009). Error bars and shading give ±1 standard error of the mean. Bold lines on the axis represent periods in which the participant data differed significantly from zero (two-sided permutation cluster-based T-test with threshold-free cluster enhancement, p <= 0.05).

The regression analysis from Figure 2, performed on the real data, but additionally examining effects of the cold pressor test.
This regression analysis was conducted in the same way as for the main figure, except that additional terms were included in the regression for the use of the cold pressor test (“ice water”; see Methods). This allowed us to plot effects with the cold pressor test (“ice water”) and without (“warm water”). Error bars correspond to ±1 standard error of the mean across participants. Unlike in the main figure, we tested for significant differences between the “ice water” and “warm water” conditions, indicated by bold lines on the x-axis (p <= 0.05; two-sided; permutation cluster-based T-test with threshold-free cluster enhancement; see Methods). Results were qualitatively and quantitatively similar under the two conditions. Only for the effect of LLR modulated by uncertainty was there a significant difference between the conditions.

The effect of CPP on the derivative of pupil diameter when not including a term for sensory difference (preregistered).
This variant of the CPP analysis presented in Figure 3 was conducted to test the robustness of that result. Dashed lines represent ±1 standard error of the mean. Bold lines on the axis represent periods in which the data differed significantly from zero (non-parametric permutation test, with false discovery rate correction, two-sided, q <= 0.05). Further details in Methods.

Cortical distribution of decoding performance using a fine-grained cortical parcellation (180 areas per hemisphere) as defined by Glasser et al. (2016).
Beyond this, the analysis and plotting are identical to those used in the corresponding subplots of Figure 4. Decoding is again performed based on activity patterns in homotopic areas from both hemispheres, and decoding performance is displayed on the left hemisphere for visualization. Only regions with decoding significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05).
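
With this many cortical areas, the false discovery rate correction determines which areas are shaded. As an illustration, the sketch below implements the Benjamini-Hochberg step-up procedure, a widely used FDR method. Whether this specific procedure was applied here is an assumption; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest p-value is <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that point.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Made-up p-values for four areas, for illustration only.
shaded = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], q=0.05)
```

The procedure is step-up: a p-value that fails its own threshold is still rejected if a larger p-value further down the ranking passes its threshold.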

Cortical distribution of decoding performance, relative to the decoding performance in primary auditory cortex (A1).
Analysis and plotting are identical to those used in the corresponding subplots of Figure 4, except that we computed, separately for each participant, time-point and cortical region, the difference between the decoding performance (measured using correlation) in the region under consideration and that in the A1 parcel. Hence, the figure shows differences in decoder performance from the performance achieved in A1. Only regions with decoding (differences) significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05).

Average y-gaze position and performance of prior belief decoders trained on all Inferred condition data but evaluated contingent on gaze position.
(A) Separately for each participant we computed average gaze position along the y-axis of the screen (i.e. along the vertical axis) in degrees of visual angle (DVA). This was performed separately depending on the context supported by the prior, using Inferred condition data from the pre-cue time window. The histogram represents the distribution across participants. (B, C) Decoding was performed as in main Figure 4, with two exceptions. First, we excluded participants for whom there had been issues with the raw eye tracking image (four excluded; see Methods). Second, cross-validated performance was evaluated separately on two types of cue, “gaze-consistent” cues and “gaze-inconsistent” cues, based on median splits of both y-gaze position (during the pre-cue time window) and the prior residuals. “Gaze-consistent” refers to cues for which y-gaze position and the (residual) prior were on consistent sides of the median splits; for example, gaze was in the upper half of the visual field, and the prior was on the side of the median split that was associated with cues in the upper half of the visual field. “Gaze-inconsistent” refers to cues for which y-gaze position and the (residual) prior were on opposite sides of the median splits. Maps of cortical distribution for (B) gaze-consistent priors and (C) gaze-inconsistent priors. Only regions with decoding significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05). Further details in Methods.
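
The double median split used for (B, C) can be sketched as below. Several details are assumptions made for illustration: positive gaze values are taken to mean the upper half of the screen, values exactly at a median are assigned to the lower side, and the hypothetical argument `upper_field_sign` states which sign of the prior residual is associated with cues in the upper visual field.

```python
from statistics import median

def label_gaze_consistency(gaze_y, prior_residuals, upper_field_sign=1):
    """Label each cue as gaze-consistent or gaze-inconsistent.

    `upper_field_sign` is +1 if positive prior residuals favor the context whose
    cues appear in the upper visual field, and -1 otherwise (an assumed convention).
    """
    gaze_split = median(gaze_y)
    prior_split = median(prior_residuals)
    labels = []
    for gaze, prior in zip(gaze_y, prior_residuals):
        gaze_upper = gaze > gaze_split
        prior_upper = (prior - prior_split) * upper_field_sign > 0
        labels.append("consistent" if gaze_upper == prior_upper else "inconsistent")
    return labels

# Made-up values for four cues (gaze in DVA, prior residuals in log-odds).
example_labels = label_gaze_consistency([-1.0, -0.5, 0.5, 1.0], [-2.0, 1.0, -1.0, 2.0])
```

Decoder performance would then be evaluated separately on the cues carrying each label, while the decoders themselves are still trained on all Inferred condition data.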

Modulation of evidence decoding by CPP or uncertainty.
Decoding was performed separately for different brain regions using source activity estimates computed from measured MEG signals, and cross-validated performance was evaluated in the same way as before. (A) Decoding of the interaction between the CPP and the LLR associated with the cue, with effects of LLR and log-prior ratio regressed out. (B) Decoding of the interaction between uncertainty before observing the cue and LLR, with the effects of LLR and log-prior ratio regressed out. Only regions with decoding significantly different from zero are shaded (assessed using two-sided one-sample T-tests corrected using false discovery rate; q <= 0.05). Further details in Methods.