Means and standard deviations of the demographic data.

Vineland: Vineland-II Adaptive Behaviour Scales; IBQ: Infant Behaviour Questionnaire; STAI: State-Trait Anxiety Inventory. Vineland and IBQ scores pertain to the infants, whereas STAI scores pertain to the parents. The number of participants who completed each questionnaire is reported separately.

Cued reversal paradigm – The pre- and post-reversal trials started with gaze-contingent orange and blue star cues that spun for 500ms once the infant looked at them.

This was followed by a 1000ms auditory cue whose tune differed between the pre- and post-reversal phases. The last 500ms of the auditory cue overlapped with the decision screen, which lasted 2000ms. In the decision screen, two white rectangles were displayed on either side of the screen. There were three ROIs: large left and right ROIs and a narrow ROI at the centre (the red dashed lines were not visible and are shown for illustrative purposes only). The left and right large ROIs were defined as the area from the left red line to the left edge and from the right red line to the right edge of the screen, respectively. The infants’ choices between the left and right ROIs were determined by dwell times: the ROI with the greater dwell time was registered as the infant’s choice. Trials in which the combined dwell time on the left and right ROIs was shorter than 500ms on the decision screen were repeated, and no animation was played on these trials. In the cartoon animation period, a clear or blurred animation played at the chosen ROI for 5000ms, depending on whether a correct decision was made. Correct and incorrect choices prompted the clear and blurred versions of the animation, respectively, and the audio was reversed in the blurred animation. Finally, there was a baseline period, which involved an animation of two concentric circles with an auditory tune; its duration was jittered between 5000 and 7000ms. For illustrative purposes, we assumed that the left ROI was chosen on the first trial, thus setting the correct side of the screen as the left and right ROIs in the pre- and post-reversal phases, respectively.
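The dwell-time choice rule in this caption can be sketched in a few lines of Python. This is a minimal illustration rather than the task code: the function name `classify_choice` and its arguments are hypothetical, and the caption does not say how exact ties were handled, so the sketch assumes a tie is treated like a repeated trial.

```python
from typing import Optional

MIN_COMBINED_DWELL_MS = 500  # threshold stated in the caption


def classify_choice(left_dwell_ms: float, right_dwell_ms: float) -> Optional[str]:
    """Return 'left' or 'right', or None when the trial would be repeated."""
    if left_dwell_ms + right_dwell_ms < MIN_COMBINED_DWELL_MS:
        # Too little looking at either large ROI: repeat, no animation.
        return None
    if left_dwell_ms == right_dwell_ms:
        # Assumption: tie handling is not described in the caption.
        return None
    return "left" if left_dwell_ms > right_dwell_ms else "right"
```

Dwell time on the narrow central ROI is deliberately ignored here, since the caption defines the choice only in terms of the two large ROIs.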

Behavioural measures – The panels on the left show behavioural measures across trials where the first and last 18 trials correspond to blocks 1 and 2, respectively.

The red dashed lines indicate the final trial before the reversal in each block, while the solid black line at the 18th trial indicates the end of the first block. The panels on the right illustrate the pairwise differences in behavioural measures with respect to different contrasts (see the legends at the bottom). A) The right panel shows average accuracy in the pre- and post-reversal phases. B) These panels show the average perseveration, defined in terms of consecutive incorrect responses in the same context (pre- or post-reversal). The middle panel compares perseveration in the pre- vs post-reversal phases averaged across blocks, whereas the right panel compares block-averaged perseveration (i.e., blocks 1 vs 2). C) The green and red shades in the left panel show the dwell time on correct and incorrect ROIs in the decision period, respectively. The right panel shows the dwell time difference between the correct and incorrect ROIs in the pre- and post-reversal phases, averaged across blocks. D) The green and red shades in the left panel show the dwell time on the displayed animation following correct and incorrect responses, respectively. Because the clear and blurred versions of the animation are displayed following correct and incorrect responses, respectively, the green and red shades correspond to the clear and blurred animations. The middle panel shows pre- vs post-reversal dwell time on the displayed animation, averaged across blocks. The right panel shows the dwell time on the animation following correct (clear animation) and incorrect (blurred animation) responses.

Behavioural responses – This table shows the significant pairwise differences between behavioural measures for the contrasts i) post- vs pre-reversal, ii) block 1 vs 2, and iii) correct vs incorrect trials.

Model comparison – A) There are four competing models at this stage, namely i) simple – keep experience, ii) simple – discard, iii) simple – no context and iv) complex.

The upper panel shows that simple – discard has more evidence than the other models, and the lower panel shows that this model has a posterior probability of nearly 1. B) The winning model of the previous stage, simple – discard, is refitted to the data, excluding one component of expected free energy at a time. The competing models at this stage are i) expected free energy (EFE), ii) no novelty, iii) no epistemic and iv) no extrinsic. The no novelty model has more evidence than the rest and similarly has a posterior probability of nearly 1. C) The winning model of the second stage enters this stage as a parent model, where the child models are defined by turning off parameters of the parent. There are 2^4 = 16 child models because each of the four parameters of the parent model can be either included (on) or excluded (off). The bottom left panel shows the model space, where the dark and white colours indicate which parameters are on or off for a given model. The top left and right panels compare models in terms of their evidence and posterior probabilities, respectively. The middle left and right panels show the maximum a posteriori (MAP) parameter estimates (in log space) at the group level before and after applying Bayesian model reduction, respectively. The posterior probabilities of parameters are presented in the bottom right panel. Parameters with higher posterior probabilities have greater evidence for inclusion in the model.
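The size of the model space in panel C follows from switching each of four parameters on or off independently. A minimal sketch of that enumeration is given here; the parameter names `p1` to `p4` are hypothetical placeholders, since the caption does not name the parameters, and the code only builds the on/off grid (it does not perform Bayesian model reduction).

```python
from itertools import product

# Hypothetical placeholder names; the caption does not name the four parameters.
PARAMETERS = ["p1", "p2", "p3", "p4"]


def enumerate_child_models(parameters):
    """Return every on/off combination of the parent model's parameters."""
    return [
        dict(zip(parameters, switches))
        for switches in product([True, False], repeat=len(parameters))
    ]


child_models = enumerate_child_models(PARAMETERS)
print(len(child_models))  # 16
```

The first entry has every parameter switched on, which corresponds to the full parent model; the last has every parameter switched off.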

Parameter estimates and correlations

A) Individual parameter estimates and the correlation between the parameters are displayed on the left and right panels, respectively. B) The group-level means of parameters and their posterior probabilities are shown on the panels on the left and centre. The correspondence between the model’s and infants’ choices is presented in terms of decision fit on the right panel. C) Two datasets were generated per infant using their individual parameter estimates. The panel on the left shows the accuracy in the original dataset, whereas the right panel shows the average accuracy (over datasets and infants) in the simulated dataset. D) This panel presents the correlations between the original parameter estimates and the estimates obtained by fitting the model to the simulated dataset. Parameters are presented in log space.

Pupil size and prediction error

A) Normalised pupil size and prediction errors are plotted across trials on the left panels. The differences between the average post- and pre-reversal responses are plotted as a function of the number of trials (n) used before (pre-reversal) and after (post-reversal) the contextual switch. The average response in the pre-reversal phase is obtained by computing the mean response over the n trials preceding the switch trial. Similarly, n trials starting from the switch trial are used to compute the average response in the post-reversal phase. The differences between the two are reported for the pupil size PupilPostPre and prediction errors PEPostPre as a function of n on the right panels. B) The top panel shows that there is a moderate correlation between PupilPostPre and PEPostPre for n = 3. The bottom panels show that there are significant differences between the post- and pre-reversal pupil size and prediction errors for n = 3. C) The average difference between the responses on incorrect and correct trials is computed for pupil sizes PupilIncCor and prediction errors PEIncCor. The top panel shows a significant correlation between PupilIncCor and PEIncCor. Excluding the outlier infant (see the red circle) still yielded a significant correlation. The bottom panels show that there is an increase in the pupil size and prediction errors on incorrect trials. However, this increase is only significant for the prediction error. The error bars show the standard error of the mean.
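The post-minus-pre difference described in panel A can be illustrated with a short Python sketch. The function name `post_minus_pre` and its arguments are hypothetical, and the sketch assumes that the post-reversal window includes the switch trial itself, which is one reading of "n trials from the switch trial".

```python
from statistics import mean


def post_minus_pre(responses, switch_index, n):
    """Mean of n trials starting at the switch minus mean of the n trials before it."""
    if n < 1 or switch_index - n < 0 or switch_index + n > len(responses):
        raise ValueError("not enough trials on one side of the switch")
    pre = responses[switch_index - n:switch_index]    # n trials before the switch
    post = responses[switch_index:switch_index + n]   # assumption: switch trial included
    return mean(post) - mean(pre)
```

Calling the function once per value of n, for either pupil size or prediction error, gives the curves plotted as a function of n.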

Infant clusters

A) Applying k-means clustering returned two clusters of infants, represented by the yellow and magenta colours. The top left panel shows the clusters in the parameter space after min-max normalisation, where the cluster centres (centroids) are shown with star symbols. The differences between the two clusters manifest behaviourally in terms of accuracy, perseveration, and dwell time difference between the correct and incorrect ROIs in the decision period. The lower panels compare the clusters in the pre-reversal (light colours) and post-reversal (dark colours) phases. B) These panels show cluster-wise differences in questionnaire scores: the Vineland subscales (Talking, Caring and Playing), the total Vineland score, and the Surgency subscale of the Infant Behaviour Questionnaire (IBQ).
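For reference, min-max normalisation rescales each parameter to the range 0 to 1 so that no single parameter dominates the distances used by k-means. A minimal sketch for one parameter is given here; the function name is hypothetical, and mapping a constant column to zeros is an assumption made only to avoid division by zero.

```python
def minmax_normalise(values):
    """Rescale a list of numbers linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information and maps to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In a clustering pipeline this would be applied to each parameter separately, across infants, before running k-means.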

Cluster-wise differences

A) GLMM analysis of behavioural measures returned a significant interaction effect between the phase (pre- or post-reversal) and cluster identity on accuracy, perseveration and dwell time difference (decision period). This table shows the significant pairwise differences between clusters in the pre- and post-reversal phases. B) Multiple linear regression was used to test if the cluster identity and age explained questionnaire scores. The significant coefficients for cluster identity and age are reported in this table. The reference cluster is magenta for the cluster coefficient βCluster, and negative coefficient estimates indicate reduced scores for the yellow cluster. Age is converted to months for the regression analysis.

Comparisons between clustering solutions

The upper panel compares clustering solutions derived from behavioural measures and parameters when tested on the questionnaire data. The lower panel compares clustering solutions obtained with the questionnaires and parameters when tested on the behavioural measures. The entries in the table are the variance ratio criterion (VRC) estimates. Higher VRC indicates better separation across clusters.
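The variance ratio criterion (also known as the Calinski-Harabasz index) is the between-cluster dispersion divided by the within-cluster dispersion, each scaled by its degrees of freedom: VRC = [B / (k - 1)] / [W / (N - k)] for N points in k clusters. The following pure-Python sketch shows the computation; the function name and data layout are hypothetical, and it is not the analysis code behind the table.

```python
def variance_ratio_criterion(points, labels):
    """Calinski-Harabasz index for points given as equal-length tuples."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least two clusters and more points than clusters")
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0  # B: size-weighted squared distance of centroids to the overall mean
    within = 0.0   # W: squared distance of points to their own centroid
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    if within == 0.0:
        return float("inf")  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))
```

Because the labels and the points are passed separately, cluster labels derived from one set of measures (for example, model parameters) can be scored on another (for example, questionnaire data), which is the kind of comparison the table describes.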