Task Design and Pupil Dilation Signal.

A. Upper part: Baseline-corrected change in pupil dilation during a trial (mean signal and 95% confidence intervals estimated with a local polynomial regression). The grey-shaded area indicates the baseline and the unshaded area indicates the time window of interest, in which infants can predict whether they will receive information. Lower part: Informative and uninformative trials. After a fixation stimulus, four identical static shapes (i.e., the cue) were presented. The border type of the shapes (pointy vs. smooth) predicted whether their subsequent movement was informative. In informative trials, all four shapes moved to one corner of the screen, signalling the location of the reward. In uninformative trials, each shape moved to a different corner of the screen. After the shapes moved back to the centre and glowed twice, a cartoon animal was presented as a reward. B. Example of the cues presented in the first 25 trials. Border type and colours were counterbalanced across participants. After 17 trials, new shapes were added. From that moment onwards, either familiar or novel shapes were presented as cues on each trial.
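
The baseline correction mentioned in panel A can be illustrated with a minimal sketch. It assumes a subtractive correction (the mean pupil size in the baseline window is subtracted from the whole trace), which is common practice but is not confirmed by the caption. The function name, the window boundaries, and the sample values are hypothetical.

```python
import numpy as np

def baseline_correct(time_ms, pupil, baseline_window):
    """Subtract the mean pupil size in the baseline window from the trace.

    Assumption: subtractive correction; the original analysis may differ.
    """
    time_ms = np.asarray(time_ms, dtype=float)
    pupil = np.asarray(pupil, dtype=float)
    start, end = baseline_window
    # Samples that fall inside the (hypothetical) baseline window.
    in_baseline = (time_ms >= start) & (time_ms < end)
    baseline = np.nanmean(pupil[in_baseline])
    return pupil - baseline

# Hypothetical trace: four baseline samples followed by four samples of interest.
time_ms = np.arange(0, 800, 100)
pupil = np.array([3.0, 3.2, 3.1, 2.9, 3.5, 3.6, 3.4, 3.3])
corrected = baseline_correct(time_ms, pupil, baseline_window=(0, 400))
```

After correction, the baseline samples average to zero, so values in the time window of interest can be read directly as change from baseline.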

Pupil dilation during informative and uninformative events.

A. Pupil change during the predictive time window as estimated by the Bayesian additive models, with informative trials in red and uninformative trials in blue. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimate. The x-axis displays time in milliseconds and the y-axis shows the estimated pupil change from baseline. B. Overall, the estimated pupil change was lower in informative trials than in uninformative trials. C. The difference between the conditions developed over trials, as infants learned which stimuli were informative. Across trials, the pupil constricted more in informative trials, while it remained unchanged in uninformative trials.

Model comparison.

A. Estimated pupil change over trials as predicted by the linear and TD-learning models. The linear model is a purely statistical model, while the TD-learning model also makes assumptions about the underlying cognitive mechanisms. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimate. B. Model comparison was performed by comparing WAIC scores. The TD-learning model (in green) had a lower WAIC score, indicating better performance. The elpd difference (in blue) offers a direct comparison between the two models, showing that the TD-learning model was significantly better than the linear model. The error bar represents the standard deviation of the elpd difference.
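
To make the contrast with the linear model concrete, the sketch below shows a simplified single-step temporal-difference (delta-rule) update of a cue's value. It is an illustration only: the exact TD-learning model is not specified in this caption, and the function name, the 0/1 outcome coding, and the initial value of 0 are assumptions. The learning rate of 0.19 is the value reported for the AIC analysis.

```python
def td_values(outcomes, learning_rate, initial_value=0.0):
    """Return the cue value held at the start of each trial.

    Assumption: a single-step TD (delta-rule) update,
    V <- V + learning_rate * (outcome - V).
    """
    values = []
    v = initial_value
    for outcome in outcomes:
        values.append(v)  # value before this trial's outcome is observed
        prediction_error = outcome - v
        v = v + learning_rate * prediction_error
    return values

# Hypothetical coding: 1.0 when the cue is followed by information, 0.0 otherwise.
informative = td_values([1.0] * 15, learning_rate=0.19)
uninformative = td_values([0.0] * 15, learning_rate=0.19)
```

Under these assumptions the value of the informative cue rises quickly in early trials and then saturates, whereas a linear model would predict a constant change per trial. The value of the uninformative cue stays at its initial level.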

Pupil dilation during different learning moments.

The estimated pupil change is displayed as a function of cue type (informative/uninformative) and learning phase (before learning/after learning/generalization). As expected, before learning (i.e., trials 1 to 4), there was no difference in pupil size between the informative and uninformative trials. After learning (i.e., trials 12 to 15), infants showed a more constricted pupil in informative trials than in uninformative ones. The same pattern was found in the generalization trials (i.e., trials 18 to 21), suggesting that infants were able to generalize their knowledge to novel, unseen stimuli. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimates.

Pupil size trends across conditions.

Slope estimates by condition derived from the Bayesian additive model applied to the predictive time window. The graph shows a negative slope in the informative condition, indicating a consistent decrease in pupil size over multiple trials. In the uninformative condition, the slope is not different from zero, indicating no significant change in pupil size across trials.

Latency to the reward location.

The estimated normalized latency to the reward location is shown as a function of cue type (informative/uninformative). The informative trials are plotted in red, while the uninformative trials are plotted in blue. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimate.

AIC values.

AIC values of the additive models fitted on TD-learning estimates extracted using different learning rates. The figure shows that the model with the lowest AIC was obtained using a learning rate of 0.19.
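
The selection procedure can be sketched as a grid search over learning rates. This is a hedged illustration, not the original analysis: an ordinary least-squares line stands in for the additive models, the pupil data are synthetic and generated inside the example, and all names (`td_values`, `gaussian_aic`, `aic_by_rate`) are hypothetical.

```python
import numpy as np

def td_values(n_trials, learning_rate):
    """Cue value before each trial under a simple delta-rule update (assumed)."""
    v, values = 0.0, []
    for _ in range(n_trials):
        values.append(v)
        v += learning_rate * (1.0 - v)
    return np.array(values)

def gaussian_aic(y, predictor):
    """AIC of a least-squares line y ~ predictor with Gaussian errors.

    Stand-in for the additive models; k counts slope, intercept, and variance.
    """
    slope, intercept = np.polyfit(predictor, y, 1)
    rss = float(np.sum((y - (slope * predictor + intercept)) ** 2))
    n, k = len(y), 3
    return n * np.log(rss / n) + 2 * k

# Synthetic pupil change (illustration only, not real data).
rng = np.random.default_rng(0)
n_trials = 15
y = -0.3 * td_values(n_trials, 0.19) + rng.normal(0.0, 0.01, n_trials)

# Fit one model per candidate learning rate and keep the lowest AIC.
rates = [round(float(r), 2) for r in np.linspace(0.01, 0.50, 50)]
aic_by_rate = {r: gaussian_aic(y, td_values(n_trials, r)) for r in rates}
best_rate = min(aic_by_rate, key=aic_by_rate.get)
```

Plotting `aic_by_rate` against the learning rate would give a curve of the kind shown in the figure, with the selected rate at its minimum.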

AIC values in relation to the learning rate.

AIC values of models fitted on TD-learning estimates extracted using different learning rates.