Introduction

“Information does not stream into us from the environment. Rather, it is we who explore the environment and suck information from it actively, like food.”

The ability to find valuable information is essential for successfully navigating our complex environment. In today’s information-rich world, this can be a challenge, and failure to do so can result in being misinformed, misguided, or even deceived (Case, 2007; Hwang and Jeong, 2023). The need to distil useful information from the environment is particularly evident when considering the experience infants have in their first year of life. Although infants’ experience of the world is vastly different from our own, they also encounter a vast amount of novel input they have to make sense of (Johnson and Hannon, 2015). To do so, they have to learn to detect relevant stimuli amidst a constant stream of multisensory input (Hunnius, 2022). As they learn to interact with their environment, infants develop expectations about where and how to find valuable information. This enables them to focus their attentional resources on stimuli that are likely to provide useful information while ignoring irrelevant ones, resulting in more efficient learning. This process of information-seeking is the foundation for a lifetime of learning and discovery, but its origins are not yet well understood. Previous research has demonstrated that even infants as young as 8 months of age guide their attention based on the informativity of stimuli (Poli et al., 2020) and can identify which social partner is more reliable in delivering information (Tummeltshammer et al., 2014). Nevertheless, the cognitive mechanisms that enable infants to detect or infer where to find information have yet to be described. Here, we propose that statistical learning (Kirkham et al., 2002; Krogh et al., 2012) and generalization processes (Aslin, 2017; Kemp et al., 2007; Lake et al., 2015; Yuan et al., 2020) jointly support infants’ ability to learn where information can be found.

Research has shown that from early in life infants possess the ability to extract statistical regularities from streams of sounds or visual patterns (Krogh et al., 2012; Saffran et al., 1996). Saffran and colleagues demonstrated that 8-month-old infants can detect statistical regularities in a continuous auditory stream consisting of four three-syllable nonsense words repeated in random order (Saffran et al., 1996). The stream provided no acoustic cues to word boundaries except for the transitional probabilities between syllable pairs. At test, infants were able to discriminate between “words” and “nonwords” with the same syllables in a different order. This indicates that infants can extract and utilize statistical regularities from speech after only a short exposure. While previous research has demonstrated that infants can infer structures from statistical regularities in sensory stimuli, we propose that infants may also exploit this ability to learn regularities in informativity. In other words, infants may learn which input is consistently informative and use this knowledge to generate expectations about where to find information. Infants’ remarkable ability to generalize and apply learned knowledge to new situations has been demonstrated in several studies (Baram et al., 2021; Werchan et al., 2016, 2015). For example, Werchan and colleagues (2015) showed that 8-month-old infants could learn which feature (colour and shape) was informative about the location of a subsequent cartoon, as they became faster at predicting that location with their gaze over trials (Werchan et al., 2015). Importantly, the infants were able to apply this knowledge to new stimuli that followed the same rule, indicating their generalization ability. In this study, we investigated whether infants can use their generalization skills not only to predict stimuli but also the informativity of stimuli. 
This would allow infants to optimize their learning by focusing on informative stimuli and thereby drastically saving time and effort in the long run. Specifically, infants might use perceptual features of stimuli (like colour or shape) and generalize their learned expectations about informativity to novel stimuli that share the same perceptual features.
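The transitional-probability cue exploited in the Saffran et al. (1996) design can be made concrete with a short sketch. The syllables below are invented for illustration and are not the original stimuli; the function names and the stream length are likewise assumptions. Within a word, each syllable is always followed by the same syllable, whereas across word boundaries the next syllable varies.

```python
import random
from collections import Counter

# Hypothetical three-syllable nonsense words (NOT the original stimuli).
WORDS = [("ba", "du", "ki"), ("lo", "me", "pu"), ("ti", "go", "sa"), ("ne", "ro", "fi")]

def make_stream(words, n_words, seed=0):
    """Concatenate randomly chosen words into one continuous syllable stream."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n_words):
        stream.extend(rng.choice(words))
    return stream

def transitional_probabilities(stream):
    """P(next syllable | current syllable) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}

stream = make_stream(WORDS, n_words=300)
tp = transitional_probabilities(stream)

within_word = tp[("ba", "du")]                      # inside a word
across_boundary = max(p for (a, _), p in tp.items() if a == "ki")  # after a word-final syllable
```

In this toy stream the within-word probability is 1, while the probabilities after a word-final syllable are spread over the possible following words.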

To test our hypotheses, we designed a novel experimental paradigm where infants could form expectations about the informativity of incoming stimuli. Infants were presented with static cues that varied in their border type (see Fig. 1B), indicating whether they would receive information about the location of a rewarding stimulus (see Fig. 1A). Then, the shapes moved in a way that either did or did not signal the location of a subsequent reward. Using an eye-tracker, we continuously measured infants’ pupil dilation during stimulus presentation, focusing on the response to informative and uninformative cues before the information itself was delivered. Previous research has shown that pupil dilation highly correlates with stimulus uncertainty (Joshi and Gold, 2020; Lavín et al., 2013; Preuschoff et al., 2011), which refers to the unpredictability or ambiguity surrounding a stimulus outcome, and with the expected information gain of a stimulus in a task (Zénon, 2019). In particular, it has been observed that pupil dilation increases when stimuli are more uncertain or less informative, and decreases when they are more predictable or highly informative (Joshi and Gold, 2020; Lavín et al., 2013; Preuschoff et al., 2011; Zénon, 2019). Consequently, we expected that infants’ pupil size would increase when presented with an uninformative cue, and decrease when presented with an informative cue, indicating that infants can predict whether they will receive information or not. We expected this effect to build up across trials, suggesting that infants gradually develop expectations regarding the informativity of the cues presented to them.

Task Design and Pupil Dilation Signal.

A. Upper part: Baseline-corrected change in pupil dilation during a trial (mean signal and 95% confidence intervals estimated with a Local Polynomial regression). The grey-shaded area indicates the baseline and the unshaded area indicates the time-window of interest, in which infants can predict whether they will receive information. Lower part: Informative and uninformative trials. After a fixation stimulus, 4 identical static shapes (i.e., the cue) were presented. The border type of the shapes (pointy vs. smooth) predicted whether their following movement was informative. In informative trials, all four shapes moved to one corner of the screen signalling the location of the reward. In uninformative trials, each shape moved to a different corner of the screen. After the shapes moved back to the centre and glowed up twice, a cartoon animal was presented as reward. B. Example of the cues presented in the first 25 trials. Border type and colours were counterbalanced across participants. After 17 trials, new shapes were added. From that moment onwards, on each trial either familiar or novel shapes were presented as cues.

In addition to assessing infants’ ability to learn and predict the informativity of upcoming stimuli, our study also tested infants’ capacity to generalize this knowledge to novel stimuli. To this end, cues with novel shapes were introduced after several trials; these followed the same rule that infants had seen so far (e.g., a pointy shape predictive of upcoming information and a smooth shape of no information). By analyzing infants’ pupil response to the familiar and novel cues, we aimed to gain insights into whether infants were able to extend their acquired knowledge instead of learning from scratch about the new cues. We expected infants to exhibit a similar pattern of pupil dilation in response to both novel and familiar cues, indicating that they generalized their acquired knowledge to new stimuli.

In summary, our study aimed to investigate infants’ early information-seeking abilities. Specifically, we examined how infants use domain-general abilities such as statistical learning and generalization to generate expectations about the informativity of upcoming stimuli. This ability is crucial to actively determine where to find information, thus highlighting how infants take an active role in their own learning to maximize information gains while minimizing time and effort.

Results

Infants learn the informativity of novel stimuli through statistical learning

We preprocessed infants’ pupillometry data (see Methods) and used Bayesian additive models to fit the pupil response time-locked to the cue onsets (see Fig. 2A). Additive models are a useful tool for capturing complex relationships between variables, as they can detect any type of nonlinear relationship between dependent and independent variables. Leveraging Bayesian models’ inherent ability to generate predictive distributions for parameter estimates, we investigated multiple aspects of the estimated pupil response. We identified a significant difference in pupil dilation between conditions (beta mean difference = 0.007, 89%HDI = [0.002, 0.01]). When infants were presented with informative cues, their pupil dilation was smaller than when they were presented with cues that were not informative (see Fig. 2B). This is consistent with the idea that pupil dilation decreases when uncertainty is lower (Joshi and Gold, 2020; Lavín et al., 2013; Preuschoff et al., 2011), and it is indicative of infants’ ability to detect which stimuli will lead to information. These findings are further supported by complementary analyses of infants’ looking behaviour (see Supplementary Materials).
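For readers unfamiliar with the 89% highest-density intervals (HDIs) reported here, the following is a minimal sketch of how such an interval can be obtained from posterior draws: it is the shortest interval that contains the requested share of the samples. The function name is an assumption, and the draws are synthetic stand-ins, not the study's posterior.

```python
import numpy as np

def hdi(samples, mass=0.89):
    """Shortest interval that contains a fraction `mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # number of samples the interval must hold
    if k < 2 or k > n:
        raise ValueError("not enough samples for the requested mass")
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k sorted samples
    i = int(np.argmin(widths))            # narrowest window wins
    return float(x[i]), float(x[i + k - 1])

# Synthetic draws for illustration only (NOT the study's posterior).
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.007, scale=0.003, size=8000)
low, high = hdi(draws, mass=0.89)
```

An effect is typically read as credible when the interval excludes zero, which is how the intervals in this section are interpreted.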

Pupil dilation during informative and uninformative events.

A. The Bayesian additive models estimated pupil change during the predictive time window, with informative trials in red and uninformative trials in blue. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimate. The x-axis displays time in milliseconds and the y-axis shows the estimated pupil change from baseline. B. Overall, the estimated pupil change was lower for informative trials compared to uninformative trials. C. The difference between the conditions developed over trials, as infants learned which stimuli were informative. Across trials, the pupil constricted more in informative trials while it remained unchanged in uninformative trials.

To study infants’ learning, we focused on the trial-by-trial change in the pupillary response to the cues (see Fig. 2C). This showed a significant interaction between condition (informative vs uninformative) and trial number (beta mean difference = 0.002, 89%HDI = [0.002, 0.003]). The interaction was driven by a gradual decrease in pupil size in the informative condition (estimated beta = -0.002, 89%HDI = [-0.002, -0.002]) over the course of multiple trials, while the pupil size in the uninformative condition remained unchanged (estimated beta = -3e-5, 89%HDI = [-3e-4, 2e-4]) (see Supplementary Figure 1). The pupil size difference between conditions thus gradually emerged as evidence accumulated.

The temporal dynamics of infant learning match the predictions of a reward-based reinforcement-learning model

While our data demonstrate that infants can learn where to find information, the mechanism underlying this learning remains unclear. As mentioned previously, statistical learning may enable infants to learn which stimuli are informative and which are not. However, our analysis has assumed a linear relationship between uncertainty and trials, without explicitly modelling this change in uncertainty over the course of the trials. As a result, the model assumed a gradual linear decrease in uncertainty associated with the cues as the number of trials increased. To gain a better understanding of the underlying learning process, we conducted an exploratory analysis to investigate whether the change in uncertainty aligned with the predictions of a temporal-difference (TD) learning model.

Temporal difference (TD) learning is a reinforcement learning algorithm (Gabriel and Moore, 1990; Sutton and Barto, 2012) that quantifies the temporal dynamics that enable an agent to predict rewards based on environmental cues. This is achieved by continually updating an agent’s predictions as new evidence becomes available, thus improving them over time. Recently, TD-learning has been employed as a biologically plausible implementation of statistical learning (Orpella et al., 2021). In our study, TD-learning predicts that, over time, infants shift the value that they attribute to the information (here the movement indicating the location of a reward stimulus) to the static cue which signals upcoming information. The term “value” refers to the relative importance or usefulness of a particular event or stimulus in achieving a goal, in this case, obtaining information about the location of the reward. Hence, when the movement is informative, the value of the cues should slowly increase over time, following a TD-learning function (see Fig. 4). Our results are consistent with these predictions, as we found a high correlation between infants’ pupil dilation and the expected reduction of uncertainty (i.e., the information gain) estimated by a TD-learning model (beta mean = -0.064, 89%HDI = [0.059, 0.069]). Crucially, this model performed better than the previous statistical model which assumed a linear decrease in pupil dilation over time (waic elpd difference = -37.7, waic standard error difference = 5.9). This suggests that infants learn where to expect information by shifting the value of the informative events to the predictive cues.

Model comparison.

A. Estimated pupil change over trials as predicted by the linear and TD-learning models. The linear model is a purely statistical model, while the TD model also makes assumptions about the underlying cognitive mechanisms. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimate. B. Model comparison was performed by comparing WAIC scores. The TD model (in green) had a lower WAIC score, indicating better performance. The elpd difference (in blue) offers a direct comparison between the two models, showing that the TD-learning model was significantly better than the linear model. The error bar represents the standard deviation of the elpd difference.

Pupil dilation during different learning moments.

The estimated pupil change is displayed as a function of cue type (informative/uninformative) and learning phase (before learning/after learning/generalization). As expected, before learning (i.e., trials 1 to 4), there was no difference in pupil size between the informative and uninformative trials. After learning (i.e., trials 12 to 15), infants showed a more constricted pupil in informative trials compared to uninformative ones. This pattern was also shown for the generalization trials (i.e., trials 18 to 21), suggesting that infants were able to generalize their knowledge to novel, unseen stimuli. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimates.

Infants generalize their acquired knowledge

After 17 trials, novel cues were introduced, which shared relevant (i.e., the type of shape) and irrelevant features (i.e., the colour) with the familiar cues. We tested whether infants exploited the informative features to quickly generalize the informativity to the novel cues. A Bayesian additive model showed that infants’ pupil dilation was reduced for novel cues. This was specific to those novel cues that shared the features of the familiar informative cues (estimated mean difference = -0.05, 89%HDI = [-0.062, -0.038]). The size of this effect approximated the difference between conditions that was observed for familiar stimuli (estimated mean difference = -0.067, 89%HDI = [-0.077, -0.057]). Crucially, this difference was not observable at the start of the task, when the familiar stimuli were first introduced (estimated mean difference = -0.007, 89%HDI = [-0.015, 0.001]). This suggests that two different learning processes were at play: one slower process allowed infants to learn where to find information, while a faster generalization process allowed them to apply what had been learned to novel instances.

Discussion

How do we learn to find information that helps us to successfully navigate this world? We approached this question from a developmental perspective, examining the cognitive roots that underlie our remarkable learning skills from the first year of life. This study offers novel insights into the mechanisms by which infants detect and predict informativity and thereby unravels the fundamental learning mechanisms that support information-seeking. Infants have exceptional learning abilities, allowing them to acquire vast amounts of knowledge in a short time (James, 2010; Westermann et al., 2010). Previous research suggested that infants preferentially deploy their cognitive resources on informative stimuli (Gottlieb et al., 2013; Poli et al., 2020). Specifically, infants can detect whether stimuli provide new information and become more likely to disengage when information content decreases. While this earlier work demonstrates that infants respond to the level of information provided by a stimulus, it remains unclear whether infants can predict the sources of new information. Such an ability would enable infants to strategically allocate attentional resources, focusing on stimuli likely to provide information and preparing for the acquisition of new knowledge, which would make learning more effective.

In this study, we investigated the ability of infants to distill the informativity of upcoming stimuli. We found a steady reduction in pupil size over trials, indicating that infants learned that specific cues predicted whether they would later receive information about the location of a reward. This discovery supports the growing body of evidence indicating that infants are proactive in shaping their learning environment by searching for and focusing on information-rich stimuli (Poli et al., 2023, 2020). For the first time, this demonstrates that infants do not only react to information but can learn about the informativity of a stimulus and anticipate information before it is available. This offers an explanation for other behavioural patterns previously observed in infants, like their enhanced attention towards social cues. By repeated exposure to social cues, such as eye contact (Csibra and Gergely, 2009), the mouth while speaking (Hunnius and Geuze, 2004; Lewkowicz and Hansen-Tift, 2012), or pointing gestures (Sodian and Thoermer, 2004), infants may learn to expect that they carry relevant information (Begus et al., 2016; Tummeltshammer et al., 2014; Zmyj et al., 2010).

We did not only show that infants can learn where to find information, but using a reinforcement learning (TD-learning) model, we also demonstrated that the changes in pupil dilation over trials were compatible with a shift of value from the informative event itself to the static cues predictive of the informativity. The TD-learning model performed better than a linear model, but it also comes with greater explanatory power, as it captures the specific learning mechanism behind this ability to build expectations of informativity. Our model assumes that information is valuable, but it remains agnostic as to why this is the case. One possibility is that information has an intrinsically positive value from birth, and this comes as a fundamental bias of the human brain to support information-seeking behaviours (Burda et al., 2018; Houthooft et al., 2016). Another compelling hypothesis is that the same TD-learning mechanism observed in our study is involved in shaping the value of information over time. This would predict that early in life, only rewards are valuable, not information. As information is often instrumental in obtaining rewards, the value of the reward may be progressively transferred onto information via TD-learning. For example, an infant may initially only value the immediate reward of food when presented with a spoonful of pureed fruit. However, over time, the infant may begin to associate certain sounds or movements with the delivery of food, such as the sound of a caregiver opening a jar of baby food or the sight of the spoon being brought to their mouth. Following the TD-learning approach, the infant may gradually begin to place value on these cues that predict the delivery of food, rather than solely on the immediate reward of the food itself.

Future studies are needed to investigate whether this mechanism is indeed at play very early in life, which would provide insight into the origins of the value placed on information. Future research is also needed to explore alternative computational models that may capture different learning strategies. By comparing these models to infant learning, we may gain a more comprehensive understanding of the specific mechanisms involved in early information-seeking abilities.

Finally, our study not only demonstrates that infants can learn to predict where to find information, but also that they can generalize this knowledge to novel stimuli. Their remarkable generalization abilities [17, 18] allow them to extend expectations about the informativity of familiar stimuli to novel stimuli which share relevant features. These findings suggest the presence of multiple learning processes in infants, with one being slower and more data-hungry, and the other being faster and relying on generalization.

This combination of fast and slow learning processes is key for effective learning, and a similar implementation in artificial agents may be essential to develop machines that learn and explore as humans do. Only recently have machines been endowed with a bias towards information that makes them experience novel information as rewarding in itself, leading to an improvement in learning speed and performance (Pathak et al., 2017). Yet, the abilities displayed by infants in our study, namely slow learning of informativity and fast generalization to novel stimuli, are still lacking in artificial agents. Given that these mechanisms are active from early on in humans, they may be fundamental to a successful implementation of efficient and flexible human-like learning.

In conclusion, our study sheds new light on the cognitive processes that underlie infants’ remarkable learning skills. Specifically, it identifies the ability to form expectations about the informativity of new stimuli as a fundamental aspect of their learning. This study offers an explanation of how infants can process and make sense of the vast amount of sensory information they are exposed to. Hence, it contributes to a mechanistic understanding of how infants develop sophisticated models of the social and physical world around them at such a breathtaking rate.

Materials and Methods

Participants

Forty-four full-term infants (mean age: 8.2 months, SD: 0.24; 22 males) were recruited from a database of interested families in the Nijmegen region, a middle-sized city in the Netherlands (ECSW-2021-096). Based on the number of trials left after trial rejection, 14 infants were excluded from the subsequent data analysis (see Preprocessing) resulting in a final sample of 30 infants (mean age = 8.3 months, SD = 0.19; 12 males).

Stimuli and Procedure

The aim of this study was to investigate infants’ ability to predict the informativity of a stimulus. To accomplish this, we created a set of four different shapes. Two shapes were pointy and two shapes were smooth. One of the border types (e.g., smooth) was used to signal informative trials and the other (e.g., pointy) to signal uninformative trials. The border types linked to informative and uninformative trials were counterbalanced between participants. Each shape could either be red or blue. The colour was pseudorandomized and it had no influence on the task.

Each trial of the experiment began with a fixation stimulus made of two concentric circles. Its colour and overall size matched the colour and size of the shapes presented later during the trial. Following a 2-second presentation of the fixation stimulus, four identical shapes were presented in the centre of the screen for 3 seconds. The identity of the shapes was sampled pseudorandomly from the four available shapes. This defined the “cue” which was predictive of whether or not infants would later receive information about where a cartoon animal would appear.

After the 3-second presentation of the cue, the shapes moved. Their movement pattern was dependent on the trial type. In informative trials, all four shapes moved to one corner of the screen (pseudorandomly selected on each trial). In uninformative trials, each shape moved to a different corner of the screen. After moving to the corners, the shapes returned to the centre of the screen. In total, this lasted 2 seconds. The shapes then remained static for 0.5 seconds, glowed (i.e. expanded and contracted) twice to capture back the attention of the infant for a total of 2 additional seconds, remained static for another 0.5 seconds, and then disappeared. Finally, 0.75 seconds after the disappearance of the shapes, a video of a cartoon animal appearing from a present box was displayed for 2 seconds. On each trial, the specific cartoon animal was selected pseudorandomly from a pool of four different animals. In the informative condition, the location where the cartoon animal appeared was the same as the one cued by the movement of the shapes, while in the uninformative condition, it was selected pseudorandomly. This design allowed us to separate the moment in which infants could predict that they would receive information (i.e. cue of informativity) from the moment the information was actually provided.
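The timing described above can be tallied in a short sketch, which makes the separation between cue onset, information onset, and reward onset explicit. The durations are taken from the text; the phase labels and the function name are assumptions introduced for this illustration.

```python
# Phase durations in seconds, as described in the text.
# Phase names are labels chosen for this sketch, not terms from the study.
PHASES = [
    ("fixation", 2.0),
    ("static_cue", 3.0),
    ("movement_and_return", 2.0),
    ("static_pause_1", 0.5),
    ("glow", 2.0),
    ("static_pause_2", 0.5),
    ("blank_interval", 0.75),
    ("reward_video", 2.0),
]

def phase_onsets(phases):
    """Return the onset time of every phase and the total trial duration."""
    onsets, t = {}, 0.0
    for name, duration in phases:
        onsets[name] = t
        t += duration
    return onsets, t

onsets, trial_duration = phase_onsets(PHASES)
# Time between cue onset and the start of the (un)informative movement:
cue_window = onsets["movement_and_return"] - onsets["static_cue"]
```

Under these durations the cue window spans 3 s, the reward video starts 10.75 s after trial onset, and a full trial lasts 12.75 s.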

After 17 trials, two shapes were replaced. Specifically, we replaced one pointy shape with a new pointy shape, and we replaced one smooth shape with a new smooth shape. Thus, from the 18th trial on, infants were presented with a combination of familiar shapes that they had seen before and a novel shape that they had not encountered before, while still maintaining the overall manipulation of the study. The purpose of this condition was to investigate if infants can generalize the rule they learned to novel stimuli, thereby examining their ability to generalize their expectations of informativity to new concepts and objects.

Preprocessing

PupillometryR software (Forbes, 2020) was used to preprocess the pupil size data. We first identified the relevant trials from the continuous data and removed segments with excessive noise. Specifically, trials that had fewer than 99% valid samples were excluded, resulting in the rejection of 1.34% of the total trials (13 trials were rejected). We then regressed the left and right pupils against each other and calculated the mean of the two pupils as the final indicator of pupil size. The data was downsampled to a rate of 20Hz by taking the median of each 50ms time-bin (Mathôt and Vilotijević, 2022).
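As an illustration of the downsampling step, the sketch below takes the median of each 50 ms bin, which yields a 20 Hz signal. It is a NumPy stand-in written for this example, not the PupillometryR routine; the function name and the 100 Hz input rate in the usage lines are assumptions.

```python
import numpy as np

def downsample_median(time_ms, pupil, bin_ms=50):
    """Median pupil size within consecutive bins of `bin_ms` milliseconds."""
    time_ms = np.asarray(time_ms, dtype=float)
    pupil = np.asarray(pupil, dtype=float)
    bins = np.floor(time_ms / bin_ms).astype(int)   # bin index of every sample
    out_t, out_p = [], []
    for b in np.unique(bins):
        vals = pupil[bins == b]
        out_t.append(b * bin_ms)                     # bin start time in ms
        # Ignore missing samples; a bin with no valid sample stays missing.
        out_p.append(np.nanmedian(vals) if np.any(~np.isnan(vals)) else np.nan)
    return np.array(out_t, dtype=float), np.array(out_p)

# Toy input: 10 samples at an assumed 100 Hz (one every 10 ms).
t = np.arange(0, 100, 10)
p = np.arange(1.0, 11.0)
t20, p20 = downsample_median(t, p)
```

The median is less sensitive than the mean to isolated outlier samples within a bin, which is one reason it is a common choice for pupil data.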

Additionally, to further ensure the quality of the data we rejected any trials that had less than 75% of the data, and participants who kept less than 80% of the trials. This resulted in a trial rejection of 46.9% (453 trials) in relation to the total trial number and a rejection of 13 participants. The final sample on which further analysis was conducted contained 30 participants. To reduce noise in the data, we applied a Hanning window with a degree of 11 samples. Linear interpolation was used to estimate missing pupil size samples to ensure data continuity (Jackson and Sirois, 2009). Samples were then baseline-corrected by subtracting the average of the final 500 ms of the fixation stimulus thus representing pupil size change in comparison to the baseline. Finally, we selected only the 3000ms of the cue time window for further analysis (Colizoli et al., 2018).
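The smoothing, interpolation, and baseline-correction steps can be sketched as follows. This is a simplified NumPy illustration rather than the PupillometryR implementation: function names are assumptions, gaps are interpolated before smoothing so that the convolution is well defined, and the edges are padded by repeating the first and last samples.

```python
import numpy as np

def interpolate_gaps(x):
    """Fill missing samples (NaN) by linear interpolation between valid neighbours."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    ok = ~np.isnan(x)
    x[~ok] = np.interp(idx[~ok], idx[ok], x[ok])
    return x

def hanning_smooth(x, n=11):
    """Moving average weighted by an n-sample Hanning window (n must be odd)."""
    w = np.hanning(n)
    w = w / w.sum()
    padded = np.pad(np.asarray(x, dtype=float), n // 2, mode="edge")
    return np.convolve(padded, w, mode="valid")   # same length as the input

def baseline_correct(x, t_ms, cue_onset_ms, baseline_ms=500):
    """Subtract the mean of the last `baseline_ms` before cue onset."""
    base = (t_ms >= cue_onset_ms - baseline_ms) & (t_ms < cue_onset_ms)
    return x - x[base].mean()

# Toy trial at 20 Hz: 500 ms of fixation followed by a 3000 ms cue window.
t_ms = np.arange(-500, 3000, 50)
raw = np.where(t_ms < 0, 4.0, 3.5)
corrected = baseline_correct(hanning_smooth(interpolate_gaps(raw)), t_ms, cue_onset_ms=0)
cue_window = corrected[(t_ms >= 0) & (t_ms < 3000)]
```

After correction, values in the cue window express pupil size relative to the end of the fixation period, so negative values indicate constriction.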

Analysis

To investigate the changes in pupil size, we used Bayesian mixed-effect additive models to analyze the preprocessed data. This was accomplished using the brms package (Bürkner, 2021, 2018, 2017). Additive models have been widely utilized in statistical literature as a powerful tool for data analysis, as they allow for flexible modelling of complex relationships between variables while maintaining interpretability and ease of estimation. They achieve this flexibility through smooth terms, which are flexible functions that allow for non-linear relationships between the predictor and the response variable. Furthermore, the utilization of a Bayesian framework allows for the expression of model uncertainty and the incorporation of prior information about the parameters. The application of additive models to pupillometry data is particularly advantageous as it captures nonlinear and complex relationships between different predictors and the pupil dilation response (Hershman et al., 2023; van Rij et al., 2019).

To properly model the fluctuation of pupil size over the time course of the trials, we included two smooth terms in each model. The first term modelled the differences in pupil fluctuation between conditions (informative and uninformative), while the second term modelled differences between subjects. Additionally, to account for residual variability between participants, we included a random intercept term in the models.

To ensure robust model fitting, each model was estimated using Markov Chain Monte Carlo (MCMC) sampling with 4 chains of 6000 iterations. We discarded 4000 iterations as warm-up and kept the remaining 2000 iterations for analysis. We specified weakly informative priors for the models to allow for flexibility in the estimation process. The detailed specification of the priors used can be found in the supplementary materials. By using this approach, we were able to model the complex relationships between the predictors and the pupil dilation response and to account for the random variation in the data. Convergence and stability of the Bayesian sampling were assessed using R-hat, which should be below 1.01 (Vehtari et al., 2020), and effective sample size (ESS), which should be greater than 1000 (Bürkner, 2017).
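To make the sampling settings and the convergence criterion concrete, the sketch below counts the retained draws and computes a basic split R-hat. This is the classic between/within-chain variance version, used here as a simplified stand-in; the diagnostic described by Vehtari et al. additionally rank-normalizes the draws. The function name is an assumption and the chains are synthetic.

```python
import numpy as np

def split_rhat(draws):
    """Classic split R-hat for an array of shape (chains, iterations)."""
    draws = np.asarray(draws, dtype=float)
    n = draws.shape[1]
    half = n // 2
    # Split every chain in two so that within-chain drift is also detected.
    halves = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n2 = halves.shape[1]
    within = halves.var(axis=1, ddof=1).mean()          # mean within-chain variance
    between = n2 * halves.mean(axis=1).var(ddof=1)      # variance of chain means
    var_plus = (n2 - 1) / n2 * within + between / n2    # pooled variance estimate
    return float(np.sqrt(var_plus / within))

n_chains, n_iter, n_warmup = 4, 6000, 4000
post_warmup_total = n_chains * (n_iter - n_warmup)      # draws kept for inference

# Synthetic chains for illustration only.
rng = np.random.default_rng(1)
mixed = rng.normal(size=(n_chains, n_iter - n_warmup))
stuck = mixed + np.array([[0.0], [0.0], [0.0], [5.0]])  # one chain in the wrong place

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
```

With the settings above, 8000 post-warm-up draws are retained in total. Well-mixed chains give values close to 1, whereas a chain that explores a different region inflates the statistic well beyond the 1.01 threshold.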

Effect of informativity

To investigate the impact of informativity on pupil size, we conducted a statistical analysis on all trials except the generalization trials. This was done to ensure that the results of the analysis were not confounded by the additional manipulation of the generalization trials.

To investigate the effect of informativity on pupil size change, we used an additive model to analyze the preprocessed pupillometry data (Thurman et al., 2019). Condition (informative vs uninformative), trial number, and their interaction were fixed factors. This approach allowed us to examine how baseline-corrected pupil size changed in relation to the predictability of the upcoming information and the trial number. Furthermore, by including the interaction term in the model, we were able to assess the development of the effect of condition on pupil size changes over time.

To further analyze the interaction between trials and the two conditions, we made use of the estimate_slopes function from the modelbased package (Makowski et al., 2020). This function enabled us to determine whether the slopes of the informative and uninformative trials were significantly different from zero, thus showing whether the change in pupil dilation over trials differed between the two conditions.

Ideal Learner

We employed a Temporal Difference (TD) learning algorithm to explore the mechanism underlying infants’ learning. TD learning is a reinforcement learning method that enables an agent to predict whether a reward will be delivered based on previous experience. When new evidence is observed, the model can update its predictions, hence improving over time (Gabriel and Moore, 1990; Sutton and Barto, 2012). Here, we used TD learning to predict whether information will be delivered. The model learns to predict the delivery of information based on the preceding cue stimulus. At the start of the task, the cues have not been associated with the information yet, and the initial value V of the cues is thus zero. On every timepoint t of every trial n, a prediction error (i.e., temporal-difference error, TDE) is computed as follows:
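In the notation of this section, with $I_{n,t}$ the information delivered and $V_{n,t}$ the value at timepoint $t$ of trial $n$, and assuming the standard undiscounted TD form (an assumption made here, since no discount factor is mentioned), the prediction error can be written as:

```latex
\[
\mathrm{TDE}_{n,t} = I_{n,t} + V_{n,t+1} - V_{n,t}
\]
```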

Note that TDE increases not only when information (In,t) is delivered unexpectedly, but also when information is expected to be delivered (Vn,t+1). This implies that cues preceding information increase in value across learning. This learning is implemented by updating the value associated with the cue with the prediction error:
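Written out under the assumption that the updated value of a timepoint is carried over to the same timepoint of the next trial, this update rule takes the form:

```latex
V_{n+1,t} = V_{n,t} + \alpha \, \mathrm{TDE}_{n,t}
```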

Where α indicates the learning rate, which is a free parameter that determines how much weight is assigned to the prediction error when updating the predictions. We expected Vn,t to correlate with the baseline-corrected changes in pupil dilation during the presentation of the static cue.
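The learning process described above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function name td_learn, the two-timepoint trial layout (cue, then information), the absence of discounting, and the number of trials are all assumptions made for illustration.

```python
def td_learn(information, alpha, n_trials):
    """Run a simple TD(0) learner over repeated, identical trials.

    information[t] is 1.0 if information is delivered at timepoint t, else 0.0.
    Returns one value vector per trial, holding the values at the start of that trial.
    """
    n_steps = len(information)
    values = [0.0] * (n_steps + 1)  # final entry is a terminal state fixed at 0
    history = []
    for _ in range(n_trials):
        history.append(values[:n_steps])
        for t in range(n_steps):
            # Prediction error: delivered information plus the value expected next,
            # minus the value currently predicted (no discounting assumed).
            tde = information[t] + values[t + 1] - values[t]
            values[t] += alpha * tde
    return history


# Hypothetical layout: timepoint 0 is the cue, timepoint 1 is the outcome.
informative = td_learn([0.0, 1.0], alpha=0.19, n_trials=17)
uninformative = td_learn([0.0, 0.0], alpha=0.19, n_trials=17)
```

In this sketch the value of the informative cue (the first entry of each value vector) starts at zero and grows across trials, whereas the value of the uninformative cue stays at zero, which mirrors the expected divergence between conditions.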

To identify the value of alpha that best fitted the infant data, we used a grid-search approach with 30 different values of alpha (from 0.01 to 0.30). Specifically, we analyzed the pupillary response data in relation to the Vn,t values obtained with the different learning rates using additive models (mgcv package; Anderson-Cook, 2007; Pedersen et al., 2018). Compared to brms, mgcv allowed for a quicker evaluation of which model would perform best. The mean pupil size was modelled based on the uncertainty values extracted from the TD-learning model. To account for fluctuations in pupil size, two smooth terms were included in each model: one to model differences in pupil fluctuation between conditions (informative and uninformative), and another to model differences between subjects.

The Akaike information criterion (AIC) was used to compare the models and identify the best-performing one. The lower the AIC, the better the model fits the data. The model with α = 0.19 performed best (see Supplementary figure 3). Hence, we ran a Bayesian additive mixed-effects model using an approach similar to that described in the previous paragraph, with the only difference being that the mean pupil size was modelled using the Vn,t values extracted from the TD-learning model instead of Condition and Trial number.
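The logic of the grid search can be sketched as follows. This is a simplified illustration, not the procedure used in the study: an ordinary least-squares regression stands in for the additive models fitted with mgcv, the smooth terms for condition and subject are omitted, and the helper names (cue_values, gaussian_aic, grid_search_alpha) as well as the two-step trial layout are assumptions.

```python
import numpy as np


def cue_values(n_trials, alpha):
    """Cue value at the start of each trial in a two-step TD(0) sketch (cue, then information)."""
    v_cue, v_out = 0.0, 0.0
    values = np.empty(n_trials)
    for n in range(n_trials):
        values[n] = v_cue
        v_cue += alpha * (v_out - v_cue)  # cue step: no information delivered yet
        v_out += alpha * (1.0 - v_out)    # outcome step: information delivered
    return values


def gaussian_aic(x, y):
    """AIC of a Gaussian linear regression of y on x (intercept, slope, residual SD)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    n, k = len(y), design.shape[1] + 1
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * log_lik


def grid_search_alpha(pupil, alphas):
    """Return the learning rate with the lowest AIC, plus all AIC values."""
    aics = {round(float(a), 2): gaussian_aic(cue_values(len(pupil), a), pupil)
            for a in alphas}
    best = min(aics, key=aics.get)
    return best, aics


ALPHAS = np.arange(1, 31) / 100  # 30 values, from 0.01 to 0.30
```

Calling grid_search_alpha with a vector of trial-wise mean pupil sizes and ALPHAS returns the candidate learning rate whose value estimates best account for the data under this simplified model.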

Finally, we compared the performance of the TD-learning model with that of the linear model using the waic function from the brms package (Bürkner, 2017). This function allows the comparison of the theoretical expected log pointwise predictive density (elpd) (Vehtari et al., 2017) of different models by returning the elpd difference and its standard error.
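For readers unfamiliar with this comparison, the quantities involved can be sketched as follows, following the definitions in Vehtari et al. (2017). The code is an illustration and not the brms implementation; the function names are hypothetical, and the inputs are assumed to be matrices of pointwise log-likelihoods with one row per posterior draw and one column per observation.

```python
import numpy as np


def pointwise_elpd_waic(log_lik):
    """Pointwise elpd under WAIC from a (n_draws, n_obs) log-likelihood matrix."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density, computed with a stable log-mean-exp.
    max_ll = log_lik.max(axis=0)
    lppd = max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = log_lik.var(axis=0, ddof=1)
    return lppd - p_waic


def compare_elpd(log_lik_a, log_lik_b):
    """Return the elpd difference (model A minus model B) and its standard error."""
    diff = pointwise_elpd_waic(log_lik_a) - pointwise_elpd_waic(log_lik_b)
    n_obs = diff.size
    return float(diff.sum()), float(np.sqrt(n_obs * diff.var(ddof=1)))
```

A positive difference favours model A; the difference is usually interpreted relative to its standard error.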

Generalization

To investigate infants’ ability to generalize the association learned between the border type and the upcoming information, we compared pupil dilation at the presentation of the static cue during generalization trials (trials 18 to 21) with pupil dilation at the beginning of the task, before learning had occurred (trials 1 to 4), and with pupil dilation later in the task, after learning had occurred (trials 12 to 15). To avoid possible confounds related to stimulus novelty and surprise, the very first trial of the study (trial 0) and the very first generalization trial (trial 17) were excluded. Participants who did not watch the generalization trials were excluded from this analysis, resulting in a sample of 19 infants.
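The trial selection described above can be expressed as a small helper. This is a sketch for illustration; the function name learning_phase and the phase labels are assumptions, while the trial ranges are those stated in the text.

```python
def learning_phase(trial):
    """Map a zero-based trial number to its analysis phase, or None if excluded."""
    if 1 <= trial <= 4:
        return "before learning"
    if 12 <= trial <= 15:
        return "after learning"
    if 18 <= trial <= 21:
        return "generalization"
    # Trial 0, trial 17, and all remaining trials are not part of this analysis.
    return None


# Trials retained for the generalization analysis.
selected = {t: learning_phase(t) for t in range(22) if learning_phase(t) is not None}
```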

We modelled baseline-corrected pupil size by including condition (informative vs uninformative), learning (before learning, after learning, generalization), their interaction, and trial number as fixed factors. This approach allowed us to examine the extent to which infants were able to generalize the rule they had learned to new, previously unseen trials. By comparing the baseline-corrected pupil size before and after learning the association between border type and informativity, as well as during the generalization trials, we could investigate whether infants were able to transfer the rule to novel situations and whether this transfer was reflected in the pupillary response. After fitting the model to the data, we explored the interaction between condition and learning using the estimate_contrasts function (Makowski et al., 2020). This enabled us to examine the difference for each contrast of this interaction.

Data Availability

The data and the analysis scripts are publicly available on OSF via: https://osf.io/tkzf9/?view_only=458ffe27e0b344c8ae519259b1ef630d

Supplementary material for: “Early roots of information-seeking: Infants predict and generalize the value of information”

Pupil Size Trends

Pupil size trends across conditions.

Slope estimates by condition derived from the Bayesian additive model applied to the predictive time window. The graph shows a negative slope in the informative condition, indicating a consistent decrease in pupil size over multiple trials. In the uninformative condition, the slope is not different from zero, indicating no significant change in pupil size across trials.

Latency to reward location

Analysis

In addition to analyzing the pupillary response to the cues, we also examined infants’ latency to look at the reward location. This analysis aimed to determine whether infants were using the cues’ movements to predict the reward location. Raw eye-tracking data were processed using identification by 2-means clustering (I2MC; Hessels et al., 2017), which allows for robust identification of fixations, especially in noisy data such as those from infants. After running I2MC, we delineated four areas of interest of 400 × 400 pixels around the target locations and extracted latencies using a Python script. Latencies were extracted between 1250 ms before and 1000 ms after the presentation of the reward. These latency values were then standardized for each participant by computing z-scores using the participant’s mean and standard deviation. Finally, we employed a Bayesian generalized linear mixed-effects model to investigate whether infants relied on the movement of the cues to guide their gaze to the location of the reward. Because the response variable was limited to 1000 ms after the reward appearance, the response distribution of the model was truncated using the trunc function (Bürkner, 2017). The model included condition (Informative vs Uninformative) and trial number as fixed factors. We also included a random intercept term in the model to account for individual subject variability. As for the models run in the main analysis, the model was estimated using MCMC sampling with 4 chains of 6000 iterations, with 4000 iterations as warm-up. The detailed specification of the priors used can be found in the Models’ priors paragraph.
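The windowing and standardization steps can be sketched as follows. This is not the script used in the study: the function name, the input layout (latencies in milliseconds relative to reward onset, grouped by participant), and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def standardize_latencies(latencies_by_participant, window=(-1250.0, 1000.0)):
    """Keep latencies inside the window and z-score them within each participant."""
    standardized = {}
    for participant, latencies in latencies_by_participant.items():
        kept = [x for x in latencies if window[0] <= x <= window[1]]
        if len(kept) < 2:
            continue  # a standard deviation needs at least two values
        m, s = mean(kept), stdev(kept)
        if s == 0:
            continue  # identical latencies cannot be standardized
        standardized[participant] = [(x - m) / s for x in kept]
    return standardized
```

Standardizing within participant removes individual differences in overall looking speed, so that the model compares conditions on a common scale.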

Results

Following preprocessing of infants’ eye-tracking data, we fitted Bayesian additive models to analyze latency to the reward location time-locked to the reward onsets (Fig. 2A). The analysis revealed a significant difference in latency between conditions (mean = 0.165, 89% CI = [-0.014, 0.347]). Infants were faster to look at the reward when they received information about its location than when they did not receive any information (Supplementary figure 2). These findings confirm infants’ ability to associate the movement of the cues with the location of the forthcoming rewards and to use this information to attend to the correct location.

Latency to the reward location.

The estimated normalized latency to the reward location is shown as a function of cue type (informative/uninformative). The informative trials are plotted in red, while the uninformative trials are plotted in blue. The shaded areas represent the standard error (darker) and 89% credible interval (lighter) of the estimate.

Learning rate grid search analysis

As mentioned in the main manuscript, the TD-learning estimates were extracted using different learning rate values. Specifically, we extracted the TD-learning estimates using learning rates ranging from 0.01 to 0.30. To determine the learning rate that best approximated that of our infant sample, we conducted a grid-search analysis. We applied additive models using the mgcv package (Anderson-Cook, 2007; Pedersen et al., 2018) to analyze the pupillary response data in relation to each set of TD-learning estimates. To account for fluctuations in pupil size over the course of trials, two smooth terms were included in each model: one to model differences in pupil fluctuation between conditions (informative and uninformative), and another to model differences between subjects. The Akaike information criterion (AIC) was used to compare the models and identify the best-performing one. Our results revealed that the best-performing model was the one based on the TD-learning estimates extracted using a learning rate of 0.19 (Supplementary figure 3, Supplementary table 1). Consequently, the analyses presented in the main manuscript were conducted using the TD-learning estimates obtained with this learning rate.

AIC values.

AIC values of the additive models fitted on TD-learning estimates extracted using different learning rates. The figure shows that the model with the lowest AIC was obtained using a learning rate of 0.19.

AIC values in relation to the learning rate.

AIC values of models fitted on TD-learning estimates extracted using different learning rates

Models’ priors

Pupil dilation

In the additive Bayesian models used to explore fluctuations in pupil size, we specified weakly informative priors for the fixed effects, smooth terms, and the intercept. The choice of these priors was based on our knowledge of the response variable (pupil dilation) and insights gained from data visualization.

The fixed effects (class ‘b’) have been assigned normal priors with a mean of 0 and a standard deviation of 0.2, reflecting our expectation that these effects are centred around zero, with a modest degree of uncertainty. This choice of prior represents a relatively weak constraint on the fixed effects, allowing the data to play a more substantial role in updating the posterior distribution.

For the standard deviations of the smooth terms (class ‘sds’), we employed Student’s t-distribution priors with 5 degrees of freedom, a mean of 0, and a scale parameter of 0.2. The use of a Student’s t-distribution with a small number of degrees of freedom allows for heavier tails than the normal distribution, providing a more robust estimate against potential outliers. Additionally, the choice of a scale parameter of 0.2 represents a relatively weak constraint on the smooth terms’ standard deviations, enabling the data to inform the uncertainty associated with these terms.

Lastly, the intercept (class ‘Intercept’) was assigned a Student’s t-distribution prior with 5 degrees of freedom, a mean of 0, and a scale parameter of 0.3. This choice of prior distribution for the intercept is similar to the one used for the smooth terms’ standard deviations, with the heavier tails providing robustness against potential outliers. The scale parameter of 0.3 reflects a modest degree of uncertainty about the intercept’s true value, allowing the data to update the prior distribution accordingly.

In addition to these specified priors, we retained the default priors provided by the brms package for other parameters of the model, such as the residual standard deviation (sigma), group-level standard deviations (sd), and the degrees of freedom parameter (nu) for the Student’s t family. The default priors utilized are as follows: a gamma distribution with shape and rate parameters set at 2 and 0.1 respectively for the nu parameter, and a Student’s t-distribution with 3 degrees of freedom, a mean of 0, and a scale parameter of 2.5 for both the sd and sigma parameters.
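For reference, the priors described in this section can be summarized compactly as below. The symbols are chosen here for illustration (they are not taken from the model code), and standard deviation parameters are understood to be constrained to positive values:

```latex
\begin{aligned}
\beta &\sim \mathrm{Normal}(0,\ 0.2) \\
\sigma_{\mathrm{smooth}} &\sim \mathrm{Student}\text{-}t(5,\ 0,\ 0.2) \\
\mathrm{Intercept} &\sim \mathrm{Student}\text{-}t(5,\ 0,\ 0.3) \\
\nu &\sim \mathrm{Gamma}(2,\ 0.1) \\
\sigma,\ \sigma_{\mathrm{group}} &\sim \mathrm{Student}\text{-}t(3,\ 0,\ 2.5)
\end{aligned}
```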

Latency

In the Bayesian model employed to investigate infants’ latency to look to the reward location, we utilized the default priors provided by the brms package for all the parameters, including fixed effects, group-level effects, and residual standard deviation.

For the fixed effects (class ‘b’), flat priors were used, which imply a uniform probability distribution over a wide range of values. The intercept (class ‘Intercept’) was assigned a default Student’s t-distribution prior with 3 degrees of freedom, a mean of 0.1, and a scale parameter of 2.5. For the group-level standard deviations (class ‘sd’), a Student’s t-distribution prior with 3 degrees of freedom, a mean of 0, and a scale parameter of 2.5 was employed. Lastly, the residual standard deviation (class ‘sigma’) was assigned a Student’s t-distribution prior with 3 degrees of freedom, a mean of 0, and a scale parameter of 2.5.