Conceptual models of looking to familiar vs. novel items according to three models of infant attention: (A) the partial encoding model proposed by Hunter and Ames [25], which suggests a shift from familiarity to novelty preferences, (B) the Goldilocks model proposed by Kidd et al. [29], which suggests that infants prefer to attend to intermediately surprising events, and (C) the ‘optimal curiosity’ model introduced by Cao et al. [9], which suggests that infants maximize expected information gain from noisy perceptual samples.

Three components of RANCH: (A) Perceptual representation: the first two principal components of stimulus embeddings from ResNet-50 final layer activations, (B) Learning model: plate diagram for Gaussian concept learning from noisy observations, (C) Decision model: RANCH repeatedly samples until the expected information gain (EIG) from the environment outweighs the EIG from another sample of the same stimulus.

Experimental design for Experiments 1 and 2, infants and adults. The vertical line separates the exposure phase from the test trial. Experiment 1 varied exposure to a stimulus and measured looking to familiar or novel stimuli. Experiment 2 varied how the familiar stimulus was violated, by changing the pose, identity, number, or animacy of the stimulus. Infants had fixed prior exposure durations (5 seconds per exposure) and looking was measured until a 2-second lookaway. Adults responded on each trial via a keypress to continue to the next trial.

Experiment 1, Infant and RANCH behavior (different linking hypotheses and lesioned models). The x-axis shows the number of prior exposures shown to infants and RANCH before measuring looking time / number of samples on the test trial, which was novel or familiar. The y-axis shows the mean looking time in seconds for the behavioral panel, and the scaled model samples for RANCH. The scaling procedure was applied to each linking hypothesis/lesioned model separately. We found evidence for habituation and dishabituation (after long prior exposures), but no evidence for familiarity preferences. Error bars show the standard error of the mean.

Experiment 1, Adult and RANCH behavior (different linking hypotheses and lesioned models). For adults and the corresponding RANCH simulations, looking time was measured on every trial, shown on the x-axis. The y-axis shows the mean looking time in seconds for the behavioral panel, and the scaled model samples for RANCH. As with infants, we found evidence for habituation and dishabituation, but no evidence for familiarity preferences. Error bars show the standard error of the mean.

Cross-validated RMSE across linking hypotheses and lesioned models. Mean RMSE is averaged across all parameter values; best RMSE is that of the single best-performing parameter set.
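
The distinction between mean and best RMSE can be illustrated with a minimal sketch. It assumes each parameter set yields one vector of model predictions aligned with the behavioral means; the function names and the dictionary layout are hypothetical, and the cross-validation splits themselves are not reproduced here.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize_rmse(predictions_by_params, observed):
    """Return (mean RMSE, best RMSE, best parameter label).

    predictions_by_params: hypothetical mapping from a parameter-set label
    to that parameter set's predictions for the same conditions as `observed`.
    """
    scores = {label: rmse(pred, observed) for label, pred in predictions_by_params.items()}
    mean_rmse = sum(scores.values()) / len(scores)   # averaged over all parameter sets
    best_label = min(scores, key=scores.get)         # single best-performing parameter set
    return mean_rmse, scores[best_label], best_label
```

Under this reading, a small gap between the mean and the best RMSE would indicate that fit quality depends little on the particular parameter values.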

Experiment 2, Infant and RANCH behavior. The x-axis shows how the familiar animal was violated and the y-axis shows the mean looking time in seconds / scaled model samples for RANCH during test trials following the exposure phase. Error bars show the standard error of the mean. Both infants and RANCH showed a graded pattern of dishabituation depending on the violation type. We show the results using the combined dataset of the Zoom and Children Helping Science experiments.

Experiment 2, Adult and RANCH behavior. The x-axis shows the position in the block (which could be of length 2, 4, or 6), and the y-axis shows looking time / scaled model samples. Both adults and RANCH showed a similar pattern of graded dishabituation in this task. Error bars show the standard error of the mean.

Illustration of how predictions from the Goldilocks model in Fig. 1 were derived.

Histograms of RMSE for different linking hypotheses show that RANCH is largely robust to variations in parameters.

The effects of scaling on model results. (A) Experiment 1 with unscaled embeddings resulted in a reversal of familiar vs. novel. (B) Experiment 2 with unscaled embeddings resulted in an unintuitive ordering.

Histograms of model fit colored by priors. The x-axis shows the RMSE of the model fit with the behavioral data. Results suggest that when the learner’s noise is high, RANCH fits infant data better.

Embedding distances of different violation types for (1) a base ResNet-50 model, (2) a perceptually aligned embedding model [32], and (3) a ResNet-50 model trained on SAYCAM data [38].

Control experiment for Experiment 2 (infants), checking for baseline differences in interest. We found no significant differences in baseline looking.