Illustration of analyses with the central figure showing all low- and high-level features synced.

A) Spectral flux is computed as the difference between consecutive spectra (green area between the current and previous spectra). B) Hand movement is computed as the average distance travelled by the right and left hands. C) Face movement is computed as the sum of the derivatives of the eye-to-brow distance (blue dot on the eye to red dot on the eyebrow) and the mouth-opening distance (blue dot on the upper lip to red dot on the lower lip). D) Object naming was obtained using the automatic transcription and a query for the specific toys present during the interaction. E) Semantic surprisal. For each word, a probability distribution was obtained using GPT-2 prompted with the previous words; the semantic surprisal is the negative log probability of the observed word. F) Information rate. For each word, a cumulative complexity (upper) is computed using lossless compression algorithms; taking its derivative gives the information rate (lower). G) Facial novelty. For every frame, the facial expression is estimated, and an information distance between consecutive frames is computed using the Kullback-Leibler divergence.
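As a rough illustration of panels A, F and G, the three quantities can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline used for the figure: the function names are hypothetical, `zlib` stands in for the unspecified lossless compression algorithm, the flux uses an L2 norm, and the direction of the Kullback-Leibler divergence (current frame relative to previous frame) is assumed.

```python
import zlib
import numpy as np

def spectral_flux(spectrogram):
    """Flux between consecutive spectra.

    spectrogram: array of shape (n_frames, n_bins) with magnitudes.
    Returns n_frames - 1 values (L2 norm of the frame-to-frame difference;
    the choice of norm is an assumption).
    """
    diff = np.diff(np.asarray(spectrogram, dtype=float), axis=0)
    return np.sqrt((diff ** 2).sum(axis=1))

def information_rate(words):
    """Cumulative compressed size after each word, and its first difference.

    zlib is an assumed stand-in for the lossless compressor.
    """
    cumulative = np.array(
        [len(zlib.compress(" ".join(words[: i + 1]).encode("utf-8")))
         for i in range(len(words))],
        dtype=float,
    )
    # The first word has no predecessor, so its rate is set to 0.
    rate = np.diff(cumulative, prepend=cumulative[0])
    return cumulative, rate

def facial_novelty(probs, eps=1e-12):
    """KL divergence of each frame's distribution from the previous frame's.

    probs: array of shape (n_frames, n_categories), one distribution per row.
    The direction KL(current || previous) is an assumption.
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, None)
    p = p / p.sum(axis=1, keepdims=True)
    current, previous = p[1:], p[:-1]
    return (current * np.log(current / previous)).sum(axis=1)
```

Each function returns one value per frame transition or per word, so the outputs can be resampled onto a common time base before being compared with the other features.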

Descriptive analyses on lower- and higher-level features.

Violin plots showing group-level averages of the lower-level features: spectral flux - ambient (A); spectral flux - vocalisations (B); facial movement (C); hand movement (D), and of the higher-level features: object naming (E); information rate (F); semantic surprisal (G); facial novelty (H). Individual dots represent the data for each participant; data at 5 months are shown in orange and data at 15 months in purple. Red dots show the mean and red lines show the median. Asterisks indicate significance (* = p-adj < 0.05, ** = p-adj < 0.01, *** = p-adj < 0.001).

Correlation matrix between all features of the interaction at 5 and 15 months.

Each square shows the zero-lag correlation between two maternal behavioural variables. The bottom-left triangle shows correlation values for the 5-month visit, and the top-right triangle shows correlation values for the 15-month visit.
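The split-triangle layout described here can be assembled from two ordinary correlation matrices. The sketch below is illustrative only: `combine_triangles` is a hypothetical helper, and leaving the diagonal as NaN is an assumption about how the shared diagonal is displayed.

```python
import numpy as np

def combine_triangles(corr_5m, corr_15m):
    """Put the 5-month matrix below the diagonal and the 15-month matrix above it."""
    lower = np.asarray(corr_5m, dtype=float)
    upper = np.asarray(corr_15m, dtype=float)
    if lower.shape != upper.shape or lower.shape[0] != lower.shape[1]:
        raise ValueError("both inputs must be square matrices of the same size")
    combined = np.tril(lower, k=-1) + np.triu(upper, k=1)
    np.fill_diagonal(combined, np.nan)  # diagonal left empty (assumption)
    return combined
```

Each input could be obtained with `np.corrcoef(features)`, where `features` has one row per behavioural variable and one column per time sample.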

Cross correlations between lower-level features and infant attention to objects.

Spectral flux of ambient background and vocalisations, facial movement, and hand movement in relation to infant attention to objects at 5 months (A, C, E, G) and 15 months (B, D, F, H), respectively. Thick orange (5 months) and purple (15 months) lines represent the observed cross-correlation results, with shaded coloured areas showing their SEM. Grey lines represent control (permuted) data, with the shaded grey area indicating its SEM. Thick red lines indicate significance from the CBP test (the significance threshold was set to p<0.025, two-sided, and results were then FDR adjusted).
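The caption does not state which FDR procedure was applied. As a minimal sketch, assuming the common Benjamini-Hochberg procedure, adjusted p-values can be computed as follows (`fdr_bh` is a hypothetical name):

```python
import numpy as np

def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted
```

The adjusted values are then compared against the chosen threshold in place of the raw p-values.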

Cross correlations between higher-level features and infant attention to objects.

Maternal object naming, information rate, semantic surprisal, and facial novelty in relation to infant attention to objects at 5 months (A, C, E, G) and 15 months (B, D, F, H), respectively. Thick orange (5 months) and purple (15 months) lines represent the observed cross-correlation results, with shaded coloured areas showing their SEM. Grey lines represent control (permuted) data, with the shaded grey area indicating its SEM. Thick red lines indicate significance from the CBP test (p<0.025, two-sided; for object naming, significance was set at p<0.05, one-sided, based on the expectation that object naming would correlate positively with attention). All results were FDR adjusted.

Demographic data at 5 and 15 months.

Table summarising the number of datasets included in the analyses for both samples, as well as the reasons for exclusion.

Statistics for overall average levels

Experimental paradigm

A) The top figure shows the experimental set-up for the joint play condition. Two cameras pointed at the infant (views in photos 1 and 2) and one camera pointed at the mother (view in photo 3). Looking behaviour was coded manually at 50 fps for object and partner looks from both the mother and the infant. B) The different types of looks (i.e. looks to objects 1-3, looks to partner, and ‘others’; note that the ‘others’ category included inattention and uncodable moments). C) Plot of the average duration of the joint play interactions at 5 months (in orange) and 15 months (in purple). Asterisks indicate significance from the two-sample t-test (* = p<0.05, ** = p<0.01, *** = p<0.001). D) Photos of the toys employed at both time points: panda (A), book (B) and rattle (C).

Descriptive analysis on looks to the object and to the partner.

Descriptive analyses on looking behaviour. Average number of looks per minute (A, C) and average look duration in seconds (B, D) for looks to object (A, B) and looks to partner (C, D), respectively. Asterisks indicate significance (* = p<0.05, ** = p<0.01, *** = p<0.001).

Average amount of speech per participant.

Descriptive analyses on amount of speech. Figure showing the average number of words. To quantify the amount of speech in each interaction, we transformed the output from OpenAI's Whisper (Radford et al., 2023) into a binary array where ones represent words and zeroes indicate the absence of words. We then summed the number of words in each interaction and divided it by the length of that interaction. We conducted a two-sided t-test to assess significant differences across age groups (* indicates p<0.05).
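The normalisation described above can be sketched as follows. This is an illustrative sketch under assumptions: it takes word onset times in seconds as input (the exact format derived from the transcription is not given here), uses a hypothetical 50 Hz time base, and expresses the result per minute.

```python
import numpy as np

def words_per_minute(word_onsets_s, duration_s, fs=50):
    """Amount of speech as words per minute of interaction.

    word_onsets_s: onset time of each transcribed word, in seconds (assumed input).
    duration_s: length of the interaction, in seconds.
    fs: sampling rate of the binary array in Hz (assumed value).
    """
    n_samples = int(round(duration_s * fs))
    binary = np.zeros(n_samples, dtype=int)
    onsets = np.asarray(word_onsets_s, dtype=float)
    idx = np.clip((onsets * fs).astype(int), 0, n_samples - 1)
    # One marks a sample containing a word onset; two onsets that fall
    # in the same sample are counted once.
    binary[idx] = 1
    return binary.sum() / (duration_s / 60.0)
```

For example, three words in a 60-second interaction give `words_per_minute([0.5, 1.0, 2.0], 60.0)`, that is, 3 words per minute.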

Schematic illustrating how to interpret cross-correlation results

Visual guide on how to interpret findings from a cross-correlation analysis. The plots illustrate the interpretation of positive (A and B) and negative (C and D) cross-correlation values across backward (A and C) and forward (B and D) time lags. Below each cross-correlation scheme is an illustration of the alternative possible explanations regarding the cross-correlated variables. For instance, A) shows a positive correlation at backward lags between infant gaze and maternal behaviour. This can be interpreted in two complementary ways: an increase in infant gaze is followed by an increase in maternal behaviour, or a decrease in infant gaze is followed by a decrease in maternal behaviour.
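The lag convention can be made concrete with a small sketch. It is an illustration only, with a hypothetical function name, and it assumes that a backward (negative) lag pairs the first variable with later samples of the second, so that the first variable precedes the second.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t - lag] for each lag in samples.

    Negative (backward) lags: x precedes y. Positive (forward) lags: x follows y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag < 0:
            a, b = x[:lag], y[-lag:]   # x[t] paired with y[t + |lag|]
        elif lag > 0:
            a, b = x[lag:], y[:-lag]   # x[t] paired with y[t - lag]
        else:
            a, b = x, y
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags, r

# Toy example: y repeats x three samples later, so x precedes y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)   # e.g. infant gaze
y = np.roll(x, 3)          # e.g. maternal behaviour, delayed copy of x
lags, r = lagged_correlation(x, y, max_lag=10)
peak_lag = lags[np.argmax(r)]
```

In this toy example the correlation peaks at a backward lag of three samples, which corresponds to panel A: changes in the first variable are followed by changes in the same direction in the second.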

Correlations between behavioural measures.

Matrix of pairwise cross-correlations between behavioural measures. Each subplot shows the correlation values between a pair of variables at different lags (in seconds), where backward lags indicate that the first variable (variables at the top) precedes the second one (variables on the side), and forward lags indicate that the first variable follows the second one. Please refer to Fig S4 for a visual guide on interpreting findings from

Cross correlations between lower-level features and infant attention to faces.

Spectral flux of non-vocalisations and vocalisations, facial movement, and hand movement in relation to infant attention to faces at 5 months (A, C, E, G) and 15 months (B, D, F, H), respectively. Thick orange (5 months) and purple (15 months) lines represent the observed cross-correlation results, with shaded coloured areas showing their SEM. Grey lines represent control (permuted) data, with the shaded grey area indicating its SEM. Thick red lines indicate significance from the CBP test (p<0.025, two-sided). All results were FDR adjusted.

Cross correlations between higher-level features and infant attention to faces.

Maternal object naming, information rate, semantic surprisal, and facial novelty in relation to infant attention to faces at 5 months (A, C, E, G) and 15 months (B, D, F, H), respectively. Thick orange (5 months) and purple (15 months) lines represent the observed cross-correlation results, with shaded coloured areas showing their SEM. Grey lines represent control (permuted) data, with the shaded grey area indicating its SEM. Thick red lines indicate significance from the CBP test (p<0.025, two-sided). All results were FDR adjusted.