Introduction

Bodily functions such as heartbeat and respiration are vital to the survival of living beings. The perception of signals arising from the body such as heartbeat, respiration, and hunger is called interoception (Craig, 2002). Individuals differ with regard to their interoceptive sensitivity, the degree to which they perceive their own bodily signals (Critchley & Harrison, 2013). Interoceptive sensitivity is related to human experience and behavior, such as the perception of emotions, mental health, and social cognition (Khalsa et al., 2018). Further, several recent theoretical accounts have highlighted that interoceptive sensitivity plays a vital role in early development in infancy, such as the development of the self and early social abilities (Filippetti, 2021; Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). As infants are born with limited ability to self-regulate bodily states giving rise to interoceptive sensations such as hunger, they rely on interactions with their primary caregiver for co-regulation. These interactions in turn play an important role in shaping early development in infancy. Despite these theoretical frameworks, we have little knowledge about infants’ sensitivity to their interoceptive signals. Recently, the first paradigm to assess cardiac interoceptive sensitivity in infancy was introduced (Maister et al. 2017). In the present study, we aim to replicate the experimental paradigm introduced by Maister et al. (2017) on cardiac interoceptive sensitivity in infants. Further, we aim at tracking the development of interoceptive sensitivity and related individual differences across the infancy period. Going beyond cardiac perception, we introduce a novel approach to measure respiratory interoceptive sensitivity in infants.

Most empirical investigations of interoceptive processing have focused on cardiac interoception (Khalsa et al., 2018). In adults, a large body of research has investigated cardiac interoceptive perception using paradigms in which participants are asked to count or detect their own heartbeat (Brener & Ring, 2016; Schandry, 1981). Using modified versions of the tasks for adult participants, studies with children have shown that stable and adult-like interoceptive skills can be measured already at 4-6 years of age (Schaan et al., 2019).

In contrast to the existing evidence on cardiac interoception in children, we know little about whether infants perceive their interoceptive signals. The first published empirical study of interoceptive sensitivity in infants used an eye-tracking paradigm, the iBEATs task, in which 5-month-old infants observed images on the screen, such as clouds and stars, that bounced either synchronously or asynchronously with the infant’s heartbeat (Maister et al., 2017). Maister and colleagues (2017) found that infants on average looked longer at stimuli that moved asynchronously with their heartbeat than at stimuli that moved synchronously. Furthermore, infants’ cardiac interoceptive sensitivity scores were correlated with their heartbeat evoked potentials (HEPs), a neural marker of interoceptive processing (Coll et al., 2021). This study provided the first evidence that infants show sensitivity to their own cardiac signals as early as 5 months of age. Further, this approach has been successfully replicated with rhesus monkeys (Charbonneau et al., 2022), and a recent study using an adapted experimental paradigm in 6-month-old infants found similar results (Imafuku et al., 2023).

Recently, however, no evidence of cardiac interoceptive sensitivity in 5- to 7-month-old infants was reported (Weijs et al., 2023). Despite some methodological differences, all studies used very similar experimental paradigms in which infants were presented with stimuli oscillating either synchronously or asynchronously to their heartbeat. It is unclear whether the null findings reported by Weijs and colleagues (2023) indicate that infants at this age do not show cardiac interoceptive sensitivity or whether methodological differences, such as the measurement method, outlier rejection criteria, and statistical power, might explain divergent results. In any case, the findings of the study by Weijs and colleagues (2023) highlight the importance of replicating the iBEATs paradigm (Maister et al. 2017) to advance the understanding of interoception in infants.

To gain a more comprehensive understanding of interoceptive processing in infancy, it is crucial to investigate other interoceptive modalities. This is especially important because different interoceptive modalities are not necessarily related and might have different functions or underlying neural signatures (Allen et al., 2022; Garfinkel et al., 2016; Khalsa et al., 2018). One interoceptive signal that is closely related to cardiac processes is respiration. In fact, heartbeat and respiration are linked functionally and anatomically (Draghici & Taylor, 2016; Garcia III et al., 2014).

In recent years, an increasing number of publications have focused on the perception of respiration in adults (Weng et al., 2021). Experimental paradigms that measure sensitivity to resistance in breathing, for instance, have highlighted the connection between respiratory interoception and emotional states such as anxiety (Harrison, Garfinkel, et al., 2021; Nikolova et al., 2021; Harrison, Köchli, et al., 2021). The neural network which links respiratory perception with emotional and cognitive processes has become a matter of scientific interest (Allen et al., 2022; Kluger et al., 2021). The emerging literature shows that breathing constitutes a fundamental process with functional significance for self-regulation (Boyadzhieva & Kayhan, 2021; Heck et al., 2017). In children, it has been shown that sensitivity to respiratory signals can be observed from at least 10 years of age onward (Nicholson et al., 2019). Given the relevance of respiration for self-regulation and social interaction, it is important to map out respiratory interoceptive sensitivity in infancy, a period that is especially relevant for development of self-related perception (Van Puyvelde et al., 2019; Weng et al., 2021).

Regarding the relationship between cardiac and respiratory interoception, empirical results have painted a mixed picture. In children it has been reported that respiratory interoception is not correlated to cardiac interoception (Nicholson et al., 2019). In adults it has been found that while accuracy in interoceptive domains is not related across cardiac and respiratory perception, meta-perception for both modalities shows a significant relation (Garfinkel et al., 2016). In early infancy the relationship between different interoceptive modalities has not yet been investigated. Therefore, it is currently unclear whether sensitivity to different interoceptive modalities emerges at the same time, and in a similar manner.

In the present study, we aim to fill the knowledge gap on interoceptive sensitivity in early infancy by reporting results from two studies investigating cardiac and respiratory interoceptive sensitivity in 3-, 9-, and 18-month-old infants. We investigated the age group of 3 months to provide evidence on the early emergence of interoceptive sensitivity, as 3 months is the earliest age at which eye-tracking paradigms can be reliably applied. We chose the groups of 9 and 18 months as these ages mark important milestones in the development of abilities related to interoception, such as self-perception (Filippetti, 2021; Fotopoulou & Tsakiris, 2017; Musculus et al., 2021) and social cognition (Carpenter et al., 1998; Repacholi & Gopnik, 1997). For instance, around 18 months of age explicit mirror self-recognition can be observed (Amsterdam, 1972). Moreover, around 9 months of age, the ability to show joint attention matures drastically (Carpenter et al., 1998).

To measure cardiac interoceptive sensitivity, we replicated the iBEATs paradigm originally developed by Maister and colleagues (2017), as it is the only task to measure cardiac interoceptive sensitivity in infants to date. Moreover, we developed a novel experimental paradigm that follows the logic of the iBEATs to investigate respiratory interoceptive sensitivity in infants: the iBREATH task. In a first step, we conducted a study with a longitudinal design investigating 9- and 18-month-old infants. We then replicated the experimental paradigms in a separate sample of 3-month-old infants. Our initial predictions concerning the 9- and 18-month-old sample were threefold: 1) For both tasks and in both age groups (9 and 18 months), we expected to find the same preference in looking behavior as reported in Maister et al. (2017), that is, longer looking times to asynchronous trials as an indication that infants detect the incongruency between the visual stimulus and their interoceptive signals. 2) We expected to find a positive correlation between performance in the cardiac and respiratory interoception tasks, given the conceptual proximity between both tasks. 3) We predicted an increase in individual interoceptive sensitivity from 9 to 18 months of age, as increased interoceptive sensitivity might be associated with the development of self-recognition and socio-cognitive skills. We tentatively predicted that 3-month-olds would already differentiate between visual displays that move in synchrony vs. asynchrony with their own cardiac and respiratory signals.

Results

Confirmatory Analyses: 9-month-old infants

First, we investigated whether 9-month-old infants displayed sensitivity to their cardiac and respiratory signals. Following our preregistered analysis plan (https://aspredicted.org/QP9_6FP), we computed paired t-tests to compare mean looking times between synchronous and asynchronous conditions for the iBEATs and the iBREATH. We found that for both tasks 9-month-old infants displayed a preference for stimuli presented synchronously with their own heartbeat (Figure 1A, N = 52, Msynch = 7020.62 ms, Masynch = 5496.7 ms, t = -2.96, p = .005, Cohen’s d = .48) and respiration (Figure 1B, N = 56, Msynch = 6336.21 ms, Masynch = 5483.77 ms, t = -2.80, p = .007, Cohen’s d = .32). On the one hand, these results replicate the approach by Maister et al. (2017) in an older age group, showing that infants are sensitive to their cardiac signals. On the other hand, using a novel paradigm, we provide the first evidence that infants are also sensitive to their respiratory signals. Notably, mean preferences were reversed relative to our expectations and to the results of Maister et al. (2017), who reported a mean preference for stimuli presented asynchronously to the infants’ heartbeats.
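As a rough illustration of the comparison described above, the following sketch computes a paired t statistic and a standardized effect size from per-infant mean looking times. It is not the authors' analysis code: the function name and the looking times are invented, and the effect size shown is d_z (mean difference divided by the standard deviation of the differences), which is only one of several conventions for paired designs and may differ from the variant reported in the text.

```python
import math
import statistics


def paired_t_and_dz(sync_ms, async_ms):
    """Return (t, df, d_z) for paired looking times, one value per infant."""
    if len(sync_ms) != len(async_ms):
        raise ValueError("need one synchronous and one asynchronous mean per infant")
    # Positive differences mean longer looking at synchronous stimuli.
    diffs = [s - a for s, a in zip(sync_ms, async_ms)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff  # assumed effect-size convention
    return t, n - 1, d_z


# Invented looking times (ms) for five hypothetical infants.
sync = [7100.0, 6400.0, 8000.0, 5900.0, 7300.0]
asyn = [5600.0, 6000.0, 6100.0, 6200.0, 5500.0]
t, df, d_z = paired_t_and_dz(sync, asyn)
print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")
```

The sign of t depends only on the order of subtraction, so a negative t alongside a synchronous preference simply reflects the opposite subtraction order.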

Looking times for A) iBEATs and B) iBREATH.

Note. Looking times for the A) iBEATs and B) iBREATH tasks. In both tasks, 9-month-old infants looked significantly longer at stimuli presented synchronously to their own physiological signals. Black dots refer to the group mean. Black bars refer to the standard error of the mean. Grey lines and colorful dots refer to individual mean looking times per condition and infant.

Interoceptive Sensitivity at 18 months

Next, we followed up the same infants at 18 months. Unfortunately, as the study was conducted during the Covid-19 pandemic, we had a large number of dropouts for the longitudinal follow-up. Following our approach for the 9-month-old sample, we conducted paired t-tests comparing looking times to synchronous and asynchronous stimuli at 18 months. Neither the test for the iBEATs (t(27) = -.75, p = .461, d = .13) nor the test for the iBREATH (t(29) = 1.09, p = .283, d = -.25) indicated a significant mean preference. However, a non-significant result does not provide evidence for the absence of an effect (Lakens et al., 2017). Therefore, we conducted two equivalence tests using the effect size reported by Maister et al. (2017, d = .4) as equivalence bounds. Equivalence tests facilitate the interpretation of non-significant results by investigating whether a given confidence interval is too wide to discriminate between the expected effect (= the equivalence bounds) and a null effect, or whether one can rule out an effect at least as strong as the expected one. The results of the equivalence tests indicate that we do not find conclusive evidence in favor of or against a mean preference effect in our 18-month-old sample for either the iBEATs (t(27) = .71, p = .242) or the iBREATH (t(29) = 1.10, p = .141), potentially due to the small sample size.
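The logic of an equivalence test can be sketched with two one-sided tests (TOST) on the paired differences. The sketch below is an assumption-laden illustration rather than the procedure used in the study: it replaces the t distribution with a normal approximation so that it runs with the Python standard library alone, it converts the bound from d units to raw units using the sample standard deviation of the differences, and the function name and data are invented.

```python
import math
import statistics
from statistics import NormalDist


def tost_paired_normal_approx(diffs, d_bound):
    """Approximate TOST p-value for paired differences.

    The equivalence bounds are +/- d_bound in standardized units. A small
    p-value suggests that the true effect lies inside the bounds.
    """
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    se = sd_diff / math.sqrt(n)
    bound = d_bound * sd_diff  # bound expressed in raw units (assumption)
    z_lower = (mean_diff + bound) / se  # tests H0: effect <= -bound
    z_upper = (mean_diff - bound) / se  # tests H0: effect >= +bound
    p_lower = 1.0 - NormalDist().cdf(z_lower)
    p_upper = NormalDist().cdf(z_upper)
    # Equivalence is only claimed if both one-sided tests reject.
    return max(p_lower, p_upper)


# Invented difference scores (ms) with a mean close to zero.
small_sample = [300.0, -500.0, 200.0, -100.0, 400.0, -350.0]
large_sample = small_sample * 10  # same spread, ten times as many infants

p_small = tost_paired_normal_approx(small_sample, d_bound=0.4)
p_large = tost_paired_normal_approx(large_sample, d_bound=0.4)
print(f"n = 6: p = {p_small:.3f}; n = 60: p = {p_large:.4f}")
```

With the same near-zero mean difference, the small sample cannot reject either bound while the larger one can, which is the pattern an inconclusive equivalence test in a small sample would be expected to show.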

When inspecting results from our analysis approach (e.g., Figure 1), as well as previous results (Maister et al. 2017, Weijs et al. 2023) it becomes evident that there are large individual differences in preferences (e.g., some infants prefer synchronous, some asynchronous trials). Thus, sample size might be an important factor in detecting a mean preference effect. To gain additional insights into the interplay of sample size and variability due to the large individual differences we conducted simulations which are reported in Supplementary Materials B. Overall, results from the simulation indicate that sample sizes of around 30 infants might be too small to reliably detect a mean preference effect in the iBEATs task.
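To make the interplay of sample size and variability concrete, a minimal Monte Carlo sketch is given below. It is not the simulation reported in Supplementary Materials B: it assumes normally distributed difference scores with a true standardized effect of d = .4 (the value attributed to Maister et al., 2017, in the text), and it takes the two-sided critical t values as fixed inputs (2.045 for df = 29 and 2.008 for df = 51) because the standard library has no t distribution. All names are hypothetical.

```python
import math
import random
import statistics


def simulated_power(n, d, t_crit, n_sims=1000, seed=1):
    """Share of simulated samples in which a paired t-test is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Standardized difference scores: true mean d, true SD 1.
        diffs = [rng.gauss(d, 1.0) for _ in range(n)]
        t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
        if abs(t) > t_crit:
            hits += 1
    return hits / n_sims


power_30 = simulated_power(n=30, d=0.4, t_crit=2.045)  # df = 29
power_52 = simulated_power(n=52, d=0.4, t_crit=2.008)  # df = 51
print(f"power at n = 30: {power_30:.2f}; power at n = 52: {power_52:.2f}")
```

This sketch ignores the heterogeneity in preference direction discussed above; a mixture of infants with synchronous and asynchronous preferences would inflate the variance of the difference scores and lower power further.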

Interoceptive Sensitivity at 3 months

Initially, the present project was planned as a longitudinal study spanning 3, 9, and 18 months of age. However, difficulties in recruiting very young infants due to the Covid-19 pandemic precluded us from starting the longitudinal assessment with 3-month-old infants. Still, we decided to test the iBEATs and iBREATH tasks in an additional 3-month-old sample once recruitment was possible again (total N = 80, pre-registration: https://aspredicted.org/44L_QKH). Data for this sample were collected after analysis of the 9- and 18-month-old data. Using our preregistered analysis approach, we found evidence for a group mean preference for synchronous stimuli in the iBEATs (paired Bayesian t-test; BF = 2.02, mean difference: 793.95 ms, 95% CI [108.63, 1388.69], N = 53) but not in the iBREATH task (paired Bayesian t-test; BF = 0.23, mean difference: 502.21 ms, 95% CI [-701.49, 1600.86], N = 40) at 3 months of age. Due to the absence of evidence for the iBREATH task, we conducted a test for practical equivalence similar to our approach for the 18-month-olds’ data (Harms & Lakens, 2018). We used the effect size of the iBREATH task at 9 months to investigate whether we can rule out an effect at least as strong as that. Results indicated that we cannot distinguish between the absence and the presence of an effect at least as strong as the one observed in the 9-month-olds’ iBREATH sample (95% HDI = [-711.41, 1606.80], region of practical equivalence: 77.08%). Reasons for this inconclusive result might include the smaller sample size for the iBREATH at 3 months (N = 40) compared to the iBEATs (N = 53), combined with a generally reduced signal-to-noise ratio for eye-tracking tasks in 3-month-olds compared to older infants. In sum, we replicate the results of our 9-month-old sample for the cardiac domain in 3-month-old infants, while finding inconclusive evidence regarding the respiratory domain.

Interoceptive Sensitivity in the First Two Years of Life – A MEGA Analytic Approach

So far, we have presented results on cardiac and respiratory interoceptive sensitivity spanning three age groups in the first two years of life. We find some evidence that infants prefer stimuli presented synchronously with their respective physiological signal. However, some of our results are inconclusive, such that we cannot distinguish between a null finding and a true effect. This might be due to a small number of observations in some of our samples, as indicated by the equivalence tests and data simulations. Up to this point, we have investigated all age groups separately, building on our preregistration and on the assumption that results might differ between age groups. An alternative approach that might help in adjusting for sample size issues is to pool all age groups using an exploratory MEGA-analytic approach (Koile & Cristia, 2021). Such an approach might give us the statistical power needed to make claims about the absence or presence of a cohesive effect in the first two years of life, i.e., whether the mean effect across age groups supports the conclusion of a shared effect.

We computed two mixed models with looking time as outcome; condition, age group/experiment, and their interaction as fixed effects; and participant as a random effect, using the R package “glmmTMB” (Brooks et al., 2017) with a beta error distribution, one for the iBEATs (Figure 2A bottom, Tables 1 & 2) and one for the iBREATH (Figure 2B bottom, Tables 3 & 4). First, we compared each model with a null model missing the condition term. For the iBEATs, we find that the full model differs significantly from the null model, suggesting a better fit (p = .012, Table 1). For the iBREATH, we do not find a statistically significant better fit for the full compared to the null model (p = .091, Table 3). Still, the Bayesian information criterion (BIC), which can be interpreted similarly to a Bayes factor (Burnham & Anderson, 2004), is 15.1 smaller for the full than for the null model in this comparison, giving some evidence for a better fit of the full model.

Results from the MEGA analysis for A) iBEATs and B) iBREATH.

Note. Plot of difference scores computed as mean synchronous looking times minus mean asynchronous looking times per individual for each age group, as well as the combined sample. Red dots refer to mean effects for the respective analysis as described above, red bars refer to 95% confidence/credible intervals. Dashed line indicates a difference of 0. For 3, 9, and 18 months age groups our preregistered analysis is plotted. For the combined sample we computed a linear mixed model using lme4 for visualization purposes as results from a mixed model with a beta error distribution cannot easily be transformed back to the original scale.

Full-null model comparison for the iBEATs model.

Results for the MEGA analysis of the iBEATs data.

Full-null model comparison for the iBREATH model.

Results for the MEGA analysis of the iBREATH data.

Next, we inspected the model output. For both models we did not find a significant interaction between age and condition, indicating that the effect of condition on looking time does not significantly vary between age groups. For the iBEATs, we found a significant main effect of condition on looking time in the combined sample, indicating that infants show longer looking times for stimuli presented synchronously with their heartbeat across all ages (OR = 1.13, 95% CI [1.03, 1.25], t(1769) = 2.541, p = .011). In contrast, for the iBREATH, we did not find a significant effect of condition on looking time across all ages (OR = 1.07, 95% CI [0.96, 1.20], t(1284) = 1.192, p = .234). Interestingly, we find that all samples and tasks apart from the 18-month-old iBREATH sample show a numerical preference for synchronous stimuli. Notably, when excluding the 18-month-old iBREATH sample, we find evidence for an effect of condition on looking times for the 3- and 9-month-old iBREATH samples (Table 5, OR = 1.15, 95% CI [1.03, 1.29], t(1050) = 2.397, p = .017).

Results for the MEGA analysis of the iBREATH data excluding the 18-month-olds.

To sum up, regarding cardiac interoceptive sensitivity, results from the MEGA analysis support the notion that, across all age groups tested here, infants on average prefer stimuli presented synchronously with their own heartbeat. Regarding respiratory interoceptive sensitivity, we only found evidence in our 9-month-old sample, but not in the 3- and 18-month-olds, or the MEGA analysis. However, this latter result might be driven by the 18-month-olds iBREATH sample.

The Relationship Between Cardiac and Respiratory Interoceptive Sensitivity

Next, we investigated the relationship between cardiac and respiratory interoceptive sensitivity. First, we computed absolute proportional scores as individual difference scores for the iBEATs and the iBREATH following previous approaches and our preregistration. These scores range from 0 to 1, and a higher score indicates a stronger preference for either synchronous or asynchronous stimuli in the iBEATs or iBREATH, respectively. However, a difference score does not indicate the direction of the preference (synchronous or asynchronous). The reasoning behind the use of absolute proportional scores is that, in principle, both a preference for synchronous and for asynchronous stimuli indicates that the participant identified a (bodily) signal from noise. Importantly, all studies using iBEATs like paradigms in infants so far have used absolute proportional scores to investigate individual differences (Maister et al. 2017; Weijs et al. 2023). Further, visual inspection of the individual preferences in both paradigms (grey lines, Figure 1) reveals that, although the group mean difference displays a preference to the synchronous stimuli, in fact, looking preferences for both synchronous and asynchronous stimuli can be observed on an individual level.
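A minimal sketch of such a score is shown below. The exact formula is an assumption: the score is taken here to be the absolute difference between the mean synchronous and mean asynchronous looking times divided by their sum, which is consistent with the stated 0 to 1 range but is not spelled out in the text. The function name and the example values are invented.

```python
def absolute_proportional_score(sync_ms, async_ms):
    """Strength of a looking preference in [0, 1], ignoring its direction.

    Assumed formula: |sync - async| / (sync + async).
    """
    total = sync_ms + async_ms
    if sync_ms < 0 or async_ms < 0 or total == 0:
        raise ValueError("looking times must be non-negative and not both zero")
    return abs(sync_ms - async_ms) / total


# Two hypothetical infants with opposite preferences of equal strength.
prefers_sync = absolute_proportional_score(6000.0, 4000.0)
prefers_async = absolute_proportional_score(4000.0, 6000.0)
print(prefers_sync, prefers_async)
```

Both hypothetical infants receive the same score, which illustrates why the measure captures the strength of a preference while discarding its direction.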

We used a MEGA-analytic approach, pooling data from all age groups, to investigate the relationship between both tasks. We fitted a mixed model using a beta error distribution with the iBREATH scores as outcome; the iBEATs scores, age, and their interaction as fixed effects; and participant as a random effect (Table 6). We did not find a strong relationship between cardiac and respiratory interoceptive sensitivity across all ages (Figure 4A, N = 84), mirroring previous results in adults and children (Garfinkel et al., 2016; Nicholson et al., 2019). However, we found a significant interaction between the iBEATs scores and age at the 18-month level (β = 3.13, SE = 1.41, p = .027). To follow up the interaction, we conducted a pairwise comparison, which indicated that the effect of iBEATs scores on iBREATH scores differed significantly between the 9- and 18-month groups (β = -0.60, SE = 0.24, p = .043), whereas the difference between the 3- and 18-month groups was not significant (β = -0.60, SE = 0.25, p = .055). Still, the coefficients indicate a similar strength and direction of both effects.

Relationship between iBEATs and iBREATH using a combined sample.

Note. Histogram with plotted line for individual performance on iBEATs and iBREATH using a beta regression. Following Maister et al. (2017), individual difference scores were computed as proportion of absolute difference between synchronous and asynchronous trials.

Exploratory analysis for age effect.

Note. Absolute proportional scores for A) iBEATs and B) iBREATH plotted for each age group. Red dots refer to group means, and colorful dots to individual means.

MEGA analysis for the relationship between iBEATs and iBREATH.

Developmental Changes in Interoceptive Sensitivity

Next, we investigated whether there are developmental changes in interoceptive sensitivity in the first two years of life. Initially, following our preregistration (https://aspredicted.org/GMB_XCW), we conducted a longitudinal analysis using the infants that participated at both 9 and 18 months of age. Unfortunately, as described earlier, because the study was conducted during the Covid-19 pandemic, only a subsample of infants could be re-invited to the lab in the targeted age range and contributed data suitable for longitudinal analyses. A comparison of the absolute individual difference scores between both age groups yielded no evidence for a strong change in cardiac (paired Bayesian t-test; BF = 0.26, N = 20) or respiratory (paired Bayesian t-test; BF = 0.33, N = 19) interoceptive sensitivity, indicating that absolute individual difference scores in both domains do not change substantially from 9 to 18 months of age. Notably, a region of practical equivalence (ROPE) follow-up analysis indicates that we cannot rule out an effect at least as strong as a change of .1 for the absolute proportional scores (iBEATs: ROPE [-0.10, 0.10], 97.53% inside ROPE, iBREATH: ROPE [-0.10, 0.10], 97.76% inside ROPE, 95% HDI [-0.11, 0.05]). Further, in an exploratory analysis we computed Spearman correlations between timepoints. We did not find evidence that individual difference scores correlate strongly between timepoints for either the iBEATs (r(18) = .236, p = .315) or the iBREATH (r(17) = .195, p = .423).

To increase the number of observations, and statistical power, we conducted an exploratory MEGA-analytic follow up in which we included all infants, not only those that contributed usable data to both time points. Results showed that individual difference scores increased significantly for the iBREATH (Figure 4B, Table 7) in the 18-month-olds compared to the 3-month-olds (OR = 0.544, SE = 0.12, p = .014), and the 9-month-olds (OR = 0.525, SE = 0.12, p = .004), but not for the iBEATs (Figure 4A, Table 8) indicating that respiratory, but not cardiac, interoceptive sensitivity increases at 18 months of age.

Change in absolute proportional scores across age groups for the iBEATs.

Change in absolute proportional scores across age groups for the iBREATH.

Specification Curve Analysis

Notably, apart from the 18-month-old iBREATH sample, we found that (numerical) mean group preferences indicated a preference for stimuli presented synchronously with the respective bodily signal. Thus, mean group preferences were reversed relative to our initial expectation and to the original study by Maister et al. (2017), who found a mean group preference for stimuli presented asynchronously to the infant’s heartbeat. In addition, other studies have failed to find evidence for cardiac interoceptive sensitivity in infants (Weijs et al., 2023). Further, a wide range of analytical choices have been reported in studies of cardiac interoception in infants (Maister et al., 2017; Weijs et al., 2023) and nonhuman primates (Charbonneau et al., 2022) to date.

Therefore, it is important to further describe and validate our results. Using a specification curve analysis (Simonsohn et al., 2020), it is possible to map out the space of theoretically justified analysis strategies on a given dataset. Thus, it is possible to investigate whether analytical choices, such as differences in exclusion criteria and physiological signal preprocessing, impact the results. Importantly, this method allows us to determine whether a group mean preference for synchronous stimuli is due to specific analytical choices of our preregistered analysis or whether a range of different analysis paths come to the same conclusion.

We ran a specification curve analysis following the approach outlined by Simonsohn et al. (2020). We used our 9-month-old dataset as input, as it shows the clearest evidence for infant interoceptive sensitivity (i.e., better data quality compared to the 3-month sample, and a larger sample size compared to the 18-month sample). First, we identified theoretically justified analysis paths applicable to the present dataset by comparing the approaches presented in Weijs et al. (2023), Maister et al. (2017), and Charbonneau et al. (2022). As a first step, we focused on the iBEATs, as three different research groups have already published experiments similar to the iBEATs and thus a number of different specifications could be extracted from the literature (e.g., regarding physiological signal processing, inclusion and exclusion criteria for infants and number of trials, or statistical analysis; for a full list see Supplementary Materials A). Combining all possible ways of analyzing the present dataset yielded 1024 possible analyses, which we subsequently ran (Figure 5A). Next, we ran a specification curve for the iBREATH data of our 9-month-old sample by extracting and adapting the analytical decisions we used for the iBEATs, which resulted in 1536 possible analyses (Figure 5B).
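The combinatorial step can be sketched as a Cartesian product over decision points. The decision names and options below are hypothetical placeholders and are not the study's actual list, which is given in its Supplementary Materials A. As plain arithmetic, ten two-option decisions would multiply to 2^10 = 1024 combinations, but whether the study's grid had that shape is not stated in the text.

```python
from itertools import product

# Hypothetical decision points and options (placeholders for illustration).
choices = {
    "min_valid_trials": [2, 4],
    "outlier_rule": ["none", "2.5_sd"],
    "statistical_test": ["paired_t", "mixed_model"],
}

# Every combination of one option per decision point is one specification.
specifications = [
    dict(zip(choices, combo)) for combo in product(*choices.values())
]

print(len(specifications))  # 2 * 2 * 2 = 8 specifications
print(specifications[0])
```

Each dictionary in `specifications` would then be passed to an analysis routine, and the resulting coefficients sorted from lowest to highest to draw the curve.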

Specification Curve Analysis for the A) iBEATs and B) iBREATH task.

Note. Specification curve analysis plotting standardized beta regression coefficients (y-axis) against analysis number (x-axis) for A) iBEATs and B) iBREATH. Analyses (x-axis) are ordered from lowest to highest standardized beta regression coefficient. Blue color indicates a significant effect (p < .05) for a mean synchronous preference, red color indicates a significant effect (p < .05) for a mean asynchronous preference, and grey indicates a non-significant outcome.

Our results indicated that for the iBEATs almost half (44.73%) of all analytical paths led to a significant result (p < .05), while for the iBREATH 17.51% of all analytical paths did so. For the iBEATs, almost all significant specifications indicated a mean group preference for synchronous trials (43.16% of all specifications; Figure 5, blue color). Interestingly, however, there were also a handful of specifications for the iBEATs (n = 16, 1.6%) that would have resulted in a mean group preference for asynchronous trials (Figure 5A, red color). In sum, these results can be seen as a validation of our preregistered analytical approach described above, as they highlight that a mean group preference for synchronous trials does not depend on the combination or interaction of specific analytical choices. Still, the fact that 1.6% of analysis paths would have come to a different conclusion indicates that the influence of analytical choices is not entirely negligible.
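How such percentages are tallied from a set of specification results can be sketched as follows. The coefficients and p-values are invented, the function name is hypothetical, and the coding of a positive coefficient as a synchronous preference is an assumption made for this illustration.

```python
def summarize_specifications(results, alpha=0.05):
    """Tally specification outcomes.

    results: list of (coefficient, p_value) pairs, one per specification.
    A positive coefficient is assumed to mean a synchronous preference.
    """
    if not results:
        raise ValueError("need at least one specification result")
    n = len(results)
    sig_sync = sum(1 for b, p in results if p < alpha and b > 0)
    sig_async = sum(1 for b, p in results if p < alpha and b < 0)
    return {
        "pct_sig_sync": 100 * sig_sync / n,
        "pct_sig_async": 100 * sig_async / n,
        "pct_nonsig": 100 * (n - sig_sync - sig_async) / n,
    }


# Invented results for five hypothetical specifications.
toy_results = [(0.30, 0.01), (0.22, 0.04), (0.10, 0.20), (-0.05, 0.60), (-0.25, 0.03)]

# Ordering by coefficient mirrors the x-axis of a specification curve.
curve_order = sorted(toy_results)
summary = summarize_specifications(toy_results)
print(summary)
```

In this toy example two of five specifications are significant in the synchronous direction and one in the asynchronous direction, so the significant share is the sum of both categories, in the same way that the overall significant percentage in the text combines both directions.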

Discussion

In the present study, we investigated cardiac and respiratory interoceptive sensitivity in 3-, 9-, and 18-month-old infants using a preregistered approach, validated by a specification curve analysis and MEGA-analytic approaches. Regarding cardiac interoceptive sensitivity, we found evidence for a preference for stimuli presented synchronously in all three age groups. Regarding respiratory interoceptive sensitivity, we find a more nuanced picture, with infants showing a significant preference for stimuli presented synchronously at 9 months of age, but not at 3 and 18 months. We did not find strong evidence for a relationship between cardiac and respiratory interoceptive sensitivity in infants in the first year of life. However, we find some evidence for a positive relationship at 18 months. Further, in an exploratory analysis we find indications that respiratory perception increases between 9 and 18 months. However, due to the small sample size at 18 months, these results must be considered speculative and need to be validated in further research.

In recent years, new theoretical frameworks have deepened our understanding of interoceptive processing (Murphy et al., 2020; Suksasilp & Garfinkel, 2022). For instance, in their 2x2 model, Murphy et al. (2020) distinguished between two main factors of interoception: interoceptive accuracy (i.e., how accurately one perceives internal bodily signals) and interoceptive attention (i.e., how often one thinks of internal bodily signals in everyday life). When applying the Murphy model to the iBEATs and the iBREATH, both aspects might be needed to show a preference. First, it is necessary to access one’s own internal bodily signals to notice a difference between synchronous and asynchronous signals (i.e., interoceptive accuracy). Second, one also needs to pay attention to one’s own bodily signals and compare them to what is happening on the screen (i.e., interoceptive attention). Thus, it is possible that the present task does not distinguish between both dimensions. Instead, the present task might measure a propensity to engage with one’s own interoceptive signals (Murphy, 2023). In fact, when considering the potential impact of interoceptive sensitivity in real-world settings, it is unlikely that “pure” interoceptive accuracy or attention can be differentiated; rather, the interplay of both is likely to shape the outcomes.

We do not find evidence for a strong relationship between cardiac and respiratory interoceptive sensitivity in the first year of life. This finding is in line with empirical results not finding a relationship between cardiac and respiratory interoception in adults (Garfinkel et al., 2016) and children (Nicholson et al., 2019). Further, these findings might be explained by accounts proposing different brain networks for processing of cardiac and respiratory information (Suksasilp & Garfinkel, 2022). Interestingly, we find evidence for a positive relationship between cardiac and respiratory perception in our 18-month-old sample.

To investigate individual differences, we used absolute proportional scores, following previous approaches (see Figure 5; Maister et al., 2017; Weijs et al., 2023). A preference in either direction in the iBEATs or the iBREATH task can, in principle, be considered evidence for the participant’s ability to distinguish their own bodily signals from noise. However, it remains an open question whether individual looking preferences for synchronous or asynchronous stimuli have functional importance. Preferential looking paradigms similar to the iBEATs and iBREATH have been used in other studies investigating infants’ processing of information about body ownership. For instance, newborns prefer to look at synchronous visuo-tactile cues compared to asynchronous ones (Filippetti et al., 2013), similar to 7- and 10-month-olds (Zmyj et al., 2011). In other cases, older infants showed a looking preference for sensorimotor incongruencies (Rochat, 1998). Furthermore, at 5 months of age infants recognize delays in the visualization of their own leg movements and prefer to look at stimuli that are asynchronous to their own movements (Bahrick & Watson, 1985). Thus, previous results are inconclusive as to whether infants generally prefer to look at stimuli that are synchronous or asynchronous to their own bodily movements and experiences. Yet, there is convincing evidence that they detect such (in-)congruencies. For instance, 14-month-old infants are more likely to help a person who had previously bounced in synchrony with them, compared to an asynchronously bouncing person (Cirelli et al., 2014).

Longer looking times for synchronous stimuli might indicate a familiarity preference (or more generally a preference for a stimulus that is easier to process). Longer looking times for asynchronous stimuli might indicate a novelty preference, that is, a preference for a stimulus that offers a learning opportunity (Hunter & Ames, 1988). According to the framework presented by Hunter and Ames (1988), the preference for novelty or familiarity depends on three factors that interact with each other: familiarization, age, and task difficulty. In short, the model proposes that less familiarization, a younger age, and greater task difficulty each facilitate a familiarity preference. Thus, when applying the framework to the present results, it might be that certain details contributed to a familiarity preference displayed by most infants (as indicated by the synchronous preference). For instance, the data presented here at 9 and 18 months were collected as part of a larger study with several other paradigms, whereas the study of Maister et al. (2017) used the iBEATs as the first task of the session. Thus, the increased complexity for the infant in our setting might have increased task difficulty and potentially reduced familiarization. Further, at 3 months of age, the experimental setting might be more challenging, thus increasing complexity. However, the interpretation of looking time preferences in infancy research in general, and the Hunter and Ames (1988) framework specifically, remains a topic of debate and further research (Bergmann et al., 2019; ManyBabies 5 Team, 2023).

Nevertheless, the switch in mean preference reported here regarding cardiac interoceptive sensitivity, compared to Maister et al. (2017), might also indicate a developmental change around 5 months of age. Infants at 5 months of age might be more drawn to asynchrony between their cardiac signals and visual stimuli, while 3- and 9-month-old infants on average prefer synchrony. Such a developmental trajectory might also explain the null findings reported by Weijs et al. (2023), as infants tested in their study were between 5 and 7 months of age. It is also possible that there are developmental windows in which the perception of bodily signals plays an important role. For instance, age groups in the present study were chosen to be in a similar range as the emergence of theoretically relevant constructs, such as mirror self-recognition. However, to disentangle such effects, adequately powered longitudinal studies are needed.

To validate the impact of analytical choices on mean group preferences we used a specification curve analysis. In the following, we will discuss impactful decisions and make recommendations for future approaches (see Table 9). Regarding analytical choices that had a strong impact on the results, we found that applying the same physiological data rejection criteria to synchronous and asynchronous trials led to more significant results (Table 9, 1st entry). The logic behind not removing asynchronous trials with physiological artifacts in the tasks described here is that in these trials the signal is not generated by real-time feedback of the physiological signal. Thus, it is not directly relevant for stimulus presentation. However, our results indicate that applying differing criteria for both trial categories might obscure effects.

Table 9. Number of significant results for specifications for iBEATs and iBREATH

Moreover, we found that for both tasks, in terms of physiological artifact rejection, including more data points led to more significant results (Table 9, 3rd entry). This might be explained by the inclusion of more data and, thus, greater statistical power. For instance, in the iBEATs task, strict artifact rejection means that a trial is removed once a single R-peak is not (or falsely) detected. However, in such trials it might still be possible to recognize that the stimulus presentation is synchronous or asynchronous to one’s heartbeat, so the trial still holds information relevant for the task. For future studies, we would recommend a more fine-tuned approach for removing trials based on physiological artifacts. Furthermore, we would advise applying the same criteria to all conditions used.

Regarding specifications that did not have a strong impact, we found that outlier criteria using standard deviations had a negligible impact on the results (Table 9, 2nd entry). Such criteria are usually applied to remove extreme values in the data. In the paradigms described here, looking times were bound by trial length (e.g., in the iBEATs max. 20 s). Thus, rejecting trials based on standard deviation might not be useful in analyses of preferential looking paradigms that use a maximum trial length, as extremely large outliers in looking times are already prevented by the experimental design. We also found that an inclusion criterion specifying a minimum number of valid trials per infant had little effect on the number of significant results. Such criteria are typically used to increase the reliability of results, as individual outlier trials weigh more heavily when an infant completes only a few trials. For instance, in our preregistered analysis, infants had to complete a minimum of 8 trials for the iBEATs or 4 trials for the iBREATH to be included in the analysis (Table 9, 6th entry). However, few infants completed fewer than the minimum number of trials. For future approaches, we would advise against using exclusion criteria based on standard deviations or number of trials.

Overall, the recommendations outlined above can be discussed within the scope of a fundamental challenge in experimental research: how to balance reducing noise in a given dataset against losing statistical power through the exclusion of participants and trials. This is especially relevant in infancy research, which oftentimes deals with high drop-out rates and noisy datasets. For the present dataset, we find that leaning towards including more data points (e.g., regarding rejection of physiological artifacts, or exclusion criteria) might be more beneficial as long as the same criteria are applied to all data. Thus, exclusion of data points should be driven by trying to minimize the impact of erroneous or random data points, while still keeping those that have interesting characteristics (Leys et al., 2019). We want to stress that outlier criteria should ideally be formulated within a preregistration (Bakker & Wicherts, 2014).

Overall, at 9 months of age, a larger share of specifications yielded a significant group mean preference for the iBEATs (44.73%) than for the iBREATH (17.51%). Given that our exploratory analysis indicated an increase of iBREATH difference scores from 9 to 18 months, respiratory interoceptive sensitivity might develop in this age range. However, it is also possible that the coupling of physiological signals with visual stimuli in infancy produces stronger mean preferences for cardiac than for respiratory signals. In sum, the results of the specification curve analysis validated our preregistered analysis, as almost all analysis paths resulted in a numerical mean preference for synchronous stimuli.
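As an illustration of the logic behind a specification curve analysis, the following minimal Python sketch enumerates every combination of analytical choices and computes the percentage of specifications with a significant result. It is not the analysis code used here (which relied on R and the “specr” package); the choice names, their levels, and the p-values in the usage note are hypothetical placeholders.

```python
from itertools import product


def specification_grid(choices):
    """Return one dict per specification, covering all combinations of choices.

    `choices` maps the name of an analytical decision to its possible levels.
    """
    names = list(choices)
    return [
        dict(zip(names, combo))
        for combo in product(*(choices[name] for name in names))
    ]


def share_significant(p_values, alpha=0.05):
    """Percentage of specifications whose p-value falls below alpha."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return 100.0 * sum(p < alpha for p in p_values) / len(p_values)


# Hypothetical decision names and levels, for illustration only.
example_choices = {
    "same_rejection_for_both_conditions": [True, False],
    "sd_outlier_criterion": [None, 2.0, 3.0],
    "min_valid_trials": [0, 4, 8],
}
grid = specification_grid(example_choices)
```

In this sketch, `len(grid)` is 18 (2 x 3 x 3), and each entry would be passed to the same analysis pipeline to obtain one p-value per specification; `share_significant([0.01, 0.2, 0.04, 0.5])` then gives 50.0.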

Ideas and Speculation – Development of Respiratory Interoceptive Sensitivity

While we found consistent evidence for cardiac interoceptive sensitivity, whereby infants on average prefer stimuli presented synchronously with their heartbeat, the evidence regarding respiratory interoceptive sensitivity was more nuanced. In particular, the 18-month-old sample for the iBREATH displayed three interesting characteristics: it was the only sample showing a (numerical) preference for the asynchronous condition, absolute proportional scores increased compared to 3 and 9 months, and there was a positive relationship with cardiac interoceptive sensitivity scores at 18 months (but not at 3 or 9 months). To interpret these results, one might speculate that respiratory interoceptive sensitivity matures towards 18 months of age. Given the relevance of interoception for theoretical accounts of self-development, a hypothesis to be tested in future research is that developmental improvement in respiratory perception might be related to increases in self-recognition around 18 months of age (Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). Moreover, maturation of respiratory perception might be related to gross motor development, which drastically matures in the first two years of life (WHO Multicentre Growth Reference Study Group, 2006) and which has been shown to be related to respiratory function in children with cerebral palsy (Kwon & Lee, 2014). However, the result and its interpretation warrant further follow-up given the small sample size of the 18-month-olds and the exploratory nature of the respective analysis.

Limitations

The data presented in this study have several limitations. First, drop-out numbers must be discussed. For the 9-month-old sample, 90 mother-infant dyads were invited to take part in the present study, but only 74 and 75 provided data for the iBEATs and the iBREATH, respectively. Further, only 52 (iBEATs) and 56 (iBREATH) could be included in the confirmatory analysis based on the predefined exclusion criteria, and only 34 contributed usable data for both paradigms. This might also be attributable to the paradigms being embedded in a data collection for a larger project. Similarly, for the 3-month-old sample, 80 infants were invited to the lab, but only 53 (iBEATs) and 40 (iBREATH) could be included in the analysis. Also, the 9- and 18-month-old samples were collected during the Covid-19 pandemic, which led to high dropout rates for the longitudinal follow-up at 18 months, as lockdowns and Covid-19 cases made data collection challenging. Thus, we might not have had sufficient statistical power to detect possible effects using our longitudinal sample.

To overcome this limitation, we computed exploratory analyses using all data available, not just those of infants who contributed data at both timepoints. However, such an approach can only provide correlational evidence. Regarding the specification curve analysis, it is possible that there are specifications that might be relevant but were not considered here. Furthermore, in the specification curve analysis, we did not inspect assumptions underlying the statistical tests in depth.

Conclusion

To sum up, we present compelling evidence that infants are sensitive to their own cardiac signals in the first two years of life by replicating the paradigm introduced by Maister et al. (2017). Moreover, we present the first evidence that infants are sensitive to their respiratory signals using the iBREATH paradigm. By using a preregistered approach, a comparably large sample size and age range spanning the first two years of life, and by extending the interoceptive modality assessed to respiration, we provided important empirical evidence for theoretical accounts highlighting the relevance of interoceptive sensitivity in infancy. Regarding longitudinal development, we found no evidence for a change of interoceptive sensitivity in our confirmatory longitudinal analysis. However, an exploratory analysis using a between-groups approach revealed evidence for an age-related increase in respiratory, but not cardiac, interoceptive sensitivity scores towards 18 months of age. We did not find that cardiac and respiratory interoceptive sensitivity are strongly related, mirroring results in adults and children. However, we found exploratory evidence for a relationship at 18 months.

We used a specification curve analysis to validate our results and showed that it is a suitable tool to investigate the impact of analysis choices in infancy research. Finally, we provided guidelines for the analysis of the two paradigms presented here, as well as for preferential looking time paradigms in general. Overall, our results demonstrate that infants’ interoceptive sensitivity, measured through coupling a visual presentation to a physiological signal, is a replicable phenomenon that can be generalized to different age groups, as well as to different interoceptive modalities. By providing empirical results that go beyond previously published studies on infant interoception, our results give an important empirical basis for theoretical approaches targeting interoception during development, as well as related constructs such as self-perception, in early infancy.

Materials and Methods

Sample

The data reported here were collected as part of a larger project involving a range of other measures. For the 9-month-old sample, a total of 90 infant-mother dyads were tested in the laboratory. Initially, we intended to invite mother-infant dyads when the infant was 9 to 10 months of age. However, as this study was conducted during the Covid-19 pandemic, we extended the age range to 10 months and 15 days to be able to include a sufficient number of infants (Mage = 301.63 days, SDage = 10.57). We followed up the same sample again when the infants were 18-20 months of age (N = 54, Mage = 576.65 days, SDage = 14.49). Data collection took place during the Covid-19 pandemic, from September 2020 to September 2021. The total sample size was based on a power analysis for an unrelated analysis. However, building on the results reported by Maister et al. (2017; paired t-test; t = 3.267, n = 29, Cohen’s d = .4), the study would have been adequately powered to detect an effect approx. 30% (Cohen’s d = .3) smaller than that reported by Maister et al. (2017). Regarding the 3-month-old sample, we invited 80 infant-mother dyads to the lab when the infant was 3-4 months old (Mage = 113.53 days, SDage = 7.82).

Participants were recruited from an existing database of volunteer families and parents. We strived to include an equal number of boys and girls. All infants were born full term with normal birth weight and had no known developmental delays or neurological impairments. Experiments were approved by the ethics committee of the University of Vienna (reference no. 00504).

Experimental Procedures

Upon arrival in the laboratory, primary caregivers were asked to fill out an informed consent form. After a warm-up period, the infants performed several tasks in randomized order. In the current manuscript, we only report results from the iBEATs and the iBREATH tasks, as results from the other tasks will be presented in separate reports. The order of the tasks was counterbalanced across participants. As both the iBEATs and the iBREATH followed a similar structure and required similar equipment, the tasks were performed back-to-back in an alternating order. Between the iBEATs and the iBREATH, we additionally acquired 3 minutes of resting state data to analyze cardio-respiratory coupling while infants watched a neutral video. The procedure was the same for infants from all age groups. Notably, the infants participating at 3 months only did the iBEATs and iBREATH in alternating order.

iBEATs

To measure cardiac interoceptive sensitivity, we replicated the iBEATs paradigm (Maister et al., 2017). Three electrodes were attached to the infant’s chest in a three-lead setup. We used an ADInstruments Powerlab and BioAmp equipment to monitor and to record cardiac activity (www.adinstruments.com). To identify R-peaks, we used the built-in hardware-based function, namely “fast response output”, which sends a pulse to a presentation-computer via a custom-made Arduino set-up, once a predefined threshold is reached. The threshold was set individually for each infant.

Upon the placement of ECG electrodes and the adjustment of the fast response output, infants were placed in an infant chair roughly 60 cm away from an eye tracker sampling at 500 Hz (Eyelink 1000 plus). The caregiver was asked to sit right behind the infant. In case the infant got fussy or did not tolerate the infant chair, we offered the option to place the infant on the caregiver’s lap. Following a 3-point calibration, infants were presented with trials in which visual stimuli (i.e., either clouds or stars) moved rhythmically up or down on the screen, either synchronous or asynchronous to the infant’s heartbeat. Movements of the stimuli were accompanied by a jumping sound to attract infants’ attention. In synchronous trials, the movement of the stimulus on the screen was coupled to each infant’s R-peak. For asynchronous trials, the mean inter-beat interval of the preceding synchronous trial was first computed for each infant. Movement of the stimuli then followed a predetermined rhythm that was either 10% faster or slower than the average inter-beat interval of the last synchronous trial for that infant.
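The timing of asynchronous trials can be illustrated with a minimal Python sketch. It is not the MATLAB presentation script used in the study; the function names are hypothetical, and it assumes that “10% faster” means an interval 10% shorter than the mean inter-beat interval (and “10% slower” an interval 10% longer).

```python
from statistics import mean


def asynchronous_interval(r_peak_times, faster):
    """Fixed stimulus interval (s) for an asynchronous trial.

    r_peak_times: R-peak timestamps (s) from the preceding synchronous trial.
    faster: True for the faster variant, False for the slower one.
    Assumption: "10% faster/slower" scales the mean inter-beat interval
    by 0.9 or 1.1, respectively.
    """
    ibis = [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]
    if not ibis:
        raise ValueError("need at least two R-peaks")
    return mean(ibis) * (0.9 if faster else 1.1)


def stimulus_onsets(interval, duration):
    """Onset times (s) of stimulus movements at a fixed interval within a trial."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    onsets = []
    n = 0
    while n * interval < duration:
        onsets.append(n * interval)
        n += 1
    return onsets
```

For example, R-peaks at 0.0, 0.5, 1.0, and 1.5 s give a mean inter-beat interval of 0.5 s, so the faster variant would move the stimulus roughly every 0.45 s and the slower variant roughly every 0.55 s.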

There was a maximum of 80 trials in the task. For the 9- and 18-month-old samples, the first trial was always synchronous, as asynchronous trials required a synchronous trial to compute an average inter-beat interval. For the 3-month-old sample, we recorded the physiological signal directly before the first trial; thus, the first trial could be either synchronous or asynchronous. Before each trial, an attention getter was displayed. Once the infant looked at the screen, a trial started, lasting for a minimum of 5 seconds and a maximum of 20 seconds. After the initial 5 seconds, the duration of the trials was infant controlled. The ongoing trial automatically terminated, and the next trial started, if the infant looked away from the screen for longer than 2 consecutive seconds or the maximum trial duration of 20 seconds was reached. The task was terminated if the infant looked away from the screen for longer than four consecutive trials (i.e., a total duration of 20 consecutive seconds) or the infant became fussy or tired.
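The infant-controlled trial duration can be sketched as follows in Python. This is an illustrative reading of the rule described above, not the original presentation script. The function name and the gaze-sample format are hypothetical, and the sketch assumes that a look-away that begins before the 5-second minimum can end the trial once the minimum has passed.

```python
def trial_end_time(samples, min_dur=5.0, max_dur=20.0, max_look_away=2.0):
    """Return the time (s) at which an infant-controlled trial ends.

    samples: iterable of (time_s, on_screen) gaze samples, sorted by time and
    measured from trial onset. Assumption: samples cover the whole trial.
    """
    away_since = None  # onset of the current look-away, if any
    for t, on_screen in samples:
        if t >= max_dur:
            return max_dur  # maximum trial duration reached
        if on_screen:
            away_since = None
        else:
            if away_since is None:
                away_since = t
            # After the minimum duration, end the trial once the infant has
            # looked away for longer than max_look_away consecutive seconds.
            if t >= min_dur and t - away_since > max_look_away:
                return t
    return max_dur
```

With gaze sampled every 0.1 s and an infant who looks at the screen until 8.0 s and then looks away, this sketch ends the trial at about 10.1 s, the first sample at which the look-away exceeds 2 s. The iBREATH timing would correspond to `max_dur=30.0`.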

The visual stimuli were counterbalanced across experimental conditions and infants. This way, for each participant, either a star or a cloud could represent the synchronous or the asynchronous stimulus. Stimuli appeared on the left or on the right side of the screen. Location of the stimulus (i.e., left or right) was pseudo-randomly selected for each trial. The trial order was pseudo-randomized; thus, two synchronous or two asynchronous trials could follow each other. The stimulus presentation was performed using a custom-made script in MATLAB (Matlab 2018b).

iBREATH

To measure respiratory interoceptive sensitivity, we developed and used the iBREATH paradigm, which followed a similar logic to the iBEATs task. A respiratory belt connected to an ADInstruments Powerlab was attached to the infant’s torso (www.adinstruments.com). Once a stable signal was obtained, infants were seated in an infant chair roughly 60 cm away from an eye tracker sampling at 500 Hz (Eyelink 1000 plus). The caregiver was asked to sit right behind the infant. The signal of the respiration belt was sent to a presentation computer using a custom-made Arduino set-up.

Similar to the iBEATs procedure, during a 3-point calibration, infants observed moving circles accompanied by a sound. Following calibration, infants were presented with an infant-friendly neutral stimulus (i.e., a strawberry or an apple), which increased and decreased in size, either synchronous or asynchronous to that infant’s respiratory rhythm. Stimulus presentation was accompanied by an infant-friendly sound. The volume of the sound was adjusted in relation to the size of the stimulus, thus increasing and decreasing as the image got bigger and smaller, respectively. In synchronous trials, the stimulus on the screen expanded and shrank in synchrony with each infant’s respiration rhythm. In asynchronous trials, movement of the stimulus was either 10% faster or slower than the average breathing frequency of the last trial for that individual infant.

To generate asynchronous trials, two components of the immediately preceding synchronous trial were used to compute a sinusoidal signal that was either 10% faster or slower than the signal in the previous trial. First, the average breathing frequency of the last trial was extracted, which was either speeded up or slowed down based on the asynchronous trial type. Then, the average respiratory amplitude in the last synchronous trial was extracted, which was used to set the amplitude of the asynchronous trial. By combining frequency and amplitude, the sinusoidal signal was created.
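The construction of the asynchronous respiratory signal can be illustrated with a short Python sketch. It is not the original MATLAB code; the function name, the sampling rate, and the zero starting phase are assumptions made for illustration, and “10% faster/slower” is interpreted as scaling the breathing frequency by 1.1 or 0.9.

```python
import math


def asynchronous_respiration(mean_freq_hz, mean_amplitude, faster,
                             duration_s, sample_rate_hz=100):
    """Sinusoid whose frequency is 10% above or below the previous trial's mean.

    mean_freq_hz: average breathing frequency of the last synchronous trial.
    mean_amplitude: average respiratory amplitude of that trial.
    Assumption: the signal starts at phase zero and is sampled at
    sample_rate_hz (a hypothetical value).
    """
    freq = mean_freq_hz * (1.1 if faster else 0.9)
    n_samples = int(duration_s * sample_rate_hz)
    return [
        mean_amplitude * math.sin(2 * math.pi * freq * i / sample_rate_hz)
        for i in range(n_samples)
    ]
```

For a mean breathing frequency of 0.5 Hz, the faster variant oscillates at about 0.55 Hz and the slower variant at about 0.45 Hz, while the amplitude stays matched to the preceding synchronous trial.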

The iBREATH paradigm consisted of a maximum of 80 trials. For the 9- and 18-month-old samples, the first trial was always a synchronous trial because asynchronous trials require a synchronous trial to generate a respiratory signal. For the 3-month-old sample, we recorded the physiological signal directly before the first trial; thus, the first trial could be either synchronous or asynchronous. Before each trial, an attention getter was displayed. A trial was displayed for a minimum of 5 seconds and a maximum of 30 seconds. Following the initial 5 seconds, the duration of the trials was infant controlled. An ongoing trial was terminated automatically, and the next trial started, if the infant looked away from the screen for longer than 2 consecutive seconds or when the maximum trial duration of 30 seconds was reached. The task was terminated if the infant looked away from the screen for longer than four consecutive trials (i.e., a total duration of 20 consecutive seconds) or when the infant became fussy or tired.

Stimulus presentation was counterbalanced across experimental conditions and infants. That is, either a strawberry or an apple was associated with the synchronous or the asynchronous stimulus for each infant. The stimulus location was counterbalanced; thus, a stimulus could appear either on the left or on the right side of the screen. Location of the stimuli was pseudo-randomly assigned for each trial. The trial order was pseudo-randomized so that two synchronous or two asynchronous trials could follow each other. The stimulus presentation was performed using a custom script in MATLAB (Matlab 2018b).

Confirmatory Analysis

This study was preregistered on aspredicted.org. The preregistration for the 9-month-old sample can be accessed here: https://aspredicted.org/QP9_6FP. The preregistration for the longitudinal analysis can be accessed here: https://aspredicted.org/GMB_XCW. The preregistration for the 3-month-old sample can be accessed here: https://aspredicted.org/44L_QKH. Data, analysis scripts, and experimental scripts are available here: https://osf.io/jy5fe/?view_only=6199b7c7e7f34599a10ccaf25d5e33d8.

Pre-processing

In a first step, we visually inspected each trial of the iBEATs and the iBREATH tasks to exclude trials in which stimulus presentation was impacted by technical problems or physiological artifacts. We excluded trials for technical problems if transmission of the physiological signal was interrupted during a trial (e.g., an electrode was removed, a cable got unplugged etc.) or stimulus presentation was interrupted (e.g., there was a problem in connecting to the stimulus presentation computer).

Next, we excluded trials with physiological artifacts. In the iBEATs, we excluded a trial if not all R-peaks were picked up by the fast response output. In the iBREATH, we excluded a trial if movement or other technical artifacts were visible in the respiratory signal during a trial. Furthermore, in the iBEATs, infants were included if they completed a minimum of 8 trials. In the iBREATH, we adapted this criterion, as respiration is a slower signal than the heartbeat and maximum trial durations were longer. As this might result in a lower total number of trials in the iBREATH task compared to the iBEATs task, we adjusted the cut-off for the iBREATH task and included data of infants who completed a minimum of 4 trials in the analysis. For the longitudinal analysis, we used a less strict criterion to increase our potential sample size, as outlined in our preregistration. Thus, infants were included when they completed at least 4 trials in either task.

Pre-processing of looking-time data

We defined an area of interest (AOI) based on the maximum coordinates of the animated character on the screen. We took the maximum movement range of the animated character and computed looking times in each trial as the summed duration of all eye-tracking samples falling in that AOI. Because we aimed to replicate the study by Maister and colleagues (2017), we followed the same analysis approach as in the original paper. Accordingly, we excluded trials with looking times more than two standard deviations away from the condition’s (i.e., synchronous or asynchronous trials) group mean. To compare cardiac and respiratory interoceptive sensitivity, we computed individual discrimination scores defined as the absolute proportion of looking time difference between synchronous and asynchronous conditions, again following the procedure by Maister and colleagues (2017). For both tasks, we excluded trials with looking times of 0, as it is not clear whether infants did not look at the screen in these trials or whether there were technical issues in these trials.
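The pre-processing steps above can be sketched in Python as follows. This is an illustration rather than the analysis code (which was written in R). The function names are hypothetical, and the formula for the absolute proportional score, |sync - async| / (sync + async) computed on mean looking times, is our reading of the verbal definition and should be treated as an assumption.

```python
from statistics import mean, stdev


def drop_zero_trials(looking_times):
    """Remove trials with a looking time of 0 s."""
    return [x for x in looking_times if x > 0]


def exclude_outliers(looking_times, n_sd=2.0):
    """Keep looking times within n_sd standard deviations of the mean.

    In the analysis described above, the mean and SD refer to the group-level
    distribution of one condition (synchronous or asynchronous).
    """
    m, sd = mean(looking_times), stdev(looking_times)
    return [x for x in looking_times if abs(x - m) <= n_sd * sd]


def absolute_proportional_score(sync_times, async_times):
    """Assumed definition: |mean sync - mean async| / (mean sync + mean async)."""
    s, a = mean(sync_times), mean(async_times)
    return abs(s - a) / (s + a)
```

For instance, mean looking times of 10 s (synchronous) and 5 s (asynchronous) give a score of 5 / 15, or about 0.33; swapping the two conditions yields the same score, which is why the measure captures discrimination irrespective of the direction of the preference.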

Statistical analysis

All statistical analyses reported here were computed in R (R Core Team, 2022) using the packages “pwr” (Champely, 2020), “TOSTER” (Lakens et al., 2018), “ggstatsplot” (Patil, 2021), “BayesFactor” (Morey & Rouder, 2022), “specr” (Masur & Scharkow, 2019), “lme4” (Bates et al., 2015), “afex” (Singmann et al., 2022), “psych” (Revelle, 2022), “broom.mixed” (Bolker & Robinson, 2022), “bayestestR” (Makowski et al., 2019), “DHARMa” (Hartig, 2022), “glmmTMB” (Brooks et al., 2017) and “faux” (DeBruine, 2023). To compute the Stouffer’s z indices for the specification curve analysis, we used the function provided in Simonsohn et al. (2021).

Out of the 90 mother-infant dyads invited to participate in the study for the 9-month-old sample, 74 infants contributed any data for the iBEATs task and 75 for the iBREATH task. For the iBEATs task, 3 additional infants were excluded due to technical errors. Furthermore, following our preregistered analysis, 2 infants were excluded for the iBEATs task due to not reaching the minimum of 8 trials, 9 due to noisy ECG data, and 8 due to the +/- 2 SD outlier rejection criterion, leaving a final sample of 52 infants. In comparison, for the iBREATH task, 10 infants were excluded due to technical errors, 3 due to not reaching at least 4 trials, 3 due to noisy respiratory belt data, and 3 due to the +/- 2 SD outlier rejection criterion, leaving a final sample of 56 infants.

As outlined in our preregistration, we lowered the threshold for outlier rejection in the longitudinal analysis to increase the sample size. Thus, for all analyses, infants who completed at least 4 trials per task were included. For the 9-month-old data this would have slightly changed the iBEATs analysis plan. However, this criterion did not lead to the inclusion of additional infants in the final sample. For the 18-month-olds’ iBEATs data, no infants were excluded due to not reaching at least 4 trials, 4 infants were excluded due to the quality of the ECG signal, and 2 infants were excluded due to the +/- 2 SD outlier rejection criterion, resulting in a final sample of 28. For the 18-month-olds’ iBREATH data, 1 infant was excluded due to not reaching at least 4 trials, 3 infants were excluded due to noisy physiological data, and 3 infants were excluded due to the +/- 2 SD outlier rejection criterion, leaving a final sample of 30 infants. Means and SDs for the number of trials completed by infants included in the analysis can be found in Table 7.

Table 7. Descriptive information for number of trials

Out of the 80 mother-infant dyads invited to participate in the 3-month-old study, 77 infants contributed any data for the iBEATs task and 71 for the iBREATH task. Furthermore, following our preregistered analysis, 1 infant was excluded for the iBEATs task due to noisy ECG data, and 23 due to problems with the eye-tracking, giving a sample of 53 infants. In comparison, for the iBREATH task, 2 infants were excluded due to not reaching at least 4 trials, 10 due to noisy respiratory belt data, and 19 due to problems with the eye-tracking, resulting in a final included sample of 40 infants.

All statistics in our confirmatory analysis using null hypothesis testing were evaluated against a two-tailed significance level of p < .05. In case of non-significant results, we aimed, if possible, to follow up the respective analysis with an equivalence or region of practical equivalence test (Lakens et al., 2018). To compare synchronous and asynchronous trials at 9 and 18 months, both for the iBEATs and the iBREATH tasks, we computed two separate paired t-tests (Maister et al., 2017; see supplementary materials C for more information on asynchronous trials). At 3 months we used a Bayesian paired t-test. We preregistered a correlation between iBEATs and iBREATH scores at 9 months. However, in the manuscript we only report the details of the MEGA-analysis (see next paragraph). To investigate the longitudinal development of cardiac and respiratory interoceptive sensitivity, we computed a Bayesian paired t-test comparing absolute proportional scores between 9 and 18 months.

MEGA-analysis

We computed three MEGA-analyses pooling data from all three age groups to investigate a mean preference effect, the relation between the iBEATs and the iBREATH, and the development across age groups. First, to investigate whether there is a mean preference in the iBEATs and the iBREATH tasks, we computed mixed models using the R-package “glmmTMB” with a beta error distribution and logit link function. We used looking time as outcome; condition, age, and their interaction as fixed effects; and participant as a random effect. We transformed age into a factor with 3 levels (3, 9, 18 months), with 3 months set as the reference level. After fitting the model, we visually inspected assumptions using the check_model function of the R-package “performance”. In addition, we checked for overdispersion using the “DHARMa” package (iBEATs: dispersion = 1.07, p = .168; iBREATH: dispersion = 1.12, p = .120). Further, we checked a reduced model lacking the interaction term for issues of collinearity (iBEATs: VIF = 1.00; iBREATH: VIF = 1.00). We then conducted full-null model comparisons by fitting a null model that excluded the condition factor.

To investigate whether there is a relationship between the iBEATs and the iBREATH absolute proportional scores, we computed a mixed model using the R-package “glmmTMB” with a beta error distribution. We used the iBREATH scores as outcome variable, the iBEATs scores, age group, and their interaction as fixed effects, and participant as a random intercept. Age was included as a factor with 3 months as reference level. After fitting the model, we visually inspected assumptions using the check_model function of the R-package “performance”. We did not find evidence for overdispersion (dispersion = 1.03, p = .800). Further, we checked a reduced model lacking the interaction term for issues of collinearity (VIF = 1.04).

Last, to investigate whether absolute proportional scores in the iBEATs and the iBREATH differ between age groups, we computed two mixed models using the R-package “glmmTMB” with a beta error distribution. We used the iBEATs or the iBREATH absolute proportional scores as outcome, age as fixed effect, and participant as a random effect. Age was included as a factor with 3 levels. After fitting the model, we visually inspected assumptions using the check_model function of the R-package “performance”. Further, we checked for absence of overdispersion (iBEATs: dispersion = 1.07, p = .560; iBREATH: dispersion = 1.09, p = .600). We found that for the iBEATs, the full model did not significantly improve fit over the null model (χ²(2) = 0.170, p = .919), but for the iBREATH, the full model did provide a significantly better fit than the null model (χ²(2) = 10.60, p = .005).

Acknowledgements

Markus R. Tünte and Stefanie Höhl are funded by the FWF (Project number: P33486) and Ezgi Kayhan is funded by the DFG (Project number: 402789467). We want to thank all infants and mothers who participated in this project. We also want to thank Monica Vanoncini and Liesbeth Forsthuber, as well as all research assistants, interns, and master students for their help in data collection and preparation of the experiment: Sandra Gaisbacher, Laura Neumann, Julia Otter, Lisa Triebenbacher, Jakob Weickmann, Felicia Wittmann, Gesine Jordan, Nina Maier, Rebecca Lutz, Celine Dorczok, Ann-Cathrine Gärtner, Maria Baumann, Nadine Pointner.

Contributions

Markus R. Tünte: conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing – original draft, writing – review & editing, visualization, supervision, project administration, funding acquisition. Stefanie Höhl: conceptualization, methodology, validation, resources, writing – original draft, writing – review & editing, supervision, project administration, funding acquisition. Moritz Wunderwald: conceptualization, methodology, software, validation, visualization. Johannes Bullinger: validation, investigation, writing – original draft, writing – review & editing. Asena Boyadziheva: validation, investigation, writing – original draft, writing – review & editing. Lara Maister: methodology, software, validation. Birgit Elsner: conceptualization, supervision, project administration, funding acquisition. Manos Tsakiris: conceptualization, validation, writing – original draft, writing – review & editing, supervision, funding acquisition. Ezgi Kayhan: conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing – original draft, writing – review & editing, visualization, supervision, project administration, funding acquisition.

Supplemental Materials

A) Specification Curve Analysis

For the specification curve analysis, we followed the approach outlined in Simonsohn et al. (2020). First, we identified the subset of suitable analytical choices by reviewing all available papers published so far that used a task similar to the iBEATs (Charbonneau et al., 2022; Maister et al., 2017; Weijs et al., 2023). Building on these papers, we extracted potential analytical decisions applicable to our dataset (Table 1, Table 2). Second, we ran all suitable analyses and plotted the results (Figure 1, Figure 2). Third, we used a permutation approach to investigate how inconsistent the obtained results were with the null hypothesis of no effect (Table 3). Next, we discuss the results of the specification curve analysis for the iBEATs and the iBREATH, respectively, and make recommendations for future data analysis in projects using the iBEATs/iBREATH as well as preferential looking paradigms in general.

Specification Curve Analysis iBEATs

An overview of the analytical choices can be found in Table 1. For the iBEATs we identified 7 different categories, yielding a total of 1024 potential analyses (Figure 1). We found that 458 (44.73%) analyses led to a significant effect. Most of these (442, 43.16%) yielded a significant effect for a mean synchronicity preference. However, there are also a few analysis paths (16, 1.56%) that we could have chosen that would have resulted in a mean preference for asynchronous stimuli.

Analytical decisions for the iBEATs.

Specification Curve Analysis iBREATH

As this is the first paper on respiratory interoceptive sensitivity in infants, for the iBREATH we adapted the choices made for the iBEATs paradigm. An overview can be found in Table 2. The only difference between the iBEATs and the iBREATH specification curve analysis concerns artifact removal for the physiological data. For the iBEATs, there were 4 different artifact-removal options, while for the iBREATH we identified 6. This resulted in 1536 potential analyses for the iBREATH (Figure 2). We found that 269 (17.51%) analyses led to a significant effect. Further, all analyses yielding a significant effect revealed a mean synchronous preference.
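The counts reported for the two specification curves can be cross-checked with a few lines of arithmetic. The sketch below only reuses the numbers stated in the text; the variable names are illustrative, and the step from 1024 to 1536 specifications rests on the assumption that the two analyses differ solely in the number of artifact-removal options (4 versus 6).

```python
# Counts as reported in the text (variable names are illustrative).
ibeats_total = 1024
ibeats_sync = 442    # significant, mean synchronous preference
ibeats_async = 16    # significant, mean asynchronous preference
ibreath_total = 1536
ibreath_sync = 269   # significant, all with a synchronous preference


def share(count, total):
    """Percentage of specifications, rounded to two decimals."""
    return round(100 * count / total, 2)


ibeats_significant = ibeats_sync + ibeats_async

# Assumption: only the artifact-removal category changes (4 -> 6 options),
# so the iBREATH grid is the iBEATs grid scaled by 6/4.
ibreath_expected_total = ibeats_total * 6 // 4

print(ibeats_significant, share(ibeats_significant, ibeats_total))
print(share(ibeats_sync, ibeats_total), share(ibeats_async, ibeats_total))
print(ibreath_expected_total, share(ibreath_sync, ibreath_total))
```

The same pattern extends to any further breakdown of the specification grid, for example the share of significant results within a single analytical category.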

Analytical decisions for the iBREATH.

Inference of the Specification Curve Analysis

Specification Curve Analysis for the iBEATs.

Note. Descriptive results from the specification curve analysis for the iBEATs task. Blue coloring in A) and B) refers to a significant result for a mean synchronous preference, while red coloring indicates a significant result for a mean asynchronous preference (p < .05) for the specification and test. In A) standardized beta regression estimates are plotted. In B) an overview of a range of analytical choices is given. In C) analytical choices are further decomposed.

Specification Curve Analysis for the iBREATH.

Note. Descriptive results from the specification curve analysis for the iBREATH task. Blue coloring in A) and B) refers to a significant result for a mean synchronous preference, while red coloring indicates a significant result for a mean asynchronous preference (p < .05) for the specification and test. In A) standardized beta regression estimates are plotted. In B) an overview of a range of analytical choices is given. In C) analytical choices are further decomposed.

B) Data Simulation for Different Sample Sizes

In our main analysis we found that some of the tests investigating mean differences between conditions were not statistically significant. Follow-up analyses using equivalence tests and region of practical equivalence approaches showed that sample size might have played a role, as statistical power could have been too low to detect an effect. Here, we aimed to further investigate whether the absence of a significant effect for a mean group preference could be due to reduced statistical power in smaller samples. To investigate this hypothesis, we ran simulations building on our data. Our aim was to characterize how the statistical power to detect an effect is affected by sample size. This is relevant as sample sizes in infancy research tend to be low, and all non-significant results reported so far for iBEATs-like paradigms came from samples of roughly 30 infants (Weijs et al., 2023; the 18-month-old samples reported here). Further, the results of such simulations might be informative for researchers planning to use experimental paradigms like the iBEATs or the iBREATH in infant samples in the future.

In a first step, we used the R-package “faux” to simulate data sets using the 9-month-old data from the iBEATs as input. The package “faux” simulates data that have the same properties as an existing data set. We simulated data sets ranging from 5 to 125 participants and generated 50 data sets for each number of participants, giving a total of 6000 data sets. As input, we used the data set processed according to our preregistration (n = 52, Cohen’s d = .48).

Next, we ran analyses with the generated data, which are visualized in Figure 3. Following our preregistered analysis strategy, we computed a paired t-test comparing mean looking times for synchronous and asynchronous conditions for each sample size. In Figure 3A, mean differences for the paired t-test (y-axis) are plotted against the sample size (x-axis) for the simulated data building on our 9-month-old sample. Red color refers to a significant result from the paired t-test, while blue color refers to a non-significant result. In Figure 3B, the percentage of significant results for the paired t-tests (y-axis) is plotted against the sample size (x-axis).

The results from the simulation indicate the chance of finding a significant result given an effect of d = .48 in data sets with varying sample sizes. For the data building on our 9-month-old sample, we observed that 80-90% of results are significant as the sample size approaches 50-60. Interestingly, the proportion of significant results for a sample size of 30 is slightly above 50%, indicating that with a sample of 30 infants the chance of finding a significant mean difference is roughly that of a coin flip. Thus, the absence of a significant result in samples of 30 infants does not necessarily indicate the absence of a mean group preference in general; the sample size might simply not have been sufficient to detect it.
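The logic of such a simulation can be sketched in a few lines. The example below is not the “faux”-based procedure used here: as a simplifying assumption, it draws paired differences directly from a normal distribution whose standardized mean equals d = .48, whereas the reported simulations reproduce the means, standard deviations, and correlations of the observed data. The resulting percentages therefore need not match those in Figure 3. All function names are illustrative, and the p-value is obtained by numerically integrating the t density so that only the Python standard library is required.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)


def paired_t_p(differences):
    """p-value of a paired t-test, computed on within-pair differences."""
    n = len(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    return two_sided_p(statistics.mean(differences) / se, n - 1)


def simulated_power(n, effect_size, rng, n_datasets=50, alpha=0.05):
    """Share of simulated data sets of size n with a significant t-test."""
    hits = 0
    for _ in range(n_datasets):
        # Assumption: differences are normal with mean d and SD 1.
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        hits += paired_t_p(diffs) < alpha
    return hits / n_datasets


rng = random.Random(2024)  # fixed seed for reproducibility
power_by_n = {n: simulated_power(n, 0.48, rng) for n in range(5, 126, 10)}
for n, power in power_by_n.items():
    print(f"n = {n:3d}: {power:.0%} of simulated tests significant")
```

With only 50 data sets per sample size, the estimated percentages fluctuate noticeably between neighbouring sample sizes, which is one reason to summarize them with a fitted line.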

Simulating data frames for sample sizes from 15 to 125 building on the iBEATs data from the 9-month-olds.

Note. Results from the simulations. In A) mean effects and 95% confidence intervals are plotted for the different sample sizes. Red color indicates a significant effect, while blue indicates a non-significant result. In B) the percentage of significant results is plotted with a fitted line.

C) Slow and Fast Asynchronous Trials

Asynchronous trials in the iBEATs and the iBREATH could be either faster or slower than the infant’s respective physiological signal. In an exploratory analysis, we computed t-tests to investigate whether looking times differed between slow and fast asynchronous trials for all age groups and paradigms (Table 4). Overall, we did not find evidence for a difference between looking times to fast and slow asynchronous trials.

Looking times for slow and fast asynchronous trials