Author response:
The following is the authors’ response to the previous reviews.
eLife assessment
This important study explores infants' attention patterns in real-world settings using advanced protocols and cutting-edge methods. The presented evidence for the role of EEG theta power in infants' attention is currently incomplete. The study will be of interest to researchers working on the development and control of attention.
Public Reviews:
Reviewer #1 (Public Review):
Summary:
The paper investigates the physiological and neural processes that relate to infants' attention allocation in a naturalistic setting. Contrary to experimental paradigms that are usually employed in developmental research, this study investigates attention processes while letting the infants be free to play with three toys in the vicinity of their caregiver, which is closer to a common, everyday life context. The paper focuses on infants at 5 and 10 months of age and finds differences in what predicts attention allocation. At 5 months, attention episodes are shorter and their duration is predicted by autonomic arousal. At 10 months, attention episodes are longer, and their duration can be predicted by theta power. Moreover, theta power predicted the proportion of looking at the toys, as well as a decrease in arousal (heart rate). Overall, the authors conclude that attentional systems change across development, becoming more driven by cortical processes.
Strengths:
I enjoyed reading the paper, I am impressed with the level of detail of the analyses, and I am strongly in favour of the overall approach, which tries to move beyond in-lab settings. The collection of multiple sources of data (EEG, heart rate, looking behaviour) at two different ages (5 and 10 months) is a key strength of this paper. The original analyses, which build onto robust EEG preprocessing, are an additional feat that improves the overall value of the paper. The careful consideration of how theta power might change before, during, and in the prediction of attention episodes is especially remarkable. However, I have a few major concerns that I would like the authors to address, especially on the methodological side.
Points of improvement
(1) Noise
The first concern is the level of noise across age groups, periods of attention allocation, and metrics. Starting with EEG, I appreciate the analysis of noise reported in supplementary materials. The analysis focuses on a broad level (average noise in 5-month-olds vs 10-month-olds) but variations might be more fine-grained (for example, noise in 5mos might be due to fussiness and crying, while at 10 months it might be due to increased movements). More importantly, noise might even be the same across age groups, but correlated to other aspects of their behaviour (head or eye movements) that are directly related to the measures of interest. Is it possible that noise might co-vary with some of the behaviours of interest, thus leading to either spurious effects or false negatives? One way to address this issue would be for example to check if noise in the signal can predict attention episodes. If this is the case, noise should be added as a covariate in many of the analyses of this paper.
We thank the reviewer for this comment. We certainly have evidence that even the most state-of-the-art cleaning procedures (such as machine-learning trained ICA decompositions, as we applied here) are unable to remove eye movement artifact entirely from EEG data (Haresign et al., 2021; Phillips et al., 2023). (This applies to our data but also to others’, where confounding effects of eye movements are generally not considered.) Importantly, however, our analyses have been designed very carefully with this explicit challenge in mind. All of our analyses compare changes in the relationship between brain activity and attention as a function of age, and there is no evidence to suggest that different sources of noise (e.g. crying vs. movement) would associate differently with attention durations, nor that they would change their interactions with attention over developmental time. Figures 5 and 7, for example, both examine the relationship of EEG data at one moment in time to a child’s attention patterns hundreds or thousands of milliseconds before and after that moment, for which there is no possibility that head or eye movement artifact could have systematically influenced the results.
Moving on to the video coding, I see that inter-rater reliability was not very high. Is this due to the fine-grained nature of the coding (20ms)? Is it driven by differences in expertise between the two coders? Or is it because coding such fine-grained behaviour from video data is simply too difficult? The main dependent variable (looking duration) is extracted from the video coding, and I think the authors should be confident they are maximising measurement accuracy.
We appreciate the concern. To calculate IRR we used the function by Cardillo, G. (2007), "Cohen's kappa: compute the Cohen's kappa ratio on a square matrix" (http://www.mathworks.com/matlabcentral/fileexchange/15365). Our observed agreement was 0.7 (SD = 0.15). However, we chose to report Cohen's kappa coefficient, which is generally considered a more robust measure because it accounts for agreement occurring by chance. We conducted the training meticulously (see our response to Q6, R3), and we are confident that our coders performed to the best of their abilities.
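For illustration, the gap between observed agreement and Cohen's kappa can be reproduced from a square coder-by-coder confusion matrix. The sketch below uses made-up numbers (not the study's data; the study used the Cardillo MATLAB function cited above):

```python
import numpy as np

def agreement_and_kappa(confusion):
    """Observed agreement and Cohen's kappa from a square confusion
    matrix (rows: coder 1's codes, columns: coder 2's codes)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    p_observed = np.trace(confusion) / total
    # chance agreement: sum over categories of the product of the
    # two coders' marginal proportions
    p_chance = (confusion.sum(axis=1) * confusion.sum(axis=0)).sum() / total ** 2
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa

# illustrative matrix for two coders and three gaze categories
p_obs, kappa = agreement_and_kappa([[40, 5, 5],
                                    [5, 20, 5],
                                    [5, 5, 10]])
# here p_obs = 0.70 while kappa is roughly 0.52
```

Because kappa discounts the agreement expected by chance, it sits at or below the raw observed agreement, which is why a reported kappa can look less favourable than the corresponding percent-agreement figure.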
(2) Cross-correlation analyses
I would like to raise two issues here. The first is the potential problem of using auto-correlated variables as input for cross-correlations. I am not sure whether theta power was significantly autocorrelated. If it is, could it explain the cross-correlation result? The fact that the cross-correlation plots in Figure 6 peak at zero, and are significant (but lower) around zero, makes me think that it could be a consequence of periods around zero being autocorrelated. Relatedly: how does the fact that the significant lag includes zero, and a bit before, affect the interpretation of this effect?
Just to clarify this analysis: we did include a plot showing the autocorrelation of theta activity in the original submission (Figs 7A and 7B in the revised paper). These indicate that theta shows little to no autocorrelation, and we can see no way in which this might have influenced our results. From their comments, the reviewer seems rather to be thinking of phasic changes in the autocorrelation, i.e. whether greater stability in theta during the time period around looks might have caused the cross-correlation result shown in Fig 7E. Again, though, we can see no way in which this might be true: the cross-correlation indicates that greater theta power is associated with a greater likelihood of looking, and this would not have been affected by changes in the autocorrelation.
A second issue with the cross-correlation analyses is the coding of the looking behaviour. If I understand correctly, if an infant looked for a full second at the same object, they would get a maximum score (e.g., 1), while if they looked for 500ms at the object and 500ms away from it, they would receive a score of, e.g., 0.5. However, if they looked at one object for 500ms and another object for 500ms, they would receive a maximum score (e.g., 1). The reason seems unclear to me because these are different attention episodes, but they would be treated as one. In addition, the authors also show that within an attentional episode theta power changes (for 10mos). What is the reason behind this scoring system? Wouldn't it be better to adjust by the number of attention switches, e.g., with the formula looking-time/(1+N_switches), so that if infants looked for a full second but made 1 switch from one object to the other, the score would be .5, thus reflecting that attention was terminated within that episode?
We appreciate this suggestion. This is something we did not consider, and we thank the reviewer for raising it. In response to their comment, we have now rerun the analyses using the new measure (looking-time/(1+N_switches)), and we are reassured to find that the results remain highly consistent. Please see Author response image 1 below, where you can see the original results in orange and the new measure in blue at 5 and 10 months.
Author response image 1.
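The adjusted measure suggested by the reviewer can be sketched per coding window as follows. This is a hypothetical illustration: the per-sample gaze codes ('A'/'B'/'C' for toys, None for looking away) are assumptions for demonstration, not the study's actual coding scheme:

```python
def switch_adjusted_looking(samples):
    """looking-time / (1 + N_switches) within one fixed-length window.

    `samples` is a list of per-sample gaze codes, e.g. 'A'/'B'/'C'
    for the three toys and None for looking away from all toys.
    """
    if not samples:
        return 0.0
    on_toy = [s for s in samples if s is not None]
    looking_time = len(on_toy) / len(samples)  # proportion of window on any toy
    # switches: transitions between different toys within the on-toy samples
    n_switches = sum(1 for a, b in zip(on_toy, on_toy[1:]) if a != b)
    return looking_time / (1 + n_switches)

# a full window on one toy scores 1.0; half on a toy and half away
# scores 0.5; half on toy A then half on toy B also scores 0.5,
# because the switch now terminates the episode
```

This makes explicit the property the reviewer asked for: a window split between two toys is penalised in the same way as a window split between a toy and looking away.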
(3) Clearer definitions of variables, constructs, and visualisations
The second issue is the overall clarity and systematicity of the paper. The concept of attention appears with many different names. Only in the abstract, it is described as attention control, attentional behaviours, attentiveness, attention durations, attention shifts and attention episode. More names are used elsewhere in the paper. Although some of them are indeed meant to describe different aspects, others are overlapping. As a consequence, the main results also become more difficult to grasp. For example, it is stated that autonomic arousal predicts attention, but it's harder to understand what specific aspect (duration of looking, disengagement, etc.) it is predictive of. Relatedly, the cognitive process under investigation (e.g., attention) and its operationalization (e.g., duration of consecutive looking toward a toy) are used interchangeably. I would want to see more demarcation between different concepts and between concepts and measurements.
We appreciate the comment and we have clarified the concepts and their operationalisation throughout the revised manuscript.
General Remarks
In general, the authors achieved their aim in that they successfully showed the relationship between looking behaviour (as a proxy of attention), autonomic arousal, and electrophysiology. Two aspects are especially interesting. First, the fact that at 5 months, autonomic arousal predicts the duration of subsequent attention episodes, but at 10 months this effect is not present. Conversely, at 10 months, theta power predicts the duration of looking episodes, but this effect is not present in 5-month-old infants. This pattern of results suggests that younger infants have less control over their attention, which mostly depends on their current state of arousal, but older infants have gained cortical control of their attention, which in turn impacts their looking behaviour and arousal.
We thank the reviewer for the close attention that they have paid to our manuscript, and for their insightful comments.
Reviewer #2 (Public Review):
Summary:
This manuscript explores infants' attention patterns in real-world settings and their relationship with autonomic arousal and EEG oscillations in the theta frequency band. The study included 5- and 10-month-old infants during free play. The results showed that the 5-month-old group exhibited a decline in HR forward-predicted attentional behaviors, while the 10-month-old group exhibited increased theta power following shifts in gaze, indicating the start of a new attention episode. Additionally, this increase in theta power predicted the duration of infants' looking behavior.
Strengths:
The study's strengths lie in its utilization of advanced protocols and cutting-edge techniques to assess infants' neural activity and autonomic arousal associated with their attention patterns, as well as the extensive data coding and processing. Overall, the findings have important theoretical implications for the development of infant attention.
Weaknesses:
Certain methodological procedures require further clarification, e.g., details on EEG data processing. Additionally, it would be beneficial to eliminate possible confounding factors and consider alternative interpretations, e.g., whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during the free play.
We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.
Reviewer #3 (Public Review):
Summary:
Much of the literature on attention has focused on static, non-contingent stimuli that can be easily controlled and replicated, a mismatch with the actual day-to-day deployment of attention. The same limitation is evident in the developmental literature, which is further hampered by infants' limited behavioral repertoires and the general difficulty in collecting robust and reliable data in the first year of life. The current study engages young infants as they play with age-appropriate toys, capturing visual attention, cardiac measures of arousal, and EEG-based metrics of cognitive processing. The authors find that the temporal relations between measures are different at age 5 months vs. age 10 months. In particular, at 5 months of age, cardiac arousal appears to precede attention, while at 10 months of age attention processes lead to shifts in neural markers of engagement, as captured in theta activity.
Strengths:
The study brings to the forefront sophisticated analytical and methodological techniques to bring greater validity to the work typically done in the research lab. By using measures in the moment, they can more closely link biological measures to actual behaviors and cognitive stages. Often, we are forced to capture these measures in separate contexts and then infer in-the-moment relations. The data and techniques provide insights for future research work.
Weaknesses:
The sample is relatively modest, although this is somewhat balanced by the sheer number of data points generated by the moment-to-moment analyses. In addition, the study is cross-sectional, so the data cannot capture true change over time. Larger samples, followed over time, will provide a stronger test for the robustness and reliability of the preliminary data noted here. Finally, while the method certainly provides for a more active and interactive infant in testing, we are a few steps removed from the complexity of daily life and social interactions.
We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.
Reviewer #1 (Recommendations For The Authors):
Here are some specific ways in which clarity can be improved:
A. Regarding the distinction between constructs, or measures and constructs:
i. In the results section, I would prefer to mention looking at duration and heart rate as metrics that have been measured, while in the introduction and discussion, a clear 1-to-1 link between construct/cognitive process and behavioural or (neuro)psychophysical measure can be made (e.g., sustained attention is measured via looking durations; autonomic arousal is measured via heart-rate).
The way attention and arousal were operationalised are now clarified throughout the text, especially in the results.
ii. Relatedly, the "attention" variable is not really measuring attention directly. It is rather measuring looking time (proportion of looking time to the toys?), which is the operationalisation, which is hypothesised to be related to attention (the construct/cognitive process). I would make the distinction between the two stronger.
This distinction between looking and paying attention is clearer now in the reviewed manuscript as per R1 and R3’s suggestions. We have also added a paragraph in the Introduction to clarify it and pointed out its limitations (see pg.5).
B. Each analysis should be set out to address a specific hypothesis. I would rather see hypotheses in the introduction (without direct reference to the details of the models that were used), and how a specific relation between variables should follow from such hypotheses. This would also solve the issue that some analyses did not seem directly necessary to the main goal of the paper. For example:
i. Are ACF and survival probability analyses aimed at proving different points, or are they different analyses to prove the same point? Consider either making clearer how they differ or moving one to supplementary materials.
We clarified this in pg. 4 of the revised manuscript.
ii. The autocorrelation results are not mentioned in the introduction. Are they aiming to show that the variables can be used for cross-correlation? Please clarify their role or remove them.
We clarified this in pg. 4 of the revised manuscript.
C. Clarity of cross-correlation figures. To ensure clarity when presenting a cross-correlation plot, it's important to provide information on the lead-lag relationships and which variable is considered X and which is Y. This could be done by labelling the axes more clearly (e.g., the left-hand side of the - axis specifies x leads y, right hand specifies y leads x) or adding a legend (e.g., dashed line indicates x leading y, solid line indicates y leading x). Finally, the limits of the x-axis are consistent across plots, but the limits of the y-axis differ, which makes it harder to visually compare the different plots. More broadly, the plots could have clearer labels, and their resolution could also be improved.
This information on which variable precedes/follows was in the captions of the figures. However, we have edited the figures as per the reviewer’s suggestion and added this information to the figures themselves. We have also uploaded all the figures in higher resolution.
D. Figure 7 was extremely helpful for understanding the paper, and I would rather have it as Figure 1 in the introduction.
We have moved figure 7 to figure 1 as per this request.
E. Statistics should always be reported, and effects should always be described. For example, results of autocorrelation are not reported, and from the plot, it is also not clear if the effects are significant (the caption states that red dots indicate significance, but there are no red dots. Does this mean there is no autocorrelation?).
We apologise – this was hard to read in the original. We have clarified that there is no autocorrelation present in Fig 7A and 7D.
And if so, given that theta is a wave, how is it possible that there is no autocorrelation (connected to point 1)?
We thank the reviewer for raising this point. Theta power measures oscillatory activity in the EEG within the 3-6 Hz window (i.e. 3 to 6 oscillations per second), whereas the autocorrelation was computed on changes in theta power between consecutive 1-second-long windows. To say that there is no autocorrelation in the data means that, if there is more 3-6 Hz activity within one particular 1-second window, there tends not to be significantly more 3-6 Hz activity within the 1-second windows immediately before and after.
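The distinction can be sketched in code: band power is computed within each 1-second window, and the autocorrelation is then computed across the resulting sequence of window-wise power values, not on the oscillating signal itself. This is a minimal Python sketch of that logic, assuming simple FFT-based band power; it is not the paper's exact pipeline:

```python
import numpy as np

def theta_power_autocorr(eeg, fs, lo=3.0, hi=6.0, max_lag=5):
    """Autocorrelation of 3-6 Hz power across consecutive 1 s windows."""
    n = int(fs)                        # samples per 1-second window
    n_win = len(eeg) // n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    # one theta-power value per 1-second window
    power = np.array([(np.abs(np.fft.rfft(eeg[i * n:(i + 1) * n])) ** 2)[band].sum()
                      for i in range(n_win)])
    p = power - power.mean()
    denom = (p ** 2).sum()
    # lag-k autocorrelation of the window-wise power sequence
    return np.array([(p[:-k] * p[k:]).sum() / denom if k else 1.0
                     for k in range(max_lag + 1)])
```

On this definition, "no autocorrelation" means the lag-1 (and higher) values of the window-wise power sequence are near zero, even though the underlying signal oscillates at 3-6 Hz within every window.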
F. Alpha power is introduced later on, and in the discussion, it is mentioned that the effects that were found go against the authors' expectations. However, alpha power and the authors' expectations about it are not mentioned in the introduction.
We thank the reviewer for this comment. We have added a paragraph on alpha in the introduction (pg.4).
Minor points:
(1) At the end of the first page of the introduction, the authors state:
“How children allocate their attention in experimenter-controlled, screen-based lab tasks differs, however, from actual real-world attention in several ways (32-34). For example, the real-world is interactive and manipulable, and so how we interact with the world determines what information we, in turn, receive from it: experiences generate behaviours (35).”
I think there's more to this though - Lab-based studies can be made interactive too (e.g., Meyer et al., 2023, Stahl & Feigenson, 2015). What remains unexplored is how infants actively and freely initiate and self-structure their attention, rather than how they respond to experimental manipulations.
Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.
Stahl, A. E., & Feigenson, L. (2015). Observing the unexpected enhances infants' learning and exploration. Science, 348(6230), 91-94.
We thank the reviewer for this suggestion and added their point in pg. 4.
(2) Regarding analysis 4:
a. In analysis 1 you showed that the duration of attentional episodes changes with age. Is it fair to keep the same start, middle, and termination ranges across age groups? Is 3-4 seconds "middle" for 5-month-olds?
We appreciate the comment. There are many ways we could have run these analyses and, in fact, in other papers we have done it differently, for example by splitting each look into 3, irrespective of its duration (Phillips et al., 2023).
However, one aspect we took into account was the observation that 5-month-old infants exhibited a greater number of shorter looks than older infants. We recognised that dividing each look into 3 parts, regardless of its duration, might have impacted the results: presumably, the activity during the middle and termination phases of a 1.5-second look differs from that of a look lasting over 7 seconds.
Two additional factors provided us with confidence in our approach: 1) while the definition of "middle" was somewhat arbitrary, it allowed us to maintain consistency in our analyses across different age points; and 2) we obtained a comparable number of observations across the two time points (e.g. for "middle" we had 172 events at 5 months and 194 events at 10 months).
b. It is recommended not to interpret lower-level interactions if more complex interactions are not significant. How are the interaction effects in a simpler model in which the 3-way interaction is removed?
We appreciate the comment. We aimed to follow the same steps as Xie et al. (2018). However, we have re-analysed the data removing the 3-way interaction, and the significance of the results stayed the same. Please see Author response image 2 below (first: new analyses without the 3-way interaction; second: original analyses that included the 3-way interaction).
Author response image 2.
(3) Figure S1: there seems to be an outlier in the bottom-right panel. Do results hold excluding it?
We re-ran these analyses as per this suggestion and the results stayed the same (refer to SM pg. 2).
(4) Figure S2 should refer to 10 months instead of 12.
We thank the reviewer for noticing this typo, we have changed it in the reviewed manuscript (see SM pg. 3).
(5) In the 2nd paragraph of the discussion, I found this sentence unclear: "From Analysis 1 we found that infants at both ages showed a preferred modal reorientation rate".
We clarified this in the reviewed manuscript in pg. 10.
(6) Discussion: many (infant) studies have used theta in anticipation of receiving information (Begus et al., 2016) surprising events (Meyer et al., 2023), and especially exploration (Begus et al., 2015). Can you make a broader point on how these findings inform our interpretation of theta in the infant population (go more from description to underlying mechanisms)?
We have extended on this point on interpreting frequency bands in pg13 of the reviewed manuscript and thank the reviewer for bringing it up.
Begus, K., Gliga, T., & Southgate, V. (2016). Infants' preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397-12402.
Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.
Begus, K., Southgate, V., & Gliga, T. (2015). Neural mechanisms of infant learning: differences in frontal theta activity during object exploration modulate subsequent object recognition. Biology letters, 11(5), 20150041.
(7) 2nd page of discussion, last paragraph: "preferred modal reorientation timer" is not a neural/cognitive mechanism, just a resulting behaviour.
We agree with this comment and thank the reviewer for bringing it to our attention. We clarified this in pg. 12 and pg. 13 of the reviewed manuscript.
Reviewer #2 (Recommendations For The Authors):
I have a few comments and questions that I think the authors should consider addressing in a revised version. Please see below:
(1) During preprocessing (steps 5 and 6), it seems like the "noisy channels" were rejected using the pop_rejchan.m function and then interpolated. This procedure is common in infant EEG analysis, but a concern arises: was there no upper limit for channel interpolation? Did the authors still perform bad channel interpolation even when more than 30% or 40% of the channels were identified as "bad" at the beginning with the continuous data?
We did state in the original manuscript that “participants with fewer than 30% channels interpolated at 5 months and 25% at 10 months made it to the final step (ICA) and final analyses”. In the revised version we have re-written this section in order to make this more clear (pg. 17).
(2) I am also perplexed about the sequencing of the ICA pruning step. If the intention of ICA pruning is to eliminate artificial components, would it be more logical to perform this procedure before the conventional artifacts' rejection (i.e., step 7), rather than after? In addition, what was the methodology employed by the authors to identify the artificial ICA components? Was it done through manual visual inspection or utilizing specific toolboxes?
We agree that ICA is often run first; however, we rejected continuous data prior to ICA in order to remove the very worst sections of data (where almost all channels were affected), which can arise when infants fuss or pull at the caps. This step was applied at this point in the pipeline so that these sections of very bad data were not inputted into the ICA. This is fairly widespread practice in cleaning infant data.
Concerning the reviewer’s second question of how ICA components were removed: the answer is described in considerable detail in the paper that we refer to in that section of the manuscript. This was done by training a classifier specially designed to clean naturalistic infant EEG data (Haresign et al., 2021), an approach that has since been employed in similar studies (e.g. Georgieva et al., 2020; Phillips et al., 2023).
(3) Please clarify how the relative power was calculated for the theta (3-6Hz) and alpha (6-9Hz) bands. Were they calculated by dividing theta or alpha power by the power between 3 and 9Hz, or by the total power between 1 (or 3) and 20 Hz? In other words, what does the term "all frequency bands" refer to in section 4.3.7?
We thank the reviewer for this comment, we have now clarified this in pg. 22.
(4) One of the key discoveries presented in this paper is the observation that attention shifts are accompanied by a subsequent enhancement in theta band power shortly after the shifts occur. Is it possible that this effect or alteration might be linked to infants' saccades, which are used as indicators of attention shifts? Would it be feasible to analyze the disparities in amplitude between the left and right frontal electrodes (e.g., Fp1 and Fp2, which could be viewed as virtual horizontal EOG channels) in relation to theta band power, in order to eliminate the possibility that the augmentation of theta power was attributable to the intensity of the saccades?
We appreciate the concern. Average saccade duration in infants is about 40ms (Garbutt et al., 2007). Our finding that the positive cross-correlation between theta and look duration is present not only when we examine zero-lag data but also when we examine how theta forwards-predicts attention 1-2 seconds afterwards seems therefore unlikely to be directly attributable to saccade-related artifact. Concerning the reviewer’s suggestion – this is something that we have tried in the past. Unfortunately, however, our experience is that identifying saccades based on the disparity between Fp1 and Fp2 is much too unreliable to be of any use in analysing data. Even if specially positioned HEOG electrodes are used, we still find the saccade detection to be insufficiently reliable. In ongoing work we are tracking eye movements separately, in order to be able to address this point more satisfactorily.
(5) The following question is related to my previous comment. Why is the duration of the relationship between theta power and moment-to-moment changes in attention so short? If theta is indeed associated with attention and information processing, shouldn't the relationship between the two variables strengthen as the attention episode progresses? Given that the authors themselves suggest that "One possible interpretation of this is that neural activity associates with the maintenance more than the initiation of attentional behaviors," it raises the question of why the duration of the relationship is not longer but instead declines drastically (Figure 6), which seems contradictory.
We thank the reviewer for raising this excellent point. Certainly we would argue that this, together with the low autocorrelation values for theta documented in Figs 7A and 7D, challenges many conventional ways of interpreting theta. We are continuing to investigate this question in ongoing work.
(6) Have the authors conducted a comparison of alpha relative power and HR deceleration durations between 5 and 10-month-old infants? This analysis could provide insights into whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during free play.
We thank the reviewer for this suggestion. Indeed, this is an aspect we investigated but ultimately, given that our primary emphasis was on the theta frequency, and considering the length of the manuscript, we decided not to incorporate it. However, we attach Author response image 3 below, showing that there was no significant interaction between HR and alpha band power.
Author response image 3.
Reviewer #3 (Recommendations For The Authors):
(1) In reading the manuscript, the language used seems to imply longitudinal data or at the very least the ability to detect change or maturation. Given the cross-sectional nature of the data, the language should be tempered throughout. The data are illustrative but not definitive.
We thank the reviewer for this comment. We have now clarified that “Data was analysed in a cross-sectional manner” in pg15.
(2) The sample size is quite modest, particularly in the specific age groups. This is likely tempered by the sheer number of data points available. This latter argument is implied in the text, but not as explicitly noted. (However, I may have missed this as the text is quite dense). I think more notice is needed on the reliability and stability of the findings given the sample.
We have clarified this in pg16.
(3) On a related note, how was the sample size determined? Was there a power analysis to help guide decision-making for both recruitment and choosing which analyses to proceed with? Again, the analytic approach is quite sophisticated and the questions are of central interest to researchers, but I was left feeling maybe these two aspects of the study were out-sprinting the available data. The general impression is that the sample is small, but it is not until looking at table s7, that it is in full relief. I think this should be more prominent in the main body of the study.
We have clarified this in pg16.
(4) The manuscript devotes a few sentences to the relation between looking and attention. However, this distinction is central to the design of the study, and to any philosophical differences regarding what take-away points can be generated. In my reading, I think this point needs to be more heavily interrogated.
This distinction between looking and paying attention is clearer now in the reviewed manuscript as per R1 and R3’s suggestions. We have also added a paragraph in the Introduction to clarify it and pointed out its limitations (see pg.5).
(5) I would temper the real-world attention language. This study is certainly a great step forward, relative to static faces on a computer screen. However, there are still a great number of artificial constraints that have been added. That is not to say that the constraints are bad--they are necessary to carry out the work. However, it should be acknowledged that it constrains the external validity.
We have added a paragraph to acknowledge the limitations of the setup in pg. 14.
(6) The kappa on the coding is not strong. The authors chose to proceed nonetheless. Given that, I think more information is needed on how coders were trained, how they were standardized, and what parameters were used to decide they were ready to code independently. Again, with the sample size and the kappa presented, I think more discussion is needed regarding the robustness of the findings.
We appreciate the concern. As per our answer to R1, we chose to report the most stringent calculator of inter-rater reliability, but other calculation methods (i.e., percent agreement) return higher scores (see response to R1).
As per the training, we wrote an extensively detailed coding scheme describing exactly how to code each look, which was handed to our coders. Throughout the initial months of training, we met with the coders on a weekly basis to discuss questions and individual frames that looked ambiguous. After each session, we would revise the coding scheme to incorporate additional details, aiming to make the coding process progressively less subjective. During this period, every coder analysed the same interactions, and inter-rater reliability (IRR) was assessed weekly by comparing their evaluations with mine (Marta's). Over time, the coders had fewer questions and IRR increased. At that point, we deemed them sufficiently trained and began assigning them different interactions from each other. Periodically, though, we all assessed the same interaction and met to review and discuss our coding outputs.