Pipeline for detecting the facial features of two marmosets.

(A) Six facial features of the marmoset (face frame) are color-coded: right tuft (red), central blaze (yellow), left tuft (green), right eye (purple), left eye (blue), and mouth (magenta). (B) Feature tracking pipeline (right) with the corresponding illustration for each step across two adjacent video frames (left). At the end of the pipeline, the facial features are clustered and identities are assigned consistently across frames. Facial points are color-coded as in A. (C) Example frames of four steps in the pipeline shown in B. The facial points are tracked and clustered accurately, and the identities remain consistent across frames.

3D Reconstruction of facial features and head gaze modeling.

(A) The face frames of two marmosets are reconstructed in 3D using the tracked facial points in Fig. 1 A. A cone perpendicular to the face frame (gaze cone; 10-degree solid angle) is modeled as the head gaze. (B) Two example frames with the facial points projected from 3D space onto different camera views are shown. The left frame demonstrates that the facial points can be detected using information from other cameras, even if the face is invisible from that viewpoint. (C) Trajectory of the reconstructed face frame and the corresponding gaze cones across time. (D) The histogram of the face norm velocity. The red dotted line (0.05) is the threshold below which the marmoset head direction is considered to be stationary.
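The gaze-cone construction in (A) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the face-frame normal is estimated from the cross product of two in-plane vectors built from three landmarks, and it treats the cone size as a half-angle parameter. The function names, the landmark choice, the sign convention of the normal, and the reading of the 10-degree value as a half-angle are all assumptions made for the example.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def face_normal(left_eye, right_eye, mouth):
    """Unit vector perpendicular to the plane through three landmarks.

    Assumption: three landmarks are enough to define the face plane;
    the direction (sign) of the normal depends on landmark order.
    """
    return unit(cross(sub(right_eye, left_eye), sub(mouth, left_eye)))

def in_gaze_cone(apex, axis, point, half_angle_deg=10.0):
    """True if `point` lies inside the cone with the given apex and axis.

    Assumption: the cone is parameterized by its half-angle in degrees.
    """
    v = sub(point, apex)
    dist = math.sqrt(dot(v, v))
    if dist == 0.0:
        return True  # the apex itself counts as inside
    cos_angle = dot(unit(axis), v) / dist
    return cos_angle >= math.cos(math.radians(half_angle_deg))
```

For instance, with the apex at the origin and the axis along negative Z, a point at (0.1, 0, -1) is about 5.7 degrees off-axis and falls inside the cone, while a point at (1, 0, -1) is 45 degrees off-axis and falls outside.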

Positional dynamics of marmoset dyads.

(A) Movement trajectories of the face frame centroids for a marmoset pair (orange for female, cyan for male) in an example five-minute block. The heatmaps were calculated using the projections of the trajectories onto the XY, YZ, and XZ planes. (B) Marginal distributions of movement trajectories along the X and Z axes were calculated for all marmosets and grouped by familiarity and sex (transparent colors for familiar pairs, opaque colors for unfamiliar pairs). Black bars indicate significant differences between pairs of distributions (Mann-Whitney U test, significance level at 5%). (C) Histograms of social distance along the X axis show trimodal distributions for both familiar pairs (gray) and unfamiliar pairs (black). The top inset shows a schematic of the arena configuration with corresponding colors (orange for females and cyan for males). Social distance is defined as the Euclidean distance between the two marmosets. The fitted red curve for the unfamiliar pairs is a tri-Gaussian distribution, while the fitted red curve for the familiar pairs is a mixture of Gamma and Gaussian distributions, with the first peak as the Gamma distribution. The three regions were designated as ‘Near’, ‘Intermediate’, and ‘Far’. The bottom inset illustrates the reason for this nomenclature. (D) Temporal evolution of social distance for unfamiliar and familiar pairs (within each 5-minute viewing block). The central dark line is the mean and the shaded area is the standard deviation. Black dots indicate significant differences (Mann-Whitney U test, significance level at 5%).
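The social distance used in (C) and (D) can be sketched in a few lines. This is a minimal example under one assumption: each animal's position is taken to be the centroid of its reconstructed 3D facial points, as in the trajectories of (A). The function names are hypothetical, and no thresholds for the ‘Near’, ‘Intermediate’, and ‘Far’ regions are included because the caption does not state them.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points (x, y, z)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def social_distance(face_a, face_b):
    """Euclidean distance between the centroids of two face frames.

    Assumption: each animal's position is the centroid of its
    reconstructed facial points.
    """
    return math.dist(centroid(face_a), centroid(face_b))
```

Applying `social_distance` frame by frame would give a time series of the kind summarized by the histograms and the temporal curves.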

Live interactive gaze analysis of unfamiliar and familiar marmoset dyads.

(A) Gaze type categorized based on the relative positions of the gaze cones. Joint gaze is defined as two marmosets looking at the same location. A partner gaze is defined as one animal looking at the other animal’s face (but not vice versa). No interaction occurs when the two gaze cones do not intersect. (B) Histograms of gaze count as a function of inter-animal distance, shown separately for familiar and unfamiliar pairs. (C) Same data as in B shown as pie charts of percentages in social gaze states. (D) Gaze state transition diagrams for familiar and unfamiliar pairs. The nodes are the gaze states and the edges connecting the nodes represent the transitions between states. Edge colors indicate transition probabilities. (E) Partner gaze self-transition probabilities for familiar and unfamiliar pairs (χ² test). (F) Delta transition matrix between the unfamiliar pair and familiar pair state transition diagrams. Transitions that are significantly different across familiarity are marked by asterisks (χ² test, male to male, p < 0.05; female to male, p < 0.05; joint to joint, p < 0.0001). (G) Left, The schematic illustrates how gazing toward the surrounding region of a partner’s face area was measured. Right, Counts of gaze towards the surrounding region of the partner’s face by familiarity and sex (Mann-Whitney U test, *** means p < 0.001, **** means p < 0.0001).
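The transition diagrams in (D) and the delta matrix in (F) can be illustrated with a short sketch. It assumes that each video frame (or gaze event) has already been labeled with a single gaze state, and that transition probabilities are estimated from counts of consecutive state pairs. The state labels, the function names, and the direction of the subtraction are hypothetical choices for the example, not details taken from the caption.

```python
from collections import Counter, defaultdict

def transition_probabilities(states):
    """Estimate P(next state | current state) from a labeled sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs

def delta_transitions(p_a, p_b):
    """Element-wise difference p_a - p_b over all observed transitions.

    Transitions missing from one matrix are treated as probability 0.
    """
    keys = {(s, t) for p in (p_a, p_b) for s, row in p.items() for t in row}
    return {(s, t): p_a.get(s, {}).get(t, 0.0) - p_b.get(s, {}).get(t, 0.0)
            for s, t in keys}
```

A self-transition probability such as the one in (E) is then simply the entry `probs[state][state]` for the state of interest.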

Joint gaze analysis between familiar and unfamiliar dyads.

(A) Heatmaps of joint gaze locations projected onto 2D planes (from left to right: ‘XY’, ‘XZ’, ‘YZ’). The icon on the left shows a schematic of a joint gaze along with the arena configuration with corresponding colors (orange for females and cyan for males). The colored rectangles superimposed on the heatmaps show the projections of the arenas onto those 2D planes. The white dotted line indicates the midline between the two arenas. (B) The probability of joint gaze at varying social distances shows distinct distributions for unfamiliar pairs (black) and familiar pairs (gray). The fitted red curve for the unfamiliar pairs is a lognormal distribution, while the fitted red curve for the familiar pairs is a tri-Gaussian distribution. Thin fitted red curves in both panels are the data from individual pairs. The inset shows a schematic of a joint gaze and the social distance metric. Social distance is defined as the Euclidean distance between the two marmosets.

Experimental setup and reconstruction in 3D space.

(A) Two transparent acrylic arenas allowed marmosets to visually interact with each other. An opaque divider between the arenas was introduced intermittently to prevent visual access between animals. Two monitors on two ends were used to display video or image stimuli. Eight cameras surrounding the arenas ensured full coverage of both animals. LEDs at four positions were used to synchronize the video recordings across the set of cameras. (B) 3D reconstruction of the experimental setup. The two arenas are color-coded as orange and cyan. The cameras colored the same as the arenas indicate that they primarily record the marmoset in the corresponding arena. Two purple L-frames within the arenas were used to establish a world coordinate system in the reconstruction process. Two gray planes on both ends are the reconstructed monitors.

Gaze behavior analysis of a single marmoset viewing stimuli on the monitor.

(A) Gaze duration for video stimuli is significantly higher than for image stimuli (Mann-Whitney U test, p < 0.001). (B) Left, Gaze dispersion is defined as the average distance between the centers of the intersection of the gaze cone and the monitor within a gaze epoch. There was no difference between the video and image stimuli for gaze dispersion (Mann-Whitney U test, ns). (C) Left, Illustration of four gaze epochs (gray bars) to repeated presentations of a stimulus and the gaze counts at different time points within the duration of the presentation. Marmosets have more gazes in the early period than in the late period for the video stimuli (Mann-Whitney U test, p < 0.001); however, this is not the case for the image stimuli (Mann-Whitney U test, ns). During the early period, marmosets had significantly higher gaze counts for the video stimuli than for the image stimuli (Mann-Whitney U test, p < 10^-10).
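The gaze dispersion measure in (B) can be sketched as below. The caption leaves open exactly how the "average distance between the centers" is computed, so this example assumes one plausible reading: the mean pairwise Euclidean distance between the per-frame centers of the cone-monitor intersection within one gaze epoch. The function name and the use of 2D monitor coordinates are assumptions.

```python
import math
from itertools import combinations

def gaze_dispersion(centers):
    """Mean pairwise distance between intersection centers in one epoch.

    `centers` is a list of (x, y) points on the monitor plane, one per
    frame. Assumption: dispersion is the average over all point pairs;
    an epoch with fewer than two points has dispersion 0.
    """
    pairs = list(combinations(centers, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

A tightly held gaze yields a value near zero, while a gaze that wanders across the monitor within the epoch yields a larger value.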