Introduction

The way in which we recognize objects and people is a central but as yet unresolved question in vision sciences. Our understanding of visual perception is still limited by our ability to account for how we recognize objects and people despite the sometimes radically different images they project onto the retinae, due to varying lighting, distance, depth rotation, etc. (DiCarlo & Cox, 2007). We experience faces under a broad range of depth rotations, mainly along the x axis, i.e., from left to right profile (i.e., yaw; Favelle & Palmisano, 2012), due to biomechanical constraints favoring x-axis head rotations and because the vantage point on our conspecifics’ faces also varies more along the x than the y axis (i.e., when moving around people). While rotating in depth, a given face projects retinal images that are more dissimilar than those that different faces under the same viewpoint would project (Adini et al., 1997; Burton et al., 2016; Hill et al., 1997). Yet humans generally have no difficulty in recognizing a familiar face across views, which implies the joint ability to differentiate its identity from others and to generalize it from one view to another, i.e., with tolerance to variations (Ritchie & Burton, 2017).

The tolerance of face identity recognition culminates when faces are familiar to the observer (e.g., Favelle & Palmisano, 2018; Hancock et al., 2000; Hill et al., 1997; Jeffery et al., 2006; Johnston & Edmonds, 2009; Jones et al., 2017; Liu, 2002; Newell et al., 1999; O’Toole et al., 1998, 1999; Troje & Bulthoff, 1996), suggesting that a core determinant of tolerant recognition is repeated exposure to the natural statistics of a person’s face (e.g., to the variability of a person’s appearance across different views; Tian & Grill-Spector, 2015a; Troje & Bulthoff, 1996; Van Meel & Op de Beeck, 2018; Wallis & Bulthoff, 2001). Such statistical learning is assumed to be the main unsupervised learning route for tolerant face/object recognition in humans and animals (DiCarlo & Cox, 2007b; Fiser & Aslin, 2002; Hauser et al., 2001; Huber et al., 2023; Li & DiCarlo, 2008, 2010, 2012; Tian & Grill-Spector, 2015b). Since exposure to the natural variations of a given person is typically prolonged (several seconds), it has been proposed that temporal contiguity contributes to the generation of a multi-view representation of identity. The importance of temporal contiguity in identity recognition is supported by findings that human observers tend to confound identities when these are learnt in the same temporal sequence of views (Li & DiCarlo, 2010; Miyashita, 1993; Pitts & McCulloch, 1947; Van Meel & Op De Beeck, 2018; Wallis et al., 2009; Wallis & Bulthoff, 1999). Other findings suggest that temporal contiguity is not necessary and that the important contributing factor is the number of learnt views, whether they are seen randomly or in sequence (T. Liu, 2007; Tian & Grill-Spector, 2015b). Yet, since the different views of a given person are usually seen in temporal contiguity in natural viewing, it seems likely that temporal contiguity contributes to the statistical learning of face identity.

While some attention has been devoted to the contribution of temporal contiguity to view-tolerant recognition, the spatial aspects of the natural statistics supporting view-tolerant face identity recognition remain largely elusive. Face appearance results from the complex interaction between extrinsic viewing conditions and the intrinsic 3D shape and reflectance properties of the face (determined e.g., by viewpoint and lighting; Favelle et al., 2017; Hill & Bruce, 1996; Liu et al., 2000). A study by Hill et al. (1997) suggested that shape is the primary determinant of view-tolerant recognition, and that surface cues such as reflectance and texture contribute less. Burton and colleagues (Burton et al., 2005; Burton, 2013) suggested that as exposure to multiple appearances of a person increases, the accidental, irrelevant variations are progressively whitened (i.e., averaged out), revealing the stable cues to identity. Figure 1 simulates such averaging using pictures of a given identity taken from drastically variable poses, as experienced in natural viewing. It can be seen that in addition to the whitening of accidental properties, the resulting average contains a strong horizontal structure; namely, it results in the emergence of the so-called (horizontal) bar code of the face (Dakin & Watt, 2009). We, like others, have proposed that the horizontal content of the face stimulus may drive the visual mechanisms engaged in the view-tolerant recognition of face identity (Caldara & Seghier, 2009; Dakin & Watt, 2009; Gilad-Gutnick et al., 2018; Goffaux, 2008; Goffaux & Dakin, 2010). Several lines of empirical evidence indicate that horizontally-oriented face information conveys optimal cues to identity. First, the visual mechanisms specialized for the identity recognition of the frontal view of faces were found to rely preferentially on the horizontal structure of the face image, indicating a better sensitivity to identity in the horizontal range of face information (e.g., Balas et al., 2015; Dumont et al., 2024; Goffaux, 2019; Goffaux & Dakin, 2010; Pachai et al., 2013, 2018). The horizontal dependence of face identity recognition has also been shown to predict face identification accuracy at the individual observer level (Duncan et al., 2019; Pachai et al., 2013). Furthermore, there is indirect evidence that horizontal face information may optimally drive the tolerance of face identity recognition. For example, while the horizontal dependence manifests from three months of age (De Heering et al., 2016), it strengthens over the lifespan as individuals accumulate experience with the natural statistics of face appearance (Goffaux et al., 2015). Moreover, considering that the recognition of familiar faces differs from unfamiliar face recognition by its stronger tolerance to retinal variations due to, e.g., a change in view (Bruce et al., 1999; Burton et al., 1999, 2010; Hill & Bruce, 1996; Megreya & Burton, 2006; O’Toole et al., 1998; Ramon, 2015b), the finding by Pachai and colleagues that the recognition of a face from three-quarter to full-frontal view increasingly relies on horizontal information as a function of familiarity supports the notion that not only the accuracy but also the tolerance of identity recognition most crucially depends on horizontal face cues (Pachai et al., 2017; see also Ramon, 2015a). In one of Goffaux and Dakin’s (2010) experiments, the matching of unfamiliar faces from frontal to three-quarter view was more strongly disrupted by horizontal than by vertical noise masking.
Yet, whether this effect reflects view-tolerant recognition or whether it is primarily due to the encoding of the frontal view probe is unclear. The important question of whether the horizontal range of face information is a privileged informational avenue for view-tolerant identity recognition therefore remains unanswered.

Graphic illustration of the horizontal smearing occurring when averaging multiple views of a face.

The tolerance of human face identity recognition to drastic appearance variations caused by varying lighting, viewpoint, facial expression, etc. has been proposed to emerge thanks to an averaging mechanism that progressively whitens these accidental variations and preserves the stable cues to identity. Past illustrations (Burton, 2013) used varying lighting and expressions but moderate pose variations. Here we show that when pose varies from left to right profile, averaging results in horizontally smeared face cues, suggesting that across encounters with a face, cues at most orientations except horizontal are whitened. As the observer learns the natural statistics of a person’s face, it seems plausible that they increasingly rely on horizontal cues for identity recognition (see Pachai et al., 2017 for supporting evidence). Images of two celebrities (George Clooney and Daniel Radcliffe) were sampled from the internet and sorted into three view categories: frontal, left-, and right-averted. In order to illustrate the horizontal smearing due to view variations, the averages were made of 40% left-averted, 40% right-averted, and 20% frontal views. The averaged faces were normalized to a mean luminance of .5 and an RMS contrast of .4. Using this procedure, one can appreciate the emergence of the so-called bar code, namely the vertical arrangement of horizontally-oriented cues typical of the face at basic and individual levels of categorization (Dakin & Watt, 2009).

The present study directly investigates the hypothesis that the tolerance of human face identity recognition is supported by the horizontal range of face information. We familiarized a sample of human observers with multiple views of a set of face identities. In an old-new recognition task involving face stimuli presented under various views, we demonstrate that humans stay broadly tuned to the horizontal range of face information irrespective of yaw, with a stronger tuning observed for frontal views. We used a model observer approach to define the information physically available in the stimulus for the matching of face identity within a given viewpoint and across different viewpoints. A model observer is a basic image-processing algorithm that cross-correlates a target image with probe images at the pixel level and selects the probe with the highest correlation, i.e., the most likely match. Combined with orientation filtering, this method provides a formal way to describe how the information most useful for matching identity is distributed in the orientation domain of the face image (see Collin et al., 2014 for a similar approach in the spatial frequency domain).

We tested two model observers on the same multiple views of face identities as used in the human old-new recognition task: a view-selective model observer, which matched identities within the same view (e.g., matching a profile view of identity A with profile views of all identities), and a view-tolerant model observer, which matched identities across different views (e.g., matching a profile view of identity A with the other views of all identities). This approach revealed a substantial difference in the orientation distribution of view-specific and view-tolerant information. The view-selective model indicated that, at the image level, face identity is conveyed by a broad view-dependent spectrum of orientations: identity cues were distributed in the horizontal range at frontal views (in line with Pachai et al., 2013) but shifted to the vertical range as the face turned to profile. In contrast, the view-tolerant model indicated that the horizontal structure of the face stimulus provides the most useful cues for matching identity across yaws. Partial correlations between model and human observer performance suggest that the horizontal tuning of human face identity recognition is due to the high diagnosticity of horizontal information at frontal view and the stability of the horizontal identity cues across views.

Methods

Subjects

Twenty-two healthy young adults took part in this experiment in exchange for monetary compensation (8 euros per hour of testing). They were 14 females and 8 males, aged 23.5 years (± 3.4) on average (4 were left-handed), recruited via Facebook advertisement. They received a written description of the experimental protocol and gave their written informed consent. Participants had normal vision as verified by a Landolt C acuity test (conducted using FrACT; Bach, 1996). Participants wore their optical corrections when necessary. The experimental protocol was approved by the local ethics committee (Psychological Sciences Research Institute, UCLouvain).

Stimuli

We selected 30 face identities (15 male, 15 female) from the 3D laser-scanned face database of the Max-Planck Institute for Biological Cybernetics (Tuebingen, Germany; Troje & Bulthoff, 1996). Faces were viewed under seven different viewpoints ranging from −75° to +75° in steps of 25° (see Figure 2). We first converted all images to grayscale and resized them so that all faces subtended a height of 210 pixels. All images were padded onto a 400 × 400 pixel gray canvas and alpha-blended with a view-specific aperture designed to cover the hair and neck of the average of all face images at a given viewpoint (using Adobe Photoshop).

Stimulus conditions.

Columns. Each identity was viewed from seven different viewpoints ranging from +75° to −75° in steps of 25°. Rows. All images were filtered in the Fourier domain to preserve only a selective range of orientation, from 0° (vertical) to 157.5° in steps of 22.5°.

Next, images were normalized to a mean luminance of 0 and a root-mean-square (RMS) contrast of 1 and submitted to a fast two-dimensional Fourier transform to manipulate orientation content. Since manipulations in the Fourier domain apply to the whole image, they encompass both the face and background pixels. When the image of a face on a plain background is filtered in the Fourier orientation domain, energy belonging to the face therefore smears onto the background and vice versa, resulting in an oriented halo. To minimize this smearing, we applied an iterative phase-scrambling procedure (as in Canoluk et al., 2023; Petras et al., 2019; Roux-Sibilon et al., 2023; Schuurmans et al., 2023), which consists of iteratively phase-scrambling the image, pasting the original face pixels back in, and phase-scrambling again (50 iterations). By making the power spectra of the face and background pixels more comparable, this procedure minimizes smearing during orientation filtering. The amplitude spectra of the resulting images were then multiplied with wrapped Gaussian filters (standard deviation of 20°) centred on orientations ranging from 0° (vertical) to 157.5° in steps of 22.5°. After inverse Fourier transform, the filtered images were combined with the view-specific aperture.
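
For illustration, the following Python/NumPy sketch shows how a wrapped-Gaussian orientation filter of the kind described above could be applied in the Fourier domain. This is a minimal sketch under simplifying assumptions: a square, already-normalized grayscale image, the iterative phase-scrambling step omitted, and a filter-angle convention that may differ from the one used to label the stimuli.

```python
import numpy as np

def orientation_filter(img, center_deg, sigma_deg=20.0):
    """Retain a band of orientations in a square grayscale image.

    center_deg : centre of the wrapped Gaussian filter, in degrees.
    sigma_deg  : standard deviation of the wrapped Gaussian (20 deg here).
    Note: how filter angles map onto the paper's labels (0 deg = vertical,
    90 deg = horizontal) depends on the chosen Fourier-angle convention.
    """
    n = img.shape[0]
    fy, fx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
    theta = np.degrees(np.arctan2(fy, fx)) % 180.0   # orientation of each component
    diff = theta - center_deg
    # Wrapped Gaussian: sum over 180-degree replicas to respect circularity.
    weights = sum(np.exp(-(diff + k * 180.0) ** 2 / (2 * sigma_deg ** 2))
                  for k in (-1, 0, 1))
    weights[0, 0] = 1.0                              # leave the DC component untouched
    return np.real(np.fft.ifft2(np.fft.fft2(img) * weights))
```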

In all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%. We used custom-written scripts in Matlab 2014a (Mathworks Inc, Natick, MA) for stimulus preparation.

Stimuli were displayed against a grey background (0.55 luminance across RGB channels) at a viewing distance of 57 cm on a Viewpixx monitor (VPixx Technologies Inc., Saint-Bruno, Canada) with a resolution of 1920 x 1080 pixels and a 70 Hz refresh rate, using PsychToolbox (Brainard, 1997). With this display, the face area subtended 5° (width) by 8° (height) of visual angle (approximately corresponding to conversational distance). At the start of the experiment, the lighting was switched off and the testing area of the lab was closed off separately with light-blocking black curtains.

Procedure

Face identification performance was measured with an old/new recognition task. Faces were presented one by one at screen center and participants were instructed to determine, for each of them, whether it was a face with which they had previously been familiarized (‘old’ face) or not (‘new’ face). They answered using a button box by pressing ‘1’ for ‘old’ and ‘2’ for ‘new’. Out of the 30 face identities used in the main experiment, participants were familiarized with 5 female and 5 male faces. The familiar identities were randomly sampled from the stimulus set on the participant’s first visit and were kept identical throughout the experiment.

In the initial session, we familiarized each participant with their selection of to-be-learnt ‘old’ faces, presented in their full-spectrum version. To engage participants in the face learning procedure, they were asked to remember a face-name pairing, although during the main experiment they were only asked to judge whether the face had been seen during familiarization (‘old’) or was new (i.e., incidental learning method as in Liu and Chaudhuri, 2002). Each identity was first shown centrally, rotating from left profile (+75°) to right profile (−75°) pseudo-dynamically, along with its assigned first name at the top of the screen (400 ms per frame). The various views of the given face (from +75° to −75° in steps of 25°, i.e., seven views) were then presented one by one for 400 ms each. After each identity presentation, a recap screen appeared with all the faces learned in previous trials, clustered by identity and shown under the various viewpoints side by side. In the second familiarization phase, all learnt identities were randomly presented one by one, each under one of the seven viewpoints. The face appeared in isolation for 1000 ms. The name associated with the face was then added, and both name and face were shown together for another 2000 ms.

Participants were then evaluated on their ability to name the learnt ‘old’ faces at the various viewpoints (+75° to −75°, in steps of 25°). A trial started with a 500 ms fixation; the stimulus was then presented at screen center along with the name options (the five names of the ‘old’ faces of the same gender as the shown face), numbered from 1 to 5, at the top of the screen until the participant responded (maximum of 3000 ms). Participants responded by pressing the corresponding key (1 to 5). The fixation turned green after a correct response and red after an incorrect one; in both cases, the correct name was displayed along with the feedback, which lasted 500 ms. Familiarization and test were looped until naming accuracy reached 80%.

The main experiment was divided into 32 runs of 40 trials. A run started with recap screens for all learned identities under the seven viewpoints (from +75° to −75° in 25° steps) along with their associated names. In each main experiment trial, a face stimulus was presented centrally at a specified viewpoint and orientation range (from 0° to 157.5° in 22.5° steps). Viewpoint and orientation range varied randomly from one trial to the next. A trial started with a 1 s fixation; the face stimulus then appeared until response, for a maximum of 3 s. At the end of a run, participants received feedback on their average accuracy. To avoid inducing a response bias, there were as many ‘old’ as ‘new’ trials, making the 10 ‘old’ faces twice as frequent as the 20 ‘new’ ones, which were each presented only once per condition. There were 40 trials per condition and a total of 1280 trials.

Participants first practiced the main experimental task on 20 randomly selected trials with full-spectrum stimuli. If they reached 80% accuracy, they could start the main experiment with the filtered stimuli. If practice accuracy was lower than 80%, participants were invited to run the familiarization (learning and test) again. Whenever accuracy in an experimental run dropped below 55% correct, and every third run regardless of performance, participants were presented with the recap screens again. Throughout the experiment, they were encouraged to respond as accurately and rapidly as possible.

The total experiment lasted for 1.5 hours on average, split into three testing sessions.

Human data analysis

To prevent outlier responses from contaminating the results, we applied a log10 transformation to response latencies at the trial level and excluded trials with latencies more than 2.5 standard deviations above or below the individual mean of each participant. This procedure resulted in the exclusion of 1.81% of the trials on average.
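
As an illustration, the exclusion rule could be implemented as follows (a minimal Python/pandas sketch; the data frame and the column names ‘subject’ and ‘rt’ are hypothetical):

```python
import numpy as np
import pandas as pd

def exclude_rt_outliers(trials: pd.DataFrame, subj_col="subject", rt_col="rt",
                        cutoff=2.5) -> pd.DataFrame:
    """Drop trials whose log10 latency deviates from the participant's own mean
    by more than `cutoff` standard deviations (illustrative implementation)."""
    trials = trials.copy()
    trials["log_rt"] = np.log10(trials[rt_col])
    z = trials.groupby(subj_col)["log_rt"].transform(
        lambda x: (x - x.mean()) / x.std())
    return trials[z.abs() <= cutoff]
```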

In order to estimate the orientation dependence of human sensitivity to identity across views, we derived individual d’ values at each viewpoint and orientation. To do so, we determined hit and false alarm rates (Tanner & Birdsall, 1958) from the ‘old’/’new’ responses of each participant, at each orientation and for each viewpoint. Following the log-linear rule (Hautus, 1995), we added a 0.5 correction to both before calculating the z scores. Performance at the 0° filter was duplicated to obtain circular filter values from 0° to 180°. As expected from previous studies (Dumont et al., 2024; Goffaux & Greenwood, 2016; Pachai et al., 2018), d’ plotted as a function of orientation in the frontal view condition (yaw = 0°) depicted a bell-shaped function centred on horizontal orientation (i.e., 90°; see Figure 3A, central panel). At the other viewpoints, d’ followed a very similar shape, with maximum sensitivity roughly centred on horizontal orientation. Therefore, human sensitivity data could be fitted using a simple Gaussian model. The Gaussian model is defined as:
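
Based on the four free parameters described below, the fitted Gaussian plausibly takes the following form (a reconstruction from the parameter definitions, not necessarily the authors’ exact notation):

\[
d'(\mathrm{orientation}) = \mathrm{Base\ Amplitude} + \mathrm{Peak\ Amplitude} \times \exp\!\left(-\,\frac{(\mathrm{orientation} - \mathrm{Peak\ Location})^{2}}{2 \times \mathrm{Standard\ Deviation}^{2}}\right)
\]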

A. Sensitivity of human observers to facial identity (d’) as a function of the orientation filter (0° to 180° in 22.5° steps) and face viewpoint (yaw: +75° to −75° in 25° steps). Dots and error bars represent mean d’ values and 95% confidence intervals across participants. Solid lines and shaded areas indicate the mean posterior predictions and 95% credible intervals from the Gaussian Bayesian multilevel model. B. Population-level mean parameters of the Gaussian Bayesian Multilevel model, plotted with 95% credible intervals as a function of face viewpoint. The 95% credible intervals reflect the uncertainty of the model and can be interpreted as follows: there is a 95% chance that the value of the population parameter lies within this interval.

with orientation being the orientation of the filter, ranging from 0° to 180°. The model estimates four free parameters. The Peak Location is the orientation on which the Gaussian curve is centred. The Standard Deviation is the width of the Gaussian curve (i.e., the strength of the tuning). The Base Amplitude is the height of the base of the Gaussian curve (i.e., the minimum sensitivity, typically found near vertical orientations). The Peak Amplitude is the height of the Gaussian curve relative to its base, i.e., the advantage of horizontal over vertical orientations.

We used the brms package (Burkner, 2018) in R to implement this model in the Bayesian framework, using a multilevel modelling approach (Dumont et al., 2024; Moors et al., 2020; Nalborczyk et al., 2019). The four parameters Peak Location, Standard Deviation, Base Amplitude, and Peak Amplitude were jointly estimated by linear predictor terms which included an intercept and the effect of Viewpoint. We also estimated subject-level intercepts of Standard Deviation and Base Amplitude as random effects, to allow the shape of the Gaussian to vary across participants. This multilevel structure provides a more accurate estimation of population-level parameters by accounting for between-subject variability. The prior distributions of the different parameters of the model were specified based on a compromise between (1) using knowledge from previous research (e.g., we used a normal distribution centred on horizontal orientation – 90° – for Peak Location, in line with the well-established horizontal tuning of face identification), (2) keeping unbiased uncertainty when previous research was not informative (e.g., for the effect of Viewpoint on the different parameters), and (3) allowing the convergence of the model. The details of the prior distributions can be found in Supplementary Table 1. We ran four Markov Chain Monte Carlo chains, with 20,000 iterations including 3,000 warm-up iterations.

Posterior mean and 95% credible interval for each parameter of the Gaussian model, at each viewpoint.

Model diagnostics were checked and indicated good convergence: the potential scale reduction factor (R-hat) was 1.00 for all parameters, the Bulk Effective Sample Size (ESS) was above 10,000 for all but four parameters (Intercept of the Base Amplitude: Bulk ESS = 5504; Intercept of the Peak Amplitude: Bulk ESS = 8494; Subject-level random effect of the Standard Deviation: Bulk ESS = 5965; Subject-level random effect of the Base Amplitude: Bulk ESS = 6308), and the Tail ESS was above 10,000 for all but one parameter (Subject-level random effect of the Standard Deviation: Tail ESS = 5321).

Image analysis

We investigated how the distribution of energy across orientations in the experimental stimuli varied as a function of viewpoint. The amplitude and phase spectra of each full-spectrum face image were obtained by means of a two-dimensional fast Fourier transform and multiplied with wrapped Gaussian filters, with peak orientations centred on values from 0° to 157.5° in steps of 22.5° (20° standard deviation; as in Goffaux & Greenwood, 2016) and with peak spatial frequencies ranging from 1 to 200 cycles per image in 20 logarithmic steps. Amplitude values within each spatial frequency and orientation bin were squared and summed, then averaged across spatial frequencies per orientation bin.

Note that because the Fourier transform represents image energy in a discrete manner, energy at the lowest spatial frequency components can only be reliably sampled at the main cardinal and oblique ranges (i.e., 0°, 45°, 90°, 135°) especially when very narrow orientation filters are used (Hansen & Essock, 2004). However, because orientation filters were broad and we interpret relative differences, and not absolute values, of energy distribution across viewpoints, the influence of this issue is minimal. We obtained one mean energy value per orientation band by averaging across identities.
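
A simplified Python/NumPy sketch of this per-orientation energy measure is given below (illustrative only; the spatial frequency binning described above is collapsed into a single sum, and the angle convention is an assumption):

```python
import numpy as np

def orientation_energy(img, centers_deg=np.arange(0, 180, 22.5), sigma_deg=20.0):
    """Sum squared Fourier amplitudes within wrapped-Gaussian orientation bands
    (an illustrative re-implementation of the energy analysis; spatial
    frequencies are pooled rather than binned as in the paper)."""
    n = img.shape[0]
    amp2 = np.abs(np.fft.fft2(img)) ** 2
    fy, fx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
    theta = np.degrees(np.arctan2(fy, fx)) % 180.0
    energies = []
    for c in centers_deg:
        diff = theta - c
        w = sum(np.exp(-(diff + k * 180.0) ** 2 / (2 * sigma_deg ** 2))
                for k in (-1, 0, 1))
        w[0, 0] = 0.0                     # exclude the DC component
        energies.append(np.sum(amp2 * w))
    return np.array(energies)
```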

We found that while face images contained most of their energy in the horizontal range irrespective of viewpoint (Figure 3), the amplitude of the horizontal energy peak decreased as the face viewpoint moved away from frontal. For profile views, the energy distribution showed a plateau covering the horizontal range plus the adjacent left or right oblique orientations for left- and right-pointing profiles, respectively. In comparison, vertical energy was lower than any other range regardless of viewpoint. These findings replicate past evidence that most of the energy in a face image is contained in a range centred over the horizontal angle (Goffaux, 2019a; Goffaux & Greenwood, 2016; Keil, 2009).

Yet the distribution of image energy as a function of orientation does not provide any insight about its potential usefulness for face identity recognition; we addressed this using model observers.

Model observers

To systematically quantify the orientation profile of the stimulus information physically available (1) to identify faces seen under a fixed viewpoint and (2) to match face identities across viewpoints (requiring tolerance to changes in view), we designed two different model observers: a view-selective model observer and a view-tolerant model observer, respectively. By characterizing the orientation profile of the information available to recognize face identity in a view-selective and in a view-tolerant manner, this approach makes it possible to formulate formal, stimulus-driven hypotheses about the information used by humans (Collin et al., 2014; Pachai et al., 2013). Namely, it allows us to address the extent to which human view-tolerant identity recognition relies on the orientation range that is most informative for the view at stake, or on the range that is most stable across views.

Model observers performed the task on the same 10 ‘old’ faces and 20 ‘new’ faces as randomly selected for the human observers; we ran 22 view-selective and 22 view-tolerant model observers to match the number of tested human subjects. We presented each of the 30 faces once per condition. In each trial, the models computed pixelwise similarity based on the cross-correlation between a target image (either from the ‘old’ or the ‘new’ set) and each of the possible exemplars of the same gender (i.e., probes) filtered at the same orientation as the target image. The distributions of cross-correlation coefficients across viewpoints and orientations are shown for each model separately in the supplementary section. The probe face with the highest correlation to the target image (i.e., the most similar) was selected as the model response (winner-take-all scheme). Depending on the correspondence between the ‘winner’ probe and the actual target, the responses of the model observer were categorized as hits or false alarms, allowing for the computation of a sensitivity d’ in each orientation and viewpoint condition, following a procedure similar to the one used to compute human performance.

In both model types, target and probes were of the same orientation range (see Collin et al., 2014; Nasanen, 1999 for a similar method applied to the spatial frequency domain). Targets and probes in the view-selective model observer stemmed from a fixed viewpoint, whereas the view-tolerant model observer matched targets and probes separated by more than one viewpoint step, to force viewpoint tolerance in this model’s performance (e.g., a face at +025 of yaw was matched to faces at +075, −025, −050, and −075). Since profile views stood at the extrema of the viewpoint continuum, one extra viewpoint was available for comparison. We therefore dropped the mirror profile view to match the number of comparisons across profile and non-profile viewpoints. Performance of the view-tolerant model was averaged across comparison viewpoints.
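
The core of both model observers can be sketched as follows (a minimal Python sketch, not the authors’ code; how winner-take-all responses are mapped onto hits and false alarms is simplified here, and the function names are illustrative). The same matching routine serves both models; only the probe set changes (same-viewpoint probes for the view-selective model, probes at least two viewpoint steps away for the view-tolerant model).

```python
import numpy as np
from scipy import stats

def pixel_correlation(target, probe):
    """Pearson correlation between two images, computed over pixels."""
    return np.corrcoef(target.ravel(), probe.ravel())[0, 1]

def winner_take_all(target, probes):
    """Index of the probe most correlated with the target (model response)."""
    return int(np.argmax([pixel_correlation(target, p) for p in probes]))

def dprime(n_hits, n_fas, n_old_trials, n_new_trials):
    """Sensitivity with the log-linear correction (add 0.5 to counts and 1 to
    trial numbers; Hautus, 1995), as for the human data."""
    hit_rate = (n_hits + 0.5) / (n_old_trials + 1.0)
    fa_rate = (n_fas + 0.5) / (n_new_trials + 1.0)
    return stats.norm.ppf(hit_rate) - stats.norm.ppf(fa_rate)
```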

In a pilot phase, we measured the overall identification performance of each model. In the view-selective model, we had to decrease the signal-to-noise ratio (SNR) of the target and probe images to .125 (face RMS contrast: .01; noise RMS contrast: .08) to avoid ceiling effects and keep overall performance close to human levels. Sensitivity (d’) of the view-tolerant model was much lower than human sensitivity, even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity of the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans, who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and relies entirely on (inefficient) pixelwise comparisons. The distribution of d’ and 95% confidence intervals for each model observer are displayed in the supplementary section.

The main objective of running these model observers was to interpret human orientation sensitivity profiles in light of the information available in the stimulus; following this reasoning, model observer performance is only interesting when compared to human performance. We therefore investigated which model observer best predicted the orientation dependence profile of human face recognition using partial Pearson correlation analyses. In all partial correlation analyses, we controlled for the variance in image energy across orientations and viewpoints, to exclude the possibility that view-selective/tolerant model performance is a mere epiphenomenon of absolute oriented energy. Because the Fisher-Z transform of perfect correlations yields infinite values, we replaced all −1 and 1 partial correlation coefficients with −.99 and .99, respectively, before applying the Fisher-Z transformation and computing 95% confidence intervals (see Table 1). We computed partial correlations between the orientation profiles of human and each model observer at each viewpoint separately (eight orientation vectors). For each specific viewpoint, the orientation sensitivity profile of each human observer was correlated with the average orientation sensitivity profile of either model observer, while controlling for the variance explained by (1) the average orientation sensitivity profile of the alternate model observer and (2) the average profile of orientation energy. The variability in individual human orientation profiles was taken as an estimate of the maximally achievable data-model correlation: we computed the correlation between each individual orientation sensitivity profile and the average sensitivity profile of the remaining participants, and took the mean of these individual-to-group correlations as the maximally achievable correlation.
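
A minimal sketch of this analysis is given below (illustrative Python code, assuming each profile is an eight-value vector across orientation bands; the residualization approach to partial correlation and the clamping value are as described above, but variable names are hypothetical):

```python
import numpy as np

def fisher_z(r, clip=0.99):
    """Fisher-Z transform, clamping perfect correlations to +/- clip
    to avoid infinite values."""
    return np.arctanh(np.clip(r, -clip, clip))

def partial_corr(x, y, covariates):
    """Partial Pearson correlation of x and y, controlling for the covariate
    vectors, via residualization (illustrative implementation)."""
    X = np.column_stack([np.ones(len(x))] + list(covariates))
    res_x = x - X @ np.linalg.lstsq(X, x, rcond=None)[0]
    res_y = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# Example for one participant at one viewpoint: correlate the human profile
# with the view-selective model profile, controlling for the view-tolerant
# model profile and for the orientation energy profile.
# z = fisher_z(partial_corr(human_profile, selective_profile,
#                           [tolerant_profile, energy_profile]))
```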

Results

Results – Human observers

Upon visual inspection, group-averaged d’ plotted as a function of orientation depicted a bell-shaped function roughly centred on horizontal orientation in the frontal view condition as well as in the other viewpoint conditions (Figure 3A). The fitting confirmed that sensitivity to identity shows a roughly similar Gaussian orientation tuning profile across viewpoints (see bell-shaped curves in Figure 3A).

The relative stability of the human orientation sensitivity profile was corroborated by the stable and significantly positive correlations of the orientation tuning profile across viewpoints (rs > .67, ps < .05; Figure 4).

Sensitivity (i.e., performance at the recognition task) of human and model observers and image energy across viewpoints and orientations.

Left column. 3D surface plots of the normalized energy/sensitivity across orientations and viewpoints. Middle column. Matrix representations of the normalized energy/sensitivity across orientations and viewpoints. Right column. Matrix representations of the Pearson correlations (non-Fisher-Z-transformed) of the normalized orientation distributions of energy/sensitivity across viewpoints.

Despite human sensitivity to face identity always being best around horizontal orientations and worst around vertical ones, there were notable fluctuations in the orientation sensitivity profile across viewpoints. To better grasp these variations, we plotted the population-level estimates of the four parameters of the Gaussian curve as a function of viewpoint in Figure 3B. Population-level estimates and 95% credible intervals can also be found in Table 2.

Mean and 95% confidence interval of the (Fisher Z-transformed) partial correlation coefficients between human and each model orientation d’ profiles while controlling for the variance in image energy and alternate model.

Peak Location is estimated to lie close to 90° (grey horizontal line in Figure 3B) at all viewpoints except the two profile views, where peak location shifts towards adjacent oblique orientations (yaw = −75/75; see Figure 3A). Specifically, peak human sensitivity tended to shift towards the left and right obliques for the most extreme leftward and rightward deviations in viewpoint, respectively. Peak Amplitude, which corresponds to the difference in sensitivity between the horizontal and vertical ranges, is relatively stable across all viewpoints except for the two profile views, where it is lower. Base Amplitude, reflecting sensitivity in the vertical range, is highest for the two profile views and progressively decreases towards the full-frontal view. This pattern suggests that vertically-oriented face content is more diagnostic for profile than for other viewpoints. The variation of Standard Deviation across viewpoints can hardly be interpreted because of the high uncertainty of the estimates; the 95% credible intervals of the Standard Deviation for the different viewpoints mostly overlap.

Results – View-selective model observer

The performance of the model observer matching faces at specific viewpoints indicates that the most diagnostic orientation range varies as a function of viewpoint (Figure 4).

Within the frontal view, optimal cues to identity are conveyed by orientations close to the horizontal angle (between 90° and 112.5°). The striking similarity of the orientation tuning profile of the view-selective model observer with human performance indicates that, at frontal view, the human visual system makes efficient use of the information available in the stimulus. The peak of model performance shifted to the left and right oblique orientations closest to the horizontal angle (67.5° and 112.5°) for leftward and rightward deviations in viewpoint, respectively. Additionally, the view-selective model observer increasingly relied on vertical cues as the face turned towards profile, in line with human performance.

We found only weakly positive (though significant) correlations between the orientation sensitivity profiles of adjacent viewpoints (e.g., +075 and +050), confirming that orientation tuning varies across views (Figure 4).

Except for the frontal view, the orientation tuning of the view-selective model observer differs from the human orientation profiles, which remained tuned to the horizontal and adjacent oblique ranges irrespective of viewpoint.

Results – View-tolerant model observer

The model observer matching face identity across viewpoints performed in a drastically different way from the view-selective model observer. Recognition performance peaked sharply at horizontal angles and dropped abruptly at other ranges (Figure 4). This orientation profile was relatively stable across viewpoints, as shown by the homogeneous matrix of large, positive, and significant Pearson correlation coefficients for sensitivity profiles across viewpoints (rs > .88, ps < .002).

This horizontal tuning was sharper than that observed in humans, with a much more severe drop in sensitivity in the vertical orientation range. The horizontal tuning of human performance is much shallower, even at frontal view: it shows a similar dip in sensitivity in the vertical range, but this dip tends to shrink when moving away from the frontal view, which does not happen for the view-tolerant model observer, which remains sharply horizontally tuned.

Results – Human Versus Model observers

Which model observer best predicts the orientation tuning profile of human face recognition? Is it the view-selective model, taking advantage of the identity cues that are optimal for the viewpoint at stake, or the view-tolerant model, whose performance relies on the identity cues most stable across viewpoints? To address this question directly, human orientation d’ profiles were correlated at the individual level with each model orientation d’ profile, while partialling out the variance explained by the alternate model and image energy. We submitted the resulting individual Fisher-Z-transformed partial correlation coefficients to a repeated-measures ANOVA with Model and Viewpoint as within-subject factors.

These partial correlations showed that the orientation tuning profiles of human and view-selective model observers correlated strongly for frontal and near-frontal views, approaching the maximally achievable correlation. However, partial correlations dropped steeply as views deviated further from frontal (Figure 5; Table 1). The correlation of orientation tuning profiles between human and view-tolerant model observers was lower overall but significant across all viewpoints.

Fisher-z transformed Pearson partial correlation of the orientation sensitivity profiles between humans and each model, while controlling for the alternate model and image energy.

Error bars show the 95% confidence intervals. The faded grey line depicts the maximally achievable correlation for the separate viewpoint conditions in the human dataset (see Methods for details).

The repeated-measures ANOVA confirmed that the human-model correlation of orientation sensitivity profiles differed depending on Model (F(1, 21) = 5.21, p = .033, η2 = .017) and Viewpoint (F(3.85, 80.9) = 11.064, p < .001, η2 = .11). The interaction between Model and Viewpoint was also significant (F(4.2, 88.35) = 13.035, p < .001, η2 = .23).

We explored the impact of viewpoint on the human-model correlation for the view-selective and view-tolerant models separately, using repeated-measures ANOVAs with Viewpoint as a within-subject factor.

For the view-selective model, this analysis revealed a robust effect of Viewpoint on the human-model partial correlation (F(3.86, 80.99) = 22.795, p < .001, η2 = .52). For the view-tolerant model, the effect of Viewpoint on the human-model partial correlation was not significant (F(6, 126) = 1.2, p = .32, η2 = .054). This confirms the relatively stable human-model partial correlation coefficients across viewpoints observed for the view-tolerant model and the fluctuating profile of human-model partial correlation coefficients for the view-selective model (peaking at frontal views and decreasing towards profile views).

Furthermore, at each viewpoint, we examined which model observer correlated best with the human orientation sensitivity profile (using Holm-corrected post-hoc tests on Fisher-Z-transformed partial correlation coefficients; Table 3). The view-tolerant model predicted a significantly larger portion of the variance in the human orientation sensitivity profile at the ±050 and ±075 viewpoints; this difference was marginal for the +050 viewpoint. It is only for the identification of frontal views of faces that the view-selective model correlated best with the human data. Correlations were of similar strength for the +075, +025, −025 (and +050) viewpoints (Table 3).

Difference of the human-model correlation coefficients between the view-selective and view-tolerant model observers.

Positive t values indicate a stronger correlation with the view-selective model and negative t values a stronger correlation with the view-tolerant model.

Results – Is model observers’ performance predicted by horizontal energy predominance?

In the above correlation analyses, we controlled for the variance in image energy to yield a clean measure of the functional link between human and model observer performance. Here, we explore the possibility that the horizontal reliance of either model observer’s performance scales with the energy predominance of this orientation range in the stimulus image.

To analyse the influence of horizontal predominance on the models’ performance, we computed, for each image, the horizontal-minus-vertical difference in energy and in model observer sensitivity. For each identity and at each viewpoint, the energy and model sensitivity difference values were submitted to a Pearson correlation. We found a modest positive correlation between view-selective model observer sensitivity and image energy (Pearson r = .24, p < .0005). There was no similar correlation for the view-tolerant model (Pearson r = .025, p = .72), even though energy and view-tolerant model performance both peaked in the horizontal range across viewpoints.
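
The computation can be sketched as follows (an illustrative Python snippet; the array layout, with one row per image and one column per orientation band, is an assumption):

```python
import numpy as np
from scipy import stats

def hv_predominance_correlation(energy, sensitivity, orientations_deg):
    """Correlate horizontal-minus-vertical differences in energy and in model
    sensitivity across images. Rows index images, columns index orientation
    bands; 90 deg = horizontal and 0 deg = vertical in the paper's convention."""
    h = list(orientations_deg).index(90.0)
    v = list(orientations_deg).index(0.0)
    energy_advantage = energy[:, h] - energy[:, v]
    sensitivity_advantage = sensitivity[:, h] - sensitivity[:, v]
    return stats.pearsonr(energy_advantage, sensitivity_advantage)
```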

What this analysis shows is that, within a given face, the stronger the predominance of horizontal energy (relative to vertical energy), the more diagnostic the horizontally-oriented identity cues for view-selective recognition. However, horizontal energy predominance in a given face does not account for this range carrying the most stable cues across viewpoints.

This image-level analysis did not include human data since human sensitivity necessarily aggregates performance across trials. However, human performance would likely show a similar detachment from horizontal energy predominance as the view-tolerant model. Namely, humans are expected to rely on horizontal cues to match faces across views no matter the amount of horizontal energy predominance in the face at stake (e.g., George Clooney versus Brad Pitt).

Discussion

Tolerant face identity recognition is defined as the ability to extract idiosyncratic face cues despite the large variability in any given face’s appearance (e.g., Burton et al., 2016; Kramer et al., 2018). The spatial information supporting the tolerance of visual recognition is a central and debated topic in visual and computational neuroscience (e.g., Andrews et al., 2023; DiCarlo & Cox, 2007a). By showing that the information supporting tolerance in face identity recognition is image-computable, i.e., that it can be objectively defined in the orientation domain of the face image, the present work makes a decisive advance on this question. Our finding that view-tolerant face identity recognition is driven by the horizontal range of face information yields concrete, image-driven constraints for the development of theoretical models of visual recognition.

Human participants performed an old/new identity recognition task on face stimuli presented under variable views (from left to right profile) and filtered to contain a restricted range of oriented content (from 0° to 157.5° in steps of 22.5°; see Figure 2). For each view, recognition performance followed a Gaussian profile, with a peak in the horizontal range and a progressive decline when shifting towards the vertical range (Figure 3). Yet, while identity recognition stayed broadly tuned to the horizontal range irrespective of the vantage point, there were moderate but notable fluctuations in the tuning profiles. First, the peak location of the Gaussian tuning profile slightly and gradually shifted away from horizontal towards the right and left adjacent obliques as the face turned to left or right profile. Second, the base amplitude of the Gaussian increased drastically towards profile views. The U-shaped profile of base amplitude as a function of viewpoint shows the increasing contribution of vertically oriented information as the face moves away from the frontal view (see Figure 3B). In other words, the horizontal advantage of human identity recognition is largest at full-frontal views and attenuates for non-frontal views due to the increased contribution of vertical (and close-to-vertical) orientations: while human identity recognition stays tuned to the horizontal range irrespective of viewpoint, it tends to increasingly rely on non-horizontal orientations as the face shifts to profile.

As the vantage point of a face shifts away from frontal view, morphological features related to the 3D structure of the face become more apparent (e.g., nose and cheek protuberances, jaw line, nose bridge, eyebrow head...; see e.g., Stephan & Caine, 2007 for evidence that the nose gains in informativeness from frontal to profile view). Our finding that non-horizontal ranges, particularly vertical, gain importance in non-frontal views suggests that these orientations may facilitate access to such features.

In contrast, other sources of information are lost when shifting towards profile views, such as the bilateral symmetric organization of the face as well as the 2D shape of features and their configuration along the x axis (e.g., size of the eyebrows, interocular distance; Royer et al., 2016; Troje & Bulthoff, 1996). In a way, it is surprising that this shift in accessible information did not manifest in a substantial variation of the peak location of the orientation tuning profile. Instead, the relatively stable preference for horizontal information across views suggests that it is not solely due to this range facilitating access to 2D properties and bilateral symmetry (Dakin & Watt, 2009), features that are most available in frontal views. Our recent work (Dumont et al., 2024) showed that the inversion and negation effects on identity recognition, presumed to reflect a disrupted access to 2D shape and surface cues, respectively (for shape: Jiang et al., 2011; Meinhardt-Injac et al., 2013; Rossion, 2009; for surface: Bruce & Langton, 1994; Kemp et al., 1996; C. H. Liu et al., 2000; Russell et al., 2006; Vuong et al., 2005), are strongest in the horizontal range. This supports the notion that the contribution of horizontal face information to identification goes beyond the conveyance of 2D shape cues. Alternatively, the stable horizontal tuning may be due to identity recognition relying mostly on the eye region irrespective of view. Although eye appearance is strongly affected by changes in view, the identity information extracted from the eye region is the most diagnostic across views (Stephan & Caine, 2007; but see Royer et al., 2016 for conflicting results) and remains best defined in the horizontal range.

Using a model observer approach, we investigated how the human observer makes use of the identity cues physically available in the image across orientations and views. It is indeed important to compare the human tuning profile to a pixel-based quantification of the information that is available in the stimulus in order to characterize more formally the human sampling specificities potentially at play when recognizing face identity. We designed two model observers to measure the information available in the stimulus to “recognize” (i.e., cross-correlate) face identity within and across views, thereby disentangling the stimulus information available for view-selective and view-tolerant recognition, respectively.

Let us first summarize the findings related to the view-selective model observer. It showed that the identity cues most diagnostic for discriminating face identity in a view-specific manner are highly variable across views (Figure 4). The view-selective model observer was horizontally tuned for frontal views of faces, in a manner strikingly similar to the human performance profile. This suggests that human observers make a close-to-optimal use of the orientation information in face images when identifying frontal face views. Akin to human recognition, the view-selective model progressively increased its reliance on vertically-oriented cues when the face moved from frontal to profile. However, for the view-selective model – and in contrast to human performance – this came at the expense of the horizontal tuning, which vanished completely. What these findings suggest is that the robustness of the horizontal tuning of face identity recognition at frontal view is due to this range conveying the optimal cues to identity in these specific conditions. However, they also show that the horizontal range loses its view-specific informativeness in non-frontal views; namely, differences in face identity at non-frontal views are predominantly carried by non-horizontal ranges of information. Therefore, view-specific informativeness cannot account for the generalization of the horizontal tuning profile of human identity recognition across views. In other words, the reason human recognition remains tuned to the horizontal range from frontal to profile views cannot be that this range provides optimal identity cues at each view.

In contrast to the view-selective model observer, the view-tolerant model observer remained sharply tuned to the horizontal range irrespective of view (Figure 4). The horizontal range yielded the highest cross-correlations among the different views of a given face, indicating that this range conveys the identity cues that are the most stable across views, those that enable binding different face views into a unique representation of identity (Burton, 2013). This physical property of the face image likely explains why human identity recognition remains horizontally tuned across views (see Goffaux & Dakin, 2010 for a similar suggestion based on empirical evidence in a viewpoint-generalization identity matching task).

For a direct comparison of human and model performance, we quantified the variance shared between each model observer’s and the human orientation tuning profiles while controlling for image energy and the alternate model (Figure 5). These analyses confirmed that the view-selective model best predicted the orientation tuning of human identity recognition at frontal and close-to-frontal views, which are typically experienced during face-to-face conversations, but not at profile and close-to-profile views. In contrast, the variance explained by the view-tolerant model was relatively stable across viewpoints. The particularly strong horizontal tuning of human identity recognition is thus likely due to the visual system extracting a representation that simultaneously prioritizes the orientation range conveying the cues that are the richest at frontal view and the most stable across views.

The horizontal tuning of human face recognition was found to be relatively broad across views, compared with the sharp tuning of the view-tolerant model observer. This tuning breadth may serve to retain information about the complex manifold of a given face’s appearance variability across views. Indeed, each face identity has its own specific way of varying across expression, illumination, view, etc. Such idiosyncratic within-person variability was proposed to drive familiar face identity recognition as much as the stable (horizontal) identity information (e.g., Burton et al., 2016; Ritchie & Burton, 2017). Moreover, accidental properties such as head and gaze orientation carry important cues for the regulation of social interactions. Our past evidence shows that the fine discrimination of gaze direction is best supported by the vertical range (Goffaux, 2019). The vertical range also likely carries most of the information about head direction. Thus, the broad horizontal tuning of identity recognition by humans may allow for the integration of such accidental properties of a face with its (more) stable identity (Or & Wilson, 2010). For functional social interactions, it may be necessary to retain the dynamic and variable signals emitted by a face as much as its invariant aspects, and this would entail a relatively broad tuning to orientation.

Effects of lighting are even more disruptive than changes in view (Adini et al., 1997; Braje et al., 1998; S. Favelle et al., 2017; Hill & Bruce, 1996; A. Johnston et al., 2013; Tarr et al., 1998, 2008). Image representations that emphasize horizontal features were found to be less sensitive to changes in the direction of illumination (Adini et al., 1997). Future research should test whether the tolerance of human face identity recognition to lighting is also supported by the horizontal range.

To conclude, this study demonstrates that the horizontal range carries the identity cues that are the richest in frontal views and the most stable across views. The orientation tuning profile of human identity recognition aligns with this combination of high diagnostic value in frontal views and cross-view stability. Taken together, this body of evidence suggests that the invariant representation of a given face, gradually learned through repeated exposure to its natural appearance statistics, relies heavily on horizontal facial information (Figure 1; Burton et al., 2016; Dakin & Watt, 2009; Ritchie & Burton, 2017).

Acknowledgements

We thank Julie Juaneda, Zoe Strapazzon, Hana Zjakic, and Stien Van de Plas for their help with the data collection. H.D. is supported by the Belgian National Fund for Scientific Research (FRS-FNRS). V.G. is a research associate of the FRS-FNRS.

Additional information

Contributor Roles

A.R.-S.: Formal analysis, Conceptualization, Visualization, Writing – review & editing;

H.D.: Formal analysis, Data curation, Visualization;

V.B.: Formal analysis;

C.J.: Methodology, Investigation;

V.G.: Conceptualization, Methodology, Formal analysis, Funding acquisition, Visualization, writing – original draft, Project administration, Supervision.