Shape-invariant encoding of dynamic primate facial expressions in human perception

  1. Nick Taubert
  2. Michael Stettler
  3. Ramona Siebert
  4. Silvia Spadacenta
  5. Louisa Sting
  6. Peter Dicke
  7. Peter Thier
  8. Martin A Giese (corresponding author)
  1. Section for Computational Sensomotorics, Centre for Integrative Neuroscience & Hertie Institute for Clinical Brain Research, University Clinic Tübingen, Germany
  2. International Max Planck Research School for Intelligent Systems (IMPRS-IS), Germany
  3. Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, Germany
10 figures, 4 tables and 1 additional file

Figures

Figure 1
Stimulus generation and paradigm.

(A) Frame sequence of a monkey and a human facial expression. (B) Monkey motion capture with 43 reflecting facial markers. (C) Regularized face mesh, whose deformation is controlled by an embedded elastic ribbon-like control structure that is optimized for animation. (D) Stimulus set consisting of 25 motion patterns, spanning a two-dimensional style space with the dimensions ‘expression’ and ‘species’, generated by interpolation between the two expressions (‘anger/threat’ and ‘fear’) and the two species (‘monkey’ and ‘human’). Each motion pattern was used to animate a monkey and a human avatar model.
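The 25 morphs can be thought of as points on a 5 × 5 grid in the (e, s) style plane, each associated with a set of mixing weights for the four prototype motions. The sketch below (Python; hypothetical variable names, and a simple bilinear weighting rather than the Bayesian motion-morphing model described in 'Materials and methods') illustrates how such a grid of style coordinates and prototype weights could be laid out.

```python
import numpy as np

# Style coordinates: e (expression: 0 = anger/threat, 1 = fear)
# and s (species: 0 = monkey, 1 = human), each sampled at 5 levels.
levels = np.linspace(0.0, 1.0, 5)

stimuli = []
for s in levels:
    for e in levels:
        # Bilinear weights for the four prototypes
        # (monkey-threat, monkey-fear, human-angry, human-fear).
        w = np.array([(1 - s) * (1 - e),   # monkey-threat
                      (1 - s) * e,         # monkey-fear
                      s * (1 - e),         # human-angry
                      s * e])              # human-fear
        stimuli.append({'e': e, 's': s, 'weights': w})

assert len(stimuli) == 25
print(stimuli[12])  # central morph: equal contribution of all four prototypes
```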

Figure 2
Raw data and statistical analysis.

(A) Histograms of the classification data for the four classes (see text) as functions of the style parameters e and s. Data is shown for the human avatar, front view, using the original motion-captured expressions as prototypes. (B) Fitted discriminant functions using a logistic multinomial regression model (see 'Materials and methods'). Data is shown for the human avatar, front view, using the original motion-captured expressions as prototypes. (C) Prediction accuracy of the multinomial regression models with different numbers of predictors (constant predictor, only style variable e or s, and both of them).
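To illustrate the kind of analysis behind panels (B) and (C), the sketch below fits multinomial logistic-regression models with different predictor sets (constant only, e only, s only, and e together with s) to simulated classification responses and compares their prediction accuracies. The data and variable names are hypothetical; the actual models are specified in 'Materials and methods'.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: the 25 (e, s) style coordinates, each repeated over 20 trials,
# with simulated 4-class responses (0: human-angry, 1: human-fear,
# 2: monkey-threat, 3: monkey-fear).
e, s = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
X = np.column_stack([np.repeat(e.ravel(), 20), np.repeat(s.ravel(), 20)])
species = X[:, 1] + 0.1 * rng.standard_normal(len(X)) < 0.5   # True -> 'monkey' response
fear = X[:, 0] > 0.5                                          # True -> 'fear' response
y = 2 * species.astype(int) + fear.astype(int)                # class labels 0..3

predictor_sets = {'constant': [], 'e only': [0], 's only': [1], 'e and s': [0, 1]}
for name, cols in predictor_sets.items():
    if not cols:
        # Constant-only model: always predict the most frequent class.
        accuracy = np.mean(y == np.bincount(y).argmax())
    else:
        model = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        accuracy = model.score(X[:, cols], y)
    print(f'{name:9s} accuracy: {accuracy:.2f}')
```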

Figure 3
Fitted discriminant functions P_i(e, s) for the original stimuli.

Classes correspond to the four prototype motions, as specified in Figure 1D (i = 1: human-angry, 2: human-fear, 3: monkey-threat, 4: monkey-fear). (A) Results for the stimuli generated using original motion-captured expressions of humans and monkeys as prototypes, for presentation on a monkey and a human avatar. (B) Significance levels (Bonferroni-corrected) of the differences between the multinomially distributed classification responses for the 25 motion patterns, presented on the monkey and human avatar.

Figure 4
Tuning functions.

(A) Fitted species-tuning functions DH(s) (solid lines) and DM(s) (dashed lines) for the categorization of patterns as monkey vs. human expressions, separately for the two avatar types (human and monkey) and the two view conditions. Different line styles indicate the experiments using original motion-captured motion, stimuli with occluded ears, and the experiment using prototype motions that were equilibrated for the amount of motion/deformation across prototypes. (B) Thresholds of the tuning functions for the three experiments for presentation on the two avatar types and the two view angles. (C) Steepness of the tuning functions at the threshold points for the experiments with and without equilibration of the prototype motions, and with occlusion of the ears.
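The threshold in (B) is the value of the species-style parameter s at which a fitted tuning function crosses 0.5, and the steepness in (C) is the slope of the function at that point. A minimal sketch of how these two quantities can be extracted from a fitted sigmoid (toy logistic form with hypothetical parameters; the paper derives the tuning functions from the fitted multinomial regression):

```python
import numpy as np

def logistic_tuning(s, s0, beta):
    """Toy species-tuning function D(s) = 1 / (1 + exp(-beta * (s - s0)))."""
    return 1.0 / (1.0 + np.exp(-beta * (s - s0)))

s0, beta = 0.45, 12.0          # hypothetical fitted parameters

# Threshold: point where the tuning function crosses 0.5 (here s = s0).
s_grid = np.linspace(0, 1, 1001)
threshold = s_grid[np.argmin(np.abs(logistic_tuning(s_grid, s0, beta) - 0.5))]

# Steepness: slope at the threshold, estimated numerically.
eps = 1e-4
steepness = (logistic_tuning(threshold + eps, s0, beta)
             - logistic_tuning(threshold - eps, s0, beta)) / (2 * eps)

print(f'threshold = {threshold:.3f}, steepness = {steepness:.2f}')  # slope equals beta/4 for a logistic
```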

Figure 5
Fitted discriminant functions P_i(e, s) for the condition with occlusions of the ears.

Classes correspond to the four prototype motions, as specified in Figure 1D (i = 1: human-angry, 2: human-fear, 3: monkey-threat, 4: monkey-fear). (A) Results for the stimuli generated using original motion-captured expressions of humans and monkeys as prototypes but with occluded ears, for presentation on a monkey and a human avatar (only using the front view). (B) Significance levels (Bonferroni-corrected) of the differences between the multinomially distributed classification responses for the 25 motion patterns, presented on the monkey and human avatar.

Figure 6
Equilibration of low-level expressive information.

(A) Mean perceived expressivity ratings for stimulus sets that were equilibrated using different types of measures for the amount of expressive low-level information: OF: optic flow computed with an optic flow algorithm; DF: shape difference compared to the neutral face (measured by the 2D distance in polygon vertex space); MF: two-dimensional motion of the polygons on the surface of the face. In addition, the ratings for a static neutral face are shown as a reference point for the rating (neutral). (B) Extreme frames of the monkey threat prototype before and after equilibration using the MF measure. (C) 2D polygon motion flow (MF) computed for the 25 stimuli in our expression style space for the monkey avatar, front view (similar results were obtained for the other stimulus types).
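A minimal sketch of how a polygon-motion measure of the MF type could be computed from projected vertex trajectories and used to equilibrate a prototype by scaling its deviation from the neutral pose (Python; array shapes and names are hypothetical, and the exact measures used in the paper are defined in 'Materials and methods'):

```python
import numpy as np

def motion_flow(vertices):
    """Total 2D polygon motion: summed frame-to-frame displacement of all
    projected mesh vertices. `vertices` has shape (frames, n_vertices, 2)."""
    steps = np.diff(vertices, axis=0)                 # displacement per frame
    return np.linalg.norm(steps, axis=-1).sum()       # sum over frames and vertices

def equilibrate(vertices, neutral, target_mf):
    """Scale the deviation from the neutral pose so that the motion-flow
    measure of the expression matches `target_mf`."""
    scale = target_mf / motion_flow(vertices)
    return neutral + scale * (vertices - neutral)

# Hypothetical example: two prototypes with different amounts of motion.
rng = np.random.default_rng(1)
neutral = rng.standard_normal((1, 500, 2))
proto_a = neutral + 0.10 * rng.standard_normal((150, 500, 2))
proto_b = neutral + 0.03 * rng.standard_normal((150, 500, 2))

target = min(motion_flow(proto_a), motion_flow(proto_b))
proto_a_eq = equilibrate(proto_a, neutral, target)
print(motion_flow(proto_a_eq), motion_flow(proto_b))  # now approximately equal
```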

Figure 7
Fitted discriminant functions P_i(e, s) for the experiment with equilibration of expressive information.

Classes correspond to the four prototype motions, as specified in Figure 1D (i = 1: human-angry, 2: human-fear, 3: monkey-threat, 4: monkey-fear). (A) Results for the stimuli set derived from prototype motions that were equilibrated with respect to the amount of local motion/deformation information, for presentation on a monkey and a human avatar. (B) Significance levels (Bonferroni-corrected) of the differences between the multinomially distributed classification responses for the 25 motion patterns, presented on the monkey and human avatar.

Appendix 1—figure 1
Details of generation of the monkey head model.

(A) Irregular surface mesh resulting from the magnetic resonance scan of a monkey head. (B) Face mesh whose deformation follows the control points specified by the motion-captured markers. (C) High-polygon surface derived from the mesh in (B) by applying displacement texture maps, which add high-frequency details such as pores and wrinkles. (D) Skin texture maps modeling the epidermal layer (left), the dermal layer (middle), and the subdermal layer (right panel). (E) Specularity textures modeling the reflection properties of the skin: overall specularity (left) and the map specifying oily vs. wet appearance (right panel). (F) Complete monkey face model, including the modeling of fur and whiskers.

Appendix 1—figure 2
Details of generation of the human head model.

(A) Human face mesh and its deformation by a blendshape approach, in which face poses are driven by the 43 control points (top panel). A tension-map algorithm computes the compression (green) and stretching (red) of the mesh during animation (middle panel), and the corresponding texture maps are blended accordingly (bottom panel). (B) Examples of diffuse texture maps (top panel), with additional maps for stretching (middle panel) and compression (bottom panel). (C) Subsurface color map modeling the color variations caused by light scattering and reflection by the skin. (D) Specular map modeling the specularity of the skin. (E) Wetness map modeling the degree of wetness vs. oiliness of the skin. (F) Regularized basic mesh with embedded muscle-like ribbon structures (violet) for animation. Yellow points indicate the control points defined by the motion capture markers. (G) Mesh with additional high-frequency details. (H) Final human avatar, including hair animation.
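The tension map in (A, middle panel) quantifies how much each part of the mesh is locally compressed or stretched relative to the rest pose, which then determines how the stretch and compression texture maps in (B) are blended. A minimal sketch of such an edge-based tension measure (Python; hypothetical mesh data, not the production implementation used inside the animation software):

```python
import numpy as np

def edge_tension(vertices, edges, rest_vertices):
    """Relative change of edge length: > 0 means stretching, < 0 compression."""
    v, r = np.asarray(vertices), np.asarray(rest_vertices)
    cur = np.linalg.norm(v[edges[:, 0]] - v[edges[:, 1]], axis=1)
    rest = np.linalg.norm(r[edges[:, 0]] - r[edges[:, 1]], axis=1)
    return (cur - rest) / rest

# Hypothetical triangle with three vertices and three edges.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
deformed = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 0.9, 0.0]])
edges = np.array([[0, 1], [1, 2], [2, 0]])

tension = edge_tension(deformed, edges, rest)
print(tension)  # stretched edge -> positive, compressed edge -> negative
# A blend weight for the 'stretch' texture could be np.clip(tension, 0, 1).
```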

Appendix 1—figure 3
Motion morphing algorithm and additional results.

(A) Graphical model showing the generative model underlying our motion morphing technique. The hierarchical Bayesian model has three layers that subsequently reduce the dimensionality of the motion data y_n. The top layer models the trajectory in latent space using a Gaussian process dynamical model (GPDM). The vectors e and s are additional style vectors that encode the expression type and the species type; they are binomially distributed. Plate notation indicates the replication of model components for the encoding of the temporal sequence and the different styles. Nonlinear functions are realized as samples from Gaussian processes with appropriately chosen kernels (for details, see text). (B) Results from the Turing test experiment. Accuracy for the distinction between animations with original motion capture data and trajectories generated by our motion morphing algorithm is close to chance level (dashed line), as opposed to the accuracy for the detection of small motion artifacts in control stimuli, which was close to one for both avatar types.
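The GP layers of this model use kernels that combine a nonlinear (RBF) part and a linear part, with the β and γ hyperparameters listed in Appendix 1—table 2, and intermediate morphs are obtained by interpolating the style vectors e and s between their prototype values. The sketch below shows a toy version of such a mixed kernel, here simply evaluated on latent points augmented with an interpolated style vector (Python; a simplified illustration, not the full GPDM of the paper):

```python
import numpy as np

def mixed_kernel(X1, X2, beta=1.0, gamma_nl=1.0, gamma_lin=1.0):
    """RBF ('non-linear') plus linear kernel; hyperparameter roles loosely
    follow the beta/gamma entries of Appendix 1-table 2 (toy version)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return gamma_nl * np.exp(-0.5 * beta * sq_dist) + gamma_lin * (X1 @ X2.T)

# One-hot style vectors of the prototypes (species and expression) ...
s_monkey, s_human = np.array([1.0, 0.0]), np.array([0.0, 1.0])
e_threat, e_fear = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# ... and an intermediate style obtained by linear interpolation (morph level 0.5).
s_mix = 0.5 * s_monkey + 0.5 * s_human
e_mix = 0.5 * e_threat + 0.5 * e_fear

# Toy latent trajectory (T = 150 time steps, Q = 2 latent dimensions),
# augmented with the interpolated style vector before evaluating the kernel.
X_latent = np.random.default_rng(2).standard_normal((150, 2))
X_aug = np.hstack([X_latent, np.tile(np.concatenate([e_mix, s_mix]), (150, 1))])
K = mixed_kernel(X_aug, X_aug)
print(K.shape)  # (150, 150): covariance matrix of one style-conditioned GP layer
```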

Tables

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Software, algorithm | Custom-written software written in C# | This study | https://hih-git.neurologie.uni-tuebingen.de/ntaubert/FacialExpressions (copy archived at swh:1:rev:6d041a0a0cc7055618f85891b85d76e0e7f80eed; Taubert, 2021) |
Software, algorithm | C3Dserver | Website | https://www.c3dserver.com |
Software, algorithm | Visual C++ Redistributable for Visual Studio 2012 Update 4 (x86 and x64) | Website | https://www.microsoft.com/en-US/download/details.aspx?id=30679 |
Software, algorithm | AssimpNet | Website | https://www.nuget.org/packages/AssimpNet |
Software, algorithm | Autodesk Maya 2018 | Website | https://www.autodesk.com/education/free-software/maya |
Software, algorithm | MATLAB 2019b | Website | https://www.mathworks.com/products/matlab.html |
Software, algorithm | Psychophysics Toolbox 3.0.15 | Website | http://psychtoolbox.org/ |
Software, algorithm | R 3.6 | Website | https://www.r-project.org/ |
Other | Training data for interpolation algorithm | This study | https://hih-git.neurologie.uni-tuebingen.de/ntaubert/FacialExpressions/tree/master/Data/MonkeyHumanFaceExpression |
Other | Stimuli for experiments | This study | https://hih-git.neurologie.uni-tuebingen.de/ntaubert/FacialExpressions/tree/master/Stimuli |
Appendix 1—table 1
Model comparison.

Accuracy and Bayesian Information Criterion (BIC) values for the different logistic multinomial regression models, for the stimuli derived from the original motion (no occlusions), for the monkey and the human avatar. The models included the following predictors: Model 1: constant; Model 2: constant, s; Model 3: constant, e; Model 4: constant, s, e; Model 5: constant, s, e, product se; Model 6: constant, s, e, optic flow. A code sketch of the BIC and nested-model comparison follows the table.

Model comparison
Model | Accuracy [%] | Accuracy increase [%] | BIC | Parameters | df | χ2 | p

Monkey front view
Model 1 | 38.29 | – | 7487 | 33 | – | – | –
Model 2 | 57.86 | 19.56 (relative to Model 1) | 5076 | 36 | 3 | 2411 | <0.0001
Model 3 | 49.49 | 11.2 (relative to Model 1) | 6125 | 36 | 3 | 1362 | <0.0001
Model 4 | 77.53 | 19.67 (relative to Model 2) | 3586 | 39 | 3 | 1490 | <0.0001
Model 5 | 77.53 | 0 (relative to Model 4) | 3598 | 42 | 3 | 11.997 | <0.0074
Model 6 | 77.42 | −0.11 (relative to Model 4) | 3580 | 42 | 3 | 5.675 | 0.129

Human front view
Model 1 | 36.84 | – | 7481 | 33 | – | – | –
Model 2 | 54.22 | 17.38 (relative to Model 1) | 5541 | 36 | 3 | 1940 | <0.0001
Model 3 | 53.56 | 16.72 (relative to Model 1) | 5847 | 36 | 3 | 1633 | <0.0001
Model 4 | 81.56 | 27.35 (relative to Model 2) | 3420 | 39 | 3 | 2120 | <0.0001
Model 5 | 81.35 | −0.22 (relative to Model 4) | 3309 | 42 | 3 | 112 | <0.0001
Model 6 | 81.38 | −0.18 (relative to Model 4) | 3389 | 42 | 3 | 31.66 | <0.0001

Monkey 30-degree
Model 1 | 35.32 | – | 6913 | 33 | – | – | –
Model 2 | 57.40 | 22.08 (relative to Model 1) | 4314 | 36 | 3 | 2622 | <0.0001
Model 3 | 49.36 | 14.04 (relative to Model 1) | 5179 | 36 | 3 | 1757 | <0.0001
Model 4 | 84.04 | 26.64 (relative to Model 2) | 2359 | 39 | 3 | 1977 | <0.0001
Model 5 | 84.88 | 0.84 (relative to Model 4) | 2335 | 42 | 3 | 48 | <0.0001
Model 6 | 84.08 | 0.04 (relative to Model 4) | 2331 | 42 | 3 | 28 | <0.0001

Human 30-degree
Model 1 | 37.40 | – | 6819 | 33 | – | – | –
Model 2 | 55.72 | 18.32 (relative to Model 1) | 4843 | 36 | 3 | 1975 | <0.0001
Model 3 | 54.36 | 16.96 (relative to Model 1) | 5217 | 36 | 3 | 1602 | <0.0001
Model 4 | 81.32 | 25.6 (relative to Model 2) | 2910 | 39 | 3 | 1956 | <0.0001
Model 5 | 82.88 | 1.56 (relative to Model 4) | 2809 | 42 | 3 | 101 | <0.0001
Model 6 | 81.92 | 0.6 (relative to Model 4) | 2890 | 42 | 3 | 19 | 0.0002
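The BIC penalizes a model's log-likelihood by its number of parameters, and the χ2 and df columns describe comparisons between nested models (presumably likelihood-ratio tests, given the 'relative to Model …' entries). A minimal sketch of both computations (Python; the log-likelihood values below are made up for illustration):

```python
import numpy as np
from scipy import stats

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: smaller is better."""
    return -2.0 * log_lik + n_params * np.log(n_obs)

def likelihood_ratio_test(log_lik_small, log_lik_large, df_diff):
    """Chi-squared test comparing a nested (smaller) model with a larger one."""
    chi2 = 2.0 * (log_lik_large - log_lik_small)
    p = stats.chi2.sf(chi2, df_diff)
    return chi2, p

# Hypothetical values for two nested multinomial models.
ll_model2, ll_model4 = -2500.0, -1755.0    # log-likelihoods (made up)
n_obs = 2700                               # number of classification responses
print('BIC model 2:', bic(ll_model2, 36, n_obs))
print('BIC model 4:', bic(ll_model4, 39, n_obs))
print('LRT:', likelihood_ratio_test(ll_model2, ll_model4, df_diff=3))
```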
Appendix 1—table 2
Parameters of the Bayesian motion morphing algorithm.

The observation matrix Y is formed by N samples of dimension D, where N results from S * E trials with T time steps each. The dimensions M and Q of the latent variables were chosen manually. The integers S and E specify the numbers of species and expressions (two in our case).

Parameters of motion morphing algorithm
Parameter | Description | Value
D | Data dimension | 208
M | First-layer dimension | 6
Q | Second-layer dimension | 2
T | Number of samples per trial | 150
S | Number of species | 2
E | Number of expressions | 2 or 3
N | Number of all samples | T * S * E

Hyperparameter (learned) | Description | Size
β1 | Inverse width of kernel k1 | 1
β2 | Inverse width of kernel k2 | 1
β3 | Inverse width for non-linear part one of kernel k3 | 1
β4 | Inverse width for non-linear part two of kernel k3 | 1
γ1 | Precision absorbed from noise term ε_d | 1
γ2 | Variance for non-linear part of k1 | 1
γ3 | Variance for linear part of k1 | 1
γ4 | Variance for non-linear part of k2 | 1
γ5 | Variance for linear part of k2 | 1
γ6 | Variance for non-linear part of k3 | 1
γ7 | Variance for linear part one of k3 | 1
γ8 | Variance for linear part two of k3 | 1

Variable | Description | Size
Y | Data | N × D
H | Latent variable of first layer | N × M
X | Latent variable of second layer | N × Q
s^M | Style variable vector for monkey species | S × 1
s^H | Style variable vector for human species | S × 1
e^1 | Style variable vector for expression one | E × 1
e^2 | Style variable vector for expression two | E × 1
Appendix 1—table 3
Detailed results of the two-way ANOVAs.

ANOVA for the threshold: two-way mixed model with expression type as within-subject factor and stimulus type as between-subject factor, for both the monkey and the human avatar. Steepness: two-way ANOVA with the factors avatar type and expression type, for each stimulus motion type (original, occluded, and equilibrated). Mean square = sum of squares / df; df = degrees of freedom. A code sketch of the two-way decomposition follows the table.

ANOVAs
Effect | Sum of squares | df | Mean square | F | p

Threshold: monkey avatar
Stimulus type | 0.00 | 2 | 0.00 | 0.00 | 0.999
Expression type | 1.20 | 1 | 1.20 | 188.83 | 0.000
Stimulus × Expression | 0.06 | 2 | 0.03 | 4.51 | 0.015
Error | 0.42 | 60 | 0.01 | – | –
Total | 1.72 | 65 | – | – | –

Threshold: human avatar
Stimulus type | 0.00 | 2 | 0.00 | 0.01 | 0.993
Expression type | 0.40 | 1 | 0.40 | 46.37 | 0.000
Stimulus × Expression | 0.05 | 2 | 0.03 | 3.15 | 0.049
Error | 0.57 | 60 | 0.01 | – | –
Total | 1.02 | 65 | – | – | –

Steepness: original motion stimulus
Avatar type | 376.68 | 1 | 376.68 | 6.30 | 0.016
Expression type | 0.36 | 1 | 0.36 | 0.01 | 0.939
Avatar × Expression | 0.16 | 1 | 0.16 | 0.00 | 0.959
Error | 2391.21 | 40 | 59.78 | – | –
Total | 2768.41 | 43 | – | – | –

Steepness: occluded motion stimulus
Avatar type | 286.17 | 1 | 286.17 | 3.33 | 0.076
Expression type | 0.02 | 1 | 0.02 | 0.00 | 0.988
Avatar × Expression | 0.00 | 1 | 0.00 | 0.00 | 0.995
Error | 3094.54 | 36 | 85.96 | – | –
Total | 3380.73 | 39 | – | – | –

Steepness: equilibrated motion stimulus
Avatar type | 1.57 | 1 | 1.57 | 0.40 | 0.533
Expression type | 0.25 | 1 | 0.25 | 0.06 | 0.803
Avatar × Expression | 0.02 | 1 | 0.02 | 0.00 | 0.945
Error | 174.76 | 44 | 3.97 | – | –
Total | 176.60 | 47 | – | – | –
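As a rough illustration of the steepness analyses, the sketch below runs a two-way ANOVA with the factors avatar type and expression type on simulated steepness values (Python with statsmodels; the data, group sizes, and effect sizes are hypothetical, and the threshold analyses additionally used a mixed within/between design):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(3)

# Simulated steepness values for a 2 x 2 design (avatar type x expression type).
df = pd.DataFrame({
    'avatar': np.repeat(['monkey', 'human'], 22),
    'expression': np.tile(np.repeat(['threat', 'fear'], 11), 2),
    'steepness': rng.normal(10, 3, 44),
})

model = ols('steepness ~ C(avatar) * C(expression)', data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # sum of squares, df, F, p per effect
```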

Cite this article
Nick Taubert, Michael Stettler, Ramona Siebert, Silvia Spadacenta, Louisa Sting, Peter Dicke, Peter Thier, Martin A Giese (2021) Shape-invariant encoding of dynamic primate facial expressions in human perception. eLife 10:e61197. https://doi.org/10.7554/eLife.61197