Figures and data

Timecourse of the species categorization task with example stimuli and acoustic distance data.
(A) Detail of the timecourse of four trials of the species categorization task in non-representative order, including waveform and spectrogram graphs for one example stimulus of each species. (B) Scatter plot and histogram of the acoustic Mahalanobis distance of each stimulus for each species, including the mean (numbers give the exact mean value) and violin plots of the standard error of the mean in addition to the distribution fit. ITI: inter-trial interval; Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque.
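
As an illustration of the distance measure plotted in panel B, the following is a minimal sketch of a Mahalanobis distance computation. The caption does not state which reference distribution or which acoustic features were used, so the function name `mahalanobis_distances`, the `reference` set and the feature layout (one row per stimulus, one column per acoustic feature) are assumptions made for this example only.

```python
import numpy as np

def mahalanobis_distances(features, reference):
    """Mahalanobis distance of each row of `features` from the centroid of `reference`.

    Both arrays are (n_stimuli, n_acoustic_features). Which stimuli serve as
    the reference set is an assumption of this sketch, not taken from the caption.
    """
    mu = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    # The pseudo-inverse tolerates a near-singular covariance matrix.
    cov_inv = np.linalg.pinv(np.atleast_2d(cov))
    diff = features - mu
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Guard against tiny negative values caused by rounding.
    return np.sqrt(np.maximum(squared, 0.0))
```

Unlike a Euclidean distance, this measure rescales each feature by its variance and accounts for correlations between features, so a single number per stimulus can summarize how far it lies from the reference set.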

Wholebrain results when contrasting the processing of chimpanzee to other species’ vocalizations with mean fundamental frequency and energy as trial-level covariates of no-interest (model 1).
(ABC) Enhanced brain activity on a sagittal view with activity specific to chimpanzee vocalizations (dark blue to green) as well as between chimpanzee calls vs bonobo and macaque calls (chimpanzee > bonobo and macaque: brown to red with light yellow outline). (D) Percentage of signal change for each individual and relevant species according to the contrast in the left anterior superior temporal gyrus (aSTG1). Box plots represent mean value (black line) and the standard error of the mean with distribution fit. (EFG) Direct comparison between human and chimpanzee vocalizations (human > chimpanzee: dark red to yellow; chimpanzee > human: dark green to yellow) on a sagittal render. (H) Percentage of signal change in the anterior superior temporal gyrus (aSTG2) when contrasting chimpanzee to human vocalizations for each individual and relevant species according to the contrast with box plots representing mean value (black line) and the standard error of the mean with distribution fit. Brain activations are independent of low-level acoustic parameters for all species (mean fundamental frequency ‘F0’ and mean energy of vocalizations). Data corrected for multiple comparisons using wholebrain voxelwise false discovery rate (FDR) at a threshold of p<.05. Percentage of signal change extracted at cluster peak including 9 surrounding voxels, selecting among these the ones explaining at least 85% of the variance using singular value decomposition. Circles represent individual values, boxplot represents the mean and its standard error, and half-violin plots show data distribution. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque. TVA: sample-specific (N=23) temporal voice areas. ‘a’ prefix: anterior; ‘m’ prefix: mid; ‘p’ prefix: posterior; STG: superior temporal gyrus; STS: superior temporal sulcus; L: left hemisphere; R: right hemisphere.
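
The extraction step described above (cluster peak plus surrounding voxels, retaining what explains at least 85% of the variance via singular value decomposition) could be sketched as follows. This is one possible reading only: the caption speaks of selecting voxels, whereas this sketch keeps the leading SVD components and averages the reconstructed signal across voxels. The function name `low_rank_roi_signal` and the data layout (observations in rows, voxels in columns) are hypothetical.

```python
import numpy as np

def low_rank_roi_signal(data, threshold=0.85):
    """Summarize a small region of interest with its leading SVD components.

    `data` is (n_observations, n_voxels). Returns the voxel-averaged signal
    rebuilt from the fewest components whose cumulative explained variance
    reaches `threshold`, together with the number of components kept.
    """
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First component count at which cumulative variance reaches the threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    approx = (u[:, :k] * s[:k]) @ vt[:k] + mean
    return approx.mean(axis=1), k
```

With ten voxels per region (the peak and its nine neighbours), `data` would have ten columns; the returned signal can then be converted to percentage of signal change by whatever baseline the analysis uses, which the caption does not specify.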

Activations, cluster size and coordinates for each contrast of interest of model 1 (mean of vocalization fundamental frequency and energy as trial-level covariates of no-interest) in the sample-specific temporal voice areas, wholebrain voxelwise p<.05 FDR corrected, k>10.
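
For readers unfamiliar with voxelwise FDR correction, a minimal sketch of the Benjamini-Hochberg step-up procedure is given below. The captions do not say which FDR procedure was applied, so Benjamini-Hochberg is an assumption here, chosen because it is a common default; the function name `benjamini_hochberg` is hypothetical, and the cluster-extent threshold (k>10) is not modelled.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of p-values declared significant at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Step-up thresholds: the i-th smallest p-value is compared with q * i / m.
    thresholds = q * np.arange(1, m + 1) / m
    passing = np.nonzero(ranked <= thresholds)[0]
    significant = np.zeros(m, dtype=bool)
    if passing.size:
        # Everything up to the largest passing rank is declared significant.
        significant[order[: passing[-1] + 1]] = True
    return significant
```

In a whole-brain analysis, `p_values` would hold one p-value per voxel for a given contrast, and the resulting mask would mark the voxels that survive correction.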

Wholebrain results when contrasting the processing of chimpanzee to other species’ vocalizations with Mahalanobis acoustic distance as trial-level covariate of no-interest (model 2).
(ABC) Enhanced brain activity on a sagittal view with activity specific to chimpanzee vocalizations (chimp > hum, bon, mac; dark blue to green) as well as between chimpanzee calls vs bonobo and macaque calls (chimpanzee > bonobo and macaque: brown to red with light yellow outline). (D) Percentage of signal change for each individual and relevant species according to the contrast in the left anterior superior temporal gyrus (aSTG6). Box plots represent mean value (black line) and the standard error of the mean with distribution fit. (EFG) Direct comparison between human and chimpanzee vocalizations (human > chimpanzee: dark red to yellow; chimpanzee > human: dark green to yellow) on a sagittal render. (H) Percentage of signal change in the anterior superior temporal gyrus (aSTG8) when contrasting chimpanzee to human vocalizations for each individual and relevant species according to the contrast with box plots representing mean value (black line) and the standard error of the mean with distribution fit. Brain activations are independent of the acoustic distance of each stimulus for all species. Data corrected for multiple comparisons using wholebrain voxelwise false discovery rate (FDR) at a threshold of p<.05. Percentage of signal change extracted at cluster peak including 9 surrounding voxels, selecting among these the ones explaining at least 85% of the variance using singular value decomposition. Circles represent individual values, boxplot represents the mean and its standard error, and half-violin plots show data distribution. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque. TVA: sample-specific (N=23) temporal voice areas. ‘a’ prefix: anterior; ‘m’ prefix: mid; ‘p’ prefix: posterior; STG: superior temporal gyrus; STS: superior temporal sulcus; L: left hemisphere; R: right hemisphere.

Activations, cluster size and coordinates for each contrast of interest of model 2 (inter-species vocalization acoustic distance as trial-level covariate of no-interest) in the sample-specific temporal voice areas, wholebrain voxelwise p<.05 FDR corrected, k>10.

Wholebrain results when contrasting the processing of chimpanzee to other species’ vocalizations with vocalization loudness, intensity, change in spectrum, F2 bandwidth contour, F0 power and intensity contour difference as trial-level covariates of no-interest (model 3).
(ABC) Enhanced brain activity on a sagittal view with activity specific to chimpanzee vocalizations (dark blue to green) as well as between chimpanzee calls vs bonobo and macaque calls (chimpanzee > bonobo and macaque: brown to red with light yellow outline). (D) Percentage of signal change for each individual and relevant species according to the contrast in the left anterior superior temporal gyrus (aSTG10). Box plots represent mean value (black line) and the standard error of the mean with distribution fit. (EFG) Direct comparison between human and chimpanzee vocalizations (human > chimpanzee: dark red to yellow; chimpanzee > human: dark green to yellow) on a sagittal render. (H) Percentage of signal change in the anterior superior temporal gyrus (aSTG12) when contrasting chimpanzee to human vocalizations and when contrasting chimpanzee to bonobo and macaque calls (aSTG13) for each individual and relevant species according to the contrast with box plots representing mean value (black line) and the standard error of the mean with distribution fit. Brain activations are independent of the most discriminant low-level acoustic parameters of the stimuli set [30]. Data corrected for multiple comparisons using wholebrain voxelwise false discovery rate (FDR) at a threshold of p<.05. Percentage of signal change extracted at cluster peak including 9 surrounding voxels, selecting among these the ones explaining at least 85% of the variance using singular value decomposition. Circles represent individual values, boxplot represents the mean and its standard error, and half-violin plots show data distribution. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque. TVA: sample-specific (N=23) temporal voice areas. ‘a’ prefix: anterior; ‘m’ prefix: mid; ‘p’ prefix: posterior; STG: superior temporal gyrus; STS: superior temporal sulcus; L: left hemisphere; R: right hemisphere.


Activations, cluster size and coordinates for each contrast of interest of model 3 (vocalization loudness, intensity, change in spectrum, F2 bandwidth contour, F0 power and intensity contour difference as trial-level covariates of no-interest) in the sample-specific temporal voice areas, wholebrain voxelwise p<.05 FDR corrected, k>10.

Synthesis of mid and anterior TVA clusters of activity recruited specifically by the processing of chimpanzee and macaque vocalizations (Models 1, 2, 3).
aSTG and aSTS clusters recruited for the processing of chimpanzee calls as opposed to: human voices (green); bonobo and macaque calls (blue); and human voices, bonobo and macaque calls (turquoise), in the general TVA (AB, N=98) as well as in the sample-specific TVA (CD, N=23). Macaque results are only significant for Model 3 (purple: macaque vs all other species; lilac: macaque vs other nonhuman primates). Clusters are represented across all statistical models (Model 1: dotted line; Model 2: dashed line; Model 3: solid line). Model 1: mean of fundamental frequency and energy (covariates of no-interest, N=2); Model 2: acoustic distance (covariate of no-interest, N=1); Model 3: acoustic parameters that characterize low-level acoustics of our stimuli following a discriminant analysis (covariates of no-interest, N=6). Data are all corrected for multiple comparisons using wholebrain voxelwise false discovery rate (FDR) at a threshold of p<.05. Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque. TVA: temporal voice areas. ‘a’ prefix: anterior; STG: superior temporal

Clusters recruited specifically by the processing of chimpanzee and macaque vocalizations (Model 3) in subregions of the TVA, as a function of non-vocal material type.
Enhanced brain activity on sagittal views with activity specific to macaque vocalizations (red to yellow), specific to chimpanzee vocalizations (dark blue to green) as well as between chimpanzee calls vs bonobo and macaque calls (chimpanzee > bonobo and macaque: brown to red with light yellow outline). Brain activations are independent of the most discriminant low-level acoustic parameters of the stimuli set [30]. Data corrected for multiple comparisons using wholebrain voxelwise false discovery rate (FDR) at a threshold of p<.05. Black outline represents: voice compared to non-vocal stimuli of animal sounds (A,B), nature sounds (C,D), music (E,F), artificial noise (G,H). Hum: human; Chimp: chimpanzee; Bon: bonobo; Mac: macaque. TVA: sample-specific (N=23; white outline) temporal voice areas. STG: superior temporal gyrus; STS: superior temporal sulcus; ‘a’ prefix: anterior; ‘m’ prefix: mid; L: left hemisphere; R: right hemisphere.