Reviewer #1 (Public review):

Summary:

This study investigates how human temporal voice areas (TVA) respond to vocalizations from nonhuman primates. Using functional MRI during a species-categorization task, the authors compare neural responses to calls from humans, chimpanzees, bonobos, and macaques while modeling both acoustic and phylogenetic factors. They find that bilateral anterior TVA regions respond more strongly to chimpanzee than to other nonhuman primate vocalizations, suggesting that these regions are sensitive not only to human voices but also to acoustically and evolutionarily related sounds.

The work provides important comparative evidence for continuity in primate vocal communication and offers a strong empirical foundation for modeling how specific acoustic features drive TVA activity.

Strengths:

(1) Comparative scope: The inclusion of four primate species, including both great apes and monkeys, provides a rare and valuable cross-species perspective on voice processing.

(2) Methodological rigor: Acoustic and phylogenetic distances are carefully quantified and incorporated into the analyses.

(3) Neuroscientific significance: The finding of TVA sensitivity to chimpanzee calls supports the view that human voice-selective regions are evolutionarily tuned to certain acoustic features shared across primates.

(4) Clear presentation: The study is well organized, the stimuli well controlled, and the imaging analyses transparent and replicable.

(5) Theoretical contribution: The results advance understanding of the neural bases of voice perception and the evolutionary roots of voice sensitivity in the human brain.

Weaknesses:

(1) Acoustic-phylogenetic confound: The design does not fully disentangle acoustic similarity from phylogenetic proximity, as species co-vary along both dimensions. A promising way to address this would be to include an additional model focusing on the acoustic features that specifically differentiate bonobo from chimpanzee calls. Because these two species are phylogenetically equidistant from humans, a difference in TVA response between them would more plausibly reflect acoustics than phylogeny.

(2) Selectivity vs. sensitivity: Without non-vocal control sounds, the study cannot determine whether TVA responses reflect true selectivity for primate vocalizations or general auditory sensitivity.

(3) Task demands: The use of an active categorization task may engage additional cognitive processes beyond auditory perception; a passive listening condition would help clarify the contribution of attention and task performance.

(4) Figures and presentation: Some results are partially redundant; keeping only the most representative model figure in the main text and moving others to the Supplementary Material would improve clarity.
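
As an illustration of the suggestion in weakness (1), the following is a minimal sketch of one way to identify the acoustic features that best separate bonobo from chimpanzee calls, by ranking features on their standardized mean difference (Cohen's d). It is not the authors' method. The function name, the feature names, and the randomly generated values are placeholders assumed for illustration only; real per-call feature tables would take their place.

```python
import numpy as np

def rank_discriminating_features(chimp, bonobo, names):
    """Rank features by absolute Cohen's d (chimpanzee minus bonobo).

    chimp, bonobo: arrays of shape (n_calls, n_features).
    names: hypothetical feature labels, one per column.
    """
    chimp = np.asarray(chimp, dtype=float)
    bonobo = np.asarray(bonobo, dtype=float)
    n_c, n_b = len(chimp), len(bonobo)
    # Pooled variance per feature, using the unbiased (ddof=1) estimator.
    pooled_var = (
        (n_c - 1) * chimp.var(axis=0, ddof=1)
        + (n_b - 1) * bonobo.var(axis=0, ddof=1)
    ) / (n_c + n_b - 2)
    d = (chimp.mean(axis=0) - bonobo.mean(axis=0)) / np.sqrt(pooled_var)
    # Largest absolute effect first.
    order = np.argsort(-np.abs(d))
    return [(names[i], float(d[i])) for i in order]

# Placeholder data only: random numbers, not measurements from the study.
rng = np.random.default_rng(0)
names = ["mean_f0", "energy", "spectral_change"]  # assumed labels
chimp = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(40, 3))
bonobo = rng.normal(loc=[2.0, 0.0, 0.5], scale=1.0, size=(40, 3))

ranking = rank_discriminating_features(chimp, bonobo, names)
for name, effect in ranking:
    print(f"{name}: d = {effect:.2f}")
```

The top-ranked features could then serve as trial-level regressors in an additional model restricted to chimpanzee and bonobo trials, where phylogenetic distance to humans is held constant.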

Figures and data

Time course of the species categorization task, with example stimuli and acoustic distance data.

Whole-brain results when contrasting the processing of chimpanzee to other species’ vocalizations, with mean fundamental frequency and energy as trial-level covariates of no interest (Model 1).

Whole-brain results when contrasting the processing of chimpanzee to other species’ vocalizations, with Mahalanobis acoustic distance as a trial-level covariate of no interest (Model 2).

Whole-brain results when contrasting the processing of chimpanzee to other species’ vocalizations, with vocalization loudness, intensity, change in spectrum, F2 bandwidth contour, F0 power, and intensity contour difference as trial-level covariates of no interest (Model 3).

Synthesis of mid and anterior TVA clusters of activity recruited specifically by the processing of chimpanzee and macaque vocalizations (Models 1, 2, and 3).

Clusters recruited specifically by the processing of chimpanzee and macaque vocalizations (Model 3) in subregions of the TVA, as a function of non-vocal material type.