Introduction

It is a longstanding question whether there is something unique about the cognitive abilities of humans relative to other animals (Hauser et al., 2002; Fitch et al., 2005; Iriki, 2006; Hopkins et al., 2012; Kietzmann, 2019; Penn et al., 2008; Berwick and Chomsky, 2016). Symbols are ubiquitous in human cognition, underlying not only language but also mathematical, musical and social representations, among many other domains (Deacon, 1998; Dehaene et al., 2022; Kabdebon and Dehaene-Lambertz, 2019; Nieder, 2009; Sablé-Meyer et al., 2021). The appearance of symbolic representations, which is thought to have developed in parallel with the expansion of prefrontal and parietal associative areas, has therefore been suggested as a crucial marker of hominization (Deacon, 1998; Dehaene et al., 2022; Henshilwood et al., 2002; Neubauer et al., 2018).

This proposal, however, hinges on the definition of what a symbol is. The term symbol is often used as a synonym for a sign, which was classically defined by Ferdinand de Saussure as an arbitrary binding between a “signifier” (for instance a word or a digit, but also a traffic sign, logo, etc.) and a “signified” (the meaning or content to which the signifier refers). In that respect, however, many non-human animals, including chimpanzees and macaques, but also dogs, are able to learn hundreds of such relationships, even with arbitrary signs (Kaminski et al., 2004; Livingstone et al., 2010; Matsuzawa, 1985; Premack, 1971). Even bees can learn to associate arbitrary visual shapes with abstract representations such as visual quantities (2 or 3 elements), independently of the density, size or color of the elements in the visual display (Howard et al., 2019). More recently, it has been proposed to reserve the term “symbol” for a collection of such signs that can be syntactically manipulated according to precise compositional rules (Deacon, 1998; Dehaene et al., 2022; Nieder, 2009). The symbols then entertain relationships with each other that parallel the relationships between the objects, or concepts, that they represent. For example, numerical symbols allow manipulations such as “2+3=5” irrespective of whether they apply to apples, oranges or money. Performing the “sum” operation internally allows expectations about a specific outcome in the external world. Non-human animals may be conditioned to acquire iconic or indexical associations (i.e., signs which bear, respectively, a non-arbitrary or an arbitrary relationship between the signifier and the signified) and even perhaps perform operations on the learned signs, such as addition (Livingstone et al., 2014), but their capacities for novel symbolic composition, especially of a recursive syntactic nature, appear limited or absent (Berwick and Chomsky, 2016; Dehaene et al., 2022, 2015; Penn et al., 2008; Sablé-Meyer et al., 2021; Yang, 2013; Zhang et al., 2022).

The difference between humans and animals in terms of symbolic access remains controversial, in part because learning complex tasks requires considerable training in animals, and a variety of factors, such as motivation, learning rate and working memory capacity, may therefore explain an animal’s failure. This difficulty could be circumvented by testing a basic element of symbolic representations, namely the temporal reversibility of a learned arbitrary association. While the associations between indices and objects (typically acquired during classical conditioning) are unidirectional, as in the famous example of the whistle indicating food, symbolic associations are bidirectional or symmetric (Deacon, 1998; Nieder, 2009). When hearing the word ‘dog’, for example, you can think of a dog, but when seeing a dog, you can also come up with the word ‘dog’. Such reversibility is crucial for communication (the language learner must acquire both comprehension and production skills), but also for symbolic computations, which require going back and forth between the real world (e.g., seeing three sets of four objects), the internal symbols (e.g., to allow the internal computation “3x4=12”) and back (to expect a total quantity of twelve). In the current work, we test the “reversibility hypothesis”, which proposes that, because of a powerful symbolic system, humans are biased to spontaneously form bidirectional associations between an object and an arbitrary sign. It implies that the referential function of the sign immediately operates in both directions (i.e., comprehension and production), allowing the signified (meaning) to be retrieved from the signifier (symbol) and vice versa.

A small number of behavioral studies, spread over four decades, report that non-human animals such as bees and pigeons, but also macaques, baboons and chimpanzees, struggle to reverse the associations that they learned in one direction (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982; Howard et al., 2019; see Chartier and Fagot, 2022, for a review and discussion). In a recent experiment, Chartier and Fagot (2022) explored this question in 20 free-behaving baboons. After the animals had learned to pair visual shapes (two pairs A-B) with above 80% success, their performance dropped considerably when the order of presentation was subsequently reversed (B-A; 54% correct, chance = 50%), although their relearning performance was slightly but significantly better when the reversed pairs were congruent (B1-A1; B2-A2) rather than incongruent (B1-A2; B2-A1). Even for the famous case of chimpanzee Ai, who learned Arabic numerals and other arbitrary tokens for colors and objects (Matsuzawa, 2009, 1985), it turns out that her capacity to associate signs and their meanings was based on explicit and sequential training in both directions, at least initially (Kojima, 1984). In sharp contrast, humans as young as 8 months, even when tested under the same conditions as monkeys or baboons (Sidman et al., 1982), show behavioral evidence of immediate spontaneous reversal of learned associations (Imai et al., 2021; Ogawa et al., 2010; Sidman et al., 1982).

Still, behavioral tests depend on explicit reports, which may fail to reveal an implicit understanding of symbolic representations. This limitation can be alleviated by directly recording brain responses, which also allows a more direct comparison between species. Here, we propose a simple brain-imaging test of reversible associations. First, the participant receives evidence of several stimulus pairings between an object (O) and an arbitrary sign or label (L) in a fixed ‘canonical order’, e.g., from O1 to L1 and from O2 to L2. Knowledge of these learned (i.e., congruent) associations is then tested using a classic violation-of-expectation paradigm, by evaluating the brain’s surprise response or “prediction error” when, say, O1 is followed by L2. This response can then also be evaluated in the converse direction, by switching the order of presentation of the two items within a pair. The crucial question is whether the brain shows a surprise response to an incongruent pairing presented in reversed order (e.g., L1 followed by O2), relative to the corresponding congruent pairing (L1 followed by O1). The reversibility hypothesis predicts that if symbolic associations are formed, pairs presented in canonical and reversed order should be processed similarly, and so a similar surprise response to incongruent pairings should be found in both cases.

A recent study from our lab used EEG to apply this approach to 4-5-month-old human infants (Kabdebon and Dehaene-Lambertz, 2019). The infants were habituated to pairs of stimuli in which a specific picture (a lion or a fish) was associated with trisyllabic nonwords, depending on a rule concerning syllable repetition in the word (e.g., xxY words such as babagu, didito, etc. were followed by the fish picture whereas xYx words such as lotilo, fudafu, etc. were followed by the lion picture). Violation-of-expectation responses were recorded in both canonical and reversed order, suggesting that preverbal human infants already have the ability to reversibly attach a symbol to an abstract rule. In human adults, an fMRI study with a more complex design using explicit reports on associations between abstract patterns also showed brain signatures suggestive of spontaneous reversal of learned associations (Ogawa et al., 2010). The network of brain areas overlapped with the multiple-demand system that is ubiquitously observed in high-level cognitive tasks (Duncan, 2010; Fedorenko et al., 2013), including bilateral inferior and middle frontal gyri (IFG and MFG), anterior insula (AI), intraparietal sulcus (IPS), and dorsal anterior cingulate cortex (dACC). In contrast, a human fMRI study investigating association learning between two natural visual objects found that violation effects in the learned direction were restricted to low-level visual areas (Richter et al., 2018). Similarly, in macaque monkeys, violation effects in the learned direction have been found selectively in visual areas, using fMRI as well as single-neuron recordings (Kaposvari et al., 2018; Meyer et al., 2014; Meyer and Olson, 2011; Vergnieux and Vogels, 2020). One of these studies (Meyer and Olson, 2011) also tested, in a small subset of 17 neurons, whether the learned associations spontaneously reversed, and found no such reversal. From these studies, it is difficult to draw a conclusion about a potential difference between species, due to important differences in recording techniques and task design.

Here, we directly compared the ability to spontaneously reverse learned associations in humans and macaque monkeys using identical training, stimuli and whole-brain fMRI measures. Our goals were (1) to probe the reversibility hypothesis in an elementary passive paradigm in both species; and (2) to shed light on the brain mechanisms of symbolic associations in humans. Indeed, two alternative hypotheses may be formulated. First, given that symbolic learning is a defining feature of language, reversible violation-of-expectation effects might be restricted to the left-hemispheric temporal and inferior frontal language areas. Alternatively, since symbolic learning is manifest in many domains outside of language, for instance in mathematics or music, each attached to a dissociable fronto-posterior brain network (Amalric and Dehaene, 2016; Chen et al., 2021; Dehaene et al., 2022; Fedorenko et al., 2011; Nieder, 2019; Norman-Haignere et al., 2015), reversibility could be expected to arise from a broad and bilateral network of human brain areas, including dorsal intraparietal and middle frontal nodes. We thus tested audio-visual and visual-visual symbolic pairing in two successive experiments.

Results

Summary of the experimental design

In the first experiment, we examined the learning and reversibility of auditory-visual pairs, i.e., between a visual object and an auditory label. Over the course of three days, we habituated humans (n=31) and macaque monkeys (n=2) to 4 pairs of visual objects and speech sounds (Figure 1A; Supplementary Figure 1). Two of the pairs were presented in the auditory-to-visual direction and two in the visual-to-auditory direction, ensuring that all subjects had experience with both orders and would not be surprised by a temporal reversal per se (see Medam et al., 2016, for a discussion of the utility of this point). After three consecutive days of habituation with 100% congruent canonical trials (24 training trials in total per pair, presented outside the scanner), subjects were tested for learning using 3T fMRI, during which they were passively exposed to pairs that respected or violated the learned pairings (Figure 1B). To sustain the memory for learned pairs, the design still included 70% congruent canonical trials (identical to the trials presented during habituation). In addition, there were 10% incongruent canonical trials, in which the temporal order was maintained but the pairings between auditory and visual stimuli were violated. Enhanced brain responses to such incongruent pairs would indicate surprise and therefore demonstrate that the associations had been learned. Note that all auditory and visual stimuli themselves were familiar: only their pairing was unusual. The design also included 10% reversed congruent and 10% reversed incongruent trials, in which the habitual (i.e., canonical) order of presentation of the pairs was reversed (Figure 1A). Observing an incongruity effect on such reversed trials would indicate that subjects spontaneously reversed the pairings and were surprised when they were violated. Note that the frequency of the two types of reversed trials was equal, and thus did not afford any additional learning of the reversed pairs (unlike Chartier and Fagot, 2022).

Experimental paradigm for auditory-visual label learning.

A) Subjects were exposed to four different visual-auditory pairs during three days (6 repetitions of each pair, 3-minute videos). Two pairs were always presented in the ‘visual-then-auditory’ order (object to label), and two in the ‘auditory-then-visual’ (label to object) order. During the test phase, this canonical order was kept on 80% of trials, including 10% of incongruent pairs to test memory of the learned pairs, and was reversed on 20% of the trials. On reversed trials, half the pairs were congruent and half were incongruent (each 10% of total trials), thus testing reversibility of the pairings without affording additional learning. B,C) Activation in sensory cortices. Although each trial comprised auditory and visual stimuli, their responses could be separated thanks to their temporal offsets. Images show significantly activated regions in the contrasts image > sound (red-yellow) and sound > image (blue-light blue), averaged across all subjects and runs for humans (B) and monkeys (C). D,E) Average finite-impulse-response (FIR) estimate of the deconvolved hemodynamic responses for humans (D) and monkeys (E) within the clusters shown in B and C respectively, separately for visual-audio (VA) and audio-visual (AV) trials. The y-axis is sign-flipped for monkey responses.

Experiment 1| audio-visual stimulus pairs

We first mapped the cortical regions that were activated by visual and auditory stimuli, modelling the two stimuli within each pair with separate regressors (Figure 1B, C). Visually evoked activations propagated all the way to the prefrontal cortex (PFC) in monkeys, while they remained restricted to lower cortical areas in humans, in line with previous studies (Denys et al., 2004b; Mantini et al., 2013). In contrast, the response was relatively weak in the auditory cortex of monkeys, also in line with previous studies (Erb et al., 2019; Petkov et al., 2009; Uhrig et al., 2014). This is expected, as the auditory cortex in monkeys is small relative to their visual cortex (Felleman and Van Essen, 1991), as well as relative to the size of the auditory cortex in humans (Woods et al., 2010). Even though the onsets of the two stimuli within a pair were just 800ms apart, the fast acquisition allowed us to separate the timing of the activation of the visual and auditory pathways in both humans and monkeys (Figure 1D, E). In visual cortex, the response evoked by the pair arose earlier when the first stimulus of the pair was visual than when it was auditory, and the converse held for the auditory cortex.

We next investigated whether the subjects had learned the associations, whether the brain responses showed signatures of generalization to the reversed direction, and which brain areas were involved. If participants had learned the associations, incongruent trials should evoke a surprise response relative to congruent trials when presented in the same order as the training pairs (canonical trials). Crucially, if they spontaneously reversed the associations, a similar incongruity effect should also be seen on reversed trials. According to the reversibility hypothesis, humans should show a spontaneous reversal, while monkeys should not. Only in monkeys, therefore, should we find an interaction between congruity and canonicity, indicating a significant difference between the congruity effect in the learned direction and the congruity effect in the reversed direction.
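To make the logic of these contrasts concrete, the sketch below (in Python, with hypothetical condition names and made-up beta values, not the actual analysis code) shows how the congruity effect in each direction and the congruity-by-canonicity interaction are derived from the four trial types.

```python
# Minimal sketch of how the congruity and congruity-by-canonicity contrasts
# follow from the four trial types. Condition betas are assumed to come from
# a first-level GLM, one value per voxel (or ROI); all values are illustrative.
import numpy as np

betas = {
    "canonical_congruent":   np.array([0.9, 1.1, 1.0]),
    "canonical_incongruent": np.array([1.5, 1.8, 1.6]),
    "reversed_congruent":    np.array([0.8, 1.0, 0.9]),
    "reversed_incongruent":  np.array([1.4, 1.7, 1.5]),
}

# Congruity (surprise) effect in each direction: incongruent minus congruent.
congruity_canonical = betas["canonical_incongruent"] - betas["canonical_congruent"]
congruity_reversed  = betas["reversed_incongruent"]  - betas["reversed_congruent"]

# Interaction contrast: is the surprise response weaker in the reversed
# direction than in the learned (canonical) direction?
interaction = congruity_canonical - congruity_reversed

# Under the reversibility hypothesis, humans should show congruity effects of
# similar size in both directions (interaction ~ 0), whereas monkeys should
# show a congruity effect only in the canonical direction (interaction > 0).
print(congruity_canonical, congruity_reversed, interaction)
```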

Indeed, in humans, a vast network was activated by incongruity on both canonical and reversed trials (voxel p<0.001, cluster p<0.05 corrected, n=31 participants) (Figure 2A, Table 1). This network included a set of high-level brain regions previously described as the multiple-demand system (Duncan, 2010; Fedorenko et al., 2013), including bilateral IFG, MFG, AI, IPS, and dACC. It also included the language network (Pallier et al., 2011), with the left superior temporal sulcus (STS) and the left IFG. However, in our case the activation was bilateral, thereby supporting the model that the language network is part of a larger symbolic network (Dehaene et al., 2022). Furthermore, we also found activations in the precuneus, similar to the network that has been found for top-down attention to memorized visual stimuli (Sestieri et al., 2010), which also included bilateral STS and IPS. Notably, we did not find any congruity effects in visually activated regions (compare with Figure 1B), in contrast to a previous human fMRI study (Richter et al., 2018). Figure 2B shows the hemodynamic response within the different clusters and conditions. In all analyses, since canonical congruent trials were in the majority, sensitivity was higher in the canonical direction, and thus the size of the significant clusters was larger on canonical than on reversed trials. However, no significant cluster exhibited any interaction between congruity and canonicity, indicating that there was no statistical difference between the congruity effect in the habituated direction and in the reversed direction. Thus, the human brain fully and spontaneously reverses the auditory-visual associations that it learns.

Congruity effects in the auditory-visual task in humans (experiment 1).

A) Areas activated more by incongruent than by congruent trials on canonical trials (red), reversed trials (blue), and their overlap (green). Brain maps are thresholded at pvoxel < 0.001 & pcluster < 0.05 corrected for multiple comparisons across the brain volume. No interaction effect was observed between congruity and canonicity. B) Average FIR estimate of the deconvolved hemodynamic responses within significant clusters in the left hemisphere, separately for VA and AV trials.

Congruity effect in Experiment 1 in humans (n=31).

We next asked whether monkeys (n=2) also learned the associations and did so in both directions. The canonical congruity effect, indexing learning, was not significant when analyzing only the first imaging session after the 3 days of training. Thus, monkeys were further trained for two weeks (∼960 training trials per pair in total) and then tested during 4 consecutive days. The same training and testing pattern was used for 5 stimulus sets (Supplementary Figure 1). After this extended training, we found consistent effects in both monkeys, with clusters in early visual areas (V1, V2, V4) and in auditory association areas of the left temporo-parieto-occipital cortex (TPO) (AV and VA trials combined, voxel p<0.001, cluster p<0.05, n=2) (Figure 3, Table 2). Crucially, however, this effect was confined to the canonical direction, with no significant clusters in the reversed direction at the whole-brain level, in accordance with the reversibility hypothesis. We specifically tested the difference between the congruity effect in the learned and the reversed direction by calculating the interaction between congruity and canonicity, which showed an activation pattern similar to the canonical congruity effect and reached significance in areas V2 and V4. Figure 3C shows the corresponding hemodynamic signals, with an enhanced response to incongruent pairs in the canonical direction (continuous red curve) but not in the reversed direction (dashed red curve). The results thus indicated that monkey cortex could acquire audio-visual pairings, as also shown by prior visual-visual experiments (Meyer and Olson, 2011; Vergnieux and Vogels, 2020), but with two major differences from humans: the congruity effects did not involve a broad network of high-level cortical areas but remained restricted to early sensory areas, and the learned associations did not reverse.

Congruity effects in the auditory-visual task in monkeys (experiment 1).

A) Significant clusters from the incongruent-congruent canonical contrast. No significant clusters were found for the reversed direction. B) Significant clusters from the interaction between congruity and canonicity (pvoxel<0.001 & pcluster<0.05 for both maps). C,D) Average FIR estimate of the deconvolved MION responses within the clusters from the incongruent-congruent canonical contrast, averaged over VA and AV trials. All clusters in early visual areas were pooled to create panel C. Average of 2 animals.

Congruity effect in Experiment 1 in monkeys (n=2)

Experiment 2 | visual-visual stimulus pairs

The non-reversal in monkeys in the above audio-visual experiment could be due to a number of methodological choices. First, although the visual stimuli were optimized for monkeys, as 3 out of 5 stimulus sets were pictures of familiar toys, the auditory stimuli (pseudowords) might have been suboptimal for them (although the monkeys in our lab have extensive exposure to human speech). It might be argued that this choice made discrimination difficult (although the canonical congruity effect is evidence of discrimination). Indeed, the auditory cortex is relatively small in monkeys compared to humans (Woods et al., 2010), and there is evidence that auditory memory capacity is reduced in monkeys compared to humans (Scott and Mishkin, 2016). Second, the instructions differed: while we asked human subjects to fixate a dot at the center of the screen and to pay attention to the stimuli, monkeys were simply rewarded for fixation.

To address those concerns, we replicated the experiment with reward-dependent visual-visual associations in 3 macaque monkeys (Figure 4; Supplementary Figure 2A). First, we replaced the spoken auditory stimuli with abstract black-and-white shapes similar to the lexigrams used to train chimpanzees to communicate with humans (Matsuzawa, 1985) (Supplementary Figure 2B). Second, to enhance attention for the monkeys, we introduced a reward association paradigm that made the stimuli behaviorally relevant for them (Wikman et al., 2019). Within each presentation direction, one of the two pictures of objects was associated with a high reward, and one with a low reward (Supplementary Figure 2A). Monkeys were still rewarded for fixation, but object identity predicted the size of the reward during the delay period following the presentation of the stimuli (two objects predicted a high reward, and two predicted a low reward). To calculate congruity effects, the two pairs within each direction were always averaged, making the reward association an orthogonal element in the design.

Visual-visual label learning in humans and monkeys (experiment 2).

A) Experimental paradigm. Subjects were habituated to 4 different visual-visual pairs during three days. Two pairs were in the ‘object-then-label’ order and two pairs in the ‘label-then-object’ order. For the monkeys, one object in each direction was associated with a high reward while the other was associated with a low reward, making reward size orthogonal to congruity and canonicity (see Supplementary Figure 2 for details). B) Monkey fMRI results. Significant clusters (pvoxel<0.001 & cluster volume >50) from the incongruent-congruent canonical contrast (left) and the interaction between congruity and canonicity (right). C) Human fMRI results. Areas activated more by incongruent than by congruent trials on canonical trials (red), reversed trials (blue), and their overlap (green) (right) (pvoxel<0.005 & cluster volume >50). No red voxels are visible because all of them fall within the overlap (green). D) Human behavioral results. After learning, human adults rated the familiarity of different types of pairs (including a fifth category of novel, never-seen pairings). Each dot represents the mean response of one subject in each condition. Although the reversed congruent trials constituted only 10% of the trials, they were rated almost as familiar as the canonical congruent pairs.

Using this design, we obtained significant canonical congruity effects in monkeys on the first imaging day after the initial training (24 trials per pair), indicating that the animals had learned the associations (Figure 4B, Table 3). The effect was again found in visual areas (V1, V2 & V4), spreading also to the prefrontal cortex (45B, 46v), very similar to the visually activated areas (compare with Figure 1C). In addition, small clusters were found in area 6 and in the STS. Crucially, the congruity effect remained restricted to the learned direction, as no area showed a significant reversed congruity effect, again in accordance with the reversibility hypothesis. The interaction between congruity and canonicity indicated that there was a significant difference between the canonical and the reversed direction in a similar set of regions (V1, V2, areas 45A, 46v and 6). The greater involvement of frontal cortex in the congruity effect in this paradigm fits with previous reports on the impact of reward association on long-term memory for visual stimuli in macaque monkeys (Ghazizadeh et al., 2018). To investigate this further, we split high- versus low-rewarded pairs and found that the congruity effect was present only for high-reward conditions, with a significant interaction of congruity and reward in area 45 and the caudate nucleus (Supplementary Figure 3). Overall, these results indicate that, even when stimuli were optimized and made relevant for monkeys, leading to enhanced activations and an activation of prefrontal cortex in response to violations of expectations, the learned associations did not reverse in monkeys.

Congruity effect in Experiment 2 in monkeys (n=3)

We also ran this visual-visual paradigm in human participants (n=24), with the goal of clarifying the role of language in the reversibility process. Humans again gave evidence of reversing the associations, although the effects were weaker than with spoken words (Figure 4C, Table 4). At the standard threshold (voxel p<0.001, cluster p<0.05 corrected), the main effect of congruity was significant in a network very similar to that of experiment 1, including bilateral middle frontal gyrus (MFG), left intraparietal sulcus (IPS), bilateral anterior insula, and dorsal anterior cingulate cortex (dACC), with an additional focus in the left inferior temporal gyrus (Figure 4C, Table 4). The involvement of the language network was limited; in particular, a main effect of congruity was absent in the STS, in agreement with the shift to visual symbols. Still, bilateral middle frontal gyri, STS and the precuneus were again activated by the incongruent minus congruent contrast on reversed trials (voxel p<0.001, cluster p<0.05 corrected), thereby extending beyond the multiple-demand system (Duncan, 2010; Fedorenko et al., 2013). Sensory-activated regions were again absent, in contrast to a previous study of congruity effects in humans using associations between two visual objects (Richter et al., 2018). Crucially, again no interaction effect was found between congruity and canonicity, either at the classical threshold (p<0.001) or at a more lenient threshold (p<0.01). These results indicate that humans, unlike monkeys, can also encode pairs of visual stimuli in a symmetrical, reversible fashion, involving a network of high-level cortical areas.

Congruity effect in Experiment 2 in humans (n=23)

Further evidence was obtained from a behavioral test, performed after imaging, in which we collected familiarity ratings for each stimulus pair (see Methods, Figure 4). Although participants reported higher familiarity with congruent canonical pairs (which were presented on 70% of trials) than with congruent reversed pairs (which were presented on 10% of trials; t(20)=2.8, p=0.01), both were rated as much more familiar than their corresponding incongruent pairs (although these were also presented on 10% of trials) and than never-seen pairs (all t(20)>7, p<0.0001, two-tailed paired t-tests). This familiarity task thus confirms that humans spontaneously reverse associations and experience a memory illusion of having seen the reversed pairs.
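For illustration, the Python sketch below reproduces the form of these comparisons on simulated familiarity ratings; all numbers and variable names are invented, and only the structure of the paired, two-tailed tests matches the analysis described above.

```python
# Hedged sketch of the familiarity-rating comparisons, assuming one mean
# rating per subject and condition (all data simulated).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 21  # subjects retained for the behavioral analysis

ratings = {
    "canonical_congruent":   rng.normal(4.5, 0.4, n),
    "reversed_congruent":    rng.normal(4.1, 0.5, n),
    "canonical_incongruent": rng.normal(2.0, 0.6, n),
    "reversed_incongruent":  rng.normal(2.1, 0.6, n),
    "never_seen":            rng.normal(1.4, 0.4, n),
}

# Two-tailed paired t-tests, mirroring the comparisons reported in the text.
t_can_vs_rev = stats.ttest_rel(ratings["canonical_congruent"],
                               ratings["reversed_congruent"])
t_rev_vs_inc = stats.ttest_rel(ratings["reversed_congruent"],
                               ratings["reversed_incongruent"])
t_rev_vs_new = stats.ttest_rel(ratings["reversed_congruent"],
                               ratings["never_seen"])
print(t_can_vs_rev, t_rev_vs_inc, t_rev_vs_new)
```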

Joint analysis of audio-visual and visual-visual stimulus pairs

In order to better characterize the human reversible symbol-learning network and its dependence on modality, we reanalyzed both human experiments together (n=55) (Supplementary Figure 4). There was, unsurprisingly, a main effect of experiment, with greater activation in a bilateral auditory and linguistic network in the AV experiment, and in the occipital, occipito-temporal and occipito-parietal visual pathways in the VV experiment. A main effect of congruity was observed and was again significant in both directions, canonical and reversed, in bilateral regions: insula, MFG, precentral cortex, IPS, precuneus, ACC and STS. Crucially, there was still no region sensitive to the congruity x canonicity interaction, indicating that the learned associations were fully reversible. Finally, a single region, the left posterior STS, showed a significantly different congruity effect in the two experiments, being slightly larger in the AV than in the VV paradigm ([-60 -40 8], z=4.51; 183 vox, pcor=0.049), compatible with a specific role in the learning of new spoken lexical items. The results therefore suggest that a broad and bilateral network, encompassing language areas but extending beyond them into dorsal parietal and prefrontal cortices, responded to violations of reversible symbolic associations regardless of modality.

To probe more finely the role of language-related and language-unrelated areas, we turned to a sensitive subject-specific region-of-interest (ROI) analysis. We selected ROIs that are considered the main hubs of the language (Pallier et al., 2011), mathematics (Amalric and Dehaene, 2016) and reading networks. Within these ROIs, we used a separate localizer (Pinel et al., 2007) to recover the subject-specific coordinates of the 10% best voxels involved in amodal sentence processing (within language ROIs), in mental arithmetic (within mathematical ROIs), and in sentence reading relative to listening (within the visual word form area, VWFA). We added this last region as it is activated by written words, visual symbols par excellence. We then performed ANOVAs on the betas of the main experiment averaged over these voxels.
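The following Python sketch illustrates, on random stand-in data, the kind of subject-specific voxel selection described here: keep the 10% of ROI voxels with the strongest localizer response, then average the main-experiment betas over them. Function and array names are hypothetical and do not come from the authors' pipeline.

```python
# Illustrative sketch of a subject-specific "top 10% localizer voxels" ROI
# analysis; all arrays are random stand-ins.
import numpy as np

def roi_average(localizer_t, main_betas, roi_mask, top_fraction=0.10):
    """localizer_t: 1-D t-values for the localizer contrast (one per voxel).
    main_betas: (n_conditions, n_voxels) betas from the main experiment.
    roi_mask: boolean array selecting the voxels of one anatomical ROI."""
    roi_idx = np.flatnonzero(roi_mask)
    t_in_roi = localizer_t[roi_idx]
    n_keep = max(1, int(np.ceil(top_fraction * roi_idx.size)))
    best = roi_idx[np.argsort(t_in_roi)[-n_keep:]]   # top 10% of ROI voxels
    return main_betas[:, best].mean(axis=1)          # one value per condition

# Example with random data standing in for one ROI of 500 voxels.
rng = np.random.default_rng(1)
localizer_t = rng.normal(size=2000)
main_betas = rng.normal(size=(4, 2000))              # 4 conditions x voxels
roi_mask = np.zeros(2000, dtype=bool)
roi_mask[:500] = True
print(roi_average(localizer_t, main_betas, roi_mask))
```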

A main congruity effect was observed in all ROIs (Table 5). There was also a main effect of experiment in all language ROIs, the VWFA and right IT, due on the one hand to larger activations in the AV than in the VV experiment in frontal and superior temporal ROIs, and on the other hand to the converse trend in the VWFA and IT ROIs. A significant congruity x experiment interaction was seen only in the pSTS and IFG pars triangularis, because these ROIs showed a large congruity effect in the AV experiment but no effect in the VV experiment, thus further confirming that these areas contribute specifically to the acquisition of linguistic symbols, while all other areas were engaged regardless of modality. Importantly, in all these analyses, no significant canonicity x congruity or experiment x canonicity x congruity interaction was observed, confirming the whole-brain analyses (Supplementary Figure 4 and Table 5).

ROI analyses: F-values of ANOVAs performed on the averaged betas of the main task across the 10% best voxels selected in an independent localizer, in ROIs commonly activated in language and mathematical tasks. The language ROIs are presented as red areas on the sagittal (x=-50) and coronal (y=-58) brain slices and the mathematical ROIs as yellow areas. The left white area corresponds to the VWFA; n=52; df=50; pFDRcor: *** <0.001, ** <0.01, * <0.05, ° <0.1.

Finally, in experiment 2, in which participants rated the familiarity of the pairs, we computed a within-subject behavioral index of reversibility as the difference in familiarity rating between incongruent and congruent reversed pairs. Across subjects, this index was correlated with the fMRI congruity effect (difference between incongruent and congruent trials in the ROI) on canonical trials (r=0.49, p=0.028) and especially on reversed trials (r=0.64, p=0.002) in the left dorsal part of area 44. In the right cerebellum, a similar correlation was observed but only for the reversed trials (r=0.57, p=0.008). No significant correlation was observed in other ROIs.
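As an illustration of this brain-behavior analysis, the sketch below computes a Pearson correlation between a simulated behavioral reversibility index and a simulated ROI congruity effect; the data are invented and only the structure of the computation matches the analysis described above.

```python
# Sketch of the across-subject brain-behavior correlation (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 20

# Behavioral reversibility index: familiarity(incongruent reversed)
# minus familiarity(congruent reversed), one value per subject.
behavioral_index = rng.normal(-1.5, 0.5, n)

# fMRI congruity effect (incongruent minus congruent betas) in one ROI,
# here standing in for left dorsal area 44 on reversed trials.
fmri_congruity_reversed = 0.3 * behavioral_index + rng.normal(0, 0.3, n)

r, p = stats.pearsonr(behavioral_index, fmri_congruity_reversed)
print(f"r = {r:.2f}, p = {p:.3f}")
```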

Discussion

Using fMRI in human and non-human primates, we studied the learning of a sequential association between either a spoken label and an object (Exp. 1) or a visual label and an object (Exp. 2). In humans, we observed no difference in brain activation between the learned and the temporally reversed associations: in both directions, violations of the learned association activated a large set of bilateral regions (insula, prefrontal, intraparietal and cingulate cortices) that extended beyond the language-processing network. Thus, humans generalized the learned pairings across a reversal of temporal order (Figure 5). In contrast, non-human primates showed evidence of remembering the pairs only in the learned direction and did not show any signature of spontaneous reversal. Monkey responses to incongruent pairings were entirely confined to the learned canonical order and occurred primarily within sensory areas, with propagation to frontal cortex only for rewarded stimuli, yet still only in the forward direction (Figure 5).

Summary of the two experiments in humans and monkeys.

(In experiment 1, pvoxel < 0.001 & pcluster < 0.05 for humans and monkeys. In experiment 2, pvoxel < 0.005 & cluster volume > 50 in humans and pvoxel < 0.001 & cluster volume > 50 in monkeys.)

Several studies previously found behavioral evidence for a uniquely human ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was therefore proposed as a defining feature of symbolic reference (Deacon, 1998; Kabdebon and Dehaene-Lambertz, 2019; Nieder, 2009). Here, we went one step further by testing this hypothesis at the brain level. Indeed, a limitation of previous behavioral studies is that animals could have understood the reversibility of a symbolic relationship, but failed to express it behaviorally because of extraneous procedural or attentional factors, or because of a conflict between different brain processes (e.g., maintaining the specific and rewarded learned pairing vs. generalizing to the reverse order). Here, we used fMRI and a passive paradigm to directly probe whether any area of the monkey brain would exhibit surprise at a violation of the reversal of a learned association. Our results show that this is not the case.

Interpretation must remain cautious, as there are also some occasional behavioral reports of spontaneous reversal of learned associations, for instance in one well-trained California sea lion and a beluga whale (Kastak et al., 2001; Murayama et al., 2017; Schusterman and Kastak, 1998), and possibly in 1 out of 20 baboons in Medam et al. (2016). These studies may indicate that, with sufficient training, symbolic representation might eventually emerge in some animals, as also suggested by the small reversal trend in a recent behavioral study in baboons (Chartier and Fagot, 2022). However, they may also merely show that animals may begin to spontaneously reverse new associations once they have received extensive training with bidirectional ones (Kojima, 1984). The bulk of the literature strongly suggests that while animals easily learn indexical associations, especially monkeys and chimpanzees (Diester and Nieder, 2007; Livingstone et al., 2010; Matsuzawa, 1985; Premack, 1971), but also dogs (Fugazza et al., 2021; Kaminski et al., 2004), vocal birds (e.g., Pepperberg, 2009) and even bees (Howard et al., 2019), they exhibit little or no evidence of genuine symbolic processing. Discriminating symbolic from indexical representations can be achieved by testing for spontaneous reversibility between the labels and the objects, as in the current study, or by testing for the presence of relationships among the labels themselves (Nieder, 2009).

One previous study showed preliminary evidence for a lack of reversibility in macaque monkey inferotemporal cortex (Meyer and Olson, 2011), but it recorded from only a small subset of neurons, and after extensive training on pairs of visual images (816 exposures per pair). Interestingly, a similar set of arbitrary stimuli and an extensive training protocol (258 trials per pair) were used in an fMRI study of stimulus association in humans, where congruity effects were also found to be restricted to early visual areas (Richter et al., 2018). Extensive training may thus have led to a more low-level and rigid encoding in the trained direction. It is therefore instructive that, here, we found irreversibility after very short training. Indeed, in experiment 2, just 24 exposures per pair were sufficient to observe a surprise effect in the canonical direction without generalization to the reverse direction, even after longer exposure. In addition, we strove to make the objects concrete and recognizable to the monkeys (by using pictures of toys that were familiar to them, taken from various angles), while the labels were as abstract as possible, to promote a symbol-referent asymmetry in the pairs. We considered using macaque vocalizations for the sounds, but these already have a defined, often emotional, meaning that could have disrupted the experiments. Furthermore, the present animals had extensive experience with human speech. Finally, while the present lab setting could be judged artificial and not easily conducive to language acquisition, previous evidence indicates that human preverbal infants easily learn labels in such a setting (Mersad et al., 2021) and spontaneously reverse associations after only a short training period (Ekramnia and Dehaene-Lambertz, 2019; Kabdebon and Dehaene-Lambertz, 2019).

Non-human primates are often considered the animal model of choice for understanding the neural correlates of high-level cognitive functions in humans (Feng et al., 2020; Newsome and Stein-Aviles, 1999; Roelfsema and Treue, 2014). Accordingly, many studies have emphasized the similarity between human and non-human primates in terms of brain anatomy, physiology and behavior (Caspari et al., 2018; De Valois et al., 1974; Erb et al., 2019; Hackett et al., 2001; Harwerth and Smith, 1985; Dante Mantini et al., 2012; D. Mantini et al., 2012; Mantini et al., 2011; Margulies et al., 2016; Petrides et al., 2012; Uhrig et al., 2014; Warren, 1974; Wilson et al., 2017; Wise, 2008). At the same time, important differences between human and monkey brains have been reported as well. Direct fMRI comparisons have revealed specific functional differences (Denys et al., 2004a, 2004b; Mantini et al., 2013; Vanduffel et al., 2002). Particularly relevant here is that, in contrast to humans, monkeys show clear feature tuning in the prefrontal cortex, in line with the sensory activation we found in monkey PFC (Figure 1C) and the involvement of monkey PFC in the congruity effect in experiment 2 (Figure 4B). Many anatomical differences have also been reported between humans and monkeys using MRI as well as histological methods. In particular, the human brain is exceptionally large (Herculano-Houzel, 2012) and contains a number of structural differences compared to the brains of other primates (Chaplin et al., 2013; Leroy et al., 2015; Neubert et al., 2014; Palomero-Gallagher and Zilles, 2019; Rilling, 2014; Schenker et al., 2010; Takemura et al., 2017). Notably, while the human arcuate fasciculus provides a strong direct connection between inferior prefrontal and temporal areas involved in language processing, this bundle is reduced and does not extend as anteriorly and as ventrally in other primates, including chimpanzees (Balezeau et al., 2020; Eichert et al., 2020; Rilling et al., 2012, 2008; Thiebaut de Schotten et al., 2012). Also, the PFC is selectively enlarged in terms of tissue volume (Chaplin et al., 2013; Donahue et al., 2018; Hill et al., 2010; Smaers et al., 2017). While this may not translate into a selective increase in the number of PFC neurons (Gabi et al., 2016), dendritic arborizations and synaptic density are larger in human PFC (Elston, 2007; Hilgetag and Goulas, 2020; Shibata et al., 2021). These anatomical differences may underlie the fundamental differences in language learning abilities between these species, but this remains controversial (Hopkins et al., 2012; Iriki, 2006). Here, we show that reversibility of associations, a crucial element in the ability to attach symbols to objects and concepts, sharply differs between human and non-human primates and offers a more tractable way to investigate potential differences between species.

The areas that were specifically activated in humans when the reversed association was violated were not limited to the classical language network in the left hemisphere. They extended bilaterally to homologous areas of the right hemisphere, which are involved, for instance, in the acquisition of musical languages (Patel, 2010). They also extended dorsally to the middle frontal gyrus and intraparietal sulcus, which are involved in the acquisition of the language of numbers, geometry and higher mathematics (Amalric and Dehaene, 2016; Piazza, 2010; Wang et al., 2019). Finally, an ROI analysis showed that they also include the VWFA and its vicinity. The VWFA is known to be sensitive to letters, but also to other visual symbols such as a newly learned face-like script (Moore et al., 2014) or emblematic pictures of famous cities (e.g., the Eiffel tower for Paris; Song et al., 2012), and the nearby lateral inferotemporal cortex responds to Arabic numerals and other mathematical symbols (Amalric and Dehaene, 2016; Shum et al., 2013). Strikingly, these extended areas, shown in Figure 2, correspond to regions whose cortical expansion and connectivity patterns are maximally different in humans compared to other primates (Chaplin et al., 2013; Donahue et al., 2018; Hill et al., 2010; Smaers et al., 2017). They also fit with a previous fMRI comparison of humans and macaque monkeys, in which humans were shown to exhibit uniquely abstract and integrative representations of numerical and sequence patterns in these regions (Wang et al., 2015).

In all of these studies, the observed effects are bilateral, extensive, and go beyond the language network per se. Such an extended network does not fit with the hypothesis that a single localized system, such as natural language or a universal generative faculty, is the primary engine of all human-specific abstract symbolic abilities (Hauser and Watumull, 2017; Spelke, 2003). Rather, our results suggest that multiple parallel and partially dissociable human brain networks possess symbolic abilities and deploy them in different domains such as natural language, music and mathematics (Amalric and Dehaene, 2017; Chen et al., 2021; Dehaene et al., 2022; Fedorenko et al., 2011; Fedorenko and Varley, 2016).

The neurobiological mechanism that enables reversible symbol learning in humans remains to be discovered. Interestingly, most learning rules, such as spike-timing-dependent plasticity, are sensitive to temporal order and timing, a feature of fundamental importance for predictive coding. In contrast, as indicated by the behavioral results of experiment 2, humans seem to forget the temporal order in which pairs of stimuli are presented when they store them at a symbolic level. This has been interpreted as improper causal reasoning (Ogawa et al., 2010). Indeed, if A repeatedly precedes B, then perceiving A predicts the appearance of B; but if B is observed, concluding that A is likely to be present is a logical fallacy. Still, brain mechanisms for temporal reversal do exist. The most prominent candidate, in both humans and non-human animals, is hippocampal-dependent neuronal replay of sequences of events, which can occur in both forward and reverse temporal order (Foster, 2017; Liu et al., 2019). Sequence reversal may be important during learning, in order to trace back to a memorized event that led to a reward. In line with this, a retroactive gradient has been shown in human memory storage, whereby memory is strongest for stimuli that were presented close to the reward but preceding it (Braun et al., 2018). This memory trace may explain the slight facilitation observed in baboons when they learn reversed congruent pairs relative to reversed incongruent pairs (Chartier and Fagot, 2022). Although neuronal replay in both forward and reverse directions exists in non-human animals, this mechanism might have selectively expanded to symbol-related areas of the human brain, a clear hypothesis for future work.

Of course, even humans do not disregard temporal order for all associations between stimuli; for instance, they remember the letters of the alphabet in a fixed temporal order (Klahr et al., 1983). Thus, future work should also clarify which conditions promote reversible symbolic learning. Here, the pairs comprised one fixed and abstract element (either linguistic or graphical), which served as a label, paired with several different views of a concrete object. In human infants, the association of a label with the presentation of objects helps them construct the object category, as revealed by several experiments in which infants discriminate between categories (Ferry et al., 2013), or correctly process the number of objects (Xu et al., 2005), when the categories and objects are named, but not in the absence of a label. Interestingly, preverbal infants are flexible and accept pictures as labels for a rule (Kabdebon and Dehaene-Lambertz, 2019), as well as monkey vocalizations and tones as labels for an animal category (Ferguson and Waxman, 2016; Ferry et al., 2013), whereas older infants, who have been exposed to many social situations in which language is the primary symbolic medium for transferring information, expect symbolic labels to be in their native language (Perszyk and Waxman, 2019). Later, they recover this flexibility, suggesting that the transient limitation might be a contextual strategy reflecting the pivotal role of language in naming at this time of life.

While our results suggest a dramatic difference in the way human and non-human primates encode associations between sensory stimuli, several limitations of the present work should be kept in mind. First, for ethical and financial reasons, we tested only 4 monkeys, while we tested 55 humans in total. While it is common in primate physiological studies to report results from 2 animals, this makes it challenging to extrapolate the results to the species as a whole (Fries and Maris, 2022). To address this point, we combined the results from two different labs, collecting data from 2 animals in each lab. A second limitation is that we only tested macaque monkeys; non-human primates closer to humans, such as chimpanzees, might yield different conclusions, and chimpanzee Ai’s failure of reversibility (Kojima, 1984), although striking, may not be representative. Similarly, reversible symbolic learning should be evaluated in vocal learners such as songbirds and parrots, as some demonstrate sophisticated flexible label learning (see e.g., Pepperberg and Carey, 2012). Furthermore, in dogs, social interactions between the dog and the experimenter during learning facilitate associations (Fugazza et al., 2021), as is also the case in infants. Social cues were absent in our design, and whether they would favor a switch to a symbolic system might be interesting to explore. Finally, we only tested adult monkeys, yet there might be a critical period during which reversible symbolic representation might be possible with appropriate training procedures; indeed, juvenile macaques learn better and faster than adults to associate an arbitrary label with visual quantities (Srihasam et al., 2012). The present work provides a simple experimental paradigm that can easily be extended to all these cases, thus offering a unique opportunity to test whether humans are unique in their ability to acquire symbols.

Methods

Participants

We tested four adult rhesus macaques (male, 6-8 kg, 5-19 years of age). YS and JD participated in experiment 1, and JD, JC and DN in experiment 2. All procedures were conducted in accordance with the European convention for animal care (86-406) and the NIH’s guide for the care and use of laboratory animals. They were approved by the Institutional Ethical Committee (CETEA protocol #16-043) and by the ethical committee for animal research of the KU Leuven. Animal housing and handling were in accordance with the recommendations of the Weatherall report, allowing extensive locomotor behavior, social interactions, and foraging. All animals were group-housed (cage size at least 16-32 m3) with diverse cage enrichment (auditory and visual stimuli, toys, foraging devices, etc.).

We also tested 55 healthy human subjects with no known neurological or psychiatric pathology (Exp. 1, n=31; Exp. 2, n=24; in experiment 2, an additional 3 subjects were excluded because they showed no evidence of learning the canonical pairs). Human subjects gave written informed consent to participate in this study, which was approved by the French national Ethics Committee.

Stimuli

Five sets of images were used (Supplementary Figure 1). The first two sets were 3D renderings of objects differing in their visual properties and semantic categories. Because these might be considered more familiar to humans, the other three sets were photographs of monkey toys, to which the monkeys were exposed in their home cages for at least 2 weeks prior to the training blocks. These were mostly geometrical 3D objects with no evident and consistent name for naive human participants. The renderings and photographs were taken from 8 different viewpoints. These stimuli were used in both experiments and are referred to as “objects” hereafter.

A label was associated with each object in each set. For experiment 1, the labels were auditory French pseudo-words with large differences in the number and identity of their syllables within each set (e.g., “tøjɑ°”, “ɡliʃu”, ”byɲyɲy”, “kʁɛfila”). Note that the monkeys were exposed daily to French radio and television as well as to French-speaking animal caretakers. For experiment 2, the labels were abstract black-and-white shapes, difficult to name and similar to the lexigrams used to train chimpanzees to communicate with humans (Matsuzawa, 1985).

Experimental paradigm

Stimulus presentation

Each set to be learned comprised 4 pairs. Two pairs were presented in the label-object direction (L1-O1 & L3-O3), and two in the object-label direction (O2-L2 & O4-L4). Labels were speech sounds in experiment 1 and black-and-white shapes in experiment 2. In each trial, the first stimulus (label or object) was presented for 700ms, followed by an inter-stimulus interval of 100ms, then the second stimulus for 700ms. The pairs were separated by a variable inter-trial interval of 3-5 seconds. The visual stimuli were ∼8 degrees in diameter, centered on the screen, with an average luminance set equal to the background. On each trial, the orientation of the object was randomly chosen among the 8 possibilities. A cross was present at the center of the screen when no visual stimulus was displayed. Auditory stimuli were presented to both ears at 80dB.
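A minimal sketch of this trial timeline is given below (Python; the helper function and its defaults are illustrative, not actual stimulus-presentation code).

```python
# Illustrative sketch of one trial's event structure (times in seconds).
import numpy as np

def make_trial_events(first="label", second="object", rng=None):
    """Return (event list, total trial duration) for one pair presentation."""
    if rng is None:
        rng = np.random.default_rng()
    events = [
        (first,  0.0, 0.7),   # first stimulus: 700 ms
        ("isi",  0.7, 0.1),   # inter-stimulus interval: 100 ms
        (second, 0.8, 0.7),   # second stimulus: 700 ms
    ]
    iti = rng.uniform(3.0, 5.0)   # variable inter-trial interval: 3-5 s
    return events, 0.7 + 0.1 + 0.7 + iti

events, duration = make_trial_events("label", "object")
print(events)
print(round(duration, 2))
```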

Training

The experiment was designed to be tested also in 3-month-old human infants (Ekramnia et al., in preparation), which explains our choice of short training sessions spread over 3 consecutive days, given infants’ short attention span and the reported benefit of sleep for encoding word meanings after a learning session (Friedrich et al., 2017). Training therefore consisted of three short videos of 24 trials each, as described above (one video for each of the 3 training days). Two pairs (one in each direction) were introduced on the first day of training (e.g., L1-O1 and O2-L2). First, one pair was shown for 6 trials, then the other pair for 6 trials, then the two pairs were randomly interleaved for 6 trials each. On the second day of training, the two other pairs (L3-O3 and O4-L4) were presented using the same procedure as on day 1. On the third day, all pairs were randomly presented (6 presentations each). The object-label pairing was constant, but the direction of presentation (O-L or L-O) and whether a pair was introduced on the first or second day were counterbalanced across participants.
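The following sketch reconstructs this three-day training schedule in Python for illustration (pair labels are schematic, and the counterbalancing across participants is not implemented).

```python
# Schematic reconstruction of the 3-day training schedule (24 trials per day).
import random

pairs_day1 = ["L1-O1", "O2-L2"]      # introduced on day 1
pairs_day2 = ["L3-O3", "O4-L4"]      # introduced on day 2

def training_day(new_pairs, all_pairs=None):
    """Days 1-2: 6 trials of each new pair in blocks, then 6 randomly
    interleaved trials of each. Day 3: 6 randomized trials of every pair."""
    if all_pairs is None:                      # days 1 and 2
        block = new_pairs[0:1] * 6 + new_pairs[1:2] * 6
        mixed = new_pairs * 6
    else:                                      # day 3: all pairs, randomized
        block = []
        mixed = all_pairs * 6
    random.shuffle(mixed)
    return block + mixed

day1 = training_day(pairs_day1)
day2 = training_day(pairs_day2)
day3 = training_day([], all_pairs=pairs_day1 + pairs_day2)
print(len(day1), len(day2), len(day3))   # 24 trials each day
```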

Human protocol

In experiment 1, the participants came to the lab to watch the first video; on each of the next two consecutive days, they received a web link to the corresponding video. For experiment 2, all 3 videos were sent via web links. The participants were instructed simply to watch each movie (24 trials, ∼3 min long) attentively, once on its assigned day. The participants came for the fMRI session on the fourth day. Each participant saw only one set of object-label pairs, either stimulus set 2 or stimulus set 3, distributed equally across participants.

In experiment 2, we added a behavioral test at the end of the MRI session to check learning. Participants were shown, one by one, all 16 possible trial pairs (congruent and incongruent, in canonical and non-canonical order), plus 16 never-seen pairs. For each of them, they were asked to rate how frequently they had seen it (on a 5-level scale ranging from never to rarely, sometimes, often and always). The results were analyzed using an ANOVA over the five pair categories, which included the 2x2 canonicity x congruity design. A computer crash erased the responses of two participants and one subject did not participate, leaving 21 subjects for this analysis.
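For illustration, the sketch below runs a 2x2 canonicity x congruity repeated-measures ANOVA on simulated ratings of the four learned pair types, using statsmodels; the never-seen category and the exact model reported above are not reproduced, and all data and column names are invented.

```python
# Hedged sketch of a 2x2 repeated-measures ANOVA on simulated ratings.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(3)
rows = []
for subj in range(21):
    for canonicity in ("canonical", "reversed"):
        for congruity in ("congruent", "incongruent"):
            base = 4.3 if congruity == "congruent" else 2.0
            rows.append({"subject": subj,
                         "canonicity": canonicity,
                         "congruity": congruity,
                         "rating": base + rng.normal(0, 0.4)})
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="rating", subject="subject",
              within=["canonicity", "congruity"]).fit()
print(res)
```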

Monkey protocol

Monkeys were implanted with an MR-compatible headpost under general anesthesia. The animals were first habituated to remain calm in a chair inside a mock MRI setup, and trained to fixate a small dot (0.25 degrees) within a virtual window of 1.25-2 degrees diameter (Uhrig et al., 2014). Then, like the human participants, they received 1 training block per day for 3 consecutive days (24 trials per block) for each stimulus set. Rewards were given at regular intervals, asynchronously with the visual and auditory stimulus presentation. On the fourth day, they were scanned while being presented with the test blocks for the corresponding stimulus set. All monkeys were trained and tested with all of the stimulus sets.

For experiment 1, after the first imaging session on day 4, which did not show learning (no difference between congruent and incongruent pairs in the canonical direction), monkeys were further trained for an additional 2 weeks (∼80 blocks) and then scanned daily for 4 days. A new set of four object-label pairs was then presented with the same training and testing design. Thus, training and testing took 3 consecutive weeks for each of the five stimulus sets.

In experiment 2, a reward manipulation was introduced to promote the monkeys’ engagement in the task. The rate of reward that the monkeys received after successfully fixating throughout the pair presentation was either increased or decreased for a duration of 1450ms (starting 100ms after the offset of the second stimulus), depending on the identity of the visual object. The amount of each reward remained the same, but the time between consecutive rewards was set either twice as short (for high rewards) or twice as long (for low rewards). For each direction, one visual object was associated with a high reward while the other was associated with a low reward (see Supplementary Figure 2). By design, the two pairs that were averaged for each of the critical tested dimensions (direction, congruity and canonicity of the pair) had opposite reward sizes, making reward size an orthogonal design element. The first stimulus set was used for procedural training on this reward association paradigm for 2 weeks. Stimulus sets 2-5 were used for training as in experiment 1 (with 1 block per day for 3 consecutive days) and an fMRI test session on the fourth day.
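The sketch below illustrates this reward-rate manipulation in Python; the baseline inter-reward interval is a hypothetical placeholder, as only the 1450ms window, its 100ms onset delay, and the halving or doubling of the interval are specified above.

```python
# Schematic sketch of the reward manipulation: reward size is constant, but
# during a 1450 ms window after the pair, the interval between reward pulses
# is halved (high-reward objects) or doubled (low-reward objects).
def reward_times(reward_level, base_interval=0.4, window=1.45, start=0.1):
    """Return reward-pulse onsets (s, relative to second-stimulus offset).
    base_interval is an invented placeholder, not a value from the paper."""
    interval = base_interval / 2 if reward_level == "high" else base_interval * 2
    times, t = [], start
    while t < start + window:
        times.append(round(t, 3))
        t += interval
    return times

print("high:", reward_times("high"))
print("low: ", reward_times("low"))
```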

Test in MRI

The MRI session comprised 4 test blocks in humans and between 12 and 32 blocks in monkeys. In both humans and monkeys, each block started with 4 trials in the learned direction (congruent canonical trials), one for each of the 4 pairs (2 O-L and 2 L-O pairs). The rest of the block consisted of 40 trials, of which 70% were identical to the training; 10% were incongruent pairs presented in the learned direction (O-L or L-O; incongruent canonical trials), testing whether the association had been learned; 10% were congruent pairs whose within-pair direction was reversed relative to the learned pairs (congruent reversed trials); and 10% were incongruent pairs in the reversed direction (incongruent reversed trials). Since the percentages of congruent and incongruent pairs were the same in the reversed direction, any difference there can only be due to generalization from the canonical direction. For incongruent trials, the incongruent stimulus always came from the pair presented in the same direction (see figure 1), so that a mere change of position within the pair (1st or 2nd stimulus) could not by itself induce the perception of an incongruity.
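
For concreteness, one test block could be assembled as in the following sketch; the pair labels and the shuffling scheme are illustrative, not the actual trial lists generated by the experiment software:

import random

PAIRS = ["O1-L1", "O2-L2", "L3-O3", "L4-O4"]  # illustrative labels: 2 O-L and 2 L-O pairs

def build_test_block(rng):
    # 4 warm-up trials in the learned direction, one per pair.
    warmup = [("congruent", "canonical", pair) for pair in PAIRS]
    # 40 further trials: 70% congruent canonical, 10% incongruent canonical,
    # 10% congruent reversed, 10% incongruent reversed.
    conditions = ([("congruent", "canonical")] * 28
                  + [("incongruent", "canonical")] * 4
                  + [("congruent", "reversed")] * 4
                  + [("incongruent", "reversed")] * 4)
    rng.shuffle(conditions)
    return warmup + [(congruity, direction, rng.choice(PAIRS))
                     for congruity, direction in conditions]

block = build_test_block(random.Random(0))
assert len(block) == 44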

Human participants were only instructed to keep their eyes on the fixation point and to pay attention to the stimuli. The monkeys were rewarded for keeping their eyes on the fixation point, as during training. In experiment 1, the reward was constant, whereas in experiment 2, monkeys received the differential reward implemented during training.

Data acquisition

For experiment 1, both humans and monkeys were scanned on the 3T Siemens Prisma at NeuroSpin using a T2*-weighted gradient echo-planar imaging (EPI) sequence, with a 64-channel head coil for humans and a customized eight-channel phased-array surface coil (KU Leuven, Belgium) for monkeys. The imaging parameters were as follows: in humans, resolution: 1.75 mm isotropic, TR: 1.81 s, TE: 30.4 ms, PF: 7/8, multi-band 3, slices: 69; in monkeys, resolution: 1.5 mm isotropic, TR: 1.08 s, TE: 13.8 ms, PF: 6/8, iPAT 2, slices: 34.

Monkeys were trained to sit in a sphinx position in a primate chair with their head fixed. The MION (monocrystalline iron oxide nanoparticle; Molday ION, BioPAL, Worcester, MA) contrast agent (10 mg/kg, i.v.) was injected before scanning (Vanduffel et al., 2001). Eye movements were monitored and recorded with an eye-tracking system (EyeLink 1000, SR Research, Ottawa, Canada). In total, we recorded 583 valid runs, 278 for YS and 305 for JD.

For experiment 2, the settings remained the same for the humans and for one of the monkeys (JD). Two new monkeys (JC and DN) were included at the Laboratory of Neuro- and Psychophysiology of KU Leuven and scanned with a 3T Siemens Prisma using a T2*-weighted gradient echo-planar imaging (EPI) sequence. For JC, an external 8-channel coil was used with the following imaging parameters: resolution: 1.25 mm isotropic, TR: 0.9 s, TE: 15 ms, PF: 6/8, iPAT 3, multi-band 2, slices: 52. For DN, an implanted 8-channel coil was used with the following imaging parameters: resolution: 1.25 mm isotropic, TR: 0.9 s, TE: 15 ms, PF: 6/8, iPAT 3, multi-band 2, slices: 40. Monkeys were trained to sit in a sphinx position in a primate chair with their head fixed, and MION was again injected before scanning (11 mg/kg, i.v.). Eye movements were monitored and recorded with an eye-tracking system (ETL-200, ISCAN Inc., Woburn, MA, USA). The animals were also required to keep their hands in a box in front of the chair (verified with optical sensors), which limited body motion. In total, we recorded 279 valid runs, 81 for JD, 106 for JC and 92 for DN.

Preprocessing of monkey fMRI data

Functional images were reoriented, realigned, resampled (1.00 mm isotropic) and coregistered to the anatomical template of the monkey Montreal Neurological Institute (Montreal, Canada) space using Pypreclin, a custom-made Python pipeline (Tasserie et al., 2020).

Eye data were inspected for quality for each run. Only runs with more than 85% fixation (virtual window of 2-2.5 degrees diameter) were included in further analyses (n=16 runs excluded in experiment 1 and n=14 in experiment 2). Moreover, a trial was excluded if the eyes were closed for more than 650 ms (out of 700 ms) while an image was present on the screen. In experiment 1, the 5% of runs with the strongest motion across monkeys were also excluded (n=30) because significant residual motion remained. In total, 395 runs remained for analysis in experiment 1, 184 for YS and 211 for JD. For experiment 2, 268 runs remained, 77 for JD, 107 for JC and 84 for DN.
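
These run- and trial-exclusion criteria can be summarized in code; the sketch below is illustrative only, and the variable names and data layout are hypothetical:

import numpy as np

def run_is_valid(in_window):
    # in_window: boolean array, one sample per eye-tracker frame (True = gaze within
    # the 2-2.5 degree virtual window). Keep the run only if fixation exceeds 85%.
    return np.mean(in_window) > 0.85

def trial_is_valid(eyes_closed_ms):
    # eyes_closed_ms: time (ms) with eyes closed for each image of the trial,
    # out of the 700 ms presentation. Exclude the trial if any image exceeds 650 ms.
    return np.all(np.asarray(eyes_closed_ms) <= 650)

def high_motion_runs(motion_per_run):
    # Experiment 1 only: flag the 5% of runs with the strongest motion.
    cutoff = np.percentile(motion_per_run, 95)
    return np.asarray(motion_per_run) > cutoff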

Preprocessing of human fMRI data

SPM12 (http://www.fil.ion.ucl.ac.uk/spm) was used for the preprocessing of the human data as well as for the first- and second-level models. Preprocessing consisted of a standard pipeline, including slice-timing correction, realignment, top-up distortion correction, segmentation, normalization to standard MNI space and smoothing with a 4-mm isotropic Gaussian kernel.
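
For reference, the same steps could be scripted through nipype's SPM interfaces, as in the minimal sketch below; the actual analysis was run in SPM12 directly, the slice order and timing parameters are illustrative, and the top-up step (which is not part of SPM) is omitted:

from nipype.interfaces import spm

# Slice-timing correction (parameters illustrative; 69 slices, TR = 1.81 s).
slice_timing = spm.SliceTiming(num_slices=69, time_repetition=1.81,
                               time_acquisition=1.81 - 1.81 / 69,
                               slice_order=list(range(1, 70)), ref_slice=1)
realign = spm.Realign(register_to_mean=True)      # motion correction
segment = spm.NewSegment()                        # segmentation
normalize = spm.Normalize12(jobtype="estwrite")   # normalization to MNI space
smooth = spm.Smooth(fwhm=[4, 4, 4])               # 4-mm isotropic Gaussian kernel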

First and second-level analyses

After preprocessing, active brain regions were identified by performing voxel-wise GLM analyses implemented in SPM12 in both monkeys and humans. For the first experiment, the first-level SPM model included twelve predictors: (1-4) the onsets of the first stimulus of the pair (4 regressors consisting of the combinations of the audio/visual and canonical/non-canonical factors), and (5-12) the onsets of the second stimulus (8 regressors consisting of the combinations of the audio/visual, canonical/non-canonical and congruent/incongruent factors). These twelve events were modeled as delta functions convolved with the canonical hemodynamic response function (or the MION response function in monkeys). The head-motion parameters derived from realignment were also included in the model as covariates of no interest. Contrast images were computed for the effect of congruity (incongruent - congruent canonical and incongruent - congruent non-canonical) as well as for the congruity × canonicity interaction. For the second experiment, the analysis was the same, except that, because the two elements of the pair were in the same visual modality, a single predictor was used per stimulus pair, giving 4 predictors for the onsets of the second stimulus of the pair, with congruent/incongruent and canonical/non-canonical as the two factors. For the monkeys, an additional factor coded whether the pair was associated with a high or a low reward, giving 8 predictors. In this case, the temporal derivative of the hemodynamic response function was also added to the model. Before entering the second-level analysis, the data were smoothed again, using a 5-mm kernel in humans and a 2-mm kernel in monkeys.
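
An equivalent first-level model can be illustrated with nilearn; this is a sketch only: the reported analyses were run in SPM12, the file and condition names below are hypothetical, and the monkey analysis would additionally require a MION response function, which is not shown:

import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# events.csv: columns onset, duration (0 = delta function) and trial_type, e.g.
# "second_visual_canonical_incongruent" for the second-stimulus regressors.
events = pd.read_csv("events.csv")
motion = pd.read_csv("motion_params.csv")  # 6 realignment parameters, covariates of no interest

model = FirstLevelModel(t_r=1.81, hrf_model="spm", drift_model="cosine")
model = model.fit("run1_bold.nii.gz", events=events, confounds=motion)

# Congruity effect in the canonical direction (analogous contrasts are built for
# the non-canonical direction and the congruity x canonicity interaction).
zmap = model.compute_contrast(
    "second_visual_canonical_incongruent - second_visual_canonical_congruent")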

For the second-level group analysis, subjects were taken as the statistical unit in humans and runs in monkeys. One-sample t-tests were performed on the contrast images to test for the effect of condition. Results are reported at an uncorrected voxelwise threshold of p < 0.001 and a cluster-level threshold of p < 0.05 corrected for multiple comparisons (FDR).
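
A group-level one-sample t-test of this kind can be sketched with nilearn as follows; again, this is only an analogue of the SPM12 analysis, the list of contrast images is hypothetical, and the cluster-level FDR correction reported above is applied separately:

import pandas as pd
from nilearn.glm import threshold_stats_img
from nilearn.glm.second_level import SecondLevelModel

contrast_imgs = [f"sub-{i:02d}_congruity.nii.gz" for i in range(1, 31)]  # hypothetical files
design = pd.DataFrame({"intercept": [1] * len(contrast_imgs)})

group_model = SecondLevelModel(smoothing_fwhm=5.0).fit(contrast_imgs, design_matrix=design)
zmap = group_model.compute_contrast("intercept")

# Voxelwise threshold p < 0.001, uncorrected (cluster-level correction applied afterwards).
thresholded_map, threshold = threshold_stats_img(zmap, alpha=0.001, height_control="fpr")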

ROI analyses

In a separate localizer, human participants listened to and read short sentences. In some of the sentences, the participants were asked to perform easy mathematical operations (math sentences). Contrasting activations to math and non-math sentences allowed us to separate regions more involved in mathematical cognition than in general sentence comprehension, and vice versa. We selected seven left-hemispheric regions previously reported as showing language-related activation (Pallier et al., 2011), 6 bilateral ROIs showing mathematically-related activations (Amalric & Dehaene, 2016) and, finally, a 10-mm-radius sphere around the VWFA [-45 -57 -12]. In these ROIs, we identified each participant's 10% best voxels in the following comparisons: sentences vs rest for the 6 language ROIs plus reading vs listening for the VWFA, and numerical vs non-numerical sentences for the 8 mathematical ROIs. We extracted the betas of these voxels and performed ANOVAs with congruity and canonicity as within-subject factors and experiment as a between-subjects factor. Two participants in experiment 1 and one in experiment 2 had no localizer, leaving 52 participants (n=29 and n=23) for these analyses. P-values were FDR-corrected across all 15 ROIs in each comparison.
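
The voxel-selection step can be summarized as in the sketch below; the array names are illustrative, and in practice the betas were extracted from the SPM first-level models within the published ROIs:

import numpy as np

def roi_condition_means(localizer_t, task_betas, roi_mask, fraction=0.10):
    # localizer_t, roi_mask: 3D arrays (localizer t-values and binary ROI mask);
    # task_betas: 4D array (n_conditions, x, y, z) of first-level betas.
    roi_values = localizer_t[roi_mask > 0]
    cutoff = np.quantile(roi_values, 1.0 - fraction)        # top 10% of localizer voxels
    selected = (localizer_t >= cutoff) & (roi_mask > 0)
    return task_betas[:, selected].mean(axis=1)             # one mean beta per condition

# The resulting condition means feed the congruity x canonicity (within-subject)
# by experiment (between-subjects) ANOVAs described above.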

Task paradigm and stimulus sets for experiment 2.

A) Complete description of the task paradigm for visual-visual label learning. Subjects were habituated to 4 different visual-visual pairs over three days. Two pairs were in the ‘object-label’ order and two pairs in the ‘label-object’ order. During the test phase, the same canonical order was kept in 80% of the trials, including 10% of incongruent pairs. In reversed trials (20% of trials), the pairs were either congruent (10%) or incongruent (10%) with the learning. For the monkeys, one pair in each direction was associated with a high reward while the other was associated with a low reward, making reward size orthogonal to congruity and canonicity. B) Stimulus sets for experiment 2 in monkeys. Humans were tested with stimulus set 2.

Effect of reward for the visual-visual task in non-human primates.

A) Significant clusters from the incongruent - congruent canonical contrast in low-reward trials. B) Significant clusters from the incongruent - congruent canonical contrast in high-reward trials. C) Significant clusters from the interaction between congruity and reward. pvoxel < 0.001 & pcluster < 0.05 in all panels.

Analyses of all human participants in experiments 1 and 2 merged.

A) Main effect of experiment. B) Main effect of congruity. C) Effect of congruity in the canonical trials and D) in the reversed trials. E) No significant cluster was observed for the canonicity × congruity interaction. F) Slices in the 3 planes showing the only significant cluster in the experiment × congruity interaction. pvoxel < 0.001 & pcluster < 0.05 in all panels.