Experimental paradigm for auditory-visual label learning.

A) Subjects were exposed to 4 different visual-auditory pairs over 3 days (6 repetitions of each pair, 3-minute video). Two pairs were always presented in the ‘visual-then-auditory’ order (object to label), and 2 in the ‘auditory-then-visual’ (label to object) order. During the test phase, this canonical order was kept on 80% of trials, including 10% of incongruent pairs to test memory of the learned pairs, and was reversed on 20% of the trials. On reversed trials, half the pairs were congruent and half were incongruent (each 10% of total trials), thus testing reversibility of the pairings without affording additional learning. B,C) Activation in sensory cortices. Although each trial comprises auditory and visual stimuli, these could be separated thanks to their temporal offsets. Images show significantly activated regions in the contrasts image > sound (red-yellow) and sound > image (blue-light blue), averaged across all subjects and runs for humans (B) and monkeys (C). D,E) Average finite-impulse-response (FIR) estimate of the deconvolved hemodynamic responses for humans (D) and monkeys (E) within the clusters shown in B and C respectively, separately for visual-audio (VA) and audio-visual (AV) trials. The sign of the y-axis is flipped for monkey responses.

Stimulus sets for experiment 1.

Congruity effects in the auditory-visual task in humans (experiment 1).

A) Areas activated by incongruent trials more than by congruent trials in canonical trials (red), reversed trials (blue), and their overlap (green). Brain maps are thresholded at pvoxel < 0.001 & pcluster < 0.05, corrected for multiple comparisons across the brain volume. No interaction effect was observed between congruity and canonicity. B) Average FIR estimate of the deconvolved hemodynamic responses within significant clusters in the left hemisphere, separately for VA and AV trials. 31 human subjects were tested, with a single imaging session per subject after 3 days of exposure to canonical trials.

Congruity effect in Experiment 1 in 31 human subjects, with 1 imaging session per subject after 3 days of exposure to congruent canonical pairs. MNI coordinates and t-values of the different contrasts at the peak voxel of each significant cluster (main effect of congruity, congruity effect for canonical trials, and congruity effect for reversed trials).

Congruity effects in the auditory-visual task in monkeys (experiment 1).

A) Significant clusters from the incongruent-congruent canonical contrast. No significant clusters were found for the reversed direction. B) Significant clusters from the interaction between congruity and canonicity (pvoxel<0.001 & pcluster<0.05 for both maps). C,D) Average FIR estimate of the deconvolved MION responses within the clusters from the incongruent-congruent canonical contrast, averaged over VA and AV trials. All clusters in early visual areas were pooled to create panel C. The two monkeys were scanned after two additional weeks of exposure (4 imaging sessions per subject per stimulus set; 3 stimulus sets were used).

Congruity effect in Experiment 1 in two monkeys after two additional weeks of exposure to congruent canonical pairs. Per subject, 3 stimulus sets were used, with 4 imaging sessions per stimulus set. MNI coordinates and t-values of the different contrasts at the peak voxel of each significant cluster (congruity effect for canonical trials, congruity effect for reversed trials, and interaction effect of congruity and canonicity).

Visual-visual label learning in humans and monkeys (experiment 2). A, Experimental paradigm.

Subjects were habituated to 4 different visual-visual pairs over 3 days. Two pairs were in the ‘object-then-label’ order and two pairs in the ‘label-then-object’ order. For the monkeys, 1 object in each direction was associated with a high reward while the other was associated with a low reward, making reward size orthogonal to congruity and canonicity (see Supplementary Figure 2 for details). B, Monkey fMRI results. Significant clusters (pvoxel<0.001 & cluster volume >50) from the incongruent-congruent canonical contrast (left) and the interaction between congruity and canonicity (right). One imaging session per subject per stimulus set was performed after 3 days of exposure to canonical trials in each of the 3 monkeys, with 5 stimulus sets per subject. C, Human fMRI results. Areas more activated by incongruent trials than by congruent trials in the canonical direction (red), in the reversed direction (blue), and their overlap (green) (right) (pvoxel<0.005 & cluster volume >50). No red voxels are visible because all of them fall within the overlap (green). One imaging session was performed per subject in 23 participants after 3 days of exposure to a short block of 24 canonical trials. D, Human behavioural results. After learning, human adults rated the familiarity of different types of pairs (including a fifth category of novel, never-seen pairings). Each dot represents the mean response of a subject in each condition. Although the reversed congruent trials constituted only 10% of the trials, they were rated almost as familiar as the canonical congruent pairs.

A) Complete description of the task paradigm for visual-visual label learning.

Subjects were habituated to 4 different visual-visual pairs over 3 days. Two pairs were in the ‘object-label’ order and two pairs in the ‘label-object’ order. During the test phase, the same canonical order was kept in 80% of the trials, including 10% of incongruent pairs. In reversed trials (20% of trials), the pairs were either congruent (10%) or incongruent (10%) with the learned pairings. For the monkeys, 1 pair in each direction was associated with a high reward while the other was associated with a low reward, making the reward size orthogonal to congruity and canonicity. B) Stimulus sets for experiment 2 in monkeys. Humans were tested with stimulus set 2.
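The test-phase proportions described in panel A can be illustrated with a short sketch. It assumes that every percentage refers to the total number of test trials, so that canonical congruent trials make up the remaining 70%. The function name `trial_counts` and the condition labels are hypothetical and are not taken from the study's materials.

```python
from fractions import Fraction

# Assumed reading of the caption: all percentages are fractions of the
# total number of test trials (canonical = 70% + 10% = 80%, reversed = 20%).
PROPORTIONS = {
    ("canonical", "congruent"): Fraction(70, 100),
    ("canonical", "incongruent"): Fraction(10, 100),
    ("reversed", "congruent"): Fraction(10, 100),
    ("reversed", "incongruent"): Fraction(10, 100),
}


def trial_counts(n_trials):
    """Return the number of trials per (order, congruity) condition.

    Raises ValueError when n_trials cannot be split exactly into the
    assumed proportions.
    """
    counts = {}
    for condition, proportion in PROPORTIONS.items():
        count = proportion * n_trials
        if count.denominator != 1:
            raise ValueError(f"{n_trials} trials cannot be split exactly")
        counts[condition] = int(count)
    return counts


if __name__ == "__main__":
    for condition, count in trial_counts(100).items():
        print(condition, count)
```

Under this reading, reversed congruent pairs are seen on only one trial in ten, which is why the test phase affords little additional learning of the reversed direction.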

Congruity effect in Experiment 2 in 3 monkeys after 3 days of exposure to congruent canonical pairs. Per subject, 5 stimulus sets were used, with 1 imaging session per stimulus set. MNI coordinates and t-values of the different contrasts at the peak voxel of each significant cluster (congruity effect for canonical trials, congruity effect for reversed trials, and interaction effect of congruity and canonicity).

Effect of reward for the visual-visual task in non-human primates.

A) Significant clusters from the incongruent-congruent canonical contrast in low reward trials. B) Significant clusters from the incongruent-congruent canonical contrast in high reward trials. C) Significant clusters from the interaction between congruity and reward. pvoxel<0.001 & pcluster <0.05 in all panels.

Congruity effect in Experiment 2 in 23 human subjects, with one imaging session per subject after 3 days of exposure to congruent canonical pairs. MNI coordinates and t-values of the different contrasts at the peak voxel of each significant cluster (main effect of congruity, congruity effect for canonical trials, and congruity effect for reversed trials).

Analyses of all human participants in experiments 1 and 2 merged.

A) Main effect of experiment. B) Main effect of congruity. C) Effect of congruity in the canonical trials and D) in the reversed trials. E) No significant cluster was observed for the interaction canonicity X congruity. F) Slices in the 3 planes showing the only significant cluster in the Experiment X Congruity interaction. pvoxel<0.001 & pcluster <0.05 in all panels.

ROI analyses of the language and mathematics localizer: F-values of ANOVAs performed on the averaged betas of the main task across different ROIs (main effects of congruity, canonicity, and experiment (1 or 2), and interaction effects of congruity and canonicity, and of congruity and experiment). These ROIs correspond to the 10% best voxels selected in each participant using a short, independent localizer, in regions commonly reported in the literature as activated in language and mathematical tasks. In this localizer, participants listened to and read short sentences of general content or requiring easy mental calculations. On the sagittal (x=-50mm) and coronal (y=-58mm) brain slices, the language and mathematical ROIs are presented as red and yellow areas respectively. The left-lateralized white area corresponds to the Visual Word Form Area (VWFA); n=52; df=50; pFDRcor: *** <0.001, ** < 0.01, *< 0.05, ° < 0.1.
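The per-participant selection of the 10% best voxels can be sketched as follows. This is only an illustration under stated assumptions: "best" is taken to mean the highest localizer statistic within a predefined search region, and the names `top_fraction_mask` and `roi_mean_beta` are hypothetical rather than part of the authors' pipeline.

```python
import numpy as np


def top_fraction_mask(localizer_stat, region_mask, fraction=0.10):
    """Mark the top `fraction` of voxels inside a search region.

    Voxels are ranked by their localizer statistic (assumed: higher is
    better). At least one voxel is kept when the region is non-empty.
    """
    stat = np.asarray(localizer_stat, dtype=float)
    region = np.asarray(region_mask, dtype=bool)
    region_idx = np.flatnonzero(region.ravel())
    n_keep = max(1, int(round(fraction * region_idx.size)))
    # Sort region voxels by decreasing localizer statistic.
    order = np.argsort(stat.ravel()[region_idx])[::-1]
    keep = region_idx[order[:n_keep]]
    mask = np.zeros(stat.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(stat.shape)


def roi_mean_beta(beta_map, roi_mask):
    """Average the main-task betas over the selected voxels."""
    return float(np.asarray(beta_map, dtype=float)[roi_mask].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stat = rng.normal(size=(4, 5, 5))          # toy localizer map
    region = np.ones(stat.shape, dtype=bool)   # toy search region
    betas = rng.normal(size=stat.shape)        # toy main-task betas
    roi = top_fraction_mask(stat, region)
    print(int(roi.sum()), roi_mean_beta(betas, roi))
```

Because the voxels are chosen from the localizer and the betas come from the main task, the selection step stays independent of the effects being tested.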

Summary of the two experiments in humans and monkeys.

(In experiment 1, pvoxel < 0.001 & pcluster < 0.05 for humans and monkeys. In experiment 2, pvoxel<0.005 & cluster volume >50 in humans and pvoxel<0.001 & cluster volume >50 in monkeys.)