Multisensory discrimination task for rats

a Schematic illustration of the behavioral task. A randomly selected stimulus was triggered when a rat placed its nose in the central port. If the triggered stimulus was a 10 kHz pure tone (A10k), a vertical light bar (Vvt), or their combination (A10kVvt), the rat was rewarded at the left port. Conversely, if the stimulus was a 3 kHz pure tone (A3k), a horizontal light bar (Vhz), or their combination (A3kVhz), the rat was required to move to the right port. Visual stimuli were presented via custom-made LED matrix panels, one on each side. Auditory stimuli were delivered through a centrally located speaker. b Mean performance for each stimulus condition across rats. Circles connected by a gray line represent data from one individual rat. c Cumulative frequency distribution of reaction time (time from cue onset to leaving the central port) for one representative rat in auditory, visual, and multisensory trials. d Comparison of average reaction times across rats in auditory, visual, and multisensory trials. ***, p < 0.001. Error bars represent SDs.
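For readers reproducing the behavioral analysis, a minimal sketch (in Python, not the authors' code) of how the reaction-time measures in panels c-d could be computed; the variable names and input format are assumptions.

```python
import numpy as np

def reaction_times(cue_onsets, port_exit_times):
    """Per-trial reaction time: time from cue onset to leaving the central port."""
    return np.asarray(port_exit_times) - np.asarray(cue_onsets)

def cumulative_frequency(rts):
    """Sorted RTs and the cumulative fraction of trials at or below each RT (for a CDF plot)."""
    rts = np.sort(np.asarray(rts))
    return rts, np.arange(1, rts.size + 1) / rts.size
```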

Auditory, visual, and multisensory selectivity during task engagement

a Histological verification of recording locations within the primary auditory cortex (Au1). b Proportion of neurons responding to auditory-only (A), visual-only (V), both auditory and visual (A, V), and audiovisual-only (VA) stimuli. c-d Rasters (top) and peristimulus time histograms (PSTHs, bottom) show responses of two exemplar neurons in A3k (light blue), A10k (dark blue), Vhz (light green), Vvt (dark green), A3kVhz (light orange), and A10kVvt (dark orange) trials (correct trials only). Mean spike counts in PSTHs were computed in 10-ms time windows and then smoothed with a Gaussian kernel (σ = 100 ms). Black bars indicate stimulus onset and duration. e Mean normalized response PSTHs across neurons in auditory (top), visual (middle), and multisensory (bottom) trials (correct trials only) for the multisensory discrimination (left) and free-choice (right) groups. Shaded areas represent mean ± SEM. Black bars indicate stimulus onset and duration. f Histograms of auditory (top), visual (middle), and multisensory (bottom) selectivity for the multisensory discrimination (left) and free-choice (right) groups. Filled bars indicate neurons for which the selectivity index was significantly different from 0 (permutation test, p < 0.05, bootstrap n = 5000). The dashed line represents zero. g Comparison of visual selectivity distributions between audiovisual (top) and visual (bottom) neurons. h Comparison of auditory (A), visual (V), and multisensory (AV) selectivity of 150 audiovisual neurons, ordered by auditory selectivity. i Average absolute auditory, visual, and multisensory selectivity across 150 audiovisual neurons. Error bars represent SDs. *, p < 0.05; ***, p < 0.001. j Population decoding accuracy in the multisensory discrimination group: SVM cross-validation accuracy for discriminating population responses between the two auditory (blue), two visual (green), and two multisensory (red) trial types. Each decoding value was calculated in a 100-ms temporal window moving in 10-ms steps. Shading represents mean ± SD from bootstrapping with 100 repeats. Two dashed lines indicate 90% decoding accuracy for the auditory and multisensory conditions. k Population decoding accuracy in the free-choice group.
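A minimal sketch of the sliding-window population decoding in panels j-k, assuming spike counts are stored as trials × neurons × 10-ms bins; the function name, data layout, and linear-kernel choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sliding_window_decoding(counts_a, counts_b, win_bins=10, step_bins=1,
                            n_boot=100, rng=None):
    """counts_a, counts_b: (trials, neurons, 10-ms bins) spike counts for the two
    trial types; returns mean and SD of cross-validated decoding accuracy per window."""
    rng = np.random.default_rng(rng)
    n_bins = counts_a.shape[2]
    mean_acc, sd_acc = [], []
    for start in range(0, n_bins - win_bins + 1, step_bins):
        # population spike counts within the current 100-ms window
        xa = counts_a[:, :, start:start + win_bins].sum(axis=2)
        xb = counts_b[:, :, start:start + win_bins].sum(axis=2)
        accs = []
        for _ in range(n_boot):  # bootstrap over trials to estimate variability
            ia = rng.integers(0, len(xa), len(xa))
            ib = rng.integers(0, len(xb), len(xb))
            X = np.vstack([xa[ia], xb[ib]])
            y = np.r_[np.zeros(len(ia)), np.ones(len(ib))]
            accs.append(cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean())
        mean_acc.append(np.mean(accs))
        sd_acc.append(np.std(accs))
    return np.array(mean_acc), np.array(sd_acc)
```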

Auditory and visual integration in the multisensory discrimination task

a-c Rasters and PSTHs showing responses of three typical neurons recorded in well-trained rats performing the multisensory discrimination task. d Population-averaged multisensory responses (red) compared to the corresponding stronger unisensory responses (dark blue) in A3k-Vhz (top) and A10k-Vvt (bottom) pairings for the multisensory discrimination (left) and free-choice (right) groups. e Comparison of modality selectivity (multisensory vs. stronger unisensory) between the A3k-Vhz (x-axis) and A10k-Vvt (y-axis) conditions. Each point represents values for a single neuron. Open circles: modality selectivity was not significant in either condition (p > 0.05, permutation test); triangles: significant selectivity in either the A3k-Vhz (blue) or the A10k-Vvt (green) condition; diamonds: significant selectivity in both conditions. Dashed lines show zero modality selectivity. Points labeled a-c correspond to the neurons in panels a-c. f Mean modality selectivity for the A10k-Vvt and A3k-Vhz pairings across audiovisual neurons in the multisensory discrimination group. g Comparison of modality selectivity for the free-choice group. h Mean modality selectivity across audiovisual neurons for the free-choice group. i SVM decoding accuracy in AC neurons between responses in multisensory vs. corresponding unisensory trials. Black line indicates the shuffled control. j Positive relationship between the change in modality selectivity (A10k-Vvt minus A3k-Vhz) and the selectivity change (multisensory selectivity minus auditory selectivity). k Probability density functions of predicted mean multisensory responses (predicted AV) based on 0.83 times the sum of the auditory (A) and visual (V) responses (same neuron as in Fig. 2c). The observed multisensory response matches the predicted mean (Z-score = -0.17). l Frequency distributions of Z-scores. Open bars indicate no significant difference between the observed and predicted multisensory responses. Red bars: Z-score ≥ 1.96; blue bars: Z-score ≤ -1.96.
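The legend does not give the exact formula for the selectivity indices; the sketch below assumes a standard contrast index between mean multisensory and stronger-unisensory spike counts, tested against zero with a trial-label permutation test (as in panels e-h). The index formula, function names, and permutation scheme are assumptions for illustration only.

```python
import numpy as np

def selectivity_index(resp_multi, resp_uni):
    """Contrast index between mean multisensory and unisensory spike counts."""
    m, u = np.mean(resp_multi), np.mean(resp_uni)
    return (m - u) / (m + u) if (m + u) != 0 else 0.0

def permutation_test(resp_multi, resp_uni, n_perm=5000, rng=None):
    """Two-sided p-value for the observed index differing from 0,
    obtained by shuffling trial labels between the two conditions."""
    rng = np.random.default_rng(rng)
    observed = selectivity_index(resp_multi, resp_uni)
    pooled = np.concatenate([resp_multi, resp_uni])
    n = len(resp_multi)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)
        null[i] = selectivity_index(perm[:n], perm[n:])
    return observed, np.mean(np.abs(null) >= abs(observed))
```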

Impact of choice selection on audiovisual integration

a PSTHs show mean responses of an exemplar neuron in trials with different cues and choices. b Mean modality selectivity across neurons for correct (orange) and incorrect (blue) choices in the A10k-Vvt pairing. c Comparison of modality selectivity between correct and incorrect choices for the A10k-Vvt pairing. Error bars, SEM. d, e Similar comparisons of modality selectivity between correct and incorrect choices in the A3k-Vhz pairing.

Information-dependent integration and discrimination of audiovisual cues

a-c Responses of three example neurons to the six target stimuli and two unmatched multisensory cues (A3kVvt and A10kVhz) presented during task engagement. d Lower panel: modality selectivity for different auditory-visual pairings; neurons are sorted by their modality selectivity for the A10kVvt condition. Upper panel: mean modality selectivity across the population for each pairing. Error bar, SEM. ***, p < 0.001. e Population decoding accuracy for all stimulus pairings (8 × 8) within a 150-ms window after stimulus onset. White, red, and purple dashed squares denote the decoding accuracy for discriminating the two target multisensory cues (white), discriminating the two unmatched auditory-visual pairings (red), and discriminating matched versus unmatched audiovisual pairings (purple), respectively. Conventions are consistent with Fig. 2.
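A hedged sketch of the 8 × 8 pairwise population decoding in panel e: for every pair of the eight stimuli, an SVM is cross-validated on population spike counts from the 0-150 ms post-stimulus window and the accuracy is stored in a symmetric matrix. The data layout and function names are assumptions, not the authors' code.

```python
import numpy as np
from itertools import combinations
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pairwise_decoding_matrix(counts_by_stim):
    """counts_by_stim: list of 8 arrays, each (trials, neurons) of spike counts
    in the 0-150 ms window; returns an 8 x 8 matrix of decoding accuracies."""
    n = len(counts_by_stim)
    acc = np.full((n, n), 0.5)          # chance level on the diagonal
    for i, j in combinations(range(n), 2):
        X = np.vstack([counts_by_stim[i], counts_by_stim[j]])
        y = np.r_[np.zeros(len(counts_by_stim[i])), np.ones(len(counts_by_stim[j]))]
        score = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
        acc[i, j] = acc[j, i] = score
    return acc
```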

Multisensory integration and cue preferences of neurons recorded in the left AC

a The behavioral task and the recording site within the brain (left AC). b Proportion of neurons responsive to auditory-only (A), visual-only (V), both auditory and visual (A, V), and audiovisual-only (VA) stimuli, based on their spiking activity. c Mean PSTHs of normalized responses across neurons for different cue conditions. d Histogram of the auditory selectivity index. Filled bars represent neurons with a significant selectivity index (permutation test, p < 0.05, bootstrap n = 5000). e Same as d, but for the visual selectivity index. f Comparison of modality selectivity between the A3k-Vhz and A10k-Vvt pairings. g Boxplot comparing modality selectivity across audiovisual neurons. ***, p < 0.001. Conventions are consistent with Fig. 2 and Fig. 3.

Multisensory integration and cue preferences in right AC neurons after unisensory training

a Training stages for the rats: auditory discrimination training followed by visual discrimination training. b Proportion of neurons of different types. c Mean normalized responses across neurons for different cue conditions. d-e Histograms of visual (d) and auditory (e) selectivity. f Comparison of neuronal modality selectivity between the A3k-Vhz and A10k-Vvt pairings. g Modality selectivity comparison in audiovisual neurons. N.S., no significant difference. Conventions are consistent with Fig. 2, Fig. 3, and Fig. 6.

Characteristic frequency (CF) and response to target sound stimuli of AC neurons recorded in well-trained rats under anesthesia

a Frequency-intensity tonal receptive fields (TRFs) of three representative AC neurons. The red triangle denotes the CF. To construct each TRF, we measured responses to a series of pure tones (1-45 kHz in 1-kHz steps) at different intensities (20-60 dB in 10-dB increments). The color bar indicates the range of spike counts, from minimum to maximum, observed within the TRFs. b CF distributions of neurons recorded from two representative rats. c Overall CF distribution across all recorded neurons. d Comparison of responses to the 3 kHz (A3k) and 10 kHz (A10k) pure tones. Each circle represents one neuron. e Comparison of mean response magnitude across the population for the A3k and A10k conditions.
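A sketch of how a TRF could be assembled and the CF read out, assuming the CF is the frequency that drives a supra-threshold response at the lowest effective intensity; the threshold criterion and data layout are assumptions rather than the authors' exact procedure.

```python
import numpy as np

# Stimulus grid used for the TRF (as described above)
freqs = np.arange(1, 46)             # pure-tone frequencies, 1-45 kHz in 1-kHz steps
intensities = np.arange(20, 61, 10)  # 20-60 dB in 10-dB increments

def characteristic_frequency(trf, threshold):
    """trf: (n_intensities, n_freqs) mean spike counts per tone;
    CF = frequency evoking a supra-threshold response at the lowest intensity."""
    for row, _inten in sorted(zip(trf, intensities), key=lambda t: t[1]):
        above = np.flatnonzero(row >= threshold)
        if above.size:
            # if several frequencies cross threshold, take the strongest response
            return freqs[above[np.argmax(row[above])]]
    return None  # no supra-threshold response at any intensity
```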

Cue and modality selectivity of AC neurons in well-trained rats under anesthesia

a Comparison of auditory and visual selectivity. Each point represents values for a single neuron. Open circles indicate neurons for which neither auditory nor visual selectivity was significant (p > 0.05, permutation test). Triangles indicate significant selectivity in either the auditory (blue) or the visual (green) condition, and diamonds indicate significant selectivity in both conditions. Dashed lines represent zero cue selectivity. b Mean auditory (filled) and visual (unfilled) selectivity. Error bars represent SEMs. c Modality selectivity in the A3k-Vhz and A10k-Vvt pairings. d Mean modality selectivity in the A3k-Vhz (unfilled) and A10k-Vvt (filled) pairings. Error bars represent SEMs. e, f Comparison of mean multisensory responses (red) with the mean of the corresponding stronger unisensory responses (dark blue) across the population for the A3k-Vhz (e) and A10k-Vvt (f) pairings. Conventions are consistent with Fig. 2 and Fig. 3.

The observed versus the predicted multisensory response

a Probability density functions of the predicted mean multisensory responses (predicted AV) based on summing the modality-specific visual (V) and auditory (A) responses (see Methods for details). Black arrow, the mean of the predicted distribution. The same neuron as in Fig. 2b is shown, and only responses in correct contralateral-choice trials are included. The predicted mean, the observed mean, and their difference expressed in SDs (Z-score) are shown. b Frequency distributions of Z-scores. Open bars represent Z-scores between -1.96 and 1.96, indicating that the observed multisensory response is not significantly different from the predicted multisensory response (additive integration). Red bars, Z-score ≥ 1.96 (superadditive integration); blue bars, Z-score ≤ -1.96 (subadditive integration). c Comparison between the mean predicted multisensory response and the observed multisensory response for all audiovisual neurons. Note that the predicted multisensory response is greater than the observed response in most cases. Red squares and blue diamonds represent neurons whose observed multisensory responses are significantly greater (red squares) or less (blue diamonds) than the predicted mean multisensory responses.
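A minimal sketch of the additivity test in panels a-b, assuming the predicted distribution is built by repeatedly summing randomly paired single-trial auditory and visual responses (with an optional scaling factor, e.g. the 0.83 used in Fig. 3k) and the observed multisensory mean is expressed as a Z-score of that distribution. The resampling scheme and parameter names are assumptions; see Methods for the exact procedure.

```python
import numpy as np

def additivity_z_score(resp_a, resp_v, resp_av, scale=1.0, n_boot=5000, rng=None):
    """resp_a, resp_v, resp_av: single-trial spike counts per modality.
    Returns the Z-score of the observed AV mean relative to the predicted
    (A + V) distribution, plus an additivity label."""
    rng = np.random.default_rng(rng)
    n = min(len(resp_a), len(resp_v))
    predicted_means = np.empty(n_boot)
    for i in range(n_boot):
        a = rng.choice(resp_a, n, replace=True)   # resample auditory trials
        v = rng.choice(resp_v, n, replace=True)   # resample visual trials
        predicted_means[i] = scale * np.mean(a + v)
    z = (np.mean(resp_av) - predicted_means.mean()) / predicted_means.std()
    if z >= 1.96:
        label = "superadditive"
    elif z <= -1.96:
        label = "subadditive"
    else:
        label = "additive"
    return z, label
```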