Multimodal mismatch responses in mouse auditory cortex

Magdalena Solyga; Georg B. Keller

doi:10.7554/eLife.95398.1

eLife assessment

This study provides important findings on the modulation of cortical neuronal responses to sensory stimuli by motor-driven predictive signals. The study is methodologically sound and well-designed. The data, as analysed, provide incomplete support for the conclusion that audiomotor mismatch responses are observed in the auditory cortex and that these are strongly modulated by cross-modal signals.

https://doi.org/10.7554/eLife.95398.1.sa3

Significance of findings

important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence

incomplete: Main claims are only partially supported

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Summary

Our movements result in predictable sensory feedback that is often multimodal. Based on deviations between predictions and actual sensory input, primary sensory areas of cortex have been shown to compute sensorimotor prediction errors. How prediction errors in one sensory modality influence the computation of prediction errors in another modality is still unclear. To investigate multimodal prediction errors in mouse auditory cortex (ACx), we used a virtual environment to experimentally couple running to both self-generated auditory and visual feedback. Using two-photon microscopy, we first characterized responses of layer 2/3 (L2/3) neurons to sounds, visual stimuli, and running onsets and found responses to all three. Probing responses evoked by audiomotor mismatches, we found that they closely resemble visuomotor mismatch responses in visual cortex (V1). Finally, testing for cross-modal influence on audiomotor mismatch responses by coupling both sound amplitude and visual flow speed to the speed of running, we found that audiomotor mismatch responses were amplified when paired with concurrent visuomotor mismatches. Our results demonstrate that multimodal and non-hierarchical interactions shape prediction error responses in cortical L2/3.

Introduction

Neuronal responses consistent with prediction errors have been described in a variety of different cortical areas (Audette and Schneider, 2023; Ayaz et al., 2019; Han and Helmchen, 2023; Heindorf et al., 2018; Keller et al., 2012; Liu and Kanold, 2022) and across different species (Eliades and Wang, 2008; Keller and Hahnloser, 2009; Stanley and Miall, 2007). In V1, these responses are thought to be learned with experience (Attinger et al., 2017) and depend on local plasticity in cortex (Widmer et al., 2022). Prediction errors signal the unanticipated appearance or absence of a sensory input, and are thought to be computed as a deviation between top-down predictions and bottom-up sensory inputs (Rao and Ballard, 1999). Motor-related signals are one type of top-down signal that conveys a prediction of sensory input (Leinweber et al., 2017). In auditory cortex, motor-related signals can modulate responses to self-generated vocalizations (Eliades and Wang, 2008) or to sounds coupled to locomotion (Schneider et al., 2018). Auditory cortex is thought to use these motor-related signals to compute audiomotor prediction error responses (Audette et al., 2022; Audette and Schneider, 2023; Eliades and Wang, 2008; Keller and Hahnloser, 2009; Liu and Kanold, 2022). These audiomotor prediction errors can be described in a hierarchical variant of predictive processing, in which top-down motor-related signals function as predictions of bottom-up sensory input. While parts of both auditory and visual processing streams are well described by a hierarchy, the cortical network as a whole does not map onto a hierarchical architecture in any straightforward way, either anatomically (Markov et al., 2013) or functionally (St-Yves et al., 2023; Suzuki et al., 2023).
One of the connections that does not neatly fit into a hierarchical model is the surprisingly dense reciprocal connection between ACx and V1 (Clavagnier et al., 2004; Falchier et al., 2002; Ibrahim et al., 2016; Leinweber et al., 2017; Zhao et al., 2022). From ACx to V1, this connection conveys a prediction of visual input given sound (Garner and Keller, 2022). What the reciprocal projection from V1 to ACx conveys is still unclear. In proposals for hierarchical implementations of predictive processing there are no such lateral connections, and there is no reason to assume that prediction error computations in different modalities should directly interact at the level of primary sensory areas. Thus, we reasoned that the lateral interaction between V1 and ACx is a good starting point to investigate how non-hierarchical interactions are involved in the computation of prediction errors, and how multimodal interactions shape sensorimotor prediction errors.

Based on this idea, we designed an experiment in which we could couple and transiently decouple running speed in a virtual environment to both self-generated auditory feedback and self-generated visual flow feedback. While doing this, we recorded activity in L2/3 neurons of ACx using two-photon calcium imaging. Using this approach, we first confirmed that a substantial subset of L2/3 neurons in ACx responds to either auditory, visual (Sharma et al., 2021), or motor-related inputs (Henschke et al., 2021; Morandell et al., 2023; Vivaldo et al., 2023). While we found that L2/3 neurons in ACx responded to audiomotor mismatches in a way that closely resembles visuomotor mismatch responses found in V1 (Keller et al., 2012), we found no evidence of responses to visuomotor mismatch in ACx. However, when coupling both visual flow and auditory feedback to running, we found that L2/3 neurons in ACx non-linearly combine information about visuomotor and audiomotor mismatches. Overall, our results demonstrate that prediction errors can be potentiated by multimodal interactions in primary sensory cortices.

Results

Auditory, visual, and motor-related signals were intermixed in L2/3 of ACx

To investigate auditory, visual, and motor-related signals in mouse ACx, we combined an audiovisual virtual reality system with two-photon calcium imaging in L2/3 ACx neurons (Figure 1A). We used an adeno-associated viral (AAV) vector to express a genetically encoded calcium indicator (AAV2/1-EF1α-GCaMP6f-WPRE) in ACx (Figures 1B-D). Following recovery from surgery, mice were habituated to the virtual reality setup (Figure 1C). We first mapped the location of the primary auditory cortex (A1) and the anterior auditory field (AAF) using widefield calcium imaging (Figure 1C and Figure 1E). Based on these maps, we then chose recording locations for two-photon imaging of L2/3 neurons in either A1 or AAF. For the purposes of this work, we did not distinguish between A1 and AAF and will refer to these two areas here as ACx. To characterize basic sensory and motor-related responses, we recorded neuronal responses to pure tones, full-field moving gratings, and running onsets. We first assessed population responses of L2/3 neurons evoked by sounds (pure tones presented at 4 kHz, 8 kHz, 16 kHz, or 32 kHz at either 60 dB or 75 dB sound pressure level (SPL); Figure 1F). While pure tones resulted in both increases and decreases in calcium activity in individual neurons (Figure 1G), the average population response exhibited a significant decrease in activity (Figure 1H). Next, we analyzed visual responses evoked by full-field drifting gratings (see Methods; Figure 1I). Visual stimulation resulted in a diverse response across the population of L2/3 neurons (Figure 1J) that was initially positive at the population level (Figure 1K). To quantify motor-related inputs, we analyzed activity during running onsets (Figure 1L). We found that the majority of neurons increased their activity during running onsets (Figure 1M), which was also reflected in a significant positive response on the population level (Figure 1N).
Finally, we investigated how running modulates auditory and visual responses in ACx. In V1, running strongly increases responses to visual stimuli (Niell and Stryker, 2010), while in auditory cortex running has been shown to modulate auditory responses in a variety of different ways (Audette et al., 2022; Bigelow et al., 2019; Henschke et al., 2021; McGinley et al., 2015; Morandell et al., 2023; Schneider et al., 2014; Vivaldo et al., 2023; Yavorska and Wehr, 2021; Zhou et al., 2014). Separating auditory responses by running state, we found that sound evoked responses of ACx neurons were overall similar during sitting and running, but exhibited a smaller decrease in activity when the mouse was sitting (Figure S1A). Visual responses also appeared overall similar but with a small increase in strength during running (Figure S1B), similar to the running modulation effect observed on visual responses in V1. Thus, running appears to moderately and differentially modulate auditory and visual responses in L2/3 ACx neurons. Consistent with previous work, these results demonstrate that auditory, visual, and motor-related signals are all present in ACx and that running modulation influences L2/3 ACx neurons differently than in V1.

Auditory, visual, and motor-related signals were present in L2/3 of ACx
**(A)** Schematic of the virtual reality system. For imaging experiments, mice were head-fixed and free to run on an air-supported spherical treadmill. For all recordings, the microscope was tilted 45 degrees to the left to image left ACx.
**(B)** Strategy for two-photon imaging of L2/3 ACx neurons. We injected an AAV vector to express a genetically encoded calcium indicator in ACx.
**(C)** Timeline of the experiment. Starting 10 days after viral injection and window-implantation surgery, mice were habituated to the virtual reality setup without any visual or auditory stimulation for 5 days. We mapped ACx with widefield imaging to be able to target two-photon (2P) imaging to ACx. In 1 to 6 recording sessions, 1 session per day, we recorded from 7637 neurons in 17 mice.
**(D)** Example two-photon image in L2/3 of ACx.
**(E)** Example widefield mapping of ACx. Response maps reflect regions with the strongest response for each tested sound frequency.
**(F)** The sound stimuli were 1 s long pure tones of 4 kHz, 8 kHz, 16 kHz, or 32 kHz played at 60 dB or 75 dB sound pressure level (SPL), presented with randomized inter-stimulus intervals.
**(G)** The average sound evoked response of all L2/3 ACx neurons across all tested frequencies and sound levels. Sound is presented from 0 s to 1 s. Red indicates an increase in activity, while blue indicates a decrease in activity. All responses are baseline subtracted. To avoid regression to the mean artifacts in plotting, the response heatmap is generated by splitting data in two halves by trials. The responses from the first half of trials are used to sort neurons by response strength and the average responses of the second half of trials are plotted for each neuron. To prevent graphical aliasing, the heatmaps are smoothed over 10 neurons for plotting.
**(H)** The average sound evoked population response of all ACx L2/3 neurons across all tested frequencies and sound levels (7637 neurons). Stimulus duration was 1 s (gray shading). Here and in subsequent panels, solid black lines represent mean and shading SEM. The horizontal bar above the plot marks time bins in which the response is statistically different from 0 (gray: not significant, black: p<0.05; see Methods).
**(I)** The visual stimuli we used were full-field drifting gratings of 8 different directions, presented for 4 s to 8 s with randomized inter-stimulus intervals.
**(J)** As in G, but for grating onset responses averaged across all orientations.
**(K)** As in H, but for the population response to grating onsets averaged across all orientations.
**(L)** Motor-related activity was assessed based on responses upon running onsets.
**(M)** As in G, but for running onset responses.
**(N)** As in H, but for the average population response to running onsets. Only data from running onsets in which the mouse ran for at least 1 s (gray shading) were included.
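
The split-half procedure described for the heatmap in panel G can be sketched as follows. This is a minimal illustration rather than the analysis code used for the figure: the function names, the nested-list data layout, and the boxcar form of the smoothing over neighboring neurons are all assumptions made for the example.

```python
def mean_trace(trials):
    """Average a list of equally long traces bin by bin."""
    n = len(trials)
    return [sum(trial[i] for trial in trials) / n for i in range(len(trials[0]))]


def split_half_sorted(trials_by_neuron, window):
    """Sort neurons on one half of the trials, return traces from the other half.

    trials_by_neuron: per neuron, a list of baseline-subtracted trial traces
                      (hypothetical layout, at least two trials per neuron).
    window: (start, stop) bin indices used to measure response strength.
    """
    start, stop = window
    rows = []
    for trials in trials_by_neuron:
        half = len(trials) // 2
        sort_trace = mean_trace(trials[:half])   # used only for sorting
        plot_trace = mean_trace(trials[half:])   # the only data that is plotted
        strength = sum(sort_trace[start:stop]) / (stop - start)
        rows.append((strength, plot_trace))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [trace for _, trace in rows]


def smooth_across_neurons(rows, width=10):
    """Boxcar average over neighboring rows (assumed form of the smoothing)."""
    smoothed = []
    for i in range(len(rows)):
        lo = max(0, i - width // 2)
        hi = min(len(rows), lo + width)
        smoothed.append(mean_trace(rows[lo:hi]))
    return smoothed
```

Because the trials used for sorting never enter the plotted averages, a neuron that ranks highly only through trial-to-trial noise will not show an inflated response in the heatmap.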

L2/3 neurons in ACx responded to audiomotor mismatch

To test whether auditory, visual, and motor-related signals are integrated in L2/3 neurons of ACx to compute prediction errors, we first probed for responses to audiomotor (AM) mismatches. A mismatch, in this context, is the absence of a sensory input that the brain predicts it will receive from the environment, and is thus a specific type of negative prediction error. We experimentally generated a coupling between movement and sensory feedback and then used movement as a proxy for what the mouse predicts to receive as sensory feedback. To do this with an auditory stimulus, we coupled the sound amplitude of an 8 kHz pure tone to the running speed of the mouse on the spherical treadmill such that sound amplitude was proportional to locomotion speed (Figures 2A and 2B). In this paradigm, a running speed of 0 corresponded to a sound amplitude of 0, while 30 cm/s running speed corresponded to a sound amplitude of 60 dB SPL. We refer to this type of session as closed loop. We then introduced AM mismatches by setting the sound amplitude to 0 for 1 s at random times (on average every 15 s). An alternative approach to introduce AM mismatches would have been to clamp the sound amplitude to a constant value. However, based on the analogy between sound amplitude and visual flow speed in visuomotor (VM) mismatch paradigms, where we induce VM mismatch by setting visual flow speed to 0 (Keller et al., 2012), we chose the former. We found that AM mismatch resulted in a strong population response (Figure 2C and Figure 2D). Interestingly, this response was already apparent in the first closed loop session with audiomotor coupling that the mice ever experienced, suggesting that this coupling is learned very rapidly (Figure S2A). To test whether AM mismatch responses can be explained by a sound offset response, we performed recordings in open loop sessions that consisted of a replay of the sound profile the mouse had self-generated in the preceding closed loop session.
Mice were free to run during this session and did so at similar levels as during the closed loop session (Figure S2B). The average response to sound playback halts during the open loop session was significantly weaker than the average response to AM mismatch (Figure 2D), but in contrast to visual playback halt responses in V1 (Vasilevskaya et al., 2023), we found no evidence of a running modulation of the response to the playback halt (Figure S2C). Thus, L2/3 neurons in ACx respond to AM mismatch in a way similar to how L2/3 neurons in V1 respond to VM mismatch.
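
The closed loop coupling and the mismatch events described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code that controlled the experiment. The text only specifies the two anchor points (0 cm/s maps to an amplitude of 0, 30 cm/s maps to 60 dB SPL), the 1 s mismatch duration, and the average interval of 15 s. The linear mapping in dB, the clipping above 30 cm/s, the uniform jitter of the intervals, and all function names are assumptions.

```python
import random

MAX_SPEED_CM_S = 30.0      # running speed that maps to the maximum level
MAX_LEVEL_DB = 60.0        # sound level (dB SPL) at that speed
MISMATCH_DURATION_S = 1.0  # duration of each AM mismatch


def closed_loop_level(speed_cm_s):
    """Sound level proportional to running speed (assumed linear in dB)."""
    gain = MAX_LEVEL_DB / MAX_SPEED_CM_S
    level = gain * max(speed_cm_s, 0.0)
    return min(level, MAX_LEVEL_DB)  # clipping is an assumption


def draw_mismatch_onsets(session_s, mean_interval_s=15.0, jitter_s=5.0, seed=0):
    """Random onsets, on average every 15 s (uniform jitter is an assumption)."""
    rng = random.Random(seed)
    onsets, t = [], 0.0
    while True:
        t += rng.uniform(mean_interval_s - jitter_s, mean_interval_s + jitter_s)
        if t + MISMATCH_DURATION_S > session_s:
            return onsets
        onsets.append(t)


def feedback_level(t, speed_cm_s, onsets):
    """Sound level at time t: 0 during a mismatch, coupled to speed otherwise."""
    in_mismatch = any(on <= t < on + MISMATCH_DURATION_S for on in onsets)
    return 0.0 if in_mismatch else closed_loop_level(speed_cm_s)
```

An open loop session would then replay the sequence of levels produced by `feedback_level` in the preceding closed loop session, regardless of the current running speed.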

L2/3 neurons of ACx responded to audiomotor mismatch events
**(A)** Schematic of the virtual reality system used to study responses to audiomotor (AM) mismatches. The sound amplitude of an 8 kHz pure tone was coupled to the running speed of the mouse on a spherical treadmill. These experiments were performed in darkness.
**(B)** In closed loop sessions, the running speed of the mouse was coupled to the sound amplitude. AM mismatches were introduced by briefly setting the sound amplitude to 0 for 1 s. Below, the calcium response of an example neuron to AM mismatch events.
**(C)** Responses of all L2/3 ACx neurons to audiomotor mismatches. The response heatmap is generated as described in Figure 1G.
**(D)** The average population response of all L2/3 neurons to AM mismatches and sound playback halts (4755 neurons). AM mismatch duration was 1 s (gray shading). The horizontal bar above the plot marks time bins in which the AM mismatch response is statistically different from the playback halt response (gray: n.s., black: p<0.05; see Methods).
**(E)** The average population response of AM mismatch neurons (5% of strongest responders) to sound stimulation (black) and running onsets (green). Sound stimulation was 1 s (gray shading).
**(F)** Response strength of AM mismatch (MM) neurons to sound stimulation (left) and running onsets (right), compared to that of the remainder of the neuron population. Error bars indicate SEM. Here and elsewhere, n.s.: not significant; *: p<0.05; **: p<0.01; ***: p<0.001. See Table S1 for all statistical information.
**(G)** Scatter plot of the correlations of calcium activity with sound amplitude (x-axis) and running speed (y-axis) in open loop sessions, for all neurons. The color-code reflects the strength of responses to AM mismatch in the closed loop session. Note, AM mismatch-responsive neurons are enriched in the upper left quadrant.
**(H)** Scatter plot of the responses to AM mismatch and sound playback halt for all neurons. Neurons that exhibited significant (p < 0.05) positive responses to AM mismatch are shown in red (13.7%). Black dashed line marks unity.

Assuming that AM mismatch responses are computed as a difference between an excitatory motor-related prediction and an auditory stimulus driven inhibition, we would expect the neurons with high AM mismatch responses to exhibit opposing influence of motor-related and auditory input. To test this, we selected the 5% of neurons with the strongest responses to AM mismatch and quantified the responses of these neurons to sound stimulation and running onsets. Consistent with a model of a subtractive computation of prediction errors, we found that AM mismatch neurons exhibited a strong reduction in activity in response to sound stimulation and an increase of activity on running onsets (Figure 2E). Given that mismatch responses are likely enriched in the superficial part of L2/3 (O’Toole et al., 2023), and that in our two-photon imaging experiments we also preferentially recorded from more superficial neurons, we suspect that our population is enriched for mismatch neurons. Consistent with this interpretation, we observed a strong population response to AM mismatches (Figures 2C and 2D) and a decrease in population activity in response to sound stimulation (Figure 1H). Nevertheless, sound evoked responses were significantly more negative in neurons strongly responsive to AM mismatch, than for the remainder of the L2/3 neuronal population (Figure 2F). This effect was similar when we used different thresholds for the selection of AM mismatch neurons (10% or 20% of neurons with the strongest response to AM mismatch; Figure S3). Consistent with a sound driven reduction of activity and running related increase of activity in AM mismatch neurons, the correlation of calcium activity of AM mismatch neurons was predominantly negative with sound amplitude and positive with running speed in open loop sessions (Figure 2G). This again resembles the properties of VM mismatch neurons in V1 (Attinger et al., 2017). 
If AM mismatch responses are computed as a difference between a locomotion driven excitation and a sound driven inhibition, we could also expect to find a correlation between the strength of mismatch response and the strength of sound playback halt responses. Even in the absence of locomotion driven excitation, a relief from sound driven inhibition could trigger an increase in calcium activity. When comparing AM mismatch responses with playback sound halt responses for all neurons, we do indeed find a positive correlation between the two (Figure 2H). Overall, these results suggest that the implementation of sensorimotor prediction error computation generalizes beyond V1 to other primary cortices and might be a canonical cortical computation in L2/3.
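
The selection of AM mismatch neurons and the two comparisons made above (response strength against the remainder of the population, and correlation of activity with sound amplitude or running speed) can be illustrated with a short sketch. The function names and the list-based data layout are hypothetical, and the statistical tests reported in Table S1 are not reproduced here.

```python
from statistics import mean


def top_fraction_indices(mismatch_responses, fraction=0.05):
    """Indices of the neurons with the strongest mismatch responses."""
    n_top = max(1, round(len(mismatch_responses) * fraction))
    order = sorted(range(len(mismatch_responses)),
                   key=lambda i: mismatch_responses[i], reverse=True)
    return set(order[:n_top])


def compare_groups(mismatch_responses, other_responses, fraction=0.05):
    """Mean of another response (e.g. to sound) in MM neurons vs. the rest."""
    top = top_fraction_indices(mismatch_responses, fraction)
    mm = [r for i, r in enumerate(other_responses) if i in top]
    rest = [r for i, r in enumerate(other_responses) if i not in top]
    return mean(mm), mean(rest)


def pearson(x, y):
    """Pearson correlation, e.g. of a calcium trace with sound amplitude."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

Under a subtractive model, a neuron selected by `top_fraction_indices` would be expected to show a negative `pearson` value with sound amplitude and a positive one with running speed in open loop sessions, which corresponds to the upper left quadrant of Figure 2G.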

We found no evidence of visuomotor mismatch responses in L2/3 of ACx

Visuomotor mismatch responses are likely calculated in V1 (Jordan and Keller, 2020), and spread across dorsal cortex from there (Heindorf and Keller, 2023). To investigate multimodal mismatch responses, we first quantified the strength of these VM mismatch responses, which are independent of auditory input, in ACx. In these experiments, the running speed of the mouse was coupled to the visual flow speed in a virtual corridor, but not to any sound feedback (Figures 3A and 3B). We introduced VM mismatches by halting visual flow for 1 s at random times while the mice were running, as previously described (Keller et al., 2012; Zmarz and Keller, 2016). To control for visual responses independent of visuomotor coupling, we used an open loop replay of the visual flow generated in the previous session (see Methods). We found that neither VM mismatches nor visual flow playback halts, which the mouse experienced in open loop sessions, resulted in a measurable population response in ACx (Figures 3C and 3D). Selecting the 5% of neurons with the strongest responses to VM mismatches and quantifying their responses to grating presentations and running onsets, we found that these neurons exhibited positive responses to running onset and no significant response to grating stimuli (Figure 3E). These responses were not different from the population responses of the remainder of the neurons (Figure 3F). Quantifying the correlation of calcium activity with visual flow speed and running speed in the open loop session, we found that VM mismatch responsive neurons exhibited a distribution not different from chance (Figure 3G). We also found no evidence of a correlation between VM mismatch responses and playback halt responses (Figure 3H). Thus, while there may be a small subset of VM mismatch responsive neurons in L2/3 of ACx, we find no evidence of a VM mismatch response at the level of the L2/3 population.

We found no evidence of visuomotor mismatch responses in ACx
**(A)** Schematic of the virtual reality system used to measure VM mismatch responses. The visual flow of the virtual corridor was coupled to the running speed of the mouse on a spherical treadmill. There was no sound stimulus present in these experiments.
**(B)** In closed loop sessions, the running speed of the mouse was coupled to the movement in a virtual corridor. VM mismatches were introduced by briefly setting visual flow speed to 0 for 1 s. Below, the calcium response of an example neuron to VM mismatch events.
**(C)** Responses of all L2/3 ACx neurons to VM mismatches. The response heatmap is generated as described in Figure 1G.
**(D)** The average population response of all L2/3 neurons to VM mismatches and visual flow playback halts (5688 neurons). Gray shading marks the duration of both stimuli. The horizontal bar above the plot marks time bins in which the VM mismatch response is statistically different from the playback halt response (gray: n.s., black: p<0.05; see Methods).
**(E)** The average population response of VM mismatch neurons (5% of strongest responders) to grating stimulation (blue) and running onsets (green). Stimulus duration was 4 s to 8 s (gray shading).
**(F)** Response strength of VM mismatch (MM) neurons to visual stimulation (left) and running onsets (right), compared to that of the remainder of the neuron population. Error bars indicate SEM. Here and elsewhere, n.s.: not significant; *: p<0.05; **: p<0.01; ***: p<0.001. See Table S1 for all statistical information.
**(G)** Scatter plot of the correlation of calcium activity with visual flow speed (x-axis) and running speed (y-axis) in open loop sessions for all neurons. The color-code reflects the strength of responses to VM mismatch in the closed loop session. Note, VM mismatch responsive neurons are scattered randomly.
**(H)** Scatter plot of the responses to VM mismatch and visual flow playback halt for all neurons. The percentage of neurons (6.6%; in red) that exhibited significant (p < 0.05) responses to VM mismatches is only barely above chance.

Mismatch responses were potentiated by multimodal interactions

Finally, we explored how multimodal coupling of both auditory and visual feedback to running speed influenced mismatch responses in L2/3 of ACx. To do this, we coupled both sound amplitude and visual flow speed to the running speed of the mouse in an audiovisual virtual environment (Figures 4A and 4B). We then introduced mismatch events by halting both sound and visual flow for 1 s to trigger a concurrent audiomotor and visuomotor [AM+VM] mismatch (Figure 4B). The nomenclature here is such that the first letter in the pair denotes the sensory input that is being predicted, while the second letter denotes the putative predictor – the square brackets are used to denote that the two events happen concurrently. By putative predictor, we mean an information source available to the mouse that would, in principle, allow it to predict another input, given the current experimental environment. Thus, in the case of an [AM+VM] mismatch, both the halted visual flow and the halted sound amplitude are predicted by running speed. The [AM+VM] mismatch resulted in a significant response on the population level (Figure 4C). The response to concurrent mismatches in multiple modalities could simply be a linear combination of the responses to the individual mismatch stimuli, or it could be a non-linear combination. To test whether we find evidence of a non-linear combination of mismatch responses, we compared the [AM+VM] mismatch to [AM] and [VM] mismatch events presented alone. We found that the presentation of an [AM+VM] mismatch led to a significantly larger response than either an [AM] or a [VM] mismatch in isolation (Figure 4D).
To test whether the linear summation of [AM] + [VM] mismatch responses could explain the response to the concurrent presentation [AM+VM], we compared the two directly, and found that the concurrent presentation [AM+VM] elicited a significantly larger response than the linear sum of [AM] + [VM] mismatch responses (Figures 4E and S4). Plotting the [AM+VM] mismatch responses against the linear sum of the [AM] + [VM] mismatch responses for each neuron, we found that while there is some correlation between the two, there is a subset of neurons (13.7%; red dots, Figure 4F) that selectively responds to the concurrent [AM+VM] mismatch, while a different subset of neurons (11.2%; orange dots, Figure 4F) selectively responds to the mismatches in isolation. This demonstrates that mismatch responses in different modalities can interact non-linearly.
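
The comparison between the concurrent [AM+VM] response and the linear sum [AM] + [VM] can be sketched per neuron as follows. This is a simplified illustration, not the analysis behind Figures 4E and 4F: the function names and data layout are hypothetical, the paired sign-flip permutation test is a generic stand-in for the statistics reported in Table S1, and the criteria used to label neurons as selective in Figure 4F are not reproduced.

```python
import random
from statistics import mean


def linear_sum(am, vm):
    """Per-neuron linear prediction: [AM] response plus [VM] response."""
    return [a + v for a, v in zip(am, vm)]


def supralinearity(am, vm, am_vm):
    """Compare the concurrent [AM+VM] response with the linear sum."""
    predicted = linear_sum(am, vm)
    diffs = [obs - pred for obs, pred in zip(am_vm, predicted)]
    return {
        "mean_observed": mean(am_vm),
        "mean_linear_sum": mean(predicted),
        "mean_difference": mean(diffs),
        "n_above_linear": sum(d > 0 for d in diffs),
    }


def sign_flip_p_value(diffs, n_perm=2000, seed=0):
    """Two-sided paired permutation test on the mean difference (stand-in test)."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A positive `mean_difference` together with a small p-value would indicate that the concurrent response exceeds the linear prediction at the population level. As noted in the Discussion, such a deviation in calcium signals does not by itself establish non-linearity in the underlying spiking.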

Discussion

Consistent with previous reports, we found that auditory, visual, and motor-related signals are intermixed in the population of L2/3 neurons in ACx. Responses to both motor-related (McGinley et al., 2015; Schneider et al., 2014; Vivaldo et al., 2023; Yavorska and Wehr, 2021; Zhou et al., 2014) and visual signals (Bigelow et al., 2022; Morrill and Hasenstaub, 2018; Sharma et al., 2021) have been reported across layers in ACx, with the strongest running modulation effect found in L2/3 (Schneider et al., 2014). Also, consistent with previous reports, we found that a subset of L2/3 neurons in ACx respond to audiomotor prediction errors (Audette et al., 2022; Liu and Kanold, 2022). In V1, it has been demonstrated that neurons signaling prediction errors exhibit opposing influence of bottom-up visual and top-down motor-related inputs. This has been speculated to be the consequence of a subtractive computation of prediction errors (Jordan and Keller, 2020; Keller et al., 2012; Leinweber et al., 2017). Our findings now reveal a similar pattern of opposing influence in prediction error neurons in primary ACx that exhibit a positive correlation with motor-related input and a negative correlation with auditory input (Figure 2G). This would be consistent with the idea that both visuomotor and audiomotor prediction errors are computed as a subtractive difference between bottom-up and top-down inputs. Based on this, it is conceivable that this type of computation extends also beyond primary sensory areas of cortex and may be a more general computational principle implemented in L2/3 of cortex.

Finally, we found that concurrent prediction errors in multiple modalities result in an increase in prediction error response that exceeds a linear combination of the prediction error responses in single modalities (Figure 4E), with a subset of neurons selectively responding only to the combination of prediction error responses (Figure 4F). A similar non-linear relationship has been described between auditory and visual oddball responses in both ACx and V1 (Shiramatsu et al., 2021). At this point, it should be kept in mind that deviations from linearity in terms of spiking responses are difficult to assess using calcium imaging data. However, given that the difference between the concurrent presentation and the linear sum of the two individual mismatch responses was approximately a factor of two (Figure 4E), and the fact that we found a population of neurons that responds selectively to the concurrent presentation of both mismatches, we suspect that the underlying spiking responses are also non-linear. What are the mechanisms that could underlie this interaction? Neurons in ACx have access to information about VM mismatches from at least two sources. In widefield calcium imaging, VM mismatch responses are detectable across most of dorsal cortex (Heindorf and Keller, 2023). Thus, VM mismatch responses could be present in long-range cortico-cortical axons from V1, or possibly in L4, L5, or L6 neurons in ACx. Alternative sources of VM mismatch input are neuromodulatory signals. Locus coeruleus, for example, drives noradrenergic signals in response to VM mismatches across the entire dorsal cortex (Jordan and Keller, 2023). However, given that noradrenergic signals only weakly modulate responses in L2/3 neurons in V1 (Jordan and Keller, 2023), it is unclear if the broadcasted noradrenergic signals could non-linearly potentiate the AM mismatch responses of ACx neurons. We speculate that cholinergic signals are also unlikely to contribute to this effect.
In V1 there are no cholinergic responses to VM mismatch (Yogesh and Keller, 2023). However, given that ACx and V1 receive cholinergic innervation from different sources (Kim et al., 2016), we cannot rule out the possibility that cholinergic signals in ACx respond to VM mismatch. Nevertheless, given that the [AM+VM] mismatch responses do not simply appear to be an amplified variant of the [AM] mismatch responses (Figure 4F), we speculate that [AM+VM] mismatch responses are primarily driven by long-range cortico-cortical input from V1 that interacts with a local computation of [AM] mismatch responses in the L2/3 ACx circuit.

Lateral interactions in the computation of prediction errors between sensory streams are not accounted for by hierarchical variants of predictive processing. In these hierarchical variants, prediction errors are computed as a comparison between top-down and bottom-up inputs (Rao and Ballard, 1999). To explain the lateral interactions between prediction errors likely computed in ACx (AM mismatch responses) and prediction errors likely computed in V1 (VM mismatch responses) that we describe here, we will need new variants of predictive processing models that include lateral and non-hierarchical interactions. Thus, our results demonstrate that mismatch responses in different modalities interact non-linearly and can potentiate each other. The circuit mechanisms that underlie this form of multimodal integration of mismatch responses are still unclear and will require further investigation. However, we would argue that the relatively strong multimodal interaction demonstrates that unimodal and hierarchical variants of predictive processing are insufficient to explain cortical mismatch responses. If predictive processing aims to be a general theory of cortical function, we will need to explore non-hierarchical variants of predictive processing.

Supplementary figures

Running exhibited differential effects on the responses to sound presentation and moving grating onsets. Related to Figure 1
(A) The average population response of ACx L2/3 neurons to sound presentation (4390 neurons) during sitting (light gray) and running (dark gray). Stimulus duration was 1 s (gray shading). The horizontal bar above the plot marks time bins in which the responses are statistically different from each other (gray: n.s., black: p<0.05; see Methods).
(B) As in A, but for responses to moving gratings (3701 neurons) during sitting (light blue) and running (dark blue). Stimulus duration was 4 s to 8 s (gray shading).

Controls for audiomotor mismatch responses. Related to Figure 2
(A) The average population response of L2/3 ACx neurons to AM mismatches as a function of experience with audiomotor coupling. AM mismatch duration was 1 s (gray shading). The data shown are from the first two audiomotor closed loop sessions. Each closed loop session lasted 5.5 minutes and mice experienced one such session per day (Day 1: 1271 neurons, Day 2: 904 neurons). Note that AM mismatch responses are already present in the first closed loop session.
(B) Comparison of running speeds during either closed or open loop audiomotor sessions. Dots are different recording sessions. Here and elsewhere, n.s.: not significant; *: p<0.05; **: p<0.01; ***: p<0.001. See Table S1 for all statistical information.
(C) Comparison of the average population response of L2/3 neurons to sound playback halts while mice were running (dark gray, 4017 neurons) or sitting (light gray, 1878 neurons) in open loop sessions. AM mismatch duration was 1 s (gray shading).

Opposing influence of sound and running on AM mismatch neurons. Related to Figure 2
The same analysis as presented in Figure 2F, but using the 10% most AM mismatch responsive neurons (left) or the 20% most AM mismatch responsive neurons (right). Error bars indicate SEM. Here and elsewhere, n.s.: not significant; *: p<0.05; **: p<0.01; ***: p<0.001. See Table S1 for all statistical information.

Non-linear combination of mismatch responses also with spike estimation. Related to Figure 4
As in Figure 4E, but using an estimate of firing rate calculated using CASCADE (Rupprecht et al., 2021). Gray shading marks the duration of the mismatch stimulus.

Methods

Mice and surgery

All animal procedures were approved by and carried out in accordance with the guidelines of the Veterinary Department of the Canton Basel-Stadt, Switzerland. Female C57BL/6 mice (Charles River) between 7 and 12 weeks of age were used in this study. For cranial window implantation, mice were anesthetized using a mixture of fentanyl (0.05 mg/kg), medetomidine (0.5 mg/kg), and midazolam (5 mg/kg). Analgesics were applied perioperatively. Lidocaine was injected subcutaneously into the scalp (10 mg/kg s.c.) prior to the surgery. Mice underwent a cranial window implantation surgery at an age of between 7 and 8 weeks. First, a custom-made titanium head-plate was attached to the skull (right hemisphere) with dental cement (Heraeus Kulzer). Next, a 3 mm craniotomy was made over left ACx (4.2 mm to 4.4 mm lateral from the midline and 2.6 mm to 2.8 mm posterior from bregma) followed by 4 to 6 injections of approximately 200 nl each of the AAV vector: AAV2/1-EF1α-GCaMP6f-WPRE (10¹³ to 10¹⁴ GC/ml). A circular glass cover slip was glued (Ultragel, Pattex) in place to seal the craniotomy. Metacam (5 mg/kg, s.c.) and buprenorphine (0.1 mg/kg s.c.) were injected intraperitoneally for 2 days after completion of the surgery. Mice were returned to their home cage and group housed for 10 days prior to the first experiments.

Virtual reality environment

All recordings were done with mice head-fixed in a virtual reality system, as described previously (Leinweber et al., 2014). Mice were free to run on an air-supported polystyrene ball. Three types of closed loop conditions were used for the experiments. The rotation of the spherical treadmill was coupled either (1) to the sound amplitude of an 8 kHz pure tone while the animal was locomoting in darkness (audiomotor coupling), (2) to the movement in a virtual corridor (visuomotor coupling), or (3) to both the sound amplitude of an 8 kHz pure tone and the movement in a virtual corridor (audio-visuo-motor coupling). For audiomotor coupling, we used the running speed of the mouse to control the sound pressure level (SPL) of an 8 kHz pure tone presented to the mouse through a loudspeaker (see Auditory stimulation). This closed loop coupling was not instantaneous but exhibited a delay of 260 ms ± 60 ms (mean ± STD). For visuomotor coupling, the running speed of the mouse was coupled to the visual flow speed in the virtual environment projected onto a toroidal screen surrounding the mouse using a Samsung SP-F10M projector synchronized to the turnaround times of the resonant scanner of the two-photon microscope. The delay in the visuomotor closed loop coupling was 90 ms ± 10 ms (mean ± STD). From the point of view of the mouse, the screen covered a visual field of approximately 240 degrees horizontally and 100 degrees vertically. The virtual environment presented on the screen was a corridor with walls consisting of vertical sinusoidal gratings. Prior to the recording experiments, mice were habituated in darkness to the setup in 1- to 2-hour-long sessions for up to 5 days, until they displayed regular locomotion. Closed loop sessions were followed by open loop sessions, in which rotation of the spherical treadmill was decoupled from both the sound amplitude and the movement in the virtual corridor. During these open loop sessions, we replayed the amplitude modulated sound or the visual flow recorded in the previous closed loop session.

Auditory stimulation

Sounds were generated with a 16-bit digital-to-analog converter (PCI6738, National Instruments) using custom scripts written in LabVIEW (LabVIEW 2020, National Instruments) at a 160 kHz sampling rate, amplified (SA1, Tucker Davis Technologies, FL, USA), and played through an MF1 speaker (Tucker Davis Technologies, FL, USA) positioned 10 cm from the mouse’s right ear. Stimuli were calibrated with a wide-band ultrasonic acoustic sensor (Model 378C01, PCB Piezotronics, NY, USA). To study sound-evoked responses, we used 4 kHz, 8 kHz, 16 kHz, and 32 kHz pure tones played at 60 dB and 75 dB SPL (1 s duration, randomized inter-stimulus interval of 4 s ± 1 s, 10 repetitions, 1 ms on- and off-ramps, in randomized order). For audiomotor coupling experiments, we used an 8 kHz pure tone with a sound amplitude that varied between 40 dB and 75 dB SPL.
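
The tone parameters given in this section (8 kHz, 1 s duration, 160 kHz sampling rate, 1 ms ramps) can be illustrated with a minimal sketch. The function name `pure_tone` is hypothetical, the linear ramp shape is an assumption (the text gives only the ramp duration), and the output is a unit-amplitude waveform; calibration to dB SPL is not modeled.

```python
import math


def pure_tone(freq_hz=8000.0, duration_s=1.0, fs_hz=160_000, ramp_s=0.001):
    """Return a unit-amplitude pure tone with linear on- and off-ramps.

    Linear ramps are an assumption; only the ramp duration is specified.
    """
    n_samples = round(duration_s * fs_hz)
    n_ramp = round(ramp_s * fs_hz)
    samples = []
    for i in range(n_samples):
        if n_ramp:
            # Envelope rises over the first n_ramp samples, falls over the last.
            envelope = min(1.0, i / n_ramp, (n_samples - 1 - i) / n_ramp)
        else:
            envelope = 1.0
        samples.append(envelope * math.sin(2 * math.pi * freq_hz * i / fs_hz))
    return samples
```

Calling `pure_tone()` with the defaults yields 160,000 samples that start and end at zero amplitude.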

Visual stimulation

For visual stimulation, we used full-field sinusoidal drifting gratings (0 degrees, 45 degrees, 90 degrees, 270 degrees, moving in either direction) in a pseudo-random sequence, each presented for a duration of 6 s ± 2 s, with between 2 and 7 repetitions, and with a randomized inter-stimulus interval of 4.5 s ± 1.5 s during which a gray screen was displayed.

Running onsets

Running onsets were defined as the running speed crossing a threshold of 3 cm/s, where the average speed in the previous 3 s was below 1.8 cm/s. To separate trials with AM mismatch, VM mismatch, auditory stimulus, and grating stimulus by locomotion state into those that occurred while the mouse was running and those that occurred while it was sitting, we used a threshold of 0.3 cm/s in a 1 s window preceding the stimulus onset.
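
The two criteria above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the speed trace is assumed to be a plain list sampled at `fs_hz`, and comparing the mean speed in the 1 s pre-stimulus window against the 0.3 cm/s threshold is an assumption, since the text does not state which statistic was thresholded.

```python
def find_running_onsets(speed_cm_s, fs_hz, onset_thresh=3.0,
                        pre_window_s=3.0, pre_mean_max=1.8):
    """Indices where speed crosses onset_thresh from below and the mean
    speed over the preceding pre_window_s is below pre_mean_max."""
    n_pre = round(pre_window_s * fs_hz)
    onsets = []
    for i in range(n_pre, len(speed_cm_s)):
        crossed = speed_cm_s[i - 1] < onset_thresh <= speed_cm_s[i]
        if not crossed:
            continue
        pre_mean = sum(speed_cm_s[i - n_pre:i]) / n_pre
        if pre_mean < pre_mean_max:
            onsets.append(i)
    return onsets


def is_running_trial(speed_cm_s, fs_hz, stim_idx, thresh=0.3, window_s=1.0):
    """True if the mean speed in the window before stim_idx exceeds thresh.

    Using the window mean is an assumption made for this sketch.
    """
    n_win = round(window_s * fs_hz)
    window = speed_cm_s[max(0, stim_idx - n_win):stim_idx]
    return bool(window) and sum(window) / len(window) > thresh
```

A threshold crossing that follows a recent running bout is rejected by the 3 s pre-window criterion, so only onsets from a resting state are returned.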

Widefield calcium imaging

To establish a reference tonotopic map of A1 and AAF (Figure 1E), we performed widefield fluorescence imaging experiments on a custom-built microscope consisting of objectives mounted face-to-face (Nikon 85 mm/f1.8 sample side, Nikon 50 mm/f1.4 sensor side), as previously described (Heindorf and Keller, 2023). Blue illumination was provided by a light-emitting diode (470 nm, Thorlabs) and passed through an excitation filter (SP490, Thorlabs). Green fluorescence emission was filtered with a 525/50 bandpass filter. Images were acquired at a frame rate of 100 Hz on a sCMOS camera (PCO edge 4.2). The raw images were cropped on-sensor, and the resulting data was saved to disk with custom-written software in LabVIEW (National Instruments).

Two-photon imaging

Calcium imaging of L2/3 neurons in A1 and AAF was performed using a modified Thorlabs Bergamo II microscope with a 16x, 0.8 NA objective (Nikon N16XLWD-PF), as previously described (Leinweber et al., 2014). To record in left ACx, the microscope was tilted 45 degrees to the left. The excitation light source was a tunable, femtosecond-pulsed laser (Insight, Spectra Physics or Chameleon, Coherent) tuned to 930 nm. The laser power was adjusted to 30 mW. A 12 kHz resonant scanner (Cambridge Technology) was used for line scanning, and we acquired 400 lines per frame. This resulted in a frame rate of 60 Hz at a resolution of 400 × 750 pixels. We used a piezo-electric linear actuator (Physik Instrumente, P-726) to record from imaging planes at four different cortical depths, separated by 15 μm. This reduced the effective frame rate per layer to 15 Hz. The emission light was bandpass filtered using a 525/50 nm filter (Semrock), and signals were detected with a photomultiplier (Hamamatsu, H7422), amplified (Femto, DHCPCA-100), digitized at 800 MHz (National Instruments, NI5772), and bandpass filtered at 80 MHz with a digital Fourier-transform filter on a field-programmable gate array (National Instruments, PXIe-7965). Recording locations were visually registered against the previously acquired widefield reference images using blood vessel patterns.
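
The frame rates quoted in this section follow from simple arithmetic. The worked example below is a sketch that assumes bidirectional scanning (lines acquired on both sweep directions of the scanner), which is not stated explicitly in the text; the symbols $f_{\text{scan}}$, $N_{\text{lines}}$, and $N_{\text{planes}}$ are introduced here for illustration only.

```latex
\begin{align*}
f_{\text{line}} &= 2 \times f_{\text{scan}} = 2 \times 12\,\text{kHz} = 24{,}000\ \text{lines/s} \\
f_{\text{frame}} &= \frac{f_{\text{line}}}{N_{\text{lines}}} = \frac{24{,}000\ \text{lines/s}}{400\ \text{lines}} = 60\ \text{Hz} \\
f_{\text{plane}} &= \frac{f_{\text{frame}}}{N_{\text{planes}}} = \frac{60\ \text{Hz}}{4} = 15\ \text{Hz}
\end{align*}
```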

Widefield image analysis

Off-line data processing and data analysis were done with custom-written MATLAB scripts. Slow drifts in the fluorescence signal were removed using 8th percentile filtering with a 62.5 s moving window, similar to what was used for two-photon imaging data (Dombeck et al., 2007). Activity was calculated as the ΔF/F₀, where F₀ was the median fluorescence over the entire recording session. For stimulus responses, we used a response window of 0.2 s to 1.2 s following stimulus onset and a baseline window of −1 s to 0 s before stimulus onset. The pixels with the strongest response (top 3% to 5% of the response distribution) were used to mark the tonotopic areas corresponding to the different stimuli.
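
The drift correction and ΔF/F₀ calculation can be sketched in a few lines. This is a hedged illustration rather than the authors' MATLAB code: the function names are hypothetical, the window is assumed to be centered and truncated at the edges, the percentile uses a simple nearest-rank rule, and re-adding the mean of the rolling baseline after subtraction (so that the median F₀ stays positive) is an assumption of this sketch.

```python
import statistics


def rolling_percentile(trace, window, pct=8.0):
    """Percentile of a centered moving window (truncated at the edges)."""
    half = window // 2
    out = []
    for i in range(len(trace)):
        chunk = sorted(trace[max(0, i - half):i + half + 1])
        # Nearest-rank style index into the sorted window.
        k = min(len(chunk) - 1, int(pct / 100.0 * len(chunk)))
        out.append(chunk[k])
    return out


def delta_f_over_f(trace, window, pct=8.0):
    """Remove slow drift with a rolling percentile, then compute dF/F0."""
    baseline = rolling_percentile(trace, window, pct)
    offset = statistics.fmean(baseline)  # assumption: restore the mean level
    detrended = [f - b + offset for f, b in zip(trace, baseline)]
    f0 = statistics.median(detrended)    # F0: median over the whole trace
    return [(f - f0) / f0 for f in detrended]
```

With the acquisition rates given in the text, a 62.5 s window at 100 Hz would correspond to `window=6250` samples; the pure-Python loop here favors readability over speed.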

Two-photon image analysis

Calcium imaging data were processed as described previously. In brief, raw images were full-frame registered to correct for lateral brain motion. Neurons were selected manually based on mean and maximum fluorescence images. Average fluorescence per neuron over time was corrected for slow fluorescence drift using an 8th percentile filter and a 66 s (or 1000 frames) window (Dombeck et al., 2007; Keller et al., 2012; Leinweber et al., 2014) and divided by the median value over the entire trace to calculate ΔF/F₀. All stimulus-response curves were baseline subtracted. The baseline subtraction window was −0.5 s to 0 s before stimulus onset. For quantification of responses during different onset types (auditory, visual, running, mismatch), ΔF/F was averaged over the response time window (0.5 s to 2.5 s after stimulus onset) and baseline subtracted (mean activity in a window preceding stimulus onset, −0.5 s to 0 s). Onsets that were not preceded by at least 2 s of baseline or not followed by at least 3 s of recording time were excluded from the analysis. Sessions with fewer than two onsets were not included in the analysis. To quantify the difference in average calcium responses as a function of time, we used a hierarchical bootstrap test for every 5 frames of the calcium trace (333 ms) and marked comparisons where responses were different (p < 0.05). Mismatch responsive neurons were selected based on the absolute response strength over the response time window (0.5 s to 2.5 s). To infer spikes from calcium signals (Figure S4), we used CASCADE (Rupprecht et al., 2021).
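
The response quantification described above can be sketched as follows. The function name and argument names are hypothetical, the trace is assumed to be a list of ΔF/F values sampled at 15 Hz with onsets given as frame indices, and the exclusion rule is read here as requiring 2 s of recording before and 3 s after each onset.

```python
from statistics import fmean


def onset_responses(dff, onsets, fs_hz=15.0, resp_win=(0.5, 2.5),
                    base_win=(-0.5, 0.0), min_pre_s=2.0, min_post_s=3.0):
    """Baseline-subtracted mean response for each valid onset (frame index)."""
    n_pre = round(min_pre_s * fs_hz)
    n_post = round(min_post_s * fs_hz)
    responses = []
    for t0 in onsets:
        # Skip onsets too close to the start or end of the recording.
        if t0 < n_pre or t0 + n_post > len(dff):
            continue
        b0, b1 = (t0 + round(s * fs_hz) for s in base_win)
        r0, r1 = (t0 + round(s * fs_hz) for s in resp_win)
        responses.append(fmean(dff[r0:r1]) - fmean(dff[b0:b1]))
    return responses
```

Averaging the returned values across onsets would give one response estimate per neuron and onset type, which is the quantity that enters the population comparisons.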

Statistical tests

All statistical information for the tests performed in this manuscript is provided in Table S1. We used hierarchical bootstrapping (Saravanan et al., 2020) for statistical testing to account for the nested structure of the data (multiple neurons from one imaging site). We first resampled the data with replacement at the level of imaging sites, followed by resampling at the level of neurons. We then computed the mean responses across the resampled population and repeated this process 10 000 times. The probability of one group being different from the other was calculated as a fraction of bootstrap sample means which violated the tested hypothesis.
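
The two-level resampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the data are assumed to be a list of imaging sites that each hold per-neuron responses, and the tested hypothesis is taken to be "mean of group A exceeds mean of group B".

```python
import random
from statistics import fmean


def hierarchical_bootstrap_means(sites, n_boot=10_000, rng=None):
    """sites: list of imaging sites, each a list of per-neuron responses."""
    rng = rng or random.Random()
    means = []
    for _ in range(n_boot):
        resampled = []
        # Level 1: resample imaging sites with replacement.
        for site in rng.choices(sites, k=len(sites)):
            # Level 2: resample neurons within each chosen site.
            resampled.extend(rng.choices(site, k=len(site)))
        means.append(fmean(resampled))
    return means


def fraction_violating(boot_a, boot_b):
    """Fraction of bootstrap iterations in which 'mean A > mean B' fails."""
    return sum(a <= b for a, b in zip(boot_a, boot_b)) / len(boot_a)
```

Resampling sites before neurons keeps neurons from the same imaging site together, which is what accounts for the nested structure of the data.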

Acknowledgements

We thank Tingjia Lu for the production of viral vectors and all the members of the Keller lab for discussion and support. This project has received funding from the Swiss National Science Foundation (GBK), the Novartis Research Foundation (GBK), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 865617) (GBK).

Author contributions

MS designed and performed the experiments and analyzed the data. All authors wrote the manuscript.

Declaration of interests

The authors declare no competing financial interests.

References

Attinger A., Wang B., Keller G.B. (2017). Visuomotor Coupling Shapes the Functional Development of Mouse Visual Cortex. Cell 169:1291–1302. https://doi.org/10.1016/j.cell.2017.05.023
Audette N.J., Schneider D.M. (2023). Stimulus-specific prediction error neurons in mouse auditory cortex. https://doi.org/10.1101/2023.01.06.523032
Audette N.J., Zhou W., La Chioma A., Schneider D.M. (2022). Precise movement-based predictions in the mouse auditory cortex. Curr Biol 32:4925–4940. https://doi.org/10.1016/j.cub.2022.09.064
Ayaz A., Stäuble A., Hamada M., Wulf M.A., Saleem A.B., Helmchen F. (2019). Layer-specific integration of locomotion and sensory information in mouse barrel cortex. Nature Communications 10:1–14. https://doi.org/10.1038/s41467-019-10564-8
Bigelow J., Morrill R.J., Dekloe J., Hasenstaub A.R. (2019). Movement and VIP Interneuron Activation Differentially Modulate Encoding in Mouse Auditory Cortex. eNeuro 6. https://doi.org/10.1523/ENEURO.0164-19.2019
Bigelow J., Morrill R.J., Olsen T., Hasenstaub A.R. (2022). Visual modulation of firing and spectrotemporal receptive fields in mouse auditory cortex. Current Research in Neurobiology 3:100040. https://doi.org/10.1016/j.crneur.2022.100040
Clavagnier S., Falchier A., Kennedy H. (2004). Long-distance feedback projections to area V1: Implications for multisensory integration, spatial awareness, and visual consciousness. Cognitive, Affective & Behavioral Neuroscience 4:117–126. https://doi.org/10.3758/CABN.4.2.117
Dombeck D.A., Khabbaz A.N., Collman F., Adelman T.L., Tank D.W. (2007). Imaging large-scale neural activity with cellular resolution in awake, mobile mice. Neuron 56:43–57. https://doi.org/10.1016/j.neuron.2007.08.003
Eliades S.J., Wang X. (2008). Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature 453:1102–6. https://doi.org/10.1038/nature06910
Falchier A., Clavagnier S., Barone P., Kennedy H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. J Neurosci 22:5749–5759. https://doi.org/10.1523/JNEUROSCI.22-13-05749.2002
Garner A.R., Keller G.B. (2022). A cortical circuit for audio-visual predictions. Nat Neurosci 25:98–105. https://doi.org/10.1038/s41593-021-00974-7
Han S., Helmchen F. (2023). Behavior-relevant top-down cross-modal predictions in mouse neocortex. https://doi.org/10.1101/2023.04.03.535389
Heindorf M., Arber S., Keller G.B. (2018). Mouse Motor Cortex Coordinates the Behavioral Response to Unpredicted Sensory Feedback. Neuron 99:1040–1054. https://doi.org/10.1016/j.neuron.2018.07.046
Heindorf M., Keller G.B. (2023). Antipsychotic drugs selectively decorrelate long-range interactions in deep cortical layers. https://doi.org/10.1101/2022.01.31.478462
Henschke J.U., Price A.T., Pakan J.M.P. (2021). Enhanced modulation of cell-type specific neuronal responses in mouse dorsal auditory field during locomotion. Cell Calcium 96:102390. https://doi.org/10.1016/j.ceca.2021.102390
Ibrahim L.A., Mesik L., Ji X.-Y., Fang Q., Li H.-F., Li Y.-T., Zingg B., Zhang L.I., Tao H.W. (2016). Cross-Modality Sharpening of Visual Cortical Processing through Layer-1-Mediated Inhibition and Disinhibition. Neuron 89:1031–1045. https://doi.org/10.1016/j.neuron.2016.01.027
Jordan R., Keller G.B. (2023). The locus coeruleus broadcasts prediction errors across the cortex to promote sensorimotor plasticity. eLife 12. https://doi.org/10.7554/eLife.85111
Jordan R., Keller G.B. (2020). Opposing Influence of Top-down and Bottom-up Input on Excitatory Layer 2/3 Neurons in Mouse Primary Visual Cortex. Neuron 108:1194–1206. https://doi.org/10.1016/j.neuron.2020.09.024
Keller G.B., Bonhoeffer T., Hübener M. (2012). Sensorimotor Mismatch Signals in Primary Visual Cortex of the Behaving Mouse. Neuron 74:809–815. https://doi.org/10.1016/j.neuron.2012.03.040
Keller G.B., Hahnloser R.H.R. (2009). Neural processing of auditory feedback during vocal practice in a songbird. Nature 457:187–90. https://doi.org/10.1038/nature07467
Kim J.-H., Jung A.-H., Jeong D., Choi I., Kim K., Shin S., Kim S.J., Lee S.-H. (2016). Selectivity of Neuromodulatory Projections from the Basal Forebrain and Locus Ceruleus to Primary Sensory Cortices. J. Neurosci 36:5314–5327. https://doi.org/10.1523/JNEUROSCI.4333-15.2016
Leinweber M., Ward D.R., Sobczak J.M., Attinger A., Keller G.B. (2017). A Sensorimotor Circuit in Mouse Cortex for Visual Flow Predictions. Neuron 95:1420–1432. https://doi.org/10.1016/j.neuron.2017.08.036
Leinweber M., Zmarz P., Buchmann P., Argast P., Hübener M., Bonhoeffer T., Keller G.B. (2014). Two-photon calcium imaging in mice navigating a virtual reality environment. Journal of visualized experiments: JoVE e50885.
Liu J., Kanold P.O. (2022). Interactive auditory task reveals complex sensory-action integration in mouse primary auditory cortex. https://doi.org/10.1101/2022.12.12.520155
Markov N.T., Ercsey-Ravasz M., Van Essen D.C., Knoblauch K., Toroczkai Z., Kennedy H. (2013). Cortical high-density counterstream architectures. Science. American Association for the Advancement of Science. https://doi.org/10.1126/science.1238406
McGinley M.J., David S.V., McCormick D.A. (2015). Cortical Membrane Potential Signature of Optimal States for Sensory Signal Detection. Neuron 87:179–192. https://doi.org/10.1016/j.neuron.2015.05.038
Morandell K., Yin A., Del Rio R.T., Schneider D.M. (2023). Movement-related modulation in mouse auditory cortex is widespread yet locally diverse. bioRxiv. https://doi.org/10.1101/2023.07.03.547560
Morrill R.J., Hasenstaub A.R. (2018). Visual Information Present in Infragranular Layers of Mouse Auditory Cortex. J Neurosci 38:2854–2862. https://doi.org/10.1523/JNEUROSCI.3102-17.2018
Niell C.M., Stryker M.P. (2010). Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65:472–9. https://doi.org/10.1016/j.neuron.2010.01.033
O’Toole S.M., Oyibo H.K., Keller G.B. (2023). Molecularly targetable cell types in mouse visual cortex have distinguishable prediction error responses. Neuron 111:2918–2928. https://doi.org/10.1016/j.neuron.2023.08.015
Rao R.P.N., Ballard D.H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2:79–87. https://doi.org/10.1038/4580
Rupprecht P., Carta S., Hoffmann A., Echizen M., Blot A., Kwan A.C., Dan Y., Hofer S.B., Kitamura K., Helmchen F., Friedrich R.W. (2021). A database and deep learning toolbox for noise-optimized, generalized spike inference from calcium imaging. Nat Neurosci 24:1324–1337. https://doi.org/10.1038/s41593-021-00895-5
Saravanan V., Berman G.J., Sober S.J. (2020). Application of the hierarchical bootstrap to multi-level data in neuroscience. Neuron Behav Data Anal Theory :3. https://nbdt.scholasticahq.com/article/13927-application-of-the-hierarchical-bootstrap-to-multi-level-data-in-neuroscience
Schneider D.M., Nelson A., Mooney R. (2014). A synaptic and circuit basis for corollary discharge in the auditory cortex. Nature 513:189–194. https://doi.org/10.1038/nature13724
Schneider D.M., Sundararajan J., Mooney R. (2018). A cortical filter that learns to suppress the acoustic consequences of movement. Nature 561:391–395. https://doi.org/10.1038/s41586-018-0520-5
Sharma S., Srivastava H.K., Bandyopadhyay S. (2021). Modulation of auditory responses by visual inputs in the mouse auditory cortex. https://doi.org/10.1101/2021.01.22.427870
Shiramatsu T.I., Mori K., Ishizu K., Takahashi H. (2021). Auditory, Visual, and Cross-Modal Mismatch Negativities in the Rat Auditory and Visual Cortices. Front Hum Neurosci 15:721476. https://doi.org/10.3389/fnhum.2021.721476
Stanley J., Miall R.C. (2007). Functional activation in parieto-premotor and visual areas dependent on congruency between hand movement and visual stimuli during motor-visual priming. NeuroImage 34:290–9. https://doi.org/10.1016/j.neuroimage.2006.08.043
St-Yves G., Allen E.J., Wu Y., Kay K., Naselaris T. (2023). Brain-optimized deep neural network models of human visual areas learn non-hierarchical representations. Nat Commun 14:3329. https://doi.org/10.1038/s41467-023-38674-4
Suzuki M., Pennartz C.M.A., Aru J. (2023). How deep is the brain? The shallow brain hypothesis. Nat. Rev. Neurosci :1–14. https://doi.org/10.1038/s41583-023-00756-z
Vasilevskaya A., Widmer F.C., Keller G.B., Jordan R. (2023). Locomotion-induced gain of visual responses cannot explain visuomotor mismatch responses in layer 2/3 of primary visual cortex. Cell Rep 42:112096. https://doi.org/10.1016/j.celrep.2023.112096
Vivaldo C.A., Lee J., Shorkey M., Keerthy A., Rothschild G. (2023). Auditory cortex ensembles jointly encode sound and locomotion speed to support sound perception during movement. PLOS Biology 21:e3002277. https://doi.org/10.1371/journal.pbio.3002277
Widmer F.C., O’Toole S.M., Keller G.B. (2022). NMDA receptors in visual cortex are necessary for normal visuomotor integration and skill learning. eLife 11:e71476. https://doi.org/10.7554/eLife.71476
Yavorska I., Wehr M. (2021). Effects of Locomotion in Auditory Cortex Are Not Mediated by the VIP Network. Front Neural Circuits 15:618881. https://doi.org/10.3389/fncir.2021.618881
Yogesh B., Keller G.B. (2023). Cholinergic input to mouse visual cortex signals a movement state and acutely enhances layer 5 responsiveness. eLife 12. https://doi.org/10.7554/eLife.89986
Zhao M., Ren M., Jiang T., Jia X., Wang X., Li A., Li X., Luo Q., Gong H. (2022). Whole-Brain Direct Inputs to and Axonal Projections from Excitatory and Inhibitory Neurons in the Mouse Primary Auditory Area. Neurosci Bull 38:576–590. https://doi.org/10.1007/s12264-022-00838-5
Zhou M., Liang F., Xiong X.R., Li L., Li H., Xiao Z., Tao H.W., Zhang L.I. (2014). Scaling down of balanced excitation and inhibition by active behavioral states in auditory cortex. Nat Neurosci 17:841–850. https://doi.org/10.1038/nn.3701
Zmarz P., Keller G.B. (2016). Mismatch Receptive Fields in Mouse Visual Cortex. Neuron 92:766–772. https://doi.org/10.1016/j.neuron.2016.09.057

Article and author information

Author information

Magdalena Solyga
Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
ORCID iD: 0000-0003-2969-2963
Georg B. Keller
Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland, Faculty of Science, University of Basel, Basel, Switzerland
ORCID iD: 0000-0002-1401-0117
- Corresponding author; email: georg.keller@fmi.ch

Version history

Preprint posted: December 14, 2023
Sent for peer review: January 16, 2024
Reviewed Preprint version 1: March 22, 2024
Reviewed Preprint version 2: November 18, 2024
Reviewed Preprint version 3: January 28, 2025
Version of Record published: February 10, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.95398. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Maria Chait
University College London, London, United Kingdom
Senior Editor
Andrew King
University of Oxford, Oxford, United Kingdom

Reviewer #1 (Public Review):

Summary:

The manuscript presents a short report investigating mismatch responses in the auditory cortex, following previous studies focused on the visual cortex. By correlating the mouse locomotion speed with acoustic feedback levels, the authors demonstrate excitatory responses in a subset of neurons to halts in expected acoustic feedback. They show a lack of responses to mismatch in the visual modality. A subset of neurons show enhanced mismatch responses when both auditory and visual modalities are coupled to the animal's locomotion.

While the study is well-designed and addresses a timely question, several concerns exist regarding the quantification of animal behavior, potential alternative explanations for recorded signals, correlation between excitatory responses and animal velocity, discrepancies in reported values, and clarity regarding the identity of certain neurons.

Strengths:

(1) Well-designed study addressing a timely question in the field.

(2) Successful transition from previous work focused on the visual cortex to the auditory cortex, demonstrating generic principles in mismatch responses.

(3) The correlation between mouse locomotion speed and acoustic feedback levels provides evidence for a prediction signal in the auditory cortex.

(4) Coupling of visual and auditory feedback shows putative multimodal integration in the auditory cortex.

Weaknesses:

(1) Lack of quantification of animal behavior upon mismatches, potentially leading to alternative interpretations of recorded signals.

(2) Unclear correlation between excitatory responses and animal velocity during halts, particularly in closed-loop versus playback conditions.

(3) Discrepancies in reported values in a few figure panels raise questions about data consistency and interpretation.

(4) Ambiguity regarding the identity of the [AM+VM] MM neurons.

https://doi.org/10.7554/eLife.95398.1.sa2

Reviewer #2 (Public Review):

In this study, Solyga and Keller use multimodal closed-loop paradigms in conjunction with multiphoton imaging of cortical responses to assess whether and how sensorimotor prediction errors in one modality influence the computation of prediction errors in another modality. Their work addresses an important open question pertaining to the relevance of non-hierarchical (lateral cortico-cortical) interactions in predictive processing within the neocortex.

Specifically, they monitor GCaMP6f responses of layer 2/3 neurons in the auditory cortex of head-fixed mice engaged in VR paradigms where running is coupled to auditory, visual, or audio-visual sensory feedback. The authors find strong auditory and motor responses in the auditory cortex, as well as weak responses to visual stimuli. Further, in agreement with previous work, they find that the auditory cortex responds to audiomotor mismatches in a manner similar to that observed in visual cortex for visuomotor mismatches. Most importantly, while visuomotor mismatches by themselves do not trigger significant responses in the auditory cortex, simultaneous coupling of audio-visual inputs to movement non-linearly enhances mismatch responses in the auditory cortex.

Their results thus suggest that prediction errors within a given sensory modality are non-trivially influenced by prediction errors from another modality. These findings are novel, interesting, and important, especially in the context of understanding the role of lateral cortico-cortical interactions and in outlining predictive processing as a general theory of cortical function.

In its current form, the manuscript lacks sufficient description of methodological details pertaining to the closed-loop training and the overall experimental design. In several scenarios, while the results per se are convincing and interesting, their exact interpretation is challenging given the uncertainty about the actual experimental protocols (more on this below). Second, the authors are laser-focused on sensorimotor errors (mismatch responses) and focus almost exclusively on what happens when stimuli deviate from the animal's expectations.

While the authors consistently report strong running-onset responses (during open-loop) in the auditory cortex in both auditory and visual versions of the task, they do not discuss their interpretation in the different task settings (see below), nor do they analyze how these responses change during closed-loop i.e. when predictions align with sensory evidence.

However, I believe all my concerns can be easily addressed by additional analyses and incorporation of methodological details in the text.

Major concerns:

(1) Insufficient analysis of audiomotor mismatches in the auditory cortex:

Lack of analysis of the dependence of audiomotor mismatches on the running speed: it would be helpful if the authors could clarify whether the observed audiomotor mismatch responses are just binary or scale with the degree of mismatch (i.e. running speed). Along the same lines, how should one interpret the lack of dependence of the playback halt responses on the running speed? Shouldn't we expect that during playback, the responses of mismatch neurons scale with the running speed?

Slow temporal dynamics of audiomotor mismatches: despite the transient nature of the mismatches (1s), auditory mismatch responses last for several seconds. They appear significantly slower than previous reports for analogous visuomotor mismatches in V1 (by the same group, using the same methods) and even in comparison to the multimodal mismatches within this study (Figure 4C). What might explain this sustained activity? Is it due to a sustained change in the animal's running in response to the auditory mismatch?

(2) Insufficient analysis and discussion of running onset responses during audiomotor sessions: The authors report strong running-onset responses during open-loop in identified mismatch neurons. They also highlight that these responses are in agreement with their model of subtractive prediction error, which relies on subtracting the bottom-up sensory evidence from top-down motor-related predictions. I agree, and, thus, assume that running-onset responses during the open loop in identified 'mismatch' neurons reflect the motor-related predictions of sensory input that the animal has learned to expect. If this is true, one would expect that such running-onset responses should dampen during closed-loop, when sensory evidence matches expectations and therefore cancels out this prediction. It would be nice if the authors test this explicitly by analyzing the running-related activity of the same neurons during closed-loop sessions.

(3) Ambiguity in the interpretation of responses in visuomotor sessions.

Unlike for auditory stimuli, the authors show that there are no obvious responses to visuomotor mismatches or playback halts in the auditory cortex. However, the interpretation of these results is somewhat complicated by the uncertainty related to the training history of these mice. Were these mice exclusively trained on the visuomotor version of the task or also on the auditory version? I could not find this info in the Methods. From the legend for Figure 4D, it appears that the same mice were trained on all versions of the task. Is this the case? If yes, what was the training sequence? Were the mice first trained on the auditory and then the visual version?

The training history of the animals is important to outline the nature of the predictions and mismatch responses that one should expect to observe in the auditory cortex during visuomotor sessions. Depending on whether the mice in Figure 3 were trained on visual only or both visual and auditory tasks, the open-loop running onset responses may have different interpretations.

a) If the mice were trained only on the visual task, how should one interpret the strong running onset responses in the auditory cortex? Are these sensorimotor predictions (presumably of visual stimuli) that are conveyed to the auditory cortex? If so, what may be their role?

b) If the mice were also trained on the auditory version, then a potential explanation of the running-onset responses is that they are audiomotor predictions lingering from the previously learned sensorimotor coupling. In this case, one should expect that in the visual version of the task, these audiomotor predictions (within the auditory cortex) would not get canceled out even during the closed-loop periods. In other words, mismatch neurons should constantly be in an error state (more active) in the closed-loop visuomotor task. Is this the case?

If so, how should one then interpret the lack of a 'visuomotor mismatch' aligned to the visual halts, over and above this background of continuous errors?

As such, the manuscript would benefit from clearly stating in the main text the experimental conditions such as training history, and from discussing the relevant possible interpretations of the responses.

(4) Ambiguity in the interpretation of responses in multimodal versus unimodal sessions.

The authors show that multimodal (auditory + visual) mismatches trigger stronger responses than unimodal mismatches presented in isolation (auditory only or visual only). Further, they find that even though visual mismatches by themselves do not evoke a significant response, co-presentation of visual and auditory stimuli non-linearly augments the mismatch responses suggesting the presence of non-hierarchical interactions between various predictive processing streams.

In my opinion, this is an important result, but its interpretation is nuanced given insufficient details about the experimental design. It appears that responses to unimodal mismatches are obtained from sessions in which only one stimulus is presented (unimodal closed-loop sessions). Is this actually the case? An alternative and perhaps cleaner experimental design would be to create unimodal mismatches within a multimodal closed-loop session while keeping the other stimulus still coupled to the movement.

Given the current experimental design (if my assumption is correct), it is unclear if the multimodal potentiation of mismatch responses is a consequence of nonlinear interactions between prediction/error signals exchanged across visual and auditory modalities. Alternatively, could this result from providing visual stimuli (coupled or uncoupled to movement) on top of the auditory stimuli? If it is the latter, would the observed results still be evidence of non-hierarchical interactions between various predictive processing streams?

Along the same lines, it would be interesting to analyze how the coupling of visual as well as auditory stimuli to movement influences responses in the auditory cortex in closed-loop in comparison to auditory-only sessions. Also, do running onset responses change in open-loop in multimodal vs. unimodal playback sessions?
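The non-linear augmentation at issue in point (4) can be expressed as a per-neuron comparison between the multimodal mismatch response and the linear sum of the two unimodal responses. The sketch below uses synthetic data and is not the authors' analysis: the names `supralinearity_index` and `bootstrap_ci`, the bootstrap settings, and all simulated values are assumptions for illustration. It does not resolve the design confound raised above, since it takes the unimodal responses as given regardless of the session in which they were measured.

```python
import numpy as np

def supralinearity_index(multimodal, audio_only, visual_only):
    """Per-neuron multimodal response minus the linear sum of unimodal responses."""
    multimodal = np.asarray(multimodal, dtype=float)
    linear_sum = np.asarray(audio_only, dtype=float) + np.asarray(visual_only, dtype=float)
    return multimodal - linear_sum

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean across neurons."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample neurons with replacement, n_boot times
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    low, high = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return low, high

# Synthetic illustration only (hypothetical values, not data from the study)
rng = np.random.default_rng(1)
n_neurons = 300
audio = rng.normal(1.0, 0.3, n_neurons)      # audiomotor mismatch response
visual = rng.normal(0.0, 0.3, n_neurons)     # visuomotor mismatch response (no mean effect)
multi = 1.5 * audio + visual + rng.normal(0.0, 0.3, n_neurons)

index = supralinearity_index(multi, audio, visual)
ci_low, ci_high = bootstrap_ci(index)
```

A confidence interval that excludes zero on the positive side would indicate a supralinear multimodal response under these assumptions. Applying the same comparison to unimodal mismatches created within a multimodal closed-loop session would separate the interaction from the mere presence of the second stimulus.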

Minor concerns and comments:

(1) Rapid learning of audiomotor mismatches: It is interesting that auditory mismatches are present even on day 1 and do not appear to get stronger with learning (same on day 2). The authors comment that this could be because the coupling is learned rapidly (line 110). How does this compare to the rate at which visuomotor coupling is learned? Is this rapid learning also observable in the animal's behavior, i.e., is there a change in running speed in response to the mismatch?

(2) The authors should clarify whether the sound and running onset responses of the auditory mismatch neurons in Figure 2E were acquired during open-loop. This is most likely the case, but explicitly stating it would be helpful.

(3) In lines 87-88, the authors state 'Visual responses also appeared overall similar but with a small increase in strength during running ...'. This statement would benefit from clarification. From Figure S1 it appears that when the animal is sitting there are no visual responses in the auditory cortex. But when the animal is moving, small positive responses are present. Are these actually 'visual' responses - perhaps a visual prediction sent from the visual cortex to the auditory cortex that is gated by movement? If so, are they modulated by features of visual stimuli, e.g., contrast or intensity? Or, do these responses simply reflect motor-related activity (running)? Would they be present to the same extent in the same neurons even in the dark?

(4) The authors comment in the text (lines 106-107) about cessation of sound amplitude during audiomotor mismatches as being analogous to halting of visual flow in visuomotor mismatches. However, sound amplitude and visual flow are quite different in nature. In the visuomotor paradigm, the amount of visual stimulation (photons per unit time) does not necessarily change systematically with running speed, whereas in the audiomotor paradigm the SNR of the stimulus itself changes with running speed, which may impact the accuracy of predictions. On a broader note, under natural settings, while the visual flow is coupled to movement, sound amplitude may vary more idiosyncratically with movement.

Perhaps such differences might explain why, unlike in visual cortex experiments, running speed does not affect the strength of playback responses in the auditory cortex.

https://doi.org/10.7554/eLife.95398.1.sa1

Reviewer #3 (Public Review):

This study explores sensory prediction errors in the sensory cortex. It focuses on the question of how these signals are shaped by non-hierarchical interactions, specifically multimodal signals arising from same-level cortical areas. The authors used 2-photon imaging of mouse auditory cortex in head-fixed mice that were presented with sounds and/or visual stimuli while moving on a ball. First, responses to pure tones, visual stimuli, and movement onset were characterized. Then, the authors made the running speed of the mouse predictive of sound intensity and/or visual flow. Mismatches were created through the interruption of sound and/or visual flow for 1 second while the animal moved, disrupting the expected sensory signal given the speed of movement. As a control, the same sensory stimuli triggered by the animal's movement were presented to the animal decoupled from its movement. The authors suggest that auditory responses to the unpredicted silence reflect mismatch responses. The finding that these mismatch responses were enhanced when the visual flow was congruently interrupted indicates a cross-modal influence on prediction error signals.

This study's strengths are the relevance of the question and the design of the experiment. The authors are experts in the techniques used. The analysis explores neither the full power of the experimental design nor the population activity recorded with 2-photon imaging, leaving open the question of to what extent the responses that the authors call mismatch responses are distinct from sensory responses to sound interruption. The auditory system is sensitive to transitions, and indeed responses to the interruption of the sound are similar in quality, if not quantity, in the predictive and the control situation.

https://doi.org/10.7554/eLife.95398.1.sa0