Passive exposure to task-relevant stimuli enhances categorization learning

  1. Institute of Neuroscience, University of Oregon, Eugene, OR 97403
  2. Department of Linguistics, University of Oregon, Eugene, OR 97403

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Srdjan Ostojic
    École Normale Supérieure - PSL, Paris, France
  • Senior Editor
    Michael Frank
    Brown University, Providence, United States of America

Reviewer #1 (Public Review):

Schmid et al. investigate the question of how sensory learning in animals and artificial networks is driven both by passive exposure to the environment (unsupervised) and by reinforcing feedback (supervised), and how these two systems interact. They first demonstrate in mice that passive exposure to the same auditory stimuli used in a discrimination task modifies learning and performance in the task. Based on these data, they then tested how the interaction of supervised and unsupervised learning in an artificial network could account for the behavioural results.

Strengths:
The clear behavioural impact of passive exposure to sounds on accelerating learning is a major strength of the paper. Moreover, the observation that passive exposure had a positive impact on learning whether it was prior to the task or interleaved with learning sessions provides interesting constraints for modelling the interaction between supervised and unsupervised learning. A practical fallout for labs performing long training procedures is that the periods of active learning that require water restriction could be reduced by using passive sessions. This could increase both experimental efficiency and animal well-being.

The modelling section clearly exhibits the differences between models and the step-by-step presentation building to the final model provides the reader with a lot of intuition about how supervised and unsupervised learning interact. In particular, the authors highlight situations in which the task-relevant discrimination does not align with the directions of highest variance, thus reinforcing the relevance of their conclusions for the complex structure of sensory stimuli. A great strength of these models is that they generate clear predictions about how neural activity should evolve during the different training regimes that would be exciting to test.

Weaknesses:
The experimental design presented cannot clearly show that the effect of passive exposure was due to the specific exposure to task-relevant stimuli, since there is no control group exposed to irrelevant stimuli. Studies have shown that exposure to a richer sensory environment swiftly (i.e., within days) enhances responses, even in the adult and even when the stimuli differ from those used in the task [1-3]. The authors conclude that their network models "build latent representations of features that are determined by statistical properties of the input distribution, as long as those features aid the decoding of task-relevant variables" (line 339, my emphasis). This conclusion, and therefore the link of behaviour to the models, is weakened by the lack of a direct test of whether task-relevant stimuli need to be presented.

The conclusion that "passive exposure influences responses to sounds not used during training" (line 147) does not seem fully supported by the authors' analysis. The authors show that there is an increase in accuracy for intermediate sweep speeds despite the fact that this is the first time the animals encounter them in the active session. However, it seems impossible to exclude that this effect is simply due to the increased accuracy on the extreme sounds that the animals had been trained on. For example, simply prolonging learning in stage 3 is likely to increase accuracy across sounds at stage 4; passive sessions may be mimicking this effect. Moreover, the authors point out that there is no effect on the slope of the psychometric curve. Such a sharpening would be predicted if the passive presentations were indeed enhancing intermediate sound representations, making them more precise and more discriminable.

In the modelling section, the authors adjusted the hyper-parameters to maximize the difference between pure active and passive/active learning. This makes a comparison of learning rates between models somewhat confusing, raising the question of whether the differences highlight an interaction between the two types of learning or simply reflect parameter choices. For example:

- Figure 5: although in model 3 passive listening enhances learning relative to the pure active condition, learning in the active condition is overall much slower than in model 2. This raises the question of whether the addition of unsupervised rules makes the models better able to exploit passive exposure, but at the cost of efficient active learning.

- Figure 6 & 7: model 5 only differs from model 4 by the addition of supervised learning at layer 1 and the use of what should be a harder task (stimuli spread over the first PCs) however model 5 clearly has much better performance for the P: A condition which is surprising given that the unsupervised and supervised learning periods are clearly separated.

1. Mandairon, N., Stack, C. & Linster, C. Olfactory enrichment improves the recognition of individual components in mixtures. Physiol. Behav. 89, 379-384 (2006).
2. Alwis, D. S. & Rajan, R. Environmental enrichment and the sensory brain: The role of enrichment in remediating brain injury. Front. Syst. Neurosci. 8, 1-20 (2014).
3. Polley, D. B., Kvašňák, E. & Frostig, R. D. Naturalistic experience transforms sensory maps in the adult cortex of caged animals. Nature 429, 67-71 (2004).

Reviewer #2 (Public Review):

Schmid et al. present a lovely study looking at the effect of passive auditory exposure on learning a categorization task.

The authors utilize a two-alternative choice task where mice have to discriminate between upward- and downward-moving frequency sweeps. Once mice learn to discriminate easy stimuli, the task is made psychometric and additional intermediate stimuli are introduced (as is standard in the literature). The authors introduce an additional two groups of animals, one that was passively exposed to the task stimuli before any behavioral shaping, and one that had passive exposure interleaved with learning. The major behavioral finding is that passive exposure to sounds improves learning speed. The authors show this in a number of ways through linear fits to the learning curves. Additionally, by breaking down performance based on the "extreme" vs "psychometric" stimuli, the authors show that passive exposure can influence responses to sounds that were not present during the initial training period. One limitation here is that the presented analysis is somewhat simplistic, does not include any detailed psychometric analysis (bias, lapse rates, etc.), and primarily focuses on learning speed. Ultimately, though, the behavioral results are interesting and seem supported by the data.
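
To make the suggested analysis concrete: below is a minimal sketch, in Python, of fitting a psychometric function with bias, slope, and lapse parameters. The logistic form with two lapse rates is a standard choice in the literature; the data and parameter values here are illustrative, not taken from the study.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, bias, slope, lapse_low, lapse_high):
    """P(choose 'up') as a logistic with bias, slope, and two lapse rates."""
    core = 1.0 / (1.0 + np.exp(-slope * (x - bias)))
    return lapse_low + (1.0 - lapse_low - lapse_high) * core

# Illustrative data: fraction of 'up' choices at each stimulus value
# (e.g., FM slope); not values from the study.
x = np.linspace(-1.0, 1.0, 7)
p_obs = np.array([0.06, 0.10, 0.28, 0.55, 0.74, 0.90, 0.95])

popt, _ = curve_fit(psychometric, x, p_obs,
                    p0=[0.0, 4.0, 0.05, 0.05],
                    bounds=([-1.0, 0.1, 0.0, 0.0], [1.0, 20.0, 0.5, 0.5]))
bias, slope, lapse_low, lapse_high = popt
print(f"bias={bias:.2f}, slope={slope:.2f}, "
      f"lapses=({lapse_low:.2f}, {lapse_high:.2f})")
```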

To investigate the neural mechanisms that may underlie their behavioral findings, the authors turn to a family of artificial neural network models and evaluate the consequences of different learning algorithms and schedules, network architectures, and stimulus distributions on the learning outcomes. The authors work through five different architectures that fail to recapitulate the primary behavioral findings before settling on a final model, utilizing a combination of supervised and unsupervised learning, that was capable of reproducing the key aspects of the experiments. Ultimately, the behavioral results presented are consistent with network models that build latent representations of task-relevant features that are determined by statistical properties of the input distribution.

Reviewer #3 (Public Review):

Summary of Authors' Results/Intended Achievements:
The authors were trying to ascertain the underlying learning mechanisms and network structure that could explain their primary experimental finding: passive exposure to a stimulus (independent of when the exposure occurs) can lead to improvements in active (supervised) learning. They modeled their task with five progressively more complex shallow neural networks classifying vectors drawn from multivariate Gaussian distributions.
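
To make this modelling setup concrete, here is a minimal sketch of a two-layer shallow network classifying samples from two multivariate Gaussians, with a Hebbian (Oja) update on the first layer standing in for passive exposure and a supervised delta rule on the readout standing in for active trials. The dimensions, learning rates, and trial schedule are illustrative assumptions, not the authors' actual Models 1-5.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 10, 5                         # input and hidden sizes (illustrative)
mu = np.zeros(d); mu[0] = 1.0        # class means at +/- mu
W1 = rng.normal(scale=0.1, size=(h, d))   # input -> hidden weights
w2 = rng.normal(scale=0.1, size=h)        # hidden -> readout weights

def stimulus(label):
    """Draw a sample from the Gaussian for class label = -1 or +1."""
    return rng.multivariate_normal(label * mu, np.eye(d))

for trial in range(4000):
    label = rng.choice([-1, 1])
    x = stimulus(label)
    hid = W1 @ x
    if trial % 2 == 0:
        # "Passive" trial: unsupervised Oja update on the first layer only.
        W1 += 1e-3 * (np.outer(hid, x) - (hid ** 2)[:, None] * W1)
    else:
        # "Active" trial: supervised delta rule on the readout only.
        y = np.tanh(w2 @ hid)
        w2 += 1e-2 * (label - y) * hid

# Accuracy on fresh samples.
labels = rng.choice([-1, 1], size=500)
acc = np.mean([np.sign(w2 @ (W1 @ stimulus(l))) == l for l in labels])
print(f"test accuracy ~ {acc:.2f}")
```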

Account of Major Strengths:
Overall, the experimental findings were interesting, albeit not necessarily novel. The modelling was also appropriate, with a solid attempt at matching the experimental conditions to simplified network models.

Account of Major Weaknesses:
I would say there are two major weaknesses of this work. The first is that even Model 5 differs from their data. For example, the A+P (passive interleaved condition) learning curve in Figure 7 seems to be non-monotonic, and has some sort of complex eigenvalue in its decay to the steady-state performance as trials increase. This wasn't present in their experimental data (Figure 2D), and implies a subtle but important difference. There also appear to be differences in how quickly the initial learning (during early trials) occurs for the A+P and A:P conditions. While both the A+P and A:P conditions learn faster than the A-only condition in M5, A+P and A:P seem to learn in different ways, which isn't supported by their data. The second major weakness is that the authors also don't generate any predictions with M5. Can they test this model of learning somehow in follow-up behavioural experiments in mice?

Discussion of Likely Impact:
Without follow-up experiments to test their mechanism of why passive exposure helps in a schedule-independent way, the impact of this paper will be limited.

Additional Context:
I believe the authors need to place this work in the context of the large existing literature on interactions between passive (unsupervised) and active (supervised) learning. This field is broad, both experimentally and computationally. For example, there is an entire sub-field of machine learning, called semi-supervised learning, that is not mentioned at all in this work.
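
For context, one of the simplest semi-supervised schemes is self-training (pseudo-labeling): a model trained on the few labeled examples assigns labels to unlabeled data it is confident about, then retrains on both. A minimal sketch follows; the classifier, toy data, and confidence threshold are illustrative choices, not a claim about how the authors' models should be extended.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Toy data: two Gaussian classes, a few labeled points, many unlabeled.
X_lab = np.vstack([rng.normal(-1, 1, (5, 2)), rng.normal(1, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])

clf = LogisticRegression().fit(X_lab, y_lab)
for _ in range(5):  # self-training iterations
    proba = clf.predict_proba(X_unl)
    confident = proba.max(axis=1) > 0.9   # keep confident pseudo-labels only
    X_aug = np.vstack([X_lab, X_unl[confident]])
    y_aug = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    clf = LogisticRegression().fit(X_aug, y_aug)

print(f"pseudo-labeled {confident.sum()} of {len(X_unl)} unlabeled points")
```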

Author Response

We thank the reviewers for their insightful reviews of our work, including both its strengths and limitations. Below we present minor corrections to the preprint and responses to the main points brought up by each reviewer.

Erratum:

  • Line 330 should refer to Fig. 7F (instead of 7D).

  • Line 331 should refer to Fig. 7G (instead of 7E).

Reviewer #1 (Public Review):

The experimental design presented cannot clearly show that the effect of passive exposure was due to the specific exposure to task-relevant stimuli, since there is no control group exposed to irrelevant stimuli.

We acknowledge the possibility that exposure to task-irrelevant stimuli could result in improvements in learning. Testing this possibility would be a worthwhile goal of future experiments, but it is outside the scope of our current study. We have been careful in our paper to only draw conclusions about the effects of exposure to task-relevant stimuli compared to no exposure. We will also add a discussion of this point and references to the literature pointed out by the reviewer to the final version of our manuscript.

The conclusion that "passive exposure influences responses to sounds not used during training" (line 147) does not seem fully supported by the authors' analysis. The authors show that there is an increase in accuracy for intermediate sweep speeds despite the fact that this is the first time the animals encounter them in the active session. However, it seems impossible to exclude that this effect is simply due to the increased accuracy on the extreme sounds that the animals had been trained on.

The conclusion that the reviewer quotes from our paper is drawn from Figure 3, in which we show that mice exhibit an improvement on non-extreme stimuli after training on extreme stimuli. Panel 3D illustrates that the observed improvements are not just changes in psychometric performance driven by the extreme sounds. In the context of this result, the conclusion relates to generalization in performance on task-relevant stimuli that are closely related to the training stimuli. In our view, it was not entirely obvious a priori that this result would have to occur, since it is possible that performance could improve at the extremes without improving at the intermediate stimuli.

In the modelling section, the authors adjusted the hyper-parameters to maximize the difference between pure active and passive/active learning. This makes a comparison of learning rates between models somewhat confusing.

We apologize for the confusion. None of our conclusions are based on comparisons of learning speed between models, but perhaps this was not pointed out sufficiently clearly. The relevant comparisons between conditions for each specific model are made using the same hyperparameters. We will clarify this in the updated version of our manuscript.

The description of the sounds does not state whether, when reducing the slope of the sweeps, the center frequency or the onset frequency of the sounds is preserved.

Frequency-modulated sounds of different FM slopes were generated such that the center frequency was always the same. This will be clarified in the updated version of our manuscript.
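
For illustration, here is a minimal sketch of how log-frequency FM sweeps with a shared (geometric) center frequency might be synthesized; the sampling rate, duration, and slope values are hypothetical, not the study's actual stimulus parameters.

```python
import numpy as np

def fm_sweep(f_center, slope_oct_per_s, duration, fs=44100):
    """Log-frequency FM sweep whose instantaneous frequency equals
    f_center (in Hz) at the temporal midpoint of the sound."""
    t = np.arange(int(duration * fs)) / fs
    f_inst = f_center * 2.0 ** (slope_oct_per_s * (t - duration / 2))
    phase = 2 * np.pi * np.cumsum(f_inst) / fs   # integrate frequency
    return np.sin(phase)

# Sweeps of different slopes sharing the same center frequency.
slow_up = fm_sweep(11000, +2.0, duration=0.1)    # shallow upward sweep
fast_down = fm_sweep(11000, -8.0, duration=0.1)  # steep downward sweep
```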

Reviewer #2 (Public Review):

One limitation here is that the presented analysis is somewhat simplistic, does not include any detailed psychometric analysis (bias, lapse rates, etc.), and primarily focuses on learning speed.

In our analyses of trials that included extreme and intermediate stimuli, we investigated some metrics of the type that the reviewer suggests here. However, since such additional psychometric analyses generally led to null results and would in any case be somewhat tangential to our main results, which are about learning speed and responses to sounds not included during training, we did not include these in our manuscript. A limitation of our study is that the available data does not allow for an analysis of psychometrics during the initial learning stages, since only the extreme stimuli were presented during the task.

Reviewer #3 (Public Review):

The first [major weakness] is that even Model 5 differs from their data. For example, the A+P (passive interleaved condition) learning curve in Figure 7 seems to be non-monotonic, and has some sort of complex eigenvalue in its decay to the steady-state performance as trials increase. This wasn't present in their experimental data (Figure 2D), and implies a subtle but important difference. There also appear to be differences in how quickly the initial learning (during early trials) occurs for the A+P and A:P conditions. While both the A+P and A:P conditions learn faster than the A-only condition in M5, A+P and A:P seem to learn in different ways, which isn't supported by their data.

The reviewer is correct that there are subtle differences between the two learning curves produced by Model 5. Given the noise in the experimental data, however, it is possible that such subtle distinctions also appear in the learning curves of the mice. Further, the slight overshoot of the learning curve that the reviewer mentions is not constrained by the experimental data, because different mice reach asymptotic performance at different times, and many of them have not reached asymptotic performance even by the end of the training period.
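
To unpack the reviewer's "complex eigenvalue" remark: a learning curve that relaxes monotonically behaves like a real exponential, whereas one that overshoots its asymptote and settles behaves like a damped oscillation, as arises when the linearized learning dynamics have complex-conjugate eigenvalues. A sketch with made-up parameter values:

```python
import numpy as np

t = np.linspace(0, 2000, 500)        # trials (illustrative)
p_inf, A, tau = 0.9, 0.4, 400.0      # asymptote, initial gap, time constant

# Real eigenvalue: monotonic approach to the asymptote (as in the mouse data).
p_mono = p_inf - A * np.exp(-t / tau)

# Complex-conjugate eigenvalues: damped oscillation that overshoots the
# asymptote before settling (the feature noted in Model 5's A+P curve).
omega = 2 * np.pi / 1500.0
p_osc = p_inf - A * np.exp(-t / tau) * np.cos(omega * t)
```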

However, even if there are minor discrepancies between the learning curves produced by the final version of the model and by the mice, we do not see this as especially surprising or problematic. As with any model, there are a large number of potentially important features that are not included: for example, realistic spectrotemporal neural responses, nonlinearity in neural activations, heterogeneity across mice, and many others. The aim of our modeling was to choose a space of possible models (which is inevitably restricted) and show which model version within that space best captures our experimental observations. Expanding the space of possible models to capture further nuances in the data will be a task for future work.

The second major weakness is that the authors also don't generate any predictions with M5. Can they test this model of learning somehow in follow-up behavioural experiments in mice? ... Without follow-up experiments to test their mechanism of why passive exposure helps in a schedule-independent way, the impact of this paper will be limited.

Although testing behavioral predictions from our models was beyond the scope of the current study, we do generate specific predictions with M5 about neural representations and the ways in which they evolve through learning, and we hope to test these predictions in future work.

I believe the authors need to place this work in the context of the large existing literature on interactions between passive (unsupervised) and active (supervised) learning. This field is broad, both experimentally and computationally. For example, there is an entire sub-field of machine learning, called semi-supervised learning, that is not mentioned at all in this work.

We thank the reviewer for pointing this out. The updated version of our manuscript will include a discussion on how our results fit in with this literature.
