Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
(1) The network they propose is extremely simple. This simplicity has pros and cons: on the one hand, it is nice to see the basic phenomenon exposed in the simplest possible setting. On the other hand, it would also be reassuring to check that the mechanism is robust when implemented in a more realistic setting, using, for instance, a network of spiking neurons similar to the one they used in the 2008 paper. The more noisy and heterogeneous the setting, the better.
The choice of a minimal model to illustrate our hypothesis is deliberate. Our main goal was to suggest a physiologically-grounded mechanism to rapidly encode temporally-structured information (i.e., sequences of stimuli) in Working Memory, where none was available before. Indeed, as discussed in the manuscript, previous proposals were unsatisfactory in several respects. In view of our main goal, we believe that a spiking implementation is beyond the scope of the present work.
We would like to note that the mechanism originally proposed in Mongillo et al. (2008) has been repeatedly implemented, by many different groups, in various spiking network models with different levels of biological realism (see, e.g., Lundqvist et al. (2016), for an especially ‘detailed’ implementation) and, in all cases, the relevant dynamics have been observed. We take this as an indication of ‘robustness’: the relevant network dynamics do not critically depend on many implementation details and, importantly, these dynamics are qualitatively captured by a simple rate model (see, e.g., Mi et al. (2017)).
In the present work, we make a relatively ‘minor’ (from a dynamical point of view) extension of the original model, i.e., we just add augmentation. Accordingly, we are fairly confident that a set of parameters for the augmentation dynamics can be found such that the spiking network behaves, qualitatively, like the rate model. A meaningful study, in our opinion, would then require extensively testing the (large) parameter space (different models of augmentation?) to see how the network behavior compares with the relevant experimental observations (which ones? Behavioral? Physiological?). As said above, we believe that this is beyond the scope of the present work.
This being said, we definitely agree with the reviewer that not presenting a spiking implementation is a limitation of the present work. We now clearly acknowledge this limitation by adding the following paragraph to the Discussion.
“To illustrate our theory in a simple setting, we used a minimal model network that neglects many physiological details. This, however, constitutes a limitation of the present study. It would be reassuring to see that the mechanism we propose here is robust enough to reliably operate also in spiking networks, in the presence of heterogeneity in both single-cell and synaptic properties. While we are fairly confident that this is the case, a spiking implementation of our model is beyond the scope of the present study and will be addressed in the future. Also, because of the simplicity of the model network, a comparison between the model behavior and the electrophysiological observations cannot be completely direct. Nevertheless the model qualitatively accounts for a diverse set of experimental data”.
(2) One major issue with the population spike scenario is that (to my knowledge) there is no evidence that these highly synchronized events occur in delay periods of working memory experiments. It seems that highly synchronized population spikes would imply (a) a strong regularity of spike trains of neurons, at odds with what is typically observed in vivo (b) high synchronization of neurons encoding for the same item (and also of different items in situations where multiple items have to be held in working memory), also at odds with in vivo recordings that typically indicate weak synchronization at best. It would be nice if the authors at least mention this issue, and speculate on what could possibly bridge the gap between their highly regular and synchronized network, and brain networks that seem to lie at the opposite extreme (highly irregular and weakly synchronized). Of course, if they can demonstrate using a spiking network simulation that they can bridge the gap, even better.
Direct experimental evidence (in monkeys) in support of the existence of highly synchronized events -- to be identified with the ‘population spikes’ of our model -- during the delay period of a memory task is available in the literature, i.e., Panichello et al. (2024). In the revised Discussion, we provide a short discussion of the results of Panichello et al. (2024) and how these results directly relate to our model. We also provide a short discussion of the results of Liebe et al. (2025), which, again, are fully consistent with our model.
We note that there is no fundamental contradiction between highly synchronized events in ‘small’ neural populations (e.g., a cell assembly) on one hand, and temporally irregular (i.e., Poisson-like) spiking at the single-neuron level and weakly synchronized activity at the network level, on the other hand. This was already illustrated in our original publication, i.e., Mongillo et al. (2008) (see, in particular, Fig. S2). We further note that the mechanism we propose to encode temporal order -- a temporal gradient in the synaptic efficacies brought about by synaptic augmentation -- would also work if the memory of the items is maintained by ‘tonic’ persistent activity (i.e., without highly synchronized events), provided this activity occurs at suitably low rates so as to prevent the saturation of the synaptic augmentation.
We have added the following two paragraphs to the Discussion.
“More direct support to this interpretation comes from recent electrophysiological studies [Panichello et al., 2024, Liebe et al., 2025]. By recording large neuronal populations (∼ 300) simultaneously in the prefrontal cortex of monkeys performing a WM task, [Panichello et al., 2024] found that, during the maintenance period, the decoding of the actively held item from neural activity was ‘intermittent’; that is, decoding was only possible during short epochs (∼ 100ms) interleaved with epochs (also ∼ 100ms) where decoding was at chance level. The inability to decode resulted from a loss of selectivity at the population level, with a return of the single-neuron firing rates to their spontaneous (pre-stimulus) activity levels. The transitions between these two activity states (decodable/not-decodable) were coordinated across large populations of neurons in PFC. By recording single-neuron activity in the medial temporal lobe of humans performing a sequential multi-item WM task, [Liebe et al., 2025] found that during maintenance, neurons coding for a given item tended to fire at a specific phase of the underlying theta rhythm, again suggesting that the corresponding neuronal populations reactivate briefly and sequentially. In summary, these experimental results suggest that active memory maintenance relies on brief reactivations of the neural representations of the items, which we identify with the population spikes in our model, and that these reactivations occur sequentially in time, as predicted by our theory”.
“We note that the proposed mechanism would still work if the items were maintained by tonically-enhanced firing rates, instead of population spikes, provided that those firing rates were suitably low. However, obtaining low firing rates in model networks of persistent activity is quite difficult”.
Reviewer #2 (Public review):
The study relates to the well-known computational theory for working memory, which suggests short-term synaptic facilitation is required to maintain working memory, but doesn't rely on persistent spiking. This previous theory appears similar to the proposed theory, except for the change from facilitation to augmentation. A more detailed explanation of why the authors use augmentation instead of facilitation in this paper is warranted: is the facilitation too short to explain the whole process of WM? Can the theory with synaptic facilitation also explain the immediate storage of novel sequences in WM?
In the model, the synaptic dynamics display both short-term facilitation and augmentation (and short-term depression). Indeed, synaptic facilitation, alone, would be too short-lived to encode novel sequences. This is illustrated in Fig. 1B.
We provide a discussion of this important point, by adding the following paragraph to the Results section.
“If augmentation was the only form of synaptic plasticity present in the network, the encoding of an item in WM would require long presentation times, or alternatively high firing rates upon presentation, precisely because K_A is small. Instead, rapid encoding is made possible by the presence of the short-term facilitation, which builds up significantly faster than augmentation, as U >> K_A . For the same reason, however, the level of facilitation rapidly reaches the steady state; therefore, short-term facilitation alone is unable to encode temporal order (see Fig. 1B). Thus, our model requires the existence of transitory synaptic enhancement on at least two time scales, such that longer decays are accompanied by slower build-ups. Intriguingly, this pattern is experimentally observed [Fisher et al., 1997]”.
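The two-time-scale argument can be illustrated with a minimal sketch. It does not reproduce the model equations of the manuscript: it assumes that each variable follows, independently of the other, a saturating build-up of the form dz/dt = (z0 - z)/tau + gain * (1 - z) * rate, and all parameter values (including the presentation time and rate) are hypothetical, chosen only so that U >> K_A and the augmentation decay is much slower than the facilitation decay.

```python
import math

# Illustrative values only (assumptions, not the manuscript's parameters).
U = 0.3        # facilitation increment per spike
K_A = 0.01     # augmentation increment per spike, with U >> K_A
TAU_F = 1.5    # facilitation decay time constant (s)
TAU_A = 20.0   # augmentation decay time constant (s)


def buildup_fraction(gain, tau, rate, t):
    """Fraction of the distance from baseline to steady state covered after
    t seconds of firing at a constant rate (Hz).

    Assumes dz/dt = (z0 - z) / tau + gain * (1 - z) * rate, which is linear
    in z and therefore relaxes exponentially at the rate 1/tau + gain * rate.
    """
    relaxation_rate = 1.0 / tau + gain * rate
    return 1.0 - math.exp(-relaxation_rate * t)


# A brief presentation: 0.5 s at 20 Hz (hypothetical numbers).
facilitation = buildup_fraction(U, TAU_F, rate=20.0, t=0.5)
augmentation = buildup_fraction(K_A, TAU_A, rate=20.0, t=0.5)
print(f"facilitation: {facilitation:.2f} of the way to steady state")
print(f"augmentation: {augmentation:.2f} of the way to steady state")
```

Under these assumptions the fast variable is close to its steady state by the end of the presentation, whereas the slow variable has covered only a small part of its range and can therefore still reflect elapsed time.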
In Figure 1, the authors mention that synaptic augmentation leads to an increased firing rate even after stimulus presentation. It would be good to determine, perhaps, what the lowest threshold is to see the encoding of a WM task, and whether that is biologically plausible.
We believe that this comment is related to the above point. The reviewer is correct; augmentation alone would require fairly long stimulus presentations to encode an item in WM. ‘Fast’ encoding, indeed, is guaranteed by the presence of short-term facilitation. This important point is now emphasized in the Results; see above.
In the middle panel of Figure 4, after 15-16 sec, when the neuronal population prioritizes with the second retro-cue, although the second retro-cue item's synaptic spike dominates, why is the augmentation for the first retro-cue item higher than the second-cue augmentation until the 20 sec?
This is because of the slow build-up and decay of the augmentation. When the second item is prioritized, and the corresponding neuronal population re-activates, its augmentation level starts to increase. At the same time, as the first item is now de-prioritized and the corresponding neuronal population is now silent, its augmentation level starts to decrease. Because of the ‘slowness’ of both processes (i.e., augmentation build-up and decay), it takes about 5 seconds for the augmentation level of the second item to overcome the augmentation level of the first item.
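The delayed crossover can be sketched numerically. The code below is an illustration under stated assumptions, not the simulation behind Fig. 4: the silent population is assumed to decay exponentially toward a baseline with time constant TAU_A, the reactivating population is assumed to build up as if it fired at a constant time-averaged rate, and every numerical value (including the two starting levels) is hypothetical.

```python
import math

# Illustrative values only (assumptions, not the manuscript's parameters).
A_BAR = 0.1    # baseline augmentation level
TAU_A = 20.0   # augmentation decay time constant (s)
K_A = 0.01     # augmentation increment per spike
RATE = 5.0     # time-averaged rate of the prioritized population (Hz)


def decaying(a0, t):
    """Augmentation of the silent (de-prioritized) population."""
    return A_BAR + (a0 - A_BAR) * math.exp(-t / TAU_A)


def building(a0, t):
    """Augmentation of the reactivating (prioritized) population."""
    relaxation_rate = 1.0 / TAU_A + K_A * RATE
    a_ss = (A_BAR / TAU_A + K_A * RATE) / relaxation_rate
    return a_ss + (a0 - a_ss) * math.exp(-relaxation_rate * t)


def crossover_time(a_first, a_second, t_max=60.0, tol=1e-6):
    """Bisection for the time at which the second item overtakes the first."""
    def gap(t):
        return building(a_second, t) - decaying(a_first, t)

    lo, hi = 0.0, t_max
    if gap(lo) >= 0.0 or gap(hi) <= 0.0:
        raise ValueError("no crossover inside [0, t_max]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_cross = crossover_time(a_first=0.5, a_second=0.3)
print(f"crossover after {t_cross:.1f} s")
```

With these hypothetical values the crossover takes a few seconds, simply because both the build-up and the decay are governed by time constants of many seconds.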
We note that the slow time scales of the augmentation dynamics, consistent with experimental observations, are necessary for our mechanism to work; see above.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
(1) Line 46 identify -> identity.
(2) Line 207 scale -> scales.
Fixed. Thank you.
(3) Lines 222-224 what about behavioral time-scale plasticity? This type of plasticity can apparently be induced very quickly.
We have removed the corresponding paragraph.
(4) Line 231 identification of `gamma bursts' with population spikes: These two phenomena seem to be very different - one can be weakly synchronized and can be consistent with highly irregular activity, while it is not clear whether the other can (see major issue 2). Also, it seems that population spikes occur at frequencies that are an order of magnitude lower than gamma.
We have rewritten the corresponding paragraph and we now rely on more direct electrophysiological evidence (i.e., on the simultaneous recording of large neuronal populations) to identify putative population spikes; see above.
Reviewer #2 (Recommendations for the authors):
(1) On page 7, the behavioral study of Rose et al. (2016) is quite important for readers to understand the 'low-activity regime', and to fully appreciate Figure 4, it would be beneficial to explain that study in greater detail.
We have added a panel to Fig. 4, and accompanying text in the caption, to better illustrate the main task events in the experiment of Rose et al. (2016).
(2) Line 17: "wrong order", but wrong timing matters too
Definitely, depending on the task. Specifically, in our example, timing is immaterial.
(3) Line 33-34: "special training", what is considered special? One could argue that the number of trials needed to learn, depending on the TI timing, is special, depending on the task.
We have removed the sentence, as it was apparently confusing. We simply meant that ‘naive’ human subjects can perform the task (e.g., serial recall); that is, they didn’t undergo any kind of practice that can be construed as ‘training’.
(4) Line 40-41: but timing is also part of working memory processing. Perhaps it can be merged with the next sentence.
We have merged the two sentences.
(5) Line 53: Is the implication here that what happens in the synapses is what drives WM, and not just that the neurons stay persistently on?
Yes. The idea is that information can be maintained in the synaptic facilitation level, without enhanced spiking activity. Reading-out and refreshing the memory contents, however, requires neuronal activity. We explain this in some detail in the next paragraph (i.e., lines 60-65 in the revised submission).
(6) Line 102: could a lack of excitatory activity be explained by inhibitory signaling? It appears the inhibitory component is quite understated here.
Here we are just defining A-bar; according to Eq. (6), if r_a is 0 (i.e., no synaptic activity, for whatever reason), then A_a will converge to A-bar after a time much longer than \tau_A (i.e., a long period). We have rephrased the sentence to improve clarity.
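The convergence can be made explicit with a short worked equation. The form below is an assumption: it supposes that, for r_a = 0, Eq. (6) reduces to a first-order relaxation with time constant \tau_A, and A_a(0) denotes the (hypothetical) augmentation level at the moment synaptic activity ceases.

```latex
\tau_A \frac{dA_a}{dt} = \bar{A} - A_a
\quad\Longrightarrow\quad
A_a(t) = \bar{A} + \bigl(A_a(0) - \bar{A}\bigr)\, e^{-t/\tau_A},
\qquad
A_a(t) \to \bar{A} \ \text{for}\ t \gg \tau_A .
```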
(7) Line 158-172: please consider revising this paragraph for a more general audience.
We have rewritten this paragraph to improve clarity. For the same purpose, we have also slightly modified Fig. 3.
(8) Line 227: it would seem this is due to a singular inhibitory group making the model highly dependent on the excitatory groups.
We are not sure that we understand this comment. Here, we are just saying that if the item-coding populations don’t reactivate during the maintenance period (i.e., activity-silent regime) then the augmentation gradient cannot build up. If, on the other hand, the item-coding populations are constantly active at high rates during the maintenance period (i.e., persistent-activity regime) then the augmentation levels will rapidly saturate and, again, there will be no augmentation gradient. This is independent of how ‘silence’ or ‘activity’ of the item-coding populations is determined by the interplay of excitation and inhibition.
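The dependence of the gradient on the activity regime can be illustrated with a minimal sketch. It is not the network model of the manuscript: it assumes that each item-coding population fires at the same constant, time-averaged rate from the moment its item is loaded, that augmentation follows dA/dt = (A_BAR - A)/TAU_A + K_A * (1 - A) * rate, and that the gradient is the spread of augmentation levels at read-out. All parameter values, onset times, and rates are hypothetical.

```python
import math

# Illustrative values only (assumptions, not the manuscript's parameters).
A_BAR = 0.1    # baseline augmentation level
TAU_A = 20.0   # augmentation decay time constant (s)
K_A = 0.01     # augmentation increment per spike


def augmentation_after(duration, rate):
    """Augmentation reached after `duration` seconds at a constant
    time-averaged rate (Hz), starting from the baseline A_BAR."""
    relaxation_rate = 1.0 / TAU_A + K_A * rate
    a_ss = (A_BAR / TAU_A + K_A * rate) / relaxation_rate
    return a_ss + (A_BAR - a_ss) * math.exp(-relaxation_rate * duration)


def gradient(rate, onsets=(0.0, 1.0, 2.0, 3.0), readout=10.0):
    """Spread of augmentation levels across items loaded at `onsets` (s)
    and all active at the same rate until `readout` (s)."""
    levels = [augmentation_after(readout - onset, rate) for onset in onsets]
    return max(levels) - min(levels)


for label, rate in [("silent", 0.0), ("low rate", 5.0), ("high rate", 100.0)]:
    print(f"{label:>9}: gradient = {gradient(rate):.4f}")
```

In this sketch the spread vanishes when the populations are silent, is close to zero when they fire at a high rate (all levels sit near saturation), and is appreciable only in the intermediate, low-rate case.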
(9) Line 284: this would certainly be an interesting take, but it isn't clear that the model proved this type of decoupling of the temporal aspect of the recall.
This is an ‘educated’ speculation, based on the model and on a specific interpretation of some experimental results, as discussed in the paper and, in particular, in the last paragraph of the Discussion. We believe that the phrasing of the paragraph makes clear that this is, indeed, a speculation.