
A theory of working memory without consciousness or sustained activity

  1. Darinka Trübutschek (corresponding author)
  2. Sébastien Marti
  3. Andrés Ojeda
  4. Jean-Rémi King
  5. Yuanyuan Mi
  6. Misha Tsodyks
  7. Stanislas Dehaene
  1. Ecole des Neurosciences de Paris Ile-de-France, 15 rue de l'Ecole de médecine, France
  2. Université Pierre et Marie Curie, 4 Place Jussieu, France
  3. Cognitive Neuroimaging Unit, CEA DSV/I2BM, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, France
  4. University of Oxford, United Kingdom
  5. New York University, United States
  6. Frankfurt Institute for Advanced Studies, Germany
  7. Brain Science Center, Institute of Basic Medical Sciences, China
  8. Weizmann Institute of Science, Israel
  9. Columbia University, United States
  10. Collège de France, 11 Place Marcelin Berthelot, France
Research Article
Cite as: eLife 2017;6:e23871 doi: 10.7554/eLife.23871

Abstract

Working memory and conscious perception are thought to share similar brain mechanisms, yet recent reports of non-conscious working memory challenge this view. Combining visual masking with magnetoencephalography, we investigate the reality of non-conscious working memory and dissect its neural mechanisms. In a spatial delayed-response task, participants reported the location of a subjectively unseen target above chance-level after several seconds. Conscious perception and conscious working memory were characterized by similar signatures: a sustained desynchronization in the alpha/beta band over frontal cortex, and a decodable representation of target location in posterior sensors. During non-conscious working memory, such activity vanished. Our findings contradict models that identify working memory with sustained neural firing, but are compatible with recent proposals of ‘activity-silent’ working memory. We present a theoretical framework and simulations showing how slowly decaying synaptic changes allow cell assemblies to go dormant during the delay, yet be retrieved above chance-level after several seconds.

https://doi.org/10.7554/eLife.23871.001

eLife digest

Many everyday activities require you to store information in your brain for immediate use. For example, imagine that you are cooking a meal: You have to remember the ingredients, add them in the correct order, and operate the stove. This ability is called working memory.

Researchers have long believed that, whenever we store information in our working memory, we are conscious of that information. That is, if someone asks you, you can report the information. Scientists usually also think that working memory comes with constant brain activity. This means that for as long as you have to remember something, the cells in your brain that code for that information will be active.

Trübutschek et al. now show that we can sometimes store information in working memory without being conscious of it and without the need for constant brain activity. As part of the experiment, a barely visible square-shaped target was briefly flashed in 1 of 20 different locations on a computer screen. Human volunteers had to locate the square and indicate whether they had seen it or not. Importantly, they had to guess the location of the target whenever they had not seen it. While the volunteers performed this task, their brain activity was monitored using magnetoencephalography, a noninvasive technique that captures the magnetic fields created by electrical signals in the brain.

Even when the volunteers had not seen the target, they could often correctly guess where it had been up to four seconds later, more often than would be predicted by chance alone. The experiment ruled out the possibility that this so-called “blindsight” was simply due to the volunteers accidentally reporting not having seen a target, when they had actually seen it. It also excluded the possibility that the volunteers guessed the location long before they had to report it and simply consciously stored that guess. Instead, without the participant knowing, the brain appears to have stored the target location in working memory using parts of the brain near the back of the head that process visual information. Importantly, this non-conscious storage did not come with constant brain activity, but seemed to rely on other, “activity-silent” mechanisms that are hidden to standard recording techniques.

Although Trübutschek et al. show that the brain can unknowingly store information, they did not test other aspects of working memory. Future studies are needed to examine whether the brain can also non-consciously manipulate or use information in its working memory. In addition, future research also needs to investigate the exact mechanism that stores information without constant brain activity.

https://doi.org/10.7554/eLife.23871.002

Introduction

Prominent theories of working memory require information to be consciously maintained (Baars and Franklin, 2003; Baddeley, 2003; Oberauer, 2002). Conversely, influential models of visual awareness hold information maintenance as a key property of conscious perception, highlighting synchronous thalamocortical activity (Tononi and Koch, 2008), cortical recurrence (Lamme and Roelfsema, 2000), or the sustained recruitment of parietal and dorsolateral prefrontal regions (i.e., the same areas as in working memory; Naghavi and Nyberg, 2005) in a global neuronal workspace (Dehaene and Changeux, 2011, 2001). Experimentally, non-conscious priming only lasts a few hundred milliseconds (Dupoux et al., 2008; Greenwald et al., 1996) and unseen stimuli typically fail to induce late and sustained cerebral responses (Dehaene et al., 2014). Conscious perception, in contrast, exerts a durable influence on behavior, accompanied by sustained neural activity (King et al., 2014; Salti et al., 2015; Schurger et al., 2015). The hypothesis of an intimate coupling between conscious perception and working memory is thus grounded in theory and supported by numerous empirical findings.

Recent behavioral and neuroimaging evidence, however, has questioned this prevailing view by suggesting that working memory may also operate non-consciously. Unseen stimuli may influence behavior for several seconds (Bergström and Eriksson, 2015; Soto and Silvanto, 2014). Soto et al. (2011), for instance, showed that participants recalled the orientation of a subjectively unseen Gabor cue above chance-level after a 5s-delay. Functional magnetic resonance imaging suggests that prefrontal activity may underlie such non-conscious working memory (Bergström and Eriksson, 2014; Dutta et al., 2014).

The verdict for non-conscious working memory is far from definitive, however. Delayed performance with subjectively unseen stimuli was barely above chance (Soto et al., 2011) and could have arisen from a small percentage of errors in visibility reports, with subjects miscategorizing a seen target as unseen (miscategorization hypothesis). If this were the case, then the blindsight trials, on which subjects correctly identified the target while denying any subjective awareness of the stimulus, should display similar, if not identical, neural signatures and contents as the seen trials. Alternatively, participants could also have ventured a guess about the target as soon as it appeared and consciously maintained this early guess (conscious maintenance hypothesis). Many priming studies have shown that fast guessing results in above-chance objective performance with subjectively unseen stimuli (Merikle et al., 2001). The observed blindsight effect would then reflect a normal form of conscious working memory (Stein et al., 2016). This alternative hypothesis is hard to eliminate on purely behavioral grounds; it can only be rejected by tracking the dynamics of working memory activity, for instance using brain-imaging, and determining whether this activity occurs immediately after the target even on unseen trials.

Here, we set out to address these issues, focusing on four main objectives: First, we probed the replicability of the long-lasting blindsight effect reported by Soto et al. (2011) as well as its robustness with respect to interference from distraction and a conscious working memory load in order to delineate it from other forms of prolonged iconic or sensory memory. Second, we interrogated the link between conscious perception and conscious working memory, examining whether the maintenance period in working memory could be likened to a prolongation of a conscious episode. Third, we tested the reality of non-conscious working memory by systematically examining the neural correlates of the blindsight effect and using them to assess the above two alternative hypotheses (the miscategorization and conscious maintenance hypotheses). Lastly, we propose a neuronal theory to offer a mechanistic account of conscious and non-conscious working memory.

Results

We combined magnetoencephalography (MEG) with a spatial masking paradigm to assess working memory performance under varying levels of subjective visibility (Figure 1A and Materials and methods). On 80% of the trials, a target square was flashed in 1 of 20 locations and then masked. Subjects were asked to localize the target after a variable delay (2.5–4.0 s) and to rate its visibility on a scale from 1 (not seen) to 4 (clearly seen). On the remaining 20% of trials, the target was omitted, allowing us to contrast brain activity between target-present and -absent trials. A visible distractor square was presented 1.5 s into the delay period on half the trials, challenging participants’ resistance to distraction and enabling us to evaluate the robustness of the blindsight effect behaviorally. In addition to this working memory task, subjects also completed a perception-only control condition without the delay and target-localization periods (perception task), so that we could isolate brain activity specific to conscious perception (without a working memory requirement) and investigate its link with working memory.

Figure 1
General experimental design and behavioral performance in the working memory task.

(A) Experimental design. A subsequently masked target square was flashed in 1 out of 20 positions. Subjects were asked to report this location after a delay of up to 4 s and to rate the visibility of the target on a 4-point scale. A visible distractor square with features otherwise identical to the target was shown on 50% of the trials during the retention period (at 1.75 s). In a perception-only control condition, the maintenance phase and location response were omitted, and subjects assessed the visibility of the target immediately after the mask. (B) Spatial distributions of forced-choice localization performance in the working memory task (experiment 1; 0 = correct target location; positive = clockwise offset). Error bars indicate standard error of the mean (SEM) across subjects. The horizontal, dotted line illustrates chance-level at 5%. Percentages show proportion of target-present trials from a given visibility category. Due to low number of trials in individual visibility ratings 2, 3, and 4, all seen categories were collapsed for analyses.

https://doi.org/10.7554/eLife.23871.003

Behavioral maintenance and shielding against distraction

We first examined objective performance in the working memory task as a function of target visibility. Overall, subjects reported the exact target location with high accuracy on seen trials (collapsed across visibility ratings > 1: Mcorrect = 69.1%, SDcorrect = 17.4%; chance = 5%; t(16) = 15.2, p<0.001, 95% CI = [55.2%, 73.1%]; Cohen’s d = 3.7). As subjective visibility of the target increased from glimpsed (visibility = 2) to clearly seen (visibility = 4), there was a corresponding monotonic increase in accuracy (Figure 1B; ps<0.05 for all pair-wise comparisons). Crucially, performance remained above chance even on unseen trials (rating = 1: Mcorrect = 22.4%, SDcorrect = 13.8%; t(16) = 5.2, p<0.001, 95% CI = [10.3%, 24.4%]; Cohen’s d = 1.3). This blindsight remained substantial after a 4s-delay (Mcorrect = 21.1%, SDcorrect = 14.7%; t(16) = 4.5, p<0.001, 95% CI = [8.5%, 23.7%]; Cohen’s d = 1.1).
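The statistics reported above can be approximately recomputed from the summary values alone. The following is a minimal sketch, not the authors' analysis code. It assumes that n = 17 (inferred from t(16)), that the reported confidence intervals describe the difference between mean accuracy and chance, and that Cohen's d is the mean difference divided by the standard deviation. The function name and the hard-coded critical value are illustrative.

```python
import math

def one_sample_summary(mean, sd, n, chance, t_crit):
    """One-sample t-test against chance, computed from summary statistics."""
    diff = mean - chance                 # mean difference from chance
    sem = sd / math.sqrt(n)              # standard error of the mean
    t = diff / sem                       # t statistic with n - 1 degrees of freedom
    d = diff / sd                        # Cohen's d for a one-sample design
    ci = (diff - t_crit * sem, diff + t_crit * sem)  # CI of the difference
    return t, d, ci

# Unseen trials in experiment 1 (values taken from the text).
# 2.120 is the two-tailed 5% critical value of t with 16 degrees of freedom.
t, d, ci = one_sample_summary(mean=22.4, sd=13.8, n=17, chance=5.0, t_crit=2.120)
print(f"t(16) = {t:.2f}, d = {d:.2f}, CI = [{ci[0]:.1f}%, {ci[1]:.1f}%]")
```

Small deviations from the published interval bounds are expected, because the inputs are themselves rounded.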

Spatial distributions of participants’ responses were concentrated around the target (Figure 2A). To correct for small errors in localization, we computed the rate of correct responding with a tolerance of two positions (±36°) surrounding the target location. In subjects displaying above-chance blindsight (chance = 25%; p<0.05 in a χ2-test; n = 13), we estimated the precision of working memory as the standard deviation of the distribution within this tolerance interval (Materials and methods). Performance was better on seen than on unseen trials, both in terms of rate of correct responding (F(1, 16) = 198.5, p<0.001; partial η2 = 0.925) and precision (F(1, 12) = 36.7, p<0.001; partial η2 = 0.754). There was neither an effect of the distractor on these measures (all ps>0.079), nor any significant interactions between distractor and visibility (all ps>0.251), indicating that distractor presence did not affect retention for seen or unseen targets. Restricting the analyses to trials within one position of the actual target location (±18°) or to the subgroup of 13 subjects included in the MEG analyses did not change these findings qualitatively.
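The tolerance-based scoring described above can be sketched as follows. This is an illustration under stated assumptions: positions are coded as integers 0 to 19 around the circle, and precision is taken as the population standard deviation of the offsets that fall inside the tolerance window (the exact estimator is given in Materials and methods, not here). All names are hypothetical.

```python
import math

N_POSITIONS = 20                       # target locations (from the text)
STEP_DEG = 360 / N_POSITIONS           # 18 degrees between neighbouring positions

def signed_offset(response, target, n=N_POSITIONS):
    """Signed circular offset in positions, in the range [-n/2, n/2)."""
    return (response - target + n // 2) % n - n // 2

def score(responses, targets, tolerance=2):
    """Return (rate of correct responding, precision) for a tolerance window."""
    offsets = [signed_offset(r, t) for r, t in zip(responses, targets)]
    within = [o for o in offsets if abs(o) <= tolerance]
    rate = len(within) / len(offsets)
    if not within:
        return rate, None
    mean = sum(within) / len(within)
    # Assumed definition: SD of the offsets inside the tolerance window.
    sd = math.sqrt(sum((o - mean) ** 2 for o in within) / len(within))
    return rate, sd

# Chance level for a window of +/-2 positions: 5 of 20 locations.
chance = (2 * 2 + 1) / N_POSITIONS     # 0.25
tolerance_deg = 2 * STEP_DEG           # 36 degrees

rate, sd = score(responses=[0, 1, 19, 5, 2], targets=[0, 0, 0, 0, 0])
```

A smaller standard deviation corresponds to a more precise memory representation.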

Figure 2 (with 1 supplement)
Behavioral evidence for non-conscious working memory.

Spatial distributions of responses (0 = correct target location; positive = clockwise offset) as a function of visibility and distractor presence (A), conscious working memory load (B) and delay duration (C). Insets show rate of correct responding (within ±2 positions of actual location) and precision of working memory representation separately for seen and unseen trials. Error bars represent standard error of the mean (SEM) across subjects and horizontal, dotted line indicates chance-level (5%). *p<0.05, **p<0.01, and ***p<0.001 in a paired sample t-test. Del = delay, Dis = distractor, L = load.

https://doi.org/10.7554/eLife.23871.004

While target detection d’ exceeded chance-level (M = 1.5, SD = 0.7; t(16) = 8.9, p<0.001, 95% CI = [1.2, 1.9]; Cohen’s d = 2.1) and correlated with accuracy and the rate of correct responding on seen trials (both Pearson rs > 0.762, both ps<0.001), there was no relationship between our participants’ sensitivity to the target and any of our performance measures on the unseen trials (all Pearson rs < 0.342, all ps>0.179; Figure 2—figure supplement 1A). Thus, target visibility predicted performance in the objective working memory task only on seen trials, but not on unseen trials.
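For readers unfamiliar with d’, the measure is the difference between the z-transformed hit rate and false-alarm rate. The sketch below assumes that hits are target-present trials rated as seen and false alarms are target-absent trials rated as seen; the log-linear correction is a common convention and an assumption here, not necessarily what the authors used.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Detection sensitivity d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (0.5 added per cell) keeps rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf           # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one participant (not data from the study).
example = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
```

A d’ of 0 means that seen ratings were equally likely with and without a target; larger values indicate better detection.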

Overall, these results confirm, with much higher non-conscious performance, the observations of previous studies (Soto et al., 2011): Non-conscious information may be maintained for up to 4 s and successfully shielded against distraction from a salient visual stimulus, independently of overall subjective visibility.

Resistance to conscious working memory load and delay duration

To probe the similarity between conscious working memory and the observed long-lasting blindsight effect, in a second behavioral experiment with 21 subjects, we examined whether imposing a load on conscious working memory (remembering digits) affected non-conscious performance. On each trial, 1 (low load) or 5 (high load) digits were simultaneously shown for 1.5 s, followed by a 1s-fixation period and the same sequence of events (target and mask) as in experiment 1. After a variable delay (0 or 4 s), participants had to (1) localize the target, (2) recall the digits in the correct order, and (3) rate target visibility.

Subjects again chose the exact target position with high accuracy on seen trials (Mcorrect = 77.8%, SDcorrect = 13.9%) and remained above chance on unseen trials (Mcorrect = 25.6%, SDcorrect = 11.8%; chance = 5%; t(18) = 7.6, p<0.001, 95% CI = [14.9%, 26.3%]; Cohen’s d = 1.7). While, as in experiment 1, cue detection d’ was greater than chance (M = 1.7, SD = 0.8; t(20) = 10.2, p<0.001, 95% CI = [1.4, 2.1]; Cohen’s d = 2.2), no correlations were observed with objective task performance on the unseen trials (all Pearson rs < 0.366, all ps>0.115; seen trials: all Pearson rs > 0.443, all ps<0.051; Figure 2—figure supplement 1B). As expected, participants were better at recalling 1 rather than 5 digits in the correct order (M = 93.3% vs. 89.5%, F(1, 17) = 4.7, p=0.045), irrespective of target visibility or delay duration (all ps>0.135).

Analyzing only the trials with correctly recalled digits, we observed an impact of load on the precision with which target location was retained (F(1, 13) = 7.3, p=0.018; partial η2 = .360). Crucially, load modulated the relationship between precision and visibility (interaction F(1, 13) = 8.7, p=0.011; partial η2 = .400), with no effect on seen (t(13) = 0.6, p=0.561) and a strong reduction of precision on unseen trials (t(13) = −3.6, p=0.004). There was no effect of working memory load on the rate of correct responding (all ps>0.229; Figure 2B).
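The partial η2 values reported here can be recovered from the F statistics and their degrees of freedom. A minimal sketch, using the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error); the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Effect of load on precision, F(1, 13) = 7.3 (from the text).
load_effect = partial_eta_squared(7.3, 1, 13)
print(round(load_effect, 3))
```

Because the published F values are rounded to one decimal, recomputed effect sizes can differ from the reported ones in the third decimal place.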

Delay duration (0 or 4 s) also did not influence the rate of correct responding (all ps>0.082; Figure 2C). It did, however, affect overall precision (F(1, 15) = 9.3, p=0.008; partial η2 = .383) and the relationship between precision and visibility (interaction F(1, 15) = 5.2, p=0.037; partial η2 = .259). This interaction was driven by higher precision on no-delay than on 4s-delay trials, exclusively when subjects had seen the target (t(15) = −5.7, p<0.001; unseen trials: t(15) = −0.6, p=0.559).

Overall, these results highlight the replicability and robustness of the long-lasting blindsight effect and suggest that it does not just constitute a prolonged version of iconic memory: Even in the presence of a concurrent conscious working memory load, unseen stimuli could be maintained, with no detectable decay as a function of delay. However, the systems involved in the short-term maintenance of conscious and non-conscious stimuli interacted, because a conscious verbal working memory load diminished the precision with which non-conscious spatial information was maintained.

Similarity of conscious perception and conscious working memory

To tackle our second objective, a detailed examination of the link between conscious perception and conscious working memory, we turned to our MEG data and first ensured that the mechanisms underlying conscious perception were stable across experimental conditions. The subtraction of the event-related fields (ERFs) evoked by unseen trials from those evoked by seen trials revealed similar topographies for the perception and working memory task (Figure 3A): Starting at ~300 ms and extending until ~500 ms after target onset, a response emerged over right parieto-temporal magnetometers. This divergence resulted primarily from a sudden increase in activity on seen trials (‘ignition’) in the perception (pFDR<0.05 from 384 to 416 ms and from 504 to 516 ms) and working memory task (pFDR<0.05 from 328 to 364 ms and from 396 to 404 ms; Figure 3B). The observed topographies and time courses fall within the time window of typical neural markers of conscious perception, including the P3b (e.g., Del Cul et al., 2007; Salti et al., 2015; Sergent et al., 2005). Consciously perceiving the target stimulus therefore involved comparable neural mechanisms, irrespective of task.
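The pFDR values above denote p-values corrected for the false discovery rate across time samples. The exact procedure is not stated in this section; the sketch below assumes the Benjamini-Hochberg step-up rule, which is the most common choice, and uses made-up p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which tests are significant at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for ten consecutive time samples.
p_samples = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_samples)
```

In this toy case only the first two samples survive correction, although five have uncorrected p-values below 0.05.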

Figure 3
Neural signatures for conscious perception and maintenance in working memory.

(A) Sequence of brain activations (−200–800 ms) evoked by consciously perceiving the target in the perception (top) and working memory (bottom) task. Each topography depicts the difference in amplitude between seen and unseen trials over a 100 ms time window centered on the time points shown (magnetometers only). (B) Average time courses of seen and unseen trials (−200–800 ms) after subtraction of target-absent trials in a group of parietal magnetometers in the perception (left) and working memory (right) task. Shaded area illustrates standard error of the mean (SEM) across subjects. Significant differences between conditions are depicted with a horizontal, black line (Wilcoxon signed-rank test across subjects, uncorrected). For display purposes, data were lowpass-filtered at 8 Hz. T = target onset. (C) Temporal generalization matrices for decoding of visibility category as a function of training and testing task. In each panel, a classifier was trained at every time sample (y-axis) and tested on all other time points (x-axis). The diagonal gray line demarks classifiers trained and tested on the same time sample. Please note the event markers in any panel involving the perception task: Mean reaction time (target-present trials) for the visibility response is indicated as vertical and/or horizontal, dotted lines. Any classifier beyond this point only reflects post-visibility processes. Time courses of diagonal decoding and of classifiers averaged over the P3b time window (300–600 ms) and over the working memory maintenance period (0.8–2.5 s) are shown as black, red, and blue insets. Thick lines indicate significant, above-chance decoding of visibility (Wilcoxon signed-rank test across subjects, uncorrected, two-tailed except for diagonal). For display purposes, data were smoothed using a moving average with a window of eight samples. AUC = area under the curve.

https://doi.org/10.7554/eLife.23871.006

We next directly probed the relationship between conscious perception and information maintenance in conscious working memory. Does the latter reflect a prolonged conscious episode, or does it involve a distinct set of processes recruited only during the retention phase? If conscious working memory can indeed be likened to conscious perception, one might expect the same patterns that index such perception to be sustained throughout the working memory maintenance period. Linear multivariate pattern classifiers were trained to predict visibility (seen or unseen) from MEG signals separately for each task. Classification performance was assessed during an early time period (100–300 ms), the critical P3b time window (300–600 ms), and the first (0.6–1.55 s) and second part (1.55–2.5 s) of the delay period.

Decoding of the visibility effect was comparable in the two tasks (Figure 3C and Supplementary file 1): Classification performance rose sharply between 100 and 300 ms and peaked during the P3b time window (all ps<0.007, except 100–300 ms in the working memory task, where p=0.066). It then decayed slowly from ~1 s onwards in both tasks, yet remained above chance during the 0.6–1.55 s interval (all ps<0.001). Similar time courses were also observed when training in one task and testing for generalization to the other. Though rapidly dropping to chance-level after ~1 s, classifiers trained in the perception task performed above chance during the first three time windows on working memory trials (and vice versa; all ps<0.014), indicating that, early on, both tasks recruited similar brain mechanisms.

Temporal generalization analyses (King and Dehaene, 2014) were used to evaluate the onset and duration of patterns of brain activity. If working memory were just a prolonged conscious episode, classifiers trained at time points relevant to conscious perception (e.g., the P3b window) should generalize extensively, potentially spanning the entire delay. Our findings supported this hypothesis only in part. The temporal generalization matrix for the working memory task presented as a thick diagonal, suggesting that brain activity was mainly characterized by changing, but long-lasting patterns. Though failing to achieve statistical significance over the entire 0.6–1.55 s interval (all ps>0.101), at a more lenient, uncorrected threshold, classifiers trained during the P3b time window (300–600 ms) in the working memory task remained weakly efficient until ~692 ms (AUC = 0.54 ± 0.02, puncorrected=0.023). Similarly, classifiers trained during the same time period in the perception task and tested in the working memory task persisted up to ~860 ms (AUC = 0.53 ± 0.01, puncorrected=0.028). Brain processes deployed for the conscious representation of the target were thus partially sustained during the working memory delay. The reverse analysis, in which we trained classifiers during the retention period in the working memory task (0.8–2.5 s), did not reveal any generalization to the P3b time window in the perception task (p=0.101).
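The logic of temporal generalization can be illustrated with a toy example: a linear classifier is fitted at each training time and scored (AUC) at every testing time. The sketch below is not the authors' pipeline. It assumes a simple class-mean-difference classifier in place of their multivariate pattern classifiers, a single train/test split instead of cross-validation, and synthetic data with two sensors and four time points.

```python
import random

def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def mean_vec(trials, t):
    """Average sensor pattern across trials at time index t."""
    n_sensors = len(trials[0][t])
    return [sum(tr[t][s] for tr in trials) / len(trials) for s in range(n_sensors)]

def temporal_generalization(train, test, n_times):
    """train/test map label (0 or 1) to trials; each trial is [time][sensor]."""
    matrix = []
    for t_train in range(n_times):
        m1, m0 = mean_vec(train[1], t_train), mean_vec(train[0], t_train)
        w = [a - b for a, b in zip(m1, m0)]   # weights: class-mean difference
        row = []
        for t_test in range(n_times):
            proj = {lab: [sum(w_i * x_i for w_i, x_i in zip(w, tr[t_test]))
                          for tr in test[lab]] for lab in (0, 1)}
            row.append(auc(proj[1], proj[0]))
        matrix.append(row)
    return matrix                              # matrix[t_train][t_test]

def make_trial(label, rng, n_times=4, n_sensors=2):
    """Synthetic trial: bounded noise plus a label-dependent signal."""
    trial = [[rng.uniform(-0.5, 0.5) for _ in range(n_sensors)]
             for _ in range(n_times)]
    if label == 1:                             # hypothetical 'seen' trials
        trial[1][0] += 2.0                     # pattern A at time 1 ...
        trial[2][0] += 2.0                     # ... sustained at time 2
        trial[3][1] += 2.0                     # different pattern B at time 3
    return trial

rng = random.Random(0)
train = {lab: [make_trial(lab, rng) for _ in range(20)] for lab in (0, 1)}
test = {lab: [make_trial(lab, rng) for _ in range(20)] for lab in (0, 1)}
tg = temporal_generalization(train, test, n_times=4)
```

In this toy data, a classifier trained at time 1 generalizes to time 2 because the same pattern is sustained, which produces an off-diagonal block. A pattern that changes over time, as at time 3, is decodable mainly on the diagonal.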

These results confirm that seeing the target entailed a similar unfolding of neural events in two task contexts: Conscious perception primarily consisted in a dynamic series of partially overlapping information-processing stages, each characterized by temporary, metastable patterns of neural activity. The same neural codes appeared to be recruited at the beginning of the maintenance period (up to ~1 s). As such, these findings corroborate previous accounts linking conscious perception to an ‘ignition’ of brain activity (Del Cul et al., 2007; Gaillard et al., 2009; Salti et al., 2015; Sergent et al., 2005) and suggest that, in part, working memory implies the prolongation of a conscious episode, and, in part, a succession of additional processing steps.

A sustained decrease in alpha/beta power distinguishes conscious working memory

Our focus so far has been on evoked brain activity. However, other reliable neural signatures of conscious perception have been identified in the frequency domain (Gaillard et al., 2009; Gross et al., 2007; King et al., 2016; Wyart and Tallon-Baudry, 2009). We thus turned to time-frequency analyses and first contrasted seen trials with both our target-absent control condition as well as unseen trials in both tasks (Figure 4A and Figure 4—figure supplement 1A). In order to qualify as a signature of conscious perception, any candidate characteristic should exist in the perception-only control condition (without any working memory requirement) and be specific to seen trials. Cluster-based permutation analyses singled out a desynchronization in the alpha band (8–12 Hz) as the principal correlate of conscious perception in the perception task (seen – target-absent: pclust=0.004; seen – unseen: pclust=0.009), with seen trials displaying a strong decrease in power (relative to baseline) compared to either the target-absent or the unseen trials. Initially left-lateralized in centro-temporal sensors, this effect moved to fronto-central channels and extended between ~300 and 1700 ms. A similar, albeit later (500–1700 ms) and more bilateral fronto-central, desynchronization was also observed in the beta band (13–30 Hz; seen – target-absent: pclust<0.001; seen – unseen: pclust=0.01). No differences between the unseen and target-absent trials were found in the alpha (pclust>0.676) or beta band (pclust>0.226, apart from a short-lived, weak difference between ~0.9 and 1.3 s, where pclust=0.020), suggesting that unseen trials strongly resembled trials without a target.
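Power "relative to baseline" is expressed in decibels in the accompanying figure. A minimal sketch of that normalization follows, assuming the conventional definition 10 × log10(power / baseline power) with the baseline taken as the mean power over a pre-stimulus window. The sample values and the choice of baseline samples are hypothetical.

```python
import math

def to_db(power, baseline_power):
    """Power change relative to baseline, in decibels."""
    return 10 * math.log10(power / baseline_power)

def baseline_normalize(power_series, baseline_indices):
    """Convert a power time series to dB relative to its mean baseline power."""
    base = sum(power_series[i] for i in baseline_indices) / len(baseline_indices)
    return [to_db(p, base) for p in power_series]

# Hypothetical alpha-band power: two baseline samples, then a drop and a rise.
normalized = baseline_normalize([2.0, 2.0, 1.0, 4.0], baseline_indices=[0, 1])
```

Negative values correspond to a desynchronization (power below baseline); halving the power gives roughly -3 dB.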

Figure 4 (with 3 supplements)
A sustained decrease in alpha/beta power as a marker of conscious working memory.

(A) Average time-frequency power relative to baseline (dB) as a function of task and visibility category in a group of occipital (left) and fronto-central (right) magnetometers. Mean reaction time (target-present trials) for the visibility response in the perception task is indicated as a vertical, dotted line. (B) Beta band activity (13–30 Hz; 0–2.1 s) related to conscious working memory (seen – unseen trials) as shown in magnetometers (top) and source space (bottom; in dB relative to baseline). Black asterisks indicate sensors showing a significant difference as assessed by a Monte-Carlo permutation test. (C) Same as in (A) and (B) but for unseen correct and unseen incorrect trials in the alpha band (8–12 Hz). 

https://doi.org/10.7554/eLife.23871.007

Most importantly, when comparing seen and target-absent/unseen trials in the working memory task, we again observed a similar, but now temporally sustained, pattern of alpha/beta band desynchronization (Figure 4B and Figure 4—figure supplement 1B). Starting at ~300 to 500 ms, seen targets evoked a power decrease in central, temporal/parietal, and frontal regions in the alpha (seen – target-absent: pclust=0.003; seen – unseen: pclust=0.003) and beta band (seen – target-absent: pclust=0.009; seen – unseen: pclust<0.001). Crucially, this desynchronization spanned the entire delay period and was specific to seen trials (Figure 4A), with no differences in power between the unseen and target-absent trials in either band (alpha: pclust>0.729; beta: pclust>0.657) and only a couple of interspersed periods of residual desynchronization persisting in the target-absent control trials. No task- or visibility-related modulations in power spectra were found in occipital areas, and the desynchronization originated primarily from a parietal network of brain sources (Figure 4A and B). In conjunction with the aforementioned results, these findings imply that alpha/beta desynchronization is a correlate of conscious perception (Gaillard et al., 2009) and a neural state common to conscious perception and conscious working memory.

A distinct neurophysiological mechanism for non-conscious working memory

Having identified markers of conscious perception and working memory in both multivariate and time-frequency analyses, we can now test the reality of non-conscious working memory by confronting it with the two alternative hypotheses. The miscategorization hypothesis suggests that the long-lasting blindsight resulted from a small set of seen trials erroneously labeled as unseen. Unseen correct trials should thus display similar neural signatures as seen trials, including a shared discriminative decoding axis and a desynchronization in the alpha/beta band. An analogous reasoning holds for the conscious maintenance hypothesis, according to which the observed blindsight effect arises from the conscious maintenance of an early guess: Conscious processing would occur on unseen trials and we should thus find a sustained decrease in alpha/beta power similar to the one on seen trials. Conversely, a clear distinction between brain responses on seen trials and on unseen (correct) trials would suggest that blindsight resulted from a distinct non-conscious mechanism of information maintenance.

We first probed the alternative hypotheses with the ERF data. Training a decoder to distinguish seen from unseen trials in the perception task and applying it to the unseen correct and incorrect trials in the working memory task, we directly assessed the classifier’s ability to generalize from seen to unseen correct trials (accuracy decoder). If, indeed, the latter had actually been seen, such a decoder should look similar to the above-described generalization analysis, in which a classifier had been trained on seen/unseen trials in the perception task and tested on the same labels in the working memory task (visibility decoder). As shown in Figure 4—figure supplement 2A, this was not the case. Whereas the temporal generalization matrix for the visibility decoder presented as a thick diagonal, no discernable pattern emerged for the accuracy decoder. The time courses of diagonal decoding were also quite dissimilar. For the visibility decoder (see also above), classification performance first rose above chance at ~148 ms (AUC = 0.54 ± 0.01, pFDR=0.023), peaked at ~640 ms (AUC = 0.58 ± 0.02, pFDR=0.001), and then decayed rapidly by ~1 s (first three time windows, all ps<0.001). In contrast, classification for the accuracy decoder was erratic and transient: It first sharply peaked at ~180 ms (AUC = 0.55 ± 0.01, puncorrected=0.037), dropped to chance-level, and then exceeded chance between ~372 and 724 ms with a peak at 444 ms (AUC = 0.57 ± 0.02, puncorrected=0.007). Much unlike any of the previous decoders involving the perception task, long after the visibility response, it rose a third time between ~1.44 and 1.74 s, peaking with similar magnitude as before at ~1.58 s (AUC = 0.57 ± 0.02, puncorrected=0.010; P3b and last time window: all ps<0.023). 
Although the level of noise evident in the accuracy decoder thus precludes any definitive conclusion, the visibility and accuracy decoders had little in common, rendering it unlikely for the unseen correct trials to have simply been mislabeled.
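The logic of training a decoder at one time point in one task and testing it at every time point in another can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy mean-difference decoder stands in for whatever classifier was actually used, the data are synthetic, and the names `auc`, `fit_mean_difference`, `temporal_generalization`, and `simulate` are hypothetical. Data are laid out as `trials[trial][sensor][time]`.

```python
import random

def auc(scores_pos, scores_neg):
    """Area under the ROC curve: P(pos score > neg score), ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))

def fit_mean_difference(trials, labels, t):
    """Toy linear decoder at time t: class-1 mean minus class-0 mean per sensor."""
    n_sensors = len(trials[0])
    sums = {0: [0.0] * n_sensors, 1: [0.0] * n_sensors}
    counts = {0: 0, 1: 0}
    for trial, y in zip(trials, labels):
        counts[y] += 1
        for s in range(n_sensors):
            sums[y][s] += trial[s][t]
    return [sums[1][s] / counts[1] - sums[0][s] / counts[0]
            for s in range(n_sensors)]

def temporal_generalization(train, y_train, test, y_test):
    """Rows: training time; columns: testing time; entries: AUC on the test set."""
    n_times = len(train[0][0])
    matrix = []
    for t_train in range(n_times):
        w = fit_mean_difference(train, y_train, t_train)
        row = []
        for t_test in range(n_times):
            pos, neg = [], []
            for trial, y in zip(test, y_test):
                score = sum(w[s] * trial[s][t_test] for s in range(len(w)))
                (pos if y == 1 else neg).append(score)
            row.append(auc(pos, neg))
        matrix.append(row)
    return matrix

def simulate(rng, n_trials=40, n_sensors=3, n_times=4, signal_times=(1, 2)):
    """Synthetic data: class 1 carries a strong signal on sensor 0 at signal_times."""
    trials, labels = [], []
    for i in range(n_trials):
        y = i % 2
        trial = [[rng.gauss(0.0, 1.0) for _ in range(n_times)]
                 for _ in range(n_sensors)]
        if y == 1:
            for t in signal_times:
                trial[0][t] += 10.0
        trials.append(trial)
        labels.append(y)
    return trials, labels

rng = random.Random(0)
train_x, train_y = simulate(rng)   # stands in for the perception task
test_x, test_y = simulate(rng)     # stands in for the working memory task
gen = temporal_generalization(train_x, train_y, test_x, test_y)
```

In this toy case, the block of the matrix in which both the training and the testing time carry the signal shows high AUC, which is the kind of structure a shared discriminative axis would produce.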

We next returned to our time-frequency analysis. When averaging over all unseen trials in the working memory task, there was no indication of a desynchronization remotely comparable to the one on seen trials (Figure 4A and Figure 4—figure supplement 1C). Indeed, Bayesian statistics indicated that, on the unseen trials, evidence for the null hypothesis (i.e., no relative change in alpha/beta power) was at least similar (at the very end of the epoch) or stronger than evidence for the alternative hypothesis. By contrast, on seen trials, evidence for the alternative hypothesis was always strongly favored (Figure 4—figure supplement 3). Even when analyzing the unseen correct trials separately, there was no appreciable trace of any alpha/beta desynchronization (Figure 4C and Figure 4—figure supplement 3). Only one short-lived effect, reversed relative to conscious trials, was observed in the alpha band (pclust=0.040) in a set of posterior central sensors, corresponding to primarily occipital sources: Starting at ~1.5 s and extending until ~1.9 s, unseen correct trials exhibited a stronger increase in alpha power than their incorrect counterparts. Given the difference in performance on these two types of unseen trials, such small variations are not surprising and could, perhaps, reflect a stronger suppression of interference from the distractor on the unseen correct trials. Unseen correct trials thus appeared to be nearly indistinguishable from the unseen incorrect and target-absent trials.

As multivariate analyses might be more sensitive than univariate ones in detecting similarities between conditions, we also performed the above decoding analysis separately for average alpha (8–12 Hz) and beta (13–30 Hz) power. Overall, these analyses confirmed our previous findings, albeit more clearly so in the alpha than in the beta band. A visibility decoder trained on alpha power to distinguish seen from unseen trials in the perception task and tested in the working memory task again exhibited a thick diagonal, with above-chance decoding between ~180 ms and 1.18 s (first three time windows: all ps<0.016). There was no evidence for any generalization to the unseen correct trials (Figure 4—figure supplement 2B; all time windows: ps>0.211). Similarly, a visibility decoder trained on average beta power entirely failed to generalize to the unseen correct trials (Figure 4—figure supplement 2C; all time windows: ps>0.191). Considering the weak, although statistically significant (all four time windows, ps≤0.05), initial generalization from the perception to the working memory task, probably due to the slightly later onset of the beta desynchronization in the former, this failure is less informative than the one observed in the alpha band and should be replicated in future investigations.

Taken together, we found a clear distinction in the brain responses of seen and unseen (correct) trials. Converging evidence from our decoding analyses in the ERFs and alpha/beta band suggests that there was no apparent discriminative axis shared between the seen and the unseen correct trials. Similarly, the desynchronization in alpha/beta power characterizing the seen targets did not emerge on the unseen (correct) trials. These findings therefore argue against the miscategorization and conscious maintenance hypotheses and instead suggest that non-conscious working memory is a genuine phenomenon, distinct from conscious working memory.

Contents of conscious and non-conscious working memory can be tracked transiently

We next set out to identify the neural mechanisms supporting both conscious and non-conscious working memory and first determined where and how the specific contents of working memory were stored. Circular-linear correlations between the amplitude of the ERFs and target location (across all working memory trials) revealed a strong and focal association (relative to a permuted null distribution) over posterior channels, starting at ~120 ms and lasting until 904 ms (early and P3b time windows: all ps<0.001; all BFs>109.60; Figure 5A and Supplementary files 2 and 3). Similarly, distractor position could be tracked between ~194 and 570 ms after its presentation (early and P3b time windows: all ps<0.009; all BFs>14.47). The position of our stimuli could thus be faithfully retrieved in visual areas.
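For readers unfamiliar with the measure, a circular-linear correlation between a linear variable (here, signal amplitude) and a circular one (here, stimulus angle) can be sketched as follows. This uses the standard textbook formula based on correlating the linear variable with the cosine and sine of the angle. It is an illustration only: the exact implementation used for the analyses is not specified in this section, and the eight equally spaced positions and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def circ_linear_corr(angles, values):
    """Circular-linear correlation between angles (radians) and a linear variable."""
    c = [math.cos(a) for a in angles]
    s = [math.sin(a) for a in angles]
    r_xc = pearson(values, c)   # linear variable vs. cosine of the angle
    r_xs = pearson(values, s)   # linear variable vs. sine of the angle
    r_cs = pearson(c, s)        # cosine vs. sine
    r2 = (r_xc ** 2 + r_xs ** 2 - 2 * r_xc * r_xs * r_cs) / (1 - r_cs ** 2)
    return math.sqrt(max(0.0, r2))  # clamp tiny negative rounding errors

# Hypothetical example: 8 equally spaced positions, each shown 5 times.
n_positions = 8
angles = [2 * math.pi * k / n_positions for k in range(n_positions)] * 5

# A sensor whose amplitude is cosine-tuned to position (arbitrary preferred angle).
tuned = [math.cos(a - 1.0) for a in angles]
# A sensor whose amplitude varies at twice the angular frequency (no first-order tuning).
untuned = [math.cos(2 * a) for a in angles]

r_tuned = circ_linear_corr(angles, tuned)
r_untuned = circ_linear_corr(angles, untuned)
```

The coefficient ranges from 0 to 1 and, unlike a Pearson correlation with the raw angle, does not depend on where the circle is "cut". This is why such a measure suits positions arranged on a ring.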

In a subsequent step, we investigated how target location would be maintained in the context of conscious and non-conscious working memory (Figure 5B). Target position was transiently encoded via slowly decaying activity in occipital as well as bilateral temporo-occipital cortex from ~120 to 800 ms on seen trials (early and P3b time windows: all ps<0.001 and all BFs>24.07, with the exception of the 100–300 ms period in right temporo-occipital channels, where p=0.064 and BF=2.31) and in occipital and left temporo-occipital brain areas from ~180 to 504 ms on unseen trials (early time window: all ps<0.047; all BFs>2.58). A clear correlation with target location was therefore found for both seen and unseen trials. In fact, although it was more short-lived on the latter, it was of comparable magnitude as the one observed on the seen trials during the early time window (occipital/left temporo-occipital channels: all ps>0.110 when directly comparing the correlation scores of seen and unseen trials in a Wilcoxon signed-rank test). In the case of seen trials, both occipital and left temporo-occipital cortex also maintained the target representation at least throughout the first part of the delay period (all ps<0.024; all BFs>3.77), though, intriguingly, this was not accompanied by continuously sustained activity. Target ‘decodability’ instead waxed and waned, appearing and disappearing periodically. No such activity was observed for the maintenance of unseen targets (first and second part of the delay: all ps>0.446; all BFs<0.047). This absence of ‘decodability’ during the maintenance period persisted, even when considering unseen correct and unseen incorrect trials separately (Figure 5C). There was only a trace of residual decoding of target location on unseen correct trials in left temporo-occipital areas during the delay period, but this did not reach significance, potentially due to the low number of trials in this condition. 
Note that in the perception task, seen targets could be retrieved similarly to their counterparts in the working memory task between ~232 and 1184 ms in occipital and bilateral temporo-occipital regions (all ps>0.068, except for the 100–300 ms time window in occipital channels where p=0.008, when directly comparing the correlation scores of seen targets in both tasks in a Wilcoxon signed-rank test; Figure 5—figure supplement 1).
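The across-subject comparisons reported above rely on Wilcoxon signed-rank tests. As a rough sketch of what a one-tailed version of this test computes, the following derives an exact p-value from the sign-flip null distribution of the rank sum. This is an illustrative implementation under stated assumptions (zero differences are dropped, ties receive average ranks), not the routine used for the reported statistics, and `wilcoxon_greater` is a hypothetical name.

```python
def wilcoxon_greater(diffs):
    """One-tailed exact Wilcoxon signed-rank test (H1: differences tend to be > 0).

    Returns (W_plus, p), where W_plus is the sum of ranks of positive differences.
    """
    d = [x for x in diffs if x != 0]          # assumption: drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))

    # Doubled average ranks, so that tied ranks (multiples of 0.5) stay integers.
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2      # 2 * mean of 1-based ranks i+1..j+1
        i = j + 1

    w_plus2 = sum(r for r, x in zip(ranks2, d) if x > 0)

    # Null distribution: each rank is independently signed + or - with p = 0.5.
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    p = sum(counts[w_plus2:]) / 2 ** n
    return w_plus2 / 2, p

# Hypothetical per-subject differences (e.g., correlation minus empirical baseline).
w, p = wilcoxon_greater([1, 2, 3, 4, 5])      # all positive: p = 1/32
```

With only five subjects, even a perfectly consistent effect cannot yield an exact one-tailed p-value below 1/32, which illustrates how the attainable significance depends on sample size.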

Figure 5 with 3 supplements
Tracking the contents of conscious and non-conscious working memory.

(A) Topographies (top) and time courses (bottom; −0.2–2.5 s) of average circular-linear correlations between the amplitude of the MEG signal (gradiometers) and target/distractor location. Shaded area demarks standard error of the mean (SEM) across subjects. Thick line represents significant increase in correlation coefficient as compared to an empirical baseline (one-tailed Wilcoxon signed-rank test across subjects, uncorrected). (B) Average time courses (−0.2–2.5 s) of circular-linear correlation coefficients between amplitude of the ERFs and target location as a function of visibility in the working memory task in a group of left temporo-occipital (left), occipital (middle), and right temporo-occipital (right) gradiometers. Shaded area demarks standard error of the mean (SEM) across subjects. Thick line represents significant increase in correlation coefficient as compared to an empirical baseline (one-tailed Wilcoxon signed-rank test across subjects, uncorrected). Insets show average correlation coefficients (relative to an empirical baseline) in four time windows: 100–300 ms (early), 300–600 ms (P3b), 0.6–1.55 s (Del1), and 1.55–2.5 s (Del2). White asterisks denote significant differences to baseline (one-tailed Wilcoxon signed-rank test across subjects), black asterisks significant differences between conditions (two-tailed Wilcoxon signed-rank test across subjects). For display purposes, data were lowpass-filtered at 8 Hz. *p<0.05, **p<0.01, and ***p<0.001. Del1= first part of delay, Del2 = second part of delay, T = target onset. (C) Same as in (B), but as a function of accuracy on the unseen trials (correct = within ±2 positions of the target).

https://doi.org/10.7554/eLife.23871.011

Given the univariate nature of the circular-linear correlations, one might again wonder whether a multivariate strategy would be more sensitive in detecting subtle associations between the MEG data and target location. We therefore used linear support vector regressions (SVR) to predict target angle from the MEG signal as a function of visibility (Materials and methods). As can be seen in Figure 5—figure supplement 2, this method resulted in similar, albeit more noisy, time courses as the ones obtained with the circular-linear correlations: Seen targets were again encoded and maintained intermittently between ~268 ms and 1.4 s (P3b time window and first part of the delay: ps<0.05). No statistically significant decoding emerged for unseen target locations. Due to the fact that subjects responded correctly on approximately half of all unseen trials (see Supplementary file 4 for average trial counts), we attempted to evaluate the dynamics of the encoding and maintenance of unseen correct and incorrect target locations by training the regression model on the strongest case, the seen correct trials, and applying it separately to the unseen correct and incorrect trials. We again observed no evidence for any generalization at all (Figure 5—figure supplement 3A), though this likely reflects the sensitivity of the analysis more so than any meaningful effect.

Taken together, in line with previous research (Harrison and Tong, 2009; King et al., 2016), these results suggest that posterior sensory regions may initially encode seen and unseen memoranda via slowly decaying neural activity. In the case of conscious working memory, these then seem to be maintained by those same areas through an intermittently reactivated, neural code (Fuentemilla et al., 2010). In contrast, no such periodically resurfacing activity appears to accompany non-conscious working memory.

Further evidence against the conscious maintenance hypothesis

The correlation between target location and brain activity affords an additional way to interrogate the conscious maintenance hypothesis. If subjects quickly guessed the location of an unseen target and then held it in conscious working memory, in addition to observing a signature of conscious processing on the unseen trials, we should observe a correlation with the location of their response long before it occurs. Potentially, remembering the response might recruit brain systems completely different from the ones representing the target.

Circular-linear correlations rendered this prediction unlikely. Associations between response location and the MEG signal were again primarily confined to posterior channels, with more frontal areas being recruited preferentially at the time of the response (Figure 6A). As such, the topographical patterns were highly similar to the ones observed for the correlation with target location. Importantly, no additional regions were identified on the unseen trials and none of these areas showed any appreciable correlation before the presentation of the response screen (Figure 6—figure supplement 1). This suggests that, irrespective of stimulus visibility, common brain networks supported memories for the target stimulus and the ensuing decision and that, in the case of non-conscious working memory, these did not come online until the response.

Figure 6 with 2 supplements
Tracking response location in conscious and non-conscious working memory.

(A) Topographies of average circular-linear correlations between the amplitude of the MEG signal (gradiometers) and response location. R = onset of the response screen. (B) Average time courses (left: stimulus-locked, −0.2–2.5 s; right: response-locked, −0.5–0.8 s) of circular-linear correlation coefficients between the amplitude of the ERFs and response location as a function of visibility in the working memory task in a group of occipital (top, left), frontal (top, right) left temporo-occipital (bottom, left) and right temporo-occipital (bottom, right) gradiometers. Shaded area demarks standard error of the mean (SEM) across subjects. Thick line represents significant increase in correlation coefficient as compared to an empirical baseline (one-tailed Wilcoxon signed-rank test across subjects, uncorrected). Insets show average correlation coefficients (relative to an empirical baseline) in four stimulus-locked time windows, 100–300 ms (early), 300–600 ms (P3b), 0.6–1.55 s (Del1), and 1.55–2.5 s (Del2), and two response-locked time windows, −0.5–0.0 s (Del3) and 0.0–0.8 s (Resp). White asterisks denote significant differences to baseline (one-tailed Wilcoxon signed-rank test across subjects), black asterisks significant differences between conditions (two-tailed Wilcoxon signed-rank test across subjects). For display purposes, data were lowpass-filtered at 8 Hz. *p<0.05, **p<0.01, and ***p<0.001. Del1= first part of delay, Del2 = second part of delay, Del3 = last 500 ms before response screen, R = response screen onset, T = target onset. (C) Same as in (B), but as a function of accuracy on the unseen trials (correct = within ±2 positions of the target).

https://doi.org/10.7554/eLife.23871.015

The time courses of the circular-linear correlations further solidified this interpretation (Figure 6B). On seen trials, response position was maintained throughout the majority of the epoch in occipital and left temporo-occipital brain areas (first three time windows: all ps<0.020; all BFs>4.16). This was not the case on the unseen trials: No correlation patterns appeared in any of the posterior channels during the course of the epoch (all time windows: all ps>0.064; all BFs<1.32). In contrast, a strong correlation emerged for both seen and unseen trials during the response period (0–800 ms with respect to the onset of the letter cue). Response location could be tracked with similar time courses and magnitude on seen and unseen trials in occipital, bilateral temporo-occipital, and frontal channels (all ps<0.024; all BFs>13.73; when directly comparing the correlation scores of seen and unseen targets in a Wilcoxon signed-rank test: all ps>0.216, except for left temporo-occipital channels, where p=0.040). When we further distinguished unseen correct from unseen incorrect trials, the results remained similar, though much noisier (Figure 6C): There was no clear correlation pattern before the onset of the response screen on either the unseen correct or the unseen incorrect trials (all ps>0.096; all BFs<1.47). Only after the appearance of the letter cues did we observe a correlation with response location.

Multivariate decoding analyses confirmed this picture: Whereas response location for seen targets could be tracked similarly to actual target location at least throughout the first part of the delay period (P3b time window and first part of the delay: ps<0.05; Figure 6—figure supplement 2), no such pattern was observed on the unseen trials (all ps>0.153). This absence of decodability persisted on the unseen correct and incorrect trials, even when training the regression model on the seen correct trials (Figure 5—figure supplement 3B).

Overall, these results are incompatible with the hypothesis that the long-lasting blindsight is only due to the conscious maintenance of an early guess, as, in this case, brain responses linked to the subjects’ responses should have been observed shortly after the presentation of the target stimulus.

Short-term synaptic change as a neurophysiological mechanism for conscious and non-conscious working memory

What mechanism might permit above-chance recall without any continuously sustained brain activity? Recent modelling suggests that sustained neural firing may not be required to maintain a representation in conscious working memory. Mongillo et al. (2008) proposed a theoretical framework for working memory, in which information is stored in calcium-mediated short-term changes in synaptic weights, thus linking the active cells coding for the memorized item. Once these changes have occurred, the cell assembly may go dormant during the delay, while the synaptic weights are slowly decaying. At the end of the delay period, a non-specific read-out signal may then suffice to reactivate the assembly. Furthermore, reactivation of the assembly may also occur spontaneously during the retention phase, similar to the rehearsal process postulated by Baddeley (2003), thus refreshing the weights and permitting the bridging of longer delays. Could this ‘activity-silent’ mechanism also constitute a plausible neural mechanism for non-conscious working memory?

To test this hypothesis, we simulated our experiments using a one-dimensional recurrent continuous attractor neural network (CANN) based on Mongillo et al. (2008). The CANN encoded the angular position of the target and was composed of neurons aligned according to their preferred stimulus value (Figure 7A). Transient short-term plasticity between the recurrent connections, with a 4s-decay constant, was implemented as described by Mongillo et al. (2008; Figure 7B). Timing of the simulated events was comparable to the experimental paradigm: A target signal was briefly presented at a random location, followed by a mask signal to all neurons and a non-specific recall signal after a 3s-delay.
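The key ingredient of this account, a synapse whose efficacy stays elevated long after presynaptic firing has stopped, can be sketched with a rate-based version of short-term facilitation and depression. This is a minimal illustration and not the network simulated here. The 4 s constant is taken from the text and assumed to govern the decay of facilitation; the baseline release probability `U`, the depression time constant `tau_d`, the 50 Hz burst, and the Euler step are hypothetical values chosen for the example.

```python
def simulate_synapse(rate_fn, duration, dt=0.001, U=0.2, tau_f=4.0, tau_d=0.2):
    """Euler integration of a rate-based facilitation/depression model.

    u: release probability (facilitation), relaxes to U with time constant tau_f.
    x: fraction of available neurotransmitter (depression), recovers with tau_d.
    Returns a list of (time, u, x) samples.
    """
    u, x = U, 1.0
    trace = []
    for k in range(int(round(duration / dt))):
        r = rate_fn(k * dt)                        # presynaptic rate in Hz
        du = (U - u) / tau_f + U * (1.0 - u) * r   # firing increases u
        dx = (1.0 - x) / tau_d - u * x * r         # firing depletes x
        u += dt * du
        x += dt * dx
        trace.append(((k + 1) * dt, u, x))
    return trace

def burst(t):
    """Hypothetical input: a 200 ms burst at 50 Hz, silence otherwise."""
    return 50.0 if 0.5 <= t < 0.7 else 0.0

def state_at(trace, t, dt=0.001):
    """Return (u, x) at time t (seconds)."""
    _, u, x = trace[int(round(t / dt)) - 1]
    return u, x

trace = simulate_synapse(burst, duration=4.0)
u_before, x_before = state_at(trace, 0.4)   # baseline, before the burst
u_end, x_end = state_at(trace, 0.7)         # end of the burst
u_late, x_late = state_at(trace, 3.7)       # 3 s after the burst, no firing since
efficacy_late = u_late * x_late             # efficacy is proportional to u * x
```

Under these assumed parameters, resources `x` recover within a fraction of a second while `u` decays slowly, so the product `u * x` remains above its baseline for seconds without any ongoing presynaptic activity. This is the sense in which a memory trace can be "activity-silent".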

Activity-silent neural mechanisms underlying conscious and non-conscious working memory.

(A) Structure of a one-dimensional continuous attractor neural network (CANN). Neuronal connections J (θ, θ’) are translation-invariant in the space of the neurons’ preferred stimulus values (-π, π), allowing the network to hold a continuous family of stationary states (bumps). An external input Ie (θ, t) containing the stimulus information triggers a bump state (red curve) at the corresponding location in the network. (B) Model of a synaptic connection with short-term potentiation. In response to a presynaptic spike train (bottom), the neurotransmitter release probability u increases and the fraction of available neurotransmitter x decreases (middle), representing synaptic facilitation and depression. Effective synaptic efficacy is proportional to ux (top). (C) Firing rate of neurons (top) and sequence of events (bottom; target and mask signal) when simulating conscious working memory with Amask = 50 Hz < Acritical. (D) Same as in (C) for non-conscious working memory when Amask = 65 Hz > Acritical. (E, F) Performance of the network (distribution of responses) when mask amplitude was near the critical level, Amask = 62 Hz ~ Acritical, and noise had been added to the system. Out of 4000 trials, 2035 resulted in the conscious (E) and the remainder in the non-conscious regime (F). In both cases, performance remained above chance with the responses concentrated around the initial target location.

https://doi.org/10.7554/eLife.23871.018

If the activity-silent mechanism constituted a plausible neurophysiological correlate of conscious and non-conscious working memory, these simulations should capture our principal findings. A stimulus presented at threshold should entail one of two different maintenance regimes: a first distinguished by near-perfect recall with spontaneous reactivations of the memorized representation throughout the retention period (thus resembling the prolonged, yet fluctuating, ‘decodability’ of seen target locations), and a second characterized by above-chance objective performance in the almost complete absence of delay activity (thereby portraying the time course of the circular-linear correlations for the unseen stimuli).

In a noiseless model, there indeed existed a critical value of mask amplitude, Acritical, which separated two distinct regimes: Just as was the case for our seen trials, when Amask < Acritical, the neural assembly coding for the target spontaneously reactivated during the delay (Figure 7C). However, when Amask > Acritical, the system evolved into a state without spontaneous activation of target-specific neurons, yet with a reactivation in response to a non-specific recall signal, mimicking our unseen trials (Figure 7D). When fixing mask amplitude near Acritical and adding noise continuously or just to the inputs, the network exhibited both types of regimes in nearly equal proportions: 50.8% of trials were characterized by an activity-silent delay interspersed with spontaneous reactivations and 49.2% by an entirely activity-silent delay period. Reminiscent of our behavioral results, sorting the trials according to the existence or absence of these reactivations and computing the histograms of recalled target position relative to true location produced two distributions of objective working memory performance: one, in which target position was nearly accurately stored (Figure 7E), and one, in which performance remained above chance despite a higher base rate of errors (Figure 7F). These simulations replicate our experimental findings (in particular Figures 2 and 5) and suggest the activity-silent framework as a likely candidate mechanism for both conscious and non-conscious working memory.
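The histograms of recalled position relative to true location, and the comparison against chance, can be computed along the following lines. This sketch assumes positions are indexed on a ring and adopts the accuracy criterion given in the figure legends (correct = within ±2 positions of the target). The number of positions (`n_positions=20`), the example data, and the function names are assumptions made for illustration.

```python
from collections import Counter

def signed_offset(response, target, n_positions):
    """Signed circular distance from target to response, in positions."""
    half = n_positions // 2
    return (response - target + half) % n_positions - half

def summarize(responses, targets, n_positions=20, tolerance=2):
    """Histogram of response offsets, accuracy within tolerance, and chance level."""
    offsets = [signed_offset(r, t, n_positions)
               for r, t in zip(responses, targets)]
    histogram = Counter(offsets)
    accuracy = sum(abs(o) <= tolerance for o in offsets) / len(offsets)
    chance = (2 * tolerance + 1) / n_positions   # uniform guessing over the ring
    return histogram, accuracy, chance

# Hypothetical trials: the first two wrap around the ring, the last is far off.
targets = [0, 19, 5, 10]
responses = [18, 1, 5, 0]
histogram, accuracy, chance = summarize(responses, targets)
```

Wrapping the offset around the ring matters: a response at position 18 to a target at position 0 is two positions away, not eighteen, and would otherwise be scored as a large error.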

Discussion

Conscious perception and working memory are thought to be intimately related, yet recent evidence challenged this assumption by proposing the existence of non-conscious working memory (Soto et al., 2011). The present results may reconcile these views. Both conscious perception and conscious working memory shared similar signatures, including an alpha/beta power decrease, the latter spanning the entire delay on working-memory trials. However, participants remained able to localize a subjectively invisible target after a 4s-delay. We found no evidence that this long-lasting blindsight could simply be explained by erroneous visibility reports or by the conscious maintenance of an early guess. It thus likely reflects genuine non-conscious working memory. Despite the inherent differences in subjective experience for conscious and non-conscious working memory, a single, activity-silent mechanism might support both conscious and non-conscious information maintenance. We now discuss these points in turn.

Shared brain signatures underlie conscious perception and conscious working memory

Consistent with introspective reports and research on visual awareness and working memory (Baddeley, 2003; Dehaene et al., 2014), we observed a close relationship between conscious perception and maintenance in conscious working memory. In both tasks, classifiers trained to separate seen and unseen trials resulted in thick diagonals up to ~1 s after target onset, even when generalizing from one task to the other. Such long diagonals have repeatedly been observed in recent studies and are thought to reflect sequential processing (King and Dehaene, 2014; Marti et al., 2015; Salti et al., 2015; Stokes et al., 2015; Wolff et al., 2015). Irrespective of context, conscious perception and early parts of conscious maintenance thus involve a similar series of partially overlapping processing stages.

Time-frequency decompositions reinforced and extended this conclusion. Seen trials in the perception task were distinguished from both a target-absent control condition and unseen trials by a prominent decrease in alpha/beta power over fronto-central sensors, corresponding to a distributed network centered on parietal cortex. A similar desynchronization, sustained throughout the retention period, was also observed for conscious working memory. Alpha/beta band desynchronizations such as these have previously been linked with conscious perception (Gaillard et al., 2009; Wyart and Tallon-Baudry, 2009) and working memory (Lundqvist et al., 2016). Modelling suggests that the memorized item is encoded by intermittent gamma bursts, which interrupt an ongoing desynchronized beta default state (Lundqvist et al., 2011). Such a decreased rate of beta bursts, once averaged over many trials, would have resulted in the apparently sustained power decrease we observed. Increases in gamma power have also been shown in some studies on conscious perception (e.g., Gaillard et al., 2009), but we failed to detect it here, perhaps because our targets were brief, peripheral, and low in intensity.

Circular-linear correlations further highlighted the similarity between conscious perception and working memory. Location information could be tracked for ~1 s on perception-only trials and for at least 1.5 s of the working memory retention period. The mental representation formed during conscious perception was therefore either maintained or repeatedly replayed during conscious working memory.

Long-lasting blindsight effect reflects genuine non-conscious working memory

Even when subjects indicated not having seen the target, they still identified its position much better than chance up to 4 s after its presentation. This long-lasting blindsight effect was replicated in two independent experiments and exhibited typical properties of working memory, withstanding salient visible distractors and a concurrent demand on conscious working memory. Those results corroborate previous research showing that information can be maintained non-consciously (e.g., Bergström and Eriksson, 2014, 2015; Dutta et al., 2014; Soto et al., 2011). However, these prior findings could have arisen due to errors in visibility reports. If, for example, a participant had been left with a weak impression of the target (and, consequently, its location), he or she might not have had adequate internal evidence to refer to this perceptual state as seen, thus incorrectly applying the label unseen. A small number of such errors would have produced above-chance responding. Another explanation could have been the conscious maintenance of an early guess, whereby subjects would have ventured a prediction as to the correct target position immediately after its presentation and then consciously maintained this hunch.

The MEG results provide evidence against these possibilities. First, whereas seen trials were characterized by a sustained desynchronization in the alpha/beta band in parietal brain areas, no comparable desynchronization was observed on unseen trials, even when subjects correctly identified the target location. On the contrary, the only, short-lived, difference between unseen correct and unseen incorrect trials emerged around the time of the distractor and was reversed in direction: Unseen correct trials were accompanied by an increase in power in the alpha band with respect to their incorrect counterpart, an effect that might relate to a successful attempt to reduce interference from the distractor (Cooper et al., 2003; Jensen and Mazaheri, 2010). Otherwise, unseen correct and incorrect trials were indistinguishable in their power spectra and similar to the target-absent control condition. Second, there was no clear evidence for a shared discriminative decoding axis between the seen and the unseen correct trials: Generalization was entirely unsuccessful when the classifier was trained on the time-frequency data, and highly dissimilar from the original visibility decoder when trained on the ERFs. While it is impossible to draw definitive conclusions just from the current dataset and future research should replicate these results, the majority of our evidence thus points against an interpretation, in which the unseen correct trials constituted either just a subset of seen trials, or arose from the conscious maintenance of an early guess. Instead, inasmuch as the observed desynchronization serves as a faithful indicator of conscious processing, it argues in favor of a differential state of non-conscious working memory with a distinct neural signature.

Circular-linear correlations as well as multivariate regression models between the amplitude of the MEG signal and response location support this interpretation. On seen trials, response position was coded akin to target location: Initially maintained via slowly decaying neural activity in posterior brain areas, the response code subsequently resurfaced intermittently in the same as well as more frontal regions. There was no detectable evidence for such a code on the unseen trials. Only during the very last part of the delay, right before the response, did response-related neural activity emerge and ramp up to the same level as on seen trials during the response period. As such, the absence of any prior delay-period activity does not appear to be an artifact attributable to low statistical power or an increase in noise on the unseen trials. Instead, in conjunction with the absence of any signature of conscious processing on these trials, these findings imply that subjects did not consciously maintain an early guess and rather relied on genuine non-conscious working memory to perform the task.

In this context, an interesting avenue for future investigations might be to delineate the boundary conditions of such non-conscious working memory. Although the short-term maintenance of information certainly lies at the heart of most theories of working memory (Eriksson et al., 2015), there exist additional criteria for working memory that were not investigated in the present study. It is thus an interesting empirical question whether these other working memory processes may also occur without subjective awareness. Is it, for example, possible to manipulate information non-consciously? Though speculative, in light of the proposed activity-silent code for non-conscious maintenance (without any spontaneous reactivations; see below), it seems unlikely. Because such maintenance would be an entirely passive process, it is not clear how stored representations could be transformed without being persistently activated and thus becoming conscious. Future research is, however, needed to provide a definitive answer.

A theoretical framework for ‘activity-silent’ working memory

Target-related activity was not continuously sustained throughout the delay period, even when the target square had been consciously perceived. It instead fluctuated, disappearing and reappearing intermittently. This feature was even more pronounced on the unseen trials, with no evidence for any such retention-related activity beyond ~1 s. We presented a theoretical framework, based on Mongillo et al. (2008) and the concept of ‘activity-silent’ working memory (Stokes, 2015), that may provide a plausible explanation for maintenance without sustained neural activity. According to this model, short-term memories are retained by slowly decaying patterns of synaptic weights. A retrieval cue presented at the end of the delay may then serve as a non-specific read-out signal capable of reactivating these dormant representations above chance-level. Support for this model comes from experiments in which non-specific, task-irrelevant stimuli (Wolff et al., 2017, 2015), neutral post-cues (Sprague et al., 2016), or transcranial magnetic stimulation (TMS) pulses (Rose et al., 2016) presented during a delay restore the decodability of representations. Direct physiological evidence for the postulated short-term changes in synaptic efficacies also exists (Fujisawa et al., 2008).

The present non-conscious condition provides further support for such an activity-silent mechanism. In this framework, a stimulus that fails to cross the threshold for sustained activity and subjective visibility may still induce enough activity in high-level cortical circuits to trigger short-term synaptic changes. Such transient non-conscious propagation of activity has been simulated in neural networks (Dehaene and Naccache, 2001) and measured experimentally in temporo-occipital, parietal, and even prefrontal cortices (Salti et al., 2015; van Gaal and Lamme, 2012). In the present work, we indeed observed some residual, transiently decodable activity over left temporo-occipital sensors on unseen correct trials. The memory of target location could therefore have arisen from posterior visual maps (Roelfsema, 2015), although future research should test this prediction further. Note that activity-silent mechanisms need not apply solely to prefrontal cortex as originally proposed by Mongillo et al. (2008), but constitute a generic mechanism that may be replicated in different areas, possibly with increasingly longer time constants across the cortical hierarchy (Chaudhuri et al., 2014). Only some of these areas/spatial maps may be storing the information on unseen trials.

A key feature of the model by Mongillo et al. (2008) and the present simulations is that, even for above-threshold (‘seen’) stimuli, delay activity is not continuously sustained. Occasional bouts of spontaneous reactivation instead refresh the synaptic weights and maintain the memory for an indefinite time. The time courses of the circular-linear correlations and of the multivariate decoding we observed on seen trials match this description: While target location was encoded and maintained in temporo-occipital areas, target ‘decodability’ was not constantly sustained, but waxed and waned throughout the delay. Fuentemilla et al. (2010) also observed that, during a delay period, decodable representations of memorized images recurred at a theta rhythm. More recently, single-trial analyses of monkey electrophysiological recordings in a working memory task have confirmed the absence of any continuous activity and instead identified the presence of discrete gamma bursts, paired with a decrease in beta-burst probability (Lundqvist et al., 2016). Such periodic refreshing of otherwise activity-silent representations could potentially serve as the neural correlate of conscious rehearsal, a central feature of working memory according to Baddeley (2003). It also suggests, however, that even consciously perceived items may not always be ‘in mind.’ Future research might attempt to more directly simulate activity-silent mechanisms in the context of conscious and non-conscious perception by, for example, relying on more elaborate models capturing decreases in alpha/beta power (Lundqvist et al., 2011).

In conjunction with prior evidence (King et al., 2016; Salti et al., 2015), our findings therefore indicate that there may be two successive mechanisms for the short-term maintenance of conscious and non-conscious stimuli: an initial, transient period of ~1 s, during which the representation is encoded by active firing with a slowly decaying amplitude, and an ensuing activity-silent maintenance via short-term changes in synaptic weights, during which activity either intermittently resurfaces (conscious case) or vanishes (non-conscious case). Such activity-silent retention need not necessarily be specific to working memory. Recent investigations have, for instance, demonstrated the existence of recognition memory for invisible cues (Chong et al., 2014; Rosenthal et al., 2016). As delay periods in these studies were on the order of minutes rather than seconds, persistent neural activity seems an unlikely candidate mechanism of maintenance. Activity-silent codes might have been at play, though they probably depended on mechanisms with longer time constants than the relatively rapidly decaying patterns of synaptic weights discussed in the context of the present experiments. Nevertheless, activity-silent representations may constitute a general mechanism for maintenance across the whole spectrum of temporal delays (from seconds through minutes/hours to days/weeks/decades), thus forming a generic property of memory.

Limitations and future perspectives

Our study presents limitations that should be addressed by future research. Due to the nature of the current investigation (a working memory task with long trials and subjectively determined variables), a relatively small number of unseen trials was acquired, thus making it difficult to detect subtle effects. While our conclusions are supported by Bayes Factor analyses, converging evidence from univariate and multivariate techniques, and similar results obtained with larger samples in the domain of activity-silent conscious working memory (e.g., Rose et al., 2016; Wolff et al., 2017), a number of our observations are based on null effects, and it remains a possibility that we missed some target- and/or response-related activity on the unseen trials. Future research should thus aim at replicating the present findings with larger datasets or with more sensitive techniques, such as intra-cranial recordings. In particular, it might be interesting to further probe the relationship between seen, unseen correct, and unseen incorrect targets: A specific prediction of the proposed model is that unseen correct trials should possess enough activity to modify synaptic weights in high-level cortical circuits, yet without crossing the threshold for sustained activity and consciousness (‘failed ignition’). Unseen correct trials should thus share some of the processes that are found on seen trials and future research is necessary to directly test this hypothesis.

Conclusion

In contrast to a widely held belief, our findings support the existence of genuine working memory in the absence of either conscious perception or sustained activity. Our proposal is that, following a transient encoding phase via active firing, non-conscious stimuli may be maintained by ‘activity-silent’ short-term changes in synaptic weights without any detectable neural activity, allowing above-chance retrieval for several seconds. Similar activity-silent codes also subserve conscious maintenance, though in this case periodic refreshing appears to stabilize the stored representations throughout the delay. Our findings thus highlight the need to refine our understanding of working memory, and to continuously challenge the limits of non-conscious processing.

Materials and methods

Subjects

Thirty-eight healthy volunteers participated in the present study (experiment 1: N = 17, mean age = 23.3 years, SD = 2.8 years, 10 men; experiment 2: N = 21, mean age = 24.3 years, SD = 3.8 years, 9 men). They gave written informed consent and received 80€ or 15€ as compensation for the imaging and behavioral paradigms, respectively. Due to noisy recordings, only 13 of the 17 subjects in experiment 1 were retained for the MEG analyses. Although sample size had not been specifically estimated for our study, it was reasonable given typical experiments in the field.

Experimental protocol

Participants performed variations of a spatial delayed-response task, designed to assess the retention of a target location under varying levels of subjective visibility (Figure 1A). Each trial began with the presentation of a central fixation cross (500 ms), displayed in white ink on an otherwise black screen. In experiment 1, a faint gray target square (RGB: 89.25 89.25 89.25) was flashed for 17 ms in 1 out of 20 equally spaced, invisible positions along a circle centered on fixation (radius = 200 pixels; eight repetitions/location). Another fixation cross (17 ms) preceded the display of the mask (233 ms). Mask elements were composed of four individual squares (two right above and below, and two to the left and right of the target stimulus), arranged to tightly surround the target square without overlapping it. They appeared simultaneously at all possible target locations. Mask contrast was adjusted on an individual basis in a separate calibration procedure (see below). A variable delay period with constant fixation followed the mask (2.5, 3.0, 3.5, or 4.0 s). On 50% of the trials in experiment 1, an unmasked distractor square, randomly placed and with the same duration as the target, was presented 1.5 s into the delay period.

After the delay, 20 letters – drawn from a subset of lower-case letters of the alphabet (excluded: e, j, n, p, t, v) – were randomly presented in the 20 positions (2.5 s). Participants were asked to identify the target location by speaking the name of the letter presented at the location. They were instructed to always provide a response, guessing if necessary. A trial ended with the presentation of the word ‘Vu?’ (French for ‘Seen?’) in the center of the screen (2.5 s), cueing participants to rate the visibility of the target on the 4-point Perceptual Awareness Scale (PAS; 1: no experience of the target, 2: brief glimpse, 3: almost clear experience, 4: clear experience; Ramsøy and Overgaard, 2004) using the index, middle, ring, or little finger of their right hand (five-button non-magnetic response box, Cambridge Research Systems Ltd., Fiber Optic Response Pad). We instructed subjects to reserve a visibility rating of 1 for those trials on which they had absolutely no perception of the target. The target square was also replaced by a blank screen on 20% of the trials, in order to obtain an objective measure of participants’ sensitivity to the presence of the target. The inter-trial interval (ITI) lasted 1 s. Subjects completed a total of 200 trials of this working memory task, divided into four separate experimental blocks. They also undertook two blocks of 100 trials each of a perception-only control paradigm, identical to the working memory task in all respects except that the delay period and target localization screen were omitted, such that the presentation of the mask immediately preceded subjects’ visibility ratings. Task order (perception vs. working memory) was counterbalanced across participants.

Experiment 2 was designed to investigate the impact of a conscious working memory load on non-conscious working memory. Apart from the following exceptions, it was identical to experiment 1: A screen with either 1 (low load) or 5 (high load) centrally presented digits (1.5 s) – randomly drawn (without replacement) from the numbers 1 through 9 – as well as a 1s-fixation period were shown prior to the presentation of the target square. Following either a 0s- or a 4s-delay period, subjects first identified the target location by typing their responses on a standard AZERTY keyboard (4 s). The French word for numbers (Numéros?) then probed participants to recall the sequence of digits in the correct order. Responses were again logged on the keyboard during a period of 4.5 s. Subjects last rated target visibility as in experiment 1 (3 s). The ITI varied between 1 and 2 s. Participants completed two experimental blocks of 100 trials each.

Calibration task

Prior to the experimental tasks, each participant’s perceptual threshold was estimated in order to ensure roughly equal proportions of seen and unseen trials. Subjects completed 150 (experiment 1: three blocks) or 125 (experiment 2: five blocks) trials of a modified version of the working memory task (no distractor, delay duration: 2 s in experiment 1 and 0 s in experiment 2), during which mask contrast was either increased (following a visibility rating of 2, 3, or 4) or decreased (following a visibility rating of 1) on each target-present trial according to a double-staircase procedure. Individual perceptual thresholds to be used in the main tasks were derived by averaging the mask contrasts from the last four switches from seen to unseen (or vice versa) of each staircase.
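The staircase logic described above can be illustrated with a minimal sketch. It is not the calibration code used in the study: the step size, the contrast range, the two starting points, the convention for which contrast counts as a "switch", and the deterministic `toy_observer` are all assumptions introduced for illustration only.

```python
def run_staircase(start, step, n_trials, rate_visibility, lo=0.0, hi=1.0):
    """Run one staircase and return the mask contrasts at seen/unseen switches."""
    contrast = start
    prev_seen = None
    switches = []
    for _ in range(n_trials):
        seen = rate_visibility(contrast) >= 2   # PAS ratings 2-4 count as seen
        if prev_seen is not None and seen != prev_seen:
            switches.append(contrast)           # visibility category flipped here
        prev_seen = seen
        # Seen target: strengthen the mask. Unseen target: weaken the mask.
        contrast += step if seen else -step
        contrast = min(hi, max(lo, contrast))
    return switches


def perceptual_threshold(staircases, n_last=4):
    """Average the contrasts of the last n_last switches of each staircase."""
    values = [c for switches in staircases for c in switches[-n_last:]]
    return sum(values) / len(values)


def toy_observer(contrast):
    # ASSUMPTION: hypothetical observer who sees the target whenever the
    # mask contrast is below 0.5 (arbitrary units).
    return 3 if contrast < 0.5 else 1


# Two staircases, one starting below and one above the expected threshold
low = run_staircase(0.2, 0.05, 40, toy_observer)
high = run_staircase(0.8, 0.05, 40, toy_observer)
threshold = perceptual_threshold([low, high])
```

With a real participant the visibility ratings are noisy, so the switch contrasts scatter around the threshold instead of alternating between two neighboring values.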

Behavioral analyses

We analyzed our behavioral data in Matlab R2014a (MathWorks Inc., Natick, MA; code available upon request) and SPSS Statistics Version 20.0 (IBM, Armonk, NY), using repeated-measures analyses of variance (ANOVAs). Only meaningful trials without missing responses were included in any analysis. Distributions of localization responses were computed for visibility categories with at least five trials per subject. Objective working memory performance was quantified via two complementary measures. The rate of correct responding was defined as the proportion of trials within two positions (i.e., ±36°) of the actual target location and served as an index of the amount of information that could be retained. Because 5 out of 20 locations were counted as correct, chance on this measure was 25%. The precision of working memory was estimated as the dispersion (standard deviation) of spatial responses. In particular, we modeled the observed distribution of responses D(n) as a mixture of a uniform distribution (random guessing) and an unknown probability distribution d (‘true working memory’):

(1) $D(n) = \frac{p}{N} + (1 - p)\,d(n)$

where p refers to the probability that a given trial is responded to using random guessing; N to the number of target locations (N = 20); and n is the deviation from the true target location. We assumed that d(n) = 0 for deviations beyond a fixed limit a (with a = 2). This hypothesis allowed us to estimate p from the mean of that part of the distribution D for which one may safely assume no contribution of working memory:

(2) $\hat{p} = \frac{\sum_{n \notin [-a,a]} D(n)}{(N - 2a - 1)/N}$

where the model is designed in such a way as to ensure that $\hat{p} = 1$ if D is a uniform distribution (i.e., 100% of random guessing) and $\hat{p} = 0$ if D vanishes outside the region of correct responding (i.e., 0% of random guessing). There needs to be at least chance performance inside the region of correct responding, so

(3) $\sum_{n \in [-a,a]} D(n) \geq \frac{2a + 1}{N}$

which ensures $0 \leq \hat{p} \leq 1$. This is the reason why, when computing precision, we included only subjects whose rate of correct responding for unseen trials, collapsed across all experimental conditions, significantly exceeded chance performance (i.e., 25%) in a χ²-test (p < 0.05). An estimate of d, $\hat{d}$, can then be derived in two steps from Equation 1 as

(4) $\delta(n) = \frac{D(n) - \hat{p}/N}{1 - \hat{p}}$
(5) $\hat{d}(n) = \frac{\delta(n)|_{n \in [-a,a]}}{\sum_{n \in [-a,a]} \delta(n)}$.

We note that the distribution δ has residual, yet negligible, positive and negative mass (due to noise) outside the region of correct responding. In order to obtain $\hat{d}$, we therefore restricted the distribution δ to [−a, a], set all negative values to 0, and renormalized its mass to 1. The precision of the representation of the target location in working memory was then defined as the standard deviation of that distribution.
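The estimation steps in Equations 1–5 can be sketched as follows. This is an illustrative reimplementation in Python, not the Matlab code used for the analyses. The function name `estimate_precision`, the dictionary layout of D, and the example values (a guessing rate of 0.4 and a narrow memory distribution) are assumptions, and the standard deviation is expressed in units of positions (one position = 18°).

```python
import math


def estimate_precision(D, a=2):
    """Estimate guessing rate, memory distribution, and precision.

    D maps each signed deviation n (in positions) to the proportion of
    responses at that deviation; it must cover all N positions and sum to 1.
    """
    N = len(D)
    # Equation 2: mass outside [-a, a] is attributed to guessing alone
    outside = sum(v for n, v in D.items() if abs(n) > a)
    p_hat = outside / ((N - 2 * a - 1) / N)
    # Equation 4: remove the uniform guessing component (undefined if p_hat = 1)
    delta = {n: (v - p_hat / N) / (1 - p_hat) for n, v in D.items()}
    # Restrict to [-a, a], clip negative values to 0, renormalize (Equation 5)
    inside = {n: max(delta[n], 0.0) for n in D if abs(n) <= a}
    total = sum(inside.values())
    d_hat = {n: v / total for n, v in inside.items()}
    # Precision: standard deviation of the estimated memory distribution
    mean = sum(n * v for n, v in d_hat.items())
    sd = math.sqrt(sum(v * (n - mean) ** 2 for n, v in d_hat.items()))
    return p_hat, d_hat, sd


# Hypothetical example: 40% guessing mixed with a narrow memory distribution
N, a, p_true = 20, 2, 0.4
d_true = {-1: 0.25, 0: 0.5, 1: 0.25}
D = {n: p_true / N + (1 - p_true) * d_true.get(n, 0.0) for n in range(-9, 11)}
p_hat, d_hat, sd = estimate_precision(D, a)
```

On noiseless input such as this, the procedure recovers the guessing rate and the memory distribution that generated D. With empirical data, the clipping and renormalization step absorbs the small residual mass mentioned above.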

MEG recordings and preprocessing

In experiment 1, we recorded MEG with a 306-channel (102 sensor triplets: 1 magnetometer and 2 orthogonal planar gradiometers), whole-head setup by Elekta Neuromag (Helsinki, Finland) at 1000 Hz with a hardware bandpass filter between 0.1 and 330 Hz. Eye movements as well as heart rate were monitored with vertical and horizontal EOG and ECG channels. Prior to installation of the subject in the MEG chamber, we digitized three head landmarks (nasion and pre-auricular points), four head position indicator (HPI) coils placed over frontal and mastoid skull areas, and 60 additional locations outlining the participant’s head with a 3-dimensional Fastrak system (Polhemus, USA). Head position was measured at the beginning of each run.

Our preprocessing pipeline followed Marti et al. (2015). Using MaxFilter Software (Elekta Neuromag, Helsinki, Finland), raw MEG signals were first cleaned of head movements, bad channels, and magnetic interference originating from outside the MEG helmet (Taulu et al., 2004), and then downsampled to 250 Hz. We conducted all further preprocessing steps with the Fieldtrip toolbox (http://www.fieldtriptoolbox.org/; Oostenveld et al., 2011) run in a Matlab R2014a environment. Initially, MEG data were epoched between −0.5 and +2.5 s with respect to target onset for all stimulus-locked, and between −0.5 and +0.8 s with respect to the onset of the response screen for all response-locked analyses. Trials contaminated by muscle or other movement artifacts were then identified and rejected in a semi-automated procedure, for which the variance of the MEG signals across sensors served as an index of contamination. To remove any residual eye-movement and cardiac artifacts, we performed independent component analysis separately for each channel type, visually inspected the topographies and time courses of the first 30 components, and subtracted any contaminated component from the MEG data. Except for analyses requiring higher spatial precision (i.e., circular-linear correlations and decoding), results are presented for magnetometers only.

Further preprocessing steps depended on the nature of the subsequent analysis: Epochs retained for investigations based on evoked responses (i.e., ERFs, decoding, circular-linear correlations) were low-pass filtered at 30 Hz, while time-frequency decompositions relied on entirely unfiltered data. In the latter case, a sliding, frequency-independent Hann taper (window size: 500 ms, step size: 20 ms) was convolved with the unfiltered epochs in order to extract an estimate of power between 1 and 99 Hz (in 2 Hz steps) to identify the neural correlates of conscious and non-conscious perception and working memory in the frequency domain. Prior to univariate or multivariate statistical analysis, data (ERFs, time-frequency power estimates) were baseline corrected using a period between −200 and −50 ms.
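As an illustration of the sliding-window power estimate, the sketch below applies a fixed 500 ms Hann taper in 20 ms steps to a single simulated channel sampled at 250 Hz and projects each window onto complex exponentials at the frequencies of interest. It is a plain-Python stand-in for the Fieldtrip routine actually used. The function names, the direct Fourier projection, the absence of any normalization, and the 11 Hz test signal are assumptions.

```python
import cmath
import math


def hann(n_samples):
    """Symmetric Hann window of n_samples points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n_samples - 1))
            for i in range(n_samples)]


def sliding_power(signal, sfreq, freqs, win_s=0.5, step_s=0.02):
    """Return (times, power); power[w][k] is the power at freqs[k] in window w."""
    n_win = int(round(win_s * sfreq))
    n_step = int(round(step_s * sfreq))
    taper = hann(n_win)
    times, power = [], []
    for start in range(0, len(signal) - n_win + 1, n_step):
        segment = [signal[start + i] * taper[i] for i in range(n_win)]
        row = []
        for f in freqs:
            # Projection of the tapered segment onto a complex exponential
            coef = sum(segment[i] * cmath.exp(-2j * math.pi * f * i / sfreq)
                       for i in range(n_win))
            row.append(abs(coef) ** 2)
        times.append((start + n_win / 2) / sfreq)   # window center in seconds
        power.append(row)
    return times, power


sfreq = 250.0
# Hypothetical 2 s test signal: a pure 11 Hz oscillation
signal = [math.sin(2 * math.pi * 11 * i / sfreq) for i in range(500)]
freqs = list(range(1, 100, 2))                      # 1, 3, ..., 99 Hz
times, power = sliding_power(signal, sfreq, freqs)
peak_freq = freqs[max(range(len(freqs)), key=lambda k: power[0][k])]
```

Because the window length is the same at every frequency, the spectral resolution (about 2 Hz for a 500 ms window) is constant across the 1–99 Hz range.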

Circular-linear correlations

To localize and track the neural representations of target, response, and distractor location, filtered epochs were transformed into circular-linear correlation coefficients. Following King et al. (2016), we combined the two linear correlation coefficients between the MEG signal and the sine and cosine of the angle defining the location in question (i.e., target, distractor, or response). An empirical null distribution was generated for each condition separately by shuffling the labels (i.e., target, distractor, or response location) at the corresponding time points and averaging the resulting distribution from 1000 such permutations.
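A minimal sketch of such an analysis for one sensor and one time point is given below. The combination rule used here is the standard textbook circular-linear correlation coefficient. The text does not spell out the exact formula, so this choice is an assumption, as are the function names and the noiseless toy signal.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circ_lin_corr(signal, angles):
    """Combine the correlations of the signal with sin and cos of the angle."""
    sines = [math.sin(a) for a in angles]
    cosines = [math.cos(a) for a in angles]
    r_xs = pearson(signal, sines)
    r_xc = pearson(signal, cosines)
    r_cs = pearson(cosines, sines)
    rho_sq = (r_xc ** 2 + r_xs ** 2 - 2 * r_xc * r_xs * r_cs) / (1 - r_cs ** 2)
    return math.sqrt(max(rho_sq, 0.0))


def null_baseline(signal, angles, n_perm=1000, seed=0):
    """Average coefficient obtained after shuffling the location labels."""
    rng = random.Random(seed)
    shuffled = list(angles)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += circ_lin_corr(signal, shuffled)
    return total / n_perm


# Hypothetical data: 20 locations, 4 trials each, one perfectly tuned sensor
angles = [2 * math.pi * k / 20 for k in range(20)] * 4
signal = [math.cos(a - 1.0) for a in angles]
rho = circ_lin_corr(signal, angles)
delta_rho = rho - null_baseline(signal, angles, n_perm=200)
```

Subtracting the permutation baseline matters because the coefficient is non-negative by construction, so its expected value under the null hypothesis is above zero and depends on the number of trials.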

Due to the spatial nature of our task, there is a possibility that subjects could have systematically moved their eyes after the presentation of the target, thus contaminating the correlation analyses. However, several lines of evidence suggest that this was not the case: First, participants were carefully instructed not to move their eyes. A close inspection of the EOG traces confirmed that subjects successfully implemented this request and did not display any strategic eye movements. Second, we carefully removed any trials contaminated by such movements as part of our preprocessing procedure. Third, the topographical patterns of the correlations show that the signal primarily originated in occipital and parietal channels. Eye movements are therefore unlikely to have driven the circular-linear correlations.

Sources

Individual anatomical magnetic resonance images (MRI), obtained with a 3D T1-weighted spoiled gradient recalled pulse sequence (voxel size: 1 × 1 × 1.1 mm; repetition time [TR]: 2300 ms; echo time [TE]: 2.98 ms; field of view [FOV]: 256 × 240 × 176 mm; 160 slices) in a 3T Tim Trio Siemens scanner, were first segmented into gray/white matter as well as subcortical structures with FreeSurfer (https://surfer.nmr.mgh.harvard.edu/). We then reconstructed the cortical, scalp, and head surfaces in Brainstorm (http://neuroimage.usc.edu/brainstorm; Tadel et al., 2011) and co-registered these anatomical images with the MEG signals, using the HPI coils and the digitized head shape as a reference. Current density distributions on the cortical surface were subsequently estimated separately for each condition and subject. Specifically, we employed an analytical model with overlapping spheres to compute the leadfield matrix and modeled neuronal current sources with an unconstrained (dipole orientation loosening factor: 0.2) weighted minimum-norm current estimate (wMNE; depth-weighting factor: 0.5) and a noise covariance obtained from the baseline period of all trials. Average time-frequency power in the alpha (8–12 Hz) and beta (13–30 Hz) bands was then estimated with complex Morlet wavelets using the Brainstorm default parameters, the resulting transformations projected onto the ICBM 152 anatomical template (Fonov et al., 2011, 2009), and the contrasts between the conditions of interest computed. Group averages for spatial clusters of at least 150 vertices are shown in dB relative to baseline and were thresholded at 60% of the maximum amplitude (cortex smoothed at 60%).

Multivariate pattern analyses

We employed the Scikit-Learn package (Pedregosa et al., 2011) as implemented in MNE 0.13 (Gramfort et al., 2013, 2014) in order to conduct our multivariate pattern analyses (MVPA). Following Marti et al. (2015) and King et al. (2016), we fit linear estimators at each time sample within each participant to isolate the topographical patterns best differentiating our experimental conditions. Support vector machines (Chang and Lin, 2011) were trained in the case of categorical data (i.e., visibility/accuracy) and a combination of two linear support vector regressions was used for circular data (i.e., target/response location) to estimate an angle from the arctangent of the separately predicted sine and cosine of the labels of interest.
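The recombination step for circular labels can be sketched in a few lines. The two regressors themselves are omitted: `pred_sin` and `pred_cos` stand for their outputs on a test trial, and the shrinkage factor in the example is a hypothetical value chosen to show that the recovered angle does not depend on the common scale of the two predictions.

```python
import math


def predict_angle(pred_sin, pred_cos):
    """Recover an angle in (-pi, pi] from separately predicted sine and cosine."""
    return math.atan2(pred_sin, pred_cos)


# Hypothetical example: 20 target angles (offset by half a step to stay away
# from the +/-pi wrap-around) and regressors that underestimate the amplitude
# of both components by the same factor.
true_angles = [-math.pi + 2 * math.pi * (k + 0.5) / 20 for k in range(20)]
shrink = 0.3
predicted = [predict_angle(shrink * math.sin(a), shrink * math.cos(a))
             for a in true_angles]
max_abs_diff = max(abs(p - a) for p, a in zip(predicted, true_angles))
```

Regressing sine and cosine separately avoids the discontinuity at ±π that would arise if the angle itself were used as a linear regression target.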

A 5- (for categorical variables) or, due to the much larger number of labels, 2-fold (for circular variables), stratified cross-validation procedure was used in order to avoid overfitting: MEG data were first split into five (two) sets of trials with the same proportion of samples for each class. Within each fold, four (one) of these sets served as the training data and the remainder as the testing data. Model fitting, including all preprocessing steps, was exclusively performed on the training set. 50% of the most informative features (i.e., channels) were selected by means of a simple, univariate analysis of variance to reduce the dimensionality of the data (Charles et al., 2014; Haynes and Rees, 2006), the remaining channel-time features z-score normalized, and a weighting procedure applied in order to counteract the effects of any class imbalances. The classifier was then trained on the resulting data and applied to the left-out trials in order to identify the hyperplane (i.e., topography) best suited to separate the classes. This sequence of events (univariate feature selection, normalization, training and testing) was repeated five (two) times, ensuring that each trial would be included in the test set once.
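The stratified splitting scheme can be illustrated with a short sketch in plain Python. It is not the Scikit-Learn implementation used in the analyses: the round-robin assignment, the function names, and the example with 30 seen and 20 unseen trials are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split trial indices into k folds with matching class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the trials of each class to the folds in turn
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Hypothetical example: 30 seen and 20 unseen trials, 5 folds
labels = ["seen"] * 30 + ["unseen"] * 20
folds = stratified_folds(labels, k=5)
splits = list(train_test_splits(folds))
```

Within each split, feature selection and normalization parameters would be estimated on the `train` indices only and then applied unchanged to the `test` indices, which is what keeps the test trials independent of model fitting.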

Within the same cross-validation loop, we also evaluated the ability of each classifier to discriminate the experimental conditions of interest at all other time samples (i.e., generalization across time). This kind of MVPA results in a temporal generalization matrix, in which each entry represents the decoding performance of each classifier trained at time point t and tested at time point t’, and in which the diagonal corresponds to classifiers trained and tested on the same time points (King and Dehaene, 2014). Importantly, when interrogating the capacity of our classifiers to generalize across tasks or labels (e.g., from the perception to the working memory task, or from seen to unseen correct target locations), we modified the aforementioned cross-validation procedure to capitalize on the independence of our training and testing data (see http://martinos.org/mne/dev/auto_examples/decoding/plot_decoding_time_generalization_conditions.html#example-decoding-plot-decoding-time-generalization-conditions-py). As such, classifiers from each training set were directly applied to the entire testing set and the respective predictions averaged.

Classifiers for categorical data generated a continuous output in the form of the distance between the respective sample and the separating hyperplane for each test trial. In order to be able to compare classification performance across subjects, we then applied a receiver operating characteristic analysis across trials within each participant and summarized overall effect sizes with the area under the curve (AUC). Unlike average decoding accuracy, the AUC serves as an unbiased measure of decoding performance as it represents the true-positive rate (e.g., a trial was correctly categorized as seen) as a function of the false-positive rate (e.g., a trial was incorrectly categorized as seen). Chance performance, corresponding to equal proportions of true and false positives, therefore leads to an AUC of 0.5. Any value greater than this critical level implies better-than-chance performance, with an AUC of 1 indicating a perfect prediction for any given class. In contrast, classifiers for circular data were first summarized by computing the mean absolute difference between the predicted and the actual angle (range: 0 to π; chance: π/2) and then transformed into an ‘accuracy’ score (range: -π/2 to π/2; chance: 0). To facilitate comparability between different conditions, an additional baseline-correction was then performed.
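Both summary measures can be sketched in plain Python. The AUC is computed here through its rank-based definition (the probability that a randomly chosen positive trial receives a higher decision value than a randomly chosen negative trial). The circular score is taken as π/2 minus the mean absolute angular error, which reproduces the stated range and chance level. The exact transformation used in the analyses is not given in the text, so this form is an assumption, as are the function names and example values.

```python
import math


def auc(pos_scores, neg_scores):
    """Area under the ROC curve from decision values (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def angular_error(pred, true):
    """Absolute angular difference, wrapped to the range [0, pi]."""
    diff = (pred - true + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def circular_accuracy(preds, trues):
    """Map the mean absolute error (chance: pi/2) to a score with chance 0."""
    mean_error = sum(angular_error(p, t) for p, t in zip(preds, trues)) / len(preds)
    return math.pi / 2 - mean_error


# Hypothetical distances to the hyperplane for seen (positive) and unseen trials
perfect = auc([0.8, 1.2], [-0.5, 0.1])
uninformative = auc([1.0], [1.0])
best_score = circular_accuracy([0.0], [0.0])
worst_score = circular_accuracy([math.pi], [0.0])
```

The rank-based AUC depends only on the ordering of the decision values, not on where a classification threshold is placed, which makes it insensitive to unequal numbers of trials per class.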

Statistical analyses

We performed statistical analyses across subjects. For the ERF and time-frequency data, cluster-based, non-parametric t-tests with Monte Carlo permutations were used to identify significant differences between experimental conditions (Maris and Oostenveld, 2007). Further planned comparisons of ERF time courses (seen vs. unseen) in a-priori defined spatio-temporal regions of interest (i.e., P3b time window: 300–600 ms) were conducted with non-parametric signed-rank tests (uncorrected p < 0.05). Multiple comparisons were then corrected by controlling the false discovery rate (FDR-corrected p < 0.05).
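For illustration, a false discovery rate correction can be sketched with the Benjamini-Hochberg step-up procedure. The text does not state which FDR procedure was applied, so this choice is an assumption, as are the function name and the example p-values.

```python
def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg procedure; returns True for each rejected test."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank whose p-value lies under the step-up threshold
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k_max = rank
    # Reject every hypothesis ranked at or below that position
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


# Hypothetical uncorrected p-values from four planned comparisons
some_survive = fdr_bh([0.001, 0.04, 0.03, 0.2])
all_survive = fdr_bh([0.01, 0.02, 0.03, 0.04])
```

In the first example only the smallest p-value survives, although two others lie below 0.05 when uncorrected. In the second example all four tests survive, because each p-value lies under its rank-specific threshold.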

Non-parametric signed-rank tests (uncorrected p < 0.05) were also employed to evaluate decoding performance and the strength of circular-linear correlations. Specifically, we assessed whether classifiers could predict the trials’ classes better than chance (categorical data: AUC > 0.5; circular data: rad > 0) and whether circular-linear correlation coefficients deviated from an empirical baseline (Δrho > 0). We report temporal averages over four a-priori time bins, corresponding to an early perceptual period (100–300 ms), the P3b time window (300–600 ms), and the first (0.6–1.55 s) and second (1.55–2.53 s) part of the delay period. To capitalize on the increased spatial selectivity of gradiometers, averaged time courses of these two channels are shown for circular-linear correlations.

Bayesian statistics, based on either two- (time-frequency analyses) or one-sided (circular-linear correlations) t-tests, were also computed when appropriate with a scale factor of r = 0.707 (Rouder et al., 2009).

Simulations

A one-dimensional, recurrent continuous attractor neural network (CANN) model (Mongillo et al., 2008) was adapted in order to simulate the experimental findings (Figure 7A). Individual neurons were aligned according to their preferred stimulus value, enabling the network to encode angular position of a target stimulus (range: -π to π; periodic boundary condition). The dynamics of this system were determined by the synaptic currents of each neuron given by

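(6) $\tau \frac{\partial h_E(\theta,t)}{\partial t} = -h_E(\theta,t) + \rho \int_{-\pi}^{\pi} J(\theta,\theta')\,u(\theta',t)\,x(\theta',t)\,R_E(\theta',t)\,d\theta' - J_{EI} R_I + I_b + \delta_1 \xi_1(\theta,t) + I_e + \delta_2 \xi_2(\theta,t),$
(7) $\frac{\partial u(\theta,t)}{\partial t} = \frac{U - u(\theta,t)}{\tau_f} + U\,[1 - u(\theta,t)]\,R_E(\theta,t),$
(8) $\frac{\partial x(\theta,t)}{\partial t} = \frac{1 - x(\theta,t)}{\tau_d} - u(\theta,t)\,x(\theta,t)\,R_E(\theta,t),$ and
(9) $\tau \frac{\partial h_I}{\partial t} = -h_I + J_{IE} \int_{-\pi}^{\pi} R_E(\theta,t)\,d\theta,$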

where $\tau$ describes the time constant of firing rate dynamics (in the order of milliseconds); $\rho$ refers to neuronal density; $h_E(\theta,t)$ and $R_E(\theta,t)$ capture the synaptic current to and firing rate of neurons with preference $\theta$ at time t, respectively; and $R(h) = \alpha \ln(1 + \exp(h/\alpha))$ is the neural gain, chosen in the form of a smoothed threshold-linear function. $J_{IE}$ and $J_{EI}$ represent the connection strength between excitatory and inhibitory neurons. All excitatory neurons received a constant background input, $I_b$, reflecting the arousal signal when the neural system was engaged in a working memory task. $\delta_1 \xi_1$ is background noise; $I_e$, any external stimulus (e.g., target, mask, and recall signal); and $\delta_2 \xi_2$ the noise related to those external stimuli. $u(\theta,t)$ and $x(\theta,t)$ denote the short-term synaptic facilitation (STF) and depression (STD) effects at time t of neurons with preference $\theta$, respectively. The short-term plasticity dynamics are characterized by the following parameters: $J_1$ (absolute efficacy), $U$ (increment of the release probability when a spike arrives), $\tau_f$ and $\tau_d$ (facilitation and depression time constants). The STF value $u(\theta,t)$ is facilitated whenever a spike arrives, and decays to the baseline $U$ within the time $\tau_f$. The neurotransmitter value $x(\theta,t)$ is utilized by each spike in proportion to $u(\theta,t)$ and then recovers to its baseline, 1, within the time $\tau_d$.

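$J(\theta,\theta')$ is the interaction strength from neurons at $\theta'$ to neurons at $\theta$ and is chosen to be

(10) $J(\theta,\theta') = \begin{cases} J_1 \cos[B(\theta - \theta')] - J_0, & \text{if } B(\theta - \theta') \in [-\arccos(J_0/J_1), \arccos(J_0/J_1)] \\ -J_0, & \text{else} \end{cases}$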

where $J_0$, $J_1$, and $B$ are constants which determine the connection strength between the neurons. Note that $J(\theta,\theta')$ is a function of $\theta - \theta'$, i.e., the neuronal interactions are translation-invariant in the space of neural preferred stimuli. The other parameters of the system were as follows: $\tau$ = 0.008 s, $\tau_f$ = 4 s, $\tau_d$ = 0.3 s, $J_1$ = 12, $J_0$ = 1, $J_{EI}$ = 1.9, $J_{IE}$ = 1.8, $I_b$ = −0.1 Hz, $\delta_1$ = 0.3, $\delta_2$ = 9, N = 100, $\alpha$ = 1.5, B = 2.2.
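To make the role of the two plasticity time constants concrete, the sketch below integrates the facilitation and depression equations (Equations 7 and 8) for a single synapse with the forward Euler method, together with the gain function R(h). It is not the network simulation itself. The baseline U is not listed among the parameters above, so U = 0.3 is an assumption, as are the 40 Hz, 50 ms input burst, the 1 ms integration step, and the function names.

```python
import math

TAU_F, TAU_D = 4.0, 0.3   # facilitation and depression time constants (s)
ALPHA = 1.5               # smoothness of the gain function
U = 0.3                   # ASSUMPTION: baseline value, not given in the text


def gain(h, alpha=ALPHA):
    """Smoothed threshold-linear gain R(h) = alpha * ln(1 + exp(h / alpha))."""
    return alpha * math.log1p(math.exp(h / alpha))


def simulate_stp(rate_fn, t_end, dt=0.001):
    """Forward-Euler integration of u (facilitation) and x (depression)."""
    u, x = U, 1.0
    trace = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        r = rate_fn(t)
        du = (U - u) / TAU_F + U * (1.0 - u) * r     # Equation 7
        dx = (1.0 - x) / TAU_D - u * x * r           # Equation 8
        u += dt * du
        x += dt * dx
        trace.append((t + dt, u, x))
    return trace


def burst(t):
    # ASSUMPTION: hypothetical 50 ms burst of 40 Hz firing, then silence
    return 40.0 if t < 0.05 else 0.0


trace = simulate_stp(burst, t_end=3.0)
_, u_1s, x_1s = trace[999]        # state 1 s after burst onset
_, u_end, x_end = trace[-1]       # state at the end of a 3 s delay
```

One second after the burst, the resources x have largely recovered while u is still elevated, so the effective efficacy u·x remains above its baseline U although the firing rate is zero. This is the sense in which the memory trace is "activity-silent".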

During our simulations, we first presented a target signal with an amplitude of $A_{target}$ = 390 Hz at a random location (50 ms), waited for 17 ms, and then applied a mask signal to all the neurons in the system (200 ms). The amplitude of the mask signal was initially varied in order to determine a critical value which would produce two distinct maintenance patterns, but was then fixed at a threshold of $A_{mask}$ = 62 Hz. At the end of a 3 s delay period, a non-specific recall signal was given for 50 ms with $A_{recall}$ = 10 Hz. Remembered target position was calculated as the population vector angle during this time period.
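The population vector read-out can be sketched as follows: each neuron votes for its preferred angle with a weight equal to its firing rate, and the remembered position is the direction of the summed vector. The bump-shaped rate profile used here is a hypothetical stand-in for the network activity during the recall period, and the function name is an assumption.

```python
import math


def population_vector_angle(rates, prefs):
    """Angle of the rate-weighted sum of unit vectors at the preferred angles."""
    s = sum(r * math.sin(p) for r, p in zip(rates, prefs))
    c = sum(r * math.cos(p) for r, p in zip(rates, prefs))
    return math.atan2(s, c)


# 100 neurons with preferred angles evenly spaced on the circle
N = 100
prefs = [-math.pi + 2 * math.pi * i / N for i in range(N)]

# ASSUMPTION: hypothetical activity bump centered on a target at 1.0 rad
target = 1.0
rates = [math.exp(3.0 * math.cos(p - target)) for p in prefs]
decoded = population_vector_angle(rates, prefs)
```

For a symmetric bump on evenly spaced preferred angles, the decoded angle coincides with the bump center. In a noisy simulation it would scatter around the target, and the spread of that scatter gives the simulated response distribution.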

References

  1. 1
  2. 2
  3. 3
    Maintenance of non-consciously presented information engages the prefrontal cortex
    1. F Bergström
    2. J Eriksson
    (2014)
    Frontiers in Human Neuroscience 8.
    https://doi.org/10.3389/fnhum.2014.00938 (PMID 25484862)
  4. 4
    The conjunction of non-consciously perceived object identity and spatial position can be retained during a visual short-term memory task
    1. F Bergström
    2. J Eriksson
    (2015)
    Frontiers in Psychology 6.
    https://doi.org/10.3389/fpsyg.2015.01470 (PMID 26483726)
  5. 5
    LIBSVM
    1. C-C Chang
    2. C-J Lin
    (2011)
    ACM Transactions on Intelligent Systems and Technology 2:1–27.
    https://doi.org/10.1145/1961189.1961199
  6. 6
  7. 7
    A diversity of localized timescales in network activity
    1. R Chaudhuri
    2. A Bernacchia
    3. XJ Wang
    (2014)
    eLife 3.
    https://doi.org/10.7554/eLife.01239 (PMID 24448407)
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
    MEG and EEG data analysis with MNE-Python
    1. A Gramfort
    2. M Luessi
    3. E Larson
    4. DA Engemann
    5. D Strohmeier
    6. C Brodbeck
    7. R Goj
    8. M Jas
    9. T Brooks
    10. L Parkkonen
    11. M Hämäläinen
    (2013)
    Frontiers in Neuroscience 7.
    https://doi.org/10.3389/fnins.2013.00267 (PMID 24431986)
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
    Shaping functional architecture by oscillatory alpha activity: gating by inhibition
    1. O Jensen
    2. A Mazaheri
    (2010)
    Frontiers in Human Neuroscience 4.
    https://doi.org/10.3389/fnhum.2010.00186 (PMID 21119777)
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
    Scikit-learn: Machine learning in Python
    1. F Pedregosa
    2. G Varoquaux
    3. A Gramfort
    4. V Michel
    5. B Thirion
    6. O Grisel
    7. M Blondel
    8. P Prettenhofer
    9. R Weiss
    10. V Dubourg
    11. J Vanderplas
    12. A Passos
    13. D Cournapeau
    14. M Brucher
    15. M Perrot
    16. E Duchesnay
    (2011)
    Journal of Machine Learning Research 12:2825–2830.
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
    Revealing hidden states in visual working memory using electroencephalography
    1. MJ Wolff
    2. J Ding
    3. NE Myers
    4. MG Stokes
    (2015)
    Frontiers in Systems Neuroscience 9.
    https://doi.org/10.3389/fnsys.2015.00123 (PMID 26388748)
  62. 62
  63. 63

Decision letter

  1. Tatiana Pasternak
    Reviewing Editor; University of Rochester, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "A theory of working memory without consciousness or sustained activity" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Sabine Kastner as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The manuscript was well received by both reviewers. They were impressed with several aspects of the work, including the timely topic, the novelty of experimental tests, state-of-the-art methods and excellent writing. However, they raised a number of issues that must be addressed before the manuscript will be considered for publication in eLife. These are summarized below and laid out in more detail in individual reviews.

Essential revisions:

1) Please clarify how a disassociated pattern of MEG responses can account for the issue related to errors in visibility reporting (i.e., trials misclassified as unseen).

2) Address the question raised by reviewer 1 whether cue detection quantified with d' correlates with working memory performance.

3) Please discuss if the "activity-silent" code represents a more general property or is specific to non-conscious working memory as well as discuss what types of WM processes are more likely dependent on consciousness.

4) Address the problem of the accuracy on trials reported as unseen (i.e., trials misclassified as unseen), raised by reviewer 2 ("Interpretation 1").

5) Address a possibility, raised by reviewer 2, that subjects retain the "blindsight" decision to the end of the trial ("Interpretation 2") and that memory for the event and for the decision may have different neural markers.

6) Clarify the reasons for referring to extended duration "blindsight" as working memory (reviewer 2, "Interpretation 3").

7) Please address the issue of statistical power raised by reviewer 2 ("Methodology 1"). Specifically, the power of analysis is limited by the small number of error trials.

8) For decoding, use a classifier trained on "seen" vs "unseen" trials to distinguish between "unseen correct" vs "unseen incorrect" (reviewer 2, "Methodology 2").

Reviewer #1:

The theoretical basis for this study is strong and the analytical methods are very sound. There have been recent reports of 'non-conscious' WM that have been controversial and subject to alternative accounts. The present work does an excellent job in devising novel experimental tests that provide the strongest behavioural effects of 'non-conscious' WM to date. Most crucially, the MEG analyses appear to rule out some alternative accounts that have been put forward against the phenomenon of 'non-conscious' WM based on 'erroneous visibility reports' and 'conscious guessing of a non-conscious target'. Furthermore, the MEG data show that 'non-conscious' WM is associated with neural markers dissociated from those found in conscious WM in terms of (reversed) increase in alpha power and alpha/beta desynchronization during the memory delay, respectively. Target position could be decoded early on during the trial from visual cortex in both conscious and non-conscious WM. However, target position could only be decoded in temporal cortex during the maintenance phase of the conscious WM trials, but not in the non-conscious trials. This absence of decodability of target location in the non-conscious trials during the maintenance phase, alongside computer simulation results that mimic the behavioral data, is taken to reflect an 'activity-silent' neural coding in non-conscious WM.

Comments

It is repeatedly stated that prior findings on non-conscious WM could have arisen due to errors in reporting (i.e. reporting unseen instead of partially seen), which could explain above chance performance. However it is not clear whether the same could not apply to the present study. I am unclear as to how a dissociated pattern of MEG responses may inform this. Could the authors clarify it further?

The authors may also justify why a subjective measure of awareness was used instead of an objective criterion of lack of awareness, e.g., cue detection d' = 0.

What is the cue detection d'? This could be computed using the false alarms on the 20% catch trials in which targets were seen. This does not appear to be reported. Is cue detection d' correlated with individual WM performance?

This absence of decodability of target location during the memory delay in the non-conscious trials is very intriguing and the computer simulations are very appropriate. They suggest that target information is sustained in an activity-silent neural code. A series of recent studies (Rosenthal et al., 2016; Chong et al., 2014) have also demonstrated recognition memory for unaware cues associated with null cue detection d'. In these studies, the learning and the test phase are much further apart, even in the order of minutes rather than seconds. In this scenario, persistent neural activity for memory targets is extremely unlikely. The authors could also discuss their findings in light of this work: is the proposed 'activity-silent' code specific to non-conscious WM or does it represent a more general property of memory including longer-term recognition memory.

If WM can be decoupled from conscious awareness, I understand this would cast doubt on the construct of WM itself; or is it just that only a subset of WM processes/tasks may be divorced from awareness? What types of WM processes do the authors think are more dependent on consciousness? I believe that some further discussion along these lines would increase the impact of the paper and its access to a general readership.

References

Rosenthal, C.R. et al., (2016). Learning and recognition of a non-conscious sequence of events in human primary visual cortex, Current Biology, 26, 834-841

Chong, T.T-J., Husain, M., and Rosenthal, C.R. (2014). Recognizing the unconscious. Current Biology, 24, R1033-1035

Reviewer #2:

Trübutschek et al. present the results of an MEG experiment and network model simulations to argue that WM does not depend on consciousness or persistent delay activity. Specifically, they show behaviourally that participants can respond above chance in a spatial working memory task, despite reporting no conscious experience of the memory item. In the brain, they show that non-conscious WM lacks two key neural signatures: sustained desynchronization of fronto-central beta and item-location decodability (but see middle panel of Figure 5B). Finally, they report simulation results from a model adapted from Mongillo et al. (2008) to show that even in the absence of periodic refreshing (which could give rise to apparent sustained decoding when averaged across trials), changes in synaptic strength can still persist sufficiently long to guide above-chance guessing at the time-scale examined here (if I understand correctly). Although I very much enjoyed this paper (interesting question, state-of-the-art methods and well-written), I was left with a number of concerns regarding both theoretical interpretation and methodological details.

1) Interpretation 1. As I understand the logic, previous evidence for 'unconscious' working memory could be an artefact of conscious report errors. If we can assume that people do sometimes make errors in their subjective report (seems likely, even if that just means pressing the wrong button on some trials), then there should exist a subset of trials in which the subject says unseen, but still has conscious working memory (and so therefore can localise the item just fine). Of course, there will also be trials in which the subject accidentally reports the wrong location, but so long as these two types of response error are not perfectly correlated, then there should always exist a subset of trials such that: seen (but erroneously responded unseen) + conscious WM (faithfully reported), just as there should also be cases in which Ps erroneously report seen, but actually have no useful location in mind (i.e., guesses in the 2-4 visibility conditions). You only need to assume that behavioral output is not a perfect reflection of the participant's internal state. The authors address the validity of unconscious WM first by replicating the behavioral phenomenon with some additional manipulations (distractor, load, delay duration). In all cases, they find evidence of 'blindsight'; however, although these manipulations are interesting, it is difficult to see how it addresses the key question – is this just error in visibility reporting? Rather, they turn to the MEG data (which seems sensible). As I see it, the key supporting evidence here is that a neural signature that differentiates seen vs unseen WM (fronto-central beta desynchronization) does not differentiate accuracy within unseen trials (i.e., not a handful of seen, but behaviourally mis-classified trials). 
I don't actually see how Figure 3 relates to this question, and the results from Figure 5 seem more like a difference of degree, not kind (especially Figure 5B, and especially because this is not subdivided by accuracy). So, as to this main (central) claim, I remain unconvinced.

2) Interpretation 2. The second challenge to unconscious WM is that participants might actually decide right away (i.e., standard blindsight), and maintain the 'decision' rather than the true event (normal WM of an unconscious percept). I find this a more complicated problem. If participants can decide accurately at the end of the trial, then surely they could also do so at the beginning. If conscious WM were indeed more robust, then it would seem a sensible strategy. Unless, for some reason, they require the response probe to make the blindsight decision? But this is not tested. Rather, the authors suggest that because there is no overlap in the neural markers for conscious (seen vs unseen) and unconscious WM (unseen correct vs unseen error), then they are not likely remembering the blindsight decision. But it is possible that memory for the event and memory for the decision have different neural markers. The lower panel of Figure 5B addresses this somewhat, but the data look pretty noisy, mainly showing that distractor location influences guessing (behavioral data on the link between distractor location and guessing are not shown). What would count here as evidence for early decision? The bump around 750 ms in the middle and right panels would seem suggestive to those inclined, and certainly, it does not look like a very convincing null effect. Especially considering the methodological details listed below.

3) Interpretation 3. Even if there is a prolonged version of blindsight, what is the rationale for calling it WM? Can it be manipulated, for example? Or is it just an extended version of priming?

4) Methodology 1. Statistical power. This study really seems underpowered, especially as some of the main claims are based on null effects. There are only thirteen subjects in the MEG experiment, and only 100 and 200 trials in the perception and WM tasks respectively (small n studies typically compensate with high within-participant power). Bayesian statistics might help convince the reader that the key null effects are meaningful. Also, it would be worth clarifying exactly how many trials are going into these analyses. For example, the behavioral mixture modelling typically requires ~80 trials per subcondition.

5) Methodology 2. Choice of analyses. Firstly, it is not clear what the decoding analysis adds (Figure 3). Something more analogous to the contrasts in Figure 4 would seem more helpful. For example, can a classifier trained on seen vs unseen also classify unseen correct vs unseen incorrect (i.e., accuracy is based on a subset of misclassified visibility ratings)? Regarding the location correlation, why not a) decode (angle, or x,y coordinates) from whole-brain activity, and b) separate correct/incorrect for unseen (assuming unseen correct might actually look like seen trials, statistical power permitting)?

6) Methodology 3. The computational model is interesting, but obviously cannot say much about consciousness. I am not qualified to assess the details of the model, but I am assuming it provides a mechanistic account for how some items in WM might be associated with an activity trace, and others not. This is interesting and important, but really only addresses the question of 'activity-silent' WM. Moreover, I would hesitate to call this 'evidence', but rather a proposed model (or even hypothesis).

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "A theory of working memory without consciousness or sustained activity" for further consideration at eLife. Your revised article has been favorably evaluated by Tania Pasternak (Reviewing Editor), Sabine Kastner (Senior editor), and two reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below by reviewer 2. Also, include a Table with trial numbers, suggested by reviewer 1.

Reviewer #1:

I was relatively happy with the original version of the manuscript and I believe that it has improved following the revisions. The authors have done a good job in addressing the points raised and should be commended for it. On a minor point, the paper would benefit from a Table containing the mean number of trials on each of the awareness ratings alongside the std.

Reviewer #2:

I appreciate that the authors have worked hard to address my initial concerns, but I am left with a few remaining issues.

1) Interpretation 1. The key claim, as I understand, is that unseen correct cannot be accounted for by a subset of seen correct slipping in (visibility rating error). The main data to support this qualitative difference is the spectral profile (greater beta synchrony for seen vs unseen, not unseen correct vs incorrect; greater alpha for unseen correct vs incorrect). However, in response to my original query, the authors also show significant cross-generalisation between the neural signature for seen vs unseen (perception) and unseen correct vs incorrect in the ERF (if I understand this correctly). They subsequently point out that there is no robust discrimination within unseen working memory (i.e., correct vs incorrect), but I do not see that failure to discriminate within the unseen category provides any useful counter evidence at all. Clearly, there is something in the signal that can in principle separate correct and incorrect within the unseen trials (allowing for cross-generalisation); and moreover, that something significantly overlaps with the same something that separates seen and unseen in the perception task. Why can't we call this something 'consciousness'? Doesn't this provide exactly the kind of positive evidence we would expect for the mis-classification hypothesis? Why should we weight the difference in beta synchrony over the similarity in ERF topographies? I appreciate that a full dissociation is probably an unrealistically high bar to set – in theory, seen vs unseen need not be neurally orthogonal to unseen correct vs incorrect (e.g., for the reasons given in the rebuttal), but I do not see any principled way to adjudicate the stated hypothesis without requiring a complete dissociation. These results also raise the important question of whether there is an SNR issue (related to my original point about statistical power, see below for further on that).
Maybe the decoding analysis is simply a more powerful test of whether there is some 'consciousness' in the unseen correct vs incorrect dimension. It is surprising that the same decoding approach was not used for the beta power. What should we conclude if a seen vs unseen decoder trained on beta power can also significantly discriminate unseen correct vs incorrect? This seems like a very important issue still hanging over the paper. At minimum, I would expect the authors to include the new decoding analyses in the main manuscript, and preferably performed on beta power as well as broadband ERF. If there is a shared discriminative axis between seen vs unseen, and unseen correct vs incorrect, the authors need to clarify exactly how this supports (or otherwise) their central claim. I would also expect the authors to tone down the claim that the current null effect strictly rules out the misclassification hypothesis. This very strong claim, even if supported by a robust null effect, assumes a strong reverse inference (that beta desynch = consciousness). While this may be true, it is not sufficiently self-evident for a water-tight reverse inference.

2) Interpretation 2. I appreciate the authors' effort to clarify this point, and inclusion of Bayes Factors. Although I am mostly satisfied with their argument, I am also concerned that there is little evidence for decoding conscious memory either, so perhaps it is just not possible to decode memory for persistent ERFs. Conscious or otherwise. That is why I asked about using a more powerful decoding approach (e.g., svm, which could also use the x,y structure of the stimuli quite easily, e.g., svr, which would also handle any non-uniformity around the circle). The authors could also train a classifier on the strongest case (seen correct, and test for generalisation to unseen correct). For a convincing null effect, the reader should feel that a positive effect has been given the best possible chance. Already, the statistical power is really not well suited for this kind of decoding, so at the very least, a state-of-the-art classification approach should be used. Finally, I find the follow-up argument for the circular correlation (channel-wise localisation) unconvincing. Surely, in the context of the current research question, it would be far more important to sacrifice spatial information for optimal classification. I strongly recommend using a proper multivariate classification approach to better test the time course of target (and/or response) location information during unseen correct trials (potentially trained on seen correct).

3) Interpretation 3. I generally agree with the authors' conceptualisation, and appreciate their effort to better clarify the terminology.

4) Methodology 1. I find the arguments in the rebuttal unsatisfying. Evidence of some positive effects does not provide evidence for sufficient power to support null effects (which are central to the key claim of this paper). Bayes Factors help, but are only provided for the decoding analysis (which does not directly speak to the question of consciousness or not in the same way as Figure 3). If the new and unpublished data can help, then they should be included in this paper (not just in a reply to reviewer). I am particularly concerned that one of the central null effects (lack of location decoding during the unseen correct trials) is based on thirteen subjects, and fifty-seven unseen trials (further breakdown by accuracy is not provided). This might be just about OK for strong TFR effects, but not obviously sufficient for subtle decoding effects. At a minimum, the authors need to be clearer about this limitation, and correspondingly more circumspect in the relevant interpretations/conclusions.

5) Methodology 3. Finally, I reiterate my statement that I am not able to critically evaluate the modelling.

https://doi.org/10.7554/eLife.23871.029

Author response

Essential revisions:

1) Please clarify how a disassociated pattern of MEG responses can account for the issue related to errors in visibility reporting (i.e., trials misclassified as unseen).

We thank the Reviewer for this question as it is a critical point in our argument. In what follows, we will lay out our rationale in more detail and describe the revisions made to the original manuscript.

Previous evidence for non-conscious working memory primarily consists of a behavioral finding: the existence of a long-lasting blindsight effect (i.e., above-chance objective performance with subjectively invisible stimuli after a delay of several seconds). While it is possible that this blindsight genuinely reflects the non-conscious maintenance of information, it is also conceivable that it is an artifact. Subjects may, for example, have miscategorized some of the seen trials as unseen. If that were the case, one should be able to identify such miscategorizations based on their neural signature: They should display similar (if not the same) characteristics of conscious processing as seen trials. If, on the other hand, no such signatures were found on the unseen correct trials, the latter do not just reflect a subset of the seen trials and erroneous visibility reports alone cannot account for the observed blindsight effect. To rule out the miscategorization hypothesis, one therefore needs to show that seen trials are characterized by a neural signature that is absent on the unseen correct trials.

Our time-frequency analysis accomplishes this goal: When compared to target-absent trials, seen trials were characterized by a pronounced decrease in alpha (8–12 Hz) and beta (13–30 Hz) power over fronto-central sensors that was sustained throughout the delay period in the working memory task. A very similar desynchronization was observed when contrasting seen with unseen trials, while no comparable differences emerged between the unseen and target-absent trials (Figure 4 and Figure 4—figure supplement 1). As such, this desynchronization in the alpha/beta band constitutes the signature of conscious processing we set out to identify: It occurs independently of general task context (i.e., perception vs. working memory) and exact inputs (i.e., target location), is specific to seen trials, and prolonged in the working memory as compared to the perception-only control condition.

Importantly, no comparable power decrease was evident on the unseen correct trials. In fact, unseen correct and unseen incorrect trials closely resembled each other and were indistinguishable from the target-absent control condition (Figure 4 and Figure 4—figure supplement 1). As explained above, the blindsight we observed therefore cannot have resulted from a miscategorization of visibility responses: The clear divergence between seen and unseen correct trials and the absence of any appreciable difference between unseen correct and unseen incorrect/target-absent trials together demonstrate that unseen correct trials were not simply mislabeled seen trials. Please note that we confirmed that the same pattern of results emerged either when a) randomly subsampling the seen trials to have the same number of trials as the average unseen correct condition, or b) restricting our definition of accuracy to within +/- 1 positions of the target (rather than +/- 2).
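
The two control analyses mentioned above can be illustrated with a minimal sketch. It is not the authors' analysis code: the number of positions on the circle (`N_POSITIONS = 20`), the function names, and the fixed random seed are all assumptions made for illustration only.

```python
import random

# Assumption: the number of possible target positions is not stated in this
# section; 20 is used purely for illustration.
N_POSITIONS = 20


def circular_distance(a, b, n_positions=N_POSITIONS):
    """Shortest distance (in positions) between two locations on a circle."""
    d = abs(a - b) % n_positions
    return min(d, n_positions - d)


def is_correct(response, target, tolerance=2, n_positions=N_POSITIONS):
    """A response counts as correct if it lies within +/- tolerance positions."""
    return circular_distance(response, target, n_positions) <= tolerance


def subsample(trials, n_keep, seed=0):
    """Randomly keep n_keep trials (without replacement) to equate trial counts."""
    trials = list(trials)
    if n_keep >= len(trials):
        return trials
    return random.Random(seed).sample(trials, n_keep)


# Hypothetical usage: equate the seen condition to 30 trials, and compare the
# lenient (+/- 2) and strict (+/- 1) accuracy criteria for one response.
seen_subset = subsample(range(100), 30)
lenient = is_correct(response=1, target=19, tolerance=2)
strict = is_correct(response=1, target=19, tolerance=1)
```

Tightening `tolerance` from 2 to 1 lowers the proportion of responses that count as correct by chance, which is why the stricter criterion serves as a robustness check.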

Although seen and unseen trials were clearly distinct at the brain level, the direct comparison of unseen correct and unseen incorrect trials revealed that these trials were not strictly identical. We found a significant increase in alpha power on the unseen correct compared to the unseen incorrect trials over occipital channels between ~1500 and 1900 ms. Such a difference is reasonable, given that unseen correct and unseen incorrect trials differed in their performance (i.e., the subject identified the correct target location only on the former). Although our experimental design does not allow us to directly test the link between alpha power and task performance, we speculate that, purely by chance, interference from the distractor was reduced on some of the unseen trials compared to others, thus effectively shielding the representation of the target stimulus from being overwritten. Subjects therefore were more likely to later on respond correctly on these than on the other unseen trials.

In order to clarify the distinction between our rationale for rejecting the miscategorization hypothesis (no signature of conscious processing on the unseen correct trials) and the dissociated pattern of MEG responses, we have revised the relevant sections in the Introduction, Results, and Discussion. In particular, we have now included a “roadmap” paragraph, detailing our general logic, at the end of the Introduction; have laid out our hypotheses more clearly in the Results section; have included a supplement to Figure 4 in order to better highlight the signature of conscious processing distinguishing seen from unseen/target-absent trials in all tasks; and have reworked the relevant paragraph in the Discussion section.

2) Address the question raised by reviewer 1 whether cue detection quantified with d' correlates with working memory performance.

We thank the Reviewer for this suggestion. While quantifications of perceptual sensitivity do not allow one to draw inferences about whether or not conscious awareness was reached (e.g., Soto et al., 2011), it is indeed interesting to examine whether an individual’s sensitivity to the target correlates with his or her objective task performance. After all, perhaps those subjects who were better able to discriminate signal from noise might also be the ones to show the most blindsight (which, then, could suggest that conscious perception and working memory might not be entirely decoupled).

We computed detection d’ (z(hits) – z(false alarms), where hits = proportion of seen target-present trials and false alarms = proportion of seen target-absent trials) in all experiments and correlated it with objective task performance separately for the seen and unseen trials. The results have been added to the respective paragraphs in the Results section and a supplement has been added to Figure 2 to depict those correlations. Briefly, in both the perception and working memory task in experiment 1, target detection d’ exceeded chance (perception: d’ = 1.5 +/- 0.9, t(16) = 7.1, p<0.001; working memory: d’ = 1.5 +/- 0.7, t(16) = 8.9, p<0.001). Such above-chance d’ was, of course, part of our design, since we aimed to obtain a mixture of seen and unseen stimuli for a fixed stimulus. Importantly, on unseen trials, participants’ sensitivity to the target did not correlate with any of our measures of task performance (accuracy: r = 0.320, p=0.210; rate of correct responding: r = 0.342, p=0.179; precision: r = 0.114, p=0.710). Experiment 2 replicated these results. Detection d’ was significantly greater than chance (1.7 +/- 0.8, t(20) = 10.2, p<0.001), yet uncorrelated with accuracy (r = 0.366, p=0.124), the rate of correct responding (r = 0.224, p=0.357), or precision (r = -0.410, p=0.115) on unseen trials. In both experiments 1 and 2, however, sensitivity to the target correlated positively with accuracy and the rate of correct responding on seen trials (all rs > 0.443, all ps<0.051). Objective task performance on the unseen trials was thus dissociated from perceptual awareness.
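
The detection measure described above, d’ = z(hits) – z(false alarms), and its correlation with task performance can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the log-linear correction that keeps rates away from 0 and 1 is an assumption not mentioned in the response, and the function names and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detection_dprime(n_seen_present, n_present, n_seen_absent, n_absent):
    """d' = z(hit rate) - z(false-alarm rate).

    Hits: target-present trials reported as seen.
    False alarms: target-absent trials reported as seen.
    """
    # Assumption: log-linear correction so that rates of exactly 0 or 1
    # do not produce infinite z-scores.
    hit_rate = (n_seen_present + 0.5) / (n_present + 1)
    fa_rate = (n_seen_absent + 0.5) / (n_absent + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical counts for three subjects:
# (seen target-present, target-present, seen target-absent, target-absent)
counts = [(120, 160, 4, 40), (100, 160, 6, 40), (140, 160, 2, 40)]
dprimes = [detection_dprime(*c) for c in counts]

# Hypothetical accuracy on unseen trials for the same three subjects.
unseen_accuracy = [0.22, 0.25, 0.20]
r = pearson_r(dprimes, unseen_accuracy)
```

Each subject contributes one d’ value and one performance value, so the correlation is computed across subjects, separately for seen and unseen trials.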

3) Please discuss if the "activity-silent" code represents a more general property or is specific to non-conscious working memory as well as discuss what types of WM processes are more likely dependent on consciousness.

We thank the Reviewer for having brought up this question as it allows us to clarify our stance. We indeed believe that the activity-silent code is not specific to non-conscious working memory and also contributes to conscious working memory. While initially encoded and maintained through slowly decaying neuronal firing, the representation of seen target locations is also not continuously sustained, but waxes and wanes throughout the retention period (Figure 5). As in our simulations (Figure 7), there thus exist activity-silent phases interspersed with periods of spontaneous reactivations (potentially, as discussed in more detail in the manuscript, reflecting conscious rehearsal). We have fully reworked all relevant parts of the Results and Discussion section in order to emphasize this common mechanism underlying conscious and non-conscious working memory.

The Reviewer also raises another intriguing possibility: Perhaps activity-silent representations reflect a more general property of memory that goes beyond working memory. We believe this to be a very relevant and reasonable comment indeed and have since included a discussion of this point.

As concerns the second part of the Reviewer’s comment, while we indeed hope to demonstrate that non-conscious working memory is genuine and that, as such, a decoupling of working memory and conscious perception is not just an artifact, we do not want to claim that there is no benefit of conscious awareness to working memory. Subjects, for example, clearly perform much better on the seen than on the unseen trials. In addition, in the present experiments, we specifically chose to focus on non-conscious maintenance of information, because this lies at the heart of most conceptualizations of working memory (see also Eriksson et al., 2015), but this does not mean that all working memory processes can necessarily occur non-consciously. It would indeed be of great value for future research to assess whether other aspects of working memory, such as manipulation of information, may also be dissociated from consciousness. We have therefore added a paragraph to the Discussion, addressing this issue and cautiously speculating that, in light of the proposed passive activity-silent mechanism for non-conscious working memory, it is unlikely that processes requiring neural activity (such as transformations of the stored representations, e.g., mental rotation) can proceed in the complete absence of awareness.

4) Address the problem of the accuracy on trials reported as unseen (i.e., trials misclassified as unseen), raised by reviewer 2 ("Interpretation 1").

We believe that this point is closely related to Essential revision 1 and therefore extend our arguments. The Reviewer is correct that one of our objectives consisted in testing the reality of non-conscious working memory by examining the previously reported long-lasting blindsight effect in light of two alternative hypotheses: the miscategorization hypothesis (i.e., subjects erroneously labeled seen trials as unseen) and the conscious maintenance hypothesis (i.e., subjects consciously maintained an early guess). This was, however, embedded in a much larger investigation with several key questions: First, we wanted to examine the robustness of the blindsight with regard to interference from distraction and a conscious working memory load, as this allowed us to further characterize its basic properties. These behavioral results are reported in Figure 1B and Figure 2. We then directly probed the relationship between conscious perception and conscious working memory, specifically focusing on the hypothesis that maintenance in conscious working memory would constitute a prolonged conscious episode. To this end, we employed multivariate pattern analyses and reported the results in Figure 3. The Reviewer is thus right: Figure 3 does not directly relate to the question of the genuineness of non-conscious working memory; Figures 4, 5, and 6 do. We regret that our presentation of our guiding objectives had not been sufficiently clear and have since substantially revised the paper to rectify this issue. Specifically, we have included a “roadmap” for our rationale at the end of the Introduction and have better introduced each main objective (and any specific hypotheses) in the corresponding paragraphs of the Results section. In the Discussion, we have also reworked the summary paragraph and conclusion in order to better tie in our findings with our guiding principles.

Furthermore, we agree with the Reviewer that the key evidence against the miscategorization (and as explained below, the conscious maintenance) hypothesis is that the signature of conscious processing (i.e., a desynchronization in the alpha/beta band) does not differentiate the unseen correct from the unseen incorrect trials (Figure 4). Indeed, as also laid out in response to reviewer 1 (Essential revision 1), if the long-lasting blindsight effect resulted from a subset of seen trials that had erroneously been reported as unseen, the very same signature of conscious processing, comparable in size to seen trials, should also be present on the unseen correct trials (especially so, as, due to our task design, chance is very low and the majority of the unseen correct trials reflect true task performance rather than random guessing). This is, however, not the case. While seen trials in both the perception-only control condition and the working memory task displayed a prominent decrease in alpha/beta power (as compared to target-absent and unseen trials), no such desynchronization was present on the unseen trials – not even when considering them separately by accuracy (Figure 4 and Figure 4—figure supplement 1). Bayesian statistics further supported this conclusion (Figure 4—figure supplement 3): In the beta band, evidence for the alternative hypothesis (i.e., a power decrease/increase relative to baseline) was consistently higher than for the null hypothesis (i.e., no power decrease/increase relative to baseline) for the seen trials. In contrast, for the unseen trials, the null hypothesis was generally favored over the alternative hypothesis. The observed decrease in alpha/beta power is thus specific to seen trials and, importantly, was first identified in the perception-only control condition without any maintenance requirement: It is a general signature of conscious processing.
In addition, although these data are not shown, we carefully ensured that the very same signature persisted, even when subsampling the seen trials to the average number of unseen correct trials. Besides a small increase in alpha power around the time of the distractor, no differences emerged between the unseen correct and the unseen incorrect trials. As such, the miscategorization hypothesis cannot account for the observed blindsight effect.

5) Address a possibility, raised by reviewer 2, that subjects retain the "blindsight" decision to the end of the trial ("Interpretation 2") and that memory for the event and for the decision may have different neural markers.

Two important points were raised here. The first one concerns one of the main challenges to non-conscious working memory: Perhaps, just as in standard priming experiments, participants guessed the target location right after its presentation and then consciously maintained this above-chance decision (conscious maintenance hypothesis). The present study tackles this problem for the very first time, by relying on two main lines of converging evidence. On one hand, if participants consciously maintained a guess, then there should be a signature of conscious processing on the unseen trials. As explained in more detail above, this was not the case: There was no trace of any decrease in alpha/beta power on the unseen trials even remotely comparable to the one on seen trials (Figure 4 and Figure 4—figure supplement 1; Figure 4—figure supplement 3). Importantly, this desynchronization in the alpha/beta band constitutes a task- and stimulus-independent, general signature of conscious processing (i.e., not just a marker of a specific event). It was first established in our perception-only control condition (without any working memory requirement) and is blind as to specific target and/or response locations. Even if the format of a specific representation changed depending on whether or not it was a memory of the target stimulus or a memory of the decision, the absence of any appreciable decrease in alpha/beta power on the unseen trials suggests that subjects did not just maintain a conscious guess.

Circular-linear correlation analyses further supported this conclusion (Figure 6): While response location could be tracked throughout the majority of the epoch on seen trials (albeit through a slowly decaying and fluctuating code), the representation of response position on the unseen trials did not come online until much later, towards the very end of the epoch. In line with previous research (Bode et al., 2011), this build-up of activity could potentially reflect the non-conscious generation of our subjects’ guess, although future research will be needed to confirm this speculation. At the very least, this absence of decodability suggests that, for the vast majority of our epoch, participants did not maintain a conscious guess. This interpretation was validated by Bayesian statistics (Supplementary file 3), favoring the null hypothesis (i.e., no decodable information) over the alternative hypothesis (i.e., decodable information) for the vast majority of our temporal ROIs on the unseen trials. Even more importantly, combining the current data with the data from an ongoing study on the non-conscious manipulation of information (see our response to Essential revision 7 for further details), we replicated these findings in a larger sample of 29 participants (see Author response image 3 in response to Essential revision 7), thus reconfirming the present conclusions with much higher confidence.
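For readers unfamiliar with the measure, the following is a minimal sketch of a circular-linear correlation between a circular variable (here, a location expressed as an angle) and a linear variable (here, the signal amplitude of one sensor at one time sample). It uses the standard textbook formula built from three Pearson correlations; the function names and the toy layout of 16 equally spaced locations are assumptions made for illustration only and are not taken from our analysis pipeline.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def circ_linear_corr(angles, signal):
    """Circular-linear correlation between angles (radians) and a linear variable."""
    cos_a = [math.cos(t) for t in angles]
    sin_a = [math.sin(t) for t in angles]
    r_xc = pearson(signal, cos_a)  # signal vs. cosine component
    r_xs = pearson(signal, sin_a)  # signal vs. sine component
    r_cs = pearson(cos_a, sin_a)   # dependence between the two components
    r_sq = (r_xc ** 2 + r_xs ** 2 - 2 * r_xc * r_xs * r_cs) / (1 - r_cs ** 2)
    return math.sqrt(max(r_sq, 0.0))  # guard against tiny negative rounding errors


# Hypothetical example: 16 equally spaced locations, one trial per location.
angles = [2 * math.pi * k / 16 for k in range(16)]
tuned = [3 * math.cos(t - 0.5) + 1 for t in angles]  # amplitude tuned to location
untuned = [math.cos(2 * t) for t in angles]          # no first-order tuning
r_tuned = circ_linear_corr(angles, tuned)
r_untuned = circ_linear_corr(angles, untuned)
```

In this toy case the tuned signal is an exact combination of the sine and cosine of the location, so the coefficient approaches its maximum, whereas the untuned signal yields a coefficient near zero.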

The second point raised by the reviewer concerns a potential dissociation of the neural markers for memories of events vs. memories of decisions. While, as explained above, we believe that we identified a general signature of conscious processing and, as such, already addressed this possibility in our manuscript, we nevertheless returned to our circular-linear correlations to further interrogate this issue. Specifically, we carefully examined the topographical patterns related to a subject’s response. Just as was the case for the correlations with target location, focal associations between the response location and the MEG signal were again primarily confined to posterior channels, with more frontal areas being recruited exclusively at the time of the response (Figure 6A). No additional regions were identified on the unseen trials and none of these areas showed any appreciable correlation before the presentation of the response screen (Figure 6—figure supplement 1). This suggests that, irrespective of stimulus visibility, common brain networks supported memories for the target stimulus and the ensuing decision and that, in the case of non-conscious working memory, these did not come online until shortly before the response. These findings were again highly reproducible across different experiments (current study and ongoing investigation; Author response image 1). We have now modified the respective parts of the Results section and separately present the topographies and time courses of the circular-linear correlations with the response location in Figure 6 and Figure 6—figure supplement 1.

Author response image 1

Topographies of the circular-linear correlations with response location are shown for seen (left) and unseen (right) trials across different experiments. The first row corresponds to the present study, the second to an ongoing investigation. Data from the two experiments were combined in the third row. The first three time bins are relative to stimulus onset, the last two relative to response screen onset. R = onset of response screen.

6) Clarify the reasons for referring to extended duration "blindsight" as working memory (reviewer 2, "Interpretation 3").

We thank the Reviewer for this comment, as it allows us to clarify our rationale (please see also our response to Essential revision 3). It is simply a truism that the short-term maintenance of information lies at the heart of most conceptualizations of working memory (see Eriksson et al., 2015 for a review) and the existing literature on non-conscious working memory (e.g., Soto et al., 2011). “Blindsight” is the generic term given to any situation of above-chance objective performance in the absence of a subjective feeling of consciousness. Given the current state of the evidence, “non-conscious working memory” seems to be a more accurate descriptor of the phenomenon that we study than the generic term of “blindsight.” First, the time scales investigated here and in previous reports of non-conscious working memory (e.g., Soto et al., 2011; Bergström et al., 2015) lie far beyond the ones observed in typical priming experiments, going up to delays of 15 s (Bergström et al., 2014). Second, and perhaps most importantly, the observed blindsight exhibits many of the same characteristics as would be expected from conscious working memory, including resistance to distraction and sensitivity to load manipulations. Third, we purposefully designed our experiments to include a large number of stimulus locations and assigned the corresponding response cues randomly on each trial. It is thus unlikely that participants’ location responses were primarily driven by automatic stimulus-response mappings; instead, they most likely required a minimal amount of manipulation.

Following the Reviewer’s comment, we added a paragraph mentioning that there are also additional criteria for working memory, such as the mental manipulation of information, that were not interrogated in the current paper (subsection “Long-lasting blindsight effect reflects genuine non-conscious working memory”). Whether such processes can occur non-consciously is an important empirical question and should be addressed by future research. In the added discussion, we cautiously speculate that, in light of the proposed activity-silent model for non-conscious working memory, it seems unlikely that processes requiring a transformation of a neural code could occur in the complete absence of awareness.

7) Please address the issue of statistical power raised by reviewer 2 ("Methodology 1"). Specifically, the power of analysis is limited by the small number of error trials.

We are confident that our study possesses the necessary statistical power to examine the question at stake and believe that it establishes an important methodological and theoretical framework for future research. First, both our sample size and number of trials (200 each in both the perception and working memory task) appear to be well into the acceptable range for MEG experiments of this kind (i.e., a working memory task with long trials). Our stimuli were tailored in a subject-specific manner to obtain a large proportion of seen and unseen trials. After rejection of artifacted trials, there were, on average, 73 (71) seen and 61 (57) unseen target-present trials in the perception (working memory) task. Second, and most importantly, throughout the entire manuscript we repeatedly demonstrate that we are able to detect generally well-established and subtle effects specifically on the unseen trials – often with magnitudes similar to the ones observed on seen trials:

1) Behaviorally, a conscious working memory load modulated the precision with which non-conscious (but not conscious) information could be stored (Figure 2).

2) When comparing the event-related fields (ERFs) of seen to unseen trials, we detected a well-established signature of conscious perception (i.e., an ignition of activity specific to the seen trials; Figure 3A).

3) As part of our time-frequency analyses, we show a significant increase in alpha power on the unseen correct compared to the unseen incorrect trials (Figure 4).

4) During our early time window (100–300 ms), we are consistently able to decode target location on the unseen (as well as the unseen correct) trials, equivalently to seen trials (as assessed in a paired Wilcoxon signed-rank test; Figure 5).

5) Similarly, we can track response location with equal magnitude on the seen and unseen trials at the time of the response and, in left temporo-occipital and frontal channels, with similar size on the unseen correct and unseen incorrect trials (Figure 6). Even more intriguingly, towards the end of the epoch (right before the response), response location ramps up on the unseen trials in all sensors (but most pronouncedly in the frontal ones), demonstrating that, even during the delay period, we can capture subtle events on the unseen trials.

We have now highlighted these instances throughout the manuscript.

In addition, we have made further efforts to demonstrate that our key effects are meaningful. We have now added Bayesian statistics, which generally supported our theoretical interpretations, showing that a) there is essentially no evidence for any change in beta power (relative to baseline) on the unseen trials until the very end of the epoch (Figure 4—figure supplement 3) and that b), similarly, on the unseen trials, the absence of any target- and response-related activity during the delay period is overall more likely than their presence (Supplementary file 3).
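To illustrate how a Bayes factor can quantify evidence for the absence of a power change, the sketch below uses the BIC approximation for a one-sample comparison against zero. This is a stand-in chosen for brevity and is not necessarily the procedure used for Figure 4—figure supplement 3 or Supplementary file 3; the input is assumed to be one value per subject (e.g., a hypothetical power change from baseline in dB), and the function name is illustrative.

```python
import math


def bf01_bic(values):
    """Approximate Bayes factor for H0 (mean = 0) over H1 (mean != 0).

    Uses the BIC approximation: BF01 ~ exp((BIC_H1 - BIC_H0) / 2), which for a
    one-sample design reduces to sqrt(n) * (SSE_H1 / SSE_H0) ** (n / 2).
    Values above 1 favor the null hypothesis, values below 1 the alternative.
    """
    n = len(values)
    mean = sum(values) / n
    sse_h1 = sum((v - mean) ** 2 for v in values)  # residuals around the sample mean
    sse_h0 = sum(v ** 2 for v in values)           # residuals around zero
    return math.sqrt(n) * (sse_h1 / sse_h0) ** (n / 2)


# Hypothetical per-subject power changes (dB) for 16 subjects.
no_change = [-1.0, 1.0] * 8                            # centered on zero
clear_change = [1.0 + 0.1 * (-1) ** k for k in range(16)]  # consistently above zero
bf_null = bf01_bic(no_change)
bf_effect = bf01_bic(clear_change)
```

With values centered on zero the approximation favors the null hypothesis, and with a consistent offset it strongly favors the alternative.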

We recently replicated these findings as part of a new study on the same topic, which used MEG to examine the possibility of non-conscious manipulation of information. The control condition of this new experiment is strictly equivalent to the current study: Subjects were asked to maintain a target square over a 3s-delay period and to report a location later on. We combined the data from this control with the current experiment (29 participants total) and reran the circular-linear correlation analyses. As shown in Author response images 2 and 3 below, this confirmed the absence of any delay-activity on the unseen trials.

Author response image 2

(A) Time courses of circular-linear correlations between MEG signals and target location combined across data from 29 participants for seen (red) and unseen (blue) trials. In the replication study, a visual cue was presented 1.75 s after the presentation of the target stimulus, indicating specific response modalities. (B) Time courses of circular-linear correlations between MEG signals and target location combined across data from 29 participants for unseen correct (dark blue) and unseen incorrect (light blue) trials.

https://doi.org/10.7554/eLife.23871.024
Author response image 3

(A) Time courses for circular-linear correlations between MEG signals and response location combined across data from 29 participants for seen (red) and unseen (blue) trials. In the replication study, a visual cue was presented 1.75 s after the presentation of the target stimulus, indicating specific response modalities. (B) Time courses for circular-linear correlations between MEG signals and response location combined across data from 29 participants for unseen correct (dark blue) and unseen incorrect (light blue) trials.

https://doi.org/10.7554/eLife.23871.025

8) For decoding, use a classifier trained on "seen" vs "unseen" trials to distinguish between "unseen correct" vs "unseen incorrect" (Reviewer 2, "Methodology 2").

We thank the Reviewer for this suggestion and describe the results of this analysis here. In order to ensure a maximum of samples while strictly separating the train and test sets, we trained a classifier to distinguish seen from unseen trials in the perception task, and then applied it either to the same category of trials in the working memory task (visibility decoder) or to the unseen correct and incorrect trials in the working memory task (accuracy decoder). The temporal generalization matrix for the seen/unseen classification presented as a thick diagonal, indicating that brain responses consisted in a succession of distinct patterns of activity that partially overlapped in time. Diagonal classification performance exceeded chance-level between ~148 and 1048 ms (FDR-corrected). When applied to the unseen correct and incorrect trials, the same classifiers generalized only weakly: Using a lenient, uncorrected threshold, diagonal decoding rose above chance between ~372 and 724 ms. Importantly, these findings are compatible with two different interpretations. On one hand, they might indicate that unseen correct trials represent seen trials that had been mislabeled. Alternatively, the model of non-conscious working memory we propose suggests that a stimulus that fails to cross the threshold for sustained activity and subjective visibility (“failed ignition”), is still expected to induce enough activity in high-level cortical circuits to trigger short-term synaptic changes. As such, unseen correct trials should share some of the processes that are found on seen trials and a decoder should therefore generalize to some extent from seen to unseen correct (as observed in Author response image 4).
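The train-on-one-task, test-on-the-other logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not our analysis code: the decoder is a plain difference-of-class-means projection standing in for whatever classifier is used in practice, the data are simulated, and all names (`temporal_generalization`, `X_perception`, `X_memory`) are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve computed from ranks (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def temporal_generalization(X_train, y_train, X_test, y_test):
    """Train at each time sample on one data set, test at every sample of another.

    X_* have shape (n_trials, n_sensors, n_times); y_* hold 0/1 labels.
    Returns an AUC matrix of shape (n_train_times, n_test_times).
    """
    n_train_times, n_test_times = X_train.shape[2], X_test.shape[2]
    scores = np.zeros((n_train_times, n_test_times))
    for t_train in range(n_train_times):
        # Stand-in linear decoder: difference between the two class means
        w = (X_train[y_train == 1, :, t_train].mean(axis=0)
             - X_train[y_train == 0, :, t_train].mean(axis=0))
        for t_test in range(n_test_times):
            scores[t_train, t_test] = auc(X_test[:, :, t_test] @ w, y_test)
    return scores


def simulate(rng, n_trials=100, n_sensors=10, n_times=6):
    """Toy data: class 1 carries extra signal on all sensors at time samples 2-3 only."""
    y = np.repeat([0, 1], n_trials // 2)
    X = rng.normal(size=(n_trials, n_sensors, n_times))
    X[y == 1, :, 2:4] += 1.0
    return X, y


rng = np.random.default_rng(0)
X_perception, y_perception = simulate(rng)  # training set (e.g., seen vs. unseen)
X_memory, y_memory = simulate(rng)          # independent test set
tg = temporal_generalization(X_perception, y_perception, X_memory, y_memory)
```

Because training and test trials come from separate data sets, no cross-validation is needed in this sketch; the diagonal of `tg` corresponds to classifiers trained and tested on the same time sample.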

Author response image 4

(A) Temporal generalization matrices for classifiers trained to distinguish seen from unseen trials in the perception task and applied to the working memory task. Horizontal, dotted line denotes mean reaction time for the visibility response in the perception task. Inset represents the diagonal of the matrix (black), i.e. classifiers were trained and tested on the same time sample. Thick line indicates above-chance decoding performance (one-tailed Wilcoxon signed-rank test, uncorrected). Shaded area denotes standard error of the mean (SEM) across subjects. For display purposes only, data were smoothed with a moving average of eight samples. (B) Same as in (A), except that classifiers trained to distinguish seen from unseen in the perception task were applied to unseen correct and unseen incorrect trials in the working memory task.

https://doi.org/10.7554/eLife.23871.026

In order to separate these two hypotheses, we trained and tested a classifier separately on seen/unseen correct and seen/unseen incorrect trials in the working memory task. If the majority of our unseen correct trials were in fact seen trials mislabeled as unseen, then brain responses should be highly similar in the two conditions and classifiers should perform at chance. As can be seen in Author response image 5B, this was not the case. Instead, classifiers trained to separate seen and unseen correct trials performed significantly above chance from ~332 to 992 ms, thus revealing a clear divergence in the brain responses elicited by each condition. In fact, the diagonal-shape pattern revealed by temporal generalization was similar to the ones observed when classifiers were trained to separate seen from either all unseen trials (Author response image 5A) or from unseen incorrect trials (Author response image 5C). Classifiers trained to distinguish unseen correct from unseen incorrect trials did not show any clear difference between these conditions (Author response image 5D). These results demonstrate that brain responses in unseen correct trials were clearly different from seen trials but not from unseen incorrect trials. As such, these findings converge with the results obtained from the time-frequency analysis and reject the hypothesis of a miscategorization of unseen correct trials. Unseen correct trials rather reflect genuine blindsight, in which subjects correctly reported the location of the target stimulus while being unaware of it.

Author response image 5

Temporal generalization matrices for classifiers trained to distinguish seen from unseen trials (A), seen from unseen correct (B), seen from unseen incorrect (C), and unseen correct from unseen incorrect (D) in the working memory task. Insets represent the diagonal of the matrix (black), i.e. classifiers were trained and tested on the same time sample, and P3b (red) slices, i.e., classifiers trained between 0.3 and 0.6 s were averaged. Thick line indicates above-chance decoding performance (one-tailed Wilcoxon signed-rank test, uncorrected). Shaded area denotes standard error of the mean (SEM) across subjects. For display purposes only, data were smoothed with a moving average of eight samples. Note that a class weight was applied to counter any class imbalances.

https://doi.org/10.7554/eLife.23871.027

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Reviewer #1:

I was relatively happy with the original version of the manuscript and I believe that it has improved following the revisions. The authors have done a good job in addressing the points raised and should be commended for it. On a minor point, the paper would benefit from a Table containing the mean number of trials on each of the awareness ratings alongside the std.

We thank the Reviewer for his/her positive evaluation of our work and are happy to hear that our revisions were satisfactory. The requested table has been added to our manuscript (Supplementary file 4).

Reviewer #2:

I appreciate that the authors have worked hard to address my initial concerns, but I am left with a few remaining issues.

1) Interpretation 1. The key claim, as I understand, is that unseen correct cannot be accounted for by a subset of seen correct slipping in (visibility rating error). The main data to support this qualitative difference is the spectral profile (greater beta synchrony for seen vs unseen, not unseen correct vs incorrect; greater alpha for unseen correct vs incorrect). However, in response to my original query, the authors also show significant cross-generalisation between the neural signature for seen vs unseen (perception) and unseen correct vs incorrect in the ERF (if I understand this correctly). They subsequently point out that there is no robust discrimination within unseen working memory (i.e., correct vs incorrect), but I do not see that failure to discriminate within the unseen category provides any useful counter evidence at all. Clearly, there is something in the signal that can in principle separate correct and incorrect within the unseen trial (allowing for cross-generalisation); and moreover, that something significantly overlaps with the same something that separates seen and unseen in the perception. Why can't we call this something 'consciousness'? Doesn't this provide exactly the kind of positive evidence we would expect for the mis-classification hypothesis? Why should we weight the difference in beta synchrony over the similarity in ERF topographies? I appreciate that a full dissociation is probably an unrealistically high bar to set – in theory, seen vs unseen need not be neurally orthogonal to unseen correct vs incorrect (e.g., for the reasons given in the rebuttal), but I do not see any principled way to adjudicate the stated hypothesis without requiring a complete dissociation. These results also raise the important question of whether there is an SNR issue (related to my original point about statistical power, see below for further on that).
Maybe the decoding analysis is simply a more powerful test of whether there is some 'consciousness' in the unseen correct vs incorrect dimension. It is surprising that the same decoding approach was not used for the beta power. What should we conclude if a seen vs unseen decoding trained on beta power can also significantly discriminate unseen correct vs incorrect? This seems like a very important issue still hanging over the paper. At minimum, I would expect the authors to include the new decoding analyses in the main manuscript, and preferably performed on beta power as well as broadband ERF. If there is a shared discriminative axis between seen vs unseen, and unseen correct vs incorrect, the authors need to clarify exactly how this supports (or otherwise) their central claim. I would also expect the authors to tone-down the claim that the current null effect strictly rules out the misclassification hypothesis. This very strong claim, even if supported by a robust null effect, assumes a strong reverse inference (that beta desynch = consciousness). While this may be true, it is not sufficiently self-evident for a water-tight reverse inference.

We thank the Reviewer for this comment and have since implemented all of the suggestions. In what follows, we first clarify our reasoning, describe the main findings of the new decoding analyses, and then highlight the changes we implemented in the manuscript.

Adjudicating between the miscategorization hypothesis and genuine non-conscious working memory involves determining whether the unseen correct trials are more similar to the seen trials or to the unseen incorrect trials. In response to the Reviewer’s previous comments, we presented decoding analyses aimed at assessing the overlap in neural patterns between the seen and unseen correct trials (see our response to Essential revision 8). We trained a classifier to separate seen from unseen trials in the perception task and then applied it either to the seen and unseen trials in the working memory task (decoding visibility) or to the unseen correct and unseen incorrect trials in the working memory task (decoding accuracy). Decoding visibility presented as a thick diagonal, with above-chance discrimination between ~148 and 1048 ms (Figure 4—figure supplement 2A, left): Classification performance first rose above chance at ~148 ms (AUC = 0.54 ± 0.01, FDR-corrected p = 0.023), peaked at ~640 ms (AUC = 0.58 ± 0.02, FDR-corrected p = 0.001), and then decayed rapidly by ~1 s. In contrast, the generalization to the unseen correct/incorrect trials was barely significant, did not reveal any clear pattern, and the time course of diagonal decoding was highly dissimilar from the one observed above (Figure 4—figure supplement 2A, right): It first sharply peaked at ~180 ms (AUC = 0.55 ± 0.01, uncorrected p = 0.037), dropped to chance level, and then exceeded chance between ~372 and 724 ms with a peak at 444 ms (AUC = 0.57 ± 0.02, uncorrected p = 0.007). Much unlike any of the previous decoders involving the perception task, long after the visibility response, it rose a third time between ~1.44 and 1.74 s, peaking with similar magnitude as before at ~1.58 s (AUC = 0.57 ± 0.02, uncorrected p = 0.010; P3b and last time window: all ps < 0.023).
Although the level of noise evident in the accuracy decoder thus precludes any definitive conclusion, the visibility and accuracy decoders had little in common, rendering it unlikely for the unseen correct trials to have simply been mislabeled.
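The difference between the FDR-corrected and uncorrected thresholds reported above can be made concrete with a short sketch of the Benjamini-Hochberg step-up procedure. We assume this common variant purely for illustration; the function name and the example p-values are hypothetical and do not correspond to values from our data.

```python
def fdr_bh(p_values, alpha=0.05):
    """Benjamini-Hochberg: return a list of booleans marking rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= alpha * k / m
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    # Reject every hypothesis whose p-value ranks at or below the cutoff
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per time sample.
p_per_sample = [0.041, 0.001, 0.216, 0.008]
significant = fdr_bh(p_per_sample)
```

In this toy example, the sample with p = 0.041 would pass an uncorrected threshold of 0.05 but does not survive the correction, which is why uncorrected effects are described as lenient.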

Importantly, even if this weak effect were genuine, on its own, this analysis is not well suited to distinguish between the two competing hypotheses, as it is only capable of assessing whether the two categories of trials might share common processes (without specifying the exact nature). In fact, a weak and partial generalization from seen/unseen to unseen correct/incorrect is not only compatible with, but actually expected by our proposed model of activity-silent non-conscious working memory, in which the unseen correct trials, while failing to cross the threshold for subjective visibility, behave more similarly to the seen than the unseen incorrect trials in that they still induce enough activity in higher-level cortical circuits to modify synaptic weights. Similarly, it is conceivable that a partial generalization results simply from the fact that both the seen as well as the unseen correct trials share a common potential for correct responding.

The critical test thus consists in directly comparing the seen to the unseen correct trials, and the unseen correct to the unseen incorrect trials (see also our previous response to Essential revision 8). According to the miscategorization hypothesis, seen and unseen correct trials should, in fact, belong to the same category, whereas unseen correct and incorrect trials should not. A classifier trained to discriminate between seen and unseen correct trials should thus perform at chance and a classifier trained to distinguish the unseen correct from their incorrect counterpart should resemble the standard seen/unseen as well as a seen/unseen incorrect decoder. We observed the exact opposite pattern: The unseen correct/incorrect classification was at chance, while decoding of seen/unseen correct trials revealed a pattern of brain responses highly similar to the one observed for seen/unseen incorrect. Because there were about twice as many seen trials as unseen correct/incorrect trials, we repeated the same analysis, this time randomly subsampling the seen trials (and, in the case of the seen/unseen classification, the unseen trials as well) to the average number of unseen correct trials. The results remained virtually unchanged: All generalization matrices involving seen trials presented as thick diagonals, with above-chance diagonal-decoding at least between ~320 and 800 ms (in the case of the seen/unseen correct decoder). There were no apparent differences between decoding with the full and subsampled datasets. The unseen correct and incorrect trials, however, could not be discriminated at all. As such, these results do not support the miscategorization hypothesis. They are, however, compatible with the hypothesis of genuine non-conscious working memory.
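The subsampling control described above amounts to drawing, without replacement, as many trials from the larger category as there are in the smaller one. A minimal sketch follows; the function name is illustrative, the seed is fixed only for reproducibility, and the count of 35 unseen correct trials is a hypothetical placeholder rather than a reported value.

```python
import random


def subsample_to(trials, target_count, seed=0):
    """Randomly draw `target_count` trials without replacement (original order kept)."""
    if target_count >= len(trials):
        return list(trials)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(trials)), target_count))
    return [trials[i] for i in keep]


# Hypothetical trial indices: 71 seen trials matched to 35 unseen correct trials.
seen_trials = list(range(71))
n_unseen_correct = 35
seen_matched = subsample_to(seen_trials, n_unseen_correct)
```

In practice such a draw would typically be repeated several times and the decoding scores averaged, so that the result does not hinge on one particular random subset.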

In a second step, we also implemented the Reviewer’s suggestion and extended the decoding approach to the frequency domain. We first followed the same logic as for the univariate analyses, training classifiers on the average power (in dB, relative to baseline) in the alpha (8–12 Hz) or beta (13–30 Hz) band to distinguish seen from unseen trials separately in the perception and working memory task and applying them to the same visibility categories either in the same or the other task (Author response image 6). This analysis largely confirmed our univariate time-frequency results: Temporal generalization matrices for alpha and beta power were characterized by thick diagonals in the perception task, with diagonal decoding exceeding chance between ~440 and 1220 ms in the alpha band (Author response image 6A; P3b [300–600 ms] time window and first part of the delay [0.6–1.5 s]: ps < 0.027) and between ~100 and 180 ms, ~600 and 800 ms, as well as between ~1.26 and 1.4 s in the beta band (Author response image 6B; early [100–300 ms] interval and first part of the delay period: ps < 0.027). A similar, yet more sustained, pattern was also observed in the working memory task, although temporal generalization matrices for both bands were slightly more square-shaped than in the perception task and diagonal decoding persisted until the end of the epoch (alpha: last two time windows, ps < 0.019; beta: 0.6–1.5 s, p = 0.004; 1.5–2.2 s, p = 0.066). Importantly, generalization from one task to the other resulted in a similar pattern of decodability in the alpha band. When trained in the perception task and tested in the working memory task, diagonal decoding emerged around 180 ms and lasted until 1.18 s (first three time windows: ps < 0.016).
Probably owing to the slightly different time courses of beta desynchronization in the perception and working memory task (Figure 4—figure supplement 1), decoding in the beta band was less strong and, if anything, tended a bit more towards the off-diagonal, such that diagonal decoding itself occurred mainly between ~320 and 540 ms as well as between 1.24 and 1.78 s, yet reached statistical significance in all four time bins (all ps ≤ 0.05). The reverse generalization, from the working memory to the perception task, revealed a similar set of findings, with diagonal decoding in the alpha band occurring between ~100 ms and 1.44 s (first three time windows: ps < 0.037) and in the beta band between ~100 and 540 ms and again between ~940 ms and 1.8 s (all time windows: ps < 0.013). Taken together, these multivariate analyses are in line with our previous univariate results, suggesting that power in the alpha/beta band may serve as a signature of conscious processing.
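As an illustration of the feature used for these frequency-domain decoders, the sketch below converts power at one time sample to dB relative to a baseline and averages it within a frequency band. The order of operations (dB per frequency, then averaging across the band) and all names are assumptions made for this example; only the band limits (8–12 Hz and 13–30 Hz) come from the text above.

```python
import math


def band_power_db(power, freqs, band, baseline):
    """Average power within a band, in dB relative to a per-frequency baseline.

    power:    per-frequency power values at one time sample
    freqs:    matching frequencies in Hz
    band:     (low, high) band edges in Hz, inclusive
    baseline: per-frequency mean power during the baseline period
    """
    low, high = band
    db = [10 * math.log10(p / b)
          for p, f, b in zip(power, freqs, baseline) if low <= f <= high]
    return sum(db) / len(db)


# Hypothetical spectrum: alpha power halves relative to baseline, beta is unchanged.
freqs = [8, 10, 12, 13, 20, 30]
baseline = [4.0] * 6
power = [2.0, 2.0, 2.0, 4.0, 4.0, 4.0]
alpha_db = band_power_db(power, freqs, (8, 12), baseline)
beta_db = band_power_db(power, freqs, (13, 30), baseline)
```

A halving of power corresponds to a change of about -3 dB, so a desynchronization appears as a negative value and unchanged power as zero.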

Last, in analogy to the ERF decoding analysis carried out in response to the Reviewer’s previous comments (Figure 4—figure supplement 2A), we also applied the aforementioned visibility decoders trained in the perception task to the unseen correct and incorrect trials in the working memory task separately for the alpha (Figure 4—figure supplement 2B) and beta (Figure 4—figure supplement 2C) band. There was no clear evidence for any generalization in either band. In light of the somewhat weaker visibility generalization in the beta band, the absence of decodability in this band might be difficult to interpret unequivocally with the current dataset. However, the fact that, despite a good initial visibility classifier, no generalization to the unseen correct trials occurred in the alpha band either argues against a simple miscategorization of the unseen correct trials.

Taken together, we presented a series of analyses aimed at evaluating the miscategorization hypothesis and, overall, found a clear distinction in the brain responses of seen and unseen (correct) trials. We were unable to classify unseen correct and unseen incorrect trials in the ERF; the univariate time-frequency analyses show a clear dissociation between the seen and unseen correct trials; there was no generalization between the seen and unseen correct trials in either the alpha or the beta band. The bulk of the evidence thus repeatedly converges towards a rejection of the miscategorization hypothesis.

We nevertheless agree with the Reviewer that, due to the nature of the research question at hand, it is difficult – if not impossible – to unequivocally rule out the miscategorization hypothesis in a single study. We believe that our paper represents an important first step in this direction and offers an interesting theoretical basis, but, ultimately, future research with a higher signal-to-noise ratio is needed to replicate these findings. In the revised version of this manuscript, we have therefore toned down our claim whenever appropriate and have added a final paragraph to the Discussion, pointing to the limitations of our study as well as the ensuing perspectives for future research. We also added the results of the generalization from visibility to accuracy decoding to the main manuscript (Figure 4—figure supplement 2).

Author response image 6

(A) Temporal generalization matrices for decoding of visibility category with relative, average alpha (8–12 Hz) power as a function of training and testing task. In each panel, a classifier was trained at every time sample (y-axis) and tested on all other time points (x-axis). The diagonal gray line marks classifiers trained and tested on the same time sample. Please note the event markers in any panel involving the perception task: Mean reaction time (target-present trials) for the visibility response is indicated by vertical and/or horizontal dotted lines. Any classifier beyond this point only reflects post-visibility processes. Time courses of diagonal decoding and of classifiers averaged over the P3b time window (300–600 ms) and over the working memory maintenance period (0.8–2.5 s) are shown as black, red, and blue insets. Thick lines indicate significant, above-chance decoding of visibility (Wilcoxon signed-rank test across subjects, uncorrected, two-tailed except for diagonal). For display purposes, data were smoothed using a moving average with a window of one sample. AUC = area under the curve. (B) Same as in (A) but for average beta (13–30 Hz) power.

https://doi.org/10.7554/eLife.23871.028

2) Interpretation 2. I appreciate the authors' effort to clarify this point, and inclusion of Bayes Factors. Although I am mostly satisfied with their argument, I am also concerned that there is little evidence for decoding conscious memory either, so perhaps it is just not possible to decode memory for persistent ERFs. Conscious or otherwise. That is why I asked about using a more powerful decoding approach (e.g., svm, which could also use the x,y structure of the stimuli quite easily, e.g., svr, which would also handle any non-uniformity around the circle). The authors could also train a classifier on the strongest case (seen correct, and test for generalisation to unseen correct). For a convincing null effect, the reader should feel that a positive effect has been given the best possible chance. Already, the statistical power is really not well suited for this kind of decoding, so at the very least, a state-of-the-art classification approach should be used. Finally, I find the follow-up argument for the circular correlation (channel-wise localisation) unconvincing. Surely, in the context of the current research question, it would be far more important to sacrifice spatial information for optimal classification. I strongly recommend using a proper multivariate classification approach to better test the time course of target (and/or response) location information during unseen correct trials (potentially trained on seen correct).

We are glad to see that the Reviewer is mostly satisfied with our response and hope that we will be able to successfully address the remaining points. Indeed, while statistically significant at least throughout the first part of the delay period, decodability of seen targets was also not continuously sustained, but waxed and waned throughout the delay. While we interpret this periodically disappearing target-related activity not as an artifact of our analyses, but as a genuine neural mechanism supporting maintenance in conscious working memory, we fully agree with the Reviewer that a positive effect needs to have been given the best possible chance of being detected. We therefore implemented the Reviewer’s suggestion in order to test whether target and response location could be decoded using a multivariate regression approach. These results are depicted in Figure 5—figure supplement 2, Figure 5—figure supplement 3, and Figure 6—figure supplement 2 of the manuscript.

Following King et al. (2016), for each participant and each experimental condition, we trained two separate support vector regression models to predict the sine and cosine of the target/response location in question and then combined the resulting predictions to estimate the target angle. As can be seen in Figure 5—figure supplement 2 and Figure 6—figure supplement 2, the decoding time courses, although noisier, were very similar to the ones obtained with the circular-linear correlations and led us to the same conclusions: Seen targets/response locations were maintained at least throughout the first part of the delay period (P3b time window and first part of the delay: ps<0.05) via a slowly decaying and periodically resurfacing code, whereas no such pattern was evident on the unseen trials. When we applied our best classifier (trained on the seen correct trials) separately to the unseen correct and incorrect trials, no generalization was observed for either target (Figure 5—figure supplement 3A) or response location (Figure 5—figure supplement 3B). The decoding approach thus confirmed the results from our circular-linear correlation analyses, albeit with more noise (cf. the stable standard error of the circular-linear correlations across conditions vs. the increasing standard error of the multivariate approach across conditions).
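The step of combining the two regression outputs into one angle can be illustrated with a minimal sketch. It covers only the combination and the resulting angular error, not the support vector regression itself; the function names and the example values are hypothetical and are not taken from the analysis code used in the study.

```python
import math

def combine_to_angle(pred_sin, pred_cos):
    """Combine separately predicted sine and cosine into one angle in [0, 2*pi)."""
    # atan2 returns a value in (-pi, pi]; the modulo maps it onto [0, 2*pi).
    return math.atan2(pred_sin, pred_cos) % (2 * math.pi)

def angular_error(predicted, true):
    """Signed smallest difference between two angles, in [-pi, pi)."""
    return (predicted - true + math.pi) % (2 * math.pi) - math.pi

# Hypothetical example: a target at a quarter turn (pi / 2) and noisy
# regression outputs for its sine and cosine.
true_angle = math.pi / 2
estimate = combine_to_angle(pred_sin=0.8, pred_cos=0.1)
print(estimate, angular_error(estimate, true_angle))
```

Because the angle is recovered from the ratio of the two predictions, the estimate does not depend on their overall scale, only on their relative size and sign.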

Although, at first glance, the more stable performance of the circular-linear correlations might be surprising, it is understandable given the experimental paradigm: There are twenty target locations and, on average, only ~60 seen/unseen trials, so using a proper cross-validation scheme with a strict separation of the train and test sets makes it virtually impossible to build an adequate classifier. For example, even when combining across seen and unseen trials in the working memory task, a minimal two-fold cross-validation procedure would only include ~3 instances of each target/response location in each of the folds. If the initial classifier is poor because of limited data to train with, so will be the resulting predictions. This is obviously a limitation of the present investigation (see also below) that should be addressed by future research. Circular-linear correlations, on the other hand, combine information from all available trials and are thus more powerful in the current context. In addition, they also result in easily interpretable topographies, which, in the context of a spatial paradigm, is important information to consider. As such, circular-linear correlations appear to be the most appropriate measure in the context of the current investigation.
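The trial-count argument above can be checked with a short calculation. This is a sketch that assumes trials are spread evenly over the twenty locations and that seen and unseen trials together amount to roughly 120; the function name is hypothetical.

```python
def instances_per_fold(n_trials, n_locations, n_folds):
    """Average number of trials per location in each cross-validation fold,
    assuming trials are distributed evenly across locations and folds."""
    return n_trials / n_locations / n_folds

n_locations = 20             # target locations, as stated in the text
trials_per_visibility = 60   # approximate seen (or unseen) trial count

# Seen and unseen trials combined, two-fold cross-validation.
print(instances_per_fold(2 * trials_per_visibility, n_locations, 2))  # 3.0

# A single visibility category on its own, two-fold cross-validation.
print(instances_per_fold(trials_per_visibility, n_locations, 2))      # 1.5
```

With only one or two examples of each location per fold for a single visibility category, a held-out classifier has very little to learn from, which is the limitation described above.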

3) Interpretation 3. I generally agree with the authors' conceptualisation, and appreciate their effort to better clarify the terminology.

We are happy to hear that our efforts were successful.

4) Methodology 1. I find the arguments in the rebuttal unsatisfying. Evidence of some positive effects does not provide evidence for sufficient power to support null effects (which are central to the key claim of this paper). Bayes Factors help, but are only provided for the decoding analysis (which does not directly speak to the question of consciousness or not in the same way as Figure 3). If the new and unpublished data can help, then they should be included in this paper (not just as a reply to reviewer). I am particularly concerned that one of the central null effects (lack of location decoding during the unseen correct trials) is based on thirteen subjects, and fifty seven unseen trials (further break down of accuracy is not provided). This might be just about OK for strong TFR effects, but not obviously sufficient for subtle decoding effects. At a minimum, the authors need to be clearer about this limitation, and correspondingly more circumspect in the relevant interpretations/conclusions.

Bayes Factors had in fact been reported for all key analyses, including the time-frequency analysis (Figure 4—figure supplement 2) and the circular-linear correlations (Results section and Supplementary file 3). In addition, we had included data from our ongoing investigation in the – publicly available – response letter in order to draw attention to the fact that, even when doubling our sample size, the reported null effects still persist. While some conditions in this new experiment are sufficiently similar to the current manuscript to allow for averaging across all trials, the underlying experimental questions are different and will therefore be reported in the future. Nevertheless, the results of the condition shared with the Reviewer serve as a first step towards demonstrating the replicability of these findings.

While the lack of location decoding on the unseen correct trials is certainly of empirical and theoretical interest (e.g., the proposed model would actually predict that decodability of unseen correct target locations falls somewhere in between that of the seen and the unseen incorrect trials), we agree with the Reviewer that this particular hypothesis is difficult to assess in the context of the current study and should be further investigated in future studies. We have therefore drawn attention to this fact throughout the manuscript and discuss it in more detail at the end of the Discussion section. However, none of our central claims (i.e., rejection of the alternative hypotheses) rests on the absence of decodability on the unseen correct trials. For example, the rejection of an active conscious maintenance hypothesis requires the absence of response-related activity on all unseen trials (after all, if subjects actively maintained a conscious guess, they would have done so whenever they did not see the target) and, even on the seen trials, there was no evidence for a continuous maintenance of target/response location.

Despite all of our efforts (i.e., inclusion of Bayes Factors, converging evidence from univariate and multivariate analyses, ongoing replication of the main null effects), we appreciate that the current study is not without its limitations. The nature of the research question requires long trials and an approximately equal proportion of seen and unseen targets, thereby limiting the number of trials per condition one can obtain within a reasonable time frame. In addition, it necessitates the (statistically impossible) demonstration of null effects. It will therefore be important for future investigations to replicate the current findings, preferably with larger datasets (as, for example, already done in the case of conscious, activity-silent working memory in Wolff et al., 2015, 2017) and/or techniques with better signal-to-noise ratio, such as intracranial recordings. As suggested by the Reviewer, we have now included a discussion of this limitation at the very end of the Discussion section and ensured a more careful wording throughout the entire manuscript. What we hope to achieve with the current work is to provide initial results and a theoretical framework that will stimulate future research.

https://doi.org/10.7554/eLife.23871.030

Article and author information

Author details

  1. Darinka Trübutschek

    1. Ecole des Neurosciences de Paris Ile-de-France, 15 rue de l'Ecole de médecine, Paris, France
    2. Université Pierre et Marie Curie, 4 Place Jussieu, Paris, France
    3. Cognitive Neuroimaging Unit, CEA DSV/I2BM, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, Gif/Yvette, France
    Contribution
    DT, Conceptualization, Data curation, Formal analysis, Funding acquisition, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    darinkat87@gmail.com
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-7977-1366
  2. Sébastien Marti

    Cognitive Neuroimaging Unit, CEA DSV/I2BM, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, Gif/Yvette, France
    Contribution
    SM, Conceptualization, Formal analysis, Supervision, Visualization, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  3. Andrés Ojeda

    Department of Zoology, University of Oxford, Oxford, United Kingdom
    Contribution
    AO, Conceptualization, Formal analysis, Methodology
    Competing interests
    The authors declare that no competing interests exist.
  4. Jean-Rémi King

    1. Department of Psychology, New York University, New York, United States
    2. Frankfurt Institute for Advanced Studies, Frankfurt, Germany
    Contribution
    J-RK, Conceptualization, Methodology, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  5. Yuanyuan Mi

    Brain Science Center, Institute of Basic Medical Sciences, Beijing, China
    Contribution
    YM, Methodology, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  6. Misha Tsodyks

    1. Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel
    2. Department of Neuroscience, Columbia University, New York, United States
    Contribution
    MT, Methodology, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  7. Stanislas Dehaene

    1. Cognitive Neuroimaging Unit, CEA DSV/I2BM, INSERM, Université Paris-Sud, Université Paris-Saclay, NeuroSpin center, Gif/Yvette, France
    2. Collège de France, 11 Place Marcelin Berthelot, Paris, France
    Contribution
    SD, Conceptualization, Resources, Supervision, Funding acquisition, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.

Funding

Ecole des Neurosciences de Paris (PhD Fellowship)

  • Darinka Trübutschek

Fondation Schneider Electric (PhD Fellowship)

  • Darinka Trübutschek

CEA

  • Stanislas Dehaene

Institut National de la Santé et de la Recherche Médicale

  • Stanislas Dehaene

Collège de France

  • Stanislas Dehaene

European Research Council (Senior Grant, NeuroConsc)

  • Stanislas Dehaene

Fondation Roger de Spoelberch

  • Stanislas Dehaene

Canadian Institute for Advanced Research

  • Stanislas Dehaene

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We gratefully acknowledge Henrik Ueberschär, Leila Azizi, and Virginie Van Wassenhove for their invaluable daily support and stimulating discussion.

Ethics

Human subjects: The study was approved by CPP IDF under the reference CPP 08 021. All subjects gave written informed consent and consent to publish before participating in the study.

Reviewing Editor

  1. Tatiana Pasternak, Reviewing Editor, University of Rochester, United States

Publication history

  1. Received: December 2, 2016
  2. Accepted: July 13, 2017
  3. Accepted Manuscript published: July 18, 2017 (version 1)
  4. Version of Record published: September 7, 2017 (version 2)

Copyright

© 2017, Trübutschek et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 4,005
    Page views
  • 778
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Further reading