Introduction

When a target stimulus is presented to one eye and a flickering Mondrian-like masker to the other eye, the target can be rendered invisible for an extended period (Tsuchiya & Koch, 2005). This paradigm, known as continuous flash suppression (CFS), has been widely used to investigate subconscious visual processing (Moors, Hesselmann, Wagemans, & van Ee, 2017; Pournaghdali & Schwartz, 2020; Yang, Brascamp, Kang, & Blake, 2014). Among the most intriguing findings are the subconscious high-level visual and cognitive functions under the influence of CFS (e.g., Adams, Gray, Garner, & Graf, 2010; Almeida, Mahon, Nakayama, & Caramazza, 2008; Fang & He, 2005; Mudrik, Breska, Lamy, & Deouell, 2011; Sklar et al., 2012; Tettamanti, Conca, Falini, & Perani, 2017; Zabelina et al., 2013). For example, priming effects are reportedly evident when the target and the invisible primer are categorically (Almeida et al., 2008) or semantically (Zabelina et al., 2013) consistent. However, many of these observations have been questioned by more recent studies, with at least some of the high-level effects being attributed to low-level feature processing (Gray, Adams, Hedger, Newton, & Garner, 2013; Hesselmann & Malach, 2011; Moors, Boelens, van Overwalle, & Wagemans, 2016; Moors & Hesselmann, 2018; Moors et al., 2017; Pournaghdali & Schwartz, 2020; Sakuraba, Sakai, Yamanaka, Yokosawa, & Hirayama, 2012; Stuit, Paffen, & Van der Stigchel, 2023).

A critical but unresolved issue in this debate is the impact of CFS on V1 neuronal activity. CFS has been hypothesized to arise from mechanisms similar to those in binocular rivalry (Moors et al., 2017; Tsuchiya & Koch, 2005; Yang et al., 2014), which likely suppress V1 responses through interocular inhibition. Only the surviving stimulus information would then be relayed to downstream areas for potential subconscious higher-level visual and cognitive processing (Adams et al., 2010; Almeida, Mahon, & Caramazza, 2010; Jiang, Costello, & He, 2007). Importantly, if V1 activity is suppressed to a sufficient degree, the low-level stimulus information carried by the remaining V1 responses may not suffice to sustain high-level processing of more complex stimuli defined by those low-level features.

Nevertheless, two fMRI studies reported that V1 activity is either unaffected or only weakly affected (Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013), implying that at least most stimulus inputs remain intact in V1 under CFS, thereby allowing for high-level unconscious processing. In these studies, the strength of neural responses under CFS, in which the stimulus and the flashing masker were presented dichoptically, was compared to that in a monocular masking condition, where the flashing masker was presented to the same eye as the target. No stronger or only slightly stronger CFS masking was found compared to monocular masking. However, monocular masking also suppresses pre-cortical neural responses (Macknik & Martinez-Conde, 2004). As a result, the dichoptic CFS masking, which is cortical, could be substantially stronger than monocular masking when accounting for the pre-cortical effects of monocular masking.

Neurons in V1 exhibit various degrees of ocular dominance (Hubel & Wiesel, 1962), which influences each neuron’s binocular combination of monocular visual inputs from the two eyes (Kato, Bishop, & Orban, 1981; Mitchell, Carlson, Westerberg, Cox, & Maier, 2023; Zhang, Zhao, Jiang, Tang, & Yu, 2024). In the present study, we examined the extent to which V1 neuronal responses are affected by CFS and how neurons preferring the target eye, masker eye, or both eyes are differently impacted. Using a customized two-photon imaging setup for awake macaques (Li, Liu, Jiang, Lee, & Tang, 2017), we sampled large neuronal populations at cellular resolution and measured ocular dominance for each individual neuron. This approach enabled us to investigate the potentially differential impacts of CFS on the responses of V1 neurons with varying ocular preferences, as well as apply machine learning tools to understand the sensory and perceptual consequences of these CFS impacts.

Results

We employed two-photon calcium imaging to record responses of V1 superficial neurons from two awake, fixating macaques, each with two response fields of view (FOVs, 850 x 850 μm2) (Fig. 1A). During the initial recording, the stimulus was a binocular 0.45-contrast square-wave grating varying at twelve orientations and two spatial frequencies (3 & 6 cpd) (Fig. 1B). A total of 3,564 neurons were identified through image processing, including 3,004 (84.29%) orientation-tuned neurons that were included in ensuing data analyses.

Two-photon imaging and ocular dominance mapping.

A. Optical windows for imaging of two macaques. Green crosses indicate the regions for viral vector injections, and yellow boxes indicate the FOVs chosen for imaging. B. Stimuli used for OD mapping. A circular-windowed square-wave grating was presented monocularly to each eye, respectively, to probe each neuron’s ODI. C. Ocular dominance functional maps of each FOV at single-neuron resolution showing OD clusters. D. Frequency distributions of individual neurons’ ODIs in each FOV.

The same grating stimulus was then presented monocularly (Fig. 1B) to each eye to characterize individual neurons’ eye preferences. Each neuron’s ocular dominance index (ODI) was calculated as ODI = (Ri – Rc)/(Ri + Rc), where Ri and Rc were the neuron’s peak responses to ipsilateral and contralateral stimulations, respectively. Neurons with an ODI at –1 or +1 would exclusively prefer the contralateral or ipsilateral eye, while neurons with an ODI at 0 would prefer both eyes equally. Consistent with previous findings (Horton & Hocking, 1996; Hubel & Wiesel, 1962; Livingstone, 1996; Zhang et al., 2024), neurons with similar eye preferences clustered together (Fig. 1C), indicating ocular dominance columns. The ODI followed unimodal distributions (Fig. 1D), in which the majority of neurons were binocular, showing comparable preferences for either eye. Only a small portion of neurons were monocular, being more responsive to the ipsilateral or contralateral eye.

In a third and last step, the grating stimulus and the flashing masker were presented dichoptically to evaluate the impact of CFS on neurons’ orientation responses (Fig. 2A). The results are summarized as population orientation tuning functions under the baseline no-CFS condition and the CFS condition following the procedure in Busse, Wade, and Carandini (2009). Specifically, neurons with similar orientation preferences were binned (bin width = 15°) relative to the target orientation for a total of 12 bins, and the resultant population orientation tuning functions based on the mean responses of these bins (Fig. 2C) were fitted with a Gaussian function. Compared to the baseline population orientation tuning functions, those under the influence of CFS displayed profound reductions in orientation responses. The amplitude decreased by 84.18% in Monkey A and 60.78% in Monkey B on the basis of Gaussian fitting, while the slope decreased by 91.31% in Monkey A and 71.50% in Monkey B (Fig. 2B).

The impacts of CFS on population orientation tuning in two macaques.

A. Stimuli used in the CFS experiment for one macaque. The grating target was presented to one eye, which was dichoptically masked by a circular flashing masker presented to the other eye. The white dot was the fixation point. B. Exemplar baseline and CFS orientation tuning functions for neurons with different eye preferences. C. Population orientation tuning functions of all neurons without CFS as the baseline and with CFS. Data from two FOVs of each monkey were pooled due to highly consistent results. Solid curves are Gaussian fittings. D. Population orientation tuning functions of sub-groups of neurons with different eye preferences without and with CFS. Solid curves are Gaussian fittings. Error bars represent ±1 SE. E. The impacts of CFS on Fisher information. Fisher information is plotted as a function of relative orientation (to the neuron’s preferred orientation) without and with CFS. Shaded areas denote ±1 SE. F. The ratio of baseline/CFS Fisher information within 15° of neurons’ preferred orientations. Data from two FOVs of each monkey were pooled due to highly consistent results.

Furthermore, neurons were divided into three groups according to their ODIs, and the impacts of CFS on their respective orientation responses were examined: neurons preferring the grating eye (ODI > 0.2 or < -0.2, depending on whether the grating stimulation was ipsilateral or contralateral), binocular neurons (-0.2 <= ODI <= 0.2), and neurons preferring the masker eye (ODI < -0.2 or > 0.2 relative to the grating eye). Compared to the no-CFS baseline condition, the orientation tuning of neurons preferring the masker eye was completely wiped out by CFS (Fig. 2B, D left), leading to flattened tuning curves with unmeasurable amplitudes or bandwidths. The orientation tuning of binocular neurons was either nearly completely wiped out (Monkey A) or substantially abolished (Monkey B) (Fig. 2B, D middle). There were 85.68% and 68.32% decreases in amplitude, and 92.64% and 77.07% decreases in slope, for Monkeys A and B, respectively. The orientation tuning of neurons preferring the grating eye was the least but still substantially affected (Fig. 2B, D right), with respective 77.78% and 41.75% decreases in amplitude and 85.23% and 57.56% decreases in slope for two monkeys.
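The three-way grouping described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the string labels, and the `threshold` parameter are hypothetical, and the ODI sign convention (positive values favor the ipsilateral eye) follows the ODI definition given in the text.

```python
def classify_eye_preference(odi, grating_eye, threshold=0.2):
    """Label a neuron by its eye preference relative to the grating eye.

    odi: (Ri - Rc) / (Ri + Rc); positive values favor the ipsilateral eye.
    grating_eye: "ipsi" or "contra", the eye that viewed the grating.
    """
    if grating_eye not in ("ipsi", "contra"):
        raise ValueError("grating_eye must be 'ipsi' or 'contra'")
    # Binocular group: -0.2 <= ODI <= 0.2
    if abs(odi) <= threshold:
        return "binocular"
    prefers_ipsi = odi > 0
    # The neuron prefers the grating eye when its preferred eye matches it
    if prefers_ipsi == (grating_eye == "ipsi"):
        return "grating_eye"
    return "masker_eye"
```

For example, a neuron with ODI = 0.5 falls in the grating-eye group when the grating is shown to the ipsilateral eye, and in the masker-eye group when the grating is shown to the contralateral eye.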

To quantify the information loss in V1 population orientation coding due to CFS, we compared the Fisher information (Averbeck & Lee, 2006) under baseline and CFS conditions. Here, Fisher information is a statistical measure of how much information neuronal responses provide about the grating orientation. Specifically, it indicates the sensitivity of neural responses to small changes in orientation, with higher values signifying greater encoding precision. As illustrated in Fig. 2E, CFS reduced Fisher information primarily for orientations deviating by less than 15° from the neurons’ preferred orientations. The average Fisher information for stimuli within this 15° range decreased to 29.1% and 43.4% of the baseline values in the two macaques, respectively (Fig. 2F), indicating that CFS impairs the ability of V1 populations to encode orientation accurately, especially for orientations close to neuronal preferences.
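To make the link between tuning amplitude and Fisher information concrete, the sketch below uses a simplified form: Gaussian tuning curves and independent noise of fixed standard deviation, so that the information is the sum of squared tuning slopes divided by the noise variance. These are assumptions for illustration only. The exact estimator used in the study is not specified in this section, noise correlations are ignored here, and all function and parameter names are hypothetical.

```python
import math

def wrapped_difference(theta, pref):
    """Orientation difference in degrees, wrapped to [-90, 90)."""
    return (theta - pref + 90.0) % 180.0 - 90.0

def tuning_slope(theta, pref, amplitude, sigma):
    """Derivative of a Gaussian tuning curve with respect to orientation."""
    d = wrapped_difference(theta, pref)
    return -amplitude * d / (sigma * sigma) * math.exp(-d * d / (2.0 * sigma * sigma))

def fisher_information(theta, prefs, amplitude, sigma, noise_sd):
    """Sum of squared slopes over noise variance (independent-noise assumption)."""
    squared_slopes = (tuning_slope(theta, p, amplitude, sigma) ** 2 for p in prefs)
    return sum(squared_slopes) / noise_sd ** 2

# Hypothetical population: preferred orientations every 15 degrees
prefs = list(range(0, 180, 15))
baseline_fi = fisher_information(10.0, prefs, amplitude=1.0, sigma=20.0, noise_sd=0.2)
suppressed_fi = fisher_information(10.0, prefs, amplitude=0.5, sigma=20.0, noise_sd=0.2)
```

Under these assumptions, halving the tuning amplitude at fixed noise reduces the information to one quarter, which illustrates why large amplitude and slope reductions translate into substantial information loss. The slope vanishes at the preferred orientation itself, so each neuron contributes most on the flanks of its tuning curve.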

What are the sensory and perceptual consequences of CFS-induced suppression of V1 orientation responses? To answer this question, which is crucial for understanding subconscious processing under CFS, we trained linear decoders to classify neighboring stimulus orientations (15°) in our experiments, as well as transformer models to reconstruct the grating images. Here, orientation classification reflected coarse orientation discrimination, and image reconstruction reflected orientation perception, both suggesting the upper bounds of performance assuming an ideal observer.

For orientation classification, we trained a support vector machine (SVM) to classify neighboring orientations based on population neural activity in each trial. Decoders for different FOVs, ipsilateral/contralateral target presentations, different pairs of neighboring orientations, and baseline vs. CFS conditions were trained separately. Under the baseline condition, the decoders achieved mean classification accuracies of 95.8 ± 2.3% and 97.8 ± 1.8% across ipsilateral and contralateral eye conditions and 12 neighboring orientation pairs in Monkeys A and B, respectively. Under CFS, the respective accuracies were largely unchanged (93.4 ± 3.3% and 98.1 ± 1.5%, Fig. 3A). These results suggest that under CFS, there is likely still sufficient information for orientation discrimination, even for Monkey A, whose V1 neuronal responses were more substantially suppressed. Furthermore, with CFS suppression as severe as that in Monkey A, real and non-ideal observers with low efficiency in reading out stimulus information may still have a good chance of completing a similar coarse orientation discrimination task.

Sensory/perceptual consequences of CFS revealed by machine learning.

A. Orientation classification accuracies under CFS vs. baseline conditions obtained using SVM decoders. Each datum represents results from a contralateral or ipsilateral grating condition with a specific FOV averaged across 5-fold cross-validations. Error bars denote 95% confidence intervals. B. A diagram of the transformer model for stimulus image reconstruction. C. Exemplar learning curves of transformer models under baseline and CFS conditions from two FOVs. The vertical dashed line indicates the epoch at which the baseline model reaches 75% of its total loss decrease between the two learning plateaus estimated using a sigmoid fit. D. Illustrations of corresponding reconstructed stimulus images on the basis of learning curves in C. E. Box plots of SSIM scores between the original and reconstructed images with baseline and CFS transformers. Within a FOV, results from contralateral eye and ipsilateral eye conditions are combined. F. Distributions of absolute orientation errors between the true orientation and the orientation extracted from the reconstructed image using a gradient-based procedure.

Next, we trained transformer models to reconstruct the grating stimulus images on the basis of corresponding neuronal responses under baseline and CFS conditions. The motivation for this part of the modeling work was the assumption that high-level tasks would be difficult to carry out if the basic stimulus features forming more complex patterns were not intact. Our transformer architecture integrated embedding, self-attention, and unembedding modules, as well as a fully connected feedforward layer (Fig. 3B). The model inputs were the responses of all neurons within a FOV to the grating stimulus (ipsilateral and contralateral presentations of the same stimulus were modeled separately), and the model output was the reconstructed grating image. During training, the model typically reached two successive learning plateaus, where the validation loss temporarily stagnated (Fig. 3C). Moreover, the validation loss decreased more rapidly when training on the baseline neural response data than on the CFS data. To compare the two conditions at a matched training stage, we identified the epoch at which the validation loss of the baseline model reached 75% of its total decrease between the two plateaus using a sigmoid fit, and then we retrained both the baseline and CFS models up to this epoch.

The retrained baseline models reconstructed the grating stimuli significantly better than the CFS models in Monkey A, but this discrepancy was less pronounced in Monkey B (Fig. 3D), consistent with the neuronal data showing that Monkey A exhibited substantially more CFS suppression than Monkey B in terms of population orientation tuning and Fisher information (Fig. 2). We used a structural similarity index (SSIM) (Brunet, Vrscay, & Wang, 2012) and a gradient-based orientation extraction procedure to quantify reconstruction performance. Across the grating-presenting ipsilateral and contralateral eyes, the baseline models reconstructed the grating with median SSIMs of 0.52 and 0.61 for the two FOVs of Monkey A, and 0.57 and 0.63 for the two FOVs of Monkey B, respectively, while the corresponding SSIMs for the CFS models were 0.16 and 0.19 for Monkey A, and 0.55 and 0.53 for Monkey B (Fig. 3E).
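For readers unfamiliar with SSIM, the sketch below computes a simplified, single-window version of the index over two flattened images. Standard SSIM implementations compute the same expression in local windows and average the result, so this is an illustration of the formula rather than the procedure used in the study. The function name and the `dynamic_range` parameter are hypothetical, and the constants 0.01 and 0.03 are the conventional defaults.

```python
def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM between two equal-length flattened images."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small stabilizing constants scaled by the pixel dynamic range
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    luminance_term = (2 * mean_x * mean_y + c1) / (mean_x ** 2 + mean_y ** 2 + c1)
    structure_term = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance_term * structure_term
```

An image compared with itself scores 1, whereas a contrast-reversed copy of a high-contrast pattern scores below 0, so lower values indicate that the reconstruction shares less structure with the original stimulus.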

Furthermore, the grating orientations extracted from the reconstructed images deviated 4.46° and 3.10° (median values) from the actual stimulus orientation in baseline models for the two FOVs of Monkey A, and 2.86° and 2.20° for the two FOVs of Monkey B, respectively. However, in the CFS models, this orientation error increased to 48.45° and 24.03° in the two FOVs of Monkey A, implying that the stimulus orientation could not be reconstructed or unconsciously “perceived” (Fig. 3F). In contrast, the orientation error increased only slightly to 3.06° and 3.42° in Monkey B, implying only moderately impaired reconstruction and unconscious “perception” of the stimulus orientation.
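One common way to extract a dominant orientation from image gradients is to sum products of finite-difference gradients (a structure-tensor estimate). The sketch below illustrates that idea. Whether the study used this particular gradient-based procedure is an assumption, and the function names, the sinusoidal test grating, and its parameters are hypothetical. Angles are in image coordinates.

```python
import math

def dominant_orientation(image):
    """Estimate stripe orientation (degrees, 0-180) from summed gradient products."""
    height, width = len(image), len(image[0])
    jxx = jyy = jxy = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central difference along x
            gy = image[y + 1][x] - image[y - 1][x]  # central difference along y
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    gradient_angle = 0.5 * math.degrees(math.atan2(2.0 * jxy, jxx - jyy))
    # Stripes run perpendicular to the dominant gradient direction
    return (gradient_angle + 90.0) % 180.0

def orientation_error(a, b):
    """Absolute difference between two orientations, wrapped to [0, 90] degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def make_grating(size, normal_deg, cycles_per_pixel=0.05):
    """Sinusoidal test grating whose luminance varies along direction normal_deg."""
    phi = math.radians(normal_deg)
    return [[math.sin(2 * math.pi * cycles_per_pixel
                      * (x * math.cos(phi) + y * math.sin(phi)))
             for x in range(size)]
            for y in range(size)]

# A grating modulated along 30 deg has stripes oriented at about 120 deg
estimate = dominant_orientation(make_grating(64, 30.0))
error = orientation_error(estimate, 120.0)
```

Because orientation is 180°-periodic, the error is wrapped so that it never exceeds 90°. An error near 45°, as in the CFS models for Monkey A, is therefore what would be expected if the extracted orientation were unrelated to the stimulus.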

To estimate the impact of CFS-induced V1 suppression on downstream processing, we also recorded neuronal responses from two V2 FOVs in Monkey A (FOVs 3 & 4). As anticipated, V2 neurons were predominantly binocular, with over 90% of them showing ODIs within the range of -0.2 to 0.2 (Fig. 4A). Similar to the V1 results for the same monkey, CFS on average reduced the amplitudes of the population orientation tuning functions by 80.05% and the slopes by 89.44% (Fig. 4B). It also reduced the Fisher information to 33.1% of the baseline value (Fig. 4C). Furthermore, we applied the same orientation classification and image reconstruction procedures to the V2 data. For orientation classification, the SVM decoders achieved near-perfect performance in distinguishing neighboring orientations under both baseline and CFS conditions, with classification accuracies exceeding 98% across all cases (Fig. 4D). In the image reconstruction task, the baseline model outperformed the CFS model. Specifically, the baseline transformer models reconstructed the stimulus images with median SSIM values of 0.61 and 0.53 for the two V2 FOVs, respectively, which dropped to 0.42 and 0.18 in the CFS models (Fig. 4E left). The resulting errors of extracted orientations increased from 2.53° and 3.05° with the baseline models to 7.00° and 42.45° with the CFS models (Fig. 4E right), implying poorer or failed reconstruction and unconscious “perception” of stimulus orientations.

Effects of CFS on V2 orientation responses.

A. OD maps of the two V2 FOVs of Monkey A (MA3 & MA4). B. Population orientation tuning functions for all orientation-tuned neurons with baseline and CFS conditions. Solid lines represent the results of Gaussian fittings. Error bars represent ±1 SE. C. Fisher information as a function of the relative orientation (to the neuron’s preferred orientation) with baseline and CFS conditions. Shaded areas denote ±1 SE. Fisher information was lower in MA4 due to higher variations in the data. D. Orientation classification accuracies under CFS vs. baseline conditions using SVM decoders. Each datum represents results from a contralateral or ipsilateral grating condition with one FOV, averaged across 5-fold cross-validations. Error bars denote 95% confidence intervals. E. Box plots of SSIM scores between the original and reconstructed images with baseline and CFS transformers. Within a FOV, results from contralateral eye and ipsilateral eye conditions are combined. F. Distributions of orientation deviation errors between the original orientation and the extracted orientation.

Discussion

Our study demonstrates that CFS severely compromises orientation information in V1 neurons in an ocular dominance-dependent manner. Orientation information carried by neurons preferring the masker eye or both eyes is completely or nearly completely wiped out, while information carried by those preferring the grating eye is partially retained.

Downstream, orientation information in V2 neurons is also substantially weakened. Linear decoding and transformer models suggest that CFS-compromised orientation information may still allow coarse orientation discrimination, but will most likely impair orientation perception when the suppression is sufficiently strong, as in Monkey A. Similarly strong suppression would also be possible in Monkey B if the grating contrast were lowered from the current 0.45 to 0.1-0.3, as in some CFS experiments (Alais, Coorey, Blake, & Davidson, 2024; Lunghi & Pooresmaeili, 2023; Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013).

CFS-compromised V1 orientation information is still transmitted for downstream visual processing, which may explain the unconscious orientation processing observed in human CFS studies. The “invisible” orientation information can be processed, as demonstrated by adaptation (Bahrami, Carmel, Walsh, Rees, & Lavie, 2008; Kanai, Tsuchiya, & Verstraten, 2006) and priming (Koivisto & Grassini, 2018) studies. The adaptation aftereffect is reduced compared to the visible condition but not entirely abolished (Bahrami et al., 2008; Kanai et al., 2006), likely as a result of the degraded orientation information surviving CFS. For the same reason, the priming effect also decreases during trials in which the stimulus is rendered invisible by CFS, compared to those in which the stimulus is visible or partially visible (Koivisto & Grassini, 2018), as the degraded stimulus information provides insufficient evidence for decision-making (Dehaene, 2011; Gomez, Perea, & Ratcliff, 2013).

Furthermore, our linear decoding and transformer results can help elucidate the debate on whether visual processing still functions at the categorization level under the influence of CFS. Previous studies have provided evidence for the preserved category information of the target, as demonstrated by tool-specific priming effects (Almeida et al., 2010; Almeida et al., 2008) and differential BOLD response patterns between tools and other object categories under CFS (Hesselmann, Hebart, & Malach, 2011; Tettamanti et al., 2017). However, an intriguing question is whether these results instead reflect low-level feature differences between tools and other object categories. It has been reported that elongated objects, irrespective of their categorical affiliation, elicit similar priming effects (Sakuraba et al., 2012). Consistent with this, when tools are categorized by their shape (elongated vs. non-elongated), only the neural response patterns elicited by elongated tools can be discriminated from other object categories under CFS (Fogelson, Kohler, Miller, Granger, & Tse, 2014; Ludwig, Kathmann, Sterzer, & Hesselmann, 2015). Moreover, a recent study measuring the contrast thresholds required to both break from and suppress CFS found that stimuli exhibited similar suppression strengths across various categories (Alais et al., 2024). According to our results, when suppression is too strong to allow for stimulus reconstruction, as in the case of Monkey A (Fig. 3C), the orientation information under CFS may not accumulate to a level sufficient for resolving semantic category boundaries, which might require somewhat intact stimulus orientation, even if subconsciously. However, the degraded orientation information could still assist in category discrimination when categorical differences lie in certain low-level shape dimensions like orientation, as coarse orientation discrimination appears unaffected by CFS suppression (Fig. 3A).

In the framework of global neuronal workspace theory (Mashour, Roelfsema, Changeux, & Dehaene, 2020; Seth & Bayne, 2022), a stimulus reaches consciousness when it triggers an ‘ignition’, defined as the recurrent processing and amplification of the sensory signal. This ignition enables the stimulus to be broadcast and made available to a widespread global neuronal workspace, thereby becoming part of the conscious experience. To achieve this, the feedforward signal needs to be sufficiently strong to reach the ‘consciousness hub’, where high-density connectivity facilitates efficient broadcasting, presumably in the prefrontal cortex (Mashour et al., 2020). In the present study, the target information under CFS is severely compromised in V1 and is therefore unable to reach the ‘consciousness hub’ and trigger an ignition, thus remaining subliminal.

Materials and methods

Monkey preparation

Monkey preparation was identical to procedures reported in previous studies (Guan, Ju, Tao, Tang, & Yu, 2021; Ju, Guan, Tao, Tang, & Yu, 2020; Zhang et al., 2024). Two male rhesus monkeys (Macaca mulatta, aged 5 and 6, respectively) underwent two sequential surgeries under general anesthesia and strictly sterile conditions. During the first surgery, a 20-mm diameter craniotomy was performed on the skull over V1. The dura was opened and multiple tracks of 100-150 nl AAV1.hSynap.GCaMP5G.WPRE.SV40 (AV-1-PV2478, titer 2.37e13 (GC/ml), Penn Vector Core) were pressure-injected at a depth of ∼350 µm at multiple locations. The dura was then sutured, the skull cap was re-attached with three titanium lugs and six screws, and the scalp was sutured. After the surgery, the animal was returned to the cage and treated with injectable antibiotics (Ceftriaxone sodium, Youcare Pharmaceutical Group, China) for one week. Postoperative analgesia was also administered. The second surgery was performed 45 days later. A T-shaped steel frame was installed for head stabilization, and an optical window was inserted onto the cortical surface. Data collection could start as early as one week later. More details about the preparation and surgical procedures can be found in Li et al. (2017). The procedures were approved by the Institutional Animal Care and Use Committee, Peking University.

Behavioral task

After a ten-day recovery period following the second surgery, monkeys were placed in a primate chair with head restraint. They were trained to hold fixation on a small white spot (0.2°) with eye positions monitored by an Eyelink-1000 eye tracker (SR Research) at a 1000-Hz sampling rate. During the experiment, trials in which the eye position deviated 1.5° or more from the fixation point before stimulus offset were treated as containing saccades, discarded, and repeated.

Visual stimuli and experimental procedures

Visual stimuli were generated with the MATLAB-based Psychtoolbox-3 software (Pelli & Zhang, 1991) and presented on a ROG Swift PG278QR monitor (refresh rate = 120 Hz, resolution = 2560 × 1440 pixels, pixel size = 0.23 mm × 0.23 mm). The screen luminance was linearized by an 8-bit look-up table, and the mean luminance was 47 cd/m2. The viewing distance was 60 cm.

A drifting square-wave grating (spatial frequency = 4 cpd, contrast = full, speed = 3 cycles/sec, starting phase = 0°, size = 0.4° in diameter) was first used to determine the population receptive field (pRF) location, shape, and approximate size associated with a specific FOV. The same stimulus was also monocularly presented to confirm the V1 location as ocular dominance columns would appear. This fast process used a 4 × objective lens mounted on the two-photon microscope and did not provide cell-specific information. The recorded V1 pRFs were centered at ∼0.90° eccentricity in Monkey A and ∼1.93° in Monkey B. V2 pRFs were centered at ∼0.67° in Monkey A. All pRFs were approximately circular with a diameter of 0.9°.

The target stimulus used in the experiments was a 0.45-contrast circular-windowed square-wave grating. It drifted at 4 cycles per second in opposite directions perpendicular to the orientation with a starting phase of 0°, and varied at 12 orientations (0° to 165° in 15° increments) and two spatial frequencies (3 & 6 cpd) trial by trial. The circular envelope had a diameter of 1°, which approximated the size of pRFs for recorded FOVs, with the edge blurred by a linear ramp starting at a radius of 0.38°. The flashing masker was a circular white noise pattern with a diameter of 1.89°, a contrast of 0.5, and a flickering rate of 10 Hz. The white noise consisted of randomly generated black and white blocks (0.07° × 0.07° each). The target grating and the flashing masker were presented through a pair of NVIDIA 3D Vision 2 active shutter glasses. To mitigate the ghost image, a low contrast (RMS contrast = 0.08) white noise was added to the grating. The width of the noise element was half of the bar width of the square grating, and the white noise was regenerated every frame.

Each block of trials consisted of four groups of stimuli: binocular, monocular, CFS, and flashing masker-only. In the binocular group, the grating was presented to both eyes simultaneously. The relevant data were only used to help identify ROIs and orientation-tuned neurons along with data from other stimulus conditions. In the monocular group, the grating was monocularly presented to the contralateral or ipsilateral eye, which served as the baseline conditions without the influences of CFS. In the CFS group, the grating and flashing masker were presented dichoptically. In the flashing masker-only group, the flashing masker was presented monocularly to either eye. Each stimulus condition was repeated for 10-12 trials. For conditions involving the grating, the trials were split between the two opposite drifting directions. A block of trials contained 242 trials, two trials for each stimulus condition, with the order of stimulus conditions arranged in a pseudorandom manner. There were 5 to 6 blocks of trials with each FOV.

Each stimulus was presented for 1000 ms, followed by an inter-stimulus interval of 1500 ms, allowing sufficient time for the calcium signals to return to the baseline level (Guan, Zhang, Zhang, Tang, & Yu, 2020). For each FOV, the recording was completed in a single session with 5-6 experiment blocks and lasted for 2-3 hours.

Two-photon imaging

Two-photon imaging was performed using a FEMTOSmart two-photon microscope (Femtonics), along with a Ti:sapphire laser (Mai Tai eHP, Spectra Physics). GCaMP5 was chosen as the indicator of calcium signals because its fluorescence is linearly proportional to neuronal spike activity within a wide range of firing rates from 10-150 Hz (Li et al., 2017). During imaging, a 16× objective lens (0.8 N.A., Nikon) with a resolution of 1.6 µm/pixel was used, along with a 1000 nm femtosecond laser. A fast resonant scanning mode (32 fps) was chosen to obtain continuous images of neuronal activity (8 frames per second after averaging every 4 frames). The strength of fluorescent signals (mean luminance of a small area) was monitored and adjusted if necessary to compensate for the drift of fluorescent signals. Two response fields of view (FOVs) measuring 850 × 850 µm2 in V1 were selected in both macaques, and two FOVs of the same size in V2 were selected in Monkey A.

Imaging data analysis: Initial screening of ROIs

Data were analyzed with custom MATLAB code. A normalized cross-correlation based translation algorithm was used to reduce motion artifacts (Li et al., 2017). Then the fluorescence changes were associated with corresponding visual stimuli through the time sequence information recorded by a Neural Signal Processor (Cerebus system, Blackrock Microsystems). By subtracting the mean of the 4 frames before stimulus onset (F0) from the average of the 6th-9th frames after stimulus onset (F) across 5 or 6 repeated trials for the same stimulus condition (same orientation, spatial frequency, size, and drifting direction), the differential image (ΔF = F - F0) was obtained.

For a specific FOV, the regions of interest (ROIs) or possible cell bodies were decided through sequential analysis of 242 differential images in the order of CFS, monocular, binocular, and flashing masker-only conditions. CFS conditions consisted of 96 (2×2×12×2 = 96) differential images, with the grating presented to either eye (2), at two spatial frequencies (2), twelve orientations (12), and two motion directions (2). Monocular conditions were identical to the CFS conditions except that the flashing masker was absent. In the binocular conditions, gratings at two spatial frequencies (2), twelve orientations (12), and two motion directions (2) were binocularly presented, resulting in 48 differential images. The flashing masker-only conditions consisted of the flashing masker presented to either eye, resulting in 2 differential images.
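The count of 242 differential images follows directly from the condition structure described above, which can be checked by enumerating the conditions. The variable names and condition labels below are hypothetical.

```python
from itertools import product

eyes = ("ipsi", "contra")
spatial_frequencies = (3, 6)               # cycles per degree
orientations = tuple(range(0, 180, 15))    # 0 to 165 deg in 15 deg steps
directions = (+1, -1)                      # two opposite drifting directions

conditions = {
    # Grating to either eye, masker to the other eye
    "cfs": list(product(eyes, spatial_frequencies, orientations, directions)),
    # Same as CFS but without the flashing masker
    "monocular": list(product(eyes, spatial_frequencies, orientations, directions)),
    # Grating to both eyes
    "binocular": list(product(spatial_frequencies, orientations, directions)),
    # Flashing masker alone, presented to either eye
    "masker_only": list(eyes),
}

counts = {name: len(items) for name, items in conditions.items()}
total_images = sum(counts.values())
```

The four groups contribute 96, 96, 48, and 2 differential images, respectively, for a total of 242.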

The first differential image was filtered with a band-pass Gaussian filter (size = 2–10 pixels), and connected subsets of pixels (>25 pixels, which would exclude smaller vertical neuropils) with average pixel value >3 standard deviations of the mean brightness were selected as ROIs. Then the areas of these ROIs were set to mean brightness in the next differential image before the bandpass filtering and thresholding were performed. This measure gradually reduced the standard deviations of differential images and facilitated the detection of neurons with relatively low fluorescence responses. If a new ROI and an existing ROI from the previous differential image overlapped, the new ROI would be on its own if the overlapping area OA < 1/4 ROInew, discarded if 1/4 ROInew < OA < 3/4 ROInew, and merged with the existing ROI if OA > 3/4 ROInew. The merges would help smoothen the contours of the final ROIs. This process went on through all differential images twice to select ROIs. Finally, the roundness for each ROI was calculated as:

where A was the ROI’s area, and P was the perimeter. Only ROIs with roundness larger than 0.9, which would exclude horizontal neuropils, were selected for further analysis.
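The overlap rule and the roundness criterion can be sketched as follows. This is a minimal illustration, not the MATLAB code used in the study. ROIs are represented as sets of (row, col) pixels, the function names are hypothetical, and the handling of overlaps exactly equal to 1/4 or 3/4 is an assumption. The roundness definition used here, sqrt(4πA)/P, is also an assumption, chosen because it depends only on A and P and equals 1 for a perfect circle; the squared form 4πA/P² is an equally common alternative.

```python
import math

def classify_overlap(new_roi, existing_roi):
    """Three-way overlap rule for two ROIs given as sets of (row, col) pixels."""
    overlap_area = len(new_roi & existing_roi)
    frac = overlap_area / len(new_roi)  # OA / ROInew
    if frac < 0.25:
        return "keep_separate"
    if frac > 0.75:
        return "merge"
    # Overlaps between 1/4 and 3/4 are discarded; the exact boundary
    # values are placed here by assumption.
    return "discard"

def roundness(area, perimeter):
    """Assumed definition: sqrt(4*pi*A) / P, which is 1.0 for a circle."""
    return math.sqrt(4.0 * math.pi * area) / perimeter

def is_round_enough(area, perimeter, threshold=0.9):
    """Selection criterion from the text: roundness larger than 0.9."""
    return roundness(area, perimeter) > threshold
```

Under this assumed definition a unit square has a roundness of about 0.89 and would be rejected, whereas elongated shapes such as horizontal neuropil segments score far lower.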

Imaging data analysis: Orientation tuning and ocular dominance

The ratio of fluorescence change (ΔF/F0) was calculated as a neuron’s response to a specific stimulus condition. For a given neuron and stimulus condition, the F0n of the n-th trial was the average of the 4 frames before stimulus onset (-500 to 0 ms), and Fn was the average of the 5th-8th, 6th-9th, or 7th-10th frames after stimulus onset, whichever was the greatest. F0n was then averaged across 10 or 12 repeated trials to obtain a common baseline F0 for all trials (to reduce noise in the calculation of responses), and ΔFn/F0 = (Fn - F0)/F0 was taken as the neuron’s response to this stimulus at the n-th trial.
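The response calculation can be sketched as below. This is an illustrative sketch under stated assumptions: each trial is a list of frame values whose first four entries precede stimulus onset, the function name is hypothetical, and the best of the three response windows is chosen separately for each trial (the text does not say whether the choice was made per trial or per condition).

```python
def trial_responses(trials, n_baseline=4):
    """Return dF_n/F0 for each trial of one neuron and one stimulus condition.

    trials: list of fluorescence traces (lists of frame values); the first
    n_baseline frames of each trace precede stimulus onset.
    """
    def mean(values):
        return sum(values) / len(values)

    # Per-trial baselines F0n, averaged across trials into a common F0.
    f0 = mean([mean(trace[:n_baseline]) for trace in trials])

    responses = []
    for trace in trials:
        post = trace[n_baseline:]
        # Candidate windows: 5th-8th, 6th-9th, and 7th-10th frames after onset.
        fn = max(mean(post[start:start + 4]) for start in (4, 5, 6))
        responses.append((fn - f0) / f0)
    return responses
```

Averaging F0n across trials before dividing keeps a single noisy baseline from inflating or deflating an individual trial's ratio.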

Several steps were taken to determine whether a neuron was orientation-selective. For each monocular or binocular condition, the orientation and SF eliciting the maximal response were designated as the neuron’s preferred orientation and SF. We then compared responses across all 12 orientations at the preferred SF with a non-parametric Friedman test to determine whether the neuron’s responses differed significantly across orientations. To reduce Type I errors, the significance level was set at α = 0.01. Neurons that passed the Friedman test under at least one viewing condition were selected as orientation-tuned neurons.

The ocular dominance index (ODI) was calculated to characterize each neuron’s eye preference: ODI = (Ri - Rc)/(Ri + Rc), where Ri and Rc were the neuron’s peak responses at the best orientation and SF to ipsilateral and contralateral monocular grating conditions, respectively. A neuron with an ODI of -1 or 1 would be completely dominated by the contralateral or ipsilateral eye, respectively, and a neuron with an ODI of 0 would be driven equally by both eyes.
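As a small worked example of the index, the following sketch evaluates the ODI formula for a few response pairs. The function name is hypothetical, and the sketch assumes non-negative peak responses with a non-zero sum.

```python
def ocular_dominance_index(r_ipsi, r_contra):
    """ODI = (Ri - Rc) / (Ri + Rc); positive values favour the ipsilateral eye."""
    return (r_ipsi - r_contra) / (r_ipsi + r_contra)

# A neuron driven three times more strongly by the contralateral eye.
example = ocular_dominance_index(r_ipsi=0.2, r_contra=0.6)
```

Here `example` is negative, indicating a contralateral eye preference.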

Population orientation tuning

For each neuron, neural responses at the preferred SF were selected for tuning analysis. To derive population orientation tuning curves under a specific condition, we categorized neurons into twelve orientation preference bins according to their preferred orientations (bin width = 15°). For each orientation presented, the responses of all orientation preference bins were reorganized according to the relative orientation preference. Subsequently, neuronal responses of the same relative orientation preference were averaged to generate the final population orientation tuning function. For CFS conditions, the selected SF and binning procedures were the same as their corresponding monocular conditions.
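The re-alignment by relative orientation preference can be sketched as follows. This is a simplified illustration rather than the original analysis code: it assumes each neuron's responses are already given per orientation bin at its preferred SF, it shifts each neuron's curve directly instead of first grouping neurons by preference bin (the resulting average is the same), and the function name is hypothetical.

```python
def population_tuning(responses, preferred_bins, n_bins=12):
    """Average tuning curves after aligning each neuron to its preferred bin.

    responses[i][j]: response of neuron i to stimulus orientation bin j.
    preferred_bins[i]: index of neuron i's preferred orientation bin.
    Returns the mean response per relative orientation, with the preferred
    orientation placed at index n_bins // 2.
    """
    centre = n_bins // 2
    sums = [0.0] * n_bins
    for resp, pref in zip(responses, preferred_bins):
        for j, r in enumerate(resp):
            # Orientation is circular over 180 deg, so indices wrap around.
            rel = (j - pref + centre) % n_bins
            sums[rel] += r
    return [s / len(responses) for s in sums]
```

With twelve bins of 15°, index 6 of the returned list corresponds to a relative orientation of 0°, and each neighbouring index to a further 15° offset.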

The population orientation tuning function was fitted with a Gaussian model using MATLAB’s nonlinear least-squares function ‘lsqnonlin’:

$$R(\theta) = a\,e^{-\frac{(\theta - \theta_0)^2}{2\sigma^2}} + b$$

where R(θ) was the response at orientation θ, free parameters a, θ0, σ, and b were the amplitude, peak orientation, standard deviation of the Gaussian function (equal to half width at half height), and minimal response, respectively.

The population orientation tuning curves for different eye preference groups were derived using the same procedure, with additional binning of neurons according to their ODI. To obtain the tuning curve of the neurons preferring the eye seeing the grating, responses of neurons with an ODI < -0.2 (preferentially responding to the contralateral eye) under contralateral eye grating presentation and those with an ODI > 0.2 (preferentially responding to the ipsilateral eye) under ipsilateral eye grating presentation were combined. Similarly, for neurons preferring the eye seeing the masker, responses of neurons with an ODI < -0.2 under ipsilateral eye grating presentation and those with an ODI > 0.2 under contralateral eye grating presentation were combined. For binocular neurons (-0.2 < ODI < 0.2), responses under both grating presentation conditions were combined.
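The grouping rule can be written compactly as below. This is a sketch with hypothetical names; the assignment of neurons whose ODI is exactly -0.2 or 0.2 is an assumption, since the text gives only strict inequalities.

```python
def eye_group(odi, grating_eye, threshold=0.2):
    """Assign a neuron's responses to one of three populations.

    grating_eye: 'ipsi' or 'contra', the eye that viewed the grating.
    """
    if -threshold < odi < threshold:
        return "binocular"
    # ODI > 0 means an ipsilateral preference, ODI < 0 a contralateral one.
    preferred_eye = "ipsi" if odi > 0 else "contra"
    return "grating_eye" if preferred_eye == grating_eye else "masker_eye"
```

Each monocular neuron therefore contributes to the grating-eye group in one presentation condition and to the masker-eye group in the other, whereas binocular neurons contribute to the same group in both.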

Fisher information

The Fisher information quantifies the amount of information in a neuronal population that an optimal decoder could extract (Pouget, Deneve, Ducom, & Latham, 1999). Assuming independent Gaussian noise, the Fisher information for a population of N neurons was given as

$$I(\theta) = \sum_{i=1}^{N} \frac{f_i'(\theta)^2}{\sigma_i^2(\theta)}$$

where fi(θ) was the mean activity of neuron i in response to the presentation angle θ, f′i(θ) was its derivative with respect to θ, and σi²(θ) was the variance of neuron i’s response. We fitted each neuron’s response tuning fi(θ) and variance tuning σi²(θ) with Gaussian functions and calculated the averaged Fisher information across neurons at each orientation.
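A minimal numerical sketch of this calculation is given below, assuming the standard independent-noise form in which each neuron contributes its squared tuning slope divided by its response variance. The Gaussian parameterization, the dictionary layout, and the function names are assumptions made for illustration, and the circular wrap-around of orientation is ignored for brevity.

```python
import math

def gaussian_slope(theta, a, theta0, s):
    """Derivative with respect to theta of a*exp(-(theta-theta0)^2/(2 s^2)) + b."""
    d = theta - theta0
    return -a * d / s ** 2 * math.exp(-d ** 2 / (2.0 * s ** 2))

def fisher_information(theta, neurons):
    """Sum of squared tuning slopes divided by response variances.

    neurons: list of dicts with tuning parameters 'a', 'theta0', 's' and a
    callable 'var' returning the fitted response variance at theta.
    """
    total = 0.0
    for n in neurons:
        slope = gaussian_slope(theta, n["a"], n["theta0"], n["s"])
        total += slope ** 2 / n["var"](theta)
    return total

def mean_fisher_information(theta, neurons):
    """Population sum divided by the number of neurons."""
    return fisher_information(theta, neurons) / len(neurons)
```

A neuron contributes nothing at its own preferred orientation, where the tuning slope is zero, and most on the flanks of its tuning curve.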

SVM-based orientation classification

In the orientation classification task (Fig. 3A), we trained a support vector machine (SVM) to classify neighboring orientations from standardized population neural activity. The SVM decoder was implemented using MATLAB’s ‘fitcsvm’ function with a linear kernel. To prevent overfitting and evaluate the generalization ability of the model, we employed a 5-fold cross-validation procedure, and the model performance on the validation dataset was reported.

Decoders were trained independently for each experimental condition and each adjacent orientation pair, resulting in 48 models per FOV (contralateral/ipsilateral × baseline/CFS × 12 neighboring orientation pairs). Neural response data from two spatial frequencies and the two orientations in a neighboring orientation pair were used as input, with each neuron treated as a feature. In this way, each model was trained on 48 or 40 samples (2 SFs × 2 orientations × 12/10 repeats).
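The bookkeeping behind these counts can be checked with a short enumeration. The orientation values (0° to 165° in 15° steps) and the variable names are assumptions for illustration; the last pair wraps around from 165° to 0° because orientation is circular.

```python
from itertools import product

eyes = ["contralateral", "ipsilateral"]
conditions = ["baseline", "CFS"]
orientations = [15 * i for i in range(12)]  # assumed values in degrees

# Twelve neighbouring pairs, including the wrap-around pair (165, 0).
pairs = [(orientations[i], orientations[(i + 1) % 12]) for i in range(12)]

# One decoder per eye x condition x orientation pair.
models = list(product(eyes, conditions, pairs))

def n_samples(n_repeats, n_sfs=2, n_orientations=2):
    """Samples available to one decoder."""
    return n_sfs * n_orientations * n_repeats
```

With so few samples per decoder, each fold of a 5-fold split holds out only eight to ten samples, which is one reason a linear kernel is a natural choice here.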

The transformer model

Model input, output, and training procedure

We implemented a transformer-based model to reconstruct grating stimuli from population neuronal responses recorded under different experiment conditions. The model input was a vector of neuronal responses, each corresponding to an individual neuron, and the output was the reconstructed grating image of size 70 × 70 pixels.

The transformer was trained independently for each experimental condition, resulting in four models per FOV (contralateral/ipsilateral × baseline/CFS). Pilot experiments revealed that our original dataset was insufficient for the model to converge. To address this, we augmented the dataset to four times its original size before training. Augmentation was performed by sampling from a normal distribution centered at each neuron’s mean response, with a standard deviation equal to the standard deviation of its original responses. Within the augmented dataset, 6% was reserved for validation. Responses were normalized to [0, 1] before being fed into the model.
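The augmentation step can be sketched as follows. Several details are assumptions not stated in the text: the original trials are kept and synthetic trials are added until the total is four times the original count, means and standard deviations are computed per neuron across the repeated trials of one stimulus, and normalization uses a single global minimum and maximum. All function names are hypothetical.

```python
import random
import statistics

def augment(trials, factor=4, seed=0):
    """Return the original trials plus Gaussian-sampled synthetic trials.

    trials: list of response vectors (one value per neuron) recorded for
    repeated presentations of a single stimulus.
    """
    rng = random.Random(seed)
    n_neurons = len(trials[0])
    columns = [[t[i] for t in trials] for i in range(n_neurons)]
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    n_new = (factor - 1) * len(trials)
    synthetic = [[rng.gauss(m, s) for m, s in zip(means, sds)]
                 for _ in range(n_new)]
    return trials + synthetic

def min_max_normalize(dataset):
    """Rescale all responses to [0, 1] using the global minimum and maximum."""
    flat = [v for row in dataset for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in dataset]
```

Because each neuron is sampled independently, this scheme preserves single-neuron means and variances but not trial-by-trial correlations between neurons.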

We implemented a two-phase training procedure to assess the reconstruction ability of models trained on different neural data. During the training process, the model typically reached two learning plateaus, where the validation loss temporarily stagnated (Fig. 3C). In the first training session, we analyzed the learning curve to determine the epoch at which the baseline model’s validation loss had completed 75% of its total decrease between the two plateaus. This was estimated using a modified sigmoid fit:

where A and C defined the function range, b was the symmetry point, k was the steepness parameter, and t represented the epoch number, counted from epoch 500 (initial epochs were discarded due to a drastic drop in validation loss across all training runs, see Fig. 3C). The 75% decrease point was computed as:

In the second training session, we retrained both models up to the identified epoch and evaluated their performances.
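To make the 75% criterion concrete, the sketch below assumes one plausible form of a decreasing sigmoid, L(t) = C + A / (1 + exp(k(t - b))), which falls from C + A to C as t increases when k > 0. This specific form, and the function names, are assumptions made for illustration and may differ from the fit actually used.

```python
import math

def sigmoid_loss(t, A, C, b, k):
    """Assumed decreasing sigmoid: from C + A (early) to C (late) for k > 0."""
    return C + A / (1.0 + math.exp(k * (t - b)))

def epoch_at_fraction(b, k, fraction=0.75):
    """Epoch at which the given fraction of the total decrease is complete.

    Solves A / (1 + exp(k (t - b))) = (1 - fraction) * A for t.
    """
    return b + math.log(fraction / (1.0 - fraction)) / k

# Example with made-up fit parameters.
t75 = epoch_at_fraction(b=1000.0, k=0.01)
```

Under this form the 75% point lies ln(3)/k epochs after the symmetry point b, independent of A and C. Since t is counted from epoch 500, the offset would need to be added back to obtain the absolute training epoch.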

The model was trained to minimize the mean squared error (MSE) between the reconstructed and actual stimuli. Optimization was performed using RMSprop with a learning rate of 0.00005 and a smoothing factor ρ = 0.85.

Model structure

Each neuron’s response was embedded into a higher-dimensional space using a learned weight vector as follows:

$$R_1 = R_0 \odot W_{emb}$$

where R0 (n×1) represented the original response vector from n neurons, and Wemb (n×dmodel) was the embedding weight matrix, with each row corresponding to a neuron-specific weight vector. Here we used dmodel = 2. The symbol ⊙ denoted row-wise multiplication, such that the i-th response was multiplied by both elements of its embedding weight vector. The resulting embedding matrix R1 (n×dmodel) contained the high-dimensional representations of the neuronal responses.

The embedding matrix was then passed through a self-attention module. In this module, R1 was first projected into queries (Q), keys (K), and values (V) through three independent learnable weight matrices. The attention map was then computed as:

$$\mathrm{Attention} = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)$$

where dk represented the dimensionality of the key vectors and was used to scale the dot product to control the variance of the attention scores.

The output of self-attention was calculated as:

$$\mathrm{Output} = \mathrm{Attention} \cdot V$$

The output from self-attention was unembedded by projecting each neuron’s high-dimensional representation back to one dimension. A feedforward layer transformed the unembedded vector into a stimulus vector, which was then reshaped into the final 70 × 70 image.
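The forward pass described in this section can be sketched in plain Python as below. This is a simplified illustration, not the trained model: it assumes single-head scaled dot-product attention and omits biases, residual connections, normalization layers, and nonlinearities, none of which are specified in the text. All parameter names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def forward(r0, w_emb, w_q, w_k, w_v, w_unemb, w_out):
    """Map n neuronal responses to a flat stimulus vector.

    r0: n responses; w_emb: n x d_model; w_q, w_k, w_v: d_model x d_model;
    w_unemb: d_model weights; w_out: n x n_pixels.
    """
    # Embedding: scale each neuron's weight vector by its response.
    r1 = [[r * w for w in w_row] for r, w_row in zip(r0, w_emb)]
    q, k, v = matmul(r1, w_q), matmul(r1, w_k), matmul(r1, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Attention map (n x n): row-wise softmax of the scaled dot products.
    attn = [softmax([s / math.sqrt(d_k) for s in row])
            for row in matmul(q, k_t)]
    out = matmul(attn, v)  # n x d_model
    # Unembedding: project each neuron's representation back to a scalar.
    flat = [sum(x * w for x, w in zip(row, w_unemb)) for row in out]
    # Feedforward layer: n values to n_pixels values (4900 for a 70 x 70 image).
    return [sum(f * w for f, w in zip(flat, col)) for col in zip(*w_out)]
```

The returned list would be reshaped row by row into the image. Because every neuron attends to every other neuron, the attention map has n × n entries, so memory grows quadratically with the number of neurons in the FOV.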

Model evaluation

Model evaluation used the original, non-augmented data, which the model had already seen during training (in both the training and validation sets). We used the structural similarity index (SSIM) (Brunet et al., 2012) and a gradient-based orientation extraction procedure to quantify reconstruction performance.

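The SSIM (Brunet et al., 2012) between two images x and y (both 70×70) is defined as:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

where μx and μy are the mean intensities, σx² and σy² are the variances, σxy is the covariance, and c1 and c2 are constants for numerical stability.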
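A single-window version of this index can be computed directly from the image statistics, as in the sketch below. It assumes the statistics are taken over the whole image rather than over local sliding windows, and it uses the conventional constants c1 = (0.01 L)² and c2 = (0.03 L)² with dynamic range L; both choices are assumptions, since the text does not specify them.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Global SSIM between two images given as flat lists of pixel values."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants (conventional defaults, assumed here).
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical images give a value of 1, while a contrast-reversed grating gives a value close to -1 because the covariance term becomes negative.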

To extract the stimulus orientation from each Gabor image, we first computed the horizontal and vertical gradients and assembled them into a matrix of pixel-wise gradient vectors. Principal component analysis (PCA) was then applied to this matrix to identify the primary direction of variation in the gradient field, which reflected the dominant orientation of image features. The final orientation was converted to degrees and constrained to the range [0, 180).
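The gradient-based procedure can be sketched as follows. This is an illustration under stated assumptions: gradients are estimated with central differences on interior pixels, the first principal component of the two-column gradient matrix is obtained in closed form from its 2 × 2 covariance matrix, and the returned angle is the axis of the dominant gradient in image coordinates (rows increasing downward). The stripes of a grating run perpendicular to this axis, and whether the reported orientation refers to the gradient axis or to the stripes is not specified in the text. The function name is hypothetical.

```python
import math

def dominant_gradient_axis(img):
    """Angle in degrees, within [0, 180), of the first principal component
    of the pixel-wise gradient vectors of a 2D image (list of rows)."""
    h, w = len(img), len(img[0])
    gx, gy = [], []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx.append((img[r][c + 1] - img[r][c - 1]) / 2.0)  # horizontal
            gy.append((img[r + 1][c] - img[r - 1][c]) / 2.0)  # vertical
    n = len(gx)
    mx, my = sum(gx) / n, sum(gy) / n
    # Entries of the 2 x 2 covariance matrix of the gradient vectors.
    sxx = sum((a - mx) ** 2 for a in gx) / n
    syy = sum((b - my) ** 2 for b in gy) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(gx, gy)) / n
    # Closed-form angle of the leading eigenvector of a symmetric 2 x 2 matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.degrees(angle) % 180.0
```

For an image that varies only along the horizontal axis, all gradient vectors are horizontal and the function returns an angle at 0°, whereas horizontal stripes yield 90°.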

Code and data availability

The code is available on GitHub: https://github.com/caviaryusi/CFS/blob/main/README.md

The data are available on Hugging Face (Hugging Face, RRID:SCR_020958): https://huggingface.co/datasets/chencaixia/CFS_2p

Acknowledgements

This study was supported by an STI2030-Major Projects grant (2022ZD0204600), Natural Science Foundation of China grants 31230030 and 31730109, and funds from the Peking-Tsinghua Center for Life Sciences, Peking University.

Additional information

Funding

Ministry of Science and Technology of the People's Republic of China (2022ZD0204600)

National Natural Science Foundation of China (31230030)

National Natural Science Foundation of China (31730109)