Continuous flashing suppression of neural responses and population orientation coding in macaque V1

Cai-Xia Chen; Xin Wang; Dan-Qing Jiang; Shi-Ming Tang; Cong Yu

doi:10.7554/eLife.107518.2

Introduction

When a target stimulus is presented to one eye and a flickering Mondrian-like masker to the other eye, the target can be rendered invisible for an extended period (Tsuchiya & Koch, 2005). This paradigm, known as continuous flash suppression (CFS), has been widely used to investigate subconscious visual processing (Yang, Brascamp, Kang, & Blake, 2014; Moors, Hesselmann, Wagemans, & van Ee, 2017; Pournaghdali & Schwartz, 2020). Among the most intriguing findings are the subconscious high-level visual and cognitive functions under the influence of CFS (e.g., Fang & He, 2005; Almeida, Mahon, Nakayama, & Caramazza, 2008; Adams, Gray, Garner, & Graf, 2010; Mudrik, Breska, Lamy, & Deouell, 2011; Sklar et al., 2012; Zabelina et al., 2013; Tettamanti, Conca, Falini, & Perani, 2017). For example, as reported, priming effects are evident when the target and the invisible primer are categorically (Almeida et al., 2008) or semantically (Zabelina et al., 2013) consistent. However, many of these observations have been questioned by more recent studies, with at least some of the high-level effects being attributed to low-level feature processing (Hesselmann & Malach, 2011; Sakuraba, Sakai, Yamanaka, Yokosawa, & Hirayama, 2012; Gray, Adams, Hedger, Newton, & Garner, 2013; Moors, Boelens, van Overwalle, & Wagemans, 2016; Moors et al., 2017; Moors & Hesselmann, 2018; Pournaghdali & Schwartz, 2020; Stuit, Paffen, & Van der Stigchel, 2023).

A critical issue in this debate is the impact of CFS on V1 neuronal activity. CFS has been hypothesized to arise from mechanisms similar to those in binocular rivalry (Tsuchiya & Koch, 2005; Yang et al., 2014; Moors et al., 2017), which likely suppress V1 responses through interocular inhibition. Only the surviving stimulus information would then be relayed to downstream areas for potential subconscious higher-level visual and cognitive processing (Jiang, Costello, & He, 2007; Adams et al., 2010; Almeida, Mahon, & Caramazza, 2010). Importantly, if V1 activity is suppressed to a sufficient degree, the low-level stimulus information carried by the remaining V1 responses may not suffice to sustain high-level processing of more complex stimuli defined by those low-level features.

Two prominent fMRI studies have examined the impact of CFS on V1 activity (Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013). Watanabe et al. (2011) compared monocular CFS masking (stimulus visible) and dichoptic CFS masking (stimulus invisible), and reported that V1 BOLD responses were largely insensitive to stimulus visibility when attention was carefully controlled. However, using a similar experimental design, Yuval-Greenberg and Heeger (2013) observed reduced BOLD responses in V1 under dichoptic masking, suggesting that V1 activity changed with stimulus visibility. They attributed the discrepancy between the two studies mainly to differences in the number of trials, and thus in statistical power (∼250 trials per condition vs. ∼90 trials per condition). Nevertheless, these studies were not designed to quantify the pure effect of CFS on stimulus-evoked V1 responses, as they contrasted monocular and dichoptic masking conditions to equate stimulus input while manipulating perceptual visibility. In contrast, the original psychophysical studies (Tsuchiya & Koch, 2005; Tsuchiya, Koch, Gilroy, & Blake, 2006) demonstrated CFS masking by contrasting the visibility of the target stimulus with and without a dichoptic mask. By the same logic, the pure CFS impact in the above fMRI studies should be measured as the difference in BOLD signals between the binocular masking and stimulus-alone conditions. In other words, the impact of CFS on V1 activity should be larger than what has been reported by Yuval-Greenberg and Heeger (2013).

Neurons in V1 exhibit various degrees of ocular dominance (Hubel & Wiesel, 1962), which influences each neuron’s binocular combination of monocular visual inputs from two eyes (Kato, Bishop, & Orban, 1981; Mitchell, Carlson, Westerberg, Cox, & Maier, 2023; Zhang, Zhao, Jiang, Tang, & Yu, 2024). In the present study, we used a with-or-without-dichoptic-masker design similar to those used in original psychophysical studies, and examined the extent to which V1 neuronal responses were affected by CFS and how neurons preferring the target eye, masker eye, or both eyes were differently impacted. Using a customized two-photon imaging setup for awake macaques (Li, Liu, Jiang, Lee, & Tang, 2017), we sampled large neural populations at cellular resolution and measured ocular dominance for each individual neuron. This approach enabled us to investigate the potentially differential impacts of CFS on the responses of V1 neurons with varying ocular preferences, as well as apply machine learning tools to understand the impacts of CFS on V1 orientation coding at the population level.

Results

We used two-photon calcium imaging to record responses of V1 superficial neurons from two awake, fixating macaques, each with two response fields of view (FOVs, 850 × 850 µm²) (Fig. 1A). During the initial recording, the stimulus was a binocular 0.45-contrast square-wave grating varying at twelve orientations and two spatial frequencies (3 & 6 cpd) (Fig. 1B). A total of 3,564 neurons were identified through image processing, including 3,004 (84.29%) orientation-tuned neurons that were included in the following data analyses.

Two-photon imaging and ocular dominance mapping.
A. Optical windows for imaging of two macaques. Green crosses indicate the regions for viral vector injections, and yellow boxes indicate the FOVs chosen for imaging. B. Stimuli used for OD mapping. A circular-windowed square-wave grating was presented monocularly to each eye, respectively, to probe each neuron’s ODI. C. Ocular dominance functional maps of each FOV at single-neuron resolution showing OD clusters. D. Frequency distributions of individual neurons’ ODIs in each FOV.

The same grating stimulus was then presented monocularly (Fig. 1B) to each eye to characterize individual neurons’ eye preferences. Each neuron’s ocular dominance index (ODI) was calculated as ODI = (R_i – R_c)/(R_i + R_c), where R_i and R_c were the neuron’s peak responses to ipsilateral and contralateral stimulations, respectively. Neurons with an ODI at –1 or +1 would exclusively prefer the contralateral or ipsilateral eye, while neurons with an ODI at 0 would prefer both eyes equally. Consistent with previous findings (Hubel & Wiesel, 1962; Horton & Hocking, 1996; Livingstone, 1996; Zhang et al., 2024), neurons with similar eye preferences clustered together (Fig. 1C), indicating ocular dominance columns. The ODI followed unimodal distributions (Fig. 1D), in which the majority of neurons were binocular, showing comparable preferences for either eye. Only a small portion of neurons were monocular, being more responsive to the ipsilateral or contralateral eye.
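
As a minimal illustration of the ODI definition above, the following Python sketch computes the index from a neuron's peak ipsilateral and contralateral responses. The function name and the example values are hypothetical and are not taken from the study's analysis code; responses are assumed to be non-negative.

```python
def ocular_dominance_index(r_ipsi: float, r_contra: float) -> float:
    """ODI = (R_i - R_c) / (R_i + R_c), from peak monocular responses."""
    total = r_ipsi + r_contra
    if total == 0:
        raise ValueError("ODI is undefined when both peak responses are zero")
    return (r_ipsi - r_contra) / total


# Hypothetical peak responses (arbitrary units), for illustration only.
print(ocular_dominance_index(1.0, 0.0))  # neuron driven only by the ipsilateral eye
print(ocular_dominance_index(0.0, 1.0))  # neuron driven only by the contralateral eye
print(ocular_dominance_index(0.5, 0.5))  # neuron driven equally by both eyes
```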

In the third and final step, the grating stimulus and the flashing noise masker were presented dichoptically to evaluate the impact of CFS on neurons’ orientation responses (Fig. 2A). The results are summarized as population orientation tuning functions under the baseline no-CFS condition and the CFS condition following the procedure in Busse, Wade, and Carandini (2009). Specifically, neurons with similar orientation preferences were binned (bin width = 15°) relative to the target orientation for a total of 12 bins, and the resultant population orientation tuning functions based on the mean responses of these bins (Fig. 2C) were fitted with a Gaussian function. Compared to the baseline population orientation tuning functions, those under the influence of CFS displayed profound reductions in orientation response. The amplitude decreased by 84.18% in Monkey A and 60.78% in Monkey B on the basis of Gaussian fitting, while the slope decreased by 91.31% in Monkey A and 71.50% in Monkey B (Fig. 2C).
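
The binning step described above can be sketched as follows. This is only an illustration under stated assumptions: bin centers are taken at multiples of 15° relative to the target, relative orientation is wrapped to [-90°, 90°), and all function names are hypothetical. The subsequent Gaussian fit is not included.

```python
import math
from collections import defaultdict


def relative_orientation(preferred_deg: float, target_deg: float) -> float:
    """Wrap (preferred - target) into [-90, 90); orientation has a 180-degree period."""
    return (preferred_deg - target_deg + 90.0) % 180.0 - 90.0


def population_tuning(preferred_deg, responses, target_deg, bin_width=15.0):
    """Mean response per relative-orientation bin (assumed centers: -90, -75, ..., 75)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pref, resp in zip(preferred_deg, responses):
        rel = relative_orientation(pref, target_deg)
        # Snap to the nearest bin center; a center of +90 is the same bin as -90.
        center = math.floor(rel / bin_width + 0.5) * bin_width
        if center >= 90.0:
            center -= 180.0
        sums[center] += resp
        counts[center] += 1
    return {c: sums[c] / counts[c] for c in sorted(sums)}
```

With a 15° bin width, the 180° orientation range yields the 12 bins mentioned in the text.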

The impacts of CFS on population orientation tuning in two macaques.
A. Stimuli used in the CFS experiment for one macaque. The grating target was presented to one eye, which was dichoptically masked by a circular flashing masker presented to the other eye. The white dot was the fixation point. B. Exemplar baseline and CFS orientation tuning functions for neurons with different eye preferences. C. Population orientation tuning functions of all neurons without CFS as the baseline and with CFS. Data from two FOVs of each monkey were pooled due to highly consistent results. Solid curves are Gaussian fittings. D. Population orientation tuning functions of sub-groups of neurons with different eye preferences without and with CFS. Solid curves are fitting results using an ocular dominance-dependent gain control model elaborated in the supplementary material (Fig. S1). Error bars represent ±1 SE. E. The impacts of CFS on Fisher information. Fisher information is plotted as a function of relative orientation (to the neuron’s preferred orientation) without and with CFS. Shaded areas denote ±1 SE. F. The ratio of baseline/CFS Fisher information within 15° of neurons’ preferred orientations. Data from two FOVs of each monkey were pooled due to highly consistent results.

Furthermore, neurons were divided into three groups according to their ODIs, and the impacts of CFS on their respective orientation responses were examined: neurons preferring the grating eye (ODI > 0.2 or < -0.2, depending on whether the grating stimulation was ipsilateral or contralateral), binocular neurons (-0.2 <= ODI <= 0.2), and neurons preferring the masker eye (ODI < -0.2 or > 0.2 relative to the grating eye). Compared to the baseline condition, the orientation tuning of neurons preferring the masker eye was completely wiped out by CFS (Fig. 2B, D left), leading to flattened tuning curves with unmeasurable amplitudes or bandwidths. The orientation tuning of binocular neurons was either nearly completely wiped out (Monkey A) or substantially abolished (Monkey B) (Fig. 2B, D middle). There were 85.68% and 68.32% decreases in amplitude, and 92.64% and 77.07% decreases in slope, for Monkeys A and B, respectively. The orientation tuning of neurons preferring the grating eye was the least but still substantially affected (Fig. 2B, D right), with respective 77.78% and 41.75% decreases in amplitude and 85.23% and 57.56% decreases in slope for two monkeys.
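
The three-way grouping rule can be written compactly. The sketch below assumes the ODI sign convention given earlier (positive values indicate ipsilateral preference) and uses hypothetical names and labels.

```python
def eye_preference_group(odi: float, grating_eye: str, threshold: float = 0.2) -> str:
    """Classify a neuron relative to the eye that viewed the grating.

    odi > 0 indicates ipsilateral preference, following
    ODI = (R_i - R_c) / (R_i + R_c). grating_eye is "ipsi" or "contra".
    """
    if grating_eye not in ("ipsi", "contra"):
        raise ValueError("grating_eye must be 'ipsi' or 'contra'")
    if -threshold <= odi <= threshold:
        return "binocular"
    prefers_ipsi = odi > threshold
    if prefers_ipsi == (grating_eye == "ipsi"):
        return "grating_eye"
    return "masker_eye"
```

Because the grating was presented to each eye in turn, the same neuron can fall into the grating-eye group in one condition and the masker-eye group in the other.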

To quantify the loss of V1 population orientation encoding due to CFS, we compared the Fisher information (Averbeck & Lee, 2006) under the baseline and CFS conditions. Here, Fisher information serves as a statistical measure of how much information the neuronal responses provide about the grating orientation. Specifically, it indicates the sensitivity of neural responses to small changes in orientation, in that higher values signify greater precision in encoding orientation information. As illustrated in Fig. 2E, Fisher information was reduced by CFS primarily for orientations deviating by less than 15° from the neurons’ preferred orientations. The average Fisher information for stimuli within this 15° range decreased to 29.1% and 43.4% of the baseline values in the two macaques, respectively (Fig. 2F), indicating the detrimental impact of CFS on the ability of V1 populations to accurately encode and represent orientation information, especially for orientations closely aligned with neuronal preferences.
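
To make the link between tuning slope and Fisher information concrete, the sketch below uses a simplified textbook form: independent neurons, Gaussian tuning curves, and a fixed noise variance, so that the information is the sum of squared tuning slopes divided by the noise variance. These are assumptions for illustration only; the estimator actually used in the study is not specified in this section, and the wrap-around of orientation is ignored here.

```python
import math


def gaussian_tuning_slope(theta, pref, amplitude, sigma):
    """Derivative of f(theta) = amplitude * exp(-(theta - pref)^2 / (2 * sigma^2))."""
    d = theta - pref
    return -amplitude * d / sigma**2 * math.exp(-d**2 / (2 * sigma**2))


def fisher_information(theta, prefs, amplitude, sigma, noise_var):
    """Sum over neurons of slope^2 / noise variance (independent-noise assumption)."""
    return sum(
        gaussian_tuning_slope(theta, p, amplitude, sigma) ** 2 / noise_var
        for p in prefs
    )


# Hypothetical population: preferred orientations every 15 degrees.
prefs = list(range(-90, 90, 15))
fi_full = fisher_information(5.0, prefs, amplitude=1.0, sigma=20.0, noise_var=0.04)
fi_half = fisher_information(5.0, prefs, amplitude=0.5, sigma=20.0, noise_var=0.04)
print(fi_half / fi_full)
```

Under these assumptions, information scales with the square of the tuning slope, so a reduction in tuning amplitude at fixed noise produces a disproportionately larger reduction in Fisher information.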

What is the impact of CFS-induced suppression on V1 orientation decoding? To answer this question, which is crucial for understanding subconscious processing under CFS, we trained linear decoders to classify neighboring stimulus orientations (15° apart) in our experiments, as well as transformer models to reconstruct the stimulus images. Here, orientation classification was analogous to coarse orientation discrimination, and image reconstruction was analogous to orientation recognition, with both providing upper bounds on performance under the assumption of an ideal observer.

For orientation classification, we trained an all-pair multiclass support vector machine (SVM) classifier to discriminate 12 orientations based on trial-by-trial population neural responses from all trials (Allwein, Schapire, & Singer, 2000). Decoders for different FOVs, ipsilateral/contralateral target presentations, and baseline vs. CFS conditions were trained separately. Under the baseline condition, the decoders achieved mean classification accuracies of 89.5 ± 2.0% and 91.5 ± 2.1% across ipsilateral and contralateral eye conditions in Monkeys A and B, respectively, in contrast to a chance level of 8.3% (1 out of 12). Under CFS, decoding accuracy slightly decreased in Monkey A (81.7 ± 1.9%) but remained stable in Monkey B (90.4 ± 2.1%, Fig. 3A). These results suggest that under CFS, there is still sufficient information for coarse orientation discrimination, even for Monkey A whose V1 neuronal responses were substantially suppressed.
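
The all-pairs (one-vs-one) voting scheme can be sketched in a few lines. Note that the binary rule below is a nearest-class-mean comparison used as a simple stand-in, not an SVM, and the toy data and names are hypothetical; the sketch only illustrates how 12 classes give rise to 66 pairwise problems and a chance level of 1/12.

```python
from itertools import combinations


def class_means(X, y):
    """Mean response vector for each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {k: [v / counts[k] for v in sums[k]] for k in sums}


def sq_dist(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def predict_all_pairs(x, means):
    """Run one binary contest per class pair; the class with the most votes wins."""
    votes = {k: 0 for k in means}
    for a, b in combinations(sorted(means), 2):
        # Stand-in binary rule (nearest class mean), not a trained SVM.
        winner = a if sq_dist(x, means[a]) <= sq_dist(x, means[b]) else b
        votes[winner] += 1
    return max(sorted(votes), key=votes.get)


orientations = list(range(0, 180, 15))               # 12 orientation classes
n_pairs = len(list(combinations(orientations, 2)))   # number of pairwise classifiers
chance = 1 / len(orientations)                       # chance accuracy for 12 classes
```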

Decoding consequences of CFS revealed by machine learning.
A. Multiway orientation classification accuracies under CFS vs. baseline conditions obtained using SVM decoders. Each datum represents results from a contralateral or ipsilateral grating condition with a specific FOV averaged across 10-fold cross-validations. Error bars denote 95% confidence intervals. B. A diagram of the transformer model for stimulus image reconstruction. C. Exemplar learning curves of transformer models under baseline and CFS conditions from two FOVs. The vertical dashed line indicates the epoch at which the baseline model reaches 75% of its total loss decrease between the two learning plateaus estimated using a sigmoid fit. D. Illustrations of corresponding reconstructed stimulus images on the basis of learning curves in C. E. Box plots of SSIM scores between the original and reconstructed images with baseline and CFS transformers. Within a FOV, results from contralateral eye and ipsilateral eye conditions are combined.

Next, we trained transformer models to reconstruct the grating images on the basis of corresponding neuronal responses under baseline and CFS conditions. The motivation for this part of the modeling work was the assumption that high-level tasks would be difficult to carry out if the basic stimulus features forming more complex patterns were not intact. Our transformer model contained an architecture that integrated embedding, self-attention, and unembedding modules, as well as a fully connected feedforward layer (Fig. 3B). The model inputs were the responses of all neurons within a FOV to the grating stimulus (ipsilateral and contralateral presentations of the same stimulus were modeled separately), and the model output was the reconstructed grating image. During the training process, the model typically reached two successive learning plateaus, where the validation loss temporarily stagnated (Fig. 3C). Moreover, the validation loss decreased more rapidly when training on the baseline neural response data compared to the CFS data. To compare the differences, we identified the epoch at which the validation loss of the baseline model reached 75% of its total decrease between the two plateaus using a sigmoid fit, and then we retrained both the baseline and CFS models up to this epoch.
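
The 75% criterion has a closed form if the transition between the two plateaus is modeled as a logistic function of epoch. The parameterization below (plateau levels `upper` and `lower`, midpoint `t0`, steepness `k`) is an assumption made for illustration; the exact sigmoid form used in the study is not given in this section.

```python
import math


def sigmoid_loss(epoch, upper, lower, t0, k):
    """Validation loss falling from the upper plateau to the lower plateau."""
    return upper - (upper - lower) / (1.0 + math.exp(-k * (epoch - t0)))


def epoch_at_fraction(t0, k, fraction=0.75):
    """Epoch at which the given fraction of the total loss decrease is reached."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return t0 + math.log(fraction / (1.0 - fraction)) / k


# Hypothetical fitted parameters, for illustration only.
stop_epoch = epoch_at_fraction(t0=100.0, k=0.1)
print(stop_epoch, sigmoid_loss(stop_epoch, upper=1.0, lower=0.2, t0=100.0, k=0.1))
```

For a fraction of 0.75, the stopping epoch lies ln(3)/k epochs after the sigmoid midpoint.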

The retrained baseline models reconstructed the grating stimuli significantly better than the CFS models in Monkey A, but this discrepancy was less pronounced in Monkey B (Fig. 3D), consistent with the neuronal data showing that Monkey A exhibited substantially more CFS suppression than Monkey B in terms of population orientation tuning and Fisher information (Fig. 2). We used a structural similarity index (SSIM) (Brunet, Vrscay, & Wang, 2012) to quantify the reconstruction performances. Across the grating-presenting ipsilateral and contralateral eyes, the baseline models reconstructed the grating with median SSIMs of 0.52 and 0.61 for the two FOVs of Monkey A, and 0.57 and 0.63 for the two FOVs of Monkey B, respectively, while the corresponding SSIMs for the CFS models were 0.16 and 0.19 for Monkey A, and 0.55 and 0.53 for Monkey B (Fig. 3E).
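
For readers unfamiliar with SSIM, the sketch below computes a single-window (global) version of the index over two images given as flat lists of pixel values. This is a simplification: standard SSIM implementations average the index over local windows, and the windowing and constants used in the study are not stated in this section. The constants follow the commonly used defaults, and the function name is hypothetical.

```python
def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM between two equally sized images given as flat lists."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("images must be non-empty and of equal size")
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )
```

An image compared with itself gives a value of 1, whereas a contrast-reversed grating gives a negative value, so lower SSIM scores reflect reconstructions that fail to capture the stimulus structure.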

To estimate the impact of CFS-induced V1 suppression on downstream processing, we also recorded neuronal responses from two V2 FOVs in Monkey A (FOVs 3 & 4). As anticipated, V2 neurons were predominantly binocular, with over 90% of them showing ODIs within the range of -0.2 to 0.2 (Fig. 4A). Similar to V1 results from the same monkey, CFS on average reduced the amplitudes of the population orientation tuning functions by 80.05% and the slopes by 89.44% (Fig. 4B). It also reduced the Fisher information to 33.1% of the baseline value (Fig. 4C). Furthermore, we applied the same orientation classification and image reconstruction procedures to the V2 data. For orientation classification, the SVM decoders achieved near-perfect performance in classifying 12 orientations under both baseline and CFS conditions, with classification accuracies exceeding 94% across all cases (Fig. 4D). In the image reconstruction task, the baseline model outperformed the CFS model. Specifically, the baseline transformer models reconstructed the stimulus images with median SSIM values of 0.61 and 0.53 for the two V2 FOVs, respectively, which dropped to 0.42 and 0.18 in the CFS models (Fig. 4E), implying poorer or failed reconstruction of stimulus images.

Effects of CFS on V2 orientation responses.
A. OD maps of the two V2 FOVs of Monkey A (MA3 & MA4). B. Population orientation tuning functions for all orientation-tuned neurons with baseline and CFS conditions. Solid lines represent the results of Gaussian fittings. Error bars represent ±1 SE. C. Fisher information as a function of the relative orientation (to the neuron’s preferred orientation) with baseline and CFS conditions. Shaded areas denote ±1 SE. Fisher information was lower in MA4 due to higher variations in the data. D. Multiway orientation classification accuracies under CFS vs. baseline conditions using SVM decoders. Each datum represents results from a contralateral or ipsilateral grating condition with one FOV, averaged across 5-fold cross-validations. Error bars denote 95% confidence intervals. E. Box plots of SSIM scores between the original and reconstructed images with baseline and CFS transformers. Within a FOV, results from contralateral eye and ipsilateral eye conditions are combined.

Discussion

Our study demonstrates that CFS severely compromises orientation information in V1 neurons in an ocular dominance-dependent manner. Orientation information carried by neurons preferring the masker eye or both eyes is completely or nearly completely wiped out, while information carried by those preferring the grating eye is partially retained. Downstream, orientation information in V2 neurons is also substantially weakened. Linear decoding and transformer models suggest that CFS-compromised orientation information may still allow coarse orientation discrimination, but will most likely impair orientation recognition when the suppression is sufficiently strong, as in Monkey A. Similarly strong suppression would also be possible in Monkey B if the current grating contrast (0.45) were lowered to 0.1-0.3, as in many CFS experiments (Watanabe et al., 2011; Yuval-Greenberg & Heeger, 2013; Lunghi & Pooresmaeili, 2023; Alais, Coorey, Blake, & Davidson, 2024).

CFS-compromised V1 orientation information is transmitted downstream for further visual processing, which may explain the unconscious orientation processing observed in human CFS studies. The “invisible” orientation information can be processed, as demonstrated by adaptation (Kanai, Tsuchiya, & Verstraten, 2006; Bahrami, Carmel, Walsh, Rees, & Lavie, 2008) and priming (Koivisto & Grassini, 2018) studies. The adaptation aftereffect is reduced compared to the visible condition but not entirely abolished (Kanai et al., 2006; Bahrami et al., 2008), likely a result of the degraded orientation information surviving CFS. For the same reason, the priming effect also decreases during trials in which the stimulus is rendered invisible by CFS, compared to those in which the stimulus is visible or partially visible (Koivisto & Grassini, 2018), as the degraded stimulus information provides insufficient evidence for decision-making, resulting in a diminished priming effect (Dehaene, 2011; Gomez, Perea, & Ratcliff, 2013).

Furthermore, our linear decoding and transformer results can help elucidate the debate on whether visual processing still functions at the categorization level under the influence of CFS. Previous studies have provided evidence for the preserved category information of the target, as demonstrated by tool-specific priming effects (Almeida et al., 2008; Almeida et al., 2010) and differential BOLD response patterns between tools and other object categories under CFS (Hesselmann, Hebart, & Malach, 2011; Tettamanti et al., 2017). However, an intriguing question is whether these results instead reflect low-level feature differences between tools and other object categories. It has been reported that elongated objects, irrespective of their categorical affiliation, elicit similar priming effects (Sakuraba et al., 2012). Consistent with this, when tools are categorized by their shape (elongated vs. non-elongated), only the neural response patterns elicited by elongated tools can be discriminated from other object categories under CFS (Fogelson, Kohler, Miller, Granger, & Tse, 2014; Ludwig, Kathmann, Sterzer, & Hesselmann, 2015). In line with this interpretation, Hesselmann, Darcy, Rothkirch, and Sterzer (2018) reported that tool-specific priming under CFS does not reliably emerge under conditions designed to produce strong interocular suppression, suggesting that previously observed category effects may reflect access to low-level shape features rather than preserved category representations. Moreover, a recent study measuring the contrast thresholds required to both break from and suppress CFS found that stimuli exhibited similar suppression strengths across various categories (Alais et al., 2024). According to our results, when suppression is too strong to allow for stimulus reconstruction, as in the case of Monkey A (Fig. 3C), the orientation information under CFS may not accumulate to a level sufficient for resolving semantic category boundaries. Resolving such boundaries might require largely intact stimulus information, even if it is processed subconsciously. However, the degraded orientation information could still assist in category discrimination when categorical differences lie in certain low-level shape dimensions like orientation, as coarse orientation discrimination appears largely unaffected by CFS (Fig. 3A).

A related issue is the dorsal-ventral CFS hypothesis, which proposes that CFS suppression may disproportionately affect ventral visual processing while relatively preserving dorsal pathways involved in visuomotor functions, potentially allowing category- or action-related information to remain accessible under suppression (Fang & He, 2005). However, subsequent fMRI studies have failed to provide consistent support for this dissociation, reporting either stream-invariant awareness effects (Hesselmann & Malach, 2011; Ludwig et al., 2015; Tettamanti et al., 2017), residual signal in ventral rather than dorsal regions (Hesselmann et al., 2011; Fogelson et al., 2014), or residual low-level feature information/partial visibility rather than preserved dorsal processing (Ludwig et al., 2015). Although our study does not directly test dorsal-ventral dissociations, our V1 results provide a constraint on what information downstream visual pathways could access under suppression. When CFS-induced interocular suppression was strong enough that stimulus reconstruction was markedly degraded, as in the case of Monkey A, the information required for category-level or action-related processing may not be sufficient for high-level cortical representation.

Interocular suppression under CFS is known to vary substantially across individuals (Yamashiro et al., 2013; Gayet & Stein, 2017; Blake, Goodman, Tomarken, & Kim, 2019). This inter-individual variability may contribute to the heterogeneity observed in the CFS literature. We also found that the strength of V1 response suppression during CFS differed between the two monkeys, as reflected by population orientation tuning functions (Fig. 2C), Fisher information (Fig. 2F), and reconstruction performance by the transformer (Fig. 3E). Several experimental factors may have contributed to the relatively weaker suppression observed in Monkey B. Because the monkeys viewed the stimuli passively, we could not determine the dominant eye for each monkey (instead we switched the eyes and averaged the results), and the target was presented at relatively high contrast. Both factors are known to reduce the effectiveness of CFS suppression (Yang, Blake, & McDonald, 2010; Yuval-Greenberg & Heeger, 2013). In addition, the random-noise masker we used might not be as effective as Mondrian patterns (Hesselmann, Darcy, Ludwig, & Sterzer, 2016). If reduced stimulus contrast and a Mondrian masker were used, we predict that CFS suppression in Monkey B would strengthen, potentially approaching the level observed in Monkey A. Nevertheless, it is worth emphasizing that our main conclusions are primarily based on data from Monkey A, who exhibited much stronger CFS suppression.

Materials and Methods

Monkey preparation

Monkey preparation was identical to procedures reported in previous studies (Ju, Guan, Tao, Tang, & Yu, 2020; Guan, Ju, Tao, Tang, & Yu, 2021; Zhang et al., 2024). Two rhesus monkeys (Macaca mulatta, aged 5 and 6 years, respectively) underwent two sequential surgeries under general anesthesia and strictly sterile conditions. During the first surgery, a 20-mm diameter craniotomy was performed on the skull over V1. The dura was opened and multiple tracks of 100-150 nl AAV1.hSynap.GCaMP5G.WPRE.SV40 (AV-1-PV2478, titer 2.37e13 (GC/ml), Penn Vector Core) were pressure-injected at a depth of ∼350 µm at multiple locations. The dura was then sutured, the skull cap was re-attached with three titanium lugs and six screws, and the scalp was sutured. After the surgery, the animal was returned to the cage and treated with injectable antibiotics (Ceftriaxone sodium, Youcare Pharmaceutical Group, China) for one week. Postoperative analgesia was also administered. The second surgery was performed 45 days later. A T-shaped steel frame was installed for head stabilization, and an optical window was inserted onto the cortical surface. Data collection could start as early as one week later. More details about the preparation and surgical procedures can be found in Li et al. (2017). The procedures were approved by the Institutional Animal Care and Use Committee, Peking University.

Behavioral task

After a ten-day recovery period following the second surgery, monkeys were placed in a primate chair with head restraint. They were trained to hold fixation on a small white spot (0.2°) with eye positions monitored by an Eyelink-1000 eye tracker (SR Research) at a 1000-Hz sampling rate. During the experiment, trials in which the eye position deviated 1.5° or more from the fixation point before stimulus offset were treated as containing saccades, discarded, and repeated.

Visual stimuli and experimental procedures

Visual stimuli were generated with MATLAB-based Psychtoolbox-3 software (Pelli & Zhang, 1991) and presented on a ROG Swift PG278QR monitor (refresh rate = 120 Hz, resolution = 2560 × 1440 pixels, pixel size = 0.23 mm × 0.23 mm). The screen luminance was linearized by an 8-bit look-up table, and the mean luminance was 47 cd/m². The viewing distance was 60 cm.
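
The reported pixel size and viewing distance determine how stimulus sizes in degrees of visual angle map onto screen pixels. The short sketch below shows the conversion; the function name is hypothetical and the calculation assumes a flat screen viewed at its center.

```python
import math


def pixels_per_degree(viewing_distance_mm: float, pixel_size_mm: float) -> float:
    """Number of screen pixels spanned by 1 degree of visual angle at fixation."""
    mm_per_degree = 2.0 * viewing_distance_mm * math.tan(math.radians(0.5))
    return mm_per_degree / pixel_size_mm


# Values reported above: 60 cm viewing distance, 0.23 mm pixels.
ppd = pixels_per_degree(600.0, 0.23)
print(round(ppd, 1))
```

With these values, one degree of visual angle spans roughly 45.5 pixels.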

A drifting square-wave grating (spatial frequency = 4 cpd, contrast = full, speed = 3 cycles/sec, starting phase = 0°, size = 0.4° in diameter) was first used to determine the population receptive field (pRF) location, shape, and approximate size associated with a specific FOV. The same stimulus was also monocularly presented to confirm the V1 location as ocular dominance columns would appear. This fast process used a 4 × objective lens mounted on the two-photon microscope and did not provide cell-specific information. The recorded V1 pRFs were centered at ∼0.90° eccentricity in Monkey A and ∼1.93° in Monkey B. V2 pRFs were centered at ∼0.67° in Monkey A. All pRFs were approximately circular with a diameter of 0.9°.

The target stimulus used in the experiments was a 0.45-contrast circular-windowed square-wave grating. It drifted at 4 cycles per second in opposite directions perpendicular to the orientation with a starting phase of 0°, and varied at 12 orientations (0° to 165° in 15° increments) and two spatial frequencies (3 & 6 cpd) trial by trial. The circular envelope had a diameter of 1°, which approximated the size of pRFs for recorded FOVs, with the edge blurred by a linear ramp starting at a radius of 0.38°. The flashing masker was a circular white noise pattern with a diameter of 1.89°, a contrast of 0.5, and a flickering rate of 10 Hz. The white noise consisted of randomly generated black and white blocks (0.07° × 0.07° each). The target grating and the flashing masker were presented through a pair of NVIDIA 3D Vision 2 active shutter glasses. To mitigate the ghost image, a low contrast (RMS contrast = 0.08) white noise was added to the grating. The width of the noise element was half of the bar width of the square grating, and the white noise was regenerated every frame.

Each block of trials consisted of four groups of stimuli: binocular, monocular, CFS, and flashing masker-only. In the binocular group, the grating was presented to both eyes simultaneously. The relevant data were only used to help identify ROIs and orientation-tuned neurons along with data from other stimulus conditions. In the monocular group, the grating was monocularly presented to the contralateral or ipsilateral eye, which served as the baseline conditions without the influences of CFS. In the CFS group, the grating and flashing masker were presented dichoptically. In the flashing masker-only group, the flashing masker was presented monocularly to either eye. Each stimulus condition was repeated for 10-12 trials. For conditions involving the grating, the trials were split between the two opposite drifting directions. Each block of trials contained 242 trials, two trials for each stimulus condition, with the order of stimulus conditions arranged in a pseudorandom manner. There were 5 to 6 blocks of trials with each FOV.

Each stimulus was presented for 1000 ms, followed by an inter-stimulus interval of 1500 ms, allowing sufficient time for the calcium signals to return to the baseline level (Guan, Zhang, Zhang, Tang, & Yu, 2020). For each FOV, the recording was completed in a single session with 5-6 experiment blocks and lasted for 2-3 hours.

Two-photon imaging

Two-photon imaging was performed using a FEMTOSmart two-photon microscope (Femtonics), along with a Ti:sapphire laser (Mai Tai eHP, Spectra Physics). GCaMP5 was chosen as the indicator of calcium signals because its fluorescence signals are linearly proportional to neuronal spike activities within a wide range of firing rates from 10-150 Hz (Li et al., 2017). During imaging, a 16× objective lens (0.8 N.A., Nikon) with a resolution of 1.6 µm/pixel was used, along with a 1000 nm femtosecond laser. A fast resonant scanning mode (32 fps) was chosen to obtain continuous images of neuronal activity (8 frames per second after averaging every 4 frames). The strength of fluorescent signals (mean luminance of a small area) was monitored and adjusted if necessary for the drift of fluorescent signals. Two response fields of view (FOVs) measuring 850 × 850 µm² in V1 were selected in both macaques, and two FOVs of the same size in V2 were selected in Monkey A.

Imaging data analysis: Initial screening of ROIs

Data were analyzed with custom MATLAB code. A normalized cross-correlation-based translation algorithm was used to reduce motion artifacts (Li et al., 2017). The fluorescence changes were then associated with the corresponding visual stimuli through the time sequence information recorded by a Neural Signal Processor (Cerebus system, Blackrock Microsystems). The differential image (ΔF = F - F0) was obtained by subtracting the mean of the 4 frames before stimulus onset (F0) from the average of the 6th-9th frames after stimulus onset (F), across 5 or 6 repeated trials for the same stimulus condition (same orientation, spatial frequency, size, and drifting direction).

For a specific FOV, the regions of interest (ROIs) or possible cell bodies were decided through sequential analysis of 242 differential images in the order of CFS, monocular, binocular, and flashing masker-only conditions. CFS conditions consisted of 96 (2×2×12×2 = 96) differential images, with the grating presented to either eye (2), at two spatial frequencies (2), twelve orientations (12), and two motion directions (2). Monocular conditions were identical to the CFS conditions except that the flashing masker was absent. In the binocular conditions, gratings at two spatial frequencies (2), twelve orientations (12), and two motion directions (2) were binocularly presented, resulting in 48 differential images. The flashing masker-only conditions consisted of the flashing masker presented to either eye, resulting in 2 differential images.
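
The condition counts above can be checked by enumerating the stimulus set directly. The following is a minimal sketch; the label strings, the 15° orientation spacing, and all variable names are illustrative assumptions rather than the authors' code.

```python
from itertools import product

# Hypothetical labels; only the number of levels per factor comes from the text.
eyes = ["contralateral", "ipsilateral"]       # eye receiving the grating or masker
spatial_freqs = ["SF1", "SF2"]                # placeholders for the two spatial frequencies
orientations = [i * 15 for i in range(12)]    # twelve orientations (15 deg spacing assumed)
directions = [+1, -1]                         # two opposite drifting directions

conditions = {
    "CFS": list(product(eyes, spatial_freqs, orientations, directions)),
    "monocular": list(product(eyes, spatial_freqs, orientations, directions)),
    "binocular": list(product(spatial_freqs, orientations, directions)),
    "masker_only": list(eyes),
}

counts = {name: len(items) for name, items in conditions.items()}
total = sum(counts.values())
print(counts, total)
```

The four groups sum to the 242 differential images analyzed per FOV.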

The first differential image was filtered with a band-pass Gaussian filter (size = 2–10 pixels), and connected subsets of pixels (>25 pixels, which would exclude smaller vertical neuropils) with average pixel value >3 standard deviations of the mean brightness were selected as ROIs. Then the areas of these ROIs were set to mean brightness in the next differential image before the bandpass filtering and thresholding were performed. This measure gradually reduced the standard deviations of differential images and facilitated the detection of neurons with relatively low fluorescence responses. If a new ROI and an existing ROI from the previous differential image overlapped, the new ROI would be on its own if the overlapping area OA < 1/4 ROI_new, discarded if 1/4 ROI_new < OA < 3/4 ROI_new, and merged with the existing ROI if OA > 3/4 ROI_new. The merges would help smoothen the contours of the final ROIs. This process went on through all differential images twice to select ROIs. Finally, the roundness for each ROI was calculated as:

$$\mathrm{Roundness} = \frac{\sqrt{4\pi A}}{P}$$

where A was the ROI’s area, and P was the perimeter. Only ROIs with roundness larger than 0.9, which would exclude horizontal neuropils, were selected for further analysis.
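
The overlap rule and the roundness criterion can be sketched as two small functions. This is an illustration under stated assumptions: the function names and return labels are hypothetical, overlaps of exactly 1/4 or 3/4 (which the text leaves unspecified) are treated as discards, and roundness is assumed to be sqrt(4πA)/P, which equals 1 for a perfect circle.

```python
import math

def overlap_decision(overlap_area, new_roi_area):
    """Classify a newly detected ROI by how much of it overlaps an existing ROI."""
    fraction = overlap_area / new_roi_area
    if fraction < 0.25:
        return "keep_separate"   # small overlap: the new ROI stands on its own
    if fraction > 0.75:
        return "merge"           # large overlap: merge with the existing ROI
    return "discard"             # intermediate overlap (boundary values assumed here)

def roundness(area, perimeter):
    """Assumed definition: sqrt(4*pi*A) / P, which is 1 for a circle."""
    return math.sqrt(4 * math.pi * area) / perimeter

# A circle of radius 5 pixels passes the 0.9 criterion; a unit square does not.
circle = roundness(math.pi * 5 ** 2, 2 * math.pi * 5)
square = roundness(1.0, 4.0)
print(circle, square)
```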

Imaging data analysis: Orientation tuning and ocular dominance

The ratio of fluorescence change (ΔF/F0) was calculated as a neuron’s response to a specific stimulus condition. For a given neuron and stimulus condition, F0n of the n-th trial was the average of the 4 frames before stimulus onset (-500 to 0 ms), and Fn was the average of the 5th-8th, 6th-9th, or 7th-10th frames after stimulus onset, whichever was the greatest. F0n was then averaged across the 10 or 12 repeated trials to obtain the baseline F0 for all trials (to reduce noise in the calculation of responses), and ΔFn/F0 = (Fn - F0)/F0 was taken as the neuron’s response to this stimulus in the n-th trial.
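
The response calculation can be illustrated with a short sketch. It assumes each trial is stored as a list of frame values with the 4 pre-stimulus frames first, and that the best of the three response windows is chosen separately for each trial; the function and variable names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def trial_responses(trials, n_pre=4):
    """Return dF/F0 for each trial of one neuron under one stimulus condition.

    trials: list of fluorescence traces; the first n_pre frames precede onset.
    """
    # Baseline F0: pre-stimulus mean of each trial, averaged across all trials
    f0 = mean([mean(trace[:n_pre]) for trace in trials])
    responses = []
    for trace in trials:
        post = trace[n_pre:]  # post[0] is the 1st frame after stimulus onset
        # Candidate windows: frames 5-8, 6-9, and 7-10 after onset (1-based)
        windows = [post[start:start + 4] for start in (4, 5, 6)]
        fn = max(mean(w) for w in windows)
        responses.append((fn - f0) / f0)
    return responses

# Toy trace: baseline of 100, response of 150 in frames 5-8 after onset
trace = [100] * 4 + [100, 100, 100, 100, 150, 150, 150, 150, 100, 100]
demo = trial_responses([trace, trace])
print(demo)
```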

Several steps were taken to determine whether a neuron was orientation-selective. For each monocular or binocular condition, the orientation and SF eliciting the maximal response were designated as the neuron’s preferred orientation and SF. We then compared responses across all 12 orientations at the preferred SF by performing a non-parametric Friedman test to determine whether the neuron’s responses at various orientations were significantly different from each other. To reduce Type I errors, the significance level was set at α = 0.01. Neurons that passed the Friedman test under at least one viewing condition were selected as orientation-tuned neurons.

The ocular dominance index (ODI) was calculated to characterize each neuron’s eye preference: ODI = (R_i - R_c)/(R_i + R_c), where R_i and R_c were the neuron’s peak responses at the best orientation and SF to ipsilateral and contralateral monocular grating conditions, respectively. Neurons with an ODI of -1 or 1 would be completely contralateral or ipsilateral eye dominant, respectively, and neurons with an ODI of 0 would be driven equally by both eyes.

Population orientation tuning

For each neuron, neural responses at the preferred SF were selected for tuning analysis. To derive population orientation tuning curves under a specific condition, we categorized neurons into twelve orientation preference bins according to their preferred orientations (bin width = 15°). For each orientation presented, the responses of all orientation preference bins were reorganized according to the relative orientation preference. Subsequently, neuronal responses of the same relative orientation preference were averaged to generate the final population orientation tuning function. For CFS conditions, the selected SF and binning procedures were the same as their corresponding monocular conditions.
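
The realignment step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes orientations are indexed 0-11 in 15° steps, that each neuron's preferred-orientation bin is already known, and that averaging is done directly over neurons sharing the same relative orientation. All names are hypothetical.

```python
def population_tuning(responses, preferred, n_orient=12):
    """Average responses as a function of stimulus orientation relative to preference.

    responses[i][j]: response of neuron i to stimulus orientation index j.
    preferred[i]: index (0..n_orient-1) of neuron i's preferred-orientation bin.
    """
    sums = [0.0] * n_orient
    counts = [0] * n_orient
    for resp, pref in zip(responses, preferred):
        for stim in range(n_orient):
            rel = (stim - pref) % n_orient   # relative orientation, wrapped circularly
            sums[rel] += resp[stim]
            counts[rel] += 1
    curve = [s / c for s, c in zip(sums, counts)]
    # Rotate so that relative orientation 0 sits in the middle of the curve
    half = n_orient // 2
    centered = curve[half:] + curve[:half]
    rel_deg = [(k - half) * 180 / n_orient for k in range(n_orient)]
    return rel_deg, centered

# Two toy neurons that respond only at their own preferred orientation
resp_a = [1.0 if j == 0 else 0.0 for j in range(12)]
resp_b = [1.0 if j == 3 else 0.0 for j in range(12)]
rel_deg, curve = population_tuning([resp_a, resp_b], [0, 3])
print(list(zip(rel_deg, curve)))
```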

The population orientation tuning function was fitted with a Gaussian model with MATLAB’s nonlinear least-squares function ‘lsqnonlin’:

$$R(\theta) = a\,e^{-\frac{(\theta - \theta_0)^2}{2\sigma^2}} + b$$

where R(θ) was the response at orientation θ, free parameters a, θ₀, σ, and b were the amplitude, peak orientation, standard deviation of the Gaussian function (equal to half width at half height), and minimal response, respectively.

The population orientation tuning curves for different eye preference groups were derived using the same procedure, with additional binning of neurons according to their ODI. To obtain the tuning curve of the neurons preferring the eye seeing the grating, responses of neurons with an ODI < -0.2 (preferentially responding to the contralateral eye) under contralateral eye grating presentation and those with an ODI > 0.2 (preferentially responding to the ipsilateral eye) under ipsilateral eye grating presentation were combined. Similarly, for neurons preferring the eye seeing the masker, responses of neurons with an ODI < -0.2 under ipsilateral eye grating presentation and those with an ODI > 0.2 under contralateral eye grating presentation were combined. For binocular neurons (-0.2 < ODI < 0.2), responses under both grating presentation conditions were combined.
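
The ODI calculation and the eye-preference grouping can be summarized in a short sketch. The function names and group labels are hypothetical, and neurons with an ODI of exactly ±0.2, which the text does not assign, are placed in the monocular groups here as an assumption.

```python
def odi(r_ipsi, r_contra):
    """Ocular dominance index from peak monocular responses."""
    return (r_ipsi - r_contra) / (r_ipsi + r_contra)

def eye_group(odi_value, grating_eye, threshold=0.2):
    """Return the pooled tuning curve a neuron contributes to on a given trial type.

    grating_eye: "contralateral" or "ipsilateral" (the masker is in the other eye).
    """
    if -threshold < odi_value < threshold:
        return "binocular"
    preferred_eye = "contralateral" if odi_value < 0 else "ipsilateral"
    if preferred_eye == grating_eye:
        return "prefers_grating_eye"
    return "prefers_masker_eye"

example = odi(1.0, 3.0)  # stronger contralateral response gives a negative ODI
print(example, eye_group(example, "contralateral"), eye_group(example, "ipsilateral"))
```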

Fisher information

The Fisher information assesses the amount of information contained in a neuron population using an optimal decoder (Pouget, Deneve, Ducom, & Latham, 1999). Assuming independent Gaussian noise distributions, the Fisher information for a population of N neurons was given as

$$I_F(\theta) = \sum_{i=1}^{N} \frac{f_i'(\theta)^2}{\sigma_i^2(\theta)}$$

where 𝑓_i(𝜃) was the mean activity of neuron i in response to the presentation angle, θ, and 𝑓_i′(𝜃) was its derivative with respect to θ. We fitted each neuron’s response tuning 𝑓_i(𝜃) and variance tuning 𝜎_i(𝜃) with Gaussian functions and calculated the averaged Fisher information across neurons at each orientation.
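
A minimal numerical sketch of this calculation is given below. It assumes a standard Gaussian form a·exp(-(θ-θ₀)²/(2σ²)) + b for both the response tuning and the variance tuning, uses the analytic derivative of that form, and ignores the circularity of orientation; the function names and parameter layout are hypothetical.

```python
import math

def gaussian(theta, a, theta0, sigma, b):
    return a * math.exp(-(theta - theta0) ** 2 / (2 * sigma ** 2)) + b

def gaussian_derivative(theta, a, theta0, sigma):
    # Derivative with respect to theta; the constant offset b drops out
    return -a * (theta - theta0) / sigma ** 2 * math.exp(-(theta - theta0) ** 2 / (2 * sigma ** 2))

def fisher_information(theta, tuning_params, variance_params, average=False):
    """Sum (or mean) over neurons of f_i'(theta)^2 / variance_i(theta)."""
    total = 0.0
    for (a, theta0, sigma, _b), var_p in zip(tuning_params, variance_params):
        slope = gaussian_derivative(theta, a, theta0, sigma)
        variance = gaussian(theta, *var_p)  # fitted variance tuning at theta
        total += slope ** 2 / variance
    return total / len(tuning_params) if average else total

# One toy neuron: peak at 90 deg, sigma of 20 deg, constant unit variance
tuning = [(1.0, 90.0, 20.0, 0.0)]
variance = [(0.0, 90.0, 20.0, 1.0)]
fi_peak = fisher_information(90.0, tuning, variance)
fi_flank = fisher_information(110.0, tuning, variance)
print(fi_peak, fi_flank)
```

In this toy case the information is zero at the tuning peak, where the slope vanishes, and larger on the flank.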

SVM-based orientation classification

In the orientation classification task (Fig. 3A), we trained a support vector machine (SVM) with a one-vs.-one coding scheme to classify orientations from standardized population neural activity. The SVM decoder was implemented using MATLAB’s ‘fitcsvm’ function with a linear kernel. To prevent overfitting and evaluate the generalization ability of the model, we employed a 5-fold cross-validation procedure, and the model performance on the validation dataset was reported.

Decoders were trained independently for each experiment condition, resulting in 4 models per FOV (contralateral/ipsilateral × baseline/CFS). Neural response data from two spatial frequencies were used as input, with each neuron treated as a feature. In this way, each model was trained and tested on 288 or 240 samples (2 SFs × 12 orientations × 12/10 repeats).

The transformer model

Model input, output, and training procedure

We implemented a transformer-based model to reconstruct grating stimuli from population neuronal responses recorded under different experiment conditions. The model input was a vector of neuronal responses, each corresponding to an individual neuron, and the output was the reconstructed grating image of size 70 × 70 pixels.

The transformer was trained independently for each experimental condition, resulting in four models per FOV (contralateral/ipsilateral × baseline/CFS). Pilot experiments revealed that our original dataset was insufficient for the model to converge. To address this, we augmented the dataset to four times its original size before training. Augmentation was performed by sampling from a normal distribution centered at each neuron’s response mean, with a standard deviation equal to its original standard deviation. Within the augmented dataset, 6% was reserved for validation. Responses were normalized to [0, 1] before being fed into the model.
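
The augmentation, normalization, and validation split can be sketched as below. Several details are assumptions not stated in the text: the original trials are kept and synthetic trials are added until the set is four times larger, means and standard deviations are computed per neuron within one stimulus condition, normalization is a global min-max scaling, and the validation trials are drawn at random. All function names are hypothetical.

```python
import random
import statistics

def augment_condition(trials, factor=4, rng=None):
    """trials: list of response vectors (one value per neuron) for one condition."""
    rng = rng or random.Random(0)
    n_neurons = len(trials[0])
    means = [statistics.mean([t[i] for t in trials]) for i in range(n_neurons)]
    sds = [statistics.stdev([t[i] for t in trials]) for i in range(n_neurons)]
    # Draw synthetic trials neuron by neuron from N(mean, sd)
    synthetic = [[rng.gauss(m, s) for m, s in zip(means, sds)]
                 for _ in range((factor - 1) * len(trials))]
    return list(trials) + synthetic

def min_max_normalize(rows):
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in rows]

def split_validation(rows, val_fraction=0.06, rng=None):
    rng = rng or random.Random(0)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_val = round(val_fraction * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]

original = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # 3 toy trials, 2 neurons
augmented = augment_condition(original)
normalized = min_max_normalize(augmented)
train, val = split_validation(normalized)
print(len(augmented), len(train), len(val))
```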

We implemented a two-phase training procedure to assess the reconstruction ability of models trained on different neural data. During the training process, the model typically reached two learning plateaus, where the validation loss temporarily stagnated (Fig. 3C). In the first training session, we analyzed the learning curve to determine the epoch at which the baseline model’s validation loss had completed 75% of its total decrease between the two plateaus. This was estimated using a modified sigmoid fit:

where A and C defined the function range, b was the symmetry point, k was the steepness parameter, and t represented the epoch number, counted from epoch 500 (initial epochs were discarded due to a drastic drop in validation loss across all training runs, see Fig. 3C). The 75% decrease point was computed as:

In the second training session, we retrained both models up to the identified epoch and evaluated their performances.
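
The 75% criterion can be illustrated with one possible sigmoid parameterization. The form used here is an assumption made for illustration and may differ from the authors' modified sigmoid: the loss falls from an early plateau A to a late plateau C, with symmetry point b and steepness k > 0. The curve fitting itself is not shown.

```python
import math

def sigmoid_loss(t, A, C, b, k):
    """Assumed form: L(t) = C + (A - C) / (1 + exp(k * (t - b)))."""
    return C + (A - C) / (1 + math.exp(k * (t - b)))

def epoch_at_fraction(b, k, fraction=0.75):
    """Epoch at which the given fraction of the A-to-C decrease is completed.

    Solving (A - L(t)) / (A - C) = fraction gives t = b + ln(fraction / (1 - fraction)) / k.
    """
    return b + math.log(fraction / (1 - fraction)) / k

# Hypothetical fitted parameters, for illustration only
A, C, b, k = 1.0, 0.2, 1000.0, 0.01
t75 = epoch_at_fraction(b, k)
loss_at_t75 = sigmoid_loss(t75, A, C, b, k)
print(t75, loss_at_t75)
```

Under this parameterization the 75% point lies ln(3)/k epochs after the symmetry point, independent of A and C.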

The model was trained to minimize the mean squared error (MSE) between the reconstructed and actual stimuli. Optimization was performed using RMSprop with a learning rate of 0.00005 and a smoothing factor ρ = 0.85.
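
For reference, the loss and a single RMSprop update with the stated hyperparameters can be written out as follows. This is a generic textbook form of the update, not the authors' training code; the stabilizing constant eps and its placement are assumptions, and deep learning frameworks differ in these details.

```python
def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def rmsprop_step(params, grads, cache, lr=5e-5, rho=0.85, eps=1e-8):
    """One RMSprop update; cache holds the running average of squared gradients."""
    new_params, new_cache = [], []
    for p, g, v in zip(params, grads, cache):
        v = rho * v + (1 - rho) * g * g              # smoothed squared gradient
        new_cache.append(v)
        new_params.append(p - lr * g / (v ** 0.5 + eps))
    return new_params, new_cache

params, cache = rmsprop_step([1.0], [2.0], [0.0])
print(params, cache, mse([1.0, 2.0], [1.0, 4.0]))
```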

Model structure

Each neuron’s response was embedded into a higher-dimensional space using a learned weight vector as follows:

$$R^{+} = R \odot W$$

where R (n × 1) represented the original response vector from n neurons, and W (n × d_model) was the embedding weight matrix, with each row corresponding to a neuron-specific weight vector. Here we used d_model = 2. The symbol ⊙ denoted row-wise multiplication, such that the i-th response r_i was multiplied by both elements in its embedding weight vector w_i. The resulting embedding matrix R⁺ (n × d_model) contained the high-dimensional representations of the neuronal responses.

The enriched embedding matrix was then passed through a self-attention module. In this module, R⁺ was first projected into queries (Q), keys (K), and values (V) through independent learnable weight matrices, respectively. Then the attention map was computed as:

$$\mathrm{AttentionMap} = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)$$

where 𝑑_k represented the dimensionality of the key vectors, which scaled the dot-product to control the variance of the attention scores.

The output of self-attention was calculated as:

$$\mathrm{Output} = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

The output from self-attention was unembedded by projecting each neuron’s high-dimensional representation back to one dimension. A feedforward layer transformed the unembedded vector into a stimulus vector, which was then reshaped into the final 70 × 70 image.
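
The forward pass described in this section can be sketched in plain Python. This is a toy illustration under several assumptions: standard scaled dot-product attention with a single head, no bias terms, nonlinearities, or normalization layers, random untrained weights, and a 4 × 4 output in place of the 70 × 70 image. All function and variable names are hypothetical.

```python
import math
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def forward(responses, W_embed, Wq, Wk, Wv, w_unembed, W_out):
    # Embedding: scale each neuron's weight vector by its scalar response (n x d_model)
    R_emb = [[r * w for w in w_row] for r, w_row in zip(responses, W_embed)]
    Q, K, V = matmul(R_emb, Wq), matmul(R_emb, Wk), matmul(R_emb, Wv)
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)                                   # n x n
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    out = matmul(attn, V)                                     # n x d_model
    # Unembedding: project each neuron's d_model vector back to a scalar
    unembedded = [sum(x * w for x, w in zip(row, w_unembed)) for row in out]
    # Feedforward layer: map the n values to a flattened image
    image_flat = [sum(x * w for x, w in zip(unembedded, col)) for col in zip(*W_out)]
    return attn, image_flat

rng = random.Random(0)
n, d_model, side = 5, 2, 4

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

responses = [rng.random() for _ in range(n)]
attn, image_flat = forward(
    responses, rand_matrix(n, d_model),
    rand_matrix(d_model, d_model), rand_matrix(d_model, d_model), rand_matrix(d_model, d_model),
    [rng.uniform(-1, 1) for _ in range(d_model)], rand_matrix(n, side * side),
)
image = [image_flat[i * side:(i + 1) * side] for i in range(side)]  # reshape to side x side
print(len(attn), len(image), len(image[0]))
```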

Model evaluation

The original, non-augmented data, which the model had seen during training in both the training and validation sets, were used for evaluation. We used the structural similarity index (SSIM) (Brunet et al., 2012) to quantify reconstruction performance.

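The SSIM (Brunet et al., 2012) between two images x and y (both 70×70) is defined as:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$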

where 𝜇_x and 𝜇_y are the mean intensities, 𝜎_x² and 𝜎_y² are the variances, 𝜎_xy is the covariance, and 𝑐₁ and 𝑐₂ are constants for numerical stability.
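
A minimal sketch of a single-window SSIM computed over whole images follows. It uses the standard SSIM expression in terms of the quantities defined above. The constants c₁ = (0.01·L)² and c₂ = (0.03·L)², with L the pixel value range, are a common convention assumed here; the text does not state the constants used or whether SSIM was computed globally or in local windows.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM between two equally sized images given as flat lists of pixel values."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # assumed constants
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

stripes = [0.0, 1.0, 0.0, 1.0]
inverted = [1.0, 0.0, 1.0, 0.0]
same = ssim_global(stripes, stripes)
opposite = ssim_global(stripes, inverted)
print(same, opposite)
```

Identical images give a value of 1, while a contrast-inverted grating gives a negative value.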

Data availability

The code is available on GitHub: https://github.com/caviaryusi/CFS/blob/main/README.md. The data are available on Hugging Face (Hugging Face, RRID:SCR_020958): https://huggingface.co/datasets/chencaixia/CFS_2p.

Acknowledgements

This study was supported by a Natural Science Foundation of China STI2030-Major Projects grant (2022ZD0204600) to SMT and CY.

Additional information

Funding

MOST | National Natural Science Foundation of China (NSFC) (2022ZD0204600)

Cong Yu

