Introduction

Oscillations are ubiquitous in the cortex (Buzsáki et al., 2013) and can synchronize both within and between cortical areas (Anand et al., 2023; Lowet, Roberts, Peter, Gips, & De Weerd, 2017; Melloni et al., 2007), but whether this contributes to neural information processing remains a matter of debate (Doelling & Florencia Assaneo, 2021; Duecker et al., 2021; Fernandez-Ruiz et al., 2023; Ray & Maunsell, 2015; Roelfsema, 2023). Early suggestions that synchrony in the gamma frequency band (30–80 Hz) plays a central role in visual feature binding (Singer, 1999; Uhlhaas et al., 2008) have been called into question based on observations that the gamma frequency is not consistent across stimulus features (Ray & Maunsell, 2010, 2015; Shirhatti et al., 2022) and that gamma synchrony depends on distances between image elements (Roelfsema, 2023; Roelfsema et al., 2004), making it difficult to achieve synchrony among neural assemblies encoding components of the same object (Dubey & Ray, 2020; Roelfsema, 2023). Alternatively, it has been proposed that such stimulus dependence of gamma synchrony facilitates, rather than hinders, its functional significance for visual processing by allowing contiguous neural assemblies that share a sufficiently similar oscillation frequency to synchronize into meaningful groups, while also blocking synchrony among assemblies with substantial frequency difference or physical separation (Lowet, Roberts, Hadjipapas, Peter, van der Eerden, et al., 2015; Lowet, Roberts, Peter, Gips, & De Weerd, 2017). Here we show computational and empirical support for this view.

Analyzing a visual scene requires integration of features into coherent objects (feature binding), but also segregation of features belonging to distinct objects (feature separation). It remains unclear how this is achieved, but the stimulus dependence of gamma may be critical for a synchrony-based neural grouping mechanism that achieves both feature binding and separation. This idea is rooted in the theory of weakly coupled oscillators (TWCO) which describes the preconditions for synchrony among coupled oscillators (Acebrón et al., 2005; Ermentrout et al., 2019; Kuramoto, 1984; Neu, 1979; Strogatz, 2000). A group of coupled oscillators synchronizes if the discrepancy in their frequencies, referred to as their detuning, is overcome by the strength of their coupling. Thus, synchrony can occur even in the presence of strong detuning if the coupling strength is sufficiently high, whereas if the coupling strength is low, synchrony can only occur if the detuning is also small. This relationship can be graphically depicted in an Arnold tongue (Coombes & Bressloff, 1999; Pikovsky et al., 2001), which shows the regions where synchrony occurs based on the balance between detuning and coupling strength (Figure 1). These abstract principles are concretely realized in early visual cortex. Neural assemblies exhibit gamma oscillations in their population activity at frequencies that are directly related to stimulus features such as spatial frequency and orientation (Dubey & Ray, 2020; Henrie & Shapley, 2005; Shapira et al., 2017), and particularly contrast (Hadjipapas et al., 2015; Lowet, Roberts, Hadjipapas, Peter, van der Eerden, et al., 2015; Roberts, Lowet, Brunet, TerWal, Tiesinga, Fries, & DeWeerd, 2013).
In early visual cortical areas, coupling strength between neural assemblies is directly related to the efficacy of lateral anatomical connectivity, which declines with cortical distance (Boucsein et al., 2011; Gilbert & Wiesel, 1983; Lowet, Roberts, Hadjipapas, Peter, Eerden, et al., 2015; Lowet, Roberts, Peter, Gips, & De Weerd, 2017; Stettler et al., 2002; Ts’o et al., 1986). In conjunction with the retinotopic organization of early visual cortex, this implies that neural assemblies encoding nearby visual regions are more strongly coupled. Taken together, synchrony in early visual cortex could occur across widely spaced neuronal assemblies in response to scenes with low feature heterogeneity, but only for closely spaced assemblies in response to scenes with high feature heterogeneity. Indeed, a recent electrophysiological study in macaque V1 in which cortical distance and stimulus contrast heterogeneity were parametrically manipulated has confirmed that gamma synchrony behaves in line with the principles of TWCO (Lowet, Roberts, Peter, Gips, & De Weerd, 2017).
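
The trade-off between detuning and coupling strength can be illustrated with a minimal sketch of two symmetrically coupled phase oscillators. This is a textbook Kuramoto pair and not the V1 model used in this study; the function names, integration step, and tolerance are assumptions made for illustration. For this pair, the phase difference φ obeys dφ/dt = Δω − 2K·sin(φ), so a locked state exists only when |Δω| ≤ 2K, which is the boundary of the Arnold tongue.

```python
import math

def phase_difference_drift(detuning, coupling, dt=1e-3, steps=50_000):
    """Mean drift rate of the phase difference between two oscillators.

    Assumed model (symmetric Kuramoto pair, arbitrary units):
        d(phi)/dt = detuning - 2 * coupling * sin(phi)
    The drift is measured over the second half of the simulation so that
    the initial transient does not contribute.
    """
    phi = 0.0
    half = steps // 2
    phi_at_half = 0.0
    for step in range(steps):
        if step == half:
            phi_at_half = phi
        # Forward Euler update of the phase difference.
        phi += dt * (detuning - 2.0 * coupling * math.sin(phi))
    return (phi - phi_at_half) / (dt * (steps - half))

def is_locked(detuning, coupling, tol=1e-2):
    """True if the phase difference stops drifting (phase locking)."""
    return abs(phase_difference_drift(detuning, coupling)) < tol

# Same detuning, different coupling: only the strongly coupled pair locks.
strong_coupling_locks = is_locked(detuning=1.0, coupling=1.0)
weak_coupling_locks = is_locked(detuning=1.0, coupling=0.2)
```

Sweeping `detuning` and `coupling` over a grid and plotting `is_locked` would trace out the triangular region described above.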

Figure 1. Schematic illustration of synchronization principles in visual cortex and stimulus design.

a, Four scenarios showing two texture elements (Gabor annuli) falling within receptive fields of neural assemblies (purple) in early visual cortex. Contrast determines oscillation frequency (orange), with higher contrasts leading to higher frequencies. Coupling strength (line thickness of connecting arrow) depends on cortical distance, reflecting the distance between texture elements in the visual field. b, Arnold tongue: triangular region shows combinations of detuning and coupling strength that allow synchrony (light grey). Open circles indicate scenarios: (I) Strong coupling due to close proximity, moderate detuning due to similar contrast inputs. (II) Weaker coupling due to increased distance, same detuning as (I). Closed circles indicate scenarios: (III) Weak coupling due to large distance, same detuning as (I, II). (IV) Distance and coupling as in (II), large detuning due to heterogeneous contrast inputs. Scenarios I and II are conducive to synchrony, scenarios III and IV are not. c, Example full texture stimulus comprised of nonoverlapping Gabor annuli on irregular grid. The lower right quadrant contains a vertical figure (magenta outline, not shown to participants). Blue dot: fixation point. Axes separating quadrants shown for illustration only, not visible to participants. On a given trial, the figure may be vertical or horizontal and participants indicated the figure’s orientation. d, Figure region cut-outs illustrating experimental conditions. Grid coarseness (five steps) manipulates coupling strength for both figure and background. Contrast heterogeneity (five steps) manipulates detuning within figure. Background always at maximum heterogeneity (equivalent to rightmost column). The 25 cut-outs show all combinations of grid coarseness and contrast heterogeneity used in experiments.

A synchrony-based grouping mechanism based on these principles has been successfully exploited for image segmentation in machine vision (Fang et al., 2014; Nikonov et al., 2020). Here we bring these perspectives together to test whether human vision likewise behaves in accordance with TWCO principles. Specifically, we hypothesize that the perception of texture-defined objects will be related to the density of the texture elements and the level of contrast heterogeneity between them, respectively corresponding to the distance between neuronal groups representing the elements and their level of detuning (see Figure 1 for an illustration). To test this hypothesis, we used a figure-ground segregation paradigm wherein human observers reported the orientation of a rectangular figure region in a texture stimulus composed of Gabor annuli. The figure was defined by a less heterogeneous contrast distribution between the elements, compared to elements in the background. We manipulated contrast heterogeneity and grid coarseness (distance between elements) as proxies of detuning and coupling strength, respectively. Additionally, we investigated whether this synchrony-based grouping mechanism is adaptive by using a perceptual learning paradigm in which participants improved their perceptual performance over 8 daily sessions. By formalizing the principles of TWCO in a V1 oscillator model augmented with a simple Hebbian learning mechanism, we derived quantitative predictions from the theory. Our psychophysics results align well with the synchrony exhibited by the model, supporting the idea that stimulus-dependent gamma synchrony is behaviorally relevant.

Results

Eight participants (6 female, mean age = 23.75, standard deviation = 6.453) performed a two-alternative forced choice texture discrimination task. Sample size was determined based on comparable studies investigating visual perception and perceptual learning in humans (Intoy et al., 2024; Lange et al., 2020; Tesileanu et al., 2020). Texture stimuli consisted of nonoverlapping Gabor annuli on an irregular grid (see Figure 1c). Each Gabor annulus was characterized by its own local contrast. A rectangular figure contained within a single visual quadrant was defined by less heterogeneity in the contrasts of local Gabor elements compared to the background, with no difference in mean contrast. Participants indicated the orientation (horizontal vs vertical) of the figure while fixating centrally. We manipulated two factors. The first was contrast heterogeneity within the figure, which we operationalized as the width of a uniform distribution from which annulus contrast values were drawn. This distribution was always centered around a mean contrast of 50%. The background always exhibited maximum contrast heterogeneity (from 0% to 100%). The second factor was the coarseness of the grid (distance between annuli). This manipulation affected figure and background equally. Both factors were manipulated in five steps resulting in 25 conditions (see Figure 1d). Within an experimental session, participants completed 50 blocks of each condition (750 trials). Participants received feedback after each trial in the form of color changes of the fixation point. Eye-tracking was used to ensure fixation, and trials where fixation was broken during either the fixation period preceding the stimulus, or during stimulus presentation, were aborted and repeated at a randomly chosen time later in the session. The experiment consisted of 9 consecutive sessions (8 training and 1 transfer session). In the transfer session, the rectangular figure was moved to the diagonally opposite quadrant.
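
The contrast manipulation described above can be sketched as follows. This is a minimal illustration rather than the stimulus-generation code used in the experiment; the function name, the element counts, and the random seed are hypothetical, and contrasts are expressed as fractions (0 to 1) rather than percentages.

```python
import random

def sample_contrasts(n_elements, heterogeneity, mean_contrast=0.5, rng=None):
    """Draw one contrast value per Gabor annulus.

    heterogeneity is the full width of a uniform distribution centered on
    mean_contrast (0.5 corresponds to the 50% mean contrast in the text).
    """
    rng = rng or random.Random()
    low = mean_contrast - heterogeneity / 2.0
    high = mean_contrast + heterogeneity / 2.0
    return [rng.uniform(low, high) for _ in range(n_elements)]

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible

# Figure elements: narrower contrast range (element count is hypothetical).
figure = sample_contrasts(20, heterogeneity=0.25, rng=rng)

# Background elements: maximum heterogeneity, spanning 0% to 100% contrast.
background = sample_contrasts(200, heterogeneity=1.0, rng=rng)
```

Because both distributions share the same center, figure and background differ in contrast variability but not in mean contrast.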

To provide a mechanistic link between contrast heterogeneity, grid coarseness and synchrony in early visual cortex on the one hand, and quantitative predictions of discrimination accuracy on the other, we developed a phase-oscillator model of V1. The model represents a patch of visual space corresponding to the figure region in our psychophysics experiments, mapped onto V1 using a complex-logarithmic topographic transformation (Balasubramanian & Schwartz, 2002; Schwartz, 1980). Each model oscillator represents a neural assembly receiving local input from the visual field. The frequency of each oscillator is a quasi-linear function of the contrast falling inside its receptive field that has previously been determined in macaques (Evers et al., 2021; Roberts, Lowet, Brunet, TerWal, Tiesinga, Fries, & DeWeerd, 2013). Receptive fields are modelled as isotropic 2D Gaussian functions with sizes that scale with eccentricity according to human cortical magnification (Freeman & Simoncelli, 2011). Furthermore, we included recurrent connections between phase oscillators reflecting the lateral anatomical connectivity among columns in V1 and other low-level visual areas (Crist et al., 2001). In line with anatomical data (Amir et al., 1993; Eckhorn, 1994; Gilbert & Wiesel, 1989; Ts’o et al., 1986), coupling strength in our model declines exponentially with physical distance along the cortical surface. Our model captures this with two parameters estimated from independent neurophysiological results in macaques (Lowet, Roberts, Peter, Gips, & De Weerd, 2017): maximum coupling strength γ and coupling decay factor λ. The model was exposed to the same figure region texture stimuli as human participants, with manipulations of contrast heterogeneity and grid coarseness. We quantified the model’s degree of zero-lag synchrony as the magnitude of the Kuramoto order parameter (synchronization index).
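
Two ingredients of the model description above lend themselves to a short sketch: distance-dependent coupling and the synchronization index. The functional form γ·exp(−λ·d) is an assumption consistent with the stated exponential decay; the exact expression and the units of distance are given in the Methods, not here. The numerical values are the macaque-derived parameters reported in the Results, and the function names are hypothetical.

```python
import cmath
import math

GAMMA = 24.63  # maximum coupling strength (value reported in the Results)
LAMBDA = 0.22  # coupling decay factor (value reported in the Results)

def coupling_strength(cortical_distance, gamma=GAMMA, lam=LAMBDA):
    """Coupling between two assemblies; assumed form gamma * exp(-lam * d)."""
    return gamma * math.exp(-lam * cortical_distance)

def synchronization_index(phases):
    """Magnitude of the Kuramoto order parameter, a value between 0 and 1.

    Each phase is mapped to a unit vector in the complex plane; the length
    of the average vector is 1 for identical phases and 0 for phases that
    are spread evenly around the circle.
    """
    mean_vector = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(mean_vector)

aligned = [0.3] * 8                               # perfect zero-lag synchrony
spread = [2 * math.pi * k / 8 for k in range(8)]  # evenly dispersed phases

index_aligned = synchronization_index(aligned)
index_spread = synchronization_index(spread)
```

In this sketch, coarser grids correspond to larger `cortical_distance` values and therefore to weaker coupling between neighboring oscillators.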

In our V1 model, learning is implemented to occur offline between simulated sessions, following a Hebbian-type learning rule that adapts coupling strengths based on pairwise phase-locking values (PLVs) accumulated over trials within a session. The contribution of each trial to learning is weighted by the probability of a correct response, determined by a psychometric function relating model synchrony to performance. This learning mechanism implies that connections between oscillators that exhibited coherence on correct trials are strengthened, bounded by the maximum coupling strength. Incorporating an upper bound on connections was motivated by findings that synaptic strength is limited by intrinsic properties of vesicular docking (Malagon et al., 2020) and that late long-term potentiation asymptotes after several repeated experiences (Kandel et al., 2000). Free parameters of the learning mechanism were estimated using data from the first two experimental sessions. To maximally disentangle data used for adjusting model parameters and data used for testing model predictions, we employed a leave-one-out cross-validation procedure. Model parameters were repeatedly estimated from the first two sessions in seven of our eight participants and the resulting model was used to predict performance in the remaining six sessions of the left-out participant. Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli. We evaluated whether this assumption holds for our human participants using the transfer session following the main training period. In the transfer session, participants performed the texture discrimination task with the figure region moved to a visual quadrant that had not been previously exposed to the figure. 
If learning is indeed local, participants’ performance in the transfer session should resemble that of early training sessions, indicating a reset in performance for the new retinal location. On the other hand, if learning generalizes across retinal locations, performance in the transfer session should maintain the improvements seen in later training sessions. By comparing transfer session performance to both early and late training sessions, we can evaluate the validity of our model’s assumption.
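
The learning mechanism described above can be sketched in a few lines. This is an illustrative reading of the verbal description, not the rule as specified in the Methods: the additive update, the `learning_rate` parameter, and the normalization by the summed trial weights are all assumptions. Only the use of pairwise PLVs, the weighting by the probability of a correct response, and the upper bound on coupling are taken from the text.

```python
import cmath

def phase_locking_value(phases_a, phases_b):
    """PLV between two phase time series (1 = constant phase difference)."""
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors) / len(vectors))

def hebbian_update(coupling, plv_trials, p_correct, learning_rate, max_coupling):
    """Offline update of an N x N coupling matrix after one session.

    coupling    : N x N nested list of current coupling strengths
    plv_trials  : list of N x N PLV matrices, one per trial
    p_correct   : per-trial probability of a correct response (trial weight)
    The additive form and learning_rate are assumptions of this sketch.
    """
    n = len(coupling)
    total_weight = sum(p_correct)
    updated = [row[:] for row in coupling]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-coupling
            weighted_plv = sum(
                p * plv[i][j] for p, plv in zip(p_correct, plv_trials)
            ) / total_weight
            # Strengthen coherent pairs, but never beyond the upper bound.
            updated[i][j] = min(
                max_coupling, coupling[i][j] + learning_rate * weighted_plv
            )
    return updated

coupling = [[0.0, 1.0], [1.0, 0.0]]
plv_trials = [[[1.0, 0.5], [0.5, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
p_correct = [0.5, 1.0]
after_session = hebbian_update(coupling, plv_trials, p_correct, 2.0, 2.5)
```

The `min` call is what makes learning saturate: once a connection reaches `max_coupling`, further coherent trials no longer change it.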

Texture Segregation Performance Reveals a Behavioral Arnold Tongue

We first asked whether the factors that determine synchrony among coupled oscillators, frequency detuning and coupling strength, are predictive of human ability to segregate a rectangular figure from its background in texture stimuli. In early visual cortex, oscillation frequencies and coupling strength directly map onto contrast (Hadjipapas et al., 2015; Lowet, Roberts, Hadjipapas, Peter, Eerden, et al., 2015; Roberts, Lowet, Brunet, TerWal, Tiesinga, Fries, & DeWeerd, 2013) and physical proximity (Gilbert & Wiesel, 1983; Lowet, Roberts, Hadjipapas, Peter, Eerden, et al., 2015; Lowet, Roberts, Peter, Gips, & De Weerd, 2017; Stettler et al., 2002; Ts’o et al., 1986) of texture elements, respectively. If texture segregation indeed depends on the synchrony principles identified by the theory of weakly coupled oscillators (TWCO), we expect discrimination accuracy to reveal a “behavioral” Arnold tongue in the space defined by contrast heterogeneity and grid coarseness and that each factor significantly predicts accuracy.

To test these predictions, we analyzed the main effects of contrast heterogeneity and grid coarseness, as well as their interaction, on discrimination accuracy using Generalized Estimating Equations (GEE) for logistic regression. This allowed us to use response correctness on individual trials as outcome variable rather than aggregated accuracy, while also accounting for within-subject variability. Note that while the principles of TWCO primarily predict main effects of contrast heterogeneity and grid coarseness, we included their interaction to capture complex relationships specific to V1 that are not immediately apparent from the general theory. Specifically, coupling strength decays exponentially with cortical distance, which itself depends on cortical magnification. This leads to a highly nonlinear relationship between grid coarseness and coupling strength that is likely to manifest as an interaction. In line with our expectations, both increased contrast heterogeneity (β = −5.64, 95% CI [−8.23, −3.03], Std. Error = 1.32, Wald Chi-Square = 18.361, p < 0.0001, OR = 0.004, 95% CI for OR [0.0003, 0.048]) and grid coarseness (β = −3.00, 95% CI [−4.39, −1.61], Std. Error = 0.71, Wald Chi-Square = 17.778, p < 0.0001, OR = 0.05, 95% CI for OR [0.012, 0.200]) significantly reduced discrimination accuracy. Furthermore, Figure 2a,b shows a behavioral Arnold tongue as a triangular region of high accuracy (≥75% correct). The interaction between contrast heterogeneity and grid coarseness was also significant (β = 3.29, 95% CI [1.64, 4.94], Std. Error = 0.84, Wald Chi-Square = 15.267, p < 0.0001, OR = 26.84, 95% CI for OR [5.16, 139.77]). Our V1 model, which incorporates cortical magnification and exponential decay of coupling strength, similarly exhibited the interaction (see Supplementary Materials). This underscores that the specific characteristics of the early visual cortex contribute significantly beyond the general principles of TWCO.
Significant effects of contrast heterogeneity, grid coarseness and their interaction were observed for most of our participants individually, underscoring the robustness of our findings (see Supplementary Materials for Figures and Statistics).
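
For readers less familiar with logistic regression output, the odds ratios reported above follow from the coefficients by exponentiation, and the same transformation applies to the confidence limits. The sketch below only illustrates this arithmetic with the reported values; it does not reproduce the GEE fit itself, and the function names are hypothetical.

```python
import math

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(beta)

def odds_ratio_interval(beta_low, beta_high):
    """Confidence interval for the odds ratio from the interval for beta."""
    return math.exp(beta_low), math.exp(beta_high)

# Coefficients and 95% confidence limits as reported in the text.
or_heterogeneity = odds_ratio(-5.64)
or_coarseness = odds_ratio(-3.00)
or_interaction = odds_ratio(3.29)
ci_coarseness = odds_ratio_interval(-4.39, -1.61)
```

An odds ratio below one indicates that increasing the predictor lowers the odds of a correct response; a value above one, as for the interaction term, indicates the opposite.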

Figure 2. Behavioral and simulated Arnold tongues.

a, Average discrimination accuracy for each of the 25 experimental conditions revealed a behavioral Arnold tongue in the space defined by contrast heterogeneity and grid coarseness. Contrast heterogeneity translates into the variance of frequencies (detuning) whereas grid coarseness translates into cortical distance (coupling strength). b, Fitted behavioral Arnold tongue after fitting a two-dimensional psychometric curve to the results in (a). The dashed line indicates the combination of contrast heterogeneity and grid coarseness corresponding to 75% accuracy. c, Zero-lag synchrony among model oscillators showing an Arnold tongue in the same parameter space as (a). Simulation conditions matched the 25 experimental conditions. d, High-resolution visualization of zero-lag synchrony, using 900 conditions (30 levels each of contrast heterogeneity and grid coarseness) to provide a more detailed representation of the Arnold tongue.

The synchrony exhibited by our model (Figure 2d), when exposed to the same stimuli as our participants, resembled behavioral discrimination accuracy (Figure 2a). Indeed, a GEE for logistic regression revealed that model synchrony was a significant predictor of discrimination accuracy (β = 1.99, 95% CI [1.14, 2.84], Std. Error = 0.435, Wald Chi-Square = 20.924, p ≪ 0.0001, OR = 7.32, 95% CI for OR = [3.13, 17.12]). This fit is remarkable given that key model parameters (maximum coupling strength and coupling decay factor) were obtained from independent observations in macaques (Lowet, Roberts, Peter, Gips, & De Weerd, 2017). To further validate these pre-set parameters and assess their performance in the context of all possible parameter combinations, we conducted a comprehensive exploration of the parameter space. We used Pearson correlation and weighted Jaccard similarity to assess the similarity between the behavioral Arnold tongue and the Arnold tongue predicted by our V1 model for various combinations of maximum coupling strength and coupling decay factor. We included both metrics because the former is more widely known while the latter is more conservative. To compute weighted Jaccard similarity between two sets of real numbers, they need to fall within the same range. For this reason, we applied min-max normalization to ensure that discrimination accuracy falls within a zero-to-one range to match the range of the synchronization index. This procedure yielded the similarity distributions depicted in Figure 3. The point marked with the black circle reflects the parameter combination that was based on independent macaque data (24.63 and 0.22, respectively) and exclusively used for our model predictions. For similarity measured using Pearson correlation, this combination lies within the region of optimal parameter values for our behavioral results. For the more conservative weighted Jaccard similarity, the combination lies just at the edge of the optimal region.
This shows that our model parameters, although estimated from neurophysiological recordings in monkeys entirely separate from our behavioral data, were nevertheless close to optimal in accounting for our results. That they were not quite optimal is likely because horizontal connections in human visual cortex extend further than those in the macaque (Amir et al., 1993; Burkhalter & Bernardo, 1989; Lund, Yoshioka, & Levitt, 1993; Voges et al., 2010; Yoshioka et al., 1996) and may thus be associated with a slightly smaller coupling decay factor. This would move model parameters into the optimal regime.
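
The two similarity measures used above can be sketched as follows. Weighted Jaccard similarity is taken here in its common form, the sum of element-wise minima divided by the sum of element-wise maxima; that this matches the implementation in the Methods is an assumption. Arnold tongues are represented as flat lists of condition-wise values, and the example numbers are hypothetical.

```python
import math

def min_max_normalize(values):
    """Rescale values linearly so that they span the range 0 to 1."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def weighted_jaccard(x, y):
    """Sum of element-wise minima over sum of element-wise maxima.

    Both inputs must be non-negative and on the same scale, which is why
    accuracy is min-max normalized before the comparison.
    """
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return numerator / denominator

# Hypothetical values for four conditions, for illustration only.
accuracy = [0.50, 0.60, 0.80, 1.00]   # proportion correct
synchrony = [0.00, 0.10, 0.70, 1.00]  # synchronization index

normalized_accuracy = min_max_normalize(accuracy)
r = pearson_correlation(normalized_accuracy, synchrony)
j = weighted_jaccard(normalized_accuracy, synchrony)
```

Pearson correlation is invariant to linear rescaling, so it rewards matching shape alone, whereas weighted Jaccard similarity also penalizes differences in magnitude, which is why it is the more conservative of the two.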

Figure 3. Comparison of behavioral and simulated Arnold tongues across coupling parameter space.

a, Pearson correlation between the behavioral Arnold tongue and simulated Arnold tongues obtained from models with coupling weights determined by different combinations of maximum coupling strength and coupling decay factor. The point labelled by the black circle shows the combination of parameters that were obtained from independent (macaque) data. b, Weighted Jaccard similarity between the behavioral Arnold tongue and simulated Arnold tongues. This metric is displayed across the same parameter space as in (a).

Interestingly, this optimal regime extends diagonally. The parameter combination we derived from the neurophysiological data falls among the intermediate values of maximum coupling and decay factor and might reflect biological constraints on the strength of lateral connections (Kandel et al., 2000; Malagon et al., 2020; Rioult-Pedotti et al., 1998).

Training Enhances Figure-Ground Segregation Performance

Our results demonstrate that synchrony principles provide a viable neural grouping mechanism for texture segregation. We next asked whether training-induced changes of lateral connections among neural assemblies in early visual cortex affect assemblies’ readiness to synchronize and whether this is accompanied by performance improvements. We reasoned that neural synchrony must remain adaptable to the statistics of visual experiences to function effectively as a grouping mechanism.

Consequently, we hypothesized that if synchrony among neural assemblies is related to figure-ground segregation and enhanced through perceptual learning, the ability to segregate figure from ground should increase with training.

To test this, both the model and human participants were exposed to eight daily sessions of extensive training using identical stimuli and experimental conditions. We hypothesized that both grid coarseness and contrast heterogeneity exhibit main effects on discrimination accuracy.

However, we also expected coupling strength to increase with learning and that this would allow synchrony to occur for increasingly coarser grids. We thus hypothesized an interaction effect between session and grid coarseness on discrimination accuracy. Furthermore, we hypothesized an additional interaction between session and contrast heterogeneity where the effect of contrast heterogeneity increases over sessions. Model simulations for the first session never revealed a synchronized state for contrast heterogeneity values beyond 0.25, even for the densest grids. This, together with an upper bound on coupling strength, suggests that synchrony cannot be achieved far beyond this cutoff point, even after extensive training. Indeed, model simulations of training confirmed this, showing that synchrony approached this cutoff point for increasingly coarser grids over sessions (Figure 4c). These model results indicate that the effect of contrast heterogeneity would increase over sessions with high performance for values below the cutoff point and low performance above the cutoff point. Finally, extensive training may globally increase participants’ performance, implying a main effect of session.

Figure 4. Learning effects on Arnold tongues.

a, Group average behavioral Arnold tongues for the 25 experimental conditions for each session. The vertical black line separates transfer session 9 from training sessions 1 to 8. b, Two-dimensional psychometric curves fitted to session-specific group average behavioral Arnold tongues. The dashed line again indicates the combination of contrast heterogeneity and grid coarseness at which participants achieve 75% accuracy. c, Simulated Arnold tongues for each of the eight training sessions including session-by-session learning in the model. We did not include a simulation of the ninth session because the location-specificity of the model learning rule would render it identical to the first session. Note that for visualization purposes we simulated the model for 30 levels of contrast heterogeneity and 30 levels of grid coarseness, in both cases including the 5 levels investigated experimentally.

We performed a GEE analysis including the main effects of contrast heterogeneity, grid coarseness, and session, as well as interactions between session and contrast heterogeneity, between session and grid coarseness, and between contrast heterogeneity and grid coarseness. As before, we included the interaction between contrast heterogeneity and grid coarseness to account for potential nonlinear effects specific to V1. The analysis revealed a significant main effect of session (β = 0.2229, 95% CI [0.143, 0.303], Std. Error = 0.041, Wald Chi-Square = 29.912, p ≪ 0.0001, OR = 1.25, 95% CI for OR = [1.15, 1.35]), confirming that participants’ ability to segregate figure from ground increased with training. Furthermore, we found a significant interaction between contrast heterogeneity and session (β = −0.1941, 95% CI [−0.239, −0.149], Std. Error = 0.023, Wald Chi-Square = 71.568, p ≪ 0.0001, OR = 0.82, 95% CI for OR = [0.79, 0.86]). However, the interaction between grid coarseness and session was not significant (β = −0.0359, 95% CI [−0.095, 0.023], Std. Error = 0.030, Wald Chi-Square = 1.411, p = 0.235, OR = 0.96, 95% CI for OR = [0.91, 1.02]). Nevertheless, grid coarseness did maintain a significant main effect on discrimination accuracy (β = −3.1893, 95% CI [−3.742, −2.636], Std. Error = 0.282, Wald Chi-Square = 127.690, p ≪ 0.0001, OR = 0.04, 95% CI for OR = [0.02, 0.07]) with increasing accuracy as grid coarseness decreased. See Supplementary Materials for an overview of effects at the individual participant level. As can be appreciated from Figure 4, there indeed seemed to be a cutoff value for contrast heterogeneity beyond which the figure could not be discriminated from the background. This cutoff may also explain why an interaction between session and grid coarseness was not found. Below the cutoff point, the top and middle rows of Figure 4 suggest that participants could discriminate the figure for increasingly coarser grids.
Beyond the cutoff, however, grid coarseness seemed to have had no discernible effect regardless of how much training participants received.

Next, we examined simple effects of contrast heterogeneity on discrimination accuracy for each session separately (see Table 1). As expected from the presence of the cutoff, the effect of contrast heterogeneity increased over sessions, reflected in decreasing log-odds (β) and corresponding odds ratios (ORs) over sessions as shown in Table 1. Whereas in the first session the odds of correctly discriminating the texture decreased by approximately 77% for each unit increase in contrast heterogeneity, they decreased by approximately 95% in the eighth session. This was confirmed by a meta-regression analysis revealing a significant negative effect of session on log-odds (β = −0.186, 95% CI [−0.249, −0.122], Std. Error = 0.026, t(6) = −7.188, p < 0.0001).

Table 1. Effects of contrast heterogeneity on discrimination accuracy across sessions.

CI = Confidence Interval. Log-odds and Odds Ratios represent the effect of a one-unit increase in contrast heterogeneity on the odds of correct discrimination. P-values are for the simple effect of contrast heterogeneity in each session.

Finally, we evaluated whether the observed effects reflected localized learning in early visual cortex, as assumed by our model, implying that the training effect would be specific to the trained location. Performance in the transfer session should thus resemble that observed at training locations during early rather than late sessions. To test this, we used a model selection procedure based on the Akaike Information Criterion (AIC; Wagenmakers & Farrell, 2004; see Methods for details). We fitted separate logistic regression models for each of the eight learning sessions and evaluated model performance in terms of the likelihood of observed outcomes in the transfer session. The resulting session-specific likelihoods were used to calculate session-specific AIC values. Subsequently, we obtained ΔAIC as the difference between each AIC and the minimum AIC among all sessions. To identify the session that most resembled the transfer session, we obtained an Akaike weight for each session. Akaike weights represent the normalized likelihood of each session being the best fit for the transfer session. The third session exhibited an Akaike weight of one, while the remaining sessions all had weights of zero. The third session can be considered early since performance in that session benefits from only two previous sessions, indicating location specificity of learning and supporting our assumption that perceptual learning in our experiment is dominated by local processes in low-level visual areas. Based on this, we expected that the local learning mechanism implemented in our model can provide quantitative predictions of performance changes over the course of the eight training sessions. Note that contrary to the transfer session, in the first session participants were unfamiliar with the task and laboratory environment, and thus had to learn to maintain fixation and develop a decision-response mapping in addition to solving the perception task.
These additional factors may explain why the transfer session most resembles the third session.
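
The step from AIC values to Akaike weights described above can be sketched as follows, using the standard definition in which each weight is proportional to exp(−ΔAIC/2). The AIC values in the example are hypothetical and serve only to show why one session can receive a weight of essentially one when the AIC differences are large.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to one."""
    best = min(aic_values)
    # Relative likelihood of each candidate given its AIC difference.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]

# Hypothetical AIC values for three candidate sessions.
weights = akaike_weights([1250.0, 1180.0, 1100.0])
best_index = weights.index(max(weights))
```

With trial-level data, AIC differences of several tens of units are common, and the exponential then drives all but the best candidate's weight to practically zero.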

Model Provides Quantitative Predictions of Learning Effects

We evaluated the quantitative agreement between model synchrony and empirical discrimination performance. To that end, we measured the similarity between simulated and behavioral Arnold tongues using Pearson correlations and weighted Jaccard similarity. Because we employed a leave-one-out cross-validation procedure, we obtained eight simulated Arnold tongues in sessions 2-8 after optimizing learning parameters on data from seven participants. Simulated Arnold tongues in each fold were always compared to behavioral Arnold tongues of the left-out participant. The first session did not involve learning and model simulations were identical to those reported above. Note that data from the second session was used to adjust model parameters and hence only sessions 3-8 could be used for evaluating model predictions. This cross-validation approach enabled us to assess the model’s ability to predict performance in unseen data, rather than merely fitting observed results post-hoc. Figure 5a and 5b show correlations and Jaccard similarity between simulated and behavioral Arnold tongues, respectively. Grey regions indicate a noise ceiling that was obtained by computing the fit between average behavioral Arnold tongues in a fold and the behavioral Arnold tongue of the left-out participant. The grey regions extend vertically from the 25th to the 75th percentile of fit values obtained using this procedure. The figure demonstrates consistent quantitative agreement between simulated and behavioral Arnold tongues across sessions.

Figure 5. Model predictions of learning effects.

a, Pearson correlations between simulated and behavioral Arnold tongues for each training session. Error bars indicate 95% confidence intervals. Grey regions indicate a noise ceiling that was obtained by computing the fit between average behavioral Arnold tongues in a fold and the behavioral Arnold tongue of the left-out participant. The grey regions reflect the 25th to the 75th percentile of fit values obtained using this procedure. b, Weighted Jaccard similarity values between simulated and behavioral Arnold tongues for each training session. Error bars and grey regions as in (a). c, Sizes of simulated (blue circles) and behavioral (orange squares) Arnold tongues across sessions. Arnold tongue sizes were averaged across participants and subsequently min-max normalized. This normalization highlights the growth patterns while accounting for the different value ranges of simulated and behavioral Arnold tongues. d, Sizes of behavioral Arnold tongues as a function of sizes of simulated Arnold tongues. The best fitting regression (black line) was obtained from a mixed effects model fitted to data from sessions 3-8 (blue circles). Red circles reflect data from the first two sessions, which were not included in the mixed effects model. The black line was extended to include these points. Error bars indicate 95% confidence intervals.

To examine this further, we tested whether the size of the simulated Arnold tongue across sessions was predictive of the size of the behavioral Arnold tongue. We quantified the size of each Arnold tongue in terms of the volume under its surface, computed using Simpson’s numerical integration. Arnold tongues grew across sessions with comparable growth curves for simulated and behavioral Arnold tongues (see Figure 5c). The precise relationship between simulated and behavioral Arnold tongue sizes is depicted in Figure 5d. We then quantified this relationship with a mixed effects analysis restricted to sessions 3-8. We ignored the first two sessions since these were used for estimating model learning parameters. As expected, the size of the simulated Arnold tongue significantly predicted the size of the behavioral Arnold tongue (β = 0.765, 95% CI [0.193, 1.338], Std. Error = 0.292, z = 2.621, p = 0.009). The model’s capability to accurately reflect learning effects observed in human participants supports the notion that enhanced synchrony among neural assemblies in early visual cortex, resulting from perceptual learning, enhances humans’ ability to segregate figure from ground. This further strengthens the view that synchrony principles provide a viable neural grouping mechanism for texture segregation.
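As an illustration of the size measure, the sketch below integrates a surface sampled on a regular grid by applying the composite Simpson rule first along rows and then across the row integrals. It is a hand-written stand-in for a library routine and assumes an odd number of equally spaced samples per axis, which holds for a 5 × 5 condition grid. The function names and the choice of grid spacings are illustrative assumptions.

```python
def simpson_1d(values, spacing):
    """Composite Simpson's rule for an odd number of equally spaced samples."""
    n = len(values)
    if n < 3 or n % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number (>= 3) of samples")
    total = values[0] + values[-1]
    total += 4 * sum(values[1:-1:2])  # samples at odd indices
    total += 2 * sum(values[2:-1:2])  # interior samples at even indices
    return total * spacing / 3


def volume_under_surface(surface, dx, dy):
    """Integrate along each row (step dx), then across rows (step dy)."""
    row_integrals = [simpson_1d(row, dx) for row in surface]
    return simpson_1d(row_integrals, dy)
```

For example, a surface that equals one everywhere on a 5 × 5 grid with spacings 0.2475 and 0.125 has a volume of 0.99 × 0.5 = 0.495, the area of the sampled region.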

Discussion

The role of synchrony in the gamma frequency band for visual perception remains a matter of debate (Duecker et al., 2021; Fernandez-Ruiz et al., 2023; Ray & Maunsell, 2015; Roelfsema, 2023). A putative role for gamma synchrony in processing the features of a stimulus both within and across visual areas (Fries, 2009; Singer, 1999; Uhlhaas et al., 2008; Womelsdorf et al., 2007) has been called into question based on the stimulus-dependence of gamma synchrony (Ray & Maunsell, 2010, 2015; Roelfsema, 2023). Alternatively, it has also been suggested that feature dependent gamma frequencies and distance dependent synchrony are key ingredients in a neural grouping mechanism underlying figure-ground segregation (Lowet, Roberts, Hadjipapas, Peter, Eerden, et al., 2015; Lowet, Roberts, Peter, Gips, & De Weerd, 2017). It is well established that the frequency of gamma oscillations in visual cortex depends on local stimulus features (Baldi & Meir, 1990; Buia & Tiesinga, 2006; Hall et al., 2005; Henrie & Shapley, 2005; Roberts, Lowet, Brunet, TerWal, Tiesinga, Fries, & DeWeerd, 2013; Shapira et al., 2017) and that lateral connectivity between neural groups within early visual cortex depends on cortical distance (Amir et al., 1993; Boucsein et al., 2011; Eckhorn, 1994; Gilbert & Wiesel, 1989; Stettler et al., 2002; Ts’o et al., 1986). It is likewise a well-known property of coupled oscillators that they synchronize when their coupling is sufficiently strong to overcome differences in their frequency, but not otherwise (Acebrón et al., 2005; Ermentrout et al., 2019; Kuramoto, 1984; Neu, 1979; Strogatz, 2000). Synchrony may thus drive the perceptual grouping of elements that are similar and sufficiently near to each other whereas a lack of synchrony segregates figure elements from those that form the background.

We tested this hypothesis in a psychophysics experiment wherein human observers identified a rectangular figure region in a texture stimulus to discriminate the figure’s orientation (vertical vs. horizontal). The stimulus consisted of small Gabor annuli arranged on an irregular grid. Each Gabor annulus was characterized by its own contrast, and the figure region was defined by less heterogeneous contrasts among the Gabor annuli compared to the background. We manipulated contrast heterogeneity and grid coarseness (distance between annuli) as proxies of frequency detuning and coupling strength, respectively. Both contrast heterogeneity and grid coarseness significantly affected discrimination accuracy. Specifically, we found that accuracies beyond 75% were limited to a triangular region in the space spanned by these two factors, forming a behavioral Arnold tongue. In line with our expectations, increased contrast heterogeneity in the figure permitted figure-ground segregation if accompanied by a reduction in grid coarseness. These results aligned well quantitatively with the synchrony exhibited by a coupled oscillator V1 model exposed to the same texture stimuli.

The capacity of our model to predict human psychophysical performance is remarkable given that the key parameters of maximum coupling strength and coupling decay factor were obtained from neurophysiological data recorded from macaques (Lowet, Roberts, Peter, Gips, & De Weerd, 2017). This cross-species validation underscores the robustness of our model and suggests that the neural mechanisms underlying gamma oscillations and figure-ground segregation are conserved across primate species (Buzsáki et al., 2013). It is noteworthy that the parameter combination obtained from macaque data bordered the region of optimal combinations exhibiting the highest match to psychophysics results that our model could, in principle, achieve (see Figure 3). The slight deviation from the optimal regime likely stems from the fact that parameters were estimated from data obtained in macaques and subsequently used to predict human behavior. It is likely that horizontal connections in the human extend further than those in the macaque (Amir et al., 1993; Burkhalter & Bernardo, 1989; Lund, Yoshioka, & Levitt, 1993; Voges et al., 2010) and may thus be associated with a slightly smaller coupling decay factor.

To further investigate whether synchrony among neuronal populations exhibiting contrast dependent frequencies provides a perceptual grouping mechanism, we tested whether training-induced changes of lateral coupling in a network of phase oscillators improved the readiness of these oscillators to synchronize and whether this model provided accurate predictions of performance on the figure-ground segregation task. We reasoned that neural synchrony must be adaptable to the statistics of visual experiences to function effectively as a grouping mechanism and hence subject to training-induced learning effects. We observed that discrimination performance improved significantly as a function of training session, indicating that participants’ perceptual skills adapted to the statistics of our stimuli and task. Importantly, we found that training-induced increases in accuracy were well accounted for by model predictions of synchrony strength inside the figure. Our results support the notion that synchrony mechanisms in low-level visual areas contribute in a behaviorally relevant manner to texture segregation and that training-induced changes of local synchrony are reflected by concurrent changes in perception. Synchrony and discrimination accuracy revealed highly congruent Arnold tongues. A close quantitative resemblance of these Arnold tongues was maintained across sessions as both tongues grew in a highly consistent manner. This supports the idea that learning-induced changes in figure-ground segregation may be mediated by plasticity-induced changes in synchrony.

It is important to note that the learning mechanism integrated into our model assumes that learning is local. We validated this assumption in the human participants by testing whether moving the figure region from its trained location to a new location would lead to transfer of performance to the new location, or rather a decrease in performance in the new location. Our results supported the latter. This is in line with other studies which demonstrated that after location-specific training, low-level visual areas contribute significantly to the location and stimulus specificity of expert visual performance (Karni & Bertini, 1997). Based on previous findings that location-specific training induces localized plasticity in low-level visual areas (Brosch et al., 2015; Raiguel et al., 2006; Schoups et al., 2001; Yang & Maunsell, 2004), we further assumed that learning in our paradigm primarily affects lateral connectivity within V1 and hence manipulates coupling strength between neural assemblies. An alternative hypothesis could be that learning, by targeting feedforward or feedback connectivity, alters the contrast sensitivity of neural assemblies. If this were to reduce the slope of the contrast-frequency relationship, it could theoretically offer a pathway to achieve synchrony across more heterogeneous contrasts by minimizing detuning rather than increasing coupling strength. However, empirical evidence suggests that training on perceptual tasks tends to steepen, rather than flatten, the contrast-frequency relationship (Chen et al., 2013; Hua et al., 2010; Sanayei et al., 2018). Given its lack of empirical support, we therefore did not incorporate this alternative mechanism into our model.

The predominant cue for figure-ground segregation in our stimuli lay in the global variations of population statistics in the contrast distribution, rather than local differences at the boundary between the figure and the ground (de Weerd et al., 1994; Poort et al., 2016; Roelfsema et al., 2002). The consistency between our model’s predictions and psychophysics experiment results suggests that, for the stimuli we used, our simple bottom-up model captures a crucial component of the mechanism underlying figure-ground segregation. Nevertheless, our model does not account for attentional effects, even though the significance of attention in figure-ground segregation and in learning is well established (Huang et al., 2020) and it is likely that pure exposure to the stimuli in our experiment would have revealed very limited effects (Seitz & Dinse, 2007). Indeed, the synchrony mechanism implemented in our V1 model may reflect pre-attentive lateral interactions triggered by bottom-up input that result in a feedforward scaffold of the visual scene, which is then read out by top-down mechanisms to segregate the figure from the ground (Ahissar & Hochstein, 1997, 2004; Hochstein & Ahissar, 2002; Liu & Weinshall, 2000; Rubin et al., 1997). Hence, our proposed mechanism expands upon the widely accepted model of boundary detection, followed by region-filling feedback (Grossberg & Mingolla, 1985, 1987; Keil et al., 2005; Layton et al., 2014; Motoyoshi, 1999; Neumann et al., 2001; Pessoa & de Weerd, 2003; Roelfsema et al., 2002), a notion substantiated by neurophysiological and psychophysical evidence (Poort et al., 2016; Roelfsema et al., 2002; Self et al., 2012) as well as by lesion and optogenetics experiments (Kirchberger et al., 2021; Lamme et al., 1998; Supèr & Lamme, 2007). Possibly, a local analysis of element distributions may lead to coarse bottom-up information about the layout of surfaces that is elaborated through further recurrent processes to refine or extract boundary representations and give figure status to one region over another (Poort et al., 2016; Roelfsema et al., 2002). Overall, although the lack of attentional mechanisms in our current model highlights the strong potential contributions of V1 (and other low-level visual areas) on their own, we also acknowledge this limitation of the current work concerning the mismatch in comparing a psychophysical dataset obtained during attentional performance with a model that lacks attention mechanisms.

Another limitation of this work is that participants were unfamiliar with the task and thus had to learn to maintain fixation and acquire a decision-response mapping in addition to solving the actual perception task. As such, results in the first session may represent cognitive processes related to these factors. Future versions of our experiment might consider including a baseline training session during which participants get acquainted with the experimental setup and tasks using stimuli that define figure and background with features that are independent of those manipulated in the main experiment. Moreover, participants were not informed of which visual quadrant the figure would appear in the transfer session. This raises the concern that our results partly reflect visual search effects (Eckstein, 2011; Neisser, 1964) rather than a return to a naïve state of the figure-ground segregation skill. Arguably, however, this only affected a minority of trials and is thus not sufficient to account for the loss of skill we observed. Finally, although a strength of this work is the prediction of human psychophysical performance based on a model whose parameters were set by independent neurophysiological data, a weakness is the absence of neurophysiological data for our specific experimental paradigm. Such data would strengthen our interpretations. At the same time, a combined psychophysical and neurophysiological experiment in an animal model replicating the experimental conditions used here would benefit from strong predictions provided by the present study as well as prior neurophysiological data (Lowet, Roberts, Peter, Gips, & De Weerd, 2017).

Despite these limitations, our results strongly support the notion that gamma synchrony serves an important mechanistic role in figure-ground segregation. It is evident that local neural circuits in V1 oscillate at stimulus-dependent gamma frequencies (Baldi & Meir, 1990; Gray et al., 1989; Roberts, Lowet, Brunet, TerWal, Tiesinga, Fries, & de Weerd, 2013) and that they may achieve synchrony depending on the proximity (Eckhorn, 1994; Lowet, Roberts, Hadjipapas, Peter, van der Eerden, et al., 2015; Lowet, Roberts, Peter, Gips, & de Weerd, 2017; Singer & Gray, 1995) and similarity of local elements (Lowet, Roberts, Hadjipapas, Peter, van der Eerden, et al., 2015; Lowet, Roberts, Peter, Gips, & de Weerd, 2017). Furthermore, oscillations have been shown to facilitate learning through spike-timing dependent plasticity (Masquelier et al., 2009), rendering an oscillation-based Hebbian learning mechanism biologically plausible. Our results indicate that these properties of neural oscillations and synchrony interact constructively to give rise not only to the perceptual skill of figure-ground segregation but also to its practice-induced enhancement.

The synchrony-based grouping mechanism studied here provides a theoretical framework for previous experimental results. Research on figure-ground segregation has primarily used texture stimuli, wherein features like contrast (Hadjipapas et al., 2015), spatial frequency (Bredfeldt & Ringach, 2002; Henriksson et al., 2008), color (Shapley & Hawken, 2011), orientation (Lamme, 1995) and movement direction (Lamme, 1995), differ in figure and background. It is well documented that the difference between figure and background in one or a combination of these features (Landy & Bergen, 1991; Motoyoshi & Nishida, 2001; Nothdurft, 1985a, 1991b, 1991a) as well as in population statistics (de Weerd et al., 1992; Nothdurft, 1985b) and the physical proximity among texture elements within a figure (de Weerd et al., 1992; Nothdurft, 1985b) are the main parameters that determine the accuracy of figure-ground segregation. Given the feature dependence of gamma frequency (Dubey & Ray, 2020; Hadjipapas et al., 2015; Henrie & Shapley, 2005; Roberts, Lowet, Brunet, TerWal, Tiesinga, Fries, & DeWeerd, 2013; Shapira et al., 2017) and the relationship between physical proximity and lateral connectivity in early visual cortex (Boucsein et al., 2011; Gilbert & Wiesel, 1983; Lowet, Roberts, Hadjipapas, Peter, Eerden, et al., 2015; Lowet, Roberts, Peter, Gips, & De Weerd, 2017; Stettler et al., 2002; Ts’o et al., 1986), all these manipulations affect frequency detuning or coupling strength among neural oscillators and may thus be explainable by a single synchrony-based grouping mechanism. Similar synchrony has furthermore been observed in the auditory cortex, where it facilitates the integration of sound features and the segregation of auditory streams (Giraud & Poeppel, 2012). This suggests that the synchrony-based grouping mechanism we propose could extend beyond visual processing. Future research could explore the applicability of our findings to other sensory modalities.

In conclusion, this study shows that figure-ground segregation performance can be well predicted by the factors that determine synchrony according to the theory of weakly coupled oscillators. Frequency detuning driven by contrast heterogeneity and coupling strength driven by physical distance interact constructively to give rise to the perceptual skill of figure-ground segregation as well as its practice-induced enhancement. Our results provide empirical evidence for a synchrony-based neural grouping mechanism wherein the documented dependence of gamma synchrony on stimulus features and element distance plays an important functional role. This research sheds light on the underlying mechanisms of visual perception and perceptual learning and highlights the importance of gamma oscillations and synchrony in the process of figure-ground segregation.

Methods

Behavioral Experiments

The study and its experimental procedures were approved by the local Ethical Committee of the Faculty of Psychology and Neuroscience (ERCPN).

Participants

Eight healthy volunteers (6 female, mean age = 23.75, standard deviation = 6.453) participated in this study. Sample size was determined based on comparable studies investigating visual perception and perceptual learning in humans (Intoy et al., 2024; Lange et al., 2020; Tesileanu et al., 2020). All participants had normal or corrected-to-normal visual acuity. After receiving full information about all procedures and the right to withdraw participation at any time, participants gave their written informed consent. All participants were compensated monetarily for their time.

Stimuli

Each texture stimulus consisted of a full-screen irregular grid of non-overlapping Gabor annuli with a diameter of 0.7°, a spatial frequency of 5.7 cycles/degree and a mean luminance of 60.76 cd/m² placed on a grey (60.76 cd/m²) background. Annuli contrasts were uniformly sampled from the full contrast range U[0,1], except for a rectangular figure region [(9 ± 0.7)° × (5 ± 0.4)°] whose contrasts were drawn from a second uniform distribution with range ζ, whose values were {0.01, 0.2575, 0.505, 0.7525, 1}. The figure region thus exhibited limited contrast heterogeneity, except when ζ = 1, which is identical to the (maximum) contrast heterogeneity of the background. The center of the figure region was placed at an eccentricity of (7 ± 1)°. The polar angle of the figure was varied on each trial with the condition that it was always completely inside a single visual field quadrant. The coarseness of the grid was expressed as a factor ρ that scales the average center-to-center distance between any pair of neighboring annuli in the whole texture. The values of ρ were {1, 1.125, 1.250, 1.375, 1.5}. Each annulus was initially placed on a regular grid and subsequently slightly shifted in a random direction by a distance chosen from a uniform distribution that ranged from zero to half of the edge-to-edge distance between neighboring annuli. All combinations of ζ and ρ yield 25 unique stimulus conditions.

Tasks and Procedure

The experiment consisted of nine consecutive sessions (eight training sessions and one transfer session) with a two-alternative forced choice design in which participants were required to indicate whether the rectangular figure was oriented horizontally or vertically by pressing the right and left arrow key, respectively. Responses were given with the middle and index fingers of the right hand. Each trial of the experiment started with the presentation of a fixation point (a small bright turquoise disk of 2° × 2°) for at least 1000 ms, during which accurate fixation was to be initiated (i.e., deviation < 2° from fixation point) to trigger stimulus presentation. Participants were required to maintain fixation throughout presentation of the stimulus (1000 ms, or less if a participant lost fixation or provided a response). Participants received feedback after each trial in the form of color changes (green for correct; red for incorrect) of the fixation point lasting for 500 ms. Feedback was followed by a 600 ms inter-trial interval during which an isoluminant (grey) screen was shown. When a participant’s gaze fell outside the fixation window during the fixation period preceding the stimulus, or during stimulus presentation, the trial was aborted. Aborted trials were repeated at a randomly chosen time during the experiment.

The 25 conditions defined by contrast heterogeneity and grid coarseness were aggregated into experimental blocks such that all 25 combinations were shown exactly once per 25-trial block in random order. Each participant completed 30 blocks (750 trials) in each of the sessions. The figure was placed in the lower right quadrant for the eight training sessions. In the transfer session, the figure was moved to the diagonally opposite (upper left) quadrant. Participants were made aware of the figure displacement but were not told in which quadrant to expect it.
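The block structure can be sketched as follows. This is an illustration in Python only; the experiment itself was run with Psychtoolbox-3 for MATLAB, and the function name, the seed handling, and the omission of repeated aborted trials are simplifying assumptions.

```python
import itertools
import random

CONTRAST_RANGES = [0.01, 0.2575, 0.505, 0.7525, 1.0]  # values of ζ
GRID_COARSENESS = [1.0, 1.125, 1.25, 1.375, 1.5]      # values of ρ


def make_session(num_blocks=30, seed=0):
    """Return a list of blocks; each block holds all 25 conditions once, shuffled."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    conditions = list(itertools.product(CONTRAST_RANGES, GRID_COARSENESS))
    session = []
    for _ in range(num_blocks):
        block = conditions[:]  # copy so that each block is shuffled independently
        rng.shuffle(block)
        session.append(block)
    return session
```

Shuffling within blocks, rather than across the whole session, keeps the 25 conditions evenly distributed over the course of a session.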

The experiment was conducted in a dimly lit room. A chin and headrest were used to support the participant’s head and to keep eye-screen distance constant at 57 cm. Stimuli were displayed on a 19-inch Samsung SyncMaster 940BF LCD monitor (Samsung, Seoul, South Korea; 60 Hz refresh rate, 1280 × 1024 resolution). Stimulus presentation and response recording were performed by Psychtoolbox-3 for MATLAB 64-Bit (Version 3.0.14 - Build date: Apr 6th, 2018), under Microsoft Windows. Fixation was monitored with a desktop-mounted Eyelink 1000 eye-tracker (SR Research Ltd., 500 Hz or 1000 Hz sampling frequency, < 0.01° RMS spatial resolution, eye-movement data were down-sampled to 250 Hz).

Statistical Analyses

We used generalized estimating equations (GEEs) to analyze main and interaction effects of our variables of interest (contrast heterogeneity, grid coarseness and session) on accuracy (discrimination performance). Observations were always grouped by participant, and we used an exchangeable variance structure to account for subject-specific variability. To investigate the relationship between the sizes of empirical and model Arnold tongues while accounting for subject variability, we used a mixed linear model (mixed LM).

To assess whether accuracy in the transfer session reflects a naive state, we used a model selection procedure based on the Akaike Information Criterion (AIC; Wagenmakers & Farrell, 2004). We fitted separate logistic regression models for each of our eight learning sessions. These models predict accuracy based on contrast heterogeneity and grid coarseness. To account for the within-subject design, we fitted an individual model for each participant in each session and evaluated its performance in terms of the likelihood of the outcomes observed in the transfer session for the corresponding participant. We then aggregated these likelihoods across participants within a session to derive a session-specific likelihood. The session-specific likelihoods were used to calculate session-specific AIC values. Subsequently, we computed ΔAIC as the difference between each AIC and the minimum AIC among all models. The relative likelihood of each model was determined as $e^{-0.5\,\Delta\mathrm{AIC}}$. Finally, we obtained the Akaike weight for each model by calculating the ratio of its relative likelihood to the sum of relative likelihoods across all models. Akaike weights represent the normalized likelihood of each model being the best fit for the transfer session data. Please note that when aggregating likelihoods across participants within a session, we assume that each participant’s model contributes equally to the session’s overall predictive capability. This approach may overlook individual variability. Generalized Estimating Equations are better suited to capture this variability, which is why we use them for the main statistical analysis. However, GEE does not straightforwardly support model comparison akin to our AIC-based approach.
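The chain from session-specific log-likelihoods to Akaike weights can be sketched as follows. The sketch assumes the standard definition AIC = 2k − 2 ln L with the same number of parameters k for every session model, in which case k cancels in ΔAIC. The function name and the example log-likelihoods in the usage note are hypothetical, not values from the study.

```python
import math


def akaike_weights(log_likelihoods, num_params):
    """Akaike weights from one aggregated log-likelihood per session model."""
    aic = [2 * num_params - 2 * ll for ll in log_likelihoods]
    min_aic = min(aic)
    delta_aic = [value - min_aic for value in aic]
    # Relative likelihood of each model: exp(-0.5 * ΔAIC)
    relative = [math.exp(-0.5 * delta) for delta in delta_aic]
    total = sum(relative)
    return [value / total for value in relative]
```

Calling `akaike_weights([-400.0, -380.0, -350.0, -390.0], num_params=3)` on these made-up values assigns almost all weight to the third model. Even moderate gaps in aggregated log-likelihood therefore produce weights that are numerically indistinguishable from one and zero.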

The GEE, mixed LM, and logistic regression analyses were performed with the statsmodels package (version 0.14.1) in Python 3.12.2.

Oscillator Model of V1

We model a small patch of V1 that receives input from a 6.7° × 6.7° square region of the visual field. The area of this square region matches the area of the rectangular figure region in our psychophysics experiments. The centre of this region is furthermore located at an eccentricity matching that of the figure. The corresponding patch of cortical surface in V1 was established through a complex-logarithmic topographic mapping of neuronal receptive field coordinates onto the cortical surface of V1 (Balasubramanian & Schwartz, 2002; Schwartz, 1980) with generic human parameter values (α = 0.7, α = 0.9; Polimeni et al., 2005). We model this V1 patch as a network of weakly coupled phase oscillators arranged on an n × n (n = 20) grid defined in visual space. Note that due to the nonlinear cortical mapping, this procedure implies that while receptive fields are equally spaced in the visual field, neural oscillators themselves are not equally spaced on the cortical surface. The phase of each neural oscillator evolves according to a Kuramoto model:

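$$ \frac{d\theta_i}{dt} = \omega_i + \frac{1}{N}\sum_{j=1}^{N} K^{s}_{ij}\,\sin\left(\theta_j - \theta_i\right) \tag{1} $$

where $\theta_i$ is the phase of the $i$th oscillator, $\omega_i$ its intrinsic frequency, $K^{s}_{ij}$ the coupling strength between oscillators $i$ and $j$ in session $s$ (note that $s$ is an index and not an exponent), and $N = n^2$ is the total number of oscillators.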
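A minimal sketch of the phase dynamics is given below. It assumes the standard Kuramoto form with the coupling sum normalized by N, that is, dθi/dt = ωi + (1/N) Σj Kij sin(θj − θi), and it uses a fixed-step forward Euler scheme purely for transparency. The simulations reported here used scipy's odeint instead. Function names, step size and the example values are illustrative assumptions.

```python
import math


def kuramoto_derivative(phases, omegas, coupling):
    """Right-hand side of the Kuramoto model for every oscillator."""
    n = len(phases)
    return [
        omegas[i]
        + sum(coupling[i][j] * math.sin(phases[j] - phases[i]) for j in range(n)) / n
        for i in range(n)
    ]


def simulate(phases, omegas, coupling, duration=1.0, dt=0.001):
    """Forward Euler integration; returns the list of phase vectors over time."""
    steps = int(round(duration / dt))
    trajectory = [list(phases)]
    for _ in range(steps):
        current = trajectory[-1]
        derivative = kuramoto_derivative(current, omegas, coupling)
        trajectory.append([p + dt * dp for p, dp in zip(current, derivative)])
    return trajectory
```

With two oscillators of equal intrinsic frequency and symmetric coupling, the phase difference Δ obeys dΔ/dt = −K sin Δ in this normalization, so the pair locks for any positive coupling. Detuning between the intrinsic frequencies is what makes locking conditional on coupling strength.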

Intrinsic Frequency

In accordance with electrophysiological findings, the intrinsic frequency of each oscillator is a function of the local contrast in its receptive field (Roberts, Lowet, Brunet, TerWal, Tiesinga, Fries, & de Weerd, 2013). Specifically, the typical firing rate ν (in Hz with corresponding ω = 2πν) of a neural circuit in V1 is a linear function of local contrast: ν = 25 + 0.25C. The local contrast received by each oscillator i is given by the weighted root-mean-squared (RMS) value of contrast (Frazor & Geisler, 2006):

$$ C_i = \sqrt{\frac{\sum_h w_{ih}\,\left(L_h - \bar{L}\right)^2 / \bar{L}^2}{\sum_h w_{ih}}} \tag{2} $$

where $L_h$ is the luminance of pixel $h$ in the stimulus, $\bar{L}$ is the mean luminance over all pixels, and $w_{ih}$ is the weight of pixel $h$ for oscillator $i$. The weighting was specific to each oscillator as it reflects its unique receptive field, which we modelled using an isotropic 2D Gaussian function:

$$ w_{ih} = \exp\left(-\frac{(x_h - X_i)^2 + (y_h - Y_i)^2}{2\sigma_i^2}\right) \tag{3} $$

Here, $(x_h, y_h)$ are the coordinates of the $h$th pixel, while $(X_i, Y_i)$ are the coordinates of the receptive field center of the $i$th oscillator. In addition, $\sigma_i$ is the size of the receptive field. We estimated receptive field sizes based on their location relative to the center of gaze. Specifically, receptive field diameter in V1 exhibits a threshold linear relationship with receptive field eccentricity (e; Freeman & Simoncelli, 2011) such that ⌀ = max(0.172e − 0.25, 1). We related the receptive field diameter to the standard deviation of a Gaussian in two steps. First, we related the diameter to the full width at half maximum (FWHM) of a Gaussian beam (Hill, 2007), $\mathrm{FWHM} = \tfrac{\sqrt{2\ln 2}}{2}\,⌀$. Then, we related the FWHM to the standard deviation, $\sigma = \mathrm{FWHM} / \left(2\sqrt{2\ln 2}\right)$. Combining these steps, the standard deviation is one fourth of the receptive field diameter.
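The receptive field model can be illustrated with a short sketch. It assumes an unnormalized isotropic Gaussian weight and a weighted root-mean-squared contrast in which squared luminance deviations are expressed relative to the mean luminance over all pixels and normalized by the summed weights. The pixel representation as (x, y, luminance) tuples and all function names are hypothetical.

```python
import math


def receptive_field_sigma(eccentricity):
    """Gaussian standard deviation: one fourth of the receptive field diameter."""
    diameter = max(0.172 * eccentricity - 0.25, 1.0)  # threshold-linear rule
    return diameter / 4.0


def gaussian_weight(px, py, cx, cy, sigma):
    """Isotropic 2D Gaussian weight of a pixel for a receptive field centre."""
    squared_distance = (px - cx) ** 2 + (py - cy) ** 2
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def local_rms_contrast(pixels, cx, cy, sigma):
    """Weighted RMS contrast; pixels is a list of (x, y, luminance) tuples."""
    mean_luminance = sum(lum for _, _, lum in pixels) / len(pixels)
    weights = [gaussian_weight(x, y, cx, cy, sigma) for x, y, _ in pixels]
    weighted_squares = sum(
        w * ((lum - mean_luminance) / mean_luminance) ** 2
        for w, (_, _, lum) in zip(weights, pixels)
    )
    return math.sqrt(weighted_squares / sum(weights))
```

At the figure eccentricity of 7°, the threshold-linear rule returns its floor of 1°, so the standard deviation in this sketch is 0.25°.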

Adaptive Coupling

The coupling strength K1ij between pairs of oscillators in the first session is a function of their cortical distance:

$$ K^{1}_{ij} = \gamma\, e^{-\lambda d_{ij}} \tag{4} $$

Here, γ is the maximum coupling strength and λ controls how fast the coupling strength decreases as a function of cortical distance ($d_{ij}$) between oscillators $i$ and $j$. We estimated γ and λ from previously published data relating coupling strength to cortical distance within V1 in two macaque monkeys (Lowet, Roberts, Peter, Gips, & de Weerd, 2017).
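The distance dependence of the initial coupling can be sketched as below, assuming an exponential decay of the form γ·exp(−λ·d) with Euclidean distances between cortical positions. The positions and the parameter values in the usage note are placeholders rather than the estimates derived from the macaque data, and setting self-coupling to zero is a convenience that does not affect the dynamics because sin(0) = 0.

```python
import math


def initial_coupling(positions, gamma, decay):
    """Coupling matrix that decays exponentially with pairwise cortical distance."""
    n = len(positions)
    coupling = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                distance = math.dist(positions[i], positions[j])
                coupling[i][j] = gamma * math.exp(-decay * distance)
    return coupling
```

For instance, `initial_coupling([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)], gamma=2.0, decay=0.5)` yields a symmetric matrix in which the nearer pair is coupled more strongly than the farther pair.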

Coupling strength in the remaining sessions is the result of an offline learning process that takes the experience accumulated over an individual training session into account. Specifically, learning in our model depends on the pairwise phase-locking value (PLV; Lachaux et al., 1999) between model oscillators accumulated over trials within one session. Phase-locking values were computed over the second half of the simulation period, which was subsampled to 50 timepoints. Accumulation across trials involves summing PLVs over trials, where the contribution of each trial is weighted by the probability that the model would produce a correct response on that trial. The weighted PLV is summarized in a matrix Q. To obtain the probability of a correct response (Pc) from model simulations, we related it to the degree of synchrony (r) among phase oscillators through a psychometric function: . Parameters of this function (i.e., μ0 and μ1) were estimated based on model simulations and empirical results from the first session.
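The phase-locking computation for a single pair of oscillators can be sketched as follows, using the standard definition of the PLV as the magnitude of the mean unit phasor of the phase difference. The accumulation step is written as a weighted average over trials. Dividing by the summed weights is an assumption made here to keep the result in [0, 1], and the function names are hypothetical.

```python
import cmath


def phase_locking_value(phases_i, phases_j):
    """PLV between two phase time series sampled at the same time points."""
    phasors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_i, phases_j)]
    return abs(sum(phasors)) / len(phasors)


def weighted_plv(trial_plvs, prob_correct):
    """Average per-trial PLVs, weighting each trial by its probability correct."""
    weighted_sum = sum(p * plv for p, plv in zip(prob_correct, trial_plvs))
    return weighted_sum / sum(prob_correct)
```

A constant phase lag between two oscillators gives a PLV of one regardless of the size of the lag, whereas phase differences spread evenly around the circle give a PLV of zero.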

The temporal evolution of pairwise coupling strength is given by a Hebbian-type learning rule:

$$ \frac{dK_{ij}}{dt} = \varepsilon\left(\gamma\, Q^{s}_{ij} - K_{ij}\right) \tag{5} $$

Here, $\varepsilon$ is a learning rate. Essentially, pairwise structural coupling approaches pairwise functional coupling, as measured by the weighted PLV within a session ($Q^{s}$), scaled by the maximum coupling strength γ. Integration of Equation 5 with respect to time yields

$$ K^{s+1}_{ij} = \gamma\, Q^{s}_{ij} + \left(K^{s}_{ij} - \gamma\, Q^{s}_{ij}\right) e^{-\varepsilon t} \tag{6} $$

Here, $t$ is the time between two sessions during which learning occurs (e.g., during sleep). Since neither $\varepsilon$ nor $t$ can be measured independently and neither is known a priori, we merged them into a single free parameter $E = \varepsilon t$. We refer to this as the effective learning rate. We adjusted the parameter $E$ to maximize the correspondence, measured by the weighted Jaccard similarity, between the distribution of performance observed in the second experimental session and the distribution of synchrony after letting the model learn according to Equation 6. The learning procedure thus depends on data from the first two sessions to establish a mapping from synchrony to performance (parameters μ0 and μ1 of the psychometric function linking synchrony to performance) and to estimate the effective learning rate, respectively. We kept these parameters fixed for predicting the results of sessions 3-8. To further disentangle data used for parameter tuning and data used for testing model predictions, we utilized a leave-one-out cross-validation procedure. We estimated all parameters from the first two sessions of seven of our eight participants, and then predicted the results of sessions 3-8 in the left-out participant. We repeated this procedure eight times, once per participant, and stored all results for further analysis.
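The between-session update can be sketched in closed form. The sketch assumes that the learning rule relaxes each coupling towards γ times the weighted PLV at rate ε, so that after an effective learning rate E = εt the new coupling is γQ + (K − γQ)·exp(−E). The function name and the example matrices are hypothetical.

```python
import math


def update_coupling(coupling, weighted_plv, gamma, effective_rate):
    """Move each coupling towards gamma * Q by a factor 1 - exp(-E)."""
    decay = math.exp(-effective_rate)
    return [
        [gamma * q + (k - gamma * q) * decay for k, q in zip(k_row, q_row)]
        for k_row, q_row in zip(coupling, weighted_plv)
    ]
```

With E = 0 the coupling is left unchanged, and for large E it converges to γQ. The single free parameter therefore sets how far structural coupling moves towards functional coupling between two sessions.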

Simulations

We simulated eight training sessions, each consisting of 30 blocks with 25 trials. Within each trial, we simulated a one-second stimulus monitoring interval assigned to a specific combination of contrast heterogeneity and grid coarseness. All simulations were performed in Python 3.12.2 using the odeint method from scipy’s (version 1.12.0) integrate submodule. For each simulated trial, we evaluated synchrony by measuring the radius (r ∈ [0,1]) of the Kuramoto order parameter given by

where θj is the phase of oscillator j. For each simulated trial, r was averaged over the second half of the trial duration and over all blocks.
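As an illustration of this synchrony measure, the following sketch computes the radius of the Kuramoto order parameter in its standard form, r = |(1/N) Σj exp(iθj)|, where N is the number of oscillators, and averages it over the second half of a trial. The function names and the layout of the phase trajectory (a list of time steps, each holding one phase per oscillator) are assumptions made for this example and are not taken from the authors' code.

```python
import cmath
import math


def order_parameter_radius(phases):
    """Radius r in [0, 1] of the Kuramoto order parameter for one time step."""
    n = len(phases)
    return abs(sum(cmath.exp(1j * theta) for theta in phases) / n)


def mean_radius_second_half(phase_trajectory):
    """Average r over the second half of a trial.

    phase_trajectory: sequence of time steps, each a sequence of phases (radians).
    """
    half = len(phase_trajectory) // 2
    late_steps = phase_trajectory[half:]
    return sum(order_parameter_radius(p) for p in late_steps) / len(late_steps)


if __name__ == "__main__":
    # Two oscillators start in antiphase (r near 0) and end in phase (r = 1).
    trajectory = [[0.0, math.pi], [0.0, math.pi], [1.0, 1.0], [1.0, 1.0]]
    print(mean_radius_second_half(trajectory))
```

Averaging over the second half only discards the initial transient, during which the oscillators have not yet settled into their steady-state phase relations.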

System Specifications

All analyses and simulations were performed as a Docker containerized Snakemake workflow executed on a single compute node of Maastricht University’s Data Science Research Infrastructure (DSRI). The node is equipped with two AMD EPYC 7551 32-Core Processors, has a nominal 512 GB of RAM, and operates on Fedora 37. The workflow utilized 30 of 64 available cores to simulate all blocks of a particular trial in parallel. To ensure that all results can be reproduced exactly, the random seed of our workflow was fixed at 1709026616.

Data Availability

The data acquired in the study can be accessed at https://zenodo.org/doi/10.5281/zenodo.10817186.

Code Availability

The code for data acquisition can be accessed at https://github.com/ccnmaastricht/TextureStimuli-FigureGround.git. The code for performing all analyses and simulations can be accessed at https://github.com/ccnmaastricht/NeuralSynchrony-FigureGround.

Supplementary Materials

Statistical Analysis of Model Simulation

We performed an analysis of model synchrony comparable to that performed on human psychophysics results. Specifically, we analyzed the main effects of contrast heterogeneity and grid coarseness as well as their interaction on model synchrony using logistic regression. Increased contrast heterogeneity significantly decreased synchrony (β = -12.71, 95% CI [-18.03, -7.40], Std. Error = 2.71, z = -4.688, p < 0.0001, OR = 0.00003, 95% CI for OR [0.00002, 0.0006]). Similarly, coarser grids also significantly decreased synchrony (β = -14.47, 95% CI [-17.55, -11.39], Std. Error = 1.57, z = -9.204, p < 0.0001, OR = 0.000005, 95% CI for OR [0.000002, 0.00003]). Furthermore, the interaction between contrast heterogeneity and grid coarseness was significant (β = 9.48, 95% CI [4.97, 13.99], Std. Error = 2.30, z = 4.116, p < 0.0001, OR = 13,000, 95% CI for OR [143, 1,186,281]). The direction and significance of these effects are consistent with human results.
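For readers less familiar with logistic regression output, the sketch below shows how an odds ratio and its confidence interval follow from a coefficient by exponentiation, using the rounded interaction coefficient and interval reported above as input. The function name is hypothetical, and this is only an illustration of the conversion, not the authors' analysis code.

```python
import math


def odds_ratio_with_ci(beta, ci_low, ci_high):
    """Convert a logistic regression coefficient and its CI to the odds-ratio scale."""
    return math.exp(beta), (math.exp(ci_low), math.exp(ci_high))


if __name__ == "__main__":
    # Interaction between contrast heterogeneity and grid coarseness (rounded values).
    odds_ratio, (lower, upper) = odds_ratio_with_ci(9.48, 4.97, 13.99)
    print(f"OR = {odds_ratio:.0f}, 95% CI [{lower:.0f}, {upper:.0f}]")
```

Values obtained from rounded coefficients can differ slightly from reported odds ratios, which are typically computed from unrounded estimates.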

Participant-Specific Texture Segregation Performance

To evaluate the robustness of our findings, we performed statistical analyses of the main effects of contrast heterogeneity and grid coarseness on discrimination accuracy using separate logistic regressions for each participant. As can be appreciated from Supplementary Table 1, the effect of contrast heterogeneity is significant for seven of our eight participants. The effect of grid coarseness is significant for six of our eight participants. Neither of the two factors displays a significant effect for participant 4. Indeed, Supplementary Figure 1 clearly shows that unlike the other participants, participant 4 does not exhibit a behavioral Arnold tongue in the first session.

Participant-Specific Learning Effects

To evaluate the robustness of our learning experiment, we performed statistical analyses of the main effects of contrast heterogeneity, grid coarseness, and session, as well as the interactions between session and the other two factors, on discrimination accuracy using separate logistic regressions for each participant. Supplementary Table 2 summarizes the main effects, whereas Supplementary Table 3 summarizes the interaction effects.

Supplementary Table 1. Results of logistic regression analysis for each participant.

Supplementary Figure 1. Individual behavioral Arnold tongues.

Average discrimination accuracy for each of the 25 discrete experimental conditions. Contrast heterogeneity translates into the variance of intrinsic frequencies (detuning) whereas grid coarseness translates into cortical distance (coupling strength). Panels a through h display results of participants 1 through 8, respectively.

As can be seen in Supplementary Table 2, the main effects of contrast heterogeneity and grid coarseness were significant for each participant, including participant 4. The effect of session was significant for only four of our eight participants, which is consistent with its small overall effect size. Supplementary Table 3 reveals that the interaction between session and contrast heterogeneity was significant for every individual participant. While the interaction between session and grid coarseness was not significant at the group level, it did reach significance for three of our eight participants individually.

Supplementary Table 2. Main effects of participant-specific logistic regression analyses of learning effects.

Supplementary Table 3. Interaction effects of participant-specific logistic regression analyses of learning effects.