The PPC as a slower integrator network.

(A) In any given trial, a pair of stimuli (here, sounds) separated by a variable delay interval is presented to a subject. After the second stimulus, and after a go cue, the subject must decide which of the two sounds is louder by pressing a key (humans) or nose-poking in an appropriate port (rats). (B) The stimulus set. The stimuli are linearly separable, and stimulus pairs are equally distant from the s1 = s2 diagonal. Error-free performance corresponds to network dynamics from which it is possible to classify all the stimuli below the diagonal as s1 > s2 (shown in blue) and all stimuli above the diagonal as s1 < s2 (shown in red). An example of a correct trial can be seen in (E). In order to assay the psychometric threshold, several additional pairs of stimuli are included (purple box), where the distance to the diagonal s1 = s2 is systematically changed. The colorbar expresses the fraction classified as s1 < s2. (C) Schematics of contraction bias in delayed comparison tasks. Performance is a function of the difference between the two stimuli, and is impacted by contraction bias, where the base stimulus s1 is perceived as closer to the mean stimulus. This leads to a better/worse (green/red area) performance, depending on whether this “attraction” increases (Bias+) or decreases (Bias−) the discrimination between the base stimulus s1 and the comparison stimulus s2. (D) Our model is composed of two modules, representing working memory (WM), and sensory history (PPC). Each module is a continuous one-dimensional attractor network. Both networks are identical except for the timescales over which they integrate external inputs; PPC has a significantly longer integration timescale and its neurons are additionally equipped with neuronal adaptation. The neurons in the WM network receive input from those in the PPC, through connections (red lines) between neurons coding for the same stimulus. 
Neurons (gray dots) are arranged according to their preferential firing locations. The excitatory recurrent connections between neurons in each network are a symmetric, decreasing function of their preferential firing locations, whereas the inhibitory connections are uniform (black lines). For simplicity, connections are shown for a single pre-synaptic neuron (where there is a bump in green). When a sufficient amount of input is given to a network, a bump of activity is formed, and sustained in the network when the external input is subsequently removed. This activity in the WM network is read out at two time points: slightly before and after the onset of the second stimulus, and is used to assess performance. (E) The task involves the comparison of two sequentially presented stimuli, separated by a delay interval (top panel, black lines). The WM network integrates and responds to inputs quickly (middle panel), while the PPC network integrates inputs more slowly (bottom panel). As a result, external inputs (corresponding to stimulus 1 and 2) are enough to displace the bump of activity in the WM network, but not in the PPC. Instead, inputs coming from the PPC into the WM network are not sufficient to displace the activity bump, and the trial is consequently classified as correct. In the PPC, instead, the activity bump corresponds to a stimulus shown in previous trials.
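The connectivity described in (D) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the exponential kernel, the function name `line_connectivity`, and the values of `n_neurons`, `j_exc`, `j_inh` and `width` are hypothetical and are not the parameters of the model.

```python
import numpy as np

def line_connectivity(n_neurons=100, j_exc=1.0, j_inh=0.2, width=0.05):
    """Hypothetical weight matrix for a 1D continuous attractor network."""
    # Preferred firing locations, evenly spaced on [0, 1] (assumed layout).
    x = np.linspace(0.0, 1.0, n_neurons)
    # Pairwise distance between preferred firing locations.
    dist = np.abs(x[:, None] - x[None, :])
    # Excitation: symmetric and decreasing with distance
    # (the exponential form is an assumption).
    excitation = j_exc * np.exp(-dist / width)
    # Inhibition: uniform across all pairs of neurons.
    weights = excitation - j_inh
    # No self-connections (assumption).
    np.fill_diagonal(weights, 0.0)
    return x, weights

x, w = line_connectivity()
```

With these placeholder values, nearby neurons excite each other while distant pairs interact only through the uniform inhibition, which is the ingredient that lets a localized bump form and persist.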

Contraction bias and short-term sensory history effects as a result of PPC network activity.

(A) Performance of the network model for psychometric stimuli (colored lines) is not error-free (green dashed lines). A shorter inter-stimulus delay interval yields better performance. (B) Errors occur due to the displacement of the bump representing the first stimulus s1 in the WM network. Depending on the direction of this displacement with respect to s2, this can give rise to trials in which the comparison task becomes harder (easier), leading to negative (positive) biases (top and bottom panels). Top sub-panel: stimuli presented to both networks in time. Middle/bottom sub-panels: activity of the WM and PPC networks (in green). (C) Left: performance is affected by contraction bias, a gradual accumulation of errors for stimuli below (above) the diagonal upon increasing (decreasing) s1. Colorbar indicates fraction of trials classified as s1 < s2. Middle and Right: for comparison, data from the auditory version of the task performed in humans and rats. Data from Ref. [7]. (D) Panel 1: For each combination of current (x-axis) and previous trial’s stimulus pair (y-axis), fraction of trials classified as s1 < s2 (colorbar). Performance is affected by the preceding trial’s stimulus pair (modulation along the y-axis). For readability, only some tick-labels are shown. Panel 2: bias, quantifying the (attractive) effect of previous stimulus pairs. Colored lines correspond to linear fits of this bias for each pair of stimuli in the current trial. Black dots correspond to the average over all current stimuli, and the black line is a linear fit. These history effects are attractive: the smaller the previous stimulus, the higher the probability of classifying the first stimulus of the current trial s1 as small, and vice-versa. Panel 3: human auditory trials. Percentage of trials in which humans chose left for each combination of current and previous stimuli; vertical modulation indicates the attractive effect of the preceding trial. 
Panel 4: Percentage of trials in which humans chose left minus the average value of left choices, as a function of the stimuli of the previous trial, for fixed previous trial response choice and reward. Panels 5 and 6: same as panels 3 and 4 but with rat auditory trials. Data from Ref. [7]. (E) Top: performance of the network for psychometric stimuli improves when the weights from the PPC to the WM network are weakened (yellow curve), relative to the intact network (black curve). Bottom: psychometric curves for rats (only shown for one rat) are closer to error-free during PPC inactivation (yellow) than during control trials (black). (F) Left: the attractive bias due to the effect of the previous trial is present with the default weights (black line), but is eliminated with reduced weights (yellow line). Right: while there is bias induced by previous stimuli in the control experiment (black), this bias is reduced under PPC inactivation (yellow). Experimental figures reproduced with permission from Ref. [7].

Multiple timescales at the core of short-term sensory history effects.

(A) Schematics of activity bump dynamics in the WM vs PPC network. Whereas the WM responds quickly to external inputs, the bump in the PPC drifts slowly and adapts, until it is extinguished and a new bump forms. (B) The location of the activity bump in both the PPC (pink line) and the WM (purple line) networks, immediately before the onset of the second stimulus s2 of each trial. This location corresponds to the amplitude of the stimulus being encoded. The bump in the WM network closely represents the stimulus s1 (shown in colored dots, each color corresponding to a different delay interval). The PPC network, instead, being slower to integrate inputs, displays a continuous drift of the activity bump across a few trials, before it jumps to a new stimulus location, due to the combined effect of inhibition from incoming inputs and adaptation that extinguishes previous activity. (C) Fraction of trials in which the bump location corresponds to the base stimulus that has been presented in the current trial, as well as the two preceding trials ( to ). In the WM network, in the majority of trials, the bump coincides with the first stimulus of the current trial . In a smaller fraction of the trials, it corresponds to the previous stimulus , due to the input from the PPC. In the PPC network instead, a smaller fraction of trials consist of the activity bump coinciding with the current stimulus . Relative to the WM network, the bump is more likely to coincide with the previous trial’s comparison stimulus . (D) During the inter-stimulus delay interval, in the absence of external sensory inputs, the activity bump in the WM network is mainly sustained endogenously by the recurrent inputs. It may, however, be destabilized by the continual integration of inputs from the PPC. (E) As a result, with an increasing delay interval, given that more errors are made, contraction bias increases. 
Green (orange) bars correspond to the performance in Bias+ (Bias−) regions, relative to the mean performance over all pairs (Fig. 1 C). (F) Left and middle: longer delay intervals allow for longer integration times, which in turn lead to more frequent WM disruptions due to previous trials and hence to larger previous-trial attractive biases (2 s vs. 6 s vs. 10 s). Right: weak repulsive effects become apparent for larger delays. Colored dots correspond to the bias computed for different values of the inter-stimulus delay interval, while colored lines correspond to their linear fits. (G) When neuronal adaptation is at its lowest in the PPC, i.e., following a bump jump, the WM bump is maximally susceptible to inputs from the PPC. The attractive bias (towards previous stimuli) is present in trials in which a jump occurred in the previous trial (black triangles, with the black line a linear fit). Such biases are absent in trials where no jump occurred in the previous trial (black dots, with the dashed line a linear fit). Colored lines correspond to the bias for specific pairs of stimuli in the current trial, regular lines for the jump condition, and dashed lines for the no-jump condition.

Errors are drawn from the marginal distribution of stimuli, giving rise to contraction bias.

(A) A simple mathematical model illustrates how contraction bias emerges as a result of a volatile working memory for s1. A given trial consists of two stimuli and . We assume that the encoding of the second stimulus is error-free, in contrast to the first stimulus, which is prone to change with probability ϵ. Furthermore, when s1 does change, it is replaced by another stimulus, ŝ (imposed by the input from the PPC in our network model). Therefore, ŝ is drawn from the marginal distribution of bump locations in the PPC, which is similar to the marginal stimulus distribution (see panel B), pm (see also Sect. 4.2). Depending on the new location of ŝ, the comparison to s2 can either lead to an erroneous choice (Bias−, with probability pe) or a correct one (Bias+, with probability pc = 1 − pe). (B) The bump locations in both the WM network (in pink) and the PPC network (in purple) have identical distributions to that of the input stimulus (marginal over s1 or s2, shown in gray). (C) The distribution of bump locations in PPC (from which replacements ŝ are sampled) is overlaid on the stimulus set, and repeated for each value of s2. For pairs below the diagonal, where s1 > s2 (blue squares), the trial outcome will be an error if the displaced WM bump ŝ ends up above the diagonal (red section of the pm distribution). The probability of making an error, pe, equals the integral of pm over values above the diagonal (red part), which increases as s1 increases. Vice versa, for pairs above the diagonal (s1 < s2, red squares), pe equals the integral of pm over values below the diagonal, which increases as s1 decreases. (D) The performance of the attractor network as a function of the first stimulus s1, in red dots for pairs of stimuli where s1 > s2, and in blue dots for pairs of stimuli where s1 < s2. The solid lines are fits of the performance of the network using Eq. 9, with ϵ as a free parameter. 
(E) Numbers correspond to the performance, same as in (D), while colors express the fraction classified as s1 < s2 (colorbar), to illustrate the contraction bias. (F) Performance of rats performing the auditory delayed-comparison task in Ref. [7]. Dots correspond to the empirical data, while the lines are fits with the statistical model, using the distribution of stimuli. The additional parameter δ captures the lapse rate. (G) Same as (F), but with humans performing the task. Data in (F) and (G) reproduced with permission from Ref. [7].
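The resampling argument of panels (A) and (C) can be written out as a short sketch. It assumes a discrete stimulus set, error-free comparison whenever s1 is retained, and ties (ŝ = s2) resolved at chance; the function name `predicted_performance`, the tie rule and the example values are hypothetical, and the expression is not claimed to be identical to Eq. 9. The lapse rate δ of panel (F) is not included.

```python
import numpy as np

def predicted_performance(s1, s2, values, p_m, eps):
    """Probability of a correct choice when s1 is resampled with probability eps.

    values : possible stimulus values (discrete set, assumed)
    p_m    : marginal distribution over `values` from which s-hat is drawn
    """
    values = np.asarray(values, dtype=float)
    p_m = np.asarray(p_m, dtype=float)
    p_m = p_m / p_m.sum()  # normalise, in case counts were passed
    if s1 > s2:
        # Error if the replacement falls on the other side of s2 (s-hat < s2).
        p_err = p_m[values < s2].sum()
    else:
        # Error if the replacement ends up above s2 (s-hat > s2).
        p_err = p_m[values > s2].sum()
    # Ties (s-hat == s2) are resolved at chance: an assumption of this sketch.
    p_err += 0.5 * p_m[np.isclose(values, s2)].sum()
    # With probability 1 - eps the memory of s1 is intact and the trial is correct.
    return 1.0 - eps * p_err

values = [1, 2, 3, 4, 5]
uniform = [1, 1, 1, 1, 1]
low_pair = predicted_performance(2, 1, values, uniform, eps=0.25)   # s1 > s2, small s1
high_pair = predicted_performance(5, 4, values, uniform, eps=0.25)  # s1 > s2, large s1
```

Under these assumptions, performance on s1 > s2 pairs falls as s1 increases (`low_pair` exceeds `high_pair`), which is the signature of contraction bias described in (C).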

The stimulus distribution impacts the pattern of contraction bias through its cumulative.

(A) Left panel: prediction of performance (left y-axis) of our statistical model (solid lines) and the Bayesian model (dashed lines) for a negatively skewed stimulus distribution (gray bars, to be read with the right y-axis). Blue (red): performance as a function of s1 for pairs of stimuli where s1 > s2 (s1 < s2). Vertical dashed line: median of the distribution. Right: same as left, but for a bimodal distribution. (B) The distribution of performance across different stimulus pairs and subjects for the negatively skewed (gray) and the bimodal distribution (black). On average, across both distributions, participants performed with an accuracy of 75%. (C) Left: mean performance of human subjects on the negatively skewed distribution (dots, error bars correspond to the standard deviation across different participants). Solid (dashed) lines correspond to fits of the mean performance of subjects with the statistical (Bayesian) model, ϵ = 0.55 (σ = 0.38). Red (blue): performance as a function of s1 for pairs of stimuli where s1 < s2 (s1 > s2), to be read with the left y-axis. The marginal stimulus distribution is shown in gray bars, to be read with the right y-axis. Right: same as left panel, but for the bimodal distribution. Here ϵ = 0.54 (σ = 0.73). (D) Left: goodness of fit, as expressed by the mean-squared-error (MSE) between the empirical curve and the fitted curve (statistical model on the x-axis and the Bayesian model on the y-axis), computed individually for each participant and each distribution. Right: goodness of fit, computed for the average performance over participants in each distribution.

Attractive effects of the previous trials lead to contraction bias in human subjects, both increasing with delay interval.

(A) Left: bias, quantifying the (attractive) effect of previous stimulus pairs, for 1 to 3 trials back in history. The attractive bias increases with the delay interval separating the two stimuli (light to dark green: increasing delay). Right: fraction of trials (colorbar) where participants responded with s1 < s2, for each combination of the current (x-axis) and previous (y-axis) trial’s stimulus pair. (B) Same as (A), left, but for the bimodal distribution shown in Fig. 5 A, right panel. (C) Top: human performance on trials with a 2-second delay interval. Bottom: same for a 4-second delay interval. Colorbar expresses the fraction of trials in which participants responded that s1 < s2. (D) Amount of bias computed separately on Bias+ and Bias− trials, for all delay intervals and the two stimulus distributions tested (negatively skewed in gray and bimodal in black).

A prolonged inter-trial interval (ITI) improves average performance and reduces attractive bias. Working memory is attracted towards short-term and repelled from long-term sensory history.

(A) Performance of the network model for the psychometric stimuli improves with an increasing inter-trial interval. Error bars (not visible) correspond to the s.e.m. over different simulations. (B) The network performance is on average better for longer ITIs (right panel, ITI = 10 s), compared to shorter ones (left panel, ITI = 1.2 s). Colorbar indicates the fraction of trials classified as s1 < s2. (C) Quantifying contraction bias separately for Bias+ trials (green) and Bias− trials (orange) yields a decreasing bias as the inter-trial interval increases. (D) Bias, quantifying the (attractive) effect of the two previous trials. Different shades of purple correspond to different values of the ITI, with dots corresponding to simulation values and lines of the same color to linear fits. The one-trial-back history effects are attractive: the larger the previous stimulus, the higher the probability of classifying the first stimulus s1 as large, and vice-versa. The attractive bias is larger for a smaller ITI (light purple, ITI = 1.2 s), and smaller for a larger ITI (dark purple, ITI = 10 s). (E) Performance is affected by the previous stimulus pairs (modulation along the y-axis), more for a short ITI (left, ITI = 1.2 s) than for a longer ITI (right, ITI = 10 s). The colorbar corresponds to the fraction classified as s1 < s2. For readability, only some labels are shown. (F) Although the stimuli shown up to two trials back yield attractive effects, those further back in history yield repulsive effects, notably when the ITI is larger. Such repulsive effects extend up to 6 trials back.

Apparent trade-off between short- and long-term biases, controlled by the timescale of neural adaptation.

(A) The bias exerted on the current trial by the previous trial (see main text for how it is computed), for three values of the adaptation timescale that mimic the behavior of the three cohorts of subjects. (B) As in (Fig. 2 D), for three different values of the adaptation timescale. The colorbar corresponds to the fraction classified as s1 < s2. (C) GLM weights corresponding to the three values of the adaptation parameter marked in (Fig. S6 A), including up to 4 trials back. In a GLM variant incorporating a small number of past trials as regressors, the model yields a high weight for the running mean stimulus regressor. Error bars correspond to the standard deviation across different simulations. (D) Same as in (C), but including regressors corresponding to the past 10 trials as well as the running mean stimulus. With a larger number of regressors extending into the past, the model yields a small weight for the running mean stimulus regressor. Error bars correspond to the standard deviation across different simulations. (E) The weight of the running mean stimulus regressor decays as the number of previous-trial stimulus regressors increases.
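To make the regressor structure in (C) to (E) concrete, the following sketch builds a design matrix with current stimuli, lagged previous-trial stimuli and a running mean. The choice of the pair mean as the previous-trial regressor, the definition of the running mean over all preceding trials, and the name `history_design_matrix` are assumptions made for illustration; the GLM actually used is described in the main text.

```python
import numpy as np

def history_design_matrix(s1, s2, n_back):
    """Columns: s1, s2, pair mean at lags 1..n_back, running mean stimulus."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    n_trials = len(s1)
    # Summarise each trial by the mean of its stimulus pair (assumption).
    pair_mean = 0.5 * (s1 + s2)
    # Running mean over all trials strictly before the current one (assumption).
    running_mean = np.zeros(n_trials)
    running_mean[1:] = np.cumsum(pair_mean)[:-1] / np.arange(1, n_trials)
    columns = [s1, s2]
    for k in range(1, n_back + 1):
        lagged = np.zeros(n_trials)
        lagged[k:] = pair_mean[:-k]  # stimulus summary k trials back
        columns.append(lagged)
    columns.append(running_mean)
    design = np.column_stack(columns)
    # Drop the first n_back trials, which lack a full history.
    return design[n_back:]

X = history_design_matrix([1, 2, 3, 4], [2, 3, 4, 5], n_back=2)
```

Because the running mean is itself a weighted sum of past stimuli, it is correlated with the lagged columns; adding more lags would be expected to absorb part of its weight, in line with the decay described in (E).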

Dynamics of responses in a one-dimensional continuous attractor network, in the presence of adaptation.

(A) We study a one-dimensional line attractor in which neurons code for a stimulus feature that varies along a physical dimension, such as the amplitude of an auditory stimulus. The connection strength between pairs of neurons is a decreasing, symmetric function of the distance between their preferred firing locations, allowing for a bump of activity to form and self-sustain when sufficient input is given to the network. However, this self-sustaining activity may be disrupted if neuronal adaptation is present. In particular, drifting dynamics may be observed. (B) Left: phase diagram of the average drift velocity as a function of the adaptation timescale and amplitude DP. The average drift velocity is simply computed as the distance travelled by the center of the bump in a duration of 50 seconds. Color codes for the average drift velocity (a.u.). Numbers indicate four points for which sample dynamics are shown in (C). (C) We observe three main phases: in the first, the activity bump is stable when little or no neuronal adaptation is present (point 4). Larger values of neural adaptation induce drift of the activity bump; the average drift velocity increases upon increasing the neural adaptation (points 2 and 3). Finally, increasing it even further leads to the dissipation of the activity bump (point 1). The boundary between the drift and dissipation phases is abrupt. In these simulations, periodic boundary conditions have been used in order to compute the average drift velocity over longer durations.
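A minimal sketch of the drift-velocity measure in (B), assuming the bump center has already been extracted as a time series on a ring of circumference `period`. Interpreting "distance travelled" as net displacement, the sampling of the trajectory, and the name `average_drift_velocity` are assumptions of this example.

```python
import numpy as np

def average_drift_velocity(bump_center, duration=50.0, period=1.0):
    """Net displacement of the bump center divided by the elapsed time."""
    center = np.asarray(bump_center, dtype=float)
    # Map positions to phases so that crossings of the periodic boundary
    # are not counted as large jumps, then unwrap and map back.
    phase = 2.0 * np.pi * center / period
    unwrapped = np.unwrap(phase) * period / (2.0 * np.pi)
    return abs(unwrapped[-1] - unwrapped[0]) / duration

# Synthetic trajectory: constant drift of 0.03 (a.u.) per second for 50 s,
# wrapping around a ring of circumference 1.
t = np.linspace(0.0, 50.0, 501)
trajectory = (0.03 * t) % 1.0
velocity = average_drift_velocity(trajectory)
```

The unwrapping step only works if the bump moves less than half the ring between consecutive samples, so the trajectory should be sampled finely enough for the fastest drift of interest.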

The role of neural adaptation in short-term history biases.

In order to better understand the network mechanisms that give rise to short-term history effects, we removed neural adaptation in the PPC network and assessed the performance in the WM network. (A) As in (Fig. 3 B). We track the location of the bump in the PPC (pink) and in the WM network (purple) before the onset of the second stimulus (the pink curve cannot be seen, as the purple curve lies exactly on top of it). In this case, the displacement of the bump of activity is smooth and new sensory stimuli (colored dots) induce only a minimal shift in the location of the bump. This behavior is to be contrasted with the case in which there is adaptation in the PPC network, inducing jumps in the bump location (Fig. 3 A). An additional effect of removing neural adaptation is that the activity in the PPC network completely overrides the activity in the WM network. (B) As in (Fig. 3 C). We compute the fraction of times the bump is in a given location, current trial , four preceding trials ( to ), the running mean stimulus, or all other locations (overlapping sets). In this case, in the majority of the trials, the bump is either at the running mean stimulus, or any other location. The fraction of trials in which it is in the position of the four previous stimuli roughly corresponds to chance occurrence (dashed black lines), with only a minor increase for the current stimulus. (C) As in (Fig. 2 C). In this setting, the performance expresses a very strong contraction bias, and it is as if the decision boundary is orthogonal to the optimal decision boundary. Color codes for fraction of trials in which a s1 > s2 classification is made. (D) As in (Fig. 2 D). Left: the network behavior conditioned on the previous trial stimulus pair does not exhibit any previous-trial history dependence (vertical modulation). Colorbar corresponds to the fraction classified as s1 < s2. Right: this can also be expressed through the bias measure (see main text for how it is computed). 
Colored lines correspond to current trial pairs, the black dots to the mean over all current trial pairs, and the black line to its linear fit. (E) As in (Fig. 4 B). Marginal distribution of the bump location in both networks (pink for PPC, purple for WM) before the onset of s2 is more peaked than the marginal distribution of the stimuli (gray), as a result of the absence of “jumps”.

Inactivating the inputs from the PPC network improves performance, in line with experimental findings.

(A) As in (Fig. 2 C). The performance of the network when the strength of the inputs from the PPC to the WM network is weakened (modelling the optogenetic inactivation of the PPC) is dramatically improved, and contraction bias is virtually eliminated. The colorbar corresponds to the fraction classified as s1 < s2. (B) As in (Fig. 2 D). The performance for each stimulus pair in the current trial is improved and no modulation by the previous stimulus pairs can be observed. The colorbar corresponds to the fraction classified as s1 < s2. (C) As in (Fig. 3 B). This improvement of the performance can be traced back to how well the activity bump in the WM network (in purple), before the onset of the second stimulus s2, tracks the first stimulus s1 (shown in colored dots, each corresponding to a different value of the inter-stimulus delay interval). Relative to the case in which inputs from the PPC are intact (Fig. 3 A), it can be seen that the location of the bump tracks the first stimulus with high fidelity. The activity in the PPC (in pink), instead, is identical to that shown previously (Fig. 5 A), as all the other parameters are kept constant. (D) As in (Fig. 2 C). The bump location can be quantified not only for the stimulus s1 of the current trial (colored dots, each color corresponding to a given delay interval), but for the four preceding stimuli from the two previous trials (from back to ). With weaker inputs from the PPC (pink), the WM (purple) function of the circuit is disrupted less frequently, and in the majority of the trials, the bump of activity corresponds to the first stimulus .

The stimulus distribution impacts the pattern of contraction bias.

The model makes different predictions for the performance, depending on the shape of the stimulus distribution. (A) Panel 1: schema of model prediction. Regions shaded in red correspond to the probability of correct comparison, for stimulus pairs above the diagonal, when replacing s1 with a random value sampled from the marginal distribution with a resampling probability ϵ = 0.25 (see Fig. 4). Panel 2: prediction of both models for a unimodal symmetric (in this case quasi-uniform) stimulus distribution, statistical model (solid line) and Bayesian model (dashed line). The marginal stimulus distribution is shown in gray bars (to be read with the right y-axis). The value of s1 for which there is equal performance for pairs of stimuli below and above the diagonal is indicated by the vertical dashed line, corresponding to the median of the distribution. Panel 3: for each stimulus pair, fraction of trials classified as s1 < s2 (colorbar), for the statistical model. Panel 4: same as panel 3, but for the Bayesian model of equal average performance, corresponding to a width of the likelihood of σ = 0.08 (see Sect. 1.5.1 and Sect. 4.3). (B) Similar to A, for a negatively skewed distribution. (C) Similar to A, for a positively skewed distribution. (D) Similar to A, for a bimodal distribution.

Model predictions for a block design.

(A) As in (Fig. 2 A). Performance of the network model for the psychometric stimuli improves with a short delay interval and worsens as this delay is increased. (B) As in (Fig. 2 C). Performance is affected by contraction bias – a gradual accumulation of errors for stimuli below (above) the diagonal upon increasing (decreasing) s1. As the delay interval increases, the contraction bias is increased which results in reduced performance across all pairs. Colorbar indicates the fraction of trials classified as s1 < s2. (C) As in (Fig. 3 C). The location of the bump that corresponds to the value of s1 occupies a smaller fraction of trials, as the delay interval increases. (D) As in (Fig. 2 D). Performance is affected by the previous stimulus pairs (modulation along the y-axis), and becomes worse as the delay interval is increased. The colorbar corresponds to the fraction classified s1 < s2. (E) As in (Fig. 3 F). Bias, quantifying the (attractive) effect of the previous stimulus pairs, each color corresponding to a different delay interval. These history effects are attractive: the larger the previous trial stimulus pair, the higher the probability of classifying the first stimulus s1 as large, and vice-versa. Middle/right panels: same as the left panel, for stimuli extending two and three trials back. (F) Quantifying contraction bias separately for Bias+ trials (green) and Bias− trials (orange) yields an increasing bias as the inter-stimulus interval increases.

Apparent trade-off between short- and long-term biases, controlled by the timescale of neural adaptation.

(A) Left: GLM weight associated with the regressor corresponding to the mean stimulus across trials (value indicated by colorbar), as a function of the strength of the weights from the PPC to the WM network (x-axis), and the adaptation timescale in the PPC (y-axis). Right: same as left panel, but displaying the GLM weight associated with the regressor corresponding to the previous trial’s stimulus. These two panels indicate that the adaptation timescale seemingly exerts a trade-off between the two biases: decreasing it increases short-term sensory history biases, whereas increasing it increases long-term sensory history biases. The values of the adaptation parameter marked by the three colored dots (in red, blue and green) can mimic behaviors similar to those of dyslexic, neurotypical, and autistic spectrum subjects (see also Fig. 8). (B) Left: phase diagram of the fraction of trials in which the activity bump at the end of the delay interval is in the location of the running mean stimulus, as a function of the strength of the weights from the PPC to the WM network (x-axis), and the adaptation timescale in the PPC (y-axis). Right: same as left, but for the location of either of the two stimuli presented in the previous trial. (C) The fraction of trials in which the activity bump at the end of the delay interval corresponds to the different locations shown on the x-axis, for three different values of the adaptation timescale parameter (shown in colors) that yield behavior qualitatively similar to that of dyslexic, neurotypical, and autistic spectrum subjects.

Simulation parameters, when not explicitly mentioned. Used to produce Figs. 1, 2, 3, 7, S2, S3, S5.

Simulation parameters for Fig. S1.

Simulation parameters for Fig. 8 and Fig. S6. Other parameters as in Tab. 1.