Experimental design and behavioural results.

a, Timeline of task events in a single trial. b, Participants learnt by trial and error to combine two task features (colour and shape) in a non-linear fashion (XOR) and respond by pressing either the left or the right response button. For example, blue+square and green+diamond combined with a left button press (XOR == False (0)) were followed by positive feedback; blue+diamond and green+square combined with a right button press (XOR == True (1)) were also followed by positive feedback. Any other combination was followed by negative feedback. b, The task consisted of three features: colour (n = 4), shape (n = 2) and XOR conditions (n = 2; aligned fully with left and right button responses). c, Participants could generalise the meaning of colour 1 (e.g. blue) to colour 3 (e.g. pink), and vice versa, as both shared the same shape-response mapping. This could result in colours being grouped by the contextual information they provided for the shape-response mapping. d, The cue stimulus (first feature) had two dimensions: the task-irrelevant colour dimension and the task-relevant context dimension. e, In context switch trials, the context of the current trial (trial 3) changed compared to the previous trial (trial 2), whereas in context stay trials the context of the current trial (trial 2) remained the same as in the previous trial (trial 1). f, In shape switch trials, the shape of the current trial (trial 3) changed compared to the previous trial (trial 2), whereas in shape stay trials the shape of the current trial (trial 2) remained the same as in the previous trial (trial 1). g, In colour switch trials, the colour of the current trial (trial 2) changed compared to the previous trial (trial 1), whereas in colour stay trials the colour of the current trial (trial 3) remained the same as in the previous trial (trial 2).
h, Mean performance accuracy plotted as a function of learning stage; the shaded area indicates 95% CIs obtained through random resampling with replacement (n = 1000); accuracy in stage 1 was compared to accuracy in stage 4. i, Accuracy is lower on context switch trials (the context of the current trial changed compared to the previous trial) than on context stay trials (the context of the current trial remained the same as in the previous trial); horizontal lines represent individual participants. j, Accuracy is also lower on context switch trials than on shape switch trials (the shape of the current trial changed compared to the previous trial). Plotting conventions analogous to i. k, Comparison of context switch trials with colour switch trials (the colour of the current trial changed compared to the previous trial; as the context remained the same, this measure captures changes caused by sensory differences only). l-o, Analogous to panels h-k but plotted for median reaction times (before transformation). All p-values were calculated using t-tests for dependent samples (***, p < 0.001; **, p < 0.01; *, p < 0.05; †, p < 0.1; n.s., not significant); reaction times were log-transformed prior to running parametric tests.
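The two statistical procedures named in this caption can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' analysis code: it assumes a percentile bootstrap of the mean for the 95% CIs and computes only the paired t statistic and its degrees of freedom on log-transformed reaction times. The function names and the seed are hypothetical.

```python
import math
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Indices of the alpha/2 and 1 - alpha/2 percentiles, clamped to range.
    lo_idx = max(0, round(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


def paired_t_on_log_rt(rt_a, rt_b):
    """Paired (dependent-samples) t statistic on log-transformed RTs.

    Returns (t, degrees_of_freedom); the p-value lookup is omitted here.
    """
    diffs = [math.log(a) - math.log(b) for a, b in zip(rt_a, rt_b)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Each list passed to these functions would hold one value per participant, for example per-participant median reaction times on context switch and context stay trials.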

Participants maintain a context signal and construct an XOR representation over learning.

a, Time-resolved decoding of context; horizontal bars indicate statistical significance; the pale orange area indicates the time windows for which subsequent time-averaged decoding analyses were run (e.g., panel b). The three vertical dashed lines show the onset of the colour, delay, and shape, respectively. b, The learning dynamics of context decoding computed from the time-averaged signal in the delay-locked period. c-d, Analogous to a-b but for within-context colour decoding (captures only the physical properties of the colour). i,j, k,l, and m,n, Analogous to a-b but for shape, XOR and motor decoding, respectively. The shaded area indicates 95% CIs obtained through random resampling with replacement (n = 1000); all p-values for stage 1 vs stage 4 comparisons were calculated using t-tests for dependent samples (***, p < 0.001; **, p < 0.01; *, p < 0.05; †, p < 0.1; n.s., not significant).

Context and XOR are progressively represented in an abstract format over learning.

a, Time-resolved cross-colour decoding of context; horizontal bars indicate statistical significance; the pale orange areas indicate the time windows for which subsequent decoding analyses were run (panel b). The three vertical dashed lines show the onset of the colour, delay, and shape, respectively. b, The learning dynamics of cross-colour context decoding computed from the time-averaged signal in the delay period. c,d, e,f, and g,h, Analogous to a-b but for shape, XOR and motor cross-colour decoding, respectively. The shaded area indicates 95% CIs obtained through random resampling with replacement (n = 1000); all p-values for stage 1 vs stage 4 comparisons were calculated using t-tests for dependent samples (***, p < 0.001; **, p < 0.01; *, p < 0.05; †, p < 0.1; n.s., not significant).
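The idea behind cross-colour decoding can be sketched with a toy example: a classifier is trained to separate the two contexts using trials from one pair of colours and tested on trials from the other pair, so above-chance accuracy requires a context code that generalises across colours. The sketch below assumes a nearest-centroid classifier and simulated two-dimensional features; the classifier, the simulation, and all function names are illustrative assumptions and not the pipeline used in the study.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit_nearest_centroid(features, labels):
    """Return {label: centroid} estimated from the training trials."""
    return {
        lab: centroid([f for f, l in zip(features, labels) if l == lab])
        for lab in set(labels)
    }


def predict(model, x):
    return min(model, key=lambda lab: sq_dist(model[lab], x))


def cross_colour_accuracy(trials, train_colours, test_colours):
    """trials: list of (colour, context, feature_vector) tuples."""
    train = [(f, ctx) for col, ctx, f in trials if col in train_colours]
    test = [(f, ctx) for col, ctx, f in trials if col in test_colours]
    model = fit_nearest_centroid([f for f, _ in train], [c for _, c in train])
    return sum(predict(model, f) == c for f, c in test) / len(test)


def simulate_trials(n_per_colour=50, noise=0.3, seed=0):
    """Toy data: axis 0 carries context, axis 1 carries colour identity."""
    rng = random.Random(seed)
    context_of = {1: 0, 2: 1, 3: 0, 4: 1}          # assumed colour grouping
    colour_axis = {1: 0.5, 2: -0.5, 3: 1.5, 4: -1.5}
    trials = []
    for colour, ctx in context_of.items():
        for _ in range(n_per_colour):
            x0 = (1.0 if ctx else -1.0) + rng.gauss(0, noise)
            x1 = colour_axis[colour] + rng.gauss(0, noise)
            trials.append((colour, ctx, [x0, x1]))
    return trials


trials = simulate_trials()
# Average the two train/test directions, as is common for cross-generalisation.
score = 0.5 * (
    cross_colour_accuracy(trials, {1, 2}, {3, 4})
    + cross_colour_accuracy(trials, {3, 4}, {1, 2})
)
```

In the toy data the context axis is shared across colours by construction, so `score` is high; if context were coded differently for each colour pair, the same procedure would fall towards chance.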

Neural geometry but not dimensionality changes over learning before the decision.

a, Linear decoding of task variables for learning stages 1 (grey) and 4 (black). b, Cross-generalised linear decoding of task variables for learning stages 1 (grey) and 4 (black). c, Comparison of motor decoding (response) in stage 1 and stage 4; horizontal black lines represent value pairs for each participant. d, The learning dynamics of the mean decoding of all possible task dichotomies excluding context, shape and XOR (shattering dimensionality), run on all trials (gold) and on correct trials only (green); solid lines indicate the mean and shaded areas the 95% confidence intervals over participants. e, Linear decoding of task variables in learning stage 1 run on correct trials (green) and incorrect trials (red). f,g, Correlation between normalised mean cross-generalised context decoding scores in the delay-locked period (context maintenance) and the shattering dimensionality computed at the moment of the decision on all trials in stage 1 and stage 4, respectively. All p-values were calculated using a t-test for dependent samples (a-h; l-m) and Pearson’s correlation coefficient (***, p < 0.001; **, p < 0.01; *, p < 0.05; †, p < 0.1; n.s., not significant).
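The set of dichotomies entering the shattering dimensionality in panel d can be enumerated as sketched below. The sketch assumes that the conditions are the 8 colour-by-shape combinations and that only balanced (4 vs 4) dichotomies are considered; neither assumption is stated in the caption. The colour-to-context grouping and the `decode` callback are likewise hypothetical placeholders for whatever decoder is applied to the neural data.

```python
from itertools import combinations, product

COLOURS = [0, 1, 2, 3]
SHAPES = [0, 1]
CONTEXT_OF = {0: 0, 1: 1, 2: 0, 3: 1}  # assumed colour-to-context grouping
CONDITIONS = list(product(COLOURS, SHAPES))  # 8 (colour, shape) conditions


def balanced_dichotomies(conditions):
    """All ways to split the conditions into two equal halves.

    The first condition is pinned to side A so that each split is counted
    once; side B is the complement.
    """
    first, rest = conditions[0], conditions[1:]
    half = len(conditions) // 2
    return [frozenset((first,) + combo) for combo in combinations(rest, half - 1)]


def named_dichotomy(label_fn):
    """Side A of the dichotomy defined by a task variable."""
    ref = label_fn(CONDITIONS[0])
    return frozenset(c for c in CONDITIONS if label_fn(c) == ref)


EXCLUDED = {
    named_dichotomy(lambda c: CONTEXT_OF[c[0]]),         # context
    named_dichotomy(lambda c: c[1]),                     # shape
    named_dichotomy(lambda c: CONTEXT_OF[c[0]] ^ c[1]),  # XOR
}

ALL_DICHOTOMIES = balanced_dichotomies(CONDITIONS)       # 35 splits
REMAINING = [d for d in ALL_DICHOTOMIES if d not in EXCLUDED]  # 32 splits


def shattering_dimensionality(decode):
    """Mean decoding score over the remaining dichotomies.

    `decode` is a hypothetical callback mapping side A of a dichotomy to a
    decoding accuracy obtained from the neural data.
    """
    return sum(decode(d) for d in REMAINING) / len(REMAINING)
```

Pinning one condition to side A is a design choice that avoids counting each split twice (once per side), which is why 8 conditions yield 35 rather than 70 balanced dichotomies.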