Experimental design and behavioral performance.

(A) Main task procedure. Each block started with a rule cue indicating the categorization rule for that block. On each trial, participants saw two orientations presented consecutively and were then cued to remember one of them. In the maintenance task (cued by the letter ‘P’), participants maintained the remembered orientation as precisely as possible. In the categorization task (cued by the letter ‘C’), participants categorized the remembered orientation according to the categorization rule of the current block. Maintenance and categorization trials were interleaved within each experimental block of nine trials. The categorization rule (Rule A or Rule B) switched randomly on a block-by-block basis. Response keys (‘F’ and ‘J’) for the categorization task were randomly assigned to the two categories, and each key was displayed at a random location within its category to avoid revealing the rule boundaries. (B) Illustration of the two orthogonal categorization rules (Rule A and Rule B). (C) Rule learning performance during the learning session for Rule A (purple) and Rule B (pink). (D) Errors in participants’ self-reported rule boundaries, calculated as the average distance from the reported boundaries to the ground-truth boundaries. (E) Accuracy compared between tasks. Boxplots show the median and the 25th and 75th percentiles. Whiskers extend to 1.5 times the interquartile range (IQR) from the quartiles. Asterisks denote significant results, n.s.: not significant; **: p < 0.01. (F) Reaction time compared between tasks. Same conventions as (E). (G) Upper panel: accuracy in relation to distance from the categorization boundaries. Lower panel: reaction time in relation to distance from the categorization boundaries. Shaded areas represent ± SEM.
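As a concrete illustration of the boundary-error measure in (D), the following minimal Python sketch (not the authors' code; the function name and example values are hypothetical) computes the average circular distance between reported and ground-truth boundaries, assuming boundaries are expressed in degrees over the 180° orientation space:

```python
import numpy as np

# Illustrative sketch (not the authors' code): error between self-reported and
# ground-truth rule boundaries, treating orientation as circular over 180 deg.
def boundary_error(reported_deg, true_deg):
    """Average absolute circular distance (deg) between matched boundaries."""
    reported = np.asarray(reported_deg, float)
    true = np.asarray(true_deg, float)
    diff = np.abs(reported - true) % 180.0   # wrap into [0, 180)
    diff = np.minimum(diff, 180.0 - diff)    # shortest angular distance
    return diff.mean()

# Hypothetical example: two reported boundaries vs. the true boundaries of one rule
print(boundary_error([42.0, 130.0], [45.0, 135.0]))  # -> 4.0
```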

Orientation reconstructions at the population level using IEMs.

(A) Reconstructed population-level orientation representations at selected time points in EVC, IPS, and sPCS for the maintenance (blue) and categorization (orange) tasks, respectively. The x axis represents the distance from the cued orientation (at 0°), and the y axis represents reconstructed channel responses in arbitrary units. Significant orientation representations were observed at 6 s and 12 s but not at 0 s. Shaded areas represent ± SEM. (B) To quantify the strength of orientation reconstructions, we calculated the reconstruction fidelity by first projecting the channel response at each orientation onto a vector at the cued orientation and then averaging the projected vectors. (C) Time course of representational strength of orientations in EVC, IPS, and sPCS. Gray shaded areas indicate the entire memory delay following the task cue. Blue and orange dots at the bottom indicate the FDR-corrected significance of representational fidelity at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). The bottom black dots indicate significant differences in representational fidelity between tasks (uncorrected). Horizontal dashed lines represent a baseline of 0. Shaded areas represent ± SEM. (D) Average difference in representational strength across 11 – 16 s in each ROI (from EVC to sPCS: p < 0.00001, p = 0.063, p = 0.007, respectively). Positive differences indicate higher representational strength for categorization, and negative differences indicate higher strength for maintenance. Black asterisks denote FDR-corrected significance, *: p < 0.05; **: p < 0.01; ***: p < 0.001.
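A minimal sketch of the fidelity computation described in (B), assuming the cued orientation sits at 0° and the 180° orientation space is mapped onto the full circle before projection (a common convention; the function and channel centers are illustrative, not the authors' code):

```python
import numpy as np

# Illustrative sketch (not the authors' code): reconstruction fidelity as the
# mean projection of each channel's response onto a vector pointing at the
# cued orientation (placed at 0 deg).  Mapping the 180-deg orientation space
# onto the full circle before taking the cosine is an assumed convention.
def reconstruction_fidelity(channel_resp, channel_centers_deg):
    """channel_resp: reconstructed response per channel; cued orientation at 0 deg."""
    theta = np.deg2rad(np.asarray(channel_centers_deg, float) * 2.0)  # 180 -> 360 deg
    return np.mean(np.asarray(channel_resp, float) * np.cos(theta))

# Hypothetical channel centers spanning -90..+78 deg in 12-deg steps
centers = np.arange(-90, 90, 12)
resp = np.cos(np.deg2rad(centers * 2.0))       # idealized, perfectly tuned profile
print(reconstruction_fidelity(resp, centers))  # ~0.5 (positive => reliable tuning)
```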

Behavioral correlation of stimulus representation for maintenance (blue) and categorization (orange) tasks.

(A) Time course of correlation coefficients in EVC, IPS, and sPCS. Gray shaded areas indicate the entire memory delay following the task cue. Blue and orange dots at the top indicate the significance of the correlation (uncorrected) at each time point at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). (B) Correlation scatter plots at representative time points (7 s and 12 s) in EVC, IPS, and sPCS. R denotes the Pearson correlation coefficient. Asterisks denote significant results, *: p < 0.05; **: p < 0.01.
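For clarity, a minimal sketch of the time-resolved brain-behavior correlation, assuming one fidelity value per participant per time point and one behavioral score per participant (array names and shapes are assumptions, not the authors' code):

```python
import numpy as np
from scipy import stats

# Illustrative sketch (not the authors' code): Pearson correlation between
# per-participant representational fidelity and behavioral accuracy at each
# time point, as in the time-course plots.  Data here are random placeholders.
fidelity = np.random.rand(24, 20)   # participants x time points (hypothetical)
accuracy = np.random.rand(24)       # one behavioral score per participant

r_t = np.empty(fidelity.shape[1])
p_t = np.empty(fidelity.shape[1])
for t in range(fidelity.shape[1]):
    r_t[t], p_t[t] = stats.pearsonr(fidelity[:, t], accuracy)
```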

Orientation reconstructions in Experiment 2 at the population level using IEMs.

(A) Time course of representational strength of orientations in EVC, IPS, and sPCS. Gray shaded areas indicate the entire memory delay following the task cue. Blue and orange dots at the bottom indicate the FDR-corrected significance of representational fidelity at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). The bottom black dots indicate significant differences in representational fidelity between tasks (uncorrected). Horizontal dashed lines represent a baseline of 0. Shaded areas represent ± SEM. (B) Average difference in representational strength across an early task epoch (5 – 10 s) in each ROI. Positive differences indicate higher representational strength for categorization, and negative differences indicate higher strength for maintenance. Asterisks denote FDR-corrected significance, n.s.: not significant; *: p < 0.05; ***: p < 0.001. (C) Average difference in representational strength across a late task epoch (11 – 16 s) in each ROI. Same conventions as (B).

Decoding performance for category and abstract category information.

(A) Average category decoding accuracy across the delay period (11 – 16 s) in each ROI of both experiments. Black asterisks denote FDR-corrected significance, n.s.: not significant; *: p < 0.05; ***: p < 0.001. (B) Schematic illustration of abstract category decoding. In the categorization task, category information can be decoded using category labels defined by the true categorization rule. However, apparent category information can also arise from stimulus similarity alone. Thus, to remove stimulus-dependent category information, we calculated an abstract category index by subtracting the decoding accuracy obtained with orthogonal category boundaries (assuming a comparable stimulus-dependent effect) from that obtained with the true rule boundaries. (C) Average abstract category decoding index across the delay period (11 – 16 s) in each ROI of both experiments. Black asterisks denote FDR-corrected significance, n.s.: not significant; *: p < 0.05; **: p < 0.01.
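A minimal sketch of the abstract category index described in (B), with hypothetical per-participant accuracies (not the authors' code):

```python
import numpy as np

# Illustrative sketch (not the authors' code): the abstract category index is
# the decoding accuracy obtained with the true rule boundaries minus the
# accuracy obtained with orthogonal boundaries, the latter assumed to capture
# a comparable stimulus-driven component.  Values below are placeholders.
acc_true_rule = np.array([0.62, 0.58, 0.65])    # per participant, true boundaries
acc_orthogonal = np.array([0.55, 0.54, 0.57])   # per participant, orthogonal boundaries
abstract_category_index = acc_true_rule - acc_orthogonal
print(abstract_category_index.mean())           # > 0 => rule-dependent (abstract) category info
```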

Architecture of RNNs and simulation results.

(A) All networks consist of three layers of artificial units: the input, hidden, and output layers. For both RNN1 and RNN2, the input layer contains 20 units, including 15 orientation-tuned units (red) and 5 cue units (retro-cue and task cue, orange and yellow). The hidden layer consists of three modules of 200 recurrent units with short-term synaptic plasticity (STSP), further divided into 80% excitatory (black) and 20% inhibitory (white) units. Connectivity within each module (black arrows) is denser than connectivity between modules (red and green arrows), which occurs only between excitatory units. Only excitatory units in module 1 receive projections from the input layer, and only excitatory units in module 3 project to the output units. For RNN1, the network outputs (0,1) or (1,0) through the two units in the output layer to indicate responses. For RNN2, the network outputs (0,1) or (1,0) to report the category to which the cued orientation belonged in the categorization task, or (0,0) in the maintenance task (blue units). Importantly, the models also output the orientation itself through 15 additional orientation-tuned units (red). (B) Difference in orientation decoding between tasks in RNN1 and RNN2. Results were averaged across the delay period. Positive differences indicate higher decoding accuracy for categorization, and negative differences indicate higher decoding accuracy for maintenance. Error bars represent ± SEM. Black asterisks denote FDR-corrected significance, n.s.: not significant; *: p < 0.05; **: p < 0.01; ***: p < 0.001. (C) Average abstract category information across the delay period for RNN1 and RNN2. Same conventions as (B).
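A minimal sketch of how the modular connectivity described in (A) could be encoded as masks; connection densities and all variable names are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Illustrative sketch (not the authors' code): recurrent connectivity mask with
# three modules of 200 units (80% excitatory, 20% inhibitory), dense within-module
# connections, sparser excitatory-to-excitatory connections between modules,
# input targeting only module 1 excitatory units and output read only from
# module 3 excitatory units.  The 0.5 / 0.05 densities are placeholders.
rng = np.random.default_rng(0)
n_mod, n_per_mod, frac_exc = 3, 200, 0.8
n = n_mod * n_per_mod
module = np.repeat(np.arange(n_mod), n_per_mod)                # module index per unit
is_exc = (np.arange(n) % n_per_mod) < int(frac_exc * n_per_mod)

within = module[:, None] == module[None, :]
exc_pair = is_exc[:, None] & is_exc[None, :]                   # excitatory -> excitatory pairs

mask = np.zeros((n, n), dtype=bool)
mask[within] = rng.random(within.sum()) < 0.5                  # denser within-module
between = ~within & exc_pair
mask[between] = rng.random(between.sum()) < 0.05               # sparser between-module, E->E only

input_mask = is_exc & (module == 0)                            # input projects to module 1 E units
output_mask = is_exc & (module == 2)                           # output reads module 3 E units
```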

Behavioral performance of Experiment 2.

(A) Accuracy (upper) and reaction time (lower) for the maintenance (blue) and categorization (orange) tasks in Experiment 2. Asterisks denote significant results, n.s.: not significant; **: p < 0.01. (B) Accuracy (upper) and reaction time (lower) for orientations as a function of their distance from the category center in the categorization task. Shaded areas represent ± SEM. The vertical dashed line represents the category center.

Control analyses for stimulus representation results.

(A) Time course of representational strength of orientations in EVC, IPS, and sPCS using IEMs trained separately for each condition. Bar plots on the right show the corresponding average difference in representational strength between tasks across the late delay period (11 – 16 s) in each ROI. Positive differences indicate higher representational strength for categorization, and negative differences indicate higher strength for maintenance. (B) Time course of representational strength of orientations in functional ROIs defined by the top 500 most selective voxels during the sample or delay period. Bar plots same as in (A). (C) Time course of representational strength of orientations after removing the voxel-wise mean activation for each condition at each TR. Bar plots same as in (A). (D) Time course of stimulus decoding accuracy. Bar plots same as in (A). Gray shaded areas indicate the entire memory delay following the task cue. Horizontal dashed lines represent a baseline of 0 (fidelity) or 0.5 (decoding accuracy). Blue and orange dots at the bottom indicate the significance of representational fidelity at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). The bottom black dots indicate significant differences in representational fidelity between tasks. Shaded areas represent ± SEM. n.s.: not significant; black asterisks denote significance, *: p < 0.05; **: p < 0.01; ***: p < 0.001. Gray asterisks denote marginal significance (p < 0.1).
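A minimal sketch of the condition-wise demeaning used in (C), under one plausible reading of "voxel-wise mean activation for each condition at each TR" (subtracting, for each condition, each voxel's mean across trials at every TR); the array shape and names are assumptions, not the authors' code:

```python
import numpy as np

# Illustrative sketch (not the authors' code): remove the per-condition,
# per-voxel, per-TR mean activation before running the IEM, so that
# condition-specific mean differences cannot drive the reconstruction.
# Assumed data shape: trials x voxels x TRs.
def demean_by_condition(data, condition_labels):
    out = data.copy()
    for c in np.unique(condition_labels):
        idx = condition_labels == c
        out[idx] -= data[idx].mean(axis=0, keepdims=True)  # mean per voxel, per TR
    return out
```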

Behavioral correlation of stimulus representation in Experiment 2.

(A) Time course of correlation coefficients in EVC, IPS, and sPCS. Correlations were computed between the strength of stimulus representations and behavioral performance (accuracy) for the maintenance (blue) and categorization (orange) tasks. Gray shaded areas indicate the entire memory delay following the task cue. Horizontal dashed lines represent a baseline of 0. Dots at the bottom indicate the significance of the correlation at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). Shaded areas represent ± SEM.

Category, abstract category, and rule information in Experiments 1 and 2.

(A) Time course of category decoding strength in Experiment 1 with the flexible rule (orange) and in Experiment 2 with the fixed rule (light orange). Horizontal dashed lines represent the chance level of 0.5. Gray shaded areas indicate the entire memory delay following the task cue. Dots at the bottom indicate the FDR-corrected significance of decoding accuracy at each time point at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). Error bars represent ± SEM. (B) Time course of abstract category decoding strength in Experiment 1 (dark blue) and in Experiment 2 (light blue). Horizontal dashed lines represent a baseline of 0. Gray shaded areas indicate the entire memory delay following the task cue. Dots at the bottom indicate the uncorrected significance of decoding accuracy at each time point at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). Error bars represent ± SEM.

P-values for the time course of IEM results in Figure 2. Underline denotes significant results (p < 0.05).

P-values for the time course of correlation results in Figure 3. Underline denotes significant results (p < 0.05).

P-values for the time course of IEM results in Figure 4. Underline denotes significant results (p < 0.05).