Experimental design and behavioral performance.

(A) Main task procedure. Each block started with a rule cue indicating the categorization rule for that block. On each trial, participants saw two orientations consecutively and were then cued to remember one of the orientations. In the maintenance task (cued by the letter “P”), participants needed to maintain the remembered orientation as precisely as possible. In the categorization task (cued by the letter “C”), participants needed to categorize the remembered orientation following the categorization rule of the current block. Maintenance and categorization trials were interleaved within an experimental block of nine trials. The categorization rule (Rule A or Rule B) switched randomly on a block-by-block basis. Response keys (“F” and “J”) for the categorization task were randomly assigned to the two categories. Each pair of keys was displayed at random locations within the category to eliminate information about the rule boundaries. (B) Illustration of the two orthogonal categorization rules (Rule A and Rule B). (C) Rule learning performance during the learning session for Rule A (purple) and Rule B (pink). (D) Errors in participants’ self-reported rule boundaries. Errors were calculated as the average distance from reported boundaries to ground truth boundaries. (E) Accuracy compared between tasks. Boxplots show the median and the 25th and 75th percentiles. Whiskers extend to 1.5 × the interquartile range (IQR) from the quartiles. Asterisks denote significant results, n.s.: not significant; **: p < 0.01. (F) Reaction time compared between tasks. Same conventions as (E). (G) Upper panel: accuracy in relation to distance from categorization boundaries. Lower panel: reaction time in relation to distance from categorization boundaries. Shaded areas represent ± SEM.
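
The boundary error in (D) can be illustrated with a minimal sketch. It assumes that distances are measured in 180°-periodic orientation space and that each reported boundary is already paired with its ground-truth boundary; the function names and the example values are hypothetical and not taken from the study.

```python
import numpy as np

def orientation_distance(a_deg, b_deg):
    """Smallest absolute difference between orientations (180-degree periodic)."""
    diff = (np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 180.0
    return np.minimum(diff, 180.0 - diff)

def boundary_error(reported_deg, true_deg):
    """Average distance from reported boundaries to ground-truth boundaries."""
    return float(np.mean(orientation_distance(reported_deg, true_deg)))

# Hypothetical example: two boundaries for one rule, in degrees.
reported = [20.0, 115.0]
truth = [22.5, 112.5]
print(boundary_error(reported, truth))  # 2.5
```

Wrapping the difference at 180° keeps a report of 179° close to a true boundary at 1°, which a plain subtraction would treat as far apart.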

Orientation reconstructions at the population level using IEMs.

(A) Reconstructed population-level orientation representations from selected time points at EVC, IPS, and sPCS for maintenance (blue) and categorization (orange) tasks, respectively. X axis represents distance from the cued orientation (at 0°), and y axis represents reconstructed channel responses in arbitrary units. Significant orientation representation was observed at 6 s and 12 s but not at 0 s. Shaded areas represent ± SEM. (B) To quantify the strength of orientation reconstructions, we calculated the reconstruction fidelity by first projecting the channel response at each orientation onto a vector at the cued orientation and then averaging the projected vectors. (C) Time course of representational strength of orientations at EVC, IPS and sPCS. Gray shaded areas indicate the entire memory delay following task cue. Blue and orange dots at the bottom indicate the FDR-corrected significance of representational fidelity at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). The bottom black dots indicate significant difference in representational fidelity between tasks (uncorrected). Horizontal dashed lines represent a baseline of 0. Shaded areas represent ± SEM. (D) Average difference of representational strength across 11 – 16 s in each ROI (from EVC to sPCS: p < 0.00001, p = 0.063, p = 0.007, respectively). Positive difference indicates higher representational strength for categorization, and vice versa for negative difference. Black asterisks denote FDR-corrected significance, **: p < 0.01; ***: p < 0.001.
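
The fidelity measure in (B) can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the reconstruction is assumed to be already centered on the cued orientation (offset 0°), and offsets are doubled so that the 180° orientation space maps onto a full circle. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def reconstruction_fidelity(channel_resp, offsets_deg):
    """Project each channel response onto the cued direction and average.

    channel_resp : responses at each offset from the cued orientation
    offsets_deg  : offsets in degrees, with the cued orientation at 0
    """
    resp = np.asarray(channel_resp, dtype=float)
    offsets = np.asarray(offsets_deg, dtype=float)
    # Assumption: orientation is 180-degree periodic, so angles are doubled.
    theta = np.deg2rad(2.0 * offsets)
    return float(np.mean(resp * np.cos(theta)))

# Toy reconstructions sampled every 10 degrees from -90 to 80.
offsets = np.arange(-90, 90, 10)
peaked = np.cos(np.deg2rad(2.0 * offsets))  # peaks at the cued orientation
flat = np.ones(offsets.size)                # carries no orientation information

print(reconstruction_fidelity(peaked, offsets))
print(reconstruction_fidelity(flat, offsets))
```

A reconstruction that peaks at the cued orientation gives a positive fidelity, whereas a flat reconstruction gives a value near zero, which is why 0 serves as the baseline in (C).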

Behavioral correlation of stimulus representation for maintenance (blue) and categorization (orange) tasks.

(A) Time course of correlation coefficients between behavioral performance and orientation representational fidelity in EVC, IPS, and sPCS. Gray shaded areas indicate the entire memory delay following task cue. Blue and orange dots at the top indicate significance of correlation (uncorrected) at each time point at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). (B) Correlation scatter plots at representative time points (7 s and 12 s) in EVC, IPS, and sPCS. R denotes Pearson correlation coefficients. (C) Correlation between behavioral performance and orientation representational fidelity collapsed across the late epoch (11 – 16 s). Asterisks denote significant results, *: p < 0.05; **: p < 0.01.

Orientation reconstructions in Experiment 2 at the population level using IEMs.

(A) Time course of representational strength of orientations at EVC, IPS and sPCS. Gray shaded areas indicate the entire memory delay following task cue. Blue and orange dots at the bottom indicate the FDR-corrected significance of representational fidelity at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). The bottom black dots indicate significant difference in representational fidelity between tasks (uncorrected). Horizontal dashed lines represent a baseline of 0. Shaded areas represent ± SEM. (B) Average difference of representational strength across an early (top, 5 – 10 s) and a late (bottom, 11 – 16 s) task epoch in each ROI. Positive difference indicates higher representational strength for categorization, and vice versa for negative difference (FDR-corrected). *: p < 0.05; ***: p < 0.001.
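
The FDR-corrected significance markers can be illustrated with a short sketch. The legend does not name the correction procedure, so the Benjamini-Hochberg method is assumed here; the function names and the example p-values are hypothetical.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def dot_size(p_adj):
    """Map a corrected p-value to the dot sizes used in the figure."""
    if p_adj < 0.001:
        return "large"
    if p_adj < 0.01:
        return "medium"
    if p_adj < 0.05:
        return "small"
    return None  # not significant: no dot drawn

# Hypothetical p-values at four time points.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
print([dot_size(p) for p in adjusted])
```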

Decoding performance for category and abstract category information.

(A) Average category decoding accuracy using category labels under true rule across the late task epoch (11 – 16 s) in each ROI of both experiments. (B) Average category decoding accuracy using category labels under orthogonal rule across the late task epoch (11 – 16 s) in each ROI of both experiments. (C) Schematic illustration of abstract category decoding. In categorization task, category information can be decoded using category labels according to the true categorization rule. On the other hand, category can also be decoded due to stimulus similarity. Thus, to remove stimulus-dependent categorical information, we calculated an abstract category index by removing decoding accuracy using orthogonal category boundaries (assuming comparable stimulus-dependent effect) from that using true rule boundaries. (D) Average abstract category decoding index across the late task epoch (11 – 16 s) in each ROI of both experiments. Black asterisks denote FDR-corrected significance, *: p < 0.05; **: p < 0.01; ***: p < 0.001.
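
The abstract category index in (C) reduces to a subtraction of two decoding accuracies, sketched below. The per-participant accuracies are hypothetical placeholders and the function name is an assumption. The sketch shows only the subtraction, not the decoding procedure itself.

```python
import numpy as np

def abstract_category_index(acc_true_rule, acc_orthogonal_rule):
    """Decoding accuracy under the true rule minus that under the orthogonal rule."""
    acc_true = np.asarray(acc_true_rule, dtype=float)
    acc_orth = np.asarray(acc_orthogonal_rule, dtype=float)
    return acc_true - acc_orth

# Hypothetical accuracies for three participants (chance level is 0.5).
acc_true = np.array([0.58, 0.55, 0.61])
acc_orth = np.array([0.53, 0.54, 0.52])

index = abstract_category_index(acc_true, acc_orth)
print(index, index.mean())
```

Because stimulus similarity is assumed to contribute comparably to both accuracies, an index above 0 indicates category information beyond what stimulus similarity alone would support.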

Architecture of RNNs and simulation results.

(A) All networks consist of 3 layers of artificial units: the input, hidden and output layers. For both RNN1 and RNN2, the input layer contains 20 units including 15 orientation-tuned units (red) and 5 cue units (retro-cue and task cue, orange and yellow). The hidden layer consists of three modules of 200 recurrent units with short-term synaptic plasticity (STSP), further divided into 80% excitatory (black) and 20% inhibitory (white). Connectivity within each module (black arrow) is denser than connectivity between modules (red and green arrows), which occurs only between excitatory units. Only excitatory units in Module 1 receive projections from the input layer and only excitatory units in Module 3 project to the output units. For RNN1, the networks output (0,1) or (1,0) through the 2 units in the output layer to indicate responses. For RNN2, the networks output (0,1) or (1,0) to report the category to which the cued orientation belonged in the categorization task, or (0,0) in the maintenance task (blue units). Importantly, the RNN2 models also output the orientation itself through 15 additional orientation-tuned units (red). (B) Difference in stimulus decoding between tasks in RNN1 (upper panel) and RNN2 (lower panel). Results were averaged across the delay period. Positive difference indicates higher decoding accuracy for categorization, and negative difference indicates higher decoding accuracy for maintenance. The inset above illustrates stimulus difference in human fMRI results during late epoch in Experiment 1 to provide a reference for expected patterns in RNNs. (C) Average abstract category information across the delay period for RNN1 (upper panel) and RNN2 (lower panel). The inset above illustrates abstract category representation in human fMRI. Error bars represent ± SEM. Black asterisks denote FDR-corrected significance, *: p < 0.05; **: p < 0.01; ***: p < 0.001.
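
The connectivity constraints in (A) can be written as boolean masks, as in the minimal sketch below. Several choices are assumptions made for illustration only: excitatory units are placed first within each module, between-module connections are allowed between any pair of modules, and connection density is not modeled. All variable names are hypothetical.

```python
import numpy as np

N_MODULES, N_PER_MODULE, FRAC_EXC = 3, 200, 0.8
N_HIDDEN = N_MODULES * N_PER_MODULE  # 600 recurrent units
N_INPUT = 20                         # 15 orientation-tuned + 5 cue units

# Module membership and cell type of every hidden unit.
module_id = np.repeat(np.arange(N_MODULES), N_PER_MODULE)
n_exc = round(FRAC_EXC * N_PER_MODULE)  # 160 excitatory units per module
# Assumption: the first 160 units of each module are excitatory.
is_exc = (np.arange(N_HIDDEN) % N_PER_MODULE) < n_exc

# rec_allowed[i, j] is True if unit j may project to unit i.
same_module = module_id[:, None] == module_id[None, :]
both_exc = is_exc[:, None] & is_exc[None, :]
# Within a module any connection is allowed; between modules only E-to-E.
rec_allowed = same_module | both_exc

# Only excitatory units in Module 1 receive input projections.
input_allowed = np.zeros((N_HIDDEN, N_INPUT), dtype=bool)
input_allowed[(module_id == 0) & is_exc, :] = True

# Only excitatory units in Module 3 project to the output layer.
readout_units = (module_id == 2) & is_exc

print(rec_allowed.sum(), input_allowed.sum(), readout_units.sum())
```

In a trained network, masks like these would typically be multiplied element-wise with the weight matrices so that disallowed connections stay at zero.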

Visualization of the anatomical locations of the ROIs on the MNI brain.

Control analyses for stimulus representation results in Experiment 1.

(A) Time course of representational strength of orientations in EVC, IPS and sPCS using IEMs trained separately for each condition. Bar plot on the right shows corresponding averaged difference between tasks across the late task epoch (11 – 16 s) in each ROI. Positive difference indicates higher representational strength for categorization, and vice versa for negative difference. Gray shaded areas indicate the entire memory delay following task cue. Horizontal dashed lines represent a baseline of 0 or 0.5. Blue and orange dots at the bottom indicate the significance of representational fidelity at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). The bottom black dots indicate significant difference in representational fidelity between tasks. Shaded areas represent ± SEM. Black asterisks denote significance, *: p < 0.05; **: p < 0.01; ***: p < 0.001. Gray asterisk denotes marginal significance (p < 0.1). (B) Time course of stimulus decoding accuracy. Same conventions as (A). (C) Time course of representational strength of orientations after removing voxel-wise mean activation for each condition at each TR. Same conventions as (A). (D) Time course of representational strength of orientations in functional ROIs defined by the 500 most selective voxels during the sample or delay period. Same conventions as (A).

Orientation reconstructions in primary motor cortex (M1) at the population level using IEMs.

(A) Time course of representational strength of orientations at M1 in Experiment 1. Gray shaded areas indicate the entire memory delay following task cue. The bottom black dots indicate significant difference in representational fidelity between tasks (uncorrected) at p < 0.05. Horizontal dashed lines represent a baseline of 0. Shaded areas represent ± SEM. (B) Time course of representational strength of orientations at M1 in Experiment 2. Same conventions as (A).

Behavioral performance of Experiment 2.

(A) Accuracy (upper) and reaction time (lower) of maintenance (blue) and categorization (orange) tasks in Experiment 2. Asterisks denote significant results, n.s.: not significant; **: p < 0.01. (B) Accuracy (upper) and reaction time (lower) for orientations based on their distances from category center for categorization task. Shaded areas represent ± SEM. Vertical dashed line represents category center.

Behavioral correlation of stimulus representation in Experiment 2.

Time course of correlation coefficients in EVC, IPS, and sPCS. Correlation was performed between strength of stimulus representations and behavioral performance (accuracy) for maintenance (blue) and categorization (orange) tasks. Gray shaded areas indicate the entire memory delay following task cue. Horizontal dashed lines represent a baseline of 0. Bottom dots indicate the significance of corresponding analyses at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). Shaded areas represent ± SEM.

Time course of category and abstract category decoding performance in Experiments 1 and 2.

(A) Time course of category decoding strength in Experiment 1 with flexible rule (orange) and in Experiment 2 with fixed rule (light orange). Horizontal dashed lines represent the chance level of 0.5. Gray shaded areas indicate the entire memory delay following task cue. Bottom dots indicate uncorrected significance of decoding accuracy at each time point at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). Shaded areas represent ± SEM. (B) Time course of abstract category decoding strength in Experiment 1 (dark blue) and in Experiment 2 (light blue). Horizontal dashed lines represent a baseline of 0.

Stimulus, category, and abstract category results in additional frontal ROIs in Experiments 1 and 2.

(A) Time course of representational strength of orientations in iPCS, IFS, and MFG in Experiment 1 (top panel) and Experiment 2 (bottom panel). Gray shaded areas indicate the entire memory delay following task cue. Blue and orange dots at the bottom indicate the FDR-corrected significance of representational fidelity at each time point of the corresponding task at p < 0.05 (small), p < 0.01 (medium), and p < 0.001 (large). The bottom black dots indicate significant difference in representational fidelity between tasks (uncorrected). Horizontal dashed lines represent a baseline of 0. Shaded areas represent ± SEM. (B) Average category decoding accuracy across the late task epoch (11 – 16 s) in iPCS, IFS, and MFG of both experiments. (C) Average abstract category decoding index across the late task epoch (11 – 16 s) in the same regions of both experiments. Black asterisks denote FDR-corrected significance, n.s.: not significant; *: p < 0.05.

RNN results using IEMs.

Difference in the IEM fidelity of orientation representations between tasks in RNN2. Results were averaged across the delay period. Uncorrected p-values from Module 1 to 3: 0.10, 0.48, 0.01. Positive difference indicates higher fidelity for categorization, and negative difference indicates higher fidelity for maintenance. Error bars represent ± SEM.

P-values for the time course of IEM results in Figure 2. Underline denotes significant results (p < 0.05).

P-values for the time course of correlation results in Figure 3. Underline denotes significant results (p < 0.05).

P-values for the time course of IEM results in Figure 4. Underline denotes significant results (p < 0.05).