(a) Bump attractor model (adapted from Wimmer et al., 2014) with overlapping working memory and motor preparation populations. The full population (112 units) received two sets of inputs: 8 working memory and 8 motor preparation inputs. Each input activated a set of 10 adjacent units, and the working memory and motor preparation inputs overlapped by 43% (which matched the percentage of overlap in the LPFC data). This architecture predicts that if we sort neurons according to the working memory ‘bumps’ in Delay 1, we would be able to see the ‘bumps’ representing motor preparation in Delay 2. (b) Cross-temporal decoding performance of the model (without normalization) in the full space. Delay 2 decoding performance (94.1 ± 2.5%) was significantly higher than Delay 1 performance (76.9 ± 5.3%). LP11 refers to the average cross-temporal decoding performance across the bins indicated by the dashed lines where the training and testing windows were both in Delay 1, while LP22 refers to the average cross-temporal decoding performance where the training and testing windows were both in Delay 2. There was also no reduction in performance in the working memory subspace in Delay 2 (75.9 ± 7.1% in LP11, 76.6 ± 5.9% in LP22, P > 0.81, g = 0.33, figure not shown), and the mean population activity increased from Delay 1 (1.2 ± 0.04 spikes/s) to Delay 2 (1.4 ± 0.06 spikes/s, P < 0.05, g = 4.04). These three results were inconsistent with our observations from the neuronal data. (c) Schematic illustrating the increase in information without normalization. Green and purple circles represent two different target clusters in the full space separated by an inter-cluster distance of d. If correlated preparation information (in the form of inter-cluster distance of d) was added, this would result in the full-space inter-cluster distance in Delay 2 increasing by a factor of . (d) Cross-temporal decoding performance of the model (with normalization) in the full space.
Delay 2 decoding performance and Delay 1 performance were not significantly different (LP22 - LP11 overlapped with 0, P > 0.69, g = 0.54). We did not observe any change in the mean population firing rate (D1: 1.0 ± 0.04 spikes/s, D2: 1.0 ± 0.04 spikes/s, P > 0.9, g = 0.05). (e) Cross-temporal decoding performance of the model (with normalization) in the working memory subspace. Decoding performance decreased significantly in the working memory subspace in Delay 2 (84.1 ± 7.7% in LP11, 58.4 ± 4.1% in LP22, P < 0.05, g = 3.39). (f) Cross-temporal decoding performance of the model (with normalization) in the motor preparation subspace. As expected, target information in the motor preparation subspace emerged in Delay 2. (g) Single-unit activity in the attractor model. Top, a unit exclusively selective to working memory input; Middle, a unit exclusively selective to motor preparation input; Bottom, a unit showing mixed selectivity to both working memory and motor preparation inputs. (h) Example activity of the model units in a single trial. There was one bump in Delay 1 (blue trace), and two bumps in Delay 2 (red trace). Note that the overlapping bump in Delay 2 was smaller, which was a result of divisive normalization. (i) The memory (yellow trace) and preparation (green trace) activity unmixed from Delay 1 and Delay 2 activity. (j) The relationship between Delay 2 decoding performance (with normalization) and the strength of the distractor input (expressed as a ratio to the target input strength). Stronger distractor inputs decreased the Delay 2 decoding performance because they increased the within-cluster variance of the data when grouped by target labels.
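
The effect of divisive normalization described in panels (d) and (h) can be illustrated with a minimal NumPy sketch. It is not the model used in the paper: the rectangular bump shape, the baseline rate, the bump positions, and the particular form of normalization (rescaling the population so that its mean rate stays at the Delay 1 level) are all assumptions made for illustration. Only the population size (112 units) and the bump width (10 adjacent units) are taken from the caption.

```python
import numpy as np

N_UNITS = 112    # population size stated in the caption
BUMP_WIDTH = 10  # each input drives 10 adjacent units (from the caption)


def bump(start, n_units=N_UNITS, width=BUMP_WIDTH, amplitude=1.0):
    """Rectangular input over `width` adjacent units on a ring (assumed shape)."""
    x = np.zeros(n_units)
    idx = (start + np.arange(width)) % n_units  # wrap around the ring
    x[idx] = amplitude
    return x


def divisive_normalization(rates, target_mean):
    """Divide by the population mean so the mean rate equals `target_mean`.

    Assumed form of normalization, chosen only to keep the mean rate fixed.
    """
    return rates * (target_mean / rates.mean())


baseline = 0.5                 # hypothetical background rate
memory = bump(start=20)        # hypothetical working memory input
preparation = bump(start=26)   # hypothetical motor preparation input, shares 4 units

delay1 = baseline + memory                     # one bump in Delay 1
delay2_raw = baseline + memory + preparation   # two bumps, no normalization
delay2_norm = divisive_normalization(delay2_raw, target_mean=delay1.mean())

print("mean rate, Delay 1:             ", delay1.mean())
print("mean rate, Delay 2 (raw):       ", delay2_raw.mean())
print("mean rate, Delay 2 (normalized):", delay2_norm.mean())
print("memory-only unit, Delay 1 vs Delay 2:", delay1[20], delay2_norm[20])
```

Under these assumptions, adding the second bump raises the mean population rate unless the activity is normalized, and after normalization a unit driven only by the memory input responds less in Delay 2 than in Delay 1, which is the qualitative pattern described for the memory bump in panel (h).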