Optimal representations are formed through an information bottleneck.

(a) Consider the task Y of discerning whether an image shows a dog. Images to the right contain less information about the original image (I(Z; X) is smaller) but still contain approximately the same amount of information, I(Z; Y), needed to perform the task “is this a dog?” (b) The information bottleneck trades off minimality (I(Z; X) as small as possible) against sufficiency (I(Z; Y) ≈ I(X; Y)). (c) Checkerboard task. The monkey reaches to the target whose color matches the dominant color of the checkerboard. Because there are two equally likely target configurations, in which the colors of the left and right targets are swapped, this task decouples the color choice from the direction choice. (d) A minimal sufficient representation for this task retains only the direction decision, in this case, reach left. A cortical information bottleneck therefore predicts that only direction choice information is found in premotor and motor output areas.
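To make the minimality/sufficiency trade-off in (b) and (d) concrete, the sketch below computes I(Z; X) and I(Z; Y) for two deterministic encoders of the checkerboard task, assuming uniformly distributed stimuli. The stimulus encoding, the uniform prior, and all function names are illustrative assumptions, not the paper’s analysis code.

```python
import math
from collections import defaultdict

# Hypothetical encoding of the checkerboard task: X = (target configuration, dominant color).
# Assumed convention: config 1 = red target on the left, config 2 = green target on the left.
STIMULI = [(cfg, color) for cfg in (1, 2) for color in ("red", "green")]

def direction(x):
    """Reach direction Y: move to the target whose color matches the checkerboard."""
    cfg, color = x
    red_on_left = cfg == 1
    return "left" if (color == "red") == red_on_left else "right"

def mutual_information(joint):
    """I(A; B) in bits from a dict {(a, b): p(a, b)}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

def information_plane(encoder):
    """Return (I(Z; X), I(Z; Y)) for a deterministic encoder Z = encoder(X), X uniform."""
    zx, zy = defaultdict(float), defaultdict(float)
    for x in STIMULI:
        z, p = encoder(x), 1.0 / len(STIMULI)
        zx[(z, x)] += p
        zy[(z, direction(x))] += p
    return mutual_information(zx), mutual_information(zy)

full = information_plane(lambda x: x)   # keeps color and configuration: (2.0, 1.0)
minimal = information_plane(direction)  # keeps only direction: (1.0, 1.0)
```

Both encoders are sufficient (I(Z; Y) = I(X; Y) = 1 bit), but only the direction-only encoder is minimal, using 1 bit of I(Z; X) instead of 2.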

DLPFC and PMd recordings during the checkerboard task.

(a) Psychometric and (b) reaction time curves of the monkey. The x-axes in (a) and (b) show signed coherence, which indicates the relative amount of red vs. green in the checkerboard. (c) Example DLPFC and (d) PMd PSTHs aligned to checkerboard onset. Red and green traces correspond to red and green color choices, respectively. Dotted and solid traces correspond to right and left direction choices, respectively. Data are smoothed with a 25 ms Gaussian kernel and averaged across trials. (e) PCs 1, 3, and 4 for DLPFC and (f) PCs 1, 2, and 3 for PMd. (g) Results of dPCA for DLPFC and (h) PMd, showing the dPCs for direction, target configuration, and color. (i) Histograms (across sessions) of direction, color, and target configuration decode accuracy and (j) usable information for DLPFC and PMd. The wide spread of these values reflects variability across recording sessions.
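Decode accuracy and usable information can both be computed from a trained decoder’s held-out predictions. The sketch below assumes usable information is estimated as the label entropy minus the decoder’s mean cross-entropy on held-out trials, in bits; the function names, the base-2 logarithm, the probability clipping, and the input format are assumptions for illustration.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Empirical entropy H(Y) of the class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def usable_information(labels, probs, eps=1e-12):
    """Usable information estimate: H(Y) minus the decoder's mean cross-entropy (bits).

    labels: true class index for each held-out trial.
    probs:  decoder's predicted class probabilities for each trial.
    eps:    assumed floor that avoids log(0) when the true class gets probability 0.
    """
    cross_entropy = -sum(math.log2(max(p[y], eps)) for y, p in zip(labels, probs)) / len(labels)
    return entropy_bits(labels) - cross_entropy

def accuracy(labels, probs):
    """Fraction of trials where the most probable class is the true class."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y for y, p in zip(labels, probs))
    return hits / len(labels)
```

With a perfect decoder on balanced binary labels this returns 1 bit, and a decoder that always outputs 0.5 returns 0 bits. A decoder that does worse than the label prior yields a negative value.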

RNN modeling of the CB task.

(a) Multi-area RNN configuration. The RNN received four inputs. The first two inputs indicated the identity (red or green) of the left and right targets; these inputs were noiseless. The last two inputs indicated the signed color coherence (proportional to the amount of red in the checkerboard) and the negative signed color coherence (proportional to the amount of green in the checkerboard). We added independent Gaussian noise to these two signals (see Methods). The network output two analog decision variables indicating evidence toward the right target (solid line) or the left target (dashed line). A decision was made in the direction of whichever decision variable first crossed a preset threshold (0.6), and the time of this crossing was defined as the reaction time. (b,c) Psychometric and reaction time curves for the exemplar multi-area RNN. (d) Area 1 and Area 3 principal components for the exemplar RNN. (e) CCA correlation between each RNN area and the DLPFC principal components (left) and PMd principal components (right). DLPFC activity most strongly resembles Area 1, while PMd activity most strongly resembles Area 3. See also Fig. S3, where we computed CCA as a function of the number of dimensions. (f) Relative dPCA variance captured by the direction, color, and target configuration axes, normalized so that direction variance equals 1. Area 1 (Area 3) variances more closely resemble DLPFC (PMd). (g) Area 1 has significantly higher decoding accuracies and (h) usable information than Area 3, consistent with the difference between DLPFC and PMd.
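The input format and decision rule in (a) can be sketched as follows. The ±1 encoding of target identity and the noise level are hypothetical placeholders (the actual noise is specified in the Methods); only the four-input layout and the 0.6 threshold come from the caption.

```python
import random

THRESHOLD = 0.6  # decision threshold stated in the caption

def make_inputs(left_is_red, right_is_red, signed_coherence, noise_std, rng):
    """Build the four RNN inputs for one time step.

    Hypothetical encoding: target identity is +1 for red and -1 for green.
    noise_std is a placeholder; the actual noise level is given in the paper's Methods.
    """
    return [
        1.0 if left_is_red else -1.0,   # left target identity (noiseless)
        1.0 if right_is_red else -1.0,  # right target identity (noiseless)
        signed_coherence + rng.gauss(0.0, noise_std),   # red evidence
        -signed_coherence + rng.gauss(0.0, noise_std),  # green evidence
    ]

def decide(dv_right, dv_left, dt_ms, threshold=THRESHOLD):
    """Return (choice, reaction_time_ms) at the first threshold crossing, or (None, None)."""
    for t, (r, l) in enumerate(zip(dv_right, dv_left)):
        if r > threshold or l > threshold:
            # If both cross on the same step, pick the larger decision variable.
            return ("right" if r >= l else "left"), t * dt_ms
    return None, None
```

For example, decision variables sampled every 10 ms as right = [0, 0.3, 0.7] and left = [0, 0.2, 0.4] yield a right choice with a 20 ms reaction time.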

IB hypotheses and mechanism.

(a) Candidate mechanisms for IB. (b) Overlap of the direction, color, and target configuration axes for DLPFC and RNN data. The direction axis is more orthogonal to the color and target configuration axes. (c) Projections onto the potent space of the intra-areal dynamics for each area. We computed the projection of the direction axis, the color axis, and a random vector onto the potent space of each area’s intra-areal dynamics matrix. We found that intra-areal dynamics amplify color information in Area 1 and do not selectively attenuate color information in Areas 2 and 3. (d) Illustration of how the orientation of the axes affects information propagation. Information on the direction axis (orange) can be selectively propagated through inter-areal connections, whereas information on the color axis (maroon) is not. (e) Inter-areal hypotheses. (f) Projections of the color axis, the direction axis, and a random vector onto the potent space between areas. Regardless of the dimension of the potent space, the direction axis is preferentially aligned with it, indicating that information along this axis propagates. This alignment is strong: the direction axis aligns more with the first potent dimension of W21 than with all remaining dimensions combined. In contrast, the color axis is aligned at nearly chance levels and will therefore be propagated substantially less than the direction axis. Shading indicates s.e.m.
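The alignment in (c) and (f) can be sketched as the squared norm of a unit-normalized axis after projection onto the top-k right singular vectors of a connectivity matrix, compared against random vectors. This is a minimal sketch assuming that definition of the potent space; the function names and the Monte Carlo baseline are illustrative, not the paper’s code.

```python
import numpy as np

def potent_alignment(W, axis, k):
    """Fraction of a unit-normalized axis lying in the top-k potent dimensions of W.

    The potent space of W (target x source) is taken to be the span of its top-k
    right singular vectors, i.e. the source-space directions that W maps most strongly.
    """
    _, _, Vt = np.linalg.svd(W)  # rows of Vt: right singular vectors, largest singular value first
    a = axis / np.linalg.norm(axis)
    return float(np.sum((Vt[:k] @ a) ** 2))

def random_baseline(W, k, n_samples=1000, seed=0):
    """Average alignment of random directions; approaches k / n for n source units."""
    rng = np.random.default_rng(seed)
    n = W.shape[1]
    return float(np.mean([potent_alignment(W, rng.standard_normal(n), k) for _ in range(n_samples)]))
```

For a random direction among n source units the expected alignment is k/n, so an axis far above this baseline is preferentially propagated, while an axis near it is aligned at chance.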

Robustness of the information bottleneck across hyperparameters and computational advantage.

Varying (a) the proportion of feedforward connections in an unconstrained network, (b) E-I connections in a Dale’s law network, (c) the proportion of feedforward and feedback E-E connections in a Dale’s law network, (d) the number of areas, and (e) the machine learning hyperparameters revealed that Area 3 color variance and color decode accuracy decrease as long as there is a connectivity bottleneck between areas. (f) Summary of these results, quantified by usable information.

Hyperparameters of exemplar RNN.

(a) Psychometric and (b) reaction time curves for multi-area RNNs. The hyperparameters used for these RNNs are described in Table 1. Gray lines represent individual RNNs and the black solid line is the average across all RNNs.

(a) An alternative rotation of the first three PCs of RNN Area 1, with PC3 scaled up to reveal a low-variance color axis. (b) Area 2 PCs in the same projection as used in Figure 3. While these PCs qualitatively appear to represent the direction decision, they are distinct from those of Area 3, which more strongly resembles PMd activity.

Because DLPFC is higher-dimensional than PMd, we compared DLPFC with Areas 1-3 of the RNN using the CCA correlation coefficient while varying the number of DLPFC PCs. Note that the CCA correlation coefficient increases with dimensionality because the additional, low-variance dimensions can be weighted to better reproduce the RNN PCs. We nevertheless observe that Area 1 has the highest CCA correlation to DLPFC, while Area 3 has the lowest.
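A standard way to compute canonical correlations is to orthonormalize each centered data matrix and take the singular values of the product of the two orthonormal bases. The sketch below assumes the inputs are PC score matrices (time and conditions along rows) and summarizes each comparison by the mean canonical correlation; that summary and the function names are assumptions, not necessarily the statistic used in the paper.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (T x p) and Y (T x q).

    Both inputs are centered and orthonormalized with QR; the singular values of
    Qx^T Qy are the canonical correlation coefficients (largest first).
    """
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)

def cca_vs_dimensions(data_pcs, rnn_pcs, dims):
    """Mean canonical correlation when keeping the first d data PCs, for each d in dims."""
    return [float(canonical_correlations(data_pcs[:, :d], rnn_pcs).mean()) for d in dims]
```

Sweeping `dims` over the number of DLPFC PCs reproduces the kind of comparison described above; if one data matrix is an invertible linear transform of the other, every canonical correlation equals 1.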

SVM mutual information (approximated by the usable information) for each RNN area as a function of the SVM regularization parameter C. A lower C implies stronger regularization.

Candidate mechanism for axis orthogonalization.

(a) Top 2 PCs of RNN Area 1 activity. Trajectories are now colored by checkerboard coherence, and the condition-independent signal is not removed so that we could directly study the high-dimensional dynamics of the RNN and its equilibrium states. The trajectories first separate into two regions corresponding to the two possible target configurations (Target config 1 in blue, Target config 2 in purple). They then separate further upon checkerboard color input, leading to four trajectory motifs. (b) Projection of the dPCA principal axes onto the PCs. (c) Projection of the target configuration and color inputs onto the PCs. Target configuration inputs are shown in pink, a strongly green checkerboard in green, and a strongly red checkerboard in red. Irrespective of the target configuration, green checkerboards cause the RNN state to increase along PC2, while red checkerboards cause it to decrease along PC2. The strength of the input representation is state-dependent: checkerboards corresponding to left reaches, whether green or red, cause smaller movements of the RNN state along the color axis. (d) Visualization of RNN dynamics and inputs during target presentation. In the Targets On epoch, target configuration inputs cause movement along the vertical target configuration axis. The RNN dynamics implemented a leftward flow-field that pushed the RNN state into an attractor region of slow dynamics. (e) At the Target config 1 attractor, we plot the local dynamics using a previously described technique (ref. 51). The RNN implements approximately opposing flow fields above and below a line attractor. Above the attractor, a leftward flow-field increases direction axis activity, while below the attractor, a rightward flow-field decreases direction axis activity.
A green checkerboard input therefore pushes the RNN state into the leftward flow-field (solid green trajectories), while a red checkerboard input pushes it into the rightward flow-field (dotted red trajectories). This computes the direction choice for a given target configuration while allowing the direction axis to be orthogonal to the color inputs. Arrows are not to scale; checkerboard inputs have been amplified for visibility. (f) Visualized dynamics across multiple trajectory motifs. These dynamics hold in both target configurations, leading to separation of right and left decisions along the direction axis. Arrows are not to scale.
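A toy two-dimensional system can illustrate the mechanism in (e): a line attractor whose two sides carry opposing flows, so that a color input applied orthogonally to the direction axis still determines where the state ends up along that axis. This is a hypothetical sketch, not the trained RNN. The speed function q(x) = ½‖F(x)‖², commonly used to locate slow regions of RNN dynamics, is included for illustration and may differ from the technique cited in (e).

```python
import numpy as np

def F(x, inp):
    """Toy dynamics: x[0] = position on the direction axis, x[1] = offset from the attractor.

    Every point with x[1] == 0 is a fixed point (a line attractor along x[0]).
    Above the line (x[1] > 0) the flow increases x[0]; below it, x[0] decreases.
    The color input drives only x[1], which is orthogonal to the direction axis.
    """
    return np.array([x[1], -x[1] + inp])

def speed(x, inp=0.0):
    """Speed q(x) = 0.5 * ||F(x)||^2; zero on the attractor, positive off it."""
    v = F(x, inp)
    return 0.5 * float(v @ v)

def simulate(color_input, steps=2000, dt=0.01, pulse_steps=200):
    """Euler-integrate from the attractor with a brief color pulse (+ green, - red)."""
    x = np.zeros(2)
    for t in range(steps):
        inp = color_input if t < pulse_steps else 0.0
        x = x + dt * F(x, inp)
    return x
```

A green pulse (`simulate(1.0)`) leaves the state on the attractor with a positive direction-axis value, and a red pulse leaves it with a negative one, even though the input never acts on the direction axis directly.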

Norm of the direction discriminability, (left red − right red + left green − right green)/2, and of the color discriminability, (left green − left red + right green − right red)/2, as a function of processing area. Input-driven activity is shown with lighter transparency and overall activity with solid lines. Area 1 has significant recurrence, evidenced by a large separation between the input and overall activity. In our exemplar network there is very little evidence of recurrent filtering of color information (i.e., the overall activity is never below the input).
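The two discriminability formulas translate directly into vector arithmetic on condition-averaged activity. In this sketch the dictionary layout keyed by (direction, color) and the function name are assumptions; the input-driven curves could be obtained by applying the same function to each area’s input currents, which is likewise an assumption about how the panel was computed.

```python
import numpy as np

def discriminability(act):
    """Norms of direction and color discriminability from condition-averaged activity.

    act maps (direction, color) -> array of shape (time, units), e.g. ("left", "red").
    Returns (direction_norm, color_norm), each of shape (time,).
    """
    lr, rr = act[("left", "red")], act[("right", "red")]
    lg, rg = act[("left", "green")], act[("right", "green")]
    direction = (lr - rr + lg - rg) / 2  # left minus right, averaged over colors
    color = (lg - lr + rg - rr) / 2      # green minus red, averaged over directions
    return np.linalg.norm(direction, axis=-1), np.linalg.norm(color, axis=-1)
```

Activity that differs only between left and right reaches gives a nonzero direction norm and a zero color norm, which is the signature expected in an output area after the bottleneck.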

Relationship between PCs and inter-area potent space.

(a) Variance of the Area 1 excitatory units explained by the top principal components and by the top dimensions of the potent space of W21, swept across all dimensions. (b) Variance of the Area 2 excitatory units explained by the top principal components and by the top dimensions of the potent space of W32, swept across all dimensions. These plots show that inter-areal connections do not necessarily propagate the most dominant axes of variability in the source area to the downstream area. Excitatory units were used for the comparison because only excitatory units are read out by subsequent areas. These results held when comparing to the variance explained by the top principal components obtained from all units.
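Each curve can be sketched with the usual trace formula: the fraction of variance captured by an orthonormal basis B is tr(BᵀCB)/tr(C), where C is the covariance of the source-area activity. This sketch assumes the activity matrix contains only the excitatory source units and that the columns of W index those same units; the function names are illustrative.

```python
import numpy as np

def variance_explained(X, basis):
    """Fraction of total variance of X (samples x units) captured by orthonormal basis columns."""
    C = np.cov(X, rowvar=False)
    return float(np.trace(basis.T @ C @ basis) / np.trace(C))

def top_pcs(X, k):
    """Top-k principal axes of X, as columns (units x k). Assumes samples >= units."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def top_potent_dims(W, k):
    """Top-k right singular vectors of W (target x source), as source-space columns."""
    _, _, Vt = np.linalg.svd(W)
    return Vt[:k].T

def sweep(X, W):
    """Variance explained by the top-k PCs and top-k potent dimensions, for k = 1..n_units."""
    n = X.shape[1]
    return ([variance_explained(X, top_pcs(X, k)) for k in range(1, n + 1)],
            [variance_explained(X, top_potent_dims(W, k)) for k in range(1, n + 1)])
```

If W mostly reads out a low-variance direction, its potent-space curve rises much more slowly than the PC curve, even though both reach 1 when all dimensions are included.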

(a) Alignment of the Area 2 dPCA color and direction axes with the inter-areal connections W32. (b,c) Alignment of dPCA axes with the intra-areal recurrent matrices of (b) Area 2 and (c) Area 3 in three-area Dale’s law networks. (d) Alignment of the Area 1 dPCA axes with W21 for networks without Dale’s law. In contrast to Fig. 4f, direction information is not preferentially propagated. Same conventions as Fig. 4c,f.

Effect of feedback connections.

(a) dPCA variance in Area 3 of RNNs in which we varied the amount of feedback connectivity. RNNs exhibited nearly zero dPCA color variance in Area 3 across networks with 0%, 5%, and 10% feedback connections. (b, c) These RNNs also exhibited minimal color representations, with nearly chance-level decode accuracy and nearly zero mutual information. (d, e) Projections of the color and direction axes onto the feedback inter-areal matrices from (d) Area 2 to Area 1 and (e) Area 3 to Area 2 (for networks trained with 5% feedback connections, across a range of feedforward connectivity percentages).

Area 3 mechanism.

(a) Projection of input and overall activity onto the direction axis identified through dPCA. (b) Readout weights in Wout are sparse, with many zero entries, and the nonzero weights are selective for either a left or a right reach. (c) The unsorted connectivity matrix for units with nonzero readout weights (left), and the same matrix reordered by readout weight pool (right). (d) Average PSTHs of units for a leftward reach and (inset) a rightward reach. When one pool increases its activity, the other pool decreases its activity. (e) Averaged recurrent connectivity matrix. (f) Schematic of the output area. (g) Psychometric curves after a perturbation experiment in which 10% of the inhibitory weights onto the left pool (orange) or the right pool (blue) were doubled. Directional evidence is computed from the signed coherence together with the target configuration, giving the strength of evidence for a left reach and for a right reach. Increasing inhibition onto the left excitatory pool leads to more right choices, and vice versa. (h,i) Firing rates of PMd neurons for PREF-direction and NONPREF-direction reaches, aligned to checkerboard onset and movement onset. In winner-take-all models, when one pool wins, the firing rate of the other pool is suppressed by lateral inhibition. In PMd, when the PREF direction wins, the firing rate for the NONPREF direction decreases (blue shaded region in i).
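The winner-take-all behavior in (d) to (g) can be illustrated with a generic two-pool rate model in which each pool excites itself and inhibits the other. This is a hypothetical sketch, not the trained Area 3 network: the saturating rate function and all parameter values are assumptions chosen only so that the symmetric state is unstable and one pool wins.

```python
def relu_sat(x):
    """Saturating rectified-linear rate function clipped to [0, 1]."""
    return min(1.0, max(0.0, x))

def simulate_wta(input_left, input_right, w_self=0.6, w_inh_to_left=1.0,
                 w_inh_to_right=1.0, dt=0.01, steps=2000):
    """Euler-integrate two pools that excite themselves and inhibit each other.

    All parameters are hypothetical. w_inh_to_left scales the inhibition the left
    pool receives from the right pool (and w_inh_to_right the converse).
    """
    r_left = r_right = 0.0
    trace_left, trace_right = [], []
    for _ in range(steps):
        # Compute both drives from the current rates before updating either pool.
        drive_left = input_left + w_self * r_left - w_inh_to_left * r_right
        drive_right = input_right + w_self * r_right - w_inh_to_right * r_left
        r_left += dt * (-r_left + relu_sat(drive_left))
        r_right += dt * (-r_right + relu_sat(drive_right))
        trace_left.append(r_left)
        trace_right.append(r_right)
    return trace_left, trace_right
```

With slightly stronger input to the left pool, both pools rise initially, then the left pool saturates while the right pool is suppressed toward zero, as in (d) and (i). With equal inputs, doubling the inhibition onto the left pool (`w_inh_to_left=2.0`) makes the right pool win, qualitatively in line with the perturbation in (g).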

Potential multi-area computational advantage.

(Top row) Sensitivity to isotropic noise added to the readout weights. (a) Noise added to the readout weights of all units, including zero weights. (b) Noise added only to nonzero weights. (Middle row) Readout weights for left (dashed orange) and right (blue) reaches: (c) with Dale’s law enforced, (d) in unconstrained networks, and (e) in unconstrained networks with outputs constrained to be positive. (Bottom row) No correlation between robustness to noise and usable color information across random initializations of networks with 10% feedforward inhibition, some of which retained color information after training (Fig. 5b). Each unit’s readout weight was perturbed with noise of variance (f) σ² = 0.3 and (g) σ² = 0.5.
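The two noise conditions in (a) and (b) differ only in which readout weights are perturbed. The sketch below assumes a linear readout of unit firing rates and independent Gaussian noise per weight; the function names and the list-based representation are illustrative assumptions.

```python
import random

def perturb_readout(weights, sigma2, nonzero_only, rng):
    """Return readout weights with isotropic Gaussian noise of variance sigma2 added.

    nonzero_only=False perturbs every weight (as in panel a); True leaves zero
    weights untouched (as in panel b).
    """
    sd = sigma2 ** 0.5
    return [w if (nonzero_only and w == 0.0) else w + rng.gauss(0.0, sd) for w in weights]

def readout(weights, rates):
    """Linear readout: weighted sum of unit firing rates."""
    return sum(w * r for w, r in zip(weights, rates))
```

Comparing `readout` on clean and perturbed weights over many noise draws gives one way to quantify robustness; with a sparse readout, perturbing all weights injects noise from many units that previously contributed nothing.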