Figures and data

Prevalent candidate selection mechanisms in CDM cannot be dissociated by classical neural dynamics analysis.
(A) A pulse-based context-dependent decision-making task (adapted from Pagan et al., 2022). In each trial, rats were first cued by sound to indicate whether the current context was the location (LOC) context or the frequency (FRQ) context. Subsequently, rats were presented with a sequence of randomly timed auditory pulses. Each pulse could come from either the left or the right speaker and could be of low frequency (6.5 kHz, light blue) or high frequency (14 kHz, dark blue). In the LOC context, rats were trained to turn right (left) if more pulses were emitted from the right (left) speaker. In the FRQ context, rats were trained to turn right (left) if there were more (fewer) high-frequency pulses than low-frequency pulses. (B) Two prevalent candidate mechanisms for context-dependent decision-making. Top: the input modulation mechanism. In this scenario, while the selection vector remains invariant across contexts, the stimulus input representation is altered such that only the relevant stimulus input representation (i.e., the location input in the LOC context and the frequency input in the FRQ context) is well aligned with the selection vector, thereby fulfilling the requirement of context-dependent computation. Bottom: the selection vector modulation mechanism. In this scenario, although the stimulus input representation remains constant across contexts, the selection vector itself is altered by the context input to align with the relevant sensory input. Red line: line attractor (choice axis). Green arrow: selection vector. Thick grey and blue arrows represent the projections of the location and frequency input representation directions onto the space spanned by the line attractor and the selection vector, respectively. Small grey arrows indicate the direction of the relaxation dynamics. (C) Networks with distinct selection mechanisms may lead to similar trial-averaged neural dynamics (adapted from Pagan et al., 2022).
In a model with pure input modulation, the irrelevant sensory input can still be represented by the network along a direction orthogonal to the selection vector. Therefore, under the classical targeted dimensionality reduction method (Mante et al., 2013), both the input modulation model (top) and the selection vector modulation model (bottom) would exhibit similar trial-averaged neural dynamics, as shown in Pagan et al., 2022. (D) The setting of low-rank RNN modeling for the CDM task. The network has four input channels. Input 1 and input 2 represent two sensory inputs, while the other two channels indicate the context. The connectivity matrix J is constrained to be low-rank, expressed as
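
The low-rank constraint on J can be sketched in a few lines of numpy. The network size, rank, and Gaussian connectivity vectors below are illustrative assumptions, and the normalization J = (1/N) Σ_r m^(r) n^(r)ᵀ is the common convention in the low-rank RNN literature, not necessarily the paper's exact formula:

```python
import numpy as np

# Illustrative sketch of a low-rank connectivity constraint (sizes assumed)
N, R = 512, 3                      # number of neurons and rank
rng = np.random.default_rng(0)

m = rng.standard_normal((N, R))    # output vectors m^(r)
n = rng.standard_normal((N, R))    # input-selection vectors n^(r)

# J = (1/N) * sum_r outer(m^(r), n^(r)), a common low-rank RNN convention
J = m @ n.T / N

# The constraint caps the rank of the connectivity matrix at R
assert np.linalg.matrix_rank(J) == R
```

Any rank-R matrix can be written this way, so the constraint restricts the network's recurrent dynamics to an R-dimensional latent space spanned by the output vectors.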

No selection vector modulation in rank-1 neural network models.
(A) Illustration of the rank-1 connectivity matrix structure. Left: a rank-1 matrix can be represented as the outer product of an output vector m_dv and an input-selection vector n_dv, of which the input-selection vector n_dv plays the role of selecting the input information through its overlap with the input embedding vectors I1 and I2. The context signals are fed forward to the network with embedding vectors
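
The selection-by-overlap idea can be illustrated numerically: when the input embedding I1 is correlated with n_dv while I2 is roughly orthogonal to it, only input 1 is propagated into the recurrent dynamics. The construction below is a hypothetical example, not the paper's trained network:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000

n_dv = rng.standard_normal(N)                 # input-selection vector
I1 = n_dv + 0.5 * rng.standard_normal(N)      # embedding correlated with n_dv
I2 = rng.standard_normal(N)                   # embedding ~orthogonal to n_dv

def overlap(a, b):
    """Per-neuron average overlap between two N-dimensional vectors."""
    return a @ b / a.size

# Input 1 overlaps strongly with the selection vector; input 2 barely does
sel1, sel2 = overlap(n_dv, I1), overlap(n_dv, I2)
```

Here sel1 is close to 1 by construction while sel2 fluctuates around 0, so the rank-1 network would integrate input 1 and ignore input 2.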

Connectivity structure for the example rank-1 RNN.
(A) Projection of the connectivity space for the example rank-1 RNN. Each dot denotes a neuron. In each panel, the x and y coordinates of the i-th dot represent the i-th entries of the corresponding connectivity vectors.

A rank-3 neural network model with pure selection vector modulation.
(A) Illustration of the utilized rank-3 connectivity matrix structure. Left: the rank-3 matrix can be represented as the sum of three outer products, including the one with the output vector m_dv and the input-selection vector n_dv, the one with the output vector

Connectivity structure for the example rank-3 RNN.
(A) Projection of the connectivity space for the example rank-3 RNN. This RNN has 30,000 neurons divided into three populations. Dots of the same color represent neurons within the same population. The inset in the top-right corner shows the projection onto the two context input axes. For brevity, we did not include the projections of the context input axis against the other connectivity vectors. Within each population, the context input axis is independent of the other connectivity vectors. This independence implies that the context signal only affects the average sensitivity of each neuron population, thereby serving a modulatory function.

Pathway-based information flow analysis.
(A) The information flow graph of the rank-1 model presented in Figure 2. In this graph, nodes represent task variables that communicate with each other through directed connections (denoted as E_sender→receiver). Note that E_sender→receiver is the overlap between the representation direction of the sender variable (e.g., the representation directions of the input variable and the decision variable, I_inp and
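
As a concrete, hypothetical instance of such a directed connection, the effective coupling can be computed as a per-neuron-normalized dot product between a sender's representation direction and a receiver's selection vector (the vectors below are random stand-ins):

```python
import numpy as np

def effective_coupling(sender_dir, receiver_sel):
    """Overlap E_sender->receiver between a sender's representation
    direction and a receiver's input-selection vector (per-neuron average)."""
    return sender_dir @ receiver_sel / sender_dir.size

rng = np.random.default_rng(2)
N = 2000
I_inp = rng.standard_normal(N)          # representation direction of the input variable
n_dv = I_inp + rng.standard_normal(N)   # selection vector of the decision variable

E = effective_coupling(I_inp, n_dv)     # close to 1 here, by construction
```

A coupling near 0 means the receiver's selection vector is orthogonal to the sender's representation, so no information flows along that edge of the graph.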

Effective coupling between task variables for rank-1 and rank-3 RNNs.
(A) Effective coupling between task variables for 100 trained rank-1 RNNs (Figure 2) in each context. Effective coupling between two task variables is defined as the overlap between the corresponding representation vector and input-selection vector. For example, the effective coupling from input 1 to decision variable

Neural activity and task variable dynamics for single pulse input.
(A) Task setting for single pulse input. We study the neural activity dynamics of low-rank RNNs when they receive a pulse input. For simplicity, only the RNNs’ neural activity given a pulse from input 1 in context 1 is considered. (B) Illustration of neural activity for a rank-1 RNN given a pulse input. For the rank-1 RNN (Figure 2), the dynamics of
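
A minimal simulation sketch of a rank-1 network receiving a single pulse (the network size, gains, and tanh nonlinearity are illustrative assumptions, not the paper's trained parameters): the pulse arrives along the input embedding I1, which overlaps with n, and the evoked activity is read out along the output vector m before relaxing back to baseline.

```python
import numpy as np

rng = np.random.default_rng(3)
N, dt, T = 400, 0.01, 500
m = rng.standard_normal(N)            # output vector
n = rng.standard_normal(N)            # input-selection vector
I1 = 0.9 * N * n / (n @ n)            # input 1 embedding, overlapping with n
J = np.outer(m, n) / N                # rank-1 connectivity

x = np.zeros(N)
proj = []                             # projection of activity onto m over time
for t in range(T):
    x += dt * (-x + J @ np.tanh(x))   # leaky rank-1 dynamics (Euler step)
    if t == 10:
        x += I1                       # single pulse from input 1
    proj.append(m @ x / N)
```

The readout trace transiently deviates from zero after the pulse and then relaxes, mirroring the pulse-evoked dynamics illustrated in panel (B).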

A novel pathway-based definition of selection vector modulation.
(A) A pathway-based decomposition of contextual modulation in a model with both input and selection vector modulations. This definition is based on an explicit formula of the effective connection from the input variable to the decision variable in the model (i.e., E_inp→dv + E_inp→iv · E_iv→dv; see Methods for details). The input modulation component is then defined as the modulation induced by the change of the input representation direction across contexts. The remaining component is defined as the selection vector modulation component. (B) Illustration of the contextual modulation decomposition introduced in Pagan et al., 2022. In this definition, the selection vector first has to be reverse-engineered through linearized dynamical systems analysis. The input modulation component is then defined as the modulation induced by the change of the input representation direction across contexts, while the selection vector modulation component is defined as the one induced by the change of the selection vector across contexts. (C) A family of handcrafted RNNs with both input and selection vector modulations. α, β, and η represent the associated effective couplings between task variables. In this model family, the inp → dv pathway, susceptible to the input modulation, is parameterized by α, while the inp → iv → dv pathway, susceptible to the selection vector modulation, is parameterized by β and η. As such, the ratio of the input modulation to the selection vector modulation can be conveniently controlled by adjusting α, β, and η. (D) Comparison of the pathway-based definition in (A) with the classical definition in (B) using the model family introduced in (C).
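
The decomposition in (A) can be made concrete with hypothetical coupling values (the numbers below are invented for illustration): the total effective connection in each context is E_inp→dv + E_inp→iv · E_iv→dv, the change in the direct term across contexts is attributed to input modulation, and the remaining change, carried by the inp → iv → dv pathway, is attributed to selection vector modulation.

```python
# Hypothetical effective couplings in each context (illustrative numbers)
E = {
    "ctx1": {"inp_dv": 0.8, "inp_iv": 0.5, "iv_dv": 0.4},
    "ctx2": {"inp_dv": 0.2, "inp_iv": 0.5, "iv_dv": 0.1},
}

def total(c):
    # Effective connection from input to decision variable:
    # E_inp->dv + E_inp->iv * E_iv->dv
    return c["inp_dv"] + c["inp_iv"] * c["iv_dv"]

contextual_modulation = total(E["ctx1"]) - total(E["ctx2"])
# Input modulation: contribution of the change in the direct (inp -> dv) term
input_mod = E["ctx1"]["inp_dv"] - E["ctx2"]["inp_dv"]
# Selection vector modulation: the remaining (inp -> iv -> dv) contribution
selection_mod = contextual_modulation - input_mod
```

With these numbers the total modulation is 0.75, of which 0.6 comes from the direct pathway and 0.15 from the second-order pathway, so the two components can be read off without reverse-engineering a selection vector.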

An explicit pathway-based formula of selection vector.
(A) Illustration of how an explicit pathway-based formula of selection vector is derived. In a model with both the first-order selection pathway (i.e., inp → dv) and the second-order selection pathway (i.e., inp → iv → dv), the second-order pathway can be reduced to a pathway with the effective selection vector

The correlation between the dimensionality of neural dynamics and the proportion of selection vector modulation is confirmed in vanilla RNNs.
(A) A general neural circuit model of CDM. In this model, there are multiple pathways capable of propagating the input information to the decision variable slot, of which the blue connections are susceptible to the input modulation while the green connections are susceptible to the selection vector modulation (see Methods for details). (B) The explicit formula of both the effective connection from the input variable to the decision variable and the effective selection vector for the model in (A). (C) The setting of vanilla RNNs trained to perform the CDM task. See Methods for more details. (D) Positive correlation between effective connectivity dimension and proportion of selection vector modulation. Given a trained RNN with matrix J, the effective connectivity dimension, defined by
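
The exact definition of the effective connectivity dimension is given in the paper's Methods; one standard effective-rank measure in the same spirit is the participation ratio of the squared singular values of J, sketched below as an assumption rather than as the paper's formula:

```python
import numpy as np

def effective_dimension(J):
    """Participation ratio of the squared singular values of J:
    (sum_i s_i^2)^2 / sum_i s_i^4. Equals 1 for a rank-1 matrix and
    N for a matrix with N equal singular values."""
    s2 = np.linalg.svd(J, compute_uv=False) ** 2
    return s2.sum() ** 2 / (s2 ** 2).sum()

# A rank-1 matrix scores ~1; the identity scores its full size
rank1 = np.outer(np.arange(1.0, 5.0), np.ones(4))
print(effective_dimension(rank1), effective_dimension(np.eye(50)))
```

Unlike the exact matrix rank, this quantity varies smoothly with the spectrum, which is why it is convenient for correlating connectivity structure with the proportion of selection vector modulation across trained networks.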

Training vanilla RNNs with different regularization coefficients.
(A) The influence of the regularization coefficient on the effective connectivity dimension of trained RNNs. For each regularization coefficient, we trained 100 full-rank RNNs (Figure 7, panel D). Larger regularization results in connectivity matrices with lower rank, leading to a smaller effective connectivity dimension. (B) The influence of the regularization coefficient on the selection vector modulation of trained RNNs. Distribution of selection vector modulation for networks trained with different regularization coefficients. Larger regularization leads to networks that favor the input modulation strategy. (C) The relationship between the proportion of explained variance (PEV) in extra dimensions and the effective connectivity dimension. There is a strong positive correlation between the PEV in extra dimensions and the effective connectivity dimension in both contexts. In each panel, each dot denotes a trained RNN, with different colors denoting different regularization coefficients.
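
The qualitative effect in (A) can be mimicked with a ridge-style shrinkage toy model (a construction of ours for illustration, not the paper's training procedure): stronger L2-like regularization shrinks small singular values proportionally more than large ones, concentrating the spectrum and lowering the effective connectivity dimension.

```python
import numpy as np

rng = np.random.default_rng(7)
J = rng.standard_normal((100, 100)) / 10   # toy full-rank connectivity matrix

def eff_dim(M):
    # Participation ratio of the squared singular values
    s2 = np.linalg.svd(M, compute_uv=False) ** 2
    return s2.sum() ** 2 / (s2 ** 2).sum()

dims = []
for lam in (0.0, 1.0, 10.0):               # increasing regularization strength
    U, s, Vt = np.linalg.svd(J)
    s_shrunk = s ** 3 / (s ** 2 + lam)     # ridge-like shrinkage of each mode
    dims.append(eff_dim(U @ np.diag(s_shrunk) @ Vt))
```

The resulting dimensions decrease monotonically with the regularization strength, matching the trend reported in panel (A).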

Verification of correlation results using vanilla RNNs trained with different hyper-parameter settings.
(A) Similar results in trained vanilla RNNs with a softplus activation function. Left: Spearman’s rank correlation, r = 0.945, p < 1e-3, n = 2564. Right: Spearman’s rank correlation, r = 0.803, p < 1e-3, n = 2564. The x-axes are displayed in log scale for both panels. (B) Similar results in trained vanilla RNNs initialized with a variance of 1/N. Left: Spearman’s rank correlation, r = 0.973, p < 1e-3, n = 2630. Right: Spearman’s rank correlation, r = 0.976, p < 1e-3, n = 2630. The x-axes are displayed in log scale for both panels.
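
Statistics like these can be reproduced on synthetic data with scipy (the data below are random stand-ins, not the paper's measurements); Spearman's rank correlation is invariant to monotone transforms, which is why it suits log-scaled quantities:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n = 2564                                   # sample size matching panel (A)
x = rng.random(n)                          # e.g., effective connectivity dimension
y = x ** 3 + 0.01 * rng.random(n)          # monotonically related noisy quantity

# Rank correlation stays near 1 despite the nonlinear x -> x^3 relationship
r, p = spearmanr(x, y)
```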

Two RNNs with distinct modulation strategies produce the same neural activities.
(A) Information flow graph for the two RNNs. The black arrows denote that the effective coupling from the head to the tail is 1. For RNN1, the closure of inp1 → iv2 on the pathway inp1 → iv2 → dv prevents inp1 from reaching iv2 and subsequently the decision variable (dv), indicating that RNN1 uses solely the input modulation strategy for input 1. For RNN2, the closure of iv1 → dv on the pathway inp1 → iv1 → dv means that although inp1 can reach iv1, the subsequent propagation from iv1 to dv is blocked. This indicates that RNN2 uses solely the selection vector modulation strategy for input 1. (B) Connectivity weights among three example neurons in the two RNNs. Each neuron belongs to one of the three neuron populations (see Methods for more details). Notice that the connectivity weights from n1 (neuron 1) to n3 (or from n2 to n3) differ between the two RNNs. (C) Neural activities for the three neurons in three example trials. Orange lines denote activities for RNN1 and blue lines denote activities for RNN2. The neural activity is approximately equal between the two RNNs. (D) Histogram of single-neuron activity similarity between the two RNNs. We calculated the similarity between the activity of the i-th neuron in RNN1 and the i-th neuron in RNN2 during trial k (using the r2_score function from the sklearn Python package). Averaging over trials provides the similarity between corresponding neurons (neuron i in RNN1 and neuron i in RNN2).
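
The per-neuron similarity metric in (D) can be sketched with sklearn's r2_score on toy data (the near-identical trace of the matched RNN2 neuron is simulated here by adding small noise):

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
n_trials, n_time = 20, 100
act_rnn1 = rng.standard_normal((n_trials, n_time))                    # neuron i, RNN1
act_rnn2 = act_rnn1 + 0.01 * rng.standard_normal((n_trials, n_time))  # neuron i, RNN2

# r2_score per trial, then averaged over trials -> similarity for this neuron pair
per_trial = [r2_score(act_rnn1[k], act_rnn2[k]) for k in range(n_trials)]
similarity = float(np.mean(per_trial))
```

A similarity near 1 for every neuron, as in the histogram, means the two networks are behaviorally and dynamically indistinguishable despite their distinct modulation strategies.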

Artificially introducing redundant structure can disrupt the ‘PEV of extra dynamical modes’ index.
(A) Positive correlation between the response-kernel-based index and the proportion of selection vector modulation in trained vanilla RNNs (Spearman’s rank correlation, n = 75). (B) No significant correlation between these two metrics in RNNs with additional task-irrelevant variance (Spearman’s rank correlation, n = 75). (C) PEV of irrelevant activity showed a significant difference between trained vanilla RNNs and RNNs with additional task-irrelevant variance (one-way ANOVA, n = 75, ***: p < 0.001).
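
The group comparison in (C) corresponds to a one-way ANOVA, available in scipy; below is a toy reproduction with invented PEV values (n = 75 per group, as in the panel):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(6)
pev_vanilla = rng.normal(0.05, 0.01, 75)    # hypothetical PEV, trained vanilla RNNs
pev_redundant = rng.normal(0.30, 0.05, 75)  # hypothetical PEV, RNNs with added variance

stat, p = f_oneway(pev_vanilla, pev_redundant)
significant = p < 0.001                     # the '***' threshold used in panel (C)
```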