Prevalent candidate selection mechanisms in CDM cannot be dissociated by classical neural dynamics analysis.

(A) A pulse-based context-dependent decision-making task (adapted from Pagan et al., 2022). In each trial, rats were first cued by a sound indicating whether the current context was the location (LOC) context or the frequency (FRQ) context. Subsequently, rats were presented with a sequence of randomly timed auditory pulses. Each pulse could come from either the left or the right speaker and could be of low frequency (6.5 kHz, light blue) or high frequency (14 kHz, dark blue). In the LOC context, rats were trained to turn right (left) if more pulses were emitted from the right (left) speaker. In the FRQ context, rats were trained to turn right (left) if there were more (fewer) high-frequency pulses than low-frequency pulses.

(B) Two prevalent candidate mechanisms for context-dependent decision-making. Top: the input modulation mechanism. In this scenario, while the selection vector remains invariant across contexts, the stimulus input representation is altered such that only the relevant stimulus input representation (i.e., the location input in the LOC context and the frequency input in the FRQ context) is well aligned with the selection vector, thereby fulfilling the requirement of context-dependent computation. Bottom: the selection vector modulation mechanism. In this scenario, although the stimulus input representation remains constant across contexts, the selection vector itself is altered by the context input to align with the relevant sensory input. Red line: line attractor (choice axis). Green arrow: selection vector. Thick grey and blue arrows denote the projections of the location and frequency input representation directions, respectively, onto the space spanned by the line attractor and the selection vector. Small grey arrows indicate the direction of the relaxation dynamics.

(C) Networks with distinct selection mechanisms can produce similar trial-averaged neural dynamics (adapted from Pagan et al., 2022). In a model with pure input modulation, the irrelevant sensory input can still be represented by the network along a direction orthogonal to the selection vector. Therefore, under the classical targeted dimensionality reduction method (Mante et al., 2013), both the input modulation model (top) and the selection vector modulation model (bottom) would exhibit similar trial-averaged neural dynamics, as shown in Pagan et al., 2022.

(D) The setting of low-rank RNN modeling for the CDM task. The network has four input channels. Input 1 and input 2 represent two sensory inputs, while the other two channels indicate the context. The connectivity matrix J is constrained to be low-rank, expressed as J = Σ_{r=1}^{R} m_r n_r^T / N, where N is the number of neurons, R is the matrix's rank, and m_r n_r^T is a rank-1 matrix formed by the outer product of two N-dimensional connectivity vectors m_r and n_r.
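A minimal NumPy sketch of this low-rank parameterization (the network size, the Gaussian connectivity vectors, and the 1/N scaling are illustrative assumptions, not the trained models themselves):

```python
import numpy as np

# Sketch of the low-rank parameterization J = sum_r m_r n_r^T / N.
# Gaussian vectors and 1/N scaling are assumptions for illustration;
# in the paper the vectors m_r and n_r are learned by training.
N, R = 512, 1                      # number of neurons, rank of J
rng = np.random.default_rng(0)

m = rng.standard_normal((R, N))    # output vectors m_r
n = rng.standard_normal((R, N))    # input-selection vectors n_r
J = sum(np.outer(m[r], n[r]) for r in range(R)) / N

assert np.linalg.matrix_rank(J) == R   # J is rank-R by construction
```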

No selection vector modulation in rank-1 neural network models.

(A) Illustration of the rank-1 connectivity matrix structure. Left: a rank-1 matrix can be represented as the outer product of an output vector m_dv and an input-selection vector n_dv, of which the input-selection vector n_dv plays the role of selecting the input information through its overlap with the input embedding vectors I1 and I2. The context signals are fed into the network through context embedding vectors; since the overlaps between the context embedding vectors and the input-selection vector n_dv are close to 0, the context embedding vectors are omitted here for simplicity. Right: an example of the trained rank-1 connectivity structure characterized by the cosine angle between every pair of connectivity vectors (see Figure 2-figure supplement 1 and Methods for details).

(B) The psychometric curve of the trained rank-1 RNNs. In context 1, input 1 strongly affects the choice, while input 2 has little impact on the choice. In context 2, the effect of input 1 and input 2 on the choice is exchanged. The shaded area indicates the standard deviation. Ctx. 1, context 1. Ctx. 2, context 2.

(C) Characterizing the change of selection vector as well as input representation direction across contexts using cosine angle. The selection vector in each context is computed using linearized dynamical system analysis. The input representation direction is defined as the elementwise multiplication between the single neuron gain vector and the input embedding vector (see Methods for details). *** p<0.001, one-way ANOVA test, n=100. Inp., input. Rep., representation.
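A small helper illustrating the two quantities used in panels (C) and (D): the cosine angle between two vectors and the gain-modulated input representation direction. The specific gain vector used below (φ'(x) at a fixed point, with φ = tanh) and the random placeholder vectors are assumptions for illustration; the exact definitions are in the paper's Methods.

```python
import numpy as np

def cosine_angle(u, v):
    """Cosine of the angle between two connectivity/representation vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def input_representation_direction(gain, I):
    """Elementwise product of the single-neuron gain vector and the input
    embedding vector, as described in panels (C) and (D)."""
    return gain * I

# Toy usage with random vectors (the real quantities come from a trained RNN):
rng = np.random.default_rng(1)
N = 512
I1 = rng.standard_normal(N)                                # input 1 embedding vector
gain_ctx1 = 1.0 / np.cosh(rng.standard_normal(N)) ** 2     # assumed gain: phi'(x*) with phi = tanh
rep_ctx1 = input_representation_direction(gain_ctx1, I1)
selection_vector = rng.standard_normal(N)                  # placeholder for the reverse-engineered selection vector
print(cosine_angle(rep_ctx1, selection_vector))
```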

(D) Characterizing the overlap between the input representation direction and the selection vector. *** p<0.001, one-way ANOVA test, n=100. Dir., direction.

(E) The state space analysis for the example trained rank-1 RNN. The space is spanned by the line attractor axis (red line) and the selection vector (green arrow).

(F) Trial-averaged dynamics for the example rank-1 RNN. We applied targeted dimensionality reduction (TDR) to identify the choice, input 1 and input 2 axes. The neural activities were averaged according to input 1 strength, choice and context and then projected onto the choice and input 1 axes to obtain the trial-averaged population dynamics.
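A condensed sketch of the TDR step described here, assuming single-trial population activity and a design matrix of task variables are available; the regression windows, denoising, and axis-selection details of Mante et al., 2013 are omitted.

```python
import numpy as np

def tdr_axes(X, Z):
    """X: trials x neurons activity (e.g., averaged over a time window).
    Z: trials x variables design matrix (choice, input 1, input 2, context).
    Returns orthogonalized task axes as a neurons x variables matrix."""
    Z1 = np.column_stack([Z, np.ones(len(Z))])   # add intercept column
    B, *_ = np.linalg.lstsq(Z1, X, rcond=None)   # (variables+1) x neurons coefficients
    axes = B[:-1].T                              # neurons x variables (drop intercept)
    Q, _ = np.linalg.qr(axes)                    # orthogonalize, choice axis first
    return Q

# Projecting condition-averaged activity onto the choice and input-1 axes:
# traj = X_avg @ Q[:, [0, 1]]   # conditions x 2 trajectories in the choice/input-1 plane
```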

Connectivity structure for the example rank-1 RNN.

(A) Projection of the connectivity space for the example rank-1 RNN. Each dot denotes a neuron. On each panel, the x and y coordinates of the i-th dot represent the i-th entry of the corresponding connectivity vectors.

A rank-3 neural network model with pure selection vector modulation.

(A) Illustration of the rank-3 connectivity matrix structure. Left: the rank-3 matrix can be represented as the sum of three outer products: one formed by the output vector m_dv and the input-selection vector n_dv, one formed by the output vector m_iv1 and the input-selection vector n_iv1, and one formed by the output vector m_iv2 and the input-selection vector n_iv2, of which the input-selection vectors n_iv1 and n_iv2 play the role of selecting the input information from I1 and I2, respectively. Right: the connectivity structure of the handcrafted RNN model characterized by the cosine angle between every pair of connectivity vectors (see Figure 3-figure supplement 1 and Methods for more details).

(B) The psychometric curve of the handcrafted rank-3 RNN model.

(C) Characterizing the change of selection vector as well as input representation direction across contexts using cosine angle. The selection vector in each context is computed using linearized dynamical system analysis. The input representation direction is defined as the elementwise multiplication between the single neuron gain vector and the input embedding vector (see Methods for details). *** p<0.001, one-way ANOVA test, n=100.

(D) Characterizing the overlap between the input representation direction and the selection vector. *** p<0.001, one-way ANOVA test, n=100.

(E) The state space analysis for the example rank-3 RNN. The space is spanned by the line attractor axis (red line, invariant across contexts), the selection vector in context 1 (green arrow, top panel) and the selection vector in context 2 (green arrow, bottom panel).

(F) Trial-averaged dynamics for the example rank-3 RNN.

Connectivity structure for the example rank-3 RNN.

(A) Projection of the connectivity space for the example rank-3 RNN. This RNN has 30,000 neurons divided into three populations. Dots of the same color represent neurons within the same population. The inset in the top right corner shows the projection on the two context input axes. For brevity, we did not include the projections onto the context input axis and other connectivity vectors. Within each population, the context input axis is independent of the other connectivity vectors. This independence implies that the context signal only affects the average sensitivity of each neuron population, thereby serving a modulatory function.

Pathway-based information flow analysis.

(A) The information flow graph of the rank-1 model presented in Figure 2. In this graph, nodes represent task variables that communicate with each other through directed connections (denoted as E_sender→receiver). Note that E_sender→receiver is the overlap between the representation direction of the sender variable (e.g., the representation directions of the input variables and of the decision variable) and the input-selection vector of the receiver variable (e.g., the input-selection vector n_dv of the decision variable). As such, E_sender→receiver naturally inherits the context dependency of the task variable's representation direction: while E_inp1→dv exhibited a large value and E_inp2→dv was negligible in context 1, the values of the two were exchanged in context 2.

(B) Illustration of the information flow dynamics in (A) through discretized steps. At step 1, sensory information A1 and A2 were placed in the inp1 and inp2 slots, respectively. Depending on the context, different information contents (i.e., A1 in context 1 and A2 in context 2) entered the dv slot at step 2 and were maintained by recurrent connections in the following steps, which is desirable for the context-dependent decision-making task.
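A toy discrete-step propagation corresponding to this illustration: task-variable slots are updated through the effective couplings of the information flow graph in (A). The coupling values and the transient handling of the input slots are illustrative assumptions (relevant coupling ≈ 1, irrelevant ≈ 0).

```python
# Toy discrete-step information flow for the rank-1 graph in (A), context 1.
# Edge weights are illustrative effective couplings E_sender->receiver.
E = {('inp1', 'dv'): 1.0, ('inp2', 'dv'): 0.0, ('dv', 'dv'): 1.0}

slots = {'inp1': 0.7, 'inp2': -0.3, 'dv': 0.0}   # step 1: A1 and A2 placed in the input slots

for step in range(2, 5):
    # step >= 2: dv reads its inputs through the couplings and maintains itself
    slots['dv'] = sum(E.get((src, 'dv'), 0.0) * val for src, val in slots.items())
    slots['inp1'] = slots['inp2'] = 0.0          # pulse inputs are transient (assumption)
    print(step, slots['dv'])                     # stays at A1 = 0.7 in context 1
```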

(C) The information flow graph of the rank-3 model presented in Figure 3. Different from (A), here the input information has to first pass through an intermediate slot to arrive at the dv slot (e.g., the inp1→iv1→dv pathway in context 1 and the inp2→iv2→dv pathway in context 2).

(D) Illustration of information flow dynamics in (C) through discretized steps.

Effective coupling between task variables for rank-1 and rank-3 RNNs.

(A) Effective coupling between task variables for 100 trained rank-1 RNNs (Figure 2) in each context. The effective coupling between two task variables is defined as the overlap between the corresponding representation vector and input-selection vector. For example, the effective coupling from input 1 to the decision variable, E_inp1→dv, is the overlap between the input 1 representation direction and n_dv. As can be seen, the effective coupling of the recurrent connection (E_dv→dv) is close to 1 in both contexts. The effective coupling from an input task variable to the decision variable is large in the relevant context (i.e., E_inp1→dv in context 1 and E_inp2→dv in context 2) and negligible in the irrelevant context.
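A sketch of the effective-coupling computation used in panels (A) and (B), assuming the "overlap" is a dot product scaled by 1/N (a common convention for low-rank RNNs) and that contextual gain modulation is folded into the representation vectors; the exact convention is in the paper's Methods.

```python
import numpy as np

def overlap(rep_vector, selection_vector):
    """Effective coupling E_sender->receiver: overlap between the sender's
    representation vector and the receiver's input-selection vector.
    The 1/N scaling is an assumed convention."""
    return np.dot(rep_vector, selection_vector) / len(rep_vector)

# For the rank-1 model, with gain-modulated representation directions:
# E_inp1_dv = overlap(gain_ctx * I1, n_dv)
# E_dv_dv   = overlap(gain_ctx * m_dv, n_dv)
```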

(B) Effective coupling between task variables for 100 trained rank-3 RNNs (Figure 3) in each context. The effective coupling of the recurrent connection (E_dv→dv, i.e., the overlap between m_dv and n_dv) is close to 1 in both contexts. There is no direct connection from the input task variables to the decision variable, since E_inp1→dv and E_inp2→dv are zero in both contexts. The difference of E_iv1→dv and E_iv2→dv between the two contexts leads to the selection vector modulation.

Neural activity and task variable dynamics for single pulse input.

(A) Task setting for single pulse input. We study the neural activity dynamics of low-rank RNNs when they receive a pulse input. For simplicity, only the RNNs' neural activity given a pulse from input 1 in context 1 is considered.

(B) Illustration of the neural activity of the rank-1 RNN given a pulse input. For the rank-1 RNN (Figure 2), the dynamics of the neural activity x(t) are always constrained to the subspace spanned by {I1, m_dv}, with the corresponding coefficients being the input task variable (k_inp1) and the decision variable (k_dv), respectively. Moreover, the neural activity x(t) is always constrained to a line (dashed line) orthogonal to the selection vector.

(C) Task variable dynamics of the rank-1 RNN given a pulse input. We ran the example RNN for a given input and projected the results onto each axis to obtain the dynamics of each task variable (solid lines). The analytical expressions for the dynamics of each task variable are provided in the Methods section (dotted lines). The simulated RNN results closely match the theoretical values.
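A minimal simulation sketch of panels (B) and (C): a rank-1 network receives a brief pulse on input 1, and the state is projected onto I1 and m_dv to read out the task-variable dynamics. The dynamics equation, time constant, tanh nonlinearity, and random vectors are standard low-rank-RNN assumptions rather than the paper's exact settings.

```python
import numpy as np

# Assumed dynamics: tau dx/dt = -x + J tanh(x) + I1 * u(t), with J = m_dv n_dv^T / N.
N, tau, dt, T = 512, 0.1, 0.001, 1.0
rng = np.random.default_rng(2)
m_dv, n_dv, I1 = (rng.standard_normal(N) for _ in range(3))
J = np.outer(m_dv, n_dv) / N

x = np.zeros(N)
k_inp1, k_dv = [], []
for step in range(int(T / dt)):
    u = 1.0 if step * dt < 0.05 else 0.0          # 50 ms pulse on input 1
    x = x + dt / tau * (-x + J @ np.tanh(x) + I1 * u)
    # Task variables: coefficients of x(t) along I1 and m_dv
    # (for large N these random directions are close to orthogonal).
    k_inp1.append(np.dot(x, I1) / np.dot(I1, I1))
    k_dv.append(np.dot(x, m_dv) / np.dot(m_dv, m_dv))
```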

(D) Illustration of the neural activity of the rank-3 RNN given a pulse input. For the rank-3 RNN (Figure 3), the dynamics of the neural activity x(t) are always constrained to the subspace spanned by {I1, m_iv, m_dv}, with the corresponding coefficients being the input task variable (k_inp1), the intermediate task variable (k_iv) and the decision variable (k_dv), respectively.

(E) Task variable dynamics of the rank-3 RNN given a pulse input. Solid lines denote task variable dynamics calculated numerically by RNN simulation and dotted lines denote theoretical results.

A novel pathway-based definition of selection vector modulation.

(A) A pathway-based decomposition of contextual modulation in a model with both input and selection vector modulations. This definition is based on an explicit formula for the effective connection from the input variable to the decision variable in the model (i.e., E_inp→dv + E_inp→iv · E_iv→dv; see Methods for details). The input modulation component is then defined as the modulation induced by the change of the input representation direction across contexts. The remaining component is defined as the selection vector modulation component.
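A schematic numerical sketch of this decomposition: the effective input-to-dv connection is E_inp→dv + E_inp→iv · E_iv→dv in each context; the part of its contextual change reproduced by varying only the couplings that stem from the input representation direction is taken as the input-modulation component, and the remainder as the selection-vector-modulation component. The scalar coupling values below are illustrative; the paper's exact formula is in its Methods.

```python
# Effective connection from the input variable to the decision variable,
# reduced to scalars for illustration.
def effective_connection(E_inp_dv, E_inp_iv, E_iv_dv):
    return E_inp_dv + E_inp_iv * E_iv_dv

# Illustrative couplings in the two contexts for a model mixing both modulations:
ctx1 = dict(E_inp_dv=0.6, E_inp_iv=0.8, E_iv_dv=0.5)
ctx2 = dict(E_inp_dv=0.1, E_inp_iv=0.8, E_iv_dv=0.1)

total_modulation = effective_connection(**ctx1) - effective_connection(**ctx2)

# Input-modulation component: vary only the couplings carried by the input
# representation direction (E_inp->dv, E_inp->iv), keeping E_iv->dv at its
# context-1 value; the remainder is attributed to selection vector modulation.
input_mod = effective_connection(**ctx1) - effective_connection(
    ctx2['E_inp_dv'], ctx2['E_inp_iv'], ctx1['E_iv_dv'])
selection_mod = total_modulation - input_mod
print(total_modulation, input_mod, selection_mod)
```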

(B) Illustration of contextual modulation decomposition introduced in Pagan et al., 2022. In this definition, the selection vector has to be first reverse-engineered through linearized dynamical systems analysis. The input modulation component is then defined as the modulation induced by the change of input representation direction across contexts while the selection vector modulation component is defined as the one induced by the change of the selection vector across contexts.

(C) A family of handcrafted RNNs with both input and selection vector modulations. α, β, and η represent the associated effective couplings between task variables. In this model family, the inp→dv pathway, susceptible to the input modulation, is parameterized by α, while the inp→iv→dv pathway, susceptible to the selection vector modulation, is parameterized by β and η. As such, the ratio of the input modulation to the selection vector modulation can be conveniently controlled by adjusting α, β, and η.

(D) Comparison of pathway-based definition in (A) with the classical definition in (B) using the model family introduced in (C).

An explicit pathway-based formula for the selection vector.

(A) Illustration of how an explicit pathway-based formula for the selection vector is derived. In a model with both the first-order selection pathway (i.e., inp→dv) and the second-order selection pathway (i.e., inp→iv→dv), the second-order pathway can be reduced to a pathway with an effective selection vector that exhibits the contextual dependency missing in rank-1 models.

(B) Comparison between this pathway-based selection vector and the classical one (Mante et al., 2013) using 1,000 RNNs.

(C) The connection between our understanding and the classical understanding in neural state space. Based on the explicit formula for the selection vector in (A), the selection vector modulation has to rely on the contextual modulation of an additional representation direction (i.e., m_iv) orthogonal to both the input representation direction and the decision variable representation direction (m_dv, the line attractor). Therefore, at least three dimensions (i.e., the input representation direction, m_iv and m_dv) are required to account for the selection vector modulation in neural state space.

The correlation between the dimensionality of neural dynamics and the proportion of selection vector modulation is confirmed in vanilla RNNs.

(A) A general neural circuit model of CDM. In this model, there are multiple pathways capable of propagating the input information to the decision variable slot, of which the blue connections are susceptible to the input modulation while the green connections are susceptible to the selection vector modulation (see Methods for details).

(B) The explicit formula of both the effective connection from the input variable to the decision variable and the effective selection vector for the model in (A).

(C) The setting of vanilla RNNs trained to perform the CDM task. See Methods for more details.

(D) Positive correlation between effective connectivity dimension and proportion of selection vector modulation. Given a trained RNN with connectivity matrix J, the effective connectivity dimension, computed from the singular values σ1 ≥ σ2 ≥ ⋯ ≥ σn of J (see Methods), is used to quantify the connectivity dimensionality. Spearman's rank correlation, r=0.919, p<1e-3, n=3,892. The x-axis is displayed in log-scale.
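The exact effective-dimension formula is given in the paper's Methods; as a placeholder, a commonly used choice is a participation-ratio style measure of the singular values of J, sketched here purely as an assumption.

```python
import numpy as np

def effective_connectivity_dimension(J):
    """Participation-ratio style effective dimension of the connectivity matrix.
    This specific formula is an assumption standing in for the Methods definition;
    it equals R for a matrix with R equal nonzero singular values."""
    s = np.linalg.svd(J, compute_uv=False)   # sigma_1 >= sigma_2 >= ... >= sigma_n
    return (s.sum() ** 2) / (s ** 2).sum()
```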

(E) Single neuron response kernels for two example RNNs. The neuron response kernels were calculated using a regression method (Pagan et al., 2022; see Methods for details). For simplicity, only response kernels for input 1 are displayed. Top: Response kernels for two example neurons in the RNN with low effective dimension (indicated by a star marker in panel D). Two typical response kernels, including the decision variable profile (left) and the sensory input profile (right), are displayed. Bottom: Response kernels for three example neurons in the RNN with high effective dimension (indicated by a square marker in panel D). In addition to the decision variable profile (left) and sensory input profile (middle), there are neurons whose response kernels initially increase and then decrease (right). Gray lines, response kernels in context 1 (i.e., rel. ctx.). Blue lines, response kernels in context 2 (i.e., irrel. ctx.).

(F) Principal dynamical modes of the response kernels at the population level, extracted by singular value decomposition. Left: shared dynamical modes, including one persistent choice mode (grey) and three transient modes (blue, orange, green), are identified across both RNNs. Right: for the i-th transient mode, the normalized percentage of explained variance (PEV) is computed from the singular values σ1 ≥ σ2 ≥ ⋯ ≥ σ39 of the transient modes (see Methods for details).

(G) Positive correlation between response-kernel-based index and proportion of selection vector modulation. For a given RNN, PEV of extra dynamical modes is defined as the accumulated normalized PEV of the second and subsequent transient dynamical modes (see Methods for details). Spearman’s rank correlation, r=0.902, p<1e-3, n=3,892. The x-axis is displayed in log-scale.

Training vanilla RNNs with different regularization coefficients.

(A) The influence of the regularization coefficient on the effective connectivity dimension of trained RNNs. For each regularization coefficient, we trained 100 full-rank RNNs (Figure 7, panel D). Larger regularization results in connectivity matrices of lower rank, leading to a smaller effective connectivity dimension.

(B) The influence of the regularization coefficient on the selection vector modulation of trained RNNs. Distribution of selection vector modulation for networks trained with different regularization coefficients. Larger regularization leads to networks that favor the input modulation strategy.

(C) The relationship between the proportion of explained variance (PEV) in extra-dimensions and effective connectivity dimension. There is a strong positive correlation between the PEV in extra dimensions and the effective connectivity dimension in both contexts. In each panel, each dot denotes a trained RNN, with different colors denoting different regularization coefficients.

Verification of the correlation results using vanilla RNNs trained with different hyper-parameter settings.

(A) Similar results in trained vanilla RNNs with a softplus activation function. Left: Spearman's rank correlation, r=0.945, p<1e-3, n=2,564. Right: Spearman's rank correlation, r=0.803, p<1e-3, n=2,564. The x-axes are displayed in log-scale for both panels.

(B) Similar results in trained vanilla RNNs initialized with a variance of 1/N. Left: Spearman's rank correlation, r=0.973, p<1e-3, n=2,630. Right: Spearman's rank correlation, r=0.976, p<1e-3, n=2,630. The x-axes are displayed in log-scale for both panels.

Two RNNs with distinct modulation strategies produce the same neural activities.

(A) Information flow graph for the two RNNs. The black arrows denote connections with an effective coupling of 1. For RNN1, the closure of the inp1→iv2 connection on the pathway inp1→iv2→dv prevents information in inp1 from reaching iv2 and, subsequently, the decision variable (dv), indicating that RNN1 uses solely the input modulation strategy for input 1. For RNN2, the closure of the iv1→dv connection on the pathway inp1→iv1→dv means that although information in inp1 can reach iv1, the subsequent transmission from iv1 to dv is blocked. This indicates that RNN2 uses solely the selection vector modulation strategy for input 1.

(B) Connectivity weights among three example neurons in the two RNNs. Each neuron belongs to one of the three neuron populations (see Methods for more details). Notice that the connectivity weights from n1 (neuron 1) to n3 (or from n2 to n3) differ between the two RNNs.

(C) Neural activities for the three neurons in three example trials. Orange lines denote activities for RNN1 and blue lines denote activities for RNN2. The neural activity is approximately equal between the two RNNs.

(D) Histogram of the single-neuron activity similarity between the two RNNs. We calculated the similarity between the activity of the i-th neuron in RNN1 and that of the i-th neuron in RNN2 during trial k (using the r2_score function from the sklearn Python package). Averaging over trials provides the similarity between corresponding neurons (neuron i in RNN1 and neuron i in RNN2).
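A sketch of this similarity computation using sklearn's r2_score as stated in the legend; the array shapes below are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

def neuron_similarity(acts1, acts2):
    """acts1, acts2: arrays of shape (trials, time, neurons) holding the activity
    of RNN1 and RNN2 on the same trials. Returns, per neuron, the r2_score between
    the two networks averaged over trials."""
    n_trials, _, n_neurons = acts1.shape
    sims = np.zeros(n_neurons)
    for i in range(n_neurons):
        per_trial = [r2_score(acts1[k, :, i], acts2[k, :, i]) for k in range(n_trials)]
        sims[i] = np.mean(per_trial)
    return sims
```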