Semantic illustration of extracting and validating behaviorally-relevant signals.

a-e, The ideal decomposition of raw signals. a, The temporal neuronal activity of raw signals, where the x-axis denotes time and the y-axis represents firing rate. Raw signals are decomposed into relevant (b) and irrelevant (d) signals. The red dotted line indicates the decoding performance of raw signals. The red and blue bars represent the decoding performance of relevant and irrelevant signals, respectively. The purple bar represents the generating performance of relevant signals, which measures the neural similarity between generated signals and raw signals. The longer the bar, the higher the performance. The ground-truth relevant signals decode information perfectly (c, red bar) and are similar to raw signals to some extent (c, purple bar), and the ground-truth irrelevant signals contain little behavioral information (e, blue bar). f-h, Three different cases of behaviorally-relevant signal distillation. f, When the model is biased toward generating relevant signals that are similar to raw signals, it achieves high generating performance, but decoding performance suffers because too many irrelevant signals are included. Because it is difficult for models to extract the complete relevant signals, the residuals also contain some behavioral information. g, When the model is biased toward generating signals that prioritize decoding over similarity to raw signals, it achieves high decoding performance, but generating performance is low. Meanwhile, the residuals contain a significant amount of behavioral information. h, When the model balances the trade-off between the decoding and generating capabilities of relevant signals, both decoding and generating performance are good, and the residuals contain only a little behavioral information.

Evaluation of separated signals.

a, The obstacle avoidance paradigm. b, The decoding R2 between true velocity and predicted velocity of raw signals (purple bars with slash lines) and behaviorally-relevant signals obtained by d-VAE (red), PSID (pink), pi-VAE (green), LFADS (blue), and VAE (light green) on dataset A. Error bars denote mean ± standard deviation (s.d.) across five cross-validation folds. Asterisks represent significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. c, Same as b, but for behaviorally-irrelevant signals obtained by the five methods. d, The neural similarity (R2) between raw signals and behaviorally-relevant signals extracted by d-VAE, PSID, pi-VAE, LFADS, and VAE. Error bars represent mean ± s.d. across five cross-validation folds. Asterisks indicate significance of Wilcoxon rank-sum test with **P < 0.01. e-h and i-l, Same as a-d, but for dataset B with the center-out paradigm (e) and dataset C with the self-paced reaching paradigm (i). m, The firing rates of raw signals and distilled signals obtained by d-VAE in five held-out trials under the same condition of dataset B.
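The R2 values reported in panels b-d can be illustrated with a minimal sketch. It assumes R2 is the standard coefficient of determination, which the caption does not state explicitly; the function name and the velocity values below are hypothetical and are not taken from the datasets.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical one-dimensional velocity trace and two decoder outputs.
true_vx = [0.0, 1.0, 2.0, 3.0, 4.0]
good_pred = [0.1, 0.9, 2.1, 2.9, 4.0]   # close to the true velocity
poor_pred = [2.0, 2.0, 2.0, 2.0, 2.0]   # always predicts the mean

r2_good = r2_score(true_vx, good_pred)
r2_poor = r2_score(true_vx, poor_pred)
```

The same function could be applied to firing rates instead of velocity to obtain a neural-similarity score between distilled and raw signals, again under the assumption that both quantities use this definition.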

The effect of irrelevant signals on analyzing neural activity at the single-neuron level.

a, The angle difference (AD) of preferred direction (PD) between raw and distilled signals as a function of the R2 of raw signals on dataset A, where R2 represents the proportion of neuronal activity explained by the linear encoding model. Each black point represents a neuron (n = 90). The red curve is the fitting curve between R2 and AD. The PDs of five example larger-R2 neurons are shown in the inset plot, where the solid and dotted line arrows represent the PDs of relevant and raw signals, respectively. b, Comparison of the cosine tuning fit (R2) before and after distillation of single neurons (black points), where the x-axis and y-axis represent neurons’ R2 of raw and distilled signals, respectively. c, Comparison of neurons’ Fano factor (FF) averaged across conditions of raw (x-axis) and distilled (y-axis) signals, where FF measures the neuronal variability across trials in the same condition. d, Boxplots of raw (purple) and distilled (red) signals under different conditions for all neurons (12 conditions). Boxplots represent medians (lines), quartiles (boxes), and whiskers extending to ± 1.5 times the interquartile range. The broken lines represent the mean FF across all neurons. e-h, Same as a-d, but for dataset B (n = 159, 8 conditions). i, Example of three neurons’ raw firing activity decomposed into behaviorally-relevant and irrelevant parts using all trials under two conditions (2 of 8 directions) in held-out test sets of dataset B.
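The single-neuron quantities in this caption (PD, AD, and FF) can be sketched as follows. This is an illustration under stated assumptions, not the analysis code behind the figure: it assumes the classic cosine tuning model r = b0 + b1 cos(θ) + b2 sin(θ) fitted on evenly spaced reach directions, and FF computed as the variance-to-mean ratio across trials. All function names and numbers are hypothetical.

```python
import math


def preferred_direction(rates, angles):
    """PD (radians) from a cosine tuning fit r = b0 + b1*cos(a) + b2*sin(a).

    Uses the closed-form least-squares solution, which is only valid when the
    angles are evenly spaced over the full circle (an assumption made here).
    """
    n = len(rates)
    b1 = 2.0 / n * sum(r * math.cos(a) for r, a in zip(rates, angles))
    b2 = 2.0 / n * sum(r * math.sin(a) for r, a in zip(rates, angles))
    return math.atan2(b2, b1)


def angle_difference(pd_a, pd_b):
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    diff = abs(pd_a - pd_b) % (2 * math.pi)
    return math.degrees(min(diff, 2 * math.pi - diff))


def fano_factor(counts):
    """Variance-to-mean ratio across trials of one condition (sample variance)."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


# Hypothetical neuron: eight evenly spaced directions, as in a center-out task.
angles = [2 * math.pi * k / 8 for k in range(8)]
raw_rates = [10 + 5 * math.cos(a - math.radians(30)) for a in angles]
distilled_rates = [10 + 5 * math.cos(a - math.radians(45)) for a in angles]

ad = angle_difference(preferred_direction(raw_rates, angles),
                      preferred_direction(distilled_rates, angles))
ff = fano_factor([4, 6, 5, 5])  # hypothetical per-trial counts in one condition
```

For unevenly sampled directions, a general least-squares fit would be needed in place of the closed-form coefficients.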

The effect of irrelevant signals on analyzing neural activity at the population level.

a,b, PCA is applied separately to relevant and irrelevant signals to get relevant PCs and irrelevant PCs. The thick lines represent the cumulative variance explained for the signals on which PCA has been performed, while the thin lines represent the variance explained by those PCs for other signals. Red, blue, and gray colors indicate relevant signals, irrelevant signals, and random Gaussian noise N(0, I) (for chance level), where the mean vector is zero and the covariance matrix is the identity matrix. The horizontal lines represent the percentage of variance explained. The vertical lines indicate the number of dimensions accounting for 90% of the variance in behaviorally-relevant (left) and irrelevant (right) signals. For convenience, we define the principal component subspace describing the top 90% variance as the primary subspace and the subspace capturing the last 10% variance as the secondary subspace. The cumulative variance explained for behaviorally-relevant (a) and irrelevant (b) signals obtained by d-VAE on dataset A. c,d, PCA is applied to raw signals to get raw PCs. c, The bar plot shows the composition of each raw PC. The inset pie plot shows the overall proportion of raw signals, where red, blue, and purple colors indicate relevant signals, irrelevant signals, and the correlation between relevant and irrelevant signals. The PC marked with a red triangle indicates the last PC where the variance of relevant signals is greater than or equal to that of irrelevant signals. d, The cumulative variance explained by raw PCs for different signals, where the thick line represents the cumulative variance explained for raw signals (purple), while the thin lines represent the variance explained for relevant (red) and irrelevant (blue) signals. e-h, Same as a-d, but for dataset B.
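Two quantities in this caption lend themselves to a short sketch: the number of dimensions in the primary subspace, and the composition of a raw PC. The sketch assumes that raw signals equal the sum of relevant and irrelevant signals, so that the variance along a raw PC splits into a relevant term, an irrelevant term, and a cross term (taken here to correspond to the purple "correlation" share; this mapping is an assumption). Function names and values are hypothetical.

```python
def dims_for_variance(pc_variances, fraction=0.9):
    """Number of leading PCs needed to reach `fraction` of the total variance."""
    ordered = sorted(pc_variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, v in enumerate(ordered, start=1):
        cumulative += v
        if cumulative >= fraction * total:
            return k
    return len(ordered)


def pc_composition(rel_proj, irr_proj):
    """Split the variance along one raw PC into (relevant, irrelevant, cross).

    rel_proj and irr_proj are the projections of relevant and irrelevant
    signals onto the same raw PC. With raw = relevant + irrelevant:
    Var(raw) = Var(rel) + Var(irr) + 2 * Cov(rel, irr).
    """
    n = len(rel_proj)
    mean_r = sum(rel_proj) / n
    mean_i = sum(irr_proj) / n
    var_r = sum((r - mean_r) ** 2 for r in rel_proj) / (n - 1)
    var_i = sum((i - mean_i) ** 2 for i in irr_proj) / (n - 1)
    cov = sum((r - mean_r) * (i - mean_i)
              for r, i in zip(rel_proj, irr_proj)) / (n - 1)
    return var_r, var_i, 2.0 * cov


# Hypothetical per-PC variances: three dimensions form the primary subspace.
primary_dims = dims_for_variance([6.0, 2.0, 1.5, 0.3, 0.2])

# Hypothetical projections onto one raw PC.
rel = [1.0, 2.0, 3.0, 4.0]
irr = [0.5, -0.5, 0.5, -0.5]
var_rel, var_irr, cross = pc_composition(rel, irr)
```

The remaining dimensions after `primary_dims` would span the secondary subspace in the terminology of the caption.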

Smaller R2 neurons encode rich behavioral information in complex nonlinear ways.

a, The comparison of decoding performance between raw (purple) and distilled signals (red) on dataset A with different neuron groups, including smaller R2 neurons (R2 <= 0.03), larger R2 neurons (R2 > 0.03), and all neurons. Error bars indicate mean ± s.d. across five cross-validation folds. Asterisks denote significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. b, The correlation matrix of all neurons of raw (left) and behaviorally-relevant (right) signals on dataset A. Neurons are ordered to highlight correlation structure (details in Methods). c, The decoding performance of KF (left) and ANN (right) with neurons dropped out from larger to smaller R2 on dataset A. The vertical gray line indicates the number of dropped neurons at which raw and behaviorally-relevant signals have the greatest performance difference. d-f, Same as a-c, but for dataset B.
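The neuron grouping in panel a and the dropping order in panel c can be sketched as below. Only the 0.03 threshold and the larger-to-smaller R2 order come from the caption; the function names and the example R2 values are hypothetical, and the decoders themselves (KF and ANN) are not shown.

```python
def split_by_r2(neuron_r2, threshold=0.03):
    """Return (smaller, larger) lists of neuron indices split at `threshold`."""
    smaller = [i for i, r2 in enumerate(neuron_r2) if r2 <= threshold]
    larger = [i for i, r2 in enumerate(neuron_r2) if r2 > threshold]
    return smaller, larger


def dropping_schedule(neuron_r2):
    """Neuron indices kept after dropping the k largest-R2 neurons, k = 0..n-1."""
    order = sorted(range(len(neuron_r2)),
                   key=lambda i: neuron_r2[i], reverse=True)
    return [sorted(order[k:]) for k in range(len(order))]


# Hypothetical encoding-model R2 for five neurons.
r2_values = [0.20, 0.01, 0.05, 0.03, 0.002]
smaller_group, larger_group = split_by_r2(r2_values)
schedule = dropping_schedule(r2_values)
```

Each entry of `schedule` would then be passed to a decoder to obtain one point on the dropping curve.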

Signals composed of smaller variance PCs encode rich behavioral information in complex nonlinear ways.

a, The comparison of decoding performance between raw (purple) and distilled signals (red) composed of different raw PC groups, including smaller variance PCs (the proportion of irrelevant signals that make up raw PCs is higher than that of relevant signals) and larger variance PCs (the proportion of irrelevant signals is lower than that of relevant ones) on dataset A. Error bars indicate mean ± s.d. across five cross-validation folds. Asterisks denote significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. b, The cumulative decoding performance of signals composed of cumulative PCs that are ordered from smaller to larger variance using KF (left) and ANN (right) on dataset A. The red patches indicate the decoding ability of the last 10% variance of relevant signals. c, The cumulative decoding performance of signals composed of cumulative PCs that are ordered from larger to smaller variance using KF (left) and ANN (right) on dataset A. The red patches indicate the decoding gain obtained by superimposing the last 10% variance signals of relevant signals on their top 90% variance signals. The inset shows a partially enlarged plot for clarity. d-f, Same as a-c, but for dataset B.

Evaluation of separated signals on the synthetic dataset.

a, The temporal neuronal activity of raw signals (the purple line) of an example test trial, which is decomposed into relevant (b) and irrelevant (c) signals. b, Relevant signals (orange lines) extracted by d-VAE under three distillation cases, where bold gray lines represent ground truth relevant signals. Results show that when a = 0.09, the relevant signals are too similar to raw signals but not similar to the ground truth; when a = 0.9, the relevant signals closely match the ground truth; when a = 9, the relevant signals are not similar to the ground truth. c, Same as b, but for irrelevant signals (blue lines). Notably, when a = 9, some useful signals are left in the irrelevant signals. d, The decoding R2 of distilled relevant signals in the three cases. Error bars indicate mean ± s.d. across five cross-validation folds. Results demonstrate that decoding R2 increases as a increases. e, Same as d, but for irrelevant signals. Notably, when a = 9, irrelevant signals contain a large amount of behavioral information. f, The neural similarity between relevant and raw signals. Results show that the neural R2 decreases as a increases. g, The neural R2 between relevant signals and the ground truth. Results show that d-VAE can utilize a proper trade-off to extract effective relevant signals that are similar to the ground truth. h, The decoding R2 between true velocity and predicted velocity of raw signals (purple bars with slash lines), the ground truth signals (gray) and behaviorally-relevant signals obtained by d-VAE (red), PSID (pink), pi-VAE (green), LFADS (blue), and VAE (light green) on dataset A. Error bars denote mean ± standard deviation (s.d.) across five cross-validation folds. Asterisks represent significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. i, Same as h, but for irrelevant signals. j, The neural R2 between generated relevant signals and raw signals. k, Same as j, but for the ground truth.

Decoding performance comparison with CEBRA.

a, The decoding R2 comparison between d-VAE and CEBRA on synthetic dataset. The red bar represents the behaviorally-relevant signals extracted by d-VAE, and the light purple bar represents the behaviorally-relevant embeddings extracted by CEBRA. Error bars indicate mean ± s.d. across five cross-validation folds. Asterisks denote significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. b-d, Same as a, but for datasets A, B, and C, respectively.

The effect of irrelevant signals on relevant signals at the single-neuron level.

a,b, Same as Fig. 3, but for dataset C. a, The angle difference (AD) of preferred direction (PD) between raw and distilled signals as a function of the R2 of raw signals. Each black point represents a neuron (n = 91). The red curve is the fitting curve between R2 and AD. The PDs of five example larger-R2 neurons are shown in the inset plot, where the solid line arrows represent the PDs of relevant signals, and the dotted line arrows represent the PDs of raw signals. b, Comparison of the cosine tuning fit (R2) before and after distillation of single neurons (black points), where the x-axis and y-axis represent neurons’ R2 of raw and distilled signals, respectively.

The firing activity of example neurons.

a, Example of three neurons’ raw firing activity decomposed into behaviorally-relevant and irrelevant parts using all trials in held-out test sets for four conditions (4 of 8 directions) of the center-out reaching task. b, Example of three neurons’ raw firing activity decomposed into behaviorally-relevant and irrelevant parts using all trials in held-out test sets for four conditions (4 of 12 conditions) of the obstacle avoidance task.

The effect of irrelevant signals on analyzing neural activity at the population level.

a-d, Same as Fig. 4, but for dataset C. a,b, PCA is applied separately to relevant and irrelevant signals to get relevant PCs and irrelevant PCs. The thick lines represent the cumulative variance explained for the signals on which PCA has been performed, while the thin lines represent the variance explained by those PCs for other signals. Red, blue, and gray colors indicate relevant signals, irrelevant signals, and random Gaussian noise (for chance level). The cumulative variance explained for behaviorally-relevant (a) and irrelevant (b) signals obtained by d-VAE. c,d, PCA is applied to raw signals to get raw PCs. c, The bar plot represents the composition of each raw PC. The inset pie plot shows the overall proportion of raw signals, where red, blue, and purple colors indicate relevant signals, irrelevant signals, and the correlation between relevant and irrelevant signals. The PC marked with a red triangle indicates the last PC where the variance of relevant signals is greater than or equal to that of irrelevant signals. d, The cumulative variance explained by raw PCs for different signals, where the thick line represents the cumulative variance explained for raw signals (purple), while the thin lines represent the variance explained for relevant (red) and irrelevant (blue) signals.

The effect of irrelevant signals obtained by pi-VAE on analyzing neural activity at the population level.

a-l, Same as Fig. 4 and Fig. S5, but for pi-VAE. a,b, PCA is applied separately to relevant and irrelevant signals to get relevant PCs and irrelevant PCs. The thick lines represent the cumulative variance explained for the signals on which PCA has been performed, while the thin lines represent the variance explained by those PCs for other signals. Red, blue, and gray colors indicate relevant signals, irrelevant signals, and random Gaussian noise (for chance level). The cumulative variance explained for behaviorally-relevant (a) and irrelevant (b) signals on dataset A. c,d, PCA is applied to raw signals to get raw PCs. c, The bar plot represents the composition of each raw PC. The inset pie plot shows the overall proportion of raw signals, where red, blue, and purple colors indicate relevant signals, irrelevant signals, and the correlation between relevant and irrelevant signals. The PC marked with a red triangle indicates the last PC where the variance of relevant signals is greater than or equal to that of irrelevant signals. d, The cumulative variance explained by raw PCs for different signals, where the thick line represents the cumulative variance explained for raw signals (purple), while the thin lines represent the variance explained for relevant (red) and irrelevant (blue) signals. e-h, i-l, Same as a-d, but for datasets B and C.

The rotational dynamics of raw, relevant, and irrelevant signals.

Datasets A and B have twelve and eight conditions, respectively. We compute the trial-averaged neural responses for each condition, then apply jPCA to raw, relevant, and irrelevant signals separately to get the top two jPCs of each. a, The rotational dynamics of raw neural signals. b, The rotational dynamics of relevant signals obtained by d-VAE. c, The rotational dynamics of irrelevant signals obtained by d-VAE. The rotational dynamics of behaviorally-relevant signals are similar to those of raw signals, whereas the rotational dynamics of behaviorally-irrelevant signals are irregular. d-f, Same as a-c, but for dataset B.

The cumulative variance of raw and behaviorally-relevant signals.

a, PCA is applied separately to raw and distilled behaviorally-relevant signals to get raw PCs and relevant PCs. The cumulative variance of raw (purple) and behaviorally-relevant signals (red) on dataset A (n = 90). The two curves in the upper left corner denote the variance accumulation from larger to smaller variance PCs. The two curves in the lower right corner indicate accumulation from smaller to larger variance PCs. The horizontal lines represent 10% and 90% variance explained. The vertical lines indicate the number of dimensions accounting for the last 10% and top 90% of the variance of behaviorally-relevant (red) and raw (purple) signals. Here we call the subspace spanned by the PCs capturing the top 90% variance the primary subspace, and the subspace spanned by the PCs capturing the last 10% variance the secondary subspace. The dimensionality of the primary subspace of raw signals is significantly higher than that of relevant signals, indicating that irrelevant signals lead to an overestimate of the dimensionality of specific behaviors. b,c, Same as a, but for datasets B (n = 159) and C (n = 91).

Trivial neural activities encode rich behavioral information in complex nonlinear ways.

a-c, Same as Fig. 5, but for dataset C (n = 91). d-f, Same as Fig. 6, but for dataset C. a, The comparison of decoding performance between raw (purple) and distilled signals (red) with different neuron groups, including smaller R2 neurons (R2 <= 0.03), larger R2 neurons (R2 > 0.03), and all neurons. Error bars indicate mean ± s.d. across cross-validation folds. Asterisks denote significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. b, The correlation matrix of all neurons of raw and behaviorally-relevant signals. c, The decoding performance of KF (left) and ANN (right) with neurons dropped out from larger to smaller R2. The vertical gray lines indicate the number of dropped neurons at which raw and behaviorally-relevant signals have the greatest performance difference. d, The comparison of decoding performance between raw (purple) and distilled signals (red) composed of different raw PC groups, including smaller variance PCs (the proportion of irrelevant signals that make up raw PCs is higher than that of relevant signals) and larger variance PCs (the proportion of irrelevant signals is lower than that of relevant ones). Error bars indicate mean ± s.d. across five cross-validation folds. Asterisks denote significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. e, The cumulative decoding performance of signals composed of cumulative PCs that are ordered from smaller to larger variance using KF (left) and ANN (right). The red patches indicate the decoding ability of the last 10% variance of relevant signals. f, Same as e, but PCs are ordered from larger to smaller variance. The red patches indicate the decoding gain obtained by superimposing the last 10% variance signals of relevant signals on their top 90% variance signals.

Simulation results.

a-c, Simulation dataset A simulates a situation where the smaller R2 neurons contain a certain amount of behavioral information, but this information cannot be decoded from these neurons because it is masked by noise. We used this experiment to demonstrate that d-VAE can use the larger R2 neurons to help recover the underlying behavioral information of the smaller R2 neurons. a, The comparison of decoding performance between raw (purple), distilled (red), and ground truth (gray) signals with different neuron groups, including smaller R2 neurons (R2 <= 0.03), larger R2 neurons (R2 > 0.03), and all neurons. Error bars indicate mean ± s.d. (n = 5 folds). Asterisks denote the significance of Wilcoxon rank-sum test with **P < 0.01. b, The decoding performance of behaviorally-irrelevant signals obtained by d-VAE with different neuron groups. Error bars indicate mean ± s.d. (n = 5 folds). c, The neural similarity between distilled behaviorally-relevant (red) and raw signals (purple) and between relevant and ground truth signals (gray) with different neuron groups. Error bars indicate mean ± s.d. (n = 5 folds). d-f, Simulation dataset B simulates the situation where linear decoding is significantly inferior to nonlinear decoding. We used this experiment to demonstrate that d-VAE cannot make the linear decoder achieve performance similar to that of the nonlinear decoder. d, The comparison of decoding performance between raw (purple), relevant (red), and ground truth (gray) signals. Error bars indicate mean ± s.d. (n = 5 folds). e, The decoding performance of behaviorally-irrelevant signals obtained by d-VAE. Error bars indicate mean ± s.d. (n = 5 folds). f, The neural similarity between distilled behaviorally-relevant (red) and raw signals (purple) and between relevant and ground truth signals (gray). Error bars indicate mean ± s.d. (n = 5 folds).

Smaller variance PC signals preferentially improve lower-speed velocity.

a, The comparison of absolute improvement ratio between lower-speed (red) and higher-speed (purple) velocity when superimposing secondary signals on primary signals with KF on dataset A. Error bars indicate mean ± s.d. across five cross-validation folds. Asterisks denote significance of Wilcoxon rank-sum test with *P < 0.05, **P < 0.01. b, c, Same as a, but for datasets B and C. d, The comparison of relative improvement ratio between lower-speed (red patch) and higher-speed (no patch) velocity when superimposing secondary signals on primary signals with KF on dataset B. The first-row plot shows five example trials’ speed profile of the decoded velocity using primary signals (light blue line) and full signals (dark blue line; superimposing secondary signals on primary signals) and the true velocity (red line). The black horizontal line denotes the speed threshold. The second and third-row plots are the same as the first-row plot, but for X and Y velocity. The fourth-row plot shows the relative improvement ratio for each point in trials.