Adjoint propagation of error signal through modular recurrent neural networks for biologically plausible learning
Figures
Schematics of the MR-RNN and the AP framework.
(a) Illustration of cross-region recurrent connections in the brain. Each filled circle represents a local cortical area. (b) Illustration of horizontal and layer-wise recurrent connections in a cortical area. (c) An MR-RNN model with two modular RNNs. In this model, the internal connections of the RNNs are sparse. For visual clarity, each circle represents a group of neurons, and the arrows represent bundles of interconnections. (d) The inference phase (left half) and the learning phase (right half) of AP. ① The input data clamps the first RNN. ② The first RNN in turn clamps the second RNN. ③ The second RNN produces an output at the output layer. ④ The prediction error generated in the output layer trains the output weights. ⑤ The output error nudges the second RNN, producing a local error for learning its weights. ⑥ Subsequently, this local error nudges the first RNN and yields a local error for learning its weights. The shadows on the layers and arrows denote error computation and weight updating, respectively.
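To make the two phases in panel d concrete, here is a minimal numerical sketch of one AP-style inference/learning pass. This is our illustration, not the paper's implementation: the single-step updates, the tanh nonlinearity, the feedback matrices B_out and B_21, and the outer-product weight update are simplifying assumptions standing in for the full nudged RNN dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n1, n2, n_out = 784, 256, 256, 10

# Forward (inference) weights and feedback (error) weights -- all assumed.
W_in  = rng.normal(0, 0.05, (n1, n_in))
W_12  = rng.normal(0, 0.05, (n2, n1))
W_out = rng.normal(0, 0.05, (n_out, n2))
B_out = rng.normal(0, 0.05, (n2, n_out))   # error feedback into RNN 2
B_21  = rng.normal(0, 0.05, (n1, n2))      # error feedback into RNN 1

x = rng.normal(0, 1, n_in)                 # one input sample
t = np.eye(n_out)[3]                       # one-hot target

# Inference phase: the input clamps RNN 1, which clamps RNN 2 (steps 1-3).
h1 = np.tanh(W_in @ x)
h2 = np.tanh(W_12 @ h1)
y  = W_out @ h2                            # output (step 3)

# Learning phase: the output error nudges each RNN in turn (steps 4-6).
e_out = t - y                              # prediction error (step 4)
e2 = np.tanh(B_out @ e_out)                # local error at RNN 2 (step 5)
e1 = np.tanh(B_21 @ e2)                    # local error at RNN 1 (step 6)

# Local, outer-product updates from each layer's input and local error.
lr = 0.01
W_out += lr * np.outer(e_out, h2)
W_12  += lr * np.outer(e2, h1)
W_in  += lr * np.outer(e1, x)
```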
Learning performance of the AP framework.
(a–b) Training and test accuracy on the MNIST and FMNIST datasets. The increasing accuracies confirm the effectiveness of the AP algorithm in training the MR-RNN model. (c) Maximal Lyapunov exponent (MLE), finite-time MLE (FTMLE), and accuracy on FMNIST versus the spectral radius (SR). The accuracy peaks around an SR of 0.1. SR affects the dynamics of the RNN and hence the accuracy on the datasets. MLE and FTMLE almost overlap with each other, and both increase with SR. The shaded areas indicate three regimes: stable, periodic, and chaotic dynamics of the RNN. (d) State evolution at different SR values. The states of the network (top row) and the one-step difference (bottom row) are plotted against the iteration step. The one-step difference is the state at a step minus the state at the previous step, which reveals how much the states change.
- Figure 2—source data 1: The source data for Figure 2a–b.
  https://cdn.elifesciences.org/articles/108237/elife-108237-fig2-data1-v2.xlsx
- Figure 2—source data 2: The source data for Figure 2c.
  https://cdn.elifesciences.org/articles/108237/elife-108237-fig2-data2-v2.xlsx
- Figure 2—source data 3: The source data for Figure 2d.
  https://cdn.elifesciences.org/articles/108237/elife-108237-fig2-data3-v2.xlsx
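Figure 2c–d tie the spectral radius (SR) of the recurrent weight matrix to the network's dynamical regime. The sketch below shows one common way to set the SR and to estimate a maximal Lyapunov exponent from trajectory divergence; the network size, tanh update, and the perturbation-renormalization estimator are our assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps, sr = 256, 200, 0.9               # sr is the target spectral radius

W = rng.normal(0, 1.0 / np.sqrt(n), (n, n))
W *= sr / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to the target SR

def step(h):
    return np.tanh(W @ h)

# One-step difference (Figure 2d, bottom row): state change per iteration.
h = rng.normal(0, 1, n)
one_step_diff = []
for _ in range(steps):
    h_next = step(h)
    one_step_diff.append(np.linalg.norm(h_next - h))
    h = h_next

# Crude maximal-Lyapunov-exponent (MLE) estimate from the growth rate of a
# small perturbation, renormalized every step to stay in the linear regime.
eps = 1e-8
h  = rng.normal(0, 1, n)
hp = h + eps * rng.normal(0, 1, n)
mle = 0.0
for _ in range(steps):
    h, hp = step(h), step(hp)
    d = np.linalg.norm(hp - h)
    mle += np.log(d / eps)
    hp = h + (hp - h) * (eps / d)          # renormalize the perturbation
mle /= steps
print(f"estimated MLE at SR={sr}: {mle:.3f}")  # negative: stable dynamics
```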
Error representations and fast learning.
(a) Schematics of the short-trajectory error (STE). In the inference phase, an RNN evolves towards its end point along the blue trajectory. We can terminate the neural evolution earlier and compute the output error at intermediate steps. Under the influence of its input and error, the RNN evolves from the same starting point and follows a red trajectory to a different end point. The resulting error vectors all approximate the end-point error. (b) Visualization of STE trajectories. The actual STE trajectories of a 1024-node RNN trained on the FMNIST dataset, visualized after PCA, correspond well to the schematics in a. (c) Test accuracy on FMNIST with different STEs. The legend specifies the numbers of iterations in the inference phase and the learning phase. Note that the first step only feeds the inference signal or error into the RNN, so the listed counts include this feed-in step. The test accuracy for different STEs stays nearly the same for a small SR. For SR larger than 1, more iterations perform worse, because the neural state easily saturates and the inference and error signals are lost in the iterations. (d) Schematics of the error-perturbed error (EPE). The blue trajectory of the inference phase is the same as in a. However, if the output error is computed at an intermediate step and the RNN continues updating under this error perturbation, the trajectory is steered towards a new destination. The resulting error vector also approximates the end-point error. (e) Visualization of EPE trajectories. The actual EPE trajectories of a 1024-node RNN trained on the FMNIST dataset, visualized after PCA, correspond well to the schematics in d. (f) Test accuracy on FMNIST with different EPEs.
- Figure 3—source data 1: The source data for Figure 3b, c, e and f.
  https://cdn.elifesciences.org/articles/108237/elife-108237-fig3-data1-v2.xlsx
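The STE and EPE constructions in Figure 3a and d can be written down compactly. The following sketch is our reading of the caption, assuming a simple tanh RNN driven by its input, with placeholder weights, step counts, and a linear readout; the error-feedback matrix B is likewise assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_in, n_out = 128, 784, 10
W  = rng.normal(0, 1.0 / np.sqrt(n), (n, n))
Wx = rng.normal(0, 0.1, (n, n_in))
Wo = rng.normal(0, 0.1, (n_out, n))
B  = rng.normal(0, 0.1, (n, n_out))        # error feedback weights

x = rng.normal(0, 1, n_in)
t = np.eye(n_out)[0]

def run(h, err=None, steps=1):
    """Iterate the RNN; if err is given, it perturbs every update."""
    for _ in range(steps):
        h = np.tanh(W @ h + Wx @ x + (B @ err if err is not None else 0.0))
    return h

h0 = np.zeros(n)

# Short-trajectory error (STE): stop the inference early at step k and
# compute the output error there; each approximates the end-point error.
stes = {k: t - Wo @ run(h0, steps=k) for k in (2, 4, 16)}

# Error-perturbed error (EPE): compute the error at an intermediate step,
# then let the RNN keep updating while that error perturbs its dynamics.
h4 = run(h0, steps=4)
err4 = t - Wo @ h4
h_end = run(h4, err=err4, steps=12)        # trajectory steered to a new end
epe = t - Wo @ h_end                       # also approximates the error
```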
Comparison with EP and scalability.
(a) Test accuracies of EP and AP with two hidden layers (2-HL) on FMNIST versus training epochs. The numbers of iterations in the first/inference and second/learning phases are specified in the legend. EP learns faster in the first few epochs but exhibits signs of instability when the iteration numbers of the two phases are reduced to 32 and 10. In contrast, AP's accuracy is stable. (b) Test accuracies versus training time for 2-hidden-layer (2-HL) networks. AP takes less time than EP to reach the same accuracy. (c) Test accuracies versus training time for 3-hidden-layer (3-HL) networks. The instability of EP becomes more severe when scaled to deeper networks, whereas AP remains as stable as in the 2-hidden-layer case. (d) Test accuracies on FMNIST of AP, FA, and BP with different numbers of layers. '2e-4' denotes the learning rate. (e) Test accuracies on FMNIST with and without the role constraint versus the number of layers. '16–8' denotes the numbers of iterations in the first and second phases, which is sufficient for AP without the role constraint. There are 256 neurons per block by default. RNNs without role constraints keep the size of the shared block, so the number of neurons per layer is reduced by a factor of four. To assess this size effect, we also plot the accuracy of the unconstrained RNN with the default 1024 neurons per layer. (f) Test accuracies on FMNIST with and without the sender/receiver constraint versus the number of neurons in each block. (g–i) Test accuracies on FMNIST versus the numbers of RE/RI/SE/SI neurons. 'X=SI/RI' means that the numbers of SI and RI neurons are both varied.
- Figure 4—source data 1: The source data for Figure 4.
  https://cdn.elifesciences.org/articles/108237/elife-108237-fig4-data1-v2.xlsx
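Figure 4d benchmarks AP against backpropagation (BP) and feedback alignment (FA). As a reminder of what the FA baseline does, the sketch below replaces the transposed forward weights of the backward pass with a fixed random matrix; the single-hidden-layer setup, placeholder data, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid, n_out, lr = 784, 256, 10, 2e-4

W1 = rng.normal(0, 0.05, (n_hid, n_in))
W2 = rng.normal(0, 0.05, (n_out, n_hid))
B2 = rng.normal(0, 0.05, (n_hid, n_out))   # fixed random feedback (FA)

x = rng.normal(0, 1, n_in)                 # placeholder sample
t = np.eye(n_out)[7]                       # placeholder one-hot target

for use_fa in (True, False):
    h = np.tanh(W1 @ x)
    y = W2 @ h
    e = y - t
    back = B2 if use_fa else W2.T          # FA: random matrix; BP: W2.T
    dh = (back @ e) * (1.0 - h**2)         # tanh derivative
    W2 = W2 - lr * np.outer(e, h)
    W1 = W1 - lr * np.outer(dh, x)
```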
The reuse and flexible recruitment of neurons in AP.
(a) Reuse of RNNs in different cognitive tasks. When the orange paths are used for the FMNIST task, the green paths are suppressed (grayed out in the left panel), and vice versa for the MNIST task (right panel). The roles of the neurons differ between tasks; two different groups of RI neurons (RI1 and RI2) perform the two tasks. (b) Flexible recruitment of neurons. As in a, the FMNIST task uses the orange paths and the MNIST task uses the green paths. Here, the FMNIST task additionally shares an error feedback path with the MNIST task (dashed green). (c) Building on b, the FMNIST task can additionally recruit further blocks for inference, namely incorporating a green path without learning it. The MNIST classification task uses only the green paths. (d–f) Accuracies of the tasks in the configurations illustrated in a, b, and c, respectively.
- Figure 5—source data 1: The source data for Figure 5d–f.
  https://cdn.elifesciences.org/articles/108237/elife-108237-fig5-data1-v2.xlsx
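The task-dependent routing of Figure 5a–c amounts to enabling some bundles of inter-block connections per task and suppressing the rest. A minimal sketch of this idea, assuming binary masks over inter-block weight matrices; the block names and path assignments here are placeholders, not the paper's wiring.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64
blocks = ["R1", "R2", "R3"]
W = {(i, j): rng.normal(0, 0.1, (n, n)) for i in blocks for j in blocks}

# Each task enables only its own bundle of paths ("orange" vs. "green"
# in Figure 5); suppressed paths are masked to zero. Recruiting an extra
# path without learning it (panel c) just adds a pair to the task's list.
paths = {
    "FMNIST": [("R1", "R2"), ("R2", "R3")],   # placeholder assignment
    "MNIST":  [("R1", "R3")],
}

def effective_weight(task, src, dst):
    """Return the connection weights as seen by the given task."""
    mask = 1.0 if (src, dst) in paths[task] else 0.0
    return mask * W[(src, dst)]

h = rng.normal(0, 1, n)
h2 = np.tanh(effective_weight("FMNIST", "R1", "R2") @ h)
```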
Test accuracies on FMNIST with and without .
A 2-hidden-layer model with the default settings is used.
- Appendix 1—figure 1—source data 1: The source data for Appendix 1—figure 1.
  https://cdn.elifesciences.org/articles/108237/elife-108237-app1-fig1-data1-v2.xlsx
Two-alternative forced choice (2AFC) task.
(a) Sample training images tilted 6 degrees to the left at different noise levels. (b) Test accuracy versus noise level. (c) Psychometric curve. The noise level is defined as the standard deviation of the additive Gaussian noise.
- Appendix 2—figure 1—source data 1: The source data for Appendix 2—figure 1.
  https://cdn.elifesciences.org/articles/108237/elife-108237-app2-fig1-data1-v2.xlsx
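The 2AFC stimuli in panel a are tilted images corrupted by additive Gaussian noise whose standard deviation is the noise level. A minimal sketch of generating such stimuli; the placeholder image, the scipy-based rotation, and the specific noise levels are our assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(4)

def make_stimulus(image, tilt_deg, noise_level):
    """Tilt an image and add Gaussian noise with the given std."""
    tilted = rotate(image, angle=tilt_deg, reshape=False)
    return tilted + rng.normal(0.0, noise_level, image.shape)

base = rng.uniform(0, 1, (28, 28))         # placeholder 28x28 image
left = make_stimulus(base, +6.0, 0.3)      # tilted 6 degrees, noise std 0.3
harder = make_stimulus(base, +6.0, 1.0)    # same tilt, higher noise level
```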
The test accuracies of 3-hidden-layer (a), 5-hidden-layer (b), and 10-hidden-layer (c) models with the error-scaling hyperparameter and the spectral radius varied.
The models were trained for 100 epochs. Each experiment was repeated twice.
- Appendix 3—figure 1—source data 1: The source data for Appendix 3—figure 1.
  https://cdn.elifesciences.org/articles/108237/elife-108237-app3-fig1-data1-v2.xlsx
Mean absolute error of each SI neuron, and accuracy versus epochs, at different values of the error-scaling hyperparameter.
The error values are collected from the last training batch of a 5-hidden-layer RNN. All SI neurons in the hidden layers and the output layer are indexed from the input to the output layer, i.e., neurons with smaller indices are closer to the input layer. In a, small-indexed neurons exhibit weak error signals, a manifestation of vanishing gradients. As the hyperparameter increases (a, c, e), these error signals become larger, leading to better performance (b, d, f). But an excessively large value causes exploding gradients (e, g, i), ultimately reducing test accuracies (f, h, j). The network was trained for 50 epochs in each case.
- Appendix 3—figure 2—source data 1: The source data for Appendix 3—figure 2.
  https://cdn.elifesciences.org/articles/108237/elife-108237-app3-fig2-data1-v2.xlsx
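The diagnostic underlying Appendix 3—figures 2 and 3 is straightforward to reproduce for any collection of per-neuron error traces: average the absolute error of each SI neuron over the last training batch and inspect it against the neuron's index. A sketch, assuming the errors are stored as a (batch, neuron) array with placeholder values:

```python
import numpy as np

# Placeholder local error signals of the SI neurons for the last training
# batch, indexed from the input side (column 0) to the output side.
batch, n_si = 500, 40
errors = np.random.default_rng(5).normal(0, 1, (batch, n_si))

mean_abs = np.abs(errors).mean(axis=0)     # one value per SI neuron

# Vanishing gradients show up as small values at small (input-side)
# indices; exploding gradients as very large values throughout.
print(mean_abs[:5], mean_abs[-5:])
```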
Mean absolute error of each SI neuron, and accuracies versus epochs, at different SR values.
The error values are collected from the last training batch of a 5-hidden-layer RNN. All SI neurons in the hidden layers and the output layer are indexed from the input to the output layer, i.e., neurons with smaller indices are closer to the input layer. The effect of increasing SR is similar to that of increasing the error-scaling hyperparameter in Appendix 3—figure 2.
- Appendix 3—figure 3—source data 1: The source data for Appendix 3—figure 3.
  https://cdn.elifesciences.org/articles/108237/elife-108237-app3-fig3-data1-v2.xlsx
The batch cosine similarity of the error (a) and of the weight updates (b) of task 1, computed with and without simultaneous input of the two tasks in Figure 5b.
Training consisted of 20 epochs with 120 batches per epoch (500 samples per batch). In b, the similarity is computed over the updates of the task-1 weights.
- Appendix 4—figure 1—source data 1: The source data for Appendix 4—figure 1.
  https://cdn.elifesciences.org/articles/108237/elife-108237-app4-fig1-data1-v2.xlsx
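The batch cosine similarity in panels a–b compares, batch by batch, the error (or weight update) obtained when both tasks are presented simultaneously against the one obtained with task 1 alone. A minimal sketch over flattened tensors; the placeholder updates below stand in for the actual recorded quantities.

```python
import numpy as np

def batch_cosine_similarity(a, b):
    """Cosine similarity between two flattened per-batch tensors."""
    a, b = a.ravel(), b.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(6)
dw_joint = rng.normal(0, 1, (256, 256))    # update with both tasks fed
dw_alone = dw_joint + rng.normal(0, 0.1, (256, 256))  # task 1 alone
print(batch_cosine_similarity(dw_joint, dw_alone))    # close to 1.0
```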
The batch cosine similarity of the error (a) and of the weight updates (b) of task 1, computed with and without simultaneous input of the two tasks in Figure 5c.
Training consisted of 20 epochs with 120 batches per epoch (500 samples per batch).
- Appendix 4—figure 2—source data 1: The source data for Appendix 4—figure 2.
  https://cdn.elifesciences.org/articles/108237/elife-108237-app4-fig2-data1-v2.xlsx
Tables
Comparison of test accuracies among different training methods.
We use backpropagation (BP) and feedback alignment (FA) to train fully connected feedforward neural networks with one or three hidden layers of 256 neurons each, and compare them with AP. Training continues for a maximum of 300 epochs, or terminates early once the training-set accuracy reaches 99.99%.
| Model | Layers | MNIST | FMNIST | CIFAR10 |
|---|---|---|---|---|
| BP | 1 | 98.04% ± 0.07% | 89.64% ± 0.09% | 53.66% ± 0.24% |
| BP | 3 | 98.14% ± 0.06% | 89.95% ± 0.21% | 53.65% ± 0.19% |
| FA | 1 | 97.94% ± 0.08% | 89.53% ± 0.11% | 52.67% ± 0.29% |
| FA | 3 | 97.77% ± 0.09% | 89.36% ± 0.04% | 52.29% ± 0.21% |
| AP | 1 | 97.47% ± 0.08% | 89.12% ± 0.06% | 50.11% ± 0.31% |
| AP | 3 | 97.30% ± 0.10% | 88.82% ± 0.19% | 50.06% ± 0.54% |
The Adam optimizer parameters.
| Parameter name | Default value |
|---|---|
| Learning rate (MNIST, FMNIST) | 0.001 |
| First-order moment estimation decay rate (β₁) | 0.9 |
| Second-order moment estimation decay rate (β₂) | 0.999 |
| Small constant for numerical stability (ε) | 1e-8 |
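For reference, these are the standard Adam defaults; in PyTorch the equivalent construction would look as follows (the linear model is a placeholder):

```python
import torch

model = torch.nn.Linear(784, 10)           # placeholder model
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,                               # learning rate (MNIST, FMNIST)
    betas=(0.9, 0.999),                    # first/second moment decay rates
    eps=1e-8,                              # numerical-stability constant
)
```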
Additional files
- MDAR checklist
  https://cdn.elifesciences.org/articles/108237/elife-108237-mdarchecklist1-v2.docx