Adjoint propagation of error signal through modular recurrent neural networks for biologically plausible learning

  1. Zhuo Liu
  2. Hao Shu
  3. Linmiao Wang
  4. Xu Meng
  5. Yousheng Wang
  6. Xuancheng Li
  7. Wei Wang
  8. Tao Chen (corresponding author)
  1. School of Microelectronics, University of Science and Technology of China, China
  2. Digital Intelligence Centre, Shenzhen Power Supply Bureau, China Southern Power Grid, China
12 figures, 2 tables and 8 additional files

Figures

Schematics of the MR-RNN and the AP framework.

(a) Illustration of cross-region recurrent connections in the brain. Each filled circle represents a local cortical area. (b) Illustration of horizontal and layer-wise recurrent connections in a cortical area. (c) An MR-RNN model with two modular RNNs, L(1) and L(2). In this model, the internal connections of the RNNs are sparse. For visual clarity, each circle represents a group of neurons, and the arrows represent bundles of interconnections. (d) The inference phase (left half) and the learning phase (right half) of AP. ① The input data at L(0) clamps L(1) through W0,1. ② L(1) in turn clamps L(2) through W1,2. ③ L(2) produces an output at L(3) via W2,3. ④ The prediction error generated at the output layer L(3) is used to learn W2,3. ⑤ The error at L(3) nudges L(2), producing a local error at L(2) for learning W1,2. ⑥ Subsequently, L(2)’s error nudges L(1) and yields a local error for learning W0,1. The shadows on the layers and arrows denote error computation and weight updating, respectively.

Learning performance of the AP framework.

(a–b) Training and test accuracy on the MNIST and FMNIST datasets. The increasing accuracies confirm the effectiveness of the AP algorithm in training the MR-RNN model. (c) MLE/FTMLE and accuracy on FMNIST versus SR. The accuracy peaks around an SR of 0.1. SR affects the dynamics of the RNN and hence the accuracy on the datasets. The MLE λmax and the FTMLE λmax8 almost overlap with each other; both increase as SR increases. The shaded areas indicate three regimes of RNN dynamics: stable, periodic, and chaotic. (d) State evolution at different SR. The states of the network u (top row) and the one-step difference du (bottom row) are plotted against iteration step t. The one-step difference, the state at a step minus the state at the previous step, reveals how much the states change.
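The spectral-radius sweep in panels (c–d) can be illustrated with a minimal generic tanh-RNN sketch (not the paper's MR-RNN; the matrix scaling, sizes, and input drive below are assumptions for illustration): a random recurrent matrix is rescaled to a target spectral radius, and the one-step difference of the states is tracked over iterations.

```python
import numpy as np

def run_rnn(sr, n=100, steps=200, seed=0):
    """Iterate u_{t+1} = tanh(W u_t + x) for a random W rescaled to
    spectral radius `sr`; return the one-step differences ||u_t - u_{t-1}||.
    A generic tanh RNN sketch, not the paper's MR-RNN."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n)) / np.sqrt(n)
    W *= sr / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to target SR
    x = rng.standard_normal(n) * 0.1                # fixed input drive
    u = rng.standard_normal(n)
    du = []
    for _ in range(steps):
        u_new = np.tanh(W @ u + x)
        du.append(np.linalg.norm(u_new - u))
        u = u_new
    return np.array(du)

du_stable = run_rnn(sr=0.3)   # contractive regime: differences vanish
du_chaotic = run_rnn(sr=2.0)  # chaotic regime: differences stay finite
```

With a small SR the map is contractive and du decays geometrically to zero, while a large SR keeps du at a finite level, mirroring the stable and chaotic regions marked in panel (c).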

Error representations and fast learning.

(a) Schematics of the short-trajectory error (STE). In the inference phase, an RNN evolves towards the end point D along the blue trajectory. We can terminate the neural evolution earlier and compute the output error at intermediate steps (A, B, and C). Under the influence of its input and error, the RNN evolves from the same starting point O and follows the red trajectory to A, B, C, or D. The vectors AA, BB, and CC all approximate the end-point error DD. (b) Visualization of STE trajectories. The STE trajectories of a 1024-node RNN trained on the FMNIST dataset, visualized after PCA, correspond well to the schematics in (a). (c) Test accuracy on FMNIST with different STEs. The legend te1-te2 specifies the numbers of iterations in the inference phase and the learning phase. Note that the first step is to feed the inference signal or error to an RNN, so t=2 means the whole RNN state updates once. The test accuracy for different STEs stays nearly the same for small SR. For SR larger than 1, more iterations perform worse, because the neural state saturates easily and the inference and error signals are lost over the iterations. (d) Schematics of the error-perturbed error (EPE). The blue trajectory of the inference phase is the same as in (a). However, if the output error is computed at an intermediate step (e.g. B) and the RNN continues updating under this error perturbation, the trajectory is steered towards destination B. The vector BB also approximates the end-point error. (e) Visualization of EPE trajectories. The actual EPE trajectories of a 1024-node RNN trained on the FMNIST dataset, visualized after PCA, correspond well to the schematics in (d). (f) Test accuracy on FMNIST with different EPEs.
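The STE intuition in panel (a) can be sketched numerically: for a convergent RNN, the output error computed at an intermediate step already points in nearly the same direction as the end-point error. The linear readout R, the target, and the network sizes below are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_out = 100, 10
W = rng.standard_normal((n, n)) / np.sqrt(n)
W *= 0.3 / np.max(np.abs(np.linalg.eigvals(W)))   # contractive dynamics
R = rng.standard_normal((n_out, n)) / np.sqrt(n)  # assumed linear readout
x = rng.standard_normal(n) * 0.1
target = rng.standard_normal(n_out)

u = np.zeros(n)
errors = {}
for t in range(1, 51):
    u = np.tanh(W @ u + x)
    if t in (5, 10, 50):          # intermediate steps and the end point
        errors[t] = target - R @ u

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# for contractive dynamics, both similarities should be close to 1
sim_A = cos(errors[5], errors[50])
sim_B = cos(errors[10], errors[50])
```

Because the trajectory converges geometrically, the error at step 5 or 10 differs from the end-point error only by a small, shrinking term, which is why terminating early barely changes the learning signal.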

Comparison with EP and scalability.

(a) Test accuracies of EP and AP with 2 hidden layers (2-HL) on FMNIST versus training epochs. The numbers of iterations in the first/inference and second/learning phases are specified in the legend. EP learns faster for the first few epochs but exhibits signs of instability when the iteration numbers are reduced to 32 and 10 for the two phases. In contrast, AP’s accuracy is stable. (b) Test accuracies versus time for 2-hidden-layer (2-HL) networks. AP takes less time than EP to achieve the same accuracy. (c) Test accuracies versus time for 3-hidden-layer (3-HL) networks. The instability of EP becomes more severe when scaled to deeper networks, whereas AP’s performance remains as stable as with 2-hidden-layer networks. (d) Test accuracies on FMNIST of AP, FA, and BP with different numbers of layers. ‘2e-4’ is the learning rate. (e) Test accuracies on FMNIST with or without the role constraint versus the number of layers. ‘16–8’ is the number of iterations in the first and second phases, which is sufficient for AP without the role constraint. There are 256 neurons per block by default. RNNs without role constraints keep the size of the shared block, so the number of neurons per layer is reduced by a factor of four. To compare the size effect, we also plot the accuracy of the unconstrained RNN with the default of 1024 neurons per layer. (f) Test accuracies on FMNIST with or without the sender/receiver constraint versus the number of neurons in each block. (g–i) Test accuracies on FMNIST versus the numbers of RE/RI/SE/SI neurons. ‘X=SI/RI’ means that the numbers of SI and RI neurons are both changed.

The reuse and flexible recruitment of neurons in AP.

(a) Reuse of RNNs in different cognitive tasks. When the orange paths are used for the FMNIST task (x1 for input and y1 for output), the green paths are suppressed (grayed out in the left panel), and vice versa for the MNIST task (right panel, x2 for input and y2 for output). The roles of neurons differ between tasks: two different groups of RI neurons (RI1 and RI2) perform the two tasks. (b) Flexible recruitment of neurons. As in (a), the FMNIST task uses the orange paths and the MNIST task the green paths. Here, the FMNIST task also shares the error feedback path between L(3) and L(4) (dashed green). (c) On the basis of (b), the FMNIST task can additionally recruit L(3) and L(4) for inference, namely incorporating the green path between L(3) and L(4) without learning it. The MNIST classification task uses only the green paths. (d–f) Accuracies of the different tasks in the configurations illustrated in (a), (b), and (c).

Appendix 1—figure 1
Test accuracies on FMNIST with and without f′.

A 2-hidden-layer model with the default settings is used.

Appendix 2—figure 1
2AFC task.

(a) Sample training images tilted 6 degrees to the left at different noise levels. (b) Test accuracy versus noise level. (c) Psychometric curve. The noise level is defined as the standard deviation of the additive Gaussian noise.
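Since the noise level is defined as the standard deviation of additive Gaussian noise, generating the noisy stimuli can be sketched as follows (a minimal illustration; the image size and clipping range are assumptions, and `add_noise` is a hypothetical helper, not the paper's code):

```python
import numpy as np

def add_noise(img, noise_level, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `noise_level`
    (the 'noise level' in the caption) and clip back to the [0, 1] range."""
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(0.0, noise_level, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)

img = np.full((28, 28), 0.5)   # stand-in for a tilted sample image
noisy = add_noise(img, noise_level=0.2, seed=0)
```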

Appendix 3—figure 1
Test accuracies of 3-hidden-layer (a), 5-hidden-layer (b), and 10-hidden-layer (c) networks with varying β and spectral radius.

The models were trained for 100 epochs. Each experiment is repeated twice.

Appendix 3—figure 2
Mean absolute values of error of each SI neuron and accuracy versus epochs at different β.

The error values are collected from the last batch of training of a five-hidden-layer RNN. All SI neurons in the hidden layers and the output layer are indexed from the input to the output layer, i.e., neurons with smaller indices are closer to the input layer. In (a), small-indexed neurons exhibit weak error signals, which manifests as gradient vanishing. With increasing β (a, c, e), these error signals become larger, leading to better performance (see b, d, f). But excessively large β causes gradient explosion (see e, g, i), ultimately reducing test accuracies (f, h, j). The network was trained for 50 epochs in each case.

Appendix 3—figure 3
Mean absolute values of error of each SI neuron and accuracies versus epochs at different SR.

The error values are collected from the last batch of training of a five-hidden-layer RNN. All SI neurons in the hidden layers and the output layer are indexed from the input to the output layer, i.e., neurons with smaller indices are closer to the input layer. The effect of increasing SR is similar to that of increasing β observed in Appendix 3—figure 2.

Appendix 4—figure 1
The batch cosine similarity of error (a) and weight updates (b) of task 1, computed with and without simultaneous input of two tasks in Figure 5b.

Training consisted of 20 epochs with 120 batches per epoch (500 samples per batch). ΔW1–ΔW3 correspond to the updates of the weights x1→L(1), L(1)→L(2), and L(2)→y1.
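The batch cosine similarity reported here is the standard row-wise cosine similarity between corresponding vectors (e.g. flattened weight updates computed with and without the second task's input). A minimal sketch of the metric (the function name is an assumption for illustration):

```python
import numpy as np

def batch_cosine_similarity(a, b):
    """Cosine similarity between corresponding rows of two (batch, dim)
    arrays, e.g. weight updates with vs. without the second task's input."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den

sims = batch_cosine_similarity([[1.0, 0.0], [1.0, 1.0]],
                               [[0.0, 1.0], [2.0, 2.0]])
# orthogonal directions give 0; identical directions give 1
```

A similarity near 1 indicates that the error and weight updates of task 1 are essentially unaffected by the simultaneous presentation of task 2.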

Appendix 4—figure 2
The batch cosine similarity of error (a) and weight updates (b) of task 1, computed with and without simultaneous input of two tasks in Figure 5c.

Training consisted of 20 epochs with 120 batches per epoch (500 samples per batch).

Tables

Table 1
Comparison of test accuracies among different training methods.

We use backpropagation and feedback alignment to train fully connected feedforward neural networks with one or three hidden layers, each with 256 neurons. Training continues for a maximum of 300 epochs, or terminates early if the training-set accuracy reaches 99.99%.

Model | Hidden layers | MNIST          | FMNIST         | CIFAR10
BP    | 1             | 98.04% ± 0.07% | 89.64% ± 0.09% | 53.66% ± 0.24%
BP    | 3             | 98.14% ± 0.06% | 89.95% ± 0.21% | 53.65% ± 0.19%
FA    | 1             | 97.94% ± 0.08% | 89.53% ± 0.11% | 52.67% ± 0.29%
FA    | 3             | 97.77% ± 0.09% | 89.36% ± 0.04% | 52.29% ± 0.21%
AP    | 1             | 97.47% ± 0.08% | 89.12% ± 0.06% | 50.11% ± 0.31%
AP    | 3             | 97.30% ± 0.10% | 88.82% ± 0.19% | 50.06% ± 0.54%
Table 2
The Adam optimizer parameters.
Parameter name                                 | Default value
Learning rate (MNIST, FMNIST)                  | 0.001
First-order moment estimation decay rate (β1)  | 0.9
Second-order moment estimation decay rate (β2) | 0.999
Small constant for numerical stability (ϵ)     | 1e-8
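The values in Table 2 plug directly into the standard Adam update rule. A minimal single-parameter sketch of one step (an illustration of the optimizer, not the paper's implementation):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using the default values from Table 2."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, g=1.0, m=m, v=v, t=1)
# after bias correction, the first step moves w by almost exactly lr
```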

Additional files

MDAR checklist
https://cdn.elifesciences.org/articles/108237/elife-108237-mdarchecklist1-v2.docx
Appendix 1—figure 1—source data 1

The source data for Appendix 1—figure 1.

https://cdn.elifesciences.org/articles/108237/elife-108237-app1-fig1-data1-v2.xlsx
Appendix 2—figure 1—source data 1

The source data for Appendix 2—figure 1.

https://cdn.elifesciences.org/articles/108237/elife-108237-app2-fig1-data1-v2.xlsx
Appendix 3—figure 1—source data 1

The source data for Appendix 3—figure 1.

https://cdn.elifesciences.org/articles/108237/elife-108237-app3-fig1-data1-v2.xlsx
Appendix 3—figure 2—source data 1

The source data for Appendix 3—figure 2.

https://cdn.elifesciences.org/articles/108237/elife-108237-app3-fig2-data1-v2.xlsx
Appendix 3—figure 3—source data 1

The source data for Appendix 3—figure 3.

https://cdn.elifesciences.org/articles/108237/elife-108237-app3-fig3-data1-v2.xlsx
Appendix 4—figure 1—source data 1

The source data for Appendix 4—figure 1.

https://cdn.elifesciences.org/articles/108237/elife-108237-app4-fig1-data1-v2.xlsx
Appendix 4—figure 2—source data 1

The source data for Appendix 4—figure 2.

https://cdn.elifesciences.org/articles/108237/elife-108237-app4-fig2-data1-v2.xlsx

Cite this article

Zhuo Liu, Hao Shu, Linmiao Wang, Xu Meng, Yousheng Wang, Xuancheng Li, Wei Wang, Tao Chen (2026) Adjoint propagation of error signal through modular recurrent neural networks for biologically plausible learning. eLife 15:e108237. https://doi.org/10.7554/eLife.108237