Occurrence of feedback alignment and an intuitive understanding of its mechanism. (A) Over-trial changes in the angle between the value-weight vector w and the fixed random feedback vector c in the simulations of VRNNrf (7 RNN units). The solid line and the dashed lines indicate the mean and ±SD across 100 simulations, respectively. (B) Relation between the angle between w and c (horizontal axis) and the value of the pre-reward state (vertical axis) at the 1000th trial. The dots indicate the results of individual simulations, and the line indicates the regression line. (C) Angle between the hypothetical change in x(t) = f(Ax(t−1), Bo(t−1)) that would occur if A and B were replaced with their updated versions, multiplied by the sign of the TD-RPE (sign(δ(t))), and the fixed random feedback vector c, across time steps. The thick black line and the gray lines indicate the mean and ±SD across 100 simulations, respectively (the same applies to (D)). (D) Product of TD-RPEs in successive trials at individual states (top: cue; fourth from the top: reward). A positive or negative value indicates that TD-RPEs in successive trials have the same or opposite signs, respectively. (E) Left: RNN trajectories mapped onto the first and second principal components (horizontal and vertical axes, respectively) in three successive trials (red, blue, and green lines, heavily overlapping) at different phases of an example simulation (trials 10–12, 300–302, 600–602, and 900–902, from top to bottom). The crosses and circles indicate the cue and reward states, respectively. Right: State values (black lines) and TD-RPEs (red lines) at the 11th, 301st, 601st, and 901st trials.
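The alignment metric shown in panels (A) and (C) is the standard angle between two vectors. A minimal sketch of how such an angle could be computed (assuming NumPy; `angle_between` is a hypothetical helper name, not from the original simulations):

```python
import numpy as np

def angle_between(w, c):
    """Angle in degrees between vectors w and c (e.g., the value-weight
    vector and the fixed random feedback vector)."""
    cos_theta = np.dot(w, c) / (np.linalg.norm(w) * np.linalg.norm(c))
    # Clip to guard against floating-point values slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical example with 7 dimensions, matching the 7 RNN units in (A);
# under feedback alignment this angle would be expected to shrink over trials.
rng = np.random.default_rng(0)
w = rng.standard_normal(7)
c = rng.standard_normal(7)
theta = angle_between(w, c)
```

Random vectors in moderately high dimension start out close to orthogonal (near 90°), so a systematic decrease of this angle over trials, as in panel (A), indicates that learning is driving w toward c.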