A. Description of learning rules corresponding to different types of learning problems, and the corresponding expressions for the recall factor used in the recall-gated consolidation model. B. Schematic indicating a possible implementation of the model in a supervised learning problem, where LTM plasticity is modulated by the consistency between STM predictions and ground-truth labels. C. Like B, but for a reinforcement learning problem. LTM plasticity is gated by both STM action confidence and the presence of reward. D. Like B and C, but for an autoassociative unsupervised learning problem. As above, x corresponds to neural activity and W to the network weights, which here are recurrent. LTM plasticity is gated by familiarity detection in the STM module. E. Simulation of a binary classification problem, N = 2000, θ = 0.125, p = 0.1. There are twenty stimuli in total, each associated with a random binary (±1) label and each appearing with probability λ = 0.01 at each timestep (otherwise a random stimulus is presented, with a random binary label). The plot shows classification accuracy over time, given by the outputs of the STM and LTM of the consolidation model. The shaded region indicates the standard deviation over 50 simulations. F. Simulation of a reinforcement learning problem, N = 2000, θ = 0.125, p = 1.0. There are five stimuli in total, each appearing with probability λ = 0.01 at each timestep (otherwise a random stimulus is presented), and three possible actions. Each stimulus has a corresponding action that yields reward (the reward is randomly sampled for the random stimuli). The plot shows the average reward per step over time, evaluated using the actions given by the STM or LTM (during learning, the STM action was always used). G. Simulation of an autoassociative learning problem, N = 4000, p = 1.0. A single stimulus appears with probability λ = 0.25 at each timestep, and otherwise a random stimulus appears.
Recall performance is evaluated by exposing the system to a noisy version of the reliable stimulus seen during training, running the network's recurrent dynamics for 5 timesteps, and measuring the correlation of the final network state with the ground-truth pattern.
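The gated-consolidation setup of panel E can be sketched in a few lines of code. The sketch below is a minimal illustration, not the paper's implementation: it uses a smaller network, a Hebbian readout with exponential forgetting as the STM, and agreement between the STM prediction and the current label as the recall factor gating LTM plasticity. The sizes, learning rate, and decay constant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: smaller and faster than the simulation in panel E (N = 2000).
N = 1000          # input dimension (assumption)
S = 20            # number of reliable stimulus-label pairs
lam = 0.02        # per-stimulus appearance probability per timestep (assumption)
theta = 0.125     # recall threshold gating LTM plasticity
decay = 0.9       # STM forgetting factor (assumption; the model's STM may differ)
lr = 1.0 / N      # Hebbian learning rate (assumption)
steps = 10_000

stimuli = rng.choice([-1.0, 1.0], size=(S, N))
labels = rng.choice([-1.0, 1.0], size=S)

w_stm = np.zeros(N)
w_ltm = np.zeros(N)
n_consolidations = 0

for _ in range(steps):
    # Present a reliable (stimulus, label) pair if one is drawn,
    # otherwise a random stimulus with a random label.
    hit = np.flatnonzero(rng.random(S) < lam)
    if hit.size > 0:
        x, y = stimuli[hit[0]], labels[hit[0]]
    else:
        x = rng.choice([-1.0, 1.0], size=N)
        y = rng.choice([-1.0, 1.0])

    # Recall factor: agreement between the STM prediction and the current label.
    recall = y * (w_stm @ x)

    # LTM plasticity is gated on recall exceeding the threshold.
    if recall > theta:
        w_ltm += lr * y * x
        n_consolidations += 1

    # The STM always learns, but forgets quickly.
    w_stm = decay * w_stm + lr * y * x

# LTM classification accuracy on the reliable stimuli.
acc_ltm = float(np.mean(np.sign(stimuli @ w_ltm) == labels))
```

Because random stimulus-label pairs almost never recur within the STM's forgetting horizon, they rarely trigger the gate, so the LTM ends up storing mainly the reliable associations, mirroring the filtering role of recall-gated consolidation described above.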
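The recall test described for panel G can be illustrated with a Hopfield-style autoassociative network. The network size, number of presentations, and cue noise level below are illustrative assumptions, and simple Hebbian outer-product storage stands in for the model's learned recurrent weights; only the evaluation procedure (noisy cue, 5 steps of recurrent dynamics, correlation with the ground-truth pattern) follows the caption.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 500            # smaller than the simulation in panel G (N = 4000), for speed
T = 100            # number of training presentations (assumption)
lam = 0.25         # probability of the reliable stimulus at each step
noise_level = 0.2  # fraction of cue bits flipped at test time (assumption)

target = rng.choice([-1.0, 1.0], size=N)

# Hebbian storage of everything presented (no consolidation gating here;
# this sketch only illustrates the recall test itself).
W = np.zeros((N, N))
for _ in range(T):
    x = target if rng.random() < lam else rng.choice([-1.0, 1.0], size=N)
    W += np.outer(x, x) / N
np.fill_diagonal(W, 0.0)

# Noisy cue: flip a fraction of the bits of the reliable pattern.
cue = target.copy()
cue[rng.random(N) < noise_level] *= -1.0

# Run the recurrent dynamics for 5 timesteps, as in the evaluation above.
state = cue
for _ in range(5):
    state = np.sign(W @ state)

# Correlation (overlap) of the final state with the ground-truth pattern.
overlap = float(state @ target) / N
```

Because the reliable pattern is stored many more times than any individual random pattern, the dynamics pull the noisy cue toward it, and the overlap approaches 1 when recall succeeds.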