Twice as nice: Boosts in adolescent reinforcement learning from Pavlovian bias and age-related prioritization of reward-motivated incidental memory

Haley Hegefeld
Juliet Y Davidow

Psychology Department, Northeastern University, Boston, United States
Institute for Cognitive and Brain Health, Northeastern University, Boston, United States

https://doi.org/10.7554/eLife.108428.1

Open access

Figures and data

(A) In the Affective Go/No-Go Task, participants first see one of four doors (the cue), after which they choose whether to ring the doorbell (the target), and then receive feedback according to their choice (the outcome). (B) For doors signaling the potential to win money, 80% of the time participants will win $0.25 for making the correct choice (depicted as a green frame around a trial-unique image of an everyday object) and win $0 for making the incorrect choice (yellow frame). For doors signaling the potential to lose money, 80% of the time participants will lose $0 for making the correct choice (the yellow frame) and will lose $0.25 for making the incorrect choice (red frame). The feedback is probabilistic, such that the outcome is reversed on 20% of trials. (C) Action and valence are orthogonalized across the four doors. Thus, each door signals the correct action, in addition to the potential to win/lose money. Pavlovian-congruent conditions are outlined by a navy box, while Pavlovian-incongruent conditions are not. (D) Participants complete a surprise memory task of the trial-unique images of objects presented during the outcome phase of the learning task interleaved with novel object images. Participants first identify whether an image is “Old” or “New”, then they rate their confidence and report their source memory for the associated outcome.
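The feedback rule in panel (B) can be sketched in a few lines of Python. This is only an illustration of the contingencies stated in the caption (80% valid feedback, $0.25 at stake); the function name `outcome`, its arguments, and the use of a seeded random generator are assumptions made here, not the task's actual code.

```python
import random

REWARD = 0.25  # dollars at stake per trial, as stated in the caption


def outcome(valence, correct, rng, p_valid=0.8):
    """Return the monetary outcome of one trial (hypothetical helper).

    valence: "win" or "loss" door.
    correct: True if the choice matched the door's correct action.
    With probability p_valid the feedback is valid; otherwise it is reversed.
    """
    valid = rng.random() < p_valid
    good = correct if valid else not correct
    if valence == "win":
        return REWARD if good else 0.0
    return 0.0 if good else -REWARD


# Share of rewarded trials when always choosing correctly on a "win" door.
rng = random.Random(0)
wins = [outcome("win", True, rng) for _ in range(10_000)]
share_rewarded = sum(w > 0 for w in wins) / len(wins)
```

Under these assumptions, a participant who always responds correctly on a win door is rewarded on roughly 80% of trials, which is the probability-matching reference level used in later figures.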

Results from the Best-Fitting Generalized Mixed-Effects Model of Learning Accuracy

(A) Pavlovian influence on instrumental learning is evident at the group level. Participants perform better in the Pavlovian-congruent than Pavlovian-incongruent condition within each action (i.e., Go to Win > Go to Avoid Loss; No-Go to Avoid Loss > No-Go to Win, see Fig. S3 for paired t-tests). Bars display model predictions for the last trial of the task at the mean age of participants (Fig. S5 shows plots at different trials and ages). Error bars show 95% confidence intervals, and points show each participant’s accuracy within the trial types averaged across the entire task. Dashed lines at 50% and 80% represent chance and probability-matching performance, respectively. (B) There is a peak in performance for adolescents in the Pavlovian-congruent conditions (“Con.”), whereas participants of all ages perform similarly in the Pavlovian-incongruent conditions (“Inc.”). (C) Adolescents demonstrate the greatest relative speeding to the Pavlovian-congruent Go to Win vs. Pavlovian-incongruent Go to Avoid Loss conditions, shown in the plot by the difference between the two fit lines. In (B) and (C) the lines show model predictions across age, shaded regions show 95% confidence intervals, and points show the participants’ raw data averaged across the task within relevant conditions. Fit lines are shown for statistically significant effects.

(A) Memory accuracy (d’) negatively correlates with age. Shaded regions show 95% confidence intervals, and points show the raw data. (B) We visualized the Reward-by-Age interaction on d’ by plotting a difference score, showing that reward memory bias is higher in younger participants than in older participants. See Fig. S11 for the effects plot of the interaction. R = Reward, NR = No Reward. (C) The Prediction Error-by-Age interaction shows that children have positive prediction error-related memory enhancements relative to adolescents and adults. The lines represent model predictions at each age (10, 15, 20, and 25 years old), and shaded regions show 95% confidence intervals.
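For readers unfamiliar with d’, the sensitivity index is conventionally the difference between the z-transformed hit rate and false-alarm rate. The sketch below shows that standard calculation; the log-linear correction for extreme rates and the function name `d_prime` are assumptions made for illustration, since the caption does not say how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count) keeps
    the rates away from 0 and 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 80 of 100 old items called "Old",
# 20 of 100 new items called "Old".
example = d_prime(80, 20, 20, 80)
```

A reward memory bias score like the one plotted in panel (B) would then be the difference between d’ computed separately for rewarded and unrewarded images.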

Results from the Best-Fitting Generalized Mixed-Effects Model of Memory Accuracy

Sample Demographics.
The left plot shows the sample demographics for people who completed the learning task (N = 174). A subset of participants also completed the memory task with adequate performance, shown on the right (N = 162). Both samples have good representation across the age range.

Model Comparison for Nested Learning Models.
Models increase in complexity starting with a null model. Predictors are added in order of largest expected effect. Likelihood ratio tests indicate that the best-fitting model includes all predictors of interest, the FE Quadratic Age model. RE = random effect. FE = fixed effect.

Model Comparison for Reduced Learning Models.
Highest-order non-significant interactions are removed from the model before testing for degraded model fit. Likelihood ratio tests indicate that including the non-significant, higher-order interactions does not result in improved model fit, so we report results from the FE Quadratic Age – Parsimonious Model.

Best-Fitting Model Standardized Odds Ratios.
Odds ratios less than one correspond to “negative” coefficients (i.e., the odds of a correct response decrease with an additional unit of the predictor), and ratios greater than one correspond to “positive” coefficients (i.e., the odds of a correct response increase with an additional unit of the predictor).
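The relationship between a logistic regression coefficient, its odds ratio, and the resulting change in the probability of a correct response can be illustrated as follows. The coefficient and baseline values are made up for the example, and the helper names are hypothetical.

```python
import math


def odds_ratio(beta):
    """Convert a logit-scale coefficient to an odds ratio."""
    return math.exp(beta)


def shifted_probability(p_baseline, odds_ratio_value):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1 - p_baseline) * odds_ratio_value
    return odds / (1 + odds)


# A negative coefficient gives an odds ratio below one,
# a positive coefficient gives an odds ratio above one.
or_negative = odds_ratio(-0.5)
or_positive = odds_ratio(0.5)

# Doubling the odds from a 50% baseline moves accuracy to about 67%.
p_after = shifted_probability(0.5, 2.0)
```

Because the predictors are standardized, "an additional unit" here means one standard deviation of the predictor.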

Results from the More Complex Model.
The results of the more complex model (FE Quadratic Age) include the non-significant, higher-order interactions that are removed from the Parsimonious model. The odds ratios are similar to those presented in the main text (Table 1) estimated by the parsimonious model.

Learning within each Condition.
The influence of Pavlovian bias on instrumental learning was evident at the group level, whereby participants had higher accuracy in the Pavlovian-congruent than incongruent conditions. This is generally consistent with prior literature employing this task, whether in adults only, across development, or throughout the lifespan. Participants performed better in the Go to Win than the Go to Avoid Loss condition (M_Difference = .15, 95% CI = [.12, .18], t(173) = 10.40, p < .001) and in the No-Go to Avoid Loss than the No-Go to Win condition (M_Difference = .13, 95% CI = [.09, .17], t(173) = 6.47, p < .001). Further, paired t-tests of accuracy in the first and last task blocks indicated that accuracy increased across the task in the Go to Win (M_Difference = .08, 95% CI = [.04, .12], t(173) = 3.68, p < .001), No-Go to Win (M_Difference = .11, 95% CI = [.06, .16], t(173) = 4.13, p < .001), and No-Go to Avoid Loss (M_Difference = .16, 95% CI = [.12, .20], t(173) = 7.39, p < .001) conditions, but not in the Go to Avoid Loss (M_Difference = .01, 95% CI = [−.03, .04], t(173) = 0.29, p = .77) condition. Light and dark bars represent accuracy in the first and last blocks of the task (averaged over 15 trials each), respectively, reflecting learning over time. The probabilistic frequency of the optimal outcome was set to 80%.

Effects Plots of Key Interactions.
The first plot shows the model predictions for the Action*Valence*Trial Number interaction (at the sample mean age of 17.09 years). The slopes of the lines indicate the rate of learning across the task for each trial type. The second plot shows the model predictions for the Action*Valence*Age interaction (at the mean trial). The adolescent peak in performance is more prominent in the Pavlovian-congruent conditions (Go to Win and No-Go to Avoid Loss) than in the Pavlovian-incongruent conditions. Shaded regions show the 95% confidence intervals.

Predicted Learning Accuracy at Different Ages and Trials.
The bars show the predicted probability of a correct response at the first, median, and last trials for participants aged 8.15 (sample minimum), 17.09 (sample mean), and 25.86 (sample maximum) years old. The “17yo / Last Trial” plot (middle row, right column) is the same as Figure 2a in the main text. The error bars represent 95% confidence intervals. Pavlovian-congruent conditions are outer bars (GW & NGAL) and Pavlovian-incongruent are inner bars (GAL & NGW). Dashed lines are at 50% (chance responding to either go or no-go) and 80% (reflecting the probabilistic frequency of best outcome per condition in this task).

Comparing Linear, Binomial, and Beta-binomial Regression Models for “Go to Win.”
Simulated data from the beta-binomial model (top) were more similar to the observed data (i.e., the blue lines overlap the green more closely) than simulated data from linear (middle) and binomial (bottom) models. These posterior predictive checks were conducted using the performance package (Lüdecke et al., 2021).

Age-related Differences in Behavioral Pavlovian Bias Scores.
The behavioral metric of Pavlovian bias had a shallow quadratic association with age, with an adolescent boost in the score (Quadratic vs. Linear Model: F(1,171) = 4.90, p < .05). The line represents model predictions across age, shaded regions show 95% CIs, and points show raw data.

Model List.
We tested a set of nine RL models, which are described in this table. The winning model (fifth row, bolded) included learning rate, lapse rate, gain and loss sensitivity, go bias, and Pavlovian bias parameters. To evaluate which of the alternative reinforcement learning models had the most explanatory power, we computed an integrated Bayesian Information Criterion (iBIC) from the log likelihood of the best-fit iteration for each model. The Bayesian Information Criterion (BIC; Schwarz, 1978) for each subject was summed for each model separately, with the lowest value indicating the best model. Following Cavanagh et al. (2013), differences in iBIC greater than 20 between alternative models are highly suggestive of a meaningful difference (Kass & Raftery, 1995). Here, the difference between the best-fit model and the model with the next-lowest iBIC was 42.8; that runner-up model had the same number of fit parameters (6) as the winning model.
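The summing-and-comparing step described in this caption can be sketched as below, using the textbook BIC formula. The per-subject negative log likelihoods, the trial count, and the function names are all hypothetical, and the sketch does not reproduce whatever integration over group-level parameters the "integrated" criterion involves.

```python
import math


def bic(neg_log_lik, n_params, n_trials):
    """Textbook BIC: k * ln(n) + 2 * (negative log likelihood)."""
    return n_params * math.log(n_trials) + 2 * neg_log_lik


def summed_bic(neg_log_liks, n_params, n_trials):
    """Sum per-subject BIC values for one model (lower is better)."""
    return sum(bic(nll, n_params, n_trials) for nll in neg_log_liks)


# Hypothetical negative log likelihoods for two subjects under two
# six-parameter models; these are not values from the study.
model_a = summed_bic([100.0, 110.0], n_params=6, n_trials=240)
model_b = summed_bic([105.0, 120.0], n_params=6, n_trials=240)
difference = model_b - model_a
```

When two models have the same number of parameters, the complexity penalty cancels, so the difference reflects fit alone: here it is twice the gap in summed negative log likelihood.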

Model Validation.
The bars represent mean accuracy within each age group (Children: 8 – 12.99; Adolescents: 13 – 17.99; Young Adults: 18 – 21.99; Adults: 22 – 25.99) for each trial type from the participants’ choice data (dark: Pavlovian-congruent, light: Pavlovian-incongruent). Points show each participant’s accuracy. The large orange points are the mean accuracy for each age group generated from the winning RL model, computed as the average from 100 simulations of the task using each set of individual-level parameter estimates (N = 174). The model recapitulates participants’ choice behavior well, although there is some deviation for the adults in the Win conditions.
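The captions name the winning model's parameters but do not give its equations. As a rough sketch only, the code below assumes a parameterization that is common in the Pavlovian go/no-go modeling literature: the go action weight is the learned action value plus a go bias plus a Pavlovian bias scaled by the cue's state value, and a lapse rate mixes the softmax choice with random responding. The functional form and the names `p_go` and `update` are assumptions and may differ from the authors' implementation.

```python
import math


def p_go(q_go, q_nogo, state_value, go_bias, pav_bias, lapse):
    """Probability of a Go response under the assumed model form."""
    w_go = q_go + go_bias + pav_bias * state_value
    w_nogo = q_nogo
    softmax_go = 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))
    # The lapse rate mixes in random (50/50) responding.
    return (1.0 - lapse) * softmax_go + lapse / 2.0


def update(value, outcome, learning_rate, sensitivity):
    """Delta-rule update toward the sensitivity-scaled outcome."""
    return value + learning_rate * (sensitivity * outcome - value)


# With a positive Pavlovian bias, a cue with positive state value pushes
# the agent toward Go and a cue with negative state value toward No-Go.
toward_go = p_go(0.0, 0.0, 1.0, go_bias=0.0, pav_bias=0.5, lapse=0.0)
toward_nogo = p_go(0.0, 0.0, -1.0, go_bias=0.0, pav_bias=0.5, lapse=0.0)
```

In this assumed form, separate gain and loss sensitivities would be passed as `sensitivity` depending on the sign of the outcome, and simulated accuracy would come from repeatedly sampling choices from `p_go` while updating the values trial by trial.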

Parameter Recovery.
Parameter recovery was assessed by generating data for 1000 subjects, using bootstrapped samples of the individual parameter solutions. The best-fitting model was fit to these data, and the estimates of the recovered parameters were correlated with the generative parameters. Parameter recovery was adequate (Spearman’s rhos = .42 - .87, with lapse rate and go bias as the best and worst recovered, respectively), but as the expectation maximization approach tends to constrain estimates towards the group mean, some of the more extreme values were poorly recovered and should be interpreted cautiously. Each point represents one of the 1000 estimates. The black line represents the “ideal” linear association between the estimates, and the blue line represents the estimated linear association.
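The correlation step of a parameter recovery check can be illustrated with a small, self-contained Spearman implementation. The "recovered" values below are fabricated for demonstration by shrinking hypothetical generative values toward their mean and adding noise, which loosely mimics the shrinkage the caption describes; no model is actually fit here, and all names are hypothetical.

```python
import random
from statistics import mean


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(1)
generative = [rng.uniform(0, 1) for _ in range(1000)]
group_mean = mean(generative)
# Fabricated stand-in for refit estimates: shrunk toward the mean, plus noise.
recovered = [group_mean + 0.6 * (g - group_mean) + rng.gauss(0, 0.1)
             for g in generative]
rho = spearman(generative, recovered)
```

Because Spearman's rho depends only on rank order, uniform shrinkage toward the group mean does not lower it by itself; in this toy example the added noise is what reduces the correlation.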

Parameter Associations with Age.
We tested linear and quadratic associations with age for each parameter. The only parameter that significantly related to age is loss sensitivity, shown in the top right plot (Linear Age vs. Null Model: F(1,172) = 7.04, p < .01; r = .20, 95% CI: [.05, .34]), while the null model fit all other parameters best (Linear Age vs. Null Model ps > .05). Fit lines are shown for statistically significant effects. Points show individual-level parameter estimates, and shaded regions show 95% CIs.

Effects Plot of Reward*Age Interaction Predicting d’.
The lines represent the predicted values of the interaction of Reward*Age on d’. Memory accuracy of both types is worse in the older participants than in the younger. Younger participants remembered images associated with gaining rewards (red line) better than images associated with not gaining rewards (blue line). Older participants showed the opposite, remembering images associated with rewards relatively worse. The points show each participant’s sensitivity for each reward condition, and shaded regions show 95% confidence intervals.

Association between Learning and Memory Accuracy.
There is a weak, negative correlation between learning and memory accuracy (r = −.17, 95% CI = [−.32, −.02], t(160) = −2.20, p = .03). The negative correlation suggests that better learning comes at the cost of incidental memory, which is generally consistent with prior work in adults (Wimmer et al., 2014).
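The reported t statistic follows from the correlation and the sample size through the standard conversion t = r·sqrt(n − 2) / sqrt(1 − r²). The sketch below applies it to the rounded values in the caption (r = −.17, N = 162 from the memory sample); the function name is hypothetical.

```python
import math


def t_from_r(r, n):
    """Convert a Pearson correlation to its t statistic and degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df


t_value, df = t_from_r(-0.17, 162)
```

With the rounded r this gives 160 degrees of freedom and a t of roughly −2.18, close to the reported −2.20; the small gap is plausibly due to r being rounded to two decimals.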

Model Comparison for Prediction Error-Related Memory Models.
Likelihood ratio tests indicate that the best-fitting model includes linear and quadratic prediction error and linear age, the FE Linear Age model.
