Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory

  1. Gail M Rosenbaum
  2. Hannah L Grassie
  3. Catherine A Hartley (corresponding author)
  1. Department of Psychology, New York University, United States
  2. Center for Neural Science, New York University, United States
23 figures, 2 tables and 2 additional files

Figures

Task structure.

(A) Schematic of the structure of a trial in the risk-sensitive reinforcement learning task. (B) The probabilities and point values associated with each of five ‘point machines’ (colors were counterbalanced). (C) Example memory trial.

Probabilistic choices by age.

Probabilistic (i.e., risky) choices by age on trials in which the risky and safe machines had equal expected value (EV). Data points depict the mean percentage of trials on which each participant selected the probabilistic choice option as a function of age. The regression line is from a linear regression including linear and quadratic age terms (significant quadratic effect of age: b = 0.06, 95% CI [0, 0.12], t(59) = 2.14, p = 0.036, f² = 0.08, 95% CI [0, 0.29], N = 62). Shaded region represents 95% CIs for estimates.

Asymmetry index (AI) by age.

The regression line is from a linear regression model including linear and quadratic age terms (b = 0.17, 95% CI [0.03, 0.31], t(59) = 2.43, p = 0.018, f² = 0.10, 95% CI [0, 0.33], N = 62). Data points represent individual participants. Shaded region represents 95% CIs for estimates.
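
As a rough illustration of the quantity plotted here, the sketch below shows one common way to write a risk-sensitive TD update with separate learning rates for positive and negative prediction errors (PEs), together with an asymmetry index. The formula AI = (α+ − α−)/(α+ + α−) and the function names are assumptions made for illustration; the caption itself does not define them.

```python
def rstd_update(q, reward, alpha_pos, alpha_neg):
    """One risk-sensitive TD update of a machine's value estimate.

    Assumption: the learning rate depends only on the sign of the PE.
    Returns the updated value and the prediction error.
    """
    pe = reward - q  # prediction error for this outcome
    alpha = alpha_pos if pe >= 0 else alpha_neg
    return q + alpha * pe, pe


def asymmetry_index(alpha_pos, alpha_neg):
    """Normalized difference between learning rates (assumed definition).

    Positive values mean positive PEs are weighted more heavily,
    negative values mean negative PEs are weighted more heavily.
    """
    return (alpha_pos - alpha_neg) / (alpha_pos + alpha_neg)


if __name__ == "__main__":
    # Hypothetical learner with alpha+ = 0.5 and alpha- = 0.1
    q = 0.0
    for outcome in (80.0, 0.0):  # one win, then one loss on a 0/80 machine
        q, pe = rstd_update(q, outcome, 0.5, 0.1)
        print(f"outcome={outcome:5.1f}  PE={pe:6.1f}  Q={q:5.1f}")
    print("AI =", asymmetry_index(0.5, 0.1))
```

Under this assumed definition, AI is bounded between –1 and 1, and a learner with equal learning rates has an AI of 0.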

The relation between valence biases in learning and incidental memory for pictures presented with choice outcomes (Experiment 1).

(A) Results from generalized mixed-effects regression depicting fixed effects on memory accuracy. Whiskers represent 95% CI. (B) Estimated marginal means plot showing the three-way interaction between AI, PE valence, and PE magnitude (z = 3.45, p=0.001, OR = 1.12, 95% CI [1.05, 1.19], N = 62). Individuals with higher AIs were more likely to remember images associated with larger positive PEs, and those with lower AIs were more likely to remember images associated with larger negative PEs. Shaded areas represent 95% CI for estimates. ***p < .001.

Self-reported risk taking by age.

Self-reported risk taking on the Domain-Specific Risk Taking (DOSPERT) scale changed nonlinearly with age (linear regression: b = –0.42, 95% CI [–0.69, –0.15], t(59) = –3.09, p = 0.003, f² = 0.16, 95% CI [0.02, 0.44], N = 62). Shaded region represents 95% CIs for estimates.

The relation between valence biases in learning and incidental memory for pictures presented with trial outcomes (Experiment 2).

Reanalysis of data from Rouhani et al., 2018. (A) Results from generalized mixed-effects regression depicting fixed effects on memory accuracy. Whiskers represent 95% CI. (B) Estimated marginal means plot showing the three-way interaction between AI, PE valence, and PE magnitude (z = 2.19, p=0.029, OR = 1.07, 95% CI [1.01, 1.13], N = 305). Individuals with higher AIs were more likely to remember images associated with larger positive PEs, and those with lower AIs were more likely to remember images associated with larger negative PEs. Shaded areas represent 95% CI for estimates. *p < .05, ***p < .001.

Appendix 1—figure 1
Choices as a function of block and age group.

(A) Mean accuracy on test trials (approximately seven per participant per block). (B) Mean risk taking for equal expected value (EV) trials (seven per participant per block). (C) Mean risk taking for unequal-EV trials (four per participant per block).

Appendix 1—figure 2
Probabilistic choices for unequal-expected value (EV) risk trials.

Probabilistic (i.e., risky) choices by age on trials with unequal-expected value (EV) risky and safe machines, with a choice between the 0/80 probabilistic machine and the deterministic 20-point machine. Data points depict the mean percentage of trials where each participant selected the probabilistic choice option as a function of age. Regression line is from the glmer model including linear and quadratic age terms. Shaded region represents 95% CIs for estimates.

Appendix 1—figure 3
Memory performance across age.

(A) False alarm rate as a function of age. As reported in the article, false alarm rate increased with age (p = 0.037). (B) d’ as a function of age. As reported in the article, d’ showed a marginal linear decrease with age (p = 0.070).
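
For readers unfamiliar with the sensitivity measure in panel B, the following is a minimal sketch of how d’ is conventionally computed from hit and false alarm rates. The log-linear correction (adding 0.5 to each count) and the function name are assumptions; the caption does not say which correction, if any, was used.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: z(hit rate) - z(false alarm rate).

    Assumption: a log-linear correction keeps rates away from 0 and 1,
    where the inverse normal CDF is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts for one participant
    print(round(d_prime(30, 10, 8, 32), 3))
```

A participant who endorses old and new items at the same rate has a d’ of 0 under this computation, regardless of overall response bias.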

Appendix 1—figure 4
Ordinal regression analysis of incidental memory judgments (Experiment 1).

Results from an ordinal regression demonstrating that incidental memory accuracy for pictures presented with choice outcomes varies as a function of PE valence, PE magnitude, and asymmetry index (AI) without collapsing across response confidence levels. The probability of each memory response is plotted separately for three different AI levels (top: AI = –0.8; middle: AI = 0; bottom: AI = 0.8) as a function of PE valence and PE magnitude.

Appendix 1—figure 5
BIC distributions for all four models tested.
Appendix 1—figure 6
Relative BIC as a function of asymmetry index (AI).

The difference between risk-sensitive temporal difference (RSTD) and temporal difference (TD) model fit (BIC) for all participants in Experiment 1. Values below 0 indicate a better fit by the RSTD model.
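
The relative fit measure described above can be sketched as follows, using the standard BIC formula. The parameter counts (three for RSTD, two for TD) follow Table 1; the function names are hypothetical, and the use of a plain negative log-likelihood is an assumption, since the caption does not say whether the prior entered the fit statistic.

```python
import math


def bic(neg_log_lik, n_params, n_obs):
    """Bayesian information criterion: 2 * NLL + k * ln(n). Lower is better."""
    return 2.0 * neg_log_lik + n_params * math.log(n_obs)


def relative_bic(nll_rstd, nll_td, n_trials):
    """BIC(RSTD) - BIC(TD); values below 0 favor the RSTD model.

    Assumes RSTD has 3 free parameters (alpha+, alpha-, beta)
    and TD has 2 (alpha, beta).
    """
    return bic(nll_rstd, 3, n_trials) - bic(nll_td, 2, n_trials)


if __name__ == "__main__":
    # Hypothetical fits: RSTD lowers the NLL by 10 over 100 trials
    print(round(relative_bic(90.0, 100.0, 100), 3))
```

Because RSTD carries one extra parameter, it must improve the negative log-likelihood by more than ln(n)/2 to be favored, which is why participants with an AI near 0 would be expected to sit above the zero line.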

Appendix 1—figure 7
Relative BIC as a function of asymmetry index (AI) for participants simulated by the risk-sensitive temporal difference (RSTD) model.

The difference in model fit (BIC) between the risk-sensitive temporal difference (RSTD) and temporal difference (TD) models for 10,000 subjects simulated using the RSTD model. Values below 0 indicate a better fit by the RSTD model.

Appendix 1—figure 8
Parameter recovery at different levels of Asymmetry Index (AI).

Parameter recovery for simulated participants at low (AIs ranging from –0.94 to –0.374), medium-low (AIs ranging from –0.373 to –0.07684), medium-high (AIs ranging from –0.07683 to 0.2501), and high (AIs ranging from 0.2502 to 0.97) levels of AI. (A) α+ recovery. (B) α- recovery. (C) β recovery. (D) BIC.

Appendix 1—figure 9
Posterior prediction check results.

Relationship between choices of the risky (probabilistic) option in real versus simulated data for (A) the risk-sensitive temporal difference (RSTD) model and (C) the Utility model. Relationship between risky choices and the corresponding real participant age for (B) RSTD-simulated and (D) Utility-simulated participants.

Appendix 1—figure 10
Age patterns in risk-sensitive temporal difference (RSTD) model parameters.

Age-related change in (A) α+, (B) α-, and (C) β parameter estimates from the RSTD model.

Appendix 1—figure 11
Relationship between parameter estimates and PEs derived from the risk-sensitive temporal difference (RSTD) and Utility models.

(A) Relationship between asymmetry index (AI) and Rho. (B) PEs for all participants, colored by the mean proportion of risk taking in the task. Purple dots are PEs from risk-seeking participants, while green dots are from risk-averse participants. (C) PEs for an example risk-seeking participant. (D) PEs for an example risk-averse participant. (E) Mean PE magnitudes, z-scored within the full sample, and then averaged within-subject, ordered by increasing Rho. (F) Mean PE magnitudes, z-scored within the full sample and then averaged within-subject, ordered by increasing AI.
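
To make the contrast with the RSTD model concrete, here is a minimal sketch of how a utility-based learner might generate PEs. It assumes a power utility function (outcome raised to the exponent ρ) and a single learning rate; both the functional form and the function name are illustrative assumptions rather than details given in this caption.

```python
def utility_update(q, reward, alpha, rho):
    """One update of a utility-based learner (assumed power utility).

    The outcome is transformed to reward ** rho before the PE is computed,
    so risk attitudes arise from the curvature of the utility function
    rather than from asymmetric learning rates.
    """
    utility = reward ** rho
    pe = utility - q
    return q + alpha * pe, pe


if __name__ == "__main__":
    # Hypothetical rho < 1 (concave utility): a sure 40 is worth more
    # than a 50/50 gamble between 0 and 80.
    rho = 0.5
    print("sure option:", 40 ** rho)
    print("gamble:     ", 0.5 * 0 ** rho + 0.5 * 80 ** rho)
```

With ρ below 1 the expected utility of the gamble falls below that of the sure option, and with ρ above 1 the ordering reverses, which is one way such a model can capture risk-averse and risk-seeking choices.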

Appendix 1—figure 12
Learning parameters for free and forced choices.

(A) Negative learning rates, (B) positive learning rates, and (C) asymmetry indices from the FourLR model are plotted as a function of choice type (free or forced).

Appendix 1—figure 13
Learning biases and subsequent memory as a function of agency.

(A) PE Valence × PE Magnitude × AI for free choices. (B) PE Valence × PE Magnitude × AI for forced trials. (C) PE Valence × PE Magnitude × AI for Experiment 1 (Figure 4B). (D) PE Valence × PE Magnitude × AI for Experiment 2 (Figure 6B). Note that the interaction effect for forced choices (B) resembles that in Experiment 2 (D) where participants were not asked to make choices.

Appendix 1—figure 14
Distributions of parameters from the Explicit Prediction model.

Distributions for (A) α+, (B) α-, and (C) asymmetry index (AI) in Experiment 2.

Appendix 1—figure 15
Experiment 2 sensitivity analysis.

Generalized linear mixed-effects regression results showing incidental memory accuracy for pictures presented during learning as a function of PE valence and PE magnitude, with participants poorly fit by the RL model included in the sample. (A) Fixed-effects results. Whiskers represent 95% CI. (B) Estimated marginal means plot showing the marginally significant three-way interaction between asymmetry index (AI), PE valence, and PE magnitude. Shaded areas represent 95% CI for estimates. (C) Estimated marginal means for the significant two-way interaction between AI and PE valence. Whiskers represent 95% CI. **p < .01, ***p < .001.

Author response image 1
BICs for each participant from RSTD and Utility models.
Author response image 2
Non-significant results from a 3-way interaction in subsequent memory data using individual difference and outcome value measures not derived from a reinforcement learning model.

Tables

Table 1
Bounds, priors, and recoverability for parameters in each model.
Model     Parameter    Bounds          Prior             Recoverability
TD        α            0, 1            Beta(2,2)         0.84
          β            0.000001, 30    Gamma(2,3)        0.88
RSTD      α+           0, 1            Beta(2,2)         0.79
          α-           0, 1            Beta(2,2)         0.88
          β            0.000001, 30    Gamma(2,3)        0.90
FourLR    α+ free      0, 1            Beta(2,2)         0.79
          α- free      0, 1            Beta(2,2)         0.89
          α+ forced    0, 1            Beta(2,2)         0.76
          α- forced    0, 1            Beta(2,2)         0.78
          β            0.000001, 30    Gamma(2,3)        0.90
Utility   α            0, 1            Beta(2,2)         0.75
          β            0.000001, 30    Gamma(2,3)        0.88
          ρ            0, 2.5          Gamma(1.5,1.5)    0.88
  1. Priors for α and β were based on those used in Niv et al., 2012.

  2. TD, temporal difference; RSTD, risk-sensitive temporal difference; LR, learning rate.

Table 2
Model recovery.
                    Comparison model
Generating model    TD      RSTD    FourLR  Utility
TD                  -       0.98    1.00    0.97
RSTD                0.57    -       0.99    0.65
FourLR              0.50    0.31    -       0.39
Utility             0.58    0.76    0.99    -
  1. TD, temporal difference; RSTD, risk-sensitive temporal difference; LR, learning rate.

Cite this article

Gail M Rosenbaum, Hannah L Grassie, Catherine A Hartley (2022)
Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory
eLife 11:e64620.
https://doi.org/10.7554/eLife.64620