Neuroscience

Approach-avoidance reinforcement learning as a translational and computational model of anxiety-related avoidance

Yumeya Yamamori
Oliver J Robinson
Jonathan P Roiser

Institute of Cognitive Neuroscience, University College London, London, UK
Research Department of Clinical, Educational and Health Psychology, University College London, London, UK

https://doi.org/10.7554/eLife.87720.3

Figures and data

The approach-avoidance reinforcement learning task.
a, Trial timeline. A fixation cross initiates the trial. Participants are presented with two options for up to 2 s, from which they choose one. The outcome is then presented for 1 s. b, Possible outcomes. There were four possible outcomes: 1) no reward and no aversive sound; 2) a reward and no aversive sound; 3) an aversive sound and no reward; or 4) both the reward and the aversive sound. c, Probabilities of observing each outcome given the choice of option. Unbeknownst to the participant, one of the options (the ‘conflict’ option, solid lines) was generally more rewarding than the other (the ‘safe’ option, dashed lines) across trials. However, only the conflict option could produce an aversive sound; the probability that the safe option produced the aversive sound was 0 on all trials. The probabilities of observing each outcome given the choice of option fluctuated randomly and independently across trials. The correlations between these dynamic probabilities were negligible (mean Pearson’s r = 0.06). d, Distribution of outcome probabilities by option and outcome. On average, the conflict option was more likely to produce a reward than the safe option. The conflict option had a variable probability of producing the aversive sound across trials, but this probability was always 0 for the safe option.
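As a rough illustration of how the four outcomes in panel b can arise from two per-trial probabilities, the sketch below enumerates the outcome distribution for one option. It assumes that the reward and the aversive sound are drawn independently within a trial, which the caption does not state explicitly; the function name `outcome_probabilities` is hypothetical.

```python
def outcome_probabilities(p_reward, p_sound):
    """Return the probability of each of the four outcomes for one option.

    Assumption (not stated in the caption): within a trial, the reward and
    the aversive sound are independent Bernoulli draws.
    """
    for p in (p_reward, p_sound):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return {
        "no_reward_no_sound": (1 - p_reward) * (1 - p_sound),
        "reward_only": p_reward * (1 - p_sound),
        "sound_only": (1 - p_reward) * p_sound,
        "reward_and_sound": p_reward * p_sound,
    }


# Illustrative values only: the safe option never produces the sound.
safe = outcome_probabilities(p_reward=0.3, p_sound=0.0)
conflict = outcome_probabilities(p_reward=0.7, p_sound=0.4)
```

Under this assumption the safe option can only ever yield the first two outcomes, whereas the conflict option can yield all four.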

Predictors of choice in the approach-avoidance reinforcement learning task.
a, Coefficients from the mixed-effects logistic regression of trial-by-trial choices in the task. On any given trial, participants chose the option that was more likely to produce a reward. They also avoided choosing the conflict option when it was more likely to produce the punishment. Task-induced anxiety significantly interacted with punishment probability. Significance levels are shown according to the following: p < 0.05 - *; p < 0.01 - **; p < 0.001 - ***. b, Subjective ratings of task-induced anxiety, given on a scale from ‘Not at all’ (0) to ‘Extremely’ (50). c, On each trial, participants were likely to choose the option with the greater probability of producing the reward. d, Participants tended to avoid the conflict option when it was likely to produce a punishment. e, Compared to individuals reporting lower anxiety during the task, individuals experiencing greater anxiety showed greater avoidance of the conflict option, especially when it was more likely to produce the punishment. Note. Figures c-e show logistic curves fitted to the raw data using the ‘glm’ function in R. For visualisation purposes, we categorised continuous task-induced anxiety into tertiles. We show linear curves here because these effects were estimated as linear effects in the logistic regression models; however, the raw data showed non-linear trends (see Supplementary Figure 15).

Computational modelling of approach-avoidance reinforcement learning.
a, Model comparison results. The difference in integrated Bayesian Information Criterion scores from each model relative to the winning model is indicated on the x-axis. The winning model included specific learning rates for reward (α^R) and punishment learning (α^P), and specific outcome sensitivity parameters for reward (β^R) and punishment (β^P). Some models were tested with the inclusion of a lapse term (ξ). b, Distributions of individual parameter values from the winning model. c, The winning model was able to reproduce the proportion of conflict option choices over all trials in the observed data with high accuracy (observed vs predicted data r = 0.97). d, The distribution of the reward-punishment sensitivity index, the computational measure of approach-avoidance bias. Higher values indicate approach biases, whereas lower values indicate avoidance biases.
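To make the roles of the four parameters concrete, here is a minimal sketch of a reinforcement learning model with separate learning rates and sensitivities for reward and punishment. The caption does not give the model equations, so the delta-rule update, the two-option softmax, and all function names are assumptions for illustration, not the authors' implementation. The index is taken as the ratio of the reward to the punishment sensitivity, as described elsewhere in these captions; the lapse term is omitted.

```python
import math


def update_value(value, outcome, learning_rate):
    """Delta-rule update: move the estimate toward the observed outcome (0 or 1).

    Use alpha_R for reward estimates and alpha_P for punishment estimates.
    """
    return value + learning_rate * (outcome - value)


def p_choose_conflict(q_reward, q_punish, beta_r, beta_p):
    """Probability of choosing the conflict option under a two-option softmax.

    q_reward and q_punish are dicts keyed by 'conflict' and 'safe'. Each
    option's net value is assumed to be beta_r * Q_reward - beta_p * Q_punish.
    """
    net = {
        opt: beta_r * q_reward[opt] - beta_p * q_punish[opt]
        for opt in ("conflict", "safe")
    }
    return 1.0 / (1.0 + math.exp(-(net["conflict"] - net["safe"])))


def sensitivity_index(beta_r, beta_p):
    """Reward-punishment sensitivity index as the ratio beta_R / beta_P."""
    return beta_r / beta_p


# Illustrative values only (not fitted estimates).
q_r = {"conflict": 0.7, "safe": 0.3}
q_p = {"conflict": 0.4, "safe": 0.0}
p_balanced = p_choose_conflict(q_r, q_p, beta_r=2.0, beta_p=2.0)
p_avoidant = p_choose_conflict(q_r, q_p, beta_r=2.0, beta_p=4.0)
```

With these illustrative values the extra expected reward of the conflict option exactly offsets its expected punishment when the two sensitivities are equal, so the choice probability is 0.5. Doubling the punishment sensitivity (a lower index) pushes the probability below 0.5, which is the sense in which lower index values indicate avoidance biases.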

Relationships between task-induced anxiety, model parameters and avoidance.
a, Task-induced anxiety was negatively correlated with the punishment learning rate. b, Task-induced anxiety was also negatively correlated with the reward-punishment sensitivity index. Kendall’s tau correlations and approximate Pearson’s r equivalents are reported above each figure. c, The mediation model. Mediation effects were assessed using structural equation modelling. Bold terms represent variables and arrows depict regression paths in the model. The annotated values next to each arrow show the regression coefficient associated with that path, denoted as coefficient (standard error). Only the reward-punishment sensitivity index significantly mediated the effect of task-induced anxiety on avoidance. Significance levels in all figures are shown according to the following: p < 0.05 - *; p < 0.01 - **; p < 0.001 - ***.
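The caption does not say how the approximate Pearson’s r equivalents were obtained from Kendall’s tau. One standard conversion, valid under bivariate normality, is r = sin(π·tau/2); the sketch below applies that relation as an assumption, and the function name is hypothetical.

```python
import math


def kendall_tau_to_pearson_r(tau):
    """Approximate Pearson's r from Kendall's tau via r = sin(pi * tau / 2).

    Assumption: this is the conversion used; it holds under bivariate
    normality and is only an approximation otherwise.
    """
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    return math.sin(math.pi * tau / 2.0)


# Illustrative value only (not a result from the paper).
r_equivalent = kendall_tau_to_pearson_r(-0.1)
```

For small correlations the conversion inflates tau by roughly a factor of π/2, so the r equivalents are larger in magnitude than the tau values they accompany.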

Model specification

Statistical results across the discovery and replication samples, and the effect of data cleaning exclusions

Model comparison and parameter distributions across studies.
a, Model comparison results. The difference in integrated Bayesian Information Criterion scores from each model relative to the winning model is indicated on the x-axis. The winning model in both studies included specific learning rates for reward (α^R) and punishment learning (α^P), and specific outcome sensitivity parameters for reward (β^R) and punishment (β^P). Some models were tested with the inclusion of a lapse term (ξ). b, Distributions of individual parameter values from the winning model across studies. The reward-punishment sensitivity index constituted our computational measure of approach-avoidance bias, calculated by taking the ratio between the reward and punishment sensitivity parameters.

Correlation matrices for the estimated parameters across studies.
The lower triangle of each matrix shows scatterplots for each pair of parameters. The upper triangle shows the Pearson’s r correlation coefficient for each pair of parameters, based on the untransformed parameter values.

Mediation analyses across studies.
Mediation effects were assessed using structural equation modelling. Bold terms represent variables and arrows depict regression paths in the model. The annotated values next to each arrow show the regression coefficient associated with that path, denoted as coefficient (standard error). Only the reward-punishment sensitivity index significantly mediated the effect of task-induced anxiety on avoidance. Significance levels in all figures are shown according to the following: p < 0.05 - *; p < 0.01 - **; p < 0.001 - ***.

Parameter recovery.
Pearson’s r values between the data-generating and recovered values, shown for each parameter. Coloured points represent Pearson’s r values for each of 100 simulation iterations, and black points represent the mean value across simulations.

Split-half reliability of the task.
Pearson’s correlations between measures calculated from the first and second halves of the task are shown with their estimates of reliability. Reliability estimates for the computational measures from the winning computational model were computed by fitting split-half parameters within a single model, then using the parameter covariance matrix to derive Pearson’s correlation coefficients for each parameter across halves. Reliability estimates are reported as unadjusted values (r) and after adjusting for the reduced number of trials via the Spearman-Brown correction (r_adjusted). Dotted lines represent the reference line, indicating perfect correlation. Red lines show lines-of-best-fit.
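The Spearman-Brown correction mentioned above has a standard closed form: for a test lengthened by a factor k, the predicted reliability is k·r / (1 + (k − 1)·r), which reduces to 2r / (1 + r) for split halves. A minimal sketch follows; the function name and the example value are illustrative, not taken from the paper.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Spearman-Brown prophecy formula.

    r_half: correlation between the two halves of the task.
    length_factor: how many times longer the full test is than each part
    (2 for a split-half design).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Illustrative value only: a split-half correlation of 0.5.
r_adjusted = spearman_brown(0.5)  # 2 * 0.5 / (1 + 0.5) = 0.666...
```

For positive split-half correlations below 1, the adjusted value is always larger than the unadjusted one, because each half contains only half of the trials.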

Test-retest reliability of the task.
Correlations of measures across the test and retest sessions are shown with their estimates of reliability. Reliability estimates for the model-agnostic measures (task-induced anxiety, proportion of conflict option choices) were estimated using intra-class correlation coefficients. Reliability estimates for the computational measures from the winning computational model were computed by first fitting both sessions’ parameters within a single model, then using the parameter covariance matrix to derive a Pearson’s correlation coefficient (r_Model-derived) for each parameter across sessions. Dotted lines represent the reference line, indicating perfect correlation. Red lines show lines-of-best-fit.

Task practice effects.
Comparison of behavioural measures and model parameters across time. Lines represent individual data, red points represent mean values, and red lines represent standard error bars. P-values of paired t-tests are annotated above each plot. Task-induced anxiety and the punishment learning rate were significantly lower in the second session, whilst the other measures did not change significantly across sessions.

Inter-parameter correlations across the expectation-maximisation (EM, red) and variational Bayesian inference (VBI, blue) algorithms.
Overall, the VBI algorithm produced lower correlations compared to EM.

Sensitivity analysis of the computational findings relating to task-induced anxiety, comparing results when using parameters estimated via expectation-maximisation (EM, red) and variational Bayesian inference (VBI, blue).
a, Kendall’s tau correlations between each parameter and task-induced anxiety. b, Mediating effects of the punishment learning rate and reward-punishment sensitivity index.

Distribution of self-reported punishment unpleasantness ratings.
Ratings were scored from ‘Not at all’ to ‘Extremely’ (encoded as 0 and 50, respectively). Distributions are shown across the discovery and replication samples.

The effect of including unpleasantness ratings as a covariate in the hierarchical logistic regression models of task choices.
Dots represent coefficient estimates from the model, with confidence intervals. Models are shown for both the discovery and replication samples. Significance levels are shown according to the following: p < 0.05 - *; p < 0.01 - **; p < 0.001 - ***.

The effect of including unpleasantness ratings as a covariate in the mediation models.
Dots represent coefficient estimates from the model, with confidence intervals. Models are shown for both the discovery and replication samples. Significance levels are shown according to the following: p < 0.1 - †; p < 0.05 - *; p < 0.01 - **; p < 0.001 - ***.

Test-retest reliability of unpleasantness ratings.
a, Comparing unpleasantness ratings across timepoints, participants rated the punishments as significantly less unpleasant in the second session. b, Correlation of ratings across timepoints.

Mixed-model-derived intraclass correlation coefficients (ICCs) for measures of task performance, with and without accounting for unpleasantness.
Dots represent model-derived ICCs.

Effects of outcome probabilities on proportion of conflict option choices.
Mean probabilities of choosing the conflict option across the sample are plotted with standard errors. The relationships between the drifting outcome probabilities in the task and group choice proportions showed non-linear trends in both the discovery and replication samples, especially for the effect of punishment probability on choice (both the main effect and its interaction with anxiety). Note. For visualisation purposes, the continuous predictors (based on the latent outcome probabilities or task-induced anxiety) were categorised into discrete bins.
