Conformist social learning leads to self-organised prevention against adverse bias in risky decision making
Figures
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig1-v1.tif/full/617,/0/default.jpg)
Mitigation of suboptimal risk aversion by social influence.
(a) A schematic diagram of the task. A safe option provides a constant reward whereas a risky option provides a reward randomly drawn from a Gaussian distribution with mean and . (b, c): The emergence of suboptimal risk aversion (the hot stove effect) depending on a combination of the reinforcement learning parameters; (b): under no social influence (i.e. the copying weight σ = 0), and (c): under social influences with different values of the conformity exponents θ and copying weights σ. The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing the two alternatives with equal likelihood (i.e. ), which is given analytically by (Denrell, 2007). The coloured background is a result of the agent-based simulation with total trials and group size , showing the average proportion of choosing the risky option in the second half of the learning trials under a given combination of the parameters. (d): The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which performance is improved (orange) or undermined (purple) by social learning.
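The simulation described in this caption can be illustrated with a minimal sketch. It assumes a standard value update with a learning rate, a softmax choice rule with an inverse temperature, and a frequency-dependent copying term mixed in by the copying weight and shaped by the conformity exponent. The caption does not state the update equations, so the functional forms, the `+ 0.1` smoothing term, and all numeric defaults (rewards, trial count, seed) are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random

def softmax_risky(q_safe, q_risky, beta):
    """Probability of choosing the risky option under a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (q_risky - q_safe)))

def simulate_group(alpha, beta, sigma, theta, n_agents=8, n_trials=150,
                   safe_reward=1.0, risky_mean=1.5, risky_sd=1.0, seed=0):
    """Return the proportion of risky choices in the second half of the trials.

    All default values are hypothetical; the caption's own values were lost.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_agents)]  # [safe, risky] value estimates
    last_counts = [0, 0]                       # choices seen on the previous trial
    risky_late = 0
    for t in range(n_trials):
        counts = [0, 0]
        for i in range(n_agents):
            p_asocial = softmax_risky(q[i][0], q[i][1], beta)
            if t == 0 or sigma == 0.0:
                p_risky = p_asocial
            else:
                # Frequency-dependent copying (assumed form); the 0.1 offset
                # keeps a zero count well defined for negative theta.
                w_safe = (last_counts[0] + 0.1) ** theta
                w_risky = (last_counts[1] + 0.1) ** theta
                p_social = w_risky / (w_safe + w_risky)
                p_risky = (1.0 - sigma) * p_asocial + sigma * p_social
            choice = 1 if rng.random() < p_risky else 0
            reward = rng.gauss(risky_mean, risky_sd) if choice == 1 else safe_reward
            q[i][choice] += alpha * (reward - q[i][choice])  # value update
            counts[choice] += 1
            if t >= n_trials // 2 and choice == 1:
                risky_late += 1
        last_counts = counts
    return risky_late / (n_agents * (n_trials - n_trials // 2))

asocial = simulate_group(alpha=0.6, beta=5.0, sigma=0.0, theta=2.0)
social = simulate_group(alpha=0.6, beta=5.0, sigma=0.3, theta=2.0)
print(asocial, social)
```

Averaging such runs over many seeds and over a grid of learning rates and inverse temperatures would give heat maps of the kind shown in panels (b) and (c), under the stated assumptions.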
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig1-figsupp1-v1.tif/full/617,/0/default.jpg)
The simulation result with a wider parameter space.
The effect of the relationship between individual learning rate (α) and individual inverse temperature (β) across the different combinations of social learning parameters on the mean proportion of choosing the risky alternative in the second half of the trials of the two-armed bandit task described in Figure 1 in the main text. The dashed curves give a set of parameter combinations with which asocial learners are expected to choose the risky alternative in the same proportion as they choose the safe alternative (i.e. ) in the infinite time horizon , given by .
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig1-figsupp2-v1.tif/full/617,/0/default.jpg)
The results of the value-shaping social influence model.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials . Different social learning weights (σ) are shown from top to bottom (). Different conformity exponents (θ) are shown from left to right (). The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing both alternatives with equal likelihood (i.e. ), given by .
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig1-figsupp3-v1.tif/full/617,/0/default.jpg)
The simulation result with the negative risk premium.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials . Different social learning weights (σ) are shown from top to bottom (). Different conformity exponents (θ) are shown from left to right (). The risk premium is negative.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig1-figsupp4-v1.tif/full/617,/0/default.jpg)
The simulation result with the Bernoulli noise distribution.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials . Different social learning weights (σ) are shown from top to bottom (). Different conformity exponents (θ) are shown from left to right (). A binary payoff distribution was used in which the safe alternative always provides while the risky alternative provides either a 70% chance of or a 30% chance of . The risk premium was 1.5.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig1-figsupp5-v1.tif/full/617,/0/default.jpg)
The simulation results under the positive risk premium experimental setups (a,d: the 1-risky-1-safe; b,e: the 1-risky-3-safe; c,f: the 2-risky-2-safe).
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. (a–c): The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials () under social influences with different values of the conformity exponents θ and copying weights σ. The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing the two alternatives with equal likelihood (i.e. ). (d–f): The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which performance is improved (that is, risk-seeking increases; orange) or undermined (that is, risk-aversion is amplified; purple) by social learning.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig1-figsupp6-v1.tif/full/617,/0/default.jpg)
The simulation results under the negative risk premium experimental setup.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. (left): The coloured background shows the average proportion of choosing the (optimal) safe option in the second half of the learning trials under social influences with different values of the conformity exponents θ and copying weights σ. The dashed curve shows the proportion of choosing the safe option at . (right): The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which social learning increases (suboptimal) risk-seeking (orange) or (optimal) risk-aversion (purple).
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig2-v1.tif/full/617,/0/default.jpg)
The effect of social learning on average decision performance.
The x axis is a product of two reinforcement learning parameters, α(β+1), namely, the susceptibility to the hot stove effect. The y axis is the mean probability of choosing the optimal risky alternative in the last 75 trials in a two-armed bandit task whose setup was the same as in Figure 1. The black solid curve is the analytical prediction of the asymptotic performance of individual reinforcement learning with infinite time horizon (Denrell, 2007). The analytical curve shows a choice shift emerging at ; that is, individual learners ultimately prefer the safe to the risky option in the current setup of the task when . The dotted curves are mean results of agent-based simulations of social learners with two different mean values of the copying weight (green and yellow, respectively) and asocial learners with σ = 0 (purple). The difference between the agent-based simulation with σ = 0 and the analytical result was due to the finite number of decision trials in the simulation, and hence, the longer the horizon, the closer they become (Figure 2—figure supplement 1). Each panel shows a different combination of the inverse temperature β and the conformity exponent θ.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig2-figsupp1-v1.tif/full/617,/0/default.jpg)
The effect of social learning on the average decision performance on the longer time horizon.
The x axis is an interaction of two reinforcement learning parameters, α(β+1), that is, the susceptibility to the hot stove effect. The y axis is the mean probability of choosing the optimal risky alternative in the last 75 trials in the two-armed bandit task whose setup was the same as in Figures 1 and 2 in the main text (i.e. , s.d. = 1) except for a longer time horizon than that used in the main text (). The dotted curves are the mean result of agent-based simulations of groups of social learners with two different mean values of the copying weight or individual learners with σ = 0. Each panel shows a different combination of the inverse temperature β and the conformity exponent θ. The black solid curve is the theoretical benchmark to which individual reinforcement learners were expected to asymptote with . Compared to Figure 2 in the main text, individual learners got closer to the benchmark. The performance of social learners, on the other hand, continued to deviate from the benchmark, suggesting that social influence had a qualitative impact on the course of learning and decision making, rather than merely slowing the approach to the equilibrium of individual learning.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig2-figsupp2-v1.tif/full/617,/0/default.jpg)
The effect of social learning on the time evolution of decision performance.
The x axis is the number of trials. The y axis is the mean proportion of choosing the optimal risky alternative. Each colour shows a different . For the asocial learning condition (i.e. σ = 0), the analytical benchmark to which reinforcement learners asymptote is shown as a horizontal line. Conformity exponent was 2. Group size was 8. The simulation was repeated 1000 times for each combination of parameters. Compared to asocial learning cases, social learning (σ > 0) qualitatively alters the course of learning, rather than just speeding up or slowing down learning.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig3-v1.tif/full/617,/0/default.jpg)
The effect of individual heterogeneity on the proportion of choosing the risky option in the two-armed bandit task.
(a) The effect of heterogeneity of α, (b) β, (c) σ, and (d) θ. Individual values of a focal behavioural parameter were varied across individuals in a group of five. Other non-focal parameters were identical across individuals within a group. The basic parameter values assigned to non-focal parameters were , , , and , and groups’ mean values of the various focal parameters were matched to these basic values. We simulated three different heterogeneous compositions: the majority (3 of 5 individuals) potentially suffered the hot stove effect (a, b) or had the highest diversity in social learning parameters (c, d; purple); the majority were able to overcome the hot stove effect (a, b) or had moderate heterogeneity in the social learning parameters (c, d; blue); and all individuals had but smaller heterogeneity (green). The yellow diamond shows the homogeneous groups’ performance. Lines are drawn through average results across the same compositional groups. Each round dot represents a group member’s mean performance. The diamonds are the average performance of each group for each composition category. For comparison, asocial learners’ performance, against which the performance of social learners can be evaluated, is shown in gray. For heterogeneous α and β, the analytical solution of asocial learning performance is shown as a solid-line curve. We ran 20,000 replications for each group composition.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig4-v1.tif/full/617,/0/default.jpg)
The population dynamics model.
(a) A schematic diagram of the dynamics. Solid arrows represent a change in population density between connected states at a time step. The thicker the arrow, the larger the per-capita rate of behavioural change. (b, c) The results of the asocial, baseline model where and (). Both figures show the equilibrium bias towards risk seeking (i.e., ) as a function of the degree of risk premium as well as of the per-capita probability of moving to the less preferred behavioural option . (b) The explicit form of the curve is given by . (c) The dashed curve is the analytically derived neutral equilibrium of the asocial system that results in , given by . (d) The equilibrium of the collective behavioural dynamics with social influences. The numerical results were obtained with , , and .
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig4-figsupp1-v1.tif/full/617,/0/default.jpg)
The result of the differential equation model.
The effect of both the per capita probability of exploration and (i.e. the ratio of individuals who prefer behavioural state ) on the equilibrium degree of risk seeking (i.e. ), across the different combinations of social influence parameters. Different social influence weights are shown from top to bottom (). Different conformity exponents are shown from left to right (). The dashed curve is . The numeric solution was obtained with conditions , , and .
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig5-v1.tif/full/617,/0/default.jpg)
The approximate bifurcation analysis.
The relationships between the social influence weight and the equilibrium number of individuals in the risky behavioural state across different conformity exponents and different values of risk premium , are shown as black dots. The background colours indicate regions where the system approaches either risk aversion (; blue) or risk seeking (; red). The horizontal dashed line is . Two locally stable equilibria emerge when , which suggests that the system has a bifurcation when is sufficiently large. The other parameters are set to , , and .
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig5-figsupp1-v1.tif/full/617,/0/default.jpg)
The approximate bifurcation analysis.
The relationship between the social influence weight and the equilibrium number of individuals choosing the risky alternative across the different conformity exponents , shown as black dots. The triangular points shown in the background of each panel indicate regions in which the group approaches risk aversion (i.e., ; blue) or the risk-seeking equilibrium (i.e. ; red). Two different equilibria mean that the system has a bifurcation under a given . The direction of the background triangles indicates whether increases () or decreases () relative to its starting position. The other parameters are set to , .
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig6-v1.tif/full/617,/0/default.jpg)
Prediction of the fit learning model.
Results of a series of agent-based simulations with individual parameters that were drawn randomly from the best fit global parameters. Independent simulations were conducted 100,000 times for each condition. Group size was fixed to six for the group condition. Lines are means (black-dashed: individual, coloured-solid: group) and the shaded areas are 80% Bayesian credible intervals. Mean performances of agents with different are shown in the colour gradient. (a) A two-armed bandit task. (b) A 1-risky-3-safe (four-armed) bandit task. (c) A 2-risky-2-safe (four-armed) bandit task. (d) A negative risk premium two-armed bandit task.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig6-figsupp1-v1.tif/full/617,/0/default.jpg)
Experimental results with the mixed logit model regression.
The black triangles are subjects in the individual learning condition; the orange dots are those in the group condition with group sizes ranging from 2 to 8. The solid lines are predictions from a mixed logit model for the individual condition (black) and for the group condition (orange), with the shaded area showing the 95% Bayesian credible intervals (CIs). (a) A two-armed bandit task (). (b) A 1-risky-3-safe (four-armed) bandit task (). (c) A 2-risky-2-safe (four-armed) bandit task (). (d) A negative risk premium (RP) two-armed bandit task (). The width of the CI for the individual condition in the negative RP task is due to the lack of data points in the region. The x axis is α(β+1), namely, the susceptibility to the hot stove effect. (a, b, and d) The y axis is the mean proportion of choosing the risky alternative averaged over the second half of the trials. (c) The y axis is the mean proportion of choosing the optimal risky alternative averaged over the second half of the trials. The horizontal lines show the chance-level probability.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig6-figsupp2-v1.tif/full/617,/0/default.jpg)
Bayesian model comparison.
(a) The model recovery performance: model frequencies (dark shade) and exceedance probability (XP) for each pair of simulated and fitted models, calculated by the Widely Applicable Information Criterion (WAIC). (b–d) Model comparison results. The lengths of the bars indicate model frequencies. Exceedance probability (XP) of the decision-biasing model is shown.
![](https://iiif.elifesciences.org/lax/75308%2Felife-75308-fig6-figsupp3-v1.tif/full/617,/0/default.jpg)
The parameter recovery performance.
The top and bottom halves of the figure show the results of parameter recovery tests 1 and 2, respectively. The left column shows the global parameters fitted for each of the two four-armed bandit tasks, the 1-risky-3-safe task () and the 2-risky-2-safe task (). The red points are the true values and the black points are the mean posterior values (i.e. recovered values). The 95% Bayesian credible intervals are shown with error bars. The middle and right columns show individual-level parameters across the two task conditions (). The x axis is the true value and the y axis is the fitted (i.e. the mean posterior) individual value. The differences between the true value and the estimated value are shown in different colours (dark: good fit). Pearson’s correlation coefficients between the true and fitted values are shown.
Videos
A sample screenshot of the online experimental task (Individual condition).
This video was recorded for demonstration purposes only and is therefore not associated with any actual participant’s behaviour.
A sample screenshot of the online experimental task with N = 3 (group condition).
This video was recorded for demonstration purposes only and is therefore not associated with any actual participant’s behaviour. Note also that in the experimental sessions each participant could see only one browser window.
Tables
Summary of the learning model parameters.
Symbol | Meaning | Range of the value |
---|---|---|
α | Learning rate | [0, 1] |
β | Inverse temperature | [0, +∞] |
α(1+β) | Susceptibility to the hot stove effect | |
σ | Copying weight | [0, 1] |
θ | Conformity exponent | [-∞, +∞] |
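As a small worked example of the derived quantity in the table above, the susceptibility to the hot stove effect is the product α(1+β). The sketch below computes it and checks the parameter ranges listed in the table; the function name and the example values are hypothetical.

```python
def hot_stove_susceptibility(alpha, beta):
    """Return alpha * (1 + beta), checking the ranges given in the table."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("learning rate alpha must lie in [0, 1]")
    if beta < 0.0:
        raise ValueError("inverse temperature beta must be non-negative")
    return alpha * (1.0 + beta)

# Hypothetical example: a fast learner with a moderately greedy choice rule.
print(hot_stove_susceptibility(alpha=0.5, beta=3.0))  # prints 2.0
```

Because the quantity is a product, a low learning rate can offset a high inverse temperature and vice versa, which is why the figure captions plot performance against α(β+1) rather than against either parameter alone.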
Summary of the differential equation model parameters.
Symbol | Meaning | Range of the value |
---|---|---|
Density of individuals choosing and preferring | ||
Density of individuals choosing and preferring | ||
Density of individuals choosing and preferring | ||
Density of individuals choosing and preferring | ||
Per capita rate of moving to the unfavourable option | ||
Per capita rate of moving to the favourable option | ||
Per capita rate of becoming enchanted with the risky option | ||
Social influence weight | ||
Conformity exponent |
Means and 95% Bayesian credible intervals (shown in square brackets) of the global parameters of the learning model.
The group condition and individual condition are shown separately. All parameters satisfied the Gelman–Rubin criterion . All estimates are based on over 500 effective samples from the posterior.
Task category | Positive risk premium (positive RP) | | | Negative risk premium (negative RP) |
---|---|---|---|---|
Task | 1-risky-1-safe | 1-risky-3-safe | 2-risky-2-safe | 1-risky-1-safe |
Group | n = 123 | n = 97 | n = 87 | n = 93 |
μlogitα | -2.2 [-2.8, -1.5] | -1.8 [-2.3, -1.4] | -1.7 [-2.1, -1.3] | -0.09 [-0.7, 0.6] |
(Mean α) | 0.10 [0.06, 0.18] | 0.14 [0.09, 0.20] | 0.15 [0.11, 0.21] | 0.48 [0.3, 0.6] |
μlogitβ | 1.4 [1.1, 1.6] | 1.5 [1.3, 1.8] | 1.3 [1.0, 1.5] | 1.2 [1.0, 1.5] |
(Mean β) | 4.1 [3.0, 5.0] | 4.5 [3.7, 6.0] | 3.7 [2.7, 4.5] | 3.3 [2.7, 4.5] |
μlogitσ | -2.4 [-3.1, -1.8] | -2.1 [-2.6, -1.6] | -2.1 [-2.5, -1.7] | -2.0 [-2.7, -1.5] |
(Mean σ) | 0.08 [0.04, 0.14] | 0.11 [0.07, 0.17] | 0.11 [0.08, 0.15] | 0.12 [0.06, 0.18] |
μθ = mean θ | 1.4 [0.58, 2.3] | 1.6 [0.9, 2.4] | 1.8 [1.0, 2.9] | 1.6 [0.9, 2.3] |
Individual | n = 45 | n = 51 | n = 64 | n = 25 |
μlogitα | -2.1 [-3.1, -0.87] | -2.1 [-2.6, -1.6] | -1.3 [-2.1, -0.50] | -1.3 [-2.2, -0.4] |
(Mean α) | 0.11 [0.04, 0.30] | 0.11 [0.07, 0.17] | 0.21 [0.11, 0.38] | 0.2 [0.1, 0.4] |
μlogitβ | 0.42 [-0.43, 1.1] | 0.91 [0.63, 1.2] | 0.76 [0.42, 1.1] | 1.2 [0.9, 1.4] |
(Mean β) | 1.5 [0.65, 3.0] | 2.5 [1.9, 3.3] | 2.1 [1.5, 3.0] | 3.3 [2.5, 4.1] |
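The rows in parentheses can be read as back-transformed versions of the latent-scale means above them. The sketch below reproduces the group-condition values for the 1-risky-1-safe task. It assumes an inverse-logit transform for the learning rate and for the copying weight (the row preceding "(Mean σ)"). For β, the reported means match an exponential transform of the latent mean even though the row is labelled as a logit; that is an inference from the numbers, not something the table states.

```python
import math

def inv_logit(x):
    """Map a real-valued latent parameter to the unit interval."""
    return 1.0 / (1.0 + math.exp(-x))

# Posterior means from the table: group condition, 1-risky-1-safe task.
mu_latent_alpha = -2.2   # latent-scale learning rate
mu_latent_beta = 1.4     # latent-scale inverse temperature
mu_latent_sigma = -2.4   # latent-scale copying weight

mean_alpha = inv_logit(mu_latent_alpha)
mean_beta = math.exp(mu_latent_beta)    # assumed exponential link (see lead-in)
mean_sigma = inv_logit(mu_latent_sigma)

print(round(mean_alpha, 2), round(mean_beta, 1), round(mean_sigma, 2))
# prints: 0.1 4.1 0.08
```

Transforming a posterior mean is only an approximation to the posterior mean of the transformed parameter, so small rounding differences from the table are to be expected in other columns.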
Means and 95% Bayesian credible intervals (CIs; shown in square brackets) of the posterior estimates of the mixed logit model (generalised linear mixed model) that predicts the probability of choosing the risky alternative in the second half of the trials ().
All parameters satisfied the Gelman–Rubin criterion . All estimates are based on over 500 effective samples from the posterior. Coefficients whose CI is either below or above 0 are highlighted.
Task category | Positive risk premium (positive RP) | | | Negative risk premium (negative RP) |
---|---|---|---|---|
Task | 1-risky-1-safe | 1-risky-3-safe | 2-risky-2-safe | 1-risky-1-safe |
Sample size | n = 168 | n = 148 | n = 151 | n = 118 |
Intercept | -0.1 [-0.6, 0.3] | -1.1 [-1.5, -0.6] | -0.8 [-1.2, -0.4] | -3.5 [-4.4, -2.7] |
Susceptibility to the hot stove effect (α(β+1)) | -0.9 [-1.3, -0.4] | -1.0 [-1.5, -0.5] | -0.9 [-1.3, -0.6] | 0.6 [-0.1, 1.4] |
Group (no = 0/yes = 1) | 0.0 [-0.7, 0.7] | -0.2 [-1.0, 0.7] | 0.4 [-0.5, 1.2] | 3.8 [2.7, 4.9] |
Group × α(β+1) | 0.6 [0.0, 1.1] | 0.4 [0.0, 0.9] | 0.3 [-0.1, 0.7] | -1.1 [-1.9, -0.3] |
Group × copying weight σ | 1.4 [0.5, 2.3] | 1.9 [0.8, 3.0] | 2.2 [0.4, 4.0] | 3.8 [2.2, 5.3] |
Group × conformity exponent θ | -0.7 [-0.9, -0.5] | 0.2 [0.0, 0.5] | -0.3 [-0.5, -0.1] | -1.8 [-2.1, -1.5] |
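To show how the coefficients above combine, the sketch below turns the posterior means for the positive-RP 1-risky-1-safe task into a predicted probability of choosing the risky alternative. It assumes the fixed effects enter a single linear predictor on the logit scale and it ignores any random effects. The function name and the example subject values are hypothetical.

```python
import math

# Posterior means for the positive-RP 1-risky-1-safe task (from the table).
COEF = {
    "intercept": -0.1,
    "suscept": -0.9,         # alpha * (beta + 1)
    "group": 0.0,
    "group_x_suscept": 0.6,
    "group_x_sigma": 1.4,
    "group_x_theta": -0.7,
}

def predict_risky(suscept, group, sigma=0.0, theta=0.0, coef=COEF):
    """Fixed-effects-only prediction of P(risky) on the probability scale."""
    eta = coef["intercept"] + coef["suscept"] * suscept
    if group:
        eta += (coef["group"]
                + coef["group_x_suscept"] * suscept
                + coef["group_x_sigma"] * sigma
                + coef["group_x_theta"] * theta)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical subjects with the same susceptibility, alone versus in a group.
p_individual = predict_risky(suscept=2.0, group=False)
p_group = predict_risky(suscept=2.0, group=True, sigma=0.3, theta=1.0)
print(p_individual, p_group)
```

With these example values the linear predictor is -1.9 for the individual and -0.98 for the group member, so the predicted probability of choosing the risky alternative is higher in the group condition.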