Neuroscience

Policy shaping based on the learned preferences of others accounts for risky decision-making under social observation

HeeYoung Seon
Dongil Chung

Department of Biomedical Engineering, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea

https://doi.org/10.7554/eLife.102228.2

Open access

Figures and data

Experimental paradigm.
(a) The task comprised three phases: Solo, Learning, and Observed phases. (e) During the Solo phase, participants made a series of risky choices alone, which were used to measure their own risk preference. (a,b,f) During the Learning phase, participants were introduced to two random partners and asked to predict their choices. Unbeknownst to participants, one partner had risk-averse preferences (Risk-averse partner) and the other had risk-tolerant preferences (Risk-seeking partner). To help participants identify the partners, each partner was labeled with a letter (A or B) and color-coded (counterbalanced). On each trial, an agent identifier indicating the identity of the partner whose choice was to be predicted was presented at the center of the screen. (a,g) During the Observed phase, participants made the same type of gamble choices as in the Solo phase. Critically, at the beginning of some trials (‘Observer trials’), participants were informed that their choice on that trial would later be used in the Learning phase for one of the two assigned partners. On these Observer trials, the identity of the designated partner was presented as an avatar observing through an open door. ‘No observer trials’, on which individuals’ choices would not be shown to any partner, were signaled by a vacant open door. (c) To depict individuals’ prediction performance during the Learning phase, participants’ predictions were binned into 6-trial bins with 3-trial overlaps. Over repeated prediction trials with feedback, individuals successfully learned the two partners’ simulated risk preferences. Error bars indicate s.e.m. (d) At the end of the Learning phase, individuals answered a few questions about their impressions of each partner’s characteristics. Participants’ reports on the question ‘How risky was this partner?’ were consistent with their prediction behavior: they evaluated the Risk-seeking partner as significantly riskier than the Risk-averse partner (t(42) = –35.83, P = 4.10e-33). Grey dots represent each individual’s evaluation score; error bars indicate s.e.m.; ***P < 0.001.
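The overlapping binning in panel (c) can be sketched as a sliding window over trials. This is a minimal illustration, assuming each prediction is coded as 1 (correct) or 0 (incorrect) and that trailing trials that do not fill a complete bin are dropped; the function name `sliding_bins` is hypothetical.

```python
from typing import List, Sequence


def sliding_bins(correct: Sequence[int], bin_size: int = 6, overlap: int = 3) -> List[float]:
    """Proportion of correct predictions in overlapping bins.

    With bin_size=6 and overlap=3, consecutive bins start 3 trials apart
    (trials 1-6, 4-9, 7-12, ...). Trials that do not fill a complete
    final bin are dropped (an assumption, not stated in the caption).
    """
    step = bin_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than bin_size")
    return [
        sum(correct[start:start + bin_size]) / bin_size
        for start in range(0, len(correct) - bin_size + 1, step)
    ]
```

For 12 trials this yields three bins starting at trials 1, 4, and 7.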

Behavioral results.
(a) During the Learning phase, individuals were asked to predict partners’ choices. On the very first prediction trial, individuals had to make predictions without any information about the partners. Compared to the choices that individuals would have made based on their estimated risk preferences, individuals predicted that both partners were more likely to choose the risky option (Risk-averse partner: χ² = 3.33, P = 0.068; Risk-seeking partner: χ² = 7.37, P = 0.0066). (b) Individuals did not receive feedback on their predictions during the last 10 trials of the Learning phase. On these trials, participants still predicted that the partner whose true risk preference was set to be riskier would make riskier choices than the other partner, and vice versa (t(42) = –21.54, P = 2.56e-24). Each dot represents an individual participant. (c) In the Observed phase, individuals made gamble choices under three conditions: Risk-averse observer, No observer, and Risk-seeking observer trials. Relative to No observer trials, participants chose safe gambles more often when the Risk-averse partner was to observe their choices, and risky gambles more often when the Risk-seeking partner was to observe (repeated-measures ANOVA, F(2, 42) = 6.82, P = 0.0018; paired t-tests: Risk-averse vs. No observer: t(42) = –2.28, P = 0.028; Risk-seeking vs. No observer: t(42) = 1.84, P = 0.072; Risk-averse vs. Risk-seeking: t(42) = –3.28, P = 0.0021). Error bars indicate s.e.m.; †P < 0.1; *P < 0.05; **P < 0.01; ***P < 0.001. (d) Model parameters were estimated using the Social reliance model. Light-colored dots represent individuals. Filled green dots and empty markers indicate the mean and median of each parameter, respectively. (e) The estimated Social reliance parameters explained individuals’ choices during the Observed phase well. Specifically, individuals who relied most on the observers’ choice tendencies chose the risky option least often when the Risk-averse partner was observing, but most often when the Risk-seeking partner was observing. Each dot represents an individual participant, and solid lines indicate regression lines.
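The caption does not give the model equations, so the following is only a hypothetical sketch of how a social reliance weight could combine two decision probabilities: one computed from the participant’s own risk preference and one from the observing partner’s. It assumes gambles scored with a power utility, a logistic choice rule with inverse temperature `beta`, and a non-negative weight `psi` on the observer-based probability; these functional forms and all names are illustrative assumptions, not the paper’s specification.

```python
import math
from typing import Sequence, Tuple

# A gamble as (probability, amount) pairs; an illustrative assumption.
Gamble = Sequence[Tuple[float, float]]


def utility(gamble: Gamble, rho: float) -> float:
    # Assumed power utility: sum_i p_i * x_i ** rho (rho > 1 favors risk).
    return sum(p * x ** rho for p, x in gamble)


def p_choose_a(a: Gamble, b: Gamble, rho: float, beta: float) -> float:
    # Two-option softmax (logistic) on the utility difference.
    return 1.0 / (1.0 + math.exp(-beta * (utility(a, rho) - utility(b, rho))))


def social_reliance_p(a: Gamble, b: Gamble, rho_self: float, rho_obs: float,
                      beta: float, psi: float) -> float:
    # Hypothetical mixture: psi >= 0 weights the probability implied by
    # the observer's risk preference against the participant's own.
    p_self = p_choose_a(a, b, rho_self, beta)
    p_obs = p_choose_a(a, b, rho_obs, beta)
    return (p_self + psi * p_obs) / (1.0 + psi)
```

With `psi = 0` the choice depends only on the participant’s own preference; as `psi` grows, the choice probability moves toward what the observer’s preference would predict, which mirrors the qualitative pattern in panel (e).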

dmPFC and TPJ are recruited for valuation under social observation in addition to the regions tracking non-social subjective value.
(a) When participants viewed gamble options during the Solo phase, the trial-by-trial probability of the chosen option was positively encoded in the vmPFC (x = –3, y = 62, z = –13, k_E = 165, cluster-level P_{FWE, SVC} = 0.009) and vStr (x = 3, y = 14, z = –10, k_E = 40, cluster-level P_{FWE, SVC} = 0.015), and negatively encoded in the dACC (x = 12, y = 32, z = 29, k_E = 386, cluster-level P_{FWE, SVC} = 0.005). These brain regions were defined as regions of interest (ROIs) for decision-making signals in the Observed phase, in which the gamble choices were identical except for the social context. (b) To examine whether the same decision-tracking regions were recruited in the Observed phase, the trial-by-trial probability of the chosen option was calculated based on our proposed Social reliance model. As expected, the ROIs tracked this decision probability, which combines social and non-social components, during the Observed phase. Each dot represents an individual participant, and error bars indicate s.e.m.; *P < 0.05; **P < 0.01; ***P < 0.001. (c) Whole-brain analysis revealed that the trial-by-trial probability of the chosen option was positively encoded in the bilateral TPJ when individuals viewed gamble options during the Observed phase (left TPJ: x = –54, y = –37, z = 14, k_E = 104, P_unc. < 0.001; right TPJ: x = 63, y = –40, z = 17, k_E = 191, cluster-level P_{FWE, SVC} = 0.019). (d) An additional whole-brain analysis revealed that the dmPFC responded to the initial social cue (x = –3, y = 50, z = 14, k_E = 22, P_unc. < 0.005).
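The trial-by-trial regressor in panels (a) to (c) is the model probability of the option actually chosen. A minimal sketch, assuming the model supplies P(choose option A) on each trial; mean-centering the values before entering them as a parametric modulator is a common convention and an assumption here, and both function names are hypothetical.

```python
import numpy as np


def prob_chosen(p_choose_a, chose_a):
    """Probability the model assigned to the option the participant chose."""
    p = np.asarray(p_choose_a, dtype=float)
    return np.where(np.asarray(chose_a, dtype=bool), p, 1.0 - p)


def mean_centered(values):
    """Mean-center trial-wise values before use as a parametric modulator."""
    v = np.asarray(values, dtype=float)
    return v - v.mean()
```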

TPJ-dmPFC connectivity is associated with individuals’ social reliance.
To examine whether the dmPFC_PPI and the TPJ interacted while individuals made choices under social observation, we conducted psychophysiological interaction (PPI) analyses. (a,b) Functional connectivity between the dmPFC_contrast from fig. 3d and an adjacent, anatomically distinct region within the dmPFC (dmPFC_PPI) was positively associated with log-transformed Social reliance (peak at [x = 3, y = 50, z = 5], k_E = 74, cluster-level P_{FWE, SVC} = 0.011). Clusters are displayed at P_unc. < 0.005 (yellow) and P_unc. < 0.001 (red). (c) In an additional PPI analysis, connectivity between the TPJ from fig. 3c and the dmPFC_PPI from fig. 4a (a region sensitive to the initial social cue) was also positively associated with log-transformed Social reliance (r = 0.43, P = 0.018). (d) The positive relationship between individuals’ social reliance and TPJ-dmPFC_PPI connectivity was mediated by dmPFC_contrast-dmPFC_PPI connectivity (a: β = 0.15, P = 0.0013; b: β = 2.12, P = 1.42e-06; a × b: β = 0.32, P = 0.0016). Black and gray arrows indicate significant and non-significant associations between the components, respectively. The red arrow indicates a significant mediation effect; *P < 0.05; **P < 0.01; ***P < 0.001.
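Panel (d) reports path coefficients a and b and their product a × b. The sketch below shows a standard single-mediator analysis with ordinary least squares, plus a percentile bootstrap for the indirect effect. Which variable plays the role of X, M, and Y, and how significance was actually assessed in the paper, are assumptions here; the function names are hypothetical.

```python
import numpy as np


def simple_mediation(x, m, y):
    """Return (a, b, a*b) for a single-mediator model fitted by OLS.

    a: slope of M on X.  b: slope of Y on M, controlling for X.
    """
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    ones = np.ones_like(x)
    # Path a: M = i1 + a * X
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    # Path b: Y = i2 + c' * X + b * M
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a, b, a * b


def bootstrap_indirect_ci(x, m, y, n_boot=2000, seed=0, alpha=0.05):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    n = len(x)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample participants with replacement
        draws[i] = simple_mediation(x[idx], m[idx], y[idx])[2]
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

An interval that excludes zero is the usual criterion for a significant indirect effect under this approach.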

Task timeline for each task phase.
The task comprised three phases: (a) Solo, (b) Learning, and (c) Observed. (a) The Solo phase was conducted inside the MR scanner. Each trial started with a fixation screen, after which two gambles and the agent identifier were presented. Individuals were asked to choose one of the two gambles by pressing the corresponding button when a choice cue (a green circle at the center) appeared. At the end of each trial, individuals were shown a review of their choice. (b) The Learning phase was conducted outside the scanner. In the Learning phase, participants were asked to predict two partners’ choices. At the beginning of each trial, the identity of the partner whose choice they had to predict was presented in the form of an avatar. When the two gamble options were revealed, the matching partner identifier was presented alongside them. At the end of each prediction trial, individuals received feedback showing the gamble chosen by the partner in a green box. (c) The Observed phase was conducted inside the MR scanner. Its procedure was almost the same as that of the Solo phase, except that at the beginning of each trial, individuals were informed whether their choice would be observed by a partner. On trials on which their choices were observed (‘Observer trials’), the identity of the observing partner was presented as an avatar standing at an open door, and the partner identifier was displayed at the top of the screen when the two gamble options were revealed. On trials on which their choices were not observed (‘No observer trials’), an empty open door was displayed. On all trials, the face of the avatar selected by the participant appeared as the agent identifier at the center of the screen when the two gambles were revealed.

Formal model comparison and identifiability evaluation.
(a) The fit of the Social reliance model (Model 1) was compared with that of two alternative models: the Risk preference change model (Model 2) and the Other-conferred utility model (Model 3). The Social reliance model explained participants’ choices best (a smaller integrated Bayesian Information Criterion (iBIC) indicates a better fit). (b) To confirm that the tested models were identifiable from one another, we performed a model recovery analysis. For each simulated model, the best-fitting model at the group level was the true model used to generate the simulated data. The values in the confusion matrix represent the probabilities that the best-fitting model was the same as the simulated model for each case. (c) To confirm that each parameter of the Social reliance model could be identified separately from the other parameters, we performed a parameter recovery analysis. All parameters showed significant positive correlations between the simulated (True) and re-estimated values, indicating that all parameters could be recovered.
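The model-recovery confusion matrix in panel (b) can be computed by counting, for data simulated from each model, how often each candidate model achieves the lowest score. This sketch uses the ordinary BIC as the score; the iBIC used for the actual comparison additionally integrates the likelihood over group-level parameter distributions, which is not reproduced here. The function names and the nesting of `scores` are hypothetical.

```python
import math
from typing import List, Sequence


def bic(neg_log_lik: float, n_params: int, n_obs: int) -> float:
    """Standard Bayesian Information Criterion; lower is better."""
    return 2.0 * neg_log_lik + n_params * math.log(n_obs)


def confusion_matrix(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """scores[s][f][r]: score of fitted model f on simulation run r from model s.

    Returns rows of P(best-fitting model = f | simulated model = s).
    """
    matrix = []
    for per_fit in scores:
        n_fit = len(per_fit)
        n_runs = len(per_fit[0])
        wins = [0] * n_fit
        for r in range(n_runs):
            best = min(range(n_fit), key=lambda f: per_fit[f][r])
            wins[best] += 1
        matrix.append([w / n_runs for w in wins])
    return matrix
```

A matrix with values near 1 on the diagonal indicates that each generating model is recovered as the best fit to its own simulated data.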

Estimated model parameters in the Solo phase.
Individuals’ risk preference and inverse temperature (i.e., value sensitivity) parameters were estimated from their behavioral choices in the Solo phase. Note that individuals’ estimated risk preferences were between the two partners’ risk preferences, which were determined based on the behavioral choices of 74 non-overlapping participants.
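Risk preference and inverse temperature can be estimated from Solo-phase choices by maximum likelihood. The following is a minimal grid-search sketch. It assumes each gamble pays amount `x` with probability `p` (and nothing otherwise), a power utility `p * x**rho`, and a logistic choice rule. These forms and all names are assumptions for illustration, not the paper’s exact model or fitting procedure.

```python
import numpy as np


def choice_prob(p1, x1, p2, x2, rho, beta):
    """P(choose gamble 1) under an assumed power utility and logistic rule."""
    du = p1 * x1 ** rho - p2 * x2 ** rho
    return 1.0 / (1.0 + np.exp(-beta * du))


def fit_grid(p1, x1, p2, x2, chose1, rhos, betas):
    """Grid-search maximum-likelihood estimate of (rho, beta).

    Returns (rho_hat, beta_hat, log_likelihood) for the best grid point.
    """
    best = (None, None, -np.inf)
    for rho in rhos:
        for beta in betas:
            pr = np.clip(choice_prob(p1, x1, p2, x2, rho, beta), 1e-12, 1 - 1e-12)
            ll = np.sum(np.where(chose1, np.log(pr), np.log(1.0 - pr)))
            if ll > best[2]:
                best = (rho, beta, ll)
    return best
```

A grid search is used only for transparency; a gradient-based optimizer or a hierarchical fit would normally replace it.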

Relationship between risk preferences in the Solo and the Observed phases.
(a) Individuals’ risk preferences in the Solo and Observed phases were significantly correlated (Pearson’s correlation r = 0.73, P = 2.14e-08). (b) Individuals’ risk preferences in the Observed phase showed significantly larger variance than those in the Solo phase (χ² = 18.90, P = 1.45e-05). (c) The extent to which individuals’ risk preferences changed under social observation (Observer – No observer trials) was significantly correlated with their own risk preferences in the Solo phase (r = 0.33, P = 0.032), indicating that their preferences became more pronounced in the social context. (d) Participants were divided into two groups based on their partner evaluation responses to Q8 ("This partner has a similar preference to mine"). The group who evaluated the Risk-seeking partner as more similar to themselves than the Risk-averse partner (n = 15) became more risk-seeking under social observation, whereas the group who believed the Risk-averse partner to be more similar (n = 27) became more risk-averse. The change in risk preference differed significantly between the two groups (t(40) = 2.37, P = 0.023). Error bars indicate s.e.m.; *P < 0.05.

Individuals’ impressions of the partners and their impact on subsequent choices.
(a–c) Based on individuals’ partner evaluation responses to Q8 ("This partner has a similar preference to mine"), the partner rated as more similar was designated the ‘Similar partner’, and the other the ‘Dissimilar partner’. Compared to the Dissimilar partner, participants rated the Similar partner as (a) significantly more likable (Q1 in Table S9; t(41) = 5.21, P = 5.72e-06), (b) more trustworthy (Q2; t(41) = 3.02, P = 0.0043), and (c) as having better academic grades (Q6; t(41) = 2.33, P = 0.025). Error bars indicate s.e.m.; *P < 0.05; **P < 0.01; ***P < 0.001. (d) Individuals’ model-estimated social reliance parameter was positively correlated with the average trustworthiness rating for the two partners (Q2; Pearson’s correlation r = 0.40, P = 0.0080).

Neural substrates of trial-by-trial encoding of utility differences between the chosen and unchosen options.
In the Solo phase, the vmPFC positively tracked, and the dorsal anterior cingulate cortex (dACC) negatively tracked, individuals’ subjective valuation (vmPFC: peak at [x = 0, y = 65, z = –7]; dACC: peak at [x = –6, y = 26, z = 32]; Table S3).

Representation of one’s own utility and that of the observer along the ventral-to-dorsal axis of the mPFC.
In our Social reliance model, we assumed that the decision probability based on an individual’s own risk preferences, as well as that based on the observing partner’s risk preferences, both contribute to the individual’s final choice. Neural evidence that differentiates our model from the two alternative models (the Risk preference change model and the Other-conferred utility model) would involve demonstrating neural encoding of both the participant’s own utility and the observer’s utility. The utility differences between chosen and unchosen options from the two perspectives (self and observer) were highly correlated, which prevented us from including both as regressors in the same design matrix. Instead, we defined ROIs along the ventral-to-dorsal axis of the mPFC and examined whether each ROI more strongly reflected one’s own utility or that of the observer. Based on the meta-analysis by Clithero and Rangel (2014), we defined the most ventral mPFC ROI (ROI1) as a 10 mm-radius sphere centered at [x=-3, y=41, z=-7], a region previously associated with subjective value. From this ventral seed, we defined four additional spherical ROIs (10 mm radius each) at 12 mm intervals along the ventral-to-dorsal axis, resulting in five ROIs in total: ROI2 [x=-3, y=41, z=5], ROI3 [x=-3, y=41, z=17], ROI4 [x=-3, y=41, z=29], and ROI5 [x=-3, y=41, z=41]. The representation of one’s own utility (labelled ‘Own subjective value’) and that of the observer (‘Observer’s subjective value’) was organized along the ventral-to-dorsal axis of the mPFC. Specifically, utility signals from the participant’s own perspective (SV_{chosen, self} – SV_{unchosen, self}) were most prominently represented in the ventral-most ROIs (blue), whereas utility signals from the observer’s perspective (SV_{chosen, observer} – SV_{unchosen, observer}) were most strongly represented in the dorsal-most ROIs (orange).
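The five mPFC ROIs can be generated by shifting the ventral seed dorsally in 12 mm steps and drawing a 10 mm sphere around each center. This sketch assumes that the ventral-to-dorsal axis corresponds to the +z direction in MNI space, which is consistent with the listed coordinates. It also assumes a voxel belongs to a sphere when its center lies within the radius. `roi_centers` and `sphere_mask` are hypothetical names, and `affine` is a standard 4 × 4 voxel-to-mm matrix.

```python
import numpy as np


def roi_centers(seed=(-3, 41, -7), n_rois=5, step_mm=12):
    """Sphere centers stepped dorsally (+z) from the ventral seed."""
    x, y, z = seed
    return [(x, y, z + k * step_mm) for k in range(n_rois)]


def sphere_mask(shape, affine, center_mm, radius_mm=10.0):
    """Boolean mask of voxels whose centers lie within radius_mm of center_mm."""
    ii, jj, kk = np.indices(shape)
    vox = np.stack([ii, jj, kk, np.ones(shape)], axis=-1)  # homogeneous voxel coords
    mm = vox @ affine.T                                     # voxel -> mm coordinates
    dist = np.linalg.norm(mm[..., :3] - np.asarray(center_mm, dtype=float), axis=-1)
    return dist <= radius_mm
```

`roi_centers()` reproduces the ROI1 to ROI5 coordinates listed above.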

Neurosynth meta-maps for the term “decision.”
The coordinates, which represented the center of gravity of clusters within the “decision” meta-map, were used to select the ROIs for small volume correction of the results related to neural substrates of decision processes (vmPFC: [x = -3, y = 38, z = -10]; right vStr: [x = 12, y = 11, z = -7]; left vStr: [x = -12, y = 8, z = -7]; dACC: [x = 3, y = 26, z = 44]; left insula: [x = -30, y = 23, z = -1]). Specifically, these ROIs were used for the parametric modulation of Prob(chosen) in the Solo and Observed phases (Tables S2 and S4) and for the parametric modulation of ΔU in the Solo phase (Table S3). See ROI analyses in Methods for additional information on ROI definition.
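A cluster’s center of gravity in mm can be computed by averaging its voxel indices and mapping the result through the image affine. This sketch assumes the meta-map has already been thresholded and labeled into clusters (for example with `scipy.ndimage.label`). The optional intensity weighting, the function name, and the argument layout are illustrative assumptions.

```python
import numpy as np


def cluster_centers_mm(labels, affine, weights=None):
    """Center of gravity (in mm) of each labeled cluster.

    labels: integer array, 0 = background, 1..K = clusters.
    affine: 4x4 voxel-to-mm matrix of the image.
    weights: optional statistic map for an intensity-weighted center.
    """
    centers = {}
    for lab in np.unique(labels):
        if lab == 0:
            continue
        in_cluster = labels == lab
        idx = np.argwhere(in_cluster).astype(float)  # (n_vox, 3) voxel indices
        w = np.ones(len(idx)) if weights is None else np.asarray(weights, float)[in_cluster]
        vox = (idx * w[:, None]).sum(axis=0) / w.sum()
        mm = affine @ np.append(vox, 1.0)
        centers[int(lab)] = tuple(mm[:3])
    return centers
```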

Neurosynth meta-maps for non-valuation social processing.
(a) The coordinates, which represented the center of gravity of clusters within the “social” meta-map, were used to select the ROIs for small volume correction of the results related to neural substrates of decision processes under social observation (right TPJ: [x = 51, y = -52, z = 14]; left TPJ: [x = -51, y = -58, z = 17]; left superior temporal gyrus: [x = -45, y = 11, z = -28]; right superior temporal gyrus: [x = 51, y = 11, z = -19]; left amygdala: [x = -21, y = -4, z = -19]; right amygdala: [x = 24, y = -1, z = -19]; fusiform gyrus: [x = 45, y = -46, z = -22]; precuneus: [x = 0, y = -55, z = 32]; inferior parietal lobule: [x = -27, y = -64, z = 47]; supplementary motor area: [x = 0, y = 2, z = 53]). (b) A Neurosynth meta-map associated with the term “social” but not with the term “value” was created by excluding regions associated with “value” from the “social” meta-map. The coordinate representing the center of gravity of the dmPFC cluster within this map was used to select the ROI for small volume correction of the results related to sensitivity to social cues (dmPFC: [x = 0, y = 50, z = 14]). These ROIs were used for (a) the parametric modulation of Prob(chosen) (Table S4) and (b) PPI analyses (Table S6) in the Observed phase. See ROI analyses in Methods for additional information on ROI definition.
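The “social but not value” map in panel (b) amounts to a voxel-wise mask operation. This is a minimal sketch, assuming both Neurosynth maps are already thresholded so that non-significant voxels are 0 and only positive associations are kept. The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np


def social_not_value(social_map, value_map, threshold=0.0):
    """Voxels positively associated with "social" but not with "value".

    Assumes pre-thresholded maps in which non-significant voxels are 0.
    """
    social = np.asarray(social_map) > threshold
    value = np.asarray(value_map) > threshold
    return social & ~value
```

The resulting mask can then be labeled into clusters and each cluster’s center of gravity taken as the ROI coordinate.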

Left TPJ-dmPFC_PPI connectivity is associated with individuals’ social reliance.
(a) The functional connectivity between the left TPJ from fig. 3c and the dmPFC_PPI from fig. 4a (a region sensitive to the initial social cue) was positively correlated with log-transformed Social reliance (r = 0.45, P = 0.013). (b) The positive relationship between individuals’ social reliance and lTPJ-dmPFC_PPI connectivity was mediated by dmPFC_contrast-dmPFC_PPI connectivity (a: β = 0.15, P = 0.0013; b: β = 1.47, P = 5.88e-05; a × b: β = 0.22, P = 0.0008). Black and gray arrows indicate significant and non-significant associations between the components, respectively. The red arrow indicates a significant mediation effect; *P < 0.05; **P < 0.01; ***P < 0.001.

Demographic data
Values are expressed as means ± SD unless noted otherwise. Monthly income was converted from Korean won to U.S. dollars (1,000 Korean won is approximately equivalent to 1 U.S. dollar).

Regions showing significant associations with final decision probabilities P(chosen) during the Solo phase

Regions showing significant associations with subjective utility differences between the chosen and unchosen options during the Solo phase

Regions showing significant associations with final decision probabilities P(chosen) during the Observed phase

Regions showing significantly higher responses during Observer trials than during No observer trials at the time when the identity of the observer was revealed

Regions showing significant associations between dmPFC_contrast-dmPFC_PPI connectivity and individuals’ log-transformed Social reliance

First 30 sets of gambles used for the first phase (‘Solo’) and the third phase (‘Observed’) of the task

Second 30 sets of gambles used for the second phase (‘Learning’) of the task

Questions about individuals’ impressions of each predicted partner
