Policy shaping based on the learned preferences of others accounts for risky decision-making under social observation

  1. HeeYoung Seon
  2. Dongil Chung (corresponding author)
  1. Department of Biomedical Engineering, Ulsan National Institute of Science and Technology, Republic of Korea
5 figures and 2 additional files

Figures

Figure 1 with 2 supplements
Experimental paradigm.

(a) The task comprised three phases: Solo, Learning, and Observed phases. (e) During the Solo phase, participants were asked to make a series of risky choices alone, which were used to measure their own risk preference. (a, b, f) During the Learning phase, participants were introduced to two random partners and asked to predict their choices. Unbeknownst to participants, one partner had risk-averse preferences (Risk-averse partner) and the other had risk-tolerant preferences (Risk-seeking partner). To help partner identification, each partner was labeled with an alphabet letter (A or B) and color-coded (counterbalanced). On each trial, an agent identifier indicating the identity of the predicted partner was presented at the center of the screen. (a, g) During the Observed phase, participants were asked to make the same type of gamble choices as in the Solo phase. Critically, at the beginning of some trials (‘Observer trial’), participants were informed that their choice on the corresponding trial would later be used in the Learning phase for one of the two assigned partners. On these Observer trials, the identity of the designated partner was presented as an avatar observing through an open door. ‘No observer trials’, on which individuals’ choices would not be presented to any partner, were signaled with a vacant open door. (c) To depict individuals’ prediction performance during the Learning phase, participants’ prediction choices were binned with a bin size of six trials and three-trial overlaps. Over repeated prediction trials with feedback, individuals successfully learned the two partners’ simulated risk preferences. Error bars indicate s.e.m. (d) At the end of the Learning phase, individuals were asked to answer a few questions regarding their impression of each partner’s characteristics. Participants’ reports on the question ‘How risky was this partner?’ showed a pattern consistent with their prediction behavior, such that they evaluated the Risk-seeking partner to be significantly riskier than the Risk-averse partner (t(42)=-35.83, p=4.10e-33). Gray dots represent each individual’s evaluation score; error bars indicate s.e.m.; ***p<0.001.
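The binning described for panel (c) can be illustrated with a minimal sketch. It assumes that a bin size of six trials with three-trial overlaps corresponds to a window that advances three trials at a time; the function name and the example sequence are hypothetical and are not taken from the authors' analysis code.

```python
def sliding_bins(values, size=6, step=3):
    """Average `values` in windows of `size` trials that advance by `step` trials.

    With size=6 and step=3, consecutive windows share three trials.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    means = []
    for start in range(0, len(values) - size + 1, step):
        window = values[start:start + size]
        means.append(sum(window) / size)
    return means


# Hypothetical 12-trial sequence: 1 = predicted the risky option, 0 = safe option.
predictions = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
binned = sliding_bins(predictions)
```

Here the 12 trials yield three overlapping bins (trials 1-6, 4-9, and 7-12).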

Figure 1—figure supplement 1
Task timeline for each task phase.

The task comprised three phases: (a) Solo, (b) Learning, and (c) Observed phases. (a) The Solo phase was conducted inside the MR scanner. Each trial started with a fixation screen. After the fixation screen, two gambles and the agent identifier were presented. Individuals were asked to choose one of the two presented gambles by pressing the corresponding button when a choice cue (a green circle at the center) appeared. At the end of each trial, individuals were presented with a choice review. (b) The Learning phase was conducted outside the scanner. In the Learning phase, participants were asked to predict two partners’ choices. At the beginning of each trial, the identity of the partner whose choice they had to predict was presented in a form of an avatar. When two gamble options were revealed, the partner identifier that matches the identity of the partner was present together. At the end of each prediction trial, individuals were shown feedback displaying the gamble chosen by the partner in a green box. (c) The Observed phase was conducted inside the MR scanner. The procedure of the Observed phase was almost the same as that of the Solo phase. In addition, at the beginning of each trial, individuals were presented with the information whether (or not) the choice was being observed by a partner. On the trials where their choices were observed (‘Observer trial’), the identity of the observing partner was presented as an avatar standing at an open door. On these trials, the partner identifier was displayed at the top of the screen when the two gamble options were revealed. On the trials where their choices were made unobserved (‘No observer trial’), an empty open door was displayed. For all trials, the face of the avatar selected by the participant appeared as the agent identifier in the center of the screen when the two lotteries were revealed.

Figure 1—figure supplement 2
Estimated model parameters in the Solo phase.

Individuals’ risk preference and inverse temperature (i.e. value sensitivity) parameters were estimated from their behavioral choices in the Solo phase. Note that individuals’ estimated risk preferences were between the two partners’ risk preferences, which were determined based on the behavioral choices of 74 non-overlapping participants.
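As an illustration of how a risk preference parameter and an inverse temperature can jointly determine choice probabilities, the sketch below assumes an expected power-utility function and a logistic (softmax) choice rule. The caption does not state the utility form, so the power form, the function names, and the example gambles are assumptions made for illustration only.

```python
import math


def gamble_utility(outcomes, rho):
    """Expected power utility (assumed form); outcomes are (probability, payoff) pairs."""
    return sum(p * (v ** rho) for p, v in outcomes)


def prob_choose_first(u_first, u_second, beta):
    """Logistic (softmax) choice rule; beta is the inverse temperature."""
    return 1.0 / (1.0 + math.exp(-beta * (u_first - u_second)))


# Hypothetical gambles: a risky 50/50 gamble and a sure payoff.
risky = [(0.5, 40.0), (0.5, 0.0)]
safe = [(1.0, 18.0)]

# rho = 1 is risk-neutral; rho < 1 makes the sure payoff relatively more attractive.
p_risky_neutral = prob_choose_first(gamble_utility(risky, 1.0), gamble_utility(safe, 1.0), beta=1.0)
p_risky_averse = prob_choose_first(gamble_utility(risky, 0.5), gamble_utility(safe, 0.5), beta=1.0)
p_risky_flat = prob_choose_first(gamble_utility(risky, 0.5), gamble_utility(safe, 0.5), beta=0.0)
```

Under these assumptions, a lower risk preference parameter shifts choices toward the safe option, and an inverse temperature of zero makes choices insensitive to value (probability 0.5).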

Figure 2 with 3 supplements
Behavioral results.

(a) During the Learning phase, individuals were asked to predict partners’ choices. On the very first prediction trial, individuals had to make predictions without any information about partners. Compared to the choices that individuals would have made based on their estimated risk preferences, individuals predicted that the partners were more likely to choose the risky option (Wilcoxon signed-rank test W=66, p=0.0495). (b) Individuals did not receive feedback about their predictions on the last 10 trials in the Learning phase. On these trials, participants still predicted that the partner whose true risk preference was set to be riskier would make riskier choices than the other partner (t(42) = –21.54, p=2.56e-24). Each dot represents an individual participant. (c) In the Observed phase, individuals made gambling choices under three different conditions: Risk-averse observer, No observer, and Risk-seeking observer trials. Relative to No observer trials, participants made more safe gambles when the risk-averse partner was to observe their choices, and marginally more risky gambles when the risk-seeking partner was to observe (repeated-measures ANOVA, F(2, 42)=7.26, p=0.0012; paired t-tests: Risk-averse vs. No observer: t(42) = –2.28, p=0.028; Risk-seeking vs. No observer: t(42) = 1.84, p=0.072; Risk-averse vs. Risk-seeking: t(42) = –3.28, p=0.0021). Error bars indicate s.e.m.; †p<0.1; *p<0.05; **p<0.01; ***p<0.001. (d) Model parameters were estimated using the Social reliance model. Light-colored dots represent individuals. Filled green dots and empty markers indicate means and medians of each parameter, respectively. (e) Estimated Social reliance parameters explained individuals’ choices during the Observed phase well. Specifically, individuals who relied the most on the observers’ choice tendencies chose the risky option the least when the Risk-averse partner would be observing, but chose the risky option the most when the Risk-seeking partner would be observing. Each dot represents an individual participant, and solid lines indicate regression lines.
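The pattern in panel (e) can be made concrete with a toy calculation. The sketch assumes, purely for illustration, that social reliance acts as a weight between 0 and 1 that mixes the choice probability implied by one's own preference with the one implied by the observer's learned preference. The captions do not give the model's functional form, so this mixture, the function name, and the example probabilities are hypothetical.

```python
def observed_choice_prob(p_self, p_observer, reliance):
    """Mix own and observer-based probabilities of choosing the risky option.

    `reliance` in [0, 1] is a hypothetical weight on the observer's choice tendency.
    """
    if not 0.0 <= reliance <= 1.0:
        raise ValueError("reliance must lie in [0, 1]")
    return (1.0 - reliance) * p_self + reliance * p_observer


# Hypothetical probabilities of choosing the risky option on one trial.
p_self = 0.5           # from the participant's own risk preference
p_averse_obs = 0.2     # what a risk-averse observer would be expected to choose
p_seeking_obs = 0.8    # what a risk-seeking observer would be expected to choose

low_reliance = observed_choice_prob(p_self, p_averse_obs, reliance=0.1)
high_reliance = observed_choice_prob(p_self, p_averse_obs, reliance=0.6)
high_reliance_seeking = observed_choice_prob(p_self, p_seeking_obs, reliance=0.6)
```

Under this assumed form, a higher reliance weight pulls the risky-choice probability further toward the observer's tendency in both directions, which is the qualitative relationship the regression lines describe.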

Figure 2—figure supplement 1
Formal model comparison and identifiability evaluation.

(a) The model fit of the Social reliance model (Model 1) was compared with two alternative models: Risk preference change model (Model 2) and Other-conferred utility model (Model 3). The Social reliance model explained participants’ behavioral choices the best (smaller integrated Bayesian Information Criteria (iBIC) indicates better fit). (b) To confirm whether our tested models were identifiable from one another, we performed a model recovery analysis. The best-fitting model at the group level for each simulated model was the true model used to generate the simulated data. The values in the confusion matrix represent the probabilities that the best models are the same as the simulated model for each case. (c) To confirm whether we can identify each parameter from other parameters within the Social reliance model, we performed a parameter recovery analysis. All parameters in the model showed significant positive correlation between the simulated (True) and re-estimated parameters, indicating that all parameters could be recovered.
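A model recovery confusion matrix of the kind described in panel (b) can be tabulated as follows. This is a generic sketch rather than the authors' procedure: the model labels and the simulated outcomes are hypothetical placeholders, and the fitting step that would produce each best-fitting model is omitted.

```python
from collections import Counter


def confusion_matrix(pairs, models):
    """Return P(best-fitting model | simulated model) from (simulated, best) pairs."""
    counts = Counter(pairs)
    matrix = {}
    for true_model in models:
        total = sum(counts[(true_model, best)] for best in models)
        matrix[true_model] = {
            best: (counts[(true_model, best)] / total if total else 0.0)
            for best in models
        }
    return matrix


models = ["M1", "M2", "M3"]
# Hypothetical outcomes of four simulate-and-refit runs per generating model.
runs = (
    [("M1", "M1")] * 4
    + [("M2", "M2")] * 3 + [("M2", "M1")]
    + [("M3", "M3")] * 4
)
recovery = confusion_matrix(runs, models)
```

Each row sums to one, and values near one on the diagonal indicate that the generating model is usually recovered as the best-fitting model.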

Figure 2—figure supplement 2
Relationship between risk preferences in the Solo and the Observed phases.

(a) Individuals’ risk preferences in the Solo and the Observed phases were significantly correlated (Pearson’s correlation r=0.73, p=2.14e-08). (b) Individuals’ risk preferences in the Observed phase showed significantly larger variance than those in the Solo phase (χ2=18.90, p=1.45e-05). (c) The extent to which individuals’ risk preferences changed under social observation (Observer – No observer trials) was significantly correlated with their own risk preferences in the Solo phase (r=0.33, p=0.032), indicating that their preferences were more pronounced in the social context. (d) Participants were divided into two groups based on their partner evaluation responses to Q8 (‘This partner has a similar preference to mine’). The group who evaluated the risk-seeking partner to be more similar to themselves than the risk-averse partner (n=15) became more risk-seeking under social observation, while the group who believed the risk-averse partner to be more similar (n=27) became more risk-averse. The changes in risk preference differed significantly between the two groups (t(40)=2.37, p=0.023). Error bars indicate s.e.m.; *p<0.05.

Figure 2—figure supplement 3
Individuals’ impressions of the partners and their impact on subsequent choices.

(a–c) Based on individuals’ partner evaluation responses to Q8 (‘This partner has a similar preference to mine’), the partner who was rated to be more similar was designated as the ‘Similar partner’, while the other was designated as the ‘Dissimilar partner’. Compared to the Dissimilar partner, participants rated the Similar partner as (a) significantly more likable (Q1 in Supplementary file 1I; t(41) = 5.21, p=5.72e-06), (b) more trustworthy (Q2; t(41) = 3.02, p=0.0043), and (c) as having better academic grades (Q6; t(41) = 2.33, p=0.025). Error bars indicate s.e.m.; *p<0.05; **p<0.01; ***p<0.001. (d) Individuals’ model-estimated social reliance parameter was positively correlated with the average trustworthiness rating for the two partners (Q2; Pearson’s correlation r=0.40, p=0.0080).

Figure 3 with 4 supplements
dmPFC and TPJ are recruited for valuation under social observation in addition to the regions tracking non-social subjective value.

(a) When viewing gamble options during the Solo phase, trial-by-trial probability of the chosen option was positively encoded in the vmPFC (x = –3, y=62, z = –13, kE = 165, cluster-level PFWE, SVC = 0.009) and vStr (x=3, y=14, z = –10, kE = 40, cluster-level PFWE, SVC = 0.015), and negatively encoded in the dACC (x=12, y=32, z=29, kE = 386, cluster-level PFWE, SVC = 0.005). These brain regions were set as regions-of-interest (ROI) for the decision-making signals in the Observed phase where gambling choices were identical besides the social context. (b) To examine whether the same decision-tracking regions were recruited in the Observed phase, trial-by-trial probability of the chosen option was calculated based on our suggested Social reliance model. As expected, the same type of decision probability information comprising the social and non-social components was tracked in the ROIs during the Observed phase. Each dot represents an individual participant; *p<0.05; **p<0.01; ***p<0.001. (c) Whole-brain analysis revealed that trial-by-trial probability of the chosen option was positively encoded in the bilateral TPJ when individuals were viewing gamble options during the Observed phase (left TPJ: x = –54, y = –37, z=14, kE = 104, Punc. <0.001; right TPJ: x=63, y = –40, z=17, kE = 191, cluster-level PFWE, SVC = 0.019). (d) An additional whole-brain analysis revealed that the dmPFC responded to the initial social cue (x = –3, y=50, z=14, kE = 22, Punc. <0.005).

Figure 3—figure supplement 1
Neural substrates of trial-by-trial encoding of utility differences between the chosen and unchosen options.

In the Solo phase, the vmPFC positively and the dorsal anterior cingulate cortex negatively tracked individuals’ subjective valuation (vmPFC: peak at [x=0, y=65, z = –7]; dACC: peak at [x = –6, y=26, z=32]; Supplementary file 1B).

Figure 3—figure supplement 2
Representation of one’s own utility and that of the observer along the ventral-to-dorsal axis of the mPFC.

In our Social reliance model, we assumed that the decision probability based on an individual’s own risk preferences, as well as that based on the observing partner’s risk preferences, both contribute to the individual’s final choice. Neural evidence that differentiates our model from the two alternative models—the Risk preference change model and the Other-conferred utility model—would involve demonstrating neural encoding of both the participant’s own utility and the observer’s utility. The utility differences between chosen and unchosen options from the two perspectives—self and observer—were highly correlated, preventing us from including both as regressors in the same design matrix. Instead, we defined ROIs along the ventral-to-dorsal axis of the mPFC and examined whether each ROI more strongly reflected one’s own utility or that of the observer. Based on the meta-analysis by Clithero and Rangel, 2014, we defined the most ventral mPFC ROI (ROI1) as a 10 mm-radius sphere centered at coordinate [x=-3, y=41, z=-7], a region previously associated with subjective value. From this ventral seed, we defined four additional spherical ROIs (10 mm radius each) at 12 mm intervals along the ventral-to-dorsal axis, resulting in five ROIs in total: ROI2 [x=-3, y=41, z=5], ROI3 [x=-3, y=41, z=17], ROI4 [x=-3, y=41, z=29], ROI5 [x=-3, y=41, z=41]. The representation of one’s own utility (labeled as ‘Own subjective value’) and that of the observer (‘Observer’s subjective value’) was organized along the ventral-to-dorsal axis of the mPFC. Specifically, utility signals from the participant’s own perspective (SVchosen, self – SVunchosen, self) were most prominently represented in the ventral-most ROIs (blue), whereas utility signals from the observer’s perspective (SVchosen, observer – SVunchosen, observer) were most strongly represented in the dorsal-most ROIs (orange).
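The ROI centers listed above follow directly from the ventral seed and the 12 mm spacing, as the short sketch below reproduces. The function name and argument names are hypothetical; only the seed coordinate, the spacing, and the number of ROIs come from the caption.

```python
def roi_centers(seed=(-3, 41, -7), n_rois=5, spacing_mm=12):
    """Step from the ventral seed along the z-axis to obtain the ROI centers."""
    x, y, z = seed
    return [(x, y, z + k * spacing_mm) for k in range(n_rois)]


centers = roi_centers()
for index, center in enumerate(centers, start=1):
    print(f"ROI{index}: {center}")
```

Because each sphere has a 10 mm radius and the centers are 12 mm apart, neighboring ROIs partially overlap.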

Figure 3—figure supplement 3
Neurosynth meta-maps for the term ‘decision’.

The coordinates, which represented the center of gravity from clusters within the “decision” meta-map, were used to select the ROIs for small volume correction of the results related to neural substrates of decision processes (vmPFC: [x = –3, y=38, z = –10]; right vStr: [x=12, y=11, z = –7]; left vStr: [x = –12, y=8, z = –7]; dACC: [x=3, y=26, z=44]; left Insula [x = –30, y=23, z = –1]). Specifically, these ROIs were used for the parametric modulation of Prob(chosen) in the Solo and Observed phases (Supplementary file 1B, D) and for the parametric modulation of ∆U in the Solo phase (Supplementary file 1C). See ROI analyses in Materials and methods for additional information on ROI definition.

Figure 3—figure supplement 4
Neurosynth meta-maps for non-valuation social processing.

(a) The coordinates, which were the center of gravity from clusters within the ‘social’ meta-map, were used to select the ROIs for small volume correction of the results related to neural substrates of decision processes under social observation (right TPJ: [x=51, y = –52, z=14]; left TPJ: [x = –51, y = –58, z=17]; left superior temporal gyrus: [x = –45, y=11, z = –28]; right superior temporal gyrus: [x=51, y=11, z = –19]; left amygdala: [x = –21, y = –4, z = –19]; right amygdala: [x=24, y = –1, z = –19]; fusiform gyrus: [x=45, y = –46, z = –22]; precuneus: [x=0, y = –55, z=32]; inferior parietal lobule: [x = –27, y = –64, z=47]; supplementary motor area: [x=0, y=2, z=53]). (b) A Neurosynth meta-map, associated with the term ‘social’ but not with the term ‘value’, was created using Neurosynth meta-analyses. Regions associated with ‘value’ were excluded from the ‘social’ map. The coordinate, which was the center of gravity from the dmPFC cluster within the map, was used to select the ROI for small volume correction of the results related to sensitivity to social cues (dmPFC: [x=0, y=50, z=14]). These ROIs were used for (a) the parametric modulation of Prob(chosen) (Supplementary file 1D) and (b) PPI analyses (Supplementary file 1F) in the Observed phase. See ROI analyses in Materials and methods for additional information on ROI definition.

Figure 4 with 1 supplement
TPJ-dmPFC connectivity is associated with individuals’ social reliance.

To examine whether the dmPFCPPI and the TPJ interacted with each other while individuals made choices under social observation, we conducted psychophysiological interaction (PPI) analyses. (a, b) The functional connectivity between the dmPFCcontrast from Figure 3d and its adjacent, anatomically distinct region within the dmPFC region (dmPFCPPI) was positively associated with log-transformed Social reliance (peak at [x=3, y=50, z=5], kE = 74, cluster-level PFWE, SVC = 0.011). Clusters are displayed in yellow (Punc. <0.005) and red (Punc. <0.001). (c) In an additional PPI analysis, the functional connectivity between the TPJ from Figure 3c and the dmPFCPPI from panel a (a region sensitive to the initial social cue) was also positively associated with log-transformed Social reliance (r=0.43, p=0.018). (d) The positive relationship between individuals’ social reliance and TPJ-dmPFCPPI connectivity was mediated by the dmPFCcontrast-dmPFCPPI connectivity (a: β=0.15, p=0.0013, b: β=2.12, p=1.42e-06, a×b: β=0.32, p=0.0016). Black and gray arrows indicate significant and non-significant associations between the components, respectively. Red arrow indicates a significant mediation effect; *p<0.05; **p<0.01; ***p<0.001.
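The reported a×b values are consistent with the product-of-coefficients definition of an indirect effect, which can be checked from the path coefficients in panel (d) and in Figure 4—figure supplement 1b. This is an arithmetic check only; it does not reproduce the significance test, and small discrepancies are expected because the published coefficients are rounded.

```python
def indirect_effect(a, b):
    """Product-of-coefficients estimate of a mediated (indirect) effect."""
    return a * b


# Path coefficients as reported in the captions (rounded to two decimals there).
right_tpj = indirect_effect(0.15, 2.12)  # Figure 4d
left_tpj = indirect_effect(0.15, 1.47)   # Figure 4—figure supplement 1b

print(round(right_tpj, 2), round(left_tpj, 2))
```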

Figure 4—figure supplement 1
Left TPJ-dmPFCPPI connectivity is associated with individuals’ social reliance.

(a) The functional connectivity between the left TPJ from Figure 3c and the dmPFCPPI from Figure 4a (a region sensitive to the initial social cue) demonstrated a positive correlation with log-transformed Social reliance (r=0.45, p=0.013). (b) The positive relationship between individuals’ social reliance and lTPJ-dmPFCPPI connectivity was mediated by the dmPFCcontrast-dmPFCPPI connectivity (a: β=0.15, p=0.0013, b: β=1.47, p=5.88e-05, a×b: β=0.22, p=0.0008). Black and gray arrows indicate significant and non-significant associations between the components, respectively. Red arrow indicates a significant mediation effect; *p<0.05; **p<0.01; ***p<0.001.

Author response image 1
dACC and dlPFC are associated with the discrepancy between participants’ own choice tendencies and those of observing partners, as estimated based on prior beliefs about the partners’ risk preferences.

Cite this article

  1. HeeYoung Seon
  2. Dongil Chung
(2025)
Policy shaping based on the learned preferences of others accounts for risky decision-making under social observation
eLife 13:RP102228.
https://doi.org/10.7554/eLife.102228.3