The Self-Interest of Adolescents Overrules Cooperation in Social Dilemmas

Xiaoyan Wu; Hongyu Fu; Gökhan Aydogan; Chunliang Feng; Shaozheng Qin; Chao Liu

doi:10.7554/eLife.106840.1

Introduction

Cooperation among individuals facilitates the achievement of shared goals and enhances overall group efficiency (Fehr and Fischbacher, 2003; Nowak, 2006). For individuals, cooperation skills are key to success in society; this ability is not innate but gradually acquired through socialization (Warneken, 2018). Successful cooperation requires individuals to prioritize the common purpose over their personal interests, focusing on collective goals (Sachs et al., 2004). Experimental psychology has often used the Prisoner’s Dilemma Game (PDG; Axelrod and Hamilton (1981)) to study human cooperative behaviors. Extensive research has adapted the PDG into a repeated version to explore how people respond to interactive cooperation (Andreoni and Miller, 1993; Embrey et al., 2018), requiring individuals to adjust their responses dynamically to others and simulating real-life cooperation more closely (Axelrod and Hamilton, 1981). In such social dilemmas, individuals face a trade-off between immediate rewards from defection and long-term benefits from cooperation (Rilling et al., 2002). Decision-making in these situations is thought to engage mentalizing abilities, which are functions related to theory of mind that enable individuals to form expectations about others’ cooperative intentions (Rilling et al., 2004).

Cooperation is not an innate skill but is gradually cultivated and refined through socialization (House et al., 2020). Adolescence, in particular, marks a critical developmental phase in the transition to independent social roles (Steinberg, 2005). Studies using the PDG consistently show that adolescents cooperate less than adults (Belli et al., 2012; Nava et al., 2023; Taheri et al., 2018). This reduced cooperation is often attributed to an underdeveloped theory of mind, which may lead adolescents to underestimate others’ trustworthiness and willingness to cooperate (Gutiérrez-Roig et al., 2014; Fett et al., 2014).

However, some findings may not support this hypothesis. For example, a previous study found that adolescents’ lower cooperation, compared to adults, emerges only following a partner’s cooperation. Conversely, when the partner defected, adolescents’ cooperative behaviors resembled those of adults (Gutiérrez-Roig et al., 2014). Similarly, a Trust Game study (Fett et al., 2014) reported a comparable pattern: adolescents invested less (measured as trust behavior) than adults only when the partner was cooperative. When faced with a non-cooperative partner, both adolescents and adults consistently reduced their trust behaviors. These findings suggest that adolescents, like adults, are able to adjust their behavior in response to others’ actions. This selective reduction in adolescent cooperation implies that factors beyond deficits in mentalizing may be at play. Adolescents may prioritize maximizing immediate rewards over long-term reciprocity (Rilling et al., 2002). When confident that their partner will cooperate, defection may become the optimal strategy for maximizing self-interest. This hypothesis remains untested, but computational modeling could provide a valuable approach for examining the underlying mental processes behind these behavioral variations (Farrell and Lewandowsky, 2010).

This study aimed to investigate variations in cooperative behavior between adolescents and adults and to explore the mental processes underlying these differences using computational modeling. A total of 127 adolescents and 134 adults participated in the study, playing a repeated Prisoner’s Dilemma Game (rPDG) with a presumed human partner, whose behavior was predetermined by a computer program (see Figure 1a). The program ensured consistent conditions across age groups. To enhance realism, variability was introduced into the computer-simulated partner’s behavior. Based on the standard payoff matrix of the rPDG (Figure 1b), mutual cooperation maximizes collective interests, while defection maximizes self-interest from an individual perspective. Our focus was on how adolescents respond to their partner’s consistent cooperation and defection, aiming to identify potential mental variables contributing to adolescents’ lower cooperation.

Experiment setup and behavioral results.
(a) Partner’s Cooperation Probability: In the first half of the 120 trials, the partner cooperated 78% of the time; in the second half, cooperation alternated between 20% and 80%. (b) Payoff Matrix: Payoffs are 4 for mutual cooperation, 2 for mutual defection, 0 for cooperation when the other defects, and 6 for defecting when the other cooperates. (c) Trial Illustration: After a 0.5-second fixation, participants choose a shape (triangle for cooperation, square for defection) within 4 seconds and see both players’ choices for 1.5 seconds. (**d-e**) Post hoc Comparisons: Figures 1d and 1e show the partner’s previous cooperation on the x-axis and participants’ cooperation probability on the y-axis. Large red (adolescents) and blue (adults) dots indicate mean probabilities, with black error bars for standard error (SE). Gray dots represent mean probabilities across trials, and green error bars show predicted cooperation rates with SE. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.

We developed computational models to investigate the dynamic variables guiding cooperative decisions in the rPDG. The model explicitly incorporates both expectations of the partner’s cooperation and the intrinsic reward of reciprocity. A basic reinforcement learning (RL) algorithm was used to model participants’ dynamic expectations regarding the partner’s cooperation. Drawing on research on asymmetric reward learning in adolescents (Palminteri et al., 2016; Rosenbaum et al., 2022), we included asymmetric updating for positive (better-than-expected) and negative (worse-than-expected) outcomes. Participants’ expectations were modeled as a trial-by-trial dynamic variable, represented by parameter p. Following previous studies (Fareri et al., 2012, 2015), a non-monetary reward for cooperation, represented by parameter ω, reflects individual preferences for mutual cooperation. The term p×ω quantifies the intrinsic reward for reciprocity.
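The two trial-by-trial quantities described above can be illustrated with a minimal sketch. It assumes a standard delta-rule update in which the partner's observed choice is coded as 1 (cooperation) or 0 (defection); that coding and the function names `update_expectation` and `intrinsic_reward` are illustrative assumptions, not the authors' implementation.

```python
def update_expectation(p, partner_cooperated, alpha_pos, alpha_neg):
    """One asymmetric delta-rule update of the expectation p (assumed form)."""
    # Assumption: the partner's choice is coded 1 for cooperation, 0 for defection.
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p  # prediction error: positive means better than expected
    # Use the positive learning rate for better-than-expected outcomes,
    # and the negative learning rate for worse-than-expected outcomes.
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return p + alpha * delta


def intrinsic_reward(p, omega):
    """Intrinsic reward for reciprocity, p x omega, as defined in the text."""
    return p * omega


# Hypothetical parameter values, chosen only for illustration.
p_after_coop = update_expectation(0.5, True, alpha_pos=0.4, alpha_neg=0.2)
p_after_defect = update_expectation(0.5, False, alpha_pos=0.4, alpha_neg=0.2)
```

With these illustrative values, the expectation moves further after cooperation than after defection, because the positive learning rate is the larger of the two.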

We hypothesize that adolescents will exhibit lower overall cooperation compared to adults, consistent with previous studies (Belli et al., 2012; Nava et al., 2023; Taheri et al., 2018). Specifically, we expect adolescents to demonstrate reduced cooperation after their partner’s cooperation but not following defection (Gutiérrez-Roig et al., 2014; Fett et al., 2014). Furthermore, we aim to explore whether this lower conditional cooperation is driven by inappropriate expectations of their partners (represented by p), a reduced intrinsic reward for reciprocity (represented by p×ω), or a combination of both.

Results

Adolescents exhibit lower cooperation than adults following partner cooperation, but not defection

In each trial of the rPDG, as shown in Figure 1c, participants were presented with two choices: a triangle representing cooperation and a square representing defection. The choice associated with each symbol was randomly balanced across participants. They were informed that they were playing the game simultaneously with another partner. After making their decision, participants were shown both their own choice and that of their partner. We performed a generalized linear mixed model (GLMM1) analysis (see Appendix Table 1) to examine the effects of each independent variable and their interactions on the decision to cooperate or defect.

Consistent with most previous studies (Belli et al., 2012; Nava et al., 2023; Taheri et al., 2018), adolescents cooperated less than adults (b of group = 0.79, 95% CI = [0.311, 1.270], p = 0.001; Figure 1d). Following the interaction of group × previous trial × partner’s choice (b of interaction = 0.24, 95% CI = [0.126, 0.361], p < 0.001), we found that adolescents showed significantly less cooperation compared to adults only after the partner’s cooperation (t(259)_group = −2.84, p = 0.005). However, such a difference was not significant after the partner’s defection (t(259)_group = −1.86, p = 0.064; Figure 1d). We also found that adults increased cooperation in response to their partners’ consistent cooperation (the partner cooperated once vs. the partner cooperated thrice: t(266)_adults = −2.50, p = 0.013), but this pattern was not observed in adolescents (t(252)_adolescents = −1.18, p = 0.239, see Figure 1d).

Nevertheless, both groups significantly decreased cooperation in response to the partner’s continual defection (the partner defected once vs. the partner defected twice: t(266)_adults = 4.46, p < 0.001, t(252)_adolescents = 2.78, p = 0.006; the partner defected once vs. the partner defected thrice: t(266)_adults = 5.56, p < 0.001, t(252)_adolescents = 4.32, p < 0.001; Figure 1e).

Asymmetric RL learning in the social reward model best explains cooperative decisions of adolescents and adults

Computational modeling was used to simulate participants’ mental processes during the rPDG. Starting with a baseline model that assumed decisions were made through random selection (Model 1), we compared several alternatives: a win-stay and loss-shift model (Model 2), a reward learning model (Model 3), an inequality aversion model (Model 4), and a social reward model (Model 5). Among these, the social reward model outperformed the others. We then compared a basic RL algorithm (Model 6), an influence learning rule (Model 7), and an asymmetric RL learning rule (Model 8) within the social reward framework. The asymmetric RL learning model best explained the cooperative decisions of both adolescents and adults (see Figure 2a for adolescents and Figure 2b for adults; methods for details). Model recovery analysis indicated that the asymmetric RL learning within the social reward model was distinguishable from the other models (Figure 2c) and accurately captured the behaviors of both adolescents (Figure 2d) and adults (Figure 2e). For further validation of the best-fitting model, see Figure 2—figure Supplement 1 for model predictions, Figure 2—figure Supplement 2 for the distributions of free parameters, and Figure 2—figure Supplement 3 for parameter recovery.

Computational modeling.
(**a-b**) Model comparisons for adolescents and adults, respectively. The y-axis represents model fitness based on the Akaike Information Criterion with a correction for sample size (AICc) (Hurvich and Tsai, 1989). For each participant, the model with the lowest AICc served as a reference to compute ΔAICc by subtracting it from the AICc of other models (ΔAICc = AICc_x - AICc_lowest). A lower ΔAICc indicates a better model fit. Protected exceedance probability (PEP) is a group-level measure that assesses the likelihood of each model’s superiority over the others (Rigoux et al., 2014). (c) Model recovery analysis. Each model was used to generate 100 synthetic datasets, and for each dataset, model fitting and comparison were performed. Each column corresponds to one generative model, and each row corresponds to one fitting model. The color in each cell indicates the probability that the synthetic datasets generated by the model in the column were best fit by the model in the row, with a darker color denoting a higher probability. (**d-e**) Model prediction. Sample illustration of the best-fitting model prediction versus data for adolescents and adults, respectively.
Figure 2—figure supplement 1. Model Prediction. This figure compares the actual data and predictions of best-fitting model for adults (a) and adolescents (b). The x-axis represents the trial number, while the y-axis represents the mean cooperation probability of all participants. The shaded area indicates the 95% confidence interval.
Figure 2—figure supplement 2. Distributions of Estimated Parameters from the Best-Fitting Model. Each panel displays one parameter. The histograms and their kernel fits are represented by color bars and curves, respectively. Red indicates participants in the adolescent sample, and blue denotes those in the adult sample. Parameters have been transformed into a log scale for enhanced visualization.
Figure 2—figure supplement 3. Parameter Recovery for the Best-Fitting Model. Each panel represents one parameter. Each dot corresponds to one virtual participant. The value of r indicates Pearson’s correlation coefficient between the true values (estimated from the participants) and the recovered parameters.
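The ΔAICc measure described in the Figure 2 caption can be sketched as follows. The AICc formula is the standard small-sample correction; the function names and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Akaike Information Criterion with the small-sample correction."""
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction


def delta_aicc(aicc_by_model):
    """Subtract the lowest AICc from every model's AICc for one participant."""
    lowest = min(aicc_by_model.values())
    return {name: value - lowest for name, value in aicc_by_model.items()}


# Hypothetical fits for a single participant with 120 trials.
scores = {
    "baseline": aicc(log_likelihood=-80.0, n_params=1, n_obs=120),
    "asymmetric_rl": aicc(log_likelihood=-60.0, n_params=4, n_obs=120),
}
deltas = delta_aicc(scores)  # the best model for this participant gets 0
```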

Distinct learning rates and social preferences between adolescents and adults in repeated cooperation

Although the asymmetric RL learning in the social reward model best explained the behaviors of both adolescents and adults, the two groups exhibited distinct learning dynamics and social preferences for cooperation. Specifically, adolescents applied a higher positive learning rate (α+, t(259) = 2.95, p = 0.003, Figure 3a) to update better-than-expected prediction errors, and a lower negative learning rate (α−, t(259) = −2.62, p = 0.009, Figure 3b) for worse-than-expected prediction errors.

Learning Rates and Social Preferences.
(**a-d**) Comparison between adolescents and adults for positive learning rate (α+), negative learning rate (α−), social preference (ω), and inverse temperature (β), respectively. (**e-h**) Correlation between age and positive learning rate, negative learning rate, social preference, and inverse temperature, respectively. Notes: *p < 0.05; **p < 0.01.

Additionally, a positive correlation was found between participants’ age and the negative learning rate (α−, r = 0.21, p < 0.001, Figure 3f), while no significant correlation was observed with the positive learning rate (α+, r = −0.09, p = 0.16, Figure 3e).

Furthermore, adolescents displayed a weaker preference for cooperation compared to adults (ω, t(259) = −3.03, p = 0.003, Figure 3c), and their social preferences for cooperation increased with age (r = 0.20, p < 0.001, Figure 3g). Additionally, adolescents exhibited a higher inverse temperature parameter compared to adults, indicating they were more sensitive to utility differences between cooperation and defection (β, t(259) = 2.14, p = 0.034, Figure 3d). This sensitivity decreased with age, as shown by a negative correlation with age (r = −0.17, p = 0.007, Figure 3h).

Adolescents compared to adults show no inappropriate expectations but less intrinsic reward for reciprocity

To further explore what underlies the observed decrease in cooperation among adolescents, we focused on two hidden trial-by-trial updating variables: the partner cooperation expectation (p) and the intrinsic reward for reciprocity (p×ω). Additionally, participants’ self-reported cooperativeness scores, assessed every 15 trials, provided further insight into their subjective estimation of the partner’s willingness to cooperate.

Partner cooperation expectation

We performed a linear mixed model (LMM1, Appendix Table 2) on partner cooperation expectation to assess the effects of each independent variable and their interactions. Following the interaction of group × previous trial × partner’s choice (b of interaction = 0.03, 95% CI = [0.022, 0.038], p < 0.001), we found that the partner cooperation expectation for both adolescents and adults increased with the partner’s consistent cooperation (the partner cooperated once vs. the partner cooperated twice: t(252)_adolescents = −2.81, p = 0.005, t(266)_adults = −4.45, p < 0.001; the partner cooperated once vs. the partner cooperated thrice: t(252)_adolescents = −3.69, p < 0.001, t(266)_adults = −6.23, p < 0.001; Figure 4a). Additionally, expectations decreased with the partner’s consistent defection (the partner defected once vs. the partner defected twice: t(252)_adolescents = 4.44, p < 0.001, t(266)_adults = 7.02, p < 0.001; the partner defected once vs. the partner defected thrice: t(252)_adolescents = 5.60, p < 0.001, t(266)_adults = 8.40, p < 0.001; Figure 4b). These results showed that adolescents and adults held very similar expectations toward their partner’s cooperation, with no significant difference between the groups (b of group = -0.04, 95% CI = [-0.102, 0.021], p = 0.198).

Analysis of hidden variables from the best-fitting model.
(**a-b**) Post-hoc Comparison of LMM1: Interaction of group × previous trial × partner’s choice. The y-axis shows participants’ expectations of partner cooperation probability (p) from the best-fitting model. (**c-d**) Self-Reported Cooperativeness: Normalized scores on partner cooperativeness for two orders of partner cooperation probability, with adolescents (orange-red line) and adults (blue line). Scores were assessed on a 0-9 scale and normalized to 0-1. The dotted line indicates the presumed partner’s cooperation probability, with mean values and standard errors shown. (**e-f**) Post-hoc Comparison of LMM3: Interaction of group × previous trial × partner’s choice. The y-axis shows participants’ intrinsic reward for reciprocity (p × ω) from the best-fitting model. The x-axis represents the consistency of the partner’s actions in previous trials (t₁, t_1,2, t_1,2,3). Colored dots with error bars indicate mean values with standard errors for adolescents (orange-red) and adults (blue), while small gray dots represent individual participants. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.

Moreover, we performed an LMM2 analysis (Appendix Table 3) on participants’ self-reported scores regarding the cooperativeness of their partners to examine the effects of each independent variable and their interactions. In line with the expectation of partner cooperation, we observed minimal discrepancy in the self-reported scores on partner cooperativeness between adolescents and adults. Neither the main effect of group nor the interaction achieved statistical significance (b of group = 0.17, 95% CI = [-0.51, 0.85], p = 0.616; b of interaction = 0.38, 95% CI = [-0.052, 0.812], p = 0.085; Figure 4c-d). These results provide evidence that adolescents did not differ from adults in assessing their partner’s cooperation.

Intrinsic reward for reciprocity

We performed an LMM3 analysis (Appendix Table 4) on the intrinsic reward for reciprocity to assess the effects of each independent variable and their interactions. We found that adolescents appreciated reciprocity less than adults did (b of group = 0.52, 95% CI = [0.224, 0.816], p < 0.001).

Following the interaction of group × previous trial × partner’s choice (b of interaction = 0.37, 95% CI = [0.318, 0.424], p < 0.001), unlike adults, adolescents did not increase their intrinsic reward for reciprocity in response to the partner’s consistent cooperation (the partner cooperated once vs. the partner cooperated twice: t(252)_adolescents = −0.96, p = 0.336, t(266)_adults = −2.13, p = 0.034; the partner cooperated once vs. the partner cooperated thrice: t(252)_adolescents = −1.38, p = 0.170, t(266)_adults = −3.08, p = 0.002; Figure 4e).

However, in response to consistent defection by the partner, both adolescents and adults exhibited decreased intrinsic reward for reciprocity (the partner defected once vs. the partner defected twice: t(252)_adolescents = 1.99, p = 0.047, t(266)_adults = −2.71, p = 0.007; the partner defected once vs. the partner defected thrice: t(252)_adolescents = 2.64, p = 0.009, t(266)_adults = 3.37, p < 0.001; Figure 4f).

In brief, adolescents did not deviate in forming expectations about their partner’s willingness to cooperate, but they showed lower social preferences for cooperation and a reduced intrinsic reward for reciprocity. Specifically, compared to adults, adolescents displayed less intrinsic reward for reciprocity and did not increase it in response to consistent cooperation, though their reactions to consistent defection were similar to those of adults.

Discussion

Cooperation lies at the heart of societal functioning, facilitating the achievement of shared goals and fostering social harmony. In this study, we sought to deepen our understanding of the developmental aspects of cooperation by examining differences in cooperative behavior between adolescents and adults in the context of the rPDG. Our findings shed light on the cognitive and affective processes underlying these behaviors, offering insights into the mechanisms driving cooperative decision-making across different developmental stages.

Consistent with many previous studies (Fett et al., 2014; Gutiérrez-Roig et al., 2014; Westhoff et al., 2020), our results showed that adolescents exhibited lower levels of cooperation compared to adults. However, this lower cooperation was not observed throughout the task; it occurred selectively after their partner had cooperated in the previous rounds. Moreover, adults increased cooperation in response to their partner’s consistent cooperation, whereas this pattern was not observed in adolescents. Both age groups nonetheless decreased cooperation in response to consistent partner defection, indicating shared responses to non-cooperative behavior.

Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in mentalizing. In both the expectation of the partner’s cooperation estimated from computational modeling and the self-reported measurements, adolescents did not differ significantly from adults. However, adolescents exhibited a weaker preference for (conditional) cooperation compared to adults, resulting in a reduced intrinsic reward for reciprocity. These findings align with prior research (Crone and Dahl, 2012; Do et al., 2017; Pfeifer and Berkman, 2018; Van Den Bos et al., 2010, 2011), suggesting that adolescents prioritize immediate gains over long-term benefits, potentially undermining the benefits of cooperation. This tendency aligns with earlier findings that adolescents exhibit heightened sensitivity to reward feedback (Blakemore and Mills, 2014; Crone and Dahl, 2012; Davis et al., 2023; Do et al., 2017; Van Den Bos et al., 2011; Van Duijvenvoorde et al., 2015), which may influence their decision-making in cooperative interactions.

It has been acknowledged that individuals weight positive and negative outcomes differently when updating in social cooperation; such an asymmetric learning process can be modeled by a basic RL algorithm with separate positive and negative learning rates (Garrett and Daw, 2020; Rosenbaum et al., 2022). In this study, we found that an asymmetric RL algorithm in a social reward model provided the best fit to the behaviors of both adolescents and adults. Adolescents demonstrated a larger positive learning rate but a smaller negative learning rate compared to adults, suggesting heightened sensitivity to positive feedback from cooperative behavior and reduced sensitivity to negative feedback from defection. This asymmetric learning pattern may drive adolescents to focus more on self-beneficial social signals, maximizing immediate gains in response to cooperative behavior. These findings align with Van Den Bos et al. (2011), who highlighted adolescents’ heightened sensitivity to immediate rewards and less stable trusting behavior compared to adults.

Adolescence is characterized by increased self-discovery and egocentrism (Pfeifer and Berkman, 2018; Ting et al., 2019), leading individuals to prioritize their immediate gains over long-term benefits. Consequently, adolescents may be more inclined towards selfish motives in long-term social interactions (Pfeifer and Berkman, 2018). However, it is essential to recognize that these behaviors are not static, and as adolescents mature into adults, their socio-emotional abilities evolve (Worthman and Trang, 2018), enabling a more balanced consideration of short-term and long-term outcomes (Crone and Dahl, 2012).

One limitation of this study should be noted. We used artificial opponents with pre-determined cooperation patterns to better control the stimuli. While this approach allowed us to isolate specific motivations for cooperation (financial vs. social rewards), it is possible that participants might behave differently in more natural settings. Our study serves as an initial step in understanding cooperation motivations in adolescents and adults, and future research could explore these behaviors in more real-world contexts.

In conclusion, our study contributes to an understanding of the developmental aspects of cooperation and the cognitive-affective processes underlying cooperative decision-making. By examining differences in cooperative behavior between adolescents and adults in the rPDG and integrating computational modeling, we offer valuable insights into the mechanisms driving cooperative behavior across different developmental stages. These findings have implications for promoting prosocial behaviors and designing effective socialization interventions during adolescence. By highlighting the importance of reciprocity, our findings offer insights into the developmental trajectory of cooperation from adolescence to adulthood and provide practical implications for enhancing cooperative interactions in real-world contexts.

Methods and Materials

Participants

A total of 261 participants took part in the current study, consisting of 127 adolescents (n = 127, aged 14-17 years, mean ± SD: 16.13 ± 0.63, 44 females) and 134 adults (n = 134, aged 18-30 years, mean ± SD: 21.63 ± 2.88, 79 females). No a priori power analysis was conducted. The sample size was determined based on previous studies investigating cooperation behaviors in adolescents and adults. Adolescents were recruited from the local high school with the consent of their legal guardians, while adults were recruited through advertisements on the university campus forum. Before the experiment, written informed consent was obtained from all participants and their legal guardians (for adolescents). Participants were included if they had normal or corrected-to-normal vision and no history of psychiatric or neurological illness. Exclusion criteria included any self-reported diagnosed psychiatric or neurological disorder. This study was approved by the ethics committee of Beijing Normal University. No participants dropped out of the experiment, and all data were included in the statistical analysis. Participants received compensation based on their performance in the tasks (see rPDG for details).

Experimental procedure

All participants completed the experiments in a laboratory setting with multiple participants present. They were informed that they were participating in a multiple-round interaction game with an anonymous partner. In the instructions section, we referred to the interaction game as the rPDG and refrained from using the terms ‘cooperate’ and ‘defect’ to minimize the influence of social expectations and biases and to promote comparability across studies. Participants were led to believe that their partner was also playing the game at the same time. Compensation for their participation was based on the tokens earned during the game, with 10% of the rounds randomly selected for payment calculation at an exchange rate of 1 token to 1 yuan. Participants were explicitly informed in advance about this incentive mechanism. Prior to the formal experiment, participants completed a quiz and several practice rounds to ensure a full understanding of the task. Following the experiment, participants completed a Social Value Orientation (SVO) task to assess their prosocial personality traits. The entire procedure lasted approximately 60 minutes. Blinding was not applicable in this study, as all participants interacted with a computer-controlled partner. To minimize potential bias, the partner’s behavior patterns and stimulus meanings were randomized across participants. A detailed protocol is available upon request.

The repeated prisoner’s dilemma game

Similar to the classic version of the PDG, the rPDG involves two players. Consistent with the standard payoff matrix of the PDG (Figure 1b), when both players cooperated (defected), they each received 4 tokens (2 tokens). If the players made different decisions, the one who cooperated received 0 tokens, while the one who defected received 6 tokens. In reality, however, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. To enhance the realism of the partner’s responses, we manipulated the variability in the partner’s decision making. The partner’s cooperation probability remained stable at 78% for half of the trials. In the other half of the trials, the partner’s cooperation probability varied, switching between 20%, 80%, and 20% for each set of 20 trials. The order of these two sessions was counterbalanced between participants. During the rPDG, participants were asked every 15 rounds to evaluate their partner’s cooperativeness using a 10-point scale, where 0 represents “no cooperation” and 9 represents “very high cooperation.” The question posed to the participants was “How cooperative do you think your partner is at the moment?”
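The payoff matrix and the partner's cooperation schedule described above can be written out as a short sketch. The 60-trial length of each session follows from the 120 trials reported in Figure 1; drawing each partner choice independently from the scheduled probability is an assumption, as are the names `PAYOFF`, `cooperation_schedule`, and `simulate_partner`.

```python
import random

# Participant payoff in tokens, keyed by (participant choice, partner choice).
PAYOFF = {("C", "C"): 4, ("C", "D"): 0, ("D", "C"): 6, ("D", "D"): 2}


def cooperation_schedule(stable_first=True):
    """Partner cooperation probability for each of the 120 trials."""
    stable = [0.78] * 60                                  # stable session
    variable = [0.20] * 20 + [0.80] * 20 + [0.20] * 20    # varying session
    return stable + variable if stable_first else variable + stable


def simulate_partner(schedule, rng):
    """Draw partner choices trial by trial (assumed independent draws)."""
    return ["C" if rng.random() < prob else "D" for prob in schedule]


schedule = cooperation_schedule(stable_first=True)
partner_choices = simulate_partner(schedule, random.Random(0))
```

Passing `stable_first=False` gives the counterbalanced session order.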

Behavioral data analysis

All statistical analyses were conducted in MATLAB R2023a (RRID:SCR_001622). GLMM was implemented using the ‘fitglme’ function in MATLAB. Interaction contrasts were performed for significant interactions and, when higher-order interactions were not significant, pairwise or sequential contrasts were performed for significant main effects.

GLMM1: Participants’ choices (cooperate or defect) across all trials are the dependent variable; fixed effects include an intercept, the main effects of group (adolescents or adults), previous trial (last one trial, last two trials and last three trials), partner’s choice (cooperation or defection), and all possible interaction effects of the independent variables. Gender (male and female) and timing (trial number from 1 to 120) were also included as control variables. Random effects include correlated random slopes of group, previous trial, partner’s choice, gender, trial number and a random intercept for participants. Group, previous trial, partner’s choice, and gender are categorical variables. The trial number is a continuous variable. See Appendix Table 1 for the statistical results of GLMM1.

LMM2: Participants’ self-reported score on partner’s cooperativeness is the dependent variable; fixed effects include an intercept, the main effects of group (adolescents or adults), the order of the sessions (the partner’s cooperation probability was fixed at 78% and then shifted to 20%, 80%, and 20% for each set of 20 trials, or vice versa), and the interaction of group × order. Gender (male and female) and timing (trial number from 1 to 120) were also included as control variables. Random effects include correlated random slopes of group, gender, timing and a random intercept for participants. The group, previous trial, partner’s choice, and gender are categorical variables. The trial number is a continuous variable. See Appendix Table 3 for the statistical results of LMM2.

Behavioral modelling

We systematically developed models based on various assumptions regarding participants’ decision-making processes in the rPDG.

Model 1: The baseline model

We modeled each participant’s choices in each trial (i.e., whether to cooperate) as outcomes from a Bernoulli distribution, where the cooperation probability is controlled by a parameter, b ∈ [0, 1]. For each participant, the probabilities of cooperation (q(cooperation)) and defection (q(defection)) are denoted as follows:

$$q(\text{cooperation}) = b, \qquad q(\text{defection}) = 1 - b$$

Model 2: Win-stay & loss-shift model

The model assumes that individuals adopt a tit-for-tat strategy in decision-making. Participants are likely to repeat their previous choice with a probability if they won, and if they lost in the last trial, where ε represents the choice variability. Winning and losing are defined based on the payoff outcomes of 4 or 6 (win) and 0 or 2 (loss), respectively.

where q_t denotes the probability of repeating the previous choice and 1 − q_t denotes the probability of shifting to another option at trial t.
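The win-stay/loss-shift rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the exact stay probabilities are not shown in the text above, so the sketch assumes the common noisy parameterization in which the previous choice is repeated with probability 1 − ε/2 after a win and ε/2 after a loss. All function names are hypothetical.

```python
def is_win(payoff):
    """Payoffs of 4 or 6 count as a win; payoffs of 0 or 2 count as a loss."""
    return payoff in (4, 6)


def repeat_probability(prev_payoff, epsilon):
    """Probability q_t of repeating the previous choice.

    ASSUMPTION: the stay probability is 1 - epsilon/2 after a win and
    epsilon/2 after a loss; the source text does not show the exact form.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    return 1.0 - epsilon / 2.0 if is_win(prev_payoff) else epsilon / 2.0


def cooperation_probability(prev_choice, prev_payoff, epsilon):
    """Turn q_t into a probability of cooperating (1 = cooperate, 0 = defect)."""
    q_repeat = repeat_probability(prev_payoff, epsilon)
    return q_repeat if prev_choice == 1 else 1.0 - q_repeat
```

With ε = 0 the rule is deterministic (always stay after a win, always shift after a loss); with ε = 1 choices are at chance regardless of the previous outcome.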

Model 3: Reward learning model

This model assumes that participants make decisions by comparing the values of cooperation and defection. The values of the two options are updated using an RL algorithm:

where V_c (V_d) denotes the value of cooperation (defection) option. R represents the reward feedback, which can be 0, 2, 4, or 6, depending on the payoff matrix. represents the reward prediction error for the cooperation (defection) option, and a is the learning rate. Participants’ choices are modeled by a softmax function:

where q_t denotes the participant’s probability of cooperation and β denotes the inverse temperature. The higher the value of the inverse temperature, the greater the sensitivity to value differences between options; the lower the value, the more random the choices.
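A minimal Python sketch of this kind of value learning and softmax choice is given below. It assumes a standard delta-rule update applied to the chosen option only; the function and argument names are hypothetical and are not taken from the authors’ code.

```python
import math


def update_value(value, reward, learning_rate):
    """Delta rule: move the option's value toward the received reward."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error


def softmax_cooperation(v_c, v_d, beta):
    """Probability of cooperating given option values and inverse temperature.

    Equivalent to exp(beta*v_c) / (exp(beta*v_c) + exp(beta*v_d)), written in
    logistic form with a sign branch so that math.exp cannot overflow.
    """
    x = beta * (v_c - v_d)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Example: cooperation was rewarded with 4 on the first trial.
v_c = update_value(0.0, reward=4, learning_rate=0.5)   # value of cooperation
v_d = 0.0                                               # defection not yet tried
p_low_beta = softmax_cooperation(v_c, v_d, beta=0.5)
p_high_beta = softmax_cooperation(v_c, v_d, beta=2.0)
```

For the same value difference, the larger β yields a cooperation probability further from 0.5, that is, choices that track the value difference more closely.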

Model 4: Inequality aversion model

The model assumes that participants’ decisions aim to reduce both disadvantageous and advantageous inequality between themselves and their partners:

where U_c (U_d) denotes the utility of cooperation (defection), φ represents aversion to disadvantageous inequality, and ν represents aversion to advantageous inequality. c_self and c_other denote the expected payoffs of cooperation to oneself and to the partner, respectively, while d_self and d_other denote the corresponding expected payoffs of defection. p denotes the participant’s expectation of the partner’s cooperation probability. The model assumes that participants do not update this expectation based on feedback; p is fixed at 0.5.

Based on the payoff matrix, the payoffs for participants and their partners are calculated using the following functions:

Participants’ choices are modeled by a softmax function:

where q_t denotes the participants’ probability of cooperation and β denotes the inverse temperature. The lower the value of the inverse temperature, the greater the randomness in decisions.
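The inequality aversion model can be illustrated with the Python sketch below. Two things are assumptions rather than statements from the text: the payoff matrix (4 each for mutual cooperation, 2 each for mutual defection, 6 versus 0 when one player defects on a cooperator), which is inferred from the reward values 0, 2, 4, and 6 mentioned above, and the standard Fehr-Schmidt form of the utility. All names are hypothetical.

```python
import math

# ASSUMED payoff matrix: (own payoff, partner payoff), keyed by
# (own choice, partner choice), with "C" = cooperate and "D" = defect.
PAYOFFS = {
    ("C", "C"): (4, 4),
    ("C", "D"): (0, 6),
    ("D", "C"): (6, 0),
    ("D", "D"): (2, 2),
}


def expected_payoffs(own_choice, p):
    """Expected (self, partner) payoffs if the partner cooperates with probability p."""
    self_c, other_c = PAYOFFS[(own_choice, "C")]
    self_d, other_d = PAYOFFS[(own_choice, "D")]
    return p * self_c + (1 - p) * self_d, p * other_c + (1 - p) * other_d


def inequality_averse_utility(self_pay, other_pay, phi, nu):
    """ASSUMED Fehr-Schmidt utility: own payoff minus weighted inequality."""
    disadvantage = max(other_pay - self_pay, 0.0)   # partner earns more
    advantage = max(self_pay - other_pay, 0.0)      # participant earns more
    return self_pay - phi * disadvantage - nu * advantage


def cooperation_probability(phi, nu, beta, p=0.5):
    """Softmax probability of cooperating; p is fixed at 0.5 as in Model 4."""
    u_c = inequality_averse_utility(*expected_payoffs("C", p), phi, nu)
    u_d = inequality_averse_utility(*expected_payoffs("D", p), phi, nu)
    x = beta * (u_c - u_d)
    if x >= 0:                       # sign branch avoids overflow in math.exp
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

Under these assumptions and p = 0.5, cooperation yields expected payoffs of (2, 5) and defection yields (4, 1), so a larger ν (aversion to being ahead) raises the probability of cooperating, while a larger φ (aversion to being behind) lowers it.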

Model 5: Social reward model

The model assumes that participants make decisions by comparing the expected payoff of cooperation and defection based on the payoff matrix and an additional subjective bonus from cooperation:

where ω represents an additional social reward associated with cooperation.
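As an illustration, the comparison in the social reward model might look like the Python sketch below. The payoff values (4 for mutual cooperation, 0 for cooperating with a defector, 6 for defecting on a cooperator, 2 for mutual defection) are assumed from the reward values listed for Model 3, and the bonus is assumed to enter as p × ω, in line with the intrinsic reward for reciprocity described in the hidden mental variables analysis. The function name is hypothetical.

```python
import math


def social_reward_choice_probability(p, omega, beta):
    """Probability of cooperating under the (assumed) social reward utilities.

    p     : expected probability that the partner cooperates
    omega : additional social reward attached to cooperation
    beta  : inverse temperature of the softmax
    """
    # ASSUMED own payoffs: C/C -> 4, C/D -> 0, D/C -> 6, D/D -> 2.
    expected_coop = 4.0 * p + 0.0 * (1.0 - p)
    expected_defect = 6.0 * p + 2.0 * (1.0 - p)
    # ASSUMED form of the bonus: the social reward is weighted by p.
    u_c = expected_coop + p * omega
    u_d = expected_defect
    x = beta * (u_c - u_d)
    # Two-option softmax in logistic form; the branch avoids overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

Under these assumptions the utility difference reduces to p × ω − 2, so cooperation becomes the preferred option only when the expected social reward exceeds the material advantage of defecting.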

Model 6: Social reward model with RL algorithm

The model, building on Model 5, assumes that participants update their expectation of partner cooperation trial by trial, based on the partner’s previous decisions, using a basic RL algorithm:

where p_t denotes participants’ expectation of partner cooperation probability at trial t and is updated by the following function:

where a is the learning rate applied to the prediction error, (P_t − p_t). P_t represents the partner’s decision at trial t, equating to 1 if the partner cooperates and 0 if the partner defects.
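The trial-by-trial update of the expectation can be sketched in Python as follows. The update rule follows the description above (learning rate a applied to the prediction error P_t − p_t); the starting expectation of 0.5 and the function names are assumptions made for illustration.

```python
def update_expectation(p_t, partner_choice, learning_rate):
    """p_{t+1} = p_t + a * (P_t - p_t), with P_t = 1 (cooperate) or 0 (defect)."""
    return p_t + learning_rate * (partner_choice - p_t)


def track_expectation(partner_choices, learning_rate, p_init=0.5):
    """Return the expectation held before each trial and the final expectation.

    ASSUMPTION: the initial expectation p_init = 0.5 is illustrative only.
    """
    p = p_init
    trajectory = []
    for choice in partner_choices:
        trajectory.append(p)                       # belief before feedback
        p = update_expectation(p, choice, learning_rate)
    return trajectory, p


# Partner cooperates twice, then defects.
beliefs, final_belief = track_expectation([1, 1, 0], learning_rate=0.5)
```

Here the expectation rises from 0.5 to 0.75 and 0.875 after two cooperative choices, then drops to 0.4375 after a defection.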

Model 7: Social reward model with influence model

The model is based on Model 6 and includes an additional assumption that participants update their expectation of the partner’s cooperation by considering not only the partner’s previous decisions but also the influence of their own previous decisions on the partner’s subsequent decisions. This aspect is referred to as second-order belief and is updated by the following function:

where Q_t represents the participants’ decision at trial t, equating to 1 if the participants cooperate and 0 if participants defect. represents the participants’ inferred cooperation probability of themselves from the partner’s perspective in trial t, which was inferred from function 13. Therefore, denotes the second-order prediction error, and κ is the second-order learning rate that governs the updating of second-order belief.

Model 8: Social reward model with asymmetric RL rule

The model, based on Model 6, assumes that participants update their expectations asymmetrically, applying two distinct learning rates to positive prediction errors (better than expected) and negative prediction errors (worse than expected):
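A minimal Python sketch of such an asymmetric update is shown below. The parameter names `alpha_pos` and `alpha_neg` are hypothetical labels for the two learning rates, and the treatment of a zero prediction error (which leaves the expectation unchanged under either rate) is an implementation detail of the sketch.

```python
def asymmetric_update(p_t, partner_choice, alpha_pos, alpha_neg):
    """Update the expectation of partner cooperation with two learning rates.

    partner_choice : 1 if the partner cooperated, 0 if the partner defected
    alpha_pos      : learning rate for positive prediction errors
    alpha_neg      : learning rate for negative prediction errors
    """
    prediction_error = partner_choice - p_t
    rate = alpha_pos if prediction_error >= 0 else alpha_neg
    return p_t + rate * prediction_error


# Same starting belief, opposite outcomes, different step sizes.
after_cooperation = asymmetric_update(0.5, 1, alpha_pos=0.4, alpha_neg=0.1)
after_defection = asymmetric_update(0.5, 0, alpha_pos=0.4, alpha_neg=0.1)
```

With these illustrative values the belief moves up by 0.2 after cooperation but down by only 0.05 after defection, which is the kind of asymmetry the two learning rates are meant to capture.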

Model fitting and model comparison

We used maximum likelihood estimation to fit models to each participant’s choices across all trials. The likelihood function, based on the binomial distribution, captured the association between each participant’s choices and each model’s predictions. To minimize the negative log-likelihood, we employed MATLAB’s (MathWorks) fmincon function. To enhance the likelihood of finding the global minimum, we repeated the parameter search process 500 times, using different starting points.

For model evaluation, we first used the Akaike Information Criterion corrected for sample size (AICc) (Hurvich & Tsai, 1989), which accounts for the model’s complexity and the number of observed data points. The second metric was the protected exceedance probability from group-level Bayesian model selection (Rigoux et al., 2014), which measures the likelihood that a specific model is more frequent in the population than the other models under consideration.

The log-likelihood is calculated by summing over all trials t: LL = Σ_t log(q_t),

where q_t represents the probability of participants’ decision at trial t, equating to q(cooperation) if participants cooperate and q(defection) if participants defect.
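The log-likelihood and the sample-size-corrected AIC of Hurvich and Tsai can be computed as in the Python sketch below. The function names and the small floor applied to probabilities before taking the logarithm are assumptions of this sketch, not details reported by the authors.

```python
import math


def log_likelihood(choices, coop_probs, floor=1e-12):
    """Sum of log q_t, where q_t is the model probability of the observed choice.

    choices    : 1 for cooperation, 0 for defection, one entry per trial
    coop_probs : model-predicted probability of cooperation on each trial
    floor      : ASSUMED lower bound that keeps log() finite
    """
    total = 0.0
    for choice, q_coop in zip(choices, coop_probs):
        q_t = q_coop if choice == 1 else 1.0 - q_coop
        total += math.log(max(q_t, floor))
    return total


def aicc(log_lik, n_params, n_trials):
    """AICc = 2k - 2LL + 2k(k + 1) / (n - k - 1)."""
    aic = 2 * n_params - 2 * log_lik
    correction = 2 * n_params * (n_params + 1) / (n_trials - n_params - 1)
    return aic + correction


# A model that predicts 50% cooperation on every one of 120 trials.
ll = log_likelihood([1, 0] * 60, [0.5] * 120)
score = aicc(ll, n_params=1, n_trials=120)
```

Lower AICc values indicate a better trade-off between fit and complexity; the correction term shrinks toward zero as the number of trials grows relative to the number of parameters.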

Model identifiability and parameter recovery analyses

We performed a model identifiability analysis to ensure that model comparisons were not compromised by model misidentification. For each model, we generated synthetic datasets using parameters estimated from the data of all participants. We then fitted each alternative model to its corresponding synthetic dataset and identified the best-fitting model through model comparison. To test robustness, we repeated this procedure 100 times, calculating the percentage of instances where each model was recognized as the best model across all synthetic datasets generated by that specific model. Consistently high percentages indicated model identifiability. Additionally, we assessed parameter recovery for the best-fitting model (model 8: social reward model with an asymmetric RL rule). This assessment involved calculating the Pearson correlation between the parameters estimated from the 100 synthetic datasets (recovered parameters) and the parameters used to generate these datasets. A higher correlation coefficient between the recovered and the estimated parameters suggested non-redundancy in the parameter space (Figure 2—figure Supplement 3).
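The logic of parameter recovery can be illustrated with a deliberately simplified Python sketch. For brevity it uses the one-parameter baseline model (Model 1) instead of the best-fitting model, draws the generating parameters from a uniform range instead of from participants’ estimates, and relies on the closed-form maximum likelihood estimate; these are all assumptions of the sketch, and the function names are hypothetical.

```python
import math
import random


def simulate_choices(b, n_trials, rng):
    """Generate Bernoulli choices (1 = cooperate) with cooperation probability b."""
    return [1 if rng.random() < b else 0 for _ in range(n_trials)]


def fit_baseline(choices):
    """Closed-form maximum likelihood estimate of b: the cooperation rate."""
    return sum(choices) / len(choices)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def recovery_correlation(n_participants=100, n_trials=120, seed=0):
    """Simulate with known parameters, refit, and correlate true vs. recovered."""
    rng = random.Random(seed)
    true_b = [rng.uniform(0.1, 0.9) for _ in range(n_participants)]
    recovered_b = [fit_baseline(simulate_choices(b, n_trials, rng)) for b in true_b]
    return pearson_r(true_b, recovered_b)


r = recovery_correlation()
```

A correlation close to 1 between generating and recovered values indicates that the fitting procedure can identify the parameter from 120 trials of data; the same simulate-refit-correlate loop applies to each parameter of a more complex model.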

Hidden mental variables analysis

LMM1

The dependent variable is participants’ expectation of the partner’s cooperation probability, estimated from the winning model (the variable p). The fixed and random effects are the same as in GLMM1. See Appendix Table 2 for the statistical results of LMM1.

LMM3

The dependent variable is participants’ intrinsic reward for reciprocity, estimated from the winning model (p × ω). The fixed and random effects are the same as in GLMM1. See Appendix Table 4 for the statistical results of LMM3.

Data analysis

All analyses were performed with custom code in MATLAB, version R2023a (RRID:SCR_001622). All scripts used for analysis are available at https://doi.org/10.5281/zenodo.15046430.

Supplementary material

Appendix Tables

Statistical results for cooperation decision (GLMM1).

Statistical results for partner cooperation expectation (LMM1).

Statistical results for self-reported perceived partner cooperativeness (LMM2).

Statistical results for intrinsic reward for reciprocity (LMM3).

Appendix Figures

Model Prediction.
This figure compares the actual data with the predictions of the best-fitting model for adults (a) and adolescents (b). The x-axis represents the trial number, and the y-axis represents the mean cooperation probability across all participants. The shaded area indicates the 95% confidence interval.

Distributions of Estimated Parameters from the Best-Fitting Model.
Each panel displays one parameter. The histograms and their kernel fits are represented by color bars and curves, respectively. Red indicates participants in the adolescent sample, and blue denotes those in the adult sample. Parameters have been transformed into a log scale for enhanced visualization.

Parameter Recovery for the Best-Fitting Model.
Each panel represents one parameter. Each dot corresponds to one virtual participant. The value of r indicates Pearson’s correlation coefficient between the true values (estimated from the participants) and the recovered parameters.

Acknowledgements

This work was supported by the Scientific and Technological Innovation (STI) 2030-Major Projects (2021ZD0200500), the National Natural Science Foundation of China (32441109, 32271092, 32130045), the Beijing Major Science and Technology Project under Contract No. Z241100001324005, and the Opening Project of the State Key Laboratory of General Artificial Intelligence (SKLAGI20240P06). We thank Christian C. Ruff for his insightful discussion.

Additional information

Data and code availability

All datasets and code are available at https://doi.org/10.5281/zenodo.15046430.
