Abstract
Cooperation is essential for success in society. Research has consistently shown that adolescents are less cooperative than adults, which is often attributed to underdeveloped mentalizing that limits their expectations of others. However, the internal computations underlying this reduced cooperation remain largely unexplored. This study compared cooperation between adolescents and adults using a repeated Prisoner’s Dilemma Game. Adolescents cooperated less than adults, particularly after their partner’s cooperation. Computational modeling revealed that adults increased their intrinsic reward for reciprocating when their partner continued cooperating, a pattern absent in adolescents. Both computational modeling and self-reported ratings showed that adolescents did not differ from adults in building expectations of their partner’s cooperation. Therefore, the reduced cooperation appears driven by a lower intrinsic reward for reciprocity, reflecting a stronger motive to prioritize self-interest, rather than a deficiency in mentalizing or social learning. These findings provide insights into the developmental trajectory of cooperation from adolescence to adulthood.
Introduction
Cooperation among individuals facilitates the achievement of shared goals and enhances overall group efficiency (Fehr and Fischbacher, 2003; Nowak, 2006). For individuals, cooperation skills are key to success in society; this ability is not innate but gradually acquired through socialization (Warneken, 2018). Successful cooperation requires individuals to prioritize the common purpose over their personal interests, focusing on collective goals (Sachs et al., 2004). Experimental psychology has often used the Prisoner’s Dilemma Game (PDG; Axelrod and Hamilton (1981)) to study human cooperative behaviors. Extensive research has adapted the PDG into a repeated version to explore how people respond to interactive cooperation (Andreoni and Miller, 1993; Embrey et al., 2018), requiring individuals to adjust their responses dynamically to others and simulating real-life cooperation more closely (Axelrod and Hamilton, 1981). In such social dilemmas, individuals face a trade-off between immediate rewards from defection and long-term benefits from cooperation (Rilling et al., 2002). Decision-making in these situations is thought to engage mentalizing abilities, which are functions related to theory of mind that enable individuals to form expectations about others’ cooperative intentions (Rilling et al., 2004).
Cooperation is not an innate skill but is gradually cultivated and refined through socialization (House et al., 2020). Adolescence, in particular, marks a critical developmental phase in the transition to independent social roles (Steinberg, 2005). Studies using the PDG consistently show that adolescents cooperate less than adults (Belli et al., 2012; Nava et al., 2023; Taheri et al., 2018). This reduced cooperation is often attributed to an underdeveloped theory of mind, which may lead adolescents to underestimate others’ trustworthiness and willingness to cooperate (Gutiérrez-Roig et al., 2014; Fett et al., 2014).
However, some findings may not support this hypothesis. For example, a previous study found that adolescents’ lower cooperation, compared to adults, emerges only after a partner’s cooperation. Conversely, when the partner defected, adolescents’ cooperative behaviors resembled those of adults (Gutiérrez-Roig et al., 2014). Similarly, a Trust Game study (Fett et al., 2014) reported a comparable pattern: adolescents invested less (measured as trust behavior) than adults only when the partner was cooperative. When faced with a non-cooperative partner, both adolescents and adults consistently reduced their trust behaviors. These findings suggest that adolescents, like adults, are able to adjust their behavior in response to others’ actions. This selective reduction in adolescent cooperation implies that factors beyond deficits in mentalizing may be at play. Adolescents may prioritize maximizing immediate rewards over long-term reciprocity (Rilling et al., 2002). When confident that their partner will cooperate, defection may become the optimal strategy for maximizing self-interest. This hypothesis remains untested, but computational modeling could provide a valuable approach for examining the underlying mental processes behind these behavioral variations (Farrell and Lewandowsky, 2010).
This study aimed to investigate variations in cooperative behavior between adolescents and adults and to explore the mental processes underlying these differences using computational modeling. A total of 127 adolescents and 134 adults participated in the study, playing a repeated Prisoner’s Dilemma Game (rPDG) with a presumed human partner, whose behavior was predetermined by a computer program (see Figure 1a). The program ensured consistent conditions across age groups. To enhance realism, variability was introduced into the computer-simulated partner’s behavior. Based on the standard payoff matrix of the rPDG (Figure 1b), mutual cooperation maximizes collective interests, while defection maximizes self-interest from an individual perspective. Our focus was on how adolescents respond to their partner’s consistent cooperation and defection, aiming to identify potential mental variables contributing to adolescents’ lower cooperation.

Experiment setup and behavioral results.
(a) Partner’s Cooperation Probability: In the first half of the 120 trials, the partner cooperated 78% of the time; in the second half, cooperation alternated between 20% and 80%. (b) Payoff Matrix: Payoffs are 4 for mutual cooperation, 2 for mutual defection, 0 for cooperation when the other defects, and 6 for defecting when the other cooperates. (c) Trial Illustration: After a 0.5-second fixation, participants choose a shape (triangle for cooperation, square for defection) within 4 seconds and see both players’ choices for 1.5 seconds. (d-e) Post hoc Comparisons: Figures 1d and 1e show the partner’s previous cooperation on the x-axis and participants’ cooperation probability on the y-axis. Large red (adolescents) and blue (adults) dots indicate mean probabilities, with black error bars for standard error (SE). Gray dots represent mean probabilities across trials, and green error bars show predicted cooperation rates with SE. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.
We developed computational models to investigate the dynamic variables guiding cooperative decisions in the rPDG. The model explicitly incorporates both expectations of the partner’s cooperation and the intrinsic reward of reciprocity. A basic reinforcement learning (RL) algorithm was used to model participants’ dynamic expectations regarding the partner’s cooperation. Drawing on research on asymmetric reward learning in adolescents (Palminteri et al., 2016; Rosenbaum et al., 2022), we included asymmetric updating for positive (better-than-expected) and negative (worse-than-expected) outcomes. Participants’ expectations were modeled as a trial-by-trial dynamic variable, represented by parameter p. Following previous studies (Fareri et al., 2012, 2015), a non-monetary reward for cooperation, represented by parameter ω, reflects individual preferences for mutual cooperation. The term p×ω quantifies the intrinsic reward for reciprocity.
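The two model components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the exact utility function is not given in this section, so the form below (expected monetary payoff plus a bonus of p×ω for cooperating) is an assumption, as are the function names and the logistic choice rule.

```python
import math

# Own payoff for (own choice, partner choice), taken from Figure 1b.
PAYOFF = {("C", "C"): 4, ("C", "D"): 0, ("D", "C"): 6, ("D", "D"): 2}


def update_expectation(p, partner_cooperated, alpha_pos, alpha_neg):
    """Asymmetric delta-rule update of the expected partner cooperation p."""
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p  # prediction error
    # Better-than-expected outcomes use alpha_pos, worse-than-expected use alpha_neg.
    rate = alpha_pos if delta > 0 else alpha_neg
    return p + rate * delta


def cooperation_probability(p, omega, beta):
    """Choice rule over ASSUMED utilities; p * omega is the reciprocity bonus."""
    u_coop = p * PAYOFF[("C", "C")] + (1 - p) * PAYOFF[("C", "D")] + p * omega
    u_defect = p * PAYOFF[("D", "C")] + (1 - p) * PAYOFF[("D", "D")]
    return 1.0 / (1.0 + math.exp(-beta * (u_coop - u_defect)))
```

Under these assumptions, with ω = 0 defection always has the higher expected payoff, so any sustained cooperation in the sketch has to come from the p×ω term.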
We hypothesize that adolescents will exhibit lower overall cooperation compared to adults, consistent with previous studies (Belli et al., 2012; Nava et al., 2023; Taheri et al., 2018). Specifically, we expect adolescents to demonstrate reduced cooperation after their partner’s cooperation but not following defection (Gutiérrez-Roig et al., 2014; Fett et al., 2014). Furthermore, we aim to explore whether this lower conditional cooperation is driven by inappropriate expectations of their partners (represented by p), a reduced intrinsic reward for reciprocity (represented by p×ω), or a combination of both.
Results
Adolescents exhibit lower cooperation than adults following partner cooperation, but not defection
In each trial of the rPDG, as shown in Figure 1c, participants were presented with two choices: a triangle representing cooperation and a square representing defection. The choice associated with each symbol was randomly balanced across participants. They were informed that they were playing the game simultaneously with another partner. After making their decision, participants were shown both their own choice and that of their partner. We performed a generalized linear mixed model (GLMM1) analysis (see Appendix Table 1) to examine the effects of each independent variable and their interactions on the decision to cooperate or defect.
Consistent with most previous studies (Belli et al., 2012; Nava et al., 2023; Taheri et al., 2018), adolescents cooperated less than adults (b of group = 0.79, 95% CI = [0.311, 1.270], p = 0.001; Figure 1d). Following the interaction of group × previous trial × partner’s choice (b of interaction = 0.24, 95% CI = [0.126, 0.361], p < 0.001), we found that adolescents showed significantly less cooperation compared to adults only after the partner’s cooperation (t(259)group = −2.84, p = 0.005). However, such a difference was not significant after the partner’s defection (t(259)group = −1.86, p = 0.064; Figure 1d). We also found that adults increased cooperation in response to their partners’ consistent cooperation (the partner cooperated once vs. the partner cooperated thrice: t(266)adults = −2.50, p = 0.013), but this pattern was not observed in adolescents (t(252)adolescents = −1.18, p = 0.239, see Figure 1d).
Nevertheless, both groups significantly decreased cooperation in response to the partner’s continual defection (the partner defected once vs. the partner defected twice: t(266)adults = 4.46, p < 0.001, t(252)adolescents = 2.78, p = 0.006; the partner defected once vs. the partner defected thrice: t(266)adults = 5.56, p < 0.001, t(252)adolescents = 4.32, p < 0.001; Figure 1e).
Asymmetric RL learning in the social reward model best explains cooperative decisions of adolescents and adults
Computational modeling was used to simulate participants’ mental processes during the rPDG. Starting with a baseline model that assumed decisions were made through random selection (Model 1), we compared several alternatives: a win-stay and loss-shift model (Model 2), a reward learning model (Model 3), an inequality aversion model (Model 4), and a social reward model (Model 5). Among these, the social reward model outperformed the others. We then compared a basic RL algorithm (Model 6), an influence learning rule (Model 7), and an asymmetric RL learning rule (Model 8) within the social reward framework. The asymmetric RL learning model best explained the cooperative decisions of both adolescents and adults (see Figure 2a for adolescents and Figure 2b for adults; see Methods for details). Model recovery analysis indicated that the asymmetric RL learning within the social reward model was distinguishable from the other models (Figure 2c) and accurately captured the behaviors of both adolescents (Figure 2d) and adults (Figure 2e). For further validation of the best-fitting model, see Figure 2—figure supplement 1 for model predictions, Figure 2—figure supplement 2 for the distributions of free parameters, and Figure 2—figure supplement 3 for parameter recovery.

Computational modeling.
(a-b) Model comparisons for adolescents and adults, respectively. The y-axis represents model fitness based on the Akaike Information Criterion with a correction for sample size (AICc) (Hurvich and Tsai, 1989). For each participant, the model with the lowest AICc served as a reference to compute ΔAICc by subtracting it from the AICc of other models (ΔAICc = AICc_x − AICc_lowest). A lower ΔAICc indicates a better model fit. Protected exceedance probability (PEP) is a group-level measure that assesses the likelihood of each model’s superiority over the others (Rigoux et al., 2014). (c) Model recovery analysis. Each model was used to generate 100 synthetic datasets, and for each dataset, model fitting and comparison were performed. Each column corresponds to one generative model, and each row corresponds to one fitting model. The color in each cell indicates the probability that the synthetic datasets generated by the model in the column were best fit by the model in the row, with a darker color denoting a higher probability. (d-e) Model prediction. Sample illustration of the best-fitting model prediction versus data for adolescents and adults, respectively.
Figure 2—figure supplement 1. Model Prediction. This figure compares the actual data and predictions of best-fitting model for adults (a) and adolescents (b). The x-axis represents the trial number, while the y-axis represents the mean cooperation probability of all participants. The shaded area indicates the 95% confidence interval.
Figure 2—figure supplement 2. Distributions of Estimated Parameters from the Best-Fitting Model. Each panel displays one parameter. The histograms and their kernel fits are represented by color bars and curves, respectively. Red indicates participants in the adolescent sample, and blue denotes those in the adult sample. Parameters have been transformed into a log scale for enhanced visualization.
Figure 2—figure supplement 3. Parameter Recovery for the Best-Fitting Model. Each panel represents one parameter. Each dot corresponds to one virtual participant. The value of r indicates Pearson’s correlation coefficient between the true values (estimated from the participants) and the recovered parameters.
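The ΔAICc measure used in the model comparison of Figure 2a-b can be sketched as follows. The AICc formula is the standard small-sample correction; the function names and the example model labels are hypothetical and are not taken from the authors' code.

```python
def aicc(log_likelihood, n_params, n_obs):
    """AIC with the small-sample correction (AICc)."""
    aic = 2 * n_params - 2 * log_likelihood
    correction = 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


def delta_aicc(aicc_by_model):
    """Subtract the lowest AICc from every model's AICc for one participant."""
    best = min(aicc_by_model.values())
    return {name: value - best for name, value in aicc_by_model.items()}


# Hypothetical per-participant scores; a ΔAICc of 0 marks the best-fitting model.
example = delta_aicc({"model_A": 130.0, "model_B": 128.0})
```

With 120 trials per participant, the correction term stays small for models with only a few free parameters, so AICc and AIC would rank such models almost identically.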
Distinct learning rates and social preferences between adolescents and adults in repeated cooperation
Although the asymmetric RL learning in the social reward model best explained the behaviors of both adolescents and adults, the two groups exhibited distinct learning dynamics and social preferences for cooperation. Specifically, adolescents applied a higher positive learning rate (α+, t(259) = 2.95, p = 0.003, Figure 3a) when updating after better-than-expected outcomes, and a lower negative learning rate (α−, t(259) = −2.62, p = 0.009, Figure 3b) when updating after worse-than-expected outcomes.

Learning Rates and Social Preferences.
(a-d) Comparison between adolescents and adults for positive learning rate (α+), negative learning rate (α−), social preference (ω), and inverse temperature (β), respectively. (e-h) Correlation between age and positive learning rate, negative learning rate, social preference, and inverse temperature, respectively. Notes: *p < 0.05; **p < 0.01.
Additionally, a positive correlation was found between participants’ age and the negative learning rate (α−, r = 0.21, p < 0.001, Figure 3f), while no significant correlation was observed with the positive learning rate (α+, r = −0.09, p = 0.16, Figure 3e).
Furthermore, adolescents displayed a weaker preference for cooperation compared to adults (ω, t(259) = −3.03, p = 0.003, Figure 3c), and their social preferences for cooperation increased with age (r = 0.20, p < 0.001, Figure 3g). Additionally, adolescents exhibited a higher inverse temperature parameter compared to adults, indicating they were more sensitive to utility differences between cooperation and defection (β, t(259) = 2.14, p = 0.034, Figure 3d). This sensitivity decreased with age, as shown by a negative correlation with age (r = −0.17, p = 0.007, Figure 3h).
Adolescents compared to adults show no inappropriate expectations but less intrinsic reward for reciprocity
To further explore what underlies the observed decrease in cooperation among adolescents, we focused on two hidden trial-by-trial updating variables: the partner cooperation expectation (p) and the intrinsic reward for reciprocity (p×ω). Additionally, participants’ self-reported cooperativeness scores, assessed every 15 trials, provided further insight into their subjective estimation of the partner’s willingness to cooperate.
Partner cooperation expectation
We fitted a linear mixed model (LMM1, Appendix Table 2) to the partner cooperation expectation to assess the effects of each independent variable and their interactions. Following the interaction of group × previous trial × partner’s choice (b of interaction = 0.03, 95% CI = [0.022, 0.038], p < 0.001), we found that the partner cooperation expectation for both adolescents and adults increased with the partner’s consistent cooperation (the partner cooperated once vs. the partner cooperated twice: t(252)adolescents = −2.81, p = 0.005, t(266)adults = −4.45, p < 0.001; the partner cooperated once vs. the partner cooperated thrice: t(252)adolescents = −3.69, p < 0.001, t(266)adults = −6.23, p < 0.001; Figure 4a). Additionally, expectations decreased with the partner’s consistent defection (the partner defected once vs. the partner defected twice: t(252)adolescents = 4.44, p < 0.001, t(266)adults = 7.02, p < 0.001; the partner defected once vs. the partner defected thrice: t(252)adolescents = 5.60, p < 0.001, t(266)adults = 8.40, p < 0.001; Figure 4b). These results showed that adolescents and adults held very similar expectations of their partner’s cooperation, with no significant difference between the groups (b of group = -0.04, 95% CI = [-0.102, 0.021], p = 0.198).

Analysis of hidden variables from the best-fitting model.
(a-b) Post-hoc Comparison of LMM1: Interaction of group × previous trial × partner’s choice. The y-axis shows participants’ expectations of partner cooperation probability (p) from the best-fitting model. (c-d) Self-Reported Cooperativeness: Normalized scores on partner cooperativeness for two orders of partner cooperation probability, with adolescents (orange-red line) and adults (blue line). Scores were assessed on a 0-9 scale and normalized to 0-1. The dotted line indicates the presumed partner’s cooperation probability, with mean values and standard errors shown. (e-f) Post-hoc Comparison of LMM3: Interaction of group × previous trial × partner’s choice. The y-axis shows participants’ intrinsic reward for reciprocity (p × ω) from the best-fitting model. The x-axis represents the consistency of the partner’s actions in previous trials (t1, t1,2, t1,2,3). Colored dots with error bars indicate mean values with standard errors for adolescents (orange-red) and adults (blue), while small gray dots represent individual participants. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.
Moreover, we fitted LMM2 (Appendix Table 3) to participants’ self-reported scores regarding the cooperativeness of their partners to examine the effects of each independent variable and their interactions. In line with the expectation of partner cooperation, we observed minimal discrepancy in the self-reported scores on partner cooperativeness between adolescents and adults. Neither the main effect of group nor the interaction achieved statistical significance (b of group = 0.17, 95% CI = [-0.51, 0.85], p = 0.616; b of interaction = 0.38, 95% CI = [-0.052, 0.812], p = 0.085; Figure 4c-d). These results provide evidence that adolescents did not differ from adults in assessing their partner’s cooperation.
Intrinsic reward for reciprocity
We fitted LMM3 (Appendix Table 4) to the intrinsic reward for reciprocity to assess the effects of each independent variable and their interactions. We found that adolescents valued reciprocity less than adults did (b of group = 0.52, 95% CI = [0.224, 0.816], p < 0.001).
Following the interaction of group × previous trial × partner’s choice (b of interaction = 0.37, 95% CI = [0.318, 0.424], p < 0.001), unlike adults, adolescents did not increase their intrinsic reward for reciprocity in response to the partner’s consistent cooperation (the partner cooperated once vs. the partner cooperated twice: t(252)adolescents = −0.96, p = 0.336, t(266)adults = −2.13, p = 0.034; the partner cooperated once vs. the partner cooperated thrice: t(252)adolescents = −1.38, p = 0.170, t(266)adults = −3.08, p = 0.002; Figure 4e).
However, in response to consistent defection by the partner, both adolescents and adults exhibited decreased intrinsic reward for reciprocity (the partner defected once vs. the partner defected twice: t(252)adolescents = 1.99, p = 0.047, t(266)adults = −2.71, p = 0.007; the partner defected once vs. the partner defected thrice: t(252)adolescents = 2.64, p = 0.009, t(266)adults = 3.37, p < 0.001; Figure 4f).
In brief, adolescents did not deviate in forming expectations about their partner’s willingness to cooperate, but they showed lower social preferences for cooperation and a reduced intrinsic reward for reciprocity. Specifically, compared to adults, adolescents displayed less intrinsic reward for reciprocity and did not increase it in response to consistent cooperation, though their reactions to consistent defection were similar to those of adults.
Discussion
Cooperation lies at the heart of societal functioning, facilitating the achievement of shared goals and fostering social harmony. In this study, we sought to deepen our understanding of the developmental aspects of cooperation by examining differences in cooperative behavior between adolescents and adults in the context of the rPDG. Our findings shed light on the cognitive and affective processes underlying these behaviors, offering insights into the mechanisms driving cooperative decision-making across different developmental stages.
Consistent with many previous studies (Fett et al., 2014; Gutiérrez-Roig et al., 2014; Westhoff et al., 2020), our results showed that adolescents exhibited lower levels of cooperation compared to adults. However, such lower cooperation was not generally observed during the task, but selectively occurred after their partner cooperated in the previous rounds. Moreover, our results showed that adults increased cooperation in response to their partner’s consistent cooperation, whereas such a pattern was not observed in adolescents. However, both age groups decreased cooperation in response to consistent partner defection, indicating shared responses to non-cooperative behavior.
Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in mentalizing. In both the expectation of the partner’s cooperation estimated from computational modeling and the self-reported measurements, adolescents did not differ significantly from adults. However, adolescents exhibited a weaker preference for (conditional) cooperation compared to adults, resulting in a reduced intrinsic reward for reciprocity. These findings align with prior research (Crone and Dahl, 2012; Do et al., 2017; Pfeifer and Berkman, 2018; Van Den Bos et al., 2010, 2011), suggesting that adolescents prioritize immediate gains over long-term benefits, potentially undermining the benefits of cooperation. This tendency aligns with earlier findings that adolescents exhibit heightened sensitivity to reward feedback (Blakemore and Mills, 2014; Crone and Dahl, 2012; Davis et al., 2023; Do et al., 2017; Van Den Bos et al., 2011; Van Duijvenvoorde et al., 2015), which may influence their decision-making in cooperative interactions.
It has been acknowledged that individuals weight positive and negative outcomes differently when learning in social cooperation; such an asymmetric learning process can be modeled by a basic RL algorithm with separate positive and negative learning rates (Garrett and Daw, 2020; Rosenbaum et al., 2022). In this study, we found that an asymmetric RL algorithm in a social reward model provided the best fit to the behaviors of both adolescents and adults. Adolescents demonstrated a larger positive learning rate, but a smaller negative learning rate compared to adults, suggesting heightened sensitivity to positive feedback from cooperative behavior and reduced sensitivity to negative feedback from defection. This asymmetric learning pattern may drive adolescents to focus more on self-beneficial social signals, maximizing immediate gains in response to cooperative behavior. These findings align with Van Den Bos et al. (2011), who highlighted adolescents’ heightened sensitivity to immediate rewards and less stable trusting behavior compared to adults.
Adolescence is characterized by increased self-discovery and egocentrism (Pfeifer and Berkman, 2018; Ting et al., 2019), leading individuals to prioritize their immediate gains over long-term benefits. Consequently, adolescents may be more inclined towards selfish motives in long-term social interactions (Pfeifer and Berkman, 2018). However, it is essential to recognize that these behaviors are not static, and as adolescents mature into adults, their socio-emotional abilities evolve (Worthman and Trang, 2018), enabling a more balanced consideration of short-term and long-term outcomes (Crone and Dahl, 2012).
It is important to note some limitations of this study. First, we used artificial opponents with pre-determined cooperation patterns to better control the stimuli. While this approach allowed us to isolate specific motivations for cooperation (financial vs. social rewards), it is possible that participants might behave differently in more natural settings. Our study serves as an initial step in understanding cooperation motivations in adolescents and adults, and future research could explore these behaviors in more real-world contexts.
In conclusion, our study contributes to an understanding of the developmental aspects of cooperation and the cognitive-affective processes underlying cooperative decision-making. By examining differences in cooperative behavior between adolescents and adults in the rPDG and integrating computational modeling, we offer valuable insights into the mechanisms driving cooperative behavior across different developmental stages. These findings have implications for promoting prosocial behaviors and designing effective socialization interventions during adolescence. By highlighting the importance of reciprocity, our findings offer insights into the developmental trajectory of cooperation from adolescence to adulthood and provide practical implications for enhancing cooperative interactions in real-world contexts.
Methods and Materials
Participants
A total of 261 participants took part in the current study: 127 adolescents (aged 14-17 years, mean ± SD: 16.13 ± 0.63, 44 females) and 134 adults (aged 18-30 years, mean ± SD: 21.63 ± 2.88, 79 females). No a priori power analysis was conducted. The sample size was determined based on previous studies investigating cooperation behaviors in adolescents and adults. Adolescents were recruited from the local high school with the consent of their legal guardians, while adults were recruited through advertisements on the university campus forum. Before the experiment, written informed consent was obtained from all participants and, for adolescents, from their legal guardians. Participants were included if they had normal or corrected-to-normal vision and no history of psychiatric or neurological illness. Exclusion criteria included any self-reported diagnosed psychiatric or neurological disorder. This study was approved by the ethics committee of Beijing Normal University. No participants dropped out of the experiment, and all data were included in the statistical analysis. Participants received compensation based on their performance in the tasks (see rPDG for details).
Experimental procedure
All participants completed the experiments in a laboratory setting with multiple participants present. They were informed that they were participating in a multiple-round interaction game with an anonymous partner. In the instructions section, we referred to the interaction game as the rPDG and refrained from using the terms ‘cooperate’ and ‘defect’ to minimize the influence of social expectations and biases and to promote comparability across studies. Participants were led to believe that their partner was also playing the game at the same time. Compensation for their participation was based on the tokens earned during the game, with 10% of the rounds randomly selected for payment calculation at an exchange rate of 1 token to 1 yuan. Participants were explicitly informed in advance about this incentive mechanism. Prior to the formal experiment, participants underwent a quiz and several practice rounds to ensure a full understanding of the task. Following the experiment, participants completed a Social Value Orientation (SVO) task to assess their prosocial personality traits. The entire procedure lasted approximately 60 minutes. Blinding was not applicable in this study, as all participants interacted with a computer-controlled partner. To minimize potential bias, the partner’s behavior patterns and stimulus meanings were randomized across participants. A detailed protocol is available upon request.
The repeated prisoner’s dilemma game
Similar to the classic version of the PDG, the rPDG involves two players. Consistent with the standard payoff matrix of the PDG (Figure 1b), when both players cooperated (defected), they each received 4 tokens (2 tokens). If the players made different decisions, the one who cooperated received 0 tokens, while the one who defected received 6 tokens. In reality, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. To enhance the realism of the partner’s responses, we manipulated the variability in the partner’s decision making. The partner’s cooperation probability remained stable at 78% for half of the trials. In the other half of the trials, the partner’s cooperation probability varied, switching between 20%, 80%, and 20% for each set of 20 trials. The order of these two sessions was counterbalanced between participants. During the rPDG, participants were asked every 15 rounds to evaluate their partner’s cooperativeness using a 10-point scale, where 0 represents “no cooperation” and 9 represents “very high cooperation.” The question posed to the participants was “How cooperative do you think your partner is at the moment?”
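The task structure described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' program: the text does not state how individual partner choices were drawn, so independent Bernoulli draws per trial are an assumption, and all names are hypothetical.

```python
import random


def partner_probabilities(stable_first=True):
    """Per-trial partner cooperation probability over the 120 trials."""
    stable = [0.78] * 60                                   # stable session
    volatile = [0.20] * 20 + [0.80] * 20 + [0.20] * 20     # 20-trial blocks
    return stable + volatile if stable_first else volatile + stable


def simulate_partner(stable_first=True, seed=None):
    """Draw partner choices (True = cooperate); ASSUMES independent draws."""
    rng = random.Random(seed)
    return [rng.random() < prob for prob in partner_probabilities(stable_first)]


def payoffs(own_cooperates, partner_cooperates):
    """Return (own tokens, partner tokens) following Figure 1b."""
    if own_cooperates and partner_cooperates:
        return 4, 4
    if own_cooperates:
        return 0, 6
    if partner_cooperates:
        return 6, 0
    return 2, 2
```

Setting `stable_first=False` gives the counterbalanced session order.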
Behavioral data analysis
All statistical analyses were conducted in MATLAB R2023a (RRID:SCR_001622). GLMMs were implemented using the ‘fitglme’ function in MATLAB. Interaction contrasts were performed for significant interactions and, when higher-order interactions were not significant, pairwise or sequential contrasts were performed for significant main effects.
GLMM1: Participants’ choices (cooperate or defect) across all trials are the dependent variable; fixed effects include an intercept, the main effects of group (adolescents or adults), previous trial (last one trial, last two trials and last three trials), partner’s choice (cooperation or defection), and all possible interaction effects of the independent variables. Gender (male and female) and timing (trial number from 1 to 120) were also included as control variables. Random effects include correlated random slopes of group, previous trial, partner’s choice, gender, and trial number, and a random intercept for participants. Group, previous trial, partner’s choice, and gender are categorical variables. Trial number is a continuous variable. See Appendix Table 1 for the statistical results of GLMM1.
LMM2: Participants’ self-reported score on partner’s cooperativeness is the dependent variable; fixed effects include an intercept, the main effects of group (adolescents or adults) and the order of the sessions (the partner’s cooperation probability was fixed at 78% and then shifted between 20%, 80%, and 20% in blocks of 20 trials, or vice versa), and the interaction of group × order. Gender (male and female) and timing (trial number from 1 to 120) were also included as control variables. Random effects include correlated random slopes of group, gender, and timing, and a random intercept for participants. Group, order, and gender are categorical variables. Trial number is a continuous variable. See Appendix Table 3 for the statistical results of LMM2.
Behavioral modelling
We systematically developed models based on various assumptions regarding participants’ decision-making processes in the rPDG.
Model 1: The baseline model
We modeled each participant’s choices in each trial (i.e., whether to cooperate) as outcomes from a Bernoulli distribution, where the cooperation probability is controlled by a parameter, b ∈ [0, 1]. For each participant, the probabilities of cooperation (q(cooperation)) and defection (q(defection)) are denoted as follows:
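Given the definition of b, the two probabilities follow directly. The notation below is reconstructed from the text and may differ from the authors’ typeset equation:

```latex
q(\text{cooperation}) = b, \qquad q(\text{defection}) = 1 - b
```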
Model 2: Win-stay & loss-shift model
The model assumes that individuals adopt a tit-for-tat strategy in decision-making. Participants are likely to repeat their previous choice with a probability
where qt denotes the probability of repeating the previous choice and 1 − qt denotes the probability of shifting to another option at trial t.
Model 3: Reward learning model
This model assumes that participants make decisions by comparing the values of choosing cooperation and defection. The values of the two options are updated using a RL algorithm:
where Vc (Vd) denotes the value of cooperation (defection) option. R represents the reward feedback, which can be 0, 2, 4, or 6, depending on the payoff matrix.
where qt denotes the participants’ probability of cooperation and β denotes the inverse temperature. The higher the value of the inverse temperature, the greater the sensitivity to the value difference between options.
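As a rough illustration of Model 3, the sketch below pairs a delta-rule value update with a two-option softmax. Both forms are standard choices assumed here, not taken from the authors’ equations. The learning-rate name alpha and the function names are hypothetical, and the code is Python rather than the authors’ MATLAB.

```python
import math


def update_value(value, reward, alpha):
    """Delta-rule update: move the chosen option's value toward the reward.

    ASSUMPTION: alpha is a learning rate in [0, 1]; the text does not name it.
    """
    return value + alpha * (reward - value)


def softmax_coop_prob(v_c, v_d, beta):
    """Probability of cooperating given the two option values.

    beta is the inverse temperature: larger beta makes the choice more
    sensitive to the difference v_c - v_d.
    """
    return 1.0 / (1.0 + math.exp(-beta * (v_c - v_d)))


# Example: cooperation was chosen and paid 4 tokens.
v_c = update_value(0.0, 4.0, alpha=0.5)  # value of cooperation moves halfway to 4
q = softmax_coop_prob(v_c, 0.0, beta=1.0)
```

With equal option values the softmax returns 0.5 regardless of β, and β only matters once the values differ.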
Model 4: Inequality aversion model
The model assumes that participants’ decisions aim to reduce both disadvantageous and advantageous inequality between themselves and their partners:
where Uc (Ud) denotes the utility of cooperation (defection). φ represents aversion to disadvantageous inequality and ν represents aversion to advantageous inequality. cself and cother denote the expected payoffs for cooperation to oneself and the partner, respectively, while dself and dother denote the expected payoffs for defection to oneself and the partner. p denotes the participant’s expectation of the partner’s cooperation probability. The model assumes that participants do not update this expectation based on feedback; p is fixed at 0.5.
Based on the payoff matrix, the payoffs for participants and their partners are calculated using the following functions:
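Taking the expectation of each payoff-matrix entry over the partner’s choice gives the four quantities below. This is a reconstruction from the payoff matrix and the definition of p, and the authors’ typeset equations may be arranged differently:

```latex
\begin{aligned}
c_{\text{self}}  &= 4p + 0\,(1-p) = 4p \\
c_{\text{other}} &= 4p + 6\,(1-p) \\
d_{\text{self}}  &= 6p + 2\,(1-p) \\
d_{\text{other}} &= 0\,p + 2\,(1-p)
\end{aligned}
```

With p fixed at 0.5, these evaluate to 2, 5, 4, and 1 tokens, respectively.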
Participants’ choices are modeled by a softmax function:
where qt denotes the participants’ probability of cooperation and β denotes the inverse temperature. The lower the value of the inverse temperature, the greater the randomness in decisions.
Model 5: Social reward model
The model assumes that participants make decisions by comparing the expected payoff of cooperation and defection based on the payoff matrix and an additional subjective bonus from cooperation:
where ω represents an additional social reward associated with cooperation.
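A minimal Python sketch of the social reward idea follows. It assumes that the bonus enters the utility of cooperation as p × ω, which matches the later definition of intrinsic reward for reciprocity but is not confirmed by the model equations. The function names are hypothetical and this is not the authors’ implementation.

```python
import math


def expected_payoffs(p):
    """Expected own payoffs for cooperating and defecting, given belief p
    that the partner cooperates (payoff matrix: 4/0 and 6/2 tokens)."""
    coop = 4 * p + 0 * (1 - p)
    defect = 6 * p + 2 * (1 - p)
    return coop, defect


def coop_probability(p, omega, beta):
    """Softmax probability of cooperating with a social reward bonus.

    ASSUMPTION: the bonus is weighted by the belief, i.e. it equals p * omega.
    """
    coop, defect = expected_payoffs(p)
    u_c = coop + p * omega
    u_d = defect
    return 1.0 / (1.0 + math.exp(-beta * (u_c - u_d)))
```

Under this payoff matrix the expected monetary payoff of defection exceeds that of cooperation by 2 tokens for any p, so in this sketch cooperation becomes the more likely choice only when p × ω exceeds 2.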
Model 6: Social reward model with RL algorithm
The model, building on Model 5, assumed that participants update their expectations of partner cooperation trial-by-trial, based on the partner’s previous decisions, using a basic RL algorithm:
where pt denotes participants’ expectation of partner cooperation probability at trial t and is updated by the following function:
where a is the learning rate applied to the prediction error, (Pt − pt). Pt represents the partner’s decision at trial t, equating to 1 if the partner cooperates and 0 if the partner defects.
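From the definitions of a, Pt, and pt, the update takes the standard delta-rule form. The equation is reconstructed from the text, and writing the updated value as p at trial t + 1 is an assumed indexing convention:

```latex
p_{t+1} = p_t + a\,(P_t - p_t)
```

For example, with pt = 0.5 and a = 0.2, a cooperating partner (Pt = 1) raises the expectation to 0.6, while a defecting partner (Pt = 0) lowers it to 0.4.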
Model 7: Social reward model with influence model
The model is based on Model 6 and includes an additional assumption that participants update their expectation of the partner’s cooperation by considering not only the partner’s previous decisions but also the influence of their own previous decisions on the partner’s subsequent decisions. This aspect is referred to as second-order belief and is updated by the following function:
where Qt represents the participants’ decision at trial t, equating to 1 if the participants cooperate and 0 if participants defect.
Model 8: Social reward model with asymmetric RL rule
The model, based on Model 6, assumes that participants update their expectations asymmetrically after positive prediction errors (better than expected) and negative prediction errors (worse than expected), using two distinct learning rates:
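The asymmetric rule can be sketched as follows in Python. The learning-rate names a_pos and a_neg and the function name are hypothetical, and the code illustrates the idea rather than reproducing the authors’ MATLAB implementation.

```python
def update_expectation(p, partner_cooperated, a_pos, a_neg):
    """Update the expected partner cooperation probability after one trial.

    The prediction error is (outcome - p), with outcome 1 for cooperation and
    0 for defection. a_pos is applied to positive errors and a_neg to
    negative errors (hypothetical parameter names).
    """
    outcome = 1.0 if partner_cooperated else 0.0
    error = outcome - p
    rate = a_pos if error >= 0 else a_neg
    return p + rate * error


# Example: expectations rise quickly after cooperation, fall slowly after defection.
p = 0.5
for cooperated in [True, True, False]:
    p = update_expectation(p, cooperated, a_pos=0.4, a_neg=0.1)
```

Setting a_pos equal to a_neg recovers the single learning rate of Model 6.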
Model fitting and model comparison
We used maximum likelihood estimation to fit models to each participant’s choices across all trials. The likelihood function, based on the binomial distribution, captured the association between each participant’s choices and each model’s predictions. To minimize the negative log-likelihood, we employed MATLAB’s (MathWorks) fmincon function. To enhance the likelihood of finding the global minimum, we repeated the parameter search process 500 times, using different starting points.
For model evaluation, we first used the Akaike Information Criterion corrected for sample size (AICc) (Hurvich & Tsai, 1989), which accounts for the model’s complexity and the number of observed data points. The second metric was the protected exceedance probability from group-level Bayesian model selection (Rigoux et al., 2014), providing a measure of the likelihood that a specific model is superior to other models under consideration.
The log-likelihood is calculated as the following function:
where qt represents the probability of participants’ decision at trial t, equating to q(cooperation) if participants cooperate and q(defection) if participants defect.
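For concreteness, the log-likelihood and the small-sample corrected AIC can be computed as in the Python sketch below. The log-likelihood is written as the sum of log qt over trials, which is the usual form for independent Bernoulli choices and is assumed here. The function names are hypothetical and the clipping constant is an implementation convenience, not part of the authors’ method.

```python
import math


def log_likelihood(choices, coop_probs):
    """Sum of log-probabilities of the observed choices.

    choices: booleans, True = participant cooperated on that trial.
    coop_probs: model-predicted probability of cooperation on each trial.
    """
    eps = 1e-12  # guard against log(0); an implementation convenience
    total = 0.0
    for chose_coop, q in zip(choices, coop_probs):
        q_t = q if chose_coop else 1.0 - q
        total += math.log(max(q_t, eps))
    return total


def aicc(log_lik, n_params, n_obs):
    """AIC with the small-sample correction of Hurvich & Tsai (1989)."""
    aic = 2 * n_params - 2 * log_lik
    correction = (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction
```

Lower values indicate a better trade-off between fit and complexity. The correction term shrinks toward zero as the number of trials grows relative to the number of parameters.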
Model identifiability and parameter recovery analyses
We performed a model identifiability analysis to ensure that model comparisons were not compromised by model misidentification. For each model, we generated synthetic datasets using parameters estimated from the data of all participants. We then fitted all candidate models to each synthetic dataset and identified the best-fitting model through model comparison. To test robustness, we repeated this procedure 100 times, calculating the percentage of instances in which each model was recognized as the best model for the synthetic datasets it had generated. Consistently high percentages indicated model identifiability. Additionally, we assessed parameter recovery for the best-fitting model (Model 8: social reward model with an asymmetric RL rule). This assessment involved calculating the Pearson correlation between the parameters estimated from the 100 synthetic datasets (recovered parameters) and the parameters used to generate these datasets. A higher correlation coefficient between the recovered and the generating parameters suggested non-redundancy in the parameter space (Figure 2—figure Supplement 3).
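The logic of parameter recovery can be illustrated with a deliberately simple case. The Python sketch below uses the baseline model (Model 1) rather than Model 8, because its maximum likelihood estimate has a closed form (the observed cooperation rate). The sampling range for b, the seed, and the function names are assumptions made for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def recover_baseline_parameter(n_participants=100, n_trials=120, seed=1):
    """Simulate choices from known b values, re-estimate b, and correlate."""
    rng = random.Random(seed)
    # ASSUMPTION: generating parameters are drawn uniformly from [0.1, 0.9].
    true_b = [rng.uniform(0.1, 0.9) for _ in range(n_participants)]
    recovered_b = []
    for b in true_b:
        choices = [rng.random() < b for _ in range(n_trials)]
        recovered_b.append(sum(choices) / n_trials)  # MLE of a Bernoulli rate
    return pearson_r(true_b, recovered_b)


r = recover_baseline_parameter()
```

A correlation close to 1 indicates that the generating parameter can be recovered from 120 trials of simulated choices. The same simulate, refit, and correlate procedure applies to models with more parameters, with numerical optimization replacing the closed-form estimate.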
Hidden mental variables analysis
LMM1
The dependent variable is the participant’s expectation of the partner’s cooperation probability, estimated from the winning model (the variable p). The fixed and random effects are the same as in GLMM1. See Appendix Table 2 for the statistical results of LMM1.
LMM3
The dependent variable is the participant’s intrinsic reward for reciprocity, estimated from the winning model (p × ω). The fixed and random effects are the same as in GLMM1. See Appendix Table 4 for the statistical results of LMM3.
Data analysis
All analyses were performed with custom code in MATLAB, version R2023a (RRID:SCR_001622). All scripts used for analysis are available at https://doi.org/10.5281/zenodo.15046430.
Supplementary material
Appendix Tables

Appendix Table 1. Statistical results for cooperation decision (GLMM1).

Appendix Table 2. Statistical results for partner cooperation expectation (LMM1).

Appendix Table 3. Statistical results for self-reported perceived partner cooperativeness (LMM2).

Appendix Table 4. Statistical results for intrinsic reward for reciprocity (LMM3).
Appendix Figures

Model Prediction.
This figure compares the actual data with the predictions of the best-fitting model for adults (a) and adolescents (b). The x-axis represents the trial number, while the y-axis represents the mean cooperation probability across all participants. The shaded area indicates the 95% confidence interval.

Distributions of Estimated Parameters from the Best-Fitting Model.
Each panel displays one parameter. The histograms and their kernel fits are represented by color bars and curves, respectively. Red indicates participants in the adolescent sample, and blue denotes those in the adult sample. Parameters have been transformed into a log scale for enhanced visualization.

Parameter Recovery for the Best-Fitting Model.
Each panel represents one parameter. Each dot corresponds to one virtual participant. The value of r indicates Pearson’s correlation coefficient between the true values (estimated from the participants) and the recovered parameters.
Acknowledgements
This work was supported by the Scientific and Technological Innovation (STI) 2030-Major Projects (2021ZD0200500), the National Natural Science Foundation of China (32441109, 32271092, 32130045), the Beijing Major Science and Technology Project under Contract No. Z241100001324005, and the Opening Project of the State Key Laboratory of General Artificial Intelligence (SKLAGI20240P06). We thank Christian C. Ruff for his insightful discussion.
Additional information
Data and code availability
All datasets and code are available at https://doi.org/10.5281/zenodo.15046430.
References
- Rational cooperation in the finitely repeated prisoner’s dilemma: Experimental evidence. The Economic Journal 103:570–585. https://doi.org/10.2307/2234532
- The evolution of cooperation. Science 211:1390–1396. https://doi.org/10.1126/science.7466396
- Adult and adolescent social reciprocity: Experimental data from the Trust Game. Journal of Adolescence 35:1341–1349. https://doi.org/10.1016/j.adolescence.2012.05.004
- Is adolescence a sensitive period for sociocultural processing? Annual Review of Psychology 65:187–207. https://doi.org/10.1146/annurev-psych-010213-115202
- Understanding adolescence as a period of social–affective engagement and goal flexibility. Nature Reviews Neuroscience 13:636–650. https://doi.org/10.1038/nrn3313
- Thumbs up or thumbs down: neural processing of social feedback and links to social motivation in adolescent girls. Social Cognitive and Affective Neuroscience 18:nsac055. https://doi.org/10.1093/scan/nsac055
- But is helping you worth the risk? Defining prosocial risk taking in adolescence. Developmental Cognitive Neuroscience 25:260–271. https://doi.org/10.1016/j.dcn.2016.11.008
- Cooperation in the finitely repeated prisoner’s dilemma. The Quarterly Journal of Economics 133:509–551. https://doi.org/10.1093/qje/qjx033
- Effects of direct social experience on trust decisions and neural reward circuitry. Frontiers in Neuroscience 6:148. https://doi.org/10.3389/fnins.2012.00148
- Computational substrates of social value in interpersonal collaboration. Journal of Neuroscience 35:8170–8180. https://doi.org/10.1523/JNEUROSCI.4775-14.2015
- Computational models as aids to better reasoning in psychology. Current Directions in Psychological Science 19:329–335. https://doi.org/10.1177/0963721410386677
- The nature of human altruism. Nature 425:785–791. https://doi.org/10.1038/nature02043
- Default distrust? An fMRI investigation of the neural development of trust and cooperation. Social Cognitive and Affective Neuroscience 9:395–402. https://doi.org/10.1093/scan/nss144
- Biased belief updating and suboptimal choice in foraging decisions. Nature Communications 11:3417. https://doi.org/10.1038/s41467-020-16964-5
- Transition from reciprocal cooperation to persistent behaviour in social dilemmas at the end of adolescence. Nature Communications 5:4362. https://doi.org/10.1038/ncomms5362
- Universal norm psychology leads to societal diversity in prosocial behaviour and development. Nature Human Behaviour 4:36–44. https://doi.org/10.1038/s41562-019-0734-z
- Regression and time series model selection in small samples. Biometrika 76:297–307. https://doi.org/10.1093/biomet/76.2.297
- Age-dependent changes in intuitive and deliberative cooperation. Scientific Reports 13:4457. https://doi.org/10.1038/s41598-023-31691-9
- Five rules for the evolution of cooperation. Science 314:1560–1563. https://doi.org/10.1126/science.1133755
- The computational development of reinforcement learning during adolescence. PLoS Computational Biology 12:e1004953. https://doi.org/10.1371/journal.pcbi.1004953
- The development of self and identity in adolescence: Neural evidence and implications for a value-based choice perspective on motivated behavior. Child Development Perspectives 12:158–164. https://doi.org/10.1111/cdep.12279
- Bayesian model selection for group studies—revisited. Neuroimage 84:971–985. https://doi.org/10.1016/j.neuroimage.2013.08.065
- The neural correlates of theory of mind within interpersonal interactions. Neuroimage 22:1694–1703. https://doi.org/10.1016/j.neuroimage.2004.04.015
- A neural basis for social cooperation. Neuron 35:395–405. https://doi.org/10.1016/s0896-6273(02)00755-9
- Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory. eLife 11:e64620. https://doi.org/10.7554/eLife.64620
- The evolution of cooperation. The Quarterly Review of Biology 79:135–160. https://doi.org/10.1086/383541
- Cognitive and affective development in adolescence. Trends in Cognitive Sciences 9:69–74. https://doi.org/10.1016/j.tics.2004.12.005
- The effect of attachment and environmental manipulations on cooperative behavior in the prisoner’s dilemma game. PLOS ONE 13:e0205730. https://doi.org/10.1371/journal.pone.0205730
- Toddlers and infants expect individuals to refrain from helping an ingroup victim’s aggressor. Proceedings of the National Academy of Sciences 116:6025–6034. https://doi.org/10.1073/pnas.1817849116
- Changing Brains, Changing Perspectives: The Neurocognitive Development of Reciprocity. Psychological Science 22:60–70. https://doi.org/10.1177/0956797610391102
- Development of trust and reciprocity in adolescence. Cognitive Development 25:90–102. https://doi.org/10.1016/j.cogdev.2009.07.004
- Neural Correlates of Expected Risks and Returns in Risky Choice across Development. The Journal of Neuroscience 35:1549–1560. https://doi.org/10.1523/JNEUROSCI.1924-14.2015
- How children solve the two challenges of cooperation. Annual Review of Psychology 69:205–229. https://doi.org/10.1146/annurev-psych-122216-011813
- Developmental asymmetries in learning to adjust to cooperative and uncooperative environments. Scientific Reports 10:21761. https://doi.org/10.34894/Z1OYYA
- Dynamics of body time, social time and life history at adolescence. Nature 554:451–457. https://doi.org/10.1038/nature25750
Article and author information
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.106840. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Wu et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.