Experiment setup and behavioral results.

(a) Partner’s Cooperation Probability: In the first half of the 120 trials, the partner cooperated 78% of the time; in the second half, cooperation alternated between 20% and 80%. (b) Payoff Matrix: Payoffs are 4 for mutual cooperation, 2 for mutual defection, 0 for cooperation when the other defects, and 6 for defecting when the other cooperates. (c) Trial Illustration: After a 0.5-second fixation, participants choose a shape (triangle for cooperation, square for defection) within 4 seconds and see both players’ choices for 1.5 seconds. (d-e) Post hoc Comparisons: Figures 1d and 1e show the participants’ cooperation probability on the y-axis. The x-axis represents the consistency of the partner’s actions in previous trials (t-1: last trial, t-1,2: last two trials, t-1,2,3: last three trials). Large red (adolescents) and blue (adults) dots indicate mean probabilities, with black error bars for standard error (SE). Gray dots represent mean probabilities across trials, and green error bars show predicted cooperation rates with SE. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.
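The payoff structure in panel (b) can be written out as a small lookup table. The sketch below is illustrative only: the names `PAYOFF`, `payoff`, and `expected_payoff` are hypothetical, and the expected-value calculation assumes the partner cooperates with a fixed probability on a given trial.

```python
# Payoff to the participant, indexed by (own choice, partner's choice).
# "C" = cooperate (triangle), "D" = defect (square); values from panel (b).
PAYOFF = {
    ("C", "C"): 4,  # mutual cooperation
    ("D", "D"): 2,  # mutual defection
    ("C", "D"): 0,  # cooperating while the partner defects
    ("D", "C"): 6,  # defecting while the partner cooperates
}


def payoff(own: str, partner: str) -> int:
    """Return the participant's payoff for one trial."""
    return PAYOFF[(own, partner)]


def expected_payoff(own: str, p_coop: float) -> float:
    """Expected payoff of `own` if the partner cooperates with probability p_coop."""
    return p_coop * payoff(own, "C") + (1.0 - p_coop) * payoff(own, "D")
```

Under these payoffs, defecting yields an expected payoff exactly 2 points higher than cooperating at any partner cooperation probability (4p + 2 versus 4p), which is what makes the game a social dilemma.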

Computational modeling.

(a-b) Model comparisons for adolescents and adults, respectively. The y-axis represents model fit based on the Akaike Information Criterion with a correction for sample size (AICc) (Hurvich and Tsai, 1989). For each participant, the model with the lowest AICc served as the reference, and ΔAICc was computed by subtracting that value from the AICc of each model x (ΔAICc = AICc_x − AICc_lowest). A lower ΔAICc indicates a better model fit. Protected exceedance probability (PEP) is a group-level measure that assesses the likelihood of each model’s superiority over the others (Rigoux et al., 2014). (c) Model recovery analysis. Each model was used to generate 100 synthetic datasets, and model fitting and comparison were performed on each dataset. Each column corresponds to one generative model, and each row corresponds to one fitting model. The color in each cell indicates the probability that the synthetic datasets generated by the model in the column were best fit by the model in the row, with a darker color denoting a higher probability. (d-e) Model prediction. Sample illustration of the best-fitting model prediction versus data for adolescents and adults, respectively.
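As a rough illustration of the ΔAICc computation described in panels (a-b), the sketch below uses the standard small-sample correction, AICc = 2k − 2 ln L + 2k(k + 1) / (n − k − 1). The function names, the model labels, and the numerical values in the usage note are hypothetical, and it assumes n is the number of trials per participant.

```python
def aicc(log_lik: float, k: int, n: int) -> float:
    """AIC with small-sample correction.

    log_lik: maximized log-likelihood of the model
    k:       number of free parameters
    n:       number of observations (assumed here: trials per participant)
    """
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def delta_aicc(aicc_by_model: dict) -> dict:
    """Subtract the lowest AICc from every model's AICc for one participant."""
    best = min(aicc_by_model.values())
    return {model: value - best for model, value in aicc_by_model.items()}
```

For example, `delta_aicc({"M1": 130.0, "M8": 128.0})` returns 0.0 for the reference model M8 and 2.0 for M1, so the reference model always sits at zero and larger values indicate a worse fit.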

Learning Rates and Social Preferences.

(a-d) Comparison between adolescents and adults for positive learning rate (α+), negative learning rate (α−), social preference (ω), and inverse temperature (β), respectively. (e-h) Correlation between age and positive learning rate, negative learning rate, social preference, and inverse temperature, respectively. Notes: *p < 0.05; **p < 0.01.

Analysis of hidden variables from the best-fitting model.

(a-b) Post-hoc Comparison of LMM1: Interaction of group × previous trial × partner’s choice. The y-axis shows participants’ expectations of partner cooperation probability (p) from the best-fitting model. (c-d) Self-Reported Cooperativeness: Normalized scores on partner cooperativeness for two orders of partner cooperation probability, with adolescents (orange-red line) and adults (blue line). Scores were assessed on a 0-9 scale and normalized to 0-1. The dotted line indicates the presumed partner’s cooperation probability, with mean values and standard errors shown. (e-f) Post-hoc Comparison of LMM3: Interaction of group × previous trial × partner’s choice. The y-axis shows participants’ intrinsic reward for reciprocity (p × ω) from the best-fitting model. The x-axis represents the consistency of the partner’s actions in previous trials (t-1: last trial, t-1,2: last two trials, t-1,2,3: last three trials). Colored dots with error bars indicate mean values with standard errors for adolescents (orange-red) and adults (blue), while small gray dots represent individual participants. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.

Statistical results for cooperation decision (GLMM1).

Statistical results for partner cooperation expectation (LMM1).

Statistical results for self-reported perceived partner cooperativeness (LMM2).

Statistical results for intrinsic reward for reciprocity (LMM3).

Statistical results for cooperation decision with age (GLMMsup1).

Statistical results for partner cooperation expectation with age (LMMsup1).

Statistical results for self-reported perceived partner cooperativeness with age (LMMsup2).

Statistical results for intrinsic reward for reciprocity with age (LMMsup3).

Statistical results for cooperation decision with phase (GLMMsup2).

Statistical results for partner cooperation expectation with phase (LMMsup4).

Statistical results for intrinsic reward for reciprocity with phase (LMMsup5).

Statistical results for cooperation decision with SVO (GLMMsup3).

Statistical results for intrinsic reward for reciprocity with SVO (LMMsup6).

Statistical results for cooperation decision predicted by cooperation expectation (GLMMsup4).

Model comparison results for (a) adults and (b) adolescents, including the newly added M9 (Social Reward & Pearce–Hall learning).

Lower ΔAICc values indicate better model fits. The dynamic learning rate model (Model 9: Social Reward model with dynamic RL algorithm) did not outperform the best-fitting model (Model 8) in either group.

Convergence diagnostics for the hierarchical Bayesian model.

(a) Distribution of R̂ (Rhat) values across all model parameters. The majority of values are below the conservative convergence threshold of 1.01 (red dashed line), indicating stable and well-mixed MCMC chains. The gray shaded area highlights the region where . (b) Trace plots for the group-level parameters (four chains) in adolescents (left, red box) and adults (right, blue box). Each line represents the sampled posterior values of one chain across iterations, with overlapping traces and stable fluctuations confirming adequate convergence and mixing for all key parameters (α+, α−, ω, β).
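For readers unfamiliar with the diagnostic in panel (a), the sketch below computes the classic Gelman-Rubin potential scale reduction factor for one parameter from several chains. This is an assumption made for illustration: the caption does not say which variant was used, and current Bayesian software often reports a rank-normalized split version instead. The function name `gelman_rubin` is hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains: list) -> float:
    """Classic (non-split) R-hat for one parameter.

    chains: list of equally long lists of posterior draws, one list per chain.
    """
    n = len(chains[0])                      # draws per chain
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5
```

Values close to 1 indicate that the chains agree with one another. Chains that sit in different regions inflate the between-chain term and push the statistic well above a threshold such as 1.01.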

Model prediction.

This figure compares the actual data with the model predictions for adults (a) and adolescents (b). The x-axis represents the trial number, while the y-axis represents the mean cooperation probability of all participants. The shaded area indicates the 95% confidence interval.

Distributions of estimated parameters from the best-fitting model.

Each panel displays one parameter. The histograms and their kernel fits are represented by colored bars and curves, respectively. Red indicates participants in the adolescent sample, and blue denotes those in the adult sample. Parameters are shown on a log scale for easier visualization.

Parameter recovery for the best-fitting model.

Each panel represents one parameter. Each dot corresponds to one virtual participant. The value of r indicates Pearson’s correlation coefficient between the true values (estimated from the participants) and the recovered parameters.
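The recovery statistic r can be sketched as a plain Pearson correlation between the generating ("true") and refitted ("recovered") values of one parameter across virtual participants. The function name `pearson_r` and the numbers in the usage note are hypothetical and are not taken from the study.

```python
import math


def pearson_r(true_values: list, recovered_values: list) -> float:
    """Pearson correlation between true and recovered parameter values."""
    n = len(true_values)
    mean_t = sum(true_values) / n
    mean_r = sum(recovered_values) / n
    # Sum of cross-products and sums of squares around the means.
    s_tr = sum((t - mean_t) * (r - mean_r)
               for t, r in zip(true_values, recovered_values))
    s_tt = sum((t - mean_t) ** 2 for t in true_values)
    s_rr = sum((r - mean_r) ** 2 for r in recovered_values)
    return s_tr / math.sqrt(s_tt * s_rr)
```

Because r is invariant to linear rescaling, a call such as `pearson_r([0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8])` gives a correlation of 1 (up to floating-point error) even though every recovered value is twice the true one. A high r therefore shows that the ordering and spacing of the parameters is recovered, not necessarily their absolute values.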

Partial correlation matrices among parameters for the best-fitting model.

The upper-triangular cells show partial correlations for adults, and the lower-triangular cells show partial correlations for adolescents. Each cell shows the partial Pearson correlation coefficient (controlling for the other parameters). Colors range from green (negative) to violet (positive), with the color bar spanning [-1,1]. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.
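A partial correlation that controls for the remaining parameters can be obtained from the ordinary correlation matrix with the standard recursion formula. The sketch below is one possible way to compute it, not necessarily the procedure used for this figure. The function name `partial_corr` and the example matrix in the usage note are hypothetical.

```python
import math


def partial_corr(r: list, i: int, j: int, controls: tuple) -> float:
    """Partial correlation between variables i and j given `controls`.

    r:        full Pearson correlation matrix (list of lists)
    controls: indices of the variables to control for
    """
    if not controls:
        return r[i][j]
    k, rest = controls[0], controls[1:]
    # Remove the influence of variable k, after controlling for the rest.
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))
```

As a usage example, if two variables each correlate 0.8 with a third and 0.64 with each other, `partial_corr(R, 0, 1, (2,))` is approximately 0, meaning their association is fully accounted for by the third variable. With four parameters, each cell of the matrix would control for the other two indices.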

Group-level posterior distributions from the hierarchical Bayesian estimation for adolescents and adults.

Posterior densities are shown separately for adolescents (red) and adults (blue). Δ values indicate the posterior mean difference (Adult – Adolescent) with 95% credible intervals (CrI) and Bayesian p values. Compared with adolescents, adults exhibited higher positive learning rates (α+) and lower negative learning rates (α−), suggesting greater differentiation between learning from positive and negative feedback. Adults also showed lower inverse temperature (β), indicating more exploratory decision behavior, and higher social reward weight (ω), reflecting greater valuation of reciprocity or social outcomes. Notes: n.s. p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001.