Hypothetical relationship between internalizing psychopathology and learning.

To ensure survival, a hiker has to take into account the probability of encountering predators in the forest. Learning the probability that, for example, a bear will suddenly appear requires dynamically regulating the learning rate, which determines how much a prediction error (e.g., when the bear appears unexpectedly) changes the hiker's belief. a| On the one hand, risk due to random outcome variability (aleatoric uncertainty) calls for lower learning rates. For example, the hiker should remain cautious in a potentially dangerous area even if a bear does not appear every time. b| On the other hand, environmental changes that demand new learning (epistemic uncertainty) call for a high learning rate that substantially changes beliefs. That is, upon encountering a bear in a previously safe area, the hiker should become more cautious there. c-e| Hypothetical relationships between internalizing psychopathology and the modulation of learning. Green lines illustrate the learning rate simulated from a normative Bayesian model, the reduced Bayesian model (details in Methods, Reduced Bayesian model). For smaller prediction errors, which mainly reflect risk, the model uses a lower learning rate. For larger prediction errors, which signal environmental changes, the model adopts a higher learning rate. Pink lines illustrate hypothetical learning rates in individuals with internalizing psychopathology. c| H1: Higher overall learning rate, leading to overlearning from prediction errors. d| H2: Impaired learning about environmental changes, selectively affecting larger prediction errors. e| H3: No systematic relationship between learning rates and internalizing psychopathology. Graphics in (a) and (b) generated through Canva Magic Media.
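To make the three hypotheses concrete, the sketch below computes a reduced-Bayesian-style learning rate as a function of prediction-error magnitude and derives hypothetical H1 and H2 curves from it. It assumes Gaussian outcome noise, a uniform distribution of new predator locations over a linear outcome range, and a fixed relative uncertainty; all parameter values, the H1 offset, and the hypothetical `cpp_gain` dampening for H2 are illustrative choices, not the settings used for the figure.

```python
import numpy as np


def normal_pdf(x, sd):
    """Density of a zero-mean Gaussian with standard deviation `sd`."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def rbm_learning_rate(pe, hazard=0.1, noise_sd=20.0, ru=0.2,
                      outcome_range=300.0, cpp_gain=1.0):
    """Learning rate as a function of prediction error (illustrative sketch).

    Assumes Gaussian outcome noise, a uniform change-point distribution over
    `outcome_range`, and a fixed relative uncertainty `ru`. `cpp_gain` < 1 is a
    hypothetical way to blunt learning from change points (H2).
    """
    pe = np.asarray(pe, dtype=float)
    pred_sd = noise_sd / np.sqrt(1.0 - ru)          # outcome noise + belief uncertainty
    p_change = hazard / outcome_range                # likelihood under a change point
    p_stable = (1.0 - hazard) * normal_pdf(pe, pred_sd)
    cpp = p_change / (p_change + p_stable)           # change-point probability
    return ru + cpp_gain * cpp * (1.0 - ru)          # = CPP + (1 - CPP) * RU when gain = 1


pe = np.linspace(0.0, 150.0, 7)
normative = rbm_learning_rate(pe)                    # normative curve
h1 = np.minimum(1.0, normative + 0.2)                # H1: uniformly higher learning rate
h2 = rbm_learning_rate(pe, cpp_gain=0.5)             # H2: impaired learning from large errors
h3 = normative.copy()                                # H3: no systematic difference
```

Small prediction errors yield a learning rate close to the relative uncertainty, whereas large ones push the change-point probability, and hence the learning rate, toward 1.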

Correlation and factor analysis of questionnaire measures.

Because questionnaire measures were correlated, we applied a bi-factor analysis to extract three distinct factors: a general internalizing factor G, an anxiety-related factor F1, and a depression-related factor F2. a| Pearson correlation matrix showing the correlations between items of each questionnaire. b| Factor loadings of each questionnaire item on the general factor G and the two sub-factors F1 (anxiety related) and F2 (depression related). c| The correlation of factor scores with questionnaire sum scores validates the hierarchical 3-factor structure. Based on Cohen's effect size conventions, the general factor G shows moderate-to-large correlations with all questionnaires and captures variance common to both anxiety and depression, supporting the overall construct validity. F1 has moderate-to-large correlations with the anxiety-related scales MASQ-aa, STICSA-som and STICSA-cog. F2 shows moderate-to-large correlations with depression-related measures such as MASQ-ad and STAI-dep. Cohen's effect size conventions: r = 0.10 - small, r = 0.30 - moderate, r = 0.50 - large. STAI-S: Spielberger State-Trait Anxiety Inventory Y1; STAI-anx: Spielberger State-Trait Anxiety Inventory Y2 - anxiety subscale; STAI-dep: Spielberger State-Trait Anxiety Inventory Y2 - depression subscale; STICSA-cog: State-Trait Inventory for Cognitive and Somatic Anxiety - trait cognitive subscale; STICSA-som: State-Trait Inventory for Cognitive and Somatic Anxiety - trait somatic subscale; BDI: Beck's Depression Inventory; IUS-27: Intolerance of Uncertainty Scale; PSWQ: Penn State Worry Questionnaire; MASQ-ad: Mood and Anxiety Symptom Questionnaire - anhedonia subscale; MASQ-aa: Mood and Anxiety Symptom Questionnaire - anxious-arousal subscale.
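The validation in panel c amounts to computing Pearson correlations between factor scores and questionnaire sum scores and labeling their magnitude with Cohen's conventions. A minimal sketch, assuming factor scores and sum scores are already available as arrays; the function names, the synthetic data, and the "negligible" label for |r| < 0.10 are our illustrative choices:

```python
import numpy as np


def cohen_label(r):
    """Label a correlation by Cohen's conventions (0.10 small, 0.30 moderate, 0.50 large)."""
    a = abs(r)
    if a >= 0.50:
        return "large"
    if a >= 0.30:
        return "moderate"
    if a >= 0.10:
        return "small"
    return "negligible"


def validate_factor(factor_scores, sum_scores):
    """Pearson r (and Cohen label) between one factor and each questionnaire sum score."""
    result = {}
    for name, scores in sum_scores.items():
        r = float(np.corrcoef(factor_scores, scores)[0, 1])
        result[name] = (round(r, 2), cohen_label(r))
    return result


# Synthetic example: a general factor that drives two questionnaires to different degrees.
rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
sums = {"BDI": g + 0.5 * rng.standard_normal(1000),
        "PSWQ": 0.4 * g + rng.standard_normal(1000)}
print(validate_factor(g, sums))
```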

Predator task and reduced Bayesian model.

a| The goal of participants was to scare away as many attacking predators as possible. To do so, participants placed a flame in the predicted attack location of the predator. When the flame was placed accurately, the predator was scared away. However, when predictions were inaccurate, the predator caught the participants. On each trial, the participants first placed the flame in the expected predator position (Prediction). The predator then started the attack (Outcome), revealing a prediction error (difference between expected and actual location). Finally, participants updated the location of their flame (Update). b| The predator's average attack location (dashed line) is mostly stable but shifts occasionally at change points. Actual attack locations (outcomes, shown by black dots) are corrupted by random variability. We used a reduced Bayesian model with near-optimal learning performance. The model (pink line) calculates prediction errors (second panel) to update its predictions. The learning rate (LR) that controls the influence of prediction errors on belief updating is influenced by change-point probability (CPP) and relative uncertainty (RU). c| Histogram showing participants' responses to the question "How much did you want to avoid the predator?" (task motivation). d| Histogram showing participants' responses to the question "How anxious did the game make you feel?" (task-induced anxiety). e| Correlation between trait-anxiety scores (STICSA-T) and task-induced anxiety. f| Correlation between the general-factor score and task-induced anxiety. STICSA-T: State-Trait Inventory for Cognitive and Somatic Anxiety - trait subscale.
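How CPP and RU jointly set the learning rate can be illustrated with a trial-by-trial sketch. This is a simplified version under stated assumptions: a linear rather than circular outcome space, Gaussian outcome noise, new predator locations drawn uniformly over the outcome range, and a commonly used form of the relative-uncertainty update from the reduced-Bayesian-model literature. The initial belief, initial RU, hazard rate, and noise level are our assumptions, not the task's parameters.

```python
import numpy as np


def normal_pdf(x, sd):
    """Density of a zero-mean Gaussian with standard deviation `sd`."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def reduced_bayesian_model(outcomes, hazard=0.1, noise_sd=20.0, outcome_range=300.0):
    """Trial-by-trial predictions, CPP, RU and learning rates (linear, Gaussian sketch)."""
    n = len(outcomes)
    pred, cpp, ru, lr = (np.empty(n) for _ in range(4))
    belief, tau = outcome_range / 2.0, 0.5          # assumed initial belief and RU
    var = noise_sd ** 2
    for t, x in enumerate(outcomes):
        pred[t], ru[t] = belief, tau
        delta = x - belief                           # prediction error
        pred_sd = np.sqrt(var / (1.0 - tau))         # noise + belief uncertainty
        p_cp = hazard / outcome_range                # uniform likelihood after a change point
        p_stay = (1.0 - hazard) * normal_pdf(delta, pred_sd)
        omega = p_cp / (p_cp + p_stay)               # change-point probability (CPP)
        alpha = omega + (1.0 - omega) * tau          # learning rate
        cpp[t], lr[t] = omega, alpha
        belief += alpha * delta                      # delta-rule update
        # Relative uncertainty for the next trial.
        num = (omega * var + (1.0 - omega) * tau * var
               + omega * (1.0 - omega) * (delta * (1.0 - tau)) ** 2)
        tau = num / (num + var)
    return pred, cpp, ru, lr


# Example: the predator's mean jumps from 100 to 250 after 50 trials.
rng = np.random.default_rng(1)
means = np.r_[np.full(50, 100.0), np.full(50, 250.0)]
outcomes = means + rng.normal(0.0, 20.0, size=100)
pred, cpp, ru, lr = reduced_bayesian_model(outcomes)
```

At the change point, CPP approaches 1 and the model almost fully adopts the new outcome; during stable stretches, CPP stays low and RU shrinks, so the learning rate decreases.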

Behavioral results from the predator task.

The model-agnostic analysis did not reveal a significant association of internalizing psychopathology (general factor) with behavior in the task. Participants were split into low- and high-internalizing groups for illustration purposes. The low-internalizing group includes participants with a general-factor score below the mean, while the high-internalizing group includes those with a score above the mean. a| Learning rates plotted as a function of prediction errors, divided into 20 quantile bins; bins show the mean ± 95% confidence interval. The plot suggests a similar increase in the learning rate with increasing prediction error in both groups. b| Estimation errors (absolute difference between predator mean and participant prediction), used as a measure of performance in the task, were not found to differ significantly between the two groups. c| Analyzing estimation errors across the full spectrum of general-factor scores did not reveal any significant associations. d| Average single-trial learning rates were not found to differ significantly between the two groups. e| We did not find any significant association between average single-trial learning rates and general-factor scores. f| Participants' perseveration probability (repeating the previous prediction) did not differ significantly between the two groups. g| Average reaction times of participants did not differ significantly between the groups.
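The model-agnostic measures above can be computed from raw predictions and outcomes. A minimal sketch, assuming a linear outcome space (the predator task is circular, so real analyses would wrap angular differences) and defining the single-trial learning rate as update divided by prediction error; all function names are hypothetical:

```python
import numpy as np


def single_trial_learning_rates(predictions, outcomes):
    """Prediction error on trial t and learning rate = update / prediction error."""
    predictions = np.asarray(predictions, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    pe = outcomes[:-1] - predictions[:-1]            # prediction error on trial t
    update = predictions[1:] - predictions[:-1]      # change of the flame position
    with np.errstate(divide="ignore", invalid="ignore"):
        lr = np.where(pe != 0, update / pe, np.nan)
    return pe, lr


def binned_learning_rates(pe, lr, n_bins=20):
    """Mean learning rate within quantile bins of |prediction error|."""
    ok = ~np.isnan(lr)
    abs_pe, lr = np.abs(pe[ok]), lr[ok]
    edges = np.quantile(abs_pe, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, abs_pe, side="right") - 1, 0, n_bins - 1)
    return np.array([lr[bins == b].mean() for b in range(n_bins)])


def estimation_error(predictions, true_means):
    """Mean absolute difference between predator mean and participant prediction."""
    return float(np.mean(np.abs(np.asarray(predictions) - np.asarray(true_means))))


def perseveration_probability(predictions):
    """Proportion of trials on which the previous prediction was repeated."""
    return float(np.mean(np.diff(np.asarray(predictions, dtype=float)) == 0))


def high_internalizing(g_scores):
    """Illustrative group split: True if the general-factor score exceeds the mean."""
    g_scores = np.asarray(g_scores, dtype=float)
    return g_scores > g_scores.mean()
```

For a simulated participant who always moves the flame halfway toward the outcome, every single-trial learning rate (and every bin) equals 0.5.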

Regression using the reduced Bayesian model reveals no significant effect of internalizing on learning rates in the predator task.

a| Illustration of the fixed and adaptive learning rate (LR) using example data from one participant. Each gray dot represents a single trial. The fixed learning rate represents a heuristic learning strategy through which the prediction error has a constant effect on the belief update, i.e., there is a linear relationship between update (difference between current and previous flame location) and prediction error. The adaptive learning rate reflects an approximately normative learning strategy, i.e., a non-linear relationship where larger prediction errors increase the learning rate to adaptively respond to change points. Human participants typically show a mixture of fixed and adaptive learning rates. b| The low- and high-internalizing groups used fixed learning rates to update their beliefs. However, we did not find significant group differences. c| A regression analysis did not yield a significant association between internalizing and fixed learning rates (controlling for age and gender). d| Participants also relied on adaptive learning rates, taking into account the underlying environmental dynamics. However, the groups did not differ significantly. e| Regression analysis of adaptive learning rates similarly did not indicate a significant relationship with internalizing (controlling for age and gender).
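One way to separate fixed and adaptive learning rates is a per-participant linear regression of updates on prediction errors. The sketch assumes the form update = b0 + b1·PE + b2·(PE × normative learning rate), where b1 is read as the fixed and b2 as the adaptive learning rate; the full model described in the model comparison additionally includes valence, hazard-rate, and variability terms, which are omitted here. The second function illustrates the group-level regression on the general factor controlling for age and gender. Function names and simulated values are hypothetical.

```python
import numpy as np


def fit_update_regression(pe, update, model_lr):
    """OLS fit of update ~ 1 + PE + PE * model_lr.

    Returns (intercept, fixed_lr, adaptive_lr); `model_lr` stands in for the
    normative, trial-by-trial learning rate of the reduced Bayesian model.
    """
    X = np.column_stack([np.ones_like(pe), pe, pe * model_lr])
    coefs, *_ = np.linalg.lstsq(X, update, rcond=None)
    return coefs


def internalizing_effect(param, g_score, age, gender):
    """Coefficient of the general factor when predicting a learning-rate parameter,
    controlling for age and gender (gender coded numerically)."""
    X = np.column_stack([np.ones_like(g_score), g_score, age, gender])
    coefs, *_ = np.linalg.lstsq(X, param, rcond=None)
    return coefs[1]


# Example participant with a fixed LR of 0.3 and an adaptive LR of 0.5.
rng = np.random.default_rng(2)
pe = rng.normal(0.0, 50.0, 400)
model_lr = rng.uniform(0.1, 1.0, 400)
update = 0.3 * pe + 0.5 * pe * model_lr + rng.normal(0.0, 2.0, 400)
intercept, fixed_lr, adaptive_lr = fit_update_regression(pe, update, model_lr)
```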

Results from the laboratory study comparing electric shocks and screams as aversive stimuli in the predator task.

This study examined the effects of screams and electric shocks as aversive stimuli. a| We did not find a significant difference in estimation errors between blocks with screams and shocks. b| Analyzing single-trial learning rates similarly did not reveal a significant difference between the two types of aversive stimuli. c| Analyzing fixed learning rates based on our regression model did not reveal a significant association between fixed learning rates and internalizing psychopathology for screams or shocks. d| Similarly, analyzing adaptive learning rates based on the regression model did not yield significant associations with internalizing psychopathology for either type of stimulus. e| Baseline-corrected time course of the mean ± standard error of the mean (SEM) skin conductance response (SCR) for successful and failed trials. SCR was significantly higher for failed trials compared to successful trials. The horizontal lines at the bottom of the plot show time points where SCRs for failed trials (pink) were significantly different from zero after permutation testing. The gray line shows the significant differences between the two conditions. f| For both screams (green) and shocks (pink), the SCR was significantly different from zero. Additionally, SCRs for screams were significantly lower than for shocks (gray line).
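The caption reports time points that differ from zero after permutation testing but does not specify the scheme. One common approach, sketched here as an assumption, is a sign-flip permutation test across participants with a max-|t| correction over time points. A difference between conditions (e.g., failed minus successful trials) can be tested by applying the same function to per-participant difference curves. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def one_sample_t(data):
    """t-statistic against zero at every time point (data: participants x time)."""
    n = data.shape[0]
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))


def sign_flip_test(data, n_perm=1000, alpha=0.05, seed=0):
    """Boolean mask of time points whose mean differs from zero.

    Each permutation flips the sign of whole participant time courses; the
    maximum |t| across time points builds a null distribution that controls
    the family-wise error rate.
    """
    rng = np.random.default_rng(seed)
    t_obs = one_sample_t(data)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(data.shape[0], 1))
        max_null[i] = np.abs(one_sample_t(data * signs)).max()
    threshold = np.quantile(max_null, 1.0 - alpha)
    return np.abs(t_obs) > threshold


# Synthetic SCRs: 30 participants, 100 samples, a response between samples 50 and 70.
rng = np.random.default_rng(3)
scr = rng.normal(0.0, 1.0, size=(30, 100))
scr[:, 50:70] += 1.5
significant = sign_flip_test(scr, n_perm=500)
```

Cluster-based permutation tests are a frequent alternative that gains sensitivity for temporally extended responses.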

Probabilistic reversal learning task.

a| The participants’ goal in the task was to maximize rewards. They chose between two fractals on each trial, each associated with a different reward probability. Only one fractal resulted in a reward per trial, and participants were required to learn the reward probability of each fractal. b| The task comprised stable and volatile phases. During the stable phase, one fractal had a 75% reward probability and the other 25%. During the volatile phase, reward probabilities switched between 80% and 20% every 20 trials. c| From the perspective of a volatile Kalman filter (VKF), learning in the task involves tracking environmental volatility (top), which dynamically adjusts the learning rate (middle). In the stable phase, low volatility results in a lower learning rate, leading to stable reward probability estimates. In the volatile phase, volatility increases after each reversal, raising the learning rate and enabling faster adaptation to shifting reward contingencies (bottom). In the bottom panel, the blue line represents the predicted reward probability of fractal 2, the black dashed line denotes the true reward probability, and the green circles indicate actual outcomes. VKF simulation parameters: λ = 0.1, v0 = 0.1, ω = 0.05.
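The VKF dynamics in panel c can be sketched for binary outcomes. This is a minimal sketch assuming the binary VKF update equations as we understand them from the original VKF formulation: the mean is tracked on the logit scale, the learning rate is the square root of posterior variance plus volatility, and volatility is updated from the change in the mean. The defaults match the simulation parameters in the caption (λ = 0.1, v0 = 0.1, ω = 0.05); the reward schedule and phase lengths are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def binary_vkf(outcomes, lam=0.1, v0=0.1, omega=0.05):
    """Binary volatile Kalman filter sketch.

    Returns the predicted reward probability, learning rate and volatility per trial.
    """
    m, w, v = 0.0, omega, v0          # mean (logit scale), posterior variance, volatility
    probs, rates, vols = [], [], []
    for o in outcomes:
        probs.append(sigmoid(m))
        vols.append(v)
        m_prev, w_prev = m, w
        k = (w + v) / (w + v + omega)                 # Kalman gain
        alpha = np.sqrt(w + v)                        # learning rate for binary outcomes
        m = m + alpha * (o - sigmoid(m))              # prediction-error update of the mean
        w = (1.0 - k) * (w + v)                       # posterior variance
        w_cov = (1.0 - k) * w_prev                    # covariance of successive means
        v = v + lam * ((m - m_prev) ** 2 + w + w_prev - 2.0 * w_cov - v)  # volatility update
        rates.append(alpha)
    return np.array(probs), np.array(rates), np.array(vols)


# Hypothetical schedule for fractal 2: 160 stable trials, then reversals every 20 trials.
rng = np.random.default_rng(4)
p_reward = np.r_[np.full(160, 0.25), np.tile(np.r_[np.full(20, 0.8), np.full(20, 0.2)], 4)]
outcomes = (rng.random(p_reward.size) < p_reward).astype(float)
probs, rates, vols = binary_vkf(outcomes)
```

Large changes in the estimated mean after a reversal increase volatility, which in turn raises the learning rate on subsequent trials.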

Results from the reversal learning task.

a| We did not find a significant difference in the probability of choosing the correct fractal between the high- and low-internalizing groups in either the stable or the volatile phase. b| Posterior distributions of the learning-rate mean parameters µ0, with error bars representing the mean and 95% highest-density interval (HDI) of each distribution. We found significant effects of outcome valence (good vs. bad) and task phase (volatile vs. stable). c| Posterior means and 95% HDIs for the effect of the general factor on learning rates. No significant effect of the general factor was observed, as all HDIs included zero. d,e| Learning rates depending on outcome valence and task phase plotted separately for the low-internalizing group (d) and high-internalizing group (e). Small markers show individual participant learning rates, while larger markers represent the average across participants, with error bars showing the standard deviation.
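The significance criterion used here (an effect is significant if its 95% HDI excludes zero) can be computed directly from posterior samples. A minimal sketch using the narrowest-interval definition of the HDI; function names and the synthetic posterior samples are hypothetical:

```python
import numpy as np


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    width = int(np.ceil(mass * n))          # samples inside each candidate interval
    lows, highs = x[: n - width + 1], x[width - 1:]
    i = int(np.argmin(highs - lows))        # narrowest candidate
    return float(lows[i]), float(highs[i])


def hdi_excludes_zero(samples, mass=0.95):
    """Significance criterion: the HDI does not contain zero."""
    lo, hi = hdi(samples, mass)
    return lo > 0.0 or hi < 0.0


rng = np.random.default_rng(5)
effect = rng.normal(0.5, 0.1, 20000)        # e.g. posterior samples of a valence effect
null_like = rng.normal(0.0, 0.1, 20000)     # e.g. posterior samples of a general-factor effect
```

For a symmetric, unimodal posterior the HDI coincides with the equal-tailed interval; the two differ for skewed posteriors.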

Basic task and demographic details of all conducted experiments.

Predator Task*: Version of predator task with differing variability and hazard-rate conditions; results of this task version are reported in the main analysis.

Scree plot of eigenvalues and parallel analysis results.

The plot shows the eigenvalues of our questionnaire data, ordered in descending magnitude. The red dotted lines represent thresholds derived from parallel analyses, with a factor considered viable if its corresponding eigenvalue lies above the threshold. Using a randomly generated normal dataset, the parallel analysis yields eigenvalue thresholds close to 1 due to the large sample size, suggesting an unrealistic number of 48 factors. In contrast, parallel analysis with resampled data provides higher thresholds, identifying only three eigenvalues above the threshold. This supports a three-factor structure consistent with prior research (Clark & Watson, 1991; Gagne et al., 2020).
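Both variants of parallel analysis compare observed eigenvalues against eigenvalues from uncorrelated reference data. A minimal sketch, assuming the 95th percentile of the reference eigenvalues as the threshold and, for the resampled variant, that each item column is resampled independently with replacement (our assumption about the exact scheme). Function names and the synthetic three-factor data are hypothetical.

```python
import numpy as np


def eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data, n_iter=100, quantile=0.95, method="resample", seed=0):
    """Number of leading eigenvalues exceeding those of uncorrelated reference data.

    method="normal": reference data drawn from a standard normal distribution.
    method="resample": each item column resampled independently with replacement,
    which keeps item distributions but removes correlations (assumed scheme).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    sims = np.empty((n_iter, p))
    for i in range(n_iter):
        if method == "normal":
            ref = rng.standard_normal((n, p))
        else:
            ref = np.column_stack([rng.choice(data[:, j], size=n, replace=True)
                                   for j in range(p)])
        sims[i] = eigenvalues(ref)
    thresholds = np.quantile(sims, quantile, axis=0)
    observed = eigenvalues(data)
    above = observed > thresholds
    n_factors = p if above.all() else int(np.argmin(above))  # first eigenvalue below threshold
    return observed, thresholds, n_factors


# Synthetic data: 12 items, 4 per factor, three uncorrelated factors.
rng = np.random.default_rng(6)
factors = rng.standard_normal((1000, 3))
loadings = np.kron(np.eye(3), np.full((4, 1), 0.7))
items = factors @ loadings.T + rng.normal(0.0, 0.7, (1000, 12))
```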

Comparison of factor scores from our analysis with those derived using loadings from Gagne et al. (2020).

G represents the internalizing factor, Dep represents the depression-related factor, and Anx represents the anxiety-related factor. a-c| Correlation of our factor scores (x-axes) with scores derived using loadings from the clinical dataset (y-axes): a| General-factor scores exhibit large correlations. b| Depression-related scores also exhibit strong correlations. c| The anxiety-related factor shows a negligible association with the scores derived using the clinical dataset. d-f| Correlation of our factor scores (x-axes) with scores derived using loadings from the independent online sample (y-axes). d| General-factor scores reveal large correlations. e| Similarly, we observed large correlations for the depression-related factor. f| The anxiety-related scores show a large association with the scores derived using the online dataset.
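Scores based on loadings from another dataset can be obtained by applying those loadings to our standardized item responses. The caption does not state the scoring method; the sketch assumes the regression (Thurstone) method for orthogonal factors, as in a bi-factor solution, with weights R⁻¹Λ, where R is our item correlation matrix and Λ the external loading matrix. The function name and synthetic data are hypothetical.

```python
import numpy as np


def regression_factor_scores(items, loadings):
    """Thurstone (regression) factor scores, assuming orthogonal factors.

    items: participants x items; loadings: items x factors (e.g. taken from an
    external sample). Weights are R^-1 @ loadings with R the item correlations.
    """
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    r = np.corrcoef(items, rowvar=False)
    weights = np.linalg.solve(r, loadings)
    return z @ weights


# Synthetic one-factor example with 10 items loading 0.7.
rng = np.random.default_rng(7)
latent = rng.standard_normal(2000)
loadings = np.full((10, 1), 0.7)
items = latent[:, None] * 0.7 + rng.normal(0.0, np.sqrt(0.51), (2000, 10))
scores = regression_factor_scores(items, loadings)
```

Correlating such scores with the scores from our own loadings yields comparisons of the kind shown in panels a-f.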

Participant performance in the training quiz for the predator-task version with different conditions (N = 554).

Participants demonstrated a good understanding of the predator task, with 392 participants answering all three questions correctly, 137 participants answering two correctly, and 24 participants answering only one correctly. Only one participant failed to answer any question correctly.

Basic demographic details of participants who completed the predator task version with different variability and hazard-rate conditions.

STAI-Y1: Spielberger State-Trait Anxiety Inventory - state scale; STAI-Y2: Spielberger State-Trait Anxiety Inventory - trait scale; STICSA-T: State-Trait Inventory for Cognitive and Somatic Anxiety - trait scale; IUS-27: Intolerance of Uncertainty Scale; BDI: Beck's Depression Inventory; PSWQ: Penn State Worry Questionnaire; MASQ: Mood and Anxiety Symptom Questionnaire; MASQ-Anh: Mood and Anxiety Symptom Questionnaire - anhedonia subscale; MASQ-AnxAr: Mood and Anxiety Symptom Questionnaire - anxious arousal subscale

Estimation errors across variability and hazard-rate conditions in the predator task.

Across all four experiments using the predator task with differing variability and hazard-rate conditions (a-d), participants exhibited lower estimation errors in low-variability conditions compared to high-variability conditions. Similarly, high-hazard-rate conditions consistently led to higher estimation errors compared to low-hazard-rate conditions across tasks.

Effect of internalizing on estimation errors and empirical learning rates across variability and hazard-rate conditions in the predator task.

a-d| Participants were categorized into low- and high-internalizing groups based on their general-factor scores. We did not find any significant differences in estimation errors between groups in any condition: low variability, low hazard rate (a); high variability, low hazard rate (b); low variability, high hazard rate (c); or high variability, high hazard rate (d). e-h| Regression analysis of estimation errors against internalizing did not reveal any significant associations across conditions. i-l| Similarly, empirical learning rates were not found to differ significantly between internalizing groups across all variability and hazard-rate conditions. m-p| Regression analyses also did not show any significant associations between internalizing and learning rates across conditions. All p-values were corrected for multiple comparisons using false discovery rate correction.
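Assuming the false discovery rate correction is the standard Benjamini-Hochberg procedure, adjusted p-values can be computed as follows (the function name is ours):

```python
import numpy as np


def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)             # p_(i) * m / i
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]    # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


adjusted = fdr_bh([0.01, 0.04, 0.03, 0.2])
```

An effect is then reported as significant if its adjusted p-value is below the chosen level (e.g., 0.05).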

Model comparison for the linear regression using the reduced Bayesian model.

We evaluated multiple regression models, including those with a single fixed learning rate, a fixed learning rate with valence, a fixed and adaptive learning rate, and a combination of all three. All models included parameters accounting for hazard-rate level, stochasticity, and their interaction. The best-fitting model was the full model with a fixed learning rate, an adaptive learning rate, and valence, which achieved the lowest summed Bayesian information criterion (BIC) score.
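Summed BIC comparisons can be sketched for two of the candidate models. This simplified version assumes Gaussian residuals, uses the form n·ln(RSS/n) + k·ln(n) with constants dropped (conventions for counting the residual variance differ), and omits the valence, hazard-rate, and stochasticity terms. Function names and simulated data are hypothetical.

```python
import numpy as np


def ols_bic(X, y):
    """BIC of a Gaussian linear regression: n * ln(RSS / n) + k * ln(n)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def summed_bic(subjects, design):
    """Sum BIC over participants; `design` maps (pe, model_lr) to a design matrix."""
    return sum(ols_bic(design(pe, lr), upd) for pe, lr, upd in subjects)


def fixed_only(pe, lr):
    return np.column_stack([np.ones_like(pe), pe])


def fixed_adaptive(pe, lr):
    return np.column_stack([np.ones_like(pe), pe, pe * lr])


rng = np.random.default_rng(8)
subjects = []
for _ in range(5):
    pe = rng.normal(0.0, 40.0, 200)
    lr = rng.uniform(0.1, 1.0, 200)
    subjects.append((pe, lr, 0.2 * pe + 0.4 * pe * lr + rng.normal(0.0, 5.0, 200)))
```

The model with the lowest summed BIC is preferred; the ln(n) penalty guards against adding terms that improve fit only marginally.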

Parameter recovery analysis for the winning regression model.

To evaluate parameter identifiability, we simulated five datasets using subject-specific parameter values from the winning model. The model was then applied to each dataset, and Spearman’s rank correlation was computed between the ground truth and recovered parameter values. The figure presents results from one example dataset, where each panel corresponds to a model parameter (x-axis: ground truth values, y-axis: recovered values). Across all five simulated datasets, the average Spearman correlation was ρ = 0.93, highlighting excellent parameter recoverability.
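The recovery procedure can be sketched for the fixed and adaptive learning rates. Whereas the analysis used subject-specific parameter values from the winning model, this illustration draws ground-truth values from uniform distributions and uses a random stand-in for the normative learning rate, so it shows the procedure only. Spearman's ρ is computed from ranks without tie correction; all names are hypothetical.

```python
import numpy as np


def spearman(a, b):
    """Spearman's rho via Pearson correlation of ranks (no tie correction)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def parameter_recovery(n_subjects=60, n_trials=200, noise_sd=5.0, seed=9):
    """Simulate updates from known fixed/adaptive LRs, refit, and correlate."""
    rng = np.random.default_rng(seed)
    truth = np.column_stack([rng.uniform(0.1, 0.6, n_subjects),    # fixed LR
                             rng.uniform(0.0, 0.6, n_subjects)])   # adaptive LR
    recovered = np.empty_like(truth)
    for s, (fixed, adaptive) in enumerate(truth):
        pe = rng.normal(0.0, 40.0, n_trials)
        model_lr = rng.uniform(0.1, 1.0, n_trials)   # stand-in for the normative LR
        update = fixed * pe + adaptive * pe * model_lr + rng.normal(0.0, noise_sd, n_trials)
        X = np.column_stack([np.ones(n_trials), pe, pe * model_lr])
        coefs, *_ = np.linalg.lstsq(X, update, rcond=None)
        recovered[s] = coefs[1:]
    return [spearman(truth[:, j], recovered[:, j]) for j in range(truth.shape[1])]


rhos = parameter_recovery()
```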

Split-half reliability of model parameters in the linear regression model.

We assessed split-half reliability by dividing the data into odd and even trials, fitting the linear regression model to each subset, and calculating Spearman’s rank correlation ρ between the resulting model parameters. This yielded moderate-to-good correlations, indicating reliable model parameters, in particular for the fixed learning rate.
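A split-half sketch under the same simplifying assumptions as the regression illustration above (single regression of update on PE and PE × normative learning rate, simulated participants); alternating trials form the two halves. Names and data are hypothetical.

```python
import numpy as np


def spearman(a, b):
    """Spearman's rho via Pearson correlation of ranks (no tie correction)."""
    return float(np.corrcoef(np.argsort(np.argsort(a)), np.argsort(np.argsort(b)))[0, 1])


def fit_lrs(pe, update, model_lr):
    """Fixed and adaptive LR from OLS of update ~ 1 + PE + PE * model_lr."""
    X = np.column_stack([np.ones_like(pe), pe, pe * model_lr])
    return np.linalg.lstsq(X, update, rcond=None)[0][1:]


def split_half_reliability(subjects):
    """Spearman rho between parameters fitted to odd and to even trials."""
    odd = np.array([fit_lrs(pe[0::2], upd[0::2], lr[0::2]) for pe, lr, upd in subjects])
    even = np.array([fit_lrs(pe[1::2], upd[1::2], lr[1::2]) for pe, lr, upd in subjects])
    return [spearman(odd[:, j], even[:, j]) for j in range(odd.shape[1])]


rng = np.random.default_rng(10)
subjects = []
for _ in range(50):
    fixed, adaptive = rng.uniform(0.1, 0.6), rng.uniform(0.0, 0.6)
    pe = rng.normal(0.0, 40.0, 200)
    lr = rng.uniform(0.1, 1.0, 200)
    subjects.append((pe, lr, fixed * pe + adaptive * pe * lr + rng.normal(0.0, 5.0, 200)))
rhos = split_half_reliability(subjects)
```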

Estimates from the regression model predicting learning-related parameters from the factor scores derived from our factor analysis while controlling for age and gender.

All p-values are corrected for multiple comparisons using false discovery rate correction. G = General-Factor Score (Internalizing), F1 = Anxiety-related factor, F2 = Depression-related factor.

Model parameters from the linear regression using the reduced Bayesian model did not reveal any significant effects of internalizing on learning rates in the predator task.

a| Boxplots of model parameters across participants indicate the use of both fixed and adaptive learning rates to facilitate learning. b-g| We did not find any significant associations between internalizing and the fixed learning rate (b), adaptive learning rate (c), valence (d), hazard-rate levels (e), variability levels (f), or the interaction between variability and hazard-rate levels (g). All p-values were FDR-corrected.

Regression of model parameters against general-factor scores across experiments.

We did not find any significant associations between internalizing psychopathology and fixed or adaptive learning rates (LR) across all four experiments. a| In experiment 1, participants completed a predator-task version with different conditions. Neither fixed nor adaptive learning rates were found to be significantly associated with internalizing. b| In experiment 2, participants completed the predator task along with the probabilistic reversal learning task. Fixed and adaptive learning rates did not reveal any significant associations with internalizing in the predator task. c| In experiment 3, participants completed the predator task and a probabilistic reversal learning task with reward magnitudes. We did not find any significant associations between internalizing and fixed or adaptive LRs. d| In experiment 4, participants completed the predator task along with a probabilistic reversal learning task incorporating both reward and loss conditions. Fixed and adaptive LRs in the predator task were not found to be significantly associated with internalizing.

Regression of model parameters against questionnaire scores and associated subscales, controlling for age and gender.

a-k| Association between fixed learning rate (LR) and questionnaire scores. No significant associations were observed between fixed LR and any questionnaire or subscale. l-v| Association between adaptive LR and questionnaire scores. The only significant association was a decrease in adaptive LR with increasing scores on the MASQ-AnxAr subscale. All p-values are corrected based on the false discovery rate. STAI-Y1: Spielberger State-Trait Anxiety Inventory State Scale; STAI-Y2: Spielberger State-Trait Anxiety Inventory Trait Scale; STICSA-T: State-Trait Inventory for Cognitive and Somatic Anxiety - Trait scale; IUS-27: Intolerance of Uncertainty Scale; BDI: Beck's Depression Inventory; PSWQ: Penn State Worry Questionnaire; MASQ: Mood and Anxiety Symptom Questionnaire; STICSA-cognitive: State-Trait Inventory for Cognitive and Somatic Anxiety - Trait cognitive subscale; STICSA-somatic: State-Trait Inventory for Cognitive and Somatic Anxiety - Trait somatic subscale; MASQ-Anh: Mood and Anxiety Symptom Questionnaire - anhedonia subscale; MASQ-AnxAr: Mood and Anxiety Symptom Questionnaire - anxious arousal subscale

Results from the predator-task version with minimal training phase.

Apart from adaptive learning rates, descriptive and model-based behavioral results were not found to be significantly associated with internalizing (general-factor scores). a| Estimation errors did not show significant associations with internalizing. b| Single-trial learning rates indicate that participants exhibited higher overall learning rates, but these were not found to be significantly associated with internalizing. c| Regression of the fixed learning rate on internalizing did not reveal a significant association. However, we observed a significant positive relationship between internalizing and adaptive learning rates, suggesting that individuals with higher internalizing symptoms relied more strongly on adaptive learning rates. All p-values are corrected for multiple comparisons based on the false discovery rate.

Results from the predator-task version with full training and a single variability and hazard-rate condition.

Descriptive and model-based parameters were not found to be significantly associated with internalizing (general-factor scores) in the predator-task version featuring only one variability and hazard-rate condition (low variability, low hazard rate). a| Estimation errors in this task version were not found to be significantly associated with internalizing. b| Single-trial learning rates did not reveal significant associations between learning and internalizing. c| Regressing the model parameters (estimated with the reduced Bayesian model) on internalizing did not reveal any significant associations. All p-values of this regression are FDR-corrected.

Results from the in-person studies with the predator task.

a| In experiment 7, skin conductance response (SCR) significantly increased following outcome onset for failed trials (pink curve), whereas we did not observe a significant increase for successful trials (green curve). b| Further partitioning of failed trials into shock and scream phases revealed a significant SCR increase for both electric shocks (pink curve) and screams (green curve) after outcome onset. Additionally, SCR was significantly higher for shocks compared to screams in experiment 7 (horizontal gray line). c| Replicating the pattern observed in experiment 7, experiment 8 also showed a significant SCR increase for failed trials after outcome onset, while successful trials did not exhibit a significant increase. d| Similarly, in experiment 8, failed trials showed a significant SCR increase for both electric shocks (pink curve) and screams (green curve) after outcome onset. However, unlike in experiment 7, SCR for shocks was not significantly different from screams based on permutation testing. Horizontal lines in each plot indicate significant SCR regions identified through permutation testing: the pink line corresponds to the pink curve, the green line corresponds to the green curve, and the gray line represents the region of significant differences between conditions.

Basic demographic details of participants who completed the binary reversal learning task without reward magnitudes.

STAI-Y1: Spielberger State-Trait Anxiety Inventory - state scale; STAI-Y2: Spielberger State-Trait Anxiety Inventory - trait scale; STICSA-T: State-Trait Inventory for Cognitive and Somatic Anxiety - trait scale; IUS-27: Intolerance of Uncertainty Scale; BDI: Beck's Depression Inventory; PSWQ: Penn State Worry Questionnaire; MASQ: Mood and Anxiety Symptom Questionnaire; MASQ-Anh: Mood and Anxiety Symptom Questionnaire - anhedonia subscale; MASQ-AnxAr: Mood and Anxiety Symptom Questionnaire - anxious arousal subscale

Performance and switch rates in the probabilistic reversal learning task.

We analyzed performance and switch rates (proportion of trials in which participants chose a different fractal compared to their previous choice) by comparing participants with low and high general-factor scores. a| Performance, calculated as the proportion of trials in which participants chose the highly rewarding fractal, was not found to be significantly different between low- and high-internalizing groups in either the stable or volatile task phases. b| The percentage of switches following a win was not found to be significantly different between low- and high-internalizing groups. c| While participants switched more frequently after losses than after wins, switch rates after losses were not found to be significantly different between the groups. d| Performance was not found to be significantly associated with internalizing in either the stable or volatile phase. e,f| In neither stable nor volatile task phase did we find a significant association of internalizing with switch rates after wins (e) or switch rates after losses (f). All regression p-values are corrected based on the false discovery rate.
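Switch rates and performance as defined above can be computed from choice and reward sequences. A minimal sketch with hypothetical function names; per-phase values follow by applying the functions to the trials of each phase separately:

```python
import numpy as np


def switch_rates(choices, rewards):
    """P(switch | previous win) and P(switch | previous loss)."""
    choices, rewards = np.asarray(choices), np.asarray(rewards)
    switched = choices[1:] != choices[:-1]         # choice differs from previous trial
    after_win = rewards[:-1] == 1
    return float(switched[after_win].mean()), float(switched[~after_win].mean())


def performance(choices, best_option):
    """Proportion of trials on which the more rewarding fractal was chosen."""
    return float(np.mean(np.asarray(choices) == np.asarray(best_option)))


choices = [0, 0, 1, 1, 0]
rewards = [1, 0, 1, 0, 0]
win_switch, loss_switch = switch_rates(choices, rewards)
```

In this toy sequence, the participant never switches after a win and always switches after a loss, i.e., a pure win-stay/lose-shift pattern.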

Model comparison for the probabilistic reversal learning task.

α: Learning-rate parameter, β: Inverse temperature, δ: Decay parameter. The subscript ck represents the choice-kernel parameter. Superscripts indicate parameter components:

sv: Baseline component and a task phase-dependent component (stable vs. volatile).

gb: Component encoding outcome valence (good vs. bad).

sv·gb: Interaction between task phase and outcome valence.

PSIS-LOO: Pareto-smoothed importance-sampling-based approximation of leave-one-out cross-validation.

SE: standard error.

Parameter recovery analysis for the winning model (Model 6).

To assess parameter identifiability, we simulated five datasets using subject-specific posterior means from the winning model. The model was then fitted to each dataset, and Spearman’s rank correlation was computed between the ground truth and recovered parameter values. The figure shows results for one example dataset, with each panel representing a model parameter (x-axis: ground truth parameter values, y-axis: recovered values). The average Spearman correlation between the ground truth and recovered parameters across all 5 simulated datasets was ρ = 0.68.

Posterior predictive checks for the winning model.

To assess how well the winning model (Model 6) captured key qualitative features of choice behavior, we generated 500 simulated choice datasets per participant by drawing random samples from each participant’s joint posterior distribution (320 trials per dataset). a-b| Spearman correlations between the average number of switches across simulations and actual participant switches in the stable (a) and volatile (b) task phases. We observed strong correlations in both phases, indicating that the model successfully reproduces switching behavior. Circles represent the average number of switches, and error bars denote the standard deviation across simulations. c| Correlation between the proportion of trials in which the high-reward fractal was chosen in the simulated data and actual participants (P(Correct)). The strong correlation suggests that the model accurately captures choice accuracy. d| Distribution of the model’s predicted performance across trials and participants. The red line represents the actual average performance across participants. The close alignment between the predicted distribution and the actual value further supports the model’s ability to replicate key behavioral patterns.
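The logic of the posterior predictive check can be sketched with a deliberately simplified learner: a single learning rate and inverse temperature, without the decay, choice-kernel, and valence or phase components of Model 6. The 320-trial schedule (stable phase followed by reversals every 20 trials), the phase lengths, and the posterior draws are hypothetical; only the procedure (simulate from posterior draws, then summarize switches) mirrors the text.

```python
import numpy as np


def simulate_choices(alpha, beta, p_reward_a, rng):
    """Delta-rule learner with softmax choice; exactly one option is rewarded per trial."""
    q = 0.5                                                  # estimated reward probability of A
    choices = np.empty(len(p_reward_a), dtype=int)
    for t, p in enumerate(p_reward_a):
        p_a = 1.0 / (1.0 + np.exp(-beta * (2.0 * q - 1.0)))  # softmax over q and 1 - q
        choices[t] = 0 if rng.random() < p_a else 1          # 0 = fractal A, 1 = fractal B
        a_rewarded = rng.random() < p
        q += alpha * (float(a_rewarded) - q)                 # complementary feedback
    return choices


def predicted_switches(post_alpha, post_beta, p_reward_a, n_sims=500, seed=0):
    """Mean and SD of switch counts across datasets simulated from posterior draws."""
    rng = np.random.default_rng(seed)
    draws = rng.integers(len(post_alpha), size=n_sims)
    counts = []
    for i in draws:
        sim = simulate_choices(post_alpha[i], post_beta[i], p_reward_a, rng)
        counts.append(np.count_nonzero(np.diff(sim)))
    return float(np.mean(counts)), float(np.std(counts))


# Hypothetical 320-trial schedule and posterior draws for one participant.
schedule = np.r_[np.full(160, 0.75), np.tile(np.r_[np.full(20, 0.2), np.full(20, 0.8)], 4)]
rng = np.random.default_rng(11)
post_alpha = rng.uniform(0.2, 0.4, 1000)
post_beta = rng.uniform(4.0, 6.0, 1000)
mean_sw, sd_sw = predicted_switches(post_alpha, post_beta, schedule, n_sims=100)
```

The simulated mean switch count can then be correlated with participants' observed switches across the sample, as in panels a-b.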

Split-half reliability of model parameters for the reversal learning task.

To assess reliability, we split the data into two subsets: one containing the first block of each task phase and the other containing the second block. The winning hierarchical model (Model 6) was fit separately to each subset, and Spearman’s rank correlation ρ was computed for the resulting parameter estimates. The analysis revealed poor-to-moderate reliability for the learning-rate components.

No significant effects of F1 and F2 on learning-rate components.

a| The effect of the anxiety-related factor F1 on learning rates, represented by the β1 population-parameter posterior means and the corresponding 95% highest-density interval (HDI). We did not observe a significant effect of F1, as all HDIs included zero. b| The β2 population parameter, representing the depression-related factor F2, was also not found to have any significant effect on learning-rate components.

Posterior distributions and associations with factors for the inverse temperature parameter.

The inverse temperature captures choice stochasticity, with higher values indicating more deterministic choices. a| Posterior distributions of the mean parameters µ0 of the inverse temperature components, with error bars representing the mean and 95% highest-density interval (HDI) of each distribution. Significant effects of outcome valence (good vs. bad), task phase (volatile vs. stable), and their interaction were observed across all participants. b| The influence of internalizing (general-factor scores) on the inverse temperature components, shown by posterior means and 95% HDIs. General-factor scores significantly increased the baseline inverse temperature, leading to more deterministic choices (baseline B: βg = 0.19, 95% HDI = [0.05, 0.34]). Additionally, general-factor scores influenced the difference in choice stochasticity between task phases, leading to more stochastic choices in the volatile compared to the stable phase (volatile-minus-stable B: βg = -0.06, 95% HDI = [-0.12, 0.00]). c| The anxiety-related factor F1, represented by the population parameter β1, showed no significant effects on any inverse temperature component. d| The depression-related factor F2, represented by the population parameter β2, was associated with a significant increase in baseline inverse temperature, suggesting reduced choice stochasticity for participants with higher F2 scores.

Parameter recovery analysis for the volatile Kalman filter (VKF) model.

To assess parameter identifiability, we simulated five datasets using subject-specific parameter values from the VKF. The model was then fitted to each dataset, and Spearman’s rank correlation was computed between the ground truth and recovered parameter values. The figure shows results for one example dataset, with each panel representing a model parameter (x-axis: ground truth parameter values, y-axis: recovered values). The average Spearman correlation between the ground truth and recovered parameters across all 5 simulated datasets was ρ = 0.70.

Results from a volatile Kalman filter (VKF) fitted to participant data and analyzed against internalizing.

The VKF model fitted to the participant data consisted of 4 free parameters: the volatility update rate λ, the initial estimate of volatility v0, the noise parameter ω and the inverse temperature parameter B. a| The volatility update parameter λ, which determines the extent of change in the inferred volatility, was not found to differ significantly between the low- and high-internalizing groups. b| Regression analyses did not reveal any significant associations between λ and internalizing. c| The initial volatility estimate v0 was not found to differ significantly between the low- and high-internalizing groups. d| Regression analyses did not reveal significant associations between v0 and internalizing. e| The noise parameter ω, which indicates the scale of volatility throughout the task, was not observed to be significantly different between the low- and high-internalizing groups. f| We did not find a significant association of ω with internalizing in a regression analysis. g| The inverse temperature parameter B was not found to be significantly different between the low- and high-internalizing groups in either task phase. h| Regression analyses did not reveal any significant association between B and internalizing. All regression p-values are corrected based on the false discovery rate.
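The caption describes the VKF only through its parameters. As an aid, the sketch below gives one reading of the binary VKF updates from the original formulation by Piray and Daw, showing how λ, v0 and ω enter. It is an approximation under assumptions: the initialisation (logit mean 0, posterior variance ω), the function name `vkf_binary`, and the omission of the choice rule (which would map predictions to choices through B) are ours, and the equations should be checked against the reference implementation before use.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def vkf_binary(outcomes, lam, v0, omega):
    """Binary volatile Kalman filter for a single option (sketch).

    outcomes: sequence of 0/1 outcomes
    lam:      volatility update rate (lambda), assumed in (0, 1)
    v0:       initial volatility estimate
    omega:    noise parameter
    Returns the predicted outcome probability and the volatility estimate
    held before each trial's outcome is observed.
    """
    m, w, v = 0.0, omega, v0  # assumed initialisation: logit mean 0, variance omega
    predictions, volatilities = [], []
    for o in outcomes:
        predictions.append(sigmoid(m))
        volatilities.append(v)
        m_prev, w_prev = m, w
        k = (w + v) / (w + v + omega)       # Kalman gain
        alpha = math.sqrt(w + v)            # step size applied to the prediction error
        m = m + alpha * (o - sigmoid(m))    # mean update in logit space
        w = (1.0 - k) * (w + v)             # posterior variance
        w_cov = (1.0 - k) * w_prev          # covariance of consecutive hidden states
        # Volatility moves towards the observed squared change of the hidden state.
        delta_v = (m - m_prev) ** 2 + w + w_prev - 2.0 * w_cov - v
        v = v + lam * delta_v
    return predictions, volatilities


# Hypothetical usage: a contingency reversal halfway through 40 trials.
outcomes = [1] * 20 + [0] * 20
preds, vols = vkf_binary(outcomes, lam=0.1, v0=0.5, omega=1.0)
```

In this form λ controls how quickly the volatility estimate tracks recent changes, v0 sets its starting point, and ω sets the noise scale that competes with volatility in the gain.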

Basic demographic details of participants who completed the binary reversal learning task with varying reward magnitudes.

STAI-Y1: Spielberger State-Trait Anxiety Inventory - state scale; STAI-Y2: Spielberger State-Trait Anxiety Inventory - trait scale; STICSA-T: State-Trait Inventory for Cognitive and Somatic Anxiety - trait scale; IUS-27: Intolerance of Uncertainty Scale; BDI: Beck’s Depression Inventory; PSWQ: Penn State Worry Questionnaire; MASQ: Mood and Anxiety Symptom Questionnaire; MASQ-Anh: Mood and Anxiety Symptom Questionnaire - anhedonia subscale; MASQ-AnxAr: Mood and Anxiety Symptom Questionnaire - anxious arousal subscale

Total score, performance, and percentage of switches in the binary reversal learning task with reward magnitudes.

r_st and p_st represent statistics for the stable phase, whereas r_vol and p_vol represent statistics for the volatile task phase. a| Total scores, calculated as the average of the maximum scores achieved in each block of a task phase, were not found to be significantly different between the low- and high-internalizing groups across either task phase. b| Regression analyses did not reveal any significant associations between total scores and internalizing in either the stable or volatile phase. c| Performance, measured as the proportion of trials in which participants selected the rewarding fractal, did not show any significant differences between the low- and high-internalizing groups in either task phase. d| Regression analyses did not reveal significant associations between performance and internalizing for either task phase. e| The percentage of switches after hits was not found to be significantly different between the low- and high-internalizing groups in either the stable or volatile phase. f| Regression analyses did not show significant associations between the percentage of switches after hits and internalizing for either phase. g| The percentage of switches after losses was also not found to be significantly different between the low- and high-internalizing groups in either task phase. h| Regression analyses did not reveal any significant associations between the percentage of switches after losses and internalizing in either task phase. All regression p-values are corrected based on the false discovery rate.
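The switch percentages in panels e-h can be computed from trial-by-trial choices. The sketch below assumes that choices and outcomes for one task phase are passed in separately, that a hit is any good outcome, and that the final trial (which has no following choice) is excluded; the function name is hypothetical.

```python
def switch_percentages(choices, good_outcomes):
    """Percentage of switches after good and after bad outcomes.

    choices:       chosen option on each trial of one task phase
    good_outcomes: True where that trial's outcome was good (a hit)
    The last trial is skipped because no subsequent choice is observed.
    """
    tally = {True: [0, 0], False: [0, 0]}   # outcome -> [switches, opportunities]
    for t in range(len(choices) - 1):
        counts = tally[bool(good_outcomes[t])]
        counts[1] += 1
        if choices[t + 1] != choices[t]:
            counts[0] += 1

    def percent(key):
        switches, n = tally[key]
        return 100.0 * switches / n if n else float("nan")

    return {"after_good": percent(True), "after_bad": percent(False)}


# Hypothetical five-trial sequence.
result = switch_percentages([0, 0, 1, 1, 0], [True, False, True, False, True])
```

Here both good outcomes are followed by staying and both bad outcomes by switching, giving 0% and 100%, respectively.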

Model comparison for the probabilistic reversal learning task with reward magnitudes.

α: Learning-rate parameter, β: Inverse temperature parameter, γ: Risk parameter, λ: Mixture parameter, r: Magnitude-scaling parameter, δ: Decay parameter, ϵ: Lapse parameter. The subscript ck represents the choice-kernel parameter. Superscripts indicate parameter components:

sv: Baseline component and a task-phase-dependent component (stable vs. volatile).

gb: Component encoding outcome valence (good vs. bad).

sv · gb: Interaction between task phase and outcome valence.
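The superscripts above describe how each parameter is decomposed into components. One common way to turn such components into a condition-specific value is sketched below. The ±0.5 effect coding, the logistic link for learning rates, the exponential link for inverse temperatures, and all names are assumptions for illustration, not details given in this table.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed effect coding of the two experimental factors.
PHASE = {"stable": -0.5, "volatile": 0.5}
VALENCE = {"bad": -0.5, "good": 0.5}


def condition_value(components, phase, valence, link=logistic):
    """Combine unbounded parameter components into one condition's value.

    components: dict with keys "baseline", "sv", "gb", "sv_gb"
    link:       maps the unbounded sum to the parameter's range
                (logistic for a learning rate, math.exp for an inverse temperature)
    """
    x_sv, x_gb = PHASE[phase], VALENCE[valence]
    z = (components["baseline"]
         + components["sv"] * x_sv
         + components["gb"] * x_gb
         + components["sv_gb"] * x_sv * x_gb)
    return link(z)


# Hypothetical learning-rate components (unbounded scale).
alpha_components = {"baseline": -1.0, "sv": 0.8, "gb": 0.4, "sv_gb": 0.2}
for phase in PHASE:
    for valence in VALENCE:
        print(phase, valence, round(condition_value(alpha_components, phase, valence), 3))
```

With this coding, a positive sv component raises the parameter in the volatile relative to the stable phase, and the sv · gb component changes the size of the valence effect between phases.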

PSIS-LOO: Pareto-smoothed importance-sampling-based approximation of leave-one-out cross-validation.

SE: standard error.
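PSIS-LOO comparisons are usually reported with a standard error computed from pointwise values. Assuming the pointwise leave-one-out log predictive densities (elpd) of each model are already available, for example from a PSIS-LOO routine such as ArviZ's `loo` or the R `loo` package, the difference between two models and its standard error can be computed as below. The function name is hypothetical, and this table does not state whether its SE refers to each model's own estimate or to a difference, nor whether PSIS-LOO is on the elpd or the deviance scale; the same variance formula applies to a single pointwise vector.

```python
import math


def elpd_difference(pointwise_a, pointwise_b):
    """Summed difference in pointwise LOO elpd (model A minus model B) and its SE.

    The SE uses the usual normal approximation: sqrt(n * sample variance of the
    pointwise differences).
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models must be scored on the same observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return sum(diffs), math.sqrt(n * variance)


# Hypothetical pointwise elpd values for two models on the same five observations.
model_a = [-0.52, -0.61, -0.47, -0.70, -0.55]
model_b = [-0.58, -0.66, -0.49, -0.69, -0.63]
diff, se = elpd_difference(model_a, model_b)
print(f"elpd difference = {diff:.2f} (SE = {se:.2f})")
```

Computing the SE from paired pointwise differences, rather than from the two models' separate SEs, accounts for the strong correlation between models evaluated on the same data.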

Parameter recovery analysis for the winning model (Model 9).

The average Spearman correlation between the ground truth and recovered parameters across all 5 simulated datasets was ρ = 0.71.

Posterior predictive checks for the winning model.

To assess how well the winning model (Model 9) captured key qualitative features of choice behavior, we generated 500 simulated choice datasets per participant by drawing random samples from each participant’s joint posterior distribution (320 trials per dataset). a-b| Spearman correlations between the average number of switches across simulations and actual participant switches in the stable (a) and volatile (b) task phases were strong, indicating that the model successfully reproduces switching behavior. Circles represent the average number of switches, and error bars denote the standard deviation across simulations. c| We found a high correlation between the proportion of trials in which the high-reward fractal was chosen (P(Correct)) in the simulated data and actual participant performance. This suggests that the model reproduced participants' choice accuracy well. d| Distribution of the model’s predicted performance across trials and participants. The red line represents the actual average P(Correct) across participants. The close alignment between the predicted distribution and the actual value further supports the model’s ability to replicate key behavioral patterns.
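The posterior predictive procedure described above can be sketched as follows. Model 9 itself is not specified in this caption, so the sketch substitutes a hypothetical Rescorla-Wagner learner with a softmax choice rule (parameters `alpha`, `beta`) and a hypothetical reward schedule; only the procedure (one joint posterior draw per simulated dataset, 500 datasets, switch counts summarised by mean and standard deviation) mirrors the text.

```python
import math
import random
import statistics


def simulate_switches(alpha, beta, p_option0, rng):
    """Simulate one dataset with a Rescorla-Wagner/softmax agent; return its switch count.

    p_option0: per-trial probability that option 0 is rewarded
               (option 1 is rewarded with the complementary probability).
    """
    q = [0.5, 0.5]
    previous, switches = None, 0
    for p in p_option0:
        # Softmax over two options, written as a logistic of the value difference.
        p_choose0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_choose0 else 1
        p_reward = p if choice == 0 else 1.0 - p
        reward = 1.0 if rng.random() < p_reward else 0.0
        q[choice] += alpha * (reward - q[choice])   # prediction-error update
        if previous is not None and choice != previous:
            switches += 1
        previous = choice
    return switches


def ppc_switches(posterior_draws, p_option0, n_sims=500, seed=0):
    """Mean and SD of switch counts over simulated datasets for one participant.

    posterior_draws: list of joint (alpha, beta) samples; one is drawn per dataset.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        alpha, beta = rng.choice(posterior_draws)
        counts.append(simulate_switches(alpha, beta, p_option0, rng))
    return statistics.mean(counts), statistics.stdev(counts)


# Hypothetical posterior draws and a 320-trial schedule with one reversal.
draws = [(0.30, 4.0), (0.35, 5.0), (0.25, 3.5)]
schedule = [0.8] * 160 + [0.2] * 160
mean_switches, sd_switches = ppc_switches(draws, schedule)
```

Repeating this for every participant gives the mean switch counts (circles) and standard deviations (error bars); their Spearman correlation with the observed switch counts is the quantity summarised in panels a-b.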

Mean learning-rate components and the effect of internalizing on learning-rate components for the reversal learning task with reward magnitudes.

a| Mean learning-rate components across all participants. Learning rates were significantly higher after positive outcomes compared to negative outcomes and during the volatile phase compared to the stable phase. b| We did not find an effect of internalizing, represented by the population-level parameter βg, on any learning-rate component, with the 95% highest-density intervals (HDIs) of all components containing zero. c,d| Participants were divided into low- and high-internalizing groups based on their general-factor scores (below and above the mean). Mean learning rates for each combination of outcome valence and task phase are shown for the two groups.
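These captions treat an effect as significant when its 95% highest-density interval excludes zero. A minimal sketch of that check from posterior samples follows; it uses the common shortest-interval estimator, which is appropriate for unimodal posteriors, and the function names are hypothetical.

```python
import math
import random


def hdi(samples, prob=0.95):
    """Shortest interval containing a `prob` share of the samples (unimodal posteriors)."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    s = sorted(samples)
    n = len(s)
    span = int(math.floor(prob * n))   # index distance between interval ends
    if span < 1:
        raise ValueError("not enough samples for this probability mass")
    # Slide a window over the sorted samples and keep the narrowest one.
    start = min(range(n - span), key=lambda i: s[i + span] - s[i])
    return s[start], s[start + span]


def excludes_zero(samples, prob=0.95):
    """True when the HDI lies entirely above or entirely below zero."""
    low, high = hdi(samples, prob)
    return low > 0 or high < 0


# Hypothetical posterior draws for one population-level coefficient.
rng = random.Random(0)
draws = [rng.gauss(0.19, 0.07) for _ in range(4000)]
print(hdi(draws), excludes_zero(draws))
```

Unlike an equal-tailed interval, the HDI is the narrowest interval with the requested mass, so for skewed posteriors it can shift toward the mode.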

Effects of anxiety-related factor (F1) and depression-related factor (F2) on learning rates in the reversal learning task with reward magnitudes.

a,b| We did not find an effect of either the anxiety-related factor (F1; represented by the β1 population-level parameter) (a) or the depression-related factor (F2; represented by the β2 population-level parameter) (b) on any learning-rate component, with all corresponding 95% highest-density intervals (HDIs) containing zero.

Mean inverse temperature and effect of internalizing, the anxiety-related factor F1, and the depression-related factor F2 on the inverse temperature parameter in the reversal learning task with reward magnitudes.

a| Group mean (µ0, population-level parameter) of inverse temperature B across participants. A significant effect of outcome valence was observed, with inverse temperature increasing after good outcomes compared to bad outcomes. b| Effect of internalizing (βg, population-level parameter) on inverse temperature. We did not find a significant effect of internalizing on any of the inverse temperature components, with all corresponding 95% highest-density intervals (HDIs) containing zero. c| We did not observe an effect of the anxiety-related factor (F1, represented by the β1 population-level parameter) on any inverse temperature component, with all corresponding 95% HDIs containing zero. d| Similarly, we did not observe an effect of the depression-related factor (F2, represented by the β2 population-level parameter) on any inverse temperature component, with all corresponding 95% HDIs containing zero.

Basic demographic details of participants who completed the reversal learning task with outcome magnitudes, consisting of reward and loss domains.

STAI-Y1: Spielberger State-Trait Anxiety Inventory - state scale; STAI-Y2: Spielberger State-Trait Anxiety Inventory - trait scale; STICSA-T: State-Trait Inventory for Cognitive and Somatic Anxiety - trait scale; IUS-27: Intolerance of Uncertainty Scale; BDI: Beck’s Depression Inventory; PSWQ: Penn State Worry Questionnaire; MASQ: Mood and Anxiety Symptom Questionnaire; MASQ-Anh: Mood and Anxiety Symptom Questionnaire - anhedonia subscale; MASQ-AnxAr: Mood and Anxiety Symptom Questionnaire - anxious arousal subscale

Total score, performance, and percentage of switches in the reward domain of the binary reversal learning task with reward and loss domains.

Here, r_st and p_st denote the stable-phase statistics, while r_vol and p_vol pertain to the volatile phase. a| We did not observe a significant difference in total scores between the low- and high-internalizing groups across either task phase. b| Regression analyses did not reveal any significant associations between total scores and internalizing in either the stable or volatile phase. c| Performance, measured as the proportion of trials in which participants selected the rewarding fractal, did not show any significant differences between the low- and high-internalizing groups in either task phase. d| Regression analyses found no significant associations between performance and internalizing for either task phase. e| We did not find any significant differences in the percentage of switches following rewards (successful trials) between the low- and high-internalizing groups in either the stable or volatile phase. f| Regression analyses did not reveal any significant associations between the percentage of switches after rewards and internalizing for either phase. g| The percentage of switches after no rewards (failed trials) was also not found to be significantly different between the low- and high-internalizing groups in either task phase. h| Regression analyses revealed a significant association between internalizing and the percentage of switches after no rewards in the volatile task phase. All p-values are corrected for multiple comparisons using false discovery rate correction.
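The captions do not name the FDR procedure. Assuming the common Benjamini-Hochberg step-up procedure, applied over whichever family of regression p-values was corrected together, the adjustment can be sketched as follows; the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected p-values from a family of regressions.
raw = [0.01, 0.04, 0.03, 0.2]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

An adjusted value can exceed the threshold even when the raw p-value is below it, as for the second and third entries here.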

Total score, performance, and switching behavior in the loss domain of the binary reversal learning task of experiment 4.

Participants began the task with an initial score of 10,000 points, which decreased after failed trials in the loss domain. Subscripts st and vol denote the stable and volatile task phases, respectively. a| We did not observe any significant differences in total scores between the low- and high-internalizing groups across either task phase. b| Regression analyses did not reveal any significant associations between total scores and internalizing in either the stable or volatile phase. c| Performance, measured as the proportion of trials in which participants selected the rewarding fractal, was not found to be significantly different between the low- and high-internalizing groups in either task phase. d| Regression analyses did not reveal any significant associations between performance and internalizing for either task phase. e| We did not find any significant difference in the percentage of switches following no-loss trials (successful trials) between the low- and high-internalizing groups in either the stable or volatile phase. f| Regression analyses did not reveal any significant associations between the percentage of switches after no-loss trials and internalizing for either phase. g| The percentage of switches after loss trials (failed trials) was also not found to be significantly different between the low- and high-internalizing groups in either task phase. h| Regression analyses did not reveal a significant association between the percentage of switches after loss trials and internalizing in either task phase. All p-values are corrected for multiple comparisons using false discovery rate correction.

Model comparison for the probabilistic reversal learning task with reward and loss domains.

α: Learning-rate parameter, β: Inverse temperature parameter, γ: Risk parameter, λ: Mixture parameter, r: Magnitude-scaling parameter, δ: Decay parameter, ϵ: Lapse parameter. The subscript ck represents the choice-kernel parameter. Superscripts indicate parameter components:

sv: Baseline component and a task-phase-dependent component (stable vs. volatile).

rl: Component encoding task domain (reward vs. loss).

gb: Component encoding outcome valence (good vs. bad).

sv · gb: Interaction between task phase and outcome valence.

sv · rl: Interaction between task phase and task domain.

rl · gb: Interaction between task domain and outcome valence.

PSIS-LOO: Pareto-smoothed importance-sampling-based approximation of leave-one-out cross-validation.

SE: standard error.

Posterior predictive checks for Model 12.

To assess how well Model 12 captured key qualitative features of choice behavior, we generated 500 simulated choice datasets per participant by drawing random samples from each participant’s joint posterior distribution (180 trials each for the reward and loss domains). a-b| Spearman correlations between the average number of switches across task domains in the simulations and actual participant switches in the stable (a) and volatile (b) task phases were strong, indicating that the model successfully reproduces switching behavior. Circles represent the average number of switches, and error bars denote the standard deviation across simulations. c| We found a high correlation between the proportion of trials in which the high-reward fractal was chosen (P(Correct)) in the simulated data and actual participant P(Correct). This suggests that the model reproduced participants' choice accuracy well. d| Distribution of the model’s predicted P(Correct) across trials and participants. The red line represents the actual average P(Correct) across participants. The close alignment between the predicted distribution and the actual value further supports the model’s ability to replicate key behavioral patterns.

Mean learning-rate components and overall learning rates in the reversal learning task across reward and loss domains.

a| Mean learning-rate components across all participants. Learning rates were significantly higher after positive outcomes compared to negative outcomes and during the volatile phase compared to the stable phase. b| Overall learning rates in the reward domain, categorized by outcome valence and task phase. Error bars show the mean learning rate ± the standard error of the mean for each condition. c| Overall learning rates in the loss domain, categorized by outcome valence and task phase.

No significant effect of internalizing on the learning-rate components.

a| We did not find a significant effect of internalizing on any of the learning-rate components, with all 95% highest-density intervals (HDIs) containing zero. b,c| Participants were categorized into low- and high-internalizing groups, with their learning rates for different outcome valence and task phase conditions shown for the reward domain (b) and the loss domain (c).

Effect of the anxiety-related factor F1 and the depression-related factor F2 on learning-rate components.

a| We did not find a significant effect of the anxiety-related factor on any learning-rate component, with all 95% highest-density intervals (HDIs) containing zero. b| The depression-related factor showed a marginally significant effect on the interaction between outcome valence and task phase, suggesting higher learning from good compared to bad outcomes in the volatile phase relative to the stable phase.

Mean inverse temperature and effects of internalizing, the anxiety-related factor F1, and the depression-related factor F2 on the inverse temperature parameter in the reversal learning task with reward and loss domains.

a| Group mean (µ0, population-level parameter) of inverse temperature across participants revealed a significant effect of task phase, with inverse temperature being lower in the volatile phase than in the stable phase. This suggests increased choice stochasticity in the volatile phase compared to the stable phase. b| We did not find a significant effect of internalizing (βg, population-level parameter) on any inverse temperature component, with all corresponding 95% highest-density intervals (HDIs) containing zero. c| We also did not find a significant effect of the anxiety-related factor (β1, population-level parameter) on any inverse temperature component, with all corresponding 95% HDIs containing zero. d| Similarly, we did not find a significant effect of the depression-related factor (β2, population-level parameter) on any inverse temperature component, with all corresponding 95% HDIs containing zero.