Participant Demographics

Ethnicity and race classifications were self-reported by the participant or a parent. Participants who identified as two or more categories of race (Black, White, and/or Asian) were classified as Multiple. Participants who specified Asian (Indian) or South Asian were classified as Asian. Participants who identified as one or more races other than Black, White, or Asian were classified as Other. Abbreviations: Cont., continuous; Det., deterministic; Dis., discrete; n, number; Prob., probabilistic; RH, right-handed; std, standard deviation; yrs, years.

Game Environment

a. Screenshot of game environment and sample movement path (large text, arrows, and movement path were not displayed to participants). During the learning block, participants either experienced a continuous target (continuous groups) or seven discrete targets (discrete groups). b. Reward landscape for the learning blocks for the different task paradigms. Continuous probabilistic: continuous target with position-based reward probability gradient; discrete probabilistic: discrete targets with target-specific reward probabilities; continuous deterministic: continuous target with a single 100% rewarded zone; discrete deterministic: discrete targets with a single target giving 100% reward. c. Outcome feedback for continuous probabilistic task. Success (top): a movie clip and pleasant sound play, and the video screen is outlined in blue. Failure (bottom): the movie clip does not play, the penguin falls over, and the red text “You slipped!” appears with a sad face image.

Paradigm, example behavior, and target accuracy in the continuous probabilistic task.

a. Experimental design with baseline: single discrete target presented in randomized locations across the screen; learning: learning block with reward determined by endpoint position; success clamp: feedback clamped to 100% success independent of endpoint position; fail clamp: feedback clamped to 100% failure independent of endpoint position; single target: single discrete target presented in the middle of the screen. b. Representative endpoint time series from various aged participants. Gray shaded zones indicate positions in the workspace where a reward is given 100% of the time (thin gray lines are for discrete targets). Green filled circles indicate rewarded trials while open circles indicate unrewarded trials. The horizontal colored bar on the x-axis indicates the trials corresponding with the experimental blocks outlined in a. In the learning block (trials 21–120), rewards were given based on the continuous probabilistic landscape. c. Mean baseline accuracy (average reach deviation from the discrete targets) by age. Adult data are averaged and plotted to the right with standard error of the mean. The gray region shows the width of a discrete target. d. Same as c for the single target in block 5. In c and d, participants who completed the task in person (in lab) are indicated in white circles.

Continuous probabilistic task learning block time series.

Data and model (red, smooth curves) for each trial of the learning block grouped into age ranges. Data shows mean (solid line) and standard error of mean (shading) of participants’ endpoint. Model in red shows mean (dashed lines) and standard deviation (shading) from the model simulations. The gray region shows 100% reward zone.

Variability and learning in the continuous probabilistic task.

a. Baseline variability by age. Average adult variability shown for comparison. b. Learning block performance (absolute distance from 100% reward zone) by age. c. Endpoint variability in the success clamp by age. d. Endpoint variability in the failure clamp by age. For a–d, the regression line with 95% confidence limits is shown for children, and error bars show standard error of the mean for adults. e. Predicted vs. observed performance from the multiple regression of learning as a function of age, baseline variability, and fail clamp variability. f. Mediation analysis (see Methods for details). Top pathway shows the direct relationship between age and learning. Bottom pathway shows the indirect relationship between age and learning when mediated by baseline variability and fail clamp variability. Note that in our measure of learning, smaller distances from the 100% reward zone reflect better learning, which explains the negative relationships in this analysis (e.g., increasing age is associated with decreased distances from the reward zone). Age is coded in months. Participants who completed the task in person (in lab) are indicated in white symbols.

Reinforcement learning model for the continuous probabilistic task.

a. Model schematic. The participant maintains an estimate of desired reach which they can update across the experiment. The actual reach on the current trial (pink box) depends on whether the previous trial (yellow box) was a failure (top) or success (bottom). After failure (top) the actual reach is the desired reach with the addition of exploration and motor noise (draws from zero mean Gaussian distributions with standard deviations σe and σm, respectively). In contrast, if the previous trial was a success (bottom), the participant does not explore so that the actual reach is the desired reach with only motor noise. The actual reach determines the probability of whether the current trial is a failure or a success. If the current trial is a success the desired reach is updated for the next trial (blue box) by the exploration (if any), modulated by a learning rate η. b. Model fit parameters {σm, σe, η}, by age for the continuous probabilistic group. The solid thick line is a regression fit to the data for participants less than 18 years old and the thin line is a running mean ±3 years with the standard error of the mean. The correlation and p-value for each regression are shown in the bottom left corner of each plot (and exclude the adult data). Average adult parameters are shown on the right with standard error of the mean. c. Predicted vs. actual variability in the success (left column) and fail (right column) clamp blocks. Correlations and p-values are shown above each plot (the plots and statistics exclude the adult data).
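The trial-by-trial logic of the schematic in panel a can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names, the normalized position units, the starting position, the assumption that the first trial is treated as following a failure, and the shape of `reward_probability` (a 100% zone with a linear fall-off) are all hypothetical choices made for the example. Only the update rule itself (explore after failure, motor noise always, update the desired reach by η times the exploration after success) follows the caption.

```python
import random


def reward_probability(x, zone=(0.5, 0.7), falloff=0.5):
    """Hypothetical reward landscape (an assumption, not the task's actual one).

    Returns 1.0 inside `zone` and decays linearly to 0 over `falloff` units outside it.
    """
    lo, hi = zone
    if lo <= x <= hi:
        return 1.0
    dist = lo - x if x < lo else x - hi
    return max(0.0, 1.0 - dist / falloff)


def simulate(n_trials, sigma_m, sigma_e, eta, start=0.0, seed=0):
    """Simulate reaches under the caption's model with parameters sigma_m, sigma_e, eta."""
    rng = random.Random(seed)
    desired = start
    prev_success = False  # assumption: the first trial behaves as if it followed a failure
    reaches, outcomes = [], []
    for _ in range(n_trials):
        # Exploration is added only after a failure; motor noise is always present.
        explore = 0.0 if prev_success else rng.gauss(0.0, sigma_e)
        motor = rng.gauss(0.0, sigma_m)
        reach = desired + explore + motor
        # The actual reach determines the probability of success on this trial.
        success = rng.random() < reward_probability(reach)
        if success:
            # Update the desired reach by the exploration (if any), scaled by eta.
            desired += eta * explore
        reaches.append(reach)
        outcomes.append(success)
        prev_success = success
    return reaches, outcomes
```

Under these assumptions, a call such as `simulate(100, sigma_m=0.05, sigma_e=0.2, eta=0.5)` returns one simulated learning block of reach positions and success flags.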

Comparison of the four tasks for the three- to eight-year-old children.

a. Learning block performance for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic tasks in the same format as Fig. 3. b. Comparative performance between tasks for learning distance and variability in baseline, success clamp, and fail clamp. Learning significantly improved with discrete targets and deterministic reward feedback. Baseline variability was not statistically different between tasks. Statistically significant pairwise comparisons are indicated as follows: * = p < 0.05, + = p < 0.01, and Δ = p < 0.001. Abbreviations: CD = Continuous Deterministic; CP = Continuous Probabilistic; DD = Discrete Deterministic; DP = Discrete Probabilistic.

Mediation analysis for continuous probabilistic task.

Results of the effect of age on learning mediated by baseline variability and variability after failure. Baseline variability and variability after failure together partially mediate the effect of age on learning. Significant effects are in bold. β: regression coefficient, SE: standard error

Mediation analysis for discrete probabilistic task.

Results of the effect of age on learning mediated by variability after success and after failure. Together, variability after success and after failure completely mediate the effect of age on learning. Significant effects are in bold. β: regression coefficient, SE: standard error

Map of participant locations.

Thirty-eight states of the United States of America are represented in this dataset. The map was generated in Excel on Microsoft 365.

Example baseline paths for participants ages 3 to 11 years old.

Each trajectory begins at (0,0) and ends at Y = 24 when the penguin crosses the back edge of the ice. The final X position of each trajectory corresponds to the interpolated final position of the movement (see Methods for additional details). As available, a sample for each age bin from each input device type is provided. Note that trajectories tend to be straighter for touchscreen input compared to other devices. The twenty squares represent the target centers. Note that the full reward zone is not shown due to overlap between targets. Unrewarded and reward paths are shown as dashed and solid lines, respectively.

Example baseline paths for participants ages 12 to 17 years old and adults.

Same format as Supp. Fig. 2.

Path length ratios.

The path length ratio is a measure of path curvature (path length divided by distance from first to last point of movement) for the four tasks. Significant pairwise comparisons between age bins are indicated above the plots as follows: * = p < 0.05, + = p < 0.01, and Δ = p < 0.001. Bars show mean and standard error of the mean.
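As an illustration of the definition above, the ratio can be computed from a sampled trajectory as in the following sketch. The function name and the representation of a path as a list of (x, y) points are assumptions made for the example; the preprocessing actually applied to the recorded trajectories is described in the Methods and is not reproduced here.

```python
import math


def path_length_ratio(points):
    """Path length divided by the straight-line distance from first to last point.

    `points` is a list of (x, y) samples along one movement (hypothetical format).
    A perfectly straight path gives 1.0; more curved paths give larger values.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Sum the lengths of the segments between consecutive samples.
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("first and last points coincide; ratio is undefined")
    return path / chord
```

For example, a path through (0, 0), (3, 4), and (0, 8) has length 10 and a start-to-end distance of 8, giving a ratio of 1.25.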

Timing information.

Reaction time (time from when the penguin appeared until the participant clicked on the penguin to start the trial), stationary time (time from click to start of movement), movement time (time from start to end of movement), and game time (time to complete the whole task in minutes) for the four tasks split by age bins. Significant pairwise comparisons between age bins are indicated above the plots as follows: * = p < 0.05, + = p < 0.01, and Δ = p < 0.001. Bars show mean and standard error of the mean.

History dependence of change in reach as a function of reward history for the continuous probabilistic task.

We performed a regression analysis on the change in absolute reach direction (ΔX_t = X_t − X_{t−1}) as a function of whether the last three trials were successes or failures. That is, we fit ΔX_t = w0 + w1·f_{t−1} + w2·f_{t−2} + w3·f_{t−3}, where f_t is 1 if trial t was a failure and 0 if it was a success. Note that w0 reflects the change in reach when there were no failures in the previous three trials. This decreased with age and may represent decreasing sensorimotor noise (w0: 100 out of 111 participants significant at p = 0.05; correlation with age R² = 0.122, F = 15.2, p < 0.001). w1, w2, and w3 reflect the contribution to the change in reach of failing on the previous trial, two trials ago, and three trials ago, respectively. The change in reach after one failure increased with age (w1: 66 out of 111 participants significant; correlation with age R² = 0.139, F = 17.6, p < 0.001). The effects of failure two and three trials back were mostly not significant (w2: 20 out of 111 participants significant; correlation with age R² = 0.0166, F = 1.83, p = 0.178; w3: 17 out of 111 participants significant; correlation with age R² = 0.119, F = 14.7, p < 0.001). For the adults, the average of all of the data points is plotted as a horizontal line.
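The regression described above can be sketched as an ordinary least-squares fit for one participant. This is a hedged illustration only: the function names and input format are hypothetical, the change in reach is passed in precomputed (so the sketch takes no position on how it was derived from the raw endpoints), and the significance tests reported in the caption are not included.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def history_weights(delta, failed):
    """Least-squares estimate of [w0, w1, w2, w3].

    delta[t]  : change in reach on trial t (hypothetical precomputed input)
    failed[t] : 1 if trial t was a failure, 0 if it was a success
    Trials 0 to 2 are skipped because they lack three trials of history.
    """
    rows, targets = [], []
    for t in range(3, len(delta)):
        rows.append([1.0, float(failed[t - 1]), float(failed[t - 2]), float(failed[t - 3])])
        targets.append(delta[t])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(4)]
    return solve_linear(xtx, xty)
```

Fitting each participant separately in this way yields one set of weights per participant, which could then be related to age as in the caption.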

Model parameter recovery.

The recovered vs. true parameters for synthetic data generated by the model (for 100 learning trials) and then fit. Correlations are shown above the plots.

Comparison of significant models.

Some participants in each age bin were best fit with the noise-only model (left) compared to the full model (right). When removing the participants who were best fit with the noise-only model, the same age-related trends in learning remained as depicted in Fig. 3. Children aged 3 to 8 years show poor learning compared to older participants.

Example behavior and discrete target performance in the discrete probabilistic task.

Same format as Fig. 2.

Discrete probabilistic task learning block time series.

Same format as Fig. 3.

Variability and learning in the discrete probabilistic task.

Same format as Fig. 4.

Reinforcement learning model for the discrete probabilistic task.

Panels in same format as Fig. 5b.

Performance of adults and 3- to 8-year-old children on all four tasks.

Panels in same format as Fig. 6a.