Participant Demographics

Participant Ethnicity and Race

Game Environment.

a. Screenshot of game environment and sample movement path (large text, arrows, and movement path were not displayed to participants). During the learning block, participants experienced either a continuous target (continuous groups) or seven discrete targets (discrete groups). b. Reward landscape for the learning blocks for the different task paradigms. The x-axis represents normalized units based on the participant’s computer setup. Continuous probabilistic: continuous target with position-based reward probability gradient; discrete probabilistic: discrete targets with target-specific reward probabilities; continuous deterministic: continuous target with a single 100% rewarded zone; discrete deterministic: discrete targets with a single target giving 100% reward. c. Outcome feedback for the continuous probabilistic task. Success (top): a movie clip (different for each trial) and a pleasant sound play, and the video screen is outlined in blue. Failure (bottom): the movie clip does not play, the penguin falls over, and red text “You slipped!” appears with a sad face image.

Figure 1—figure supplement 1. Map of participant locations.

Paradigm, example behavior, and target accuracy in the continuous probabilistic task.

a. Experimental design with baseline: single discrete target presented in randomized locations across the screen; learning: learning block with reward determined by endpoint position; success clamp: feedback clamped to 100% success independent of endpoint position; fail clamp: feedback clamped to 100% failure independent of endpoint position; single target: single discrete target presented in the middle of the screen. b. Representative endpoint time series from participants of various ages. Gray shaded zones indicate positions in the workspace where a reward is given 100% of the time (thin gray lines in first and last blocks are for the discrete targets). Green filled circles indicate rewarded trials while open circles indicate unrewarded trials. The horizontal colored bar on the x-axis indicates the trials corresponding to the experimental blocks outlined in a. In the learning block (trials 21–120), rewards were given based on the continuous probabilistic landscape. c. Mean baseline accuracy (average reach deviation from the discrete targets) by age. Adult data are averaged and plotted to the right with standard error of the mean. The gray region shows the width of a discrete target. d. Same as c for the single target in block 5. In c and d, participants who completed the task in person (in lab) are indicated by white circles.

Figure 2—figure supplement 1. Example baseline paths for participants ages 3 to 11 years old.

Figure 2—figure supplement 2. Example baseline paths for participants ages 12 to 17 years old and adults.

Figure 2—figure supplement 3. Path length ratios.

Figure 2—figure supplement 4. Timing information.

Continuous probabilistic task learning block time series.

Data and model (red, smooth curves) for each trial of the learning block grouped into age ranges. Data shows mean (solid line) and standard error of mean (shading) of participants’ endpoint. Model in red shows mean (dashed lines) and standard error (shading) from the model simulations. The gray region shows 100% reward zone.

Variability and learning in the continuous probabilistic task.

a. Baseline precision by age. Average adult variability shown for comparison. b. Learning block performance by age. Learning measure is the distance from the target, measured as the absolute distance from the 100% reward zone. c. Endpoint variability in the success clamp by age. d. Endpoint variability in the failure clamp by age. Regression line with 95% confidence interval shown for children and error bars show standard error of the mean for adults. Participants who completed the task in person (in lab) are indicated in white symbols.
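The learning measure in panel b can be illustrated with a short Python sketch. It assumes that distance is taken to the nearest edge of the 100% reward zone and is zero inside the zone; the caption does not say whether the edge or the center is used, and the function names and zone bounds here are hypothetical.

```python
def distance_from_zone(endpoint, zone_low, zone_high):
    """Absolute distance from an endpoint to a reward zone [zone_low, zone_high].

    Assumption: distance is measured to the nearest zone edge and is 0 inside
    the zone. The caption does not specify edge versus center.
    """
    if zone_low > zone_high:
        raise ValueError("zone_low must not exceed zone_high")
    if endpoint < zone_low:
        return zone_low - endpoint
    if endpoint > zone_high:
        return endpoint - zone_high
    return 0.0


def mean_distance(endpoints, zone_low, zone_high):
    """Average the per-trial distances over a set of endpoints."""
    distances = [distance_from_zone(x, zone_low, zone_high) for x in endpoints]
    return sum(distances) / len(distances)


# Hypothetical endpoints and zone bounds, for illustration only.
example = mean_distance([5.0, 10.0, 15.0], zone_low=8.0, zone_high=12.0)
```

Which trials of the learning block enter the average is not stated in the caption, so the sketch leaves that choice to the caller.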

Reinforcement learning model for the continuous probabilistic task.

a. Model schematic. The participant maintains an estimate of the desired reach, which they can update across the experiment. The actual reach on the current trial (pink box) depends on whether the previous trial (yellow box) was a failure (top) or success (bottom). After failure (top), the actual reach is the desired reach with the addition of exploration variability and motor noise (draws from zero-mean Gaussian distributions with standard deviations σe and σm, respectively). In contrast, if the previous trial was a success (bottom), the participant does not explore, so the actual reach is the desired reach with only motor noise. The actual reach determines the probability of whether the current trial is a failure or a success. If the current trial is a success, the desired reach is updated for the next trial (blue box) by the exploration (if any). b. Examples of model fits to three participants. The data are shown as circles, with success trials filled green and unsuccessful trials filled white. The estimated desired reach is shown as a thick black line, and the estimated exploration variability (orange line) and motor noise (blue line) connect the desired reach to the data. The simulation of the participant with the fit parameters is shown by the pink line, with shading showing one standard deviation across the simulations. c. Model fit parameters {σm, σe} by age for the continuous probabilistic group. The line is a regression fit (with 95% confidence interval in shading) to the data for participants younger than 18 years old. The correlation and p-value for each regression are shown in the bottom left corner of each plot (and exclude the adult data). Average adult parameters are shown on the right with standard error of the mean.
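The trial-by-trial logic in panel a can be sketched as a small Python simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `example_reward_prob` landscape, the starting state, and all numeric values are hypothetical. Only the explore-after-failure and update-after-success rules are taken from the caption.

```python
import random


def simulate_reaches(n_trials, sigma_m, sigma_e, reward_prob, x0=0.0, seed=0):
    """Simulate the explore-after-failure rule described in the caption.

    reward_prob is a hypothetical function mapping an endpoint position to a
    probability of success; its shape is an assumption.
    """
    rng = random.Random(seed)
    desired = x0          # current estimate of the desired reach
    prev_success = False  # assumption: the first trial is treated as following a failure
    reaches, outcomes = [], []
    for _ in range(n_trials):
        # Exploration variability is added only after a failure.
        explore = 0.0 if prev_success else rng.gauss(0.0, sigma_e)
        # Motor noise is added on every trial.
        motor = rng.gauss(0.0, sigma_m)
        actual = desired + explore + motor
        success = rng.random() < reward_prob(actual)
        if success:
            # On success, the desired reach is updated by the exploration (if any).
            desired += explore
        reaches.append(actual)
        outcomes.append(success)
        prev_success = success
    return reaches, outcomes


def example_reward_prob(x, zone_center=10.0, zone_halfwidth=2.0, falloff=10.0):
    """Hypothetical landscape: 100% reward inside a zone, linear falloff outside."""
    dist = max(0.0, abs(x - zone_center) - zone_halfwidth)
    return max(0.0, 1.0 - dist / falloff)


reaches, outcomes = simulate_reaches(
    100, sigma_m=2.0, sigma_e=4.0, reward_prob=example_reward_prob
)
```

Repeating the simulation with different seeds and averaging the reaches per trial would give a mean curve with a spread, similar in spirit to the simulated traces described in panel b.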

Figure 5—figure supplement 1. Model comparison for the Continuous and Discrete Probabilistic tasks.

Figure 5—figure supplement 2. Model parameter recovery.

Figure 5—figure supplement 3. Example fits of the model to the Continuous Probabilistic task.

Figure 5—figure supplement 4. Fits to the success and failure clamp phases for the Continuous Probabilistic task.

Discrete probabilistic task learning block time series and parameter fits.

a. Same format as Fig. 3. b. Panels in same format as Fig. 5c with the regressions for the continuous probabilistic task overlaid in blue.

Figure 6—figure supplement 1. Example behavior, and discrete target performance in the discrete probabilistic task.

Figure 6—figure supplement 2. Variability and learning in the discrete probabilistic task.

Figure 6—figure supplement 3. Example fits of the model to the Discrete Probabilistic task.

Figure 6—figure supplement 4. Fits to the success and fail clamp phases for the Discrete Probabilistic task.

Comparison of the four tasks for the three- to eight-year-old children.

a. Learning block performance for the continuous probabilistic, discrete probabilistic, continuous deterministic, and discrete deterministic tasks in the same format as Fig. 3. b. Comparative performance between tasks for precision in baseline, learning distance from target, and variability in the success and fail clamp. Learning significantly improved with discrete targets and deterministic reward feedback. Precision in baseline was not statistically different between tasks. c. Estimates of motor noise and exploration variability standard deviation from the model. Statistically significant pairwise comparisons indicated as follows: * = p < 0.05, + = p < 0.01, and Δ = p < 0.001. Abbreviations: CD = Continuous Deterministic; CP = Continuous Probabilistic; DD = Discrete Deterministic; DP = Discrete Probabilistic.

Figure 7—figure supplement 1. Model fit parameters by age for the deterministic tasks.

Figure 7—figure supplement 2. Expected distance as a function of model parameters.

Distribution of desired reach and actual reach for a reach after an unsuccessful trial (left) and after a successful trial (right).

Plots show probability density across endpoint positions. In both cases the desired reach on trial t is at xt = 0 (thin red line shows the distribution as a delta function). After an unsuccessful reach (left panel), the actual reach (blue distribution) includes exploration variability and motor noise. The probability of reward depends on the height of this distribution at the target (d = 10, shown by green line). For a successful trial with exploration (right panel), the distributions (xt and st, thin lines) are the same as for the left panel. However, the desired reach is updated by the exploration that led to success, which gives the distribution of the next desired reach (xt+1, thick red line), and the next actual reach is this distribution with the addition of motor noise (thick blue line). For this illustration, both motor and exploration standard deviations were set to 8 and the target was set to +10 (green lines).
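The left panel can be checked numerically with the values given in the caption (desired reach 0, both standard deviations 8, target at 10). The sketch below assumes that exploration and motor noise are independent zero-mean Gaussians, so their variances add. The variable names are illustrative, and the right-panel update is not reproduced because it depends on details not given in the caption.

```python
import math
from statistics import NormalDist

sigma_m = 8.0   # motor noise standard deviation (from the caption)
sigma_e = 8.0   # exploration standard deviation (from the caption)
desired = 0.0   # desired reach on trial t
target = 10.0   # target position d

# Independent Gaussian sources add in variance, so the actual reach after a
# failure has standard deviation sqrt(sigma_e**2 + sigma_m**2).
sd_after_failure = math.hypot(sigma_e, sigma_m)

# Height of the actual-reach density at the target position.
density_at_target = NormalDist(desired, sd_after_failure).pdf(target)

print(round(sd_after_failure, 2))     # 11.31
print(round(density_at_target, 4))    # 0.0239
```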

Statistical analysis of sex, handedness, device, and browser on behavior.

Results from one-way ANOVAs of participant-specific factors on precision from experimental block one (baseline) and distance from target from experimental block two (learning) for each of the four tasks. For all tasks, participant-specific factors did not significantly affect behavior.

Map of participant locations.

Thirty-eight states of the United States of America are represented in this dataset. The map was generated in Excel on Microsoft 365.

Example baseline paths for participants ages 3 to 11 years old.

Each trajectory begins at (0,0) and ends at Y = 24 when the penguin crosses the back edge of the ice. The final X position of each trajectory corresponds to the interpolated final position of the movement (see Methods for additional details). As available, a sample for each age bin from each input device type is provided. Note that trajectories tend to be straighter for touchscreen input compared to other devices. The twenty squares represent the target centers. Note that the full reward zone is not shown due to overlap between targets. Unrewarded and reward paths are shown as dashed and solid lines, respectively.
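One plausible way to obtain an interpolated final position is sketched below in Python. It assumes linear interpolation between the last sample before and the first sample at or beyond Y = 24; the actual procedure is described in the Methods and may differ. The function name and the example path are hypothetical.

```python
def interpolate_final_x(path, y_edge=24.0):
    """Return the x position where a path first crosses y_edge.

    path is an ordered list of (x, y) samples. Linear interpolation between
    the two samples that bracket the crossing is an assumption made for
    illustration. Returns None if the path never reaches y_edge.
    """
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if y0 < y_edge <= y1:
            frac = (y_edge - y0) / (y1 - y0)  # fraction of the segment before the edge
            return x0 + frac * (x1 - x0)
    return None


# Hypothetical trajectory starting at (0, 0) and overshooting the back edge.
final_x = interpolate_final_x([(0.0, 0.0), (2.0, 20.0), (4.0, 28.0)])
print(final_x)  # 3.0
```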

Example baseline paths for participants ages 12 to 17 years old and adults.

Same format as Figure 2—figure supplement 1.

Path length ratios.

The path length ratio is a measure of path curvature (path length divided by distance from first to last point of movement) for the four tasks. Significant pairwise comparisons between age bins are indicated above plots as follows: * = p < 0.05, + = p < 0.01, and Δ = p < 0.001. Bars show mean and standard error of the mean.
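The ratio defined in the caption can be computed as in the following Python sketch. The function name and example paths are illustrative; a perfectly straight path gives a ratio of 1, and more curved paths give larger values.

```python
import math


def path_length_ratio(path):
    """Path length divided by the straight-line distance from first to last point.

    path is an ordered list of (x, y) samples with at least two points.
    """
    if len(path) < 2:
        raise ValueError("path needs at least two points")
    # Sum of the lengths of consecutive segments.
    length = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    straight = math.dist(path[0], path[-1])
    if straight == 0.0:
        raise ValueError("first and last points coincide")
    return length / straight


straight_ratio = path_length_ratio([(0.0, 0.0), (0.0, 24.0)])            # 1.0
bent_ratio = path_length_ratio([(0.0, 0.0), (0.0, 3.0), (4.0, 3.0)])     # 7 / 5 = 1.4
```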

Timing information.

Reaction time (time from when penguin appeared until the participant clicked on the penguin to start the trial), stationary time (time from click to start of movement), movement time (time from start to end of movement) and game time (time to complete the whole task in minutes) for the four tasks split by age bins. Significant pairwise comparisons between age bins are indicated above plots as follows: * = p < 0.05, + = p < 0.01, and Δ = p < 0.001. Bars show mean and standard error of the mean.

Model comparison for the Continuous and Discrete Probabilistic tasks.

Difference in Bayesian Information Criterion (BIC) between the preferred model (Model 11) and the other variants for the Continuous Probabilistic task. The variants depend on the presence (+) or absence (0) of each source of variability (σe, σp, σm); whether the learning rates (ηe, ηp) are absent (.), fit (+), set to unity (1) or ηp = ηe; and whether rp is absent (.), set to rt or unity (1). The degrees of freedom of each model are shown as d.o.f. The number of participants who are best fit by each model is shown for the continuous (Nc), discrete (Nd) and combined (Nc + Nd) tasks. When we restricted model selection to only the Continuous Probabilistic task, Model 11 was again preferred with ΔBIC of 155 and 90 for the children alone or all participants. When we restricted model selection to only the Discrete Probabilistic task, Model 11 was again preferred with ΔBIC of 6 and 53 for the children alone or all participants.
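For readers unfamiliar with the comparison, the Python sketch below shows the standard BIC formula, k·ln(n) − 2·ln(L), and a difference taken relative to a preferred model. The model names, log-likelihoods, parameter counts, and trial count are made-up placeholders, and how the authors pooled BIC across participants is not stated in the caption.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def delta_bic(bics, preferred):
    """BIC of each other model minus the BIC of the preferred model.

    Positive values favor the preferred model.
    """
    return {
        name: value - bics[preferred]
        for name, value in bics.items()
        if name != preferred
    }


# Placeholder values for illustration only; not taken from the paper.
example_bics = {
    "model_a": bic(log_likelihood=-100.0, n_params=2, n_obs=100),
    "model_b": bic(log_likelihood=-99.0, n_params=4, n_obs=100),
}
differences = delta_bic(example_bics, preferred="model_a")
```

In this toy case the extra two parameters of `model_b` cost more than its small gain in likelihood, so its difference from `model_a` is positive.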

Model parameter recovery.

The recovered vs. true parameters for synthetic data generated by the model and then fit. Correlations are shown above the plots.

Example fits of the model to the Continuous Probabilistic task.

Fits to the same participants shown in Fig. 2 in the same format as Fig. 5b.

Fits to the success and failure clamp phases for the Continuous Probabilistic task.

a. Success clamp standard deviation as a function of age for the data (blue) and model (red) with regression lines with 95% confidence interval shading. b. Same as a for the fail clamp. c. Model vs. empirical success clamp s.d. with variance explained and correlation with p-value. d. Same as c for the fail clamp.

Example behavior, and discrete target performance in the discrete probabilistic task.

Same format as Fig. 2.

Variability and learning in the discrete probabilistic task.

Same format as Fig. 4.

Example fits of the model to the Discrete Probabilistic task.

Fits to the same participants shown in Figure 6—figure supplement 1 in the same format as Fig. 5b.

Fits to the success and fail clamp phases for the Discrete Probabilistic task.

Same format as Figure 5—figure supplement 4.

Model fit parameters by age for the deterministic tasks.

Panels in same format as Fig. 5c.

Expected distance to target as a function of model parameters.

Expected distance to target as a function of motor noise and exploration variability for the 4 tasks. The green filled circle shows the optimal parameters to maximize reward. The white line shows the optimal exploration variability for different levels of motor noise. The gray shaded line shows the exploration variability vs. age regression line plotted against the motor noise vs. age regression with shading showing the 95% confidence interval. The gray shading shows the participant age in the regression. As the task goes from continuous to discrete, note how the youngest children (darkest end of bar) increase their exploration variability. As the task goes from probabilistic to deterministic, the youngest children decrease their motor noise and increase their exploration variability.