
De novo learning versus adaptation of continuous control in a manual tracking task

  1. Christopher S Yang (corresponding author)
  2. Noah J Cowan
  3. Adrian M Haith
  1. Department of Neuroscience, Johns Hopkins University, United States
  2. Department of Mechanical Engineering, Laboratory for Computational Sensing and Robotics, Johns Hopkins University, United States
  3. Department of Neurology, Johns Hopkins University, United States
Research Article
Cite this article as: eLife 2021;10:e62578 doi: 10.7554/eLife.62578

Abstract

How do people learn to perform tasks that require continuous adjustments of motor output, like riding a bicycle? People rely heavily on cognitive strategies when learning discrete movement tasks, but such time-consuming strategies are infeasible in continuous control tasks that demand rapid responses to ongoing sensory feedback. To understand how people can learn to perform such tasks without the benefit of cognitive strategies, we imposed a rotation/mirror reversal of visual feedback while participants performed a continuous tracking task. We analyzed behavior using a system identification approach, which revealed two qualitatively different components of learning: adaptation of a baseline controller and formation of a new, task-specific continuous controller. These components exhibited different signatures in the frequency domain and were differentially engaged under the rotation/mirror reversal. Our results demonstrate that people can rapidly build a new continuous controller de novo and can simultaneously deploy this process with adaptation of an existing controller.

Introduction

In many real-world motor tasks, skilled performance requires us to continuously control our actions in response to ongoing external events. For example, remaining stable on a bicycle depends on being able to rapidly respond to the tilt of the bicycle as well as obstacles in our path. The demand for continuous control in such tasks can make it challenging to initially learn them. In particular, new skills often require us to learn arbitrary relationships between our actions and their outcomes (like moving our arms to steer or flexing our fingers to brake). Learning such mappings is thought to depend on the use of time-consuming cognitive strategies (McDougle et al., 2016), but continuous control tasks require us to produce responses rapidly, leaving little time for deliberation about our actions. Exactly how we are able to learn new, continuous motor skills therefore remains unclear.

Mechanisms of motor learning have often been studied by examining how people learn to compensate for imposed visuomotor perturbations. Prior studies along these lines have revealed a variety of different ways in which humans learn new motor tasks (Krakauer et al., 2019). One of the most well characterized is adaptation, an implicit, error-driven learning mechanism by which task performance is improved by using sensory prediction errors to recalibrate motor output (Figure 1A; Mazzoni and Krakauer, 2006; Tseng et al., 2007; Shadmehr et al., 2010). Adaptation is primarily characterized by the presence of aftereffects (Redding and Wallace, 1993; Shadmehr and Mussa-Ivaldi, 1994; Kluzik et al., 2008) and is known to support learning in a variety of laboratory settings, including reaching under imposed visuomotor rotations (Krakauer et al., 1999; Fernandez-Ruiz et al., 2011; Morehead et al., 2015), displacements (Martin et al., 1996; Fernández-Ruiz and Díaz, 1999) or force fields (Lackner and Dizio, 1994; Shadmehr and Mussa-Ivaldi, 1994), and walking on split-belt treadmills (Choi and Bastian, 2007; Finley et al., 2015). It can also occur in more complex settings such as path integration in gain-altered virtual reality (Tcheang et al., 2011; Jayakumar et al., 2019). However, it appears that adaptation can only adjust motor output to a limited extent; in the case of visuomotor rotations, implicit adaptation can only alter reach directions by around 15–25°, even when much larger rotations are applied (Taylor et al., 2010; Fernandez-Ruiz et al., 2011; Taylor and Ivry, 2011; Bond and Taylor, 2015). Furthermore, under more drastic perturbations, such as a mirror reversal of visual feedback, adaptation seems to act in the wrong direction and can worsen performance rather than improve it (Abdelghani et al., 2008; Hadjiosif et al., 2021). Thus, other learning mechanisms besides adaptation seem to be required when learning to compensate for perturbations that impose significant deviations from one’s existing baseline motor repertoire.

Figure 1
Conceptual overview and experimental design.

(A) We conceptualize adaptation as a parametric change to an existing controller (changing θ to θ′), re-aiming as feeding surrogate movement goals to an existing controller (changing g to g′), and de novo learning as building a new controller (h) to replace the baseline controller (f). (B) Participants performed planar movements with their right hand while a target (yellow) and cursor (blue) were presented to them on an LCD display. Participants were asked to either move the cursor to a static target (point-to-point task) or track a moving target with the cursor (tracking task). (C) Twenty participants learned to control the cursor under one of two visuomotor perturbations: a 90° clockwise visuomotor rotation (n=10), or a mirror reversal (n=10). (D) Participants alternated between point-to-point reaching (one block = 150 reaches) and tracking (one block = 8 trials lasting 46 s each) in a single testing session in one day. We first measured baseline performance in both tasks under veridical visual feedback (blue), followed by interleaved tracking and point-to-point blocks with perturbed visual feedback from early learning (orange) to late learning (yellow). Blocks between early and late learning are indicated in grey. At the end of the experiment, we assessed aftereffects in the tracking task by removing the perturbation (purple).

Another way people learn to compensate for visuomotor perturbations is by using re-aiming strategies. This involves aiming one’s movements towards a surrogate target rather than the true target of the movement (Figure 1A). In contrast to adaptation, in which the controller itself is altered to meet changing task demands, re-aiming feeds an existing controller a fictitious movement goal in order to successfully counter the perturbation without needing to alter the controller itself. It has been shown that people use re-aiming strategies, often in tandem with adaptation, to compensate for visuomotor rotations (Mazzoni and Krakauer, 2006; de Rugy et al., 2012; Taylor et al., 2014; Morehead et al., 2015) as well as for imposed force fields (Schween et al., 2020) and perturbations to muscular function (de Rugy et al., 2012). In principle, re-aiming enables people to compensate for arbitrary visuomotor re-mappings of their environment, including large (90°) visuomotor rotations (Bond and Taylor, 2015) or mirror-reversed visual feedback (Wilterson and Taylor, 2019). However, implementing re-aiming is a cognitively demanding and time-consuming process that significantly increases reaction times (Fernandez-Ruiz et al., 2011; Haith et al., 2015; Leow et al., 2017; McDougle and Taylor, 2019).

A third possible approach to learning, aside from adaptation or re-aiming, is to build a new controller to implement the newly required mapping from sensory input to motor output – a process that has been termed de novo learning (Figure 1A; Telgen et al., 2014; Sternad, 2018). This approach contrasts with adaptation, in which an existing controller is parametrically altered, and with re-aiming, in which fictitious movement goals are fed to an existing controller to generate a successful movement. Previous studies suggest that learning to counter a mirror reversal of visual feedback may engage de novo learning. Learning under a mirror reversal shows a number of important differences from learning under a visuomotor rotation: it does not result in aftereffects when the perturbation is removed (Gutierrez-Garralda et al., 2013; Lillicrap et al., 2013), it shows offline gains (Telgen et al., 2014), and it seems to have a distinct neural basis (Schugens et al., 1998; Maschke et al., 2004; Morton and Bastian, 2006; Gutierrez-Garralda et al., 2013). However, these properties would also be expected if participants learned to counter the mirror reversal by simply re-aiming their movements to a different target, as has been suggested by Wilterson and Taylor, 2019. It is therefore unclear whether or not people ever compensate for visuomotor perturbations by building a de novo controller.

How might one dissociate re-aiming from building a new controller? A key property of re-aiming is that it is cognitively demanding and time-consuming to implement (Fernandez-Ruiz et al., 2011; Haith et al., 2015; Leow et al., 2017). This leads to increased reaction times (Fernandez-Ruiz et al., 2011), and performance worsens if reaction times are forced to be shorter (Fernandez-Ruiz et al., 2011; Haith et al., 2015; Huberdeau et al., 2019; McDougle and Taylor, 2019). While this may not significantly hamper performance in discrete movement tasks like point-to-point reaching or throwing to a stationary target, in continuous control tasks where one’s movement goal is constantly and unpredictably changing, movements to the goal cannot be completely planned in advance. Thus, continuous control tasks may severely limit one’s ability to use re-aiming strategies and may not be solvable by the same means as point-to-point tasks. Although several studies have examined learning in continuous control tasks (Schugens et al., 1998; Bock and Schneider, 2001; Bock et al., 2001), these studies used relatively slow-moving targets (<0.35 Hz movement), which could potentially be tracked using intermittent ‘catch-up’ movements that are strategically planned, similar to explicit re-aiming of point-to-point movements (Craik, 1947; Miall et al., 1993a; Russell and Sternad, 2001; Susilaradeya et al., 2019). To more strictly limit people’s ability to rely on re-aiming, it is necessary to consider tasks in which movement goals change more quickly than the time it takes for slow cognitive strategies to be applied.

In the present study, participants learned to counter a mirror reversal of visual feedback in both a point-to-point movement task and a continuous tracking task in which a target moved in a pseudo-random sum-of-sinusoids trajectory (Figure 1B,C; Miall et al., 1993b; Kiemel et al., 2006; Roth et al., 2011; Madhav et al., 2013; Sponberg et al., 2015; Yamagami et al., 2019). In the tracking task, the target moved at frequencies up to 2 Hz, much faster than in previous tracking experiments, resulting in a target trajectory that was quick, unpredictable, and unlikely to be trackable while using a re-aiming strategy. In order to achieve good tracking performance, participants instead had to continuously generate movements to track the target. Critically, the sum-of-sines structure of the target motion allowed us to employ a frequency-based system identification approach to characterize changes in participants’ motor controllers during mirror-reversal learning. We compared learning in this group to that of a second group of participants that learned to counter a visuomotor rotation, where presumably – unlike mirror reversal – adaptation would contribute to learning.

We hypothesized that if participants learned to counter the mirror reversal via de novo learning, then they would be able to successfully track the target despite its rapid and unpredictable nature. If, however, the mirror reversal can only be learned through a re-aiming strategy, then we predicted that participants would have difficulty tracking the target and may have to generate intermittent catch-up movements to pursue the target. We further hypothesized that, under the rotation, participants would parametrically alter their baseline controller via adaptation and would therefore be able to smoothly track the target.

Results

Participants learned to compensate for the rotation and mirror reversal but using different learning mechanisms

Twenty participants used their right hand to manipulate an on-screen cursor under either a 90° clockwise visuomotor rotation (n=10) or a mirror reversal (n=10) about an oblique 45° axis (Figure 1C). These perturbations were chosen such that, in both cases, motion of the hand in the x-axis was mapped to motion of the cursor in the y-axis and vice versa. Each group practiced using their respective perturbations by performing a point-to-point task, reaching towards stationary targets that appeared at random locations on the screen in blocks of 150 trials (Figure 1D). Each participant completed the experiment in a single session in 1 day. We assessed both groups’ performance in this task by measuring the error between the initial direction of cursor movement and the direction of the target. For the rotation group, this error decreased as a function of training time and plateaued near 0°, demonstrating that participants successfully learned to compensate for the rotation (Figure 2A, upper panel). For the mirror-reversal group, the directional error did not show any clear learning curve (Figure 2A, lower panel), but performance was better than would be expected if participants had not attempted to compensate at all (which would manifest as reach errors uniformly distributed between ±180°). Thus, both groups of participants at least partially compensated for their perturbations in the point-to-point task, consistent with previous findings.

Figure 2 (with 1 supplement)
Task performance improved in the point-to-point and tracking tasks.

(A) Performance in the point-to-point task, as quantified by initial reach direction error, is plotted as heat maps for the rotation group (top) and mirror-reversal group (bottom). Each column shows the distribution of initial reach direction errors, pooled across all participants, over a horizontal bin of 15 trials. The intensity of color represents the number of trials in each 10° vertical bin where the maximum possible value of each bin is 150 (15 trials for 10 participants for each group). (B) Example tracking trajectories from a representative participant in each group. Target trajectories are shown in black while cursor trajectories are shown in brown. Each trajectory displays approximately 5 s of movement. (C) Performance in the tracking task as quantified by average mean-squared positional error between the cursor and target during each 40 s trial. Individual participants are shown in thin lines and group mean is shown in thick lines.

To test whether participants could compensate for these perturbations in a continuous control task after having practiced them in the point-to-point task, we had them perform a manual tracking task. In each 46 s tracking trial (one block = eight trials), participants tracked a target that moved in a continuous sum-of-sinusoids trajectory at frequencies ranging between 0.1 and 2.15 Hz, with distinct frequencies used for x- and y-axis target movement. The resulting target motion was unpredictable and appeared random. Furthermore, the target’s trajectory was altered every block by randomizing the phases of the component sinusoids, preventing participants from being able to learn a specific target trajectory. Example trajectories from single participants are presented in Figure 2B (see also Figure 2—video 1 for a video of tracking behavior).
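The construction of such a target can be illustrated with a minimal Python sketch. It is only an illustration of a sum-of-sinusoids trajectory with randomized phases: the specific frequencies, amplitudes, sampling rate, and the helper name `make_target` are assumptions chosen for the example, not the values or code used in the study.

```python
import math
import random

def make_target(freqs_hz, amps, phases, duration_s=46.0, rate_hz=60.0):
    """Return sample times and positions of a sum-of-sinusoids trajectory."""
    n = int(duration_s * rate_hz)
    times = [i / rate_hz for i in range(n)]
    positions = [
        sum(a * math.sin(2 * math.pi * f * t + p)
            for f, a, p in zip(freqs_hz, amps, phases))
        for t in times
    ]
    return times, positions

# Hypothetical x-axis frequencies (Hz) and amplitudes, spanning 0.1-2.15 Hz.
x_freqs = [0.1, 0.35, 0.85, 1.45, 2.15]
x_amps = [2.0, 1.5, 1.0, 0.6, 0.4]

# Drawing new phases for every block changes the trajectory while keeping
# its frequency content fixed.
rng = random.Random(0)
phases = [rng.uniform(-math.pi, math.pi) for _ in x_freqs]
times, x_target = make_target(x_freqs, x_amps, phases)
```

A y-axis trajectory would be generated the same way from a second, non-overlapping set of frequencies, so that responses to x- and y-axis target motion can be separated later in the frequency domain.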

As an initial assessment of how well participants learned to track the target, we measured the average mean-squared error (tracking error) between the target and cursor positions during every trial (each tracking trial lasted 46 s, 40 s of which was used for analysis; see ‘Tracking task’ in the Materials and methods for more details). Tracking error improved with practice in both groups of participants, approaching similar levels of error by late learning (Figure 2C). Therefore, in both the point-to-point and tracking tasks, participants’ performance improved with practice. However, much of this improvement can be attributed to the fact that participants learned to keep their cursor within the bounds of target movement; during early learning, participants’ cursors often deviated far outside the area of target movement, thus inflating the tracking error.

To better quantify improvements in participants’ ability to track the target, we examined the geometric relationship between hand and target trajectories – an approach that would be more sensitive to the small changes in movement direction associated with rotation/mirror reversal learning, not just large deviations outside the target’s movement area. We aligned the hand and target tracking trajectories with a linear transformation matrix (alignment matrix) that, when applied to the target trajectory, minimized the discrepancy between the hand and target trajectories (see Materials and methods for details). This matrix compactly summarizes the relationship between target movements and hand movements and can be thought of as a more general version of reach direction for point-to-point movements. We visualized these matrices by plotting their column vectors (green and purple arrows in Figure 3A), which depict how they would transform the unit x and y vectors.
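One way to picture the alignment matrix is as an ordinary least-squares fit of a 2×2 matrix M such that hand ≈ M · target. The sketch below makes that simplifying assumption and ignores any time delay between target and hand, which the full method described in the Materials and methods may handle differently; the function name `fit_alignment` and the synthetic data are hypothetical.

```python
import math

def fit_alignment(target, hand):
    """Least-squares 2x2 matrix M such that hand[i] ~= M @ target[i]."""
    # Accumulate sums of outer products: A = sum(h t^T), B = sum(t t^T).
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [[0.0, 0.0], [0.0, 0.0]]
    for t, h in zip(target, hand):
        for r in range(2):
            for c in range(2):
                a[r][c] += h[r] * t[c]
                b[r][c] += t[r] * t[c]
    # Invert the 2x2 matrix B in closed form.
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    b_inv = [[b[1][1] / det, -b[0][1] / det],
             [-b[1][0] / det, b[0][0] / det]]
    # M = A @ B^-1
    return [[sum(a[r][k] * b_inv[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

# Synthetic target path and a hand path that swaps x and y, i.e. ideal
# compensation for a mirror reversal about the 45 degree axis.
target = [(math.cos(0.7 * i), math.sin(1.3 * i)) for i in range(200)]
hand = [(ty, tx) for tx, ty in target]
m = fit_alignment(target, hand)
```

For this synthetic data the fitted matrix has off-diagonal elements near 1 and diagonal elements near 0, which is the pattern the off-diagonal analysis is designed to detect.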

Figure 3
The rotation group exhibited reach-direction aftereffects while the mirror-reversal group did not.

(A) Alignment matrices relating target and hand movement established by trajectory alignment. The top row illustrates the ideal alignment matrices at baseline or to successfully compensate for each perturbation (blue represents positive values, red represents negative values). Alignment matrices (calculated from one trial averaged across participants) from the rotation (middle row) and mirror-reversal (bottom row) groups are depicted at different points during learning. Below each matrix, we visualized how the unit x and y vectors (black lines) would be transformed by the columns of the matrices (transformed x = green, transformed y = purple). Shaded areas are 95% confidence ellipses across participants. (B) The average of the two off-diagonal elements of the estimated alignment matrices across all blocks of the experiment in the tracking task (for the rotation group, the negative of the element in row 1, column 2 was used for averaging). Grey boxes indicate when the rotation or mirror reversal was applied. Thin black lines indicate individual participants and thick lines indicate the mean across participants. (C) (Left: rotation group) Angular compensation for the rotation, computed by approximating each alignment matrix with a pure rotation matrix. (Right: mirror-reversal group) Scaling factor orthogonal to the mirror axis. In each plot, dashed lines depict ideal performance when the perturbation is (green) or is not (black) applied. Thin black lines indicate individual participants and thick lines indicate the mean across participants.

Figure 3—source data 1

This file contains the results of all statistical analyses performed on the data in Figure 3B.

https://cdn.elifesciences.org/articles/62578/elife-62578-fig3-data1-v2.xlsx

In Figure 3A, we illustrate how ideal performance under different visual feedback conditions would manifest in the alignment matrices and vectors. These matrices should approximate the identity matrix when performing under veridical feedback and similarly approximate the inverse of the applied perturbation matrix under perturbed feedback. Incomplete compensation would manifest as, for example, a 45° counter-clockwise rotation matrix in response to the 90° clockwise rotation. For both groups of participants, the estimated alignment matrices were close to the identity matrix at baseline and approached the inverses of the respective perturbations by late learning (Figure 3A), demonstrating that participants performed the task successfully at baseline and mostly learned to compensate for the imposed perturbations.

To test whether these changes were statistically significant, we focused on the off-diagonal elements of the matrices. These elements critically distinguish the different transformations from one another and from baseline. In the last trial of the late-learning block, both the rotation (linear mixed-effects model [see ‘Statistics’ in Materials and methods for details about the model structure]: interaction between group and block, F(2,36)=7.56, p=0.0018; Tukey’s range test: p<0.0001; see Figure 3—source data 1 for p-values of all statistical comparisons related to Figure 3) and mirror-reversal groups (Tukey’s range test: p<0.0001) exhibited off-diagonal values that were significantly different from the first trial of the baseline block (Figure 3B), and in the appropriate direction to compensate for their respective perturbations.

From these matrices, we derived additional metrics associated with each perturbation to further characterize learning. For the rotation group, we computed a compensation angle, θ, using a singular value decomposition approach (Figure 3C; see ‘Trajectory-alignment analysis’ in Materials and methods for details). At baseline, we found that θ=3.8±1.0° (mean ± SEM), and this increased to θ=72.5±1.9° by late learning. For the mirror-reversal group, to assess whether participants learned to flip the direction of their movements across the mirroring axis, we computed the scaling of the target trajectory along the direction orthogonal to the mirror axis (Figure 3C). This value was positive at baseline and negative by late learning, indicating that participants successfully inverted their hand trajectories relative to that of the target.
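Both metrics can be illustrated on the ideal alignment matrices. The sketch below is not the study's SVD-based procedure: for the angle it uses the closed-form expression for the rotation matrix nearest to a 2×2 matrix in the Frobenius norm, and for the scaling it projects the matrix onto the unit vector orthogonal to a y = x mirror axis. The function names, the axis convention (x to the right, y up), and the use of ideal matrices as inputs are all assumptions made for the example.

```python
import math

def compensation_angle_deg(m):
    """Angle (degrees, counter-clockwise) of the rotation closest to m."""
    # Maximizing trace(R^T m) over 2D rotations gives this closed form.
    return math.degrees(math.atan2(m[1][0] - m[0][1], m[0][0] + m[1][1]))

def orthogonal_scaling(m):
    """Scaling of m along the unit vector orthogonal to the y = x axis."""
    u = (1 / math.sqrt(2), -1 / math.sqrt(2))
    mu = (m[0][0] * u[0] + m[0][1] * u[1],
          m[1][0] * u[0] + m[1][1] * u[1])
    return mu[0] * u[0] + mu[1] * u[1]

identity = [[1.0, 0.0], [0.0, 1.0]]          # ideal baseline alignment
counter_rotation = [[0.0, -1.0], [1.0, 0.0]]  # inverse of a 90 deg clockwise rotation
mirror = [[0.0, 1.0], [1.0, 0.0]]             # mirror about y = x (its own inverse)

angle_baseline = compensation_angle_deg(identity)
angle_full = compensation_angle_deg(counter_rotation)
scale_baseline = orthogonal_scaling(identity)
scale_full = orthogonal_scaling(mirror)
```

Under these assumptions, full compensation corresponds to an angle of 90° for the rotation and a scaling of −1 for the mirror reversal, while baseline behavior corresponds to 0° and +1, matching the sign change described in the text.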

Lastly, we sought to confirm that the rotation and mirror reversal were learned using different mechanisms, as has been suggested by previous studies (Gutierrez-Garralda et al., 2013; Telgen et al., 2014). We did so by assessing whether participants in each group expressed reach-direction aftereffects – the canonical hallmark of adaptation – at the end of the experiment, following removal of each perturbation in the tracking task (and with participants made explicitly aware of this). Again estimating alignment matrices (Figure 3B), we found that the magnitude of aftereffects (as measured by the off-diagonal elements of the alignment matrices) was different between the two groups in the first trial post-learning (Tukey’s range test: p<0.0001). Within groups, the off-diagonal elements for the rotation group were significantly different between the first trial of baseline and the first trial of post-learning (Tukey’s range test: p<0.0001), indicating clear aftereffects. These aftereffects corresponded to a compensation angle of θ=32.4±1.4°, similar to the magnitude of aftereffects reported for visuomotor rotation in point-to-point tasks (Bond and Taylor, 2015; Morehead et al., 2017). For the mirror-reversal group, by contrast, the off-diagonal elements from the first trial of post-learning were not significantly different from the first trial of baseline (Tukey’s range test: p=0.2057; baseline range: −0.11 to 0.11; post-learning range: −0.07 to 0.28), suggesting negligible aftereffects. The lack of aftereffects under mirror reversal implies that participants did not counter this perturbation via adaptation of an existing controller and instead used an alternative learning mechanism.

In summary, these data suggest that participants were able to compensate for both perturbations in the more challenging tracking task. Consistent with previous studies focusing on point-to-point movements, these data support the idea that the rotation was learned via adaptation, while the mirror reversal was learned via a different mechanism – putatively, de novo learning.

Participants used continuous movements to perform manual tracking

Although participants could learn to successfully perform the tracking task under the mirror reversal, it is not necessarily clear that they achieved this by building a new, continuous controller; the largest amplitudes and velocities of target movement occurred primarily at low frequencies (0.1–0.65 Hz), which could potentially have allowed participants to track the target through a series of discretely planned ‘catch-up’ movements (Craik, 1947; Miall et al., 1993a; Russell and Sternad, 2001; Susilaradeya et al., 2019) that might have involved re-aiming. If participants were employing such a re-aiming strategy, we would expect this to compromise their ability to track the target continuously. To examine the possibility that participants may have tracked the target intermittently rather than continuously, we turned to linear systems analysis to analyze participants’ behavior at a finer-grained level than was possible through the trajectory-alignment analysis.

According to linear systems theory, a linear system will always translate sinusoidal inputs into sinusoidal outputs at the same frequency, albeit potentially scaled in amplitude and shifted in phase. Additionally, linearity implies that the result of summing two input signals is to simply sum the respective outputs. Therefore, a linear system can be fully described in terms of how it maps sinusoidal inputs to outputs across all relevant frequencies. If participants’ behavior can be well approximated by a linear model – as is often the case for planar arm movements (McRuer and Jex, 1967; Yamagami et al., 2019; Zimmet et al., 2020) – then we can fully understand their tracking behavior in terms of their response to different frequencies of target movement. The design of the tracking task enabled us to examine the extent to which participants’ behavior was linear; if participants were indeed behaving linearly (which would suggest they were tracking the target continuously), then we should find that their hand also moved according to a sum-of-sines trajectory, selectively moving at the same frequencies as the target.

We assessed whether participants selectively moved at the same frequencies as target movement by first converting their trajectories to a frequency-domain representation via the discrete Fourier transform. This transformation decomposes the full hand trajectory into a sum of sinusoids of different amplitudes, phases, and frequencies. Figure 4A shows the amplitude spectra (i.e., amplitude of movement as a function of frequency) of hand movements in the x-axis at different points during the experiment, averaged across participants (analogous data for y-axis movements can be found in Figure 4—figure supplement 1 and data from single subjects can be found in Figure 4—figure supplement 2). The amplitudes and frequencies of target movement are shown as diamonds (x- and y-axis sinusoids in green and brown, respectively) and the amplitude of participants’ movements at those same frequencies are marked by circles.
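The logic of reading out movement amplitude at individual frequencies can be sketched as follows. This is a minimal illustration that evaluates a single Fourier coefficient directly rather than running a full FFT; the sampling rate, the two "target" frequencies, the probe frequency, and the function name `amplitude_at` are assumptions for the example, and the synthetic signal stands in for a hand trajectory.

```python
import cmath
import math

def amplitude_at(signal, rate_hz, freq_hz):
    """Amplitude of the sinusoidal component of signal at freq_hz."""
    w = -2j * math.pi * freq_hz / rate_hz
    coeff = sum(s * cmath.exp(w * i) for i, s in enumerate(signal))
    # Factor of 2 accounts for the matching negative-frequency component.
    return 2 * abs(coeff) / len(signal)

rate_hz, duration_s = 50.0, 40.0
n = int(rate_hz * duration_s)

# Synthetic "hand" signal containing only two frequencies (0.1 and 0.25 Hz),
# each completing a whole number of cycles within the 40 s window.
hand_x = [
    2.0 * math.sin(2 * math.pi * 0.1 * i / rate_hz)
    + 1.0 * math.sin(2 * math.pi * 0.25 * i / rate_hz + 0.3)
    for i in range(n)
]

amp_low = amplitude_at(hand_x, rate_hz, 0.1)
amp_mid = amplitude_at(hand_x, rate_hz, 0.25)
amp_off = amplitude_at(hand_x, rate_hz, 0.55)  # a frequency absent from the signal
```

A response that is confined to the input frequencies, with near-zero amplitude elsewhere, is the signature of linear behavior that the amplitude spectra are used to check.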

Figure 4 (with 2 supplements)
Tracking behavior was approximately linear, indicating that the hand tracked the target continuously.

(A) Amplitude spectra of x-axis hand trajectories (black line) averaged across participants from one trial in each listed block. In each plot, the amplitudes and frequencies of target motion are indicated by diamonds (green: x-axis target frequencies; brown: y-axis target frequencies). Hand responses at x- and y-axis target frequencies are highlighted as green and brown circles, respectively, and are connected by lines for ease of visualization. (B) Spectral coherence between target movement in the x-axis and hand movement in both axes. This measure is proportional to the linear component of the hand’s response to the target. Darker colors represent lower frequencies and lighter colors represent higher frequencies. Error bars are SEM across participants. (C) Difference in phase lag between movements at late learning and baseline. Data from individual participants are shown as thin lines and averages for the rotation (black) and mirror-reversal (pink) groups are shown as thick lines.

At baseline, late learning, and post-learning, participants moved primarily at the frequencies of x- or y-axis target movement (Figure 4A). At frequencies at which the target did not move, the amplitude of hand movement was low. This behavior resulted in clearly discernible peaks in the amplitude spectra, which is consistent with the expected response of a linear system. In contrast, participants’ behavior at early learning was qualitatively different, exhibiting high amplitude at movement frequencies below 1 Hz, regardless of whether the target also moved at that frequency. This suggests that a much greater proportion of participants’ behavior was nonlinear/noisy, as would be expected during early learning when neither group of participants had adequately learned to counter the perturbations.

As a further test of the linearity of participants’ behavior, we computed the spectral coherence between target and hand movement, which is simply the correlation between two signals in the frequency domain. As demonstrated by Roddey et al., 2000, for an arbitrary system responding to an arbitrary input, the fraction of the system’s response that can be explained by a linear model is proportional to the coherence between the input and the system’s output (a perfectly linear, noiseless system would exhibit a coherence of 1 across all frequencies). At baseline, for both groups, we found that the coherence between target movement and participants’ hand movement was roughly 0.75 in both the x- and y-axes (Figure 4B), meaning that 75% of participants’ behavior could be accounted for by a linear model. Although dramatically lower during early learning, the coherence approached that of baseline by late learning, indicating that the proportion of participants’ behavior that could be accounted for by a linear model increased with more practice time.
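To make the coherence measure concrete, the sketch below estimates magnitude-squared coherence at one frequency by averaging cross- and auto-spectra over several paired trials. It is an assumption-laden illustration rather than the estimator used in the study: the number of trials, sampling rate, gain, delay, noise level, and the function names `fourier_coeff` and `coherence` are all hypothetical.

```python
import cmath
import math
import random

def fourier_coeff(signal, rate_hz, freq_hz):
    """Complex Fourier coefficient of signal at freq_hz."""
    w = -2j * math.pi * freq_hz / rate_hz
    return sum(s * cmath.exp(w * i) for i, s in enumerate(signal)) / len(signal)

def coherence(inputs, outputs, rate_hz, freq_hz):
    """Magnitude-squared coherence at freq_hz, estimated across paired trials."""
    xs = [fourier_coeff(x, rate_hz, freq_hz) for x in inputs]
    ys = [fourier_coeff(y, rate_hz, freq_hz) for y in outputs]
    sxy = sum(x.conjugate() * y for x, y in zip(xs, ys))
    sxx = sum(abs(x) ** 2 for x in xs)
    syy = sum(abs(y) ** 2 for y in ys)
    return abs(sxy) ** 2 / (sxx * syy)

rate_hz, freq_hz, n = 20.0, 0.5, 800  # 40 s sampled at an assumed 20 Hz
rng = random.Random(1)

inputs, clean, noisy = [], [], []
for _ in range(8):
    phase = rng.uniform(-math.pi, math.pi)
    x = [math.sin(2 * math.pi * freq_hz * i / rate_hz + phase) for i in range(n)]
    # Hypothetical linear response: gain of 0.8 and a 0.2 s delay.
    y = [0.8 * math.sin(2 * math.pi * freq_hz * (i / rate_hz - 0.2) + phase)
         for i in range(n)]
    inputs.append(x)
    clean.append(y)
    noisy.append([v + rng.gauss(0.0, 10.0) for v in y])

coh_clean = coherence(inputs, clean, rate_hz, freq_hz)
coh_noisy = coherence(inputs, noisy, rate_hz, freq_hz)
```

In this toy setting the noiseless linear response yields a coherence of essentially 1, and adding noise pulls the estimate below 1, which mirrors the interpretation of coherence as the fraction of the response a linear model can explain.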

As with any correlation, the residual variance in behavior not explained by a linear model was attributable to either nonlinearities or noise. Because catch-up movements could manifest as nonlinear behavior, we estimated the additional variance that could be explained by a nonlinear, but not a linear, model by measuring the square root of the coherence between multiple responses to the same input (Roddey et al., 2000), that is, hand movements from different trials within a block. We found that across all blocks, only an additional 5–10% of tracking behavior could be explained by a nonlinear model (data not shown), suggesting that most of the residual variance was attributable to noise and that a linear model was almost as good as a nonlinear model at explaining behavior on a trial-by-trial basis. In summary, these analyses suggest that participants’ behavior at baseline, late learning, and post-learning could be well described as a linear system, thereby suggesting that their movements were continuous.

Although behavior was approximately linear across all frequencies, it is possible that performing a sequence of discretely planned catch-up movements – which might have depended on the use of a re-aiming strategy – could approximate linear behavior, particularly at low frequencies of movement. As a result, we analyzed the lag between hand and target movements to examine the plausibility of participants repeatedly re-aiming in the tracking task. Previous work suggests that in tasks with many possible target locations, planning point-to-point movements under large rotations of visual feedback incurs an additional ∼300 ms of planning time on top of that required under baseline conditions (Fernandez-Ruiz et al., 2011; McDougle and Taylor, 2019). In the context of the tracking task, this suggests that, compared to baseline, people would require an additional 300 ms of reaction time for each catch-up movement under the rotation or mirror reversal, which would increase the lag between hand movements relative to the target.

We computed this lag at late learning and baseline at every frequency of target movement (Figure 4—figure supplement 1C). We then examined how much this lag increased from baseline to late learning (Figure 4C). For all but the lowest frequency of movement for the mirror-reversal group, the average increase in lag was below 300 ms. In fact, averaging across all frequencies, the increases in lag for the rotation and mirror-reversal groups were 83 ± 31 and 191 ± 62 ms (mean ± standard deviation across participants), respectively. This analysis suggests that participants responded to target movement quickly – more quickly than would be expected if participants tracked the target by repeatedly re-aiming toward an alternative target location.
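The relationship between a phase lag and a time lag at a single frequency can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: it assumes the lag is obtained from the phase of the hand response relative to the target, and the function name `lag_ms`, the 0.5 Hz frequency, and the phase values are all hypothetical.

```python
def lag_ms(phase_lag_deg, freq_hz):
    """Time lag (ms) equivalent to a phase lag (degrees) at one frequency."""
    cycles = phase_lag_deg / 360.0      # fraction of one oscillation period
    return cycles / freq_hz * 1000.0    # period is 1/freq_hz seconds

# Hypothetical phase lags at a 0.5 Hz target frequency (not data from the study)
baseline_lag = lag_ms(36.0, 0.5)
late_lag = lag_ms(54.0, 0.5)
lag_increase = late_lag - baseline_lag  # compare against the ~300 ms re-aiming cost
print(lag_increase)
```

Under these made-up values, the lag increase falls well short of the roughly 300 ms of extra planning time that repeated re-aiming would be expected to add.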

In summary, the above analyses show that participants were able to track the target smoothly and continuously after learning to compensate for either the rotation or the mirror reversal. Participants did not appear to be making intermittent catch-up movements or relying on a re-aiming strategy. Rather, their performance suggests that they were able to continuously track the target by building a de novo controller.

Adaptation and de novo learning exhibit distinct signatures in the frequency domain

The fact that tracking behavior could be well approximated as a linear dynamical system, particularly late in learning, facilitates a deeper analysis into how learning altered participants’ control capabilities. Following this approach, we treated each 40 s tracking trial as a snapshot of participants’ control capabilities at a particular time point during learning, assuming that the behavior could be regarded as being generated by a linear, time-invariant system. Although participants’ behavior changed over the course of the experiment due to the engagement of (likely nonlinear) learning processes, within the span of individual trials, our data suggest that their behavior was both approximately linear (Figure 4A,B) and changed only minimally from trial-to-trial (Figure 3B,C), validating the use of linear systems analysis on single-trial data.

We first examined learning in the amplitude spectra analysis. To perfectly compensate for either the rotation or the mirror reversal, participants’ responses to movement of the target in the x-axis needed to be remapped from the x-axis to the y-axis, and vice versa for movement of the target in the y-axis. Since the target moved at different frequencies in each axis, this remapping could be easily observed in the amplitude spectra as peaks at different frequencies. During early learning, both groups’ movements were nonlinear and were not restricted to x- or y-axis target frequencies (Figure 4A). By late learning, however, both groups learned to produce x-axis hand movements in response to y-axis target frequencies, indicating some degree of compensation for the perturbation. They also inappropriately continued to produce x-axis hand movements at x-axis target frequencies, suggesting that the compensation was incomplete.
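The way remapping appears in an amplitude spectrum can be illustrated with a simulated signal. The sketch below is not the authors' pipeline: the sampling rate, the two target frequencies, the response amplitude, and the helper `amplitude_at` are assumptions chosen for illustration. Only the 40 s trial length comes from the text. It evaluates a single-frequency discrete Fourier transform of x-axis hand movement at an x-axis and a y-axis target frequency.

```python
import cmath
import math

def amplitude_at(signal, freq_hz, fs):
    """Amplitude of the sinusoidal component of `signal` at freq_hz (single-bin DFT)."""
    n = len(signal)
    coeff = sum(s * cmath.exp(-2j * math.pi * freq_hz * k / fs)
                for k, s in enumerate(signal))
    return 2.0 * abs(coeff) / n

fs = 100.0           # hypothetical sampling rate (Hz)
duration = 40.0      # trial length in seconds, as stated in the text
fx, fy = 0.1, 0.25   # hypothetical x- and y-axis target frequencies (Hz)
t = [k / fs for k in range(int(fs * duration))]

# Simulated x-axis hand movement after complete remapping:
# it responds only to the y-axis target frequency.
hand_x = [0.8 * math.sin(2 * math.pi * fy * tk) for tk in t]

amp_at_fx = amplitude_at(hand_x, fx, fs)  # close to 0: no response at the x-axis frequency
amp_at_fy = amplitude_at(hand_x, fy, fs)  # close to 0.8: peak at the y-axis frequency
print(amp_at_fx, amp_at_fy)
```

Incomplete compensation of the kind described above would show up here as a nonzero amplitude at both frequencies.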

After the perturbation was removed, the rotation group exhibited x-axis hand movements at both x- and y-axis target frequencies, unlike baseline where movements were restricted to x-axis target frequencies (Figure 4A). The continued movement in response to y-axis target frequencies indicated aftereffects of having learned to counter the rotation, consistent with our earlier trajectory-alignment analysis. In contrast, the amplitude spectra of the mirror-reversal group’s x-axis hand movements post-learning were similar to baseline, suggesting negligible aftereffects and again recapitulating the findings of our earlier analysis. These features of the amplitude spectra, and the differences across groups, were qualitatively the same for y-axis hand movements (Figure 4—figure supplement 1) and were also evident in individual subjects (Figure 4—figure supplement 2).

Although the amplitude spectra illustrate important features of learning, they do not carry information about the directionality of movements and thus do not distinguish learning of the two different perturbations; perfect compensation would lead to identical amplitude spectra for each perturbation. In order to distinguish these responses, we needed to determine not just the amplitude, but the direction of the response along each axis, i.e., whether it was positive or negative. We used phase information to disambiguate the direction of the response (the sign of the gain) by assuming that the phase of the response at each frequency would remain similar to baseline throughout learning. We then used this information to compute signed gain matrices, which describe the linear transformations relating target and hand motion (Figure 5—figure supplement 1). These matrices relay similar information as the alignment matrices in Figure 3 except that here, different transformations were computed for different frequencies of movement. To construct these gain matrices, the hand responses from neighboring pairs of x- and y-axis target frequencies were grouped together. This grouping was performed because target movement at any given frequency was one-dimensional, but target movement across two neighboring frequencies was two-dimensional; examining hand/target movements in this way thus provided two-dimensional insight into how the rotation/mirroring of hand responses varied across the frequency spectrum (see ‘Frequency-domain analysis’ in Materials and methods for details).

Similar to the trajectory-alignment analysis, these gain matrices should be close to the identity matrix at baseline and close to the inverse of the matrix describing the perturbation if participants are able to perfectly compensate for the perturbation. We again visualized these frequency-dependent gain matrices by plotting their column vectors, which illustrates the effect of the matrix on the unit x and y vectors, only now we include a set of vectors for each pair of neighboring frequencies (Figure 5A: average across subjects, Figure 5—figure supplement 2A: single subjects). We also plotted the same information represented as colormapped gain matrices in Figure 5—figure supplement 1, similar to Figure 3A.
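The ideal gain matrices can be made concrete with a short sketch. It assumes, for illustration only, a 90° rotation (direction chosen arbitrarily, consistent with the x-to-y remapping described earlier) and a reflection across an axis at 45°. The helper names are hypothetical. If the cursor position is the perturbation matrix applied to the hand position, perfect tracking requires a hand response equal to the inverse of that matrix applied to the target.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def rotation(deg):
    """2x2 rotation matrix for an angle in degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s], [s, c]]

rot = rotation(90.0)                  # assumed perturbation; sign convention is arbitrary
mirror = [[0.0, 1.0], [1.0, 0.0]]     # reflection across an axis at 45 degrees

ideal_rot_gain = inverse(rot)         # ideal gain matrix under the rotation
ideal_mirror_gain = inverse(mirror)   # a reflection is its own inverse

# Sanity check: perturbation applied to the ideal hand response gives the identity
cursor_map = matmul(rot, ideal_rot_gain)
print(ideal_rot_gain, ideal_mirror_gain, cursor_map)
```

In both cases the ideal matrix has zeros on the diagonal and nonzero off-diagonal elements, which is why the off-diagonal gains are a natural summary of compensation. Only the signs of the off-diagonal elements distinguish the two perturbations.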

Figure 5
Adaptation and de novo learning exhibit distinct frequency-dependent signatures.

We estimated how participants transformed target motion into hand movement across different frequencies (i.e., gain matrix analysis). (A) Visualizations of the gain matrices relating target motion to hand motion across frequencies (associated gain matrices can be found in Figure 5—figure supplement 1). These visualizations were generated by plotting the column vectors of the gain matrices from one trial of each listed block, averaged across participants. Green and purple arrows depict hand responses to x- and y-axis target frequencies, respectively. Darker and lighter colors represent lower and higher frequencies, respectively. (B) Average of the two off-diagonal values of the gain matrices at different points during learning. Grey boxes indicate when the rotation or mirror reversal was applied. (C) (Top) Compensation angle as a function of frequency for the rotation group. (Bottom) Gain of movement orthogonal to the mirror axis for the mirror-reversal group. Green and black dashed lines show ideal compensation when the perturbation is or is not applied, respectively. All error bars in this figure are SEM across participants.

Figure 5—source data 1

This file contains the results of all statistical analyses performed on the data in Figure 5B.

https://cdn.elifesciences.org/articles/62578/elife-62578-fig5-data1-v2.xlsx

At baseline, participants in both groups responded to x- and y-axis target motion by moving their hands in the x- and y-axes, respectively, with similar performance across all target frequencies. Late in learning for the rotation group, participants successfully compensated for the perturbation – apparent through the fact that all vectors rotated clockwise during learning. The extent of compensation, however, was not uniform across frequencies; compensation at low frequencies (darker arrows) was more complete than at high frequencies (lighter arrows). For the mirror-reversal group, compensation during late learning occurred most successfully at low frequencies, apparent as the darker vectors flipping across the mirror axis (at 45° relative to the x-axis) from their baseline direction. At high frequencies, however, responses failed to flip across the mirror axis and remained similar to baseline.

To quantify these observations statistically, we focused again on the off-diagonal elements of the gain matrices from individual trials. The rotation group’s gain matrices were altered in the appropriate direction to counter the perturbation, showing a significant difference between the first trial of baseline and the last trial of late learning at all frequencies (Figure 5B; linear mixed-effects model (see ‘Statistics’ in Materials and methods for details about the model structure): interaction between group, block, and frequency, F(12,360)=3.39, p=0.0001; data split by frequency for post hoc Tukey’s range test: Bonferroni-adjusted p<0.05 for all frequencies; see Figure 5—source data 1 for p-values of all statistical comparisons related to Figure 5). Comparing the first trial of baseline and last trial of late learning for the mirror-reversal group revealed that the low-frequency gain matrices were also altered in the appropriate direction to counter the perturbation (Tukey’s range test: Bonferroni-adjusted p<0.001 for lowest three frequencies), but the high-frequency gain matrices were not significantly different from each other (Tukey’s range test: p>0.6 [not Bonferroni-adjusted] for highest three frequencies; baseline gain range: −0.18 to 0.18; late-learning gain range: −0.25 to 0.66).

Calculating a rotation matrix that best described the rotation group’s gain matrix at each frequency (using the same singular value decomposition approach applied to the alignment matrices) revealed that participants’ baseline compensation angle was close to 0° at all frequencies (Figure 5C). By late learning, compensation was nearly perfect at the lowest frequency but was only partial at higher frequencies. For the mirror-reversal group, the gains of participants’ low-frequency movements orthogonal to the mirror axis were positive at baseline and became negative during learning, appropriate to counter the perturbation. At high frequencies, by contrast, the gain reduced slightly during learning but never became negative. Thus, both groups of participants were successful at compensating at low frequencies but, at high frequencies, the rotation group was only partially successful and the mirror-reversal group was largely unsuccessful.
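The two summary measures in Figure 5C can be sketched for a single 2×2 gain matrix. The code below is an illustrative stand-in, not the authors' implementation. Instead of the singular value decomposition mentioned above, it uses the closed-form angle of the least-squares closest rotation matrix, which is available in two dimensions. The orthogonal gain is computed by projecting the gain matrix onto the unit vector orthogonal to a 45° mirror axis. Function names, the sign convention of the angle, and the example matrices are assumptions.

```python
import math

def compensation_angle_deg(g):
    """Angle of the rotation matrix closest (in least squares) to the 2x2 matrix g."""
    return math.degrees(math.atan2(g[1][0] - g[0][1], g[0][0] + g[1][1]))

def orthogonal_gain(g):
    """u^T g u for u = (1, -1)/sqrt(2), the direction orthogonal to a 45 degree axis."""
    return 0.5 * (g[0][0] - g[0][1] - g[1][0] + g[1][1])

identity = [[1.0, 0.0], [0.0, 1.0]]   # baseline-like gain matrix
mirror = [[0.0, 1.0], [1.0, 0.0]]     # perfect compensation for a 45 degree mirror reversal

# Hypothetical partial compensation: a 70 degree rotation with gain 0.8
c, s = math.cos(math.radians(70.0)), math.sin(math.radians(70.0))
partial = [[0.8 * c, 0.8 * s], [-0.8 * s, 0.8 * c]]

angle_baseline = compensation_angle_deg(identity)   # about 0 degrees
angle_partial = compensation_angle_deg(partial)     # about -70 degrees in this convention
gain_baseline = orthogonal_gain(identity)           # positive at baseline
gain_flipped = orthogonal_gain(mirror)              # negative after a full flip
print(angle_baseline, angle_partial, gain_baseline, gain_flipped)
```

The recovered angle is insensitive to the overall gain of 0.8, so this measure separates the direction of the response from its amplitude.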

The gain matrices also recapitulated the post-learning trends from the trajectory-alignment analysis in Figure 3. In the first post-learning trial, the rotation group’s off-diagonal gains were significantly different from the first trial of baseline for all frequencies except the lowest (Figure 5B; Tukey’s range test: Bonferroni-adjusted p<0.003 for highest six frequencies). By contrast, there was no strong evidence that the mirror-reversal group’s post-learning matrices were significantly different from baseline (Tukey’s range test: p>0.04 (not Bonferroni-adjusted) for all frequencies; baseline gain range: −0.18 to 0.18; post-learning gain range: −0.49 to 0.37). Additionally, the post-learning gains differed significantly between the rotation and mirror-reversal groups, albeit only for three of the intermediate frequencies (Tukey’s range test: Bonferroni-adjusted p<0.001). Similar trends were evident in the compensation angles for the rotation group and orthogonal gains for the mirror-reversal group (Figure 5C). These data again suggest that the rotation group expressed aftereffects while the mirror-reversal group did not.

To summarize, compensation for the visuomotor rotation was expressed at both low and high frequencies of movement, and this compensation resulted in reach-direction aftereffects of similar magnitude to that reported in previous studies using point-to-point movements (Taylor et al., 2010; Fernandez-Ruiz et al., 2011; Taylor and Ivry, 2011; Bond and Taylor, 2015). This suggests that participants learned to compensate for the rotation through adaptation, that is, by adapting their existing baseline controller. In contrast, the mirror-reversal group only expressed compensation at low frequencies of movement, exhibiting little to no compensation at high frequencies, and did not exhibit aftereffects, suggesting that they did not learn through adaptation of an existing controller. Combined with the results from Figure 4 suggesting that participants did not utilize a re-aiming strategy while tracking, these data suggest that participants learned to counter the mirror reversal by building a new controller from scratch, that is, through de novo learning.

Learning in the rotation group also appeared to be, to some extent, achieved through de novo learning. The magnitude of aftereffects in this group (∼25°) was only a fraction of the overall compensation achieved (∼70°) during late learning, suggesting that implicit adaptation cannot entirely account for the rotation group’s behavior. The results from Figure 4 also suggest that the rotation group’s behavior could not be explained by a strategy of tracking the target through a series of re-aimed catch-up movements. In the time course of learning for both groups (Figure 5B), the rotation group’s gains were higher overall than the mirror-reversal group’s, but there was a striking similarity in the frequency-dependent pattern of learning between the two groups. We therefore conclude that the residual learning not accounted for by adaptation was attributable to the same de novo learning process that drove learning under the mirror reversal.

Examining the effect of re-aiming strategies on learning

Although the data suggest that participants did not primarily rely on a re-aiming strategy while tracking, participants likely did use such a strategy to learn to counter the rotation/mirror reversal while performing point-to-point reaches. How important might such cognitive strategies be for ultimately learning the tracking task? To better understand this, we performed a follow-up experiment with 20 additional participants. This experiment was similar to the main experiment except for the fact that participants experienced the rotation/mirror reversal almost exclusively in the tracking task, performing only 15 point-to-point reaches between the early and late learning tracking blocks compared to the 450 reaches in the main experiment (Figure 6A).

Figure 6
Making point-to-point reaches improves tracking performance, especially under mirror reversal.

(A) Participants learned to counter either a visuomotor rotation (n=10) or mirror-reversal (n=10). The experimental design was similar to the main experiment except point-to-point reaching practice was almost entirely eliminated; between the early- and late-learning tracking blocks, participants only performed 15 point-to-point reaches. The purpose of these reaches was not for training but simply to assess learning in the point-to-point task. (B–D) Gain matrix analysis, identical to that in Figure 5, performed on data from the follow-up experiment. (B) Visualization of the gain matrix from one trial of each listed block, averaged across participants. (C) Off-diagonal elements of the gain matrices, averaged across participants. (D) Computed rotation angle for the rotation group’s gain matrices (upper) and gain orthogonal to mirroring axis for the mirror-reversal group (lower), averaged across participants. All error bars in this figure are SEM across participants.

Figure 6—source data 1

This file contains the results of all statistical analyses performed on the data in Figure 6C.

https://cdn.elifesciences.org/articles/62578/elife-62578-fig6-data1-v2.xlsx

We applied the gain matrix analysis from Figure 5 to data from this experiment and found that our previous results were largely reproduced despite the very limited point-to-point training (Figure 6B–D). The rotation group exhibited aftereffects in the gain matrices (linear mixed-effects model [see ‘Statistics’ in Materials and methods for details about the model structure]: interaction between block, frequency, and group, F(12,360)=3.26, p=0.0002; data split by frequency for post hoc Tukey’s range test: Bonferroni-adjusted p<0.01 for four of seven frequencies; see Figure 6—source data 1 for p-values of all statistical comparisons related to Figure 6), which were significantly greater than those of the mirror-reversal group (Tukey’s range test: Bonferroni-adjusted p<0.0005 for two out of seven frequencies). In contrast, the mirror-reversal group did not express aftereffects (Tukey’s range test: p>0.4 [not Bonferroni-adjusted] for all seven frequencies; Figure 6C). Furthermore, the rotation group exhibited compensation at high frequencies (Tukey’s range test: Bonferroni-adjusted p=0.0073 at third highest frequency) whereas the mirror-reversal group did not (Tukey’s range test: p>0.5 [not Bonferroni-adjusted] for highest four frequencies). These trends were also evident in single participants (Figure 6—figure supplement 1). Thus, the follow-up experiment provided evidence that the effects we observed in the main experiment were replicable.

Directly comparing the results between the two experiments (comparing Figures 6C and 5B), we found that participants in the follow-up experiment exhibited significantly less compensation in the last trial of late learning compared to participants in the main experiment, as quantified by the off-diagonal gain (two-way ANOVA [see ‘Statistics’ in Materials and methods for details about the ANOVA]: main effect of experiment, F(1,252)=37.69, p<0.0001, with no significant interactions between any predictors; see Figure 6—source data 1 for more detailed statistics related to Figure 6). It is unclear, however, whether this reduced learning was attributable to participants being unable to develop a re-aiming strategy without point-to-point training, or whether it could be explained by the fact that participants simply spent less total time being exposed to the perturbations.

Therefore, while virtually eliminating point-to-point training may have diminished participants’ ability to learn the task, participants were still able to counter the perturbation to some extent, reproducing the most salient findings from the main experiment.

Discussion

In the present study, we tested whether participants could learn to successfully control a cursor to track a continuously moving target under either rotated or mirror-reversed visual feedback. Although previous work has established that participants can learn to compensate for these perturbations during point-to-point movements, this compensation often seems to depend upon the use of re-aiming strategies – a solution that is time-consuming and therefore does not seem feasible in a task in which goals are constantly changing.

We found that both groups’ tracking behavior was inconsistent with the use of a re-aiming strategy, suggesting other mechanisms were used to compensate for these perturbations. The rotation group exhibited strong aftereffects once the perturbation was removed, amounting to an approximately 25° rotation of hand motion relative to target motion – consistent with previous findings in point-to-point tasks (Taylor et al., 2010; Fernandez-Ruiz et al., 2011; Taylor and Ivry, 2011; Bond and Taylor, 2015). This suggests that these participants learned to counter the rotation, at least in part, via adaptation. In contrast, participants who learned to compensate for the mirror-reversal showed no aftereffects, suggesting that they did not adapt their existing controller, but instead learned to compensate by establishing a de novo controller.

The role of re-aiming strategies in executing tracking behavior

In principle, a target can be tracked by executing a series of intermittent catch-up movements. However, our results suggest that this possibility was unlikely for three reasons. First, under both perturbations, a majority of participants’ tracking behavior could be accounted for by a linear model, and the additional variance in behavior that could be accounted for by a nonlinear model was comparatively small. This implies that participants tracked the target continuously, rather than intermittently, which would likely have introduced greater nonlinearities. Although it might be possible for very frequent catch-up movements to appear approximately linear, the frequency of such catch-up movements would have to be at least double the frequency of target motion being tracked (i.e., the Nyquist rate). The highest frequency at which participants were able to successfully compensate for the mirror reversal was around 1 Hz. This means participants would have had to generate at least two re-aimed movements per second to track the target smoothly at this frequency, a process that would have been fairly rapid and cognitively demanding over the course of a trial.
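The arithmetic behind the Nyquist argument can be written out as a short sketch. The function name is hypothetical. The 1 Hz figure comes from the paragraph above and the roughly 300 ms additional planning time is the value cited earlier in the Results for re-aimed point-to-point movements.

```python
def min_catchup_rate_hz(tracked_freq_hz):
    """Minimum rate of discrete movements needed to reproduce a sinusoid (Nyquist rate)."""
    return 2.0 * tracked_freq_hz

highest_compensated_hz = 1.0                          # approximate value from the text
rate = min_catchup_rate_hz(highest_compensated_hz)    # movements per second
max_interval_ms = 1000.0 / rate                       # time available per movement

# If each re-aimed movement incurred ~300 ms of additional planning time,
# this much would remain for all other processing and execution.
remaining_ms = max_interval_ms - 300.0
print(rate, max_interval_ms, remaining_ms)
```

This is only a lower bound on the required rate. Tracking that looks smooth would likely need catch-up movements well above the Nyquist rate, which tightens the time budget further.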

The second reason we reject the idea of repeated re-aiming is based on the delay between hand and target movement. Compensation for either of the perturbations introduced some additional tracking delay relative to baseline. However, this delay was less than 200 ms, which is smaller than would be expected if the participants had compensated by repeated strategic re-aiming. It has been demonstrated in some circumstances that re-aiming can occur in as little as 200 ms by caching the movement required for a given target location (Huberdeau et al., 2019; McDougle and Taylor, 2019). However, caching associations in this way appears to be limited to just two to seven discrete elements (McDougle and Taylor, 2019; Collins and Frank, 2012), and it seems doubtful that this mechanism could support a controller that must generate output when the state of the target (its location and velocity), as well as that of the hand, may vary in a continuous space.

Finally, participants’ anecdotal reports also suggest they did not utilize a re-aiming strategy. After the experiment was complete, we asked participants to describe how they performed the tracking task under the perturbations. The vast majority of participants reported that when they tried to think about how to move their hand to counter the perturbations, they felt that their tracking performance deteriorated. Instead, they felt their performance was best when they let themselves respond naturally to the target without explicitly thinking about how to move their hands. Participants’ disinclination to explicitly coordinate their hand movements provides further evidence against their use of a re-aiming strategy.

We believe, therefore, that it is unlikely that participants solved the tracking task under a mirror-reversal by using a deliberative re-aiming strategy that is qualitatively similar to that which has been described in the context of point-to-point reaching tasks. Instead, we believe that these participants constructed a new controller that instantiated a new, continuous mapping from current states and goals to actions.

However, it is possible that, given our experimental design, participants countered the perturbation in a way that is similar in some respects to traditional re-aiming and potentially indistinguishable from continuous control. Traditional accounts of re-aiming suggest that participants identify a fixed surrogate target location to aim their movements toward – effectively manipulating one of the inputs to the controller to achieve a particular desired output. Our results suggest that participants could not have performed the tracking task in this way. However, it is still possible for tracking to be performed by manipulating the input to a controller in a more general manner. For instance, the output of the tracking controller could depend on the instantaneous position and velocity of the target, and participants may have been able to counter the perturbation by manipulating these inputs to a fixed underlying controller in order to achieve output that would successfully track the target under the mirror reversal. Although this solution bears similarities to re-aiming, it differs significantly in that it entails modifying potentially many different inputs and doing so in a continuously changing manner. Such a solution would be unlikely to be amenable to the deliberative processes responsible for static re-aiming and, in composite, could be considered a de novo controller.

The role of re-aiming strategies in acquiring a de novo controller

Although our analyses revealed that participants did not primarily rely on a re-aiming strategy to execute continuous tracking movements, they could have initially depended on such a strategy to acquire the controller necessary to perform these movements. In a follow-up experiment, we tested whether limited practice in the point-to-point task would impair how well participants could learn to counter the rotation/mirror reversal. Although we found that both groups expressed less compensation for the perturbations compared to the main experiment, both groups still expressed some compensation, reproducing the qualitative features of learning from the main experiment. The fact that there are multiple explanations for this reduction in compensation (failure to develop a re-aiming strategy versus less time on task) makes it difficult to draw any strong conclusions from these results about what role re-aiming strategies play in acquiring a new controller.

However, previous evidence clearly demonstrates that people can learn to counter a mirror reversal using a re-aiming strategy when performing point-to-point reaches (Wilterson and Taylor, 2019). It is possible, therefore, that re-aiming strategies could contribute to acquiring a de novo controller. How exactly might such strategies contribute to learning? One possibility is that the deliberative computations performed when planning upcoming movements are used to help build a de novo controller. Alternatively, it may be easier for people to evaluate the quality of straight-line reaches (e.g., reach direction, movement time, task error) compared to tracking a pseudo-random trajectory, allowing them to update the parameters of a nascent controller more readily. Ultimately, the question of how a de novo controller is constructed is a major open question for future research.

Frequency-domain signatures of adaptation and de novo learning

The pattern of compensation under the rotation and mirror-reversal was frequency specific (Figure 5B), with the nature of compensation at high frequencies revealing distinct signatures of adaptation and de novo learning between the two groups. At low frequencies, both groups of participants successfully compensated for their perturbations. But at high frequencies, only the rotation group was able to compensate; behavior for the mirror-reversal group at high frequencies was similar to baseline behavior. There were similarities, however, in the time course and frequency dependence of learning under each perturbation (Figure 5B), with both groups exhibiting a steady increase in compensation over time, particularly at lower frequencies. Additionally, both groups’ compensation exhibited a similar diminution as a function of frequency.

We believe these results show that distinct learning processes drove two separate components of learning. One component, present only in the rotation group, was expressed uniformly at all frequencies and exhibited aftereffects, likely reflecting a parametric adjustment of an existing baseline controller, that is, adaptation. A second component of learning contributed to compensation in both groups of participants. This component was expressed primarily at low frequencies, exhibited a gradation as a function of frequency, and was not associated with aftereffects. We suggest this component corresponds to formation of a de novo controller for the task.

Although compensation for the rotation bore many hallmarks of adaptation, it also exhibited features of de novo learning seen in the mirror-reversal group, suggesting that participants in the rotation group employed a combination of the two learning processes. This is consistent with previous suggestions that residual learning under a visuomotor rotation that cannot be attributed to implicit adaptation may rely on the same mechanisms as those used for de novo learning (Krakauer et al., 2019). In summary, our data suggest that adaptation and de novo learning can be deployed in parallel to learn novel motor tasks.

Potential control architectures supporting multiple components of learning

The properties of adaptation and de novo learning we have identified here can potentially be explained by the existence of two distinct control pathways, each capable of different forms of plasticity but with differing sensorimotor delays. An inability to compensate at high frequencies (when tracking an unpredictable stimulus; see Roth et al., 2011) suggests higher phase lags, potentially due to greater sensorimotor delays or slower system dynamics; as phase lags approach the period of oscillation, it becomes impossible to exert precise control at that frequency. Therefore, we suggest that one control pathway may be slow but reconfigurable to implement arbitrary new controllers, while the other is fast but can only be recalibrated to a limited extent through adaptation.

It is possible that the two different control pathways that appear to learn differently might correspond to feedforward control (generating motor output based purely on target motion) and feedback control (generating motor output based on the cursor location and/or distance between cursor and target). Feedback control is slower than feedforward control due to the additional delays associated with observing the effects of one’s earlier motor commands on the current cursor position. The observed pattern of behavior may thus be due to a fast but inflexible feedforward controller that responds rapidly to target motion, but always expresses baseline behavior (potentially recalibrated via implicit adaptation) interacting with a slow but reconfigurable feedback controller that responds to both target motion and the current cursor position. At low frequencies, the target may move slowly enough that any inappropriate feedforward control to track the target is masked by corrective feedback responses. But at high frequencies, the target may move too fast for feedback control to be exerted, leaving only inappropriate feedforward responses. It is not possible to dissociate the contributions of feedforward and feedback control on the basis of our current dataset, but in principle our approach can be extended to do so by including perturbations to the cursor position in addition to target movement (Yamagami et al., 2019; Yamagami et al., 2020).

An alternative possibility is that there may be multiple feedforward controllers (and/or feedback controllers) that incur different delays. A fast but inflexible baseline controller, amenable to recalibration through adaptation, might interact with a slower but more flexible controller. This organization parallels dual-process theories of learning and action selection (Hardwick et al., 2019; Day and Lyon, 2000; Huberdeau et al., 2015) and raises the possibility that the de novo learning exhibited by our participants might be, in some sense, cognitive in nature. Although we have rejected the possibility that participants countered the perturbation by repeated strategic re-aiming, recent theories have framed the prefrontal cortex as a general-purpose network capable of learning to perform arbitrary computations on its inputs (Wang et al., 2018). From this perspective, it does not seem infeasible that such a network could learn to implement an arbitrary continuous feedback controller that could compensate for the imposed perturbation or continuously modulate the input to an existing controller, albeit likely at the cost of incurring an additional delay over controllers that support task performance in baseline conditions.

System identification as a tool for characterizing motor learning

Our characterization of learning made use of frequency-based system identification, a powerful tool that has been previously used to study biological motor control such as insect flight (Fuller et al., 2014; Sponberg et al., 2015; Roth et al., 2016), electric fish refuge tracking (Cowan and Fortune, 2007; Madhav et al., 2013), human posture (Oie et al., 2002; Kiemel et al., 2006), and human manual tracking (Yamagami et al., 2019; Zimmet et al., 2020). System identification and other sinusoidal perturbation techniques have previously been applied to characterize the trial-by-trial dynamics of learning from errors in adaptation tasks (Baddeley et al., 2003; Ueyama, 2017; Miyamoto et al., 2020). Our approach differs critically from these previous applications in that we use system identification to assess the state of learning and properties of the learned controller at a given time. In this latter sense, frequency-based system identification has not, to our knowledge, previously been applied to investigate motor learning. We have shown that this approach provides a powerful means to identify distinct forms of learning based on dissociable properties of the controllers they give rise to.

Our system identification approach has several advantages over other methods for studying motor control. In terms of practicality, this approach is more time-efficient for data collection than the standard point-to-point reaches used in motor learning studies. Compared to time-domain methods, the frequency domain is particularly amenable to system identification given the rich suite of tools that have been developed for it (Schoukens et al., 2004). Moreover, our approach is general in that it can be applied to assess learning of arbitrary linear visuomotor mappings (e.g., 15° rotation, body-machine interfaces; Mussa-Ivaldi et al., 2011). Under previous approaches, characterizing the quality of movements under different types of learned mappings (rotation, mirror-reversal) has necessitated different ad hoc analyses that cannot be directly compared (Telgen et al., 2014). In contrast, our frequency-based approach provides a general method to characterize behavior under rotations, mirror-reversals, or any linear mapping from effectors to a cursor, owing to our ‘multi-input multi-output’ approach of identifying the 2×2 transformation matrix relating target movement and hand movement.

While the system identification approach used in the present study does capture learning, the results obtained using this approach do warrant careful interpretation. In particular, one must not interpret the empirical relationship that we measure between the target and hand as equivalent to the input-output relationship of the brain’s motor controller. The former measures the response of the entire sensorimotor system to external input. The latter only measures how the controller sends motor commands to the body in response to input from the environment/internal feedback. Estimating the latter relationship requires a more nuanced approach that takes into account the closed-loop topology (Roth et al., 2014; Yamagami et al., 2019). Despite this, changes to the controller are still revealed using our approach; assuming that learning only drives changes in the input-output relationship of the controller – as opposed to, for example, the plant or the visual system – any changes in the overall target–hand relationship will reflect changes to the controller. Thus, our approach is a valid way to investigate learning.

Although the primary goal of our frequency-based analysis was to establish how participants mapped target motion into hand motion, system identification yields more detailed information than this; in principle, it provides complete knowledge of a linear system in that knowing how the system responds to sinusoidal input at different frequencies enables one to predict how the system will respond to arbitrary inputs. These data can be used to formally compare different possible control system architectures (Zimmet et al., 2020) supporting learning, and we plan to explore this more detailed analysis in future work.

Mechanisms and scope of de novo learning

We have used the term ‘de novo learning’ to refer to any mechanism, aside from implicit adaptation and re-aiming, that leads to the creation of a new controller. We propose that de novo learning proceeds initially through explicit processes before becoming cached or automatized into a more procedural form. There are, however, a number of alternative mechanisms that could be engaged to establish a new controller. One proposal is that de novo learning occurs by simultaneously updating forward and inverse models by simple gradient descent (Pierella et al., 2019). Another possibility is that a new controller could be learned through reinforcement learning. In motor learning tasks, reinforcement has been demonstrated to engage a learning mechanism that is independent of implicit adaptation (Izawa and Shadmehr, 2011; Cashaback et al., 2017; Holland et al., 2018) potentially via basal-ganglia-dependent mechanisms (Schultz et al., 1997; Hikosaka et al., 2002). Such reinforcement could provide a basis for forming a new controller. Although prior work on motor learning has focused on simply learning the required direction for a point-to-point movement, theoretical frameworks for reinforcement learning have been extended to continuous time and space to learn continuous controllers for robotics (Doya, 2000; Theodorou et al., 2010; Smart and Kaelbling, 2000; Todorov, 2009), and such theories could be applicable to how people learned continuous control in our experiment. However, it is important to note that regardless of the exact mechanism by which de novo learning occurs, our central claims from the present study still hold.

Although we have described the mirror-reversal task as requiring de novo learning, we acknowledge that there are many types of learning which might be described as de novo learning that this task does not capture. For example, many skills, such as playing the cello, challenge one to learn how to execute new movement patterns that one has never executed before (Costa, 2011). This is not the case in the tracking task which only challenges one to select movements one already knows how to execute. Also, in many cases, one must learn to use information from new sensory modalities for control (van Vugt and Ostry, 2018; Bach-y-Rita and W Kercel, 2003), such as using auditory feedback to adjust one’s finger positioning while playing the cello. Our task, by contrast, only uses very familiar visual cues. Nevertheless, we believe that learning a new controller that maps familiar sensory feedback to well-practiced actions in a novel way is a critical element of many real-world learning tasks (e.g., driving a car, playing video games) and should be considered a fundamental aspect of any de novo learning.

Ultimately, our goal is to understand real-world skill learning. We believe that studying learning in continuous tracking tasks is important to bring us closer to this goal since a critical component of many skills is the ability to continuously control an effector in response to ongoing external events, as in juggling or riding a bicycle. Studies of well-practiced human behavior in continuous control tasks have a long history, such as those examining the dynamics of pilot and vehicle interactions (McRuer and Jex, 1967). However, most existing paradigms for studying motor learning have examined only point-to-point movements. We believe the tracking task presented here offers a simple but powerful approach for characterizing continuous control and, as such, provides an important new direction for advancing our understanding of how real-world skills are acquired.

Materials and methods

Participants

Forty right-handed, healthy participants over 18 years of age were recruited for this study (24.28 ± 5.06 years old; 19 male, 21 female): 20 for the main experiment (Figures 2–5) and 20 for the follow-up experiment (Figure 6). Participants all reported having no history of neurological disorders. All methods were approved by the Johns Hopkins School of Medicine Institutional Review Board.

Experimental tasks

Participants made planar movements with their right arm, which was supported by a frictionless air sled on a table, to control a cursor on an LCD monitor (60 Hz). Participants viewed the cursor on a horizontal mirror which reflected the monitor (Figure 1B). Hand movement was monitored at 130 Hz using a Flock of Birds magnetic tracker (Ascension Technology, VT) positioned near the participants’ index finger. The (positive) x axis was defined as rightward and the y axis, forward. The cursor was controlled under three different hand-to-cursor mappings: (1) veridical, (2) 90° clockwise visuomotor rotation, and (3) mirror reversal about the 45° oblique axis in the (x,y)=(1,1) direction. Participants were divided evenly into two groups: one that experienced the visuomotor rotation (n=10; four male, six female) and one that experienced the mirror reversal (n=10; six male, four female). Both groups were exposed to the perturbed cursors while performing two different tasks: (1) the point-to-point task and (2) the tracking task. Each participant completed the experiment in a single session in 1 day.
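As a concrete illustration of the three hand-to-cursor mappings, the sketch below writes each one as a 2×2 matrix acting on a hand displacement (x, y), with x rightward and y forward. The constant names and the `apply_mapping` helper are hypothetical and chosen only for illustration; they are not taken from the experiment code.

```python
# Hypothetical illustration of the three hand-to-cursor mappings as 2x2 matrices.
# x is rightward and y is forward, as defined in the text.

VERIDICAL = ((1, 0), (0, 1))          # cursor follows the hand
ROTATION_90_CW = ((0, 1), (-1, 0))    # 90 degree clockwise rotation
MIRROR_45 = ((0, 1), (1, 0))          # reflection about the (x, y) = (1, 1) axis


def apply_mapping(mapping, hand):
    """Return the cursor displacement produced by a hand displacement."""
    (a, b), (c, d) = mapping
    x, y = hand
    return (a * x + b * y, c * x + d * y)


forward = (0, 1)
rightward = (1, 0)
print(apply_mapping(ROTATION_90_CW, forward))    # forward hand motion under the rotation
print(apply_mapping(MIRROR_45, forward))         # forward hand motion under the mirror reversal
print(apply_mapping(ROTATION_90_CW, rightward))  # rightward hand motion under the rotation
print(apply_mapping(MIRROR_45, rightward))       # rightward hand motion under the mirror reversal
```

In this sketch the two perturbations move the cursor identically for forward hand motion but in opposite directions for rightward hand motion, and the mirror reversal leaves motion along the (1,1) axis unchanged.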

Point-to-point task

To start a trial, participants were required to move their cursor (circle of radius 2.5 mm) into a target (gray circle of radius 10 mm) that appeared in the center of the screen. After 500 ms, the target appeared 12 cm away from the starting location in a random direction. Participants were instructed to move in a straight line, as quickly and accurately as possible to the new target. Once the cursor remained stationary (speed < 0.065 m/s) in the new target for 1 s, the target appeared in a new location 12 cm away, but constrained to lie within a 20 × 20 cm workspace. Different random target locations were used for each block. Blocks in the main experiment consisted of 150 reaches while blocks in the follow-up experiment (Figure 6) consisted of 15 reaches. To encourage participants to move quickly to each target, we provided feedback at the end of each trial about the peak velocity they attained during their reaches, giving positive feedback (a pleasant tone and the target turning yellow) if the peak velocity exceeded roughly 0.39 m/s and negative feedback (no tone and the target turning blue) if the peak velocity was below that threshold.

Tracking task

At the start of each trial, a motionless target (gray circle of radius 8 mm) appeared in the center of the screen, and the trial was initiated when the participant’s cursor (circle of radius 2.5 mm) was stationary (speed < 0.065 m/s) in the target. The target then moved for 46 s in a continuous, pseudo-random trajectory. The first 5 s was a ramp period during which the amplitude of the target's motion increased linearly from 0 to its full value, and for the remaining 41 s, the target moved at full amplitude. The target moved in a two-dimensional, sum-of-sinusoids trajectory where the movement in each axis was parameterized by amplitude, a, frequency, ω, and phase, ϕ, vectors. The target’s position, r, along one axis at time, t, was computed as

(1) $r = \sum_{i=1}^{7} a_i \cos(2\pi t \omega_i + \phi_i).$

For the x-axis, a=[2.31,2.31,2.31,1.76,1.30,0.97,0.73] (cm) and ω=[0.1,0.25,0.55,0.85,1.15,1.55,2.05] (Hz). For the y-axis, a=[2.31,2.31,2.31,1.58,1.03,0.81,0.70] (cm) and ω=[0.15,0.35,0.65,0.95,1.45,1.85,2.15] (Hz). The elements of ϕ were randomized for different blocks of the experiment, taking on values in the interval [−π, π). The amplitudes of the sinusoids for all but the lowest frequencies were proportional to the inverse of their frequency to ensure that each individual sinusoid had similar peak velocity. We set a ceiling amplitude for low frequencies in order to prevent target movements that were too large for participants to comfortably track.
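The sum-of-sinusoids construction and the peak-velocity argument can be checked with a short sketch. It uses the x-axis amplitudes and frequencies listed above; the function names are hypothetical, the phases are set to zero as a placeholder (the experiment randomized them per block), and the 5 s ramp is not modeled.

```python
import math

# x-axis amplitudes (cm) and frequencies (Hz) as listed in the text.
A_X = [2.31, 2.31, 2.31, 1.76, 1.30, 0.97, 0.73]
F_X = [0.1, 0.25, 0.55, 0.85, 1.15, 1.55, 2.05]


def target_position(t, amplitudes, frequencies, phases):
    """Equation 1: sum of cosines evaluated at time t (seconds)."""
    return sum(a * math.cos(2 * math.pi * t * f + p)
               for a, f, p in zip(amplitudes, frequencies, phases))


def peak_velocities(amplitudes, frequencies):
    """Peak speed (cm/s) of each individual sinusoid: 2 * pi * f * a."""
    return [2 * math.pi * f * a for a, f in zip(amplitudes, frequencies)]


phases = [0.0] * 7  # placeholder; the experiment randomized these per block
print(target_position(0.0, A_X, F_X, phases))
print([round(v, 1) for v in peak_velocities(A_X, F_X)])
```

With these values, the four highest-frequency sinusoids have nearly equal peak velocities, while the three lowest-frequency sinusoids, whose amplitudes are capped, have smaller ones.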

Different frequencies were used for the x- and y-axes so that hand movements at a given frequency could be attributed to either x- or y-axis target movements. All frequencies were prime multiples of 0.05 Hz to ensure that the harmonics of any target frequency would not overlap with any other target frequency. The prime multiple design of the input signal ensured that there were no low-order harmonic relations between any of the component sinusoids on the input, making it likely that nonlinearities in the tracking dynamics would manifest as easily discernible harmonic interactions (i.e. extraneous peaks in the output spectra). Moreover, by designing discrete Fourier transform windows that were integer multiples of the base period (20 s, i.e., the inverse of the base frequency), any nonlinearities produced by taking the Fourier transform of non-periodic signals (i.e., non-integer multiples) were eliminated.
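The prime-multiple property can be verified directly from the listed frequencies. The following is a minimal sketch with hypothetical helper names: it converts each frequency to its integer multiple of the 0.05 Hz base frequency, checks that every multiple is prime, and searches for any target frequency that is an integer harmonic of another.

```python
BASE_HZ = 0.05
F_X = [0.1, 0.25, 0.55, 0.85, 1.15, 1.55, 2.05]
F_Y = [0.15, 0.35, 0.65, 0.95, 1.45, 1.85, 2.15]


def is_prime(n):
    """Trial-division primality test, adequate for small integers."""
    if n < 2:
        return False
    return all(n % k for k in range(2, int(n ** 0.5) + 1))


def base_multiples(freqs_hz):
    """Express each frequency as an integer multiple of the base frequency."""
    return [round(f / BASE_HZ) for f in freqs_hz]


def harmonic_overlaps(multiples):
    """Pairs (m, k * m) where harmonic k >= 2 of one input lands on another input."""
    present = set(multiples)
    top = max(present)
    return [(m, k * m) for m in multiples
            for k in range(2, top // m + 1) if k * m in present]


multiples = base_multiples(F_X + F_Y)
print(sorted(multiples))
print(all(is_prime(m) for m in multiples))
print(harmonic_overlaps(multiples))
```

Because the multiples are distinct primes, the overlap search returns an empty list.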

Participants were instructed to keep their cursor inside the target for as long as possible during the trial. The target’s color changed to yellow any time the cursor was inside the target to provide feedback on their success. For the main experiment, one block of the tracking task consisted of eight 46 s trials, while for the follow-up experiment, one block consisted of six 46 s trials. Within each block, the same target trajectory was used for every trial. For different blocks, we randomized the phases of the target sinusoids to produce different trajectories. This produced five different target trajectories for participants to track in the six tracking blocks. The trajectories used for baseline and post-learning were the same to allow a better comparison for aftereffects. All participants tracked the same five target trajectories, but the order in which they experienced these trajectories was randomized in order to minimize any phase-dependent learning effects.

Block structure

In the main experiment, we first assessed the baseline control of the rotation and mirror-reversal groups by having them perform one block of the tracking task followed by one block of the point-to-point task under veridical cursor feedback. We then applied either the visuomotor rotation or mirror reversal to the cursor and used the tracking task to measure their control capabilities during early learning. Afterwards, we alternated three times between blocks of point-to-point training and blocks of tracking. In total, each participant practiced their respective perturbation with 450 point-to-point reaches in between the early and late-learning tracking blocks. Finally, we measured aftereffects in the tracking task by removing the rotation/mirror reversal.

The follow-up experiment followed a similar block structure as the main experiment, but there were two differences of note. First, the number of point-to-point reaches was dramatically reduced per block to 15 reaches. Second, the number of point-to-point blocks was also reduced to 3 (one point-to-point block after the baseline, early, and late-learning tracking blocks), providing participants only 15 point-to-point reaches between the early and late-learning tracking blocks.

Software

All non-statistical analyses were performed in MATLAB R2018b (The Mathworks, Natick, MA). All statistical analyses were performed in R version 4.0.2 (RStudio, Inc, Boston, MA) using the nlme and lsmeans packages (R Development Core Team, 2016; Pinheiro et al., 2016; Lenth, 2016). Figures were created using Adobe Illustrator (Adobe Inc, San Jose, CA).

Point-to-point and trajectory-alignment analyses

In the point-to-point task, we assessed performance by calculating the angular error between the cursor’s initial movement direction and the target direction relative to the start position. To determine the cursor’s initial movement direction, we computed the direction of the cursor’s instantaneous velocity vector ∼150 ms after the time of movement initiation. Movement initiation was defined as the time when the cursor left the start circle on a given trial.
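The angular error described above can be sketched as follows. The function name, the counterclockwise-positive sign convention, and the wrapping of the result into [−180°, 180°) are assumptions made for illustration, not details given in the text.

```python
import math


def angular_error_deg(velocity, start, target):
    """Angle (degrees) from the target direction to the initial movement direction.

    Positive values mean the movement is counterclockwise of the target
    direction (an assumed convention). The result is wrapped to [-180, 180).
    """
    move = math.atan2(velocity[1], velocity[0])
    goal = math.atan2(target[1] - start[1], target[0] - start[0])
    err = math.degrees(move - goal)
    return (err + 180.0) % 360.0 - 180.0


# Cursor initially moving forward while the target lies to the right.
print(angular_error_deg((0.0, 1.0), (0.0, 0.0), (12.0, 0.0)))
```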

In the tracking task, we assessed performance by measuring the average mean-squared error between the hand and target positions for every trial. For the alignment matrix analysis, we fit a matrix, $\hat{M} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, that minimized the mean-squared error between the hand and target trajectories for every trial. In the latter analysis, the mean-squared error was additionally minimized in time by delaying the target trajectory relative to the hand. (While the time-delay allowed for the fairest possible comparison between the hand and target trajectories in subsequent analysis, changing or eliminating the alignment did not qualitatively change our results.) We estimated $\hat{M}$ as:

(2) $\hat{M} = \underset{M}{\arg\min} \left\| \begin{bmatrix} H_x \\ H_y \end{bmatrix} - M \begin{bmatrix} T_x \\ T_y \end{bmatrix} \right\|^2$

where H and T represent hand and target trajectories. These estimated $\hat{M}$'s were averaged element-wise across participants to generate the alignment matrices shown in Figure 3A. These matrices were visualized by plotting their column vectors, also shown in Figure 3A.
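One way to picture the fit is as an ordinary least-squares problem with a closed-form solution. The sketch below is an illustration under stated assumptions rather than the analysis code: it solves for the 2×2 matrix at a single fixed delay (the search over time delays described above is omitted), the function name is hypothetical, and the synthetic hand trajectory is an idealized 80% compensation for the 90° clockwise rotation.

```python
import math


def fit_alignment_matrix(hand, target):
    """Least-squares 2x2 matrix M minimizing sum ||h - M t||^2 over samples.

    hand, target: equal-length lists of (x, y) samples.
    Uses the closed form M = (H T^T)(T T^T)^-1.
    """
    sxx = sum(t[0] * t[0] for t in target)
    sxy = sum(t[0] * t[1] for t in target)
    syy = sum(t[1] * t[1] for t in target)
    hx_tx = sum(h[0] * t[0] for h, t in zip(hand, target))
    hx_ty = sum(h[0] * t[1] for h, t in zip(hand, target))
    hy_tx = sum(h[1] * t[0] for h, t in zip(hand, target))
    hy_ty = sum(h[1] * t[1] for h, t in zip(hand, target))
    det = sxx * syy - sxy * sxy  # determinant of T T^T
    a = (hx_tx * syy - hx_ty * sxy) / det
    b = (hx_ty * sxx - hx_tx * sxy) / det
    c = (hy_tx * syy - hy_ty * sxy) / det
    d = (hy_ty * sxx - hy_tx * sxy) / det
    return ((a, b), (c, d))


# Synthetic data: the hand moves at 0.8 times a 90 degree counterclockwise
# rotation of the target, which would counter a 90 degree clockwise cursor rotation.
times = [0.1 * k for k in range(400)]
target = [(math.cos(2 * math.pi * 0.1 * t), math.cos(2 * math.pi * 0.15 * t + 1.0))
          for t in times]
hand = [(-0.8 * ty, 0.8 * tx) for tx, ty in target]
print(fit_alignment_matrix(hand, target))
```

For this noise-free example the fit recovers off-diagonal elements of −0.8 and 0.8 and diagonal elements of approximately zero.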

The off-diagonal elements of each participant’s alignment matrix were used to calculate the off-diagonal scaling, S, in Figure 3B:

(3) $S_{\mathrm{rotation}} = \frac{-b + c}{2}, \quad S_{\mathrm{mirror}} = \frac{b + c}{2}.$

Compensation angles, θ, for the rotation group’s alignment matrices were found using the singular value decomposition, SVD(). This is a standard approach which, as described in Umeyama, 1991, identifies a 2D rotation matrix, R, that best describes $\hat{M}$ irrespective of other transformations (e.g., dilation, shear) (Figure 3C, left). Briefly,

(4) $U \Sigma V^T = \mathrm{SVD}(\hat{M}^T),$
(5) $R = V U^T$

where U and V contain the left and right singular vectors and Σ contains the singular values. Note that R is a rotation matrix only if $\det(\hat{M}^T) \geq 0$, but R is a reflection matrix when $\det(\hat{M}^T) < 0$. Although Umeyama, 1991 has described a method whereby all R can be forced to be rotation matrices, we did not want to impose nonexistent structure onto R and, thus, did not analyze trials which yielded reflection matrices. However, this was not a major issue for the analysis as nearly all trials yielded rotation matrices (3205 of 3360 data points for experiment 1; 2230 of 2520 data points for experiment 2). Subsequently, θ was calculated as

(6) $\theta = \mathrm{atan2}(R_{2,1}, R_{1,1})$

where atan2() is the 2-argument arctangent and the inputs to the arctangent are elements of R subscripted by the row and column numbers of the matrix.
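For a 2×2 matrix with positive determinant, the rotation obtained from the SVD is the orthogonal factor of the polar decomposition, whose angle has a simple closed form. The sketch below uses that closed form in place of an SVD routine, which is a substitution made here for brevity and not the procedure described in the text; the function name and the choice to return degrees are also assumptions. Matrices with negative determinant return `None`, mirroring the exclusion of trials that yielded reflection matrices.

```python
import math


def compensation_angle_deg(m):
    """Angle of the rotation that best describes a 2x2 matrix ((a, b), (c, d)).

    For positive determinant this equals atan2(R[1][0], R[0][0]) with
    R = V U^T from the SVD of the transposed matrix. Returns None when the
    determinant is negative (the best orthogonal fit is then a reflection).
    """
    (a, b), (c, d) = m
    if a * d - b * c < 0:
        return None
    return math.degrees(math.atan2(c - b, a + d))


# A 30 degree rotation scaled by 0.5: the scaling does not affect the angle.
cos30, sin30 = math.cos(math.radians(30)), math.sin(math.radians(30))
scaled_rotation = ((0.5 * cos30, -0.5 * sin30), (0.5 * sin30, 0.5 * cos30))
print(compensation_angle_deg(scaled_rotation))
print(compensation_angle_deg(((0, 1), (1, 0))))  # reflection about the 45 degree axis
```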

Finally, for the mirror-reversal group, the scaling orthogonal to the mirror axis was found by computing how the matrix transformed the unit vector along the orthogonal axis (Figure 3C, right):

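(7) $S_{\mathrm{orthogonal}} = \frac{1}{2}\left( \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right) = \frac{1}{2}(a - b - c + d).$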
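The off-diagonal scaling (Equation 3) and the scaling orthogonal to the mirror axis (Equation 7) are simple functions of the four matrix elements. The sketch below evaluates them for three idealized matrices; the function names and the string labels for the groups are hypothetical.

```python
def off_diagonal_scaling(m, group):
    """Equation 3: off-diagonal scaling for the 'rotation' or 'mirror' group."""
    (a, b), (c, d) = m
    if group == "rotation":
        return (-b + c) / 2
    if group == "mirror":
        return (b + c) / 2
    raise ValueError("group must be 'rotation' or 'mirror'")


def orthogonal_scaling(m):
    """Equation 7: scaling along the axis orthogonal to the mirroring axis."""
    (a, b), (c, d) = m
    return (a - b - c + d) / 2


baseline = ((1, 0), (0, 1))          # hand follows the target directly
ideal_rotation = ((0, -1), (1, 0))   # full compensation for a 90 degree clockwise rotation
ideal_mirror = ((0, 1), (1, 0))      # full compensation for the mirror reversal

print(off_diagonal_scaling(baseline, "rotation"), off_diagonal_scaling(ideal_rotation, "rotation"))
print(off_diagonal_scaling(baseline, "mirror"), off_diagonal_scaling(ideal_mirror, "mirror"))
print(orthogonal_scaling(baseline), orthogonal_scaling(ideal_mirror))
```

Under these idealized matrices, the off-diagonal scaling moves from 0 at baseline to 1 with full compensation, and the orthogonal scaling moves from 1 at baseline to −1 with full compensation for the mirror reversal.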

Frequency-domain analysis

To analyze trajectories in the frequency domain, we applied the discrete Fourier transform to the target and hand trajectories in every tracking trial. This produced a set of complex numbers representing the amplitude and phase of the signal at every frequency. We only analyzed the first 40 s of the trajectory that followed the 5 s ramp period so that our analysis period was equivalent to an integer multiple of the base period (20 s). This ensured that we would obtain clean estimates of the sinusoids at each target frequency. Amplitude spectra were generated by taking double the modulus of the Fourier-transformed hand trajectories at positive frequencies.
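The reason an integer number of base periods gives clean estimates can be illustrated with a single-frequency discrete Fourier sum. This is a minimal sketch under assumptions: the signal is synthetic, the coefficient is normalized by the number of samples (a convention chosen here so that doubling the modulus returns the sinusoid's amplitude), and the function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 130   # hand-tracking rate reported in the text
WINDOW_S = 40          # analysis window: two 20 s base periods


def fourier_coefficient(signal, freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Discrete Fourier sum at one frequency, normalized by the sample count."""
    n = len(signal)
    step = -2j * math.pi * freq_hz / sample_rate_hz
    return sum(x * cmath.exp(step * k) for k, x in enumerate(signal)) / n


def amplitude(signal, freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Amplitude of the sinusoid at freq_hz: double the modulus of the coefficient."""
    return 2 * abs(fourier_coefficient(signal, freq_hz, sample_rate_hz))


n_samples = SAMPLE_RATE_HZ * WINDOW_S
times = [k / SAMPLE_RATE_HZ for k in range(n_samples)]
signal = [2.31 * math.cos(2 * math.pi * 0.1 * t + 0.3)
          + 1.76 * math.cos(2 * math.pi * 0.85 * t - 1.0) for t in times]

for f in (0.1, 0.85, 0.25):
    print(f, round(amplitude(signal, f), 3))
```

Because every multiple of 0.05 Hz completes a whole number of cycles in 40 s, the sinusoids are mutually orthogonal over the window: the two components present are recovered at their own frequencies, and a frequency absent from the signal (0.25 Hz here) returns an amplitude of essentially zero.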

The spectral coherence between signals was computed using Welch’s periodogram technique, implemented using the MATLAB function mscohere. Windowing was performed using a 1040-sample Blackman–Harris window with 50% overlap between windows. To evaluate the proportion of participants’ behavior that could be explained by a linear model, for every trial, we evaluated the single-input multi-output coherence at every frequency of target motion (‘linear coherence’), determining how target motion in one axis elicited hand movement in both axes. This best captured the linearity of participants’ behavior as using hand movement in only one axis for the analysis would only partially capture participants’ responses to target movement at a given frequency. To evaluate the additional proportion of participants’ behavior that could be explained by a nonlinear model but not a linear model, we computed the square root of the single-input single-output coherence (i.e., movements from the same axis) between hand movements from every pairwise combination of trials within each block (‘nonlinear coherence’). Because this nonlinear coherence is calculated from data across trials, it cannot be computed on a trial-by-trial basis so we averaged this coherence within blocks to obtain one coherence measure per block. We then averaged the linear coherence within blocks and subtracted the linear coherence from the nonlinear coherence.

During each 40 s stimulus period, we assumed the relationship between target position and hand position behavior was well approximated by linear, time-invariant dynamics; this assumption was tested using the coherence analysis described above. Under this assumption, pure sinusoidal target motion at each frequency should be translated into pure sinusoidal hand motion at the same frequency but with different magnitude and phase. The relationship between hand and target can therefore be described in terms of a 2×2 matrix of transfer functions describing the behavior of the system at each possible frequency:

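(8) $\begin{bmatrix} H_x(\omega) \\ H_y(\omega) \end{bmatrix} = P(\omega) \begin{bmatrix} T_x(\omega) \\ T_y(\omega) \end{bmatrix}, \quad P(\omega) = \begin{bmatrix} p_{xx}(\omega) & p_{xy}(\omega) \\ p_{yx}(\omega) & p_{yy}(\omega) \end{bmatrix}.$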

Here, H(ω) and T(ω) are the Fourier transforms of the time-domain hand and target trajectories, respectively, and ω is the frequency of movement. Each element of P(ω) represents a transfer function relating a particular axis of target motion to a particular axis of hand motion; the first and second subscripts represent the hand- and target-movement axes, respectively. Each such transfer function is a complex-valued function of frequency, which can further be decomposed into gain and phase components, for example: 

(9) $p_{xy}(\omega) = g_{xy}(\omega)\, e^{j \phi_{xy}(\omega)},$

where j is the imaginary unit, $g_{xy}(\omega)$ describes the gain (ratio of amplitudes) between y-axis target and x-axis hand motion as a function of frequency, and $\phi_{xy}(\omega)$ describes the corresponding difference in the phase of oscillation.

We used this phase (in radians) to obtain the frequency-dependent lag between hand and target movement, δ(ω) (in seconds), as follows:

(10) $\delta(\omega) = \frac{\phi(\omega)}{2\pi\omega}.$

The difference in δ(ω) between baseline and late learning was used to generate Figure 4C.

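We estimated the elements of P(ω) for frequencies at which the target moved by first noting that, for x-axis frequencies ω, $T_y(\omega) = 0$. Consequently,

(11) $\begin{bmatrix} H_x(\omega) \\ H_y(\omega) \end{bmatrix} = \begin{bmatrix} p_{xx}(\omega)\, T_x(\omega) \\ p_{yx}(\omega)\, T_x(\omega) \end{bmatrix},$

and we can therefore estimate $p_{xx}(\omega)$ and $p_{yx}(\omega)$ as:

(12) $p_{xx}(\omega) = \frac{H_x(\omega)}{T_x(\omega)}, \quad p_{yx}(\omega) = \frac{H_y(\omega)}{T_x(\omega)}.$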

We estimated $p_{xy}(\omega)$ and $p_{yy}(\omega)$ analogously at y-frequencies of target motion, where $T_x(\omega) = 0$.
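The chain from Fourier coefficients to gain, phase, and lag (Equations 9, 10, and 12) can be sketched with a pair of synthetic coefficients. The numbers below are made up for illustration: a target sinusoid at 0.55 Hz and a hand response with 80% of its amplitude, delayed by 0.2 s. The function names are hypothetical.

```python
import cmath
import math


def transfer_function(hand_coeff, target_coeff):
    """Equation 12: ratio of hand to target Fourier coefficients at one frequency."""
    return hand_coeff / target_coeff


def gain_phase_lag(p, freq_hz):
    """Decompose a transfer-function value into gain, phase (rad), and lag (s)."""
    gain = abs(p)
    phase = cmath.phase(p)                 # radians, in (-pi, pi]
    lag = phase / (2 * math.pi * freq_hz)  # Equation 10
    return gain, phase, lag


freq_hz = 0.55
delay_s = 0.2
target_coeff = cmath.rect(2.31 / 2, 0.4)
hand_coeff = cmath.rect(0.8 * 2.31 / 2, 0.4 - 2 * math.pi * freq_hz * delay_s)

p_xx = transfer_function(hand_coeff, target_coeff)
gain, phase, lag = gain_phase_lag(p_xx, freq_hz)
print(round(gain, 3), round(phase, 3), round(lag, 3))
```

With Equation 10 applied as written, a hand response that trails the target by 0.2 s has a negative phase and therefore yields a value of −0.2 s in this sketch. Because the phase is only defined modulo 2π, delays longer than half a period at a given frequency would wrap.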

These estimates yielded two elements of the overall transformation matrix P(ω) at each frequency of target movement. In order to construct a full 2×2 matrix, we paired the gains from neighboring x- and y-frequencies, assuming that participants’ behavior would be approximately the same at neighboring frequencies. The resulting seven frequency pairings were (x then y frequencies reported in each parentheses in Hz): (0.1, 0.15), (0.25, 0.35), (0.55, 0.65), (0.85, 0.95), (1.15, 1.45), (1.55, 1.85), (2.05, 2.15).

The spatial transformation of target motion into hand motion at each frequency is described by the gain of each element of P(ω). However, gain and phase data can lead to certain ambiguities; for example, a positive gain with a phase of π radians is indistinguishable from a negative gain with a phase of 0. Conventionally, this is resolved by assuming that gain is positive. In our task, however, the sign of the gain was crucial to disambiguate the directionality of the hand responses (e.g., whether the hand moved left or right in response to upward target motion). We used phase information to disambiguate positive from negative gains. Specifically, we assumed that the phase lag of the hand response at a given frequency would be the same across both axes of hand movement and throughout the experiment, but the gain would vary:

(13) $p_{xx}(\omega) \approx g_{xx}(\omega)\, e^{j\tilde{\phi}(\omega)}, \quad p_{yx}(\omega) \approx g_{yx}(\omega)\, e^{j\tilde{\phi}(\omega)}.$

For a given movement frequency, $\tilde{\phi}(\omega)$ was set to be the same as the mean phase lag during the baseline block, where the gain was unambiguously positive. This assumption enabled us to compute a signed gain for each transfer function by taking the dot product between the transfer function and $e^{j\tilde{\phi}(\omega)}$. This method thus yielded gains for each axis of hand motion, at each target frequency, and at each point during learning.
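The signed-gain computation treats the transfer-function value and the baseline phase reference as two-dimensional vectors in the complex plane and takes their dot product. The sketch below is an illustration with made-up values; the function name and the baseline phase are hypothetical.

```python
import cmath
import math


def signed_gain(p, baseline_phase):
    """Dot product of p with the unit vector exp(j * baseline_phase).

    Equals |p| * cos(arg(p) - baseline_phase): positive when the response is
    in phase with the baseline reference and negative when it is inverted.
    """
    reference = cmath.exp(1j * baseline_phase)
    return p.real * reference.real + p.imag * reference.imag


baseline_phase = -0.6  # hypothetical mean baseline phase (radians) at one frequency
same_direction = cmath.rect(0.7, baseline_phase)
opposite_direction = cmath.rect(0.7, baseline_phase + math.pi)
print(round(signed_gain(same_direction, baseline_phase), 3))
print(round(signed_gain(opposite_direction, baseline_phase), 3))
```

A response with the baseline phase keeps its gain of 0.7, while a response shifted by π radians is assigned a gain of −0.7, which is how a reversal in movement direction is distinguished from an additional half-cycle of delay.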

As we did for the transfer-function matrix P(ω), we paired the gains from neighboring frequencies to obtain a series of seven gain matrices which geometrically described how target motion was translated into hand motion from low to high frequencies. Similar to the alignment matrix analysis, visualizations of these gain matrices were constructed by plotting the column vectors of the matrices. Off-diagonal gain, rotation angle, and gain orthogonal to the mirroring axis were calculated in the same way as in Equations 3–7.

Statistics

The primary statistical tests for the main and follow-up experiments were performed using linear mixed-effects models. These models were fit using data from three parts of the study: (1) alignment matrix analysis in the main experiment, (2) gain matrix analysis in the main experiment, and (3) gain matrix analysis in the follow-up experiment. The data used in these models were the off-diagonal values of the transformation and gain matrices. In all models, data from the first trial of baseline, the last trial of late learning, and the first trial of post-learning were analyzed. No outlier rejection was performed for these analyses. Using Wilkinson notation, the structure of the model for the alignment matrix analysis was [off-diagonal scaling] ∼ [block of learning] * [perturbation group], while the structure for both gain matrix analyses was [off-diagonal gain] ∼ [block of learning] * [perturbation group] * [frequency of movement]. Data were grouped within subjects (subjects were considered a random effect of the model).

We subsequently performed post hoc statistical comparisons as needed for each of the linear mixed-effects models. For the alignment matrix analysis, we performed pairwise comparisons using Tukey’s range test. For the gain matrix analysis in the main and follow-up experiments, there was a three-way interaction between frequency and the other regressors, so we fit seven different mixed-effects models for each frequency of movement post hoc. We performed pairwise comparisons on these frequency-specific models using Tukey’s range test. Although this test corrects for multiple comparisons, it only corrected the p-values for comparisons within each of the seven frequency-specific models. Because we ran Tukey’s range test seven times in total, we applied an additional Bonferroni correction by multiplying the p-values by seven.
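The additional correction amounts to multiplying each Tukey-adjusted p-value by the number of frequency-specific models. A minimal sketch follows; the function name and example p-values are hypothetical, and capping the result at 1 is a common convention assumed here rather than something stated in the text.

```python
def bonferroni_adjust(p_values, n_tests=7):
    """Multiply each p-value by the number of tests, capping at 1 (assumed convention)."""
    return [min(1.0, p * n_tests) for p in p_values]


print(bonferroni_adjust([0.001, 0.02, 0.5]))
```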

We used a two-way ANOVA to compare the late-learning gain matrices between the main and follow-up experiments. Similar to the linear mixed-effects analyses, we compared the off-diagonal elements of the matrices. No outlier rejection was performed for this analysis. Using Wilkinson notation, the structure of the ANOVA was [off-diagonal gain] ∼ [experiment] * [perturbation group] * [frequency of movement].

Data availability

The data and code used to produce the results in this study can be found in the Johns Hopkins University Data Archive (https://doi.org/10.7281/T1/87PH8T).

The following data sets were generated
    1. Yang CS
    2. Cowan NJ
    3. Haith AM
    (2021) Johns Hopkins University Data Archive
    Data and software associated with the publication "De novo learning versus adaptation of continuous control in a manual tracking task".
    https://doi.org/10.7281/T1/87PH8T

References

Bock O, Schneider S (2001) Acquisition of a sensorimotor skill in younger and older adults. Acta Physiologica Et Pharmacologica Bulgarica 26:89–92.
Craik KJW (1947) Theory of the human operator in control systems. I. The operator as an engineering system. British Journal of Psychology 38:56–61.
Fernández-Ruiz J, Díaz R (1999) Prism adaptation and aftereffect: specifying the properties of a procedural memory system. Learning & Memory 6:47–53.
McRuer DT, Jex HR (1967) A review of quasi-linear pilot models. IEEE Transactions on Human Factors in Electronics HFE-8:231–249. https://doi.org/10.1109/THFE.1967.234304
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2016) Nlme: Linear and Nonlinear Mixed Effects Models. [Software]
R Development Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. [Software]
Schoukens J, Pintelon R, Rolain Y (2004) Time domain identification, frequency domain identification. Equivalencies! Differences? Proceedings of the 2004 American Control Conference. pp. 661–666. [Conference]
Smart WD, Kaelbling LP (2000) Practical reinforcement learning in continuous spaces. Proceedings of the Seventeenth International Conference on Machine Learning. pp. 903–910. [Conference]
Theodorou E, Buchli J, Schaal S (2010) Reinforcement learning of motor skills in high dimensions: a path integral approach. 2010 IEEE International Conference on Robotics and Automation. pp. 2397–2403. [Conference]

Decision letter

  1. Timothy Verstynen
    Reviewing Editor; Carnegie Mellon University, United States
  2. Tamar R Makin
    Senior Editor; University College London, United Kingdom
  3. Timothy Verstynen
    Reviewer; Carnegie Mellon University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

All three reviewers (including the Reviewing Editor) were impressed with the changes to the manuscript. They felt that the revised work strongly justifies the conclusion that humans can rapidly and flexibly shift control policies in response to environmental perturbations.

Decision letter after peer review:

Thank you for submitting your article "De novo learning and adaptation of continuous control in a manual tracking task" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Timothy Verstynen as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Tamar Makin as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

This work looks at "de novo learning" in the context of fast continuous tasks, i.e., shifts of control policies (or controllers), rather than parameter changes in existing policies, with visuomotor adaptation. In a set of 2 experiments, using a mixture of discrete point-to-point movement trials and continuous tracking of moving target trials, the authors set out to determine whether the structure of shifts between visual and proprioceptive information determines whether learning relies on adaptation or shifts in control policies. Using both the presence of post-shift aftereffects and trialwise model fitting, the authors find that simple rotations of visual inputs of the hand lead primarily to changes in control parameters while mirror reversals lead to changes in the control policy itself, although there was evidence for a mixture of adaptation and de novo learning in both conditions. The authors infer from this evidence that humans can rapidly and flexibly shift control policies in response to environmental perturbations.

In general, this was a very cleverly designed and executed set of studies. The theoretical framing and experimental design are clean and clear. The data is compelling on the existence of condition differences. However, all three reviewers identified significant concerns that should be addressed.

Essential revisions:

1. Inferential logic

Reviewer #1 pointed out that there are two key parts to the analyses used to infer that mirror reversals lead to de novo policy shifts while rotations lead to adaptation. The first is the presence of post-perturbation aftereffects. While we clearly see stronger aftereffects in the rotation condition than in the mirror reversal condition, suggesting a difference in fundamental control mechanisms, it is not clear why control policy shifts are the only alternative explanation for attenuated aftereffects. I'm pretty sure that this is just a confusion based on how the problem is posed in the paper.

The second is the set of alignment matrices (in both immediate hand position and movement frequency spaces) that are estimated based on model fits to the data. I'll consider both in turn. Perhaps more problematically, the alignment matrices (Figure 3A) and vectors (Figure 3A, 5B, 6B), based on the model fits, show a very high degree of variability across conditions and do not perfectly align to the simple predictions shown in Figure 3A. The reviewer agrees that, if you squint at the mean vector direction, they look qualitatively consistent with the models, but only qualitatively. In fact, the fits to the "ideal" shifts or rotations (Figure 5C, 6C) suggest only partial alignment to the pure models. How are we sure that this isn't reflecting an alternative mechanism, instead of partial de novo learning?

In both the aftereffect and alignment fit analyses, the inference for de novo learning seems to be based on either a null (i.e., no aftereffect in mirror reversal) or partial fits to a specific model. This leaves the main conclusions on somewhat shaky ground.

Reviewer #2 raised similar concerns, pointing out that the authors introduce the concept of de novo learning in contrast to both error-driven adaptation and re-aiming: 'a motor task could be learned by forming a de novo controller, rather than through adaptation or re-aiming.' However, the discussion reframes de novo learning as purely in contrast with implicit adaptation: '[…] de novo learning refers to any mechanism, aside from implicit adaptation, that leads to the creation of a new controller'. While this apparent shift in perspective is likely due to their results and realistically represents the scientific process, this shift should be more explicitly communicated.

As explicitly raised in the discussion and suggested in the introduction, the authors have categorized any learning process that is not implicit adaptation as a de novo learning process. To substantiate this conceptual decision, the authors should further explain why motor learning unaccounted for by established learning processes should be accounted for by a de novo learning process.

This same reviewer also pointed out that participants could not learn mirror-reversal under continuous tracking without the point-to-point task, which the authors interpret to mean that re-aiming is important for the ‘acquisition’ of a de novo controller. This suggests that re-aiming may not be important for the ‘execution’ of a de novo controller.

However, the frequency-based performance analysis presented in the main experiment would seem to suggest otherwise. As mentioned in the introduction, low stimulus frequencies allow a catch-up strategy. Both rotation and mirror groups were successful at compensating at low frequencies, but the mirror-reversal group was largely unsuccessful at high frequencies. Assuming that higher frequencies inhibit cognitive strategy, this suggests to me that catch-up strategies might be essential to mirror-reversal, possibly not only during learning but also during execution.

Further, the authors note that, in the rotation group, aftereffects only accounted for a fraction of total compensation, then suggest that residual learning not accounted for by adaptation was attributable to the same de novo learning process driving mirror reversal. This framing makes it unclear to me how the authors think re-aiming fits into the concept of a de novo learning process (e.g. Is all learning not driven by implicit adaptation de novo learning? What about the role of re-aiming?)

Reviewer #3 points out that in the abstract, the last line says, 'Our results demonstrate that people can rapidly build a new continuous controller de novo and can flexibly integrate this process with adaptation of an existing controller'. It's not clear if the authors have shown the latter definitively. What is the reasoning for this statement, "flexibly integrate this process with adaptation of an existing controller"? It would seem you would need the same subjects to perform both experimental tasks (mirror reversal and VMR) concurrently to make this claim.

Reviewer #3 also points out that, on lines 339-342, the results show that mirror-reversal learning is low at high frequencies (Figure 5B). The authors interpret this as reason to believe that this is actually de-novo learning and not adaptation of an existing controller. This seems somewhat unfounded. Could it be that de novo learning performs well at low frequency, through 'catch-up' movements, but not at high frequencies? Do the authors have a counter argument for this explanation?

On lines 343 – 350, Reviewer #3 points out that the authors ascribe the difference between after-effects and end of learning to be due to de-novo learning even in the rotation group. However, that difference would likely be due to the use of explicit strategy during learning and its disengagement afterwards, or perhaps a temporally labile learning. Can the authors rule these possibilities out? What were the instructions given at the end of the block and how much time elapsed?

2. Linearity analysis

Reviewer #1 reported having a hard time understanding the analysis leading to the conclusion that there is a linear relationship between target motion and hand motion. The logic of the spectral analysis was not clear to me, and the results shown in Figure 4 were not intuitive. In addition, there was no actual quantification used to make a conclusion about linearity. Thus, it was difficult to determine whether this aspect of the authors' conclusion (a critical inference for them to justify their main conclusion) was correct.

Reviewer #2 raised similar concerns, pointing out that using linearity as a metric for mechanistic inference has limitations.

– The absence of learning (errors) would present as nonlinearity.

– The use of cognitive strategy could present as nonlinearity.

– It doesn't seem possible to parse the two mechanisms, especially as you might expect both an increase in error at the beginning of learning and possibly an intervening cognitive strategy at the beginning of learning.

Given these issues, a more grounded interpretation is that linearity simply represents real-time updating. If the relationship between the cursor and the hand is nonlinear, then updating is not in real time.

The data shown in Figure 4B do not appear to provide clear evidence that the relationship between the cursor and the hand was approximately linear. Currently, it seems equally plausible to say that the data are approximately non-linear. Establishing a criterion for nonlinearity would be useful (e.g. shuffling a linear response for comparison). Reviewer #3 made a related point, noting that details about the frequency analysis are buried deep in the methods (around line 711), especially how the hand-target coherence (shown in 4B) is calculated. It would be helpful to include some of these details in the main text. For example, it is currently very difficult to understand the relationship when moving from Figure 4A to 4B.

Reviewer #3 raises a similar concern. The authors show the tracking strategies participants applied by investigating the relationship between hand and target movement. A linear relationship would suggest that participants tracked the target using continuous movements. In contrast, a nonlinear relationship would suggest that participants used an alternative tracking strategy. The authors state that this relationship is linear based only on Figure 4, but do not seem to provide any proof of the linearity. It would be more convincing to provide an analysis to show that the relationship is indeed linear or nonlinear.

3. Statistical results

Reviewer #1 points out that many of the key statistical results were buried in the main text and some were incompletely reported. Can the authors provide a table (or set of tables) of the key statistics, including at least the value of the statistical test itself and the p-value, if not also confidence intervals on the estimates?

Reviewer #3 also points out that outlier rejection based on some subjects who had greatly magnified or attenuated data seems like it might be biasing the data. Also, the outlier rejection criterion used (>1.5 IQR) seems very stringent. Furthermore, it appears there was no outlier rejection in the main experiment. It would be good to be consistent across experiments.
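For readers unfamiliar with the criterion mentioned above, the following is a minimal sketch of one common reading of a ">1.5 IQR" rule (Tukey's fences), in which a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third. The function name `tukey_outliers`, the quartile method, and the sample values are assumptions made for illustration; they are not the authors' code or data, and the manuscript may define the criterion differently.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the first or third quartile."""
    # method="inclusive" is an assumption; other quartile conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical per-participant values, not data from the study.
sample = [0.8, 0.9, 1.0, 1.0, 1.1, 1.1, 1.2, 1.3, 3.5]
flagged = tukey_outliers(sample)
print(flagged)
```

Raising `k` widens the fences and rejects fewer participants, which is one way the stringency concern raised by the reviewer could be explored.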

4. Experiment 2

The intention for experiment 2 is to see how much training on the point-to-point task influenced adaptation mechanisms during the tracking task. Yet this experiment still included extensive exposure to the point-to-point task, just not as much as in experiment 1. Given this, how can an inference be cleanly made about the influence of one task on the other? Wouldn't the clean way to ask this question be to just not run the point-to-point task at all?

5. Frequency analysis

The authors state that "The failure to compensate at high frequencies […] is consistent with the observation that people who have learned to make point-to-point movements under mirror-reversed feedback are unable to generate appropriate rapid corrections to unexpected perturbations." This logic is not clear. How does the pattern of which movement frequencies show an effect, and which do not, lead to this conclusion?

6. Clarity of logic

Reviewer #3 states that it would be helpful if the authors could provide more background/context on their view of de novo learning and explain the relationship between de novo learning and the adapted controller model. For example, why does the lack of aftereffects under the mirror-reversal imply that the participants did not counter this perturbation via adaptation and instead learned by forming a de novo controller (Line 199)? Is the reasoning based purely on behavioral observations, or is there a physiological basis for this assertion?

In addition, this same reviewer points out that on lines 197-199: The reason for the lack of after-effects in the mean-squared error analysis is a little vague. It took a few tries to understand the reasoning. It would be good to spell this out a little more clearly. In lines 223-225: The logic behind why coupling across axes is not nonlinear behavior seems to be missing. It's quite unclear and currently difficult to understand. It would be very helpful to spell this out too.

7. Learning in the visuomotor rotation (VMR) condition

Reviewer #3 also notes that, surprisingly, there is no measurement of aiming during learning of the VMR. Several motor learning studies (several of which the authors cite) show that learning in VMR is a combination of implicit and explicit processes. It is understood that this is not possible in the continuous tracking task, but it can certainly be done in the point-to-point task. Is there a reason this was not done? Wouldn't this have further supported the authors' claim of an existing controller?

https://doi.org/10.7554/eLife.62578.sa1

Author response

Essential revisions:

1. Inferential logic

Reviewer #1 pointed out that there are two key parts to the analyses used to infer that mirror reversals lead to de novo policy shifts while rotations lead to adaptation. The first is the presence of post-perturbation aftereffects. While we clearly see stronger aftereffects in the rotation condition than in the mirror reversal condition, suggesting a difference in fundamental control mechanisms, it is not clear why control policy shifts are the only alternative explanation for attenuated aftereffects. I'm pretty sure that this is just a confusion based on how the problem is posed in the paper.

We thank the reviewer for this comment. We argue that there are three different possibilities for how the brain learns new motor tasks: (1) adaptation (parametrically changing the properties of an existing controller), (2) re-aiming (specifying an alternate movement goal to an existing controller so as to achieve a particular desired outcome), and (3) de novo learning (switching to an alternative controller which has been newly instantiated). We argue that neither adaptation nor re-aiming can explain participants’ behavior in the mirror reversal group (based on the lack of aftereffects and the linearity analysis in Figure 4). We therefore conclude that this group compensated for the mirror reversal by building a new controller.

Our reasoning that learning must either be achieved by leveraging an existing controller or instantiating a new one parallels that of Telgen and colleagues (2014) [1], who first introduced the idea that mirror reversal might be learned by forming a new controller de novo. They, however, overlooked the possibility that learning to generate point-to-point movements under a mirror reversal might be accomplished by a simple re-aiming mechanism. Indeed, subsequent work has shown that this is likely to be the case [2], calling into question the conclusion that participants are able to build a new controller in order to compensate for a mirror reversal.

We think that one likely source of confusion is that our use of the term “de novo learning” appears to describe a specific putative mechanism of learning. However, this is not what we mean; by the phrase “de novo learning,” we intend to describe any process by which a de novo controller might be created. We are agnostic as to the specific learning mechanisms (e.g. reinforcement learning, error-driven learning, automatization of cognitive strategies) that might bring this about.

We have revised much of the Introduction to clarify our logic and more concretely pose our conceptual framework to the reader. Additionally, we have addressed these issues in lines 461-547 of the Discussion.

The second is the set of alignment matrices (in both immediate hand position and movement frequency spaces) that are estimated based on model fits to the data. I'll consider both in turn. Perhaps more problematically, the alignment matrices (Figure 3A) and vectors (Figure 3A, 5B, 6B), based on the model fits, show a very high degree of variability across conditions and do not perfectly align to the simple predictions shown in Figure 3A. The reviewer agrees that, if you squint at the mean vector direction, they look qualitatively consistent with the models, but only qualitatively. In fact, the fits to the "ideal" shifts or rotations (Figure 5C, 6C) suggest only partial alignment to the pure models. How are we sure that this isn't reflecting an alternative mechanism, instead of partial de novo learning?

We would like to clarify that the vectors illustrated in Figure 3A are not intended as predictions, but rather as a reference to help illustrate to the reader what ideal compensation for the mirror reversal should look like in this type of plot. The imperfect alignment between this reference and participants’ behavior does indeed show that compensation is only partial. However, we would not characterize this as “partial de novo learning”. Rather, the compensation that did occur was fully achieved by creating a new controller de novo, even though it did not achieve full compensation.

We also want to emphasize that the plotted vectors should not really be thought of as “model fits”. They are direct estimates of the (frequency-dependent) transformation relating target movement to hand movement and, in this respect, they are better thought of as more akin to initial movement direction for point-to-point reaches, rather than as model fits.

We have revised the manuscript to clarify the interpretation of these matrices in lines 161-163 and 167-171.

In both the aftereffect and alignment fit analyses, the inference for de novo learning seems to be based on either a null (i.e., no aftereffect in mirror reversal) or partial fits to a specific model. This leaves the main conclusions on somewhat shaky ground.

As addressed in response 1.2, the reference in Figure 3A was never intended to represent a ‘model’ of de novo learning. The alignment/gain matrix analyses are a means of quantifying the behavior of participants and were never meant to provide a prediction under one hypothesis or another. So the inference is not based on a ‘partial fit to a specific model’. Instead, the inference is based on the differences in aftereffects across conditions.

Regarding the null result for the after-effects, our conclusion is not solely drawn based on the absence of an aftereffect but rather is based on the difference in aftereffects across conditions. We realized that we had neglected to report these statistical differences across groups in the original manuscript. We agree that if we had only tested the mirror-reversal group this would be a significant weakness, but the comparison to the rotation group shows a very clear difference in learning processes.

We have now included explicit comparisons of aftereffect size across groups in the Results in lines 197-200, 397-399, and 439-441.

Reviewer #2 raised similar concerns, pointing out that the authors introduce the concept of de novo learning in contrast to both error-driven adaptation and re-aiming: 'a motor task could be learned by forming a de novo controller, rather than through adaptation or re-aiming.' However, the discussion reframes de novo learning as purely in contrast with implicit adaptation: '[…] de novo learning refers to any mechanism, aside from implicit adaptation, that leads to the creation of a new controller'. While this apparent shift in perspective is likely due to their results and realistically represents the scientific process, this shift should be more explicitly communicated.

We thank the reviewer for pointing out this passage, which was poorly worded. This statement was not intended to mark a shift in the central claim of our paper, and it was an oversight not to explicitly mention re-aiming here. Re-aiming does not lead to the creation of a new controller; rather, it works by altering the movement goal that is fed to an existing controller. We argue that the mirror reversal group’s tracking behavior could not be explained by either adaptation or re-aiming. We have corrected this statement in lines 658-659 to maintain clearer and more consistent messaging throughout the paper.

As explicitly raised in the discussion and suggested in the introduction, the authors have categorized any learning process that is not implicit adaptation as a de novo learning process. To substantiate this conceptual decision, the authors should further explain why motor learning unaccounted for by established learning processes should be accounted for by a de novo learning process.

We thank the reviewer for this comment. We do define the term “de novo learning” broadly as any learning that does not proceed either by adaptation or by re-aiming. This follows the same framing as Telgen and colleagues (2014) [1], except that we also consider re-aiming as a possible means of compensation.

We also think a potentially confusing aspect of our framing was a failure on our part to clearly distinguish the product of learning (a de novo controller) and the learning process itself (de novo learning). While the former is well defined, the latter is not. We do not make any qualitative claims about the learning process that brings about the instantiation of this new controller. We do believe, however, that our findings rule out both adaptation and re-aiming as potential learning processes. When we refer to ‘de novo learning processes,’ we intend to refer to whatever processes are responsible for learning a new controller (after having ruled out adaptation or re-aiming). At present, little is understood about how such learning might proceed. Our central claim is, however, agnostic to the exact mechanism by which a new controller is built, and our goal was not to characterize the learning mechanism itself but rather to characterize the properties of the learned behavior.

We have revised our Discussion (see section titled “Mechanisms and Scope of de novo Learning”) to better explain our reasoning on this point. We have also reframed the Introduction and key parts of the Results to better emphasize that our paper identifies the formation of a new controller (in contrast to adaptation of an existing controller or leveraging an existing controller by re-aiming) and avoid giving the impression that our experiments identify the process by which this new controller is established.

This same reviewer also pointed out that participants could not learn mirror-reversal under continuous tracking without the point-to-point task, which the authors interpret to mean that re-aiming is important for the ‘acquisition’ of a de novo controller. This suggests that re-aiming may not be important for the ‘execution’ of a de novo controller.

However, the frequency-based performance analysis presented in the main experiment would seem to suggest otherwise. As mentioned in the introduction, low stimulus frequencies allow a catch-up strategy. Both rotation and mirror groups were successful at compensating at low frequencies but the mirror-reversal group was largely unsuccessful at high frequencies. Assuming that higher frequencies inhibit cognitive strategy, this suggests to me that catch-up strategies might be essential to mirror-reversal, possibly not only during learning but also during execution.

We thank the reviewer for raising this very important point regarding whether the mirror-reversal group may be using a catch-up strategy. The fact that participants do not appropriately compensate for the mirror reversal at high frequencies may be because the time required to deploy a catch-up (i.e., re-aiming) strategy is too long to be effective at high frequencies (i.e., longer than the period of the sinusoids). Previously, we noted that the fast-paced tracking task would have precluded time-consuming re-aiming and appealed to the idea that participants’ behavior was approximately linear as evidence that they were not performing intermittent ‘catch-up’ movements. We agree with the reviewer, however, that this was not entirely rigorous. We realized that even if participants perform a series of discretely planned movements, their overall behavior might appear linear depending on how rapidly the movements could be planned.

We have now included an additional analysis (Figure 4C) which we believe provides more convincing evidence that participants did not solve the mirror-reversal through a series of discretely planned ‘catch-up’ movements. We examined the lag between hand and target movements to determine whether this lag provided enough time to apply a re-aiming strategy under the rotation/mirror reversal. Based on previous literature [3,4], we expected that if one were to use a re-aiming strategy to counter a large rotation/mirror reversal, this would incur an additional ~300 ms of reaction time on top of that required at baseline. However, in measuring the increase in lag between baseline and late learning, we found that the lag only increased by an average of 83 and 191 ms for the rotation and mirror-reversal groups, respectively. Together with the coherence analysis, this suggests that participants did not employ a catch-up strategy. We have included the lag analysis in lines 279-297 in the Results.
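The logic of comparing a tracking lag against the expected cost of re-aiming can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for Figure 4C: it assumes the lag at a single frequency is read off the phase difference between target and hand, and the function names, sampling rate, test frequency, and the two simulated delays (250 ms and 400 ms) are all hypothetical. Only the ~300 ms re-aiming cost comes from the text above.

```python
import cmath
import math

def fourier_coefficient(signal, freq_hz, sample_rate_hz):
    """Complex Fourier coefficient of `signal` at a single frequency."""
    return sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
        for i, x in enumerate(signal)
    ) / len(signal)

def lag_seconds(target, hand, freq_hz, sample_rate_hz):
    """Delay of `hand` relative to `target` at one frequency, from the phase difference.

    Only unambiguous when the true delay is shorter than half a period.
    """
    t = fourier_coefficient(target, freq_hz, sample_rate_hz)
    h = fourier_coefficient(hand, freq_hz, sample_rate_hz)
    phase_lag = cmath.phase(t / h)  # positive when the hand trails the target
    return phase_lag / (2 * math.pi * freq_hz)

def delayed_sine(freq_hz, delay_s, sample_rate_hz, n_samples):
    """Sinusoid at `freq_hz` delayed by `delay_s` seconds."""
    return [math.sin(2 * math.pi * freq_hz * (i / sample_rate_hz - delay_s))
            for i in range(n_samples)]

FS, N, F = 100, 400, 0.5  # hypothetical: 100 Hz sampling, 4 s of data, 0.5 Hz target
target = delayed_sine(F, 0.0, FS, N)
baseline_lag = lag_seconds(target, delayed_sine(F, 0.25, FS, N), F, FS)
late_lag = lag_seconds(target, delayed_sine(F, 0.40, FS, N), F, FS)

REAIMING_COST_S = 0.3  # approximate extra reaction time cited in the response
increase = late_lag - baseline_lag
print(f"lag increase: {increase * 1000:.0f} ms; "
      f"enough time for re-aiming: {increase >= REAIMING_COST_S}")
```

The comparison is made on the increase in lag relative to baseline rather than on the absolute lag, mirroring the reasoning in the response.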

Additionally, participants’ anecdotal reports also suggest they did not utilize a re-aiming strategy. After the experiment was complete, we asked participants to describe how they performed the tracking task under the perturbations. The vast majority of participants reported that when they tried to think about how to move their hand to counter the perturbations, they felt that their tracking performance deteriorated. Instead, they felt their performance was best when they let themselves respond naturally to the target without explicitly thinking about how to move their hands. Participants’ disinclination to explicitly coordinate their hand movements provides further evidence against their use of a re-aiming strategy. We have mentioned these informal reports in lines 500-507.

We do acknowledge that we cannot entirely rule out the possibility that participants used a re-aiming strategy. However, in order for re-aiming to be viable, it would have to be extremely rapid, more rapid than almost all accounts of re-aiming have suggested. Moreover, cases where participants have been found to apply re-aiming strategies extremely rapidly (on the order of ~200 ms [4,5]) by “caching” the solution for each target tend to include only a small number of static targets with a fixed start position. This strategy does not appear to be scalable to cases where there are 12 or more possible targets [4]. In the tracking task, the possible state space of the target and hand is vastly greater than for center-out reaching tasks, making it unlikely that a re-aiming + caching solution could be feasible, at least as it is currently understood. We have revised the section in the Discussion titled “The Role of Re-aiming Strategies in Executing Tracking Behavior” to address these points.

Further, the authors note that, in the rotation group, aftereffects only accounted for a fraction of total compensation, then suggest that residual learning not accounted for by adaptation was attributable to the same de novo learning process driving mirror reversal. This framing makes it unclear to me how the authors think re-aiming fits into the concept of a de novo learning process (e.g. Is all learning not driven by implicit adaptation de novo learning? What about the role of re-aiming?)

Our original discussion may have been unclear about how we view the relationship between de novo learning and re-aiming. It perhaps incorrectly gave the impression that we believe learning is a dichotomy between implicit adaptation and de novo learning and that we believe re-aiming is a form of de novo learning. As described in responses 1.1, 1.4, and 1.5, we have revised the Introduction, Results, and Discussion to clarify our conceptualization of these learning processes and more carefully explain the differences between a de novo learned controller and use of a re-aiming strategy.

Reviewer #3 points out that in the abstract, the last line says, 'Our results demonstrate that people can rapidly build a new continuous controller de novo and can flexibly integrate this process with adaptation of an existing controller'. It's not clear if the authors have shown the latter definitively. What is the reasoning for this statement, "flexibly integrate this process with adaptation of an existing controller"? It would seem you would need the same subjects to perform both experimental tasks (mirror reversal and VMR) concurrently to make this claim.

We agree with the reviewer that our claim that de novo learning can “flexibly integrate” with adaptation was not as clear as it should be. Our intention was to point out that de novo learning and adaptation can operate in tandem to learn new tasks, as seen in the rotation group (explained in lines 343–350 and more extensively discussed in lines 424–445 of the initially submitted manuscript). The basis for this conclusion is that these participants expressed aftereffects, which suggested they engaged adaptation during learning, but the aftereffect’s magnitude was only a fraction of the total compensation exhibited at late learning. This suggested that some component of the compensation for the rotation was immediately disengaged after the rotation was removed, and we believe this component to be de novo learning. We have changed the words “flexibly integrate” in the Abstract to “simultaneously deploy” to communicate our conclusion more clearly. We have also added a sentence in the discussion to emphasize this point (lines 570-571).

Regarding the reviewer’s suggestion to have subjects perform both the mirror reversal and rotation simultaneously: combining the two perturbations simply amounts to a mirror reversal whose mirroring axis is oriented differently from that of the originally applied reversal. Thus, we would expect that this perturbation would only engage de novo learning and not adaptation.
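The geometric claim here can be checked numerically. The sketch below is an illustration rather than anything from the manuscript: it composes a 2D rotation with a mirror reversal and confirms that the result equals a single mirror reversal about an axis turned by half the rotation angle. The specific angles (a 90° rotation and a mirroring axis at 45°) and all function names are assumptions chosen for the example.

```python
import math

def rotation(theta):
    """Counterclockwise rotation by `theta` radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflection(axis_angle):
    """Mirror reversal about a line through the origin at `axis_angle` radians."""
    c, s = math.cos(2 * axis_angle), math.sin(2 * axis_angle)
    return [[c, s], [s, -c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def is_close(a, b, tol=1e-12):
    """Element-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

theta = math.radians(90)  # hypothetical rotation angle
axis = math.radians(45)   # hypothetical mirroring axis

# Apply the mirror reversal first, then the rotation.
combined = matmul(rotation(theta), reflection(axis))

# A determinant of -1 marks a reflection; a pure rotation would give +1.
determinant = combined[0][0] * combined[1][1] - combined[0][1] * combined[1][0]

# The combination matches one mirror reversal about an axis shifted by theta / 2.
same_as_single_mirror = is_close(combined, reflection(axis + theta / 2))
print(determinant, same_as_single_mirror)
```

Applying the two perturbations in the opposite order also yields a single mirror reversal, with the axis shifted by half the rotation angle in the other direction.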

Reviewer #3 also points out that, on lines 339-342, the results show that mirror-reversal learning is low at high frequencies (Figure 5B). The authors interpret this as reason to believe that this is actually de-novo learning and not adaptation of an existing controller. This seems somewhat unfounded. Could it be that de novo learning performs well at low frequency, through 'catch-up' movements, but not at high frequencies? Do the authors have a counter argument for this explanation?

We thank the reviewer for pointing out that our claim here was unclear. We were not suggesting that the mirror reversal group’s lack of compensation at high frequencies is a reason to believe that they engaged de novo learning. Instead, we claim that this group engaged de novo learning because their behavior was inconsistent with both adaptation (based on lack of aftereffects) and the use of a re-aiming strategy (based on analysis in Figure 4).

Compensating for the perturbation via frequent ‘catch-up’ movements would not be a form of de novo learning, but rather repeated re-aiming. The issue of whether the frequency-dependence of compensation under the mirror-reversal (and rotation) might reflect a series of ‘catch-up’ movements is an important one which the reviewers’ comments have prompted us to consider more deeply and provide better evidence against. We have provided a more extensive argument in response 1.6 detailing why we do not believe that participants’ performance in the mirror-reversal group is consistent with a series of catch-up movements.

To more clearly explicate our claim that the mirror-reversal group learned by creating a de novo controller, we have edited lines 408-413 to emphasize that this claim is based on the fact that this group’s behavior was inconsistent with adaptation and a re-aiming strategy.

On lines 343 – 350, Reviewer #3 points out that the authors ascribe the difference between after-effects and end of learning to be due to de-novo learning even in the rotation group. However, that difference would likely be due to the use of explicit strategy during learning and its disengagement afterwards, or perhaps a temporally labile learning. Can the authors rule these possibilities out? What were the instructions given at the end of the block and how much time elapsed?

Before the post-learning block, we verbally informed participants that the rotation/mirror reversal would be disengaged and that their cursor control would revert to normal.

The time elapsed between the last perturbation trial and the first aftereffect trial was not different from the time elapsed in between other blocks of perturbation trials (~30 seconds). Therefore, the lack of aftereffects cannot be attributed to temporal lability of the learned compensation.

As we discussed in detail in response 1.6, we do not believe that the difference between aftereffects and performance at the end of learning can be ascribed to use of an explicit strategy, at least not in a form that is qualitatively similar to what has been extensively described in the case of point-to-point movements. As we argued in the paper initially, explicit strategies are known to be time-consuming to implement, which would prohibit them from being used to solve a continuous tracking task like the one we used. We have strengthened this argument by analyzing in more detail the additional tracking delays introduced by having to track under the perturbation (<200 ms) and comparing this to previous estimates of the time required for re-aiming (~300 ms). Although there is some evidence that explicit strategies can become cached and, thereby, deployed much more rapidly, this appears to occur only in very limited scenarios with very few targets and does not seem possible when there are 12 or more targets, let alone a whole workspace (as there is in the tracking task). We have updated lines 279-297 to include this new analysis.

2. Linearity analysis

Reviewer #1 reported having a hard time understanding the analysis leading to the conclusion that there is a linear relationship between target motion and hand motion. The logic of the spectral analysis was not clear to me, and the results shown in Figure 4 were not intuitive. In addition, there was no actual quantification used to make a conclusion about linearity. Thus, it was difficult to determine whether this aspect of the authors' conclusion (a critical inference for them to justify their main conclusion) was correct.

We agree with the reviewer that our approach to quantifying linearity was not explained clearly enough. We have extensively revised the section in the Results titled “Participants Used Continuous Movements to Perform Manual Tracking” to include more intuitive explanations of these analyses. We also agree that the logic relating the linearity analysis to the main conclusion was somewhat loose, and we will attempt to clarify it below.

Using the amplitude spectra and coherence analyses, we attempted to demonstrate that participants’ behavior was consistent with that of a linear system. Showing that behavior was linear would justify our use of linear systems tools for subsequent analyses. It would also provide evidence that participants’ movements responded continuously to the varying location of the target, rather than in an intermittent manner. According to linear systems theory, a linear system always translates sinusoidal inputs into sinusoidal outputs at the same frequency, albeit potentially scaled in amplitude and shifted in phase. Thus, showing that participants’ behavior is linear would suggest that they were attempting to track the target continuously.
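
The sinusoid-in, sinusoid-out property can be checked numerically. The following is a minimal sketch, not the analysis used in the paper: the sampling rate, the two target frequencies, the gain, the delay, and the helper names `linear_tracker` and `dominant_freqs` are all assumptions chosen for illustration. A gain plus a pure delay leaves power only at the input frequencies, whereas a static cubic nonlinearity introduces power at frequencies that were absent from the input.

```python
import numpy as np

fs = 100.0                       # sampling rate in Hz (assumed)
n = 2000                         # 20 s of samples, so FFT bins fall every 0.05 Hz
t = np.arange(n) / fs
input_freqs = [0.5, 1.5]         # illustrative target frequencies in Hz

# Sum-of-sines "target" trajectory along one axis
target = sum(np.sin(2 * np.pi * f * t) for f in input_freqs)

def linear_tracker(x, gain=0.8, delay_samples=20):
    """Hypothetical linear response: scale the input and delay it by 200 ms.

    np.roll is a circular shift, which is an exact delay here because the
    target is periodic over the 20 s window.
    """
    return gain * np.roll(x, delay_samples)

def dominant_freqs(x, threshold=0.01):
    """Frequencies (Hz) whose sinusoidal amplitude exceeds `threshold`."""
    amplitude = 2 * np.abs(np.fft.rfft(x)) / len(x)
    freq_axis = np.fft.rfftfreq(len(x), d=1 / fs)
    return freq_axis[amplitude > threshold]

linear_out = linear_tracker(target)
nonlinear_out = target ** 3      # static nonlinearity, for contrast

print("linear output:   ", dominant_freqs(linear_out))
print("nonlinear output:", dominant_freqs(nonlinear_out))
```

Note that the delay changes only the phase of each component, so a lag between target and hand does not by itself make the response nonlinear.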

Another point we did not explain clearly enough is the importance of spectral coherence for assessing linearity. We based this on a theory outlined by Roddey et al. (2000) [6], which we briefly summarize here. Roddey and colleagues showed that the entire input-output relationship of any arbitrary system responding to an arbitrary input can be broken up into three components: (1) a component that can be explained by a linear model, (2) a component that can be explained by a nonlinear model but not a linear model, and (3) a component that cannot be explained by any model (i.e., noise). They showed that the first component, the linear part of the system’s response, is proportional to the spectral coherence between the input and output (input-output coherence). Coherence is analogous to the correlation between two signals in the time domain and is bounded between 0 and 1. If the coherence between an input and output is 0.5, this implies that 50% of the system’s response can be explained by a linear model. We found that at baseline, late learning, and post-learning, the input-output coherence was high, ranging roughly between 0.5 and 0.8. Furthermore, by examining the coherence between different outputs, we were also able to attribute the remaining variance not accounted for by a linear model to noise, rather than to a systematic nonlinearity in participants’ behavior (see response 2.2 for more detail on this point). Therefore, a majority of the variance in participants’ behavior could be described by a linear model, justifying our use of linear analysis methods in subsequent analyses.
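
For reference, the standard definition of magnitude-squared coherence makes the "50% explained" reading concrete. The notation here is introduced for illustration and is not taken from the paper: S_xx and S_yy denote the power spectra of the input and output, S_xy their cross-spectrum, H a linear transfer function, and S_nn the spectrum of a residual (nonlinearity plus noise) that is assumed to be uncorrelated with the input.

```latex
\gamma^2_{xy}(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f)\, S_{yy}(f)},
\qquad 0 \le \gamma^2_{xy}(f) \le 1 .

% If the output is a linear response plus an uncorrelated residual,
% Y(f) = H(f) X(f) + N(f), then
% S_{xy} = H S_{xx} and S_{yy} = |H|^2 S_{xx} + S_{nn}, so
\gamma^2_{xy}(f) = \frac{|H(f)|^2 S_{xx}(f)}{|H(f)|^2 S_{xx}(f) + S_{nn}(f)} .
```

Under that assumption, the coherence at each frequency equals the fraction of output power accounted for by the linear response, so a value of 0.5 corresponds to half of the output power.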

One important new analysis we have now included in the paper examines whether we could rule out the possibility that participants were employing catch-up movements to track the target (Figure 4C). This analysis involved using the frequency-dependent phase lags in the Fourier-transformed data to estimate the increase in tracking delay incurred late in learning and comparing this to previous estimates of the time cost of re-aiming. Our results suggest that participants responded to target movement too quickly to successfully deploy a re-aiming strategy. We believe this new analysis more directly addresses the plausibility of re-aiming strategies than the amplitude spectra and coherence analyses by themselves. See response 1.6 for a more detailed discussion of this new analysis.
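
The delay estimate rests on the fact that a pure time delay τ produces a phase lag that grows linearly with frequency, φ(f) = −2πfτ, so the delay can be read off the slope of phase against frequency. The sketch below illustrates that idea on synthetic data; it is not the authors' analysis code, and the frequencies, the simulated delays (250 ms and 400 ms), and the function name `estimate_delay` are assumptions made for the example.

```python
import numpy as np

fs = 100.0                                   # sampling rate in Hz (assumed)
n = 2000                                     # 20 s window -> 0.05 Hz frequency resolution
t = np.arange(n) / fs
freqs = np.array([0.1, 0.25, 0.55, 0.85, 1.15])   # illustrative target frequencies (Hz)
target = np.sum([np.sin(2 * np.pi * f * t) for f in freqs], axis=0)

def estimate_delay(target, hand, freqs, fs):
    """Estimate a tracking delay (s) from the slope of phase lag versus frequency."""
    freq_axis = np.fft.rfftfreq(len(target), d=1 / fs)
    idx = [int(np.argmin(np.abs(freq_axis - f))) for f in freqs]
    # Complex ratio of output to input at each target frequency
    ratio = np.fft.rfft(hand)[idx] / np.fft.rfft(target)[idx]
    phase = np.unwrap(np.angle(ratio))       # radians; more negative = longer lag
    slope = np.polyfit(freqs, phase, 1)[0]   # radians per Hz
    return -slope / (2 * np.pi)

# Synthetic "hand" signals: the target delayed by 250 ms (baseline) and
# 400 ms (late learning). np.roll is an exact delay because the target is
# periodic over the window.
hand_baseline = np.roll(target, 25)
hand_late = np.roll(target, 40)

extra_delay = (estimate_delay(target, hand_late, freqs, fs)
               - estimate_delay(target, hand_baseline, freqs, fs))
print(f"additional delay: {extra_delay * 1000:.0f} ms")
```

In the response above, the corresponding comparison is between the additional delay late in learning (<200 ms) and earlier estimates of the time needed for re-aiming (~300 ms).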

The data shown in Figure 4B do not appear to provide clear evidence that the relationship between the cursor and the hand was approximately linear. Currently, it seems equally plausible to say that the data are approximately non-linear. Establishing a criterion for nonlinearity would be useful (e.g. shuffling a linear response for comparison).

As described in response 2.1, the coherence between target and hand movement provides an estimate for the proportion of participants’ behavior that can be explained by a linear model. The residual variance unexplained by a linear model can be attributed to either nonlinearity (as the reviewer suggests) or noise. The theory from Roddey and colleagues also provides a method to estimate the proportion of behavior that is nonlinear, which can equivalently be thought of as the proportion of the behavior that is deterministic. (This quantity can be computed as the square root of the coherence between hand movement across different trials within a block.) We found that a nonlinear model could only account for an additional 5-10% of behavioral variance compared to a linear model. This suggests that the deterministic portion of participants’ tracking behavior could mostly be explained as a linear response rather than as a systematic nonlinearity. We have included this additional analysis in lines 268-278.
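
The two quantities described above can be illustrated on synthetic data. In this sketch, which is an illustration under stated assumptions rather than the analysis from the paper, the response is constructed so that 60% of its power is linear in the input, 10% is a deterministic nonlinearity, and 30% is noise that differs between two repeated "trials" with the same input. The white-noise input (not the target trajectories used in the experiment), the cubic nonlinearity, the segment count, and the helper `coherence` are all choices made for the example. If the noise is independent across trials, the coherence between two responses is the square of the deterministic fraction of response power, which is why its square root is taken.

```python
import numpy as np

def coherence(x, y, nseg=64):
    """Magnitude-squared coherence, averaging spectra over non-overlapping segments."""
    X = np.fft.rfft(np.reshape(x, (nseg, -1)), axis=1)
    Y = np.fft.rfft(np.reshape(y, (nseg, -1)), axis=1)
    Sxy = np.mean(np.conj(X) * Y, axis=0)
    Sxx = np.mean(np.abs(X) ** 2, axis=0)
    Syy = np.mean(np.abs(Y) ** 2, axis=0)
    return np.abs(Sxy) ** 2 / (Sxx * Syy)

rng = np.random.default_rng(0)
n = 64 * 256
x = rng.standard_normal(n)                   # stand-in for target motion (assumed)

# Deterministic response: a linear part plus a nonlinear part that is
# uncorrelated with x (x**3 - 3x has zero correlation with a unit Gaussian x).
linear_part = np.sqrt(0.6) * x                        # variance 0.6
nonlinear_part = np.sqrt(0.1 / 6) * (x**3 - 3 * x)    # variance 0.1

def simulate_trial():
    """One response to the same input, with fresh noise of variance 0.3."""
    return linear_part + nonlinear_part + np.sqrt(0.3) * rng.standard_normal(n)

hand_1, hand_2 = simulate_trial(), simulate_trial()

# Fraction of response power explained by a linear model of the input
linear_fraction = coherence(x, hand_1).mean()
# Fraction of response power that is deterministic (linear + nonlinear)
deterministic_fraction = np.sqrt(coherence(hand_1, hand_2)).mean()
nonlinear_fraction = deterministic_fraction - linear_fraction

print(f"linear: {linear_fraction:.2f}, deterministic: {deterministic_fraction:.2f}, "
      f"nonlinear: {nonlinear_fraction:.2f}")
```

With these settings the estimates should land near 0.6, 0.7, and 0.1 respectively, up to sampling error from the finite number of segments.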

However, we do agree that coherence has limited interpretability as a metric for linearity because there is no threshold for determining that a system is linear, other than exhibiting a coherence of 1 at all frequencies. We believe, however, that the fact that a substantial majority of the variability in behavior is linear justifies the use of linear methods in later analyses and suggests that participants tracked the target continuously, rather than intermittently.

Reviewer #3 raises a similar concern. The authors show the tracking strategies participants applied by investigating the relationship between hand and target movement. A linear relationship would suggest that participants tracked the target using continuous movements. In contrast, a nonlinear relationship would suggest that participants used an alternative tracking strategy. The authors state only that this relationship is based on Figure 4, but they do not seem to provide any proof of linearity. It would be more convincing to provide an analysis showing whether the relationship is indeed linear or nonlinear.

As discussed in responses 2.1 and 2.2, we have clarified in the revised manuscript how the amplitude spectra and coherence analyses support our argument and we have also offered improved guidance as to how these measures should be interpreted.

Reviewer #2 raised similar concerns, pointing out that using linearity as a metric for mechanistic inference has limitations.

– The absence of learning (errors) would present as nonlinearity.

– The use of cognitive strategy could present as nonlinearity.

– It doesn't seem possible to parse the two mechanisms, especially as you might expect both an increase in error at the beginning of learning and possibly an intervening cognitive strategy at the beginning of learning.

We thank the reviewer for this point. The absence of learning would not in fact present as a nonlinearity. Participants could linearly translate x-axis target motion into x-axis hand motion (consistent with baseline; no learning) or linearly translate x-axis target motion into y-axis hand motion (perfect compensation). In both cases the behavior would be linear. Errors per se would not therefore present as a nonlinearity. It may be that errors might trigger a nonlinear correction mechanism, but they also could be corrected by a linear mechanism.

We agree that the use of a cognitive strategy would likely present as a nonlinearity and we think this is a likely explanation for the low coherence (high nonlinearity + noise) early in learning. Thus, it is possible for us to parse these two possibilities.

As addressed in response 2.1, we have revised the section in the Results titled “Participants Used Continuous Movements to Perform Manual Tracking” to clarify these points.

In lines 223-225: The logic behind why coupling across axes is not nonlinear behavior seems to be missing. It's quite unclear and currently difficult to understand. It would be very helpful to spell this out too.

Coupling between the target and the hand across axes is not nonlinear; it can be described in the time domain by a simple matrix transformation. Behavioral coupling across axes is thus linear in the same way that the imposed perturbations are linear, even though they couple hand motion and cursor motion across different axes. It may also be helpful to note that the Fourier transformation into the frequency domain is itself a linear operation, and a linear transformation of a linear transformation is also linear. Thus, coupling across axes, whether expressed in the time domain or the frequency domain, falls within linear behavior.

We acknowledge that the intuition behind this may not be straightforward and have provided more detailed explanations of our logic in lines 226-256 and 314-323.
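
The linearity of cross-axis coupling can be verified directly. The sketch below is illustrative only: it assumes a mirror reversal that swaps the x- and y-axes and a 90° rotation as example transformations, and the function names are hypothetical. It checks that each matrix obeys superposition and that applying the matrix before or after a Fourier transform gives the same result.

```python
import numpy as np

# Example 2-D transformations (assumed for illustration)
mirror = np.array([[0.0, 1.0],
                   [1.0, 0.0]])              # swaps x and y (reflection about the 45-degree line)
theta = np.pi / 2
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])   # 90-degree rotation

rng = np.random.default_rng(1)
a = rng.standard_normal((2, 500))            # two arbitrary 2-D trajectories (rows: x, y)
b = rng.standard_normal((2, 500))

def obeys_superposition(M, a, b, alpha=2.0, beta=-3.0):
    """True if M(alpha*a + beta*b) equals alpha*M(a) + beta*M(b)."""
    return bool(np.allclose(M @ (alpha * a + beta * b),
                            alpha * (M @ a) + beta * (M @ b)))

def commutes_with_fft(M, a):
    """True if transforming then taking the FFT equals taking the FFT then transforming."""
    return bool(np.allclose(np.fft.fft(M @ a, axis=1), M @ np.fft.fft(a, axis=1)))

for name, M in [("mirror reversal", mirror), ("rotation", rotation)]:
    print(name, obeys_superposition(M, a, b), commutes_with_fft(M, a))
```

Both checks pass for any fixed matrix, which is the sense in which mapping x-axis target motion onto y-axis hand motion remains a linear operation.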

Given these issues, a more grounded interpretation is that linearity simply represents real-time updating. If the relationship between the cursor and the hand is nonlinear, then updating is not in real time.

We are unsure exactly what the reviewer means by ‘real-time updating’. A delay between hand and cursor would not, in fact, render the behavior nonlinear (sinusoidal input would still be translated to sinusoidal output at the same frequency but at a different phase). If the reviewer is suggesting that the cursor and hand are updated in a continuous manner, i.e. not intermittently, then we agree—this is the essence of our argument: that participants counter the mirror-reversal by using a continuous controller rather than by an intermittently applied re-aiming strategy. We are happy to further address this point if the reviewer clarifies this comment.

Reviewer #3 raised a related point, noting that details about the frequency analysis are buried deep in the methods (around line 711), especially how the hand-target coherence (shown in Figure 4B) is calculated. It would be helpful to include some of these details in the main text. For example, it is currently very difficult to understand the relationship when moving from Figure 4A to 4B.

We thank the reviewer for pointing out our lack of clarity in describing the methods in the Results. Indeed, the issue of coherence is not easy to intuit and is unlikely to be familiar to the majority of readers. We have attempted to better convey the intuition behind how the amplitude spectra and coherence analyses work and how to interpret the plots in Figure 4 throughout the section titled “Participants Used Continuous Movements to Perform Manual Tracking”.

3. Statistical results

Reviewer #1 points out that many of the key statistical results were buried in the main text and some were incompletely reported. Can the authors provide a table (or set of tables) of the key statistics, including at least the value of the statistical test itself and the p-value, if not also estimates of confidence on the estimates?

We have included source data files, linked from Figures 3, 5, and 6, that provide these statistics.

Reviewer #3 also points out that outlier rejection based on some subjects who had greatly magnified or attenuated data seems like it might be biasing the data. Also, the outlier rejection criterion used (>1.5 IQR) seems very stringent. Furthermore, it appears there was no outlier rejection in the main experiment. It would be good to be consistent across experiments.

We thank the reviewer for pointing this out. We originally excluded a small subset of datapoints (25 out of 560) because they heavily biased group averages for the statistical analyses. However, we agree that it would be better to stay consistent across experiments, so we have reported our results without outlier rejection.

4. Experiment 2

The intention for experiment 2 is to see how much training on the point-to-point task influenced adaptation mechanisms during the tracking task. Yet, this experiment still included extensive exposure to the point-to-point task. Just not as much as in experiment 1. Given this, how can an inference be cleanly made about the influence of one task on the other? Wouldn't the clean way to ask this question be to just not run the point-to-point tracking task at all?

We thank the reviewer for pointing out this concern with Experiment 2. We agree that not including any point-to-point movements at all would have enabled us to more directly assess the influence of point-to-point training on tracking performance. However, the exposure to the point-to-point task was far from “extensive” in this group. Participants performed only 15 point-to-point reaches between the early and late tracking blocks, far fewer than the 450 reaches in the main experiment (we have made this explicit in lines 431-432). Our point-to-point reaching data from Experiment 1, as well as other studies (Fernandez-Ruiz et al. 2011, Taylor et al. 2014, Bond and Taylor 2015) [3,7,8], suggest that people learn to counter large perturbations of visual feedback over many dozens of point-to-point reaches, many more than the 15 used in Experiment 2. Furthermore, even if the 15 point-to-point reaches had some effect on tracking performance, this effect appears to be minimal, as there was no significant difference in the off-diagonal gains between tracking trials 12 and 13 (the trials before and after the 15 point-to-point reaches) for either group of participants (Figure 6C).

We also note that it is difficult to draw any firm conclusions from this experiment about the importance of point-to-point training. Participants do appear capable of improving their performance under the mirror reversal, though there appears to be less compensation than in the corresponding group in the main experiment. This could, however, be due to less overall exposure time to the perturbation. Despite this issue, we feel that the follow-up experiment provides a valuable replication of the results of the main experiment and is therefore worth including in the paper.

5. Frequency analysis

The authors state that "The failure to compensate at high frequencies ... is consistent with the observation that people who have learned to make point-to-point movements under mirror-reversed feedback are unable to generate appropriate rapid corrections to unexpected perturbations." This logic is not clear. How does the pattern of which movement frequencies show an effect, and which do not, lead to this conclusion?

We thank the reviewer for pointing this out and agree that the link between the cited studies and our results was unclear. Although we believe these results may be related, developing this link more rigorously is actually far from straightforward (as the reviewer suggested). We have therefore decided to omit this statement from the revised manuscript, which was not germane to the main claims of our paper.

6. Clarity of logic

Reviewer #3 states that it would be helpful if the authors could provide more background/context on their view of de novo learning and explain the relationship between de novo learning and the adapted controller model. For example, why does the lack of aftereffects under the mirror reversal imply that the participants did not counter this perturbation via adaptation and instead learned by forming a de novo controller (Line 199)? Is the reasoning based purely on behavioral observations, or is there a physiological basis for this assertion?

As discussed in responses 1.1 and 1.3–1.5, persistent aftereffects are a ubiquitous hallmark of adaptation. The lack of aftereffects (or at best minimal aftereffects) in the mirror-reversal group is thus inconsistent with this compensation being achieved through adaptation.

The proposed dissociation between adapting a controller and building a de novo controller is also supported by numerous previous studies, which have found both behavioral [1,9,10] and physiological [9,11,12] evidence for a dissociation between the learning mechanisms responsible for rotation versus mirror-reversal learning.

In addition, the same reviewer points out that, on lines 197-199, the reason for the lack of after-effects in the mean-squared error analysis is a little vague. It took a few tries to understand the reasoning. It would be good to spell this out a little more clearly.

We thank the reviewer for pointing out this lack of clarity. The reason that aftereffects were not evident in the mean-squared error analysis is that aftereffects are typically small in magnitude and introduce relatively little error to the total mean-squared error over the course of a tracking trial. The increase in mean-squared error during early learning can largely be explained by the fact that during this block, participants’ cursors deviated far from the area of target movement. Compared to these large deviations, the errors introduced by aftereffects are relatively small. We have revised lines 152-154 to better clarify this reasoning.

7. Learning in the visuomotor rotation (VMR) condition.

Reviewer #3 also notes that, surprisingly, there is no measurement of aiming during learning of the VMR. Several motor learning studies (several of which the authors cite) show that learning in VMR is a combination of implicit and explicit processes. It is understood that this is not possible in the continuous tracking task, but it can certainly be done in the point-to-point task. Is there a reason this was not done? Wouldn't this have further supported the authors' claim of an existing controller?

We thank the reviewer for this suggestion. We did not assess explicit re-aiming in our point-to-point task for several reasons. One reason is that this would have significantly lengthened an already quite long experiment. We preferred to use our participants’ time to obtain more extensive data in the tracking task, which was the primary focus of this study.

It is also not clear in what way collecting re-aiming data would have further supported our argument. The use of explicit strategies is well established and there seems to be little need to replicate this effect in our specific pool of participants. One possible benefit of obtaining an implicit learning measure in the point-to-point task is that it would have been possible to test whether this was correlated with the aftereffect estimated in the tracking task. We do note that the magnitude of aftereffects we observed in the tracking task was broadly consistent with experiments that have assessed implicit learning in point-to-point tasks.

A more comprehensive examination of the extent of generalization and shared mechanisms between the point-to-point and tracking conditions would certainly be interesting to examine in the future using the methods established in this paper (and building on prior studies along these lines [13]). Such questions are largely tangential to our main question in this paper, however, and we do not believe they are critical to our present conclusions.

References

1. Telgen, S., Parvin, D. and Diedrichsen, J. Mirror Reversal and Visual Rotation Are Learned and Consolidated via Separate Mechanisms: Recalibrating or Learning de novo? J. Neurosci. 34, 13768–13779 (2014).

2. Wilterson, S. A. and Taylor, J. A. Implicit visuomotor adaptation remains limited after several days of training. bioRxiv 711598 (2019) doi:10.1101/711598.

3. Fernández-Ruiz, J., Wong, W., Armstrong, I. T. and Flanagan, J. R. Relation between reaction time and reach errors during visuomotor adaptation. Behav. Brain Res. 219, 8–14 (2011).

4. McDougle, S. D. and Taylor, J. A. Dissociable cognitive strategies for sensorimotor learning. Nat. Commun. 10, 40 (2019).

5. Huberdeau, D. M., Krakauer, J. W. and Haith, A. M. Practice induces a qualitative change in the memory representation for visuomotor learning. J. Neurophysiol. 122, 1050–1059 (2019).

6. Roddey, J. C., Girish, B. and Miller, J. P. Assessing the performance of neural encoding models in the presence of noise. J. Comput. Neurosci. 8, 95–112 (2000).

7. Taylor, J. A., Krakauer, J. W. and Ivry, R. B. Explicit and implicit contributions to learning in a sensorimotor adaptation task. J. Neurosci. 34, 3023–3032 (2014).

8. Bond, K. M. and Taylor, J. A. Flexible explicit but rigid implicit learning in a visuomotor adaptation task. J. Neurophysiol. 113, 3836–3849 (2015).

9. Gutierrez-Garralda, J. M. et al. The effect of Parkinson’s disease and Huntington’s disease on human visuomotor learning. Eur. J. Neurosci. 38, 2933–2940 (2013).

10. Lillicrap, T. P. et al. Adapting to inversion of the visual field: a new twist on an old problem. Exp. Brain Res. 228, 327–339 (2013).

11. Schugens, M. M., Breitenstein, C., Ackermann, H. and Daum, I. Role of the striatum and the cerebellum in motor skill acquisition. Behav. Neurol. 11, 149–157 (1998).

12. Maschke, M., Gomez, C. M., Ebner, T. J. and Konczak, J. Hereditary cerebellar ataxia progressively impairs force adaptation during goal-directed arm movements. J. Neurophysiol. 91, 230–238 (2004).

13. Abeele, S. and Bock, O. Transfer of sensorimotor adaptation between different movement categories. Exp. Brain Res. 148, 128–132 (2003).

14. Miall, R. C. and Jackson, J. K. Adaptation to visual feedback delays in manual tracking: evidence against the Smith Predictor model of human visually guided action. Exp. Brain Res. 172, 77–84 (2006).

15. Yamagami, M., Peterson, L. N., Howell, D., Roth, E. and Burden, S. A. Effect of Handedness on Learned Controllers and Sensorimotor Noise During Trajectory-Tracking. bioRxiv 2020.08.01.232454 (2020) doi:10.1101/2020.08.01.232454.

https://doi.org/10.7554/eLife.62578.sa2

Article and author information

Author details

  1. Christopher S Yang

    Department of Neuroscience, Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Validation, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    christopher.yang@jhmi.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7645-3861
  2. Noah J Cowan

    Department of Mechanical Engineering, Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2502-3770
  3. Adrian M Haith

    Department of Neurology, Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5658-8654

Funding

Link Foundation (Modeling, Simulation and Training Fellowship)

  • Christopher S Yang

National Institutes of Health (5T32NS091018-17)

  • Christopher S Yang

National Institutes of Health (5T32NS091018-18)

  • Christopher S Yang

National Science Foundation (1825489)

  • Noah J Cowan

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Amanda Zimmet, John Krakauer, Amy Bastian, and Christopher Fetsch for immensely helpful discussions. This material is based upon work supported by the National Science Foundation under Grant No. 1825489. CSY was supported by NIH 5T32NS091018-17, 5T32NS091018-18, and the Link Foundation Modeling, Simulation and Training Fellowship.

Ethics

Human subjects: Informed consent and consent to publish were obtained from all participants included in this work. All methods were approved by the Johns Hopkins School of Medicine Institutional Review Board under NA_00048918.

Senior Editor

  1. Tamar R Makin, University College London, United Kingdom

Reviewing Editor

  1. Timothy Verstynen, Carnegie Mellon University, United States

Reviewer

  1. Timothy Verstynen, Carnegie Mellon University, United States

Publication history

  1. Received: August 29, 2020
  2. Accepted: June 22, 2021
  3. Accepted Manuscript published: June 25, 2021 (version 1)
  4. Version of Record published: July 8, 2021 (version 2)

Copyright

© 2021, Yang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

