Action-sequence learning, habits and automaticity in obsessive-compulsive disorder

Paula Banca; Maria Herrojo Ruiz; Miguel Fernando Gonzalez-Zalba; Marjan Biria; Aleya A. Marzuki; Thomas Piercy; Akeem Sule; Naomi Anne Fineberg; Trevor William Robbins

doi:10.7554/eLife.87346.1

Introduction

Considerable evidence has supported the concept of imbalanced cortico-striatal pathways mediating compulsive behavior in obsessive-compulsive disorder (OCD). This imbalance has been suggested to reflect weaker goal-directed control and excessive habitual control (Gillan et al., 2016). Dysfunctional goal-directed control in OCD has been strongly supported both behaviorally (Gillan et al., 2011; Vaghi et al., 2018) and in neural terms (Gillan et al., 2015a). However, enhanced (and potentially maladaptive) habit formation has hitherto mainly been inferred from the absence of goal-directed control, although there is recent evidence of greater self-reported habitual tendencies in OCD measured using the Self-Report Habit Index Scale (Ferreira et al., 2017). Problems with this “zero-sum” hypothesis (Robbins and Costa, 2017) (i.e., diminished goal-directed control, thus enhanced habitual control) have been underlined by recent findings linking stimulus-response strength (Zwosta et al., 2018) and goal devaluation (Gillan et al., 2015b) exclusively to a dysfunctional goal-directed neural system. There is thus a need to focus specifically on the habit component of the associative dual-process (i.e., goal/habit) model of behavior and test more directly the hypothesis of enhanced habit formation in OCD.

We have recently postulated that extensive training of sequential actions would more rapidly engage the ‘habit system’ as compared to single-action instrumental learning tasks typically used in the laboratory (Robbins et al., 2019). The rationale is that, in sequential actions (as occur during skilled routines), the performance of one action facilitates the next (via kinesthetic feedback, in a stimulus-response/automatic manner) and independent motor acts become integrated into a unified and coordinated sequence of actions (or “chunks”) (Graybiel, 1998; Sakai et al., 2003), which can be implemented more quickly and efficiently. With practice and training, the selection and execution of individual component actions become more efficient and stereotypical, require less cognitive effort, and are performed with little variation (Diedrichsen and Kornysheva, 2015). Such features relate to the concept of automaticity, which captures many of the shared elements between habits and skills (Ashby et al., 2010). At a neural level, automaticity is associated with a shift in control from the anterior/associative (goal-directed) to the posterior/sensorimotor (habitual) striatal regions (Ashby et al., 2010; Graybiel and Grafton, 2015; Kupferschmidt et al., 2017), accompanied by a disengagement of cognitive control hubs in frontal and cingulate cortices (Bassett et al., 2015). In fact, within the skill learning literature, this progressive shift to posterior striatum has been linked to the gradually attained asymptotic performance of the skill (Bassett et al., 2015; Doyon et al., 2018, 2015; Lehericy et al., 2005). Hence chunked action sequences provide an opportunity to target the brain’s goal-habit transition and study the relationships between automaticity, skills and habits (Dezfouli et al., 2014; Graybiel and Grafton, 2015; Robbins and Costa, 2017).
This approach is relevant for OCD research as it mimics the sequences of motor events and routines observed in typical compulsions, which are often performed in a “just right” manner (Hellriegel et al., 2017), akin to skill learning. Chunked action sequences also enable investigation of the relationships between hypothesized procedural learning deficits in OCD (Rauch et al., 1997) and automaticity.

Following this reasoning, we developed a smartphone motor sequence application with attractive sensory features in a game-like setting, to investigate automaticity and measure habit/skill formation within a naturalistic setting (at home). Our Motor Sequencing App is a self-instructed and self-paced app that enables subjects to learn and practice two sequences of finger movements, composed of chords and single presses (like a piano-based app). The task was specifically designed to focus on the positive features of habits, as suggested by Watson et al. (2022), and satisfies central criteria that define habits, according to a recent proposal by Balleine and Dezfouli (2019): rapid execution, invariant response topography and action chunking. We also aimed to investigate within the same experiment three facets of automaticity which, according to Haith and Krakauer (2018), have rarely been measured together: habit, skill and cognitive load. Although there is no consensus on how exactly skills and habits interact (Robbins and Costa, 2017), it is generally agreed that both lead to automaticity with sufficient practice (Graybiel and Grafton, 2015) and that the autonomous nature of habits and the fluid proficiency of skills engage the same sensorimotor cortical-striatal ‘loops’ (the so-called ‘habit circuitry’) (Ashby et al., 2010; Graybiel and Grafton, 2015). By focusing more on the automaticity of the response per se (as reflecting the speed and stereotypy of over-trained movement sequences), rather than on the autonomous nature of the behavior (an action that continues after a state change, e.g. devaluation of the goal), we do not solely rely on the devaluation criterion used in previous studies of compulsive behavior.
This is important because outcome devaluation insensitivity as a test of habit in humans is controversial (Watson et al., 2022) and may indeed be a more sensitive indicator of failures of goal-directed control rather than of habitual control per se (Balleine and Dezfouli, 2019; Robbins et al., 2019; Robbins and Costa, 2017).

While designing our app, we additionally took into consideration previous findings which defined training frequency, context stability, and reward contingencies as important features for increasing habit strength (Wood and Rünger, 2016). We used a 1-month training period to enable the effective consolidation required for habitual action control or skill retention to occur. This acknowledged previous studies showing that practice alone is insufficient for habit development, as it also requires off-line consolidation computations over longer periods of time (de Wit et al., 2018) and sleep (Nusbaum et al., 2018; Walker et al., 2003). Since previous research suggests that the speed at which habits are acquired depends on how predictable the reinforcers are (Bouton, 2021), we also employed two different schedules of reinforcement (continuous versus variable [probabilistic]) to assess their impact on habit formation amongst healthy volunteers (HV) and patients with OCD.

Outline

In this article, we applied, for the first time, a novel app-based behavioral training (experiment 1) to a sample of patients with OCD. We compared 32 patients (19 females) and 33 healthy controls (19 females), matched for age, gender, IQ and years of education, on measures of motivation and app engagement (see Methods for participants’ demographics and clinical characteristics). We also assessed to what extent performing such repetitive actions over one month impacted OCD symptomatology. In an initial phase (30 days), two action sequences were trained daily to produce habits/automatic actions (experiment 1). We continuously collected data online, in real time, thus enabling measurements of procedural learning as well as automaticity development. In a second phase, we administered two follow-up behavioral tasks (experiments 2 and 3) which allowed us to address two important questions relevant to the habit theory of OCD. The first research question explored the possibility that repeated performance of acquired motor sequences may develop implicit rewarding properties, hence gaining value, and leading to compulsive-like behaviors (experiment 2). The hypothesis is that repeated performance of the motor sequences, initially driven by the goal of becoming proficient, may eventually become driven by their own implicit reward linked to their proprioceptive and kinesthetic feedback (for example, providing anxiety relief as well as skilled performance). Second, we examined whether a functional impairment in the mechanism of dynamic arbitration between automatic, habit-like actions and goal-directed behavior underlies OCD (experiment 3). This hypothesis has been previously proposed (Gruner et al., 2016) but, to our knowledge, has not been rigorously tested so far.
Finally, we administered a comprehensive set of self-reported clinical questionnaires, including a recently developed questionnaire (Ersche et al., 2017) on aspects of habits such as routine and automatic tendencies, to test: 1) whether patients with OCD exhibit more self-reported habits; 2) whether greater habitual tendencies expressed subjectively are predictive of enhanced procedural learning, automaticity development and an (in)ability to adjust to changing circumstances; and 3) whether habit reversal via app training produces any therapeutic benefit or has any subjective sequelae in OCD.

Hypothesis

Following current theories related to implicit learning deficiency (Deckersbach et al., 2002; Kathmann et al., 2005; Rauch et al., 1997) and visuospatial and fine-motor skill difficulties in patients with OCD (Bloch et al., 2011), we anticipated that patients would display poorer procedural learning as compared to healthy volunteers. However, once learning is established, we predicted that OCD patients would attain automaticity faster than healthy volunteers, assuming a hypothetically greater tendency in OCD to form habitual/automatic actions (Gillan et al., 2016, 2014). We also hypothesized that the acquisition of learning and automaticity would differ between the two action sequences based on their associated reward schedule (continuous versus variable) and reward valence (positive or negative). This is based on previous research suggesting that habits may form more quickly when reinforcers are well-predicted and more slowly when uncertain (Bouton, 2021). We additionally examined differential effects of positive and negative feedback changes on performance to build on previous work demonstrating enhanced sensitivity to negative feedback in patients with OCD (Apergis-Schoute et al., 2023; Becker et al., 2014; Kanen et al., 2019). Finally, we expected that OCD patients would generally report greater habits, as well as show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence. We also anticipated that patients would exhibit greater resistance to returning to a goal state.

Results

Self-reported habit tendencies

Participants completed self-reported questionnaires measuring various psychological constructs (see Methods). Highly relevant for the current topic is the Creature of Habit Scale (COHS) (Ersche et al., 2017), recently developed to measure individual differences in habitual routines and automatic tendencies in everyday life. This 27-item questionnaire incorporates two aspects of the general concept of habits: routine behavior and automatic responses. As compared to healthy controls, OCD patients reported significantly higher habitual tendencies both in the routine (t = -2.79, p = 0.01) and the automaticity (t = -3.15, p < 0.001) subscales.

Phase A: Experiment 1

Motor Sequence Acquisition using the App

The task was a self-instructed and self-paced smartphone application (app) downloaded to participants’ iPhones. It consisted of a motor practice program that participants committed to pursue daily, for a period of one month. An exhaustive description of the method has been previously published (Banca et al., 2020) but a succinct description can be found below, in Figure 1 and in the following video https://www.youtube.com/watch?v=XSYrBzD7ZpI.

Figure 1. Motor Sequencing App.
a) A trial starts with a static image depicting the abstract picture that identifies the sequence to be performed (or ‘played’) as well as the 4 keys that will need to be tapped. Participants use their dominant hand to play the required keys: excluding the thumb, the leftmost finger corresponds to the first circle and the rightmost finger corresponds to the last circle. b) Screenshot examples of the task design: (1) sequence selection panel, in which each sequence is identified by an abstract picture; (2) panel exemplifying the visual cues that initially guide the sequence learning; (3) panel exemplifying the removal of the visual cues, when sequence learning is guided only by auditory cues. c) Example of a sequence performed with the right hand: six moves in length, where each move can comprise multiple finger presses (2 or 3 simultaneous) or a single finger press. Each sequence comprises 3 single-press moves, 2 two-finger moves, and 1 three-finger move. d) Short description of the daily practice schedule. Each day, participants are required to play *a minimum* of 2 practices per sequence. Each practice comprises 20 successful trials. Participants could play more if they wished and the order of the training practices was self-determined.

The training consisted of practicing two sequences of finger movements, composed of chords (two or three simultaneous finger presses) and single presses (one finger only). Each sequence comprised six moves and was performed using four fingers of the dominant hand (index, middle, ring and little finger). Participants received feedback on each sequence performance (trial). Successful trials (to which we later refer as sequence trial number, n) were followed by a positive ring tone and the unsuccessful ones by a negative ring tone. Every time a mistake occurred (irrespective of which move in the sequence it occurred), participants were prompted to restart the trial. Instructions were to respond swiftly and accurately. Participants were required to keep their fingers very close to the keys to minimize movement amplitude variation and to facilitate fast performance. To promote sequence memorization, exteroceptive cues (visual and auditory) were given in the first two days of training. These were then gradually removed (first visual and then auditory) to enable motoric independence and automaticity.

Each sequence, identified by a specific abstract image, was associated with a particular reward schedule. Points were calculated as a function of the time taken to complete a sequence trial. Accordingly, performance time was the instructed task-related dimension (i.e. associated with reward). In the continuous reward schedule, points were received for every successful trial whereas in the variable reward schedule, points were shown only on 37% of the trials. The rationale for having two distinct reward schedules was to assess their possible dissociable effect on the participants’ development of automatic actions. For each rewarded trial, participants could see the points they achieved on that trial. To promote motivation, the total points achieved in each daily training session were also shown, so participants could see how much they improved across days. The permanent accessibility of the app (given that most people carry their mobile phones everywhere) facilitated training frequency and enabled context stability.

Practice Schedule

All participants were presented with a calendar schedule and were asked to practice both sequences daily. They were instructed to practice as many times as they wished, whenever they wanted during the day and in whichever sequence order they preferred. However, a minimum of 2 practices (P) per sequence was required every day; each practice comprised 20 successful sequence trials. If participants missed a day of practice, they were required to catch up on the training the following day, i.e. to complete the minimum training for the current and previous day.

A minimum of 30 days of training was required and all data were anonymously collected in real time, through an online server. On the 21st day of practice, the rewards were removed (extinction) to ensure that the action sequences were more dependent on proprioceptive and kinesthetic, rather than on external, feedback.

Training engagement

Participants reliably committed to their regular training schedule, consistently practicing both sequences every day. Unexpectedly, OCD patients completed significantly more practices as compared with HV (p = 0.005) (Figure 2a). Descriptive statistics are as follows: HV: M_P = 122, IQR (interquartile range) = 7; OCD: M_P = 130, IQR = 14. Note that main values are represented as medians, and errors are reported as interquartile ranges unless otherwise stated, due to the non-Gaussian distribution of the datasets. When inspecting the weekly training pattern, we observed that the patients’ commitment to training was consistently higher across the entire training period (Figure 2b). Within the day, the approximately bimodal distribution observed for HV in Figure 2c, which depicts a tendency to practice mostly during early mornings (∼8:00) and evenings (∼19:00), was not followed by the patients, who had a somewhat more random pattern of daily practice, including engagement during the night.

Learning

Learning was evaluated by the decrement in sequence duration throughout training. To follow the nomenclature of the motor control literature, we refer to sequence duration as movement time (MT), which is defined as

MT = t₆ − t₁

where t₆ and t₁ are the times of the last (6th) and first key presses, respectively.

For each participant and sequence reward type (continuous and variable), we measured MT of a successful trial, as a function of the sequence trial number, n, across the whole training. Across trials, MT decreased exponentially (Figure 3a). The decrease in MT has been widely used to quantify learning in previous research (Crossman, 1959). A single exponential is viewed as the most statistically robust function to model such a decrease (Heathcote et al., 2000). Accordingly, each participant’s learning profile was modeled as follows:

MT(n) = MT₀ + MT_L · exp(−n / n_r)

where n_r is the learning rate (measured in number of trials), which governs the rate of exponential decay. Parameter MT₀ is the movement time at asymptote (at the end of the training). Last, MT_L is the speed-up achieved over the course of the training (referred to as the amount of learning) (Figure 3a). The larger the value of MT_L, the bigger the decline in movement time and thus the larger the improvement in motor learning.

Figure 3. Learning.
**Upper panel:** Model fitting procedure conducted for the continuous reward sequence. **Lower panel:** Model fitting procedure conducted for the variable reward sequence. a) Individual plots exemplifying the time-course of MT as training progresses (lighter color) as well as the exponential decay fit modelling the learning profile of a single participant (darker color). Left panels depict a HV and right panels a patient with OCD. b) Group comparison resulting from all individual exponential decays modelling the learning profile of each participant. Significant group difference on the amount of learning, MT_L, in both reward schedule conditions (continuous: p = 0.009; variable: p < 0.001). Solid lines: median (M); Transparent regions: interquartile range (*IQR*); Purple: healthy volunteers (HV); Blue: patients with obsessive-compulsive disorder (OCD).

The individual fitting approach we used has the advantage of handling the different number of trials executed by each participant by modeling their behavior to a consolidated maximum value of n, n_max = 1200. We used a moving average of 20 trials to mitigate any effect of outlier trials. This analysis was conducted separately for continuous and variable reward schedules.
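As an illustration only, the smoothing and individual fitting steps can be sketched in Python. The sketch assumes the single-exponential form MT(n) = MT₀ + MT_L · exp(−n / n_r) with the parameters named in the text. The fitting routine shown here (a grid search over n_r combined with ordinary least squares for MT₀ and MT_L) is a stand-in chosen for simplicity, not the procedure reported in Methods, and the function names and the rate grid are hypothetical.

```python
import math


def moving_average(values, window=20):
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1 or len(values) < window:
        raise ValueError("need at least `window` values")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


def fit_exponential(trial_numbers, mt_values, rate_grid):
    """Fit MT(n) = mt0 + mt_l * exp(-n / n_r).

    For a fixed n_r the model is linear in (mt0, mt_l), so those two
    parameters are solved in closed form by least squares; n_r is then
    chosen from `rate_grid` (an assumed, user-supplied grid) by
    minimizing the sum of squared errors.
    """
    count = len(trial_numbers)
    mean_y = sum(mt_values) / count
    best = None
    for n_r in rate_grid:
        x = [math.exp(-n / n_r) for n in trial_numbers]
        mean_x = sum(x) / count
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate regressor, skip this candidate
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, mt_values))
        mt_l = sxy / sxx
        mt0 = mean_y - mt_l * mean_x
        sse = sum((mt0 + mt_l * xi - yi) ** 2 for xi, yi in zip(x, mt_values))
        if best is None or sse < best[0]:
            best = (sse, mt0, mt_l, n_r)
    if best is None:
        raise ValueError("no usable candidate in rate_grid")
    _, mt0, mt_l, n_r = best
    return mt0, mt_l, n_r
```

In such a sketch, each participant's MT series would first be passed through `moving_average` and the smoothed series then given to `fit_exponential`, once per reward schedule. A dedicated nonlinear least-squares routine would normally replace the grid search.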

To statistically assess between-group differences in learning behavior, we pooled the relevant individual model parameters, MT_L and n_r, and conducted a Kruskal–Wallis H test to assess the effect of group (HV and OCD), reward type (continuous and variable) and their interaction on each parameter (Figure 3b).
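For readers unfamiliar with the statistic, a minimal Python sketch of the Kruskal–Wallis H computation is given below. It covers only a one-way comparison between groups (for example, HV versus OCD on a single fitted parameter) and does not reproduce the reward or interaction tests reported here. The function names are hypothetical, and the p-value helper assumes exactly two groups (one degree of freedom).

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties get their average rank) and tie-block sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [value for group in groups for value in group]
    total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (total * (total + 1)) * weighted - 3.0 * (total + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (total ** 3 - total)
    if correction == 0.0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_two_groups(h):
    """Chi-square upper-tail probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2.0))
```

With two groups, H is referred to a chi-square distribution with one degree of freedom, which is why the closed-form `erfc` expression suffices in this sketch.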

There was a significant effect of group on the amount of learning parameter (MT_L: H = 16.5, p < 0.001), but no reward (p = 0.06) or interaction (p = 0.34) effects (Figure 3c). Descriptive statistics: HV: M MT_L = 3.1 s, IQR = 1.2 s and OCD: M MT_L = 3.9 s, IQR = 2.3 s for the continuous reward sequence; HV: M MT_L = 2.3 s, IQR = 1.2 s and OCD: M MT_L = 3.6 s, IQR = 2.5 s for the variable reward sequence.

Regarding the learning rate (n_r) parameter, we found no significant main effects of group (p = 0.79) or reward (p = 0.47), and no interaction effect (p = 0.46). Descriptive statistics for the sequence trials needed to reach asymptote: HV: M n_r = 176, IQR = 99 and OCD: M n_r = 200, IQR = 114 for the continuous reward sequence; HV: M n_r = 182, IQR = 123 and OCD: M n_r = 162, IQR = 141 for the variable reward sequence. These non-significant effects on the learning rate were further assessed with Bayes Factors (BF) for factorial designs (see Methods). This approach estimates the ratio between the full model, including main and interaction effects, and a restricted model that excludes a specific effect. The evidence for the lack of a main effect of group was associated with a BF of 0.38, which is anecdotal evidence. We additionally obtained moderate evidence supporting the absence of a main effect of reward or a reward x group interaction (BF = 0.16 and 0.17, respectively).

Finally, there were no significant main or interaction effects on the asymptote parameter, MT₀ (group effect: p = 0.17; reward effect: p = 0.65; interaction effect: p = 0.64). BF analysis provided anecdotal evidence that there is no main effect of group (BF = 0.53), and moderate evidence that the reward and reward x group interaction factors do not modulate the performance time (BF = 0.12 and 0.17, respectively).

These results suggest that OCD patients have no learning deficits. Despite performing the action sequences significantly slower at the beginning of the training, patients demonstrated similar learning rates as compared to HV. Furthermore, both groups exhibited similar motor durations at asymptote which, combined with the previous conclusion, indicates that OCD patients improved their motor learning more than controls, but to the same asymptote.

Automaticity

To assess automaticity, the ability to perform actions with low-level cognitive engagement, we examined the decline over time in an index of the trial-to-trial consistency of inter-keystroke interval (IKI) patterns. We mathematically defined this IKI consistency index, C, as the sum of the absolute differences between the IKIs of one sequence and those of the previous one

C(n) = Σ_{k=1}^{5} |IKI_k(n) − IKI_k(n − 1)|

where n is the sequence trial number and k indexes the inter-keystroke intervals (five for a six-move sequence) (Figure 4a). In other words, C quantifies how consistent/reproducible the press pattern is from trial to trial, with smaller values indicating a more reproducible pattern. The assumption here is that the more reproducible the sequences are over time, the more automatic the person’s motor performance becomes.
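The consistency index can be illustrated with a short Python sketch. It assumes that each successful trial is stored as a list of six key-press timestamps in seconds; that storage format and the function names are hypothetical choices made for the example.

```python
def inter_keystroke_intervals(press_times):
    """Differences between consecutive key-press times (6 presses -> 5 IKIs)."""
    return [later - earlier for earlier, later in zip(press_times, press_times[1:])]


def iki_consistency(previous_press_times, current_press_times):
    """C(n): sum over k of |IKI_k(n) - IKI_k(n - 1)| for two consecutive trials."""
    previous_iki = inter_keystroke_intervals(previous_press_times)
    current_iki = inter_keystroke_intervals(current_press_times)
    if len(previous_iki) != len(current_iki):
        raise ValueError("both trials must contain the same number of presses")
    return sum(abs(c - p) for p, c in zip(previous_iki, current_iki))
```

Two trials with identical timing patterns give C = 0 regardless of when each trial started, because only the intervals between presses enter the sum.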

Figure 4. Automaticity.
a) We mathematically defined trial-to-trial inter-keystroke-interval consistency (IKI consistency), denoted as C (in seconds), as the sum of the absolute values of the time lapses between finger presses across consecutive sequences. The variable n represents the sequence trial and k denotes the IKI. We evaluated automaticity by analyzing the decline in C over time, as it approached asymptotic levels. b) Group comparison resulting from all individual exponential decays modelling the automaticity profile (drop in C) of each participant. A significant group effect was found on the amount of automaticity gain, C_L (Kruskal–Wallis H = 11.1, p < 0.001) and on the automaticity constant, n_C (Kruskal–Wallis H = 4.61, p < 0.03). Solid and dashed lines are median values (M). Light purple: healthy volunteers (HV); Dark purple: patients with obsessive-compulsive disorder (OCD); Solid lines: continuous reward condition; Dashed lines: variable reward condition.

For each participant and sequence reward type (continuous and variable), automaticity was assessed based on the decrement in C, as a function of n, across the entire training period. Since C decreased in an exponential fashion, we fitted the C data with an exponential decay function (following the same reasoning and procedure as for MT) to model the automaticity profile of each participant,

C(n) = C₀ + C_L · exp(−n / n_C)

where n_C is the automaticity rate (measured in number of trials), C₀ is the sequence consistency at asymptote (by the end of the training) and C_L is the change in automaticity over the course of the training (which we refer to as the amount of automation gain). The model fitting procedure was conducted separately for continuous and variable reward schedules.

A Kruskal–Wallis H test was then conducted to assess the effect of group (OCD and HV) and reward type (continuous and variable) on each parameter resulting from the individual exponential fits (C_L, n_C and C₀).

There was a significant effect of group on the amount of automation gain (C_L: H = 11.1, p < 0.001) but no reward (p = 0.12) or interaction effects (p = 0.5) (Figure 4b). Descriptive statistics are as follows: HV: M C_L = 1.4 s, IQR = 0.7 s and OCD: M C_L = 1.9 s, IQR = 1.0 s for the continuous reward sequence; HV: M C_L = 1.1 s, IQR = 0.8 s and OCD: M C_L = 1.5 s, IQR = 1.1 s for the variable reward sequence.

There was also a significant group effect on the automaticity rate (n_C: H = 4.61, p < 0.03) but no reward (p = 0.42) or interaction (p = 0.12) effects. Descriptive statistics for the sequence trials needed to reach asymptote: HV: M n_C = 142, IQR = 122 and OCD: M n_C = 198, IQR = 162 for the continuous reward sequence; HV: M n_C = 161, IQR = 104 and OCD: M n_C = 191, IQR = 138 for the variable reward sequence.

At asymptote, no group (p = 0.1), reward (p = 0.9) or interaction (p = 0.45) effects were found. We found anecdotal evidence supporting that the group factor did not modulate the results (BF = 0.65). In addition, there was moderate evidence in favor of no main effects of reward or interaction (BF = 0.12 and 0.18 respectively).

Of note is the median difference in consecutive sequences achieved at asymptote: HV: M C₀ = 287 ms, IQR = 127 ms, OCD: M C₀ = 301 ms, IQR = 186 ms for the continuous reward sequence and HV: M C₀ = 288 ms, IQR = 110 ms, OCD: M C₀ = 300 ms, IQR = 114 ms for the variable reward sequence. These values of C at asymptote are generally shorter than the normal reaction time for motor performance (Kosinski, 2008), reinforcing the idea that automaticity was reached by the end of the training.

In conclusion, patients were significantly slower, as compared to HV, at achieving a similar level of automaticity, in both types of reward sequences: they started slower, with a more irregular pace and had a slower progression rate to automaticity.

Sensitivity of sequence duration to reward

Our next goal was to investigate the sensitivity of performance improvements over time in our participant groups to changes in scores, either an increase or a decrease. To do this, we quantified the trial-by-trial behavioral changes in response to a decrease or increase in reward from the previous trial using the sequence duration (in ms), labeled as MT (movement time). Note that in our experimental design, MT was negatively correlated with the scores received. Following Pekny et al. (2015), we represented the change from trial n to n + 1 in MT simply as:

ΔMT(n + 1) = MT(n + 1) − MT(n)

Reward (R) change at trial n was computed as:

ΔR(n) = R(n) − R(n − 1)

Next, we analyzed separately the ΔMT values that followed an increase in reward from trial n − 1 to n, ΔR+, denoting a positive sign in ΔR, and those that followed a drop in reward, ΔR−, indicating a negative sign in ΔR. Following Pekny et al. (2015), we estimated for each participant the conditional probability distributions p(ΔT|ΔR+) and p(ΔT|ΔR−) (where T denotes a behavioral measure, MT in this section or IKI consistency in the next section) by fitting a Gaussian distribution to the histogram of each data sample (Figure S1). The standard deviation (σ) and the center (μ) of the resulting distributions were used for statistical analyses (Figure S1). Similar analyses were carried out on the C index (defined above), which already reflected changes between consecutive trials.
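The conditioning step can be sketched in Python as follows. The sketch assumes ΔR(n) = R(n) − R(n − 1) and ΔMT(n + 1) = MT(n + 1) − MT(n), with one MT and one score per successful trial. It summarizes each conditional sample by its mean and standard deviation, which is the maximum-likelihood Gaussian and a simplification of fitting a Gaussian to the histogram; the function names are hypothetical.

```python
import statistics


def conditional_mt_changes(mt, reward):
    """Split dMT(n+1) = MT(n+1) - MT(n) by the sign of dR(n) = R(n) - R(n-1).

    Returns (changes_after_increase, changes_after_decrease); trials with
    no change in reward are left out of both samples.
    """
    if len(mt) != len(reward):
        raise ValueError("mt and reward must have the same length")
    after_increase, after_decrease = [], []
    for n in range(1, len(mt) - 1):
        delta_r = reward[n] - reward[n - 1]
        delta_mt = mt[n + 1] - mt[n]
        if delta_r > 0:
            after_increase.append(delta_mt)
        elif delta_r < 0:
            after_decrease.append(delta_mt)
    return after_increase, after_decrease


def gaussian_summary(samples):
    """Center (mu) and spread (sigma) of the maximum-likelihood Gaussian."""
    if not samples:
        raise ValueError("empty sample")
    return statistics.fmean(samples), statistics.pstdev(samples)
```

Applied per participant, and per bin of trials where relevant, the two summaries give the μ and σ values that enter the group-level statistics.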

As a general result, we expected that healthy participants would introduce larger behavioral changes (more pronounced reduction in MT, more negative ΔMT) following a decrease in scores, as shown previously (Chen et al., 2017; van Mastrigt et al., 2020). Accordingly, we predicted that the p(ΔT|ΔR−) distribution would be centered at more negative values than p(ΔT|ΔR+), corresponding to greater speeding following negative reward changes. Given previous suggestions of enhanced sensitivity to negative feedback in patients with OCD (Apergis-Schoute et al., 2023; Becker et al., 2014; Kanen et al., 2019), we predicted that the OCD group, as compared to the control group, would demonstrate greater trial-to-trial changes in movement time and a more negative center of the p(ΔT|ΔR−) distribution. Additionally, we examined whether OCD participants would exhibit more irregular changes to ΔR− and ΔR+ values, as reflected in a larger spread of the p(ΔT|ΔR+) and p(ΔT|ΔR−) distributions, compared to the control group.

The conditional probability distributions were separately fitted to subsamples of the data across continuous reward practices, splitting the total number of correct sequences into four bins. This analysis allowed us to assess changes in reward sensitivity and behavioral changes across bins of sequences (bins 1-4 by partitioning the total number of sequences, from the whole training, into four). We focused the analysis on the continuous reward schedule for two reasons: 1) changes in scores on this schedule are more obvious to the participants and 2) a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials.

We observed that participants speeded up their sequence duration more (negative changes in trial-wise MT) following a drop in scores, as expected (Figure 5a). Conducting a 3-way ANOVA with reward change (increase, decrease) and bin (1–4) as within-subject factors, and group as between-subject factor, we found a significant main effect of reward (p = 1.53 × 10⁻⁰⁷), indicating that both groups reduced MT differently as a function of the change in reward. There was also a significant main effect of bin (p = 1.30 × 10⁻¹⁰), such that participants speeded up their sequence performance over practices. The main effects are illustrated in Figure 5b. In addition, there was a significant interaction between reward and bin in predicting the trial-to-trial changes in movement time (p = 4.78 × 10⁻⁰⁹). This outcome suggested that the improvement in MT over sequences depended on whether the reward increased or decreased from the previous trial. To explore this interaction effect further, we conducted dependent-sample pairwise t-tests on MT, after collapsing the data across groups. In each sequence bin, participants speeded up MT more following a drop in scores than following an increment, as expected (corrected p_FDR = 2 × 10⁻¹⁶). On the other hand, assessing the effect of bins separately for each level of reward, we observed that the large sensitivity of MT changes to reward decrements was attenuated over bins of practices (corrected p_FDR = 2 × 10⁻¹⁶; dependent-sample t-tests between consecutive pairs of bins). By contrast, the sensitivity of MT changes to reward increments, which was consistently smaller, did not change over bins (p_FDR = 0.88). Overall, these findings indicate that both OCD and HV participants exhibited an acceleration in sequence performance following a decrease in scores (main effect). However, this sensitivity to score decrements was reduced as participants approached automaticity through repeated practice.
Notably, the increased sensitivity to reward decrements relative to increments persisted throughout the practice sessions in both groups.
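The FDR-corrected p-values reported for these pairwise comparisons can be illustrated with a short sketch of a false discovery rate adjustment. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg step-up procedure shown here is an assumption, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A comparison is then declared significant when its adjusted p-value falls below the chosen FDR level (for example 0.05).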

Sensitivity of movement time to changes in reward in the continuous reward schedule.
a) Mean change in movement time (MT, ms) from trial *n* to *n+1* following an increment (*ΔR+*, in purple) or decrement (*ΔR-*, in green) in scores at *n*. The dots represent mean MT changes (error bars denote SEM) in each bin of correctly performed sequences, after partitioning all correct sequences into four subsets, and separately for OCD (dark colors) and HV (light colors). b) Both groups of participants speeded up their sequence performance more following a drop in scores (main effect of reward, p = 1.53 × 10⁻⁰⁷; 2 × 4 reward × bin ANOVA); yet this acceleration was reduced over the course of practiced sequences (main bin effect, p = 1.40 × 10⁻¹⁰). c) Same as a) but for the spread (std, ms) of the MT change distribution. d) Illustration of the main effect of group on std (p = 2.56 × 10⁻⁰⁷). Each bin depicted in the plots (x-axis) contains 140 correct sequences on average (further details in ***Supplementary Results: Sample size for the reward sensitivity analysis***).

Assessment of the std (σ) of the Gaussian distributions p(ΔT|ΔR−) and p(ΔT|ΔR+) in the continuous reward condition (Figure 5c) with a similar 3-way ANOVA revealed a significant main effect of group (p = 2.56 × 10⁻⁰⁷). As shown in Figure 5d, the std (σ) of the distribution of trial-to-trial MT changes was smaller in HV than in OCD. In addition, σ changed significantly over bins of sequences, independently of the group or reward factors (main effect of bin, p = 2 × 10⁻¹⁶). This outcome indicates that, over the course of training, both groups introduced less variable changes in MT in response to both reward increments and decrements, in line with improvements in skill learning (Wolpert et al., 2011). No additional main or interaction effects were found. Control analyses demonstrated that the group, reward or bin effects were not confounded by differences in the size of the subsamples used for the Gaussian distribution fits (Supplementary Results).

Sensitivity of IKI consistency (C) to reward

To further explore the potential impact of reward changes on the previously reported group effects on automaticity, we quantified the trial-by-trial behavioral changes in IKI consistency (represented by C) in response to changes in reward scores relative to the previous trial. Note that a smaller C indicates a more reproducible IKI pattern from trial to trial. During continuous reward practices, both patients and healthy controls exhibited an increased consistency of IKI patterns from trial to trial across bins of correct sequences (decreased C, equation [3], Figure 6a and 6b; main effect of bin on the center of the Gaussian distribution, p = 2 × 10⁻¹⁶; 3-way ANOVA). Performance in OCD and HV, however, differed with regard to how reproducible their timing patterns were (main effect of group, p = 2.6 × 10⁻⁰⁵). Moreover, the IKI consistency improved more (smaller C) following reward increments than after decrements, as shown in Figure 6a and 6b (main reward effect, p = 0.002). No significant interaction effects between factors were found. Accordingly, although OCD participants exhibited an attenuated IKI consistency in their performance relative to HV, the main effects of reward and bins of sequences were independent of the group.

Regarding the spread of the p(ΔT|ΔR) distributions, we found a significant main effect of the group (p = 2.56 × 10⁻⁰⁷), bin (p = 2 × 10⁻¹⁶) and reward (p = 0.001) factors, but no significant interaction between the factors (Figure 6c and 6d). These outcomes suggest that the σ of the Gaussian distribution for C values differed between groups, such that OCD patients had a more variable distribution than healthy control participants, independently of the reward or bin factor (Figure 6c and 6d). In addition, the σ was reduced across bins of practiced sequences. Last, the C values were more stable (smaller spread) following an increase in scores. The results highlight that over the course of training participants’ IKI consistency values stabilized, but more so following reward gains. On a group level, the degree of IKI consistency was more irregular in the OCD sample. As in the MT analyses, the sensitivity effects to reward changes were not accounted for by differences in the size of the subsamples used for the Gaussian distribution fits (Supplementary Results).

Phase B: Tests of action-sequence preference and goal/habit arbitration

Once the month-long training was completed, participants attended a laboratory session for follow-up behavioral tests aimed at assessing preference for familiar versus novel sequences (experiments 2 and 3), including a re-evaluation test to assess goal/habit arbitration (experiment 3 only). Below we briefly describe these two experiments and report the results. See Methods and Table 3 for a more detailed description of the tasks.

Experiment 2

Preference for familiar versus novel action sequences

This experiment tests the hypothesis stated in the outline, that the trained action sequence gains intrinsic/rewarding properties or value. We used an explicit preference task, assessing participants’ preferences for familiar (hypothetically habitual) sequences over goal-seeking sequences. We assume that if the trained sequences have acquired rewarding properties (for example, anxiety relief, or the inherent gratification of skilled performance or routine), participants would express a greater preference to select or ‘play’ them, even when given the option to play easier sequences (i.e., goal-seeking sequences).

After reporting which app sequence they preferred, participants started the explicit preference task. On each trial, they were required to select and play 1 of 2 sequences. The 2 possible sequences were presented and identified using a corresponding image and participants had to choose which one they wanted to play. There were 3 conditions, each comprising a specific sequence pair: 1) app preferred sequence versus app non-preferred sequence (control condition); 2) app preferred sequence versus any 6-move sequence (experimental condition 1); 3) app preferred sequence versus any 3-move sequence (experimental condition 2). The app preferred sequence was their preferred putative habitual sequence and the ‘any 6’ or ‘any 3’-move sequences were the goal-seeking sequences because they are supposedly easier for 2 reasons: 1) they could comprise any key press of the participant’s choice (for example, the same single key press repeated 6 or 3 times, respectively) and 2) they could have the same or different key press combinations every time the ‘any-sequence’ needed to be input (i.e. not necessarily the same sequence on every trial). The conditions (15 trials each) were presented sequentially but counterbalanced among participants. See Methods and Figure 7a for further details.

Preference for familiar versus novel action sequences. a) Explicit Preference Task.
Participants had to choose and play one of two given sequences. Once a choice was made, the image corresponding to the selected sequence was highlighted in blue. Participants then played the sequence. While playing it, the bar on top registered each move, progressively lighting up in green. There were 3 conditions, each comprising a specific sequence pair: 1) app preferred sequence *versus* app non-preferred sequence (*control condition*); 2) app preferred sequence *versus* any 6-move sequence of participant’s choice (*experimental condition 1*); 3) app preferred sequence *versus* any 3-move sequence of participant’s choice (*experimental condition 2*). b) No evidence for enhanced preference for the app sequence in either HV or OCD patients. In fact, when an easier and shorter sequence was pitted against the app familiar sequence (right raincloud plot), both groups significantly preferred it (Kruskal-Wallis main effect of Condition H = 23.2, p < 0.001). Left raincloud plot: control condition; Middle raincloud plot: experimental condition 1; Right raincloud plot: experimental condition 2. Y-axis depicts the number of app-sequence choices (15 choice trials maximum). Connected lines depict mean values. c) Exploratory analysis of the preference task following up unexpected findings on the mobile-app effect on symptomatology: re-analysis of the data conducting a Dunn’s *post hoc* test splitting the OCD group into 2 subgroups based on their YBOCS change after the app training [14 patients with improved symptomatology (reduced YBOCS scores) and 18 patients who remained stable or felt worse (i.e. respectively, unchanged or increased YBOCS scores)]. Patients with reduced YBOCS scores after the app training had significantly higher preference to play the app sequence in both experimental conditions (left panel: p_bonf = 0.04; right panel: p_bonf = 0.03). The bar plots represent the sample mean and the vertical lines the confidence interval. Individual data points are included to show dispersion in the sample. Abbreviations: YBOCS = Yale-Brown obsessive-compulsive scale, HV = Healthy volunteers, OCD = patients with obsessive-compulsive disorder.

A Kruskal-Wallis H test indicated a main effect of Condition (H = 23.2, p < 0.001) but no Group (p = 0.36) or interaction effects (p = 0.72) (Figure 7b). Dunn’s post hoc pairwise comparisons revealed that experimental condition 2 (app sequence versus any 3 sequence) was significantly different from control condition (p_bonf < 0.001) and from experimental condition 1 (app sequence versus any 6 sequence) (p_bonf = 0.006). No differences were found between the latter two conditions (p_bonf = 0.08). Bayesian analysis further provided moderate evidence in support of the absence of main effects of group (BF = 0.129) and interaction (BF = 0.054). These results denote that both groups evaluate the trained app sequences as being equally attractive as the alternative novel-but-easier sequence when of the same length (Figure 7b, middle plot). However, when given the option to play an easier-but-shorter sequence (in experimental condition 2), both groups significantly preferred it over the app familiar sequence (Figure 7b, right plot). A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition.
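For readers unfamiliar with the test, the following sketch shows how a one-way Kruskal-Wallis H statistic can be computed from rank sums, including the standard correction for tied values (ties are frequent when the outcome is a count out of 15 choices). It covers only the one-way statistic, not the group or interaction terms or Dunn's post hoc comparisons reported above. The function names are hypothetical, and the closed-form p-value applies only when three groups are compared (2 degrees of freedom).

```python
import math
from collections import Counter


def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic with tie correction for two or more groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Assign each distinct value its midrank (tied values share the average rank).
    rank_of, seen = {}, 0
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = seen + (c + 1) / 2
        seen += c

    # H is built from the squared rank sum of each group.
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Chi-square upper-tail probability for df = 2, i.e. exactly three groups."""
    return math.exp(-h / 2)
```

Assuming 2 degrees of freedom for the three conditions, an H of 23.2 corresponds to exp(−11.6) ≈ 9 × 10⁻⁶, which agrees with the reported p < 0.001.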

Given the high variance of participants’ choices on this preference task, particularly in the experimental conditions, and the findings reported below related to the mobile-app performance effect on symptomatology, we further conducted an exploratory Dunn’s post hoc test splitting the OCD group into 2 subgroups based on their YBOCS score changes after the app training: 14 patients with improved symptomatology (reduction in YBOCS scores) and 18 patients who remained stable or felt worse (i.e. respectively, same or increase in YBOCS scores). Patients with lowered YBOCS scores after the app training had significantly greater preference for the app trained sequence in both experimental conditions as compared to patients with same or increased YBOCS scores after the app training: experimental condition 1 (p_bonf= 0.04, Figure 7c, left) and experimental condition 2 (p_bonf= 0.03, Figure 7c, right). In conclusion, most participants prefer to play shorter and easier alternative sequences, thus not showing a bias towards the trained/familiar app sequences. Contradicting our hypothesis, OCD patients followed the same behavioral pattern. However, some participants still preferred the app sequence, specifically those with greater habitual tendencies, including patients who considered the app training beneficial. Such preference presumably arose because some intrinsic value may have been attributed to the trained action sequence.

Experiment 3

Test of goal/habit arbitration: re-evaluation of the learned action sequence

A 2-choice appetitive learning task was used to test participants’ ability to switch to a different behavior. By providing more value to alternative action sequences, participants were required to re-evaluate their options and act accordingly. The task was conducted in a new context, which has been shown to promote re-engagement of the goal system (Bouton, 2021).

On each trial, participants were required to choose between two ‘chests’ based on their associated reward value. Each chest depicted an image identifying the sequence that needed to be completed to be opened. After choosing which chest they wanted, participants had to play the specific correct sequence to open it. Their task was to learn by trial and error which chest would give them more rewards (gems), which by the end of the experiment would be converted into real monetary reward. There was no penalty for incorrectly keyed sequences because behavior was assessed based on participants’ choice regardless of the sequence accuracy.

Four chest-pairs (conditions, 40 trials each) were tested (see Figure 8a and methods for detailed description of each condition): three conditions pitted the trained/familiar app sequence against alternative sequences of higher monetary outcomes (given by variable amount of reward that did not overlap [deterministic]). The fourth condition kept the monetary value equivalent for the two options (thus rendering a probabilistic rather than deterministic contingency) but provided a much easier/shorter alternative sequence, and thus pitted the intrinsic value of the familiar sequence against a motorically less effortful sequence. The conditions were presented sequentially but counterbalanced among participants.

Re-evaluation procedure: 2-choice appetitive learning task.
a) shows the task design. We tested 4 conditions, with chest-pairs corresponding to the following motor sequences: 1) app preferred sequence *versus* any 6-move sequence; 2) app preferred sequence *versus* novel (difficult) sequence; 3) app preferred sequence *versus* app non-preferred sequence; 4) app preferred sequence *versus* any 3-move sequence. The ‘any 6-move’ or ‘any 3-move’ sequences could comprise any key press of the participant’s choice and could be played by different key press combinations on each trial. The ‘novel sequence’ (in 2) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the test day (therefore, not overtrained). In conditions 1, 2 and 3, the preferred app sequence was pitted against alternative sequences of higher monetary value. In condition 4, the intrinsic value of the preferred app sequence was pitted against a motorically less effortful sequence (i.e. a shorter/easier sequence). Each condition addressed specific research questions, which are detailed in the right column of the table. b) demonstrates the task performance per group and over the 4 conditions. Both groups were able to switch from previous automatic to new goal-directed action sequences as a function of monetary re-evaluation. When re-evaluation involved an effort manipulation, OCD patients chose the app sequence significantly more than HV (* = p < 0.05) (condition 4). Y-axis depicts the number of app-sequence chests chosen (40 trials maximum) and connected lines depict mean values.

Both groups were highly sensitive to the re-evaluation procedure based on monetary feedback, choosing the non-app sequence more often, irrespective of the novelty of that sequence (Figure 8b; no group effects: p = 0.210, BF = 0.742, anecdotal evidence). However, when re-evaluation involved motoric effort (condition 4), participants did not choose the alternative ‘any 3’ sequence, which required lower motoric effort, as readily (Kruskal-Wallis main effect of condition: H = 151.1, p < 0.001) and OCD patients persisted significantly more than HV with the trained app sequence (post hoc group x condition 4 comparison: p = 0.04). In conclusion, after the month’s training, both groups demonstrated the ability to arbitrate between previous automatic and new goal-directed action sequences as a function of monetary re-evaluation. However, OCD patients chose the familiar sequence more often than HV when an easier and shorter (thus motorically less effortful) alternative was available.

Mobile-app performance effect on symptomatology: exploratory analyses

In a debriefing questionnaire, participants were asked to give feedback about their app training experience and how it interfered with their routine: a) how stressful/relaxing the training was (rated on a scale from -100% highly stressful to 100% very relaxing); b) how much it impacted their life quality (Q) (rated on a scale from -100% maximum decrease to 100% maximum increase in life quality). Supplementary Table S4 and Figure S2 depict participants’ qualitative and quantitative feedback. Of the 33 HV, 30 reported the app was neutral and did not impact their lives, either positively or negatively. The remaining 3 reported it as being a positive experience, with an improvement in their life quality (rating their life quality increase as 10%, 15% and 60%). Of the 32 patients assessed, 14 unexpectedly showed improvement (I) in their OCD symptoms during the month as measured by the YBOCS difference, in percentage terms, pre-post training (Ī = 20 ± 9%), 5 felt worse (Ī = -19 ± 9%) and 13 remained stable during the month (all errors are standard deviations). Of the 14 who felt better, 10 directly related their OCD improvement to the app training (life quality increase: Q = 43 ± 24%). Nobody rated the app negatively. Of note, the symptom improvement was positively correlated with patients’ habitual tendencies reported in the Creature of Habits questionnaire, particularly with the routine subscale (Pearson r = 0.45, p = 0.01) (Figure 9a, left). A three-way ANOVA test showed that patients who reported fewer obsessions and compulsions after the month training were the ones with more pronounced habit routines (Group effect: F = 13.7, p < 0.001, Figure 9a, right). A strong positive correlation was also found between the OCD improvement reported subjectively as a direct consequence of the app training and the OCI scores and reported habit tendencies (Pearson r = 0.8, p = 0.008; Pearson r = 0.77, p < 0.01, respectively) (Figure 9b): i.e., patients who considered the app somewhat beneficial were the ones with higher compulsivity scores and higher habitual tendencies. In HV, participants who had a greater tendency for automatic behaviors regarded the app as more relaxing (Pearson r = 0.44, p < 0.01). However, such a correlation between the self-reported relaxation measure attributed to the app and the COHS automaticity subscale was not observed in OCD (p = 0.1). Additionally, there was no correlation between patients’ symptom improvement and how relaxing they considered the app training (p = 0.1).
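The two quantities used in these exploratory analyses, percentage symptom improvement and the Pearson correlation coefficient, can be sketched as follows. The exact normalization of the YBOCS difference is not given in this section, so expressing the pre-post difference as a percentage of the pre-training score is an assumption, and both function names are hypothetical.

```python
import math


def percent_improvement(pre, post):
    """Pre-post score difference as a percentage of the pre-training score.

    Positive values mean the score dropped (symptoms improved). Dividing by
    the pre-training score is an assumed normalization.
    """
    return 100.0 * (pre - post) / pre


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Under this definition, a patient whose YBOCS score falls from 25 to 20 shows a 20% improvement; correlating such values with questionnaire scores across patients gives coefficients of the kind reported above.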

Mobile-app effect on symptomatology.
a) *Left:* positive correlation between patients’ routine tendencies reported in the Creature of Habits (COHS) questionnaire and the symptom improvement (Pearson r = 0.45, p = 0.01). Symptom improvement was measured by the difference in YBOCS scale before and after app training. *Right*: Patients with greater improvement in their symptoms after the one month app training had greater habitual tendencies as compared to HV (p < 0.001) and to patients who did not improve post-app training (p = 0.002). The bar plot represents the sample means and the vertical lines the confidence interval. Individual data points are included to show dispersion in the sample. b) OCD patients who related their symptom improvement directly to the app training were the ones with higher compulsivity scores on the OCI (Pearson r = 0.8, p = 0.008) (*left*) and higher habitual tendencies on the COHS (Pearson r = 0.77, p < 0.01) (*right*). Note that b) has one missing patient because he did not complete the OCI and COHS scales.
**Abbreviations**: OCI = Obsessive-Compulsive Inventory, COHS = Creature of Habits Scale, YBOCS = Yale-Brown obsessive-compulsive scale, HV = Healthy volunteers, OCD = patients with obsessive-compulsive disorder.

Other self-reported symptoms

In addition to the Creature of Habit findings, of the remaining self-reported questionnaires assessed (see Methods), OCD patients also reported enhanced intolerance of uncertainty, elevated motivation to avoid aversive outcomes and higher perfectionism, worries and perceived stress, as compared to healthy controls (see Table 1 for statistical results and Figure 10 in the Methods section for overall summary).

Self-reported measures on various scales measuring impulsiveness, compulsiveness, habitual tendencies, self-control, behavioral inhibition and activation, intolerance of uncertainty, perfectionism, stress and the trait of worry.

a) Participants’ demographics and clinical characteristics. b) Between group results from the self-reported questionnaires. **Abbreviations**: HV, Healthy Volunteers; OCD, Patients with Obsessive-Compulsive Disorder; Y-BOCS, Yale-Brown obsessive-compulsive scale; MADRS, Montgomery-Asberg Depression Rating Scale; STAI, The State-Trait Anxiety Inventory; BDI, Beck Depression Inventory; OCI, Obsessive-Compulsive Inventory; CPAS, Compulsive Personality Assessment Scale; COHS, Creature of Habit Scale; HSCQ, Habitual Self Control Questionnaire; BIS, Behavioral Inhibition System; BAS, Behavioral Activation System; Barratt, Barratt Impulsiveness Scale; IUS, Intolerance of Uncertainty Scale; SCS, Self-Control Scale; FMPS, Frost Multidimensional Perfectionism Scale; PSS, Perceived Stress Scale; PSWQ, Penn State Worry Questionnaire. ** = p < 0.01, *** = p <0.001.

Discussion

This study investigated the possible roles of habits, including their automaticity, and impaired goal/habit arbitration as explanations of compulsive OCD symptoms. We focused specifically on the habit component of the associative dual-process (i.e. goal/habit) model of behavior as applied to OCD, and described in the Introduction. First, we provide evidence, via a recently developed self-report questionnaire (Ersche et al., 2017), of greater subjective habitual tendencies in patients with OCD as compared to controls, in terms of both factors of ‘routine’ and ‘automaticity’. Then, using a novel smartphone tool, we studied in real life the acquisition of two putative habits (6-element action sequences) for the first time in patients with OCD during their everyday schedule and home environment, while continuously collecting 30 days of real-time data. We found that OCD patients engaged more with the app training than healthy volunteers: patients enjoyed performing the sequences and practiced them significantly more, even though we did not request additional training. Despite performing the action sequences slower and at a more irregular speed at the beginning of training, OCD patients reached the same asymptotic level of automaticity (objectively determined) as healthy controls and exhibited equivalent evidence of ‘chunking’ (Smith and Graybiel, 2016). There was no evidence of procedural learning deficits per se in patients, but they attained automaticity significantly more slowly than controls. In subsequent, second phase testing conducted in a new context, we confirmed that both groups successfully transferred both trained action sequences to their corresponding discriminative stimuli (i.e., visual icons). Both groups also demonstrated successful arbitration between their previous automatic and new goal-directed action sequences in that they preferred the monetarily more valuable sequences. 
Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions. First, exploratory analysis revealed that those patients with higher habitual tendencies and compulsivity scores significantly preferred the familiar sequence. Second, when required to choose between the familiar and a novel, less effortful sequence, the OCD group preferred the previously trained option. In both cases, this preference for the familiar sequence in OCD patients presumably arose because of its intrinsic value. These results are relevant to the theory of goal-direction/habit imbalance in OCD (Gillan et al., 2016) by suggesting the predominance of habits in certain contexts where such habits may have acquired intrinsic value. One possible source of such value is symptom relief. This would be consistent with follow-up findings that many of the patients found the app to be beneficial, improving their symptomatology after the month training (as measured by the Y-BOCS scale difference pre-post training, as well as individual feedback).

Implications for the dual associative theory of habitual and goal-directed control

Rapid execution, invariant response topography, action chunking and low cognitive load, have all been considered essential criteria for the definition of habits (Balleine and Dezfouli, 2019; Haith and Krakauer, 2018). We have successfully achieved all of these elements with our app using the criteria of extensive training and context stability, both previously shown to be essential to enhance formation and strengthening of habits (Haith and Krakauer, 2018; Verplanken and Wood, 2006). Context stability was provided by the tactile, visual and auditory stimulation associated with the phone itself, which establishes a strong and similar context for all participants, regardless of their concurrent circumstances. Overtraining has been one of the most important criteria for habit development, and used by many as an operational definition on how to form a habit (Dickinson et al., 1995; Haith and Krakauer, 2018; Tricomi et al., 2009) (for a review see Balleine and O’Doherty, 2010), despite current controversies raised by de Wit et al., 2018 on its use as an objective test of habits. A recent study has demonstrated though that even short overtraining (1 day only) is effective at producing habitual behavior in participants high in affective stress (Pool et al., 2022), confirming previous suggestions for the key role of anxiety and stress on the behavioral expression of habits (Dias-Ferreira et al., 2009; Hartogsveld et al., 2020; Schwabe and Wolf, 2009). Here we have trained a clinical population with moderately high baseline levels of stress and anxiety, with training sessions of a higher order of magnitude than in previous studies (de Wit et al., 2018, 2018; Gera et al., 2022) (30 days instead of 3 days). By all accounts our overtraining is valid: to our knowledge the longest overtraining in human studies achieved so far. 
Both OCD patients and healthy controls attained automaticity in this study, exhibiting similar and stable asymptotic performance, both in terms of speed and the invariance in the kinematics of the motor movement.

We succeeded in achieving automaticity - which at a neural level is known to reliably engage the brain’s habitual circuitry (Ashby et al., 2010; Bassett et al., 2015; Graybiel and Grafton, 2015; Lehericy et al., 2005) - and fulfilled three of the four criteria for the definition of habits according to Balleine and Dezfouli (2019) (rapid execution, invariant topography and chunked action sequences). We were not, however, able to test the fourth criterion, of resistance to devaluation. Therefore, we are unable to firmly conclude that the action sequences are habits rather than, for example, goal-directed skills. According to a very recent study, also employing an app to study habitual behavior, the criterion of devaluation resistance was shown to apply to a simple 2-element sequence with less training (Gera et al., 2022). Thus, it remains possible that the overtraining of our 6-element sequence might have achieved behavioral autonomy from the goal in addition to behavioral automaticity.

Regardless of whether the trained action sequences can be defined as habits or goal-directed motor skills, it has to be considered why OCD patients chose the familiar sequences in certain conditions, even when it was superficially maladaptive to do so (i.e. effort condition). This leads us to postulate that action sequences may be motivated by more than one explicit goal (i.e. money in such patients), especially given the apparent therapeutic (intrinsic) value of their performance, and therefore the difficulty of allocating specific, single goals to human action sequences. One implication of this analysis may be to consider that behavior in general is ‘goal-directed’ but may vary in the balance of control by extrinsic and intrinsic goals. This may be consistent with motor control theories that refer to the successful completion of a motor action itself, in the spatio-temporal sense, as being ‘goal-related’. For any action sequence there may thus exist a hierarchy of goals underpinning performance, ranging for example from explicit monetary feedback to intrinsic relief from an endogenous state (e.g. anxiety or boredom). Therefore, the dual associative process account of behavioral control may be reconstrued in terms of the relative balance of extrinsic to intrinsic outcomes. Another possible formulation is that habits, which depend initially on cached or historically acquired rewarding action values, may not necessarily lose current value, but instead acquire alternative sources of value (O’Doherty, 2014).

Implications for understanding OCD symptoms

We observed a slower and greater irregularity of performance in patients with OCD as compared to controls at the beginning of the training. This was expected given previous reports of visuospatial and fine-motor skill difficulties in patients with OCD (Bloch et al., 2011). However, despite this initial slowness, no procedural learning deficits were found in patients. Such a finding is inconsistent with other implicit learning deficits previously reported in OCD using the serial reaction time (SRT) paradigm (Deckersbach et al., 2002; Joel et al., 2005; Kathmann et al., 2005; Rauch et al., 2001, 1997). However, it is in line with recent studies demonstrating successful learning both in patients with OCD (Soref et al., 2018) and healthy individuals with subclinical OCD symptoms (Barzilay et al., 2022) when instructions are given explicitly and participants intentionally search for the underlying sequence structure. In fact, our task does not tap into memory processes as strongly as SRT tasks because we explicitly demonstrate the sequence to participants before they begin their 30-day training, which likely decreases demands on procedural learning.

The quantification of trial-to-trial behavioral changes as a function of a drop or increase in reward further revealed that the slower automaticity progression observed in OCD patients was mainly driven by a reduced sensitivity to changes in feedback scores in this group relative to healthy participants. Both groups, however, reproduced the IKI patterns more consistently following reward increments. This outcome contrasts with the more pronounced acceleration of MT in both samples in response to negative reward changes. Greater sensitivity to negative feedback has been reported previously (Becker et al., 2014). Here we show it has a dissociable effect on sequence duration and IKI consistency. In particular, we observed that the reduction in feedback scores interfered with automatization, despite a general beneficial effect on movement speed. The enhanced sensitivity to negative feedback is in line with recent studies showing higher response switching following negative feedback (Marzuki et al., 2021), a hyperactive monitoring system, increased prediction errors (Hauser et al., 2017) and enhanced error-related negativity amplitudes in OCD, the latter currently considered a biomarker for the disorder (Endrass and Ullsperger, 2014; Gehring et al., 2000; Riesel, 2019).

Considering the hypothetically greater tendency in OCD to form habitual/automatic actions described earlier (Gillan et al., 2014; Voon et al., 2015), we predicted that OCD patients would attain automaticity faster than healthy controls. This was not the case. In fact, the opposite was found. Since this was the first study to our knowledge assessing action sequence automatization in OCD, our contrary findings may confirm recent suggestions that previous studies were tapping into goal-directed behavior rather than habitual control per se (Gillan et al., 2015b; Vaghi et al., 2018; Zwosta et al., 2018) and may therefore have inferred enhanced habit formation in OCD as a default consequence of impaired goal-directed responding. On the other hand, we describe here two potential sources of evidence in favor of enhanced habit formation in OCD. First, OCD patients show a bias towards the previously trained, apparently disadvantageous, action sequences. In terms of the discussion above, this could possibly be reinterpreted as a narrowing of goals in OCD (Robbins et al., 2019) underlying compulsive behavior, in favor of its intrinsic outcomes. Secondly, OCD patients self-reported greater habitual tendencies in both the ‘routine’ and ‘automaticity’ subscales. Previous studies have reported that subjective habitual tendencies are associated with compulsive traits (Ersche et al., 2019; Wuensch et al., 2022) and act, in addition to cognitive inflexibility, as a predictor of subclinical OCD symptomatology in healthy populations (Ramakrishnan et al., 2022). There is an apparent discrepancy between self-reported ‘automaticity’ and the objective measure of automaticity we provided. This may result from a possible mis-labelling of this factor in the Creature of Habit questionnaire, where many of the relevant items indicate automatic S-R elicitation by situational triggering stimuli rather than motor topographic features of the behavior (e.g. ‘when walking past a plate of sweets or biscuits, I can’t resist taking one’).

Finally, we also expected that OCD patients would show a functional impairment in arbitrating between actions, represented by a greater resistance than controls to redirect themselves to a goal state. By testing goal/habit arbitration using a re-evaluation task that applied a contextual change, which has previously been shown to recall attention and reengage the goal circuitry (Bouton, 2021; Vandaele and Ahmed, 2021), we found that the dynamic arbitration between previous automatic, and new goal-directed, action sequences observed in OCD patients was different from healthy volunteers, being driven hypothetically by the intrinsic value associated with the automatic sequences.

Possible beneficial effect of action sequence training on OCD symptoms as habit reversal therapy

OCD patients engaged significantly more with the motor sequencing app and enjoyed it more than healthy volunteers. Additionally, those patients more prone to routine habits and with higher OCI scores found use of the app beneficial, showing symptomatic improvement based on the YBOCS scale. One hypothesis for the therapeutic potential of this motor sequencing training is that the trained action sequences may disrupt OCD compulsions either via ‘distraction’ or habit ‘replacement’ by engaging the same neural ‘habit circuitry’. This habit ‘replacement’ hypothesis is in line with successful interventions in Tourette Syndrome (Hwang et al., 2012), Tic disorders and Trichotillomania (Morris et al., 2013).

Limitations

As mentioned above, we were unable to employ the often-mooted ‘gold standard’ criterion of resistance to devaluation because it would have invalidated the goal/habit arbitration test. This meant that we were unable to conclusively define the trained action sequences as habitual, although they satisfied other important criteria such as automatic execution, invariant response topography, action chunking and low cognitive load. Nevertheless, the utility of the devaluation criterion has been questioned, especially when applied to human studies of habit learning, because devaluation can be difficult to achieve: human behavior has multiple goals, some of which may be implicit and thus difficult to control experimentally, and is subject to great individual variation.

Although we found a significant preference for the trained action sequence in OCD patients in the condition where it was pitted against a simpler and shorter motor sequence, as compared to the monetary discounting condition, the reason for this difference is not immediately obvious. However, it may have arisen because of the nature of the contingencies inherent in these choice tests. Specifically, the ‘monetary discounting’ condition involved a simple deterministic choice between the two alternatives, which should readily be resolved in favor of the option associated with the greater, non-overlapping, range of rewards provided (e.g. 1-7 versus 8-15 gems). In contrast, in the ‘effort discounting’ condition, the reward ranges for the two options were equivalent (e.g. 1-7 gems), which raised uncertainty concerning which of the chosen sequences was optimal. The probabilistic constraint over this choice may therefore account for the greater sensitivity of the task in highlighting preference in OCD, given the greater susceptibility of such patients to uncertainty (Pushkarskaya et al., 2015).

Finally, some of the conclusions relating to the effects of OCD diagnosis on sequence preference without feedback were based only on a post hoc exploratory analysis. Specifically, only those patients with higher compulsivity (OCI) and Creature Of Habit (COHS) scores exhibited this preference, therefore consistent with the hypothesis described above of the importance of intrinsic value of the habitual sequence to the development of compulsions. Evidence of this intrinsic value was provided by the greater engagement with, and therapeutic findings for, the app training in these patients. However, the latter effect needs to be confirmed in a registered clinical NHS trial in a controlled manner, which is ongoing.

Conclusion

We used a combination of behavioral tasks that addressed two key hypotheses from the goal/habit imbalance theory of compulsion relating to greater automaticity and impaired goal/habit arbitration in OCD. In the initial phase, a novel app-based procedure for measuring action sequence learning and performance revealed evidence of equivalent procedural learning and attainment of objective automatization criteria of habitual performance in healthy volunteers and patients with OCD. A second phase found evidence of unimpaired goal/habit arbitration in OCD following re-evaluation based on monetary feedback, although there was greater preference for the trained action sequence under certain conditions. These findings may lead to a reformulation of the goal/habit imbalance hypothesis in OCD.

Finally, OCD patients with higher compulsivity scores and habitual tendencies showed more engagement with the motor training app and reported symptom alleviation, with implications for its potential use as a form of habit reversal therapy.

Materials and Methods

Participants

We recruited 33 OCD patients and 34 healthy individuals, matched for age, gender, IQ and years of education. Two participants (1 HV and 1 OCD) were excluded because they did not perform the minimum required training (i.e. 2 daily practices for a period of 30 days). Therefore, a total of 32 OCD patients (19 females) and 33 healthy controls (19 females) were included in the analysis. Most participants were right-handed (left-handed: 4 OCD and 6 HV). Participants’ demographics and clinical characteristics are presented in Table 2 and Figure 10.

Demographic and clinical characteristics of OCD patients and matched healthy controls

Healthy individuals were recruited from the community, were all in good health, unmedicated and had no history of neurological or psychiatric conditions. Patients with OCD were recruited through an approved advertisement on the OCD Action website (www.ocdaction.org.uk), through local support groups and via clinicians in East Anglia. All patients were screened by a qualified psychiatrist from our team, using the Mini International Neuropsychiatric Inventory (MINI) to confirm the OCD diagnosis and the absence of any comorbid psychiatric conditions. Patients with hoarding symptoms were excluded. Our patient sample comprised 6 unmedicated patients, 20 taking selective serotonin reuptake inhibitors (SSRIs) and 6 on a combined therapy (SSRIs + antipsychotic). OCD symptom severity and characteristics were measured using the Y-BOCS scale (Goodman, 1989), mood status was assessed using the Montgomery-Asberg Depression Rating Scale (MADRS) (Montgomery and Asberg, 1979) and Beck Depression Inventory (BDI) (Beck et al., 1961), anxiety levels were evaluated using the State-Trait Anxiety Inventory (STAI) (Spielberger et al., 1983), and verbal IQ was quantified using the National Adult Reading Test (NART) (Nelson and Willison, 1982). All patients included suffered from OCD and scored > 16 on the Y-BOCS, indicating at least moderate severity. They were also free from any additional axis-I disorders. General exclusion criteria for both groups were substance dependence, current depression indexed by scores exceeding 16 on the MADRS, serious neurological or medical illnesses or head injury. All participants completed additional self-report questionnaires measuring:

impulsiveness: Barratt Impulsiveness Scale (Barratt, 1994)
compulsiveness: Obsessive Compulsive Inventory (Foa et al., 1998) and Compulsive Personality Assessment Scale (Fineberg et al., 2007)
habitual tendencies: Creature of Habit Scale (Ersche et al., 2017)
self-control: Habitual Self-Control Questionnaire (Schroder et al., 2013) and Self-Control Scale (Tangney et al., 2004)
behavioral inhibition and activation: BIS/BAS Scale (Carver and White, 1994)
intolerance of uncertainty: Intolerance of Uncertainty Scale (Buhr and Dugas, 2002)
perfectionism: Frost Multidimensional Perfectionism Scale (Frost and Marten, 1990)
stress: Perceived Stress Scale (Cohen et al., 1983)
trait of worry: Penn State Worry Questionnaire (Meyer et al., 1990).

All participants gave written informed consent prior to participation, in accordance with the Declaration of Helsinki, and were financially compensated for their participation. This study was approved by the East of England - Cambridge South Research Ethics Committee (16/EE/0465).

Phase B: Tests of action-sequence preference and goal/habit arbitration

After completing the month-long app training, participants attended a laboratory session to complete additional behavioral tests assessing preference for familiar versus novel sequences (experiments 2 and 3), including a re-evaluation test to assess goal/habit arbitration (experiment 3 only). The tasks are described below and the instructions given to participants are presented in Table 3. Since these follow-up tests required observing additional stimuli while performing the action sequences, it was impractical to use participants’ individual iPhones both to present the task stimuli and to serve as the interface for playing the action sequences. We therefore used a ‘makey-makey’ device, which connected the testing laptop (presenting the task stimuli) to four keys made of playdough arranged on a table (used as the input interface for the action sequences). This device allowed accurate registration of key presses and their timings. The playdough keys were built to the same size as the keys shown on the screens of the individual iPhones used for the one-month training. Participants were asked to practice the action sequences trained at home within the new context for several minutes, until they became familiar and comfortable with it. Hence, the transition to a non-mobile/laboratory context was conducted with great care.

Experiment 2: explicit preference task

On each trial, participants observed 2 sequences, each identified by a corresponding image, and were asked to choose which one they wanted to play. Once the choice was made, the image corresponding to the selected sequence was highlighted in blue. Participants then played the sequence. The task included 3 conditions (15 trials each). Each condition comprised a specific sequence pair: 2 experimental conditions pairing the app preferred sequence (putative habit) with a goal-seeking sequence and 1 control condition pairing both app sequences trained at home. The conditions were as follows: 1) app preferred sequence versus app non-preferred sequence (control condition); 2) app preferred sequence versus any 6-move sequence (experimental condition 1); 3) app preferred sequence versus any 3-move sequence (experimental condition 2). The app preferred sequence was the putative habitual sequence and the ‘any 6’ or ‘any 3’-move sequences were the goal-seeking sequences because they are supposedly easier: they could comprise any key presses of the participant’s choice (for example the same single key pressed repeatedly 6 or 3 times, respectively) and could use the same or different key press combinations every time the ‘any-sequence’ needed to be input. The conditions were presented sequentially, with their order counterbalanced among participants. See Figure 7a for an illustration of the task.

Experiment 3: two-choice appetitive learning task

On each trial, participants were presented with two ‘chests’, each containing an image identifying the sequence that needed to be completed to open the chest. Participants had to choose which chest to open and play the correct sequence to open it. Their task was to learn by trial and error which chest would give them more reward (‘gems’), which would be converted into a real monetary reward at the end of the experiment. If mistakes were made while inputting the sequences, participants could simply repeat the moves until they were correct, without any penalty. Behavior was assessed based on participants’ choice, regardless of the accuracy of the sequence. The task included 4 conditions (40 trials each), with chest-pairs corresponding to the following motor sequences (see also Figure 8 for an illustration of each condition):

condition 1: app preferred sequence versus any 6-move sequence
condition 2: app preferred sequence versus a novel (difficult) sequence
condition 3: app preferred sequence versus app non-preferred sequence
condition 4: app preferred sequence versus any 3-move sequence

As in the preference task described above, the ‘any 6-move’ or ‘any 3-move’ sequences could comprise any key presses of the participant’s choice (for example the same single key pressed repeatedly 6 or 3 times, respectively) and could be played with different key press combinations on each trial. The novel sequence (in condition 2) was a 6-move sequence of similar complexity and difficulty to the app sequences, but only learned on the day, before starting this task (therefore, not overtrained). In conditions 1, 2 and 3, higher monetary outcomes were given to the alternative sequences. To remove the uncertainty confound commonly linked to probabilistic tasks, conditions 1, 2 and 3 were deterministic: in all trials, choosing the preferred app sequence was rewarded with smaller monetary outcomes (sampled from a random distribution between 1-7 gems) whereas the alternative option always provided higher monetary outcomes (sampled from a random distribution between 8-15 gems). The amount of reward therefore varied from trial to trial, but the two ranges never overlapped, so the better option was always the same. Condition 4, on the other hand, kept the monetary value equivalent for the two options (thus rendering a probabilistic rather than deterministic contingency) but provided a much easier/shorter alternative sequence, and thus pitted the intrinsic value of the familiar sequence against a motorically less effortful sequence. The conditions were presented sequentially, with their order counterbalanced among participants.
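The reward contingencies described above can be illustrated with a minimal sketch. The gem ranges are taken from the text (the 1-7 range for condition 4 follows the example given in the Limitations section); the function name, the uniform integer sampling and the seeding are illustrative assumptions, not the implementation used in the task.

```python
import random

# Gem ranges from the task description. Names and sampling scheme are hypothetical.
LOW_RANGE = (1, 7)    # app preferred sequence (and both options in condition 4)
HIGH_RANGE = (8, 15)  # alternative sequence in conditions 1-3


def sample_reward(condition, chose_preferred, rng):
    """Return the gems awarded for one choice in the given condition (1-4)."""
    if condition in (1, 2, 3) and not chose_preferred:
        low, high = HIGH_RANGE  # the alternative always pays more (non-overlapping)
    else:
        low, high = LOW_RANGE   # preferred sequence, or either option in condition 4
    # Uniform integer draw over the inclusive range: an assumption of this sketch.
    return rng.randint(low, high)


rng = random.Random(0)  # fixed seed only so the illustration is repeatable
for condition in (1, 4):
    preferred = [sample_reward(condition, True, rng) for _ in range(40)]
    alternative = [sample_reward(condition, False, rng) for _ in range(40)]
    print(condition, (min(preferred), max(preferred)),
          (min(alternative), max(alternative)))
```

In conditions 1 to 3 the two printed ranges never overlap, so a single comparison identifies the better chest; in condition 4 they coincide, so monetary feedback alone cannot separate the options.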

Statistical analyses

Participants’ characteristics and self-reported questionnaires were analyzed with χ² and independent t-tests, respectively. The Motor Sequencing App automatically uploaded the data to a cloud-based database. This task enabled us to compare patients with OCD and healthy volunteers on the following measures: training engagement (whose primary output measures were the total number of practices completed and app engagement, defined as the number of sequences attempted, including both correct and incorrect sequences); procedural learning; automaticity development; sensitivity to reward (see definitions and description of data analyses in the Results section); and training effects on symptomatology, measured as the pre-post training difference in YBOCS scores. The Phase B experiments enabled further investigation of preference and goal/habit arbitration. The primary outcome was the number of choices.

Between-group analyses were conducted using Kruskal-Wallis H tests when the normality assumption was violated. Parametric factorial analyses were carried out with analyses of variance (ANOVA). Our alpha level of significance was 0.05. In the descriptive statistics, central values are reported as medians and errors as interquartile ranges unless otherwise stated, due to the non-Gaussian distribution of the datasets. When conducting several tests related to the same hypothesis, or when running several post-hoc tests following factorial effects, we controlled the FDR at level q = 0.05. Significant values after FDR control are denoted by p_FDR. For Dunn’s post hoc pairwise comparisons we used Bonferroni correction, denoted by p_bonf. Analyses were performed using Python version 3.7.6 and JASP version 0.14.1.0.
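As an illustration of FDR control at q = 0.05, the sketch below implements the Benjamini-Hochberg step-up procedure. The text does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, and the function name and example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q.

    Assumed procedure: Benjamini-Hochberg step-up (not confirmed by the text).
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


example_p = [0.001, 0.008, 0.039, 0.035, 0.20]  # made-up p-values
print(benjamini_hochberg(example_p))
```

Because the procedure is step-up, a p-value that exceeds its own rank threshold is still rejected when a larger p-value further down the ordering meets its threshold.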

In the case of non-significant effects in the factorial analyses, we assessed the evidence in favor of or against the full factorial model relative to the reduced model with Bayes Factors (BF: ratio BFfull/BFrestricted) using the bayesFactor toolbox (https://github.com/klabhub/bayesFactor) in MATLAB®. This toolbox implements tests that are based on multivariate generalizations of Cauchy priors on standardized effects (Rouder et al., 2012). As recommended by Rouder and colleagues (2012), we defined the restricted models as the full factorial model without one specific main or interaction effect. The ratio BFfull/BFrestricted represents the ratio between the probability of the data being observed under the full model and the probability of the same data under the restricted model. BF values were interpreted following Andraszewicz et al. (2015). The relationship between primary outcomes and clinical measures was calculated using a Pearson correlation.
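For reference, the Pearson correlation coefficient mentioned above can be computed as in the following minimal sketch. The function name and the example values are hypothetical and do not come from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up values standing in for a primary outcome and a clinical score.
outcome = [12, 15, 9, 20, 17]
clinical_score = [18, 22, 16, 27, 21]
print(round(pearson_r(outcome, clinical_score), 3))
```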

Supporting information

Supplementary Results

Data Availability

The source data for all figures and analyses are provided with this paper. They are available in the Open Science Framework, in the following link: https://osf.io/9xrdz/

Acknowledgements

This research was funded by the Wellcome Trust: a Sir Henry Postdoctoral Research Fellowship (Grant 204727/Z/16/Z) to PB and a Wellcome Trust Senior Investigator Award (Grant 104631/Z/14/Z) to TWR. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted paper version arising from this submission. MHR was partially supported by the Basic Research Program of the National Research University Higher School of Economics (Russian Federation). MB was supported by MHRUK and Angharad-Dodds Bursaries. AAM was supported as a research assistant funded by the aforementioned Wellcome Trust grant to TWR. We thank all participants for their contributions to this study. We would also like to acknowledge Dr Sharon Morein-Zamir for fruitful brainstorming discussions on the task design.

Disclosures

TWR discloses consultancy with Cambridge Cognition and receives research grants from Shionogi & Co. He also has editorial honoraria from Springer Verlag and Elsevier. All other authors report no potential conflicts of interest. NAF in the past three years has received research funding paid to her institution from the NIHR, COST Action and Orchard. She has received payment for lectures from the Global Mental Health Academy and for expert advisory work on psychopharmacology from the Medicines and Healthcare Products Regulatory Agency and an honorarium from Elsevier for editorial work. She has additionally received financial support to attend meetings from the British Association for Psychopharmacology, European College for Neuropsychopharmacology, Royal College of Psychiatrists, International College for Neuropsychopharmacology, World Psychiatric Association, International Forum for Mood and Anxiety Disorders, American College for Neuropsychopharmacology. In the past she has received funding from various pharmaceutical companies for research into the role of SSRIs and other forms of medication as treatments for OCD and for giving lectures and attending scientific meetings.
