Action-sequence learning, habits and automaticity in obsessive-compulsive disorder

  1. Department of Psychology, University of Cambridge, Cambridge CB2 3EB, UK
  2. Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge CB2 3EB, UK
  3. Department of Psychology, Goldsmiths, University of London, London SE14 6NW, UK
  4. Quantum Motion Technologies, Windsor House, Cornwall Road, Harrogate HG1 2PW, United Kingdom
  5. Department of Psychology, School of Medical and Life Sciences, Sunway University, Petaling Jaya, Malaysia
  6. Department of Psychiatry, School of Clinical Medicine, University of Cambridge, Cambridge, UK
  7. Hertfordshire Partnership University NHS Foundation Trust, Welwyn Garden City, Hertfordshire
  8. University of Hertfordshire, Hatfield, UK

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Editors

  • Reviewing Editor
    Redmond O'Connell
    Trinity College Dublin, Dublin, Ireland
  • Senior Editor
    Floris de Lange
    Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands

Reviewer #1 (Public Review):

It is known that aberrant habit formation is a characteristic of obsessive-compulsive disorder (OCD). Habits can be defined according to the following features (Balleine and Dezfouli, 2019): rapid execution, invariant response topography and action 'chunking'. The extent to which OCD behavior is derived from enhanced habit formation relative to deficits in goal-directed behavior is a topic of debate in the current literature. This study examined habit-learning specifically (cf. deficits in goal-directed behavior) by regularly presenting, via smartphone, sequential learning tasks to patients with OCD and healthy controls. Participants engaged in the tasks every day over the course of a month. Automaticity, including the extent to which individual actions in the sequence become part of a unified 'chunk', was an important outcome variable. Following the 30 days of training, in-laboratory tasks were then administered to examine (1) whether performing the learned sequences had itself become rewarding and (2) differences in goal-directed vs. habitual behavior.

Several hypotheses were tested, including:
Patients would have impaired procedural learning vs. healthy volunteers (this was not supported, possibly because there were fewer demands on memory in the task used here)
Once the task had been learned, patients would display automaticity faster (unexpectedly, patients were slower to display automaticity)
Habits would form faster under a continuous (vs. variable) reinforcement schedule

Exploratory analyses were also conducted: an interesting finding was that OCD patients with higher self-reported symptoms voluntarily completed more sessions with the habit-training app and reported a reduction in symptoms.

Strengths

This paper is well situated theoretically within the habit learning/OCD literature.
Daily training in a motor-learning task, delivered via smartphone, was innovative, ecologically valid and more likely to assay habitual behaviors specifically. Daily training is also more similar to studies with non-humans, making a better link with that literature. The use of a sequential-learning task (cf. tasks that require a single response) is also more ecologically valid.
The in-laboratory tests (after the 1 month of training) allowed the researchers to test if the OCD group preferred familiar, but more difficult, sequences over newer, simpler sequences.

Weaknesses

The sample size was relatively small. Some potentially interesting individual differences within the OCD group could have been examined more thoroughly with a bigger sample (e.g., preference for familiar sequences). A larger sample may have allowed the statistical testing of any effects due to medication status.
The authors were not able to test one criterion of habits, namely resistance to devaluation, due to the nature of the task.

The authors achieved their aims in that two groups of participants (patients with OCD and controls) engaged with the task over the course of 30 days. The repeated nature of the task meant that 'overtraining' was almost certainly established, and automaticity was demonstrated. This allowed the authors to test their hypotheses about habit learning. The results are supportive of the authors' conclusions.

This article is likely to be impactful -- the delivery of a task across 30 days to a patient group is innovative and represents a new approach for the study of habit learning that is superior to an in-laboratory approach.

An interesting aspect of this manuscript is that it prompts a comparison with previous studies of goal-directed/habitual responding in OCD that used devaluation protocols, and which may have had their effects due to deficits in goal-directed behavior and not enhanced habit learning per se.

Reviewer #2 (Public Review):

In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

1. Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.
2. Some methodological aspects need more detail and clarification.
3. There are concerns regarding some of the analyses, which require addressing.

Please see details below, ordered by the paper sections.

Introduction:
It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In an attempt to describe the rationale for this statement the authors describe the concept of action chunking, its benefits and relevance to habits but there is no explanation for why sequential actions would engage the habit system more rapidly than a single-action. Clarifying this would be helpful.

In the Hypothesis section the authors state: "we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence." I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

A few notes on the task description and other task components:
It would be useful to give more details on the task. This includes more details on the time/condition of the gradual removal of visual and auditory stimuli and also on the within practice dynamic structure (i.e., different levels appear in the video).

Some more information on engagement-related exclusion criteria would be useful (what happened if participants did not use the app for more than one day, how many times were allowed to skip a day etc.).

According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.
If these components were not included in this procedure, then the deviations from the procedure described in the video and Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant, at least in part, to automaticity, habitual action control, formulation of participants' enjoyment from the app etc. I think these components should be mentioned and analyzed (or at least provide an explanation for why it has been decided not to analyze them).
This is also true for the reward removal (extinction) from the 21st day onwards which is potentially of particular relevance for the research questions.

Training engagement analysis:
I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.
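One way such a comparison could be sketched is a permutation test on the two-sample Kolmogorov-Smirnov statistic. The snippet below is only an illustration under simplifying assumptions that are not taken from the manuscript: the function names are hypothetical, training times are treated as plain hours on a line (the circular nature of clock time is ignored), and every session is treated as independent (repeated sessions from the same participant are not modelled).

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Permutation p-value for the KS statistic (group labels shuffled)."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical training times (hour of day), not data from the study.
    patients = [18 + 0.1 * i for i in range(20)]
    controls = [8 + 0.1 * i for i in range(20)]
    print(permutation_p_value(patients, controls))
```

A circular statistic, or a test that permutes participants rather than sessions, may be more appropriate for the real data; the sketch only shows the general shape of the comparison.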

Learning results:
When describing the Learning results (p10) I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

Sensitivity of sequence duration and IKI consistency (C) to reward:
I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.
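The two ways of handling incorrect trials mentioned above can be made concrete with a short sketch. It is an illustration only: the function name, the `(mt, is_correct)` trial format and the `require_adjacent` flag are hypothetical and do not describe how the authors actually computed ∆MT.

```python
def delta_mt(trials, require_adjacent=True):
    """Trial-to-trial change in movement time (MT) across correct trials.

    trials: list of (mt_seconds, is_correct) tuples in presentation order.
    require_adjacent=True keeps a difference only when the immediately
    preceding trial was also correct; False skips over errors and compares
    each correct trial with the previous correct one.
    """
    changes = []
    prev_mt = None
    for mt, is_correct in trials:
        if is_correct:
            if prev_mt is not None:
                changes.append(mt - prev_mt)
            prev_mt = mt
        elif require_adjacent:
            # An error breaks the chain, so the next correct trial
            # has no valid predecessor.
            prev_mt = None
    return changes


if __name__ == "__main__":
    # Hypothetical trials: the third one is an error.
    example = [(3.0, True), (2.8, True), (4.0, False), (2.5, True), (2.4, True)]
    print(delta_mt(example, require_adjacent=True))
    print(delta_mt(example, require_adjacent=False))
```

With `require_adjacent=False`, the change computed across the error spans two trial steps rather than one, which is the situation the comment above flags as potentially unrepresentative of trial-by-trial dynamics.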

I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length described in Figure 5b legend may reflect training-induced increased performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.
Another way to support the claim of reward change directionality effects on performance (rather than performance on performance), at least to some extent, would be to analyze the data from the last 10 days of the training, during which no rewards were given (pretending for analysis purposes that the reward was calculated and presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.
This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C is of a different structure, the similar potential confounding effects could still be present.
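The regression-to-the-mean concern can be illustrated with a small simulation. This is a sketch under assumptions that are not taken from the study: movement times (MT) are independent Gaussian draws around a fixed mean, reward is any decreasing function of MT (so reward drops whenever MT rises), and no sensitivity to reward is built in. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate_mt(n_trials=20000, mu=3.0, sd=0.3, seed=0):
    """Independent MTs around a fixed mean: no learning, no reward effect."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sd) for _ in range(n_trials)]


def next_change_by_reward_direction(mt):
    """Mean MT change on the next trial after reward decreases vs increases."""
    after_decrease, after_increase = [], []
    for t in range(1, len(mt) - 1):
        reward_dropped = mt[t] > mt[t - 1]  # slower trial means lower reward
        change = mt[t + 1] - mt[t]
        (after_decrease if reward_dropped else after_increase).append(change)
    return mean(after_decrease), mean(after_increase)


def baseline_adjusted_change(mt):
    """Same comparison after regressing the MT change on the previous MT."""
    idx = range(1, len(mt) - 1)
    x = [mt[t] for t in idx]              # baseline MT on trial t
    y = [mt[t + 1] - mt[t] for t in idx]  # change from trial t to t + 1
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    resid = [b - my - slope * (a - mx) for a, b in zip(x, y)]
    dec = [r for t, r in zip(idx, resid) if mt[t] > mt[t - 1]]
    inc = [r for t, r in zip(idx, resid) if mt[t] <= mt[t - 1]]
    return mean(dec), mean(inc)


if __name__ == "__main__":
    mt = simulate_mt()
    print(next_change_by_reward_direction(mt))
    print(baseline_adjusted_change(mt))
```

In this toy setting the unadjusted comparison shows a speed-up after reward decreases and a slow-down after reward increases even though reward has no causal role, whereas the difference largely disappears once the previous trial's MT is regressed out. Residualising only the outcome is a simplification of a full multiple regression and is used here purely for illustration.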

Another related question (which is also of general interest) is whether the preferred app sequence (as indicated by the participants for Phase B) was consistently the one that yielded more reward? Was the continuous sequence the preferred one? This might tell something about the effectiveness of the reward in the task.

Regarding both experiments 2 and 3:
The changes in context in experiments 2 and 3 are substantial and include many different components. These changes should be mentioned in more detail in the Results section before describing the results of experiments 2 and 3.

Experiment 2:
In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

Experiment 3:
Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this re-evaluation led participants to switch from habitual to goal-directed behavior.
Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

Mobile-app performance effect on symptomatology: exploratory analyses:
It may be worth testing whether the patients with improved symptomatology (who attribute some of their symptom improvement to the app) also chose to play more during the training stage.

Discussion:
Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

Materials and Methods:
The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

Minor comments:
In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

Author Response

Reviewer #1 (Public Review):

Strengths

This paper is well situated theoretically within the habit learning/OCD literature. Daily training in a motor-learning task, delivered via smartphone, was innovative, ecologically valid and more likely to assay habitual behaviors specifically. Daily training is also more similar to studies with non-humans, making a better link with that literature. The use of a sequential-learning task (cf. tasks that require a single response) is also more ecologically valid. The in-laboratory tests (after the 1 month of training) allowed the researchers to test if the OCD group preferred familiar, but more difficult, sequences over newer, simpler sequences.

The authors achieved their aims in that two groups of participants (patients with OCD and controls) engaged with the task over the course of 30 days. The repeated nature of the task meant that 'overtraining' was almost certainly established, and automaticity was demonstrated. This allowed the authors to test their hypotheses about habit learning. The results are supportive of the authors' conclusions.

We truly appreciate the positive assessment of referee 1, particularly the consideration that our study is theoretically strong and that ‘the results are supportive of the authors' conclusions’. This is an important external endorsement of our conclusions, contrasting somewhat with the views of referee 2.

Weaknesses

The sample size was relatively small. Some potentially interesting individual differences within the OCD group could have been examined more thoroughly with a bigger sample (e.g., preference for familiar sequences). A larger sample may have allowed the statistical testing of any effects due to medication status.

The authors were not able to test one criterion of habits, namely resistance to devaluation, due to the nature of the task.

We agree with the reviewer that the proof of principle established in our study opens new avenues for research into the psychological and behavioral determinants of the heterogeneity of this clinical population. However, considering the study timeline and the pandemic constraints, a bigger sample was not possible. Our sample can indeed be considered small if one compares it with current online studies, which do not require in-person/laboratory testing, thus being much easier to recruit and conduct. However, given the nature of our protocol (with 2 demanding test phases, 1-month engagement per participant and the inclusion of OCD patients without comorbidities only) and the fact that this study also involved laboratory testing, we consider our sample size reasonable and comparable to other laboratory studies (typically comprising on average between 30-50 participants in each group).

This article is likely to be impactful -- the delivery of a task across 30 days to a patient group is innovative and represents a new approach for the study of habit learning that is superior to an in-laboratory approach.

An interesting aspect of this manuscript is that it prompts a comparison with previous studies of goal-directed/habitual responding in OCD that used devaluation protocols, and which may have had their effects due to deficits in goal-directed behavior and not enhanced habit learning per se.

Thank you for acknowledging the impact of our study, in particular the unique ability of our task to interrogate the habit system.

Reviewer #2 (Public Review):

In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

  1. Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.

We acknowledge that the research on habit learning is a topic of current controversy, especially when it comes to how to induce and measure habits in humans. Therefore, within this context referee 2’s criticism could be expected. Across distinct fields of research, different methodologies have been used to measure habits, which represent relatively stereotyped and autonomous behavioral sequences enacted in response to a specific stimulus without consideration, at the time of initiation of the sequence, of the value of the outcome or any representation of the relationship that exists between the response and the outcome. Hence these are stimulus-bound responses which may or may not require the implementation of a skill during subsequent performance. Behavioral neuroscientists define habits similarly, as stimulus-response associations which are independent of reward or outcome, and use devaluation or contingency degradation strategies to probe habits (Dickinson and Weiskrantz, 1985; Tricomi et al., 2009). Others conceptualize habits as a form of procedural memory, along with skills, and use motor sequence learning paradigms to investigate and dissect different components of habit learning such as action selection, execution and consolidation (Abrahamse et al., 2013; Doyon et al., 2003; Squire et al., 1993). It is also generally agreed that the autonomous nature of habits and the fluid proficiency of skills are both usually achieved with many hours of training or practice, respectively (Haith and Krakauer, 2018).

We consider that Balleine and Dezfouli (2019) made an excellent attempt to bring all these different criteria within a single framework, which we have followed. We also consider that our discussion in fact followed a rather cautious approach to interpretation solely in terms of goal-directed versus habitual control.

Referee 2 does not actually specify criteria by which they define habits and skills, except for asserting that skilled behavior is goal-directed, without mentioning what the actual goal of the implantation of such skill is in the present study: the fulfillment of a habit? We assume that their definition of habit hinges on the effects of devaluation, as a single criterion of habit, but which according to Balleine and Dezfouli (2019) is only 1 of their 4 listed criteria. We carefully addressed this specific criterion in our manuscript: “We were not, however, able to test the fourth criterion, of resistance to devaluation. Therefore, we are unable to firmly conclude that the action sequences are habits rather than, for example, goal-directed skills. Regardless of whether the trained action sequences can be defined as habits or goal-directed motor skills, it has to be considered…”. Therefore, we took due care in our conclusions concerning habits and thus found the referee’s comment misleading and unfair.

We note that our trained motor sequences did in fact fulfil the other 3 criteria listed by Balleine and Dezfouli (2019), unlike many studies employing only devaluation (e.g. Tricomi et al 2009; Gillan et al 2011). Moreover, we cited a recent study using very similar methodology where the devaluation test was applied and shown to support the habit hypothesis (Gera et al., 2022).

Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1). Transitions between habitual and goal-directed control over behavior are quite well established in the experimental literature, especially when choice opportunities become available (Bouton, 2021; Frölich et al., 2023) or a new goal-directed schema is recruited to fulfill a habit (Fouyssac et al., 2022). This switching between habits and goal-directed responding may reflect the coordination of these systems in producing effective behavior in the real world.

  • Fouyssac M, Peña-Oliver Y, Puaud M, Lim NTY, Giuliano C, Everitt BJ, Belin D. (2021). Negative Urgency Exacerbates Relapse to Cocaine Seeking After Abstinence. Biological Psychiatry. doi: 10.1016/j.biopsych.2021.10.009

  • Frölich S, Esmeyer M, Endrass T, Smolka MN and Kiebel SJ (2023) Interaction between habits as action sequences and goal-directed behavior under time pressure. Front. Neurosci. 16:996957. doi: 10.3389/fnins.2022.996957

  • Bouton ME. 2021. Context, attention, and the switch between habit and goal-direction in behavior. Learn Behav 49:349–362. doi:10.3758/s13420-021-00488-z

  2. Some methodological aspects need more detail and clarification.
  3. There are concerns regarding some of the analyses, which require addressing.

We thank referee 2 for their detailed review of the methods and analyses of our study and for the helpful feedback, which clearly helps improve our manuscript. We will clarify the methodological aspects in detail and conduct the suggested analysis. Please see below our answers to the specific points raised.

Introduction:

  4. It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In an attempt to describe the rationale for this statement the authors describe the concept of action chunking, its benefits and relevance to habits but there is no explanation for why sequential actions would engage the habit system more rapidly than a single-action. Clarifying this would be helpful.

We agree that there is no evidence that action sequences become habitual more readily than single actions, although action sequences clearly allow ‘chunking’ and thus likely engage neural networks including the putamen which are implicated in habit learning as well as skill. In our revised manuscript we will instead state: “we have recently postulated that extensive training of sequential actions could be a means for rapidly engaging the ‘habit system’ (Robbins et al., 2019)”.

  5. In the Hypothesis section the authors state: “we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence”. I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

We agree that choice of the familiar response sequence should not be a necessary criterion for habitual control although choice for a familiar sequence is, in fact, not inconsistent with this hypothesis. In a recent study, Zmigrod et al (2022) found that 'aversion to novelty' was a relevant factor in the subjective measurement of habitual tendencies. It should also be noted that this preference was present in patients with OCD. If one assumes instead, like the referee, that the familiar sequence is goal-directed, then it contravenes the well-known 'egodystonia' of OCD which suggests that such tendencies are not goal-directed.

To clarify our hypothesis, we will amend the sentence to the following: “Finally, we expected that OCD patients would generally report greater habits, as well as attribute higher intrinsic value to the familiar app sequences manifested by a greater preference for performing them when given the choice to select any other, easier sequence”.

A few notes on the task description and other task components:

  6. It would be useful to give more details on the task. This includes more details on the time/condition of the gradual removal of visual and auditory stimuli and also on the within practice dynamic structure (i.e., different levels appear in the video).

These details will be included in the revised manuscript. Thank you for pointing out the need for further clarification of the task design.

  7. Some more information on engagement-related exclusion criteria would be useful (what happened if participants did not use the app for more than one day, how many times were allowed to skip a day etc.).

This additional information will be added to the revised manuscript. If a participant failed to train for more than 2 days, the researcher sent a reminder asking them to catch up. If the participant did not respond and a third day was skipped, the researcher called to understand the reasons for the lack of engagement and to gauge motivation. Participants were excluded if they missed more than 5 consecutive days of training. Only 2 participants were excluded for lack of engagement.

  8. According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.

If these components were not included in this procedure, then the deviations from the procedure described in the video and Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant, at least in part, to automaticity, habitual action control, formulation of participants' enjoyment from the app etc. I think these components should be mentioned and analyzed (or at least provide an explanation for why it has been decided not to analyze them).

This is also true for the reward removal (extinction) from the 21st day onwards which is potentially of particular relevance for the research questions.

The task procedure was indeed the same as detailed in Banca et al., 2020. We did not include these extra components in this current manuscript for reasons of succinctness and because the manuscript was already rather longer than a common research article, given that we present three different, though highly inter-dependent, experiments in order to answer key interrelated questions in an optimal manner. However, since referee 2 considers this additional analysis to be important, we will be happy to include it in the supplementary material of the revised manuscript.

Training engagement analysis:

9) I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

We acknowledge the referee’s concern on this matter and agree to replace the y-axis variable of Figure 2b with the number of performed practices (thus aligning with Figure 2a). This amendment will remove any potential effect of weaker performance on the engagement measurement and will provide clearer results.

  1. Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.

We will conduct the statistical test and report accordingly.

Learning results:

  1. When describing the Learning results (p10) I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

Thank you for pointing this out. The descriptive stats for MT0 will be added to the revised version of the manuscript.

  1. Sensitivity of sequence duration and IKI consistency (C) to reward:

I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.

This is an important question. Our analysis protocol was designed to ensure that incorrect trials do not contaminate or confound the results. To estimate the trial-to-trial difference in ∆MT (or C) and ∆R, we exclusively included pairs of contiguous trials where participants achieved correct performance and received feedback scores for both trials. For example, if a participant made a performance error on trial 23, we did not include ∆R or ∆MT estimates for the pairs of trials 23-22 and 24-23. Instead of excluding incorrect trials from our analyses, we retained them in our time series but assigned them a NaN (not a number) value in Matlab. As a result, ∆R and ∆MT were not defined for those two pairs of trials. The same applies to C. This approach ensured that our analyses are not confounded by incremental or decremental feedback scores between noncontiguous trials. In the past, when assessing the timing of correct actions during skilled sequence performance, we also considered only events that were preceded and followed by correct actions. This prevented effects such as post-error slowing from contaminating our results (Herrojo Ruiz et al., 2009, 2019). Therefore, we do not believe that any further reanalysis is required.

  • Ruiz MH, Jabusch HC, Altenmüller E. Detecting wrong notes in advance: neuronal correlates of error monitoring in pianists. Cerebral cortex. 2009 Nov 1;19(11):2625-39.

  • Bury G, García-Huéscar M, Bhattacharya J, Ruiz MH. Cardiac afferent activity modulates early neural signature of error detection during skilled performance. NeuroImage. 2019 Oct 1;199:704-17.
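The pairing rule described in the response above (keep error trials in the time series, mask them as NaN, and let any difference that touches a masked trial become undefined) can be illustrated with a minimal Python sketch. This is not the authors' Matlab code; the function name `paired_differences` and its three input lists are hypothetical and chosen only for illustration.

```python
import math


def paired_differences(mt, reward, correct):
    """Trial-to-trial differences, defined only for contiguous correct pairs.

    mt, reward: per-trial movement time and feedback score (hypothetical inputs).
    correct:    per-trial booleans, True when the sequence was performed correctly.
    """
    nan = math.nan
    # Error trials stay in the series but are masked with NaN.
    mt_masked = [m if ok else nan for m, ok in zip(mt, correct)]
    r_masked = [r if ok else nan for r, ok in zip(reward, correct)]
    # Any arithmetic involving NaN yields NaN, so for an error on trial n
    # both pairs (n, n-1) and (n+1, n) are left undefined.
    d_mt = [b - a for a, b in zip(mt_masked, mt_masked[1:])]
    d_r = [b - a for a, b in zip(r_masked, r_masked[1:])]
    return d_mt, d_r
```

With an error on the third of five trials, only the first and last differences remain defined; the two pairs that involve the error trial are NaN and would be skipped by any downstream fit.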

  1. I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length described in Figure 5b legend may reflect training-induced increased performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.

Thank you for raising this point. Figure 5b illustrates two distinct effects of reward changes on behavioral adaptation, which are expected based on previous research.

I. Practice effects: Firstly, we observe that as participants progress across bins of practice, the degree of improvement in behavior (reflected by faster movement time, MT) following a decrease in reward (∆R−) diminishes, consistent with our expectations based on previous work. Conversely, we found that ∆MT does not change across bins of practices following an increase in reward (∆R+). We appreciate the reviewer's suggestion regarding controlling for the reference movement time (MT) in the previous trial when examining the practice effect in the p(∆T|∆R−) and p(∆T|∆R+) distributions. In the revised manuscript, we will conduct the proposed control analysis to better understand whether the sensitivity of MT to score decrements changes across practice when normalising MT to the reference level on each trial. But see below for a preliminary control analysis.

II. Asymmetry of the effect of ∆R− and ∆R+ on performance: Figure 5b also depicts the distinct impact of score increments and decrements on behavioural changes. When aggregating data across practice bins, we consistently observed that the centre of the p(∆T|∆R−) distribution was smaller (more negative) than that of p(∆T|∆R+). This suggests that participants exhibited a greater acceleration following a drop in scores compared to a relative score increase, and this effect persisted throughout the practice sessions. Importantly, this enhanced sensitivity to losses or negative feedback (or relative drops in scores) aligns with previous research findings (Galea et al., 2015; Pekny et al., 2014; van Mastrigt et al., 2020).

We have conducted a preliminary control analysis to exclude the potential impact that reference movement time (MT) values could have on our analysis. We have assessed the asymmetry between behavioural responses to ∆R− and ∆R+ using the following analysis: We estimated the proportion of trials in which participants exhibited speed-up (∆T < 0) or slow-down (∆T > 0) behaviour following ∆R− and ∆R+ across different practice bins (bins 1 to 4). By discretising the series of behavioural changes (∆T) into binary values (+1 for slowing down, -1 for speeding up), we can assess the type of changes (speed-up, slow-down) without the absolute ∆T or T values contributing to our results. We obtained several key findings:

• Consistent with expectations (sanity check), participants exhibited more instances of speeding up than slowing down across all reward conditions.

• Participants demonstrated a higher frequency of speeding up following ∆R− compared to ∆R+, and this asymmetry persisted throughout the practice sessions (greater proportion of -1 events than +1 events). In the p(∆T|∆R+) distribution, 53% of events were speed-up events for the first bin of practices, and 55% for the last bin. Regarding p(∆T|∆R−), there were 63% speed-up events in each bin of practices, with this proportion exhibiting no change over time.

• Accordingly, the asymmetry of reward changes on behavioural adaptations, as revealed by this analysis, remained consistent across the practice bins.
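The sign-based control analysis summarised in the points above can be sketched as follows. This is an illustrative Python sketch rather than the authors' implementation: the function name `speedup_proportions` is hypothetical, and it assumes the inputs are already aligned so that `d_t[i]` is the change in movement time that follows the score change `d_r[i]`. Pairs that are undefined (NaN) or show no change are skipped, which is also an assumption.

```python
import math


def speedup_proportions(d_t, d_r):
    """Proportion of speed-up events (dT < 0) after score drops and score gains.

    Only the sign of each behavioural change is used, so the absolute
    movement-time values do not contribute to the result.
    """
    counts = {"after_drop": [0, 0], "after_gain": [0, 0]}  # [speed-ups, total]
    for dt, dr in zip(d_t, d_r):
        # Skip undefined pairs and pairs with no change (assumption).
        if math.isnan(dt) or math.isnan(dr) or dt == 0 or dr == 0:
            continue
        key = "after_drop" if dr < 0 else "after_gain"
        counts[key][1] += 1
        if dt < 0:  # coded as -1 (speeding up) in the text above
            counts[key][0] += 1
    return {k: (s / n if n else math.nan) for k, (s, n) in counts.items()}
```

Calling the function separately on the trials of each practice bin would give one pair of proportions per bin, which is the quantity compared across bins in the response.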

Thus, these preliminary findings provide an initial response to referee 2 and offer valuable insights into the asymmetrical effects of positive/negative reward changes on behavioural adaptations. We plan to include these results in the revised manuscript, as well as the full control analysis suggested by the referee. We will further expand upon their interpretation and implications.

  1. Another way to support the claim of reward change directionality effects on performance (rather than performance on performance), at least to some extent, would be to analyze the data from the last 10 days of the training, during which no rewards were given (pretending for analysis purposes that the reward was calculated and presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.

The reviewer’s concern is addressed in the previous question. Also, this analysis would not be possible because our Gaussian fit analyses use the time series of continuous reward scores, in which ∆R− or ∆R+ are embedded. These events cannot be analyzed once reward feedback is removed because we do not have behavioral events following ∆R− or ∆R+ anymore.

  1. This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C is of a different structure, the similar potential confounding effects could still be present.

We will conduct this analysis for the revised manuscript, similarly to the control analysis suggested by referee 2 on MT. Our preliminary control analysis, as explained above, suggests that the fundamental asymmetry in the effect of ∆R− and ∆R+ on behavioral changes persists when excluding the impact of reference performance values in our Gaussian fit analysis.

  1. Another related question (which is also of general interest) is whether the preferred app sequence (as indicated by the participants for Phase B) was consistently the one that yielded more reward? Was the continuous sequence the preferred one? This might tell something about the effectiveness of the reward in the task.

We have now conducted this analysis. There is in fact no evidence to conclude that the continuously rewarded sequence was the preferred one. The result shows that 54.5% of HV and 29% of the OCD sample considered the continuous sequence to be their preferred one. Of note, this preference may not necessarily be linked to the trial-by-trial reward sensitivity analysis. The latter assesses how learning may be affected by reward. The overall preference may be influenced by many other factors, such as the aesthetic appeal of particular combinations of finger movements.

Regarding both experiments 2 and 3:

  1. The change in context in experiments 2 and 3 is substantial and includes many different components. These changes should be mentioned in more detail in the Results section before describing the results of experiments 2 and 3.

Following the referee’s advice, we will move these details (currently written in the Methods section) to the Results section, when we introduce Phase B and before describing the results of experiments 2 and 3.

Experiment 2:

  1. In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

We clearly defined the theoretical framing of experiment 2 as a test of whether trained action sequences gain intrinsic value and we are pleased to hear that the referee agrees with this framing. If the referee is referring to the paragraph below (in the Discussion), we actually do acknowledge within this paragraph that a preference for the trained sequences can either be conceptually aligned with a habit OR a goal-directed behavior.

“On the other hand, we are describing here two potential sources of evidence in favor of enhanced habit formation in OCD. First, OCD patients show a bias towards the previously trained, apparently disadvantageous, action sequences. In terms of the discussion above, this could possibly be reinterpreted as a narrowing of goals in OCD (Robbins et al., 2019) underlying compulsive behavior, in favor of its intrinsic outcomes”

This narrowing of goals model of OCD refers to a hypothetically transitional stage of compulsion development driven by behavior having an abnormally strong, goal-directed nature, typically linked to specific values and concerns.

If the referee is referring to the penultimate sentence of the hypothesis section, this has been amended in response to Q5. We cannot find any other possible instances in this manuscript stating that experiment 2 is a test of habitual or goal-directed behavior.

Experiment 3:

  1. Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this reevaluation led participants to switch from habitual to goal-directed behavior.

Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

This comment is aligned with (and follows) the referee’s criticism of experiment 1 not achieving automatic and habitual actions. We have addressed this matter above, in response 1 to Referee 2.

Mobile-app performance effect on symptomatology: exploratory analyses:

  1. Maybe it would be worth testing if the patients with improved symptomatology (that attribute some of their symptom improvement to the app) also chose to play more during the training stage.

We have conducted an analysis to address this relevant question. There is no correlation between the YBOCS score change and the total number of practices, meaning that the patients whose symptomatology improved post-training did not necessarily choose to play the app more during the training stage (rs = 0.25, p = 0.15). Additionally, we have statistically compared the improvers (patients with reduced YBOCS scores post-training) and the non-improvers (patients with unchanged or increased YBOCS scores post-training) in their number of completed app practices during the training phase, and no differences were observed (U = 169, p = 0.19).

Discussion:

  1. Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

We do not agree that the work is either "inadequate or mis-framed" and will not therefore be substantially revising the Discussion. We will however clarify further the interpretation we have made and make explicit the alternative viewpoint of the referee. For example, we will retitle experiment 3 as “Re-evaluation of the learned action sequence: possible test of goal/habit arbitration” to acknowledge the referee’s viewpoint as well as our own interpretation.

  1. In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

We recognize that the term "disadvantageously" may be semantically ambiguous for some readers and therefore we will remove it.

Materials and Methods:

  1. The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

Details of the learning procedure of the novel sequence (in condition 3, experiment 3) will be provided in the methods of the revised version of the manuscript.

Minor comments:

  1. In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

We will follow this referee’s advice and will rephrase the sentence for clarity.

  1. With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

The word "further" will be removed.
