Neuroscience

The control of tonic pain by active relief learning

  1. Suyi Zhang (corresponding author)
  2. Hiroaki Mano
  3. Michael Lee
  4. Wako Yoshida
  5. Mitsuo Kawato
  6. Trevor W Robbins
  7. Ben Seymour (corresponding author)
  1. University of Cambridge, United Kingdom
  2. Advanced Telecommunications Research Institute International, Japan
  3. National Institute for Information and Communications Technology, Japan
Research Article
Cite as: eLife 2018;7:e31949 doi: 10.7554/eLife.31949

Abstract

Tonic pain after injury characterises a behavioural state that prioritises recovery. Although generally suppressing cognition and attention, tonic pain needs to allow effective relief learning to reduce the cause of the pain. Here, we describe a central learning circuit that supports learning of relief and concurrently suppresses the level of ongoing pain. We used computational modelling of behavioural, physiological and neuroimaging data in two experiments in which subjects learned to terminate tonic pain in static and dynamic escape-learning paradigms. In both studies, we show that active relief-seeking involves a reinforcement learning process manifest by error signals observed in the dorsal putamen. Critically, this system uses an uncertainty (‘associability’) signal detected in pregenual anterior cingulate cortex that both controls the relief learning rate, and endogenously and parametrically modulates the level of tonic pain. The results define a self-organising learning circuit that reduces ongoing pain when learning about potential relief.

https://doi.org/10.7554/eLife.31949.001

eLife digest

Chronic pain lasting longer than three months is a common problem that affects about 1 in 5 people at some point in their lives. The lack of effective treatments has led to widespread use of a group of drugs called opioids – the best-known example is morphine. Opioids work by activating the brain’s natural painkilling system and are useful to relieve short-term pain, for example in trauma or surgery, or in end-of-life care. Unfortunately, long-term use of opioids can cause many undesirable effects, including drug dependency. Misuse of opioids combined with the widespread availability of prescription drugs have contributed to the current crisis of opioid addiction and overdose.

A better understanding of how the brain’s natural painkilling system works could help scientists develop painkillers that offer relief without the harmful side effects of opioids. While unpleasant, pain is important for survival. After an injury, for example, pain saps motivation and forces people to rest and preserve their energy as they are healing. In a way, this sort of pain is healthy because it promotes recovery. There may be times when the brain might want to turn off pain, such as when an individual is seeking new ways to relieve or manage pain. For example, by finding a way to cool a burn.

Now, Zhang et al. show that the brain reduces pain while individuals are trying to find relief. In the experiments, a metal probe was attached to the arm of healthy volunteers and heated until it became painful but not hot enough to burn the skin. Then, the volunteers were asked to play a game in which they had to find out which button on a small keypad cooled down the probe. Sometimes it was easy to turn off the heat, sometimes it was difficult. During the game, volunteers reported how much pain they felt and Zhang et al. used brain imaging to see what happened in their brains.

When the subjects were actively trying to work out which button they should press, pain was reduced. But when the subjects knew which button to press, it was not. Next, Zhang et al. found that a part of the brain called the pregenual cingulate cortex was responsible for making decisions about when to turn off pain, and may thus trigger the brain’s natural painkilling system. A next step will be to see how this part of the brain decides to turn off pain, and whether it also controls opioid-like or other chemicals. This could improve the use of opioids, or even help to discover alternative treatments for chronic pain.

https://doi.org/10.7554/eLife.31949.002

Introduction

Tonic pain is a common physiological consequence of injury and results in a behavioural state that favours quiescence and inactivity, prioritising energy conservation and optimising recuperation and tissue healing. This effect extends to cognition, and decreased attention is seen in a range of cognitive tasks during tonic pain (Crombez et al., 1997; Lorenz and Bromm, 1997). However, in some circumstances, this could be counter-productive, for instance if attentional resources were required for learning some means of relief or escape from the underlying cause of the pain. A natural solution would be to suppress tonic pain when relief learning is possible. Whether and how this is achieved is not known, but it is important as it might reveal central mechanisms of endogenous analgesia.

Two observations provide potential clues as to how a relief learning system might modulate pain. First, in some situations, perceived controllability has been found to reduce pain (Salomons et al., 2004; Salomons et al., 2007; Wiech et al., 2014; Becker et al., 2015), suggesting that the capacity to seek relief can engage endogenous modulation. Second, instructed attention has commonly been observed to reduce pain (Bantick et al., 2002). Therefore, it may be that attentional processes that are internally triggered when relief is learnable might provide a key signal that controls reduction of pain.

In general, learning involves distinct processes of prediction (‘state learning’) and control (‘action learning’) (Mackintosh, 1983), although relief learning during tonic pain has not been thoroughly investigated. But a quantitative model of relief learning - one that describes the computational processes that are implemented in learning centres in the brain - would allow interrogation of how an attentional process might operate to modulate tonic pain. In the case of phasic pain, learning can be described by reinforcement learning (RL) models - a well-studied computational framework for learning from experience. RL models describe how to predict the occurrence of inherently salient events, and learn actions to exert control over them (maximising rewards, minimising penalties) (Seymour et al., 2004). RL models aim to provide a mechanistic (beyond a merely descriptive) account of the information processing operations that the brain actually implements (Dayan and Abbott, 2001), and have a solid foundation in classical theories of animal learning (Mackintosh, 1983). In such models, an agent learns state or action value functions through outcomes provided by interacting with the world. These functions can be learned by computing the error between predicted and actual outcomes, and using the error to improve future predictions and actions (Sutton and Barto, 1998). Experimentally, the validity of these models can be tested by comparing how well different model-generated predictors fit the actual behavioural and/or neural data (O'Doherty et al., 2007).

During learning, attention is thought to boost learning of predictive associations and suppress other irrelevant information. Computationally, this can be achieved by estimating the uncertainty as predictive associations are learned, and using this as a metric to control learning rates. Accordingly, high uncertainty corresponds to high attention and leads to more rapid learning (Dayan et al., 2000; Yu and Dayan, 2005). One well-recognised way of formalising uncertainty in RL is by computing a quantity called the associability, which calculates the running average of the magnitude of recent prediction errors (i.e. frequent large prediction errors implies high uncertainty/associability). The concept of associability is grounded in classical theories of Pavlovian conditioning (the ‘Pearce-Hall’ learning rule, Le Pelley, 2004; Pearce and Hall, 1980; Holland and Schiffino, 2016), and provides a good account of behaviour and neural responses during Pavlovian learning (Li et al., 2011; Boll et al., 2013; Zhang et al., 2016). In this way, associability reflects a computational construct that captures aspects of the psychological construct of attention.
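The hybrid Pearce-Hall rule described above reduces to two updates per trial: a value update whose learning rate is gated by associability, and an associability update that tracks a running average of unsigned prediction errors. The sketch below is illustrative only; the parameter names κ and η follow the paper, but the values and initial conditions here are arbitrary.

```python
def pearce_hall_update(V, alpha, outcome, kappa=0.5, eta=0.5):
    """One trial of a hybrid Rescorla-Wagner/Pearce-Hall update.

    V       -- current state value (relief expectation)
    alpha   -- current associability (uncertainty)
    outcome -- 1 if relief was delivered, 0 otherwise
    kappa   -- fixed scaling of the learning step
    eta     -- weight given to the most recent unsigned prediction error
    """
    delta = outcome - V                               # prediction error
    V_new = V + kappa * alpha * delta                 # associability gates the learning rate
    alpha_new = (1 - eta) * alpha + eta * abs(delta)  # running average of |PE|
    return V_new, alpha_new

# Frequent large prediction errors keep associability (and thus learning) high.
V, alpha = 0.0, 1.0
for outcome in [1, 0, 1, 1, 0]:
    V, alpha = pearce_hall_update(V, alpha, outcome)
```

Note that when η = 0 the associability never changes and the rule collapses to a fixed-learning-rate Rescorla-Wagner update.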

If, therefore, attention can be understood as an uncertainty signal that drives learning during relief-seeking, we can then test whether it modulates tonic pain in parallel. Standard models of RL do not include any mechanism by which the subjective experience of outcomes is under control, although in principle endogenous modulation of tonic pain could arise from any component of the learning system, including an associability signal. Using an associability signal in this way would make intuitive sense, because it would reduce ongoing pain when the requirement for learning was high.

The studies presented here set two goals: to delineate the basic neural architecture of relief learning from tonic pain (i.e. pain escape learning) based on a state and action learning RL framework; and to understand the relationship between relief learning and endogenous pain modulation, that is, to test the hypothesis that an attentional learning signal reduces pain. We studied behavioural, physiological and neural responses during two relief learning tasks in humans, involving (i) static and (ii) dynamic cue-relief contingencies. These tasks were designed to place a high precedence on error-based learning and uncertainty, as a robust test for learning mechanisms and dynamic modulation of tonic pain. Using a computationally motivated analysis approach, we aimed to identify whether behavioural and brain responses were well described by state and/or action RL learning systems, and examined whether and how they exerted control over the perceived intensity of ongoing pain.

Results

Experiment 1

Experiment 1 was an escape learning task (n = 19) with fixed, probabilistic cue-relief contingencies (Figure 1a). Each subject performed three instrumental sessions and three Pavlovian sessions, to allow us to compare active and passive relief learning (Figure 1b). During each session (lasting approximately 5 min), subjects were held in continuous pain by a thermal stimulator attached to their left arm, and temporary relief (i.e. escape) was given by rapidly cooling the thermode for 4 s, after which it returned to the baseline tonic pain level (Figure 1c). In instrumental sessions, subjects actively learned to select actions, a left or right button press, after viewing one of two visual cues (fractal images on a computer screen). For one of the cues, the probability of relief was 80% for one action and 20% for the other; for the other cue, the action relief probabilities were 60% and 40%. In the Pavlovian sessions, stimulus and outcome sequences were yoked to the instrumental sessions for each subject, and subjects were required simply to press a button matching a random direction appearing on screen 0.5 s after visual cue onset (to control for motor responses). Subjective ratings of pain and relief were collected on random trials after outcome delivery, with on average eight pain and eight relief ratings per paradigm (16 in total per subject). All behavioural data, including raw SCRs, choices, and ratings, can be found in the manuscript data attachment.

Experimental paradigms.

(a) Example trial in Experiment 1, which was an instrumental relief learning task (Ins) with fixed relief probabilities, yoked with an identical Pavlovian task (Pav) within subject. In instrumental trials, subjects saw one of two images (‘cues’) and then chose a left or right button press, with each action associated with a particular probability of relief. In the yoked Pavlovian session, subjects were simply asked to press a button to match the action shown on screen (appearing 0.5 s after CS onset). (b) Instrumental/Pavlovian session yoking and cue-outcome contingency in Experiment 1; arrows represent identical stimulus-outcome sequences. Note that in the contingency table, left and right button presses were randomised for both actions and cues. (c) Relief and no-relief outcomes. Individually calibrated, constant temperatures at around 44°C were used to elicit tonic pain; a brief drop in temperature of 13°C was used as a relief outcome (4 s in Experiment 1, 3 s in Experiment 2), whereas the temperature did not change for the same duration in no-relief outcomes. (d) Example trial in Experiment 2, where subjects performed an instrumental paradigm (only) involving unstable relief probabilities. The cue-action representation was different from that in Experiment 1: three cues were presented alongside each other, and subjects were required to choose one of the three using a button press. The position of each cue varied from trial to trial, and the same three cues were presented throughout. Tonic pain ratings were taken before the outcome was experienced, not after it as in Experiment 1. (e) Example traces of dynamic relief probabilities for the three displayed cues throughout all trials in eight sessions in Experiment 2, which required a constant trade-off between exploration and exploitation throughout the task. The dynamic relief probabilities also provided varying uncertainty throughout learning.

https://doi.org/10.7554/eLife.31949.003

Behavioural results

Choice

In instrumental learning, participants can learn which actions maximise the chance of relief. We assessed the ability of RL models to explain subjects’ choice data, in comparison to a simple win-stay-lose-shift (WSLS) decision-making rule. We compared two basic RL models that have been widely studied in neurobiological investigations of reward and avoidance - a temporal difference (TD) action learning model with a fixed learning rate, and a version of the TD model with an adaptive learning rate based on action associabilities (hybrid TD model). As mentioned above, the associability reflects the uncertainty in the action value, with higher associability indicating higher uncertainty during learning, and is calculated based on the recent average of the prediction error magnitude for each action. In a random-effects model comparison procedure (Daunizeau et al., 2014), we found that choices were best fit by the basic TD model (model frequency = 0.964, exceedance probability = 1, Figure 2a). Thus, there is no evidence that associability operates directly at the level of actions.
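As a sketch of the winning model, a fixed-learning-rate TD learner nudges the value of the chosen action towards each outcome by a fraction of the relief prediction error. The softmax decision rule and its inverse temperature below are illustrative assumptions (the exact decision rule is given in the Materials and methods); the learning rate is set near the fitted group mean reported in Table 5.

```python
import math

def softmax(q_values, beta=3.0):
    """Convert action values into choice probabilities (beta = inverse temperature)."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def td_update(Q, cue, action, outcome, lr=0.4):
    """Fixed-learning-rate TD update for the chosen (cue, action) pair."""
    delta = outcome - Q[cue][action]   # relief prediction error
    Q[cue][action] += lr * delta
    return delta

# Two cues x two actions, values initialised at zero (Q0 = 0).
Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
delta = td_update(Q, cue=0, action=0, outcome=1)  # unexpected relief
probs = softmax(Q[0])                             # learned action now more likely
```

Unlike the hybrid model, the learning rate here never adapts to recent surprise, which is what makes the two models separable in model comparison.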

Figure 2 with 3 supplements
Experiment 1: behavioural results.

(a) Choice-fitted model comparison: the TD model fit instrumental-session choices best (TD: action-learning model with fixed learning rate, Hybrid: action-learning model with associability as a changing learning rate, WSLS: win-stay-lose-shift model). Model frequency represents how likely a model is to have generated the data of a randomly chosen participant, while exceedance probability estimates how likely it is that one model is more frequent than the others (Stephan et al., 2009). (b) Instrumental vs Pavlovian session SCRs (n = 15; sessions with over 20% of trials below 0.02 amplitude excluded). (c) Associability from the hybrid model fitted trial-by-trial SCRs best in instrumental sessions (Assoc: associability, Hyb: hybrid model, RW: Rescorla-Wagner model). (d) Associability also fitted SCRs from Pavlovian sessions best. (e) Neither pain nor relief ratings differed significantly between instrumental and Pavlovian sessions (participants’ ratings were averaged for each of the four categories shown; mean = 8 ratings per person per category).

https://doi.org/10.7554/eLife.31949.004
Skin conductance responses (SCR)

To investigate physiological indices of learning, we examined trial-by-trial skin conductance responses (SCRs) during the 3 s cue time, before outcome presentation. SCRs obtained in instrumental sessions were higher compared to yoked Pavlovian sessions (Figure 2b, n = 15, see Materials and methods for session exclusion criteria, paired t-test T(14)=2.55, p=0.023), with the average SCR positively correlated between paradigms across individuals (Pearson correlation ρ=0.623, p=0.013, n = 15). Raw traces and cue-evoked responses of SCRs can be found in Figure supplements.

In Pavlovian aversive (fear) learning, SCRs have been shown to reflect the associability of Pavlovian predictions (Li et al., 2011; Boll et al., 2013; Zhang et al., 2016). Here, associability is calculated as the mean prediction error magnitude for the state (i.e. regardless of actions) (Le Pelley, 2004). In instrumental learning, Pavlovian learning of state-outcome contingencies still proceeds alongside action-outcome learning, distinct from instrumental choices, so Pavlovian state-outcome learning can be modelled in both instrumental and Pavlovian sessions. Consistent with previous studies of phasic pain, model-fitting revealed that a learning model with a state-based associability (‘hybrid’ model) best fit the SCR data in both Pavlovian and instrumental sessions (Figure 2c and Figure 2d, instrumental sessions: model frequency = 0.436, exceedance probability = 0.648, Pavlovian sessions: model frequency = 0.545, exceedance probability = 0.676), when tested against a competing simple Pavlovian Rescorla-Wagner model (akin to a TD model with only one state and a fixed learning rate). However, using the more stringent Protected Exceedance Probability analyses, the advantage of associability over other models was less conclusive (Figure 2—figure supplement 3). Together with the choice results, these analyses suggest that subjects use an associability-based RL mechanism for learning state values during both Pavlovian and instrumental pain escape, and a non-associability-based RL mechanism for learning action values in instrumental sessions. This divergence in learning strategies indicates that parallel learning systems coexist, which differ in how they incorporate information about uncertainty into learning, as well as in the nature of their behavioural responses.

Ratings

Subjective ratings of pain and relief were taken intermittently after outcomes during the task, to explore how pain modulation might depend on relief learning. Ratings were taken on a sample of trials, so as to minimise disruption of task performance. Based on the fact that both controllability and attention are implicated in endogenous control, we hypothesised that pain would be reduced when the state-outcome associability was high, reflecting an attentional signal associated with enhanced learning. However, other types of modulation are possible. For instance, pain might be non-specifically reduced in instrumental, versus Pavlovian learning, reflecting a general effect of instrumental controllability. Alternatively, pain might be reduced by the expectation of relief that arises during learning, as it is known that conditioning alone can support placebo analgesia responses (Colloca et al., 2008) (although the extent to which this occurs might depend on the acquisition of contingency awareness during learning) (Montgomery and Kirsch, 1997; Locher et al., 2017). In this case, pain would be positively correlated with the relief prediction error, since it reports the difference between expectation and outcome.

To test these competing hypotheses, we first compared the mean ratings of both pain (following a ‘no relief’ outcome) and relief (following a relief outcome) between Pavlovian and instrumental sessions, and found no significant difference (Mean±SEM, n = 19, mean = 8 ratings per person per category, instrumental pain: 6.97±0.13, Pavlovian pain: 6.91±0.20, instrumental relief: 6.46±0.24, Pavlovian relief: 6.33±0.27, between paradigm paired t-test both ratings p>0.5, Figure 2e). Hence, there is no support for a general effect of instrumental controllability on subjective pain and/or relief experience. We noted that mean pain and relief ratings were correlated with each other across individuals (ratings averaged across paradigms, Spearman’s correlation ρ=0.73, p<0.001), indicating that higher perceived tonic heat pain was associated with higher cooling-related relief.

Next, we correlated pain ratings with the state-based associability and TD prediction error. In accordance with our hypothesis, in instrumental sessions associability was found to be negatively correlated with pain ratings (mean Spearman’s ρ¯=−0.177, one-sample t-test of Fisher’s z-transformed correlation coefficients T(18)=-2.125, p=0.048). In Pavlovian sessions, however, we did not find a correlation (ρ¯=−0.114, T(18)=0.758, p=0.458). There was no significant interaction between associability and paradigm (repeated measure ANOVA F(1,18)=1.247, p=0.279). This suggests that although associability is associated with pain modulation, this effect is not necessarily specific to instrumental sessions.
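The group-level correlation procedure used here (a Spearman correlation per subject, Fisher z-transform of the coefficients, then a one-sample t-test across subjects) can be sketched as follows. The data below are simulated stand-ins for the real associability series and pain ratings; the subject count and effect size are illustrative.

```python
import numpy as np
from scipy import stats

def group_correlation_test(assoc_by_subj, pain_by_subj):
    """Spearman rho per subject, Fisher z-transform, one-sample t-test vs zero."""
    zs = []
    for assoc, pain in zip(assoc_by_subj, pain_by_subj):
        rho, _ = stats.spearmanr(assoc, pain)
        zs.append(np.arctanh(rho))  # Fisher z-transform
    return stats.ttest_1samp(zs, 0.0)

# Simulated subjects in whom higher associability lowers concurrent pain ratings.
rng = np.random.default_rng(0)
assoc = [rng.random(20) for _ in range(19)]
pain = [8.0 - 2.0 * a + rng.normal(0.0, 0.5, 20) for a in assoc]
t, p = group_correlation_test(assoc, pain)  # expect a negative t statistic
```

The z-transform makes the correlation coefficients approximately normal, which is what licenses the parametric t-test across subjects.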

We found that the prediction errors were negatively correlated with pain ratings in Pavlovian sessions (ρ¯=−0.356, T(18)=-3.198, p=0.005), but not instrumental sessions (ρ¯=−0.154, T(18)=0.720, p=0.481). That is, when relief was omitted (i.e. as was always the case on the pain rating trial), a larger frustrated (i.e. negative) relief prediction error was associated with an increase in pain - in contrast to the prediction of a placebo expectation hypothesis. Finally, we also looked at relief ratings, but failed to find any significant correlation with either associability or prediction error in either instrumental or Pavlovian sessions.

Neuroimaging results

The behavioural findings support the hypothesis that an associability signal arising during state-based learning is associated with reduction of pain. Next, we sought to identify (i) neural evidence for an error-based relief learning process and (ii) the neural correlates of the associability signal associated with tonic pain modulation. We implemented the TD action-learning model and the associability-based hybrid TD state-learning model as determined from the behavioural data, using group-mean parameters (the learning rate in the TD model, and the free parameters κ and η in the hybrid TD model) to re-estimate trial-by-trial prediction error/associability values for each subject as parametric modulators of fMRI BOLD time-series in general linear models.

Prediction errors

The prediction error represents the core ‘teaching’ signal of the reinforcement learning model, and we specified a priori regions of interest based on the areas known to correlate with the prediction error in previous reinforcement learning studies of pain and reward (ventral and dorsal striatum, ventromedial prefrontal cortex (VMPFC), dorsolateral prefrontal cortex (DLPFC), and amygdala (Seymour et al., 2005; Garrison et al., 2013; FitzGerald et al., 2012)).

First, we looked for brain responses correlated with the action prediction error from the TD model in instrumental sessions. This identified responses in bilateral putamen, bilateral amygdala, left DLPFC, and VMPFC (Figure 3a, Table 1).

Figure 3
Experiment 1: neuroimaging results, shown at p<0.001 uncorrected.

(a) TD model prediction errors (PE) as parametric modulators at outcome onset time (duration = 3 s).

(b) Model PE posterior probability maps (PPMs) from group-level Bayesian model selection (BMS) within the PE cluster mask, warm colour: TD model PE, cool colour: hybrid model PE (shown at exceedance probability P>0.7). (c) Axiomatic analysis of hybrid model PEs in instrumental sessions; ROIs were 8 mm spheres around BMS peaks favouring TD model PEs, in left putamen and VMPFC. (d) Associability (uncertainty) generated by the hybrid model, as a parametric modulator at choice time (duration = 0), in instrumental sessions. (e) Comparing pgACC activations across instrumental/Pavlovian paradigms; the ROI was an 8 mm sphere at [−3, 40, 5], the peak from overlaying the pgACC clusters from Experiments 1 and 2.

https://doi.org/10.7554/eLife.31949.009
Table 1
Multiple comparison correction for Experiment 1 (cluster-forming threshold p<0.001 uncorrected; regions from Harvard-Oxford atlas). *FWE cluster-level corrected (showing p<0.05 only).
https://doi.org/10.7554/eLife.31949.010
p*     k   T     Z     x    y    z    Region mask   (x, y, z: MNI coordinates, mm)
TD model PE, instrumental sessions
0.007  4   4.27  3.5   −21  −5   −14  Amygdala L
0.011  3   4.98  3.9   28   −1   −14  Amygdala R
0      28  5.31  4.07  −21  3    −7   Putamen L
           4.7   3.75  −28  −5   1
0.003  14  5.73  4.27  20   7    −7   Putamen R
0.034  2   3.75  3.18  28   −1   8
0.007  4   4.63  3.71  −17  3    −3   Pallidum L
0.003  9   5.2   4.01  17   7    −3   Pallidum R
Hybrid model PE, instrumental sessions
0.005  5   4.3   3.52  −21  −5   −14  Amygdala L
0.014  2   4.53  3.65  28   −1   −14  Amygdala R
0.004  12  5.02  3.92  −21  3    −7   Putamen L
0.012  6   4.55  3.66  −28  3    8
0.046  1   3.82  3.23  −28  11   −3
0.001  23  5.03  3.92  20   7    −7   Putamen R
           4.92  3.87  20   7    1
           4.39  3.57  24   −1   5
0.006  5   4.04  3.36  −17  3    −3   Pallidum L
0.005  6   4.82  3.81  17   7    1    Pallidum R
Hybrid model PE, Pavlovian sessions
None
Hybrid model associability, instrumental sessions
0.027  5   4.34  3.55  −2   37   5    Cingulate Anterior

Since action-outcome learning and state-outcome learning co-occur during instrumental sessions, we next modelled the state prediction error from the hybrid model in a separate regression model. In instrumental sessions, this revealed responses in similar regions to the TD action prediction error: in the striatum, right amygdala and left DLPFC (figure not shown, Table 1), consistent with the fact that state and action prediction errors are highly correlated.

Table 5
Experiment 1 learning model fitting results.
https://doi.org/10.7554/eLife.31949.011
Model (Options)             Data fitted (sessions)  Parameters            Mean   Std    Initial states
TD (*)                      choice (instrumental)   learning rate, α      0.401  0.087  Q0=0
WSLS (*)                    choice (instrumental)   pseudo Q (cue 1), p1  0.382  0.073  No hidden states
                                                    pseudo Q (cue 2), p2  0.458  0.075
Hybrid Action learning (*)  choice (instrumental)   free parameter κ      0.527  0.104  Q0=0
                                                    free parameter η      0.413  0.125  α0=1
RW - V (†)                  SCR (instrumental)      learning rate, α      0.492  0.013  V0=0
RW - V (†)                  SCR (Pavlovian)         learning rate, α      0.492  0.014  V0=0
Hybrid - Assoc (†)          SCR (instrumental)      free parameter κ      0.497  0.004  V0=0
                                                    free parameter η      0.495  0.004  α0=1
Hybrid - Assoc (†)          SCR (Pavlovian)         free parameter κ      0.498  0.003  V0=0
                                                    free parameter η      0.496  0.008  α0=1
Hybrid - V (†)              SCR (instrumental)      free parameter κ      0.492  0.012  V0=0
                                                    free parameter η      0.499  0.003  α0=1
Hybrid - V (†)              SCR (Pavlovian)         free parameter κ      0.494  0.005  V0=0
                                                    free parameter η      0.5    0.003  α0=1
  1. * Fitting options: muTheta, muPhi = 0; sigmaTheta, sigmaPhi = 1.
  2. † Fitting options: muTheta, muPhi = 0; sigmaTheta = 0.05; sigmaPhi = 1.

To test which regions were better explained by each, we conducted a Bayesian model selection (BMS) within the prediction error ROIs (a conjunction mask of clusters correlated with both prediction error signals). This showed that the action-learning TD model had higher posterior and exceedance probabilities in the dorsal putamen and VMPFC (Figure 3b, warm-colour clusters). The state-learning (hybrid) model better explained activity in the amygdala, ventral striatum, and DLPFC (Figure 3b, cool-colour clusters). Applying the same hybrid model prediction error signal in Pavlovian sessions identified only much weaker responses that did not survive multiple comparison correction, in regions including the left amygdala (figure not shown) (Table 1).

To further illustrate the nature of the outcome response, we calculated a median split of the preceding cue values (based on the TD model), and looked at the outcome response for relief and no-relief outcomes. A prediction error response should be (i) higher for relief trials and (ii) higher when the preceding cue value was low (i.e. when relief was delivered when it was not expected) (Roy et al., 2014). As illustrated in Figure 3c, this ‘axiomatic’ analysis reveals some features of the prediction error, but lacks the resolution to illustrate it definitively.
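The 'axiomatic' check can be sketched as a simple binning: split trials by outcome type and by a median split of the preceding cue value, then compare mean outcome responses across the four bins. The arrays below are hypothetical; a true prediction error response should be largest for relief following a low-value cue.

```python
import numpy as np

def axiomatic_bins(cue_values, relief, responses):
    """Mean outcome response in four bins: relief/no-relief x low/high preceding cue value."""
    cue_values = np.asarray(cue_values)
    relief = np.asarray(relief, dtype=bool)
    responses = np.asarray(responses)
    high = cue_values >= np.median(cue_values)  # median split of preceding cue value
    return {
        ("relief", "low value"):     responses[relief & ~high].mean(),
        ("relief", "high value"):    responses[relief & high].mean(),
        ("no relief", "low value"):  responses[~relief & ~high].mean(),
        ("no relief", "high value"): responses[~relief & high].mean(),
    }

# Hypothetical responses that track the prediction error (outcome minus cue value).
values = [0.1, 0.9, 0.1, 0.9]
relief = [1, 1, 0, 0]
bins = axiomatic_bins(values, relief, [o - v for o, v in zip(relief, values)])
```

A signal satisfying the prediction error axioms should show relief/low > relief/high, and no-relief/low > no-relief/high.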

Associability

Since the behavioural data showed that the state-based associability correlated negatively with tonic pain ratings, we examined BOLD responses correlated with trial-by-trial associability from the hybrid model, by using the associability as a parametric regressor at the choice time (see Materials and methods for details of GLMs). We specified a priori ROIs according to regions previously implicated in attention and controllability-related endogenous analgesia, notably pregenual anterior cingulate cortex (pgACC), posterior insula and ventrolateral prefrontal cortex (VLPFC) (Salomons et al., 2007; Wiech et al., 2006); and associability (amygdala) (Li et al., 2011; Zhang et al., 2016; Boll et al., 2013).

We found correlated responses only in pgACC, in instrumental sessions (Figure 3d, Table 1, MNI coordinates of peak: [−2, 37, 5]). No significant responses were observed in Pavlovian sessions. Figure 3e illustrates individual subjects’ beta values extracted from an 8 mm diameter spherical ROI mask built around peak coordinates [−3, 40, 5]. Instrumental sessions had higher response magnitude in pgACC compared to Pavlovian sessions across subjects (Instrumental sessions: one-sample t-test against 0 T(18)=3.746, p=0.0015, Pavlovian sessions: one-sample t-test against 0 T(18)=-1.230, p=0.235, paired t-test for instrumental versus Pavlovian T(18)=3.317, p=0.0038).

Summary of experiment 1

In summary, the data indicate that (i) relief action learning is well described by a RL (TD) learning process, with action prediction error signals observed in the dorsal putamen, (ii) that state-outcome learning proceeds in parallel to action-outcome learning, and can be described by an associability-dependent hybrid TD learning mechanism, and (iii) that this state associability modulates the level of ongoing tonic pain during instrumental learning, with associated responses in pgACC.

This provides good evidence of a relief learning system that modulates pain according to learned uncertainty, and raises two important questions. First, can the associability signal be distinguished from other uncertainty signals that may arise in learning? Importantly, the use of fixed probabilities in the task means that associability tends to decline during sessions, raising the possibility that more complex models of uncertainty and attention might better explain the data, for instance those that involve changing beliefs that arise in changing (non-stationary) environments. Second, does the modulation of pain ratings occur throughout the trial? In the task, pain ratings were taken at the outcome of the action, and only when relief was frustrated, raising the possibility that they reflect an outcome-driven response, as opposed to a learning-driven process modifying the ongoing pain. With these issues in mind, we designed a novel task to test whether the model generalised to a different paradigm with greater demands on flexible learning.

Experiment 2

In Experiment 2, 23 new subjects participated in a modified version of the instrumental escape learning task from Experiment 1, with a number of important differences. First, subjects performed only instrumental sessions (eight sessions of 24 trials each), given the absence of a global effect of instrumental versus Pavlovian learning on pain in the first experiment. Second, subjects were required to choose one of three simultaneously presented visual cues to obtain relief, with the position of each cue varying randomly from trial to trial. This was done to better distinguish state-based and action-specific associability, both experimentally and theoretically (Figure 1d). Third, the action-outcome contingencies were non-stationary: the relief probability associated with each cue varied slowly throughout the experiment, controlled by a random walk algorithm that kept probabilities between 20% and 80% (Figure 1e). This ensured that associability varied constantly through the task, encouraging continued relief exploration, and allowed us to better resolve more complex models of uncertainty (see below). It also reduced the potential confounding correlation between associability and general habituation of SCRs. Fourth, we increased the frequency of tonic pain ratings (10 per session, 80 per subject in total) to enhance power for identifying modulatory effects on pain. Fifth, ratings were taken after the action but before the outcome, to provide an improved assessment of ongoing tonic pain modulation without interference from the outcome. Finally, we also collected SCRs bilaterally, to enhance data quality given the importance of the SCR in inferences about associability.
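Non-stationary contingencies of this kind can be reproduced with a bounded random walk per cue. The sketch below assumes Gaussian steps clipped to the 20-80% range; the step size and the exact walk algorithm used in the experiment are not specified here, so these values are illustrative.

```python
import numpy as np

def bounded_random_walk(n_trials, lo=0.2, hi=0.8, step_sd=0.03, seed=0):
    """Slowly drifting relief probability, clipped to [lo, hi]."""
    rng = np.random.default_rng(seed)
    p = np.empty(n_trials)
    p[0] = rng.uniform(lo, hi)
    for t in range(1, n_trials):
        p[t] = np.clip(p[t - 1] + rng.normal(0.0, step_sd), lo, hi)
    return p

# One independent trace per displayed cue: 8 sessions x 24 trials = 192 trials.
probs = [bounded_random_walk(192, seed=s) for s in range(3)]
```

Keeping probabilities away from 0 and 1 ensures no cue is ever fully reliable, so uncertainty, and hence associability, never settles to zero.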

Behavioural results

Choice

In addition to the simple TD and hybrid action-learning TD models compared in Experiment 1, the modified paradigm allowed us to test more sophisticated model-based learning models, including a hidden Markov model (HMM) (Prévost et al., 2013) and a hierarchical Bayesian model (Mathys et al., 2011). Both models incorporate a belief about environmental stability into learning, that is, whether a cue that previously predicted relief reliably has stopped being reliable during the course of the experiment. This is achieved by tracking the probability of state transition in the HMM, or the environmental volatility in the hierarchical Bayesian model. Despite the greater demands of the non-stationary task compared to Experiment 1, model comparison showed that the basic TD action learning model still best predicted choices (model frequency = 0.624, exceedance probability = 0.989), followed by the HMM (model frequency = 0.192, exceedance probability = 0.006) and the hybrid action-learning model (model frequency = 0.174, exceedance probability = 0.004) (Figure 4a, see Methods for full details).

Figure 4 with 3 supplements see all
Experiment 2: behavioural results.

(a) Model comparison showed that the TD model fitted choices best (Bayesian: hierarchical Bayesian model, HMM: hidden Markov model, Hybrid: action-learning model with associability as a changing learning rate). (b) SCRs measured on the side with thermal stimulation (‘Stim side’, left hand) were lower than those on the side without stimulation (‘Non-stim side’, right hand), but both were highly correlated. (c) Associability from the state-learning hybrid model fitted SCRs best, as in Experiment 1. (d) Trial-by-trial associability from the hybrid model fitted pain ratings best compared with other uncertainty measures (entropy: HMM entropy, surprise: TD model prediction error magnitude from the previous trial, null model: regression with no predictors). (e) Regression coefficients with associability as the uncertainty predictor were significantly negative across subjects.

https://doi.org/10.7554/eLife.31949.012
SCR

SCRs were recorded from the side with thermal stimulation (left hand) and the side without stimulation (right hand). The left side had lower mean SCRs (Figure 4b; left/right paired t-test T(19)=-2.67, p=0.015, n = 20, exclusion criteria as in Experiment 1); however, trial-by-trial SCRs were highly correlated between the two sides within individual subjects (mean Pearson correlation ρ¯=0.733, 18 out of 20 participants with p<0.001). This suggests that although overall SCR amplitude might be suppressed by the tonic heat stimulus, this did not affect event-related responses.

Using the same model-fitting procedure as in Experiment 1 (with the addition that the model now predicted SCRs on both hands for each trial), we found that the associability from the state-outcome hybrid model again provided the best fit to trial-by-trial SCRs (Figure 4c, model frequency = 0.667, exceedance probability = 0.954). Indeed, the associability-SCR fit had a much higher model exceedance probability than in Experiment 1, presumably from including the less attenuated SCRs from the non-stimulated right side.

Ratings

Experiment 1 suggested that the associability was correlated with modulation of tonic pain ratings. However, given the dynamic nature of Experiment 2, we investigated whether uncertainty measures related to other aspects of learning might offer a better account. To do this, we fitted multiple regression models to trial-by-trial ratings for each participant as follows:

(1) Rating = β1·Relief + β2·log(Trial) + β3·Predictor

where the ‘Relief’ term is the number of trials since the previous relief outcome, log(Trial) is the log of the trial number within session (1-24), and ‘Predictor’ is the model-generated uncertainty value. The ‘Relief’ and log(Trial) terms were included to account for potential temporal and sessional effects of the tonic pain stimulus.
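As a concrete sketch, the per-participant regression in Equation 1 can be fitted by ordinary least squares. This is a minimal illustration with NumPy (function name and fitting routine are our own; note the model as stated has no intercept term):

```python
import numpy as np

def fit_rating_regression(ratings, trials_since_relief, trial_in_session, predictor):
    """Least-squares fit of Equation 1 for one participant.

    ratings            : trial-by-trial tonic pain ratings (0-10)
    trials_since_relief: 'Relief' term, trials since the last relief outcome
    trial_in_session   : trial number within session (1-24)
    predictor          : model-generated uncertainty value (e.g. associability)
    Returns (beta_relief, beta_logtrial, beta_predictor).
    """
    X = np.column_stack([trials_since_relief,
                         np.log(trial_in_session),
                         predictor])
    beta, *_ = np.linalg.lstsq(X, np.asarray(ratings, float), rcond=None)
    return beta
```

The resulting per-subject coefficients can then be taken to a group-level one-sample t-test, as in the analysis described below.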

We built regression models with different uncertainty signals as the predictor for comparison: the state-based associability from the hybrid model (as in Experiment 1), the entropy of state-action posterior probabilities (an approximation of uncertainty over values) in an HMM, the absolute value of the prediction error from the previous trial in the TD model (as a model of surprise), and a null model that did not include the ‘Predictor’ term (Figure 4d). In this analysis, the state-learning hybrid associability again best fitted the pain ratings (model frequency = 0.698, exceedance probability = 0.980; n = 22, 1550 ratings, one participant was excluded for having >90% identical ratings). Regression coefficients with the hybrid model associability as the uncertainty predictor were significant across subjects (Figure 4e, one-sample t-test for the three sets of coefficients: ‘Relief’ term: T(21)=-4.004, p<0.001 (i.e. habituation, reduced pain over time after relief), log(Trial) term: T(21)=1.017, p=0.321, associability term: T(21)=-2.643, p=0.015).

Neuroimaging results

Prediction errors

We found that the TD model action prediction errors were robustly correlated with BOLD responses in similar regions to those identified in Experiment 1, including left dorsal putamen, bilateral amygdala, and left DLPFC (Figure 5a, Table 2). Of these, BMS showed the TD model had higher posterior and exceedance probabilities in the dorsal putamen, as well as the amygdala and DLPFC (Figure 5b, warm colour clusters). The state-learning hybrid model explained prediction error responses in several areas, but outside our original a priori regions of interest (see Figure 5b, cool colour clusters).

Table 6
Experiment 2 learning model fitting results.
https://doi.org/10.7554/eLife.31949.017
| Model (options) | Data fitted | Parameters | Mean | Std | Initial states |
|---|---|---|---|---|---|
| TD (*) | choice | learning rate α | 0.577 | 0.28 | Q0 = 0 |
| Hybrid action learning (*) | choice | free parameter κ | 0.774 | 0.381 | Q0 = 0 |
| | | free parameter η | 0.14 | 0.139 | α0 = 1 |
| HMM (*) | choice | state transition probability β | 0.275 | 0.213 | Q0 = 0.5 |
| | | relief outcome bias c | 0.535 | 0.212 | |
| | | no relief outcome bias d | 0.027 | 0.072 | |
| Bayesian (‡) | choice | level 2 (outcome) κ | 0.331 | 0.239 | Q0 = 0 |
| | | level 2 (outcome) ω | −0.423 | 1.396 | |
| | | level 3 (belief) θ | 0.45 | 0.03 | |
| RW - V (†) | SCR (bilateral) | learning rate α | 0.46 | 0.054 | V0 = 0 |
| Hybrid - Assoc (†) | SCR (bilateral) | free parameter κ | 0.49 | 0.01 | V0 = 0 |
| | | free parameter η | 0.488 | 0.027 | α0 = 1 |
| Hybrid - V (†) | SCR (bilateral) | free parameter κ | 0.48 | 0.034 | V0 = 0 |
| | | free parameter η | 0.496 | 0.013 | α0 = 1 |

* Fitting options: muTheta, muPhi = 0; sigmaTheta, sigmaPhi = 1.

† Fitting options: muTheta, muPhi = 0; sigmaTheta = 0.05; sigmaPhi = 1.

‡ Fitting options: muTheta = [0, −2, 0]; muPhi = 0; sigmaTheta, sigmaPhi = 1.

Figure 5 with 2 supplements see all
Experiment 2: neuroimaging results, shown at p<0.001 uncorrected: (a) TD model prediction errors (PE), at outcome onset time (duration = 3 s). 

(b) Model PE posterior probability maps (PPMs) from group-level Bayesian model selection, warm colour: TD model PE, cool colour: hybrid model PE (both shown at exceedance probability p>0.80). (c) Axiom analysis, separating trials according to outcomes and predicted relief values (bins 1–3 from low to high); the BOLD activity pattern in the striatum (putamen) satisfied the axioms of a relief PE. (d) Associability uncertainty generated by the hybrid model correlated with pgACC activity at choice time (duration = 0). (e) pgACC activation beta values across all subjects; the ROI was an 8 mm sphere at [−3, 40, 5], the peak from overlaying the pgACC clusters from Experiments 1 and 2.

https://doi.org/10.7554/eLife.31949.018

As previously, we further illustrated the pattern of outcome responses as a function of preceding cue value and relief/no-relief in an ‘axiomatic’ analysis. We split trial values into three bins, allowing a closer inspection of responses permitted by our larger number of trials. This revealed a clear prediction error-like pattern in the dorsal putamen, but a less clear-cut pattern in the amygdala and DLPFC (Figure 5c). Therefore, across all analysis methods and both experiments, the left dorsal putamen robustly exhibited a response profile consistent with an escape-based relief prediction error.

Associability

Following the same analysis as in Experiment 1, we found again that pgACC BOLD responses correlated with trial-by-trial associability from the state-learning hybrid model (Figure 5d–e, Table 2). The peak from this analysis was almost identical to that in Experiment 1 (overlaid clusters are shown in the figure supplements). In addition, we used trial-by-trial pain ratings as a parametric modulator, but did not find significant pgACC responses, suggesting that the associability response was unlikely to be driven solely by pain perception itself. Taken together, this indicates that the pgACC associability response is robust across experimental designs.

Summary of experiment 2

In summary, Experiment 2 reproduced the main results of Experiment 1 within a non-stationary relief environment. First, the dorsal putamen correlated with an action-relief prediction error from the RL model. Second, the pgACC correlated with a state-based associability signal that in turn was associated with reduced tonic pain. In particular, this modulation of pain was present after the cue was presented (and not just at the outcome, as in Experiment 1) and was better explained by the associability signal when compared against alternative uncertainty measures.

Discussion

Across both experiments, the results provide convergent support for two key findings. First, we show that relief seeking from the state of tonic pain is supported by a reinforcement learning process, in which optimal escape actions are acquired using prediction error signals, which are observed as BOLD signals in the dorsal putamen. Second, we show that during learning, the level of ongoing pain is reduced by the learned associability associated with state-based relief predictions. This signal thus reduces pain when there is a greater capacity to learn new information and is associated with BOLD responses in the pregenual anterior cingulate cortex. Together, these results identify a learning circuit that governs tonic pain escape learning whilst also suppressing pain according to the precise information available during learning. In doing so, it solves the problem of balancing tonic pain with the requirement to actively learn about behaviour that could lead to relief.

The findings highlight the dual function of a state-based relief associability signal during tonic pain escape. Associability has its theoretical underpinnings in classical theories of associative learning and attention (i.e. the Pearce-Hall theory, Pearce and Hall, 1980), and its mathematical implementation here is as an approximate uncertainty quantity derived from computing the running average of the magnitude of the prediction error (Sutton, 1992; Le Pelley, 2004). This uncertainty signal effectively captures how predictable the environment is: when uncertainty is high (because of many recent large prediction errors), it increases the speed of acquisition through increasing the learning rate, and so accelerates convergence to stable predicted values. It is therefore an effective attention-like signal for mediating endogenous analgesia, because it selectively facilitates active relief seeking by suppressing pain only when it is necessary. This conception of the role of uncertainty in pain may explain why uncertainty has been shown to enhance phasic pain (Yoshida et al., 2013), where pain acts as the signal that drives learning, and to suppress tonic pain, where pain acts to reduce general cognition. In both instances, the role of uncertainty and attention is to facilitate learning.

A caveat is that associability cannot distinguish unreliable cues (inherently poor predictors of outcomes), and so does not discriminate between reducible and irreducible uncertainty, bearing in mind there is little adaptive logic in suppressing pain for unreliable predictors. Over extended time-frames, it is possible that the learning system recognises this and reduces endogenous control. However, in rodent studies of associative learning, associability is maintained even after several days of training (Holland et al., 2002), and it is possible that salient cues in aversive situations maintain the ability to command attention and learning for longer than would be predicted by ‘optimal’ Bayesian models.

The localisation of the associability signal to the pgACC is consistent with a priori predictions. The region is known to be involved in threat unpredictability (Rubio et al., 2015; Nitschke et al., 2006), computations of uncertainty during difficult approach-avoidance decision-making (Amemori and Graybiel, 2012), and in the perseverance of behaviour during foraging (McGuire and Kable, 2015; Kolling et al., 2012). It is distinct from a more anterior region in the ventromedial prefrontal cortex associated with action value (FitzGerald et al., 2012). More importantly, it has been specifically implicated in various forms of endogenous analgesia, including coping with uncontrollable pain (Salomons et al., 2007), distraction (Valet et al., 2004), and placebo analgesia (Bingel et al., 2006; Eippert et al., 2009). However, an open question remains about the role of conscious awareness in driving pgACC-related endogenous control - a factor that is often important in these other paradigms. Whether or not the role of associability is modulated by the metacognitive awareness of uncertainty or controllability would be an important question for future studies.

The pgACC has been suggested to be central to a ‘medial pain system’ and the descending control of pain, with known anatomical and functional connectivity to key regions including the amygdala (Derbyshire et al., 1997; Vogt et al., 2005; Salomons et al., 2015) and PAG (Stein et al., 2012; Buchanan et al., 1994; Vogt, 2005; Domesick, 1969). Evidence of high levels of μ-opioid receptors within pgACC (Vogt et al., 2005), where increased occupancy has been found in both acute and chronic pain (Zubieta et al., 2005; Jones et al., 2004), further illustrates the pgACC’s potential role in cortical control of pain.

The results provide a formal computational framework that brings together theories of pain attention, controllability, and endogenous analgesia. Previous demonstrations of reduced pain (albeit typically for phasic, not tonic pain) have been inconsistent (Becker et al., 2015; Salomons et al., 2004; Salomons et al., 2007; Wiech et al., 2014; Wiech et al., 2006; Mohr et al., 2012). Our results offer insight into why: they suggest that endogenous analgesia is not a non-specific manifestation of control, but rather a specific process linked to the learnable information.

From the perspective of animal learning theory, the experiments here show how motivation during the persistent pain state can be understood as an escape learning problem, in which the state of relief is determined by the offset of a tonic aversive state (Mackintosh, 1983; Solomon and Corbit, 1974). This is theoretically distinct from the better-studied form of relief that results from omission of otherwise expected pain or punishment (Konorski, 1967), and which motivates avoidance behaviour (Mowrer, 1960). In our task, acquisition of dissociable behavioural responses (SCRs and choices) reveals the underlying theoretical architecture of the escape learning process, which involves both parallel state-outcome and action-outcome learning components. The action-outcome learning error signal localises to a region of the dorsolateral striatum (dorsal putamen). Striatal error signals are seen across a broad range of action learning tasks, although the region here appears more dorsolateral than previously noted in avoidance learning (Kim et al., 2006; Seymour et al., 2012; Delgado et al., 2009). It is not possible to definitively identify whether avoidance and escape use distinct errors, but it is well recognised that there are multiple error signals in dorsal and ventral striatum, for instance reflecting ‘model-based’ (cognitive), ‘model-free’ (including stimulus-response habits) and Pavlovian control (Tricomi et al., 2009; Schonberg et al., 2010; Yin et al., 2004). The reinforcement learning model we describe is a ‘model-free’ mechanism, since it learns action values but does not build an internal model of state-outcome identities and transition probabilities (Daw et al., 2005). However, it is likely that a model-based system co-exists and might be identifiable with appropriate task designs (Daw et al., 2011).

Developing a computational account of relief learning and endogenous control may also help us understand how the brain contributes to the pathogenesis and maintenance of chronic pain (Navratilova and Porreca, 2014). Adaptive learning processes are thought to be important in chronic pain: learning and controllability have been proposed to play a role in the pathogenesis and maintenance of chronic pain (Vlaeyen, 2015; Flor et al., 2002; Apkarian et al., 2004; Salomons et al., 2015), and brain regions such as the medial prefrontal cortex and striatum have been consistently implicated in clinical studies, for example in pain offset responses (Baliki et al., 2010) and resting functional connectivity in chronic back pain (Baliki et al., 2008; Baliki et al., 2012; Fritz et al., 2016; Yu et al., 2014). In addition to suggesting a possible computational mechanism that might underlie pain susceptibility in these patients, the results highlight the pgACC as a potential target for therapeutic intervention.

Materials and methods

Subjects

Two separate groups of healthy subjects participated in the two neuroimaging experiments (Experiment 1: n = 19, six female, age 26.1±5.1 years; Experiment 2: n = 23, five female, age 23.9±3.1 years). All subjects gave informed consent prior to participation, had normal or corrected-to-normal vision, and were free of pain conditions or pain medications. The two experiments were performed at different institutes and approved by the relevant ethics and safety committees of the National Institute of Information and Communications Technology, Japan (Experiment 1), and the Advanced Telecommunications Research Institute International, Japan (Experiment 2).

Experimental design

Experiment 1

Subjects participated in interleaved instrumental conditioning and yoked Pavlovian relief conditioning sessions, in which they actively or passively escaped from tonic pain, respectively. Tonic pain was maintained by constant thermal stimulation to the left inner forearm (see ‘Stimulation’ for details), and relief was induced by temporarily cooling the heat stimulus, which abolishes pain and causes a strong sense of relief.

In the instrumental conditioning sessions, subjects learned to select actions based on different cues. The cues were abstract fractal images on a computer screen. Actions were left or right button-presses on a response pad, and successful outcomes were brief cooling (relief) periods from the tonic painful heat. There were two types of visual cue: an ‘easy’ cue with a high probability of relief when paired with a particular response (80% relief chance with one of the button-press responses and 20% chance with the other), and a ‘hard’ visual cue with a lower probability of relief for a particular response (60%/40% relief chance for the two responses). These different outcome probabilities were used to induce experimental variability in the uncertainty of relief prediction. On each trial, the visual cue (conditioned stimulus, CS) appeared on screen for 3 s, during which subjects were asked to make the left or right button-press response. An arrow corresponding to the chosen direction was superimposed on the cue from the decision until the end of the 3 s display period. The disappearance of the cue and response arrow was followed immediately by the outcome: a temporary decrease in the temperature of the painful heat stimulus (a reduction of 13°C from the tonic level for 4 s), or no change in temperature, such that the constant pain continued straight into the next trial. The next trial started after a jittered inter-trial interval (ITI) of 4–6 s (mean = 5 s) after outcome presentation concluded (Figure 1a). There were 20 trials per session, with equal numbers of ‘easy’ and ‘hard’ cues (n = 10 each). Each session lasted about 5 min.

The yoked Pavlovian conditioning task was identical to the instrumental task, except that subjects did not have control over the outcomes through their responses. Instead, the sequence of cues and outcomes from the previous instrumental session was used (or the first instrumental session from the previous subject, for subjects who started with a Pavlovian session), although subjects were not aware of the yoking process. A different set of fractal images was used for the yoked Pavlovian sessions, so learning from an instrumental session could not be transferred to its corresponding Pavlovian session. To control for motor responses across both session types, subjects were asked to press the response button according to a randomised indicator arrow, which appeared on screen 0.5 s after CS presentation. This is common in neuroimaging studies of Pavlovian and instrumental learning, and it was clearly explained to subjects that these actions bore no relationship to outcomes.

Each subject repeated the instrumental and yoked Pavlovian sessions three times (six sessions in total). They were clearly instructed whether each session was Pavlovian or instrumental. To remove any order confounds, the session order was alternated within and between subjects (i.e. order ABABAB or BABABA), with half the subjects starting with the instrumental task and the other half with the Pavlovian task. A short break was taken every two sessions to allow the experimenter to change the location of the heat stimulation probe, to minimise effects of habituation/sensitisation across the whole experiment.

Subjective ratings of perceived trial outcomes (pain relief or ongoing pain) were collected near the beginning, middle, and end of each session, in identical order for each instrumental session and its yoked Pavlovian counterpart. A 0–10 rating scale appeared 3.5 s after outcome presentation (0.5 s overlap with the relief duration, if any), ranging from 0 (no pain at all) to 10 (unbearable pain) for no-relief outcomes (red scale in Figure 1a), and from 0 (no relief at all) to 10 (very pleasant relief) for relief outcomes (green scale). Although ratings are inherently subjective, their modulation reflects an objective process that may explain a component of this apparent subjectivity. This does raise the issue of whether the subjective relief ratings influence the outcome values learned in the RL model, but this (presumably subtle) effect is beyond the power of these experiments to resolve.

Experiment 2

Experiment 2 was a purely instrumental relief conditioning task, similar to that of Experiment 1. However, in this task three visual cues were presented on screen simultaneously for 3 s, during which the subject was asked to choose one (Figure 1d) with a three-button response pad. Each cue had a varying relief probability, generated by a random walk process (probabilities changing with a step size of 0.1, bounded between 0.2 and 0.8, with a random start). Relief outcomes were identical to those in Experiment 1, except that the duration was reduced to 3 s, which was enough to produce a similar relief sensation with a shorter trial time. Subjects repeated the same task for eight sessions (24 trials each), with the same visual cues throughout. However, several subjects did not complete all sessions because the SCR experimental set-up took longer than planned, which reduced the time available for the task; hence the overall average was 7.08±1.44 sessions per subject.
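For illustration, the bounded random walk generating each cue's relief probability might be sketched as follows. The step size (0.1) and bounds (0.2-0.8) are from the text; the uniform random start and clipping at the bounds are our assumptions, as the exact boundary rule is not specified:

```python
import random

def relief_probability_walk(n_trials, step=0.1, lo=0.2, hi=0.8, seed=None):
    """Bounded random walk for one cue's relief probability (illustrative)."""
    rng = random.Random(seed)
    p = rng.uniform(lo, hi)          # random start within bounds (assumption)
    probs = []
    for _ in range(n_trials):
        probs.append(p)
        p += rng.choice([-step, step])   # move up or down by one step
        p = min(max(p, lo), hi)          # clip at the bounds (assumption)
    return probs
```

Running three independent walks of this kind (one per cue) produces slowly drifting, partially decorrelated relief probabilities like those in Figure 1e.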

Subjective pain ratings were collected after the 3 s choice period and before outcome presentation, in 10 random trials out of 24 in each session, using the same 0–10 rating scale as in Experiment 1 (red scale only). We have summarised the details of the ratings from both experiments in Table 3.

Stimulation

Painful tonic thermal stimuli were delivered to the subject’s skin surface above the wrist on the left inner forearm, through a contact heat-evoked potential stimulator (CHEPS, Medoc Pathway, Israel). The CHEPS thermode is capable of rapid cooling at 40°C/s, which made rapid temporary pain relief possible in an event-related design.

The temperature of painful tonic stimuli was set according to the subject’s own pain threshold calibrated beforehand. In Experiment 1, before the task, two series of 6 pre-set temperatures were presented in random order (set 1: mean ± std 43.7±1.7°C; set 2: 44.6±0.6°C), with each temperature delivered for 8 s, after which the subject determined whether the stimulation period was painful or not (ISI = 8 s). The higher of the two lowest painful temperatures from the two tests was used as the tonic stimulation temperature.

In Experiment 2, 10 temperatures were presented in each of two series, both randomly generated (44.4±0.7°C). After each 8 s stimulation, subjects rated their pain on a 0–10 VAS scale, and the ratings were fitted with a sigmoid function. The tonic temperature was then chosen from the set 44, 44.2, 44.5, 44.8 and 45°C, taking the value closest to and below the model-fitted temperature corresponding to VAS = 8.

The final temperatures used did not differ substantially between the two experiments despite the change in thresholding method (Experiment 1: 44.3±0.2°C, Experiment 2: 44.5±0.4°C). The relief temperature was set at a constant 13°C below the threshold temperature for all subjects.

Physiological measures

Skin conductance responses (SCRs) were measured using MRI-compatible BrainAmp ExG MR System (Brain Products, Munich, Germany) with Ag/AgCl sintered MR electrodes, filled with skin conductance electrode paste.

In Experiment 1, SCR data were recorded on volar surfaces of distal phalanges of the second and fourth fingers on the left (tonic pain side with thermode attached). In Experiment 2, data were recorded from both hands, in the same location on the left (with thermode), and on the hypothenar eminences of the palm on the right (button press hand without thermode), with electrodes approximately 2 cm apart. The signals were collected using BrainVision software at 500 Hz with no filter.

Off-line processing and analysis were implemented in MATLAB 7 (The MathWorks Inc., Natick, MA) with the PsPM toolbox (http://pspm.sourceforge.net/). Data were down-sampled to 10 Hz and band-pass filtered at 0.0159–2 Hz (1st-order Butterworth). Given the variable nature of SCR onset and duration in a learning experiment, the non-linear model in PsPM was used. Boxcar regressors were constructed at cue onset (duration = 3 s, cue presentation). These regressors were convolved with the canonical skin conductance response function to estimate event-related response amplitude, latency, and dispersion (only SCR amplitudes were used in modelling).
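The down-sampling and filtering steps can be sketched with SciPy. This mirrors the parameters given above (500 Hz to 10 Hz, 0.0159-2 Hz 1st-order Butterworth); the zero-phase filtfilt direction and the polyphase resampler are our choices, not necessarily PsPM's exact implementation:

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

def preprocess_scr(raw, fs_in=500, fs_out=10, band=(0.0159, 2.0)):
    """Down-sample an SCR trace and band-pass filter it (illustrative)."""
    x = resample_poly(np.asarray(raw, float), fs_out, fs_in)  # 500 Hz -> 10 Hz
    b, a = butter(1, [band[0], band[1]], btype="band", fs=fs_out)
    return filtfilt(b, a, x)   # zero-phase filtering (assumption)
```

The filtered trace would then be passed to the PsPM non-linear model for amplitude estimation.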

Sessions with more than 20% of trials (4 out of 20 trials for Experiment 1, 5 out of 24 for Experiment 2) with cue-evoked SCR amplitude below a threshold of 0.02 were labelled as not having enough viable event-related SCRs. In Experiment 1, 15 subjects and 50 sessions remained. In Experiment 2, 19 subjects and 79 sessions remained for the left side (thermal stimulation side), and 20 subjects and 96 sessions for the right (no stimulation side). For model fitting, the right-side SCR rejection criteria were used, since both channels’ data were included as two data sources. Trial SCRs were log-transformed within subject before model fitting. Transformed SCRs on the two sides were highly correlated (Figure 4b).

Other behavioural measures

Trial-by-trial choice data (button presses indicating choices) and reaction times (time from CS onset to button press) were recorded as part of the behavioural measurements.

Computational learning models

To capture relief learning, we fitted behavioural responses using different learning models from previous studies (Table 4). Free energy (F) is the variational Bayesian approximation to a model’s marginal likelihood; the table shows the sum of F across participants as a measure of absolute model fit. Formal model comparison was conducted with random-effects analysis. For instrumental learning, the reinforcement of subjects’ responses (i.e. choices) by relief can be modelled with a reinforcement learning model (Sutton and Barto, 1998). For Pavlovian learning, physiological responses can be used for model fitting (Li et al., 2011; Boll et al., 2013; Zhang et al., 2016).
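To make the random-effects comparison concrete, a model's exceedance probability (the probability that it is the most frequent model in the population) can be illustrated with a simplified count-based Dirichlet posterior. The actual analysis uses variational Bayes over per-subject model evidences (not raw best-model counts), so this is a didactic stand-in rather than the exact procedure:

```python
import random

def exceedance_probs(best_counts, n_samples=20000, seed=0):
    """Monte-Carlo exceedance probabilities from a Dirichlet posterior.

    best_counts : per-model counts of subjects best fitted by that model
                  (a simplification of evidence-weighted assignment).
    Samples model frequencies from Dirichlet(counts + 1) and reports how
    often each model has the highest population frequency.
    """
    rng = random.Random(seed)
    alphas = [c + 1 for c in best_counts]     # uniform prior + counts
    wins = [0] * len(alphas)
    for _ in range(n_samples):
        # a Dirichlet draw is a normalised vector of Gamma samples
        draw = [rng.gammavariate(a, 1.0) for a in alphas]
        wins[draw.index(max(draw))] += 1
    return [w / n_samples for w in wins]
```

With a clear majority model (e.g. counts of 15, 4 and 4 subjects), the exceedance probability of the first model approaches 1, matching the intuition behind the values reported in the Results.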

Table 2
Multiple comparison correction for Experiment 2 (cluster-forming threshold p<0.001 uncorrected; regions from the Harvard-Oxford atlas). *FWE cluster-level corrected (showing p<0.05 only).
https://doi.org/10.7554/eLife.31949.021
TD model PE

| p* | k | T | Z | x | y | z | Region mask |
|---|---|---|---|---|---|---|---|
| 0.002 | 15 | 4.31 | 3.63 | −25 | −5 | −22 | Amygdala L |
| 0.003 | 11 | 4.36 | 3.66 | 24 | −8 | −14 | Amygdala R |
| 0.018 | 1 | 3.97 | 3.41 | 28 | −1 | −26 | |
| 0.002 | 22 | 5.9 | 4.52 | −32 | −8 | 5 | Putamen L |
| 0.021 | 4 | 4.55 | 3.78 | 32 | −16 | 1 | Putamen R |

Hybrid model PE

| p* | k | T | Z | x | y | z | Region mask |
|---|---|---|---|---|---|---|---|
| 0.001 | 16 | 4.36 | 3.66 | −21 | −12 | −14 | Amygdala L |
| | | 4.23 | 3.58 | −21 | −1 | −18 | |
| 0.002 | 13 | 4.95 | 4.01 | 24 | −8 | −18 | Amygdala R |
| | | 4.34 | 3.65 | 28 | −1 | −26 | |
| 0.003 | 17 | 5.49 | 4.31 | −32 | −8 | 5 | Putamen L |

Hybrid model associability

| p* | k | T | Z | x | y | z | Region mask |
|---|---|---|---|---|---|---|---|
| 0.001 | 29 | 4.5 | 3.75 | −6 | 40 | 12 | Cingulate Anterior |
| | | 4.44 | 3.71 | −2 | 33 | 23 | |
| | | 4.08 | 3.49 | −2 | 44 | 5 | |
| | | 3.93 | 3.38 | 2 | 40 | 1 | |

(x, y, z are MNI coordinates in mm.)

Win-Stay-Lose-Shift (WSLS) model

WSLS assumes a subject has fixed pseudo-Q values for each state-action pair: a relief outcome always produces a positive value for the chosen state-action pair (i.e. win-stay), while the remaining state-action combinations take negative values (i.e. lose-shift). A no-relief outcome flips the sign of all values. Two free parameters p1 and p2 (0 ≤ p1, p2 ≤ 1), scaling the pseudo-Q values for the two cues presented, were used in model fitting; they were assumed fixed throughout the experiment but varied across individuals.
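A minimal sketch of these pseudo-values for one cue with a binary action (function name and layout are ours; p is the cue-specific scaling parameter, p1 or p2):

```python
def wsls_values(prev_action, relief, p, n_actions=2):
    """Pseudo-Q values under Win-Stay-Lose-Shift for the cue just seen.

    After relief, the chosen action takes value +p and the others -p
    (win-stay); after no relief all signs flip (lose-shift)."""
    sign = 1 if relief else -1
    return [sign * p if a == prev_action else -sign * p
            for a in range(n_actions)]
```

These pseudo-values are then passed through the same softmax choice rule used for the learning models, so WSLS serves as a memoryless baseline in model comparison.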

TD model

The predicted state-action value Q for a given state s and action a is updated between successive trials using an error-driven delta rule with learning rate α (0 ≤ α ≤ 1) (Gläscher et al., 2010; Morris et al., 2006; Sutton and Barto, 1998):

(2) Qt+1(s,a)=Qt(s,a)+α(rt-Qt(s,a))

where rt is the outcome of the trial (relief = 1, no relief = 0). The probability of choosing action a from the set of all available actions As = {a, b, c} in trial t is modelled by a softmax distribution,

(3) p(a|s) = exp(τQt(s,a)) / Σb∈As exp(τQt(s,b))

where τ is the inverse temperature parameter governing the competition between actions (τ>0).
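Equations 2 and 3 can be sketched in a few lines of Python (illustrative; variable names and data layout are ours):

```python
import math

def td_update(Q, s, a, r, alpha):
    """Equation 2: delta-rule update of the state-action value.
    Q is a nested list, Q[s][a]; r is 1 for relief, 0 for no relief."""
    Q[s][a] += alpha * (r - Q[s][a])

def softmax_choice_prob(Q, s, a, tau):
    """Equation 3: probability of choosing action a in state s,
    with inverse temperature tau > 0."""
    z = sum(math.exp(tau * q) for q in Q[s])
    return math.exp(tau * Q[s][a]) / z
```

Iterating td_update over a subject's trial sequence, and evaluating softmax_choice_prob at each observed choice, yields the likelihood used for parameter fitting.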

Rescorla-Wagner (RW) model

For Pavlovian learning, where choice decisions are not available, the standard temporal difference (TD) model updates the state value V(s) based on prediction errors following the Rescorla-Wagner learning rule:

(4) Vt+1(s)=Vt(s)+α(rt-Vt(s))

Hybrid model

The hybrid model incorporated an associability term as a changing learning rate in a standard TD value-learning model (Le Pelley, 2004; Li et al., 2011). The associability term, also referred to as Pearce-Hall associability, is an attention- or uncertainty-like quantity modulated by the magnitude of recent prediction errors. The varying learning rate can be used in Pavlovian state-learning:

(5) Vt+1(s)=Vt(s)+καt(s)(rtVt(s))
(6) αt+1(s)=η|rtVt(s)|+(1η)αt(s)

where η and κ are free parameters in the range [0, 1].

The model can also be extended to instrumental action-learning: 

(7) Qt+1(s,a)=Qt(s,a)+καt(s,a)(rtQt(s,a))
(8) αt+1(s,a)=η|rtQt(s,a)|+(1η)αt(s,a)
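The action-learning form of the hybrid model (Equations 7 and 8) can be sketched as follows (illustrative; data layout and function name are ours):

```python
def hybrid_update(Q, assoc, s, a, r, kappa, eta):
    """Pearce-Hall hybrid update for instrumental learning.

    Q, assoc : nested lists indexed [s][a]; kappa scales the (varying)
    learning rate, eta controls how fast associability tracks recent
    unsigned prediction errors. Returns the prediction error."""
    delta = r - Q[s][a]
    Q[s][a] += kappa * assoc[s][a] * delta                     # Equation 7
    assoc[s][a] = eta * abs(delta) + (1 - eta) * assoc[s][a]   # Equation 8
    return delta
```

Note that large recent prediction errors raise the associability, and hence the effective learning rate on subsequent trials; stable outcomes let it decay back toward zero.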

Hidden Markov Model (HMM)

For Experiment 2, where the relief probabilities were non-stationary, model-based learning models were also fitted to the behavioural data. A hidden Markov model with a dynamic expectation of change (Prévost et al., 2013; Schlagenhauf et al., 2014) was adapted to incorporate a hidden state variable St representing the subject’s estimate of an action-outcome pair (e.g. in Experiment 2, St=(cue, relief); three cues × relief/no relief = 6 combinations). The state transition probabilities are calculated as:

(9) P(S_t | S_{t−1}) = [[1−β, β], [β, 1−β]]

where β is a free parameter (0 ≤ β ≤ 1). For each cue, the symmetry of the transition matrix encodes the reciprocal relationship between the relief and no-relief beliefs. Given the hidden state variable, the probability of actually observing the outcome is:

(10) P(O_t | S_t) = 0.5 × [[1+c, 1−c], [1−d, 1+d]]

where the rows of the matrix represent the relief/no-relief outcomes and the columns represent the relief/no-relief belief in S_t. c and d are free parameters (0 ≤ c ≤ 1, 0 ≤ d ≤ 1) that capture potential discrimination between the two outcome types. The prior probability of S_t is calculated from the state transition probabilities and the posterior probability of S_{t−1} (Equation 11); the posterior probability of S_t is then calculated from this prior and the observed outcome O_t (Equation 12):

(11) P(S_t) = Σ_{S_{t−1}} P(S_t | S_{t−1}) P(S_{t−1})
(12) P(S_t) = P(O_t | S_t) P(S_t) / Σ_{S_t} P(O_t | S_t) P(S_t)

where Equation 11 is applied before the outcome O_t is observed, and Equation 12 after.

S_t can be used to approximate state values by passing the relative relief belief through a sigmoid function with a free parameter m; the preferred action is then inferred using the softmax function.

(13) P(r = 1 | cue) = 1 / (1 + exp(−x))

where x = S_t(r=1) − S_t(r=0) + m.

To represent the uncertainty over the i possible posterior relief probabilities, the entropy H is calculated for the chosen cue as:

(14) H(S_t) = −Σ_i P(S_t = i) log P(S_t = i)
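The HMM belief update (Equations 9-12) and the entropy measure (Equation 14) can be sketched as follows for a single cue with two belief states. This is a simplified illustration under our own naming and example parameter values; the paper fitted β, c and d per subject with the VBA toolbox.

```python
import numpy as np

def hmm_step(posterior, outcome, beta, c, d):
    """One belief update for a single cue.
    posterior: P(S_{t-1}) over the [relief, no-relief] belief states."""
    T = np.array([[1 - beta, beta],
                  [beta, 1 - beta]])         # transition matrix (Equation 9)
    O = 0.5 * np.array([[1 + c, 1 - c],
                        [1 - d, 1 + d]])     # rows: relief / no-relief outcome (Equation 10)
    prior = T @ posterior                    # prior before outcome (Equation 11)
    lik = O[0] if outcome == 1 else O[1]     # likelihood row for the observed outcome
    post = lik * prior
    return post / post.sum()                 # normalised posterior (Equation 12)

def entropy(p):
    """Uncertainty over posterior beliefs (Equation 14)."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p)).sum())

belief = np.array([0.5, 0.5])                # flat initial belief
belief = hmm_step(belief, outcome=1, beta=0.1, c=0.8, d=0.8)
```

With a flat prior and c = 0.8, a single relief outcome already shifts the belief strongly toward the relief state; entropy is maximal at the flat belief and shrinks as evidence accumulates.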

Hierarchical Bayesian model

The hierarchical Bayesian model introduced by Mathys et al. (2011) incorporates different forms of uncertainty at each level of learning: irreducible uncertainty (resulting from the probabilistic relationship between prediction and outcome), estimation uncertainty (from imperfect knowledge of the stimulus-outcome relationship), and volatility uncertainty (from potential environmental instability). This model has been shown to fit human acute stress responses (de Berker et al., 2016). We adapted the model to our study with its basic structure unchanged: the second-level estimated probabilities were used to approximate the state values of the different cues, and the preferred action was calculated using the softmax function.

Modelling pain ratings

Our prior hypothesis was that uncertainty is a likely modulator of tonic pain perception, so model-generated uncertainty signals (associability in Experiments 1 and 2, with entropy and surprise added in Experiment 2) were used as the main pain rating predictors. A generalised linear model included the uncertainty predictor together with additional terms to control for potential temporal habituation/sensitisation and between-session variation:

(15) Rating = β_1 · Relief + β_2 · log(Trial) + β_3 · Predictor

where the ‘Relief’ term is the number of trials since the previous relief outcome, log(Trial) is the log of the trial number within the session (1-24), and ‘Predictor’ is the model-generated uncertainty value computed using group-averaged model parameters fitted to the choice/SCR data. All trials were used to calculate the predictor, but only rated trials were included in the regression.
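A least-squares sketch of Equation 15 is below. Note that, as written, the model has no intercept; the function name and synthetic data are our own, and the paper's actual regression fitting was done within the VBA framework.

```python
import numpy as np

def fit_rating_glm(ratings, trials_since_relief, trial_number, predictor):
    """Ordinary least-squares fit of Equation 15 on rated trials only.
    predictor: the model-generated uncertainty value for each rated trial."""
    X = np.column_stack([trials_since_relief,          # 'Relief' term
                         np.log(trial_number),         # habituation/sensitisation term
                         predictor])                   # uncertainty term
    betas, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return betas                                       # [beta_1, beta_2, beta_3]

# Noise-free synthetic check: ratings built from known coefficients are recovered
tsr = np.array([1.0, 2.0, 3.0, 4.0])        # trials since last relief
trial = np.array([1.0, 2.0, 3.0, 4.0])      # trial number within session
unc = np.array([0.5, 0.2, 0.8, 0.1])        # model-generated uncertainty
ratings = 2.0 * tsr + 0.5 * np.log(trial) - 1.0 * unc
betas = fit_rating_glm(ratings, tsr, trial, unc)
```

A negative β_3, as reported for associability, means higher model uncertainty predicts lower concurrent pain ratings.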

Model fitting and comparison

Model fitting

Model fitting was performed with the Variational Bayesian Analysis (VBA) toolbox (https://mbb-team.github.io/VBA-toolbox/). The toolbox optimises the free energy within a Bayesian framework, analogous to maximum likelihood estimation. Behavioural data (choices, SCRs) were fitted separately for each individual, yielding a different parameter set per subject, and model fitting performance was measured by aggregating the individual fitting statistics. The means of the parameters across subjects were used to generate regressors for the fMRI analysis, following convention (Table 5 and Table 6).

Table 3
Details of subjective ratings for Experiments 1 and 2.
https://doi.org/10.7554/eLife.31949.022
Experiment | Rating type | Rating timing | Avg # of ratings per subject
Experiment 1 | Instrumental pain | After 3 s cue + choice window AND outcome (rating type depends on outcome) | 8.2
Experiment 1 | Instrumental relief | (as above) | 7.7
Experiment 1 | Pavlovian pain | (as above) | 8.1
Experiment 1 | Pavlovian relief | (as above) | 7.7
Experiment 2 | Instrumental pain | After 3 s cue + choice window, BEFORE outcome | 70.9
Table 4
All learning models fitted (bold: winning model; AL: action-learning; SL: state-learning; F: variational Bayesian approximation to the model’s marginal likelihood, used for model comparison)
https://doi.org/10.7554/eLife.31949.023
Experiment 1 (Instrumental sessions)
Choice model | F (n = 19, sum [sem]) | SCR model | F (n = 15, sum [sem])
TD | −1330.920 [3.604] | RW - value | −1079.153 [8.024]
Hybrid (AL) | −1345.667 [3.664] | Hybrid (SL) - value | −1077.911 [8.059]
WSLS | −1486.723 [3.973] | Hybrid (SL) - associability | −1077.699 [8.003]

Experiment 1 (Pavlovian sessions)
Choice model | F | SCR model | F (n = 15, sum [sem])
(not available) | N/A | RW - value | −1101.079 [7.132]
| | Hybrid (SL) - value | −1096.250 [7.195]
| | Hybrid (SL) - associability | −1095.135 [7.106]

Experiment 2 (Instrumental sessions; Pavlovian not available)
Choice model | F (n = 23, sum [sem]) | SCR model | F (n = 20, sum [sem])
TD | −3572.476 [8.736] | RW - value | −7867.834 [60.668]
Hybrid (AL) | −3626.478 [8.946] | Hybrid (SL) - value | −7857.341 [60.643]
HMM | −3571.020 [9.067] | Hybrid (SL) - associability | −7841.864 [60.838]
Bayesian Hierarchical | −3784.372 [8.616] | |

The VBA toolbox takes in an evolution function that describes the learning model (e.g. value updating rule), and an observation function that describes response mapping (e.g. softmax action selection). For choice fitting, data were split into multiple sessions to allow between-session changes in observation function parameters, but evolution function parameters and initial states were fixed throughout all sessions.

For SCR fitting, the multi-session split was the same as for choice fitting. The first two trials of each session were excluded from fitting to avoid extreme values from startle effects, which also reduced the confound from general habituation of SCRs. Trials with insufficient event-related responses were also excluded (see ‘Physiological measures’ above). The observation function for SCR fitting was simply g(x) = Predictor + b, with b as a free parameter. The predictor (model uncertainty) was not scaled, to avoid overfitting. For Experiment 2, the left and right SCRs were fitted simultaneously as two data sources, with b1 and b2 as two free parameters fitting each side with the same predictor.

Parameter priors for the models followed previous studies: TD, RW and hybrid models all had initial values of 0 and initial associability of 1; the HMM and Bayesian models had initial hidden-state relief beliefs of 0. All evolution parameters had prior variance set to 1, except for SCR fitting where it was set to 0.05 to reduce flexibility.

We calculated protected exceedance probabilities following Rigoux et al. (2014), shown in the figure supplements in the same way as the original exceedance probabilities in the Results section (see http://mbb-team.github.io/VBA-toolbox/wiki/BMS-for-group-studies/#rfx-bms for details of the calculation). In Experiment 1, the best-fitting model for the SCR data became less clear-cut under this measure. However, in Experiment 2, where the number of trials was larger because fitting was not conducted separately for Instrumental/Pavlovian sessions, the best-fitting models remained unchanged from the original comparison using exceedance probabilities. In this way, the results of Experiment 2 validated Experiment 1, similarly to the neuroimaging analysis.

Model comparison

Model comparison was implemented with random-effects Bayesian model selection in the VBA toolbox. The best-fitting model is allowed to vary across individuals; the model frequency in the population (i.e. in how many subjects a model was the best-fitting model) and the model exceedance probability (i.e. how likely a model is to be more frequent than the other models compared) were estimated from the model fitting evidence (free energy for the learning models in choice and SCR fitting, or log-likelihood for the regression models in rating fitting).

fMRI acquisition

For Experiment 1, neuroimaging data were acquired with a 3T Siemens Magnetom Trio Tim scanner with the standard Siemens 12-channel phased-array head coil. For Experiment 2, a 3T Siemens Prisma scanner was used with the standard Siemens 64-channel phased-array head coil.

Scanning parameters were identical for both experiments: functional images were collected with a single-echo EPI sequence (repetition time TR = 2500 ms, echo time TE = 30 ms, flip angle = 80°, field of view = 240 mm); 37 contiguous oblique-axial slices (voxel size 3.75 × 3.75 × 3.75 mm) parallel to the AC-PC line were acquired. Whole-brain high-resolution T1-weighted structural images (dimensions 208 × 256 × 256, voxel size 1 × 1 × 1 mm) were also obtained using a standard MPRAGE sequence.

fMRI preprocessing

Functional images were slice-time corrected using SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) with each session’s slice timing output by the scanner. The resulting images were then preprocessed with the fmriprep software (build date 09/03/2017, freesurfer option turned off, https://github.com/poldracklab/fmriprep), a pipeline that performs motion correction, field unwarping, normalisation, field bias correction, and brain extraction using a variety of available neuroimaging tools. The normalised images were smoothed with an 8 mm Gaussian kernel using SPM12. The confound files output by fmriprep include the following signals: mean global signal, mean white-matter tissue signal, three FSL-DVARS (stdDVARS, non-stdDVARS and voxel-wise stdDVARS), framewise displacement, six FSL-tCompCor, six FSL-aCompCor, and six motion parameters (matrix size: 24 × number of volumes).

fMRI GLM model

All event-related fMRI data were analysed with generalised linear models (GLMs) constructed using SPM12, estimated for each participant at the first level. Model-generated signals used as parametric modulators were produced with one set of group-mean model parameters, obtained from the behavioural data fitting as described. We used the mean of the fitted parameters from all participants in the imaging analysis as this provides the most stable estimate of the population mean (taking into account that individual fits reflect both individual differences and noise). For completeness, however, we also ran the analyses with individually fitted values, which led to similar results (i.e. no change in the significance level of any result). All regressors were convolved with a canonical hemodynamic response function (HRF). We also included regressors of no interest to account for habituation and motion effects. Specifically, the number of trials since last receiving a relief outcome (the ‘Relief’ term in the rating regression model) and the log of the trial number within the session (the log(Trial) term) were included to regress out potential changes in tonic pain perception due simply to prolonged stimulation. The resulting GLM estimates were entered into a second-level one-sample t-test for the regressors of interest to produce the random-effects statistics and images presented in the Results section.

TD softmax (Figure 3a and Figure 5a)

Regressors of interest:

  • CS onset (duration = 3 s, cue presentation): Q values of chosen cue,

  • Outcome onset (duration = 3 s): prediction error,

Regressors of no interest:

  1. CS onset (duration = 3 s, cue presentation): number of trials since last relief,

  2. CS onset (duration = CS onset to outcome offset, entire trial excluding ITI): within-session log trial number,

  3. choice press (duration = 0),

  4. rating press (duration = rating duration),

  5. CS offset (duration = 0),

  6. 24 column confounds matrix output by fmriprep.

Hybrid model associability (Figure 3d and Figure 5d)

Regressors of interest:

  • choice press time (duration = 0, cue button press): associability (generated for individual session with new V0/A0 to match SCR fitting procedure),

Regressors of no interest: same as the GLM above, adding relief onset (duration = 0) and removing the choice press regressor. We note for completeness that it is theoretically possible to model the learning process as a continuously valued function that exactly matches the time-course of the temperature changes. In the context of the current study, the effect of this would be largely orthogonal to the experimental manipulations. However, representation of the baseline temperature as a continuous function is clearly important in real-life contexts, in which the baseline level determines homeostatic motivation and phasic reward functions (Morville et al., 2018); future studies could manipulate this directly.

For multiple comparisons correction, we used anatomical binary masks generated from the Harvard-Oxford Atlas (Desikan et al., 2006) for small-volume correction. The atlases are freely available with the FSL software (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases). We thresholded the probability maps at 50%, focusing on ROIs defined a priori (learning-related: amygdala, accumbens, putamen, caudate, pallidum, VMPFC, DLPFC; controllability-induced analgesia-related: cingulate gyrus - anterior division, insular cortex, VLPFC). We used the frontal medial cortex for VMPFC, the frontal orbital cortex for VLPFC, and the middle frontal gyrus for DLPFC. We report results at p<0.05 (FWE cluster-level corrected). Masks were applied separately, not combined (Table 1 and Table 2).

fMRI model comparison

To determine whether state-based and action-based learning involve the same brain regions during instrumental learning, we used Bayesian model selection (BMS) with the instrumental-session imaging data. We ran Bayesian first-level analyses using two separate GLMs containing the prediction error signals from the TD and hybrid models (at outcome onset time, duration = 3 s) on unsmoothed functional imaging data, with the same regressors of no interest as the other GLMs described. To reduce computation time, this was restricted to voxels correlated with prediction error in the parametric modulation analyses of the present study, within a mask of conjunction clusters from the TD and hybrid prediction error analyses (cluster formation at p<0.01, k<5). The resulting log-model evidence maps produced by each model for each participant were first smoothed with a 6 mm Gaussian kernel, then entered into a random-effects group analysis (Stephan et al., 2009). Voxel-wise comparison between models produced posterior and exceedance probability maps showing whether a particular brain region is better accounted for by one model or the other. Posterior probability maps were overlaid on subject-averaged anatomical scans using MRIcroGL (https://www.nitrc.org/projects/mricrogl/).

Axiom analysis for prediction errors

To determine whether ROI activations reflected responses to outcomes or to prediction errors, we carried out an ROI axiomatic analysis (Roy et al., 2014). Trials were separated by relief/no relief outcome, then into equal-size bins of ascending expected relief value, calculated from the TD model as we were primarily interested in instrumental/active relief learning. This produced four regressors (2 outcomes × 2 value bins) in Experiment 1 and six regressors (2 outcomes × 3 value bins) in Experiment 2, estimated at outcome time (duration = 3 s) when the prediction error is generated. The GLMs included button presses for choice or rating, and the movement-related regressors of no interest mentioned above. ROI masks of 8 mm spheres were generated from the peak coordinates of the TD model prediction error exceedance probability map calculated by the BMS above (ventral and dorsal striatum, amygdala, VMPFC and DLPFC).

Mean activity was extracted from these ROI masks, averaged across sessions within each subject. Although the axiomatic analysis is useful for delineating outcome and prediction responses in previous reward or aversive PE studies, the continued presence of tonic pain in our study differs from the ‘no stimulation’ conditions in those studies; we were therefore primarily interested in the overall BOLD activity pattern and did not report full statistics for this analysis.
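The trial-binning step of the axiomatic analysis (sorting trials by expected relief within each outcome type and cutting them into equal-size bins) can be sketched as follows. This is our own illustrative helper, not the authors' SPM implementation.

```python
import numpy as np

def axiomatic_bins(expected_relief, outcomes, n_bins):
    """Assign each trial a regressor label for the axiomatic analysis:
    within each outcome type (0 = no relief, 1 = relief), sort trials by
    model-derived expected relief and cut them into equal-size value bins."""
    expected_relief = np.asarray(expected_relief, dtype=float)
    outcomes = np.asarray(outcomes)
    labels = np.full(len(outcomes), -1)
    for o in (0, 1):
        idx = np.where(outcomes == o)[0]
        order = idx[np.argsort(expected_relief[idx])]      # low -> high expectation
        for b, chunk in enumerate(np.array_split(order, n_bins)):
            labels[chunk] = o * n_bins + b                 # unique (outcome, bin) id
    return labels

# 4 trials, 2 value bins per outcome -> 4 distinct regressor labels
labels = axiomatic_bins([0.1, 0.9, 0.2, 0.8], [0, 1, 0, 1], n_bins=2)
```

The axiomatic prediction is then that BOLD in a prediction-error region should increase with outcome but decrease across value bins for a fixed outcome.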

References

  1. P Dayan, LF Abbott (2001) Theoretical Neuroscience, Vol. 806. Cambridge: MIT Press.
  2. J Konorski (1967) Integrative Activity of the Brain: An Interdisciplinary Approach. Chicago: University of Chicago Press.
  3. NJ Mackintosh (1983) Conditioning and Associative Learning. Oxford: Clarendon Press.
  4. H Mowrer (1960) Learning Theory and Behavior.
  5. RS Sutton, AG Barto (1998) Introduction to Reinforcement Learning (1st edition). Cambridge: MIT Press.
  6. RS Sutton (1992) Adapting bias by gradient descent: an incremental version of delta-bar-delta. AAAI, 171–176.

Decision letter

  1. Tor Wager
    Reviewing Editor; Institute of Cognitive Science, University of Colorado Boulder, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "The Control of Tonic Pain by Active Relief Learning" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Zhang and colleagues investigate how different aspects of relief learning during tonic pain stimulation relate to pain perception and what the neuronal correlates of these processes are. They report that uncertainty/attention inferred from a formal model parameter called 'associability' is correlated to reductions in pain perception. This is an intriguing and novel take on the links between learning processes and pain regulation. The study addresses an important and timely question that will be of high interest to readers in various research fields, including pain, learning theory, decision-making, and motivation. They identify neural correlates of prediction error and associability and show that these parameters map onto responses in striatum and pgACC in two separate imaging studies of instrumental relief learning during tonic heat pain (effects during Pavlovian relief learning are less conclusive). The paper is well-written, analyses are appropriate, and the work builds on previous studies of tonic pain and pain-related learning from this group, as well as a growing body of work integrating pain and associative learning.

Essential revisions:

All reviewers noted that the modeling was sophisticated but not particularly accessible to a non-modeling audience. Overall, the manuscript is densely written and relies a lot on technical terms. Unpacking some of the ideas and concepts in the Introduction and Results section would help to make the manuscript more accessible to a broader audience. Given that this is a general interest journal, I hope the following suggestions will make it easier for a broader audience to extract conclusions. At the same time there are technical concerns that need to be addressed. These comments reflect input from all three reviewers – there are many points but some of them are convergent.

1) It isn't clear why the particular models tested were chosen and what adjudicating between them tells us, in practical terms. For example, what are the implications of a model with a fixed learning rate (TD) fitting better than a model with an adaptive learning rate (hybrid TD)? Also, as it seems a central goal to demonstrate that RL models have more explanatory power than simpler models, it would be helpful to be able to understand how well each model fit and what the incremental difference between them is. The latter might be accomplished by expanding on the description of exceedance probability in the subsection “Model fitting and comparison” and mentioning it in the Results (or figure captions?). Relatedly, the behavioral choice data in Experiments 1 and 2 are best explained by a temporal-difference (TD) model without an associability term. Are decisions and actions thus independent of the associability? Do participants learn, but do not act on that knowledge? What does that imply for the conclusions drawn here?

The authors fit models to individuals' behavior, then used mean parameters to generate regressors for neuroimaging data. Individuals seem quite variable in terms of fitted parameters, particularly in Experiment 2, and this variability in learning and performance might contribute to inconsistencies in the neural data. Why did the authors not (1) use the individual model fits in the imaging analyses, (2) fit to the group, or (3) incorporate information about individual fits (e.g. learning rates in the TD model) at the subject level in analyses?

Also, regarding the modeling efforts – How reliable are the model parameters and outcomes of the model comparisons, when only 16-18 SCR data points are used for fairly complex TD or Hybrid models for each session? Are the reported exceedance probabilities for the model comparisons 'protected exceedance probabilities' (Rigoux et al., 2014) that account for the possibility that models are equally (un)likely?

2) Associability is clearly a critical construct here, but it seems to arise operationally from the models but a little muddier at the level of theory. For example, would an associability account of relief learning differ from an attentional account? Associability and attention are discussed almost interchangeably. On a more practical level, simply stating the direction of associability (e.g. high associability = higher uncertainty) clearly would make correlations more immediately interpretable. Clarifying the relationship between these concepts early on and consistently using them throughout the manuscript will increase readability.

As a more general point relating to both of the preceding issues, I think some of the difficulties in interpreting results was due to the format whereby results are presented before methods. As an example, there was more description of associability in the discussion and methods, but it would have been helpful to have some information provided in the results, given that this is read first. Perhaps the authors could be mindful of this format and provide some explanation along with the results presented?

3) Other than the fact that a reinforcement learning paradigm fits putamen responses, do we have evidence that dorsal putamen responses are involved in a learning process? Is there any correspondence between dorsal putamen findings and behavioural findings?

4) The timing of pain and relief ratings wasn't very clear. Am I correct in inferring that both pain and relief ratings were collected at three time points in each session (near beginning, near middle, near end)? How many of each rating makes up the scores reported?

5) The important event for participants is the reduction of pain when the temperature is reduced. Since pain has a continuous intensity dimension and a reduction along this dimension is driving the learning process, I wonder whether the RL models could be extended to use a continuous outcome that might offer more information?

6) Given that correspondence between the two experiments is an important feature, a figure in which activations in one experiment are overlayed on the other (to judge spatial correspondence) would be helpful.

7) It isn't very clear how imaging data were corrected for multiple comparisons. Relatedly, in some cases, searches were restricted to a priori ROIs (e.g. pgACC, posterior insula, vlPFC), but it isn't clear how these were defined (e.g. anatomically? Based on previous findings?) or whether data in these analyses were corrected across the mask of all ROIs. From the tables and Results section, I conclude that the authors use a mix of cluster-extent thresholding and peak-voxel SVC correction. The authors should choose one method and use it consistently.

Furthermore, the authors use SVC correction based on coordinates from hand-selected previous studies or selectively use Experiment 1 coordinates for SVC correction of the amygdala results in Experiment 2. With the availability of comprehensive anatomical atlases, I urge the authors to apply masks based on anatomical atlases or independent functional localizers to correct for multiple comparisons.

8) Since the conclusions of this manuscript rely primarily on the modeling efforts, I think presenting absolute model fits for choices and SCR data would help in evaluating the models. In addition, presenting information on SCR data quality will help to convince the reader about the conclusions. For example, skin conductance shows spontaneous, phasic responses during acute (10-20s) or tonic pain stimuli. To which degree are the responses modeled here locked to the cue- or outcome-events? Showing raw SCR traces and/or averaged evoked responses with predicted SCR responses would help here, e.g. using eLife's figure supplements.

9) Do changes in pain perception also correlate with pgACC activity when used as a regressor in subject-level models?

10) In the Discussion section, the authors argue that lack of controllability in Pavlovian paradigms renders uncertainty hyperalgesic instead of analgesic. However, pain ratings do not differ between instrumental and Pavlovian sessions in Experiment 1, as predicted by this reasoning.

11) In subsection “Ratings” the authors argue that placebo expectation theory predicts that larger prediction errors are correlated with pain reductions. Montgomery and Kirsch (1997) and Locher et al. (2017) have shown that a plausible instruction regarding the placebo is needed for conditioned placebo analgesia. Participants in the present study weren't given any rationale for a cue being a placebo treatment. Hence, different processes might be involved here. In addition, this test relies on the correct estimation of the prediction error, which depends on the estimated value. The value will increase (i.e. encode more expectation for relief) over repeated trials that included a relief. When the expectation for relief and thus the prediction errors are maximal, participants have just experienced a series of relief trials and the surprise or associability/uncertainty to previous trials is also maximal.

12) In both studies, pain (and relief) ratings were collected "intermittently," yet the authors make strong assumptions about effects on relief/pain based on the correlations between ratings and the time-varying measures of associability or prediction error. The authors should present complete information about rating measurement for each experiment (e.g. number of ratings) and justify why they did not incorporate ratings at the same time scale of choice, stimulus display, and SCR measurement. To determine that ratings are preferentially related to associability and not prediction error, it seems that all quantities should be measured with the same number of observations. Furthermore, this would allow direct fits to ratings, which would be the best way to determine how these learning-related parameters modulate pain and relief. Finally, if I understand correctly, Experiment 2 included pain ratings before relief outcomes were delivered. These ratings are likely to be influenced by anticipation and uncertainty, but not by relief, whereas Experiment 1's ratings were measured after outcomes. Thus, the studies differ in terms of the construct that is captured by ratings. Since pain and relief are ultimately subjective, a more thorough consideration of the self-report measures is warranted.

13) Please also explain the negative coefficients in Figure 4E – participants experienced less pain with higher associability and with longer time since relief? This seems inconsistent with previous work on uncertainty, attention, and desire for relief which should enhance pain.

14) Skin conductance was found to be best fit by associability from the hybrid model. Can the authors rule out the possibility that this is only the case because both associability and skin conductance decrease over time? Other models included an effect of time/trial to account for such habituation. Are these findings artefactual, and might SCR track value or prediction error if habituation is modeled separately?

15) The study uses a mild tonic stimulus in healthy volunteers and measures behavioral correlates of intermittent relief. While the pgACC results are cool, I find it quite inappropriate to suggest that "the results highlight the pgACC as a target for therapeutic intervention […] by invasive excitatory deep brain stimulation."

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "The Control of Tonic Pain by Active Relief Learning" for further consideration at eLife. Your revised article has been favorably evaluated by Michael Frank (Senior editor), a Reviewing editor, and three reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Reviewer #1:

The authors have largely satisfied any concerns I had about the reliability of the findings. So, I would be comfortable publishing the paper in its current form.

That said, I concur that they haven't done as much as they might have to increase the paper's accessibility to a general audience. On re-reading the paper after the authors' responses, I think the easiest way to do this might be to do some additional revision to the introduction. The Introduction (particularly before the addition of the sections on associability and reward learning) does little to set up the actual experimental paradigms and modelling techniques used, such that one ends up trying to piece together the rationale for most of what was done while reading the methods and results. The methodology and modelling would have been far clearer to me had the authors been more explicit (as they were in their reply to reviewers) about the relevance of associability for illuminating the distinction between state and action learning and how doing so relates to the broader goal of understanding relief learning in the context of tonic pain.

So, in summary, the paper is publishable, but I do think the paper could be improved in terms of accessibility without a great deal of additional work.

Reviewer #2:

The authors have addressed all my comments and questions.

Reviewer #3:

For the most part, the authors have addressed all major concerns. I was particularly impressed that results and conclusions hold (1) whether parametric modulators are based on individual versus mean fits for Experiment 2 (although I think the authors should consider including these results in Supplementary figures), and (2) when consistently defined ROIs are employed (new Tables 6 and 7). The paper is also strengthened by the addition of information clarifying rating procedures and depicting skin conductance over time.

However, I feel that a few concerns remain, which I have delineated as minor concerns in the following section. In several places (e.g. the discussions of contingency awareness, modeling the time course of temperature changes, subjectivity of ratings), I felt that the authors only superficially engaged with reviewers' collective suggestions, and that overall accessibility of the work is still somewhat limited for non-expert audiences.

https://doi.org/10.7554/eLife.31949.028

Author response

Essential revisions:

All reviewers noted that the modeling was sophisticated but not particularly accessible to a non-modeling audience. Overall, the manuscript is densely written and relies a lot on technical terms. Unpacking some of the ideas and concepts in the Introduction and Results section would help to make the manuscript more accessible to a broader audience. Given that this is a general interest journal, I hope the following suggestions will make it easier for a broader audience to extract conclusions. At the same time, there are technical concerns that need to be addressed. These comments reflect input from all three reviewers – there are many points but some of them are convergent.

We thank the reviewers for pointing out the issue of accessibility for a non-modelling audience. We have modified the manuscript to improve this, and these changes are detailed in the responses to the specific issues below. But briefly, this involves (i) explaining the modelling aspects in a more intuitive manner, and (ii) moving some model description content from the Discussion and Materials and methods sections to the Introduction and Results sections.

1) It isn't clear why the particular models tested were chosen and what adjudicating between them tells us, in practical terms. For example, what are the implications of a model with a fixed learning rate (TD) fitting better than a model with an adaptive learning rate (hybrid TD)? Also, as it seems a central goal to demonstrate that RL models have more explanatory power than simpler models, it would be helpful to be able to understand how well each model fit and what the incremental difference between them is. The latter might be accomplished by expanding on the description of exceedance probability on page 19 and mentioning it in the Results (or figure captions?). Relatedly, the behavioral choice data in Experiments 1 and 2 are best explained by a temporal-difference (TD) model without an associability term. Are decisions and actions thus independent of the associability? Do participants learn, but do not act on that knowledge? What does that imply for the conclusions drawn here?

There are a few issues here for us to clarify in the manuscript:

i) Choice of RL models: The models chosen for comparison were RL models that have been well-evidenced in previous reward learning studies. In many ways, RL/TD models are the simplest mechanistic models of learning and reflect an account of learning that is both intuitive and in particular, well-grounded in the animal learning theory literature. We now include an additional sentence in Introduction to set this out with much more clarity:

“RL models provide a mechanistic (as opposed to merely descriptive) account of the information processing operations that the brain actually implements and have a solid foundation in classical theories of animal learning (Mackintosh, 1983; Dayan and Abbott, 2001).”

ii) The nature and role of associability. The behavioural choice data are best fitted by a TD model without an associability term, while the SCR data are best fitted by the hybrid TD model with an associability term. This difference between action and state learning has been observed in previous studies (Li et al., 2011; Boll et al., 2013; Zhang et al., 2016; Gläscher et al., 2010; Morris et al., 2006), and reflects the distinction between instrumental and Pavlovian learning systems. The key point is that Pavlovian and instrumental systems learn different things – conditioned responses (such as autonomic responses) and actions respectively, each of which has independent biological functions. So, whereas we do not find evidence that associability is used in determining actions (in keeping with previous reports), it is used for learning conditioned responses. Although these are independent in our experiment and model, it is perfectly possible (even likely) that they interact in appropriate circumstances, but we haven’t employed a task to probe this here. We now mention this more explicitly in the manuscript, in the Introduction:

“Models of the role of attention during learning typically invoke attention during uncertainty, as this is when there may be the greatest requirement to devote resources to enhance learning. From a computational perspective, the role of uncertainty is often operationalised as controlling the learning rate, such that high uncertainty (hence high attention) leads to more rapid learning (Dayan et al., 2000; Yu and Dayan, 2005). One way of formalising uncertainty in RL is by computing a quantity called the associability, a running average of the magnitude of recent prediction errors (i.e. frequent large prediction errors imply high uncertainty / associability). The concept of associability is well grounded in classical theories of Pavlovian conditioning (the `Pearce-Hall' learning rule, Pearce and Hall, 1980; Le Pelley, 2004; Holland and Schiffino, 2016), and provides a good account of behaviour and neural responses during Pavlovian learning (Li et al., 2011; Boll et al., 2013; Zhang et al., 2016). In this way, it can be seen that associability reflects a computational construct that captures aspects of the psychological construct of attention.”

Later in the Results section, we emphasize this again;

“As mentioned above, the associability reflects the uncertainty in the action value, where higher associability indicates higher uncertainty during learning, and is calculated based on the recent average of the prediction error magnitude for each action.”

And again, we point out that associability applies to state learning only, and not directly to actions (Results section).

“Thus, there is no evidence that associability operates directly at the level of actions”.

And later in subsection “Ratings”:

“This divergence in learning strategies indicates that parallel learning systems coexist, which differ in their way of incorporating information about uncertainty in learning, as well as the nature of their behavioural responses.”
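For non-modelling readers, the associability-gated update described in points (i) and (ii) can be illustrated with a short sketch. This is a generic Pearce-Hall-style hybrid TD rule, not the manuscript's exact model; all names and parameter values (kappa, eta) are illustrative.

```python
# Minimal sketch of a hybrid TD / Pearce-Hall update: the learning rate is
# gated by associability, a running average of recent |prediction error|.
# Parameter names and values are illustrative, not the paper's notation.

def hybrid_td_update(value, associability, outcome, kappa=0.5, eta=0.3):
    """One trial of value learning with an associability-gated learning rate.

    value          -- current expected relief value of the chosen cue/action
    associability  -- running average of recent |prediction error|
    outcome        -- observed relief (e.g. 1 = relief delivered, 0 = not)
    kappa          -- fixed scaling of the effective learning rate
    eta            -- step size for updating the associability itself
    """
    prediction_error = outcome - value
    # High associability (frequent large recent errors) -> faster learning.
    value += kappa * associability * prediction_error
    # Associability tracks the recent magnitude of prediction errors.
    associability = (1 - eta) * associability + eta * abs(prediction_error)
    return value, associability

# Example: a run of surprising outcomes keeps associability (uncertainty) high.
v, a = 0.0, 1.0
for outcome in [1, 1, 0, 1, 0]:
    v, a = hybrid_td_update(v, a, outcome)
```

Setting the associability to a constant recovers a conventional fixed-learning-rate TD rule, which is the comparison at issue in the model selection above.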

iii) Explaining exceedance probability: Since we had a specific prior hypothesis of a reciprocal relationship between attention and tonic pain, we used the hybrid TD model to test this hypothesis by demonstrating that it fitted better than conventional RL models in relief learning. We have incorporated more information on model exceedance probability in the Figure 2 legend to familiarise the audience with the concept:

“Model frequency represents how likely a model is to generate the data for a random participant, while exceedance probability estimates how likely one model is compared to the others (Stephan et al., 2009).”
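For readers unfamiliar with random-effects Bayesian model selection, the exceedance probability in the quoted legend can be illustrated with a short sketch: given a Dirichlet posterior over model frequencies (Stephan et al., 2009), the exceedance probability of a model is the probability that its frequency exceeds that of every other model. This is a sampling-based illustration, not the VBA toolbox's implementation, and the Dirichlet counts below are hypothetical.

```python
# Illustrative Monte Carlo estimate of exceedance probabilities from the
# Dirichlet posterior over model frequencies in random-effects BMS.
import numpy as np

def exceedance_probabilities(alpha, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha, size=n_samples)   # sampled frequency vectors
    winners = r.argmax(axis=1)                 # most frequent model per sample
    return np.bincount(winners, minlength=len(alpha)) / n_samples

# Hypothetical posterior counts for three candidate models:
xp = exceedance_probabilities([12.0, 3.0, 2.0])
```

A protected exceedance probability (Rigoux et al., 2014) additionally averages this quantity with chance under the null hypothesis that all models are equally likely.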

The authors fit models to individuals' behavior, then used mean parameters to generate regressors for neuroimaging data. Individuals seem quite variable in terms of fitted parameters, particularly in Experiment 2, and this variability in learning and performance might contribute to inconsistencies in the neural data. Why did the authors not (1) use the individual model fits in the imaging analyses, (2) fit to the group, or (3) incorporate information about individual fits (e.g. learning rates in the TD model) at the subject level in analyses?

We conducted the additional analysis as the reviewer suggested, using individually fitted parameters from the behavioural data to produce the sequences of parametric modulators, instead of a single group-mean parameter. Author response image 1 shows overlaid clusters of TD model prediction errors with the group-mean learning rate in Experiment 2 (red), and prediction errors with individual learning rates (green). The resulting dark green clusters are the regions where the two maps overlap (both viewed at p<0.001 unc.). Using individual learning rates reproduced the major clusters reported in the manuscript (dorsal putamen, amygdala, left middle frontal gyrus), with the global peak in left putamen ([-32, -12, 5], T=5.89; the group-mean parameter analysis had its global peak at the same coordinate with T=6.98). Given that the Experiment 2 TD learning rates showed the largest variation of all the fitted models (mean=0.577, standard deviation=0.28; see the tables of model fitting results), it is reasonable to believe that the other model results are unlikely to change significantly when using individual instead of group-mean parameters.

We now mention this result briefly in the manuscript, in the appropriate section of the Materials and methods section, although we don’t include the figure for the sake of clarity but present it here to reassure the reviewers of the validity of the point.

“We used the mean of the fitted parameters from all participants in the imaging analysis as this provides the most stable estimate of the population mean (taking into account the fact that individual fits reflect both individual differences and noise). For completeness, however, we also ran the analyses with individually fitted values, which led to similar results (i.e. no change in significance level of each result).”

Also, regarding the modeling efforts – How reliable are the model parameters and outcomes of the model comparisons, when only 16-18 SCR data points are used for fairly complex TD or Hybrid models for each session? Are the reported exceedance probabilities for the model comparisons 'protected exceedance probabilities' (Rigoux et al., 2014) that account for the possibility that models are equally (un)likely?

To clarify this point, we note that we used SCR data points from all sessions for model fitting, applying the multi-session option within the VBA toolbox, which fits jointly across all session data (Experiment 1: 3.3 sessions/~60 trials per participant; Experiment 2: 4.8 sessions/~115 trials per participant). Also, we constrained the free parameters to have a low variance, limiting overfitting in order to allow model generalisability.

Notwithstanding this, however, we have followed the reviewer’s advice to calculate the protected exceedance probabilities. We now add the following clarification in the Materials and methods section, and the additional results in Figure 2—figure supplement 3 and Figure 4—figure supplement 3:

“We calculated the protected exceedance probabilities based on Rigoux et al. (2014), shown in the figure supplements in the same way as the original exceedance probabilities in the Results section. See http://mbb-team.github.io/VBA-toolbox/wiki/BMS-for-group-studies/#rfx-bms for details of its calculation. In Experiment 1, for the SCR-fitted model comparison, the best fitting model became less clear. However, in Experiment 2, where the number of trials was increased because fitting was not conducted separately for Instrumental/Pavlovian sessions, the best fitting models remained unchanged from the original comparison using exceedance probabilities. Results from Experiment 2 thus provided validation for Experiment 1 in a way similar to the neuroimaging analysis.”

2) Associability is clearly a critical construct here, but it seems to arise operationally from the models but a little muddier at the level of theory. For example, would an associability account of relief learning differ from an attentional account? Associability and attention are discussed almost interchangeably. On a more practical level, simply stating the direction of associability (e.g. high associability = higher uncertainty) clearly would make correlations more immediately interpretable. Clarifying the relationship between these concepts early on and consistently using them throughout the manuscript will increase readability.

Clearly there is a strong parallel between associability (a computational construct) and attention (a psychological construct), such that whilst they cannot be directly equated (as they are different types of construct), it is certainly the case that they share a common intuitive basis. We have modified the text in manuscript to discuss associability in terms of uncertainty, and this includes the major new introductory paragraph as mentioned above. We hope this makes the narrative more consistent.

See above.

As a more general point relating to both of the preceding issues, I think some of the difficulties in interpreting results was due to the format whereby results are presented before methods. As an example, there was more description of associability in the discussion and methods, but it would have been helpful to have some information provided in the results, given that this is read first. Perhaps the authors could be mindful of this format and provide some explanation along with the results presented?

Again, this point echoes the preceding points, which we have dealt with above.

3) Other than the fact that a reinforcement learning paradigm fits putamen responses, do we have evidence that dorsal putamen responses are involved in a learning process? Is there any correspondence between dorsal putamen findings and behavioural findings?

It is hard for us to provide causative evidence for a role of the dorsal putamen in the actual learning process itself, and the difficulty of distinguishing the site of learning from that of performance has been a long-standing debate in resolving sub-regions of the striatum. Clearly the BOLD signal observed in our and other studies reflects the computation of a prediction error, and this prediction error is used in learning, but in the absence of interventional studies, we cannot conclude more than this.

4) The timing of pain and relief ratings wasn't very clear. Am I correct in inferring that both pain and relief ratings were collected at three time points in each session (near beginning, near middle, near end)? How many of each rating makes up the scores reported?

For Experiment 1, both pain and relief ratings were collected at 3 time points in each session (near beginning, near middle, near end), when there were pain/relief outcomes respectively. The results reported in Figure 2C consist of 19 subjects' ratings averaged across sessions, separated according to paradigm (instrumental/Pavlovian) and outcome (relief/pain). For Experiment 2, only pain ratings were collected, in 10 random trials out of 24, the same for each session. We have included more details of the ratings in both the figure legends and the manuscript text, and added Table 1.

5) The important event for participants is the reduction of pain when the temperature is reduced. Since pain has a continuous intensity dimension and a reduction along this dimension is driving the learning process, I wonder whether the RL models could be extended to use a continuous outcome that might offer more information?

This is an interesting point, and theoretically it is possible to model learning with continuous time and value functions. However, in our experiment, it would simply render the analyses much more complicated (i.e. involve the inclusion of several additional parameters), while it is not easy to see how it would change our ability to answer the central questions being asked. To be more specific, the cues would still elicit a value signal that reflects the accumulated anticipated value, but this value would be a more complex integral of a temporal function; and the outcome signal in particular would be complicated by the fact that its onset was less temporally discrete. In RL generally, this latter issue is certainly important and not particularly well studied, but here, not capturing this full complexity does not confound our analysis, even if we knew how to parameterise it. At best, it might make a very subtle improvement in the sensitivity of the analysis, but the findings are already sufficiently robust. Therefore, in our current analysis there is no real problem in treating relief learning as driven by discrete relief events (note also our short trial design, with cue and choice periods lasting 3 seconds and outcome events lasting 3-4 seconds, compared with the relatively long TR of 2.5 s in the fMRI acquisition). We now add the following sentence on this point in the Materials and methods section:

“We note that it is theoretically possible to model the learning process as a continuously valued function that exactly matches the time-course of the temperature changes, but such models are unnecessarily complex and largely orthogonal to the experimental manipulations.”

6) Given that correspondence between the two experiments is an important feature, a figure in which activations in one experiment are overlayed on the other (to judge spatial correspondence) would be helpful.

We have added a figure showing the overlaid associability-correlated pgACC responses (left), and prediction error correlated activations in dorsal putamen and amygdala (right), as Figure 5—figure supplements 1 and 2 in the manuscript.

7) It isn't very clear how imaging data were corrected for multiple comparisons. Relatedly, in some cases, searches were restricted to a priori ROIs (e.g. pgACC, posterior insula, vlPFC), but it isn't clear how these were defined (e.g. anatomically? Based on previous findings?) or whether data in these analyses were corrected across the mask of all ROIs. From the tables and Results section, I conclude that the authors use a mix of cluster-extent thresholding and peak-voxel SVC correction. The authors should choose one method and use it consistently.

Furthermore, the authors use SVC correction based on coordinates from hand-selected previous studies or selectively use Experiment 1 coordinates for SVC correction of the amygdala results in Experiment 2. With the availability of comprehensive anatomical atlases, I urge the authors to apply masks based on anatomical atlases or independent functional localizers to correct for multiple comparisons.

Following the reviewer’s advice, we have now consistently used binary masks generated from an anatomical atlas for small volume correction. ROI masks were generated using the Harvard-Oxford Atlas (Desikan et al., 2006), freely available with the FSL software (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases). We thresholded the probability maps at 50%, focusing on ROIs defined a priori (learning related: amygdala, accumbens, putamen, caudate, pallidum, VMPFC, DLPFC; controllability-induced analgesia related: cingulate gyrus – anterior division, insular cortex, VLPFC). We used the frontal medial cortex for VMPFC, the frontal orbital cortex for VLPFC, and the middle frontal gyrus for DLPFC, respectively. We report results at p<0.05 (FWE cluster-level corrected). Masks were applied separately, not combined.

In brief, there is no change in the significance levels / inference based on using a consistent system for multiple comparisons. The manuscript now includes the paragraph above and Table 3 and 7.

8) Since the conclusions of this manuscript rely primarily on the modeling efforts, I think presenting absolute model fits for choices and SCR data would help in evaluating the models. In addition, presenting information on SCR data quality will help to convince the reader about the conclusions. For example, skin conductance shows spontaneous, phasic responses during acute (10-20s) or tonic pain stimuli. To which degree are the responses modeled here locked to the cue- or outcome-events? Showing raw SCR traces and/or averaged evoked responses with predicted SCR responses would help here, e.g. using eLife's figure supplements.

We have modified Table 4 to include each model’s absolute fit, as output by the VBA toolbox. The free energy F is the model log evidence from the variational Bayesian approximation, where a larger value indicates a better fit (similar to the log likelihood in conventional gradient-descent fitting). While absolute model fits provide information for evaluation, calculating model frequencies and (protected) exceedance probabilities through sampling is the widely accepted method for model comparison. We have verified the results further by calculating the protected exceedance probabilities, as indicated above.

We have also added both raw SCR traces from all participants (after exclusion) and averaged filtered trial SCR traces from both experiments (Figure 2—figure supplements 1–3 and Figure 4—figure supplements 1–3). The raw SCR traces show that non-excluded participants had reliable event-evoked responses within sessions (only one unfiltered session is shown per participant, so some responses may not be obvious without scaling). The trial-averaged SCRs show responses time-locked to cue display, with response onsets beginning ~2 seconds after cue appearance, before any other events (i.e. outcome or rating) took place. In addition, the PsPM toolbox estimates SCRs by convolving a skin conductance response function (SCRF) with the input signal time series, allowing variable onset times and response durations, which provides more accurate estimates than simple peak-to-peak measures (http://pspm.sourceforge.net/). SCR recordings from the two experiments therefore appeared to satisfy the requirements for model fitting.

Accordingly, we have added explanation of this in the manuscript and new figures are as follows: Figure 2—figure supplement 1, Figure 2—figure supplement 2, Figure 4—figure supplement 1 and Figure 4—figure supplement 2.
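As an illustration of the convolution approach mentioned in our response to point 8, the sketch below convolves event onsets with a response-function kernel to produce a predicted SCR trace. The gamma-shaped kernel and all parameter values are assumptions for illustration, not PsPM's canonical SCRF or estimation procedure.

```python
# Illustrative sketch: predicted event-locked SCR as the convolution of an
# event impulse series with a response-function kernel. The gamma-like
# kernel here is an assumption, not PsPM's canonical SCRF.
import numpy as np

def predicted_scr(onsets, n_samples, sr=10.0, shape=3.0, scale=0.7):
    """onsets in seconds; sr = sampling rate in Hz."""
    t = np.arange(0, 20, 1.0 / sr)                  # 20 s kernel support
    kernel = t ** (shape - 1) * np.exp(-t / scale)  # gamma-shaped response
    kernel /= kernel.max()                          # peak normalised to 1
    impulses = np.zeros(n_samples)
    impulses[(np.asarray(onsets) * sr).astype(int)] = 1.0  # event onsets
    return np.convolve(impulses, kernel)[:n_samples]

# Two hypothetical cue onsets at 2 s and 12 s, 20 s of data at 10 Hz:
trace = predicted_scr(onsets=[2.0, 12.0], n_samples=200)
```

In an inversion scheme like PsPM's, the amplitudes (and, for flexible models, onsets and durations) of such responses are estimated by fitting this kind of forward model to the measured skin conductance.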

9) Do changes in pain perception also correlate with pgACC activity when used as a regressor in subject-level models?

We did not find pain-rating-correlated pgACC activations at a threshold of p<0.01 uncorrected, in keeping with the notion that the associability-correlated pgACC activations were not solely driven by pain perception. We conducted the analysis using pain ratings as parametric modulators at decision time (duration=0) instead of associability, as the reviewer suggested. Since Experiment 1 had an insufficient number of ratings (only 8 pain ratings per participant per paradigm), we used data from Experiment 2 only (71 ratings per participant). Two participants were excluded because of constant ratings within sessions, for whom first-level contrasts could not be estimated (final n=21). Clusters positively correlated with pain perception included the insula, IFG and Rolandic operculum, while negative clusters included the precentral gyrus, primary motor cortex and the PAG. The whole-brain analysis and AAL labels are summarised in the table below (the table is not shown in the manuscript because the results were not significant).

We added the following sentence in the Results section to incorporate this result:

“In addition, we used trial-by-trial pain ratings as a parametric modulator, but did not find significant pgACC responses, which suggested that it was unlikely to be solely driven by pain perception itself.”

Experiment 2
with an initial cluster-forming threshold of p < 0.005, cluster size k>5, regions from AAL2
| p* | k | T | Z | x | y | z | Region (AAL) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Positive | | | | | | | |
| 0.474 | 36 | 4.37 | 3.62 | -36 | 33 | 5 | Frontal Inf Tri L |
| | | 3.74 | 3.22 | -40 | 7 | 8 | Insula L |
| | | 3.35 | 2.95 | -29 | 18 | 8 | Insula L |
| 0.782 | 24 | 4.29 | 3.57 | 43 | -20 | 23 | Rolandic Oper R |
| 0.986 | 12 | 3.99 | 3.38 | -17 | 22 | 46 | Frontal Sup 2 L |
| 0.998 | 8 | 3.78 | 3.25 | -40 | -16 | -14 | Hippocampus L |
| 0.986 | 12 | 3.73 | 3.21 | 17 | -46 | 1 | Lingual R |
| 0.971 | 14 | 3.72 | 3.2 | 54 | -31 | 12 | Temporal Sup R |
| | | 3.17 | 2.82 | 62 | -35 | 12 | Temporal Sup R |
| 0.991 | 11 | 3.66 | 3.16 | 62 | -46 | 1 | Temporal Mid R |
| 0.995 | 10 | 3.51 | 3.06 | 5 | -8 | 1 | Thalamus R |
| 0.999 | 7 | 3.24 | 2.87 | 5 | -31 | -52 | Cerebelum 9 R |
| 1 | 5 | 3.23 | 2.86 | -40 | 7 | -3 | Insula L |
| Negative | | | | | | | |
| 0.998 | 8 | 4.22 | 3.53 | -51 | -15 | 0 | Precentral L |
| 0.168 | 56 | 4 | 3.39 | -17 | -16 | 68 | Precentral L |
| | | 3.63 | 3.15 | -25 | -12 | 65 | Precentral L |
| | | 3.58 | 3.11 | -36 | -20 | 65 | Precentral L |
| 0.991 | 11 | 3.95 | 3.36 | -2 | -23 | -3 | PAG/Thalamus L |
| 0.997 | 9 | 3.85 | 3.29 | -29 | -61 | 61 | Parietal Sup L |
| 1 | 5 | 3.74 | 3.22 | -14 | -91 | 1 | Occipital Sup L |
| 0.997 | 9 | 3.36 | 2.95 | 32 | -80 | 31 | Occipital Mid R |
| | | 3.05 | 2.73 | 28 | -76 | 38 | Occipital Sup R |
| 1 | 5 | 3.35 | 2.95 | 20 | 18 | -3 | Putamen R |
| 0.999 | 7 | 3.23 | 2.86 | 13 | -65 | 38 | Precuneus R |

10) In the Discussion section, the authors argue that lack of controllability in Pavlovian paradigms renders uncertainty hyperalgesic instead of analgesic. However, pain ratings do not differ between instrumental and Pavlovian sessions in Experiment 1, as predicted by this reasoning.

We agree with this point. Previously, we were trying to point to earlier evidence linking uncertainty to hyperalgesia when no control is available, without ruling out other possibilities. But it is difficult to draw any strong conclusions, so we have now removed this sentence.

11) In subsection “Ratings” the authors argue that placebo expectation theory predicts that larger prediction errors are correlated with pain reductions. Montgomery & Kirsch (1997) and Locher et al., (2017) have shown that a plausible instruction regarding the placebo is needed for conditioned placebo analgesia. Participants in the present study weren't given any rationale for a cue being a placebo treatment. Hence, different processes might be involved here. In addition, this test relies on the correct estimation of the prediction error, which depends on the estimated value. The value will increase (i.e. encode more expectation for relief) over repeated trials that included a relief. When the expectation for relief and thus the prediction errors are maximal, participants have just experienced a series of relief trials and the surprise or associability/uncertainty to previous trials is also maximal.

We accept there is a debate about the importance of instructed or conscious contingency knowledge in generating placebo analgesic responses. In our case, we did not collect explicit contingency awareness ratings, so we do not know whether such knowledge was acquired. The point we are trying to make is that a certain direction of response would be consistent with a placebo analgesic response, but the fact that it doesn’t occur renders the point somewhat academic. Of course, we did not design the study to actively pit placebo analgesia against the associability analgesic effects, although it is an interesting issue to consider. So, as the reviewer correctly points out, the expectation/value of relief increases after a series of relief trials, making subsequent relief outcomes less surprising, which leads to low associability/uncertainty and was associated with higher pain. This is not the manifestation of typical conditioned placebo analgesia, which should associate increasing expected relief values with decreased pain. Instead, the results are consistent with an information-driven, temporally dynamic process that suppresses pain when learning is most needed. We cannot rule out the possibility that this is the result of an interplay between different processes, including placebo analgesia, but we can say that any placebo-like response does not dominate the ratings.

To make this clearer, we now add the following sentence in the Results section:

“… although the extent to which this occurs might depend on the acquisition of contingency awareness during learning (Montgomery and Kirsch, 1997; Locher et al., 2017)”.

12) In both studies, pain (and relief) ratings were collected "intermittently," yet the authors make strong assumptions about effects on relief/pain based on the correlations between ratings and the time-varying measures of associability or prediction error. The authors should present complete information about rating measurement for each experiment (e.g. number of ratings) and justify why they did not incorporate ratings at the same time scale of choice, stimulus display, and SCR measurement. To determine that ratings are preferentially related to associability and not prediction error, it seems that all quantities should be measured with the same number of observations. Furthermore, this would allow direct fits to ratings, which would be the best way to determine how these learning-related parameters modulate pain and relief. Finally, if I understand correctly, Experiment 2 included pain ratings before relief outcomes were delivered. These ratings are likely to be influenced by anticipation and uncertainty, but not by relief, whereas Experiment 1's ratings were measured after outcomes. Thus, the studies differ in terms of the construct that is captured by ratings. Since pain and relief are ultimately subjective, a more thorough consideration of the self-report measures is warranted.

The ratings were collected intermittently because interruptions on each trial would impede the relief learning process, as well as greatly lengthen the duration of the experiment. This is typical in pain studies. More precisely, while choices and stimulus display are inherent components of the learning paradigm and SCRs were measured without participants’ conscious input, pain/relief ratings required participants to disengage from learning and switch their attention to evaluating their perception and converting it to numerical ratings. To minimise this disruption, we decided to use intermittent ratings, which have been used previously in both appetitive and aversive conditioning paradigms (Delgado et al., 2011; Prevost et al., 2013). In addition, the tonic pain stimulation would have been prolonged by ratings on every trial. Balancing the needs for participants to learn effectively, to have enough trials, and to minimise their exposure to tonic pain motivated our design. Of course, fewer ratings come at the expense of power, but this does not invalidate the basis for model fitting and comparison, and the fact that the effects are robust reflects the effect sizes.

We now add the following sentence in the Results section:

“Ratings were taken on a sample of trials, so as to minimise disruption of task performance”

We have also included a table to summarise the details of rating timing and frequency:

See Response #4 above.

The rating timing was after outcome in Experiment 1, and before outcome in Experiment 2, which was done for precisely the reason that the reviewer mentions, that is, to address the issue that the effects might be restricted to outcome times. This issue is discussed in subsection “Summary of Experiment 1”, where we wrote:

“Second, does the modulation of pain ratings occur throughout the trial? In the task, pain ratings are taken at the outcome of the action, and only when relief is frustrated, raising the possibility that it reflects an outcome-driven response, as opposed to learning-driven process modifying the ongoing pain.”

On the issue of ratings as measures, pain is clearly and necessarily subjective, as the reviewer points out, but it is still under an endogenous control process that is objectively testable. Hence, we can test how these subjective ratings change as a result of learning (against a null hypothesis of constant pain/relief ratings, since the temperatures of the tonic pain and its reduction never changed). In other words, by formally modelling the learning process we were able to capture these subjective perceptual changes using predictors such as associability, regardless of the timing of rating collection. This also attests to the robustness of the learning effects on perception. We have now added the following sentence in the Materials and methods section:

“Although ratings are inherently subjective, their modulation reflects an objective process that may explain a component of this apparent subjectivity.”

13) Please also explain the negative coefficients in Figure 4E – participants experienced LESS pain with higher associability and with longer time since relief? This seems inconsistent with previous work on uncertainty, attention, and desire for relief which should enhance pain.

Precisely: we show that participants experienced less pain with higher associability as a result of relief learning – when uncertainty is high, and hence the need to learn is maximal, pain is reduced, presumably to facilitate learning. This is contrary to the effect of uncertainty seen in some experiments, which do not involve controllable relief, and underscores the significance of the finding. As for the reduced pain with longer time since relief, it may reflect a peripheral habituation process. We have included this as a regressor of no interest in all neuroimaging analyses to exclude its effect.

We hope that the new discussion of uncertainty in response to several points above addresses this concern.

14) Skin conductance was found to be best fit by associability from the hybrid model. Can the authors rule out the possibility that this is only the case because both associability and skin conductance decrease over time? Other models included an effect of time/trial to account for such habituation. Are these findings artefactual, and might SCR track value or prediction error if habituation is modeled separately?

Indeed, this is an issue that we considered following Experiment 1 and was part of the motivation for the non-stationary design in Experiment 2, in which the associability is maintained over time (Author response image 2). The modified discussion of this point appears in the Results section as follows:

“Third, the action-outcome contingencies were non-stationary, such that the relief probability from selecting each cue varied slowly throughout the experiment duration, controlled by a random walk algorithm which varied between 20-80%. This ensured that associability varied constantly through the task, encouraging continued relief exploration, and allowed us to better resolve more complex models of uncertainty. It also reduced the potential confounding correlation of associability and general habituation of SCRs.”
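To illustrate the kind of schedule described in the quoted passage, a reflected random walk bounded at 20–80% could be generated along the following lines. This is a sketch only: the function name, step size, and reflection scheme are our own assumptions, not the authors' actual stimulus-generation code.

```python
import random

def relief_probability_walk(n_trials, lo=0.2, hi=0.8, step_sd=0.05,
                            start=0.5, seed=1):
    """Slowly drifting relief probability, kept within [lo, hi] by reflection."""
    rng = random.Random(seed)
    p, schedule = start, []
    for _ in range(n_trials):
        p += rng.gauss(0.0, step_sd)      # small Gaussian drift each trial
        if p < lo:                        # reflect off the lower bound
            p = 2 * lo - p
        if p > hi:                        # reflect off the upper bound
            p = 2 * hi - p
        p = min(max(p, lo), hi)           # guard against rare oversized steps
        schedule.append(p)
    return schedule
```

Because the probability never settles, prediction errors (and hence associability) keep varying across the session, which is the design property the response appeals to.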

In addition, we took precautions in the model-fitting procedure by (a) excluding the first two trials in each session from fitting, since their SCRs are likely to be very large due to startle effects and would accentuate the SCR decrease, and (b) constraining the variance of the free-parameter priors, to limit over-general associability traces so that the fit explains mostly trial-by-trial variation (figure not included in manuscript).
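For non-expert readers, the "hybrid model" discussed here combines Rescorla-Wagner value learning with a Pearce-Hall associability term that tracks the unsigned prediction error. The sketch below shows the standard form of the trial-by-trial update; the parameter names and values are illustrative, not the fitted ones from the paper.

```python
def hybrid_model_step(value, associability, relief, eta=0.3, kappa=0.5):
    """One trial of a hybrid Rescorla-Wagner / Pearce-Hall update.

    value:         current relief expectation V
    associability: current associability alpha (gates the learning rate)
    relief:        outcome on this trial (1 = relief delivered, 0 = not)
    eta, kappa:    associability decay and learning-rate parameters
    """
    delta = relief - value                                    # prediction error
    new_value = value + kappa * associability * delta         # associability-gated update
    new_assoc = eta * abs(delta) + (1 - eta) * associability  # tracks |delta|
    return new_value, new_assoc
```

While outcomes remain surprising, associability stays high and learning is fast; as outcomes become predictable it decays, and this decaying-but-replenishing quantity is what the SCR analysis was fit against.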

15) The study uses a mild tonic stimulus in healthy volunteers and measures behavioral correlates of intermittent relief. While the pgACC results are cool, I find it quite inappropriate to suggest that "the results highlight the pgACC as a target for therapeutic intervention […] by invasive excitatory deep brain stimulation."

We have removed the suggestion of ‘invasive excitatory deep brain stimulation’ as an intervention and added ‘potential’ before ‘target’. The sentence, in the Discussion section, now reads:

“In addition to suggesting a possible computational mechanism that might underlie pain susceptibility in these patients, the results highlight the pgACC as a potential target for therapeutic intervention.”

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Reviewer #1:

The authors have largely satisfied any concerns I had about the reliability of the findings. So, I would be comfortable publishing the paper in its current form.

That said, I concur that they haven't done as much as they might have to increase the paper's accessibility to a general audience. On re-reading the paper after the authors' responses, I think the easiest way to do this might be to do some additional revision to the introduction. The Introduction (particularly before the addition of the sections on associability and reward learning) does little to set up the actual experimental paradigms and modelling techniques used, such that one ends up trying to piece together the rationale for most of what was done while reading the methods and results. The methodology and modelling would have been far clearer to me had the authors been more explicit (as they were in their reply to reviewers) about the relevance of associability for illuminating the distinction between state and action learning and how doing so relates to the broader goal of understanding relief learning in the context of tonic pain.

So, in summary, the paper is publishable, but I do think the paper could be improved in terms of accessibility without a great deal of additional work.

Thanks for this comment. We have now gone through the Introduction in detail to try to better communicate the hypothesis and scientific motivation for the study. This includes:

i) justifying the actual experimental paradigm and modelling techniques used,

ii) discussing the relevance of associability, and

iii) clarifying the distinction between state and action learning:

The revised Introduction:

“Tonic pain is a common physiological consequence of injury, and results in a behavioural state that favours quiescence and inactivity, prioritising energy conservation and optimising recuperation and tissue healing. This effect extends to cognition, and decreased attention is seen in a range of cognitive tasks during tonic pain (Moore et al., 2012; Crombez et al., 1997; Lorenz and Bromm, 1997). However, in some circumstances this could be counter-productive, for instance if attentional resources were required for learning some means of relief or escape from the underlying cause of the pain. A natural solution would be to suppress tonic pain when relief learning is possible. Whether and how this is achieved is not known, but it is important as it might reveal central mechanisms of endogenous analgesia. […] The studies presented here set two goals: to delineate the basic neural architecture of relief learning from tonic pain (i.e. pain escape learning) based on a state and action learning RL framework; and to understand the relationship between relief learning and endogenous pain modulation i.e. to test the hypothesis that an attentional learning signal reduces pain. We studied behavioural, physiological and neural responses during two relief learning tasks in humans, involving (i) static and (ii) dynamic cue-relief contingencies. These tasks were designed to place a high precedence on error-based learning and uncertainty, as a robust test for learning mechanisms and dynamic modulation of tonic pain. Using a computationally motivated analysis approach, we aimed to identify whether behavioural and brain responses were well described as state and/or action RL learning systems and examined whether and how they exerted control over the perceived intensity of ongoing pain.”

Reviewer #2:

The authors have addressed all my comments and questions.

We thank the reviewer for the assessment.

Reviewer #3:

For the most part, the authors have addressed all major concerns. I was particularly impressed that results and conclusions hold (1) whether parametric modulators are based on individual versus mean fits for Experiment 2 (although I think the authors should consider including these results in Supplementary figures), and (2) when consistently defined ROIs are employed (new Tables 6 and 7). The paper is also strengthened by the addition of information clarifying rating procedures and depicting skin conductance over time.

However, I feel that a few concerns remain, which I have delineated as minor concerns in the following section. In several places (e.g. the discussions of contingency awareness, modeling the time course of temperature changes, subjectivity of ratings), I felt that the authors only superficially engaged with reviewers' collective suggestions, and that overall accessibility of the work is still somewhat limited for non-expert audiences.

We hope the revisions made in the introduction detailed above improve overall accessibility. We have addressed the reviewer’s minor comments in more detail below, but we also revisited some of the issues mentioned above:

The role of subjective awareness. We think this is important, especially for understanding pgACC function and the link to other paradigms, although it is not something we could directly address with our current design. To highlight its importance, we now add the following comment in the Discussion section:

“However, an open question remains about the role of conscious awareness in driving pgACC-related endogenous control – a factor that is often important in these other paradigms. Whether or not the role of associability is modulated by the metacognitive awareness of uncertainty or controllability would be an important question for future studies.”

Modelling the time course of temperature. This is critical for understanding homeostatic motivation, in which decisions would have a persistent effect on the baseline tonic temperature. Again, whilst this is beyond what we aimed to examine here, we now add further comment to emphasise that it deserves further study (subsection “fMRI GLM model”):

“However, representation of the baseline temperature as a continuous function is clearly important in real-life contexts in which the baseline level determines homeostatic motivation and phasic reward functions (Morville et al., 2018), and hence future studies could directly manipulate this.”

On subjective ratings: we think the point at which this could matter most is if the subjective (relief) ratings provide a better determinant of the learned state and action values than fixed objective functions. Any effect here would be subtle, if it exists at all, although it has some theoretical importance since it runs against the conventional wisdom of the ‘wanting vs liking’ dissociation. We now add the following (subsection “Experimental Design”):

“This does raise the issue of whether the subjective relief ratings influence the outcome values learned in the RL model, but this (presumably subtle) effect is beyond the experimental power of these experiments to resolve.”

https://doi.org/10.7554/eLife.31949.029

Article and author information

Author details

  1. Suyi Zhang

    1. Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Cambridge, United Kingdom
    2. Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International, Kyoto, Japan
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    sz321@cam.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon 0000-0001-9028-6265
  2. Hiroaki Mano

    1. Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Cambridge, United Kingdom
    2. Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International, Kyoto, Japan
    3. Center for Information and Neural Networks, National Institute for Information and Communications Technology, Osaka, Japan
    Contribution
    Conceptualization, Validation, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Michael Lee

    Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Conceptualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Wako Yoshida

    Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International, Kyoto, Japan
    Contribution
    Conceptualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon 0000-0001-9273-1617
  5. Mitsuo Kawato

    Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International, Kyoto, Japan
    Contribution
    Conceptualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  6. Trevor W Robbins

    Behavioural and Clinical Neuroscience Institute, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Conceptualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  7. Ben Seymour

    1. Computational and Biological Learning Laboratory, Department of Engineering, University of Cambridge, Cambridge, United Kingdom
    2. Brain Information Communication Research Laboratory Group, Advanced Telecommunications Research Institute International, Kyoto, Japan
    3. Center for Information and Neural Networks, National Institute for Information and Communications Technology, Osaka, Japan
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    bjs49@cam.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon 0000-0003-1724-5832

Funding

National Institute of Information and Communications Technology

  • Suyi Zhang
  • Hiroaki Mano
  • Ben Seymour

Cambridge Commonwealth Trust

  • Suyi Zhang

Japan Society for the Promotion of Science (S2604)

  • Hiroaki Mano
  • Wako Yoshida
  • Ben Seymour

Japan Agency for Medical Research and Development

  • Wako Yoshida
  • Mitsuo Kawato
  • Ben Seymour

Wellcome Trust (097490)

  • Trevor W Robbins
  • Ben Seymour

Arthritis Research UK (21357)

  • Ben Seymour

WD Armstrong Fund

  • Suyi Zhang

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Research was supported by the National Institute for Information and Communications Technology (Japan), the Wellcome Trust (UK, Ref: 097490), the Japan Society for the Promotion of Science (JSPS), and the ‘Application of DecNef for development of diagnostic and cure system for mental disorders and construction of clinical application bases’ of the Strategic Research Program for Brain Sciences from the Japan Agency for Medical Research and Development (AMED). SZ is supported by the WD Armstrong Fund and the Cambridge Trust. The research was also supported by Arthritis Research UK (Ref: 21357). We thank Drs. Daniel McNamee and Agnes Norbury for helpful discussions, and the imaging teams at the Center for Information and Neural Networks and the Advanced Telecommunications Research Institute for their assistance in performing the study.

Ethics

Human subjects: The two experiments were performed at different institutes and approved by their relevant ethics and safety committees: the National Institute of Information and Communications Technology, Japan (Expt 1), and the Advanced Telecommunications Research Institute, Japan (Expt 2). All subjects gave informed consent prior to participation.

Reviewing Editor

  1. Tor Wager, Reviewing Editor, Institute of Cognitive Science, University of Colorado Boulder, United States

Publication history

  1. Received: September 12, 2017
  2. Accepted: February 8, 2018
  3. Accepted Manuscript published: February 27, 2018 (version 1)
  4. Version of Record published: March 8, 2018 (version 2)
  5. Version of Record updated: April 11, 2018 (version 3)

Copyright

© 2018, Zhang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

