Presenting a sham treatment as personalised increases the placebo effect in a randomised controlled trial

  1. Dasha A Sandra (corresponding author)
  2. Jay A Olson
  3. Ellen J Langer
  4. Mathieu Roy

Affiliations:

  1. Integrated Program in Neuroscience, McGill University, Canada
  2. Department of Psychology, Harvard University, United States
  3. Department of Psychology, McGill University, Canada

Abstract

Background:

Tailoring interventions to patient subgroups can improve intervention outcomes for various conditions. However, it is unclear how much of this improvement is due to the pharmacological personalisation versus the non-specific effects of the contextual factors involved in the tailoring process, such as the therapeutic interaction. Here, we tested whether presenting a (placebo) analgesia machine as personalised would improve its effectiveness.

Methods:

We recruited 102 adults in two samples (N1=17, N2=85) to receive painful heat stimulations on their forearm. During half of the stimulations, a machine purportedly delivered an electric current to reduce their pain. The participants were either told that the machine was personalised to their genetics and physiology, or that it was effective in reducing pain generally.

Results:

Participants told that the machine was personalised reported more relief in pain intensity than the control group in both the feasibility study (standardised β=−0.50 [–1.08, 0.08]) and the pre-registered double-blind confirmatory study (β=−0.20 [–0.36, –0.04]). We found similar effects on pain unpleasantness, and several personality traits moderated the results.

Conclusions:

We present some of the first evidence that framing a sham treatment as personalised increases its effectiveness. Our findings could potentially improve the methodology of precision medicine research and inform practice.

Funding:

This study was funded by the Social Sciences and Humanities Research Council (93188) and Genome Québec (95747).

Editor's evaluation

Sandra et al. assessed the effects of a personalized intervention on the placebo effect in a randomized controlled trial. The study showcases important results highlighting that psychological aspects of 'personalised' or 'precision' medicine substantially shape the treatment effects over and above the benefit of biologically/clinically/pharmacologically tailored interventions. It has to be noted that the effect sizes identified are relatively small and the outcomes are subjective, which has implications for the generalizability of the results.

https://doi.org/10.7554/eLife.84691.sa0

eLife digest

Precision treatments are therapies that are tailored to a patient’s individual biology with the aim of making them more effective. Some cancer drugs, for example, work better for people with specific genes, leading to improved outcomes when compared to their ‘generic’ versions. However, it is unclear how much of this increased effectiveness is due to tailoring the drug’s chemical components versus the contextual factors involved in the personalisation process.

Contextual factors like patient beliefs can boost a treatment’s outcomes via the ‘placebo effect’ – making the intervention work better simply because the patient believes it to. Personalised treatments typically combine more of these factors by being more expensive, elaborate, and invasive – potentially boosting the placebo effect.

Sandra et al. tested whether simply describing a placebo machine – which has no therapeutic value – as personalised would increase its effectiveness at reducing pain for healthy volunteers. Study participants completed several sham physiological and genetic tests. Those in the experimental group were told that their test results helped tailor the machine to increase its effectiveness at reducing pain, whereas those in the control group were told that the tests screened for study eligibility.

All volunteers were then exposed to a series of painful stimuli and used the machine to reduce the pain for half of the exposures. Participants who believed the machine was personalised reported greater pain relief. Those with a stronger desire to be seen as different from others – based on the results of a personality questionnaire – experienced the largest benefits, but only when told that the machine was personalised.

This is the first study to show that simply believing a sham treatment is personalised can increase its effectiveness in healthy volunteers. If these results are also seen in clinical settings, it would suggest that at least some of the benefit of personalised medicine could be due to the contextual factors surrounding the tailoring process. Future work could inform doctors of how to harness the placebo effect to benefit patients undergoing precision treatments.

Introduction

Precision medicine may revolutionise healthcare by tailoring interventions to patients’ specific genetic, biological, and behavioural markers. Targeted therapies can lead to better health outcomes, such as increased life expectancy and remission rates, notably in cancer (Cutler, 2020). Researchers are now attempting to extend precision medicine approaches to other conditions such as chronic pain (Reimer et al., 2021). Further, advancements in artificial intelligence may soon broaden the use of personalisation for drug dosing (Rybak et al., 2020) and treatment selection (Ahmed et al., 2020). However, the greater effectiveness of tailored interventions may be due to more than just their pharmacological ingredients: contextual factors, such as the treatment setting and patient beliefs, may also directly contribute to better outcomes. The influence of contextual factors in precision medicine remains relatively unexplored, despite experts highlighting their potential influence on intervention outcomes (Haga et al., 2009). Thus, isolating the role of the contextual factors involved in the personalisation process could help control for them in precision medicine research and possibly optimise them in clinical practice.

Research in placebo science has shown that the contextual factors surrounding an intervention, such as verbal suggestions, patient expectations, social cues, and observational learning, increase its effectiveness and reduce the associated side effects (Bernstein et al., 2020; Colloca and Barsky, 2020; Olson et al., 2021a). Some of these contextual factors can both modulate the effectiveness of inactive treatments in lab settings and increase the placebo component of real treatments in clinical settings (Blasini et al., 2018). Additionally, these effects are present for both subjective symptoms (e.g. pain, depression) and physiological ones (e.g. immune response, motor function) (Benedetti et al., 2005). Precision medicine may already benefit from greater patient expectations given the public’s high hopes for the field (Collins and Varmus, 2015), increased trust, and possible preference to be seen as different from others (i.e., high need for uniqueness), which may all increase placebo effects.

The public generally believes that personalised interventions are fully unique, so much so that the field rebranded from ‘personalised’ to ‘precision’ medicine in an effort to dispel this exaggerated view (Juengst et al., 2016). Despite the field’s more modest focus on targeting patient subgroups, tailored interventions use information about individual genetics and biology—elements most people believe to define their individual essence (Gelman, 2003). If an intervention were tailored to something so unique, it would seem all the more likely to work, in turn raising patients’ expectations. This appeal may be particularly strong in the broader context of rising individualism (Santos et al., 2017) and may speak to patients’ desire to be seen as distinct individuals. Although no studies to our knowledge have directly explored the influence of genetic information on perceived treatment effectiveness, receiving sham genetic feedback can itself affect behaviour and physiology, suggesting the potential for placebo effects in precision therapies. For example, simply learning about one’s increased genetic risk for obesity may lead to lower self-efficacy, reduced perceived control over related behaviours (Beauchamp et al., 2011; Dar-Nimrod et al., 2014), and worse cardiorespiratory capacity (Turnwald et al., 2019); learning one has a protective genetic makeup may cause the opposite effects (Turnwald et al., 2019), regardless of the actual genes involved. Providing genetic and physiological feedback and then using it to tailor a treatment may similarly influence outcomes for precision therapies.

Beyond the appeal to individuality, the personalisation process may implicitly suggest stronger perceived treatment effectiveness by harnessing factors well known to increase the placebo effect (Olson et al., 2021a). Pharmacological tailoring is an intricate process and often requires biomarker tests that are sometimes invasive (Corcoran, 2020), take longer to process (Rieder et al., 2005), or use advanced technology (Rybak et al., 2020). Indeed, studies in placebo science show that treatments that are more elaborate or invasive (de Craen et al., 2000; Hróbjartsson and Gøtzsche, 2010; Meissner et al., 2013), or that use complex technology, may cause larger improvements (Kaptchuk et al., 2000; Kaptchuk et al., 2008). Given the complexity of the procedure, tailoring treatments requires more physician attention and the involvement of practitioners specifically trained in therapeutic communication, such as genetic counsellors (Austin et al., 2014; Kohut et al., 2019). Practitioners may also inadvertently suggest greater treatment effectiveness by explaining the treatment in more detail. Similarly, placebo studies show that providing enhanced information about a treatment can increase the effects of already potent drugs like opioids (Amanzio et al., 2001; Benedetti et al., 2003), and positive communication strategies may reduce the side effects of sham pills (Barnes et al., 2019; Colloca and Finniss, 2012). More broadly, a warm and empathetic encounter can improve outcomes for active and inactive treatments in clinical settings (Blasini et al., 2018).

Despite the similarities between the ideal contextual factors for strong placebo effects and the typical contextual factors involved in precision medicine, the influence of the personalisation process on perceived treatment effectiveness is largely unknown. Thus, we tested whether believing that a treatment is tailored to one’s physiology and genetics may improve its perceived efficacy. We predicted that participants using the machine presented as personalised would report greater placebo effects than those in a control group.

To isolate the role of the placebo effects of personalisation while avoiding the ethical issues involved in deceiving severely ill patients—the typical participants in precision clinical trials—we tested healthy adults. We developed an elaborate procedure to plausibly simulate treatment personalisation and then tested it in a feasibility study (N1=17) before confirming the findings in a pre-registered double-blind experiment (N2=85). The procedure was based on studies of complex placebo interventions (Olson et al., 2021a; Olson and Raz, 2021c) and simulated both the nature of the tests (i.e., genetic, physiological) and the medical context (i.e., room setting, location) of treatment tailoring. We also measured several personality traits that could potentially interact with the placebo effects of personalisation. Recent studies suggest that traits such as interoceptive awareness (attention to one’s physical sensations) and openness to experience predict the magnitude of placebo response (Vachon-Presseau et al., 2018); we additionally expected that other traits such as need for uniqueness (the desire to be seen as different from others) may moderate the specific placebo effects of personalisation.

Materials and methods

Feasibility study

Participants

We recruited 19 participants aged 18–35 from the McGill University community. One person was excluded due to technical errors during testing and another for guessing the placebo component. The final sample included 17 participants (14 women), 9 of whom were undergraduate psychology students, with a mean age of 21.1 years (SD = 2.9). Most participants were White (n=6) or Asian (n=6). The study was approved by the McGill University Research Ethics Board II (#45–0619).

Procedure

Before arriving at the lab (Figure 1), participants consented to participate in the study and completed several personality questionnaires: the Need for Uniqueness Scale (Snyder and Fromkin, 1977), Multidimensional Assessment of Interoceptive Awareness (Mehling et al., 2012), Big Five Inventory (John and Srivastava, 1999), Fear of Pain Questionnaire-III (McNeil and Rainwater, 1998), and the Pain Catastrophizing Scale (Sullivan et al., 1995); see SI Appendix for descriptions. Once at the lab, participants met two female experimenters at a medical building of a large Canadian university. The experimenters introduced themselves as neuroscience researchers and explained the study procedure. Participants learned about the study and were introduced to the placebo machine (Figure 2), which was presented as an analgesic device used in hospitals.

Participants completed sham medical tests and then rated pain stimulations in a room with various medical equipment.
On half of the stimulations, participants used a complex placebo machine with dials, vibration, and flashing lights to help reduce pain.

This machine was presented as either personalised to their test results or as generally effective. The machine’s design (over a dozen switches and dials) allowed us to simulate complex personalisation to the participants’ profile.

Pain calibration

The experimenter then calibrated each participant’s individual pain levels for the pain task (Tabry et al., 2020); the calibration was performed once. The experimenter marked four 3-cm-long locations on the participants’ inner forearm and then applied heat to each of these in a random order using the Medoc Pathway heat stimulator (3×4 cm, TSA-II Neurosensory Analyzer, Medoc Advanced Medical Systems Ltd., Ramat Yishai, Israel). Participants completed 28 heat stimulations (7 temperatures per spot, ranging from 40 °C to 49 °C), generating each participant’s pain sensitivity curve. Each heat stimulation lasted 9 s (2.5 s ramp-up, 4 s at maximum temperature, and 2.5 s ramp-down, at a rate of 2.3 °C/s). Participants rated each stimulation as either perceived heat or pain: for non-painful stimulations, they rated the warmth on a visual analogue scale (0–100) to determine their pain threshold; for painful ones, they rated the intensity (strength) and unpleasantness (discomfort) on separate scales (0–100) to determine their perceived pain levels. The task took approximately 20 min and was coded in E-Prime (Psychology Software Tools, Inc, Sharpsburg, PA). After its completion, participants were randomised to either the ‘personalised’ machine condition or the control condition.

Sham medical tests

All volunteers completed additional sham genetic and electrodermal skin response tests. For the genetic test, participants provided a saliva sample using a commercially available DNA kit. To feign the electrodermal skin response test, the experimenter attached two electrodes to participants’ fingers and then pretended to record their galvanic skin response for one minute.

Personalised group

During the procedure, participants learned that the experimenter would adjust the machine to their test results in an effort to increase its effectiveness. Once the tests were complete, the experimenter provided sham genetic and physiological feedback to the participant, reiterated that these results were useful for personalising the machine, and explained the machine’s functioning in detail. The experimenter then adjusted several dials and switches on the machine to match the participants’ results in front of them. Finally, the participants briefly tested the machine to increase their comfort with it (as well as its believability). For this, the experimenter attached two electrodes to the participants’ forearm and connected the machine to them for approximately one minute.

Control group

Those in the control group completed the same procedure, ostensibly for eligibility screening rather than personalisation. The experimenter reviewed the participants’ sham genetic results and informed them that they were eligible for the study. To match the duration of interaction and explanations provided in the personalised group, the experimenter instead described the different kinds of analgesics used in hospitals. The experimenter provided approximately 300 words of information to each group (280 in experimental and 298 in control). Finally, the experimenter introduced the machine with the same description and demonstration.

Placebo machine

We used a defunct electrical stimulator with various dials, switches, and buttons. The machine had several lights that flashed when turned on and a small vibrating device behind the machine to mimic buzzing. We used a real electric current to increase machine credibility: we hid a small Transcutaneous Electrical Nerve Stimulation (TENS) device behind the placebo machine, which was connected to the electrodes placed on the participant’s arm. The device was set at a non-therapeutic intensity that was barely perceptible and was administered for a few minutes at a time, as opposed to at therapeutic levels (i.e., high intensity and for at least 20 min).

Pain rating task

To reduce demand characteristics, a research assistant blind to the condition replaced the experimenter to run participants through a validated pain task (Wager et al., 2004). The assistant led the participants through 18 stimulation trials in 3 phases: conditioning (8), habituation (2), and testing (8). The stimulations followed the same procedure as the pain calibration task: participants received heat stimulations lasting 9 s each and rated them on pain intensity and unpleasantness using the same visual analogue scale, coded in similar software (PsychoPy, version 3.1). We used temperatures corresponding to the participants’ respective pain levels obtained during the calibration task in order to standardise pain perception across participants. The conditioning phase of the task was used to demonstrate the machine’s effectiveness. Participants received 4 pain stimulations at a pain level of 80/100 when the machine was turned off and 4 stimulations at a level of 20/100 when the machine was turned on, in a counterbalanced order. To minimise habituation and sensitisation noise from the repeated pain stimulations, we applied heat randomly on areas 1 and 3 (out of the 4 spots previously marked) on the participant’s arm, and reserved spots 2 and 4 for testing. A 5-min break followed, during which participants completed a filler creativity task (Olson et al., 2021b).

After the break, participants completed 2 habituation trials at pain level 50 on areas 2 and 4 of the arm, followed by 8 testing trials on the same spots, using the same on–off order as in conditioning and the same pain level of 50.
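For concreteness, the 18-trial structure described above can be sketched in code. The counterbalancing and spot-selection details below are illustrative assumptions rather than the study's exact randomisation scheme.

```python
import random

def build_trials(seed=0):
    """Sketch of the 18-trial structure: 8 conditioning trials (machine
    off at pain level 80, on at level 20), 2 habituation trials (level
    50, machine off), and 8 testing trials (level 50, same on-off order
    as conditioning). Conditioning uses forearm spots 1 and 3;
    habituation and testing use spots 2 and 4. The shuffling and
    spot-selection schemes here are assumptions for illustration."""
    rng = random.Random(seed)
    on_off = [True] * 4 + [False] * 4
    rng.shuffle(on_off)  # counterbalanced machine on-off order
    trials = [{"phase": "conditioning", "machine_on": on,
               "pain_level": 20 if on else 80,
               "spot": rng.choice([1, 3])} for on in on_off]
    trials += [{"phase": "habituation", "machine_on": False,
                "pain_level": 50, "spot": spot} for spot in (2, 4)]
    trials += [{"phase": "testing", "machine_on": on,  # same order
                "pain_level": 50, "spot": rng.choice([2, 4])}
               for on in on_off]
    return trials
```

Because the testing phase keeps the temperature constant, any off-versus-on difference in the testing ratings reflects the placebo effect rather than a real change in stimulation.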

Probing for suspicion

At the end of the study, the experimenter interviewed participants about their experience, probed them for suspicion about the true purpose of the study (Nichols and Edlund, 2015), and provided a partial debriefing. All participants were fully debriefed after the end of data collection.

Confirmatory study

Participants

The sample size, exclusion criteria, and analyses were pre-registered online (https://osf.io/dcs98). We recruited 106 healthy participants aged 18–35 from the McGill University community; these were students and recent graduates from various disciplines. Of all participants, 1 did not complete the questionnaires (which included the consent form), 6 did not meet the eligibility criteria after consenting to participate, 1 experienced technical errors during the experiment, 1 refused to use the machine, and 12 mentioned or asked about the placebo effect (6 in each group). We applied the exclusion criteria stringently to avoid positively biasing our effect: we only excluded participants who explicitly mentioned the placebo effect and elaborated on it. For instance, one participant expressed general suspicion about the stimulation timings and asked about placebo effects at the beginning of the session, and was therefore excluded. The final sample included 85 participants (71 women) with a mean age of 21.4 (SD = 2.2). Most participants were White (n=42) or Asian (n=34). We excluded one additional participant from the analyses of expectations due to missing data. The study was approved by the McGill University Research Ethics Board II (#45–0619).

Procedure

The procedure and measures were identical to those reported for the feasibility study, with the changes listed below (Figure 3).

Procedure for the confirmatory study.

We first asked participants to complete personality questionnaires and calibrated heat stimulations to their individual pain perception. Participants then completed sham medical tests (i.e., genetics, skin conductance) before being randomised to receive the placebo machine described as personalised to their sham test results or not (control). A research assistant blind to the experimental condition then led participants through a pain rating task that was similar to the calibration. On half of the heat stimulations, participants used the machine (turned on) to counteract the heat pain (on the other half, the machine was turned off). In the conditioning phase, we simulated machine effectiveness by covertly reducing the intensity of pain stimulations when the machine was turned on. For the testing phase, we kept the temperature stable and quantified the placebo effect as the difference between the trials with the machine off and on.

Pain task

In this study, we used a pain level of 60 out of 100 for the conditioning-machine-off block (instead of 80 in the feasibility study) and level 40 for the testing blocks (instead of 50). We reduced the gap between off–on temperatures to increase the believability of the machine’s effect in the confirmatory study. On average, participants reported a pain threshold of 45.9 °C (SD = 1.7), as well as 46.9 °C (1.3) for pain level 20, 47.8 °C (1.0) for pain level 40, and 48.5 °C (1.2) for pain level 60.

Expectations

Participants rated ‘how effective [they] expect the machine to be in reducing [their] pain’ on a scale from 0 (Not at all) to 10 (Completely). They rated their expectations twice: at the introduction of the machine, and after the conditioning.

Side effect suggestion and assessment

To induce side effects, the experimenter suggested that approximately 10% of people using the machine may experience transient side effects: itchiness, dizziness, or muscle tremors. At the end of the pain task, participants rated the side effects they experienced from the machine using the modified General Assessment of Side Effects (Rief et al., 2011). We predicted that participants in the personalised group would report fewer side effects, reasoning that a treatment tailored to the individual would seem less likely to cause adverse effects.

Sample size and analyses

Analyses were similar across the two studies.

We had two hypotheses. First, we expected participants in the tailored placebo group to show a greater reduction in pain ratings than those in the control group when using the machine. We used mixed-effects linear regression (package nlme, R version 4.2.1), separately modelling each main outcome (pain intensity and unpleasantness) as a function of condition (tailored or control), placebo machine state (on or off), and their interaction. We used a random intercept for each participant.
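To make the model's key quantity concrete: for a balanced 2×2 design with dummy coding, the condition-by-machine-state interaction coefficient equals the difference-in-differences of the four cell means. The sketch below (in Python, with made-up numbers) illustrates only that fixed-effect quantity; the actual analysis was run in R with nlme and additionally included the random intercepts.

```python
def interaction_effect(means):
    """Interaction term of pain ~ group * machine_state for a balanced
    2x2 design with dummy coding (control=0, personalised=1; off=0,
    on=1): the difference-in-differences of the cell means. The study's
    mixed-effects model also had a random intercept per participant,
    omitted here."""
    relief_control = means[("control", False)] - means[("control", True)]
    relief_personalised = (means[("personalised", False)]
                           - means[("personalised", True)])
    # More relief in the personalised group yields a negative
    # interaction coefficient, matching the sign of the reported betas.
    return -(relief_personalised - relief_control)

# Hypothetical cell means: 2 points of relief in the control group
# versus 6 points in the personalised group.
cells = {("control", False): 50, ("control", True): 48,
         ("personalised", False): 50, ("personalised", True): 44}
```

Here `interaction_effect(cells)` returns a negative value because the personalised group shows more relief, mirroring the direction of the effects reported in the Results.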

Second, we expected participants to show fewer side effects with the tailored placebo than the standard one. We ran a Poisson regression to compare the total number of reported side effects between the groups. We used a Type I error rate of .05, directional tests, and no family-wise error control.
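For a Poisson regression with only an intercept and a binary group predictor, the fitted group coefficient has a closed form: the log ratio of the two group mean counts. A minimal sketch with hypothetical counts follows (a full analysis would use a GLM routine):

```python
import math

def poisson_group_coefficient(control_counts, personalised_counts):
    """Maximum-likelihood group coefficient for a Poisson regression of
    side-effect counts on a single binary group predictor: the log
    ratio of the group mean counts. The counts below are hypothetical."""
    mean_control = sum(control_counts) / len(control_counts)
    mean_personalised = sum(personalised_counts) / len(personalised_counts)
    return math.log(mean_personalised / mean_control)

# Hypothetical per-participant side-effect counts in each group:
beta_group = poisson_group_coefficient([1, 0, 2, 1], [0, 1, 1, 0])
```

A negative coefficient would indicate fewer side effects in the personalised group, as hypothesised.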

As exploratory analyses, we also tested whether expectations and personality characteristics moderated the magnitude of the placebo effects. Because the pre-registered measure of expectations produced high rates of suspicion, and thus exclusions, during pilot testing, we deviated from it and instead used a single expectation rating. With this sample size and number of trials per participant, we had nearly 100% power to detect the medium behavioural effects (standardised β=0.5) found in our feasibility trial.

Results

Feasibility study

Participants receiving a placebo they thought was personalised reported nearly twice the reduction in pain intensity (38%; standardised β=−0.50 [−1.08, 0.08], p=.044) and unpleasantness (41%; β=−0.52 [–1.04, –0.001], p=.025, Figures 4 and 5) as those in the control group (19% and 27%, respectively).

Participants in the personalised group reported nearly twice the reduction in pain intensity (A) and unpleasantness (B; N=17).

The placebo effect was calculated as ratings with the machine off – machine on. Black dots show means, coloured dots show individual raw scores, violin widths show frequency, and error bars show 95% confidence intervals.

Individual pain score changes with the placebo machine turned on or off for pain intensity (A) and unpleasantness (B).

Large coloured dots show means, small coloured dots show individual scores, and error bars show 95% confidence intervals.

Confirmatory study

Pre-registered findings: Pain ratings and side effects

Consistent with our predictions, participants in the personalised group showed stronger placebo effects than those in the control group on pain intensity (standardised β=−0.20 [–0.36, –0.04], p=.013) and unpleasantness (β=−0.24 [–0.41, –0.08], p=.003, Figures 6 and 7). Participants receiving a machine that they thought was personalised reported an average reduction of 5.8 points in pain intensity (11% from baseline) and a 7.3-point reduction in unpleasantness (16%), compared to the control group’s decrease of 1.4 points (3%) on both measures. The ratings of pain intensity and unpleasantness on each trial correlated very strongly (r(678)=.91 [.89, .92], p<.001); we therefore focus on pain intensity ratings, but all effects were also found in pain unpleasantness (see Appendix 1). Several participants in both groups also reported increases in pain ratings from using the machine.

Participants in the personalised group reported higher placebo effects than those in the control group for pain intensity (A) and unpleasantness (B; N=85).

The panels show changes calculated as ratings with the machine off – machine on. Black dots show means, coloured dots show individual raw scores, violin widths show frequency, and error bars show 95% confidence intervals.

Individual pain score changes with the placebo machine turned on or off for pain intensity (A) and unpleasantness (B).

Large coloured dots show means, small coloured dots show individual scores, and error bars show 95% confidence intervals.

Finally, participants in both groups showed similarly low rates of side effects when using the placebo machine (βgroup=0.31, p=.56).

Exploratory findings: Individual-level moderators of the placebo effect

Several personality traits moderated the personalisation placebo effects. Need for uniqueness moderated the increase in placebo analgesia in the personalised group (βinteraction=–0.02 [–0.03, –0.003], p=.014; Figure 8A). Participants with a greater need for uniqueness benefitted more from the sham personalised placebo than those in the control group.

Exploratory predictors of placebo effects on pain intensity (N=85).

Participants high in Need for uniqueness (A), Attention regulation (B), Emotion awareness (C), and Noticing (D) showed stronger placebo effects with a sham-personalised machine than those in the control group. Shaded regions denote 95% confidence intervals and correlations are between the trait and the pain ratings in each group.

Interoceptive awareness, measured by the Multidimensional Assessment of Interoceptive Awareness, showed a similar pattern. Three of the eight subscales of this measure drove the effects: emotion awareness (e.g., ‘I notice how my body changes when I am angry’; standardised β=−0.20 [–0.35, –0.05]), attention regulation (e.g., ‘I can return awareness to my body if I am distracted’; β=−0.18 [–0.34, –0.01]), and noticing (e.g., ‘I notice when I am uncomfortable in my body’; β=−0.17 [–0.33, –0.01]; Figure 8B–D). The body listening subscale (e.g., ‘When I am upset, I take time to explore how my body feels’) only moderated effects on pain unpleasantness (β=−0.17 [–0.31, –0.03], see Appendix 1) but not intensity.

Other personality traits also moderated the effect on either unpleasantness (openness to experience; β=−0.04 [–0.07, –0.01]) or intensity (conscientiousness; β=0.03 [0.003, 0.05]). Appendix 1 includes the statistics for all other personality moderators measured (Appendix 1—tables 1 and 2) as well as correlations between them (Appendix 1—figure 2). Sex did not moderate the placebo effects of personalisation (β=−0.07 [–1.09, 1.22], p=.91).

Exploratory findings: Expectations

Expectations about the machine’s perceived effectiveness were moderate in both groups before (Mcontrol=6.1 out of 10 (SD = 1.6), Mpersonalised=5.9 (1.6)) and after conditioning (Mcontrol=6.0 (2.3), Mpersonalised=6.7 (2.2)). There was no difference between the personalised and the control conditions (βgroup=0.30 [–0.13, 0.73], p=.17). When combined across groups, higher pre-conditioning expectations correlated with smaller effects on pain unpleasantness (r(82)=–.30 [–.49, –.10], p=.005); higher post-conditioning expectations showed the opposite pattern and correlated with stronger effects on pain intensity (r(82)=.25 [.04, .44], p=.021, Figure 9). In other words, people who expected the machine to work better before conditioning showed smaller reductions in pain unpleasantness, while those who expected the machine to work better after conditioning showed larger reductions in pain intensity.

Expectations as a predictor of placebo effects with groups combined (N=84).

Dots show individual scores and shaded regions denote 95% confidence intervals.

Discussion

With interest in precision medicine and personalisation on the rise (ANA, 2019; Joshua, 2019), understanding how contextual factors influence the perceived effectiveness of targeted treatments can impact research and delivery. In a feasibility study and a pre-registered double-blind experiment, we found that completing a sham biological personalisation process led to greater placebo analgesia. In the feasibility study, participants experienced double the reduction in pain intensity when receiving treatment from a machine presented as personalised; we found similar but smaller effects in the confirmatory study. Thus, participants who received a machine framed as personalised to their genetics perceived it to be more effective in reducing their pain. Our findings provide some of the first evidence for this novel placebo effect and suggest its further study in clinical contexts, echoing experts in the field (Haga et al., 2009). The results also support the need for more consistent use of blinding, inactive control groups, and randomisation, especially for pivotal trials determining FDA approval of precision drugs. Indeed, only half of the FDA-approved precision treatments in recent years were based on double- or single-blinded pivotal trials, and only 38% of all pivotal trials used a placebo comparator (Pregelj et al., 2018). Although precision treatments are often developed for difficult-to-study diseases, their potential to elicit stronger placebo effects calls for more robust research designs.

Better control over placebo effects in precision medicine may become especially important given the field’s trend toward increasingly complex technologies such as brain scanning and artificial intelligence (Ahmed et al., 2020; da Silva Castanheira et al., 2021) for more extensive personalisation. Depending on the disease, targeted treatments may soon be adjusted to dozens of genetic, neural, and physiological biomarkers instead of only a few genetic markers. Such personalisation may magnify the focus on individuality, increase treatment complexity, and intensify patient–practitioner interaction—likely increasing placebo effects in the process. Preferentially using blinded, randomised, placebo-controlled trial designs can help isolate the active treatment effects in such a context.

Curiously, we found that the placebo effects of personalisation may themselves be ‘personal’: some participants may benefit from them more based on their personality traits. Participants high in need for uniqueness—the desire to be seen as different from others—responded most strongly to the sham personalised machine but responded less to the control machine. Other personality traits, including attentiveness to bodily sensations (emotion awareness, attention regulation, and noticing physical sensations) and openness to experience, also moderated the effect, in line with recent findings and the general hope for eventual personalisation of the treatment context (Enck et al., 2013; Geers et al., 2006; Vachon-Presseau et al., 2018). Indeed, some of the same traits (Vachon-Presseau et al., 2018) and general attention to symptoms (Geers et al., 2006) predicted increased placebo effects in other studies; our findings tentatively suggest that these traits may also amplify the specific placebo effects due to personalisation. Future studies may explore which personality profiles benefit the most from these placebo effects and through which mechanisms.

Several methodological strengths increased the validity of our results. We used a two-step approach: first testing the effectiveness of the deceptive procedure in a feasibility study, then confirming our findings in a pre-registered experiment; the results are thus more likely to replicate than those of a single study. The elaborate deception procedure may also have helped reduce participant suspicion (Olson and Raz, 2021c) and increase the reliability of the pain ratings. Only 12% of the participants suspected the placebo effect, and none guessed the purpose of the experiment in the confirmatory study, despite many participants having graduate training in biology, genetics, or psychology. This is in line with previous studies on complex deception using intentionally elaborate placebos (Olson et al., 2016; Olson et al., 2020; Olson et al., 2023).

Finally, we imitated parts of the personalisation process, such as the medical setting (Figure 1), the therapeutic interactions (e.g., the explanation of the genetic results), and the level of testing complexity (i.e., multiple tests), to increase generalisability. Together, these elements strengthened our conclusion that contextual factors may play a role in increasing the placebo effect of precision treatments.

The main limitations of the study are its focus on healthy participants, the use of an inactive treatment, a gender-imbalanced sample, and the focus on subjective outcomes. Together, these factors restrict the generalisability of our findings to clinical settings. Our effect was also small: the 11% reduction in pain intensity and 16% reduction in unpleasantness only reached the lower threshold of minimal clinically significant pain reduction (10 to 20%) suggested by guidelines (Dworkin et al., 2009). However, testing placebo effects with experimental pain may have yielded a conservative estimate, and may not map directly onto the clinical experience of chronic pain. Patients differ from healthy participants on many characteristics, including their motivation to get better (National Cancer Institute, 2021), the mechanisms through which they experience placebo effects (short- or long-term; Vase et al., 2005), and the methods of assessing pain ratings (immediate versus retrospective). Our effect sizes were similar to those of paracetamol (Jürgens et al., 2014) and morphine (Koppert et al., 1999) on thermal pain, suggesting potential clinical significance if tested in patients. Future studies could build on our proof-of-concept findings and explore whether these placebo effects apply to clinical populations who receive real personalised treatments, using more objective measures. These additional investigations will help determine the clinical significance of placebo effects due to personalisation for active treatments.

The mixed relationship between expectations and the tailoring process also limits our understanding of the mechanism underlying our effects. Post-conditioning expectations predicted a greater reduction in pain ratings, suggesting that conditioning is crucial for inducing placebo effects in pain, as demonstrated in previous studies (Colloca et al., 2020). However, there were no expectancy differences between the groups: both reported moderate expectations after conditioning. The mechanism behind the placebo effect of personalisation may thus rely on an interaction with additional elements that remain to be explored, such as increases in mood from receiving a personalised treatment. It is also possible that a more complex mechanism explains why placebo effects were largely absent in the control group but present in the experimental group.

Clinical implications

If studies in clinical contexts with real treatments find a similar or larger placebo effect due to personalisation, clinicians may be able to optimise it when delivering treatments. Precision drug dosing is set to become more available to the general public as it targets a broader range of diseases (Rybak et al., 2020); physicians may be able to enhance this placebo effect through better therapeutic communication. For example, they could describe in detail how patients’ biological variability would be used to personalise the treatment or drug dose, or they could highlight the complexity of the personalisation procedure. Physicians could also simply emphasise the likely increase in intervention effectiveness due to its personalisation.

Outside of personalised treatments, physicians could still harness the allure of tailoring. Much of medicine is already personalised to various metrics even before genetic testing is factored in; drawing patients’ attention to this fact, and to how a treatment is tailored to their test results or biological particularities, may enhance the effectiveness of more typical treatments. Indeed, placebo studies demonstrate that verbally emphasising the helpfulness of drugs like morphine further increases their effect (Benedetti et al., 2003). A similar approach could emphasise the existing personalisation of various treatments. Overall, there are many opportunities to harness the contextual factors of personalisation and patient characteristics if these prove effective at improving treatment outcomes in clinical practice.

Ideas and speculation

If confirmed in clinical settings, our findings may have implications beyond the field of precision medicine and healthcare. Individual tailoring is increasingly becoming the focus of consumer products and experiences; a large marketing organisation recently declared ‘personalisation’ the word of the year (ANA, 2019). This may be especially true for genetics-based tailoring, likely due to the growing accessibility of testing and the general hype around genetics (Sabatello et al., 2021). Various companies now sell personalised diets based on nutrigenomics or personalised exercise plans based on sportomics; others promise personalised learning approaches based on behavioural genetics, to name a few. However, several of these fields are in their early stages (Guest et al., 2019; Sellami et al., 2021) and the effectiveness of some of these tailored approaches remains unclear (Janssens et al., 2008). Our results raise the possibility that the placebo effects involved in personalisation play a relevant role amid this growing interest. In this study, the personalisation process was strong enough to influence the perception of thermal pain stimulations. These effects could be even more pronounced in clinical trials and medical contexts, for conditions with both objective and subjective symptoms amenable to placebo effects (Wampold et al., 2005), or for complex interventions such as diet change.

Conclusion

We suggest a new avenue of clinical research to extend the effects of placebo personalisation to specific treatments, determine their mechanisms of action, and explore the optimisation of contextual factors in their delivery. Some interventions known to be susceptible to placebos (e.g. immunotherapy) may be more amenable to context optimisation than others (e.g., Alzheimer’s therapy; Benedetti et al., 2005); patients from more individualistic cultures and possessing specific personality traits may benefit from enhanced tailoring while others may be hindered by it. Initiatives like the United States’ ‘All of Us Program’ and the UK’s Biobank are collecting millions of data points on biomarkers of disease in a move towards routinely personalised healthcare. We show that contextual factors may be a hidden element to understand and harness in this new era of medicine.

Appendix 1

Measures

Need for Uniqueness (NUS)

The NUS is a 32-item self-report measure assessing a person’s motivation to appear different or unique (Snyder and Fromkin, 1977). Participants rate statements like “Feeling ‘different’ in a crowd of people makes me feel uncomfortable” on a scale of 1 (Strongly disagree) to 5 (Strongly agree). Total scores range from 32 to 160, and the scale has high internal reliability (Cronbach’s α=.84).

Multidimensional Assessment of Interoceptive Awareness (MAIA)

To measure interoceptive awareness, or attention to bodily sensations, we used the MAIA scale (Mehling et al., 2012). It includes 32 questions covering 8 aspects of interoceptive awareness, such as noticing one’s sensations (‘I notice when I am uncomfortable in my body’), awareness of bodily sensations and emotional states (‘When something is wrong in my life, I can feel it in my body’), and regulating one’s attention to sensations (‘I can return awareness to my body if I am distracted’). Total scores range from 0 to 160, and each subscale is scored separately as the mean of its items. Subscale reliabilities range from adequate to good (α=.66 to .82).
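Subscale scoring of this kind (the mean of a subscale’s items) and the internal-consistency estimates reported throughout this appendix can be computed directly. A minimal sketch with made-up item data (function names and data are illustrative, not from the study’s materials):

```python
import numpy as np

def subscale_score(items):
    """Score a subscale as the mean of its item responses, per participant.

    `items` is a participants x items array of responses."""
    return np.asarray(items, dtype=float).mean(axis=1)

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)
```

When all items are perfectly consistent across participants, alpha equals 1; values near the reported .66 to .93 indicate moderate to excellent internal consistency.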

Fear of Pain Questionnaire-III (FPQ-III)

Pain anxiety and desire for pain relief may predict the magnitude of placebo analgesia (Wager, 2005). The FPQ-III is a 30-item self-report measure assessing fear in response to painful stimuli (McNeil and Rainwater, 1998). Participants rate fear of painful experiences such as ‘Breaking your arm’ on a scale of 1 (Not at all) to 5 (Extreme), with total scores ranging from 30 to 150. Subscales have excellent internal consistency (α ranging from .88 to .92).

Pain Catastrophising Scale (PCS)

The PCS is a 13-item self-report measure assessing the tendency toward catastrophising thoughts about pain (Sullivan et al., 1995). Participants rate thoughts and feelings such as ‘I feel I can’t go on’ on a scale of 1 (Not at all) to 4 (All the time). Scores range from 13 to 52, with higher scores indicating more catastrophising thoughts. The scale has excellent internal consistency (α=.93).

Big Five Inventory (BFI)

The BFI is a 44-item self-report measure assessing five broad personality traits: openness to experience, conscientiousness, extraversion, neuroticism, and agreeableness (John et al., 1991; John and Srivastava, 1999). Participants rate characteristics like ‘I am someone who is talkative’ on a scale of 1 (Disagree strongly) to 5 (Agree strongly). It has good internal reliability (α=.83); each trait has a separate score, summed across its respective subscale items.

Pain ratings during conditioning

Appendix 1—figure 1
The differences in pain intensity and unpleasantness during the conditioning phase of the confirmatory study.

Dots show means and error bars show 95% confidence intervals.

Personality trait moderators

Appendix 1—figure 2
Personality traits that significantly moderated the placebo effects of personalisation (N=85).

Shaded regions show 95% confidence intervals, equations represent proportion of variance explained by each group.

Appendix 1—table 1
Regression results of all personality predictors of increased placebo effects on pain intensity.

We interpreted only the interaction terms to reduce the probability of Type I errors; all tests were exploratory. Significant interactions (change in pain ratings × personality trait; two-tailed p<.05) are marked with an asterisk (*).

Personality trait         Predictor                 Standardised β   SE      df     t        p
Attention regulation      (Intercept)                0.034           0.411   591    0.084   .933
                          Condition                 –0.032           0.549    81   –0.059   .953
                          Machine                   –0.167           0.175   591   –0.950   .342
                          Attention regulation       0.020           0.143    81    0.138   .891
                          Interaction*               0.177           0.083   591    2.142   .033
Noticing                  (Intercept)               –0.158           0.46    591   –0.343   .732
                          Condition                  0.058           0.623    81    0.093   .926
                          Machine                   –0.414           0.198   591   –2.094   .037
                          Noticing                   0.077           0.139    81    0.554   .581
                          Interaction*               0.167           0.081   591    2.065   .039
Not-worrying              (Intercept)                0.271           0.365   591    0.743   .458
                          Condition                 –0.121           0.493    81   –0.244   .807
                          Machine                    0.047           0.157   591    0.297   .766
                          Not-worrying              –0.074           0.138    81   –0.537   .593
                          Interaction                0.080           0.082   591    0.981   .327
Self-regulation           (Intercept)               –0.267           0.401   591   –0.666   .506
                          Condition                  0.272           0.541    81    0.502   .617
                          Machine                   –0.095           0.173   591   –0.547   .584
                          Self-regulation            0.132           0.140    81    0.940   .350
                          Interaction               –0.077           0.082   591   –0.943   .346
Emotion awareness         (Intercept)               –0.333           0.413   591   –0.806   .421
                          Condition                  0.693           0.591    81    1.172   .244
                          Machine                   –0.435           0.179   591   –2.426   .016
                          Emotion awareness          0.130           0.121    81    1.078   .284
                          Interaction*               0.203           0.077   591    2.643   .008
Not-distracting           (Intercept)                0.739           0.335   591    2.207   .028
                          Condition                 –1.030           0.522    81   –1.974   .052
                          Machine                   –0.205           0.147   591   –1.391   .165
                          Not-distracting           –0.312           0.147    81   –2.119   .037
                          Interaction               –0.134           0.100   591   –1.350   .177
Trusting                  (Intercept)                0.299           0.423   591    0.707   .480
                          Condition                 –0.945           0.572    81   –1.651   .103
                          Machine                    0.013           0.185   591    0.069   .945
                          Trusting                  –0.065           0.123    81   –0.529   .598
                          Interaction               –0.037           0.073   591   –0.503   .615
Body listening            (Intercept)               –0.296           0.327   591   –0.904   .366
                          Condition                  0.155           0.447    81    0.348   .729
                          Machine                   –0.106           0.142   591   –0.749   .454
                          Body listening             0.157           0.123    81    1.276   .206
                          Interaction               –0.068           0.071   591   –0.947   .344
Openness to experience    (Intercept)                0.387           0.839   591    0.462   .644
                          Condition                  0.515           1.205    81    0.427   .671
                          Machine                   –0.129           0.362   591   –0.356   .722
                          Openness to experience    –0.008           0.023    81   –0.360   .720
                          Interaction               –0.003           0.014   591   –0.226   .821
Conscientiousness         (Intercept)                0.062           0.646   591    0.097   .923
                          Condition                 –0.251           0.877    81   –0.286   .776
                          Machine                    0.371           0.277   591    1.336   .182
                          Conscientiousness          0.001           0.020    81    0.043   .966
                          Interaction*               0.027           0.012   591    2.281   .023
Extraversion              (Intercept)                0.625           0.544   591    1.148   .251
                          Condition                 –0.246           0.825    81   –0.298   .766
                          Machine                   –0.499           0.233   591   –2.136   .033
                          Extraversion              –0.021           0.021    81   –1.015   .313
                          Interaction               –0.013           0.013   591   –0.953   .341
Agreeableness             (Intercept)                0.451           0.738   591    0.611   .541
                          Condition                 –0.763           1.416    81   –0.538   .592
                          Machine                    0.323           0.317   591    1.017   .310
                          Agreeableness             –0.011           0.022    81   –0.500   .618
                          Interaction                0.001           0.018   591    0.061   .951
Neuroticism               (Intercept)                0.578           0.547   591    1.057   .291
                          Condition                  0.093           0.803    81    0.115   .908
                          Machine                   –0.371           0.236   591   –1.573   .116
                          Neuroticism               –0.020           0.021    81   –0.920   .361
                          Interaction               –0.008           0.013   591   –0.594   .553
Fear of pain              (Intercept)                0.296           0.735   591    0.403   .687
                          Condition                 –1.179           1.070    81   –1.102   .274
                          Machine                   –0.493           0.318   591   –1.553   .121
                          Fear of pain              –0.002           0.009    81   –0.283   .778
                          Interaction               –0.007           0.005   591   –1.211   .226
Pain catastrophising      (Intercept)               –0.221           0.282   591   –0.786   .432
                          Condition                  0.268           0.449    81    0.596   .553
                          Machine                   –0.095           0.122   591   –0.776   .438
                          Pain catastrophising       0.014           0.012    81    1.246   .216
                          Interaction               –0.009           0.008   591   –1.104   .270
Appendix 1—table 2
Regression results of all personality predictors of increased placebo effects on pain unpleasantness.

Significant interactions (change in pain ratings × personality trait; two-tailed p<.05) are marked with an asterisk (*).

Personality trait         Predictor                 β        SE      df     t        p
Attention regulation      (Intercept)                0.036   0.413   591    0.086   .931
                          Condition                  0.052   0.552    81    0.094   .925
                          Machine                   –0.302   0.176   591   –1.710   .088
                          Attention regulation      –0.010   0.144    81   –0.067   .947
                          Interaction*               0.279   0.083   591    3.349   .001
Noticing                  (Intercept)               –0.416   0.460   591   –0.905   .366
                          Condition                  0.445   0.623    81    0.714   .477
                          Machine                   –0.545   0.200   591   –2.724   .007
                          Noticing                   0.134   0.139    81    0.967   .336
                          Interaction*               0.212   0.082   591    2.590   .010
Not-worrying              (Intercept)                0.357   0.362   591    0.988   .324
                          Condition                  0.309   0.488    81    0.632   .529
                          Machine                    0.008   0.159   591    0.052   .958
                          Not-worrying              –0.144   0.137    81   –1.049   .297
                          Interaction                0.078   0.083   591    0.944   .346
Self-regulation           (Intercept)               –0.343   0.403   591   –0.851   .395
                          Condition                  0.443   0.544    81    0.814   .418
                          Machine                   –0.215   0.175   591   –1.227   .220
                          Self-regulation            0.132   0.141    81    0.932   .354
                          Interaction               –0.133   0.083   591   –1.613   .107
Emotion awareness         (Intercept)               –0.463   0.417   591   –1.111   .267
                          Condition                  0.679   0.596    81    1.139   .258
                          Machine                   –0.440   0.182   591   –2.419   .016
                          Emotion awareness          0.146   0.122    81    1.198   .234
                          Interaction*               0.206   0.078   591    2.644   .008
Not-distracting           (Intercept)                0.632   0.338   591    1.869   .062
                          Condition                 –0.961   0.527    81   –1.824   .072
                          Machine                   –0.207   0.149   591   –1.389   .165
                          Not-distracting           –0.296   0.149    81   –1.992   .050
                          Interaction               –0.184   0.101   591   –1.823   .069
Trusting                  (Intercept)                0.206   0.430   591    0.479   .632
                          Condition                 –0.580   0.582    81   –0.997   .322
                          Machine                   –0.065   0.188   591   –0.346   .730
                          Trusting                  –0.061   0.125    81   –0.485   .629
                          Interaction               –0.008   0.074   591   –0.108   .914
Body listening            (Intercept)               –0.427   0.325   591   –1.315   .189
                          Condition                  0.141   0.444    81    0.317   .752
                          Machine                   –0.213   0.143   591   –1.484   .138
                          Body listening             0.180   0.122    81    1.471   .145
                          Interaction*               0.168   0.072   591    2.331   .020
Openness to experience    (Intercept)                0.374   0.845   591    0.442   .658
                          Condition                  0.055   1.214    81    0.045   .964
                          Machine                   –0.287   0.363   591   –0.790   .430
                          Openness to experience    –0.010   0.023    81   –0.433   .666
                          Interaction*               0.037   0.014   591    2.655   .008
Conscientiousness         (Intercept)               –0.042   0.654   591   –0.064   .949
                          Condition                  0.409   0.888    81    0.460   .647
                          Machine                    0.136   0.282   591    0.482   .630
                          Conscientiousness          0.002   0.021    81    0.081   .936
                          Interaction                0.020   0.012   591    1.608   .108
Extraversion              (Intercept)                0.541   0.548   591    0.987   .324
                          Condition                 –0.531   0.831    81   –0.639   .525
                          Machine                   –0.380   0.237   591   –1.603   .110
                          Extraversion              –0.021   0.021    81   –1.004   .318
                          Interaction               –0.018   0.014   591   –1.326   .185
Agreeableness             (Intercept)                0.364   0.743   591    0.489   .625
                          Condition                 –0.723   1.427    81   –0.507   .614
                          Machine                    0.250   0.322   591    0.777   .438
                          Agreeableness             –0.011   0.022    81   –0.484   .629
                          Interaction                0.007   0.018   591    0.371   .710
Neuroticism               (Intercept)                0.029   0.554   591    0.053   .958
                          Condition                 –0.016   0.814    81   –0.020   .984
                          Machine                   –0.338   0.239   591   –1.414   .158
                          Neuroticism               –0.001   0.022    81   –0.037   .970
                          Interaction               –0.015   0.014   591   –1.137   .256
Fear of pain              (Intercept)               –0.156   0.742   591   –0.210   .834
                          Condition                 –0.405   1.080    81   –0.375   .709
                          Machine                   –0.441   0.322   591   –1.370   .171
                          Fear of pain               0.002   0.009    81    0.231   .818
                          Interaction               –0.004   0.006   591   –0.671   .502
Pain catastrophising      (Intercept)               –0.404   0.278   591   –1.452   .147
                          Condition                  0.064   0.444    81    0.143   .886
                          Machine                   –0.098   0.124   591   –0.796   .426
                          Pain catastrophising       0.019   0.011    81    1.678   .097
                          Interaction               –0.010   0.008   591   –1.172   .242
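The moderation analyses in the tables above test a condition × trait interaction on the change in pain ratings. The sketch below illustrates the general form of such a model with ordinary least squares on simulated data; it is not the authors’ analysis code (the study’s actual models appear to be mixed-effects, given the two different df values), and all variable names and effect sizes are illustrative:

```python
import numpy as np

# Simulated data: per-participant change in pain ratings (machine on - off),
# a between-subject condition, and a standardised personality trait score.
rng = np.random.default_rng(0)
n = 200
condition = rng.integers(0, 2, n).astype(float)   # 0 = control, 1 = personalised
trait = rng.normal(0.0, 1.0, n)                   # standardised trait score
pain_change = -0.2 * condition + 0.15 * condition * trait + rng.normal(0.0, 1.0, n)

# Design matrix: intercept, condition, trait, and the condition x trait interaction.
X = np.column_stack([np.ones(n), condition, trait, condition * trait])
beta, _, _, _ = np.linalg.lstsq(X, pain_change, rcond=None)

# Standard errors and the t statistic for the interaction coefficient (beta[3]),
# which tests whether the trait moderates the group difference.
resid = pain_change - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_interaction = beta[3] / se[3]
```

A significant interaction term, as for emotion awareness or noticing above, indicates that the slope relating the trait to the pain-rating change differs between the personalised and control groups.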
Appendix 1—figure 3
Correlations between all personality traits measured as potential predictors of placebo effects of personalisation.

Data availability

All data is freely available at Open Science Framework (https://osf.io/6j7z5/).

The following data sets were generated
    1. Sandra DA
    2. Olson JA
    3. Langer EJ
    4. Roy M
    (2023) Open Science Framework
    ID 6j7z5. Pain ratings with and without two types of a placebo machine.

References

  1. Book
    1. Blasini M
    2. Peiris N
    3. Wright T
    4. Colloca L
    (2018) The role of patient–practitioner relationships in placebo and nocebo phenomena
    In: Blasini M, editors. International Review of Neurobiology. Academic Press Inc. pp. 211–231.
    https://doi.org/10.1016/bs.irn.2018.07.033
  2. Book
    1. Gelman SA
    (2003)
    The Essential Child: Origins of Essentialism in Everyday Thought
    Oxford University Press.
    https://doi.org/10.1093/acprof:oso/9780195154061.001.0001
  3. Book
    1. John O
    2. Srivastava S
    (1999)
    The big-five trait taxonomy: History, measurement, and theoretical perspectives
    University of California Press.

Decision letter

  1. José Biurrun Manresa
    Reviewing Editor; National Scientific and Technical Research Council (CONICET), National University of Entre Ríos (UNER), Argentina
  2. Christian Büchel
    Senior Editor; University Medical Center Hamburg-Eppendorf, Germany
  3. José Biurrun Manresa
    Reviewer; National Scientific and Technical Research Council (CONICET), National University of Entre Ríos (UNER), Argentina

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Presenting a Sham Treatment as Personalised Increases its Effectiveness in a Randomised Controlled Trial" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including José Biurrun Manresa as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Christian Büchel as the Senior Editor.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

Reviewers have outlined several recommendations for the authors. Below please find a summarized list with the essential revisions, but do refer to the reviewers' suggestions for details:

1) Revise the terminology throughout the manuscript.

2) Add missing details about the methodology and additional data regarding calibration procedures.

3) Improve the graphical data presentation.

4) Add the statistical analysis on the existing data requested by the reviewers.

5) Rework the discussion and reassess the extent of the claims taking into account the reviewers' suggestions, particularly with regard to the magnitude of the effects and their clinical significance, and the potential confounders that are not currently discussed (sample bias, intervening variables, etc.).

Reviewer #1:

In this manuscript, Sandra et al. aimed at quantifying the role of the placebo effects of personalization in a randomized clinical trial.

The main strengths of the manuscript are:

– It presents data from an exploratory and a confirmatory study to test the same hypothesis.

– The study presents data from several relevant variables that appear to have been carefully collected.

The main weaknesses of the manuscript are:

– The sample is not representative of the general population and the experimental settings are not a good match for clinical settings, which hinders the generalizability of the results.

– The interpretation of the results does not consider potential implications related to individual vs group differences, or the experimental or clinical relevance of the effect sizes observed.

I believe that the authors partially succeed in their aim, given that they are able to show a group effect of personalization in the quantification of the placebo effect. I believe that the discussion would benefit from contextualizing these results in the experimental settings, and reappraising them in relation to their actual clinical relevance.

Terminology

This might sound like a minor semantic detail, but the authors state that "precision treatments benefit from contextual factors", and that "treatment effects can be boosted by personalization", and this phrasing considers the placebo effect as part of the treatment. If I might suggest a different phrasing, I would say that the outcomes of an "intervention" can be constructed as the sum of the "real" treatment effect (if any) plus the placebo effect, and personalization in the context of this study only boosts the placebo effect. Here, the word "treatment" is used with a double meaning: as the action of attending to the patient's needs (what I suggest calling the intervention), and as the active component of the therapy that is supposed to produce physiological effects through mechanisms other than the placebo effect, that is, the treatment (which could be a pain-relief drug or a procedure such as TENS). Whereas this is not wrong per se, I would argue that the manuscript would benefit from clearly differentiating between these two terms, because this study does not have an active "treatment" condition, i.e., a group in which analgesia was provided by means known to work directly at the nociceptive system level. This would also be helpful to handle ethical issues that might arise if one considers placebo as part of the treatment: under these conclusions, pseudo-scientific therapies not backed by evidence (e.g., homeopathy) could be called "effective" just by "tailoring" or "personalizing" some aspect of the treatment to specific patient traits.

In line with this, the use of the term "treatment effectiveness" might not be entirely adequate to refer to the placebo analgesia effects. A suggestion would be to qualify it as "perceived effectiveness", or to rephrase it as "Presenting a sham treatment as personalized increases the placebo effect in an RCT". Furthermore, there is no real "pain relief": as the figures show, there are changes in pain intensity ratings due to the summed effects of placebo and reduction in stimulation intensity.

Experimental design

Settings and sampling

I believe that the settings and sampling could have been improved. Even though the authors clearly state that volunteers were dropped if they asked about or mentioned the placebo effect, all participants are sampled from the McGill University community, and from the data, most are women who are also undergraduate students. Furthermore, it is fair to assume that the experiments were carried out in a laboratory at the Dept. of Psychology. In most circumstances (and for the feasibility study presented here) one would accept that an exploratory study might be carried out in these settings. However, the confirmatory study deals with a well-known psychological effect, is carried out in the Dept. of Psychology by psychologists, the volunteers are mostly sampled from psychology students, and the experiment has a large number of psychological questions related to the expectation of pain. Furthermore, the "pain relief" device supposedly tailored to genetic information looks like technology from the 70s, when they could have just used the TENS stimulator that probably looks like cutting-edge, pain relief technology.

Interestingly, the authors list many of these parts of the study as strengths, but I believe some of them are limitations. I actually find it odd that only 12% of participants having graduate training in biology, genetics, or psychology would not behave differently in this experiment (more on these below in the results subsection), and we actually have no data on how the general population would behave; the authors could have run the experiment on real medical settings with medical doctors (as opposed to psychology researchers) and a level of complexity during testing that mimics clinical routine, no more and no less (the abundance of psychological questions is particularly notorious).

Sham condition

If I followed the methodology correctly, the expected change in pain intensity in the conditioning stage of the feasibility study should be about 60 points for the feasibility study (80-20) and 40 points for the confirmatory study (60-20). The effect on pain intensity ratings is so large that participants should believe that this is nothing short of a miraculous machine (considering experimental settings, age, gender, and current health status). I suggest authors present data recorded on the observed effect in pain intensity and unpleasantness ratings during conditioning.

Following the experiment, participants proceed to the testing block, and the effect almost vanishes (an average of 5 points on pain intensity ratings, approximately 10% of the effect during the conditioning stage). I find it surprising that this fact apparently did not come up during the debriefing. Furthermore, debriefing is mentioned in the methods, but no results are reported in this regard.

Interpretation of results

I assume that the results shown in Figure 3 and 4 are the change in pain intensity and unpleasantness ratings during the testing block, for which the temperature at both the On and Off conditions is calibrated at a pain intensity of 50. I suggest reporting data from the calibration procedure in more detail, in order to compare it with the extensive knowledge available for thermal pain testing (e.g. Moloney et al., 2012).

Furthermore, in the absence of a placebo effect (or other psychological confounders derived from the experimental settings), one would expect ratings during On and Off testing to be distributed around 50 points. In this regard, I suggest authors plot not just the differences, but the actual recorded ratings (with lines linking the On and Off data for each subject, since these are within-subject repeated measurements), and analyze the results using the Off condition as covariate (Vickers, 2001).

In line with this, differences between On and Off testing are to be within the measurement error for pain intensity ratings due to thermal stimulation, with an average difference between conditions of zero (no bias), as observed in the control condition. Since the authors report a non-zero average difference (5.8/100) they attribute this to the placebo effect.

However, the settings for this experiment are quite particular: since it discusses "personalized" effects, it is important to pay attention to individual pain intensity and unpleasantness scores. From the figures, if one considers a binary scale, in which a difference of zero is no effect, any positive difference (regardless of its size) implies a placebo effect, and any negative difference (regardless of its size) implies a nocebo effect, then a significant number of participants is individually reporting nocebo effects in both conditions, i.e., they report more pain and unpleasantness during the On condition. This is surprisingly not mentioned anywhere in the text.

If, on the other hand, one is inclined to interpret these results as a group effect, the experimental/clinical relevance of the effect size must be considered, instead of just the statistical significance (Angst, 2017; de Vet et al., 2006). In this section of the Discussions, the authors should discuss the actual effect size (5.8/100) and its relation to the minimal clinically relevant effect. In my opinion, the detected group effect is too small to be considered clinically significant. The authors state "Our findings raise a question in the broad context of increasing interest in personalisation: just how big of a placebo effect is there from intervention tailoring? In this study, we show that the personalisation process was strong enough to influence the perception of thermal pain stimulations". While I agree that the personalization process somewhat influenced the perception of thermal pain thresholds at a group level, I also believe that this was only detectable with a relatively large sample, that the effect size is small (and in my opinion, not clinically significant), and that the implications of nocebo effects in individual pain ratings were not thoroughly analyzed and discussed.

References

Angst, F. (2017). The minimal clinically important difference raised the significance of outcome effects above the statistical level, with methodological implications for future studies. Journal of Clinical Epidemiology.

de Vet, H.C., Terwee, C.B., Ostelo, R.W., Beckerman, H., Knol, D.L., Bouter, L.M. (2006). Minimal changes in health status questionnaires: distinction between minimally detectable change and minimally important change. Health and Quality of Life Outcomes 4, 54.

Moloney, N.A., Hall, T.M., Doody, C.M. (2012). Reliability of thermal quantitative sensory testing: a systematic review. Journal of Rehabilitation Research and Development 49, 191-208.

Vickers, A.J. (2001). The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. BMC Medical Research Methodology 1, 6.

Reviewer #2:

This pre-registered confirmatory study provides experimental evidence that the 'personalisation' of a treatment represents an important contextual factor that can enhance the placebo analgesic effect. This is shown in a well-controlled, double-blind study, where a sham analgesic treatment is either presented as "personalized" or not in 85 healthy volunteers. The authors show that volunteers in the 'personalized' group display stronger placebo analgesic responses as compared to the control group (although it should be noted that overall no placebo analgesic effect is induced in the control group, which is somewhat surprising given the conditioning procedure used). No differences in side effect profiles were observed; however, side effects were generally low in both groups, so the absence of differences between groups might be explained by a floor effect.

Interestingly, these differences in placebo analgesia depending on 'personalisation' are not paralleled by significant differences in expectancy levels between groups, either before or after conditioning. The authors also explore the role of different personality traits that may modulate the effect of 'personalisation', with interesting findings.

Overall, the manuscript is very well written; the rationale of the study is clear, highly relevant, and innovative in the context of the increasing attempts to individually tailor treatments, also referred to as precision medicine. The methods of the study are sound and the conclusions of the authors are adequately backed up by the data.

These proof-of-concept findings obtained in healthy volunteers may have broad implications for the design and interpretation of clinical trials as well as systematic attempts to optimize treatment effects in clinical settings in the field of pain and beyond. Future studies must test whether the results translate into clinical contexts and beyond placebo analgesia.

The authors should indicate the amount of variance explained by the personality traits.

It should be acknowledged and discussed why no placebo effect was induced with the paradigm in the control group. The authors may want to indicate in the limitation section that the majority of participants were female. Please also indicate the gender of the experimenters.

Please indicate the proportion of participants who were suspicious of the nature of the study depending on the group.

While the authors strictly follow the preregistered exclusion criteria which are good, it should at least be discussed that the final results may be prone to bias by excluding participants who did not develop the expectancy of analgesia, which is also a key outcome of the study. This would be particularly troublesome if the proportion of suspicious participants would differ between groups.

The authors may thus want to complement this with an intention-to-treat-like analysis showing the results of all participants.

Reviewer #3:

This study highlights an important potential confounding factor that might affect a wide range of clinical studies into personalized medicine. The authors show that the psychological effect of believing a procedure is personalized can modulate the perceived pain and unpleasantness of that procedure. The study is carefully designed but the effect sizes identified are quite small compared to the large inter-individual variability seen in all groups, raising questions about the generalizability of the results to other contexts. Furthermore, the authors test the psychological effect of personalization only on a subjective outcome, whereas many clinical studies in personalized medicine are focused on more objective measures such as disease survival that may be less affected by patients' belief in personalization.

The study involves two replicates, the first being a pilot/feasibility study involving 17 subjects that found a marginally significant decrease (38%) in perceived pain intensity in the group believing their test was personalized. The second was a pre-registered, double-blinded, placebo-controlled confirmatory study involving 85 participants. This second study measured a smaller effect (11%) than the first, but with stronger statistical support. The second study also identified a small (16%) effect on the perceived "unpleasantness" of pain, and it identified a personality trait, the "need for uniqueness", as weakly moderating the effect of sham personalization on pain perception, along with several other personality traits. In data aggregated across both experiments, pre-conditioning expectations correlated with pain perception.

The study appears well-designed. However, the consistent effects of sham personalization are quite small compared to the large differences in pain perception within both control and personalized groups, raising questions about how generalizable the study results will be. Though the statistical analysis shows statistically significant p-values, the larger confirmatory study yielded effect sizes smaller than the initial pilot study. One wonders whether, in a larger third study including for example several hundred individuals, the effect size might be smaller still. It is unclear from the manuscript as currently written how clinically significant an 11% decrease in perceived pain intensity might be. The manuscript would benefit from a better framing of the magnitude of the effects identified so that readers can more clearly understand the effects' practical significance.

The authors spent considerable effort designing a test that controls for a variety of potential confounding factors, developing an intricate sham personalization machine with bespoke equipment. However, by necessity the study involved consistent differences in communication between sham treatment and placebo groups, leaving open the possibility for uncontrolled confounding factors. For example, the additional interaction between subjects and study staff in the sham group might have altered the subjects' impression of their environment: a friendlier relationship with staff, or an improved mood due to perceived kindness; such factors might modulate pain tolerance independently of any specific belief in personalization. The possibility that such potential confounds might mediate in part the decrease in subjective pain intensity weakens the generalizability of the results to other contexts.

That said, the authors interpret their findings fairly narrowly, suggesting that clinical trials should be aware of the psychological effects of personalization. This seems like a solid recommendation irrespective of any technical limitations of the current study. However, it must be noted that the authors study the effect of belief in personalization only on a subjective outcome: perceived pain intensity. One wonders about the relevance of these results for clinical work in personalized medicine focused primarily on objective outcomes, which may be less influenced by patients' beliefs during treatment.

I recommend that the authors better frame the magnitude of the effects identified so that readers can better understand their practical significance. I also recommend that the authors more explicitly discuss the possibility that the experimental protocol for sham personalization might act not via a belief in personalization per se, but rather by modulating subjects' impression of their environment (as friendlier) or by altering subjects' mood, two factors previously identified as modulating pain perception.

https://doi.org/10.7554/eLife.84691.sa1

Author response

Essential revisions:

Reviewers have outlined several recommendations for the authors. Below please find a summarized list with the essential revisions, but do refer to the reviewers' suggestions for details:

1) Revise the terminology throughout the manuscript.

As described in more detail in the reviewer comments, we have now revised the terminology throughout the manuscript to clarify the distinction between active and sham treatments. We also, where possible, differentiated between them by using the term “intervention” when talking about active treatments.

2) Add missing details about the methodology and additional data regarding calibration procedures.

We elaborated on various aspects of methodology, such as the gender of the experimenters and the rates of suspicion in each group. We have also provided additional information regarding calibration procedures such as the average temperatures for each pain level in the sample of the confirmatory study.

3) Improve the graphical data presentation.

We have included additional graphs to show the individual changes in pain ratings and better demonstrate the variability in the raw scores. We have also included the graphs from the conditioning phase in the Appendix.

4) Add the statistical analysis on the existing data requested by the reviewers.

We clarify that the statistical analysis was already performed on raw scores, and therefore accounts for the random variation in pain ratings in the Off condition, suggested by Reviewer 1. We also discuss the importance of using per-protocol analysis instead of intention-to-treat analysis given our population of interest and the presence of deception. In brief, we suggest that our population of interest is only the participants who believe their treatment is personalised to them, and therefore those who are suspicious of our manipulation should be excluded following our pre-registered criteria.

5) Rework the discussion and reassess the extent of the claims taking into account the reviewers' suggestions, particularly with regard to the magnitude of the effects and its clinical significance, and the potential confounders that are not currently discussed (sample bias, intervening variables, etc.).

We have now reworked the discussion and expanded it on several points. First, we better contextualised our findings in terms of their clinical significance and relevance for clinical settings based on current guidelines. Second, we discussed in more detail several possible limitations such as the presence of sample bias and intervening variables. Finally, we expanded our methodology description and discussed the possibility of confounds suggested by Reviewer 3.

Reviewer #1:

In this manuscript, Sandra et al. aimed at quantifying the role of the placebo effects of personalization in a randomized clinical trial.

The main strengths of the manuscript are:

– It presents data from an exploratory and a confirmatory study to test the same hypothesis.

– The study presents data from several relevant variables that appear to have been carefully collected.

The main weaknesses of the manuscript are:

– The sample is not representative of the general population and the experimental settings are not a good match for clinical settings, which hinders the generalizability of the results.

– The interpretation of the results does not consider potential implications related to individual vs group differences, or the experimental or clinical relevance of the effect sizes observed.

I believe that the authors partially succeed in their aim, given that they are able to show a group effect of personalization in the quantification of the placebo effect. I believe that the discussion would benefit from contextualizing these results in the experimental settings, and reappraising them in relation to their actual clinical relevance.

Terminology

This might sound like a minor semantic detail, but the authors state that "precision treatments benefit from contextual factors", and that "treatment effects can be boosted by personalization", and this phrasing considers the placebo effect as part of the treatment. If I might suggest a different phrasing, I would say that the outcomes of an "intervention" can be constructed as the sum of the "real" treatment effect (if any) plus the placebo effect, and personalization in the context of this study only boosts the placebo effect. Here, the word "treatment" is used with a double meaning: as the action of attending to the patient's needs (what I suggest calling the intervention), and as the active component of the therapy that is supposed to produce physiological effects through mechanisms other than the placebo effect, that is, the treatment (which could be a pain-relief drug or a procedure such as TENS). Whereas this is not wrong per se, I would argue that the manuscript would benefit from clearly differentiating between these two terms, because this study does not have an active "treatment" condition, i.e., a group in which analgesia was provided by means known to work directly at the nociceptive system level. This would also be helpful to handle ethical issues that might arise if one considers placebo as part of the treatment: under these conclusions, pseudo-scientific therapies not backed by evidence (e.g., homeopathy) could be called "effective" just by "tailoring" or "personalizing" some aspect of the treatment to specific patient traits.

Thank you for this important suggestion. We have now clarified the distinction between the placebo effect and active ingredients throughout the manuscript, especially in the abstract and the introductory paragraphs. We have also used “intervention” when discussing precision medicine treatments to distinguish them from sham treatments:

Abstract

“Tailoring interventions to patient subgroups can improve intervention outcomes for various conditions. However, it is yet unclear how much of this improvement is due to the pharmacological personalisation itself versus the non-specific effects of the contextual factors involved in the tailoring process, such as therapeutic interaction. Here, we tested whether presenting a placebo analgesia machine as personalised would improve its effectiveness. […] Conclusions: We present some of the first evidence that framing a sham treatment as personalised increases its effectiveness. Our findings could potentially improve the methodology of precision medicine research or inform practice.”

Introduction

“However, the greater effectiveness of tailored interventions may be due to more than just their pharmacological ingredients: contextual factors, such as the treatment setting and patient beliefs, may also directly contribute to better outcomes.”

In line with this, the use of the term "treatment effectiveness" might not be entirely adequate to refer to the placebo analgesia effects. A suggestion would be to qualify it as "perceived effectiveness", or to rephrase it as "Presenting a sham treatment as personalized increases the placebo effect in an RCT".

We have now revised the title to: “Presenting a Sham Treatment as Personalised Increases the Placebo Effect.”

Furthermore, there is no real "pain relief": as the figures show, there are changes in pain intensity ratings due to the summed effects of placebo and reduction in stimulation intensity.

We have used the term “pain relief” to refer to the reduced pain perception that follows from an intervention (real or placebo). If we understand the reviewer’s comment correctly, they would prefer to reserve the term “relief” for the “real” effects of an active treatment. We have now changed the term “pain relief” to “pain reduction” to refer to our findings.

To be clear, participants did experience pain reduction in the testing block of the procedure. During that part, we maintained the stimulation temperature at the level calibrated to a pain rating of 50 in the feasibility study and 40 in the confirmatory study, for all trials (with the machine turned on and off) and for all participants. Therefore, any changes participants reported in their pain levels constituted pain reduction due to the placebo effect.

The true reduction in heat stimulation temperature happened only in the conditioning phase, which we did not include in the analyses.

Experimental design

Settings and sampling

I believe that the settings and sampling could have been improved. Even though the authors clearly state that volunteers were dropped if they asked about or mentioned the placebo effect, all participants are sampled from the McGill University community, and from the data, most are women who are also undergraduate students. Furthermore, it is fair to assume that the experiments were carried out in a laboratory at the Dept. of Psychology. In most circumstances (and for the feasibility study presented here) one would accept that an exploratory study might be carried out in these settings. However, the confirmatory study deals with a well-known psychological effect, is carried out in the Dept. of Psychology by psychologists, the volunteers are mostly sampled from psychology students, and the experiment has a large number of psychological questions related to the expectation of pain.

Thank you for this observation. We have now clarified that the testing settings were presented in a medical context. Indeed, the experimenters introduced themselves as neuroscience researchers, and the study took place in a medical building located away from the psychology area.

“Once at the lab, participants met two female experimenters at a medical building of a large Canadian university. The experimenters introduced themselves as neuroscience researchers and explained the study procedure.”

We also clarify that our sample for the confirmatory study was more diverse:

“We recruited 106 healthy participants aged 18 to 35 from the McGill University community; these were students and recent graduates from various disciplines.”

Furthermore, the "pain relief" device supposedly tailored to genetic information looks like technology from the 70s, when they could have just used the TENS stimulator that probably looks like cutting-edge, pain relief technology.

Interestingly, the authors list many of these parts of the study as strengths, but I believe some of them are limitations.

The placebo machine used in the study was indeed “vintage” and could have looked more modern. However, we used it due to some unique features that we could not find on most TENS machines widely available today. The machine is large and clearly visible from most angles in the testing room, attracting attention in a way that a compact TENS may not. Additionally, it prominently displays over a dozen dials and switches, which we used to saliently “personalise” it to each participant. This likely helped emphasise the personalisation aspect of the procedure, despite the machine’s less-than-modern exterior. Overall, it is unclear whether another setting would have been more or less effective, but we do have a reliable inter-group difference in placebo effects from using it.

I actually find it odd that only 12% of participants having graduate training in biology, genetics, or psychology would not behave differently in this experiment (more on these below in the results subsection), and we actually have no data on how the general population would behave.

Indeed, suspicion about the machine was low: only 12% of the participants mentioned that the study focused on the placebo effect when probed for suspicion, and these were excluded based on our pre-registered criteria. Although counterintuitive, this finding is consistent with other studies using complex deception, which are now referenced in the manuscript:

“Only 12% of the participants questioned the veracity of the machine and none guessed the purpose of the study in the confirmatory study, despite many participants having graduate training in biology, genetics, or psychology. This is in line with previous studies on complex deception using intentionally elaborate placebos (Olson et al., 2016, 2020, 2023).”

The authors could have run the experiment in real medical settings with medical doctors (as opposed to psychology researchers) and a level of complexity during testing that mimics clinical routine, no more and no less (the abundance of psychological questions is particularly notable).

We completely agree and are currently running another study in a hospital setting with chronic pain patients, using a lidocaine IV infusion as a real treatment. Given the ethical concerns associated with studying the placebo effect in patients, we first tested the presence of the placebo effect of personalisation with experimental pain. This allowed us to control the specific setting and pain levels to better isolate the effects of our manipulation. The positive findings from this study now justify testing it in clinical settings despite the ethical concerns, and will allow us to understand whether the more varied clinical pain is also subject to placebo effects of personalisation. We hope that this study will clarify the clinical significance of the placebo effect of personalisation in treatment settings.

Sham condition

If I followed the methodology correctly, the expected change in pain intensity during the conditioning stage should be about 60 points for the feasibility study (80-20) and 40 points for the confirmatory study (60-20). The effect on pain intensity ratings is so large that participants should believe that this is nothing short of a miraculous machine (considering the experimental settings, age, gender, and current health status). I suggest the authors present the data recorded on the observed effect in pain intensity and unpleasantness ratings during conditioning.

We have now included the data from the conditioning part of the study in the Appendix (Figure A3).

Following the conditioning phase, participants proceeded to the testing block, and the effect almost vanishes (an average of 5 points on pain intensity ratings, approximately 10% of the effect during the conditioning stage). I find it surprising that this fact apparently did not come up during the debriefing.

Indeed, several participants mentioned at the end of the study that they thought the machine did not work for them; however, they did not doubt the veracity of the machine or the true nature of the study, and we thus kept them in the analyses.

Furthermore, debriefing is mentioned in the methods, but no results are reported in this regard.

We report the number of suspicious participants in the sample section of the study; however, we now also clarify the distribution of suspicion (and exclusion) by group:

“Of all participants, 1 did not complete the questionnaires which included the consent form, 6 did not fit eligibility criteria after consenting to participate, 1 experienced technical errors during the experiment, 1 refused to use the machine, and 12 mentioned or asked about the placebo effect (6 in each group).”

Interpretation of results

I assume that the results shown in Figures 3 and 4 are the changes in pain intensity and unpleasantness ratings during the testing block, for which the temperature in both the On and Off conditions is calibrated to a pain intensity of 50. I suggest reporting data from the calibration procedure in more detail, in order to compare it with the extensive knowledge available on thermal pain testing (e.g., Moloney et al., 2012).

We report more detail on the calibration procedure, including the average temperatures for each pain level in the confirmatory study:

“On average, participants reported a pain threshold of 45.9 °C (SD = 1.7), as well as 46.9 °C (1.3) for pain level 20, 47.8 °C (1.0) for pain level 40, and 48.5 °C (1.2) for pain level 60.”

Furthermore, in the absence of a placebo effect (or other psychological confounders derived from the experimental settings), one would expect ratings during On and Off testing to be distributed around 50 points. In this regard, I suggest authors plot not just the differences, but the actual recorded ratings (with lines linking the On and Off data for each subject, since these are within-subject repeated measurements), and analyze the results using the Off condition as covariate (Vickers, 2001).

In our analysis model, we used raw pain ratings for intensity and unpleasantness, but for ease of presentation we plotted the changes per participant. We now include additional graph panels for each study that plot the individual raw pain ratings.
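The reviewer's suggested analysis (Vickers, 2001), regressing the On ratings on group with the Off ratings as a baseline covariate rather than analysing difference scores, can be sketched as follows. All numbers here are hypothetical placeholders, not the study's data or its actual statistical model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 participants per group. Off ratings sit around the
# calibrated level; On ratings correlate with Off, plus a small (assumed)
# group effect of -3 points for the "personalised" group.
n = 40
group = np.repeat([0.0, 1.0], n)          # 0 = control, 1 = personalised
off = rng.normal(40, 10, 2 * n)
on = 0.8 * off + rng.normal(0, 6, 2 * n) - 3.0 * group

# ANCOVA-style model: On ~ intercept + group + Off (baseline as covariate),
# fitted by ordinary least squares.
X = np.column_stack([np.ones(2 * n), group, off])
beta, *_ = np.linalg.lstsq(X, on, rcond=None)
group_effect = float(beta[1])             # baseline-adjusted group difference
print(round(group_effect, 2))
```

The coefficient on `group` estimates the between-group difference adjusted for baseline, which Vickers (2001) shows is statistically more efficient than analysing raw change scores when baseline and follow-up are imperfectly correlated.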

In line with this, differences between On and Off testing are expected to be within the measurement error for pain intensity ratings due to thermal stimulation, with an average difference between conditions of zero (no bias), as observed in the control condition. Since the authors report a non-zero average difference (5.8/100), they attribute this to the placebo effect.

However, the settings for this experiment are quite particular: since it discusses "personalized" effects, it is important to pay attention to individual pain intensity and unpleasantness scores. From the figures, if one considers a binary scale, in which a difference of zero is no effect, any positive difference (regardless of its size) implies a placebo effect, and any negative difference (regardless of its size) implies a nocebo effect, then a significant number of participants are individually reporting nocebo effects in both conditions, i.e., they report more pain and unpleasantness during the On condition. This is surprisingly not mentioned anywhere in the text.

Thank you for your observation. We hesitate to interpret any non-zero difference between the “ON” and “OFF” conditions as indicative of a placebo or nocebo response. Indeed, we would expect some variation to occur by chance – the difference will never be exactly zero – and we therefore cannot draw any strong conclusion at the individual level that someone is a placebo or nocebo responder.

Nevertheless, we now mention the fact that not all participants experienced pain reductions in the “ON” condition.

“A number of participants in both groups also reported increases in pain ratings from using the machine.”
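The authors' chance-variation argument can be illustrated with a small simulation. This is a hypothetical sketch, not the study's data: the sample size, calibrated pain level, and rating noise are all assumed values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical illustration (not the study's data): participants whose
# "machine On" and "machine Off" pain ratings differ only by trial-to-trial
# measurement noise, i.e. a true placebo effect of exactly zero.
n = 85                      # assumed sample size
calibrated_level = 40       # assumed calibrated pain level (0-100 scale)
noise_sd = 8                # assumed rating noise

off = calibrated_level + rng.normal(0, noise_sd, n)
on = calibrated_level + rng.normal(0, noise_sd, n)
diff = on - off             # diff > 0 would look like a "nocebo" response

# Even with no true effect, roughly half of the individual differences come
# out positive, so the sign of an individual difference alone cannot
# identify placebo or nocebo responders.
prop_positive = float(np.mean(diff > 0))
print(prop_positive)
```

Under pure noise the proportion of apparent "nocebo responders" hovers around one half, which is the point the response makes: individual non-zero differences are expected by chance.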

If, on the other hand, one is inclined to interpret these results as a group effect, the experimental/clinical relevance of the effect size must be considered, instead of just the statistical significance (Angst, 2017; de Vet et al., 2006). In this section of the Discussion, the authors should discuss the actual effect size (5.8/100) and its relation to the minimal clinically relevant effect. In my opinion, the detected group effect is too small to be considered clinically significant. The authors state "Our findings raise a question in the broad context of increasing interest in personalisation: just how big of a placebo effect is there from intervention tailoring? In this study, we show that the personalisation process was strong enough to influence the perception of thermal pain stimulations". While I agree that the personalization process somewhat influenced the perception of thermal pain thresholds at a group level, I also believe that this was only detectable with a relatively large sample, that the effect size is small (and in my opinion, not clinically significant), and that the implications of nocebo effects in individual pain ratings were not thoroughly analyzed and discussed.

Thank you for these interesting citations. Unfortunately, given our methodology, we are unable to use the clinical significance threshold suggested by Angst (2017). We do not have any qualitative measures of pain improvement in this study (e.g., the “slightly better” or the “about the same” qualifiers), but only numeric pain ratings. We agree that the effect sizes need to be discussed in context of clinical significance and expand on it below using medical recommendations.

The currently accepted IMMPACT recommendations for interpreting the importance of treatment effects make a clear distinction between what is meaningful at the individual versus the group level (Dworkin et al., 2009). Guidelines for clinically meaningful pain reduction usually refer to the clinically meaningful difference at the individual level; these suggest that a reduction of 1/10 points or 10-20% represents a minimally important change. We observed an 11% reduction in pain intensity ratings and a 16% reduction in pain unpleasantness, which places our results within this range at the individual level. Thus, even when using guidelines for assessing the importance of individual effects, it seems that our intervention is at least minimally clinically meaningful.

However, the IMMPACT recommendations strongly advise against using the same guidelines to judge the importance of group effects and instead recommend a case-by-case evaluation, given the difference between clinical and experimental settings. Indeed, the therapeutic effect in clinical settings combines elements other than the active effects of the treatment, whereas randomised controlled trials only reveal the incremental effects of the active treatment compared to a control, which are likely to be lower. The guidelines recommend a few strategies for assessing meaningfulness, such as comparing the effects to those of widely recognised treatments. Following these guidelines, the amplitude of our sham personalisation effects may be similar to the clinically significant effects of acetaminophen/paracetamol (Jürgens et al., 2014) or morphine (Koppert et al., 1999) on thermal pain perception. Our effect is particularly noteworthy given that the experimental and control conditions were remarkably similar and differed only in a subtle change in the narrative around the placebo machine.

Thus, we believe that enhancing the personalisation aspects of treatment could have a clinically meaningful impact. However, we recognise that this remains to be tested in a clinical study.

We now include this additional paragraph in the discussion:

“Our effect was also small; the 11% reduction in pain intensity and 16% reduction in unpleasantness reached the lower threshold of minimal clinical significance of pain reduction (10-20%) suggested by guidelines (Dworkin et al., 2009). Nevertheless, testing placebo effects with experimental pain may have led to a conservative estimate of the placebo effect and may not map directly onto the clinical experience of chronic pain. Patients differ from healthy participants on many characteristics, including their motivation to get better (National Cancer Institute, 2021), the mechanisms through which they experience placebo effects (short- or long-term) (Vase et al., 2005), and the methods of assessing pain ratings (immediate versus retrospective). Our effect sizes were similar to those of paracetamol (Jürgens et al., 2014) and morphine (Koppert et al., 1999) on thermal pain, suggesting the possibility of clinical significance if tested in patients. Future studies could build on our proof-of-concept findings and explore whether these placebo effects apply to clinical populations who receive real personalised treatments focused on more objective measures. These additional investigations will help determine the clinical significance of the placebo effect of personalisation for active treatments.”

We have also changed the sentence mentioned above in the Ideas and Speculation section to reflect a narrower interpretation of the clinical significance of our data:

“Our results raise the possibility that placebo effects involved in personalisation may play a clinically relevant role in the broad context of the growing interest in precision medicine. In this study, we show that the personalisation process was strong enough to influence the perception of thermal pain.”

Reviewer #2:

This pre-registered confirmatory study provides experimental evidence that the 'personalisation' of a treatment represents an important contextual factor that can enhance the placebo analgesic effect. This is shown in a well-controlled, double-blind study, where a sham analgesic treatment is either presented as "personalized" or not in 85 healthy volunteers. The authors show that volunteers in the 'personalized' group display stronger placebo analgesic responses as compared to the control group (although it should be noted that overall no placebo analgesic effect is induced in the control group, which is somewhat surprising given the conditioning procedure used). No differences in side effect profiles were observed; however, side effects were generally low in both groups, so the absence of differences between groups might be explained by a floor effect.

Interestingly, these differences in placebo analgesia depending on 'personalisation' are not paralleled by significant differences in expectancy levels between groups either before or after conditioning. The authors also explore the role of different personality traits that may modulate the effect of 'personalisation', with interesting findings.

Overall, the manuscript is very well written, and the rationale of the study is clear, highly relevant, and innovative in the context of the increasing attempts to individually tailor treatments, also referred to as precision medicine. The methods of the study are sound, and the conclusions of the authors are adequately backed up by the data.

These proof-of-concept findings obtained in healthy volunteers may have broad implications for the design and interpretation of clinical trials as well as systematic attempts to optimize treatment effects in clinical settings in the field of pain and beyond. Future studies will have to test whether the results translate into clinical contexts and beyond placebo analgesia.

Thank you for your comments.

The authors should indicate the amount of variance explained by the personality traits.

We now indicate the correlations of each personality trait with each group on the graphs in the main text and in the Appendix.

It should be acknowledged and discussed why no placebo effect was induced with the paradigm in the control group.

We now discuss this point in limitations in the context of potential mechanisms of action:

“The mechanism behind the placebo effect of personalisation may thus rely on an interaction with additional elements that need to be explored, for instance increases in mood from receiving a personalised treatment. It is also possible that a more complex mechanism is responsible for the general lack of a placebo effect in the control group, but not the experimental group.”

The authors may want to indicate in the limitation section that the majority of participants were female.

Indicated:

“Still, the main limitations of the study are its focus on healthy participants, the use of an inactive treatment, a sample with imbalanced genders, and the focus on a primarily subjective outcome of pain.”

Please also indicate the gender of the experimenters.

Indicated:

“Once at the lab, participants met two female experimenters introduced as neuroscience researchers at a medical building of a large Canadian university.”

Please indicate the proportion of participants who were suspicious of the nature of the study depending on the group.

Indicated:

“Of all participants, 1 did not complete the questionnaires which included the consent form, 6 did not fit eligibility criteria after consenting to participate, 1 experienced technical errors during the experiment, 1 refused to use the machine, and 12 mentioned or asked about the placebo effect (6 in each group).”

While the authors strictly follow the preregistered exclusion criteria, which is good, it should at least be discussed that the final results may be prone to bias from excluding participants who did not develop the expectancy of analgesia, which is also a key outcome of the study. This would be particularly troublesome if the proportion of suspicious participants differed between groups.

Thank you for the observation. Indeed, excluding participants who did not develop the expectancy of analgesia could increase bias and affect our results by artificially inflating the effect size. To avoid this issue, we strictly followed our pre-registered exclusion criteria: we only excluded participants who explicitly mentioned the placebo effect as the focus of the study with further elaboration. All other participants who merely thought the machine did not work for them or were disappointed by it were still included in the analyses reported here. We now clarify this in the main text:

“We were stringent with exclusion criteria to avoid positively biasing our effect: we only excluded participants who explicitly mentioned the placebo effect with additional explanations. For instance, one participant expressed general suspicion about stimulation timings and asked about placebo effects at the beginning of the session. The final sample included 85 participants (71 women) with a mean age of 21.4 (SD = 2.2).”

The authors may thus want to complement this with an intention-to-treat-like analysis showing the results of all participants.

We strictly followed our pre-registered exclusion criteria, thus reducing the possibility of bias, and adopted a per-protocol approach to analysis given the nature of our study. Intention-to-treat analysis is usually geared toward assessing the practical impact of a treatment in clinical settings and therefore addresses a different type of population. Here, we are interested in the population that believes they are receiving a personalised treatment, as patients in clinical settings would. Participants who were suspicious of the personalisation or the machine thus failed the essential manipulation check and are not comparable to the rest of the participants, who represented our target population; this justified excluding them from the analyses. Had we included all participants in our analyses (apart from the technical malfunctions), all p values would have increased to p > .05 for both measures. This is not surprising, however, and there is little we can conclude from the discrepancy: guessing that the study focused on the placebo effect meant that these participants failed the crucial manipulation check of the experiment. They would therefore not be considered part of the population of interest, which consists of people who trust a treatment they believe is personalised to them.

We also suggest studying clinical populations in the discussion to better contextualise the relevance of the placebo effect for personalised medicine:

“Future studies could build on our proof-of-concept findings and explore whether these placebo effects apply to clinical populations who receive real personalised treatments focused on more objective measures.”

Reviewer #3:

This study highlights an important potential confounding factor that might affect a wide range of clinical studies into personalized medicine. The authors show that the psychological effect of believing a procedure is personalized can modulate the perceived pain and unpleasantness of that procedure. The study is carefully designed but the effect sizes identified are quite small compared to the large inter-individual variability seen in all groups, raising questions about the generalizability of the results to other contexts. Furthermore, the authors test the psychological effect of personalization only on a subjective outcome, whereas many clinical studies in personalized medicine are focused on more objective measures such as disease survival that may be less affected by patients' belief in personalization.

The study involves two replicates, the first being a pilot/feasibility study involving 17 subjects that found a marginally significant decrease (38%) in perceived pain intensity in the group believing their test was personalized. The second was a pre-registered, double-blinded, placebo-controlled confirmatory study involving 85 participants. This second study measured a smaller effect (11%) than the first, but with stronger statistical support. The second study also identified a small (16%) effect on the perceived "unpleasantness" of pain. The second study identified a personality trait, the "need for uniqueness", as weakly moderating the effect of sham personalization on pain perception, along with several other personality traits. In data aggregated across both experiments, pre-conditioning expectations correlated with pain perception.

The study appears well-designed. However, the consistent effects of sham personalization are quite small compared to the large differences in pain perception within both control and personalized groups, raising questions about how generalizable the study results will be. Though the statistical analysis shows statistically significant p-values, the larger confirmatory study yielded effect sizes smaller than the initial pilot study. One wonders whether, in a larger third study that might include, for example, several hundred individuals, the effect size might be smaller still. It is unclear from the manuscript as currently written how clinically significant an 11% decrease in perceived pain intensity might be. The manuscript would benefit from a better framing of the magnitude of the effects identified so that readers can more clearly understand the effects' practical significance.

The authors spent considerable effort designing a test that controls for a variety of potential confounding factors, developing an intricate sham personalization machine with bespoke equipment. However, by necessity the study involved consistent differences in communication between sham treatment and placebo groups, leaving open the possibility for uncontrolled confounding factors. For example, the additional interaction between subjects and study staff in the sham group might have altered the subjects' impression of their environment: a friendlier relationship with staff or an improved mood due to perceived kindness, factors that might modulate pain tolerance independently from any specific belief in personalization. The possibility that such potential confounds might mediate in part the decrease in subjective pain intensity weakens the generalizability of the results to other contexts.

That said, the authors interpret their findings fairly narrowly, suggesting that clinical trials should be aware of the psychological effects of personalization. This seems like a solid recommendation irrespective of any technical limitations of the current study. However, it must be noted that the authors study the effect of belief in personalization only on a subjective outcome: perceived pain intensity. One wonders about the relevance of these results for clinical work in personalized medicine focused primarily on objective outcomes, which may be less influenced by patients' beliefs during treatment.

We now put our findings in context and discuss their clinical significance:

“Our effect was also small; the 11% reduction in pain intensity and 16% reduction in unpleasantness reached the lower threshold of minimal clinical significance of pain reduction (10 – 20%) suggested by guidelines (Dworkin et al., 2009). Nevertheless, testing placebo effects with experimental pain may have led to a conservative estimate of the placebo effect and may not map directly onto the clinical experience of chronic pain. Patients differ from healthy participants on many characteristics, including their motivation to get better (National Cancer Institute, 2021), the mechanisms through which they experience placebo effects (short- or long-term) (Vase et al., 2005), and the methods of assessing pain ratings (immediate versus retrospective). Our effect sizes were similar to those of paracetamol (Jürgens et al., 2014) and morphine (Koppert et al., 1999) on thermal pain, suggesting the possibility of clinical significance if tested in patients. Future studies could build on our proof-of-concept findings and explore whether these placebo effects apply to clinical populations who receive real personalised treatments focused on more objective measures. These additional investigations will help determine the clinical significance of the placebo effect of personalisation for active treatments.”

We also have marked the focus on subjective outcomes in the limitations:

“Still, the main limitations of the study are its focus on healthy participants, the use of an inactive treatment, a sample including predominantly women, and the focus on a primarily subjective outcome of pain. […] Future studies could explore whether the placebo effect of personalisation applies to clinical populations and real treatments focused on more objective measures; they could also determine the magnitude of the effect in clinical settings.”

I recommend that the authors frame better the magnitude of the effects identified so that readers can better understand their practical significance.

We have improved the framing of the magnitude of the effects throughout the discussion, given the current evidence on clinical and practical significance (see previous comment).

We have also elaborated on the usefulness of our findings for clinical trials of personalised medicine:

“Our findings provide some of the first evidence for this novel placebo effect of personalisation and suggest its further study in clinical contexts, echoing experts in the field (Haga et al., 2009). They also support the need for more consistent use of blinding, inactive control groups, and randomisation, especially for pivotal trials determining FDA approval of precision drugs (Pregelj et al., 2018). Indeed, only half of the FDA-approved precision treatments in recent years were based on double- or single-blinded pivotal trials, and only 38% of all pivotal trials used a placebo comparator (Pregelj et al., 2018). Although precision treatments are often developed for difficult-to-study diseases, their potential to elicit stronger placebo effects calls for more robust research designs.”

I also recommend that the authors more explicitly discuss the possibility that the experimental protocol for sham personalization might act not via a belief in personalization per se, but rather by modulating subjects' impression of their environment (more friendly) or by altering subjects' mood, two factors previously identified as modulating pain perception.

Thank you for your recommendation. We would like to clarify that we paid careful attention when designing the study to avoid this confound and dedicated the same amount of time and attention to the participants in both groups. We now specify it further:

“To match the duration of participant interaction and explanations provided in the personalised group in an effort to reduce potential confounds, the experimenter instead described the different kinds of analgesics currently used in hospitals. The experimenter provided approximately 300 words of information to each group (280 in experimental and 298 in control). Finally, the experimenter introduced the machine with the same description and demonstration.”

However, we also mention the potential role that mood may play as an additional unmeasured mechanism or mediator of the study in limitations:

“The mechanism behind these placebo effects may thus rely on an interaction with additional elements that need to be explored, for instance increases in mood caused by receiving a personalised treatment.”

https://doi.org/10.7554/eLife.84691.sa2

Article and author information

Author details

  1. Dasha A Sandra

    Integrated Program in Neuroscience, McGill University, Montreal, Canada
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    dasha.sandra@mail.mcgill.ca
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-9930-2807
  2. Jay A Olson

    Department of Psychology, Harvard University, Cambridge, United States
    Present address
    University of Toronto Mississauga, Mississauga, Canada
    Contribution
    Conceptualization, Supervision, Funding acquisition, Methodology, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-1161-5209
  3. Ellen J Langer

    Department of Psychology, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Mathieu Roy

    Department of Psychology, McGill University, Montreal, Canada
    Contribution
    Conceptualization, Supervision, Funding acquisition, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-3335-445X

Funding

Social Sciences and Humanities Research Council of Canada (PT 93188)

  • Mathieu Roy
  • Jay A Olson

Genome Québec (PT 95747)

  • Mathieu Roy
  • Jay A Olson

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Mira Kaedbey, Naz Alpdogan, and Holly Bowman for assisting with data collection; Alain Al Bikaii for feedback on the manuscript; as well as Samuel Veissière, Michael Lifshitz, Jason da Silva Castanheira, and the Langer Mindfulness Lab for helpful suggestions. We also thank Ekaterina Rossokhata for inspiring interest in this research.

This study was funded through the Social Sciences and Humanities Research Council (SSHRC, PT 93188) and Genome Québec (PT 95747). JO acknowledges funding from the Fonds de Recherche du Québec – Santé (FRQS). DS acknowledges funding from the Fonds de Recherche du Québec – Nature et technologies (FRQNT).

Ethics

The study was approved by the McGill University Research Ethics Board II (#45-0619). All participants consented to participate and to have their results included in the group analyses published upon completion of the study.

Senior Editor

  1. Christian Büchel, University Medical Center Hamburg-Eppendorf, Germany

Reviewing Editor

  1. José Biurrun Manresa, National Scientific and Technical Research Council (CONICET), National University of Entre Ríos (UNER), Argentina

Reviewer

  1. José Biurrun Manresa, National Scientific and Technical Research Council (CONICET), National University of Entre Ríos (UNER), Argentina

Version history

  1. Preprint posted: November 4, 2022 (view preprint)
  2. Received: November 4, 2022
  3. Accepted: May 12, 2023
  4. Version of Record published: July 5, 2023 (version 1)

Copyright

© 2023, Sandra et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Dasha A Sandra
  2. Jay A Olson
  3. Ellen J Langer
  4. Mathieu Roy
(2023)
Presenting a sham treatment as personalised increases the placebo effect in a randomised controlled trial
eLife 12:e84691.
https://doi.org/10.7554/eLife.84691

