Introduction

Endoscopic retrograde cholangiopancreatography (ERCP) is widely performed as an important diagnostic and therapeutic procedure for pancreaticobiliary diseases. Among endoscopic procedures, however, ERCP carries a relatively high risk of adverse events. Serious adverse events include duodenal perforation, bleeding after endoscopic sphincterotomy (EST), and post-ERCP pancreatitis (PEP). The reported rate of PEP occurrence is 3.1–13.0% (Andriulli et al., 2007, Freeman et al., 1996, Glomsaker et al., 2013, Katsinelos et al., 2014, Kochar et al., 2015, Loperfido et al., 1998). PEP can even become life-threatening; its fatality rate is 0.1–0.7% (Andriulli et al., 2007, Kochar et al., 2015). Therefore, the decision to perform ERCP should be made carefully, considering each patient’s risk factors for PEP.

To predict an individual patient’s PEP risk, five scoring systems have been devised (Chiba et al., 2021, DiMagno et al., 2013, Friedland et al., 2002, Fujita et al., 2021, Zheng et al., 2020). The first risk scoring system for PEP occurrence was established in 2002. In that study, pain during the procedure, pancreatic duct cannulation, a history of PEP, and the number of cannulation attempts were identified as risk factors for PEP. After the first scoring system was reported, each new scoring system used risk factors that were extracted by multivariate analyses. These included various patient characteristics before ERCP and postprocedural risk factors. Precut sphincterotomy and difficult cannulation were proposed as postprocedural risk factors, but these factors are difficult to predict, so the PEP risk cannot be determined before ERCP. Thus, a new scoring system that predicts PEP before ERCP is desirable. If the risk of PEP could be predicted before ERCP, an expert endoscopist could perform ERCP from the start, and high-PEP-risk procedures (for example, precut sphincterotomy, multiple cannulation attempts, inadvertent pancreatic duct cannulation) could be avoided (Testoni et al., 2010, Wang et al., 2009). If biliary cannulation without the use of at least one high-PEP-risk procedure is difficult, other treatments (for example, percutaneous transhepatic biliary drainage (PTBD) or endoscopic ultrasound (EUS)-guided biliary drainage (EUS-BD)) could be considered.

Therefore, we aimed to establish a PEP prediction model using only risk factors that can be gathered before ERCP. Our model was developed and validated with multicenter data from Japan.

Methods

We performed a multicenter retrospective study at six institutions in Japan. This study was approved by the institutional review board of Fukushima Medical University and that of each partner medical institution. All patients provided written consent before undergoing ERCP.

Patients

Among 2,176 patients who underwent ERCP between November 2020 and October 2022, 2,074 were enrolled in this study. The other 102 patients were excluded for the following reasons: past history of choledochojejunostomy, acute pancreatitis, choledochoduodenal fistula, difficulty finding the Vater papilla, past history of pancreatojejunostomy, or past history of pancreatogastrostomy (Figure 1).

Flowchart of the inclusion criteria. ERCP, endoscopic retrograde cholangiopancreatography.

Study design

We randomly sampled 50% of the patients as the development cohort and 50% as the validation cohort (Figure 1). In the development cohort, we established a risk scoring system for predicting PEP before ERCP, which was named the support for PEP reduction model (SuPER model). The validation cohort was used to confirm the effectiveness of the scoring system. PEP diagnosis and severity were assessed according to Cotton’s criteria (Cotton et al., 1991). Patients who experienced abdominal pain and had hyperamylasemia (more than three times the normal upper limit) at least 24 hours after ERCP were diagnosed with PEP. Mild PEP was defined as pancreatitis that required prolongation of the planned hospitalization by 2-3 days. Moderate PEP was defined as pancreatitis that required 4-10 days of hospitalization. Severe PEP was defined as pancreatitis that required more than 10 days of hospitalization or intervention or hemorrhagic pancreatitis, phlegmon, or pseudocysts.
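The severity grading described above can be written as a small decision rule. The sketch below is illustrative only: the function name `grade_pep_severity` and its parameters are hypothetical, and hospitalizations shorter than 2 days, which the definitions above do not cover, are returned as ungraded.

```python
from typing import Optional


def grade_pep_severity(hospital_days: int,
                       needs_intervention: bool = False,
                       has_severe_complication: bool = False) -> Optional[str]:
    """Grade PEP severity using the definitions quoted in the text.

    hospital_days: days of hospitalization required for pancreatitis.
    has_severe_complication: hemorrhagic pancreatitis, phlegmon, or pseudocyst.
    """
    if needs_intervention or has_severe_complication or hospital_days > 10:
        return "severe"
    if 4 <= hospital_days <= 10:
        return "moderate"
    if 2 <= hospital_days <= 3:
        return "mild"
    return None  # assumption: shorter stays are left ungraded


print(grade_pep_severity(3), grade_pep_severity(7), grade_pep_severity(12))
```

Checking the severe criteria first means that a complication or intervention overrides the length of stay, which matches the "or" wording of the severe definition.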

To establish the risk score, factors that might be associated with PEP occurrence were investigated using the data from the development cohort. So that the PEP risk score could be calculated before ERCP, only factors related to patient characteristics and previously scheduled procedures, as reported in past studies, were selected. The patients’ risk factors included age < 50 years, female sex, a past history of pancreatitis, a past history of PEP, a past history of gastrectomy, pancreatic cancer, intraductal papillary mucinous neoplasm (IPMN), a native papilla of Vater, absence of chronic pancreatitis (CP), normal serum bilirubin (≤ 1.2 mg/dl), and periampullary diverticulum (Ding et al., 2015, Freeman et al., 2001, Freeman et al., 1996, Fujita et al., 2022, Fujita et al., 2021, Masci et al., 2003, Wang et al., 2009, Williams et al., 2007, Zheng et al., 2020). Pancreas divisum was excluded from the patient risk factor list because it was observed in only two patients. Pancreatic calcification and a diameter of the main pancreatic duct > 3 mm were considered to indicate CP (Beyer et al., 2023, Sarner and Cotton, 1984).

These imaging findings were confirmed by CT, MRI, or EUS before ERCP. The CT and MRI findings were reviewed by radiologists. IPMN was diagnosed according to the results of CT, MRI, and EUS. As pre-ERCP prophylaxes for PEP, gabexate or nafamostat, intravenous hydration, and NSAID suppositories were used (Fujita et al., 2022). As planned procedure-related risk factors, EST, endoscopic papillary balloon dilation (EPBD), endoscopic papillary large balloon dilation (EPLBD) using a ≥ 12 mm balloon catheter (Itoi et al., 2018), biliary stone removal, ampullectomy, biliary stent material (plastic stent, self-expandable metallic stent (SEMS), or covered SEMS (CSEMS)), inside stent placement, and procedures on the pancreatic duct were evaluated (Freeman et al., 2001, Freeman et al., 1996, Harewood et al., 2005, Kato et al., 2022, Masci et al., 2003, Masci et al., 2001, Testoni et al., 2010, Williams et al., 2007). A biliary stent above the Vater papilla was also assessed as a prophylactic measure against PEP (Ishiwatari et al., 2013).

To demonstrate the independence of the established risk classification, its relationship with intraprocedural PEP risk factors (including precut sphincterotomy and inadvertent pancreatic duct cannulation) (Testoni et al., 2010, Wang et al., 2009) was investigated. Because of the retrospective nature of the data, the exact cannulation times and the number of cannulation attempts were not available. Therefore, multiple cannulation attempts and a prolonged cannulation time could not be investigated as intraprocedural PEP risk factors.

Sample size

The primary aim of this study was to establish a PEP prediction model that could be used to calculate a risk score before ERCP. To construct a prediction model by logistic regression analysis, 10 events per explanatory variable are needed (Wynants et al., 2015). Seven variables were evaluated in the development cohort, so 70 PEP patients were required. Five variables were evaluated in the validation cohort, so 50 PEP patients were necessary for it. According to a previous systematic review, the rate of PEP occurrence was 9.7% (Kochar et al., 2015). Therefore, at least 722 and 521 patients were required in the development and validation cohorts, respectively.
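The events-per-variable reasoning above amounts to dividing the required number of events by the expected event rate and rounding up. The following is a minimal sketch of that arithmetic; the function name `required_patients` is hypothetical. For the development cohort it reproduces the 722 patients stated above. For the validation cohort the same arithmetic gives 516, slightly fewer than the 521 stated above, so the stated figure may rest on a rounding or rate that is not given in the text.

```python
import math


def required_patients(n_variables: int, events_per_variable: int, event_rate: float) -> int:
    """Smallest cohort size expected to contain enough events for the model."""
    required_events = n_variables * events_per_variable
    return math.ceil(required_events / event_rate)


# 10 events per variable and a 9.7% PEP rate, as stated in the text.
development_n = required_patients(7, 10, 0.097)
validation_n = required_patients(5, 10, 0.097)
print(development_n, validation_n)
```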

Statistical analysis

In the development cohort, univariate and multivariate logistic regression analyses were performed to identify the risk factors for PEP. Factors with a p value < 0.10 in the univariate analysis were included in the multivariate analysis, and factors with p < 0.10 in the multivariate analysis were ultimately included in the risk score model. Each selected factor was assigned points according to its regression coefficient (risk points = the factor’s regression coefficient divided by the minimum regression coefficient). The sum of the assigned points was calculated for each patient, and the patients were classified into three groups (low risk, moderate risk, and high risk) according to the expected rate of PEP occurrence (Friedland et al., 2002). The risk classification system (SuPER model) was also applied to the validation cohort.
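The conversion of regression coefficients into risk points can be illustrated as follows. This is a sketch under stated assumptions, not the authors' code: the coefficients and factor names are invented for illustration and are not values from this study, the "minimum regression coefficient" is taken to be the smallest coefficient in absolute value, and the ratio is rounded to the nearest integer.

```python
from typing import Dict


def assign_risk_points(coefficients: Dict[str, float]) -> Dict[str, int]:
    """Divide each coefficient by the smallest absolute coefficient and round."""
    reference = min(abs(beta) for beta in coefficients.values())
    return {name: round(beta / reference) for name, beta in coefficients.items()}


# Hypothetical coefficients for illustration only; NOT values from this study.
example_coefficients = {
    "factor_a": 0.6,
    "factor_b": 1.3,
    "factor_c": 1.9,
    "factor_d": -1.1,  # a protective factor receives negative points
}
points = assign_risk_points(example_coefficients)
print(points)
```

Under these assumptions the weakest factor always receives one point, and a protective factor with a negative coefficient receives negative points.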

With respect to both the development and validation cohorts, the effectiveness of the risk score model was evaluated as follows. The correlations between the risk score, risk classification and PEP occurrence were evaluated by the Cochran–Armitage trend test. The predictive accuracy of the risk score was assessed using the C statistic. The goodness of fit of the model was evaluated using the Hosmer‒Lemeshow test. The independence of the established risk classification from the unexpected intraprocedural PEP risk factors was assessed by multivariate logistic regression analyses.

Patients with missing data for variables selected in the risk score model were removed from the final cohort.

Statistical analyses were performed using EZR version 1.62 (Saitama Medical Centre, Jichi Medical University, Saitama, Japan) and SPSS version 26.0 (IBM Corp., Armonk, NY, USA). A p value < 0.05 indicated statistical significance.

Results

Patient characteristics and ERCP outcomes in each cohort

The patient characteristics and ERCP outcomes in each cohort are shown in Table 1. A total of 1037 patients were assigned to each of the development and validation cohorts, including 70 (6.8%) and 64 (6.2%) patients diagnosed with PEP, respectively. The pre-ERCP prophylactic measures used at each hospital differed, and not all patients received prophylaxis.

Comparison of patient characteristics and ERCP outcomes between the development and validation cohorts.

Construction of the PEP risk scoring system

According to the univariate analyses, age < 50 years, female sex, IPMN, a native papilla of Vater, pancreatic calcification, EST, and procedures on the pancreatic duct had p values < 0.10 (Table 2). According to the multivariate analysis, female sex, IPMN, a native papilla of Vater, pancreatic calcification, and procedures on the pancreatic duct had p values < 0.10. These factors were assigned risk points according to their respective regression coefficients.

Logistic regression analysis of predictive factors for PEP in the development cohort.

The risk score of each patient was calculated as the total of that patient’s risk points and ranged from -2 to 7 points (Table 3). The risk score was found to be correlated with PEP occurrence (p < 0.01, Cochran–Armitage trend test). The patients were classified as low (≤ 0 points), moderate (1-3 points), or high risk (4-7 points) for PEP according to the risk score. The PEP rates were 0% (0/327) among the low-risk patients, 5.5% (27/492) among the moderate-risk patients, and 20.2% (39/193) among the high-risk patients. The risk classification was correlated with PEP occurrence (p < 0.01, Cochran–Armitage trend test).
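The cut-offs and group-level PEP rates above can be checked with a short sketch. The function name `classify_risk` is hypothetical; the thresholds and counts are those reported in the paragraph above for the development cohort.

```python
def classify_risk(score: int) -> str:
    """Map a risk score (-2 to 7) to the three risk groups."""
    if not -2 <= score <= 7:
        raise ValueError("score outside the reported range of -2 to 7")
    if score <= 0:
        return "low"
    if score <= 3:
        return "moderate"
    return "high"


# (PEP cases, patients) per risk group in the development cohort, from the text.
development_counts = {"low": (0, 327), "moderate": (27, 492), "high": (39, 193)}
rates = {
    group: round(100 * cases / patients, 1)
    for group, (cases, patients) in development_counts.items()
}
print(rates)
```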

Patient distribution in terms of risk score and classification.

The C statistic of the risk score model was sufficiently high at 0.77 (95% confidence interval (CI) 0.72-0.82) (Table 4). The goodness of fit of the risk score model was also confirmed by the Hosmer–Lemeshow test (p = 0.59).
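For readers unfamiliar with the C statistic, it is the probability that a randomly chosen patient with PEP has a higher predicted risk than a randomly chosen patient without PEP, with ties counted as one half. The sketch below computes it from grouped counts; the function name is hypothetical. Because the example uses the three risk groups of the development cohort rather than individual risk scores, its result is not expected to match the value in Table 4.

```python
from typing import Sequence


def c_statistic_grouped(events: Sequence[int], totals: Sequence[int]) -> float:
    """Concordance (C) statistic for ordered groups, lowest risk first.

    Each pair of one patient with the event and one without counts 1 when the
    patient with the event is in the higher group and 0.5 when both are in the
    same group.
    """
    non_events = [total - event for event, total in zip(events, totals)]
    concordant = 0.0
    for i, event in enumerate(events):
        lower_non_events = sum(non_events[:i])
        concordant += event * (lower_non_events + 0.5 * non_events[i])
    return concordant / (sum(events) * sum(non_events))


# PEP cases and patients in the low, moderate, and high groups (development cohort).
c_dev = c_statistic_grouped([0, 27, 39], [327, 492, 193])
print(round(c_dev, 3))
```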

Goodness of fit of the risk score model.

Validation of the PEP risk scoring system

The risk score was associated with PEP occurrence in the validation cohort (p < 0.01, Cochran–Armitage trend test) (Table 3). We found that 2.4% (8/331) of the patients at low risk, 5.3% (27/513) of those at moderate risk, and 18.0% (29/161) of those at high risk experienced PEP. The risk classification was also correlated with PEP occurrence in the validation cohort (p < 0.01, Cochran–Armitage trend test).
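The Cochran–Armitage trend test used here can be sketched in a few lines. This is an illustrative implementation, not the routine provided by EZR or SPSS; it assumes equally spaced group scores (0, 1, 2) and a two-sided normal approximation, and it is applied to the validation-cohort counts given above.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage_trend(events: Sequence[int],
                           totals: Sequence[int],
                           scores: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) for a linear trend in proportions."""
    n_total = sum(totals)
    p_bar = sum(events) / n_total
    # Score-weighted difference between observed and expected event counts.
    t_stat = sum(s * (x - n * p_bar) for s, x, n in zip(scores, events, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Low, moderate, and high risk groups in the validation cohort, from the text.
z, p = cochran_armitage_trend([8, 27, 29], [331, 513, 161], [0, 1, 2])
print(z > 0, p < 0.01)
```

A positive z indicates that the PEP rate rises with the risk group, and under these assumptions the p value falls below 0.01, in line with the reported result.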

In the validation cohort, the C statistic of the risk score was also high at 0.71 (Table 4). The PEP risk score model showed a good fit according to the Hosmer–Lemeshow test (p = 0.40). On the basis of these results, the preprocedural PEP risk can be calculated as shown in Figure 2.

Example of the preprocedural PEP risk checklist. ERCP, endoscopic retrograde cholangiopancreatography. IPMN, intraductal papillary mucinous neoplasm; PEP, post-ERCP pancreatitis.

Risk classification and unexpected PEP risk factors

The relation between the established risk classification and intraprocedural PEP risk factors is shown in Appendix 1—table 1. In all patients, the development cohort, and the validation cohort, the risk classification was significantly associated with the occurrence of PEP. On the other hand, precut sphincterotomy and inadvertent pancreatic duct cannulation were not significantly associated with the occurrence of PEP.

Discussion

In this multicenter study, we created a risk scoring system (the SuPER model) using five items that could be measured before performing ERCP. With this score, PEP occurrence could be predicted with reasonable accuracy. In addition, the established PEP risk classification was associated with PEP occurrence independently of unpredictable intraprocedural PEP risk procedures.

This risk scoring and classification of PEP has several advantages. First, the score is calculated using only five items, all of which can be easily assessed by medical interviews and imaging (for example, CT). By comparison, one previous scoring system included sphincter of Oddi dysfunction (SOD) as a test item (DiMagno et al., 2013). The diagnosis of SOD requires sphincter of Oddi manometry and fulfilment of the criteria for biliary pain, but sphincter of Oddi manometry is not widely used (Cotton et al., 2016). The diagnostic criterion for biliary pain included 8 items, and that for SOD included 15 items. Among the items of the SuPER risk scoring system, pancreatic calcification was assigned -2 points. This negative weighting could be explained as follows. The international conceptual model of CP can be divided into four stages: acute pancreatitis–recurrent acute pancreatitis, early CP, established CP, and end-stage CP (Whitcomb et al., 2016). Established CP patients have already passed the acute pancreatitis–recurrent acute pancreatitis course, and pancreatic calcification has been reported in established CP patients. Acinar dysfunction has also been observed in these patients (Whitcomb et al., 2016). Therefore, patients with pancreatic calcification may have a lower incidence of PEP.

Second, the SuPER risk score can be determined before the ERCP procedure, and the established risk classification was found to be the sole significant factor predicting the occurrence of PEP, independent of intraprocedural PEP risk factors. As described in the Introduction, precut sphincterotomy, multiple cannulation attempts, and a cannulation time greater than 10 minutes were identified as high-risk factors that cannot be accounted for prior to ERCP (Testoni et al., 2010, Wang et al., 2009). Although the established PEP risk classification was independent of the included intraprocedural risk factors (precut sphincterotomy and inadvertent pancreatic duct cannulation), detailed data on the number of cannulation attempts and the cannulation time were not available. Therefore, to avoid intraprocedural maneuvers associated with a high risk of PEP occurrence, an expert endoscopist can perform ERCP from the start for high-PEP-risk patients. In addition, PEP prophylaxis can be administered beforehand for high-PEP-risk patients. As effective prophylaxes for PEP, rectal NSAIDs and pancreatic stent placement have been reported (Elmunzer et al., 2008, Murray et al., 2003, Sugimoto et al., 2019). In this report, rectal NSAID use was not identified as a significant factor preventing PEP. One reason for this is that in past reports describing the use of rectal NSAIDs to prevent PEP, patients at high risk for PEP were often treated (Elmunzer et al., 2008). In contrast, this study included all patients who underwent ERCP. Another reason might be the difference in dose. One hundred milligrams of rectal diclofenac was used in past reports, whereas 12.5-50 mg of diclofenac was used in this study. In Japan, the approved diclofenac dose covered by insurance is 50 mg or less, with the dose typically being lower for elderly patients. Therefore, diclofenac doses of 12.5-50 mg were prescribed by the doctors depending on the age and size of the patients.

Pancreatic stent placement itself is one of the procedures performed on the pancreatic duct and was a higher-risk procedure for PEP than endoscopic biliary procedures without an approach to the pancreatic duct (Appendix 2—table 2). Moreover, pancreatic stent placement has become a prophylactic treatment for PEP in patients who have undergone pancreatography or wire placement to the pancreatic duct (Mazaki et al., 2014, Sugimoto et al., 2019). As described above, pancreatic stent placement was performed along with high-PEP-risk procedures (i.e., guidewire placement to the pancreatic duct or pancreatography); therefore, pancreatic stent placement was grouped together with the other endoscopic retrograde pancreatography procedures as “procedures on the pancreatic duct”.

This study has several limitations. First, the study was retrospective, and there were missing data. However, we consider the reported results reliable. The percentage of patients who did not meet the inclusion criteria was not more than 5%, and the percentage of missing data was not over 1%. As described in the Methods section, patients with missing data for the variables selected in the risk score model were removed from the final cohort. The reliability of the SuPER risk score model was also statistically confirmed. Second, some factors cannot be assessed before ERCP. Additional procedures could be conducted during ERCP, and unplanned pancreatography is often performed in patients who are scheduled for endoscopic cholangiography or biliary treatment. However, the established PEP risk classification was independent of the included intraprocedural risk factors. A planned procedure for accessing the pancreatic duct is listed in the SuPER risk model. Therefore, we can predict the SuPER risk score and classification of patients regardless of whether they have undergone pancreatic duct procedures. Third, this study was performed in a single country. Validation studies over wider geographic regions are necessary.

In conclusion, a simple and useful PEP scoring system (SuPER model) with only five clinical items was developed in this multicenter study. This scoring system may aid in predicting and explaining PEP risk and in selecting appropriate prophylaxes for PEP and endoscopic pancreaticobiliary procedures for each patient.

Acknowledgements

We thank all the staff at the Department of Gastroenterology of Fukushima Medical University, the Department of Endoscopy of Fukushima Medical University Hospital, the Department of Gastroenterology of Fukushima Rosai Hospital, the Department of Gastroenterology of Aizu Medical Center, Fukushima Medical University, the Department of Gastroenterology of Ohtanishinouchi Hospital, Koriyama, the Department of Gastroenterology of Fukushima Redcross Hospital, the Department of Gastroenterology of Soma General Hospital, the Department of Gastroenterology of Saiseikai Fukushima General Hospital, and the Gastroenterology Ward of Fukushima Medical University Hospital. We also thank American Journal Experts for providing English language editing services.

Additional information

Competing interests

The authors declare that they have no competing interests to report.

Funding

None.

Author contributions

M.S. wrote the paper, designed and performed the research. T.T. wrote the paper, designed and oversaw the research. T.S., H.S., G.S., Y.S., Y.N., Y.T., Y.N., R.K., H.I., H.A., N.K., Y.W., and H.A. performed the research. R.S. provided clinical advice. T.H. supervised the report. H.O. supervised the report and the writing of the paper. All authors read and approved the final manuscript.

Ethics

The study protocol was reviewed and approved by the Institutional Review Board of Fukushima Medical University (Number 2453). The analysis used anonymous clinical data obtained after all the participants agreed to treatment by written consent, so patients were not required to give informed consent for the study. The details of the study can be found on the homepage of Fukushima Medical University.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.