Prediction of type 2 diabetes mellitus onset using logistic regression-based scorecards

  1. Yochai Edlitz
  2. Eran Segal (corresponding author)
  1. Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Israel
  2. Department of Molecular Cell Biology, Weizmann Institute of Science, Israel
6 figures, 9 tables and 1 additional file


A flowchart of the cohort selection process and an illustration of the model development process.

(A) A flowchart of the participant selection process. From the 502,536 participants of the UK Biobank (UKB), we selected those who attended a repeat (second or third) assessment visit. Next, we excluded 1652 participants who self-reported having type 2 diabetes (T2D). We then split the data into 80% for the training and validation sets and 20% for the holdout test set. We excluded an additional 2285 participants for (1) having 25% or more missing values across the full feature list, (2) having HbA1c levels of 6.5% or higher, (3) being treated with metformin or insulin, or (4) having been diagnosed with T2D before the first UKB visit. The final training, validation, and test sets included 25,025 participants (56% of the cohort), 10,724 participants (24%), and 8960 participants (20%), respectively. (B) The process flow for training and testing the models. We first split the data and kept a holdout test set. We then explored several models using the training and validation sets, compared the selected models on the holdout test set, and reported the results. We calibrated the models' outputs to predict the probability that a participant develops T2D.
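
The split-then-exclude order in panel (A) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`missing_fraction`, `hba1c`, `on_metformin_or_insulin`, `t2d_before_first_visit`) are hypothetical names, and the 30% validation fraction of the non-test data is inferred from the reported set sizes (25,025 / 35,749 ≈ 0.70).

```python
import random


def is_eligible(p):
    """Apply the four post-split exclusion criteria to one participant.

    The field names are hypothetical placeholders for the UK Biobank variables.
    """
    return (
        p["missing_fraction"] < 0.25             # (1) fewer than 25% missing feature values
        and p["hba1c"] < 6.5                     # (2) HbA1c below 6.5%
        and not p["on_metformin_or_insulin"]     # (3) not treated with metformin or insulin
        and not p["t2d_before_first_visit"]      # (4) no T2D diagnosis before the first visit
    )


def split_cohort(participants, test_frac=0.2, val_frac=0.3, seed=0):
    """Hold out a test set, split the rest into train/validation, then apply exclusions.

    val_frac=0.3 of the remaining 80% gives the roughly 56/24/20 proportions of the
    final sets; this value is an assumption inferred from the reported set sizes.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    # Exclusions follow the split, matching the order described in the flowchart.
    return tuple([p for p in group if is_eligible(p)] for group in (train, val, test))
```

With 1000 eligible synthetic records, `split_cohort` returns sets of 560, 240, and 200 participants; ineligible records are dropped from whichever set they landed in.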

Anthropometrics and blood tests scorecards.

(A) Anthropometrics-based scorecard. Summing the scores of the individual features gives a final score, which falls into one of three risk groups (Figure 2C). (B) “Four blood tests” scorecard. Summing the scores of the individual features gives a final score, which falls into one of four risk groups (Figure 2D). (C) Anthropometrics scorecard risk groups. First group, score range 1–69: 1% risk of developing T2D (95% CI 0.8–1%), which is below the cohort's 1.8% T2D prevalence (green dashed line). Second group, score range 70–78: 5% (95% CI 3–6%). Third group, score range 79–96: 9% (95% CI 7–12%). (D) Four blood tests scorecard risk groups. First group, score range 1–104: <0.5% (95% CI 0.04–0.7%), which is below the cohort's 1.8% T2D prevalence (red dashed line). Second group, score range 105–116: 3% (95% CI 2–4%). Third group, score range 117–146: 10% (95% CI 8–12%). Fourth group, score range 147–162: 23% (95% CI 10–37%), a 13-fold enrichment compared with the cohort's T2D prevalence.
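
Using a scorecard amounts to adding up per-feature points and looking up the total in the risk-group table. The sketch below hard-codes the score ranges and risk estimates from panels (C) and (D); the per-feature points in the example are hypothetical, since the actual point values appear only in the figure. The last line reproduces the fold enrichment of the top blood-test group (23% versus the 1.8% cohort prevalence).

```python
# (lowest score, highest score, estimated risk of developing T2D), from panels (C) and (D).
ANTHROPOMETRIC_GROUPS = [(1, 69, 0.01), (70, 78, 0.05), (79, 96, 0.09)]
# The first blood-test group is reported as "<0.5%"; 0.005 is used here as its upper bound.
FOUR_BLOOD_TEST_GROUPS = [(1, 104, 0.005), (105, 116, 0.03), (117, 146, 0.10), (147, 162, 0.23)]
COHORT_PREVALENCE = 0.018  # 1.8% T2D prevalence in the cohort


def total_score(feature_points):
    """A scorecard total is the sum of the points assigned to each feature."""
    return sum(feature_points.values())


def risk_group(score, groups):
    """Return (1-based group number, estimated risk) for a total score."""
    for number, (low, high, risk) in enumerate(groups, start=1):
        if low <= score <= high:
            return number, risk
    raise ValueError(f"score {score} is outside the scorecard range")


# Hypothetical per-feature points for one participant (not the published values).
points = {"hba1c": 62, "hdl": 31, "age": 14, "sex": 8}
score = total_score(points)  # 115
group, risk = risk_group(score, FOUR_BLOOD_TEST_GROUPS)
print(f"score={score}, group={group}, risk={risk:.0%}")
# Fold enrichment of the highest-risk group over the cohort prevalence (0.23 / 0.018).
print(round(FOUR_BLOOD_TEST_GROUPS[-1][2] / COHORT_PREVALENCE, 1))
```

A score of 115 falls in the second blood-test group (105–116), and 0.23 / 0.018 ≈ 12.8, which the caption rounds to 13-fold.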

Main results calculated using 1000 bootstrap resamples of the cohort population.

Each point in the graphs represents the result of one bootstrap iteration. The color legend is at the bottom of the figure. (A) Receiver operating characteristic (ROC) curves comparing the models developed in this research (a Gradient Boosting Decision Trees (GBDT) model of all features, logistic regression models of four blood tests, and an anthropometry-based model) with the well-established German Diabetes Risk Score (GDRS) and Finnish Diabetes Risk Score (FINDRISC). (B) Precision–recall (P-R) curves showing precision versus recall for each model; the dashed line marks the prevalence of the population. (C) Deciles' odds ratio graph: the ratio of type 2 diabetes (T2D) prevalence in each decile to the prevalence in the fifth decile. (D) Feature importance graph of the logistic regression anthropometry model trained on normalized feature values. The bars indicate the standard deviation (SD) of the feature importance values. The top predictive features of this model are body mass index (BMI) and waist-to-hip ratio (WHR). (E) Feature importance graph of the logistic regression blood tests model, with SD bars. Higher HbA1c% levels contribute positively to T2D prediction and high-density lipoprotein (HDL) cholesterol levels are negatively correlated with the predicted probability of T2D, whereas the information that age and sex carry about T2D onset is largely captured by other features. (F) Calibration plot of the anthropometry, four blood tests, full blood test, and FINDRISC models. Calibrating the models' predictions allows reporting the probability of developing T2D (see ‘Methods’).
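
Confidence intervals like those reported for these panels can be obtained with a percentile bootstrap. The following is a minimal sketch, assuming participants are resampled with replacement and auROC is computed through its rank-sum (Mann–Whitney) identity; the function names are hypothetical, and this is not necessarily the exact resampling scheme used for the figure.

```python
import random


def roc_auc(labels, scores):
    """auROC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the run
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, metric=roc_auc, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap CI from n_boot resamples with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if 0 < sum(resampled_labels) < n:  # auROC is undefined unless both classes are present
            stats.append(metric(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return metric(labels, scores), (lower, upper)
```

The same `bootstrap_ci` wrapper works for any metric with a `(labels, scores)` signature, such as an average precision function.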

Figure 3—source data 1

Detailed results of the odds ratio (OR) calculation for the top to bottom quantiles.
Figure 3—source data 2

Detailed coefficients for the non-laboratory logistic regression model.
Figure 3—source data 3

Detailed coefficients for the laboratory logistic regression model.
Model calibration plots.

Anthropometric, four blood tests, Finnish Diabetes Risk Score (FINDRISC), and German Diabetes Risk Score (GDRS) scorecards calibration graphs.
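
A calibration graph compares predicted probabilities with the observed outcome frequency. The sketch below assumes equal-width probability bins (ten by default); the binning scheme behind the published graphs is not stated here, so treat it as an illustration of the technique rather than the authors' procedure.

```python
def calibration_curve(labels, probs, n_bins=10):
    """Return (mean predicted probability, observed T2D fraction) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[index].append((y, p))
    curve = []
    for contents in bins:
        if contents:  # empty bins are left out of the plot
            mean_pred = sum(p for _, p in contents) / len(contents)
            observed = sum(y for y, _ in contents) / len(contents)
            curve.append((mean_pred, observed))
    return curve
```

A well-calibrated model gives points close to the diagonal, where the mean predicted probability equals the observed fraction.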

Appendix 1—figure 1
Model training and testing process.

Model development. Scheme of the model exploration and evaluation process. For model selection, we used fivefold cross-validation with 200 iterations of random hyperparameter search for each feature group, and selected the top-scoring hyperparameters for each group. We then trained a new model on the training set and measured the area under the receiver operating curve (auROC) on the validation set. Of the validated models, we chose those that used the fewest features while still performing well. The reported results are from the held-out test set.
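
The selection step (fivefold cross-validation with 200 random hyperparameter draws per feature group) can be sketched in a model-agnostic way. In this minimal sketch, `evaluate(params, train_idx, val_idx)` is a hypothetical user-supplied function that fits a model on the training indices and returns its auROC on the validation fold; the hyperparameter names are placeholders.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n) into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def random_search_cv(param_space, evaluate, n, n_iter=200, k=5, seed=0):
    """Sample n_iter hyperparameter sets and keep the one with the best mean CV score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter from its candidate list.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        scores = [evaluate(params, tr, va) for tr, va in kfold_indices(n, k, seed)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Reusing the same folds for every draw keeps the comparison between hyperparameter sets paired; the winning set would then be refit on the full training set and scored on the validation set.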

Appendix 1—figure 2
Socioeconomic impact on predicting the risk of developing type 2 diabetes (T2D).

(A) Deprivation index differences between participants who developed T2D and healthy participants in our data: a density histogram of the deprivation index for participants who were diagnosed with T2D at one of their return visits to the assessment center and for healthy participants. A Mann–Whitney test on these data yields p < 2.37 × 10⁻¹³⁷, indicating an association between lower socioeconomic status and higher T2D prevalence. (B) Shapley Additive Explanations (SHAP) analysis of the socio-demographic features for a Gradient Boosting Decision Trees (GBDT) predictor of T2D. Each dot represents one participant's value for the feature listed on the Y-axis. Colors indicate feature values: red for higher values and blue for lower values. The X-axis is the SHAP value; higher SHAP values push the GBDT predictor toward a positive prediction, that is, a higher risk of T2D onset. The analysis indicates that a higher deprivation index and lower household income push the predicted probability of T2D onset upward. The feature codes are defined in the UK Biobank Data Showcase.
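
The Mann–Whitney test in panel (A) asks whether values in one group tend to be larger than in the other. A minimal standard-library sketch using the large-sample normal approximation; it omits the tie correction that statistical packages usually apply, so in practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value) using the normal approximation."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

With tens of thousands of participants per group, even a modest shift in the deprivation index distribution yields a very large |z|, which is why the p-value can be as small as the one reported.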


Table 1
Cohort statistical data.

Characteristics of this study’s cohort population and the UK Biobank (UKB) population. A ‘±’ sign denotes the standard deviation. Type 2 diabetes (T2D) prevalence is 4.8% among all UKB participants but 1.79% in our cohort, because we restricted the cohort at baseline to participants with HbA1c% < 6.5%. Participants were aged 40–69 at the first visit; thus, our models are not suitable for people who develop T2D at younger ages. The models predict the risk of developing T2D between the first visit to the UKB assessment center and the last visit. We refer to this feature as ‘the time between visits’.

Characteristic | UKB population | Train, validation, and test sets | Test set | Train set | Validation set
Number of participants | 502,536 | 44,709 | 8,960 | 25,025 | 10,724
Age at first visit (years) | 56.5 ± 8.1 | 55.6 ± 7.6 | 55.5 ± 7.5 | 55.6 ± 7.6 | 55.6 ± 7.6
Age at last visit (years) | - | 62.9 ± 7.5 | 62.9 ± 7.4 | 62.9 ± 7.5 | 62.9 ± 7.5
The time between visits (years) | - | 7.3 ± 2.3 | 7.3 ± 2.3 | 7.3 ± 2.3 | 7.3 ± 2.3
Males in the population (%) | 45.5 | 47.8 | 47.9 | 47.9 | 47.5
Diabetic at first visit (%) | 4.8 | 0 | 0 | 0 | 0
Diabetic at last visit (%) | - | 1.79 | 1.76 | 1.75 | 1.91
HbA1c at first visit (%) | 5.5 ± 0.6 | 5.3 ± 0.3 | 5.3 ± 0.3 | 5.3 ± 0.3 | 5.3 ± 0.3
HbA1c at last visit (%) | - | 5.4 ± 0.4 | 5.4 ± 0.3 | 5.4 ± 0.4 | 5.4 ± 0.4
Weight at first visit (kg) | 78.1 ± 15.9 | 76.6 ± 14.7 | 76.4 ± 14.6 | 76.7 ± 14.7 | 76.8 ± 14.9
Weight at last visit (kg) | - | 76.2 ± 15.2 | 76.0 ± 14.9 | 76.2 ± 15.2 | 76.5 ± 15.3
Body mass index at first visit (kg/m²) | 27.4 ± 4.8 | 26.6 ± 4.2 | 26.5 ± 4.1 | 26.6 ± 4.2 | 26.7 ± 4.3
Body mass index at last visit (kg/m²) | - | 26.6 ± 4.4 | 26.5 ± 4.3 | 26.6 ± 4.4 | 26.7 ± 4.5
Hip circumference at first visit (cm) | 103.4 ± 9.2 | 102.1 ± 8.2 | 101.9 ± 8.0 | 102.1 ± 8.2 | 102.3 ± 8.3
Hip circumference at last visit (cm) | - | 101.6 ± 8.8 | 101.4 ± 8.7 | 101.6 ± 8.8 | 101.8 ± 9.0
Waist circumference at first visit (cm) | 90.3 ± 13.5 | 87.9 ± 12.5 | 87.7 ± 12.4 | 87.9 ± 12.4 | 88.2 ± 12.7
Waist circumference at last visit (cm) | - | 88.7 ± 12.7 | 88.5 ± 12.5 | 88.7 ± 12.7 | 89.0 ± 12.9
Height at first visit (cm) | 168.4 ± 9.3 | 169.5 ± 9.2 | 169.5 ± 9.1 | 169.5 ± 9.2 | 169.4 ± 9.1
Height at last visit (cm) | - | 169.0 ± 9.2 | 169.0 ± 9.2 | 169.0 ± 9.3 | 168.9 ± 9.2

Table 2
External validation cohort (‘Clalit’) statistical data.
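Statistic | Males (%) | HbA1c (%) | GGT | Reticulocyte count | HDL | Triglycerides | Age | Weight | Height | BMI
Number of samples | - | 17,132 | 17,132 | 83 | 17,132 | 17,132 | 17,132 | 17,051 | 17,051 | 17,051
Mean value | 45 | 5.56 | 32.31 | 56.33 | 49.77 | 141.33 | 56.40 | 79.00 | 1.66 | 28.72
Standard deviation | - | 0.41 | 49.29 | 36.97 | 13.33 | 82.09 | 8.06 | 49.90 | 0.09 | 19.58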
  1. HbA1c, hemoglobin A1c; GGT, gamma-glutamyl transferase; HDL, high-density lipoprotein; BMI, body mass index.

Table 3
Comparing models' main results.

The values in parentheses indicate a 95% confidence interval (CI). The deciles’ odds ratio (OR) measures the ratio between type 2 diabetes (T2D) prevalence in the top risk score decile bin and the prevalence in the fifth decile bin (see ‘Methods’).

Label | Model type | APS | auROC | Decile’s prevalence OR
GDRS | Scorecard Cox regression (5 years) | 0.04 (0.03–0.06) | 0.66 (0.62–0.70) | 2.5 (1.46–4.45)
FINDRISC | Scorecard logistic regression | 0.04 (0.03–0.06) | 0.73 (0.69–0.76) | 4.13 (2.29–7.37)
Anthropometry | Scorecard Cox regression (5 years) | 0.04 (0.03–0.07) | 0.79 (0.75–0.83) | 8.8 (3.6–36)
Anthropometry | Scorecard Cox regression (10 years) | 0.06 (0.04–0.09) | 0.79 (0.76–0.82) | 10 (4.6–32.9)
Anthropometry | Scorecard logistic regression | 0.07 (0.05–0.10) | 0.81 (0.77–0.84) | 17.2 (5–66)
Anthropometry | Logistic regression | 0.09 (0.06–0.13) | 0.81 (0.78–0.84) | 16.9 (4.8–66)
Anthropometry | Cox regression | 0.10 (0.07–0.13) | 0.82 (0.79–0.85) | 10.7 (5–24)
Four blood tests | Scorecard Cox regression (10 years) | 0.13 (0.10–0.16) | 0.87 (0.85–0.90) | 22.4 (9.8–54)
Four blood tests | Scorecard logistic regression | 0.13 (0.10–0.17) | 0.87 (0.85–0.90) | 48 (11.9–109)
Four blood tests | Scorecard Cox regression (5 years) | 0.09 (0.06–0.12) | 0.89 (0.86–0.92) | 53.2 (18.9–84.2)
Four blood tests | Cox regression | 0.25 (0.18–0.32) | 0.88 (0.85–0.90) | 43 (13.6–109)
Four blood tests | Logistic regression | 0.24 (0.17–0.31) | 0.88 (0.85–0.91) | 32.5 (10.89–110)
Blood tests | Logistic regression | 0.26 (0.19–0.33) | 0.91 (0.89–0.93) | 75.4 (17.7–133)
All features | Gradient boosting decision trees | 0.27 (0.20–0.34) | 0.91 (0.89–0.93) | 72.6 (15.1–135)
  1. APS, average precision score; auROC, area under the receiver operating curve; GDRS, German Diabetes Risk Score; FINDRISC, Finnish Diabetes Risk Score; DT, Decision Trees.
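
The deciles’ measure defined in the table caption (T2D prevalence in the top risk decile divided by prevalence in the fifth decile) can be computed as below. This is a minimal sketch under the assumption that deciles are equal-size bins of participants ranked by predicted risk; it follows the caption's prevalence-ratio definition, and the function name is hypothetical.

```python
def decile_prevalence_ratio(labels, scores, top=10, reference=5):
    """Prevalence in decile `top` divided by prevalence in decile `reference`.

    Participants are ranked by predicted risk; decile 1 is the lowest risk and
    decile 10 the highest. Raises ZeroDivisionError if the reference decile has no cases.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)

    def prevalence(decile):
        members = order[(decile - 1) * n // 10: decile * n // 10]
        return sum(labels[i] for i in members) / len(members)

    return prevalence(top) / prevalence(reference)
```

Because the fifth decile of a low-prevalence cohort may contain only a handful of cases, this ratio is unstable, which is consistent with the wide bootstrap intervals in the table.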

Table 4
Comparing model results applied to HbA1c%-stratified populations.

The values in parentheses indicate the 95% confidence interval (CI). The mixed population-based model columns give the results of the scorecard models presented in Figure 2 when applied to the normoglycemic and prediabetes strata; the remaining columns give the results of models built using a stratified training set.

Model | auROC, mixed population-based model tested on the stratified population | APS, mixed population-based model tested on the stratified population | auROC, model built using a stratified training set | APS, model built using a stratified training set
Prediabetes population (N = 1006, prevalence = 9.4%)
GDRS | 0.64 (0.57–0.70) | 0.17 (0.12–0.23) | - | -
FINDRISC | 0.66 (0.61–0.72) | 0.20 (0.14–0.27) | - | -
Anthropometry | 0.73 (0.68–0.77) | 0.20 (0.15–0.26) | 0.73 (0.68–0.78) | 0.21 (0.16–0.27)
Four blood tests | 0.73 (0.68–0.77) | 0.20 (0.15–0.26) | 0.72 (0.67–0.77) | 0.21 (0.15–0.26)
Normoglycemic population (N = 7948, prevalence = 0.8%)
GDRS | 0.67 (0.61–0.74) | 0.02 (0.01–0.03) | - | -
FINDRISC | 0.74 (0.69–0.79) | 0.04 (0.02–0.07) | - | -
Anthropometry | 0.81 (0.76–0.86) | 0.04 (0.02–0.07) | 0.81 (0.76–0.85) | 0.03 (0.02–0.06)
Four blood tests | 0.81 (0.76–0.85) | 0.03 (0.02–0.05) | 0.82 (0.77–0.86) | 0.05 (0.03–0.09)
  1. auROC, area under the receiver operating curve; FINDRISC, Finnish Diabetes Risk Score; GDRS, German Diabetes Risk Score; APS, average precision score.

Table 5
Four blood tests scorecard results from the external validation cohort (‘Clalit’).
Label | Cohort size | Prevalence (%) | APS | auROC
Full population (HbA1c% < 6.5%) | 17,132 | 4.1 | 0.11 (0.10–0.11) | 0.75 (0.74–0.75)
Normoglycemic population (HbA1c% < 5.7%) | 10,064 | 2 | 0.04 (0.04–0.05) | 0.69 (0.66–0.69)
Prediabetes population (5.7% ≤ HbA1c% < 6.5%) | 7,059 | 7.1 | 0.12 (0.12–0.13) | 0.68 (0.67–0.69)
  1. APS, average precision score; auROC, area under the receiver operating curve.
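
The HbA1c% strata used in Tables 4 and 5 can be expressed as a small helper. A minimal sketch: the thresholds come from the Table 5 labels, while the function names and the per-stratum summary are illustrative additions.

```python
def glycemic_stratum(hba1c_percent):
    """Map a baseline HbA1c% value to the strata used in Tables 4 and 5."""
    if hba1c_percent < 5.7:
        return "normoglycemic"
    if hba1c_percent < 6.5:
        return "prediabetes"
    return "excluded"  # HbA1c% >= 6.5 is outside the study population


def stratum_summary(hba1c_values, labels):
    """Return {stratum: (cohort size, T2D prevalence)} for the non-excluded strata."""
    counts = {}
    for value, label in zip(hba1c_values, labels):
        stratum = glycemic_stratum(value)
        if stratum == "excluded":
            continue
        n, cases = counts.get(stratum, (0, 0))
        counts[stratum] = (n + 1, cases + label)
    return {s: (n, cases / n) for s, (n, cases) in counts.items()}
```

Evaluating a model separately within each stratum, as in Tables 4 and 5, removes much of the discrimination that comes from HbA1c% itself, which helps explain why the stratified auROC values are lower than the full-population ones.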

Appendix 1—table 1
Prediction using feature domain groups.

Results of Gradient Boosting Decision Trees (GBDT) models for various feature domains.

Feature domain | APS | auROC
All features without genetic sequencing | 0.28 (0.20–0.36) | 0.92 (0.89–0.94)
All features | 0.27 (0.20–0.34) | 0.91 (0.89–0.93)
All blood tests | 0.28 (0.21–0.36) | 0.90 (0.88–0.93)
Four blood tests | 0.20 (0.14–0.27) | 0.88 (0.85–0.90)
Blood tests without HbA1c% | 0.13 (0.09–0.18) | 0.84 (0.81–0.87)
HbA1c% | 0.17 (0.12–0.23) | 0.84 (0.80–0.87)
Blood tests without HbA1c% or glucose | 0.10 (0.07–0.13) | 0.82 (0.79–0.86)
Anthropometry | 0.07 (0.05–0.11) | 0.79 (0.75–0.82)
Lifestyle and physical activity | 0.05 (0.04–0.07) | 0.73 (0.69–0.77)
Blood pressure and heart rate | 0.05 (0.03–0.07) | 0.69 (0.64–0.73)
Nondiabetes-related medication | 0.04 (0.03–0.06) | 0.67 (0.62–0.73)
Mental health | 0.04 (0.03–0.06) | 0.67 (0.62–0.71)
Family and ethnicity | 0.04 (0.03–0.05) | 0.66 (0.60–0.71)
Diet | 0.04 (0.03–0.06) | 0.66 (0.60–0.71)
Socio-demographics | 0.03 (0.02–0.05) | 0.65 (0.60–0.70)
Early-life factors | 0.03 (0.02–0.05) | 0.64 (0.59–0.69)
Age and sex | 0.03 (0.02–0.04) | 0.61 (0.56–0.67)
Genetics only | 0.03 (0.02–0.04) | 0.57 (0.51–0.63)
  1. APS, average precision score; auROC, area under the receiver operating curve.
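
The average precision score (APS) summarizes the precision–recall curve. The sketch below uses the step-wise definition AP = Σₙ (Rₙ − Rₙ₋₁) Pₙ, stepping through distinct score thresholds from highest to lowest; this is the definition scikit-learn's `average_precision_score` documents, and the function here is a standalone illustration. For an uninformative model, APS is roughly the outcome prevalence, which is why APS values around 0.03 to 0.05 in a cohort with about 1.8% prevalence still reflect some signal.

```python
def average_precision(labels, scores):
    """AP = sum over thresholds of (recall_n - recall_{n-1}) * precision_n.

    Tied scores are treated as a single threshold.
    """
    n_pos = sum(labels)
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every prediction at this threshold before updating the curve.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```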

Appendix 1—table 2
Summary of incremental feature models.

Comparison of the average precision score (APS) and area under the receiver operating curve (auROC) for the Gradient Boosting Decision Trees (GBDT) models, where each model includes the preceding model’s features plus an additional feature domain. The largest increase in prediction accuracy resulted from adding the HbA1c% feature, which is also a diagnostic biomarker for type 2 diabetes (T2D). Adding the DNA sequencing data did not significantly improve the model's predictive power.

Model | APS | auROC
Age and sex | 0.03 (0.02–0.04) | 0.61 (0.56–0.67)
HbA1c% | 0.17 (0.12–0.23) | 0.84 (0.80–0.87)
Four blood tests | 0.20 (0.14–0.27) | 0.88 (0.85–0.90)
All blood tests | 0.28 (0.21–0.36) | 0.90 (0.88–0.93)
Adding anthropometrics | 0.23 (0.17–0.30) | 0.90 (0.87–0.92)
Adding physical health DT | 0.28 (0.21–0.36) | 0.91 (0.89–0.93)
Adding lifestyle DT | 0.24 (0.18–0.32) | 0.91 (0.88–0.93)
Adding blood pressure and heart rate | 0.25 (0.19–0.33) | 0.91 (0.88–0.93)
Adding non-T2D-related medical diagnosis | 0.24 (0.18–0.32) | 0.91 (0.88–0.93)
Adding mental health | 0.28 (0.20–0.36) | 0.91 (0.89–0.93)
Adding medication | 0.28 (0.20–0.35) | 0.91 (0.89–0.93)
Adding diet | 0.24 (0.18–0.31) | 0.91 (0.89–0.93)
Adding family-related information | 0.28 (0.21–0.35) | 0.91 (0.89–0.94)
Adding early-life factors | 0.24 (0.17–0.31) | 0.91 (0.89–0.93)
Adding socio-demographic | 0.27 (0.20–0.36) | 0.92 (0.89–0.94)
Adding genetics | 0.27 (0.20–0.34) | 0.91 (0.89–0.93)
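
The incremental design of Appendix 1—table 2, where each model adds one feature domain to the previous model's features, can be sketched as below. The domain contents are hypothetical placeholders, and `train_and_score(features)` stands in for whatever fits a GBDT model on those features and returns (APS, auROC).

```python
def incremental_feature_sets(domains):
    """Return cumulative feature sets: each entry adds one domain to the previous set."""
    cumulative = []
    steps = []
    for name, features in domains:
        cumulative = cumulative + [f for f in features if f not in cumulative]
        steps.append((name, list(cumulative)))
    return steps


def evaluate_incrementally(domains, train_and_score):
    """Call a user-supplied train_and_score(features) -> (APS, auROC) for each step."""
    return [(name, train_and_score(features)) for name, features in incremental_feature_sets(domains)]


# Hypothetical domain contents, for illustration only.
domains = [
    ("Age and sex", ["age", "sex"]),
    ("HbA1c%", ["hba1c"]),
    ("Four blood tests", ["hba1c", "blood_test_2", "blood_test_3", "blood_test_4"]),
]
for name, features in incremental_feature_sets(domains):
    print(name, features)
```

Because each step only appends features, a drop in APS between consecutive rows (as seen in the table) reflects model variance or overfitting rather than lost information.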
Author response table 1
Comparing models' main results.
Label | Model type | APS | auROC | Deciles prevalence odds ratio
GDRS SA | Scoreboard | 0.04 (0.03–0.06) | 0.66 (0.62–0.70) | 11 (3.8–38)
FINDRISC LR | Scoreboard | 0.04 (0.03–0.06) | 0.73 (0.69–0.76) | 33 (9.6–67)
Anthropometry | Logistic regression | 0.09 (0.06–0.13) | 0.82 (0.78–0.84) | 54 (18–80)
Anthropometry | Cox regression | 0.10 (0.07–0.13) | 0.82 (0.79–0.85) | 69 (27–89)
Four blood tests | Scoreboard | 0.13 (0.10–0.17) | 0.87 (0.85–0.90) | 96 (79–115)
Four blood tests | Cox regression | 0.25 (0.18–0.32) | 0.88 (0.85–0.90) | 101 (84–121)
Four blood tests | Logistic regression | 0.24 (0.17–0.31) | 0.88 (0.85–0.91) | 104 (84–125)
Blood tests | Logistic regression | 0.26 (0.19–0.33) | 0.91 (0.89–0.93) | 116 (95–138)
All features DT | Boosting decision trees | 0.27 (0.20–0.34) | 0.91 (0.89–0.93) | 117 (98–139)
Author response table 2
Label | Model type | APS | auROC | Decile’s prevalence odds ratio
Anthropometry | SA Scoreboard 5yrs | 0.04 (0.03–0.07) | 0.79 (0.75–0.83) | 8.8 (3.6–36)
Anthropometry | SA Scoreboard 10yrs | 0.06 (0.04–0.09) | 0.79 (0.76–0.82) | 10 (4.6–32.9)
Anthropometry | Scoreboard | 0.07 (0.05–0.10) | 0.81 (0.77–0.84) | 17.2 (5–66)
Anthropometry | Logistic regression | 0.09 (0.06–0.13) | 0.81 (0.78–0.84) | 16.9 (4.8–66)
Anthropometry | Cox regression | 0.10 (0.07–0.13) | 0.82 (0.79–0.85) | 10.7 (5–24)
Four blood tests | SA Scoreboard 10yrs | 0.13 (0.10–0.16) | 0.87 (0.85–0.90) | 22.4 (9.8–54)
Four blood tests | Scoreboard | 0.13 (0.10–0.17) | 0.87 (0.85–0.90) | 48 (11.9–109)
Four blood tests | SA Scoreboard 5yrs | 0.09 (0.06–0.12) | 0.89 (0.86–0.92) | 53.2 (18.9–84.2)
Four blood tests | Logistic regression | 0.24 (0.17–0.31) | 0.88 (0.85–0.91) | 32.5 (10.89–110)
Four blood tests | Cox regression | 0.25 (0.18–0.32) | 0.88 (0.85–0.90) | 43 (13.6–109)


Yochai Edlitz, Eran Segal. Prediction of type 2 diabetes mellitus onset using logistic regression-based scorecards. eLife 11:e71862.