A modular approach to integrating multiple data sources into real-time clinical prediction for pediatric diarrhea

  1. Ben J Brintz (corresponding author)
  2. Benjamin Haaland
  3. Joel Howard
  4. Dennis L Chao
  5. Joshua L Proctor
  6. Ashraful I Khan
  7. Sharia M Ahmed
  8. Lindsay T Keegan
  9. Tom Greene
  10. Adama Mamby Keita
  11. Karen L Kotloff
  12. James A Platts-Mills
  13. Eric J Nelson
  14. Adam C Levine
  15. Andrew T Pavia
  16. Daniel T Leung (corresponding author)
  1. Division of Epidemiology, Department of Internal Medicine, University of Utah, United States
  2. Division of Infectious Diseases, Department of Internal Medicine, University of Utah, United States
  3. Population Health Sciences, University of Utah, United States
  4. Division of Pediatric Infectious Diseases, University of Utah, United States
  5. Institute of Disease Modeling, Bill and Melinda Gates Foundation, United States
  6. International Centre for Diarrhoeal Disease Research, Bangladesh, Bangladesh
  7. Centre Pour le Développement des Vaccins-Mali, Mali
  8. Division of Infectious Disease and Tropical Pediatrics, University of Maryland, United States
  9. Division of Infectious Diseases and International Health, University of Virginia, United States
  10. Departments of Pediatrics, University of Florida, United States
  11. Departments of Environmental and Global Health, University of Florida, United States
  12. Department of Emergency Medicine, Brown University, United States
  13. Division of Microbiology and Immunology, Department of Internal Medicine, University of Utah, United States
5 figures, 4 tables and 1 additional file

Figures

Figure 1 with 2 supplements
Temperature in The Gambia over study period with (blue) trend line from LOESS (locally estimated scatterplot smoothing).
Figure 1—figure supplement 1
The black line represents a 2-week rolling average of daily viral etiology rates over time.

The purple and green lines represent the prior 2-week average of daily rain and temperature averages.

Figure 1—figure supplement 2
The black line represents a 2-week rolling average of daily viral etiology rates over time.

The purple and green lines represent the prior 2-week average of daily rain and temperature averages.
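
One reading of the "prior 2-week average" in these captions is a trailing mean over the 14 days before each date. The Python sketch below assumes that reading (current day excluded, fixed 14-day window). The function name `prior_window_mean` and the sample values are illustrative and are not taken from the study code.

```python
def prior_window_mean(daily_values, window=14):
    """Mean of the `window` days preceding each day (the day itself is excluded).

    Days with fewer than `window` earlier observations get None.
    Assumption: 'prior 2-week average' means the previous 14 daily values.
    """
    means = []
    for day in range(len(daily_values)):
        if day < window:
            means.append(None)
        else:
            prior = daily_values[day - window:day]
            means.append(sum(prior) / window)
    return means


# Made-up daily temperatures, one value per day.
temperatures = [float(t) for t in range(20)]
smoothed = prior_window_mean(temperatures)
```

Excluding the current day keeps the covariate usable at the time a patient presents, since only earlier weather observations are needed.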

Figure 2 with 1 supplement
Histograms with overlaid estimated kernel densities (dashed lines) of predicted values obtained from logistic regression on patient training data.

The left graph represents other known etiologies and the right graph represents viral etiologies. The dashed lines are not standardized density heights, so the heights for V = 0 and V = 1 should not be compared in this graph.
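
Kernel densities like those in this figure can be turned into a likelihood ratio by evaluating the V = 1 and V = 0 density estimates at a new patient's predicted value. The sketch below is a minimal illustration, assuming a Gaussian kernel with a fixed bandwidth (the caption does not state the kernel or bandwidth). The predicted values are made up.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` with Gaussian kernels."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def likelihood_ratio(pred, viral_preds, other_preds, bandwidth=0.1):
    """Ratio of the V = 1 density to the V = 0 density at predicted value `pred`.

    Assumption: the fixed bandwidth of 0.1 is a placeholder, not the study's choice.
    """
    f_viral = gaussian_kde(viral_preds, bandwidth)
    f_other = gaussian_kde(other_preds, bandwidth)
    return f_viral(pred) / f_other(pred)


# Hypothetical training-set predictions from a logistic regression.
viral_preds = [0.6, 0.7, 0.8, 0.9]   # patients with viral etiology (V = 1)
other_preds = [0.1, 0.2, 0.3, 0.4]   # patients with other known etiologies (V = 0)

lr_high = likelihood_ratio(0.8, viral_preds, other_preds)
lr_low = likelihood_ratio(0.2, viral_preds, other_preds)
```

A ratio above 1 shifts the odds toward a viral etiology and a ratio below 1 shifts them away. Because each density integrates to 1, the ratio is well defined even though the histogram heights for the two groups are not comparable.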

Figure 2—figure supplement 1
Contour plots of two-dimensional kernel densities of predicted values obtained from logistic regression on GEMS climate and seasonality data in Mali.

The right graph represents viral etiologies and the left graph represents other known etiologies.

Figure 3
The steps for fitting prediction models and calculating the post-test odds for within rolling-origin-recalibration evaluation.
Figure 4 with 2 supplements
AUCs and confidence intervals for post-test odds used in the 80% training and 20% testing iteration.

'PresPtnt' refers to the predictive model using the presenting patient’s information. 'Pre-test' refers to the use of pre-test odds based on prior patients’ predictive models. 'Climate' refers to the predictive model using aggregate local weather data. 'Seasonal' refers to the predictive model based on seasonal sine and cosine curves. 'Joint' refers to the two-dimensional kernel density estimate from the Seasonal and Climate predictive models.
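
The product-style formulas used for these models (for example Pre-test * PresPtnt) can be read as a multiplication on the odds scale: pre-test odds times one likelihood ratio per component. The sketch below shows that arithmetic, assuming the components are conditionally independent given the etiology. The function names and numbers are illustrative only.

```python
def probability_to_odds(probability):
    """Convert a probability in (0, 1) to odds."""
    return probability / (1.0 - probability)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_odds(pre_test_odds, likelihood_ratios):
    """Multiply pre-test odds by each component's likelihood ratio.

    Assumption: components are conditionally independent given the etiology,
    so their likelihood ratios can simply be multiplied.
    """
    odds = pre_test_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds


# Hypothetical inputs: a 40% pre-test probability of viral etiology,
# a presenting-patient likelihood ratio of 3.0 and a seasonal one of 1.5.
pre_odds = probability_to_odds(0.40)
post_odds = post_test_odds(pre_odds, [3.0, 1.5])
post_probability = odds_to_probability(post_odds)  # approximately 0.75
```

Each component can be added or dropped by including or omitting its likelihood ratio, which matches the modular framing in the title.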

Figure 4—source data 1

AUCs and confidence intervals for post-test odds used in the 80% training and 20% testing iteration.

https://cdn.elifesciences.org/articles/63009/elife-63009-fig4-data1-v1.csv
Figure 4—figure supplement 1
AUCs and confidence intervals for tests used in within rolling-origin-recalibration evaluation.

Individual plot titles show the proportion of data used in training.

Figure 4—figure supplement 2
AUCs and confidence intervals for tests used in the leave-one-site-out evaluation.

Pre-test refers to the use of prior patient predictions. Individual plot titles show the site left out of training.

Figure 5
ROC curves from validation from 80% training set.

Curves shown for three models with additional diagnostics.
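
For readers less familiar with ROC analysis, the sketch below shows one standard way to obtain ROC points and the AUC from predicted scores and true labels. It uses the pairwise (Mann-Whitney) definition of the AUC and does not reproduce the confidence-interval method behind the reported intervals, which is not stated here. The scores and labels are made up.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical post-test scores; label 1 = viral etiology, 0 = other etiology.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(scores, labels)
```

In this toy example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.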

Tables

Table 1
Rank of variable importance by average reduction in the mean squared prediction error of the response using Random Forest regression.

Greyed rows are variables that would be accessible for providers in LMICs at the time of presentation. Table 1 is reproduced from Brintz et al., 2020, PLoS Negl Trop Dis., published under the Creative Commons Attribution 4.0 International Public License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/).

Viral etiology

| Variable name | Variance reduction |
| --- | --- |
| Age | 51.6 |
| Season | 29.0 |
| Blood in stool | 26.1 |
| Height-for-age Z-score | 24.7 |
| Vomiting | 23.0 |
| Breastfeeding | 22.0 |
| Mid-upper arm circumference | 20.9 |
| Respiratory rate | 18.5 |
| Wealth index | 18.3 |
| Body temperature | 16.7 |

Table 2
AUC results by site using 80% of data for training and 20% of data for testing of the top two models.

PresPtnt refers to the model fit using presenting patient information.

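| Country | Test set size | Formula | AUC (95% CI) |
| --- | --- | --- | --- |
| Kenya | 79 | Pre-test * PresPtnt | 0.65 (0.53–0.77) |
| Kenya | 79 | PresPtnt * Seasonal | 0.66 (0.54–0.78) |
| Kenya | 79 | PresPtnt | 0.63 (0.51–0.75) |
| Mali | 88 | Pre-test * PresPtnt | 0.74 (0.61–0.86) |
| Mali | 88 | PresPtnt * Seasonal | 0.78 (0.66–0.89) |
| Mali | 88 | PresPtnt | 0.75 (0.62–0.87) |
| Pakistan | 108 | Pre-test * PresPtnt | 0.81 (0.72–0.89) |
| Pakistan | 108 | PresPtnt * Seasonal | 0.8 (0.72–0.88) |
| Pakistan | 108 | PresPtnt | 0.81 (0.73–0.89) |
| India | 119 | Pre-test * PresPtnt | 0.84 (0.76–0.91) |
| India | 119 | PresPtnt * Seasonal | 0.85 (0.78–0.92) |
| India | 119 | PresPtnt | 0.81 (0.74–0.89) |
| The Gambia | 80 | Pre-test * PresPtnt | 0.89 (0.82–0.96) |
| The Gambia | 80 | PresPtnt * Seasonal | 0.87 (0.79–0.94) |
| The Gambia | 80 | PresPtnt | 0.78 (0.67–0.88) |
| Mozambique | 66 | Pre-test * PresPtnt | 0.88 (0.79–0.97) |
| Mozambique | 66 | PresPtnt * Seasonal | 0.9 (0.82–0.98) |
| Mozambique | 66 | PresPtnt | 0.77 (0.66–0.89) |
| Bangladesh | 141 | Pre-test * PresPtnt | 0.91 (0.82–1) |
| Bangladesh | 141 | PresPtnt * Seasonal | 0.93 (0.88–0.99) |
| Bangladesh | 141 | PresPtnt | 0.95 (0.92–0.99) |
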
Table 3
AUC and 95% confidence intervals from 80% training set after adding an additional point-of-care diagnostic test with specified sensitivities (Se.) and specificities (Sp.) to the current patient test and pre-test odds.

Additionally, + and - refer to our model indicating a true positive or false positive, respectively, based on the threshold for each model which achieves a 0.90 or 0.95 specificity. Only patients who were prescribed/given antibiotics are included in the count.

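| Model | Addl. diag. (Se., Sp.) | AUC (95% CI) | True + (Specificity = 0.90) | False + (Specificity = 0.90) | True + (Specificity = 0.95) | False + (Specificity = 0.95) |
| --- | --- | --- | --- | --- | --- | --- |
| Pre-test * PresPtnt | None | 0.839 (0.809–0.869) | 88 | 29 | 60 | 16 |
| Pre-test * PresPtnt | (0.7, 0.7) | 0.876 (0.849–0.902) | 102 | 31 | 78 | 16 |
| Pre-test * PresPtnt | (0.7, 0.95) | 0.933 (0.914–0.952) | 132 | 31 | 123 | 16 |
| Pre-test * PresPtnt | (0.9, 0.95) | 0.972 (0.960–0.984) | 154 | 34 | 147 | 18 |
| PresPtnt * Seasonal | None | 0.830 (0.798–0.861) | 70 | 25 | 54 | 11 |
| PresPtnt * Seasonal | (0.7, 0.7) | 0.870 (0.842–0.897) | 101 | 27 | 68 | 14 |
| PresPtnt * Seasonal | (0.7, 0.95) | 0.931 (0.912–0.951) | 130 | 27 | 121 | 16 |
| PresPtnt * Seasonal | (0.9, 0.95) | 0.971 (0.959–0.984) | 154 | 30 | 149 | 18 |
| PresPtnt | None | 0.809 (0.776–0.842) | 66 | 31 | 41 | 15 |
| PresPtnt | (0.7, 0.7) | 0.857 (0.827–0.886) | 98 | 33 | 68 | 16 |
| PresPtnt | (0.7, 0.95) | 0.925 (0.904–0.946) | 129 | 33 | 117 | 18 |
| PresPtnt | (0.9, 0.95) | 0.968 (0.955–0.981) | 153 | 34 | 149 | 18 |
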
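
A diagnostic test with known sensitivity and specificity enters an odds-based model through its positive and negative likelihood ratios. The sketch below shows the standard formulas, LR+ = Se / (1 - Sp) and LR- = (1 - Se) / Sp, applied to existing odds. It is an illustration of the general technique and is not claimed to be the exact procedure behind this table. The function names and the starting odds are hypothetical.

```python
def diagnostic_likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def update_odds(odds, test_positive, sensitivity, specificity):
    """Multiply existing odds by the likelihood ratio matching the test result.

    Assumption: the test result is conditionally independent of the other
    model components given the true etiology.
    """
    lr_positive, lr_negative = diagnostic_likelihood_ratios(sensitivity, specificity)
    return odds * (lr_positive if test_positive else lr_negative)


# The three (Se., Sp.) settings listed in the table.
settings = [(0.7, 0.7), (0.7, 0.95), (0.9, 0.95)]
ratios = {s: diagnostic_likelihood_ratios(*s) for s in settings}

# Hypothetical patient with even odds before the extra test.
odds_after_positive = update_odds(1.0, True, 0.9, 0.95)
odds_after_negative = update_odds(1.0, False, 0.9, 0.95)
```

Raising the specificity from 0.7 to 0.95 at a fixed sensitivity of 0.7 increases LR+ from roughly 2.3 to roughly 14, so a positive result from the more specific test moves the odds much further.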
Table 3—source data 1

Frequency table of pathogens in which the post-test odds formulation with varying specificity (Sp.) chosen have false positives.

https://cdn.elifesciences.org/articles/63009/elife-63009-table3-data1-v1.docx
Table 4
Average AUCs from one-dimensional and two-dimensional kernel density estimates (KDE) when the post-test odds conditional independence assumption is broken.

The table shows the factor (γ) used to simulate induced conditional dependence between two covariates and their average conditional correlation. Additionally, it shows the average AUC resulting from a post-test odds model where a one-dimensional kernel density estimate (conditional independence assumed) is generated for each covariate, and a post-test odds model where a two-dimensional joint kernel density estimate is derived for the two covariates.

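| γ | cor(X,Y∣Z) | AUC, 1D-KDE | AUC, 2D-KDE |
| --- | --- | --- | --- |
| −2.000 | −0.894 | 0.725 | 0.830 |
| −1.000 | −0.709 | 0.758 | 0.828 |
| −0.500 | −0.446 | 0.824 | 0.838 |
| 0.000 | 0.002 | 0.838 | 0.836 |
| 0.500 | 0.448 | 0.836 | 0.836 |
| 1.000 | 0.708 | 0.831 | 0.840 |
| 2.000 | 0.894 | 0.810 | 0.836 |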
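
To see why a product of one-dimensional densities can diverge from a joint density when two covariates are conditionally correlated, the sketch below compares the two likelihood ratios in closed form. It swaps in parametric Gaussian densities with unit variances in place of kernel density estimates, and it does not reproduce the γ-based simulation summarized in the table. All means, correlations, and function names are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean):
    """Density of a normal distribution with the given mean and unit variance."""
    z = x - mean
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bivariate_normal_pdf(x, y, mean_x, mean_y, rho):
    """Density of a bivariate normal with unit variances and correlation rho."""
    dx, dy = x - mean_x, y - mean_y
    one_minus_r2 = 1.0 - rho * rho
    quad = (dx * dx - 2.0 * rho * dx * dy + dy * dy) / one_minus_r2
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(one_minus_r2))


def lr_independent(x, y, mean1, mean0):
    """Likelihood ratio from a product of marginals (conditional independence assumed)."""
    numerator = normal_pdf(x, mean1) * normal_pdf(y, mean1)
    denominator = normal_pdf(x, mean0) * normal_pdf(y, mean0)
    return numerator / denominator


def lr_joint(x, y, mean1, mean0, rho):
    """Likelihood ratio from the joint density, which accounts for correlation rho."""
    numerator = bivariate_normal_pdf(x, y, mean1, mean1, rho)
    denominator = bivariate_normal_pdf(x, y, mean0, mean0, rho)
    return numerator / denominator


# Hypothetical setting: both covariates have mean 1 when Z = 1 and mean 0 when Z = 0.
independent = lr_independent(1.0, 1.0, 1.0, 0.0)
joint_uncorrelated = lr_joint(1.0, 1.0, 1.0, 0.0, 0.0)
joint_correlated = lr_joint(1.0, 1.0, 1.0, 0.0, 0.9)
```

With zero correlation the two ratios agree. With a conditional correlation of 0.9 the product of marginals treats the two covariates as separate pieces of evidence and returns a larger ratio than the joint density does at this point.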

Additional files

Cite this article

  1. Ben J Brintz
  2. Benjamin Haaland
  3. Joel Howard
  4. Dennis L Chao
  5. Joshua L Proctor
  6. Ashraful I Khan
  7. Sharia M Ahmed
  8. Lindsay T Keegan
  9. Tom Greene
  10. Adama Mamby Keita
  11. Karen L Kotloff
  12. James A Platts-Mills
  13. Eric J Nelson
  14. Adam C Levine
  15. Andrew T Pavia
  16. Daniel T Leung
(2021)
A modular approach to integrating multiple data sources into real-time clinical prediction for pediatric diarrhea
eLife 10:e63009.
https://doi.org/10.7554/eLife.63009