Development, validation and application of a machine learning model to estimate salt consumption in 54 countries

Abstract

Global targets to reduce salt intake have been proposed, but their monitoring is challenged by the lack of population-based data on salt consumption. We developed a machine learning (ML) model to predict salt consumption at the population level based on simple predictors and applied this model to national surveys in 54 countries. We used 21 surveys with spot urine samples for the ML model derivation and validation; we developed a supervised ML regression model based on sex, age, weight, height, and systolic and diastolic blood pressure. We applied the ML model to 54 new surveys to quantify the mean salt consumption in the population. The pooled dataset in which we developed the ML model included 49,776 people. Overall, there were no substantial differences between the observed and ML-predicted mean salt intake (p<0.001). The pooled dataset to which we applied the ML model included 166,677 people; the predicted mean salt consumption ranged from 6.8 g/day (95% CI: 6.8-6.8 g/day) in Eritrea to 10.0 g/day (95% CI: 9.9-10.0 g/day) in American Samoa. The countries with the highest predicted mean salt intake were in the Western Pacific. The lowest predicted intake was found in Africa. The country-specific predicted mean salt intake was within a reasonable difference from the best available evidence. An ML model based on readily available predictors estimated daily salt consumption with good accuracy. This model could be used to predict mean salt consumption in the general population where urine samples are not available.
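
To make the workflow summarised above concrete, the following minimal sketch fits a regression on the six predictors in a derivation sample and then applies it to a new survey to obtain a predicted mean with a 95% CI. It is an illustration under stated assumptions, not the authors' analysis code: the data are simulated, the coefficients used to simulate salt intake are invented, ordinary least squares stands in for whichever ML regression algorithm was selected (the abstract does not name it), and all function and variable names are hypothetical.

```python
import math
import random
import statistics

# Predictor order used throughout this sketch (names are illustrative).
PREDICTORS = ["sex", "age", "weight_kg", "height_cm", "sbp", "dbp"]


def simulate_survey(n, rng):
    """Simulate n participants; the salt-intake formula is invented for the demo."""
    rows, salt = [], []
    for _ in range(n):
        sex = rng.randint(0, 1)
        age = rng.uniform(18, 69)
        weight = rng.gauss(70, 12)
        height = rng.gauss(165, 9)
        sbp = rng.gauss(125, 15)
        dbp = rng.gauss(80, 10)
        rows.append([sex, age, weight, height, sbp, dbp])
        salt.append(4.0 + 0.6 * sex - 0.01 * age + 0.04 * weight
                    + 0.005 * height + 0.005 * sbp + 0.002 * dbp
                    + rng.gauss(0, 1.0))
    return rows, salt


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, rows):
    """Predict salt intake (g/day) for each participant."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def mean_ci(values, z=1.96):
    """Mean with a normal-approximation 95% CI based on the standard error."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se


rng = random.Random(0)

# Derivation: surveys where salt intake was measured (simulated here).
train_rows, train_y = simulate_survey(2000, rng)
coef = fit_ols(train_rows, train_y)
train_pred = predict(coef, train_rows)

# Application: a new survey with predictors only; the outcome is discarded.
new_rows, _ = simulate_survey(500, rng)
new_mean, new_lo, new_hi = mean_ci(predict(coef, new_rows))
print(f"Predicted mean salt intake: {new_mean:.1f} g/day "
      f"(95% CI: {new_lo:.1f}-{new_hi:.1f})")
```

In this sketch the interval reflects only the spread of the individual predictions in the new survey; it does not account for uncertainty in the fitted model or for any survey sampling weights.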

Data availability

This study used nationally representative survey data that are in the public domain and were requested through the online repository (https://extranet.who.int/ncdsmicrodata/index.php/home). We provide the code for data preparation and data analysis as supplementary material to this paper (Source Code File - "Analysis Code | Python and R").

The following previously published data sets were used:

1. World Health Organization (2004) STEPwise Approach to NCD Risk Factor Surveillance (STEPS). NCD Microdata Repository. https://extranet.who.int/ncdsmicrodata/index.php/catalog

Article and author information

Author details

Wilmer Cristobal Guzman-Vilca

Universidad Peruana Cayetano Heredia, Lima, Peru

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-2194-8496
Manuel Castillo-Cara

Universidad de Lima, Lima, Peru

Competing interests
The authors declare that no competing interests exist.
Rodrigo M Carrillo-Larco

Imperial College London, London, United Kingdom

For correspondence
rcarrill@ic.ac.uk

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-2090-1856

Funding

Wellcome Trust (214185/Z/18/Z)

Rodrigo M Carrillo-Larco

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.