Development, validation, and application of a machine learning model to estimate salt consumption in 54 countries

  1. Wilmer Cristobal Guzman-Vilca
  2. Manuel Castillo-Cara
  3. Rodrigo M Carrillo-Larco (corresponding author)
  1. School of Medicine Alberto Hurtado, Universidad Peruana Cayetano Heredia, Peru
  2. CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Peru
  3. Sociedad Científica de Estudiantes de Medicina Cayetano Heredia (SOCEMCH), Universidad Peruana Cayetano Heredia, Peru
  4. Universidad de Lima, Peru
  5. Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, United Kingdom

Abstract

Global targets to reduce salt intake have been proposed, but their monitoring is challenged by the lack of population-based data on salt consumption. We developed a machine learning (ML) model to predict salt consumption at the population level based on simple predictors and applied this model to national surveys in 54 countries. We used 21 surveys with spot urine samples for the ML model derivation and validation; we developed a supervised ML regression model based on sex, age, weight, height, and systolic and diastolic blood pressure. We applied the ML model to 54 new surveys to quantify the mean salt consumption in the population. The pooled dataset in which we developed the ML model included 49,776 people. Overall, there were no substantial differences between the observed and ML-predicted mean salt intake (p<0.001). The pooled dataset where we applied the ML model included 166,677 people; the predicted mean salt consumption ranged from 6.8 g/day (95% CI: 6.8–6.8 g/day) in Eritrea to 10.0 g/day (95% CI: 9.9–10.0 g/day) in American Samoa. The countries with the highest predicted mean salt intake were in the Western Pacific. The lowest predicted intake was found in Africa. The country-specific predicted mean salt intake was within reasonable difference from the best available evidence. An ML model based on readily available predictors estimated daily salt consumption with good accuracy. This model could be used to predict mean salt consumption in the general population where urine samples are not available.

Editor's evaluation

Salt intake is a major determinant of volume status, blood pressure values, and congestion, but its estimation is challenging because it requires measuring 24-h urinary sodium excretion over a number of days, which is unfeasible in most countries. Demonstrating that salt intake can be estimated accurately at the population level using artificial intelligence and simple, widely available variables is therefore important for epidemiological and intervention studies in which salt intake is a major player, particularly, but not only, in countries experiencing economic hardship.

https://doi.org/10.7554/eLife.72930.sa0

Introduction

The association between high sodium/salt intake and high blood pressure, a major risk factor for cardiovascular diseases (CVDs), is well-established (He et al., 2013; World Health Organization, 2021a; Poggio et al., 2015). More than 1.7 million CVD deaths were attributed to a diet high in sodium in 2019, with ~90% of these deaths occurring in low- and middle-income countries (LMICs) (GBD 2019 Risk Factors Collaborators, 2020; GBD Results Tool, 2021). Consequently, salt reduction has been included in international goals: the World Health Organization (WHO) recommendation of limiting salt consumption to <5 g/day (World Health Organization, 2021a), and the agreement by the WHO member states of a 30% relative reduction in mean population salt intake by 2025 (WHO. World Health Organization, 2021). Because available evidence suggests that sodium/salt consumption is higher than the global targets (Powles et al., 2013; Carrillo-Larco and Bernabe-Ortiz, 2020; Oyebode et al., 2016), we need timely and consistent data on sodium/salt consumption in the general population to track progress toward salt reduction targets.

Global efforts have been made to produce comparable estimates of sodium/salt intake for all countries (Powles et al., 2013). Similarly, researchers have summarized all the available evidence in specific world regions (Carrillo-Larco and Bernabe-Ortiz, 2020; Oyebode et al., 2016). Although the global endeavor was based on the gold standard method to assess sodium/salt intake (i.e., 24 hr urine sample), its estimates only extend to 2010 (Powles et al., 2013). Therefore, robust and comparable sodium/salt intake estimates for all countries are lacking for the last 10 years. The regional endeavors summarized population-based evidence, yet they conducted study-level meta-analyses in which the original studies could have followed different laboratory methods, and they did not study all countries in the region. Therefore, comparability across studies could be limited and evidence is lacking for many countries. A method to estimate sodium/salt consumption in national samples by leveraging available data is needed to update and complement the existing evidence (Powles et al., 2013; Carrillo-Larco and Bernabe-Ortiz, 2020; Oyebode et al., 2016; Thout et al., 2019). Quantifying sodium/salt intake based on 24 hr urine samples is costly and burdensome, limiting its use in population-based studies or national health surveys. As an alternative, equations have been developed to estimate sodium/salt intake based on spot urine (SU) samples (Brown et al., 2013; Kawasaki et al., 1993; Toft et al., 2014; Tanaka et al., 2002). Although these equations may not deliver identical results to those based on 24 hr urine samples at the individual level, at the population level the difference between SU samples and 24 hr samples appears to be small (Huang et al., 2016; Santos et al., 2020). However, these equations have been used in few WHO STEPS and other national health surveys (World Health Organization, 2021b), leaving several countries without data to quantify the local sodium/salt consumption because they do not have access to SU samples (World Health Organization, 2021c).

If we could (accurately) estimate sodium/salt intake at the population level based on variables that are routinely available in national health surveys (e.g., weight or blood pressure), mean sodium/salt intake at the population level in countries that currently lack urine data (i.e., 24 hr or spot) could be computed using these available predictors. Advanced analytic techniques like machine learning (ML) could make accurate predictions and inform about the mean sodium/salt intake at the population level. We developed an ML predictive model to estimate mean salt intake at the population level (not at the individual level) using routinely available variables in national health surveys. We applied this ML model to other national health surveys without urine data to compute the mean salt intake in the general population.

Results

Study population for model derivation and validation

The pooled dataset included 49,776 people from 21 surveys in 19 countries (i.e., two countries, Bhutan and Mongolia, had two surveys) conducted between 2013 and 2019 (Appendix 1—table 1). Overall, the mean age ranged from 33 (95% confidence interval [95% CI]: 33–34) years in Zambia to 43 (95% CI: 42–44) years in Belarus. The proportion of men ranged from 35.7% in Tonga to 61.4% in Solomon Islands. The mean SBP was lowest in Jordan (117.7 mmHg [95% CI: 115.7–119.8 mmHg]) and highest in Belarus (134.6 mmHg [95% CI: 133.6–135.5 mmHg]). The mean DBP was lowest in Chile (73.6 mmHg [95% CI: 72.5–74.6 mmHg]) and highest in Belarus (84.9 mmHg [95% CI: 84.4–85.5 mmHg]). The mean weight ranged from 54.6 kg (95% CI: 53.8–55.5 kg) in Nepal to 98.6 kg (95% CI: 97.7–99.5 kg) in Tonga. The mean height ranged from 1.55 m (95% CI: 1.55–1.56 m) in Nepal to 1.71 m (95% CI: 1.70–1.71 m) in Tokelau.

Observed and predicted mean salt intake during the ML model derivation and validation

In the test dataset including 20 WHO STEPS surveys and one national health survey (Chile) (i.e., 21 surveys in total), the observed mean salt intake computed as per the INTERSALT equation was higher in men than in women in all countries; it ranged from 8.5 g/day (95% CI: 8.2–8.8 g/day; Zambia) to 10.4 g/day (95% CI: 10.1–10.7 g/day; Azerbaijan) in men and from 6.8 g/day (95% CI: 6.7–6.8 g/day; Turkmenistan) to 8.3 g/day (95% CI: 8.0–8.6 g/day; Malawi) in women. Across countries, the predicted mean salt intake was also higher in men than in women. Results for each survey are presented in Figure 1 and Appendix 1—table 2.

Figure 1. Observed and predicted mean salt intake (g/day) by sex in each survey included in the machine learning (ML) model development.

Exact estimates (along with their 95% CI) are presented in Appendix 1—table 2. These results were computed with the test dataset only. Results are for the HuR algorithm, which was the model with the best performance.

The mean observed salt intake was higher in people aged ≥30 years than in younger people (8.4 g/day vs. 7.9 g/day, p<0.05 for independent t-test), and higher in people with raised blood pressure (≥140/90 mmHg) than in those without (8.7 g/day vs. 8.2 g/day, p<0.05). The mean salt consumption also differed across body mass index (BMI) categories (p<0.05 for ANOVA test). The same profile was found for predicted mean salt intake (Appendix 1—table 3).

In men across all countries in the test dataset including 20 WHO STEPS surveys (representing 18 countries) and 1 national health survey (Chile), the mean difference between observed and predicted mean salt intake was –0.02 g/day (p<0.001 for paired t-test). Across all surveys, the positive mean difference farthest from zero was 0.54 g/day (Nepal, p<0.001 for paired t-test), and the negative mean difference farthest from zero was –1.31 g/day (Tonga, p<0.001 for paired t-test). The mean difference closest to zero was –0.03 g/day (Morocco, p=0.308 for paired t-test) (Appendix 1—table 4).

In women across all countries in the test dataset including 20 WHO STEPS surveys (representing 18 countries) and 1 national health survey (Chile), the mean difference between the observed and predicted mean salt intake was 0.01 g/day (p<0.001 for paired t-test). The positive mean difference farthest from zero was 1.23 g/day (Malawi, p<0.001 for paired t-test) and the negative mean difference farthest from zero was –1.22 g/day (Tonga, p<0.001 for paired t-test). The mean difference closest to zero was 0.01 g/day (Armenia, p=0.195 for paired t-test) (Appendix 1—table 4).

None of the countries herein analyzed, regardless of the method of sodium intake assessment (i.e., observed or predicted), showed a mean salt intake below the WHO recommended level of <5 g/day (Figure 1, Appendix 1—table 2). The same occurred for the mean salt intake estimates using the Kawasaki, Toft, and Tanaka formulas (Appendix 1—table 5).

Implementation of the developed ML model to predict salt consumption in 54 countries

The pooled dataset where we applied the ML model included 166,677 people from 54 countries in 54 WHO STEPS surveys conducted between 2004 and 2018 (Appendix 1—table 6). Overall, the mean age ranged from 31 (95% CI: 31–32) years in Ethiopia to 43 (95% CI: 40–47) years in Barbados. The proportion of men ranged from 17.2% in Eritrea to 63.8% in Timor-Leste. The mean SBP was lowest in Cambodia (116.2 mmHg [95% CI: 115.6–116.9 mmHg]) and highest in Mozambique (138.7 mmHg [95% CI: 136.3–141.0 mmHg]). The mean DBP was lowest in Cambodia (72.4 mmHg [95% CI: 71.8–73.0 mmHg]) and highest in Kyrgyzstan (86.8 mmHg [95% CI: 85.9–87.8 mmHg]). The mean weight ranged from 51.8 kg (95% CI: 51.2–52.4 kg) in Eritrea to 100.4 kg (95% CI: 100.1–100.8 kg) in American Samoa. The mean height ranged from 1.54 m (95% CI: 1.54–1.55 m) in Lao People’s Democratic Republic to 1.70 m (95% CI: 1.70–1.71 m) in British Virgin Islands.

Across the 54 countries, the overall predicted mean salt intake ranged from 6.8 g/day (95% CI: 6.8–6.8 g/day) in Eritrea to 10.0 g/day (95% CI: 9.9–10.0 g/day) in American Samoa. The mean was always higher in men than in women. None of the countries herein analyzed, regardless of sex, showed a predicted mean salt intake below the WHO recommended level of <5 g/day (Figure 2, Appendix 1—table 7).

Figure 2. Predicted mean salt intake (g/day) by sex in each of the 54 national surveys included in the application of the model herein developed.

Exact estimates (along with their 95% CI) are presented in Appendix 1—table 7. Countries are presented in ascending order based on their overall mean salt intake (i.e., countries with the highest mean salt intake are at the bottom).

In men, the countries with the highest predicted mean salt intakes were Nauru (11.0 g/day), American Samoa and Cook Islands (both with 10.9 g/day), and Niue and Tuvalu (both with 10.4 g/day); remarkably, all of these countries are in the Western Pacific. In contrast, the lowest predicted mean salt intake in men was in Eritrea (8.3 g/day), Ethiopia (8.5 g/day), and Niger (8.6 g/day); remarkably, all of these countries are in Africa.

In women, the countries with the highest predicted mean salt intake were American Samoa (9.0 g/day), Nauru (8.8 g/day), and Cook Islands and Tuvalu (both with 8.7 g/day); all of these countries are in the Western Pacific. Conversely, the lowest predicted mean salt intake in women was in Eritrea (6.5 g/day), Ethiopia (6.6 g/day), and Niger (6.7 g/day); all of these countries are in Africa.

Discussion

Main findings

This work leveraged 21 national health surveys and readily available predictors to develop an ML model to predict salt consumption; this model was then applied to national surveys in 54 countries. It should be noted that we analyzed SU samples, which are not the gold standard to assess salt consumption. Results should be interpreted in light of this limitation, considering that our model aimed to deliver estimates at the population level (not individual level) (Huang et al., 2016; Santos et al., 2020). The HuR ML algorithm yielded the predictions closest to the observed salt intake: the mean difference between predicted and observed salt consumption across surveys was –0.02 g/day in men and 0.01 g/day in women. We used this novel ML model to predict salt consumption in 54 countries, where the mean salt consumption ranged from 8.3 g/day (Eritrea) to 11.0 g/day (Nauru) in men and from 6.5 g/day (Eritrea) to 9.0 g/day (American Samoa) in women. This work aimed to elaborate on novel analytical tools to predict salt consumption where national surveys have not collected this information, which limits their ability to keep track of mean sodium consumption in the general population. Pending external independent validation, our model could be used in monitoring frameworks of salt consumption because most countries do not collect urine samples in their national health surveys. Our model could contribute to the global surveillance of salt consumption, a relevant cardiometabolic risk factor (He et al., 2013; World Health Organization, 2021a; Poggio et al., 2015).

Public health implications

ML models have been used extensively to predict relevant clinical outcomes (e.g., mortality) and epidemiological indicators (e.g., forecasting COVID-19 cases) (Wang et al., 2020; Wynants et al., 2020; Groot et al., 2021; Watson et al., 2021; Mohan et al., 2021). Furthermore, ML algorithms have proven to be useful for understanding complex outcomes (e.g., identifying clusters of people with diabetes) based on simple predictors (e.g., BMI) in nationally representative survey data (Oh et al., 2019; García de la Garza et al., 2021; Carrillo-Larco et al., 2021). Our work complements the current evidence on ML algorithms by demonstrating its use in a relevant field: population salt consumption. In so doing, we delivered a pragmatic tool that could be used to inform the surveillance of salt consumption in countries where national surveys do not objectively collect this information (e.g., SU samples). Moreover, this work provided preliminary evidence to update the global estimates of population-based sodium consumption (Powles et al., 2013) by informing about the mean sodium consumption in 54 countries. Our results suggest that mean salt consumption is above the WHO recommended level in all the 54 countries herein analyzed, and it was the highest among countries in the Western Pacific, and the lowest among countries in Africa. This finding, which is consistent with a global work (Powles et al., 2013), calls for urgent actions to reduce salt consumption in these 54 countries, especially those in the Western Pacific.

We do not believe that our – or any other – ML model should replace a comprehensive population-based nationally representative health survey with 24 hr or SU samples. However, until such surveys are available in many countries and periodically conducted, we suggest using an estimation approach to shed light on the mean salt consumption in the population. Our ML model seems to be a reasonably good alternative and could become a pragmatic tool for surveillance systems that keep track of sodium consumption in accordance with global goals (World Health Organization, 2021a; WHO. World Health Organization, 2021).

Research in context

A global effort provided mean sodium/salt consumption estimates for 187 countries in 1990 and 2010 (Powles et al., 2013); they used 24 hr urine samples and dietary reports from surveys conducted in 66 countries. Unfortunately, their results were until 2010. Our results advanced this evidence by providing more recent salt consumption estimates because most of the surveys in which we applied our ML model were conducted after 2010 (Appendix 1—table 6).

Compared to the global estimates for the same countries in 2010 (Powles et al., 2013), our mean salt consumption estimates were within reasonable difference. For example, our 2010 mean salt consumption estimates for Cambodia, Eritrea, and the Gambia were 7.8 g/day, 6.8 g/day, and 8.1 g/day, whereas the estimates by Powles et al., 2013 were 11.0 g/day, 5.9 g/day, and 7.7 g/day (Appendix 1—table 8; Powles et al., 2013). We further compared our estimates for surveys conducted between 2007 and 2013 (±3 years around 2010) with the 2010 estimates provided by Powles et al., 2013, and our results were also within reasonable difference. The largest differences were in Tajikistan (8.5 g/day by our ML model vs. 13.5 g/day by Powles et al., 2013), as well as in Kyrgyzstan (8.6 g/day vs. 13.4 g/day by Powles et al., 2013) and Samoa (9.5 g/day vs. 5.2 g/day by Powles et al., 2013). It appears that our predictions were higher than those provided by Powles et al., 2013 in countries with presumably low salt consumption (e.g., Samoa); conversely, in countries with presumably high salt consumption (e.g., Kyrgyzstan), our predictions were lower than those by Powles et al., 2013 (Appendix 1—table 8). These differences could be explained by the fact that our ML model was developed based on SU samples rather than the 24 hr urine samples used by Powles et al., 2013. Strong evidence indicates that estimates based on SU may overestimate salt intake at lower levels of consumption and underestimate salt intake at higher levels of consumption (Huang et al., 2016).

In addition to the global work by Powles et al., 2013, there are other reports from some specific countries. For example, a survey conducted between 2012 and 2016 with 24 hr urine samples in Fiji and Samoa showed that the mean salt consumption was 10.6 g/day and 7.1 g/day, respectively (Santos et al., 2019). The estimates from our ML model for Fiji (2011) and Samoa (2013) suggested that the mean salt consumption was 8.7 g/day and 9.5 g/day, respectively. A survey in Vanuatu in 2016 based on 24 hr urine samples reported that the mean salt intake was 5.9 g/day (Paterson et al., 2019); our estimate for the year 2011 was 8.4 g/day. In 2009 in Vietnam, a survey with SU samples revealed that the mean salt consumption was 9.9 g/day (Jensen et al., 2018); our prediction for the year 2015 was 7.9 g/day. These comparisons suggest that our ML-predicted estimates are plausible and close to the best available evidence.

Although these comparisons do not validate our predictions in the 54 national surveys, they suggest that our salt consumption estimates are within reasonable distance from the best available evidence. Until better data are available (e.g., national survey with spot or 24 hr urine sample), our model could provide preliminary evidence to inform the national mean salt consumption. Careful interpretation is warranted to understand the strengths and limitations of our ML-based predictions.

Strengths and limitations

We followed sound and transparent methods to develop an ML model to predict salt consumption at the individual level. We leveraged open-access national data collected following standard and consistent protocols (World Health Organization, 2021b; Departamento de Epidemiologia. Ministerio de Salud, 2021). Most of the surveys we analyzed were conducted after 2010, providing more recent evidence than the latest global effort to quantify salt consumption in all countries (Powles et al., 2013). Notwithstanding, we must acknowledge some limitations. First and foremost, urine data was based on a spot sample, which is not the gold standard (24 hr urine sample) to measure daily salt consumption. Future work should verify and advance our results using 24 hr urine samples available in nationally representative samples; in the meantime, our work has laid the foundations and hopefully sparked interest to use available data and novel analytical techniques to deliver estimates of salt consumption in the general population. While SU samples may not be the best approach to estimate salt consumption at the individual level, at the population level the means estimated based on SU samples and 24 hr urine samples are similar (Huang et al., 2016; Santos et al., 2020). Therefore, the limitation of using SU samples only may have had little impact on our mean estimates, which are at the country level, not at the individual level. While this reanalysis of SU samples rather than 24 hr urine samples is a limitation of our work, it is also an observation showing the lack of nationally representative surveys with 24 hr urine samples available for independent reanalyses. Second, even though we analyzed 21 national surveys (representing 19 countries) to develop our ML model, the sample size could still be limited for a data-driven ML algorithm (i.e., 24,889 observations were included in model development). A larger and global work in which all relevant data sources are pooled is needed; while this endeavor takes place, our work has provided recent estimates of salt consumption at the population level in 54 countries. Along this line, there are still countries that were not herein included. Researchers in these countries, along with local (e.g., ministries of health) and international health authorities (e.g., WHO), should conduct studies/surveys to collect data on salt consumption. This would inform not only global targets but also local needs and interventions.

An ML model based on readily available variables accurately predicted daily salt consumption. Applied to 54 national surveys with no urine samples, this ML model revealed high levels of salt intake, particularly in the Western Pacific region. Pending further validation, this ML model could be used to keep track of the overall sodium consumption where resources are not available to conduct national surveys with urine samples.

Methods

Study design

This is an individual-level data pooling ML analysis.

Data sources

We sought surveys that met these two criteria: (i) nationally representative health surveys (i.e., community or subnational surveys were not included); and (ii) surveys that were open access or that could be accessed without significant administrative burden (e.g., data sharing agreements that may involve institutional signatures).

First, we downloaded 20 WHO STEPS surveys and 1 national health survey with SU samples; these surveys were used for the training, validation, and testing of the ML model. These 21 surveys represented 19 countries; two countries contributed two surveys each: Bhutan (2014 and 2019) and Mongolia (2013 and 2019). Second, we downloaded 54 new WHO STEPS surveys that had the variables included in the ML prediction model (see ‘Variables’ section), but did not have SU samples. The ML model herein developed was applied to these 54 surveys to estimate the mean salt consumption in the population.

To identify additional data sources, we searched the original publications included in one global analysis (Powles et al., 2013) and three systematic reviews about sodium/salt consumption at the population level (Carrillo-Larco and Bernabe-Ortiz, 2020; Oyebode et al., 2016; Thout et al., 2019). This search led to the identification of the national health survey included in the model derivation. All other data sources included in those references (Powles et al., 2013; Carrillo-Larco and Bernabe-Ortiz, 2020; Oyebode et al., 2016; Thout et al., 2019) did not meet our selection criteria.

In summary, our ML model was developed based on 21 surveys (20 WHO STEPS and 1 national health survey). Then, our ML model was applied to 54 WHO STEPS surveys to compute the mean daily salt consumption at the population level.

According to the World Bank classification (Appendix 1—table 9), there were 9 high-income countries (2 in model derivation and 7 in model application), 16 low-income countries (1 in model derivation and 15 in model application), 26 lower-middle-income countries (9 in model derivation and 17 in model application), and 18 upper-middle-income countries (6 in model derivation and 12 in model application). There were four countries (one in model derivation and three in model application) without income classification (British Virgin Islands, Cook Islands, Niue, and Tokelau).

Rationale

We hypothesized that an ML model could accurately predict salt consumption at the individual level, to then inform the overall mean in the underlying population. In addition, we endeavored to develop an ML model with simple predictors; that is, variables that are routinely available in national health surveys, unlike urine samples, which are seldom collected in national health surveys. If the model were indeed accurate, then it could be applied to national surveys without urine samples but with the relevant predictors to inform about the mean salt consumption in the overall population. These model-driven estimates could be preliminary until a national health survey is conducted to study mean salt consumption with urine samples. Ideally, salt consumption should be informed by 24 hr urine samples, which are seldom available in large population-based and nationally representative health surveys. The fact that we analyzed SU samples is a limitation of our work, and the results should be interpreted accordingly. However, we aimed to develop an ML model that can be used to predict mean estimates at the population level, not at the individual level. In other words, our model should not be applied to a patient to estimate his/her salt consumption. We did not develop a diagnostic tool to replace SU or 24 hr urine samples. Our model should be applied to survey data to compute the mean sodium/salt consumption in the population (not in individuals). Empirical evidence suggests that, at the population level, mean estimates based on SU samples and on 24 hr urine samples are similar (Huang et al., 2016; Santos et al., 2020).

Variables

The predictors we used in the ML model were sex, age (years), weight (kg), height (m), systolic blood pressure (SBP, mmHg), and diastolic blood pressure (DBP, mmHg).

The analyzed surveys collect anthropometric and three blood pressure measurements. These are taken by trained fieldworkers following a standard protocol (World Health Organization, 2021b; Departamento de Epidemiologia. Ministerio de Salud, 2021). We used measured weight and height to compute the BMI (kg/m2). We used the mean SBP and mean DBP of the second and third blood pressure measurements (i.e., the first blood pressure measurement was discarded).
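The two derived quantities described above can be illustrated with a minimal sketch. The function names and the example readings are hypothetical and are not taken from the study code; the sketch only assumes that each participant has a weight, a height, and three blood pressure readings.

```python
from statistics import mean


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2 from measured weight and height."""
    return weight_kg / height_m ** 2


def mean_bp(readings: list) -> float:
    """Mean of the second and third readings; the first one is discarded."""
    if len(readings) != 3:
        raise ValueError("expected exactly three blood pressure readings")
    return mean(readings[1:])


# Hypothetical participant, for illustration only.
example_bmi = bmi(70.0, 1.75)
example_sbp = mean_bp([130.0, 124.0, 122.0])
```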

The outcome was salt intake as per the INTERSALT equation (Brown et al., 2013). We chose this equation because it has been used by WHO STEPS surveys. There is a specific INTERSALT equation for each sex, and they both include the following variables: age (years), BMI (kg/m2), SU sodium (mmol/L), and SU creatinine (mmol/L) (Brown et al., 2013). We used the following sex-specific formulas:

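Men: $23.51 + (0.45 \times \mathrm{Na}_{SU}) - (3.09 \times \mathrm{Cr}_{SU}) + (4.16 \times \mathrm{BMI}) + (0.22 \times \mathrm{age})$
Women: $3.74 + (0.33 \times \mathrm{Na}_{SU}) - (2.44 \times \mathrm{Cr}_{SU}) + (2.42 \times \mathrm{BMI}) + (2.34 \times \mathrm{age}) - (0.03 \times \mathrm{age}^{2})$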

where the subscript SU indicates spot urine, Na is sodium, Cr is creatinine, and BMI is body mass index. Because some STEPS surveys had SU creatinine in mg/dL, these values were multiplied by 0.00884 to obtain SU creatinine in mmol/L. No conversion was needed for sodium in SU samples because all surveys herein included already had urinary sodium in mmol/L. The INTERSALT equation computes 24 hr sodium intake, which is then divided by 17.1 to obtain the salt intake in grams per day (g/d) (Brown et al., 2013). For descriptive purposes, we also computed salt intake based on the Kawasaki et al., 1993, Toft et al., 2014, and Tanaka et al., 2002 equations. Of note, our outcome variable was informed by SU samples and not by 24 hr urine samples (gold standard to assess salt consumption). Results should be interpreted according to this limitation.
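As an illustration of how the outcome variable could be computed, the sketch below implements the two sex-specific formulas as written in this section and then applies the division by 17.1. The function name, the argument names, and the example values are assumptions made for this sketch; inputs are expected in the units stated above (SU sodium and creatinine in mmol/L, BMI in kg/m2, age in years).

```python
def intersalt_salt_g_per_day(sex: str, na_su: float, cr_su: float,
                             bmi: float, age: float) -> float:
    """Estimated salt intake (g/day) from a spot urine sample.

    Coefficients are copied from the sex-specific formulas in the text.
    """
    if sex == "male":
        sodium_24h = (23.51 + 0.45 * na_su - 3.09 * cr_su
                      + 4.16 * bmi + 0.22 * age)
    elif sex == "female":
        sodium_24h = (3.74 + 0.33 * na_su - 2.44 * cr_su
                      + 2.42 * bmi + 2.34 * age - 0.03 * age ** 2)
    else:
        raise ValueError("sex must be 'male' or 'female'")
    # The text divides the estimated 24 hr sodium by 17.1 to obtain salt in g/day.
    return sodium_24h / 17.1


# Hypothetical values, for illustration only.
salt_man = intersalt_salt_g_per_day("male", 100.0, 10.0, 25.0, 40.0)
salt_woman = intersalt_salt_g_per_day("female", 100.0, 10.0, 25.0, 40.0)
```

With these made-up inputs the sketch returns roughly 8.8 g/day for the man and 6.9 g/day for the woman, which falls within the range of country means reported in the Results.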

Analysis

Data preparation

Our complete-case analysis was restricted to men and nonpregnant women aged between 15 and 69 years because of data availability. We dropped participants with implausible BMI levels (outside the range 10–80 kg/m2) or with implausible weight (outside the range 12–300 kg) or height records (outside the range 1.00–2.50 m). Participants with SBP outside the range 70–270 mmHg were discarded, and so were participants with DBP outside the range 30–150 mmHg. We excluded records with SU creatinine <1.8 or >32.7 mmol/L for males and <1.8 or >28.3 mmol/L for females (Santos et al., 2019; Paterson et al., 2019). In addition, we excluded participants with estimated salt intake (using the four equations) more than 3 standard deviations above or below the equation-specific mean (Appendix 1—figure 1; Jensen et al., 2018). After completing data preparation, observations were randomly assigned from the pooled dataset (100%) into three datasets for the ML analysis: training dataset (50%), test dataset (30%), and validation dataset (20%).
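The range checks and the 50/30/20 random split can be sketched as follows. This is a simplified illustration, not the study pipeline: the dictionary keys, the function names, and the fixed seed are hypothetical, range bounds are assumed to be inclusive, and the pregnancy exclusion and the 3 standard deviation rule are omitted for brevity.

```python
import random


def is_plausible(row: dict) -> bool:
    """Apply the plausibility ranges listed in the text to one participant."""
    bmi = row["weight_kg"] / row["height_m"] ** 2
    cr_max = 32.7 if row["sex"] == "male" else 28.3  # SU creatinine, mmol/L
    return (
        15 <= row["age"] <= 69
        and 10 <= bmi <= 80
        and 12 <= row["weight_kg"] <= 300
        and 1.00 <= row["height_m"] <= 2.50
        and 70 <= row["sbp"] <= 270
        and 30 <= row["dbp"] <= 150
        and 1.8 <= row["cr_su"] <= cr_max
    )


def split_50_30_20(rows: list, seed: int = 0):
    """Randomly assign rows to training (50%), test (30%), validation (20%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 50 // 100
    n_test = len(shuffled) * 30 // 100
    train = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test_set, validation


# Hypothetical records: ten plausible rows and one with an implausible height.
ok = {"sex": "male", "age": 40, "weight_kg": 70.0, "height_m": 1.75,
      "sbp": 120.0, "dbp": 80.0, "cr_su": 10.0}
bad = dict(ok, height_m=0.90)
clean = [r for r in [ok] * 10 + [bad] if is_plausible(r)]
train, test_set, validation = split_50_30_20(clean)
```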

Machine learning modeling

Our research aim was a regression problem with a known outcome attribute (salt consumption at the subject level). Therefore, we planned a supervised ML regression analysis. Details about the modeling process are available in the ‘Extended methods’ (Appendix 2). In brief, we designed a work pipeline with five steps. First, data analysis: we dropped observations with missing data, explored the available data to choose scaling and transformation methods that would ensure all variables were on the same scale or in the same units, and planned transformations for categorical variables (e.g., one-hot encoding). Second, feature importance analysis: we investigated the contribution of each predictor to the regression model through methods such as Random Forest (RF) and Recursive Feature Elimination. The aim of this second step was to exclude any predictor that would not contribute to the regression model. Notably, all predictors (see ‘Variables’ section) chosen following expert knowledge were kept in the analysis (i.e., the feature importance analysis did not suggest the exclusion of any predictor). Third, data processing: having explored the available data (first step in the work pipeline), we implemented different scaling and transformation methods (e.g., Box-Cox, principal component analysis, and polynomial features). Fourth, data modeling: we implemented 10 ML algorithms: (i) linear regression (LiR); (ii) Huber regressor (HuR); (iii) ridge regressor (RiR); (iv) multilayer perceptron (MLP); (v) support vector regressor (SVR); (vi) k-nearest neighbors (KNN); (vii) RF; (viii) gradient boosting machine (GBM); (ix) extreme gradient boosting (XBG); and (x) a customized neural network. All these ML algorithms performed similarly, so the decision to choose one was postponed to the fifth (last) step in the work pipeline. Up to this point, we used the training and validation datasets.
Fifth, forecasting of the predicted attribute in new data (i.e., data not used for model training): in this step, we used the test dataset to choose the model that yielded predictions closest to the observed salt intake. Results comparing the observed and the predicted salt intake were computed in the test dataset alone. For each country, we ran a paired t-test between the observed and predicted salt consumption, and a difference was deemed significant at p<0.05. We also computed the absolute difference between the observed and predicted salt intake. We chose the HuR algorithm because it showed the mean difference (observed − predicted) closest to zero in both sexes combined (Appendix 2—table 2, Appendix 2—figure 3). All summary estimates (e.g., mean salt intake) were computed accounting for the complex survey design of the surveys included in the analysis.
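The selection rule in the fifth step (pick the algorithm whose mean observed − predicted difference is closest to zero) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the survey design is ignored, and only the paired t statistic is computed, because the corresponding p-value requires a Student's t distribution with n − 1 degrees of freedom, which the Python standard library does not provide.

```python
import math
import statistics


def paired_summary(observed, predicted):
    """Return the mean of (observed - predicted) and the paired t statistic."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    mean_diff = statistics.fmean(diffs)
    # Standard error of the mean difference (needs at least two pairs).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = mean_diff / se if se > 0 else math.nan
    return mean_diff, t_stat


def closest_to_zero(observed, predictions_by_model):
    """Return the model name whose mean difference is closest to zero."""
    return min(
        predictions_by_model,
        key=lambda name: abs(paired_summary(observed, predictions_by_model[name])[0]),
    )


# Toy values for illustration only; these are not study data.
observed = [8.0, 9.0, 10.0, 7.5]
predictions = {
    "HuR": [8.1, 8.8, 10.2, 7.4],
    "KNN": [8.5, 9.6, 10.4, 8.1],
}
best = closest_to_zero(observed, predictions)
```

With these toy values the differences for the first model cancel out, so it is the one selected; a mean difference near zero can still hide large individual errors, which is why the absolute difference is also reported in the text.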

Application of the developed ML model

Having developed the ML model following the steps described above, we applied it to 54 WHO STEPS national surveys that did not have urine samples but included the predictors in the ML model (see ‘Variables’ section). In each of these 54 surveys, we computed the mean daily salt intake accounting for the complex survey design. These surveys were preprocessed following the same procedures described in the ‘Data preparation’ section.
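As a rough illustration of a design-weighted mean, the sketch below averages per-participant predictions using sampling weights. It is a simplification under stated assumptions: the function name and inputs are hypothetical, and the strata and primary sampling units that a full complex-survey analysis would use to compute confidence intervals are not handled here.

```python
def weighted_mean(values, weights):
    """Mean of values, each weighted by its sampling weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Toy example: two predicted intakes (g/day) with sampling weights 3 and 1.
example = weighted_mean([8.0, 10.0], [3.0, 1.0])
```

Here the first participant represents three times as many people as the second, so the weighted mean (8.5 g/day) sits closer to 8.0 than the unweighted mean would.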

Role of the funding source

The funder had no role in the study design, analysis, interpretation, or decision to publish. The authors are collectively responsible for the accuracy of the data. The arguments and opinions in this work are those of the authors alone, and do not represent the position of the institutions to which they belong.

Appendix 1

Appendix 1—table 1
Weighted distribution of predictors in each survey included in the machine learning model development.
Country | Year | Sample size | Mean age (years) | Age range (years) | Proportion of men (%) | SBP (mmHg), mean (min–max) | DBP (mmHg), mean (min–max) | Weight (kg), mean (min–max) | Height (m), mean (min–max) | Urinary sodium (mmol/L), mean (min–max) | Urinary creatinine (mmol/L), mean (min–max)
Armenia | 2016 | 1074 | 40 | 18–69 | 49.7 | 129 (86–238) | 85 (49–148) | 70.9 (35–139) | 1.66 (1.27–1.89) | 128.6 (10.6–237.6) | 10.1 (1.9–27.3)
Azerbaijan | 2017 | 2359 | 39 | 18–69 | 49.5 | 126 (82–230) | 81 (48–142) | 73.1 (36–174) | 1.67 (1.15–1.98) | 167.7 (2–389) | 11.9 (1.8–31.8)
Bangladesh | 2018 | 6200 | 39 | 18–69 | 46.9 | 121 (72–251) | 79 (32–147) | 55.9 (28–111) | 1.56 (1–2.11) | 119.6 (4–422) | 8.4 (2.2–32.3)
Belarus | 2017 | 4503 | 43 | 18–69 | 47.1 | 135 (88–257) | 85 (54–147) | 77.7 (41–144) | 1.7 (1.05–1.99) | 149.5 (10.5–371.4) | 12.2 (1.8–32.7)
Bhutan | 2014 | 6163 | 38 | 18–69 | 59.2 | 126 (75–228) | 85 (46–142) | 61.4 (23–115) | 1.6 (1.11–1.96) | 142.1 (6–388) | 8.1 (1.9–29.7)
Bhutan | 2019 | 6163 | 34 | 15–69 | 56.8 | 124 (85–224) | 82 (44–137) | 61.9 (28.5–140) | 1.58 (1.07–1.92) | 129.9 (4.7–444.9) | 10.4 (1.8–32.7)
Brunei Darussalam | 2016 | 1635 | 35 | 18–69 | 51.4 | 123 (76–218) | 78 (46–138) | 69 (31.2–138.3) | 1.59 (1.32–1.84) | 122.6 (19.9–329) | 12.6 (1.8–32.6)
Chile | 2017 | 2952 | 39 | 15–69 | 49.8 | 120 (81–226) | 74 (44–130) | 75.7 (38.3–146.9) | 1.63 (1.34–1.96) | 135.8 (10–324) | 12.1 (1.8–32.2)
Jordan | 2019 | 1040 | 37 | 18–69 | 50.2 | 118 (75–200) | 78 (50–120) | 76.3 (35.5–159.5) | 1.66 (1.36–1.95) | 165.4 (13–365) | 13.6 (1.8–32.5)
Lebanon | 2017 | 998 | 42 | 17–69 | 48.7 | 129 (80–214) | 77 (35–123) | 78.3 (40–141) | 1.68 (1.2–1.96) | 124.4 (4–385) | 11.5 (1.9–32)
Malawi | 2017 | 1601 | 35 | 18–69 | 56.4 | 122 (74–222) | 76 (40–142) | 58.5 (33.6–119) | 1.61 (1.36–1.96) | 186.5 (11–399.9) | 10.7 (1.9–32.4)
Mongolia | 2013 | 7505 | 42 | 15–64 | 50.3 | 129 (88–220) | 82 (50–134) | 71 (30.6–138) | 1.62 (1.27–1.92) | 134.1 (13.1–515) | 10.9 (1.8–31.9)
Mongolia | 2019 | 7505 | 36 | 15–69 | 50.9 | 120 (76–254) | 77 (48–143) | 68.4 (29–159) | 1.64 (1.34–1.98) | 117 (2.1–348.9) | 7.5 (1.8–28.3)
Morocco | 2017 | 3435 | 40 | 18–69 | 50.6 | 128 (83–228) | 78 (40–139) | 70.9 (35–168) | 1.66 (1.34–1.95) | 122.3 (26.3–575.2) | 10.3 (1.8–31.4)
Nepal | 2019 | 2560 | 36 | 15–69 | 41 | 124 (81–239) | 81 (55–146) | 54.6 (26–160) | 1.55 (1.21–2.03) | 140.9 (3–437) | 5.6 (1.8–25.5)
Solomon Islands | 2015 | 172 | 38 | 18–69 | 61.4 | 121 (88–188) | 77 (52–104) | 67.9 (38.5–122) | 1.61 (1.41–1.8) | 99.3 (7–250) | 9.7 (1.9–28.4)
Sudan | 2016 | 571 | 36 | 18–69 | 55.9 | 128 (89–231) | 85 (58–132) | 72.2 (35.6–174) | 1.67 (1.42–1.92) | 128.5 (5–459) | 14 (1.9–32.4)
Tokelau | 2014 | 181 | 35 | 18–63 | 56 | 125 (76–184) | 79 (53–128) | 94.8 (58–158.3) | 1.71 (1.16–1.88) | 62.4 (20–265) | 5 (2–7.7)
Tonga | 2017 | 755 | 40 | 18–69 | 35.7 | 131 (96–208) | 83 (53–148) | 98.6 (48.1–181) | 1.69 (1.4–1.94) | 101.9 (4–327) | 15.3 (1.8–32.7)
Turkmenistan | 2018 | 3584 | 37 | 18–69 | 52.7 | 127 (88–268) | 83 (54–149) | 72.4 (39–142) | 1.68 (1.16–1.98) | 109.2 (10–163) | 11.1 (4.5–18.3)
Zambia | 2017 | 2488 | 33 | 18–69 | 50.3 | 125 (73–248) | 77 (36–148) | 60.9 (33.8–150) | 1.62 (1.01–2.07) | 137.2 (10–375) | 12.2 (1.8–32.4)
Appendix 1—table 2
Observed and predicted mean salt intake (g/day) by sex in each survey included in the machine learning model development.
Country | Year | Sex | Mean salt intake | Lower 95% CI | Upper 95% CI | Category
Armenia | 2016 | Men | 9.24 | 9.04 | 9.45 | ML predicted
Armenia | 2016 | Men | 9.46 | 9.11 | 9.81 | Observed
Armenia | 2016 | Women | 7.43 | 7.3 | 7.57 | ML predicted
Armenia | 2016 | Women | 7.44 | 7.26 | 7.62 | Observed
Azerbaijan | 2017 | Men | 9.43 | 9.33 | 9.53 | ML predicted
Azerbaijan | 2017 | Men | 10.39 | 10.06 | 10.72 | Observed
Azerbaijan | 2017 | Women | 7.43 | 7.31 | 7.55 | ML predicted
Azerbaijan | 2017 | Women | 7.94 | 7.75 | 8.14 | Observed
Bangladesh | 2018 | Men | 8.87 | 8.8 | 8.93 | ML predicted
Bangladesh | 2018 | Men | 8.59 | 8.42 | 8.75 | Observed
Bangladesh | 2018 | Women | 7.18 | 7.13 | 7.24 | ML predicted
Bangladesh | 2018 | Women | 7.27 | 7.17 | 7.37 | Observed
Belarus | 2017 | Men | 9.49 | 9.42 | 9.56 | ML predicted
Belarus | 2017 | Men | 10.14 | 9.94 | 10.35 | Observed
Belarus | 2017 | Women | 7.53 | 7.45 | 7.61 | ML predicted
Belarus | 2017 | Women | 7.56 | 7.41 | 7.72 | Observed
Bhutan | 2014 | Men | 9.14 | 9.04 | 9.25 | ML predicted
Bhutan | 2014 | Men | 9.58 | 9.27 | 9.88 | Observed
Bhutan | 2014 | Women | 7.38 | 7.3 | 7.46 | ML predicted
Bhutan | 2014 | Women | 8.1 | 7.94 | 8.27 | Observed
Bhutan | 2019 | Men | 9.33 | 9.25 | 9.41 | ML predicted
Bhutan | 2019 | Men | 9.1 | 8.85 | 9.35 | Observed
Bhutan | 2019 | Women | 7.43 | 7.36 | 7.49 | ML predicted
Bhutan | 2019 | Women | 7.53 | 7.33 | 7.73 | Observed
Brunei Darussalam | 2016 | Men | 9.78 | 9.57 | 9.99 | ML predicted
Brunei Darussalam | 2016 | Men | 8.95 | 8.66 | 9.25 | Observed
Brunei Darussalam | 2016 | Women | 7.64 | 7.5 | 7.77 | ML predicted
Brunei Darussalam | 2016 | Women | 7.3 | 7.05 | 7.54 | Observed
Chile | 2017 | Men | 9.65 | 9.56 | 9.75 | ML predicted
Chile | 2017 | Men | 9.75 | 9.18 | 10.31 | Observed
Chile | 2017 | Women | 7.86 | 7.8 | 7.93 | ML predicted
Chile | 2017 | Women | 7.64 | 7.45 | 7.83 | Observed
Jordan | 2019 | Men | 9.31 | 9.03 | 9.6 | ML predicted
Jordan | 2019 | Men | 10.2 | 9.52 | 10.88 | Observed
Jordan | 2019 | Women | 7.78 | 7.53 | 8.03 | ML predicted
Jordan | 2019 | Women | 8.1 | 7.75 | 8.45 | Observed
Lebanon | 2017 | Men | 9.88 | 9.62 | 10.14 | ML predicted
Lebanon | 2017 | Men | 9.53 | 9.06 | 9.99 | Observed
Lebanon | 2017 | Women | 7.63 | 7.39 | 7.86 | ML predicted
Lebanon | 2017 | Women | 7.51 | 7.07 | 7.95 | Observed
Malawi | 2017 | Men | 8.76 | 8.66 | 8.86 | ML predicted
Malawi | 2017 | Men | 9.54 | 9.16 | 9.91 | Observed
Malawi | 2017 | Women | 7.1 | 6.97 | 7.24 | ML predicted
Malawi | 2017 | Women | 8.34 | 8.03 | 8.64 | Observed
Mongolia | 2013 | Men | 9.5 | 9.32 | 9.68 | ML predicted
Mongolia | 2013 | Men | 9.83 | 9.34 | 10.32 | Observed
Mongolia | 2013 | Women | 7.63 | 7.51 | 7.74 | ML predicted
Mongolia | 2013 | Women | 7.79 | 7.6 | 7.97 | Observed
Mongolia | 2019 | Men | 9.32 | 9.23 | 9.42 | ML predicted
Mongolia | 2019 | Men | 9.68 | 9.5 | 9.85 | Observed
Mongolia | 2019 | Women | 7.39 | 7.32 | 7.46 | ML predicted
Mongolia | 2019 | Women | 7.46 | 7.34 | 7.59 | Observed
Morocco | 2017 | Men | 9.06 | 8.97 | 9.15 | ML predicted
Morocco | 2017 | Men | 9.03 | 8.82 | 9.24 | Observed
Morocco | 2017 | Women | 7.49 | 7.43 | 7.56 | ML predicted
Morocco | 2017 | Women | 7.47 | 7.35 | 7.59 | Observed
Nepal | 2019 | Men | 9 | 8.83 | 9.18 | ML predicted
Nepal | 2019 | Men | 9.55 | 9.22 | 9.87 | Observed
Nepal | 2019 | Women | 7.07 | 6.98 | 7.15 | ML predicted
Nepal | 2019 | Women | 7.84 | 7.65 | 8.04 | Observed
Solomon Islands | 2015 | Men | 9.42 | 9.25 | 9.59 | ML predicted
Solomon Islands | 2015 | Men | 8.74 | 8.08 | 9.4 | Observed
Solomon Islands | 2015 | Women | 7.54 | 7.29 | 7.79 | ML predicted
Solomon Islands | 2015 | Women | 7.03 | 6.42 | 7.64 | Observed
Sudan | 2016 | Men | 9.07 | 8.76 | 9.37 | ML predicted
Sudan | 2016 | Men | 8.53 | 7.73 | 9.33 | Observed
Sudan | 2016 | Women | 7.62 | 7.27 | 7.97 | ML predicted
Sudan | 2016 | Women | 7.49 | 7.09 | 7.88 | Observed
Tokelau | 2014 | Men | 10.64 | 10.34 | 10.93 | ML predicted
Tokelau | 2014 | Men | 10.29 | 10.18 | 10.4 | Observed
Tokelau | 2014 | Women | 8.96 | 8.71 | 9.21 | ML predicted
Tokelau | 2014 | Women | 8.12 | 7.61 | 8.63 | Observed
Tonga | 2017 | Men | 10.5 | 10.31 | 10.69 | ML predicted
Tonga | 2017 | Men | 9.19 | 8.89 | 9.48 | Observed
Tonga | 2017 | Women | 8.85 | 8.65 | 9.04 | ML predicted
Tonga | 2017 | Women | 7.63 | 7.45 | 7.81 | Observed
Turkmenistan | 2018 | Men | 9.38 | 9.28 | 9.48 | ML predicted
Turkmenistan | 2018 | Men | 8.94 | 8.79 | 9.09 | Observed
Turkmenistan | 2018 | Women | 7.2 | 7.13 | 7.27 | ML predicted
Turkmenistan | 2018 | Women | 6.76 | 6.68 | 6.83 | Observed
Zambia | 2017 | Men | 8.92 | 8.84 | 9 | ML predicted
Zambia | 2017 | Men | 8.45 | 8.15 | 8.75 | Observed
Zambia | 2017 | Women | 7.04 | 6.96 | 7.12 | ML predicted
Zambia | 2017 | Women | 7.01 | 6.81 | 7.22 | Observed
  1. ML: machine learning; SBP: systolic blood pressure; DBP: diastolic blood pressure.

Appendix 1—table 3
Observed and predicted mean salt intake (g/day) by age, body mass index (BMI) category, and blood pressure status across all surveys included in the machine learning model development dataset.
Attribute | Observed salt consumption (g/day) in the derivation surveys: mean | Observed: p-Value for independent t-test or ANOVA test | Estimated salt consumption (g/day) in the derivation surveys: mean | Estimated: p-Value for independent t-test or ANOVA test
Age <30 years | 7.9 | <0.001 | 8.0 | <0.001
Age ≥30 years | 8.4 | | 8.3 |
BMI <18.5 kg/m² | 7.0 | <0.001 | 7.0 | <0.001
BMI 18.5–24.9 kg/m² | 7.8 | | 7.7 |
BMI 25.0–29.9 kg/m² | 8.6 | | 8.4 |
BMI ≥30 kg/m² | 9.3 | | 9.3 |
Raised blood pressure (≥140/90 mmHg) | 8.7 | <0.001 | 8.6 | <0.001
No raised blood pressure | 8.2 | | 8.1 |
  1. These results do not consider the survey sampling design.

Appendix 1—table 4
Mean difference (g/day) between observed and predicted salt intake by sex in each survey included in the machine learning (ML) model development.
Country | Year | Sex | Mean difference | Lower 95% CI | Upper 95% CI | p-Value
Armenia | 2016 | Men | 0.22 | –0.06 | 0.5 | 0.0007
Armenia | 2016 | Women | 0.01 | –0.12 | 0.13 | 0.1953
Azerbaijan | 2017 | Men | 0.96 | 0.67 | 1.26 | <0.0001
Azerbaijan | 2017 | Women | 0.52 | 0.37 | 0.66 | <0.0001
Bangladesh | 2018 | Men | –0.28 | –0.44 | –0.12 | <0.0001
Bangladesh | 2018 | Women | 0.09 | –0.01 | 0.19 | 0.0004
Belarus | 2017 | Men | 0.66 | 0.47 | 0.84 | <0.0001
Belarus | 2017 | Women | 0.03 | –0.09 | 0.16 | 0.6258
Bhutan | 2014 | Men | 0.43 | 0.17 | 0.7 | <0.0001
Bhutan | 2014 | Women | 0.72 | 0.57 | 0.88 | <0.0001
Bhutan | 2019 | Men | –0.23 | –0.48 | 0.02 | 0.0007
Bhutan | 2019 | Women | 0.1 | –0.08 | 0.28 | 0.7508
Brunei Darussalam | 2016 | Men | –0.82 | –1.06 | –0.58 | <0.0001
Brunei Darussalam | 2016 | Women | –0.34 | –0.55 | –0.13 | <0.0001
Chile | 2017 | Men | 0.1 | –0.39 | 0.58 | 0.0001
Chile | 2017 | Women | –0.22 | –0.36 | –0.08 | <0.0001
Jordan | 2019 | Men | 0.89 | 0.31 | 1.46 | 0.0065
Jordan | 2019 | Women | 0.32 | 0 | 0.64 | 0.4142
Lebanon | 2017 | Men | –0.36 | –0.85 | 0.14 | 0.2074
Lebanon | 2017 | Women | –0.12 | –0.45 | 0.22 | 0.1591
Malawi | 2017 | Men | 0.77 | 0.39 | 1.16 | <0.0001
Malawi | 2017 | Women | 1.23 | 0.95 | 1.51 | <0.0001
Mongolia | 2013 | Men | 0.33 | –0.02 | 0.68 | 0.0184
Mongolia | 2013 | Women | 0.16 | –0.03 | 0.35 | 0.2655
Mongolia | 2019 | Men | 0.35 | 0.23 | 0.48 | <0.0001
Mongolia | 2019 | Women | 0.08 | –0.01 | 0.17 | 0.5155
Morocco | 2017 | Men | –0.03 | –0.21 | 0.14 | 0.3083
Morocco | 2017 | Women | –0.02 | –0.13 | 0.09 | 0.7259
Nepal | 2019 | Men | 0.54 | 0.25 | 0.83 | <0.0001
Nepal | 2019 | Women | 0.78 | 0.61 | 0.94 | <0.0001
Solomon Islands | 2015 | Men | –0.68 | –1.26 | –0.1 | 0.0477
Solomon Islands | 2015 | Women | –0.51 | –1.1 | 0.09 | 0.0539
Sudan | 2016 | Men | –0.53 | –1.15 | 0.08 | 0.2111
Sudan | 2016 | Women | –0.13 | –0.45 | 0.19 | 0.0674
Tokelau | 2014 | Men | –0.35 | –0.53 | –0.16 | 0.2248
Tokelau | 2014 | Women | –0.84 | –1.22 | –0.45 | 0.0026
Tonga | 2017 | Men | –1.31 | –1.58 | –1.05 | <0.0001
Tonga | 2017 | Women | –1.22 | –1.39 | –1.05 | <0.0001
Turkmenistan | 2018 | Men | –0.44 | –0.52 | –0.36 | <0.0001
Turkmenistan | 2018 | Women | –0.45 | –0.51 | –0.39 | <0.0001
Zambia | 2017 | Men | –0.47 | –0.74 | –0.19 | <0.0001
Zambia | 2017 | Women | –0.02 | –0.21 | 0.17 | 0.3438
  1. p-Value for the paired Student's t-test between observed and predicted salt intake.

Appendix 1—table 5
Observed mean salt intake (g/day) by equation and sex in each survey included in the machine learning (ML) model development.
Country | Year | Sex | Mean salt intake | Lower 95% CI | Upper 95% CI | Category
Armenia | 2016 | Men | 9.46 | 9.11 | 9.81 | Observed_intersalt
Armenia | 2016 | Men | 14.58 | 13.71 | 15.44 | Observed_kawasaki
Armenia | 2016 | Men | 10.21 | 9.71 | 10.7 | Observed_tanaka
Armenia | 2016 | Men | 12.71 | 12.19 | 13.23 | Observed_toft
Armenia | 2016 | Women | 7.44 | 7.26 | 7.62 | Observed_intersalt
Armenia | 2016 | Women | 12.48 | 11.87 | 13.09 | Observed_kawasaki
Armenia | 2016 | Women | 9.98 | 9.59 | 10.36 | Observed_tanaka
Armenia | 2016 | Women | 8.41 | 8.26 | 8.57 | Observed_toft
Azerbaijan | 2017 | Men | 10.39 | 10.06 | 10.72 | Observed_intersalt
Azerbaijan | 2017 | Men | 14.82 | 14.21 | 15.42 | Observed_kawasaki
Azerbaijan | 2017 | Men | 10.31 | 9.98 | 10.64 | Observed_tanaka
Azerbaijan | 2017 | Men | 12.81 | 12.45 | 13.18 | Observed_toft
Azerbaijan | 2017 | Women | 7.94 | 7.75 | 8.14 | Observed_intersalt
Azerbaijan | 2017 | Women | 12.65 | 12.22 | 13.08 | Observed_kawasaki
Azerbaijan | 2017 | Women | 10.14 | 9.87 | 10.41 | Observed_tanaka
Azerbaijan | 2017 | Women | 8.45 | 8.33 | 8.56 | Observed_toft
Bangladesh | 2018 | Men | 8.59 | 8.42 | 8.75 | Observed_intersalt
Bangladesh | 2018 | Men | 12.59 | 12.25 | 12.93 | Observed_kawasaki
Bangladesh | 2018 | Men | 8.81 | 8.62 | 9.01 | Observed_tanaka
Bangladesh | 2018 | Men | 11.62 | 11.4 | 11.85 | Observed_toft
Bangladesh | 2018 | Women | 7.27 | 7.17 | 7.37 | Observed_intersalt
Bangladesh | 2018 | Women | 12.09 | 11.78 | 12.4 | Observed_kawasaki
Bangladesh | 2018 | Women | 9 | 8.82 | 9.19 | Observed_tanaka
Bangladesh | 2018 | Women | 8.33 | 8.25 | 8.42 | Observed_toft
Belarus | 2017 | Men | 10.14 | 9.94 | 10.35 | Observed_intersalt
Belarus | 2017 | Men | 14.22 | 13.85 | 14.6 | Observed_kawasaki
Belarus | 2017 | Men | 10.16 | 9.95 | 10.38 | Observed_tanaka
Belarus | 2017 | Men | 12.46 | 12.24 | 12.69 | Observed_toft
Belarus | 2017 | Women | 7.56 | 7.41 | 7.72 | Observed_intersalt
Belarus | 2017 | Women | 11.43 | 11.1 | 11.75 | Observed_kawasaki
Belarus | 2017 | Women | 9.59 | 9.37 | 9.8 | Observed_tanaka
Belarus | 2017 | Women | 8.09 | 8 | 8.18 | Observed_toft
Bhutan | 2014 | Men | 9.58 | 9.27 | 9.88 | Observed_intersalt
Bhutan | 2014 | Men | 15.05 | 14.23 | 15.87 | Observed_kawasaki
Bhutan | 2014 | Men | 10.06 | 9.64 | 10.48 | Observed_tanaka
Bhutan | 2014 | Men | 13 | 12.51 | 13.49 | Observed_toft
Bhutan | 2014 | Women | 8.1 | 7.94 | 8.27 | Observed_intersalt
Bhutan | 2014 | Women | 14.24 | 13.72 | 14.76 | Observed_kawasaki
Bhutan | 2014 | Women | 10.54 | 10.22 | 10.86 | Observed_tanaka
Bhutan | 2014 | Women | 8.85 | 8.72 | 8.99 | Observed_toft
Bhutan | 2019 | Men | 9.1 | 8.85 | 9.35 | Observed_intersalt
Bhutan | 2019 | Men | 12.81 | 12.23 | 13.39 | Observed_kawasaki
Bhutan | 2019 | Men | 8.81 | 8.51 | 9.11 | Observed_tanaka
Bhutan | 2019 | Men | 11.62 | 11.28 | 11.97 | Observed_toft
Bhutan | 2019 | Women | 7.53 | 7.33 | 7.73 | Observed_intersalt
Bhutan | 2019 | Women | 11.59 | 11.22 | 11.96 | Observed_kawasaki
Bhutan | 2019 | Women | 8.9 | 8.67 | 9.12 | Observed_tanaka
Bhutan | 2019 | Women | 8.18 | 8.07 | 8.28 | Observed_toft
Brunei Darussalam | 2016 | Men | 8.95 | 8.66 | 9.25 | Observed_intersalt
Brunei Darussalam | 2016 | Men | 11.51 | 10.95 | 12.08 | Observed_kawasaki
Brunei Darussalam | 2016 | Men | 8.17 | 7.89 | 8.45 | Observed_tanaka
Brunei Darussalam | 2016 | Men | 10.79 | 10.44 | 11.14 | Observed_toft
Brunei Darussalam | 2016 | Women | 7.3 | 7.05 | 7.54 | Observed_intersalt
Brunei Darussalam | 2016 | Women | 10.52 | 10.02 | 11.01 | Observed_kawasaki
Brunei Darussalam | 2016 | Women | 8.38 | 8.08 | 8.69 | Observed_tanaka
Brunei Darussalam | 2016 | Women | 7.88 | 7.73 | 8.03 | Observed_toft
Chile | 2017 | Men | 9.75 | 9.18 | 10.31 | Observed_intersalt
Chile | 2017 | Men | 12.86 | 12.07 | 13.66 | Observed_kawasaki
Chile | 2017 | Men | 9.25 | 8.84 | 9.66 | Observed_tanaka
Chile | 2017 | Men | 11.66 | 11.14 | 12.17 | Observed_toft
Chile | 2017 | Women | 7.64 | 7.45 | 7.83 | Observed_intersalt
Chile | 2017 | Women | 11.11 | 10.81 | 11.4 | Observed_kawasaki
Chile | 2017 | Women | 9.13 | 8.93 | 9.32 | Observed_tanaka
Chile | 2017 | Women | 8.06 | 7.97 | 8.15 | Observed_toft
Jordan | 2019 | Men | 10.2 | 9.52 | 10.88 | Observed_intersalt
Jordan | 2019 | Men | 13.98 | 12.73 | 15.23 | Observed_kawasaki
Jordan | 2019 | Men | 9.84 | 9.17 | 10.51 | Observed_tanaka
Jordan | 2019 | Men | 12.29 | 11.56 | 13.02 | Observed_toft
Jordan | 2019 | Women | 8.1 | 7.75 | 8.45 | Observed_intersalt
Jordan | 2019 | Women | 12.1 | 11.48 | 12.72 | Observed_kawasaki
Jordan | 2019 | Women | 9.74 | 9.34 | 10.13 | Observed_tanaka
Jordan | 2019 | Women | 8.34 | 8.17 | 8.5 | Observed_toft
Lebanon | 2017 | Men | 9.53 | 9.06 | 9.99 | Observed_intersalt
Lebanon | 2017 | Men | 12.72 | 11.65 | 13.79 | Observed_kawasaki
Lebanon | 2017 | Men | 9.22 | 8.61 | 9.84 | Observed_tanaka
Lebanon | 2017 | Men | 11.48 | 10.82 | 12.14 | Observed_toft
Lebanon | 2017 | Women | 7.51 | 7.07 | 7.95 | Observed_intersalt
Lebanon | 2017 | Women | 11.35 | 10.45 | 12.25 | Observed_kawasaki
Lebanon | 2017 | Women | 9.37 | 8.75 | 10 | Observed_tanaka
Lebanon | 2017 | Women | 8.03 | 7.76 | 8.3 | Observed_toft
Malawi | 2017 | Men | 9.54 | 9.16 | 9.91 | Observed_intersalt
Malawi | 2017 | Men | 14.02 | 13.4 | 14.64 | Observed_kawasaki
Malawi | 2017 | Men | 9.43 | 9.08 | 9.77 | Observed_tanaka
Malawi | 2017 | Men | 12.4 | 12.04 | 12.77 | Observed_toft
Malawi | 2017 | Women | 8.34 | 8.03 | 8.64 | Observed_intersalt
Malawi | 2017 | Women | 13.43 | 12.76 | 14.11 | Observed_kawasaki
Malawi | 2017 | Women | 10.17 | 9.75 | 10.58 | Observed_tanaka
Malawi | 2017 | Women | 8.64 | 8.47 | 8.82 | Observed_toft
Mongolia | 2013 | Men | 9.83 | 9.34 | 10.32 | Observed_intersalt
Mongolia | 2013 | Men | 13.37 | 12.74 | 14.01 | Observed_kawasaki
Mongolia | 2013 | Men | 9.48 | 9.13 | 9.83 | Observed_tanaka
Mongolia | 2013 | Men | 12.04 | 11.64 | 12.45 | Observed_toft
Mongolia | 2013 | Women | 7.79 | 7.6 | 7.97 | Observed_intersalt
Mongolia | 2013 | Women | 11.92 | 11.34 | 12.5 | Observed_kawasaki
Mongolia | 2013 | Women | 9.54 | 9.16 | 9.92 | Observed_tanaka
Mongolia | 2013 | Women | 8.24 | 8.08 | 8.4 | Observed_toft
Mongolia | 2019 | Men | 9.68 | 9.5 | 9.85 | Observed_intersalt
Mongolia | 2019 | Men | 14.83 | 14.49 | 15.17 | Observed_kawasaki
Mongolia | 2019 | Men | 10.14 | 9.95 | 10.32 | Observed_tanaka
Mongolia | 2019 | Men | 12.84 | 12.64 | 13.05 | Observed_toft
Mongolia | 2019 | Women | 7.46 | 7.34 | 7.59 | Observed_intersalt
Mongolia | 2019 | Women | 12.13 | 11.81 | 12.44 | Observed_kawasaki
Mongolia | 2019 | Women | 9.63 | 9.43 | 9.84 | Observed_tanaka
Mongolia | 2019 | Women | 8.31 | 8.23 | 8.4 | Observed_toft
Morocco | 2017 | Men | 9.03 | 8.82 | 9.24 | Observed_intersalt
Morocco | 2017 | Men | 13.04 | 12.63 | 13.44 | Observed_kawasaki
Morocco | 2017 | Men | 9.33 | 9.1 | 9.56 | Observed_tanaka
Morocco | 2017 | Men | 11.75 | 11.5 | 12 | Observed_toft
Morocco | 2017 | Women | 7.47 | 7.35 | 7.59 | Observed_intersalt
Morocco | 2017 | Women | 11.72 | 11.41 | 12.04 | Observed_kawasaki
Morocco | 2017 | Women | 9.48 | 9.28 | 9.68 | Observed_tanaka
Morocco | 2017 | Women | 8.18 | 8.09 | 8.26 | Observed_toft
Nepal | 2019 | Men | 9.55 | 9.22 | 9.87 | Observed_intersalt
Nepal | 2019 | Men | 16.6 | 15.92 | 17.27 | Observed_kawasaki
Nepal | 2019 | Men | 10.69 | 10.33 | 11.04 | Observed_tanaka
Nepal | 2019 | Men | 14.04 | 13.64 | 14.44 | Observed_toft
Nepal | 2019 | Women | 7.84 | 7.65 | 8.04 | Observed_intersalt
Nepal | 2019 | Women | 15.35 | 14.82 | 15.88 | Observed_kawasaki
Nepal | 2019 | Women | 10.9 | 10.57 | 11.24 | Observed_tanaka
Nepal | 2019 | Women | 9.12 | 8.99 | 9.25 | Observed_toft
Solomon Islands | 2015 | Men | 8.74 | 8.08 | 9.4 | Observed_intersalt
Solomon Islands | 2015 | Men | 12.99 | 11.06 | 14.93 | Observed_kawasaki
Solomon Islands | 2015 | Men | 8.87 | 7.97 | 9.77 | Observed_tanaka
Solomon Islands | 2015 | Men | 11.62 | 10.43 | 12.8 | Observed_toft
Solomon Islands | 2015 | Women | 7.03 | 6.42 | 7.64 | Observed_intersalt
Solomon Islands | 2015 | Women | 11.38 | 8.78 | 13.98 | Observed_kawasaki
Solomon Islands | 2015 | Women | 8.98 | 7.34 | 10.61 | Observed_tanaka
Solomon Islands | 2015 | Women | 7.95 | 7.26 | 8.64 | Observed_toft
Sudan | 2016 | Men | 8.53 | 7.73 | 9.33 | Observed_intersalt
Sudan | 2016 | Men | 11.66 | 10.66 | 12.66 | Observed_kawasaki
Sudan | 2016 | Men | 8.49 | 7.91 | 9.08 | Observed_tanaka
Sudan | 2016 | Men | 10.83 | 10.17 | 11.5 | Observed_toft
Sudan | 2016 | Women | 7.49 | 7.09 | 7.88 | Observed_intersalt
Sudan | 2016 | Women | 11.3 | 10.6 | 12.01 | Observed_kawasaki
Sudan | 2016 | Women | 9.31 | 8.85 | 9.78 | Observed_tanaka
Sudan | 2016 | Women | 8.09 | 7.89 | 8.3 | Observed_toft
Tokelau | 2014 | Men | 10.29 | 10.18 | 10.4 | Observed_intersalt
Tokelau | 2014 | Men | 14.33 | 13.16 | 15.5 | Observed_kawasaki
Tokelau | 2014 | Men | 10.1 | 9.48 | 10.72 | Observed_tanaka
Tokelau | 2014 | Men | 12.42 | 11.71 | 13.14 | Observed_toft
Tokelau | 2014 | Women | 8.12 | 7.61 | 8.63 | Observed_intersalt
Tokelau | 2014 | Women | 11.4 | 9.85 | 12.95 | Observed_kawasaki
Tokelau | 2014 | Women | 9.71 | 8.69 | 10.72 | Observed_tanaka
Tokelau | 2014 | Women | 8.15 | 7.76 | 8.54 | Observed_toft
Tonga | 2017 | Men | 9.19 | 8.89 | 9.48 | Observed_intersalt
Tonga | 2017 | Men | 10.06 | 9.17 | 10.95 | Observed_kawasaki
Tonga | 2017 | Men | 7.72 | 7.22 | 8.22 | Observed_tanaka
Tonga | 2017 | Men | 9.77 | 9.19 | 10.35 | Observed_toft
Tonga | 2017 | Women | 7.63 | 7.45 | 7.81 | Observed_intersalt
Tonga | 2017 | Women | 9.37 | 8.88 | 9.87 | Observed_kawasaki
Tonga | 2017 | Women | 8.41 | 8.06 | 8.76 | Observed_tanaka
Tonga | 2017 | Women | 7.53 | 7.37 | 7.68 | Observed_toft
Turkmenistan | 2018 | Men | 8.94 | 8.79 | 9.09 | Observed_intersalt
Turkmenistan | 2018 | Men | 12.11 | 11.93 | 12.3 | Observed_kawasaki
Turkmenistan | 2018 | Men | 8.85 | 8.74 | 8.96 | Observed_tanaka
Turkmenistan | 2018 | Men | 11.2 | 11.09 | 11.32 | Observed_toft
Turkmenistan | 2018 | Women | 6.76 | 6.68 | 6.83 | Observed_intersalt
Turkmenistan | 2018 | Women | 10.1 | 9.93 | 10.26 | Observed_kawasaki
Turkmenistan | 2018 | Women | 8.53 | 8.41 | 8.65 | Observed_tanaka
Turkmenistan | 2018 | Women | 7.78 | 7.73 | 7.83 | Observed_toft
Zambia | 2017 | Men | 8.45 | 8.15 | 8.75 | Observed_intersalt
Zambia | 2017 | Men | 12.7 | 12.09 | 13.3 | Observed_kawasaki
Zambia | 2017 | Men | 8.8 | 8.46 | 9.13 | Observed_tanaka
Zambia | 2017 | Men | 11.48 | 11.11 | 11.86 | Observed_toft
Zambia | 2017 | Women | 7.01 | 6.81 | 7.22 | Observed_intersalt
Zambia | 2017 | Women | 11.11 | 10.66 | 11.56 | Observed_kawasaki
Zambia | 2017 | Women | 8.8 | 8.52 | 9.09 | Observed_tanaka
Zambia | 2017 | Women | 8 | 7.86 | 8.13 | Observed_toft
Appendix 1—table 6
Weighted distribution of predictors in each of the 54 national surveys included in the application of the model herein developed.
Country | Year | Sample size | Mean age (years) | Age range (years) | Proportion of men (%) | SBP (mmHg), mean (min–max) | DBP (mmHg), mean (min–max) | Weight (kg), mean (min–max) | Height (m), mean (min–max)
American Samoa | 2004 | 2043 | 40 | 25–64 | 50.3 | 131 (84–230) | 82 (46–134) | 100.4 (38.6–219.1) | 1.69 (1.36–2.19)
Benin | 2015 | 4841 | 34 | 18–69 | 49.6 | 126 (74–254) | 82 (45–142) | 62.3 (30–167) | 1.64 (1.21–1.98)
Bahamas | 2012 | 1400 | 42 | 24–64 | 49.9 | 127 (73–248) | 82 (32–140) | 84.8 (27.9–184.9) | 1.67 (1.15–2.03)
Barbados | 2007 | 282 | 43 | 25–69 | 51.9 | 122 (86–191) | 80 (55–115) | 77.5 (40.6–232.1) | 1.67 (1.17–1.93)
British Virgin Islands | 2009 | 1067 | 43 | 25–64 | 54.1 | 130 (81–226) | 80 (48–126) | 83.2 (39.6–176.9) | 1.7 (1.14–2.26)
Botswana | 2014 | 3894 | 33 | 15–69 | 52.1 | 128 (84–262) | 80 (47–148) | 63.9 (31.7–171.1) | 1.66 (1.02–2)
Cook Islands | 2015 | 879 | 39 | 18–64 | 46.5 | 128 (92–194) | 79 (45–118) | 98.6 (49.1–205.1) | 1.69 (1.07–1.96)
Comoros | 2011 | 5029 | 39 | 25–64 | 52.6 | 128 (82–236) | 79 (48–144) | 64.2 (23.5–166) | 1.61 (1–2.15)
Cabo Verde | 2007 | 1723 | 38 | 25–64 | 50.3 | 133 (86–234) | 80 (48–140) | 68.3 (35–150) | 1.68 (1.23–1.96)
Cayman Islands | 2012 | 1229 | 42 | 24–64 | 50.7 | 125 (84–208) | 76 (46–127) | 82.3 (31–196) | 1.69 (1–2.1)
Algeria | 2017 | 6536 | 38 | 18–69 | 51.7 | 127 (77–227) | 75 (32–137) | 73.3 (25–174) | 1.67 (1.02–2.05)
Ecuador | 2018 | 4466 | 40 | 18–69 | 49.4 | 120 (78–220) | 76 (42–130) | 69.2 (33.4–198.4) | 1.59 (1.24–1.93)
Eritrea | 2010 | 5651 | 42 | 25–69 | 17.2 | 117 (72–230) | 74 (46–130) | 51.8 (28.1–99.1) | 1.6 (1.16–1.89)
Ethiopia | 2015 | 9270 | 31 | 15–69 | 56.1 | 120 (71–250) | 78 (30–142) | 54.4 (20–99.5) | 1.63 (1.05–2)
Fiji | 2011 | 2492 | 42 | 25–64 | 51 | 130 (84–228) | 80 (39–143) | 78.6 (30.3–198.1) | 1.68 (1.03–1.94)
Gambia | 2010 | 3496 | 38 | 25–64 | 50.4 | 130 (85–252) | 80 (44–144) | 64.8 (26.5–168.9) | 1.64 (1–2)
Grenada | 2011 | 1055 | 41 | 25–64 | 50.7 | 131 (71–212) | 80 (50–128) | 77.6 (40.8–158.8) | 1.7 (1.32–2.49)
Guyana | 2016 | 2625 | 37 | 18–69 | 52 | 126 (74–245) | 78 (37–149) | 69.9 (26.4–198) | 1.63 (1.01–2.07)
Iraq | 2015 | 3655 | 35 | 18–69 | 53.6 | 128 (78–225) | 83 (45–150) | 76.5 (36.6–187.2) | 1.65 (1.01–1.97)
Kenya | 2015 | 4270 | 34 | 16–69 | 50.6 | 125 (76–262) | 81 (46–146) | 63.2 (30–171.3) | 1.65 (1.01–1.95)
Kyrgyzstan | 2013 | 2539 | 41 | 25–64 | 51.9 | 133 (82–244) | 87 (56–150) | 71.7 (36.6–162.4) | 1.64 (1.38–1.95)
Cambodia | 2010 | 5223 | 40 | 25–64 | 49.4 | 116 (70–226) | 72 (42–138) | 53.7 (21.1–111) | 1.57 (1.24–1.85)
Kiribati | 2016 | 1240 | 40 | 18–69 | 42.8 | 128 (85–220) | 85 (49–148) | 81.1 (30–219) | 1.64 (1.22–1.89)
Kuwait | 2014 | 2871 | 36 | 18–69 | 49.5 | 120 (70–240) | 77 (50–130) | 80.5 (37.3–195) | 1.65 (1.04–1.96)
Lao People’s Democratic Republic | 2013 | 2464 | 39 | 16–65 | 42.3 | 119 (72–240) | 76 (30–130) | 54.2 (27–103.1) | 1.54 (1.16–1.97)
Liberia | 2011 | 2242 | 40 | 25–64 | 50.7 | 129 (88–232) | 80 (32–138) | 65.4 (32–163) | 1.58 (1–2.5)
Libya | 2009 | 3223 | 37 | 25–64 | 51.5 | 133 (74–238) | 79 (44–148) | 77 (31.7–186.2) | 1.67 (1–1.97)
Sri Lanka | 2015 | 4566 | 39 | 18–69 | 51.5 | 125 (74–258) | 81 (36–150) | 58 (26.2–156.9) | 1.59 (1.02–1.9)
Lesotho | 2012 | 2162 | 38 | 25–64 | 49.8 | 126 (78–250) | 83 (46–146) | 66.2 (21.5–164.6) | 1.61 (1.02–1.97)
Republic of Moldova | 2013 | 4077 | 39 | 18–69 | 52.5 | 133 (83–257) | 85 (49–148) | 75 (32.5–166) | 1.68 (1.2–1.98)
Marshall Islands | 2018 | 2657 | 39 | 17–69 | 48.5 | 120 (70–220) | 75 (40–134) | 74.4 (27–226.5) | 1.58 (1.01–2.15)
Myanmar | 2014 | 7892 | 42 | 25–64 | 50.4 | 126 (70–252) | 82 (35–144) | 57.1 (26.3–173) | 1.59 (1–2.18)
Mozambique | 2005 | 723 | 41 | 24–64 | 46 | 139 (85–220) | 82 (46–143) | 56.7 (33.4–109.5) | 1.6 (1.02–1.89)
Namibia | 2005 | 752 | 41 | 25–64 | 41.3 | 137 (87–230) | 86 (50–132) | 63.7 (26.5–134.3) | 1.63 (1.12–2)
Niger | 2007 | 2638 | 37 | 15–64 | 54.1 | 134 (70–260) | 82 (40–145) | 59.5 (24.3–162.2) | 1.67 (1.01–2.1)
Niue | 2012 | 779 | 40 | 15–69 | 50.1 | 128 (89–223) | 76 (44–117) | 91.5 (44.7–165.9) | 1.69 (1.17–1.96)
Nauru | 2016 | 1037 | 36 | 18–69 | 50 | 123 (76–223) | 80 (46–125) | 92.4 (43.4–197.9) | 1.63 (1.41–1.86)
Palau | 2013 | 2148 | 43 | 25–64 | 53 | 138 (87–236) | 85 (40–135) | 79.4 (32–180.6) | 1.62 (1.02–2.03)
French Polynesia | 2010 | 2239 | 36 | 18–64 | 50.7 | 125 (86–230) | 79 (48–150) | 86.2 (41–193) | 1.7 (1.41–2)
Qatar | 2012 | 2287 | 35 | 18–64 | 50.9 | 119 (78–203) | 79 (46–130) | 79.1 (34.4–190.5) | 1.64 (1.35–2)
Rwanda | 2013 | 6882 | 32 | 15–64 | 48.8 | 121 (75–250) | 78 (45–140) | 57 (23.1–165.8) | 1.6 (1–1.91)
Sierra Leone | 2009 | 4473 | 40 | 25–64 | 50.3 | 131 (72–220) | 81 (42–148) | 60 (28–185) | 1.62 (1–2.34)
Sao Tome and Principe | 2008 | 2272 | 40 | 25–64 | 48.4 | 135 (78–240) | 82 (34–143) | 66.1 (30–186.2) | 1.64 (1.01–1.98)
Eswatini | 2014 | 3042 | 31 | 15–69 | 47.4 | 124 (72–252) | 80 (42–150) | 67.8 (22.2–227.6) | 1.63 (1.01–2.02)
Togo | 2011 | 3995 | 32 | 15–64 | 49.3 | 123 (70–251) | 77 (31–142) | 61.6 (26–165) | 1.64 (1.02–1.99)
Tajikistan | 2017 | 2643 | 32 | 18–69 | 53.8 | 129 (81–267) | 84 (54–150) | 66.7 (27.8–148) | 1.63 (1.09–2)
Timor-Leste | 2014 | 2480 | 36 | 18–69 | 63.8 | 130 (72–235) | 84 (42–136) | 52 (27–165) | 1.57 (1.24–1.83)
Tuvalu | 2015 | 1024 | 39 | 18–69 | 54.9 | 134 (92–246) | 84 (48–145) | 91.9 (35.8–181.8) | 1.68 (1.17–2.06)
United Republic of Tanzania | 2012 | 5381 | 39 | 25–64 | 50.6 | 129 (80–240) | 80 (40–146) | 60.6 (29–171.1) | 1.63 (1.13–1.97)
Uganda | 2014 | 3673 | 35 | 18–69 | 50.5 | 125 (83–249) | 81 (50–148) | 59.4 (30.2–165) | 1.62 (1.15–2.03)
Uruguay | 2014 | 2207 | 38 | 15–64 | 47.8 | 125 (82–232) | 79 (44–134) | 74.6 (34.3–158) | 1.67 (1.36–2.05)
Vietnam | 2015 | 3033 | 39 | 18–69 | 50.4 | 120 (71–224) | 77 (40–128) | 54.7 (27.8–106.4) | 1.58 (1.01–1.98)
Vanuatu | 2011 | 4420 | 40 | 25–64 | 47.7 | 130 (77–269) | 80 (38–139) | 69.4 (28.3–199.8) | 1.63 (1.02–2.1)
Samoa | 2013 | 1490 | 37 | 18–64 | 54.1 | 125 (80–222) | 75 (44–132) | 90.3 (32.1–160) | 1.68 (1.22–1.97)
  1. SBP: systolic blood pressure; DBP: diastolic blood pressure.

Appendix 1—table 7
Predicted mean salt intake (g/day) by sex in each of the 54 national surveys included in the application of the model herein developed.
Country | Year | Sex | Mean salt intake | Lower 95% CI | Upper 95% CI
Algeria | 2017 | Men | 9.26 | 9.22 | 9.3
Algeria | 2017 | Women | 7.54 | 7.5 | 7.58
Algeria | 2017 | Total | 8.43 | 8.39 | 8.47
American Samoa | 2004 | Men | 10.9 | 10.8 | 10.99
American Samoa | 2004 | Women | 9.03 | 8.96 | 9.11
American Samoa | 2004 | Total | 9.97 | 9.93 | 10.01
Bahamas | 2012 | Men | 10.09 | 9.83 | 10.35
Bahamas | 2012 | Women | 8.11 | 7.81 | 8.4
Bahamas | 2012 | Total | 9.1 | 8.9 | 9.29
Barbados | 2007 | Men | 9.42 | 9.25 | 9.6
Barbados | 2007 | Women | 7.85 | 7.54 | 8.17
Barbados | 2007 | Total | 8.67 | 8.45 | 8.89
Benin | 2015 | Men | 8.96 | 8.9 | 9.03
Benin | 2015 | Women | 7.01 | 6.89 | 7.12
Benin | 2015 | Total | 7.98 | 7.81 | 8.15
Botswana | 2014 | Men | 8.74 | 8.68 | 8.79
Botswana | 2014 | Women | 7.2 | 7.14 | 7.26
Botswana | 2014 | Total | 8 | 7.94 | 8.06
British Virgin Islands | 2009 | Men | 9.73 | 9.66 | 9.81
British Virgin Islands | 2009 | Women | 7.85 | 7.82 | 7.88
British Virgin Islands | 2009 | Total | 8.87 | 8.82 | 8.92
Cabo Verde | 2007 | Men | 8.98 | 8.93 | 9.03
Cabo Verde | 2007 | Women | 7.13 | 7.03 | 7.23
Cabo Verde | 2007 | Total | 8.06 | 7.97 | 8.16
Cambodia | 2010 | Men | 8.83 | 8.8 | 8.86
Cambodia | 2010 | Women | 6.83 | 6.81 | 6.86
Cambodia | 2010 | Total | 7.82 | 7.78 | 7.86
Cayman Islands | 2012 | Men | 9.73 | 9.69 | 9.77
Cayman Islands | 2012 | Women | 7.92 | 7.61 | 8.23
Cayman Islands | 2012 | Total | 8.84 | 8.75 | 8.92
Comoros | 2011 | Men | 9.06 | 9.02 | 9.1
Comoros | 2011 | Women | 7.43 | 7.38 | 7.47
Comoros | 2011 | Total | 8.29 | 8.24 | 8.33
Cook Islands | 2015 | Men | 10.87 | 10.73 | 11.01
Cook Islands | 2015 | Women | 8.74 | 8.63 | 8.86
Cook Islands | 2015 | Total | 9.73 | 9.59 | 9.88
Ecuador | 2018 | Men | 9.6 | 9.55 | 9.65
Ecuador | 2018 | Women | 7.65 | 7.6 | 7.69
Ecuador | 2018 | Total | 8.61 | 8.55 | 8.68
Eritrea | 2010 | Men | 8.32 | 8.27 | 8.37
Eritrea | 2010 | Women | 6.48 | 6.43 | 6.52
Eritrea | 2010 | Total | 6.79 | 6.75 | 6.84
Eswatini | 2014 | Men | 9.11 | 9.02 | 9.2
Eswatini | 2014 | Women | 7.62 | 7.56 | 7.68
Eswatini | 2014 | Total | 8.33 | 8.27 | 8.39
Ethiopia | 2015 | Men | 8.52 | 8.49 | 8.54
Ethiopia | 2015 | Women | 6.62 | 6.59 | 6.65
Ethiopia | 2015 | Total | 7.68 | 7.65 | 7.72
Fiji | 2011 | Men | 9.53 | 9.44 | 9.62
Fiji | 2011 | Women | 7.84 | 7.76 | 7.91
Fiji | 2011 | Total | 8.7 | 8.6 | 8.8
French Polynesia | 2010 | Men | 10.1 | 10 | 10.2
French Polynesia | 2010 | Women | 8 | 7.9 | 8.1
French Polynesia | 2010 | Total | 9.06 | 8.98 | 9.15
Gambia | 2010 | Men | 9.05 | 8.94 | 9.17
Gambia | 2010 | Women | 7.17 | 7.1 | 7.25
Gambia | 2010 | Total | 8.12 | 8.03 | 8.22
Grenada | 2011 | Men | 9.21 | 9.12 | 9.31
Grenada | 2011 | Women | 7.74 | 7.64 | 7.84
Grenada | 2011 | Total | 8.49 | 8.4 | 8.58
Guyana | 2016 | Men | 9.26 | 9.16 | 9.35
Guyana | 2016 | Women | 7.67 | 7.6 | 7.74
Guyana | 2016 | Total | 8.5 | 8.43 | 8.56
Iraq | 2015 | Men | 9.66 | 9.58 | 9.75
Iraq | 2015 | Women | 7.94 | 7.88 | 8.01
Iraq | 2015 | Total | 8.87 | 8.8 | 8.93
Kenya | 2015 | Men | 8.82 | 8.73 | 8.9
Kenya | 2015 | Women | 7.13 | 7.04 | 7.21
Kenya | 2015 | Total | 7.98 | 7.89 | 8.07
Kiribati | 2016 | Men | 9.92 | 9.74 | 10.09
Kiribati | 2016 | Women | 8.27 | 8.14 | 8.39
Kiribati | 2016 | Total | 8.97 | 8.86 | 9.09
Kuwait | 2014 | Men | 10.06 | 9.99 | 10.12
Kuwait | 2014 | Women | 7.95 | 7.91 | 8
Kuwait | 2014 | Total | 8.99 | 8.94 | 9.05
Kyrgyzstan | 2013 | Men | 9.45 | 9.34 | 9.55
Kyrgyzstan | 2013 | Women | 7.62 | 7.56 | 7.67
Kyrgyzstan | 2013 | Total | 8.57 | 8.5 | 8.63
Lao People’s Democratic Republic | 2013 | Men | 9.03 | 8.98 | 9.08
Lao People’s Democratic Republic | 2013 | Women | 7.07 | 7.02 | 7.12
Lao People’s Democratic Republic | 2013 | Total | 7.9 | 7.83 | 7.97
Lesotho | 2012 | Men | 9.08 | 8.99 | 9.17
Lesotho | 2012 | Women | 7.7 | 7.6 | 7.79
Lesotho | 2012 | Total | 8.38 | 8.31 | 8.46
Liberia | 2011 | Men | 9.43 | 9.32 | 9.55
Liberia | 2011 | Women | 7.58 | 7.48 | 7.69
Liberia | 2011 | Total | 8.52 | 8.41 | 8.63
Libya | 2009 | Men | 9.51 | 9.44 | 9.59
Libya | 2009 | Women | 7.81 | 7.73 | 7.89
Libya | 2009 | Total | 8.69 | 8.63 | 8.75
Marshall Islands | 2018 | Men | 9.92 | 9.86 | 9.99
Marshall Islands | 2018 | Women | 8.16 | 8.1 | 8.21
Marshall Islands | 2018 | Total | 9.01 | 8.96 | 9.07
Mozambique | 2005 | Men | 8.72 | 8.62 | 8.83
Mozambique | 2005 | Women | 6.92 | 6.84 | 7
Mozambique | 2005 | Total | 7.75 | 7.63 | 7.87
Myanmar | 2014 | Men | 8.81 | 8.74 | 8.88
Myanmar | 2014 | Women | 7.07 | 6.97 | 7.17
Myanmar | 2014 | Total | 7.95 | 7.88 | 8.02
Namibia | 2005 | Men | 8.74 | 8.59 | 8.89
Namibia | 2005 | Women | 7.24 | 6.93 | 7.56
Namibia | 2005 | Total | 7.86 | 7.63 | 8.09
Nauru | 2016 | Men | 10.98 | 10.87 | 11.1
Nauru | 2016 | Women | 8.79 | 8.63 | 8.94
Nauru | 2016 | Total | 9.89 | 9.74 | 10.03
Niger | 2007 | Men | 8.56 | 8.52 | 8.6
Niger | 2007 | Women | 6.67 | 6.63 | 6.71
Niger | 2007 | Total | 7.69 | 7.65 | 7.74
Niue | 2012 | Men | 10.39 | 10.28 | 10.51
Niue | 2012 | Women | 8.39 | 8.27 | 8.51
Niue | 2012 | Total | 9.4 | 9.29 | 9.5
Palau | 2013 | Men | 10.18 | 10.07 | 10.28
Palau | 2013 | Women | 7.99 | 7.9 | 8.08
Palau | 2013 | Total | 9.15 | 9.05 | 9.25
Qatar | 2012 | Men | 10.02 | 9.93 | 10.11
Qatar | 2012 | Women | 7.94 | 7.85 | 8.04
Qatar | 2012 | Total | 9 | 8.9 | 9.09
Republic of Moldova | 2013 | Men | 9.51 | 9.45 | 9.57
Republic of Moldova | 2013 | Women | 7.46 | 7.41 | 7.52
Republic of Moldova | 2013 | Total | 8.54 | 8.48 | 8.6
Rwanda | 2013 | Men | 8.87 | 8.85 | 8.9
Rwanda | 2013 | Women | 7.02 | 6.99 | 7.05
Rwanda | 2013 | Total | 7.92 | 7.89 | 7.96
Samoa | 2013 | Men | 10.23 | 10.09 | 10.37
Samoa | 2013 | Women | 8.61 | 8.51 | 8.71
Samoa | 2013 | Total | 9.49 | 9.41 | 9.57
Sao Tome and Principe | 2008 | Men | 9.05 | 8.97 | 9.12
Sao Tome and Principe | 2008 | Women | 7.21 | 7.1 | 7.32
Sao Tome and Principe | 2008 | Total | 8.1 | 7.99 | 8.2
Sierra Leone | 2009 | Men | 8.85 | 8.76 | 8.94
Sierra Leone | 2009 | Women | 7 | 6.9 | 7.11
Sierra Leone | 2009 | Total | 7.93 | 7.82 | 8.04
Sri Lanka | 2015 | Men | 8.91 | 8.86 | 8.95
Sri Lanka | 2015 | Women | 7.07 | 7.03 | 7.1
Sri Lanka | 2015 | Total | 8.01 | 7.97 | 8.06
Tajikistan | 2017 | Men | 9.41 | 9.34 | 9.49
Tajikistan | 2017 | Women | 7.35 | 7.3 | 7.41
Tajikistan | 2017 | Total | 8.46 | 8.38 | 8.55
Timor-Leste | 2014 | Men | 8.91 | 8.79 | 9.02
Timor-Leste | 2014 | Women | 6.8 | 6.75 | 6.86
Timor-Leste | 2014 | Total | 8.15 | 7.86 | 8.43
Togo | 2011 | Men | 8.82 | 8.79 | 8.86
Togo | 2011 | Women | 7.01 | 6.96 | 7.06
Togo | 2011 | Total | 7.9 | 7.85 | 7.96
Tuvalu | 2015 | Men | 10.37 | 10.24 | 10.5
Tuvalu | 2015 | Women | 8.72 | 8.62 | 8.83
Tuvalu | 2015 | Total | 9.63 | 9.53 | 9.73
Uganda | 2014 | Men | 8.8 | 8.76 | 8.84
Uganda | 2014 | Women | 7.02 | 6.96 | 7.07
Uganda | 2014 | Total | 7.92 | 7.86 | 7.98
United Republic of Tanzania | 2012 | Men | 8.71 | 8.63 | 8.79
United Republic of Tanzania | 2012 | Women | 7.13 | 7.05 | 7.21
United Republic of Tanzania | 2012 | Total | 7.93 | 7.88 | 7.98
Uruguay | 2014 | Men | 9.55 | 9.48 | 9.63
Uruguay | 2014 | Women | 7.47 | 7.41 | 7.52
Uruguay | 2014 | Total | 8.46 | 8.39 | 8.53
Vanuatu | 2011 | Men | 9.38 | 9.33 | 9.43
Vanuatu | 2011 | Women | 7.45 | 7.4 | 7.5
Vanuatu | 2011 | Total | 8.37 | 8.31 | 8.43
Vietnam | 2015 | Men | 8.91 | 8.86 | 8.95
Vietnam | 2015 | Women | 6.84 | 6.81 | 6.88
Vietnam | 2015 | Total | 7.88 | 7.83 | 7.94
Appendix 1—table 8
Comparison between mean salt intake (g/day) predictions and global estimates across national surveys included in the application of our machine learning model.
Country | Year (machine learning predictions) | Machine learning predicted mean salt intake (95% CI) | Year (global estimates) | Estimated mean salt intake (95% CI) | Ratio between machine learning predicted and global estimates
Algeria | 2017 | 8.4 (8.4–8.5) | 2010 | 10.7 (9–12.5) | 0.8
Bahamas | 2012 | 9.1 (8.9–9.3) | 2010 | 7.5 (6.2–8.8) | 1.2
Barbados | 2007 | 8.7 (8.4–8.9) | 2010 | 8.6 (7.8–9.4) | 1
Benin | 2015 | 8 (7.8–8.2) | 2010 | 7.1 (6.2–8.1) | 1.1
Botswana | 2014 | 8 (7.9–8.1) | 2010 | 6.3 (5.4–7.4) | 1.3
Cabo Verde | 2007 | 8.1 (8–8.2) | 2010 | 8.1 (6.8–9.7) | 1
Cambodia | 2010 | 7.8 (7.8–7.9) | 2010 | 11 (9.3–12.9) | 0.7
Comoros | 2011 | 8.3 (8.2–8.3) | 2010 | 4.2 (3.5–5) | 2
Ecuador | 2018 | 8.6 (8.6–8.7) | 2010 | 7.6 (6.4–8.9) | 1.1
Eritrea | 2010 | 6.8 (6.8–6.8) | 2010 | 5.9 (5–7) | 1.2
Ethiopia | 2015 | 7.7 (7.7–7.7) | 2010 | 5.7 (4.9–6.7) | 1.4
Fiji | 2011 | 8.7 (8.6–8.8) | 2010 | 7.2 (6–8.5) | 1.2
Gambia | 2010 | 8.1 (8–8.2) | 2010 | 7.7 (6.5–8.9) | 1.1
Grenada | 2011 | 8.5 (8.4–8.6) | 2010 | 6.5 (5.5–7.7) | 1.3
Guyana | 2016 | 8.5 (8.4–8.6) | 2010 | 6.1 (5.1–7.3) | 1.4
Iraq | 2015 | 8.9 (8.8–8.9) | 2010 | 9.4 (8–11.2) | 0.9
Kenya | 2015 | 8 (7.9–8.1) | 2010 | 3.7 (3.4–4) | 2.2
Kiribati | 2016 | 9 (8.9–9.1) | 2010 | 5.6 (4.6–6.7) | 1.6
Kuwait | 2014 | 9 (8.9–9.1) | 2010 | 9.7 (8.7–10.8) | 0.9
Kyrgyzstan | 2013 | 8.6 (8.5–8.6) | 2010 | 13.4 (11.4–15.8) | 0.6
Lao People’s Democratic Republic | 2013 | 7.9 (7.8–8) | 2010 | 11.1 (9.4–13.2) | 0.7
Lesotho | 2012 | 8.4 (8.3–8.5) | 2010 | 6.6 (5.5–7.8) | 1.3
Liberia | 2011 | 8.5 (8.4–8.6) | 2010 | 6.7 (5.6–7.9) | 1.3
Libya | 2009 | 8.7 (8.6–8.8) | 2010 | 10.6 (8.9–12.5) | 0.8
Marshall Islands | 2018 | 9 (9–9.1) | 2010 | 6.4 (5.4–7.5) | 1.4
Mozambique | 2005 | 7.8 (7.6–7.9) | 2010 | 5.6 (4.7–6.6) | 1.4
Myanmar | 2014 | 8 (7.9–8) | 2010 | 11.2 (9.4–13.2) | 0.7
Namibia | 2005 | 7.9 (7.6–8.1) | 2010 | 6.6 (5.6–7.7) | 1.2
Niger | 2007 | 7.7 (7.7–7.7) | 2010 | 7.3 (6.2–8.6) | 1.1
Qatar | 2012 | 9 (8.9–9.1) | 2010 | 10.5 (8.3–12.9) | 0.9
Republic of Moldova | 2013 | 8.5 (8.5–8.6) | 2010 | 9.9 (8.3–11.6) | 0.9
Rwanda | 2013 | 7.9 (7.9–8) | 2010 | 4 (3.3–4.9) | 2
Samoa | 2013 | 9.5 (9.4–9.6) | 2010 | 5.2 (4.6–5.8) | 1.8
Sao Tome and Principe | 2008 | 8.1 (8–8.2) | 2010 | 5.9 (4.9–6.9) | 1.4
Sierra Leone | 2009 | 7.9 (7.8–8) | 2010 | 6.3 (5.3–7.3) | 1.3
Sri Lanka | 2015 | 8 (8–8.1) | 2010 | 9.7 (8.2–11.3) | 0.8
Tajikistan | 2017 | 8.5 (8.4–8.6) | 2010 | 13.5 (11.6–15.7) | 0.6
Timor-Leste | 2014 | 8.2 (7.9–8.4) | 2010 | 11.2 (9.3–13.3) | 0.7
Uganda | 2014 | 7.9 (7.9–8) | 2010 | 5.3 (4.4–6.3) | 1.5
United Republic of Tanzania | 2012 | 7.9 (7.9–8) | 2010 | 6.9 (6.1–7.7) | 1.1
Uruguay | 2014 | 8.5 (8.4–8.5) | 2010 | 6.8 (5.8–8) | 1.2
Vanuatu | 2011 | 8.4 (8.3–8.4) | 2010 | 5.6 (4.8–6.6) | 1.5
Vietnam | 2015 | 7.9 (7.8–7.9) | 2010 | 11.5 (9.5–13.7) | 0.7
  1. There are 43 countries in this table; that is, countries included in our analysis that were not available in the previous global work were not included in this table (Powles et al., 2013).
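The ratio column in Appendix 1—table 8 divides the ML-predicted mean by the global estimate. The short sketch below reproduces that arithmetic from the rounded values shown in the table; the function name is hypothetical, and the published ratios may have been computed from unrounded means, so small discrepancies are possible for other rows.

```python
def prediction_ratio(ml_predicted, global_estimate):
    """Ratio of the ML-predicted mean to the global estimate, to one decimal."""
    return round(ml_predicted / global_estimate, 1)


algeria = prediction_ratio(8.4, 10.7)  # table row: Algeria
kenya = prediction_ratio(8.0, 3.7)     # table row: Kenya
```

A ratio below 1 (as for Algeria) means the ML prediction is lower than the global estimate, and a ratio above 1 (as for Kenya) means it is higher.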

Appendix 1—table 9
Countries included in the analysis by income group according to the World Bank classification.
| Analysis | World region | Country | Year | Income group |
| --- | --- | --- | --- | --- |
| Model application | Africa | Algeria | 2017 | Upper-middle |
| Model application | Western Pacific | American Samoa | 2004 | Upper-middle |
| Model application | Americas | Bahamas | 2012 | High |
| Model application | Americas | Barbados | 2007 | High |
| Model application | Africa | Benin | 2015 | Lower |
| Model application | Africa | Botswana | 2014 | Upper-middle |
| Model application | Americas | British Virgin Islands | 2009 | No data |
| Model application | Africa | Cabo Verde | 2007 | Lower-middle |
| Model application | Western Pacific | Cambodia | 2010 | Lower |
| Model application | Americas | Cayman Islands | 2012 | High |
| Model application | Africa | Comoros | 2011 | Lower |
| Model application | Western Pacific | Cook Islands | 2015 | No data |
| Model application | Americas | Ecuador | 2018 | Upper-middle |
| Model application | Africa | Eritrea | 2010 | Lower |
| Model application | Africa | Eswatini | 2014 | Lower-middle |
| Model application | Africa | Ethiopia | 2015 | Lower |
| Model application | Western Pacific | Fiji | 2011 | Lower-middle |
| Model application | Western Pacific | French Polynesia | 2010 | High |
| Model application | Africa | Gambia | 2010 | Lower |
| Model application | Americas | Grenada | 2011 | Upper-middle |
| Model application | Americas | Guyana | 2016 | Upper-middle |
| Model application | Eastern Mediterranean | Iraq | 2015 | Upper-middle |
| Model application | Africa | Kenya | 2015 | Lower-middle |
| Model application | Western Pacific | Kiribati | 2016 | Lower-middle |
| Model application | Eastern Mediterranean | Kuwait | 2014 | High |
| Model application | Eastern Mediterranean | Kyrgyzstan | 2013 | Lower-middle |
| Model application | Western Pacific | Lao People’s Democratic Republic | 2013 | Lower-middle |
| Model application | Africa | Lesotho | 2012 | Lower-middle |
| Model application | Africa | Liberia | 2011 | Lower |
| Model application | Eastern Mediterranean | Libya | 2009 | Upper-middle |
| Model application | Western Pacific | Marshall Islands | 2018 | Upper-middle |
| Model application | Africa | Mozambique | 2005 | Lower |
| Model application | Southeast Asia | Myanmar | 2014 | Lower-middle |
| Model application | Africa | Namibia | 2005 | Lower-middle |
| Model application | Western Pacific | Nauru | 2016 | Upper-middle |
| Model application | Africa | Niger | 2007 | Lower |
| Model application | Western Pacific | Niue | 2012 | No data |
| Model application | Western Pacific | Palau | 2013 | Upper-middle |
| Model application | Eastern Mediterranean | Qatar | 2012 | High |
| Model application | Europe | Republic of Moldova | 2013 | Lower-middle |
| Model application | Africa | Rwanda | 2013 | Lower |
| Model application | Western Pacific | Samoa | 2013 | Lower-middle |
| Model application | Africa | Sao Tome and Principe | 2008 | Lower-middle |
| Model application | Africa | Sierra Leone | 2009 | Lower |
| Model application | Southeast Asia | Sri Lanka | 2015 | Lower-middle |
| Model application | Europe | Tajikistan | 2017 | Lower |
| Model application | Southeast Asia | Timor-Leste | 2014 | Lower-middle |
| Model application | Africa | Togo | 2011 | Lower |
| Model application | Western Pacific | Tuvalu | 2015 | Upper-middle |
| Model application | Africa | Uganda | 2014 | Lower |
| Model application | Africa | United Republic of Tanzania | 2012 | Lower |
| Model application | Americas | Uruguay | 2014 | High |
| Model application | Western Pacific | Vanuatu | 2011 | Lower-middle |
| Model application | Western Pacific | Vietnam | 2015 | Lower-middle |
| Model derivation | Europe | Armenia | 2016 | Lower-middle |
| Model derivation | Europe | Azerbaijan | 2017 | Upper-middle |
| Model derivation | Southeast Asia | Bangladesh | 2018 | Lower-middle |
| Model derivation | Europe | Belarus | 2017 | Upper-middle |
| Model derivation | Southeast Asia | Bhutan | 2014 | Lower-middle |
| Model derivation | Southeast Asia | Bhutan | 2019 | Lower-middle |
| Model derivation | Western Pacific | Brunei Darussalam | 2016 | High |
| Model derivation | Americas | Chile | 2017 | High |
| Model derivation | Eastern Mediterranean | Jordan | 2019 | Upper-middle |
| Model derivation | Eastern Mediterranean | Lebanon | 2017 | Upper-middle |
| Model derivation | Africa | Malawi | 2017 | Lower |
| Model derivation | Western Pacific | Mongolia | 2013 | Lower-middle |
| Model derivation | Western Pacific | Mongolia | 2019 | Lower-middle |
| Model derivation | Eastern Mediterranean | Morocco | 2017 | Lower-middle |
| Model derivation | Southeast Asia | Nepal | 2019 | Lower-middle |
| Model derivation | Western Pacific | Solomon Islands | 2015 | Lower-middle |
| Model derivation | Eastern Mediterranean | Sudan | 2016 | Lower-middle |
| Model derivation | Western Pacific | Tokelau | 2014 | No data |
| Model derivation | Western Pacific | Tonga | 2017 | Upper-middle |
| Model derivation | Europe | Turkmenistan | 2018 | Upper-middle |
| Model derivation | Africa | Zambia | 2017 | Lower-middle |

Appendix 1—figure 1
Flowchart of data cleaning and inclusion criteria for model derivation.

Appendix 2

Expanded methods

Characteristics of the surveys included in the analysis

We analyzed WHO STEPS surveys and one national health survey (Chile) (World Health Organization, 2021b; Wang et al., 2020). These surveys included a random sample of the general population and can deliver nationally representative estimates. These are household surveys that stratify by the first administrative level in the country (e.g., region); within this level, further stratification may occur by, for example, urban/rural location. Then, a random sample of census tracts, villages, neighborhoods, or other similar division is selected. In each of these primary sampling units, households are randomly sampled for the interview.

All surveys followed standard procedures (World Health Organization, 2021b; Wang et al., 2020). Briefly, participants were given a small container along with instructions for the urine collection; the next day, participants brought the urine sample to a designated place. Then, urine samples were analyzed at a laboratory by a trained technician.

Overview

We worked with a structured dataset that mostly had numeric attributes (variables). Given our study problem, we opted for a supervised learning model because there was a target attribute (i.e., salt consumption at the subject level); specifically, we conducted a supervised regression because the target attribute was a numeric variable. For the machine learning analyses, we used Python and the Scikit-Learn library.

First, we developed a pipeline for data management and model development. This way, we followed a consistent and transparent methodology to secure an optimal model for the training set that would also generalize adequately to other (unseen) datasets. Appendix 2—figure 1 depicts the pipeline we developed: (i) we studied the available data and, where needed, applied one-hot encoding; (ii) we conducted a feature importance analysis; (iii) we chose and tried different scaling and transformation methods, so that all variables would be on the same scale or in the same units; (iv) we tried a set of machine learning models, including a customized neural network; and (v) we forecasted (predicted) the attribute of interest (salt consumption at the subject level) in an unseen dataset (i.e., not used for model training). Notably, we went backward and forward (see arrows in the figure) between the first four stages until we reached the best combinations and results for each model. In the following sections, we describe each of these five stages.

Appendix 2—figure 1
Pipeline for data management and model development.

PCA, principal component analysis; LiR, linear regression; HuR, Huber regressor; RiR, ridge regressor; MLP, multilayer perceptron; SVR, support vector regressor; KNN, k-nearest neighbors; RF, random forest; GBM, gradient boosting machine; XGB, extreme gradient boosting; NN, neural network.

Data analysis

This was an exploratory analysis to understand the dataset and its characteristics. We worked with a complete-case dataset; in other words, we excluded observations with missing values in the variables considered in the analysis. Consequently, we did not impute any data.

We explored the distribution of all numerical variables, which were in different units and scales; this exploratory analysis informed the choices of data processing methods (e.g., Box-Cox) implemented in the third stage.

Feature importance analysis

Even though we followed expert knowledge to select a small but relevant set of predictors to be included in the regression model, we conducted feature importance analyses to understand the role each predictor would play in the model. This process aimed to identify and eliminate variables that would not carry substantial information for the model. We used random forest, recursive feature elimination, and extra trees. All three methods consistently suggested that every chosen predictor would contribute to a better model.
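As an illustration of the three feature importance methods named above, the following is a minimal sketch using scikit-learn. It is not the analysis code of the study: the data are synthetic, the coefficients are invented, the base estimator for recursive feature elimination (linear regression) and the number of retained features are arbitrary assumptions, and the predictor names simply mirror those listed in the abstract.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data; every value and coefficient is invented for illustration.
rng = np.random.default_rng(0)
n = 500
predictors = ["sex", "age", "weight", "height", "sbp", "dbp"]
X = np.column_stack([
    rng.integers(0, 2, n),    # sex coded 0/1
    rng.uniform(15, 69, n),   # age (years)
    rng.normal(65, 12, n),    # weight (kg)
    rng.normal(162, 9, n),    # height (cm)
    rng.normal(125, 15, n),   # systolic blood pressure (mmHg)
    rng.normal(80, 10, n),    # diastolic blood pressure (mmHg)
])
y = 6 + 0.9 * X[:, 0] - 0.01 * X[:, 1] + 0.03 * X[:, 2] + rng.normal(0, 1, n)

random_forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
extra_trees = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
# RFE needs a base estimator; linear regression is an arbitrary choice here.
rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X, y)

importance = {
    name: {
        "random_forest": float(random_forest.feature_importances_[i]),
        "extra_trees": float(extra_trees.feature_importances_[i]),
        "rfe_rank": int(rfe.ranking_[i]),  # 1 = retained by RFE
    }
    for i, name in enumerate(predictors)
}

for name, scores in importance.items():
    print(name, scores)
```

Tree-based importances sum to one across predictors, whereas recursive feature elimination returns a rank, so the three outputs are best compared by ordering rather than by magnitude.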

Data processing

As described in the data analysis section (first stage), numeric variables were in different units and scales; therefore, these variables needed to be scaled or transformed. This scaling would also help to find a better prediction model, because machine learning models can perform differently depending on the data transformation method. We applied (i) min-max scaling, whereby numeric variables were scaled to a range between 0 and 1; (ii) standardization; (iii) normalization; (iv) polynomial features of degree 2 (quadratic polynomial); (v) principal component analysis with three components and explained variance of ≥0.95; and (vi) Box-Cox transformation.
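The six processing methods can be sketched with scikit-learn transformers as follows. This is a hedged illustration on synthetic data, not the code used in the study; in particular, mapping "normalization" to `Normalizer` (row-wise unit norm) and Box-Cox to `PowerTransformer(method="box-cox")` are assumptions about which implementations were used.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import (
    MinMaxScaler,
    Normalizer,
    PolynomialFeatures,
    PowerTransformer,
    StandardScaler,
)

# Synthetic numeric predictors (invented values); Box-Cox requires positive data.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(15, 69, n),                     # age (years)
    np.clip(rng.normal(65, 12, n), 30, None),   # weight (kg)
    rng.normal(162, 9, n),                      # height (cm)
    rng.normal(125, 15, n),                     # systolic blood pressure (mmHg)
    rng.normal(80, 10, n),                      # diastolic blood pressure (mmHg)
])

transformers = {
    "min-max": MinMaxScaler(),                   # each column scaled to [0, 1]
    "standardization": StandardScaler(),         # zero mean, unit variance
    "normalization": Normalizer(),               # each row scaled to unit L2 norm
    "polynomial": PolynomialFeatures(degree=2, include_bias=False),
    "pca": PCA(n_components=3),
    "box-cox": PowerTransformer(method="box-cox"),
}

transformed = {name: t.fit_transform(X) for name, t in transformers.items()}

for name, values in transformed.items():
    print(name, values.shape)
print("PCA explained variance:", transformers["pca"].explained_variance_ratio_.sum())
```

In practice each transformer would be fitted on the training set only and then applied to the validation and test sets, so that no information leaks from unseen data.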

Data modeling

There are several machine learning algorithms for a supervised regression model. Those that we used, which are depicted in Appendix 2—figure 1, performed better than the other algorithms we explored and were studied in detail. That is, at the beginning of our work we explored other algorithms, but these did not perform well and were not considered thereafter. The algorithms we considered were (i) linear regression (LiR); (ii) Huber regressor (HuR); (iii) ridge regressor (RiR); (iv) multilayer perceptron (MLP); (v) support vector regressor (SVR); (vi) k-nearest neighbors (KNN); (vii) random forest (RF); (viii) gradient boosting machine (GBM); and (ix) extreme gradient boosting (XGB).
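A comparison of candidate regressors of this kind could be set up as in the sketch below, which assumes scikit-learn estimators with mostly default hyperparameters and synthetic data. Extreme gradient boosting is omitted because it lives in a separate package (`xgboost`), and none of the settings shown are taken from the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import HuberRegressor, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data; every value and coefficient is invented for illustration.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([
    rng.integers(0, 2, n),    # sex coded 0/1
    rng.uniform(15, 69, n),   # age (years)
    rng.normal(65, 12, n),    # weight (kg)
    rng.normal(162, 9, n),    # height (cm)
    rng.normal(125, 15, n),   # systolic blood pressure (mmHg)
    rng.normal(80, 10, n),    # diastolic blood pressure (mmHg)
])
y = 6 + 0.9 * X[:, 0] - 0.01 * X[:, 1] + 0.03 * X[:, 2] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

candidates = {
    "LiR": LinearRegression(),
    "HuR": HuberRegressor(max_iter=1000),
    "RiR": Ridge(),
    "MLP": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=1),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
    "RF": RandomForestRegressor(n_estimators=100, random_state=1),
    "GBM": GradientBoostingRegressor(random_state=1),
}

scores = {}
for name, estimator in candidates.items():
    # Standardization is one of several processing options; it is used here for all models.
    pipeline = make_pipeline(StandardScaler(), estimator)
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_test, y_test)  # R² on held-out rows

for name, r2 in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: R2 = {r2:.3f}")
```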

In addition to these nine machine learning algorithms, we also implemented a neural network (see Appendix 2—figure 2). This neural network was optimized empirically. We used a batch size of 256, 300 epochs, and the ‘adam’ optimizer. The neural network was implemented in Python using the Keras library.

For each model and processing method (see ‘Data processing’ section), we studied the R², mean absolute error (MAE), and root mean square error (RMSE). As shown in Appendix 2—table 1, all algorithms showed a similar performance. Because all the algorithms performed equivalently, the final choice was deferred to the forecasting stage; that is, we chose the algorithm that generalized best to new (unseen) data.
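For reference, the three performance metrics follow their standard definitions, which the short sketch below computes in plain Python. The observed and predicted values are invented toy numbers, and the function name is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R², MAE, and RMSE for paired observed and predicted values."""
    n = len(observed)
    mean_observed = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return {
        "r2": 1 - ss_res / ss_tot,                                    # variance explained
        "mae": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Toy salt intake values in g/day (invented for illustration).
observed = [8.0, 9.0, 10.0, 7.0]
predicted = [8.5, 8.5, 9.0, 8.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because RMSE squares the errors before averaging, it penalizes large individual errors more heavily than MAE, which is why both are usually reported together.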

Forecasting modeling

This stage consisted of studying the predictions in new (unseen) data (i.e., data not used for model training). For this stage, we used the validation and test datasets. We chose the model that yielded predictions closest to the observed results. Specifically, we compared the mean difference between the observed and predicted mean salt intake (i.e., observed – predicted) across all prediction algorithms.
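The selection criterion can be sketched as follows: for each algorithm, subtract the mean predicted intake from the mean observed intake, overall and by sex, and prefer the algorithm whose difference is closest to zero. The records, values, and function names below are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean

# Hypothetical records: observed salt intake and predictions by algorithm (g/day).
records = [
    {"sex": "men", "observed": 9.0, "HuR": 8.8, "RiR": 9.05},
    {"sex": "men", "observed": 10.0, "HuR": 10.1, "RiR": 9.97},
    {"sex": "women", "observed": 8.0, "HuR": 8.1, "RiR": 8.4},
    {"sex": "women", "observed": 7.0, "HuR": 6.9, "RiR": 7.2},
]
algorithms = ["HuR", "RiR"]


def mean_difference(rows, algorithm):
    """Observed mean minus predicted mean for one algorithm."""
    return mean(r["observed"] for r in rows) - mean(r[algorithm] for r in rows)


def best_algorithm(rows, candidates):
    """Algorithm whose mean difference is closest to zero."""
    return min(candidates, key=lambda a: abs(mean_difference(rows, a)))


men = [r for r in records if r["sex"] == "men"]
women = [r for r in records if r["sex"] == "women"]

for label, rows in [("both sexes", records), ("men", men), ("women", women)]:
    print(label, best_algorithm(rows, algorithms))
```

With these toy numbers the best algorithm differs between men and women, which is the situation that makes a single overall choice a judgment call.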

We observed that no single algorithm had the mean difference closest to zero in men and women at the same time (Appendix 2—table 2). The HuR algorithm had the mean difference closest to zero in both sexes combined (mean difference = –0.0019), the RiR algorithm performed the best in men (mean difference = 0.0063), and in women the HuR algorithm showed the best results (mean difference = 0.0082).

To support our decision process, we plotted the mean differences in men and women for each survey (Appendix 2—figure 3); this figure only included the predictions based on the top three algorithms (HuR, MLP, and customized NN). We counted how many times (i.e., number of surveys) each algorithm had the mean difference closest to zero.
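The counting step can be expressed compactly. The sketch below uses invented per-survey mean differences and, for brevity, ignores the split by sex; survey names and values are hypothetical.

```python
from collections import Counter

# Hypothetical per-survey mean differences (observed - predicted, g/day).
survey_differences = {
    "Survey A": {"HuR": 0.01, "MLP": -0.03, "CNN": 0.08},
    "Survey B": {"HuR": -0.02, "MLP": 0.05, "CNN": -0.01},
    "Survey C": {"HuR": 0.00, "MLP": 0.02, "CNN": -0.04},
}


def count_closest_to_zero(differences_by_survey):
    """Count, per algorithm, the surveys in which its mean difference is closest to zero."""
    wins = Counter()
    for differences in differences_by_survey.values():
        winner = min(differences, key=lambda a: abs(differences[a]))
        wins[winner] += 1
    return wins


wins = count_closest_to_zero(survey_differences)
print(wins.most_common())
```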

Because the HuR algorithm had the mean difference closest to zero in both sexes combined and was among the top five algorithms in men and women (Appendix 2—table 2), we chose the HuR algorithm. Additionally, the mean differences for predictions based on the HuR algorithm were the closest to zero across surveys (Appendix 2—figure 3). These analyses were performed in R (version 4.0.3).

Algorithm application

To make predictions in the 54 new datasets without urine sample information, we used the HuR model (i.e., ML algorithm and predictors) developed following the methods described above (see ‘Forecasting modeling’ section). We re-trained the model with the full dataset used for model development and validation (i.e., training, validation, and test datasets pooled), and then predicted the outcome (i.e., mean salt intake) in the 54 new datasets.
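The re-training and application step could look like the sketch below, which assumes scikit-learn's `HuberRegressor` on standardized predictors (the processing method paired with HuR in Appendix 2—table 1). All data are simulated, the helper functions are hypothetical, and the hyperparameters are library defaults rather than those of the study.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)


def simulate_predictors(n):
    """Invented predictors: sex, age, weight, height, SBP, DBP."""
    return np.column_stack([
        rng.integers(0, 2, n),
        rng.uniform(15, 69, n),
        rng.normal(65, 12, n),
        rng.normal(162, 9, n),
        rng.normal(125, 15, n),
        rng.normal(80, 10, n),
    ])


def simulate_salt(X):
    """Invented outcome (g/day); coefficients are arbitrary."""
    noise = rng.normal(0, 1.2, len(X))
    return 6 + 0.9 * X[:, 0] - 0.01 * X[:, 1] + 0.03 * X[:, 2] + noise


# Pool the training, validation, and test sets before the final fit.
X_train, X_val, X_test = (simulate_predictors(k) for k in (600, 200, 200))
X_pooled = np.vstack([X_train, X_val, X_test])
y_pooled = simulate_salt(X_pooled)

model = make_pipeline(StandardScaler(), HuberRegressor(max_iter=1000))
model.fit(X_pooled, y_pooled)

# Apply the re-trained model to a new survey that has no urine data.
X_new = simulate_predictors(300)
predicted_mean = float(model.predict(X_new).mean())
print(f"Predicted mean salt intake: {predicted_mean:.2f} g/day")
```

Wrapping the scaler and the regressor in one pipeline ensures the new survey is standardized with the means and variances learned from the pooled derivation data.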

Appendix 2—figure 2
Neural network implementation.
Appendix 2—table 1
Performance of each algorithm and processing method.
| Algorithm | Processing | R² | MAE | RMSE |
| --- | --- | --- | --- | --- |
| LiR | Polynomial (g = 2) | 0.447 | 1.1138 | 1.4451 |
| HuR | Standardized | 0.447 | 1.1132 | 1.4442 |
| RiR | Polynomial (g = 2) | 0.446 | 1.1147 | 1.4459 |
| MLP | Min-max | 0.451 | 1.1101 | 1.4395 |
| SVR | Min-max | 0.446 | 1.0988 | 1.4459 |
| KNN | Standardized | 0.421 | 1.1426 | 1.4779 |
| RF | Polynomial (g = 2) | 0.417 | 1.1474 | 1.4835 |
| GBM | Min-max | 0.447 | 1.1147 | 1.4447 |
| XGB | Min-max | 0.431 | 1.1293 | 1.4646 |
| Customized NN | Box-Cox | 0.461 | 1.0953 | 1.4156 |

  1. MAE: mean absolute error; RMSE: root mean square error; LiR: linear regression; HuR: Huber regressor; RiR: ridge regressor; MLP: multilayer perceptron; SVR: support vector regressor; KNN: k-nearest neighbors; GBM: gradient boosting machine; XGB: extreme gradient boosting; NN: neural network; RF: random forest.

Appendix 2—table 2
Mean difference between observed and predicted salt intake by sex across all machine learning algorithms.
| Machine learning algorithm | Mean difference between observed and predicted mean salt intake | Sex |
| --- | --- | --- |
| CNN_boxcox | –0.0109 | Both sexes |
| CNN_standardize | –0.0075 | Both sexes |
| GBR_boxcox | 0.1373 | Both sexes |
| GBR_minmax | 0.1198 | Both sexes |
| GBR_orig | –0.0252 | Both sexes |
| GBR_standardized | 0.1231 | Both sexes |
| HuR_boxcox | 0.0389 | Both sexes |
| HuR_standardized | –0.0019 | Both sexes |
| KNN_boxcox | 0.0144 | Both sexes |
| KNN_standardized | –0.0172 | Both sexes |
| LiR_poly | –0.0292 | Both sexes |
| MLP_boxcox | –0.0069 | Both sexes |
| MLP_minmax | –0.019 | Both sexes |
| MLP_standardized | –0.0174 | Both sexes |
| RF_poly | –0.0479 | Both sexes |
| RiR_poly | –0.0304 | Both sexes |
| SVR_minmax | 0.1137 | Both sexes |
| XGB_boxcox | 0.0389 | Both sexes |
| XGB_orig | –0.0312 | Both sexes |
| XGB_standardized | –0.0329 | Both sexes |
| CNN_boxcox | 0.088 | Men |
| CNN_standardize | 0.0699 | Men |
| GBR_boxcox | 0.1591 | Men |
| GBR_minmax | 0.1381 | Men |
| GBR_orig | 0.0197 | Men |
| GBR_standardized | 0.1444 | Men |
| HuR_boxcox | 0.0265 | Men |
| HuR_standardized | –0.0119 | Men |
| KNN_boxcox | 0.0612 | Men |
| KNN_standardized | 0.0179 | Men |
| LiR_poly | 0.0069 | Men |
| MLP_boxcox | 0.0512 | Men |
| MLP_minmax | –0.0104 | Men |
| MLP_standardized | –0.0249 | Men |
| RF_poly | –0.0129 | Men |
| RiR_poly | 0.0063 | Men |
| SVR_minmax | 0.1265 | Men |
| XGB_boxcox | 0.0265 | Men |
| XGB_orig | 0.0147 | Men |
| XGB_standardized | 0.0069 | Men |
| CNN_boxcox | –0.1097 | Women |
| CNN_standardized | –0.085 | Women |
| GBR_boxcox | 0.1155 | Women |
| GBR_minmax | 0.1015 | Women |
| GBR_orig | –0.07 | Women |
| GBR_standardized | 0.1018 | Women |
| HuR_boxcox | 0.0514 | Women |
| HuR_standardized | 0.0082 | Women |
| KNN_boxcox | –0.0324 | Women |
| KNN_standardized | –0.0524 | Women |
| LiR_poly | –0.0653 | Women |
| MLP_boxcox | –0.0649 | Women |
| MLP_minmax | –0.0276 | Women |
| MLP_standardized | –0.0098 | Women |
| RF_poly | –0.0828 | Women |
| RiR_poly | –0.0671 | Women |
| SVR_minmax | 0.101 | Women |
| XGB_boxcox | 0.0514 | Women |
| XGB_orig | –0.0771 | Women |
| XGB_standardized | –0.0727 | Women |

Appendix 2—figure 3
Comparison of the mean difference between observed and predicted salt intake across the best algorithms.

CNN, customized neural network; HuR, Huber regressor; MLP, multilayer perceptron.

Data availability

This study used nationally representative survey data that are in the public domain and were requested through the online repository (https://extranet.who.int/ncdsmicrodata/index.php/home). We provide the analysis code for data preparation and data analysis as supplementary materials to this paper (Source Code File - "Analysis Code | Python and R").

References

Decision letter

  1. Gian Paolo Rossi
    Reviewing Editor; University of Padua, Padua, Italy
  2. Matthias Barton
    Senior Editor; University of Zurich, Zurich, Switzerland
  3. Gian Paolo Rossi
    Reviewer; University of Padua, Padua, Italy

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Estimating salt consumption in 49 low-and middle-income countries: Development, validation and application of a machine learning model" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, including Gian Paolo Rossi as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by a Senior Editor.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Both reviewers have found your manuscript of interest and potential relevance, as estimation of salt consumption by artificial intelligence in the population is a general problem. The Reviewers also agreed that this is more important for the low-mid income countries, which cannot afford measurements of sodium in a 24 hour urine collection or in a spot urinary sample. However, since the problem of estimating salt intake is not confined to such countries, a valuable addition that can increase the scientific merit of the study would be to provide data also on middle and high income countries.

They regarded as strengths of the study the development of a tool for estimating the sodium intake applicable to each country, particularly to those where it is difficult to collect urine specimens, and also the novel machine learning approach applied to 19 WHO STEPS surveys including more than 45,000 people.

However, some methodological limitations were also noted, including the fact that your ML model was developed and validated using, as reference, data obtained from a 'golden' (spot urine samples), not a gold standard method (i.e. 24-hour urine sample). One Reviewer underlined that all equations, including the Intersalt's used as outcome in this study, can imply a bias in predicting 24h U-Na+ excretion, even with use of correction formulas (see Charlton KE, Schutte A et al., 2020). Hence, the finding that mean salt consumption predicted by the supervised ML model did not differ significantly from the mean observed value, should be validated with the Na+ intake determined by the 24h U-Na+ excretion.

Reviewer #1 (Recommendations for the authors):

I find these results to be important. The manuscript is well written and the methodology seems to be correct. However, since the problem of estimating salt intake is not confined to low income countries, a valuable addition that can increase the scientific merit of the study would be to provide data also on middle and high income countries.

Reviewer #2 (Recommendations for the authors):

In this study Guzman-Vilca et al., investigated if a machine learning (ML) model based on predictors that are routinely available in large scale surveys could predict salt intake in low- and middle-income countries (LMICs), and could be an appropriate tool to estimate sodium/salt intake in the national health surveys.

This is an interesting study that moves from the need of estimating sodium intake in countries that have no access to urine samples, and exploits a novel method to pursue the aim. However, the study suffers a major methodological limitation that should be deeply considered.

Major criticisms:

The Authors trained, tested and validated the ML model using data obtained from ‘golden standard’ methods (spot urine samples), not a gold standard method as reference (i.e. 24-hour urine sample) as recommended by STARD (Bossuyt PM. Ann Intern Med 2003). Even though not updated, there are surveys from LMICs considering 24h U-Na+ excretion. This is a methodological limitation that should be amended.

Moreover, all equations, including the Intersalt used as outcome in this study, can imply a bias in predicting 24h U-Na+ excretion, even after using correction formulas (e.g. see Charlton KE, Schutte A et al., 2020). Hence, the mean salt consumption predicted by the supervised ML model, which was found not significantly different from the mean observed value, could be significantly different from the intake calculated with 24h U-Na+ excretion. Finally, the deviations from WHO recommendations (<5g daily) in the LMICs could be different from those resulting from the golden (not gold) standard approach-ML based model.

The quality of each survey, including the 19 surveys used for ML training and validation and the 49 used for estimating salt intake, should be preliminarily evaluated using a validated scoring system, such as QUADAS-2, before data processing. Quality could be crucial to understand differences between observed and predicted values.

Only a sub-analysis by sex was reported. It could be interesting to also evaluate how the ML model works at different ages, BP and BMI values.

Age range 15-69. Data from old/very old people were not considered at all. It is unclear if the exclusion of old people is related to the STEPS templates that include limited classes of age, or not.

Figure Pipeline Modeling Analysis. Please introduce abbreviations and provide a brief legend.

Last Figure (number is missing). Please add number, title and legend; enlarge dots and text on the right.

Tables. Please remove decimals from SBP and DBP.

https://doi.org/10.7554/eLife.72930.sa1

Author response

Both reviewers have found your manuscript of interest and potential relevance, as estimation of salt consumption by artificial intelligence in the population is a general problem. The Reviewers also agreed that this is more important for the low-mid income countries, which cannot afford measurements of sodium in a 24 hour urine collection or in a spot urinary sample. However, since the problem of estimating salt intake is not confined to such countries, a valuable addition that can increase the scientific merit of the study would be to provide data also on middle and high income countries.

We are glad the editors and reviewers found merit in our work. We agree that this is a relevant topic. We also agree that a work including many more countries, regardless of income group and world region, would be of higher scientific value.

In this revised version we analysed more surveys, including high-income countries. This table shows the income group of each country, and whether it was included in the derivation or application phase of our analysis. Consequently, throughout our manuscript where we referred to the analysed surveys, we changed “LMICs” to “countries” (or “surveys”).

This table is now included in the Appendix 1 – table 9 and described in the text:

“According to the World Bank classification (Appendix 1 – table 9), there were nine high-income countries (two in model derivation and seven in model application); 16 low-income countries (one in model derivation and 15 in model application); 26 lower-middle income countries (9 in model derivation and 17 in model application); and 18 upper-middle income countries (six in model derivation and 12 in model application). There were four countries (one in model derivation and three in model application) without income classification (British Virgin Islands, Cook Islands, Niue and Tokelau).” [pp. 11-12]

The title of the manuscript was updated as well:

“Development, validation and application of a machine learning model to estimate salt consumption in 54 countries” [p. 01]

Throughout the manuscript we updated the total number of surveys included in the derivation or application phase.

In conclusion, in this revised submission we increased the number of analysed surveys (see Author response table 1). Overall, for model development we analysed two additional surveys (1 upper-middle and 1 high income). For model application, we analysed five additional surveys (2 low, 1 lower-middle, 1 upper-middle and 1 high income). That is, of the seven new surveys included in the revised analysis, four were upper-middle or high-income countries.

Author response table 1
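|  |  | Total | Low income | Lower-middle income | Upper-middle income | High income | No group |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Model development | Original | 17 | 1 | 9 | 5 | 1 | 1 |
| Model development | Revision | 19 | 1 | 9 | 6 | 2 | 1 |
| Model development | Difference | +2 |  |  | +1 | +1 |  |
| Model application | Original | 49 | 13 | 16 | 11 | 6 | 3 |
| Model application | Revision | 54 | 15 | 17 | 12 | 7 | 3 |
| Model application | Difference | +5 | +2 | +1 | +1 | +1 |  |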

Furthermore, of all upper-middle-income countries in the world, 32% (18/56) were included in the analysis; of all high-income countries, 11% (9/83); of all lower-middle-income countries, 52% (26/50); and of all low-income countries, 55% (16/29). Although our work still does not cover all countries in the world, it includes countries in all income groups, as suggested by the reviewers. We appreciate the editor’s and reviewers’ suggestions, which improved our work.

Reviewer #1 (Recommendations for the authors): I find these results to be important. The manuscript is well written and the methodology seems to be correct. However, since the problem of estimating salt intake is not confined to low income countries, a valuable addition that can increase the scientific merit of the study would be to provide data also on middle and high income countries.

We understood the reviewer’s concern and appreciate his suggestion. To meet his expectations, we conducted additional work. We trust the reviewers and editors will find this additional work satisfactory. We are happy to receive further feedback as needed.

In this revised version we analysed more surveys, including more high-income countries. Please, refer to the tables and text in our response to the editor’s comment.

For our analysis we targeted population-based nationally representative surveys; in addition, these surveys must be available for independent analysis free of charge and without significant burden to access (e.g., data sharing agreements that may involve institutional signatures). This was further detailed in the manuscript:

“We sought surveys that met these two criteria: (i) nationally representative health surveys (i.e., community or sub-national surveys were not included); and (ii) surveys that were open access or that could be accessed without significant administrative burden (e.g., data sharing agreements that may involve institutional signatures).” [p. 11]

The WHO STEPS surveys met these criteria. In response to the reviewer’s comment we sought additional surveys following these procedures.

A work by the Global Burden of Diseases Nutrition and Chronic Diseases Expert Group delivered estimates of 24-hour urinary sodium excretion for 187 countries. We reviewed the data sources included in their analysis. These data sources were not nationally representative or were not accessible for independent re-analyses. Therefore, we could not incorporate any of these data sources in our work.

In a systematic review by Sudhir Raj Thout and colleagues, they sought nationally representative surveys with 24-hour urine collection. While almost all data came from high-income countries, these data were not open access for independent re-analyses. Therefore, we could not incorporate any of these data sources in our work.

In a systematic review by our research group, which focused on population-based surveys from Latin America regardless of the method of urine collection (24-hour vs otherwise), there were three national health surveys. We accessed data from a national health survey in Chile (high-income country). Therefore, we included this survey in our analysis for model derivation.

In a systematic review by Oyinlola Oyebode and colleagues, they summarised studies from Africa with urine and sodium information. We checked the references they found, but none were available for independent re-analyses. Therefore, we could not incorporate any of these data sources in our work.

This process was explained in the Materials and methods section.

“To identify additional data sources we searched the original publications included in one global analysis and three systematic reviews about sodium/salt consumption at the population level. This search led to the identification of the national health survey included in the model derivation. All other data sources included in those references did not meet our selection criteria.

In conclusion, our ML model was developed based on 21 surveys (20 WHO STEPS and 1 national health survey). Then, our ML model was applied to 54 WHO STEPS surveys to compute the mean daily salt consumption at the population level.” [p. 11]

Reviewer #2 (Recommendations for the authors):

In this study Guzman-Vilca et al., investigated if a machine learning (ML) model based on predictors that are routinely available in large scale surveys could predict salt intake in low- and middle-income countries (LMICs), and could be an appropriate tool to estimate sodium/salt intake in the national health surveys.

This is an interesting study that moves from the need of estimating sodium intake in countries that have no access to urine samples, and exploits a novel method to pursue the aim. However, the study suffers a major methodological limitation that should be deeply considered.

Major criticisms:

The Authors trained, tested and validated the ML model using data obtained from 'golden standard' methods (spot urine samples), not a gold standard method as reference (i.e. 24-hour urine sample) as recommended by STARD (Bossuyt PM. Ann Intern Med 2003). Even though not updated, there are surveys from LMICs considering 24h U-Na+ excretion. This is a methodological limitation that should be amended.

For our analyses we targeted surveys that met these two criteria: (i) nationally representative health surveys; and (ii) surveys that were in the public domain (open access) or that could be accessed without significant administrative burden (e.g., complex data sharing agreements that may involve institutional signatures). This was further detailed:

“We sought surveys that met these two criteria: (i) nationally representative health surveys (i.e., community or sub-national surveys were not included); and (ii) surveys that were open access or that could be accessed without significant administrative burden (e.g., data sharing agreements that may involve institutional signatures).” [p. 11]

Despite our experience working in data pooling projects,[4] and the additional work to identify more relevant surveys, we are not aware of any nationally representative health survey with 24-hour urine specimens that is in the public domain and available for independent re-analyses. In other words, there are no surveys with 24-hour urine samples that would meet our two inclusion criteria.

We reviewed the references of one global modelling study and three recent systematic reviews. We carefully checked the data sources they summarised, including those with 24-hour urine samples, and we could not find any data sources that were available for independent re-analyses (open access). In conclusion, even though we wanted to include surveys with 24-hour urine samples, and we exhaustively sought these surveys, there are no nationally representative surveys with 24-hour urine samples available for independent re-analyses.

The fact that we only analysed spot urine samples is a limitation which was acknowledged in our original submission. We have further highlighted this limitation throughout our work.

“As an alternative, equations have been developed to estimate sodium/salt intake based on spot urine (SU) samples. Although these equations may not deliver identical results to those based on 24-hour urine samples at the individual level, at the population level the difference between SU samples and 24-hour samples appears to be small.” [p. 03]

“The fact that we analysed SU samples is a limitation of our work and the results should be interpreted accordingly.” [p. 12]

“Of note, our outcome variable was informed by SU samples and not by 24-hour urine samples (gold-standard to assess salt consumption). Results should be interpreted according to this limitation.” [p. 13]

“It should be noted that we analysed SU samples. These are not the gold standard to assess salt consumption. Results should be interpreted in light of this limitation, considering that our model aimed to deliver estimates at the population level (not individual level).” [p. 07]

“First and foremost, urine data was based on a spot sample, which is not the gold standard (24-hour urine sample) to measure daily salt consumption.” [pp. 09-10]

We trust the editors and reviewers will still find value in our work. We acknowledged this is a limitation of our work, which was transparently highlighted throughout the manuscript. Unfortunately, there is nothing we can do to overcome this limitation due to the lack of open-access data with 24-hour urine samples. This is a shared limitation with a previous global analysis in which they had to combine 24-hour urine samples and diet assessment tools because of the global lack of data on 24-hour urine samples: “…assessment methods included 24 h urinary excretion measurements, a diet assessment tool (e.g., diet record, diet recall, food frequency questionnaire…”.10)

The fact that we only analysed spot urine samples is a limitation. This was acknowledged throughout our manuscript. It also calls for nationally representative health surveys with 24-hour urine samples to be conducted and made available for independent re-analyses. This argument was further elaborated in:

“While this –re-analysis of SU sample rather than 24-hour urine samples– is a limitation of our work, it is also an observation showing the lack of nationally representative surveys with 24-hour urine samples available for independent re-analyses.” [p. 10]

Moreover, all equations, including the Intersalt used as outcome in this study, can imply a bias in predicting 24h U-Na+ excretion, even after using correction formulas (e.g. see Charlton KE, Schutte A et al., 2020). Hence, the mean salt consumption predicted by the supervised ML model, which was found not significantly different from the mean observed value, could be significantly different from the intake calculated with 24h U-Na+ excretion. Finally, the deviations from WHO recommendations (<5g daily) in the LMICs could be different from those resulting from the golden (not gold) standard approach-ML based model.

Despite our efforts to systematically identify nationally representative health surveys that are in the public domain and include 24-hour urine samples, we did not find any. Please refer to our previous answer for further details on this subject.

We agree with the observation that the equations can imply bias in estimating the 24-hour salt consumption. This was specifically acknowledged in a new sentence in the introduction.

“As an alternative, equations have been developed to estimate sodium/salt intake based on spot urine (SU) samples. Although these equations may not deliver identical results to those based on 24-hour urine samples at the individual level, at the population level the difference between SU samples and 24-hour samples appears to be small.” [p. 03]

However, we would like to raise the following argument. Our focus was the population, not the individual. We aimed to provide a machine learning model that learnt from individual-level data to provide mean estimates at the population level (i.e., the overall mean in the country). We did not produce a machine learning model to predict salt consumption for each individual.

The reason why we did not develop a model to be used in individuals relies on the remarks made by the reviewer: potential disagreements between spot urine samples and the gold standard (24-hour urine samples). However, at the population level, the mean appears to be similar between spot urine samples and 24-hour urine samples.[6,7] For example, “Overall average population salt intake estimated from 24-h urine samples was 9.3 g/day compared with 9.0 g/day estimated from the spot urine samples.”13
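
The distinction between individual-level and population-level agreement can be illustrated with a small simulation. The sketch below is not part of our analysis: the distribution of reference intake, the size of the measurement error, and all names (`simulate`, `noise_sd`) are assumptions chosen for illustration only. It shows that unbiased but noisy individual estimates can disagree substantially with the reference value person by person while the two population means remain close.

```python
import random
import statistics


def simulate(n=10_000, true_mean=9.0, true_sd=2.0, noise_sd=3.0, seed=0):
    """Simulate reference intakes and noisy, unbiased estimates (g/day).

    All parameter values are illustrative assumptions, not survey data.
    """
    rng = random.Random(seed)
    reference = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    # Each estimate equals the reference value plus zero-mean random error.
    estimate = [x + rng.gauss(0.0, noise_sd) for x in reference]
    return reference, estimate


reference, estimate = simulate()

# Individual level: average absolute disagreement per person.
mean_abs_error = statistics.fmean(abs(e - r) for e, r in zip(estimate, reference))

# Population level: disagreement between the two means.
mean_difference = statistics.fmean(estimate) - statistics.fmean(reference)

print(f"Mean absolute individual error: {mean_abs_error:.2f} g/day")
print(f"Difference between population means: {mean_difference:.2f} g/day")
```

The simulation assumes that the error is unbiased; a systematic bias in the spot urine equations would carry over to the population mean.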

Using spot urine samples is indeed a limitation, which was fully acknowledged throughout the manuscript (please, refer to our previous answer). However, at the population level, the differences between spot urine samples and 24-hour urine samples may be small. We have included these arguments in the manuscript.

“As an alternative, equations have been developed to estimate sodium/salt intake based on spot urine (SU) samples. Although these equations may not deliver identical results to those based on 24-hour urine samples at the individual level, at the population level the difference between SU samples and 24-hour samples appears to be small.” [p. 03]

“If we could (accurately) estimate sodium/salt intake at the population level based on variables that are routinely available in national health surveys (e.g., weight or blood pressure), mean sodium/salt intake at the population level in countries that currently lack urine data (i.e., 24-hour or spot) could be computed using these available predictors. Advanced analytic techniques like machine learning (ML) could make accurate predictions, and inform about the mean sodium/salt intake at the population level. We developed a ML predictive model to estimate mean salt intake at the population level (not at the individual level) using routinely available variables in national health surveys” [p. 04]

“If the model were indeed accurate, then it could be applied to national surveys without urine samples but with the relevant predictors to inform about the mean salt consumption in the overall population.” [p. 12]

“However, we aimed to develop a model that can be used to predict mean estimates at the population level, not at the individual level. In other words, our model should not be applied to a patient to estimate his/her salt consumption. Our model should be applied to survey data to compute the mean sodium/salt consumption in the population (not in individuals). Empirical evidence suggests that, at the population level, mean estimates based on SU samples and on 24-hour urine samples are similar.” [p. 12]

“While SU samples may not be the best approach to estimate salt consumption at the individual level, at the population level the means estimated based on SU samples and 24-hour urine samples are similar. Therefore, the limitation of using SU samples only may have had little impact on our mean estimates which are at the country level, not at the individual level.” [p. 10]

“We developed a machine learning (ML) model to predict salt consumption at the population level based on simple predictors…We applied the ML model to 54 new surveys to quantify the mean salt consumption in the population.” [abstract]
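
The intended use described in these passages, applying a fitted model to each survey record and then summarising at the population level, could be sketched as follows. This is a minimal illustration, not our actual pipeline: `toy_model` and its coefficients are invented placeholders, the record field names are hypothetical, and the confidence interval uses a simple normal approximation that ignores survey weights and sampling design.

```python
import math
import statistics


def toy_model(record):
    """Hypothetical stand-in for a fitted regression model (g/day).

    The coefficients are made up for illustration; they are not estimates
    from the study.
    """
    return 5.0 + 0.03 * record["weight_kg"] + 0.01 * record["sbp_mmhg"]


def population_mean_ci(records, predict, z=1.96):
    """Return the mean prediction and a normal-approximation 95% CI."""
    predictions = [predict(r) for r in records]
    mean = statistics.fmean(predictions)
    standard_error = statistics.stdev(predictions) / math.sqrt(len(predictions))
    return mean, mean - z * standard_error, mean + z * standard_error


# Three invented survey records; real surveys contain thousands.
survey = [
    {"weight_kg": 60.0, "sbp_mmhg": 110.0},
    {"weight_kg": 70.0, "sbp_mmhg": 120.0},
    {"weight_kg": 80.0, "sbp_mmhg": 130.0},
]

mean, lower, upper = population_mean_ci(survey, toy_model)
print(f"Predicted mean: {mean:.1f} g/day (95% CI: {lower:.1f}-{upper:.1f})")
```

Only the summary step matters here: the individual predictions are intermediate values and are never reported for a single person.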

The quality of each survey, including the 19 surveys used for ML training and validation and the 49 used for estimating salt intake, should be preliminarily evaluated before data processing using a validated scoring system such as QUADAS-2. Quality could be crucial to understanding differences between observed and predicted values.

We apologise unreservedly, but we are not sure how to address this comment. We tried our best, and we are happy to receive further guidance from the reviewer and editor if needed.

To guide our response, we reviewed publications that analysed multiple WHO STEPS surveys and other nationally representative surveys. Author response table 2 lists recent publications using multiple national health surveys; we did not find a similar preliminary evaluation in any of them.

Author response table 2
Teufel F, et al. Body-mass index and diabetes risk in 57 low-income and middle-income countries: a cross-sectional study of nationally representative, individual-level data in 685 616 adults. Lancet. 2021;398(10296):238–48.
Flood D, et al. The state of diabetes treatment coverage in 55 low-income and middle-income countries: a cross-sectional study of nationally representative, individual-level data in 680 102 adults. The Lancet Healthy Longevity. 2021;2(6):e340–51.
Peiris D, et al. Cardiovascular disease risk profile and management practices in 45 low-income and middle-income countries: A cross-sectional study of nationally representative individual-level survey data. PLoS Med. 2021;18(3):e1003485.
Davies JI, et al. Association between country preparedness indicators and quality clinical care for cardiovascular disease risk factors in 44 lower- and middle-income countries: A multicountry analysis of survey data. PLoS Med. 2020;17(11):e1003268.
Seiglie JA, et al. Diabetes Prevalence and Its Relationship With Education, Wealth, and BMI in 29 Low- and Middle-Income Countries. Diabetes Care. 2020;43(4):767–75.
Geldsetzer P, et al. The state of hypertension care in 44 low-income and middle-income countries: a cross-sectional study of nationally representative individual-level data from 1·1 million adults. Lancet. 2019;394(10199):652–62.
Manne-Goehler J, et al. Health system performance for people with diabetes in 28 low- and middle-income countries: A cross-sectional study of nationally representative surveys. PLoS Med. 2019;16(3):e1002751.

Nonetheless, compared with our original submission, these publications included further details about the sampling design and the data collection of the main variable(s). We therefore included the following text in Appendix 2; it summarises the sampling design and data collection processes of the surveys we pooled. References to the manuals and further documentation about these surveys are provided as well.

“We analysed WHO STEPS surveys and one national health survey (Chile). These surveys included a random sample of the general population and can deliver nationally representative estimates. These are household surveys that stratify by the first administrative level in the country (e.g., region); within this level, further stratification may occur by, for example, urban/rural location. Then, a random sample of census tracts, villages, neighbourhoods or other similar division, is selected. In each of these primary sampling units, households are randomly sampled for the interview.

All surveys followed standard procedures. Briefly, participants were given a small container along with instructions for the urine collection; the next day, participants brought the urine sample to a designated place. Then, urine samples were analysed at a laboratory by a trained technician.”

We are also doubtful about the use of the QUADAS-2 tool. This tool is for diagnostic accuracy studies. As we discussed in our previous response, we did not develop a diagnostic tool (i.e., a machine learning model for individuals). Rather, we aimed to develop a machine learning algorithm that could leverage survey data to compute the mean at the population level. We included this idea in the manuscript:

“We did not develop a diagnostic tool to replace SU or 24-hour urine samples.” [p. 12]

Other criticisms and suggestions

Only a sub-analysis by sex was reported. It could be interesting to also evaluate how the ML model works at different ages, BP and BMI values.

We included these results. The following table is now available as Appendix 1 – Table 3 and is referenced in the Results section of the main document.

“The mean observed salt intake was higher in people aged ≥30 years (7.9 g/day vs 8.4 g/day, p<0.05 for independent T-test), and so was for people with raised blood pressure (≥140/90 mmHg) (8.7 g/day vs 8.2 g/day, p<0.05). The mean salt consumption was also different across BMI categories (p<0.05 for ANOVA test). The same profile was found for predicted mean salt intake (Appendix 1 – Table 3).” [p. 05]
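
A two-group comparison of this kind could be sketched as below. The example uses Welch's unequal-variance t statistic with a normal approximation to the p-value, a simplification swapped in to keep the sketch free of external dependencies and reasonable only for large samples; it is not necessarily the exact test implementation we ran. The function name and the toy values are illustrative assumptions.

```python
import math
import statistics


def welch_t_test(group_a, group_b):
    """Welch's t statistic with a large-sample (normal) two-sided p-value."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    t = (mean_a - mean_b) / standard_error
    # Normal approximation to the t distribution; adequate when both groups are large.
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, p


# Invented salt intakes (g/day) for two groups; not survey data.
younger = [7.5, 8.0, 8.5, 7.9, 8.1]
older = [8.4, 8.9, 9.1, 8.6, 8.8]

t, p = welch_t_test(younger, older)
print(f"t = {t:.2f}, approximate p = {p:.4f}")
```

With groups as small as in this toy example, the p-value should be taken from the t distribution rather than the normal approximation; in survey-sized samples the two are practically identical.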

Age range 15-69. Data from old/very old people were not considered at all. It is unclear if the exclusion of old people is related to the STEPS templates that include limited classes of age, or not.

Yes, it is because of the age structure of the surveys we re-analysed. This was clarified.

“Our complete-case analysis was restricted to men and non-pregnant women aged between 15 and 69 years because of data availability.” [p. 13]

Figure Pipeline Modeling Analysis. Please introduce abbreviations and provide a brief legend.

The abbreviations were spelled out.

“PCA: principal component analysis; LiR: Linear Regression; HuR: Huber Regressor; RiR: Ridge Regressor; MLP: Multilayer Perceptron; SVR: Support Vector Regressor; KNN: k-Nearest Neighbors; RF: Random Forest; GBM: Gradient Boosting Machine; XGB: Extreme Gradient Boosting; NN: Neural Network.” [Appendix 2 – Figure 1]

Last Figure (number is missing). Please add number, title and legend; enlarge dots and text on the right.

Appendix 2 – Figure 3 has been modified as suggested.

Tables. Please remove decimals from SBP and DBP.

In all Appendix 1 – tables, SBP and DBP are presented without decimal places.

References

  • Powles J, Fahimi S, Micha R, Khatibzadeh S, Shi P, Ezzati M, et al. Global, regional and national sodium intakes in 1990 and 2010: a systematic analysis of 24 h urinary sodium excretion and dietary surveys worldwide. BMJ Open. 2013 Dec 23;3(12):e003733.

  • Thout SR, Santos JA, McKenzie B, Trieu K, Johnson C, McLean R, et al. The Science of Salt: Updating the evidence on global estimates of salt intake. J Clin Hypertens (Greenwich). 2019 Jun;21(6):710–21.

  • Oyebode O, Oti S, Chen Y-F, Lilford RJ. Salt intakes in sub-Saharan Africa: a systematic review and meta-regression. Popul Health Metr. 2016;14:1.

  • NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in hypertension prevalence and progress in treatment and control from 1990 to 2019: a pooled analysis of 1201 population-representative studies with 104 million participants. Lancet. 2021 Sep 11;398(10304):957–980.

  • Carrillo-Larco RM, Bernabe-Ortiz A. Sodium and Salt Consumption in Latin America and the Caribbean: A Systematic-Review and Meta-Analysis of Population-Based Studies and Surveys. Nutrients. 2020 Feb 20;12(2).

  • Huang L, Crino M, Wu JHY, Woodward M, Barzi F, Land M-A, et al. Mean population salt intake estimated from 24-h urine samples and spot urine samples: a systematic review and meta-analysis. Int J Epidemiol. 2016 Feb;45(1):239–50.

  • Santos JA, Li KC, Huang L, Mclean R, Petersen K, Di Tanna GL, et al. Change in mean salt intake over time using 24-h urine versus overnight and spot urine samples: a systematic review and meta-analysis. Nutr J. 2020 Dec 6;19(1):136.

https://doi.org/10.7554/eLife.72930.sa2

Article and author information

Author details

  1. Wilmer Cristobal Guzman-Vilca

    1. School of Medicine Alberto Hurtado, Universidad Peruana Cayetano Heredia, Lima, Peru
    2. CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
    3. Sociedad Científica de Estudiantes de Medicina Cayetano Heredia (SOCEMCH), Universidad Peruana Cayetano Heredia, Lima, Peru
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2194-8496
  2. Manuel Castillo-Cara

    Universidad de Lima, Lima, Peru
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2990-7090
  3. Rodrigo M Carrillo-Larco

    1. CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
    2. Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, United Kingdom
    Contribution
    Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Supervision, Writing – original draft, Writing – review and editing
    For correspondence
    r.carrillo-larco@imperial.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2090-1856

Funding

Wellcome Trust (214185/Z/18/Z)

  • Rodrigo M Carrillo-Larco

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

We did not seek approval by an Institutional Review Board. We used individual-level survey data that do not include any personal identifiers.

Senior Editor

  1. Matthias Barton, University of Zurich, Zurich, Switzerland

Reviewing Editor

  1. Gian Paolo Rossi, University of Padua, Padua, Italy

Reviewer

  1. Gian Paolo Rossi, University of Padua, Padua, Italy

Publication history

  1. Received: August 9, 2021
  2. Preprint posted: September 2, 2021
  3. Accepted: December 15, 2021
  4. Accepted Manuscript published: January 5, 2022 (version 1)
  5. Version of Record published: January 25, 2022 (version 2)

Copyright

© 2022, Guzman-Vilca et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Wilmer Cristobal Guzman-Vilca
  2. Manuel Castillo-Cara
  3. Rodrigo M Carrillo-Larco
(2022)
Development, validation, and application of a machine learning model to estimate salt consumption in 54 countries
eLife 11:e72930.
https://doi.org/10.7554/eLife.72930
