Risk factors relate to the variability of health outcomes as well as the mean: a GAMLSS tutorial

  1. David Bann (corresponding author)
  2. Liam Wright
  3. Tim J Cole
  1. University College London, United Kingdom


Background: Risk factors or interventions may affect the variability as well as the mean of health outcomes. Knowing whether they do can aid aetiological understanding and public health translation, in that interventions which shift the outcome mean and reduce variability are typically preferable to those which affect only the mean. However, most commonly used statistical tools do not test for differences in variability. Tools that do have had few epidemiological applications to date, and fewer still have attempted to explain their findings. We thus provide a tutorial for investigating this using GAMLSS (Generalised Additive Models for Location, Scale and Shape).

Methods: The 1970 British birth cohort study was used, with body mass index (BMI; N=6,007) and mental wellbeing (Warwick-Edinburgh Mental Wellbeing Scale; N=7,104) measured in midlife (42-46 years) as outcomes. We used GAMLSS to investigate how multiple risk factors (sex, childhood social class and midlife physical inactivity) related to differences in health outcome mean and variability.

Results: Risk factors were related to sizable differences in outcome variability. For example, males had marginally higher mean BMI yet 28% lower variability; lower social class and physical inactivity were each associated with higher mean and higher variability (6.1% and 13.5% higher variability, respectively). For mental wellbeing, gender was not associated with the mean while males had lower variability (-3.9%); lower social class and physical inactivity were each associated with lower mean yet higher variability (7.2% and 10.9% higher variability, respectively).

Conclusions: The results highlight how GAMLSS can be used to investigate how risk factors or interventions may influence the variability in health outcomes. This underutilised approach to the analysis of continuously distributed outcomes may have broader utility in epidemiologic, medical, and psychological sciences. A tutorial and replication syntax are provided online to facilitate this (https://osf.io/5tvz6/).
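As a rough illustration of what modelling the mean and the variability together involves, the sketch below fits a normal location-scale model with one binary risk factor in plain Python. It is not the authors' code (their tutorial and replication syntax are at the OSF link above) and it does not use the gamlss package: the normal distribution, the log link for the standard deviation, the simulated data and all function names are assumptions made for illustration. With a single binary predictor the maximum-likelihood estimates have a closed form, and a coefficient g1 on the log scale converts to a percentage difference in variability as 100 × (exp(g1) − 1).

```python
import math
import random
import statistics


def fit_binary_location_scale(y, x):
    """Closed-form ML fit of y ~ Normal(mu, sigma) for a binary x (0/1), with
    mu = b0 + b1 * x  and  log(sigma) = g0 + g1 * x.
    Hypothetical helper for illustration only."""
    y0 = [yi for yi, xi in zip(y, x) if xi == 0]
    y1 = [yi for yi, xi in zip(y, x) if xi == 1]
    m0, m1 = statistics.fmean(y0), statistics.fmean(y1)
    # pstdev divides by n, which is the ML estimate of sigma within each group
    s0 = statistics.pstdev(y0, mu=m0)
    s1 = statistics.pstdev(y1, mu=m1)
    return {"b0": m0, "b1": m1 - m0, "g0": math.log(s0), "g1": math.log(s1 / s0)}


def pct_variability_difference(g1):
    """Convert a log-scale coefficient into a percentage difference in sigma."""
    return 100.0 * (math.exp(g1) - 1.0)


def simulate_two_groups(n_per_group=5000, seed=1):
    """Simulated outcome, NOT cohort data: group 1 has a slightly higher mean
    (27.5 vs 27.0) and a 20% smaller standard deviation (4.0 vs 5.0)."""
    rng = random.Random(seed)
    x = [0] * n_per_group + [1] * n_per_group
    y = [rng.gauss(27.0, 5.0) for _ in range(n_per_group)]
    y += [rng.gauss(27.5, 4.0) for _ in range(n_per_group)]
    return y, x


if __name__ == "__main__":
    y, x = simulate_two_groups()
    fit = fit_binary_location_scale(y, x)
    print("difference in mean:", round(fit["b1"], 2))
    print("difference in variability (%):",
          round(pct_variability_difference(fit["g1"]), 1))
```

On the simulated data the estimated difference in variability should land close to the built-in value of -20%. With several or continuous predictors there is no closed form, and the two linear predictors would instead be estimated jointly by numerical maximisation of the likelihood.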

Funding: DB is supported by the Economic and Social Research Council (grant number ES/M001660/1), The Academy of Medical Sciences / Wellcome Trust ('Springboard Health of the Public in 2040' award: HOP001/1025); DB and LW are supported by the Medical Research Council (MR/V002147/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data availability

All data are available to download from the UK Data Archive: https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=200001

Article and author information

Author details

  1. David Bann

    Centre for Longitudinal Studies, Social Research Institute, University College London, London, United Kingdom
    For correspondence
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-6454-626X
  2. Liam Wright

    Centre for Longitudinal Studies, Social Research Institute, University College London, London, United Kingdom
    Competing interests
    The authors declare that no competing interests exist.
  3. Tim J Cole

    Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
    Competing interests
    The authors declare that no competing interests exist.


Funding

Medical Research Council (MR/V002147/1)

  • David Bann
  • Liam Wright

Economic and Social Research Council (ES/M001660/1)

  • Liam Wright

Wellcome Trust (HOP001/1025)

  • David Bann

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


Human subjects: This paper is a secondary analysis of data from a cohort study whose members have been followed up since birth in 1970. Cohort members provided informed consent, and the study received full ethical approval, most recently from the NRES Committee South East Coast-Brighton and Sussex.

Reviewing Editor

  1. Belinda Nicolau, McGill University, Canada

Publication history

  1. Preprint posted: March 31, 2021
  2. Received: July 20, 2021
  3. Accepted: January 4, 2022
  4. Accepted Manuscript published: January 5, 2022 (version 1)
  5. Version of Record published: January 26, 2022 (version 2)


© 2022, Bann et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

David Bann, Liam Wright, Tim J Cole (2022) Risk factors relate to the variability of health outcomes as well as the mean: a GAMLSS tutorial. eLife 11:e72357.

Further reading

    1. Computational and Systems Biology
    2. Epidemiology and Global Health
    Oliver Robinson, Chung-Ho E Lau ... Martine Vrijheid
    Research Article

    Background: While biological age in adults is often understood as representing general health and resilience, the conceptual interpretation of accelerated biological age in children and its relationship to development remains unclear. We aimed to clarify the relationship of accelerated biological age, assessed through two established biological age indicators, telomere length and DNA methylation age, and two novel candidate biological age indicators, to child developmental outcomes, including growth and adiposity, cognition, behaviour, lung function and onset of puberty, among European school-age children participating in the HELIX exposome cohort.

    Methods: The study population included up to 1,173 children, aged between 5 and 12 years, from study centres in the UK, France, Spain, Norway, Lithuania, and Greece. Telomere length was measured through qPCR, blood DNA methylation and gene expression were measured using microarray, and proteins and metabolites were measured by a range of targeted assays. DNA methylation age was assessed using Horvath's skin and blood clock, while novel blood transcriptome and 'immunometabolic' (based on plasma protein and urinary and serum metabolite data) clocks were derived and tested in a subset of children assessed six months after the main follow-up visit. Associations of biological age indicators with child developmental measures and health risk factors were estimated using linear regression, adjusted for chronological age, sex, ethnicity and study centre. The clock-derived markers were expressed as Δ age (i.e., predicted minus chronological age).

    Results: Transcriptome and immunometabolic clocks predicted chronological age well in the test set (r = 0.93 and r = 0.84, respectively). Generally, weak correlations were observed, after adjustment for chronological age, between the biological age indicators. Among associations with health risk factors, higher birthweight was associated with greater immunometabolic Δ age, smoke exposure with greater DNA methylation Δ age and high family affluence with longer telomere length. Among associations with child developmental measures, all biological age markers were associated with greater BMI and fat mass, and all markers except telomere length were associated with greater height, at least at nominal significance (p < 0.05). Immunometabolic Δ age was associated with better working memory (p = 4e-3) and reduced inattentiveness (p = 4e-4), while DNA methylation Δ age was associated with greater inattentiveness (p = 0.03) and poorer externalizing behaviours (p = 0.01). Shorter telomere length was also associated with poorer externalizing behaviours (p = 0.03).

    Conclusions: In children, as in adults, biological ageing appears to be a multi-faceted process and adiposity is an important correlate of accelerated biological ageing. Patterns of associations suggested that accelerated immunometabolic age may be beneficial for some aspects of child development while accelerated DNA methylation age and telomere attrition may reflect early detrimental aspects of biological ageing, apparent even in children.

    Funding: UK Research and Innovation (MR/S03532X/1); European Commission (grant agreement numbers: 308333; 874583).

    1. Epidemiology and Global Health
    Katharine Sherratt, Hugo Gruson ... Sebastian Funk
    Research Article Updated


    Background: Short-term forecasts of infectious disease burden can contribute to situational awareness and aid capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of such forecasts if multiple models are combined into an ensemble. Here, we report on the performance of ensembles in predicting COVID-19 cases and deaths across Europe between 08 March 2021 and 07 March 2022.


    Methods: We used open-source tools to develop a public European COVID-19 Forecast Hub. We invited groups globally to contribute weekly forecasts for COVID-19 cases and deaths reported by a standardised source for 32 countries over the next 1–4 weeks. Teams submitted forecasts from March 2021 using standardised quantiles of the predictive distribution. Each week we created an ensemble forecast, where each predictive quantile was calculated as the equally-weighted average (initially the mean and then from 26th July the median) of all individual models’ predictive quantiles. We measured the performance of each model using the relative Weighted Interval Score (WIS), comparing models’ forecast accuracy relative to all other models. We retrospectively explored alternative methods for ensemble forecasts, including weighted averages based on models’ past predictive performance.
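The ensemble rule described above, in which each predictive quantile is the unweighted mean or median of the same quantile across models, can be sketched as follows. This is an illustrative reconstruction, not the Forecast Hub's own code: the function name, the dictionary layout and the example numbers are hypothetical.

```python
from statistics import fmean, median


def ensemble_quantiles(forecasts, method="median"):
    """Combine per-model predictive quantiles into one ensemble forecast.

    forecasts: dict mapping model name -> dict mapping quantile level -> value.
    Every model must report the same set of quantile levels.
    method: "median" or "mean" (both weight all models equally).
    """
    if not forecasts:
        raise ValueError("need at least one model")
    if method not in ("median", "mean"):
        raise ValueError("method must be 'median' or 'mean'")
    combine = median if method == "median" else fmean

    levels = sorted(next(iter(forecasts.values())))
    for name, quantiles in forecasts.items():
        if sorted(quantiles) != levels:
            raise ValueError(f"model {name!r} reports different quantile levels")

    # Combine level by level, across models
    return {
        level: combine([quantiles[level] for quantiles in forecasts.values()])
        for level in levels
    }


if __name__ == "__main__":
    # Hypothetical weekly case forecasts from three models
    example = {
        "model_a": {0.25: 90, 0.5: 100, 0.75: 115},
        "model_b": {0.25: 80, 0.5: 95, 0.75: 120},
        "model_c": {0.25: 100, 0.5: 140, 0.75: 200},
    }
    print(ensemble_quantiles(example, "median"))
    print(ensemble_quantiles(example, "mean"))
```

In the hypothetical example one model forecasts much higher values than the other two; the mean ensemble is pulled towards it while the median ensemble is not, which is one plausible reason a median rule can be more robust.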


    Results: Over 52 weeks, we collected forecasts from 48 unique models. We evaluated 29 models’ forecast scores in comparison to the ensemble model. We found a weekly ensemble had a consistently strong performance across countries over time. Across all horizons and locations, the ensemble performed better on relative WIS than 83% of participating models’ forecasts of incident cases (with a total N=886 predictions from 23 unique models), and 91% of participating models’ forecasts of deaths (N=763 predictions from 20 models). Across a 1–4 week time horizon, ensemble performance declined with longer forecast periods when forecasting cases, but remained stable over 4 weeks for incident death forecasts. In every forecast across 32 countries, the ensemble outperformed most contributing models when forecasting either cases or deaths, frequently outperforming all of its individual component models. Among several choices of ensemble methods we found that the most influential and best choice was to use a median average of models instead of using the mean, regardless of methods of weighting component forecast models.


    Conclusions: Our results support combining forecasts from individual models into an ensemble in order to improve predictive performance across epidemiological targets and populations during infectious disease epidemics. Our findings further suggest that median ensemble methods yield better predictive performance than ones based on means. Our findings also highlight that forecast consumers should place more weight on incident death forecasts than incident case forecasts at forecast horizons greater than 2 weeks.


    Funding: AA, BH, BL, LWa, MMa, PP, SV funded by National Institutes of Health (NIH) Grant 1R01GM109718, NSF BIG DATA Grant IIS-1633028, NSF Grant No.: OAC-1916805, NSF Expeditions in Computing Grant CCF-1918656, CCF-1917819, NSF RAPID CNS-2028004, NSF RAPID OAC-2027541, US Centers for Disease Control and Prevention 75D30119C05935, a grant from Google, University of Virginia Strategic Investment Fund award number SIF160, Defense Threat Reduction Agency (DTRA) under Contract No. HDTRA1-19-D-0007, and respectively Virginia Dept of Health Grant VDH-21-501-0141, VDH-21-501-0143, VDH-21-501-0147, VDH-21-501-0145, VDH-21-501-0146, VDH-21-501-0142, VDH-21-501-0148. AF, AMa, GL funded by SMIGE - Modelli statistici inferenziali per governare l'epidemia, FISR 2020-Covid-19 I Fase, FISR2020IP-00156, Codice Progetto: PRJ-0695. AM, BK, FD, FR, JK, JN, JZ, KN, MG, MR, MS, RB funded by Ministry of Science and Higher Education of Poland with grant 28/WFSN/2021 to the University of Warsaw. BRe, CPe, JLAz funded by Ministerio de Sanidad/ISCIII. BT, PG funded by PERISCOPE European H2020 project, contract number 101016233. CP, DL, EA, MC, SA funded by European Commission - Directorate-General for Communications Networks, Content and Technology through the contract LC-01485746, and Ministerio de Ciencia, Innovacion y Universidades and FEDER, with the project PGC2018-095456-B-I00. DE, MGu funded by Spanish Ministry of Health / REACT-UE (FEDER). DO, GF, IMi, LC funded by Laboratory Directed Research and Development program of Los Alamos National Laboratory (LANL) under project number 20200700ER. DS, ELR, GG, NGR, NW, YW funded by National Institutes of General Medical Sciences (R35GM119582; the content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS or the National Institutes of Health). FB, FP funded by InPresa, Lombardy Region, Italy. HG, KS funded by European Centre for Disease Prevention and Control. IV funded by Agencia de Qualitat i Avaluacio Sanitaries de Catalunya (AQuAS) through contract 2021-021OE. JDe, SMo, VP funded by Netzwerk Universitatsmedizin (NUM) project egePan (01KX2021). JPB, SH, TH funded by Federal Ministry of Education and Research (BMBF; grant 05M18SIA). KH, MSc, YKh funded by Project SaxoCOV, funded by the German Free State of Saxony. Presentation of data, model results and simulations also funded by the NFDI4Health Task Force COVID-19 (https://www.nfdi4health.de/task-force-covid-19-2) within the framework of a DFG-project (LO-342/17-1). LP, VE funded by Mathematical and Statistical modelling project (MUNI/A/1615/2020), Online platform for real-time monitoring, analysis and management of epidemic situations (MUNI/11/02202001/2020); VE also supported by RECETOX research infrastructure (Ministry of Education, Youth and Sports of the Czech Republic: LM2018121), the CETOCOEN EXCELLENCE (CZ.02.1.01/0.0/0.0/17-043/0009632), RECETOX RI project (CZ.02.1.01/0.0/0.0/16-013/0001761). NIB funded by Health Protection Research Unit (grant code NIHR200908). SAb, SF funded by Wellcome Trust (210758/Z/18/Z).