Research Article

Risk factors affecting polygenic score performance across diverse cohorts

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, United States
Division of Nephrology, Department of Medicine, Columbia University, United States
Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, United States
Department of Cardiovascular Medicine, Mayo Clinic, United States
Department of Biomedical Informatics, Vanderbilt University Medical Center, United States
Department of Pediatrics, University of Alabama at Birmingham, United States
Departments of Pediatrics and Medicine, Columbia University Irving Medical Center, Columbia University, United States
Department of Neurology, School of Medicine, University of Alabama at Birmingham, United States
The Center for Autoimmune Genomics and Etiology, Division of Human Genetics, Cincinnati Children's Hospital Medical Center, United States
Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, United States
Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, United States
Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, United States
Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, United States
Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, United States

Jan 24, 2025

https://doi.org/10.7554/eLife.88149.3

Open access

Version of Record: January 24, 2025
Reviewed Preprint: v2 June 12, 2024
Reviewed Preprint: v1 July 31, 2023

Part of Collection
Special Issue: Systems Genetics

Edited by David James et al.

eLife assessment

This study presents a convincing analysis of the effects of covariates, such as age, sex, socioeconomic status, or biomarker levels, on the predictive accuracy of polygenic scores for body mass index; the work is further supported by important approaches for improving prediction accuracy by accounting for such covariates across a variety of association studies. The authors did a commendable job addressing reviewer suggestions and comments. The work will be of interest to colleagues using and developing methods for phenotypic prediction based on polygenic scores.

https://doi.org/10.7554/eLife.88149.3.sa0

Significance of the findings:

Important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence:

Convincing: Appropriate and validated methodology in line with current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed the effects of covariate stratification and interaction on body mass index (BMI) PGS (PGS_BMI) across four cohorts of European (N = 491,111) and African (N = 21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best- and worst-performing quintiles for certain covariates. Twenty-eight covariates had significant PGS_BMI–covariate interaction effects, modifying PGS_BMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and interaction effects – across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGS_BMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show the effect of PGS_BMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGS_BMI performance and effects, we investigated ways to increase model performance taking into account nonlinear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGS_BMI directly from GxAge genome-wide association studies effects increased relative R² by 7.8%. 
These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGS_BMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.

Introduction

Polygenic scores (PGS) provide individualized genetic predictors of a phenotype by aggregating genetic effects across hundreds or thousands of loci, typically estimated from genome-wide association studies (GWAS). In recent years, it has become increasingly apparent that the transferability of PGS performance across different cohorts is poor (Martin et al., 2019). Most analyses to date have focused on ancestry differences as the main driver of this lack of portability (Wang et al., 2020; Galinsky et al., 2019; Shi et al., 2021). However, a growing body of evidence has demonstrated that PGS performance and effect estimates are influenced by differences in certain contexts, that is, environmental (classically termed ‘gene–environment’ effects or interactions) or personal-level covariates – different phenotypes seem to be differently affected by these covariates, with adiposity traits such as body mass index (BMI) having substantial evidence for these effects (Rask-Andersen et al., 2017; Robinson et al., 2017; Sulc et al., 2020; Justice et al., 2017; Helgeland et al., 2019; Vogelezang et al., 2020; Couto Alves et al., 2019; Choh et al., 2014; Mostafavi et al., 2020; Elks et al., 2012). One previous study showed that PGS built from GWAS stratified by sample characteristics performed better in cohorts that matched the characteristics of the stratified GWAS, and that differences in heritability between the stratified cohorts partially explained this observation (Mostafavi et al., 2020).

There are several gaps in current knowledge about these covariate-specific effects. Many analyses have assessed only a handful of these covariates because of the myriad choices possible in typical large-scale biobanks. Little investigation has been done to systematically understand why certain covariates affect PGS performance, although such knowledge would help narrow the search for variables that impart context-specific effects. Furthermore, most studies investigating PGS–covariate interactions have been in European ancestry individuals; notably, previous studies have not compared differences in PGS performance and prediction attributable to ancestry with those attributable to context. Moreover, covariate-specific effects are notorious for replicating poorly in human genetics studies, and previous studies of PGS–covariate interactions have been predominantly performed in the UK Biobank (UKBB) (Bycroft et al., 2018), where the majority of individuals are aged 40–69 (i.e., excluding young adults), are healthier overall than participants in other cohorts (e.g., hospital-based cohorts), and are predominantly of European ancestry. Additionally, PGS performance is often assessed using linear models and in isolation from clinical covariates, which in practice would often be available. Machine learning models can have increased performance over linear models and are capable of modeling complex relationships and interactions between variables, which may serve to increase predictive performance, especially given evidence for PGS–covariate-specific effects. Finally, given evidence for context-specific effects, it should be possible to incorporate SNP–covariate interaction effects from a GWAS directly to improve prediction performance, instead of relying on post hoc interactions from a typical PGS calculated from main GWAS effects.

Using genetic data with linked phenotypic information from electronic health records, we estimated the effects of covariate stratification and interaction on performance and effect estimates of PGS for BMI (PGS_BMI) – a flowchart summarizing our analyses is presented in Figure 1. These analyses were done across four datasets (Supplementary file 1a): UKBB, Penn Medicine BioBank (PMBB) (Verma et al., 2022), Electronic Medical Records and Genomics (eMERGE) network dataset (Stanaway et al., 2019), and Genetic Epidemiology Research on Adult Health and Aging (GERA) (Banda et al., 2015). These datasets include participants from two ancestry groups (N = 491,111 European ancestry [EUR], N = 21,612 African ancestry [AFR]), and 62 covariates (25 present in multiple datasets) representing laboratory, survey, and biometric data types typically associated with cardiometabolic health and adiposity. After constructing PGS_BMI using out-of-sample multi-ancestry BMI GWAS, we assessed the effects of covariate stratification on PGS_BMI R², the significance of PGS_BMI–covariate interaction terms and their increases to model R² over models only using main effects, as well as the correlation of main effects, interaction effects, and R² differences. We then assessed ways to increase model performance by using machine learning models and by creating PGS_BMI using GxAge GWAS effects. This study addresses several open issues concerning the performance and effects of PGS in individuals from diverse backgrounds.

Figure 1

Results

Effect of covariate stratification on PGS_BMI performance

We assessed 62 covariates for PGS_BMI R² differences (25 of which were present, or had suitable proxies, in multiple datasets; Supplementary file 1b) after stratifying on binary covariates and quintiles for continuous covariates. With UKBB EUR as discovery (N = 376,729), 18 covariates had significant differences (Bonferroni p<0.05/62) in R² among groups (Figure 2a), including age, sex, alcohol consumption, different physical activity measurements, Townsend deprivation index, different dietary measurements, lipids, blood pressure, and HbA1c, with 40 covariates having suggestive (p<0.05) evidence of R² differences. From an original PGS_BMI R² of 0.076, R² increased to 0.094–0.088 for those in the bottom physical activity, alcohol intake, and high-density lipoprotein (HDL) cholesterol quintiles, and decreased to 0.067–0.049 for those in the top quintile, respectively, comparable to differences observed between ancestries (Martin et al., 2019). We note that the differences in R² due to alcohol intake and HDL were larger than those of any physical activity phenotype, despite physical activity having some of the oldest and most replicable evidence of interaction with genetic effects of BMI (Kilpeläinen et al., 2011; Rampersaud et al., 2008). Despite considerable published evidence suggesting covariate-specific genetic effects between BMI and smoking behaviors (Robinson et al., 2017; Justice et al., 2017), we were only able to find suggestive evidence for R² differences when stratifying individuals across several smoking phenotypes (minimum p=0.016, for smoking pack years). R² differences due to educational attainment were also only suggestive (p=0.015), with published evidence on this association being conflicting (Amin et al., 2018; Li et al., 2021; Frank et al., 2019).
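The stratification step can be illustrated with a minimal sketch on simulated data. It assumes a univariable model (BMI regressed on PGS only) within each quintile, so R² is simply the squared Pearson correlation; the function names, the simulated effect sizes, and the absence of adjustment covariates are all illustrative assumptions, not the specification used in this study (which is described in 'Materials and methods').

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, equal to R² of a univariable linear model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def r2_by_quintile(pgs, bmi, covariate, n_bins=5):
    """Sort individuals by the covariate, cut into equal-sized bins, return R² per bin."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    size = len(order) // n_bins
    out = []
    for k in range(n_bins):
        # The last bin absorbs any remainder so that every individual is used
        idx = order[k * size:(k + 1) * size] if k < n_bins - 1 else order[k * size:]
        out.append(r_squared([pgs[i] for i in idx], [bmi[i] for i in idx]))
    return out

# Simulated data (hypothetical): the PGS effect shrinks as the covariate increases
rng = random.Random(0)
n = 20000
cov = [rng.gauss(0, 1) for _ in range(n)]
pgs = [rng.gauss(0, 1) for _ in range(n)]
bmi = [(0.3 - 0.1 * c) * p + rng.gauss(0, 1) for c, p in zip(cov, pgs)]
r2 = r2_by_quintile(pgs, bmi, cov)  # one R² per covariate quintile, bottom to top
```

In this simulation the bottom quintile has a higher R² than the top quintile, mirroring the pattern reported for covariates such as physical activity and HDL cholesterol.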

Figure 2

Polygenic score (PGS) R² stratified by quintiles for quantitative variables and by binary variables.

(a) Continuous covariates with significant (p<8.1 × 10^–4) R² differences across quintiles in UK Biobank (UKBB) European ancestry (EUR). Pork and processed meat consumption per week were excluded from this plot in favor of pork and processed meat intake. (b) Covariates with significant differences that were available in multiple cohorts. When traits had the same or directly comparable units between cohorts, the actual trait values are plotted on the x-axis; percentiles are shown for physical activity, alcohol intake frequency, and socioeconomic status, which had slightly differing phenotype definitions across cohorts. Townsend index and income were used as variables for socioeconomic status in UKBB and Genetic Epidemiology Research on Adult Health and Aging (GERA), respectively. Note that the sign for Townsend index was reversed, since increasing Townsend index indicates lower socioeconomic status, while increasing income indicates higher socioeconomic status. PA, physical activity; IPAQ, International Physical Activity Questionnaire.

We replicated these analyses in three additional large-scale cohorts of European and African ancestry individuals (Figure 2b, Supplementary file 1c), as well as in African ancestry UKBB individuals. Among covariates with significant performance differences in the discovery analysis, we were able to replicate significant (p<0.05) R² differences for age, HDL cholesterol, alcohol intake frequency, physical activity, and HbA1c, despite much smaller sample sizes. We again observed mostly insignificant differences across cohorts and ancestries when stratifying due to different smoking phenotypes and educational attainment. For each covariate and ancestry combination, we combined data across cohorts and conducted a linear regression weighted by sample size, regressing R² values on covariate values across groupings. Slopes of the regressions across cohorts had different signs between ancestries for the same covariate (triglyceride levels, HbA1c, diastolic blood pressure, and sex), although larger sample sizes may be needed to confirm these differences are statistically significant.
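A sample-size-weighted regression of stratum R² on stratum covariate value can be sketched as follows. The closed-form weighted least-squares fit is standard; the per-stratum values below are invented for illustration and are not taken from any cohort in this study, and the function name is hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y ~ intercept + slope * x with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical strata pooled across cohorts: mean age, stratum R², stratum sample size
mean_age = [45.0, 52.0, 58.0, 63.0, 68.0, 35.0, 50.0, 65.0]
stratum_r2 = [0.10, 0.09, 0.08, 0.07, 0.06, 0.11, 0.09, 0.07]
stratum_n = [75000, 75000, 75000, 75000, 75000, 4000, 4000, 4000]

intercept, slope = weighted_linear_fit(mean_age, stratum_r2, stratum_n)
predicted_r2_at_80 = intercept + slope * 80  # extrapolated R² for an 80-year-old
```

Weighting by sample size lets large strata dominate the fitted slope, so small replication cohorts inform the trend without overwhelming it.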

We considered several observations related to age-specific effects on PGS_BMI noteworthy. First, in the weighted linear regression of all R² values across ancestries, expected R² for African ancestry individuals can become greater than that of European ancestry individuals among individuals within bottom and top age quintiles observed in these data. For instance, the predicted R² of 0.048 for 80-year-old European ancestry individuals would be lower than that of African ancestry individuals aged 24.7 and lower, indicating that differences in covariates can affect PGS_BMI performance more than differences due to ancestry. Second, we obtained these results despite the average age of GWAS individuals being 57.8, which should increase PGS_BMI R² for individuals closest to this age (Mostafavi et al., 2020). This result suggests that the loss of PGS performance due to decreased heritability with age cannot be fully offset by creating PGS_BMI from a GWAS of similarly aged individuals (as heritability is an upper bound on PGS performance). Finally, we observed that PGS_BMI R² increases as age decreases, consistent with published evidence suggesting that the heritability of BMI decreases with age (Ge et al., 2017; Min et al., 2013).

PGS–covariate interaction effects

Next, we estimated the differences in PGS effects due to interactions with covariates by modeling interaction terms between PGS_BMI and the covariate for each covariate in our list (described in ‘Materials and methods’). We implemented a correction for shared heritability between covariates of interest and outcome (which can inflate test statistics; Aschard et al., 2015) to better measure the environmental component of each covariate, and show that this correction successfully reduces significance of interaction estimates (Figure 3—figure supplement 1). Again, using UKBB EUR as the discovery cohort, we observed 28 covariates with significant (Bonferroni p<0.05/62) PGS–covariate interactions (Table 1), with 38 having suggestive (p<0.05) evidence (Supplementary file 1d). We observed the largest effect of PGS–covariate interaction with alcohol drinking frequency (20.0% decrease in PGS effect per 1 standard deviation [SD] increase, p=2.62 × 10^–55), with large effects for different physical activity measures (9.4–12.5% decrease/SD, minimum p=3.11 × 10^–66), HDL cholesterol (15.3% decrease/SD, p=1.71 × 10^–96), and total cholesterol (12.7% decrease/SD, p=1.64 × 10^–71). We observed significant interactions with diastolic blood pressure (10.8% increase/SD, p=6.06 × 10^–60), but interactions with systolic blood pressure were much smaller (1.17% increase/SD, p=4.41 × 10^–3). Significant interactions with HbA1c (4.63% increase/SD, p=5.37 × 10^–14) and type 2 diabetes (27.2% PGS effect increase in cases, p=1.83 × 10^–7) were also observed. Other significant PGS–covariate interactions included lung function, age, sex, and LDL cholesterol – various dietary measurements also had significant interactions, albeit with smaller effects than other significant covariates. We were able to find significant interaction effects for smoking pack years (4.78% increase/SD, p=3.68 × 10^–7), but other smoking phenotypes had nonsignificant interaction effects after correcting for multiple tests (minimum p=2.7 × 10^–3); interactions with educational attainment were also nonsignificant (p=4.54 × 10^–2).
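To make the "percentage change in PGS effect per SD" quantity concrete, the sketch below fits BMI ~ PGS + covariate + PGS × covariate on simulated data and reports 100 × β_interaction / β_PGS. Both this definition of the percentage change and the simulated coefficients are assumptions made for illustration; the shared-heritability correction described above is not implemented, and the small normal-equations solver stands in for whatever regression software was actually used.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns, via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Simulated data (hypothetical): true PGS effect 1.0, true interaction -0.15
rng = random.Random(3)
n = 20000
pgs = [rng.gauss(0, 1) for _ in range(n)]
cov = [rng.gauss(0, 1) for _ in range(n)]  # covariate already in SD units
bmi = [1.0 * p + 0.5 * c - 0.15 * p * c + rng.gauss(0, 1) for p, c in zip(pgs, cov)]

b0, b_pgs, b_cov, b_int = ols([pgs, cov, [p * c for p, c in zip(pgs, cov)]], bmi)
pct_change = 100 * b_int / b_pgs  # % change in PGS effect per 1 SD of the covariate
```

With these simulated coefficients the estimate lands near –15%, which is the same order of magnitude as the HDL cholesterol row in Table 1.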

Table 1

Model descriptive statistics on 28 of 62 covariates, which have significant (p<0.05/62) polygenic score (PGS)–covariate interaction terms, in UK Biobank (UKBB) European ancestry (EUR).

The third column is the percentage change in PGS effect per unit change (standard deviations for continuous variables, binary variables encoded as 0 or 1) in covariate. The fifth column is the increase in model R² with a PGS–covariate interaction term versus a main effects only model.

Variable type	Covariate	% change in β_PGS per covariate unit change	Interaction p	Increase in model R² with interaction term	N
Continuous	HDL cholesterol	–15.29	1.71 × 10^–96	0.0012	328,719
	Total cholesterol	–12.70	1.64 × 10^–71	0.00082	359,221
	IPAQ	–12.50	3.11 × 10^–66	0.001	304,951
	Moderate-vigorous PA	–11.41	8.92 × 10^–65	0.001	304,951
	Diastolic BP	10.84	6.06 × 10^–60	0.0007	352,804
	Townsend Index	6.78	2.86 × 10^–58	0.00089	376,283
	Age	–9.02	3.60 × 10^–57	0.00061	376,729
	FVC	–9.66	4.69 × 10^–56	0.0008	343,467
	Drink frequency/week	–19.96	2.62 × 10^–55	0.0024	122,281
	LDL cholesterol	–9.86	2.63 × 10^–51	0.00058	358,556
	N days vigorous PA/week	–9.37	2.42 × 10^–35	0.0007	299,963
	FEV1	–7.38	7.15 × 10^–35	0.0005	343,544
	Mean alcohol consumption	–7.38	7.65 × 10^–22	0.00113	126,756
	HbA1c	4.63	5.37 × 10^–14	0.0002	358,798
	Mean drinks/week	–7.66	1.01 × 10^–13	0.0008	112,204
	Water intake	4.60	2.97 × 10^–13	0.00014	347,472
	Processed meat intake	3.70	2.38 × 10^–7	0.0002	376,205
	Starch mean	5.51	3.15 × 10^–7	0.00018	128,346
	Smoking pack years	4.78	3.68 × 10^–7	0.0002	114,135
	Protein mean	4.82	6.52 × 10^–7	0.00018	128,181
	Saturated fat mean	4.92	1.23 × 10^–6	0.00017	127,899
	Fat mean	4.40	1.64 × 10^–5	0.00013	128,092
	Saturated fat grams/week	2.46	1.79 × 10^–5	4.00 × 10^–5	364,629
	Retinol mean	3.77	3.54 × 10^–4	9.00 × 10^–5	126,029
Binary	IPAQ	–12.68	5.30 × 10^–62	0.0009	304,951
	Vigorous PA/week	–20.55	9.07 × 10^–54	0.0009	304,951
	Sex	–11.02	1.41 × 10^–24	0.00025	376,729
	Diabetes	27.19	1.83 × 10^–7	0.0004	375,903

BP = blood pressure, PA = physical activity, FVC = forced vital capacity, FEV1 = forced expiratory volume in 1 s, HDL = high-density lipoprotein, LDL = low-density lipoprotein, IPAQ = International Physical Activity Questionnaire.

We replicated these analyses across ancestries and the other non-UKBB EUR cohorts (Figure 3, Supplementary file 1d). For age and sex, which were available for all cohorts, interactions were significant (p<0.05) and directionally consistent across cohorts and ancestries (except for GERA AFR, which had a small sample size [N = 1,789]). We were able to test interactions with alcohol intake frequency and physical activity in GERA, and replicated significant and directionally consistent associations. We observed poor replication for LDL cholesterol, HbA1c, and smoking pack years, with nonsignificant and directionally inconsistent interaction effects across cohorts. Educational attainment was available in GERA, and interactions were once again nonsignificant. We observed significant and directionally consistent interaction effects for triglycerides (TG) in eMERGE EUR and PMBB EUR, while the effect was inconsistent in UKBB EUR despite a much larger sample size.

Figure 3 (with 1 supplement)

Relative percentage changes in polygenic score (PGS) effect per unit change in covariate, for covariates that significantly changed PGS effect (i.e., significant interaction beta at Bonferroni p<8.1 × 10^–4 – denoted by asterisks) and were present in multiple cohorts and ancestries.

The same covariate groupings and transformations were used as in Figure 2. Similarly, actual values were used when variables had comparable units across cohorts, and standard deviations (SD) were used otherwise.

However, despite significance of interaction terms, increases in model R² when including PGS–covariate interaction terms were small. For instance, the maximum increase among all covariates in UKBB EUR was only 0.0024 from a base R² of 0.1049 (2.1% relative increase), for alcoholic drinks per week. Across all cohorts and ancestries, the maximum increase in R² was only 0.0058 from a base R² of 0.09454 (6.1% relative increase), when adding a PGS–age interaction term for eMERGE EUR (p=5.40 × 10^–46) – this was also the largest relative increase among models with significant interaction terms. This result suggests that, while interaction effects can significantly modify PGS_BMI effect, their overall impact on model performance is relatively small, despite large differences in R² when stratifying by covariates.

Correlations between R² differences, interaction effects, and main effects

We next investigated the relationship between interaction effects, maximum R² differences across quintiles, and main effects of covariates on BMI. We first estimated the main effects of each covariate on BMI (‘Materials and methods’, Supplementary file 1e), and then calculated the correlation weighted by sample size between main effects, maximum PGS_BMI R² differences across quintiles, and PGS–covariate interaction effects (Figure 4) across all cohorts and ancestries – GERA data were excluded from these analyses due to slightly different phenotype definitions (Supplementary file 1f), as were binary variables. Interaction effects and maximum R² differences had a 0.80 correlation (p=2.1 × 10^–27), indicating that variables with larger interaction effects also had larger effects on PGS_BMI performance across quintiles, and that covariates that increase PGS_BMI effect also have the largest effect on PGS_BMI performance, that is, individuals most at risk for obesity will have both disproportionately larger PGS_BMI effect and R². Main effects and maximum R² differences had a 0.56 correlation (p=1.3 × 10^–11), while main effects and interaction effects had a 0.58 correlation (p=7.6 × 10^–12), again suggesting that PGS_BMI are more predictive in individuals with higher values of BMI-associated covariates, although main effects predicted R² differences less well than the directly estimated interaction effects did. Nevertheless, this result demonstrates that covariates that influence both PGS_BMI effect and performance can be predicted using only the main effects of covariates, which are often known for certain phenotypes and are easier to calculate, as genetic data and PGS construction would not be required.
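A sample-size-weighted Pearson correlation of the kind reported above can be written in a few lines. The formula is the standard weighted correlation; how each cohort-by-covariate point was weighted in this study is not restated here, so treating one weight per point as that point's sample size is an assumption, as are the function name and the toy values.

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation in which each (x, y) pair carries weight w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)

# Toy values (hypothetical): interaction effect, maximum R² difference, sample size
interaction_effect = [-0.15, -0.12, 0.10, 0.05, -0.02]
max_r2_difference = [-0.040, -0.030, 0.025, 0.010, -0.002]
sample_size = [330000, 300000, 350000, 360000, 9000]

r_weighted = weighted_pearson(interaction_effect, max_r2_difference, sample_size)
```

Points from small cohorts contribute little to the weighted estimate, which is the purpose of weighting when cohort sizes differ by orders of magnitude.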

Figure 4

Relationships (Pearson correlations weighted by sample size) between maximum R² differences across strata, main effects of covariate on log(BMI), and polygenic score (PGS)–covariate interaction effects on log(BMI).

Main effect units are in standard deviations, interaction effect units are in PGS standard deviations multiplied by covariate standard deviations. Only continuous variables are plotted and modeled. Genetic Epidemiology Research on Adult Health and Aging (GERA) was excluded due to slightly different phenotype definitions. BMI, body mass index.

Increase in PGS effect for increasing percentiles of BMI itself, and its relation to R² differences when stratifying by covariates

Given large and replicable correlations between main effects, interaction effects, and maximum R² differences for individual covariates, it seemed that these differences might be due to differences in BMI itself, rather than to any individual covariate or combination of covariates. To assess this, we used quantile regression to evaluate the effect of PGS_BMI on BMI at different deciles of BMI itself. We observed that the effect of PGS_BMI consistently increases from lower deciles to higher deciles across all cohorts and ancestries (Figure 5) – for instance, in European ancestry UKBB individuals, the effect of PGS_BMI (in units of log(BMI)) when predicting the bottom decile of log(BMI) was 0.716 (95% CI: 0.701–0.732), and increased to 1.31 (95% CI: 1.29–1.33) in the top decile. Across all cohorts and ancestries, the PGS_BMI effect in the highest-effect decile was 1.43 to 2.06 times that in the lowest-effect decile, and the 95% confidence intervals of these two effects did not overlap in any cohort or ancestry (except for African ancestry eMERGE individuals, which had a much smaller sample size).
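The idea behind quantile regression can be sketched without any statistics library: for a given tau, choose the intercept and slope that minimise the pinball (check) loss. The sketch below handles a single predictor by profiling out the intercept (a tau-th sample quantile of the partial residuals) and running a ternary search on the slope, which works because the profiled loss is convex. This is a didactic stand-in, not the software used in this study, and the simulated heteroscedastic data are hypothetical.

```python
import math
import random

def pinball_loss(residuals, tau):
    """Check loss: positive residuals cost tau, negative residuals cost (1 - tau)."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)

def best_intercept(residuals, tau):
    """A tau-th sample quantile minimises the pinball loss over the intercept."""
    s = sorted(residuals)
    return s[max(0, math.ceil(tau * len(s)) - 1)]

def profile_loss(x, y, slope, tau):
    partial = [b - slope * a for a, b in zip(x, y)]
    a0 = best_intercept(partial, tau)
    return pinball_loss([r - a0 for r in partial], tau)

def quantile_slope(x, y, tau, lo=-5.0, hi=5.0, iters=40):
    """Ternary search on the slope; the profiled loss is convex in the slope."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if profile_loss(x, y, m1, tau) < profile_loss(x, y, m2, tau):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Simulated data (hypothetical): residual spread grows with PGS, so the PGS
# effect is larger in the upper tail of the outcome than in the lower tail
rng = random.Random(2)
n = 3000
pgs = [rng.gauss(0, 1) for _ in range(n)]
log_bmi = [1.0 * p + (1 + 0.3 * p) * rng.gauss(0, 1) for p in pgs]
slopes = {tau: quantile_slope(pgs, log_bmi, tau) for tau in (0.1, 0.5, 0.9)}
```

The fitted slope rises from tau = 0.1 to tau = 0.9 in this simulation, which is the qualitative pattern shown in Figure 5.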

Figure 5 (with 2 supplements)

Quantile regression effects of PGS_BMI (in units of log(BMI)) on log(BMI) at each decile of BMI in each cohort and ancestry.

Tau is an input parameter for quantile regression corresponding to the percentile of the BMI distribution being modeled, with lower tau values representing the lower deciles (e.g., tau = 0.1 for the 10th percentile) and higher tau values representing the upper deciles (e.g., tau = 0.9 for the 90th percentile). The effect of PGS_BMI increases as BMI itself increases, suggesting that no individual covariate–PGS interaction is responsible for the nonlinear effect of PGS_BMI. PGS, polygenic score; BMI, body mass index.

While this analysis showed that the effect of PGS_BMI increases as BMI itself increases, which may help explain significant interaction effects between PGS_BMI and different covariates, it does not directly explain differences in R² when stratifying by different covariates – we describe several points that help explain this result and suggest they may actually be closely related. Essentially, as the magnitude of the slope of a regression line increases while the mean squared residual does not increase, model R² will increase – we demonstrate this using simulated data (Figure 5—figure supplement 1). As the magnitude of the regression line’s slope decreases, the regression line becomes a comparatively worse predictor compared to just using the mean, which decreases R² despite the mean error being the same across models. To demonstrate this in real data, we compared simple univariable models of log(BMI)~PGS_BMI (in units of log(BMI)) between the bottom and top age quintiles in the European ancestry UKBB (Figure 5—figure supplement 2). As shown in previous sections, R² and PGS_BMI beta are higher in younger individuals (R² = 0.088 versus R² = 0.066, and beta = 1.12 and 0.87, respectively), which seem to be a direct consequence of one another, as the mean squared error in younger individuals is actually higher (0.027 versus 0.022, respectively). This description suggests that the use of R² as the sole performance metric for evaluation of PGS may not always be appropriate, despite its overwhelming usage. Furthermore, this explanation helps explain the seemingly paradoxical results of significant interaction terms yet small increases in overall model R² and comparably much larger differences in R² in the stratified analyses.
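The dependence of R² on slope at fixed residual error follows from the population identity R² = β²·Var(x) / (β²·Var(x) + σ²) for a univariable linear model with residual variance σ². A minimal simulation, assuming standard normal PGS and noise and two arbitrary slopes chosen only for illustration, shows the same thing empirically: reusing the identical noise draws gives an identical mean squared residual but a very different R².

```python
import random

def fit_line(x, y):
    """Univariable OLS: returns slope, intercept, R², and mean squared residual."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1 - ss_res / syy, ss_res / n

rng = random.Random(1)
n = 10000
pgs = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]  # identical residual noise in both scenarios

# Same predictor, same noise, different true slopes (values are arbitrary)
steep = fit_line(pgs, [1.2 * p + e for p, e in zip(pgs, noise)])
shallow = fit_line(pgs, [0.6 * p + e for p, e in zip(pgs, noise)])
```

Here `steep[2]` exceeds `shallow[2]` while `steep[3]` and `shallow[3]` agree to floating-point precision, so the R² gap is driven entirely by the slope.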

Effects of machine learning approaches on predictive performance

Given evidence of PGS–covariate dependence, we aimed to assess increases in R² when using machine learning models (neural networks), which can inherently model interactions and other nonlinearities, over linear models even with interaction terms. We first included age and sex as the only covariates (along with genotype PCs and PGS_BMI), as age and sex were present in all datasets and had significant and replicable evidence for PGS-dependence across our analyses. Three models were assessed – L1-regularized (i.e., LASSO) linear regression without any interaction terms, LASSO including a PGS–age and PGS–sex interaction term, and neural networks (without interaction terms). When comparing neural networks to LASSO with interaction terms, the relative tenfold cross-validated R² increased up to 67% (mean 23%) across cohorts and ancestries (Figure 6, Supplementary file 1g). The inclusion of interaction terms increased cross-validated R² up to 12% (mean 3.9%) when comparing LASSO including interaction terms to LASSO with main effects only.

Figure 6

Model R² from different machine learning models across cohorts and ancestries using age and sex as covariates (along with PGS_BMI and PCs 1–5).

Across all cohorts and ancestries, LASSO with PGS–age and PGS–sex interaction terms had better average tenfold cross-validation R² than LASSO without interaction terms, while neural networks outperformed LASSO models. PGS, polygenic score; BMI, body mass index.

We then modeled all available covariates and their interactions with PGS for each cohort and made similar comparisons. Cross-validated R² increased by up to 17.6% (mean 9.5%) when using neural networks versus LASSO with interaction terms, and up to 7.0% (mean 2.0%) when comparing LASSO with interaction terms to LASSO with main effects only. Increases in model performance using neural networks were smaller in UKBB, perhaps due to the age range being smaller than in other cohorts (all participants aged 39–73). This result suggests that the additional variance explained through nonlinear effects with age and sex is explained by other variables present in the remainder of the datasets. Our findings show that machine learning methods can improve the R² of models that include PGS_BMI beyond what is gained by including interaction terms in linear models, even when variable selection is performed using LASSO, demonstrating that model performance can be increased beyond modeling nonlinearities through linear interaction terms and a feature selection procedure.

Calculating PGS directly from GxAge GWAS effects

Previous studies (Mostafavi et al., 2020) have created PGS using GWAS stratified by different personal-level covariates, but for practical purposes this leads to a large loss of power, as the full size of the GWAS is not utilized for each stratum and continuous variables are forced into bins. We developed a novel strategy where PGS are instead created from a full-size GWAS that includes SNP–covariate interaction terms (‘Materials and methods’). We focused on age interactions, given their large and replicable effects based on our results – similar to a previous study (Mostafavi et al., 2020), we conducted these analyses in the European UKBB. We used a 60% random split of study individuals to construct three sets of PGS using GWAS of the following designs: main effects only, main effects plus an SNP–age interaction term, and main effects only but stratified into quartiles by age. Of the remaining data, half (20% of the total) was used for training and the final 20% was held out as a test set. The best-performing PGS created from SNP–age interaction terms (PGS_GxAge) increased test R² to 0.0771 (95% bootstrap CI: 0.0770–0.0772) from 0.0715 (95% bootstrap CI: 0.0714–0.0716), a 7.8% relative increase compared to the best-performing main effect PGS (Figure 7, Supplementary file 1h) – age-stratified PGS had much lower performance than both other strategies (unsurprising given the reasons previously mentioned). Including a PGS_GxAge–age interaction term only marginally increased R² (0.0001 increase), with similar increases for the other two sets of PGS, further demonstrating that post hoc modeling of interactions cannot recover the performance gained through directly incorporating interaction effects from the original GWAS. The strategy of creating PGS directly from full-sized SNP–covariate interactions is potentially quite useful as it increases PGS performance without the need for additional data – there are almost certainly a variety of points of improvement (described more in ‘Discussion’), but we consider their investigation outside the scope of this study.

Figure 7

Polygenic score (PGS) R² based on three sets of genome-wide association studies (GWAS) setups.

‘Main effects’ were from a typical main effect GWAS, ‘GxAge’ effects were from a GWAS with an SNP–age interaction term, and ‘Age stratified’ GWAS had main effects only but were conducted in four age quartiles. PGS R² was evaluated using two models: one with main effects only and one with an additional PGS * Age interaction term.

Discussion

We uncovered replicable effects of covariates across four large-scale cohorts of diverse ancestry on both performance and effects of PGS_BMI. When stratifying by quintiles of different covariates, certain covariates had significant and replicable evidence for differences in PGS_BMI R², with R² being nearly double between top- and bottom-performing quintiles for covariates with the largest differences. When testing PGS–covariate interaction effects, we also found covariates with significant interaction effects, where, for the largest effect covariates, each standard deviation change affected PGS_BMI effect by nearly 20%. Across analyses, we found age and sex had the most replicable interaction effects, with levels of serum cholesterol, physical activity, and alcohol consumption having the largest effects across cohorts. Interaction effects and R² differences were strongly correlated, with main effects also being correlated with interaction effects and R² differences, suggesting that covariates with the largest interaction effects also contribute to the largest R² differences, with simple main effects also being predictive of expected differences in R² and interaction effects. Relatedly, we observed the effect of PGS_BMI increases as BMI itself increases, and reason that differences in R² when stratifying by covariates are largely a consequence of difference in PGS_BMI effects. Next, we employed machine learning methods for prediction of BMI with models that include PGS_BMI and demonstrate that these methods outperform regularized linear regression models that include interaction effects. Finally, we employed a novel strategy that directly incorporates SNP interaction effects into PGS construction and demonstrate that this strategy improves PGS performance when modeling SNP–age interactions compared to PGS created only from main effects.

These observations are relevant to current research and clinical use of PGS, as individuals above a percentile cutoff are designated high risk (Ge et al., 2019), implying that individuals most at risk for obesity have both disproportionately higher predicted BMI and increased BMI prediction performance compared to the general population. More broadly, these results may extend to single variant effects, not only those aggregated into a PGS, which may inform the cause of previous GxE discoveries. For instance, variants near FTO that interact with physical activity, discovered through GWAS of BMI, are robust and well documented. However, individuals engaging in physical activity will generally have lower BMI than those who are sedentary, and these results suggest it may not be the difference in physical activity that is driving the interaction, but rather the difference in BMI itself. This concept may also apply to other traits: sex-specific analyses are commonly performed, and variants with differing effects between male and female GWAS may largely be explained by phenotypic differences, rather than any combination of biological or lifestyle differences.

Future work may include replicating these analyses across additional traits, and trying to understand why these differences occur, as well as further exploring machine learning and deep learning methods on other phenotypes to determine if this trend of inclusion of PGS, along with covariate interaction effects, outperforms linear models for risk prediction. Additionally, inclusion of a PGS for the covariate to better measure its environmental effect is potentially worth exploring further and should improve in the future as PGS performance continues to increase. A slight limitation of this method in our study is that for the UKBB analyses the GWAS used for PGS construction were also from UKBB, thus not out-of-sample, although many of the covariates only have GWAS available through UKBB individuals. Furthermore, a variety of improvements are likely possible when creating PGS directly from SNP–covariate interaction terms. First, we used the same SNPs that were selected by pruning and thresholding based on their main effect p-values, but selection of SNPs based on their interaction p-values should also be possible and would likely improve performance. Additionally, performance of pruning and thresholding-based strategies has largely been overtaken by methods that first adjust all SNP effects for LD and do not require exclusion of SNPs, and a method that could do a similar adjustment for interaction effects would likely outperform most current methods for traits with significant context-specific effects. Next, incorporating additional SNP–covariate interactions (e.g., SNP–sex) would also likely further improve prediction performance, although any SNP selection/adjustment procedures may be further complicated by additional interaction terms. Finally, if SNP effects do truly differ according to differences in phenotype, then SNP effects would differ depending on the alleles one has, implying epistatic interactions are occurring at these SNPs.

While difference in phenotype itself may be able to explain the difference in genetic effects, it still may be that specific environmental or lifestyle characteristics are driving the differences. We propose several ideas about why BMI-associated covariates have larger interaction effects and impact on R² for PGS_BMI. First, age may be a proxy for accumulated gene–environment interactions as younger individuals have less exposure to environmental influences on weight compared to older individuals; therefore, one may expect that in younger individuals their phenotype could be better explained by genetics compared to environment. Second, PGS may more readily explain high phenotype values especially for positively skewed phenotypes, as large effect variants (e.g., associated with very high weight or height; Robinson et al., 2006) may be more responsible for extremely high phenotypic values. For example, the distribution of BMI is often positively skewed, and effects in trait-increasing alleles may have a larger potential to explain trait variation compared to trait-reducing variants. This explanation would likely be better suited to positively skewed traits and is not fully satisfactory as first log-transforming or rank-normal transforming the phenotype, as was done in this study, may invalidate this explanation.

PGS is a promising technique to stratify individuals for their risk of common, complex disease. To achieve more accurate predictions as well as promote equity, further research is required regarding PGS methods and assessments. This research provides firm evidence supporting the context-specific nature of PGS and the impact of nonlinear covariate effects for improving polygenic prediction of BMI, promoting equitable use of PGS across ancestries and cohorts.

Materials and methods

Study datasets

Individual inclusion criteria and sample sizes per cohort are described below – additional information is available in Supplementary file 1a.

UKBB

Individual-level quality control (QC) and filtering have been described elsewhere (Zhang et al., 2022) for European ancestry individuals. Briefly, individuals were split by ancestry according to both genetically inferred ancestry and self-reported ethnicity (Bycroft et al., 2018). Individuals with low genotyping quality and sex mismatch were removed, only unrelated individuals (Plink pi-hat < 0.250) were retained, and variants were filtered at INFO > 0.30 and minor allele frequency > 0.01. For African ancestry, individuals were first selected based on self-reported ethnicity ‘Black or Black British’, ‘Caribbean’, ‘African’, or ‘Any other Black background’. Individuals who were low quality, that is, ‘Outliers for heterozygosity or missing rate’, and who were Caucasian from ‘Genetic ethnic grouping’ were removed. Of these individuals, those who were more than 6 standard deviations from the mean of the first five genetic principal components provided by UKBB were excluded. Finally, only individuals unrelated up to the second degree were retained using plink2 (Chang et al., 2015) ‘--king-cutoff 0.125’. After QC and consideration of phenotype, a total of 7,046 individuals in the UKBB AFR data who also had BMI available were used for downstream analyses. In total, 383,775 individuals were used for analysis (N_EUR = 376,729, N_AFR = 7,046).

eMERGE

Ancestry and relatedness inference have been described elsewhere (Stanaway et al., 2019). Individuals were split into European and African ancestry cohorts, and related individuals were removed (Plink pi-hat > 0.250) such that all were unrelated. In total, 35,064 individuals (N_EUR = 31,961, N_AFR = 3,103) were used for analysis.

GERA

Ancestry inference has been described elsewhere (Banda et al., 2015), and study individuals were divided into European and African ancestry cohorts. Related individuals were removed using plink2 ‘--king-cutoff 0.125’. In total, 57,838 individuals (N_EUR = 56,049, N_AFR = 1,789) were used for analysis.

PMBB

Ancestry inference and relatedness inference have been described elsewhere (Penn Medicine BioBank, 2022). Individuals were split into European and African ancestry cohorts, and related individuals were removed at pi-hat > 0.250. In total, 36,046 individuals (N_EUR = 26,372, N_AFR = 9,674) were used for analysis.

Choice of covariates

A total of 62 covariates were included in the analyses, 25 of which (or similar proxies) were present in multiple datasets. These covariates were selected based on relevance to cardiometabolic health and obesity, and previous evidence of context-specific effects with BMI (Rask-Andersen et al., 2017; Robinson et al., 2017; Justice et al., 2017; Tyrrell et al., 2017; Young et al., 2016; Winkler et al., 2015). For UKBB, phenotype values were used from the collection that was closest to recruitment, and for PMBB the median values were used; for GERA and eMERGE, only one value was available. Additional details on covariate constructions, transformations, filtering, and cohort availability are provided in Supplementary file 1b.

PGS construction

PGS for BMI (PGS_BMI) were constructed using PRS-CSx (Ruan et al., 2022), using GWAS summary statistics for individuals of European (Locke et al., 2015), African (Ng et al., 2017), and East Asian (Sakaue et al., 2021) ancestry that were out-of-sample of study participants. A set of 1.29 million HapMap3 SNPs provided by PRS-CSx was used for PGS calculation; these SNPs are generally well imputed and variable in frequency across global populations. Default settings for PRS-CSx (downloaded November 22, 2021) were used, which have been demonstrated to perform well for highly polygenic traits such as BMI (list of parameters is provided in Supplementary file 1i). The final PGS_BMI per ancestry and cohort was calculated by regressing log(BMI) on the ancestry-specific PGS_BMI without covariates; the combined, predicted value was used as a single PGS_BMI in downstream analyses.

For GERA, BMI was not transformed as it was already binned into a categorical variable with five levels (≤18, 19–25, 26–29, 30–39, >40). Additionally, for GERA the uncombined ancestry-specific PGS_BMI was used in the final models as it had higher R² than using the combined PGS_BMI (data not shown).

PGS_BMI performance after covariate stratification

Analyses were performed separately for each cohort and ancestry. For each covariate, individuals were binned by binary covariates or quintiles for continuous covariates. Incremental PGS_BMI R² was calculated by taking the difference in R² between:

$\log(\mathrm{BMI}) \sim \mathrm{PGS}_{\mathrm{BMI}} + \mathrm{Age} + \mathrm{Sex} + \mathrm{PCs}_{1-5}$

$\log(\mathrm{BMI}) \sim \mathrm{Age} + \mathrm{Sex} + \mathrm{PCs}_{1-5}$

We performed 5,000 bootstrap replications to obtain a bootstrapped distribution of R². p-Values for differences in R² were calculated between groups by calculating the proportion of overlap between two normal distributions of the R² value using the standard deviations of the bootstrap distributions. Again for GERA, BMI was not transformed.
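
The overlap-based comparison described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each group's bootstrap R² values are summarized by a normal distribution centred on the bootstrap mean with the bootstrap standard deviation, and it uses the overlapping coefficient from Python's standard library as the "proportion of overlap". The function name `r2_overlap` and the example values are invented for illustration.

```python
from statistics import NormalDist, mean, stdev


def r2_overlap(boot_r2_a, boot_r2_b):
    """Overlap (0 to 1) of normal approximations to two bootstrap R² samples.

    Assumption: each normal is centred on the bootstrap mean and uses the
    bootstrap standard deviation; the source does not spell out the centring.
    """
    dist_a = NormalDist(mean(boot_r2_a), stdev(boot_r2_a))
    dist_b = NormalDist(mean(boot_r2_b), stdev(boot_r2_b))
    return dist_a.overlap(dist_b)


# Invented bootstrap R² values for two strata (a real run would use 5,000 each).
low_quintile = [0.040, 0.043, 0.038, 0.041, 0.039, 0.042]
high_quintile = [0.071, 0.078, 0.069, 0.075, 0.072, 0.080]

print("overlap between strata:", r2_overlap(low_quintile, high_quintile))
```

A small overlap indicates that the two strata have clearly different R²; identical distributions give an overlap of 1.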

PGS_BMI interaction modeling

Evidence for interaction with each covariate with the PGS_BMI was evaluated using linear regression. It has been reported that the inclusion of covariates that are genetically correlated with the outcome can inflate test statistic estimates (Aschard et al., 2015; Kerin and Marchini, 2020; Vanderweele et al., 2013). To assuage these concerns, we introduced a novel correction, where we first calculated a PGS for the covariate (PGS_Covariate) and included it in the model, as well as an interaction term between PGS_BMI and PGS_Covariate. The PGS_Covariate terms were calculated using the European ancestry Neale Lab summary statistics (URLs) and PRS-CS (Ge et al., 2019). To standardize effect sizes across analyses, PGS_BMI and Covariate were first converted to mean zero and standard deviation of 1 (binary covariates were not standardized). We demonstrate inclusion of PGS_Covariate terms successfully reduced significance of the PGS_BMI * Covariate term (Figure 3—figure supplement 1). The final model used to evaluate PGS_BMI and Covariate interactions was

$\log(\mathrm{BMI}) \sim \mathrm{PGS}_{\mathrm{BMI}} * \mathrm{Covariate} + \mathrm{PGS}_{\mathrm{BMI}} + \mathrm{Covariate} + \mathrm{PGS}_{\mathrm{Covariate}} + \mathrm{PGS}_{\mathrm{BMI}} * \mathrm{PGS}_{\mathrm{Covariate}} + \mathrm{Age} + \mathrm{Sex} + \mathrm{PCs}_{1-5}$

We report the effect estimates of the PGS_BMI * Covariate term, and differences in model R² with and without the PGS_BMI * Covariate term. Again for GERA, BMI was not transformed.
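
As a rough sketch of this kind of interaction test, the code below standardizes a PGS and a covariate, fits linear models with and without their product term, and reports the interaction coefficient and the gain in R². It is deliberately simplified: the PGS_Covariate term, its interaction with PGS_BMI, and the Age, Sex, and PC adjustments are omitted and would enter as further columns. The helper names (`standardize`, `ols`, `interaction_fit`) and the simulated data are hypothetical, and the small least-squares solver only stands in for a statistics package.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, R²)."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(p):
        pivot = max(range(i, p), key=lambda k: abs(a[k][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for k in range(p):
            if k != i:
                f = a[k][i] / a[i][i]
                a[k] = [x - f * z for x, z in zip(a[k], a[i])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def interaction_fit(pgs, cov, y):
    """Return (PGS x covariate coefficient, R² gain from adding that term)."""
    z_pgs, z_cov = standardize(pgs), standardize(cov)
    product = [p * c for p, c in zip(z_pgs, z_cov)]
    _, r2_main = ols([z_pgs, z_cov], y)
    beta, r2_full = ols([z_pgs, z_cov, product], y)
    return beta[3], r2_full - r2_main


# Simulated data with a built-in interaction (all values invented).
rng = random.Random(0)
sim_pgs = [rng.gauss(0, 1) for _ in range(500)]
sim_cov = [rng.gauss(0, 1) for _ in range(500)]
sim_y = [0.3 * p + 0.2 * c + 0.1 * p * c + rng.gauss(0, 1)
         for p, c in zip(sim_pgs, sim_cov)]
print(interaction_fit(sim_pgs, sim_cov, sim_y))
```

Because both predictors are standardized before the product is formed, the interaction coefficient reads as the change in the PGS effect per standard deviation of the covariate.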

Correlation between R² differences, interaction effects, and main effects

We estimated the main effects of each covariate on BMI with the following model:

$\log(\mathrm{BMI}) \sim \mathrm{Covariate} + \mathrm{Age} + \mathrm{Sex} + \mathrm{PCs}_{1-5}$

Note that we ran new models with main effects only, instead of using the main effect from the interaction models (as the main effects in the interaction models depend on the interaction terms, and main effects used to create interaction terms are sensitive to centering of variables despite the scale invariance of linear regression itself; Afshartous and Preston, 2011). We then estimated the correlation between main effects, interaction effects, and maximum R² differences across all cohorts and ancestries weighting by sample size, analyzing quantitative and binary variables separately.
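
A sample-size-weighted correlation of the kind described here can be written in a few lines. The sketch below assumes a weighted Pearson correlation in which each cohort and ancestry combination contributes one observation, weighted by its sample size; the exact estimator used in the study is not specified, and the function name and numbers are invented for illustration.

```python
from math import sqrt


def weighted_corr(x, y, weights):
    """Pearson correlation of x and y with one non-negative weight per pair."""
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y)
              for w, a, b in zip(weights, x, y))
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y))
    return cov / sqrt(var_x * var_y)


# Invented main effects, interaction effects, and sample sizes for four groups.
main_effects = [0.10, 0.25, 0.05, 0.30]
interaction_effects = [0.01, 0.04, 0.00, 0.03]
sample_sizes = [30000, 5000, 40000, 2000]
print(weighted_corr(main_effects, interaction_effects, sample_sizes))
```

With equal weights this reduces to the ordinary Pearson correlation, so large cohorts only dominate the estimate when their sample sizes genuinely differ from the rest.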

Quantile regression to measure PGS effect across percentiles of BMI

The effect of PGS_BMI on BMI at different deciles of BMI was assessed using quantile regression. Tau, the parameter that sets which percentile is predicted, was set to 0.10, 0.20, …, 0.90. Models included age, sex, and the top 5 genetic PCs as covariates. Analyses were stratified by ancestry and cohort, and BMI was first log transformed. GERA was excluded from these analyses as a portion of the models failed to run: because BMI values from GERA were already binned, some deciles all had the same BMI value, and differences in effects between bins would be harder to evaluate as BMI within each decile would be more homogeneous.
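
To illustrate what tau controls, the sketch below fits a one-predictor quantile regression by minimizing the check (pinball) loss. It is a toy version under stated assumptions: there is a single predictor and no covariates, the data are invented so that the spread of the outcome grows with the predictor, and the fit is found by brute force over lines through pairs of points (an optimal line can always be found among these), which is only practical for small samples. The function names are hypothetical.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check loss: tau * r when r >= 0, (1 - tau) * |r| when r < 0."""
    return residual * (tau - (residual < 0))


def quantile_line(x, y, tau):
    """Return (intercept, slope) minimizing total pinball loss at quantile tau."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair does not define a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(pinball_loss(yi - (intercept + slope * xi), tau)
                   for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Invented data: three outcome values per predictor value, fanning out with x.
x = [v for v in range(1, 7) for _ in range(3)]
y = [v * m for v in range(1, 7) for m in (0.5, 1.0, 1.5)]

for tau in [t / 10 for t in range(1, 10)]:
    intercept, slope = quantile_line(x, y, tau)
    print(f"tau={tau:.1f}  slope={slope:.2f}")
```

In this toy data the fitted slope rises with tau, which is the pattern one would look for when asking whether a predictor's effect is larger at higher percentiles of the outcome.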

Machine learning models

UKBB EUR and GERA EUR models were restricted to 30,000 random individuals for computational reasons – BMI distributions did not differ from the full-sized datasets (Kolmogorov–Smirnov p-values of 0.29 and 0.57, respectively). PGS_BMI and top 5 genetic principal components were included as features in all models. Two sets of models were evaluated for each cohort and ancestry: including age and sex as features, and including all available covariates in each cohort as features. Interaction terms between PGS and each covariate were included for models using interaction terms. ‘Ever Smoker’ status was used in favor of ‘Never’ versus ‘Current smoking’ status (if present) as individuals with ‘Never’ versus ‘Current’ status are a subset of those with ‘Ever Smoker’ status. UKBB AFR with all covariates was excluded due to small sample size (N = 53).

Neural networks were used as the model of choice, given their inherent ability to model interactions and other nonlinear dependencies. Prior to modeling, all features were scaled to be between 0 and 1. We used average tenfold cross-validation R² to evaluate model performance. Separate models were trained using untransformed BMI and log(BMI). For the linear comparison models, L1-regularized linear regression (LASSO) was used with 18 values of lambda (1.0, 5.0 × 10^–1, 2.0 × 10^–1, 1.0 × 10^–1, 5.0 × 10^–2, 2.0 × 10^–2, …, 1.0 × 10^–5, 5.0 × 10^–6, 2.0 × 10^–6). Neural network models were trained without inclusion of interaction terms (which neural networks can implicitly model) using 1,000 iterations of random search with the following hyperparameter ranges: size of hidden layers [10, 200], learning rate [0.0001, 0.01], type of learning rate [constant, inverse scaling], power t [0.4, 0.6], momentum [0.80, 1.0], batch size [32, 256], and number of hidden layers [1, 2].
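
The search space above can be written down compactly. The sketch below builds a lambda grid and draws random-search configurations from the stated ranges. Several details are assumptions rather than facts from the study: the elided lambda values are taken to follow the same 5-2-1 pattern per decade (which does yield exactly 18 values), the learning rate is sampled log-uniformly, each hidden layer gets its own size, and the dictionary keys are illustrative names rather than any particular library's parameters.

```python
import random

# 1.0, then 5, 2, 1 times 10^-k for k = 1..6, dropping the final 1e-6 so the
# grid ends at 2e-6. The 5-2-1 pattern for the elided values is an assumption.
lambdas = [1.0] + [m * 10.0 ** -k for k in range(1, 7) for m in (5, 2, 1)][:-1]


def sample_config(rng):
    """Draw one neural-network configuration from the stated ranges."""
    n_layers = rng.randint(1, 2)
    return {
        "hidden_sizes": tuple(rng.randint(10, 200) for _ in range(n_layers)),
        "learning_rate": 10 ** rng.uniform(-4, -2),  # log-uniform (assumed)
        "lr_schedule": rng.choice(["constant", "inverse scaling"]),
        "power_t": rng.uniform(0.4, 0.6),
        "momentum": rng.uniform(0.80, 1.0),
        "batch_size": rng.randint(32, 256),
    }


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(1000)]  # 1,000 random-search draws

print(len(lambdas), "lambda values from", lambdas[0], "down to", lambdas[-1])
print(configs[0])
```

Each sampled configuration would then be scored by tenfold cross-validated R², with the best-scoring one retained.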

GxAge PGS_BMI creation and assessment

Analyses were conducted in the European UKBB (N = 376,629), as was done in a study on a similar topic (Mostafavi et al., 2020). Three sets of analyses were performed using GWAS conducted in a 60% random split of individuals using the following models (BMI was rank-normal transformed before GWAS):

(1) $\mathrm{BMI} \sim \mathrm{SNP} + \mathrm{Age} + \mathrm{Sex} + \mathrm{PCs}_{1-5}$
(2) $\mathrm{BMI} \sim \mathrm{SNP} + \mathrm{Age} * \mathrm{SNP} + \mathrm{Age} + \mathrm{Sex} + \mathrm{PCs}_{1-5}$
(3) The model in (1), but stratified into quartiles by age. BMI was rank-normal transformed within each quartile.
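
The data partition behind these three designs can be sketched as follows: a 60% GWAS split, with the remainder divided evenly into training and test sets, plus an age-quartile label for the stratified design. This is an illustrative outline only. The function names, the seed, the ID range, and the handling of ages that fall exactly on a quartile boundary are all assumptions.

```python
import random
from bisect import bisect_left
from statistics import quantiles


def split_60_20_20(ids, seed=0):
    """Shuffle IDs and cut them into GWAS (60%), training (20%), and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_gwas = len(shuffled) * 6 // 10
    n_train = len(shuffled) * 2 // 10
    return (shuffled[:n_gwas],
            shuffled[n_gwas:n_gwas + n_train],
            shuffled[n_gwas + n_train:])


def age_quartiles(ages):
    """Label each age 0 to 3 by quartile (boundary handling is an assumption)."""
    cuts = quantiles(ages, n=4)  # three cut points
    return [bisect_left(cuts, a) for a in ages]


gwas_ids, train_ids, test_ids = split_60_20_20(range(1000))
labels = age_quartiles(list(range(40, 80)))  # invented ages 40 to 79

print(len(gwas_ids), len(train_ids), len(test_ids))
print([labels.count(q) for q in range(4)])
```

For design (3), the GWAS would be run separately within each quartile label, which is why each stratum sees only about a quarter of the 60% split.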

Using each set of GWAS, PGS was first calculated in a 20% randomly selected training set of the dataset using pruning and thresholding using 10 p-value thresholds (0.50, 0.10, …, 5.0 × 10^–5, 1.0 × 10^–5) and remaining settings as default in Plink 1.9. For (2), GxAge PGS_BMI was calculated using SNPs clumped by their main effect p-values from (1), and additionally incorporating the GxAge interaction effects per SNP. In other words, instead of typical PGS construction as

$\mathrm{PGS}_{i} = \beta_{1} k_{1} + \beta_{2} k_{2} + \ldots + \beta_{n} k_{n}$

For an individual i’s PGS calculation, with main SNP effect β, and n SNPs, PGS incorporating GxAge effects (PGS_GxAge) was calculated as

$\mathrm{PGS}_{\mathrm{GxAge},i} = \beta_{1} k_{1} + \beta_{\mathrm{GxAge},1} k_{1} \mathrm{Age}_{i} + \beta_{2} k_{2} + \beta_{\mathrm{GxAge},2} k_{2} \mathrm{Age}_{i} + \ldots + \beta_{n} k_{n} + \beta_{\mathrm{GxAge},n} k_{n} \mathrm{Age}_{i}$

where β_GxAge is the GxAge effect for each SNP n and Age_i is the age for individual i.
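
The two score definitions can be compared directly in code. The sketch below assumes that k denotes an individual's allele count (or dosage) at each SNP, which the text does not define explicitly, and that age enters on whatever scale was used in the GWAS. Function names and the numeric values are invented for illustration.

```python
def pgs_main(dosages, betas):
    """Standard PGS: sum of main SNP effects times allele counts."""
    return sum(b * k for b, k in zip(betas, dosages))


def pgs_gxage(dosages, betas, betas_gxage, age):
    """PGS with SNP-age interactions: each SNP contributes (beta + beta_GxAge * age) * k."""
    return sum((b + bg * age) * k
               for b, bg, k in zip(betas, betas_gxage, dosages))


# Invented example with three SNPs for one individual.
dosages = [0, 1, 2]                 # assumed meaning of k: allele counts
betas = [0.10, -0.20, 0.05]         # main effects
betas_gxage = [0.001, 0.002, -0.001]  # per-SNP GxAge effects

for age in (40, 50, 60):
    print(age, pgs_gxage(dosages, betas, betas_gxage, age))
```

Unlike the main-effect score, the GxAge score for the same genotype changes with age, so two individuals with identical genotypes but different ages receive different scores.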

For each of the three analyses, the parameters and models resulting in the best-performing PGS (highest incremental R², using same main effect covariates as in the three GWAS) from the training set were evaluated in the remaining 20% of the study individuals. For (3), models were first trained within each quartile separately. To maintain sense of scale across quartiles (after rank-normal transformation), R² between all predicted values and true values was calculated together. For R² confidence intervals, the training set was bootstrapped and evaluated on the test set 5,000 times.

URLs

Neale Lab UKBB summary statistics: http://www.nealelab.is/uk-biobank.

Select analysis code and data are available at https://github.com/RitchieLab/BMI_PGS_eLife (copy archived at Ritchie Lab, 2024).

Appendix 1

RGC management and leadership team

Goncalo Abecasis, PhD, Aris Baras, M.D., Michael Cantor, M.D., Giovanni Coppola, M.D., Andrew Deubler, Aris Economides, Ph.D., Luca A. Lotta, M.D., Ph.D., John D. Overton, Ph.D., Jeffrey G. Reid, Ph.D., Katherine Siminovitch, M.D., Alan Shuldiner, M.D.

Sequencing and lab operations

Christina Beechert, Caitlin Forsythe, M.S., Erin D. Fuller, Zhenhua Gu, M.S., Michael Lattari, Alexander Lopez, M.S., John D. Overton, Ph.D., Maria Sotiropoulos Padilla, M.S., Manasi Pradhan, M.S., Kia Manoochehri, B.S., Thomas D. Schleicher, M.S., Louis Widom, Sarah E. Wolf, M.S., Ricardo H. Ulloa, B.S.

Clinical informatics

Amelia Averitt, Ph.D., Nilanjana Banerjee, Ph.D., Michael Cantor, M.D., Dadong Li, Ph.D., Sameer Malhotra, M.D., Deepika Sharma, MHI, Jeffrey Staples, Ph.D.

Genome informatics

Xiaodong Bai, Ph.D., Suganthi Balasubramanian, Ph.D., Suying Bao, Ph.D., Boris Boutkov, Ph.D., Siying Chen, Ph.D., Gisu Eom, B.S., Lukas Habegger, Ph.D., Alicia Hawes, B.S., Shareef Khalid, Olga Krasheninina, M.S., Rouel Lanche, B.S., Adam J. Mansfield, B.A., Evan K. Maxwell, Ph.D., George Mitra, B.A., Mona Nafde, M.S., Sean O’Keeffe, Ph.D., Max Orelus, B.B.A., Razvan Panea, Ph.D., Tommy Polanco, B.A., Ayesha Rasool, M.S., Jeffrey G. Reid, Ph.D., William Salerno, Ph.D., Jeffrey C. Staples, Ph.D., Kathie Sun, Ph.D.

Analytical genomics and data science

Goncalo Abecasis, D.Phil., Joshua Backman, Ph.D., Amy Damask, Ph.D., Lee Dobbyn, Ph.D., Manuel Allen Revez Ferreira, Ph.D., Arkopravo Ghosh, M.S., Christopher Gillies, Ph.D., Lauren Gurski, B.S., Eric Jorgenson, Ph.D., Hyun Min Kang, Ph.D., Michael Kessler, Ph.D., Jack Kosmicki, Ph.D., Alexander Li, Ph.D., Nan Lin, Ph.D., Daren Liu, M.S., Adam Locke, Ph.D., Jonathan Marchini, Ph.D., Anthony Marcketta, M.S., Joelle Mbatchou, Ph.D., Arden Moscati, Ph.D., Charles Paulding, Ph.D., Carlo Sidore, Ph.D., Eli Stahl, Ph.D., Kyoko Watanabe, Ph.D., Bin Ye, Ph.D., Blair Zhang, Ph.D., Andrey Ziyatdinov, Ph.D.

Therapeutic area genetics

Ariane Ayer, B.S., Aysegul Guvenek, Ph.D., George Hindy, Ph.D., Giovanni Coppola, M.D., Jan Freudenberg, M.D., Jonas Bovijn M.D., Katherine Siminovitch, M.D., Kavita Praveen, Ph.D., Luca A. Lotta, M.D., Manav Kapoor, Ph.D., Mary Haas, Ph.D., Moeen Riaz, Ph.D., Niek Verweij, Ph.D., Olukayode Sosina, Ph.D., Parsa Akbari, Ph.D., Priyanka Nakka, Ph.D., Sahar Gelfman, Ph.D., Sujit Gokhale, B.E., Tanima De, Ph.D., Veera Rajagopal, Ph.D., Alan Shuldiner, M.D., Bin Ye, Ph.D., Gannie Tzoneva, Ph.D., Juan Rodriguez-Flores, Ph.D.

Research program management and strategic initiatives

Esteban Chen, M.S., Marcus B. Jones, Ph.D., Michelle G. LeBlanc, Ph.D., Jason Mighty, Ph.D., Lyndon J. Mitnaul, Ph.D., Nirupama Nishtala, Ph.D., Nadia Rana, Ph.D., Jaimee Hernandez

Penn Medicine BioBank Banner Author List and Contribution Statements

PMBB Leadership Team

Daniel J. Rader, M.D., Marylyn D. Ritchie, Ph.D.

Contribution: All authors contributed to securing funding, study design and oversight. All authors reviewed the final version of the manuscript.

Patient Recruitment and Regulatory Oversight

JoEllen Weaver, Nawar Naseer, Ph.D., M.P.H., Giorgio Sirugo, M.D., Ph.D., Afiya Poindexter, Yi-An Ko, Ph.D., Kyle P. Nerz

Contributions: JW manages patient recruitment and regulatory oversight of study. NN manages participant engagement, assists with regulatory oversight, and researcher access. GS assists with researcher access. AP, YK, KPN perform recruitment and enrollment of study participants.

Lab Operations

JoEllen Weaver, Meghan Livingstone, Fred Vadivieso, Stephanie DerOhannessian, Teo Tran, Julia Stephanowski, Salma Santos, Ned Haubein, Ph.D., Joseph Dunn

Contribution: JW, ML, FV, SD conduct oversight of lab operations. ML, FV, AK, SD, TT, JS, SS perform sample processing. NH, JD are responsible for sample tracking and the laboratory information management system.

Clinical Informatics

Anurag Verma, Ph.D., Colleen Morse Kripke, M.S. DPT, MSA, Marjorie Risman, M.S., Renae Judy, B.S., Colin Wollack, M.S.

Contribution: All authors contributed to the development and validation of clinical phenotypes used to identify study subjects and (when applicable) controls.

Genome Informatics

Anurag Verma, Ph.D., Shefali S. Verma, Ph.D., Scott Damrauer, M.D., Yuki Bradford, M.S., Scott Dudek, M.S., Theodore Drivas, M.D., Ph.D.

Contribution: AV, SSV, and SD are responsible for the analysis, design, and infrastructure needed to quality control genotype and exome data. YB performs the analysis. TD and AV provide variant and gene annotations and functional interpretation of variants.

For PMBB, please use: biobank@pennmedicine.upenn.edu

For Regeneron, please use: RGCcollaborations@regeneron.com

Data availability

UK Biobank data was accessed under project #32133. eMERGE data is available at dbGaP in phs001584.v2.p2. GERA data is available at dbGaP in phs000674.v3.p3. PMBB individual level genotype and phenotype data can be accessed through a research collaboration with a Penn investigator, as long as the requestor is from a not-for-profit organization. The PMBB genetic data used in this study were generated in collaboration with Regeneron Genetics Center; as such, we are unable to share the data with for-profit organizations without a three-way research collaboration agreement. Data sharing will require a PMBB project proposal and IRB approval. Collaboration requests can be sent to biobank@upenn.edu. Select analysis code and data are available at https://github.com/RitchieLab/BMI_PGS_eLife (copy archived at Ritchie Lab, 2024).

The following previously published data sets were used

1. Banda Y
2. Kvale MN
3. Hoffmann TJ
4. Hesselson SE
5. Ranatunga D
6. Tang H
(2015) dbGaP
ID phs000674.v3.p3. Resource for Genetic Epidemiology Research on Aging (GERA).

https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000674.v3.p3
(2020) dbGaP
ID phs001584.v2.p2. eMERGE Network Phase III: HRC SNV and 1000 Genomes SV Imputed Array Data of 105,000 Participants.

https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001584.v2.p2

References

1. Afshartous D
2. Preston RA
(2011) Key results of interaction models with centering
Journal of Statistics Education 19:11889620.

https://doi.org/10.1080/10691898.2011.11889620
1. Amin V
2. Dunn P
3. Spector T
(2018) Does education attenuate the genetic risk of obesity? Evidence from U.K. Twins
Economics and Human Biology 31:200–208.

https://doi.org/10.1016/j.ehb.2018.08.011
(2015) Adjusting for heritable covariates can bias effect estimates in genome-wide association studies
American Journal of Human Genetics 96:329–339.

https://doi.org/10.1016/j.ajhg.2014.12.021
1. Banda Y
2. Kvale MN
3. Hoffmann TJ
4. Hesselson SE
5. Ranatunga D
6. Tang H
7. Sabatti C
8. Croen LA
9. Dispensa BP
10. Henderson M
11. Iribarren C
12. Jorgenson E
13. Kushi LH
14. Ludwig D
15. Olberg D
16. Quesenberry CP Jr
17. Rowell S
18. Sadler M
19. Sakoda LC
20. Sciortino S
21. Shen L
22. Smethurst D
23. Somkin CP
24. Van Den Eeden SK
25. Walter L
26. Whitmer RA
27. Kwok P-Y
28. Schaefer C
29. Risch N
(2015) Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort
Genetics 200:1285–1295.

https://doi.org/10.1534/genetics.115.178616
1. Bycroft C
2. Freeman C
3. Petkova D
4. Band G
5. Elliott LT
6. Sharp K
7. Motyer A
8. Vukcevic D
9. Delaneau O
10. O’Connell J
11. Cortes A
12. Welsh S
13. Young A
14. Effingham M
15. McVean G
16. Leslie S
17. Allen N
18. Donnelly P
19. Marchini J
(2018) The UK Biobank resource with deep phenotyping and genomic data
Nature 562:203–209.

https://doi.org/10.1038/s41586-018-0579-z
1. Chang CC
2. Chow CC
3. Tellier LC
4. Vattikuti S
5. Purcell SM
6. Lee JJ
(2015) Second-generation PLINK: rising to the challenge of larger and richer datasets
GigaScience 4:7.

https://doi.org/10.1186/s13742-015-0047-8
1. Choh AC
2. Lee M
3. Kent JW
4. Diego VP
5. Johnson W
6. Curran JE
7. Dyer TD
8. Bellis C
9. Blangero J
10. Siervogel RM
11. Towne B
12. Demerath EW
13. Czerwinski SA
(2014) Gene-by-age effects on BMI from birth to adulthood: the Fels Longitudinal Study
Obesity 22:875–881.

https://doi.org/10.1002/oby.20517
1. Couto Alves A
2. De Silva NMG
3. Karhunen V
4. Sovio U
5. Das S
6. Taal HR
7. Warrington NM
8. Lewin AM
9. Kaakinen M
10. Cousminer DL
11. Thiering E
12. Timpson NJ
13. Bond TA
14. Lowry E
15. Brown CD
16. Estivill X
17. Lindi V
18. Bradfield JP
19. Geller F
20. Speed D
21. Coin LJM
22. Loh M
23. Barton SJ
24. Beilin LJ
25. Bisgaard H
26. Bønnelykke K
27. Alili R
28. Hatoum IJ
29. Schramm K
30. Cartwright R
31. Charles M-A
32. Salerno V
33. Clément K
34. Claringbould AAJ
35. BIOS Consortium
36. van Duijn CM
37. Moltchanova E
38. Eriksson JG
39. Elks C
40. Feenstra B
41. Flexeder C
42. Franks S
43. Frayling TM
44. Freathy RM
45. Elliott P
46. Widén E
47. Hakonarson H
48. Hattersley AT
49. Rodriguez A
50. Banterle M
51. Heinrich J
52. Heude B
53. Holloway JW
54. Hofman A
55. Hyppönen E
56. Inskip H
57. Kaplan LM
58. Hedman AK
59. Läärä E
60. Prokisch H
61. Grallert H
62. Lakka TA
63. Lawlor DA
64. Melbye M
65. Ahluwalia TS
66. Marinelli M
67. Millwood IY
68. Palmer LJ
69. Pennell CE
70. Perry JR
71. Ring SM
72. Savolainen MJ
73. Rivadeneira F
74. Standl M
75. Sunyer J
76. Tiesler CMT
77. Uitterlinden AG
78. Schierding W
79. O’Sullivan JM
80. Prokopenko I
81. Herzig K-H
82. Smith GD
83. O’Reilly P
84. Felix JF
85. Buxton JL
86. Blakemore AIF
87. Ong KK
88. Jaddoe VWV
89. Grant SFA
90. Sebert S
91. McCarthy MI
92. Järvelin M-R
93. Early Growth Genetics Consortium
(2019) GWAS on longitudinal growth traits reveals different genetic factors influencing infant, child, and adult BMI
Science Advances 5:eaaw3095.

https://doi.org/10.1126/sciadv.aaw3095
1. Elks CE
2. den Hoed M
3. Zhao JH
4. Sharp SJ
5. Wareham NJ
6. Loos RJF
7. Ong KK
(2012) Variability in the heritability of body mass index: A systematic review and meta-regression
Frontiers in Endocrinology 3:29.

https://doi.org/10.3389/fendo.2012.00029
1. Frank M
2. Dragano N
3. Arendt M
4. Forstner AJ
5. Nöthen MM
6. Moebus S
7. Erbel R
8. Jöckel K-H
9. Schmidt B
(2019) A genetic sum score of risk alleles associated with body mass index interacts with socioeconomic position in the Heinz Nixdorf Recall Study
PLOS ONE 14:e0221252.

https://doi.org/10.1371/journal.pone.0221252
1. Galinsky KJ
2. Reshef YA
3. Finucane HK
4. Loh P-R
5. Zaitlen N
6. Patterson NJ
7. Brown BC
8. Price AL
(2019) Estimating cross-population genetic correlations of causal effect sizes
Genetic Epidemiology 43:180–188.

https://doi.org/10.1002/gepi.22173
1. Ge T
2. Chen CY
3. Neale BM
4. Sabuncu MR
5. Smoller JW
(2017) Phenome-wide heritability analysis of the UK Biobank
PLOS Genetics 13:e1006711.

https://doi.org/10.1371/journal.pgen.1006711
1. Ge T
2. Chen CY
3. Ni Y
4. Feng YCA
5. Smoller JW
(2019) Polygenic prediction via Bayesian regression and continuous shrinkage priors
Nature Communications 10:1776.

https://doi.org/10.1038/s41467-019-09718-5
1. Helgeland Ø
2. Vaudel M
3. Juliusson PB
4. Lingaas Holmen O
5. Juodakis J
6. Bacelis J
7. Jacobsson B
8. Lindekleiv H
9. Hveem K
10. Lie RT
11. Knudsen GP
12. Stoltenberg C
13. Magnus P
14. Sagen JV
15. Molven A
16. Johansson S
17. Njølstad PR
(2019) Genome-wide association study reveals dynamic role of genetic variation in infant and early childhood growth
Nature Communications 10:4448.

https://doi.org/10.1038/s41467-019-12308-0
1. Justice AE
2. Winkler TW
3. Feitosa MF
4. Graff M
5. Fisher VA
6. Young K
7. Barata L
8. Deng X
9. Czajkowski J
10. Hadley D
11. Ngwa JS
12. Ahluwalia TS
13. Chu AY
14. Heard-Costa NL
15. Lim E
16. Perez J
17. Eicher JD
18. Kutalik Z
19. Xue L
20. Mahajan A
21. Renström F
22. Wu J
23. Qi Q
24. Ahmad S
25. Alfred T
26. Amin N
27. Bielak LF
28. Bonnefond A
29. Bragg J
30. Cadby G
31. Chittani M
32. Coggeshall S
33. Corre T
34. Direk N
35. Eriksson J
36. Fischer K
37. Gorski M
38. Neergaard Harder M
39. Horikoshi M
40. Huang T
41. Huffman JE
42. Jackson AU
43. Justesen JM
44. Kanoni S
45. Kinnunen L
46. Kleber ME
47. Komulainen P
48. Kumari M
49. Lim U
50. Luan J
51. Lyytikäinen LP
52. Mangino M
53. Manichaikul A
54. Marten J
55. Middelberg RPS
56. Müller-Nurasyid M
57. Navarro P
58. Pérusse L
59. Pervjakova N
60. Sarti C
61. Smith AV
62. Smith JA
63. Stančáková A
64. Strawbridge RJ
65. Stringham HM
66. Sung YJ
67. Tanaka T
68. Teumer A
69. Trompet S
70. van der Laan SW
71. van der Most PJ
72. Van Vliet-Ostaptchouk JV
73. Vedantam SL
74. Verweij N
75. Vink JM
76. Vitart V
77. Wu Y
78. Yengo L
79. Zhang W
80. Hua Zhao J
81. Zimmermann ME
82. Zubair N
83. Abecasis GR
84. Adair LS
85. Afaq S
86. Afzal U
87. Bakker SJL
88. Bartz TM
89. Beilby J
90. Bergman RN
91. Bergmann S
92. Biffar R
93. Blangero J
94. Boerwinkle E
95. Bonnycastle LL
96. Bottinger E
97. Braga D
98. Buckley BM
99. Buyske S
100. Campbell H
101. Chambers JC
102. Collins FS
103. Curran JE
104. de Borst GJ
105. de Craen AJM
106. de Geus EJC
107. Dedoussis G
108. Delgado GE
109. den Ruijter HM
110. Eiriksdottir G
111. Eriksson AL
112. Esko T
113. Faul JD
114. Ford I
115. Forrester T
116. Gertow K
117. Gigante B
118. Glorioso N
119. Gong J
120. Grallert H
121. Grammer TB
122. Grarup N
123. Haitjema S
124. Hallmans G
125. Hamsten A
126. Hansen T
127. Harris TB
128. Hartman CA
129. Hassinen M
130. Hastie ND
131. Heath AC
132. Hernandez D
133. Hindorff L
134. Hocking LJ
135. Hollensted M
136. Holmen OL
137. Homuth G
138. Jan Hottenga J
139. Huang J
140. Hung J
141. Hutri-Kähönen N
142. Ingelsson E
143. James AL
144. Jansson JO
145. Jarvelin MR
146. Jhun MA
147. Jørgensen ME
148. Juonala M
149. Kähönen M
150. Karlsson M
151. Koistinen HA
152. Kolcic I
153. Kolovou G
154. Kooperberg C
155. Krämer BK
156. Kuusisto J
157. Kvaløy K
158. Lakka TA
159. Langenberg C
160. Launer LJ
161. Leander K
162. Lee NR
163. Lind L
164. Lindgren CM
165. Linneberg A
166. Lobbens S
167. Loh M
168. Lorentzon M
169. Luben R
170. Lubke G
171. Ludolph-Donislawski A
172. Lupoli S
173. Madden PAF
174. Männikkö R
175. Marques-Vidal P
176. Martin NG
177. McKenzie CA
178. McKnight B
179. Mellström D
180. Menni C
181. Montgomery GW
182. Musk AB
183. Narisu N
184. Nauck M
185. Nolte IM
186. Oldehinkel AJ
187. Olden M
188. Ong KK
189. Padmanabhan S
190. Peyser PA
191. Pisinger C
192. Porteous DJ
193. Raitakari OT
194. Rankinen T
195. Rao DC
196. Rasmussen-Torvik LJ
197. Rawal R
198. Rice T
199. Ridker PM
200. Rose LM
201. Bien SA
202. Rudan I
203. Sanna S
204. Sarzynski MA
205. Sattar N
206. Savonen K
207. Schlessinger D
208. Scholtens S
209. Schurmann C
210. Scott RA
211. Sennblad B
212. Siemelink MA
213. Silbernagel G
214. Slagboom PE
215. Snieder H
216. Staessen JA
217. Stott DJ
218. Swertz MA
219. Swift AJ
220. Taylor KD
221. Tayo BO
222. Thorand B
223. Thuillier D
224. Tuomilehto J
225. Uitterlinden AG
226. Vandenput L
227. Vohl MC
228. Völzke H
229. Vonk JM
230. Waeber G
231. Waldenberger M
232. Westendorp RGJ
233. Wild S
234. Willemsen G
235. Wolffenbuttel BHR
236. Wong A
237. Wright AF
238. Zhao W
239. Zillikens MC
240. Baldassarre D
241. Balkau B
242. Bandinelli S
243. Böger CA
244. Boomsma DI
245. Bouchard C
246. Bruinenberg M
247. Chasman DI
248. Chen YD
249. Chines PS
250. Cooper RS
251. Cucca F
252. Cusi D
253. de Faire U
254. Ferrucci L
255. Franks PW
256. Froguel P
257. Gordon-Larsen P
258. Grabe HJ
259. Gudnason V
260. Haiman CA
261. Hayward C
262. Hveem K
263. Johnson AD
264. Wouter Jukema J
265. Kardia SLR
266. Kivimaki M
267. Kooner JS
268. Kuh D
269. Laakso M
270. Lehtimäki T
271. Marchand LL
272. März W
273. McCarthy MI
274. Metspalu A
275. Morris AP
276. Ohlsson C
277. Palmer LJ
278. Pasterkamp G
279. Pedersen O
280. Peters A
281. Peters U
282. Polasek O
283. Psaty BM
284. Qi L
285. Rauramaa R
286. Smith BH
287. Sørensen TIA
288. Strauch K
289. Tiemeier H
290. Tremoli E
291. van der Harst P
292. Vestergaard H
293. Vollenweider P
294. Wareham NJ
295. Weir DR
296. Whitfield JB
297. Wilson JF
298. Tyrrell J
299. Frayling TM
300. Barroso I
301. Boehnke M
302. Deloukas P
303. Fox CS
304. Hirschhorn JN
305. Hunter DJ
306. Spector TD
307. Strachan DP
308. van Duijn CM
309. Heid IM
310. Mohlke KL
311. Marchini J
312. Loos RJF
313. Kilpeläinen TO
314. Liu CT
315. Borecki IB
316. North KE
317. Cupples LA
(2017) Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits
Nature Communications 8:14977.

https://doi.org/10.1038/ncomms14977
1. Kerin M
2. Marchini J
(2020) Inferring gene-by-environment interactions with a Bayesian whole-genome regression model
American Journal of Human Genetics 107:698–713.

https://doi.org/10.1016/j.ajhg.2020.08.009
1. Kilpeläinen TO
2. Qi L
3. Brage S
4. Sharp SJ
5. Sonestedt E
6. Demerath E
7. Ahmad T
8. Mora S
9. Kaakinen M
10. Sandholt CH
11. Holzapfel C
12. Autenrieth CS
13. Hyppönen E
14. Cauchi S
15. He M
16. Kutalik Z
17. Kumari M
18. Stančáková A
19. Meidtner K
20. Balkau B
21. Tan JT
22. Mangino M
23. Timpson NJ
24. Song Y
25. Zillikens MC
26. Jablonski KA
27. Garcia ME
28. Johansson S
29. Bragg-Gresham JL
30. Wu Y
31. van Vliet-Ostaptchouk JV
32. Onland-Moret NC
33. Zimmermann E
34. Rivera NV
35. Tanaka T
36. Stringham HM
37. Silbernagel G
38. Kanoni S
39. Feitosa MF
40. Snitker S
41. Ruiz JR
42. Metter J
43. Larrad MTM
44. Atalay M
45. Hakanen M
46. Amin N
47. Cavalcanti-Proença C
48. Grøntved A
49. Hallmans G
50. Jansson J-O
51. Kuusisto J
52. Kähönen M
53. Lutsey PL
54. Nolan JJ
55. Palla L
56. Pedersen O
57. Pérusse L
58. Renström F
59. Scott RA
60. Shungin D
61. Sovio U
62. Tammelin TH
63. Rönnemaa T
64. Lakka TA
65. Uusitupa M
66. Rios MS
67. Ferrucci L
68. Bouchard C
69. Meirhaeghe A
70. Fu M
71. Walker M
72. Borecki IB
73. Dedoussis GV
74. Fritsche A
75. Ohlsson C
76. Boehnke M
77. Bandinelli S
78. van Duijn CM
79. Ebrahim S
80. Lawlor DA
81. Gudnason V
82. Harris TB
83. Sørensen TIA
84. Mohlke KL
85. Hofman A
86. Uitterlinden AG
87. Tuomilehto J
88. Lehtimäki T
89. Raitakari O
90. Isomaa B
91. Njølstad PR
92. Florez JC
93. Liu S
94. Ness A
95. Spector TD
96. Tai ES
97. Froguel P
98. Boeing H
99. Laakso M
100. Marmot M
101. Bergmann S
102. Power C
103. Khaw K-T
104. Chasman D
105. Ridker P
106. Hansen T
107. Monda KL
108. Illig T
109. Järvelin M-R
110. Wareham NJ
111. Hu FB
112. Groop LC
113. Orho-Melander M
114. Ekelund U
115. Franks PW
116. Loos RJF
(2011) Physical activity attenuates the influence of FTO variants on obesity risk: a meta-analysis of 218,166 adults and 19,268 children
PLOS Medicine 8:e1001116.

https://doi.org/10.1371/journal.pmed.1001116
1. Li Y
2. Cai T
3. Wang H
4. Guo G
(2021) Achieved educational attainment, inherited genetic endowment for education, and obesity
Biodemography and Social Biology 66:132–144.

https://doi.org/10.1080/19485565.2020.1869919
1. Locke AE
2. Kahali B
3. Berndt SI
4. Justice AE
5. Pers TH
6. Day FR
7. Powell C
8. Vedantam S
9. Buchkovich ML
10. Yang J
11. Croteau-Chonka DC
12. Esko T
13. Fall T
14. Ferreira T
15. Gustafsson S
16. Kutalik Z
17. Luan J
18. Mägi R
19. Randall JC
20. Winkler TW
21. Wood AR
22. Workalemahu T
23. Faul JD
24. Smith JA
25. Zhao JH
26. Zhao W
27. Chen J
28. Fehrmann R
29. Hedman ÅK
30. Karjalainen J
31. Schmidt EM
32. Absher D
33. Amin N
34. Anderson D
35. Beekman M
36. Bolton JL
37. Bragg-Gresham JL
38. Buyske S
39. Demirkan A
40. Deng G
41. Ehret GB
42. Feenstra B
43. Feitosa MF
44. Fischer K
45. Goel A
46. Gong J
47. Jackson AU
48. Kanoni S
49. Kleber ME
50. Kristiansson K
51. Lim U
52. Lotay V
53. Mangino M
54. Leach IM
55. Medina-Gomez C
56. Medland SE
57. Nalls MA
58. Palmer CD
59. Pasko D
60. Pechlivanis S
61. Peters MJ
62. Prokopenko I
63. Shungin D
64. Stančáková A
65. Strawbridge RJ
66. Sung YJ
67. Tanaka T
68. Teumer A
69. Trompet S
70. van der Laan SW
71. van Setten J
72. Van Vliet-Ostaptchouk JV
73. Wang Z
74. Yengo L
75. Zhang W
76. Isaacs A
77. Albrecht E
78. Ärnlöv J
79. Arscott GM
80. Attwood AP
81. Bandinelli S
82. Barrett A
83. Bas IN
84. Bellis C
85. Bennett AJ
86. Berne C
87. Blagieva R
88. Blüher M
89. Böhringer S
90. Bonnycastle LL
91. Böttcher Y
92. Boyd HA
93. Bruinenberg M
94. Caspersen IH
95. Chen Y-DI
96. Clarke R
97. Daw EW
98. de Craen AJM
99. Delgado G
100. Dimitriou M
101. Doney ASF
102. Eklund N
103. Estrada K
104. Eury E
105. Folkersen L
106. Fraser RM
107. Garcia ME
108. Geller F
109. Giedraitis V
110. Gigante B
111. Go AS
112. Golay A
113. Goodall AH
114. Gordon SD
115. Gorski M
116. Grabe H-J
117. Grallert H
118. Grammer TB
119. Gräßler J
120. Grönberg H
121. Groves CJ
122. Gusto G
123. Haessler J
124. Hall P
125. Haller T
126. Hallmans G
127. Hartman CA
128. Hassinen M
129. Hayward C
130. Heard-Costa NL
131. Helmer Q
132. Hengstenberg C
133. Holmen O
134. Hottenga J-J
135. James AL
136. Jeff JM
137. Johansson Å
138. Jolley J
139. Juliusdottir T
140. Kinnunen L
141. Koenig W
142. Koskenvuo M
143. Kratzer W
144. Laitinen J
145. Lamina C
146. Leander K
147. Lee NR
148. Lichtner P
149. Lind L
150. Lindström J
151. Lo KS
152. Lobbens S
153. Lorbeer R
154. Lu Y
155. Mach F
156. Magnusson PKE
157. Mahajan A
158. McArdle WL
159. McLachlan S
160. Menni C
161. Merger S
162. Mihailov E
163. Milani L
164. Moayyeri A
165. Monda KL
166. Morken MA
167. Mulas A
168. Müller G
169. Müller-Nurasyid M
170. Musk AW
171. Nagaraja R
172. Nöthen MM
173. Nolte IM
174. Pilz S
175. Rayner NW
176. Renstrom F
177. Rettig R
178. Ried JS
179. Ripke S
180. Robertson NR
181. Rose LM
182. Sanna S
183. Scharnagl H
184. Scholtens S
185. Schumacher FR
186. Scott WR
187. Seufferlein T
188. Shi J
189. Smith AV
190. Smolonska J
191. Stanton AV
192. Steinthorsdottir V
193. Stirrups K
194. Stringham HM
195. Sundström J
196. Swertz MA
197. Swift AJ
198. Syvänen A-C
199. Tan S-T
200. Tayo BO
201. Thorand B
202. Thorleifsson G
203. Tyrer JP
204. Uh H-W
205. Vandenput L
206. Verhulst FC
207. Vermeulen SH
208. Verweij N
209. Vonk JM
210. Waite LL
211. Warren HR
212. Waterworth D
213. Weedon MN
214. Wilkens LR
215. Willenborg C
216. Wilsgaard T
217. Wojczynski MK
218. Wong A
219. Wright AF
220. Zhang Q
221. LifeLines Cohort Study
222. Brennan EP
223. Choi M
224. Dastani Z
225. Drong AW
226. Eriksson P
227. Franco-Cereceda A
228. Gådin JR
229. Gharavi AG
230. Goddard ME
231. Handsaker RE
232. Huang J
233. Karpe F
234. Kathiresan S
235. Keildson S
236. Kiryluk K
237. Kubo M
238. Lee J-Y
239. Liang L
240. Lifton RP
241. Ma B
242. McCarroll SA
243. McKnight AJ
244. Min JL
245. Moffatt MF
246. Montgomery GW
247. Murabito JM
248. Nicholson G
249. Nyholt DR
250. Okada Y
251. Perry JRB
252. Dorajoo R
253. Reinmaa E
254. Salem RM
255. Sandholm N
256. Scott RA
257. Stolk L
258. Takahashi A
259. Tanaka T
260. van ’t Hooft FM
261. Vinkhuyzen AAE
262. Westra H-J
263. Zheng W
264. Zondervan KT
265. ADIPOGen Consortium
266. AGEN-BMI Working Group
267. CARDIOGRAMplusC4D Consortium
268. CKDGen Consortium
269. GLGC
270. ICBP
271. MAGIC Investigators
272. MuTHER Consortium
273. MIGen Consortium
274. PAGE Consortium
275. ReproGen Consortium
276. GENIE Consortium
277. International Endogene Consortium
278. Heath AC
279. Arveiler D
280. Bakker SJL
281. Beilby J
282. Bergman RN
283. Blangero J
284. Bovet P
285. Campbell H
286. Caulfield MJ
287. Cesana G
288. Chakravarti A
289. Chasman DI
290. Chines PS
291. Collins FS
292. Crawford DC
293. Cupples LA
294. Cusi D
295. Danesh J
296. de Faire U
297. den Ruijter HM
298. Dominiczak AF
299. Erbel R
300. Erdmann J
301. Eriksson JG
302. Farrall M
303. Felix SB
304. Ferrannini E
305. Ferrières J
306. Ford I
307. Forouhi NG
308. Forrester T
309. Franco OH
310. Gansevoort RT
311. Gejman PV
312. Gieger C
313. Gottesman O
314. Gudnason V
315. Gyllensten U
316. Hall AS
317. Harris TB
318. Hattersley AT
319. Hicks AA
320. Hindorff LA
321. Hingorani AD
322. Hofman A
323. Homuth G
324. Hovingh GK
325. Humphries SE
326. Hunt SC
327. Hyppönen E
328. Illig T
329. Jacobs KB
330. Jarvelin M-R
331. Jöckel K-H
332. Johansen B
333. Jousilahti P
334. Jukema JW
335. Jula AM
336. Kaprio J
337. Kastelein JJP
338. Keinanen-Kiukaanniemi SM
339. Kiemeney LA
340. Knekt P
341. Kooner JS
342. Kooperberg C
343. Kovacs P
344. Kraja AT
345. Kumari M
346. Kuusisto J
347. Lakka TA
348. Langenberg C
349. Marchand LL
350. Lehtimäki T
351. Lyssenko V
352. Männistö S
353. Marette A
354. Matise TC
355. McKenzie CA
356. McKnight B
357. Moll FL
358. Morris AD
359. Morris AP
360. Murray JC
361. Nelis M
362. Ohlsson C
363. Oldehinkel AJ
364. Ong KK
365. Madden PAF
366. Pasterkamp G
367. Peden JF
368. Peters A
369. Postma DS
370. Pramstaller PP
371. Price JF
372. Qi L
373. Raitakari OT
374. Rankinen T
375. Rao DC
376. Rice TK
377. Ridker PM
378. Rioux JD
379. Ritchie MD
380. Rudan I
381. Salomaa V
382. Samani NJ
383. Saramies J
384. Sarzynski MA
385. Schunkert H
386. Schwarz PEH
387. Sever P
388. Shuldiner AR
389. Sinisalo J
390. Stolk RP
391. Strauch K
392. Tönjes A
393. Trégouët D-A
394. Tremblay A
395. Tremoli E
396. Virtamo J
397. Vohl M-C
398. Völker U
399. Waeber G
400. Willemsen G
401. Witteman JC
402. Zillikens MC
403. Adair LS
404. Amouyel P
405. Asselbergs FW
406. Assimes TL
407. Bochud M
408. Boehm BO
409. Boerwinkle E
410. Bornstein SR
411. Bottinger EP
412. Bouchard C
413. Cauchi S
414. Chambers JC
415. Chanock SJ
416. Cooper RS
417. de Bakker PIW
418. Dedoussis G
419. Ferrucci L
420. Franks PW
421. Froguel P
422. Groop LC
423. Haiman CA
424. Hamsten A
425. Hui J
426. Hunter DJ
427. Hveem K
428. Kaplan RC
429. Kivimaki M
430. Kuh D
431. Laakso M
432. Liu Y
433. Martin NG
434. März W
435. Melbye M
436. Metspalu A
437. Moebus S
438. Munroe PB
439. Njølstad I
440. Oostra BA
441. Palmer CNA
442. Pedersen NL
443. Perola M
444. Pérusse L
445. Peters U
446. Power C
447. Quertermous T
448. Rauramaa R
449. Rivadeneira F
450. Saaristo TE
451. Saleheen D
452. Sattar N
453. Schadt EE
454. Schlessinger D
455. Slagboom PE
456. Snieder H
457. Spector TD
458. Thorsteinsdottir U
459. Stumvoll M
460. Tuomilehto J
461. Uitterlinden AG
462. Uusitupa M
463. van der Harst P
464. Walker M
465. Wallaschofski H
466. Wareham NJ
467. Watkins H
468. Weir DR
469. Wichmann H-E
470. Wilson JF
471. Zanen P
472. Borecki IB
473. Deloukas P
474. Fox CS
475. Heid IM
476. O’Connell JR
477. Strachan DP
478. Stefansson K
479. van Duijn CM
480. Abecasis GR
481. Franke L
482. Frayling TM
483. McCarthy MI
484. Visscher PM
485. Scherag A
486. Willer CJ
487. Boehnke M
488. Mohlke KL
489. Lindgren CM
490. Beckmann JS
491. Barroso I
492. North KE
493. Ingelsson E
494. Hirschhorn JN
495. Loos RJF
496. Speliotes EK
(2015) Genetic studies of body mass index yield new insights for obesity biology
Nature 518:197–206.

https://doi.org/10.1038/nature14177
1. Martin AR
2. Kanai M
3. Kamatani Y
4. Okada Y
5. Neale BM
6. Daly MJ
(2019) Clinical use of current polygenic risk scores may exacerbate health disparities
Nature Genetics 51:584–591.

https://doi.org/10.1038/s41588-019-0379-x
1. Min J
2. Chiu DT
3. Wang Y
(2013) Variation in the heritability of body mass index based on diverse twin studies: a systematic review
Obesity Reviews 14:871–882.

https://doi.org/10.1111/obr.12065
(2020) Variable prediction accuracy of polygenic scores within an ancestry group
eLife 9:e48376.

https://doi.org/10.7554/eLife.48376
1. Ng MCY
2. Graff M
3. Lu Y
4. Justice AE
5. Mudgal P
6. Liu C-T
7. Young K
8. Yanek LR
9. Feitosa MF
10. Wojczynski MK
11. Rand K
12. Brody JA
13. Cade BE
14. Dimitrov L
15. Duan Q
16. Guo X
17. Lange LA
18. Nalls MA
19. Okut H
20. Tajuddin SM
21. Tayo BO
22. Vedantam S
23. Bradfield JP
24. Chen G
25. Chen W-M
26. Chesi A
27. Irvin MR
28. Padhukasahasram B
29. Smith JA
30. Zheng W
31. Allison MA
32. Ambrosone CB
33. Bandera EV
34. Bartz TM
35. Berndt SI
36. Bernstein L
37. Blot WJ
38. Bottinger EP
39. Carpten J
40. Chanock SJ
41. Chen Y-DI
42. Conti DV
43. Cooper RS
44. Fornage M
45. Freedman BI
46. Garcia M
47. Goodman PJ
48. Hsu Y-HH
49. Hu J
50. Huff CD
51. Ingles SA
52. John EM
53. Kittles R
54. Klein E
55. Li J
56. McKnight B
57. Nayak U
58. Nemesure B
59. Ogunniyi A
60. Olshan A
61. Press MF
62. Rohde R
63. Rybicki BA
64. Salako B
65. Sanderson M
66. Shao Y
67. Siscovick DS
68. Stanford JL
69. Stevens VL
70. Stram A
71. Strom SS
72. Vaidya D
73. Witte JS
74. Yao J
75. Zhu X
76. Ziegler RG
77. Zonderman AB
78. Adeyemo A
79. Ambs S
80. Cushman M
81. Faul JD
82. Hakonarson H
83. Levin AM
84. Nathanson KL
85. Ware EB
86. Weir DR
87. Zhao W
88. Zhi D
89. Arnett DK
90. Grant SFA
91. Kardia SLR
92. Olopade OI
93. Rao DC
94. Rotimi CN
95. Sale MM
96. Williams LK
97. Zemel BS
98. Becker DM
99. Borecki IB
100. Evans MK
101. Harris TB
102. Hirschhorn JN
103. Li Y
104. Patel SR
105. Psaty BM
106. Rotter JI
107. Wilson JG
108. Bowden DW
109. Cupples LA
110. Haiman CA
111. Loos RJF
112. North KE
113. Bone Mineral Density in Childhood Study Group
(2017) Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium
PLOS Genetics 13:e1006719.

https://doi.org/10.1371/journal.pgen.1006719
Website
1. Penn Medicine BioBank
(2022) Penn Medicine BioBank
Internet. Accessed February 1, 2022.

https://pmbb.med.upenn.edu/
1. Rampersaud E
2. Mitchell BD
3. Pollin TI
4. Fu M
5. Shen H
6. O’Connell JR
7. Ducharme JL
8. Hines S
9. Sack P
10. Naglieri R
11. Shuldiner AR
12. Snitker S
(2008) Physical activity and the association of common FTO gene variants with body mass index and obesity
Archives of Internal Medicine 168:1791–1797.

https://doi.org/10.1001/archinte.168.16.1791
(2017) Gene-environment interaction study for BMI reveals interactions between genetic factors and physical activity, alcohol consumption and socioeconomic status
PLOS Genetics 13:e1006977.

https://doi.org/10.1371/journal.pgen.1006977
Software
1. Ritchie Lab
(2024) BMI_PGS_eLife, version swh:1:rev:58046f2f88349bfea043f97b0b2aac29b6f83004
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:f7781372708602f3b7e46a2b74d10bbf56ad6790;origin=https://github.com/RitchieLab/BMI_PGS_eLife;visit=swh:1:snp:a6a9d047dd0bb085e89baef8f526c09ee17d08c0;anchor=swh:1:rev:58046f2f88349bfea043f97b0b2aac29b6f83004
1. Robinson PN
2. Arteaga-Solis E
3. Baldock C
4. Collod-Béroud G
5. Booms P
6. De Paepe A
7. Dietz HC
8. Guo G
9. Handford PA
10. Judge DP
11. Kielty CM
12. Loeys B
13. Milewicz DM
14. Ney A
15. Ramirez F
16. Reinhardt DP
17. Tiedemann K
18. Whiteman P
19. Godfrey M
(2006) The molecular genetics of Marfan syndrome and related disorders
Journal of Medical Genetics 43:769–787.

https://doi.org/10.1136/jmg.2005.039669
1. Robinson MR
2. English G
3. Moser G
4. Lloyd-Jones LR
5. Triplett MA
6. Zhu Z
7. Nolte IM
8. van Vliet-Ostaptchouk JV
9. Snieder H
10. LifeLines Cohort Study
11. Esko T
12. Milani L
13. Mägi R
14. Metspalu A
15. Magnusson PKE
16. Pedersen NL
17. Ingelsson E
18. Johannesson M
19. Yang J
20. Cesarini D
21. Visscher PM
(2017) Genotype-covariate interaction effects and the heritability of adult body mass index
Nature Genetics 49:1174–1181.

https://doi.org/10.1038/ng.3912
1. Ruan Y
2. Lin Y-F
3. Feng Y-CA
4. Chen C-Y
5. Lam M
6. Guo Z
7. Stanley Global Asia Initiatives
8. He L
9. Sawa A
10. Martin AR
11. Qin S
12. Huang H
13. Ge T
(2022) Improving polygenic prediction in ancestrally diverse populations
Nature Genetics 54:573–580.

https://doi.org/10.1038/s41588-022-01054-7
1. Sakaue S
2. Kanai M
3. Tanigawa Y
4. Karjalainen J
5. Kurki M
6. Koshiba S
7. Narita A
8. Konuma T
9. Yamamoto K
10. Akiyama M
11. Ishigaki K
12. Suzuki A
13. Suzuki K
14. Obara W
15. Yamaji K
16. Takahashi K
17. Asai S
18. Takahashi Y
19. Suzuki T
20. Shinozaki N
21. Yamaguchi H
22. Minami S
23. Murayama S
24. Yoshimori K
25. Nagayama S
26. Obata D
27. Higashiyama M
28. Masumoto A
29. Koretsune Y
30. Ito K
31. Terao C
32. Yamauchi T
33. Komuro I
34. Kadowaki T
35. Tamiya G
36. Yamamoto M
37. Nakamura Y
38. Kubo M
39. Murakami Y
40. Yamamoto K
41. Kamatani Y
42. Palotie A
43. Rivas MA
44. Daly MJ
45. Matsuda K
46. Okada Y
47. FinnGen
(2021) A cross-population atlas of genetic associations for 220 human phenotypes
Nature Genetics 53:1415–1424.

https://doi.org/10.1038/s41588-021-00931-x
1. Shi H
2. Gazal S
3. Kanai M
4. Koch EM
5. Schoech AP
6. Siewert KM
7. Kim SS
8. Luo Y
9. Amariuta T
10. Huang H
11. Okada Y
12. Raychaudhuri S
13. Sunyaev SR
14. Price AL
(2021) Population-specific causal disease effect sizes in functionally important regions impacted by selection
Nature Communications 12:1098.

https://doi.org/10.1038/s41467-021-21286-1
- PubMed
- Google Scholar
1. Stanaway IB
2. Hall TO
3. Rosenthal EA
4. Palmer M
5. Naranbhai V
6. Knevel R
7. Namjou-Khales B
8. Carroll RJ
9. Kiryluk K
10. Gordon AS
11. Linder J
12. Howell KM
13. Mapes BM
14. Lin FTJ
15. Joo YY
16. Hayes MG
17. Gharavi AG
18. Pendergrass SA
19. Ritchie MD
20. de Andrade M
21. Croteau-Chonka DC
22. Raychaudhuri S
23. Weiss ST
24. Lebo M
25. Amr SS
26. Carrell D
27. Larson EB
28. Chute CG
29. Rasmussen-Torvik LJ
30. Roy-Puckelwartz MJ
31. Sleiman P
32. Hakonarson H
33. Li R
34. Karlson EW
35. Peterson JF
36. Kullo IJ
37. Chisholm R
38. Denny JC
39. Jarvik GP
40. Crosslin DR
41. eMERGE Network
(2019) The eMERGE genotype set of 83,717 subjects imputed to ~40 million variants genome wide and association with the herpes zoster medical record phenotype
Genetic Epidemiology 43:63–81.

https://doi.org/10.1002/gepi.22167
1. Sulc J
2. Mounier N
3. Günther F
4. Winkler T
5. Wood AR
6. Frayling TM
7. Heid IM
8. Robinson MR
9. Kutalik Z
(2020) Quantification of the overall contribution of gene-environment interaction for obesity-related traits
Nature Communications 11:1385.

https://doi.org/10.1038/s41467-020-15107-0
1. Tyrrell J
2. Wood AR
3. Ames RM
4. Yaghootkar H
5. Beaumont RN
6. Jones SE
7. Tuke MA
8. Ruth KS
9. Freathy RM
10. Davey Smith G
11. Joost S
12. Guessous I
13. Murray A
14. Strachan DP
15. Kutalik Z
16. Weedon MN
17. Frayling TM
(2017) Gene-obesogenic environment interactions in the UK Biobank study
International Journal of Epidemiology 46:559–575.

https://doi.org/10.1093/ije/dyw337
(2013) Environmental confounding in gene-environment interaction studies
American Journal of Epidemiology 178:144–152.

https://doi.org/10.1093/aje/kws439
1. Verma A
2. Damrauer SM
3. Naseer N
4. Weaver J
5. Kripke CM
6. Guare L
7. Sirugo G
8. Kember RL
9. Drivas TG
10. Dudek SM
11. Bradford Y
12. Lucas A
13. Judy R
14. Verma SS
15. Meagher E
16. Nathanson KL
17. Feldman M
18. Ritchie MD
19. Rader DJ
(2022) The Penn Medicine BioBank: towards a genomics-enabled learning healthcare system to accelerate precision medicine in a diverse population
Journal of Personalized Medicine 12:1974.

https://doi.org/10.3390/jpm12121974
1. Vogelezang S
2. Bradfield JP
3. Ahluwalia TS
4. Curtin JA
5. Lakka TA
6. Grarup N
7. Scholz M
8. van der Most PJ
9. Monnereau C
10. Stergiakouli E
11. Heiskala A
12. Horikoshi M
13. Fedko IO
14. Vilor-Tejedor N
15. Cousminer DL
16. Standl M
17. Wang CA
18. Viikari J
19. Geller F
20. Íñiguez C
21. Pitkänen N
22. Chesi A
23. Bacelis J
24. Yengo L
25. Torrent M
26. Ntalla I
27. Helgeland Ø
28. Selzam S
29. Vonk JM
30. Zafarmand MH
31. Heude B
32. Farooqi IS
33. Alyass A
34. Beaumont RN
35. Have CT
36. Rzehak P
37. Bilbao JR
38. Schnurr TM
39. Barroso I
40. Bønnelykke K
41. Beilin LJ
42. Carstensen L
43. Charles MA
44. Chawes B
45. Clément K
46. Closa-Monasterolo R
47. Custovic A
48. Eriksson JG
49. Escribano J
50. Groen-Blokhuis M
51. Grote V
52. Gruszfeld D
53. Hakonarson H
54. Hansen T
55. Hattersley AT
56. Hollensted M
57. Hottenga JJ
58. Hyppönen E
59. Johansson S
60. Joro R
61. Kähönen M
62. Karhunen V
63. Kiess W
64. Knight BA
65. Koletzko B
66. Kühnapfel A
67. Landgraf K
68. Langhendries JP
69. Lehtimäki T
70. Leinonen JT
71. Li A
72. Lindi V
73. Lowry E
74. Bustamante M
75. Medina-Gomez C
76. Melbye M
77. Michaelsen KF
78. Morgen CS
79. Mori TA
80. Nielsen TRH
81. Niinikoski H
82. Oldehinkel AJ
83. Pahkala K
84. Panoutsopoulou K
85. Pedersen O
86. Pennell CE
87. Power C
88. Reijneveld SA
89. Rivadeneira F
90. Simpson A
91. Sly PD
92. Stokholm J
93. Teo KK
94. Thiering E
95. Timpson NJ
96. Uitterlinden AG
97. van Beijsterveldt CEM
98. van Schaik BDC
99. Vaudel M
100. Verduci E
101. Vinding RK
102. Vogel M
103. Zeggini E
104. Sebert S
105. Lind MV
106. Brown CD
107. Santa-Marina L
108. Reischl E
109. Frithioff-Bøjsøe C
110. Meyre D
111. Wheeler E
112. Ong K
113. Nohr EA
114. Vrijkotte TGM
115. Koppelman GH
116. Plomin R
117. Njølstad PR
118. Dedoussis GD
119. Froguel P
120. Sørensen TIA
121. Jacobsson B
122. Freathy RM
123. Zemel BS
124. Raitakari O
125. Vrijheid M
126. Feenstra B
127. Lyytikäinen LP
128. Snieder H
129. Kirsten H
130. Holt PG
131. Heinrich J
132. Widén E
133. Sunyer J
134. Boomsma DI
135. Järvelin MR
136. Körner A
137. Davey Smith G
138. Holm JC
139. Atalay M
140. Murray C
141. Bisgaard H
142. McCarthy MI
143. Early Growth Genetics Consortium
144. Jaddoe VWV
145. Grant SFA
146. Felix JF
(2020) Novel loci for childhood body mass index and shared heritability with adult cardiometabolic traits
PLOS Genetics 16:e1008718.

https://doi.org/10.1371/journal.pgen.1008718
1. Wang Y
2. Guo J
3. Ni G
4. Yang J
5. Visscher PM
6. Yengo L
(2020) Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations
Nature Communications 11:3865.

https://doi.org/10.1038/s41467-020-17719-y
1. Winkler TW
2. Justice AE
3. Graff M
4. Barata L
5. Feitosa MF
6. Chu S
7. Czajkowski J
8. Esko T
9. Fall T
10. Kilpeläinen TO
11. Lu Y
12. Mägi R
13. Mihailov E
14. Pers TH
15. Rüeger S
16. Teumer A
17. Ehret GB
18. Ferreira T
19. Heard-Costa NL
20. Karjalainen J
21. Lagou V
22. Mahajan A
23. Neinast MD
24. Prokopenko I
25. Simino J
26. Teslovich TM
27. Jansen R
28. Westra HJ
29. White CC
30. Absher D
31. Ahluwalia TS
32. Ahmad S
33. Albrecht E
34. Alves AC
35. Bragg-Gresham JL
36. de Craen AJM
37. Bis JC
38. Bonnefond A
39. Boucher G
40. Cadby G
41. Cheng YC
42. Chiang CWK
43. Delgado G
44. Demirkan A
45. Dueker N
46. Eklund N
47. Eiriksdottir G
48. Eriksson J
49. Feenstra B
50. Fischer K
51. Frau F
52. Galesloot TE
53. Geller F
54. Goel A
55. Gorski M
56. Grammer TB
57. Gustafsson S
58. Haitjema S
59. Hottenga JJ
60. Huffman JE
61. Jackson AU
62. Jacobs KB
63. Johansson Å
64. Kaakinen M
65. Kleber ME
66. Lahti J
67. Mateo Leach I
68. Lehne B
69. Liu Y
70. Lo KS
71. Lorentzon M
72. Luan J
73. Madden PAF
74. Mangino M
75. McKnight B
76. Medina-Gomez C
77. Monda KL
78. Montasser ME
79. Müller G
80. Müller-Nurasyid M
81. Nolte IM
82. Panoutsopoulou K
83. Pascoe L
84. Paternoster L
85. Rayner NW
86. Renström F
87. Rizzi F
88. Rose LM
89. Ryan KA
90. Salo P
91. Sanna S
92. Scharnagl H
93. Shi J
94. Smith AV
95. Southam L
96. Stančáková A
97. Steinthorsdottir V
98. Strawbridge RJ
99. Sung YJ
100. Tachmazidou I
101. Tanaka T
102. Thorleifsson G
103. Trompet S
104. Pervjakova N
105. Tyrer JP
106. Vandenput L
107. van der Laan SW
108. van der Velde N
109. van Setten J
110. van Vliet-Ostaptchouk JV
111. Verweij N
112. Vlachopoulou E
113. Waite LL
114. Wang SR
115. Wang Z
116. Wild SH
117. Willenborg C
118. Wilson JF
119. Wong A
120. Yang J
121. Yengo L
122. Yerges-Armstrong LM
123. Yu L
124. Zhang W
125. Zhao JH
126. Andersson EA
127. Bakker SJL
128. Baldassarre D
129. Banasik K
130. Barcella M
131. Barlassina C
132. Bellis C
133. Benaglio P
134. Blangero J
135. Blüher M
136. Bonnet F
137. Bonnycastle LL
138. Boyd HA
139. Bruinenberg M
140. Buchman AS
141. Campbell H
142. Chen YDI
143. Chines PS
144. Claudi-Boehm S
145. Cole J
146. Collins FS
147. de Geus EJC
148. de Groot L
149. Dimitriou M
150. Duan J
151. Enroth S
152. Eury E
153. Farmaki AE
154. Forouhi NG
155. Friedrich N
156. Gejman PV
157. Gigante B
158. Glorioso N
159. Go AS
160. Gottesman O
161. Gräßler J
162. Grallert H
163. Grarup N
164. Gu YM
165. Broer L
166. Ham AC
167. Hansen T
168. Harris TB
169. Hartman CA
170. Hassinen M
171. Hastie N
172. Hattersley AT
173. Heath AC
174. Henders AK
175. Hernandez D
176. Hillege H
177. Holmen O
178. Hovingh KG
179. Hui J
180. Husemoen LL
181. Hutri-Kähönen N
182. Hysi PG
183. Illig T
184. De Jager PL
185. Jalilzadeh S
186. Jørgensen T
187. Jukema JW
188. Juonala M
189. Kanoni S
190. Karaleftheri M
191. Khaw KT
192. Kinnunen L
193. Kittner SJ
194. Koenig W
195. Kolcic I
196. Kovacs P
197. Krarup NT
198. Kratzer W
199. Krüger J
200. Kuh D
201. Kumari M
202. Kyriakou T
203. Langenberg C
204. Lannfelt L
205. Lanzani C
206. Lotay V
207. Launer LJ
208. Leander K
209. Lindström J
210. Linneberg A
211. Liu YP
212. Lobbens S
213. Luben R
214. Lyssenko V
215. Männistö S
216. Magnusson PK
217. McArdle WL
218. Menni C
219. Merger S
220. Milani L
221. Montgomery GW
222. Morris AP
223. Narisu N
224. Nelis M
225. Ong KK
226. Palotie A
227. Pérusse L
228. Pichler I
229. Pilia MG
230. Pouta A
231. Rheinberger M
232. Ribel-Madsen R
233. Richards M
234. Rice KM
235. Rice TK
236. Rivolta C
237. Salomaa V
238. Sanders AR
239. Sarzynski MA
240. Scholtens S
241. Scott RA
242. Scott WR
243. Sebert S
244. Sengupta S
245. Sennblad B
246. Seufferlein T
247. Silveira A
248. Slagboom PE
249. Smit JH
250. Sparsø TH
251. Stirrups K
252. Stolk RP
253. Stringham HM
254. Swertz MA
255. Swift AJ
256. Syvänen AC
257. Tan ST
258. Thorand B
259. Tönjes A
260. Tremblay A
261. Tsafantakis E
262. van der Most PJ
263. Völker U
264. Vohl MC
265. Vonk JM
266. Waldenberger M
267. Walker RW
268. Wennauer R
269. Widén E
270. Willemsen G
271. Wilsgaard T
272. Wright AF
273. Zillikens MC
274. van Dijk SC
275. van Schoor NM
276. Asselbergs FW
277. de Bakker PIW
278. Beckmann JS
279. Beilby J
280. Bennett DA
281. Bergman RN
282. Bergmann S
283. Böger CA
284. Boehm BO
285. Boerwinkle E
286. Boomsma DI
287. Bornstein SR
288. Bottinger EP
289. Bouchard C
290. Chambers JC
291. Chanock SJ
292. Chasman DI
293. Cucca F
294. Cusi D
295. Dedoussis G
296. Erdmann J
297. Eriksson JG
298. Evans DA
299. de Faire U
300. Farrall M
301. Ferrucci L
302. Ford I
303. Franke L
304. Franks PW
305. Froguel P
306. Gansevoort RT
307. Gieger C
308. Grönberg H
309. Gudnason V
310. Gyllensten U
311. Hall P
312. Hamsten A
313. van der Harst P
314. Hayward C
315. Heliövaara M
316. Hengstenberg C
317. Hicks AA
318. Hingorani A
319. Hofman A
320. Hu F
321. Huikuri HV
322. Hveem K
323. James AL
324. Jordan JM
325. Jula A
326. Kähönen M
327. Kajantie E
328. Kathiresan S
329. Kiemeney L
330. Kivimaki M
331. Knekt PB
332. Koistinen HA
333. Kooner JS
334. Koskinen S
335. Kuusisto J
336. Maerz W
337. Martin NG
338. Laakso M
339. Lakka TA
340. Lehtimäki T
341. Lettre G
342. Levinson DF
343. Lind L
344. Lokki ML
345. Mäntyselkä P
346. Melbye M
347. Metspalu A
348. Mitchell BD
349. Moll FL
350. Murray JC
351. Musk AW
352. Nieminen MS
353. Njølstad I
354. Ohlsson C
355. Oldehinkel AJ
356. Oostra BA
357. Palmer LJ
358. Pankow JS
359. Pasterkamp G
360. Pedersen NL
361. Pedersen O
362. Penninx BW
363. Perola M
364. Peters A
365. Polašek O
366. Pramstaller PP
367. Psaty BM
368. Qi L
369. Quertermous T
370. Raitakari OT
371. Rankinen T
372. Rauramaa R
373. Ridker PM
374. Rioux JD
375. Rivadeneira F
376. Rotter JI
377. Rudan I
378. den Ruijter HM
379. Saltevo J
380. Sattar N
381. Schunkert H
382. Schwarz PEH
383. Shuldiner AR
384. Sinisalo J
385. Snieder H
386. Sørensen TIA
387. Spector TD
388. Staessen JA
389. Stefania B
390. Thorsteinsdottir U
391. Stumvoll M
392. Tardif JC
393. Tremoli E
394. Tuomilehto J
395. Uitterlinden AG
396. Uusitupa M
397. Verbeek ALM
398. Vermeulen SH
399. Viikari JS
400. Vitart V
401. Völzke H
402. Vollenweider P
403. Waeber G
404. Walker M
405. Wallaschofski H
406. Wareham NJ
407. Watkins H
408. Zeggini E
409. CHARGE Consortium
410. DIAGRAM Consortium
411. GLGC Consortium
412. Global-BPGen Consortium
413. ICBP Consortium
414. MAGIC Consortium
415. Chakravarti A
416. Clegg DJ
417. Cupples LA
418. Gordon-Larsen P
419. Jaquish CE
420. Rao DC
421. Abecasis GR
422. Assimes TL
423. Barroso I
424. Berndt SI
425. Boehnke M
426. Deloukas P
427. Fox CS
428. Groop LC
429. Hunter DJ
430. Ingelsson E
431. Kaplan RC
432. McCarthy MI
433. Mohlke KL
434. O’Connell JR
435. Schlessinger D
436. Strachan DP
437. Stefansson K
438. van Duijn CM
439. Hirschhorn JN
440. Lindgren CM
441. Heid IM
442. North KE
443. Borecki IB
444. Kutalik Z
445. Loos RJF
(2015) The influence of age and sex on genetic associations with adult body size and shape: a large-scale genome-wide interaction study
PLOS Genetics 11:e1005378.

https://doi.org/10.1371/journal.pgen.1005378
(2016) Multiple novel gene-by-environment interactions modify the effect of FTO variants on body mass index
Nature Communications 7:12724.

https://doi.org/10.1038/ncomms12724
1. Zhang X
2. Lucas AM
3. Veturi Y
4. Drivas TG
5. Bone WP
6. Verma A
7. Chung WK
8. Crosslin D
9. Denny JC
10. Hebbring S
11. Jarvik GP
12. Kullo I
13. Larson EB
14. Rasmussen-Torvik LJ
15. Schaid DJ
16. Smoller JW
17. Stanaway IB
18. Wei WQ
19. Weng C
20. Ritchie MD
(2022) Large-scale genomic analyses reveal insights into pleiotropy across circulatory system diseases and nervous system disorders
Nature Communications 13:3428.

https://doi.org/10.1038/s41467-022-30678-w

Article and author information

Author details

Daniel Hui

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States

Contribution
Conceptualization, Formal analysis, Visualization, Methodology, Writing – original draft, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-8023-7352
Scott Dudek

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States

Contribution
Formal analysis, Writing – review and editing

Competing interests
No competing interests declared
Krzysztof Kiryluk

Division of Nephrology, Department of Medicine, Columbia University, New York, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Theresa L Walunas

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Iftikhar J Kullo

Department of Cardiovascular Medicine, Mayo Clinic, Rochester, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Wei-Qi Wei

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Hemant Tiwari

Department of Pediatrics, University of Alabama at Birmingham, Birmingham, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Josh F Peterson

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Wendy K Chung

Departments of Pediatrics and Medicine, Columbia University Irving Medical Center, Columbia University, New York, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Brittney H Davis

Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Atlas Khan

Division of Nephrology, Department of Medicine, Columbia University, New York, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Leah C Kottyan

The Center for Autoimmune Genomics and Etiology, Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Nita A Limdi

Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Qiping Feng

Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-6213-793X
Megan J Puckelwartz

Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Chunhua Weng

Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Johanna L Smith

Department of Cardiovascular Medicine, Mayo Clinic, Rochester, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Elizabeth W Karlson

Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Regeneron Genetics Center

Contribution
DNA sequencing

Competing interests
No competing interests declared
Penn Medicine BioBank

Contribution
Recruitment of participants, extraction of de-identified EHR data

Competing interests
No competing interests declared
Gail P Jarvik

Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, United States

Contribution
Writing – review and editing

Competing interests
No competing interests declared
Marylyn D Ritchie

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States

Contribution
Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
marylyn@pennmedicine.upenn.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0002-1208-1720

Funding

National Institutes of Health (AI077505)

Daniel Hui
Scott Dudek
Marylyn D Ritchie

National Institutes of Health (HL169458)

Daniel Hui
Scott Dudek
Marylyn D Ritchie

National Institute of Diabetes and Digestive and Kidney Diseases (DK52431)

Wendy K Chung

National Institutes of Health (U01 HG011166)

Josh F Peterson

National Institutes of Health (U01 HG008680)

Chunhua Weng

Group Health Cooperative/University of Washington (U01HG008657)

Gail P Jarvik

Brigham and Women's Hospital (U01HG008685)

Elizabeth W Karlson

Vanderbilt University Medical Center (U01HG008672)

Wei-Qi Wei
Josh F Peterson
Qiping Feng

Cincinnati Children’s Hospital Medical Center (U01HG008666)

Leah C Kottyan

Mayo Clinic (U01HG006379)

Iftikhar J Kullo
Johanna L Smith

Columbia University Health Sciences (U01HG008680)

Krzysztof Kiryluk
Atlas Khan
Chunhua Weng
Wendy K Chung

Northwestern University (U01HG008673)

Theresa L Walunas
Megan J Puckelwartz

Vanderbilt University Medical Center serving as the Coordinating Center (U01HG008701)

Josh F Peterson
Wei-Qi Wei

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

For UK Biobank: This research has been conducted using the UK Biobank Resource under Application Number 32133. This work uses data provided by patients and collected by the NHS as part of their care and support.

For GERA: Data came from a grant, the Resource for Genetic Epidemiology Research in Adult Health and Aging (RC2 AG033067; Schaefer and Risch, PIs) awarded to the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) and the UCSF Institute for Human Genetics. The RPGEH was supported by grants from the Robert Wood Johnson Foundation, the Wayne and Gladys Valley Foundation, the Ellison Medical Foundation, Kaiser Permanente Northern California, and the Kaiser Permanente National and Northern California Community Benefit Programs. The RPGEH and the Resource for Genetic Epidemiology Research in Adult Health and Aging are described in the following publication, Schaefer C, et al., The Kaiser Permanente Research Program on Genes, Environment and Health: Development of a Research Resource in a Multi-Ethnic Health Plan with Electronic Medical Records, In preparation, 2013.

For eMERGE: We acknowledge David Crosslin for helping clean the eMERGE data. Please see funding section for eMERGE funding acknowledgments.

For PMBB: We acknowledge the Penn Medicine BioBank (PMBB) for providing data and thank the patient-participants of Penn Medicine who consented to participate in this research program. We would also like to thank the Penn Medicine BioBank team and Regeneron Genetics Center for providing genetic variant data for analysis. The PMBB is approved under IRB protocol #813913 and supported by the Perelman School of Medicine at the University of Pennsylvania, a gift from the Smilow family, and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA award number UL1TR001878.

Version history

Sent for peer review: April 17, 2023
Preprint posted: May 14, 2023
Reviewed Preprint version 1: July 31, 2023
Reviewed Preprint version 2: June 12, 2024
Version of Record published: January 24, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.88149. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.