A flowchart of the project.

PGS R2 stratified by quintiles for quantitative variables and by binary variables. a) Continuous covariates with significant (p < 7.7×10−4) R2 differences across quintiles in UKBB EUR. Pork and processed meat consumption per week were excluded from this plot in favor of pork and processed meat intake. b) Covariates with significant differences that were available in multiple cohorts. Actual values rather than percentiles are plotted on the x-axis, except for physical activity, alcohol intake frequency, and socioeconomic status, for which percentiles were used because their phenotype definitions differed slightly across cohorts. Townsend index and income were used as the socioeconomic status variables in UKBB and GERA, respectively. The sign of the Townsend index was reversed, since a higher Townsend index indicates lower socioeconomic status, whereas higher income indicates higher socioeconomic status. Abbreviations: physical activity (PA), International Physical Activity Questionnaire (IPAQ).
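As an illustration of the stratification described in this caption, the following is a minimal sketch, not the study's pipeline. It assumes PGS R2 can be approximated by the squared Pearson correlation between the PGS and the phenotype within each covariate quintile (the actual analysis may use an incremental R2 over other covariates). The function name `r2_by_quantile` and the synthetic data are hypothetical.

```python
import numpy as np

def r2_by_quantile(pgs, phenotype, covariate, n_bins=5):
    """Squared PGS-phenotype correlation within each quantile bin of a covariate."""
    pgs = np.asarray(pgs, dtype=float)
    phenotype = np.asarray(phenotype, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    # Interior cut points (for quintiles: the 20th, 40th, 60th, 80th percentiles).
    cuts = np.quantile(covariate, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Bin index 0 .. n_bins-1 for every individual.
    bins = np.searchsorted(cuts, covariate, side="right")
    r2 = np.empty(n_bins)
    for b in range(n_bins):
        mask = bins == b
        r = np.corrcoef(pgs[mask], phenotype[mask])[0, 1]
        r2[b] = r ** 2
    return r2

# Synthetic data (not from the study): the PGS effect grows with the covariate.
rng = np.random.default_rng(0)
n = 20_000
pgs = rng.standard_normal(n)
covariate = rng.standard_normal(n)
phenotype = (0.5 + 0.3 * covariate) * pgs + rng.standard_normal(n)

quintile_r2 = r2_by_quantile(pgs, phenotype, covariate)
print(np.round(quintile_r2, 3))
```

In this simulated setting the R2 rises from the lowest to the highest quintile because the PGS effect was constructed to increase with the covariate.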

Model descriptive statistics on 28 of 62 covariates, which have significant (p < 0.05/62) PGS-covariate interaction terms, in UKBB EUR. The third column is the percentage change in PGS effect per unit change (standard deviations for continuous variables, binary variables encoded as 0 or 1) in covariate. The fifth column is the increase in model R2 with a PGS-covariate interaction term versus a main effects only model. Abbreviations: blood pressure (BP), physical activity (PA), forced vital capacity (FVC), forced expiratory volume in 1 second (FEV1), International Physical Activity Questionnaire (IPAQ).
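The two derived columns described in this caption can be sketched as follows. This is a simplified illustration on synthetic data, assuming an ordinary least squares model of the form outcome ~ PGS + covariate + PGS×covariate, and assuming the percentage change in PGS effect is computed as 100 × (interaction beta / PGS main-effect beta). The helper `fit_ols` and all simulated effect sizes are hypothetical, and the study's models likely include further covariates.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R2)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1.0 - resid.var() / y.var()
    return beta, r2

# Synthetic data (not from the study), with PGS and covariate in SD units.
rng = np.random.default_rng(1)
n = 50_000
pgs = rng.standard_normal(n)
cov = rng.standard_normal(n)
y = 0.10 * pgs + 0.05 * cov + 0.02 * pgs * cov + 0.1 * rng.standard_normal(n)

# Main-effects-only model versus model with a PGS-covariate interaction term.
_, r2_main = fit_ols(np.column_stack([pgs, cov]), y)
beta_full, r2_full = fit_ols(np.column_stack([pgs, cov, pgs * cov]), y)

beta_pgs, beta_int = beta_full[1], beta_full[3]
pct_change = 100.0 * beta_int / beta_pgs   # % change in PGS effect per covariate unit
r2_gain = r2_full - r2_main                # R2 increase from the interaction term

print(f"percent change in PGS effect: {pct_change:.1f}")
print(f"R2 increase: {r2_gain:.4f}")
```

With the simulated betas above (0.10 for the PGS, 0.02 for the interaction), the estimated percentage change should land near 20% per standard deviation of the covariate.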

Relative percentage changes in PGS effect per unit change in covariate, for covariates that significantly changed PGS effect (i.e., significant interaction beta at Bonferroni p < 7.7×10−4, denoted by asterisks) and were present in multiple cohorts and ancestries. The same covariate groupings and transformations were applied as in Figure 1.

Pearson correlations weighted by sample size between maximum R2 differences across strata, main effects of covariate on log(BMI), and PGS-covariate interaction effects on log(BMI). Main effect units are in standard deviations; interaction effect units are in PGS standard deviations multiplied by covariate standard deviations. Only continuous variables are plotted and modeled. GERA was excluded due to slightly different phenotype definitions.
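A sample-size-weighted Pearson correlation of the kind named in this caption can be sketched as below. This uses the standard weighted-covariance definition with weights normalized to sum to one; whether the study used exactly this estimator is an assumption, and the function name `weighted_pearson` and the example values are hypothetical.

```python
import numpy as np

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative observation weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                          # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)    # weighted means
    cov_xy = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov_xy / np.sqrt(var_x * var_y)

# Hypothetical per-covariate values: main effect, interaction effect, sample size.
main_effects = [0.30, 0.10, -0.05, 0.22]
interaction_effects = [0.020, 0.004, -0.001, 0.015]
sample_sizes = [350_000, 120_000, 80_000, 300_000]

r_weighted = weighted_pearson(main_effects, interaction_effects, sample_sizes)
print(round(r_weighted, 3))
```

With equal weights the function reduces to the ordinary Pearson correlation, which gives a quick sanity check against `np.corrcoef`.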

Model R2 across cohorts and ancestries using age and gender as covariates (along with PGSBMI and PCs 1-5). Across all cohorts and ancestries, LASSO with PGS-age and PGS-gender interaction terms had better average 10-fold cross-validation R2 than LASSO without interaction terms, while neural networks outperformed LASSO models.

PGS R2 based on three sets of GWAS. “Main effects” were from a typical main effect GWAS, “GxAge” effects were from a GWAS with a SNP-age interaction term, and “Age stratified” GWAS had main effects only but were conducted in four age quartiles. PGS R2 was evaluated using two models: one with main effects only, and one with an additional PGS*Age interaction term.

PGS-covariate interaction term p-values in UKBB EUR, with and without the covariate PGS included in the model; the mean -log10(p) is reduced from 18.0899 to 14.97072 when the covariate PGS are included. Note that age and sex PGS were not calculated, so their interaction p-values are excluded from this figure.