Introduction

The genome-wide association study (GWAS) literature has enabled the construction of polygenic indices (PGIs) for almost every widely measured human phenotype1–3. One motivation for this enterprise has been to construct tools that may help clinical practice in disease risk prediction4–6. In addition, PGIs may be useful for health risk assessment in a more indirect manner, as they may predict health and functioning beyond their immediate phenotypes, for example all-cause mortality. Despite some earlier contributions7–11, knowledge of the extent to which PGIs predict mortality is still scarce, and the existing literature mostly covers disease- or biomarker-related PGIs. It is largely unclear to what extent PGIs for social, psychological and behavioural phenotypes, or PGIs for typically non-fatal health conditions, can help in mortality prediction.

When the interest extends beyond purely predictive uses, research has increasingly shown the limitations of PGIs as black-box predictors: in addition to the usually desired direct genetic effects, they may capture population phenomena arising from the geographical stratification of ancestries, dynastic or shared environmental effects within families and kin, and assortative mating. Within-sibship designs may alleviate such limitations by taking advantage of the fact that genetic differences between siblings originate from random segregation at meiosis1,12.

Mortality risks differ substantially by sex, age and education13,14, which warrants an assessment of potentially heterogeneous associations across these groups. Social, psychological and behaviour-related PGIs may also matter disproportionately for certain causes of death, including accidents, suicides, and violent and alcohol-related deaths. Such “external” mortality is conceptually closely connected to risky behaviour and alcohol consumption. A further question relevant to the practical utility of PGIs is whether they add information for predicting the risk of death even when the measured phenotype is available.

In this study, we address these gaps by assessing the association between 35 PGIs, mostly related to social, behavioural and psychological traits or typically non-fatal health conditions, and mortality in a population-representative sample of over 40 000 Finnish individuals with up to 25 years of register-based mortality follow-up. We also assess the extent of potential population stratification and related biases through a within-sibship analysis of over 10 000 siblings. Furthermore, we examine potentially heterogeneous PGI-mortality associations by sex and education, for mortality occurring at different ages, and separately for external and natural causes of death. Finally, we compare the six PGIs most strongly associated with mortality (PGIs of ever smoking, self-rated health, body mass index (BMI), educational attainment, depressive symptoms and alcoholic drinks per week) with their phenotypes when mutually adjusting for each other.

Methods

Study population

The main (“population”) analysis sample consists of the genetically informed population surveys FINRISK 1992, 1997, 2002, 2007 and 2012, Health 2000 and 2011, and FinHealth 201715–17. The response rates of these data collections varied between 65% and 93%. Of the initial pooled sample, 88% had genotype data available. Quality control and imputation of the genetic data followed the Sequencing Initiative Suomi protocols18,19. Using pseudonymized personal identity codes, these data were linked to administrative registers maintained by Statistics Finland, which provided socio-demographic and mortality information. In addition to this population sample, we used a separate sample for the within-sibship analysis. Individuals in the sibling data were mainly dizygotic twins from the Finnish twin cohorts: the “old cohort” (born before 1958), FinnTwin12 (born 1974–1979) and FinnTwin16 (born 1983–1987)20. These sibling data were complemented with individuals from the population sample identified as likely full siblings based on their genetic similarity (identity-by-descent proportion between 0.35 and 0.80) and an age difference of less than 18 years. These individuals were excluded from the main population analysis so that the two samples did not overlap.
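The sibling-identification rule can be illustrated with a minimal Python sketch. It assumes that pairwise identity-by-descent (IBD) proportions are already available (for example from a PLINK relatedness run) and stores each pair in a hypothetical `PairStat` record; the function names and the union-find step that merges overlapping pairs into sibships are illustrative choices, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairStat:
    """Hypothetical record for one genotyped pair from the population sample."""
    id_a: str
    id_b: str
    ibd: float            # estimated identity-by-descent proportion
    birth_year_a: float
    birth_year_b: float


def likely_full_siblings(pairs, ibd_lo=0.35, ibd_hi=0.80, max_age_gap=18.0):
    """Keep pairs with ibd_lo < IBD < ibd_hi and an age difference below max_age_gap."""
    return [
        (p.id_a, p.id_b)
        for p in pairs
        if ibd_lo < p.ibd < ibd_hi
        and abs(p.birth_year_a - p.birth_year_b) < max_age_gap
    ]


def sibship_ids(sib_pairs):
    """Assign a sibship number to each individual; pairs sharing a member are merged."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in sib_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    numbering = {}
    return {x: numbering.setdefault(find(x), len(numbering)) for x in sorted(parent)}


pairs = [
    PairStat("A", "B", 0.51, 1950, 1955),
    PairStat("B", "C", 0.47, 1955, 1961),
    PairStat("D", "E", 0.24, 1948, 1950),  # IBD too low for full siblings
    PairStat("F", "G", 0.50, 1940, 1965),  # age gap of 25 years
]
print(sibship_ids(likely_full_siblings(pairs)))  # {'A': 0, 'B': 0, 'C': 0}
```

The resulting sibship number is the kind of identifier that a stratified within-sibship model needs; three siblings linked through two pairs end up in one sibship.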

Individuals were followed from the latest of 1) January 1995, 2) July of the data collection year and 3) the month in which the respondent turned 25 years. Mortality follow-up ended at the earliest of the end of 2019 and the date of death. The population analysis comprised 40 097 individuals (564 885 person-years of follow-up) and 5948 deaths. The within-sibship analysis included 10 174 individuals (200 683 person-years of follow-up) in 5071 sibships, with 2116 deaths.
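The entry and exit rules can be sketched as follows, assuming dates are stored as decimal calendar years, so that July of the survey year is approximated as `survey_year + 0.5` and the 25th birthday as `birth + 25`. Because age is the time scale, each person contributes an interval from entry age to exit age (delayed entry); the helper name `follow_up` is hypothetical.

```python
from typing import Optional, Tuple

FOLLOW_UP_START = 1995.0  # January 1995, as a decimal calendar year
FOLLOW_UP_END = 2020.0    # end of 2019


def follow_up(birth: float, survey_year: int,
              death: Optional[float] = None) -> Optional[Tuple[float, float, bool]]:
    """Return (entry_age, exit_age, died) with age as the time scale.

    Entry is the latest of January 1995, July of the survey year (approximated
    as survey_year + 0.5) and the 25th birthday; exit is the earliest of the end
    of 2019 and death. Returns None if the person contributes no person-time.
    """
    entry = max(FOLLOW_UP_START, survey_year + 0.5, birth + 25.0)
    died = death is not None and death <= FOLLOW_UP_END
    exit_ = death if died else FOLLOW_UP_END
    if exit_ <= entry:
        return None
    return entry - birth, exit_ - birth, died


# Person-years are the summed differences between exit age and entry age
records = [follow_up(1950.0, 1992), follow_up(1980.0, 2002, death=2010.0)]
person_years = sum(exit_age - entry_age for entry_age, exit_age, _ in records)
print(records, person_years)
```

In the first record the 1992 respondent enters at the January 1995 start; in the second, entry is delayed until the 25th birthday in 2005.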

Variables

The outcome was death during the follow-up period. External causes of death (accidents, violence, suicides and alcohol-related causes; International Classification of Diseases 10th revision [ICD-10] codes F10, G31.2, G40.51, G62.1, G72.1, I42.6, K29.2, K70, K85.2, K86.0, O35.4, P04.3, Q86.0 and V01–Y89; 587 deaths) and natural causes of death (all other codes; 5349 deaths) were identified from the national cause of death register maintained by Statistics Finland. Twelve individuals with an unknown cause of death were excluded from the cause-specific analyses.
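The external/natural split is a simple rule on the underlying-cause code. A minimal sketch, assuming register codes are strings that may or may not contain the dot (for example "K70.3" or "K703"), with the code list copied from the text; some listed codes (such as G40.51) come from national ICD-10 extensions, and the function name is hypothetical.

```python
from typing import Optional

# ICD-10 codes (without dots) classified as external in the text, besides V01–Y89
EXTERNAL_PREFIXES = ("F10", "G312", "G4051", "G621", "G721", "I426", "K292",
                     "K70", "K852", "K860", "O354", "P043", "Q860")


def cause_group(icd10: Optional[str]) -> Optional[str]:
    """Classify an underlying cause of death as 'external' or 'natural'.

    Returns None for a missing code (unknown cause), which is then excluded
    from the cause-specific analyses.
    """
    if not icd10:
        return None
    code = icd10.replace(".", "").strip().upper()
    if code.startswith(EXTERNAL_PREFIXES):
        return "external"
    # V01–Y89: three-character categories (letter + two digits) compare correctly as strings
    if "V01" <= code[:3] <= "Y89":
        return "external"
    return "natural"


print([cause_group(c) for c in ["K70.3", "X70", "I21.9", "Y90", None]])
```

Prefix matching means that, for example, every K70 subcode counts as external, while G31.2 is external but other G31 codes are natural.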

The main independent variables were 35 PGIs from the Polygenic Index Repository of Becker et al.2, which are based mainly on GWASs of UK Biobank and 23andMe, Inc. samples, along with other data collections. The PGIs were tailored for the Finnish data by excluding overlapping samples and applying linkage-disequilibrium adjustment. They are described in Supplementary table S1 (see also 2,22). All PGIs were standardized to mean 0 and standard deviation (SD) 1.
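The PGIs themselves come from the repository, but their basic form can be sketched as a weighted sum of allele dosages followed by the standardization described above. The sketch assumes the per-SNP weights are already LD-adjusted and aligned to the effect allele, and uses simulated genotypes and weights; in practice such sums are usually computed with PLINK's `--score` command rather than in NumPy.

```python
import numpy as np


def polygenic_index(dosages, weights):
    """Weighted sum of effect-allele dosages, standardized to mean 0 and SD 1.

    dosages: (n_individuals, n_snps) array of expected allele counts in [0, 2]
    weights: (n_snps,) per-allele weights, assumed already LD-adjusted
    """
    raw = np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)
    return (raw - raw.mean()) / raw.std()  # population SD (ddof=0)


rng = np.random.default_rng(0)
dosages = rng.binomial(2, 0.3, size=(1000, 50))  # simulated genotypes
weights = rng.normal(0.0, 0.01, size=50)         # simulated SNP weights
pgi = polygenic_index(dosages, weights)
print(round(pgi.mean(), 6), round(pgi.std(), 6))
```

Standardizing makes every HR in the paper read as the hazard ratio per 1 SD difference in the PGI.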

For the six PGIs most strongly associated with mortality, we also measured the corresponding phenotypes: body mass index (BMI, kg/m²), alcohol intake (grams of ethanol per week), depression (number of indicators, 0–3), education (expected years to complete the highest attained degree), smoking (never / quit at least six months ago / current) and self-rated health (1–5). Supplementary table S2 provides more information on the measurement of the phenotypes and their distributions. For parsimony and comparability with the PGIs, these phenotypes were also standardized to SD units in the main analysis; analyses of the categorically measured phenotypes are presented in the supplement. After excluding individuals with any missing information, the analyses including phenotypes comprised 37 548 individuals and 5407 deaths. Supplementary table S3 presents correlations between the PGIs and the studied phenotypes.

Modelling

We estimated Cox proportional hazards regressions of mortality on each PGI separately, using age as the time scale. All models were adjusted for sex, indicators for the baseline data collection year and the first ten principal components of the full (pruned) SNP matrix. The models were first estimated in the whole population sample. We then compared these estimates with the corresponding within-sibship analysis, in which the sibship identifier was used as the stratification variable so that each sibship had its own baseline hazard23.
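The within-sibship comparison rests on the stratified Cox model: each sibship has its own baseline hazard, so only PGI differences between siblings inform the estimate. A minimal single-covariate sketch of this estimator follows, with age as the time scale, delayed entry at the start of follow-up and Newton–Raphson on the Breslow partial likelihood. The function name and toy data are hypothetical, and the paper's multi-covariate Stata models (in Stata, `stcox` with the `strata()` option) are not reproduced here.

```python
import numpy as np


def cox_fit_1d(entry, exit_, event, x, strata=None, n_iter=50, tol=1e-10):
    """Log-hazard ratio and SE for one covariate in a (stratified) Cox model.

    A person is at risk at event age t if entry < t <= exit and they belong to
    the same stratum as the person who died. Ties use the Breslow approximation.
    """
    entry, exit_, x = (np.asarray(a, dtype=float) for a in (entry, exit_, x))
    event = np.asarray(event, dtype=bool)
    strata = np.zeros(len(x), dtype=int) if strata is None else np.asarray(strata)
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i in np.flatnonzero(event):
            at_risk = (strata == strata[i]) & (entry < exit_[i]) & (exit_ >= exit_[i])
            xr = x[at_risk]
            w = np.exp(beta * xr)
            mean_x = np.sum(w * xr) / np.sum(w)
            score += x[i] - mean_x                                # gradient contribution
            info += np.sum(w * xr ** 2) / np.sum(w) - mean_x ** 2  # weighted variance
        if info <= 0:  # no variation in x within any risk set
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / np.sqrt(info) if info > 0 else float("inf")
    return beta, se


entry, exit_, event, x = [40, 40, 40, 40], [50, 55, 60, 60], [1, 1, 0, 0], [1.0, 0.0, 1.0, 0.0]
beta, se = cox_fit_1d(entry, exit_, event, x)
print(round(beta, 4), round(0.5 * np.log(2), 4))
```

For this toy data the maximum partial likelihood estimate has the closed form ½ ln 2. Adding a second stratum in which both members share the same covariate value leaves the estimate unchanged, which is why within-sibship estimates use only within-family variation.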

Next, we assessed heterogeneity in the associations by estimating the corresponding models separately for men and women and for three educational groups. We also investigated possible age-related heterogeneity by fitting the corresponding model in three age-specific mortality follow-up periods (25–64, 65–79 and 80+ years). In an additional analysis, we separated the outcome into external and natural causes of death.
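Age-specific follow-up periods are commonly built by splitting each person's follow-up interval at the band boundaries, so that a death is counted only in the band in which it occurs. A minimal sketch under that assumption; it also assumes that, in cause-specific models, deaths from the other cause are treated as censored, which the text does not state explicitly. Function names are hypothetical.

```python
AGE_BANDS = ((25.0, 65.0), (65.0, 80.0), (80.0, float("inf")))  # 25–64, 65–79, 80+


def split_by_age_band(entry_age, exit_age, died, bands=AGE_BANDS):
    """Split one follow-up interval (entry_age, exit_age] into age bands.

    Each piece is (band_start, start_age, stop_age, event); the death is counted
    only in the band in which it occurs, and the other pieces are censored.
    """
    pieces = []
    for lo, hi in bands:
        start, stop = max(entry_age, lo), min(exit_age, hi)
        if start < stop:
            pieces.append((lo, start, stop, died and exit_age <= hi))
    return pieces


def cause_specific_event(died, cause, target):
    """Event indicator for one cause; deaths from other causes count as censored."""
    return died and cause == target


print(split_by_age_band(60.0, 85.0, True))
print(cause_specific_event(True, "natural", "external"))
```

Each piece then enters the band-specific model with delayed entry at `start_age`, so a person followed from age 60 to death at 85 contributes censored time to the first two bands and a death to the 80+ band.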

Finally, we analysed the six PGIs most strongly associated with mortality in more detail, together with their corresponding phenotypes, by fitting four types of models. Model 1 included the control variables and each PGI or phenotype separately. Model 2 jointly included each PGI and its corresponding phenotype. Model 3 included either all six PGIs or all six phenotypes (but not both simultaneously), and Model 4 included all six PGIs and all six phenotypes simultaneously. We also carried out an analysis stratified by whether participants had the phenotypic risk factor.
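The four model types can be written out as covariate lists. This sketch uses hypothetical variable names and assumes the Model 1 letters run over the six PGIs first and then the six phenotypes; the actual lettering in Table 1 may differ. It makes the design explicit: 12 single-variable models (1a–1l), six PGI–phenotype pairs (2a–2f), an all-phenotype and an all-PGI model (3a, 3b) and one full model (4).

```python
# Hypothetical variable names: each PGI mapped to its corresponding phenotype
PAIRS = {
    "pgi_ever_smoking": "smoking",
    "pgi_self_rated_health": "self_rated_health",
    "pgi_education": "years_of_education",
    "pgi_bmi": "bmi",
    "pgi_depressive_symptoms": "depression_indicators",
    "pgi_drinks_per_week": "alcohol_grams_per_week",
}
# Baseline controls; baseline_year is assumed to enter as indicator variables
CONTROLS = ["sex", "baseline_year"] + [f"pc{k}" for k in range(1, 11)]


def model_specs(pairs=PAIRS, controls=CONTROLS):
    """Return {model label: covariate list} for Models 1a–1l, 2a–2f, 3a, 3b and 4."""
    pgis, phenos = list(pairs), list(pairs.values())
    letters = "abcdefghijkl"
    specs = {}
    for letter, var in zip(letters, pgis + phenos):        # Model 1: one variable at a time
        specs[f"1{letter}"] = controls + [var]
    for letter, (pgi, pheno) in zip(letters, pairs.items()):  # Model 2: PGI + own phenotype
        specs[f"2{letter}"] = controls + [pgi, pheno]
    specs["3a"] = controls + phenos                          # all six phenotypes
    specs["3b"] = controls + pgis                            # all six PGIs
    specs["4"] = controls + pgis + phenos                    # everything together
    return specs


specs = model_specs()
print(len(specs), specs["2c"][-2:])
```

Each list would then be passed to the same Cox specification used elsewhere in the analysis.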

The proportional hazards assumption of the Cox models was evaluated with Schoenfeld residuals (see Supplementary table S4)24. In absolute value, the residual correlations for the PGIs were no larger than 0.050. The correlations for the investigated phenotypes were higher, however, particularly for continuous BMI (-0.099).
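The residual check can be sketched for a single covariate. A Schoenfeld residual is the covariate value of the person who died minus the risk-weighted mean of the covariate among those at risk at that age. A correlation between the residuals and event age near zero is consistent with proportional hazards. This is a simplified, unscaled, unstratified version on toy data; Stata's `estat phtest` works with scaled residuals and offers several time transformations, so its values need not match.

```python
import numpy as np


def schoenfeld_residuals(entry, exit_, event, x, beta):
    """Unscaled Schoenfeld residuals for one covariate in an unstratified Cox model.

    Risk set at event age t: entry < t <= exit. Returns (event ages, residuals).
    """
    entry, exit_, x = (np.asarray(a, dtype=float) for a in (entry, exit_, x))
    event = np.asarray(event, dtype=bool)
    times, resid = [], []
    for i in np.flatnonzero(event):
        at_risk = (entry < exit_[i]) & (exit_ >= exit_[i])
        w = np.exp(beta * x[at_risk])
        resid.append(x[i] - np.sum(w * x[at_risk]) / np.sum(w))
        times.append(exit_[i])
    return np.array(times), np.array(resid)


def ph_correlation(times, resid):
    """Correlation between residuals and event age; near 0 supports proportional hazards."""
    return np.corrcoef(times, resid)[0, 1]


beta_hat = 0.5 * np.log(2)  # maximum partial likelihood estimate for this toy data
times, resid = schoenfeld_residuals([40, 40, 40, 40], [50, 55, 60, 60],
                                    [1, 1, 0, 0], [1.0, 0.0, 1.0, 0.0], beta_hat)
print(resid.sum(), ph_correlation(times, resid))
```

At the maximum likelihood estimate the residuals sum to zero. With only two deaths the correlation is trivially ±1; real checks use thousands of events.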

Genetic variables were produced with PLINK v. 1.9/2.0, and the statistical analyses were conducted with Stata v. 16 and 18.

Results

Figure 1 displays the associations between the 35 PGIs and all-cause mortality in the population and within-sibship analysis samples. In the population analysis, the PGIs most strongly associated with mortality were those of ever smoking (hazard ratio [HR] per 1 SD higher PGI = 1.12, 95 per cent confidence interval [95%CI] 1.09;1.15), self-rated health (HR=0.90, 95%CI 0.88;0.93), BMI (HR=1.10, 95%CI 1.07;1.13), educational attainment (HR=0.91, 95%CI 0.89;0.94), depressive symptoms (HR=1.07, 95%CI 1.04;1.10) and drinks per week (HR=1.06, 95%CI 1.04;1.09). Most of the studied PGIs had negligible associations with mortality: 18 of the 35 had HRs between 0.98 and 1.02.

Figure 1. Hazard ratios of polygenic indices for all-cause mortality.

Population and within-sibship estimates. Note: Estimates from Cox proportional hazards models adjusted for indicators for the baseline year, sex and the first 10 principal components of the genome. Capped bars are 95% confidence intervals. For a table of corresponding estimates, see Supplementary table S5. Abbreviations: HR = hazard ratio; ADHD = attention deficit hyperactivity disorder.

Although the population and within-sibship confidence intervals overlapped for every individual PGI except extraversion, associations were on average marginally larger in the within-sibship analysis than in the population analysis (the inverse-variance-weighted mean absolute log HR was 0.023, 95%CI 0.005;0.042, larger in the within-sibship analysis). In the within-sibship analysis, the PGI of BMI had the strongest association with mortality (HR=1.22, 95%CI 1.10;1.36).
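The summary measure above can be sketched as follows, assuming each HR's standard error is recovered from its 95% CI on the log scale and each estimate is weighted by 1/SE². The text does not spell out the exact weighting or how its CI for the difference was obtained, so this is an assumed reconstruction of the point estimate only, computed separately for each sample and then differenced.

```python
import math


def se_from_ci(lo, hi, z=1.959964):
    """Standard error of a log HR recovered from a symmetric 95% CI on the log scale."""
    return (math.log(hi) - math.log(lo)) / (2 * z)


def ivw_mean_abs_log_hr(hrs, ses):
    """Inverse-variance-weighted mean of |log HR| across PGIs."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(w * abs(math.log(hr)) for w, hr in zip(weights, hrs))
    return total / sum(weights)


# Illustrative input: two population estimates reported in the Results
# (ever smoking 1.12 [1.09;1.15], self-rated health 0.90 [0.88;0.93])
pop = ivw_mean_abs_log_hr([1.12, 0.90], [se_from_ci(1.09, 1.15), se_from_ci(0.88, 0.93)])
print(round(pop, 3))
```

Taking absolute values lets protective (HR < 1) and harmful (HR > 1) PGIs both count towards "strength of association"; the within-sibship minus population difference of two such means would give the kind of contrast reported above.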

Figure 2 presents PGI-mortality associations by sex, education, age and cause of death. Overall, men had slightly stronger PGI-mortality associations than women (Panel A; the inverse-variance-weighted mean absolute log HR was 0.013, 95%CI 0.005;0.022, larger among men). The largest sex difference was observed for the PGI of ADHD (HR=1.08, 95%CI 1.04;1.11 among men; HR=1.01, 95%CI 0.97;1.05 among women; p=0.012 for difference). The HRs in Panel B show no evidence of heterogeneous associations by educational level. Panel C analyses mortality in three age-specific follow-up periods. The PGIs were more predictive of death in younger age groups, although the difference between the 25–64 and 65–79 age groups was small, except for the PGI of ADHD (HR=1.14, 95%CI 1.08;1.21 for 25–64-year-olds; HR=1.04, 95%CI 1.00;1.08 for 65–79-year-olds; p=0.008 for difference). Among those aged 80+, the PGIs predicted death only negligibly.
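One common way to test whether two HRs differ is a z-test on the difference of their log HRs, treating the two estimates as independent and recovering each SE from its 95% CI. The text does not state which test produced its p-values, so this is an assumed approach. Applied to the rounded ADHD estimates above, it gives p of roughly 0.010 for the sex difference and 0.009 for the age difference, close to the reported 0.012 and 0.008.

```python
import math


def se_from_ci(lo, hi, z=1.959964):
    """Standard error of a log HR recovered from a symmetric 95% CI on the log scale."""
    return (math.log(hi) - math.log(lo)) / (2 * z)


def p_difference(hr1, ci1, hr2, ci2):
    """Two-sided p-value for the difference between two independent log HRs."""
    diff = math.log(hr1) - math.log(hr2)
    se = math.hypot(se_from_ci(*ci1), se_from_ci(*ci2))  # SEs add in quadrature
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# ADHD PGI: men vs women, and ages 25–64 vs 65–79 (estimates from the text)
p_sex = p_difference(1.08, (1.04, 1.11), 1.01, (0.97, 1.05))
p_age = p_difference(1.14, (1.08, 1.21), 1.04, (1.00, 1.08))
print(round(p_sex, 3), round(p_age, 3))
```

The small discrepancies from the reported values are expected, because the published HRs and CIs are rounded to two decimals.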

Figure 2. Hazard ratios of polygenic indices for all-cause mortality by sex (Panel A), educational group (Panel B), age-specific mortality follow-up period (Panel C), and cause of death (Panel D).

Note: Estimates from Cox proportional hazards models adjusted for indicators for the baseline year, sex and the first 10 principal components of the genome. Capped bars are 95% confidence intervals. For a table of corresponding estimates, see Supplementary table S6. Abbreviations: HR = hazard ratio; ADHD = attention deficit hyperactivity disorder.

Panel D shows that most PGIs had stronger associations with external (accident, violence, suicide and alcohol-related) than with natural causes of death. For external causes, the strongest associations were observed for the PGIs of drinks per week (HR=1.16, 95%CI 1.07;1.26), depressive symptoms (HR=1.15, 95%CI 1.06;1.25), educational attainment (HR=0.88, 95%CI 0.81;0.95) and ADHD (HR=1.14, 95%CI 1.05;1.23). For external causes, twelve PGIs had HRs ≥1.1 (drinks per week, depressive symptoms, cigarettes per day, ADHD, alcohol misuse, ever smoking and risk tolerance) or ≤0.9 (cannabis use, self-rated health, religious attendance, age at first birth and educational attainment) per 1 SD difference in PGI, whereas only three PGIs had HRs ≥1.1 or ≤0.9 for natural causes of death (ever smoking, BMI and self-rated health). The HRs for natural causes of death followed patterns similar to those for all-cause mortality, as expected given that natural causes accounted for 90% of all observed deaths.

Table 1 shows the HRs of the six most predictive PGIs (ever smoking, self-rated health, educational attainment, BMI, depressive symptoms and drinks per week) and their corresponding phenotypes (smoking, self-rated health, years of education, BMI, number of depression indicators and alcohol intake per week) for all-cause mortality. Models 1a–1l present the HRs of each variable adjusted only for the baseline covariates. All phenotypes except BMI had stronger associations with mortality than their corresponding PGIs. Models 2a–2f adjust for each phenotype and its corresponding PGI simultaneously. The HRs of the phenotypes were only slightly attenuated, whereas those of the PGIs were attenuated by around one third on average. Nevertheless, each PGI remained clearly associated with mortality after adjustment for its phenotype, and all 95%CIs in Models 2a–2f excluded one. Model 3a adjusts for all six phenotypes simultaneously and Model 3b for all six PGIs simultaneously. The most substantial attenuation was observed for depression indicators and the PGI of depressive symptoms. Finally, Model 4 adjusts for all phenotypes and PGIs simultaneously. In Model 4, the PGIs had modest independent associations, the strongest being for the PGIs of ever smoking (HR=1.04, 95%CI 1.01;1.08), BMI (HR=1.03, 95%CI 1.00;1.06) and drinks per week (HR=1.03, 95%CI 1.00;1.06). Supplementary table S7 presents the corresponding analyses with categorised phenotypes, which indicate curvilinear mortality patterns for BMI and alcohol intake. Although the HRs of these two phenotypes in Table 1 should therefore be interpreted with caution, the HRs of the PGIs were consistent between the analyses.

Table 1. Hazard ratios of selected polygenic indices and corresponding phenotypes for all-cause mortality (N: 37 548 individuals; 5407 deaths)

Table 2 shows the associations between these six PGIs and all-cause mortality stratified by whether an individual had the phenotypic risk factor in question. We found no evidence of a substantial difference in PGI-mortality HRs between individuals with and without the corresponding phenotypic risk factor. Furthermore, the only PGI whose HRs were consistently attenuated relative to the analysis of the whole sample (presented in Table 1 and Figure 1) was the PGI of ever smoking (HR=1.07, 95%CI 1.03;1.11 for never smokers; HR=1.09, 95%CI 1.05;1.12 for others; cf. HR=1.12, 95%CI 1.09;1.15 in the unstratified population analysis).

Table 2. Hazard ratios of selected polygenic indices for all-cause mortality by whether individuals belong to the risk category of the related phenotype

Discussion

We investigated the association between 35 PGIs, mostly related to social, behavioural and psychological traits or typically non-fatal health conditions, and mortality using a population-representative register-linked sample from Finland with up to 25 years of mortality follow-up. The PGIs most strongly associated with mortality were typically related to the best-established phenotypic mortality risk factors, including smoking, self-rated health, BMI, education and alcohol consumption25–27. Although the majority of the investigated PGIs had negligible associations with the risk of death, the strongest associations corresponded to about a 10% difference in the mortality hazard per 1 SD difference in PGI. Given the severity of the outcome, these associations cannot be dismissed as trivial, particularly for individuals with very high or very low PGIs. Our within-sibship analyses showed broadly similar results and thus do not indicate that these PGI-mortality associations were systematically inflated by population stratification or related biases. Previous literature on PGI-mortality associations is limited, particularly for PGIs other than those related to diseases or biomarkers. Nevertheless, the associations of the PGIs of smoking, alcohol consumption, depression and BMI that we observed in Finland were roughly comparable to, or moderately stronger than, those observed in a previous UK Biobank study for PGIs unadjusted for each other10 and in the US Health and Retirement Study for mutually adjusted PGIs9.
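Why a roughly 10% difference per SD is not trivial at the tails can be shown with a little arithmetic. The sketch assumes the log-linear, proportional-hazards form of the fitted models and uses the population HR for the ever-smoking PGI (1.12 per SD). It also assumes the standardized PGI is approximately normal, so that the share of people beyond a given SD cut-off can be read from the normal distribution.

```python
from statistics import NormalDist


def hr_for_sd_difference(hr_per_sd, sd_difference):
    """HR implied for a PGI difference of sd_difference SDs, assuming a log-linear effect."""
    return hr_per_sd ** sd_difference


# Ever-smoking PGI, HR = 1.12 per SD: compare people at +2 SD and -2 SD (4 SD apart)
print(round(hr_for_sd_difference(1.12, 4), 2))  # → 1.57
# Share of people above +2 SD if the standardized PGI is roughly normal
print(round(1 - NormalDist().cdf(2.0), 3))       # → 0.023
```

Under these assumptions, people about 2 SD above the mean (roughly 2% of the population) would have around 57% higher mortality hazard than people 2 SD below it.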

The investigated PGIs were slightly more predictive of mortality among men than among women across the board. This aligns with the sex differences in all-cause mortality observed for many social mortality risk phenotypes, such as socioeconomic position and marital status28–30, whereas a similar excess risk among men is not consistently observed for more physiologically proximate or behavioural risk phenotypes such as obesity, alcohol use and smoking31–33. We also evaluated, but did not observe, differences in PGI-mortality associations between educational groups. The PGIs were more predictive of death at younger ages, whereas among those who survived to age 80, PGIs made hardly any difference in mortality risk. In contrast, a previous study that directly identified SNPs associated with mortality found stronger HRs in older age groups7. A possible explanation for such an “age as a leveller” pattern (see13,34) is the increasing importance, towards the end of life, of acute factors that raise mortality risk, including emerging and progressing illness and biological ageing, which may override more distant and indirect mechanisms13,35. Such age-specific heterogeneity also has methodological implications: researchers analysing PGI-mortality associations in samples of (disproportionately) older individuals should be cautious in generalizing their results to the overall population because of potential survivorship bias36,37.

In general, PGIs were more strongly predictive of external than of natural causes of death, and this was particularly evident for many psychological and behavioural PGIs, including (alcoholic) drinks per week, ADHD, depressive symptoms, religious attendance and risk tolerance. Only the PGI of BMI showed a clearly stronger association with natural causes. Notably, although alcohol-related deaths were included in external causes, smoking-related PGIs predicted external and natural mortality in a roughly similar manner, and the PGI of cannabis use even had a negative association with external mortality. This suggests that despite a substantial shared genetic aetiology across addictions38,39, the genetic architectures underlying the use of different substances also differ in ways that matter for mortality risk.

Among the PGIs most predictive of mortality, associations tended to be roughly one third of the strength of those of the respective phenotypes when mutually adjusted, although with substantial variation between phenotypes. Additionally, the PGIs were predictive also among those who lacked the phenotypic risk factor. Together, these two observations imply that PGIs provide additional information on the risk of death even when phenotypic measures are available. A potential advantage of PGIs over phenotypes in research and health care is that they need to be measured only once, whereas precise monitoring of many phenotypes would require extensive longitudinal measurement. This is particularly relevant for phenotypes tied to specific points of the life course, e.g. health-related factors that typically manifest at older ages. It also suggests that the independent associations of PGIs might be stronger with an even longer mortality follow-up. Additionally, PGIs capture liabilities on a continuum, which may offer an advantage in risk assessment over phenotypes that can be measured only as binary diagnoses or in a limited number of categories. PGIs also avoid some forms of measurement error, such as those related to self-reporting or short-term variation over time.

The strengths of this study include population-representative data with high response rates, linked to a long register-based follow-up with minimal attrition-related bias. On the other hand, PGIs have known limitations: they capture genetic liability incompletely, are agnostic with regard to specific biological mechanisms and may also capture environmental signals1. The within-sibship analysis allowed us to evaluate and mitigate population stratification and related biases, albeit at the cost of lower statistical power and increased sensitivity to measurement error40. Overall, these analyses confirmed our main findings. In addition, the comparability of the PGIs may be affected by differences in their underlying GWASs, e.g. in sample size and quality of phenotype measurement. Finally, although the Schoenfeld residual correlations suggested some violation of the proportional hazards assumption, the larger correlations concerned phenotypes that were not central to the analysis.

To conclude, PGIs related to the best-established phenotypic risk factors had the strongest associations with mortality. Particularly for deaths occurring at younger ages, PGIs confer additional information on mortality risk even when information on the related phenotype is available, and the within-sibship analysis suggests that these associations are not systematically inflated by population phenomena.

Acknowledgements

We thank Aysu Okbay for providing the weights for the PGIs used. We would like to thank the research participants and employees of 23andMe, Inc. for making this work possible. The genetic samples used for the research were obtained from the THL Biobank (study number: THLBB2020_8/ THLBB2023_51), and we thank all study participants for their generous participation in the THL Biobank.

Additional information

Ethics approval

The Finnish Social and Health Data Permit Authority (Findata) has accepted the use of clinical data (THL/4725/14.02.00/2020; THL/1423/14.06.00/2022), and THL Biobank has approved the use of genetic data (THLBB2020_8; THLBB2023_51) and the data linkage to the Finnish population registers (TK-53-876-20; TK/2041/07.03.00/2023). All participants gave their informed consent.

Data availability

Due to the privacy protection of the study participants, data are confidential and cannot be shared publicly. Datasets are available from the THL Biobank on written application and following the instructions given on the website of the Biobank (https://thl.fi/en/web/thl-biobank/for-researchers). Register linkage from these data may be applied for from Findata (https://findata.fi/en/permits/).

Author contributions

HL conceptualised the study, contributed to data preparation, conducted the main analysis and wrote the first draft. AG, KK and PM contributed to the data infrastructure and data preparation for analysis. JK, KK and SL supported the empirical analysis with their methodological expertise. AG, JK, KK, SL, KS and PM revised the manuscript text with critical content and contributed to the interpretation of the findings.

Funding

HL was supported by the European Research Council [grant #101019329] as well as the Max Planck – University of Helsinki Center for Social Inequalities in Population Health. SL gratefully acknowledges funding from the Academy of Finland (decision 350399).

PM was supported by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 101019329), the Strategic Research Council (SRC) within the Research Council of Finland grants for ACElife (#352543-352572) and LIFECON (#345219), the Research Council of Finland profiling grant for SWAN and FooDrug, and grants to the Max Planck – University of Helsinki Center from the Jane and Aatos Erkko Foundation (#210046), the Max Planck Society (#5714240218), University of Helsinki (#77204227), and Cities of Helsinki, Vantaa and Espoo. The study does not necessarily reflect the Commission’s views and in no way anticipates the Commission’s future policy in this area. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional files

Supplementary table.