Genetics and Genomics

Genomics of 1 million parent lifespans implicates novel pathways and common diseases and distinguishes survival chances

  1. Paul RHJ Timmers
  2. Ninon Mounier
  3. Kristi Lall
  4. Krista Fischer
  5. Zheng Ning
  6. Xiao Feng
  7. Andrew D Bretherick
  8. David W Clark
  9. eQTLGen Consortium
  10. Xia Shen
  11. Tõnu Esko
  12. Zoltán Kutalik
  13. James F Wilson
  14. Peter K Joshi (corresponding author)
  1. University of Edinburgh, United Kingdom
  2. University Hospital of Lausanne, Switzerland
  3. Swiss Institute of Bioinformatics, Switzerland
  4. University of Tartu, Estonia
  5. Karolinska Institutet, Sweden
  6. Sun Yat-sen University, China
  7. Broad Institute of Harvard and MIT, United States
Research Communication
Cite this article as: eLife 2019;8:e39856 doi: 10.7554/eLife.39856

Abstract

We use a genome-wide association of 1 million parental lifespans of genotyped subjects and data on mortality risk factors to validate previously unreplicated findings near CDKN2B-AS1, ATXN2/BRAP, FURIN/FES, ZW10, PSORS1C3, and 13q21.31, and identify and replicate novel findings near ABO, ZC3HC1, and IGF2R. We also validate previous findings near 5q33.3/EBF1 and FOXO3, whilst finding contradictory evidence at other loci. Gene set and cell-specific analyses show that expression in foetal brain cells and adult dorsolateral prefrontal cortex is enriched for lifespan variation, as are gene pathways involving lipid proteins and homeostasis, vesicle-mediated transport, and synaptic function. Individual genetic variants that increase the risk of dementia, cardiovascular disease, and lung cancer – but not other cancers – explain the most variance in lifespan. Resulting polygenic scores show a mean lifespan difference of around five years of life between the top and bottom deciles.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

https://doi.org/10.7554/eLife.39856.001

eLife digest

Ageing happens to us all, and as the cabaret singer Maurice Chevalier pointed out, "old age is not that bad when you consider the alternative". Yet, the growing ageing population of most developed countries presents challenges to healthcare systems and government finances. For many older people, long periods of ill health are part of the end of life, and so a better understanding of ageing could offer the opportunity to prolong healthy living into old age.

Ageing is complex and takes a long time to study – a lifetime in fact. This makes it difficult to discern its causes, among the countless possibilities based on an individual’s genes, behaviour or environment. While thousands of regions in an individual’s genetic makeup are known to influence their risk of different diseases, those that affect how long they will live have proved harder to disentangle. Timmers et al. sought to pinpoint such regions, and then use this information to predict, based on their DNA, whether someone had a better or worse chance of living longer than average.

The DNA of over 500,000 people was read to reveal the specific ‘genetic fingerprints’ of each participant. Then, after asking each of the participants how long both of their parents had lived, Timmers et al. pinpointed 12 DNA regions that affect lifespan. Five of these regions were new and had not been linked to lifespan before. Across the twelve regions as a whole, several were known to be involved in Alzheimer’s disease, smoking-related cancer or heart disease. Looking at the entire genome, Timmers et al. could then predict a lifespan score for each individual, and when they sorted participants into ten groups based on these scores they found that the top group lived five years longer than the bottom, on average.

Many factors besides genetics influence how long a person will live and our lifespan cannot be read from our DNA alone. Nevertheless, Timmers et al. had hoped to narrow down their search and discover specific genes that directly influence how quickly people age, beyond diseases. If such genes exist, their effects were too small to be detected in this study. The next step will be to expand the study to include more participants, which will hopefully pinpoint further genomic regions and help disentangle the biology of ageing and disease.

https://doi.org/10.7554/eLife.39856.002

Introduction

Human lifespan is a highly complex trait, the product of myriad factors involving health, lifestyle, genetics, environment, and chance. The extent of the role of genetic variation in human lifespan has been widely debated (van den Berg et al., 2017), with estimates of broad-sense heritability ranging from around 25% based on twin studies (Ljungquist et al., 1998; Herskind et al., 1996; McGue et al., 1993) (perhaps over-estimated [Young et al., 2018]) to around 16.1% (narrow-sense 12.2%) based on large-scale population data (Kaplanis et al., 2018). One very recent study suggests it is much lower still (<7%) (Ruby et al., 2018), pointing to assortative mating as the source of resemblance amongst kin.

Despite this modest heritability, extensive research has gone into genome-wide association studies (GWAS) seeking genetic variants that influence human survival, using a variety of trait definitions and study designs (Deelen et al., 2011; Sebastiani et al., 2012; Beekman et al., 2013; Broer et al., 2015; Joshi et al., 2016; Pilling et al., 2016; Zeng et al., 2016; Pilling et al., 2017). GWAS have primarily focused on extreme cases of long-livedness (longevity) – individuals surviving past a certain age threshold – and on scanning for differences in genetic variation from controls. While this case-control design has the advantage of focusing on highly statistically-informative individuals, who also often exhibit extreme healthspan and have potentially unique genetic attributes (Sebastiani et al., 2013; Sebastiani et al., 2016), the exceptional nature of the phenotype precludes collection of large samples, and differences in definitions of longevity complicate meta-analysis. As a result, only two robustly replicated, genome-wide significant associations (near APOE and FOXO3) have been reported to date (Broer et al., 2015; Deelen et al., 2014).

An alternative approach is to study lifespan as a quantitative trait in the general population and use survival models (such as Cox proportional hazards [Cox, 1972]) to allow long-lived survivors to inform analysis. However, given that the incidence of mortality in middle-aged subjects is low, studies have shifted to the use of parental lifespans with subject genotypes (an instance of Wacholder’s kin-cohort method [Wacholder et al., 1998]), circumventing the long wait associated with studying age at death in a prospective study (Joshi et al., 2016; Pilling et al., 2016). In addition, the recent increase in genotyped population cohorts around the world, and in particular the creation of UK Biobank (Bycroft et al., 2017), has raised GWAS sample sizes to hundreds of thousands of individuals, providing the statistical power necessary to detect genetic effects on mortality.
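The kin-cohort logic can be made concrete. Each of a subject's alleles was inherited from a given parent with probability 1/2, so a per-allele effect estimated by regressing parental survival on the subject's genotype is expected to be about half the effect in a carrier; the age- and sex-specific results below accordingly report effects in self as 2 × the observed parental estimate. A minimal sketch of that rescaling, assuming a per-allele log hazard ratio and standard error from a Cox-type model of parental survival (the function name and the numbers are hypothetical):

```python
import math

def parent_to_self(log_hr_parent, se_parent):
    """Rescale a per-allele parental log hazard ratio to the carrier (self) scale.

    Each of the subject's alleles came from a given parent with probability 1/2,
    so the parental association is diluted by half; doubling removes that dilution.
    The standard error doubles as well, so the z-statistic is unchanged.
    """
    return 2.0 * log_hr_parent, 2.0 * se_parent

# Hypothetical parental estimate for one allele: lnHR = -0.02, SE = 0.004
self_lnhr, self_se = parent_to_self(-0.02, 0.004)
log_protection = -self_lnhr          # loge(protection ratio) = -lnHR
print(self_lnhr, self_se, math.exp(log_protection))
```

Doubling changes the effect size but not the evidence for association, which is why the test statistic can be computed on either scale.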

A third approach is to gather previously published GWAS on risk factors thought to possibly affect lifespan, such as smoking behaviour and cardiovascular disease (CVD), and estimate their actual independent, causal effects on mortality using Mendelian Randomisation. These causal estimates can then be used in a Bayesian framework to inform previously observed SNP associations with lifespan (McDaid et al., 2017).

Here, we blend these three approaches to studying lifespan and perform the largest GWAS on human lifespan to date. First, we leverage data from UK Biobank and 26 independent European-heritage population cohorts (Joshi et al., 2017) to carry out a GWAS of parental survival, quantified using Cox models. We then supplement this with data from 58 GWAS on mortality risk factors to conduct a Bayesian prior-informed GWAS (iGWAS). Finally, we use publicly available case-control longevity GWAS statistics to compare the genetics of lifespan and longevity and provide collective replication of our lifespan GWAS results.

We also examine the diseases associated with lifespan-altering variants and the effect of known disease variants on lifespan, to provide insight into the interplay between lifespan and disease. Finally, we use our GWAS results to implicate specific genes, biological pathways, and cell types, and use our findings to create and test whole-genome polygenic scores for survival.

Results

Genome-wide association analysis

We carried out a GWAS of survival in a sample of 1,012,240 parents (60% deceased) of European ancestry from UK Biobank and a previously published meta-analysis of 26 additional population cohorts (LifeGen [Joshi et al., 2017]; Table 1—source data 1). We performed a sex-stratified analysis and then combined the allelic effects in fathers and mothers into a single parental survival association in two ways. First, we assumed genetic variants with common effect sizes (CES) for both parents, maximising power if the effect is indeed the same. Second, we allowed for sex-specific effect sizes (SSE), maximising power to detect sexually dimorphic variants, including those only affecting one sex. The latter encompasses a conventional sex-stratified analysis, but uses only one statistical test for the much more general alternative hypothesis that there is an effect in at least one sex.
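The two ways of combining father and mother estimates can be sketched as follows. This is an illustration under stated assumptions, not the procedure in Materials and methods: it takes per-SNP estimates (beta, SE) for fathers and mothers on the same scale, and a hypothetical correlation `r` between their errors (the estimates come from the same genotyped subjects, so they may be correlated; `r = 0` treats them as independent). CES is a one-degree-of-freedom test of a single shared effect; SSE is a two-degree-of-freedom chi-square test of "an effect in at least one sex". Splitting the usual 5 × 10−8 threshold across the two tests gives the 2.5 × 10−8 threshold used below.

```python
import math

# Bonferroni split of the usual 5e-8 genome-wide threshold over two tests (CES and SSE)
GW_THRESHOLD = 5e-8 / 2

def ces_test(b_f, se_f, b_m, se_m, r=0.0):
    """Common effect size (CES): one shared effect for fathers and mothers, 1 df.

    Generalised least squares for two estimates whose errors have correlation r;
    with r = 0 this is ordinary inverse-variance weighting.
    """
    cov = r * se_f * se_m
    w_f = se_m ** 2 - cov            # row sums of the unnormalised inverse covariance matrix
    w_m = se_f ** 2 - cov
    det = se_f ** 2 * se_m ** 2 * (1 - r ** 2)
    beta = (w_f * b_f + w_m * b_m) / (w_f + w_m)
    se = math.sqrt(det / (w_f + w_m))
    p = math.erfc(abs(beta / se) / math.sqrt(2))   # two-sided normal P value
    return beta, se, p

def sse_test(b_f, se_f, b_m, se_m, r=0.0):
    """Sex-specific effects (SSE): H0 is no effect in either parent, 2 df."""
    z_f, z_m = b_f / se_f, b_m / se_m
    chi2 = (z_f ** 2 - 2 * r * z_f * z_m + z_m ** 2) / (1 - r ** 2)
    p = math.exp(-chi2 / 2)                        # upper tail of chi-square with 2 df
    return chi2, p

# Hypothetical variant acting in opposite directions in fathers and mothers:
# the CES estimate cancels out, while the SSE test still detects it.
print(ces_test(0.05, 0.01, -0.05, 0.01))
print(sse_test(0.05, 0.01, -0.05, 0.01), GW_THRESHOLD)
```

The example shows why SSE is worth its extra degree of freedom: a sexually dimorphic variant can be invisible to CES yet clear the threshold under SSE.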

Table 1
Twelve genome-wide significant associations with lifespan using UK Biobank and LifeGen.

Parental phenotypes from UK Biobank and LifeGen meta-analysis, described in Table 1—source data 1, were tested for association with subject genotype. See Table 1—source data 2 for LD Score regression intercept of each cohort separately and combined. Displayed here are loci associating with lifespan at genome-wide significance (p < 2.5 × 10−8). At or near – Gene, set of genes, or cytogenetic band nearest to the index SNP; rsID – The index SNP with the lowest P value in the common effect size (CES) or sex-specific effect (SSE) analysis; Chr – Chromosome; Position – Base-pair position on chromosome (GRCh37); A1 – the effect allele, increasing lifespan; Freq1 – Frequency of the A1 allele; Years1 – Years of life gained for carrying one copy of the A1 allele; SE – Standard Error; P – the P value for the Wald test of association between imputed dosage and Cox model residual under the CES model; SSE P – the corresponding P value under the SSE model; Disease – Category of disease for known associations with SNP or close proxies (r2 > 0.6), see Table 1—source data 3 for details and references. Despite the well-known function of the HTT gene in Huntington’s disease, SNPs within the identified locus near this gene have not been associated with the disease at genome-wide significance.

https://doi.org/10.7554/eLife.39856.003
At or near  rsID         Chr  Position   A1  Freq1  Years1  SE      P        SSE P    Disease
MAGI3       rs1230666    1    114173410  G   0.85   0.3224  0.0555  6.4E-09  6.1E-08  Autoimmune
KCNK3       rs1275922    2    26932887   G   0.74   0.2579  0.0443  6.0E-09  2.7E-07  Cardiometabolic
HTT         rs61348208   4    3089564    T   0.39   0.2299  0.0395  5.8E-09  1.2E-07  -
HLA-DQA1    rs34967069   6    32591248   T   0.07   0.5613  0.0956  4.3E-09  3.6E-09  Autoimmune
LPA         rs10455872   6    161010118  A   0.92   0.7639  0.0743  8.5E-25  3.1E-24  Cardiometabolic
CDKN2B-AS1  rs1556516    9    22100176   G   0.50   0.2510  0.0386  7.5E-11  6.4E-12  Cardiometabolic
ATXN2/BRAP  rs11065979   12   112059557  C   0.56   0.2798  0.0393  1.0E-12  6.2E-13  Autoimmune/Cardiometabolic
CHRNA3/5    rs8042849    15   78817929   T   0.65   0.4368  0.0410  1.6E-26  1.9E-30  Smoking-related
FURIN/FES   rs6224       15   91423543   G   0.52   0.2507  0.0390  1.3E-10  1.8E-09  Cardiometabolic
HP          rs12924886   16   72075593   A   0.80   0.2798  0.0493  1.4E-08  9.1E-08  Cardiometabolic
LDLR        rs142158911  19   11190534   A   0.12   0.3550  0.0616  8.1E-09  3.3E-08  Cardiometabolic
APOE        rs429358     19   45411941   T   0.85   1.0561  0.0546  3.1E-83  1.8E-85  Cardiometabolic/Neuropsychiatric

We find 12 genomic regions with SNPs passing genome-wide significance for one or both analyses (p < 2.5 × 10−8, accounting for the two tests CES/SSE) (Figure 1; Table 1). Among these are five loci discovered here for the first time, at or near MAGI3, KCNK3, HTT, HP, and LDLR. Carrying one copy of a life-extending allele is associated with an increase in lifespan between 0.23 and 1.07 years (around 3 to 13 months). Despite our sample size exceeding 1 million phenotypes, a variant had to have a minor allele frequency exceeding 5% and an effect size of 0.35 years of life or more per allele for our study to detect it with 80% power.

SNP associations with lifespan across both parents under the assumption of common and sex-specific effect sizes.

Miami plot of genetic associations with joint parental survival. In purple are the associations under the assumption of common SNP effect sizes across sexes (CES); in green are the associations under the assumption of sex-specific effect sizes (SSE). P refers to the two-sided P values for association of allelic dosage on survival under the residualised Cox model. The red line represents our multiple testing-adjusted genome-wide significance threshold (p = 2.5 × 10−8). Annotated are the gene, set of genes, or cytogenetic band near the index SNP, marked in red. P values have been capped at –log10(p) = 15 to better visualise associations close to genome-wide significance. SNPs with P values beyond this cap (near APOE, CHRNA3/5 and LPA) are represented by triangles.

https://doi.org/10.7554/eLife.39856.007

We also attempted to validate novel lifespan SNPs discovered by Pilling et al. (2017) in UK Biobank at an individual level by using the LifeGen meta-analysis as an independent replication sample. Testing 20 candidate SNPs for which we had data available, we find directionally consistent, nominally significant associations for six loci (p < 0.05, one-sided test), of which three have sex-specific effects. We also provide evidence against three putative loci but lack statistical power to assess the remaining 11 (Figure 2; Figure 2—source data 1).

Figure 2
Validation of SNPs identified in other studies using independent samples of European descent.

Discovery – Candidate SNPs or proxies (r2 > 0.95) associated with lifespan (top panels, stratified by sex) and longevity (bottom panel) by previous studies (Zeng et al., 2016; Pilling et al., 2017; Deelen et al., 2014; Flachsbart et al., 2009; Sebastiani et al., 2017; Ben-Avraham et al., 2017). Effect sizes have been rescaled to years of life to make direct comparisons between studies (see Materials and methods and Figure 2—figure supplement 1). Replication – Independent samples, either the LifeGen meta-analysis to replicate Pilling et al. (2017), or the full dataset including UK Biobank. Gene names are as reported by discovery and have been coloured based on overlap between confidence intervals (CIs) of effect estimates. Dark blue – Nominal replication (p < 0.05, one-sided test). Light blue – CIs overlap (Phet > 0.05) and cover zero, but replication estimate is closer to discovery than zero. Yellow – CIs overlap (Phet > 0.05) and cover zero, and replication estimate is closer to zero than discovery. Red – CIs do not overlap (Phet < 0.05) and replication estimate covers zero. Black – no replication data.

https://doi.org/10.7554/eLife.39856.008

We then used our full sample to test six candidate SNPs previously associated with longevity (Zeng et al., 2016; Deelen et al., 2014; Flachsbart et al., 2009; Sebastiani et al., 2017) for association with lifespan, and find directionally consistent evidence for SNPs near FOXO3 and EBF1. The remaining SNPs did not associate with lifespan despite apparently adequate power to detect any effect similar to that originally reported (Figure 2; Figure 2—source data 1).

Finally, we tested a deletion, d3-GHR, reported to affect male lifespan by 10 years when homozygous (Ben-Avraham et al., 2017), by converting its effect size to the one we would expect to observe when fitting an additive model. We used a SNP tagging the deletion and estimated the expected effect size in a linear regression for the (postulated) recessive effect across the three genotypes, given their frequencies (see Materials and methods). While this additive model reduces power relative to the correct model, our large sample size more than offsets the loss of power, and we find evidence that d3-GHR does not associate with lifespan with any (recessive or additive) effect similar to that originally reported (Figure 2; Figure 2—source data 1).
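The conversion from a recessive effect to the slope an additive model would estimate can be written as an ordinary least-squares slope of the genotype effect on allele count (0, 1, 2), with each genotype weighted by its frequency. A minimal sketch, assuming known genotype frequencies (Hardy-Weinberg proportions are used only for convenience; observed frequencies can be passed instead), perfect tagging of the deletion, and a hypothetical deletion frequency; the function names are illustrative. Under Hardy-Weinberg proportions with allele frequency q and a homozygote-only effect δ, the slope reduces to δ·q, because Cov(G, δ·1[G = 2]) = 2δq²(1 − q) and Var(G) = 2q(1 − q).

```python
def hwe_genotype_freqs(q):
    """Frequencies of 0, 1 and 2 copies of an allele with frequency q under Hardy-Weinberg equilibrium."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

def additive_slope(freqs, effects):
    """Least-squares slope of genotype effect on allele count (0, 1, 2),
    weighting each genotype by its frequency."""
    total = sum(freqs)
    mean_g = sum(f * g for g, f in enumerate(freqs)) / total
    mean_e = sum(f * e for f, e in zip(freqs, effects)) / total
    cov = sum(f * (g - mean_g) * (e - mean_e)
              for g, (f, e) in enumerate(zip(freqs, effects))) / total
    var = sum(f * (g - mean_g) ** 2 for g, f in enumerate(freqs)) / total
    return cov / var

# Recessive effect: +10 years only for two copies, with a hypothetical deletion frequency of 0.3
freqs = hwe_genotype_freqs(0.3)
print(additive_slope(freqs, [0.0, 0.0, 10.0]))   # approximately 10 * 0.3 = 3 under HWE
```

The additive slope is much smaller than the homozygote effect when the allele is uncommon, which is the power loss the paragraph above refers to.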

Mortality risk factor-informed GWAS (iGWAS)

We integrated 58 publicly available GWAS on mortality risk factors with our CES lifespan GWAS, creating Bayesian priors for each SNP effect based on causal effect estimates of 16 independent risk factors on lifespan. These included body mass index, blood biochemistry, CVD, type 2 diabetes, schizophrenia, multiple sclerosis, education levels, and smoking traits.
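The idea behind a risk factor-informed prior can be sketched in a few lines. This follows the general logic of McDaid et al. (2017) but is a simplification under stated assumptions, not the authors' implementation: the prior mean for a SNP's lifespan effect is taken as the sum, over risk factors, of (SNP effect on the risk factor) × (causal effect of the risk factor on lifespan); the prior variance `tau2` is a hypothetical free parameter; and the Bayes factor compares a normal alternative centred on that prior mean against a point null, given the observed estimate and its standard error.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def prior_mean(snp_on_risk_factors, causal_effects):
    """Expected SNP effect on lifespan: sum over risk factors of
    (SNP effect on risk factor) x (causal effect of risk factor on lifespan)."""
    return sum(a * g for a, g in zip(snp_on_risk_factors, causal_effects))

def bayes_factor(beta_hat, se, mu, tau2):
    """BF of H1 (true effect ~ N(mu, tau2)) versus H0 (true effect = 0),
    given an observed estimate beta_hat with standard error se.
    Under H1 the estimate is marginally N(mu, tau2 + se^2); under H0 it is N(0, se^2)."""
    return normal_pdf(beta_hat, mu, tau2 + se ** 2) / normal_pdf(beta_hat, 0.0, se ** 2)

# Hypothetical SNP: effects on two risk factors, and those factors' causal effects on lifespan
mu = prior_mean([0.10, -0.20], [0.5, 0.3])
print(mu, bayes_factor(beta_hat=-0.012, se=0.004, mu=mu, tau2=1e-5))
```

An observed effect that agrees in direction and size with what the risk factors predict earns a larger Bayes factor than the same estimate would under an uninformative prior, which is how the priors add power.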

The integrated analysis reveals an additional seven genome-wide significant associations with lifespan (Bayes Factor permutation p < 2.5 × 10−8), of which SNPs near TMEM18, GBX2/ASB18, IGF2R, POM121C, ZC3HC1, and ABO are reported at genome-wide significance for the first time (Figure 3; Table 2). A total of 82 independent SNPs associate with lifespan when allowing for a 1% false discovery rate (FDR) (Table 2—source data 2).
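The legend of Figure 3 notes that permutation P values have a floor set by the number of permutations. A generic sketch of an empirical P value, assuming null Bayes factors have already been generated by permutation (the permutation scheme itself is described in Materials and methods, not here) and using the common (count + 1)/(n + 1) convention, which may differ from the authors' choice. Under this convention, a floor below 2.5 × 10−8 needs on the order of 4 × 10^7 permutations.

```python
def empirical_p(observed, null_stats):
    """Empirical (permutation) P value for a statistic where larger means stronger evidence.

    Uses the common (count + 1) / (n + 1) convention, so the smallest attainable value
    is 1 / (n + 1): P values are floored by the number of permutations.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)

# Hypothetical Bayes factors obtained under permuted (null) priors
null_bfs = [0.5, 1.2, 3.0, 0.9]
print(empirical_p(2.0, null_bfs))      # one null BF exceeds 2.0
print(empirical_p(100.0, null_bfs))    # none exceed: the floor 1 / (4 + 1)
```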

Figure 3
SNP associations with lifespan across both parents when taking into account prior information on mortality risk factors.

Bayesian iGWAS was performed using observed associations from the lifespan GWAS and priors based on 16 traits selected by an AIC-based stepwise model. As the P values were assigned empirically using a permutation approach, the minimum P value is limited by the number of permutations; SNPs reaching this limit are represented by triangles. Annotated are the gene, cluster of genes, or cytogenetic band in close proximity to the top SNP. The red line represents the genome-wide significance threshold (p = 2.5 × 10−8). The blue line represents the 1% FDR threshold. Figure 3—figure supplement 1 shows the associations of each genome-wide significant SNP with the 16 risk factors.

https://doi.org/10.7554/eLife.39856.011
Table 2
Bayesian GWAS using mortality risk factors reveals seven additional genome-wide significant variants.

At or near – Gene or set of genes nearest to the index SNP; rsID – The index SNP with the lowest P value in the risk factor-informed analysis; Chr – Chromosome; Position – Base-pair position on chromosome (GRCh37); A1 – the effect allele, increasing lifespan; Freq1 – Frequency of the A1 allele; Years1 – Years of life gained for carrying one copy of the A1 allele; SE – Standard Error; CES P – the P value for the Wald test of association between imputed dosage and Cox model residual, under the assumption of common effects between sexes; Risk – mortality risk factors associated with the variant (p < 3.81 × 10−5, accounting for 82 independent SNPs and 16 independent factors); BF P – Empirical P value derived from permuting Bayes Factors. See Table 2—source data 1 for the causal estimate of each risk factor. See Table 2—source data 2 for all SNPs significant at FDR < 1%.

https://doi.org/10.7554/eLife.39856.013
At or near    rsID         Chr  Position   A1  Freq1  Years1  SE      CES P    Risk           BF P
CELSR2/PSRC1  rs4970836    1    109821797  G   0.23   0.2234  0.0463  1.4E-06  LDL, HDL, CAD  1.6E-09
TMEM18        rs6744653    2    628524     A   0.17   0.2772  0.0511  5.8E-08  BMI            7.0E-10
GBX2/ASB18    rs10211471   2    237081854  C   0.80   0.2401  0.0493  1.1E-06  Education      2.3E-08
IGF2R         rs111333005  6    160487196  G   0.98   0.8665  0.1577  3.9E-08  LDL, CAD       6.6E-09
POM121C       rs113160991  7    75094329   G   0.78   0.2541  0.0495  2.8E-07  BMI, Insulin   7.5E-09
ZC3HC1        rs56179563   7    129685597  A   0.39   0.2107  0.0406  2.1E-07  CAD            5.6E-09
ABO           rs2519093    9    136141870  C   0.81   0.2244  0.0497  6.3E-06  LDL, CAD       1.9E-08

As has become increasingly common (Pilling et al., 2017), we attempted to replicate our genome-wide significant findings collectively, rather than individually. This is usually done by constructing polygenic risk scores from genotypic information in an independent cohort and testing for association with the trait of interest subject-by-subject. We used publicly available summary statistics on extreme longevity as an independent replication dataset (Broer et al., 2015; Deelen et al., 2014). Lacking individual-level data from these studies, we instead calculated the collective effect of lifespan SNPs on longevity with the inverse-variance weighted estimator used in two-sample Mendelian randomisation (MR) on summary statistics (Hemani et al., 2018), which gives results equivalent to the polygenic score approach. Before doing this, we converted all effects observed in the external longevity studies to hazard ratios, using the APOE variant effect size as an empirical conversion factor, so that the longevity studies could be meta-analysed despite their different study designs (and adjusted for sample overlap; see Materials and methods).
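A minimal sketch of the collective replication, under simplifying assumptions rather than the exact procedure in Materials and methods: longevity effects are put on the lifespan scale by a single factor chosen so that the APOE variant has a replication-to-discovery ratio of exactly 1; each per-SNP ratio alpha = replication / discovery gets a first-order standard error that ignores uncertainty in the discovery estimate (the legend of Figure 4 states that the published CIs also reflect the denominator); and the ratios are combined by inverse-variance weighting. All inputs and function names are hypothetical.

```python
import math

def apoe_scale(b_disc_apoe, b_rep_apoe):
    """Factor putting replication (longevity) effects on the discovery (lifespan) scale,
    chosen so that the APOE variant has a replication/discovery ratio of exactly 1."""
    return b_disc_apoe / b_rep_apoe

def ivw_ratio(b_disc, b_rep, se_rep, scale=1.0):
    """Inverse-variance weighted mean of per-SNP ratios alpha_i = scale * b_rep_i / b_disc_i.

    Each ratio gets the first-order SE |scale| * se_rep_i / |b_disc_i|, i.e. discovery
    estimates are treated as fixed. The weighted mean then equals `scale` times the IVW
    MR estimate sum(b_disc * b_rep / se_rep^2) / sum(b_disc^2 / se_rep^2).
    Returns (alpha, se, one-sided P for alpha > 0).
    """
    num = den = 0.0
    for bd, br, sr in zip(b_disc, b_rep, se_rep):
        alpha_i = scale * br / bd
        se_i = abs(scale) * sr / abs(bd)
        w = 1.0 / se_i ** 2
        num += w * alpha_i
        den += w
    alpha = num / den
    se = math.sqrt(1.0 / den)
    p_one_sided = 0.5 * math.erfc(alpha / se / math.sqrt(2))
    return alpha, se, p_one_sided

# Hypothetical APOE effects set the scale; three other lifespan SNPs form the meta-analysis
k = apoe_scale(b_disc_apoe=0.10, b_rep_apoe=0.50)
print(ivw_ratio([0.025, 0.020, 0.030], [0.06, 0.04, 0.05], [0.02, 0.02, 0.02], scale=k))
```

Because APOE defines the scale, it is excluded from the meta-analysed ratio; a pooled alpha between 0 and 1 then reads as "these SNPs affect longevity, but less so than APOE relative to their lifespan effect".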

Although the focus is on collective replication, our method has the advantage of transparency at an individual variant level, which is of particular importance for researchers seeking to follow up individual loci. Remarkably, all lead lifespan variants show directional consistency with the independent longevity sample, and 4 SNPs or close proxies (r2 > 0.8) reach nominal replication (p < 0.05, one-sided test) (Figure 4—source data 1). Of these, SNPs near ABO, ZC3HC1, and IGF2R are replicated for the first time, and thus appear to affect both overall survival and survival to extreme age. The overall ratio of replication effect sizes to discovery effect sizes – excluding APOE – is 0.42 (95% CI 0.23–0.61; p = 1.35 × 10−5). The fact that this ratio is significantly greater than zero indicates that most lifespan SNPs are indeed longevity SNPs. However, the fact that most SNPs have a ratio smaller than one indicates they may affect early mortality more than survival to extreme age, relative to APOE (which itself has a greater effect on late-life mortality than early mortality) (Figure 4).

Collective replication of individual lifespan SNPs using GWAMAs for extreme long-livedness shows directional consistency in all cases.

Forest plot of effect size ratios between genome-wide significant lifespan variants from our study and external longevity studies (Broer et al., 2015; Deelen et al., 2014), having converted longevity effect sizes to our scale using APOE as benchmark (see Materials and methods and Figure 4—source data 1). Alpha – ratio of replication to discovery effect sizes on the common scale and 95% CI (reflecting uncertainty in the numerator and denominator; P values are for one-sided test). A true (rather than estimated) ratio of 1 indicates the relationship between SNP effect on lifetime hazard and extreme longevity is the same as that of APOE, while a ratio of zero suggests no effect on longevity. A true ratio between 0 and 1 suggests a stronger effect on lifetime hazard than longevity relative to APOE. SNPs whose CIs overlap both 0 and 1 are individually underpowered. The inverse variance meta-analysis of alpha over all SNPs, excluding APOE, is 0.42 (95% CI 0.23 to 0.61; one-sided p = 1.35 × 10−5 for H0: alpha = 0).

https://doi.org/10.7554/eLife.39856.016

Sex- and age-specific effects

We stratified our UK Biobank sample (for which we had individual level data) by sex and age bands to identify sex- and age-specific effects for survival SNPs discovered and/or replicated in this study. Although power was limited, as we sought contrasts in small effect sizes, we find 5 SNPs with differential effects on lifespan when stratified (FDR 5% across the 24 variants considered).

The effect of the APOE variant increases with age: the ε4 log hazard ratio in individuals older than 70 years is around three times that in individuals aged 40–70. In contrast, the effects of lead variants near CHRNA3/5, CDKN2B-AS1, and ABO tend to decline after age 60, at least when expressed as hazard ratios (Figure 5A).

Age- and sex-specific effects on parent survival for 5 variants showing age- or sex-specificity of effect size at 5% FDR, out of 23 lifespan-increasing variants.

(A) Variants showing age-specific effects; (B) Variants showing sex-specific effects. Panel titles show the gene, cluster of genes, or cytogenetic band in close proximity to the index lifespan variant, with this variant and lifespan-increasing allele in parentheses. Beta – loge(protection ratio) for 1 copy of effect allele in self in the age band (i.e. 2 × observed due to 50% kinship). Note the varying scale of y-axis across panels. Age range: the range of ages over which beta was estimated. Sex p – nominal P value for association of effect size with sex. Age p – nominal P value for association of effect size with age.

https://doi.org/10.7554/eLife.39856.018

Independent of age, lead variants near APOE and PSORS1C3 also show effects (lnHR) 0.036 and 0.038 greater in women than in men, respectively (95% CI 0.013–0.059 and 0.019–0.056) (Figure 5B). Notably, the SNP near ZW10, which was identified by Pilling et al. (2017) in fathers and which replicated in LifeGen fathers, may affect men and women equally (95% CI of years gained per effect allele: men 0.17–0.42, women 0.04–0.31), as measured in our meta-analysis of UK Biobank and LifeGen.
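A reader can roughly check such sex contrasts from the published intervals alone. A minimal sketch, assuming symmetric 95% CIs (so the midpoint is the estimate and the half-width is 1.96 SE) and independent male and female estimates; the latter is only approximate, since fathers and mothers share the same genotyped children. The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)   # about 1.96

def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z95)

def difference_test(b1, se1, b2, se2):
    """Two-sided z-test of b1 - b2, assuming the two estimates are independent."""
    diff = b1 - b2
    se = math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(diff / se) / math.sqrt(2))
    return diff, se, p

# ZW10 intervals quoted in the text (years gained per effect allele); midpoints as point estimates
men_lo, men_hi = 0.17, 0.42
women_lo, women_hi = 0.04, 0.31
print(difference_test((men_lo + men_hi) / 2, se_from_ci(men_lo, men_hi),
                      (women_lo + women_hi) / 2, se_from_ci(women_lo, women_hi)))
```

For ZW10 this gives a men-minus-women difference of 0.12 years with a P value of about 0.2, consistent with (though not proof of) equal effects in both sexes.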

Causal genes and methylation sites

We used SMR-HEIDI to look for causal effects of gene expression or changes in methylation on lifespan within the 24 loci discovered or replicated in our study. Using blood eQTL summary statistics from two studies (Westra et al., 2013; Lloyd-Jones et al., 2017), we suggest causal roles for expression of PSRC1, SESN1, SH2B3, PSMA4, FURIN, FES, and KANK2 at 5% FDR (Supplementary file 1). GTEx tissue-wide expression data suggests further roles for 16 genes across 24 tissues, especially FES (nine tissues), PMS2P3 (six tissues) and PSORS1C1 (four tissues). Methylation data reveals roles for 44 CpG sites near nine loci, especially near the PSORS1C3 locus (21 sites), APOE locus (nine sites), and HLA-DQA1 locus (four sites) (Supplementary file 2).
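The SMR step uses the top cis-QTL as an instrument. In its summary-data form, as we understand the usual approximation, the causal estimate is the ratio b_xy = b_zy / b_zx and the test statistic is T = z_zy² z_zx² / (z_zy² + z_zx²), compared with a chi-square distribution on 1 df. The sketch below covers only that step; the HEIDI heterogeneity test, which needs multiple SNPs and their LD, is not shown, and all inputs are hypothetical.

```python
import math

def smr(b_zx, se_zx, b_zy, se_zy):
    """Summary-data Mendelian randomisation for one top cis-eQTL (or mQTL).

    b_zx: SNP effect on expression or methylation; b_zy: SNP effect on lifespan.
    Returns the causal estimate b_xy, the approximate T_SMR statistic and its P value.
    """
    z_zx, z_zy = b_zx / se_zx, b_zy / se_zy
    b_xy = b_zy / b_zx
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    p = math.erfc(math.sqrt(t_smr / 2))   # chi-square (1 df) upper tail = P(|Z| > sqrt(T))
    return b_xy, t_smr, p

# Hypothetical eQTL: strong effect on expression, moderate association with lifespan
print(smr(b_zx=0.5, se_zx=0.1, b_zy=0.2, se_zy=0.05))
```

T never exceeds either z² on its own, so a gene is only flagged when the instrument is strong in both the QTL and the lifespan data.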

We next used SOJO to perform conditional analysis on the same loci to find additional independent variants associated with lifespan. We find substantial allelic heterogeneity in several association intervals and identify an additional 335 variants, which increase out-of-sample explained variance from 0.095% to 0.169% (78% increase). CELSR2/PSRC1, KCNK3, HLA-DQA1, LPA, ZW10, FURIN/FES, and APOE are amongst the most heterogeneous loci with at least 25 variants per locus showing independent effects (Supplementary file 3).

Disease and lifespan

We next sought to understand the link between our lifespan variants and disease. We looked up known associations with our top hits and proxies (r2 > 0.6) in the GWAS catalog (MacArthur et al., 2017) and PhenoScanner (Staley et al., 2016), excluding loci identified in iGWAS as these used disease associations to build the effect priors. We also excluded trait associations discovered solely in UK Biobank, as the overlap with our sample could result in spurious association due to correlations between morbidity and mortality. Under these restrictions, we find alleles which increase lifespan associate with a reduction in cardiometabolic, autoimmune, smoking-related, and neuropsychiatric disease and their disease risk factors (Table 1, Table 1—source data 3). None of the loci show any association with cancer other than lung cancer.

We then looked up associations of the 81 iGWAS SNPs (1% FDR) with the risk factor GWAMAs used to inform the prior. While associations are a priori limited to the risk factors included in the iGWAS, the pattern of association is still of interest. We find that loci cluster strongly on either blood lipids or CVD, moderately on metabolic and neurological traits, and weakly but with high pleiotropy on most of the remaining traits (see Figure 3—figure supplement 1 for clustering of genome-wide significant SNPs).

In order to study the relative contribution of diseases to lifespan, we approached the question from the other end: we looked up known associations for disease categories with large numbers of hits (>20 associations in each category: CVD, type 2 diabetes, neurological disease, smoking-related traits, and cancers) in the GWAS catalog (MacArthur et al., 2017) and used our GWAS to test whether the disease loci associate with lifespan. Our measure was the lifespan variance explained by the locus (LVE, in years² [Ljungquist et al., 1998]), which balances effect size against frequency, is proportional to the selection response and the GWAS test statistic, and is therefore monotonically related to the risk of a false positive lifespan association. We ordered each independent disease variant by LVE, excluding any secondary disease where the locus was pleiotropic.
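LVE is described here only in words. One standard quantity in years² that balances effect size against allele frequency, and to which the GWAS chi-square statistic is proportional under an additive model, is 2p(1 − p)β², with β in years per allele and p the allele frequency. The sketch below assumes that form (the exact definition is in Materials and methods) and uses hypothetical disease SNPs to show the ranking and per-category running totals behind cumulative LVE curves.

```python
from collections import defaultdict

def lve(beta_years, freq):
    """Lifespan variance explained (years^2) by one biallelic SNP, additive model: 2p(1-p)beta^2."""
    return 2 * freq * (1 - freq) * beta_years ** 2

def cumulative_lve(snps):
    """Rank SNPs by LVE (largest first) and track the running LVE total per disease category."""
    ranked = sorted(snps, key=lambda s: lve(s["beta"], s["freq"]), reverse=True)
    totals = defaultdict(float)
    running = []
    for s in ranked:
        totals[s["category"]] += lve(s["beta"], s["freq"])
        running.append((s["name"], dict(totals)))
    return running

# Hypothetical disease SNPs (beta in years of life per disease-protective allele)
snps = [
    {"name": "snp_a", "category": "CVD", "beta": 0.30, "freq": 0.50},
    {"name": "snp_b", "category": "Smoking/lung cancer", "beta": 0.40, "freq": 0.35},
    {"name": "snp_c", "category": "CVD", "beta": 0.20, "freq": 0.20},
]
for name, totals in cumulative_lve(snps):
    print(name, totals)
```

Because LVE grows with both frequency and squared effect size, a common variant of modest effect can outrank a rarer variant with a larger per-allele effect.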

The Alzheimer’s disease locus APOE shows the largest LVE (0.23 years²), consistent with its being the most frequently discovered lifespan locus in GWAS (Joshi et al., 2016; Pilling et al., 2017; Deelen et al., 2014; Deelen et al., 2013). Of the 20 largest LVE SNPs, 12 and 4 associate with CVD and smoking/lung cancer, respectively, while only two associate with other cancers (near ZW10 and NRG1; neither in the top 15 LVE SNPs). Cumulatively, the top 20/45 LVE SNPs explain 0.33/0.43 years² through CVD, 0.13/0.15 years² through smoking and lung cancer, and 0.03/0.11 years² through other cancers (Figure 6).

Disease loci explaining the most lifespan variance are protective for neurological disease, cardiovascular disease, and lung cancer.

SNPs reported as genome-wide significant for disease in European population studies, ordered by their lifespan variance explained (LVE), show the cumulative effect of disease SNPs on variation in lifespan. An FDR cut-off of 1.55% is applied simultaneously across all diseases, allowing for one false positive association with lifespan among the 45 independent loci. Note the log scale on the X axis. Cardiovascular disease – SNPs associated with cardiovascular disease or myocardial infarction. Alzheimer's/Parkinson's – SNPs associated with Alzheimer’s disease or Parkinson’s disease. Smoking/lung cancer – SNPs associated with smoking behaviour, chronic obstructive pulmonary disease and lung adenocarcinomas. Other cancers – SNPs associated with cancers other than lung cancer (see Figure 7—source data 1 for a full list). Type 2 diabetes – SNPs associated with type 2 diabetes.

https://doi.org/10.7554/eLife.39856.021

Strikingly, two of the three largest LVE loci for non-lung cancers (at or near ATXN2/BRAP and CDKN2B-AS1) show increased cancer protection associating with decreased lifespan (due to antagonistic pleiotropy with CVD), while the third (at or near MAGI3) also shows evidence of pleiotropy, with an association with CVD three times as strong as its association with breast cancer, and in the same direction. In addition, 6 out of the 11 remaining cancer-protective loci which increase lifespan and pass FDR (near ZW10, NRG1, C6orf106, HNF1A, C20orf187, and ABO) also show significant associations with CVD but could not be tested for pleiotropy, as we did not have data on the relative strength of association of every type of cancer against CVD, and thus (conservatively from the point of view of our conclusion) remain counted as cancer SNPs (Figure 7, Figure 7—source data 1). Visual inspection also reveals an interesting pattern in the SNPs that did not pass FDR correction for affecting lifespan: cardio-protective variants associate almost exclusively with increased lifespan, while cancer-protective variants appear to associate with lifespan in either direction (grey dots often appear below the x-axis for other cancers).

Lifespan variance explained by individual genome-wide significant disease SNPs within disease categories.

Genome-wide significant disease SNPs from the GWAS catalog are plotted against the amount of lifespan variance explained (LVE), with disease-protective alleles signed positively when increasing lifespan and signed negatively when decreasing lifespan. SNPs with limited evidence of an effect on lifespan are greyed out: an FDR cut-off of 1.55% is applied simultaneously across all diseases, allowing for one false positive among all significant SNPs. Secondary pleiotropic SNPs (i.e. those associating more strongly with another one of the diseases, as assessed by PheWAS in UK Biobank) are coloured to indicate the main effect on increased lifespan seems to arise elsewhere. Of these, turquoise SNPs show one or more alternative disease associations in the same direction and at least twice as strong (double Z statistic – see Detailed Materials and methods) as the principal disease, while brown SNPs show one or more significant associations with alternative disease in the opposite direction that explains the negative association of the disease-protective SNP with lifespan. The variance explained by all SNPs in black is summed (∑LVE) by disease. Annotated are the gene, cluster of genes, or cytogenetic band near the lead SNPs. The Y axis has been capped to aid legibility of SNPs with smaller LVE: SNPs near APOE pass this cap and are represented by triangles. See Figure 7—source data 1 for the full list of disease SNP associations.

https://doi.org/10.7554/eLife.39856.022

Together, the disease loci included in our study with significant effects on lifespan explain 0.95 years², or less than 1% of the phenotypic variance of lifespan of European parents in UK Biobank (123 years²), and around 5% of the heritability.

Cell type and pathway enrichment

We used stratified LD-score regression to assess whether cell type-specific regions of the genome are enriched for lifespan variants. As this method derives its power from SNP heritability, we limited the analysis to genomically British individuals in UK Biobank, the sample with the lowest heterogeneity and the highest SNP heritability. At an FDR < 5%, we find enrichment in SNP heritability in five categories: two histone and two chromatin marks linked to male and female foetal brain cells, and one histone mark linked to the dorsolateral prefrontal cortex (DLPC) of the brain. Although we also tested other cell types, such as heart, liver, and immune cells, no other categories are statistically significant after multiple testing correction (Supplementary file 4).

We also determined which biological pathways could explain the associations between our genetic variants and lifespan using three different methods, VEGAS, PASCAL, and DEPICT. VEGAS highlights 33 gene sets at an FDR < 5%, but neither PASCAL nor DEPICT (with SNP thresholds at p < 5 × 10⁻⁸ and p < 1 × 10⁻⁵) identify any gene sets passing multiple testing correction. The 33 gene sets highlighted by VEGAS are principally for blood lipid metabolism (21), with the majority involving lipoproteins (14) or homeostasis (4). Other noteworthy gene sets are neurological structure and function (5) and vesicle-mediated transport (3). Enrichment was also found for organic hydroxy compound transport, macromolecular complex remodelling, signalling events mediated by stem cell factor receptor (c-kit), and regulation of amyloid precursor protein catabolism (Supplementary file 5).

Finally, we performed an analysis to assess whether genes that have been shown to change their expression with age (Peters et al., 2015) are likely to have a causal effect on lifespan itself. Starting with a set of independent SNPs affecting gene expression (eQTLs), we created categories based on whether gene expression was age-dependent and whether the SNP was associated with lifespan in our study (at varying levels of significance). We find eQTLs associated with lifespan are 1.69 to 3.39 times more likely to have age-dependent gene expression, depending on the P value threshold used to define the set of lifespan SNPs (Supplementary file 6).

Out-of-sample lifespan PRS associations

We calculated polygenic risk scores (PRS) for lifespan for two subsamples of UK Biobank (Scottish individuals and a random selection of English/Welsh individuals), and one sample from the Estonian Biobank. The PRS were based on (recalculated) lifespan GWAS summary statistics that excluded these samples to ensure independence between training and testing datasets.

When including all independent markers, we find an increase of one standard deviation in PRS increases lifespan by 0.8 to 1.1 years, after doubling observed parent effect sizes to compensate for the imputation of their genotypes (see Table 3—source data 1 for a comparison of performance of different PRS thresholds).
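The rescaling behind the per-SD figures can be sketched as follows. This is an illustration, not the authors' code: it assumes observed parent estimates are simply doubled (for both the estimate and its SE, as the Table 3 caption states) and that years of life are approximately 10 × log protection ratio, which is the ratio seen in every row of Table 3. The input values are hypothetical, chosen to reproduce the Scottish-parent row.

```python
import math

def per_sd_effects(observed_beta, observed_se, is_parent):
    """Rescale a per-SD PRS effect (log protection ratio) and express it in years.

    Illustrative assumptions, not the authors' code:
    - parent estimates are doubled, because the offspring carries only one of
      the parent's two alleles, halving the expected observed effect;
    - years of life ~= 10 x log protection ratio, the ratio seen in every
      row of Table 3 (e.g. Beta 0.107 -> 1.07 years).
    """
    factor = 2.0 if is_parent else 1.0
    beta = factor * observed_beta       # log protection ratio per PRS SD
    se = factor * observed_se           # SE is doubled alongside the estimate
    years = 10.0 * beta                 # rule-of-thumb conversion to years of life
    hazard_ratio = math.exp(-beta)      # protection ratio is the inverse hazard ratio
    return beta, se, years, hazard_ratio

# Hypothetical observed parent estimate
beta, se, years, hr = per_sd_effects(0.0535, 0.0055, is_parent=True)
print(round(beta, 3), round(se, 3), round(years, 2), round(hr, 3))
```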

Table 3
Polygenic scores for lifespan associate with out-of-sample parent and subject lifespans.

A polygenic risk score (PRS) was made for each subject using GWAS results that did not include the subject sets under consideration. Subject or parent survival information (age at entry, age at exit and age at death, if applicable) was used to test the association between polygenic risk score and survival as (a) a continuous score and (b) by dichotomising the top and bottom decile scores. Population – Population sample of test dataset, where E and W is England and Wales; Kin – Individuals tested for association with polygenic score; N – Number of lives used for analysis; Deaths – Number of deaths; Beta – Effect size per PRS standard deviation, in ln(protection ratio), doubled in parents to reflect the expected effect in cohort subjects; SE – Standard error, doubled in parents to reflect the expected error in cohort subjects; Years – Estimated years of life gained per PRS standard deviation; P – P value of two-sided test of association; Contrast age at death – difference between the median lifespan of individuals in the top and bottom deciles of the score, in years of life (observed parent contrast is again doubled to account for imputation of their genotypes).

https://doi.org/10.7554/eLife.39856.024
Sample descriptives: Population, Kin, N, Deaths. Effect of polygenic score (per standard deviation): Beta, SE, Years, P. Contrast age at death (top vs. bottom 10%, years): Men, Women.

Population  Kin       N       Deaths  Beta   SE     Years  P        Men  Women
Scotland    Parents   46,936  33,196  0.107  0.011  1.07   4.2E-22  5.6  5.6
Scotland    Subjects  24,059     941  0.085  0.033  0.85   1.0E-02  -    -
E and W     Parents   58,070  39,347  0.133  0.010  1.33   7.3E-39  6.4  4.8
E and W     Subjects  29,815     760  0.098  0.037  0.98   7.1E-03  -    -
Estonia     Parents   61,728  29,660  0.099  0.012  0.99   2.5E-17  3.0  2.8
Estonia     Subjects  24,800   2,894  0.087  0.019  0.87   2.6E-06  3.5  2.7

Correspondingly – again after doubling for parental imputation – we find a difference in median survival between the top and bottom deciles of PRS of 5.6/5.6 years for Scottish fathers/mothers, 6.4/4.8 for English and Welsh fathers/mothers and 3.0/2.8 for Estonian fathers/mothers. In the Estonian Biobank, where data are available for a wider range of subject ages (i.e. beyond median survival age), we find a contrast of 3.5/2.7 years in survival for male/female subjects between the top and bottom PRS deciles (Table 3, Figure 8).
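The decile contrasts are differences in Kaplan-Meier median survival. A minimal sketch, assuming ages at entry and exit are available per person (as the Table 3 caption describes) and that delayed entry is handled by leaving people out of the risk set before their entry age; the function names and toy data are hypothetical and this is not the authors' pipeline.

```python
def km_median(entry_ages, exit_ages, died):
    """Kaplan-Meier median age at death, allowing delayed entry (left truncation).

    entry_ages[i]: age at which individual i comes under observation
    exit_ages[i]:  age at death or at last observation (assumed > entry age)
    died[i]:       True if exit_ages[i] is an age at death, False if censored
    Returns None if the survival curve never drops to 0.5.
    """
    event_ages = sorted({t for t, d in zip(exit_ages, died) if d})
    survival = 1.0
    for t in event_ages:
        # Risk set: entered before t and still under observation at t
        at_risk = sum(1 for a, b in zip(entry_ages, exit_ages) if a < t <= b)
        deaths = sum(1 for b, d in zip(exit_ages, died) if d and b == t)
        survival *= 1.0 - deaths / at_risk
        if survival <= 0.5:
            return t
    return None


def decile_contrast(top, bottom, is_parent):
    """Difference in KM median survival, top minus bottom PRS decile.

    top/bottom are (entry_ages, exit_ages, died) tuples. Parent contrasts are
    doubled, as in Table 3, to offset the dilution from imputing parental
    genotypes. Assumes both medians are reached.
    """
    diff = km_median(*top) - km_median(*bottom)
    return 2 * diff if is_parent else diff


# Hypothetical toy data: everyone observed from birth (entry age 0)
top = ([0, 0, 0, 0], [70, 80, 90, 100], [True, True, True, True])
bottom = ([0, 0, 0, 0], [60, 70, 80, 90], [True, True, True, True])
print(decile_contrast(top, bottom, is_parent=False))
```

The doubling is applied to the contrast rather than to individual ages, mirroring the caption's statement that the observed parent contrast is doubled.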

Figure 8 (with 1 supplement)
Survival curves for highest and lowest deciles of lifespan polygenic risk score.

A polygenic risk score was made for each subject using GWAS results that did not include the subject sets under consideration. Subject or parent survival information (age at entry, age at exit and age at death, if applicable) was used to create Kaplan-Meier curves for the top and bottom deciles of score. In this figure only, no adjustment has been made for the dilution of observed effects due to imputing parent genotypes from cohort subjects. Had parent genotypes been used, effect sizes in parents would be expected to be twice those shown. E and W – England and Wales; PRS – polygenic risk score.

https://doi.org/10.7554/eLife.39856.026

Finally, as we did for individual variants, we looked at the age- and sex-specific nature of the PRS on parental lifespan and then tested for associations with (self-reported) age-related diseases in subjects and their kin. We find a high PRS has a larger protective effect on lifespan for mothers than fathers in UK Biobank subsamples (p = 0.0071), and has a larger protective effect on lifespan in younger age bands (p = 0.0001) (Figure 9), although in both cases, it should be borne in mind that women and younger people have a lower baseline hazard, so a greater improvement in hazard ratio does not necessarily mean a larger absolute protection.

Figure 9 (with 1 supplement)
Sex and age specific effects of polygenic survival score (PRS) on parental lifespan in UK Biobank.

The effect of out-of-sample PRS on parental lifespan stratified by sex and age was estimated for Scottish and English/Welsh subsamples individually (see Figure 9—figure supplement 1) and subsequently meta-analysed. The estimate for the PRS on father lifespan in the highest age range has very wide confidence intervals (CI) due to the limited number of fathers surviving past 90 years of age. The beta 95% CI for this estimate is –0.15 to 0.57. Beta – ln(protection ratio) per standard deviation of PRS for increased lifespan in self, within the age band (i.e. 2 × observed, due to 50% kinship); bounds shown are 95% CI; Age range – the range of ages over which beta was estimated; sex p – P value for association of effect size with sex; age p – P value for association of effect size with age.

https://doi.org/10.7554/eLife.39856.028

We find that, overall, a higher PRS (i.e. genetically longer life) is associated with less heart disease, diabetes, hypertension, respiratory disease and lung cancer, but with increased prevalence of Alzheimer’s disease, Parkinson’s disease, prostate cancer and breast cancer, the last three primarily in parents. We find no association between the score and prevalence of cancer in subjects (Figure 10).

Figure 10 (with 1 supplement)
Associations between polygenic lifespan score and diseases of UK Biobank subjects and their kin.

Logistic regression was performed on standardised polygenic survival score (all variants) and 21 disease traits reported by 24,059 Scottish and 29,815 English/Welsh out-of-sample individuals about themselves and their kin. For grouping of UK Biobank disease codes, see Figure 10—source data 1. Displayed here are inverse-variance meta-analysed estimates of the diseases for which multiple sources of data were available (i.e. parents and/or siblings; see Figure 10—figure supplement 1 for all associations). ‘Cancer’ is only in subjects, whilst the specific subtypes are analysed for kin. The left panel shows disease estimates for each kin separately; the right panel shows the combined estimate, with standard errors adjusted for correlation between family members. Diseases have been ordered by magnitude of effect size (combined estimate). Beta – log odds reduction ratio of disease per standard deviation of polygenic survival score, where a negative beta indicates a deleterious effect of score on disease prevalence (lifetime so far), and positive beta indicates a protective effect on disease. Effect sizes for first degree relatives have been doubled. Cancer – Binary cancer phenotype (any cancer, yes/no).

https://doi.org/10.7554/eLife.39856.031

Discussion

Applying the kin-cohort method in a GWAS and mortality risk factor iGWAS across UK Biobank and the LifeGen meta-analysis, we identified 11 novel genome-wide significant associations with lifespan and replicated six previously discovered loci. We also replicated long-standing longevity SNPs near APOE, FOXO3, and 5q33.3/EBF1 – albeit with smaller effect sizes in the latter two cases – but found evidence of no association (at effect sizes originally published) with lifespan for more recently published longevity SNPs near IL6, ANKRD20A9P, USP42, and TMTC2. Conversely, all individual variants identified in our analyses showed directionally consistent effects in a meta-analysis of two European-ancestry studies of extreme longevity, and a test of association of a polygenic risk score of the variants was highly significant in the longevity dataset (p < 1.5 × 10⁻⁵).

Our findings validate the results of a previous Bayesian analysis performed on a subset (N = 116,279) of the present study’s discovery sample (McDaid et al., 2017), which highlighted two loci which are now genome-wide significant in conventional GWAS in the present study’s larger sample. iGWAS thus appears to be an effective method able to identify lifespan-associated variants in smaller samples than standard GWAS, albeit relying on known biology.

With the curious exception of a locus near HTT (the Huntington's disease gene), all lead SNPs are known to associate with autoimmune, cardiometabolic, neuropsychiatric, or smoking-related disease, and it is plausible these are the major pathways through which the variants affect lifespan. Whole-genome polygenic risk scores showed similar associations with disease, excluding late-onset disorders such as Alzheimer’s and Parkinson’s, where polygenic risk scores for extended lifespan increased risk (of survival to age at onset) of the disease.

Genetic variants affecting lifespan were enriched for pathways involving the transport, homeostasis and metabolism of lipoprotein particles, validating previous reports (McDaid et al., 2017). We also identified new pathways including vesicle transport, metabolism of acylglycerol and sterols, and synaptic and dendritic function. We discovered that genomic regions with epigenetic marks determining cell differentiation into foetal brain and DLPC cells were enriched for genetic variants affecting lifespan. Finally, we showed that we can use our GWAS results to construct a polygenic risk score, which makes 3- to 5-year distinctions in life expectancy at birth between individuals from the score’s top and bottom deciles.

Despite studying over 1 million lives, our standard GWAS only identified 12 variants influencing lifespan at genome-wide significance. This contrasts with height (another highly polygenic trait), where a study of around 250,000 individuals (Wood et al., 2014) found 423 loci. This difference can partly be explained by the much lower heritability of lifespan (0.12; Kaplanis et al., 2018) (cf. 0.8 for height [Wood et al., 2014]), consistent with evolution having a stronger influence on the total heritability of traits more closely related to fitness and limiting effect sizes. In addition, the use of indirect genotypes (the kin-cohort method) reduces the effective sample size to 1/4 for the parent-offspring design.

When considering these limitations, we calculate our study was equal in power to a height study of only around 23,224 individuals, were lifespan to have a similar genetic architecture to height (see Materials and methods). Under this assumption, we would require a sample size of around 10 million parents (or equivalently 445,000 nonagenarian cases, with even more controls) to detect a similar number of loci as Wood et al. At the same time, our inability to replicate several previous borderline significant longevity and lifespan findings suggests research into survival in general requires substantial increases in power to robustly identify loci.

Meta-analysis of mothers and fathers, permitting common or sex-specific effect sizes, of course, doubled effective sample size, with slight attenuation to reflect the observed correlation (~10%) between father and mother traits (consistent with previous studies [Kaplanis et al., 2018]). This correlation indicates the presence of assortative mating on traits which correlate with lifespan (as lifespan itself is of course not observed until later), or post-pairing environmental convergence. We note that in principle, assortative mating could lead to allelic correlations at causal loci for the contributing traits, causing departures from Hardy-Weinberg equilibrium, and increasing the genotypic variance and thus power to detect association. However, in practice, at least for lifespan, these effects are too small to be material.
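The "doubled, with slight attenuation" statement can be made concrete with a simplified calculation. The sketch assumes (for illustration only) that the father and mother traits have unit residual variance, residual correlation r, and the same per-allele effect; testing their sum then gains a factor of 2/(1 + r) in non-centrality over a single parent.

```python
def summed_trait_power_gain(r):
    """Power of testing father + mother residuals, relative to one parent alone.

    Simplifying assumptions (illustrative only): both parents' traits have unit
    residual variance, residual correlation r, and the same per-allele effect
    beta. The summed trait then has effect 2*beta and variance 2 + 2r, so its
    non-centrality (2*beta)**2 / (2 + 2r) equals 2 / (1 + r) times that of a
    single parent's beta**2 / 1.
    """
    return 2.0 / (1.0 + r)

print(summed_trait_power_gain(0.0))  # uncorrelated parents: exactly double
print(summed_trait_power_gain(0.1))  # ~10% correlation: about 1.82
```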

The association of lifespan variants with well-known, life-shortening diseases (cardiovascular, autoimmune, smoking-related diseases and lung cancer; Mathers et al., 2018) is not surprising, but the paucity of associations with other forms of cancer – without pleiotropic effects on CVD – is. This paucity suggests cancer deaths may often be due to (perhaps many) rarer variants or environmental exposures, although effect sizes might simply be slightly below our cut-off threshold to detect. Disappointingly, the variants and pathways we identified do not appear to underpin a generalised form of ageing independent of disease.

Our finding that lifespan genetics are enriched for lipid metabolism genes is in line with expectations, given lipid metabolites – especially cholesterol metabolites – have well-established effects on atherosclerosis, type 2 diabetes, Alzheimer’s disease, osteoporosis, and age-related cancers (Zarrouk et al., 2014). Pilling et al. (2017) implicated nicotinic acetylcholine receptor pathways in human lifespan, which we detected at nominal significance (p = 2 × 10⁻⁴) but not at 5% FDR correction (q = 0.0556). Instead we highlighted more general synapse and dendrite pathways and identified foetal brain and DLPC cells as important in ageing. The DLPC is involved in smoking addiction (Hayashi et al., 2013), dietary self-control (Lowe et al., 2014), and is susceptible to neurodegeneration (Morrison and Baxter, 2012), which could explain why genetic variation for lifespan is specifically enriched in these cells, mediated through smoking-related, cardiometabolic, and neuropsychiatric disease.

Much work has been done implicating FOXO3 as an ageing gene in model organisms (Kenyon et al., 1993; Hwangbo et al., 2004); however, we found that the association in humans at that locus may be driven by expression of SESN1 (admittedly a finding restricted to peripheral blood tissue). SESN1 is a gene connected to the FOXO3 promoter via chromatin interactions and is involved in the response to reactive oxygen species and mTORC1 inhibition (Donlon et al., 2017). While fine-mapping studies have specifically found genetic variation within the locus causes differential expression of FOXO3 itself (Flachsbart et al., 2017; Grossi et al., 2018), this does not rule out the effect of co-expression of SESN1. More powered tissue-specific expression data and experimental work on SESN1 vs. FOXO3 could elucidate the causal mechanism. For now, results from model organisms seem to leave the preponderance of evidence for FOXO3.

Our results suggest disease-associated lifespan variants reduce the chances of extreme long-livedness, but remain agnostic as to the more interesting two-part question: are there longevity variants that have little effect on lifespan in the normal range (Sebastiani et al., 2016), and if so, do they control underlying ageing processes? We note that the genetic overlap between lifespan and extreme long-livedness is high (0.73), but not complete (McDaid et al., 2017). Regardless, only a small part of the heritability of both lifespan and longevity has thus far been explained by GWAS. It thus remains plausible that an enlarged long-livedness or lifespan study will find variants controlling the rate of ageing and associated pathways. Curiously, we find little evidence of SNPs of large deleterious effect on lifespan acting with antagonistic pleiotropy on other fitness and developmental component traits, despite long-standing theoretical suggestions to the contrary (Williams, 1957). However, we did not examine mortality before the age of 40, or mortality of individuals without offspring (by definition, as we were examining parental lifespans), which may exhibit this phenomenon. For the time being, our finding that a higher polygenic risk score for lifespan was associated with an increased prevalence of Alzheimer's disease, Parkinson's disease, and prostate and breast cancer means we appear to be predominantly measuring a propensity for longer life through avoidance of early disease-induced mortality, rather than healthy ageing or fertility costs.

Whilst it has previously been shown that transcriptomic age calculated based on age-related genes is meaningful in the sense that its deviation from the chronological age is associated with biological features linked to ageing (Peters et al., 2015), the role of these genes in ageing was unclear. A gene might change expression with age because (i) it is a biological clock (higher expression tracking biological ageing, but not influencing ageing or disease); (ii) it is a response to the consequences of ageing (e.g. a protective response to CVD); or (iii) it is an indicator of selection bias: if low expression is life-shortening, older people with low expression tend to be eliminated from the study, hence the average expression level of older age groups is higher. However, our results now show that many of the differentially expressed age-related genes discovered by Peters et al. (2015) are not only biomarkers of ageing, but are also enriched for direct effects on lifespan.

There is increasing interest in polygenic risk scores, and their potential clinical utility for some diseases appears to be similar to that of some Mendelian mutations (albeit such monogenic tests are usually only applied in the context of family history; Khera et al., 2018). At first sight, the distinctions made by our genetic lifespan score (5 years of life between top and bottom deciles, for both the parent and subject generations) are quite small compared with the variability in individual lifespans. However, these distinctions are potentially material at a group level, for example, actuarially. The implied distinction in price (14%; Methods) is greater than some recently reported annuity profit margins (8.9%) (Legal & General Group PLC, 2017). In our view, the legal and ethical frameworks (at least in the UK [Association of British Insurers and UK Government, 2014]) are presently under-developed for genome-wide scores, whether for disease or lifespan, and this needs to be urgently addressed. At the same time, although material in isolation, our lifespan associations may only have practical utility in many applications if they provide information beyond that provided by conventional clinical risk measures (e.g. the Framingham score [D'Agostino et al., 2008]). Such an assessment has been beyond the scope of this work, in part because such risk measures are not readily available for the parents (rather than subjects) studied.

One limitation of our study was the loss of power caused by excluding relatives, rather than using linear mixed modelling (LMM) with a term for kinship as measured by the genomic relationship matrix (GRM) (Pilling et al., 2017; Loh et al., 2015). However, as the correct adjustment cannot be derived under the kin-cohort method, we felt this was the best approach. To see that the normal adjustment is not correct, consider two siblings. The phenotypes under study are of course identical (as the parents are the same), but the expected correlation under the mixed model would only be 50% of the heritability. Simply excluding siblings, however, is not sufficient. For example, consider two offspring subjects who are first cousins descended from two full brothers. The GRM entry in this situation is 12.5%, whilst the appropriate relatedness factor for the father trait is 50% and for the mother trait 0%. Exclusion of relatives thus appears the most straightforward solution, although if a pedigree were available, not just a GRM, accurate LMM might have been feasible.
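The cousin example can be checked with the standard recursive kinship-coefficient algorithm on a small pedigree. The pedigree below (two full brothers, their unrelated wives, one child each) is hypothetical, and relatedness is taken as twice the kinship coefficient, which is the quantity a GRM estimates in expectation.

```python
from functools import lru_cache

# Hypothetical pedigree: id -> (father, mother); founders have (None, None).
# Ids are assigned so that parents always have smaller ids than their children.
PEDIGREE = {
    1: (None, None),  # grandfather
    2: (None, None),  # grandmother
    3: (1, 2),        # brother A (father of cousin 1)
    4: (1, 2),        # brother B (father of cousin 2)
    5: (None, None),  # mother of cousin 1, unrelated
    6: (None, None),  # mother of cousin 2, unrelated
    7: (3, 5),        # cousin 1 (a genotyped subject)
    8: (4, 6),        # cousin 2 (a genotyped subject)
}

@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient by the standard recursive (tabular) rule."""
    if i is None or j is None:
        return 0.0
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    if i < j:            # recurse through the younger individual, who cannot be
        i, j = j, i      # an ancestor of the older one
    father, mother = PEDIGREE[i]
    return 0.5 * (kinship(father, j) + kinship(mother, j))

def relatedness(i, j):
    """Additive genetic relationship = 2 x kinship."""
    return 2.0 * kinship(i, j)

print(relatedness(7, 8))  # the two cousins: 0.125
print(relatedness(3, 4))  # their fathers (full brothers): 0.5
print(relatedness(5, 6))  # their mothers: 0.0
```

The subjects' 12.5% relatedness is what an LMM would use, whereas the phenotypes actually analysed (father and mother lifespans) are related at 50% and 0%.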

The analysis of parent lifespans has enabled us to probe mortality for a generation whose lives are mostly complete and attain increased power in a survival GWAS. However, changes in the environment (and thus the relative importance of each genetic susceptibility, for example following the smoking ban) inevitably mean we have less certainty about associations with prospective lifespans for the present generation of middle-aged people, or a different population (with perhaps different relative importance of disease or even overall heritability of lifespan). The 21% reduction in the effect size of the association between our PRS and lifespan in the UK offspring generation supports this idea, although the estimated contrast in hazard ratios across the deciles was not reduced, which may be a statistical artefact or due to the different periods of life probed. The lower explanatory power of the PRS in Estonia may reflect the differing alleles and LD patterns between the UK training data and the Estonian test data, but also the different environments, in particular the sources of mortality in that country in the Soviet and early post-Soviet eras.

In conclusion, recent genomic susceptibility to death in the normal age range seems rooted in modern diseases: Alzheimer’s, CVD and lung cancer; in turn arising from our modern – long-lived, obesogenic and tobacco-laden – environment. However, the keys to the distinct traits of ageing and extreme longevity remain elusive. At the same time, genomic information alone can now make material distinctions at a group level in variations in expected length of life, although the limited individual accuracy of these distinctions is far from reaching genetic determinism of that most (self-) interesting of traits – your lifespan.

Materials and methods

Summary

GWAS

For genetically British ancestry (as identified by UK Biobank using genomic PCA) and each self-reported European ethnicity in UK Biobank (including self-declared British but not genetically British ancestry), independent association analyses were performed between unrelated subjects’ genotypes (MAF > 0.005; HRC imputed SNPs only; ~9 million markers) and parent survival using age and alive/dead status in residualised Cox models, as described in Joshi et al. (2017). To account for parental genotype imputation, effect sizes were doubled, yielding log hazard ratios for the allele in carriers themselves. These values were negated to obtain a measure of log protection ratio, where higher values indicate longer life. While methods exist to account for related individuals using linear mixed models, such as BOLT-LMM (Loh et al., 2015), these are not accurate when trying to account for relatedness between parents (see Detailed Materials and methods).
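Why doubling is appropriate can be seen in a small simulation. This is an illustration, not the authors' pipeline: it replaces the Cox survival model with a continuous phenotype and linear regression, assumes random mating and an additive SNP, and uses hypothetical parameter values. Regressing a parent's phenotype on the offspring's genotype recovers about half of the parent's own per-allele effect, because the offspring carries only one of that parent's two alleles. The halved slope also implies a quarter of the non-centrality, in line with the 1/4 effective sample size noted in the Discussion.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_parent_proxy(n=100_000, beta=1.0, q=0.3, seed=1):
    """Regress a parent's phenotype on the offspring's genotype (kin-cohort proxy)."""
    rng = random.Random(seed)
    child_g, parent_y = [], []
    for _ in range(n):
        parent = [rng.random() < q for _ in range(2)]  # parent's alleles (True = effect allele)
        other = [rng.random() < q for _ in range(2)]   # other parent's alleles
        child = int(rng.choice(parent)) + int(rng.choice(other))
        parent_y.append(beta * sum(parent) + rng.gauss(0.0, 1.0))
        child_g.append(child)
    return slope(child_g, parent_y)

half = simulate_parent_proxy()
print(half, 2 * half)  # observed slope near beta / 2; doubling recovers about beta
```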

Mother and father survival information was combined in two separate ways, essentially assuming the effects were the same in men and women, or allowing for sex-specific effect sizes (SSE), with appropriate allowance for the covariance amongst the traits. For the first analysis we summed parental survival residuals; for the second analysis we used MANOVA, implemented in MultiABEL (Shen et al., 2015).

For LifeGen, where individual-level data was not available, parent survival summary statistics were combined using conventional fixed-effects meta-analysis, adjusted to account for the correlation between survival traits (estimated from summary-level data). For SSE, the same procedure was followed as for the UK Biobank samples, with correlation between traits again estimated from summary-level data. The GWAS statistics showed acceptable inflation, as measured by their LD-score regression intercept (<1.06, Table 1—source data 2).
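One way to combine two correlated estimates of a common effect is a generalised least squares (GLS) fixed-effect estimate. The sketch below is an assumption about the form of the adjustment, not the exact procedure used; r is the correlation between the two estimates' errors (which the text says was estimated from summary-level data), and the inputs are hypothetical. With r = 0 it reduces to ordinary inverse-variance weighting.

```python
import math

def combine_correlated(beta1, se1, beta2, se2, r):
    """Fixed-effect (GLS) combination of two correlated estimates of a common effect."""
    v1, v2 = se1**2, se2**2
    c = r * se1 * se2                  # covariance between the two estimates
    denom = v1 + v2 - 2.0 * c
    w1 = (v2 - c) / denom              # GLS weights from the inverse covariance matrix
    w2 = (v1 - c) / denom
    beta = w1 * beta1 + w2 * beta2
    se = math.sqrt((v1 * v2 - c**2) / denom)
    return beta, se, beta / se

# Hypothetical father and mother estimates with a 10% error correlation
print(combine_correlated(0.030, 0.010, 0.040, 0.012, r=0.1))
```

Positive correlation inflates the combined standard error relative to treating the two estimates as independent, which is the attenuation the text refers to.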

Candidate SNP replication

Effect sizes from longevity studies were converted to our scale using an empirical conversion factor, based on the observed relationships between longevity and hazard ratio at the most significant variant at or near APOE, observed in the candidate SNPs study and our data (Joshi et al., 2017). These studies were then meta-analysed using inverse variance weighting and standard errors were inflated to account for sample overlap (see Detailed Methods).
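A minimal sketch of these two steps with hypothetical numbers: a longevity-study estimate is rescaled by the ratio of APOE effects on the two scales, then combined by inverse-variance weighting. The `se_inflation` argument is a hypothetical placeholder for the sample-overlap correction derived in the Detailed Methods, which is not reproduced here.

```python
import math

def convert_to_lifespan_scale(beta, se, apoe_lifespan, apoe_longevity):
    """Rescale a longevity-study estimate by the ratio of APOE effects on the two scales."""
    k = apoe_lifespan / apoe_longevity   # empirical conversion factor
    return beta * k, se * abs(k)

def ivw_meta(betas, ses, se_inflation=1.0):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    se_inflation is a hypothetical placeholder for the sample-overlap
    correction (not reproduced here); 1.0 means no inflation.
    """
    weights = [1.0 / s**2 for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total) * se_inflation
    return beta, se

# Hypothetical numbers: one longevity-study estimate and one lifespan estimate
conv = convert_to_lifespan_scale(0.50, 0.10, apoe_lifespan=0.20, apoe_longevity=1.00)
print(ivw_meta([conv[0], 0.12], [conv[1], 0.03]))
```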

Estimates reported in Pilling et al. (2017) were based on rank-normalized Martingale residuals, unadjusted for the proportion dead, which – for individual parents – could be converted to our scale by multiplying by sqrt(c)/c, where c is the proportion dead in the original study (see Detailed Methods for derivation). Combined parent estimates were converted using the same method as the one used for longevity studies.

The deletion reported by Ben-Avraham et al. (2017) is perfectly tagged by a SNP that we used to assess replication. Assuming a recessive effect and parental imputation, we derived the expected additive effect to be $\hat{\beta}_C = \hat{\beta}_{CC}\,\frac{q^2}{q^2 + 2pq}$, where $\hat{\beta}_C$ is the effect we expect to observe under our additive model, $\hat{\beta}_{CC}$ is the homozygous effect reported in the original study, $q$ is the C allele frequency, and $p = 1 - q$ (see Detailed Materials and methods for derivation).
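The two conversion factors above can be written as small helper functions. This is a sketch that implements the factors exactly as stated in the text (the sqrt(c)/c rescaling for Pilling et al. estimates and the recessive-to-additive formula); the function names and inputs are hypothetical, and the derivations themselves are in the Detailed Materials and methods.

```python
import math

def martingale_to_our_scale(beta, c):
    """Rescale a rank-normalised martingale-residual estimate for one parent.

    c is the proportion dead in the original study; sqrt(c)/c equals 1/sqrt(c).
    """
    return beta * math.sqrt(c) / c

def recessive_to_additive(beta_cc, q):
    """Expected additive effect under parental imputation for a recessive C allele.

    beta_cc is the homozygous (CC) effect, q the C allele frequency, p = 1 - q.
    """
    p = 1.0 - q
    return beta_cc * q**2 / (q**2 + 2.0 * p * q)

# Hypothetical inputs
print(martingale_to_our_scale(0.05, c=0.25))
print(recessive_to_additive(0.30, q=0.5))
```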

iGWAS

Fifty-eight GWAS on mortality risk factors were used to create Bayesian priors for the SNP effects observed in the CES study, as described in McDaid et al. (2017). Mendelian randomisation was used to estimate causal effects of independent risk factors on lifespan, and these estimates were combined with the risk factor GWAS to calculate priors for each SNP. Priors were multiplied with observed Z statistics and used to generate Bayes factors. Observed Z statistics were then permuted, leading to 7.2 billion null Bayes factors (using the same priors), which were used to assess significance.
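The permutation step can be illustrated with a simplified stand-in statistic. In the sketch below, each SNP's score is just its prior multiplied by its Z statistic, a hypothetical simplification rather than the Bayes factor of McDaid et al. (2017). The null distribution is built by shuffling the observed Z statistics across SNPs while keeping the priors fixed, and each empirical P value counts the null scores at least as large as the observed one.

```python
import random
from bisect import bisect_left

def permutation_pvalues(priors, z_scores, n_perm=1000, seed=0):
    """One-sided empirical P values for score = prior * z (illustrative statistic)."""
    rng = random.Random(seed)
    observed = [p * z for p, z in zip(priors, z_scores)]
    null_scores = []
    for _ in range(n_perm):
        shuffled = z_scores[:]          # permute Z statistics, keep priors in place
        rng.shuffle(shuffled)
        null_scores.extend(p * z for p, z in zip(priors, shuffled))
    null_scores.sort()
    total = len(null_scores)
    pvalues = []
    for score in observed:
        n_as_extreme = total - bisect_left(null_scores, score)  # null scores >= observed
        pvalues.append((n_as_extreme + 1) / (total + 1))
    return pvalues

# Hypothetical example: one SNP with an informative prior and a large Z
priors = [3.0, 0.0, 0.0, 0.0, 0.0]
z_scores = [4.0, 0.1, -0.2, 0.3, -0.1]
pvals = permutation_pvalues(priors, z_scores)
print(pvals)
```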

Sex and age stratified analysis

Cox survival models, adjusting for the same covariates as the standard GWAS, were used to test SNP dosage against survival of UK Biobank genomically British fathers and mothers, separately. The analysis was split into age bands, where any parent who died at an age younger than the age band was excluded and any parent who died beyond the age band was treated as alive. Using the R package ‘metafor’, moderator effects of sex and age on hazard ratio could be estimated while taking into account the estimate uncertainty (see Detailed Materials and methods for formula).
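Both steps can be sketched under stated assumptions. The first function recodes one parent's record for an age band as described; excluding alive parents whose last known age is below the band is an assumption of this sketch. The second is a fixed-effect meta-regression (weighted least squares with known variances), one of the models metafor's rma() can fit with a moderator; whether the authors used a fixed- or random-effects model is not stated here, and the inputs are hypothetical.

```python
import math

def restrict_to_age_band(age, dead, lower, upper):
    """Recode one parent's (age, dead) record for the age band [lower, upper).

    Returns None to exclude the parent, otherwise (exit_age, event).
    Parents who died before the band are excluded; parents who died at or
    beyond the upper bound are treated as alive at `upper`. Alive parents whose
    last known age is below the band are also excluded (an assumption here).
    """
    if age < lower:
        return None
    if age >= upper:
        return (upper, False)
    return (age, dead)

def fixed_effect_meta_regression(estimates, ses, moderator):
    """Weighted least-squares slope of estimates on a moderator (e.g. sex 0/1 or band midpoint)."""
    w = [1.0 / s**2 for s in ses]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, moderator)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, estimates)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, moderator))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, moderator, estimates))
    slope = sxy / sxx
    se = math.sqrt(1.0 / sxx)
    p = math.erfc(abs(slope / se) / math.sqrt(2.0))  # two-sided normal P value
    return slope, se, p

# Hypothetical log hazard ratios for one SNP in three age bands (midpoints 50, 65, 80)
print(fixed_effect_meta_regression([0.12, 0.08, 0.03], [0.02, 0.02, 0.03], [50, 65, 80]))
```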

Causal genes and methylation sites

SMR-HEIDI (Zhu et al., 2016) tests were performed on CES statistics to implicate causal genes and methylation sites. Summary-level data from two studies on gene expression in blood (Westra et al., 2013; Lloyd-Jones et al., 2017) and data on gene expression in 48 tissues from the GTEx consortium (Battle et al., 2017) were tested to find causal links between gene expression and lifespan. Similarly, data from a genome-wide methylation study (McRae et al., 2017) was used to find causal links between CpG sites and lifespan. All results from the SMR test passing a 5% FDR threshold where the HEIDI test p>0.05 were reported.
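For orientation, the core SMR quantities can be computed from summary statistics alone. The sketch follows the approximate test statistic of Zhu et al. (2016), T_SMR ≈ z_zy² z_zx² / (z_zy² + z_zx²), referred to a chi-square distribution with 1 degree of freedom. It uses hypothetical inputs and omits the HEIDI heterogeneity test, which needs LD information across multiple SNPs; results would then be filtered at 5% FDR and HEIDI p > 0.05 as described above.

```python
import math

def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Approximate SMR estimate and test for one probe and its top cis-eQTL.

    b_zx: SNP effect on gene expression (eQTL study); b_zy: SNP effect on lifespan.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx                                  # Wald-ratio causal estimate
    t_smr = (z_zy**2 * z_zx**2) / (z_zy**2 + z_zx**2)   # approx. chi-square, 1 df
    p = math.erfc(math.sqrt(t_smr / 2.0))               # upper tail of chi-square(1)
    return b_xy, t_smr, p

# Hypothetical summary statistics
print(smr_test(b_zx=0.40, se_zx=0.10, b_zy=0.08, se_zy=0.02))
```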

Conditional analysis

SOJO (Ning et al., 2017) was used to fine-map the genetic signals in 1 Mb regions around lead SNPs reaching genome-wide significance and candidate SNPs reaching nominal significance in our study. The analysis was based on CES statistics from UK Biobank genomically British individuals, using the LifeGen meta-analysis results to optimise the LASSO regression tuning parameters. For each parameter, a polygenic score was built and the proportion of predictable variance from the regional polygenic score in the validation sample was calculated.

Disease association analysis

The GWAS catalog (MacArthur et al., 2017) and PhenoScanner (Staley et al., 2016) were checked for known genome-wide significant associations with our GWAS hits and proxies (r² > 0.6) in European samples. Associations discovered in UK Biobank by Churchhouse and Neale, 2018 were omitted from the PhenoScanner database as the findings have not been replicated, and the large sample overlap with our own study could result in false positive associations, due to phenotypic correlations between morbidity and mortality. Triallelic SNPs and associations without effect sizes were excluded before near-identical traits were grouped together, discarding all but the strongest association and keeping the shortest trait name. For example, ‘Lung cancer’, ‘Familial lung cancer’, and ‘Small cell lung cancer’ were grouped and renamed to ‘Lung cancer’. The remaining associations were classified into disease categories based on keywords and subsequent manual curation.
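The grouping rule can be sketched for the associations of one SNP. The `group_of` map stands in for the keyword matching and manual curation (it is hypothetical), "strongest" is taken to mean the smallest P value (an assumption), and the 'Lung adenocarcinoma' entry is an invented example of an association dropped for lacking an effect size.

```python
def group_traits(associations, group_of):
    """Collapse near-identical traits for one SNP.

    associations: (trait, p_value, effect_size) tuples; effect_size may be None.
    group_of: hypothetical manual-curation map from trait name to a group key.
    Keeps the strongest (smallest P) association per group and labels the
    group with its shortest trait name.
    """
    best, label = {}, {}
    for trait, p_value, effect in associations:
        if effect is None:                 # associations without effect sizes are excluded
            continue
        key = group_of.get(trait, trait)
        if key not in best or p_value < best[key]:
            best[key] = p_value
        if key not in label or len(trait) < len(label[key]):
            label[key] = trait
    return {label[key]: best[key] for key in best}

associations = [
    ("Lung cancer", 3e-10, 0.12),
    ("Familial lung cancer", 1e-12, 0.15),
    ("Small cell lung cancer", 2e-9, 0.10),
    ("Lung adenocarcinoma", 1e-8, None),  # dropped: no effect size
]
group_of = {name: "lung cancer" for name, _, _ in associations}
print(group_traits(associations, group_of))
```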

Lifespan variance explained by disease SNPs

The GWAS catalog (MacArthur et al., 2017) was checked for disease associations discovered in European ancestry studies, which were grouped into broad disease categories based on keywords and manual curation (see Figure 7—source data 1 and Detailed Methods). Associations were pruned by distance (500 kb) and LD (r² < 0.1), keeping the SNP most strongly associated with lifespan in the CES GWAS. Where possible, this SNP was tested against diseases in UK Biobank subjects and their family to test for pleiotropy (see Detailed Materials and methods). Significance of associations with lifespan was determined by setting an FDR threshold that allowed for one false positive among all independent SNPs tested (q ≤ 0.022). Lifespan variance explained (LVE) was calculated as 2pqa², where p and q are the frequencies of the effect and reference alleles in our lifespan GWAS, and a is the SNP effect size in years of life (Falconer et al., 1996).
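The LVE formula is the standard additive variance of a biallelic locus. A minimal sketch with hypothetical allele frequencies and effect sizes (in years of life), also showing how a summed LVE is expressed as a share of the 123 years² phenotypic variance quoted in the Results.

```python
def lifespan_variance_explained(p, a_years):
    """Additive variance 2pq*a^2 (years^2) for allele frequency p and effect a (years)."""
    q = 1.0 - p
    return 2.0 * p * q * a_years**2

PHENOTYPIC_VARIANCE = 123.0  # years^2, parental lifespan variance quoted in the Results

# Hypothetical significant SNPs: (effect allele frequency, effect in years of life)
snps = [(0.15, 1.2), (0.40, 0.25), (0.30, 0.20)]
total_lve = sum(lifespan_variance_explained(p, a) for p, a in snps)
print(total_lve, total_lve / PHENOTYPIC_VARIANCE)
```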

Cell type enrichment

Stratified LD-score regression (Finucane et al., 2015) was used to test for cell type-specific enrichment in lifespan heritability. As the power of this method depends on SNP heritability, standard LD-score regression (Finucane et al., 2015) was first used to check which of our samples (UK Biobank, LifeGen, or the combined cohort) had the highest SNP heritability. Lifespan summary statistics from UK Biobank genomically British individuals were then analysed using the procedure described in Finucane et al., 2015, and P values were adjusted for multiple testing using the Benjamini-Hochberg procedure.

Pathway enrichment

VEGAS2 v2.01.17 (Mishra and Macgregor, 2015) was used to calculate gene scores using SNPs genotyped in UK Biobank, based on summary statistics from the full CES cohort and the default software settings. VEGAS2Pathway was then used to check for pathway enrichment using those gene scores and the default list of gene sets (Mishra and MacGregor, 2017).

DEPICT (Pers et al., 2015) was also used to map genes to lifespan loci and check for pathway enrichment in the combined cohort CES GWAS. Default analysis was run for regions with genome-wide significant (p < 5 × 10⁻⁸) variants in the first analysis, and genome-wide suggestive (p < 1 × 10⁻⁵) variants in the second analysis, excluding the MHC in both cases.

PASCAL (Lamparter et al., 2016) was used with the same summary statistics and gene sets as DEPICT, except the gene probabilities within the sets were dichotomized (Z > 3) as described in Marouli et al., 2017.

For each software package independently, pathway enrichment P values were adjusted for multiple testing using the Benjamini-Hochberg procedure.
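For reference, the Benjamini-Hochberg adjustment used here (and elsewhere in these methods) can be sketched in a few lines of pure Python. The tools named above apply their own implementations, so this is illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([round(x, 4) for x in q])
```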

Age-related eQTL enrichment

Combined cohort CES lifespan statistics were matched to eQTLs associated with the expression of at least one gene (p < 10⁻³) in a dataset from the eQTLGen Consortium (31,684 individuals) (Võsa et al., 2018). Data on age-related expression (Peters et al., 2015) allowed eQTLs to be divided into four categories based on association with age and/or lifespan. Fisher's exact test was used to check whether age-related eQTLs were enriched for associations with lifespan.
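A minimal sketch of a one-sided Fisher's exact test for enrichment on a 2 × 2 table, using only the standard library, follows. The table layout (rows: age-related eQTL or not; columns: lifespan-associated or not) and the one-sided alternative are assumptions, as the text does not state the sidedness. The counts in the example are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact P for enrichment in the table [[a, b], [c, d]].

    Rows: age-related eQTL (yes, no); columns: lifespan-associated (yes, no).
    P is the hypergeometric probability of observing a or more in the top-left cell.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


print(fisher_enrichment_p(30, 170, 120, 1680))  # hypothetical counts
```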

Polygenic score analysis

Polygenic risk scores (PRS) for lifespan were calculated for two subsamples of UK Biobank (24,059 Scottish individuals and a random 29,815 English/Welsh individuals), and 36,499 individuals from the Estonian Biobank, using combined cohort CES lifespan summary statistics that excluded these samples. PRSice 2.0.14.beta (Euesden et al., 2015) was used to construct the scores from genotyped SNPs in UK Biobank and imputed data from the Estonian Biobank, pruned by LD (r² = 0.1) and distance (250 kb). Polygenic scores were Z-standardised.
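The score itself is a weighted sum of allele dosages. The sketch below assumes the weights are the lifespan effect sizes of the LD-pruned SNPs (PRSice performs the pruning and any P value thresholding, which are not reproduced here) and then Z-standardises the scores. The dosages and weights are hypothetical.

```python
from statistics import fmean, pstdev


def polygenic_scores(dosages, weights):
    """Z-standardised weighted sums of allele dosages, one per individual.

    dosages: per-individual lists of effect-allele dosages (0 to 2);
    weights: per-SNP lifespan effect sizes (assumed log protection ratios).
    """
    raw = [sum(w * g for w, g in zip(weights, person)) for person in dosages]
    mean, sd = fmean(raw), pstdev(raw)
    return [(s - mean) / sd for s in raw]


dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]  # hypothetical
weights = [0.02, -0.05, 0.01]                           # hypothetical
print([round(z, 3) for z in polygenic_scores(dosages, weights)])
```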

Cox proportional hazard models were used to fit parental survival against polygenic score, adjusted for subject sex; assessment centre; genotyping batch and array; and 10 principal components. Parental hazard ratios were converted into subject years of life as described in the GWAS method section.

Logistic regression models were used to fit polygenic score against the same self-reported UK Biobank disease categories used for individual SNPs. Effect estimates of first-degree relatives were doubled to account for imputation of genotypes and then meta-analysed using inverse variance weighting, adjusting for trait correlations between family members.

Postulation of equivalent sample size in height GWAS

The use of parent imputation, low trait heritability, and an incomplete proportion dead all reduce the power to detect effects. The equivalent sample size in a hypothetical, completely heritable trait with otherwise identical genetic architecture would be the original sample size, diluted (i.e. multiplied) by the heritability (0.122) (Kaplanis et al., 2018), the r² of offspring genotype on parent genotype (0.250) and the proportion dead (0.602). This gives an equivalent sample size of 18,579 from the 1,012,240 parent lifespans. We then calculated the sample size for height that would have the same properties, accounting for the heritability of height (0.8) (Wood et al., 2014): 23,224 (i.e. 18,579/0.8). We next calculated the P value that would have been reported by Wood et al.'s 250,000 sample size height GWAMA (Wood et al., 2014) for a SNP that was just significant in a hypothetical 23,224 sample height GWAMA: p < 1.8 × 10⁻⁷². Six distinct loci passed this significance threshold in Wood et al.'s results.
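The arithmetic above can be sketched as follows, assuming that the 1-df chi-square statistic of a SNP grows in proportion to sample size. With the rounded inputs quoted in the text, the first step gives about 18,586 rather than 18,579, presumably because the published figure used unrounded inputs. The rescaled P value comes out of order 10⁻⁷¹, close to the reported 1.8 × 10⁻⁷²; the exact value is sensitive to the sample size assumed for the height GWAMA (here the rounded 250,000). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def equivalent_sample_size(n, *dilution_factors):
    """Sample size multiplied by each power-diluting factor in turn."""
    for factor in dilution_factors:
        n *= factor
    return n


def rescaled_p(p_threshold, n_small, n_large):
    """Two-sided P expected at n_large for a SNP just reaching p_threshold at n_small.

    Assumes the 1-df chi-square statistic grows in proportion to sample size.
    """
    z = NormalDist().inv_cdf(p_threshold / 2)
    chi2 = z * z * n_large / n_small
    return math.erfc(math.sqrt(chi2 / 2))  # two-sided normal tail for |z| = sqrt(chi2)


# Heritability 0.122, offspring-parent r^2 0.250, proportion dead 0.602.
n_eq = equivalent_sample_size(1_012_240, 0.122, 0.250, 0.602)
n_height = n_eq / 0.8  # heritability of height
p_height = rescaled_p(5e-8, 23_224, 250_000)
print(round(n_eq), round(n_height), f"{p_height:.1e}")
```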

With 17,893 nonagenarians, Deelen et al. (2014) attained a P value of 2.33 × 10⁻²⁶ at rs4420638. With 1.012 million parents, we attained a P value of 1.75 × 10⁻⁶⁴. Other things being equal, a nonagenarian sample size of 44,500 thus appears to be equally powered to one million parents.
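The figure of 44,500 appears to be reproduced by scaling the nonagenarian sample size by the ratio of −log10 P values. This treats −log10 P as roughly proportional to sample size, which is only approximate. Scaling by the ratio of 1-df chi-square statistics instead, which under the usual approximation are proportional to N, gives a somewhat larger figure (roughly 45,000 to 46,000). Both functions below are hypothetical illustrations, not the authors' code.

```python
import math
from statistics import NormalDist


def n_by_log10_p(n_ref, p_ref, p_target):
    """Scale n_ref by the ratio of -log10 P values."""
    return n_ref * math.log10(p_target) / math.log10(p_ref)


def chi2_from_p(p):
    """1-df chi-square statistic matching a two-sided P value."""
    return NormalDist().inv_cdf(p / 2) ** 2


def n_by_chi2(n_ref, p_ref, p_target):
    """Scale n_ref by the ratio of chi-square statistics."""
    return n_ref * chi2_from_p(p_target) / chi2_from_p(p_ref)


print(round(n_by_log10_p(17_893, 2.33e-26, 1.75e-64)))
print(round(n_by_chi2(17_893, 2.33e-26, 1.75e-64)))
```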

Sensitivity of annuity prices to age

Market annuity rates for life annuities in January 2018 written to 55, 60, 65, 70, and 75 year-olds were obtained from the Sharing Pensions website http://www.sharingpensions.com/annuity_rates.htm (accessed 22 January 2018) and were £4158, £4680, £5476, £6075, and £7105 per year, respectively, for a £100,000 purchase price. The arithmetic average increase from one quinquennial age to the next is 14 percent.
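The 14 percent figure is the arithmetic mean of the four successive proportional increases between the five quoted rates, as the short check below shows.

```python
rates = [4158, 4680, 5476, 6075, 7105]  # £ per year per £100,000, successive quinquennial ages
increases = [later / earlier - 1 for earlier, later in zip(rates, rates[1:])]
mean_increase = sum(increases) / len(increases)
print([round(100 * x, 1) for x in increases], round(100 * mean_increase, 1))
```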

Data availability

Individual phenotypic and genetic data is available from UK Biobank upon application: https://www.ukbiobank.ac.uk/register-apply/. Phenotypes used in this work include subject age, sex, ethnicity, relatedness, genotyping batch, array, and principal components, as well as parental age and alive/dead status. Also included are self-reported diseases of subjects and their families. A full description of our application can be found at https://www.ukbiobank.ac.uk/2015/02/dr-james-wilson-university-of-edinburgh-centre-for-population-sciences/. The results that support our findings, in particular, the GWAS summary statistics we generated for >1 million parental lifespans in this study are available at http://dx.doi.org/10.7488/ds/2463. Gene expression data used in the age-related gene analysis is being made available by the eQTLGen Consortium, see http://www.eqtlgen.org/ and Võsa et al., 2018. Single tissue gene expression data used in the SMR-Heidi analysis can be found on the GTEx website https://gtexportal.org/home/datasets, under GTEx_Analysis_v7_eQTL.tar.gz.

Details

Data sources

Our UK Biobank dataset consisted of 409,700 British individuals (determined by genomic PCA) and 48,656 European individuals of self-reported (but not genetic) British, Irish, and other White European descent. Details of genotyping, marker QC and sample QC are described in Bycroft et al., 2017. Subjects completed a questionnaire which included questions on adoption status, parental age, and parental deaths. For our analysis, we excluded individuals who were adopted or otherwise unclear about their adoption status (N = 7,279), individuals who did not report their parental ages (N = 2,995), and individuals both of whose parents died before the age of 40, as such early deaths were more likely to be due to accident or injury (N = 4,472). We further excluded one individual from every pair of relatives (third degree or closer) reported by UK Biobank (N = 88,354), leaving 443,610 individuals for the final analysis. Although excluding relatives reduces sample size, we were concerned that linear mixed modelling to account for relatedness might not be fully appropriate under the kin-cohort model. Consider the parental phenotypic correlation for two full sibling subjects (r² = 1) or the maternal genetic covariance between two subjects who are the offspring of two brothers (r² = 0): the heritability/GRM-implied covariance is incorrect in both cases (although in the sibling case, it may be correct on average). Individuals passing QC reported a total of 339,732 paternal and 351,889 maternal lifespans, ranging from 40 to 107 years of life, that is 691,621 lives in total (Table 1—source data 1).

Our second dataset was the publicly available summary statistics from LifeGen, a consortium of 26 population cohorts investigating genomic effects on parental lifespans (Joshi et al., 2017). LifeGen had included results from UK Biobank, but the UK Biobank GWAS data were removed here, giving GWAS summary statistics for 160,461 father and 160,158 mother lifespans in the form of log hazard ratios. Combined, our datasets had 1,012,240 lives.

UK Biobank genome-wide association study

Within each UK Biobank ancestry group separately, we carried out association analysis between genotype (MAF > 0.005; HRC imputed SNPs only; ~9 million markers) and parent age and alive/dead status, effectively analysing the effect of genotype in offspring on parent survival, given survival to at least age 40, using Cox proportional hazards models. The following model was assumed to hold:

$$h(x) = h_0(x)\, e^{\beta X + \gamma_1 Z_1 + \cdots + \gamma_k Z_k} \qquad (1)$$

Where x is (parent) age, $h_0(x)$ the baseline hazard, X the offspring genotype (coded 0, 1, 2), β the log_e(hazard ratio) associated with X, and $Z_1, \dots, Z_k$ the covariates, with corresponding effect sizes $\gamma_1, \dots, \gamma_k$. The covariates were genotyping batch and array, the first forty principal components of relatedness, as provided by UK Biobank, and subject sex (but not age, as we were analysing parent age).

To facilitate practical runtimes, the Martingale residuals of the Cox model were calculated for fathers and mothers separately and divided by the proportion dead to give estimates of the hazard ratio (Therneau et al., 1990), giving a residual trait suitable for GWAS (for more details of the residual method see Joshi et al., 2017). Effect sizes observed under this model, for a SNP in offspring, are half that of the actual effect size in the parent carrying the variant (Joshi et al., 2017). Reported effect sizes (and their SE) have therefore been doubled to give the effect sizes in carriers themselves, giving an estimate of the log hazard ratios (or often, with sign reversed, log protection ratios). These estimates are suitable for meta-analysis and allow direct comparison with the log hazard ratios from LifeGen.
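The residual approach can be illustrated on simulated data. This sketch, under simplifying assumptions, computes Martingale residuals from a covariate-free Cox model (Nelson-Aalen baseline; the covariate adjustment used in the paper is omitted), regresses them on offspring genotype, divides by the proportion dead, and doubles the result. The simulation (exponential parental survival, censoring at a fixed time, a hypothetical log hazard ratio of 0.2 per parental allele) and all function names are purely illustrative.

```python
import math
import random


def martingale_residuals(times, dead):
    """Martingale residuals delta_i - H0(t_i) of a covariate-free Cox model.

    H0 is the Nelson-Aalen cumulative hazard; tied times share one value.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    cum_hazard = [0.0] * n
    at_risk, h, i = n, 0.0, 0
    while i < n:
        j, deaths = i, 0
        while j < n and times[order[j]] == times[order[i]]:
            deaths += dead[order[j]]
            j += 1
        h += deaths / at_risk  # hazard increment at this time
        for k in order[i:j]:
            cum_hazard[k] = h
        at_risk -= j - i  # everyone observed at this time leaves the risk set
        i = j
    return [dead[k] - cum_hazard[k] for k in range(n)]


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=40_000, maf=0.3, log_hr=0.2, censor=1.0, seed=1):
    """Hypothetical data: parent hazard proportional to exp(log_hr * parent genotype)."""
    rng = random.Random(seed)
    offspring, times, dead = [], [], []
    for _ in range(n):
        parent = (rng.random() < maf) + (rng.random() < maf)
        # The offspring inherits one of the parent's two alleles plus a random allele.
        transmitted = rng.random() < parent / 2
        offspring.append(int(transmitted) + (rng.random() < maf))
        t = rng.expovariate(math.exp(log_hr * parent))
        times.append(min(t, censor))
        dead.append(int(t <= censor))
    return offspring, times, dead


offspring, times, dead = simulate()
resid = martingale_residuals(times, dead)
prop_dead = sum(dead) / len(dead)
# Dividing by the proportion dead moves the slope to the log HR scale;
# doubling converts the per-offspring-allele effect into a per-carrier effect.
log_hr_hat = 2 * ols_slope(offspring, resid) / prop_dead
print(round(prop_dead, 3), round(log_hr_hat, 3))
```

With these settings the estimate should land near the simulated 0.2. The approximation relies on small effects, as the derivation of the residual method does.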

Analysis of association between genotype and survival across both parents was made under two contrasting assumptions and associated models, both of which had to adjust for the covariance between parent traits, precluding conventional unadjusted inverse-variance meta-analysis. Firstly, we assumed that the hazard ratio was the same for both sexes, that is, a common effect size across sexes (CES). If there were no correlation between parents’ traits, this could have been done by straightforward inverse-variance meta-analysis of the single-parent results. However, to account for the covariance between father and mother lifespans, we calculated a total parent residual, the sum of the individual parent residuals, for each subject (i.e. offspring). Under the common effect assumption, the combined trait’s effect size is twice that in the single parent, and the variance of the combined trait automatically and appropriately reflects the covariance between the two parents, giving a residual trait suitable for GWAS, with an effect size equal to that in a carrier of the variant, and a correct standard error. Secondly, we allowed for sex-specific effect sizes (SSE) in fathers and mothers. Under the SSE assumption, individual parental GWAS were carried out, and the summary statistics were meta-analysed using MANOVA, accounting for the correlation between the parent traits and the sample overlap (broadly complete), but agnostic as to whether the effect size was similar or different in each parent. This gives a P value against the null hypothesis that both effect sizes are zero, but, naturally, no estimate of a single common beta. This procedure was carried out using the R package MultiABEL and summary-level data (Shen et al., 2015). The procedure requires an estimate of the correlation between the traits (in this case parent residuals), which was measured directly (r = 0.1), and automatically estimates the variance of the traits from summary-level data (mother residuals σ² = 6.74; father residuals σ² = 5.25).
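The SSE null hypothesis (both parental effects zero) can be illustrated with a simplified two-degree-of-freedom Wald statistic on the two parental Z statistics, whose correlation equals the trait correlation under full sample overlap. This is a sketch of the idea, not MultiABEL's MANOVA implementation, which may differ in detail; the function name and example values are hypothetical.

```python
import math


def sse_joint_test(z_father, z_mother, r):
    """Two-degree-of-freedom Wald test that both parental effects are zero.

    T = z' S^-1 z with S = [[1, r], [r, 1]], the correlation of the two
    Z statistics (assumed equal to the trait correlation under full overlap).
    """
    t = (z_father**2 - 2 * r * z_father * z_mother + z_mother**2) / (1 - r**2)
    p = math.exp(-t / 2)  # survival function of a chi-square with 2 df
    return t, p


# Hypothetical SNP with similar effects in both parents; r = 0.1 as measured.
t, p = sse_joint_test(4.1, 3.8, 0.1)
print(round(t, 2), f"{p:.2e}")
```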

For the LifeGen results, the SSE procedure to combine results was identical to that for UK Biobank (mother residuals σ² = 14.12; father residuals σ² = 18.75), except that the trait correlation was derived from summary-level data instead of being measured directly (r = 0.1). This was done by taking the correlation in effect estimates of independent SNPs between the summary statistics of the individual parents, which equals the trait correlation assuming full sample overlap (a slightly conservative assumption). Similarly, since we did not have access to individual-level (residual) data, it was not possible to carry out a single total parent residual GWAS under the CES assumption. Instead, we meta-analysed the single-parent effect sizes using inverse-variance meta-analysis, but adjusted the standard errors to reflect the correlation between the traits (r) as follows:

$$\mathrm{SE}(\hat\beta) = \mathrm{SE}_0(\hat\beta)\,\sqrt{1 + r}$$

Where $\mathrm{SE}_0(\hat\beta)$ is the usual (uncorrected) inverse-variance weighted meta-analysis standard error, ignoring the correlation between the estimates, and $\mathrm{SE}(\hat\beta)$ is the corrected estimate used.

This is slightly conservative, as

$$\mathrm{Var}(\hat\beta) = \mathrm{Var}_0(\hat\beta)\left(1 + \frac{2 r s_1 s_2}{s_1^2 + s_2^2}\right) \le \mathrm{Var}_0(\hat\beta)\,(1 + r) \qquad (2)$$

which follows straightforwardly from

$$\hat\beta = \frac{P_1 \hat\beta_1 + P_2 \hat\beta_2}{P_1 + P_2},$$

where $s_1$ and $s_2$ are the standard errors of the individual estimates and $P_1$, $P_2$ their associated precisions (i.e. the reciprocals of the variances). The bound in Equation (2) is always conservative for r ≥ 0 (since $2 s_1 s_2 \le s_1^2 + s_2^2$), and exact if $s_1 = s_2$. In practice $s_1$ and $s_2$ were similar, as the sample sizes, allele frequencies and variance in the traits for the two parents were very similar.
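A small sketch comparing the three standard errors for two correlated parental estimates follows: the naive inverse-variance SE, the exact SE from Equation (2), and the conservative $\mathrm{SE}_0\sqrt{1+r}$ used for LifeGen. The function name and example values are hypothetical.

```python
import math


def ivw_two_correlated(b1, s1, b2, s2, r):
    """Inverse-variance weighted mean of two estimates whose errors correlate at r.

    Returns the estimate with three SEs: naive (ignoring r), exact (Equation 2),
    and the conservative SE0 * sqrt(1 + r).
    """
    p1, p2 = 1 / s1**2, 1 / s2**2
    beta = (p1 * b1 + p2 * b2) / (p1 + p2)
    var0 = 1 / (p1 + p2)
    var_exact = var0 * (1 + 2 * r * s1 * s2 / (s1**2 + s2**2))
    var_conservative = var0 * (1 + r)
    return beta, math.sqrt(var0), math.sqrt(var_exact), math.sqrt(var_conservative)


# Hypothetical father and mother log protection ratios with their SEs.
beta, se0, se_exact, se_used = ivw_two_correlated(0.040, 0.010, 0.030, 0.011, 0.1)
print(round(beta, 4), round(se0, 5), round(se_exact, 5), round(se_used, 5))
```

When the two SEs are equal, the exact and conservative SEs coincide; otherwise the conservative one is slightly larger.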

As we were using unrelated individuals and fitting forty principal components to control for population structure, material inflation of test statistics due to structure or relatedness was not expected. This was confirmed using the intercept of LD-score regression (Bulik-Sullivan et al., 2015) for the summary statistics, as shown in Table 1—source data 2. We have tried to use a consistent approach to the direction of lifespan-altering effects: positive implies longer life, consistent with previous studies of long-livedness (Deelen et al., 2014). Our base measure was thus a protection ratio, the reciprocal of the Cox hazard ratio. Effect sizes (betas) are typically −log_e(Cox hazard ratio), which we denote the log_e(protection ratio). Years of life gained were estimated as 10 × log_e(protection ratio), in accordance with a long-standing actuarial rule of thumb that was recently verified (Joshi et al., 2017).
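Chaining these conventions, a parental hazard ratio estimated per offspring allele can be turned into years of life per allele carried. This is a sketch assuming the doubling described above applies; the function name and the example hazard ratio are hypothetical.

```python
import math


def years_of_life(parent_hr_per_offspring_allele):
    """Years of life per allele carried, from a parental hazard ratio.

    The log HR is doubled to give the effect in a carrier, its sign reversed to
    give a log protection ratio, and the 10-years rule of thumb applied.
    """
    log_hr_carrier = 2 * math.log(parent_hr_per_offspring_allele)
    log_protection_ratio = -log_hr_carrier
    return 10 * log_protection_ratio


print(round(years_of_life(0.95), 2))  # hypothetical parental HR of 0.95 per offspring allele
```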

Candidate SNP replication

We sought to reproduce and replicate genome-wide significant associations reported by Pilling et al. (2017), who recently published a GWAS on the same UK Biobank data using a slightly different method. Rather than excluding relatives, Pilling et al. used BOLT-LMM and the genomic relationship matrix in subjects to approximately account for covariance between parental phenotypes. Pilling et al. also analysed parents separately as well as jointly, using a last-survivor phenotype. Despite these differences, reproduction (obtaining the same result from almost the same data) was straightforward and consistent once effect sizes were placed on the same scale (see below and Figure 2—figure supplement 1), confirming that our rescaling was correct. To try to replicate their results independently, we used the LifeGen consortium, excluding individuals from UK Biobank.

Pilling et al. (2017) performed multiple parental survival GWAS in UK Biobank, identifying 14 loci using combined parent lifespan and 11 loci using individual parent lifespan. Their study design involved rank-normalising Martingale residuals before regressing them against genotype, which does not give an estimate of the log_e(hazard ratio) (lnHR), nor, we believe, another naturally interpretable scale of effects, as the scale then depends on the proportion dead. Simulations (not shown) suggested sd ≈ √c for some Martingale residual distributions, where sd is the standard deviation of the distribution and c is the proportion dead. As multiplying the untransformed Martingale residual distribution by 1/c gives an estimate of the hazard ratio (Joshi et al., 2017; Therneau et al., 1990), for individual parents we could convert Pilling et al.’s effect sizes by multiplying them by √c to return them to the Martingale residual scale (which still depends on the study structure) and then by 1/c to place them on the lnHR scale, using the proportion dead from Pilling et al.’s study descriptives. Further multiplication by 2 converts a subject-parent effect into an effect in self. The cumulative scale conversion allowing for all three of these effects was to multiply Pilling et al.’s effect sizes by 2.5863/2.2869 in mothers/fathers, respectively, placing them on a lnHR scale for effects in female/male carriers. The joint life parent phenotype does not appear to have a straightforward conversion to lnHR in self. Instead, we used an empirical estimate derived from comparing effect sizes of the APOE allele between Pilling et al.’s discovery sample and our own UK Biobank Gen. British sample (both parents combined), which were largely overlapping: to get from Pilling et al.’s effect size to log_e HR, we had to multiply their effect sizes by 1.9699 for APOE, and we used this ratio for other alleles, which should be completely valid under the proportional hazards assumption.
Whilst this scheme may appear a little ad hoc (the use of simulation and APOE), it was confirmed empirically: visual inspection indeed showed hazard ratios from our own UK Biobank Gen. British sample calculations and inferred hazard ratios from Pilling were highly concordant (noting the concordance for joint life at APOE, which was pre-defined to be perfectly concordant by our procedure, is not, of itself, evidence) (Figure 2—figure supplement 1).
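The single-parent conversion factor is $2\sqrt{c}/c = 2/\sqrt{c}$. The proportions dead from Pilling et al.'s descriptives are not quoted here, so the sketch also inverts the quoted factors to show the proportions they imply (about 0.60 for mothers and 0.76 for fathers). This rests on the assumption that the factors were obtained exactly this way; the function names are hypothetical.

```python
import math


def pilling_factor(prop_dead):
    """Multiplier from a rank-normalised single-parent effect to log HR in a carrier.

    sqrt(c) undoes rank normalisation (assuming the Martingale residual SD is
    about sqrt(c)), 1/c moves to the log HR scale, and 2 converts a
    subject-parent effect into an effect in self: 2 * sqrt(c) / c = 2 / sqrt(c).
    """
    return 2 * math.sqrt(prop_dead) / prop_dead


def implied_prop_dead(factor):
    """Proportion dead implied by a factor of the form 2 / sqrt(c)."""
    return (2 / factor) ** 2


for parent, factor in [("mothers", 2.5863), ("fathers", 2.2869)]:
    print(parent, round(implied_prop_dead(factor), 3))
```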

Flachsbart et al. (2009), Deelen et al. (2014), and Broer et al. (2015) tested extreme longevity cases (95–110 years, ≥85 years, ≥90 years, respectively) against controls (60–75 years, 65 years, deceased at 55–80 years, respectively), identifying SNPs at or near FOXO3 and 5q33.3/EBF1. As done previously (Joshi et al., 2017), we estimated the relationship between the study-specific longevity log odds ratio and the log hazard ratio empirically, using the best-powered APOE variant overlapping with our own study and assuming that increased odds of surviving to an extreme age are due to a reduction in lifetime mortality risk. For Flachsbart et al. and Deelen et al., we used rs4420638_G (reported log OR −0.33 (Deelen et al., 2014); our lnHR 0.086). Inverting the sign to give log_e(protection ratio) estimates, the conversion estimate used was 3.82 (the longevity log OR per unit log_e(protection ratio), by which the reported log ORs are divided). For Broer et al., we used rs6857_T (reported log OR −0.20 (Broer et al., 2015); our lnHR 0.087), with a conversion estimate of 0.43 (here a multiplier from log OR to log_e(protection ratio)), again yielding a log_e(protection ratio).
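Expressed consistently as a multiplier from longevity log OR to log_e(protection ratio), the two APOE anchors give about 0.26 for the Deelen et al. anchor and about 0.43 for the Broer et al. anchor. The 3.82 quoted above therefore behaves as a divisor (about 1/0.26, matching up to rounding of the inputs). This is a sketch of that arithmetic only; the function name and the example SNP are hypothetical.

```python
def protection_per_log_or(anchor_log_or, anchor_log_hr):
    """Multiplier taking a longevity log OR to a log_e(protection ratio).

    log protection ratio = -log HR; for the APOE risk allele both it and the
    longevity log OR are negative, so the ratio is positive.
    """
    return -anchor_log_hr / anchor_log_or


k_deelen = protection_per_log_or(-0.33, 0.086)  # rs4420638_G
k_broer = protection_per_log_or(-0.20, 0.087)   # rs6857_T
print(round(k_deelen, 3), round(1 / k_deelen, 2), round(k_broer, 3))

# Hypothetical longevity log OR of 0.10 for some SNP in the Deelen et al. data:
print(round(0.10 * k_deelen, 4))
```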

Ben-Avraham et al. (2017) reported a deletion in Growth Hormone Receptor exon 3 (d3-GHR) associated with an increase of 10 years in male lifespan when homozygous. This deletion is tagged by rs6873545_C (McKay et al., 2007), which is present in the UK Biobank and LifeGen population sample at a frequency of 26.9% (q). Considering the association is recessive and we are imputing father genotypes, we converted the reported effect size into expected years of life per allele as follows:

If the subject genotype is CT, the parent contributing the C allele has a 50% chance of being the father and a $\frac{q^2}{q^2 + 2pq}$ chance of being homozygous. If the subject genotype is CC, the father has a 100% chance of contributing a C allele and again a $\frac{q^2}{q^2 + 2pq}$ chance of being homozygous. We therefore expect the relationship to be

$$\hat\beta_C = \frac{1}{2}\,\hat\beta_{CC}\,\frac{q^2}{q^2 + 2pq},$$

where $\hat\beta_C$ is the observed effect per subject allele on father lifespan and $\hat\beta_{CC}$ is the reported effect of the homozygous deletion in the father. As before, doubling the allele effect gives an estimate of the effect of the allele on subject lifespan, which finally yielded a converted estimate of 0.155. That is, under Ben-Avraham et al.’s assumptions on inheritance patterns, if their estimate of effect size in minor homozygotes is correct, we should see under the additive model an effect size of 0.155 years, or a log_e(hazard ratio) of −0.015, and correspondingly scaled standard errors (note we are assuming that the effect is actually recessive, and estimating how that effect should appear if an additive model were fitted).

Standard errors were calculated from inferred betas and reported P values, assuming a two-sided test with a normally-distributed estimator. Confidence interval overlap was then assessed using a two-sided test on the estimate difference (Pdiff), using a Z statistic from the difference divided by the standard error of the difference.
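Both steps can be sketched with the standard library's normal distribution, assuming the two estimates being compared are independent (no covariance term). The function names and the example values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_beta_p(beta, p):
    """SE implied by an estimate and its two-sided P value (normal estimator)."""
    z = -_STD_NORMAL.inv_cdf(p / 2)  # lower-tail quantile avoids 1 - p/2 rounding to 1
    return abs(beta) / z


def p_diff(b1, se1, b2, se2):
    """Two-sided P for the difference of two estimates, assumed independent."""
    z = abs(b1 - b2) / (se1**2 + se2**2) ** 0.5
    return 2 * (1 - _STD_NORMAL.cdf(z))


# Hypothetical published estimate with its P value, compared with our own.
se_pub = se_from_beta_p(0.045, 3e-9)
print(round(se_pub, 4), round(p_diff(0.045, se_pub, 0.030, 0.004), 3))
```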

iGWAS

We performed a Bayesian genome-wide association study (iGWAS) using the CES GWAS results and summary statistics from 58 risk factor GWASs (imputed, giving 7.2 million SNPs in common between all the studies), as described by McDaid et al. (2017). To calculate our prior for SNPs on a given chromosome, we first used multivariate Mendelian randomisation (masking the focal chromosome) to identify the risk factors significantly influencing lifespan and to estimate their causal effects. This identified 16 risk factors independently causally contributing to lifespan (see Table 2—source data 1 for the causal effect estimates). Next, assuming that a SNP affects lifespan through its effects on the 16 risk factors, prior effect estimates were calculated as the sum, over the 16 significant risk factors, of the product of each risk factor's causal effect on lifespan and the SNP's effect on that risk factor. We added one to the prior effect variance formula described in McDaid et al. (2017) to account for the fact that prior effects are estimated using observed Z-scores, and not true Z-scores, with $Z_{obs} \sim \mathcal{N}(Z_{true}, 1)$.

We computed Bayes factors by combining the prior effects and the observed association Z statistics. The significance of the Bayes factors was assessed using a permutation approach to calculate P values, by comparing observed Bayes factors to 7.2 billion null Bayes factors. These null Bayes factors were estimated using 1000 null sets of Z statistics combined with the same priors. These empirical P values were then adjusted for multiple testing using the Benjamini-Hochberg procedure.
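Two pieces of this procedure are easy to sketch: the prior effect as a sum of products, and an empirical P value from a sorted pool of null Bayes factors (a binary search keeps this cheap even for billions of nulls). The Bayes factor formula of McDaid et al. (2017) is not reproduced. The add-one convention in the empirical P value and all names and values are assumptions for illustration.

```python
from bisect import bisect_left


def prior_effect(causal_effects, snp_effects_on_risk_factors):
    """Prior lifespan effect of a SNP: sum of (causal effect x SNP effect) over risk factors."""
    return sum(a * g for a, g in zip(causal_effects, snp_effects_on_risk_factors))


def empirical_p(observed_bf, sorted_null_bfs):
    """Share of null Bayes factors at least as large as the observed one.

    The +1 in numerator and denominator (so P is never exactly 0) is a common
    convention assumed here, not taken from the text.
    """
    exceed = len(sorted_null_bfs) - bisect_left(sorted_null_bfs, observed_bf)
    return (exceed + 1) / (len(sorted_null_bfs) + 1)


# Hypothetical causal effects of three risk factors and one SNP's effects on them.
print(round(prior_effect([0.5, -0.2, 0.1], [0.10, 0.30, -0.05]), 4))
null_bfs = sorted([0.2, 0.8, 1.1, 1.5, 3.0, 7.5])
print(round(empirical_p(2.0, null_bfs), 3))
```

The resulting empirical P values would then be passed to a Benjamini-Hochberg adjustment.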

Replication in extreme long-livedness

To replicate our novel lifespan findings, we inverse-variance meta-analysed summary statistics from Deelen et al. (2014) and Broer et al. (2015), having converted their effect sizes to a common scale (see Candidate SNP replication). These effect sizes were given or could be estimated from the P value, effect direction and N, as well as the SNP's MAF.

Both longevity studies and the LifeGen consortium contain individuals from the Rotterdam Study, but due to differences in trait definitions, we could not inflate standard errors directly based on sample overlap. Instead, we calculated the correlation of effect estimates for null SNPs (|Z| < 1) between each pair of studies (r ≈ 0.01) and then adjusted the standard errors based on equation 5 from Lin and Sullivan (2009):

$$\mathrm{Var}(\hat\beta) = \mathrm{Var}_0(\hat\beta) + 2\sum_{n=1}^{N-1}\sum_{m=n+1}^{N} w_n w_m\, \mathrm{Cov}(\hat\beta_n, \hat\beta_m)$$

Where $\mathrm{Var}_0$ is the unadjusted variance of the SNP effect $\hat\beta$ after meta-analysis, $w_n$ is the inverse-variance weight of study n for the SNP (normalised so that the weights sum to one), N is the number of studies, and $\mathrm{Cov}(\hat\beta_n, \hat\beta_m)$ is the null-SNP covariance between studies n and m.
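A minimal sketch of the overlap-adjusted meta-analysis follows. It assumes that the covariance between two studies' estimates is the null-SNP correlation times the product of their SEs, and uses weights normalised to sum to one so that the first term equals the usual inverse-variance variance. The function name and example values are hypothetical.

```python
import math


def meta_with_overlap(betas, ses, corr):
    """Inverse-variance meta-analysis with covariance from sample overlap.

    corr[n][m] is the null-SNP correlation between studies n and m, and the
    covariance of their estimates is taken as corr[n][m] * se_n * se_m.
    """
    precisions = [1 / s**2 for s in ses]
    total = sum(precisions)
    w = [p / total for p in precisions]  # normalised weights
    beta = sum(wi * b for wi, b in zip(w, betas))
    var = 1 / total  # Variance_0
    for n in range(len(betas) - 1):
        for m in range(n + 1, len(betas)):
            var += 2 * w[n] * w[m] * corr[n][m] * ses[n] * ses[m]
    return beta, math.sqrt(var)


# Hypothetical estimates from three overlapping studies, pairwise r = 0.01.
r = [[1.0, 0.01, 0.01], [0.01, 1.0, 0.01], [0.01, 0.01, 1.0]]
print(meta_with_overlap([0.04, 0.06, 0.03], [0.02, 0.03, 0.015], r))
```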

The test of the null hypothesis that the effect was zero was one-sided, with the alternative hypothesis that the effect had the same sign as in discovery. Effect sizes in discovery and replication were then compared by calculating the ratio (alpha) of replication effect sizes to discovery effect sizes:

$$\alpha = \frac{\beta_{rep}}{\beta_{disc}}$$

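and its standard error using the following formula, obtained from a first-order Taylor series (delta-method) expansion of the ratio:

$$\mathrm{SE}_\alpha = \sqrt{\frac{\mathrm{SE}_{rep}^2}{\beta_{disc}^2} + \frac{\beta_{rep}^2\,\mathrm{SE}_{disc}^2}{\left(\beta_{disc}^2\right)^2}}$$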

where rep and disc are replication and discovery, respectively. Alpha was then inverse-variance meta-analysed across all SNPs to test for collective evidence that the discovery SNPs influence longevity.
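The ratio, its delta-method SE, and the subsequent inverse-variance pooling across SNPs can be sketched as follows. Fixed-effect pooling is assumed, and the per-SNP values are hypothetical.

```python
import math


def alpha_ratio(b_rep, se_rep, b_disc, se_disc):
    """Replication/discovery ratio with its first-order (delta-method) SE."""
    alpha = b_rep / b_disc
    se = math.sqrt(se_rep**2 / b_disc**2 + b_rep**2 * se_disc**2 / b_disc**4)
    return alpha, se


def ivw(estimates):
    """Fixed-effect inverse-variance meta-analysis of (estimate, SE) pairs."""
    weights = [1 / se**2 for _, se in estimates]
    est = sum(w * a for w, (a, _) in zip(weights, estimates)) / sum(weights)
    return est, 1 / math.sqrt(sum(weights))


# Hypothetical (replication beta, SE, discovery beta, SE) for three SNPs.
snps = [(0.020, 0.010, 0.040, 0.005),
        (0.015, 0.012, 0.030, 0.004),
        (0.030, 0.015, 0.050, 0.006)]
alphas = [alpha_ratio(*s) for s in snps]
print(ivw(alphas))
```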

Age and sex-stratified effects

Age- and sex-stratified effect sizes were calculated using the full Cox model (Equation 1), imputed dosages and the R package ‘survival’. We split the full analysis into age decades from 40 to 90 and a wider band, 90–120, beyond that, excluding any parent who died before the start of the age band and treating any parent who died beyond the age band as alive at the end of the age band. We thus had, across independent periods of life, estimates of the log hazard ratio, beta(age band, sex), for each age band and parent sex, along with standard errors.

We tested the effect of age and sex by fitting the linear model beta(age band, sex) = intercept + beta1 × age band + beta2 × sex + e, where e is independent but has known variance (the square of the SE in the age/sex-stratified model fit), using the rma function from the R package ‘metafor’, which accommodates known variances of the dependent variable. The process is more easily understood by examining the age- and sex-related effect size graphs, and recognising that we are fitting age and sex as explanatory variables, taking into account the standard error of each point shown.
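The model above is a weighted regression with known sampling variances. By default, metafor's rma with moderators fits a mixed-effects model that also estimates residual heterogeneity. The sketch below is the fixed-effect special case (in metafor, method = "FE"), which matches the model as written with e of known variance. Coding age bands as 0 to 5 is an assumption, and the data are hypothetical and noise-free so that the coefficients are recovered exactly.

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * c for v, c in zip(m[r], m[col])]
    return [row[n:] for row in m]


def meta_regression_fixed(design, y, se):
    """Weighted least squares with known variances se^2 (fixed-effect meta-regression).

    Returns coefficients and their SEs from the diagonal of (X'WX)^-1.
    """
    k = len(design[0])
    w = [1 / s**2 for s in se]
    xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, design)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * row[i] * yi for wi, row, yi in zip(w, design, y)) for i in range(k)]
    cov = invert(xtwx)
    coef = [sum(cov[i][j] * xtwy[j] for j in range(k)) for i in range(k)]
    return coef, [cov[i][i] ** 0.5 for i in range(k)]


# Hypothetical log protection ratios for 6 age bands (coded 0-5) in each sex (0/1).
design = [[1, band, sex] for sex in (0, 1) for band in range(6)]
y = [0.10 - 0.01 * band + 0.05 * sex for _, band, sex in design]
se = [0.01 + 0.002 * band for _, band, _ in design]
coef, coef_se = meta_regression_fixed(design, y, se)
print([round(c, 4) for c in coef])
```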

Causal gene prediction

In order to more accurately implicate causal genes and methylation sites from the detected loci associated with human lifespan, Summary-level Mendelian Randomisation (SMR) and HEterogeneity In InDependent Instruments (HEIDI) tests (Zhu et al., 2016) were performed on our CES GWAS statistics. Three separate analyses were performed. First, cis-eQTL scan results from peripheral blood from two previous studies, the Westra data (Westra et al., 2013) and CAGE data (Lloyd-Jones et al., 2017), were used to prioritize causal genes. Second, cis-eQTL signals (SNPs with FDR < 0.05) for 48 tissues from the GTEx consortium (Battle et al., 2017) were used to prioritize causal genes in multiple tissues. Third, genome-wide methylation QTL (mQTL) scan signals in blood from the Brisbane Systems Genetics Study and Lothian Birth Cohort (McRae et al., 2017) were used to predict causal CpG sites associated with human lifespan loci. All SMR results passing a 5% FDR threshold with HEIDI test p > 0.05 were reported.
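For a single gene or CpG site, the SMR estimate is the ratio of the SNP's effect on lifespan to its effect on expression (or methylation). Zhu et al. (2016) give the approximate test statistic $T = z_{zx}^2 z_{zy}^2 / (z_{zx}^2 + z_{zy}^2)$, referred to a 1-df chi-square. The sketch below computes both. The HEIDI test, which needs LD between multiple cis-SNPs, is not sketched; the function name and values are hypothetical.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and approximate chi-square test for one gene or CpG site.

    b_zx: effect of the top cis-QTL on expression (or methylation);
    b_zy: effect of the same SNP on lifespan.
    """
    b_xy = b_zy / b_zx  # ratio estimate of the exposure effect on lifespan
    z_zx, z_zy = b_zx / se_zx, b_zy / se_zy
    t = z_zx**2 * z_zy**2 / (z_zx**2 + z_zy**2)  # approx. chi-square, 1 df
    p = math.erfc(math.sqrt(t / 2))
    return b_xy, t, p


print(smr_test(0.8, 0.05, 0.02, 0.005))  # hypothetical eQTL and lifespan effects
```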

Fine-mapping using LASSO regression

Selection Operator for JOint multi-SNP analysis (SOJO) (Ning et al., 2017) was used to perform conditional fine-mapping analysis of the lifespan loci. The SOJO procedure implements LASSO regression for each locus based on summary association statistics, using the European-ancestry 1000 Genomes samples as the LD reference, and outperforms standard stepwise selection procedures (e.g. GCTA-COJO). We based the SOJO analysis on our CES summary association statistics from UK Biobank and used the LifeGen summary statistics as a validation sample to optimise the LASSO tuning parameter for each locus. Loci were defined prior to analysis as 1 Mb windows centred at each top variant from the GWAS. For each locus, based on UK Biobank data, we recorded the first 30 variants entering the model, the tuning parameters at which they entered along the LASSO path, and the LASSO results at those tuning parameters. For each recorded tuning parameter, we then built a polygenic score and computed the proportion of variance in the validation sample predictable from the regional polygenic score. The best out-of-sample R² is reported, together with the selected variants per locus.
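The idea of fitting a LASSO from summary statistics and tuning it out of sample can be sketched as below. This is in the spirit of SOJO, not its implementation: it assumes standardised marginal SNP-trait correlations, an LD correlation matrix shared by both samples, and a small grid of tuning parameters in place of the recorded entry points along the path. All numbers and names are hypothetical.

```python
import math


def soft_threshold(x, lam):
    return math.copysign(max(abs(x) - lam, 0.0), x)


def summary_lasso(r, ld, lam, sweeps=200):
    """Coordinate descent for 0.5 b'Rb - b'r + lam * sum|b_j|.

    r: marginal SNP-trait correlations (standardised); ld: LD correlation matrix.
    """
    p = len(r)
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            partial = r[j] - sum(ld[j][k] * b[k] for k in range(p) if k != j)
            b[j] = soft_threshold(partial, lam) / ld[j][j]
    return b


def validation_r2(b, r_val, ld):
    """R^2 of the regional score sum_j b_j x_j, from validation marginal correlations."""
    num = sum(bj * rj for bj, rj in zip(b, r_val)) ** 2
    den = sum(b[i] * ld[i][j] * b[j] for i in range(len(b)) for j in range(len(b)))
    return num / den if den > 0 else 0.0


def tune_on_validation(r_train, r_val, ld, lambdas):
    """Return (best R^2, lambda, weights) over a grid of tuning parameters."""
    fits = [(lam, summary_lasso(r_train, ld, lam)) for lam in lambdas]
    return max(((validation_r2(b, r_val, ld), lam, b) for lam, b in fits),
               key=lambda fit: fit[0])


ld = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.2], [0.1, 0.2, 1.0]]    # hypothetical LD
r_train, r_val = [0.05, 0.03, 0.01], [0.045, 0.028, 0.004]  # hypothetical
best_r2, best_lam, best_b = tune_on_validation(r_train, r_val, ld, [0.0, 0.005, 0.01, 0.02])
print(round(best_r2, 5), best_lam, [round(x, 4) for x in best_b])
```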

Lifespan variance explained

We sought an independent set of disease-associated SNPs to assess which diseases had the greatest effect on lifespan. A large number of SNPs per disease category, especially other cancers, were used to ensure that diseases were not under-represented when testing for association with lifespan. The latest genome-wide significant disease SNPs from European ancestry studies were retrieved from the GWAS catalog (14 March 2018), based on string matching within reported trait names. For Alzheimer’s/Parkinson’s disease, these strings were ‘alzh’ and ‘parkin’; for CVD, ‘myocard’, ‘cvd’, ‘cardiovascular’, ‘coronary’, and ‘artery disease’; for Type 2 diabetes, ‘type 2 diabetes’; and for cancers, ‘cancer’, ‘noma’, ‘ioma’, ‘tumo[u]r’, and ‘leukemia’. Cancers were then divided into Lung cancer and Other cancers based on the presence or absence of the keyword ‘lung’. The Smoking/Lung cancer category was created by adding traits containing the keywords ‘smoking’ and ‘chronic obstructive’ to the lung cancers. Each category was manually checked to include only associations with the diseases themselves or biomarkers of the diseases. Although some throat cancers are often caused by smoking and alcohol consumption, we did not treat these as smoking loci; in practice, this choice had no effect, as the only significant throat cancer locus (oesophageal cancer near CFTR) was discounted as secondary pleiotropic (see below).
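The keyword step (before manual curation) can be sketched with regular expressions built from the strings listed above, assuming case-insensitive substring matching. The manual checks are not reproduced, and the category labels and function name are illustrative.

```python
import re

KEYWORDS = {
    "Alzheimer's/Parkinson's disease": r"alzh|parkin",
    "CVD": r"myocard|cvd|cardiovascular|coronary|artery disease",
    "Type 2 diabetes": r"type 2 diabetes",
}
CANCER = r"cancer|noma|ioma|tumou?r|leukemia"
SMOKING = r"smoking|chronic obstructive"


def categorise(trait):
    """Candidate disease categories for a GWAS catalog trait name (before manual curation)."""
    name = trait.lower()
    found = {cat for cat, pattern in KEYWORDS.items() if re.search(pattern, name)}
    if re.search(CANCER, name):
        found.add("Smoking/Lung cancer" if "lung" in name else "Other cancers")
    if re.search(SMOKING, name):
        found.add("Smoking/Lung cancer")
    return found


for trait in ["Small cell lung cancer", "Coronary artery disease", "Glioma", "Smoking behaviour"]:
    print(trait, "->", sorted(categorise(trait)))
```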

SNPs missing from the CES summary statistics were imputed from the closest proxy (min. r² > 0.9) or averaged from multiple proxies if equally close. SNPs without effect sizes, SNPs matching neither our reference nor effect alleles, and SNPs with reported frequencies differing by more than 0.3 from our own were excluded. The remaining SNPs were subdivided into independent (r² < 0.1) loci 500 kb apart, keeping the SNP most strongly associated with lifespan in the CES GWAS, so that the choice of SNP depended only on the lifespan GWAS test statistic and not on the number of disease SNPs per category. Lastly, where possible, loci were tested for association with their disease category in UK Biobank, using self-reported diseases of 325,292 unrelated, genomically British subjects, their siblings, and each parent separately. Diseases of subject relatives were already coded into broad disease categories by UK Biobank. For subjects themselves, ICD codes had been recorded, which we grouped into similar categories (hypertension, cerebral infarction, heart disease, diabetes, dementia, depression, stress, pulmonary disease, and cancer, in accordance with Figure 10—source data 1, although cancer in subjects was more directly taken as the trait of reporting a number of cancers > 0). The trait of reporting these diseases (separately for each relative and the subject themselves) was then tested for association with genotypic dosage for the GWAS catalog disease SNPs. The model fitted was a logistic regression of not reporting the disease, using the same covariates as the CES GWAS with the addition of subject age, and estimated the log odds ratio of protection from disease in UK Biobank for each copy of the disease-protective allele in the GWAS catalog. Effects reported in the GWAS catalog for which the pooled estimate from our association study was in the opposite direction were flipped (if p < 0.05) or discarded (if p ≥ 0.05).
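The allele and direction checks can be sketched as two small rules. The sketch makes two assumptions the text does not state: strand flips and palindromic SNPs are not handled, and a catalog SNP without a reported frequency is kept. The function names are hypothetical.

```python
def align_sign(catalog_allele, catalog_freq, effect_allele, ref_allele, effect_freq,
               max_freq_diff=0.3):
    """Return +1/-1 to align a catalog allele with our effect allele, or None to drop it."""
    if catalog_allele == effect_allele:
        sign, our_freq = 1, effect_freq
    elif catalog_allele == ref_allele:
        sign, our_freq = -1, 1 - effect_freq
    else:
        return None  # matches neither of our alleles
    if catalog_freq is not None and abs(catalog_freq - our_freq) > max_freq_diff:
        return None  # frequency discrepancy greater than 0.3
    return sign


def resolve_direction(same_direction, ukb_p):
    """Keep a catalog effect if UK Biobank agrees; otherwise flip if p < 0.05, else discard."""
    if same_direction:
        return "keep"
    return "flip" if ukb_p < 0.05 else "discard"


print(align_sign("G", 0.78, "A", "G", 0.25), resolve_direction(False, 0.2))
```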

Our final dataset consisted of 555 disease SNPs (81 neurological, 72 cardiovascular, 65 diabetes, 22 smoking/lung cancer, and 315 other cancers). Lifespan variance explained (LVE) was calculated as $2pqa^2$, where p and q are the frequencies of the effect and reference alleles in our lifespan GWAS, and a is the SNP effect size in years of life (Falconer et al., 1996). To assess pleiotropy, SNPs were tested against other disease categories, and where possible, the relative strengths of standardised associations between disease categories were compared. SNPs associating more strongly with another disease, as defined by a Z statistic more than double that of the original disease, were marked as pleiotropic and secondary. Whilst strength of association would not normally be measured in this way (the odds ratio being more conventional and independent of prevalence), here we are interested in the excess number of disease cases in the population due to the variant. A locus with a moderate OR for a highly prevalent disease is therefore judged more causative of that disease than a locus with a (somewhat) higher OR for a very rare disease, as the number of cases attributable to the latter will be lower. The Z statistic captures this, given that p and q are the same (same SNP, same population). Correspondingly, for diseases only present in one sex, the other sex was treated as all controls. Whilst this halves the apparent effect size, the required measure is the amount of disease caused across the whole population. A SNP conferring similar attributable counts of CVD and breast cancer in women, but also CVD in men, is causing more CVD than cancer across the population. Correspondingly, selection pressure on the breast cancer effect is half that for a matching effect in both sexes. SNPs conferring both an increase in disease and an increase in lifespan were marked as antagonistically pleiotropic.
Unsurprisingly, in practice each such SNP also reduced one or more other diseases, so its reported disease-increasing association was considered secondary. Total LVE per disease category was calculated by summing the LVE of SNPs that were not marked as secondary and that had significant effects on lifespan, with significance determined by an FDR threshold that allowed for one false positive among all SNPs tested (q ≤ 0.016, 60 SNPs). To compare the cumulative LVE of the top LVE loci, all non-secondary association SNPs from the disease categories were pooled and again subdivided into independent (r² < 0.1) loci 500 kb apart. After applying an FDR threshold with the same criterion (q ≤ 0.022), 45 independent loci remained (1 neurological, 23 cardiovascular, 4 diabetes, 6 smoking/lung cancer, and 11 other cancer), and their LVE was summed by disease category.
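
The LVE formula, the "Z more than double" pleiotropy rule and the one-false-positive FDR cut-off can be sketched together. This is a hedged illustration: the function names and tuple layout are hypothetical, comparing Z magnitudes is our assumption, and we read the FDR criterion as choosing the largest q cut-off t for which t × (number of SNPs passing) is at most one, which matches the reported values (0.016 × 60 = 0.96 and 0.022 × 45 = 0.99).

```python
def lifespan_variance_explained(p: float, a: float) -> float:
    """LVE = 2pqa^2, with p the effect-allele frequency, q = 1 - p and
    a the per-allele effect in years of life (result in years^2)."""
    q = 1.0 - p
    return 2.0 * p * q * a ** 2

def is_secondary(z_own: float, z_other_categories: dict) -> bool:
    """Mark a SNP's catalog-disease association as secondary when another
    disease category has a Z statistic more than double it
    (magnitudes compared: an assumption)."""
    return any(abs(z) > 2.0 * abs(z_own) for z in z_other_categories.values())

def one_false_positive_threshold(q_values):
    """Largest q cut-off t such that t * (number of SNPs with q <= t), the
    expected number of false positives, is at most one.
    Returns (t, n_passing), or None if no cut-off qualifies."""
    best = None
    for t in sorted(set(q_values)):
        n_passing = sum(q <= t for q in q_values)
        if t * n_passing <= 1.0:
            best = (t, n_passing)
    return best

def total_lve(snps, q_cutoff):
    """Sum LVE over non-secondary SNPs with q <= q_cutoff.
    snps: iterable of (p, a, q_value, secondary_flag) tuples (hypothetical layout)."""
    return sum(lifespan_variance_explained(p, a)
               for p, a, q, secondary in snps
               if not secondary and q <= q_cutoff)
```

For example, a SNP with effect-allele frequency 0.3 and an effect of half a year per allele explains about 2 × 0.3 × 0.7 × 0.25 ≈ 0.105 years² of lifespan variance.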

Cell type and pathway enrichment

Stratified LD-score regression (Finucane et al., 2015) partitions SNP heritability into regions linked to specific tissues and cell types, such as super-enhancers and histone marks, and then assesses whether the SNPs in these regions contribute disproportionately to the total SNP heritability. Standard LD-score regression (Bulik-Sullivan et al., 2015) indicated that, among the different samples (UK Biobank and/or LifeGen) and analyses (CES or SSE), the CES results from UK Biobank individuals of genomically British ancestry had the highest SNP heritability, plausibly owing to the homogeneity of that sample. These statistics were analysed using the procedure described by Finucane et al., 2015, which included limiting the regressions to HapMap3 SNPs with MAF > 0.05 to reduce statistical noise. Results from all cell types were merged and then adjusted for multiple testing using the Benjamini–Hochberg procedure (FDR 5%).
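
For reference, the Benjamini–Hochberg adjustment applied to the merged cell-type results (and used again for the pathway analyses) is the textbook step-up procedure, sketched here in Python. The function name and the cell-type P values are hypothetical and for illustration only; no particular LDSC output format is assumed.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest P value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical cell-type P values, for illustration only.
cell_type_p = {"cell_type_A": 0.01, "cell_type_B": 0.04,
               "cell_type_C": 0.03, "cell_type_D": 0.2}
q = dict(zip(cell_type_p, benjamini_hochberg(list(cell_type_p.values()))))
significant = [name for name, qv in q.items() if qv <= 0.05]
```

Here only cell_type_A (adjusted value 0.04) passes at FDR 5%, although two raw P values are below 0.05.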

The full CES dataset was subjected to gene-based tests, which used up to 10⁶ SNP permutations per gene to assign P values to 26,056 genes, as implemented in VEGAS2 v2.01.17 (Mishra and MacGregor, 2015). Only directly genotyped SNPs from the UK Biobank array were used, to keep runtimes practical. Using the default settings, all SNPs located within genes (relative to the 5′ and 3′ UTRs) were included. Scored genes were then tested for enrichment in 9741 pathways from the NCBI BioSystems Database, with up to 10⁸ gene permutations per pathway, using VEGAS2Pathway (Mishra and MacGregor, 2017). Pathway enrichment P values were automatically adjusted for pathway size (empirical P) and further adjusted for multiple testing using the Benjamini–Hochberg procedure (FDR 5%).
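
The idea behind this kind of gene-based test can be illustrated as follows: the gene statistic is the sum of squared SNP z-scores, and its null distribution is simulated from a multivariate normal whose covariance is the SNPs' LD correlation matrix, so correlation between SNPs is accounted for. This numpy sketch uses a fixed number of simulations, whereas VEGAS2 adapts the number of simulations up to the cap stated above; the function name is hypothetical.

```python
import numpy as np

def sum_chisq_gene_pvalue(z, ld, n_sims=100_000, seed=0):
    """Empirical gene-based P value for T = sum(z_i^2) over the SNPs in a gene.

    z  : z-scores of the gene's SNPs
    ld : SNP-SNP LD correlation matrix (positive semi-definite)
    Null z-vectors are drawn from MVN(0, ld), so the null accounts for the
    correlation between SNPs."""
    z = np.asarray(z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    observed = float(np.sum(z ** 2))
    rng = np.random.default_rng(seed)
    null_z = rng.multivariate_normal(np.zeros(z.size), ld, size=n_sims)
    null_t = np.sum(null_z ** 2, axis=1)
    # Add-one correction so an empirical P value is never exactly zero.
    return (1 + int(np.sum(null_t >= observed))) / (1 + n_sims)
```

With strong LD, two z-scores of 3 carry little more information than one, so the same observed statistic yields a larger (less significant) gene P value than it would for independent SNPs.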

DEPICT was also used to create a list of genes; however, this method uses independent SNPs passing a P value threshold to define lifespan loci and then attempts to map 18,922 genes to them. Gene prioritization and subsequent gene set enrichment are performed for 14,461 probabilistically defined reconstituted gene sets, which are tested for enrichment under the self-contained null hypothesis (Pers et al., 2015). Two separate analyses were performed on the CES summary statistics, using independent SNPs (>500 kb between top SNPs) present in the DEPICT database. The first analysis used a genome-wide significance threshold (GW DEPICT analysis) and mapped genes to 10 loci, automatically excluding the major histocompatibility complex (MHC) region. The second used a suggestive significance threshold (p < 10⁻⁵), which yielded 93 loci and mapped genes to 91 of these, again excluding the MHC region. To test whether pathways were significantly enriched at a 5% FDR threshold, we used the FDR values calculated by DEPICT, which are already adjusted for the non-independence of the gene sets tested.
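
Defining loci from independent SNPs more than 500 kb apart amounts to a greedy, distance-only selection of lead SNPs. A minimal sketch, assuming SNPs are given as (id, chromosome, position, P) tuples; the names are hypothetical, and DEPICT's own locus definition and MHC exclusion are not reproduced.

```python
def select_lead_snps(snps, p_threshold, min_distance_bp=500_000):
    """Greedy distance-based lead-SNP selection.

    snps: iterable of (snp_id, chrom, pos_bp, p_value).
    SNPs below p_threshold are visited from the smallest P upwards; a SNP
    becomes a lead only if no existing lead on the same chromosome lies
    within min_distance_bp."""
    candidates = sorted((s for s in snps if s[3] < p_threshold), key=lambda s: s[3])
    leads = []
    for snp_id, chrom, pos, p in candidates:
        far_from_all = all(lead_chrom != chrom or abs(pos - lead_pos) > min_distance_bp
                           for _, lead_chrom, lead_pos, _ in leads)
        if far_from_all:
            leads.append((snp_id, chrom, pos, p))
    return leads
```

Calling it with p_threshold=1e-5 mirrors the suggestive analysis; the genome-wide analysis would use the conventional 5 × 10⁻⁸, which is our assumption since the exact value is not restated here.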

PASCAL was used with the same summary statistics and gene sets as DEPICT, except that the gene probabilities within the sets were dichotomized (Z > 3) (Marouli et al., 2017), leading to the analysis of the same 14,461 pathways. PASCAL transformed SNP P values into gene-based P values (with the default method '--genescoring=sum') for 21,516 genes (Lamparter et al., 2016). When the pathways were tested for overrepresentation of high gene scores, P values were estimated under the competitive null hypothesis (Maciejewski, 2014). These empirical pathway P values were further adjusted for multiple testing using the Benjamini–Hochberg procedure.
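
Two steps in this paragraph can be made concrete: dichotomising a reconstituted gene set at Z > 3, and testing a set against a competitive null, in which its genes are compared with random gene sets of the same size rather than with a null of no association. The sketch uses a generic resampling test on hypothetical gene scores; it is not PASCAL's implementation, whose pathway statistic is documented with the tool.

```python
import random

def dichotomise(gene_set_z, z_cut=3.0):
    """Binary gene set from a reconstituted set given as {gene: membership Z}."""
    return {gene for gene, z in gene_set_z.items() if z > z_cut}

def competitive_pvalue(gene_scores, members, n_draws=10_000, seed=0):
    """Resampling P value for the mean score of a gene set versus random sets
    of the same size drawn from all scored genes (a competitive null).
    gene_scores: {gene: score}, higher meaning stronger association."""
    members = sorted(g for g in members if g in gene_scores)
    k = len(members)
    if k == 0:
        raise ValueError("no scored genes in the set")
    observed = sum(gene_scores[g] for g in members) / k
    all_genes = list(gene_scores)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(all_genes, k)
        if sum(gene_scores[g] for g in draw) / k >= observed:
            hits += 1
    return (1 + hits) / (1 + n_draws)
```

Under a competitive null, a pathway is only enriched if its genes score higher than typical genes, so a genome-wide excess of signal does not by itself make every pathway significant.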

Age-related eQTLs enrichment

We identified SNPs in our CES GWAS that were eQTLs, that is, SNPs associated with the expression of at least one gene at p < 10⁻⁴ in data from the eQTLGen Consortium (n = 31,684 individuals) (Võsa et al., 2018). After distance pruning (500 kb), 3577 eQTLs were present, of which 755 were associated with genes differentially expressed with age (Peters et al., 2015). We used Fisher's exact test to determine whether, amongst this set of eQTLs, SNPs associated with lifespan (at varying thresholds of statistical significance) were enriched for SNPs associated with genes whose expression is age-related.
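
The enrichment test amounts to a 2×2 table of eQTLs: lifespan-associated or not (at a given threshold) versus linked to an age-related gene or not. A minimal one-sided version using only the standard library is sketched below; whether the original analysis used a one- or two-sided test is not stated here, so the one-sided choice is an assumption, as are the function and argument names.

```python
from math import comb

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the 2x2 table

                          age-related gene   other gene
        lifespan-assoc.          a                b
        not associated           c                d

    Returns P(X >= a) under the hypergeometric null with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    # math.comb returns 0 when k > n, so impossible cells contribute nothing.
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, upper + 1)) / total
```

With the counts above, the margins would be 3577 eQTLs in total and 755 age-related; the lifespan-associated counts depend on the threshold chosen and are not restated here.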

Polygenic lifespan score associations

We used the CES GWAS, excluding (one at a time) all Scottish populations (whether from Scottish UK Biobank assessment centres or Scottish LifeGen cohorts), Estonian populations, and a random 10% of UK Biobank English and Welsh subjects, to create polygenic risk scores using PRSice (Euesden et al., 2015), so that the test subjects had not been part of the training data. Polygenic risk scores built from all (p ≤ 1) independent (r² < 0.1) SNPs (PRS_P1) were more strongly associated (highest standardised effect size; see Table 3—source data 1 for a comparison between thresholds) than those built from SNPs passing a tighter significance threshold, so these were used in the analysis.

To make cross-validated lifespan associations using polygenic scores, our unrelated, genomically British sample was partitioned into training and test sets. The first test set consisted of Scottish individuals from UK Biobank, defined by assessment centre or by northings and eastings falling within Scotland (N = 24,059). The second consisted of a random subset of the remaining English and Welsh population, sampled reproducibly as those whose UK Biobank identification number ends in 7 (N = 29,815). The training set was constructed by excluding these two populations, as well as individuals from Generation Scotland, from our GWAS and re-estimating the effect sizes (beta) on that subset.
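
The UK Biobank partition can be expressed as a small rule. This sketch is hypothetical: it assumes each participant is described by an integer identifier and a precomputed flag for residence in Scotland (by assessment centre or coordinates). Generation Scotland, a LifeGen cohort, is excluded at cohort level and so does not appear here.

```python
def assign_ukb_split(participant_id: int, in_scotland: bool) -> str:
    """Assign an unrelated, genomically British UK Biobank participant to a split."""
    if in_scotland:
        return "test_scotland"        # first test set (N = 24,059 in the study)
    if participant_id % 10 == 7:
        return "test_england_wales"   # second test set: identifier ends in 7
    return "train"                    # used to re-estimate GWAS effect sizes
```

Selecting on the last digit gives a roughly 10% subset that can be regenerated exactly without storing a random seed.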

A third independent validation set was constructed by excluding the EGCUT cohort from the LifeGen sample and using the remaining data to test lifespan in the newly genotyped EGCUT cohort (Leitsalu et al., 2015), restricted to unrelated individuals (N = 36,499).

Polygenic survival scores were constructed using PRSice 2.0.14.beta (Euesden et al., 2015) in a two-step process. First, lifespan SNPs were LD-clumped using an r² threshold of 0.1 and a window size of 250 kb. To keep PRSice clumping runtimes practical, only directly genotyped SNPs were used in the Scottish and English/Welsh subsets. The Estonian sample was genotyped on four different arrays with limited overlap, so here imputed data (with imputation measure R² > 0.9) were used and clumped with PLINK directly (r² = 0.1; window = 1000 kb). The clumped SNPs (85,539 in UK Biobank, 68,234 in Estonia) were then further pruned at several different P value thresholds to find the most informative subset. For each individual, a polygenic score was calculated as the sum of SNP dosages (for SNPs passing the P value threshold) multiplied by their estimated allele effects. These scores were then standardised so that associations could be expressed per standard deviation of polygenic score.
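
The two-step construction (greedy LD clumping, then P value thresholding and a weighted sum of dosages) can be sketched with numpy. The r² lookup is passed in as a function because no LD reference panel is assumed; the names and data layout are hypothetical, and PRSice and PLINK offer many more options than shown.

```python
import numpy as np

def clump(snps, r2, r2_max=0.1, window_bp=250_000):
    """Greedy LD clumping: visit SNPs from the smallest P upwards and keep a SNP
    unless a kept SNP within window_bp on the same chromosome has r2 >= r2_max.
    snps: list of (snp_id, chrom, pos_bp, p); r2: callable(id_a, id_b) -> r^2."""
    kept = []
    for snp in sorted(snps, key=lambda s: s[3]):
        snp_id, chrom, pos, _ = snp
        in_ld = any(k_chrom == chrom and abs(k_pos - pos) <= window_bp
                    and r2(k_id, snp_id) >= r2_max
                    for k_id, k_chrom, k_pos, _ in kept)
        if not in_ld:
            kept.append(snp)
    return kept

def standardised_score(dosages, betas, pvals, p_threshold=1.0):
    """Sum of effect-allele dosages times per-allele effects over SNPs with
    P <= p_threshold, standardised to mean 0 and SD 1 across individuals.
    dosages: (n_individuals, n_snps) array aligned with betas and pvals."""
    mask = np.asarray(pvals) <= p_threshold
    raw = np.asarray(dosages, dtype=float)[:, mask] @ np.asarray(betas, dtype=float)[mask]
    return (raw - raw.mean()) / raw.std()
```

The default p_threshold=1.0 corresponds to using all clumped SNPs, the setting found most informative in this study.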

Polygenic scores of test cohorts were regressed against lifespan and alive/dead status using a Cox proportional hazards model, adjusted for sex, assessment centre, batch, array, and 10 principal components. Where parental lifespan was used, log hazard ratios were doubled to estimate the effect of the polygenic score on own mortality. Scores were also regressed against self-reported diseases in UK Biobank subjects, their siblings, and each parent separately, using logistic regression adjusted for the same covariates as in the lifespan analysis plus subject age. As with the previous disease associations, estimates were transformed so that positive associations indicate a protective or life-extending effect, and effect estimates for first-degree relatives were doubled. Estimates were meta-analysed between cohorts using inverse-variance weighting. Where estimates were meta-analysed across kin, standard errors were adjusted for correlation between family members by multiplying them by 1 + r for each correlation (r) with the reference kin (Equation 2), which appears slightly conservative. As correlations between family members' diseases were very low (range 0.0005 to 0.1048), in practice this adjustment had no effect.
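
The scaling and pooling steps can be sketched as follows. This is a hedged illustration: the doubling is applied on the log scale, inverse-variance weighting is the standard fixed-effect form, and the (1 + r) inflation is our reading of the kin adjustment, with Equation 2 of the paper as the authoritative definition. Function names are hypothetical.

```python
import math

def relative_to_own_effect(estimate: float, se: float):
    """Scale a first-degree relative's estimate (log hazard or log odds ratio)
    to the subject's own genotype. Regressing a relative's outcome on the
    subject's genotype halves the effect, so the estimate is doubled; the SE
    doubles with it, leaving the z-score unchanged."""
    return 2.0 * estimate, 2.0 * se

def inflate_se(se: float, r: float) -> float:
    """Assumed reading of the kin adjustment: multiply the SE by (1 + r),
    where r is the correlation with the reference relative."""
    return se * (1.0 + r)

def ivw_meta(estimates, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis: (pooled, SE)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))
```

With correlations as small as those reported (at most about 0.1), the inflation changes each SE by at most around 10%, consistent with the adjustment having no practical effect.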

URLs

MultiABEL: https://github.com/xiashen/MultiABEL/

LDSC: https://github.com/bulik/ldsc

SMR/HEIDI: https://cnsgenomics.com/software/smr/

SOJO: https://github.com/zhenin/sojo/

DEPICT: https://www.broadinstitute.org/mpg/depict/

PASCAL: https://www2.unil.ch/cbg/index.php?title=Pascal

GTEx: https://gtexportal.org/home/datasets

References

  1. 1
    Concordat and Moratorium on Genetics and Insurance
    1. Association of British Insurers and UK Government
    (2014)
    ABI.
  2. 2
    Genetic effects on gene expression across human tissues
    1. A Battle
    2. CD Brown
    3. BE Engelhardt
    4. SB Montgomery
    5. GTEx Consortium
    6. Laboratory, Data Analysis & Coordinating Center (LDACC)—Analysis Working Group
    7. Statistical Methods groups—Analysis Working Group
    8. Enhancing GTEx (eGTEx) groups
    9. NIH Common Fund
    10. NIH/NCI
    11. NIH/NHGRI
    12. NIH/NIMH
    13. NIH/NIDA
    14. Biospecimen Collection Source Site—NDRI
    15. Biospecimen Collection Source Site—RPCI
    16. Biospecimen Core Resource—VARI
    17. Brain Bank Repository—University of Miami Brain Endowment Bank
    18. Leidos Biomedical—Project Management
    19. ELSI Study
    20. Genome Browser Data Integration & Visualization—EBI
    21. Genome Browser Data Integration & Visualization—UCSC Genomics Institute, University of California Santa Cruz
    (2017)
    Nature 550:204.
    https://doi.org/10.1038/nature24277
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
    Regression models and Life-Tables
    1. DR Cox
    (1972)
    Journal of the Royal Statistical Society: Series B 34:187–202.
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
    Introduction to Quantitative Genetics
    1. DS Falconer
    2. TFC Mackay
    3. R Frankham
    (1996)
    Trends in genetics : TIG, Introduction to Quantitative Genetics, 12, Cell Press.
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
    Interim Management Report
    1. Legal & General Group PLC
    (2017)
    Legal and General Group.
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
    Rare and low-frequency coding variants alter human adult height
    1. E Marouli
    2. M Graff
    3. C Medina-Gomez
    4. KS Lo
    5. AR Wood
    6. TR Kjaer
    7. RS Fine
    8. Y Lu
    9. C Schurmann
    10. HM Highland
    11. S Rüeger
    12. G Thorleifsson
    13. AE Justice
    14. D Lamparter
    15. KE Stirrups
    16. V Turcot
    17. KL Young
    18. TW Winkler
    19. T Esko
    20. T Karaderi
    21. AE Locke
    22. NG Masca
    23. MC Ng
    24. P Mudgal
    25. MA Rivas
    26. S Vedantam
    27. A Mahajan
    28. X Guo
    29. G Abecasis
    30. KK Aben
    31. LS Adair
    32. DS Alam
    33. E Albrecht
    34. KH Allin
    35. M Allison
    36. P Amouyel
    37. EV Appel
    38. D Arveiler
    39. FW Asselbergs
    40. PL Auer
    41. B Balkau
    42. B Banas
    43. LE Bang
    44. M Benn
    45. S Bergmann
    46. LF Bielak
    47. M Blüher
    48. H Boeing
    49. E Boerwinkle
    50. CA Böger
    51. LL Bonnycastle
    52. J Bork-Jensen
    53. ML Bots
    54. EP Bottinger
    55. DW Bowden
    56. I Brandslund
    57. G Breen
    58. MH Brilliant
    59. L Broer
    60. AA Burt
    61. AS Butterworth
    62. DJ Carey
    63. MJ Caulfield
    64. JC Chambers
    65. DI Chasman
    66. YI Chen
    67. R Chowdhury
    68. C Christensen
    69. AY Chu
    70. M Cocca
    71. FS Collins
    72. JP Cook
    73. J Corley
    74. JC Galbany
    75. AJ Cox
    76. G Cuellar-Partida
    77. J Danesh
    78. G Davies
    79. PI de Bakker
    80. GJ de Borst
    81. S de Denus
    82. MC de Groot
    83. R de Mutsert
    84. IJ Deary
    85. G Dedoussis
    86. EW Demerath
    87. AI den Hollander
    88. JG Dennis
    89. E Di Angelantonio
    90. F Drenos
    91. M Du
    92. AM Dunning
    93. DF Easton
    94. T Ebeling
    95. TL Edwards
    96. PT Ellinor
    97. P Elliott
    98. E Evangelou
    99. AE Farmaki
    100. JD Faul
    101. MF Feitosa
    102. S Feng
    103. E Ferrannini
    104. MM Ferrario
    105. J Ferrieres
    106. JC Florez
    107. I Ford
    108. M Fornage
    109. PW Franks
    110. R Frikke-Schmidt
    111. TE Galesloot
    112. W Gan
    113. I Gandin
    114. P Gasparini
    115. V Giedraitis
    116. A Giri
    117. G Girotto
    118. SD Gordon
    119. P Gordon-Larsen
    120. M Gorski
    121. N Grarup
    122. ML Grove
    123. V Gudnason
    124. S Gustafsson
    125. T Hansen
    126. KM Harris
    127. TB Harris
    128. AT Hattersley
    129. C Hayward
    130. L He
    131. IM Heid
    132. K Heikkilä
    133. Ø Helgeland
    134. J Hernesniemi
    135. AW Hewitt
    136. LJ Hocking
    137. M Hollensted
    138. OL Holmen
    139. GK Hovingh
    140. JM Howson
    141. CB Hoyng
    142. PL Huang
    143. K Hveem
    144. MA Ikram
    145. E Ingelsson
    146. AU Jackson
    147. JH Jansson
    148. GP Jarvik
    149. GB Jensen
    150. MA Jhun
    151. Y Jia
    152. X Jiang
    153. S Johansson
    154. ME Jørgensen
    155. T Jørgensen
    156. P Jousilahti
    157. JW Jukema
    158. B Kahali
    159. RS Kahn
    160. M Kähönen
    161. PR Kamstrup
    162. S Kanoni
    163. J Kaprio
    164. M Karaleftheri
    165. SL Kardia
    166. F Karpe
    167. F Kee
    168. R Keeman
    169. LA Kiemeney
    170. H Kitajima
    171. KB Kluivers
    172. T Kocher
    173. P Komulainen
    174. J Kontto
    175. JS Kooner
    176. C Kooperberg
    177. P Kovacs
    178. J Kriebel
    179. H Kuivaniemi
    180. S Küry
    181. J Kuusisto
    182. M La Bianca
    183. M Laakso
    184. TA Lakka
    185. EM Lange
    186. LA Lange
    187. CD Langefeld
    188. C Langenberg
    189. EB Larson
    190. IT Lee
    191. T Lehtimäki
    192. CE Lewis
    193. H Li
    194. J Li
    195. R Li-Gao
    196. H Lin
    197. LA Lin
    198. X Lin
    199. L Lind
    200. J Lindström
    201. A Linneberg
    202. Y Liu
    203. Y Liu
    204. A Lophatananon
    205. J Luan
    206. SA Lubitz
    207. LP Lyytikäinen
    208. DA Mackey
    209. PA Madden
    210. AK Manning
    211. S Männistö
    212. G Marenne
    213. J Marten
    214. NG Martin
    215. AL Mazul
    216. K Meidtner
    217. A Metspalu
    218. P Mitchell
    219. KL Mohlke
    220. DO Mook-Kanamori
    221. A Morgan
    222. AD Morris
    223. AP Morris
    224. M Müller-Nurasyid
    225. PB Munroe
    226. MA Nalls
    227. M Nauck
    228. CP Nelson
    229. M Neville
    230. SF Nielsen
    231. K Nikus
    232. PR Njølstad
    233. BG Nordestgaard
    234. I Ntalla
    235. JR O'Connell
    236. H Oksa
    237. LM Loohuis
    238. RA Ophoff
    239. KR Owen
    240. CJ Packard
    241. S Padmanabhan
    242. CN Palmer
    243. G Pasterkamp
    244. AP Patel
    245. A Pattie
    246. O Pedersen
    247. PL Peissig
    248. GM Peloso
    249. CE Pennell
    250. M Perola
    251. JA Perry
    252. JR Perry
    253. TN Person
    254. A Pirie
    255. O Polasek
    256. D Posthuma
    257. OT Raitakari
    258. A Rasheed
    259. R Rauramaa
    260. DF Reilly
    261. AP Reiner
    262. F Renström
    263. PM Ridker
    264. JD Rioux
    265. N Robertson
    266. A Robino
    267. O Rolandsson
    268. I Rudan
    269. KS Ruth
    270. D Saleheen
    271. V Salomaa
    272. NJ Samani
    273. K Sandow
    274. Y Sapkota
    275. N Sattar
    276. MK Schmidt
    277. PJ Schreiner
    278. MB Schulze
    279. RA Scott
    280. MP Segura-Lepe
    281. S Shah
    282. X Sim
    283. S Sivapalaratnam
    284. KS Small
    285. AV Smith
    286. JA Smith
    287. L Southam
    288. TD Spector
    289. EK Speliotes
    290. JM Starr
    291. V Steinthorsdottir
    292. HM Stringham
    293. M Stumvoll
    294. P Surendran
    295. LM 't Hart
    296. KE Tansey
    297. JC Tardif
    298. KD Taylor
    299. A Teumer
    300. DJ Thompson
    301. U Thorsteinsdottir
    302. BH Thuesen
    303. A Tönjes
    304. G Tromp
    305. S Trompet
    306. E Tsafantakis
    307. J Tuomilehto
    308. A Tybjaerg-Hansen
    309. JP Tyrer
    310. R Uher
    311. AG Uitterlinden
    312. S Ulivi
    313. SW van der Laan
    314. AR Van Der Leij
    315. CM van Duijn
    316. NM van Schoor
    317. J van Setten
    318. A Varbo
    319. TV Varga
    320. R Varma
    321. DR Edwards
    322. SH Vermeulen
    323. H Vestergaard
    324. V Vitart
    325. TF Vogt
    326. D Vozzi
    327. M Walker
    328. F Wang
    329. CA Wang
    330. S Wang
    331. Y Wang
    332. NJ Wareham
    333. HR Warren
    334. J Wessel
    335. SM Willems
    336. JG Wilson
    337. DR Witte
    338. MO Woods
    339. Y Wu
    340. H Yaghootkar
    341. J Yao
    342. P Yao
    343. LM Yerges-Armstrong
    344. R Young
    345. E Zeggini
    346. X Zhan
    347. W Zhang
    348. JH Zhao
    349. W Zhao
    350. W Zhao
    351. H Zheng
    352. W Zhou
    353. JI Rotter
    354. M Boehnke
    355. S Kathiresan
    356. MI McCarthy
    357. CJ Willer
    358. K Stefansson
    359. IB Borecki
    360. DJ Liu
    361. KE North
    362. NL Heard-Costa
    363. TH Pers
    364. CM Lindgren
    365. C Oxvig
    366. Z Kutalik
    367. F Rivadeneira
    368. RJ Loos
    369. TM Frayling
    370. JN Hirschhorn
    371. P Deloukas
    372. G Lettre
    373. EPIC-InterAct Consortium
    374. CHD Exome+ Consortium
    375. ExomeBP Consortium
    376. T2D-Genes Consortium
    377. GoT2D Genes Consortium
    378. Global Lipids Genetics Consortium
    379. ReproGen Consortium
    380. MAGIC Investigators
    (2017)
    Nature 542:186–190.
    https://doi.org/10.1038/nature21039
  44. 44
    Global Health Estimates 2016: Deaths by Cause Age, Sex, by Country and by Region
    1. C Mathers
    2. GA Stevens
    3. WR Mahanani
    4. DM Fat
    5. D Hogan
    (2018)
    WHO.
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
    Shortening of life span and causes of excess mortality in a population-based series of subjects with rheumatoid arthritis
    1. R Myllykangas-Luosujärvi
    2. K Aho
    3. H Kautiainen
    4. H Isomäki
    (1995)
    Clinical and Experimental Rheumatology 13:149–153.
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
    Pleiotropy, natural selection, and the evolution of senescence
    1. GC Williams
    (1957)
    Evolution 11:398–411.
  72. 72
    Defining the role of common variation in the genomic and biological architecture of adult human height
    1. AR Wood
    2. T Esko
    3. J Yang
    4. S Vedantam
    5. TH Pers
    6. S Gustafsson
    7. AY Chu
    8. K Estrada
    9. J Luan
    10. Z Kutalik
    11. N Amin
    12. ML Buchkovich
    13. DC Croteau-Chonka
    14. FR Day
    15. Y Duan
    16. T Fall
    17. R Fehrmann
    18. T Ferreira
    19. AU Jackson
    20. J Karjalainen
    21. KS Lo
    22. AE Locke
    23. R Mägi
    24. E Mihailov
    25. E Porcu
    26. JC Randall
    27. A Scherag
    28. AA Vinkhuyzen
    29. HJ Westra
    30. TW Winkler
    31. T Workalemahu
    32. JH Zhao
    33. D Absher
    34. E Albrecht
    35. D Anderson
    36. J Baron
    37. M Beekman
    38. A Demirkan
    39. GB Ehret
    40. B Feenstra
    41. MF Feitosa
    42. K Fischer
    43. RM Fraser
    44. A Goel
    45. J Gong
    46. AE Justice
    47. S Kanoni
    48. ME Kleber
    49. K Kristiansson
    50. U Lim
    51. V Lotay
    52. JC Lui
    53. M Mangino
    54. I Mateo Leach
    55. C Medina-Gomez
    56. MA Nalls
    57. DR Nyholt
    58. CD Palmer
    59. D Pasko
    60. S Pechlivanis
    61. I Prokopenko
    62. JS Ried
    63. S Ripke
    64. D Shungin
    65. A Stancáková
    66. RJ Strawbridge
    67. YJ Sung
    68. T Tanaka
    69. A Teumer
    70. S Trompet
    71. SW van der Laan
    72. J van Setten
    73. JV Van Vliet-Ostaptchouk
    74. Z Wang
    75. L Yengo
    76. W Zhang
    77. U Afzal
    78. J Arnlöv
    79. GM Arscott
    80. S Bandinelli
    81. A Barrett
    82. C Bellis
    83. AJ Bennett
    84. C Berne
    85. M Blüher
    86. JL Bolton
    87. Y Böttcher
    88. HA Boyd
    89. M Bruinenberg
    90. BM Buckley
    91. S Buyske
    92. IH Caspersen
    93. PS Chines
    94. R Clarke
    95. S Claudi-Boehm
    96. M Cooper
    97. EW Daw
    98. PA De Jong
    99. J Deelen
    100. G Delgado
    101. JC Denny
    102. R Dhonukshe-Rutten
    103. M Dimitriou
    104. AS Doney
    105. M Dörr
    106. N Eklund
    107. E Eury
    108. L Folkersen
    109. ME Garcia
    110. F Geller
    111. V Giedraitis
    112. AS Go
    113. H Grallert
    114. TB Grammer
    115. J Gräßler
    116. H Grönberg
    117. LC de Groot
    118. CJ Groves
    119. J Haessler
    120. P Hall
    121. T Haller
    122. G Hallmans
    123. A Hannemann
    124. CA Hartman
    125. M Hassinen
    126. C Hayward
    127. NL Heard-Costa
    128. Q Helmer
    129. G Hemani
    130. AK Henders
    131. HL Hillege
    132. MA Hlatky
    133. W Hoffmann
    134. P Hoffmann
    135. O Holmen
    136. JJ Houwing-Duistermaat
    137. T Illig
    138. A Isaacs
    139. AL James
    140. J Jeff
    141. B Johansen
    142. Å Johansson
    143. J Jolley
    144. T Juliusdottir
    145. J Junttila
    146. AN Kho
    147. L Kinnunen
    148. N Klopp
    149. T Kocher
    150. W Kratzer
    151. P Lichtner
    152. L Lind
    153. J Lindström
    154. S Lobbens
    155. M Lorentzon
    156. Y Lu
    157. V Lyssenko
    158. PK Magnusson
    159. A Mahajan
    160. M Maillard
    161. WL McArdle
    162. CA McKenzie
    163. S McLachlan
    164. PJ McLaren
    165. C Menni
    166. S Merger
    167. L Milani
    168. A Moayyeri
    169. KL Monda
    170. MA Morken
    171. G Müller
    172. M Müller-Nurasyid
    173. AW Musk
    174. N Narisu
    175. M Nauck
    176. IM Nolte
    177. MM Nöthen
    178. L Oozageer
    179. S Pilz
    180. NW Rayner
    181. F Renstrom
    182. NR Robertson
    183. LM Rose
    184. R Roussel
    185. S Sanna
    186. H Scharnagl
    187. S Scholtens
    188. FR Schumacher
    189. H Schunkert
    190. RA Scott
    191. J Sehmi
    192. T Seufferlein
    193. J Shi
    194. K Silventoinen
    195. JH Smit
    196. AV Smith
    197. J Smolonska
    198. AV Stanton
    199. K Stirrups
    200. DJ Stott
    201. HM Stringham
    202. J Sundström
    203. MA Swertz
    204. AC Syvänen
    205. BO Tayo
    206. G Thorleifsson
    207. JP Tyrer
    208. S van Dijk
    209. NM van Schoor
    210. N van der Velde
    211. D van Heemst
    212. FV van Oort
    213. SH Vermeulen
    214. N Verweij
    215. JM Vonk
    216. LL Waite
    217. M Waldenberger
    218. R Wennauer
    219. LR Wilkens
    220. C Willenborg
    221. T Wilsgaard
    222. MK Wojczynski
    223. A Wong
    224. AF Wright
    225. Q Zhang
    226. D Arveiler
    227. SJ Bakker
    228. J Beilby
    229. RN Bergman
    230. S Bergmann
    231. R Biffar
    232. J Blangero
    233. DI Boomsma
    234. SR Bornstein
    235. P Bovet
    236. P Brambilla
    237. MJ Brown
    238. H Campbell
    239. MJ Caulfield
    240. A Chakravarti
    241. R Collins
    242. FS Collins
    243. DC Crawford
    244. LA Cupples
    245. J Danesh
    246. U de Faire
    247. HM den Ruijter
    248. R Erbel
    249. J Erdmann
    250. JG Eriksson
    251. M Farrall
    252. E Ferrannini
    253. J Ferrières
    254. I Ford
    255. NG Forouhi
    256. T Forrester
    257. RT Gansevoort
    258. PV Gejman
    259. C Gieger
    260. A Golay
    261. O Gottesman
    262. V Gudnason
    263. U Gyllensten
    264. DW Haas
    265. AS Hall
    266. TB Harris
    267. AT Hattersley
    268. AC Heath
    269. C Hengstenberg
    270. AA Hicks
    271. LA Hindorff
    272. AD Hingorani
    273. A Hofman
    274. GK Hovingh
    275. SE Humphries
    276. SC Hunt
    277. E Hypponen
    278. KB Jacobs
    279. MR Jarvelin
    280. P Jousilahti
    281. AM Jula
    282. J Kaprio
    283. JJ Kastelein
    284. M Kayser
    285. F Kee
    286. SM Keinanen-Kiukaanniemi
    287. LA Kiemeney
    288. JS Kooner
    289. C Kooperberg
    290. S Koskinen
    291. P Kovacs
    292. AT Kraja
    293. M Kumari
    294. J Kuusisto
    295. TA Lakka
    296. C Langenberg
    297. L Le Marchand
    298. T Lehtimäki
    299. S Lupoli
    300. PA Madden
    301. S Männistö
    302. P Manunta
    303. A Marette
    304. TC Matise
    305. B McKnight
    306. T Meitinger
    307. FL Moll
    308. GW Montgomery
    309. AD Morris
    310. AP Morris
    311. JC Murray
    312. M Nelis
    313. C Ohlsson
    314. AJ Oldehinkel
    315. KK Ong
    316. WH Ouwehand
    317. G Pasterkamp
    318. A Peters
    319. PP Pramstaller
    320. JF Price
    321. L Qi
    322. OT Raitakari
    323. T Rankinen
    324. DC Rao
    325. TK Rice
    326. M Ritchie
    327. I Rudan
    328. V Salomaa
    329. NJ Samani
    330. J Saramies
    331. MA Sarzynski
    332. PE Schwarz
    333. S Sebert
    334. P Sever
    335. AR Shuldiner
    336. J Sinisalo
    337. V Steinthorsdottir
    338. RP Stolk
    339. JC Tardif
    340. A Tönjes
    341. A Tremblay
    342. E Tremoli
    343. J Virtamo
    344. MC Vohl
    345. P Amouyel
    346. FW Asselbergs
    347. TL Assimes
    348. M Bochud
    349. BO Boehm
    350. E Boerwinkle
    351. EP Bottinger
    352. C Bouchard
    353. S Cauchi
    354. JC Chambers
    355. SJ Chanock
    356. RS Cooper
    357. PI de Bakker
    358. G Dedoussis
    359. L Ferrucci
    360. PW Franks
    361. P Froguel
    362. LC Groop
    363. CA Haiman
    364. A Hamsten
    365. MG Hayes
    366. J Hui
    367. DJ Hunter
    368. K Hveem
    369. JW Jukema
    370. RC Kaplan
    371. M Kivimaki
    372. D Kuh
    373. M Laakso
    374. Y Liu
    375. NG Martin
    376. W März
    377. M Melbye
    378. S Moebus
    379. PB Munroe
    380. I Njølstad
    381. BA Oostra
    382. CN Palmer
    383. NL Pedersen
    384. M Perola
    385. L Pérusse
    386. U Peters
    387. JE Powell
    388. C Power
    389. T Quertermous
    390. R Rauramaa
    391. E Reinmaa
    392. PM Ridker
    393. F Rivadeneira
    394. JI Rotter
    395. TE Saaristo
    396. D Saleheen
    397. D Schlessinger
    398. PE Slagboom
    399. H Snieder
    400. TD Spector
    401. K Strauch
    402. M Stumvoll
    403. J Tuomilehto
    404. M Uusitupa
    405. P van der Harst
    406. H Völzke
    407. M Walker
    408. NJ Wareham
    409. H Watkins
    410. HE Wichmann
    411. JF Wilson
    412. P Zanen
    413. P Deloukas
    414. IM Heid
    415. CM Lindgren
    416. KL Mohlke
    417. EK Speliotes
    418. U Thorsteinsdottir
    419. I Barroso
    420. CS Fox
    421. KE North
    422. DP Strachan
    423. JS Beckmann
    424. SI Berndt
    425. M Boehnke
    426. IB Borecki
    427. MI McCarthy
    428. A Metspalu
    429. K Stefansson
    430. AG Uitterlinden
    431. CM van Duijn
    432. L Franke
    433. CJ Willer
    434. AL Price
    435. G Lettre
    436. RJ Loos
    437. MN Weedon
    438. E Ingelsson
    439. JR O'Connell
    440. GR Abecasis
    441. DI Chasman
    442. ME Goddard
    443. PM Visscher
    444. JN Hirschhorn
    445. TM Frayling
    446. Electronic Medical Records and Genomics (eMERGE) Consortium
    447. MIGen Consortium
    448. PAGE Consortium
    449. LifeLines Cohort Study
    (2014)
    Nature Genetics 46:1173–1186.
    https://doi.org/10.1038/ng.3097
  73. 73
  74. 74
  75. 75
  76. 76

Decision letter

In the interests of transparency, eLife includes the editorial decision letter, peer reviews, and accompanying author responses.

Thank you for submitting your article "Genomic underpinnings of lifespan allow prediction and reveal basis in modern risks" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Mark McCarthy as the Senior Editor. The following individual involved in review of your submission has agreed to reveal his identity: Joris Deelen (Reviewer #3). The other reviewers remain anonymous.

The Reviewing Editor has highlighted the concerns that require revision and/or responses, and we have included the separate reviews below for your consideration. If you have any questions, please do not hesitate to contact us.

The major concerns are as follows:

– We found the paper was often hard to read; it would benefit from substantial rewriting to improve clarity, being more succinct at some points and providing additional necessary details on other occasions.

– We recommend that the authors limit the discussion around the (non)replication of results, and focus more attention on the subset of loci that show significant association.

– We felt that (as has been shown repeatedly) a one-stage approach – using a more stringent significance threshold – would likely provide more statistical power for discovery, than the current multiple-stage design. It would also simplify and focus the reporting, allowing more in-depth analysis and discussion of each (robustly) associated locus.

– We recommend that UK Biobank analyses use appropriate statistical tools to account for population stratification and/or (cryptic) relatedness (e.g. BOLT-LMM), and that any overlap between samples is removed when combining data.

– We recommend that analyses indicating significant sex-specific effects include both sex-interaction and sex-stratified analyses.

– We felt strongly that there was an opportunity for more in-depth discussion to increase the informativeness of the study: see reviewer comments for suggestions.

– An important topic that has not been fully worked out relates to the proportion of the genetic basis of longevity that is explained by avoiding disease vs other mechanisms; i.e. is longevity simply about avoiding disease?

Separate reviews (please include a response on each point):

Reviewer #1:

This study reports on the genetic underpinnings of lifespan using survival data from 635,205 parents of participants in the UK Biobank (British ancestry only) and 377,035 parents of participants in LifeGen of European ancestry. A total of 6 novel loci were identified, while another 6 loci from previous studies were replicated. Nevertheless, several other loci previously reported for longevity were not replicated in the current study of lifespan. Consistent with previous reports, the lifespan-increasing alleles of several of the lead variants have protective effects on cardiovascular disease (and its risk factors), type 2 diabetes, COPD, and lung cancer. Surprisingly, lifespan-increasing alleles were not associated with a lower risk of other cancers. Pathway enrichment analyses confirmed the importance of pathways related to lipoprotein particles, but also highlighted new pathways involved in vesicle transport, metabolism of acylglycerol and sterols, and synaptic and dendritic function. Using a polygenic risk score based on the lifespan loci, the authors estimate that individuals in the top decile (carrying the most lifespan-increasing alleles) live on average 5 years longer than individuals in the bottom decile (carrying the fewest lifespan-increasing alleles).

General comments

Strengths:

The authors have combined a massive amount of information using a kin-cohort design, in which genotype information of participants and phenotype data of their parents are used for analyses. In the context of longevity, this approach allows more complete information to be used than the phenotype (age at death) data of participants themselves.

Weaknesses:

The paper is generally hard to read; the text is often long-winded, yet on many occasions critical details that readers need to interpret the results are missing. Along the same lines, some interpretations by the authors lack the background readers need to follow their reasoning. The paper will require substantial rewriting to get the valuable information across. I have provided some suggestions in the "specific comments" section.

The study design is confusing and seems generally inefficient.

– For example, why were >70,000 related individuals (i.e. >17% of the population) removed from analyses – methods exist to use the information of related individuals (e.g. BOLT-LMM).

– The kin-cohort is an interesting design; how does it deal with (or take advantage of) potential assortative mating?

– The discovery cohort consists of the "British ancestry" population. It is not clear whether these individuals are genetically homogeneous, or whether they are all of European (or other) ancestry. Did the authors confirm their ancestry genetically? Why were the remaining European ancestry individuals not included in discovery?

– Why did the authors opt for a two-stage approach? As they seem to have limited statistical power, a full meta-analysis of all available data, possibly with a more stringent significance threshold, might have been more effective for discovering new loci and for replicating previously reported loci.

– The authors performed conditional analyses after already performing many other analyses on the identified loci, and then identified several additional independent loci. Conditional analyses should have been performed immediately after the identification of the 12 loci, so that the additional loci could have been included in the follow-up analyses. Also, more detail needs to be provided on how analyses were performed in the HLA locus; given the complexity of the HLA locus, it requires analyses beyond typical GWAS analyses.

– It is disappointing that no mention was made of non-European ancestry populations. While they may have been underpowered on their own, they may have increased statistical power in overall analyses.

– I am surprised by the relatively low statistical power of the current study – studies that include 1 million variants typically identify hundred(s) of genetic loci, even when the narrow-sense heritability is relatively low. The identification of just 12 loci seems disappointing; some discussion on potential reasons of low power and low yield would be informative.

– The authors spend a lot of time/text listing the variants they identified in discovery, which of those did or did not replicate, how they compare to previously identified loci, and how previously identified loci perform in the current analyses. Readers will get lost; what is needed is a much more succinct, clear-cut section on which loci are robustly associated with lifespan. It is also not clear when variants are considered replicated and what the significance thresholds are. The result is a cluttered list of replicated and non-replicated variants, and by the end of this section, I had no idea what the main findings were.

– The authors report on a number of analyses that turn out to be side-tracks; i.e. reporting (and discussing) variants that did not reach genome-wide significance or that were not replicated is misleading to readers. Why perform analyses that the authors then conclude are biased, only to redo them in an unbiased manner? Simply report the unbiased results/analyses.

– The authors state (also in the title) that the genetic findings (mainly related to the PRS) are "predictive". However, I have a feeling that the word "prediction/predictive" is not used in the correct context. The PRS is "associated" with lifespan, but the authors did not provide data to show that the PRS predicts lifespan (or extreme longevity). The wording (also in the title) (or the analyses) should be corrected.

– The section on implication of causal genes and methylation sites is confusing. Please provide more details on how the causal genes were identified. In fact, a section on fine-mapping would be informative (e.g. with credible sets).

– Given the vast amount of data, it would also be of interest to have a focused analysis of rare coding variants. Alternatively, an analysis of (common and rare) variants in relation to the extremes of the "lifespan" distribution could be informative.

Specific comments:

Title: The title is not clear; no prediction analyses were performed, and what is meant by "modern risk"?

Introduction: The introduction is too long and wordy, yet often confusing, as several aspects are not clear. For example, it should be made clear what the difference is between longevity, lifespan, etc. The kin-cohort design may need some explanation as well. Heritability estimates are only reported as variance explained in GWAS, but what about heritability estimated using family and twin designs?

Results: As mentioned before, it is hard to keep track of which loci are considered "real" and which ones are not. Be more succinct and focus on the key (replicated) findings, no need to provide extensive detail on loci that turn out to be not "real".

Also, there is no need to state over and again that results are not significant because 95%CI overlap with the "0", as that's part of the definition of significance.

The power calculations are not very informative; i.e. the combined analysis had 50% power to detect very common variants (MAF>30%) with an effect size of 0.25 years? 50% is low power. Why not provide the data for 80 or 90% power?

Table 1 is also confusing, as it reports many loci that were not replicated. As this is a main table, readers will want to know which of the loci were truly discovered. Hence, the table should report only the results of the replicated variants; others can be reported in supplementary tables. Also, the legend (of this table and others) is enormous; there is no need to repeat what has been written elsewhere. Too much redundant information is reported in the main table, as CES P, PDES P and iGWAS P all report the same in this case. Instead of reporting all these statistics in the main table, the authors should consider reporting the most informative results and reporting others (very) briefly in the text or in supplementary tables.

What is meant by "having converted the recessive effect into an apparent effect for a truly recessive allele"?

The iGWAS needs some explanation in the main text. Based on the highlights in Table 2—source data 2, it seems BMI was not included in the iGWAS? If so, please clarify. Also, why were significant results for BMI not highlighted?

Stating that the narrow-sense heritability was increased "by" 79% is misleading, and also uninformative when you do not state the value from which it increased. Be straightforward and report the values from which and to which the narrow-sense heritability increased.

Provide more details on how cell type and pathway enrichment analyses were performed in the main text.

The so-called "out of sample lifespan predictions" should not be labeled as predictions, as no predictive analyses were performed. The analyses reported in this section seem to be a simple validation of a PRS in another population. Some discussion of why the difference between top and bottom deciles is smaller in the Estonian population would be informative.

Discussion: I would be interested in a greater discussion of why findings from extreme longevity studies are not (all) replicated in the lifespan studies and vice versa. [No need to speculate on the statistical power, as this can be calculated.] In addition, more insight into why some genetic variants do not seem to affect lifespan would be of interest too. The authors may consider working with PRS and also estimating genetic correlations.

Reviewer #2:

The manuscript by Timmers et al. describes a multi-study GWAS for parental age at death, a trait which has been shown to efficiently identify loci for longevity. Although the novelty of this paper is compromised by other similar papers using highly overlapping datasets, the authors should be commended for their rigorous analytical approach and attention to detail. I did, however, feel at times that clarity of story was sacrificed to demonstrate this rigor, and would recommend that the (main) text be re-read and simplified where possible.

I do feel that the heavy focus on the lack of replication of one of the previous papers in this area is a bit overstated. Pilling et al. (I'm not connected to this study) did evaluate a polygenic risk score (PRS) in an external dataset, although they were not able to replicate individual variants. Although this current paper does a far better job in that respect, it is still the case that only a very small fraction of the loci presented in main Table 1 replicate in the true sense of the word (i.e. P < 0.05 / N). Whilst I definitely don't dispute that replication is important, many large-scale discovery GWAS (for example GIANT, Locke et al.) do not replicate individual SNPs, but may instead evaluate PRS performance in other studies. Ultimately the nature of replication may matter more when considering what other investigators do with these results downstream. For example, the individual variant level of significance is more important when considering experimental follow-up than inclusion in a PRS.

Given many of the loci in Table 1 come from the discovery + replication combined analysis, it may be more appropriate to present this as the primary analysis. This would simplify much of the manuscript and allow for a more detailed presentation of the more interesting analyses (some of which were buried in supplemental sections and not described well in the main text). This doesn't preclude first evaluating/replicating the loci previously reported by Pilling et al.

When considering why this study had identified fewer loci than Pilling et al., I noticed that only unrelated individuals were included. This must result in quite a large sample drop, which seems unnecessary. It may also increase novelty to rerun this analysis using the full (v3) UKBB imputation (rather than v2) and include the X-chromosome (I couldn't tell if this had been included).

Can the authors estimate what proportion of the heritable component of lifespan is explained by avoiding disease (i.e. identifying protective alleles) vs unexplained mechanisms? This seems key for considering future studies in this area, and I wonder if there are other ways this could be more directly assessed?

Reviewer #3:

The paper by Joshi et al. reports the results from a GWAS on parental lifespan, which consists of a discovery (UK Biobank) and replication phase (LifeGen). The authors identified multiple novel loci for parental lifespan when combining the discovery and replication phase. In addition, they were able to replicate several previously identified loci for parental lifespan as well as longevity. They subsequently perform follow-up analyses (including polygenic risk score analyses) to pinpoint the affected pathways as well as relations with other phenotypes and show a relation between parental lifespan and several age-related diseases.

Major comments:

1) Instead of reporting the most significant variant within a locus, it would be more appropriate to report the most-likely causal variant as well as independent genetic variants within each locus. This is common practice in GWAS.

2) The authors performed a meta-analysis of three published longevity GWAS to use for replication of their findings (subsection “Mortality risk factor-informed GWAS (iGWAS)”). They thereby assumed that these three studies looked at the same phenotype, which is actually not the case: one looked at mortality and the other two at longevity (one using dead controls only and the other dead + alive controls). In addition, the cohorts used in these three GWAS partly overlap, so they cannot be analyzed together. Instead, the authors should look at replication separately in each of the studies and update Figure 4 and Figure 4—source data 1 accordingly.

3) If the authors assume that there may be sex-specific effects, why have they not performed a sex-stratified analysis to see if this is indeed the case? They currently only looked at sex-specific effects for their top hits.

4) The out-of-sample lifespan predictions actually highlight an important limitation of using the parental age at death as proxy for an individual's own age at death, since the polygenic risk score works less well in the subjects as compared to their parents. This should be discussed in more detail in the Discussion section.

5) The fact that a higher polygenic risk score results in an increased prevalence of Alzheimer's disease, Parkinson's disease, and prostate and breast cancer shows that the majority of the identified genetic variants are likely not related to healthy ageing, but only to lifespan. Hence, the question arises if this is what we are aiming for with our efforts to unravel the genetics of longevity. This should be discussed in more detail in the Discussion section.

6) I think the authors should mention that the prediction that SESN1 is the gene driving the effect at the FOXO3A locus is likely incorrect, since the functional study shows that the variant at this locus has a direct effect on the functioning of FOXO3A itself. This shows that predictions can be informative but are not always supported by functional evidence.

7) I think the part in the Discussion section about the comparison between lifespan and annuity pricing is far-fetched and does not add anything to the manuscript. In addition, I do not agree with the conclusion that the results from the polygenic risk score are "meaningful socially and actuarially", since these results are not that convincing (see comment 6). I would advise rewriting or even completely removing this section.

Minor Comments:

– The Title of the manuscript is a bad representation of the results described in the manuscript, especially the part "reveal basis in modern risks", which is merely speculative. Hence, the authors should come up with a Title that better carries the load.

– Introduction section: The narrow-sense heritability for longevity mentioned in the study by Kaplanis et al. is 12.2% and since they compared this heritability with the previously published ones, it would be good to also mention that one here (instead of the 16.1%).

– Results paragraph two: Reference 10 should be reference 7.

– I think the word "causal" in the opening sentence of subsection “Implication of causal genes and methylation sites” is misleading, since the authors are not able to prove causality for their variants with the analyses they performed.

– The numbering of the Supplementary Tables is not in line with the text.

https://doi.org/10.7554/eLife.39856.047

Author response

The major concerns are as follows:

– We found the paper was often hard to read and would benefit from substantial re-writing to improve clarity, being more succinct at some points, and providing additional necessary details on other occasions.

With hindsight, we agree. The original version followed too closely our results development process, rather than looking at the end results holistically. We have revised the manuscript substantially, and focused results and discussion as suggested in specific reviewer comments.

– We recommend that the authors limit the discussion around the (non)replication of results, and focus more attention on the subset of loci that show significant association.

Yes, this has been done. Indeed, much of the non-replication aspects have been superseded by the dropping of the multi-stage design (see next editor comment).

– We felt that (as has been shown repeatedly) a one-stage approach – using a more stringent significance threshold – would likely provide more statistical power for discovery, than the current multiple-stage design. It would also simplify and focus the reporting, allowing more in-depth analysis and discussion of each (robustly) associated locus.

We accept this suggestion. As suggested the results do appear robust and the presentation is much more straightforward, i.e. we changed our two-stage discovery and replication into a single stage. To determine an adequate significance threshold, we looked for GWAS studies published in 2018 and found studies only adjusted their threshold when analysing multiple traits or rare variants (see included document). We chose to apply a threshold of 2.5e-8 to account for multiple testing and validate our results using a polygenic risk score. The change affects much of the manuscript, but is most obvious on Table 1.

– We recommend that UK Biobank analyses use appropriate statistical tools to account for population stratification and/or (cryptic) relatedness (e.g. BOLT-LMM);

We agree LMM is often useful in UK Biobank, but judge that the kin-cohort method is an exception. We have now explained this in subsections “GWAS” and “Data sources”.

and that any overlap between samples is removed when combining data.

Agreed, this was a misunderstanding on our part. However, as we do not have access to the cohort level data from the meta-analyses performed by other researchers, we have instead adjusted standard errors to reflect the overlap. See subsections “Mortality risk factor-informed GWAS (iGWAS)”, “Candidate SNP replication” and “Replication in extreme long-livedness”.
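For readers unfamiliar with overlap adjustment: one generic way to combine two association estimates whose samples overlap is to treat them as correlated and pool them by generalized least squares. The sketch below assumes the correlation `rho` between the two estimates is already known (for example, approximated from the number of shared individuals); it illustrates the general technique only, is not the authors' exact procedure, and the function name is hypothetical.

```python
import math


def combine_correlated(b1, se1, b2, se2, rho):
    """Combine two estimates of the same effect whose sampling errors are correlated.

    rho is the assumed correlation between the two estimates (e.g. induced by
    shared samples). With rho = 0 this reduces to inverse-variance weighting.
    Returns (combined_beta, combined_se).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    v1, v2 = se1 ** 2, se2 ** 2
    c = rho * se1 * se2                  # covariance between the two estimates
    denom = v1 + v2 - 2 * c              # positive whenever |rho| < 1
    w1 = (v2 - c) / denom                # GLS weights; they sum to 1
    w2 = (v1 - c) / denom
    beta = w1 * b1 + w2 * b2
    var = (v1 * v2 - c ** 2) / denom     # 1 / (1' S^-1 1) for the 2x2 covariance S
    return beta, math.sqrt(var)


# Ignoring overlap (rho = 0) versus accounting for it (rho = 0.5)
print(combine_correlated(0.10, 0.02, 0.12, 0.02, rho=0.0))
print(combine_correlated(0.10, 0.02, 0.12, 0.02, rho=0.5))
```

With positive correlation the pooled standard error is larger than under the independence assumption, which is why ignoring sample overlap tends to overstate precision.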

– We recommend that analyses indicating significant sex-specific effects include both sex-interaction and sex-stratified analyses.

We apologise: our terminology Common Effect Size (CES) and Potentially Different Effect Size (PDES) was confusing. Our approach was to start with sex-stratified analyses (mothers and fathers). The meta-analysis of these results, assuming a single common effect size, was thus an unstratified analysis. However, in the PDES analysis, the use of MANOVA meant that jointly analysing the father and mother traits was a generalised form of sex stratification. In particular, an effect in only one sex would be apparent, albeit with an appropriate adjustment to the P value to reflect the implicit multiple testing. This approach has the advantage of permitting antagonistic pleiotropy across sexes or (as we in fact observe) different effect sizes in the same direction to be combined, with proper adjustment for the implicit multiple correlated tests, akin to Fisher's method for combining P values from independent tests. We have amended the analysis names to "Sex Common Effect Size" and "Sex Specific Effect Size", i.e. stratified in the latter case. See subsections “Genome-wide association analysis”, paragraph two of “GWAS” and “UK Biobank Genome-Wide Association Study”.
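The idea of a joint test that tolerates different effects in mothers and fathers can be sketched with textbook statistics. The sketch assumes per-SNP z-scores for mothers and fathers with a known correlation `r` (the two estimates share the participant's genotype, so they need not be independent). Under the null, z' R^-1 z follows a chi-square distribution with 2 degrees of freedom, so an effect in either sex, or effects of different size, contribute to the statistic. Fisher's method for two independent P values is included for comparison. This is an illustration of the general approach, not the authors' MANOVA implementation, and all names are hypothetical.

```python
import math


def two_df_p(z_mother, z_father, r=0.0):
    """P value of a joint 2-df test of (z_mother, z_father).

    T = z' R^-1 z, where R is the 2x2 correlation matrix [[1, r], [r, 1]].
    Under the null T ~ chi-square(2), whose survival function is exp(-T / 2).
    Allows the two sexes to have different (even opposite) effects.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    t = (z_mother ** 2 - 2 * r * z_mother * z_father + z_father ** 2) / (1 - r ** 2)
    return math.exp(-t / 2)


def fisher_two_p(p1, p2):
    """Fisher's method for two independent P values.

    X = -2 (ln p1 + ln p2) ~ chi-square(4) under the null, and the
    chi-square(4) survival function is exp(-X / 2) * (1 + X / 2).
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2) * (1 + x / 2)


# An effect in fathers only still yields a small joint P value
print(two_df_p(0.0, 6.0, r=0.1))
print(fisher_two_p(0.05, 0.05))
```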

– We felt strongly that there was an opportunity for more in-depth discussion to increase the informativeness of the study: see reviewer comments for suggestions.

Thank you. We have expanded our discussion significantly.

– An important topic that has not been fully worked out relates to the proportion of the genetic basis of longevity that is explained by avoiding disease vs other mechanisms; i.e. is longevity simply about avoiding disease?

This is a good question, but our data and results can only provide a glimpse of the answer. We have calculated the proportion of heritability explained by the disease SNPs (see paragraph six of “Disease and lifespan”). This proportion is limited, i.e. much of the h2 remains missing, and it is not yet possible to say how it splits between disease and aging.
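As a rough guide to how such a proportion can be computed: under an additive model with Hardy-Weinberg equilibrium, a SNP with effect `beta` and allele frequency `p` explains `2p(1-p)beta^2` of the trait variance. Summing over approximately independent SNPs and dividing by the narrow-sense heritability gives the share of h2 they account for. The sketch assumes independent SNPs and a known trait variance; it is a generic illustration with hypothetical inputs and names, not the authors' calculation.

```python
def variance_explained(snps, trait_variance):
    """Fraction of trait variance explained by independent additive SNPs.

    snps: iterable of (allele_frequency, beta) pairs, beta in trait units.
    Assumes Hardy-Weinberg equilibrium and no LD between SNPs.
    """
    total = sum(2 * p * (1 - p) * beta ** 2 for p, beta in snps)
    return total / trait_variance


def fraction_of_heritability(snps, trait_variance, h2):
    """Share of the narrow-sense heritability h2 accounted for by the SNPs."""
    return variance_explained(snps, trait_variance) / h2


# Hypothetical disease SNPs (frequency, effect) on a unit-variance trait
disease_snps = [(0.5, 0.05), (0.2, 0.03)]
print(fraction_of_heritability(disease_snps, trait_variance=1.0, h2=0.1))
```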

Separate reviews (please include a response on each point):

Reviewer #1:

[…] The study design is confusing and seems generally inefficient.

– For example, why were >70,000 related individuals (i.e. >17% of the population) removed from analyses – methods exist to use the information of related individuals (e.g. BOLT-LMM).

Please, see above response to the editor.

– The kin-cohort is an interesting design; how does it deal with (or take advantage of) potential assortative mating?

Yes, comment added in discussion, see paragraph seven.

– The discovery cohort consists of the "British ancestry" population. It is not clear whether these individuals are genetically homogeneous, or whether they are all of European (or other) ancestry. Did the authors confirm their ancestry genetically? Why were the remaining European ancestry individuals not included in discovery?

Although somewhat superseded by the dropping of the two-stage approach, we have clarified ancestry in subsection “GWAS” and “Data sources”, and do include European ancestry.

– Why did the authors opt for a two-stage approach? As they seem to have limited statistical power, a full meta-analysis of all available data, possibly with a more stringent significance threshold, might have been more effective for discovering new loci and for replicating previously reported loci.

Accepted, please see the editor response above.

– The authors performed conditional analyses after already performing many other analyses on the identified loci, and then identified several additional independent loci. Conditional analyses should have been performed immediately after the identification of the 12 loci, so that the additional loci could have been included in the follow-up analyses. Also, more detail needs to be provided on how analyses were performed in the HLA locus; given the complexity of the HLA locus, it requires analyses beyond typical GWAS analyses.

In line with the previous comment on multiple, potentially biased analyses, we have replaced the PheWAS with a disease lookup in the GWAS catalog and PhenoScanner. This new analysis looks for associations with multiple SNPs in the same locus, and as such including conditional SNPs is no longer relevant.

– It is disappointing that no mention was made of non-European ancestry populations. While they may have been underpowered on their own, they may have increased statistical power in overall analyses.

Yes we could have done this, but the non-European ancestry components were very small in UK Biobank and initial analyses on these sub-cohorts appeared noisy. Trans-ethnic meta-analysis was therefore beyond the scope of this work.

– I am surprised by the relatively low statistical power of the current study – studies that include 1 million variants typically identify hundred(s) of genetic loci, even when the narrow-sense heritability is relatively low. The identification of just 12 loci seems disappointing; some discussion on potential reasons of low power and low yield would be informative.

Yes, we were surprised too, although some general reasoning suggests the result is reasonable. We have expanded Discussion paragraphs five and six, concluding that, due to low heritability and the indirect use of parent genotypes, a further order-of-magnitude increase in sample size is needed to attain the granularity of results available in other GWAMA.

– The authors spend a lot of time/text listing the variants they identified in discovery, which of those did or did not replicate, how they compare to previously identified loci, and how previously identified loci perform in the current analyses. Readers will get lost; what is needed is a much more succinct, clear-cut section on which loci are robustly associated with lifespan. It is also not clear when variants are considered replicated and what the significance thresholds are. The result is a cluttered list of replicated and non-replicated variants, and by the end of this section, I had no idea what the main findings were.

We apologise for this. Adopting the one-stage approach and removing the skewed downstream analyses has simplified these issues. We have replaced the UK Biobank PheWAS analysis with an unbiased GWAS catalog lookup, using only SNPs passing our stringent genome-wide significance threshold (see subsection “Disease and lifespan”). We have limited the analyses investigating specific loci to SNPs which are either discovered or replicated in our study (for example, see subsection “Genome-wide association analysis”).

– The authors report on a number of analyses that turn out to be side-tracks; i.e. reporting (and discussing) variants that did not reach genome-wide significance or that were not replicated is misleading to readers. Why perform analyses that the authors then conclude are biased, only to redo them in an unbiased manner? Simply report the unbiased results/analyses.

Agreed. We have replaced the UK Biobank PheWAS analysis with an independent GWAS catalog lookup, using only SNPs passing our stringent genome-wide significance threshold (see “Disease and lifespan”). We have limited the analyses investigating specific loci to SNPs which are either discovered or replicated in our study.

– The authors state (also in the title) that the genetic findings (mainly related to the PRS) are "predictive". However, I have a feeling that the word "prediction/predictive" is not used in the correct context. The PRS is "associated" with lifespan, but the authors did not provide data to show that the PRS predicts lifespan (or extreme longevity). The wording (also in the title) (or the analyses) should be corrected.

We see both sides of this issue. We have not created predicted ages at death and compared them with actual ages at death, not least because of censoring. On the other hand our scores are predictive of survival (e.g. the KM curves). Although "associated" with survival would be accurate too, we are confident that this association is also predictive. We have moved to "associate" and "explain" in general.
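The KM curves mentioned here handle censoring (parents still alive) through the Kaplan-Meier estimator, which at each observed death time multiplies the running survival probability by the fraction of those still at risk who survive. A minimal pure-Python sketch follows; the function and variable names and the example ages are hypothetical, and in a polygenic score analysis one such curve would be drawn per score group.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each individual (e.g. age at death or censoring)
    events: 1 if death was observed at that time, 0 if censored (still alive)
    Returns a list of (time, survival probability) at each observed death time.
    Individuals censored at a death time are counted as still at risk there.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)               # deaths plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical parental ages: 0 marks a parent still alive at that age
ages = [62, 70, 70, 81, 88, 90]
died = [1, 0, 1, 1, 0, 1]
for age, s in kaplan_meier(ages, died):
    print(age, round(s, 3))
```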

– The section on implication of causal genes and methylation sites is confusing. Please provide more details on how the causal genes were identified. In fact, a section on fine-mapping would be informative (e.g. with credible sets).

We have now included the analysis method used to implicate causal genes in the main text (subsection “Causal genes and methylation sites”) and summarised the results more clearly.

– Given the vast amount of data, it would also be of interest to have a focused analysis of rare coding variants. Alternatively, an analysis of (common and rare) variants in relation to the extremes of the "lifespan" distribution could be informative.

We agree this would be an interesting study, perhaps along the lines of https://www.biorxiv.org/content/early/2018/09/04/407981. However, it is beyond the scope of present work.

Specific comments

Title: The title is not clear; no prediction analyses were performed, and what is meant by "modern risk"?

Accepted. We have revised the title along these lines. "Genomics of 1 million parent lifespans implicates novel pathways and common diseases and distinguishes survival chances"

Introduction: The introduction is too long and wordy, yet often confusing, as several aspects are not clear. For example, it should be made clear what the difference is between longevity, lifespan, etc. The kin-cohort design may need some explanation as well. Heritability estimates are only reported as variance explained in GWAS, but what about heritability estimated using family and twin designs?

Accepted. We revised much of the Introduction, extending discussion of heritability estimates, lifespan vs. longevity, and the kin-cohort method, while reducing overall wordiness and improving clarity.

Results: As mentioned before, it is hard to keep track of which loci are considered "real" and which ones are not. Be more succinct and focus on the key (replicated) findings, no need to provide extensive detail on loci that turn out to be not "real".

Accepted. Our single-phase approach has simplified these issues. For example, see paragraphs two and three of “Genome-wide association analysis”.

Also, there is no need to state over and again that results are not significant because 95%CI overlap with the "0", as that's part of the definition of significance.

Accepted. For example, see subsection “Genome-wide association analysis”.

The power calculations are not very informative; i.e. the combined analysis had 50% power to detect very common variants (MAF>30%) with an effect size of 0.25 years? 50% is low power. Why not provide the data for 80 or 90% power?

Accepted. Our initial attempt was to show for variants like this we had some (not high) power. However, the conventional calculation with 80% power is perhaps more readily understood and so we now take that approach in paragraph two of subsection “Genome-wide association analysis”.
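A conventional 80%-power calculation for a single-SNP test can be sketched with the normal approximation. For a quantitative trait under an additive model, the expected z statistic is roughly `beta * sqrt(2p(1-p) * N) / sd`. The sketch assumes that form, does not model the kin-cohort design (where the parent's genotype is only observed indirectly through the offspring), and uses hypothetical names and inputs; it is not the authors' power calculation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(expected_z, alpha):
    """Two-sided power of a z-test whose statistic is N(expected_z, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(z_crit - expected_z)) + STD_NORMAL.cdf(-z_crit - expected_z)


def expected_z(beta, maf, n, sd):
    """Approximate expected z for an additive SNP effect on a quantitative trait.

    beta in trait units (e.g. years), sd = trait standard deviation,
    n = effective sample size. Ignores LD, covariates and study design.
    """
    return beta * math.sqrt(2 * maf * (1 - maf) * n) / sd


def min_detectable_beta(maf, n, sd, alpha=5e-8, target=0.8):
    """Smallest beta giving the target power (ignoring the negligible opposite tail)."""
    z_needed = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(target)
    return z_needed * sd / math.sqrt(2 * maf * (1 - maf) * n)


# Hypothetical inputs: MAF 0.3, effective N 500,000, trait SD 10 years
b = min_detectable_beta(0.3, 500_000, 10.0, alpha=2.5e-8)
print(round(b, 3), round(power(expected_z(b, 0.3, 500_000, 10.0), 2.5e-8), 3))
```

Reporting the minimum detectable effect at 80% power, as here, is often easier to read than the power achieved for a fixed effect size.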

Table 1 is also confusing as it reports many loci that were not replicated. As a main table, readers will want to know which of the loci were truly discovered. Hence, a table should just report the results of the replicated variants; others can be reported in supplementary tables.

Accepted. Table 1 is now much simpler: see table and legend of Figure 10—figure supplement 1.

Also, the legend (of this table and others) is enormous; there is no need to repeat what has been written elsewhere. Too much redundant information is reported in the main table, as CES P, PDES P and iGWAS P all report the same in this case. Instead of reporting all these statistics in the main table, the authors should consider reporting the most informative results and reporting others (very) briefly in the text or in supplementary tables.

Accepted. We have reduced the legends of some display items (e.g. Figure 7), but this is balanced by eLife’s request to include relevant methodological information in figure and table legends.

What is meant by "having converted the recessive effect into an apparent effect for a truly recessive allele"?

This is now explained more fully in the final paragraph of subsection “Genome-wide association analysis”. Essentially an additive GWAS can detect recessive effects (albeit at reduced power) and we have calculated the corresponding effect size.
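The relationship can be made concrete under simple assumptions. Suppose only individuals homozygous for an allele with frequency `p` have a trait shift `r` (a purely recessive effect), and genotypes follow Hardy-Weinberg proportions. The least-squares slope of the trait on allele count is then `r * cov(g, 1[g=2]) / var(g) = r * 2p²(1-p) / (2p(1-p)) = r * p`. An additive GWAS therefore sees an apparent effect of `p * r`, and dividing the additive estimate by `p` recovers `r`. The sketch below computes the slope from exact genotype moments; it illustrates this standard result and is not necessarily the exact conversion used in the paper. The function names are hypothetical.

```python
def apparent_additive_effect(recessive_effect, p):
    """Additive (allele-count) regression slope for a purely recessive effect.

    Only genotype 2 (two copies of the allele, frequency p) shifts the trait by
    recessive_effect. Genotype frequencies follow Hardy-Weinberg equilibrium.
    """
    freqs = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    trait = {g: (recessive_effect if g == 2 else 0.0) for g in freqs}
    mean_g = sum(g * f for g, f in freqs.items())
    mean_y = sum(trait[g] * f for g, f in freqs.items())
    cov = sum((g - mean_g) * (trait[g] - mean_y) * f for g, f in freqs.items())
    var_g = sum((g - mean_g) ** 2 * f for g, f in freqs.items())
    return cov / var_g                   # algebraically equal to p * recessive_effect


def recessive_from_additive(additive_effect, p):
    """Invert the relationship above: recessive effect = additive effect / p."""
    return additive_effect / p


print(apparent_additive_effect(2.0, 0.25))   # 0.5, i.e. p * r
```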

The iGWAS needs some explanation in the main text. Based on the highlights in Table 2—source data 2, it seems BMI was not included in the iGWAS? If so, please clarify. Also, why were significant results for BMI not highlighted?

Accepted. We expanded the explanation in the main text (subsection “Mortality risk factor-informed GWAS (iGWAS)”). BMI was included in the analysis but was mistakenly not highlighted. This has been added.

Stating that the narrow-sense heritability was increased "by" 79% is misleading, and also uninformative when you do not say from what value it is increasing. Be straightforward and report the values from and to which the narrow-sense heritability is increased.

Accepted and actioned. See subsection “Causal genes and methylation sites”.

Provide more details on how cell type and pathway enrichment analyses were performed in the main text.

Accepted. See subsection “Cell type and pathway enrichment”.

The so-called "out of sample lifespan predictions" should not be labeled as predictions, as no predictive analyses were performed. The analyses reported in this section seem to be a simple validation of a PRS in another population. Some discussion of why the difference between the top and bottom deciles is smaller in the Estonian population would be informative.

Accepted. We now use the term "associate" rather than "predict", and discuss the reduced explanatory power in Estonia. See paragraph fifteen of the Discussion section.
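Because the section now reports associations rather than predictions, the decile comparison amounts to grouping individuals by polygenic score and comparing observed outcomes across groups. A minimal sketch, assuming individual-level scores and observed lifespans are available; the function names are hypothetical, and it ignores censoring (parents still alive), which a survival model such as Cox regression would handle.

```python
def decile_means(scores, outcomes, n_groups=10):
    """Mean outcome within each polygenic-score quantile group.

    Individuals are ranked by score and split into n_groups groups of
    (near-)equal size; group 0 holds the lowest scores.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if len(scores) < n_groups:
        raise ValueError("need at least one individual per group")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    means = []
    for k in range(n_groups):
        # Integer boundaries give each group floor or ceil of n / n_groups members.
        members = order[k * n // n_groups:(k + 1) * n // n_groups]
        means.append(sum(outcomes[i] for i in members) / len(members))
    return means


def top_minus_bottom(scores, outcomes, n_groups=10):
    """Difference in mean outcome between the highest and lowest score groups."""
    means = decile_means(scores, outcomes, n_groups)
    return means[-1] - means[0]
```

Comparing `top_minus_bottom` between two cohorts gives a crude, uncensored analogue of the top-versus-bottom decile contrast discussed here.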

Discussion: I would be interested in a fuller discussion of why findings from extreme longevity studies are not (all) replicated in the lifespan studies and vice versa. [No need to speculate on the statistical power, as this can be calculated.] In addition, more insight into why certain genetic variants do not seem to affect lifespan would be of interest too. The authors may consider working with PRS and also estimating genetic correlations.

Yes, done in Discussion paragraph eleven. Broadly, we consider the traits as highly overlapping (at least genetically). We find that lifespan SNPs are longevity SNPs (Figure 4) and that some reported longevity SNPs that are not lifespan SNPs may be false positives or, less plausibly, only have effects delayed well beyond those of APOE e4.

Reviewer #2:

The manuscript by Timmers et al. describes a multi-study GWAS for parental age at death, a trait which has been shown to efficiently identify loci for longevity. Although the novelty of this paper is compromised by other similar papers using highly overlapping datasets, the authors should be commended for their rigorous analytical approach and attention to detail. I did, however, feel that clarity of story was at times sacrificed to demonstrate this rigor, and would recommend that the (main) text be re-read and simplified where possible.

I do feel that the heavy focus on the lack of replication of findings from one of the previous papers in this area is a bit overstated. Pilling et al. (I'm not connected to this study) did evaluate a polygenic risk score (PRS) in an external dataset, although they were not able to replicate individual variants. Although this current paper does a far better job in that respect, it is still the case that only a very small fraction of the loci presented in main Table 1 replicate in the true sense of the word (i.e. P < 0.05/N). Whilst I definitely don't dispute that replication is important, many large-scale discovery GWAS (for example GIANT; Locke et al.) do not replicate individual SNPs, but may instead evaluate PRS performance in other studies. Ultimately, the nature of replication may matter more when considering what other investigators do with these results downstream. For example, individual variant-level significance is more important when considering experimental follow-up than inclusion in a PRS.

Accepted. We have moved to a single-stage approach, with a lower P value threshold, and an approach akin to testing a PRS in an independent population, but using summary statistics from the second population, an approach familiar from two-sample Mendelian randomisation. We hope this is a good compromise, as well as a simpler presentation: whilst well powered for the score as a whole, and underpowered variant by variant, we believe the approach (e.g. Figure 4) does give a sense of the degree of confidence in individual variants. This reasoning has been added in subsection “Mortality risk factor-informed GWAS (iGWAS)”.
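Testing the score as a whole with summary statistics from a second population can be sketched with the inverse-variance weighted (IVW) estimator familiar from two-sample Mendelian randomisation: replication effects are regressed on discovery effects through the origin, weighting by the replication standard errors. This is an illustrative sketch only, assuming independent discovery and replication samples and effects aligned to the same effect allele; the function name is hypothetical and this is not presented as the authors' exact procedure.

```python
from math import sqrt
from statistics import NormalDist


def ivw_score_test(discovery_beta, replication_beta, replication_se):
    """Fixed-effect IVW slope of replication effects on discovery effects.

    Returns (slope, standard error, two-sided P value). Given comparable
    units, a slope near 1 suggests discovery effects replicate at their
    original size and a slope near 0 suggests no replication.
    """
    if not (len(discovery_beta) == len(replication_beta) == len(replication_se)):
        raise ValueError("inputs must have the same length")
    # Weighted least squares through the origin with weights 1 / se^2.
    num = sum(bx * by / se ** 2 for bx, by, se
              in zip(discovery_beta, replication_beta, replication_se))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(discovery_beta, replication_se))
    slope = num / den
    slope_se = 1 / sqrt(den)
    z = slope / slope_se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return slope, slope_se, p_value
```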

Given many of the loci in Table 1 come from the discovery + replication combined analysis, it may be more appropriate to present this as the primary analysis. This would simplify much of the manuscript and allow for a more detailed presentation of the more interesting analyses (some of which were buried in supplemental sections and not described well in the main text). This doesn't preclude first evaluating/replicating the loci previously reported by Pilling et al.

Accepted. See response to editor above, and Figure 2, which evaluates prior findings in our independent datasets.

When considering why this study had identified fewer loci than Pilling et al., I noticed that only unrelated individuals were included. This must result in quite a large sample drop, which seems unnecessary. It may also increase novelty to rerun this analysis using the full (v3) UKBB imputation (rather than v2) and include the X-chromosome (I couldn't tell if this had been included).

We believe our study is more conservative than Pilling et al. Using our one-stage approach, we now have a larger sample size, despite restricting to unrelated samples rather than using a linear mixed model (LMM) (see response to the editor). We have set a significance threshold that is conservative with respect to multiple testing, in accordance with the editorial suggestion. Nonetheless, we accept that some of Pilling et al.'s hits are likely real, and we try to present further evidence in figures. Extension of the analysis to the new UK Biobank imputation is beyond the scope of this manuscript.

Can the authors estimate what proportion of the heritable component of lifespan is explained by avoiding disease (i.e. identifying protective alleles) vs unexplained mechanisms? This seems key for considering future studies in this area, and I wonder if there are other ways this could be more directly assessed.

Agreed. See response to editor above and reviewer 1 final comment.

Reviewer #3:

The paper by Joshi et al. reports the results from a GWAS on parental lifespan, which consists of a discovery phase (UK Biobank) and a replication phase (Lifegen). The authors identified multiple novel loci for parental lifespan when combining the discovery and replication phases. In addition, they were able to replicate several previously identified loci for parental lifespan as well as longevity. They subsequently perform follow-up analyses (including polygenic risk score analyses) to pinpoint the affected pathways as well as relations with other phenotypes, and show a relation between parental lifespan and several age-related diseases.

Major comments:

1) Instead of reporting the most significant variant within a locus, it would be more appropriate to report the most-likely causal variant as well as independent genetic variants within each locus. This is common practice in GWAS.

Our SOJO analysis has reported independent genetic variants within each locus and, following your comment, we attempted a Bayesian fine-mapping analysis using JAM to create credible sets, but this was underpowered. As such, we have chosen to report the index SNP as the most-likely causal variant. We have changed the wording in table legends to reflect that lead variants are index SNPs.
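For comparison with the multi-variant JAM analysis, a much simpler single-causal-variant fine-mapping can be sketched with Wakefield's approximate Bayes factors, from which a 95% credible set is built by adding variants in decreasing order of posterior probability. This is a swapped-in technique, not JAM: it assumes exactly one causal variant per locus, equal prior probabilities across variants and a user-chosen prior effect variance (`prior_var`, a hypothetical setting). The function names are hypothetical.

```python
from math import exp, log


def log_abf(beta, se, prior_var):
    """Wakefield log approximate Bayes factor (association vs null)."""
    v = se ** 2
    r = prior_var / (prior_var + v)
    z = beta / se
    return 0.5 * log(1 - r) + 0.5 * r * z ** 2


def credible_set(variant_ids, betas, ses, prior_var, coverage=0.95):
    """Credible set under a single-causal-variant model with equal priors.

    Returns (list of variant ids in the set, dict of posterior probabilities).
    """
    log_bfs = [log_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    top = max(log_bfs)  # subtract the maximum for numerical stability
    weights = [exp(lb - top) for lb in log_bfs]
    total = sum(weights)
    posterior = {vid: w / total for vid, w in zip(variant_ids, weights)}
    chosen, cumulative = [], 0.0
    for vid in sorted(posterior, key=posterior.get, reverse=True):
        chosen.append(vid)
        cumulative += posterior[vid]
        if cumulative >= coverage:
            break
    return chosen, posterior
```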

2) The authors performed a meta-analysis of three published longevity GWAS to use for replication of their findings (subsection “Mortality risk factor-informed GWAS (iGWAS)”). They thereby assumed that these three studies looked at the same phenotype, which is actually not the case: one looked at mortality and the other two at longevity, with one using dead controls only and the other dead + alive controls. In addition, the cohorts used in these three GWAS partly overlap, so they cannot be analyzed together. Instead, the authors should look at the replication separately in each of the studies and update Figure 4 and Figure 4—source data 1 accordingly.

We are sorry not to have spotted the sample overlap issue and thank you for bringing it to our attention. We have now accounted for it in the meta-analysis by adjusting the standard error. We recognise that the traits are somewhat heterogeneous, but we are in any case trying to replicate a distinct trait. By recalibrating each study, we have obtained a best-estimate meta-analysis and, most importantly, because of the recalibrated SE, the meta-analysed effect will behave as required under the null hypothesis. See Figure 4—source data 1.
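One common way to allow for sample overlap in a fixed-effect meta-analysis is to keep the inverse-variance weights but compute the standard error of the pooled estimate from the full covariance of the study estimates, with Cov(b_i, b_j) = rho_ij * se_i * se_j. The correlations rho_ij are typically estimated from the correlation of z-statistics at null variants. This sketch illustrates that general idea under those assumptions; it is not necessarily the exact recalibration used here, and the function name is hypothetical.

```python
from math import sqrt


def overlap_adjusted_meta(betas, ses, corr):
    """Fixed-effect inverse-variance meta-analysis allowing correlated estimates.

    corr[i][j] is the assumed correlation between the estimation errors of
    studies i and j (1.0 on the diagonal; 0.0 off-diagonal means no overlap).
    Returns (pooled beta, standard error).
    """
    k = len(betas)
    weights = [1 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    # Var(sum_i w_i b_i) = sum_ij w_i w_j rho_ij se_i se_j
    var_sum = sum(weights[i] * weights[j] * corr[i][j] * ses[i] * ses[j]
                  for i in range(k) for j in range(k))
    return pooled, sqrt(var_sum) / total_w
```

With no overlap this reduces to the usual standard error 1 / sqrt(sum of weights); with complete overlap of two identical studies it returns the single-study standard error rather than an overconfident smaller one.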

3) If the authors assume that there may be sex-specific effects, why have they not performed a sex-stratified analysis to see if this is indeed the case? They currently only looked at sex-specific effects for their top hits.

We agree that sex stratification is desirable, and we did perform it (apologies that this was unclear). See response to the editor above.
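A difference between sex-specific effects at a variant (for example, paternal versus maternal lifespan) can be tested with a z-test on the difference of the two estimates. The correlation term is an assumption worth stating: where fathers' and mothers' lifespans are both regressed on the same offspring genotypes, the two estimates may not be independent, so the sketch takes an assumed correlation `rho` (0 for independent samples). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sex_difference_test(beta_m, se_m, beta_f, se_f, rho=0.0):
    """Two-sided z-test for a difference between two sex-specific effects.

    rho is the assumed correlation between the two estimates' errors.
    Returns (difference, standard error, P value).
    """
    diff = beta_m - beta_f
    se_diff = sqrt(se_m ** 2 + se_f ** 2 - 2 * rho * se_m * se_f)
    z = diff / se_diff
    return diff, se_diff, 2 * NormalDist().cdf(-abs(z))
```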

4) The out-of-sample lifespan predictions actually highlight an important limitation of using the parental age at death as proxy for an individual's own age at death, since the polygenic risk score works less well in the subjects as compared to their parents. This should be discussed in more detail in the Discussion section.

We agree. We have drawn this point out more in paragraph fifteen of the Discussion section. We note that the reduction is only 20%, i.e. the signal is far from lost, and the hazard ratio across the deciles is the same in both generations.

5) The fact that a higher polygenic risk score results in an increased prevalence of Alzheimer's disease, Parkinson's disease, and prostate and breast cancer shows that the majority of the identified genetic variants are likely not related to healthy ageing, but only to lifespan. Hence, the question arises if this is what we are aiming for with our efforts to unravel the genetics of longevity. This should be discussed in more detail in the Discussion section.

Yes, see paragraph eleven of the Discussion section.

6) I think the authors should mention that the prediction that SESN1 is the gene driving the effect at the FOXO3A locus is likely incorrect, since the functional study shows that the variant at this locus has a direct effect on the functioning of FOXO3A itself. This illustrates that predictions can be informative but are not always supported by functional evidence.

While we agree that the functional study has shown that genetic variation at this locus influences FOXO3 expression, SMR-HEIDI does not find a link between FOXO3 expression and lifespan in our own data. Differential FOXO3 expression does not preclude a role for SESN1, which is connected to the same promoter and may thus also be differentially expressed. We have added this line of reasoning to the Discussion and stressed the need for follow-up work.
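For readers unfamiliar with SMR, its core estimate is a Wald ratio of a variant's effect on the outcome (b_zy) to its effect on expression (b_zx), with an approximate chi-squared test statistic on one degree of freedom, T = z_zy^2 * z_zx^2 / (z_zy^2 + z_zx^2). The sketch below computes only that SMR step for a single instrument; the HEIDI heterogeneity test, which uses multiple variants in LD, is not shown, and the function name is hypothetical.

```python
from statistics import NormalDist


def smr_test(b_zy, se_zy, b_zx, se_zx):
    """SMR estimate and approximate test for one instrument variant.

    b_zy: variant effect on the outcome (e.g. lifespan);
    b_zx: variant effect on the exposure (e.g. gene expression).
    Returns (b_xy, T statistic, P value from chi-squared with 1 df).
    """
    b_xy = b_zy / b_zx
    z_zy = b_zy / se_zy
    z_zx = b_zx / se_zx
    t_smr = z_zy ** 2 * z_zx ** 2 / (z_zy ** 2 + z_zx ** 2)
    # A chi-squared(1) tail probability equals the two-sided normal tail of sqrt(T).
    p_value = 2 * NormalDist().cdf(-t_smr ** 0.5)
    return b_xy, t_smr, p_value
```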

7) I think the part in the Discussion section about the comparison between lifespan and annuity pricing is far-fetched and does not add anything to the manuscript. In addition, I do not agree with the conclusion that the results from the polygenic risk score are "meaningful socially and actuarially", since these results are not that convincing (see comment 6). I would advise rewriting or even completely removing this section.

We accept that the phrase "meaningful" was subjective. However, we believe we have shown that a 14% difference in price between the two pools is material: two insurers operating solely in one or the other of those pools would see profits double or be almost wiped out (8.9% +/- 7%). Even if the 20% dilution observed across the whole spectrum applies, we would see an 11% difference in profitability between the pools. In any case, there is increasing interest in the association of polygenic scores with outcomes and in their potential use in screening programmes, for example Khera et al. PMID: 30104762. Although it is too early for this to be evidenced in publications, that work in particular has, to our knowledge, attracted interest from the actuarial profession, including active follow-up work. Furthermore, the present work will be presented by Dr Joshi FoIFA to a sessional meeting of the UK actuarial profession on 21 January 2019. To put this in context, the score performs in line with UK insurers' leading existing method of annuity underwriting: postcodes. We have also clarified that (only) the hazard ratio across deciles remains constant, and have tried to be clear that we are not advocating genetic testing for insurance, but highlighting the need for regulation to be updated.
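The profitability figures in this response can be reproduced with simple arithmetic. This sketch assumes, as one reading of the text rather than a stated method, that the +/- 7% is half of the 14% price gap applied either side of a pooled 8.9% margin and passed straight through to profit, and that the 20% dilution scales the gap proportionally. The function names are hypothetical.

```python
def pool_margins(pooled_margin, price_gap):
    """Margins (percent) for insurers confined to the higher- or lower-score pool.

    Assumes each pool's price deviates from the pooled price by half the gap,
    and that this deviation passes straight through to the profit margin.
    """
    half_gap = price_gap / 2
    return pooled_margin + half_gap, pooled_margin - half_gap


def diluted_gap(price_gap, dilution):
    """Price gap (percent) after a proportional attenuation of the score's effect."""
    return price_gap * (1 - dilution)


high, low = pool_margins(8.9, 14.0)  # roughly 15.9 and 1.9
gap = diluted_gap(14.0, 0.20)        # roughly 11.2
```

The diluted gap of about 11.2% corresponds to the 11% quoted above.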

Minor Comments:

– The Title of the manuscript is a bad representation of the results described in the manuscript, especially the part "reveal basis in modern risks", which is merely speculative. Hence, the authors should come up with a Title that better carries the load.

Accepted. We have revised the title on the "predicts" and "risks" aspects.

– Introduction section: The narrow-sense heritability for longevity mentioned in the study by Kaplanis et al. is 12.2% and since they compared this heritability with the previously published ones, it would be good to also mention that one here (instead of the 16.1%).

Agreed. This has been updated.

– Results paragraph two: Reference 10 should be reference 7.

Thank you for spotting this. It has now been superseded by rewriting of the Introduction.

– I think the word "causal" in the opening sentence of subsection “Implication of causal genes and methylation sites” is misleading, since the authors are not able to prove causality for their variants with the analyses they performed.

We agree with the reviewer that we have not shown that the variants themselves are causal. However, we are confident that the loci are causal, and we suggest that the genes are causal through the use of Mendelian randomisation on the genes/loci. Nonetheless, we have weakened the wording slightly, as MR relies on the assumption of no horizontal pleiotropy, which is highly plausible here but not provable even for action in cis; the approach remains valid for variants in LD with the causal variant. See subsection “Causal genes and methylation sites”.

– The numbering of the Supplementary Tables is not in line with the text.

Apologies, we have now corrected this.

https://doi.org/10.7554/eLife.39856.048

Article and author information

Author details

  1. Paul RHJ Timmers

    Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Formal analysis, Investigation, Visualization, Writing—original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5197-1267
  2. Ninon Mounier

    1. Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland
    2. Swiss Institute of Bioinformatics, Lausanne, Switzerland
    Contribution
    Formal analysis, Investigation, Visualization, Writing—original draft
    Competing interests
    No competing interests declared
  3. Kristi Lall

    1. Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
    2. Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
    Contribution
    Formal analysis, Investigation, Writing—original draft
    Competing interests
    No competing interests declared
  4. Krista Fischer

    1. Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
    2. Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Visualization, Methodology, Writing—original draft
    Competing interests
    No competing interests declared
  5. Zheng Ning

    Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
    Contribution
    Formal analysis, Visualization, Writing—review and editing
    Competing interests
    No competing interests declared
  6. Xiao Feng

    State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
    Contribution
    Formal analysis, Visualization, Writing—review and editing
    Competing interests
    No competing interests declared
  7. Andrew D Bretherick

    MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Software, Formal analysis, Writing—review and editing
    Competing interests
    No competing interests declared
  8. David W Clark

    Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Software, Formal analysis, Writing—review and editing
    Competing interests
    No competing interests declared
  9. eQTLGen Consortium

    Competing interests
    No competing interests declared
    1. M Agbessi, Ontario Institute for Cancer Research, Toronto, Canada
    2. H Ahsan, Department of Public Health Sciences, University of Chicago, Chicago, United States
    3. I Alves, Ontario Institute for Cancer Research, Toronto, Canada
    4. A Andiappan, Singapore Immunology Network, Agency for Science, Technology and Research, Singapore, Singapore
    5. P Awadalla, Ontario Institute for Cancer Research, Toronto, Canada
    6. A Battle, Department of Computer Science, Johns Hopkins University, Baltimore, United States
    7. MJ Bonder, Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands
    8. D Boomsma, Vrije Universiteit, Amsterdam, The Netherlands
    9. M Christiansen, Cardiovascular Health Research Unit, University of Washington, Seattle, United States
    10. A Claringbould, Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands
    11. P Deelen, Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands
    12. J van Dongen, Vrije Universiteit, Amsterdam, The Netherlands
    13. T Esko, Estonian Genome Center, University of Tartu, Tartu, Estonia
    14. M Favé, Ontario Institute for Cancer Research, Toronto, Canada
    15. L Franke, Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands
    16. T Frayling, Exeter Medical School, University of Exeter, Exeter, United Kingdom
    17. SA Gharib, Department of Medicine, University of Washington, Seattle, United States
    18. G Gibson, School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
    19. G Hemani, MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
    20. R Jansen, Vrije Universiteit, Amsterdam, The Netherlands
    21. A Kalnapenkis, Estonian Genome Center, University of Tartu, Tartu, Estonia
    22. S Kasela, Estonian Genome Center, University of Tartu, Tartu, Estonia
    23. J Kettunen, University of Helsinki, Helsinki, Finland
    24. Y Kim, Department of Computer Science, Johns Hopkins University, Baltimore, United States
    25. H Kirsten, Institut für Medizinische Informatik, Statistik und Epidemiologie, LIFE – Leipzig Research Center for Civilization Diseases, Universität Leipzig, Leipzig, Germany
    26. P Kovacs, IFB Adiposity Diseases, Universität Leipzig, Leipzig, Germany
    27. K Krohn, Interdisciplinary Center for Clinical Research, Faculty of Medicine, Universität Leipzig, Leipzig, Germany
    28. J Kronberg-Guzman, Estonian Genome Center, University of Tartu, Tartu, Estonia
    29. V Kukushkina, Estonian Genome Center, University of Tartu, Tartu, Estonia
    30. Z Kutalik, Lausanne University Hospital, Lausanne, Switzerland
    31. M Kähönen, Department of Clinical Physiology and Faculty of Medicine and Life Sciences, Tampere University Hospital and University of Tampere, Tampere, Finland
    32. B Lee, Singapore Immunology Network, Agency for Science, Technology and Research, Singapore, Singapore
    33. T Lehtimäki, Department of Clinical Chemistry, Fimlab Laboratories and Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
    34. M Loeffler, Institut für Medizinische Informatik, Statistik und Epidemiologie, LIFE – Leipzig Research Center for Civilization Diseases, Universität Leipzig, Leipzig, Germany
    35. U Marigorta, School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
    36. A Metspalu, Estonian Genome Center, University of Tartu, Tartu, Estonia
    37. J van Meurs, Department of Internal Medicine, Erasmus Medical Centre, Rotterdam, The Netherlands
    38. L Milani, Estonian Genome Center, University of Tartu, Tartu, Estonia
    39. M Müller-Nurasyid, Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
    40. M Nauck, Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany
    41. M Nivard, Vrije Universiteit, Amsterdam, The Netherlands
    42. B Penninx, Vrije Universiteit, Amsterdam, The Netherlands
    43. M Perola, National Institute for Health and Welfare, University of Helsinki, Helsinki, Finland
    44. N Pervjakova, Estonian Genome Center, University of Tartu, Tartu, Estonia
    45. B Pierce, Department of Public Health Sciences, University of Chicago, Chicago, United States
    46. J Powell, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
    47. H Prokisch, Institute of Human Genetics, Helmholtz Zentrum München, München, Germany
    48. BM Psaty, Departments of Epidemiology, Medicine, and Health Services, Cardiovascular Health Research Unit, University of Washington, Seattle, United States
    49. O Raitakari, Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital and University of Turku, Turku, Finland
    50. S Ring, School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom
    51. S Ripatti, University of Helsinki, Helsinki, Finland
    52. O Rotzschke, Singapore Immunology Network, Agency for Science, Technology and Research, Singapore, Singapore
    53. S Ruëger, Lausanne University Hospital, Lausanne, Switzerland
    54. A Saha, Department of Computer Science, Johns Hopkins University, Baltimore, United States
    55. M Scholz, Institut für Medizinische Informatik, Statistik und Epidemiologie, LIFE – Leipzig Research Center for Civilization Diseases, Universität Leipzig, Leipzig, Germany
    56. K Schramm, Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
    57. I Seppälä, Department of Clinical Chemistry, Fimlab Laboratories and Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
    58. M Stumvoll, Department of Medicine, Universität Leipzig, Leipzig, Germany
    59. P Sullivan, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
    60. A Teumer, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
    61. J Thiery, Institute for Laboratory Medicine, LIFE – Leipzig Research Center for Civilization Diseases, Universität Leipzig, Leipzig, Germany
    62. L Tong, Department of Public Health Sciences, University of Chicago, Chicago, United States
    63. A Tönjes, Department of Medicine, Universität Leipzig, Leipzig, Germany
    64. J Verlouw, Department of Internal Medicine, Erasmus Medical Centre, Rotterdam, The Netherlands
    65. PM Visscher, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
    66. U Võsa, Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands
    67. U Völker, Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, Germany
    68. H Yaghootkar, Exeter Medical School, University of Exeter, Exeter, United Kingdom
    69. J Yang, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
    70. B Zeng, School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
    71. F Zhang, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
  10. Xia Shen

    1. Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, United Kingdom
    2. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
    3. State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
    Contribution
    Conceptualization, Formal analysis, Supervision, Investigation, Visualization, Methodology, Writing—original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4390-1979
  11. Tõnu Esko

    1. Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
    2. Broad Institute of Harvard and MIT, Cambridge, United States
    Contribution
    Resources, Supervision, Funding acquisition, Writing—original draft
    Competing interests
    Reviewing Editor, eLife
    ORCID: 0000-0003-1982-6569
  12. Zoltán Kutalik

    1. Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland
    2. Swiss Institute of Bioinformatics, Lausanne, Switzerland
    Contribution
    Conceptualization, Resources, Software, Investigation, Methodology, Writing—original draft
    Competing interests
    No competing interests declared
  13. James F Wilson

    1. Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, United Kingdom
    2. MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing—original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5751-9178
  14. Peter K Joshi

    1. Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, United Kingdom
    2. Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland
    Contribution
    Conceptualization, Supervision, Validation, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    peter.joshi@ed.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6361-5059

Funding

Medical Research Council (DTP in Precision Medicine MR/N013166/1, HGU QTL in health and disease)

  • Paul RHJ Timmers
  • Andrew D Bretherick
  • David W Clark
  • eQTLGen Consortium
  • James F Wilson

Estonian Research Competency Council (PUT 1665)

  • Kristi Lall
  • Krista Fischer

Wellcome Trust (PhD Training Fellowship for Clinicians)

  • Andrew D Bretherick

Edinburgh Clinical Academic Track (204979/Z/16/Z)

  • Andrew D Bretherick

Svenska Forskningsrådet Formas (2014-00371)

  • Xia Shen

Svenska Forskningsrådet Formas (2017-02543)

  • Xia Shen

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (31003A_169929)

  • Zoltán Kutalik

SystemsX.ch (51RTP0_151019)

  • Zoltán Kutalik

AXA Research Fund

  • Peter K Joshi

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank the UK Biobank Resource (approved application 8304). We acknowledge funding from the UK Medical Research Council Human Genetics Unit, the Wellcome Trust PhD Training Fellowship for Clinicians under the Edinburgh Clinical Academic Track (ECAT) programme (204979/Z/16/Z), the Medical Research Council Doctoral Training Programme in Precision Medicine (MR/N013166/1) and the AXA Research Fund. We thank Tom Haller of the University of Tartu for tailoring RegScan so we could use it with compressed files (Haller et al., 2015). We would also like to thank the researchers, funders and participants of the LifeGen consortium (Joshi et al., 2017).

Ethics

Human subjects: This work used existing datasets, for which ethical approval had been gathered for genetic investigation at the time of collection.

Publication history

  1. Received: July 5, 2018
  2. Accepted: November 20, 2018
  3. Version of Record published: January 15, 2019 (version 1)
  4. Version of Record updated: January 16, 2019 (version 2)

Copyright

© 2019, Timmers et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


