Introduction

Given the decline in the force of natural selection with age, genes that influence aging and age-related diseases are likely to be selected for their influence on early life events 1. The theory of antagonistic pleiotropy posits that inherent trade-off in natural selection, where genes beneficial for early survival and reproduction may have costly consequences later, contributes to the aging process and age-related diseases 1-3. Though evidence for antagonistic pleiotropy has been observed in invertebrate models, evidence for causal relationships in mammals, especially humans, is largely lacking 4. The timing of reproductive events, such as menarche and childbirth, has long been recognized as a crucial aspect of human life history and evolution. However, accumulating evidence suggests that these reproductive milestones may have far-reaching implications 5,6.

Considering reproductive events are largely regulated by genetic factors that can manifest the physiological outcome later in life, we hypothesized that earlier or later onset of menarche and first childbirth could reflect broader genetic influences on longevity and disease susceptibility, serving as proxies for biological aging processes (fig. S1 and fig. S2). Two-sample and two-step Mendelian randomization (MR) analyses were adopted to explore the genetic causal associations using the inverse variance weighted (IVW) model. After preprocessing, there were 209 and 33 single nucleotide polymorphisms (SNPs) included in MR analyses for the two exposures, age at menarche 7 and age at first birth 8, respectively.

Later menarche was genetically associated with later aging outcomes

Since parental ages at death may reflect the genetic heritability of lifespan, parental ages at death were used as general aging outcomes 9. Compared to early menarche, later age at menarche was significantly associated with later parental ages at death for mothers or fathers (OR=1.023, P=1.84×10−4; OR=1.020, P=7.88×10−4 for father and mother’s ages at death respectively) (Fig. 1A and Table S1-S3). Aging is the biggest risk factor for frailty and several age-related diseases. Later age at menarche was associated with a lower frailty index, which was calculated using a set of 49 self-reported questionnaire items on traits covering health, presence of diseases and disabilities, and mental well-being 10 (OR=0.977, P=5.75×10−3). After excluding the instrumental variables (IVs) associated with body mass index (BMI), the significant association between later menarche and outcomes remained based on the weighted median model (WMM) for frailty index (OR=0.972, P=0.0202) and based on IVW for parental ages at death (OR=1.015, P=2.46×10−2; OR=1.012, P=3.87×10−2). Next, we examined the associations between the age of menarche and organ aging or age-related diseases (Fig. 1B). Later age of menarche was associated with later menopause onset (OR=1.121, P=2.11×10−2 with WMM), lower risks of late-onset Alzheimer’s disease (LOAD) (OR=0.897, P=3.88×10−2), chronic heart failure (CHF) (OR=0.953, P=2. 80×10−2), essential hypertension (OR=0.990, P=1.99×10−8), facial aging (OR=0.983, P=2.16×10−9), and early onset chronic obstructive pulmonary disease (COPD) (OR=0.838, P=9.50×10−4) and mildly higher risk of osteoporosis (OR=1.001, P=2.07×10−2). These associations were still significant after excluding SNPs associated with BMI, except for CHF. One potential mechanism of antagonistic pleiotropy is accelerated cell growth and division; therefore, we examined the relationship between menarche and certain cancers. Later age at menarche was associated with lower risks of breast cancer (OR=0.859, P=1.53×10−2) and endometrial cancer (OR=0.896, P=9.02×10−4) compared with early menarche (Fig. 1B).

Genetic associations between exposures and outcomes.

(A), genetic associations between exposures and general aging. Later age at menarche was associated with a lower frailty index and higher parental ages at death. Later age at first birth was associated with lower frailty index and GrimAge and higher parental ages at death. (B), genetic associations between age at menarche and outcomes of organ aging and diseases as well as organ cancer. Later age at menarche was associated with later age at menopause onset and lower risks of LOAD, CHF, essential hypertension, facial aging, early onset COPD, breast cancer, and endometrial cancer. (C), genetic associations between age at first birth and outcomes of organ aging and diseases as well as organ cancer. Later age at first birth was associated with later age at menopause onset and lower risks of type 2 diabetes, CHF, essential hypertension, gastrointestinal or abdominal disease, and cervical cancer. The significant associations were not detected after BMI-related SNPs excluded for outcomes of CHF and cervical cancer. BMI included, BMI-related SNPs included; CHF, chronic heart failure; GAD, gastrointestinal or abdominal disease.

Later age at first birth was genetically associated with later aging outcomes

Next, we examined the associations between the age of childbirth and age-related outcomes. Compared to early first birth, later age at first birth was associated with lower frailty index (OR=0.937, P=5.93×10−11) and GrimAge, a measure of epigenetic aging (OR=0.447, P=4.37×10−2), and older parental ages at death (OR=1.035, P=6.88×10−5 for father’s age at death; OR=1.034, P=1.78×10−4 for mother’s age at death) (Fig. 1A). Similar to results with the age of menarche, later age at first birth was associated with later menopause onset (OR=1.306, P=3.08×10−4), lower risks of type 2 diabetes (OR=0.890, P=3.65×10−6), CHF (OR=0.926, P=2.56×10−2), essential hypertension (OR=0.989, P=1.54×10−4), and gastrointestinal or abdominal disease (GAD) (OR=0.991, P=1.29×10−3). Most significant associations remained after the BMI-related SNPs were excluded, except for CHF (Fig. 1C and Table S3). Furthermore, later age at first birth was also significantly associated with a lower risk of cervical cancer (OR=0.999, P=1.52×10−2) but not breast and endometrial cancers (Fig. 1C).

BMI is an important mediator in significant associations

As BMI is an important modulator of aging 11, we examined its role in explaining these associations. Based on the two-sample MR analyses, significant associations between early menarche and CHF, and between early first birth and CHF and cervical cancer, were not observed after excluding SNPs related to BMI (Fig. 1). To estimate the effect of BMI as the mediator, we further conducted two-step MR. Exposures of age of menarche and age at first birth, and outcomes of frailty index, type 2 diabetes, and CHF, were included in the two-step MR analysis. Later ages of menarche and first birth were associated with lower BMI (OR=0.948, P=2.73×10−13; OR=0.956, P=6.85×10−5). Higher BMI was associated with higher risks of type 2 diabetes (OR=2.391, P=4.65×10−156), CHF (OR=1.690, P=1.67×10−44), and cervical cancer (OR=1.001, P=3.16×10−2). Significant pleiotropy was detected in the association between BMI and frailty index. Two-step MR showed that BMI significantly mediated the associations between menarche and, type 2 diabetes or CHF (Table S4). BMI also significantly mediated the associations between first birth and, type 2 diabetes, CHF, or cervical cancer (Table S4). As both early menarche and first birth were associated with higher BMI, the BMI had a significant effect to explain the associations partially. Results of colocalization analysis showed 5 SNPs associated with both the exposures and outcomes (Table S5).

Mechanisms explaining antagonistic pleiotropy

We identified 128 significant SNPs in the genes from the MR analysis for their association between age of menarche or first birth with 15 aging and age-related disease outcomes. These SNPs were then listed according to (i) the number of study outcomes they were associated with and (ii) their frequency of occurrence in the European and global population. We chose to depict only the SNPs that were associated with greater than three outcomes as shown in Fig. 2. The SNP rs2003476 in the CRTC1 gene was found to be associated with seven aging and age-related disease outcomes (frailty index, LOAD, CHF, father’s age at death, mother’s age at death, breast, and endometrial cancer). Previous studies have indicated that CRTC1 transcription domain is linked with extending the lifespan in C. elegans 12 and a decrease in CRTC1 levels was found to be associated with human AD 13. Two other genes involved in the glutathione metabolism pathway, CHAC1 and GGT7, were also found to be associated with four different aging outcomes (CHAC1 was associated with frailty index, diabetes, COPD, and BMI; GGT7 was associated with father’s age at death, hypertension, breast cancer and BMI) (Fig. 2). Importantly, the SNP in CHAC1 is a missense variant, suggesting its impact on protein translation and potential contribution to disease biology. CHAC1 has recently been implicated in age-related macular degeneration 14 and muscle wasting 15. Posttranslational protein-lipid modification processes are known to contribute to aging and age-related diseases 16. N-myristoylation is one such process that is catalyzed by NMT (N-myristoyltransferase) 17, another intriguing gene candidate identified in our analysis. We found an association of NMT1 with three aging outcomes: T2D, CHF, and endometrial cancer. Various myristoylated proteins have been implicated in diverse intracellular signaling pathways 18 including AMP-activated protein kinase (AMPK) 19,20. It has been shown that NMT1 exerts synovial tissue-protective functions by promoting the recruitment of AMPK to lysosomes and inhibiting mTORC1 signaling 19. In addition, drugs targeting NMTs (NMT1 and NMT2) have been suggested to be potent senolytics 21 and a target for treating or preventing various diseases such as cancer 22,23 and heart failure 24. Furthermore, modulating the evolutionarily conserved N-myristoyl transferase NMT1 was shown to extend lifespan in yeast 25. The full table of SNPs and their population frequency and association with aging outcomes is provided in Table S6.

List and heatmap of SNPs/genes from post-MR analysis showing an association between age at menarche and first birth with three or more aging outcomes.

Genomic region location for each variant is depicted by different characters: intron variant#; 2KB upstream variant; Missense variant*; Intergenic variant&; Regulatory region variant@. Heatmap is representative of an association of each gene/SNP with different aging outcomes. Red bar represents a harmful association (-beta for Father’s DA, Mother’s DA, and menopause; +beta for other outcomes); Blue bar represents a beneficial association (+beta for Father’s DA, Mother’s DA, and menopause; −beta for other outcomes); White bar represents no association. AFB; age at first birth; FI, frailty index; Father’s DA, father’s age at death; Mother’s DA, mother’s age at death; Menopause, age at menopause; LOAD, late-onset Alzheimer’s disease; OS, osteoporosis; T2D, type II diabetes; CHF, chronic heart failure; EH, essential hypertension; FA, facial aging; COPD, chronic obstructive pulmonary disease; GAD, gastrointestinal or abdominal disease; BC, breast cancer; EC, endometrial cancer; BMI, body mass index.

Canonical signaling pathways analysis

A total of 498 canonical signaling pathways related to 128 SNPs/gene outcomes were identified based on the Ingenuity Pathway Knowledge Base (IPKB). After ranking the identified canonical signaling pathways according to their adjusted P-values, the top 25 canonical signaling pathways with a p-value < 10−2 enriched by SNPs/genes involved in the age of menarche and first birth are represented in Fig. 3A. The top enriched canonical signaling pathways fell into these broader categories: (1) Developmental and cellular signaling pathways such as BMP, IGF-1, and growth hormone signaling; (2) Neuronal signaling and neurological disorders that include amyloid processing, activation of NMDA receptors and postsynaptic events, and glioma signaling; (3) Metabolic signaling pathways such as insulin receptor, mTOR, and leptin signaling; (4) Immunological and inflammatory signaling pathways, which include lymphotoxin β receptor and TREM1 signaling; (5) Cardiovascular and muscular signaling pathways including the role of NFAT in cardiac hypertrophy. In addition to canonical pathways, SNPs/genes were shown to be further associated with diseases and functions. IPA showed a total of 80 diseases and functions associated with age at menarche and first birth. These associated diseases and functions were rated according to their adjusted P-values, and the top 20 enriched categories with a P-value < 10−5 were selected and depicted in Fig. 3B.

Ingenuity Pathway Analysis (IPA). SNPs/gene outcomes from MR analysis were subjected to IPA.

(A) Canonical signaling pathways in age at menarche and first birth. The adjusted P-values of the top 25 signaling pathways are listed. (B) Disease and functions in age at first birth and menarche. The adjusted P-value of the top 20 significantly involved diseases and functions are listed. (C) IPA network 1 and (D) IPA network 2. The genes from our post-MR analysis are in bold. Solid lines indicate direct connections, while dotted lines indicate indirect connections (circular arrows mean influence itself). Color coding: pink-receptors, green-enzymes, purple-channels, orange-other proteins.

Gene network analysis and drug interactions

Besides pathway and disease associations, the 128 SNPs/genes were further connected to identify 11 functional networks using IPA. These IPA networks were ranked based on their consistency scores. The top 5 networks included a range of 13-17 genes with scores above 25, indicating robust regulatory analysis 26 (Table S7). The top networks were mainly connected to the following functions: Cellular interaction/signaling/development (Fig. 3C), cell death and survival, cancer, gastrointestinal, reproductive (Fig. 3C), neurological (Fig. 3D), and cardiovascular system, development, function, diseases. These results support the notion that dynamic cellular interactions and the development of various vital organ systems influenced by age at menarche and first birth also influence age-related diseases. The two top IPA networks were further outlined in Fig. 3C and 3D. To further explore the drugs targeting genes associated with age at menarche and first birth, we conducted an analysis using ChEMBL27 and DrugBank databases28. This examination revealed that a total of 11 FDA-approved drugs target the identified genes (PRKAG2, SCN2A, DPYD, MC3R, AKT, MAPK, KCNK9, TACR3, HTR4, KCNB2, RXRG) (Table S8).

Population validation for associations between age at menarche, or first live birth, or number of births with age-related outcomes

Next, we validated the genetic associations in the population cohort of the UK Biobank. 264,335, 184,481, and 259,758 participants were included for independent variables of age at menarche, age at first live birth, and number of births, respectively (Table S9). Ages at menarche were divided into 5 age groups, <11y, 11-12y, 13-14y, 15-16y, and >16y. The probability of living to 80 years old, having menopause ≥50 years old, risk of diabetes, high blood pressure (HBP), heart failure, COPD, breast cancer, endometrial cancer, and obesity were compared among the 5 age groups of menarche. Each of the outcomes was compared between the highest and lowest education levels and 3 BMI categories (18.5-24.9, 25-29.9, and ≥30) among the 5 age groups (Fig. 4A-I). The ORs for each age group were calculated based on the coefficients (β) in logistic regression after including confounders of smoking, alcohol, education, and BMI. Compared to the group with menarche under 11 years old, which generally had the highest risk, females with menarche at 13-14y had the highest probability of living to 80 years old (OR=1.412, P<0.001) and having menopause ≥50 years old (OR=1.146, P<0.001), and had the lowest risk of diabetes (OR=0.773, P<0.001), heart failure (OR=0.835, P<0.001), and COPD (OR=0.728, P<0.001). The group with menarche at 15-16y had the lowest risks of high blood pressure (HBP) (OR=0.793, P<0.001) and obesity (OR=0.337, P<0.001). Females with menarche >16y had the lowest risks of breast cancer (OR=0.832, P<0.05) and endometrial cancer (OR=0.641, P<0.05) compared to females with menarche <11y (Table S10).

The distributions of outcome age and risk according to age group of menarche and fist live birth as well as number of births before and after correcting confounders.

Condition Baseline, no confounder was corrected. Conditions Highest education and Lowest education, confounders of education, smoking, and drinking were corrected, and bars were painted based on most smoking and drinking situations with the highest and lowest education levels. Conditions Median education with BMI 18.5-24.9, Median education with BMI 25-29.9, and Median education with BMI 30-, confounders of education, smoking, drinking, and BMI were corrected, and bars were painted based on median education and most smoking and drinking situation with 3 BMI categories. BMI, body mass index; HBP, high blood pressure; COPD, chronic obstructive pulmonary disease; AD, Alzheimer’s disease.

Ages at first birth were divided into 5 age groups, <21y, 21-25y, 26-30y, 31-35y, and >35y. The probability of living to 80 years old, having menopause ≥50 years old, risk of diabetes, AD, HBP, heart failure, cerebral infarction, digestive system disease, cervical cancer, and obesity were compared among the 5 age groups of first live birth. Compared to the group with first live birth under 21 years old, which generally had the highest risks, females with first live birth at 21-25y had the highest probability to live to 80 years old (OR=1.667, P<0.001), and females with first live birth at 26-30y had the highest probability to have menopause ≥50 years old (OR=1.184, P<0.001) and lowest risk of cervical cancer (OR=0.587, P<0.01).

Females with first live birth at 31-35y had the lowest risk of diabetes (OR=0.612, P<0.001). Females with first live birth >35y had the lowest risk of having AD (OR=0.474, P<0.01), HBP (OR=0.587, P<0.001), heart failure (OR=0.446, P<0.001), cerebral infarction (OR=0.485, P<0.001), digestive system disease (OR=0.569, P<0.001), and obesity (OR=0.416, P<0.001) (Fig. 4J-S).

Females with the number of births from 0 to 4 were included. Compared to no birth, females with 3 births had the highest probability of living to 80 years old (OR=1.818, P<0.001) and having menopause ≥50 years old (OR=1.562, P<0.001). Females with 4 births had the highest risk of diabetes (OR=1.399, P<0.001), AD (OR=1.676, P<0.001), HBP (OR=1.274, P<0.001), heart failure (OR=1.439, P<0.001), cerebral infarction (OR=1.470, P<0.001), digestive system disease (OR=1.370, P<0.001), and obesity (OR=1.327, P<0.001) but lowest risk of breast cancer (OR=0.869, P<0.001) (Fig. 4T-AC). The significant estimated effect (β) of each group compared to the youngest age groups of menarche and first live birth and 0 birth group and the ability to explain the variability of outcomes in regression analysis are exhibited in Table S10.

To estimate the combined effect of age at menarche and first live birth on diabetes, HBP, heart failure, and BMI≥30, participants were divided into 25 groups according to the menarche and first live birth ages. Females with menarche<11y and first live birth<21y had higher risks of diabetes (OR=1.881, P<0.001), HBP (OR= 1.436, P<0.05), heart failure (OR= 1.776, P<0.01), and BMI≥30 (OR= 4.417, P<0.001) compared to females with menarche of 13-14y and first live birth of 26-30y (Table S11 and fig. S3).

The timing of reproductive milestones such as the age of menarche and age at first birth is partly governed by genetic factors, which may exert lasting impacts on an individual’s health trajectory. Our study finds support for antagonistic pleiotropy by demonstrating that genetic variants that drive early menarche or early pregnancy accelerate several aging outcomes, including parental age, frailty index, and several age-related diseases such as diabetes and dementia. These results were validated in a cohort from the UK biobank, showing that women with early menarche, or early pregnancy, were associated with accelerated aging outcomes and increased risk of several age-related diseases. These results are also consistent with the disposable soma theory that suggests aging as an outcome tradeoff between an organism’s investment in reproduction and somatic maintenance and repair. Consistent with a recent study suggesting the relation of early first birth with a higher likelihood of frailty 29, our MR analysis provided a strong association between early age of first birth and GrimAge, a DNA methylation based marker to predict epigenetic age. Our findings align with the observed negative genetic correlation between reproductive traits and lifespan that individuals with higher polygenetic scores for reproduction have lower survivorships to age 76 30. Prior studies linking the female reproductive factors with aging are either limited to results obtained from observational studies 31,32 or are limited to few outcomes such as brain disorders 33,34. Furthermore, the origins of aging in humans have also been considered adaptive 35.

Using MR, we identified 128 SNPs associated with early menarche or first birth that significantly influence age-related outcomes. Among the top 25 enriched canonical signaling pathways, those involved in developmental and cellular processes are particularly significant in determining the timing of menarche and first childbirth, with nearly half of these pathways (12 out of 25) falling into this category. These developmental and cellular signaling pathways work together to regulate continuous growth and involution from puberty through menopause 36. Interestingly, known longevity pathways such as insulin-like growth factor 1 (IGF1), growth hormone (GH) signaling, melatonin signaling, and bone morphogenetic protein (BMP) signaling, seem to play a crucial role in regulating the timing of these reproductive events. IGF-1 signaling promotes the growth and development of reproductive organs and tissues 37-39, but is also a conserved modulator of longevity 40-43. Melatonin, known for regulating circadian rhythms, influences the timing of menarche through its effects on the hypothalamic-pituitary-gonadal axis 44 and is postulated to modulate oxidative, inflammatory, and autophagy states 45. Furthermore, different processes in early life such as cell proliferation or differentiation 46, ovarian function, and follicular development 47-49 are dependent on BMP signaling. However, increased BMP signaling also contributes significantly to AD pathology 50 and impairments in neurogenesis and cognitive decline associated with aging 51. Aberrant BMP signaling is also shown to be associated with age-related metabolic and cardiovascular diseases 52,53. Additionally, growth hormone signaling is not only vital for development during childhood and puberty 54 but also impacts longevity 55. Besides signaling pathways, genes in age at menarche and first childbirth were connected in gene-gene interaction networks. In IPA network 1, most genes were connected with either AKT or genes encoding for follicle-stimulating hormone (FSH), luteinizing hormone (LH), leptin receptor (LEPR), and inhibin subunit beta A (INHBA). AKT signaling is implicated in oocyte maturation and embryonic development 56 and mediates pregnancy-induced cardiac adaptive responses 57. AKT signaling also mediates age-related disease pathologies, such as AD 58. Major ovarian functions are controlled by FSH (follicular growth, cellular proliferation, and estrogen production) and LH (oocyte maturation, ovulation, and terminal differentiation of follicles) which in turn are modulated by other ovarian factors such as INHBA (a subunit of activin and inhibin) 59,60. Endocrine alterations at advanced aging or reproductive aging 61 are marked by changes in FSH and LH due to altered feedback resulting from the ovarian decline in sex steroids, inhibin A, and inhibin B production 62-64. In IPA network 2, the hub genes are connected mostly with extracellular signal-regulated kinases (ERK), Similar to AKT, ERK is also a member of the mitogen-activated kinase family (MAPK), a highly conserved signaling pathway that plays a vital role in transducing extracellular signals into a wide range of cellular responses 65,66, maintaining tissue integrity during aging 67, and mediating age-related disease pathology 58,68. These studies underscore the importance of considering that dysregulation or abnormal activation of any of these pathways could contribute to the onset of late-life diseases consistent with antagonistic pleiotropy in humans 2,69.

The thrifty gene hypothesis provides a compelling framework to explain antagonistic pleiotropy, particularly in the context of modern health challenges. The thrifty gene hypothesis suggests that genes favoring efficient energy storage were advantageous in historical periods of food scarcity, helping individuals survive through famines. However, in today’s environment of abundant food, the same genes contribute to obesity and metabolic diseases like diabetes. Similarly, in the context of antagonistic pleiotropy, we hypothesize that genes that enhance early-life reproductive success favor efficient energy storage to support reproductive health. This is supported by our observation that early pregnancy is associated with increasing BMI, a recognized key factor in systemic aging 70,71. In support of this, we found a lack of significant association between early menarche or age at first birth for CHF and cervical cancer after removal of BMI as a confounder in MR analysis. Based on regression analysis at the population level (Fig. 4), higher BMI was associated with higher risks of cardiovascular diseases and diabetes. Thus, an increase in BMI is one of the factors that explains the increased risk of age-related disease seen due to the exposure of early pregnancy or the prevalence of genes that enhance early reproductive success. These results are also consistent with calorie restriction being a robust way to extend healthspan and lifespan in many species 72.

Several limitations of our study need to be considered. Firstly, all the associations are explored at the genetic level with MR and population level with the UK Biobank. However, to confirm causal relationships, further validation through in vitro and in vivo research is essential. Secondly, though we excluded confounder-related SNPs and addressed potential pleiotropic effects, there may also be potential bias which could be addressed by extending these findings to other populations. Thirdly, significant heterogeneity and pleiotropy remained in some association analyses, which need further exploration. Given the increasing age of first childbirth in modern times, the ideal period of first childbirth for both slowing down female aging and benefiting fetal development needs more research 73. In summary, our study underscores the complex relationship between genetic legacies and modern diseases. Understanding these genetic predispositions can inform public health strategies and their relevance to age-related disease outcomes in women of various ethnic groups. For example, interventions could be tailored not only to mitigate the risks associated with early reproductive timing but also to address lifestyle factors exacerbating conditions linked to thrifty genes, like diet and physical activity.

Methods

All the MR research data is from the public genome-wide association studies (GWAS) databases of the IEU Open GWAS project (https://gwas.mrcieu.ac.uk/) and PubMed. The target analysis is based on public datasets. The population data is from the UK Biobank. No definite personal information was included. No additional ethical approval was required for our study.

Exposure data

2 traits were included as female reproductive activity exposures, involving age at menarche 7 (GWAS ID: ebi-a-GCST90029036) and age at first birth 8 (GWAS ID: ebi-a-GCST90000048).

Outcome data

The aging outcomes involved general aging, organ aging/disease, and organ cancers (Table S1). Frailty index 74 (GWAS ID: ebi-a-GCST90020053), father’s age at death75 (GWAS ID: ebi-a-GCST006700), mother’s age at death 75 (GWAS ID: ebi-a-GCST006699), and DNA methylation GrimAge acceleration 76 (GWAS ID: ebi-a-GCST90014294) were included as general aging outcomes. Specific aging diseases and aging levels included age at menopause onset 7 (GWAS ID: ebi-a-GCST90029037), late-onset Alzheimer’s disease 77 (LOAD) (PMID: 30820047), osteoporosis 78 (GWAS ID: ebi-a-GCST90038656), and type 2 diabetes 79 (GWAS ID: ebi-a-GCST90018926), chronic heart failure 79 (CHF) (GWAS ID: ebi-a-GCST90018806), essential hypertension (GWAS ID: ukb-b-12493), facial aging (GWAS ID: ukb-b-2148), eye aging 80 (PMID: 36975205), cirrhosis (GWAS ID: finn-b-CIRRHOSIS_BROAD), chronic kidney disease 81 (CKD) (GWAS ID: ebi-a-GCST008026), early onset chronic obstructive pulmonary disease (COPD) (GWAS ID: finn-b-COPD_EARLY), and gastrointestinal or abdominal disease (GAD) 78 (GWAS ID: ebi-a-GCST90038597). Organ cancers included breast cancer 79 (GWAS ID: ebi-a-GCST90018799), ovarian cancer 79 (GWAS ID: ebi-a-GCST90018888), endometrial cancer 82 (GWAS ID: ebi-a-GCST006464), and cervical cancer (GWAS ID: ukb-b-8777).

SNPs selection

We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10−8 83,84. In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r2 = 0.001 and kb = 10,000). Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 85,86 (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10−5. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded. During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 87. If the effect allele frequency (EAF) was missing in the primary dataset, EAF would be collected from dsSNP (https://www.ncbi.nlm.nih.gov/snp/) based on the population to calculate the F value.

MR analysis

All analysis was performed using R software (version 4.3.1, R Foundation for Statistical Computing, Vienna, Austria) and RStudio software (version 2023.09.0 Posit, PBC, Boston, USA) with TwoSampleMR (version 0.5.7) and MRPRESSO (version 1.0) packages.

Five two-sample MR methods were applied for analysis, including inverse variance weighted model (IVW), MR Egger regression model (MER), weighted median model (WMM), simple mode, and weighted mode. A pleiotropy test was used to check if the IVs influence the outcome through pathways other than the exposure of interest. A heterogeneity test was applied to ensure whether there is a variation in the causal effect estimates across different IVs. Significant heterogeneity test results indicate that some instruments are invalid or that the causal effect varies depending on the IVs used. MRPRESSO was applied to detect and correct potential outliers of IVs with NbDistribution = 10,000 and threshold p = 0.05. Outliers would be excluded for repeated analysis. The causal estimates were given as odds ratios (ORs) and 95% confidence intervals (CI). Leave-one-out analysis was conducted to ensure the robustness of the results by sequentially excluding each IV and confirming the direction and statistical significance of the remained SNPs.

In two-step MR, we mainly use IVW to access the causal associations between exposure and outcome, exposure and mediator, and mediator and outcome. Body mass index (BMI) (GWAS ID: ukb-b-19953) was applied as a mediator. Similar steps of two-sample MR were repeated in the analysis after excluding confounders. The mediator effect (ME) and direct effect (DE) were calculated. The MEs were calculated with the formula: beta1 × beta2; DEs were calculated according to the formula: beta – (beta1 × beta2), beta stands for the total effect obtained from the primary analysis, beta1 stands for the effect of exposure on the mediator, and beta2 stands for the effect of the mediator on the outcome.

Colocalization analysis

To improve the robustness of our research, colocalization analysis was conducted with packages gwasglue (version 0.0.0.9000), coloc (version 5.2.3), and gassocplot (version 1.0) between the exposures and outcomes revealing significant associations after BMI-related SNPs were excluded. SNPs with significant P values in single SNP OR analysis were set as target SNPs. SNP-level colocalization was conducted with 50kb 88 windows around each target SNP. EAF was set to 0.5 if it was missed in the primary datasets. The Bayesian algorithm in the coloc package generates posterior probabilities (PP) for the hypothesis that both traits are associated and share the same single causal variant at a specific locus (H4) 89. SNP with significant results in the colocalization analysis would be removed and two-sample MR analyses would be conducted again.

Target Analysis

We employed the Ingenuity Pathway Analysis (IPA) software (version 01-22-01; Ingenuity Systems; QIAGEN) to investigate various gene-related aspects associated with age at first birth and menarche. IPA is a widely used bioinformatics tool for interpreting high-throughput data 90. Briefly, the SNPs/genes from MR analysis were uploaded into Qiagen’s IPA system for core analysis and then the outcome was overlaid with the global molecular network in the Ingenuity pathway knowledge base (IPKB). IPA was performed to identify canonical pathways, diseases, and functions, and to investigate gene networks. Additionally, the Chemical Biology Database (ChEMBL) (https://www.ebi.ac.uk/chembl/), a large-scale database of bioactive molecules for drug discovery 27, and the DrugBank database 28, a comprehensive database on FDA-approved drugs, drug targets, mechanisms of action, and interactions (https://go.drugbank.com/), were used to identify candidate genes targeted by approved drugs in clinical trials or under current development phases.

Population validation

The MR results were further validated based on the UK Biobank with package stats (version 4.3.1). In addition to age at menarche and first live birth, the number of births is listed as an independent variable for the validation analysis. Based on the significant associations, regression analysis was adopted to explore the effect of independent variables and related confounders on outcomes. Ages at menarche were divided into 5 age groups, <11y, 11-12y, 13-14y, 15-16y, and >16y. Ages at first live birth were divided into 5 age groups, <21y, 21-25y, 26-30y, 31-35y, and >35y. Females with the number of births from 0 to 4 were included. Education was divided into 7 levels based on the degrees. Smoking was divided into 2 categories, ever smoking or not. Drinking was divided into 3 categories, never, previous, and current drinking. BMI was divided into four categories, <18.5, 18.5-24.9, 25-29.9, and ≥30 based on the average value of 4 records. The outcome results were preprocessed based on the baseline information collection, follow-up outcome, surgical history, and ICD-10 diagnosis summary. Logistic regression was used for the analysis of disease risks (categorical variables). The coefficients (β) in logistic regression were log-odds ratios. Although the continuous variables exhibited mild to moderate skewed distributions, considering the large sample size, linear regression was used for the analysis of age at death, menopause age, and BMI in the first step. Then logistic regression was applied to confirm the robustness of the results with the outcome of death age ≥ 80, menopause age ≥ 50, and BMI ≥ 30. Confounders for regression analysis I include education, smoking, and drinking. Confounders for regression analysis II include education, smoking, drinking, and BMI. Variance Inflation Factors (VIF) were calculated for each predictor in regression analysis to avoid multicollinearity. Values of VIF were less than 1.5 for each predictor. For outcomes of diabetes, HBP, heart failure, and BMI≥30, the combined effect of age at menarche and first birth was analyzed. 179,821 participants were divided into 25 groups according to the menarche and first live birth age groups. All other groups were compared to the group with menarche of “13-14y” and the first birth of “26-30y” in logistic regression (confounders including education, smoking, drinking, and BMI for diabetes, HBP, and heart failure; including education, smoking, and drinking for BMI≥30).

Data Availability

All data are available in the main text, supplementary materials, or public database. The population data is available through the UK Biobank.

Data and materials availability

All data are available in the main text, supplementary materials, or public database. The population data is available through the UK Biobank.

Acknowledgements

We thank the Kapahi Lab at Buck Institute for helpful discussions. This research has been conducted using the UK Biobank Resource under Application Number 151101.

Additional information

Funding

Hevolution Foundation (PK). National Institute of Health grant R01AG068288 and R01AG045835 (PK). Larry L. Hillblom Foundation (PK), Larry L. Hillblom Foundation (PS).

Author contributions

Conceptualization: YX, VT, PS, LLF, and PK. MR analysis and UK biobank validation: YX. Gene target analysis: VT. Funding acquisition: PK and PS. Writing original draft: YX, PS, and VT. All authors reviewed and edited the manuscript.

Competing interests

Authors declare that they have no competing interests.