Methylation Clocks Do Not Predict Age or Alzheimer’s Disease Risk Across Genetically Admixed Individuals

  1. Biological and Medical Informatics Program, University of California, San Francisco, San Francisco, United States
  2. Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, United States
  3. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
  4. John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, United States
  5. Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, United States
  6. The Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Coral Gables, United States
  7. Neurogenetics Research Center, Instituto Nacional de Ciencias Neurologicas, Lima, Peru
  8. Department of Internal Medicine, Universidad Central Del Caribe, Bayamón, Puerto Rico
  9. Maya Angelou Center for Health Equity, Wake Forest University, Winston-Salem, United States

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Editors

  • Reviewing Editor
    Jenny Tung
    Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  • Senior Editor
    Pankaj Kapahi
    Buck Institute for Research on Aging, Novato, United States of America

Reviewer #1 (Public review):

Summary:

Cruz-González and colleagues draw on DNA methylation and paired genetic data from 621 participants (n=308 controls; n=313 participants with Alzheimer's Disease). The authors generate a panel of epigenetic biomarkers of aging with a primary focus on the Horvath multi-tissue clock. The authors find weaker correlations between predicted epigenetic age and chronological age in subgroups with higher African ancestry than within a subgroup identified as White. The authors then examine genetic variation as a potential source for between-group differences in epigenetic clock performance. The authors draw on a large collection of publicly available methylation quantitative trait loci datasets and find evidence for substantial overlap between clock CpGs located within the Horvath clock and methQTLs. Going further, the authors show that methQTLs that overlap with Horvath clock CpGs show greater allelic variation in African ancestral groups pointing to a potential explanation for poorer clock performance within this group.

Strengths:

This is an interesting dataset and an important research question. The authors cite issues of portability regarding polygenic risk scores as a motivation to examine between-group differences in the performance of a panel of epigenetic clocks. The authors benefit from a diverse cohort of individuals with paired genetic data and focus on a clinical phenotype, Alzheimer's disease, of clear relevance for studies evaluating age-related biomarkers.

Weaknesses:

While the authors tackle an important question using a diverse cohort, the current manuscript is lacking some detail that may diminish the potential impact of this paper. For example:

(1) Information on chronological ages across groups should be reported to ensure there are no systematic differences in ages or age ranges between groups (see point below).

(2) The authors compare correlations between chronological age and epigenetic age in sub-groups within the current study to correlations reported by Horvath (2013). Attempting to draw comparisons between these two datasets is problematic. The current study has a much smaller N (particularly for sub-group analyses) and has a more restricted age range (60-90 yrs versus 0-100 yrs). Thus, is an alternative explanation simply that any weaker correlations observed in this study are driven by sample size and a restricted age range? Reporting the chronological ages (and ranges) across subgroups in the current study would help in this regard. Similarly, given the lack of association between AD status and epigenetic age (and very small effect in the white group), it may be of interest to examine the correlation between chronological age and epigenetic age in each group including the AD participants: would the between-group differences in correlations between chronological age and epigenetic age be altered by increasing the sample size?

(3) The correlation between chronological age and epigenetic age, while helpful, is not the most informative estimate of accuracy. Median absolute error (and an analysis of MAE across subgroups) would be a helpful addition.

(4) More information should be provided about how DNAm data were generated. Were samples from each ancestral group randomized across plates/slides to ensure ancestry and batch are not associated? How were batch effects considered? Given the relatively small sample sizes, it would be important to consider the impact of technical variation on measures of epigenetic age used in the current study. The use of principal component-based versions of these clocks (Higgins Chen et al., 2023; Nature Aging https://doi.org/10.1038/s43587-022-00248-2) may help address such concerns.

(5) Marioni et al., (2015) found a very weak cross-sectional association between DNAm Age and cognitive function (r~0.07) in a cohort of >900 participants. Given these effect sizes, I would not interpret the absence of an effect in the current study to reflect issues of portability of epigenetic biomarkers.

(6) The methQTL analyses presented are suggestive of potential genetic influence on DNAm at some Horvath CpGs. Do the authors see differences in DNAm across ancestral groups at these potentially affected CpGs? This seems to be a missing piece (e.g., estimating the likely impact of methQTL on clock CpG DNAm).

Reviewer #2 (Public review):

Summary:

This paper seeks to characterize the portability of methylation clocks across groups. Methylation clocks are trained to predict biological aging from DNA methylation but have largely been developed in datasets of individuals with primarily European ancestries. Given that genetic variation can influence DNA methylation, the authors hypothesize that methylation clocks might have reduced accuracy in non-European ancestries.

Strengths:

The authors evaluate five methylation clocks in 621 individuals from the MAGENTA study. This includes approximately 280 individuals sampled in Puerto Rico, Cuba, and Peru, as well as approximately 200 self-identified African American individuals sampled in the US. To understand how methylation clock accuracy varies with proportion of non-European ancestry, the authors inferred local ancestry for the Puerto Rican, Cuban, Peruvian, and African American cohorts. Overall, this paper presents solid evidence that methylation clocks have reduced accuracy in individuals with non-European ancestries, relative to individuals with primarily European ancestries. This should be of great interest to those researchers who seek to use methylation clocks as predictors of age-related, late-onset diseases and other health outcomes.

Weaknesses:

One clear strength of this paper is the ability to do more sophisticated analyses using the local ancestry calls for the MAGENTA study. It would be valuable to capitalize on this strength and assess portability across the genetic ancestry spectrum, as was recently advocated by Ding et al. in Nature (2023). For example, the authors could regress non-European local ancestry fraction on measures of prediction accuracy. This could paint a clearer picture of the relationship between genetic ancestry and clock accuracy, compared to looking at overall correlations within each cohort.

The authors present two possible reasons that methylation clocks might have reduced accuracy in individuals with non-European ancestries: genetic variants disrupting methylation sites (i.e. "disruptive variants"), and genetic variants influencing methylation sites (i.e. meQTLs). The authors conclude disruptive variants do not contribute to poor methylation clock portability, but the evidence in support of this conclusion is incomplete. The site frequency spectrum of disruptive variants in Figure 4 is estimated from all gnomAD individuals, and gnomAD is comprised of primarily European individuals. Thus, the observation that disruptive variants are generally rare in gnomAD does not rule them out as a source of poor clock portability in admixed individuals with non-European ancestries.

It is also unclear to what extent meQTLs impact methylation clock portability. The authors find that the frequency of meQTLs is higher in African ancestry populations, but this could reflect the fact that some of the analyzed meQTLs were ascertained in African Americans. The number of meQTL-affected methylation sites also varies widely between clocks, ranging from 6 to 271; thus, meQTLs likely impact the portability of different clocks in different ways. Overall, the paper would benefit from a more quantitative assessment of the extent to which meQTLs influence clock portability.

The paper implies that methylation clocks have an inferior ability to predict AD risk in admixed populations relative to white individuals, but the difference between white AD patients and controls is not significant when correcting for multiple testing. This nuance should be made more explicit.

Finally, this paper overlooks the possibility that environmental exposures co-vary with genetic ancestry and play a role in decreasing the accuracy of methylation clocks in genetically admixed individuals. Quantifying the impact of environmental factors is almost certainly outside of the scope of this paper. However, it is worth acknowledging the role of environmental factors to provide the field with a more comprehensive overview of factors influencing methylation clock portability. It is also essential to avoid the assumption that correlations with genetic ancestry necessarily arise from genetic causes.

Reviewer #3 (Public review):

This manuscript examines the accuracy of DNA methylation-based epigenetic clocks across multiple cohorts of varying genetic ancestry. The authors find that clocks were generally less accurate at predicting age in cohorts with large proportions of non-European (especially African) ancestry, compared to cohorts with high European ancestry proportions. They suggest that some of this effect might be explained by meQTLs that occur near CpG sites included in clocks, because these variants may be at higher frequencies (or at least different frequencies) in cohorts with high proportions of non-European ancestry relative to the training set. They also provide discussions of potential paths forward to alleviate bias and improve portability for future clock algorithms.

The topic is timely due to the increasing popularity of DNA methylation-based clocks and the acknowledgment that many algorithms (e.g., polygenic risk scores) lack portability when applied to cohorts that substantially differ in ancestry or other characteristics from the training set. This has been discussed to some degree for DNA methylation-based clocks, but could of course use more discussion and empirical attention which the authors nicely provide using an impressive and diverse collection of data.

The manuscript is clear and well-written; however, some key background was missing (e.g., what we know already about the ancestry composition of clock training sets) and, most importantly, several analyses would benefit from being taken one step further. For example, the main argument of the paper is that ancestry impacts clock predictions, but this is determined by subsetting the data by recruitment cohort rather than analyzing ancestry as a continuous variable. Extending some of the analyses could really help the authors nail down their hypothesized sources of lack of portability, which is critical for making recommendations to the community and understanding the best paths forward.

Author response:

Public Reviews:

Reviewer #1 (Public review):

Summary:

Cruz-González and colleagues draw on DNA methylation and paired genetic data from 621 participants (n=308 controls; n=313 participants with Alzheimer's Disease). The authors generate a panel of epigenetic biomarkers of aging with a primary focus on the Horvath multi-tissue clock. The authors find weaker correlations between predicted epigenetic age and chronological age in subgroups with higher African ancestry than within a subgroup identified as White. The authors then examine genetic variation as a potential source for between-group differences in epigenetic clock performance. The authors draw on a large collection of publicly available methylation quantitative trait loci datasets and find evidence for substantial overlap between clock CpGs located within the Horvath clock and methQTLs. Going further, the authors show that methQTLs that overlap with Horvath clock CpGs show greater allelic variation in African ancestral groups pointing to a potential explanation for poorer clock performance within this group.

Thank you for this summary.

Strengths:

This is an interesting dataset and an important research question. The authors cite issues of portability regarding polygenic risk scores as a motivation to examine between-group differences in the performance of a panel of epigenetic clocks. The authors benefit from a diverse cohort of individuals with paired genetic data and focus on a clinical phenotype, Alzheimer's disease, of clear relevance for studies evaluating age-related biomarkers.

Weaknesses:

While the authors tackle an important question using a diverse cohort, the current manuscript is lacking some detail that may diminish the potential impact of this paper. For example:

(1) Information on chronological ages across groups should be reported to ensure there are no systematic differences in ages or age ranges between groups (see point below).

Thank you for pointing out this omission. The age ranges are similar across cohorts. No individuals under 60 were considered, and the average ages per cohort ranged from 72 to 76. Neither average age nor age range was consistently higher or lower in the admixed cohorts for which the clocks had lower performance compared to the White cohort. We will report the age distributions in supplementary material in the revision.

(2) The authors compare correlations between chronological age and epigenetic age in sub-groups within the current study to correlations reported by Horvath (2013). Attempting to draw comparisons between these two datasets is problematic. The current study has a much smaller N (particularly for sub-group analyses) and has a more restricted age range (60-90 yrs versus 0-100 yrs). Thus, is an alternative explanation simply that any weaker correlations observed in this study are driven by sample size and a restricted age range? Reporting the chronological ages (and ranges) across subgroups in the current study would help in this regard. Similarly, given the lack of association between AD status and epigenetic age (and very small effect in the white group), it may be of interest to examine the correlation between chronological age and epigenetic age in each group including the AD participants: would the between-group differences in correlations between chronological age and epigenetic age be altered by increasing the sample size?

Our conclusions about the reduced accuracy of the clocks in admixed individuals are based on comparisons within the MAGENTA cohorts, not on comparisons to previous reports. We show significantly reduced accuracy in the African American and Puerto Rican cohorts in MAGENTA compared to the White MAGENTA cohort. The reviewer is correct that the lower correlation in each of the cohorts compared to those in the Horvath study is due to the older, more restricted age range of our cohort. Indeed, other studies applying the Horvath clock have seen similar correlations to those observed in the White MAGENTA cohort (Marioni et al., 2015; Horvath, 2013; Shireby et al., 2020). Following the suggestion to increase sample size, we conducted the chronological age vs. epigenetic age correlation analysis with the inclusion of AD cases. The significantly lower performance of the clock on Puerto Ricans and African Americans relative to White individuals remains after including all individuals in each cohort. We will include these results on the full cohorts in MAGENTA in the revision.
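
The effect of a restricted age range on the correlation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of MAGENTA data: ages are drawn uniformly from 0 to 100 years, the clock error is Gaussian with a hypothetical standard deviation of 5 years, and the function names are illustrative only.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=2000, error_sd=5.0, seed=0):
    """Simulated chronological ages and predicted ages (hypothetical values)."""
    rng = random.Random(seed)
    ages = [rng.uniform(0, 100) for _ in range(n)]
    predicted = [age + rng.gauss(0, error_sd) for age in ages]
    return ages, predicted


ages, predicted = simulate()
r_full = pearson_r(ages, predicted)

# Keep only individuals aged 60-90; the prediction error is unchanged.
older = [(a, p) for a, p in zip(ages, predicted) if 60 <= a <= 90]
r_restricted = pearson_r([a for a, _ in older], [p for _, p in older])

print(f"r over 0-100 yrs: {r_full:.2f}")
print(f"r over 60-90 yrs: {r_restricted:.2f}")
```

Because the prediction error is the same in both subsets, the lower r in the 60-90 subset reflects only the narrower spread of ages. This is why comparisons between cohorts with similar age ranges are more informative than comparisons with Horvath (2013).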

(3) The correlation between chronological age and epigenetic age, while helpful, is not the most informative estimate of accuracy. Median absolute error (and an analysis of MAE across subgroups) would be a helpful addition.

We used correlation because this is commonly used to evaluate the performance of epigenetic age clocks, but we agree that direct error quantification provides a complementary perspective. We confirm that the African American and Puerto Rican cohorts have higher error than the White cohort, and we will report these comparisons in the revision.
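
For reference, median absolute error per cohort could be computed along the following lines. This is a sketch only: the record layout and cohort labels are hypothetical, and the toy values are not MAGENTA data.

```python
from statistics import median


def mae_by_cohort(records):
    """Median absolute error (years) per cohort.

    records: iterable of (cohort, chronological_age, predicted_age) tuples.
    """
    errors = {}
    for cohort, age, predicted in records:
        errors.setdefault(cohort, []).append(abs(predicted - age))
    return {cohort: median(errs) for cohort, errs in errors.items()}


# Hypothetical toy values for illustration only.
toy_records = [
    ("cohort_A", 70, 72), ("cohort_A", 75, 74), ("cohort_A", 80, 83),
    ("cohort_B", 70, 64), ("cohort_B", 75, 83), ("cohort_B", 80, 75),
]

mae = mae_by_cohort(toy_records)
for cohort, value in mae.items():
    print(cohort, value)
```

Unlike the correlation, this quantity is expressed in years and does not depend on the spread of chronological ages in the sample.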

(4) More information should be provided about how DNAm data were generated. Were samples from each ancestral group randomized across plates/slides to ensure ancestry and batch are not associated? How were batch effects considered? Given the relatively small sample sizes, it would be important to consider the impact of technical variation on measures of epigenetic age used in the current study. The use of principal component-based versions of these clocks (Higgins Chen et al., 2023; Nature Aging https://doi.org/10.1038/s43587-022-00248-2) may help address such concerns.

Thank you for pointing out the need for additional context on data generation. All omics data from the MAGENTA study were generated using protocols that aim to minimize technical artifacts and batch effects. We will add detailed protocol information in the revision. We also thank the reviewer for their suggestion on applying the principal component clock to account for potential technical variation. We are planning to perform these analyses and include them in the revision.

(5) Marioni et al., (2015) found a very weak cross-sectional association between DNAm Age and cognitive function (r~0.07) in a cohort of >900 participants. Given these effect sizes, I would not interpret the absence of an effect in the current study to reflect issues of portability of epigenetic biomarkers.

We agree that previous links between DNAm Age and AD/cognitive function have been small in magnitude. For example, the PhenoAge paper (Levine et al., 2018) and a study using the Horvath clock (Levine et al., 2015) found age acceleration of less than a year in AD patients relative to non-demented individuals. These effects have been detected in studies with relatively small sample sizes (e.g., 700 for Levine et al. 2015 and 604 for Levine et al. 2018). Our study is of similar size, but the cohort-specific analyses have lower power. Nonetheless, we replicate the modest, but significant association with AD in the White MAGENTA cohort. We have performed power calculations and find that we have 26% power to detect an effect of this size in the Cubans, 46% for the Peruvians, 66% for the Whites, 74% for the Puerto Ricans, and 84% for the African Americans. Given the relatively high power in the Puerto Rican and African American cohorts, we suggest that the reduced accuracy of the clocks contributes to the lack of association. We will also add caveats about power and the small sample size in the revision.
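
The response does not state how the power calculations were performed. As an illustration of how power depends on cohort size, the sketch below uses a normal approximation for a two-sided, two-sample comparison of means. The choice of test, the standardized effect size, and the group sizes are all assumptions made for illustration; they are not the values behind the percentages quoted above.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size * (n_cases * n_controls / (n_cases + n_controls)) ** 0.5
    # Probability of rejecting in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical effect size and per-group sizes.
for n in (30, 60, 120):
    print(n, round(two_sample_power(0.3, n, n), 2))
```

With a fixed effect size, power rises with the number of cases and controls, so cohort-specific analyses have less power than the pooled study.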

(6) The methQTL analyses presented are suggestive of potential genetic influence on DNAm at some Horvath CpGs. Do the authors see differences in DNAm across ancestral groups at these potentially affected CpGs? This seems to be a missing piece (e.g., estimating the likely impact of methQTL on clock CpG DNAm).

Thank you for this excellent suggestion. We will add this analysis in the revision. This will enable us to test for further evidence for our hypothesis about the role of ancestry-specific meQTLs on clock accuracy.

Reviewer #2 (Public review):

Summary:

This paper seeks to characterize the portability of methylation clocks across groups. Methylation clocks are trained to predict biological aging from DNA methylation but have largely been developed in datasets of individuals with primarily European ancestries. Given that genetic variation can influence DNA methylation, the authors hypothesize that methylation clocks might have reduced accuracy in non-European ancestries.

Strengths:

The authors evaluate five methylation clocks in 621 individuals from the MAGENTA study. This includes approximately 280 individuals sampled in Puerto Rico, Cuba, and Peru, as well as approximately 200 self-identified African American individuals sampled in the US. To understand how methylation clock accuracy varies with proportion of non-European ancestry, the authors inferred local ancestry for the Puerto Rican, Cuban, Peruvian, and African American cohorts. Overall, this paper presents solid evidence that methylation clocks have reduced accuracy in individuals with non-European ancestries, relative to individuals with primarily European ancestries. This should be of great interest to those researchers who seek to use methylation clocks as predictors of age-related, late-onset diseases and other health outcomes.

Thank you for this summary.

Weaknesses:

One clear strength of this paper is the ability to do more sophisticated analyses using the local ancestry calls for the MAGENTA study. It would be valuable to capitalize on this strength and assess portability across the genetic ancestry spectrum, as was recently advocated by Ding et al. in Nature (2023). For example, the authors could regress non-European local ancestry fraction on measures of prediction accuracy. This could paint a clearer picture of the relationship between genetic ancestry and clock accuracy, compared to looking at overall correlations within each cohort.

Thank you for this excellent suggestion. We agree that modeling portability across genetic ancestry as a spectrum would help support our conclusions. We will add this to the revision.
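
One simple way to treat ancestry as a spectrum is to regress each individual's absolute prediction error on their non-European ancestry fraction. The sketch below fits an ordinary least-squares line to hypothetical toy values. The variable names, the use of a per-individual ancestry fraction summarized from local ancestry calls, and the choice of absolute error as the accuracy measure are assumptions, not the analysis planned by the authors.

```python
def ols_fit(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy values: ancestry fraction and absolute clock error (years).
non_european_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
abs_error_years = [2.0, 2.5, 4.5, 4.0, 6.0]

slope, intercept = ols_fit(non_european_fraction, abs_error_years)
print(f"slope = {slope:.2f} years per unit ancestry fraction")
print(f"intercept = {intercept:.2f} years")
```

A positive slope would indicate larger errors at higher non-European ancestry fractions. As Reviewer #2 notes, such a relationship would not by itself establish a genetic cause, because environmental exposures may co-vary with ancestry.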

The authors present two possible reasons that methylation clocks might have reduced accuracy in individuals with non-European ancestries: genetic variants disrupting methylation sites (i.e., "disruptive variants") and genetic variants influencing methylation sites (i.e., meQTLs). The authors conclude disruptive variants do not contribute to poor methylation clock portability, but the evidence in support of this conclusion is incomplete. The site frequency spectrum of disruptive variants in Figure 4 is estimated from all gnomAD individuals, and gnomAD is comprised of primarily European individuals. Thus, the observation that disruptive variants are generally rare in gnomAD does not rule them out as a source of poor clock portability in admixed individuals with non-European ancestries.

Thank you for this question. The allele frequencies were so low that even if they all occurred in individuals of non-European ancestries, they would still be incredibly rare. Nonetheless, in the revision, we will make this clear by reporting ancestry-specific allele frequencies.

It is also unclear to what extent meQTLs impact methylation clock portability. The authors find that the frequency of meQTLs is higher in African ancestry populations, but this could reflect the fact that some of the analyzed meQTLs were ascertained in African Americans. The number of meQTL-affected methylation sites also varies widely between clocks, ranging from 6 to 271; thus, meQTLs likely impact the portability of different clocks in different ways. Overall, the paper would benefit from a more quantitative assessment of the extent to which meQTLs influence clock portability.

We agree that the meQTLs likely influence the clocks in different ways and that the ascertainment of the meQTLs in different populations makes direct comparisons challenging. To provide mechanistic insights into the ways that meQTLs influence the methylation clocks, we plan to leverage the individual-level genetic data generated for the MAGENTA individuals. This will allow us to explore whether the individuals who carry the specified clock-influencing meQTLs receive less accurate predictions from the methylation clocks. In addition, the new analysis of whether individuals from different cohorts have different methylation levels at clock CpGs with ancestry-variable meQTLs will help establish the differences between groups (see response to Reviewer #1 point 6). Finally, to resolve potential bias due to ascertaining some of the meQTLs in African Americans, we will conduct the same analyses from the manuscript, holding out the set of meQTLs from African Americans. These results will be included in the revision.

The paper implies that methylation clocks have an inferior ability to predict AD risk in admixed populations relative to white individuals, but the difference between white AD patients and controls is not significant when correcting for multiple testing. This nuance should be made more explicit.

We agree that the signal is not particularly strong in the White cohort, but the effect size is in line with previous studies. We will add power calculations and discussion to help the interpretation of these results (see response to Reviewer #1 point 5).

Finally, this paper overlooks the possibility that environmental exposures co-vary with genetic ancestry and play a role in decreasing the accuracy of methylation clocks in genetically admixed individuals. Quantifying the impact of environmental factors is almost certainly outside of the scope of this paper. However, it is worth acknowledging the role of environmental factors to provide the field with a more comprehensive overview of factors influencing methylation clock portability. It is also essential to avoid the assumption that correlations with genetic ancestry necessarily arise from genetic causes.

We entirely agree about the importance of discussing environmental exposures. We did not intend to discount them in our manuscript. We will clarify their potential role and the scope of our analyses in the revision. We expect that environmental factors certainly contribute to differences between groups. The revisions outlined above may help us better quantify the genetic contribution.

Reviewer #3 (Public review):

This manuscript examines the accuracy of DNA methylation-based epigenetic clocks across multiple cohorts of varying genetic ancestry. The authors find that clocks were generally less accurate at predicting age in cohorts with large proportions of non-European (especially African) ancestry, compared to cohorts with high European ancestry proportions. They suggest that some of this effect might be explained by meQTLs that occur near CpG sites included in clocks, because these variants may be at higher frequencies (or at least different frequencies) in cohorts with high proportions of non-European ancestry relative to the training set. They also provide discussions of potential paths forward to alleviate bias and improve portability for future clock algorithms.

The topic is timely due to the increasing popularity of DNA methylation-based clocks and the acknowledgment that many algorithms (e.g., polygenic risk scores) lack portability when applied to cohorts that substantially differ in ancestry or other characteristics from the training set. This has been discussed to some degree for DNA methylation-based clocks, but could of course use more discussion and empirical attention which the authors nicely provide using an impressive and diverse collection of data.

The manuscript is clear and well-written; however, some key background was missing (e.g., what we know already about the ancestry composition of clock training sets) and, most importantly, several analyses would benefit from being taken one step further. For example, the main argument of the paper is that ancestry impacts clock predictions, but this is determined by subsetting the data by recruitment cohort rather than analyzing ancestry as a continuous variable. Extending some of the analyses could really help the authors nail down their hypothesized sources of lack of portability, which is critical for making recommendations to the community and understanding the best paths forward.

Thank you for these suggestions. As noted in our response to reviewer #2, we will analyze ancestry as a continuous variable in the revision. We will also add details on the training of previous clocks and previous work on clock accuracy.
