Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species

  1. Osvaldo Villa
  2. Nicole L Stuhr
  3. Chia-an Yen
  4. Eileen M Crimmins
  5. Thalida Em Arpawong
  6. Sean P Curran (corresponding author)
  1. Leonard Davis School of Gerontology, University of Southern California, United States
  2. Dornsife College of Letters, Arts, and Science, Department of Molecular and Computational Biology, University of Southern California, United States
  3. Norris Comprehensive Cancer Center, University of Southern California, United States

Abstract

The influence of genetic variation on the aging process, including the incidence and severity of age-related diseases, is complex. Here, we define the evolutionarily conserved mitochondrial enzyme ALH-6/ALDH4A1 as a predictive biomarker for age-related changes in muscle health by combining Caenorhabditis elegans genetics and gene-wide association scanning (GeneWAS) of older human participants in the US Health and Retirement Study (HRS). In a screen for mutations that activate oxidative stress responses specifically in the muscle of C. elegans, we identified 96 independent genetic mutants that exclusively harbored loss-of-function alleles of alh-6. Each of these genetic mutations mapped to the ALH-6 polypeptide and led to the age-dependent loss of muscle health. Intriguingly, genetic variants in ALDH4A1 are associated with age-related muscle function in humans. Taken together, our work uncovers mitochondrial alh-6/ALDH4A1 as a critical component of normal muscle aging across species and a predictive biomarker for muscle health over the lifespan.

Editor's evaluation

Age-associated muscle decline is a pervasive aspect of human aging biology. Here the authors provide substantial genetic evidence implicating C. elegans mitochondrial enzyme ALH-6 (P5C dehydrogenase) in muscle oxidative stress and in the maintenance of muscle functionality late into life. The authors also analyzed databases on human aging to identify a linkage of human ALH-6 homolog ALDH4A1 and indicators of human muscle function in aging, which suggests conserved function of ALH-6/ALDH4A1 in aging and the potential for use of ALDH4A1 genetic data as a predictor for old age muscle health.

https://doi.org/10.7554/eLife.74308.sa0

eLife digest

Ageing is inevitable, but what makes one person ‘age well’ and another decline more quickly remains largely unknown. While many aspects of ageing are clearly linked to genetics, the specific genes involved often remain unidentified.

Sarcopenia is an age-related condition affecting the muscles. It involves a gradual loss of muscle mass that becomes faster with age, and is associated with loss of mobility, decreased quality of life, and increased risk of death. Around half of all people aged 80 and over suffer from sarcopenia. Several lifestyle factors, especially poor diet and lack of exercise, are associated with the condition, but genetics is also involved: the condition accelerates more quickly in some people than others, and even fit, physically active individuals can be affected.

To study the genetics of conditions like sarcopenia, researchers often use animals like flies or worms, which have short generation times but share genetic similarities with humans. For example, the worm Caenorhabditis elegans has equivalents of several human muscle genes, including the gene alh-6. In worms, alh-6 is important for maintaining energy supply to the muscles, and mutating it not only leads to muscle damage but also to premature ageing. Given this insight, Villa, Stuhr, Yen et al. wanted to determine if variation in the human version of alh-6, ALDH4A1, also contributes to individual differences in muscle ageing and decline in humans.

Evaluating variation in this gene required a large amount of genetic data from older adults. These were taken from a continuous study that follows >35,000 older adults. Importantly, the study collects not only information on gene sequences but also measures of muscle health and performance over time for each individual. Analysis of these genetic data revealed specific small variations in the DNA of ALDH4A1, all of which associated with reduced muscle health.

Follow-up experiments in worms used genetic engineering techniques to test how variation in the worm alh-6 gene could influence age-related health. The resulting mutant worms developed muscle problems much earlier than their normal counterparts, supporting the role of alh-6/ALDH4A1 in determining muscle health across the lifespan of both worms and humans.

These results have identified a key influencer of muscle health during ageing in worms, and emphasize the importance of validating effects of genetic variation among humans during this process. Villa, Stuhr, Yen et al. hope that this study will help researchers find more genetic ‘markers’ of muscle health, and ultimately allow us to predict an individual’s risk of sarcopenia based on their genetic make-up.

Introduction

Sarcopenia is defined as the age-related degeneration of skeletal muscle mass and is characterized by a progressive decline in strength and performance (Santilli et al., 2014). This syndrome is prevalent in older adults and has been estimated by large-scale studies to afflict 5–13% of people aged 60–70 years, rising to 50% of those aged 80 and above (von Haehling et al., 2010). Loss of muscle function is associated with a decline in quality of life and higher mortality and morbidity rates due to increased chance of falls and fractures (Tsekoura et al., 2017; Ahmadpanah et al., 2015). Sarcopenia is linked to risk factors, such as a sedentary lifestyle, lack of exercise, and a diet deficient in protein and micronutrients (Tsekoura et al., 2017). However, several aspects of the molecular basis of the age-dependent decline in muscle health remain unknown.

Although age-related muscle function is clearly linked to frailty (Dent et al., 2019), different etiologies of clinical weakness have previously led to discrepancies in the definitions of sarcopenia (Batsis et al., 2013; Cruz-Jentoft et al., 2010). Furthermore, human genetic loci that influence age-related functions have traditionally been difficult to identify and characterize because of the methodological difficulties of longitudinal assessment; sarcopenia, for example, begins in the fourth decade of life (Sayer et al., 2008). The US Health and Retirement Study (HRS) is a nationally representative survey of adults aged 50 years and older and has proven to be an invaluable dataset for investigating the normal aging processes (Fisher and Ryan, 2018; Sonnega et al., 2014; Juster and Suzman, 1995; HRS Health and Retirement Study, 2021). New skeletal muscle cutpoints for identifying elevated risk for physical disability in older adults (Janssen et al., 2004) have enabled cross-sectional analyses to identify cohorts of HRS participants with age-related decline in muscle function (e.g., grip strength, basic activities of daily living [ADL], and instrumental ADL [IADL]) (MacEwan et al., 2018).

In Caenorhabditis elegans, mutation of the conserved proline catabolic gene alh-6 (88% identity to ALDH4A1 in humans) leads to premature aging and impaired muscle mitochondrial function (Pang and Curran, 2014). Proline catabolism functions in a two-step reaction, beginning with the conversion of proline to 1-pyrroline-5-carboxylate (P5C) which is catalyzed by proline dehydrogenase, PRDH-1; subsequently, P5C dehydrogenase, ALH-6, catalyzes the conversion of P5C to glutamate. alh-6 expression is observed in body wall muscle, pharyngeal muscle, and neurons (Pang and Curran, 2014), and when alh-6 is mutated, the activation of gst-4p::gfp oxidative stress reporter is predominantly observed in the body wall muscle tissue and only in postreproductive adults (Tang and Pang, 2016). alh-6 mutants have increased levels of P5C; the accumulation of this toxic metabolic intermediate leads to an increase in reactive oxygen species, including H2O2 (Pang and Curran, 2014), which then activates cytoprotective responses, impairs mitochondrial activity, and drives cellular dysfunction (Pang and Curran, 2014; Pang et al., 2014; Yen et al., 2020; Yen and Curran, 2021).

Several studies have linked disease states that drive morbidity and mortality with genomic variation through genome-wide association studies (Timmers et al., 2019; Tam et al., 2019; Manolio et al., 2009), and nonhuman models have been utilized to test how single genes can drive phenotypes that mimic the disease state in humans (Ke et al., 2021; Song et al., 2020; Teumer et al., 2019). However, biological testing of genetic association studies for the normal human aging process remains underrepresented. The recent expansion of the HRS data to include genotyping of participants has enabled scans to test associations between normal aging phenotypes and variation across genes (Liu et al., 2019). In this study, we combine the facile genetic tractability of C. elegans with the rich genetic and phenotypic data available in the HRS to reveal genetic variation in alh-6/ALDH4A1 as a predictive indicator of muscle-related functionality in later life.

Results and discussion

While the strong induction of oxidative stress reporter activity in the musculature was linked to mutation of the mitochondrial P5C dehydrogenase gene, and not observed in other genetic mutants (Pang et al., 2014; Yen et al., 2020; Yen and Curran, 2021; Lo et al., 2017; Spatola et al., 2019; Lynn et al., 2015; Nhan et al., 2019), the breadth of genetic mutations that could induce stress responses in muscle was unknown. In order to identify additional genetic components of this age-related muscle phenotype, we performed an ethyl methanesulfonate mutagenesis screen selecting for the same age-dependent activation of the gst-4p::gfp reporter in the musculature. We screened the progeny of ~4000 mutagenized F1 animals and isolated 96 mutant animals with age-dependent activation of the gst-4p::gfp reporter restricted to the body wall musculature, which phenocopies the alh-6(lax105) mutant (Figure 1a, Figure 1—figure supplement 1). To rule out additional loss-of-function alleles of alh-6 we performed genetic complementation (cis–trans) testing with our established alh-6(lax105) allele; surprisingly, all 96 new mutations failed to complement and as such were all loss-of-function alleles of alh-6.

Figure 1
Mutation of alh-6 uniquely causes age-dependent activation of the gst-4p::gfp oxidative stress reporter in muscle.

(a) Schematic representation of genetic screen for mutants that phenocopy alh-6(lax105). (b) Schematic representation of the ALH-6 protein with the molecular identity of mutants isolated and sequenced annotated. Alleles that were selected for additional functional tests of muscle function (Figure 4) are highlighted in red and the location of the canonical alh-6(lax105) allele is highlighted in green. These alleles represent all the sequenced mutations in alh-6 that were isolated from the ethyl methanesulfonate (EMS) screen. (c) Quantification of stress reporter activation in the muscle in the new alh-6 mutant alleles, as measured by the intensity of GFP fluorescence from the oxidative stress reporter gst-4p::gfp (see Figure 1—figure supplement 1 for representative images). t-Test relative to gst-4p::gfp reporter animals (control); *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001.

Figure 1—source data 1

Structure-function predictions of ALH-6 mutant proteins from computational modeling.

The impact of each missense mutation as predicted by Phyre and Missense3D (Kelley et al., 2015; Ittisoponpisan et al., 2019) and the corresponding phenotypic observations made in C. elegans.

https://cdn.elifesciences.org/articles/74308/elife-74308-fig1-data1-v2.xlsx

To catalog these mutations, we began sequencing the alh-6 genomic locus in each of the mutants isolated. After sequencing approximately half of the mutants, we noted the repeated independent isolation of several distinct molecular lesions in alh-6, including E78Stop (lax903, lax918, lax930), E447K (lax916, lax920, lax929, lax934), and G527R (lax914, lax932, lax933, lax947) (Figure 1b). The lack of diversity in genes uncovered and the independent isolation of identical alleles multiple times from this unbiased screen strongly suggest genetic saturation and specificity of this response to animals with defective mitochondrial proline catabolism. In addition, several mutations mapped to discrete regions of the linear ALH-6 polypeptide, including G152/K153, G199/E201/G202, and E418/G419, which may define critical domains in the folded protein. Imaging at day 3 of adulthood revealed that each mutant was phenotypically identical to lax105 in the activation of the gst-4p::gfp stress reporter in the body wall muscle (Figure 1—figure supplement 1), but with varying intensity (Figure 1c). We next mapped the location of each amino acid mutated in our panel of alh-6 mutants on the structure of the ALH-6 protein (Figure 1—figure supplement 2; Kelley et al., 2015; Ittisoponpisan et al., 2019), which enabled a prediction model of the impact of each missense mutation (Figure 1—source data 1). Most missense mutations were predicted to maintain the overall structure (no structural damage), which suggests the associated phenotypes derive from a range of reduction-of-function mutations. Since the degree of mitochondrial dysfunction can influence both beneficial and detrimental physiological outcomes (Wang and Hekimi, 2015; Shields et al., 2021), this collection of mutants provides a model to understand the complex role mitochondria play in organismal health over the lifespan.
Taken together, these data reveal that the age-dependent, muscle-restricted activation of the gst-4p::gfp reporter is driven specifically by mutations in mitochondrial alh-6.

Based on the striking specificity of the muscle-restricted and age-dependent activation of the gst-4p::gfp stress reporter in C. elegans harboring mutations in alh-6 (Pang and Curran, 2014; Yen et al., 2020), combined with the high degree of conservation in mitochondrial metabolism pathways across metazoans (Pang et al., 2014), we reasoned that ALDH4A1 genetic variants would associate with phenotypes indexing normative, longitudinal changes in human aging-related functionality, specifically those that involve usage of different muscle groups. To test this hypothesis, we performed gene-wide association scanning (GeneWAS), adjusting for relevant covariates and indicators of population stratification, in the US HRS, a nationally representative longitudinal study of >36,000 adults over age 50 in the United States (Sonnega et al., 2014; Weir, 2013). HRS collects biological and genetic samples on subsets of participants and assesses physical and psychosocial measures of all study participants in older adulthood, including multiple measures of muscle-related functionality (Figure 2—source data 1). The human phenotypes represented in the HRS index normative changes in aging-related physiological ability. There were 70 single nucleotide polymorphism (SNP) markers within the ALDH4A1 region on the Illumina Omni array, representing 273 human SNPs in the gene. While measures like grip strength are more commonly used to assess muscle health, our inclusion of another phenotype captures changes in complex physiological processes that are influenced by the musculature and also by other systems (e.g., metrics of walking speed can also be influenced by neurological factors). As such, future work to assess the role of neuronal alh-6/ALDH4A1 will be important. Nevertheless, the observed decline in muscle-related measures with age is relevant.
Overall, two associations between variants within ALDH4A1 and two phenotypes were detected and surpassed the respective empirical p value thresholds, determined by permutation testing (Tables 1 and 2). These demonstrate a pattern of association between ALDH4A1 variation and two independent phenotypes (Table 1). Because each of the SNPs within the ALDH4A1 region represents a tag, or marker SNP for human variation within the locus, this GeneWAS was unable to directly identify a causal variant; however, the indexing of variations within the same gene suggests conserved associations within this aging human cohort.
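
The empirical p value thresholds referred to above are derived by permutation testing. The following is a minimal sketch of one common way to obtain such a threshold (the minimum-p approach), and is not necessarily the procedure used for the HRS scans. All data are simulated, the function names are hypothetical, covariates are omitted, and p values rely on a large-sample normal approximation.

```python
import math
import random


def slope_p_value(x, y):
    """Two-sided p value for the slope of y ~ x (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 1.0  # a monomorphic SNP carries no information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return 0.0  # perfect fit
    return math.erfc(abs(beta / se) / math.sqrt(2))


def empirical_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=1):
    """Permute the phenotype, record the smallest p value across all SNPs
    in each permutation, and return the alpha quantile of those minima."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        min_ps.append(min(slope_p_value(g, shuffled) for g in genotypes))
    min_ps.sort()
    return min_ps[max(0, math.floor(alpha * n_perm) - 1)]


# Simulated example: 5 SNPs coded as minor-allele counts (0, 1, or 2).
rng = random.Random(0)
n_people, n_snps = 300, 5
genotypes = [
    [sum(rng.random() < 0.2 for _ in range(2)) for _ in range(n_people)]
    for _ in range(n_snps)
]
phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_people)]

threshold = empirical_threshold(genotypes, phenotype)
print(f"empirical p value threshold: {threshold:.4f}")
```

Because the phenotype is shuffled as a whole, the correlation structure among SNPs is preserved, so the resulting threshold is typically less stringent than a Bonferroni correction when markers are in linkage disequilibrium.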

Table 1
Top SNPs associated with specific phenotypes.
| Phenotype | SNP name | Location | Ref. allele | Minor allele | Freq.* | Scan N | Effect size | p value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grip strength decline | rs28665699 | 19200185 | A | A | 0.014 | 5228 | −0.045 | 9.1E−04 |
| Gait speed decline | rs77608580 | 19196968 | A | G | 0.017 | 3319 | 0.052 | 2.5E−03 |

* Freq = frequency of minor allele as reported by 1000 Genomes.
N = sample size of scan for the phenotype and SNP.
Effect sizes provided are standardized regression coefficients.

Table 2
SNPs remaining after filtering for minor allele frequency and pruning based on linkage disequilibrium.
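| SNP | Location | Reference allele | Minor allele frequency |
| --- | --- | --- | --- |
| rs28652778 | 19194995 | A | 0.20 |
| rs28405179 | 19195143 | A | 0.03 |
| rs111289603 | 19195492 | G | 0.03 |
| kgp2515954 | 19195951 | A | 0.02 |
| rs77608580 | 19196968 | A | 0.04 |
| rs9699485 | 19197237 | G | 0.02 |
| rs3935824 | 19197849 | G | 0.18 |
| rs28665699 | 19200185 | A | 0.03 |
| rs28493067 | 19203333 | A | 0.35 |
| rs6426814 | 19204173 | A | 0.19 |
| rs35285457 | 19205258 | A | 0.14 |
| rs7365978 | 19206020 | A | 0.21 |
| rs28508407 | 19210018 | A | 0.27 |
| rs113232075 | 19211163 | G | 0.02 |
| rs9426718 | 19213022 | A | 0.02 |
| rs4911985 | 19215440 | G | 0.22 |
| rs28582076 | 19217295 | G | 0.02 |
| rs11484743 | 19219987 | C | 0.02 |
| rs17492518 | 19221621 | A | 0.04 |
| rs4912044 | 19230263 | A | 0.18 |
| rs79251057 | 19231130 | A | 0.04 |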
rs7925105719231130A0.04

With this study design, we did not intend to find a single genetic variant that would explain functionality of a specific muscle group. Specifically, we chose to include common measures of physical functioning that index aging-related decline. We calculated phenotypes for decline in functionality over time because they are more robust for testing genetic associations, represent normative aging processes in human samples compared to single-time point assessments, and index broader human functionality (Figure 2—source data 1). These results indicate that variants within the ALDH4A1 locus affect an individual’s performance on basic ambulatory movements such as speed of walking short distances or ability to exert hand grip strength.

rs77608580 was significantly associated with change in gait speed over time (Figure 2a). Specifically, with each additional A allele, there was an average increase in gait speed of 0.052 m per second per year compared to other same aged individuals without the allele (p value = 0.0025, surpassing the empirical p value threshold of 0.006). This was assessed among N = 3319 older individuals with a mean age of 73.0 years (standard deviation [SD] = 5.9) and mean gait speed of 0.80 m per second (SD = 0.25), or 2.6 ft/s (Figure 3a).

Figure 2
ALDH4A1 variants associate with human age-related phenotypes for change in muscle function.

Plot of association between variants in the ALDH4A1 gene and normative aging-related muscle decline in (a) gait speed and (b) grip strength in the US Health and Retirement Study (HRS). The x-axis shows the beta estimate for the effect of each SNP, represented by a dot, on the phenotype. The y-axis shows the log of the p value for the association between the SNP and the phenotype. SNPs that surpassed the empirical p value threshold, shown as a red line, for decline in gait speed (empirical p value = 0.006) and grip strength (empirical p value = 0.0019) are depicted as red dots. SNPs that surpassed a suggestive threshold (p value = 0.009 for gait speed) are depicted as purple dots.

Figure 2—source data 1

Details for phenotypes calculated from the US Health and Retirement Study.

https://cdn.elifesciences.org/articles/74308/elife-74308-fig2-data1-v2.docx
Figure 3
Effects of ALDH4A1 variation on phenotypes representing association with change in aging-related function in a normative, population-based sample of older adults.

(a) Change in gait speed over 10 years. Effect of SNP rs77608580 on aging-related changes in gait speed (b = 0.052, p = 0.0025). Over the span of one decade, on average, those with one or two effect alleles will have faster gait speeds with a difference of 0.52 and 1.04 m/s, respectively, compared to those without an effect allele. (b) Decline in grip strength over 10 years. Variation in ALDH4A1 (SNP rs28665699) is inversely associated with decline in aging-related grip strength (b = −0.045, p = 0.0009). Individuals with one or two effect alleles have slower progression of weakened grip strength over 10 years by 0.5 and 1.0 kg, respectively, compared to the same aged individuals without the effect allele.

Measures of muscle health, such as grip strength, are effective biomarkers of overall health in older populations (Carson, 2018; McGrath et al., 2020). rs28665699 was significantly associated with an increase in grip strength over time (Figure 2b); with each additional A allele, there was an average increase in grip strength by 0.045 kg weight per year while holding all other characteristics constant (age, sex, and use of the dominant hand for gripping; p value = 0.0009, surpassing the empirical p value threshold of 0.0019). This was assessed among N = 5228 older individuals with a mean age of 68.9 years (SD = 10.4), mean grip strength of 30.21 kg (SD = 11.1), and average level of decline in grip strength at 2.31 kg per year (SD = 5.37). If calculated as change over a 10-year period, those with one or two effect alleles would have stronger grip by 0.5 and 1.0 kg compared to those without an effect allele, respectively. The allele therefore is associated with a slower rate of decline in grip strength over a decade of age (Figure 3b).
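
The 10-year figures quoted for both SNPs follow from simple arithmetic under a purely additive model. The sketch below takes the per-allele, per-year effects exactly as stated in the text and scales them by allele count and number of years; the function name is hypothetical and the additive, linear-in-time assumption is a simplification made for illustration only.

```python
def projected_difference(per_allele_per_year, n_alleles, years=10):
    """Difference versus non-carriers under a purely additive model."""
    return per_allele_per_year * n_alleles * years


GAIT_EFFECT = 0.052  # m/s per allele per year, rs77608580 (value from the text)
GRIP_EFFECT = 0.045  # kg per allele per year, rs28665699 (value from the text)

for alleles in (1, 2):
    gait = projected_difference(GAIT_EFFECT, alleles)
    grip = projected_difference(GRIP_EFFECT, alleles)
    print(f"{alleles} effect allele(s): gait {gait:.2f} m/s, grip {grip:.2f} kg over 10 years")
```

For gait speed this gives 0.52 and 1.04 m/s, and for grip strength 0.45 and 0.90 kg, which the text reports as roughly 0.5 and 1.0 kg.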

These effects are examples in which variation in the gene contributes to phenotypes that represent different age-related changes in functionality; overall, we find small effects associated with each phenotype, with possible pleiotropic effects and contributions from environmental or behavioral factors. It is not known if any one of the identified ALDH4A1 SNPs is a causal variant or if they mark a different variant within the ALDH4A1 gene that was not represented on the HRS array. Regardless, these results collectively support a true association between ALDH4A1 and age-related physical function.

We tested replication of the top two SNPs from the GeneWAS across ethnic subsamples in the HRS by calculating a common effect size across the samples. We did this by completing a fixed effects and random effects meta-analysis using PLINK software (Rentería et al., 2013; Purcell et al., 2007; Table 3). For one SNP, the minor allele frequency in the African ancestry sample was below 1% and thus the subsample was excluded from the meta-analysis. Cochran’s Q statistic (Q), an indicator of variance across sample effect sizes, and the heterogeneity index (I), which quantifies dispersion across samples, indicate that the random effects analysis fit the data better for gait speed decline; thus, we focus on results from random effects to account for differences in effect sizes by sample (e.g., the I index indicates that 64.95% of the observed variance between samples is due to differences in effect sizes between samples). Given these results, the common effect size calculated for grip strength decline still suggests significance of the association with SNPs in the gene, whereas the effect for gait speed decline remained for the European ancestry cohort only and not across subsamples. Genetic data obtained from similarly large international cohort studies of adults aged 50 and older (e.g., English Longitudinal Study of Ageing [ELSA; https://www.elsa-project.ac.uk]; Irish Longitudinal Study on Ageing [TILDA; https://tilda.tcd.ie/]; cohorts in the Survey of Health, Ageing and Retirement in Europe [SHARE; https://g2aging.org/overviews?study=share-aut]; Northern Ireland Cohort for the Longitudinal Study of Ageing [NICOLA; https://www.qub.ac.uk/sites/NICOLA/AboutNICOLA/]; and others) will enable additional replication and cross comparisons.
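
To make the distinction between the two pooling models concrete, the sketch below implements textbook inverse-variance fixed-effect pooling and DerSimonian-Laird random-effects pooling. It is an illustrative re-implementation, not PLINK's code, and the per-sample effect sizes and standard errors are hypothetical because the per-subsample estimates are not given in the text.

```python
import math


def meta_analysis(betas, ses):
    """Fixed-effect (inverse variance) and random-effects
    (DerSimonian-Laird) pooling of per-sample effect sizes."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-sample variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    sw_star = sum(w_star)
    pooled_random = sum(wi * b for wi, b in zip(w_star, betas)) / sw_star
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "fixed": fixed,
        "random": pooled_random,
        "se_fixed": math.sqrt(1.0 / sw),
        "se_random": math.sqrt(1.0 / sw_star),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical inputs: identical effects versus opposing effects.
homogeneous = meta_analysis([0.05, 0.05], [0.01, 0.02])
heterogeneous = meta_analysis([0.1, -0.1], [0.1, 0.1])
print(homogeneous)
print(heterogeneous)
```

When the samples agree, the heterogeneity index is 0 and the two models return the same estimate, which is the pattern seen for grip strength decline in Table 3. When the samples disagree, the random-effects standard error widens, which is how a result can be significant under the fixed-effect model but not under the random-effects model.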

Table 3
Replication across ethnic subsamples in the HRS.
| Phenotype | SNP name | Location | Minor allele | European ancestry (N)* | African ancestry (N)* | Hispanic ancestry (N)* | Fixed effect p value | Random effect p value | Fixed effect: OR or beta | Random effect: OR or beta | Q§ | I |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grip strength decline | rs28665699 | 19200185 | A | 5228 | excluded | 409 | 0.00150 | 0.00150 | −0.0418 | −0.0418 | 0.3341 | 0.00 |
| Gait speed decline | rs77608580 | 19196968 | A | 3319 | 381 | 237 | 0.00775 | 0.72900 | 0.0424 | 0.0146 | 0.0577 | 64.95 |

* N: sample size by group included in the meta-analysis.
Fixed effect: p value and effect size.
Random effect: p value and effect size.
§ Cochran’s Q statistic: indicator of variance across sample effect sizes.
I: heterogeneity index to quantify dispersion across samples.
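
The Q and I columns of Table 3 can be related to each other. The sketch below inverts the usual definition I = 100 × (Q − df)/Q to recover the heterogeneity statistic for the gait speed row and then converts it to a p value. It assumes that three subsamples contributed to that row (df = 2) and that the Q column reports the p value of the heterogeneity test rather than the raw statistic, as PLINK's meta-analysis output does; both are assumptions, and the function names are illustrative.

```python
import math


def q_from_i_squared(i2_percent, n_samples):
    """Invert I = 100 * (Q - df) / Q to recover the Q statistic."""
    df = n_samples - 1
    return df / (1.0 - i2_percent / 100.0)


def chi_square_sf(q, df):
    """Upper-tail probability of a chi-square variate (df of 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if df == 2:
        return math.exp(-q / 2.0)
    raise ValueError("only df = 1 or df = 2 is handled in this sketch")


q_gait = q_from_i_squared(64.95, n_samples=3)
p_gait = chi_square_sf(q_gait, df=2)
print(f"Q statistic = {q_gait:.3f}, heterogeneity p value = {p_gait:.4f}")
```

Under these assumptions the recovered p value agrees with the 0.0577 listed in the Q column for gait speed decline, which supports reading that column as a p value.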

To test how genetic variation in P5C dehydrogenase can influence age-related muscle function, we returned to our collection of C. elegans strains harboring mutations in alh-6. We measured individual animal movement speed as a function of muscle health with age (Roussel et al., 2014). Only animals harboring the ALH-6 mutations G152R(lax940), K153T(lax924), S523F(lax928), and G527R(lax933) displayed a significant loss of movement speed at larval stage 4, just prior to adulthood (Figure 4a). However, with the exception of S230F(lax907) and Y427N(lax918), all mutants tested displayed a significant reduction in movement speed at day 3 of adulthood (Figure 4b). As a secondary measure of muscle function in our panel of alh-6 mutants, we measured changes in swimming performance (Figure 4—figure supplement 1), which has documented effects on animal health and longevity (Laranjeiro et al., 2019). It is established that swimming is a more energetically demanding activity than crawling on a plate (Laranjeiro et al., 2017). Intriguingly, the effect of the canonical alh-6(lax105) mutation on swimming was less pronounced than that observed for crawling speed, and our panel of alh-6 mutants displayed differences in developmental and adult swimming performance. Taken together, these data support the age-specific acceleration of muscle decline in mitochondrial proline catabolism mutants, which is conserved from nematodes to humans.

Figure 4
alh-6 mutations accelerate loss of muscle function.

WormLab software analysis of adjusted center point speed of individual animals of the given genotypes at the L4 stage (a) or day 3 of adulthood (b). Brown–Forsythe and Welch analysis of variance (ANOVA) test with Dunnett’s T3 multiple comparisons test, with individual variances computed for each comparison. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001.
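
For readers unfamiliar with the Welch analysis of variance named in the legend, the sketch below computes the Welch F statistic and its degrees of freedom, which do not assume equal variances across genotypes. The speed values are hypothetical, the function name is illustrative, and neither the Brown–Forsythe variant, the p value, nor the Dunnett's T3 post hoc comparisons are implemented here.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's F statistic with numerator and denominator degrees of freedom."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Each group is weighted by the inverse of its squared standard error.
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    total_w = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / total_w
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical center point speeds (arbitrary units) for three genotypes.
control = [210.0, 195.0, 220.0, 205.0, 215.0]
mutant_a = [150.0, 170.0, 120.0, 160.0, 140.0]
mutant_b = [200.0, 180.0, 190.0, 230.0, 160.0]

f_stat, df1, df2 = welch_anova([control, mutant_a, mutant_b])
print(f"Welch F = {f_stat:.2f} on {df1} and {df2:.1f} degrees of freedom")
```

With only two groups, this statistic reduces to the square of Welch's t, which is a convenient check on the implementation.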

The traits analyzed in the HRS came from a population-based study and were not assessed to allow us to identify physiological degeneration in specific muscles, but rather to index and track overall age-related decline in functionality over time. It is widely accepted that these aging-related traits are highly polygenic. Thus, it is not expected that a single variant within a gene would be identified to drive these phenotypic results in humans. It is likely that small effects of multiple SNPs across multiple genes, including within the same gene, and with nonadditive effects (e.g., gene-by-environment effects, Yen and Curran, 2016), contribute to the resulting phenotypes. With this gene-wide association scanning approach, it is only possible to identify variants associated with overall effects. Without identifying a causal SNP, we can only aggregate available data to suggest what contributes to a biological pathway. For example, through exploitation of the publicly available Genotype-Tissue Expression (GTEx) database (Battle et al., 2017), we found that one of the tag SNPs in ALDH4A1, rs77608580, was significantly associated with differential ALDH4A1 expression levels through the association with an expression quantitative trait locus in whole blood (Figure 3—figure supplement 1). Further experimental studies to reveal the downstream effect(s) of altered gene expression and/or specific muscle functionality phenotyping are required to address mechanistic questions pertaining to unique muscle groups and muscle-specific activities.

With the goal of better understanding the relationships between the phenotypes and potential disease, or system functionality, investigating more than one phenotype is an important strength (Dey et al., 2017; Pendergrass et al., 2015). Several studies have now demonstrated the biological utility of invoking multiple phenotypes for genetic association scans (Hall et al., 2014; Pendergrass et al., 2013). Invoking more than one phenotype and multiple SNPs for genetic testing, however, brings forth new challenges: the multiple hypothesis-testing burden, or type 1 error, must be considered without missing underlying associations because of overly stringent significance criteria that typically assume independent genetic variants and phenotypes. Prior studies that have used a multiple phenotype approach to investigate upwards of a thousand phenotypes do so with a completely agnostic design, used for exploratory hypothesis generation (Hall et al., 2014; Pendergrass et al., 2013). In contrast, with the current study, we sought to use the selected phenotypes in human data to cross-validate findings, with hypotheses generated from a model organism. Thus, the current study uses a more targeted, hypothesis-driven approach, limiting the study to two selected phenotypes and variants within one identified genomic region. With this design, we retain the potential for detecting possible areas of genetic pleiotropy (i.e., genetic variation acting on more than one phenotype) and possibilities for identifying mechanistic pathways leading to normative aging-related muscle functioning and decline.

Existing biomarkers that can accurately predict muscle health later in life are extremely scarce due to limited data in human aging and an incomplete understanding of the molecular basis of sarcopenia. Moreover, facile approaches to experimentally validate the hypotheses generated from deep human genetic variation datasets are scarce. Understanding the diversity of genetic variation underlying sarcopenia, as well as the corresponding phenotypic outcomes, will be critical for providing accurate risk assessments for family planning and genetic counseling of older adults. We have established a powerful experimental platform that synergistically utilizes the data-rich resources of the US Health and Retirement Study with the genetically tractable and methodologically rich C. elegans model. We anticipate this new research paradigm will be a formidable tool for collaboration between computational and bench scientists. Although the etiology of human disease is complex and multifactorial, we have used a combination of classical C. elegans genetics and human genetic association studies to define genetic variation in alh-6/ALDH4A1 as a new biomarker of age-related muscle health in humans.

Materials and methods

C. elegans strains and maintenance

C. elegans were cultured using standard techniques at 20°C (Brenner, 1974). The following strains were used: wild-type (WT) N2 Bristol, SPC321 [alh-6(lax105)], CL2166 [gst-4p::gfp], SPC223 [alh-6(lax105);gst-4p::gfp], SPC542 [alh-6(lax917);gst-4p::gfp], SPC531 [alh-6(lax906);gst-4p::gfp], SPC528 [alh-6(lax903);gst-4p::gfp], SPC552 [alh-6(lax927);gst-4p::gfp], SPC561 [alh-6(lax937);gst-4p::gfp], SPC564 [alh-6(lax940);gst-4p::gfp], SPC549 [alh-6(lax924);gst-4p::gfp], SPC566 [alh-6(lax945);gst-4p::gfp], SPC540 [alh-6(lax915);gst-4p::gfp], SPC562 [alh-6(lax938);gst-4p::gfp], SPC563 [alh-6(lax939);gst-4p::gfp], SPC546 [alh-6(lax921);gst-4p::gfp], SPC527 [alh-6(lax902);gst-4p::gfp], SPC529 [alh-6(lax904);gst-4p::gfp], SPC536 [alh-6(lax911);gst-4p::gfp], SPC534 [alh-6(lax909);gst-4p::gfp], SPC559 [alh-6(lax935);gst-4p::gfp], SPC532 [alh-6(lax907);gst-4p::gfp], SPC569 [alh-6(lax993);gst-4p::gfp], SPC544 [alh-6(lax919);gst-4p::gfp], SPC551 [alh-6(lax926);gst-4p::gfp], SPC530 [alh-6(lax905);gst-4p::gfp], SPC533 [alh-6(lax908);gst-4p::gfp], SPC548 [alh-6(lax923);gst-4p::gfp], SPC550 [alh-6(lax925);gst-4p::gfp], SPC565 [alh-6(lax941);gst-4p::gfp], SPC538 [alh-6(lax913);gst-4p::gfp], SPC543 [alh-6(lax918);gst-4p::gfp], SPC541 [alh-6(lax916);gst-4p::gfp], SPC545 [alh-6(lax920);gst-4p::gfp], SPC554 [alh-6(lax929);gst-4p::gfp], SPC558 [alh-6(lax934);gst-4p::gfp], SPC526 [alh-6(lax901);gst-4p::gfp], SPC568 [alh-6(lax992);gst-4p::gfp], SPC535 [alh-6(lax910);gst-4p::gfp], SPC553 [alh-6(lax928);gst-4p::gfp], SPC539 [alh-6(lax914);gst-4p::gfp], SPC556 [alh-6(lax932);gst-4p::gfp], SPC557 [alh-6(lax933);gst-4p::gfp], SPC567 [alh-6(lax947);gst-4p::gfp].

Double mutants were generated by standard genetic techniques. E. coli strains used were as follows: OP50/E. coli B for standard growth. All genetic mutants were backcrossed at least 4× prior to phenotypic analyses.

Genetic complementation (cis–trans) testing

Hermaphrodites from each isolated mutant that phenocopied the alh-6(lax105)-like, age-related activation of the gst-4p::gfp reporter in the musculature were mated to SPC223 [alh-6(lax105);gst-4p::gfp] males. F1 progeny were screened at day 3 of adulthood for the alh-6(lax105)-like phenotype, which indicates a failure of the alh-6(lax105) allele to complement the mutation in the new mutant strain; thus the new mutant harbors a loss-of-function allele in alh-6.

DNA sequencing of alh-6 genetic mutants

Approximately 200 adult worms were collected and washed with M9. Animals were homogenized and genomic DNA was extracted using the Zymo Research Quick-DNA Miniprep kit (Cat. #D3025). The entire alh-6 genomic sequence (ATG to stop) was amplified by PCR and cloned into a linearized pMiniT 2.0 vector (NEB PCR Cloning Kit, Cat. #E1202S). Plasmid DNA was purified using the Zymo Research Zyppy Plasmid Miniprep kit (Cat. #D4019) and sequenced.

Microscopy

Zeiss Axio Imager and ZEN software were used to acquire all images used in this study. For GFP reporter strains, worms were mounted in M9 with 10 mM levamisole and imaged with DIC and GFP filters. Worm areas were measured in ImageJ software (National Institutes of Health, Bethesda, MD) using the polygon tool.

HRS human samples

The US HRS (Sonnega et al., 2014; Juster and Suzman, 1995) is a nationally representative, longitudinal sample of adults aged 50 years and older, who have been interviewed every 2 years, beginning in 1992. Because the HRS is nationally representative, includes households across the country, and now has a surveyed sample of over 36,000 participants, it is often used to calculate national prevalence rates for specific conditions in older adults, including physical and mental health outcomes, cognitive outcomes, and financial and social indicators.

The sample for the current study comprises a subset of the HRS for which genetic data were collected, as described below. To reduce potential issues with population stratification, the GeneWAS in this study was limited to individuals of primarily European ancestry. The final sample was N = 3319, with the proportion of women at 58.5%.

Genotyping data

For HRS, genotype data were accessed from the National Center for Biotechnology Information Genotypes and Phenotypes Database (dbGaP; HRS Health and Retirement Study, 2021). DNA samples from HRS participants were collected in two waves. In 2006, the first wave was collected from buccal swabs using the Qiagen Autopure method (Qiagen, Valencia, CA). In 2008, the second wave was collected using Oragene saliva kits and extraction method. Both waves were genotyped by the NIH Center for Inherited Disease Research (CIDR; Johns Hopkins University) using the HumanOmni2.5 arrays from Illumina (San Diego, CA). Raw data from both waves were clustered and called together. HRS followed standard quality control recommendations to exclude samples and markers with questionable data, including CIDR technical filters (Laurie et al., 2010), and removed SNPs that were duplicates or had missing call rates ≥2%, >4 discordant calls, >1 Mendelian error, deviations from Hardy–Weinberg equilibrium (p value < 10⁻⁴ in European samples), or sex differences in allelic frequency ≥0.2. Further detail is provided in HRS documentation (Weir, 2013). Applying these criteria to the gene region on chromosome 1 (NC_000001.10: 19,194,787–19,232,430) yielded data on 70 SNPs within the ALDH4A1 region that are present on the Illumina array and represent 273 human SNPs in the gene. With the goal of evaluating whether representative marker SNPs across the gene are associated with the phenotypes of interest, we implemented a pruning procedure that sequentially scans SNPs in linkage disequilibrium and thins them to a subset of more independent SNPs, based on a given threshold of correlation between SNPs and between linear combinations of SNPs. To achieve this, SNPs were first filtered to retain the 53 SNPs that had a minor allele frequency of 0.01 or greater. We then pruned by recursively removing SNPs within a sliding window of 25 consecutive SNPs, shifting the window forward by 5 SNPs at each step, with the variance inflation factor threshold set at 2. This yielded 21 SNPs for consideration (Table 2).
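The window-based pruning described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the procedure actually run for this study: the function name `vif_prune`, the input layout (individuals in rows, SNPs in columns, coded as minor-allele counts), and the small ridge term added for numerical stability are all assumptions. Within each window, the variance inflation factor (VIF) of each SNP is read from the diagonal of the inverse correlation matrix, and the SNP with the largest VIF is dropped until every remaining VIF is at or below the threshold.

```python
import numpy as np

def vif_prune(genotypes, window=25, step=5, vif_threshold=2.0, ridge=1e-6):
    """Return sorted column indices of SNPs kept after VIF-based pruning.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    Assumes monomorphic SNPs were already removed (e.g., by a MAF filter).
    """
    genotypes = np.asarray(genotypes, dtype=float)
    n_snps = genotypes.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    start = 0
    while start < n_snps:
        # SNPs in the current window that have survived earlier windows
        idx = [j for j in range(start, min(start + window, n_snps)) if keep[j]]
        while len(idx) > 1:
            corr = np.corrcoef(genotypes[:, idx], rowvar=False)
            # Ridge term (an assumption) keeps the matrix invertible when
            # two SNPs are perfectly correlated.
            inv = np.linalg.inv(corr + ridge * np.eye(len(idx)))
            vif = np.diag(inv)  # VIF_i = 1 / (1 - R_i^2)
            worst = int(np.argmax(vif))
            if vif[worst] <= vif_threshold:
                break
            keep[idx[worst]] = False  # drop the most redundant SNP
            del idx[worst]
        start += step  # shift the window forward
    return np.flatnonzero(keep)
```

With the defaults shown (window of 25 SNPs, step of 5, VIF threshold of 2), the parameters mirror those reported in the text; the number of SNPs retained on real data will depend on the linkage disequilibrium structure of the region.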

Statistical analysis of HRS dataset

After SNP extraction, the analysis proceeded in three steps: phenotype construction, GeneWAS, and SNP evaluation.

HRS phenotype construction

HRS phenotypes were constructed to capture common measures of normal age-related decline in muscle function over time. Figure 2—source data 1 shows the HRS data years from which phenotypes were calculated, how each variable is defined, and the score or variable range. Datasets from multiple survey years were merged to obtain repeated assessments of variables on the same individuals. Phenotypes were calculated based on consensus following a review of the literature on assessments for age-related outcomes for variables implemented in the HRS and similar population-based surveys of aging. Further background for coding of specific phenotypes has been described in detail previously for gait (Wu et al., 2017; Batsis et al., 2016; Kim et al., 2019). Phenotypes for grip strength decline and gait speed decline were assessed as change in performance on those tasks over time. Change was calculated by taking the score from the most recent assessment and subtracting the score from the first assessment for each person, within the respective years listed. Additional descriptive statistics on phenotypes can be provided. Phenotypes were calculated using SAS 9.4.
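As an illustration of the change-score definition above (most recent assessment minus first assessment), a minimal Python sketch is given here. The phenotypes in this study were calculated in SAS 9.4; the function name `change_score`, the dictionary layout, and the example years and scores below are hypothetical and serve only to make the arithmetic concrete.

```python
def change_score(assessments):
    """Compute change between the first and most recent non-missing scores.

    assessments: dict mapping survey year -> score (None if not assessed).
    Returns None when fewer than two assessments are available.
    """
    observed = sorted(
        (year, score) for year, score in assessments.items() if score is not None
    )
    if len(observed) < 2:
        return None
    first_year, first_score = observed[0]
    last_year, last_score = observed[-1]
    return {
        "change": last_score - first_score,  # negative values indicate decline
        "years": last_year - first_year,     # follow-up time, usable as a covariate
        "baseline": first_score,             # baseline level, usable as a covariate
    }

# Hypothetical participant with grip strength measured in three survey years
example = change_score({2006: 30.0, 2008: None, 2010: 27.5, 2014: 25.0})
```

Returning the baseline score and the follow-up interval alongside the change keeps together the quantities that the regression models adjust for.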

GeneWAS

GeneWAS was performed as separate linear regression scans, under an additive model, adjusting for relevant covariates and indicators of population stratification as described below.

Population stratification

As with any statistical analysis of association, if the correlation between dependent and independent variables differs for subpopulations, this may result in spurious genetic associations (Novembre et al., 2008). To reduce such type 1 error, we conducted the GeneWAS adjusting for population structure as indicated by latent factors from principal components analysis (PCA) (Tian et al., 2008; Price et al., 2006). Detailed descriptions of the processes employed for running PCA, including SNP selection, are provided by HRS, and follow methods outlined by Price et al., 2006. Two PCAs were run. The first PCA included 1230 HapMap anchors from various ancestries and was used to test against self-reported race and ethnic classifications. Several corrections to the dataset were made based on this analysis. The second PCA was run on the corrected dataset, on unrelated individuals and excluding HapMap anchors, to create eigenvectors to serve as covariates and adjust for population stratification in association tests. From the second PCA, the first two eigenvalues with the highest values accounted for less than 4.5% of the overall genetic variance, with additional components (3–8) increasing this minimally, by a total of ~1.0% (Weir, 2013). Based on these analyses, we opted for a strategy that does not ignore population substructure, but also does not overcorrect, and adjusted for the first four PCs in all analyses. When coupling this approach of adjusting for principal components with all quality control procedures performed, excluding any related individuals and limiting the dataset for ancestral homogeneity, we reduce the likelihood of false associations resulting from population stratification (Tian et al., 2008; Price et al., 2006; Li and Yu, 2008; Serre et al., 2008; Zhang et al., 2003; Zhu et al., 2002; Price et al., 2010).

Regression models and other covariates

When conducting regressions on phenotypes indicating change over time, additional adjustments were made using covariates for baseline levels, number of years during which change was calculated, and variables shown to affect outcomes. For example, with change in gait speed, a linear regression scan was run adjusting for sex, age at the first assessment point, number of years of follow-up, baseline walking speed, and floor type in addition to principal components. All GeneWASs were completed using PLINK 2.0 (Purcell et al., 2007). The strength of the associations, as indicated by effect sizes and p values, is not directly comparable across phenotypes because the sample sizes differed by phenotype. Thus, the strength of an association does not reflect how strongly a SNP affects one phenotype compared to another. Because we found more than one variant associated with the phenotypes, we are more confident that these results were not due to type 1 error.
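To make the additive model concrete, the sketch below fits an ordinary least-squares regression of a phenotype on SNP dosage (0, 1, or 2 copies of the minor allele) plus covariates such as sex, age, follow-up years, baseline level, and principal components. It is an illustration in Python with NumPy under stated assumptions, not the PLINK analysis used in the study; the function name `additive_snp_regression` and the argument layout are hypothetical.

```python
import numpy as np

def additive_snp_regression(dosage, phenotype, covariates):
    """Fit phenotype ~ intercept + dosage + covariates by least squares.

    dosage:     (n,) minor-allele counts coded 0, 1, 2 (additive model).
    phenotype:  (n,) outcome, e.g., change in gait speed.
    covariates: (n, k) adjustment variables, e.g., sex, age, baseline, PCs.
    Returns (beta, standard_error, t_statistic) for the SNP term.
    """
    dosage = np.asarray(dosage, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    # Design matrix: intercept, SNP dosage, then the covariate columns
    X = np.column_stack([np.ones_like(dosage), dosage, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = residuals @ residuals / dof           # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # covariance of estimates
    se = np.sqrt(np.diag(cov_beta))
    return beta[1], se[1], beta[1] / se[1]
```

A GeneWAS scan in this framing would call the function once per pruned SNP for each phenotype; converting the t statistic to a p value requires the t distribution with `dof` degrees of freedom, which is omitted here to keep the sketch dependency-light.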

SNP evaluation

We evaluated SNP associations in the GeneWAS by p value. With the number of SNPs and primary phenotypes in this study, strict Bonferroni correction would yield an adjusted p value threshold of 0.0012 (0.05/42, for 21 × 2 tests). However, Bonferroni corrections such as this are too conservative because of the correlations among SNPs (Han et al., 2009; Sham and Purcell, 2014) and the cross-validation approach. To address the correlation among SNPs, we implemented a pruning scheme and calculated empirical p value thresholds through permutation (Han et al., 2009; Sham and Purcell, 2014; Dudbridge and Gusnanto, 2008; Pahl and Schäfer, 2010). Permutation is a process whereby the correspondence between SNPs and phenotypes is intentionally shuffled so that p values for the shuffled (null) data can be compared to those for the nonshuffled data. This permutation is repeated multiple times in order to determine an empirical p value (Sham and Purcell, 2014; Pahl and Schäfer, 2010; North et al., 2002), a calculated threshold at which a test result is less likely to achieve significance by chance alone. Thus, when performing 1000 permutations using PLINK with the max(T) option (Pahl and Schäfer, 2010), empirical p value thresholds of 0.0019 for grip strength decline and 0.006 for gait speed decline were observed for determining gene-wide significance. For SNP comparisons, we used R (CRAN; https://www.r-project.org).
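The Bonferroni arithmetic and the logic of a max(T)-style permutation can be sketched in plain Python. This is a simplified illustration, not the PLINK implementation: it uses the absolute Pearson correlation as the test statistic and ignores covariates, and the names `pearson_r` and `max_t_adjusted_p` are hypothetical. In each permutation the phenotype is shuffled, the largest statistic across all SNPs is recorded, and each SNP's adjusted p value is the fraction of permutations in which that null maximum meets or exceeds its observed statistic.

```python
import random

# Strict Bonferroni threshold for 21 SNPs x 2 phenotypes
bonferroni_threshold = 0.05 / (21 * 2)  # about 0.0012

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def max_t_adjusted_p(snps, phenotype, n_perm=1000, seed=0):
    """Family-wise adjusted p values from a max-statistic permutation test.

    snps:      list of per-SNP dosage lists, each aligned with phenotype.
    phenotype: list of outcome values, one per individual.
    """
    rng = random.Random(seed)
    observed = [abs(pearson_r(g, phenotype)) for g in snps]
    exceed = [0] * len(snps)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        null_max = max(abs(pearson_r(g, shuffled)) for g in snps)
        for i, obs in enumerate(observed):
            if null_max >= obs:
                exceed[i] += 1
    # Add-one correction avoids reporting an empirical p value of exactly zero
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Because the null maximum is taken over all SNPs jointly, correlation among SNPs is accounted for automatically, which is why a permutation-derived threshold can be less conservative than Bonferroni.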

Gene expression

The GTEx database (Battle et al., 2017), the most comprehensive, publicly available resource for tissue-specific gene expression data, was used to evaluate whether there was evidence for regulatory functions of SNPs within the gene. We entered the top SNPs into GTEx to assess relationships with differential gene expression.

WormLab measurements

As previously described (Roussel et al., 2014); in brief, 15–20 animals were moved to an NGM stock plate without E. coli OP50 and recorded in WormLab software (MBF Bioscience) for 2 min.

Swimming measurements

As previously described (Stuhr and Curran, 2020); in brief, 15–20 worms were moved to an unseeded NGM stock plate for 1 hr. Worms were then washed with M9 into 5 µl drops on a fresh NGM plate. After 1 min, 15–20 worms were imaged via Movie Recorder at 50 ms exposure using ZEN 2 software (Zeiss Axio Imager).

Statistical analysis of alh-6 genetic mutants

Data are presented as mean ± SD. Comparisons and significance were analyzed in GraphPad Prism 8. Comparisons between more than two groups were done using analysis of variance.

Appendix 1

Appendix 1—key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (C. elegans) | N2 | Caenorhabditis Genetics Center (CGC) | | Laboratory reference strain (wild type)
Strain, strain background (C. elegans) | SPC321 | PMID:24440036 | | Genotype: alh-6(lax105)
Strain, strain background (C. elegans) | CL2166 | Caenorhabditis Genetics Center (CGC) | | Genotype: gst-4p::gfp
Strain, strain background (C. elegans) | SPC223 | PMID:24440036 | | Genotype: alh-6(lax105);gst-4p::gfp
Strain, strain background (C. elegans) | SPC542 | This paper | | Genotype: alh-6(lax917);gst-4p::gfp
Strain, strain background (C. elegans) | SPC531 | This paper | | Genotype: alh-6(lax906);gst-4p::gfp
Strain, strain background (C. elegans) | SPC528 | This paper | | Genotype: alh-6(lax903);gst-4p::gfp
Strain, strain background (C. elegans) | SPC552 | This paper | | Genotype: alh-6(lax927);gst-4p::gfp
Strain, strain background (C. elegans) | SPC561 | This paper | | Genotype: alh-6(lax937);gst-4p::gfp
Strain, strain background (C. elegans) | SPC564 | This paper | | Genotype: alh-6(lax940);gst-4p::gfp
Strain, strain background (C. elegans) | SPC549 | This paper | | Genotype: alh-6(lax924);gst-4p::gfp
Strain, strain background (C. elegans) | SPC566 | This paper | | Genotype: alh-6(lax945);gst-4p::gfp
Strain, strain background (C. elegans) | SPC540 | This paper | | Genotype: alh-6(lax915);gst-4p::gfp
Strain, strain background (C. elegans) | SPC562 | This paper | | Genotype: alh-6(lax938);gst-4p::gfp
Strain, strain background (C. elegans) | SPC563 | This paper | | Genotype: alh-6(lax939);gst-4p::gfp
Strain, strain background (C. elegans) | SPC546 | This paper | | Genotype: alh-6(lax921);gst-4p::gfp
Strain, strain background (C. elegans) | SPC527 | This paper | | Genotype: alh-6(lax902);gst-4p::gfp
Strain, strain background (C. elegans) | SPC529 | This paper | | Genotype: alh-6(lax904);gst-4p::gfp
Strain, strain background (C. elegans) | SPC536 | This paper | | Genotype: alh-6(lax911);gst-4p::gfp
Strain, strain background (C. elegans) | SPC534 | This paper | | Genotype: alh-6(lax909);gst-4p::gfp
Strain, strain background (C. elegans) | SPC559 | This paper | | Genotype: alh-6(lax935);gst-4p::gfp
Strain, strain background (C. elegans) | SPC532 | This paper | | Genotype: alh-6(lax907);gst-4p::gfp
Strain, strain background (C. elegans) | SPC569 | This paper | | Genotype: alh-6(lax993);gst-4p::gfp
Strain, strain background (C. elegans) | SPC544 | This paper | | Genotype: alh-6(lax919);gst-4p::gfp
Strain, strain background (C. elegans) | SPC562 | This paper | | Genotype: alh-6(lax938);gst-4p::gfp
Strain, strain background (C. elegans) | SPC551 | This paper | | Genotype: alh-6(lax926);gst-4p::gfp
Strain, strain background (C. elegans) | SPC530 | This paper | | Genotype: alh-6(lax905);gst-4p::gfp
Strain, strain background (C. elegans) | SPC533 | This paper | | Genotype: alh-6(lax908);gst-4p::gfp
Strain, strain background (C. elegans) | SPC548 | This paper | | Genotype: alh-6(lax923);gst-4p::gfp
Strain, strain background (C. elegans) | SPC550 | This paper | | Genotype: alh-6(lax925);gst-4p::gfp
Strain, strain background (C. elegans) | SPC565 | This paper | | Genotype: alh-6(lax941);gst-4p::gfp
Strain, strain background (C. elegans) | SPC538 | This paper | | Genotype: alh-6(lax913);gst-4p::gfp
Strain, strain background (C. elegans) | SPC543 | This paper | | Genotype: alh-6(lax918);gst-4p::gfp
Strain, strain background (C. elegans) | SPC541 | This paper | | Genotype: alh-6(lax916);gst-4p::gfp
Strain, strain background (C. elegans) | SPC545 | This paper | | Genotype: alh-6(lax920);gst-4p::gfp
Strain, strain background (C. elegans) | SPC554 | This paper | | Genotype: alh-6(lax929);gst-4p::gfp
Strain, strain background (C. elegans) | SPC558 | This paper | | Genotype: alh-6(lax934);gst-4p::gfp
Strain, strain background (C. elegans) | SPC526 | This paper | | Genotype: alh-6(lax901);gst-4p::gfp
Strain, strain background (C. elegans) | SPC568 | This paper | | Genotype: alh-6(lax992);gst-4p::gfp
Strain, strain background (C. elegans) | SPC535 | This paper | | Genotype: alh-6(lax910);gst-4p::gfp
Strain, strain background (C. elegans) | SPC553 | This paper | | Genotype: alh-6(lax928);gst-4p::gfp
Strain, strain background (C. elegans) | SPC539 | This paper | | Genotype: alh-6(lax914);gst-4p::gfp
Strain, strain background (C. elegans) | SPC556 | This paper | | Genotype: alh-6(lax932);gst-4p::gfp
Strain, strain background (C. elegans) | SPC557 | This paper | | Genotype: alh-6(lax933);gst-4p::gfp
Strain, strain background (C. elegans) | SPC567 | This paper | | Genotype: alh-6(lax947);gst-4p::gfp
Sequence-based reagent | pMiniT 2.0 vector and cloning kit | New England Biolabs | #E1202S |
Software, algorithm | MBF Bioscience | Wormlab | https://www.mbfbioscience.com/wormlab |
Other | US Health and Retirement Study (HRS) | https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000428.v2.p2 | National Center for Biotechnology Information Genotypes and Phenotypes Database dbGaP Study Accession: phs000428.v2.p2 |
Strain, strain background (Escherichia coli) | OP50-1 | Caenorhabditis Genetics Center (CGC) | RRID:WB-STRAIN:WBStrain00041971 | Standard E. coli B diet; streptomycin resistant
Software, algorithm | GraphPad Prism | GraphPad Prism (https://graphpad.com) | RRID:SCR_015807 | Version 6
Software, algorithm | ImageJ | ImageJ (http://imagej.nih.gov/ij/) | RRID:SCR_003070 |
Software, algorithm | Phyre2 | http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index | |
Software, algorithm | MisSense3D | http://missense3d.bc.ic.ac.uk/missense3d/ | |

Data availability

All data are available within the manuscript. Health and retirement study (HRS) data are maintained at the University of Michigan - https://hrs.isr.umich.edu/about.

References

Santilli V, Bernetti A, Mangone M, Paoloni M (2014) Clinical definition of sarcopenia. Clinical Cases in Mineral and Bone Metabolism 11:177–180.

Teumer A, Li Y, Ghasemi S, Prins BP, Wuttke M, Hermle T, Giri A, Sieber KB, Qiu C, Kirsten H, Tin A, Chu AY, Bansal N, Feitosa MF, Wang L, Chai J-F, Cocca M, Fuchsberger C, Gorski M, Hoppmann A, Horn K, Li M, Marten J, Noce D, Nutile T, Sedaghat S, Sveinbjornsson G, Tayo BO, van der Most PJ, Xu Y, Yu Z, Gerstner L, Ärnlöv J, Bakker SJL, Baptista D, Biggs ML, Boerwinkle E, Brenner H, Burkhardt R, Carroll RJ, Chee M-L, Chee M-L, Chen M, Cheng C-Y, Cook JP, Coresh J, Corre T, Danesh J, de Borst MH, De Grandi A, de Mutsert R, de Vries APJ, Degenhardt F, Dittrich K, Divers J, Eckardt K-U, Ehret G, Endlich K, Felix JF, Franco OH, Franke A, Freedman BI, Freitag-Wolf S, Gansevoort RT, Giedraitis V, Gögele M, Grundner-Culemann F, Gudbjartsson DF, Gudnason V, Hamet P, Harris TB, Hicks AA, Holm H, Foo VHX, Hwang S-J, Ikram MA, Ingelsson E, Jaddoe VWV, Jakobsdottir J, Josyula NS, Jung B, Kähönen M, Khor C-C, Kiess W, Koenig W, Körner A, Kovacs P, Kramer H, Krämer BK, Kronenberg F, Lange LA, Langefeld CD, Lee JJ-M, Lehtimäki T, Lieb W, Lim S-C, Lind L, Lindgren CM, Liu J, Loeffler M, Lyytikäinen L-P, Mahajan A, Maranville JC, Mascalzoni D, McMullen B, Meisinger C, Meitinger T, Miliku K, Mook-Kanamori DO, Müller-Nurasyid M, Mychaleckyj JC, Nauck M, Nikus K, Ning B, Noordam R, Connell JO, Olafsson I, Palmer ND, Peters A, Podgornaia AI, Ponte B, Poulain T, Pramstaller PP, Rabelink TJ, Raffield LM, Reilly DF, Rettig R, Rheinberger M, Rice KM, Rivadeneira F, Runz H, Ryan KA, Sabanayagam C, Saum K-U, Schöttker B, Shaffer CM, Shi Y, Smith AV, Strauch K, Stumvoll M, Sun BB, Szymczak S, Tai E-S, Tan NYQ, Taylor KD, Teren A, Tham Y-C, Thiery J, Thio CHL, Thomsen H, Thorsteinsdottir U, Tönjes A, Tremblay J, Uitterlinden AG, van der Harst P, Verweij N, Vogelezang S, Völker U, Waldenberger M, Wang C, Wilson OD, Wong C, Wong T-Y, Yang Q, Yasuda M, Akilesh S, Bochud M, Böger CA, Devuyst O, Edwards TL, Ho K, Morris AP, Parsa A, Pendergrass SA, Psaty BM, Rotter JI, Stefansson K, Wilson JG, Susztak K, Snieder H, Heid IM, Scholz M, Butterworth AS, Hung AM, Pattaro C, Köttgen A (2019) Genome-wide association meta-analyses and fine-mapping elucidate pathways influencing albuminuria. Nature Communications 10:4130. https://doi.org/10.1038/s41467-019-11576-0

Weir DR (2013) Quality Control Report for Genotypic Data. University of Michigan. [Book]

Decision letter

  1. Monica Driscoll
    Reviewing Editor; Rutgers University, United States
  2. Carlos Isales
    Senior Editor; Medical College of Georgia at Augusta University, United States
  3. Alfred Fisher
    Reviewer; University of Nebraska Medical Center, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Genetic variation in ALDH4A1 predicts muscle health over the lifespan and across species" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Carlos Isales as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Alfred Fisher (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

The reviewers raised important points regarding statistical significance, and the relationship of SNPs to ALDH4A1 expression/stability.

Following the suggestions in the reviews should enhance the readability and accuracy of presentation.

High priority-Essential

1) Please make a more compelling case for the significance of the identified SNPs. Reviewer 2 (R2) suggests a threshold for significance for the SNPs of p=5.8e-5, which is not cleared by reported SNPs.

More specifically, R2 says: "The authors tested the significance of 53 SNPs in the ALDH4A1 locus with 16 phenotypes in the HRS (Table S1). They claim that they are performing a gene-wide association study and are correcting for 16 tests (0.05/16) in their statistical significance assessment. This analysis is not correct. The authors need to correct for 53x16=848 tests (0.05/848 = 5.8e-5 if they choose to use Bonferroni correction). The authors can argue that the SNPs are not independent, which would be true. In that case, they need to prune the SNPs based on linkage disequilibrium and use index/tag SNPs for their analyses. Further, some of the 16 phenotypes could also be highly correlated; and this needs to be acknowledged/addressed".

The authors should justify the p-value threshold with a strong statistical argument or a reinforcing analysis. If this is not possible, the work as presented here cannot be recommended as adequately rigorous.

If possible, address replication in cohorts. R3 suggests that a validation of the importance could involve author identification of the best phenotype and SNPs (as they have done), and then to validate these findings in other muscle aging data sets. The authors could use a similar approach to this work, but with many fewer SNPs and phenotypes, which would lower the p-value threshold, and make the work more robust. Such an approach would exploit existing electronic data, so the work involved should be fairly modest.

2) Please link SNPs to ALDH4A1 expression or stability to provide some sense of how a SNP could result in functional changes; making this connection compelling underlies the major conclusion of the paper.

We suggest that you mine the eQTL and RNA stability databases to determine which SNPs can be linked to some potential change in ALDH4A1 expression/stability. The eQTL analysis is both important to substantiate the claims and doable. To substantiate their claim that ALDH4A1 is a biomarker of muscle health in humans, the authors would need to mine existing cell-specific data in the extensive human eQTL repositories (e.g., GTEx, eQTLGen, etc.) and perform the appropriate genetic and statistical analyses. Tissue-specific human eQTL data are publicly available, and can be extracted and analyzed. For more information on these repositories, the authors can refer to PMID: 34493866.

Better connect predictor human SNPs and C. elegans muscle aging biology.

The SNPs of human ALDH4A1 were not analyzed with respect to the C. elegans alh-6 mutations. Reviewers agreed that making a "functional" connection from nematode to human SNP would strengthen the arguments put forward in the paper. Minimally, the authors should include clear listing of whether SNPs might be associated with a change in the C. elegans protein/transcript.

Easy additions for clarity and value:

The goal of establishing a functional link between identified SNPs and ALDH4A1 function in human muscle can and should be enhanced by adding details regarding SNP impact on protein structure and function prediction, or of potential mRNA consequence (splicing site perturbation). Compare conserved amino acid sequences of C. elegans ALH-6 and human ALDH4A1 in parallel. Include schematic illustration of ALH-6 protein and predicted structures of ALH-6 wild-type and mutant proteins using protein structure prediction tools (i.e. AlphaFold2), which will provide useful data for alh-6 substitutions identified.

Suggested but not essential:

Constructing human cognate CRISPR alleles in nematodes would be welcome (especially if any support a clear structure/function hypothesis), but since 1) it is possible that a SNP change might not directly correspond to nematode impact even if the proposed relationship were operative, and 2) such engineering could require a fair amount of C. elegans manipulation, making this connection is not considered a requirement for a successful revision.

3) Comment on novelty of approach. The GWAS mining approach is important as it reports a success that opens up a novel avenue for connecting C. elegans biology/genetics to human physiology. Still, success with this approach for human obesity genes has recently been reported (PLoS Genetics; PMID: 34492009), so the approach is not the first of its type. Authors should cite that literature; their contribution on this front is still significant and the application in this system moves this approach toward a more central activity in the field, which is powerful.

4) The title should be precise and true to findings. R1 noted that "The title of this paper is somewhat misleading. The authors produced a prediction model of age-dependent decline of muscle functions based on genotypes of human ALDH4A1, but not those of C. elegans alh-6". The title should be revised for accuracy, e.g., "Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species".

5) Explain better, write carefully on SKN-1 activity claims. R2 notes that gst-4::GFP can be induced by SKN-1 as well as by additional transcription factors; SKN-1 activity is not directly tested in this study. Authors should directly test SKN-1 activation, or just state gst-4::GFP expression as the outcome assayed, with caveats mentioned. The authors have previously produced data on skn-1 in this response, possibly more extensive discussion of that data might allay some of the concerns.

Authors should also consider R1 points about muscle-specific activation of gst-4::gfp claims.

In sum, in revision, the authors should be precise as to what is directly assayed in the screen (gst-4::GFP expression, not necessarily SKN-1 activity) in the summary.

Reviewer #1 (Recommendations for the authors):

1. I think the biggest issue for this excellent paper is that the information about SNPs of human ALDH4A1 was not analyzed with respect to the C. elegans alh-6 mutations. Do the SNPs occur at the same or similar loci of the orthologous C. elegans mutation sites? If not, can the authors introduce human SNPs into C. elegans for the orthologous changes and test whether they affect age-dependent declines in muscle functions? This is the key for improving the paper to fit the purpose of the work.

2. The title of this paper is somewhat misleading. The authors produced a prediction model of age-dependent decline of muscle functions based on genotypes of human ALDH4A1, but not those of C. elegans alh-6. Please downplay the title. e.g., "Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species"

3. In Figure 1b, please show schematic illustration of ALH-6 protein. In addition, predicted structures of ALH-6 wild-type and mutant proteins using protein structure prediction tools (i.e. AlphaFold2) will provide useful information for alh-6 mutations (for example, "G152/K153" in page 5 line 89).

4. In Figure 1c, please describe accurately how the authors executed One-way ANOVA statistical analysis for measuring GFP intensity.

5. I recommend comparing conserved amino acid sequences of C. elegans ALH-6 and human ALDH4A1 in parallel.

6. In figure supplement 1, please add a scale bar.

7. In Figure 2, please elaborate the choice of p values 0.003 and 0.013.

8. In Figure 3, please mark exact genotypes at x axes.

9. For Figure 3, p values are written in the Figure Legends and Table 1. However, it would be better to mark p values or asterisks in the panels as well.

10. I recommend adding discussions regarding how genetic variants in alh-6/ALDH4A1 contribute to age-dependent impairment in muscle functions. In particular, it will be great to add speculation for changes in mitochondrial proline dehydrogenase levels and/or activities that may affect sarcopenia.

11. Please change one of "IADLA Decline" to "IADLA2 Decline" in Figure 2.

12. On page 5, line 85, add lax918 in addition to lax903 and lax930 for E78Stop.

13. On page 6, line 163, use "day 3" instead of "Day 3" for consistency.

Reviewer #2 (Recommendations for the authors):

To strengthen the main claims this reviewer suggests a revision of the following points:

1) The authors state:

"Based on the striking specificity of the muscle-restricted and age-dependent activation of SKN-1 in C. elegans harboring mutations in alh-6,…"

However, currently, data showing SKN-1 activation in the alh-6 mutant backgrounds are lacking. Direct observation of SKN-1::GFP activation would be important because the transcriptional reporter gst-4P::GFP, although a target of SKN-1, is also activated by other stress-responsive TFs including NHR-49 (PMID: 30297383) and DAF-16 (PMID: 32161088).

In addition, in Suppl. Figure 1 various alh-6 mutant alleles show non-muscle gst-4P::GFP signal.

Therefore, the authors may need to directly test SKN-1 activation, and revise the muscle-specific claims.

2) Structural predictions may enable the authors to generate hypotheses suggesting how a very early STOP codon in alh-6 (e.g. lax906) may have a phenotypic impact on muscle function comparable to a late amino acid substitution (e.g. lax947).

3) The authors tested the significance of 53 SNPs in the ALDH4A1 locus with 16 phenotypes in the HRS (Table S1). They claim that they are performing a gene-wide association study and are correcting for 16 tests (0.05/16) in their statistical significance assessment. This analysis is not correct. The authors need to correct for 53x16=848 tests (0.05/848 = 5.8e-5 if they choose to use Bonferroni correction). The authors can argue that the SNPs are not independent, which would be true. In that case, they need to prune the SNPs based on linkage disequilibrium and use index/tag SNPs for their analyses. Further, some of the 16 phenotypes could also be highly correlated; and this needs to be acknowledged/addressed.

4) It is unclear how the SNPs presented in Table 1 are linked to ALDH4A1. The SNPs are in the locus, but that does not necessarily mean they have a functional impact on ALDH4A1. For instance, do the identified SNPs lead to amino acid, splicing, or other regulatory changes in ALDH4A1?

Ultimately, this paper aims to advance ALDH4A1 as genetic factor underlying muscle-function decline in aging humans. Therefore, to go beyond the already publicly available link between the SNPs in the ALDH4A1 locus and muscle performance in aging humans, the authors may need to define whether the reported SNPs are linked to ALDH4A1 expression as eQTLs, especially in muscle tissue/cells.

5) For clarity, the distribution of the phenotypes for each allele may be presented in Figure 3 like they are presented in Figure 4.

Reviewer #3 (Recommendations for the authors):

1) The finding that alh-6 mutants exhibit declines in mobility during aging could reflect a selective effect on muscle function, or could be reflective of a larger acceleration of aging. It would be helpful to show lifespan data for the alh-6 mutant, or discuss if this work has been published previously.

2) The conclusion that the alh-6 mutant affects muscle function during aging could be further bolstered by studying muscle mass and structure as well as mitochondrial number and structure with readily available reporters.

3) For the human SNP analysis, engaging the help of an expert in human Kinesiology or Geriatrics can help with narrowing the phenotypes to make them more selective for muscle function, and exclude those where the connection to muscle function is more tangential.

4) Several recent papers have used genome-wide association analysis to identify SNPs associated with decreased muscle strength in middle-aged and older individuals. This work will provide information on other readily available datasets that could be suitable for a replication study to determine if the findings are seen in separate study populations.

5) For the SNP studies, the 0.003 p-value cut-off is likely too low given that 53 SNPs are being studied making the multiple testing greater than the correction used.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species" for further consideration by eLife. Your revised article has been evaluated by Carlos Isales (Senior Editor), a Guest Reviewing Editor, the original reviewers, and an expert in statistics. We are sorry about the delay.

While Reviewer 1 is now in favor of acceptance as is, Reviewer 3 states: "I still have concerns about the multiple testing correction being used by the authors, and I did not feel that the authors addressed this in a fully compelling manner, such as adding a statistician."

Reviewer 2 concurs: "Although I would really like for the work to be complete, I'd need to agree with Reviewer 3. In addition to unjustified processing of the data that leads to a significance level (0.05) that seems inappropriate for this type of study, the authors have not addressed the concerns related to experimental testing of some of the key claims in C. elegans. For instance, the authors did not directly look at SKN-1::GFP even though the tools are available and this is a feasible experiment. As explained before, the current readout is indirect (gst-4::GFP), and although gst-4 is a target of SKN-1, it is also a target of DAF-16 and other transcription factors. Therefore, gst-4::GFP levels are only suggestive of SKN-1 activation. I agree that adding a GWAS expert statistician would be really helpful."

eLife consultant on statistics questions added: "It is true that in candidate gene association you may not need to correct for multiple tests; however, this is only the case when SNPs used are known to have a function (usually non-synonymous coding). I support the suggestion made by reviewers to use linkage disequilibrium (LD) for SNP pruning and also reduce the number of phenotypes tested if possible. Regarding SNPs, the authors' suggestion that all 53 SNPs are "tags" of unknown functional variants is not justified. How were they selected? If they are correlated, many of them still do tag the same function. Whether pruned or not, a significant SNP association from this analysis is only an association, not causality, especially given the complex nature of the phenotypes. It is true that LD pruning would make it difficult to replicate findings in other studies, particularly from other populations, but it would be possible to use the nearest SNP available in those studies (as would also be the case if a different genotyping array was used).

Regarding phenotypes, first of all it is good to have such richly annotated data. However, as already acknowledged, these are not independent and correcting for 16 tests is unduly self-penalising (just as it would be to correct for 53 SNPs). It would be more appropriate to set one phenotype (or a few unrelated) as the primary focus and others as exploratory or explanatory of the main ones if a link can be established between them (which I think there is)."

Please address these issues in what will hopefully be the last round of revision.

https://doi.org/10.7554/eLife.74308.sa1

Author response

High priority-Essential

1) Please make a more compelling case for the significance of the identified SNPs. Reviewer 2 (R2) suggests a threshold for significance for the SNPs of p=5.8e-5, which is not cleared by reported SNPs.

More specifically, R2 says: "The authors tested the significance of 53 SNPs in the ALDH4A1 locus with 16 phenotypes in the HRS (Table S1). They claim that they are performing a gene-wide association study and are correcting for 16 tests (0.05/16) in their statistical significance assessment. This analysis is not correct. The authors need to correct for 53x16=848 tests (0.05/848 = 5.8e-5 if they choose to use Bonferroni correction). The authors can argue that the SNPs are not independent, which would be true. In that case, they need to prune the SNPs based on linkage disequilibrium and use index/tag SNPs for their analyses. Further, some of the 16 phenotypes could also be highly correlated; and this needs to be acknowledged/addressed".

– the authors should justify the P-value threshold with strong statistical argument or a reinforcing analysis. If this is not possible, the work as presented here cannot be recommended as adequately rigorous.

If possible, address replication in cohorts. R3 suggests that a validation of the importance could involve author identification of the best phenotype and SNPs (as they have done), and then to validate these findings in other muscle aging data sets. The authors could use a similar approach to this work, but with many fewer SNPs and phenotypes, which would lower the p-value threshold, and make the work more robust. Such an approach would exploit existing electronic data, so the work involved should be fairly modest.

These are critical issues, and we thank the reviewers for raising them so that we can clarify the approach. We did not prune SNPs based on linkage disequilibrium (LD) for several reasons. First, it is unknown which SNPs represent conservation across species, and given unique human ancestral lineages, we did not want to miss potential SNP associations that represent important variation within the gene. It is now known that not all SNPs function equally across a gene; rather, features such as positioning (exons closer to the 5’ UTR, exons closer to the 3’ UTR, introns) show different enrichment patterns within a gene, and thus effects can manifest differently in association studies; these differences are not clearly differentiated by LD between SNPs. Second, each of the 53 SNPs within the gene represents a tag, or marker SNP, for human variation within the gene. With this association testing, we are not able to identify a causal variant; rather, we are only able to indicate that some variation within the gene suggests conserved associations with humans. Given this study design, we did not expect to identify causal SNPs, but rather sought to establish a pattern of association across all SNPs. By not pruning, if SNPs within our array that are suggestive of association with phenotypes are tagging SNPs not represented on the array, we retain the ability to evaluate those associations as representing variation within the gene, particularly if suggestive associations appear across more than one phenotype (we now make clearer the correlations that exist between phenotypes).
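For readers less familiar with the pruning step under discussion, the following is a minimal sketch of greedy LD pruning, not the procedure used by the reviewers, the authors, or any specific software. It assumes genotypes are coded as minor-allele counts (0, 1, 2), uses the squared Pearson correlation as r², and drops any SNP whose r² with an already retained SNP meets a threshold. The SNP names, the toy genotypes, and the 0.5 threshold are all hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, threshold=0.5):
    """Keep a SNP only if its r^2 with every retained SNP is below threshold."""
    kept = []
    for name, dosages in genotypes.items():
        if all(r_squared(dosages, genotypes[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: eight individuals, three SNPs (not HRS data).
toy_genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 2],  # nearly identical to snp_a
    "snp_c": [2, 0, 1, 1, 0, 2, 0, 1],  # weakly correlated with snp_a
}

pruned = ld_prune(toy_genotypes)
print(pruned)
```

Because r² is computed within one sample, the retained set depends on that sample's LD structure, which is the ancestry-specificity concern raised in the response.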

Relatedly, with regard to the multiple phenotypes, the traits used in these scans came from a population-based study, so the traits were not assessed to allow us to identify physiological degeneration in specific muscles, but rather to index and track overall age-related decline in functionality over time. It is widely accepted now that the genetic variation underlying these aging-related traits is highly polygenic. Thus, it is not expected that a single variant within a gene would be identified to drive these phenotypic results in humans. It is likely that small effects of multiple SNPs across multiple genes, including within the same gene, and with non-additive effects (e.g., gene-by-environment effects), contribute to the resulting phenotype. With this use of the gene-wide association scanning approach, it is only possible to identify those variants contributing to overall effects. This is, for example, one reason SNPs are not always pruned for LD in the creation of polygenic risk scores, another being that pruning would be highly skewed towards the dominant ancestry sample. Also related, LD structures are explicitly tied to ancestry among humans (e.g., distinct demographic and recombination histories of groups). Thus, as mentioned, pruning by LD would be specific to the human cohort for this study, which is of primarily European ancestry, and for future replication in other cohorts, any pruning would potentially remove SNPs that are more important in cohorts with different LD structures. Because we are validating a finding initially resulting from C. elegans, there would be no rationale for pruning SNPs based on the LD structure of a single ancestry cohort. In the text, we now better acknowledge the application of these techniques and the goals for these gene-wide scans that invoked all SNPs.

Given this approach, we did not calculate a p-value threshold on the basis of the number of SNPs; rather, we designed the scan using all SNPs as markers for capturing variation within the single gene (Bonferroni-corrected p-value of 0.0031 = 0.05/16 total phenotypes for variation in one gene). We now emphasize in the text that with these results among humans, we cannot identify a causal SNP nor elucidate a mechanism for the biological pathway, but can validate that there is an association between variation within the gene and muscle functionality that was previously found in C. elegans. Further experimental studies, or specific muscle functionality phenotyping, would be required to address the mechanisms.
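The competing thresholds in this exchange all follow from the same Bonferroni rule (alpha divided by the number of tests); they differ only in what is counted as a test. The short sketch below simply reproduces that arithmetic from the counts stated in this letter (53 SNPs, 16 phenotypes, 4 physiological groupings). The function and variable names are illustrative and are not taken from the authors' analysis code.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05
N_SNPS = 53        # SNPs tested in the ALDH4A1 locus
N_PHENOTYPES = 16  # phenotypes tested in the HRS
N_GROUPS = 4       # physiological groupings used for the suggestive cut-off

# Authors' threshold: one gene, corrected for the number of phenotypes.
per_phenotype = bonferroni_threshold(ALPHA, N_PHENOTYPES)

# Reviewer 2's threshold: corrected for every SNP-by-phenotype test.
per_snp_and_phenotype = bonferroni_threshold(ALPHA, N_SNPS * N_PHENOTYPES)

# Authors' suggestive threshold: corrected for the phenotype groupings.
suggestive = bonferroni_threshold(ALPHA, N_GROUPS)

for label, value in [
    ("0.05/16", per_phenotype),
    ("0.05/848", per_snp_and_phenotype),
    ("0.05/4", suggestive),
]:
    print(f"{label} = {value:.3g}")
```

The roughly 50-fold gap between the first two values is the substance of the disagreement: whether the 53 SNPs count as separate tests or as markers of a single gene-level hypothesis.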

Also, thank you for pointing out this important aspect of the approach with regard to correlated phenotypes. The 16 phenotypes selected for testing are correlated by design, with the aim of establishing a pattern across aging-related muscle phenotypes that represent aging-related physical functioning, in a normative aging sample. (We did not intend to find a single genetic variant that would explain functionality of a specific muscle group responsible for walking, for example.) Specifically, we chose to include common composite measures of physical functioning that index aging-related decline, including ADLA, IADLs (3 and 5 item composites), mobility, and large muscle functioning. Because we viewed these composite phenotypes as sensitivity measures, and ones that subsume other individual phenotypes tested, we did not consider these to be independent. For example, the ADLA composite includes 5 tasks (bathing, eating, dressing, walking across a room, and getting in or out of bed) and as shown, the SNP association with the individual phenotype of walking across a room was stronger than for the composite ADLA. With IADL decline phenotypes, we show these as broader composites that indicate how changes in human functionality are associating with gene variation in addition to the individual phenotypes that show weaker and suggestive associations (e.g., getting up from a chair, jogging, lifting 10 lbs, walking 1 block, walking upstairs). Thus, we emphasize that the inclusion of the composites as overlapping phenotypes aids in establishing a pattern. Relatedly, we did not include composite phenotypes in the calculation of the suggestive p-value cut-off and grouped other phenotypes by physiological functionality into arm pushing/pulling, getting up from a chair, walking and jogging, and grip phenotypes (0.05/4 ≈ 0.013), with the approach of testing for associations among SNPs representing variation across the gene.

With regard to replication in other cohorts, access to harmonized phenotypes and genetic data is required; however, we were able to more immediately test replication across ethnic subsamples in the HRS by calculating a common effect size across the samples. We did this by completing a fixed effects and random effects meta-analysis using PLINK software, with results shown in Author response table 1. Statistics shown are the N (sample size by group included in the meta-analysis), Fixed effects (p-value and effect size), random effects (p-value and effect size), Cochran’s Q statistic as an indicator of variance across sample effect sizes, and the I heterogeneity index, which quantifies dispersion across samples. For some SNPs, the minor allele frequency in the African ancestry sample was below 1% and thus the subsample was excluded from the meta-analysis. The Q and I statistics indicate that random effects analysis fit the data better for IADL and gait speed decline, thus we focus on results from random effects to account for differences in effect sizes by sample (e.g., the I index for IADL decline indicates 63.39% of the observed variance between samples is due to differences in effect sizes between samples). Given these results, the common effect size calculated for walking across a room and grip strength decline still suggests significance of these associations with SNPs in the gene, whereas the effects for IADL and gait speed decline remain for the European ancestry cohort only and not across subsamples. Further probing of the associations between SNPs within the gene and such phenotypes requires physiological measures to target specific muscle groups.

Author response table 1
phenotype | SNP name | location | minor allele | European Ancestry N | African Ancestry N | Hispanic N | Fixed Effect P-value | Random Effect P-value | Fixed Effect: OR or β | Random Effect: OR or β | Q | I
Walking across a room | rs111289603 | 19195492 | G | 9907 | -- | 1067 | 0.00050 | 0.00050 | 1.4664 | 1.4664 | 0.5825 | 0.00
Grip strength decline | rs28665699 | 19200185 | A | 5228 | -- | 409 | 0.00150 | 0.00150 | -0.0418 | -0.0418 | 0.3341 | 0.00
IADL decline (3 tasks) | rs111289603 | 19195492 | G | 9041 | -- | 935 | 0.00316 | 0.07460 | 0.0644 | 0.0642 | 0.0984 | 63.39
Gait speed decline | rs77608580 | 19196968 | A | 3319 | 381 | 237 | 0.00775 | 0.72900 | 0.0424 | 0.0146 | 0.0577 | 64.95
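The quantities reported in Author response table 1 can be illustrated with a minimal sketch of an inverse-variance meta-analysis. This is not PLINK's implementation: it assumes the standard fixed-effect estimator, Cochran's Q, the I² index expressed as a percentage, and the DerSimonian-Laird random-effects estimator. The per-sample effect sizes and standard errors below are hypothetical, because the table does not report standard errors.

```python
def meta_analyze(betas, ses):
    """Inverse-variance meta-analysis of per-sample effect sizes."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    fixed = sum(w * b for w, b in zip(weights, betas)) / total_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1

    # I^2: percentage of variation attributed to between-sample heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-sample variance tau^2.
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    random_effect = sum(w * b for w, b in zip(re_weights, betas)) / sum(re_weights)
    return {"fixed": fixed, "random": random_effect, "q": q, "i2": i2, "tau2": tau2}


# Hypothetical inputs: two subsamples with the same effect, then opposite effects.
homogeneous = meta_analyze(betas=[0.10, 0.10], ses=[0.02, 0.05])
heterogeneous = meta_analyze(betas=[0.10, -0.05], ses=[0.02, 0.05])

print(homogeneous)
print(heterogeneous)
```

When the subsamples agree, tau² is zero and the fixed and random estimates coincide, as in the first two rows of the table. When they disagree, the random-effects weights become more equal across samples, so the pooled estimate moves toward the smaller sample's effect and away from the fixed-effect value, which is the pattern in the IADL and gait speed rows.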

2) Please link SNPs to ALDH4A1 expression or stability to provide some sense of how a SNP could result in functional changes--making this connection compelling underlies the major conclusion of the paper.

We emphasize that we have not identified a single variant that is responsible for driving the specific aging-related functional phenotypes tested in this particular study; rather, we establish that there is variation within the gene that strongly suggests relationships with a common underlying factor. Further, it is likely that small effects of multiple SNPs across multiple genes, including within this same gene, and with non-additive effects (e.g., gene-by-environment effects), contribute to the resulting phenotypes. Without identifying a causal SNP, we can only aggregate available data to suggest what contributes to a biological pathway. For example, using the publicly available Genotype-Tissue Expression (GTEx; https://pubmed.ncbi.nlm.nih.gov/29022597/) database, we found that a SNP within the gene was significantly associated with tissue-specific differential ALDH4A1 expression levels in whole blood, as shown in Figure 3, figure supplement 1. Further experimental studies of the downstream effects of this altered gene expression, or specific muscle functionality phenotyping, would be required to address the mechanisms.

We suggest that you mine the eQTL and RNA stability databases to determine which SNPs can be linked to some potential change in ALDH4A1 expression/stability. The eQTL analysis is both important to substantiate the claims and doable. What the authors would need to do to substantiate their claim that ALDH4A1 is a biomarker of muscle health in humans is to mine and perform the appropriate genetics and statistical analyses using existing cell-specific data in the extensive human eQTL repositories (e.g., GTEx, eQTLGen, etc.). Tissue-specific human eQTL data are publicly available, and can be extracted and analyzed. For more information on these repositories, the authors can refer to PMID: 34493866.

Using the Genotype-Tissue Expression (GTEx) database (https://gtexportal.org/home/), which is currently the most comprehensive resource for tissue-specific gene expression and regulation data (similar to the eQTL Catalog in the PMID mentioned), we find that there is an eQTL in whole blood associated with one SNP in ALDH4A1 (rs77608580). This association with the eQTL implies there is some regulatory function related to the SNP, although further evaluation of muscle-specific effects using experimental approaches would be the next step for this research.

Better connect predictor human SNPs and C. elegans muscle aging biology.

The SNPs of human ALDH4A1 were not analyzed with respect to the C. elegans alh-6 mutations. Reviewers agreed that making a "functional" connection from nematode to human SNP would strengthen the arguments put forward in the paper. Minimally, the authors should include clear listing of whether SNPs might be associated with a change in the C. elegans protein/transcript.

We have updated the text of the manuscript (thoroughly discussed above).

Easy additions for clarity and value:

The goal of establishing a functional link between identified SNPs and ALDH4A1 function in human muscle can and should be enhanced by adding details regarding SNP impact on protein structure and function prediction, or of potential mRNA consequence (splicing site perturbation). Compare conserved amino acid sequences of C. elegans ALH-6 and human ALDH4A1 in parallel. Include schematic illustration of ALH-6 protein and predicted structures of ALH-6 wild-type and mutant proteins using protein structure prediction tools (i.e. AlphaFold2), which will provide useful data for alh-6 substitutions identified.

As suggested, we have used available computational resources to create models of the impact of each missense mutation on the protein structure of ALH-6 (Figure 1, figure supplement 2). We accomplished this by first modeling the wild-type ALH-6 protein in Phyre2 – http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index2. We then systematically modeled each missense mutation using http://missense3d.bc.ic.ac.uk/missense3d/3, which reports the impact, if any, on the protein structure. These data are summarized in Figure 1, figure supplement 3.

Suggested but not essential:

Constructing human cognate CRISPR alleles in nematodes would be welcome (especially if any support a clear structure/function hypothesis), but since 1) it is possible that a SNP change might not directly correspond to nematode impact even if the proposed relationship were operative, and 2) such engineering could require a fair amount of C. elegans manipulation, making this connection is not considered a requirement for a successful revision.

In the future, we are certainly interested in testing structure/function hypotheses based on causal SNPs that we identify.

3) Comment on novelty of approach. The GWAS mining approach is important as it reports a success that opens up a novel avenue for connecting C. elegans biology/genetics to human physiology. Still, success with this approach for human obesity genes have been recently reported (PLoS Genetics; PMID: 34492009), so the approach is not the first of its type. Authors should cite that literature; their contribution on this front is still significant and the application in this system moves this approach toward a more central activity in the field, which is powerful.

This is an important point, and we now provide additional references to the literature on GWAS studies and testing in model systems. However, the novelty of our approach is the use of the large human cohort in the US Health and Retirement Study that represents “normal aging” in the population and the use of the gene-wide association scanning (GeneWAS) approach. This is different from the more commonly applied genome-wide association studies (GWAS) that select for individuals with a specific disease state that drives morbidity and mortality.

4) The title should be precise and true to findings. R1 noted that "The title of this paper is somewhat misleading. The authors produced a prediction model of age-dependent decline of muscle functions based on genotypes of human ALDH4A1, but not those of C. elegans alh-6". The title should be revised for accuracy, e.g., "Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species".

We have changed the title as suggested.

5) Explain better, write carefully on SKN-1 activity claims. R2 notes that gst-4::GFP can be induced by SKN-1 as well as by additional transcription factors; SKN-1 activity is not directly tested in this study. Authors should directly test SKN-1 activation, or just state gst-4::GFP expression as the outcome assayed, with caveats mentioned. The authors have previously produced data on skn-1 in this response; possibly a more extensive discussion of that data might allay some of the concerns.

Although we have previously demonstrated that the activation of the gst-4p::gfp reporter is dependent on SKN-1, we have adjusted the text as requested to indicate GFP expression from the gst-4p promoter as the assay outcome.

Authors should also consider R1 points about muscle-specific activation of gst-4::gfp claims.

This is an important point. Although the musculature is the most obvious response, our previously published studies reveal that the expression was restricted to body wall muscle, pharyngeal muscle and neurons. We have included a comment about neurons in our manuscript.

In sum, in revision, the authors should be precise as to what is directly assayed in the screen (gst-4::GFP expression, not necessarily SKN-1 activity) in the summary.

See point 9 above.

Reviewer #1 (Recommendations for the authors):

1. I think the biggest issue for this excellent paper is that the information about SNPs of human ALDH4A1 was not analyzed with respect to the C. elegans alh-6 mutations. Do the SNPs occur at the same or similar loci of the orthologous C. elegans mutation sites? If not, can the authors introduce human SNPs into C. elegans for the orthologous changes and test whether they affect age-dependent declines in muscle functions? This is the key for improving the paper to fit the purpose of the work.

We thank the reviewer for the positive assessment of our work. The genomic data from the large human cohort are not whole-genome sequences but come from an Illumina array of 2.4 million SNPs, so we are unable to assess complete variation in ALDH4A1. It is possible to use imputed SNPs, but these would not augment our ability to identify additional or causal variants within the gene because the imputed ones are already represented by marker SNPs in high linkage disequilibrium that are present on the genotyping array. Thus, the available SNPs are those that best represent common variation within the gene given the current array technology. However, the association of multiple SNPs in ALDH4A1 with age-associated loss of muscle health is highly suggestive that variation at this locus (perhaps attributable to a SNP not directly measured by the array) is linked to function. Once the locus is fully sequenced, it would be of great interest to test any SNPs that change protein coding regions and compare with the frequency of mutations in those homologous regions in ALH-6 recovered from our genetic screens.

2. The title of this paper is somewhat misleading. The authors produced a prediction model of age-dependent decline of muscle functions based on genotypes of human ALDH4A1, but not those of C. elegans alh-6. Please downplay the title. e.g., "Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species"

We have changed the title, as suggested, to “Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species”.

3. In Figure 1b, please show schematic illustration of ALH-6 protein. In addition, predicted structures of ALH-6 wild-type and mutant proteins using protein structure prediction tools (i.e. AlphaFold2) will provide useful information for alh-6 mutations (for example, "G152/K153" in page 5 line 89).

Figure 1b is indeed a schematic illustration of the ALH-6 protein. We have modified the legend to explicitly state this.

4. In Figure 1c, please describe accurately how the authors executed One-way ANOVA statistical analysis for measuring GFP intensity.

We apologize for the confusion. We initially performed a one-way ANOVA to compare the gst-4p::gfp reporter strain, the alh-6(lax105) canonical allele, and the new alleles isolated from our screen. However, to make the analysis clearer, we have modified our statistical analysis to evaluate each mutant relative to the gst-4p::gfp control.

5. I recommend comparing conserved amino acid sequences of C. elegans ALH-6 and human ALDH4A1 in parallel.

We have added the % identity for ALH-6 and ALDH4A1 in the introduction.

6. In figure supplement 1, please add a scale bar.

We have added a scale bar.

7. In Figure 2, please elaborate the choice of p values 0.003 and 0.013.

We agree that this approach to p-value determination is important to address carefully. With agnostic genome-wide association studies, the gold standard method is to adjust the p-values based on stringent thresholds, most simply and conservatively done through Bonferroni-corrected p-values (calculated as 0.05/(number of SNPs tested × number of independent traits tested)). This is because it is unknown whether any variation in the gene is associated with the traits of interest. However, this investigation was not agnostic; rather, there was a hypothesized association between variation in the gene and age-related muscle function from prior experimental results as described in C. elegans. In subsequent association studies, in order to validate a hypothesized association, a threshold of p<0.05 has been used; however, this raises the potential for type 1 error or reporting false positives. Thus, we clarify our approach more thoroughly under “High priority-Essential” item #1.

We note that some SNPs represent rarer variants in humans depending on the population from which the sample was drawn. Also, the degree to which the SNP frequency represents conservation of variation in a model organism like C. elegans is unknown because frequencies are highly dependent upon human ancestry. Thus, we chose to include results for all SNPs with a minor allele frequency of at least 0.01, as each of these SNPs serves as a marker for other SNPs within the gene that may not be represented on the array.

8. In Figure 3, please mark exact genotypes at x axes.

We have updated the figure as requested. The counts represent the number of minor alleles for each SNP (among European ancestry individuals):

Author response table 2
SNP name | 0 | 1 | 2
rs77608580 | GG | AG | AA
rs28665699 | GG | AG | AA
rs111289603 | AA | AG | GG

9. For Figure 3, p values are written in the Figure Legends and Table 1. However, it will be better to mark p values or asterisks in the panels as well.

We have left the figures as is and emphasized in the text that these effects are examples of where there is variation in the gene contributing to phenotypes that represent a range of age-related change in functionality; overall we find there are small effects associated with each phenotype, there are possible pleiotropic effects, and the SNPs identified represent markers for variation within the gene, but we cannot claim causality.

10. I recommend adding discussions regarding how genetic variants in alh-6/ALDH4A1 contribute to age-dependent impairment in muscle functions. In particular, it will be great to add speculation for changes in mitochondrial proline dehydrogenase levels and/or activities that may affect sarcopenia.

See response under the “High priority-Essential” item #1.

The traits used in these scans came from a large and sociodemographically diverse, naturally aging human sample, so they do not allow us to identify physiological degeneration or impairment in specific muscles, but rather to index and track overall decline in functionality through older ages. It is widely accepted now that the genetic variation underlying these aging-related traits is highly polygenic. Thus, it is not expected that a single variant within a gene would be identified to drive these phenotypic changes in humans. It is likely that small effects of multiple SNPs across multiple genes, including within the same gene, and with non-additive effects (e.g., gene-by-environment effects), contribute to the resulting phenotype, or multiple phenotypes (i.e., pleiotropy). With this use of the gene-wide association scanning approach, it is only possible to identify those variants associated with overall effects.

11. Please change one of "IADLA Decline" to "IADLA2 Decline" in Figure 2.

We have changed the text and modified the figure.

12. On page 5, line 85, add lax918 in addition to lax903 and lax930 for E78Stop.

We have changed the text.

13. On page 6, line 163, use "day 3" instead of "Day 3" for consistency.

We have changed the text.

Reviewer #2 (Recommendations for the authors):

To strengthen the main claims this reviewer suggests a revision of the following points:

1) The authors state:

"Based on the striking specificity of the muscle-restricted and age-dependent activation of SKN-1 in C. elegans harboring mutations in alh-6,…"

However, currently, data showing SKN-1 activation in the alh-6 mutant backgrounds are lacking. Direct observation of SKN-1::GFP activation would be important because the transcriptional reporter gst-4P::GFP, although a target of SKN-1, is also activated by other stress-responsive TFs including NHR-49 (PMID: 30297383) and DAF-16 (PMID: 32161088).

In addition, in Suppl. Figure 1 various alh-6 mutant alleles show non-muscle gst-4P::GFP signal.

Therefore, the authors may need to directly test SKN-1 activation, and revise the muscle-specific claims.

We thank the reviewer for this suggestion. Although we have previously shown that activation of the gst-4p::gfp reporter is lost upon skn-1 RNAi treatment, we have edited the text to state specifically that what we observe is muscle-specific activation of the gst-4p::gfp reporter.

2) Structural predictions may enable the authors to generate hypotheses suggesting how a very early STOP codon in alh-6 (e.g. lax906) may have a phenotypic impact on muscle function comparable to a late amino acid substitution (e.g. lax947).

This is indeed an interesting question for future structure-function studies. We have used Phyre2 to model the predicted protein structure of wild-type ALH-6, and used Missense3D to locate the mutations on the predicted protein structure. See response under “High priority-Essential” item #5 below.

Previous studies have demonstrated that the level of mitochondrial dysfunction (and ROS production) is related to both beneficial and detrimental physiological outcomes; even driving lifespan extension versus lifespan shortening (reviewed in PMID:33644065).

3) The authors tested the significance of 53 SNPs in the ALDH4A1 locus with 16 phenotypes in the HRS (Table S1). They claim that they are performing a gene-wide association study and are correcting for 16 tests (0.05/16) in their statistical significance assessment. This analysis is not correct. The authors need to correct for 53x16=848 tests (0.05/848 ≈ 5.9e-5 if they choose to use Bonferroni correction). The authors can argue that the SNPs are not independent, which would be true. In that case, they need to prune the SNPs based on linkage disequilibrium and use index/tag SNPs for their analyses. Further, some of the 16 phenotypes could also be highly correlated; and this needs to be acknowledged/addressed.

See response under the “High priority-Essential” item #1.
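
For reference, the two Bonferroni thresholds discussed in this comment can be reproduced with a short calculation. This is an illustrative sketch only: the function name is hypothetical, and the counts (53 SNPs, 16 phenotypes, alpha of 0.05) are the ones quoted in the comment above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


n_snps = 53        # SNPs tested in the ALDH4A1 locus (from the comment)
n_phenotypes = 16  # HRS phenotypes tested (from the comment)

# Correcting only for the number of phenotypes.
per_phenotype = bonferroni_threshold(0.05, n_phenotypes)

# Correcting for every SNP-by-phenotype test, treating all as independent.
all_tests = bonferroni_threshold(0.05, n_snps * n_phenotypes)

print(f"{per_phenotype:.4g}")  # 0.003125
print(f"{all_tests:.2e}")      # 5.90e-05
```

The first value matches the 0.003 cut-off discussed elsewhere in the reviews; the second is roughly 50 times stricter because it treats all 848 tests as independent.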

4) It is unclear how the SNPs presented in Table 1 are linked to ALDH4A1. The SNPs are in the locus, but that does not necessarily mean they have a functional impact on ALDH4A1. For instance, do the identified SNPs lead to amino acid, splicing, or other regulatory changes in ALDH4A1?

Ultimately, this paper aims to advance ALDH4A1 as a genetic factor underlying muscle-function decline in aging humans. Therefore, to go beyond the already publicly available link between the SNPs in the ALDH4A1 locus and muscle performance in aging humans, the authors may need to define whether the reported SNPs are linked to ALDH4A1 expression as eQTLs, especially in muscle tissue/cells.

This is an important point, which we have clarified further in the text. The genomic data from the large human cohort come not from whole-genome sequencing but from an Illumina array of 2.4 million SNPs, so we are unable to assess the complete variation in ALDH4A1. It is possible to use imputed SNPs, but these would not identify additional or causal variants within the gene because the imputed ones are already represented by marker SNPs in high linkage disequilibrium that are present on the genotyping array. Thus, the available SNPs are those that best represent common variation within the gene given the current array technology. However, the association of multiple SNPs in ALDH4A1 with age-associated loss of muscle health is highly suggestive that variation at this locus (perhaps attributable to a SNP not directly measured by the array) is linked to function. Once the gene is fully sequenced, it would be of great interest to test any SNPs that change protein-coding regions and to compare them with the frequency of mutations in the homologous regions of ALH-6 recovered from our genetic screens.

5) For clarity, the distribution of the phenotypes for each allele may be presented in Figure 3 like they are presented in Figure 4.

The distribution of phenotypes for each variant is shown in Figure 2. We have updated Figure 3 to show the effect of each allele.

Reviewer #3 (Recommendations for the authors):

1) The finding that alh-6 mutants exhibit declines in mobility during aging could reflect a selective effect on muscle function, or could be reflective of a larger acceleration of aging. It would be helpful to show lifespan data for the alh-6 mutant, or discuss if this work has been published previously.

We have referenced the previous work of Pang et al., which documents the lifespan data of the original alh-6 mutant alleles.

2) The conclusion that the alh-6 mutant affects muscle function during aging could be further bolstered via studying muscle mass and structure as well as mitochondrial number and structure with readily available reporters.

We have referenced the previous work of Pang et al. and Yen et al., which document the mitochondrial measures of alh-6 mutants.

3) For the human SNP analysis, engaging the help of an expert in human Kinesiology or Geriatrics can help with narrowing the phenotypes to make them more selective for muscle function, and exclude those where the connection to muscle function is more tangential.

We appreciate the reviewer’s suggestion. We have chosen to be more inclusive of the data available in the HRS, which is a population-based sample with rich phenotyping across multiple traits. These traits were not assessed to allow us to identify physiological degeneration in specific muscles, rather to index and track overall age-related decline in functionality through a large and sociodemographically diverse, naturally aging human sample. Thus, the inclusion of multiple indicators allows us to provide some degree of sensitivity testing using the population-based collection of measures, and thus greater confidence that normative human muscle functioning and change in functioning over age finds some association with variation in this gene. We have now included in the text the notion that some HRS phenotypes will be more directly related to muscle function while others are more complex and are thus more likely to depend on additional factors.

4) Several recent papers have used genome-wide association analysis to identify SNPs associated with decreased muscle strength in middle-aged and older individuals. This work will provide information on other readily available datasets that could be suitable for a replication study to determine if the findings are seen in separate study populations.

We agree and we have referenced these data sets, although they require application, review and approval to use the data for new studies. Candidates include any of the HRS family of sister studies that are collecting genetic data: English Longitudinal Study of Ageing (ELSA; https://www.elsa-project.ac.uk); Irish Longitudinal Study on Ageing (TILDA; https://tilda.tcd.ie/); cohorts in the Survey of Health, Ageing and Retirement in Europe (SHARE; https://g2aging.org/overviews?study=share-aut), or Northern Ireland Cohort for the Longitudinal Study of Ageing (NICOLA; https://www.qub.ac.uk/sites/NICOLA/AboutNICOLA/); and others that include European ancestry individuals who are aged 50 and older.

5) For the SNP studies, the 0.003 p-value cut-off is likely too low given that 53 SNPs are being studied making the multiple testing greater than the correction used.

We agree that with agnostic genetic association testing, designed to discover whether there is an association between variation within a gene and a trait, a cut-off that accounts for the 53 SNPs would be more appropriate. However, in this case, the genetic association testing was designed to test an a priori hypothesis given findings presented in C. elegans. We chose the 0.003 cut-off for several reasons, primarily that (a) this study was designed to validate an existing hypothesis; (b) each of the 53 SNPs within the gene represents a tag, or marker SNP, for human variation within the same gene, such that with this association testing we are not able to identify a causal variant, but only to index that some variation within the gene suggests conserved associations with humans; and (c) because it is unknown which SNPs represent conservation across species, particularly among humans, for whom there is genetic variation due to ancestral histories, we did not want to miss potential SNP associations that do represent important variation within the gene. We have acknowledged this rationale in the text as well as the limitations of these methods for providing us with a complete understanding of the molecular genetic and etiological basis for age-related muscle-functioning and decline.

References cited in response to reviews:

1. Battle, A. et al. Genetic effects on gene expression across human tissues. Nature 550, 204-213, doi:10.1038/nature24277 (2017).

2. Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. and Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10, 845-858, doi:10.1038/nprot.2015.053 (2015).

3. Ittisoponpisan, S. et al. Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated? J Mol Biol 431, 2197-2212, doi:10.1016/j.jmb.2019.04.009 (2019).

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species" for further consideration by eLife. Your revised article has been evaluated by Carlos Isales (Senior Editor), a Guest Reviewing Editor, the original reviewers, and an expert in statistics. We are sorry about the delay.

While Reviewer 1 is now in favor of acceptance as is, Reviewer 3 states: "I still have concerns about the multiple testing correction being used by the authors, and I did not feel that the authors addressed this in a fully compelling manner, such as adding a statistician."

We have implemented suggestions from Reviewer 3 (described in further detail below) and focus on two specific phenotypes (grip strength and gait speed) instead of all sixteen as previously analyzed. Although this reduces the depth of the data presented, we have accommodated the request of the reviewer. As requested, we have consulted with Wendy Mack, PhD, who is a professor of biostatistics in the Department of Preventive Medicine at the Keck School of Medicine of USC. She co-directs the department’s Division of Biostatistics graduate programs and directs Biostatistics Resources at the Southern California Clinical and Translational Science Institute (SC CTSI).

Reviewer 2 concurs: "Although I would really like for the work to be complete, I'd need to agree with Reviewer 3. In addition to unjustified processing of the data that leads to a significance level (0.05) that seems inappropriate for this type of study, the authors have not addressed the concerns related to experimental testing of some of the key claims in C. elegans. For instance, the authors did not directly look at SKN-1::GFP even though the tools are available and this is a feasible experiment. As explained before, the current readout is indirect (gst-4::GFP), and although gst-4 is a target of SKN-1, it is also a target of DAF-16 and other transcription factors. Therefore, gst-4::GFP levels are only suggestive of SKN-1 activation. I agree that adding a GWAS expert statistician would be really helpful."

We are not sure what this reviewer is asking; with the exception of the introduction section’s reference to past literature about stress pathways, we do not mention SKN-1 (or DAF-16) anywhere in our results or discussion in the manuscript. We have removed the two instances of usage of the word "SKN-1" from the introduction as this does not change our manuscript and the references provided make this point for us.

eLife consultant on statistics questions added: "It is true that in candidate gene association you may not need to correct for multiple tests, however, this is only the case when SNPs used are known to have a function (usually non-synonymous coding). I support the suggestion made by reviewers to use linkage disequilibrium (LD) for SNP pruning and also reduce the number of phenotypes tested if possible.

We have reanalyzed all the data and focused the results on the SNPs that are most representative after pruning and on targeted phenotypes, as described below.

Regarding SNPs, the authors' suggestion that all 53 SNPs are "tags" of unknown functional variants is not justified. How were they selected? If they are correlated, many of them still do tag the same function. Whether pruned or not, SNP significant association from this analysis is only an association, not causality especially given the complex nature of the phenotypes. It is true that LD pruning would make it difficult to replicate findings in other studies, particularly from other populations but it would be possible to use the nearest SNP available in those studies (as would also be the case if a different genotyping array was used).

We now see where our language was confusing and have edited the sentence (top of page 5) for clarity, as well as the description in the last two sentences of the Methods under Genotyping Data. SNP selection began with all SNPs in the gene that are present on the Illumina Omni2.4 array, which totaled 70. After filtering for minor allele frequency of 0.01 or greater, 53 SNPs remained.
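
As an illustration of the minor allele frequency (MAF) filter described here, the following sketch computes MAF from genotypes coded as alternate-allele counts (0, 1, or 2) and keeps SNPs at or above the 0.01 cut-off. Only the 0.01 cut-off comes from the text; the function names, the genotype coding, and the toy data are assumptions made for this example and do not describe the actual HRS pipeline.

```python
def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts (0/1/2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per person
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(snps, min_maf=0.01):
    """Keep SNPs whose MAF is at least min_maf (0.01 in the text)."""
    return {name: g for name, g in snps.items()
            if minor_allele_frequency(g) >= min_maf}


# Hypothetical toy genotypes, not HRS data.
toy_snps = {
    "snp_common": [0, 1, 2, 1, 0, 1, 0, 0, 1, 2],  # alt freq 8/20 = 0.40
    "snp_rare": [0] * 199 + [1],                   # alt freq 1/400 = 0.0025
    "snp_flipped": [2] * 9 + [1],                  # alt freq 0.95, MAF 0.05
}

kept = filter_by_maf(toy_snps)
print(sorted(kept))  # ['snp_common', 'snp_flipped']
```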

We also completely agree that the SNPs included are designed to be tags and that results can represent only associations, without any implications for causality, as explained in the revised manuscript. We have revised sentences that were confusing in this respect, including removing the last sentence at the end of the first paragraph on page 5, editing the last sentence of the introduction (replaced the word “predictive” with “associates with”), and editing the summary (revised wording from “impact” to “associations”).

The reviewers have provided helpful suggestions and in order to identify the most representative SNPs for potential functionality and efficiency in approach, we pruned SNPs and re-ran the GeneWAS with the refocus on fewer primary phenotypes. In the Methods, we now describe the process of filtering for minor allele frequency and pruning based on LD. This yielded 21 SNPs for consideration in the GeneWAS. We revise the corresponding methods and results to reflect this.
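
To make the idea of LD-based pruning concrete, here is a minimal greedy sketch. It uses the squared Pearson correlation of allele counts as a proxy for LD and drops any SNP that is too strongly correlated with one already kept. The threshold of 0.5, the function names, and the toy genotypes are all assumptions for illustration; the response does not state which tool or threshold was actually used.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: correlation undefined
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def ld_prune(snps, r2_threshold=0.5):
    """Greedy pruning: keep a SNP only if its r^2 with every SNP
    already kept is below r2_threshold (threshold is an assumption)."""
    kept = []
    for name, genotypes in snps.items():
        if all(r_squared(genotypes, snps[k]) < r2_threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy genotypes (alternate-allele counts), not HRS data.
toy_snps = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 2],  # nearly identical to snp_a
    "snp_c": [2, 0, 1, 1, 0, 2, 1, 0],  # essentially uncorrelated with snp_a
}

pruned = ld_prune(toy_snps)
print(pruned)  # ['snp_a', 'snp_c']
```

Because the procedure is greedy, the result depends on the order in which SNPs are visited; dedicated tools typically add a sliding window along the chromosome.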

Regarding phenotypes, first of all it is good to have such richly annotated data. However, as already acknowledged, these are not independent and correcting for 16 tests is unduly self-penalising (just as it would be to correct for 53 SNPs). It would be more appropriate to set one phenotype (or a few unrelated) as the primary focus and others as exploratory or explanatory of the main ones if a link can be established between them (which I think there is)."

We agree with the correlated nature of the phenotypes and appreciate the reviewer’s emphasis on the self-penalizing nature of the study design. To strengthen the analytical approach and better address the goal of testing associations between genetic variation in ALDH4A1 and normative aging-related muscle functioning (vs. endurance, cardiovascular fitness, or stamina), we have, as the reviewer suggests, focused on two primary phenotypes that are more robust indicators of functionality and more precise measures of aging-related change: grip strength decline and gait speed decline. Both index change using repeated measures of the phenotype data rather than single time-point assessments. Instead of employing a Bonferroni adjustment for multiple-test correction, we now use permutation testing to set an empirical p-value threshold with which to evaluate the significance of associations. The presentation of methods and results has been revised based on this approach.
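
The general logic of permutation-based multiple-test correction can be sketched as follows. The phenotype is shuffled many times, the strongest SNP association is recorded for each shuffle, and the observed strongest association is compared against that null distribution. This sketch makes several assumptions that are not taken from the manuscript: it uses the absolute Pearson correlation as the test statistic and sets the threshold on that statistic rather than on p-values, and all names and the simulated data are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_abs_corr(genotypes, phenotype):
    """Strongest association across all SNPs for one phenotype vector."""
    return max(abs_corr(g, phenotype) for g in genotypes)


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Return (empirical threshold, observed max statistic, adjusted p)."""
    rng = random.Random(seed)
    observed = max_abs_corr(genotypes, phenotype)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        null_max.append(max_abs_corr(genotypes, shuffled))
    null_max.sort()
    # Approximate (1 - alpha) quantile of the null maximum statistic.
    threshold = null_max[min(n_perm - 1, int((1 - alpha) * n_perm))]
    exceed = sum(1 for s in null_max if s >= observed)
    p_adj = (exceed + 1) / (n_perm + 1)
    return threshold, observed, p_adj


# Simulated toy data: 5 SNPs, 200 people; only the first SNP affects the trait.
rng = random.Random(42)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(200)] for _ in range(5)]
phenotype = [g + rng.gauss(0.0, 1.0) for g in genotypes[0]]

threshold, observed, p_adj = permutation_threshold(genotypes, phenotype)
print(threshold, observed, p_adj)
```

Because each shuffle keeps the correlation structure among SNPs intact, the resulting threshold reflects the effective number of independent tests rather than the raw SNP count, which is why it is typically less conservative than Bonferroni for correlated markers.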

https://doi.org/10.7554/eLife.74308.sa2

Article and author information

Author details

  1. Osvaldo Villa

    Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
    Contribution
    Formal analysis, Investigation, Methodology, Visualization, Writing - original draft
    Contributed equally with
    Nicole L Stuhr and Chia-an Yen
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0803-0752
  2. Nicole L Stuhr

    1. Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
    2. Dornsife College of Letters, Arts, and Science, Department of Molecular and Computational Biology, University of Southern California, Los Angeles, United States
    Contribution
    Formal analysis, Investigation, Methodology, Visualization, Writing - review and editing
    Contributed equally with
    Osvaldo Villa and Chia-an Yen
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2537-7114
  3. Chia-an Yen

    1. Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
    2. Dornsife College of Letters, Arts, and Science, Department of Molecular and Computational Biology, University of Southern California, Los Angeles, United States
    Contribution
    Investigation, Methodology, Visualization, Writing - review and editing
    Contributed equally with
    Osvaldo Villa and Nicole L Stuhr
    Competing interests
    No competing interests declared
  4. Eileen M Crimmins

    Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
    Contribution
    Data curation, Methodology, Resources
    Competing interests
    No competing interests declared
  5. Thalida Em Arpawong

    Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
    Contribution
    Data curation, Formal analysis, Methodology, Resources, Visualization, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9671-9535
  6. Sean P Curran

    1. Leonard Davis School of Gerontology, University of Southern California, Los Angeles, United States
    2. Dornsife College of Letters, Arts, and Science, Department of Molecular and Computational Biology, University of Southern California, Los Angeles, United States
    3. Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, United States
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing - original draft, Writing - review and editing
    For correspondence
    spcurran@usc.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7791-6453

Funding

National Institute on Aging (R01 AG058610)

  • Sean P Curran

National Institute on Aging (RF1 AG063947)

  • Sean P Curran

National Institute on Aging (T32 AG052374)

  • Osvaldo Villa
  • Nicole L Stuhr

National Institute of General Medical Sciences (T32 GM118289)

  • Nicole L Stuhr

National Institute on Aging (P30 AG068345)

  • Sean P Curran
  • Thalida Em Arpawong

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank J Gonzalez for technical assistance, Dr. W Mack for statistical consultation, and Drs. R Irwin and C Duangjan for critical reading of the manuscript. Some strains were provided by the CGC, which is funded by the NIH Office of Research Infrastructure Programs (P40 OD010440). This work was funded by the NIH R01 AG058610 and RF1 AG063947 to SPC, T32 AG052374 to OV and NLS, and T32 GM118289 to NLS. This study was supported in part by funding from the National Institute on Aging, through the USC-Buck Nathan Shock Center (P30 AG068345). The National Institute on Aging has supported the collection of both survey and genotype data for the Health and Retirement Study through co-operative agreement U01 AG009740. The datasets are produced by the University of Michigan, Ann Arbor. The HRS phenotypic data files are public use datasets, available through: https://hrs.isr.umich.edu/data-products/access-to-public-data. The HRS genotype data are available to authorized researchers: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000428.v2.p2.

Senior Editor

  1. Carlos Isales, Medical College of Georgia at Augusta University, United States

Reviewing Editor

  1. Monica Driscoll, Rutgers University, United States

Reviewer

  1. Alfred Fisher, University of Nebraska Medical Center, United States

Publication history

  1. Preprint posted: September 10, 2021
  2. Received: September 29, 2021
  3. Accepted: April 13, 2022
  4. Accepted Manuscript published: April 26, 2022 (version 1)
  5. Version of Record published: May 13, 2022 (version 2)

Copyright

© 2022, Villa et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Osvaldo Villa
  2. Nicole L Stuhr
  3. Chia-an Yen
  4. Eileen M Crimmins
  5. Thalida Em Arpawong
  6. Sean P Curran
(2022)
Genetic variation in ALDH4A1 is associated with muscle health over the lifespan and across species
eLife 11:e74308.
https://doi.org/10.7554/eLife.74308
