Epigenetic profiling of growth plate chondrocytes sheds insight into regulatory genetic variation influencing height

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

GWAS have identified hundreds of height-associated loci. However, determining causal mechanisms is challenging, especially since height-relevant tissues (e.g. growth plates) are difficult to study. To uncover mechanisms by which height GWAS variants function, we performed epigenetic profiling of murine femoral growth plates. The profiled open chromatin regions recapitulate known chondrocyte and skeletal biology, are enriched at height GWAS loci, particularly near differentially expressed growth plate genes, and enriched for binding motifs of transcription factors with roles in chondrocyte biology. At specific loci, our analyses identified compelling mechanisms for GWAS variants. For example, at CHSY1, we identified a candidate causal variant (rs9920291) overlapping an open chromatin region. Reporter assays demonstrated that rs9920291 shows allelic regulatory activity, and CRISPR/Cas9 targeting of human chondrocytes demonstrates that the region regulates CHSY1 expression. Thus, integrating biologically relevant epigenetic information (here, from growth plates) with genetic association results can identify biological mechanisms important for human growth.

https://doi.org/10.7554/eLife.29329.001

eLife digest

Humans vary considerably in height, a trait that is partly inherited from each individual's parents. Studies have identified hundreds of small changes in DNA that contribute to differences in human height. These small changes often swap out just one of the four letters that make up the DNA code. Some changes occur within genes, yet most occur in stretches of DNA that do not contain genes. These stretches of DNA likely control whether genes important for the growth of the skeleton are switched on or off.

Cartilage cells, also known as chondrocytes, are important for height. These cells are found near the end of bones in the growth plates, where the bones grow during childhood and adolescence. The on and off switches for growth genes in chondrocytes are unknown. Identifying these genetic switches could help scientists understand how hundreds of small changes in DNA help determine how tall a person will be.

Now, Guo, Liu, Willen et al. identify thousands of switches that turn on and off genes in chondrocytes. In their experiments, chondrocytes were removed from mouse bones and a type of genetic sequencing called ATAC-seq was used to identify the stretches of DNA that act as on/off switches for genes in these cells. Further analysis revealed that many of these same on/off switches occur in human chondrocytes too. Importantly, the experiments showed that many of the small changes in DNA that contribute to differences in human height are also found within these DNA switches.

Guo, Liu, Willen et al. next tested how these switches and height-linked genes interact, and found, for example, that one switch acts to shut off the gene for a protein called chondroitin sulfate synthase 1. People with mutations in this gene can have unusually short stature. However, people with DNA changes in its nearby switch can be taller or shorter than average based on how the DNA influences the switch. More studies might help scientists understand the evolutionary significance of these DNA changes. They will also help determine if the genetic switches in chondrocytes contribute to diseases that affect the bones and joints, like bone cancer or arthritis.

https://doi.org/10.7554/eLife.29329.002

Introduction

Human height, a main outcome of skeletal growth, is the product of many biological and environmental interactions spanning numerous cell and tissue types acting during pre- and post-natal development. The growth plate, located at the distal ends of long bones such as the femur, is among the most important tissues influencing height. In the growth plate, condensations of mesenchymal cells differentiate into chondrocytes in the process of endochondral ossification. Chondrocytes in the growth plate reside in longitudinally oriented columns divided in zones (resting, proliferative, pre-hypertrophic, and hypertrophic) each with important functions in allowing the bone to elongate and mature (see Liu et al., 2017 and Samsa et al., 2017 for reviews) (Liu et al., 2017; Samsa et al., 2017). This process of growth plate chondrocyte differentiation and maturation promotes elongation of long bones, which ultimately determines much of human height (Baron et al., 2015). These processes are under the influence of many extrinsic and intrinsic signals (Samsa et al., 2017; Lui et al., 2012; Kronenberg, 2003).

Height has been studied for centuries as a model genetic trait, as it is easily measured and highly heritable (typically, 70–90% of variation in height within a population is attributable to genetic variation) (Silventoinen et al., 2003; Perola et al., 2007; Visscher et al., 2006; Visscher et al., 2007; Galton, 1886). Height and body proportions are likely selected and thus are interesting from an evolutionary standpoint (Turchin et al., 2012). Studies of the genetics of height can shed insight into the mechanisms of childhood growth and into developmental biology of the skeleton (Baron et al., 2015; Dauber et al., 2014). Additionally, abnormalities of growth—short stature or overgrowth—are among the most common childhood disorders and their pathophysiology is poorly understood. Thus, genetic studies of height could further our understanding of the genetic basis of skeletal disease, evolution, biology, and normal human growth and development.

Recently, genome-wide association studies (GWAS) have been used to study the genetics of height, as well as many other traits and disorders (Wood et al., 2014). This approach systematically tests each common variant in the genome for association with a phenotype of interest, such as adult height, with the goal of identifying relevant biology, either from the identities of the genes near the associated variants, or by deducing mechanism from the associated variants themselves. GWAS for height are among the largest conducted thus far (sample sizes > 250,000) and have identified more associations than any other phenotype, with nearly 700 regions of the genome (or loci) robustly associated with height (Wood et al., 2014). The finding of so many loci influencing height, with the number expected to increase with larger sample sizes, supports the assertion that height is highly polygenic. Although each of the variants at these loci have only a small effect on height (typically much less than 0.5 cm per allele) (Wood et al., 2014), the genes within the associated loci as a group highlight important mechanisms influencing growth. For example, height variants are enriched for genes implicated in numerous biological processes relevant to growth, including embryonic stem cell function, long bone development, and spine length, among a number of other processes (Wood et al., 2014). Height variants are also preferentially located near genes that are differentially expressed in the growth plate (Lui et al., 2012).

However, as with all GWAS, moving from genetic associations to biological mechanisms at any individual locus can be challenging. There are two primary reasons for this difficulty. First, there is extensive linkage disequilibrium (LD) in the human genome, meaning that the genotypes at nearby genetic variants are often tightly correlated (Gabriel et al., 2002). The consequence of LD is that at any particular locus, there will be many variants that have indistinguishable associations, and determining which variant(s) are in fact causal (rather than associated due to LD with a causal variant) remains difficult. Second, most GWAS variants (>80%) reside in non-coding regions (Gusev et al., 2014), with no nearby coding variants that can account for the associations. Non-coding variants are challenging to interpret because there is no universal regulatory code to infer function/mechanism, and these variants can act to influence gene expression at long distances and in highly tissue-specific manners (Bernstein and ENCODE Project Consortium, 2012). Various approaches have been pursued for identifying the causal variants and/or mechanisms, including overlapping associated variants with epigenetic datasets in appropriate tissues/cell-types, statistical genetic methods leveraging subtle variations in LD patterns (called ‘fine mapping’), and in vitro and in vivo functional assays (Tak and Farnham, 2015). It is also becoming increasingly clear that GWAS variants tend to fall in regions that have particular epigenetically defined states (such as enhancers or repressors) (Finucane et al., 2015; Farh et al., 2015). Thus, the overlap between GWAS and the appropriate epigenetic annotations can help refine the identities of causal variants, helping to decipher mechanism. However, epigenetic data may not exist for the biologically relevant cell type, and there is no universally successful approach to translating non-coding associations from GWAS into biological mechanisms.

Although the large number of associated variants offers advantages in making biological inferences for the set of associated variants as a whole (Lango Allen et al., 2010), defining functional mechanisms of individual height GWAS variants faces specific challenges. Height variants likely act in many tissues and stages of development, including not only the growth plate but also embryonic cells, endocrine tissues such as the pituitary, and bone, among others (Wood et al., 2014). Human growth is also a whole-organism phenotype and is less amenable to study in cell-intrinsic models, making functional approaches particularly challenging. Finally, skeletal tissues can be highly heterogeneous and challenging to isolate because in their mature form they are surrounded by a hard extracellular matrix and strongly adhering soft tissues such as tendons, ligaments, and muscles. These factors make it difficult to isolate the large numbers of biologically relevant cells that are needed for functional experiments or for traditional epigenomic assays such as ChIP-seq (Furey, 2012).

In this study, we profile the epigenetic landscape of mouse growth plate chondrocytes using a new approach called ATAC-seq (Buenrostro et al., 2015, 2013), which allows for accurate profiling of open chromatin regions using relatively few cells (~50,000). We then use the epigenetic landscape in growth plate chondrocytes to gain new insights into the mechanisms of non-coding elements in the genome influencing expression of growth plate genes. Finally, we use these epigenetic marks, in the context of statistical fine-mapping results and gene expression profiles, to better understand height GWAS variants and the mechanisms by which they may function to influence height in humans. These studies illustrate a path that can help decipher mechanism from GWAS, even when the relevant cell type is relatively inaccessible in humans.

Results

Generation of mouse femoral growth plate epigenetic profiles

In order to identify open chromatin regions involved in long bone development, we performed ATAC-seq (Buenrostro et al., 2015, 2013) on E15.5 (Theiler Stage 23) mouse proximal and distal femoral growth zones, regions comprised predominately of the growth plate proper, the overlying perichondrium, and the cartilaginous super-structures that dictate adult bony morphology (Materials and methods). This stage of mouse embryogenesis corresponds to approximately E48-51 (Carnegie Stage 19) of human development and reflects a developmental window when the growth plate proper possesses all chondrocyte zones (resting, proliferative, pre-hypertrophic, hypertrophic zone) and is undergoing rapid cell proliferation and differentiation during endochondral ossification. Given that it remains unclear which sub-region(s) (i.e. chondrocyte zones) of the growth plate are under genetic influence to drive height variation, we performed our experiments on tissues that include all chondrocyte zones (including perichondrium) and cartilaginous tissues, excluding regions indicative of the osteogenic invasion in the developing bone collar (Figure 1a). We used ATAC-seq to isolate open chromatin regions from the DNA, which were subsequently subjected to next-generation sequencing (Materials and methods; Supplementary file 1). This approach yielded 28,257 proximal femur peaks and 27,958 distal femur peaks, with 23,764 shared peaks, 4493 peaks specific to the proximal femur, and 4194 specific to the distal femur (Figure 1b). The proximal and distal peaks span approximately 0.79% and 0.76% of the genome, respectively. We also assessed the genomic distribution of proximal and distal femur ATAC-seq peaks and found a characteristic enrichment near gene transcriptional start sites and more distal intronic and intergenic non-coding sequences (Figure 1—figure supplement 1).

Figure 1 with 2 supplements see all

Download asset Open asset

ATAC-seq on in vivo collected mouse femoral growth plate zones.

(A) Pipeline of ATAC-seq on the embryonic mouse femur. Proximal (red) and distal (blue) samples were extracted from E15.5 femoral growth plates, as labeled by *Col2a1* expression (insets). Tissues were harvested just proximal (red dotted lines in inset) or distal (blue dotted lines in inset) to the forming bone collar (outlined by dashed black lines). Cells were then treated to generate a single-cell suspension. Approximately 50,000 cells per sample were utilized for subsequent ATAC-seq. Open chromatin regions were then identified from the ATAC-seq data. ATAC-seq protocol adapted from Buenrostro et al. (2013). (B) Venn diagram shows the number of ATAC-seq peaks in the proximal and distal samples. (C) ATAC-seq peaks (after subtracting peaks shared in brain ATAC-seq) were analyzed for enrichment in gene sets using GREAT. Horizontal bars show enrichment p values (shown as –log₁₀(p value)) for Human Phenotypes. The union of proximal and distal femur peaks was used.

https://doi.org/10.7554/eLife.29329.003

Since a main goal was to understand the gene regulatory functions of variants influencing human height, we next mapped the locations of each mouse femur open chromatin region to the human genome (Materials and methods). Our liftover resulted in 24,805 proximal femur peaks and 24,788 distal femur peaks, representing approximately 87.7% and 88.6% of sites identified in mouse, respectively. These sites that were capable of liftover to human likely reflect relatively conserved genes involved in vertebrate musculoskeletal development (Shubin et al., 1997; Petit et al., 2017). The union of the two sets contained 25,829 peaks (Figure 1—figure supplement 1).

To determine whether ATAC-seq peaks are enriched near gene sets/pathways with specific functions, we performed an enrichment analysis using Genomic Regions Enrichment of Annotations Tool (GREAT) (McLean et al., 2010), which identifies curated gene sets enriched near epigenetic peaks. Analyses of each proximal, distal, and union femur datasets yielded general GO Biological Process and GO Molecular Processes terms related to basic cell biological processes, as well as phenotypes related to a number of biological tissues including skeletal tissues (Supplementary file 2–4). They also yielded phenotype terms related to general skeletal development, chondrogenesis, and long bone skeletal phenotypes, such as ‘Metaphyseal widening’, ‘Abnormality of the femoral epiphyses’, and ‘Abnormality of the femoral head’ (Figure 1c). Thus, ATAC-seq datasets generated on each specific femoral growth plate in mice reflect in part highly relevant anatomy, as well as general aspects of skeletal development in humans (see Discussion).

The ATAC-seq peaks we identified also highlight biological processes that are important in the growth plate. For example, GREAT analyses uncovered enrichment for several relevant GO Biological Processes such as ‘Cartilage condensation’, ‘Positive regulation of chondrocyte differentiation’, and ‘Chondrocyte proliferation’ (Supplementary file 2–4). Additionally, we illustrate ATAC-seq peaks at several well-known growth plate genes including Col2a1, Fgfr1, and Pth1r (Figure 1—figure supplement 2) (Long and Ornitz, 2013).

ATAC-seq peaks overlap other skeletal and chondrocyte epigenetic data sets

We next wanted to validate that our ATAC-seq datasets reflect open chromatin regions important to mouse chondrocyte biology. To do so, we analyzed a previous dataset that used ChIP-seq to profile Sox9 binding in mouse rib chondrocytes (Ohba et al., 2015). We found extensive overlap between Sox9-occupied sites and our open chromatin ATAC-seq peaks. For example, 86.4% of Class I Sox9 elements (i.e. elements clustering near transcriptional start sites of highly expressed non-chondrocyte-specific genes), 44.6% of Class II Sox9 elements (i.e. elements directing chondrocyte-related gene activity through direct Sox9 dimer binding), and 93.8% of Sox9 super-enhancer regions (i.e. enhancer complexes involved in chondrocyte cell identity) overlap with our femur ATAC-seq peaks (p<0.001 for all three sets, Materials and methods, Figure 2a).

Figure 2

Download asset Open asset

Mouse proximal and distal ATAC-seq datasets capture chondrocyte biology in mouse and humans.

(A) Proportion of mouse Sox9 Class I, Class II, and super enhancer sites (Ohba et al., 2015) overlapping with ATAC-seq peaks. (B) Proportion of mouse rib chondrocyte H3K27ac, H3K36me3, H3K4me2, H3K4me3, p300, and RNA Pol II sites overlapping with ATAC-seq peaks. (C) Proportion of human embryonic limb E47 H3K27ac (Cotney et al., 2013) and human BMDC H3K27ac sites (Kundaje et al., 2015) overlapping with ATAC-seq peaks. (D) ATAC-seq peaks at the *Matn3* locus. Published rib chondrocyte peaks for H3K4me2, H3K4me3, and Sox9 are shown in shades of orange. Proximal ATAC-seq peaks are shown in red, with called peaks shown as bars below. Distal ATAC-seq peaks are shown in blue, with called peaks shown as bars below. The locations of previously published chondrocyte enhancers are shown as black bars.

https://doi.org/10.7554/eLife.29329.006

We also examined overlap between our dataset and various histone modifications in the mouse rib chondrocytes (Materials and methods) (Figure 2b) (Ohba et al., 2015). We found that 85.1% of RNA Pol II transcriptionally active promoters, 56.9% of H3K4me2 active promoter sites, 86.7% of H3K4me3 active promoter sites, 87.4% of H3K27ac active enhancer sites, and 59.4% of p300 histone acetyltransferase active enhancer sites overlapped with femur ATAC-seq peaks. Conversely, our ATAC-seq datasets displayed relatively lower overlap with markers of gene repression; that is, only 33.7% of H3K27me3-repressed enhancer sites overlap with femur ATAC-seq signals (Figure 2b). Based on permutation analyses, the enrichments of all these histone modification sets were significant at p<0.001. Thus, we found that femoral ATAC-seq peaks show marked evidence of active chromatin state and/or gene transcription.

We also wanted to determine if our ATAC-seq peaks reflect regulatory elements involved in human limb development and chondrocyte biology. First, we used a ChIP-seq dataset comprised of H3K27ac peaks found in E47 human limb buds (equivalent to E13.5 mouse limb buds) that are shared with mouse limb buds (Cotney et al., 2013). We found that 45.2% of these E47 human limb bud H3K27ac peaks overlapped with ATAC-seq peaks, indicating that many of our mouse femur ATAC-seq peaks reflect active human limb enhancers involved in chondrogenesis (p<0.001, Figure 2c). Second, we used an H3K27ac ChIP-seq dataset generated on human adult cultured bone marrow mesenchymal stem-cell-derived chondrocytes (BMDCs) (Herlofsen et al., 2013; Kundaje et al., 2015) and found that 13.3% of human adult BMDC H3K27ac peaks overlapped our mouse distal femur dataset (p<0.001) (Figure 2c), which indicates that our femur ATAC-seq dataset also captures cell-intrinsic properties of chondrogenesis. However, the greater overlap seen in the limb bud dataset as compared to the BMDCs suggests that our dataset may be capturing epigenetic properties present in developing in vivo tissues that are poorly modeled by cell culture systems.

Our ATAC-seq data were also able to identify 14 of 26 previously-discovered active regulatory elements residing near key cartilage genes detected at different time-points and using different assays (e.g. see Figure 2d, Supplementary file 5). This suggests that mouse femoral ATAC-seq regions have in vivo enhancer activity as determined in independent assays. Overall, these data indicate that ATAC-seq datasets generated on the mouse femoral growth plates match previously expected patterns of regulatory usage important to chondrocyte biology.

Human height variants are enriched in femoral open chromatin regions

We next sought to use our ATAC-seq dataset to gain insight into height-associated variants identified by GWAS. To do so, we intersected our orthologous human ATAC-seq peaks with 688 height GWAS loci from Wood et al. (2014) (nine loci were excluded; see Materials and methods) (Wood et al., 2014). Since we do not know which variants are causal and which are associated with height due to LD, for each locus, we identified potentially causal ‘proxy’ SNPs: those with a correlation of r² >0.5 to the lead SNP based on the European subset of 1000 Genomes Phase 3 (Auton et al., 2015). The number of proxy SNPs ranged from 1 to 3548 for each of these 688 loci (Figure 3—figure supplement 1).

Across the 688 height loci, we found that 317 loci (or ~46%) contained at least one variant that overlapped with an ATAC-seq peak. To test the significance of the observed overlap, we applied GoShifter (see Materials and methods), which performs a shuffling-based approach to evaluate the overlap between an epigenetic data set and GWAS data. We found strong enrichments for the ATAC-seq peak set and height GWAS data (p=0.0059) (Figure 3—figure supplement 1a). We repeated this analysis using the Sox9 binding and histone modifications in mouse rib chondrocytes, human BMDCs, and mouse embryonic limb datasets from above, as well as DNaseI hypersensitivity sites from mouse brain, liver, and lung (Bernstein and ENCODE Project Consortium, 2012; Ohba et al., 2015; Cotney et al., 2013; Yue et al., 2014). Only the mouse rib chondrocyte H3K4me2 peaks reached statistical significance (p=0.0237), and was far less significant than our ATAC-seq data. This suggests that our ATAC-seq data captures functional genetic variation influencing height that is not captured by the other datasets. Also, as no enrichment was observed for other mouse tissues from ENCODE, this suggests that the observed enrichment is unlikely due to artifacts stemming from our use of mouse tissues (e.g. from conserved regions being lifted over).

Figure 3 with 1 supplement see all

Download asset Open asset

Height GWAS loci are enriched in ATAC-seq peaks.

(A) Enrichment of height GWAS loci in ATAC-seq peaks using GoShifter. Histogram (light blue bars) shows the number of overlaps observed in the 100,000 GoShifter permutations. Dotted line shows the number of overlaps observed in the ATAC-seq data. (B) GoShifter analysis p values for the ATAC-seq data, mouse rib chondrocyte Sox9 binding and histone modifications, human embryonic limb E47 H3K27ac, H3K27Ac of BMDCs, and brain, liver, and lung DNaseI hypersensitivity site samples from mouse ENCODE (C) Enrichment of height GWAS loci in ATAC-seq peaks using random matched GWAS loci. Histogram (light blue bars) shows the number of overlaps observed in 1000 matched random loci. Dotted line shows the number of overlaps observed in the ATAC-seq data.

https://doi.org/10.7554/eLife.29329.007

To further demonstrate the specificity of our ATAC-seq peaks for capturing height GWAS signals, we repeated the GoShifter analysis by examining the enrichment of GWAS results for other complex traits/diseases at our ATAC-seq peaks. These other traits/diseases were chosen as they have no obvious connection to height and included age of menarche (Day et al., 2017), body mass index (Locke et al., 2015), coronary artery disease risk (Schunkert et al., 2011), LDL cholesterol (Teslovich et al., 2010), mean corpuscular volume (van der Harst et al., 2012), schizophrenia risk (Ripke and Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014), and type 2 diabetes risk (Morris et al., 2012). These additional traits/diseases all demonstrated little to no enrichment at ATAC-seq peaks (Figure 3—figure supplement 1), suggesting that the growth plate ATAC-seq peaks are specifically capturing height/growth biology.

As a second approach to evaluate overlap between our ATAC-seq peak set and height GWAS data, we generated 1000 sets of random loci which were matched to the actual height GWAS loci based on factors including SNP density, minor allele frequency, number of proxy SNPs in the locus, and gene density (Materials and methods) (Pers et al., 2015). For each of the 1000 matched sets of random loci, we repeated the overlap with the ATAC-seq peaks. As none of the matched sets had as much overlap as our height GWAS loci, the enrichment we observed for the height GWAS data is highly significant (p<0.001; z-score: 11.88) (Figure 3c).

Notably, we observed that the actual number of SNPs that overlap an ATAC-seq peak for each set was higher than the number of represented loci. In total, there were 928 variants that overlapped with an ATAC-seq peak; these 928 variants were distributed across 317 loci (Supplementary file 6; Figure 3—figure supplement 1). This finding of more than one overlapping variant per locus may indicate multiple causal variants at each locus, as has been shown for other traits (Roman et al., 2015).

Fine-mapped height variants overlap with femoral growth plate open chromatin regions

An alternative method to narrow down potentially causal signals within GWAS loci is to use statistical fine-mapping, which leverages the association statistics and patterns of LD at each locus to identify which variants at a given locus are most likely to be causal. We applied PICS (Farh et al., 2015) to perform statistical fine-mapping of the (Wood et al., 2014) height GWAS results (Materials and methods). For each of 688 height-associated loci, we calculated credible set probabilities for all SNPs with an r² >0.5 to the lead variant based on the European subset of 1000 Genomes Phase 3 (Auton et al., 2015). For each locus, we then determined the 95% credible set, which represents the minimum set of SNPs at that locus such that the truly causal SNP has at least a 95% probability of being in the set. Across 688 loci, this yielded a 95% credible set of 32,846 SNPs, which represents approximately a 48.2% reduction from the total number of variants in the GWAS loci. Intersecting these credible set SNPs with femur open chromatin regions revealed 468 overlapping variants spanning 192 of the 688 loci (Figure 4a).

Figure 4

Download asset Open asset

Fine-mapping of GWAS height variants.

(**left**): Larger light green circle represents all height GWAS SNPs and their proxies (r² >0.5). Smaller green circle represents the proportion of these SNPs in the PICS 95% credible set. (**right**) Larger light blue circle represents the 688 height GWAS loci. Medium blue circle represents 317 height GWAS loci overlapping an ATAC-seq peak. Smaller dark blue circle represents 192 loci where a PICS 95% credible set variant overlaps an ATAC-seq peak.

https://doi.org/10.7554/eLife.29329.009

A subset of human height variants in femoral open chromatin regions reside near genes differentially expressed in the growth plate

A previous study profiled expression of mouse growth plate genes and identified a set of 427 genes that were differentially expressed in the growth plate (Lui et al., 2012). That study also revealed that height GWAS loci (using a previous smaller height GWAS) are enriched near differentially expressed growth plate genes (Lui et al., 2012; Lango Allen et al., 2010). We expanded on those results using the most recent height GWAS from Wood et al. (2014) and indeed found that height GWAS loci are enriched near 427 differentially expressed growth plate genes (within 100 kb) when compared to the 1000 matched random GWAS loci (p<0.001, z-score: 8.54, Figure 5a).

Figure 5

Download asset Open asset

Height GWAS loci are enriched in differentially expressed growth plate genes.

(A) Enrichment of height GWAS loci in differentially expressed growth plate genes. Dotted line shows number of GWAS loci with a nearby differentially expressed growth plate gene (±100 kb). Histogram (light blue bars) shows the number of overlaps observed in 1000 matched random loci. (B) Enrichment of height GWAS loci in differentially expressed growth plate genes and ATAC-seq peaks. Same as (A), except showing overlap for GWAS loci overlapping differentially expressed growth plate gene (±100 kb) and ATAC-seq peak.

https://doi.org/10.7554/eLife.29329.010

Since we detected enrichment of height GWAS variants near femoral open chromatin regions (see Figure 3) and reasoned that many of the variants likely function by influencing expression of growth plate genes, we next identified the number of GWAS variants that fit the criteria of residing in an open chromatin region near a differentially expressed gene (within 100 kb). We again used the 688 GWAS loci from Wood et al. (2014) and their proxy SNPs (r² >0.5) and found that 46/688 loci (or ~6.7%) contained at least one GWAS variant that overlapped with an ATAC-seq peak and resided near a differentially expressed gene (Supplementary file 6). Using the same 1000 sets of matched random loci as described above, we found that these overlaps were highly significant (p<0.001, z-score: 6.36, Figure 5b). These loci likely represent combinations of functional variants that overlap an epigenomic element in growth plates that then modulates expression of nearby genes influencing height.

Wood et al., had previously observed that height GWAS SNPs that represent cis eQTLs in whole blood are enriched for expression in cartilage (Wood et al., 2014). Here, we sought to evaluate whether differentially expressed growth plate genes that are also eQTLs genes in whole blood are enriched at ATAC-seq peaks. Among the 427 differentially expressed growth plate genes, 157 had previously been identified as eQTL genes in whole blood (Westra et al., 2013). Using the GoShifter permutation analysis as described above, we demonstrated that the corresponding eQTL variants for these 157 genes are significantly enriched at ATAC-seq peaks (p=0.0004)

Of the 46 loci in the ATAC-seq femur comparison that fall near genes differentially expressed in growth plates, 44 different genes have been identified as possibly being modulated by these putative regulatory variants (note, some genomic loci have multiple genes and/or GWAS loci) (Supplementary file 6). These genes represent major signaling molecules (e.g. IHH, HHIP), transcription factors (e.g. RUNX2), extracellular matrix factors (e.g. CHSY1, EXTL1), among other factors involved in the growth plate in both humans and mice.

Known and novel targets of human height variants in putative chondrocyte regulatory regions

We next examined some of the most compelling loci from a genetic and mechanistic perspective. We filtered for loci where a height GWAS variant in the PICS 95% credible set overlapped an ATAC-seq peak, and was near a differentially expressed growth plate gene (±100 kb). In total, 59 variants distributed across 26 loci met these criteria. These 59 variants thus have strong evidence for being potentially causal variants and might be prioritized for downstream study (Supplementary file 6).

For example, at the CHSY1 locus, four variants (rs8042551, rs11639408, rs9920291, and rs3911964) met these criteria. Disruption of Chsy1 in mice and humans leads to severe skeletal phenotypes including long bone length reductions (Wilson et al., 2012; Temtamy et al., 1998; Tian et al., 2010; Li et al., 2010). Of the four variants, rs9920291 is predicted to alter HOXD13 binding, as suggested by data generated using universal protein binding microarray technology that demonstrates that the eight base-pair sequence matching the immediate rs9920291 locus disrupts in vitro binding by HOXD13 (Figure 6a, Supplementary file 7), a homeodomain transcription factor that is expressed in the growth plate (Kuss et al., 2014; Reno et al., 2016) and has known activating and repressive roles in skeletal development (Johnson and Tabin, 1997). We performed a series of functional assays to test the effects of allelic variation at rs9920921. First, using a luciferase reporter assay in T/C-28a2 human chondrocytes, we found that the open chromatin region surrounding rs9920291 acts as a repressor of expression (p=7.83×10⁻⁶ and 3.6 × 10⁻⁴ for variants carrying the reference and alternate alleles respectively) (Figure 6b). In addition, the alternate T allele, (i.e. the height-increasing allele of rs9920291) makes the regulatory element act as a weaker repressor (p=0.0014). This is consistent with data from GTEx (Lonsdale J, GTEx Consortium, 2013), where rs9920291 acts as an eQTL for CHSY1 in transformed fibroblasts (p=1.0×10^-7; Figure 6—figure supplement 1), with the T allele correlating with increased CHSY1 expression levels. Additionally, we leveraged the fact that in HEK-293FT cells, rs9920291 is heterozygous, along with several additional SNPs in the coding region of CHSY1. Assaying a heterozygous exonic coding SNP (rs28364839) and a 3’UTR SNP (rs11433), we demonstrate that rs9920291 displays allelic skew in expression (p<0.001 for both assayed variants; Figure 6d).

Figure 6 with 2 supplements see all

Download asset Open asset

Intersections of GWAS height variants with ATAC-seq data identify novel putatively causal variants.

(A) ATAC-seq height GWAS variant intersections at the *CHSY1* locus refine height GWAS loci to fewer putatively causal variants. Height GWAS SNPs (r² >0.5 to lead SNP) are shown as black bars, and SNPs in the PICS 95% credible set are shown as red bars. Published H3K27ac ChIP-seq in E47 human limb buds and BMDCs are shown in orange and purple respectively. ATAC-seq peaks lifted over to human are shown for proximal femur (red) and distal femur (blue). Published Sox9 class II sites from mouse rib chondrocytes lifted over to human are shown in light blue. Four height SNPs overlap an ATAC-seq peaks, representing putatively causal variants. Of these four variants, rs9920291, a C to T substitution, significantly alters the predicted binding of HOXD13 (UniProbe: HOXD13 C allele z-score = 0.7825; HOXD13 T allele z-score = 5.1269 (also Supplementary file 7) (B) Results of a luciferase reporter assay. A 341 bp region surrounding rs9920291 and containing either the reference or alternate allele at rs9920291 was cloned into a luciferase reporter vector and transfected into the human chondrocyte cell line T/C-28a2. Expression was normalized (to 1.0) by luciferase activity of the empty vector luciferase construct. (C) Allelic skew of two *CHSY1* genic variants (rs28364839 which is exonic and rs11433 which is in the 3’UTR) in HEK-293FT cells. Notably, HEK-293FT is heterozygous for rs9920291, along with both assayed variants. (D) Expression of *CHSY1* following CRISPR-Cas9-mediated deletion of regions surrounding rs9920291 (see Figure 6—figure supplement 1) in T/C-28a2 chondrocyte cell lines (N = 3 biological replicates). *CHSY1* expression following deletion of an approximate 135 bp region using guides 1 and 4 is shown in purple and expression following deletion of an approximate 17 bp region using guides 6 and 8 is shown in orange. Expression values are normalized to a control CRISPR-Cas9 vector (green). (E) *CHSY1* expression following HOXD13 overexpression in T/C-28a2 cells (N = 3 biological replicates) (see also Figure 6—figure supplement 1). Expression is normalized by an empty overexpression vector. ***p<0.001 by unpaired t-test.

https://doi.org/10.7554/eLife.29329.011

To further demonstrate that the region surrounding rs9920291 displays regulatory activity for CHSY1, we performed CRISPR-Cas9 targeting of a 135 bp sequence spanning a conserved portion of the regulatory element containing rs9920291 (guides sg1 and sg4), or an 17 bp sequence immediately surrounding rs9920291 (guides sg6 and sg8). Either deletion led to a significant upregulation of CHSY1 expression in vitro in T/C-28a2 cells (p=7.83×10⁻⁶ for the 135 bp deletion and p=3.64×10⁻⁴ for the 17 bp deletion; Figure 6d), confirming the regulatory element’s repressor activity. Consistent with our computational predictions that rs9920291 perturbs HOXD13 binding, we found that overexpression of HOXD13 in T/C-28a2 cells significantly upregulated CHSY1 expression (p=1.63×10⁻⁵; Figure 6e,Figure 6—figure supplement 1).

We also highlight a previously identified height-associated variant (rs4911178) at the GDF5 locus, which resides in an ATAC-seq peak active in femoral growth plates (Figure 6—figure supplement 2). We have shown in separate work that the underlying enhancer drives GDF5 expression in growth plates and that rs4911178 has allele-specific activity(Capellini et al., 2017). Thus, our approach of integrating genetic and growth plate epigenetic data has the ability to identify previously discovered cis-regulatory targets that mediate human height variation.

Motif analysis of height variants that reside within distal femur ATAC-seq peaks

To expand upon our understanding of transcription factor binding at ATAC-seq open chromatin regions, we performed a series of motif enrichment analyses using HOMER (see Materials and methods). First, we performed a de novo motif analysis on the union of distal and proximal femoral ATAC-seq peaks in mouse to identify enriched transcription-factor-binding motifs. We identified a number of highly enriched motifs relevant to chondrogenesis and other skeletal developmental processes, including Atf3 (James et al., 2006), Msx2 (Amano et al., 2008), Pbx1 (Selleri et al., 2001), and Ctcf (Jerković et al., 2017) (Figure 7a, Supplementary file 8, sheet 1). We performed a similar analysis on the liftover human ATAC-seq regions and found enrichments for a similar set of motifs (Figure 7a, Supplementary file 8, sheet 2).

Figure 7

Download asset Open asset

Motif disruption by height GWAS variants intersecting ATAC-seq peaks.

(A) Selected enriched motifs within mouse femoral ATAC-seq open chromatin regions. These motifs were also found following liftover of the peaks to the human genome (see Supplementary file 8, sheets 1 and 2). (B) Histogram showing distribution of log-odds motif scores of CTCF-disruption for height GWAS variants within ATAC-seq regions. Negative values indicate a disruptive effect of the variant while positive values indicate a 'conservation' of the motif hit by the variant. (C) Histogram of number of variants predicted to disrupt CTCF motifs within random subsamples of variants. Dotted line indicates number of CTCF-disrupting variants observed in the height GWAS variants overlapping ATAC-seq regions (n = 78, p<0.01). (D) An example of a variant (rs12529733) that overlaps ATAC-seq peaks and is predicted to alter CTCF binding. Features in the figure are the same as described in Figure 6A.

https://doi.org/10.7554/eLife.29329.014

We next characterized the effect of height GWAS variants that intersect with these motifs within ATAC-seq peaks. Further characterization of the GWAS variants within CTCF motifs indicated a tendency toward disruptive effects (Figure 7b). Comparing the height GWAS variants intersecting ATAC-seq peaks to 500 sets of random GWAS variants revealed an enrichment for variants that are predicted to impact CTCF-binding (p<0.01, z-score: 6.01, Figure 7c) (see Materials and methods). An example of a variant within a CTCF motif is shown in Figure 7d. Additionally, we detected a strong enrichment of height GWAS variants overlapping predicted PBX1 motifs within ATAC-seq regions, when compared to height GWAS variants across the genome (adjusted p value=1.91×10⁻¹⁰⁸, Supplementary file 9; see Materials and methods).

Discussion

Like many anthropometric traits, height is highly heritable and influenced by a large number of distinct genetic inputs that influence the development and/or growth of a number of tissues (Visscher et al., 2010). Of these tissues, the growth plate plays a key role in determining overall height. Despite decades of research on chondrocyte growth plate biology, we have been limited in our understanding of how height GWAS variants—most of which are non-coding—act in the growth plate to influence height. This is due in part to the paucity of epigenetic data related to the regulatory landscape of the growth plate, as these datasets have been technically challenging to generate. Here, we leveraged a new method (ATAC-seq) (Buenrostro et al., 2015, 2013) to overcome previous technical challenges and reveal the open chromatin regulatory landscape of growth plate chondrocytes. We then analyzed height GWAS data in the context of the femur growth plate epigenetic landscape to gain insight into how height GWAS variants act at the growth plate.

Our analyses revealed that ATAC-seq datasets generated from mouse growth plate chondrocytes overlap previously identified signals of chondrocyte biology in the mouse, as well as in humans. The ATAC-seq peaks reside near genes enriched for human phenotypes related to skeletal biology, such as ‘metaphyseal widening.’ These peaks also reside near genes with known roles in chondrocyte biology and include biological processes such as ‘cartilage condensation’ and ‘chondrocyte proliferation.’ We also show that our open chromatin profiles overlap quite faithfully with locations of active regulatory enhancers and Sox9-binding sites as assessed using ChIP-seq on mouse rib chondrocytes (Ohba et al., 2015). Additionally, our datasets overlap with evidence of regulatory control during chondrogenesis and human limb embryogenesis (Cotney et al., 2013; Herlofsen et al., 2013; Kundaje et al., 2015). These findings indicate that general aspects of chondrocyte regulation, in part shared between humans and mice, are captured by our ATAC-seq growth plate datasets. While we note that appropriate human datasets generated from growth plate chondrocytes are still unavailable and that there is evidence of substantial functional divergence in the genic and non-coding usage in cell types between species (Bernstein and ENCODE Project Consortium, 2012), the strong overlap between our mouse ATAC-seq data and human embryonic limb and human BMDCs suggests considerable conservation in the regulatory profiles at the growth plate.

While our datasets capture aspects of general chondrocyte biology, we also found that ATAC-seq peaks acquired on femoral tissues show enrichment in human phenotypes related to the anatomy of femur. This finding points to a fine-grained, modularized control of skeletal element-specific biology. This is also supported by our independent identification of a number of previously published long bone growth plate chondrocyte enhancers in mice and humans, as well as the growing evidence that a number of important skeletal development genes, such as Bmp5 (Guenther et al., 2015), Gdf6 (Mortlock et al., 2003; Indjeian et al., 2016), and Gdf5 (Capellini et al., 2017; Chen et al., 2016) have conserved modularized cis-regulatory architectures. As height is encoded by a number of genetic inputs, undoubtedly variants that influence general, as well as element-specific regulatory enhancers (e.g. tibia versus femur), will be important factors shaping height variation in humans.

Given that over 80% of GWAS height loci are non-coding, our study demonstrates that height GWAS variants are significantly enriched in growth plate chondrocyte regulatory regions and that the majority reside in intergenic portions of the genome. Importantly, this enrichment of height GWAS loci was not observed nearly as strongly in other chondrocyte or limb bud tissues, which suggests that our dataset is capturing functional genetic variation influencing height that is not captured by other existing datasets. No enrichment was observed for DNaseI hypersensitivity sites from mouse brain, liver, or lung, which suggests that our enrichment with height GWAS variants is tissue-specific and also not due to an artifact from the liftover process from mouse to human. These ATAC-seq peaks overlapping height GWAS variants are enriched for nearby differentially expressed growth plate genes. GWAS results for other traits/diseases that are not clearly related to growth/height did not show enrichment at the ATAC-seq peaks, demonstrating that the ATAC-seq peaks are specifically capturing growth/height biology. Finally, our motif analysis additionally revealed that a number of these human height variants may alter the binding of transcription factors with important roles in chondrogenesis and gene regulatory processes.

Combining our statistical fine-mapping and functional fine-mapping has allowed us to narrow down to a much smaller subset of putatively causal variants for downstream functional testing. As an example, one putatively causal variant (rs9920291) resides in the Chondroitin Sulfate Synthase 1 (CHSY1) locus. Human mutations in CHSY1 result in Temtamy preaxial brachydactyly syndrome, characterized by short stature, preaxial brachydactyly, and hyperphalangism of digits 1–3 (45, 46), as well as a number of additional soft and hard tissue disorders. rs9920291 is in the fine-mapped credible set, resides in regulatory elements active in human adult chondrocytes and embryonic limb tissues, and displays evidence of binding by Sox9 in mouse chondrocytes. rs9920291 additionally modifies a critical nucleotide (C to T) and disrupts the experimentally determined in vitro binding motif of HOXD13, a homeodomain transcription factor that is expressed in chondrocytes (Kuss et al., 2014; Reno et al., 2016) and has known roles in skeletal development (Johnson and Tabin, 1997). We performed extensive experimental characterization of rs9920291 at the CHSY1 locus. We demonstrated that rs9920291 displays allelic regulatory activity in human chondrocyte cell lines. We also deleted the regulatory region surrounding rs9920291 using CRISPR-Cas9 and demonstrated that these deletions alter CHSY1 expression. Thus, our results elucidate how common regulatory variation at this locus influences height by modulating CHSY1 expression. Height, given that it is generally not a cell-intrinsic phenotype, has traditionally not been conducive to functional experimentation. Our results highlight how we can pinpoint likely causal variants and mechanisms to make experimentation more tractable for a challenging phenotype such as height. To our knowledge, along with GDF5, this is the only other height GWAS locus with functional validation (Capellini et al., 2017).

A number of other variants, listed in the browsable Supplementary file 6, are available for functional testing. When more extensive height GWAS data (ideally imputed to a dense reference panel) are generated and more sophisticated fine-mapping methods are developed, additional evidence of enrichment for fine-mapped variants may be revealed using our approach. When these experiments are combined with high-throughput functional assays to test variant functionality, a more refined picture of height biology will emerge.

Share this article

Cite this article

ATAC-seq on in vivo collected mouse femoral growth plate zones.

Mouse proximal and distal ATAC-seq datasets capture chondrocyte biology in mouse and humans.

Height GWAS loci are enriched in ATAC-seq peaks.

Fine-mapping of GWAS height variants.

Height GWAS loci are enriched in differentially expressed growth plate genes.

Intersections of GWAS height variants with ATAC-seq data identify novel putatively causal variants.

Motif disruption by height GWAS variants intersecting ATAC-seq peaks.

Author details

Michael Guo

Contribution

Contributed equally with

Competing interests

Zun Liu

Contribution

Contributed equally with

Competing interests

Jessie Willen

Contribution

Contributed equally with

For correspondence

Competing interests

Cameron P Shaw

Contribution

For correspondence

Competing interests

Daniel Richard

Contribution

For correspondence

Competing interests

Evelyn Jagoda

Contribution

For correspondence

Competing interests

Andrew C Doxey

Contribution

For correspondence

Competing interests

Joel Hirschhorn

Contribution

For correspondence

Competing interests

Terence D Capellini

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organisms