Breaking enhancers to gain insights into developmental defects
Abstract
Despite ground-breaking genetic studies that have identified thousands of risk variants for developmental diseases, how these variants lead to molecular and cellular phenotypes remains a gap in knowledge. Many of these variants are non-coding and occur at enhancers, which orchestrate key regulatory programs during development. The prevailing paradigm is that non-coding variants alter the activity of enhancers, impacting gene expression programs, and ultimately contributing to disease risk. A key obstacle to progress is the systematic functional characterization of non-coding variants at scale, especially since enhancer activity is highly specific to cell type and developmental stage. Here, we review the foundational studies of enhancers in developmental disease and current genomic approaches to functionally characterize developmental enhancers and their variants at scale. In the coming decade, we anticipate systematic enhancer perturbation studies to link non-coding variants to molecular mechanisms, changes in cell state, and disease phenotypes.
Introduction
Enhancers are regulatory elements that drive development and lineage specification through temporal and cell-type-specific control of gene expression (Banerji et al., 1981). Thus, it is not surprising that enhancer dysregulation has been frequently implicated in developmental diseases (termed ‘enhanceropathies’). Whole genome sequencing (WGS) and genome-wide association studies (GWAS) have highlighted an abundance of disease-associated genetic variants at enhancers (Gusev et al., 2014; Parker et al., 2013). It is thought that these variants alter the activity of enhancers, impact gene expression programs, and ultimately contribute to disease risk and pathogenesis (Kleinjan and van Heyningen, 2005).
Several notable developmental enhanceropathies highlight key concepts. First, unlike the impact of protein-coding variants, the impact of genetic variants on enhancers may not be obvious. For example, point mutations in an enhancer of the sonic hedgehog (SHH) gene cause the developmental malformation polydactyly (Lettice et al., 2017; Lettice et al., 2003). This enhancer was initially mapped to an intron of the LMBR1 gene through breakpoint analysis in a patient with preaxial polydactyly (Lettice et al., 2003). Functional analysis of the enhancer sequence in transgenic mouse embryos showed that its activity overlapped with SHH expression in the zone of polarizing activity (ZPA) during early limb specification. Remarkably, this enhancer lies over 1 Mb from the SHH promoter and does not regulate several genes in closer proximity (Williamson et al., 2016). This example illustrates how non-coding enhancer variants can cause developmental defects by disrupting the regulation of distally located genes during early cell specification. It also highlights the importance of 3D genome architecture for gene regulation and for interpreting the mechanisms of enhanceropathies. Other examples include a long-range enhancer of IRX3 that harbors obesity-associated variants and acts in the brain (Smemo et al., 2014), and a cardiometabolic risk-associated enhancer that regulates the FST gene ∼522 kb away in the liver (Civelek et al., 2017). Long-range gene regulation is covered more extensively in recent reviews (Razin et al., 2023; Vermunt et al., 2019).
Second, expanding on the first concept, enhancers can contribute to disease through indirect and cell-type-specific mechanisms. For example, an enhancer associated with the persistence of fetal hemoglobin alters the expression of the transcriptional repressor BCL11A (Bauer et al., 2013). However, the phenotype is not caused directly by BCL11A, but by changes in its repression target, fetal hemoglobin (HbF) (Basak et al., 2015). Importantly, this gene regulatory event manifests only in hematopoietic cells. Third, enhanceropathies can phenocopy coding mutations. Congenital heart defects (CHD) are a common class of cardiac malformations with multiple driver genes (Fahed et al., 2013). One of the best-known drivers is TBX5, in which coding mutations are associated with a variety of CHD-related conditions such as Holt-Oram syndrome and atrial and ventricular septal defects (Bruneau et al., 2001; McDermott et al., 2005). Sequencing studies in CHD patients identified several genetic variants, a handful of which ablated the activity of a TBX5 enhancer (Smemo et al., 2012). This observation suggests that TBX5 enhanceropathies can contribute to CHD by reducing TBX5 expression, phenocopying the TBX5 haploinsufficiency caused by coding mutations.
Together, genetic studies have identified enhancer variants as major contributors to human disease and underscore the role of disrupted spatiotemporal gene regulation in developmental disease. However, as the above examples indicate, assigning risk variants to the mechanisms of developmental diseases is often non-trivial. A key challenge in the field is to directly perturb enhancers, their variants, and associated genes in the appropriate cell type to identify their impacts on molecular, cellular, and organismal phenotypes that are relevant to developmental diseases (Figure 1). Here, we review recent genomic technologies developed toward this ambitious goal and anticipate future challenges along three key dimensions: genomic perturbations, cellular systems, and phenotypic readouts.
Functional evaluation of enhancer elements
Because many disease-associated variants overlap enhancers, one way to gain clues about variant mechanisms is to first functionally analyze whole enhancer elements. This approach has several advantages. First, diverse tools have been developed to measure enhancer function at scale, including massively parallel reporter assays and CRISPR-based screening approaches (Table 1). Recent approaches have coupled these technologies with single-cell assays to further increase the scale and resolution of enhancer characterization (Dixit et al., 2016). Second, by focusing on the small subset of enhancers that overlap variants (out of a universe of millions), this approach greatly reduces the number of enhancers to test functionally. While a given developmental state may have thousands of active enhancers, many will not be relevant to the disease. However, measuring the activity of an entire enhancer does not directly reveal how a disease variant functions. Nonetheless, this approach can prioritize enhancers for downstream variant-level analyses, which are more challenging and lower in throughput.
Reporter assays for analysis of enhancer elements during in vivo development
Since enhancer activity is specific to cell type and developmental time, an important initial step is to define active enhancers in a developmental system. Chromatin-based approaches have been widely adopted for this purpose. In particular, recent applications of single-cell ATAC-seq in developmental systems have enabled the identification of thousands of potential enhancers with cell-type resolution (Cusanovich et al., 2018a; Cusanovich et al., 2018b; Domcke et al., 2020; Gao et al., 2022). Chromosome conformation capture assays have also been extensively applied to identify enhancers that engage in 3D chromatin interactions with putative target genes (Lu et al., 2020; Ron et al., 2017). For focused discussion of these topics, we refer the reader to other reviews (Gasperini et al., 2020; Razin et al., 2023; Shlyueva et al., 2014; Vermunt et al., 2019).
While epigenetic signatures can be used to identify putative enhancer elements, functional assays are needed to definitively demonstrate an enhancer's spatiotemporal activity during in vivo development. Reporter assays are simple and flexible tools that have addressed this gap in enhancer functional characterization. In a reporter assay, enhancer activity is assessed by cloning a putative enhancer next to a minimal promoter that drives expression of a reporter gene (Stanojevic et al., 1991). For example, Conrad and Botchan used a reporter assay to discover the first enhancer as an element of the SV40 genome that drove expression of a β-globin reporter gene in HeLa cells (Conrad and Botchan, 1982). However, reporters test enhancers outside of their native genomic context. A key development has been the extension of reporter assays to characterize enhancer activity in vivo (Hammer et al., 1987; Johnson et al., 1989; Pinkert et al., 1987; Swift et al., 1984). Imaging-based approaches, such as those using luciferase or lacZ reporters, have also enabled high-resolution spatiotemporal characterization of enhancer activity during mammalian development (Kvon et al., 2020; Pennacchio et al., 2006; Smemo et al., 2012). Notably, the VISTA Enhancer Browser now documents thousands of enhancers that have been experimentally tested using reporter assays in embryonic mice (Visel et al., 2007). The detailed images of stained embryos produced by in vivo reporter assays demonstrate the exquisite spatiotemporal specificity of enhancers in development. However, traditional reporter assays remain limited to testing a handful of candidates at a time.
High throughput functional characterization of developmental enhancers
Massively parallel reporter assays (MPRAs) use next-generation sequencing to measure enhancer activity at scale (Melnikov et al., 2012; Patwardhan et al., 2009). In an MPRA, thousands of candidate sequences are cloned into a reporter construct together with a minimal promoter, a reporter gene, and a unique barcode. The barcodes are sequenced in the transcribed RNA and normalized to their abundance in the input DNA library. This yields the level of expression driven by each candidate enhancer (Inoue and Ahituv, 2015). Inoue et al. applied an MPRA to characterize over 2,000 candidate enhancers during the differentiation of human embryonic stem cells toward neural progenitors (Inoue et al., 2019). By collecting samples at multiple time points, the authors demonstrated the temporal specificity of enhancers. Notably, enhancer activity in the exogenous reporter correlated with target gene expression and with epigenetic hallmarks at the endogenous enhancers, including ATAC-seq accessibility and H3K27ac. Because MPRAs test thousands of enhancers, they can also reveal transcription factor binding patterns and other features that drive enhancer activity (Smith et al., 2013). For example, Inoue et al. found that regulatory activity depended primarily on chromatin context, whereas the sequence of the regulatory elements, including binding motifs, primarily dictated whether these regions increased or decreased transcription (Inoue et al., 2019). STARR-seq is another massively parallel reporter assay that tests genomic fragments for enhancer activity at genome-wide scale (Arnold et al., 2013). It places each candidate within the transcribed region of the reporter, so that active enhancers drive their own transcription. An adaptation of STARR-seq has been applied in vivo to assess 408 sequences for enhancer activity in the early mouse brain (Lambert et al., 2021). This approach has the advantage of measuring enhancer activity in vivo.
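The barcode-ratio logic of an MPRA can be made concrete with a short Python sketch. It assumes a barcode-to-element map from sequencing the plasmid library, plus per-barcode read counts from the DNA and RNA libraries. The function name, counts-per-million normalization, minimum-read threshold, and median aggregation are illustrative choices. They are not the exact analysis pipeline of any study cited here.

```python
import math
from collections import defaultdict
from statistics import median


def mpra_activity(barcode_to_element, dna_counts, rna_counts,
                  min_dna_reads=10, pseudocount=1.0):
    """Per-element MPRA activity: median over barcodes of log2(RNA CPM / DNA CPM).

    barcode_to_element: {barcode: element_id}, from sequencing the plasmid library.
    dna_counts / rna_counts: {barcode: read count} from the DNA and RNA libraries.
    Barcodes with too few DNA reads are skipped because their ratios are noisy.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    per_element = defaultdict(list)
    for barcode, element in barcode_to_element.items():
        dna_reads = dna_counts.get(barcode, 0)
        if dna_reads < min_dna_reads:
            continue
        # Normalize each library to counts per million to remove sequencing-depth effects.
        dna_cpm = dna_reads / dna_total * 1e6
        rna_cpm = rna_counts.get(barcode, 0) / rna_total * 1e6
        per_element[element].append(math.log2((rna_cpm + pseudocount) / (dna_cpm + pseudocount)))
    # Median across barcodes makes the estimate robust to a few aberrant barcodes.
    return {element: median(ratios) for element, ratios in per_element.items()}
```

In practice, activities are usually interpreted relative to negative-control sequences, for example by comparing each element's score with the distribution of scores for scrambled or known-inactive controls.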
Traditional MPRAs are limited to characterizing putative enhancers in a uniform cell population. To address this issue, MPRAs have recently been combined with single-cell RNA-seq (scRNA-seq) to characterize enhancer activity in heterogeneous cell systems (Lalanne et al., 2022; Zhao et al., 2023). Lalanne et al. characterized 213 putative regulatory elements in mouse embryonic stem cell-derived embryoid bodies that give rise to the three germ layers (Lalanne et al., 2022). The authors identified an enhancer of Lamc1 with pleiotropic activity in two cell populations, whereas the Lamc1 gene itself is expressed in a single cell type. This discordance suggests that Lamc1 expression is not strictly dependent on the activity of this nearby enhancer. These studies highlight the power of single-cell technologies to resolve enhancer cell-type specificity and to reveal regulatory networks. Thus, MPRAs expand the throughput of reporter assays and can be used to identify features of enhancers active in particular developmental systems.
Despite their widespread use, reporter assays have several limitations. First, the choice of expression vector can produce dramatic differences in enhancer-mediated expression that complicate interpretation (Lungu-Mitea and Lundqvist, 2020). For example, Lungu-Mitea and Lundqvist compared multiple reporter backbones and observed a tenfold difference in induction between two of them. Second, the length of the inserted fragment, often limited by DNA synthesis technologies, may not capture all the functional sequences of an enhancer (Romanov et al., 2021). Klein et al. performed an MPRA testing different enhancer lengths and found that results tended to correlate poorly between lengths (Klein et al., 2020). Finally, because reporter assays are exogenous, they may not recapitulate endogenous enhancer function. For example, in a screen of over 2,000 DNA sequences, Inoue et al. observed that half of the identified putative enhancers did not correlate with endogenous chromatin signatures (Inoue et al., 2019). Thus, while reporter systems have the flexibility and scalability to assess thousands of sequences for enhancer activity, other methods are required to determine the developmental relevance of enhancers in their native genomic context.
Genome engineering for endogenous analysis of enhancers in development
Endogenous perturbations of enhancers in their genomic context are required to understand the roles of these regulatory elements in development. TALENs, programmable nucleases that can be directed to user-specified genomic loci, were initially applied to this question (Boch et al., 2009; Moscou and Bogdanove, 2009). Wang et al. directed TALENs to an enhancer harboring lupus-associated variants (Wang et al., 2016). Introducing variants at this enhancer disrupted its interaction with the TNFAIP3 promoter. This reduced expression of TNFAIP3, a negative regulator of NF-κB signaling. While TALENs allow targeted genetic perturbations, they are too difficult to scale for genetic screens (Morbitzer et al., 2011; Reyon et al., 2012; Weber et al., 2011). In contrast, CRISPR/Cas9, which is programmed by a short guide RNA rather than an engineered protein, has been widely adopted to perturb genes endogenously (Jinek et al., 2012). However, because enhancers span hundreds of base pairs, a short indel introduced by a single sgRNA may not be sufficient to disrupt enhancer activity (He et al., 2011; Reddy et al., 2012). Thus, pairs of sgRNAs flanking an enhancer have been used for deletion studies. For example, Zhou et al. used this approach to delete putative enhancers of the developmental gene SOX2 in embryonic stem cells (ESCs) (Zhou et al., 2014). This study identified multiple enhancers that were essential for regulating SOX2 expression and for maintaining pluripotency. To increase throughput, Diao et al. generated a pool of sgRNAs spanning 174 potential enhancers to tile the POU5F1 locus in human ESCs (Diao et al., 2017). The authors identified regulators of POU5F1 in hESCs, including atypical enhancers whose perturbation only transiently reduced gene expression. By systematically tiling the entire locus, they also found enhancer-like promoters that regulate POU5F1 as well as other genes in the region.
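Tiling screens begin by enumerating every targetable site across a locus. The Python sketch below assumes SpCas9 with a 20-nt spacer and an NGG PAM on either strand. The helper names and the greedy spacing rule are hypothetical. Real library designs would also filter guides for off-target matches and predicted activity.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq, spacer_len=20):
    """Return (strand, start, spacer) for every site adjacent to an NGG PAM.

    start is the 0-based forward-strand coordinate of the leftmost base of the
    targeted protospacer; for '-' sites the spacer is reported 5'->3' on the
    reverse strand.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - spacer_len - 2):
        # Forward strand: spacer seq[i:i+L] followed by the PAM N-G-G.
        if seq[i + spacer_len + 1:i + spacer_len + 3] == "GG":
            sites.append(("+", i, seq[i:i + spacer_len]))
        # Reverse strand: C-C-N on the forward strand, then the target sequence.
        if seq[i:i + 2] == "CC":
            sites.append(("-", i + 3, reverse_complement(seq[i + 3:i + 3 + spacer_len])))
    return sites


def tile_locus(sites, min_spacing=10):
    """Greedily keep sites whose start coordinates are at least min_spacing bp apart."""
    chosen, last_start = [], None
    for site in sorted(sites, key=lambda s: s[1]):
        if last_start is None or site[1] - last_start >= min_spacing:
            chosen.append(site)
            last_start = site[1]
    return chosen
```

Scanning both strands matters for tiling because an NGG PAM on either strand can be used. Allowing both orientations roughly doubles the density of available target sites across an enhancer.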
Tiling screens can thus uncover a gene's regulatory elements in a single pooled perturbation experiment. However, CRISPR/Cas9-mediated genetic perturbation of enhancers is limited by two factors: efficiency and scalability. Excising an enhancer requires cleavage at the sites targeted by two sgRNAs. This is inefficient and often generates a heterogeneous population in which not every cell harbors the deletion (Zheng et al., 2014). Clonal selection is also required to obtain homozygous deletions, because the probability of deleting both alleles is lower still (Eleveld et al., 2021). Enhancer deletion also complicates high-throughput screens because both sgRNAs of a pair must be introduced into the same cell.
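The efficiency problem compounds quickly. A toy calculation in Python shows why homozygous deletions are rare in a bulk population. It rests on three simplifying, hypothetical assumptions:

- the two cuts occur independently;
- cutting at both sites always yields the deletion;
- the two alleles are edited independently.

```python
def deletion_probabilities(p_cut_left, p_cut_right):
    """Toy model of paired-sgRNA enhancer deletion in a diploid cell.

    Assumes independent cutting at the two flanking sites, that a double cut
    always produces the deletion, and independent editing of the two alleles.
    Returns (per-allele, heterozygous, homozygous) deletion probabilities.
    """
    p_allele = p_cut_left * p_cut_right
    p_heterozygous = 2 * p_allele * (1 - p_allele)
    p_homozygous = p_allele ** 2
    return p_allele, p_heterozygous, p_homozygous
```

Under these assumptions, 50% cutting at each site deletes a given allele in 25% of cells, but only about 6% of cells carry a homozygous deletion. This is why clonal isolation is typically needed.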
To address these issues, a flexible suite of CRISPR-based tools has been developed to alter enhancer activity through epigenetic modification. These tools fuse catalytically dead Cas9 (dCas9), which binds DNA without cutting it, to chromatin modifiers that introduce epigenome edits (Qi et al., 2013). Common dCas9 fusions include KRAB, which recruits machinery that deposits the heterochromatic mark H3K9me3 to repress enhancer activity (CRISPRi), and p300, a histone acetyltransferase that activates enhancers (CRISPRa) (Gilbert et al., 2013; Hilton et al., 2015). CRISPRi/a potently modifies enhancer activity and gene expression, producing robust effects even with a single sgRNA (Thakore et al., 2015). This capability has enabled CRISPRi screens with hundreds to thousands of sgRNAs that systematically interrogate many enhancers in a single experiment. Fulco et al. applied a tiling CRISPRi screen across two loci harboring essential transcription factors (TFs) and measured how each perturbation affected cell viability (Fulco et al., 2016). This approach identified 9 enhancers that regulate TF expression and affect cell growth. These experiments indicate that only a subset of the candidate enhancers identified by chromatin profiling (Crawford et al., 2006) affect downstream gene expression (Malin et al., 2013). Besides cellular proliferation, RNA expression is another readout that has been applied to CRISPRi enhancer screens (Fulco et al., 2019). In CRISPRi-FlowFISH, cells carrying a pool of CRISPRi perturbations are labeled for a target transcript using RNA FISH. The cells are then sorted into bins by fluorescence intensity, and the sgRNAs in each bin are sequenced. This identifies the enhancers that regulate the labeled gene and the relative magnitude of their effects. Fulco et al. applied CRISPRi-FlowFISH to over 4,000 enhancer-gene pairs to identify features of enhancer-promoter regulation, and showed that enhancers do not always regulate the closest gene (Fulco et al., 2019).
This finding highlights the importance of chromatin looping and 3D genome architecture, which bring enhancers into physical contact with the promoters they regulate (Kadauke and Blobel, 2009). One disadvantage of viability and gene expression CRISPRi screens is that each screen typically measures only a single phenotypic readout.
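The bin-sorting strategy of CRISPRi-FlowFISH can be summarized by estimating each sgRNA's expression level from its distribution across sorted bins. The Python sketch below is a simplified stand-in, not the estimator used by Fulco et al. It makes three assumptions:

- roughly equal numbers of cells were sorted into each bin;
- each bin is represented by a single hypothetical expression value, such as its fluorescence midpoint;
- estimates are normalized to non-targeting control guides.

```python
def guide_expression(bin_counts, bin_values):
    """Weighted-average expression estimate for each sgRNA from sorted-bin read counts.

    bin_counts: {guide: [reads in bin 1, ..., reads in bin k]}
    bin_values: representative expression value for each bin (same order).
    Reads are converted to within-bin fractions to correct for sequencing depth,
    which assumes a similar number of cells was sorted into every bin.
    """
    n_bins = len(bin_values)
    bin_totals = [sum(counts[b] for counts in bin_counts.values()) for b in range(n_bins)]
    estimates = {}
    for guide, counts in bin_counts.items():
        weights = [counts[b] / bin_totals[b] if bin_totals[b] else 0.0 for b in range(n_bins)]
        total_weight = sum(weights)
        estimates[guide] = (sum(w * v for w, v in zip(weights, bin_values)) / total_weight
                            if total_weight else float("nan"))
    return estimates


def relative_expression(estimates, control_guides):
    """Express each estimate relative to the mean of the non-targeting control guides."""
    baseline = sum(estimates[g] for g in control_guides) / len(control_guides)
    return {guide: value / baseline for guide, value in estimates.items()}
```

A guide targeting a functional enhancer concentrates in the low-expression bins, giving a relative expression below 1. Guides against non-functional regions track the controls.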
Single-cell screens for enhancer activity in heterogeneous developmental systems
Traditional bulk CRISPR screens are restricted to a single readout per experiment. Single-cell RNA-seq (scRNA-seq) has been combined with CRISPR screens to overcome this limitation by providing high-content, transcriptome-wide readouts of cell state (Dixit et al., 2016). In single-cell CRISPR screens, a pool of sgRNAs is transduced into cells. This is followed by scRNA-seq that captures both the transcriptome and the sgRNA identity of each individual cell. Dixit et al. first demonstrated single-cell CRISPR perturbation (Perturb-seq) by knocking out dozens of genes and screening for transcriptional differences (Dixit et al., 2016). Because Cas9-mediated knockout does not disrupt the target in every cell, Perturb-seq has also been combined with dCas9-KRAB in a similar screen (Adamson et al., 2016). Both approaches allow large-scale screening of many elements in a single experiment. This approach has been applied to a developmental model of hESC-to-cardiomyocyte differentiation, identifying enhancers that affect cardiac lineage specification (Armendariz et al., 2022). Single-cell gene expression reveals not only the transcriptional effects mediated by each enhancer, but also the effects of enhancer repression on cell state. In this way, phenotypes such as differentiation potential can be identified. Similar screens have been applied to promoters in neuronal and endoderm development, yet their application to enhancers remains limited (Genga et al., 2019; Tian et al., 2019). A significant limitation of scRNA-seq as a readout is that lowly expressed genes are difficult to detect, which reduces statistical power and can result in false negatives. To address this problem, TAP-seq selectively enriches key transcripts in single-cell libraries to increase the sensitivity of detecting genes of interest (Schraivogel et al., 2020).
TAP-seq thus increases the detection of genes regulated by targeted enhancers and has the added advantage of reducing sequencing costs. Targeted transcript amplification is well suited to enhancer studies because enhancers usually act on genes within the surrounding locus, which limits the number of candidate genes to amplify. However, the low overall sensitivity of scRNA-seq means that very lowly expressed genes may still escape robust detection. Methods to perturb enhancers have rapidly advanced in scale and efficiency over the last decade. Improved delivery systems and differentiation models that better recapitulate development have also increased their disease relevance.
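The core comparison in a single-cell CRISPRi screen can be sketched in Python as follows. The sketch assumes cells have already been assigned to a single sgRNA and that expression has been normalized. The data layout, the control label, and the pseudocount are hypothetical choices. The detection rate it reports makes the sensitivity problem for lowly expressed genes visible: when most cells show zero counts, a fold change rests on very few observations.

```python
import math
from collections import defaultdict
from statistics import mean


def perturbation_summary(cells, target_gene, control_label="non-targeting",
                         pseudocount=0.1):
    """cells: iterable of (sgRNA_label, {gene: normalized expression}).

    Returns {sgRNA_label: (log2 fold change vs control cells, detection rate)},
    where detection rate is the fraction of cells with nonzero target expression.
    """
    by_guide = defaultdict(list)
    for guide, expression in cells:
        by_guide[guide].append(expression.get(target_gene, 0.0))
    if control_label not in by_guide:
        raise ValueError(f"no cells labelled {control_label!r}")
    control_mean = mean(by_guide[control_label])
    summary = {}
    for guide, values in by_guide.items():
        log2fc = math.log2((mean(values) + pseudocount) / (control_mean + pseudocount))
        detected = sum(1 for v in values if v > 0) / len(values)
        summary[guide] = (log2fc, detected)
    return summary
```

A real analysis would add a statistical test, for example a permutation test against non-targeting cells. It would also account for cells that carry multiple sgRNAs.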
Functional evaluation of enhancer variants
High throughput reporter assays to interpret enhancer variants
While element-level studies of enhancers provide information on their molecular and cellular roles (Figure 1), they lack the resolution to assess the nucleotide variants observed in disease (Lensch et al., 2022; Thakore et al., 2015). Thus, it is crucial to study perturbations at nucleotide resolution to elucidate the role of enhancer variants in disease. However, one key challenge is that the effect of a single nucleotide variant on an enhancer's activity is expected to be modest compared with whole-enhancer perturbations (Kheradpour et al., 2013; Patwardhan et al., 2012). As a result, the changes in target gene expression and downstream cellular phenotypes are also expected to be modest. Thus, sensitive assays are required to measure variant effects (Table 1).
Reporter assays can be readily adapted to study variants by incorporating patient-derived nucleotide changes into the tested fragments. For example, Smemo et al. identified a G-to-T transversion in a TBX5 enhancer in a patient with an isolated congenital heart defect. TBX5 is a transcription factor well studied in cardiac development, and coding variants in TBX5 cause the cardiac developmental disorder Holt-Oram syndrome. In transgenic mice, the wild-type enhancer sequence drove myocardium-specific expression in the ventricles and ventricular septum. Notably, using a β-galactosidase reporter assay, the authors showed that the G-to-T mutation abrogated heart-specific enhancer activity (Smemo et al., 2012). This study is a rare example of validating a patient-derived enhancer variant in a model of developmental disease. However, extending this approach more broadly requires higher-throughput methods.
To increase the number of variants and elements tested, saturation mutagenesis has been combined with MPRAs to study many disease-associated variants at single-nucleotide resolution (Kircher et al., 2019). In this study, the authors examined over 30,000 single nucleotide variants across 20 disease-associated regulatory elements. The authors identified developmentally relevant enhancer variants, including variants in the well-known zone of polarizing activity regulatory sequence (ZRS). The ZRS is an SHH limb enhancer, and the identified variants are known to cause severe limb malformations such as polydactyly (Riddle et al., 1993; Zeller et al., 2009). Another example is the promoter of F9, the gene encoding coagulation factor IX. Single nucleotide changes in this promoter have been implicated in the X-linked bleeding disorder hemophilia B Leyden. Kircher et al. found that mutations in the binding sites for HNF4A and ETS-related transcription factors reduced F9 promoter activity.
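The size of a saturation mutagenesis library follows directly from the sequence. Each position admits three alternative bases, so an element of length L yields 3L single nucleotide variants. The minimal Python enumerator below is an illustrative design helper. It is not necessarily how the cited study generated its variants.

```python
def single_nucleotide_variants(seq, offset=0):
    """Yield (position, ref, alt, mutant_sequence) for every possible SNV.

    position is offset + the 0-based index, so an offset can map variants to
    genomic coordinates. Ambiguous bases (anything other than A/C/G/T) are skipped.
    """
    seq = seq.upper()
    for i, ref in enumerate(seq):
        if ref not in "ACGT":
            continue
        for alt in "ACGT":
            if alt != ref:
                yield offset + i, ref, alt, seq[:i] + alt + seq[i + 1:]
```

For a 500 bp element, this gives 1,500 variants. Saturating even a few dozen elements therefore quickly reaches tens of thousands of constructs.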
MPRAs have historically been unable to resolve the cell-type-specific activity of enhancers, which is critical when studying development. Zhao et al. recently addressed this challenge by developing single-cell MPRA (scMPRA). They applied this approach in the mouse retina to test 113 variants of the Gnb3 promoter, a cis-regulatory element whose activity differs among retinal cell subtypes. The authors validated the cell-type specificity of a Gnb3 promoter variant identified in a previous study. Beyond this, they captured the effects of single nucleotide variants at different binding sites in the promoter across different retinal cell types (Murphy et al., 2019; Zhao et al., 2023). For example, the authors created single nucleotide variants in the E-box motif, which is critical for the development of multiple retinal cell subtypes. While most of these variants affected the level of expression, only one altered cell-type specificity. Thus, scMPRA provides a powerful tool to study the effects of single nucleotide variants in a developmental context in complex tissues.
CRISPR genome engineering for analysis of enhancer variants
While CRISPRi is an effective tool for perturbing enhancers at ~1 kb resolution, it is too blunt to resolve the effects of individual disease-associated variants (Lensch et al., 2022; Li et al., 2020; Thakore et al., 2015). Thus, alternative tools are needed to model the effect of nucleotide variants on enhancer function. The short indels generated by CRISPR/Cas9 genome editing (typically <30 base pairs) offer higher resolution than CRISPRi/a (Allen et al., 2018; Koike-Yusa et al., 2014; Kosicki et al., 2022; van Overbeek et al., 2016). This approach exploits the repair of Cas9-induced double-strand breaks by non-homologous end joining, which generates random insertions and deletions at the sgRNA-targeted site. One of the first studies to dissect an enhancer at high resolution using CRISPR/Cas9 examined the human erythroid enhancer of BCL11A (Canver et al., 2015). BCL11A is a developmentally crucial transcriptional repressor that mediates the switch from fetal (HbF) to adult hemoglobin (Bauer et al., 2012). Bauer et al. identified common genetic variants in the erythroid-specific enhancer of BCL11A that were associated with HbF levels (Bauer et al., 2013). Canver et al. then dissected the enhancer at near-base-pair resolution using CRISPR/Cas9 saturation mutagenesis. This pinpointed nucleotides in the enhancer that altered BCL11A expression and, consequently, HbF expression (Canver et al., 2015). Specifically, the authors tiled the enhancer regions identified by DNase I hypersensitivity with sgRNAs and sorted cells by HbF level using flow cytometry. They then inferred the effect of each perturbation from the relative abundance of its sgRNA in the sorted populations. Other studies using similar single-gene screening strategies have also functionally identified enhancers at scale (Diao et al., 2017; Sanjana et al., 2016).
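The guide-abundance readout used in sorting-based screens reduces to comparing each sgRNA's frequency between sorted populations. The minimal Python sketch below assumes read counts from hypothetical HbF-high and HbF-low sorted populations. It adds a pseudocount for guides that are absent from one population.

```python
import math


def guide_enrichment(high_counts, low_counts, pseudocount=0.5):
    """log2 ratio of each sgRNA's frequency in the high vs low sorted population.

    Positive values mean the guide is over-represented in the high-expression sort,
    i.e. perturbing its target site tends to raise the sorted marker.
    """
    high_total = sum(high_counts.values())
    low_total = sum(low_counts.values())
    enrichment = {}
    for guide in set(high_counts) | set(low_counts):
        high_freq = (high_counts.get(guide, 0) + pseudocount) / high_total
        low_freq = (low_counts.get(guide, 0) + pseudocount) / low_total
        enrichment[guide] = math.log2(high_freq / low_freq)
    return enrichment
```

Plotting these scores against each guide's genomic position reveals the enhancer nucleotides whose disruption shifts expression, which is the logic behind tiling-based dissection.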
One drawback of CRISPR/Cas9 approaches is that the double-strand breaks they induce can be detrimental to cells. Recently developed technologies such as base editing install specific nucleotide changes without creating double-strand breaks, which addresses this problem. For example, Martin-Rufino et al. applied base editing screens to precisely alter variants in the promoter region of the γ-globin genes (HBG1/2), which encode a component of fetal hemoglobin (Martin-Rufino et al., 2023). The authors used a pooled base editing screen of ~120 sgRNAs targeting the 300 bp promoter, coupled with FACS-based enrichment of HbF-expressing populations. This identified new and previously known variants that increase HbF expression. Importantly, the authors also performed pooled single-cell genotyping to link nucleotide variants to the phenotype. In a separate study, Chen et al. took a tiered approach, combining deep learning-based methods with CRISPRi and base editing screens to identify variants in enhancers of CD69 (Chen et al., 2023). Finally, Morris et al. extended this base editing screening strategy to functionally test GWAS variants for blood traits with a single-cell RNA-seq readout (Morris et al., 2023). Besides base editing, other methods such as prime editing, which is currently being used to study variants in protein-coding regions, could also be applied to study enhancer variants (Erwood et al., 2022).
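Base editing outcomes depend on which bases fall inside the editor's activity window. The Python sketch below assumes a cytosine base editor acting on protospacer positions 4 to 8, counted from the PAM-distal end. This is a commonly cited window for early cytosine base editors; actual windows vary by editor and sequence context. The sketch lists candidate C-to-T edits, so that bystander edits become apparent when more than one C falls in the window. The function names are hypothetical.

```python
def cbe_candidate_edits(protospacer, window=(4, 8)):
    """Return 1-based protospacer positions of C bases inside the assumed activity window.

    Positions are counted from the PAM-distal end (position 1) toward the PAM.
    """
    start, end = window
    return [pos for pos, base in enumerate(protospacer.upper(), start=1)
            if base == "C" and start <= pos <= end]


def fully_edited(protospacer, window=(4, 8)):
    """Sequence expected if every C inside the window were converted to T."""
    seq = list(protospacer.upper())
    for pos in cbe_candidate_edits(protospacer, window):
        seq[pos - 1] = "T"
    return "".join(seq)
```

If a guide's window contains two Cs, installing a variant at one position may also edit the other. This is one reason why genotyping edited cells, as in the single-cell genotyping described above, strengthens variant-to-phenotype links.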
One challenge in studying enhancer variants is mapping variants to phenotypes. In the BCL11A enhancer dissection study, the authors sorted cells by HbF level and quantified the sgRNAs enriched in each population. While this enables perturbation at scale, sgRNA abundance is only an indirect measure of the edits actually present in genomic DNA. Variant-level analyses that measure nucleotide edits at scale have recently been applied to exons. Findlay et al. used saturation genome editing to study 13 exons of BRCA1 and characterize variants of uncertain significance (Findlay et al., 2018). This study used homology-directed repair to install all possible single nucleotide variants and quantified their effects using growth-based screens. The authors sequenced the installed variants in genomic DNA and also measured their effects at the RNA level. This offers a comprehensive way to link variants to gene expression. Extending this approach to non-coding elements in a disease-relevant system could map the effects of enhancer variants on target genes at nucleotide resolution.
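Saturation genome editing scores variants directly from sequencing of genomic DNA at different time points, rather than from guide abundance. The Python sketch below assumes per-variant read counts at an early and a late time point. It applies a linear rescaling anchored on synonymous variants (score 0) and nonsense variants (score -1). This normalization is in the spirit of the BRCA1 study rather than a reproduction of its pipeline, and the function and class labels are hypothetical.

```python
import math
from statistics import median


def function_scores(early, late, variant_class, pseudocount=0.5):
    """Score variants by depletion between an early and a late time point.

    early / late: {variant: read count in genomic DNA}
    variant_class: {variant: "synonymous" | "nonsense" | other label}
    Raw score = log2(late frequency / early frequency); scores are then rescaled so
    the median synonymous variant scores 0 and the median nonsense variant scores -1.
    """
    early_total = sum(early.values())
    late_total = sum(late.values())
    raw = {}
    for variant, early_reads in early.items():
        early_freq = (early_reads + pseudocount) / early_total
        late_freq = (late.get(variant, 0) + pseudocount) / late_total
        raw[variant] = math.log2(late_freq / early_freq)
    syn = median(raw[v] for v in raw if variant_class.get(v) == "synonymous")
    non = median(raw[v] for v in raw if variant_class.get(v) == "nonsense")
    return {v: (score - syn) / (syn - non) for v, score in raw.items()}
```

Anchoring on internal reference classes makes scores comparable across experiments. For enhancers, an analogous scheme would need non-coding reference variants of known effect, which are rarely available.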
Future challenges
GWAS indicate that ~90% of disease-associated variants are non-coding, and many likely lie within enhancers (Edwards et al., 2013). However, to date, the vast majority of enhancers and their phenotype-associated genetic variants remain uncharacterized. Thus, to understand the molecular and cellular mechanisms of developmental diseases, a key future challenge will be developing new technologies that enable more comprehensive functional studies of enhancers and their genetic variants. We view these genomic technologies along three dimensions: the resolution of genomic perturbations, the complexity of the cellular system, and the depth of the readout (Figure 2). Here, we evaluate current technologies along each of these dimensions and discuss future challenges.
Genomic perturbation: From elements to variants
MPRAs enable high-throughput study of enhancer function at base-pair resolution but lack endogenous context. Conversely, CRISPRi/a studies enable medium-scale perturbation of enhancers in their native context, but lack the resolution to assess individual variants sequenced in patients (Armendariz et al., 2022; Fulco et al., 2019; Fulco et al., 2016; Xie et al., 2017). One recent development is the use of base editing or prime editing to test the molecular and cellular phenotypes of endogenously installed variants. These tools are already being used to study disease-relevant protein-coding regions (Erwood et al., 2022). A key future goal will be comprehensive analysis of enhancers that combines endogenous context with base-pair resolution. This could be achieved by combining multiple technologies (CRISPRi/a for endogenous context and MPRA for base-pair resolution) or by scalable base/prime editing approaches. However, one complication is enhancer redundancy, in which multiple enhancers or variants cooperatively regulate a given gene. Notable examples indicate that variants in multiple enhancers may jointly contribute to disease development (Cannavò et al., 2016; Corradin et al., 2014; Corradin and Scacheri, 2014; Kvon et al., 2021; Osterwalder et al., 2018). Combinatorial perturbation of multiple enhancers will require more efficient perturbation systems and will add significant complexity to functional analyses.
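The cost of combinatorial perturbation can be estimated by simple counting. For illustration, assume every combination of k enhancers is targeted and each enhancer is represented by g interchangeable sgRNAs. The number of distinct constructs is then C(n, k)·g^k, as in this Python sketch (the function name is hypothetical):

```python
from math import comb


def combinatorial_library_size(n_enhancers, order=2, guides_per_enhancer=1):
    """Distinct multiplexed perturbations when every set of `order` enhancers is
    targeted together and each enhancer can be hit by any of its sgRNAs."""
    return comb(n_enhancers, order) * guides_per_enhancer ** order
```

Moving from 10 to 100 candidate enhancers raises the pairwise library from 45 to 4,950 combinations. This is before any guide redundancy or per-construct cell coverage is considered.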
Cell systems: From cell lines to organoids to animal models
Existing enhancer studies have largely focused on static in vitro systems, especially immortalized or primary cell lines (Fulco et al., 2019). However, because enhancer activity is temporally specific, dynamic biological systems are required to model development and capture the impact of enhancers on developmental phenotypes. One approach is to perform CRISPRi screens in human pluripotent stem cells (hPSCs) during differentiation (Armendariz et al., 2022; Genga et al., 2019; Wu et al., 2022). A key challenge is that enhancer activity is time-dependent. Understanding how enhancers work together to establish and maintain gene expression networks may therefore require dense sampling of time points (Bonn et al., 2012).
More complex biological systems that better recapitulate development, such as human organoids and mouse models, are needed to study the spatiotemporal impacts of enhancers. Organoids are currently available for a variety of developmental systems, including the brain, kidney, and heart (Di Lullo and Kriegstein, 2017; Lewis-Israeli et al., 2021; Nishinakamura, 2019). One advantage is that these systems provide 3D models of human development. A CRISPRi screen has been applied to an organoid model to identify key TFs in fetal lung development through perturbation of select promoters (Sun et al., 2022). Similar 3D systems could be used to study the role of enhancers in developmental disease. Current in vivo systems can interrogate individual enhancers and variants (Smemo et al., 2012). Although in vivo studies offer the highest disease relevance, they suffer from low throughput. As a result, they are best used to characterize enhancers that have been prioritized by screens in simpler models. While in vivo CRISPR screens have been accomplished, perturbation complexity remains a key challenge (Gemberling et al., 2021; Jin et al., 2020). A delivery system is needed that can perturb enhancers endogenously, with high efficiency, at the developmental time point when their activity can be captured. However, the number of cells available for analysis at relevant developmental stages may be limiting. Until this critical issue is addressed, organoid models will likely be the more tractable system for perturbation screens in the near future. A tiered approach using multiple systems of increasing complexity could also serve as a bridge until newer technologies are developed.
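The cell-number constraint can be estimated with back-of-the-envelope arithmetic. The sketch below asks how many input cells a pooled screen needs so that, after losses from delivery, editing, and recovery (lumped here into one hypothetical `efficiency` fraction), each perturbation is still represented by a target number of cells. All parameter values are illustrative assumptions, not recommendations.

```python
from fractions import Fraction
from math import ceil


def input_cells_needed(n_perturbations, cells_per_perturbation, efficiency):
    """Input cells required so that each perturbation keeps cells_per_perturbation cells
    when only a fraction `efficiency` of cells is successfully perturbed and recovered."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Fraction(str(...)) keeps the arithmetic exact (0.3 becomes 3/10),
    # so ceil() does not round up because of floating-point error.
    exact = Fraction(n_perturbations * cells_per_perturbation) / Fraction(str(efficiency))
    return ceil(exact)


if __name__ == "__main__":
    # Hypothetical screen: 500 enhancers x 3 guides, 100 cells per guide, 30% overall efficiency.
    print(input_cells_needed(500 * 3, 100, 0.3))
```

With these assumed values the screen needs 500,000 input cells; a tissue that yields far fewer cells at the relevant developmental stage would force fewer perturbations or fewer cells per perturbation.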
Readouts: From expression to morphology to function
Interpreting the function of an enhancer or its variants is limited by the sensitivity and scalability of the phenotype measured. Many phenotypes can be measured in cells, including growth, transcription, and morphology. Gene expression is a common readout for enhancer studies (Armendariz et al., 2022; Canver et al., 2015; Diao et al., 2017; Fulco et al., 2016; Reilly et al., 2021). Gene expression readouts involve a trade-off between the number of genes measured and the sensitivity with which they are detected. RNA sequencing-based approaches can provide an unbiased measurement of the transcriptome. However, detecting lowly expressed genes remains a challenge, especially at the single-cell level. This is particularly relevant for transcription factors, which are often expressed at low or moderate levels (Pokhilko et al., 2021). In contrast, direct RNA labeling by fluorescence in situ hybridization (FISH) is more sensitive. For example, FISH-based methods have been adapted to identify regulatory elements of key genes by perturbing enhancers and then sorting cells based on fluorescence intensity after RNA labeling (Fulco et al., 2019; Reilly et al., 2021). One drawback is that FISH-based approaches can survey only a handful of pre-selected genes. More recent approaches, including MERFISH, seqFISH, osmFISH, and others, can increase the number of genes detected but are significantly more challenging to establish (Codeluppi et al., 2018; De Biase et al., 2021; Eng et al., 2019; Xia et al., 2019). Similarly, targeted transcript amplification has been developed to enrich for genes of interest (Schraivogel et al., 2020). These approaches, together with improved single-cell chemistry, will likely increase the sensitivity of transcript detection.
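The FISH-and-sort design can be summarized as a comparison of guide abundance between expression bins. The sketch below is a simplified two-bin version with hypothetical guide names and a pseudocount: it normalizes guide counts within each sorted bin and reports the log2 ratio of the low-expression bin to the high-expression bin, so guides that disrupt an enhancer required for the target gene score positive. It is not the statistical model used by Fulco et al. (2019) or Reilly et al. (2021), which sort cells into more bins and fit more detailed models.

```python
import math


def normalize(counts, pseudocount=1.0):
    """Convert raw guide counts to frequencies, adding a pseudocount so zero counts stay finite."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {guide: (n + pseudocount) / total for guide, n in counts.items()}


def bin_log2_enrichment(low_bin, high_bin, pseudocount=1.0):
    """Per guide: log2(frequency in low-expression bin / frequency in high-expression bin)."""
    guides = sorted(set(low_bin) | set(high_bin))
    low = normalize({g: low_bin.get(g, 0) for g in guides}, pseudocount)
    high = normalize({g: high_bin.get(g, 0) for g in guides}, pseudocount)
    return {g: math.log2(low[g] / high[g]) for g in guides}


if __name__ == "__main__":
    # Hypothetical read counts of guides recovered from each sorted bin.
    low_bin = {"enhancer_guide": 90, "nontargeting_guide": 10}
    high_bin = {"enhancer_guide": 10, "nontargeting_guide": 90}
    for guide, score in bin_log2_enrichment(low_bin, high_bin).items():
        print(f"{guide}\t{score:+.2f}")
```

Here the enhancer-targeting guide scores positive and the non-targeting guide negative; in practice, scores would be compared against the distribution of non-targeting controls rather than read in isolation.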
While gene expression readouts are common, more complex phenotypes could offer different insights into disease states. Morphological and functional readouts are two such phenotypes. Morphological phenotyping has been enabled by advances in imaging. For example, Cell Painting is a multiplexed image-based assay that uses fluorescent dyes to label cellular components such as mitochondria, the Golgi apparatus, and the cytoskeleton (Bray et al., 2016). Subtle perturbation phenotypes, such as changes in size, shape, and structure, can then be interrogated across multiple cellular components. In combination with CRISPR perturbations, optical screens can provide information on organelle localization, cell morphology, and cell-cell interactions (Feldman et al., 2019). Optical CRISPR screens have been used to characterize multiple phenotypic features following perturbation of essential genes (Funk et al., 2022). Because optical screens are compatible with live-cell imaging, combining them with enhancer perturbation could provide insights into cellular dynamics during lineage specification. Beyond morphology, enhancers and their variants can lead to organismal and physiological phenotypes. For example, Cunningham et al. used genetic knockouts to show that two enhancers of Tbx5 are not essential for limb development (Cunningham et al., 2018). However, such phenotypes are not amenable to high-throughput characterization (Bender et al., 2000; Cunningham et al., 2018; Johnson et al., 2012). Several functional assays have been developed that incorporate high-throughput techniques for large-scale screening. Patch-seq is one such readout, combining whole-cell electrophysiological recordings with single-cell RNA sequencing and immunochemistry (Cadwell et al., 2016; Fuzik et al., 2016). Such multimodal readouts will enable perturbations to be linked to both gene expression and functional phenotypes, yielding fuller insights into an enhancer’s role in disease and development.
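Image-based readouts are usually reduced to per-cell feature vectors (for example, organelle size or intensity measurements extracted by image-analysis software) before perturbations are compared. One common normalization, sketched below under that assumption, z-scores each feature against non-targeting control cells and summarizes each perturbation by the per-feature median. The labels and feature values are hypothetical, and this is not the specific pipeline of Feldman et al. (2019) or Funk et al. (2022).

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def perturbation_profiles(cells, control_label="non-targeting"):
    """cells: list of (perturbation_label, feature_vector) pairs.
    Returns {label: median per-feature z-score relative to control cells}."""
    controls = [features for label, features in cells if label == control_label]
    if not controls:
        raise ValueError("no control cells found")
    n_features = len(controls[0])
    mu = [mean(c[j] for c in controls) for j in range(n_features)]
    # Guard against zero-variance features so the z-score stays defined.
    sd = [pstdev([c[j] for c in controls]) or 1.0 for j in range(n_features)]

    z_by_label = defaultdict(list)
    for label, features in cells:
        if label != control_label:
            z_by_label[label].append([(x - m) / s for x, m, s in zip(features, mu, sd)])

    # Median across cells is less sensitive to segmentation outliers than the mean.
    return {
        label: [median(z[j] for z in zs) for j in range(n_features)]
        for label, zs in z_by_label.items()
    }
```

A profile near zero in every feature suggests the perturbation is morphologically indistinguishable from controls; profiles can then be clustered to group enhancers with similar cellular effects.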
Predictive modeling
Because there are millions of regulatory elements in the human genome and many more sequence variants, it is not feasible to experimentally test all of them for their impact on developmental disorders. Computational models that accurately predict the regulatory activity of sequence variants will be essential to close this gap. Several recent advances highlight the promise of this nascent field. For example, Sei is a sequence-based deep learning model trained on a compendium of ~20,000 epigenetic features, including open chromatin and transcription factor binding profiles from ~1300 cell lines and tissues (Chen et al., 2022). Given a DNA sequence, Sei predicts whether the sequence is a regulatory element and in which cellular contexts it is active. Because Sei models sequence directly, it can also predict how genetic variants alter enhancer activity. Similarly, Enformer applies deep learning to predict gene regulatory activity from genomic sequence (Avsec et al., 2021). A key advance of Enformer is its ability to integrate information from distal elements up to ~100 kb away, which improves its predictions. In this way, Enformer can capture long-range interactions between enhancers and promoters.
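Sequence-based models score a variant by comparing predictions for the reference and alternative sequences. The sketch below shows that reference-versus-alternative logic with a toy position weight matrix (PWM) standing in for a trained deep learning model; the motif, its probabilities, and the function names are hypothetical, and the code does not use or reproduce the Sei or Enformer interfaces.

```python
import math

# Hypothetical 4-bp motif (consensus AGTC) as per-position base probabilities.
# This toy PWM stands in for a trained sequence model.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # uniform base frequency


def toy_activity(seq):
    """Best log-odds motif match anywhere in seq (-inf if seq is shorter than the motif)."""
    width = len(MOTIF)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(math.log2(MOTIF[k][seq[start + k]] / BACKGROUND) for k in range(width))
        best = max(best, score)
    return best


def variant_effect(seq, pos, alt, model=toy_activity):
    """Predicted effect of substituting `alt` at 0-based `pos`: model(alt seq) - model(ref seq)."""
    if alt not in "ACGT" or alt == seq[pos]:
        raise ValueError("alt must be a base different from the reference")
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return model(mutated) - model(seq)


if __name__ == "__main__":
    seq = "TTAGTCTT"  # contains the toy motif AGTC at 0-based positions 2-5
    print(variant_effect(seq, 3, "C"))  # disrupts the motif: negative effect
    print(variant_effect(seq, 0, "A"))  # outside the motif: no change
```

Applying `variant_effect` to every position and alternative base gives an in silico saturation mutagenesis map; with a real model, `model` would be replaced by a call to that model's own prediction function.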
Predictive modeling is an exciting area of research with considerable room for improvement. First, current approaches have focused on predicting molecular phenotypes, including expression and epigenetic status. Future needs include accurately modeling how biological networks and pathways are perturbed by sequence variants, as well as more complex cellular and organismal phenotypes. Second, accurate predictions rely on good training data. However, much existing training data is derived from cell lines, and epigenetic data from human developmental systems, especially in vivo, remain scarce. In addition, because data from systematic perturbation studies are limited, existing models will need to be refined as these data become available. Active learning strategies that deeply integrate experimental and modeling components will be crucial for directing experiments to where they can most improve computational models. Third, predictive models need to be accurate across diverse populations, which requires the ability to model the effect of genetic background. Fourth, explainable AI models will offer insights into how these models work and which features are important for prediction (Novakovsky et al., 2023). We anticipate that improved predictive modeling will drive clinical applications that interpret the molecular, cellular, and organismal impact of newly identified variants in patients.
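One simple form of the active learning loop described above is uncertainty sampling: train several models and send to the bench the variants on which they disagree most. The sketch below assumes an ensemble has already produced effect predictions for each candidate variant; the variant identifiers, scores, and the use of prediction variance as the uncertainty measure are illustrative choices, not a method prescribed in the literature cited here.

```python
from statistics import pvariance


def select_for_testing(ensemble_predictions, budget):
    """Rank variants by disagreement (population variance) across ensemble members
    and return the `budget` most uncertain ones for experimental testing."""
    ranked = sorted(
        ensemble_predictions,
        key=lambda variant: pvariance(ensemble_predictions[variant]),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical predicted effects from three ensemble members per variant.
    predictions = {
        "var_1": [0.10, 0.10, 0.10],   # models agree: little to learn
        "var_2": [-1.00, 1.00, 0.00],  # models disagree strongly
        "var_3": [0.00, 0.50, 0.20],
    }
    print(select_for_testing(predictions, budget=2))
```

After the selected variants are measured, their results would be added to the training data and the ensemble retrained, closing the loop between experiment and model.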
Conclusion
From the initial discovery of enhancers in SV40 four decades ago, the field has witnessed a rapid progression of tools to characterize enhancers and their variants in development (Banerji et al., 1981). The 2000s saw the development of advanced reporter systems to characterize enhancer activity in vivo, culminating in the development of MPRAs (Patwardhan et al., 2012; Patwardhan et al., 2009). The 2010s witnessed the comprehensive mapping of enhancers, as well as new genome engineering tools to perturb enhancers endogenously at scale, both at increasingly fine cellular resolution. In the coming decade, we anticipate that innovative technologies will spearhead the high-throughput characterization of how developmental enhancers and their genetic variants impact molecular and cellular phenotypes in vivo (Figure 2). However, experimental strategies alone will not be sufficient to gain comprehensive views of all enhancers at nucleotide and cellular resolution. Predictive modeling and machine learning approaches will be instrumental in achieving this goal. Ultimately, this knowledge will enable the interpretation of enhancer variants in both research and clinical settings.
References
- BCL11A deletions result in fetal hemoglobin persistence and neurodevelopmental alterations. The Journal of Clinical Investigation 125:2363–2368. https://doi.org/10.1172/JCI81163
- Genetic regulation of adipose gene expression and cardio-metabolic traits. American Journal of Human Genetics 100:428–443. https://doi.org/10.1016/j.ajhg.2017.01.027
- The use of brain organoids to investigate neural development and disease. Nature Reviews Neuroscience 18:573–584. https://doi.org/10.1038/nrn.2017.107
- Beyond GWASs: illuminating the dark road from association to function. American Journal of Human Genetics 93:779–797. https://doi.org/10.1016/j.ajhg.2013.10.012
- Engineering large-scale chromosomal deletions by CRISPR-Cas9. Nucleic Acids Research 49:12007–12016. https://doi.org/10.1093/nar/gkab557
- Saturation variant interpretation using CRISPR prime editing. Nature Biotechnology 40:885–895. https://doi.org/10.1038/s41587-021-01201-1
- Genetics of congenital heart disease: the glass half empty. Circulation Research 112:707–720. https://doi.org/10.1161/CIRCRESAHA.112.300853
- Towards a comprehensive catalogue of validated and target-linked human enhancers. Nature Reviews Genetics 21:292–310. https://doi.org/10.1038/s41576-019-0209-0
- Muscle creatine kinase sequence elements regulating skeletal and cardiac muscle expression in transgenic mice. Molecular and Cellular Biology 9:3393–3399. https://doi.org/10.1128/mcb.9.8.3393-3399.1989
- Cis-element mutated in GATA2-dependent immunodeficiency governs hematopoiesis and vascular integrity. The Journal of Clinical Investigation 122:3692–3704. https://doi.org/10.1172/JCI61623
- Chromatin loops in gene regulation. Biochimica et Biophysica Acta 1789:17–25. https://doi.org/10.1016/j.bbagrm.2008.07.002
- Long-range control of gene expression: emerging mechanisms and disruption in disease. American Journal of Human Genetics 76:8–32. https://doi.org/10.1086/426833
- Enhancer redundancy in development and disease. Nature Reviews Genetics 22:324–336. https://doi.org/10.1038/s41576-020-00311-x
- Enhancer networks revealed by correlated DNAse hypersensitivity states of enhancers. Nucleic Acids Research 41:6828–6838. https://doi.org/10.1093/nar/gkt374
- Assembly of custom TALE-type DNA binding domains by modular cloning. Nucleic Acids Research 39:5790–5799. https://doi.org/10.1093/nar/gkr151
- Human kidney organoids: progress and remaining challenges. Nature Reviews Nephrology 15:613–624. https://doi.org/10.1038/s41581-019-0176-x
- Obtaining genetics insights from deep learning via explainable artificial intelligence. Nature Reviews Genetics 24:125–137. https://doi.org/10.1038/s41576-022-00532-2
- High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nature Biotechnology 27:1173–1175. https://doi.org/10.1038/nbt.1589
- Massively parallel functional dissection of mammalian enhancers in vivo. Nature Biotechnology 30:265–270. https://doi.org/10.1038/nbt.2136
- FLASH assembly of TALENs for high-throughput genome editing. Nature Biotechnology 30:460–465. https://doi.org/10.1038/nbt.2170
- Methods of massive parallel reporter assays for investigation of enhancers. Vavilov Journal of Genetics and Breeding 25:344–355. https://doi.org/10.18699/VJ21.038
- Transcriptional enhancers: from properties to genome-wide predictions. Nature Reviews Genetics 15:272–286. https://doi.org/10.1038/nrg3682
- Regulatory variation in a TBX5 enhancer leads to isolated congenital heart disease. Human Molecular Genetics 21:3255–3263. https://doi.org/10.1093/hmg/dds165
- The interdependence of gene-regulatory elements and the 3D genome. The Journal of Cell Biology 218:12–26. https://doi.org/10.1083/jcb.201809040
- VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Research 35:D88–D92. https://doi.org/10.1093/nar/gkl822
- Vertebrate limb bud development: moving towards integrative analysis of organogenesis. Nature Reviews Genetics 10:845–858. https://doi.org/10.1038/nrg2681
- A Sox2 distal enhancer cluster regulates embryonic stem cell differentiation potential. Genes & Development 28:2699–2711. https://doi.org/10.1101/gad.248526.114
Funding
Cancer Prevention and Research Institute of Texas (RP190451)
- Gary C Hon
National Institutes of Health (DP2GM128203)
- Gary C Hon
National Institutes of Health (UM1HG011996)
- Gary C Hon
National Institutes of Health (1R35GM145235)
- Gary C Hon
Burroughs Wellcome Fund (1019804)
- Gary C Hon
Welch Foundation (I-2103-20220331)
- Gary C Hon
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank the Hon lab for their helpful comments and suggestions in the preparation of this review. GCH is supported by CPRIT (RP190451), NIH (DP2GM128203, UM1HG011996, 1R35GM145235), the Burroughs Wellcome Fund (1019804), the Welch Foundation (I-2103-20220331), and the Green Center for Reproductive Biology.
Copyright
© 2023, Armendariz, Sundarrajan et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.