Predicting the effect of CRISPR-Cas9-based epigenome editing

Sanjit Singh Batra; Alan Cabrera; Jeffrey P Spence; Jacob Goell; Selvalakshmi S Anand; Isaac B Hilton; Yun S Song

doi:10.7554/eLife.92991.2

eLife Assessment

This study presents an advance in efforts to use histone post-translational modification (PTM) data to model gene expression and predict epigenetic editing activity. Such models are broadly useful to the research community, especially ones that can model epigenetic editing activity, which is novel; additionally, the authors have nicely integrated datasets across cell types into their model. The work is mostly solid, but it would be strengthened by performing rigorous comparisons to existing methods that predict gene expression from PTM data and from additional model validation beyond dCas9-p300 based perturbations.

https://doi.org/10.7554/eLife.92991.2.sa3

Significance of findings

useful: Findings that have focused importance and scope

landmark
fundamental
important
valuable
useful

Strength of evidence

solid: Methods, data and analyses broadly support the claims with only minor weaknesses

exceptional
compelling
convincing
solid
incomplete
inadequate

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife assessments

Abstract

Epigenetic regulation orchestrates mammalian transcription, but functional links between them remain elusive. To tackle this problem, we use epigenomic and transcriptomic data from 13 ENCODE cell types to train machine learning models to predict gene expression from histone post-translational modifications (PTMs), achieving transcriptome-wide correlations of ∼ 0.70 − 0.79 for most cell types. Our models recapitulate known associations between histone PTMs and expression patterns, including predicting that acetylation of histone subunit H3 lysine residue 27 (H3K27ac) near the transcription start site (TSS) significantly increases expression levels. To validate this prediction experimentally and investigate how natural vs. engineered deposition of H3K27ac might differentially affect expression, we apply the synthetic dCas9-p300 histone acetyltransferase system to 8 genes in the HEK293T cell line and to 5 genes in the K562 cell line. Further, to facilitate model building, we perform MNase-seq to map genome-wide nucleosome occupancy levels in HEK293T. We observe that our models perform well in accurately ranking relative fold-changes among genes in response to the dCas9-p300 system; however, their ability to rank fold-changes within individual genes is noticeably diminished compared to predicting expression across cell types from their native epigenetic signatures. Our findings highlight the need for more comprehensive genome-scale epigenome editing datasets, better understanding of the actual modifications made by epigenome editing tools, and improved causal models that transfer better from endogenous cellular measurements to perturbation experiments. Together these improvements would facilitate the ability to understand and predictably control the dynamic human epigenome with consequences for human health.

Introduction

All cells within a multicellular organism have the same genetic sequence up to a minuscule number of somatic mutations. Yet, many cell types exist with diverse morphological and functional traits. Epigenetics is an important regulator and driver of this diversity by allowing differences in cellular state and gene expression despite having the same genotype (Taherian Fard and Ragan, 2019). Indeed, cells traversing the trajectory from pluripotency through terminal differentiation have essentially the same genotype.

Epigenetic modifications such as post-translational modifications (PTMs) to histone proteins are involved in many vital regulatory processes influencing genomic accessibility, nuclear compartmentalization and transcription factor binding and recognition (Reik et al., 2001; Kouzarides, 2007; Gibney and Nolan, 2010; Klemm et al., 2019; Hafner and Boettiger, 2022; Zhang and Reinberg, 2001). The Histone Code Hypothesis suggests that combinations of different histone PTMs specify distinct chromatin states thereby regulating gene expression (Strahl and Allis, 2000; Jenuwein and Allis, 2001).

The field of epigenome editing has produced new tools for understanding the outcomes of epigenetic perturbations that promise to be useful for therapeutics by enabling fine-tuned control of gene expression (Matharu and Ahituv, 2020; Thakore et al., 2016; Goell and Hilton, 2021; Stricker et al., 2017). Currently small molecule drugs are used to potently interfere with epigenetic regulation of gene expression. For example Vorinostat inhibits histone deacetylases thereby impacting the epigenetic landscape (Estey, 2013; Yoon and Eom, 2016). However, small molecules globally disrupt the epigenome and transcriptome, and therefore are not suitable for targeting individual dysregulated genes nor clarifying epigenetic regulatory mechanisms (Swaminathan et al., 2007). Meanwhile, numerous tools have been designed to harness catalytically dead Cas9 (dCas9) to target epigenetic modifiers to DNA sequences encoded in guide RNAs (gRNAs) (Jinek et al., 2012; Mali et al., 2013; Hilton et al., 2015; Stepper et al., 2017; Kwon et al., 2017; Li et al., 2021). CRISPR-Cas9-based epigenome editing strategies facilitate unprecedented, precise control of the epigenome and gene activation providing a path to epigenetic-based therapeutics (Cheng et al., 2019).

A major challenge for epigenome editing is designing gRNAs that can achieve a desired level of transcriptional or epigenetic modulation. Finding effective gRNAs currently typically requires expensive and low throughput experimental strategies (Mohr et al., 2016; Liu et al., 2020; Mahata et al., 2023). An alternative approach would be to computationally model how epigenome editing impacts histone PTMs as well as how perturbing these PTMs would consequently impact gene expression.

To understand how histone PTMs relate to gene expression, large epigenetic and transcriptomic datasets are required. Advancements in high-throughput sequencing have allowed quantification of gene expression and profiling of histone PTMs. Large consortia have performed an extensive number of assays across a wide variety of cell types (ENCODE Project Consortium, 2012; Roadmap Epigenomics Consortium et al., 2015; Barrett et al., 2012).

These include measurements of histone PTMs, transcription factor binding, gene expression, and chromatin accessibility. These data have enhanced our understanding of how histone PTMs and other chromatin dynamics impact transcriptional regulation (Keung et al., 2015; Rao et al., 2014; Holoch and Moazed, 2015).

Studying the function of these histone PTMs, however, has been largely limited to statistical associations with gene expression, which may not capture causal relationships (Karlić et al., 2010; Stillman, 2018; Singh et al., 2016). For example deep learning has been successful in predicting gene expression from epigenetic modifications, such as transcription factor binding (Schmidt et al., 2017), chromatin accessibility (Schmidt et al., 2020), histone PTMs (Singh et al., 2016; Sekhon et al., 2018; Frasca et al., 2022; Singh et al., 2017; Hamdy et al., 2022; Chen et al., 2022), and DNA methy-lation (Zhong et al., 2019). However, these studies predict gene expression as binary levels instead of a continuous quantity. Finally, as statistical associations can be driven by non-causal mechanisms, it is unclear whether such computational models learn mechanistic, causal relationships between various epigenetic modifications and gene expression. Beyond modeling the relationship between histone PTMs and gene expression, to fully describe how a particular gRNA would affect gene expression, a model of how epigenome editing affects histone PTMs is also required. To our knowledge, there currently are no computational models that can accurately model, in silico, the impact of epigenome editing on histone PTMs.

Motivated by these observations, we explored models for how epigenome editing impacts histone PTMs as well as how histone PTMs impact gene expression. We used data available through ENCODE (Schreiber et al., 2020a; ENCODE Project Consortium, 2012) to train a model of how histone PTMs impact gene expression. Our model is highly predictive of endogenous expression and learns an understanding of chromatin biology which is consistent with known patterns of various histone PTMs (Kimura, 2013). To test this model in the context of epigenome editing, we generated perturbation data using the dCas9-p300 histone acetyltransferase system (Hilton et al., 2015). The dCas9-p300 system is thought to act primarily through local acetylation of histone lysine residues, particularly histone subunit H3 lysine residue 27 (H3K27ac). Therefore, we modeled the impact of dCas9-p300 on the epigenome as a local increase in the H3K27ac profile near the target site; since the precise effect of these perturbations is unknown, we tried a variety of potential modification patterns. We then applied our trained model to predict the impact of these putative H3K27ac modifications on gene expression (Figure 1). We found that our models, which are designed to predict gene expression values, were effective in ranking relative fold-changes among genes in response to the dCas9-p300 system, achieving a Spearman’s rank correlation of ∼0.8. However, their performance in ranking fold-changes within individual genes was less successful when compared to the prediction of gene expression across cell types from their native epigenetic signatures. We offer possible explanations in the discussion section.

Schematic of the epigenome editing prediction pipeline.
The pipeline uses epigenetic data to train models to predict endogenous gene expression. These models were used to predict fold-change in gene expression based on perturbed histone PTM input data, and their predictions were validated using CRISPR-Cas9-based epigenome editing data.

Results

Histone PTM data are highly predictive of gene expression

Genome-scale datasets are required to train models to predict gene expression using histone PTMs. Therefore, we obtained histone PTM ChIP-seq and RNA-seq data for 13 different human cell types from ENCODE (Schreiber et al., 2020a; ENCODE Project Consortium, 2012) (Appendix Table 1). We inspected metagene plots (histone PTMs averaged across genes within gene expression quantiles) describing 6 histone PTMs in each of these 13 different cell types. Based on different overall signal levels across cell types, we concluded that batch effects, likely due to inconsistent sequencing depths, would need to be corrected prior to training models (Figure 2–figure supplement 1).

We corrected these batch effects by adapting S3norm (Xiang et al., 2020) (Materials and Methods, Figure 2–figure supplement 2). These corrected histone PTM tracks were then used for the remainder of our analyses along with RNA-seq data for each of the 13 cell types (Figure 2–figure supplement 3).

Importantly, we observed that H3K27ac and H3K4me3 histone PTM signal strengths positively covaried with gene expression quantile (representative cell types shown in Figure 2; all cell types shown in Figure 2–figure supplement 3). Conversely, repressive histone PTMs such as H3K27me3 and H3K9me3 were strongly inversely correlated with gene expression quantiles. Spatial patterns in the metagene plots for H3K36me3 suggested that this mark covaried more strongly with gene expression in the gene body than near the TSS. Taken together, these observations recapitulated the current understanding of these well-studied histone PTMs with respect to their associations to gene expression (Kimura, 2013; Millán-Zambrano et al., 2022; Zhao et al., 2021).

Metagene plots show histone PTMs are consistent across cell types and recapitulate established relationships between histone PTMs and gene expression.
Colors represent genes binned into quantiles based on gene expression. Blue 75-100%, Orange 50-75%, Green 25-50%, Red 0-25% of gene expression within a cell type. The y-axis represents − log₁₀(p-value) obtained from ChIP-seq data.

Histone PTMs accurately predict endogenous gene expression

To predict how epigenome editing affects gene expression, we first trained models to predict gene expression from endogenous histone PTMs. We trained several convolutional neural networks (CNNs) and ridge regression models to predict the gene expression of each gene in each of the 13 cell types, using only histone PTM data proximal to the TSS as features (Materials and Methods, Figure 2). We observed that Spearman’s rank correlation between the true gene expression and the models’ predicted gene expression on held-out chromosomes improves as the input context size increases; and for all input context sizes, the CNNs outperform ridge regression models (Figure 3A). Therefore, for the remainder of the analyses, we use a context size of 10,000 base pairs.

To assess the models’ ability to generalize to unseen cell types, we trained a set of 10 models for each cell type. In particular, we held out the histone PTMs for a given cell type during training and then tested the models on that held-out cell type.

We observed that the CNNs outperformed ridge regression models on this cross-cell type generalization task across essentially all cell types (Figure 3B). The reduced performance on the adrenal cell type may be driven by a cell-type-specific biological mechanism that leads a lower correlation of its epigenetic data with other cell types, particularly for H3K36me3 (Figure 3–figure supplement 1).

Although our models accurately predicted endogenous gene expression, this does not guarantee their ability to accurately predict the relationship between local histone PTM variations and with gene expression for a particular gene across different cell types. Therefore, we determined Spearman’s rank correlations between the observed expression and the predicted expression for each held-out gene across the different cell types. The distribution of these correlations suggests that overall the CNNs can better rank cell types by gene expression than ridge regression (Figure 3C). In particular, the median cross-cell type correlation is ∼0.53 for CNNs compared to ∼0.39 for ridge regression.

Models recover established relationships between histone PTMs and gene expression

We investigated what features of the data the models used to predict gene expression. For a given gene, we modified the input histone PTMs one-by-one at nucleosome-scale and measured the predicted fold-change in gene expression (Figure 4, Figure 4–figure supplement 1, Materials and Methods).

Features learned by gene expression models.
Each point on the x-axis corresponds to *in silico* perturbation of that assay at that position and the y-axis measures the predicted fold-change in gene expression, averaged across a set of 100 trained models. The fold-changes were averaged across 500 randomly chosen genes.

We observed considerable changes to the predicted fold-change upon modifying different histone PTMs. In particular, our CNN models predict that repressive marks such as H3K27me3 and H3K9me3 proximal to the TSS result in a slight decrease in expression. In contrast, activating histone PTMs such as H3K27ac and H3K4me3 result in an almost two-fold increase in predicted gene expression near the TSS. Activating both of these markers exhibits a periodic pattern, likely reflecting nucleosome occupancy. However, activation of H3K4me3 results in a sharp increase in gene expression downstream of the TSS. Additionally, we observed that H3K36me3 is predicted to increase expression, but only if it is deposited in the gene body, and the degree of activation gradually increases as it is deposited further inside of the gene body. The consistency of these observations with established mechanisms, observed previously in the literature, via which these histone PTMs modulate gene expression (Kimura, 2013) lend credence to our gene expression models and show that these models learn the spatial patterns of histone PTMs.

dCas9-p300 differentially activates genes depending on gRNA-targeted site

To test if our gene expression models could accurately predict the outcome of in situ epigenome editing experiments, we first generated dCas9-p300 data in the HEK293T cell line for 8 genes (Figure 5). We assayed at least 5 gRNAs per gene with at least 3 replicates for each gRNA. We used the HEK293T cell line because it is a widely-used testbed for epigenome editing strategies (Hilton et al., 2015; Nuñez et al., 2021; O’Geen et al., 2017; Mahata et al., 2023; Escobar et al., 2022; Wang et al.,2022). Based on Figure 2 and Figure 4, the largest changes in H3K27ac across gene expression quantiles occur within 500 base pairs of the TSS, so we constrained gRNA targeting to this critical window. We filtered gRNAs for predicted specificity (Concordet and Haeussler, 2018) and on-target activity scores (Sanson et al., 2018). Each gRNA was tested individually, and relative mRNA abundance was measured using quantitative PCR (qPCR).

dCas9-p300 epigenome editing at eight endogenous genes identifies gene specific responses.
The genes tested are *CYP17A1, SOX11, C2CD4B, CXCR4, CD79A, TGFBR1, MYO1G*, and *PRSS12*.
**(A)** gRNA targeting +/− 250 bp of each gene were selected. **(B)** These Selected gRNA were individually co-transfected with dCas9-p300 with relative mRNA determined with qPCR. **(C)** Relative mRNA associated with selected guide position are displayed with the highest activating guide position marked in orange. The Y-axis corresponds to qPCR fold-change.

We successfully increased gene expression of all 8 genes with fold-change activation using the most effective respective gRNA for each gene ranging from 3-fold to ∼6,500-fold relative to a non-targeting control gRNA (Figure 5C). Some of this variation may be explained by differences in endogenous gene expression levels, with the targeting of lowly expressed genes resulting in higher fold-change measurements (Appendix Table 2), as observed previously (Wang et al., 2022). Never-theless, substantial variability was observed in gRNA efficacy for all targeted genes. In particular, two (MYO1G and PRSS12) out of eight genes had the most efficacious gRNA downstream of the TSS. This contrasts with other reports where targeting CRISPR/Cas based activators upstream of the TSS leads to the highest activation (Mohr et al., 2016; Gilbert et al., 2014).

These data indicate that the rules governing the outcomes for successful dCas9-p300-based epigenome editing – and subsequent increased transcriptional activation – are complex, and high-lights the fact that locus-specific nuances can be important factors in epigenome editing experiments. For example, two gRNAs targeting within ∼50 base pairs of each other on C2CD4B have a 100-fold difference in measured mRNA (Figure 5C). Further, gRNAs targeting the same position in different genes can have vastly different effects. For instance, several gRNAs targeting ∼250 base pairs upstream of the CYP17A1 TSS result in a high fold-change while two gRNAs targeting roughly the same position in MYO1G failed to produce substantial activation (Figure 5C).

Computationally predicting the outcome of dCas9-p300 epigenome editing experiments

To test the hypothesis that dCas9-p300 acts through the local deposition of H3K27ac, we modeled this process in silico and used these perturbations as inputs to our models trained on endogenous gene expression.

We modeled the effect of dCas9-p300 on histone PTMs based on evidence from the literature as well as additional experiments we performed. The key assumptions of this model are: 1) there exists steric hindrance of dCas9 by nucleosomes (Makasheva et al., 2021; Horlbeck et al., 2016; Isaac et al., 2016; Radzisheuskaya et al., 2016); 2) dCas9-p300 acts locally, altering H3K27ac levels near the gRNA target locus (Gemberling et al., 2021; Dominguez et al., 2022) (we adopted this simplifying assumption since off-target effects are unpredictable and underexplored (Dominguez et al., 2022; Gemberling et al., 2021; Weinert et al., 2018)); 3) dCas9-p300 can deposit H3K27ac at nucleosomes, as defined by MNase activity (see Materials and Methods) (Segelle et al., 2022; Zhou et al., 2016). Our resulting in silico perturbation model had a number of free parameters that we briefly describe below. Wherever possible, we used values for these parameters obtained from the literature or tested a range of plausible values. For a more complete description of the model, see Materials and Methods.

The first component of our perturbation model is steric hindrance of dCas9-p300 by nucleo-somes (Figure 6A). Intuitively, if DNA is tightly wound around a nucleosome, the gRNA would be less likely to bind successfully. Mathematically, we modeled this as an inverse relationship between the amount of H3K27ac deposited and the MNase activity at the gRNA target locus.

*In silico* model for dCas9-p300-based epigenome editing
**(A)** dCas9-p300 is more likely to bind to a position not occupied by the nucleosome. Thicker green arrow represents higher probability of binding for a gRNA targeting that site. **(B)** The *in silico* perturbation is modeled as a Gaussian kernel parameterized by a standard deviation, σ, and the amount of H3K27ac deposited, λ. (C) The final perturbed H3K27ac is obtained by point-wise multiplication of the Gaussian kernel with nucleosome occupancy quantified by MNase activity since dCas9-p300 can only acetylate histones within nucleosomes. **(D)** Ranks for predicted and endogenous expression across 8 genes and 13 cell types. Rank 1 corresponds to the highest numerical value. **(E)** Ranks for predicted and empirically measured expression fold-changes following perturbation by dCas9-p300 for 8 genes in HEK293T cells. Rank 1 corresponds to the highest numerical value.

It is widely assumed that dCas9-p300 activates genes through the local deposition of H3K27ac (Klann et al., 2017; Dominguez et al., 2022). To model this, we increased local levels of H3K27ac relative to endogenous levels according to a Gaussian kernel centered at the gRNA target locus (Figure 6B). This adds acetylation primarily within a distance controlled by the standard deviation (σ) of the kernel. We performed CUT&RUN experiments (see Appendix) that suggest that this distance is at least 1,000 base pairs (Figure 6–figure supplement 1). Since we also do not know the degree to which dCas9-p300 alters H3K27ac levels, we modeled this as another free parameter, λ, which we varied over a range of plausible values (Materials and Methods).

Finally, we assumed that dCas9-p300 does not affect the positioning of nucleosomes and hence can only add H3K27ac at positions currently occupied by histones (Zhou et al., 2016). As such, we expect H3K27ac levels to only increase at loci where there is MNase activity. In particular, we modulated the Gaussian kernel described above, by performing point-wise multiplication with MNase activity (Figure 6C).

Since nucleosome positioning plays a crucial role in our perturbation model, we generated, to our knowledge, the first MNase-seq data for the HEK293T cell line (see Appendix).

To get a baseline of how well our perturbation model might be able to predict the effect of dCas9-p300 on gene expression, we considered the 13 distinct cell types as being analogous to natural perturbations of local histone PTMs. Across the 8 genes discussed above, which were excluded from the training set, we observed a Spearman’s rank correlation of ∼0.8 between the endogenous expression and that predicted by our expression model (Figure 6D). This correlation was in line with the correlation observed across the endogenous transcriptome (Figure 3A,B). We further observed that our expression models were able to accurately rank gene expression across cell types within individual genes (Figure 6–figure supplement 2).

We then computed fold-changes between the expression predicted using endogenous histone PTMs and the expression predicted using in silico perturbations of these histone PTMs. We observed that our models were effective in ranking relative fold-changes across genes in response to dCas9-p300, achieving a Spearman’s rank correlation of ∼0.8 between these predicted fold-changes and the experimentally determined mRNA fold-changes induced by dCas9-p300 (Figure 6E). However, the performance in ranking fold-changes within individual genes was less accurate (Figure 6–figure supplement 3) when compared to the prediction of cell-type-specific gene expression from native epigenetic signatures (Figure 6–figure supplement 2).

We extended this analysis to a Perturb-seq dataset, consisting of gRNAs targeting proximal to the TSS of 5 genes in the K562 cell line to further assess the model’s ability to estimate gene expression changes. Consistent with the performance observed in Figure 6E, the model demonstrated robustness in predicting gene expression fold-changes across these 27 gRNAs targeting these 5 genes. Notably, these predictions achieved a Spearman’s rank correlation of ∼ 0.47 with the experimentally determined mRNA fold-changes measured by Perturb-seq, as shown in Figure 6–figure supplement 4 (see Appendix). These results reinforce the model’s effectiveness in capturing the nuanced effects of epigenome editing across different genes and cell types.

Discussion

Here, we sought to investigate whether we could predict how targeted epigenome editing affects endogenous gene expression. First, we collected data from ENCODE which reflects how post-translational modifications (PTMs) to histones covary with gene expression across cell types. We trained models to predict endogenous gene expression from these histone PTMs and found that these models were highly predictive (Figure 3). We further showed that such models learned known relationships between histone PTMs and gene expression (Figure 4). To test whether these expression models could predict the outcomes of epigenome editing experiments, we generated dCas9-p300 epigenome editing data in the HEK293T cell line for eight genes along with genome-wide MNase-seq data for this testbed cell line. We anticipate that the genome-wide nucleosome occupancy information for the HEK293T cell line provided by our MNase-seq experiment will be a useful resource for the genomics community. We also generated dCas9-p300 epigenome editing data via a perturb-seq experiment in K562 cells with gRNAs targeting the promoter regions of five genes. In this study, we focused on the histone changes induced by dCas9-p300 epigenome editing, but future studies may use the framework described in our manuscript and apply it to other transcriptional editors as well.

We modeled dCas9-p300’s impact on local H3K27ac using a variety of parameter choices and found that these models accurately predicted fold-changes across genes. However, they were less accurate at predicting the outcome of these experiments within a given gene, as compared to predicting gene expression from the endogenous epigenetic signatures (Figure 6–figure supplement 2, Figure 6–figure supplement 3). Since the endogenous epigenetic signatures could be different across genes, these global factors might drive the models’ accurate inter-gene fold-change prediction accuracy. However, since ranking fold-changes within a gene requires a detailed understanding of the epigenetic profiles before and after dCas9-p300 epigenome editing, the reduction in performance from predicting endogenous expression to predicting the outcome of epigenome editing experiments is likely explained by one or more of the following hypotheses: 1) dCas9-p300 activates gene expression by mechanisms other than the local acetylation of H3K27 or dCas9-p300 functions differently from native p300; 2) differences in gRNA efficacy are not accurately explained by existing computational scores; or 3) our models, trained on endogenous gene expression across various cell types, failed to generalize even if dCas9-p300 perturbations are correctly modeled. We discuss these possible explanations more in depth below.

We considered numerous models of how dCas9-p300 affects local histone PTMs. These models span current hypotheses of how dCas9-p300 alters local histone PTMs such as H3K27ac. The poor generalization of our models in predicting intra-gene epigenome editing fold-changes could be explained by dCas9-p300 acting via mechanisms beyond local acetylation of histone proteins and H3K27 (Zhao et al., 2021). For example, p300 is a promiscuous lysine acetyltransferase and dCas9-p300 could be broadly acetylating across the proteome impacting trans factors (Weinert et al., 2018). Alternatively, local acetylation could be contingent on unmodeled factors such as trans-acting proteins or other histone PTMs present at the locus (Zhao et al., 2021; Zheng et al., 2021). Furthermore, the genome-wide specificity of dCas9-p300-mediated histone acetylation – although likely better than small molecule-based perturbations – remains imperfect (Gemberling et al., 2021; Dominguez et al., 2022). Our inability to accurately predict the relative fold-change of different gRNAs targeting the same gene suggests that these unmodeled factors would have to differentially affect neighboring loci within the same gene. This highlights that the current understanding of the mechanism via which dCas9-p300 drives gene expression is potentially incomplete. To better understand this mechanism, it would be immensely helpful to generate a compendium of histone PTM profiles before and after performing epigenome editing, which would enable us to train better machine learning models to predict the impact of dCas9-p300 on gene expression.

Another possible explanation for the drop in accuracy is varying gRNA efficacies. For example, gRNAs might have different levels of on-target and off-target effects. Although we ensured that all of the gRNAs used in generating the dCas9-p300 epigenome editing data were predicted to have high on-target and low off-target scores, we observed examples of gRNAs that targeted roughly the same genomic position but had vastly different impacts on gene expression. This suggests that these differences could be driven by inconsistencies in gRNA efficacy instead of local acetylation dynamics. Generating a large number of pairs of gRNAs, such as through CRISPR screens (Schmidt et al., 2022), targeting nearby positions could help to elucidate the factors that drive differential gRNA efficacy for epigenome editing.

The ambiguity in how to accurately model the impact of epigenome editing stands in contrast to the simpler case of DNA sequence changes, where perturbations are relatively trivial to model. Indeed, dCas9-p300 changes histone PTMs in complex ways rendering the modeling of such perturbations much more challenging. In contrast, models like Enformer (Avsec et al., 2021) that predict gene expression directly from DNA sequence may be able to generalize to DNA sequence perturbations better due to their relative simplicity.

Another source of generalization error could be extrapolating beyond the range of the training data. Massively increasing the amount of H3K27ac at a locus may make a gene look different than any other endogenous gene observed during training. Regression approaches including neural networks are known to have limitations in extrapolation (Xu et al., 2020).

Our research indicates that we can predict endogenous gene expression accurately based on histone PTMs. By creating a comprehensive dataset of epigenome editing, which assays histone PTMs before and after in situ perturbations, we can enhance machine learning models. This will improve our understanding of the effects of dCas9-p300 on gene expression and assist in the design of gRNAs for achieving fine-tuned control over gene expression levels. These advancements are vital for devising experiments that deepen our mechanistic insight and offer effective strategies for human epigenome editing.

Materials and methods

Data preparation

We obtained − log₁₀(p-value) ChIP-seq tracks created by running the MACS2 peak-caller (Feng et al., 2012) on read count data, from the ENCODE Imputation Challenge (Schreiber et al., 2020a ). For three tracks where data were not available, we downloaded Avocado (Schreiber et al., 2020b ) imputations from the ENCODE data portal (ENCODE Project Consortium, 2012). We binned each epigenetic track at 25 base pair resolution and pre-processed them with an additional log operation before inputting them into the models for training.

We downloaded polyA-plus RNA-seq gene expression Transcripts Per Million (TPM) values for each of the 13 cell types in Appendix Table 1, from the ENCODE data portal (ENCODE Project Consortium, 2012) and preprocessed them with a log operation.

Normalizing p-values by adapting S3norm

We assigned IMR-90 to be a reference cell type, for each of the 6 histone PTMs and kept its p-values unchanged. We then performed a transformation for each of the remaining cell types adapted from the core technique developed by S3norm (Xiang et al., 2020), in order to normalize each histone PTM track in each of these remaining cell types, with respect to the corresponding histone PTM track in IMR-90.

First, we computed peaks in both, the reference as well as the target cell type. Peaks were defined as the 25 base pair bins corresponding to FDR-adjusted p-values less than 0.05 (Benjamini and Hochberg, 1995). For histone PTM tracks that were obtained from Avocado imputations (due to lack of availability of experimental data), peaks were defined to be the 1000 bins containing the smallest Avocado imputed p-values (based on suggestions from the authors of Avocado (Schreiber et al., 2020b )). All the remaining bins were defined to be background, for both, the reference as well as the target cell types.

We then computed the list of peaks that were common to both the reference and the target cell types. These were termed, common peaks. Similarly, we defined common background as the list of bins that were assigned to be background in both, the reference as well as the target cell types.

The S3norm method was designed to work with count data, which is always ≥ 1. However, the histone PTM tracks, which are represented as − log₁₀(p-values), are not guaranteed to always be ≥ 1, hence, we transformed all the histone PTM tracks by adding 1 to the − log₁₀(p-values), in both the reference as well as the target cell types.

Additionally, since the histone PTM tracks obtained from imputations performed by Avocado were not guaranteed to be distributed similar to experimental − log₁₀(p-values), we scaled all the histone PTM tracks (both experimental as well as Avocado imputations) by dividing them by the minimum observed value in common peaks and common background, in order to bring experimental data and Avocado imputations onto a similar footing. In particular, before applying the S3norm normalization, we transformed − log₁₀(p-values) in common peaks and common background for both the reference as well as the target cell type as following:

The normalization procedure of S3norm then wishes to find two positive parameters, α and β that are to be learned from the data such that both the following equations are satisfied:

Specifically, α is a scale factor that shifts the transformed − log₁₀(p-values) of the target data set in log scale, and β is a power transformation parameter that rotates the transformed − log₁₀(p-values) of the target data set in log scale (Figure 2–figure supplement 2). There is one and only one set of values for α and β that can simultaneously satisfy both the above equations for common peaks and the common background (Xiang et al., 2020).

The values of α and β were estimated by the Powell minimization method implemented in scipy (Fletcher and Powell, 1963; Virtanen et al., 2020). The resulting normalized − log₁₀(p-values) were used for all downstream analyses (Figure 2–figure supplement 3).

Training endogenous gene expression models

We trained convolutional neural network and ridge regression models, each, to predict gene expression using histone PTM tracks. Input features for each gene were centered at its TSS. We used an input context size of 10, 000 base pairs for all analyses subsequent to Figure 3. For all analyses we obtained predictions from our models by averaging predictions ensembled across 100 computational replicates.

To train convolutional neural network models, the normalized histone PTM tracks for each gene were processed with successive convolutional blocks. Each convolutional block consisted of a batch-normalization layer, rectified linear units (ReLU), a convolutional layer consisting of 32 convolutional kernels, each of width 5, followed by a dropout with 0.1 probability. Finally a pooling layer was applied to gradually reduce the dimension of the features. After being processed with 5 such convolutional blocks, the output was flattened and passed through a fully connected layer consisting of 16 neurons and a ReLU activation. This was ultimately processed with a fully connected layer with a single output and a linear activation (since this was a regression task). The models were trained with a mean squared error loss using the Adam optimizer with a learning rate of 0.001 for the first 50 epochs and 0.0005 for the remaining 50 epochs. Training convolutional neural network models took about 1.5 hours on 1 NVIDIA A100 Tensor Core GPU.

Interrogating the features learned by CNNs

To see how different features affected predicted levels of expression, we systematically perturbed each input feature and determined how much the perturbation affected predicted expression levels. To be concrete, we denoted the epigenetic feature at position i of gene g in cell type CT as We then defined a perturbation function that added a scalar value of λ₀ = 2500 to the epigenetic features within 3 bins of a focal position, say, j:

recalling that W is the number of bins of 25 base pairs considered by our models, which is set to 401, corresponding to a 10,000 base pair input context length, for all analyses subsequent to Figure 3. These perturbations corresponded to ∼ 150 base pairs which is roughly the length of DNA wrapped around a nucleosome.

To produce Figure 4, we applied the above perturbation functions, F₁, …, F_W to a histone PTM track of interest, and the measured the fold-change in predicted expression. To account for differences in the endogenous histone PTM tracks between genes, we averaged these fold-changes across 500 randomly chosen genes.

In silico modeling of dCas9-p300-based epigenome editing

Our model of how dCas9-p300 perturbs local histone PTMs has three separate components. We describe each of these components in turn, and then present the full model below. Throughout, we write j for the position that the gRNA targets.

First, we modeled steric hindrance of dCas9 due to nucleosomes. We used MNase-seq signal strength as a proxy for nucleosome occupancy. Letting m_j be the MNase-seq read coverage at the gRNA binding site, we modeled steric hindrance by scaling the acetylation activity of dCas9-p300 by a factor of exp (−5 × m_j).

Second, we assumed that dCas9-p300 primarily alters the levels of H3K27ac only locally. As such, we modeled the acetylation activity of dCas9-p300 at a particular locus as a Gaussian kernel centered at the gRNA. Concretely, the acetylation activity at position i is multiplied by a factor of exp −(i − j)²/2σ², where σ² is a parameter of the model.

Finally, we assumed that dCas9-p300 can only acetylate histones where they currently are – it cannot move histones or increase H3K27ac levels outside of histones. To model this mathematically, we multiplied the acetylation activity at site i by the MNase read coverage, m_i. Therefore, if the MNase read coverage is 0 (i.e., there is no evidence of histones at that locus) then the amount of H3K27ac added to that position is also 0.

Putting this all together, for a guide targeting at position j, the effect on H3K27ac levels at position i is proportional to

The constant of proportionality (i.e., how strong we expect dCas9-p300 to be overall) is treated as another free parameter, which we denote by λ.

ENCODE has epigenetic data for the HEK293 cell line, but we performed our dCas9-p300 perturbations in the HEK293T cell line. As such, we used the HEK293 histone PTM as well as RNA-seq data as a stand in for the HEK293T histone PTM and RNA-seq levels. This substitution is justified as gene expression levels for HEK293 and HEK293T are highly concordant (Figure 3–figure supplement 2). Indeed the Spearman’s rank correlation between expression levels for HEK293 and two independent measurements of expression levels in HEK293T are 0.86 and 0.88, which are comparable to the correlation between the two independent experiments in HEK293T (ρ = 0.92). That is, the correlation across experiments within HEK293T cells is only slightly higher than the correlation between HEK293 and and HEK293T, suggesting that cross-cell type differences between HEK293 and HEK293T are on the same order as the inherent experimental and biological noise within a single cell type.

Experimental procedure

The details of dCas9-p300 epigenome editing, qPCR, CUT&RUN, and MNase-seq experiments are provided in the Supplementary Material.

Additional information

Data and code availability

The MNase-seq data for the HEK293T cell line is available at BioProject ID PRJNA892960 on SRA. Code and data for training the gene expression models along with code for generating the figures in the manuscript are available at https://github.com/songlab-cal/epigenome_editing_2023. The data from the dCas9-p300 K562 Perturb-seq experiments is available at GSE255610 on SRA.

Declaration of interests

The authors have no competing interests.

Author Contributions

S.S.B., A.C., J.P.S, I.B.H., and Y.S.S. conceived the project and assisted with experimental design. A.C., J.G., S.S.A. and I.B.H. led laboratory experimentation and analyses, and S.S.B., J.P.S. and Y.S.S. led computational experimentation and analyses. All authors contributed to writing the manuscript.

Acknowledgements

This research is supported in part by NIH grants R35-GM134922 and R35-GM143532.

Appendix

Experimental Methods

Identification and Selection of gRNA

All gRNAs were designed following the same in silico identification algorithm. Genomic sequences were first identified using the UCSC Genome Browser and manipulated in Benchling. CRISPOR was used to separately identify all gRNA within a 500 bp window of each gene’s TSS. TSS coordinates were determined using Phantom prediction, gene annotation, and DNase Hypersensitivity data as visualized in the Genome Browser with the hg38 genome assembly. gRNA were selected from this list to optimize for coverage, predicted specificity (CFD score >=80), and predicted on target activity (Doench 16’ score) (Doench et al., 2016). When no gRNA were available in a region, predicted score constraints were minimally relaxed to identify a gRNA.

Plasmid & Guide Cloning, Transfection, mRNA extraction, and qPCR

dCas9-p300 was cloned into a lentiviral plasmid backbone (Addgene# 83889) was a gift from Gersbach lab. gRNA were cloned using the molecular cloning pipeline described by the Zhang group. These gRNA were cloned into an isogenic minimal guide expression backbone utilizing the Gecko guide cloning strategy (Shalem et al., 2014). This minimal guide cloning plasmid was a gift from Gersbach lab (Addgene#47108). Following sequence verification, gRNA tiling experiments were completed.

Cell Culture

HEK293T (ATCC, CRL-11268) were purchased from ATCC and cultured using supplemented DMEM (10%FBS (Millipore), 1% penicillin/streptomycin(Gibco)). These cells were initially expanded and cry-opreserved in 0.5% (vol/vol) DMSO containing supplemented DMEM at a concentration of 2E6/ml per vial. HEK293T cells were consistently passaged at 80% confluence using Trypsin/EDTA (Gibco) dissociation and passaged at a 1:10 ratio. Cells were disregarded after their 10th passage.

gRNA Tiling qPCR Experiments

On day 0 healthy HEK293T cells (<passage 10) were lifted with Trypsin EDTA, centrifuged, resuspended, and counted with a manual hemocytometer using Trypan blue to assess health. Cells with >95% viability were seeded into 24 well plates with a consistent cell number per well (1.5 × 10⁵). 24 hours post plating, cells with confluence of between 70% and 90% with healthy phenotype were cotransfected with individual gRNA and dCas9-p300 plasmid DNA (mass = 500 ng, 125ng, gRNA:375ng dCas9-p300, using 1.5ul Lipofectamine 3000) according to the manufacturer’s protocol. All qPCR experiments were conducted using 24 samples (2 biological samples per condition) measured in the 96-well format. Additionally, these experiments individually test 11 uniquely targeting gRNA and utilized a non-targeting gRNA as a negative control for downstream analysis. Total cell mRNA was extracted using the Qiagen RNeasy kit and protocol. Reverse transcription was then carried out using iScript Advanced reverse transcriptase (Biorad) 750ng of total RNA in a 10 ul reaction. From there, cDNA was diluted to 10ng/ul based on the initial total RNA input. Then qPCR reactions were assembled in technical duplicate and consisted of the following: 45ng (original mass) of reverse transcribed and diluted cDNA, Luna qPCR Mastermix (NEB), forward primer and reverse primer. The appropriate primer set was used to target (a) the gene intended for transcriptional modulation and (b) GAPDH a ubiquitously expressed gene used to normalize input cDNA mass.

MNase-seq

MNase sample processing was completed similarly to the previously methods (Cui and Zhao, 2012) with modifications. HEK293T cells were grown in parallel for > 3 passages. 3 biological replicates were processed together to minimize variance. Crosslinking was carried out on 20E7 HEK293T cells with 1% formaldehyde incubated for 10 minutes at 37C prior to Glycine quenching. Next, lysis and washing occurred followed by nuclei isolation via 600g centrifugation. An initial optimization was performed on 2E6 purified nuclei using MNase amounts between 0.1-64 units of enzyme. RNase treatment as well as Proteinase K treatment and removal of crosslinks were performed as previously described (McKnight et al., 2021). QIAquick PCR purification, sample DNA were visualized with the use of a 2% agarose gel and Tape station (Agilent). The mono nucleosomal band 150 bp was cut from the gel and purified using the QIAquick Gel purification. Heat was not used when melting gel to preserve AT rich regions. Following mono nucleosomal band purification samples were quantified and size verified using a Tape Station (Agilent). Illumina libraries were produced using the NEBNext Ultra II DNA library preparation kit with NEBNext Dual Index Multiplex Oligos for Illumina using 1 μg of purified DNA as input and SPRI bead size selection after adapter ligation prior to index addition. Color balanced unique i5 and i7 indices were used for each biological replicate to reduce confounds associated with index hopping. Prepared library concentrations and purity were determined on a tape station. Following verification, the 3 biological replicates were admixed with the same mass and sent to Azenta for sequencing on a single lane on the HISeq 3000/4000 platform with expected yield of 350 million paired end 150 bp length reads. This sequencing scheme was expected to yield a coverage of ∼10x for each biological replicate sample.

H3K27ac CUT&RUN qPCR

H3K27ac CUT&RUN qPCR CUT&RUN was completed using the CUTANA™ ChIC/CUT&RUN Kit by Epicypher (Catalog #: 14-1048). H3K27ac was bound using the Anti-Histone H3 (acetyl K27) antibody (Catalog#: ab4729) sold by Abcam. E. Coli spike-in DNA and Rabbit IgG (components of kit# : 14-1048) were used for qPCR input normalization and negative control respectively. All experiments were performed in duplicate with 3 independent experiments. Briefly, p300 and individual gRNA were co-transfected into HEK293T cells, after 72 hours cells were detached, and CUT&RUN was completed with identical cell number were used for each sample. qPCR was completed in technical duplicate using a primer set designed for amplification near the targeted promoter (CXCR4 or TGFBR1). qPCR reactions were also completed using a previously described primer set for quantification of the e.Coli gene uida. The ddT relative qPCR methods was used to analyze data, where uida Ct was used to normalized input and fold over rabbit IgG was calculated for each sample.

Computational Methods

Analysis of Perturb-seq Data

We analyzed single-cell RNA sequencing (scRNA-seq) data generated on the 10X Genomics platform as described previously (Goell et al., 2024), corresponding to G genes and r guide RNAs (gRNAs) across C cells using the following steps:

Quality Control

Cells were retained if the total gene count in a cell was between 1,000 and 10,000, ensuring adequate complexity and excluding potential empty droplets or doublets.
Cells with mitochondrial gene content exceeding 10% were excluded to avoid including dying or stressed cells.

Normalization

To normalize the expression data, the following was applied to each gene’s raw count in each cell:

Computing Perturb-seq Gene Expression

For each gene g, targeted by a set of gRNAs denoted as r_g, we determined the impact of each specific gRNA rⁱ on its gene expression by performing the following steps:

Expression Thresholding: Only cells with non-zero expression of gene g were selected for further analysis.
gRNA-specific Selection: From these cells, only those expressing the specific gRNA and none other from r_g were retained.
Pseudobulk Quantification: For these cells, we computed the pseudobulk mean (μ) and standard deviation (σ) of the expression levels of gene g.

Computing Perturb-seq fold-change

To establish the baseline expression of gene g, we considered cells not targeted by any gRNAs . The pseudobulk mean (μ_{control cells}) and standard deviation (σ_{control cells}) of gene g’s expression in these cells were calculated. The fold-change for gene g due to gRNA was then quantified as:

Plotting Perturb-seq fold change against model predictions

In order to prepare Figure 6–figure supplement 4, we performed the following steps:

genes were included if there were at least two distinct 25 bp bins within 250 base pairs of the TSS with a gRNA targeting that gene and having a distinct expression fold-change.
gRNAs were included if the number of cells expressing was ≥ 2.
For each bin, the average Perturb-seq fold-change μ and the average predicted fold-change μ_predicted were calculated as:
Ranks were assigned to μ_bin and μ_{predicted, bin} for comparison.
These ranks were then plotted against each other to evaluate the correlation between observed Perturb-seq fold-change and the model predicted fold-change.

ChIP-seq − log₁₀(p-values) were obtained from the ENCODE Imputation Challenge where the ground truth data were available (corresponding to entries labeled T in the table). Avocado imputations were downloaded from the ENCODE data portal, where ground truth data were not available (corresponding to entries labeled A in the table).

Endogenous gene expression of genes for which we generated dCas9-p300 epigenome editing data indicates that genes for which high fold-change was obtained are more likely to have low endogenous gene expression in HEK293T. Cross-cell type Spearman provides a metric to assess how accurate our CNN model predictions are, on any given gene, across the 13 cell types.

References

1. Avsec Ž
2. Agarwal V
3. Visentin D
4. Ledsam JR
5. Grabska-Barwinska A
6. Taylor KR
7. Assael Y
8. Jumper J
9. Kohli P
10. Kelley DR
2021Effective gene expression prediction from sequence by integrating long-range interactionsNat Methods 18:1196–1203Google Scholar
1. Barrett T
2. Wilhite SE
3. Ledoux P
4. Evangelista C
5. Kim IF
6. Tomashevsky M
7. Marshall KA
8. Phillippy KH
9. Sherman PM
10. Holko M
11. et al.
2012NCBI GEO: archive for functional genomics data sets—updateNucleic Acids Research 41:D991–D995Google Scholar
1. Benjamini Y
2. Hochberg Y.
1995Controlling the false discovery rate: a practical and powerful approach to multiple testingJournal of the Royal Statistical Society: Series B (Methodological) 57:289–300Google Scholar
1. Chen Y
2. Xie M
3. Wen J.
2022Predicting gene expression from histone modifications with self-attention based neural networks and transfer learningFrontiers in Genetics 13:1081842Google Scholar
1. Cheng Y
2. He C
3. Wang M
4. Ma X
5. Mo F
6. Yang S
7. Han J
8. Wei X.
2019Targeting epigenetic regulators for cancer therapy: mechanisms and advances in clinical trialsSignal Transduction and Targeted Therapy 4:1–39Google Scholar
1. Concordet JP
2. Haeussler M.
2018CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screensNucleic Acids Research 46:W242–W245Google Scholar
1. Cui K
2. Zhao K.
2012Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-SeqIn: Chromatin Remodeling Springer pp. 413–419Google Scholar
1. Doench JG
2. Fusi N
3. Sullender M
4. Hegde M
5. Vaimberg EW
6. Donovan KF
7. Smith I
8. Tothova Z
9. Wilen C
10. Orchard R
11. et al.
2016Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9Nature Biotechnology 34:184–191Google Scholar
1. Dominguez AA
2. Chavez MG
3. Urke A
4. Gao Y
5. Wang L
6. Qi LS
2022CRISPR-mediated Synergistic Epigenetic and Transcriptional ControlThe CRISPR Journal 5:264–275Google Scholar
1. ENCODE Project Consortium
2012An integrated encyclopedia of DNA elements in the human genomeNature 489:57–74Google Scholar
1. Escobar M
2. Li J
3. Patel A
4. Liu S
5. Xu Q
6. Hilton IB
2022Quantification of genome editing and transcriptional control capabilities reveals hierarchies among diverse CRISPR/Cas systems in human cellsACS Synthetic Biology 11:3239–3250Google Scholar
1. Estey E.
2013Epigenetics in clinical practice: the examples of azacitidine and decitabine in myelodysplasia and acute myeloid leukemiaLeukemia 27:1803–1812Google Scholar
1. Feng J
2. Liu T
3. Qin B
4. Zhang Y
5. Liu XS
2012Identifying ChIP-seq enrichment using MACSNature Protocols 7:1728–1740Google Scholar
1. Fletcher R
2. Powell MJ
1963A rapidly convergent descent method for minimizationThe Computer Journal 6:163–168Google Scholar
1. Frasca F
2. Matteucci M
3. Leone M
4. Morelli MJ
5. Masseroli M.
2022Accurate and highly interpretable prediction of gene expression from histone modificationsBMC Bioinformatics 23:151Google Scholar
1. Gemberling MP
2. Siklenka K
3. Rodriguez E
4. Tonn-Eisinger KR
5. Barrera A
6. Liu F
7. Kantor A
8. Li L
9. Cigliola V
10. Hazlett MF
11. et al.
2021Transgenic mice for in vivo epigenome editing with CRISPR-based systemsNature methods 18:965–974Google Scholar
1. Gibney E
2. Nolan C.
2010Epigenetics and gene expressionHeredity 105:4–13Google Scholar
1. Gilbert LA
2. Horlbeck MA
3. Adamson B
4. Villalta JE
5. Chen Y
6. Whitehead EH
7. Guimaraes C
8. Panning B
9. Ploegh HL
10. Bassik MC
11. et al.
2014Genome-scale CRISPR-mediated control of gene repression and activationCell 159:647–661Google Scholar
1. Goell JH
2. Hilton IB
2021CRISPR/Cas-based epigenome editing: advances, applications, and clinical utilityTrends in Biotechnology 39:678–691Google Scholar
1. Goell JH
2. Li J
3. Mahata B
4. Ma AJ
5. Kim S
6. Shah S
7. Shah S
8. Contreras M
9. Misra S
10. Reed D
11. Bedford GC
12. Escobar M
13. Hilton IB
2024Tailoring a CRISPR/Cas-based Epigenome Editor for Programmable Chromatin Acylation and Decreased CytotoxicitybioRxiv https://doi.org/10.1101/2024.09.22.611000 Google Scholar
1. Hafner A
2. Boettiger A.
2022The spatial organization of transcriptional controlNature Reviews Genetics :1–16Google Scholar
1. Hamdy R
2. Maghraby FA
3. Omar YM
2022Convchrome: Predicting gene expression based on histone modifications using deep learning techniquesCurrent Bioinformatics 17:273–283Google Scholar
1. Hilton IB
2. D’ippolito AM
3. Vockley CM
4. Thakore PI
5. Crawford GE
6. Reddy TE
7. Gersbach CA
2015Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancersNature Biotechnology 33:510–517Google Scholar
1. Holoch D
2. Moazed D.
2015RNA-mediated epigenetic regulation of gene expressionNat Rev Genet 16:71–84Google Scholar
1. Horlbeck MA
2. Witkowsky LB
3. Guglielmi B
4. Replogle JM
5. Gilbert LA
6. Villalta JE
7. Torigoe SE
8. Tjian R
9. Weissman JS
2016Nucleosomes impede Cas9 access to DNA in vivo and in vitroeLife 5:e12677Google Scholar
1. Isaac RS
2. Jiang F
3. Doudna JA
4. Lim WA
5. Narlikar GJ
6. Almeida R.
2016Nucleosome breathing and remodeling constrain CRISPR-Cas9 functioneLife 5:e13450Google Scholar
1. Jenuwein T
2. Allis CD
2001Translating the histone codeScience 293:1074–1080Google Scholar
1. Jinek M
2. Chylinski K
3. Fonfara I
4. Hauer M
5. Doudna JA
6. Charpentier E.
2012A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunityScience 337:816–821Google Scholar
1. Karlić R
2. Chung HR
3. Lasserre J
4. Vlahovicek K
5. Vingron M.
2010Histone modification levels are predictive for gene expressionProc Natl Acad Sci U S A 107:2926–2931Google Scholar
1. Keung AJ
2. Joung JK
3. Khalil AS
4. Collins JJ
2015Chromatin regulation at the frontier of synthetic biologyNat Rev Genet 16:159–171Google Scholar
1. Kimura H.
2013Histone modifications for human epigenome analysisJ Hum Genet 58:439–445Google Scholar
1. Klann TS
2. Black JB
3. Chellappan M
4. Safi A
5. Song L
6. Hilton IB
7. Crawford GE
8. Reddy TE
9. Gersbach CA
2017CRISPR– Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genomeNature Biotechnology 35:561–568Google Scholar
1. Klemm SL
2. Shipony Z
3. Greenleaf WJ
2019Chromatin accessibility and the regulatory epigenomeNature Reviews Genetics 20:207–220Google Scholar
1. Kouzarides T.
2007Chromatin modifications and their functionCell 128:693–705Google Scholar
1. Kwon DY
2. Zhao YT
3. Lamonica JM
4. Zhou Z.
2017Locus-specific histone deacetylation using a synthetic CRISPR-Cas9-based HDACNature Communications 8:1–8Google Scholar
1. Li J
2. Mahata B
3. Escobar M
4. Goell J
5. Wang K
6. Khemka P
7. Hilton IB
2021Programmable human histone phosphorylation and gene activation using a CRISPR/Cas9-based chromatin kinaseNature Communications 12:1–10Google Scholar
1. Liu G
2. Zhang Y
3. Zhang T.
2020Computational approaches for effective CRISPR guide RNA design and evaluationComputational and Structural Biotechnology Journal 18:35–44Google Scholar
1. Mahata B
2. Cabrera A
3. Brenner DA
4. Guerra-Resendez RS
5. Li J
6. Goell J
7. Wang K
8. Guo Y
9. Escobar M
10. Parthasarathy AK
11. et al.
2023Compact engineered human mechanosensitive transactivation modules enable potent and versatile synthetic transcriptional controlNature Methods 20:1716–1728Google Scholar
1. Makasheva K
2. Bryan LC
3. Anders C
4. Panikulam S
5. Jinek M
6. Fierz B.
2021Multiplexed single-molecule experiments reveal nucleosome invasion dynamics of the Cas9 genome EditorJournal of the American Chemical Society 143:16313–16319Google Scholar
1. Mali P
2. Esvelt KM
3. Church GM
2013Cas9 as a versatile tool for engineering biologyNature Methods 10:957–963Google Scholar
1. Matharu N
2. Ahituv N.
2020Modulating gene regulation to treat genetic disordersNat Rev Drug Discov 19:757–775Google Scholar
1. McKnight LE
2. Crandall JG
3. Bailey TB
4. Banks OG
5. Orlandi KN
6. Truong VN
7. Donovan DA
8. Waddell GL
9. Wiles ET
10. Hansen SD
11. et al.
2021Rapid and inexpensive preparation of genome-wide nucleosome footprints from model and non-model organismsSTAR protocols 2:100486Google Scholar
1. Millán-Zambrano G
2. Burton A
3. Bannister AJ
4. Schneider R.
2022Histone post-translational modifications—cause and consequence of genome functionNature Reviews Genetics :1–18Google Scholar
1. Mohr SE
2. Hu Y
3. Ewen-Campen B
4. Housden BE
5. Viswanatha R
6. Perrimon N.
2016CRISPR guide RNA design for research applicationsThe FEBS Journal 283:3232–3238Google Scholar
1. Nuñez JK
2. Chen J
3. Pommier GC
4. Cogan JZ
5. Replogle JM
6. Adriaens C
7. Ramadoss GN
8. Shi Q
9. Hung KL
10. Samelson AJ
11. et al.
2021Genome-wide programmable transcriptional memory by CRISPR-based epigenome editingCell 184:2503–2519Google Scholar
1. O’Geen H
2. Ren C
3. Nicolet CM
4. Perez AA
5. Halmai J
6. Le VM
7. Mackay JP
8. Farnham PJ
9. Segal DJ
2017dCas9-based epigenome editing suggests acquisition of histone methylation is not sufficient for target gene repressionNucleic Acids Research 45:9901–9916Google Scholar
1. Radzisheuskaya A
2. Shlyueva D
3. Müller I
4. Helin K.
2016Optimizing sgRNA position markedly improves the efficiency of CRISPR/dCas9-mediated transcriptional repressionNucleic Acids Research 44:e141–e141Google Scholar
1. Rao SSP
2. Huntley MH
3. Durand NC
4. Stamenova EK
5. Bochkov ID
6. Robinson JT
7. Sanborn AL
8. Machol I
9. Omer AD
10. Lander ES
11. Aiden EL
2014A 3D map of the human genome at kilobase resolution reveals principles of chromatin loopingCell 159:1665–1680Google Scholar
1. Reik W
2. Dean W
3. Walter J.
2001Epigenetic reprogramming in mammalian developmentScience 293:1089–1093Google Scholar
1. Roadmap Epigenomics Consortium
2. Kundaje A
3. Meuleman W
4. Ernst J
5. Bilenky M
6. Yen A
7. Heravi-Moussavi A
8. Kheradpour P
9. Zhang Z
10. Wang J
11. Ziller MJ
12. Amin V
13. Whitaker JW
14. Schultz MD
15. Ward LD
16. Sarkar A
17. Quon G
18. Sandstrom RS
19. Eaton ML
20. Wu YC
21. et al.
2015Integrative analysis of 111 reference human epigenomesNature 518:317–330Google Scholar
1. Sanson KR
2. Hanna RE
3. Hegde M
4. Donovan KF
5. Strand C
6. Sullender ME
7. Vaimberg EW
8. Goodale A
9. Root DE
10. Piccioni F
11. et al.
2018Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalitiesNature Communications 9:1–15Google Scholar
1. Schmidt F
2. Gasparoni N
3. Gasparoni G
4. Gianmoena K
5. Cadenas C
6. Polansky JK
7. Ebert P
8. Nordström K
9. Barann M
10. Sinha A
11. Fröhler S
12. Xiong J
13. Dehghani Amirabad A
14. Behjati Ardakani F
15. Hutter B
16. Zipprich G
17. Felder B
18. Eils J
19. Brors B
20. Chen W
21. et al.
2017Combining transcription factor binding affinities with open-chromatin data for accurate gene expression predictionNucleic Acids Res 45:54–66Google Scholar
1. Schmidt F
2. Kern F
3. Schulz MH
2020Integrative prediction of gene expression with chromatin accessibility and conformation dataEpigenetics Chromatin 13:4Google Scholar
1. Schmidt R
2. Steinhart Z
3. Layeghi M
4. Freimer JW
5. Bueno R
6. Nguyen VQ
7. Blaeschke F
8. Ye CJ
9. Marson A.
2022CRISPR activation and interference screens decode stimulation responses in primary human T cellsScience 375:eabj4008Google Scholar
1. Schreiber J
2. Bilmes J
3. Noble WS
2020aCompleting the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamplesGenome Biol 21:82Google Scholar
1. Schreiber J
2. Durham T
3. Bilmes J
4. Noble WS
2020bAvocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenomeGenome Biology 21:1–18Google Scholar
1. Segelle A
2. Núñez-Álvarez Y
3. Oldfield AJ
4. Webb KM
5. Voigt P
6. Luco RF
2022Histone marks regulate the epithelial-to-mesenchymal transition via alternative splicingCell Reports 38:110357Google Scholar
1. Sekhon A
2. Singh R
3. Qi Y.
2018DeepDiff: DEEP-learning for predicting DIFFerential gene expression from histone modificationsBioinformatics 34:i891–i900Google Scholar
1. Shalem O
2. Sanjana NE
3. Hartenian E
4. Shi X
5. Scott DA
6. Mikkelsen TS
7. Heckl D
8. Ebert BL
9. Root DE
10. Doench JG
11. et al.
2014Genome-scale CRISPR-Cas9 knockout screening in human cellsScience 343:84–87Google Scholar
1. Singh R
2. Lanchantin J
3. Robins G
4. Qi Y.
2016DeepChrome: deep-learning for predicting gene expression from histone modificationsBioinformatics 32:i639–i648Google Scholar
1. Singh R
2. Lanchantin J
3. Sekhon A
4. Qi Y.
2017Attend and predict: Understanding gene regulation by selective attention on chromatinAdvances in neural information processing systems 30Google Scholar
1. Stepper P
2. Kungulovski G
3. Jurkowska RZ
4. Chandra T
5. Krueger F
6. Reinhardt R
7. Reik W
8. Jeltsch A
9. Jurkowski TP
2017Efficient targeted DNA methylation with chimeric dCas9–Dnmt3a–Dnmt3L methyltransferaseNucleic Acids Research 45:1703–1713Google Scholar
1. Stillman B.
2018Histone Modifications: Insights into Their Influence on Gene ExpressionCell 175:6–9Google Scholar
1. Strahl BD
2. Allis CD
2000The language of covalent histone modificationsNature 403:41–45Google Scholar
1. Stricker SH
2. Köferle A
3. Beck S.
2017From profiles to function in epigenomicsNature Reviews Genetics 18:51–66Google Scholar
1. Swaminathan V
2. Reddy BAA
3. Ruthrotha Selvi B
4. Sukanya MS
5. Kundu TK
2007Small molecule modulators in epigenetics: implications in gene expression and therapeuticsSubcell Biochem 41:397–428Google Scholar
1. Taherian Fard A
2. Ragan MA
2019Quantitative Modelling of the Waddington Epigenetic LandscapeMethods Mol Biol 1975:157–171Google Scholar
1. Thakore PI
2. Black JB
3. Hilton IB
4. Gersbach CA
2016Editing the epigenome: technologies for programmable transcription and epigenetic modulationNature Methods 13:127–137Google Scholar
1. Virtanen P
2. Gommers R
3. Oliphant TE
4. Haberland M
5. Reddy T
6. Cournapeau D
7. Burovski E
8. Peterson P
9. Weckesser W
10. Bright J
11. et al.
2020SciPy 1.0: fundamental algorithms for scientific computing in PythonNature Methods 17:261–272Google Scholar
1. Wang K
2. Escobar M
3. Li J
4. Mahata B
5. Goell J
6. Shah S
7. Cluck M
8. Hilton IB
2022Systematic comparison of CRISPR-based transcriptional activators uncovers gene-regulatory features of enhancer–promoter interactionsNucleic Acids Research 50:7842–7855Google Scholar
1. Weinert BT
2. Narita T
3. Satpathy S
4. Srinivasan B
5. Hansen BK
6. Schölz C
7. Hamilton WB
8. Zucconi BE
9. Wang WW
10. Liu WR
11. et al.
2018Time-resolved analysis reveals rapid dynamics and broad scope of the CBP/p300 acetylomeCell 174:231–244Google Scholar
1. Xiang G
2. Keller CA
3. Giardine B
4. An L
5. Li Q
6. Zhang Y
7. Hardison RC
2020S3norm: simultaneous normalization of sequencing depth and signal-to-noise ratio in epigenomic dataNucleic Acids Res 48:e43Google Scholar
1. Xu K
2. Zhang M
3. Li J
4. Du SS
5. Ki Kawarabayashi
6. Jegelka S.
2020How neural networks extrapolate: From feedforward to graph neural networksarXiv :arXiv:200911848Google Scholar
1. Yoon S
2. Eom GH
2016HDAC and HDAC inhibitor: from cancer to cardiovascular diseasesChonnam Medical Journal 52:1–11Google Scholar
1. Zhang Y
2. Reinberg D.
2001Transcription regulation by histone methylation: interplay between different covalent modifications of the core histone tailsGenes Dev 15:2343–2360Google Scholar
1. Zhao W
2. Xu Y
3. Wang Y
4. Gao D
5. King J
6. Xu Y
7. Liang FS
2021Investigating crosstalk between H3K27 acetylation and H3K4 trimethylation in CRISPR/dCas-based epigenome editing and gene activationScientific Reports 11:1–11Google Scholar
1. Zheng X
2. Cui J
3. Wang Y
4. Zhang J
5. Wang C.
2021CRSIPR-AI: a webtool for the efficacy prediction of CRISPR activation and interferencebioRxiv Google Scholar
1. Zhong H
2. Kim S
3. Zhi D
4. Cui X.
2019Predicting gene expression using DNA methylation in three human populationsPeerJ 7:e6757Google Scholar
1. Zhou X
2. Blocker AW
3. Airoldi EM
4. O’Shea EK
2016A computational approach to map nucleosome positions and alternative chromatin states with base pair resolutionElife 5:e16970Google Scholar

Article and author information

Author information

Sanjit Singh Batra
Computer Science Division, University of California, Berkeley, Berkeley, United States
- Equally contributing authors
Alan Cabrera
Department of Bioengineering, Rice University, Houston, United States
- Equally contributing authors
Jeffrey P Spence
Department of Genetics, Stanford University, Stanford, United States
ORCID iD: 0000-0002-3199-1447
- Equally contributing authors
Jacob Goell
Department of Bioengineering, Rice University, Houston, United States
Selvalakshmi S Anand
Systems, Synthetic, and Physical Biology Graduate Program, Rice University, Houston, United States
Isaac B Hilton
Department of Bioengineering, Rice University, Houston, United States, Systems, Synthetic, and Physical Biology Graduate Program, Rice University, Houston, United States
ORCID iD: 0000-0002-3064-8532
- For correspondence: isaac.hilton@rice.edu
Yun S Song
Computer Science Division, University of California, Berkeley, Berkeley, United States, Department of Statistics, University of California, Berkeley, Berkeley, United States
ORCID iD: 0000-0002-0734-9868
- For correspondence: yss@berkeley.edu

Version history

Preprint posted: October 3, 2023
Sent for peer review: October 3, 2023
Reviewed Preprint version 1: December 20, 2023
Reviewed Preprint version 2: December 19, 2024
Reviewed Preprint version 3: May 27, 2025
Version of Record published: January 12, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.92991. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

views: 0
downloads: 160
citations: 0

Views, downloads and citations are aggregated across all versions of this paper published by eLife.