Study design and identification of enhancer activity

(A) Sheared DNA from the GM12878 cell line was subjected to enrichment via capture with probes targeting loci selected in reduced representation bisulfite sequencing (RRBS) workflows (MspI targets), CpG sites on the Infinium EPIC array, the gene NR3C1 and flanking regions, and 100,000 randomly distributed control regions. Library diversity summaries are shown in Figure 1-figure supplement 2. (B) Captured loci were cloned into the mSTARR-seq vector, pmSTARRseq1, treated with either the CpG methylating enzyme M. SssI or a sham treatment, and transfected into K562 cells. Methylation levels of replicate samples are shown in Figure 1-figure supplement 3. Right panel shows an example of DNA methylation-dependent regulatory activity near the first exon of the TTC32 gene, where the methylation-dependent regulatory element overlaps an active promoter chromatin state (red horizontal bar denotes active promoter as defined by ENCODE (ENCODE, 2012)). (C) mSTARR-seq regulatory activity in the baseline condition is strongly enriched in ENCODE-defined enhancers (indicated in blue) and promoters, and depleted in repressed, repetitive, and heterochromatin states. Regions with mSTARR-seq regulatory activity detected in this experiment also significantly overlap with regions with regulatory activity in other mSTARR-seq and conventional STARR-seq datasets (Figure 1-figure supplements 4-6). (D) Left column shows, under the baseline condition (i.e. unstimulated cells), the proportion of 600 bp windows that exhibited minimal regulatory activity (at least 3 replicate samples produced at least 1 RNA-seq read in either the methylated condition or in the unmethylated condition) in the mSTARR-seq assay (pink) versus those with detectable input DNA but no evidence of regulatory activity (blue), for windows containing sites from each target set. Note that a single 600 bp window can contain multiple target types (Figure 1-figure supplement 1). Right column shows the proportion of windows with regulatory capacity (i.e., the subset of the windows represented in pink on the left that produce excess RNA relative to the DNA input at FDR < 1%) that are also methylation-dependent (dark brown). Within each column, pie charts are scaled by the total numbers of windows represented.

DNA methylation-environment interactions reveal methylation-dependent responses to IFNA and dexamethasone challenge

(A) Full mSTARR-seq design across DNA methylation and challenge conditions. An example of a DNA methylation-environment interaction is shown overlapping the interferon-induced gene IFIT5 and an ENCODE-annotated weak promoter (pink denotes weak promoter, yellow denotes heterochromatin, and green denotes weak transcription; the endogenous IFIT5 gene expression response to IFNA in our experiment is shown in Figure 2-figure supplement 2). Three consecutive 600 bp windows have interaction FDR < 1 × 10−4 in this region. Panels depict non-normalized, raw read pileups for mSTARR-seq RNA replicates, with all y-axis maximums set to 14000. No methylation-dependent activity is detectable in the baseline condition because this enhancer element is inactive. Upon IFNA stimulation, only unmethylated enhancer elements are capable of responding. (B) Upset plot showing shared and unique mSTARR-seq identified enhancer elements across conditions. While many elements are shared, 3426 are unique to a single condition (FET log-odds results for magnitude of overlap are shown in Figure 2-figure supplement 1). (C) Top five most enriched transcription factor binding motifs in IFNA- and dex-specific mSTARR-seq enhancers, compared to all windows tested. Whiskers show the standard error. See Supplementary Files 8 and 9 for all enrichment results. (D) Genes targeted by ISRE enhancers (ISRE enhancers identified from ENCODE ChIP-seq data; gene targets identified from enhancer-gene linkages from EnhancerAtlas 2.0 (Gao et al., 2016)) that are also identified as IFNA condition-specific mSTARR-seq enhancers show stronger K562 endogenous gene expression responses to IFNA stimulation than non-ISRE targets (unpaired t-test: t = 3.58, df = 118.36, p = 5.01 × 10−4; Supplementary File 12). Each box represents the interquartile range, with the median value depicted as a horizontal bar. Whiskers extend to the most extreme values within 1.5x of the interquartile range. E) mSTARR-seq regulatory activity for windows containing ISRE targets (n = 1005 windows) interacts strongly with exposure to IFNA. These windows are capable of mounting a strong response to IFNA stimulation when unmethylated (dashed line; paired t-test: t = 23.02, df = 1004, p = 1.78 × 10−94) but not when methylated (solid line; paired t-test: t = −1.74, df = 1004, p = 0.082). Dots show the mean beta corresponding to enrichment of RNA reads versus DNA reads across windows; whiskers show the standard error. Because y-axis values correspond to model estimates, they can be positive (i.e., more mSTARR-seq RNA reads than input DNA reads) or negative values (i.e., fewer mSTARR-seq RNA reads than mSTARR-Seq input DNA reads, indicating no regulatory activity).

DNA methylation in mSTARR-seq enhancers predicts in vivo gene expression in macrophages

(A) Study design of the in vivo experiment, in which matched macrophage samples from 35 individuals were either left non-infected or infected with influenza A virus (IAV) for 24 hours and processed for RNA-seq and whole genome bisulfite sequencing (WGBS; (Aracena et al., 2022)). (B) Within individuals, DNA methylation (DNAm) levels at mSTARR-seq enhancers in non-infected cells are negatively correlated with the nearest genes’ transcriptional responses to IAV, but only in mSTARR-seq enhancers that were specific to the IFNA condition (IFNA-specific enhancers: n = 1033, mean Pearson’s r = −0.170 ± 0.009 s.d., all Bonferroni-corrected p < 2 × 10−5; shared enhancers: n = 1736, mean Pearson’s r = −0.049 ± 0.01 s.d., all Bonferroni-corrected p > 0.1). Each colored line represents an individual, and vertical gray lines represent 95% confidence intervals (see Supplementary File 13 for full results). (C) The average within-individual correlation (r) between DNA methylation and gene expression (GE) is 2.44 times as large after infection (r = −0.261 ± 0.006) than at baseline (r = −0.106 ± 0.008) in IFNA-specific mSTARR-seq enhancers (right panel), but much less affected by infection at mSTARR-seq enhancers that are shared across conditions (left panel). Each colored line represents an individual (see Supplementary File 14 for full results). (D) Across individuals, the ISG15 transcriptional response to IAV is significantly correlated with average DNAm at the mSTARR-Seq enhancer chr1:1013400-1014000 in non-infected cells (R2 = 0.381, p = 6.05 × 10−5, q = 0.084). Each dot represents an individual (see Supplementary File 15 for full results and Figure 3-figure supplement 1 for condition-specific results). (E) The mSTARR-seq enhancer predictive of ISG15 response to IAV (dark green bar) is located in the active promoter of ISG15 (as defined by ENCODE (ENCODE, 2012); red denotes active promoter, pink denotes weak promoter, orange denotes strong enhancer, yellow denotes weak enhancer). Three adjacent, methylation-dependent, IFNA-specific mSTARR-seq enhancers were identified (light green), but do not significantly predict ISG15 response to IAV (q > 10%). The bottom 6 tracks depict non-normalized, raw read pileups for mSTARR-seq RNA replicates in either the unmethylated (open circle) or methylated (filled circle) condition, with all y-axis maximums set to 20,000.

Early life adversity-associated CpG sites are not enriched for mSTARR-seq regulatory activity

Log2-tranformed odds ratios from Fisher’s Exact Tests for enrichment relative to the background set of sites on each array platform, for 27 studies of early life adversity-DNA methylation level correlations (see Supplementary File 16 for full FET results). Whiskers show standard error. Only Cao-Lei et al., 2014 shows significant enrichment for regulatory activity (log2(OR) [95% CI] = 1.343 [0.715, 1.909], p = 3.39 × 10−5), but these sites are not more likely to exhibit methylation-dependent activity than expected by chance (log2(OR) [95% CI] = 0.884 [-0.338, 2.130], p = 0.16). For details on the source tissue and measures of ELA, see Figure 4-figure supplement 1.

Overlap of target genomic regions with each other

Upset plot showing the degree to which 600 bp non-overlapping genomic windows are shared between the four target genomic regions (EPIC CpGs, Msp1 CpG cut sites, the NR3C1 region, or control sites). Overlap occurs because a single 600 bp genomic window can simultaneously include EPIC CpGs, Msp1 CpG cut sites, the NR3C1 region, and/or control sites. This plot includes 722,472 unique windows, reflecting the set of windows containing at least 1 basepair of target loci.

mSTARR-seq library diversity

Comparison of diversity of unique mSTARR-seq DNA and RNA fragments from the library generated in this study (transfected into K562 cells) relative to the library published in Lea et al., 2018 (independently transfected into K562 cells). Each dot represents an experimental replicate. Each box represents the interquartile range, with the median value depicted as a horizontal bar. Whiskers extend to the most extreme values within 1.5x of the interquartile range.

Methylation levels of mSTARR-seq DNA, pre- and post-transfection

(A) Bisulfite sequencing shows that DNA methylation on the mSTARR-seq plasmid is maintained until the end of the experiment (i.e., 48 hours after transfection), with significantly higher methylation levels in the replicates from the methyltransferase reaction relative to the replicates from the sham methyltransferase reaction (mean methylated = 0.885, mean unmethylated = 0.066; unpaired t-test: t = −14.66, df = 15,124, p = 2.39 × 10−10). Each dot represents an experimental replicate. Red dots indicate post-transfection DNA samples; the single blue dot per condition indicates pre-transfection DNA methylation levels. Methylation is based on the CpG at the position 2294, which is located in the plasmid region used for Gibson assembly. One sample from the dex sham reaction, L31395, shows an unexpectedly high level of methylation, which appears to be due to an error during generation of the bisulfite sequencing library (e.g., mislabeled tube or poor bisulfite conversion), and not the experimental replicate of cells itself, as the mSTARR-seq RNA library (L31244) from the same replicate clusters with the unmethylated sham replicates as expected (panel B). (B) The first two principal components summarizing overall counts of mSTARR-seq reads for the dex-treated RNA samples (i.e., the raw readout of overall regulatory activity). Each dot represents an experimental replicate, with red and black indicating sham and methylated replicates, respectively. Overall regulatory activity of sample L31244 (indicated by arrow) clusters with the sham replicates as expected, suggesting that this replicate was indeed transfected with sham-treated mSTARR-seq DNA.

Overlap of regulatory activity across datasets

Regulatory regions (in either the unmethylated sham condition, the methylated condition, or both) identified via mSTARR-seq in this study significantly overlap with: K562 regulatory regions (in either the unmethylated sham or methylated condition, or both) from a previously generated mSTARR-seq dataset reanalyzed with our pipeline (Lea et al., 2018) (log2(OR) [95% CI] = 6.212 [6.086, 6.440], p < 1.0 × 10−300); regulatory regions (in either the unmethylated sham or methylated condition, or both) from an mSTARR-seq experiment in HepG2 liver cells (log2(OR) [95% CI] = 3.534 [3.381, 3.684], p = 5.21 × 10−307); and regulatory regions from a conventional STARR-seq experiment (i.e., an unmethylated condition) in A549 lung epithelial cells (Johnson et al., 2018) (log2(OR) [95% CI] = 2.451 [2.442, 2.461], p < 1.0 × 10−300). Bars represent 95% confidence intervals.

Library diversity and regions of regulatory activity in HepG2 cells

(A) Comparison of diversity of unique mSTARR-seq DNA and RNA fragments from the library published in Lea et al 2018 (transfected into K562 cells) versus the same library transfected into HepG2 cells in this study. Each dot represents an experimental replicate. Each box represents the interquartile range, with the median value depicted as a horizontal bar. Whiskers extend to the most extreme values within 1.5x of the interquartile range. (B) mSTARR-seq regulatory activity in HepG2 cells is strongly enriched in ENCODE-defined enhancers (indicated in blue) and promoters, and depleted in repressed and repetitive states.

Methylation-dependent regulatory activity across datasets

Effects of methylation on regulatory activity estimated in this study are consistent with methylation effects in K562s estimated from a previously generated mSTARR-seq dataset (Lea et al., 2018) and with methylation effects estimated in HepG2 liver cells (K562: R2 = 0.286 for 1250 windows with FDR < 1% in both data sets, p = 3.19 × 10−93; HepG2: R2 = 0.277 for 511 windows with FDR < 1% in both data sets, p = 8.87 × 10−38). Each dot represents a 600 basepair regulatory window identified in both datasets (FDR < 1%; note that not all regulatory windows show significant methylation dependence (MD)). Dashed lines are the best fit lines. In all cases, negative effect sizes correspond to reduced activity in the methylated condition and positive effect sizes correspond to increased activity in the methylated condition.

Regulatory activity and effects of methylation across environmental conditions

(A) Regulatory regions in the baseline condition are highly likely to retain regulatory activity upon challenge with IFNA or dex (IFNA log2(OR) [95% CI] = 8.639 [8.499, 8.812], p < 1.0 × 10−300; dex log2(OR) [95% CI] = 9.640 [9.483, 9.776], p < 1.0 × 10−300). Regulatory regions also significantly overlap between IFNA- and dex-challenged cells (log2(OR) [95% CI] = 8.554 [8.420, 8.698], p < 1.0 × 10−300). (B) Regulatory windows identified in two environmental conditions tend to share significant effects of DNA methylation on regulatory activity (i.e., interaction effects between methylation and regulatory activity) across the two environmental conditions (baseline and IFNA log2(OR) [95% CI] = 6.345 [5.673, 7.102], p < 7.33 × 10−239; baseline and dex log2(OR) [95% CI] = 5.982 [5.599, 6.384], p < 1.0 × 10−300; IFNA and dex log2(OR) [95% CI] = 5.576 [5.123, 6.055], p < 4.70 × 10−266).

IFIT5 endogenous gene expression is responsive to IFNA stimulation

Tracks show non-normalized, raw read pile-ups of endogenous IFIT5 (ENSG00000152778) gene expression in either the unmethylated (open circle) or methylated (filled circle) condition, with all y-axis maximums set to 100. The replicates for endogenous gene expression shown here are the same replicates used for measuring regulatory activity shown in Figure 2A. IFIT5 is only detectably expressed after IFNA stimulation (note that the difference in peak heights between the IFNA-stimulated unmethylated and methylated conditions is because the plot shows raw reads; there is no effect of methylation treatment on endogenous IFIT5 gene expression after normalization for library size: p = 0.489).

Across individuals, methylation in the mSTARR-seq annotated enhancer chr1:1013400-1014000 predicts the ISG15 gene expression response to flu

(A) Across individuals, average methylation within the mSTARR-seq annotated enhancer chr1:1013400-1014000 in non-infected baseline macrophages significantly predicts ISG15 gene expression (GE) in the non-infected condition (R2 = 0.324, p = 2.66 × 10−4), but (B) not in the IAV-infected condition (R2 = 0.001, p = 0.316). Each dot represents an individual. Relative GE is log(CPM) after regressing out the effects of sequencing batch, age, and admixture. (C) Individuals with relatively low methylation in the mSTARR-seq chr1:1013400-1014000 enhancer (indicated by lighter line color) in the baseline, non-infected condition tend to have higher ISG15 gene expression in the non-infected condition, ultimately resulting in a shallower ISG15 transcriptional response to IAV infection (as indicated by slope of the line). Each line represents an individual.

Summary of early-life adversity (ELA) studies

Colored and gray rectangles indicate ranges of age at adversity and at sample collection, respectively. Circles represent mean ages at sample collection and are colored according to the tissue type used to measure methylation levels. Black solid line indicates standard errors of ages at sample collection. See Supplementary File 16 for results of Fisher’s exact tests assessing enrichment of ELA-associated CpGs for regulatory activity.