Massively multiplex single-molecule oligonucleosome footprinting

Abstract
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Our understanding of the beads-on-a-string arrangement of nucleosomes has been built largely on high-resolution sequence-agnostic imaging methods and sequence-resolved bulk biochemical techniques. To bridge the divide between these approaches, we present the single-molecule adenine methylated oligonucleosome sequencing assay (SAMOSA). SAMOSA is a high-throughput single-molecule sequencing method that combines adenine methyltransferase footprinting and single-molecule real-time DNA sequencing to natively and nondestructively measure nucleosome positions on individual chromatin fibres. SAMOSA data allows unbiased classification of single-molecular 'states' of nucleosome occupancy on individual chromatin fibres. We leverage this to estimate nucleosome regularity and spacing on single chromatin fibres genome-wide, at predicted transcription factor binding motifs, and across human epigenomic domains. Our analyses suggest that chromatin is comprised of both regular and irregular single-molecular oligonucleosome patterns that differ subtly in their relative abundance across epigenomic domains. This irregularity is particularly striking in constitutive heterochromatin, which has typically been viewed as a conformationally static entity. Our proof-of-concept study provides a powerful new methodology for studying nucleosome organization at a previously intractable resolution and offers up new avenues for modeling and visualizing higher order chromatin structure.

Introduction

The nucleosome is the atomic unit of chromatin. Nucleosomes passively and actively template the majority of nuclear interactions essential to life by determining target site access for transcription factors (Spitz and Furlong, 2012), bookmarking active and repressed chromosomal compartments via post-translational modifications (Zhou et al., 2011), and safeguarding the genome from mutational agents (Papamichos-Chronakis and Peterson, 2013). Our earliest views of the beads-on-a-string arrangement of chromatin derived from classical electron micrographs of chromatin fibres (Olins and Olins, 1974), which have since been followed by both light (Huang et al., 2010) and electron microscopy (Ou et al., 2017; Song et al., 2014) studies of in vitro-assembled and in vivo chromatin. In parallel, complementary biochemical methods using nucleolytic cleavage have successfully mapped the subunit architecture of chromatin structure at high resolution. These cleavage-based approaches can be stratified into those that focus primarily on chromatin accessibility (Klemm et al., 2019) (i.e. measuring ‘competent’ active chromatin [Weintraub and Groudine, 1976]), and those that survey nucleosomal structure uniformly across active and inactive genomic compartments. Understanding links between chromatin and gene regulation requires sensitive methods in all three of these broad categories: in this study, we advance our capabilities in the third, focusing on a novel method to map oligonucleosomal structures genome-wide.

Nucleolytic methods for studying nucleosome positioning have historically used cleavage reagents (e.g. dimethyl sulphate [Becker et al., 1986], hydroxyl radicals [Tullius, 1988], nucleases [Hewish and Burgoyne, 1973]) followed by gel electrophoresis and/or Southern blotting to map the abundance, accessibility, and nucleosome repeat lengths (NRLs) of chromatin fibres (Richard-Foy and Hager, 1987). More recently, these methods have been coupled to high-throughput short-read sequencing (Zentner and Henikoff, 2014), enabling genome-wide measurement of average nucleosome positions. While powerful, all these methods share key limitations: measurement of individual protein-DNA interactions inherently requires destruction of the chromatin fibre and averaging of signal across many short molecules. These limitations extend even to single-molecule methyltransferase-based approaches (Kelly et al., 2012; Krebs et al., 2017; Nabilsi et al., 2014), which have their own biases (e.g. CpG/GpC bias; presence of endogenous m⁵dC in mammals; DNA damage due to bisulphite conversion), and are still subject to the short-length biases of Illumina sequencers. While single-cell (Lai et al., 2018; Pott, 2017) and long-read single-molecule (Baldi et al., 2018) genomic approaches have captured some of this lost contextual information, single-cell data are generally sparse and single-molecule Array-seq data must be averaged over multiple molecules. Ultimately, these limitations have hindered our understanding of how combinations of ‘oligonucleosomal patterns’ (i.e. discrete states of nucleosome positioning and regularity on single DNA molecules) give rise to active and silent chromosomal domains.

The advent of third-generation (i.e. high-throughput, long-read) sequencing offers a potential solution to many of these issues (Shema et al., 2019). Here, we demonstrate Single-molecule Adenine Methylated Oligonucleosome Sequencing Assay (SAMOSA), a method that combines adenine methyltransferase footprinting of nucleosomes with base modification detection on the PacBio single-molecule real-time sequencer (Flusberg et al., 2010) to measure nucleosome positions on single chromatin templates. We first present proof-of-concept of SAMOSA using gold-standard in vitro assembled chromatin fibres, demonstrating that our approach captures single-molecule nucleosome positioning at high-resolution. We next apply SAMOSA to oligonucleosomes derived from K562 cells to profile single-molecule nucleosome positioning genome-wide. Our data enables unbiased classification of oligonucleosomal patterns across both euchromatic and heterochromatic domains. These patterns are influenced by multiple epigenomic phenomena, including the presence of predicted transcription factor binding motifs and post-translational histone modifications. Consistent with estimates from previous studies, our approach reveals enrichment for long, regular chromatin arrays in actively elongating chromatin, and highly accessible, disordered arrays at active promoters and enhancers. Surprisingly, we also observe a large amount of heterogeneity within constitutive heterochromatin domains, with both mappable H3K9me3-decorated regions and human major satellite sequences harboring a mixture of irregular and short-repeat-length oliognucleosome types. Our study provides a proof-of-concept framework for studying chromatin at single-molecule resolution while suggesting a highly dynamic nucleosome-DNA interface across chromatin sub-compartments.

Results

Single-molecule real-time sequencing of adenine-methylated chromatin captures nucleosome footprints

Existing methyltransferase accessibility assays either rely on bisulfite conversion (Kelly et al., 2012; Krebs et al., 2017; Nabilsi et al., 2014) or use the Oxford Nanopore platform to detect DNA modifications (Oberbeckmann et al., 2019; Shipony et al., 2020; Wang et al., 2019). We hypothesized that high-accuracy PacBio single-molecule real-time sequencing could detect m⁶dA deposited on chromatin templates to natively measure nucleosome positioning. To test this hypothesis, we used the nonspecific adenine methyltransferase EcoGII (Murray et al., 2018) to footprint nonanucleosomal chromatin arrays generated through salt-gradient dialysis (Figure 1—figure supplement 1), using template DNA containing nine tandem repetitive copies of the Widom 601 nucleosome positioning sequence (Lowary and Widom, 1998) separated by ~46 basepairs (bp) of linker sequence followed by ~450 bp of sequence without any known intrinsic affinity for nucleosomes. After purifying DNA, polishing resulting ends, and ligating on barcoded SMRTBell adaptors, we subjected libraries to sequencing on PacBio Sequel or Sequel II flow cells, using unmethylated DNA and methylated naked DNA as controls (Figure 1A). After filtering low quality reads, we analyzed a total of 33,594 single molecules across all three conditions. Across both platforms, we observed higher average interpulse duration (IPD) in samples exposed to methyltransferase, consistent with a rolling circle polymerase ‘pausing’ at methylated adenine residues in template DNA (Figure 1—figure supplement 2). Further inspection of footprinted chromatin samples sequenced on either platform revealed strong specificity for altered IPD values only at thymines falling outside Widom 601 repeat sequences, in contrast with fully methylated naked template and unmethylated controls (Figure 1—figure supplement 3A,B). These patterns were subtly influenced by the associated 10-mer context of sequenced bases, consistent with possible enzymatic biases, but also previous observations of sequence-influenced shifts in polymerase kinetics (Figure 1—figure supplement 4; Feng et al., 2013). These results suggest that the PacBio platform can natively detect ectopic m⁶dA added to chromatinized templates.

Figure 1 with 6 supplements see all

Download asset Open asset

Overview of the single-molecule adenine methylated oligonucleosome sequencing assay (SAMOSA).

(A) In the SAMOSA assay, chromatin is methylated using the nonspecific EcoGII methyltransferase, DNA is purified, and then subjected to sequencing on the PacBio platform. Modified adenine residues are natively detected during SMRT sequencing due to polymerase pausing, leading to an altered interpulse duration at modified residues. (B) SAMOSA data can be used to accurately infer nucleosome dyad positions given a strong positioning sequence. Shown are the distributions of called dyad positions with respect to the known Widom 601 dyad. Called dyads fall within a few nucleotides of the expected dyad position (median ±median absolute deviation [MAD]=4 ± 2.97 bp). (C) SAMOSA data accurately recapitulates the known nucleosome repeat lengths (NRL) of in vitro assembled chromatin fibres. Called NRLs are strongly concordant with the expected 193 repeat length (pairwise distance between adjacent dyads median ±MAD = 193±7.40 bp; single-molecule averaged repeat length median ±MAD = 192±1.30 bp). (D) Expected nucleosome footprints in SAMOSA data can be visually detected with single-molecule resolution (n = 500 sampled footprinted chromatin molecules).

We next developed a computational approach to assign a posterior probability describing the likelihood that an A/T basepair is methylated given IPD signals found within the same molecule (i.e. ‘modification probability’). We then paired this approach with a simple peak-calling strategy to approximate nucleosomal dyad positions. To benchmark this pipeline, we first calculated the distance between called nucleosome dyads and expected 601 dyad positions (Figure 1B). Observed dyads were highly concordant with expected positions (median ±median absolute deviation [MAD]=4 ± 2.97 bp), consistent with our data accurately capturing the expected 601 dyad. We next calculated the expected distances between nucleosomes given our dyad callset (i.e. a computationally defined nucleosome repeat length [NRL]; Figure 1C). Compared with the expected repeat length of 193 bp, our calculated results were similarly accurate at both two-dyad resolution (pairwise distance between adjacent dyads; median ±MAD = 193±7.40 bp) and averaged single-molecule resolution (median ±MAD = 192±1.30 bp). Both these measurements were qualitatively uniform across all molecules, independent of the positions of individual nucleosomes along individual array molecules (Figure 1—figure supplement 5). Finally, we directly visualized the modification probabilities of individual sequenced chromatin molecules and observed that modification patterns occurred in expected linker sequences (Figure 1D), and not in unmethylated or fully methylated control samples (Figure 1—figure supplement 6A,B). These results demonstrate that EcoGII footprinting is specific for unprotected DNA and that kinetic deviations observed in the data are not simply the result of primary sequence biases in the template itself. We hereafter refer to this approach as SAMOSA.

SAMOSA captures regular nucleosome-DNA interactions in vivo through nuclease-cleavage and adenine-methylation simultaneously

Having shown that SAMOSA can footprint in vitro assembled chromatin fibres, we sought to apply our approach to oligonucleosomal fragments from living cells. Multiple prior studies have suggested that a light micrococcal nuclease (MNase) digest followed by disruption of the nuclear envelope and overnight dialysis can be used to gently liberate oligonucleosomes into solution without dramatically perturbing nucleosomal structure (Ehrensberger et al., 2015; Gilbert and Allan, 2001; Gilbert et al., 2004). After lightly digesting and solubilizing oligonucleosomes from human K562 nuclei, we methylated chromatin with EcoGII and sequenced methylated molecules on the Sequel II platform (n = 1,855,316 molecules total; Figure 2A). As controls, we also shallowly sequenced deproteinated K562 oligonucleosomal DNA, and deproteinated oligonucleosomal DNA methylated with the EcoGII enzyme.

Figure 2 with 1 supplement see all

Download asset Open asset

In vivo SAMOSA captures oligonucleosome structure by combining MNase digestion of chromatin with adenine methylation footprinting.

(A) An overview of the in vivo SAMOSA protocol: oligonucleosomes are gently solubilized from nuclei using micrococcal nuclease and fusogenic lipid treatment. Resulting oligonucleosomes are footprinted using the EcoGII enzyme and sequencing on the PacBio platform. Each sequencing molecules captures two orthogonal biological signals: MNase cuts that capture ‘barrier’ protein-DNA interactions, and m⁶dA methylation protein-DNA footprints. (B) Fragment length distributions for in vivo SAMOSA data reveal expected oligonucleosomal laddering (bin size = 5 bp). (C) Averaged modification probabilities from SAMOSA experiments demonstrate the ability to mark nucleosome-DNA interactions directly via methylation. Modification patterns seen in the chromatin sample are not seen in unmethylated oligonucleosomal DNA or fully methylated K562 oligonucleosomal DNA.

In vivo SAMOSA has several advantages compared to existing MNase- or methyltransferase-based genomic approaches. Our approach combines MNase-derived cuts flanking each fragment with methyltransferase footprinting of nucleosomes. MNase cuts mark the boundary of genomic ‘barrier’ elements like nucleosomes and can be tuned by modifying digestion conditions; accordingly, fragment length distributions from in vivo SAMOSA data display patterns emblematic of bulk nucleosomal array regularity (Figure 2B; Figure 2—figure supplement 1). Modification patterns of sequenced molecules then capture nucleosome-positioning information at single-molecule resolution; this is evident in single-molecule averages of modification probability in chromatin samples with respect to fully methylated and unmethylated controls (Figure 2C). While previous approaches for studying nucleosome regularity may capture each of the former information types, this method is, to our knowledge, the first that simultaneously captures the positioning of protein-DNA interactions through nucleolytic cleavage, and (through DNA methylation) the positioning of proximal protein-DNA interactions on the same single-molecule.

SAMOSA enables unbiased classification of chromatin fibres on the basis of regularity and nucleosome repeat length

The relative abundance and diversity of oligonucleosome patterns across the human genome remains unknown. Given the single-molecule nature of SAMOSA, we speculated that our data could be paired with a state-of-the-art community detection algorithm to systematically cluster footprinted molecules on the basis of single-molecule nucleosome regularity and NRL (i.e. ‘oligonucleosome patterns’). To ease detection of signal regularity on single molecules, we computed autocorrelograms for each molecule in our dataset ≥500 bp in length, and subjected resulting values to unsupervised Leiden clustering (Traag et al., 2019). Cluster sizes varied considerably, but were consistent across both replicates, with each cluster containing 6.54% (Cluster 4)–20.1% (Cluster 1) of all molecules (Figure 3A). The resulting seven clusters (Figure 3—figure supplement 1A) capture the spectrum of oligonucleosome patterning genome-wide, stratifying the genome by both NRL and array regularity. Accounting for the coverage biases presented above, the measurements shown in Figure 3A provide a rough estimate of the equilibrium composition of the genome with respect to these patterns.

Figure 3 with 1 supplement see all

Download asset Open asset

SAMOSA reveals distribution of oligonucleosome patterns genome-wide.

(A) Stacked bar chart representation of the contribution of each cluster to overall signal across two replicate experiments in K562 cells. (B) Average modification probability as a function of sequence for each of the seven defined clusters. Left: Manually annoted cluster names based on NRL estimates computed by calling peaks on single-molecule autocorrelograms; Right: Median and median absolute deviation for single-molecule NRL estimates determined for each cluster. (C) Violin plot representation of the distributions of single-molecule NRL estimates for each cluster. Clusters can be separated into three ‘irregular’ and four ‘regular’ groups of oligonucleosomes. (D) Histogram of single-molecule NRL estimates for Clusters 1, 4, and 7, along with (E) 5000 randomly sampled molecules from each cluster.

The diversity in nucleosome regularity and repeat length across these clusters is visually apparent when inspecting average modification probabilities of the 5’ 1000 bp of each cluster (Figure 3B). To better annotate each of these clusters, we characterized each with respect to methylation extent and distribution of computed single-molecule NRLs. We first inspected the average modification probabilities of each molecule across clusters, finding that these averages were largely invariant (Figure 3—figure supplement 1B). This suggests that our clustering approach does not simply classify oligonucleosomes based on the amount of methylation on each molecule. We next estimated within-cluster heterogeneity in single-molecule NRLs using a simple peak-calling approach. We scanned each autocorrelogram for secondary peaks, and annotated the location of each peak to compute an estimated NRL. We then visualized these distributions as violin plots for each cluster (Figure 3C). Our data broadly fall into two categories: irregular clusters made up of molecules spanning multiple NRLs and lacking a strong regular periodicity, and highly regular clusters with defined single-molecule NRLs ranging from ~172 bp (i.e. chromatosome plus 5 bp DNA) to >200 bp. Based on the median NRLs and regularities inferred from these analyses, we named these clusters irregular-short (IRS), irregular-long (IRL), irregular-170 (IR170), regular repeat length 172 (NRL172), regular repeat length 187A and B (NRL187A/B), and regular repeat length 192 (NRL192). The difference between irregular and regular clusters is clear when closely inspecting histograms of NRL calls from selected clusters (Figure 3D; Figure 3—figure supplement 1C), as well as the modification patterns on individual molecules (Figure 3E). Our analyses also varied with respect to the fraction of molecules per cluster where a secondary peak could be detected (0.50%–38.2% of molecules across specific clusters; Figure 3—figure supplement 1D). Failure to detect a peak within a single-molecule autocorrelogram could be due to multiple factors, including technical biases (e.g. random undermethylated molecules). We observed, however, that more ‘missing’ NRL estimates occurred in irregular clusters, suggesting that at least a fraction of failed peak calls occurred due to lack of intrinsic regularity along individual footprinted molecules. These analyses together demonstrate that SAMOSA data can be clustered in an unbiased manner, thus enabling estimates of the equilibrium composition of the genome with respect to oligonucleosome regularity and repeat length.

SAMOSA captures the transient nucleosome occupancy of transcription-factor-binding motifs

We next explored the extent to which our data captures chromatin structure at predicted K562 transcription factor (TF)-binding sites (ENCODE Project Consortium, 2012). Both endo- and exo-nucleolytic MNase cleavage activities are obstructed by genomic protein-DNA contacts; resulting fragment-ends thus capture both nucleosomal- and TF-DNA interactions (Henikoff et al., 2011; Ramani et al., 2019). Inspection of cleavage patterns about six different TF-binding sites (CTCF, NRF1, NRSF/REST, PU.1, c-MYC, GATA1) (Figure 4A–F) revealed signal resembling traditional MNase-seq data, with fragment ends accumulating immediately proximal to predicted TF-binding motifs, and, in the case of some TFs (i.e. CTCF, REST, PU.1), showed characteristic patterns of phased nucleosomes. Further analysis of m⁶dA signal in sequenced molecules harboring motifs with at least 500 nucleotides of flanking DNA revealed examples of methyltransferase accessibility coincident with TF motifs (e.g. CTCF, NRF1, c-MYC), but also cases where single-molecule averages demonstrated weak or no differential signal when compared to equal numbers of molecules drawn from random genomic regions matched for GC-percentage and repeat content (e.g. GATA1; Figure 4G–L). Importantly, our methylation data do not appear to capture TF ‘footprints’ as seen in DNase I, hydroxyl radical, or MNase cleavage data—this could be due to turnover of transcription factors during our solubilization process, or owed to sterics, as EcoGII is roughly twice the molecular weight of S. aureus micrococcal nuclease (Murray et al., 2018).

Figure 4 with 1 supplement see all

Download asset Open asset

SAMOSA captures bulk and single-molecule evidence of transcription factor-DNA interaction simultaneously via two orthogonal molecular signals.

(**A-F**) SAMOSA MNase-cut signal averaged over predicted CTCF, NRF1, REST, PU.1, c-MYC, and GATA1-binding motifs in the K562 epigenome. All binding sites were predicted from ENCODE ChIP-seq data. (**G–L**) m⁶dA signal for the same transcription factors, averaged over molecules containing predicted binding sites and at least 250 bases flanking DNA on either side of the predicted motif. Methylation patterns at predicted sites were compared against average profiles taken from randomly drawn molecules from GC%- and repeat-content-matched regions of the genome (calculated for each ENCODE ChIP-seq peak set). (M) Results of clustering motif-containing molecules using the Leiden community detection algorithm. Clusters were manually annotated as containing molecules that were: ‘methylation resistant’ (MR), nucleosome occupied (NO1-8), stochastically accessible (SA1-2), accessible (A), or hyper-accessible (HA). (N) Heatmap representation of single-molecule accessibility profiles for clusters NO7, NO8, and A (500 randomly sampled molecules per cluster).

In theory, single-molecule footprinting data should distinguish nucleosome-bound and nucleosome-free states for molecules containing TF-binding sites. These accessibility patterns should be specific to TF-binding motifs (i.e. not present in control molecules matched for GC/repeat content). To test whether our assay captured such signal, we clustered all molecules shown in Figure 4G–L (including control molecules) using Leiden clustering, using modification probabilities extracted in a 500 bp window surrounding the predicted motif site/control site. In total, we defined 13 discrete states of template accessibility across all surveyed molecules (Figure 4M; cluster sizes shown in Figure 4—figure supplement 1). We interpreted these states on the basis of methyltransferase accessibility as: methyltransferase-resistant motifs (MR); nucleosome-occluded motifs (NO1-8); stochastically accessible motifs (wherein motif accessibility is slightly elevated near the DNA entry/exit point of a footprinted nucleosome; SA1-2); accessible motifs (A); and hyper-accessible motifs (HA). Notably, the patterns within these clusters were evident at single-molecule resolution (Figure 4N). Most transcription factors (excepting PU.1 and GATA1—the latter of which may productively bind nucleosomal DNA [Zaret and Carroll, 2011]) were significantly enriched for specific states as defined above, and all control regions were markedly depleted for molecules harboring the accessible ‘A’ and ‘HA’ states, hinting at the biological relevance of these patterns (Figure 5A). We speculate that the broad distribution of these states across both TF-binding sites and controls represent distributions of nucleosome ‘registers’ surrounding typical transcription factor binding motifs (i.e. states MR; NO-1–8). A fraction of these registers (i.e. states SA1/2) may stochastically permit transcription factor binding (perhaps through transient unwrapping of the nucleosome [Polach and Widom, 1995]), enabling formation of a new nucleosome register (i.e. state ‘A’), and subsequent generation of a highly accessible state (‘HA’; model illustrated in Figure 5B). The relative fraction of molecules in an ‘SA’ state could conceivably be modulated by TF intrinsic properties (e.g. ability to bind partially nucleosome-wrapped DNA [Zaret and Mango, 2016]), or extrinsic factors (e.g. local concentration of ATP-dependent chromatin remodeling enzymes [Narlikar et al., 2013]). While correlation of our replicates demonstrates the reproducibility and robustness of these findings (Figure 5—figure supplement 1), future experimental follow-up coupling our protocol with perturbed biological systems and deeper sequencing are necessary to quantitatively interrogate this model.

Figure 5 with 1 supplement see all

Download asset Open asset

TF-centric clusters exhibit significantly different usage of specific ‘registers’ of nucleosome positioning with respect to predicted TF-binding sites.

(A) We performed Fisher’s exact tests to determine relative enrichment and depletion of each cluster for each transcription factor surveyed in Figure 4. Cluster ‘A’ is consistently depleted across control molecules but enriched across molecules containing bona fide transcription factor binding motifs, suggesting that the clusters identified in this study are biologically relevant. Fishers Exact test odds ratios are plotted in heatmap form and all enrichment tests that are statistically significant under a false discovery rate of 10% (q < 0.1) are marked with a black dot. (B) Our data may be explained by the Widom ‘site exposure’ model in vivo. Transcription factor binding motifs are stochastically exposed as nucleosomes toggle between multiple ‘registers’ as seen in Figure 4M (states NO and SA). Transcription factor binding perhaps enforces a favorable nucleosome register (state A), which can then seed hyper-accessible states/further TF-DNA interactions (state HA).

Heterogeneous oligonucleosome patterns comprise human epigenomic domains

Short-read and long-read sequencing of nucleolytic fragments in mammals have suggested that NRLs vary across epigenomic domains, with euchromatin harboring shorter NRLs on average and heterochromatic domains harboring longer NRLs (Gaffney et al., 2012; Snyder et al., 2016; Valouev et al., 2011), but the relative heterogeneity of these domains remains unknown. We speculated that SAMOSA data could be used to estimate single-molecule oligonucleosome pattern heterogeneity across epigenomic domains. We revisited the seven oligonucleosome patterns defined above, and examined the distribution of patterns across collections of single molecules falling within ENCODE-defined H3K4me3, H3K4me1, H3K36me3, H3K27me3, and H3K9me3-decorated chromatin domains. To control for the impact of GC-content on these analyses, we also included GC-/repeat content matched control molecules for each epigenomic mark surveyed. Furthermore, to take advantage of the long-read and relatively unbiased nature of our data, we also incorporated molecules deriving from typically unmappable human alpha, beta, and gamma satellite DNA sampled directly from raw CCS reads.

We visualized the relative heterogeneity of these domains and controls in two ways: using histograms of computed single-molecule NRL estimates (Figure 6A), and by using stacked bar graphs to visualize cluster membership (Figure 6B). A striking finding of our analyses was that each epigenomic domain surveyed was comprised of a highly heterogeneous mixture of oligonucleosome patterns. In most cases, these patterns differed only subtly from control molecules with respect to regularity and NRL. In specific cases, we observed small effect shifts in the estimated median NRLs for specific domains—for example, a shift of ~5 bp (180 bp vs. 185 bp) in H3K9me3 chromatin with respect to random molecules, and a shift of ~4 bp (182 bp vs 186 bp) for H3K36me3. These shifts were also evident in the fraction of molecules with successful peak calls: H3K4me3 decorated chromatin, for example, had markedly fewer (78.0% vs 88.6%) successful calls compared to control molecules, a finding consistent with the expected irregularity of active promoter oligonucleosomes. We note that all these measured parameters would be unattainable using any existing biochemical method and that these preliminary findings argue against the abundance of homogeneous oligonucleosome structures in either heterochromatic or euchromatic nuclear regions.

Figure 6 with 3 supplements see all

Download asset Open asset

Human epigenomic states are punctuated by specific oligionucleosome patterns.

A) Histogram representations of the estimated single-molecule NRLs for five different epigenomic domains compared to control sets of molecules matched for GC and repeat content. Inset: Numbers of molecules plotted, median NRL estimates with associated median absolute deviations, and the percent of molecules where a peak could not be detected. (B) Stacked bar chart representation of the relative composition of each epigenomic domain with respect to the seven clusters defined in Figure 3. C. Heterochromatin: constitutive heterochromatin; F. Heterochromatin: facultative heterochromatin. (C). Heatmap of enrichment test results to determine nucleosome conformers that are enriched or depleted for each chromatin state. Tests qualitatively appearing to be chromatin-state specific are highlighted with a black box. Significant tests following multiple hypothesis correction marked with a black dot. Fisher’s Exact Test was used for all comparisons.

On first glance, our data appear to run counter to previous observations demonstrating that epigenomic domains can be delineated by differences in bulk nucleosome positioning as measured by nuclease digestion. One possible explanation for this is that epigenomic domains subtly, but significantly, vary in their relative composition of distinct oligonucleosome patterns, and the resulting average of these differences is the signal captured in MNase-Southern and other cleavage-based measurements. We tested this hypothesis by constructing a series of statistical tests to determine whether each of the seven defined oligonucleosome patterns were significantly enriched or depleted across chromatin domains and matched control regions (Figure 6C; reproducibility analyses summarized in Figure 6—figure supplement 1). Our results suggest that chromatin domains are demarcated by their relative usage of specific oligonucleosome patterns. Consistent with expectations, active chromatin marked by H3K4me3 and H3K4me1 are punctuated by a mixture of irregular oligonucleosome patterns (namely, clusters IRL and IR170). For transcription elongation associated H3K36me3 decorated chromatin, both short-read mapping in human and long-read bulk array regularity mapping in D. melanogaster have suggested relatively short, regular nucleosome repeat lengths (Baldi et al., 2018; Valouev et al., 2011). Our data partially corroborate this finding in human K562 cells: H3K36me3-domains are punctuated by irregular IRS oligoncleosome patterns (Fisher’s Exact Odds Ratio [O.R.]=1.13; q = 1.71E-50) and regular, short NRL172 patterns (O.R. = 1.39; q = 3.69E-170).

Our assay also allows us to assess compositional biases in heterochromatic domains. Short-read-based human studies and classical MNase mapping of constituve heterochromatin have suggested that H3K9me3-decorated chromatin harbor (i) long nucleosome repeat lengths on average, and (ii) are highly regular. These estimates are susceptible to artifacts, as heterochromatic nucleosomes are expected to be both strongly phased and weakly positioned. Our data partially disagree with prior estimates—across both H3K9me3 and Satellite molecules we observe enrichment for irregular IRS nucleosome conformers (Satellite O.R. = 1.13; q = 5.71E-11; H3K9me3 O.R. = 1.35; q = 3.95E-23). Still, these enriched conformers were accompanied by enrichment for regular NRL172 oligonucleosome patterns for both states (Satellite O.R. = 1.61; q = 5.25E-80; H3K9me3 O.R. = 1.23; q = 3.86E-6). These analyses demonstrate that prior NRL estimates by short-read sequencing may have been confounded by in vivo heterogeneity in nucleosome positions, that heterochromatic nucleosome conformations can be both irregular and diverse, and finally, highlight the value of SAMOSA for accurately studying nucleosome structure in heterochromatin.

Taken as a whole, our data suggest two fundamental properties of human epigenomic domains: first, epigenomic domains are comprised of a diverse array of oligonucleosome patterns varying substantially in intrinsic regularity and average distance between regularly spaced nucleosomes; second: epigenomic domains are demarcated by their usage of these oligonucleosome patterns. We find that all epigenomic states are characterized by a diverse mixture of oligonucleosomal conformers—many conformational states are neither significantly depleted nor enriched with respect to all molecules surveyed, further hinting at the diverse composition of chromatin domains genome-wide.

Discussion

Here, we present the SAMOSA, a method for resolving nucleosome-DNA interactions using the EcoGII adenine methyltransferase and PacBio single-molecule real-time sequencing. Our approach has multiple advantages over existing methyltransferase-based sequencing approaches: first, by using a relatively nonspecific methyltransferase, we avoid the primary sequence biases associated with GpC/CpG methyltransferase footprinting methods; second, by natively detecting modifications using the single-molecule real-time sequencer, we reduce enzymatic sequence bias and avoid sample damage associated with sodium bisulphite conversion; finally, and most importantly, our approach unlocks the study of protein-DNA interactions at length-scales previously unallowed by Illumina sequencing.

Our study does have limitations. While the current SAMOSA protocol enriches fragments ranging from ~500 bp to ~ 2 kb in size, high-quality PacBio CCS sequencing is compatible with fragments ranging from 10 to 15 kbp. We anticipate that with further optimization (e.g. optimization of digestion conditions), SAMOSA will be applicable to longer arrays, enabling kilobase-domain-scale study of single-molecule oligonucleosome patterning. Indeed, our preliminary SAMOSA experiments varying digestion conditions demonstrate the feasibility of such variations (Figure 2—figure supplement 1). Second, our approach involves methylating fibres following solubilization of oligonucleosomal fragments, and is thus unlikely to capture protein-DNA interactions weaker or more transient than the stable nucleosome-DNA interaction. Such transient interactions could be captured in future work by modifying the protocol to footprint nuclei prior to MNase-solubilization. Third, our proof-of-concept was performed in unsynchronized K562 cells, and thus we cannot yet address the contribution of a biological process like the cell cycle to the observed heterogeneity. Finally, as a proof-of-concept our approach falls short of generating a high-coverage reference map of the K562 epigenome; as sequencing costs for PacBio decrease and sequence-enrichment technologies (e.g. CRISPR-based enrichment Ebbert et al., 2018; SMRT-ChIP [Wu et al., 2016]) for the platform mature, SAMOSA may routinely be used to generate reference datasets with hundred-to-thousand-fold single-molecular coverage of genomic sites of interest.

Our data confirms that the human epigenome is made up of a diverse array of oligonucleosome patterns, including highly regular arrays of varying nucleosome repeat lengths, and irregular arrays where nucleosomes are positioned without a detectable periodic signature (Baldi et al., 2020). Our results broadly agree with a recent approach employing electron tomography to map the in situ structure of mammalian nuclei, which found chromatin to be highly heterogeneous at the length scale of multi-nucleosome interactions, and failed to detect evidence of a 30 nm fibre or other homogeneous higher order compaction states (Ou et al., 2017). At the sequencing depth presented here, these oligonucleosome patterns significantly, if subtly, vary across different epigenomic domains. Surprisingly, we find that both mappable (H3K9me3 ChIP-seq peaks) and unmappable (human satellite sequence) constitutive heterochromatin are enriched for irregular oligonucleosome patterns in addition to expected regular arrays—the presence of these irregular fibres may have been previously missed due to an understandable reliance on bulk averaged methods (e.g. MNase-Southern) for studying constitutive heterochromatin. This is strongly supported by orthogonal analysis of heterochromatin-spanning K562 reads generated using the recently published, conceptually similar Fiber-seq method (Stergachis et al., 2020), which also reveal that H3K9me3 domains are enriched for irregular chromatin fibres (Figure 6—figure supplement 2). Given the robustness of this finding, it is tempting to speculate that this irregularity may be linked to the dynamic restructuring of heterochromatic nucleosomes by factors like HP1 (Sanulli et al., 2019), which may promote phase-separation of heterochromatin. While stratification of analyzed satellite sequences into H3K9me3-decorated alpha/beta, and H3K9me3-free gamma satellite (Kim et al., 2009) provides correlative support for this notion (Figure 6—figure supplement 3), future studies combining SAMOSA with cellular perturbation of heterochromatin-associated factors are necessary to directly address this possibility.

More generally, future work employing our technique must focus on questioning the biological significance of this global heterogeneity: for example, is the fraction of stochastically accessible transcription factor binding sites (i.e. motif ‘site exposure’ frequency [Ahmad and Henikoff, 2001; Polach and Widom, 1995]) important for TF-DNA binding in nucleosome-occluded genomic regions? What is the interplay between transcription factor ‘pioneering’ and stochastic site accessibility? What are the global roles of ATP-dependent chromatin remodeling enzymes (i.e. SWI/SNF; ISWI; INO80; CHD) in maintaining these patterns genome-wide (Brahma and Henikoff, 2020)? Our approach also unlocks a set of conceptual questions regarding the nature of chromatin secondary structure. Significant genome-wide efforts have revealed that metazoan epigenomes are punctuated by regions of concerted histone modification and subnuclear positioning (ENCODE Project Consortium, 2012; Filion et al., 2010), but approaches for studying the distribution of oligonucleosomal patterns associated within these same regions are lacking. Given recent work suggesting that NRLs can specify the ability of nucleosomal arrays to phase separate (Gibson et al., 2019), it is likely that SAMOSA and similar assays may provide an important bridge between in vitro biochemical observations of chromatin and in vivo genome-wide ‘catalogs’ of oligonucleosome patterning.

SAMOSA adds to the growing list of technologies that use high-throughput single-molecule sequencing to explore the epigenome (Baldi et al., 2018; Lee et al., 2019; Shipony et al., 2020; Wang et al., 2019; Stergachis et al., 2020). We foresee the broad applicability of this and similar approaches to dissect gene regulatory processes at previously intractable length-scales. Our approach and associated analytical pipelines demonstrate the versatility of high-throughput single-molecule sequencing—namely the ability to cluster single-molecules in an unsupervised manner to uncover molecular states previously missed by short-read approaches. Our analytical approach bears many similarities to methods used in single-cell analysis, and indeed many of the technologies and concepts typically used for single-cell genomics (Trapnell, 2015) (e.g. clustering; trajectory analysis) will likely have value when applied to single-molecule epigenomic assays. Our approach also follows in the footsteps of multi-omic Illumina assays like NoME-seq and MapIT, representing the first of what we anticipate will be many ‘multi-omic’ third-generation sequencing assays. As third-generation sequencing technologies advance, it will likely become possible to encode multiple biochemical signals on the same single-molecules, thus enabling causal inference of the logic and ordering of biochemical modifications on single chromatin templates.

Share this article

Cite this article

Overview of the single-molecule adenine methylated oligonucleosome sequencing assay (SAMOSA).

In vivo SAMOSA captures oligonucleosome structure by combining MNase digestion of chromatin with adenine methylation footprinting.

SAMOSA reveals distribution of oligonucleosome patterns genome-wide.

SAMOSA captures bulk and single-molecule evidence of transcription factor-DNA interaction simultaneously via two orthogonal molecular signals.

TF-centric clusters exhibit significantly different usage of specific ‘registers’ of nucleosome positioning with respect to predicted TF-binding sites.

Human epigenomic states are punctuated by specific oligionucleosome patterns.

Author details

Nour J Abdulhay

Contribution

Contributed equally with

Competing interests

Colin P McNally

Contribution

Contributed equally with

Competing interests

Laura J Hsieh

Contribution

Competing interests

Sivakanthan Kasinathan

Contribution

Competing interests

Aidan Keith

Contribution

Competing interests

Laurel S Estes

Contribution

Competing interests

Mehran Karimzadeh

Contribution

Competing interests

Jason G Underwood

Contribution

Competing interests

Hani Goodarzi

Contribution

Competing interests

Geeta J Narlikar

Contribution

Competing interests

Vijay Ramani

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism