Single-cell co-mapping reveals relationship between chromatin state and gene expression in early zebrafish development

Vivek Bhardwaj; Alberto Griffa; Helena Viñas Gaza; Peter Zeller; Alexander van Oudenaarden

doi:10.7554/eLife.110400.1

Introduction

Early embryonic development in animals is characterized by the controlled movement and positioning of cells, establishment of a body plan, and specification of tissue-specific cell states. While the spatial gradients of morphogens dominate the former two events [1], the maintenance of cell identity is believed to be mainly regulated by chromatin state [2]. Similar to the morphogens that regulate the patterning of an embryo, the chromatin state can also be transgenerationally inherited [3]. This might play an important role in predefining the spatiotemporal expression of genes during early development and regulate cell fates. The relationship between chromatin state and gene expression has been studied using whole-genome assays applied to cultured cell populations, whole tissues, or enriched cell populations sorted using cell surface or transgenic markers [4,5]. Recently, chromatin and DNA methylome mapping techniques have been developed to resolve cellular heterogeneity of epigenetic states at the level of individual cells. Major progress has been made using DNA methylome profiling of single cells, which have mostly been applied to study adult tissues [6–8]. We and others have applied single-cell methods to study chromatin states in adult tissues [9–12]. However, genome-wide studies mapping temporal chromatin changes of single cells during early embryogenesis are rare [13–18]. This leaves a gap in our understanding of the process of establishment and propagation of cell-type-specific chromatin states during early embryonic development.

In this study, we asked how the active and silenced chromatin states of cells are shaped during early vertebrate embryogenesis, using zebrafish as a model system. As the chromatin and gene expression are highly dynamic in embryos across cells, bulk assays would average out these biologically important differences. Similarly, a single-cell assay profiling either chromatin or transcriptome alone cannot measure how these two layers interact with each other during development in a single cell. Therefore, we applied our single-cell co-mapping assay T-ChIC [19] to jointly profile the genome-wide active and silencing histone modifications together with full-length transcriptome from the same single cells during early zebrafish development (4-24 hpf). Using this data, we infer continuous developmental trajectories and ask how the chromatin state correlates with the expression of transcription factors and other developmentally important genes during cell fate commitment.

Results

Paired profiling of histone modifications and transcriptome of single cells during zebrafish embryogenesis

We recently developed a single-cell multi-omics method, termed T-ChIC (transcriptome and chromatin immuno-cleavage), which extends the previously described sortChIC [9] and VASA-seq [20] protocols, by integrating them in a single workflow [19]. This allows us to quantify the pattern of histone modifications at kilobase resolution, while simultaneously providing full-length transcriptome coverage in single cells. To apply T-ChIC to study multiple time points across early zebrafish development, we extended this protocol with an optimized cell dissociation and sample multiplexing strategy that allows collection of embryos from different timepoints of an experiment while reducing batch effects (Fig. 1a, Methods). We applied this modified workflow, termed “whole-organism T-ChIC” (woT-ChIC) to quantify the polycomb complex-mediated histone mark, H3K27me3, in zebrafish embryos collected at six selected time points post-fertilization, obtaining a total of 18,432 cells. This dataset provides us with complete coverage of gastrulation (4, 6, 8, 10 hpf), along with the beginning and the end of somitogenesis (12 and 24 hpf, respectively, Fig. 1b). We produced the woT-ChIC dataset in 4 independent biological replicates, along with 2 additional replicates that contained cells without a functional antibody, to validate data quality (Supplementary Table 1). This subset (labeled “T-noChIC”) showed a similar number of detected transcripts, and was co-clustered with T-ChIC cells, confirming that the transcriptome quality of woT-ChIC is independent of the ChIC fraction (Fig. S1a, b). After removing cells with low MNase cuts, as well as potentially over-fragmented cells, we observed a strong enrichment of chromatin signal over specific genomic regions (Fig. S1d, e).

whole-organism T-ChIC of zebrafish embryos quantifies of full length transcripts and histone modifications from the same single cell.
**a. woT-ChIC experimental workflow:** (1) Single cells from different time points are labelled with a combination of CellTracer dyes and permeabilized either to retain cytoplasmic RNA (whole-cell), or exclude it (nuclei) before antibody+pA-MNase incubation. (2) Cells are pooled and sorted in 384-well plates with position-indexed RNA and ChIC barcodes. (3) DNA and RNA fragments are ligated and repaired before re-pooling them for IVT and PCR amplification to produce sequencing libraries. **b. UMAP projections of single cells** using signals from H3K27me3 (left) and transcriptome (right) and colored by timepoints. The 6 sampled timepoints (right) are pooled into 3 groups (early/middle/late) based on the complexity of H3K27me3 signal. **c. Single-cell trackplot** showing signal on the 450-kb region around the hoxc gene cluster. The heatmaps show signals in single cells, while the coverage tracks on top show the pseudo-bulk signal (blue: RNA, pink: H3K27me3). Publicly available bulk H3K27me3 datasets are shown for comparison (grey) **d. UMAP projections** (based on transcriptome signal) showing gene-level normalized signal for RNA (blue) or H3K27me3 (pink) on 2 selected genes.

Early zebrafish embryos contain a high load of maternal transcripts required for early embryonic development, which are temporally replaced with newly transcribed, zygotic RNA [21]. Consistent with this transition, we observed a substantial decrease in unique fragment counts from spliced reads compared to unspliced reads with developmental time (Fig. S1f). Despite these dynamics, our overall number of detected genes with both spliced and unspliced counts is higher than previously reported in scRNA-seq studies [22,23], due to the increased sensitivity and full-length RNA recovery (Supplementary Table 2). Moreover, with our total RNA profiling approach, we were also able to detect developmentally important non-coding RNAs such as miR-430 (in pluripotent cells) known to be critical for clearance of maternal RNA in zebrafish [24,25], and miR-124 (in neural ectoderm), a known regulator of neuronal differentiation which is conserved between species [26] (Fig. S1c). At the chromatin level, we observed increased MNase cuts (representing H3K27me3 signal) per cell as development progresses (Fig. S1g). This corroborates previous observations of increasing H3K27me3 abundance during development based on bulk chromatin assays [27,28]. Overall, we retained 9275 cells with both transcriptome and H3K27me3 signal for further analysis.

We divided our data into early (4-6 hpf), intermediate (8, 10, 12 hpf), and late (24 hpf) time points, and compared our H3K27me3 signal with publicly available datasets at the corresponding time points (Fig. 1c, Methods). While our pseudo-bulk H3K27me3 profiles showed a high genome-wide correlation with publicly available bulk ChIP data from matched time points (Fig. S2a), the analysis of genomic bins ranked by H3K27me3 signal shows improved signal enrichment of our data relative to the publicly available bulk sequencing data at a comparable sequencing depth (Fig. S2b). Moreover, the H3K27me3 levels show a clear relationship with the silencing of associated genes in single cells at all time points (Fig. 1d, S2c). We observed high H3K27me3 levels associated with the silencing of gene expression as early as 4 hpf for hoxc3a, involved in anterior-posterior patterning, as well as silencing of genes such as pcdh1b late during development (Fig. 1d, Fig. S2c). Furthermore, we also observed a transient association of K27me3 on genes. For example, rfx4, expressed in the central nervous system and neural rod, was silenced in non-neural ectoderm cells by H3K27me3 during gastrulation (Fig. S2c,d). These results suggest that our data allows us to gain quantitative insight into the relationship between H3K27me3 and gene expression during development.

Spatiotemporal spreading of H3K27me3 associates with the silencing of gene expression during development

To annotate cell types in our data, we performed Leiden clustering of cells using their gene expression signal, followed by canonical correlation analysis of gene expression with that of a previously published time-course scRNA-seq data set [23] (Fig. S3a, Methods). Virtually all our cells matched one of the annotated cells from Wagner et al. with high confidence, allowing successful label transfer into our data (Fig. S3b-c). We further refined these labels based on cell ontologies from the Zebrafish Information Network [29], to categorize our cells into 34 cell types (Fig. S3e). Cell type proportions were consistent between the biological replicates of woT-ChIC (Fig. S3f). To get a fine-grained view of cellular heterogeneity while reducing signal dropouts, we aggregated cells that are transcriptionally similar to each other, into “metacells” [30] (Fig. S3d-e, Methods). Interestingly, most of the cell types annotated based on their gene expression profiles also show a clear separation based on their H3K27me3 enrichment as early as 8-12 hpf, suggesting that distinct, cell-state-specific H3K27me3 patterns already start to appear during gastrulation (Fig. 2a).

Spatio-temporal spreading of H3K27me3 correlates with gene silencing.
**a. UMAP projection of cells based on H3K27me3 signal** (same as Fig 1b, left) indicating the different cell types annotated using the transcriptome signal. **b. An example genomic locus** that demonstrates the cis-spreading of H3K27me3 signal (pink) around the *zic3* gene with time during development. Note that apart from the *zic3* gene, the spreading also correlates with a downregulation of the expression for the nearby gene (*rbmx*) with time. **c. Single-cell heatmap** each row is a single cell selected from ectoderm lineage (34.5%) showing the average H3K27me3 signal for the top 100 genes detected by the linear model, showing increase in spreading of signal at the 100-kb region surrounding the center bin with pseudotime. The center bin was identified as the bin with non-zero signal in pluripotent cells. (PT=Pseudotime, RT=Real time) **d. Line plots comparing H3K27me3 spreading and gene expression with pseudotime**. The bottom panel shows the average fraction of bins that show H3K27me3 signal on 2 sets of genes (spreading, non-spreading) with time, while the top panel shows the average gene expression of these genes-sets along pseudotime. **e. Heatmap of H3K27me3 signals across cell types for genes with cell-type-specific demethylation** (CT=Cell Type). The analysed cell types are indicated in the right legend, while top legend shows all cell types (colors same as 2a). **f. Correlation of H3K27me3 fold-change with change in gene expression**, for the selected cell types.Top 5 genes with H3K27me3 loss are labelled per cell type.

Next, we asked how the global H3K27me3 landscape is established in the cells during lineage commitment. We observed that with time, the number of genomic bins with H3K27me3 signal increased. In contrast, the average signal in detected bins plateaued at 24 hpf, indicating new regions acquiring the H3K27me3 signal instead of enrichment of signal on pre-marked regions (Fig. S4a). Therefore, we asked whether this increase in signal comes from a de novo gain in H3K27me3 or as a result of spill-over (indicating “cis-spreading”) of increasing H3K27me3 density from previously enriched regions. At least a subset of enriched regions in pluripotent cells display a cis-spreading of signal with differentiation, covering developmentally important genes such as the zic locus (Fig. 2b). To quantify cis-spreading genome-wide, we first subsetted the genomic bins, which had signal in at least 5% of filtered Pluripotent cells (at 4 hpf). Apart from tightly repressed genes such as gata3, nr2f1a, six3b, pax9a, foxc1b, zic1/4, hox clusters, and the pcdh1/2 cluster with a broadly distributed signal, all other H3K27me3 signal was localized within 5 kb bins and the majority (65%) of these bins overlapped with a promoter region. We then calculated the signal on these bins compared to the background signal (averaged over 100 kb region) surrounding these bins in single cells. These two signals correlated positively for about 30% of the 5 kb bins suggesting a spill-over from the main signal peak to the surrounding background. In contrast, the remaining 70% displays a low correlation with background indicating a localized enrichment without spill-over to the surrounding background (Fig. 2c, Fig. S4b).

We confirmed this enrichment with an alternative approach based on domain calling on the pooled, pseudo-bulk dataset (Methods). While wider H3K27me3 domains detected on the pooled data correspond to the signal at 24 hpf, the sharper subpeaks within those domains were observed at early (4-6 hpf) time points (Fig. S4d). Relatively mature cell types, particularly from the neural ectoderm (such as differentiating neurons) show a higher correlation of subpeaks to the background, suggesting a wider spread in signal (Fig. S4e). We further stratified this signal in search of bins with a significant difference between lineages (Methods). We only detected a handful of bins with statistically significant differences between lineages, and the mean H3K27me3 signal indicated that these results are not robust (Fig. S4c). Therefore the spread of H3K27me3 signal does not appear to be lineage-specific. To test whether this characteristic might be conserved across species, we re-analysed a previously published ChIP-seq dataset of mouse embryos [31] (Methods). Comparing H3K27me3 signals at and around gene promoters between E5.5 epiblast and post-gastrulation ectoderm lineage, we see the evidence of signal spreading from a subset of these sites (Fig. S4i), suggesting that the spread of H3K27me3 signal from promoters might be a conserved phenomenon.

To understand how this spread of H3K27me3 relates to gene expression in time, we plotted the expression of the “host” gene (genes with promoter enrichment of H3K27me3 in pluripotent cells), and the “nearby” genes (with promoter within 100 kb region) over single cells arranged in pseudotime (Fig. 2d). Interestingly, the “host” genes displayed an increased expression before the spread of H3K27me3 signal, followed by silencing post-spreading (Fig. 2d). In contrast, the “nearby” genes displayed relatively smaller changes in transcription during this process but were also downregulated at a later stage (Fig. S4f). To identify genes whose expression is silenced as a result of spreading of H3K27me3, we applied linear regression to predict gene expression as a function of H3K27me3 density (defined as the number of reads per kb) on their nearest, or overlapping domains in metacells (Methods). Silenced genes showed a strong negative correlation of H3K27me3 density with their expression, with the strongest targets being hox and pcdh1 gene clusters (Fig. S4g).

Considering most genes seem to be in the process of gaining H3K27me3 until 24 hpf, we asked whether there is a subset of genes which lose H3K27me3. For this, we performed a differential H3K27me3 signal analysis for each cell type at 24hpf stage compared to cell types from earlier stages and selected genes with a significant loss of H3K27me3 in specific cell types (log2-FC <-1, FDR < 0.05). We identified 265 genes across 10 cell types, with most genes being detected in periderm and differentiating neurons (Fig 2e, Supplementary table 3). For almost all of these genes we observed that the H3K27me3 was specifically lost in their cell type of origin, while being gained in almost all other cell types compared to pluripotent cells, suggesting H3K27me3 loss is a cell-type specific process active during development. This loss was also proportional to a change in gene expression signal in those celltypes, and affected key developmental genes associated with these cells, such as POU family of transcription factors (mid/hindbrain), tet3 and tet2 (neurons/optic cup), myod1 (muscle) and tal1 (endothelium) (Fig 2f). This list also included many of the significant genes from our regression analysis which showed cell-type-specific expression at 24hpf, such as gata2a, dlx3b, shha and tet2 (Fig. S4h).

Overall our analysis shows that for a specific set of genes, silencing of gene expression is achieved once sufficient gene-body H3K27me3 coverage is achieved via cis-spreading, a process seemingly uncoupled with the prior transcription state of these genes. Further, we see that H3K27me3 demethylation can occur later in development as key developmental genes are re-activated in their corresponding cell types in a cell-type-specific manner.

Global chromatin state of cells is decoupled from gene expression during early development

Considering the heterogeneity in the repressive chromatin landscape of cells observed as early as gastrulation, we asked how the interplay between active and silenced chromatin is established at this stage. To map the active chromatin, we focussed on H3K4me1; a histone modification associated with active and poised enhancers and promoters, which, unlike H3K27me3, has been observed before zygotic genome activation (ZGA) in zebrafish [32]. To mitigate the interference of maternally contributed RNA, we implemented a new cell preparation protocol within woTChIC (Methods), which leads to the expulsion of cytoplasmic RNA from the cells (hereafter referred to as “nuclei’’ batch). We generated woT-ChIC data for H3K4me1 at 4, 6, 8, 10, and 12 hpf. As expected, our nuclei dataset shows a 4-fold higher ratio of unspliced RNA compared to spliced RNA, and an overall lower number of detected genes compared to the whole-cell data, in line with the expected lack of spliced maternal RNA in the nuclei (Fig. S5a, b, Supplementary Table 2). The chromatin quality was unaffected, exemplified by the similar number and pattern of H3K27me3 MNase cuts with time from the “nuclei” and “whole cell” batch (Fig. S5c). Finally, we integrated our nuclei dataset with the 4-12 hpf subset of the whole-cell H3K27me3 T-ChIC dataset, creating a high-quality multi-omic dataset of 15961 cells (H3K27me3: 9197, H3K4me1: 6764) covering zebrafish gastrulation (Fig. 3a, Methods).

Integrative analysis of H3K27me3, H3K4me1 and transcriptome.
**a. UMAP projection of the single cells based on the 3 modalities** H3K27me3 (left), RNA (center) and H3K4me1 (right) after integration of the two batches and annotation of cells. **b. Total UMI counts per cell on H3K4me1 enriched regions** from the integrated dataset, divided into K4me1-unique regions and regions co-enriched for K27me3. H3K4me1 signal (top panels) on these regions remains unchanged with time, while H3K27me3 signal (bottom panels) increases. **c. Correlation between histone modification and gene expression with latent time**. Top plot shows metacells arranged by pseudotime (Y-axis) and the Pearson correlation coefficient between H3K4me1 and RNA (top) and H3K27me3 and RNA. **d. Scatterplot showing correlation between H3K4me1 and unspliced RNA** for two selected metacells indicated in (c) (early and late in latent time).

With our integrated dataset, we first asked how the global chromatin state of the cells changes with time. Comparing total MNase cuts for H3K4me1 and H3K27me3 with time, we observed that while the H3K27me3 signal globally increases in cells with time, the H3K4me1 signal decreases (Fig. S5c). To understand whether this global change stems from a change in the activity of cis-regulatory regions (CREs), we separated the data into H3K4me1 enriched regions (representing active or poised promoters and enhancers), H3K27me3-enriched regions (mostly observed near genes/promoters in earlier analysis), and other (mostly intergenic) regions. While the majority (84%) of H3K27me3-enriched regions were found to overlap with an H3K4me1 domain and show increasing H3K27me3 signal with time, this increase is not accompanied by a decrease in H3K4me1 on these regions (Fig. 3b, Fig. S6a). Instead, the decrease in signal was observed in a minor fraction of H3K27me3-unique sites, and random genomic regions away from enriched sites (Fig. S6b), suggesting that this global change in signal does not represent a change in CRE activity. Further, the ratio of H3K4me1 to H3K27me3 suggests that most promoters remain in a “bivalent” chromatin state during 4-12 hpf in all lineages, with a small fraction showing increased H3K4me1 activity in any specific lineage (Fig. S6c).

To obtain a more fine-grained view of cellular differentiation time and lineages on our integrated data, we took advantage of the high unspliced counts from our protocol. We applied the RNA velocity model [33], which uses the ratio between spliced and unspliced reads of genes to obtain the cell’s differentiation path, and assigns a “latent time” to the cell, indicating their differentiation stage (Fig. S7a-e). We then asked how the change in the cell’s chromatin state relates to the transcription of genes during their differentiation. For this, we aggregated transcriptionally similar cells into metacells, and correlated the H3K4me1 and H3K27me3 signals with unspliced (i.e. newly transcribed) RNA signals for all genes in each metacell. Interestingly, we observed that the correlation between the gene-body H3K4me1 and transcription increases with the average latent time of a metacell (Fig. 3c-d). Promoter regions, however, did not show this trend (Fig. S6d). For H3K27me3, the global signal was mostly uncorrelated with transcription on both promoters and gene bodies (Fig. 3c, S6d). This suggests that despite increasing heterogeneity of chromatin signal, the overall chromatin state of a cell is decoupled from its transcriptional state during early development, and this coupling increases as the cells mature.

The chromatin state of binding sites predicts the function of transcription factors during gastrulation

We next asked whether our integrated dataset could inform us about the regulation of transcription factor (TF) networks and their role in lineage specification. While many lineage-defining TFs are biochemically predicted to have both activation and silencing functions, we hypothesised that the level of H3K4me1 on transcription factor binding sites (TFBS) might indicate which function the TFs play in a cell. For example, if a TF expression is correlated to a gain in H3K4me1 on its binding sites, it could indicate its role as a transcriptional activator, while a loss in H3K4me1 on TFBS might indicate a silencing function in that cell. Further, if a TF function is epigenetically regulated, then the chromatin state of the TF itself would also be predictive of its function. Based on this idea, we built a prediction model that combines the chromatin state and expression of TFs with their H3K4me1 activity on TFBS within cells (Methods). With a combined model, we aim to classify TFs based on both their own regulation via chromatin state (regulated/independent), as well their action on their targets (activator/repressor) (Fig. 4a).

Prediction of TF activity using TF epigenetics and transcription.
**a. Schematic and outputs of the prediction model.** 1) (top) Normalized H3K4me1 and H3K27me3 signal on the TF locus, TF (spliced) RNA signal, and indicators of cell state (pseudotime and lineage) are, used to predict TF activity (bottom) 2) The lasso regression model is used to select the most useful predictors for each TF. The bottom right plot shows the TFs sorted by the R² values on the independent test dataset. 3) Coefficients from the final models are ranked and compared to categorize the TFs. The number of TFs classified as activator/repressor, or regulated/independent are shown in the right plot. **b. UMAPs showing the expression and motif activities** of TFs classified as “activators” or “repressors” using the above model. TF expression (left) is based on (normalized) spliced RNA and TF activity (right) is based on H3K4me1 signal on TFBS. For activators, the motif activities on TFBS are gained in cells where the TFs are expressed, while for the repressors, the motif activities are lost in the cells expressing the TF. **c. UMAPs show (normalized) H3K4me1 signal and H3K27me3 signal** on TF gene body, for TFs classified as “regulated” or “independent”. The histone modifications on regulated TFs are correlated with their activities.

Our model predicted the H3K4me1 activity at TFBS with high accuracy (R² > 0.6) for 45 TFs. Our classification captured the well-established developmental function of TFs, such as the activating function of tbx16 in regulating paraxial mesoderm formation [34], and that of tfap2a, a transcriptional activator shown to be important for neural crest induction [35] (Fig. 4b). Further, it helped resolve the cell-type-specific functions of TFs predicted to be activators or repressors based on their protein domains (Fig. S8a, Supplementary Table 4). For example, zbtb16a, predicted to have a DNA-binding transcriptional repressor activity, and zeb1a, speculated as a context-dependent activator/repressor [36], were both revealed as a repressor during neural ectoderm (hindbrain) specification. Next, we asked whether the gain/loss of H3K4me1 activity is reflected in the gene transcription, measured as a change in nascent (unspliced) transcripts on the genes nearest to the TFBS. For our top predicted activators and repressors, we observed the expected up and downregulation of average nascent (unspliced) RNA signal of the target genes, corresponding to the change in TF expression and activity (Fig. S8c). This indicated that our model can capture new cell type-specific activation/repression functions of TFs during gastrulation. Additionally, our model also detects TFs that are epigenetically regulated (Fig. 4c, Fig. S8b, Supplementary Table 4). For TFs such as sox13, tbx16, lhx1a, tfap2a, a gain of H3K4me1, or a loss of H3K27me3, or a combination of both was associated with their respective TFBS activity. This indicates that while the majority of TFs expressed early in development appear to be regulated by alternative mechanisms, the chromatin state could play an important role in establishing the gene expression memory for a subset of developmentally important TFs.

Discussion

In this study, we adopted our recently developed T-ChIC method [19], to study the dynamics of active (H3K4me1) and silencing (H3K27me3) histone modifications during early zebrafish development. We observe a dynamic spatiotemporal localization of these histone modifications, previously unresolved by bulk chromatin profiling assays. This data allows for a direct comparison of the chromatin state and the expression of genes in individual cells and helps to understand the role of this interaction in regulating cell fates during early development.

We observe that H3K27me3 shows a promoter-anchored spread during development. At the start of differentiation, selected genomic loci with multiple promoters are pre-marked with a broadly distributed H3K27me3 (such as the hox and pcdh1 loci), while loci with single promoter (such as pax7b) appear as focussed H3K27me3 domains. A recent study has shown that such pre-marking is established by a non-canonical interaction between the two polycomb (PcG) complexes, PRC1 and PRC2 [37]. Here, we find that a selected set of these loci shows the spreading of H3K27me3 with differentiation, which eventually confers the silencing of host genes. A recent study using mouse embryonic stem cells proposes nucleation and spreading as a way to maintain PcG silencing [38]. Based on our observations, we propose that this mechanism could also help to propagate the spread of silencing during development. While this spread appears to be mostly not lineage-specific, we do observe a cell-type specific demethylation of many genes which developmentally important for cell-type specification, suggesting that the regulation of developmental genes via H3K27me3 could be established through a lineage-agnostic spread, followed by cell-type-specific demethylation. This could likely be a fundamental mechanism to establish a cell-type-specific gene expression memory via the polycomb complex considering the silencing of developmental genes in “alternate” lineages is a conserved function of H3K27me3 [39]. We see that in the absence of H3K27 de-methylation, important developmental genes such as hox, pax and shh genes are silenced in a spatiotemporal manner. A mis-localized expression of these known PcG targets has been observed after a deletion of the core PRC2 enzyme, ezh2 [40,41]. Apart from confirming these known targets, we additionally identify developmental genes such as rfx4, important for neural tube formation [42], dlx3b, important for placode development [43], among others, as novel PcG targets.

Although a rather large number of gene promoters are marked by H3K27me3 in pluripotent cells, only a minority of these show a genomic spread and silencing of the genes. By comparing the H3K4me3 to H3K27me3 signal on promoters, we find that most of these promoters are co-marked at 4-12 hpf, suggesting that they might serve as a “placeholder” for activation or silencing later in development. This provides an explanation to why the cis-spreading, and not the promoter enrichment of H3K27me3 is linked to gene silencing during development.

In line with previous studies [32,44], we find that the H3K4me1 is widespread in the genome of pluripotent cells, marking a large number of TF motifs and other genomic regions. While this chromatin mark systematically disappears in regions outside of cis-regulatory elements (CREs) during development, its activity on the CREs does not show a monotonous change with time. In fact, a systematic increase in H3K27me3 without a loss of H3K4me1 leads to a bivalent chromatin state on most CREs, together with a lineage-specific gain or loss of H3K4me1 on selected CREs. We show that these changes in H3K4me1 levels can be leveraged to predict the lineage-specific activator or repressor functions of TFs, by correlating this activity with the TF’s own expression and chromatin states. Using this approach, we find novel functions of TFs in lineage specification, such as the role of zbtb16a/b, zeb1a/b as negative regulators during ectoderm specification, and the tfap2a/b as a positive driver of non-neural ectoderm during gastrulation. We also find selected lineage-specifying TFs such as zfhx3, foxc1a, and irx3a whose activity seem to be regulated by their own chromatin state during gastrulation. These results might point to a new pathway through which the chromatin states of the cells play a role in specifying cell fates, i.e. by establishing a transcriptional memory on key lineage regulators.

Overall, comparing the active and silenced chromatin states of cells, we observe that the active state is a better predictor of a cell’s functional (transcriptomic) state in early development. A caveat is that we have not mapped other important silencing chromatin states, such as H3K9me3 or DNA methylation, which may show complementary dynamics in early development. We also see that both active and silencing states are rather uncoupled from transcription in pluripotent cells, and get correlated as the cells mature in development. Note that this maturation time is not necessarily the same as the developmental time (hpf) of the embryo, as the transcriptionally mature cells collected from early time points also show a high correlation of active chromatin state and transcription. Therefore, we propose that a correlation of chromatin and transcriptional state of cells could be a hallmark of cell identity formation during development. Future studies to systematically map the overall chromatin state of single cells and gene expression would further explain how cell fates are established during embryogenesis.

Methods

Whole-organism T-ChIC of zebrafish embryos

The detailed, step-by-step woT-ChIC protocol of Zebrafish embryos (from embryo collection to the preparation of sequencing libraries) is available at: https://www.protocols.io/view/single-cell-tchic-for-zebrafish-q26g7pbe8gwz/v1. Below, we briefly describe the cell collection and staining steps used to produce this dataset.

Wild-type TL embryos were collected 20 minutes after fertilization in a petri dish with E3 medium and kept at 28.5°C in an incubator. During the first hour, the unfertilized embryos were discarded. At the desired stage, embryos were dechorionated by incubation in 1 mg/mL of pronase and 30 to 50 embryos embryos were deyolked in Ca-free Ringer’s solution, pelleted and washed with 500 µL of PBS + 10% FBS. For early time points (4, 6, and 8 hpf), cells were dissociated with the addition of 200 µL of pre-warmed FACSmax cell dissociation solution (Genlantis T200100) for 5 minutes resuspending gently up and down at RT. For later time points (10, 12 and 24 hpf) cells were dissociated with the addition of 200 µL of pre-warmed Protease solution for 6 minutes on a shaker at 28°C and 400 rpm, resuspending gently every 2 minutes. After dissociation, cells were filtered with a 35 µL sieve (Corning, 352235), and washed with 500 µL of PBS + 10% FBS and resuspended is Wash Buffer 1 (WB1, described in the online protocol) and kept at +4°C before starting the CellTracer staining. For the “nuclei” batch, WB1 was modified with 0.05% Saponin (Sigma, 47036-250G-F) instead of 0.05% Tween20. Cells were vortexed well and kept in the dark at +4°C for 20 minutes to stain with a combination of CellTrace™ dyes (Thermofisher C34570, Thermofisher C34573, Thermofisher C3457, and a combinations of two of these). The staining was stopped with the addition of 70 µL of rat serum (Sigma Aldrich, R9759-5ML) and a 5 min incubation at RT. Lastly, cells were washed and resuspended in WB1 with spermidine solution (0,072 µL/mL) and 4 µL/mL 0,5 M EDTA. Once all time points had been stained with their appropriate dye/dyes combinations they were pooled in a 0.5-mL protein-low binding tube with approximately 1 million cells in total. Cells were incubated overnight at 4’C with primary antibodies (1:200 H3K27me3 rabbit mAB, cell signalling #9733; 1:100 H3K4me1 polyclonal Ab, thermofisher #710795). Next day, the cells are washed, and incubated with pA-MNase in WB1 for 1 hr, washed and sorted into indexed 384-well plates containing CelSeq2 adapters. Cells were incubated for 30’ min at 4’C for MNase digestion, and stopped with the stop solution before proceeding with the rest of the library preparation steps. The pA-MNase fusion protein was produced as described earlier [45]. Following T-ChIC library preparation and QC, the final DNA libraries are sequenced paired-end 100 bp, on either a NovaSeq or NextSeq2000, at a sequencing depth between 15 and 25 million reads per sample (384-well plate).

Processing and quality control of T-ChIC data

The first-in-pair reads from the T-ChIC protocol contain an RNA or ChIC barcode in the following format “RNA: 6N7X, ChIC: 3N8X”; where N=UMI nucleotide and X=Cell barcode nucleotide. We used a custom Python script to split the raw .fastq files into the ChIC and RNA fractions based on which one of the 2 barcode patterns is observed at the start. The two fractions are then independently mapped to the GRCz11/danRer11 genome. A complete processing workflow (from.fastq to count tables) with all parameters is available at: https://github.com/bhardwaj-lab/scChICflow (v 0.4) and is briefly described below.

The RNA fraction was trimmed using cutadapt (v2.1) [46] with parameters‘-e 0.1-q 16-O 3 --trim-n --minimum-length 10 --nextseq-trim=16-A W{’10’}‘, along with Illumina truseq barcodes provided as‘-a and-b‘options. The trimmed reads are mapped to the genome using STAR (v 2.7.11)[47], using the “StarSolo” mode, with these important parameters‘--sjdbGTFfile <dr11_ens104.gtf> --outFilterIntronMotifs RemoveNoncanonical --soloCBmatchWLtype Exact --soloType CB_UMI_Simplè, where “dr11_ens104.gtf” refers to the ENSEMBL annotation version 104 (GRCz11) [48]. Secondary and supplementary alignments and low-quality mappings (<MAPQ 255) were removed using samtools (v1.21) [49] and reads were de-duplicated with UMI-tools (v.1.0.0) [50] using cell barcode and UMI position, along with options‘--method unique --spliced-is-unique‘. Coverage files were created using deepTools bamCoverage with CPM normalization [51].

For the ChIC fraction, barcodes were moved into the read header using UMI-tools extract (v.1.0.0). Reads were trimmed using cutadapt (v2.1) with parameters‘-e 0.1-O 5-u 1-u-2-U - 2-a W{10}-A W{10}-q 30 --trim-n --minimum-length 20 --nextseq-trim=30‘, along with illumina truseq barcodes provided as‘-a and-b‘options. The trimmed reads are mapped to the genome using hisat2 (v2.2.1) [52], with parameters‘--sensitive --no-spliced-alignment --no-mixed --no-discordant --no-softclip-X 1000‘. Reads were de-duplicated with UMI-tools (v.1.0.0) using cell barcode and UMI position, along with options‘--method unique --spliced-is-unique --soft-clip-threshold 2‘. Quality control was performed using deepTools. Reads were counted on 50-kb windows in the genome using sincei [53].

Analysis of publicly available data

We downloaded the raw .fastq files for 6hpf bulk CUTnRUN data of H3K27me3 from Ozdilek et. al. (GSE178343) [54], and raw .fastq files of 12hpf and 24hpf ChIP-seq data from the danio-code portal (accessions - 12hpf H3K27me3: DCD003854SQ, 12hpf H3K4me1: DCD003854SQ, 24hpf H3K27me3: DCD003200SQ). All .fastq files were mapped to the GRCz11 genome using snakePipes’ DNA-mapping workflow, with parameters‘--trim --fastqc --mapq 5 --dedup --bwBinSize 1000‘ [55]. The de-duplicated BAM files were subsampled to match the sequencing depth of the corresponding pooled time points (early vs 6hpf, middle vs 12hpf, late vs 24hpf), and the read coverage was compared using deepTools [51] multiBAMSummary (with parameter‘-bs 50000‘) and plotFingerPrint (with parameters‘--skipZeros-bs 10000-n 50000‘). For the analysis of H3K27me3 signal in mouse embryos, we downloaded the H3K27me3 bedgraph files corresponding to stages: E5.5, Endoderm, Ectoderm and Mesoderm by Xiang et al., and plotted the signal over the +50 kb bins surrounding the mouse gene transcription start sites (mm9 genome) using deeptools ComputeMatrix (with additional parameters‘-bs 500 --missingDataAsZero --skipZeros --maxThreshold 1000‘) and plotHeatmap.

Cell clustering and annotation using RNA signal

For clustering of single cells based on RNA signal, we used the “spliced” count matrices for “whole-cell” T-ChIC data, and “total” (spliced+unspliced+ambiguous) counts for “nuclei” T-ChIC data. Filtering and clustering of cells were performed in scanpy (v1.9.1) [56]. We removed cells with‘total_counts < 1000, or n_genes_by_counts > 10000, or pct_counts_in_top_100_genes > 0.6‘). We also removed cells with < 70% of counts on protein-coding genes. We selected genes present in at least 1% of cells (or at least 50 cells, whichever is smaller), and selected the top 4000 variable genes based on their analytical Pearson residuals [57]. We used the Pearson residuals to calculate principal components (PCs) and built a neighbor graph using 50 PCs and 30 neighbors (20 for nuclei data). We then used it to build a paga graph [58] based on Leiden clusters (paga threshold = 0.1, leiden resolution = 1.5). For 2D representation, UMAPs were initiated with the paga graph, along with additional parameters‘min_dist=1, spread=5‘ (spread = 1 for nuclei).

For the annotation of cell types and all other analyses, we calculated the normalized ChIC and RNA signal using the “shifted log transform” method (‘1/sqrt(alpha) log(4 * alpha * x + 1)‘) [59], with a fixed overdispersion (alpha) of 0.05 and total counts (“normed_sum”) as library size factors. For annotation of cells, we obtained the raw count matrices from Wagner et. al [23] and subsetted for the 4, 6, 8, 10, 14, and 24 hpf timepoints (for the “nuclei’’ batch, we also excluded 24 hpf). We normalized the counts in the same manner as our counts and selected the top 4000 variable genes (using‘FindVariableFeatures(selection.method = “vst”)‘ in seurat). We then performed CCA-MNN analysis in seurat using‘FindTransferAnchors(method=”cca”)‘ and used the transfer score to predict labels for single cells (Fig S3A). For top predicted labels for each cluster, we then manually confirmed the marker gene expression from ZFIN in the respective cluster, followed by renaming the cluster to suitable ZFIN cell ontology [29].

Integrated analysis of nuclei and whole cell data

To integrate the “nuclei’’ and “whole-cell” subsets of data, we merged the cells from the “nuclei’’ batch with that of 4-12 hpf subset of the “whole-cell” batch, resulting in 15961 cells. We then removed genes with total spliced or unspliced counts < 100, or genes detected in < 100 cells, from the merged data, and calculated the top 4000 variable genes (HVGs) based on their analytical Pearson residuals [57] and used the intersection of the HVGs from the 2 batches (3091 genes) to perform PCA based on the pearson residuals of the “unspliced” counts from the 2 batches. Top 50 PCs were then used to align the 2 batches using harmony [60]. The harmony-corrected PCs were used for further analysis of the joint dataset (clustering, annotation, metacells and RNA velocity).

For calculation of latent time and lineages of cells on the integrated 4-12 hpf data, we combined the RNA velocity and diffusion pseudotime approach, using cellrank (v1.5.1) [61]. We calculated RNA-velocity using the “dynamical” model as described in the scVelo package (v0.2.4) [62]. The cell-specific moments Ms and Mu were calculated using the HVGs and PCs from the above analysis, and top 1000 genes were used for the dynamical model to calculate the gene-shared latent time and cell-specific velocities (scv.tl.recover_dynamics with parameters: fit_connected_states = False, max_iter = 50, t_max = 12, fit_basal_transcription=True, scv.tl.velocity with parameters: min_r2=0.2, groups_for_fit=<8-12 hpf>). The latent time was combined with connectivities using cellrank (cr.tl.transition_matrix parameter: weight_connectivities=0.2), and 1 initial and 6 terminal states were calculated using cellrank’s GPCCA estimator.

Metacell analysis

To obtain a detailed view of cellular heterogeneity while reducing dropouts, we aggregated transcriptionally similar cells into the so-called “metacells’’, based on archetype analysis implemented in the SEACells python package [30]. We used with SEAcells with parameters‘n_SEACells=nc, n_waypoint_eigs=15, convergence_epsilon = 1e-5‘, where nc = 180 whole-cell data, and 160 for the integrated 4-12 hpf data. Each metacell was then annotated with the mean latent time or pseudotime of underlying single-cells, and the max number of cells belonging to an annotated cell type or collection time (hpf). For analysis involving a comparison of H3K4me1 and H3K27me3, the underlying number of single-cells were downsampled for each metacell, such that equal number of cells from both the histone modifications are assigned to each metacell, and only metacells with a minimum of 20 cells from both histone modifications were kept, to assure robust results.

Cell clustering using the ChIC signal

For the clustering of single cells based on ChIC signal, we performed latent semantic analysis (LSA) using the gensim package in python [63]. The Cells*Regions sparse matrix was treated as a vector of documents (Cells), where region counts represent word frequency. The documents were then transformed with log term-frequency (TF), inverse document frequency (IDF) as follows:

Where f_ik refers to the count frequency of a (50 kb) genomic bin T_k in a cell D_i, and nk refers to the number of cells containing non-zero counts for the bin T_k. The output is subjected to a pivoted unique normalization [64] to take into account the difference in total number of detected regions per cell.

In our case, we calculated pivot as the average number of non-zero bins across all cells, and fixed the slope to 0.25 (recommended by [64]). The resulting matrix is subjected to a truncated SVD (singular value decomposition) [65], yielding Cell*Topic and Region*Topic matrices. Similar to scRNA-seq, we calculated 30 nearest neighbors using the 50 topics from the LSA output (dropping Topic-1, which strongly correlates with read depth), and used it to build the paga graph (with threshold = 0.1). UMAPs were initialized using the PAGA graph, with parameters‘min_dist=0.1, spread=5‘. Leiden clusters were calculated on the neighborhood graph with‘resolution=1.5‘.

Peak calling and annotation

For the detection of regions with both narrow and broad enrichment in the genome, we used a two-step peak calling approach. We first pooled all our filtered cells from 4 to 24hr time points into a BAM file and used histoneHMM (v1.7) function‘call_regions‘ with parameters‘-bs 750-P 0.1‘ [66], and further removed the detected regions with average posterior probability < 0.4, and referred to them as “domains”. Next, we performed peak-calling using MACS2 (v2.2.4) [67] on the same file, with parameters‘--mfold 0 50 --extsize 200 --broad --keep-dup all‘, and overlapped these peaks with the domains detected from histoneHMM. Peaks overlapping with the histoneHMM domains, and having a local enrichment score >=50, were referred to as “subpeaks”. For further analysis, we replaced the domains containing subpeaks with their respective subpeaks, resulting in 11221 enriched domains for H3K27me3, and 74004 domains for H3K4me1.

For peak annotation, we used the genomicRanges R package [68] to classify these domains into “promoter” (within +300 or-200 bases of a transcription start-site), genic (overlapping a gene body, but not promoter), and “intergenic” (outside promoters or gene body). The “genic” domains were re-classified into “gene covering”, if they covered >= 80% of a gene. All domains were annotated with the gene(s) which overlapped with, or (in the case of intergenic domains) were nearest to them. To detect which cis-regulatory elements are present inside these domains, we overlapped them with the location of “consensus PADREs” annotated by the danio-code project [69]. We used the gimmemotifs [70] (v0.18.0), with parameters‘scan -N 30‘ to annotate these peaks for the associated transcription factor binding sites (TFBS), using the‘vertebrate.v5.0‘ motif database. The detected motifs were then filtered for zebrafish TF motifs that are also present in the Swiss regulon (dr11) database [71], resulting in 590 motifs belonging to 912 TFs.

H3K27me3 spreading and demethylation analysis

To detect the regions in the genome showing H3K27me3 cis-spreading, we used the table of 5-kb bin counts in single cells. We defined the “center” bin as the bins showing non-zero counts in >= 5% of “pluripotent” cells (784 bins), and “neighbor” bins as the 10 up and downstream bins to the center bins. We then applied linear regression to predict the counts in “neighbor” bins as a function of the sum of counts in the “center” bins across metacells.

To test if the spread is germ layer-specific, we compared this model to a second model including germ layer covariate via a likelihood ratio test.

Similarly, we used a linear regression model formulation above for the prediction of H3K27me3 silenced genes, where

and

represent unspliced counts of genes overlapping a H3K27me3 peak, and is the length of their introns. are ChIC counts on the H3K27me3 domain and is the length of the domain.

For the analysis of loss of H3K27me3 upon differentiation. We assigned cell-types to metacells based on the highest proportion of cell labels in that group, then summed up the gene-level H3K27me3 counts. We then filtered the cell types with < 3 metacells and used edgeR[72] to perform a differential signal analysis between cell types using metacells as biological replicates. We used the following workflow:‘filterByExpr(dge, min.count=5, min.total.count = 20, min.prop = 0.3); calcNormFactors(), estimateDisp(design), glmQLFit(dge, design)‘ with default parameters. Next we filtered genes with‘FDR < 0.05, logFC <-1‘ to only retain genes with a loss of H3K27me3. We perform the same analsis for spliced and unspliced RNA counts for these genes (except filtering by logFC) to compare their results with that of H3K27me3 in the same cell types.

TF activity prediction and classification

To calculate the TF activity using H3K4me1 signal, we filtered our annotated H3K4me1 peaks (with assigned cPADREs) for peaks uniquely enriched for H3K4me1. Zebrafish TF motifs from the Swiss regulon (dr11) database [71], (590 motifs belonging to 912 TFs) were assigned to the peaks, based on motif match score inside the cPADREs within those peaks. Next, we obtained the H3K4me1 counts per peak per cell and assigned these counts to each TF motifs annotated with these peaks. Finally, we converted these raw counts into bias-corrected TF motif deviance scores per cell using chromVar [73]. We also calculated deviance scores for metacells, using the aggregated counts per metacell, instead of single cells.

For TF activity prediction, we calculated the normalized H3K4me1 and H3K27me3 and (spliced) RNA counts per metacell, along with the metacell annotations (germ layer, latent time) and used them to predict the TF activity using lasso-penalized regression [74]. We first divided the data into a 70-30 (training/test) set and used the training set to tune the penalty parameter (λ) using grid search on 10-fold resamples. The top model was then run on each TF separately on the test set, and evaluated based on R-squared estimates. The R-squared estimates were then compared to the permutation-based estimates to obtain a significance score (P-value) and adjusted for multiple testing via Bejamini-Hochberg (BH) correction to select top TFs for classification (Padj < 0.01). For classification of TFs, we extracted the weights for the final model for each TF at the highest λ, and interpreted them based on previous knowledge about these histone modifications. For example, since H3K4me1 activity represents active or poised enhancer, a positive correlation (weight > 0) between a TF expression and H3K4me1 activity suggests its action as an activator, while a negative correlation (weight < 0) suggests a repressor. Similarly, a non-zero weight for a TFs H3K4me1 or H3K27me3 level would indicate that a TF activity is regulated by either, or both of the marks.

Author information

Contributions

V.B and P.Z designed the experiments with input from A.v.O. P.Z contributed the T-ChIC protocol. A.G and H.V.G performed the experiments with inputs from P.Z and V.B. V.B performed the analysis and interpreted results with input from A.v.O, P.Z and A.G. V.B and A.v.O wrote the manuscript with input from other authors.

Supplementary Figures

Quality control for the RNA and ChIC fraction
a. Unique Fragment Counts (UFIs) and genes detected based on the spliced (left) and unspliced (right) counts. Cells with T-ChIC signal (in blue) show a trend comparable to the cells with only transcriptome signal (T-noChIC) (in gray). b. UMAP (based on RNA), showing cells with multi-omic signal (blue) co-embed with cells containing only transcriptomic signal (gray). c. Examples of cell-type-specific miRNA expression detected by TChIC’s whole transcriptome approach. d. Unique ChIC counts (representing MN cuts) per bin against the number of detected bins, for 50-kb genomic bins. Cells passing the ChIC QC are coloured in black, while the rest (QC Fail) cells are in Grey. e. Genome track plot showing the effect of QC-based cell filtering on signal enrichment. Top track (black) shows the signal after filtering of bad quality cells, while bottom (grey) track shows the filtered (bad quality) cells. f. Unique RNA fragments (UFI) per cell, classified as “spliced” or “unspliced”, on pooled early (4-6 hpf), middle (8-12 hpf) and late (24 hpf) timepoints. g. Unique ChIC counts (UMI) per cell on pooled early (4-6 hpf), middle (8-12 hpf) and late (24 hpf) timepoints. Mb; megabases. Chr; chromosome. UMI; unique molecular identifier. UFI: unique fragment identifier.

Evaluation of the quantitative chromatin signal from tChIC
a. Pearson correlation of pseudo-bulk H3K27me3 signal for pooled timepoints from our study with comparable time points from the published datasets. b. Fraction of cumulative signal of H3K27me3 in bins, ranked lowest to highest. Pooled time points from our study (blue, orange and green) show a stronger enrichment of H3K27me3 compared to the corresponding time points in public bulk data (red, purple and brown). A uniformly distributed signal over the genome (no enrichment) would lead to a diagonal line. c. 2D density plot comparing log-scaled H3K27me3 signal with gene expression for 3 genes showing decreasing (*hoxc3a*) increasing (*pcdh1b*) and transient (*rfx4*) silencing during development (from Fig. 1c-d) in single cells. d. UMAP showing the transient silencing of *rfx4* by H3K27me3 specific to the non-neural ectoderm.

Annotation of H3K27me3-RNA tChIC data (4-24 hpf)
a. Heatmaps showing prediction score (based on CCA-MNN) between cell types annotated by Wagner et al. (2018) on single cells from our study. Leiden clusters of our cells (based on RNA) are shown on the left. b. UMAP (based on RNA) with Leiden clusters for our data c. UMAP coloured by prediction confidence for the transferred label from Wagner et al. d. UMAPs showing meta cells (dark circles). Cells and metacells are coloured by pseudotime. e. UMAPs with final annotation of celltypes (labels are the same as Fig. 2a). f. Barplot showing proportion of cell types detected at 10 and 24 hpf within the 4 biological replicates of T-ChIC experiments (other timepoints were not profiled by all four replicates). Colors correspond to the cell type labels (from e).

H3K27me3 shows cis-spreading with time
a. Plot showing the number of detected bins compared to the average counts per bin in single-cells for the 3 pooled timepoints. With time the number of detected bins increases, while the average counts per bin are saturated. b. Scatterplot showing the H3K27me3 signal in selected bins (i.e. bins with non-zero signal in pluripotent cells) compared to the bins nearby in metacells. Bins that show spreading of H3K27me3 (top panel) are correlated with nearby bins, while others (bottom) are not. c. Single-cell heatmaps (similar to Fig 2c) showing the spread of signal with pseudotime for bins that were detected as spreading only in the “ectoderm” lineage, within cells selected from mesodermal lineage (15% of total). d. Pseudo-bulk track plots showing signals of pooled timepoints from our data (red) showing the spreading of certain peaks by 24 hpf, bottom bars show the domains (green) and sub-peaks (blue) detected for our data. Gray tracks show publicly available bulk datasets. e. Alternative to (d), showing the quantification of H3K27me3 signal density (number of cuts / kb) in H3K27me3 peaks vs 100-Kb region around peaks, in pseudo-bulks of annotated cell types (each dot is a region). The correlation improves with more differentiated cells, indicating signal spreading with time. Cluster 1 (red) which contains the hox genes are already broadly marked in pluripotent cells. f. Same as Fig 2d (top) showing the expression of genes near, but not at H3K27me3 enriched bins with pseudotime. g. Volcano plot showing results of the linear regression per peak with its nearest gene. Peaks with strongly correlated/anti-correlated signals between H3K27me3 and RNA are coloured in red in the volcano plot. h. UMAPs showing the H3K27me3 and RNA signal on top genes underlying the selected (red) peaks of (g). i. Heatmap showing signal localization around promoters (+- 50kb region) of mouse genes for H3K27me3 ChIP-seq from Xiang et al. (2019). Wider regions around promoters show a relative increase in H3K27me3 upon differentiation in ectoderm, compared to E5.5 epiblast.

Quality control and comparison of H3K27me and H3K4me1 signal in nuclei
a. Unique RNA fragments (UFIs) recovered per cell, for “spliced” and “unspliced” reads per time point, for H3K27me3 and H3K4me1-targeted cells from the nuclei batch. b. Number of genes detected versus unique RNA fragments (UFIs) recovered per cell, divided by targeted antibody (top: H3K27me3, bottom: H3K4me1). Nuclei are coloured by time point. c. Unique ChIC counts per cell divided by time point and targeted antibody. The amount of signal recovered for H3K27me3 increases with time, while the amount of H3K4me1 detected decreases. d. UMAP embedding (based on RNA), coloured by the target histone modification. e. UMAP showing the prediction confidence of label transfer per cell from Wagner et al. (2018) data to our data (similar to Fig. S3c). f. UMAP with metacell assignment (similar to Fig S3d) for 4-12 hpf data. g. UMAPs based on ChIC and RNA signal (same as Fig 3a) colored by manually assigned “germ layer” annotation.

Comparison of genomic distribution of H4K4me1 and H3K27me3 with time
a. Genome tracks showing H3K4me1 enrichment (blue) and H3K27me3 enrichment (red) on a randomly selected genomic region, demonstrating a high degree of overlap between the two. b. Total UMI counts per cell (similar to Fig 3b) on H3K27me3-unique enriched sites, and random 50-kb bins in the genome. Opposite to Fig 3b, H3K4me1 signal (top) on these regions goes down with time, while H3K27me3 signal remains unchanged. c. Histograms showing the ratio of H3K4me1 to H3K27me3 on (5 kb region centered at) promoters in each metacell. Metacells are sorted (top to bottom) and colored by their annotated germ layers. d. Correlation between histone modifications and gene expression (same as Fig 3c) in metacells, for promoters.

Derivation of latent time and lineages using RNA velocity on 4-12 hpf data
a. RNA velocity phase portraits showing spliced to unspliced count ratios of selected developmental genes. Thick red lines show the fits from the dynamical model (Bergen et al. 2020) used to derive transcription or degradation rates, while the dashed line shows the steady state. b. UMAPs colored by the confidence per cell for global RNA velocity signal extracted from the top 1000 genes. c. Vector field diagrams showing inferred differentiation trajectories projected over UMAP. **d-e.** UMAPs showing the latent time per cell inferred using cellrank (combining RNA-velocity and diffusion pseudotime), compared to the known real time of embryo collection.

Examples of TFs with predicted activation and repression functions and their epigenetic regulation
a. UMAPs showing expression and motif activities supporting the predicted functions of TFs (similar to Fig. 4b). b. UMAPs showing H3K4me1 and H3K27me3 levels on TF gene bodies, supporting the predicted chromatin regulation status of TFs (similar to Fig. 4c). c. Expression, Activity and the (average) expression of target genes for two activators (*ctcf, tbxta*) and two repressors (*tcf71la, zeb1b*). Target genes are identified as genes nearest to the H3K4me1 peaks associated with the selected TF motif.

Data availability

Raw sequencing data (.fastq), count tables (.h5ad/anndata format) with gene and cell-level metadata (including annotations) from this study are publicly available at GEO (GSE265874). Additionally, the GEO repository also contains signal tracks (.bigwigs) and peaks (.bed) files. Other source data behind our figures are available on zenodo: https://doi.org/10.5281/zenodo.16813408. The code for processing of tChIC data from raw (fastq) files up to count tables is available at https://github.com/bhardwaj-lab/scChICflow. The config files containing all preprocessing parameters for scChICflow, together with scripts to reproduce our figures are available on zenodo: https://doi.org/10.5281/zenodo.16813408.

Acknowledgements

We acknowledge the Utrecht Sequencing Facility (USEQ) for providing sequencing service and data. USEQ is subsidized by the UMC Utrecht and The Netherlands X-omics Initiative (NWO project 184.034.019). We thank Reinier van der Linden (Hubrecht FACS facility) for performing single-cell sorting. This work was supported by European Research Council Advanced under grant ERC-AdG 101053581--scTranslatomics, and the NWO consortium grant OCENW.GROOT.2019.017 to A.v.O. The SNF (P2BSP3-174991), HFSP (LT000209/2018-L) and Marie Skłodowska-Curie Actions (798573) supported P.Z. EMBO LTF (ALTF 1197–2019) supported V.B.

Additional files

Supplementary Tables.

Additional information

Funding

Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) (OCENW.GROOT.2019.017)

Alexander van Oudenaarden

European Molecular Biology Organization (EMBO) (ALTF 1197-2019)

Vivek Bhardwaj

Human Frontier Science Program (HFSP)

https://doi.org/10.52044/hfsp.lt0002092018-l.pc.gr.164142

Peter Zeller

EC | H2020 | PRIORITY 'Excellent science' | H2020 Marie Skłodowska-Curie Actions (MSCA)

https://doi.org/10.3030/798573

Peter Zeller

EC | European Research Council (ERC) (ERC-AdG 101053581)

Alexander van Oudenaarden

Significance of findings

Strength of evidence

Abstract

Introduction

Results

Paired profiling of histone modifications and transcriptome of single cells during zebrafish embryogenesis

whole-organism T-ChIC of zebrafish embryos quantifies of full length transcripts and histone modifications from the same single cell.

Spatiotemporal spreading of H3K27me3 associates with the silencing of gene expression during development

Spatio-temporal spreading of H3K27me3 correlates with gene silencing.

Global chromatin state of cells is decoupled from gene expression during early development

Integrative analysis of H3K27me3, H3K4me1 and transcriptome.

The chromatin state of binding sites predicts the function of transcription factors during gastrulation

Prediction of TF activity using TF epigenetics and transcription.

Discussion

Methods

Whole-organism T-ChIC of zebrafish embryos

Processing and quality control of T-ChIC data

Analysis of publicly available data

Cell clustering and annotation using RNA signal

Integrated analysis of nuclei and whole cell data

Metacell analysis

Cell clustering using the ChIC signal

Peak calling and annotation

H3K27me3 spreading and demethylation analysis

TF activity prediction and classification

Author information

Contributions

Supplementary Figures

Quality control for the RNA and ChIC fraction

Evaluation of the quantitative chromatin signal from tChIC

Annotation of H3K27me3-RNA tChIC data (4-24 hpf)

H3K27me3 shows cis-spreading with time

Quality control and comparison of H3K27me and H3K4me1 signal in nuclei

Comparison of genomic distribution of H4K4me1 and H3K27me3 with time

Derivation of latent time and lineages using RNA velocity on 4-12 hpf data

Examples of TFs with predicted activation and repression functions and their epigenetic regulation

Data availability

Acknowledgements

Additional files

Additional information

Funding

References

Article and author information

Author information

Vivek Bhardwaj†*

Alberto Griffa*

Helena Viñas Gaza

Peter Zeller

Alexander van Oudenaarden

Author Notes

Version history

Cite all versions

Copyright

Metrics

Vivek Bhardwaj

Alberto Griffa