Abstract
Dynamic CpG methylation “barcodes” were read from 15,000 to 21,000 single cells from three human male brains. To overcome sparse sequencing coverage, the barcode had ∼31,000 rapidly fluctuating X-chromosome CpG sites (fCpGs), with at least 500 covered sites per cell and at least 30 common sites between cell pairs (average of ∼48). Barcodes appear to start methylated and record mitotic ages because excitatory neurons and glial cells that emerge later in development were less methylated. Barcodes are different between most cells, with average pairwise differences (PWDs) of ∼0.5 between cells. About 10 cell pairs per million were more closely related with PWDs < 0.05. Barcodes appear to record ancestry and reconstruct trees where more related cells had similar phenotypes, albeit some pairs had phenotypic differences. Inhibitory and excitatory neurons both showed evidence of tangential migration with related cells in different cortical regions. fCpG barcodes become polymorphic during development and can distinguish between thousands of human cells.
Introduction
Cell lineages outline tissue development. Complete fate maps are possible by direct observation for small organisms such as C. elegans, but various elegant experimental fate markers are employed for larger tissues and longer time intervals (1). For human tissues, prior experimental manipulations are impractical, and genomic alterations are employed. Somatic mutations mark subclones and their fates can be reconstructed with DNA sequencing (molecular clock hypothesis). Recent advances in single cell technologies potentially allow fate map reconstruction at single cell resolution.
Here we show how fluctuating CpG (fCpG) DNA methylation (2) can be used as dynamic barcodes to study human brain development using single cell epigenomes annotated with their locations and phenotypes (3). DNA methylation patterns are usually copied between cell divisions, but replication error rates are much higher than for base replication, allowing for more differences between daughter cells. DNA methylation modulates expression and its patterns can be used to infer cell phenotypes (3,4), but most fCpG sites are present outside of genes or in unexpressed genes. Criteria for our fCpG barcode are as follows: 1) a defined initial pattern in a progenitor cell; 2) polymorphic changes upon cell division; 3) adequate polymorphism to distinguish between most cells; and 4) capability to record ancestry.
The brain has several features that facilitate barcode development and validation.
Foremost, there is extensive single cell methylation data, with thousands of cells annotated by locations and phenotypes (3). Although billions of cells are present in an adult brain, lineage trees are compact because growth is largely neonatal. The brain also allows for serial “stopwatch” barcode sampling because development follows a caudal to rostral pattern, and groups of neurons characteristically stop dividing and differentiate at different times and locations (5). Brainstem neurons emerge early (6) and their barcodes should most resemble the initial progenitor state, whereas the stopwatch runs longer for excitatory neurons that appear later in development. To facilitate presentation, barcode performance is summarized as follows: The brain fCpG barcode initializes as predominately methylated in the progenitor cell and becomes polymorphic with more diverse barcodes in excitatory neurons that emerge later in development. The barcode becomes sufficiently polymorphic to uniquely distinguish between most sampled brain cells, and barcoded cells organize into lineage trees.
Results
fCpG barcode identification
Barcode development was limited by the sparse single cell data (3), with < 5% of CpG sites sequenced, often with only a single read. Sparse coverage was mitigated with X-chromosome fCpG sites because a single read is enough to infer a binary (0,1) state in male individuals, who carry one X chromosome. Autosomal CpG sites require at least 2 reads to infer three possible states (0, 0.5, 1). The X-chromosome also simplifies the identification of polymorphic fCpGs because many neurons must have different binary states if average methylation is between 0.25 and 0.75 in bulk WGBS reference data from adult male neurons (4).
CpG sites (N∼116,000), with average methylation between 0.25 and 0.75 in bulk neurons from seven males (4), were further filtered by discarding more stable CpG sites with average methylation less than 0.2 or more than 0.8 for all cells, inhibitory neurons, and excitatory neurons in brain H02. The ∼79,000 CpGs were further filtered to remove sites with average methylation less than 0.3 or greater than 0.7 in brain H01, and ∼31,000 fCpG sites were used for analysis.
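The three filtering steps above can be sketched as follows. This is a minimal illustration, not the author's pipeline: the function name, the dictionary inputs, and the group labels are hypothetical, and the sketch assumes a site is discarded when any one of the H02 groups (all cells, inhibitory, excitatory) falls outside 0.2 to 0.8, which the text does not state explicitly.

```python
def select_fcpg_sites(bulk_mean, h02_group_means, h01_mean):
    """Return sites that pass the three average-methylation filters.

    bulk_mean:        {site: mean methylation in bulk male neurons}
    h02_group_means:  {site: {group: mean methylation in brain H02}}
    h01_mean:         {site: mean methylation in brain H01}
    All input layouts are assumptions made for this sketch.
    """
    selected = []
    for site, bulk in bulk_mean.items():
        # Step 1: intermediate methylation in bulk reference neurons.
        if not 0.25 <= bulk <= 0.75:
            continue
        # Step 2: drop sites that look stable in any H02 group (assumed rule).
        groups = h02_group_means.get(site)
        if not groups or any(m < 0.2 or m > 0.8 for m in groups.values()):
            continue
        # Step 3: require intermediate methylation in H01 as well.
        h01 = h01_mean.get(site)
        if h01 is None or h01 < 0.3 or h01 > 0.7:
            continue
        selected.append(site)
    return selected


bulk = {"chrX:100": 0.5, "chrX:200": 0.9, "chrX:300": 0.4, "chrX:400": 0.6}
h02 = {
    "chrX:100": {"all": 0.5, "inhibitory": 0.6, "excitatory": 0.4},
    "chrX:300": {"all": 0.5, "inhibitory": 0.85, "excitatory": 0.4},
    "chrX:400": {"all": 0.5, "inhibitory": 0.5, "excitatory": 0.5},
}
h01 = {"chrX:100": 0.5, "chrX:300": 0.5, "chrX:400": 0.75}
kept = select_fcpg_sites(bulk, h02, h01)
```

In this toy input only one site survives all three filters; the others fail the bulk, H02, or H01 step respectively.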
fCpG site methylation appears neutral because the sites are predominately intergenic, with 16% within genes or promoters (File S1). Epigenomes from 15,434 to 21,836 cells were downloaded from three male brains with a general criterion of allc.tsv.gz file sizes of 90 MB or larger (Table 1). Analyzed cells had at least 500 fCpGs (average ∼1,100), with pairwise distances (PWDs) calculated between cell pairs when at least 30 fCpGs were comparable (average ∼48 fCpGs per cell pair). Each cell, annotated by its provided phenotype and location, is characterized by its fCpG methylation level and its PWDs from other cells. A PWD of 0 is a perfect match and 0.5 indicates randomization. fCpG methylation was variable between cells with averages of ∼58% for all three brains (Fig 1A). The 73 to 197 million possible cell pair comparisons revealed polymorphic barcodes with average PWDs of ∼0.47 between cells (Fig 1B).
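The two per-cell quantities described above can be illustrated with a short sketch. It assumes that a PWD is the fraction of shared fCpG sites with different binary states, which is consistent with 0 being a perfect match and 0.5 indicating randomization but is not spelled out in the text. The barcode representation (a dictionary from site to 0 or 1) and the function names are hypothetical.

```python
def methylation_level(barcode):
    """Fraction of covered fCpG sites that are methylated in one cell."""
    return sum(barcode.values()) / len(barcode)


def pairwise_difference(a, b, min_shared=30):
    """Fraction of shared sites whose binary states differ.

    Returns None when fewer than `min_shared` sites are covered in both
    cells, mirroring the requirement of at least 30 comparable fCpGs.
    """
    shared = a.keys() & b.keys()
    if len(shared) < min_shared:
        return None
    mismatches = sum(1 for site in shared if a[site] != b[site])
    return mismatches / len(shared)


# Toy barcodes: cell_a covers sites 0..39, cell_b covers sites 5..44.
cell_a = {i: 1 for i in range(40)}
cell_b = {i: (0 if i < 10 else 1) for i in range(5, 45)}
pwd = pairwise_difference(cell_a, cell_b)            # 35 shared sites, 5 differ
too_sparse = pairwise_difference(cell_a, cell_b, 36)  # not enough shared sites
```

Because each cell covers a different random subset of sites, the number of shared sites varies from pair to pair, so each PWD rests on its own denominator.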
fCpG barcodes initialize methylated and change with cell division
Brain patterns were similar, and data are presented for H01, with H02 and H04 presented in Figs S1 and S2. Methylation was variable between cells of the same type and average methylation was highest in the brain stem (pons, thalamus), intermediate for inhibitory neurons, and lowest for excitatory neurons, non-neuronal cells, and cerebellar cells (Fig 2A). Outer layer cortical excitatory neurons (L2_3) that are made later during development were less methylated than inner cortical excitatory neurons (L4_6) that appear earlier. Predominately methylated individual fCpG sites were common in brainstem neurons, less frequent in inhibitory neurons, and rare in excitatory neurons (Fig 2B). Predominately unmethylated individual fCpGs were common in glial and hippocampal cells.
The methylation hierarchy is consistent with a barcode initialized with predominately methylated fCpGs in a progenitor cell. Barcodes become progressively demethylated and are fixed when their cells stop dividing and differentiate, which occurs at different times and places during brain development. Simplistically, the brainstem with mature neurons (6) forms early in development, followed by inhibitory neurons in the ganglionic eminences, and then excitatory neurons and glial cells in the cortex. Barcode methylation follows this temporal development and reconstructs when specific neuron types start to appear and reach their adult contents (Fig 2C). For example, after barcodes flip from ∼100 to ∼70% methylated, most adult brainstem and inhibitory neurons are present but excitatory neurons are fewer, with very few adult outer layer (L2_3) neurons. Outer excitatory and glial progenitor cells are present (7), but their barcodes continue to demethylate until they stop dividing and differentiate later in development. This stopwatch-like pattern, with more demethylated barcodes in later appearing cell types, was present in all three adult brains.
fCpG barcodes are polymorphic
A progenitor cell barcode should become increasingly polymorphic with subsequent divisions. This pattern was observed, with average barcode PWDs lowest in the brainstem, intermediate between inhibitory neurons, and highest for excitatory neurons (Fig 3A). Most cells had different barcodes, with an overall average PWD of ∼0.47 (Fig 1B). Cells of the same phenotype were more similar with lower average PWDs (Fig 3A, 3B), suggesting they are more related to each other and have common progenitors.
The human brain has billions of cells and relatively few cells were sampled from each region. Consistent with sparse sampling, cell pairs with smaller PWDs (< 0.05) were rare. To help distinguish between ancestry and chance, cells within and between brains were compared (Table 1). Closely related cell pairs were ∼2.9 times more frequent within a brain (average ∼9.8 per million) compared to between brains (average ∼3.4 per million). Closely related cells had fewer matching fCpG sites (∼35 compared to ∼48 for all cell pairs) and were more common early in development when barcodes are more methylated (Fig 3C), indicating that lower barcode complexity favors matching. Overall, fCpG barcodes are sufficiently polymorphic to distinguish between most adult brain cells.
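The within-brain versus between-brain comparison reduces to counting close pairs per million compared pairs. The sketch below is illustrative only: the function name and the list-of-PWDs input are hypothetical, and pairs without enough shared sites are assumed to be recorded as None and excluded from the denominator.

```python
def close_pairs_per_million(pwds, threshold=0.05):
    """Closely related pairs (PWD < threshold) per million compared pairs."""
    compared = [d for d in pwds if d is not None]  # skip uncomparable pairs
    if not compared:
        return 0.0
    close = sum(1 for d in compared if d < threshold)
    return 1e6 * close / len(compared)


# Toy input: four comparable pairs, two of them closely related.
toy_rate = close_pairs_per_million([0.01, 0.5, None, 0.47, 0.04])

# Enrichment implied by the averages reported in the text.
within_brain = 9.8   # close pairs per million, same brain
between_brain = 3.4  # close pairs per million, different brains
enrichment = within_brain / between_brain
```

The between-brain rate serves as an estimate of chance matching, since cells from different individuals cannot share a developmental ancestor.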
Brain lineage trees
It should be possible to reconstruct human brain development if barcodes record ancestry. fCpG barcodes from ∼1,000 brain cells with different phenotypes yield lineage trees that resemble caudal to rostral development (Fig 4). The trees are rooted by a progenitor with a fully methylated barcode, and branches progressively yield brainstem neurons, a subset of excitatory lower (L4_6) neurons, thalamic neurons (THM), inhibitory neurons, cerebellar cells, and non-neuronal cells. Excitatory neurons branch last, and hippocampal neurons (CA, DG) that may divide postnatally (8) were at the terminus. Cells generally grouped by phenotype, with some early appearing excitatory neurons admixed among inhibitory neurons. Similar trees were observed for H02 and H04, albeit with less separation between inhibitory and excitatory neurons for H04 (Fig 4A). Barcode lineage trees are largely consistent with expected sequential neuronal differentiation.
Cell lineage fidelity and cortical migration
Barcodes could record neuronal differentiation and migration. Uncertain for mouse and human development is whether inhibitory and excitatory neurons originate from shared or distinct progenitors (3,9,10). Lineage fidelity can be quantified by comparing most closely related cells or nearest neighbor cell pairs with PWDs < 0.05. Lineage fidelity was high (>90%) for inhibitory neurons (Fig 5A). Excitatory lineage fidelity was slightly lower, indicating that some excitatory and inhibitory neurons may share common progenitors (10). Lineage trees indicate common progenitors are present earlier in development, and excitatory neurons that appear later do not have many closely related inhibitory neighbors (Fig 4B). The barcodes documented the known switching between inhibitory neuron subtypes (Fig 5B). Brainstem, cerebellar, excitatory, and non-neuronal cells had less subtype lineage fidelity.
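One way to compute a lineage fidelity of this kind is sketched below. The definition used here is an assumption: for a focal cell class, fidelity is taken as the fraction of closely related partners (PWD < 0.05) that belong to the same class. The function name, cell identifiers, and labels are hypothetical.

```python
def lineage_fidelity(close_pairs, label, focal):
    """Fraction of close partners of `focal` cells that share the focal label.

    close_pairs: iterable of (cell_a, cell_b) with PWD below the cutoff
    label:       {cell: annotation}, e.g. major cell class
    Returns None if no focal cell appears in any close pair.
    """
    partners = []
    for a, b in close_pairs:
        if label[a] == focal:
            partners.append(label[b])
        if label[b] == focal:
            partners.append(label[a])
    if not partners:
        return None
    return sum(1 for p in partners if p == focal) / len(partners)


label = {"c1": "Inh", "c2": "Inh", "c3": "Exc", "c4": "Inh"}
pairs = [("c1", "c2"), ("c3", "c4")]
inh_fidelity = lineage_fidelity(pairs, label, "Inh")  # two of three partners match
exc_fidelity = lineage_fidelity(pairs, label, "Exc")  # the only partner is Inh
```

The same counting could be applied with other annotations, such as neuron subtype or sampled region, by passing a different `label` dictionary.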
Barcodes can also infer migration because their neurons are annotated by their adult locations. Trees (Fig 4C) indicate that most neurons sampled from the brainstem, cerebellar and hippocampal regions are related and localized to their respective regions. Inhibitory neurons were scattered throughout the cortex, consistent with their differentiation in the ganglionic eminences and subsequent tangential migration to the cortex. Nearest neighbor inhibitory cortical neuron pairs were found in the same cortical region ∼25% of the time (Fig 5C and 5D). Nearest neighbor excitatory neuron pairs were also scattered throughout the cortex, but less than inhibitory neurons, and were in the same cortical region ∼50% of the time.
The poor ability to detect localized excitatory neuron radial cortical migration with ∼1,000 cell whole brain trees (Fig 4C) may reflect that sparse sampling is unlikely to include multiple neurons from the same small clonal region (radial unit hypothesis (11)). Greater localized excitatory neuron migration was seen when trees were reconstructed with more (∼2,800) neurons, while inhibitory neurons still showed scattered tangential migration (Fig 4D). Neurons of the same subtype were still more related. Hence, lineage trees appear to increase their resolution with more cells, albeit related lower and upper excitatory neuron pairs were still uncommon, which may reflect the unlikely chance of sampling very small radial clonal units.
fCpG methylation and post-mitotic epigenetic remodeling
After progenitors stop dividing, differentiation occurs through epigenetic remodeling and neuron specific methylation (12). fCpG patterns could reflect this post-mitotic remodeling because neurons of the same phenotype generally have more similar barcodes (Fig 3A, B). To help separate ancestry from post-mitotic phenotypic differentiation, barcodes were compared with the gene methylation t-SNE coordinates used to phenotype the neurons (3) (Fig 5E). Cell pairs with closely related fCpG barcodes had similar phenotypic methylation patterns, but many cells with small phenotypic differences had very different barcodes. Hence, fCpG barcodes do not correlate well with post-mitotic epigenetic remodeling because phenotypically similar cells can be unrelated.
Discussion
Dynamic barcodes would be useful to study human tissues, but testing their performance is difficult. Ideally, samples obtained at different times would document how they change. The brain facilitates barcode validation because it periodically stores neurons that stop recording at relatively defined times and locations (1,13). Specific neuron subsets recovered from the adult brain allow for sampling through time and before birth.
This serial sampling strategy facilitated fCpG barcode validation. The barcode started predominately methylated in multiple individuals and became sufficiently polymorphic to distinguish between thousands of neurons. Barcode changes appear to represent replication errors because they reconstruct lineage trees consistent with caudal to rostral brain development. Barcode methylation indicates when different neurons that survive to adulthood appear in the neonatal brain (Fig 2C).
The current barcode indicates that inhibitory and excitatory neurons have relatively distinct progenitors, consistent with the lineage dendrograms reconstructed with neuron specific methylation of the same data (3). There was also evidence for common inhibitory and excitatory progenitors (10), primarily for earlier emerging excitatory neurons. Tangential migration was also detected, manifested by inhibitory neurons with closely related barcodes in different cortical regions. Tangential excitatory neuron migration was also detected, albeit related excitatory neurons were more localized than inhibitory neurons. Tangential migration is also seen with sequencing studies that find neurons with specific mutations in multiple brain regions (14–17).
fCpGs more efficiently distinguish between cells than mutations due to higher replication error rates. Although average methylation decreases with time, both demethylation and remethylation are likely because fully demethylated neurons were not observed, and balanced fluctuating methylation is inferred in other tissues when CpG sites are ∼50% methylated in bulk tissues (2). More adult divisions in brain cancers did not saturate the barcode with average fCpG methylation ∼50% (Fig S3). Fluctuating methylation complicates lineage tracing but backmutations can be modeled for ancestral reconstructions. Lineage resolution could be improved by combining mutations and fCpGs.
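The argument that remethylation must occur can be illustrated with a toy two-rate model. This model is an assumption introduced here for illustration and is not taken from the study: at each division a methylated site is demethylated with probability `demeth` and an unmethylated site is remethylated with probability `remeth`, and both rate values are arbitrary.

```python
def mean_methylation(divisions, demeth=0.02, remeth=0.02, start=1.0):
    """Expected fraction of methylated fCpGs after each division.

    Hypothetical per-division rates; the barcode starts fully methylated.
    """
    m = start
    trajectory = [m]
    for _ in range(divisions):
        # Methylated sites that stay methylated plus unmethylated sites regained.
        m = m * (1 - demeth) + (1 - m) * remeth
        trajectory.append(m)
    return trajectory


balanced = mean_methylation(200)             # approaches remeth / (remeth + demeth)
one_way = mean_methylation(200, remeth=0.0)  # decays toward zero
```

With balanced rates the average plateaus near 50% however many divisions follow, whereas demethylation alone would drive barcodes toward a fully unmethylated state, which was not observed.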
Weaknesses of this study include very sparse cell sampling and a lack of uniform CpG site comparisons between neurons. Inferred lineage trees (Fig 4) had relatively low statistical support for their branches. Like many human fate marker studies, it is difficult to independently verify accuracy. However, preliminary studies are largely consistent with brain development and sequential stopwatch-like neurogenesis. Technical improvements such as targeted bisulfite sequencing of a limited number of informative fCpGs could lead to more consistent coverage and less expensive sequencing of more neurons. Single cell measurements of small numbers of fCpGs, and snMCode cell type specific CpG sites (3), could efficiently reconstruct human brain lineages. A barcode of 100 fCpGs has enough complexity (2^100 or ∼1 × 10^30) to potentially distinguish between most excitatory neurons, with less resolution early in development when cells are inherently more related.
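The complexity figure for a 100-site barcode is simply the number of distinct binary patterns, which can be checked directly. The function name is hypothetical, and the count assumes every site is covered and that sites vary independently, so it is an upper bound rather than an achieved diversity.

```python
def barcode_states(n_sites):
    """Number of distinct binary barcodes for n_sites fully covered fCpGs."""
    return 2 ** n_sites


states = barcode_states(100)
formatted = f"{states:.2e}"  # scientific notation for comparison with the text
```

In practice the usable complexity is lower, because barcodes early in development are still mostly methylated and sparse coverage means only a subset of sites is shared between any two cells.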
The analysis of more brains can verify that an fCpG barcode starts predominately methylated in most individuals. A common initialized state could facilitate standardized human fate maps and comparisons between individuals. Many polymorphisms linked to brain abnormalities such as autism are in neuronal proliferation, migration, and maturation pathways (18), and this preliminary survey indicates lineage heterogeneity between individuals (Fig 4A). fCpG barcodes have been applied to the intestines, endometrium, and blood (2), and could be found for multiple other tissue types, helping to unravel human development and aging.
Methods
Brain Single Cells
Single cells with their annotations and methylation at each fCpG site were read from single cell files downloaded from GEO (GSE215353) and supplemental files from reference 3. Lists of fCpG sites and data summarized for the Figures are in Supplemental File 2. The cells and methylation at the fCpG sites are in Supplemental Files 3-5. PWDs were calculated between all cell pairs with at least 30 matching fCpG sites, with PWD data matrices in Supplemental File 6.
IQ-TREE
IQ-TREE (19) was downloaded and run on a server with 64 CPUs and 32 GB of memory. The model (GTR2+FO+G4) accounts for backmutation and binary data with missing values. Bootstraps were 1,000 per tree with 3,000 iterations for whole brain (∼1,000 cells) trees and 1,000 iterations for inhibitory or excitatory (∼2,800 neurons) trees. Trees (.treefile) were displayed with FigTree (http://tree.bio.ed.ac.uk/software/figtree/) with truncation of long branches (generally fewer than 10) for display purposes. The cells used for the trees are in Supplemental File 7.
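Tree building of this kind needs the barcodes written as a binary alignment with missing values. The sketch below shows one possible way to do that in relaxed PHYLIP format, with uncovered sites written as gaps. The function name, file name, and input layout are hypothetical, and the command line is an assumption: the flags follow IQ-TREE 2 conventions, and the text does not state which IQ-TREE version or bootstrap method was used.

```python
def to_phylip(barcodes, sites):
    """Write binary barcodes as a relaxed PHYLIP alignment string.

    barcodes: {cell_name: {site: 0 or 1}}; uncovered sites become '-'.
    sites:    ordered list of fCpG sites forming the alignment columns.
    """
    lines = [f"{len(barcodes)} {len(sites)}"]
    for name, calls in barcodes.items():
        seq = "".join(str(calls[s]) if s in calls else "-" for s in sites)
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"


alignment = to_phylip(
    {"cellA": {"s1": 1, "s2": 0}, "cellB": {"s1": 0}},
    ["s1", "s2"],
)

# Assumed invocation (not taken from the study), shown only as an argument list:
# binary data, the stated model, 1,000 bootstrap replicates, 3,000 iterations.
command = ["iqtree2", "-s", "barcodes.phy", "-st", "BIN",
           "-m", "GTR2+FO+G4", "-B", "1000", "-nm", "3000", "-T", "64"]
```

Cell names must not contain whitespace in this format, so annotated names would need to be sanitized before writing.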
Acknowledgements
This work was supported by grants from the NIH (P01CA196569 and CA271237). The author thanks Drs. Trevor Graham and Heather Grant for useful discussions, and Omar Khan and Nikhil Krishnan for initial studies. The author also thanks all of the researchers who helped produce the high-quality, valuable, and freely available data used for analysis.
References
- 1. Recording development with single cell dynamic lineage tracing. Development 146
- 2. Fluctuating methylation clocks for cell lineage tracing at high temporal resolution in human tissues. Nature Biotechnology 40:720–730
- 3. Single-cell DNA methylation and 3D genome architecture in the human brain. Science 382
- 4. A DNA methylation atlas of normal human cell types. Nature 613:355–364
- 5. The basics of brain development. Neuropsychology Review 20:327–348
- 6. Single-cell transcriptome analysis reveals cell lineage specification in temporal-spatial patterns in human cortical development. Science Advances 6
- 7. Single-cell atlas of early human brain development highlights heterogeneity of human neuroepithelial cells and early radial glia. Nature Neuroscience 24:584–594
- 8. Neurogenesis in the adult brain. Journal of Neuroscience 22:612–613
- 9. Single-cell delineation of lineage and genetic identity in the mouse brain. Nature 601:404–409
- 10. Individual human cortical progenitors can produce excitatory and inhibitory neurons. Nature 601:397–403
- 11. Specification of cerebral cortical areas. Science 241:170–176
- 12. An epigenetic barrier sets the timing of human neuronal maturation. Nature 626:881–890
- 13. Linked regularities in the development and evolution of mammalian brains. Science 268:1578–1584
- 14. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350:94–98
- 15. Somatic mosaicism reveals clonal distributions of neocortical development. Nature 604:689–696
- 16. Cell-type-resolved mosaicism reveals clonal dynamics of the human forebrain. Nature 629:384–392
- 17. Cell lineage analysis in human brain using endogenous retroelements. Neuron 85:49–59
- 18. Toward a better understanding of neuronal migration deficits in autism spectrum disorders. Frontiers in Cell and Developmental Biology 7
- 19. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32:268–274
Article and author information
Copyright
© 2024, Darryl Shibata
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.