Introduction

The vertebrate head is a highly complex region of the body that plays a key role in an organism’s ecology by centralizing numerous structures involved in diet, sensory perception and behavior1. Consequently, evolution has modified craniofacial development across lineages, producing a wide array of head morphologies concomitant with the diverse niches that vertebrates occupy. Craniofacial diversity among the major mammalian lineages in particular has long been of great interest, due to the striking differences in their developmental ontology.

Placental mammals are characterized by a long gestation with a high maternal investment during pregnancy with a considerable degree of both orofacial and neurocranial development occurring in the embryo in utero. As a result, the placental young experiences little functional constraint during early developmental stages. By contrast, marsupials have a short gestation and give birth to highly altricial young that must crawl to the teat, typically located within the maternal pouch, where they complete the remainder of their development ex utero. This unique reproductive method is thought to have imposed strong pressures on the evolution and development of the limbs and head. In particular, marsupials show accelerated development of the nasal cavity, tongue, oral bones and musculature relative to the development of the posterior end of the body25, and generally when compared to placental embryonic development2,3,6. Additionally, despite the delay in development of the central nervous system in marsupials, aspects of the peripheral nervous system appear to be accelerated in the face. We and others have observed large medial nasal swellings early in marsupial development that are innervated and proposed to be necessary for the sensory needs of newborn young in their journey to the pouch79. Comparative morphometric studies have provided a wealth of evidence that this stark difference in craniofacial development has imposed different regimes of constraint on marsupial and placental mammals2,3,5,6,1012 with marsupials in particular showing significantly less interspecies variation in orofacial structures and nasal morphology than placental mammals1214. In spite of these observations, the molecular mechanisms that underlie these differences in early craniofacial development between marsupial and placental remain poorly understood.

Cis-acting regulatory regions have been proposed to play a significant role in morphological divergence in the face, with a number of well-described enhancers that fine-tune face shape in mammals15. There is also some evidence of a role for regulatory regions in craniofacial heterochrony in marsupials. One recent study found a marsupial-specific region within a Sox9 enhancer that drives early and broad expression in pre-migratory neural crest cell domains contributing to early migration of cranial neural crest cells relative to the mouse16,17. However, no study has thus far attempted to compare the overall regulatory landscape between marsupials and placentals at developmentally comparable stages. Such surveys have the potential to provide functional insights into the loci controlling craniofacial heterochrony in mammals and consequently the causative evolutionary changes in the genome that have driven the divergent ontogenies of marsupials and placentals.

In recent years, the fat-tailed dunnart (Sminthopsis crassicaudata, hereafter referred to as the dunnart) has emerged as a tractable marsupial model species5,18. Dunnarts are born after 13.5 days of gestation and craniofacial heterochrony in line with what has been reported in other marsupials is readily observable2,3,5,6,1012, making this species an excellent system for comparative studies with placental models. Being one of the most altricial marsupials, the dunnart provides an ideal model in which to study the regulatory landscape during craniofacial development in marsupials, as they rely entirely on the advanced chondrocranium with bony elements of the skeleton not present until approximately 24 hours after birth5. To investigate a potential role for regulatory elements in this heterochrony, we used ChIP-sequencing and RNA-sequencing on craniofacial tissue (fronto-nasal, mandibular and maxillary prominences) collected from new-born dunnart pouch young to perform a detailed characterization of chromatin marks during early craniofacial development and then to facilitate comparative analyses with the placental laboratory mouse. Our work provides valuable insights into genomic regions associated with regulatory elements regulating craniofacial development in marsupials and their potential role in craniofacial heterochrony.

Results

Defining craniofacial putative enhancer- and promoter regions in the dunnart

After validating the ability of our antibodies to enrich for dunnart chromatin marks (SupNote1), ChIP-seq libraries were sequenced to average depth of 57 million reads and mapped to a de novo assembly of the dunnart genome generated for this study (Methods for more details). Peak calling with MACS2 (q < 0.05) identified 80,989 regions reproducibly enriched for H3K4me3 and 121,281 regions reproducibly enriched for H3K27ac in dunnart facial prominence tissue. Similar to previous studies1923, we found that H3K4me3 often (62% of all H3K4me3 peaks) co-occupied the genome with H3K27ac (Fig1A) while 50% of H3K27ac-enriched regions were only associated with this mark (Fig1A). Active enhancers are generally enriched for H3K27ac19,24 while sites of transcription initiation (active promoters) can be identified as being marked by both H3K27ac and H3K4me322,23.

Analysis workflow and quality control of H3K4me3 and H3K27ac peaks in the fat-tailed dunnart (Sminthopsis crassicaudata).

(A) Drawing of a D0 dunnart pouch young with dissected orofacial tissue shown in gray. Short-read alignment and peak calling workflow and numbers of reproducible peaks identified for H3K27ac (orange), H3K4me3 (blue) for craniofacial tissue. (B) Log10 distance to the nearest TSS for putative enhancer (orange) and promoters (blue). (C) Log10 of peak intensity and peak length are represented as boxplots and violin plots for putative enhancers (orange) and promoters (blue). Peak intensities correspond to average fold enrichment values over total input DNA across biological replicates. (D) Dunnart pouch young on the day of birth. Scale bar = 1mm. (E) Adult female dunnart carrying four young.

We initially defined promoters as those marked by only H3K4me3 or with >50% of reciprocal peak length for H3K27ac and H3K4me3 peaks, and enhancers as those marked only by H3K27ac, identifying 66,802 promoters and 60,626 enhancers. Enhancers were located on average 77 kb from TSS, while promoters were located on average 106 kb from the nearest TSS, despite there being a greater number of peaks located <1 kb from the TSS (1,008 enhancers versus 9,023 promoters; Fig1B). This was an unexpected finding as a large fraction (0.41) of promoters were located >3 kb from an annotated TSS (see SupNote2). Distance from TSS is frequently used to filter putative promoters from other elements. Therefore, we grouped peaks characterized as promoters based on their distance to the nearest TSS (see Methods), resulting in 12,295 high-confidence promoters for all of the following analyses (see SupNote3).

Candidate dunnart regulatory elements are associated with highly expressed genes involved in muscle, skin and bone development

Next, we asked what biological processes are associated with active regulatory regions in the dunnart face. To accomplish this, we linked peaks to genes in order to associate functional annotations of coding genes with the candidate regulatory elements that likely regulate their expression. To make use of resources available in model organisms such as GO databases, we converted all dunnart gene IDs to mouse orthologous genes for downstream applications. This reduced the dataset to 35,677 putative enhancers and 8,590 promoters near (within 1Mbp) genes with a one-to-one ortholog in mouse (SupTable1).

We found that gene annotations for both promoters and enhancers were enriched for 23% of the same GO terms, including cellular processes (protein localisation to plasma membrane, protein localisation to cell periphery, regulation of cell morphogenesis, positive regulation of cell migration) and development (axon development, camera-type eye development, muscle tissue development, striated muscle development). By contrast, 42% of GO terms were uniquely enriched amongst genes assigned to promoters which were related to mRNA processing, transcription, mRNA stability, cell cycle, and mRNA degradation (Fig2A, SupTable2) and 36% of uniquely enriched GO terms for genes assigned to candidate enhancers corresponded to processes indicative of early embryonic development (Fig2A, SupTable3).

Predicted functional enrichment for dunnart peaks.

(A) 304 significantly enriched GO terms clustered based on similarity of the terms. The function of the terms in each group are summarized by word clouds of the keywords. Rows marked by P were driven by genes linked to putative promoters, rows marked by E were driven by genes linked to putative enhancers. (B) Enriched TF motifs for transcription factor families (HOMER). PWM logos for preferred binding motifs of TFs are shown. The letter size indicates the probability of a TF binding given the nucleotide composition.

Terms related to facial skeleton development were enriched amongst genes assigned to putative dunnart enhancers, including bone cell development, muscle cell development, secondary palate development, roof of mouth development and mesenchyme development, consistent with dunnart craniofacial morphology5. Enhancers active near important palate genes25,26 such as SHH, SATB2, MEF2C, SNAI2 and IRF6 in the dunnart at birth may highlight potential regulatory mechanisms driving early palatal closure. In addition, terms related to development of the circulatory system, including regulation of vasculature development, circulatory system process and blood circulation were enriched amongst genes linked to predicted enhancer regions (for example, ACE, PDGFB, GATA4, GATA6, VEGFA). This is consistent with observations that show the oral region of newborn dunnarts is highly vascularised, with blood vessels visible through their translucent skin at birth5.

To gain further insight into dunnart gene regulation at this developmental stage, we scanned putative enhancers and promoters for 440 known Homer vertebrate motifs and tested for enriched TFs27. Enhancers were significantly enriched for 170 TFs relative to a background set of random GC- and length matched sequences (FDR-corrected, p < 0.01), including those with known roles in differentiation of cranial neural crest cells (TWIST, HOXA2), skeletal morphogenesis (DLX5, CREB5, HOXA2), bone development (ATF3, RUNX), cranial nerve development (ATOH1) and/or facial mesenchyme development (LHX2, FOXP1, MAFB; Fig2B, SupFig1). Consistent with the GO enrichment, TFBS in promoter sequences were dominated by transcriptional initiation regulatory sequences, with significant enrichment for 13 TFs (FDR-corrected, p < 0.01) including RFX3, RFX2, NRF, NRF1, GRY, ZBTB33, RONIN, JUND and GFX (Fig2B, SupFig1).

Next, we assessed predicted target gene expression by performing bulk RNA-seq in dunnart face tissue collected on the day of birth. There were 12,153 genes reproducibly expressed at a level > 1 TPM across three biological replicates, with 67% of genes (8158/12153) near an enhancer and/or promoter peak. Most enhancers (78% of all enhancers, SupTable4) or promoter (87% of all promoters, SupTable5) were linked to a reproducibly expressed gene in the dunnart (Fig3A) Additionally, the majority of reproducibly expressed genes near a regulatory peak (61%) were associated with both an enhancer and promoter region (Fig3B), highlighting the correspondence in active regulatory elements and expressed genes at this time point in the dunnart face. Genes with a medium-to-high expression level at this stage (>10 TPM) and associated with at least one putative enhancer and promoter were enriched for biological processes including “in utero embryonic development”, “skeletal system development”, “muscle tissue development”, “skin development”, “vasculature development” and “sensory organ development” (Fig3C, SupTable6). Enrichment for the term “in utero embryonic development” is indicative of the altricial nature of the dunnart neonate. In a previous study, we showed that in the dunnart ossification begins post-birth (day of birth young corresponding approximately to embryonic day (E) 12.5 in mouse) and that the dunnart neonate instead likely relies on the well-developed cranial muscles and a extremely large chondrocranium for structural head and feeding support during early pouch life5,28. Consistent with this, we observed high expression (>20 TPM) of the key head myogenesis genes2931, MYOD1, MYF6, MEF2C, PAX3, MYL1 and MYOG, essential genes regulating chondrogenesis32,33, SOX9, COL2A1 and FGFR1 and genes that act upstream of osteoblast differentiation34,35, MSX1, MSX2, CEBPA/G, ALPL, DLX3, DLX5, FGFR1, FGFR2 (SupTable4 and SupTable5).

Genes linked to craniofacial enhancers and promoters in the dunnart are reproducibly expressed and involved in embryonic vasculature, muscle, skin and sensory system development.

(A) The majority of nearest genes assigned to candidate enhancers and promoters were reproducibly expressed in dunnart face tissue, and (B) reproducibly expressed genes in dunnart were associated with both a promoter and enhancer region. (C) Biological processes enriched for genes medium to highly expressed (>10TPM) and linked to both a promoter and enhancer region (FDR-corrected, p < 0.01).

Comparative analyses of regulatory elements in mouse and dunnart reveal conserved and dunnart-specific enhancers during craniofacial development

Previously, we defined the postnatal craniofacial development of the dunnart and characterized the developmental differences between dunnart and mouse5. Marsupials, including dunnarts, have accelerated development of the cranial bones, musculature and peripheral nerves when compared to placental mammals such as the mouse29. Having now characterized the chromatin landscape of the dunnart’s craniofacial region, we next sought to compare it to that of a placental mammal. To do this, we took advantage of publicly available ChIP-seq data for H3K4me3 and H3K27ac generated by the mouse ENCODE consortium3638 spanning multiple developmental time-points (E10.5-E15.5).

After applying consistent peak calling and filtering parameters to the dunnart ChIP-seq data we found ∼17K enhancers and ∼10K promoters per stage in the mouse (Fig4A, see SupNote4). The features of mouse predicted enhancer and promoters (percentage CpG, GC content, distance from TSS, peak length and peak intensity) were consistent with the observations in the dunnart (Fig4A, SupFig2B-E). After building dunnart-mm10 liftover chains (see Methods and SupNote5) we compared mouse and dunnart regulatory elements. Between 0.74-6.77% of enhancer regions out of all alignable enhancers were present in both mouse and dunnart (SupTable7B). In contrast, between 45-57% of alignable promoter regions were present in mouse and dunnart (SupTable7A). Although this is a small fraction of the total peaks (∼8% for promoters and ∼0.5% of enhancers (SupTable7), it suggests that, consistent with the literature19, promoter regions are more stable over large evolutionary distances and that shifts in developmental timing of craniofacial marsupial may be more likely to be driven by recently evolved enhancer regions in marsupials.

Analysis workflow and features of H3K4me3 and H3K27ac enhancers and promoters in dunnart and mouse.

(A) Alignment filtering and peak calling workflow and (B) number of reproducible predicted regulatory elements identified in the dunnart and mouse embryonic stages. Log10 of distance to the nearest TSS for putative (C) enhancers and (D) promoters.

To further contextualize the regulatory patterns in the dunnart compared to mouse, we retrieved mouse gene expression data from embryonic facial prominence tissue collected from E10.5 to E15.5 in the mouse ENCODE database3638, and incorporated this into our comparative analyses. We considered only one-to-one orthologous genes in mouse and dunnart. Predicted dunnart enhancers with sequence conservation in mouse and matching peak activity at any of the five mouse embryonic stages (461 regions) were located near genes reproducibly expressed (>1 TPM, 89%) in both species (SupTable8). Genes in this group were enriched for core developmental and signaling processes; BMP signaling, cartilage development, ossification, skeletal development and chondrocyte differentiation (SupTable9). Taking the predicted dunnart enhancers alignable to the mouse but without a matching peak, we looked at nearby genes and compared expression between mouse and dunnart. For the 4,311 dunnart-specific enhancers we found that 2,310 (54%) were linked to genes expressed >1 TPM in all stages and species, suggesting that these genes in mouse could be regulated by a different set of regulatory regions or could be accounted for by the reduced enrichment in the mouse ENCODE ChIP-seq face datasets for H3K27ac (SupFig2). We found a smaller subset (179 regions, 114 unique genes) where the nearest genes were highly expressed (>10 TPM) only in the dunnart with low to no activity in the mouse at any of the embryonic stages (E10.5-E15.5; SupTable10). This included genes involved in cranial neural crest proliferation and migration (INKA2, TFAP2E, OVOL2, GPR161), keratinocyte proliferation (PLAU, HOXA1), embryonic myogenesis (PDF4), development of the ectodermal placodes and sensory systems (CNGA2, ELF5, EDN1, HOXA1, ATOH1, NPHP4, CFD, WNT2B) and vomeronasal sensory neuron development (TFAP2E; SupTable10).

Dunnart-specific expression of genes involved in the development of the mechanosensory system

Given the large evolutionary distance between the mouse and dunnart and low recovery of alignable regions, we performed a comparison between species at the gene level, by comparing genes assigned to putative enhancers and promoters between the dunnart and mouse. The number of genes that intersect can provide an idea of the similarities in genes and pathways regulated across a larger subset of the total regulatory dataset. The largest intersection size in genes with putative promoters was between the six mouse embryonic stages (1,910 genes; 21.2%) and between the dunnart all six embryonic mouse stages (1,908 genes, 21.2%; Fig5A). Overlap between enhancers was more restricted, with 4,483 predicted target genes (56%) being unique to the dunnart at D0 (Fig5A). The top enriched terms for biological processes were largely shared across dunnart and mouse, with the exception of one GO term, “sensory system development” (SupFig3A). We further investigated this by incorporating gene expression data for mouse and dunnart for genes near putative enhancers. Genes highly expressed in dunnart but lowly or not expressed in mouse (537 total; see SupTable11) were related to three main developmental processes, “epidermis/skin development and keratinization”, “sensory system development” and “muscle development and contraction” (Fig5C,D; SupTable12). The majority (70/114) of genes associated with the sequence conserved dunnart-specific enhancers (see SupTable10) overlapped with the list of genes reported here.

Genes near enhancers highly expressed only in dunnart are involved in development of the skin, muscle and mechanosensory systems.

Gene set intersections across mouse (E10.5-E15.5) and dunnart (P0) for (A) genes near promoters and (B) genes near enhancers. (C) Gene ontology term enrichment for top 500 highly expressed genes in dunnart. (D) Top 50 highly expressed genes (TPM) in dunnart compared with mouse embryonic stages. Scale bar in E. (E) Expression levels (TPM) for keratin genes across dunnart and mouse.

Genes critical for development of keratinocytes and the establishment of a skin barrier were highly expressed in dunnart facial tissue with lower expression or no transcripts expressed across the mouse embryonic stages including IGFBP2, SFN, AQP3, HOPX, KRT17, KRT7, KRT8 and KRT78 (Fig5D, SupTable12). Keratin genes are also critical for the development of the mammalian mechanosensory system39,40. Krt17 expressing epidermal keratinocytes are necessary to establish innervation patterns during development and Krt8 and Atoh1 expression is required for the specification of the Merkel cells (touch sensory cells)40. KRT17, ATOH1 and KRT8 are all very highly expressed in dunnart compared to the low expression in mouse facial tissue. Other genes in the keratin family such as KRT35, KRT79, KRT4 and KRT80 did not vary greatly between mouse and dunnart (Fig5E). Additionally, we observed high expression of genes involved in development of the olfactory system including CCK, OTX1 and ISLR (expressed in olfactory epithelium)4144, and Ybx344 (expressed in the nasal epithelium; Fig5D, SupTable12). Muscle contraction genes such as TNNI245 and TNNT346,47 and genes involved in skeletal muscle development4851 (CAR3, ATP2A2, IGFBP6 and TRIM72) were upregulated in dunnart craniofacial tissue (Fig5D, SupTable12). The majority of conserved toolkit genes involved in embryonic development had consistent expression across mouse and dunnart (SupFig4). To explore the relationship between genes expressed in the dunnart face and temporal gene expression dynamics during mouse development, we categorized mouse gene expression into five distinct temporal patterns (SupFig5A). Each of these groups appeared to reflect biological processes occurring during development (SupFig5B). Although dunnart facial development more closely resembles approximately E12.55 in the mouse, dunnart expressed genes were associated with two distinct clusters: a set of genes upregulated specifically at E15.5 in mouse (cluster 2: OR = 1.30, CI = 1.15-1.46; SupFig5C; Fig6A) and a set of genes upregulated at E14.5 (cluster 3: OR = 1.99; CI = 1.78-2.24; SupFig5C; Fig6B). Cluster 2 genes were enriched for functions related to sensory perception, skin development and keratinization (Fig6C) while cluster 3 genes were enriched for muscle development (Fig6D). These results align with our observations in dunnart enhancer and gene expression data, suggesting that shifts in the developmental timing of the skin, muscle and sensory perception may play a role in marsupial early life history.

Dunnart expressed genes are associated with two gene clusters with distinct temporal expression patterns in the mouse.

(A) Genes in cluster 2 and cluster 3 plotted with their z-scaled temporal expression (logCPM). Color-coding represents membership value (degree to which data points of a gene belong to the cluster). Gene Ontology enrichment for biological processes enriched in (B) cluster 2 and (C) cluster 3 (FDR-corrected p<0.01).

Discussion

Marsupials display advanced development of the orofacial region and delayed development of the central nervous system when compared to placental mammals3,6. Specifically the facial skeleton and muscular tissues begin development early relative to the neurocranium differentiation3,6,28,5255. Although development of the central nervous system is delayed, marsupials have well-developed peripheral motor nerves and sensory nerves (eg. the trigeminal) at birth56,57. Despite increasing descriptions of craniofacial enhancers in model species15,5861, the genetic drivers of variation in craniofacial development between mammalian species is largely unexplored. We examined the similarities and differences in genomic regions marked by H3K4me3 and H3K27ac between the dunnart and mouse during early craniofacial development and incorporated comparisons of gene expression between species.

Given the large evolutionary distance (∼170 million years) between the mouse and dunnart62,63 and high turnover of non-coding DNA sequences64,65, regulatory elements in the dunnart were largely not sequence conserved in the mouse (∼8% for promoters and ∼0.5% of enhancers). Despite this, regions with conserved activity between the two species were predominantly near genes consistently expressed in both, and were enriched for core craniofacial developmental and signaling processes. Additionally, 54% of dunnart enhancers aligned to the mouse genome but lacking active marks in the mouse were associated with genes expressed throughout all mouse developmental stages, suggesting these genes might be regulated by different enhancers in mice. Moreover, despite differences in experimental methods, when taking a gene level approach there was significant overlap in genes near regulatory elements in both species, with high concordance in enriched biological process terms. The majority of craniofacial developmental genes had highly consistent gene expression levels across the mouse and dunnart, highlighting the substantial conservation of gene regulatory networks driving facial development. This supports the notion that conserved genes and signaling pathways are crucial for mammalian facial development66.

Although there was generally a high-level of concordance between the dunnart and mouse, we discovered dunnart-specific regulatory elements near highly-expressed genes in the dunnart that were lowly or not expressed in the mouse. These genes were related to three main developmental processes, “epidermis/skin development and keratinization”, “sensory system development” and “muscle development and contraction”. In marsupials, pharyngeal muscles are well differentiated before birth preceding development of both the central nervous system and the skeletal system67. Additionally, in the first two days immediately after birth, marsupial pouch young undergo considerable differentiation of the facial muscles67. This pattern varies significantly from rodents where the events of skeletal and muscular development generally occur simultaneously67. Consistent with this, we found that dunnart expressed genes were associated with a cluster of mouse genes highly expressed at E14.5, a key timepoint for myogenesis in the mouse68.

Our data also revealed a less evident developmental constraint experienced by marsupials, with dunnart-specific enhancer and gene expression data showing an enrichment of genes related to skin development. Newborn dunnarts exhibit high expression of KRT17 in the face and other genes essential for periderm establishment and maintenance6974. Unlike placental mammals, marsupials retain the periderm post-birth57,75,76, which may relate to neonatal transcutaneous gas exchange observed in marsupials in the first few days after birth77,78. In placental mammals such as mouse and human, the periderm only exists in utero and disappears late in gestation with the eruption of hair and the development of a thick keratinized stratum corneum69. The persistence of the periderm aligns with the observation that dunnarts rely heavily on gas exchange through the skin and only begin lung respiration around 5 days postpartum coinciding with the disappearance of the periderm77,79.

The recurrence of genes related to development of the sensory systems in our comparative analyses spotlights a highly unique aspect of early pouch life in marsupials. Newborn marsupial young require highly developed sensory systems (mechanosensory and olfactory systems) to respond to the cues that guide them to the teat inside the mother’s pouch, with the nasal-oral region making regular contact with the mother’s belly during the crawl to the pouch9,80,81. The snout epidermis of newborn marsupials has been shown to be innervated with presence of mature Merkel cells with connected nerve terminals7,8,80. We observe high expression of key genes involved in the development of Merkel cells and sensory placodes uniquely in the dunnart face. In the mouse, Merkel cell development doesn’t begin until later at approximately E16.582. Upregulation of Merkel cell genes in dunnart facial tissue could be a result of prioritization of sensory development required for life outside the womb in a highly underdeveloped state.

In summary, our work suggests that craniofacial developmental constraints in the dunnart may be driven by dunnart-specific gene regulation. Future studies will apply single-cell multi-modal technologies to dunnart tissues to investigate the specific regulatory differences between CNS and orofacial development.

Methods

Tissue Collection

All animal procedures, including laboratory breeding, were conducted in accordance with the current Australian Code for the Care and Use of Animals for Scientific Purposes83 and were approved by The University of Melbourne Animal Ethics Committee (AEC: 1513686.2) and with the appropriate Wildlife Permit (number 10008652) from the Department of Environment, Land, Water and Planning. Animals were housed in a breeding colony in the School of BioSciences, The University of Melbourne. For details on animal husbandry and collection of dunnart pouch young refer to Cook et al. 20215. Details of pouch young used in this study are presented in SupTable13. After removal of pouch young from the teat, young were killed by decapitation and craniofacial tissue dissected using insulin needles (Becton Dickinson). The tongue, neural tube and eye primordia were removed to limit tissue collected to the facial prominences. Specifically, the mandibular, maxillary and fronto-nasal prominences were collected and snap frozen in liquid nitrogen. All pouch young were determined to be <24 hours old but to account for variability in time since birth we scored pouch young based on head shape. Immediately after birth the dunnart has a flat neurocranium that by approximately 1 day after birth has begun to round (see Cook et al. 20215 for more details). We combined craniofacial tissue from 50 pouch young into two replicates ensuring that sex, head shape, and parentage were accounted for (SupTable13).

Immunofluorescence

We assessed the reactivity of the ChIP antibodies in the dunnart using immunostaining on dunnart head sections. Frontal head sections (7 µm thick) on superfrost slides (Platinum Pro, Grale) were deparaffinized and rehydrated according to standard methods84, followed by antigen retrieval using pH 8.8 unmasking solution (Vector) at 99°C for 30 minutes. We then incubated the sections with either rabbit anti-H3K4me3 primary antibody (1:500 dilution; Abcam ab8580) or rabbit anti-H3K27ac primary antibody (1:500, Abcam ab4729). Sections were then incubated with Alexa Fluor 555 donkey anti-rabbit antibody (1:500 dilution; Abcam in 10% horse serum in PBS with 0.1% Triton X-100; Sigma). All sections were counterstained with 300 nM DAPI to visualize cell nuclei. We observed no staining in the negative controls (no primary antibody). Images were captured using fluorescence microscopy (BX51 Microscope and DP70 Camera; Olympus) and processed in ImageJ v 2.0.085.

Chromatin immunoprecipitation and sequencing

Chromatin immunoprecipitation (ChIP) was performed using the MAGnify Chromatin Immunoprecipitation System (Thermo Fisher, 492024) according to manufacturer’s instructions. Briefly, frozen dunnart tissue samples (see SupTable13) were diced quickly with two razor blades in cold dPBS (Gibco) followed by crosslinking in 1% formaldehyde solution (Sigma) for 10 minutes. We then added 0.125 M glycine and incubated for 5 minutes at room temperature to neutralize the formaldehyde. Chromatin was fragmented to 300 bp average size by sonication on a Covaris S2 using the following parameters [duty cycle = 5%, intensity = 2, cycles per burst = 200, cycle time = 60 seconds, cycles = 10, temperature = 4°C, power mode = frequency sweeping, degassing mode = continuous]. In each ChIP experiment, we used sheared chromatin from each replicate for immunoprecipitation with antibodies against H3K4me3 (Abcam ab8580) and H3K27ac (Abcam ab4729). An input control was included for each replicate. DNA was purified according to kit instructions using a DynaMagTM-PCR Magnet (Thermo Fisher).

We assessed the success of the immunoprecipitation in dunnart craniofacial tissues by performing qPCR for primers designed to amplify genomic regions expected to be occupied by H3K4me3 and/or H3K27ac based on mouse and human enhancers active in facial regions of E11.5 mice in VISTA enhancer86 and GeneHancer87 (SupTable14). We also designed primers that amplify regions we predicted would be unoccupied by these histone modifications (enhancers active in heart tissue of E11.5 mice in VISTA enhancer browser86 (SupTable14). qPCR using SYBR Green Supermix (Bio-Rad Laboratories) was performed in triplicate on a QuantStudio™ 5 System (Thermo Fisher) as per manufacturer’s instructions (primers listed in SupTable14). The cycling conditions were [one cycle of 95°C for 15 s, followed by 40 cycles of 95°C for 15 s, 57°C for 30 s and 72°C for 30 seconds]. A dissociation curve was also generated for each primer pair. No-template controls were included in triplicate on each plate as a negative control. Regions expected to be enriched in the test sample were quantified by expressing the test sample as a fold change relative to a control sample (no antibody control).

Illumina sequencing libraries were prepared from ChIP-enriched DNA by GENEWIZ (Suzhou, China). Libraries were constructed following the manufacturer’s protocol (NEBNext UltraTMII DNA Library Prep Kit for Illumina). For each sample, a minimum of 10 ng of ChIP product was used and libraries were multiplexed and sequenced on an Illumina HiSeq 4000 instrument according to manufacturer’s instructions (Illumina, San Diego, CA, USA). Sequencing was carried out using a 2×150 paired-end configuration to an average depth of 57 million read pairs per sample.

Genome assembly and annotation

Previously there was no fat-tailed dunnart genome available to map ChIP-seq reads against. In order to generate a genome for dunnart, tissue was collected and four sequencing libraries were prepared following methods 88. Four libraries were generated to improve molecular complexity and genome representation of input DNA. These libraries were then sequenced using the following technologies: Illumina X Ten 2×150 bp, PacBio Sequel I CLR, ONT PromethION and ONT GridION (SupTable15).

Quality trimming and residual adaptor removal from dunnart Illumina libraries (Library 1) was performed using trimmomatic v0.3889 with options [PE SLIDINGWINDOW 5:20, MINLEN 75, AVGQUAL 30]. Contigs were assembled using 200 Gb of PacBio CLR subreads (Library 2) using Flye v2.790 with options [–iterations 4 –trestle –pacbio-raw –genome-size 3.0g]. Removal of redundant contigs and two rounds of short-read and long-read scaffolding were performed using Redundans 0.14a91 with options [--nogapclosing --limit 1.0]. Inputs for redundans scaffolding were short-insert paired-end reads (Library 1) and 6.5 gigabases of Oxford Nanotechnology reads corresponding to 2 libraries (Library 3 and Library 4). Scaffolds then underwent two rounds of polishing to improve base quality using Pilon v1.2392 with Illumina Library 1 as input and with options [--vcf --diploid --chunksize 10000000 --fix snps,indexls,gaps --minqual 15]. The resulting genome assembly had a total size of 2.84 Gb and an N50 length of 23 megabases (Mb). We used BUSCO v5.2.2 to assess genome completeness [v3.0.2, -l mammalia_odb10 -m genome]. BUSCO gene recovery was 89.9% for complete orthologs in the mammalia_odb10 lineage dataset which includes 9226 BUSCOs. Together, these metrics indicate that the assembly is of comparable completeness and contiguity to other recently-published marsupial genomes9399 and therefore represents an excellent resource for downstream functional genomic experiments. The resulting de novo assembly with a resulting scaffold N50 of 23 Mb and a total size of 2.84 (SupFile2) Gb making it comparable to other marsupial genomes93,9597. The G+C content of the de novo contigs in the dunnart (∼36.25%), was similar to other marsupial species (Tasmanian devil; 36.4%96, thylacine; ∼36%93 wallaby; 34-38.8%95, opossum; 37.8%97, woylie; 38.6%98 and the brown antechinus; 36.2%99.

Gene annotations from the high-quality genome assembly of the Tasmanian devil (Sarcophilus harrisii, GCF_902635505.1 - mSarHar1.11), which has a divergence time with the dunnart of approximately 40 million years, were accessed from NCBI and then lifted over to dunnart scaffolds using the program liftoff v1.0 100 with option [--d 4].

Whole genome alignment

To compute pairwise genome alignments between the mouse and dunnart, we used the mouse mm10 assembly as the reference. We first built pairwise alignments using Lastz and axtChain to generate co-linear alignment chains101, using the previously described Lastz parameters for vertebrates, [K = 2400 L = 3000 Y = 3400, H = 200] with the HoxD55 scoring matrix102. After building chains, patchChain102 was applied to extract all the unaligned loci and create local alignment jobs for each one. The new local alignments were combined with the original local alignments for an improved set of chains. We then applied chainCleaner103 with the parameters [-LRfoldThreshold = 2.5 -doPairs -LRfoldThresholdPairs = 10 -maxPairDistance = 10000 -maxSuspectScore = 100000 -minBrokenChainScore = 75000] to improve the specificity of the alignment. After generating an improved set of chains, we applied chainPreNet, chainNet and ChainSubset to filter, produce the alignment nets and create a single chain file using only the chains that appear in the alignment nets101. Alignment nets are a hierarchical collection of the chains that attempt to capture orthologous alignments101. Chain fragments were joined using chainStitchId and dunnart to mouse chains generated using chainSwap101 (SupFile3). For quality control, maf files were generated using netToAxt and axtToMaf101. Block counts, block lengths and pairwise divergence in the alignments were assessed using MafFilter104.

ChIP sequencing data analysis

First, we assessed the raw sequencing read quality using FastQC v0.11.9105. Raw data were processed by adapter trimming and low quality read removal using Cutadapt v1.9.1106 [-q 20 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGA AAGAGTGT --max-n 0.10 -m 75]. ChIP sequencing statistics for raw reads and trimmed reads are described in SupTable16. Sequencing reads were aligned to the dunnart genome with Bowtie2 v.2.3.5.1107 [-q -X 2000 --very-sensitive]. Unfiltered aligned reads from ChIP-seq experiments performed using mouse embryonic facial prominence for E10.5, E11.5, E12.5, E13.5, E14.5 and E15.5 were downloaded from ENCODE.org3638,108 (accession details described in SupTable17). For both dunnart and mouse aligned reads, low-quality and unpaired reads were removed using Samtools v.1.9109 [-q 30 -f 2] and duplicate reads removed by the MarkDuplicates tool from picard v.2.23.1 (http://broadinstitute.github.io/picard/). Mapping statistics and library complexity for dunnart and mouse reads are described in SupTable18 and SupTable19, respectively. Effective genome size for the dunnart was calculated using the [unique-kmers.py] script from khmer v.2.0110.

Peaks were called on the dunnart-aligned reads using MACS2 v.2.1.1111 [-f BAMPE] and the mouse aligned reads using MACS2 v.2.1.1 with default parameters for mm10, using total DNA input as control and retaining all statistically significant enrichment regions (FDR-corrected P < 0.01). Reproducible consensus peaks for biological replicates within a species were determined using the ENCODE3 [overlap_peaks.py] script36,38. Enriched regions were considered reproducible when they overlapped in two biological replicates within a species by a minimum of 50% of their length using bedtools intersect v2.29.2112. Peak-calling statistics for dunnart and mouse are described in SupTable20 and SupTable21, respectively. Similar to Villar et al. 201519, we overlapped H3K4me3 and H3K27ac reproducible peaks to determine promoter-associated peaks (marked by only H3K4me3 or both H3K4me3 and H3K27ac) and enhancer-associated peaks (marked only by H3K27ac). H3K4me3 and H3K27ac reproducible peaks were overlapped to determine genomic regions enriched for H3K4me3, H3K27ac or both marks using bedtools intersect v2.29.2112. Double-marked H3K4me3 and H3K27ac elements were defined as regions reproducibly marked by H3K4me3 and H3K27ac and overlapping by a minimum 50% of their reciprocal length and were merged with bedtools v2.29.2112. Mouse and dunnart peaks called on 10 million aligned reads can be found in SupFile1.

Gene annotations lifted over from the Tasmanian devil annotation were associated with ChIP-seq peaks using the default settings for the annotatePeak function in ChIPseeker v1.26.2113. Gene domains for promoter- and enhancer-associated ChIP-seq peaks were used for gene ontology analysis. Gene ontology analysis was performed using clusterProfiler v4.1.4114, with Gene Ontology and KEGG annotation drawn from the org.Mm.eg.db v3.12.0 database115, setting an FDR-corrected p-value threshold of 0.01 for statistical significance. In order to use the mouse Ensembl GO annotations for the dunnart gene domains, we converted the Tasmanian devil RefSeq IDs to Ensembl v103 IDs using biomaRt v2.46.3116,117, and then converted these to mouse Ensembl v103. In this way, we were able to assign Devil Ensembl IDs to 74% of genes with peaks, and mouse IDs to 95% of genes with a devil Ensembl ID. For calculating enrichment in GO and KEGG in the dunnart, the list of Tasmanian devil genes with an orthologous Ensembl gene in the mouse were used as the background list (SupFile1).

Short sequence motifs enriched in dunnart peaks were identified with Homer v4.11.127 using [findMotifsGenome.pl]. In this case random GC- and length-match sequences for all promoters and enhancers were used as the background set to test for enrichment compared to random expectation. Enriched motifs were clustered into Homer motif families27. The full list of enriched motif families can be found in SupFile1.

Mouse and dunnart peak comparison

Dunnart and mouse peaks called from normalized input reads were filtered to 50 bp regions centered on the peak summit. Dunnart peak coordinates were lifted over using liftOver101 to the mouse (mm10) genome and then back to the dunnart genome. Alignable peaks were kept if after reciprocal liftOver they had the same nearest gene call (SupTable22). Alignable peaks were then intersected with enhancer and promoters at each stage in the mouse with bedtools intersect v.2.30.0112 to assess peaks with conserved activity (SupTable23).

RNA-sequencing and analyses

The reads for three replicates were generated from facial prominences of P0 dunnart pouch young. Each replicate is a pool of two pouch young. Libraries were prepared using the Illumina stranded mRNA kit and were sequenced to a depth >=50M read pairs (i.e. 25M F + 25M R) in 2×100bp format. The raw reads were first trimmed using trimmomatic v0.39 89 [ILLUMINACLIP:2:30:10 SLIDINGWINDOW:5:15 MINLEN:50 AVGQUAL:20]. Reads were mapped with hisat2 v2.21118 [--fr --no-mixed --no-discordant] and alignments were filtered using samtools view109 [-f 1 -F 2316]. Finally the count table was generated using featureCounts from the Subread package v2.0.1 119 [-s 2 -p -t exon --minOverlap 10]. Mouse gene count tables were acquired from ENCODE (see SupTable17). Mouse and dunnart gene expression data were normalized for library size with edgeR v.4.0.16120 and lowly expressed genes filtered (>2 cpm) for at least two out of three replicates.

For mouse gene expression data the resulting gene list was tested for differential expressed genes with the [DBanalysis] function in TCseq package121 which implements edgeR to fit read counts to a negative binomial generalized linear model. Differentially expressed genes with an absolute log2 fold-change > 2 compared to the starting time-point (E10.5) were determined. The optimal division of clusters was determined using the Calinski criterion implemented with the [cascadeKM] function in the vegan v. package122 with the parameters [inf.gr = 1, sup.g = 10 iter = 1000, criterion = “calinski”]. Time-clustering for 5 clusters was performed using the [timeclust] function with the parameters: [algo = ‘cm’, k = 5, standardize = TRUE] which performs unsupervised soft clustering of gene expression patterns into five clusters with similar z-scaled temporal patterns. Gene Ontology enrichment was performed using [enrichGO] as part of the clusterProfiler v4.10.1114 with the 9933 genes present across the clusters as background and a false discovery rate of 1%.

Data availability

The data generated in this study have been deposited in NCBI’s Gene Expression Omnibus123 and are available through GEO Series accession number GSE188990. Accession IDs for previously published data sets from the ENCODE consortium3638 are given in SupTable17. Processed data is available in a figshare repository (Project 140843). All analyses described were carried out using custom bash, Python3 and R v4.1.0 scripts, and are available at github.com/lecook/chipseq-cross-species.

Acknowledgements

This research was supported by an Australian Government Research Training Program (RTP) Scholarship, the David Lachlan Hay Memorial Grant and an Australian Research Council Grant (DP160103683) awarded to A.J.P. St Vincent’s Institute acknowledges the infrastructure support it receives from the National Health and Medical Research Council Independent Research Institutes Infrastructure Support Program and from the Victorian Government through its Operational Infrastructure Support Program. The authors thanks Eva Suric, Tania Long and Darren Cipolla for their technical contributions and help with animal husbandry.

Author contributions

L.E.C, I.G.R and A.J.P designed and conceived the study. L.E.C collected the tissue samples, performed the ChIP experiments, analyzed the sequencing data and wrote the manuscript. D.M.V assisted with data analysis. J.H and C.Y.F collected tissue samples, performed RNA-sequencing experiments and initial analyses. C.Y.F performed the genome assembly and annotation. I.G.R and A.J.P supervised the study and gave editorial assistance. All authors reviewed and commented on the final manuscript.