Comparison of ATACseq data from embryonic and adult tissues of Parhyale hawaiensis

(A) Genome browser plot of ATACseq data, focused on the 3.5 Mb spanning Hox gene cluster of Parhyale hawaiensis. In whole embryos (E20-24), ATACseq peaks are distributed across the entire Hox cluster. In embryonic thoracic legs (EL1-3), ATACseq peaks in the region of anterior and posterior Hox genes are partly suppressed. In adult T4 and T5 legs (L1-3), peaks of open chromatin are mostly visible in the region of Ubx and suppressed in other parts of the Hox cluster. (B) Histogram showing the proportion of transcription start sites (TSS), exonic, intronic and intergenic sequences in the Parhyale genome that overlap with an ATACseq peak in each dataset. Only TSS and exons of intron-containing genes were used in this analysis, to exclude poorly annotated transcripts. (C) Principal Component Analysis of the ATACseq datasets, based on the number of reads assigned to each peak. PC1 and PC2 (accounting together for 44% of the variation) show the datasets clustering according to tissue of origin (whole embryos, embryonic legs, adult legs, marked in different colours). (D) Mapping of ATACseq reads to the sequences surrounding transcription start sites across the genome; 1 kb upstream to 1 kb downstream of each TSS is depicted in each line (half of the 27,955 TSS are shown, keeping the same order per column; see Methods). Colours represent the number of mapped reads (see Methods). We observe a clear enrichment of open chromatin surrounding the TSS.

Overview of bulk and single-nuclei ATACseq datasets from Parhyale hawaiensis

Further information on mapping statistics are given in Suppl. Tables 1 and 2.

Single nuclei ATACseq and differential chromatin accessibility across cell types

(A-C) UMAP of integrated snATACseq experiments SN1 and SN2, including 15,969 cells from adult uninjured Parhyale legs (7,951 cells from SN1 and 8,018 cells from SN2). UMAP was colour-coded based on experiment (A), cell clusters identified from the ATACseq signal (B), or cell types identified from previously published snRNAseq data (Almazán et al. 2022; see Methods, Suppl. Figure 2) (C). (D) UMAP of snRNAseq data published in (Almazán et al. 2022), using the same colour code. (E) Dot plots of the ATACseq signal at the putative CREs that we tested in vivo, per cell cluster.

Overview of short-read genome sequencing on three Parhyale species

Cross-species sequence conservation and its relation with chromatin

(A) Molecular phylogeny depicting the evolutionary relationships of the Parhyale species included in this study (P. hawaiensis, P. aquilina, P. darvishi and P. plumicornis), with Hyalella azteca as the outgroup. The phylogeny was based on 29,097 aligned nucleotides from conserved single-copy genes (BUSCO gene set; see Methods). Divergence estimates (in million years) were calibrated against the estimated divergence between Parhyale and Hyallella (Cannizzaro and Berg 2022). Scale bar represents number of substitutions per site. (B) Overlap between regions of sequence conservation in non-coding sequences (TSS, introns and intergenic regions) identified by mapping of reads from P. aquilina, P. darvishi and P. plumicornis onto the genome the P. hawaiensis (see Methods). Values in parentheses indicate the number of overlapping regions that would be expected if the conserved regions were distributed randomly. (C) Quantification of the fraction of ATACseq peaks in non-coding sequences that overlap a conserved region identified in P. aquilina, P. darvishi and P. plumicornis (see Methods). (D) Quantification of the fraction of ATACseq peaks in non-coding sequences that overlap a conserved region in either one, two or three of the other Parhyale species (see Methods). Colour code corresponds to panel B. Black dots in panels C and D represent the overlap that would be expected if the conserved regions were distributed randomly.

Identification of CREs driving ubiquitous expression

(A,B) Genome tracks showing the ATACseq and sequence conservation profiles that we used to identify the ubi1 and ubi2 promoters and associated CREs. (A’,B’) Fluorescence observed with the ubi1.P (A’) and ubi2.CRE+P (B’) reporters in live late stage embryos. Image A’ was captured on a fluorescence stereoscope, image B’ is a max projection of an image stack captured by confocal microscopy. (A’’,B’’) Fluorescence observed with the ubi1.P (A’’) and ubi2.CRE+P (B’’) reporters in live hatchlings. These images show unilateral genetic mosaics with expression only on one half of the animal, captured on a fluorescence stereoscope. Scale bars, 50 µm. The raw image data are available in Suppl. Data 8.

Putative CREs with ubiquitous, neuron- and muscle-specific activity

The transgenic reporters that we tested carry the promoter region (P) of the gene of interest and additional putative CREs upstream of the EGFP or mNeonGreen coding sequences. The fragments tested are shown in Figures 4A-B, 5A-C. The number of embryos screened, the number of embryos showing unilateral or bilateral expression of the Opsin1 transgenesis marker, and the number of embryos showing (mosaic) reporter expression are indicated. Genetic mosaicism results in different numbers of positives for the Opsin1 marker (eyes) and reporter expression (other tissues). CNS: central nervous system. The P and CRE sequences are given in Suppl. Data 7.

Identification of CREs driving expression in neurons and muscles

(A-C) Genome tracks showing the ATACseq and sequence conservation profiles that we used to identify the neuro5, neuro6 and Mhc neuron- and muscle-specific promoters and associated CREs. (A’-C’) Fluorescence observed with the neuro5-Src64B-mNeonGreen (A’), neuro6-Src64B-mNeonGreen (B’) and Mhc.CRE6+P-mNeonGreen (C’) reporters in late embryos. Max projections of image stacks captured by confocal microscopy. (A’’-C’’) Fluorescence observed with the of the neuro5.P (A’’), neuro6.P+CRE (B’’) and Mhc.CRE6+P (C’’) reporters in the thoracic legs of juveniles. Max projections of image stacks captured by confocal microscopy. Scale bars, 50 µm. The raw image data are available in Suppl. Data 8.

Sequence comparisons of known Parhyale CREs with homologous regions from Hyalella azteca

Dot plot alignments between previously characterised Parhyale regulatory fragments from the hsc70 (PhMS CRE, Pavlopoulos and Averof 2005), opsin 1 and opsin 2 genes (Ramos et al. 2019) and homologous regions of the Hyalella azteca genome (sequence accessions NW_025942174, NW_025945614 and NW_025930472, respectively; Poynton et al. 2018). Both putative opsin 1 genes of Hyallella were tested. The dot plots were made using the EMBOSS Dotmatcher tool (Rice et al. 2000), with low stringency settings (window size 10, threshold 30). Regions of sequence conservation should appear as diagonal lines. The only region of significant se quence similarity, in opsin 2, coincides with part of the coding sequence.

Label transfer scores from snRNAseq to snATACseq

(A) Distribution of scores for the transfer of cell labels from the snRNAseq (Almazán et al. 2022) to snATACseq (SN1 and SN2) data. (B) UMAP of the snATACseq experiments, color-coded according to the label transfer scores. Both panels show that cell identities of several cell types (including muscles, neurons, epidermis and haemocytes) can be transferred from RNAseq to ATACseq clusters with relatively high confidence from the RNAseq data.

Estimates of genome coverage in P. aquilina, P. darvishi and P. plumicornis

Distribution of per nucleotide sequence coverage, based on the sequences of single-copy BUSCO genes in P. hawaiensis, P. aquilina, P. darvishi and P. plumicornis (see Methods).

Putative CREs of Parhyale neuron-specific genes tested using transgenic reporters

Genome browser plots for 5 loci harboring putative neuron-specific CREs (highlighted in grey). The tracks of the other two loci that we tested, neuro5 and neuro6, are shown in Figure 5A,B.

Comparison of the activity of neuro5 and neuro6 reporters

Side view of a transgenic embryo carrying the neuro5>Src64B-mNeonGreen (in cyan) and neuro6-mScarlet3-HRas (in red) transgenes. Max projection of image stack captured by confocal microscopy. Scale bar, 50 µm.

Putative CREs of Parhyale developmental genes tested using transgenic reporters

Genome browser plots for Dll-e, dac1 and dac2, highlighting the promoter regions (P) and putative CREs that were tested (in grey). ATACseq, nDNA, and sequence conservation (P. darvishi, P. aquilina, and P. plumicornis) tracks were normalised by fragments per million and autoscaled to the highest peak in each region. For dac2, the y-axis maximum for whole embryo, embryo and adult leg tracks was halved to account for the large TSS peak, which otherwise masks the signal at putative CREs. Regions annotated as Ns are indicated by the N stretches track.

Chromatin accessibility and sequence conservation in previously identified Parhyale CREs

Genome browser plots showing the patterns of chromatin accessibility and cross-species sequence conservation in three genomic loci of Parhyale hawaiensis, that were previously shown to harbor active cis-regulatory elements: PhMS (hsc70 locus, Pavlopoulos and Averof 2005), PhOpsin1 and PhOpsin2 (Ramos et al. 2019); the three elements are highlighted in grey. We observe ATACseq peaks and overlapping islands of sequence conservation among Parhyale species in all three fragments. Note that the same elements showed no significant sequence conservation when compared with homologous regions from the more distant species Hyalella azteca (Suppl. Figure 1).

Read mapping of ATACseq data to the Parhyale hawaiensis genome

Statistics of ATACseq mapping using Bowtie2 to the P. hawaiensis genome, and subsequent filtering to remove mitochondrial reads, unpaired reads (except E24, which was sequenced single end), low mapping quality reads, and PCR duplicated reads. Paired-end reads were counted as two reads.

Number of nucleotides in ATACseq peaks in TSS, exon, intron and intergenic sequences

We determined the nucleotides of ATACseq peaks belonging to each of these genomic features: transcription start sites (TSS), exons, introns, and intergenic regions. Counts on TSS and exons were also determined for intron-containing genes only. Undetermined nucleotides (Ns) in the genome assembly were excluded from the counts.

Putative CREs of Parhyale developmental genes tested using transgenic reporters

The transgenic reporters tested carried the promoter region (P) of the gene of interest and additional putative CREs upstream of the mNeonGreen coding sequence. The fragments tested are shown in Suppl. Figure 6. In 6-10% of the embryos screened for Dll-e activity we observed expression patterns that were not reproducible; we interpret these as enhancer traps associated with specific sites of transgene insertion, rather than the activity of sequences carried in the reporter. The P and CRE sequences are given in Suppl. Data 7.