linc-mipep and linc-wrb encode micropeptides that regulate chromatin accessibility in vertebrate-specific neural cells

  1. Valerie A Tornini  Is a corresponding author
  2. Liyun Miao
  3. Ho-Joon Lee
  4. Timothy Gerson
  5. Sarah E Dube
  6. Valeria Schmidt
  7. François Kroll
  8. Yin Tang
  9. Katherine Du
  10. Manik Kuchroo
  11. Charles E Vejnar
  12. Ariel Alejandro Bazzini
  13. Smita Krishnaswamy
  14. Jason Rihel  Is a corresponding author
  15. Antonio J Giraldez  Is a corresponding author
  1. Department of Genetics, Yale University, United States
  2. Yale Center for Genome Analysis, Yale University, United States
  3. Department of Cell and Developmental Biology, University College London, United Kingdom
  4. Department of Computer Science, Yale University, United States
  5. Stowers Institute for Medical Research, United States
  6. Department of Molecular & Integrative Physiology, University of Kansas School of Medicine, United States
  7. Yale Stem Cell Center, Yale University School of Medicine, United States
  8. Yale Cancer Center, Yale University School of Medicine, United States

Abstract

Thousands of long intergenic non-coding RNAs (lincRNAs) are transcribed throughout the vertebrate genome. A subset of lincRNAs enriched in developing brains have recently been found to contain cryptic open-reading frames and are speculated to encode micropeptides. However, systematic identification and functional assessment of these transcripts have been hindered by technical challenges caused by their small size. Here, we show that two putative lincRNAs (linc-mipep, also called lnc-rps25, and linc-wrb) encode micropeptides with homology to the vertebrate-specific chromatin architectural protein, Hmgn1, and demonstrate that they are required for development of vertebrate-specific brain cell types. Specifically, we show that NMDA receptor-mediated pathways are dysregulated in zebrafish lacking these micropeptides and that their loss preferentially alters the gene regulatory networks that establish cerebellar cells and oligodendrocytes – evolutionarily newer cell types that develop postnatally in humans. These findings reveal a key missing link in the evolution of vertebrate brain cell development and illustrate a genetic basis for how some neural cell types are more susceptible to chromatin disruptions, with implications for neurodevelopmental disorders and disease.

Editor's evaluation

The study describes the discovery of two related micro-peptides that regulate zebrafish behavior by affecting chromatin accessibility in the embryonic brain. Zebrafish mutants lacking these micro-peptides show altered gene regulatory networks that preferentially affect oligodendrocytes and cerebellar cells in the embryonic brain. The data presented in the study is solid and presents convincing additional evidence for versatile functions of micro-peptides.

https://doi.org/10.7554/eLife.82249.sa0

Introduction

While most of the vertebrate genome is transcribed, only a small portion encodes functional proteins. Much of the remaining transcriptome is composed of non-coding RNAs, including thousands of predicted long intergenic non-coding RNAs (lincRNAs). Despite this large number of lincRNAs, the functional significance of most remains unclear (Goudarzi et al., 2019). Recent advances in ribosome profiling and mass spectrometry have identified short open-reading frames (sORFs) within putative lincRNA sequences that may encode micropeptides, which were otherwise missed due to their small size (<100 aa) (Bazzini et al., 2014; Chen et al., 2020; Ingolia et al., 2009; Kondo et al., 2007; Pauli et al., 2014; Couso and Patraquim, 2017). Conventional rules assume that short peptides are unlikely to fold into stable, functional structures, and computational identification of protein-coding genes relies on subjective size cut-offs (100 aa); nevertheless, there are several examples of small peptides performing diverse, important cellular functions (Bi et al., 2017; Chen et al., 2020; D’Lima et al., 2017; Fields et al., 2015).

Many lincRNAs are expressed in a tissue-specific manner, and about 40% of all long noncoding RNAs identified in the human genome are specifically expressed in the central nervous system (Derrien et al., 2012; Ulitsky et al., 2011). The vertebrate central nervous system consists of some of the most diverse and specialized cell types in the vertebrate body and has distinct chromatin states and gene regulatory networks that have evolved to establish and maintain this diversity. Since many micropeptides have a relatively recent evolutionary origin and, given their small size, may be able to access and regulate cellular machines inaccessible to larger proteins (Makarewich and Olson, 2017), the tissue specificity of lincRNAs may indicate undiscovered roles in vertebrate-specific CNS development and function.

Evolutionarily recent micropeptides may contribute to vertebrate-specific functions and phenotypes that have otherwise been missed due to misclassification as non-coding transcripts and lack of high-throughput phenotyping for coding functions. We sought to identify micropeptides that were cryptically encoded in long non-coding RNAs but were missed due to assumptions about minimal protein sizes, dubious homologies, or mis-annotations. Here, we interrogate the function of predicted non-coding RNAs and identify two related micropeptides that regulate behavior, chromatin accessibility, and gene regulatory networks that establish evolutionarily newer neural cell types.

Results

Screen of long non-coding RNAs identifies micropeptide regulators of vertebrate behavior

To identify lincRNAs that may encode micropeptides, we first analyzed ribosome profiles for previously published lincRNAs (Ulitsky et al., 2011) in zebrafish embryos during early development (0–48 hr post-fertilization) (Bazzini et al., 2014), performed in situ hybridization on 21 of these candidates, and identified brain-enriched micropeptide candidates (Figure 1—figure supplement 1; Supplementary file 1). To identify the physiological role of ten of these putative micropeptides, we adapted an F0 CRISPR/Cas9 behavioral screening pipeline (Figure 1A; Kroll et al., 2021). CRISPR/Cas9 targeting with multiple guides efficiently induced a range of mutations in the targeted gene sequences, including frame-shift mutations, with inferred indel or large-deletion rates estimated at ~40–100% per targeted locus (Figure 1—figure supplement 2; Supplementary file 1).

Figure 1 with 7 supplements
linc-mipep and linc-wrb loss-of-protein-function mutant larvae are behaviorally hyperactive.

(A) Left, schematic of F0 CRISPR knockout behavioral screen. Zebrafish embryos were injected at the one-cell stage with multiple sgRNAs and Cas9 targeting the ORF of candidate micropeptides encoded within putative lincRNAs. Right, schematic of behavior screening platform. Each well of a 96-well flat-bottom plate contains one zebrafish larva (4–7 days post-fertilization, dpf) from the same wild type (WT) clutch. Individual locomotor activity was tracked at 25 frames per second on a 14 hr:10 hr light:dark cycle. (B) Ribosome footprint of linc-mipep (also known as lnc-rps25) (top) or linc-wrb (bottom) at 5 hours post-fertilization (hpf) across annotated transcript length, with putative coding frames in green (+3), orange (+2), or blue (+1); input (control) on bottom tracks. Magenta asterisk marks predicted short open reading frame. RPF, ribosome-protected fragment. (C) Summary of mutagenesis strategy to decode transcript functions. Magenta bars denote CRISPR-targeted area. Mutated/removed sequence is in gray. TSS, transcription start site. sORF, short open reading frame. ATG, start codon. ncRNA, non-coding RNA. Right, phenotypes predicted (check mark) or not predicted (x mark) for each mutant if the gene functions as a regulatory region, noncoding RNA, or protein-coding gene. (D) Stable mutants for linc-mipep: full region deletion (1.78 kb deletion, from intron 1 – proximal 3’UTR, top); translation start site deletion that removes the ATG sequence (middle); frameshift deletion (8 bp deletion at exon 4, second from bottom); 74 bp deletion that removes highly conserved 3’UTR sequence (bottom). (E) Stable frameshift mutant for linc-wrb (11 bp deletion, exon 3). (F) Locomotor activity of linc-mipepdel-1.8kb/del-1.8kb;linc-wrbdel-11/del-11 (linc-mipep -/-; linc-wrb -/-, magenta); linc-mipepdel-1.8kb/+;linc-wrbdel-11/+ (linc-mipep +/-; linc-wrb +/-, black); and wild-type (linc-mipep +/+; linc-wrb +/+, blue) sibling-matched larvae over 2 nights.
(G) Locomotor activity of wild type (WT, blue) or maternal-zygotic linc-mipepdel1.8kb/del1.8kb;linc-wrbdel11bp/del11bp (linc-mipep;linc-wrb, orange) larvae across two nights. The ribbon represents ± SEM. Zeitgeber time is defined from lights ON = 0. (H) Representative daytime locomotor activity tracking of wild type (top 2 rows) and maternal-zygotic linc-mipepdel1.8kb/del1.8kb;linc-wrbdel11bp/del11bp (linc-mipep;linc-wrb, bottom 2 rows) larvae during 1 min at 6 dpf. Blue and orange dots represent start and stop locations, respectively.

At 4–7 days post-fertilization (dpf), zebrafish display a repertoire of conserved, stereotyped baseline locomotor behaviors across day:night cycles (Prober et al., 2006; Rihel et al., 2010; Kroll et al., 2021). To quantitatively track locomotor activity of wild type and F0 mutant fish, single larvae from each condition were placed into individual wells of a clear 96-square-well flat-bottom plate, which was then placed on a tracking platform that detects the change in pixels per frame for each well, between 4 dpf and 7 dpf (Figure 1A). We measured daytime and nighttime behavioral parameters, calculated the deviation (Z-score) of each F0 mutant larva from its wild type siblings, generated ‘behavioral fingerprints’ (Figure 1—figure supplement 3A), and measured the Euclidean distance between each larva and the mean fingerprint of its wild type siblings (Figure 1—figure supplement 3B).
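The fingerprint calculation described above can be illustrated with a minimal sketch. It assumes that each behavioral parameter is Z-scored against the wild type siblings using the sample standard deviation, and that the distance is then taken to the wild type mean fingerprint, which lies at zero after Z-scoring. The function names, parameter names, and numbers are hypothetical and are not taken from the published pipeline.

```python
import math
import statistics


def z_score_fingerprint(larva, wt_siblings):
    """Z-score each behavioral parameter of one larva against its WT siblings."""
    fingerprint = {}
    for param, value in larva.items():
        wt_values = [sib[param] for sib in wt_siblings]
        wt_mean = statistics.mean(wt_values)
        # Sample SD; assumes at least two siblings and non-zero spread.
        wt_sd = statistics.stdev(wt_values)
        fingerprint[param] = (value - wt_mean) / wt_sd
    return fingerprint


def distance_from_wt(fingerprint):
    """Euclidean distance to the WT mean fingerprint.

    After Z-scoring against WT siblings, the WT mean is zero for every
    parameter, so the distance is the norm of the Z-score vector.
    """
    return math.sqrt(sum(z * z for z in fingerprint.values()))


# Invented example values (arbitrary activity units), for illustration only.
wt_siblings = [
    {"day_activity": 10.0, "night_activity": 2.0},
    {"day_activity": 12.0, "night_activity": 3.0},
    {"day_activity": 14.0, "night_activity": 4.0},
]
f0_larva = {"day_activity": 18.0, "night_activity": 3.0}

fp = z_score_fingerprint(f0_larva, wt_siblings)
dist = distance_from_wt(fp)
print(fp, dist)
```

In this toy case the larva deviates only in daytime activity, so its distance from the wild type mean is driven entirely by that one parameter.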

This screen identified two candidate genes, linc-mipep and linc-wrb, that had a specific daytime hyperactivity phenotype and correlated behavioral fingerprints (r=0.67) when mutated in the ORF identified by ribosome footprints (Figure 1B; Figure 1—figure supplement 3C; Figure 1—figure supplement 4). Sequence analysis revealed that linc-mipep (current nomenclature si:ch73-1a9.3, ENSDART00000158245, also called lnc-rps25) and linc-wrb (current nomenclature si:ch73-281n10.2, ENSDART00000155252) (Bazzini et al., 2014; Ulitsky et al., 2011; Figure 1—figure supplement 5A, B) both had homology in their sORFs’ exon structure (Figure 1—figure supplement 5A, B) and mRNA sequences (BLAST identity score = 72%) (Supplementary file 1), as well as a highly conserved element (92% identical sequence) in their non-coding sequences (Figure 1—figure supplement 5C). While both linc-mipep and linc-wrb were originally identified as long non-coding RNAs, both genes have ribosome-protected fragments, suggesting that they likely encode proteins of 87 aa and 93 aa, respectively (Figure 1B; Figure 1—figure supplement 5D, E). In situ hybridization and RNA-sequencing revealed that transcripts for both genes are expressed throughout embryogenesis, through 5 dpf (Figure 1—figure supplement 5F, G). These results indicate that linc-mipep and linc-wrb might encode redundant or paralogous genes functioning as either lincRNAs or micropeptide-encoding genes involved in behavior.

linc-mipep and linc-wrb encode for related micropeptides that regulate zebrafish behavior

Although linc-mipep and linc-wrb are transcribed and likely translated (Figure 1B; Figure 1—figure supplement 5D–F), ribosome profiling data is insufficient to distinguish between pervasive background translation and translation of functional proteins. For example, these could represent sORFs within enhancer RNAs or in noncoding RNAs that have acquired an ORF but yield a nonfunctional protein. Thus, to distinguish whether linc-mipep and linc-wrb function as regulatory DNA, noncoding RNA, or protein coding genes, we used CRISPR-Cas9 gene editing to generate stable deletion mutants that target the full sequence, the translation start site, the putative coding region, or the conserved untranslated/non-coding region (Figure 1C–E; Figure 1—figure supplement 6). Examining the behavioral profile of these mutants identified a consistent and specific increase in locomotor activity during the daytime in all mutants affecting the ORF for both linc-mipep and linc-wrb (Figure 1—figure supplement 6A–D). In contrast, deleting the highly conserved element in the untranslated region in linc-mipep, which could encode a conserved lincRNA sequence, did not result in any detectable morphological or behavioral phenotypes (Figure 1—figure supplement 6E).

First, we asked whether the coding part of these genes is necessary. Start codon mutations in linc-mipep (zygotic or maternal-zygotic linc-mipepATG-del6) resulted in a daytime hyperactivity phenotype similar to that of frameshift mutations (linc-mipepdel8) or deletion of most of the linc-mipep region (linc-mipepdel1.8kb) (Figure 1—figure supplement 6A–C, Figure 1—figure supplement 7A). These results indicate that the observed phenotypes are the result of protein coding function of linc-mipep rather than a non-coding transcript or a regulatory DNA sequence function. Double linc-mipepdel1.8kb; linc-wrbdel11 homozygous mutants display even higher daytime locomotor hyperactivity levels compared to linc-mipep; linc-wrb heterozygous or wildtype larvae (Figure 1—figure supplement 7C), with no significant changes in nighttime activity (Figure 1F), a phenotype that is maintained if we remove the maternal contribution in maternal-zygotic (MZ) linc-mipepdel1.8kb; linc-wrbdel11 animals (Figure 1G and H; Figure 1—figure supplement 7B). Each additional loss of a copy of either gene generally results in higher hyperactivity levels (Figure 1—figure supplement 7C), suggesting that these genes may work together in a dose-dependent manner.
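One simple way to look at such a dose relationship is to count the mutant alleles carried across both genes and average activity per allele count. The sketch below assumes genotypes written as strings such as '+/-'. The helper names and activity values are invented for illustration and are not data from this study.

```python
def mutant_allele_count(*genotypes):
    """Count '-' alleles across genotype strings such as '+/-' or '-/-'."""
    return sum(g.split("/").count("-") for g in genotypes)


def mean_activity_by_dose(records):
    """Group activity by number of mutant alleles and return {dose: mean}."""
    by_dose = {}
    for mipep_genotype, wrb_genotype, activity in records:
        dose = mutant_allele_count(mipep_genotype, wrb_genotype)
        by_dose.setdefault(dose, []).append(activity)
    return {dose: sum(v) / len(v) for dose, v in sorted(by_dose.items())}


# Invented records: (linc-mipep genotype, linc-wrb genotype, daytime activity).
records = [
    ("+/+", "+/+", 1.0),
    ("+/-", "+/+", 1.2),
    ("+/+", "+/-", 1.4),
    ("+/-", "+/-", 1.5),
    ("-/-", "+/-", 1.9),
    ("-/-", "-/-", 2.4),
]

dose_means = mean_activity_by_dose(records)
for dose, mean_activity in dose_means.items():
    print(dose, round(mean_activity, 2))
```

A monotonic rise in mean activity with allele count, as in these made-up values, is the pattern that a dose-dependent interpretation would predict.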

Next, we asked whether the coding part of these genes is sufficient to drive behavior. To determine whether the behavioral phenotypes observed in mutants result from the loss of coding function, we generated transgenic zebrafish that ubiquitously express the coding sequence (CDS) of linc-mipep and tracked their behavior (Figure 2A and B). The sORF encoded in linc-mipep was able to rescue the hyperactivity phenotypes in linc-mipep mutants (Figure 2C and D) without significant changes to wild type activity levels (Figure 2D) or in nighttime activity (magnified, Figure 2C). Moreover, linc-mipep expression was able to rescue the hyperactivity of linc-wrb heterozygous mutants to almost wild type levels (Figure 2—figure supplement 1A, B), suggesting that these proteins share properties that can rescue loss of the other.

Figure 2 with 6 supplements
linc-mipep and linc-wrb encode proteins with homology to human HMGN1.

(A) Top, diagram of transgenic linc-mipep overexpression construct. Transgenic lines were established via Tol2-mediated integration of a 3.5 kb ubiquitin B (ubb) promoter driving the linc-mipep coding sequence with a FLAG and HA tag at the C-terminus, followed by a T2A self-cleaving peptide, mCherry reporter, and SV40 polyA tail. Bottom, fluorescent and brightfield images of 5 dpf zebrafish siblings either without overexpression (wild type, mCherry-negative, left) or with linc-mipep overexpression (mCherry-positive, right). (B) Activity plot of wild type (mCherry-negative, blue) or linc-mipep overexpression (Tg(ubb:linc-mipep) mCherry-positive, orange) siblings at 6 dpf. n=48 per genotype. Average day activity p=0.028, one-way ANOVA. (C) Locomotor activity of linc-mipep mutants, with or without transgenic linc-mipep overexpression (Tg(ubb:linc-mipep CDS-T2A-mCherry), ‘rescue’), sibling-matched larvae over 24 hr. Inset, no effect on nighttime activity. (D) Average waking activity of 6 dpf linc-mipep mutant, heterozygous, or wild type larvae, with (denoted by +) or without (denoted by -) linc-mipep transgenic rescue. Each dot represents a single fish, and crossbars plot the mean ± SEM. p values from a Dunnett’s test, using wild type (linc-mipep +/+) as the baseline condition. (E) Amino acid sequences of linc-mipep (top), linc-wrb (bottom), and human HMGN1 (middle). Conserved amino acids are denoted in blue (if conserved between two sequences) or magenta (if conserved across the three sequences). Conserved functional domains for HMGN1 are denoted (NLS, nuclear localization signal; Nucleosome Binding Domain; RD, Regulatory Domain; and CHUD, Chromatin Unwinding Domain). (F) Locomotor activity of linc-mipep mutants, with or without transgenic human HMGN1 overexpression (Tg(ubb:hHmgn1CDS-T2A-mCherry), ‘rescue’), sibling-matched larvae over 24 hr. Inset, no effect on nighttime activity.
(G) Average waking activity of 6 dpf linc-mipep mutant (-/-) or wild type (+/+) larvae, with (denoted by +) or without (denoted by -) human HMGN1 transgenic rescue. Each dot represents a single larva, and crossbars plot the mean ± SEM. p values from a Dunnett’s test, using wild type (linc-mipep +/+) as the baseline condition.

Finally, to confirm and visualize the proteins encoded by linc-mipep and linc-wrb, we developed custom antibodies (Figure 2—figure supplement 2A, B). The protein products of both transcripts are detected in developing wild type embryos and larvae (Figure 2—figure supplement 2C–K). We find that these proteins are expressed throughout early development, with stronger staining and a broader expression pattern for the protein encoded by linc-mipep compared to that of linc-wrb (Figure 2—figure supplement 2G, H). We note nonspecific staining of the Linc-wrb antibody in embryos and in likely endothelial cells throughout early development, as staining is still detected in these cells in linc-wrb mutants (Figure 2—figure supplement 2I, K). We further observed that the protein products of both transcripts are enriched in non-dividing wild-type nuclei (Figure 2—figure supplement 2J, K) and absent in linc-mipep;linc-wrb loss-of-function mutant embryos (Figure 2—figure supplement 2J, K) and larval brains (Figure 2—figure supplement 2L, M). Together, these results indicate that linc-mipep and linc-wrb encode nuclear-localized micropeptides that have a dosage effect to regulate locomotor activity and behavior in zebrafish.

Vertebrate-specific evolutionary and functional conservation of proteins encoded by linc-mipep and linc-wrb

Protein BLAST of both Linc-mipep (87 aa) and Linc-wrb (93 aa) ORFs identified conserved sequences across teleosts and other vertebrates, including humans, with homology to non-histone chromosomal protein HMG-14, or High Mobility Group N1 (HMGN1), and the related HMG-17/HMGN2 protein (Supplementary file 2; Bustin, 2001). Whereas the cDNA sequence showed some mild conservation (Figure 2—figure supplement 3A), the highly conserved proximal 3’UTR elements instead allowed us to identify homologous predicted lincRNAs, unannotated genes, pseudogenes, and HMGN1 genes across vertebrate species spanning over 450 million years (Figure 2—figure supplement 3B–C; Figure 2—figure supplement 4; Supplementary file 2; Kumar and Hedges, 1998).

We first found that linc-wrb is syntenic to human HMGN1 (Figure 2—figure supplement 5D–F). To identify the evolutionary origin of this gene, we traced back the synteny for sequences or expressed sequence tags (ESTs) located between flanking genes that are syntenically conserved with humans, Get1/Wrb and Brwd1 (Figure 2—figure supplement 5E). Through these analyses, we identified an unannotated ORF in the lamprey, a basal agnathan (jawless vertebrate), that is syntenic to human HMGN1 and encodes an ancestral protein more similar to human HMGN2 (Figure 2—figure supplement 5B; Figure 2—figure supplement 3E). Though we did not identify any linc-mipep or linc-wrb protein-coding homolog in invertebrates (Supplementary file 2), in line with previous results (Johns, 1982), we did identify an ORF syntenic to linc-wrb in the genome of the lancelet (or amphioxus), an invertebrate basal chordate (Figure 2—figure supplement 5A). When we analyzed the sequence of this APEX1-like gene in the lancelet genome for similarity to HMGN proteins (Figure 2—figure supplement 5A), we found by BLAST that its N-terminal sequence (30 aa) aligns to HMGN family members in various vertebrate species (Figure 2—figure supplement 3D). These results suggest that the N-terminal sequence of the gene in the ancestral location that would give rise to linc-wrb and human HMGN1 may have been co-opted to give rise to the HMGN gene and pseudogene families in vertebrates.

We next searched for the evolutionary origins of linc-mipep. The highly conserved 3’UTR suggested that linc-mipep and linc-wrb derived from the same ancestral gene, either before or after the teleost-specific genome duplication. To address this question, we analyzed the regions syntenic to human HMGN1 in spotted gar, a slowly evolving species whose lineage diverged from teleosts before the teleost genome duplication, and in coelacanth, a lobe-finned fish with the slowest evolving bony vertebrate genome that split from ray-finned fish such as gar and zebrafish (Braasch et al., 2016). In coelacanth, we only identified one protein sequence that aligns to HMGN1, syntenic to human HMGN1, with no ESTs or other sequences identified elsewhere (Supplementary file 2). In spotted gar, we found a gene syntenic to linc-wrb and human HMGN1 (Figure 2—figure supplement 5C). Although we did not identify a gene syntenic to linc-mipep, we did identify the appearance of both Mipep next to Brwd1, and of Igsf5 next to Sh3bgr (Figure 2—figure supplement 5C). Comparing this arrangement with the genomic location of linc-mipep in zebrafish (Figure 2—figure supplement 5D), we suggest that linc-mipep may have resulted from a gene duplication of linc-wrb into the neighboring IGSF5 intronic region, which then rearranged to land next to Mipep in the teleost genome duplication (compare Figure 2—figure supplement 5C, D). We found that linc-mipep has been maintained in other teleost fish species (Figure 2—figure supplement 5; Figure 2—figure supplement 6). Together, these findings suggest that linc-mipep arose from a gene duplication of linc-wrb, and that linc-wrb arose from what we identify here as the basal HMGN gene in agnathan lineages.

Finally, to understand whether the proteins encoded by linc-mipep and linc-wrb share common functions with human HMGN1, we asked whether the homologous human HMGN1 protein can rescue the hyperactivity of linc-mipep and linc-wrb mutants. We generated transgenic zebrafish that ubiquitously express the coding sequence (CDS) of human HMGN1 in each mutant background. Human HMGN1 was able to rescue the hyperactivity phenotypes in linc-mipep mutants (Figure 2F and G), without significant changes in nighttime activity (magnified, Figure 2F). We were unable to rescue the linc-wrb mutant phenotype with human HMGN1 (Figure 2—figure supplement 1D). These data suggest that the proteins encoded by linc-mipep and linc-wrb have some functional homology with each other, and that at least the protein encoded by linc-mipep has functional homology with human HMGN1, which can rescue its loss. Based on these results, we propose renaming linc-wrb as hmgn1a, and linc-mipep as hmgn1b, as their official nomenclature.

linc-mipep; linc-wrb mutants have dysregulation of NMDA receptor-mediated signaling and immediate early gene induction

To gain insight into pathways regulated by linc-mipep and linc-wrb, we compared the behavioral fingerprints of each mutant with those of zebrafish larvae treated with 550 psychoactive drugs that affect different pathways (Rihel et al., 2010). We used hierarchical clustering (Rihel et al., 2010) to identify drugs that elicit a similar behavior to the linc-mipep and linc-wrb mutants (i.e. drugs that phenocopy across multiple day-night behavioral measurements) (Figure 3—figure supplement 1A, B, overlapping hits in blue text). We found that linc-mipep mutant behaviors most resembled those of WT fish treated with an NMDA receptor antagonist (Figure 3A), suggesting that NMDA signaling may be reduced in linc-mipep mutants. The linc-mipep and linc-wrb mutant phenotypes also resembled those of WT fish treated with glucocorticoid receptor activators (Figure 3A, Supplementary file 3), suggesting that downstream glucocorticoid signaling may be upregulated in the mutants.
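Ranking compounds by how closely their fingerprints track a mutant fingerprint can be sketched with a plain Pearson correlation. This is a simplified stand-in for the hierarchical clustering used in the study. The compound names and Z-score vectors below are placeholders, not entries from the drug screen.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length fingerprints."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_by_correlation(query, library):
    """Return (name, r) pairs sorted from most to least correlated."""
    scores = {name: pearson(query, fp) for name, fp in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder fingerprints: Z-scores for four behavioral parameters.
mutant_fingerprint = [2.0, 1.0, 0.0, -1.0]
drug_library = {
    "compound_A": [4.0, 2.0, 0.0, -2.0],   # same shape as the query
    "compound_B": [-2.0, -1.0, 0.0, 1.0],  # mirror image of the query
    "compound_C": [1.0, -1.0, 1.0, -1.0],  # partially related
}

ranking = rank_by_correlation(mutant_fingerprint, drug_library)
for name, r in ranking:
    print(name, round(r, 3))
```

Because correlation ignores overall magnitude, a compound that produces the same pattern at a different strength still ranks at the top.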

Figure 3 with 3 supplements
linc-mipep mutants have dysregulation of NMDA receptor-mediated signaling and immediate early gene induction.

(A) Left, hierarchical clustering of the linc-mipep del-1.8kb (schematic of mutation at top) behavioral fingerprints (right), compared with the fingerprints of wild-type zebrafish larvae exposed to 550 psychoactive agents from 4 to 6 dpf (Rihel et al., 2010). The Z score, defined as the average value (in standard deviations) relative to the behavioral profiles of WT exposed to DMSO, is represented by each rectangle in the clustergram (magenta, higher than DMSO; cyan, lower than DMSO). The linc-mipep del-1.8kb fingerprint correlates with agents that induce daytime activity (‘Correlating Drugs’). Right, compounds ranked according to correlation with the linc-mipep del-1.8kb fingerprint, with biological target(s) noted in last column. (B) Locomotor average activity of wild-type larvae treated with DMSO (WT, blue) or with 10 μM NMDA receptor antagonist L-701,324 (magenta), and linc-mipep del-1.8kb/del-1.8kb larvae treated with DMSO (linc-mipep, green) or with 10 μM L-701,324 (purple); sibling-matched larvae tracked over 24 hr. (C) Average activity (day 6) of WT larvae treated with DMSO or 10 μM L-701,324, compared to linc-mipepdel-1.8kb/del-1.8kb larvae treated with DMSO or 3 μM, 10 μM, or 30 μM L-701,324. Each dot represents one fish. L-701,324 has a strong effect in the wild type animals but not in the mutants (p=0.05, Drug × Genotype interaction, two-way ANOVA). Key p-values are shown based on Tukey’s post-hoc testing. (D) Heatmaps (left) and density plots (right) showing chromatin accessibility (omni-ATAC-seq, average of three replicates) profiles of 2167 regions globally with lower accessibility in linc-mipep; linc-wrb mutant brains at 5 dpf compared to wild type (WT) brains (top), or 1220 regions globally with higher accessibility in linc-mipep; linc-wrb mutant brains at 5 dpf compared to wild type (WT) brains (bottom). Heatmaps are centered at the summit of the Omni-ATAC peak with 500 bp on both sides and ranked according to global accessibility levels in WT.
(E) Transcription factor (TF) motifs enriched in up-regulated and down-regulated regions (in D), relative to unaffected regions (in Figure 3—figure supplement 3).

The identified drugs may act through the same pathways as loss of linc-mipep or through parallel pathways. To distinguish between these possibilities, we first assessed the effect of the glucocorticoid receptor agonist flumethasone on linc-mipep mutant behavior. This treatment further exacerbated the daytime locomotor activity of linc-mipep-/- larvae above the control-treated linc-mipep mutant levels (Figure 3—figure supplement 2A), and flumethasone-treated linc-mipep mutants also showed higher nighttime activity levels (Figure 3—figure supplement 2B). Since both the daytime and nighttime effects of glucocorticoids were much stronger in the mutants than in similarly treated wild type controls, we conclude that linc-mipep mutants are sensitized to glucocorticoid signaling. We found similar glucocorticoid sensitivity in linc-wrb mutants (Figure 3—figure supplement 2C, D).

Next, to test the NMDA receptor pathway, we compared the response of WT and linc-mipep mutants to L-701,324, an NMDA receptor antagonist that acts at the glycine binding site. L-701,324 elicited daytime locomotor hyperactivity in WT larvae at a level similar to that of untreated linc-mipep mutant larvae and of linc-mipep mutant larvae treated with L-701,324 (Figure 3B). Yet, treatment with higher doses of L-701,324 did not affect or exacerbate the activity levels in linc-mipep mutants (Figure 3C). We found similar results with linc-wrb mutants treated with L-701,324 (Figure 3—figure supplement 2E, F). These non-additive results indicate that NMDA receptor antagonism and mutations in linc-mipep and linc-wrb share a common mechanism for inducing hyperactivity.
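The logic of non-additivity can be made concrete as a difference of differences in a 2×2 genotype-by-drug design: the drug effect in mutants minus the drug effect in wild type. The sketch below computes only this interaction contrast from group means and is not the two-way ANOVA used for the figure. All activity values are invented for illustration.

```python
from statistics import mean


def interaction_contrast(groups):
    """Return (WT drug effect, mutant drug effect, difference of differences).

    `groups` maps (genotype, treatment) to a list of activity values.
    """
    wt_effect = mean(groups[("WT", "drug")]) - mean(groups[("WT", "DMSO")])
    mut_effect = mean(groups[("mutant", "drug")]) - mean(groups[("mutant", "DMSO")])
    return wt_effect, mut_effect, mut_effect - wt_effect


# Invented daytime activity values (arbitrary units).
activity = {
    ("WT", "DMSO"): [1.0, 1.2, 0.8],
    ("WT", "drug"): [2.0, 2.2, 1.8],
    ("mutant", "DMSO"): [2.0, 2.1, 1.9],
    ("mutant", "drug"): [2.0, 2.2, 1.8],
}

wt_effect, mut_effect, contrast = interaction_contrast(activity)
print(wt_effect, mut_effect, contrast)
```

Under a purely additive model the contrast would be near zero. A drug that raises activity in wild type but adds nothing in mutants, as in these toy numbers, gives a clearly negative contrast, which is the pattern expected when drug and mutation act through a shared mechanism.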

linc-mipep and linc-wrb regulate chromatin accessibility for transcription factors modifying neural activation

Given that linc-mipep and linc-wrb have protein domains with homology to nucleosome binding and chromatin unwinding domains of HMGN1 (Cuddapah et al., 2011; Deng et al., 2013), and given that both NMDA antagonism and glucocorticoid signaling alter immediate early gene expression, we hypothesized that the daytime hyperactivity might be due to altered chromatin accessibility in the mutants. To test the effect of full loss-of-function of both related proteins encoded by linc-mipep and linc-wrb on chromatin accessibility, we performed omni-ATAC-seq (Corces et al., 2017) at 5 dpf comparing WT and double mutant brains (Figure 3—figure supplement 3A, B).

We first observed a broad dysregulation of chromatin accessibility, with 2167 regions losing accessibility and 1220 regions gaining accessibility in linc-mipep;linc-wrb mutant brains (Figure 3D; Supplementary file 4), with most regions remaining unchanged (Figure 3—figure supplement 3C). CTCF/L transcription factor (TF) motifs were enriched in regions that lost accessibility, suggesting a possible dysregulation of 3D chromatin structure (Figure 3E). Other TF motifs enriched in regions that lost accessibility belonged to members of the ATF (activating transcription factor)/CREB (cAMP responsive element binding protein) family and to AP-1 transcription factor components (Figure 3E, left panel). TFs binding at these motifs regulate the expression of immediate early response genes (IEGs) such as c-fos, c-jun, and c-myc (Sheng and Greenberg, 1990). We confirmed reduced c-fos transcription in linc-mipep;linc-wrb brains at this timepoint by in situ hybridization and by qPCR (Figure 3—figure supplement 3D). We also found that the motifs for the glucocorticoid modulatory element binding protein 2 (GMEB2), and for interferon-stimulated transcription factor 3, gamma (ISGF3G, also called IRF-9), were enriched in regions that lost accessibility in linc-mipep; linc-wrb mutants. On the other hand, the TF motifs most enriched in regions that gained accessibility were those of KLF/SP family members, which promote stem cell pluripotency and are downregulated during differentiation, and of EGR family members (Figure 3E, right panel; Yamane et al., 2018). Altogether, these results indicate that linc-mipep; linc-wrb mutants have altered accessibility at binding sites for TFs that modify the expression of genes involved in neural activation.
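The text does not name the statistic behind the motif enrichment. One common choice for comparing changed regions against a background is a one-sided hypergeometric test, sketched here as an assumption rather than as the authors' method. The function name and the small counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: all regions considered (changed plus unaffected background)
    K: regions among the N that contain the motif
    n: regions in the changed set (for example, those that lost accessibility)
    k: changed regions that contain the motif
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy counts chosen for readability, not taken from the ATAC-seq data:
# 20 regions in total, 5 carry the motif, 5 regions changed, 3 of those carry it.
p_value = hypergeom_enrichment_p(k=3, n=5, K=5, N=20)
print(round(p_value, 4))
```

A small p-value means the motif appears in the changed set more often than expected if changed regions were drawn at random from all regions. Dedicated motif tools additionally correct for sequence composition and multiple testing, which this sketch omits.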

Evolutionarily newer vertebrate brain cell types are more susceptible to loss of linc-mipep and linc-wrb

Our molecular analyses of wild-type and mutant brains point to gene regulatory networks involved in global transcription rather than neural cell type-specific TFs. We hypothesized that the observed hyperactivity may instead be a result of defects in cells most susceptible to loss of linc-mipep and linc-wrb. To test this hypothesis, we used single-cell multiomics (transcriptomic and chromatin accessibility) and determined how single cell states are affected in mutant brains compared to sibling-matched WT brains at 6 dpf (Figure 4A). To circumvent batch effects from unmatched (non-sibling) samples that may skew single-cell analyses, and because our results so far indicated generally overlapping functions for linc-mipep and linc-wrb, we chose to analyze linc-mipep mutant brain cells and then to validate findings in vivo in linc-mipep; linc-wrb double mutants.

Figure 4 with 9 supplements
Evolutionarily newer vertebrate cell types are more susceptible to loss of linc-mipep and linc-wrb proteins.

(A) UMAP representation of WNN analyses of wild type (n=6942 nuclei) and linc-mipep del-1.8kb/del-1.8kb (n=7740 nuclei) mutant brains at 6 dpf. Identified cell types as labeled. (B) PHATE plot of integrated diffusion analysis of 6 dpf linc-mipep del-1.8kb/del-1.8kb mutant or WT sibling brain nuclei, color-coded by mutant likelihood score as computed by MELD using the Integrated Diffusion operator. (C) Integrated diffusion analysis on identified cell types from 6 dpf wild type (orange) and linc-mipep del-1.8kb/del-1.8kb (blue) brains. Each dot represents a single cell, with mutant likelihood score along the X-axis. The most wild type-like or mutant-like groups are noted with an asterisk. Cell types are clustered by known marker genes as defined in Supplementary file 6. (D) Schematic of analysis to identify the most differentially accessible peaks between WT and linc-mipep del-1.8kb/del-1.8kb mutant brain nuclei from merged Weighted Nearest Neighbors (WNN) clusters. The most statistically significant changes in chromatin accessibility peaks were identified by the Wilcoxon rank sum and the one-tailed Kolmogorov-Smirnov (KS) tests on intensity distributions of each peak in WT and mutant samples, for either wild type or mutant differentially expressed genes per cluster, and for transcription factor (TF) motif overrepresentation by genotype in each cluster. (E) Statistically significantly different chromatin accessibility peaks between 6 dpf wild type (WT, blue) and linc-mipep del-1.8kb/del-1.8kb mutant (red) nuclei in the cerebellar granule cells cluster. Each column is one nucleus. Color scale, peak intensity (blue, more accessible). (F) Statistically significantly different chromatin accessibility peaks between 6 dpf wild type (WT, blue) and linc-mipep del-1.8kb/del-1.8kb mutant (red) nuclei in the oligodendrocyte progenitor cells (OPCs) cluster. Color scale, peak intensity (blue, more accessible).
(G) Left, lateral view confocal images (Z-stack) from Tg(olig2:GFP) brains in wild type (left) or linc-mipep; linc-wrb double mutant (right) backgrounds at 6 dpf, stained with GFP (olig2+, green) and acetylated alpha-tubulin (magenta). A, anterior; P, posterior; D, dorsal; V, ventral. Right, quantification of intensity ratio of GFP+/DAPI signal of whole brain normalized to WT. One-tailed t-test, P=0.0053. (H) Select differentially regulated genes, down- or up-regulated per each cerebellar granule cells, OPCs, or Purkinje cells cluster. Full list of genes is presented in Supplementary file 5.

First, we used Weighted Nearest Neighbors (WNN) (Hao et al., 2021) on transcriptomic and chromatin accessibility data from both linc-mipep mutant and WT nuclei all pooled together. This analysis identified 43 clusters (Figure 4A, Figure 4—figure supplement 1A–C; Supplementary file 2). linc-mipep transcripts were detected in all WT clusters except microglia, with a slight enrichment in Purkinje cells, the inhibitory projection neurons of the cerebellum (Figure 4—figure supplement 1D, E), raising the possibility that these cells may be more affected in linc-mipep mutant brains. linc-wrb was detected in all WT clusters except cranial ganglia and ventral habenula cells, and was broadly expressed at lower levels than linc-mipep transcripts (Figure 4—figure supplement 1D, E). Each cluster was comprised of both WT and linc-mipep mutant cells, indicating that there was no complete absence of any cell type in mutants. In linc-mipep mutant cells, we note that an almost complete loss of expression of linc-mipep was observed in all clusters, without major changes in linc-wrb levels (Figure 3—figure supplement 2A).

Next, to identify the brain cells most significantly affected by linc-mipep mutations, we used Multiscale PHATE/Integrated Diffusion (Figure 4B and C, Figure 4—figure supplement 3A, B; Kuchroo et al., 2022; Kuchroo et al., 2021). This approach measures the effect of linc-mipep loss on cellular states by calculating the relative likelihood that any sampled cell state would be observed in either WT or mutant cells. When we analyzed the ‘Mutant Likelihood Score’ (from Figure 4B) for each cell by its respective cluster, we found that differentiating neuronal progenitor, glial progenitor, and cerebellar granule (excitatory) cell states were more likely to be represented in WT brains, while oligodendrocyte progenitor cell (OPC) states were more likely to be represented in linc-mipep-/- mutant brains (Figure 4C, asterisks; Figure 4—figure supplement 3A, B; Supplementary file 6). Indeed, we find a subcluster of oligodendrocyte progenitor cell states much more likely to be found in linc-mipep-/- samples (Figure 4C, dashed box). These data indicate that linc-mipep preferentially regulates oligodendrocyte and cerebellar cell states during development. Interestingly, these cell states correspond to evolutionarily newer vertebrate brain cell types (Lamanna et al., 2022).
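The idea of a per-cell "mutant likelihood" can be illustrated with a deliberately simplified sketch. This is not MELD or the Integrated Diffusion operator used in the study: it replaces graph-based smoothing with a plain k-nearest-neighbour count, and the function name, coordinates, and labels below are hypothetical toy values chosen only to show how a score between 0 (WT-like) and 1 (mutant-like) might be obtained after normalizing for unequal sample sizes.

```python
import math

def mutant_likelihood(points, labels, k=3):
    """Toy relative likelihood of the mutant condition for each cell.

    Assumption: a k-nearest-neighbour count stands in for the
    graph-based density estimate of the real method.
    """
    n_mut = sum(1 for lab in labels if lab == "mut")
    n_wt = len(labels) - n_mut
    scores = []
    for p in points:
        # Rank all cells by Euclidean distance to p (the cell itself included).
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        nearest = order[:k]
        # Per-condition density: neighbour count divided by condition size,
        # so unequal numbers of WT and mutant cells do not bias the score.
        d_mut = sum(labels[j] == "mut" for j in nearest) / n_mut
        d_wt = sum(labels[j] == "wt" for j in nearest) / n_wt
        scores.append(d_mut / (d_mut + d_wt))
    return scores

# Hypothetical 2D embedding: three WT-only cells and three mutant-only cells.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = ["wt", "wt", "wt", "mut", "mut", "mut"]
scores = mutant_likelihood(points, labels, k=3)
print(scores)
```

In this toy case the two cell states are fully separated by genotype, so the scores sit at the extremes; cell states shared equally by both genotypes would score near 0.5.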

In wild-type brains, Linc-mipep and Linc-wrb proteins are expressed throughout the brain. Linc-wrb antibody staining reveals an even expression pattern across the brain at 5 dpf, including in the cerebellar region and in olig2:GFP+ OPCs (Figure 4—figure supplement 4C, H–K). We found that Linc-mipep staining is weaker than Linc-wrb staining in the torus longitudinalis and tegmentum (as in Figure 4—figure supplement 4D–G). These data suggest that both Linc-mipep and Linc-wrb are expressed in olig2+ cells and throughout the cerebellum.

To determine why the mutation affected these particular cell types, we next asked whether loss of linc-mipep caused any significant changes in chromatin accessibility and gene expression in single cell types (Figure 4D; Supplementary file 5). While all cell types are present in both mutant and wild-type brains, we found a strong dysregulation of chromatin accessibility and gene expression within multiple cell types (Figure 4—figure supplement 3C–E; Supplementary file 5; Supplementary file 8). When we examined TF motifs in regions of differential chromatin accessibility in each cluster, we found that linc-mipep regulates accessibility for key neurodevelopmental transcription factor families, including Sox, Stat, and Zic family members, in radial glial cells (clusters 3 and 7) and other cells; Esrra/b in midbrain glutamatergic neurons (cluster 13); and Egr and NeuroD members across various cell types (Figure 4—figure supplement 5A).
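The per-peak comparison outlined in Figure 4D (a Wilcoxon rank sum test and a one-tailed Kolmogorov-Smirnov test on the intensity distribution of each peak) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library, a normal approximation for the rank sum without tie correction of the variance, and hypothetical intensity values for a single peak.

```python
import math

def ks_one_sided(wt, mut):
    """One-sided two-sample KS statistic D+ = max_x [F_mut(x) - F_wt(x)].

    A large value means mutant intensities are shifted toward lower values.
    """
    values = sorted(set(wt) | set(mut))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(ecdf(mut, x) - ecdf(wt, x) for x in values)

def rank_sum_z(wt, mut):
    """Wilcoxon rank sum z-score for the WT sample (normal approximation,
    mid-ranks for ties, no tie correction of the variance)."""
    pooled = wt + mut

    def midrank(v):
        below = sum(x < v for x in pooled)
        equal = sum(x == v for x in pooled)
        return below + (equal + 1) / 2

    w = sum(midrank(v) for v in wt)
    n1, n2 = len(wt), len(mut)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - expected) / sd

# Hypothetical intensities of one peak across nuclei of one cluster.
wt_peak = [3.0, 4.0, 5.0, 6.0]
mut_peak = [0.0, 1.0, 2.0, 2.5]

d_plus = ks_one_sided(wt_peak, mut_peak)
z = rank_sum_z(wt_peak, mut_peak)
# One-sided p-value for "WT intensities are higher than mutant".
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
print(d_plus, round(z, 3), round(p_one_sided, 4))
```

With real single-nucleus data, each cluster would contain many more nuclei and many tied zero values, and the resulting p-values would need correction for the number of peaks tested.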

We then examined each cell type of interest more closely to better define cell type-specific changes, starting with cerebellar cell types. In cerebellar granule (excitatory) cells, we found 989 regions where chromatin accessibility was strongly dependent on linc-mipep function (645 regions with decreased accessibility, and 344 regions with increased accessibility) (Figure 4E). We specifically assessed linc-mipep mutant granule cells and found that they lost accessibility at motifs known to bind the transcription factors Bhlhe22; Hic1, which is expressed in mature cerebellar granule cells and transcriptionally represses Atoh1 (Briggs et al., 2008); Neurod2, which is required for survival of granule cells (Miyata et al., 1999); and Nfia. They gained accessibility at binding sites for Gfi1b and Nfatc1 (Figure 4—figure supplement 5B). Purkinje (inhibitory) neurons also showed significant differences in chromatin accessibility (Figure 4—figure supplement 3E), losing accessibility at motifs known to bind the transcription factor Gbx2, which is required for cerebellar development (Figure 4—figure supplement 5C; Wassarman et al., 1997). Furthermore, examining single-cell expression data, we found that, compared to wild-type Purkinje cells, linc-mipep mutant Purkinje cells exhibited a significant decrease in the expression of numerous genes, including roraa, rorb, foxp4, and prkcg, which are required for maturation or maintenance of Purkinje cells in zebrafish (Figure 4H; Supplementary file 5; Takeuchi et al., 2017). Consistent with these results showing that cerebellar cell types are affected, Pol II ChIP-seq in 5 dpf brains showed that genes involved in cerebellar development, including zic2a, ascl1b, and atxn3, have reduced RNA Polymerase II binding in mutant brains (Supplementary file 7).

We next asked whether loss of linc-mipep caused any significant changes in chromatin accessibility and gene expression in OPCs. Like cerebellar granule and Purkinje cells, OPCs similarly showed a broad loss of chromatin accessibility in the absence of linc-mipep (Figure 4F). OPCs from linc-mipep mutants showed reduced accessibility at binding sites of E2f7, Elf1 (which is upregulated in differentiating oligodendrocytes), Fev, and Hinfp TFs, and increased accessibility at Sox10 binding sites (Figure 4—figure supplement 5D). These changes in accessibility were associated with shifts in gene expression levels consistent with defects in OPC development or maturation, as we found 136 genes that were down-regulated and 57 genes that were up-regulated in linc-mipep mutant OPCs relative to wild-type OPCs (Supplementary file 6).

To better understand how OPCs may be affected, we further examined omni-ATAC-seq data from wild type and linc-mipep;linc-wrb mutant brains. These analyses revealed differentially accessible regions downstream of olig2 (a transcription factor that activates the expression of myelin-associated genes), within a large intronic span of sgms2b (which synthesizes a component of myelin sheath), and upstream of fabp7a (which is important for OPC differentiation in vitro in mouse) (Foerster et al., 2020; Figure 4—figure supplement 6A–C). To validate in vivo that OPCs are affected, we imaged olig2+ oligodendrocyte progenitor cells and found a significant 13% decrease (p=0.0053) in their signal in mutant brains compared to WT brains, with most of the loss coming from the optic tectum and the cerebellum of Tg(olig2:eGFP); linc-mipep-/-; linc-wrb-/- larvae compared to control larvae (Figure 4G; Figure 4—figure supplement 4A).

Finally, we asked whether some of the genes that were differentially regulated in OPC, cerebellar granule cell, or Purkinje cell clusters could explain the dysregulation of NMDA receptor signaling and sensitization to glucocorticoids that we found in our earlier pharmacological profiling (Figure 3). Indeed, we found that some of the differentially regulated genes in single-cell analyses are known to be involved in NMDA receptor and glucocorticoid receptor signaling. For example, fkbp5, which is associated with glucocorticoid signaling, showed reduced expression in linc-mipep mutant OPCs, and scg5, which can mediate stress responses (Cao-Lei et al., 2014; Mbikay et al., 2001), showed reduced expression broadly (Figure 4H; Figure 4—figure supplement 7). These genes also showed changed chromatin accessibility in the linc-mipep;linc-wrb double mutant brains (Figure 4—figure supplement 6D and E). Similarly, numerous genes involved in NMDA receptor activity (aldocb, ttyh3b, slc1a2b, nrxn1a, grin1b, gpmbaa, atp1a1b) showed reduced expression in linc-mipep mutant OPCs relative to wild-type OPCs, consistent with a reduction in NMDA receptor signaling in mutants (adjusted p-value = 0.0294, GO Molecular Function from FishEnrichR analysis) (Supplementary file 6). For one of these genes, grin1b, we also observed associated changes in chromatin accessibility (Figure 4—figure supplement 6F). At the single-cell level, we find that the expression of grin1a and grin1b, which encode NMDAR subunits, are significantly downregulated in linc-mipep mutant OPCs relative to wild-type OPCs (Figure 4H; Figure 4—figure supplement 7 and Figure 4—figure supplement 8A, B). Some genes that are significantly misregulated specifically in linc-mipep mutant OPCs, such as erbb4, mag, qkia, and myt1b, are also specifically enriched in wild type granule cells, despite different developmental lineage origins and cellular progressions (Figure 4—figure supplement 9A–D). 
Together, these results suggest that loss of linc-mipep and linc-wrb preferentially affects the development of oligodendrocyte progenitor cells and cerebellar cells (evolutionarily newer vertebrate cell types), and that these effects may mediate changes in NMDAR and glucocorticoid signaling through changes in chromatin accessibility and gene expression.

Discussion

Here, we present the first zebrafish brain single-cell multiome analysis to understand the cell type-specific effects of loss-of-function of the proteins encoded by linc-mipep and linc-wrb. We found that mutations in these genes preferentially affect cerebellar cell types and OPCs and alter behavior in a dose-dependent manner.

LincRNAs represent a prevalent and functionally diverse class of non-coding transcripts that likely emerged from previously untranscribed DNA sequences, either by duplication from other ncRNAs or from changes of coding regions (Ulitsky et al., 2011). Here, we establish that linc-mipep (or lnc-rps25) and linc-wrb, previously identified as long non-coding RNAs, encode micropeptides with homology to the vertebrate-specific non-histone chromosomal protein HMGN1. While it is possible that linc-mipep, linc-wrb, and HMGN1 arose from an originally non-coding transcript, possibly in invertebrates, we identify a basal-most vertebrate sequence in lamprey for an ancestral HMGN protein lacking the key C-terminal regulatory domain of human HMGN1. We propose that this ancestral protein may be derived from an unannotated ORF in the invertebrate, basal chordate Amphioxus (lancelet) encoding for an APEX1-like gene in the HMGN1 syntenic region. The emergence of linc-mipep, linc-wrb, and HMGN1 in jawed vertebrates, and their effects in cerebellar and oligodendrocyte cells, is intriguing. Neural crest cells, myelinating cells (both oligodendrocytes in the CNS and neural crest-derived Schwann cells in the peripheral nervous system), and cerebellar cells (including granule and Purkinje cells) are considered to be among these jawed vertebrate-specific innovations (Gans and Northcutt, 1983; Lamanna et al., 2022; Sugahara et al., 2021). We hypothesize that linc-mipep, linc-wrb, and HMGN1 co-evolved with the gene regulatory networks that establish these cell types in development, in line with findings from previous reports (Deng et al., 2017; González-Romero et al., 2015; Hock et al., 2007; Ihewulezi and Saint-Jeannet, 2021; Zalc, 2016; Zalc et al., 2008), as we find that these evolutionarily newer brain cell types are most affected by loss of linc-mipep and linc-wrb in zebrafish. 
It will be important for future studies to investigate the effects of the acquisition and evolution of HMGN genes and their preferential roles in the development of these vertebrate cell types (Deng et al., 2017; González-Romero et al., 2015; Hock et al., 2007; Ihewulezi and Saint-Jeannet, 2021; Zalc, 2016; Zalc et al., 2008).

We find that mutations in linc-mipep and linc-wrb most affect cerebellar granule and Purkinje cells, OPCs, and behavior. Both OPCs and cerebellar cells are typically associated with post-natal growth in humans. The cerebellum is a folded hindbrain structure important for coordinating body movements and higher-order cognitive functions. Our results suggest a plausible explanation for recent findings in Trisomy 21 (Down syndrome) pathology, in which HMGN1 is overexpressed: developing and adult Down syndrome brains have dysregulated expression of genes associated with oligodendrocyte development and myelination in addition to alterations in the cerebellar cortex (Baxter et al., 2000; Mowery et al., 2018; Olmos-Serrano et al., 2016), highlighting the important roles that oligodendrocytes play in normal neurodevelopment and neurodevelopmental disorders (Jin et al., 2020). Our behavioral mutant analyses highlight the dose-dependent roles of linc-mipep and linc-wrb; evolutionarily conserved functions between linc-mipep, linc-wrb, and human HMGN1 in neurodevelopment (Abuhatzira et al., 2011; Deng et al., 2017; Deng et al., 2013); and the importance of understanding the ancestral and conserved roles of key neurodevelopmental genes in non-mammalian and more basal vertebrate systems. Altogether, these studies emphasize the importance of non-neuronal and non-cerebral cortex cell types in neurodevelopmental disorders (Sathyanesan et al., 2019), in which the vertebrate-specific Hmgn1 and related proteins may play a unifying role by regulating chromatin accessibility for key transcription factors.

Our results indicate that loss of linc-mipep and linc-wrb alters chromatin accessibility, which in turn affects the regulatory activity of multiple TFs and gene expression networks. In particular, chromatin accessibility in mutants is altered at grin1b, among other regions, and we find differential regulation of other genes related to NMDA receptor signaling in linc-mipep mutant OPCs. These data provide a potential mechanism for how these genes are significantly differentially expressed between wild type and linc-mipep mutant OPCs. However, future studies will be needed to understand how these non-histone chromosomal proteins regulate not only this pathway but other epigenetic aspects of neural development and cell function. One possibility is that NMDA signaling is preferentially dysregulated in these cells. Alternatively, NMDA signaling may be broadly dysregulated, while affecting these cells the most. Evidence from early mouse development indicates that NMDA receptors are more abundant in oligodendrocyte progenitor cells than in mature oligodendrocytes (De Biase et al., 2010; Zhang et al., 2014). One study proposes that a main role specifically for NR1 (encoded by Grin1 in mouse) is to maintain oligodendrocyte glucose transport, which is crucial for the function and health of myelinated axons (Saab et al., 2016). Future investigations will have to reveal exactly how loss of these zebrafish HMGN1 homologs affects the development and maintenance of oligodendrocytes and cerebellar cells and how the intricate cross-talk between these cells is affected in linc-mipep and linc-wrb mutants. It will also be important to define how the proteins encoded by linc-mipep and linc-wrb specifically regulate NMDAR signaling and whether this mechanism is conserved in other vertebrate species.
Some studies of HMGN1 in mammalian cells have elucidated some of its key molecular mechanisms of gene regulation (Deng et al., 2017; Abuhatzira et al., 2011; He et al., 2018; Prymakowska-Bosak et al., 2002; Catez et al., 2002; Lim et al., 2005). Future work will be needed to fully uncover the molecular mechanisms and binding/interaction partners for each protein in zebrafish and across other vertebrate species, to understand to what extent these mechanisms are conserved. We also do not know whether these paralogous genes work cooperatively or redundantly. For example, future work should investigate whether these related genes have distinct and/or partially overlapping targets and binding partners.

Finally, screening for behavioral phenotypes using F0 mutants is emerging as an important way to reduce the time and number of vertebrate animals needed to enrich for gene candidates for further study (Kroll et al., 2021). Further advances have also increased the resolution of behavioral parameters or patterns affected, allowing for more detailed phenotyping and downstream analyses (Kroll et al., 2021; Ghosh and Rihel, 2020). This phenotyping approach can further enable screens for other micropeptides that are identified through ribosome profiling or mass spectrometry, lincRNAs, and rare or unannotated candidate disorder risk genes. We note that limitations in targeting some of these genes include low GC content, short exon lengths, and the risk that larger induced deletions cause a phenotype by removing a necessary noncoding element. However, there are now non-canonical Cas9s and Cas13s and nearly-PAMless endonucleases that can be tested (Treichel and Bazzini, 2022; Vicencio et al., 2022). Efforts in the field are underway to understand how F0 phenotypes are similar to or different from phenotypes observed in stable mutants. Nonetheless, mutations such as those presented in Figure 1D will be important to decipher the role(s) of micropeptides or lincRNAs, including some genes that may have multiple coding and non-coding functions.

Overall, this study highlights the power of using a high-throughput, genetically tractable vertebrate model to systematically screen for micropeptide function within putative lincRNAs, behavioral phenotypes, signaling pathways, and cell type susceptibilities in early vertebrate development. How novel protein-coding genes may be born from non-coding genomic elements remains an elusive question (Weisman, 2022). Several short open reading frames encoding functional, evolutionarily conserved peptides have now been discovered within putative non-coding RNAs (Makarewich and Olson, 2017), and some of these genes may have emerged along vertebrate lineages (for example, libra/NREP; Bitetti et al., 2018). Our analyses support the idea that many more unannotated or undescribed proteins may similarly play critical roles in vertebrate neurodevelopment and behavior (Barlow et al., 2020). We propose that revisiting sORFs identified within putative long non-coding RNAs in basal vertebrates may provide insight into gene innovation and evolution. This framework will enable genetic studies in a basal system to understand the evolutionary origins of human developmental disorders and diseases in a vertebrate cell type-specific manner.

Materials and methods

Zebrafish husbandry and care

Fish lines were maintained in accordance with the AAALAC research guidelines, under a protocol approved by the Yale University Institutional Animal Care and Use Committee (IACUC Protocol Number 2021–11109). We have complied with all relevant ethical regulations under this protocol. Zebrafish husbandry and manipulation were performed as described, and all experiments were carried out at 28 °C. For all larval experiments, zebrafish embryos were raised at 28.5 °C in petri dishes at densities of 70 embryos/dish on a 14 hr:10 hr light:dark cycle in a DigiTherm 38 liter Heating/Cooling Incubator with circadian lighting (Tritech Research). Dishes of embryos were cleaned once per day with blue water (fish system water with 1 mg/L methylene blue, pH 7.0) until they were placed in behavior boxes (ZebraBox, Viewpoint), to ensure identical growing conditions. Normal development was assessed, and larvae exhibiting abnormal developmental features (no inflated swim bladder, curved) were not used.

Ribo-seq profiles

Sequences for ribosome profiling were previously published (Bazzini et al., 2014 and Johnstone et al., 2016). Code for updated ribosome profiling plots available here. Updated mapping, including for new genome releases, is available here.

CRISPR F0 experiments

Synthetic guides were designed using CRISPRscan and ordered as sgRNAs through Synthego (Synthego Corporation, Redwood City, CA, USA). Target and scrambled (control) sequences are presented in Supplementary file 1. EnGen Spy Cas9 NLS protein (NEB, M0646) was used for F0 experiments. RNPs were formed by mixing 3 μM Cas9 protein, 300 mM KCl, and 10 mM of each synthetic sgRNA targeting one gene, incubating at 37 °C for 10 min, and cooling to room temperature for 5 min. One-cell stage zebrafish embryos were injected into the yolk with 100 pl of each respective mix early after fertilization. Pools of 8 embryos at 24 hpf were collected and incubated in 50 μl of 100 mM NaOH at 95 °C for 20 min. Then, 25 μl of 1 M Tris-HCl (pH 7.5) was added to neutralize the mix. Two μl of these crude DNA extracts were used for genotyping with the corresponding forward and reverse primers (10 µM; Supplementary file 1) using a standard PCR protocol, and these products were then sent for Sanger sequencing to assess cutting efficiencies. Mutation efficiency was assessed using Inference of CRISPR Edits (Synthego Performance Analysis, ICE Analysis. 2019. v3.0. Synthego). We note that the linc-mettl3 target sites lie between highly repetitive regions, making it difficult to amplify the necessary length for ICE analysis. In this case, we provide PCR and Sanger sequencing results, which indicate efficient targeting and a large genomic deletion.

CRISPR mutant generation

CRISPR mutant generation was done following Moreno-Mateos et al., 2015. Briefly, CRISPRScan (crisprscan.org) was queried to identify appropriate target sequences (Moreno-Mateos et al., 2015). Primers were ordered and amplified with universal primer 5’- AAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC-3’. sgRNAs were in vitro transcribed using the AmpliScribe T7 Flash kit, using the PCR product (with T7 promoter) as template. In vitro transcribed sgRNAs were treated with DNase I and precipitated with sodium acetate and ethanol. Cas9 mRNA was in vitro transcribed from DNA linearized by XbaI (pT3TS-nCas9n) using the mMESSAGE mMACHINE T3 kit (Ambion). In vitro transcribed Cas9 RNA was treated with DNase I and purified using the RNeasy Mini kit (Qiagen).

One-cell stage zebrafish embryos were injected with 50 pg of each respective sgRNA and 100 pg of cas9 mRNA. sgRNA and genotyping primers and target sequences are available in Supplementary file 1.

Overexpression constructs

gBlocks (IDT) were ordered for the linc-mipep or human Hmgn1 coding sequence, plus a FLAG and HA tag at the C terminus, as follows:

  • Linc-mipep CDS: 5’-gccaccATGCCTAAAAGGAGCAAAGCGAACAATGACGCTGAAGTCTCTGAGCCTAAAAGAAGGTCAGAGAGGTTGGTAAACAAACCTGCACCCCCAAAGGCAGAGCCCAAGCCAAAGAAGGCCCCTGCCAAACCTAAGAAAACAAAGGAACCCAAGGAGCCCAAGGAGGAGGAGAAGAAAGAGGAGGTGCCCGCAGAAAACGGAGAAACAAAAGCTGACGATGATGCATCGGCAACAGAAGACGGCGACAAGAAAGAAGACGGGGAAGGTTCTGGCTCAgactacaaagacgatgacgacaagtacccatacgatgttccagattacgctTAA-3’

  • Human Hmgn1 CDS: 5’-gccaccATGCCCAAGAGGAAGGTCAGCTCCGCCGAAGGCGCCGCCAAGGAAGAGCCCAAGAGGAGATCGGCGCGGTTGTCAGCTAAACCTCCTGCAAAAGTGGAAGCGAAGCCGAAAAAGGCAGCAGCGAAGGATAAATCTTCAGACAAAAAAGTGCAAACAAAAGGGAAAAGGGGAGCAAAGGGAAAACAGGCCGAAGTGGCTAACCAAGAAACTAAAGAAGACTTACCTGCGGAAAACGGGGAAACGAAGACTGAGGAGAGTCCAGCCTCTGATGAAGCAGGAGAGAAAGAAGCCAAGTCTGATGGTTCTGGCTCAgactacaaagacgatgacgacaagtacccatacgatgttccagattacgctTAA-3’
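As a sanity check on the construct design, the shared 3’ end of both coding sequences can be translated to confirm that the FLAG and HA tags are in frame and followed by a stop codon. The sketch below is illustrative only and not part of the original methods: the `translate` helper and the `TAIL` constant are names introduced here, and `TAIL` is the final 66 nucleotides copied from the two sequences listed above. It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Last 66 nt shared by both constructs: linker, FLAG tag, HA tag, stop codon.
TAIL = "GGTTCTGGCTCAgactacaaagacgatgacgacaagtacccatacgatgttccagattacgctTAA"

protein_tail = translate(TAIL)
print(protein_tail)  # GSGSDYKDDDDKYPYDVPDYA*
```

The output reads as a GSGS linker, the FLAG epitope (DYKDDDDK), the HA epitope (YPYDVPDYA), and a stop codon, consistent with the description of a C-terminal FLAG and HA tag.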

Addgene plasmid #79885 (pMT-ubb-cytoBirA-2a-mCherry, a gift from Tatjana Sauka-Spengler; Trinh et al., 2017) was digested with BamHI and EcoRV, and the resulting vector was used as the backbone for the construct. InFusion cloning (Takara Bio) was used to amplify the Linc-mipep-FLAG-HA coding sequence and ligate it with the vector, using primers F: 5’-TTGTTTACAGGGATCgccaccATGCCTAAAAGGAGC-3’ and R: 5’-CTCTCCTGATCCGATagcgtaatctggaacatcgtatggg-3’. InFusion cloning (Takara Bio) was used to amplify the human Hmgn1-FLAG-HA coding sequence and ligate it with the vector, using primers F: 5’-TTGTTTACAGGGATCCGCCACCATGCCCAAGAGG-3’ and R: 5’-CTCTCCTGATCCGATATCATCAGACTTGGCTTCTTTCTCTCC-3’. Sequence-verified plasmids were midi-prepped and injected into the cell of one-cell stage embryos at 20 ng/μl along with 200 ng/μl of Tol2 transposase capped mRNA.

Fish lines used in this study

The following stable fish mutant lines have been established in this study: linc-mipepdel1.8kb (ya126); linc-mipepATG-del6 (ya127); linc-mipepdel8 (ya128); linc-mipep3’UTR-del74 (ya129); linc-wrbdel11 (ya130). The following stable transgenic lines have been established in this study: Tg(ubb:linc-mipep-FLAG-HA-T2A-mCherry) (ya145); and Tg(ubb:human-Hmgn1-FLAG-HA-T2A-mCherry) (ya151). The following previously published transgenic line has been used in this study: Tg(olig2:egfp)vu12.

Quantitative locomotor activity tracking and statistics for sleep/wake analyses

At 4 dpf, single larvae from heterozygous linc-mipep mutant incrosses were placed into individual wells of a clear 96-square well flat plate (Whatman) filled with 650 μL of blue water (fish system water with 1 mg/L methylene blue, pH 7.0). Plates were placed in a Zebrabox (ViewPoint Life Sciences), and each well was tracked using ZebraLab (Viewpoint) in quantized mode and analyzed with custom software as in Rihel et al., 2010, available at Rihel, 2023 (DOI: 10.5281/zenodo.7644073). Behavioral data were analyzed for statistical significance using one-way ANOVA followed by Tukey’s post hoc test (α=0.05), as previously described (Rihel et al., 2010). Each behavioral experiment presented was repeated 2–4 times. For analyses of maternal-zygotic linc-mipep;linc-wrb mutants, age- and size-matched wild type adult stocks (AB/TL) or linc-mipep;linc-wrb double-homozygous mutants were incrossed, collected simultaneously, and raised in identical conditions prior to quantitative locomotor activity tracking as described above.

Behavioural fingerprints and Euclidean distances

As previously described (Kroll et al., 2021), the raw file generated by the ZebraLab software (ViewPoint Life Sciences) was exported into a series of xls files each containing 1 million rows of data. Each datapoint represented the number of pixels that changed grey value above a sensitivity threshold, set to 18, for one larva at one frame transition, a metric termed Δ pixels. These files, together with a metadata file labelling each well with a genotype, were input to the MATLAB script Vp_Extract.m (Ghosh and Rihel, 2020), which calculated the following behavioral parameters from the Δ pixels timeseries for both day and night: (1) active bout length (duration of each active bout in seconds); (2) active bout mean (mean of the Δ pixels composing each active bout); (3) active bout standard deviation (standard deviation of the Δ pixels composing each active bout); (4) active bout total (sum of the Δ pixels composing each active bout); (5) active bout minimum (smallest Δ pixels of each bout); (6) active bout maximum (largest Δ pixels of each bout); (7) number of active bouts during the entire day or night; (8) total time active (% of the day or night); (9) inactive bout length (duration of each pause between active bouts in seconds). These measurements were then averaged across both days or both nights to obtain one measure per parameter per larva for the day and night. To build the behavioral fingerprints, we calculated the deviation (Z-score) of each mutant (F0) larva from the mean of their wild-type siblings across all parameters. Plotted in Extended Data Figure 1b and c for each parameter is the mean ± SEM of the Z-scores. We compared fingerprints between replicates (Extended Data Figure 1b) or between linc-mipep and linc-wrb (Extended Data Figure 1c) using Pearson correlation. The behavioral fingerprint of each larva can be conceptualized as a single datapoint in a multidimensional space where each dimension represents one behavioral parameter.
To summarize the intensity of each phenotype across parameters, we measured the Euclidean distance between each larva and the mean fingerprint of its wild type siblings, set at the origin of this space by the Z-score normalization (Figure 1—figure supplement 3B). Generally, F0 mutants with more parameters affected, or with more extreme differences in the parameters affected, displayed larger Euclidean distances; those with few or with mildly-affected parameters displayed smaller Euclidean distances. Code for this analysis is available on GitHub (Kroll, 2022). Prism 9 (GraphPad) was used for statistics and plotting for Figure 1—figure supplement 4.
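The fingerprint and distance calculations described above can be sketched in a few lines. This is a minimal illustration, not the published analysis code (Kroll, 2022): the function names and the toy values are hypothetical, only two parameters are used instead of the full set, and the use of the sample standard deviation of wild-type siblings for the Z-score is an assumption.

```python
import math
import statistics

def fingerprint(larva, wt_siblings):
    """Z-score each behavioral parameter of one larva against its WT siblings."""
    z_scores = []
    for i, value in enumerate(larva):
        wt_values = [sibling[i] for sibling in wt_siblings]
        mu = statistics.mean(wt_values)
        sd = statistics.stdev(wt_values)  # assumption: sample standard deviation
        z_scores.append((value - mu) / sd)
    return z_scores

def euclidean_distance(z_scores):
    """Distance from the origin, i.e. from the mean WT fingerprint."""
    return math.sqrt(sum(z * z for z in z_scores))

# Hypothetical data: two parameters per larva (e.g. bout length, bout count).
wt_siblings = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
f0_mutant = [5.0, 60.0]

z = fingerprint(f0_mutant, wt_siblings)
distance = euclidean_distance(z)
print(z, distance)
```

Because the Z-score normalization places the wild-type mean at the origin, a larva that matches its siblings on every parameter has a distance of zero, and the distance grows with both the number of affected parameters and the size of each deviation.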

Hierarchical clustering

Correlation analysis was done in MATLAB (R2018a; The MathWorks) as previously described (Rihel et al., 2010). Behavioral phenotypes of wild-type fish exposed to a panel of 550 psychoactive agents from 4 to 7 dpf were ascertained as previously described (Rihel et al., 2010). To compare the behavioral fingerprints of WT larvae exposed to each drug and the linc-mipep mutant behavioral fingerprint, hierarchical clustering analysis was performed as in Rihel et al., 2010; Hoffman et al., 2016.
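To illustrate how a mutant fingerprint might be compared with drug-induced fingerprints, the sketch below ranks a small library by Pearson correlation to a query fingerprint. It is a simplification rather than the hierarchical clustering performed in MATLAB for this study: ranking by correlation only captures the similarity measure on which such clustering can be based, and the drug names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length fingerprints."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rank_by_similarity(query, library):
    """Return library keys sorted from most to least correlated with the query."""
    return sorted(library, key=lambda name: pearson(query, library[name]), reverse=True)

# Hypothetical Z-score fingerprints (four parameters each).
mutant = [1.0, 2.0, 3.0, 4.0]
drug_library = {
    "drug_A": [2.0, 4.0, 6.0, 8.0],  # same pattern, larger magnitude
    "drug_B": [4.0, 3.0, 2.0, 1.0],  # opposite pattern
    "drug_C": [1.0, 3.0, 2.0, 4.0],  # partially similar pattern
}

ranking = rank_by_similarity(mutant, drug_library)
print(ranking)  # ['drug_A', 'drug_C', 'drug_B']
```

Drugs whose fingerprints correlate positively with the mutant fingerprint mimic the mutant phenotype, while anti-correlated drugs produce the opposite behavioral pattern.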

Sequence alignments and homologies

BLASTp, BLASTn, and the UCSC Genome Browser were used to find sequences (especially the highly conserved 3’UTR sequence) and proteins with sequence homology and/or synteny to human Hmgn1. Clustal Omega (through EMBL-EBI) was used for multiple sequence alignments.

Custom antibodies generation

Three custom antibodies were designed (YenZym Antibodies, LLC) against: Si:ch73-1a9.3 (linc-mipep), C-Ahx-DDASATEDGDKKEDGE-COOH; Si:ch73-281n10.2 (linc-wrb), C-Ahx-EDAKPEAEEKTP-amide; and both Si:ch73-1a9.3 and Si:ch73-281n10.2: KRSKANNDAE-Ahx-amide. The last antibody designed to recognize both proteins was non-specific and not further used. Antibody specificity was confirmed by antibody staining in wild type and linc-mipep; linc-wrb mutants.

Antibody staining and imaging

Embryos up to 24 hpf: Embryos were dechorionated and collected into room-temperature 4% PFA in PBS for 1 hr. Embryos were blocked rotating for 1 hr at room temperature in 10% normal goat serum (NGS) (Thermo Fisher Scientific, 50062Z), primary antibody stained for 1 hr at room temperature in 10% NGS, washed 3x5 min in 1xPBS with 0.25% Triton-X (PBST), incubated with secondary antibody, rotating and protected from light, for 1 hr at room temperature, washed 3x5 min in PBST, and mounted in 0.7–1% low-melt agarose on glass-bottom dishes (MatTek) for imaging. Larvae: Larvae (up to 6 dpf) were maintained in a quiet environment. For assessment of olig2+ cells, the Tg(olig2:egfp)vu12 line (Shin et al., 2003) was crossed to either wild type or double-homozygous linc-mipep;linc-wrb mutants. Subsequently, those olig2:egfp adults were outcrossed to either wild type or linc-mipep;linc-wrb double homozygous mutants. To ensure rapid fixation at 6 dpf, larvae from each of these crosses were poured through a mesh sieve and immediately submerged into ice-cold 4% PFA (Electron Microscopy Sciences) /1 x PBS-0.25% Triton X-100 (PBST)/4% sucrose, in fix, as previously described (Randlett et al., 2015). Larvae were fixed overnight at 4 °C and washed three times for 15 min each in PBST. For whole larvae, pigment was bleached with a 1% H2O2/3% KOH solution (in PBS), washed 3x15 min in PBST, then permeabilized with acetone (pre-cooled to –20 °C) at –20 °C for 20 min, and washed three times for 15 min with PBST. For dissecting brains (critical for assessment of GFP+ cells), following overnight fixation, larvae were washed 3x5 min in PBST, then brains were dissected by hand and transferred back into tubes with PBS. Brains were sequentially dehydrated 5 min each in 25% MeOH/75% PBS, 50% MeOH/50% PBS, 75% MeOH/25% PBS, and 100% MeOH, and stored at –20 °C for at least overnight. 
Brains were similarly rehydrated through the reverse methanol series, then permeabilized with 1 x Proteinase K (10 mg/ml is 1000 x stock) in PBST for exactly 10 min. Brains were then rinsed 3 x with PBST, post-fixed in 4% PFA/PBST for 20 min at room temperature, and washed three times for 5 min in PBST. Samples were mounted in 0.7–1% low-melt agarose on glass-bottom dishes (MatTek) for imaging. Confocal imaging was performed using a Zeiss 980 AiryScan or a Leica SP8 confocal microscope. Images were processed and analyzed using FIJI software and plugins.

Primary antibodies used

custom Linc-mipep (rabbit); custom Linc-wrb (rabbit); anti-GFP (mouse, A11120, Thermo Fisher Scientific, 1:500); acetylated α-tubulin (rabbit, 5335T, Cell Signaling Technology, 1:500). Alexa Fluor 488, 546 or 568 secondary antibodies against rabbit or mouse were used at 1:500 (Invitrogen). DAPI (for nuclear marking) was added at 1:10,000 during secondary antibody staining.

RNA in situ hybridization

Template DNAs for antisense RNA probes were amplified from a pool of 6 hpf, 1 dpf, 2 dpf, and 5 dpf zebrafish cDNA using primers containing the T7-promoter sequence in the reverse primer. All sequences are listed in Supplementary file 1. Digoxigenin (DIG)-labeled RNA probes were synthesized using T7 RNA Polymerase (Roche) and purified using Monarch RNA Cleanup Kit (New England Biolabs). RNA in situ hybridization was performed as described (Giraldez et al., 2005; Thisse and Thisse, 2008). Briefly, embryos at the respective stages were dechorionated (if applicable) and fixed with 4% paraformaldehyde (PFA) overnight at 4 °C. Fixed embryos were washed 3 X with 1 x phosphate-buffered saline (PBS), then dehydrated with a methanol series (25%, 50%, 75%, and 100% methanol). Dehydrated embryos were stored in 100% methanol for at least 24 hr at –20 °C. Embryos were then rehydrated with a reverse methanol series and washed with 1 x PBS. Pre-hybridization and hybridization were performed at 65 °C for 3 hr and overnight, respectively. Embryos were washed extensively and blocked for 3 hr at room temperature, then incubated with anti-DIG antibody overnight at 4 °C. After antibody incubation, embryos were stained with BCIP/NBT, and staining was stopped with 4% PFA overnight at 4 °C. Embryos were then washed briefly, mounted with a glycerol series (50%, 70%, and 86%), and imaged in 86% glycerol with a Zeiss stereo Discovery.V12 microscope.

RNA-seq and qPCR

Data in Figure 1—figure supplement 5F was generated using publicly available RNA-sequencing data (White et al., 2017). For qPCR, larvae (n=10 per sample) were pooled and flash-frozen in liquid nitrogen and stored at –80 °C. Trizol (Invitrogen) was added to samples and homogenized with sterile pestles. Chloroform was then added, and samples were centrifuged at 4 °C for 15 min at 12,000 x g. The aqueous supernatant was placed into a new tube, and isopropanol was added along with 1 μl of GlycoBlue. Samples were left at –20 °C for 2 hours and centrifuged at 4 °C for 15 min at 12,000 x g. The pellet was washed two times with RNase-free 70% ice-cold ethanol, dried, and resuspended in RNase-free water. 1 μg of RNA was used to make cDNA with the SuperScript III First-Strand Synthesis system (Invitrogen). cDNA was diluted 1:3, and 1 μl was used for each qPCR sample using Power Sybr Green Master Mix (2 x) and respective primers, in technical triplicates. Primers for amplification: fosab (c-fos), 5′- GTGCAGCACGGCTTCACCGA-3′ and 5′- TTGAGCTGCGCCGTTGGAGG-3′; ef1a1l1, 5′-TGCTGTGCGTGACATGAGGCAG-3′ and 5′-CCGCAACCTTTGGAACGGTGT-3′ (Reichert et al., 2019). Expression of fosab (c-fos) was normalized to the expression of ef1a1l1 for each respective sample and timepoint, and relative expression levels were calculated using the ΔΔCt method.
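The ΔΔCt calculation named above can be sketched in a few lines of Python. This is a minimal illustration, assuming roughly 100% amplification efficiency (a doubling per cycle) and averaging of the technical triplicates before subtraction; the function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_sample, ref_sample, target_control, ref_control):
    """Relative expression by the delta-delta-Ct method.

    Each argument is a list of Ct values from technical replicates.
    target = gene of interest (here fosab), ref = reference gene (here ef1a1l1).
    Assumption: amplification efficiency is close to 100%, so each cycle
    corresponds to a two-fold difference.
    """
    delta_sample = mean(target_sample) - mean(ref_sample)
    delta_control = mean(target_control) - mean(ref_control)
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Invented Ct values: the target crosses threshold two cycles earlier in the sample.
fold = relative_expression(
    target_sample=[24.0, 24.0, 24.0],
    ref_sample=[18.0, 18.0, 18.0],
    target_control=[26.0, 26.0, 26.0],
    ref_control=[18.0, 18.0, 18.0],
)
```

With these invented values the sample shows a four-fold increase relative to the control, since a two-cycle shift corresponds to 2 squared.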

Western blot

Embryos from wild type or Tg(ubb:linc-mipep-FLAG-HA-T2A-mCherry) incrosses were dechorionated at 6 hpf, and 150 embryos were collected per sample per replicate. Water was removed, and embryos were deyolked in 500 μl Deyolking Buffer (55 mM NaCl, 1.8 mM KCl, 1.25 mM NaHCO3) by pipetting through a narrow tip to disrupt the yolk sac. Embryos were shaken at 1100 rpm for 5 min. Cells were then pelleted at 300 x g for 30 s, and the supernatant was discarded. Two wash steps were performed using wash buffer (110 mM NaCl, 3.5 mM KCl, 2.7 mM CaCl2, 10 mM Tris/Cl pH 8.5), shaking for two minutes at 1100 rpm and pelleting cells. The supernatant was then removed and samples were flash-frozen in liquid nitrogen. Cell pellets were then resuspended in 100 μl sample buffer (1 x NuPAGE LDS Sample Buffer supplemented with DTT). After heating for 10 min at 95 °C, protein samples (40 μl, ~60 deyolked embryos) were resolved on a 4–12% Bis-Tris gel with NuPAGE MOPS Running Buffer (Thermo Fisher Scientific) and transferred to a nitrocellulose membrane using the iBlot 2 Gel Transfer Device (Thermo Fisher Scientific). Membranes were blocked in 5% milk / PBS with 0.1% Tween-20 (PBST), incubated with primary antibody solution (each antibody at 1:2000) prepared in block solution, and then incubated with a peroxidase-conjugated secondary antibody solution prepared in block solution. Proteins were detected with SuperSignal West Pico PLUS Chemiluminescent Substrate (for Actin antibody) or SuperSignal West Femto Maximum Sensitivity Substrate (for FLAG antibody; Thermo Fisher Scientific).

In vivo pharmacological drug experiments

At 4 dpf, single larvae from heterozygous linc-mipep mutant incrosses were placed into individual wells of a clear 96-square well flat plate (Whatman) filled with 650 μL of blue water. Respective pharmacological agents (from a stock of 5 or 50 mM depending on solubility) or corresponding vehicle controls (DMSO or water) were pipetted directly into the water to achieve the desired final concentrations at the start of the experiment (typically evening of 4 dpf). Since both linc-mipep and linc-wrb had similar hyperactivity profiles, we focused on linc-mipep to allow for drug analyses of mutant and wild type (WT) larvae with matched genetic backgrounds. Drug treatments, vehicles, and doses are described in Supplementary file 3.
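As a worked example of the dosing arithmetic implied above, the following Python sketch computes how much stock to pipette into a 650 μL well to reach a target final concentration. The function name and the 50 μM target are hypothetical (actual doses are listed in Supplementary file 3), and the formula accounts for the small volume added by the stock itself.

```python
def stock_volume_ul(final_conc_um, stock_conc_mm, well_volume_ul=650.0):
    """Volume of stock (in microliters) to add to a well to reach final_conc_um.

    Solves stock * v / (well + v) = final for v, so the added volume is
    included in the final volume. Concentrations: final in micromolar,
    stock in millimolar.
    """
    stock_conc_um = stock_conc_mm * 1000.0
    if final_conc_um >= stock_conc_um:
        raise ValueError("final concentration must be below the stock concentration")
    return final_conc_um * well_volume_ul / (stock_conc_um - final_conc_um)


# Hypothetical example: 50 mM stock, 50 uM target in a 650 uL well.
volume = stock_volume_ul(final_conc_um=50.0, stock_conc_mm=50.0)
```

For a 1000-fold dilution such as this one, the result is close to the simple approximation of well volume divided by 1000.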

Genotyping

After each behavioral tracking experiment, larvae were anesthetized with an overdose of MS-222 [0.2–0.3 mg/ml], transferred into 96-well PCR plates, and incubated in 50 μl of 100 mM NaOH at 95 °C for 20 min. Then, 25 μl of Tris-HCl 1 M pH 7.5 was added to neutralize the mix. Two μl of these crude DNA extracts were used for genotyping with the corresponding forward and reverse primers (10 µM; Supplementary file 1) using a standard PCR protocol.

Brain collection for molecular analyses

Briefly, brains at peak daytime activity levels (Zeitgeber Time 4, i.e. 4 hr after lights on) were dissected from 5 dpf MZ-linc-mipep;linc-wrb or wild type zebrafish (for omni-ATAC-seq n=10 per sample, and ChIP-seq n=50 per sample) or 6 dpf zebrafish from one linc-mipep+/- heterozygous incross (for single-cell Multiome, n=12 per sample) in ice-cold Neurobasal media supplemented with B-27 (Thermo Fisher Scientific), snap-frozen in a dry ice/methylbutane bath (to preserve nuclear structure), and stored at –80 °C until use. Trunks of linc-mipep fish from the heterozygous cross were genotyped, then wild type or linc-mipep-/- brains as confirmed by genotyping were pooled together before proceeding with scMultiome.

For ChIP-seq experiments, brains were dissected and homogenized before treatment with 1% PFA (protocol adapted from Cotney and Noonan, 2015), and ChIP was performed as previously described (Miao et al., 2022) using 4 μg of RNA Polymerase II antibody (ab817, Abcam) per sample; 5% input samples were also collected and processed.

Omni-ATAC was performed on frozen brains from 5 dpf zebrafish based on published protocols (Buenrostro et al., 2013; Corces et al., 2017). Frozen brain tissue was homogenized in cold homogenization buffer (320 mM sucrose, 0.1 mM EDTA, 0.1% NP40, 5 mM CaCl2, 3 mM Mg(Ac)2, 10 mM Tris pH 7.8, 1×protease inhibitors (Roche, cOmplete), and 167 μM β-mercaptoethanol) on ice. The lysate was filtered with a tip strainer (Flowmi Cell Strainers, porosity 70 μm) into a new Lo-Bind tube. Nuclei were isolated using the gradient iodixanol solution as described (Corces et al., 2017). Nuclei solution was mixed with 1 ml of dilution buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20) and was then centrifuged at 500 x g for 10 min at 4 °C.

Transposition and library preparation were performed on the purified nuclei as described (Miao et al., 2022). Briefly, the supernatant was removed, and the purified nuclei were resuspended in the transposition reaction mixture (25 μl 2×TD Buffer, 2.5 μl Tn5 transposase, 22.5 μl Nuclease-Free water) and incubated for 30 min at 37 °C. DNA was then purified with the Qiagen MinElute Kit (Qiagen, 28004). Libraries were prepared using NEBNext High-Fidelity 2 X PCR Master Mix (NEB, M0541) with the following conditions: 72 °C, 5 min; 98 °C, 30 s; 15 cycles of 98 °C, 10 s; 63 °C, 30 s; and 72 °C, 1 min. Libraries were purified with Agencourt AMPureXP beads (Beckman Coulter Genomics, A63881) and sequenced with the Illumina NovaSeq 6000 System at the Yale Center for Genome Analysis.

High-throughput sequencing data management

LabxDB seq (Vejnar and Giraldez, 2020) was used to manage our high-throughput sequencing data and configure our analysis pipeline. Export to the Sequence Read Archive was achieved using the “export_sra.py” script from LabxDB Python. All sequencing datasets generated in this work have been deposited through NCBI, BioProject PRJNA945049. Detailed information about these datasets is also provided in Supplementary file 9.

Omni-ATAC data processing, differential and motif enrichment analysis

Raw paired-end Omni-ATAC reads were mapped using LabxPipe (Vejnar, 2023b). Reads were adapter trimmed using ReadKnead (Vejnar, 2023d) and mapped to the zebrafish GRCz11 genome sequence (Yates et al., 2020) using Bowtie2 (Langmead and Salzberg, 2012) with parameters ‘-X 2000 --no-unal --no-mixed --no-discordant’. The alignments were deduplicated using samtools markdup (Li et al., 2009). For genome-wide analysis, only uniquely mapped reads (with alignment quality ≥30) were used. Reads mapped to the + strand were offset by +4 bp and reads mapped to the – strand were offset by −5 bp (Buenrostro et al., 2013). Only fragments with insert size ≤ 100 bp (effective fragments) were used to determine accessible regions. Genome tracks were created using BEDTools (Quinlan and Hall, 2010) and utilities from the UCSC genome browser (Lee et al., 2022). For all the genome tracks in the paper, signal intensity was in RPM (reads per million). Fragment coverage on each nucleotide was normalized to the total number of effective fragments in each sample per million fragments.
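The strand offsets, fragment-size filter, and per-million normalization described above can be sketched as follows. This is a minimal illustration rather than the LabxPipe implementation: the function names are hypothetical, and applying the offset to the 5′ end of each read is an assumed coordinate convention.

```python
def offset_cut_site(five_prime_pos, strand):
    """Shift a read's 5' end to the inferred Tn5 insertion site.

    Reads on the + strand are offset by +4 bp, reads on the - strand by -5 bp.
    Assumption: the offset is applied to the 5' end of the read.
    """
    if strand == "+":
        return five_prime_pos + 4
    if strand == "-":
        return five_prime_pos - 5
    raise ValueError("strand must be '+' or '-'")


def is_effective_fragment(insert_size, max_size=100):
    """Keep only fragments with insert size of at most 100 bp."""
    return insert_size <= max_size


def rpm_normalize(coverage, n_effective_fragments):
    """Scale per-nucleotide coverage to fragments per million effective fragments."""
    scale = 1_000_000 / n_effective_fragments
    return [c * scale for c in coverage]


# Invented example: coverage at two positions in a sample with 2 million effective fragments.
track = rpm_normalize([2, 4], 2_000_000)
```

Normalizing by the number of effective fragments, rather than all mapped reads, keeps tracks comparable between samples with different fragment-size distributions.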

Peak calling

Effective reads from three linc-mipep;linc-wrb double mutant replicates and three wild-type replicates were merged. Then narrow peaks were called on the merged data using MACS3 (Zhang et al., 2008) with the additional parameters ‘-f BEDPE --nomodel --keep-dup all’ with significance cutoff at p=10^−20. In total, 173,443 narrow peaks were called. Among them, 170,599 peaks were located within chromosomes 1–25; these regions were determined as accessible regions for further analysis. Differential analysis was performed using DESeq2 (Love et al., 2014), comparing fragment coverage of each accessible region in the three linc-mipep;linc-wrb double mutant replicates with that in the three wild-type replicates. A total of 3367 regions that were mapped to chromosomes 1–25 showed a significant difference (false discovery rate (FDR)<0.01), with 2167 regions significantly up-regulated and 1200 regions significantly down-regulated. A total of 2928 unaffected regions (FDR >0.95; 0.995<linc-mipep;linc-wrb / WT <1.005) were used as control regions for plotting and motif enrichment analysis. Accessibility heatmaps and density plots were generated using deeptools (Ramírez et al., 2014).
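The way regions are sorted into up-regulated, down-regulated, and control sets can be sketched as a simple classifier. This is an illustration only: it assumes the DESeq2 output has already been reduced to an FDR and a mutant/WT ratio per region, and the function name and example values are hypothetical.

```python
def classify_region(fdr, ratio):
    """Assign an accessible region to a category.

    fdr: false discovery rate from the differential analysis.
    ratio: mutant / wild-type accessibility ratio (assumed precomputed,
           e.g. as 2 ** log2FoldChange).
    """
    if fdr < 0.01:
        return "up" if ratio > 1 else "down"
    # Control regions: clearly unaffected, with a ratio within 0.5% of 1.
    if fdr > 0.95 and 0.995 < ratio < 1.005:
        return "control"
    return "other"


# Invented examples covering each category.
labels = [
    classify_region(0.001, 1.8),
    classify_region(0.005, 0.6),
    classify_region(0.99, 1.001),
    classify_region(0.40, 1.2),
]
```

Regions that are neither significant nor tightly unchanged fall into neither the test nor the control set, which keeps the control set free of weakly affected regions.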

Motif enrichment analysis

This was performed on the up-regulated and the down-regulated regions, with unaffected regions as control, using AME in MEME suite (McLeay and Bailey, 2010) with default parameters (https://meme-suite.org/meme/tools/ame, motif database option: Vertebrates In vivo and in silico, Eukaryotic DNA). Motif heatmaps were generated using the R package gplots (Warnes et al., 2022). Tracks for omni-ATAC-seq of wild type or linc-mipep; linc-wrb mutant brains are publicly available at https://www.giraldezlab.org/data/tornini_et_al_2023_elife/.

ChIP-seq data processing and analysis

Raw ChIP-seq reads were adapter trimmed, mapped, and deduplicated using the same method described in the previous section but using the default parameters for Bowtie2 for read mapping. GeneAbacus (all code available at Vejnar, 2023c) was used to create genomic profiles for creating tracks. Fragment coverage on each nucleotide was normalized to the total fragments in each sample per million fragments. For genome-wide analysis, only uniquely mapped reads (with alignment quality ≥30) were used.

Peak calling for ChIP-seq

Peaks were called using MACS2 (Zhang et al., 2008) for ChIP-seq data. Narrow peaks were called using MACS2 with the additional parameters ‘-f BEDPE --nomodel --keep-dup all’ with the default significance cut-off (q=0.05, high threshold) and p=0.05 (low threshold). Peaks that are called at high threshold in one condition but not called at low threshold in the other condition are defined to be specific to the condition. Genes with promoter regions (+/-1 kb of transcription start site) that overlap with a peak are defined to be associated with that peak. Tracks for PolII ChIP-seq at 5dpf of wild type or linc-mipep; linc-wrb mutant brains are publicly available at https://www.giraldezlab.org/data/tornini_et_al_2023_elife/.
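The two-threshold definition of condition-specific peaks, and the promoter association rule, can be sketched with plain interval logic. This is a minimal illustration and not the authors' code: the function names and coordinates are hypothetical, and peaks are assumed to be (chromosome, start, end) tuples with half-open coordinates.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def condition_specific(high_peaks, other_low_peaks):
    """Peaks called at the high threshold in one condition that overlap
    no peak called at the low threshold in the other condition."""
    return [p for p in high_peaks
            if not any(overlaps(p, q) for q in other_low_peaks)]


def promoter_has_peak(chrom, tss, peaks, flank=1000):
    """True if any peak overlaps the promoter, defined as TSS +/- 1 kb."""
    promoter = (chrom, max(0, tss - flank), tss + flank)
    return any(overlaps(promoter, p) for p in peaks)


# Invented example: one WT peak is also detected (weakly) in the mutant.
wt_high = [("chr1", 100, 200), ("chr1", 500, 600)]
mut_low = [("chr1", 150, 250)]
wt_specific = condition_specific(wt_high, mut_low)
```

Requiring absence even at the permissive threshold guards against calling a peak condition-specific when it merely falls just below the stringent cutoff in the other condition.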

Single nuclei preparation for scMultiome

Flash-frozen pooled brains were prepared based on Protocol CG000366 – Rev D (Protocol 2) from 10 x Genomics. It is critical to keep samples cold and/or on ice for all steps. All samples were processed identically and simultaneously to minimize batch effects. Briefly, chilled 0.1 X Lysis Buffer (500 μl) was immediately added to frozen samples, and samples were homogenized using a glass dounce tissue grinder with glass pestle. Samples were incubated on ice for 5 min, gently pipetted 10 x, then incubated again for 5 min. Chilled Wash Buffer (500 μl) was gently added to samples. After pipetting the mix 5 x, the samples were passed through 70μm-porosity Flowmi tips into new ice-cold low-bind 1.5 ml tubes. Each suspension was subsequently passed through a 40μm-porosity Flowmi tip into a new ice-cold low-bind 1.5 ml tube. Samples were centrifuged at 500 rcf for 5 min at 4 °C. The supernatant was gently removed without disturbing the nuclei pellet. Chilled Wash Buffer (1 ml) was added, and the nuclei were gently resuspended 5 x. This wash and resuspension step was repeated one more time. On the final step, nuclei were resuspended in Diluted Nuclei Buffer. The quality and number of nuclei for each sample were assessed using a hemocytometer (>90% Trypan Blue staining and almost no cell clumps), and nuclei were immediately used for the tagmentation step using the 10 x Genomics platform. Library preparation was performed following the standard 10 x Genomics protocol.

Data analysis of scRNA-seq and scATAC-seq

Single nuclei from brains of wild type or linc-mipep mutant siblings were collected as described above (Brain collection for molecular analyses). The raw 10 x Genomics Multiome data of scRNA-seq and scATAC-seq were processed using the 10 x Genomics cellranger-arc pipeline (v1.0.1) with the genome, GRCz11. The total numbers of sequenced read pairs per sample for RNA and ATAC were between 197,900,000 and 268,400,000. The estimated numbers of cells for WT and mutant were 7,137 and 7,872, respectively. The mean numbers of raw read pairs per cell were (1) 27,742.56 for RNA and 37,593.97 for ATAC in WT and (2) 26,154.78 for RNA and 27,382.86 for ATAC in mutant. The median numbers of genes per cell for WT and mutant were 349 and 365, respectively. ATAC median high-quality fragments per cell for WT and mutant were 10,466 and 8,626, respectively.

For downstream analyses, we used the Weighted Nearest Neighbor (WNN) method in Seurat (Hao et al., 2021). The two experimental conditions of WT and mutant were first analyzed separately. Data filtering was based on visual inspection of data distributions. The number of RNA read counts per cell was filtered between 50 and 3000 for WT and between 50 and 5000 for mutant. The number of ATAC read counts per cell was filtered between 500 and 50,000 for WT and between 500 and 80,000 for mutant. The filtering threshold for mitochondrial fractions was 15% for both WT and mutant data. Other parameters were left to default values in Seurat (v4.0.2). The numbers of filtered cells in WT and mutant were 6942 and 7740, respectively. The numbers of filtered ATAC peaks in WT and mutant were 164,266 and 167,925, respectively. We then followed the standard Seurat pipelines, with default parameters, for RNA analysis (SCTransform and PCA) and ATAC analysis (TFIDF and SVD) to obtain a WNN graph as a weighted combination of RNA and ATAC data for each of WT and mutant data. Dimensionality reduction was done by UMAP, clustering by the shared nearest neighbor and smart local moving algorithms, and differential marker identification by Wilcoxon rank sum tests. For analyses of variation in chromatin accessibility and enriched motifs, we used chromVAR (Schep et al., 2017) and all motifs from the Fornes et al., 2020 database. We also performed a merged analysis of the two conditions in a similar way by merging the two datasets using the merge function in Seurat. We did not make any correction for batch effects because the two conditions did not show any distinct batch effects on UMAP plots of the merged data. Cell states, or types, were identified by cross-referencing with known markers on ZFIN and 5 dpf datasets from Raj et al., 2020.
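The per-condition filtering thresholds stated above can be written out as a small Python sketch. The actual analysis was done in Seurat (R); this only restates the thresholds in code. The dictionary layout, field names, and the choice of inclusive bounds are assumptions.

```python
# Read-count bounds per condition, taken from the text above.
QC_THRESHOLDS = {
    "WT": {"rna": (50, 3000), "atac": (500, 50000)},
    "mutant": {"rna": (50, 5000), "atac": (500, 80000)},
}
MAX_MITO_FRACTION = 0.15  # 15% mitochondrial fraction for both conditions


def passes_qc(cell, condition):
    """Return True if a cell falls within the count and mitochondrial limits.

    cell: dict with hypothetical keys 'rna_counts', 'atac_counts', 'mito_fraction'.
    Assumption: bounds are inclusive.
    """
    limits = QC_THRESHOLDS[condition]
    rna_lo, rna_hi = limits["rna"]
    atac_lo, atac_hi = limits["atac"]
    return (rna_lo <= cell["rna_counts"] <= rna_hi
            and atac_lo <= cell["atac_counts"] <= atac_hi
            and cell["mito_fraction"] <= MAX_MITO_FRACTION)


def filter_cells(cells, condition):
    """Keep only the cells that pass QC for the given condition."""
    return [c for c in cells if passes_qc(c, condition)]


# Invented cell: within the mutant RNA range but above the WT upper bound.
example = {"rna_counts": 4000, "atac_counts": 10000, "mito_fraction": 0.05}
```

Because the upper bounds differ between conditions, the same cell can pass in one dataset and fail in the other, as the example cell does.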

For identification of condition-specific significant ATAC peaks in each cluster, intensity distributions of each peak in WT and mutant were statistically analyzed by the Wilcoxon rank sum and the Kolmogorov-Smirnov (KS) methods using one-tailed tests for each condition. Based on manual inspection of p-value distributions of all peaks, we chose raw p-value thresholds of 0.001 and 0.01 for the Wilcoxon and the KS tests, respectively, to deem peaks to be significant. By design, no p-value correction was performed at this filtering step. Those significant peaks were further analyzed to identify enriched motifs as described above. In addition, for the clusters of interest (Clusters 8, 35, 38, 39, and 42), we simulated the number of significant peaks in each cluster by generating 1000 random peak-intensity datasets, each produced by shuffling the intensity values between WT and mutant across all cells in the cluster in question. This simulation provided empirical null distributions of the number of significant peaks, from which p-values were obtained. R code for data processing and analyses is available on GitHub (Lee, 2023).
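The label-shuffling simulation can be sketched in Python as follows. This is not the authors' R code: the function names are hypothetical, the per-peak significance call is replaced by a simple mean-shift stand-in rather than the Wilcoxon and KS tests, and the add-one form of the empirical p-value is an assumed convention.

```python
import random
from statistics import mean


def shuffle_labels(wt_values, mut_values, rng):
    """Randomly reassign intensities to WT and mutant, keeping group sizes."""
    pooled = list(wt_values) + list(mut_values)
    rng.shuffle(pooled)
    return pooled[:len(wt_values)], pooled[len(wt_values):]


def count_by_mean_shift(peaks, min_shift=1.0):
    """Stand-in for the significance call: count peaks whose mutant mean
    exceeds the WT mean by min_shift (the source uses Wilcoxon and KS tests)."""
    return sum(1 for wt, mut in peaks if mean(mut) - mean(wt) > min_shift)


def null_counts(peaks, count_fn, n_sim=1000, seed=0):
    """Number of 'significant' peaks in each of n_sim label-shuffled datasets."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        shuffled = [shuffle_labels(wt, mut, rng) for wt, mut in peaks]
        counts.append(count_fn(shuffled))
    return counts


def empirical_p(observed, counts):
    """Fraction of simulated counts at least as large as the observed count,
    with an add-one correction (an assumed convention)."""
    return (1 + sum(c >= observed for c in counts)) / (1 + len(counts))


# Invented data: two peaks, each with per-cell intensities in WT and mutant.
peaks = [([0.0, 1.0, 0.0, 1.0], [3.0, 4.0, 3.0, 4.0]),
         ([1.0, 1.0, 2.0, 2.0], [1.0, 2.0, 1.0, 2.0])]
observed = count_by_mean_shift(peaks)
counts = null_counts(peaks, count_by_mean_shift, n_sim=50, seed=1)
```

Comparing the observed count against the shuffled counts asks whether a cluster has more condition-specific peaks than expected from its cell number alone, which matters because uncorrected per-peak thresholds produce some hits by chance.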

To keep the analyzed dataset consistent, the cells retained after filtering in the Seurat analysis were used for integrated diffusion and MELD. These techniques were used to analyze the data from a complementary perspective. Integrated diffusion was used to combine multimodal datasets, specifically each cell’s RNA-seq and ATAC-seq data, to create a joint data diffusion operator. The 3D integrated PHATE was computed on this joint data diffusion operator as described previously (Kuchroo et al., 2022; Kuchroo et al., 2021). To color the plots by the likelihood of a cell belonging to the wild-type or mutant sample, this integrated diffusion operator was used for MELD, which outputs a likelihood score for each cell belonging to the wild-type or mutant sample. The notebook for this analysis is available on GitHub (Du, 2023).

Appendix 1

Appendix 1—key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Gene (Danio rerio) | si:ch73-1a9.3, linc-mipep (also called lnc-rps25) - now hmgn1b | Ensembl | ENSDARG00000103919 |
Gene (Danio rerio) | si:ch73-281n10.2, linc-wrb - now hmgn1a | Ensembl | ENSDARG00000097102 |
Gene (Homo sapiens) | Hmgn1 | Ensembl | ENSG00000205581 |
Genetic reagent (Danio rerio) | linc-mipepdel1.78kb | This paper | Mutant line | ya126, available from Giraldez Lab; submitted through ZIRC
Genetic reagent (Danio rerio) | linc-mipepATG-del6 | This paper | Mutant line | ya127, available from Giraldez Lab; submitted through ZIRC
Genetic reagent (Danio rerio) | linc-mipepdel8 | This paper | Mutant line | ya128, available from Giraldez Lab; submitted through ZIRC
Genetic reagent (Danio rerio) | linc-mipep3’UTR-del74 | This paper | Mutant line | ya129, available from Giraldez Lab; submitted through ZIRC
Genetic reagent (Danio rerio) | linc-wrbdel11 | This paper | Mutant line | ya130, available from Giraldez Lab; submitted through ZIRC
Genetic reagent (Danio rerio) | Tg(olig2:egfp)vu12 | Shin et al., 2003 | Transgenic line | Previously published line
Genetic reagent (Danio rerio) | Tg(ubb:linc-mipep-FLAG-HA-T2A-mCherry) | This paper | Transgenic line | ya145, available from Giraldez Lab; submitted through ZIRC
Genetic reagent (Danio rerio) | Tg(ubb:human-Hmgn1-FLAG-HA-T2A-mCherry) | This paper | Transgenic line | ya151, available from Giraldez Lab; submitted through ZIRC
Antibody | Rabbit polyclonal anti-Linc-wrb | This paper | Custom antibody | Custom antibody (1:100–200) for antibody staining; works with ProK or acetone permeabilization.
Antibody | Rabbit polyclonal anti-Linc-mipep | This paper | Custom antibody | Custom antibody (1:100–200) for antibody staining; works with ProK or acetone permeabilization.
Antibody | Mouse monoclonal anti-FLAG | Sigma | Cat #: F3165 | Western blot (1:2000)
Antibody | Rabbit polyclonal Actin | Sigma | Cat #: A5060 | Western blot (1:2000)
Antibody | Rabbit polyclonal anti-RNA Polymerase II antibody | Abcam | Cat #: ab817 | ChIP-seq (4 µg)
Recombinant DNA reagent | ubb:linc-mipep-FLAG-HA-T2A-mCherry | This paper | Plasmid | Available from Giraldez Lab
Recombinant DNA reagent | ubb:humanHmgn1-FLAG-HA-T2A-mCherry | This paper | Plasmid | Available from Giraldez Lab
Peptide, recombinant protein | EnGen Spy Cas9 NLS (Cas9 protein) | New England Biolabs | Cat #: M0646T |
Sequence-based reagent | gBlocks | Integrated DNA Technologies (IDT) | Gene blocks | Sequences in materials section
Sequence-based reagent | All synthetic guide RNAs | Synthego | | See Supplementary file 1
Sequence-based reagent | Primers for genotyping and qPCR probes | Sigma | | See Supplementary file 1, and materials section
Sequence-based reagent | Primers for RNA in situ hybridization probes | Sigma | | See Supplementary file 1
Commercial assay or kit | Neurobasal Medium | Thermo Fisher Scientific | Cat #: 21103049 |
Commercial assay or kit | B-27 Supplement (50X), serum free | Thermo Fisher Scientific | Cat #: 17504044 |
Commercial assay or kit | Monarch RNA Cleanup Kit | New England Biolabs | Cat #: T2040L |
Commercial assay or kit | DIG RNA Labeling Mix | Roche | Cat #: 11277073910 |
Commercial assay or kit | NBT/BCIP Stock Solution | Roche | Cat #: 11681451001 |
Commercial assay or kit | EZ-Tn5 Transposase | Lucigen | Cat #: TNP92110 |
Commercial assay or kit | Anti-Digoxigenin-AP, Fab fragments | Roche | Cat #: 11093274910 |
Commercial assay or kit | NEBNext High-Fidelity 2X PCR Master Mix | New England Biolabs | Cat #: M0541 |
Commercial assay or kit | Agencourt AMPureXP beads | Beckman Coulter Genomics | Cat #: A63881 |
Commercial assay or kit | Flowmi Cell Strainers, porosity 70 μm | Bel-Art SP Scienceware | Cat #: H13680-0070 |
Commercial assay or kit | Flowmi Cell Strainers, porosity 40 μm | Bel-Art SP Scienceware | Cat #: H13680-0040 |
Commercial assay or kit | Trizol Reagent | Invitrogen | Cat #: 15596-018 |
Commercial assay or kit | Nuclei Buffer* (20X) | 10x Genomics | Cat #: 2000153/2000207 |
Commercial assay or kit | Nonidet P40 (NP40) Substitute | Sigma-Aldrich | Cat #: 74385 |
Commercial assay or kit | NuPAGE 4 to 12%, Bis-Tris, 1.0–1.5 mm, Mini Protein Gels | Thermo Fisher Scientific | Cat #: NP0322BOX |
Commercial assay or kit | NuPAGE MOPS SDS Running Buffer | Thermo Fisher Scientific | Cat #: NP0001 |
Commercial assay or kit | 10X Phosphate-Buffered Saline (PBS), pH 7.4 | American Bio | Cat #: AB11072-01000 |
Commercial assay or kit | Amplitaq DNA Polymerase | Applied Biosystems | Cat #: N8080153 |
Commercial assay or kit | SuperScript III Reverse Transcriptase | Invitrogen | Cat #: 18080044 |
Commercial assay or kit | MinElute Kit | Qiagen | Cat #: 28004 |
Commercial assay or kit | Chromium Single Cell Multiome ATAC + Gene Expression | 10x Genomics | 10x Genomics |
Chemical compound, drug | Trizma Hydrochloride Solution, pH 7.4 | Sigma-Aldrich | Cat #: T2194 |
Chemical compound, drug | Sodium Chloride Solution, 5M | Sigma-Aldrich | Cat #: 59222C |
Chemical compound, drug | Magnesium Chloride Solution, 1M | Sigma-Aldrich | Cat #: M1028 |
Chemical compound, drug | L-701,324 | Tocris Bioscience | Cat #: 0907 | Dissolved in DMSO
Chemical compound, drug | Flumethasone | Selleck Chem | Cat #: S4088 | Dissolved in DMSO
Chemical compound, drug | Tricaine-S Topical Anesthetics | Pentair Aquatic Eco-Systems | Cat #: TRS1 |
Chemical compound, drug | Triton X-100 | Sigma-Aldrich | Cat #: T9284 |
Chemical compound, drug | Tween-20 | Sigma-Aldrich | Cat #: P1379 |
Chemical compound, drug | Digitonin (5%) | Thermo Fisher Scientific | Cat #: BN2006 |
Chemical compound, drug | DAPI | Thermo Fisher Scientific | Cat #: D1306 |
Chemical compound, drug | 16% Paraformaldehyde aqueous solution | Electron Microscopy Sciences | Electron Microscopy Sciences |
Chemical compound, drug | cOmplete, EDTA-free Protease Inhibitor Cocktail | Roche | |
Chemical compound, drug | T7 RNA Polymerase | Roche | Cat #: RPOLT7-RO |
Chemical compound, drug | Glycoblue | Thermo Fisher Scientific | Cat #: AM9516 |
Software, algorithm | ZebraLab | ViewPoint Behavior Technology | http://viewpoint.fr/en/p/software/zebralab-zebrafish-behavior-screening |
Software, algorithm | MATLAB toolboxes | MathWorks | |
Software, algorithm | MATLAB R2018a | MathWorks | http://mathworks.com/products/matlab.html |
Software, algorithm | Prism 9 | GraphPad | https://www.graphstats.net/graphpad-prism |
Software, algorithm | LabxDB seq | Vejnar and Giraldez, 2020 | | Used for managing high-throughput sequencing data
Software, algorithm | LabxPipe | Vejnar, 2023b | Available at https://github.com/vejnar/LabxPipe |
Software, algorithm | ReadKnead | Vejnar, 2023d | Available at https://github.com/vejnar/ReadKnead |
Software, algorithm | Bowtie2 | Langmead and Salzberg, 2012 | | Read mapping
Software, algorithm | BEDTools | Quinlan and Hall, 2010 | | Genome tracks
Software, algorithm | MACS3 and MACS2 | Zhang et al., 2008 | | Peak calling
Software, algorithm | DESeq2 | Love et al., 2014 | | Differential analysis
Software, algorithm | deeptools | Ramírez et al., 2014 | |
Software, algorithm | gplots | Galili 2020 | Available at https://github.com/talgalili/gplots |
Software, algorithm | MEME suite | McLeay and Bailey, 2010 | Available at https://meme-suite.org/meme/tools/ame |
Software, algorithm | GeneAbacus | Vejnar, 2023c | Available at https://github.com/vejnar/geneabacus |
Software, algorithm | cellranger-arc pipeline (v1.0.1) | 10x Genomics | |
Software, algorithm | Weighted Nearest Neighbor (WNN) | Hao et al., 2021 | |
Software, algorithm | Integrated Diffusion | Kuchroo et al., 2021; Kuchroo et al., 2022 | |
Software, algorithm | Custom sleep analysis software | Rihel, 2023 | Available at https://github.com/JRihel/Sleep-Analysis/tree/Sleep-Analysis-Code |

Data availability

The sequencing datasets generated and analyzed in this study have been made available through the Gene Expression Omnibus (GEO) database (Project ID PRJNA945049). The plasmids, custom antibodies, and fish lines generated in this study are available from the corresponding authors on request. Plasmids will be deposited through Addgene (202543: ubb:linc-mipep and 202544: ubb:hHmgn1). Fish lines have been requested for submission to ZIRC for distribution. Sequences used to generate ribosome profiling plots were previously published (Bazzini et al., 2014; Johnstone et al., 2016) and are available through the Sequence Read Archive (SRA) with accession numbers SRP034750 and SRP072296. All code generated and used in this study is available through GitHub repositories. Links to the code are provided in each respective methods section, and as follows: Multi-frame Ribo-seq and mRNA-seq visualization (Vejnar, 2023a); Micropeptides_fingerprints (Kroll, 2022); Sleep tracking analysis (Rihel, 2023); LabxPipe (Vejnar, 2023b); GeneAbacus (Vejnar, 2023c); Single cell multiome analyses (Lee, 2023); Zebrafish Integrated Analysis (Du, 2023).

The following data sets were generated
    Tornini VA, Miao L, Lee H-J, Gerson T, Dube SE, Schmidt V, Kroll F, Tang Y, Du K, Kuchroo M, Vejnar CE, Bazzini AA, Krishnaswamy S, Rihel J, Giraldez AJ (2023) NCBI BioProject ID PRJNA945049. linc-mipep and linc-wrb encode micropeptides that regulate chromatin accessibility in vertebrate-specific neural cells.
The following previously published data sets were used
    Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, Vejnar CE, Lee MT, Rajewsky N, Walther TC, Giraldez AJ (2014) NCBI Gene Expression Omnibus ID GSE53693. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation.
    Johnstone TG, Bazzini AA, Giraldez AJ (2015) NCBI Sequence Read Archive ID SRA314809. Upstream ORFs are prevalent translational repressors in vertebrates.
    Giraldez Lab (2014) NCBI Sequence Read Archive ID SRP034750. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation.

References

  1. Book
    1. Johns EW
    (editors) (1982)
    The HMG Chromosomal Proteins
    London; New York: Academic Press.
  2. Conference
    1. Kuchroo M
    2. Godavarthi A
    3. Tong A
    4. Wolf G
    5. Krishnaswamy S
    (2021) Multimodal Data Visualization and Denoising with Integrated Diffusion
    2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). pp. 1–6.
    https://doi.org/10.1109/MLSP52302.2021.9596214

Decision letter

  1. Filippo Del Bene
    Reviewing Editor; Institut de la Vision, France
  2. Marianne E Bronner
    Senior Editor; California Institute of Technology, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "linc-mipep and linc-wrb encode micropeptides that regulate chromatin accessibility in vertebrate-specific neural cells" for consideration by eLife. Your article has been reviewed by 4 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Marianne Bronner as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The evolutionary analysis should be expanded significantly which will increase the scope of the results. What happens in other fish species (teleosts but also coelacanth/gar)? Do they also have both proteins? What happens in frogs/birds/reptiles? A multiple-alignment showing the proteins from different representative species of HMGN1 and the new proteins will be particularly informative.

2) In the initial screen, it is not clear how the candidates for testing were selected and what kind of mutations were introduced in the F0, and what was the efficiency of the editing. As the paper is presented at least in part as an innovative screening effort, it is important to provide these details and outline them in the Results section.

3) A ChIP-seq experiment of the new proteins appears to be very interesting, but it is basically not described at all. How many peaks were found? Do they resemble each other? How reproducible was the data? A motif-based analysis appears to be very superficial given how instrumental these data (if solid) can be.

4) The authors should show ribosome profiling data together with the gene structure of examined transcript (ideally, supported by RNA-seq) to visualize the position of ribosome-protected regions within the transcripts (Extended data Figure 1a and Figure 1d). The sequence analyses reveal the similarity between linc-mipep and linc-wrb and should be presented as it is an important finding. The authors should indicate the (expected/predicted) size of both peptides; it was not mentioned in the manuscript.

5) The different genetic alleles generated for linc-mipep and linc-wrb should be confirmed by DNA sequencing chromatographs; the expression of the linc-mipep and linc-wrb transcripts in the mutants should be confirmed by qRT-PCR as sometimes even small deletions can lead to destabilization or overexpression of the remaining transcripts. This is particularly important for the mutants that show behavioral deviations from wt animals.

6) In an elegant rescue experiment, the authors demonstrate that the CDS of linc-mipep can rescue the zebrafish locomotion hyperactivity phenotype. A control experiment with a construct expressing a frameshifted peptide should be included. From the presentation in Figure 2a, the peptide was tagged with FLAG-HA. Can the expression of the peptide be detected by Western blot/immunostaining? Have the authors tried to rescue the phenotype with human HMGN1?

7) One of the main conclusions from this study is that both micropeptides act together/somewhat redundantly, which would explain why knocking out both peptides has a stronger phenotype than knocking out either peptide individually. While this is a possibility (that they act redundantly, targeting the same regions in the genome), other scenarios are possible, e.g. that they have distinct or only partially overlapping chromatin targets and thus regulate different genes/pathways, which in the end converge on the same behavioral phenotype.

To resolve this, the rescue with linc-mipep should be attempted for the double mutant and also the single linc-wrb mutant (since it is a ubiquitous overexpression line, it may rescue both). Similarly, a rescue by linc-wrb (which is not shown, also not for the single mutant) would be important to support the conclusion that the phenotype is due to the loss of this peptide, and that it acts redundantly with linc-mipep. Moreover, it will also be important to quantify and provide statistics for the overexpression effect of the rescue construct in the WT background.

Please also address the other points raised by the reviewers to improve the clarity and readability of the manuscript.

Reviewer #1 (Recommendations for the authors):

In my opinion, the main weakness of the paper is the very limited ability of the molecular phenotypic characterization of the mutants to explain the behavioral and neuropharmacological phenotype. This weakness is partially evident also by the lack of this point in the discussion that focuses on the evolutionary implications and the chromatin remodeling defects observed in the mutants. This is in my opinion an important point that should be better explained and investigated.

I would have also liked to have some validation of the protein localization in the cell types identified as most sensitive to the loss of linc-mipep and linc-wrb. Custom antibodies for these peptides were generated and staining is presented in extended fig2m showing only the larval forebrain. This analysis should be extended to OPC and cerebellar granule cells.

In the discussion of the putative evolutionary origin of linc-mipep and linc-wrb the authors mention the lancelet defining it simply as "invertebrate". This polyphyletic group is insufficient here and the authors should explain better its relevance in this context as basal chordate.

Reviewer #2 (Recommendations for the authors):

1. In the initial screen, it is not clear how the candidates for testing were selected and what kind of mutations were introduced in the F0, and what was the efficiency of the editing. As the paper is presented at least in part as an innovative screening effort, it is important to provide these details and outline them in the Results section.

2. The evolutionary analysis can be expanded significantly which will increase the scope of the results. What happens in other fish species (teleosts but also coelacanth/gar)? Do they also have both proteins? What happens in frogs/birds/reptiles? A multiple-alignment showing the proteins from different representative species of HMGN1 and the new proteins will be particularly informative.

3. Locomotor activity graphs: the number of tested fish should be added to all graphs. In some cases, the authors added a dot plot graph with P values, and this should be done for all the locomotor activity experiments.

4. The rescue experiments were performed using zebrafish linc-mipep CDS. It would be interesting to test whether a homolog for a different species (i.e., HMGN1) will also rescue the behavioral phenotypes.

5. ATAC-seq analysis: the analysis focuses on the comparison of peaks detected or not detected in the different datasets. A more common and more robust approach is to identify a single set of peaks using all the data together, and then test (e.g., using DESeq2) which peaks have differential accessibility between the different genotypes/samples.

6. A ChIP-seq experiment of the new proteins appears to be very interesting, but it is basically not described at all. How many peaks were found? Do they resemble each other? How reproducible was the data? A motif-based analysis appears to be very superficial given how instrumental these data (if solid) can be.

7. There is a mistake in the location cited for the c-fos in situ hybridization experiment: it is in extended data Figure 4E, not in Figure 3f (where it is currently cited).

8. In figure 2d – is the phenotype of linc-mipep-/- vs. linc-mipep+/+ fish (1st vs. 3rd) here significant? If yes – show the p-value. If not – how is this explained?

9. The statement that genes with ribosome-protected fragments are likely encoding functional proteins is not always correct and this part should be explained in more detail.

10. In the description of the single-cell datasets, please indicate fold-changes in differences of representation (e.g., for reduction of olig2+ oligodendrocyte progenitor cells across the brain)

Reviewer #3 (Recommendations for the authors):

1. The authors should show ribosome profiling data together with the gene structure of examined transcript (ideally, supported by RNA-seq) to visualize the position of ribosome-protected regions within the transcripts (Extended data Figure 1a and Figure 1d). The sequence analyses reveal the similarity between linc-mipep and linc-wrb and should be presented as it is an important finding. The authors should indicate the (expected/predicted) size of both peptides; it was not mentioned in the manuscript.

2. The authors should elaborate on the expression of the examined transcripts/peptides during embryogenesis (i.e., are they expressed at 5dpf only or earlier/later) and in adult tissues.

3. The different genetic alleles generated for linc-mipep and linc-wrb should be confirmed by DNA sequencing chromatographs; the expression of the linc-mipep and linc-wrb transcripts in the mutants should be confirmed by qRT-PCR as sometimes even small deletions can lead to destabilization or overexpression of the remaining transcripts. This is particularly important for the mutants that show behavioral deviations from wt animals.

4. In an elegant rescue experiment, the authors demonstrate that the CDS of linc-mipep can rescue the zebrafish locomotion hyperactivity phenotype. A control experiment with a construct expressing a frameshifted peptide should be included. From the presentation in Figure 2a, the peptide was tagged with FLAG-HA. Can the expression of the peptide be detected by Western blot/immunostaining? Have the authors tried to rescue the phenotype with human HMGN1?

5. A question related to the comment above: is it possible to detect native, untagged peptides by mass spectrometry? Have the authors tried to do it?

6. The manuscript would gain on clarity if a more detailed description of the behavioral assays used as a functional read-out was included in the main text. In general, the manuscript is partially hard to follow due to the insufficient data presentation, peptide size, peptide sequences, etc.

7. The authors should elaborate on why they used a single linc-mipep mutant for the drug experiments but a double mutant for omni-ATAC experiments.

8. The authors should clearly state in the discussion that the molecular mechanisms of action of both studied peptides remain completely unknown. For example, how do they affect chromatin accessibility? What are their interaction partners if any? etc

Reviewer #4 (Recommendations for the authors):

The manuscript can be significantly improved by addressing the following concerns:

Concerns and suggestions:

One of the main conclusions from this study is that both micropeptides act together/somewhat redundantly, which would explain why knocking out both peptides has a stronger phenotype than knocking out either peptide individually. While this is a possibility (that they act redundantly, targeting the same regions in the genome), other scenarios are possible, e.g. that they have distinct or only partially overlapping chromatin targets and thus regulate different genes/pathways, which in the end converge on the same behavioral phenotype.

To reconcile this, the rescue with linc-mipep should be attempted for the double mutant and also the single linc-wrb mutant (since it is a ubiquitous overexpression line, it may rescue both). Similarly, a rescue by linc-wrb (which is not shown, also not for the single mutant) would be important to support the conclusion that the phenotype is due to loss of this peptide, and that it acts redundantly with linc-mipep. Moreover, it will also be important to quantify and provide statistics for the overexpression effect of the rescue construct in the WT background – is there a significant activity decrease by linc-mipep OE? Overall, the authors mention the dosage-sensitivity of HMGN1 proteins, but with the current analyses fail to provide convincing evidence of a clear dosage effect of the two peptides since they could potentially target different, only in part redundant, genes or have different effects in different cell types. To this end, the use of either the single linc-mipep vs double linc-mipep/linc-wrb mutant is inconsistent in the second half of the manuscript: global ATAC-Seq data is only provided from the double mutant while single-cell-analyses are only provided from the single linc-mipep mutant. Moreover, the ChIP-seq analyses provided are only summarized for both proteins combined in the main Figure, but used individual antibodies, leaving it unclear how the individual profiles look (the authors should follow the standard convention on how to show the quality of ChIP-seq data, e.g. provide ChIP-seq tracks at least for some example genes since the quality of the data remains unclear, and differences between the two Abs cannot be assessed; the Suppl Table 8 also only provides a combined list of 37 genes for which ChIP seq peaks were identified though it would be important to show it individually for each AB; also the number of genes bound appears really really small? Are these ALL genes with a ChIP-seq peak?).

The second major concern relates to the unclear link between the different phenotypes observed: how can the behavioral phenotypes be reconciled with the molecular phenotypes (chromatin accessibility in specific neurons or precursors), and how can the chromatin accessibility differences in WT vs mutant be reconciled with the measured transcriptional/gene expression differences? Is there any evidence for NMDA being downstream of linc-mipep/wrb regulation? I applaud the authors on generating all these interesting data sets and analyses, but they are not connected together (here the focus for example on just the single linc-mipep mutant would be helpful, but the global brain ATAC-Seq data is only shown for the double mutant; and vice-versa, the single-cell ATAC-Seq data with the chromatin accessibility changes detected in specific cell types is not linked back to the ChIP-seq profiles of the peptides). Do glial progenitors and OPCs of the mutant(s) have altered expression of the underlying loci with altered accessibility? In Figure 4c, e, f, h, and Extended 6c-e, how does the chromatin accessibility translate to the RNA level in Purkinje cells and radial glia cells? How many sites lose accessibility in OPCs? Is "broad loss" a fair assessment of the observation?

Without addressing the two major concern points, the statement that linc-mipep and linc-wrb 'broadly regulate the chromatin state of neural cell types, most impacting OPCs and cerebellar granule cell gene expression networks and cell states in a basal vertebrate' appears overstated and would need to be phrased differently/softened.

https://doi.org/10.7554/eLife.82249.sa1

Author response

Essential revisions:

(1) The evolutionary analysis should be expanded significantly which will increase the scope of the results. What happens in other fish species (teleosts but also coelacanth/gar)? Do they also have both proteins? What happens in frogs/birds/reptiles? A multiple-alignment showing the proteins from different representative species of HMGN1 and the new proteins will be particularly informative.

We have performed an in-depth evolutionary analysis. We observe that other teleost species have two copies of these genes, while coelacanth and gar have only one copy. Frogs/birds/reptiles also have only one copy of these genes.

Detailed Response: We performed a Clustal Omega multiple-alignment showing the full-length alignment of proteins for key species (including coelacanth, gar, Xenopus, zebra finch, and anole lizard). We have also included these data for all analyzed species in Supplementary Table 2 (tab 5). This analysis provides information on key conserved sequences across species. Additionally, we have provided data on syntenic relationships across species and have included a syntenic alignment diagram to replace the previous version that only included human and zebrafish orthologues. We also clarify in the text, within figure supplements, and supplementary tables which of these are currently known, annotated, or otherwise annotated as noncoding or pseudogenes.

Location of new data: A protein sequence alignment among select vertebrate species (including coelacanth and gar) is presented in Figure 2 —figure supplement 4. The synteny analysis (including spotted gar) is presented in Figure 2 —figure supplement 5. Information on these genes in other fish species (including teleosts) is included in Figure 2 —figure supplement 6.

Additional information from all identified related sequences is presented in Supplementary Table 2 (especially sheet 5).

(2) In the initial screen, it is not clear how the candidates for testing were selected and what kind of mutations were introduced in the F0, and what was the efficiency of the editing. As the paper is presented at least in part as an innovative screening effort, it is important to provide these details and outline them in the Results section.

Candidates were selected for testing by analyzing ribosome profiles of highly expressed transcripts at 12 hpf, 24 hpf, or 48 hpf. We then targeted these genes to test for CRISPR lethality and performed in situ hybridization on 21 of the remaining candidates. From these genes, we identified brain-enriched micropeptides on which we focused for further screening.

F0 mutations varied, with some guides generating small in- and out-of-frame indels, and some generating larger mutations. All guides efficiently edited their target sequences, with indel or large deletion rates across multiple guides estimated at ~40-100%. Editing efficiency varied with the guide used.

Detailed Response: We have included details about how candidates for testing were selected in the main text (page 2) and in Supplementary Table 1 (tab 2). Briefly, we analyzed ribosome profiles for previously published lincRNAs in zebrafish embryos during early development (0-24 hours post-fertilization) (Bazzini et al., 2014), performed preliminary CRISPR/Cas9 targeting and excluded from further study those that were lethal. We then performed in situ hybridization on 21 of these candidates and identified brain-enriched micropeptide candidates, which we focus on for further screening.

To determine the efficiency of editing and what kinds of mutations were introduced in F0s, we pooled 8 representative F0 larvae for each of the tested guides and performed Inference of CRISPR Edits (ICE, Synthego) analysis to identify how efficiently the guides cut and their estimated indel or large deletion rates. All genes were edited, with some target guide combinations inducing small indels, and some also inducing larger deletions. We have updated the Methods section to reflect these additions. We further discuss some limitations of targeting small transcripts, which may contain few of the GC-rich sequences that traditional CRISPR/Cas9 target sites require, and note the PAM-less variants now available.

Location of new data: Data showing efficient CRISPR targeting and some example edited sequences are presented in Figure 1 —figure supplement 2, with descriptions in the Results section on page 2. Information on the in situ hybridization screen results is presented in Supplementary Table 1 (sheet 2) and described in the Results section on page 2.

(3) A ChIP-seq experiment of the new proteins appears to be very interesting, but it is basically not described at all. How many peaks were found? Do they resemble each other? How reproducible was the data? A motif-based analysis appears to be very superficial given how instrumental these data (if solid) can be.

Thank you for the comments. We found 315 peaks, yet we could not analyze whether they resemble each other because our ChIP-seq combined the antibodies for both proteins to identify common bound regions, using the double maternal-zygotic mutants as controls. To address the reviewer’s comments, we have performed replicate ChIP-seq for each protein in two different experiments to characterize the antibodies independently for ChIP-seq of Linc-wrb and Linc-mipep. However, given the low number of peaks identified (357 for Linc-wrb and 78 for Linc-mipep), we are concerned about the validity of these peaks and the suitability of these antibodies for ChIP-seq. Given these results, we believe that future studies will be needed to perform an in-depth characterization of the binding profile of each individual protein, the regions bound in the chromatin, and the developmental progression of their binding profile in the brain and in other tissues. We therefore consider this beyond the current scope of the paper and have removed this analysis.

Detailed Response: In our previous submission, we provided a combined ChIP-seq analysis of both proteins at 24 hpf using both antibodies pooled together to identify overlapping binding regions. As a control, we had used double maternal-zygotic mutant embryos at 24 hpf (over each respective input). The results we presented in the previous submission, on which we performed the motif-based analysis, represented 315 peaks that were called in wild type compared to double MZ linc-mipep; linc-wrb mutant embryos. To address the reviewer’s comments, we have performed replicates in two different experiments to characterize the antibodies independently for ChIP-seq. First, we performed ChIP-seq for each individual antibody, using 4 hpf wild type embryos (for which our lab has optimized the ChIP-seq protocol) and double-mutant 4 hpf embryos as controls (over each respective input). At this 4 hpf timepoint, we found an enrichment of nuclear antibody staining in non-dividing cells, as shown in Figure 2 —figure supplement 2I and J. While we were able to call statistically significant peaks (357 peaks for Linc-wrb (q=0.05) and 78 peaks for Linc-mipep (q=0.05)), visual inspection of the tracks representing the ChIP-seq profile did not show clear ChIP-seq signal enrichment at all called peaks, so we are not confident in the validity of these results. Please see Author response image 1. Second, because we had been able to detect the protein by probing with an anti-FLAG antibody in ubb:linc-mipep-FLAG-HA-T2A-mCherry embryos (which carry a FLAG tag on the C-terminal end of the linc-mipep CDS), we reasoned that we may be able to perform ChIP-seq analyses using this transgenic ubb:linc-mipep-FLAG-HA-T2A-mCherry line, which expresses linc-mipep-FLAG protein as assessed by western blot. 
We performed ChIP-seq analyses using a FLAG antibody for 4 hpf embryos from ubb:linc-mipep-FLAG-HA-T2A-mCherry incrosses (to ensure maternally-deposited transcripts and enrichment for protein at early stages) and wild type embryos as a control (over each respective input). These analyses did not result in obvious peaks upon visual inspection of called peaks. Based on these attempts, and our inability to definitively identify peaks that are clearly enriched compared to the control, we have decided to remove the previously submitted results from our analyses and interpretations, and from this report. We believe these results, which were performed at 4 hpf and 24 hpf, do not affect the overall conclusions of the paper which are centered at 5-6 dpf, because we clearly observed differentially regulated regions in chromatin accessibility and gene expression between the wild type and mutant brains at 5-6 dpf. Future efforts will be needed to further explore the chromatin binding profile of these proteins in the brain and other tissues.

Author response image 1

(4) The authors should show ribosome profiling data together with the gene structure of examined transcript (ideally, supported by RNA-seq) to visualize the position of ribosome-protected regions within the transcripts (Extended data Figure 1a and Figure 1d). The sequence analyses reveal the similarity between linc-mipep and linc-wrb and should be presented as it is an important finding. The authors should indicate the (expected/predicted) size of both peptides; it was not mentioned in the manuscript.

We now include ribosome footprint and ribosome-depleted RNA-seq tracks (at 48 hpf) for the linc-mipep and linc-wrb genes, in Figure 1 —figure supplement 5D and E. We present the sequences in Figure 2E and have expanded the evolutionary analysis across vertebrate species, in Figure 2 —figure supplements 3 and 5, and Supplementary Table 2. For clarity, we also added the sizes of proteins encoded by linc-mipep (87 aa) and linc-wrb (93 aa) to Figure 2E and have added it to the main text (pages 3 and 4).

(5) The different genetic alleles generated for linc-mipep and linc-wrb should be confirmed by DNA sequencing chromatographs; the expression of the linc-mipep and linc-wrb transcripts in the mutants should be confirmed by qRT-PCR as sometimes even small deletions can lead to destabilization or overexpression of the remaining transcripts. This is particularly important for the mutants that show behavioral deviations from wt animals.

We now include DNA sequencing chromatographs for each of the alleles, presented in Figure 1 —figure supplement 6, alongside their respective behavioral profiles. These data confirm the generation of linc-mipep and linc-wrb mutations. Antibody staining revealed a loss of protein staining (presented in Figure 2 —figure supplement 2), and analysis of the RNA levels by scRNA-seq in linc-mipep mutant brain cells (presented in Figure 4 —figure supplement 2) reveals that the transcript is strongly reduced, likely due to nonsense-mediated decay.

(6) In an elegant rescue experiment, the authors demonstrate that the CDS of linc-mipep can rescue the zebrafish locomotion hyperactivity phenotype. A control experiment with a construct expressing a frameshifted peptide should be included. From the presentation in Figure 2a, the peptide was tagged with FLAG-HA. Can the expression of the peptide be detected by Western blot/immunostaining? Have the authors tried to rescue the phenotype with human HMGN1?

We demonstrate that human HMGN1 can rescue linc-mipep mutants (Figure 2F and G). To test whether the CDS of a related human protein, HMGN1, would be sufficient to rescue the phenotype of linc-mipep and linc-wrb, we generated a stable transgenic ubb:human-Hmgn1-FLAG-HA-T2A-mCherry (ubb:hHmgn1) line. When we crossed this ubb:hHmgn1 line to linc-mipep mutants, we were able to significantly rescue the linc-mipep hyperactivity.

We can detect expression of the FLAG-tagged linc-mipep peptide in the transgenic line by western blot, confirming a protein-coding rescue. These data are presented in Figure 2 —figure supplement 1A.

We considered frameshifted CDS overexpression experiments, but felt that the results may be difficult to interpret, for example if overexpression of a frameshifted peptide resulted in novel behavioral phenotypes. We note that, although their specific overexpression constructs differ from those used here, Chiu et al. (2016) performed a large sleep/wake behavioral screen on the effects of over-expressing 1286 ORFs, of which most gave no phenotype. They further tested 60 overexpression constructs in stable transgenic lines, and found that only 12 had behavioral phenotypes, spread across sleep-wake parameters, with some increasing, some decreasing, and some having no effect on activity. Based on these data, over-expression constructs in general are not expected to have consistent non-specific effects on locomotor activity. Furthermore, we only use the CDS of linc-mipep or HMGN1 in the transgenic rescue experiments (excluding 5’ and 3’UTR sequences), and the mutants for linc-mipep1 and linc-wrb carry 8 nt and 1 nt frameshift deletions, respectively. Given this, and the long generation time needed to complete the frameshift rescue experiment mentioned above, we hope that the reviewers find the data presented here sufficient evidence to support the function of these coding genes.

(7) One of the main conclusions from this study is that both micropeptides act together/somewhat redundantly, which would explain why knocking out both peptides has a stronger phenotype than knocking out either peptide individually. While this is a possibility (that they act redundantly, targeting the same regions in the genome), other scenarios are possible, e.g. that they have distinct or only partially overlapping chromatin targets and thus regulate different genes/pathways, which in the end converge on the same behavioral phenotype.

To resolve this, the rescue with linc-mipep should be attempted for the double mutant and also the single linc-wrb mutant (since it is a ubiquitous overexpression line, it may rescue both). Similarly, a rescue by linc-wrb (which is not shown, also not for the single mutant) would be important to support the conclusion that the phenotype is due to the loss of this peptide, and that it acts redundantly with linc-mipep. Moreover, it will also be important to quantify and provide statistics for the overexpression effect of the rescue construct in the WT background.

We thank the reviewer for these suggestions. As we mentioned above (comment 6), we were able to rescue linc-mipep mutants with a human Hmgn1 transgene (Figure 2F and G). However, this transgene did not rescue linc-wrb (Figure 2 —figure supplement 1D). Yet, we found that ubb:linc-mipep rescues linc-wrb heterozygous mutants almost to wild type levels (p=0.058) (Figure 2 —figure supplement 1B and C), supporting at least a partially redundant function of these proteins.

While we would like to attempt a rescue of the double mutant (linc-mipep; linc-wrb) with the linc-mipep CDS, this would require at least 2 generations equivalent to at least an additional 6 months, and we do not think the number of animals required for this experiment, based on a 3Rs ethical perspective, justifies this experiment which we believe would not significantly change the conclusions of the paper based on the new results presented in this revision.

As suggested by the reviewers, we have provided statistics for the linc-mipep overexpression effect of the rescue construct in the WT background (Figure 2B).

Moreover, we now include data on intermediate phenotypes in larvae resulting from linc-mipep;linc-wrb double-heterozygous crosses, in Figure 1 —figure supplement 7C and D. These analyses reveal that each mutation causes very similar hyperactivity levels and behavioral profiles. We think these results suggest a dose-dependent effect, reflected in the observed levels of hyperactivity in heterozygous and homozygous mutants. We have therefore included a discussion point on the potential individual, overlapping, or redundant effects of each of these genes.

Location of new data: The linc-mipep rescue of the linc-wrb mutant phenotype is presented in Figure 2 —figure supplement 1B and C. Statistics for linc-mipep overexpression in wild type backgrounds are included in Figure 2B. A behavior plot and dot plot from a double-heterozygous incross showing dosage-dependent phenotypes are presented in Figure 1 —figure supplement 7C and D.

Reviewer #1 (Recommendations for the authors):

In my opinion, the main weakness of the paper is the very limited ability of the molecular phenotypic characterization of the mutants to explain the behavioral and neuropharmacological phenotype. This weakness is partially evident also by the lack of this point in the discussion that focuses on the evolutionary implications and the chromatin remodeling defects observed in the mutants. This is in my opinion an important point that should be better explained and investigated.

We have further analyzed and discussed the molecular phenotypic characterization of the mutants to explain the behavioral and neuropharmacological phenotypes. Using scRNA-seq and ATAC-seq data, we have now identified gene expression changes, as well as corresponding changes in chromatin accessibility, in Purkinje and OPC cells. We also show that some of these changes affect genes important for NMDA and glucocorticoid signaling activity. These results suggest that loss of linc-mipep leads to dysregulation of multiple genes that more strongly affect oligodendrocytes and cerebellar cell types, including genes important for the activity of NMDA and glucocorticoid signaling pathways.

Detailed response: We have further analyzed the molecular phenotypes of affected cells to understand the link between the behavioral and neuropharmacological phenotypes. These analyses suggest a role for linc-mipep in the regulation of genes required for OPC and cerebellar cell type development, including NMDA receptor and glucocorticoid receptor signaling components.

We first searched for changes in gene expression in the cell types of interest – OPCs, cerebellar granule cells, and Purkinje cells – that may indicate how these cells were impacted. From single-cell data, we found that linc-mipep mutant Purkinje cells showed significantly decreased expression of numerous genes, including roraa, rorb, foxp4, and prkcg, which are required for maturation or maintenance of Purkinje cells in zebrafish. We also identified numerous genes that were differentially expressed between wild type and linc-mipep mutant OPCs. Some of these genes, including erbb4b, mag, qkia, and myt1b, were also enriched in cerebellar granule cells, pointing to similar gene networks that may be disrupted in these affected cell types.

We then assessed omni-ATAC-seq data at some of the differentially expressed genes in OPCs. We found differentially accessible regions in linc-mipep;linc-wrb mutant brains downstream of olig2, within a large intronic span of sgms2b, and upstream of fabp7a, suggesting that the micropeptides may be required for proper gene regulation in OPCs.

Next, we asked whether some of the genes that are differentially expressed or regulated between mutant and wild-type OPCs, cerebellar granule cells, or Purkinje cells may explain the dysregulation of NMDA signaling and the sensitization to glucocorticoids that we observed in the mutants (from Figure 3). Indeed, in linc-mipep; linc-wrb brains, we found increased accessibility at two genes associated with glucocorticoid downstream signaling (fkbp5 in OPCs) and stress responses (scg5 in granule cells); the expression of these two genes is also downregulated in linc-mipep mutants. We also observed decreased expression in the linc-mipep OPC cluster of numerous genes involved with NMDA receptor activity (aldocb, ttyj3b, slc1a2b, nrxn1a, grin1b, gpmbaa, atp1a1b).

Finally, focusing on NMDA receptor regulation, we identified two regions exhibiting differential chromatin accessibility within grin1b, a gene encoding an NMDAR subunit. At the single-cell level, expression of grin1b (and, to some degree, also grin1a, which encodes another subunit) is significantly higher in wild type OPCs relative to linc-mipep mutant OPCs.

These new results are consistent with a model in which loss of linc-mipep leads to dysregulation of multiple genes in oligodendrocytes and cerebellar cell types, including genes important for the activity of NMDA and glucocorticoid signaling pathways. We have also included descriptions of these results in the main text (pages 8-9) and a detailed discussion of these points (page 10). Additional work, beyond the scope of this manuscript, will be needed to test the specific contribution of each of these changes to the neuronal and behavioral effects of linc-mipep and linc-wrb mutations.

Location of new data: These new data and analyses are presented in Figure 4H; Figure 4 —figure supplements 5, 6, 7, and 8; and Supplementary Table 4.

I would have also liked to have some validation of the protein localization in the cell types identified as most sensitive to the loss of linc-mipep and linc-wrb. Custom antibodies for these peptides were generated and staining is presented in extended fig2m showing only the larval forebrain. This analysis should be extended to OPC and cerebellar granule cells.

We performed antibody staining in olig2:GFP brains at 5 dpf to characterize the localization of the proteins encoded by linc-mipep and linc-wrb in OPCs and cerebellar regions. We note that we do not have a good antibody or transgenic line to label cerebellar granule cells; instead, we present antibody staining of Linc-mipep and Linc-wrb in single Z confocal slices of brains at 5 dpf, which show a generally uniform expression pattern for both proteins. We find that the Linc-mipep antibody signal is stronger than that of Linc-wrb (also notable in Figure 2 —figure supplement 2E and G), though Linc-mipep is weakly expressed in the torus longitudinalis and tegmentum (as in Figure 4 —figure supplement 4B and E). We added images of whole brains in olig2:GFP and wild type fish with these antibodies. These new data are presented in Figure 4 —figure supplement 6.

In the discussion of the putative evolutionary origin of linc-mipep and linc-wrb the authors mention the lancelet defining it simply as "invertebrate". This polyphyletic group is insufficient here and the authors should explain better its relevance in this context as basal chordate.

We clarify in the text the lancelet’s relevance in this context as a basal chordate (on page 4). We have also now added context for the synteny analysis, in Figure 2 —figure supplement 5A and Supplementary Table 2. We further clarify in the Results section that these findings are consistent with what is known about HMGN family members and highlight that our analysis identifies the putative HMGN origin in lamprey, which seems to be derived partially from the N-terminal sequence of a protein-coding gene in the lancelet, presented in Figure 2 —figure supplement 3D.

Reviewer #2 (Recommendations for the authors):

1. In the initial screen, it is not clear how the candidates for testing were selected and what kind of mutations were introduced in the F0, and what was the efficiency of the editing. As the paper is presented at least in part as an innovative screening effort, it is important to provide these details and outline them in the Results section.

Please see response to comment #2 above, and copied below.

2. The evolutionary analysis can be expanded significantly which will increase the scope of the results. What happens in other fish species (teleosts but also coelacanth/gar)? Do they also have both proteins? What happens in frogs/birds/reptiles? A multiple-alignment showing the proteins from different representative species of HMGN1 and the new proteins will be particularly informative.

Please see response to comment #1 above, and copied below.

3. Locomotor activity graphs: the number of tested fish should be added to all graphs. In some cases, the authors added a dot plot graph with P values, and this should be done for all the locomotor activity experiments.

We have included the number of fish in all the locomotor activity graphs, and have also added dot plot graphs with P values for all the locomotor activity experiments.

Location of new data: The numbers of fish for locomotor activity graphs are now included in Figure 1G and H, Figure 1 —figure supplement 6A through E, Figure 1 —figure supplement 7A and D; Figure 2B, C, and F; Figure 2 —figure supplement 1B and D; Figure 3B; and Figure 3 —figure supplement 2A, D, and F. Dot plots have been added to all relevant locomotor activity graphs except Figure 1 —figure supplement 6, because those data are mostly represented in Figure 1 —figure supplement 7A, C, and D.

4. The rescue experiments were performed using zebrafish linc-mipep CDS. It would be interesting to test whether a homolog for a different species (i.e., HMGN1) will also rescue the behavioral phenotypes.

We have generated a stable ubiquitous overexpression line with the human HMGN1 CDS, and have assessed behavioral phenotypes in wild type and linc-mipep or linc-wrb mutant backgrounds. We found that overexpression of the human HMGN1 CDS is sufficient to rescue the phenotype in linc-mipep mutants. Please also see Comment #6 and 7 above.

Location of new data: These results are presented in Figure 2F and G, and in Figure 2 —figure supplement 1B and C.

5. ATAC-seq analysis: the analysis focuses on the comparison of peaks detected or not detected in the different datasets. A more common and more robust approach is to identify a single set of peaks using all the data together, and then test (e.g., using DESeq2) which peaks have differential accessibility between the different genotypes/samples.

We provide an updated ATAC-seq analysis using 3 replicates, identifying a single set of peaks using all the data together, and then testing using DESeq which peaks have differential accessibility between wild-type and linc-mipep;linc-wrb mutant brains at 5 dpf. We find that the new results are consistent with the previous analyses and provide a more robust and refined approach to identifying differentially accessible peaks between wild type and mutant brains. These analyses more strongly support some of the findings in this study, including the transcription factor motifs identified as enriched or depleted in mutants, and specific peaks identified as differentially accessible.

Location of new data: These data are provided in Figure 3D and E; Figure 3 —figure supplement 3 A through C; sample tracks in Figure 4 —figure supplement 6A through F; and updated Supplementary Table 4. We have also updated the link to publicly available tracks for these runs.
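
As an illustration only, the "single set of peaks" step described in this response can be sketched as follows. This is not the pipeline used in the study: the function names, sample labels, and coordinates are hypothetical, and the statistical test itself (for example with DESeq) would be run separately on the resulting count matrix. The sketch pools peak calls from all samples, merges overlapping or touching intervals into one consensus set, and counts read positions per consensus peak for each sample.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]   # (chromosome, start, end), half-open interval
Read = Tuple[str, int]        # (chromosome, position), hypothetical simplified read


def consensus_peaks(samples: Dict[str, List[Peak]]) -> List[Peak]:
    """Pool peaks from every sample and merge overlapping or touching intervals."""
    pooled = sorted(p for peaks in samples.values() for p in peaks)
    merged: List[Peak] = []
    for chrom, start, end in pooled:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous consensus peak instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def count_matrix(consensus: List[Peak],
                 reads: Dict[str, List[Read]]) -> Dict[str, List[int]]:
    """Count, per sample, how many read positions fall inside each consensus peak."""
    counts: Dict[str, List[int]] = {}
    for sample, sample_reads in reads.items():
        row = [0] * len(consensus)
        for chrom, pos in sample_reads:
            for i, (p_chrom, start, end) in enumerate(consensus):
                if chrom == p_chrom and start <= pos < end:
                    row[i] += 1
                    break  # consensus peaks do not overlap, so one match is enough
        counts[sample] = row
    return counts


# Made-up example input: peak calls and read positions for two hypothetical samples.
example_peaks = {
    "wt_rep1": [("chr1", 100, 200), ("chr2", 50, 80)],
    "mut_rep1": [("chr1", 150, 260), ("chr1", 400, 450)],
}
example_reads = {
    "wt_rep1": [("chr1", 120), ("chr1", 250), ("chr2", 60)],
    "mut_rep1": [("chr1", 410), ("chr1", 90)],
}
example_consensus = consensus_peaks(example_peaks)
example_counts = count_matrix(example_consensus, example_reads)
```

Because every sample is counted against the same consensus intervals, the per-peak counts are directly comparable across genotypes, which is what a count-based differential test requires. The nested loop is kept for readability; a real dataset would call for an interval index or a dedicated counting tool.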

6. A ChIP-seq experiment of the new proteins appears to be very interesting, but it is basically not described at all. How many peaks were found? Do they resemble each other? How reproducible was the data? A motif-based analysis appears to be very superficial given how instrumental these data (if solid) can be.

Please see response to Comment #3 above.

7. There's a mistake in c-fos In situ hybridization experiment location, which is in extended data Figure 4E, and not in Figure 3f (where it is written now).

Thank you for catching this error. We have corrected this mistake, and have indicated the correct experiment location, now in Figure 3 —figure supplement 3D.

8. In figure 2d – is the phenotype of linc-mipep-/- vs. linc-mipep+/+ fish (1st vs. 3rd) here significant? If yes – show the p-value. If not – how is this explained?

The phenotype of linc-mipep-/- vs. WT fish in Figure 2D is significant (p = 0.031, Dunnett’s test). We have now included the P values in the graph.

9. The statement that genes with ribosome-protected fragments are likely encoding functional proteins is not always correct and this part should be explained in more detail.

We have added a more detailed explanation, in the Results section (page 3) and Discussion section (page 11).

10. In the description of the single-cell datasets, please indicate fold-changes in differences of representation (e.g., for reduction of olig2+ oligodendrocyte progenitor cells across the brain).

We have added a more detailed explanation, in the Results section (page 3) and Discussion section (page 11).

Reviewer #3 (Recommendations for the authors):

1. The authors should show ribosome profiling data together with the gene structure of examined transcript (ideally, supported by RNA-seq) to visualize the position of ribosome-protected regions within the transcripts (Extended data Figure 1a and Figure 1d). The sequence analyses reveal the similarity between linc-mipep and linc-wrb and should be presented as it is an important finding. The authors should indicate the (expected/predicted) size of both peptides; it was not mentioned in the manuscript.

We now include tracks with ribosome footprints and ribosome-depleted RNA-seq above the gene structure of each examined transcript, in Figure 1 —figure supplement 5D and E. We also highlight that sequence analyses reveal similarity between linc-mipep and linc-wrb as an important finding in the Results section (page 4), with extended evolutionary analyses presented in Figure 2 —figure supplements 3 – 6 and Supplementary Table 2. We have now added the size of both peptides to the text (pages 3 and 4) and directly in Figure 2E.

2. The authors should elaborate on the expression of the examined transcripts/peptides during embryogenesis (i.e., are they expressed at 5dpf only or earlier/later) and in adult tissues.

linc-mipep and linc-wrb transcripts are expressed from the 1-cell zygote stage onward, throughout early development, with protein expression assessed starting at 4 hpf through 5-6 dpf.

Location of new data: We have included data on the expression of both transcripts, in Figure 1 —figure supplement 5F and G. We also include whole embryo antibody staining for each of the proteins, in Figure 2 —figure supplement 2. We did not examine adult tissues, as we focused our studies on early (neuro)developmental stages.

3. The different genetic alleles generated for linc-mipep and linc-wrb should be confirmed by DNA sequencing chromatographs; the expression of the linc-mipep and linc-wrb transcripts in the mutants should be confirmed by qRT-PCR as sometimes even small deletions can lead to destabilization or overexpression of the remaining transcripts. This is particularly important for the mutants that show behavioral deviations from wt animals.

Please see response to Comment #3 above.

4. In an elegant rescue experiment, the authors demonstrate that CDS of linc-mipep can rescue zebrafish locomotion hyperactivity phenotype. A control experiment with a construct expressing a frameshifted peptide should be included. From the presentation in Figure 2a, the peptide was tagged with FLAG-HA. Can the expression of the peptide be detected by Western blot/immunostaining? Have the authors tried to rescue the phenotype with human HMGN1?

Please see response to Comment #6.

5. A question related to the comment above: is it possible to detect native, untagged peptides by mass spectrometry? Have the authors tried to do it?

We have not attempted to perform mass spectrometry, though we expect the peptides to be detectable, as we detect them by antibody staining that is absent in the mutant embryos. We include a point in the discussion (page 11) about additional approaches beyond ribosome profiling, including mass spectrometry, to identify small peptides.

6. The manuscript would gain on clarity if a more detailed description of the behavioral assays used as a functional read-out was included in the main text. In general, the manuscript is partially hard to follow due to the insufficient data presentation, peptide size, peptide sequences, etc.

We have provided more detailed descriptions throughout the main text of the manuscript, specifically about behavioral assays (page 2), and have provided additional supporting information throughout the main figures and figure supplements about the peptide sizes (87aa and 93aa) in Figure 2E, peptide sequences in Figure 2E, Figure 2 —figure supplement 4, and Supplementary Table 2, and protein expression patterns in Figure 2 —figure supplement 2 and in Figure 4 —figure supplement 4.

7. The authors should elaborate on why they used a single linc-mipep mutant for the drug experiments but a double mutant for omni-ATAC experiments.

We elaborate in the Results section that we used single linc-mipep mutants for drug experiments, as we had found similar drugs that correlated with linc-mipep and linc-wrb mutant fingerprints (which we now include as data). To ensure we assessed the full loss-of-function of these two related genes, we performed omni-ATAC-seq experiments in double mutants.

Detailed response: In this revised manuscript, we now present data showing that NMDA receptor antagonism is a common pathway affected in linc-wrb mutants (Figure 3 —figure supplement 1B and 2D-G). To circumvent batch effects from unmatched (non-sibling) samples, and because our results so far indicated generally overlapping functions for linc-mipep and linc-wrb, we chose to analyze linc-mipep mutant brain cells and validate findings in vivo in linc-mipep; linc-wrb double mutants. We describe this rationale on pages 6 and 7. We clarify in the Discussion section (pages 10-11) that further work will be needed to elucidate the overlapping and unique molecular roles of the proteins encoded by linc-mipep and linc-wrb.

Location of new data: We present the linc-mipep or linc-wrb mutants’ correlating fingerprints in Figure 3—figure supplement 1A and B, and note overlapping hits in blue text. We also present the results of linc-wrb mutants treated with either flumethasone or L-701-324 in Figure 3 —figure supplement 2D-G.

8. The authors should clearly state in the discussion that the molecular mechanisms of action of both studied peptides remain completely unknown. For example, how do they affect chromatin accessibility? What are their interaction partners if any? etc

We have elaborated in the discussion that the molecular mechanisms of these proteins, both direct and indirect, remain unknown (pages 10-11). We provide references on work done on the related Hmgn1 in mammals (page 1), and state that future work will be needed to fully elucidate the molecular mechanisms and binding/interaction partners for each protein in zebrafish.

Reviewer #4 (Recommendations for the authors):

The manuscript can be significantly improved by addressing the following concerns:

Concerns and suggestions:

One of the main conclusions from this study is that both micropeptides act together/somewhat redundantly, which would explain why knocking out both peptides has a stronger phenotype than knocking out either peptide individually. While this is a possibility (that they act redundantly, targeting the same regions in the genome), other scenarios are possible, e.g. that they have distinct or only partially overlapping chromatin targets and thus regulate different genes/pathways, which in the end converge on the same behavioral phenotype.

To reconcile this, the rescue with linc-mipep should be attempted for the double mutant and also the single linc-wrb mutant (since it is a ubiquitous overexpression line, it may rescue both). Similarly, a rescue by linc-wrb (which is not shown, also not for the single mutant) would be important to support the conclusion that the phenotype is due to loss of this peptide, and that it acts redundantly with linc-mipep. Moreover, it will also be important to quantify and provide statistics for the overexpression effect of the rescue construct in the WT background – is there a significant activity decrease by linc-mipep OE? Overall, the authors mention the dosage-sensitivity of HMGN1 proteins, but with the current analyses fail to provide convincing evidence of a clear dosage effect of the two peptides since they could potentially target different, only in part redundant, genes or have different effects in different cell types. To this end, the use of either the single linc-mipep vs double linc-mipep/linc-wrb mutant is inconsistent in the second half of the manuscript: global ATAC-Seq data is only provided from the double mutant while single-cell-analyses are only provided from the single linc-mipep mutant. Moreover, the ChIP-seq analyses provided are only summarized for both proteins combined in the main Figure, but used individual antibodies, leaving it unclear how the individual profiles look (the authors should follow the standard convention on how to show the quality of ChIP-seq data, e.g. provide ChIP-seq tracks at least for some example genes since the quality of the data remains unclear, and differences between the two Abs cannot be assessed; the Suppl Table 8 also only provides a combined list of 37 genes for which ChIP seq peaks were identified though it would be important to show it individually for each AB; also the number of genes bound appears really really small? Are these ALL genes with a ChIP-seq peak?).

We have addressed parts of this comment in Comments #3, 6, and 7 above. Please see below for copied responses per comment section:

Comment 1a: One of the main conclusions from this study is that both micropeptides act together/somewhat redundantly, which would explain why knocking out both peptides has a stronger phenotype than knocking out either peptide individually. While this is a possibility (that they act redundantly, targeting the same regions in the genome), other scenarios are possible, e.g. that they have distinct or only partially overlapping chromatin targets and thus regulate different genes/pathways, which in the end converge on the same behavioral phenotype.

To reconcile this, the rescue with linc-mipep should be attempted for the double mutant and also the single linc-wrb mutant (since it is a ubiquitous overexpression line, it may rescue both). Similarly, a rescue by linc-wrb (which is not shown, also not for the single mutant) would be important to support the conclusion that the phenotype is due to loss of this peptide, and that it acts redundantly with linc-mipep. Moreover, it will also be important to quantify and provide statistics for the overexpression effect of the rescue construct in the WT background – is there a significant activity decrease by linc-mipep OE?

See Essential revisions comment 7.

Comment 1b: Overall, the authors mention the dosage-sensitivity of HMGN1 proteins, but with the current analyses fail to provide convincing evidence of a clear dosage effect of the two peptides since they could potentially target different, only in part redundant, genes or have different effects in different cell types. To this end, the use of either the single linc-mipep vs double linc-mipep/linc-wrb mutant is inconsistent in the second half of the manuscript: global ATAC-Seq data is only provided from the double mutant while single-cell-analyses are only provided from the single linc-mipep mutant.

To circumvent batch effects from unmatched (non-sibling) samples, and because our results so far indicated generally overlapping functions for linc-mipep and linc-wrb, we chose to analyze linc-mipep mutant brain cells and validate findings in vivo in linc-mipep; linc-wrb double mutants. We describe this rationale on pages 6 and 7. We clarify in the Discussion section (pages 10-11) that further work will be needed to elucidate the overlapping and unique molecular roles of the proteins encoded by linc-mipep and linc-wrb.

Comment 1c: Moreover, the ChIP-seq analyses provided are only summarized for both proteins combined in the main Figure, but used individual antibodies, leaving it unclear how the individual profiles look (the authors should follow the standard convention on how to show the quality of ChIP-seq data, e.g. provide ChIP-seq tracks at least for some example genes since the quality of the data remains unclear, and differences between the two Abs cannot be assessed; the Suppl Table 8 also only provides a combined list of 37 genes for which ChIP seq peaks were identified though it would be important to show it individually for each AB; also the number of genes bound appears really really small? Are these ALL genes with a ChIP-seq peak?).

See Essential revisions comment 3.

The second major concern relates to the unclear link between the different phenotypes observed: how can the behavioral phenotypes be reconciled with the molecular phenotypes (chromatin accessibility in specific neurons or precursors), and how can the chromatin accessibility differences in WT vs mutant be reconciled with the measured transcriptional/gene expression differences? Is there any evidence for NMDA being downstream of linc-mipep/wrb regulation? I applaud the authors on generating all these interesting data sets and analyses, but without connecting them together (here the focus for example on just the single linc-mipep mutant would be helpful, but the global brain ATAC-Seq data is only shown for the double mutant; and vice-versa, the single-cell ATAC-Seq data with the chromatin accessibility changes detected in specific cell types is not linked back to the ChIP-seq profiles of the peptides). Do glial progenitors and OPCs of the mutant(s) have altered expression of the underlying loci with altered accessibility? In Figure 4c, e, f, h, and Extended 6c-e, how does the chromatin accessibility translate to rna level in Purkinje cells and radial glia cells? How many sites lose accessibility in OPCs? Is "broad loss" a fair assessment of the observation?

Without addressing the two major concern points, the statement that linc-mipep and linc-wrb 'broadly regulate the chromatin state of neural cell types, most impacting OPCs and cerebellar granule cell gene expression networks and cell states in a basal vertebrate' appears overstated and would need to be phrased differently/softened.

In this revision, we have now more fully analyzed and characterized the molecular phenotypes to link them with the behavioral phenotypes. In these analyses, we more deeply connect the single-cell analyses with bulk chromatin accessibility phenotypes. To address most of this comment, we refer to our response to a very similar point made by another reviewer, which we believe addresses the points made in this comment. We note that for single-cell experiments with sparse data, linking accessible regions that may act as enhancers to the genes those enhancers affect is a major challenge in the field, yet in our analyses we have shown how some of the cell type-specific transcriptomic changes show neighboring chromatin accessibility changes. We also note that we have modified our language to include a “loss” of accessibility (in the most statistically significantly affected peaks) in OPCs instead of “broad loss,” to more accurately present these results.

See also Reviewer 1 comment 1.

Without addressing the two major concern points, the statement that linc-mipep and linc-wrb 'broadly regulate the chromatin state of neural cell types, most impacting OPCs and cerebellar granule cell gene expression networks and cell states in a basal vertebrate' appears overstated and would need to be phrased differently/softened.

We have adjusted the language as suggested.

https://doi.org/10.7554/eLife.82249.sa2

Article and author information

Author details

  1. Valerie A Tornini

    Department of Genetics, Yale University, New Haven, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    For correspondence
    valerie.tornini@yale.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2877-6057
  2. Liyun Miao

    Department of Genetics, Yale University, New Haven, United States
    Contribution
    Data curation, Formal analysis, Investigation, Methodology
    Contributed equally with
    Ho-Joon Lee
    Competing interests
    No competing interests declared
  3. Ho-Joon Lee

    1. Department of Genetics, Yale University, New Haven, United States
    2. Yale Center for Genome Analysis, Yale University, New Haven, United States
    Contribution
    Software, Formal analysis, Investigation, Visualization, Methodology
    Contributed equally with
    Liyun Miao
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3616-5387
  4. Timothy Gerson

    Department of Genetics, Yale University, New Haven, United States
    Contribution
    Data curation, Investigation
    Contributed equally with
    Sarah E Dube
    Competing interests
    No competing interests declared
  5. Sarah E Dube

    Department of Genetics, Yale University, New Haven, United States
    Contribution
    Data curation, Methodology
    Contributed equally with
    Timothy Gerson
    Competing interests
    No competing interests declared
  6. Valeria Schmidt

    Department of Genetics, Yale University, New Haven, United States
    Contribution
    Data curation, Investigation
    Competing interests
    No competing interests declared
  7. François Kroll

    Department of Cell and Developmental Biology, University College London, London, United Kingdom
    Contribution
    Software, Formal analysis, Visualization, Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9908-2648
  8. Yin Tang

    Department of Genetics, Yale University, New Haven, United States
    Contribution
    Formal analysis, Investigation, Methodology
    Competing interests
    No competing interests declared
  9. Katherine Du

    1. Department of Genetics, Yale University, New Haven, United States
    2. Department of Computer Science, Yale University, New Haven, United States
    Contribution
    Formal analysis, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
  10. Manik Kuchroo

    1. Department of Genetics, Yale University, New Haven, United States
    2. Department of Computer Science, Yale University, New Haven, United States
    Contribution
    Formal analysis, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
  11. Charles E Vejnar

    Department of Genetics, Yale University, New Haven, United States
    Contribution
    Resources, Software, Visualization
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7132-4534
  12. Ariel Alejandro Bazzini

    1. Stowers Institute for Medical Research, Kansas City, United States
    2. Department of Molecular & Integrative Physiology, University of Kansas School of Medicine, Kansas City, United States
    Contribution
    Data curation, Investigation, Visualization
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2251-5174
  13. Smita Krishnaswamy

    1. Department of Genetics, Yale University, New Haven, United States
    2. Department of Computer Science, Yale University, New Haven, United States
    Contribution
    Resources, Supervision, Methodology
    Competing interests
    Reviewing editor, eLife
  14. Jason Rihel

    Department of Cell and Developmental Biology, University College London, London, United Kingdom
    Contribution
    Resources, Software, Supervision, Visualization, Methodology, Writing – review and editing
    For correspondence
    j.rihel@ucl.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4067-2066
  15. Antonio J Giraldez

    1. Department of Genetics, Yale University, New Haven, United States
    2. Yale Stem Cell Center, Yale University School of Medicine, New Haven, United States
    3. Yale Cancer Center, Yale University School of Medicine, New Haven, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Project administration, Writing – review and editing
    For correspondence
    antonio.giraldez@yale.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6823-137X

Funding

Eunice Kennedy Shriver National Institute of Child Health and Human Development (K99HD105001)

  • Valerie A Tornini

Hartwell Foundation (Postdoctoral fellowship)

  • Valerie A Tornini

Wellcome Trust (217150/Z/19/Z)

  • Jason Rihel

Simons Foundation Autism Research Initiative

  • Antonio J Giraldez

National Institute of Mental Health (MH118554)

  • Antonio J Giraldez

Eunice Kennedy Shriver National Institute of Child Health and Human Development (HD100035)

  • Antonio J Giraldez

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Acknowledgements

We thank Dr. Shawna Hiley and Dr. Ilil Carmi for editorial and scientific input; Dr. Kaya Bilguvar, Christopher Castaldi, and Dr. Guilin Wang from the Yale Center for Genome Analysis for sequencing support; Dr. Kaelyn Sumigray for sharing the Leica confocal microscope; Dr. Mayssa Mokalled for sharing animal transgenic lines; Dr. Marcus Ghosh for code used in the F0 behavioural data analysis and for teaching F.K. the approach; and Dr. Sumru Bayin, Dr. Sarah Ackerman, and members of the Giraldez and Rihel labs for critical feedback. Research reported in this publication was supported by a K99/R00 Pathway to Independence Award from the US NIH Eunice Kennedy Shriver National Institute of Child Health and Human Development (K99HD105001) and a fellowship from the Hartwell Foundation (VAT), Wellcome Trust Investigator Award 217150/Z/19/Z (JR), and a Simons Foundation grant and NIH grants R01 HD100035 and MH118554 (AJG). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or any funding sources. We acknowledge the Zebrafish Information Network (ZFIN).

Ethics

Fish lines were maintained in accordance with the AAALAC research guidelines, under a protocol approved by the Yale University Institutional Animal Care and Use Committee (IACUC Protocol Number 2021-11109). We have complied with all relevant ethical regulations under this protocol.

Senior Editor

  1. Marianne E Bronner, California Institute of Technology, United States

Reviewing Editor

  1. Filippo Del Bene, Institut de la Vision, France

Version history

  1. Preprint posted: July 22, 2022
  2. Received: July 28, 2022
  3. Accepted: April 14, 2023
  4. Version of Record published: May 16, 2023 (version 1)

Copyright

© 2023, Tornini et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Valerie A Tornini, Liyun Miao, Ho-Joon Lee, Timothy Gerson, Sarah E Dube, Valeria Schmidt, François Kroll, Yin Tang, Katherine Du, Manik Kuchroo, Charles E Vejnar, Ariel Alejandro Bazzini, Smita Krishnaswamy, Jason Rihel, Antonio J Giraldez (2023)
linc-mipep and linc-wrb encode micropeptides that regulate chromatin accessibility in vertebrate-specific neural cells
eLife 12:e82249.
https://doi.org/10.7554/eLife.82249
