Neural microexon program in zebrafish.

A) Heatmap showing relative inclusion levels of neural microexons identified in zebrafish (Table S1). The value for each tissue type corresponds to the average inclusion in the samples for that tissue in VastDB (Table S1). Only events with sufficient read coverage in >10 tissue groups are plotted (N = 157) and missing values were imputed. B) Inclusion of neural microexons along embryo development (egg to 7 days post-fertilization [dpf]). The thick line corresponds to the median PSI and the shaded area to the first and third quartiles of the PSI distribution. C) Enriched Gene Ontology (GO) categories among genes harboring neural microexons in zebrafish. GO terms were grouped into networks by ClueGO. D) Evolutionary conservation of zebrafish neural exons of different length groups at the genomic and tissue-regulatory level compared with human. Exons conserved at the regulatory level (blue) are those with enriched inclusion in neural samples (ΔPSI ≥ 15) also in human. Those with no neural regulation (black) are conserved at the genomic level (Irimia, et al. 2009), but are not neurally enriched. Those conserved but with insufficient coverage to assess regulation are indicated as “No coverage” (dark grey). P-value corresponds to a two-sided Fisher’s Exact test for neural conservation vs. others between exons > 51 nts and 3-27 nts. E) Distribution of exons of different length groups by the level of srrm3/4 misregulation in larva or retina (see Methods). P-value corresponds to a two-sided Fisher’s Exact test for non-regulated exons (ΔPSI ≥ -15) vs. others between exons > 51 nts and 3-27 nts. F) Change in inclusion levels [ΔPSI (eMIC mutant-WT)] for all exons shorter than 300 bp. Dots in different shades of blue correspond to neural exons of different length groups.
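
As a rough illustration of the two-sided Fisher’s Exact tests mentioned for panels D and E, the sketch below computes the p-value for a 2x2 table using only the Python standard library. The function name and the example counts are hypothetical and are not taken from the data; the software used for the actual analysis is not stated in this legend.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for instance, two exon length groups and columns
    "neural-conserved" vs "others" (hypothetical layout).
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties)
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For the hypothetical table `[[3, 1], [1, 3]]`, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70 (about 0.486).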

Selected neural microexons and experimental design.

A) Inclusion levels of the 21 selected conserved microexons across zebrafish tissues as well as differential inclusion in neural vs other tissues (ΔPSI neural) in zebrafish and human, change in inclusion in response to eMIC depletion in zebrafish (ΔPSI eMIC) or eMIC overexpression in human (ΔPSI eMIC OE), and change in inclusion between ASD patients and control individuals (ΔPSI ASD). The three microexons with asterisks were excluded from phenotypic analyses. B) Schematic representation of the CRISPR-Cas9 based deletions of individual microexons. A pair of guide RNAs flanking each microexon was designed; the resulting deletion is expected to lead to normal gene expression without the microexon in all cells. C) Examples of two RT-PCRs testing the inclusion of the targeted microexon upon CRISPR-Cas9 removal. itsn1_1 shows the expected clean deletion in the homozygous fish, while pus7_1 exhibits inclusion of a longer cryptic sequence (red block) (see Figure 2 - figure supplement 1). D) Schematic representation of srrm3 and srrm4 protein domains and the impact of the CRISPR-Cas9 derived mutations (from Ciampi, et al. 2022). E) Distribution of body lengths at 30 dpf for heterozygous (green) and homozygous (yellow) fish for each microexon deletion or the srrm4 mutation. For srrm3, the values are shown for 90 dpf. P-value corresponds to a two-sided t-test. F) Top: two representative images from WT and srrm3 homozygous mutant larvae showing staining for 3A10 in Mauthner cells (schematized on the right side). Bottom: number of larvae with normal or altered Mauthner cell morphology among homozygous mutants for each microexon or regulator line.

Impact of microexon deletion on neurite outgrowth.

A) Schematic representation of the experimental design used to assess neurite outgrowth in zebrafish neuronal primary cultures (see Methods). B) Confocal images of example microexon deletions at 24 hours after plating (hap). Images were taken at 20x magnification; white squares indicate the region enlarged in the right panel (digital zoom). Scale bar: 50 µm. C) Heatmap showing the median percent of change in neurite length at 10 and 24 hap of the homozygous mutant with respect to the matched control neurons (HuC:GFP line) for each main microexon deletion line (data for all tested lines in Figure 3 - figure supplement 2). Significance is based on the median of the p-value distribution of 10,000 bootstrap resampling Wilcoxon tests for each main founder. * 0.001 ≤ p < 0.01, ** 0.0001 ≤ p < 0.001, *** p < 0.0001. D) Boxplots of the distribution of the length of the longest neurite (one neuron per data point) for both lines of evi5b and vav2, as well as the length distributions for neurons of WT and homozygous deletion (Del) siblings for the main founders. For the regulators, neurite length distributions for neurons of WT and homozygous mutant (Del) siblings are shown for srrm3, srrm4 and the double mutant srrm3/4, for which the control sibling corresponds to the WT of srrm3 but homozygous mutant for srrm4 (i.e., srrm4 Del). P-values correspond to ANOVA tests for evi5b and vav2 founders and, for siblings, to the median of 10,000 bootstrap Wilcoxon tests. * 0.001 ≤ p < 0.01, ** 0.0001 ≤ p < 0.001, *** p < 0.0001. E) Schematic representation of the domain architecture of VAV2 in zebrafish and region encoded by the microexon and upstream and downstream exons (upper blocks). F) Luciferase readout of the activation of SRF upon ectopic expression of zebrafish VAV2 proteins with and without the microexon, a negative control and an oncogenic VAV2 mutant protein in COS7 cells. P-values correspond to Student’s t-test after Holm-Sidak’s multiple test correction. G) Western blot for each overexpressed protein. H) Schematic representation of the domain architecture of EVI5B in zebrafish and region encoded by the microexon and upstream and downstream exons (upper blocks). Microexon inclusion/exclusion leads to different C-termini. I) NanoBRET quantification of protein-protein interaction between EVI5B with and without the microexon and RAB11 proteins from zebrafish.
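
The bootstrap-based significance described for panels C and D (median of the p-value distribution over resampled Wilcoxon tests) can be sketched as follows. This is a minimal illustration under several assumptions that the legend does not specify: each group is resampled with replacement to its original size, the Wilcoxon rank-sum p-value uses a normal approximation without tie correction, and the function names are hypothetical.

```python
import math
import random
from statistics import median


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1..j for tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def bootstrap_median_p(control, mutant, n_boot=10_000, seed=0):
    """Median p-value over n_boot bootstrap resamples of both groups."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))  # resample with replacement
        m = rng.choices(mutant, k=len(mutant))
        pvals.append(rank_sum_p(c, m))
    return median(pvals)
```

Here `control` and `mutant` would be lists of neurite lengths (one value per neuron); taking the median p-value rather than a single test makes the reported significance less sensitive to individual outlier neurons.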

Impact of microexon misregulation in larval activity and response to stimuli.

A) Experimental design to assess larval activity patterns and response to stimuli using the DanioVision system and the EthoVision XT - Video tracking software. The protocol we implemented consists of 5’ of habituation, 25’ of baseline recording, five alternating 10’ intervals of dark-light, 10’ of re-habituation and the tapping experiment (30 taps at 1 Hz). B) Heatmap showing the percent of change with respect to the WT value for the homozygous (Del) of each main microexon deletion line as well as the single and double regulator mutants for features related to activity and response to stimuli (full plots in Figure 4 - figure supplement 1-14). For visualization purposes, the percent change for median WT values equal to zero was computed using the minimum median WT value of the relative category across founders. P-values correspond to the median P-value from 100 permutation tests selecting 10 observations per genotype across replicates of the main founder. C) Left: Activity (percentage of Δpixels/min) plots for a representative clutch of srrm3 and evi5b main lines. Traces represent mean ± SEM across larvae from the same clutch and genotype. Dark and light periods are shown with gray or white background, respectively. Right: boxplots showing the distribution of the mean baseline activity for each larva of each genotype. P-values correspond to two-sided Wilcoxon Rank-Sum tests of each genotype against the WT. D) Left: Activity (percentage of Δpixels/min) plots for a representative clutch of srrm3/4 double mutant line. Traces represent mean ± SEM across larvae from the same clutch and genotype. Dark and light periods are shown with gray or white background, respectively. Right: boxplots showing the mean difference in activity during the dark-to-light transition for each larva of each genotype. P-values correspond to two-sided Wilcoxon Rank-Sum tests. E) Left: Activity (percentage of Δpixels/sec) after each of the 30 consecutive taps for a representative clutch of the srrm3/4 double mutant line. Traces represent mean ± SEM across larvae from the same clutch and genotype. F) Left: schematic representation of a well, divided between center and periphery regions. Right: two representative tracks of WT and Del srrm3/4 larvae for 60 sec at baseline. G) Boxplots showing the median percentage total distance moved (TDM) (mm) in the periphery for larvae of different genotypes of the srrm3/4 double mutant line, under baseline, dark and light conditions. P-values correspond to two-sided Wilcoxon Rank-Sum tests of each genotype against the WT. For simplicity, in D-G, WT denotes srrm3+/+,srrm4-/- fish and Het srrm3+/+,srrm4+/- ones. For all panels: * 0.001 ≤ p < 0.05, ** 0.0001 ≤ p < 0.001, *** p < 0.0001.
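
The subsampled permutation scheme described for panel B (median P-value from 100 permutation tests, each on 10 observations per genotype) could look roughly like the sketch below. Several details are assumptions made for illustration and are not stated in the legend: the test statistic (absolute difference in means), the number of label shuffles per test (`n_perm`), sampling without replacement, and all function names.

```python
import random
from statistics import median


def permutation_p(a, b, n_perm, rng):
    """Two-sided permutation p-value for the absolute difference in means."""
    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign genotype labels
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)


def median_subsampled_p(wt, mut, n_tests=100, n_obs=10, n_perm=1000, seed=0):
    """Median p-value over n_tests permutation tests on random subsamples."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        a = rng.sample(wt, n_obs)   # n_obs observations per genotype,
        b = rng.sample(mut, n_obs)  # pooled across replicates (assumed)
        pvals.append(permutation_p(a, b, n_perm, rng))
    return median(pvals)
```

Drawing a fixed number of observations per genotype keeps the test comparable across lines with different numbers of larvae, at the cost of some power for the larger data sets.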

Impact of microexon misregulation on social behavior.

A) Schematic representation of the behavioral station (details in Figure 5 - figure supplement 1) and experimental design for each tested pair of 30 dpf juveniles. B) Heatmap showing the percent of change with respect to the control value for each main microexon and regulator deletion line for nine different parameters related to locomotion, anxiety or social behavior (full plots in Figure 5 - figure supplement 2-10). For parameters referring to individuals (locomotion, anxiety and leadership), two comparisons are shown: the average of the fish from Del-Del pairs vs the control Het-Het pairs (homotypic pairs, same “S”), or the value of the Del fish with respect to the Het one within each Het-Del pair (heterotypic pairs, different “D”). For group parameters (polarization order and distance to neighbors), the following two comparisons are shown: Het-Het vs Het-Del (HM) and Het-Het vs Del-Del (HD). C) Relative position map showing the position of the other fish of the pair (right genotype) with respect to the focal fish located at the center of the map (left genotype). For instance, (+/-) vs (-/-) shows the position of the Del fish with respect to the focal Het one in a Het-Del pair. The merged plots of all fish pairs for vti1a, kif1b and srrm3 are shown. D) Boxplots for the fish pairs shown in (C). Left: ratio in front of the non-focal fish. Lower values indicate higher leadership (i.e., more time in front). “Same” corresponds to either Het-Het or Del-Del pairs and “Different” to Het-Del pairs, with values of individual fish plotted by genotype. Right: median of the distances between the two fish throughout the time course for each genotype pair combination. P-values correspond to Wilcoxon Rank-Sum tests for the indicated comparisons. * 0.001 ≤ p < 0.01, ** 0.0001 ≤ p < 0.001, *** p < 0.0001.

Transcriptomic analyses of 5 dpf larvae suggest potential compensatory changes.

A) Schematic representation of the experimental design. B) Distribution of changes in gene expression (Δ corrected VST expression) between the Del and WT larvae for each main microexon deletion line, as well as the change of the host gene (barplots) and of closely related paralogs (chordate or younger origin according to Biomart) (dots). The color scale indicates the percentile of the expression change within the overall distribution. Asterisks on bars indicate a change in the bottom or top decile. C) Left: Gene Ontology (GO) categories (dots) clustered in related groups (rows) that are enriched among genes globally changing upon individual microexon deletions. Middle: Normalized Enrichment Scores (NES; X-axis) and adjusted p-values (color code) for the same GO categories in the comparison of srrm3 mutant larvae and WT siblings. Right: heatmap showing the NES for the gene sets comprising the union of all GO categories within each group for each specific microexon comparison. Stars indicate adjusted p-value < 0.01. D) NES for specific gene sets (Table S7) with relevance in neurobiology and/or development for each microexon deletion comparison.

Identification and validation of neural microexons in zebrafish.

A) Distribution of neural exons in zebrafish and human according to their length group. B) RT-PCR assays assessing the inclusion of 21 selected microexons across zebrafish tissues. Primer sequences are provided in Table S4. C) Evolutionary conservation of human neural exons of different length groups at the genomic and tissue-regulatory level compared with zebrafish. Exons conserved at the regulatory level (blue) are those with enriched inclusion in neural samples (ΔPSI ≥ 15) also in zebrafish. Those with no neural regulation (black) are conserved at the genomic level (Irimia, et al. 2009), but are not neurally enriched. Those conserved but with insufficient coverage to assess regulation are indicated as “No coverage” (dark grey). P-value corresponds to a two-sided Fisher’s Exact test for neural conservation vs. others between exons > 51 nts and 3-27 nts. E) Distribution of exons of different length groups by the level of upregulation upon overexpression of the regulator in HEK cells (see Methods). P-value corresponds to a two-sided Fisher’s Exact test for non-regulated exons (ΔPSI ≤ 15) vs. others between exons > 51 nts and 3-27 nts.

Validation of microexon deletion lines.

RT-PCR validations of embryo siblings for the main founder of each zebrafish microexon deletion line and genotype are shown. Wildtype (+/+), heterozygous (+/-) and homozygous (-/-). Red font indicates the three lines where the deletion of the microexon leads to the inclusion of cryptic sequences, potentially producing a loss of function of the gene. These microexons were excluded from the subsequent analyses.

Zebrafish neuronal primary culture.

A) Schematic summary of the experimental steps in the generation of zebrafish neuronal primary cultures from FACS-sorted HuC:GFP cells. B) Confocal images of immunofluorescence against GFP and acetylated tubulin to study zebrafish neuronal outgrowth over time (4, 8, 10, 24, 48, 72 hours after plating [hap]). Top panel: 20x confocal images. Bottom panel: digital zoom to show details of neural development progression. Red asterisks mark the two timepoints selected for this study. Scale bars: 50 µm. C) Manual quantification of neurite length in the corresponding confocal images.

Quantification of neurite outgrowth across zebrafish mutant lines.

A) Boxplots showing neurite length distributions at 10 and 24 hap for each founder of the microexon deletion lines. The main founder (shown in Figure 3C) is highlighted in bold. Red boxplots correspond to neurons from a Del x Del cross for each microexon line. Dark boxplots correspond to matched control neurons from WT HuC:GFP fish processed in parallel in each experiment (see Methods). B) Boxplots showing neurite length distributions at 10 and 24 hap for the main founder of the microexon deletion lines. The second founder was not tested since no significant differences were found in any of the initial replicates with the exception of reln, for which no second founder could be obtained. C) Boxplots showing neurite length distributions at 10 and 24 hap for siblings of WT (blue), heterozygous (yellow) or homozygous (red) genotype. These samples were generated through Het x Het crosses, fin clip genotyping of each larva and processing of pools of larvae of the same genotype. P-values correspond to ANOVA tests for each of the groups. * 0.001 ≤ p < 0.01, ** 0.0001 ≤ p < 0.001, *** p < 0.0001.

Larval activity over the time course of the experiment.

Activity (percentage of Δpixels/min) for each microexon and regulator deletion line and biological replicate (clutch). Traces represent mean ± SEM across larvae from the same clutch and genotype. Dark and light periods are shown with gray or white background, respectively.

Activity response and habituation to tapping stimuli.

Log-scaled activity (percentage of Δpixels/sec) at each of the 30 consecutive taps for each microexon and regulator deletion line and biological replicate (clutch). A +1 pseudocount was added to the activity data for plotting purposes only. Traces represent mean ± SEM across larvae from the same clutch and genotype.

Baseline activity.

Boxplots showing baseline (B) activity (percentage of Δpixels/min) for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, ** 0.0001 ≤ p < 0.001, *** p < 0.0001.

Activity during the light intervals.

Boxplots showing activity (percentage of Δpixels/min) during the light intervals for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, *** p < 0.0001.

Activity during the dark intervals.

Boxplots showing activity (percentage of Δpixels/min) during the dark intervals for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, ** 0.0001 ≤ p < 0.001.

Activity during the dark-to-light transitions.

Boxplots showing activity (percentage of Δpixels/min) during the dark-to-light (DL) transitions for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, ** 0.0001 ≤ p < 0.001.

Activity during the light-to-dark transitions.

Boxplots showing activity (percentage of Δpixels/min) during the light-to-dark (LD) transitions for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, *** p < 0.0001.

Median activity after the first tap.

Boxplots showing activity (percentage of Δpixels/min) after the first tap for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, ** 0.0001 ≤ p < 0.001.

Difference in activity between the first tap and taps 3-5.

Boxplots showing the change in activity in the first tap vs. taps 3-5 for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05.

Difference in activity between the first tap and taps 21-30.

Boxplots showing the change in activity in the first tap vs. taps 21-30 for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, ** 0.0001 ≤ p < 0.001.

Thigmotaxis during the baseline condition.

Boxplots showing activity (percentage of Δpixels/min) during the baseline condition for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05, *** p < 0.0001.

Thigmotaxis during the light intervals.

Boxplots showing activity (percentage of Δpixels/min) during the light intervals for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05.

Thigmotaxis during the dark intervals.

Boxplots showing activity (percentage of Δpixels/min) during the dark intervals for each microexon and regulator deletion line and biological replicate (clutch). P-values correspond to Wilcoxon Rank-Sum tests between WT and each of the other genotypes. * 0.001 ≤ p < 0.05.

Custom behavioral station.

A) Overview of the behavioral system, based on (Hinz and de Polavieja 2017). B-D) Details of the arena used to assess social interactions between pairs of juvenile fish.

Median speed by genotype for each main microexon and regulator line.

A) Median speed (cm/s) for each fish over the 10’ time course by pair type. “Same” corresponds to either Het-Het or Del-Del pairs, and each data point is the median of the pair, and “Different” to Het-Del pairs and each data point corresponds to an individual fish. None of the comparisons between Het and Del individuals were significant at a threshold of P < 0.01 using Wilcoxon Rank-Sum tests.

Median absolute normal acceleration by genotype for each main microexon and regulator line.

A) Median absolute normal acceleration (cm/s2) for each fish over the 10’ time course by pair type. “Same” corresponds to either Het-Het or Del-Del pairs, and each data point is the median of the pair, and “Different” to Het-Del pairs and each data point corresponds to an individual fish. None of the comparisons between Het and Del individuals were significant at a threshold of P < 0.01 using Wilcoxon Rank-Sum tests.

Median absolute tangential acceleration by genotype for each main microexon and regulator line.

A) Median absolute tangential acceleration (cm/s2) for each fish over the 10’ time course by pair type. “Same” corresponds to either Het-Het or Del-Del pairs and “Different” to Het-Del pairs. None of the comparisons between Het and Del individuals were significant at a threshold of P < 0.01 using Wilcoxon Rank-Sum tests.

Median distance traveled by genotype for each main microexon and regulator line.

A) Median distance traveled (cm) for each fish over the 10’ time course by pair type. “Same” corresponds to either Het-Het or Del-Del pairs and “Different” to Het-Del pairs. None of the comparisons between Het and Del individuals were significant at a threshold of P < 0.01 using Wilcoxon Rank-Sum tests.

Time in the periphery by genotype for each main microexon and regulator line.

A) Median normalized frames in periphery (from 0 to 1) for each fish over the 10’ time course by pair type. “Same” corresponds to either Het-Het or Del-Del pairs, and each data point is the median of the pair, and “Different” to Het-Del pairs and each data point corresponds to an individual fish. None of the comparisons between Het and Del individuals were significant at a threshold of P < 0.01 using Wilcoxon Rank-Sum tests.

Median normalized distance to origin by genotype for each main microexon and regulator line.

A) Median normalized distance to origin/center (from 0 to 1) for each fish over the 10’ time course by pair type. “Same” corresponds to either Het-Het or Del-Del pairs, and each data point is the median of the pair, and “Different” to Het-Del pairs and each data point corresponds to an individual fish. None of the comparisons between Het and Del individuals were significant at a threshold of P < 0.01 using Wilcoxon Rank-Sum tests.

Ratio neighbor in front by genotype for each main microexon and regulator line.

A) Median ratio neighbor (nb) in front (from 0 to 1) for each fish over the 10’ time course by pair type. “Same” corresponds to either Het-Het or Del-Del pairs and “Different” to Het-Del pairs. In both cases, each data point corresponds to an individual fish. None of the comparisons between Het and Del individuals were significant at a threshold of P < 0.01 using Wilcoxon Rank-Sum tests.

Polarization by genotype pair combination for each main microexon and regulator line.

A) Median value of the polarization order parameter (1 = fully concordant vectors) for each fish pair over the 10’ time course by pair type (Het-Het, Het-Del and Del-Del). P-values correspond to Wilcoxon Rank-Sum tests. *** p < 0.0001.
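
For reference, a common definition of the polarization order parameter is the norm of the mean unit velocity vector of the group, which equals 1 when all headings coincide and 0 when two fish swim in opposite directions. The sketch below assumes this definition; the exact formula and tracking software used for this figure are not given in the legend, and the function name is hypothetical.

```python
import math


def polarization(velocities):
    """Norm of the mean unit velocity vector for one video frame.

    `velocities` is a list of (vx, vy) tuples, one per fish.
    Returns 1.0 for fully concordant headings and NaN if no fish is moving.
    """
    sum_x = sum_y = 0.0
    n_moving = 0
    for vx, vy in velocities:
        speed = math.hypot(vx, vy)
        if speed == 0:
            continue  # heading is undefined for a motionless fish
        sum_x += vx / speed
        sum_y += vy / speed
        n_moving += 1
    if n_moving == 0:
        return float("nan")
    return math.hypot(sum_x, sum_y) / n_moving
```

Under this assumption, the per-pair value plotted would be the median of `polarization(...)` over all frames of the 10’ recording (for example, with `statistics.median`).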

Interindividual distance by genotype pair combination for each main microexon and regulator line.

A) Median distance to neighbor (cm) for each fish pair over the 10’ time course by pair type (Het-Het, Het-Del and Del-Del). P-values correspond to Wilcoxon Rank-Sum tests. *** p < 0.0001.

Batch correction of RNA-seq data.

Principal component analysis (PCA) on raw log10-transformed gene counts (left) and after (right) degradation correction, normalization and batch correction using the DegNorm, DESeq2 (VST) and WGCNA (empirical Bayes-moderated adjustment) R/Bioconductor packages, respectively.

Enriched GO categories.

Enrichment of individual GO categories from the category groups highlighted in Figure 6C for each microexon deletion. Each block is a group of related GO categories and is named after the underlined category. Values were obtained from Gene Set Enrichment Analyses (GSEA).