Dysmorphic features.

(A) Facial dysmorphic features. Individual 9, with variant c.1379delA, p.Lys460SerfsTer19, has a broad nasal tip, protruding cupped ears, a gingival cleft, and micrognathia. Individual 10, with c.1398+1del, affecting the exon 9-10 splice site, has a prominent forehead, deeply grooved philtrum, broad nose, and protuberant ear helices. Individual 11, with variant c.1419del, p.Glu474AsnfsTer5, has a high forehead, hypertelorism, telecanthus, high arched palate, ptosis, small mouth, long philtrum, malar hypoplasia, and hypomimia. Individual 12, with variant c.1420del, p.Glu474AsnfsTer5, has left ptosis, hypertelorism, a long philtrum, a broad nasal tip, and micro- and retrognathia. Individual 17, with variant c.1489dupA, p.Ile497AsnfsTer31, has mild ptosis, frontal bossing, telecanthus, a short nasal bridge, and a long philtrum. Individual 21 (c.1587_1588del, p.Ala530ArgfsTer38) has a wide forehead and a broad nasal ridge. Individual 23 (c.1804C>T, p.Gln602Ter) has almond-shaped eyes, slightly upslanting palpebral fissures, a hyperteloric appearance of the eyes with telecanthus, and a bulbous nose with a slightly bifid nasal tip. Individual 29 has a C-terminal variant (c.3435dupT, p.Ala1146CysfsTer15) inherited from his mother (not shown here); both have a triangular-shaped face with ptosis, hypertelorism or telecanthus, and a thin upper lip. (B) Hand and foot features. Individual 9 had bilateral mild radial deviation of the wrists and bilateral mildly overlapping toes. Individual 10 had camptodactyly of the distal interphalangeal (DIP) joints and a small hypothenar eminence contour. Individual 11 presented with a short proximal phalanx of his fifth finger with a lower base, as indicated by the position of the proximal interphalangeal crease. Individual 12 presented with clinodactyly of his fifth finger, hardly visible palmar creases, and a small hypothenar eminence contour. Individual 21 presented with long flat feet and increased sandal gaps. Individual 29 presented with an aberrant position of the base of the small finger and of the proximal interphalangeal joint, with a deviation of the hands and slender fingers, as seen for other individuals. (C) Individual 10 presented with circumferential skin folds on the upper (and lower) extremities and a prominent umbilicus (not shown). Individual 17 presented with mild laxity of the wrists, elbows, knees, and fingers, and out-toeing of his feet (suspected vertical talus), with folded skin and an umbilical hernia. (D) Individual 21 presented with abnormal teeth and dentition. (E) Individual 23 presented with a lobulated, slightly bifid uvula. His congenital pes planovalgus of both feet, right single palmar crease, and bilateral narrowing of the distal palms are not shown. (F) Overall frequencies of recurring (observed more than once) dysmorphisms, with calculated percentages that indicate the frequencies as a percentage of addressed individuals. For each feature, only individuals in whom that feature is explicitly known not to have been addressed are subtracted, for example when individuals are too young to assess certain phenotypes, or when complete assessment had not yet been performed at the time of writing. Abbreviation: PRS = Pierre Robin Sequence.

Phenotype frequencies.

Frequencies of recurring phenotype features, except dysmorphism features, with calculated percentages that indicate the frequencies as a percentage of addressed individuals. For each feature, only individuals in whom that feature is explicitly known not to have been addressed are subtracted, for example when individuals are too young to assess certain phenotypes, or when complete phenotype assessment had not (yet) been performed at the time of writing. General categories are used to calculate frequencies of related features. Specific features with their recurrence (in parentheses) are written out next to the calculated percentage of each bar. Abbreviations: ID = Intellectual Disability [OMIM: 156200], GDD = Global Developmental Delay, SD = Speech Disorder/Delay, ADHD = Attention Deficit/Hyperactivity Disorder, IQ = Intelligence Quotient, UTI = Urinary Tract Infection, CNS = Central Nervous System, SWH = Septal Wall Hypertrophy, ASD = Atrial Septal Defect, PDA = Patent Ductus Arteriosus [OMIM: 607411], VSD = Ventricular Septal Defect [OMIM: 614431], PFO = Patent Foramen Ovale, WHO = World Health Organization, IUGR = Intrauterine Growth Restriction, EIF = Echogenic Intracardiac Focus, PRS = Pierre Robin Sequence.
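The percentage calculation described in this legend can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and the example counts are hypothetical, and it assumes the denominator is the cohort size minus the individuals in whom the feature was explicitly not addressed.

```python
def frequency_percent(n_with_feature, n_total, n_not_addressed):
    """Percentage of addressed individuals who show a feature.

    Assumption: individuals explicitly not addressed for the feature
    are removed from the denominator; everyone else counts as addressed.
    """
    n_addressed = n_total - n_not_addressed
    if n_addressed <= 0:
        raise ValueError("no individuals were addressed for this feature")
    if not 0 <= n_with_feature <= n_addressed:
        raise ValueError("feature count must lie between 0 and the number addressed")
    return 100.0 * n_with_feature / n_addressed


# Hypothetical counts, not taken from the cohort:
# 12 individuals with the feature, 29 in total, 5 not addressed.
print(frequency_percent(12, 29, 5))  # 12 / (29 - 5) = 50.0
```

Subtracting only the explicitly unaddressed individuals keeps the percentage from being diluted by cases in which the feature could not have been scored.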

Schematic representation of ARID5B variant types and their consequences

(A) The two main isoforms generated from the curated transcript variants 1 and 2. The shorter isoform uses an alternative exon, here named 4b (light blue). Further, the two known domains, the BAH and the DNA-binding BRIGHT domain, are shown in yellow and dark blue, respectively. Sox9 has been shown to associate with the latter two-thirds of the protein (shown in orange), downstream of the BRIGHT domain. Variant p.(Asn434LysfsTer45) in this cohort has been included in the OMIXCARE cohort as well [39]. (B) Linear representation of isoform I, with the locations of the variants described in the current cohort above the protein bar and variants retrieved from gnomAD V4 or genomic studies below it. Diamonds indicate truncating variants. Light blue diamonds are predicted to cause NMD of only the long transcript variant 1. Red diamonds indicate variants that are predicted to cause NMD of both transcript variants. Dark blue variants, mainly found in the cohort, are both proven and predicted truncating variants associated with ID. The yellow diamonds represent truncating variants predicted to skip NMD, but with unknown phenotypes (retrieved from gnomAD V4). Green dots represent missense variants associated with neurological perturbations (four from the cohort, two from ASD studies [2,14]). (C) Isoform I is shown with the missense tolerance rate (MTR) aligned on top. The MTR indicates the running-average P-value of synonymous versus observed missense variants. A lower value indicates a lower missense tolerance [40]. At the bottom, the homology between mouse and human ARID5B is represented, with each non-conserved amino acid shown as a gap. Critical regions like the BRIGHT domain show a low missense tolerance and high conservation. Three mouse-to-human conserved domains are indicated with blue boxes. (D) Zebrafish-to-mouse-to-human conservation of the loci with cohort missense variants (4×) and the ARID5B C-terminus.
(E) To confirm the expression of stable mutant RNA transcripts (escaping NMD), Sanger sequencing was performed on cDNA generated from RNA purified from patient cell lines following DNase treatment, using primers that generate an amplicon crossing the exon 9-10 junction (further avoiding amplification of DNA instead of RNA). Mutated RNA was expressed with a Sanger sequencing peak depth similar to that of wild-type RNA. (F) This was validated by quantifying RNA levels and comparing normalized exon 9/exon 10 RNA levels between controls and cell lines from patients with ARID5B variants; no effect on average RNA levels was observed. Three PBMC control cell lines (blue dots) and three LCL control cell lines (yellow dots) were compared with two LCL clones generated at two different ages from individual 14 (c.1489dupA, p.Ile497AsnfsTer31; brown dots), and individual 23 (c.1804C>T, p.Gln602Ter; light blue dot). The line indicates average expression.

Mouse development and behavior are affected in Arid5bemQ522* mice

(A) Sanger sequencing-based validation of genotype, showing the four induced DNA mutations leading to a gained stop at Q522. (B) Kaplan-Meier survival plot over one year. The majority of homozygous Arid5bQ522* mice (green line) die in the first postnatal days. (C) Body weights of Arid5bQ522* mice across development. Independent ages (P1, P6, P60) were analyzed using one-tailed t-tests to assess a reduction in body weight, comparing wild-type with heterozygous mice. Repeated measurements at P8 and P21, obtained from the same cohort of mice, were analyzed using a linear mixed-effects model to account for repeated measures. Body weights at P60 were corrected for sex to account for naturally lighter female weights. Heterozygous Arid5bQ522* mice have significantly reduced weight at P6 (P = 0.0084, df = 13), P8 (P = 0.0265, df = 22.10), and P21 (P = 0.00076, df = 22.10), but not at P60 (df = 21); the result at P1 was inconclusive (P = 0.129, df = 10). (D) Open field test: time spent in the center of the open field box as a percentage of total time spent in the open field box per mouse (wild-type mice, yellow dots/left box; Arid5bQ522* mice, orange dots/right box). A two-tailed t-test showed a significant increase in time spent in the center (P = 0.0223, df = 42). One outlier was removed. (E) The total distance covered during the total time spent in the open field box per mouse (wild-type mice, yellow dots/left box; Arid5bQ522* mice, orange dots/right box). A two-tailed t-test was inconclusive (P = 0.1852). (F) Three-chamber sociability test. Left side (“Sociability”): difference in time spent exploring an object versus a mouse for wild-type (yellow) and Arid5bQ522* (orange) mice. Right side (“Social Novelty”): difference in time spent exploring a familiar versus a novel mouse. Expected social behavior was confirmed using one-tailed t-tests to assess increased time spent with a mouse vs. an object, or with a novel mouse vs. a familiar mouse.
(G) Total social interaction time (summed across the sociability and social novelty experiments) in wild-type (WT, yellow) and Arid5bQ522* (orange) mice. A one-tailed t-test (H₁: reduced social interaction in heterozygous mice) was not applicable, since the mean interaction time was increased in Arid5bQ522* mice. A two-tailed t-test comparing total social interaction time between genotypes was inconclusive (WT = 132.17 ± 56.58 s, n = 21; Q522* = 173.64 ± 79.78 s, n = 21; t(36.1) = -1.94, P = 0.0598). Three outliers were identified and removed using Grubbs’ test: two in the wild-type group and one in the Arid5bQ522* group. Statistical significance is indicated as follows: P < 0.05 (*), P < 0.01 (**), and P < 0.001 (***).
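The test statistic reported in (G) can be reproduced from the summary values alone. The sketch below assumes that the ± values are standard deviations and that the fractional degrees of freedom come from Welch's unequal-variance t-test; the legend does not name the variant, so both points are assumptions, and the function name is hypothetical.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary values as given in the legend (WT first, then Q522*).
t, df = welch_t_from_summary(132.17, 56.58, 21, 173.64, 79.78, 21)
print(f"t({df:.1f}) = {t:.2f}")  # t(36.1) = -1.94
```

The negative sign reflects the longer mean interaction time in the Arid5bQ522* group, which is why the one-tailed test for a reduction was not applicable.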

Terminating variants affect cellular localization with divergent effect sizes.

(A) Overexpressed C- and N-terminally HA- or FLAG-tagged ARID5B isoforms in HEK293T cells show a strong preference for nuclear localization. N-terminally HA- or FLAG-tagged ARID5B containing the variant of individual 10 (p.Glu522Ter) instead localizes strongly to the cytosol. (B) Ratio of nuclear versus cytosolic ARID5B for (FLAG-tagged) wild-type, p.Glu522Ter, and the most C-terminal variant in our cohort, detected in individual 22 and his mother (p.Ala1146CysfsTer15). The dots represent averages per experiment (see Methods). Two-tailed, two-sample t-tests (equal variance) compared each mutant to WT, with Bonferroni correction (α = 0.05). WT vs A1146CfsTer15: t(4) = 2.34, adjusted P = 0.176 (#), Cohen’s d ≈ 1.87 (ns). WT vs Q522*: t(4) = 7.39, adjusted P = 0.00412, Cohen’s d ≈ 6.49 (**). (C) Cellular localization of FLAG-tagged predicted isoform III, showing nuclear localization. (D) Analysis of the ARID5B sequence using various structural prediction tools. The graph shows predicted nuclear localization signals (NLSs) generated by NLStradamus, representing three hidden Markov modeled NLS states [18]. The elevated/peak signals predict three regions that could regulate nuclear localization. The most C-terminal elevated signals co-localize with NLSs and nuclear export signals (NESs) predicted by other tools, shown here by name of the tool and amino acid sequence. Furthermore, the most abundant kinase signal transduction target sites predicted by Motif Scan/Prosite [19] were casein kinase 2 (CK2) sites, which are scattered and marked in red, and cAMP phospho-sites, located within predicted NLSs. This is in line with both PKA and CK2 being involved in regulating pathways in which ARID5B is active [30,41]. The nuclear export site is shown as black/yellow stripes. The truncating variants of individuals 16 and 20 are also shown, yielding truncated proteins of 521 and 967 aa, respectively. (E) Overview of mutated ARID5B proteins tested.
Variants from cohort individuals are shown in brown, a variant from the gnomAD database in blue, a variant previously associated with ASD [14] in chartreuse green, and the designed variants in black. (F) Microscope images of the cellular localization of various variants in different growth conditions, to assess cellular localization and stability of the phenotype under divergent conditions. Overnight-transfected cells were cultured for 24 hours without serum, after which serum was added 3 hours before staining (+S24h) or no serum was added (SF), or cells were kept in serum-rich medium during the whole experiment (+S). (G) Quantification of the expression area of ARID5B variants using a one-way ANOVA (F(11, 57) = 20.78, P = 4.37 × 10⁻¹⁶, one-tailed, alpha = 0.05). Post hoc t-tests (one-tailed; increase of surface area for defective nuclear localization) comparing each variant to the control (ARID5B-IsoI) were performed and Bonferroni-corrected. Significant increases were observed for Q522* (P-adj. = 1.14 × 10⁻⁷), Q474* (P-adj. = 5.68 × 10⁻⁵), Y968* (P-adj. = 0.00125), del1018-1026 (P-adj. = 0.036), and G634Afs34 (P-adj. = 1.54 × 10⁻⁷). Error bars indicate SD. The grey-blue bar represents wild-type ARID5B, the orange bars are variants identified in our cohort, and the green bars are in-house designed variants. The light yellow bar represents one of the missense variants found in the genomic studies on ASD. Finally, the red bar represents a truncating gnomAD_V3 variant. P-adj. < 0.05 (*), P-adj. < 0.01 (**), and P-adj. < 0.001 (***) indicate P-values adjusted (P-adj.) by post hoc Bonferroni correction. For conciseness, exact P-values are not shown in the figure.
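The Bonferroni adjustment applied to the post hoc comparisons in (G) can be sketched as below. The raw P-values are hypothetical placeholders, not values from this study, and the function name is illustrative. The number of comparisons is assumed to equal the number of variant-versus-control tests supplied.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw P-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P-values for three variant-versus-control comparisons.
raw = [0.001, 0.02, 0.3]
adjusted = bonferroni_adjust(raw)

# A comparison counts as significant when its adjusted P-value is below alpha.
alpha = 0.05
significant = [p < alpha for p in adjusted]
print(adjusted, significant)
```

Comparing the adjusted P-value with α is equivalent to comparing the raw P-value with α divided by the number of comparisons.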