RBMX primarily operates as a splicing repressor in human somatic cells.

(A) Schematic structure of RBMX family proteins (left side, cladogram) and amino acid similarity of each domain between RBMX protein and two other members of this family, RBMXL2 and RBMY. RRM, RNA recognition motif. CD, central domain important for recognition of nascent transcripts and nuclear localisation. CTD, C-terminal domain, involved in RNA binding (Elliott et al., 2019). (B) Western blot analysis shows efficient siRNA-mediated depletion of RBMX from MDA-MB-231 cells. (C) Pie chart showing the percentages of events controlled by RBMX in both MDA-MB-231 (this study) and HEK293 (Liu et al., 2017) cells. (D) Pie chart showing the percentages of events controlled by RBMX in both MDA-MB-231 and HEK293 cells that have been previously annotated (RefSeq, Ensembl, GENCODE), and those that are novel to this study. (E) Bar chart showing the different types of alternative splicing events controlled by RBMX protein in both HEK293 and MDA-MB-231 cells, summarising the proportion of splicing events that are activated by RBMX versus those that are repressed.

Splicing control and sites of RBMX protein-RNA interaction are enriched within long internal exons.

(A) Western blot showing levels of RBMX-FLAG protein, expressed after 24 h of treatment with tetracycline, compared to endogenous RBMX within HEK293 cells, both detected using α-RBMX antibody. α-GAPDH antibody was used as a loading control. (B) RNAs cross-linked to RBMX-FLAG during iCLIP detected through the infrared adaptor (RBMX-RNA complexes). Lane 1, anti-FLAG pull-down from crosslinked HEK293 control cells not expressing RBMX-FLAG proteins, treated with 0.8 U/ml RNaseI. Lane 3, RBMX-FLAG pull-down crosslinked to RNA, treated with 2.5 U/ml RNaseI. Lanes 5-7, RBMX-FLAG pull-down crosslinked to RNA, treated with 0.8 U/ml RNaseI. Samples in lanes 5-7 were used for iCLIP library preparation. Lanes 2 and 4 are empty. (C) K-mer analysis shows the top 10 enriched motifs within sequences surrounding RBMX iCLIP tags. (D) Boxplot analysis shows sizes of exons containing splicing events regulated by RBMX, grouped by whether they contain CLIP tags or not. ****, p-value<0.0001 (Mann-Whitney test). (E) Boxplot analysis shows distribution of exon sizes relative to: exons contained in mRNA genes expressed in HEK293 cells (Liu et al., 2017); exons regulated by RBMX as identified by RNA-seq; exons containing RBMX binding sites as identified by iCLIP, listed independently of iCLIP tag density. Median sizes for each group are shown. ****, p-value<0.0001 (Wilcoxon rank test and Kruskal-Wallis test). (F) Distribution plot of exon sizes for the groups shown in (E). Note the increased accumulation of exons larger than 1000 bp (ultra-long exons) among RBMX-bound and RBMX-regulated exons compared to all exons expressed in HEK293 cells (Liu et al., 2017). (G) Bar plot indicating the proportion of ultra-long exons in the groups shown in (E, F). ****, p-value<0.0001 (Chi-squared test).

RBMX protein is important for full-length splicing inclusion of ultra-long exons involved in DNA repair and RNA polymerase II transcription.

(A) Gene Ontology analysis of genes regulated and bound by RBMX displaying significant Gene Ontology Biological Process (GOBP) terms containing at least 5% of the total gene list. FDR, False Discovery Rate. Count, number of genes in the GOBP group. GeneRatio, proportion of genes in the GOBP group relative to the full list of RBMX-regulated genes. (B) Snapshot of RNA-seq merged tracks from MDA-MB-231 cells and RBMX iCLIP tags from HEK293 cells from the IGV genome browser shows cryptic 3’ splice sites repressed by RBMX in ETAA1 exon 5. At the bottom, schematic of PCR products identified by RT-PCR in (C). (C) RT-PCR analysis shows splicing inclusion of ETAA1 exon 5 upon siRNA-mediated depletion of RBMX in the indicated cell lines. (D) Western blot analysis shows that ETAA1 protein expression is dependent on RBMX. Anti-Tubulin detection was used as a loading control. (E) Snapshot of RNA-seq merged tracks from MDA-MB-231 cells and RBMX iCLIP tags from HEK293 cells from the IGV genome browser shows that RBMX represses a cryptic 3’ splice site within the ultra-long exon 13 of REV3L. At the bottom, schematic of PCR products identified by RT-PCR in (F). (F) RT-PCR analysis shows splicing inclusion of REV3L exon 13 upon siRNA-mediated depletion of RBMX in the indicated cell lines. (G) Snapshot of RNA-seq merged tracks from MDA-MB-231 cells and RBMX iCLIP tags from HEK293 cells from the IGV genome browser. The location of experimentally mapped branchpoints relative to RBMX binding is indicated.

RBMXL2 can replace the activity of RBMX in ensuring proper splicing inclusion of ultra-long exons.

(A) Schematic of the time-course experiment used to analyse RBMXL2 function in RBMX-depleted HEK293 cells. All conditions were repeated in biological triplicates. (B) Western blot analysis shows that RBMXL2-FLAG protein is stably expressed in HEK293 cells after 24 hours of tetracycline induction, and RBMX protein is successfully depleted after 72 hours of siRNA treatment. (C, E, G) Capillary gel electrophoretograms show RNA processing patterns of endogenous ultra-long exons within ETAA1, REV3L and ATRX controlled by RBMX and RBMXL2, analysed using isoform-specific RT-PCR. (D, F, H) Bar charts showing percentage splicing inclusion (PSI) of cryptic isoforms from the endogenous ETAA1, REV3L and ATRX genes under the different experimental conditions, corresponding to the experiments in (C), (E) and (G) respectively. P-values were calculated using an unpaired t-test. *, p-value<0.05; ***, p-value<0.001; ****, p-value<0.0001.
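As an illustration of the PSI values plotted in panels (D, F, H), the following is a minimal sketch of how a percentage splicing inclusion can be derived from quantified RT-PCR products. It assumes that PSI is the cryptic isoform signal divided by the summed signal of the cryptic and full-length isoforms; the legend does not state the exact formula used. The function names and the example peak values are hypothetical and are not data from this study.

```python
from statistics import mean


def percent_spliced_in(cryptic_signal: float, full_length_signal: float) -> float:
    """Return PSI (%) of the cryptic isoform from two quantified band/peak signals.

    Assumption: PSI = cryptic / (cryptic + full-length) * 100.
    """
    total = cryptic_signal + full_length_signal
    if total <= 0:
        raise ValueError("Total isoform signal must be positive")
    return 100.0 * cryptic_signal / total


def mean_psi(replicates: list[tuple[float, float]]) -> float:
    """Average PSI over replicates given as (cryptic, full_length) pairs."""
    return mean(percent_spliced_in(c, f) for c, f in replicates)


# Hypothetical capillary electrophoresis peak values for three replicates.
example_replicates = [(25.0, 75.0), (30.0, 70.0), (20.0, 80.0)]
print(mean_psi(example_replicates))
```

Computing PSI per replicate before averaging keeps each biological replicate as one observation, which is the form an unpaired t-test between conditions would take as input.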

The disordered domain of RBMXL2 is required to mediate splicing control of ultra-long exons in HEK293 cells.

(A) Western blot analysis shows that RBMXL2ΔRRM-FLAG protein is stably expressed in HEK293 cells after 24 hours of tetracycline induction, and RBMX protein is successfully depleted after 72 hours of siRNA treatment. (B, D, F) Capillary gel electrophoretograms show RNA processing patterns of endogenous ultra-long exons within ETAA1, REV3L and ATRX, analysed using isoform-specific RT-PCR. (C, E, G) Bar charts showing percentage splicing inclusion (PSI) of cryptic isoforms from the endogenous ETAA1, REV3L and ATRX genes under the different experimental conditions, corresponding to the experiments in (B), (D) and (F) respectively. P-values were calculated using an unpaired t-test. ****, p-value<0.0001.

Model of cryptic splice site repression within ultra-long exons by RBMX family proteins.

Ultra-long exons are intrinsically fragile as they contain cryptic splice sites within an environment rich in Exonic Splicing Enhancers (ESEs). RBMX protein binding within ultra-long exons may directly block access of spliceosome components to cryptic splice sites, and depletion of RBMX from somatic cells activates selection of cryptic splice sites. As a result, a shorter version of the originally ultra-long exon is included, one that fits more easily with the exon definition rules normally followed for median-sized exons. During meiosis, lack of RBMX caused by X chromosome inactivation is compensated by expression of RBMXL2 protein.

(A) Bar chart showing the different types of alternative splicing events controlled by RBMX protein in MDA-MB-231 cells, summarising the proportion of splicing events that are activated by RBMX versus those that are repressed.

(A) Correlation analysis between three replicates for RBMX-FLAG iCLIP. (B) K-mer analysis shows the top 50 enriched motifs within sequences surrounding RBMX iCLIP binding sites. (C) Bar plot showing the percentage of exons regulated by RBMX that contain iCLIP tags.

(A) Gene Ontology analysis of genes bound by RBMX as identified by iCLIP, displaying the top 20 significant Gene Ontology Biological Process (GOBP) terms. Adjusted p-values were calculated using the Benjamini-Hochberg method. Count, number of genes in the GOBP group. GeneRatio, proportion of genes in the GOBP group relative to the full list of RBMX-regulated genes. (B) Analysis as in (A) but relative to genes regulated by RBMX in both MDA-MB-231 (this study) and HEK293 (Liu et al., 2017) cells as identified by RNA-seq. GOBP terms containing at least 5% of the total gene list are shown. FDR, False Discovery Rate. (C) Analysis as in (B) but relative to human genes containing ultra-long exons (equal to or longer than 1 kb, Ensembl v.104). (D) Comet assay shows increased formation of DNA breaks in U2OS cells treated with RBMX siRNA. Direction of comets is shown. Scale bars, 200 µm. (E) Quantification of the percentage of DNA in the tail of comets. n=58 cells in both conditions. ****, p<0.0001 (Mann-Whitney test).
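For readers unfamiliar with the Benjamini-Hochberg adjustment named in panel (A), the following is a minimal sketch of the standard step-up procedure applied to a list of raw p-values. It is a generic illustration, not the software used to produce the figure; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a term with a smaller raw p-value never receives a larger adjusted value than a term ranked below it.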

(A) Snapshot of RNA-seq merged tracks from HEK293 cells (Liu et al., 2017) from the IGV genome browser shows cryptic 3’ splice sites repressed by RBMX in ETAA1 exon 5. (B) Schematic of ETAA1 protein in normal conditions and expected ETAA1 protein in RBMX-depleted cells. (C) Snapshot of RNA-seq merged tracks from MDA-MB-231 cells and RBMX iCLIP tags from HEK293 cells from the IGV genome browser shows an exitron repressed by RBMX in ATRX exon 9. (D) Schematic representation of the ETAA1 minigene cloned within pXJ41. (E) Example RT-PCR from the ETAA1 minigene shows detection of both the normal and cryptic versions of exon 5. (F) Schematic of the RT-PCR assay to detect branchpoints used during cryptic splicing of ETAA1 exon 5. (G) RT-PCR analysis for mapping branchpoints used during ETAA1 cryptic splicing (see Figure 3G). The distance from the corresponding cryptic splice site and from the RBMX binding site as defined by iCLIP is indicated.

RBMXL2 can replace the activity of RBMX in ensuring proper splicing inclusion of ultra-long exons.

(A-C) Snapshots of RNA-seq merged tracks from HEK293 cells from the IGV genome browser show that tetracycline-induced expression of RBMXL2 restores correct splicing patterns within ETAA1 exon 5 (A), REV3L exon 13 (B) and ATRX exon 9 (C). The location of the splicing defects in RBMX-depleted cells is shown with red dotted lines.

RBMY can replace the activity of RBMX in ensuring proper splicing inclusion of ultra-long exons.

(A) Western blot analysis shows that RBMY-FLAG protein is stably expressed in HEK293 cells after 24 hours of tetracycline induction, and RBMX protein is successfully depleted after 72 hours of siRNA treatment. All conditions were repeated in biological triplicates. (B, D, F) Capillary gel electrophoretograms show RNA processing patterns of endogenous ultra-long exons within ETAA1, REV3L and ATRX controlled by RBMX and RBMY, analysed using isoform-specific RT-PCR. (C, E, G) Bar charts showing percentage splicing inclusion (PSI) of cryptic isoforms from the endogenous ETAA1, REV3L and ATRX genes under the different experimental conditions, corresponding to the experiments in (B), (D) and (F) respectively. P-values were calculated using an unpaired t-test. ***, p-value<0.001; ****, p-value<0.0001.