Research Article

Quantitative prediction of variant effects on alternative splicing in MAPT using endogenous pre-messenger RNA structure probing

Department of Biology, University of North Carolina at Chapel Hill, United States
Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, United States
Department of Genetics and Biochemistry, Center for Human Genetics, Clemson University, United States
Verna and Marrs McClean Department of Biochemistry and Molecular Biology, Therapeutic Innovation Center (THINC), and Department of Molecular and Human Genetics, Baylor College of Medicine, United States
Department of Chemistry, University of North Carolina at Chapel Hill, United States
Department of Biochemistry & Biophysics and Center for RNA Biology, School of Medicine and Dentistry, University of Rochester, United States

Jun 13, 2022

Open access
Copyright information

Abstract
Editor's evaluation
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Splicing is highly regulated and is modulated by numerous factors. Quantitative predictions for how a mutation will affect precursor mRNA (pre-mRNA) structure and downstream function are particularly challenging. Here, we use a novel chemical probing strategy to visualize endogenous precursor and mature MAPT mRNA structures in cells. We used these data to estimate Boltzmann suboptimal structural ensembles, which were then analyzed to predict consequences of mutations on pre-mRNA structure. Further analysis of recent cryo-EM structures of the spliceosome at different stages of the splicing cycle revealed that the footprint of the B^act complex with pre-mRNA best predicted alternative splicing outcomes for exon 10 inclusion of the alternatively spliced MAPT gene, achieving 74% accuracy. We further developed a β-regression weighting framework that incorporates splice site strength, RNA structure, and exonic/intronic splicing regulatory elements capable of predicting, with 90% accuracy, the effects of 47 known and 6 newly discovered mutations on inclusion of exon 10 of MAPT. This combined experimental and computational framework represents a path forward for accurate prediction of splicing-related disease-causing variants.

Editor's evaluation

This manuscript will be of interest to biologists who study RNA structure-function relationships in a broad range of systems, splicing researchers, and RNA structure bioinformaticians. An integrative analysis of RNA structure probing, model-based RNA folding energetics, cryo-EM data, and protein binding sequence motifs serves as the basis for a comprehensive, accurate, and robust framework for predictive models of splicing dynamics in a well-studied system. The modeling is leveraged by in silico mutagenesis that reveals novel insights into the mechanisms and tradeoffs that underlie the impact of disease-associated mutations on alternative splicing.

https://doi.org/10.7554/eLife.73888.sa0

Introduction

Precursor mRNA (pre-mRNA) splicing is a highly regulated process in eukaryotic cells (Wang and Burge, 2008). Numerous factors control splicing including trans-acting RNA-binding proteins (RBPs), components of the spliceosome, and the pre-mRNA itself. Pre-mRNA structure is a key attribute that directs splicing, particularly alternative splicing, but we have a limited understanding of pre-mRNA structure-mediated splicing mechanisms (Taylor and Sobczak, 2020). It has proven challenging to develop quantitative models capable of predicting splicing outcome, specifically the percent spliced in (PSI) for alternatively spliced exons. It is especially difficult to predict outcome alterations due to genetic variation at exon-intron junctions because mutations affect both the binding by RBPs and also pre-mRNA structure (Tazi et al., 2009).

The consequences of mutations on pre-mRNA structure are difficult to predict. First, little is known about native pre-mRNA structure because pre-mRNAs are relatively short-lived in cells (Herzel et al., 2017). Only recently has high-resolution in-cell experimental characterization been applied to pre-mRNA structure determination (Mustoe et al., 2018a; Sun et al., 2019; Liu et al., 2021; Bubenik et al., 2020). Second, it is not clear which structures within a pre-mRNA modulate spliceosome assembly and activity. Finally, quantitative measures for the relative weighting of RBP affinity for individual motifs within a pre-mRNA relative to the importance of pre-mRNA structure are lacking.

In this study, we exploited several technical developments that address these issues to develop an integrated, RNA structure-based framework that accurately predicts splicing outcomes. We measured endogenous pre-mRNA structure in cells taking advantage of recent developments in mutational profiling (MaP) approaches for read-out of chemical probing data (Homan et al., 2014) with targeted amplification of specific exon-intron junctions. This novel approach enabled us to obtain single-nucleotide RNA (snRNA) structure probing data for endogenous pre- and mature mRNAs in the same cell. Our RNA structure modeling considers the equilibrium between multiple alternative structures (Dethoff et al., 2012; Lai et al., 2018) and employed data-guided Boltzmann suboptimal sampling (Spasic et al., 2018) to predict free energies of unfolding for structures in the ensemble. We additionally leveraged recent high-resolution structures of the spliceosome at various stages of the splicing cycle to deduce the effective spliceosomal footprint on pre-mRNA (Zhang et al., 2019), quantitative analysis of exonic and intronic splicing enhancers (ESEs and ISEs)/silencers (ESSs and ISSs) (Fairbrother et al., 2002; Wang et al., 2004a; Wang et al., 2012; Wang et al., 2012), and a β-regression weighting (Ferrari and Cribari-Neto, 2004).

For validation of our framework, we studied the effects of 47 experimentally measured mutations near the exon 10 – intron 10 junction of the human MAPT gene, which encodes the Tau protein (Park et al., 2016; Silva and Haggarty, 2020). Exons 9, 10, 11, and 12 encode the critical microtubule binding repeat domain in Tau. Exons 9, 11, and 12 are constitutively spliced, but exon 10 is alternatively spliced resulting in MAPT isoforms with either four microtubule binding repeats (4R) or three repeats (3R) when exon 10 is included or skipped, respectively. The normal ratio of 3R to 4R isoforms is approximately 1:1 (Hefti et al., 2018). Twenty-nine clinically validated disease-causing mutations have been identified in the region of the exon 10 – intron 10 junction (Stenson et al., 2003). These mutations result in impaired Tau function and are implicated in neurodegenerative disease (Spillantini et al., 1998; Hutton et al., 1998; Clark et al., 1998; Rizzu et al., 1999; Goedert et al., 1999). Although some mutations alter the Tau protein sequence (Mirra et al., 1999; Iseki et al., 2001), 20 of the disease-associated mutations deregulate MAPT pre-mRNA splicing altering the ratio of 3R to 4R (Hutton et al., 1998; D’Souza et al., 1999; Hasegawa et al., 1999; Jiang et al., 2000). The effect of an additional 27 mutations on exon 10 inclusion has been experimentally determined using cell-based splicing assays (D’Souza and Schellenberg, 2000; Tan et al., 2019; Grover et al., 1999). The exon 10 junction is the best experimentally characterized junction of clinical importance in the human genome and is thus an excellent system for developing forward-predictive models of splicing. Our work provides a framework for integrating endogenous pre-mRNA structure probing data with a structure-based understanding of spliceosome assembly and trans-acting RBPs to qualitatively predict the effect of mutations at exon-intron junctions on splicing.

Results

MAPT 3R and 4R mRNA isoforms are expressed at a consistent 1:1 ratio across tissues

To confirm that MAPT pre-mRNA splicing results in a 1:1 ratio of alternatively spliced isoforms (Goedert et al., 1989; Andreadis, 2005) in a large population, we analyzed RNA-sequencing data from the genotype-tissue expression (GTEx) database (Lonsdale et al., 2013). We analyzed data from tissue types with median MAPT transcripts per million (TPM) greater than 10 (Figure 1—figure supplement 1A) and calculated the PSI value for exon 10 for each sample (Figure 1—source data 1; Materials and methods). We examined data from 2315 tissue samples from 375 individuals of median age 61 (Figure 1A and Figure 1—figure supplement 1B). A PSI of 0 indicates that none of the MAPT transcripts in a sample included exon 10 (3R), whereas a PSI of 1 corresponds to exon 10 inclusion in all transcripts in a sample (4R).

Figure 1 with 8 supplements see all

Download asset Open asset

Three repeat (3R) and four repeat (4R) mature *MAPT* transcripts are expressed in a 1:1 ratio in samples from human subjects and mature exons appear to fold as independent structural units.

(A) Ratio of 3R and 4R *MAPT* transcripts is approximately 1:1 among brain tissues. There are 14 exons alternatively spliced in *MAPT*. Exons 4 A, 6, and 8 are not found in brain mRNA. The four exons highlighted in color are repeat regions that form the microtubule binding domain in the Tau protein. Exon 10 is alternatively spliced to form the 3R or 4R isoform. This is highlighted by the alternate lines from the 5’ splice site of exon 9 to either the 3’ splice site of exon 10 (4R) or the 3’ splice site of exon 11 (3R). The six canonical transcripts found in the CNS can be separated into 3R and 4R transcripts. Percent spliced in (PSI) of exon 10 was calculated from RNA-seq data for 2315 tissue samples representing 12 CNS tissue types and collected from 375 individuals in genotype-tissue expression (GTEx) v8 database. The violin plot for each tissue type and the corresponding region on the brain diagram is colored by the median PSI for all samples of a given type. The patterned regions on the brain diagram indicate tissue types with no data. Tissue types spinal cord and nucleus accumbens are not visualized on the brain diagram. The median PSI of 0.54 among all tissue samples is indicated by the red dotted line. (B) In-cell dimethyl sulfate probing read out by mutational profiling (DMS-MaP) structure probing data across exon 9 – exon 11 junction of 3R mature *MAPT* transcript. T47D cells were treated with dimethyl sulfate (DMS). Structure probing data for junctions of interest were obtained using amplicon sequencing with region-specific primers (Supplementary file 4) following reverse transcription (RT) of extracted RNA. DMS reactivity is plotted for each nucleotide across spliced junctions. Each value is shown with its SE and colored by reactivity based on color scale. High DMS reactivities correspond to less structured regions, whereas low DMS reactivities correspond to more structured regions. The base pairs of the predicted secondary structure guided by DMS reactivities (using A/C nucleotides only) are shown in the arcs colored by pairing probabilities. (C) In-cell DMS-MaP structure probing data across exon 9 – exon 10 – exon 11 junction of 4R mature *MAPT* transcript.

Figure 1—source data 1 Percent spliced in (PSI) of exon 10 plotted in Figure 1A and Figure 1—figure supplement 1B. PSIs were calculated for 2962 tissue samples from 15 tissues types for CNS, muscle-skeletal, colon-sigmoid, and breast-mammary. Each row has the genotype-tissue expression (GTEx) participant ID, the tissue type, and the PSI calculated.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig1-data1-v2.txt
Download elife-73888-fig1-data1-v2.txt
Figure 1—source data 2 Dimethyl sulfate (DMS) reactivities plotted in Figure 1B. Reactivities are from the exon 9 – exon 11 junction of three repeat (3R) mature MAPT transcript in T47D cells, pooled from two replicates. Each row contains nucleotide position, DMS reactivity, the SE, and the base at the position.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig1-data2-v2.txt
Download elife-73888-fig1-data2-v2.txt
Figure 1—source data 3 Dimethyl sulfate (DMS) reactivities plotted in Figure 1C. Reactivities are from the exon 9 – exon 10 – exon 11 junction of four repeat (4R) mature MAPT transcript in T47D cells, pooled from two replicates. Each row contains nucleotide position, DMS reactivity, the SE, and the base at the position.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig1-data3-v2.txt
Download elife-73888-fig1-data3-v2.txt

PSI for exon 10 varied across tissue types and within and between individuals. However, 75% of samples were within an SD of the median PSI of 0.54, demonstrating that the 3R to 4R isoform ratio was close to 1:1 among individuals and across tissues. Within the brain, the pituitary gland demonstrated the largest variation in PSI and the cerebellum the least variation. The pituitary gland also had the lowest median PSI (0.38). However, the median PSI differed by no more than 0.25 across all brain tissues. Interestingly, although MAPT function in breast tissue is not understood compared with its function in the brain, there was greater variation in PSI in breast tissue, and the median PSI in breast tissue was lower than in the pituitary gland (Figure 1—figure supplement 1B). There was also a large amount of variation within tissues of an individual (Figure 1—figure supplement 1C), although there was significantly greater variation between than within individuals (see Supplementary file 1 for ANOVA table). Furthermore, exon 10 inclusion variability (0.2) was between the variability for a MAPT constitutively spliced exon (0.1) and another MAPT alternatively spliced exon (0.3) (Figure 1—figure supplement 2). As levels of RBP expression varied considerably across individuals and tissues (Figure 1—figure supplement 3), sequence and structural features of the MAPT pre-mRNA likely regulate inclusion of exon 10.

Structures of 3R and 4R MAPT mature mRNA isoforms are similar and mostly unstructured

The structures of the mature 3R and 4R isoforms and MAPT pre-mRNA have not been assessed in their endogenous context in cells. Here, we used dimethyl sulfate probing read out by mutational profiling (DMS-MaP) as described previously (Mustoe et al., 2019; Homan et al., 2014) to assess MAPT pre-mRNA and mature mRNA structures in T47D cells, a breast cancer line, and in neuronal SH-SY5Y cells. We used region-specific primers (Smola et al., 2015) to selectively amplify mature 3R and 4R transcripts during library preparation (Supplementary file 4; Materials and methods). This approach leverages the read-through capability of MaP technology to probe the structure of distinct alternatively spliced isoforms in the same cells. High DMS reactivities correspond to less structured regions, whereas low DMS reactivities correspond to more structured regions. DMS reactivities for replicates and cell lines were highly correlated (Figure 1—figure supplement 4A; Figure 1—figure supplement 4B; Figure 1—figure supplement 5A; Figure 1—figure supplement 5B).

As an internal control for our probing experiments, we also collected DMS-MaP data for the small subunit ribosomal RNA (SSU) (Figure 1—figure supplement 6), which has a well-defined secondary structure (Petrov et al., 2014). As expected, the DMS reactivities of unpaired nucleotides were significantly higher than for paired nucleotides both for RNA probed in cells and for RNA isolated from cells prior to probing (Figure 1—figure supplement 7A). This experiment confirmed that our DMS probing recapitulates native RNA secondary structure regardless of the presence of proteins, consistent with previous studies (Woods et al., 2017; Lackey et al., 2018). We used the SSU in-cell reactivity data to calibrate the estimation of equilibrium ensembles (Materials and methods), and we confirmed that structure modeling guided by experimental DMS reactivities yielded a more accurate estimation of the SSU structure than the model not informed by chemical probing data (Figure 1—figure supplement 7B).

The median in-cell DMS reactivity of the mature MAPT isoforms was 0.22, significantly greater than the median in-cell DMS reactivity of the SSU, which was 0.008 (Figure 1—figure supplement 7C). This difference was recapitulated in cell-free samples (Figure 1—figure supplement 7C). These results suggested that the nucleotides of the mature MAPT isoforms were more accessible and less paired overall as compared with the highly structured SSU. Reactivities of exon 9 and exon 11 were highly correlated between the 3R and 4R isoforms (Figure 1B and C; Figure 1—figure supplement 4C). In the 4R isoform, approximately 89% of base pairs were contained within the exon units; only 11% of base pairs were between residues from exon 10 with those of exon 11 (Figure 1—figure supplement 8). This result suggests that the mature exons fold as independent structural units.

MAPT pre-mRNA exon 10 – intron 10 junction is more structured than the mature isoforms in cells

RNA structure around exon-intron junctions has been shown to regulate alternative splicing (Warf and Berglund, 2010; Buratti and Baralle, 2004), and a hairpin structure at the exon 10 – intron 10 junction of MAPT pre-mRNA is implicated in establishing the 3R to 4R 1:1 isoform ratio (Hutton et al., 1998; Varani et al., 1999; Grover et al., 1999; Donahue et al., 2006). The structure of the MAPT pre-mRNA in the exon 10 – intron 10 junction region has been studied using biophysical techniques and chemical probing of in vitro-transcribed fragments and using computational methods (Varani et al., 1999; Lisowiec et al., 2015; Tan et al., 2019; Chen et al., 2019), but the pre-mRNA structure had not previously been analyzed in cells. We obtained DMS-MaP data for this junction from endogenous pre-mRNA in T47D cells (Figure 2A). Replicates were highly correlated (Figure 2—figure supplement 1A). Similar reactivity data were also observed in SH-SY5Y cells (Figure 2—figure supplement 1C), despite the likely differences in RBP populations compared to T47D cells (Figure 1—figure supplement 3).

Figure 2 with 4 supplements see all

Download asset Open asset

The four repeat (4R) and three repeat (3R) mutations shift dimethyl sulfate (DMS) reactivity-guided structural ensemble of exon 10 – intron 10 junction to be less structured and more structured, respectively.

(A) In-cell dimethyl sulfate probing read out by mutational profiling (DMS-MaP) structure probing data across exon 10 – intron 10 junction of precursor *MAPT* transcript in T47D cells. Structure probing data for junctions of interest were obtained using amplicon sequencing with primers (Supplementary file 4) following RT of extracted RNA. DMS reactivity is plotted for each nucleotide. Each value is shown with its SE and colored by reactivity based on the color scale. High median DMS reactivities correspond to less structured regions, whereas low median DMS reactivities correspond to more structured regions. Base pairs of predicted secondary structure guided by A/C DMS reactivities are shown by arcs colored by pairing probabilities. Strongly predicted hairpin structure near exon-intron junction is highlighted by dotted box. (B) t-Distributed stochastic neighbor embedding (t-SNE) visualization of structural ensemble of wildtype (WT) and,+19C>G (3R) and +15A>C (4R) mutations. Structures were predicted using Boltzmann suboptimal sampling and guided by DMS reactivity data for A/C nucleotides generated in A. Data were visualized using t-SNE. Shown are 3000 structures corresponding to 1000 structures per sequence. Each dot represents a single structure and is colored by the calculated unfolding free energy of the 5’ splice site at exon-intron junction (three exonic and six intronic bases). Clusters have been circled and enumerated using k-means clustering with k=5. (C) Density contour plots of structural ensemble of WT and, 3R, and 4R mutations. Contour plots were derived from the distribution of points on the t-SNE plot in B. The darker the blue, the higher the density of structures. Contour lines additionally represent density of points. Color scales for the three plots are identical. Inserts are gel images representative of splicing assays using a reporter plasmid expressing either the WT sequence, the +19C>G (3R) mutation or +15A>C (4R) mutation in HEK293 cells, where the RNA was extracted and reverse transcribed to measure the isoform ratio using specific PCR amplification (Materials and methods). In WT, both 3R (exon 9 – exon 11) and 4R (exon 9 – exon 10 – exon 11) isoforms are expressed (two bands). In the presence of the 3R mutation, only the 3R isoform is expressed (one band), whereas for the 4R mutation, only the 4R isoform is expressed (one band). Gel insets for the 3R and 4R mutation are in their respective density plots. (D) Representative structures for the five clusters are shown. The cluster number is indicated below each structure. The exon-intron junction is marked by EIJ on each structure. Positions of 3R and 4R mutations are marked by a red asterisk on their respective representative structures.

Figure 2—source data 1 Dimethyl sulfate (DMS) reactivities plotted in Figure 2A. Reactivities are from the exon 10 – intron 10 junction of precursor MAPT transcript in T47D cells, pooled from two replicates. Each row contains nucleotide position, DMS reactivity, the SE, and the base at that position.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig2-data1-v2.txt
Download elife-73888-fig2-data1-v2.txt
Figure 2—source data 2 t-Distributed stochastic neighbor embedding (t-SNE) coordinates plotted in Figure 2B. 3000 sampled structures of the exon 10 – intron 10 junction for wildtype (WT), three repeat (3R), and four repeat (4R) mutation were digitized into a 1000 × 234 matrix as described in Materials and methods. t-SNE dimension reductionality was performed to obtain a 3000 × 2 matrix. Each row is for a single structure. It contains the t-SNE x and y coordinate, the sequence type of the structure (WT, 3R, or 4R), k-means cluster number, and the ΔG^‡ of unfolding of the splice site for the structure.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig2-data2-v2.txt
Download elife-73888-fig2-data2-v2.txt

The reactivities for exon 10 in the pre-mRNA and mature 4R isoform were highly correlated (Figure 2—figure supplement 1B). This high correlation was unexpected given that the pre-mRNA undergoes splicing during the 5-min treatment of the cells with DMS. As we observed for the mature 4R isoform, exon 10 in the pre-mRNA mostly formed base pairs with other exon 10 nucleotides (Figure 2—figure supplement 2). However, when we compared DMS reactivities for pre-mRNA and the mature 4R isoform, we found that DMS reactivity in exon 10 was significantly lower for the pre-mRNA (median in-cell DMS reactivity: 0.08) than for the 4R isoform (median in-cell DMS reactivity: 0.22) (Figure 2—figure supplement 3). This was also the case for RNA probed under cell-free conditions (Figure 2—figure supplement 1D). The pre-mRNA is thus apparently more structured than mature mRNA independent of protein protection.

Disease mutations change the MAPT pre-mRNA structural ensemble and splicing of exon 10

Many RNAs adopt an ensemble of structures instead of a single structure (Halvorsen et al., 2010; Adivarahan et al., 2018). We posited that a structural ensemble near the MAPT exon 10 – intron 10 junction regulates exon 10 splicing and that disease-associated mutations alter the composition of the structural ensemble to disrupt splicing regulation. We used Boltzmann sampling of RNA structures supported by DMS reactivity data (Spasic et al., 2018) (Materials and methods) to sample 1000 structures for the wildtype (WT). We also generated ensembles for two RNAs that bear mutations in intron 10 that are known to alter MAPT splicing: (i) an A to C mutation at position +15 (+15A>C) that favors 3R isoform, and (ii) a C to G mutation at position +19 (+19C>G) that favors the 4R isoform (Tan et al., 2019). These mutant ensembles were generated using the same DMS reactivities as the WT RNA, with the exception of the mutation site (see Materials and methods), and thus provide a well-controlled prediction of the impact that each mutation will have on the ensemble.

We visualized the structural ensemble for the 3000 structures using t-distributed stochastic neighbor embedding (t-SNE) (Maaten and Hinton, 2008) and identified five clusters (Figure 2B; Materials and methods). Each dot in the t-SNE plot (Figure 2B) corresponds to a single structure and is colored by the ΔG^‡ of unfolding (Mustoe et al., 2018a) of the 5’ splice site (Figure 2—figure supplement 4A), defined as the last three nucleotides of exon 10 and the first six nucleotides of intron 10 (Yeo and Burge, 2004). The ∆G^‡ is the cost of disrupting a given structure without allowing the RNA to refold (Mustoe et al., 2018a; Mustoe et al., 2018b). We quantified and visualized the density of structures from the t-SNE plot (Figure 2C) and calculated representative structures for each cluster (Figure 2D and Figure 2—figure supplement 4B; Materials and methods). The WT sequence forms structures distributed across the entire space with about 70% of structures found in clusters 2, 3, and 4 (Figure 2—figure supplement 4B). By contrast, in the +19C>G mutant that strongly favors the 3R isoform (Tan et al., 2019), more than 55% of structures belong to cluster 1, which is defined by a fully base-paired 5’ splice site (Figure 2D). Conversely, greater than 50% of structures in the ensemble of the +15A>C mutant (cluster 5), which shifts the isoform balance entirely to 4R (Tan et al., 2019), were characterized by lower ΔG^‡ of unfolding for the splice site region (Figure 2B and C). Correspondingly, the 5’ splice site for the cluster 5 representative structure was less structured than that of cluster 1 (Figure 2D). Based on these results, we concluded that mutations shift the structural ensemble of the MAPT exon 10 – intron 10 junction, and these structural shifts correspondingly change exon 10 splicing.

Unfolding mRNA within the spliceosome B^act complex footprint yields the best prediction of exon 10 splicing level

RNA structure has been shown to control alternative splicing by regulating accessibility of key regions to spliceosome components (McManus and Graveley, 2011; Warf and Berglund, 2010). The 5’ splice site is the minimum region of RNA that must be accessible for base pairing with the U1 snRNA (Blanchette and Chabot, 1997; Singh et al., 2007). In our structural ensemble analysis of the MAPT exon 10 – intron 10 junction (Figure 2), we found that shifts in the unfolding energy of the 5’ splice site in WT and mutant pre-mRNAs corresponded to changes in exon 10 inclusion levels. However, the splicing cycle, orchestrated by the spliceosome, traverses multiple stages to prepare the pre-mRNA and catalyze the two-step splicing reaction (Matera and Wang, 2014; Figure 3A). The RNA itself adopts many conformations as different components of the spliceosome bind to it (Zhang et al., 2019). Hence, we hypothesized that more than just the 5’ splice site nucleotides might need to unpair to facilitate the splicing reaction. We analyzed high-resolution cryo-EM structures of the human spliceosome at Pre-B (PDB ID: 6Q × 9), B (PDB ID: 5O9Z), Pre-B^act (PDB ID: 7ABF), and B^act (PDB ID: 5Z56) stages (Charenton et al., 2019; Bertram et al., 2017; Townsend et al., 2020; Zhang et al., 2018) to quantify the number of nucleotides around the 5’ splice site associated with the spliceosome (Materials and methods). The number of pre-mRNA nucleotides, as observed in each structure, increased through the splicing cycle (Figure 3A).

Figure 3 with 1 supplement see all

Download asset Open asset

The best predictor of exon 10 percent spliced in (PSI) for intronic and synonymous mutations was the unfolding free energy of precursor mRNA (pre-mRNA) during the B^act stage of splicing.

(A) Spliceosome footprint on pre-mRNA during splicing cycle. Structure in the center of the cycle is the wildtype (WT) representative structure from Figure 2B. The dotted box indicates the zoomed-in region at each stage of interest. Cryo-EM structures of the human spliceosome complex at stages Pre-B (PDB ID: 6Q × 9), B (PDB ID: 5O9Z), Pre-B^act (PDB ID: 7ABF), and B^act (PDB ID: 5Z56) are available in the Protein Data Bank. The region around the 5’ splice site of pre-mRNA within the spliceosome at each stage is highlighted in blue on the zoomed-in representative structure. The number of nucleotides for each stage is as follows: Pre-B (2 exonic and 8 intronic); B (10 exonic and 17 intronic); Pre-B^act (9 exonic and 20 intronic); B^act (12 exonic and 31 intronic). These values represent the minimum number of nucleotides required to be unfolded to be accessible to the spliceosome. The mean free energy and SE to unfold RNA within the spliceosome at each stage are calculated for the WT structural ensemble and indicated under the zoomed-in structure. (B) Exon 10 PSIs of synonymous and intronic mutations predicted with the unfolding free energy of pre-mRNA within the spliceosome in B, Pre-B, Pre-B^act, and B^act stages vs. corresponding experimental PSIs measured in splicing assays. Exon 10 PSIs were predicted using Equation 2. Gray line represents the best fit with dotted lines indicating the 95% CI. Pearson correlation coefficients (R²) of experimental to predicted PSIs were calculated for each stage. Violin plots (inset) show R²s calculated for each mutation category by training and testing on subsets of all mutations by non-parametric bootstrapping; synonymous (n=6), intronic (n=14), and WT (n=1). (C) Overall R² calculated for experimental vs. predicted exon 10 PSIs by non-parametric bootstrapping of mutations. Subsets of mutations were randomly sampled 10 times, trained and tested using unfolding free energy of the exon-intron junction region of pre-mRNA within the spliceosome for the respective splicing stage. Pearson’s R² was calculated by comparing predicted PSIs to experimental PSIs. A two-tailed Wilcoxon rank sum test was used to determine significance between B^act complex and the other three complexes. Level of significance: ***p-value<10^–6, **p-value<0.001, and * p-value<0.01.

To identify the spliceosome complex footprint that best predicts splicing outcome, we examined the relationship between unfolding energy and splicing outcome for 20 synonymous or intronic mutations in exon 10 and intron 10 (Figure 3—figure supplement 1A). These mutations are more likely to affect splicing (Supek et al., 2014; Lin et al., 2019) and structure (Sharma et al., 2019; Lin et al., 2016) than mutations that alter the protein sequence. The distribution of ΔG^‡ of unfolding of the 5’ splice site in the presence of these mutations was correlated with exon 10 PSI (Figure 3—figure supplement 1B). We then calculated the ΔG^‡ of unfolding of the RNA for regions overlapping the 5’ splice site that correspond to the footprints of each of the four spliceosome intermediates. Features of the unfolding ΔG^‡ distribution, including mean and SD, were then used in a β-regression to predict exon 10 PSI (Materials and methods; Equation 2). Unfolding larger regions around the 5’ splice site improved the predictive power of the model, and the B^act complex footprint yielded the best prediction accuracy (R²=0.89; Figure 3B). Crucially, we found that using features of the distribution of unfolding ΔG^‡ in the structural ensemble produced greater predictive power than simply using the unfolding ΔG^‡ of a single minimum free energy structure, supporting the importance of RNA ensemble behavior to splicing outcome (Figure 3—figure supplement 1C). We performed bootstrapping cross-validation and confirmed that unfolding the RNA within the B^act spliceosome complex yielded the best prediction (Figure 3C). Synonymous mutations that alter exon 10 inclusion lie a mean distance of 54 nucleotides from the exon-intron junction, whereas those in the intron are a mean of 14 nucleotides from the junction. The variation in bootstrapped correlation coefficients decreased as a larger region around the exon-intron junction was unfolded, suggesting that the synonymous mutations affect distal structures.

We then tested the structural ensemble-based model on an additional 24 non-synonymous and compensatory mutations found in exon 10 and intron 10. Compensatory mutations are double mutations that were designed to rescue changes in exon 10 splicing caused by a single mutation (Grover et al., 1999). Although the model performed well for compensatory mutations (median bootstrapped R²=0.76), it yielded significantly less accurate predictions for non-synonymous mutations (median bootstrapped R²=−0.21) (Figure 3—figure supplement 1D, E). One clear limitation of this structure-only model is that it does not account for the effects of mutations on potential splicing regulatory elements (SREs) in the sequence, which are also known to control alternative splicing (Wang and Burge, 2008).

Consideration of motif strengths of SREs improves prediction of exon 10 PSI for non-synonymous mutations

Exon 10 splicing is highly regulated by differential binding of RBPs to cis-SREs within exon 10 and intron 10 (Qian and Liu, 2014). The expression patterns of RBPs known to bind MAPT pre-mRNA vary across tissues and individuals (Figure 1—figure supplement 3) and are not predictive of exon 10 PSI. Additionally, while our structure-only model performs moderately well for 47 mutations (R²=0.74) (Figure 4—figure supplement 1A, see Supplementary file 2 for further details about mutations), the structure-only model performs particularly poorly for non-synonymous mutations (median bootstrapped R²=–0.21, Figure 4—figure supplement 1A). Hence, we hypothesized that consideration of mutation-induced changes in binding of SREs might improve our model. We identified SREs by similarity to reported general enhancer and silencer hexamer motifs (Fairbrother et al., 2002; Wang et al., 2004a; Wang et al., 2012; Wang et al., 2013) (Materials and methods) and calculated changes to splice site, enhancer, and silencer motif strengths due to mutations (Figure 4A; Materials and methods). We found that using splice site strength as the sole predictor yielded poor prediction of exon 10 PSI for all mutation categories (Figure 4B; Equation 4). There was a weak positive correlation between PSI and enhancer strength and a significant negative correlation between PSI and silencer strength (Figure 4 and Figure 4—figure supplement 1B). When exon 10 PSI was modeled with the changes to the motif strength of all SREs, prediction accuracy increased (R²=0.51; Figure 4C) compared with that obtained when only splice site strength was considered (R²=0.29); for non-synonymous mutations accuracy was even higher (R²=0.75).

Figure 4 with 2 supplements see all

Download asset Open asset

Combining the strength of all splicing regulatory elements (SREs) significantly improves prediction of exon 10 percent spliced in (PSI) compared to using only splice site strength.

(A) Heatmap of SRE relative strength for 47 mutations compared with wildtype (WT). A value of 0 indicates mutation does not change WT SRE strength, positive values indicate SRE strength is greater than WT, and negative values indicate SRE strength is weaker than WT. Splice site strengths were calculated using MaxEntScan; a splice site was defined as the last three nucleotides of the exon and first six nucleotides of the intron. Enhancer and silencer strengths were calculated from position weight matrices of known motifs derived from cell-based screens (Materials and methods). Mutation type refers to whether the mutation is exonic non-synonymous, exonic synonymous, intronic, or compensatory. Experimental label is the label given by the original study that experimentally validated each mutation using a splicing assay. (B) Exon 10 PSIs of 47 mutations predicted from change in splice site strength and plotted against experimental PSIs measured in splicing assays. Exon 10 PSIs predicted using Equation 4. Each point on the scatterplot represents a mutation and is colored by mutation category. Gray line represents the best fit with dotted lines indicating the 95% CI. Pearson correlation coefficient (R²) calculated of experimental to predicted PSIs. Violin plot shows R²s calculated for each category by training and testing on subsets of all mutations by non-parametric bootstrapping; exonic non-synonymous (n=11), exonic synonymous (n=7), intronic (n=15), compensatory (n=14), and WT (n=1). (C) Exon 10 PSIs of 47 mutations predicted by combining change in splice site, enhancer, and silencer strength and plotted against experimental PSIs measured in splicing assays. Exon 10 PSIs predicted using Equation 5.

Figure 4—source data 1 Values of heatmap plotted in Figure 4A which show the relative splicing regulatory strength for 47 mutations compared with wildtype. Each row contains the mutation ID (Supplementary file 2), the mutation type, the enhancer relative strength, the silencer relative strength, and the splice site relative strength for the mutation.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig4-data1-v2.txt
Download elife-73888-fig4-data1-v2.txt

Many RBPs have been identified that regulate MAPT exon 10 splicing (Qian et al., 2011; D’Souza and Schellenberg, 2006; Kondo et al., 2004; Wang et al., 2004a; Gao et al., 2007; Ding et al., 2012; Broderick et al., 2004; Wang et al., 2010; Kar et al., 2006; Kar et al., 2011; Ray et al., 2011). To determine whether focusing on binding motifs for these proteins would improve model accuracy, we identified RBP sites based on previous data from high-throughput sequencing of bound RNAs (Dominguez et al., 2018; Ray et al., 2013) (Materials and methods). Unlike SRE motifs, there was no clear pattern or correlation between motif strength changes due to MAPT mutations and exon 10 PSI (Figure 4—figure supplement 2A, B). Model prediction accuracy was lower (R²=0.08, Figure 4—figure supplement 2C) than when predictions considered general SRE motifs. Thus, going forward, we chose to use SRE motifs for our combined models.

Model with both RNA structure and SRE motif changes yields best prediction of exon 10 PSI

We next set out to determine if combining both structural and SRE features further improved prediction. Indeed, a combined interactive model consistently produced more accurate predictions of exon 10 PSI compared with a structure-only model and an SRE-only model for all mutation categories (R²=0.89; Figure 5A and B). An alternative additive model had lower prediction accuracy (R²=0.80) (Figure 5—figure supplement 1A), particularly for non-synonymous mutations (Figure 5—figure supplement 1B). This suggests that considering the category of mutation is critical in accurately modeling the effects on splicing.

Figure 5 with 1 supplement see all

Download asset Open asset

Combining structure and splicing regulatory element (SRE) strength into a unified model best predicts exon 10 percent spliced in (PSI).

(A) Exon 10 PSIs of 47 mutations predicted from combined model using structure and SRE strength and fit to experimental PSIs measured in splicing assays. Exon 10 PSIs predicted using Equation 7. Each point on scatterplot represents a mutation and is colored by mutation category. Gray line represents the best fit with dotted lines indicating the 95% CI. Pearson correlation coefficient (R²) calculated of experimental to predicted PSIs. (B) Violin plots of correlation coefficients for each mutation category for structure model, SRE model, and combined model. R²s calculated for each mutation category by training and testing on subsets of all mutations by non-parametric bootstrapping 10 times. Structure model uses unfolding free energy of precursor mRNA (pre-mRNA) within spliceosome at B^act stage as predictor. SRE strength model uses relative change in SRE strength as predictor. Combination model using both structure and SRE strength and weighs the features based on if mutation is intronic/synonymous or non-synonymous (Materials and methods). (C) Heatmap of the normalized changes in structure and SRE strength for each mutation clustered by affected features. Features were normalized such that a value of 1 predicts exon 10 spliced in (four repeat [4R] isoform, blue), whereas a value of 0 implies exon 10 spliced out (three repeat [3R] isoform, red). Mutations were clustered using hierarchical clustering on normalized features (Materials and methods). Experimental PSIs are plotted for each mutation with a PSI of 1 colored as blue, PSI of 0.5 colored as white, and PSI of 0 colored as red. (D) Pie chart showing the features that regulate exon 10 splicing for the 47 experimentally validated mutations. The pie chart was generated based on the heatmap in C. Exon 10 splicing for 51.1% of mutations is supported by changes in both structure and SRE. This implies that structure, at least one SRE motif, and PSI are all blue or all red in the heatmap in (C). Exon 10 splicing for 23.4% of mutations is supported by changes in SRE wherein one of the SREs is the same color as PSI. For 17.0% of mutations, structural changes support exon 10 splicing wherein structure and PSI are the same color. For 4 mutations (8.5%), the colors of none of the features match the color of PSI.

Figure 5—source data 1 Values of heatmap plotted in Figure 5C which show the normalized changes in structure and splicing regulatory element (SRE) strength. Each row contains the mutation ID (Supplementary file 2), the normalized splice strength, normalized silencer strength, normalized enhancer strength, normalized mean ΔG^‡ of unfolding, and experimental percent spliced in (PSI) of the mutation.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig5-data1-v2.txt
Download elife-73888-fig5-data1-v2.txt

To determine whether structure or SRE changes were responsible for the splicing phenotype of each individual mutant, we hierarchically clustered the four primary features (structure around 5’ splice site, 5’ splice site strength, enhancer strength, and silencer strength) for the 47 mutants that have been experimentally characterized (Materials and methods). Six categories emerged from the clustering of features (Figure 5C, and Figure 5—figure supplement 1C). For about 51% of mutations, both structure and SRE motif strength were altered in the same direction to either promote or inhibit exon 10 inclusion (Figure 5D). For the remaining mutations, structure and SRE strength changed in opposite directions. For 17% of mutants, structure dominated the direction of splicing. For about 23%, SRE strength was dominant (Figure 5D). Overall, these results support the conclusion that structure and SREs have equally important effects on regulation of splicing at this exon-intron junction.

Mutations around the MAPT exon 10 – intron 10 junction skew to exon 10 inclusion

We next interrogated the model by performing a systematic in silico-mutagenesis analysis of the 100 nucleotides spanning the exon 10 – intron 10 junction (Figure 6A). Our model predicted that most mutations should result in inclusion of exon 10. This bias is consistent with the observation that about 75% of known disease-associated mutations in this region induce exon 10 inclusion (Figure 6B and Figure 6—figure supplement 1A). We found that a significantly greater proportion of disease-inducing mutations (76.4%) results in changes to both structure and SRE compared with uncategorized mutations (36.0%) (Figure 6C). Thus, mutations that alter both structure and SREs have a greater likelihood of causing disease than mutations that alter only structure or only an SRE. Intriguingly, mutations overall caused a slight skew toward a more structured exon-intron junction that would be expected to decrease inclusion of exon 10 (Figure 6A, Figure 6—figure supplement 1B); however, these same mutations altered SRE strength in a manner that skewed toward increased inclusion of exon 10 (Figure 6—figure supplement 1C), indicating that SREs act to counter the effect of structural changes. Our modeling suggests that a complex balance of structure and RBP binding results in the observed 1:1 ratio of the 3R to 4R MAPT isoforms.

Figure 6 with 1 supplement see all

Download asset Open asset

Combined model is predictive of exon 10 inclusion ratios for previously uncharacterized mutations.

(A) Heatmap of predicted exon 10 percent spliced ins (PSIs) for every possible mutation around 100 nucleotide window of exon 10 – intron 10 junction. Combined model was trained using 47 mutations with experimental PSIs measured from splicing assays as shown in Figure 5A and then used to predict PSIs for all mutation combinations for 100 nucleotides around the junction. Tiles with sequence indicate the wildtype (WT) nucleotide at the position. Heatmap of mean PSI per position and mean relative change in unfolding free energy of pre-mRNA within spliceosome at B^act stage compared with WT is shown below the gene diagram. (B) Violin plot of predicted PSIs for all possible mutations around exon 10 – intron 10 junction and only disease mutations. All possible mutations (n=300) and disease mutations (n=17). A two-tailed Wilcoxon rank sum test was used to determine significance between the two categories. Level of significance: ***p-value<10^–6, **p-value<0.001, and * p-value<0.01. (C) Pie chart showing features that drive exon 10 splicing for disease and presently uncategorized mutations. The pie chart was generated by quantifying the number of mutations for which the direction of predicted exon 10 PSI matched the direction of structure or splicing regulatory element (SRE) change. Exon 10 splicing for 76.4% of disease mutations is supported by changes to both structure and SRE compared with only 36.0% of uncategorized mutations. The difference in proportions was tested with a one-tailed Fisher’s exact test. (D) Histogram displaying the distribution of predicted PSIs using the combined model for 55 variants of unknown significance (VUSs) found in dot-bracket SNP (dbSNP). Density curve was overlaid on top of histogram showing that predicted PSIs skew away from three repeat (3R). Dotted line shows mean predicted PSI of 0.66. VUSs tested in splicing assays are indicated by their dbSNP RS IDs. (E) Representative gel of RT-PCR data for splicing assay in the presence of VUSs. Splicing reporter was transfected into HEK293 cells. The mean exon 10 PSI displayed for each variant was calculated from three replicates, and SE is shown in brackets below. Structure diagram on left displays the location of the VUSs tested.

Figure 6—source data 1 Values of heatmap of in-silico mutagenesis around exon 10 – intron 10 junction shown in upper panel of Figure 6A. Each row contains mutation ID (Supplementary file 6), the relative position of the mutation with respect to the exon 10 – intron 10 junction, mutation type, mutated base, wildtype (WT) base, and predicted percent spliced in (PSI) using combined interactive model.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig6-data1-v2.txt
Download elife-73888-fig6-data1-v2.txt
Figure 6—source data 2 Values of heatmap of mean percent spliced in (PSI) predicted per position shown in lower panel of Figure 6A. Each row contains the relative position with respect to the exon 10 – intron 10 junction and the mean predicted PSI of the three mutations for that position.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig6-data2-v2.txt
Download elife-73888-fig6-data2-v2.txt
Figure 6—source data 3 Values of heatmap of mean ΔΔG^‡ per position shown in lower panel of Figure 6A. Each row contains the relative position with respect to the exon 10 – intron 10 junction and the mean ΔΔG^‡ of the three mutations for that position.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig6-data3-v2.txt
Download elife-73888-fig6-data3-v2.txt
Figure 6—source data 4 Values of predicted percent spliced in (PSI) predicted for 55 variants of unknown significance (VUSs) shown in Figure 6D. Each row contains the mutation ID (Supplementary file 3), the relative position of the mutation with respect to the exon 10 – intron 10 junction, the wildtype (WT) base, the mutated base, and the predicted PSI for the VUS using the combined interactive model.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig6-data4-v2.txt
Download elife-73888-fig6-data4-v2.txt
Figure 6—source data 5 Band intensities of three replicates calculated for splicing reporter gels for six variants of unknown significance (VUSs) and wild type (WT) shown in Figure 6E. Each row contains the mutation, the replicate number, the four repeat (4R) band intensity, three repeat (3R) band intensity, calculated percent spliced in (PSI) from the band intensities, and the mean calculated PSI for the mutation from three replicates.: https://cdn.elifesciences.org/articles/73888/elife-73888-fig6-data5-v2.txt
Download elife-73888-fig6-data5-v2.txt

To assess the general applicability of our model beyond our mutation training set, we predicted exon 10 PSIs for 55 variants of unknown significance (VUSs) found in dot-bracket SNP (dbSNP) (see Supplementary file 3 for further details of VUSs). VUSs are mutations observed in the human population but are not currently associated with disease. The mean exon 10 PSI for VUSs was 0.66, and 70% were within an SD of the mean (Figure 6D). We observed that only a few mutations were predicted to have a PSI of 0 (3R) (Figure 6D red bar). We therefore used splicing assays to experimentally determine the splicing preference of six instructive variants (Materials and methods): three VUSs predicted to be 3R, one VUS predicted to be 4R, and two VUSs predicted to maintain the WT splicing ratio (Figure 6D). We found that all six predictions were correct (Figure 6E, Figure 6—figure supplement 1D). The three 3R VUSs caused the region around the exon-intron junction to become more structured, while the 4R VUS made this region less structured compared to WT (Figure 6—figure supplement 1E). SRE strength changes correctly predict exon 10 splicing direction for +30U>C and –6G>A (Figure 6—figure supplement 1F). For +23U>C and +26G>A, we observed changes in the degree of structure around the exon-intron junction and silencer strengths in diverging directions (Figure 6—figure supplement 1E, F) suggesting that these opposing changes preserve the WT 3R/4R ratio.

Discussion

Splicing specificity is complex (Baralle and Giudice, 2017). The spliceosome does not rely on sequence alone to correctly identify 5’ and 3’ splice sites; other cues ensure correct binding to appropriate locations. The MAPT exon 10 – intron 10 junction is a well-studied example of the effect of 5’ splice site secondary structure on splicing regulation. A hairpin was initially hypothesized to play a major role in splice site accessibility because disease mutations in this structure, close to the exon-intron junction, shifted the isoform balance to completely exclude or completely include exon 10 in the mature mRNA (Hutton et al., 1998; Grover et al., 1999). Nuclear magnetic resonance (NMR), cell-free chemical probing, and computation analyses confirmed the presence of the hairpin (Varani et al., 1999; Chen et al., 2019; Lisowiec et al., 2015). Recent studies have shown that structures determined in cell-free conditions can differ dramatically from those in cells (Sun et al., 2019; Rouskin et al., 2014). Our results suggest that this is not the case for the exon 10 – intron 10 junction region: in-cell chemical probing of the endogenous MAPT pre-mRNA provided strong evidence for formation of this hairpin in cells and for structural features not previously captured.

Our analysis also revealed that, in cells, exonic regions were less structured than introns, as also observed by Sun et al., 2019. Mature MAPT 3R and 4R are less structured in the region of exon 9 through exon 11 than is the pre-mRNA. The high correlations between structures of the exon in different MAPT isoforms and our finding that predicted exon 10 folding is only slightly impacted by the presence of intron 10 or exon 11 residues agree with previous observations of mRNAs. Specifically, mRNAs which encode yeast ribosomal proteins that indicate that RNA folding in both pre- and post-spliced exons is highly local and that most base pairs are intra-exon (Zubradt et al., 2017).

Unlike non-coding RNAs such as the ribosome and tRNA that rely on folding to a single, well-defined structure (Petrov et al., 2014), most RNAs are dynamic, unfolding, and refolding within a landscape (Cruz and Westhof, 2009; Giegé et al., 2012). We showed that structural ensembles have an important function at the exon 10 – intron 10 junction. If the 5’ splice site was always paired, only the 3R isoform would be produced. However, the presence of 3R and 4R isoforms, usually in a 1:1 ratio, implies that the junction is accessible in a subset of the structures. We found disease-causing mutations produced distinct shifts in the ensemble of the MAPT exon 10 – intron 10 junction region; these shifts showed strong correlation with changes in the 3R to 4R isoform ratio and confirmed that ensembles are essential at this junction. Our ability to accurately predict the effects of mutations on ensembles significantly improved our quantitative model (Figure 3—figure supplement 1C).

The U1 snRNA base pairs with a nine nucleotide sequence around the exon-intron junction (Roca et al., 2012). However, our analysis of cryo-EM structures of the human spliceosomal assembly cycle revealed that a larger region of the pre-mRNA interacts with the spliceosome and must be unfolded during splicing. Our structural model performed most accurately when we required 43 nucleotides around the 5’ exon-intron junction to be unfolded, corresponding to the region within the spliceosome B^act complex. This observation suggests that a large region of the pre-mRNA is dynamically remodeled by the spliceosome and that structures distal to the exon-intron junction can regulate splicing. Our finding corroborates evidence that RNA structure near this exon-intron junction is extensive (Tan et al., 2019). Note that we do not claim that all 43 nucleotides need to remain fully unpaired during the splicing cycle, as the entire cycle is dynamic and likely involves other intermediate structures. Rather, our model argues that mRNA unfolding and accommodation into the B^act complex are a key rate-limiting step in splicing, and considering this step is necessary to accurately model splicing outcome for a diverse set of mutations. Broadly, our definition of a functional footprint for splicing parallels a similar idea for translation initiation by the ribosome (Corley et al., 2017; Mustoe et al., 2018a; Mustoe et al., 2018b), for which the footprint is roughly 30 nucleotides. Thus, for both translation initiation and splice site selection, there is a region in which RNA structure functions as a rheostat.

Considerable evidence supports a function for both SREs (and their corresponding RBPs) and RNA structure in controlling alternative splicing of exon 10 of MAPT (Andreadis, 2012). However, the relative importance of these two factors has been controversial. The regression model we developed clarifies that there is a cooperative relationship between RNA structure and SREs in driving splicing outcome. Exonic non-synonymous mutations promote splicing changes primarily by altering SRE motifs, whereas exonic synonymous and intronic mutations altered RNA structure. A combined model that accounted for both structure and SREs was the most accurate predictor of exon 10 PSI (Figure 5D). It was previously proposed that exon 10 is alternatively spliced due to a weak 5’ splice site (D’Souza and Schellenberg, 2005), and indeed, we found that mutations that strengthened the splice site increase inclusion of exon 10 (Figure 4A). SRE strength alterations overall skewed more toward increased exon 10 inclusion, which suggest that SREs and the RBPs that bind them buffer the effects of RNA structure to maintain the 1:1 isoform ratio.

Although structure and SREs had opposing effects on splicing outcomes, disease variants frequently resulted in a synergistic effect on splicing outcome. The combined model was directly validated by accurate prediction of the effects of six previously untested VUSs on exon 10 splicing (Figure 6E). Few VUSs were predicted to completely exclude exon 10 from the mature mRNA. Only five VUSs had PSIs less than 0.25. Our model accurately predicted the effect of the three with the lowest predicted PSI. Our systematic computational mutagenesis revealed a hotspot for mutations around 25–30 nucleotides downstream of the exon-intron junction that were predicted to result in production of only the 3R isoform (Figure 6A). Indeed, the experimentally validated VUSs with PSIs less than 0.25 were in this region.

In principle, our splicing model can be extended to other exon-intron junctions, although RBPs that recognize SRE motifs have different binding contexts (Dominguez et al., 2018) and the exact binding preferences of the RBPs that regulate the junction of interest are currently unknown. Another limitation is that the current model does not consider structural and sequence features around the 3’ splice site (in the case of MAPT exon 10, the intron 9 – exon 10 junction) that are expected to impact exon 10 splicing regulation. Although our model provides an exact PSI prediction for each mutation, we emphasize that its principal utility is in predicting the direction in which the 3R to 4R isoform ratio shifted from the WT ratio.

In brain tissue from healthy individuals, exon 10 PSIs varied between individuals and between tissues within an individual (Figure 1A). Even in individuals with progressive supranuclear palsy, a tauopathy in which MAPT variants are implicated, there was variability in exon 10 PSIs in different brain tissues (Majounie et al., 2013). Thus, although our model combines both structural and sequence features to achieve quantitative prediction accuracy of the 3R to 4R ratio for a wide range of disease mutations (synonymous, non-synonymous, intronic, and exonic), it is not clear that PSI alone is predictive of severity of disease for the broad class of tauopathies (Majounie et al., 2013). Disease severity is compounded by other factors including gene-gene interactions and environmental factors. As such, the value of our model stems more from how it incorporates RNA structure in predicting alternative splicing rather than as a direct predictor of disease severity. Many neurodegenerative diseases are caused by mutations around the MAPT exon 10 – intron 10 junction, and there are no approved therapeutics that target this junction. Our work suggests that it is crucial to consider the larger structural context of this region of the pre-mRNA and the interplay between structure and SREs when considering the consequences of mutations on splicing regulation and the design of potential therapeutics to alter this ratio.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Cell line (Homo sapiens)	T-47D	ATCC	HTB-133	Epithelial cells from ductal carcinoma of breast
Cell line (H. sapiens)	SH-SY5Y	ATCC	CRL-2266	Neuroblastoma cells from bone marrow
Recombinant DNA reagent	Wildtype splicing reporter plasmid for MAPT exon 10 – intron 10 junction	Tan et al., 2019
Sequence-based reagent	Exon10Intron10-Fwd	This paper	PCR primers	AACGTCCAGTCCAAGTGTGG
Sequence-based reagent	Exon10Intron10-Rev	This paper	PCR primers	CATTCACCCAGAGGTCGCAG
Sequence-based reagent	3 R/4R-Fwd	This paper	PCR primers	GCAGTGGTCCGTACTCCAC
Sequence-based reagent	3 R/4R-Rev	This paper	PCR primers	TGATGGATGTTGCCTAATGAGC

Share this article

Cite this article

Three repeat (3R) and four repeat (4R) mature MAPT transcripts are expressed in a 1:1 ratio in samples from human subjects and mature exons appear to fold as independent structural units.

Figure 1—source data 1

Figure 1—source data 2

Figure 1—source data 3

The four repeat (4R) and three repeat (3R) mutations shift dimethyl sulfate (DMS) reactivity-guided structural ensemble of exon 10 – intron 10 junction to be less structured and more structured, respectively.

Figure 2—source data 1

Figure 2—source data 2

The best predictor of exon 10 percent spliced in (PSI) for intronic and synonymous mutations was the unfolding free energy of precursor mRNA (pre-mRNA) during the Bact stage of splicing.

Combining the strength of all splicing regulatory elements (SREs) significantly improves prediction of exon 10 percent spliced in (PSI) compared to using only splice site strength.

Figure 4—source data 1

Combining structure and splicing regulatory element (SRE) strength into a unified model best predicts exon 10 percent spliced in (PSI).

Figure 5—source data 1

Combined model is predictive of exon 10 inclusion ratios for previously uncharacterized mutations.

Figure 6—source data 1

Figure 6—source data 2

Figure 6—source data 3

Figure 6—source data 4

Figure 6—source data 5

Author details

Jayashree Kumar

Contribution

Competing interests

Lela Lackey

Contribution

Competing interests

Justin M Waldern

Contribution

Competing interests

Abhishek Dey

Contribution

Competing interests

Anthony M Mustoe

Contribution

Competing interests

Kevin M Weeks

Contribution

Competing interests

David H Mathews

Contribution

Competing interests

Alain Laederach

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism

The best predictor of exon 10 percent spliced in (PSI) for intronic and synonymous mutations was the unfolding free energy of precursor mRNA (pre-mRNA) during the B^act stage of splicing.