Massively parallel reporter assay (MPRA) to determine the effects of UTR variants on RNA stability.

(A) MPRA workflow. In brief, 6,555 reference (ref) and mutant (mt) UTR pairs were synthesized in bulk, ligated with promoters and reporter sequences, in vitro-transcribed into capped and tailed RNAs, transfected into human cell lines, and then the remaining RNAs were collected over a time-course. The collected RNAs were reverse-transcribed, amplified and sequenced to resolve the genotype of each UTR. The unique sequences were used to calculate RNA half-life. Mutational effects were inferred from those pairs significantly differing in half-life (see Methods). (B) Volcano plot of MPRA data from three repeated experiments. The colored dots indicate significant stability-altering variants. (C) Examples of significant stability-altering UTR mutations in both UTR types.
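The half-life calculation mentioned in (A) can be illustrated with a minimal sketch. It assumes simple first-order (exponential) decay and a log-linear least-squares fit; the legend does not state the fitting procedure, so this is an assumption (see Methods for the actual approach). The function name `half_life` and the example time points are hypothetical.

```python
import math

def half_life(times, abundances):
    """Estimate RNA half-life from a time-course.

    Assumption (not from the source): abundance follows first-order decay,
    A(t) = A0 * exp(-k * t), so ln(A) is linear in t and t_half = ln(2) / k.
    """
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    # Ordinary least-squares slope of ln(abundance) against time.
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    if slope >= 0:
        return math.inf  # no measurable decay over the time-course
    return math.log(2) / -slope

# Hypothetical time-course (hours) for one UTR whose reads halve every 2 h.
times = [0, 1, 2, 4]
reads = [100 * 0.5 ** (t / 2) for t in times]
estimate = half_life(times, reads)
```

Applying the same fit to the reference and mutant member of each pair gives the two half-lives whose difference is then tested.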

AREs generally destabilize RNAs (except extremely U-rich AREs).

(A) AREs of both UTR types destabilize RNA. (B) The ten most influential AREs in terms of RNA stability in SH-SY5Y cells. Coefficients were determined by regression analysis and represent the effect size of each motif.

Inferential statistical analysis of RNA stability determinants.

(A) Workflow of variable selection to build models of influencers of RNA stability. (B-E) Influential regulators for the 5’ UTR library from HEK293T cells (B), the 3’ UTR library from HEK293T cells (C), the 5’ UTR library from SH-SY5Y cells (D), and the 3’ UTR library from SH-SY5Y cells (E). The error bars represent 95% confidence intervals of the coefficients. Note that the factors shown in the figure are representative of their respective clusters (see Methods and Supplemental Fig. S3).

UTR UA dinucleotides and UA-rich motifs are the most common and influential RNA-destabilizing factors.

(A) Correlation of the 5’ UTR UA dinucleotide ratio and half-life. (B) Top 15 influential factors in the UA cluster of 5’ UTR. UTRs are arranged by half-life, and factors by their coefficient to half-life. Note that there are destabilizing factors (such as UA and AU dinucleotides) as well as stabilizing factors (such as GC content and G monomers) in this cluster. UA dinucleotide and WWWWWW (PPIE) (where W represents A/U) are representative of the cluster for modeling UTR stability in SH-SY5Y and HEK293T cells, respectively. (C) Correlation of the 3’ UTR UA dinucleotide ratio and half-life. (D) Top 15 influential factors in the UA cluster of 3’ UTR. (E) Mutational gain of a UA dinucleotide by 3’ UTRs significantly reduces RNA stability (lower panel). (F) Gain of UA dinucleotides in a random 5’ UTR library led to RNA destabilization. We categorized pairs with a ≥ 1.5-fold change as ’significant’ (Sig) and those below this threshold as ’non-significant’ (Non-sig). (G) High UA-dinucleotide ratios of both UTRs reduce endogenous RNA stability in HEK293 cells. Q1-Q4 denote quantile groups categorized based on the UA-dinucleotide ratio.
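The UA-dinucleotide ratio and the 1.5-fold cutoff used in (F) can be sketched as follows. Two assumptions are made here that the legend does not state: the ratio is taken as the number of UA dinucleotides divided by the total number of dinucleotide positions, and the fold change is treated symmetrically (a 1.5-fold increase or decrease both count). The function names are hypothetical.

```python
def ua_ratio(seq):
    """Fraction of dinucleotide positions occupied by UA.

    Assumption: ratio = (# UA dinucleotides) / (sequence length - 1).
    DNA input is accepted by converting T to U.
    """
    seq = seq.upper().replace("T", "U")
    if len(seq) < 2:
        return 0.0
    count = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "UA")
    return count / (len(seq) - 1)

def classify_pair(ref_half_life, mt_half_life, threshold=1.5):
    """Label a ref/mt pair 'Sig' if half-life changes by >= threshold-fold.

    Assumption: the fold change is symmetric, i.e. max/min of the two
    half-lives, so stabilizing and destabilizing changes are both counted.
    """
    fold = max(ref_half_life, mt_half_life) / min(ref_half_life, mt_half_life)
    return "Sig" if fold >= threshold else "Non-sig"

# Hypothetical example: a mutation that creates one extra UA dinucleotide.
ref_utr = "GCUAGC"
mt_utr = "GUUAUA"
gain = ua_ratio(mt_utr) - ua_ratio(ref_utr)
label = classify_pair(ref_half_life=6.0, mt_half_life=3.0)
```

Counting positions rather than non-overlapping matches keeps the ratio comparable across UTRs of different lengths.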

GC content, RBP and ribosome binding shield RNA from the destabilizing effect of UA dinucleotides.

(A) MPRA data of SH-SY5Y cells stratified according to the GC content (GC%) of UTRs. The data were divided into high and low groups according to the median GC%. In both UTRs, the destabilizing effect of UA dinucleotides is more evident in the context of low GC content (right panels). P values were determined by linear regression. (B) High GC content hinders the effect of altered UA dinucleotides in mutant UTRs. The destabilizing effect of UA-addition (blue) and the stabilizing effect of UA-deletion (crimson) are only observed under the condition of low GC content. P values were determined by a two-sided Wilcoxon rank sum test. (C) The destabilizing effect of UA dinucleotides is observed with the 5’ UTR random library. High GC content hinders the UA-destabilizing effect. (D) UA dinucleotides are enriched in P-body-resident mRNAs. ρ represents Spearman’s correlation coefficient. (E) High GC content inhibits enrichment of UA dinucleotide-hosting mRNAs in P-bodies. For medium or low GC%, a high UA-dinucleotide ratio promoted the P-body localization of mRNAs, but this was not the case for high GC%. This effect was more prominent for 3’ UTRs. P values were determined by a two-sided Wilcoxon rank sum test. (F) UTRs with more eCLIP RBP binding signals per UA dinucleotide are more stable. The high and low groups were stratified based on the median number of RBPs per UA. (G) UTRs harboring more predicted RBP-binding sites per UA dinucleotide are more stable, as determined by MPRA. P values were determined by a two-sided Wilcoxon rank sum test (F and G). (H) Comparison of RNA half-life of high-UA UTRs determined by MPRA and by transcription inhibition with actinomycin D (ActD). ρ represents Spearman’s correlation coefficient. (I) RNA stability assay with actinomycin D treatment. Error bars are standard errors computed from three experimental replicates. **: FDR-adjusted p value = 0.002. (J) Association of UA dinucleotide-binding protein motifs with RNA half-life in SH-SY5Y cells. Note that UA-binding RBPs can have both positive and negative effects on RNA stability.
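The median stratification and two-sided Wilcoxon rank sum test used in several panels above can be sketched as below. This is an illustrative standard-library implementation using the normal approximation with tie correction and no continuity correction; it is not necessarily the software or exact variant the authors used. Assigning values equal to the median to the low group is also an assumption, and all names and example numbers are hypothetical.

```python
import math
from statistics import median

def split_by_median(values, keys):
    """Split values into (high, low) groups by the median of keys.

    Assumption: items whose key equals the median go to the low group.
    """
    m = median(keys)
    high = [v for v, k in zip(values, keys) if k > m]
    low = [v for v, k in zip(values, keys) if k <= m]
    return high, low

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (U statistic for x, p value). Ties receive average ranks and
    the variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical half-lives (h) and GC% for eight UTRs.
half_lives = [2.1, 2.4, 2.0, 2.6, 4.8, 5.1, 4.5, 5.3]
gc_percent = [35, 38, 40, 42, 58, 60, 63, 65]
high_gc, low_gc = split_by_median(half_lives, gc_percent)
u_stat, p_value = rank_sum_test(high_gc, low_gc)
```

For small groups an exact test is usually preferable to the normal approximation shown here.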

UA dinucleotides delineate functional gene groups.

(A) The top ten biological processes for which the 5’ UTR UA-dinucleotide ratio most significantly deviated from the genomic background (dashed line). (B) The top ten biological processes for which the 3’ UTR UA-dinucleotide ratio most significantly deviated from the genomic background. (C) Functional gene groups for which the 5’ UTR UA-dinucleotide ratio was significantly above or below the genomic background in more than ten sliding windows. (D) Functional gene groups for which the 3’ UTR UA-dinucleotide ratio was significantly above or below the genomic background in more than ten sliding windows. (E) Biological processes for RNAs in which the UA-dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background (dashed lines). (F) Molecular functions for RNAs in which the UA-dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background (dashed lines). The thin solid lines represent the standard deviation of the UA-dinucleotide ratio within the gene group.

UTR variants associated with disease.

(A-B) 3’ UTR mutations that increase RNA and protein expression in carcinoma samples. Protein levels were determined by reverse phase protein arrays (RPPA). (C) QQ plot of the p value distribution of stability-altering UTR variants in association with health biomarkers or self-reported diseases against a theoretical distribution. (D) The G allele of the most significant UTR variant (rs5128) identified in (C) is associated with plasma triglyceride levels in the Taiwanese population (TWB dataset).