Massively parallel reporter assay (MPRA) to determine the effects of UTR variants on RNA stability. (A)

MPRA workflow. In brief, 6,555 WT and mutant UTR pairs were synthesized in bulk, ligated with promoters and reporter sequences, transcribed in vitro into capped and tailed RNAs, and transfected into human cell lines; the remaining RNAs were then collected over a time course. The collected RNAs were reverse-transcribed, amplified, and sequenced to resolve the genotype of each UTR. The unique sequences were used to calculate RNA half-life. Mutational effects were inferred from those pairs that differed significantly in half-life (see Methods). (B) Volcano plot of MPRA data from three repeated experiments. The colored dots indicate significant stability-altering variants. (C) Examples of significant stability-altering UTR mutations in both UTR types.
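As an illustration of how a half-life can be derived from a time course of remaining RNA, the sketch below assumes simple first-order (exponential) decay and fits a straight line to log-transformed abundance. This is only an assumption for illustration; the procedure actually used is described in the Methods and may differ. The function name `estimate_half_life` and the example time points are hypothetical.

```python
import math

def estimate_half_life(times, abundances):
    """Estimate half-life assuming first-order decay (an illustrative assumption).

    Fits ln(abundance) = intercept - k * t by ordinary least squares
    and returns ln(2) / k, in the same units as `times`.
    """
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(a) for a in abundances]  # abundances must be positive
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    decay_rate = -sxy / sxx  # k is the negative of the fitted slope
    if decay_rate <= 0:
        return math.inf  # no measurable decay over the time course
    return math.log(2) / decay_rate

# Hypothetical time course (hours) in which the RNA halves every 2 hours.
example_times = [0, 1, 2, 4]
example_abundances = [100 * 0.5 ** (t / 2) for t in example_times]
example_half_life = estimate_half_life(example_times, example_abundances)
```

Comparing the estimate for a WT UTR with that of its mutant partner gives the per-pair difference in half-life that the legend refers to.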

AREs generally destabilize RNAs (except extremely U-rich AREs). (A)

AREs of both UTR types destabilize RNA. (B) The ten most influential AREs in terms of RNA stability. Coefficients are determined by regression analysis, representing the effect size of each motif.

Inferential statistical analysis of RNA stability determinants. (A)

Workflow of variable selection to build models of influencers of RNA stability. (B-E) Influential regulators for the 5’ UTR library from HEK293T cells (B), the 3’ UTR library from HEK293T cells (C), the 5’ UTR library from SH-SY5Y cells (D), and the 3’ UTR library from SH-SY5Y cells (E). The error bars represent 95% confidence intervals of the coefficients. Note that the factors shown in the figure are representative of their respective clusters (see Methods and Supplemental Fig. S3).

The UTR TA dinucleotide ratio is the most common and influential RNA destabilizing factor. (A)

Correlation of the 5’ UTR TA dinucleotide ratio and half-life. (B) Top 15 influential factors in the TA cluster of 5’ UTR. UTRs are arranged by half-life, and factors by their coefficient to half-life. Note that there are destabilizing factors (such as TA and AT dinucleotides) as well as stabilizing factors (such as GC content and G monomers) in this cluster. TA dinucleotide and WWWWWW (PPIE) (where W represents A/T) are representative of the cluster for modeling UTR stability in SH-SY5Y and HEK293T cells, respectively. (C) Correlation of the 3’ UTR TA dinucleotide ratio and half-life. (D) Top 15 influential factors in the TA cluster of 3’ UTR. (E) Mutational gain of a TA dinucleotide by 3’ UTRs significantly reduces RNA stability (lower panel). (F) Gain of TA dinucleotides in a random 5’ UTR library led to RNA destabilization. We categorized pairs with a ≥ 1.5-fold change as ’significant’ (Sig) and those below this threshold as ’non-significant’ (Non-sig). (G) High TA-dinucleotide ratios of both UTRs reduce endogenous RNA stability in HEK293 cells. Q1-Q4 denote quantile groups categorized based on the TA-dinucleotide ratio.
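To make the TA dinucleotide ratio concrete, the following sketch counts overlapping TA dinucleotides and divides by the number of dinucleotide positions in the sequence. The normalization (length minus one) and the treatment of U as T are assumptions made for illustration; the exact definition used in the analysis is given in the Methods. The function name `dinucleotide_ratio` is hypothetical.

```python
def dinucleotide_ratio(seq, dinuc="TA"):
    """Fraction of dinucleotide positions in `seq` occupied by `dinuc`.

    Assumption for illustration: ratio = count / (len(seq) - 1).
    RNA input is accepted by converting U to T.
    """
    seq = seq.upper().replace("U", "T")
    n_positions = len(seq) - 1
    if n_positions < 1:
        return 0.0  # sequences shorter than 2 nt contain no dinucleotide
    count = sum(1 for i in range(n_positions) if seq[i:i + 2] == dinuc)
    return count / n_positions

# "TATA" has three dinucleotide positions (TA, AT, TA), two of which are TA.
example_ratio = dinucleotide_ratio("TATA")
```

The same function can be applied to a WT and mutant UTR pair to check whether a variant gains or loses a TA dinucleotide.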

GC content, RBP and ribosome binding shield RNA from the destabilizing effect of TA dinucleotides. (A)

MPRA data stratified according to the GC content (GC%) of UTRs. The data were divided into high and low groups according to the median of GC%. In both UTRs, the destabilizing effect of TA dinucleotides is more evident in the context of low GC content (right panels). P values were determined by linear regression. (B) High GC content hinders the effect of altered TA dinucleotides in mutant UTRs. The destabilizing effect of TA-addition (blue) and the stabilizing effect of TA-deletion (crimson) are only observed under the condition of low GC content. P values were determined by a two-sided Wilcoxon rank sum test. (C) The destabilizing effect of TA dinucleotides is observed with the 5’ UTR random library. High GC content hinders the TA-destabilizing effect. (D) TA dinucleotides are enriched in P-body-resident mRNAs. ρ represents Spearman’s correlation coefficient. (E) High GC content inhibits enrichment of TA dinucleotide-hosting mRNAs in P-bodies. For medium or low GC%, a high TA dinucleotide ratio promoted the P-body localization of mRNAs, but this was not the case for high GC%. This effect was more prominent for 3’ UTRs. P values were determined by a two-sided Wilcoxon rank sum test. (F) UTRs with more eCLIP RBP binding signals per TA dinucleotide are more stable. The high and low groups were stratified based on the median number of RBPs per TA. (G) UTRs harboring more predicted RBP-binding sites per TA dinucleotide are more stable, as determined by MPRA. P values were determined by a two-sided Wilcoxon rank sum test (F and G). (H) Comparison of RNA half-life of high-TA UTRs determined by MPRA and transcription inhibition with actinomycin D (ActD). ρ represents Spearman’s correlation coefficient. (I) RNA stability assay with ActD treatment. Error bars are standard errors computed from three experimental replicates. **: FDR-adjusted p value = 0.002. (J) Association of TA-dinucleotide-binding protein motifs with RNA half-life in SH-SY5Y cells.
Note that TA-binding RBPs can have both positive and negative effects on RNA stability.
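The median-based stratification by GC% described for panel (A) can be sketched as follows. This is a minimal illustration, not the analysis code: the function names are hypothetical, and assigning UTRs that sit exactly at the median to the low group is an assumption, since the legend does not specify how ties are handled.

```python
from statistics import median

def gc_percent(seq):
    """GC content of a sequence, as a percentage of its length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def split_by_median_gc(utrs):
    """Split named UTR sequences into high- and low-GC groups at the median.

    `utrs` maps a UTR name to its sequence. UTRs exactly at the median
    go to the low group (an assumption made for this sketch).
    """
    values = {name: gc_percent(seq) for name, seq in utrs.items()}
    cutoff = median(values.values())
    high = [name for name, value in values.items() if value > cutoff]
    low = [name for name, value in values.items() if value <= cutoff]
    return cutoff, high, low

# Hypothetical toy sequences with GC% of 100, 0, 50, and 75.
example_utrs = {"a": "GGCC", "b": "ATAT", "c": "GCAT", "d": "GGCA"}
example_cutoff, example_high, example_low = split_by_median_gc(example_utrs)
```

Each group could then be analyzed separately, for example by relating the TA dinucleotide ratio to half-life within the high-GC and low-GC groups.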

TA dinucleotides delineate functional gene groups. (A)

The top ten biological processes for which the 5’ UTR TA dinucleotide ratio most significantly deviated from the genomic background. (B) The top ten biological processes for which the 3’ UTR TA dinucleotide ratio most significantly deviated from the genomic background. (C) Functional gene groups for which the 5’ UTR TA dinucleotide ratio was significantly above or below the genomic background in more than ten sliding windows. (D) Functional gene groups for which the 3’ UTR TA dinucleotide ratio was significantly above or below the genomic background in more than ten sliding windows. (E) Biological processes for RNAs in which the TA dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background. (F) Molecular functions for RNAs in which the TA dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background.

UTR variants associated with disease. (A-B)

3’ UTR mutations that increase RNA and protein expression in carcinoma samples. Protein level was determined by reverse phase protein arrays (RPPA). (C) QQ plot of the p value distribution of stability-altering UTR variants in association with health biomarkers or self-reported diseases against a theoretical distribution. (D) The G allele of the most significant UTR variant (rs5128) identified in (C) is associated with plasma triglyceride levels in the Taiwanese population (TWB dataset).