paSNVs disrupting cleavage/polyadenylation signals are depleted in the normal population.

(A) Bioinformatics workflow used to analyse the effect of paSNVs on pre-mRNA cleavage and polyadenylation.

(B) Top, effects of UP- and DOWN-paSNVs on the APARENT2 score (mean±SEM) as a function of their position with respect to annotated pre-mRNA cleavage sites (CSs). Bottom, combined distribution of AWTAAA-affecting paSNVs in both datasets.

(C) Box plot showing that paSNVs disrupting polyadenylation signals are significantly less frequent in the normal population than control groups of events.

(D) paSNVs disrupting polyadenylation signals are enriched for singletons, consistent with purifying selection against such events in the normal population.
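
As an illustration of what "AWTAAA-affecting" means in panel (B), the following minimal sketch checks whether a single-nucleotide substitution destroys or creates an AWTAAA hexamer (W = A or T, i.e. AATAAA or ATTAAA). The function names, the 0-based coordinates and the four-way classification are assumptions made for this example and are not taken from the workflow in panel (A).

```python
import re

# AWTAAA in IUPAC notation: W is A or T, so the motif is AATAAA or ATTAAA.
# The lookahead lets overlapping occurrences be reported as well.
PAS_HEXAMER = re.compile(r"(?=A[AT]TAAA)")


def pas_positions(seq):
    """Return 0-based start positions of every AWTAAA hexamer in seq."""
    return [m.start() for m in PAS_HEXAMER.finditer(seq.upper())]


def classify_snv(ref_seq, pos, alt):
    """Classify a substitution at 0-based pos (hypothetical helper).

    Returns 'loss' if a hexamer present in the reference disappears,
    'gain' if a new hexamer appears, 'both' if both happen,
    and 'none' if the set of hexamer positions is unchanged.
    """
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    before = set(pas_positions(ref_seq))
    after = set(pas_positions(alt_seq))
    lost, gained = before - after, after - before
    if lost and gained:
        return "both"
    if lost:
        return "loss"
    if gained:
        return "gain"
    return "none"
```

For example, `classify_snv("GGAATAAAGG", 4, "G")` reports a loss, whereas changing AATAAA to ATTAAA at position 3 leaves the motif intact and is reported as no change.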

Cancer somatic mutations tend to disrupt functional cleavage/polyadenylation signals.

(A) Bar plot showing enrichment of paSNVs disrupting polyadenylation signals among cancer somatic mutations.

(B) Bar plot showing enrichment of SNVs affecting AWTAAA sequences in 3’UTRs close to annotated cleavage sites (CSs) among cancer somatic mutations.

(C) Box plot showing that somatic mutations disrupt stronger cleavage/polyadenylation signals in cancer.

(D) paSNVs disrupting polyadenylation signals occur in more evolutionarily conserved regions in cancer (mean PhastCons score in a 15-nt window centred on the SNV).
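
The windowed conservation measure in panel (D) can be sketched as follows. This is a minimal example that assumes per-base PhastCons scores are already available as a Python list indexed by 0-based position; the function name, the truncation of the window at sequence ends and the skipping of missing scores (`None`) are assumptions of this sketch.

```python
def window_mean(scores, pos, width=15):
    """Mean score in a window of `width` positions centred on `pos`.

    scores: per-base conservation scores (None marks a missing value).
    pos:    0-based position of the SNV.
    The window is truncated at the ends of `scores` (an assumption).
    """
    half = width // 2
    start = max(0, pos - half)
    end = min(len(scores), pos + half + 1)
    window = [s for s in scores[start:end] if s is not None]
    if not window:
        return None
    return sum(window) / len(window)
```

Setting `width=1` returns the score at the exact SNV position.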

Somatic cancer mutations often disrupt cleavage/polyadenylation signals in tumour suppressor genes.

(A-B) Overrepresentation of (A) tumour suppressors but not (B) oncogenes among genes with cancer somatic DOWN-paSNVs, as compared to genes with cancer somatic BG-paSNVs. Fractions of tumour suppressors and oncogenes are also shown for all genes and genes containing cancer somatic nonsense (premature stop codons), missense (altered amino acid residues) and synonymous (synonymous codons) mutations. Note that the enrichment of tumour suppressors is stronger for DOWN-paSNVs compared to nonsense mutations.

(C-D) Enrichment of different groups of cancer somatic SNVs in (C) tumour suppressors and (D) oncogenes calculated using DigDriver relative to genes not listed in Cancer Census (non-Census) and presented with 95% confidence intervals. Note that DOWN-paSNVs and nonsense mutations are enriched in tumour suppressors but not in oncogenes. In contrast, oncogenes are often affected by missense mutations, as expected.

(E) Cancer somatic DOWN-paSNVs co-occur in the same tumour with non-synonymous damaging SNVs (a group of somatic mutations defined in ref. 20) more often than BG-paSNVs do. Note that the co-occurrence is particularly high for tumour suppressors.

(F) The overall frequency of non-synonymous damaging SNVs is significantly higher in the DOWN-paSNV-containing group compared to the DOWN-paSNV-lacking group of tumour suppressor genes.

Somatic cancer DOWN-paSNVs are sufficient to downregulate tumour suppressor genes.

(A-B) Gene-specific expression differences between DOWN-paSNV-containing and wild-type samples (ΔLog2 of copy number variation-normalized FPKM values; see Materials and Methods) reveal a consistently negative effect of DOWN-paSNVs on tumour suppressor mRNA abundance in colorectal cancers. Box plots are shown for (A) an aggregated set of qualifying tumour suppressors and (B) individual genes from this set. Outliers are omitted for clarity.

(C) Wild-type and mutated sequences of the XPA tumour suppressor gene cleavage/polyadenylation signal. The PAS hexamer is enclosed within a box.

(D) Top, XPA cleavage site read-through minigenes and corresponding primers used for RT-qPCR analyses. Bottom, RT-qPCR data showing stronger read-through (weaker polyadenylation) in the mutant minigene.

(E) Top, luciferase expression minigenes. Bottom, luciferase assay revealing that the cancer-specific PAS mutation dampens the expression of the reporter gene.

Distribution of cleavage/polyadenylation signal-disrupting mutations in the normal population (1000 Genomes dataset).

(A) Box plot comparison of normal-population allele frequencies of cleavage/polyadenylation signal-disrupting mutations defined by considering only AWTAAA gain/loss, only APARENT2 score changes, or both (DOWN-paSNVs).

(B) Bar plot comparison of normal-population fractions of singletons for cleavage/polyadenylation-disrupting mutations defined by considering only AWTAAA gain/loss, only APARENT2 score changes, or both (DOWN-paSNVs).
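
The singleton fraction compared in panel (B) can be illustrated with a short sketch. It assumes that a singleton is a variant whose alternative allele is observed exactly once in the cohort and that each variant is represented by its allele count; the function name and input format are hypothetical.

```python
def singleton_fraction(allele_counts):
    """Fraction of observed variants whose allele count equals 1.

    allele_counts: iterable of alternative-allele counts, one per variant.
    Variants with a count of 0 are ignored because they are not observed
    in the cohort (an assumption of this sketch).
    """
    observed = [ac for ac in allele_counts if ac > 0]
    if not observed:
        return None
    singletons = sum(1 for ac in observed if ac == 1)
    return singletons / len(observed)
```

A higher value for one variant class than for its controls is the pattern interpreted as a sign of purifying selection.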

Cancer somatic DOWN-paSNVs often occur in evolutionarily conserved regions. The plot was generated as in Fig. 2D, except that conservation was calculated for the exact SNV position.

Cancer somatic DOWN-paSNVs often reside in genes with tumour suppressive functions.

(A) Stacked bar plot showing enrichment of SNVs disrupting polyadenylation signals (DOWN-paSNVs) in tumour suppressors in cancer.

(B) The data in (A) normalized for Census genes only. Note that nonsense mutations show an enrichment similar to that of DOWN-paSNVs in tumour suppressors but not in oncogenes. Conversely, UP-paSNVs are enriched in oncogenes but not in tumour suppressors.

(C) Top 10 GO terms enriched in genes with cancer somatic DOWN-paSNVs. Note the enrichment of apoptosis- and cell death-related functions.

Cancer somatic DOWN-paSNVs are enriched for statistically significant DigDriver events (BH-adjusted P<0.01), suggesting that they may be under positive selection in cancer.
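
For readers unfamiliar with the BH adjustment used for the significance threshold above, the following is a minimal pure-Python sketch of the standard Benjamini-Hochberg procedure. It is a generic illustration, not the DigDriver implementation; the function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Events would then be called significant when their adjusted value falls below the chosen threshold (here 0.01).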

Wild-type tumour suppressor genes tend to have efficient cleavage/polyadenylation signals.

(A) Box plot showing that wild-type tumour suppressors have stronger cleavage/polyadenylation signals than oncogenes and non-Census genes.

(B) All Census genes classifiable as tumour suppressors (“Tumour suppressors+”; see Materials and Methods) have stronger cleavage/polyadenylation signals than oncogenes and non-Census genes.

(C) Tumour suppressors associated with “hallmarks of cancer” have stronger cleavage/polyadenylation signals than “non-hallmark” tumour suppressor genes.