Many m6Am genes are mistakenly annotated.

(A) Genes tend to have multiple TSSs. Shown is a histogram displaying the number of TSS per protein-coding gene in HEK293T cells. TSSs (87,624 TSSs from 9,199 genes) were mapped using ReCappable-seq. These TSSs have expression level ≥1 TPM (transcription-start nucleotide per million).

(B) and (C) Examples of genes that were mistakenly classified by previous studies15. SRSF1(B) was previously designated as m6Am because of the miCLIP signals overlapping with A- TSSs. However, based on ReCappable-seq, ∼41.5% of the reads mapped to non-A-TSSs in SRSF1. Notably, one of the most expressed m6Am A-TSS (chr17:58,007,228) was mistakenly considered as internal m6A because this position was not previously annotated as a TSS40. ADAR (C) was previously classified as Gm. There is no m6Am signal based on miCLIP15, m6Am-seq12, or m6ACE-seq13 mapped to ADAR. However, ∼71% of the transcripts from ADAR are A- initiated.

(D) Previously classified m6Am genes express considerable levels of non-A-initiated transcripts. Each column represents a gene previously classified as m6Am gene by miCLIP15. For each gene, the percentage of transcript isoforms starting with m6Am/Am (in green) or Gm/Cm/Um (in gray) are shown. The percentage was calculated by weighting each transcript isoform by its expression level. The TSN frequencies were obtained using ReCappable-seq in HEK293T cells.

CROWN-seq correctly maps and quantifies m6Am.

(A) Schematic of CROWN-seq. RNAs are firstly treated with sodium nitrite, which causes Am at the transcription-start position to be converted to Im. To isolate the TSN, m7G caps are replaced with 3’-desthiobiotin (DTB) caps. These DTB caps are enriched on streptavidin beads, while uncapped background RNA fragments are uncapped and washed away. After washing, an enriched pool of transcript 5’ ends are released from the beads by cleaving the triphosphate bridge, leaving 5’ monophosphate ends that are ligated to an adapter. After adapter ligation, cDNA was synthesized and amplified for illumina sequencing. During sequencing, the converted sequences were aligned to a reference genome. The TSNs can be determined as the first base immediately after the 5’ adapter sequence. To quantify m6Am stoichiometry, we count the number of A (m6Am) and G (Am) bases at the TSN position.

(B) CROWN-seq enriches reads that contain the TSN. The relative coverage of reads mapped to the TSS and non-TSS regions across the m7G-ppp-Am-initiated RNA standard were calculated. The average relative coverage of reads that map to the TSS and to non-TSS positions are shown for three replicates. The 95% CI of the relative coverages are shown using error bars.

(C) CROWN-seq exhibits high quantitative accuracy for measuring m6Am stoichiometry. RNA standards (Table S2) were prepared with 0%, 25%, 50%, 75%, and 100% m6Am stoichiometry. To make m6Am standards in different m6Am levels, we generated both Am transcripts and m6Am transcripts by in vitro transcription with cap analogs m7G-ppp-Am and m7G-ppp-m6Am. Five transcripts were made in the Am and m6Am form and mixed to achieve the indicated m6Am stoichiometry. These transcripts have identical 5’ ends and different barcodes (Table S2). Linear least-squares regression was performed in calculating the correlation between expected non-conversion rates and the observed average non-conversion rates for each standard. All TSNs shown in this plot have high sequencing coverage, ranging from 656 to 21,545 reads.

(D) CROWN-seq results for SRSF1. CROWN-seq shows that 54.0% transcripts of SRSF1 initiate with A. Among the A-initiated transcripts, 93.4% were resistant to conversion (A’s, shown in green), and therefore m6Am. As a result, SRSF1 has 50.4% m6Am transcripts, 3.6% Am transcripts, and 46.0% non-A-initiated transcripts. Notably, a previously miCLIP study identified an internal m6A site40 which we found was m6Am at the TSN based on CROWN-seq.

(E) CROWN-seq results for JUN. CROWN-seq shows that ∼58% transcripts from JUN initiate with A. Unlike SRSF1 which A-TSNs are highly methylated, JUN A-TSNs are only ∼75% methylated. As a result, JUN has 43.5% m6Am transcripts, 14.5% Am transcripts, and 42% non-A-initiated transcripts.

(F) CROWN-seq identifies most m6Am sites identified in previous studies. 7,480 m6Am sites in HEK293T cells found either by miCLIP15, m6Am-seq12, or m6ACE-seq13 were analyzed. The high-confidence sites in CROWN-seq were defined as A-TSN with ≥20 unique mapped reads. Results shown are from HEK293T cells, which is the same cell line used in all previous studies. Among the 1,284 sites uniquely found in other studies, 811 sites are also mapped by CROWN-seq but at lower coverage (1-19 reads); 343 sites are mapped very far (>100 nt) away from any TSS annotations and thus can be considered as false positives; the remaining 130 sites mapped very closely to known TSSs may be false negative results in CROWN-seq.

(G) Many A-TSNs identified in CROWN-seq in HEK293T cells are not annotated. In this analysis, A-TSSs in (F) were intersected with the TSS annotation in Gencode v45. Only 12.2% A-TSSs found by CROWN-seq are previously annotated.

(H) CROWN-seq exhibits high accuracy in TSN discovery. In this analysis, we compared the non-conversion of A-TSNs between wild-type and PCIF1 knockout cells. For the 6,457 A-TSNs annotated by Gencode v45, most of them have high non-conversion rates in wild-type cells and very low non-conversion rates in PCIF1 knockout cells, indicating correct TSN mapping. Similar to the annotated TSNs, 25,435 newly found A-TSNs also found to have differential m6Am between wild-type and PCIF1 knockout. Thus, these newly found A-TSNs were also mostly true positives. In this analysis, only A-TSNs mapped by mapped by at least 20 reads in both wild-type and PCIF1 knockout HEK293T cells were used.

(I) The previously identified m6Am sites are biasedly in higher expression and higher m6Am stoichiometry. Shown are the sequencing coverage (left) and non-conversion rates (right) of different sets of m6Am sites in HEK293T CROWN-seq data. In total, 98,147 sites found by CROWN-seq, 2,129 sites found by miCLIP15, 3,693 sites found by m6ACE-seq13, and 1,610 sites found by m6Am-seq12 are shown.

(J) CROWN-seq has much higher sensitivity in m6Am discovery than all existing m6Am mapping methods. In this analysis, sensitivity is defined as m6Am/A-TSN found per million mapped reads. For CROWN-seq, sensitivity was defined as the slope of linear regression result between sequencing depth and A-TSN number among different samples in this study (see Figure S2G). For other methods, sensitivity was defined as the number of reported m6Am sites over the number of reads in all libraries required for m6Am identification.

CROWN-seq reveals m6Am landscape in mRNA.

(A) Boxplot showing the overall mRNA m6Am levels (i.e., m6Am stoichiometry) among different cell lines. The exact m6Am levels of the mRNA m6Am sites in this analysis can be found in Table S3. Only m6Am sites with ≥50 reads mapped in at least one cell line were analyzed.

(B) mRNA m6Am stoichiometry is positively correlated with PCIF1 expression. In this plot, PCIF1 expression was estimated by the number of reads mapped to PCIF1 TSSs. The read counts were normalized into transcription-start nucleotide per million (TPM) for gene expression comparison. Three cell lines (CCD841 CoN, HCT-116, and HT-29) whose PCIF1 expression was estimated by western blots and RT-qPCR by Wang et al. 5 are highlighted. Pearson’s r and p-value in this analysis were obtained by linear regression.

(C) Overall mRNA m6Am stoichiometry is not correlated with CTBP2 expression. Similar to (B), CTBP2 expression was estimated by CROWN-seq. Four cell lines with very low CTBP2 expression are highlighted.

(D) Some A-TSNs have relatively low and more variable m6Am stoichiometry among cell lines. In this plot, the variability of m6Am stoichiometry of a site, which is defined as the maximum m6Am subtracted by the minimum m6Am stoichiometry among all cell lines is shown on the X- axis; the average m6Am level of a site among all cell lines is shown on the Y-axis. Several example genes are indicated in different colors.

(E) Boxplots and dotplots showing the m6Am levels of different A-TSNs in JUN, ENO1, MYC, and ACTD. These genes contain A-TSNs with relatively low m6Am stoichiometry. In this plot, the exact m6Am levels of individual A-TSNs are shown in dots, while the median and IQR of the m6Am levels are shown in boxplots. Only m6Am sites with ≥50 reads mapped were analyzed.

(F) Gene Ontology enrichment (Cellular Components) of genes containing lowly methylated m6Am sites. The full Gene Ontology results can be found in Table S4.

(G) A-TSNs in histone genes tend to have relatively low m6Am stoichiometry. In this plot, histone genes are categorized by their genomic localizations. Histone gene cluster 6p22.1-2 and 1q21.1-2 are the two major histone gene clusters. For histone gene cluster 6p22.1-2, 55 to 173 A-TSNs are shown in different cell lines; for histone gene cluster 1q21.1-2, 9 to 14 A-TSNs are shown; for other histone genes, 24 to 109 A-TSNs are shown.

m6Am stoichiometry is related to core promoter sequence.

(A) m6Am stoichiometry is related to base composition in both upstream and downstream of A- TSNs. In these plots, 58,723 A-TSNs are grouped into twenty 5-percentile bins (X-axis). For each bin, the frequency of A, T, C, and G bases at each position relative to the A-TSN are plotted on the Y-axis. Among different positions in promoter region, C’s in −4, −1, and +3, as well as G’s in +2 are positively correlated with high m6Am; while A’s in +2 are negatively correlated with high m6Am. Results for other promoter positions can be found in Figure S4A.

(B) Motif analysis of A-TSNs with the lowest 5% m6Am stoichiometry (upper) and the A-TSNs with the highest 5% m6Am stoichiometry (lower). The core promoter region (−40 to +41) was screened for enriched motifs. The lowest 5% A-TSNs exhibited a VA+1RR TSS (V=A/C/G, R=A/G) motif, while the highest 5% A-TSNs exhibited a SSCA+1GC (S=C/G) motif. The sequence contexts for all A-TSNs are shown in Figure S3E.

(C) A-TSNs expressed from different core promoters exhibit different m6Am stoichiometry. Core promoters containing the VA+1RR motif produce transcripts with relatively low m6Am stoichiometry. Transcripts using the SSCA+1GC motif exhibited relatively high m6Am stoichiometry. In comparison, the m6Am stoichiometry in conventional A-TSNs from either BBCA+1BW or BA+1 are also shown and exhibit intermediate m6Am stoichiometry. In this analysis, 14,788, 7,981, 34,578 and 1,376 A-TSNs were used for each of the four motifs. P- values, Student’s t-test, two-sided.

(D) TATA-box containing core promoters exhibit relatively low m6Am stoichiometry. For this analysis, TATA-box is defined as TATAWAWR29. Because many TATA-boxes found in our A- TSN dataset are outside the classic −31 to −24 region, we extended the region for the TATA-box search to −36 to −19. Since histone genes preferentially contain TATA box, we separately plotted TATA-box-containing histone genes (N=155) and TATA-box-containing non-histone genes (N=28). 58,540 A-TSNs without TATA-box are also shown. P-values, Student’s t-test, two-sided.

PCIF1 knockout leads to m6Am and TSS motif-related A-TSN expression changes.

(A) m6Am stoichiometry is positively related to A-TSN expression in wild-type HEK293T cells. In this cumulative distribution plot, the expression of each A-TSN was quantified by ReCappable-seq, for all A-TSNs in each indicated m6Am stoichiometry bin. A-TSNs (n=58,723) were grouped into five bins based on m6Am stoichiometry quantified by CROWN-seq. In total, 5,125, 6,962, 7,991, 8,368, and 8,009 A-TSNs are shown in each bin (from low m6Am to high m6Am). These A-TSNs have average TPM ≥1 in two ReCappable-seq replicates and coverage ≥50 in CROWN-seq. P-values, Student’s t-test for TPM (log-transformed), two-sided.

(B) The expression level of high m6Am stoichiometry A-TSNs is reduced in PCIF1 knockout. Shown is a cumulative distribution plot of A-TSN expression change in HEK293T cells upon PCIF1 knockout. The differential expression of A-TSN were calculated by DESeq241. Similar to (A), the A-TSNs were binned based on the m6Am stoichiometry. In total, 3,269, 2,272, 3,218, 3,813, and 3,369 A-TSNs are shown in each bin (from low m6Am to high m6Am). A-TSNs with a baseMean (i.e., the average of the normalized count among replicates) ≥100 were used in the differential expression test (two replicates were used for both wild-type and PCIF1 knockout) and coverage ≥50 reads in CROWN-seq. P-values, Student’s t-test, two-sided.

(C) Shown are cumulative distribution plots of expression changes of A-TSNs and G-TSNs after PCIF1 depletion. 14,516 A-TSNs and 9,667 G-TSN with expression levels quantified by ReCappable-seq are shown. These A-TSNs and G-TSNs have baseMean ≥100 during differential expression test. P-values, Student’s t-test, two-sided.

(D) Similar to (B), A-TSNs that use different TSS motifs exhibit different changes in expression upon PCIF1 knockout. In total, 352 A-TSNs using SSCA+1GC, 7,928 A-TSNs using BA+1, 2,958 A-TSNs using BBCA+1BW, and 2,760 A-TSNs using VA+1RR are shown. These A-TSNs have baseMean≥100 during differential expression test (2 replicates for both wild-type and PCIF1 knockout) and coverage ≥50 reads in CROWN-seq. P-values, Student’s t-test, two-sided.

CROWN-seq reveals m6Am landscape in snRNA and snoRNA.

(A) Heatmaps showing m6Am stoichiometry in different snRNA gene families and isoforms. The name of each snRNA isoform are shown on the right. A-TSNs already annotated in Gencode v45 are highlighted in bold. For newly found A-TSNs, the relative distance between the new A- TSN and the nearest annotated A-TSN are showed in brackets. Note that Gencode v45 contains snRNA annotation from different databases. For example, RNU1-4 and U1.22 are both U1 snRNA, however, RNU1-4 is from HGNC database and U1.22 is from RFAM database.

(B) Similar to (A), Heatmaps showing the m6Am stoichiometry in different snoRNA isoforms.

(C) and (D), snRNA methylation levels are not well correlated with PCIF1 expression, but negatively correlated with FTO expression. The RNA expression levels of PCIF1 and FTO were estimated by read counts in CROWN-seq, which were converted into TPM to normalize the sequencing depth. Linear regressions were performed to obtain Pearson’s r and p-value of the correlations.

(E) FTO depletion leads to increased m6Am level (i.e., m6Am stoichiometry) in many kinds of snRNA and snoRNA. In this plot, the difference in m6Am levels between wild-type and FTO knockout cells are shown in the first row. The exact m6Am levels in FTO knockout and wild-type cells are shown in the second and third rows. Different kinds of snRNA and snoRNA are shown in different colors.

(F) FTO depletion leads to increased m6Am stoichiometry in snRNA and snoRNA pseudogenes. In this plot, shown are the annotated pseudogenes of U1, U2, U4, U5, and 7SK, as well as the newly identified snRNA/snoRNA pseudogenes in intronic and intergenic regions. Several mRNA exhibited 5’ ends resembling snRNA pseudogenes. However, these snRNA-like mRNA 5’ ends showed high and stable m6Am stoichiometry in both wild-type and FTO knockout cells.