CROWN-seq correctly maps and quantifies m6Am.
(A) Schematic of CROWN-seq. RNAs are first treated with sodium nitrite, which converts Am at the transcription-start nucleotide (TSN) to Im. To isolate the TSN, m7G caps are replaced with 3’-desthiobiotin (DTB) caps. The DTB-capped RNAs are enriched on streptavidin beads, while uncapped background RNA fragments are washed away. After washing, the enriched pool of transcript 5’ ends is released from the beads by cleaving the triphosphate bridge, leaving 5’ monophosphate ends that are ligated to an adapter. After adapter ligation, cDNA is synthesized and amplified for Illumina sequencing. After sequencing, the converted sequences are aligned to a reference genome. The TSN is determined as the first base immediately after the 5’ adapter sequence. To quantify m6Am stoichiometry, we count the number of A (m6Am) and G (Am) bases at the TSN position, because Im is read as G during sequencing.
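The A/G counting step in (A) can be sketched as follows. This is a minimal illustration, assuming stoichiometry is the fraction of A reads among A plus G reads at a TSN. The function name and the example counts are hypothetical and are not taken from the CROWN-seq pipeline.

```python
def m6am_stoichiometry(a_reads: int, g_reads: int) -> float:
    """Fraction of A-initiated reads at one TSN that resisted conversion.

    a_reads: reads with A at the TSN (unconverted, interpreted as m6Am)
    g_reads: reads with G at the TSN (converted, interpreted as Am)
    """
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("no A or G reads at this TSN")
    return a_reads / total


# Hypothetical counts, for illustration only.
print(m6am_stoichiometry(75, 25))  # 0.75
```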
(B) CROWN-seq enriches reads that contain the TSN. The relative coverage of reads mapped to the TSS and to non-TSS regions of the m7G-ppp-Am-initiated RNA standard was calculated. The average relative coverage at the TSS and at non-TSS positions is shown for three replicates. Error bars show the 95% CI of the relative coverage.
(C) CROWN-seq exhibits high quantitative accuracy for measuring m6Am stoichiometry. RNA standards (Table S2) were prepared with 0%, 25%, 50%, 75%, and 100% m6Am stoichiometry. To make standards at different m6Am levels, we generated both Am transcripts and m6Am transcripts by in vitro transcription with the cap analogs m7G-ppp-Am and m7G-ppp-m6Am. Five transcripts were made in the Am and m6Am forms and mixed to achieve the indicated m6Am stoichiometry. These transcripts have identical 5’ ends and different barcodes (Table S2). Linear least-squares regression was used to calculate the correlation between the expected non-conversion rates and the observed average non-conversion rates for each standard. All TSNs shown in this plot have high sequencing coverage, ranging from 656 to 21,545 reads.
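The regression in (C) can be illustrated with an ordinary least-squares fit of observed against expected non-conversion rates. This is a sketch, not the authors' analysis code. The function name is hypothetical, and the observed values are placeholders chosen for illustration, not measurements from the study.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Expected stoichiometries of the five standards (from the legend).
expected = [0.0, 0.25, 0.50, 0.75, 1.0]
# Placeholder observed non-conversion rates, NOT data from the study.
observed = [0.02, 0.26, 0.51, 0.74, 0.98]

slope, intercept, r = linear_fit(expected, observed)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f}")
```

A slope near 1 and an intercept near 0 would indicate that observed non-conversion tracks the expected m6Am stoichiometry.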
(D) CROWN-seq results for SRSF1. CROWN-seq shows that 54.0% of SRSF1 transcripts initiate with A. Among the A-initiated transcripts, 93.4% were resistant to conversion (A’s, shown in green) and are therefore m6Am. As a result, SRSF1 has 50.4% m6Am transcripts, 3.6% Am transcripts, and 46.0% non-A-initiated transcripts. Notably, a previous miCLIP study identified an internal m6A site40, which CROWN-seq shows to be m6Am at the TSN.
(E) CROWN-seq results for JUN. CROWN-seq shows that ∼58% of JUN transcripts initiate with A. Unlike SRSF1, whose A-TSNs are highly methylated, JUN A-TSNs are only ∼75% methylated. As a result, JUN has 43.5% m6Am transcripts, 14.5% Am transcripts, and 42% non-A-initiated transcripts.
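The transcript compositions reported in (D) and (E) follow from two quantities: the fraction of transcripts that initiate with A, and the non-conversion rate among those A-initiated transcripts. A minimal sketch of that arithmetic is shown here. The function name is hypothetical, and the inputs are the percentages quoted in the legend.

```python
def tsn_composition(frac_a_initiated: float, non_conversion: float):
    """Split a gene's transcripts into m6Am, Am, and non-A-initiated fractions."""
    m6am = frac_a_initiated * non_conversion  # A-initiated and unconverted
    am = frac_a_initiated - m6am              # A-initiated and converted
    non_a = 1.0 - frac_a_initiated            # initiates with C, G, or U
    return m6am, am, non_a


for gene, frac_a, nonconv in [("SRSF1", 0.540, 0.934), ("JUN", 0.58, 0.75)]:
    m6am, am, non_a = tsn_composition(frac_a, nonconv)
    print(f"{gene}: m6Am={m6am:.1%} Am={am:.1%} non-A={non_a:.1%}")
```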
(F) CROWN-seq identifies most m6Am sites identified in previous studies. A total of 7,480 m6Am sites in HEK293T cells found by miCLIP15, m6Am-seq12, or m6ACE-seq13 were analyzed. High-confidence sites in CROWN-seq were defined as A-TSNs with ≥20 unique mapped reads. Results shown are from HEK293T cells, the same cell line used in all previous studies. Among the 1,284 sites found only in other studies, 811 sites are also mapped by CROWN-seq but at lower coverage (1-19 reads); 343 sites map very far (>100 nt) from any annotated TSS and can thus be considered false positives; the remaining 130 sites, which map very close to known TSSs, may be false negatives in CROWN-seq.
(G) Many A-TSNs identified by CROWN-seq in HEK293T cells are not annotated. In this analysis, the A-TSNs in (F) were intersected with the TSS annotation in Gencode v45. Only 12.2% of the A-TSNs found by CROWN-seq were previously annotated.
(H) CROWN-seq exhibits high accuracy in TSN discovery. In this analysis, we compared the non-conversion rates of A-TSNs between wild-type and PCIF1 knockout cells. Most of the 6,457 A-TSNs annotated by Gencode v45 have high non-conversion rates in wild-type cells and very low non-conversion rates in PCIF1 knockout cells, indicating correct TSN mapping. Similar to the annotated TSNs, the 25,435 newly found A-TSNs also show differential m6Am between wild-type and PCIF1 knockout cells. Thus, these newly found A-TSNs are also mostly true positives. Only A-TSNs mapped by at least 20 reads in both wild-type and PCIF1 knockout HEK293T cells were used in this analysis.
(I) Previously identified m6Am sites are biased toward higher expression and higher m6Am stoichiometry. Shown are the sequencing coverage (left) and non-conversion rates (right) of different sets of m6Am sites in HEK293T CROWN-seq data. In total, 98,147 sites found by CROWN-seq, 2,129 sites found by miCLIP15, 3,693 sites found by m6ACE-seq13, and 1,610 sites found by m6Am-seq12 are shown.
(J) CROWN-seq has much higher sensitivity in m6Am discovery than all existing m6Am mapping methods. In this analysis, sensitivity is defined as the number of m6Am sites or A-TSNs found per million mapped reads. For CROWN-seq, sensitivity was defined as the slope of the linear regression between sequencing depth and A-TSN number across the samples in this study (see Figure S2G). For other methods, sensitivity was defined as the number of reported m6Am sites divided by the number of reads in all libraries required for m6Am identification.
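The two sensitivity definitions in (J) can be sketched as follows. This is an illustration under stated assumptions: the regression is taken to be ordinary least squares with an intercept (the legend does not specify whether the fit passes through the origin), depth is expressed in millions of mapped reads, and the function names and all numbers are hypothetical.

```python
def sites_per_million_reads(n_sites: int, n_reads: int) -> float:
    """Sensitivity for a published method: reported sites per million reads."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return n_sites / (n_reads / 1e6)


def depth_slope(depth_millions, site_counts) -> float:
    """Sensitivity from several samples: least-squares slope of site count vs. depth."""
    n = len(depth_millions)
    if n != len(site_counts) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(depth_millions) / n
    mean_y = sum(site_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in depth_millions)
    if sxx == 0:
        raise ValueError("sequencing depths must differ between samples")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(depth_millions, site_counts))
    return sxy / sxx


# Hypothetical values, for illustration only.
print(sites_per_million_reads(2000, 4_000_000))  # 500.0
print(depth_slope([1.0, 2.0, 3.0], [100, 200, 300]))
```

Both functions return sites per million mapped reads, so the resulting values are directly comparable across methods.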