Figures and data

The Eμ enhancer regulates the chromatin landscape in the intron but not in the V region in Ramos human B cells.
(A) ChIP-seq profiles of H3K27ac, H3K4me3 and H3K36me3 at the IGH locus from AID−/− and two independent clones (c1 and c2) of AID−/− Eμ−/− Ramos cells, the latter generated as described in Fig. S1C-D. The sequence annotations indicate the positions of the IGH promoter (Prom), the complementarity-determining regions (CDR1-3), the intronic enhancer (Eμ) and the switch μ region (Sμ). The bioinformatic approach to include multimapping reads (explained in the text and depicted in Fig. 3A) was applied. (B) A magnified view of the locus from A. (C) ChIP-qPCR analysis at IGH from AID−/− and AID−/− Eμ−/− Ramos cells (c1, c2) to measure the relative levels of H3K27ac, H3K4me3, H3K79me3, pan-acetylation of histones H3 and H4, and histone H3. The amplicons used for PCR are indicated in the schematic diagram at the top. The active B2M gene is used as a positive control and a gene desert segment on chromosome 1 is used as a negative control (Neg.). The data represent three independent experiments. Asterisks indicate P < 0.05 using the Student’s t-test and ns indicates not significant (P > 0.05).

Ablation of the Eμ enhancer significantly decreases nascent transcription but not SHM.
(A) RT-qPCR measurements of nascent transcripts at the Ramos V region in AID−/− and AID−/− Eμ−/− cells (c1, c2). To account for potential clonal variation, Ramos cells were spiked with Drosophila S2 cells prior to RNA extraction. The data were normalized to the levels of the Drosophila housekeeping gene, Act5c. GAPDH mRNA was used as a control. Asterisks indicate P < 0.05 using the Student’s t-test and ns indicates not significant (P > 0.05). (B) RT-qPCR analysis as in A measuring the spliced IGH mRNA. (C) Table of mutation frequencies at the Ramos V region and the JH6 intron (amplicons shown in the diagram above the table) in AID−/− and AID−/− Eμ−/− cells following infection with AID(JP8Bdel) for 7 days. Statistical analysis was performed with the Student’s t-test. (D) Flow cytometry analysis of IgM expression in AID−/− and AID−/− Eμ−/− Ramos cells infected with AID(JP8Bdel) for 7 days. (E) Bar graph summarizing the flow cytometry analyses from D, showing the percent of IgM loss from three independent experiments.

Comparison of transcriptional and mutational landscapes at the endogenous Ramos V region.
(A) Scheme depicting the strategy used to align multimapping reads to the V region. As described in the text and Methods, since upstream, non-recombined V genes are silent in Ramos cells, a read that maps to the recombined V region is retained if it maps to any V, D or J segment of the human IGH locus but nowhere else in the human genome. The same principle is applied when mapping reads from murine V regions to the mouse genome. (B) Integrative Genomics Viewer (IGV) browser snapshot of nascent RNA 3’ ends (by PRO-seq) aligned to the Ramos V region. Multimapping reads (top track) are separated from uniquely mapping reads (middle track), and these two tracks are subsequently combined to generate the total profile (bottom track). (C) PRO-cap and PRO-seq 5’ and 3’ end densities, respectively, at the IGH locus along with mutation frequencies at the V region displayed on the IGV browser. Mapping of the V region transcriptome was done via the bioinformatic pipeline outlined in A and exemplified in B (total profiles are shown). The locations of the antigen-binding complementarity-determining regions (CDR1-3) are highlighted. Mutation analysis via MutPE-seq was performed following infection of AID−/− cells with AID(JP8Bdel)-expressing lentiviruses for 7 days. (D) A magnified view of the V region from C above. The stalling zones are shown in the panel between the PRO-seq and MutPE-seq tracks. (E) Details of the mutation patterns at the Ramos V region. Mutated cytidines in AID hotspot motifs (WRCH) are displayed as red bars. All other C:G mutations are shown as black bars and A:T mutations as grey bars. The panel under the graph shows the positions of both hotspot (WRCH in black, AGCT in red, upper panel) and coldspot (SYC, bottom panel) motifs. (F) Waterfall plot with mutations ordered from highest (left) to lowest (right) frequency following the color code described in E.
(G) Mutation frequency bar plot showing the percentages (indicated within the bars) of the three mutation classes following the color code described in E. (H) Bar graph indicating the percentage of each type of mutation indicated on the X axis. The C to T and G to A transition mutations are the signature of AID activity.
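
The read-retention rule in panel A can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Interval` type, the function names, the use of simple overlap as the criterion for "maps to", and all coordinates are assumptions made for illustration only.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base (assumed criterion)."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def keep_read(alignments: Iterable[Interval],
              recombined_v: Interval,
              igh_segments: List[Interval]) -> bool:
    """Retain a read if it maps to the recombined V region and every one of
    its alignments falls on the recombined V region or on an annotated
    V, D or J segment, i.e. nowhere else in the genome."""
    alignments = list(alignments)
    if not any(overlaps(a, recombined_v) for a in alignments):
        return False
    allowed = [recombined_v, *igh_segments]
    return all(any(overlaps(a, seg) for seg in allowed) for a in alignments)


# Hypothetical coordinates, for illustration only.
v_region = Interval("chr14", 1000, 1500)
segments = [Interval("chr14", 5000, 5300), Interval("chr14", 9000, 9300)]

unique_read = [Interval("chr14", 1100, 1150)]
multi_within_igh = [Interval("chr14", 1100, 1150), Interval("chr14", 5100, 5150)]
multi_elsewhere = [Interval("chr14", 1100, 1150), Interval("chr2", 100, 150)]
```

Under these assumptions, `unique_read` and `multi_within_igh` are retained, whereas `multi_elsewhere` is discarded because one of its alignments lies outside the annotated segments.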

Comparison of transcriptional and mutational landscapes at two different human V regions expressed from the Ramos IGH locus.
(A) IGV browser snapshots of nascent RNA 5’ (PRO-cap) and 3’ (PRO-seq) ends and mutation tracks (MutPE-seq) at the VH4-59-DH2-JH6 V region expressed from the Ramos VH4-34 promoter (CDRs 1-3 highlighted). Mutation analysis was performed following infection of AID−/− cells with AID(JP8Bdel)-expressing lentiviruses for 7 days. (B) A magnified view of the VH4-59-DH2-JH6 V region from A above. The stalling zones are shown in the panel between the PRO-seq and MutPE-seq tracks. (C) Detailed mutational analysis of the VH4-59-DH2-JH6 V region displayed and color-coded as in Fig. 3E. A bar plot, as in Fig. 3G, with the percentage of mutation frequencies is shown on the right. (D) Nascent transcriptional and mutational profiles of the VH3-30-DH2-JH6 V region as in A. (E) A magnified view of the VH3-30-DH2-JH6 V region from D above. The stalling zones are shown in the panel between the PRO-seq and MutPE-seq tracks. (F) Detailed mutational analysis of the VH3-30-DH2-JH6 V region (see C above for details).

Transcriptional and mutational landscapes of the murine B1-8hi V region and non-Ig AID target genes in mice.
(A) Nascent transcriptional analysis at the Igh locus in murine B1-8hi primary, splenic B cells stimulated for four days with LPS, IL4 and RP105. IGV browser snapshots show the 5’ (PRO-cap) and 3’ (PRO-seq) ends of the aligned reads. For MutPE-seq, IghB1–8hi/ B1–8hi Rosa26AIDER/AIDER primary B cells were activated with LPS, IL4 and RP105 for 4 days with 4-HT. PRO-seq, PRO-cap and MutPE-seq were also performed from sorted, splenic germinal center B cells (GCBs) following immunization with sheep red blood cells for 10 days. (B) Analysis of SHM at the B1-8hi V region from B1-8hi Rosa26AIDER primary, activated murine splenocytes. The bar graph on the right shows the percentage of each indicated mutation category (see Fig. 3E and 3G for details). (C) Analysis of SHM at the B1-8hi V region from splenic GCBs following immunization with sheep red blood cells for 10 days. (D) 5’ and 3’ ends of nascent RNA (PRO-cap, PRO-seq) and mutation profiles at four selected AID target genes. Mutational data are from Alvarez-Prado et al. (2018) wherein the first 500 bp of the genes were sequenced. The region displayed extends from –100 bp from the annotated (RefSeq) TSS up to 50 bp downstream of the sequenced amplicon. The WRCH motifs are indicated as red dots.

Analysis of gene regulatory context on the nascent transcriptional landscape and SHM of the murine B1-8hi V region.
(A) Scheme showing the exchange of the endogenous Ramos V region for the murine B1-8hi V region to generate the RamosB1–8hi human IGH locus following the approach described in Fig. S2A. (B) 3’ ends of nascent RNAs by PRO-seq at the RamosB1–8hi IGH locus showing the distribution of multimapping, unique and total signal. (C) Nascent RNA 5’ and 3’ ends of PRO-cap and PRO-seq respectively and MutPE-seq analysis at the RamosB1–8hi IGH locus. MutPE-seq was performed following infection with AID(JP8Bdel)-expressing lentiviruses for 7 days. The stalling zones are shown in the panel between the PRO-seq and MutPE-seq tracks. (D) Details of the mutation patterns of the B1-8hi V region expressed from the Ramos IGH locus obtained by MutPE-seq. The bar graphs on the right show the percentage of each indicated mutation category (see Fig. 3E and 3G for details). For comparison, the murine B1-8hi mutation profile from primary murine B cells (middle panel) and murine GCBs (bottom panel), exactly as in Fig. 5B-C, are included.

Integrative model for co-transcriptional AID targeting and differential motif mutability.
(A) 3D structure of the human RNA Pol II elongation complex visualized with ChimeraX. We superimposed, via their RPB1 subunits, the elongation complex structure (PDB ID 6GMH) onto the transcribing Pol II-DSIF (a complex of SPT5 and SPT4) structure (PDB ID 5OIK). Proteins are shown as surfaces (Pol II, grey; DSIF, salmon; SPT6, brown; PAF complex, yellow) and nucleic acids as cartoons (DNA template strand, blue; DNA non-template strand, cyan; RNA, red). The right panel highlights the trajectory of DNA and RNA buried within Pol II. The exposed non-template strand of the transcription bubble is occluded from interactions with AID due to it being completely covered by SPT5. (B) Proposed model for SHM that incorporates various known aspects of SHM biology. The cartoon diagram of the Pol II elongation complex reflects the actual structure described in (A) above and in the main text. AID is recruited to the Pol II complex via super-enhancers in the context of a transcriptional hub (step 1) and is then retained via interactions with elongation factors, SPT5, PAF and SPT6 (step 2). The next step is the availability of ssDNA (step 3). We support the idea that premature transcription termination may be an important source of ssDNA 48. This event is not associated with stalling but occurs randomly within the first 2-3 kb from the TSSs 106. A slower-moving terminating Pol II may provide a more stable source of upstream, negative supercoiled DNA favoring higher AID activity. Upon dissociation of Pol II, the non-template strand is likely available as ssDNA for some time since the DNA:RNA hybrid would prevent immediate reannealing. The RNA is removed by RNA helicases and RNaseH followed by degradation of the RNA by the RNA exosome, which is known to associate with AID and provide access to the template strand. In this manner, AID can access ssDNA through various means all in the context of termination.

ChIP-qPCR analysis of the Ramos V region locale and generation of Ramos Eμ−/− cells.
(A) ChIP-qPCR analysis from three independent experiments in AID−/− Ramos cells for the indicated epigenetic marks as well as histone H3. Amplicons (1-7) used are indicated below the locus diagram shown above the graphs. The Neg. amplicon corresponds to a gene desert on chromosome 1 and is used as a negative control. (B) ChIP-qPCR analysis from five independent experiments in B1-8hi primary splenic cells for the indicated epigenetic marks as well as histone H3. Amplicons (1-6) used are indicated below the locus diagram shown above the graphs. The Neg. (negative region) amplicon corresponds to a gene desert on chromosome 1 and is used as a negative control. (C) Strategy to create the Eμ−/− AID−/− Ramos lines. The deleted region (583 bp) corresponds to the peak of accessible chromatin detected by ATAC-seq (top panel). This segment was replaced with a floxed reporter cassette expressing GFP or mCherry using CRISPR. Single clones double-positive for GFP and mCherry expression were isolated, followed by excision of the floxed cassette by Cre recombinase. Two clones, c1 and c2, were used for all experiments. (D) Genotyping PCR analysis to confirm the loss of Eμ in AID−/− Eμ−/− clones c1 and c2. The location of primers is shown in the diagram above the gel image. (E) Surface IgM expression in Eμ−/− AID−/− Ramos clones relative to the parental AID−/− line determined by FACS.

MutPE-seq of the Ramos V region following infection of AID−/− cells with AID (m7.3).
(A) Mutation profile of the Ramos V region generated by AID (m7.3) following 21 days of infection. The type of mutation and the location of hotspots are indicated and can be compared directly with Fig. 3E. (B-D) Waterfall plot of mutations ordered by frequency (B), bar plot of mutation frequencies in the three indicated classes (C) and bar graph of mutation type (D), as explained in the legend of Fig. 3F-H.
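
The hotspot and coldspot motifs named in Fig. 3E can be located with a short sketch. It assumes the standard IUPAC codes (W = A/T, R = A/G, H = A/C/T, S = C/G, Y = C/T) and that the targeted cytidine is the third base of WRCH and SYC. Motifs are searched on both strands so that a G on the top strand is reported when its paired C sits in a motif on the bottom strand. The function names are hypothetical and this is not the authors' code.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Lookahead patterns so that overlapping motif occurrences are all found.
HOTSPOT_WRCH = re.compile(r"(?=[AT][AG]C[ACT])")  # targeted C at offset 2
COLDSPOT_SYC = re.compile(r"(?=[CG][CT]C)")       # targeted C at offset 2


def revcomp(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def motif_positions(seq: str, pattern: "re.Pattern[str]") -> set:
    """0-based top-strand positions of C:G pairs whose C, on either strand,
    occupies the third position of the motif."""
    seq = seq.upper()
    n = len(seq)
    top = {m.start() + 2 for m in pattern.finditer(seq)}
    # Convert bottom-strand coordinates back to top-strand coordinates.
    bottom = {n - 1 - (m.start() + 2) for m in pattern.finditer(revcomp(seq))}
    return top | bottom


agct_sites = motif_positions("AGCT", HOTSPOT_WRCH)
cold_sites = motif_positions("GCC", COLDSPOT_SYC)
```

AGCT is palindromic, so both its C (top strand) and its G (whose paired C lies in AGCT on the bottom strand) are reported as hotspot positions.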

Relationship between SHM, Pol II stalling and nascent transcription at different V regions.
(A) Scheme for the analysis of mutation frequency between stalling sites and stalling zones within V regions. (B) Box plots comparing mutation frequency between stalling and non-stalling zones in the Ramos endogenous (WT) V region, VH4-59, VH3-30 and RamosB1–8hi. Statistical analyses were performed with a Wilcoxon rank sum test and the results are indicated within the plots. (C) Categorization of unmutated and mutated C:G pairs from MutPE-seq data, as described in the text. (D) Box plot showing the PRO-seq density in 7 bp windows centered at each C:G residue in the unmutated group and the ten mutated sub-groups, as described in A. The V gene analyzed is indicated at the top of each plot. The medians are indicated by black lines inside the boxes. A Wilcoxon rank sum test after multiple testing correction using the Bonferroni method revealed no statistical significance (defined as P < 0.05) between any pair of groups within any of the box plots. (E) Workflow for the classification of mutated C:G neighborhoods into 7 bp windows (C:G ±3 bp) based on their PRO-seq levels, as described in the text. (F) Box plots showing the mutation frequencies of the central C:G residue in the 7 bp windows (C:G ±3 bp) within all ten PRO-seq sub-groups. The V gene analyzed is indicated at the top of each plot. The medians are indicated by black lines inside the boxes. A Wilcoxon rank sum test with the Bonferroni correction revealed no statistical significance (defined as P < 0.05) between any pair of groups within any of the box plots.
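
The 7 bp window (C:G ±3 bp) and the Bonferroni correction mentioned in panels D to F can be sketched as follows. This is a hedged illustration, not the authors' analysis code: "density" is taken here to be the sum of per-base PRO-seq 3’ end counts, windows are truncated at the sequence edges, and the Wilcoxon rank sum test itself is not implemented (the raw P values are assumed to come from elsewhere). All names and numbers are hypothetical.

```python
from typing import List, Sequence


def cg_positions(seq: str) -> List[int]:
    """0-based positions of all C or G bases (C:G pairs) in the sequence."""
    return [i for i, base in enumerate(seq.upper()) if base in "CG"]


def window_density(signal: Sequence[float], center: int, flank: int = 3) -> float:
    """Summed signal in the window center ± flank (7 bp when flank is 3).
    The window is truncated at the ends of the sequence (an assumption)."""
    lo = max(0, center - flank)
    hi = min(len(signal), center + flank + 1)
    return sum(signal[lo:hi])


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted P values: each raw P value is multiplied by the
    number of tests and capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical per-base PRO-seq counts along a 10 bp stretch.
proseq = [0, 1, 0, 0, 5, 0, 0, 0, 2, 0]
densities = [window_density(proseq, i) for i in cg_positions("ATTACGTTAG")]
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With ten sub-groups compared pairwise, the number of tests grows quickly, which is why an adjusted threshold can leave no pair significant even when some raw P values are below 0.05.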

PRO-seq profiling of the Ramos V region in AID−/− Eμ−/− cells.
(A) Analysis of nascent transcription 3’ ends (PRO-seq) at the IGH locus in AID−/− and AID−/− Eμ−/− Ramos cells (c1, c2). The promoter (Prom.), complementarity-determining regions (CDR1-3), intronic enhancer (Eμ) and switch μ region (Sμ) are indicated. The lower panel shows a magnified view of the V region locale. (B) Quantification of PRO-seq read counts from the data in A. Signals were quantified separately on the sense and antisense strands and further divided into regions upstream and downstream of the deleted Eμ segment. The upstream portion extends from the 5’ end of the V region to the 5’ end of the deleted Eμ segment. The downstream portion extends from the 3’ end of the deleted Eμ segment to the 5’ end of Sμ.
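
The region and strand bookkeeping described in panel B can be sketched as below. The sketch assumes that read 3’ end positions are already expressed in transcript-oriented coordinates (increasing in the direction of sense transcription) and that each read carries a "sense" or "antisense" label. The function name, argument names and coordinates are hypothetical, not taken from the study.

```python
from collections import Counter
from typing import Iterable, Tuple


def quantify_regions(ends: Iterable[Tuple[int, str]],
                     v_start: int,
                     emu_start: int,
                     emu_end: int,
                     smu_start: int) -> Counter:
    """Count read 3' ends per (region, strand).

    upstream:   from the 5' end of the V region to the 5' end of the deleted
                Emu segment, i.e. [v_start, emu_start)
    downstream: from the 3' end of the deleted Emu segment to the 5' end of
                Smu, i.e. [emu_end, smu_start)
    Reads inside the deleted segment or outside both regions are ignored.
    """
    counts: Counter = Counter()
    for pos, strand in ends:
        if v_start <= pos < emu_start:
            region = "upstream"
        elif emu_end <= pos < smu_start:
            region = "downstream"
        else:
            continue
        counts[(region, strand)] += 1
    return counts


# Hypothetical 3' end positions and strands.
reads = [(120, "sense"), (130, "antisense"), (450, "sense"),
         (700, "sense"), (50, "sense")]
region_counts = quantify_regions(reads, v_start=100, emu_start=400,
                                 emu_end=500, smu_start=900)
```

Excluding the deleted segment from both regions keeps the comparison between AID−/− and AID−/− Eμ−/− cells restricted to sequence that is present in both genotypes.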

Generation of Ramos cell lines expressing exogenous V regions.
(A) Schematic representation of the workflow for generating Ramos cells expressing new V regions using CRISPR-based editing. An IgM-negative line was made by excising the endogenous V region in AID−/− cells (AID−/−ΔV) and replacing it with a unique small guide RNA (sgRNA)-targeting sequence (green). Subsequently, an sgRNA targeting this site is combined with Cas9 and homology repair templates harboring any new V regions. Correct integration leads to restoration of surface IgM expression, which is used as a readout to identify correctly targeted clones. (B) Flow cytometry analysis of surface IgM expression in Ramos cell clones expressing the indicated human and mouse V regions. Shown are three clones of human VH4-59-DH2-JH6 and human VH3-30-DH2-JH6 expressing Ramos cells, and two clones of B1-8hi expressing Ramos cells. (C) PRO-seq analysis showing the 3’ ends of aligned reads at the human VH4-59-DH2-JH6 V region expressed from the VH4-34 promoter at the human IGH in Ramos cells. Tracks of multimapping, uniquely mapping and total reads are shown (see also Fig. 3A and 3B for a detailed description). (D) PRO-seq analysis as in C of the human VH3-30-DH2-JH6 V region expressed from the VH4-34 promoter at the human IGH in Ramos cells. (E) PRO-seq analysis as in C at the murine B1-8hi V region at the Igh locus in mice.

Analysis of PRO-cap, PRO-seq and SHM at non-Ig AID target genes in murine GCBs.
(A) 5’ and 3’ ends of nascent RNA (PRO-cap, PRO-seq) and mutation profiles (from Alvarez-Prado et al. 17) at selected AID target genes as in Fig. 5D-G. Shown are genes where highly mutated residues lie within 150 bp of the transcription initiation site defined by the peak of PRO-cap signals. The WRCH motifs are indicated as red dots. (B) As in A but showing genes where highly mutated residues lie near the transcription initiation site defined by the peak of PRO-cap signals.

Additional examples of PRO-cap, PRO-seq and SHM at non-Ig AID target genes in murine GCBs.
See Fig. 5D for detailed legend.

Relationship between SHM and transcriptional strength at non-Ig AID target genes.
(A) Workflow for the classification of mutated and unmutated genes. (B) Box plot showing the PRO-seq density in 7 bp windows centered at each C:G residue in the three groups and sub-groups described in A. The medians are indicated by black lines inside the boxes. The asterisk indicates P < 0.05 based on the Wilcoxon rank sum test after multiple testing correction using the Bonferroni method. (C) Scheme for the classification of mutated C:G neighborhoods based on their PRO-seq density. (D) Box plots showing the mutation frequencies of the central C:G residue in the 7 bp windows (C:G ±3 bp) within all ten PRO-seq sub-groups. The medians are indicated by black lines inside the boxes. The asterisk indicates P < 0.05 based on the Wilcoxon rank sum test after multiple testing correction using the Bonferroni method.

List of all primers, oligos, antibodies and sgRNAs used in this study

Statistical analysis of mutation frequencies and PRO-seq densities in different V regions.
The upper and lower tables show the P values for the analyses in Fig. S3D and S3F, respectively. P values are based on the Wilcoxon rank sum test following multiple testing correction with the Bonferroni method.