Introduction

The vast majority of Cas systems explored as genome editors originate from mesophilic hosts. The emergence of the thermophilic GeoCas9, with DNA cleavage function up to 85 °C, can expand CRISPR technology to higher temperature regimes and stabilities, but similarities and differences in its mechanism relative to other Cas9s must be established. Both the widely studied SpCas9 and GeoCas9 are Type-II CRISPR systems, though only the Type II-C class to which GeoCas9 belongs has been rigorously validated for mammalian genome editing, reinforcing the need to better understand this CRISPR class.1 The similar domain arrangements of GeoCas9 and SpCas9 led us to initially speculate that these could share atomic level mechanistic similarities.2 GeoCas9 utilizes a guide RNA (gRNA) to recognize, unwind, and cleave a double-stranded DNA target after recognition of its 5’-NNNNCRAA-3’ protospacer adjacent motif (PAM),3,4 and both of the GeoCas9 nuclease active sites are spatially distinct from the PAM recognition site, necessitating structural and dynamic changes that allosterically propagate DNA binding information to these subdomains. Our prior work revealed a divergence in the timescales of allosteric motions in the SpCas9 and GeoCas9 HNH domains.2,5 It is also possible that docking of the gRNA with GeoCas9, and therefore its interaction with the RNA:DNA hybrid, may differ from the SpCas9 system, as GeoCas9 contains a truncated recognition (Rec) lobe with only two of the three major subdomains.

The high thermal stability and more compact size of GeoCas9 (it is 281 residues shorter than SpCas9) can be especially important for in vivo delivery applications, since promising viral vectors (e.g. adeno-associated virus, AAV) have cargo capacities of ∼4.7 kb,6 which prevents SpCas9-gRNA packaging into a single AAV vector but permits “all-in-one” delivery of GeoCas9-gRNA.1 Despite this potential, little is known about specific residues that influence GeoCas9 structure, nucleic acid binding, or function. Our recent NMR work with SpCas9 uncovered pathways of micro- to millisecond timescale motions that propagate chemical information related to allostery and specificity through SpRec and its RNA:DNA hybrid,5,7 prompting us to investigate this phenomenon in GeoRec.

The structural basis for specificity in protein-nucleic acid complexes is also poorly understood and difficult to quantify. Although dynamic ensembles in DNA repair enzymes have provided some context,8-10 many efforts to improve Cas9 specificity and reduce off-target activity have relied on large mutational screens or error-prone PCR,11-14 which are less intuitive. Inter-subunit allosteric communication between the catalytic HNH domain and the Rec lobe is critical to Cas9 specificity, as the binding of off-target DNA sequences at Rec alters HNH dynamics to affect DNA cleavage.3,11,15 To further probe the fundamental role of protein motions in the function and specificity of GeoCas9, as well as the effect of protein-nucleic acid interactions on its structural signatures, we engineered two mutations in GeoRec (K267E and R332A, housed within GeoRec2). We hypothesized that these variants could enhance GeoCas9 specificity (i.e. limit its off-target cleavage) for two reasons. First, the chosen mutation sites are homologous to those of specificity-enhancing variants of SpCas9.16,17 Second, altered Cas9-gRNA interactions have been shown to be a consequence of specificity-enhancement and these charged residues appear to directly interact with the gRNA.18-21 Balancing these two points is the fact that Type-II Cas systems generally have conserved nuclease domains, but are delineated by highly varied Rec domains.1 This implies that the structural and dynamic properties of Rec may play an outsized role in differentiating the functions of SpCas9 and GeoCas9, which may not be identical. Nevertheless, our work provides new insight into the biophysical, biochemical, and functional role of the GeoRec lobe and how mutations modulate the domain itself and its interaction with gRNA in full-length GeoCas9.

Results

The structural similarity of GeoRec1, GeoRec2, and GeoRec facilitates NMR analysis of protein dynamics and nucleic acid affinity

GeoCas9 is a 1087 amino acid polypeptide; thus, we employed a “divide and conquer” approach for NMR studies, which we previously showed to be useful for quantifying allosteric structure and motion in SpCas9.5,7,22,23 The GeoRec lobe comprises subdomains GeoRec1 and GeoRec2, which likely work together to recognize nucleic acids. We engineered constructs of the GeoRec1 (136 residues, 16 kDa) and GeoRec2 (212 residues, 25 kDa) subdomains and solved the X-ray crystal structure of GeoRec2 at 1.49 Å, which aligns remarkably well with the structure of the GeoRec2 domain within the AlphaFold model of GeoCas9 (RMSD 1.025 Å, Figure 1A). We were unable to crystallize either GeoRec1 or full-length GeoCas9; nonetheless, GeoRec2 in isolation represents the structure of the subdomain within full-length GeoCas9 very well. Our previous studies of GeoHNH also show identical superpositions of X-ray crystal structures with full-length Cas complexes.2 In addition to the individual subdomains, we also generated a construct of the intact GeoRec lobe (370 residues, 43 kDa).

(A) Architecture of GeoCas9 modeled with AlphaFold2. The GeoRec2 domain from the model (grey) is overlaid with an X-ray structure of GeoRec2 (red, PDB ID: 9B72, RMSD 1.03 Å). (B) 1H-15N TROSY HSQC NMR spectrum of GeoRec collected at 850 MHz. Overlays of this spectrum with resonances from spectra of GeoRec1 (black) and GeoRec2 (blue) demonstrate a structural similarity between the isolated subdomains and intact GeoRec. (C) NMR titration of a 39-nt gRNA into WT GeoRec. Representative resonances are colored by increasing gRNA concentration in the legend. (D) NMR chemical shift perturbations caused by gRNA binding to WT GeoRec. Gray bars denote sites of line broadening and the blue bar denotes an unassigned region of GeoRec corresponding to the native Rec1-Rec2 linker. The red dashed line indicates 1.5σ above the 10% trimmed mean of the data. (E) Chemical shift perturbations >1.5σ are mapped onto GeoRec (red spheres). Resonances that have broadened beyond detection are mapped as yellow spheres. (F) MST-derived binding affinity of GeoRec to the Cy5-labeled 39-nt gRNA yields a Kd = 2.95 ± 0.53 μM.

Despite only 22% sequence identity (Figure S1A), the structures of SpRec3 and GeoRec2 are highly similar (RMSD 2.00 Å, Figure S1B). The structure of GeoRec1, in contrast, does not align perfectly with SpRec1; instead, it partially aligns with both SpRec1 and SpRec2 (Figure S1C). Thus, the nearly identical SpRec3 and GeoRec2 architectures and their intrinsic dynamics may be a common thread among Type II Cas9s of different size and PAM preference. To capture atomic-level signatures of GeoRec, we obtained well-resolved 1H-15N NMR fingerprint spectra for all three protein constructs and assigned the amide backbones (Figure S2). 1H-15N amide (Figure 1B, S2) and 1H-13CH3 Ile, Leu, and Val (ILV)-methyl NMR spectra (Figure S3) of GeoRec overlay very well with those of its individual subdomains, suggesting that the linkage of subdomains within the full-length GeoRec polypeptide does not alter their isolated folds. Consistent with this observation, circular dichroism (CD) thermal unfolding profiles of GeoRec1 (Tm ∼ 34 °C) and GeoRec2 (Tm = 61.50 °C) are distinct and occur as separate events in the unfolding profile of GeoRec (Figure S4). The dumbbell shape of the GeoRec lobe, where the two globular subdomains are connected by a linker, is a likely contributor to these biophysical properties.

The Rec lobe of Cas9 is responsible for orienting the RNA:DNA hybrid, as well as the adjacent nuclease domains, into their active conformations.1,19,20,24 Thus, the structure, motions, and nucleic acid interactions of Rec represent a critical piece of the Cas9 signaling machinery. Previous studies of SpCas9 revealed that gRNA binding (i.e. ribonucleoprotein complex (RNP) formation) induces a global structural rearrangement of the protein that positions the adjacent HNH into its “proofreading” state,18 after which target DNA binding positions the nucleases into active conformations for cleavage.18,25 We wondered if the atomistic details of the apo GeoCas9-to-RNP transition could be captured by NMR using the GeoRec construct and a 101-nucleotide (nt) gRNA derived from the full 141-nt guide containing a 23-nt spacer followed by an 8-nt repeat (PAM complement) and 110-nt guide sequence intrinsic to GeoCas9. An overlay of the 1H-15N HSQC NMR spectra of apo GeoRec and GeoRec-RNP at a 1:1 molar ratio shows extensive line broadening (Figure S5A), likely due to the large size of the complex (75.5 kDa). To mitigate this issue, we created a second 39-nt RNA construct maintaining the spacer sequence that is known to bind to the Rec lobe of other Cas9s. When bound to GeoRec, this complex is 55.6 kDa and more amenable to NMR studies. A 1H-15N NMR spectral overlay of apo GeoRec and GeoRec in complex with 39-nt gRNA at a 1:1 molar ratio shows clear, resolved resonances with significant chemical shift perturbations and line broadening (Figure 1C, S5B). Importantly, many resonances are also minimally impacted. The results of gRNA titration experiments were quantified in Figure 1D and mapped onto the GeoRec structure in Figure 1E. The strongest chemical shift perturbations are localized to the GeoRec2 subdomain that interfaces with the RNA:DNA hybrid at the PAM distal end. 
Previous studies of specificity-enhancing variants of SpCas9 identified structural perturbations to SpRec3, affecting RNA:DNA hybrid binding at the PAM distal end.7 These data suggest that in the analogous GeoRec2, residues located near the PAM distal end of the nucleic acids play a similarly critical role in binding. Modulation of binding affinity within this region, as in SpCas9, may lead to predictable functional changes in GeoCas9. In addition, line broadening is evident in both GeoRec1 and GeoRec2, primarily localized to the RNA:DNA hybrid interface. Microscale thermophoresis (MST) experiments quantified the affinity of GeoRec for gRNA, producing a Kd = 2.95 ± 0.53 µM that is consistent with the strong NMR chemical shift perturbations (Figure 1F).
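
The significance cutoff applied to the chemical shift perturbations (1.5σ above the 10% trimmed mean) can be illustrated with a short sketch. This is not the analysis script used for Figure 1D; it assumes that the trimmed mean discards 10% of the values from each tail and that σ is the standard deviation of the full set of perturbations, neither of which is specified in the text. The function names and the example values are hypothetical.

```python
import statistics


def trimmed_mean(values, proportion=0.10):
    """Mean after discarding `proportion` of the points from each tail (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


def csp_cutoff(csps, n_sigma=1.5, proportion=0.10):
    """Cutoff = trimmed mean + n_sigma * standard deviation of all perturbations (assumed)."""
    return trimmed_mean(csps, proportion) + n_sigma * statistics.pstdev(csps)


def significant_sites(csp_by_residue, n_sigma=1.5):
    """Residue numbers whose perturbation exceeds the cutoff."""
    cutoff = csp_cutoff(list(csp_by_residue.values()), n_sigma)
    return sorted(res for res, csp in csp_by_residue.items() if csp > cutoff)


# Hypothetical perturbations (ppm) keyed by made-up residue numbers.
demo_csps = {300: 0.01, 301: 0.01, 302: 0.01, 303: 0.01, 304: 0.50,
             305: 0.01, 306: 0.01, 307: 0.01, 308: 0.01, 309: 0.01}
print(significant_sites(demo_csps))
```

Trimming before averaging keeps a handful of strongly shifted resonances from inflating the baseline against which they are judged.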

GeoRec2 mutants do not substantially impact the GeoRec structure

To understand how the structure and gRNA interactions of GeoCas9 can be modulated at the level of GeoRec, we engineered two charge-altering point mutants in the GeoRec2 subdomain, K267E and R332A. Based on the AlphaFold2 model of GeoCas9, both of these residues are < 5 Å from the bound RNA:DNA hybrid and are predicted to interface with the nucleic acids directly (Figure 2A,B). Of course, we must note that due to the lack of an experimental structure of GeoCas9, the interactions between K267 and R332 and the gRNA may differ from those predicted. The rationale for our designed mutations was that removal of positive charge would weaken the interactions between GeoCas9 and the gRNA, affecting Kd via the electrostatics or dynamics of the GeoRec lobe. Studies of SpCas9 revealed that interactions of SpRec3 (analogous to GeoRec2) with its RNA:DNA hybrid trigger conformational rearrangements that allow the catalytic HNH domain to sample its active conformation.19 Thus, SpRec3 acts as an allosteric effector that recognizes the RNA:DNA hybrid to activate HNH. Nucleotide mismatches (i.e. off-target DNA sequences) in the target DNA generally prevent SpRec3 from undergoing the full extent of its required conformational rearrangements to allosterically activate HNH, leaving HNH in a “proofreading” state with its catalytic residues too far from the DNA cleavage site. Off-target DNA cleavage by Cas9 remains an area of intense study and substantial effort from various groups has gone into mitigating such effects.18,21,26-28 Indeed, many high-specificity SpCas9 variants contain mutations within SpRec3 that increase the threshold for its conformational activation, reducing the propensity for HNH to sample its active state in the presence of off-target DNA sequences.18,19,28 Studies of flexibility within Rec itself, as well as its gRNA interactions in the presence of mutations, are therefore essential for connecting biophysical properties to function and specificity in related Cas9s.

(A, B) Sites of selected mutations within GeoRec2, K267 and R332, are highlighted as purple sticks directly facing the nucleic acids modeled from NmeCas9 (PDB ID: 6JDV), allowing for prediction of the binding orientation within GeoCas9. NMR chemical shift perturbations caused by the K267E (C) or R332A (D) mutations are plotted for each residue of GeoRec. Gray bars denote sites of line broadening, the blue bar denotes an unassigned region of GeoRec corresponding to the native Rec1-Rec2 linker, and the red bar indicates the mutation site. The red dashed line indicates 1.5σ above the 10% trimmed mean of all shifts. Chemical shift perturbations >1.5σ are mapped onto K267E (E) and R332A (F) GeoRec (red spheres). Resonances that have broadened beyond detection are mapped as yellow spheres and the mutation sites are indicated by a black sphere and green arrow.

The K267E GeoRec2 variant is sequentially and structurally similar to a specificity-enhancing site in SpCas9 (K526E), within the evoCas9 system.16 The SpCas9 K526E mutation substantially reduced off-target activity alone, but was even more effective in conjunction with three other single-point mutations in SpRec3.16 The R332A GeoRec2 variant also mimics one mutation within a high-specificity SpCas9 variant, called HiFi Cas9-R691A.17 We assessed mutation-induced changes to local structure in GeoRec via NMR chemical shift perturbations to 1H-15N HSQC backbone amide spectra. Consistent with experiments using GeoRec2 alone, chemical shift perturbations and line broadening are highly localized to the mutation sites (Figure 2C-F). Perturbation profiles of the GeoRec2 subdomain and intact GeoRec also implicate the same residues as sensitive to the mutations (Figure S6).

CD spectroscopy revealed that wild-type (WT), K267E, and R332A GeoRec2 maintained similar alpha-helical secondary structure (Figure S7A), though the thermostability of both variants was slightly reduced from that of WT GeoRec2 (Figure S7B). The Tm of WT GeoRec2 is ∼62 °C, consistent with the Tm of the full-length GeoCas9, while that of K267E GeoRec2 was decreased to ∼55 °C. Though the R332A GeoRec2 Tm remains ∼62 °C, this variant underwent a smaller unfolding event near 40 °C before completely unfolding. These data suggest that despite small structural perturbations, both mutations are destabilizing to GeoRec2, which implies a change in NMR-detectable protein dynamics.
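
As a rough illustration of how an unfolding midpoint can be read from a CD melt, the sketch below generates a two-state sigmoid and locates the temperature at which the signal crosses the halfway point between its first and last values. The fitting procedure behind Figure S7B is not described in this section, so the sigmoid form, the transition width, and all function names are assumptions made for illustration only.

```python
import math


def two_state_signal(temp_c, tm_c, width_c, folded=1.0, unfolded=0.0):
    """Boltzmann sigmoid: signal moves from `folded` to `unfolded` around tm_c (assumed model)."""
    frac_unfolded = 1.0 / (1.0 + math.exp((tm_c - temp_c) / width_c))
    return folded + (unfolded - folded) * frac_unfolded


def midpoint_by_interpolation(temps_c, signal):
    """Temperature where the signal crosses halfway between its first and last values."""
    half = (signal[0] + signal[-1]) / 2.0
    for i in range(len(signal) - 1):
        s0, s1 = signal[i], signal[i + 1]
        if s0 != s1 and (s0 - half) * (s1 - half) <= 0:
            t0, t1 = temps_c[i], temps_c[i + 1]
            # Linear interpolation between the two bracketing points.
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("signal never crosses its midpoint")


# Synthetic melt with a 62 C midpoint and a hypothetical 2.5 C transition width.
temps = [float(t) for t in range(20, 96)]
melt = [two_state_signal(t, tm_c=62.0, width_c=2.5) for t in temps]
print(round(midpoint_by_interpolation(temps, melt), 1))
```

A single sigmoid of this kind would not capture a profile with two transitions, such as the smaller unfolding event near 40 °C noted for R332A GeoRec2, which would call for a sum of two such terms.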

Mutations enhance and redistribute protein dynamics within GeoRec2

Due to the high molecular weight of the intact GeoRec lobe, decays in NMR signal associated with spin relaxation experiments were significant and hampered data quality. Thus, we focused on quantifying the molecular motions of the GeoRec2 subdomain, where the K267E and R332A mutations reside and the chemical shift perturbations are most apparent. To obtain high-quality per-residue information representative of GeoRec, we measured longitudinal (R1) and transverse (R2) relaxation rates and heteronuclear 1H-[15N] NOEs, then used these data in a Model-free analysis of per-residue order parameters (S2, Figure 3A). Previous measurements of S2 across the adjacent GeoHNH nuclease revealed substantial ps-ns timescale flexibility,2 leading us to wonder whether a similar observation would be made for GeoRec2, which abuts GeoHNH. Such a finding would suggest that HNH-Rec2 crosstalk in GeoCas9 is driven primarily by rapid bond vector fluctuations. However, unlike GeoHNH, S2 values for GeoRec2 are generally elevated, suggesting that the ps-ns motions of this subdomain arise primarily from global tumbling of the protein in solution. We therefore carried out Carr-Purcell-Meiboom-Gill (CPMG) relaxation dispersion NMR experiments to assess the μs-ms flexibility of GeoRec2, which has been linked to chemical information transfer in the well-studied SpCas9.5,7,20,22 Evidence of μs-ms motions (i.e. curved relaxation dispersion profiles) is observed in 17 residues within the GeoRec2 core, spanning its interfaces to Rec1 and HNH (Figure 3B,C). Such motions are absent from GeoHNH, thus two neighboring domains, GeoRec2 and GeoHNH, diverge in their intrinsic flexibility (at least in isolation), raising questions about the functional implications of these motions in GeoRec2. 
We previously showed that heightened flexibility of SpRec3 via specificity-enhancing mutations concomitantly narrowed the conformational space sampled by SpHNH, highlighting a “motional trade-off” between the domains, which may be mirrored here. Manipulation of the flexibility of SpCas9 and GeoCas9 domains by mutagenesis also impacts aspects of nucleic acid binding and cleavage,18,20,22,29,30 which led us to investigate similar perturbations in GeoRec2.
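
The distinction between curved and flat relaxation dispersion profiles can be sketched with the Luz-Meiboom expression for two-site exchange, R2,eff = R2,0 + (Φ/kex)[1 − (4ν/kex) tanh(kex/4ν)], where ν is the CPMG pulsing frequency. This is only an illustration: the expression is a fast-exchange approximation, the model used to fit the data in this work is not stated in this section, and slower exchange processes are usually fit with a more general two-state expression such as Carver-Richards. All parameter values below are hypothetical.

```python
import math


def r2eff_fast_exchange(nu_cpmg, r20, phi, kex):
    """Luz-Meiboom R2,eff (s^-1) at CPMG frequency nu_cpmg (Hz).

    r20: exchange-free transverse relaxation rate (s^-1)
    phi: pA * pB * delta_omega^2 (rad^2 s^-2)
    kex: exchange rate constant (s^-1)
    """
    x = kex / (4.0 * nu_cpmg)
    return r20 + (phi / kex) * (1.0 - math.tanh(x) / x)


nu_values = [50.0, 100.0, 200.0, 400.0, 800.0, 1000.0]

# A residue with exchange (phi > 0) gives a decaying, curved profile ...
curved = [r2eff_fast_exchange(nu, r20=12.0, phi=15000.0, kex=1500.0) for nu in nu_values]
# ... while a residue without exchange (phi = 0) gives a flat line at r20.
flat = [r2eff_fast_exchange(nu, r20=12.0, phi=0.0, kex=1500.0) for nu in nu_values]

for nu, c, f in zip(nu_values, curved, flat):
    print(f"{nu:7.1f} Hz  curved={c:6.2f}  flat={f:6.2f}")
```

Faster pulsing refocuses more of the exchange broadening, so the curved profile decays toward the exchange-free rate as ν increases.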

(A) CPMG relaxation dispersion profiles for all sites of μs-ms flexibility, fit to a global kex of 147 ± 41 s-1 (WT GeoRec2, top), 376 ± 89 s-1 (K267E GeoRec2, middle), and 142 ± 28 s-1 (R332A GeoRec2, bottom). Relaxation dispersion profiles are colored according to Table S1. (B) CPMG relaxation dispersion profiles for representative residues in GeoRec2. A mixture of curved and flat profiles illustrates the changes in and redistribution of μs-ms motions between WT and GeoRec2 variants. (C) All sites exhibiting CPMG relaxation dispersion plotted in A are mapped to GeoRec as blue spheres. Adjacent domains within the AlphaFold2 model of GeoCas9 are also shown.

Since SpRec3 and GeoRec2 have similar structures and μs-ms flexibility, we speculated that charge-altering mutations would modulate the biophysical properties of GeoRec and the function of GeoCas9, as observed for SpCas9. We investigated K267E and R332A GeoRec2 with NMR spin relaxation, as described for WT GeoRec2 (vide supra). An analysis of chemical exchange rates, kex, derived from dual-field CPMG relaxation dispersion shows a global kex for WT GeoRec2 of 147 ± 41 s-1. The K267E mutation shifts the globally fitted kex to 376 ± 89 s-1, while the R332A variant maintains a kex similar to that of WT GeoRec2 (142 ± 28 s-1). The global fit of the K267E variant is based on CPMG profiles of 33 residues, while that of R332A is derived from 18 residues (Table S1). Interestingly, the sites participating in global motions of both variants are distinct from those of WT GeoRec2, demonstrating that residue-specific flexibility is redistributed throughout GeoRec2, affecting intradomain molecular crosstalk within GeoRec. Indeed, perturbation to NMR-detectable motions in SpCas9 rewired its allosteric signaling and enzymatic function.7,22 A similar dynamic modulation of GeoCas9 may fine-tune its DNA cleavage, which has been demonstrated within the GeoHNH nuclease29 and wedge (WED) domains.26 We also assessed the ps-ns fluctuations of GeoRec2 variants (a negligible contribution to the WT GeoRec2 dynamic profile) and calculated order parameters from R1, R2 and 1H-[15N] NOE relaxation measurements (Figure 3A, S8). Bond vector fluctuations on the ps-ns timescale are only locally altered, thus the mutation-induced reshuffling of these dynamics is negligible (<ΔS2> ≤ 0.1) and suggests that, like WT GeoRec2, ps-ns motion arises primarily from global tumbling in solution.

Mutations within GeoRec diminish its affinity for guide RNA

To understand how the K267E and R332A mutants impact gRNA binding to GeoRec, we conducted gRNA titration experiments via NMR and observed that gRNA-induced chemical shift perturbations were attenuated in both variants, relative to WT GeoRec. Despite this muted structural effect, the impact from gRNA-induced line broadening remains substantial in the GeoRec1 subdomain. Our NMR data reveal that a three-fold greater concentration of gRNA is required to induce the maximal structural and dynamic effects in the variants than is required for WT GeoRec (Figure 4A-C), suggesting that the variants have a reduced gRNA affinity. MST experiments showed modest reductions in gRNA affinity for the K267E and R332A constructs, relative to WT GeoRec (Kd = 2.95 ± 0.53 µM), where K267E GeoRec produced a Kd = 6.67 ± 2.0 µM and R332A GeoRec produced a Kd = 4.02 ± 1.34 µM (Figure 4E). Collectively, these data reveal that mutations within GeoRec only subtly alter its structure, but more significantly impact protein dynamics and in turn, the gRNA interaction. NMR experiments also demonstrate that the presence of gRNA impacts both subdomains of GeoRec, providing an avenue for additional molecular tuning.
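
To put the MST-derived dissociation constants in perspective, the sketch below computes the fold change in Kd relative to WT and the fraction of gRNA bound at a given protein concentration using a 1:1 binding model with ligand depletion. The 1:1 stoichiometry, the 5 µM protein and 0.02 µM gRNA concentrations, and the function names are assumptions chosen for illustration and are not taken from the experimental protocol. Only the point estimates of Kd are used, so the reported uncertainties are not propagated.

```python
import math


def fraction_rna_bound(protein_um, rna_um, kd_um):
    """Fraction of gRNA in complex for 1:1 binding (quadratic solution, all in uM)."""
    total = protein_um + rna_um + kd_um
    complex_um = (total - math.sqrt(total * total - 4.0 * protein_um * rna_um)) / 2.0
    return complex_um / rna_um


# Kd point estimates (uM) reported in the text for GeoRec binding the 39-nt gRNA.
KD_UM = {"WT": 2.95, "K267E": 6.67, "R332A": 4.02}

fold_weaker = {name: kd / KD_UM["WT"] for name, kd in KD_UM.items()}

# Hypothetical condition: 5 uM protein titrated against 0.02 uM labeled gRNA.
occupancy = {name: fraction_rna_bound(5.0, 0.02, kd) for name, kd in KD_UM.items()}

for name in KD_UM:
    print(f"{name:6s} Kd fold change = {fold_weaker[name]:.2f}, "
          f"fraction bound = {occupancy[name]:.2f}")
```

Under these assumptions, K267E binds roughly 2.3-fold and R332A roughly 1.4-fold more weakly than WT, consistent with the modest reductions described above.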

(A) NMR titration of a 39-nt gRNA into K267E (top) and R332A (bottom) GeoRec. The left panel of each pair demonstrates that gRNA concentrations mimicking the WT titration induce minimal change in NMR chemical shift or resonance intensity. The right panel of each pair depicts the titration over a three-fold greater concentration range of gRNA, where shifts and line broadening are visible. Representative resonances are colored by increasing gRNA concentration according to the legend. (B) NMR chemical shift perturbations caused by gRNA binding to K267E and R332A GeoRec. Gray bars denote sites of line broadening and the blue bar denotes an unassigned region of GeoRec corresponding to the native Rec1-Rec2 linker. The red dashed line indicates 1.5σ above the 10% trimmed mean of all shifts. (C) Chemical shift perturbations >1.5σ are mapped onto GeoRec (red spheres). Resonances that have broadened beyond detection are mapped as yellow spheres. (D) Fitted thermal denaturation profiles derived from CD spectra of apo WT (black line), K267E (dashed line), and R332A (dotted line) GeoCas9 are shown on the left. Fitted denaturation profiles of the same proteins, as an RNP in complex with 141-nt gRNA, are shown in the right panel. The red dashed line denotes the Tm of the apo proteins, which are nearly identical. (E) MST-derived binding affinities of K267E and R332A GeoRec and a Cy5-labeled 39-nt gRNA yielding Kd = 6.67 ± 2.0 µM and Kd = 4.02 ± 1.34 µM, respectively.

To more completely understand the mechanism of gRNA binding to GeoRec, we carefully examined the NMR data to identify any residues critical for gRNA binding. When overlaying the gRNA-induced chemical shift perturbations (Δδ) of WT and GeoRec variants (Figure S9A), it became clearer that the effect of gRNA binding to both variants was muted and that even at saturating concentrations, the chemical shift perturbations of identical resonances in the K267E and R332A GeoRec2 spectra were weaker than those of the same sites in the WT GeoRec spectrum. Plots of the change in Δδ (WT - mutant) indicate site-specifically that residues in the variants are less structurally affected by gRNA binding, noted as positive values in Figure S9B. Negative values in the same plot highlight residues in the variants that experience greater structural changes than WT GeoRec during gRNA binding. Residues with a comparative change in Δδ exceeding one standard deviation above the 10% trimmed mean of all data were identified as key residues influencing gRNA affinity. These sites were mapped onto the GeoRec structure (Figure S9C) and considered to be allosteric hotspots, as many are distal to the gRNA binding interface. Notably, residues critical for the tight association of GeoRec and gRNA (positive change in Δδ) largely overlap when comparing WT GeoRec to either variant. These sites include F170, R192, H264, R269, L270, L279, H300, D301, D376, E368, D403, E405, E408, and I429. Mutations of these residues in future work will confirm their role as hotspots for precisely tuning the affinity of GeoRec for gRNA.
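
The comparative analysis described above (change in Δδ between WT and a variant, with a cutoff of one standard deviation above the 10% trimmed mean) can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' script: it assumes the trimmed mean removes 10% of the values from each tail, that the standard deviation is taken over all differences, and that only positive excursions are flagged. Residue numbers and perturbation values are made up.

```python
import statistics


def trimmed_mean(values, proportion=0.10):
    """Mean after discarding `proportion` of the points from each tail (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


def delta_delta(wt_csp, variant_csp):
    """Per-residue difference (WT - variant) for residues observed in both data sets."""
    shared = sorted(set(wt_csp) & set(variant_csp))
    return {res: wt_csp[res] - variant_csp[res] for res in shared}


def hotspot_candidates(wt_csp, variant_csp, n_sigma=1.0):
    """Residues whose (WT - variant) difference exceeds trimmed mean + n_sigma * SD."""
    diffs = delta_delta(wt_csp, variant_csp)
    values = list(diffs.values())
    cutoff = trimmed_mean(values) + n_sigma * statistics.pstdev(values)
    return sorted(res for res, d in diffs.items() if d > cutoff)


# Hypothetical gRNA-induced perturbations (ppm) for ten made-up residues.
demo_wt = {101: 0.02, 102: 0.03, 103: 0.02, 104: 0.30, 105: 0.02,
           106: 0.03, 107: 0.02, 108: 0.02, 109: 0.03, 110: 0.02}
demo_variant = {101: 0.02, 102: 0.02, 103: 0.03, 104: 0.05, 105: 0.02,
                106: 0.03, 107: 0.02, 108: 0.03, 109: 0.03, 110: 0.02}
print(hotspot_candidates(demo_wt, demo_variant))
```

A positive difference marks a residue that responds less to gRNA in the variant than in WT, which is the class of sites listed in the preceding paragraph.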

Having observed a weakened affinity of GeoRec variants for gRNA by NMR and MST, we next quantified the impact of the K267E and R332A mutations on RNP formation and stability in full-length GeoCas9. The thermal unfolding midpoint of full-length WT GeoCas9 determined by CD is ∼60 °C and the K267E and R332A mutations do not change the Tm of the apo protein (Figure S10). Upon formation of an RNP, the Tm of WT GeoCas9 increases to 73 °C, but the K267E and R332A GeoCas9 variants form less stable RNPs with Tm values of 70 °C and 61 °C, respectively. The trend of these data is consistent with NMR and MST, which highlight that although K267E and R332A mutations within GeoRec have somewhat muted structural effects, these changes alter protein dynamics and the interaction with gRNA.

DNA cleavage assays suggest GeoCas9 is resistant to K267E- or R332A-induced functional changes

The dynamic impact of the GeoRec mutations and their altered gRNA interactions at the biophysical level led us to speculate that when either mutation was incorporated into full-length GeoCas9, they would also alter DNA cleavage function, especially at elevated temperatures where WT GeoCas9 is most active. Temperature-dependent functional alterations were previously observed for single-point mutations within GeoHNH.29 Although the K267E and R332A mutations slightly diminished on-target DNA cleavage by GeoCas9, the effect was subtle and these overall cleavage activities followed the temperature dependence of WT GeoCas9 quite closely (Figure S11).

To assess the impact of these mutations on GeoCas9 specificity, we assayed the propensity for off-target cleavage using DNA substrates with mismatches 5-6 or 19-20 base pairs from the PAM seed site (Figure S12A, Table S2). As a control for on- and off-target activity, we also assayed WT SpCas9 alongside the widely used high-specificity HiFi-SpCas9 variant17 (Figure S12B, Table S3) and found that HiFi-SpCas9 digested a lower percentage of off-target (mismatched) DNA sequences than WT SpCas9. As expected, WT GeoCas9 was increasingly sensitive to mismatched target sequences closer to the seed site, which has been demonstrated with SpCas9 and other Cas systems.3,4,31,32 No significant differences in activity were observed with digestion durations ranging from 1-60 minutes,4 implying that a 1-minute digestion is sufficient for in vitro activity of GeoCas9 with the target DNA template. While these findings generally align with prior investigations of off-target DNA cleavage,3,4,31 there are nuanced differences. Specifically, a previous study reported ∼10% cleavage of off-target DNA with a mismatch 5-6 base pairs from the PAM by WT GeoCas9.4 Our results showed nearly 50% cleavage for the same off-target mismatch, but still a significant decrease in cleavage from on-target or 19-20 base pair distal mismatches. This could be due to the relatively high RNP concentrations (600-900 nM) in our assay (for clear visibility on the gel), compared to prior studies with RNP concentrations ≤ 500 nM.4 Our results corresponded closely to those of prior studies with a 19-20 base pair mismatch, where off-target cleavage is tolerated by WT GeoCas9.4 Single-point mutants K267E and R332A GeoCas9 have negligible impact on GeoCas9 specificity (both variants follow the trend of WT GeoCas9, Figure S12A), which contrasts with prior work on SpCas9 that demonstrated robust specificity enhancement with single-point mutations in Rec.16,17 However, the GeoCas9 double mutant K267E/R332A modestly enhances its specificity by reducing cleavage proximal and distal to the PAM seed site. The concomitant reduction in on-target cleavage activity of K267E/R332A has also been noted in other Cas systems.16-18,28,33-35

Discussion

CRISPR-Cas9 is a powerful tool for targeted genome editing with high efficiency and modular specificity.4,16-18 Allosteric signals propagate DNA binding information to the HNH and RuvC nuclease domains, facilitating their concerted cleavage of double-stranded DNA.3,18,20,30 The intrinsic flexibility of the nucleic acid recognition lobe plays a critical role in this information transfer, exerting a measure of conformational control over catalysis.20 This study provides new insights into the structural, dynamic, and functional role of the thermophilic GeoCas9 recognition lobe. Novel constructs of subdomains GeoRec1 and GeoRec2, as well as intact GeoRec, show a high structural similarity to the domains in full-length GeoCas9, facilitating solution NMR experiments that captured the intrinsic allosteric motions across GeoRec2. These studies revealed the existence of μs-ms timescale motions that are classically associated with allosteric signaling and enzyme function, which span the entire GeoRec2 subdomain to its interfaces with GeoRec1 and the adjacent GeoHNH.

Based on homology to specificity-enhancing variants of the better studied SpCas9, the biophysical and biochemical consequences of two mutations were tested in GeoRec2, the larger GeoRec lobe, and full-length GeoCas9. We speculated that removing positively charged residues with potential to interact with negatively charged nucleic acids could disrupt GeoCas9-gRNA complex formation, stability, and subsequent function by altering the protein or nucleic acid motions. Indeed, CPMG relaxation dispersion experiments revealed that mutations enhanced and reorganized the μs-ms flexibility of GeoRec2. Further, NMR titrations showed the affinity of K267E and R332A GeoRec for gRNA to be weaker than that of WT GeoRec, consistent with MST-derived Kd values. The mutations also diminished the stability of the full-length GeoCas9 RNP complex.

The collective changes to protein dynamics, gRNA binding, and RNP thermostability suggested that mutations could modulate GeoCas9 function, as observed in similar studies of SpCas9 reporting that gRNA dynamics, affecting the potential for the RNA:DNA hybrid to dissociate, have affected function.18,19,36 Yet, the functional impact of the single-point mutations in this work was relatively modest, suggesting that multiple additive (or synergistic) mutations within GeoRec are required to fine-tune activity or specificity to a large degree. Although homologous K-to-E and R-to-A single point mutations enhance specificity of the mesophilic SpCas9, their impact in the thermophilic GeoCas9 system may be muted due to its evolutionary resilience.37,38 Further, the biophysical impact of the mutations within GeoRec2 and GeoRec may be tempered by the unusually stable neighboring domains in the context of full-length GeoCas9.

It should be noted that the effects of these and other GeoRec mutations may vary in vivo or with alternative target cleavage sites and cell types. Such studies will be the subject of future work, as will biochemical assays of homologous mutations across diverse Cas9s, which have contributed to the wide use of CRISPR technology.39 We also note that despite the homology between GeoRec2 and SpRec3 and the latter’s role in evo- and HiFi-SpCas9 variants that guided our K267E and R332A mutations, the maximally enhanced SpCas9 variants contain four mutations each. Presumably, the individual substitutions all play some role in modulating specificity. However, there is no consistent pattern that discerns whether multiple mutations will have additive or synergistic impacts on Cas9 function. NMR and MD studies of high-specificity SpCas9 variants (HF-1, Hypa, and Evo, each with distinct mutations in the SpRec3 subdomain) reveal universal structural and dynamic variations in regions of SpRec3 that interface with the RNA:DNA hybrid. Notably, a recently published variant, iGeoCas9,26 demonstrated enhanced genome-editing capabilities in HEK293T cells with eight mutations, though none in the Rec2 subdomain. This study highlighted the functional adaptability of iGeoCas9 under low magnesium conditions, a trait beneficial in mammalian cells, distinguishing it from WT GeoCas9. These very recently published data, as well as the findings reported here, advance our molecular understanding of GeoCas9 as a promising avenue for other enhanced variants.

This study marks the first phase of mapping allosteric motions and pathways of information flow in the GeoRec lobe with solution NMR experiments. Such information transfer is critical to the crosstalk between Rec and HNH in several Cas9s. Despite NMR advancements in perdeuteration,40 transverse relaxation-optimized spectroscopy (TROSY),41 and sparse isotopic labeling,42 per-residue dynamics underlying allosteric signaling in large multi-domain proteins such as GeoCas9 (∼126 kDa) have remained challenging to characterize. A novel cryo-EM structure of GeoCas926 will facilitate the merging of future NMR studies with molecular dynamics simulations that we have previously used to report on RNP dynamics and atomic level networks of communication. The identification of additional (or synergistic) allosteric hotspots within GeoRec using an integrated workflow will help to further resolve the balance between structural stability and flexibility in GeoCas9, leading to new insight into targeted manipulation of RNA affinity and enhanced variants.

Materials and Methods

Expression and purification of GeoRec1, GeoRec2, GeoRec, and GeoCas9

The Rec1 (residues 90-225) and Rec2 (residues 245-456) subdomains, as well as the entire Rec lobe (residues 90-456) of G. stearothermophilus Cas9 were engineered into a pET28a vector with an N-terminal His6-tag and a TEV protease cleavage site. The K267E and R332A mutations were separately introduced into the GeoRec2 plasmid. Plasmids were transformed into BL21 (DE3) cells (New England Biolabs). Protein samples for CD spectroscopy, MST, and functional assays were grown in Lysogeny Broth (LB, Fisher), while isotopically labeled samples for NMR were grown in M9 minimal media (deuterated for GeoRec2 and GeoRec) containing CaCl2, MgSO4, MEM vitamins, and 1.0 g/L 15N ammonium chloride and 2.0 g/L 13C glucose (Cambridge Isotope Laboratories) as the sole nitrogen and carbon sources, respectively. Cells were induced with 1 mM IPTG after reaching an OD600 of 0.8−1.0 and grown for 4 hours at 37 °C post induction. The cells were harvested by centrifugation, resuspended in a buffer of 50 mM Tris-HCl, 250 mM NaCl, 5 mM imidazole, and 1 mM PMSF at pH 7.4, lysed by ultrasonication, and purified by Ni−NTA affinity chromatography. Following TEV proteolysis of the terminal His-tag, the samples were further purified on a Superdex75 size exclusion column. NMR samples were dialyzed into a buffer containing 20 mM NaPi, 80 mM KCl, 1 mM DTT, and 1 mM EDTA at pH 7.4.

The full-length GeoCas9 plasmid was acquired from Addgene (#87700), expressed in LB media and purified as described above. The K267E, R332A, and K267E/R332A variants were introduced into full-length GeoCas9 by modifying the original plasmid acquired from Addgene.

NMR spectroscopy

Backbone resonance assignments of GeoRec1 and GeoRec2 were carried out on a Bruker Avance NEO 600 MHz spectrometer at 25 °C. The following triple resonance experiments were collected for each sample: 1H-15N TROSY-HSQC, HNCA, HN(CO)CA, HN(CA)CB, HN(COCA)CB, HN(CA)CO and HNCO. All spectra were processed in NMRPipe43 and analyzed in Sparky44. Three-dimensional correlations and assignments were made in CARA45 and GeoRec1 and GeoRec2 backbone assignments were deposited in the BMRB under accession numbers 52363 and 51197, respectively.

NMR spin relaxation experiments were carried out at 600 and 850 MHz on Bruker Avance NEO and Avance III HD spectrometers, respectively. CPMG experiments were adapted from the report of Palmer and coworkers46 with a constant relaxation period of 20 ms and νCPMG values of 0, 25, 50 (×2), 75, 100, 150, 250, 500 (×2), 750, 800 (×2), 900, and 1000 Hz. Exchange parameters were obtained from global fits of the data carried out with RELAX47 using the R2eff, NoRex, and CR72 models, as well as in-house fitting in GraphPad Prism via:

where Rex is

Error was determined with replicate measurements.
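As a hedged illustration of the data reduction that precedes any dispersion fit, the sketch below converts peak intensities from a constant-time CPMG experiment into effective transverse relaxation rates using the standard relation R2eff = −ln[I(νCPMG)/I0]/T, with T set to the 20 ms constant relaxation period stated above, and estimates an uncertainty from duplicated νCPMG points as a pooled standard deviation. The peak intensities and function names are hypothetical and are not taken from this work; the pooled-duplicate error estimate is one common choice and is an assumption here.

```python
import math

T_RELAX = 0.020  # constant relaxation period in seconds (20 ms, as stated in the text)


def r2eff(intensity, reference_intensity, t_relax=T_RELAX):
    """Effective transverse relaxation rate (s^-1) at a single nu_CPMG value."""
    return -math.log(intensity / reference_intensity) / t_relax


def duplicate_error(pairs):
    """Pooled standard deviation from pairs of duplicate R2eff measurements."""
    sum_sq = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(sum_sq / (2 * len(pairs)))


# Hypothetical peak intensities for one residue (arbitrary units).
i_ref = 1.00e6  # reference plane recorded at nu_CPMG = 0 Hz
intensities = {25: 6.2e5, 100: 6.8e5, 250: 7.3e5, 1000: 7.9e5}
profile = {nu: r2eff(i, i_ref) for nu, i in intensities.items()}

# Hypothetical duplicate points (the text lists 50, 500, and 800 Hz as repeated).
duplicates = [
    (r2eff(6.40e5, i_ref), r2eff(6.50e5, i_ref)),  # 50 Hz
    (r2eff(7.60e5, i_ref), r2eff(7.55e5, i_ref)),  # 500 Hz
    (r2eff(7.80e5, i_ref), r2eff(7.85e5, i_ref)),  # 800 Hz
]
sigma = duplicate_error(duplicates)

for nu in sorted(profile):
    print(f"nu_CPMG = {nu:5d} Hz  R2eff = {profile[nu]:6.2f} +/- {sigma:.2f} s^-1")
```

The resulting R2eff values, with their uncertainties, are the inputs that a dispersion model would then be fit to.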

Longitudinal and transverse relaxation rates were measured with relaxation times of 20 (×2), 60 (×2), 100, 200, 600 (×2), 800, and 1200 ms for T1 and 16.96, 33.92 (×2), 50.88 (×2), 67.84 (×2), 84.8, and 101.76 (×2) ms for T2. Peak intensities were quantified in Sparky, where the resulting decay profiles were also analyzed, with errors determined from the fitted parameters. Steady-state 1H-[15N] NOEs were measured with a 6 second relaxation delay followed by a 3 second saturation (delay) for the saturated (unsaturated) experiments. All relaxation experiments were carried out in a temperature-compensated interleaved manner. Model-free analysis using the Lipari-Szabo formalism was carried out on NMR data in RELAX47 with fully automated protocols.

NMR nucleic acid titrations were performed on a Bruker Avance NEO 600 MHz spectrometer at 25 °C by collecting a series of 1H-15N TROSY HSQC spectra with increasing gRNA concentration. The 1H and 15N carrier frequencies were set to the water resonance and 120 ppm, respectively. Samples of WT, K267E, and R332A GeoRec were titrated with gRNA until no further spectral perturbations were detected. NMR chemical shift perturbations were calculated as:

Microscale thermophoresis (MST)

MST experiments were performed on a Monolith X instrument (NanoTemper Technologies), quantifying WT, K267E, R332A, and K267E/R332A GeoRec binding to a 39-nt Cy5-labeled gRNA at a concentration of 20 nM in a buffer containing 20 mM sodium phosphate, 150 mM KCl, 5 mM MgCl2, and 0.1% Triton X-100 at pH 7.6. The GeoRec proteins were serially diluted from a 200 µM stock into 16 microcentrifuge tubes and combined in a 1:1 molar ratio with serially diluted gRNA from a 40 nM stock. After incubation for 5 minutes at 37 °C in the dark, each sample was loaded into a capillary for measurement. Kd values for the various complexes were calculated using the MO Control software (NanoTemper Technologies).
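The titration design described above can be sketched numerically. The example below builds a 16-point protein dilution series from the 200 µM stock, halves every concentration to account for 1:1 mixing with the 40 nM gRNA stock (giving the 20 nM labeled gRNA stated in the text), and evaluates a 1:1 binding isotherm that accounts for ligand depletion. The two-fold dilution factor, the Kd value, and the function names are assumptions made for illustration; the Kd values in this work were obtained with the MO Control software, not with this code.

```python
import math


def dilution_series(stock, n_tubes=16, factor=2.0):
    """Concentrations across a serial dilution; the two-fold factor is an assumption."""
    return [stock / factor ** i for i in range(n_tubes)]


def fraction_bound(protein_total, rna_total, kd):
    """Fraction of labeled RNA bound for 1:1 binding, including ligand depletion."""
    b = protein_total + rna_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * rna_total)) / 2.0
    return complex_conc / rna_total


# Mixing equal volumes of protein and gRNA halves both concentrations.
protein_final = [c / 2.0 for c in dilution_series(200e-6)]  # molar
rna_final = 40e-9 / 2.0                                     # 20 nM, as in the text

kd_example = 1e-6  # hypothetical dissociation constant (1 uM), for illustration only
curve = [fraction_bound(p, rna_final, kd_example) for p in protein_final]

for p, f in zip(protein_final, curve):
    print(f"[GeoRec] = {p * 1e6:10.4f} uM   fraction bound = {f:.3f}")
```

Because the labeled gRNA concentration is fixed and low, the quadratic form matters mainly when Kd approaches the 20 nM gRNA concentration; for weaker binding it reduces to a simple hyperbola.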

Circular dichroism (CD) spectroscopy

All GeoCas9 and GeoRec proteins were buffer exchanged into a 20 mM sodium phosphate buffer at pH 7.5, diluted to 1 μM, and loaded into a 2 mm quartz cuvette (JASCO instruments). A CD spectrum was first measured between 200 and 250 nm, after which the sample was progressively heated from 20 to 90 °C in 1.0 °C increments while ellipticity was monitored at 222 and 208 nm. Phosphate buffer baseline spectra were subtracted from the sample measurements. Prior to CD measurements, GeoCas9-RNP was formed by incubating 3 μM GeoCas9 with gRNA at a 1:1.5 molar ratio at 37 °C for 10 minutes.
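One simple way to extract an apparent melting temperature from a thermal ramp of this kind is to locate the temperature at which the ellipticity changes most steeply. The sketch below simulates a two-state unfolding curve on the 20 to 90 °C, 1 °C grid described above and estimates the midpoint from the maximum of the first derivative. The simulated ellipticities, the transition parameters, and the derivative-based estimator are all assumptions for illustration and do not represent the analysis or the data in this work.

```python
import math


def two_state_signal(temp_c, tm_c, width_c, folded, unfolded):
    """Ellipticity for a simple two-state (Boltzmann sigmoid) thermal transition."""
    frac_unfolded = 1.0 / (1.0 + math.exp((tm_c - temp_c) / width_c))
    return folded + (unfolded - folded) * frac_unfolded


def estimate_tm(temps, signal):
    """Temperature at which |d(signal)/dT| is largest, using central differences."""
    best_temp, best_slope = None, -1.0
    for i in range(1, len(temps) - 1):
        slope = abs((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Temperature grid matching the ramp in the text: 20 to 90 C in 1 C steps.
temps = [float(t) for t in range(20, 91)]

# Hypothetical melt monitored at 222 nm (ellipticity in arbitrary mdeg units).
signal = [two_state_signal(t, 70.0, 2.5, -20.0, -5.0) for t in temps]

tm = estimate_tm(temps, signal)
print(f"Apparent Tm from the steepest point of the melt: {tm:.1f} C")
```

With noisy experimental data, a nonlinear fit of the full sigmoid with sloping baselines is generally more robust than a point-wise derivative.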

X-ray crystallography

GeoRec protein purified as described above was crystallized by sitting drop vapor diffusion at room temperature by mixing 1.0 µL of 15 mg/mL GeoRec in a buffer of 20 mM HEPES and 100 mM KCl at pH 7.5 with 2.0 µL of crystallizing condition: 0.15 M calcium chloride, 15% polyethylene glycol 6000, 0.1 M HEPES at pH 7.0. Crystals were cryoprotected in crystallizing condition supplemented with 30% ethylene glycol. Diffraction images were collected at the NSLS-II AMX beamline at Brookhaven National Laboratory under cryogenic conditions. Images were processed using XDS48 and Aimless in CCP4.49 Chain A of the N. meningitidis Cas9 X-ray structure (residues 249-445 only, PDB ID: 6JDQ) was used for molecular replacement with Phaser followed by AutoBuild in Phenix.50 Electron density was only observed for the GeoRec2 subdomain. The GeoRec2 structure was finalized through manual building in Coot51 and refinement in Phenix.

DNA cleavage assays

GeoCas9 gRNA templates containing 21-nt, 23-nt, or 25-nt spacers targeting the mouse Tnnt2 gene locus were introduced into EcoRI and BamHI sites in pUC57 (Genscript). The plasmid was transformed into BL21(DE3) cells (New England BioLabs) and subsequent restriction digest of the plasmid DNA was carried out using the BamHI restriction enzyme (New England BioLabs) according to the manufacturer’s instructions. Linearized plasmid DNA was immediately purified using the DNA Clean and Concentrator-5 kit (Zymo Research) according to the manufacturer’s instructions. RNA transcription was performed in vitro with the HiScribe T7 High Yield RNA Synthesis Kit (New England BioLabs). DNA substrates containing the target cleavage site (479 base pairs, Table S2, S3) were produced by polymerase chain reaction (PCR) using mouse genomic DNA as a template and primer pairs 5’-CAAAGAGCTCCTCGTCCAGT-3’ and 5’-ATGGACTCCAGGACCCAAGA-3’ followed by a column purification using the NucleoSpin® Gel and PCR Clean-up Kit (Macherey-Nagel). For the in vitro cleavage assay, RNP formation was achieved by incubating 3 µM GeoCas9 (WT, K267E, R332A, or K267E/R332A variant) and 3 µM gRNA at either 37 °C, 60 °C, 75 °C, or 85 °C for 30 minutes in a reaction buffer of 20 mM Tris, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, and 5% glycerol at pH 7.5. The 10 µL cleavage reactions were set up by mixing RNP at varying concentrations with 149 nanograms of PCR products on ice followed by incubation at 37 °C for 30 minutes. The reaction was quenched by adding 1 µL of proteinase K (20 mg/mL) and subsequent incubation at 56 °C for 10 minutes. Two microliters of 6x DNA loading buffer were added to each reaction and 10 µL of the mixture per lane was loaded onto a 1% agarose gel. DNA band intensities were measured from gel images with ImageJ.
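Two quantities implicit in this protocol can be sketched as follows: the approximate molar concentration of the 479 bp substrate in a 10 µL reaction, and the fraction of substrate cleaved computed from gel band intensities. The conversion assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, which is a common approximation, and the band intensities are hypothetical numbers rather than measurements from this work. The function names are illustrative only.

```python
def dsdna_nanomolar(mass_ng, length_bp, volume_ul, g_per_mol_per_bp=650.0):
    """Approximate dsDNA concentration in nM from mass, length, and volume."""
    moles = mass_ng * 1e-9 / (length_bp * g_per_mol_per_bp)
    liters = volume_ul * 1e-6
    return moles / liters * 1e9


def fraction_cleaved(uncleaved, cleaved_bands):
    """Fraction of substrate cleaved from background-subtracted band intensities."""
    cleaved = sum(cleaved_bands)
    return cleaved / (uncleaved + cleaved)


# Substrate amount, length, and reaction volume as stated in the text.
substrate_nm = dsdna_nanomolar(mass_ng=149.0, length_bp=479, volume_ul=10.0)
print(f"Substrate concentration: roughly {substrate_nm:.0f} nM")

# Hypothetical ImageJ band intensities for one lane (arbitrary units).
uncleaved_band = 4200.0
product_bands = [3100.0, 2700.0]
print(f"Fraction cleaved: {fraction_cleaved(uncleaved_band, product_bands):.2f}")
```

Summing both product bands before normalizing keeps the estimate independent of fragment size, since stain intensity scales approximately with DNA mass.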

For in vitro off-target cleavage assays, RNP formation was achieved by incubating 10 µM GeoCas9 (WT, K267E, R332A, or K267E/R332A variant) and 10 µM gRNA at 37 °C for 30 minutes in the reaction buffer described above. The 10 µL cleavage reactions were set up by mixing 1 µM RNP with 150 nanograms of PCR products (off-target DNA sequences listed in Table S2) on ice followed by incubation at 37 °C for varying time points. The reaction was quenched and visualized as described above. WT and HiFi SpCas9 control proteins were purchased from Integrated DNA Technologies (IDT, cat. No. 108158 and No. 108160, respectively), as was the associated SpCas9 gRNA, Alt-R™ CRISPR-Cas9 gRNA, with an RNA spacer sequence complementing 5’-TGGACAGAGCCTTCTTCTTC-3’. The on-target and off-target DNA sequences used for the SpCas9 in vitro cleavage assay can be found in Table S3.

Author Contributions

HBB and ALK produced GeoRec1, GeoRec2, GeoRec, and GeoCas9 proteins and gRNA, conducted the NMR and biophysical experiments, analyzed the data, and wrote the original draft of the manuscript. AMD solved the X-ray crystal structure of GeoRec2. ZF and JL conducted GeoCas9 functional assays and analyzed the data. GJ supervised collection of X-ray crystallographic data. GPL conceived the study, supervised collection of NMR spectroscopic data, obtained funding, and wrote the original draft. The final manuscript was written and edited with contributions from all authors.

Acknowledgements

This work was supported by NIH grant R01 GM 136815 and NSF grant MCB 2143760 (to GPL). This research used the AMX beamline of the National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704. The Center for BioMolecular Structure (CBMS) is primarily supported by the NIH (NIGMS) through a Center Core P30 Grant (P30GM133893), and by the DOE Office of Biological and Environmental Research (KP1607011).