Research Article

Biochemistry and Chemical Biology

The dynamic conformational landscape of the protein methyltransferase SETD8

Memorial Sloan Kettering Cancer Center, United States
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, China
Icahn School of Medicine at Mount Sinai, United States
University of Toronto, Canada
University of Georgia, United States
Science Center Drive, United States
University of Chinese Academy of Sciences, China
Weill Cornell Medical College of Cornell University, United States

May 13, 2019

Open access
Copyright information

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Appendix 1
Data availability
References
Article and author information
Metrics

Abstract

Elucidating the conformational heterogeneity of proteins is essential for understanding protein function and developing exogenous ligands. With the rapid development of experimental and computational methods, it is of great interest to integrate these approaches to illuminate the conformational landscapes of target proteins. SETD8 is a protein lysine methyltransferase (PKMT), which functions in vivo via the methylation of histone and nonhistone targets. Utilizing covalent inhibitors and depleting native ligands to trap hidden conformational states, we obtained diverse X-ray structures of SETD8. These structures were used to seed distributed atomistic molecular dynamics simulations that generated a total of six milliseconds of trajectory data. Markov state models, built via an automated machine learning approach and corroborated experimentally, reveal how slow conformational motions and conformational states are relevant to catalysis. These findings provide molecular insight on enzymatic catalysis and allosteric mechanisms of a PKMT via its detailed conformational landscape.

https://doi.org/10.7554/eLife.45403.001

eLife digest

Our cells contain thousands of proteins that perform many different tasks. Such tasks often involve significant changes in the shape of a protein that allow it to interact with other proteins or ligands. Understanding these shape changes can be an essential step for predicting and manipulating how proteins work or designing new drugs. Some changes in protein shape happen quickly, whereas others take longer. Existing experimental approaches generally only capture some, but not all, of the different shapes an individual protein adopts.

A family of proteins known as protein lysine methyltransferases (PKMTs) help to regulate the activities of other proteins by adding small tags called methyl groups to specific positions on their target proteins. PKMTs play important roles in many life processes including in activating genes, maintaining stem cells and controlling how organs develop.

It is important for cells to properly control the activity of PKMTs because too much, or too little, activity can promote cancers and neurological diseases. For example, genetic mutations that increase the levels of a PKMT known as SETD8 appear to promote the progression of some breast cancers and childhood leukemia. There is a pressing need to develop new drugs that can inhibit SETD8 and other PKMTs in human patients. However, these efforts are hindered by the lack of understanding of exactly how the shape of PKMT proteins change as they operate in cells.

Chen, Wiewiora et al. used a technique called X-ray crystallography to generate structural models of the human SETD8 protein in the presence or absence of native or foreign ligands. These models were used to develop computer simulations of how the shape of SETD8 changes as it operates. Further computational analysis and laboratory experiments revealed how slow changes in the shape of SETD8 contribute to the ability of the protein to attach methyl groups to other proteins.

This work is a significant stepping-stone to developing a complete model of how the SETD8 protein works, as well as understanding how genetic mutations may affect the protein’s role in the body. The next step is to refine the model by integrating data from other approaches including biophysical models and mathematical calculations of the energy associated with the shape changes, with a long-term goal to better understand and then manipulate the function of SETD8.

https://doi.org/10.7554/eLife.45403.002

Introduction

Proteins are not static, but exist as an ensemble of conformations in dynamic equilibrium (Wei et al., 2016). Characterization of conformational heterogeneity can be an essential step towards interpreting function, understanding pathogenicity, and exploiting pharmacological perturbation of target proteins (Ferguson and Gray, 2018; Latorraca et al., 2017; Lu et al., 2016). Biophysical techniques such as X-ray crystallography (Shi, 2014), nuclear magnetic resonance (NMR) (Huang and Kalodimos, 2017), and cryo-electron microscopy (Fernandez-Leiro and Scheres, 2016) mainly provide static snapshots of highly-populated conformational states. While complementary techniques such as relaxation-dispersion NMR can resolve a limited number of low-population states, they are incapable of providing detailed structural information (van den Bedem and Fraser, 2015). By contrast, molecular simulations provide atomistic detail---a prerequisite to structure-guided rational ligand design---and insight into relevant conformational transitions (Wei et al., 2016). The emergence of Markov state models (MSMs) has shown the power of distributed molecular simulations in resolving complex kinetic landscapes of proteins (Husic and Pande, 2018; Plattner et al., 2017). By integrating simulation datasets with MSMs, functionally relevant conformational dynamics as well as atomistic details can be extracted (Plattner et al., 2017). Recently, MSMs have been used to identify key intermediates for enzyme activation (Shukla et al., 2014; Sultan et al., 2017) and allosteric modulation (Bowman et al., 2015). However, these approaches are limited by the number of seed structures and timescales accessible by molecular simulations (generally microseconds for one structure) relative to the reality of complicated conformational transitions (up to milliseconds for multiple structures) (Klepeis et al., 2009). To overcome the limitations of individual techniques, efforts have been made to combine simulation with experiment to characterize and experimentally validate conformational landscape models of proteins that provide insight into functions (Hart et al., 2016; Knoverek et al., 2019; Latallo et al., 2017; Zimmerman et al., 2017).

Protein lysine methyltransferases (PKMTs) comprise a subfamily of posttranslational modifying enzymes that transfer a methyl group from the cofactor S-adenosyl-L-methionine (SAM) (Luo, 2018). PKMTs play epigenetic roles in gene transcription, cellular pluripotency, and organ development (Allis and Jenuwein, 2016; Murn and Shi, 2017). Their dysregulation has been implicated in neurological disorders and cancers (Dawson, 2017; Flavahan et al., 2017). SETD8 (SET8/Pr-SET7/KMT5A) is the sole PKMT annotated for monomethylation of histone H4 lysine 20 (H4K20me) (Fang et al., 2002; Nishioka et al., 2002) and many nonhistone targets such as the tumor suppressor p53 and the p53-stabilizing factor Numb (Dhami et al., 2013; Shi et al., 2007). Disruption of endogenous SETD8 leads to cell cycle arrest and chromatin decondensation, consistent with essential roles for SETD8 in transcriptional regulation and DNA damage response (Beck et al., 2012; Liu et al., 2010; Veschi et al., 2017). SETD8 has also been implicated in cancer invasiveness and metastasis (Yang et al., 2012). High expression of SETD8 is associated with pediatric leukemia and its overall low survival rate (Hashemi et al., 2014). While there is enormous interest in elucidating functional roles of SETD8 in disease, it has been challenging to develop potent, selective, and cellularly active SETD8 inhibitors (Blum et al., 2014; Milite et al., 2016a; Milite et al., 2016b).

Given the essential roles of conformational dynamics in enzymatic catalysis (Schramm, 2011; Wei et al., 2016) and our current limited knowledge of conformational landscapes of PKMTs, we envisioned characterizing the dynamic conformational landscapes of SETD8 and its cancer-associated mutants with atomic resolution. To access previously-unseen, less-populated conformational states of SETD8 to seed parallel distributed molecular dynamics (MD) simulations, we envisioned trapping these conformations with small-molecule ligands. Here we solved four distinct crystal structures of SETD8 in alternative ligand-binding states with covalent SETD8 inhibitors and native ligands. With the aid of these new structures, we generated an aggregate of six milliseconds of unbiased explicit solvent MD simulation data for apo- and SAM-bound SETD8. Using a machine learning approach to select features and hyperparameters for MSMs via extensive cross-validation, we clustered apo-SETD8 conformers into 24 kinetically distinct, likely functionally relevant metastable conformational states and annotated how the conformational landscape is remodeled upon SAM binding. We then explored these conformational landscape models experimentally with stopped-flow kinetics and isothermal titration calorimetry by examining SAM binding, characterizing rationally-designed SETD8 variants with increased catalytic efficiency, and resolving multiple timescales associated with transitions among these conformers. The resulting model furnishes key insights into how these dynamic conformations play a role in catalysis of SETD8 and how cancer-associated SETD8 mutants alter this process allosterically through reshaping the conformational landscape rather than directly affecting the catalytic site. These findings suggest the importance of referencing conformational landscapes for elucidating enzymatic catalysis and allosteric regulation of SETD8 and likely other PKMTs.

Results

Crystal structures of SETD8 associated with hidden conformations

To identify hidden high-energy conformational states of SETD8, we envisioned a strategy of trapping the associated conformers with small-molecule ligands. The development of high-affinity SETD8 inhibitors with canonical target-engagement modes is challenging (Milite et al., 2016b), and led us to exploit covalent inhibitors (Blum et al., 2014; Butler et al., 2016). These compounds can overcome the high energy penalties associated with hidden conformers through the irreversible formation of energetically-favored inhibitor‒SETD8 adducts. Our prior efforts led to the development of covalent inhibitors containing 2,4-diaminoquinazoline arylamide and multi-substituted quinone scaffolds by targeting Cys311 (Blum et al., 2014; Butler et al., 2016). Upon further optimization of these scaffolds, we identified MS4138 (Inh1) and SGSS05NS (Inh2) (Luo et al., 2015), two structurally distinct covalent inhibitors with the desired potency against SETD8 (Figure 1a, Figure 1—figure supplement 1). X-ray crystal structures of SETD8 were then solved in complex with Inh1 and Inh2, respectively (Figure 1b,c, Figure 1—figure supplements 2 and 3, Table 1). Notably, despite the overall structural similarity of the pre-SET, SET, and SET-I motifs, the Inh1- and Inh2-SETD8 binary complexes (BC-Inh1 and BC-Inh2) differ from the SETD8-SAH-H4 ternary complex (TC) (Couture et al., 2005; Couture et al., 2008; Xiao et al., 2005) by the distinct conformations of their post-SET motifs. The post-SET motif of TC was characterized by its U-shaped topology with a double-kinked loop-helix-helix architecture, which appears to be optimally oriented for binding both SAM and a peptide substrate (Figures 1c and 2) (Couture et al., 2005; Couture et al., 2008; Xiao et al., 2005). In comparison, BC-Inh1 and BC-Inh2 rotate their post-SET motifs by 140° and 60°, respectively (Figure 2). Moreover, the post-SET motifs of BC-Inh1 and BC-Inh2 adopt more extended configurations with a less structured loop and a singly-kinked helix, respectively (Figures 1c and 2). Whereas multiple factors may influence the overall conformations, the formation of Cys311 adducts likely made the key contribution to the discovery of these hidden post-SET motif conformers.

Figure 1 with 5 supplements see all

Download asset Open asset

Diverse SETD8 conformations captured in altered ligand-binding states.

(a) Structures of SETD8 ligands involved in this work. Two covalent inhibitors targeting Cys311 (MS4138 as **Inh1** and SGSS05NS as **Inh2**) and the cofactor SAM were used as ligands to trap neo-conformations of SETD8. (b) Domain topology of SETD8 (Uniprot: Q9NQR1-1). Four functional motifs at SETD8’s catalytic domain are colored: pre-SET (light green), SET (dark yellow), SET-I (purple), and post-SET (orange). (c) Cartoon representations of four neo-structures of SETD8 (**BC-Inh1**, **BC-Inh2**, **BC-SAM**, and **APO**) and a structure of a SETD8-SAH-H4 ternary complex (TC). These structures are shown in two orthogonal views with ligands, pre-SET, SET, SET-I, and post-SET colored in cyan, light green, dark yellow, purple, and orange, respectively.

https://doi.org/10.7554/eLife.45403.003

Figure 2

Download asset Open asset

Superposition of five crystal structures highlighted with detailed views of post-SET, SET-I, and pre-SET motifs.

The five X-ray structures reveal four distinct conformational states of the post-SET motif (**P1-4**) and three distinct conformational states of the SET-I motif (**I1-3**).

https://doi.org/10.7554/eLife.45403.009

Table 1

Data collection and refinement statistics of crystallography.

https://doi.org/10.7554/eLife.45403.010

	BC-Inh1	BC-Inh2	BC-SAM	APO
PDB Code	6BOZ	5W1Y	4IJ8	5V2N
Data collection
Wavelength (Å)	0.98	0.98	0.98	0.98
Space group	P2₁2₁2₁	P2₁2₁2₁	P6₁22	P4₃2₁2
Cell dimensions
a, b, c (Å)	31.56, 68.06, 125.90	58.35, 39.79, 131.90	101.44, 101.44,140.80	60.6, 60.6, 80.7
α, β, γ (°)	90, 90, 90	90, 90, 90	90, 90, 120	90, 90, 90
Resolution (Å)	62.95–2.40 (2.49–2.40)	43.70–1.70 (1.73–1.70)	47.72–2.00 (2.11–2.00)	50.00–2.00 (2.03–2.00)
Unique reflections	11,209 (1550)	34,422 (1769)	29,619 (4231)	10,736 (918)
Redundancy	3.6 (3.0)	3.8 (3.6)	21.6 (22.0)	14.5
Completeness (%)	99.5 (97.8)	99.4 (96.3)	100.0 (100.0)	99.8 (97.0)
I/σ(I)	8.2 (3.4)	15.0 (1.8	19.7 (4.0)	20.0 (1.1)
R_sym^a	0.110 (0.361)	0.064 (0.657)	0.112 (0.942)	0.13 (0.460)
R_pim	0.065 (0.129)	0.036 (0.386)	0.025 (0.205)	0.040 (0.4)
Refinement
No. protein molecules/ASU	2	2	2	1
Resolution (Å)	62.95–2.40	35.00–1.70	43.96–2.00	48.47–2.00
Reflections used or used/free	11,165/1065	32,998/1373	28,045/1516	10,153/513
Rwork	0.179	0.201	0.176	0.183
Rfree	0.242	0.237	0.199	0.249
Average B value (Å²)	38.9	20.9	37.8	41.6
Protein	39.8	20.7	37.9	40.7
Compound	20.0	16.4	24.5	n/a
Other	38.5	20.7	44.7	50.9
Water	30.4	24.3	36.9	49.4
Number of Atoms	2299	2835	2675	1404
Protein	2161	2553	2416	1267
Compound	60	38	54	0
Other	4	36	72	12
Water	74	208	133	125
RMS Bonds (Å)	0.007	0.014	0.015	0.010
RMS Angles (°)	0.9	1.6	1.5	1.4
Wilson B value (Å²)	30.0	18.7	32.6	35.3
Ramachandran plot
Most favored (%)	94.8	96.9	98.4	92.3
Additional allowed (%)	5.2	3.1	1.6	7.0
Generously allowed (%)	0.0	0.0	0.0	0.7
Outliers (%)	0.0	0.0	0.0	0.0

To reveal additional hidden conformers that are structurally distinct from TC, we also solved crystal structures of SETD8 upon depleting native ligands and obtained structures of the SAM-SETD8 binary complex (BC-SAM) and apo-SETD8 (APO) (Figure 1c, Figure 1—figure supplements 4 and 5, Table 1). Strikingly, BC-SAM and APO differ from TC by their distinct SET-I motifs in the context of the otherwise similar SET-domain (Figure 2). Furthermore, the post-SET motif of APO structurally resembles an intermediate state between BC-Inh1 and BC-Inh2 but is distinct from those of BC-SAM and TC (Figure 2). In contrast to the structurally diverse SET-I (I1-3) and post-SET motifs (P1-4) in these structures, their pre-SET motifs show only slightly altered configuration (Figure 2). The differences between these structures highlight the conformational plasticity of the SET-I and post-SET motifs. Collectively, these observations provide strong structural rationale for the existence of a dynamic conformational landscape of SETD8.

Hidden conformations of apo-SETD8 revealed by structural chimeras

The BC-SAM, BC-Inh1, BC-Inh2, APO, and TC structures can be readily classified into three distinct SET-I configurations (I1-3) and four distinct post-SET configurations (P1-4) (Figure 2). Given the relative spatial separation between the SET-I and post-SET motifs, we envisioned additional combinations of discrete motifs might be accessible to yet-unobserved conformations of SETD8. We thus constructed putative ‘structural chimeras’ of apo-SETD8 containing orthogonal I1-3 and P1-4 in a combinatorial (3 × 4) manner (Figure 3a, Figure 3—figure supplement 1). Among the twelve structural chimeras as potential seeds for MD simulations, five were crystallographically-determined conformers (BC-Inh1, BC-Inh2, BC-SAM, TC with ligands removed, and APO), four were new structurally-chimeric conformers, and three were excluded because of obvious steric clashes (Figure 3a, Figure 3—figure supplement 2). The four structurally-chimeric conformers were included to seed MD simulation with the intention to uncover the conformational landscape more effectively, although this operation proved to be redundant for the discovery of new conformations in the validation process (see details below).

Figure 3 with 2 supplements see all

Download asset Open asset

Construction of conformational landscapes of apo- and SAM-bound SETD8 through diversely seeded, parallel molecular dynamics simulations and Markov state models.

(a) Combinatorial construction of putative structural domain chimeras using crystallographically-derived post-SET and SET-I conformations. Each conformer is boxed and color-coded with black for five X-ray-derived structures, blue for four putative structural chimeras included as seed structures for MD simulations, and gray for three structural chimeras excluded from MD simulations because of obvious steric clashes. (b) Schematic workflow to construct dynamic conformational landscapes via MSM. The five X-ray structures and the four structural chimeras were used to seed parallel MD simulations on Folding@home (see Materials and method). Markov state models were constructed from these MD simulation results to reveal the conformational landscape.

https://doi.org/10.7554/eLife.45403.011

Dynamic conformational landscape of apo-SETD8 via Markov state modeling from 5 ms MD simulation dataset

With seed conformations prepared as above, we envisioned illuminating the conformational landscape with distributed long-timescale MD simulations and resolving its kinetic features with Markov state models (MSMs) (Figure 3b, Figure 3—figure supplement 1). Because there is no prior report of the conformational landscapes of PKMTs that can be used as the reference of SETD8, we leveraged extensive computational power for MD simulation with the intention to not only uncover the conformational landscape of SETD8 in an unbiased manner but also cross-validate the completeness of the dataset. Here we conducted approximately 500 × 1 µs explicit-solvent MD simulations from each seed and accumulated 5 milliseconds of aggregate data in 10 million conformational snapshots for apo-SETD8 (Appendix 1—figure 1, Supplementary file 1a). To identify functionally relevant conformational states and their transitions, we built MSMs using a pipeline that employs machine learning and extensive hyperparameter optimization to identify slow degrees of freedom and structural and kinetic criteria to cluster conformational snapshots into discrete conformational states (Appendix 1—figures 2–9, Supplementary file 1b, 1c) (Husic et al., 2016). This approach identified 24 kinetically metastable conformations (macrostates) from an optimized, cross-validated set of 100 microstates (Figure 4a, Figure 5—figure supplement 1, Supplementary file 1d, 1e). These macrostates are remarkably diverse, spanning up to 10.5 Å Cα RMSD from APO. To visualize the kinetic relationships between functionally important conformations, dimensionality reduction was used to project the landscape into 2D while preserving log inverse fluxes between states (Figure 4b). The relative populations of these macrostates were also calculated, resolving rare conformational states up to 6 kT in free energy (Figures 4b and 5a).

Figure 4

Download asset Open asset

Markov state models and conformational landscapes of apo- and SAM-bound SETD8.

Kinetically metastable conformations (macrostates) obtained from kinetically coupled microstates via Hidden Markov Model (HMM) analysis. The revealed dynamic conformational landscapes consist of 24 macrostates for apo-SETD8 (left panel) and 10 macrostates for SAM-bound SETD8 (right panel). (a) Kinetic and structural separation of macrostates in a 3D scatterplot. The MDS1/MDS2 axes are the two top vectors used in multidimensional scaling (MDS), a dimensionality reduction method, for separation of macrostates via log-inverse flux kinetic embedding (see Materials and methods). The Z axis reports root-mean-square deviations (RMSDs) of each macrostate to **APO** (left) or **BC-SAM** (right). The relative population of each macrostate of apo- or SAM-bound SETD8 ensembles is proportional to the volume of each representative sphere. (b) Cartoon depiction of macrostates in a 2D scatterplot. The diameter of the corresponding circle in the 2D scatterplot is proportional to the diameter of the respective sphere in the 3D scatterplot above. Equilibrium kinetic fluxes larger than 7.14 × 10² s⁻¹ for apo- and 1.39 × 10³ s⁻¹ for SAM-bound SETD8 are shown for interconversion kinetics with thickness of the connections proportional to fluxes between two macrostates.

https://doi.org/10.7554/eLife.45403.014

Figure 5 with 2 supplements see all

Download asset Open asset

Chord diagrams and representative conformers of macrostates.

The colors represent the free energy of each macrostate relative to the lowest free energy macrostate. The equilibrium flux between two macrostates is proportional to thickness of connecting arcs.

https://doi.org/10.7554/eLife.45403.015

The dynamic conformational landscape of SAM-bound SETD8

Given the success in constructing the dynamic conformational landscape of apo-SETD8, we applied the same strategy to SAM-bound SETD8. With the two crystal structures of SETD8 in complex with SAM (BC-SAM and TC) as the seed conformations, we conducted ~500 × 1 µs explicit solvent MD simulations from each structure and accumulated 1 millisecond of aggregate data (2M snapshots) (Appendix 1—figure 10). The MSM of the conformational landscape of SAM-bound SETD8 was constructed using the same degrees of freedom as that of apo-SETD8 to facilitate direct comparison of the models (Appendix 1—figures 11-13). The resulting MSM for SAM-bound SETD8 contained 10 kinetically metastable macrostates arising from 67 microstates (Figure 5—figure supplement 2, Supplementary file 1f, 1g). Similar to those of apo-SETD8, the relative macrostate populations of SAM-bound SETD8 and their flux kinetics were computed and embedded into 3D/2D scatter plots and a chord diagram (Figures 4a, b and 5b). The smaller number of metastable states identified for SAM-bound SETD8 is anticipated given that specific conformations are required for optimal interaction between SAM and SETD8’s post-SET motif (Couture et al., 2005; Couture et al., 2008; Xiao et al., 2005). We also compared the timescale structure of the apo- and SAM-bound SETD8 MSMs, as well as an MSM constructed from the subset of apo-SETD8 trajectories originating from the same conformations as the SAM-bound trajectories (Appendix 1—figure 14). We found a large decrease in the number of slow processes seen in the SAM-bound model compared to the other two (respectively for the apo, SAM-bound, and subset of apo MSMs there are 14, 4, and 9 processes slower than 1 μs). SAM binding thus restricts overall conformational accessibility of SETD8.

Experimental corroboration of the conformational landscapes of SETD8

Upon uncovering the dynamic conformational landscapes of apo- and SAM-bound SETD8 for the first time of the PKMT family of enzymes, we were able to extract new structural information and designed experiments to further examine this model (Figure 6). Comparison of the conformational ensembles between apo- and SAM-bound SETD8 revealed that SAM binding dramatically alters the environment of Trp390 (Figure 6a, blue sticks), the sole tryptophan residue in the catalytic domain of SETD8. This residue is flexible and mainly solvent-exposed in apo-SETD8 conformational ensembles but restricted in a hydrophobic environment through SAM-mediated pi-pi stacking in SAM-bound SETD8 conformational ensembles (Figure 6a). Such environmental changes upon SAM binding are expected to quench fluorescence of Trp390 (Royer, 2006). To verify this prediction, we designed rapid-mixing stopped-flow kinetic experiments with 5 ms dead time and 0.1 ms resolution to track the fluorescence change of Trp390 upon SAM binding (Figure 6b). We observed SAM-dependent biphasic kinetics of the fluorescence decrease within 1 s with >80% of the change occurring in the fast phase (0–0.1 s) (Figure 7a). In the context of the conformational landscape of apo-SETD8, we interpreted the major decrease in fluorescence intensity (fast-phase kinetics) as a consequence of the collective changes of Trp390 from the solvent-exposed hydrophilic environment in apo conformations to the hydrophobic environment in SAM-bound conformations (Figure 6a). In contrast, the minor decrease in fluorescence intensity (slow-phase kinetics) reflects the slow conformational changes of Trp390 in the SAM-bound SETD8 conformational ensembles (Figure 7a). With unsupervised global fitting to this two-step model, we obtained forward and reverse rate constants for the fast- and slow-phase kinetics, which are in agreement with conventional fitting to double exponential kinetics (Johnson, 1992) (Figures 7a, b and 8a, Figure 7—figure supplement 1, Supplementary file 1h). The k_-1 value was also confirmed independently by rapid-mixing stopped-flow dilution of SAM-bound SETD8 (Agafonov et al., 2014) (‘ES +E'S’, Appendix 1—figure 15, Supplementary file 1h). Here the k_-1/k₁ ratio (Figure 7b) of 309 ± 6 µM corresponds to the average SAM dissociation constant K_d1 of apo-SETD8 conformers, which is consistent with independently determined ITC K_d of 251 ± 16 µM (Figure 7b and c, Figure 7—figure supplement 2). In contrast, the large k_-2/k₂ ratio (Figure 7b) of 30 ± 11 suggests that the second phase corresponds to a slow equilibrium between ES and E'S with minimal contribution of E'S to the overall SAM dissociation constant K_d (Figure 7c). The conformational ensembles we identified for apo- and SAM-bound SETD8 demonstrate the statistical nature of its SAM-binding process. Therefore, the observed fluorescence changes and herein determined macroscopic kinetic constants represent an ensemble-weighted average of microscopic behaviors of all species that exist in the solution. A rigorous mathematical description of microscopic kinetics of SAM binding was thus obtained under the consideration of interconversion of the metastable conformational states of apo- and SAM-bound SETD8 (Figure 7—figure supplement 3).

Figure 6

Download asset Open asset

Experimental design to probe the conformational landscape of SETD8.

(a) Comparison of binding environments of Trp390 (blue) between apo and SAM-bound (orange) SETD8 in the context of their dynamic conformational landscapes. (b) Illustration of rapid-quenching stopped-flow experiments. These experiments were conducted to trace fluorescence changes of Trp390 upon SAM binding. (c) Comparison of the conformations of post-SET kink and SET-I helix between apo- and SAM-bound SETD8 in the context of their dynamic conformational landscapes. Analysis of key structural motifs indicated K382P (blue in the upper panel), I293G and E292G (red in the lower panel) as potential gain-of-function variants.

https://doi.org/10.7554/eLife.45403.018

Figure 7 with 3 supplements see all

Download asset Open asset

Biochemical characterization of gain-of-function mutations revealed by conformational landscapes of SETD8.

(a) Fluorescence changes of wild-type and K382P SETD8 traced with a rapid-quenching stopped-flow instrument within 1 s upon SAM binding. (b) Stepwise SAM-binding of SETD8 in the integrative context of biochemical, biophysical, structural, and simulation data. ITC determines the thermodynamic constant of SAM binding by SETD8. MD simulations and MSM uncover metastable conformations and interconversion rates of apo- and SAM-bound SETD8 (K^apo and K^SAM). Stopped-flow experiments revealed that SETD8 binds SAM via biphasic kinetics. Rate constants uncovered by stopped-flow experiments (k₁, k_-1, k₂, k_-2) represent macroscopic rates of SAM binding by SETD8 with multiple metastable conformations. The microscopic behavior of individual metastable states and corresponding rates (k₁, k_-1, k₂, k_-2) have not been resolved. Transition probability matrices (red) and microscopic rate constant matrices (blue) are shown as colored grids. A rigorous mathematical derivation of this scheme is shown in Figure 7—figure supplement 3. (c) ITC enthalpogram for the titration of SAM into wild-type and K382P SETD8.

https://doi.org/10.7554/eLife.45403.019

Figure 8

Download asset Open asset

Kinetic and thermodynamic constants of wild-type SETD8 and its rationally designed mutants.

For k₁, *k_-1*, k₂, *k_-2* in Figure 7, data are best fitting values ± standard error (s.e.) from KinTek. For *K_d-ITC,* data are mean ±s.e. of at least three replicates. K_d1, K_eq, and K_d are calculated based on equations in Methods. Uncertainties of K_d1, K_eq, K_d, and ΔG are s.e. calculated by the propagation of uncertainties from individual rate constants and dissociation constants, respectively. h, Relative energy landscapes of apo- and SAM-bound SETD8 and its gain-of-function mutants. The relative energy of apo- and SAM-bound (wildtype and mutated) SETD8 as well as their transition states were determined on the basis of their k₁, *k_-1*, and K_d values. The relative position of each energy landscape was then set on the basis of the rough counts of mutation-associated loss or gain of favorable interactions in contrast to apo- or SAM-bound wild-type SETD8. All SETD8 variants except SAM-bound I293G disrupt the favorable interactions to various degrees.

https://doi.org/10.7554/eLife.45403.023

We then proposed to confirm our understanding of functionally-relevant conformations and their thermodynamics by identifying SETD8 variants with increased affinity for SAM. We uncovered a collection of characteristic kink motifs around Lys382 in the post-SET motif of SAM-bound SETD8 conformational ensembles (Figure 6c), while this region is less structured in apo-SETD8 conformational ensembles. We hypothesized that a proline mutation (K382P) could better stabilize the conformational ensembles of SAM-bound SETD8 than apo-SETD8 (Figures 6c and 8b). We also considered the characteristic α-helix in the SET-I motif, which adopts flexible and diverse configurations in the apo ensembles but becomes constrained and elongated in SAM-bound ensembles (Figure 6c). We proposed that the replacement of I293 or E292 adjacent to the α-helix with a flexible glycine should relax this distortion to better stabilize SAM-bound ensembles (Figures 6c and 8b). We therefore characterized the SAM-binding kinetics and affinities of K382P, I293G, and E292G variants of SETD8 with stopped-flow kinetics and ITC (Figures 6c, 7a, b and c, Figure 7—figure supplements 1 and 2, Appendix 1—figure 15). While exhibiting biphasic kinetics similar to that of wild-type SETD8, the stopped-flow mixing experiment revealed the three variants showed a significant two-fold decrease of K_d,SAM (Figures 7a, c and 8a). The stopped-flow data further revealed that the two-fold change of K_d,SAM mainly arises from increased SAM-binding rates k₁ with relatively unchanged k_-1 (Figure 8a). These results are consistent with independently determined K_d and k_-1 from ITC and stopped-flow dilution, respectively (Figure 7b and c, Figure 7—figure supplement 2, Appendix 1—figure 15, Supplementary file 1h). Collectively, these observations confirm the robustness of our conformational landscape model for apo- and SAM-bound SETD8.

Effects of key simulation parameters on construction of conformational landscapes

We systematically investigated how the choices of seed structures and simulation time---key computational parameters---influence microstate discovery and quality of conformational landscapes of SETD8 (Figures 9 and 10). The simulations of apo-SETD8 initiated from any single X-ray structure (BC-Inh1, BC-Inh2, BC-SAM, APO, or TC in Figure 1c) only reveal a partial conformational landscape (28–61% microstate coverage, Figure 9a, Supplementary file 1i). To achieve >90% microstate coverage, at least two crystal structures---BC-SAM in combination with either BC-Inh1 or BC-Inh2---must be included (Figure 9a). If three crystal structures are included, BC-SAM in combination with TC and APO can provide >90% coverage (Figure 9a). In terms of the structural motifs (I1-3 or P1-4, Figures 2 and 3a), simulations originating from the SET-I motif I1, I2, or I3 alone led to the discovery of 69, 58, or 39 of the 100 microstates, respectively (Figure 9b, Supplementary file 1j). The combination of I1 and I2 is sufficient to cover all 100 microstates, arguing for the redundant character of I3. For the post-SET motif, any combination of two post-SET configurations except P2∪P3 leads to >90 microstate coverage (Figure 9b, Supplementary file 1j). These findings are in agreement with the key requirement of structural motif conformations I1 (equivalent to BC-Inh1, BC-Inh2, or TC), I2 (equivalent to BC-SAM), and any two of P1−4 except P2∪P3 (e.g. P1∪P3 is equivalent to the combination of APO with BC-SAM or TC) to achieve >90% microstate coverage. For SAM-bound SETD8, the seed conformations derived from BC-SAM and TC structures contribute 31 and 38 of 67 microstates (Figure 9c and d, Supplementary file 1k). These findings argue for the importance of using multiple structures to construct the landscape within achievable computer time. The seed conformations prepared from ligand-trapped SETD8 structures are essential to discovering the complete conformational landscapes of SETD8.

Figure 9

Download asset Open asset

Evaluation of key simulation parameters of molecular simulations.

(**a−b**) Assessments of simulations of apo-SETD8: (a) Heat map for the coverage of the 100 microstates with all combinations of the crystal structures (**BC-Inh1, BC-Inh2, BC-SAM, APO, and TC**) as seed conformations; (b) Venn diagrams of the coverage of the 100 microstates with all conformational combinations of SET-I and post-SET motifs (**I1-3** and **P1-4**) as seed structures for MD simulations. (**c−d**) Robustness of simulations of SAM-bound SETD8: (c) Venn diagram of the coverage of the 67 microstates with TC, **BC-SAM** or both as seed structures for MD simulation; (d) Minimal time required by MD simulations to reach certain coverage of the 67 microstates of SAM-bound SETD8 with representative combinations of seed structures. (e) Minimal time required by MD simulations to reach certain coverage of the 100 microstates of apo-SETD8 with representative combinations of seed structures.

https://doi.org/10.7554/eLife.45403.024

Figure 10 with 1 supplement see all

Download asset Open asset

Contour map of microstate coverage at various combinations of trajectory lengths and numbers as percentage of the maximal trajectory length and number of MD simulations.

The seed structures of each panel are listed as the simulation entries e1, e5−8 for apo-SETD8, and d1−3 for SAM-bound SETD8 in Figure 9d and e. Each curve corresponds to the aggregation of specific simulation time.

https://doi.org/10.7554/eLife.45403.025

For simulation time, we observed that the fewer seed conformations of apo-SETD8 were employed, the more computing power (the product between the number of simulation trajectories and the time length per trajectory) was required to reach a comparable level of microstate coverage (Figure 9e, Supplementary file 1l, 1m). When computing power is fixed, comparable microstate coverages of apo- and SAM-bound SETD8 can be obtained by running either multiple short trajectories or few long trajectories (Figure 10, Figure 10—figure supplement 1). While the current aggregate simulation time (5 ms for apo-SETD8 and 1 ms for SAM-bound SETD8) appears sufficient to discover essentially all relevant conformations in the landscapes of SETD8 and estimate their relative populations and corresponding uncertainties, more data would yet be needed to improve estimates of inter-macrostate kinetics in order to develop a fully kinetically accurate model (Supplementary file 1h, 1l). Collective contributions of the number of seed structures and the overall simulation time determined the efficiency of uncovering conformational landscapes of SETD8. The conformational landscape of apo-SETD8 can be revealed upon implementing a minimum of two seed structures (TC and BC-SAM) or 10% of the current simulation time. With the two seed structures (TC and BC-SAM) and sufficient simulation time, apo-SETD8 sampled 22 more microstates than SAM-bound SETD8 (89 states with 750 μs simulation versus 67 states with 850 μs simulation, Figure 9d,e), consistent with the conformational restriction of SETD8 upon SAM binding. We also noted that it is redundant to include the four structurally-chimeric conformers because this operation contributes less than 10% of microstate coverage and the comparable conformational landscape of apo-SETD8 can be generated with the subsets of seeds solely prepared from the X-ray structures (Supplementary file 1l).

Functionally relevant conformations in the dynamic landscapes of apo- and SAM-bound SETD8

After experimentally corroborating the conformational landscapes of apo- and SAM-bound SETD8, we explored the dynamic details of these landscapes with the focus on the connectivity and equilibrium fluxes between kinetically metastable macrostates (henceforth referred to as the ‘network’). When projected into two dimensions, the conformational landscape of apo-SETD8 takes the form of a dumbbell-like shape containing two lobes, each composed of about 12 macrostates primarily connected via a single hub-like central macrostate A11 (Figures 4b and 11, Supplementary file 1e). The conformational landscape also consists of other multiply-connected macrostates, including A1−A4, A9, and A14, as characterized by their rapid kinetic interconversion with multiple other macrostates (Figures 4b and 5a). Most low-populated macrostates (A17−A24) appear as satellite macrostates in the periphery of the network with few high-flux channels of interconversion to other macrostates (Figures 4b and 5a). The remaining states were classified as basin-like macrostates including {A5, A10}, A7, A8, {A12, A13, A16} and A15, because these macrostates are highly populated and either are relatively isolated or appear in tightly interconnected but globally isolated groups.

Figure 11

Download asset Open asset

Functional annotation of the dynamic conformational landscapes of SETD8.

(a) 3D scatterplots of the 24 macrostates of apo-SETD8 landscape and 10 macrostates of SAM-bound SETD8 landscape in the coordinates of RMSDs relative to **APO**, **BC-SAM**, and TC. Volume of each sphere is proportional to the relative population of the corresponding macrostate in the context of the 24 macrostates for apo-SETD8 or the 10 macrostates for SAM-bound SETD8. The RMSD of each macrostate is the average of its microstates weighted with their intra-macrostate populations. The RMSD of each microstate is the average of the top 10 frames most closely related to the clustering center of the microstate. The feature of each macrostate is annotated in color. (b) Cartoons of representative conformations of key macrostates in the apo-SETD8 landscape. Structural annotations are shown in bottom right of each conformation. (c) 3D scattering plot of 100 microstates of the apo landscape in the coordinates of RMSDs to **APO**, **BC-SAM**, and TC. Volume of each cube is proportional to the relative population of the corresponding microstate in the context of the 100 microstates. Microstates clustered in intermediate-like macrostates are highlighted in colors. Structural diversity of microstates within individual macrostates indicates that each intermediate-like state contains multiple structurally distinct but readily interconvertible microstates. (d) Radar chart of representative macrostates of apo (left) and SAM-bound (right) landscapes in reference to the five crystal structures. Distances between dots and cycle centers are proportional to the reciprocal values of RMSDs of macrostates relative to the crystal structures. (e) Cartoons of representative conformations of key macrostates in the SAM-bound SETD8 landscape. Structural annotations are shown in bottom right of each conformation.

https://doi.org/10.7554/eLife.45403.027

The hub-like macrostate A11 consists of two structurally distinct microstates with comparable populations (Figures 4b and 11a). One microstate structurally resembles the conformation of APO (I₃P₃), while the other microstate represents a conformer with the I₁P₂₃ feature for its SET-I and post-SET motifs (Figure 11b, Supplementary file 1d). Rapid conformational interconversions within A11 are consistent with its hub-like character, centered between the two lobes of the dumbbell-like network. Interestingly, macrostates kinetically adjacent to A11 have structurally similar SET-I motifs within each lobe but distinct SET-I motifs between the two lobes ({I2 ~3} for the left and {I1 ~2} for the right) (Figures 4b and 11b). Therefore, A11 is a transition-type state essential for the conformational fluxes of the macrostates between the two lobes, involved in a key step of conformational changes of the SET-I motif between {I1 ~2} and {I2 ~3}.

The intermediate-like macrostates A1−A4, A9, and A14 each contains multiple structurally distinct but kinetically associated microstates (Figures 4b, 11a and b). The satellite macrostates A17−A24 are less populated and more structurally homogeneous (Figures 4b, 11a and b). Conformers in the macrostates A22, A24, and A20 are structurally similar to TC and BC-SAM with slightly different but well-defined SAM-binding pockets, suggesting minimal conformational reorganization of A22, A24, and A20 is required to accommodate the cofactor (Figure 11a,b,c). Interestingly, A22 and A24, whose overall structures are similar to each other (TC-like), rarely interconvert in the apo landscape (Figure 4b). In contrast, the basin-like macrostates {A5, A10}, A7, A8, {A12, A13, A16} and A15 do not contain a well-defined SAM-binding pocket (Figure 11a,b,c). Here the conformers in macrostate A12 are similar to APO, the conformers in the macrostate A6 are similar to BC-Inh1, and the conformers in the macrostate A10 are similar to BC-Inh2 (Figure 11d). The structural similarity between the simulated conformers and BC-Inh1/2 suggests that the two covalent inhibitors successfully trapped key hidden conformers of apo-SETD8.

Similar to that of apo-SETD8, the interconversion network of the macrostates of SAM-bound SETD8 also displays a dumbbell-like shape with S9 as the hub-like state connecting the two lobes of the network (Figures 4b and 11a). The macrostates S1 and S3−S5 are multi-connected states; S6, S8, and S10 are satellite-like states; S2 and S7 are basin-like states (Figure 11a,b). Notably, the complexity of the overall conformational landscape of SAM-bound SETD8 is significantly reduced in comparison with those of apo-SETD8 (Figures 4b and 11a). The conformers in S1, S2, and S10 are structurally similar to those of A20, as well as BC-SAM; the conformers in S4, S6, and S8 are structurally similar to those in A22 and A24, as well as TC (Figure 11d,e). The structural similarities between these apo and SAM-bound macrostates suggest possible pathways for connecting the two conformational landscapes upon SAM binding.

Characterization of cancer-associated SETD8 mutants

Sequences from tumor samples retrieved from cBioPortal (Cerami et al., 2012; Cheng et al., 2015; Gao et al., 2013) contain two dozen point mutations in the catalytic domain of SETD8 (Figure 12a and b, Supplementary file 1n). We expect that some of these mutations perturb SETD8 function. Because of conformational heterogeneity, it has historically been challenging for in silico approaches to annotate how mutations---in particular those structurally remote from functional sites---allosterically affect a target protein on the basis of its static structure(s) (Campbell et al., 2016; Klinman and Kohen, 2014; Stefl et al., 2013). Here, we envisioned addressing this challenge with reference to the conformational ensemble of wild-type SETD8. To characterize mutations remote from catalytic sites (20 out of 24 known mutations), 40 independent microsecond-long MD simulations for each of the cancer-associated apo-SETD8 mutants were conducted with seed structures prepared from one ternary complex (TC) conformer---a structure resembling the enzymatic transition state and thus essential for SETD8-catalyzed methylation reaction (Linscott et al., 2016). We then constructed a differential residue-contact map for each variant (Figure 12c,d) and extracted snapshots representing the largest conformational deviations from the wild-type conformational ensembles (Figure 12e). Even with modest simulation time, 8 of the 20 examined cancer-associated mutants displayed neo-conformations that were not observed in the 5 ms wild-type dataset and cannot be predicted from static X-ray crystal structures. Interestingly, all of the neo-conformations display distinct reorganizations at the SET-I motif (Figure 12e). For instance, a single point mutation A296T, ~16 Å remote from the active site, yields five distinct neo-conformations (Figure 12e,f). In addition, relative to wild-type apo-SETD8, this mutant populates several conformations with a structurally relaxed α-helix at the SET-I motif (Figure 12e). C324del, ~20 Å from the SET-I motif, is associated with three neo-conformations and displays the largest changes in the differential contact map (Figure 12d, panel 13). The remote H340D mutation is associated with one neo-conformation as well as more populated conformations containing spatially compressed active sites (Figure 12d, panel 7; Figure 12e). Using in vitro radiometric assays, the A296T and H340D mutants were characterized by loss of the methyltransferase activity on H4K20 peptide substrate (Figure 12g). The failure to purify recombinant C324del also supports the impact of this deletion on SETD8 function. H388Q, which mutates a histidine involved in substrate binding, is also associated with neo-conformations as well as loss of the methyltransferase activity (Figure 12e,g). These observations provide potential molecular rationale for how remote mutations can alter the active sites and the SET-I motif---and hence catalysis allosterically---via modulating the overall conformational landscape rather than directly affecting specific residues at the catalytic site. Exceptions are T274I, R279W, R279Q, and A368V, which yielded neo-conformations but showed activity comparable to wild-type SETD8 (Figure 12e,g), suggesting that certain neo-conformations must either still be catalytically competent or their population may not significantly alter the ability to populate conformations relevant for catalysis. The exceptions suggest that a more complete picture of the conformational ensembles might be necessary to uncover quantitative correlations with the relative methyltransferase activities of these SETD8 mutants.

Figure 12 with 2 supplements see all

Download asset Open asset

Computational and experimental characterization of cancer-associated SETD8 mutants.

(a) Cancer-associated mutations in the catalytic domain of SETD8 examined in this work. (b) Cartoon representations of TC with cancer-associated SETD8 mutations highlighted. (c) Differential residue-contact maps of cancer-associated SETD8 mutants in reference to wild-type apo-SETD8 (gray). Residue-residue contact map of wild-type apo-SETD8 is presented as a 162 × 162 matrix. The vertical and horizontal axes show the residue numbers of SETD8’s catalytic domain. The contact of a pair of residues is scored as ‘1’ if their distance is shorter than 4.0 Å; ‘0’ if the distance is equal or above 4.0 Å. For the 60 1ZKK(chain A)-seeded MD trajectory frames of wild-type apo-SETD8, the average contact fraction of each residue pair is presented in a square shape and depicted with a gray gradient at the corresponding vertical and horizontal coordinates. The contact fraction of cancer-associated SETD8 mutants were obtained in a similar manner. The vertical and horizontal coordinates of representative positive changes of the contact scores from wild-type to mutated SETD8 (newly acquired interactions) are highlighted in red-gradient squares with details expanded in the next panel. (d) Representative contacts in the differential residue-contact maps of cancer-associated SETD8 mutants. The contacts of SETD8 mutants with >3 fold gain of contact fraction relative to wild-type SETD8 are listed. Increased magnitude of the contact fraction is depicted in red gradient as described in the previous panel. Only positive changes (newly acquired interactions) are presented with the two residues involved labeled in left and top; the fold of the increase of their contact score labeled in bottom. (e) Cartoon representations of neo-conformations revealed by simulations of SETD8 mutants. Large conformational changes are observed in the SET-I (purple) and post-SET (orange) motifs. (f) Differential residue-contact maps of the structurally relaxed α-helix at the SET-I motif of SETD8 A296T mutant. Decrease of contact fraction relative to wild-type SETD8 is depicted in blue gradient. (g) Enzymatic activities of wild-type and mutated SETD8 determined by an in vitro radiometric assay with H4K20 peptide substrate. Here SETD8 mutants are categorized as the following: red, uncovered neo-conformations (Neo-conf.) with >90% loss of methyltransferase activity; green, populated inactive conformations (Pop. shift) with partially abolished methyltransferase activity; blue, no large change of differential contact maps with comparable methyltransferase activity with wild-type SETD8; brown, unknown relationship between differential contact maps and methyltransferase activities. Data are mean ±standard deviation (s.d.) of 3 replicates.

https://doi.org/10.7554/eLife.45403.028

The differential residue-contact maps further revealed that 8 out of the 20 remote mutations alter conformational landscapes by changing populations of pre-existing conformations (Figure 12c,d). For instance, E257K, G280S, A301V, T309M, E330Q, D352Y mutations populate conformations containing spatially compressed active sites (Figure 12—figure supplement 1); E372D populates conformations containing a constrained post-SET motif; R333C populates conformations with reorganized SET motifs adjacent to the peptide binding pocket. All of these mutations showed partial loss of methyltransferase activity (Figure 12g). Notably, these structural alterations are often remote from the corresponding mutation sites (Figure 12b). In contrast, R244S and V356I (2 out of 20) showed no significant conformational change on the basis of their differential contact maps, consistent with their comparable methyltransferase activity to wild-type SETD8 (Figure 12g). Likely due to insufficient simulation time (40 × 1 μs/mutant), R333L and L334P variants, characterized by partial-to-complete loss of the methyltransferase activity (Figure 12g), showed similar conformational landscapes to that of wild-type apo-SETD8. These exceptions, though only a small portion of all mutants studied, point to the necessity of a more extensive exploration of the conformational ensembles to obtain quantitative correlations of the atomistic structure with activities of this collection of SETD8 mutants. Exploring these conformational landscapes is thus an effective strategy to reveal structural alterations associated with majority of remote-site mutations of SETD8 for qualitative functional annotation. More importantly, this change provides a mechanistic rationale of the allosteric effect of remote residues of SETD8 with reference to its conformational landscape.

Discussion

Here we have demonstrated that tight integration of structural determination---using covalent probes and multiple ligand-binding states to trap hidden conformations (Figure 1)---with distributed molecular simulations and the powerful framework of Markov state models (Figure 3b) can provide insights into the detailed conformational dynamics of an enzyme. The current work demonstrates the merit of an approach that leverages multiple X-ray structures with distinct diverse conformations of a PKMT for MD simulations and machine-learning-based MSM construction to elucidate complex conformational dynamics, and corroborates the resulting model experimentally with testable biophysical predictions (Figures 6–8). Previously, individual components of our integrative strategy have been employed to study the dynamics of transcriptional activators (Wang et al., 2013), kinases (Shukla et al., 2014; Sultan et al., 2017), and allosteric regulation (Bowman et al., 2015). Several efforts have also been made to combine experimental and computational approaches to explore conformational landscapes of proteins and their utilities (Hart et al., 2016; Knoverek et al., 2019; Latallo et al., 2017; Zimmerman et al., 2017). However, it is the first time that these diverse approaches are consolidated explicitly with the goal of illuminating conformational dynamics of a PKMT in a comprehensive and feasible manner. Assessment of key computational parameters concluded that we have utilized sufficient or even redundant seed structures and simulation time for essentially complete microstate discovery (Figures 9 and 10). This implementation is essential for the current work because of the lack of the conformational landscapes of PKMTs as reference or for validation. Notably, we relied on a unique computational resource---Folding@home---to collect six-millisecond of aggregate simulation data (see Materials and methods). Without access to Folding@home, contemporaneous progress on developing adaptive Markov state model construction algorithms---where iterative model building guides the collection of additional simulation data (Hruska et al., 2018; Shamsi et al., 2017; Zimmerman et al., 2018)---will still allow research groups to achieve this feat on local GPU clusters or cloud resources in the near future. Furthermore, the concept of adaptive model construction can be extended to identify which new structural or biophysical data would be valuable in reducing uncertainty (Dixit and Dill, 2018; Matsunaga and Sugita, 2018; Olsson et al., 2017) and producing refined MSMs. Utilizing the slow collective variables identified here, advanced sampling methods such as metadynamics (Saladino and Gervasio, 2012) or umbrella sampling (Meng and Roux, 2014) can be applied to more efficiently compute the free energy landscape for SETD8 and its mutants. With a transfer learning approach (Sultan et al., 2017), it is also possible to adapt these collective variables to other members of the PKMT protein family.

This work represents the first time that conformational dynamics of a protein methyltransferase have been definitively characterized with atomic details. SETD8 adopts extremely diverse dynamic conformations in apo and SAM-bound states (24 and 10 kinetically metastable macrostates, respectively, Figure 4). Interconversions between metastable conformers cover a broad spatio-temporal scale in particular associated with motions of SETD8’s SET-I and post-SET motifs (Figures 1, 2 and 11). In the apo landscape, the general structural features of the X-ray structures of BC-Inh1, BC-Inh2, APO, BC-SAM and TC (Figure 1) are recapitulated by a subset of macrostates (e.g. A6 for BC-Inh1; A10 for BC-Inh2; A12 for APO; A20 for BC-SAM; A22, A24 for TC, 6 of 24 macrostates, Figure 11). Such observation indicates that these X-ray structures trapped in the different ligand-binding states are not ligand-induced artifacts but indeed relevant snapshots of hidden conformations of apo-SETD8. Similarly, a few macrostates in the SAM-bound landscape also recapitulate major structural features of the two cofactor-bound X-ray structures (e.g. S1, S2, S10 for BC-SAM, S4, S6, S8 for TC, 6 of 10 macrostates, Figure 11). Meanwhile, our results also demonstrate that X-ray crystallography alone is insufficient to capture all metastable conformations of SETD8. In addition, there is no correlation of overall structural similarity and interconversion rates between metastable conformers. As observed previously in other studies of protein dynamics (Bowman and Pande, 2010), in addition to fast transitions between structurally similar conformers and slow transitions between structurally distinct conformers (e.g. microstates within individual satellite macrostates A17−A24 of apo‒SETD8; S6, S8, and S10 of SAM-bound SETD8, Figure 11), we frequently observed fast kinetics of transitions between structurally distinct microstates (e.g. microstates within hub-like macrostates A11 and S8; multi-connected states A1−A4, A9, A14, S1 and S3−S5) and vice versa (e.g. macrostates A22 and A24) (Figures 4 and 11). It is thus interesting to examine how other factors such as specific residue contacts and cooperative long-range motions of certain structural motifs play roles in interconversion kinetics. Meanwhile, utilizing the power of Markov state models to stitch together multiple short (microseconds long) trajectories and generate synthetic trajectories orders of magnitude longer (milliseconds), we visualized the MSMs of apo- and SAM-bound SETD8 via 2 ms long (enough to visit all macorstates) movies (Videos 1 and 2).

Video 1

Download asset

posterframe for video — A 2 ms molecular dynamics trajectory simulated from the HMM of apo-SETD8.
https://doi.org/10.7554/eLife.45403.031

Video 2

Download asset

Functional annotation of the landscapes revealed that the SET-I motif adopts diverse conformations (Figures 4 and 5), and its overall configuration is a key feature that differentiates the lobes of the dumbbell-like conformational landscape of SETD8. The conformational dynamics within the hub-like macrostate A11 primarily involve motions of the SET-I motif, secondarily coupling a shift of the post-SET motif. Two gain-of-function I293G and E292G variants of SETD8 were designed for relaxing constrained elongate helix configurations of the SET-I motif upon SAM binding (Figure 6). These findings argue the functional essentiality of the intrinsically dynamic motions of SET-I motif for SETD8 SAM binding and catalysis. Importance of dynamic conformational modulation of the SET-I motif has also been shown for other SET-domain PKMTs. For instance, the SET domains of MLLs and EZH1/2 alone are catalytically inert but active in the presence of binding partners WDR5-RbBP5-Ash2L-Dpy30 (referred as MLL-WRAD) and EED-Suz12 (referred as PRC2), respectively (Luo, M., 2018). Recent structural evidence implicated that the formation of these complexes regulates the conformational dynamics of the SET-I motif, which is essential for catalysis (Justin et al., 2016; Li et al., 2016). Interestingly, this region has also been exploited by cancer-associated mutants of PKMTs. For instance, NSD2’s E1099 is located in its SET-I motif and its E1099K mutant was characterized as a hot-spot cancer mutation with the gain-of-activity of H3K36 methylation (Oyer et al., 2014). Additionally, many mutations of PKMTs have been mapped in their SET-I motifs, implicating their potential roles in alternation of function (Figure 12—figure supplement 2, Supplementary file 1o). In contrast to static X-ray structures, this analysis greatly facilitated the characterization of cancer-associated SETD8 mutants (Figure 12). Among the 20 examined SETD8 mutations, eight deplete the pre-existing conformations of TC and showed the partial loss of activity in comparison with wild-type SETD8 (8 out of 8); eight have neo-conformations with four characterized with the partial loss of methyltransferase activity (4 out of 8); four do not affect the conformational landscape with two characterized for no loss of methyltransferase activity (2 out of 4). Collectively, comparing the conformational landscapes between SETD8 mutations and wild-type TC allows us to predict the methyltransferase activity with 70% accuracy (14 out of 20). However, we could not quantitatively correlate the amounts of the neo- or altered conformations of these SETD8 mutants with their methyltransferase activities. We reason that certain nonnative conformations can still be catalytically active. A significant portion of cancer-associated, loss-of-function SETD8 mutations, though remote from active sites, were revealed to perturb the SET-I motif and thus catalysis allosterically via altering the conformational landscape, which is relevant to the formation of the ternary complex and likely the transition state of native SETD8 (Figure 12). We also discovered significant changes in the connective networks and a large decrease in conformational heterogeneity of SETD8 upon SAM binding (Figures 4 and 5). This finding highlights how SETD8-SAM interactions reshape conformational landscapes. The conformational landscapes of SETD8 thus provide a platform for virtual screening of ligand candidates as inhibitors via exploring different modes of interaction (SAM-competitive, substrate-competitive, covalent or allosteric). Uncovering hidden conformations can thus be essential for developing potent and selective SETD8 inhibitors by targeting these conformations.

Furthermore, it seems feasible that additional simulation effort---if appropriately allocated among poorly-sampled transitions---can produce a statistically precise kinetic model of the conformational dynamics of apo- and SAM-bound SETD8, and that these landscapes could be used to seed simulations for the construction of atomistic models of the rest of the catalytic cycle. Furthermore, the structural information in the resulting models and the kinetic experimental observables could be reconciled using the dynamical fingerprints framework (Noé et al., 2011). This approach can also be used to design new experiments by proposing locations of site-specific labels for optimal experimental probing of the molecular relaxation processes of interest. Future work could therefore furnish a quantitative atomistic explanation of the experimentally observed kinetics.

Additionally, these metastable states could be paired with alchemical free energy calculations (Gapsys et al., 2016) to rapidly assess the impact of point mutations on the populations of each metastable state in each stage of the catalytic cycle to aid the annotation of the functional impact of these mutations. A prerequisite of our approach was the determination of conformationally diverse structures as seeds for molecular simulations. Here, this was achieved with Cys-covalent inhibitors and native ligand depletion because of the lack of conventional structural probes of SETD8. Given the significant interest in exploring PKMT catalysis and developing selective inhibitors to study functions (Luo, 2018), we envision applying similar strategies to other native or disease-associated PKMTs (Nacev et al., 2019).

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Gene (Homo sapiens)	Human SETD8 catalytic domain	Blum et al., 2014	Uniprot: Q9NQR1-1 (positions 232–393)	with an N-terminal 6 × His tag in pHIS2, for in vitro assays and crystallography of APO.
Gene (Homo sapiens)	Human SETD8 catalytic domain	Ma et al., 2014	Uniprot: Q9NQR1-1 (positions 232–393)	with an N-terminal 6 × His tag in pHIS2, for crystallography of BC-Inh2 and BC-SAM
Gene (Homo sapiens)	Human SETD8 catalytic domain	Addgene	Plasmid #51327	with an N-terminal 6 × His tag in pHIS2, for crystallography of BC-Inh1
Strain, strain background (E. coli)	Rosetta 2(DE3)	Novagen	#71400
Strain, strain background (E. coli)	BL21-CodonPlus(DE3)-RIL	Stratagene	#230245
Strain, strain background (E. coli)	BL21 (DE3) V2R-pRARE	SGC
Sequence-based reagent	Forward Primer for K382P	IDT		5'-CTATGGGGACCGCAGCCCGGCTTCCATTGAAGCCC-3'
Sequence-based reagent	Forward Primer for I293G	IDT		5'-CGGGGACCTCATCGAGGGCACCGACGCCAAGAAAC-3'
Sequence-based reagent	Forward Primer for E292G	IDT		5'-CGGGGACCTCATCGGCATCACCGACGCCAAG-3'
Peptide, recombinant protein	H4K20 peptide (10-30)	The Rockefeller University Proteomics Resource Center		NH₂-LGKGGAKRHRKVLRDNIQGIT-COOH
Chemical compound	SAM	Sigma Aldrich	#A2408
Chemical compound	[³H-Me]-SAM	PerkinElmer Life Sciences	#NET155001MC
Commercial assay or kit	UltimaGold	PerkinElmer Life Sciences	#6013327
Software, algorithm	Anaconda Python	Oliphant, 2007; Millman and Aivazis, 2011
Software, algorithm	ARP/wARP	Perrakis et al., 1997; Murshudov et al., 2011
Software, algorithm	AUTOBUSTER	Emsley et al., 2010
Software, algorithm	CCP4 suite	Collaborative Computational Project, Number 4, 1994
Software, algorithm	COOT	Emsley and Cowtan, 2004
Software, algorithm	Ensembler 1.0.5	Parton et al., 2016
Software, algorithm	Folding@home	Shirts and Pande, 2000
Software, algorithm	GRADE	Bruno et al., 2004
Software, algorithm	HKL2000	PMID: 27799103
Software, algorithm	IPython	Perez and Granger, 2007
Software, algorithm	Jupyter Notebook	DOI: 10.3233/978-1-61499-649-1-87
Software, algorithm	KinTek Explorer	Johnson et al., 2009
Software, algorithm	matplotlib 2.2.2	Hunter, 2007
Software, algorithm	MDTraj	McGibbon et al., 2015a
Software, algorithm	MOGUL	Langer et al., 2008
Software, algorithm	MolProbity	PMID: 20057044
Software, algorithm	MOLREP	PMID: 20057045
Software, algorithm	MSMBuilder	Harrigan et al., 2017
Software, algorithm	MSMExplorer 1.1	Harrigan et al., 2017
Software, algorithm	NumPy	https://www.numpy.org
Software, algorithm	OpenMM 6.6.1	Eastman et al., 2013
Software, algorithm	Origin 7.0	OriginLab
Software, algorithm	OriginPro 2018	OriginLab
Software, algorithm	pandas	https://conference.scipy.org/proceedings/scipy2010/pdfs/mckinney.pdf
Software, algorithm	PDBFixer 1.3	https://github.com/pandegroup/pdbfixer
Software, algorithm	PHASER	McCoy, 2007
Software, algorithm	phenix.refine	Adams et al., 2010; Afonine et al., 2012
Software, algorithm	POINTLESS/AIMLESS	Evans and Murshudov, 2013
Software, algorithm	PRODRG	Schüttelkopf and van Aalten, 2004
Software, algorithm	PyEMMA	Scherer et al., 2015
Software, algorithm	PyMOL 1.8.4	Schrödinger, LLC
Software, algorithm	REFMAC	Murshudov et al., 1997
Software, algorithm	seaborn 0.8.1	DOI: 10.5281/zenodo.883859
Software, algorithm	MODELLER 9.16	Sali and Blundell, 1993
Software, algorithm	XDS	Kabsch, 2010
Software, algorithm	XtalView	McRee, 1999

Share this article

Cite this article

Diverse SETD8 conformations captured in altered ligand-binding states.

Superposition of five crystal structures highlighted with detailed views of post-SET, SET-I, and pre-SET motifs.

Data collection and refinement statistics of crystallography.

Construction of conformational landscapes of apo- and SAM-bound SETD8 through diversely seeded, parallel molecular dynamics simulations and Markov state models.

Markov state models and conformational landscapes of apo- and SAM-bound SETD8.

Chord diagrams and representative conformers of macrostates.

Experimental design to probe the conformational landscape of SETD8.

Biochemical characterization of gain-of-function mutations revealed by conformational landscapes of SETD8.

Kinetic and thermodynamic constants of wild-type SETD8 and its rationally designed mutants.

Evaluation of key simulation parameters of molecular simulations.

Contour map of microstate coverage at various combinations of trajectory lengths and numbers as percentage of the maximal trajectory length and number of MD simulations.

Functional annotation of the dynamic conformational landscapes of SETD8.

Computational and experimental characterization of cancer-associated SETD8 mutants.

A 2 ms molecular dynamics trajectory simulated from the HMM of apo-SETD8.

A 2 ms molecular dynamics trajectory simulated from the HMM of SAM-bound SETD8.

Distribution of trajectory lengths of apo-SETD8 simulation.

VAMP-2 scoring results for optimal hyperparameter choice for featurization and tICA.

VAMP-2 scoring results for optimal hyperparameter choice for featurization and tICA.

VAMP-2 scoring results for optimal hyperparameter choice for tICA with dihedral features.

VAMP-2 scoring results for optimal hyperparameter choice for tICA with distance features.

VAMP-2 scoring results for final featurization and microstate number choice.

Implied timescales of the apo-SETD8 Bayesian Markov state models (BMSMs).

Chapman-Kolmogorov test of the apo-SETD8 Bayesian Markov state model (BMSM) using the metastable memberships from the Hidden Markov state model (HMM).

Implied timescales of the apo-SETD8 Bayesian Hidden Markov state models (BHMSMs).

Distribution of the SAM-bound SETD8 simulation trajectory lengths.

Implied timescales of the SAM-bound SETD8 Bayesian Markov state models (BMSMs).

Chapman-Kolmogorov test of the SAM-bound SETD8 Bayesian Markov state model (BMSM) using the metastable memberships from the Hidden Markov state model (HMM).

Implied timescales of the SAM-bound SETD8 Bayesian Hidden Markov state models (BHMSMs).

Comparison of the kinetic complexity of apo- and SAM-bound SETD8.

Rapid-mixing stopped-flow dilution of SAM-bound SETD8.

1H-NMR of MS4138 (Inh1).

13C-NMR of MS4138 (Inh1).

1H-NMR of SGSS05N.

13C-NMR of SGSS05N.

1H-NMR of SGSS05NS (Inh2).

13C-NMR of SGSS05NS (Inh2).

Global fitting analysis of stopped-flow binding experiment into a conformational selection model.

Comparison of macrostate escape kinetics for different His351 tautomers of apo-SETD8.

Distribution of the 24 mutant apo-SETD8 simulation trajectory lengths.

Rate of discovery of new microstates in wild-type (WT) and mutant apo-SETD8 datasets.

Author details

Shi Chen

Contribution

Contributed equally with

Competing interests

Rafal P Wiewiora

Contribution

Contributed equally with

Competing interests

Fanwang Meng

Contribution

Competing interests

Nicolas Babault

Contribution

Competing interests

Anqi Ma

Contribution

Competing interests

Wenyu Yu

Contribution

Competing interests

Kun Qian

Contribution

Competing interests

Hao Hu

Contribution

Competing interests

Hua Zou

Contribution

Competing interests

Junyi Wang

Contribution

Competing interests

Shijie Fan

Contribution

Competing interests

Gil Blum

Contribution

¹H-NMR of MS4138 (Inh1).

¹³C-NMR of MS4138 (Inh1).

¹H-NMR of SGSS05N.

¹³C-NMR of SGSS05N.

¹H-NMR of SGSS05NS (Inh2).

¹³C-NMR of SGSS05NS (Inh2).