Introduction

Supervised molecular dynamics1,2 (SuMD) is an efficient technique for studying ligand-receptor binding and unbinding pathways; here, we present the multiple walker enhancement (mwSuMD) to study a significantly wider range of structural transitions relevant to the drug design. We validated the method by applying it to G protein-coupled receptors (GPCRs), since their inherent flexibility is essential to their function and because these are the most abundant family of membrane receptors in eukaryotes3 and the target for more than one-third of drugs approved for human use4.

Vertebrate GPCRs are subdivided into five subfamilies (Rhodopsin or class A, Secretin or class B, Glutamate or class C, Adhesion, and Frizzled/Taste2) according to function and sequence5,6. X-ray and cryo-electron microscopy (cryo-EM) show that GPCRs possess seven transmembrane (TM) helices connected by three extracellular loops (ECLs) and three intracellular loops (ICLs), with an extended and structured N-terminal extracellular domain (ECD) in all subtypes, but class A. The primary function of GPCRs is transducing extracellular chemical signals into the cytosol by binding and activating four G protein families (Gs/olf, Gi/o, G12/13, and Gq/11) responsible for decreasing (Gi/o) or increasing (Gs/olf) cyclic adenosine-3’,5’-monophosphate (cAMP), and generating inositol-1,4,5-triphosphate (IP3) and diacylglycerol (DAG) to increase Ca2+ intracellular levels (Gq)7.

Many aspects of GPCR pharmacology remain elusive: for example, the structural determinants of the selectivity displayed towards specific G proteins or the ability of certain agonists to drive a preferred intracellular signaling pathway over others (i.e. functional selectivity or bias)8. GPCRs are challenging proteins to characterize experimentally due to their inherent flexibility and the transitory nature of the complexes formed with extracellular and intracellular effectors. Importantly, agonists can allosterically modify the receptor selectivity profile by imprinting unique intracellular conformations from the orthosteric binding site. The mechanism behind these phenomena is one of the outstanding questions in the GPCR field9.

Molecular dynamics (MD) is a powerful computational methodology that predicts the movement and interactions of (bio)molecules in systems of variable complexity, at atomic detail. However, classic MD sampling is limited to the microsecond or, in the best conditions, the millisecond time scale10,11. For this reason, different algorithms have been designed to speed up the simulation of rare events such as ligand (un)binding and conformational transitions. Amongst the most popular and effective ones, are metadynamics12, accelerated MD (aMD)13, and Gaussian-accelerated MD (GaMD)14, which introduce an energy potential to overcome the energy barriers preventing the complete exploration of the free energy surface, thus de facto bias the simulation. Energetically unbiased MD protocols, on the other hand, comprise weighted ensemble MD (weMD)15, swarms approach16, AdaptiveGoal17, and SuMD1,18, which have largely been applied to (unbinding) small molecules, peptides, and small proteins1,1822. Since SuMD is optimized for (un)bindings, we have designed mwSuMD to address more complex conformational transitions and protein-protein associations. GPCRs preferentially couple to very few G proteins out of 23 possible counterparts9,23. It is increasingly accepted that dynamic and transient interactions determine whether the encounter between a GPCR and a G protein results in productive or unproductive coupling24. MD simulations are considered a useful orthogonal tool for providing working hypotheses and rationalizing existing data on G protein selectivity. However, so far, it has not delivered as expected. Attempts have usually employed energetically biased simulations, have been confined to the Gα subunit, or considered a pre-formed GPCR:G protein complex23,2527.

Firstly, we validated mwSuMD on the nonapeptide arginine vasopressin (AVP) by simulating binding (dynamic docking) and unbinding paths from the vasopressin 2 receptor (V2R). Dynamic docking, although more computationally demanding than standard molecular docking, provides insights into the binding mode of ligands in a fully hydrated and flexible environment. Moreover, it informs about binding paths and the complete mechanism of formation leading to an intermolecular complex, delivering in the context of binding kinetics28 and structure-kinetics relationship (SKR) studies29. We then studied the class B1 GPCR glucagon-like peptide-1 receptor (GLP-1R) activation by the small molecule PF06882961. GLP-1R is a validated target in type 2 diabetes and probably the best-characterized class B1 GPCR from a structural perspective. GLP-1R is the only class B1 receptor with structurally characterized non-peptidic orthosteric agonists, which makes it a model system for studying the druggability of the entire B1 subfamily. After GLP-1R agonist binding and activation, the coupling of Gs and the release of GDP, the rate-limiting step of the G protein activation, was simulated for the first time using an energy-unbiased method.

These results demonstrate the usefulness of mwSuMD for illuminating the molecular events involved in GPCR function.

Results and discussion

Short mwSuMD time windows improve the AVP dynamic docking prediction

Arginine vasopressin (AVP) is an endogenous hormone (Figure S1a) that mediates antidiuretic effects on the kidney by signaling through three class A GPCR subtypes: V1a and V1b receptors activate phospholipases via Gq/11, while the V2 receptor (V2R) activates adenylyl cyclase by interacting with Gs 30 and is a therapeutic target for hyponatremia, hypertension, and incontinence31. AVP is amphipathic and in the bound state interacts with both polar and hydrophobic V2R residues located on TM helices and ECLs (Figure S1b). Although AVP presents an intramolecular C1-C6 disulfide bond that limits the backbone’s overall conformational flexibility, it has many rotatable bonds, making dynamic docking complicated32. We compared the performance of mwSuMD to the parent algorithm SuMD in reconstructing the experimental V2R:AVP complex using different settings, simulating a total of 92 binding events (Table S1). As a reference, the AVP RMSD during a classic (unsupervised) equilibrium MD simulation of the X-ray AVP:V2R complex was 3.80 ± 0.52 Å (Figure S2). SuMD1,18 produced a minimum root mean square deviation (RMSD) to the cryo-EM complex of 4.28 Å, with most of the replicas (i.e., distribution’s mode) having an RMSD close to 10 Å (Figure S3a). mwSuMD, with the same settings (Figure S3b, Table S1) in terms of time window duration (600 ps), metric supervised (the distance between AVP and V2R), and acceptance method (slope) produced slightly more precise results (i.e., distribution’s mode RMSD = 7.90 Å) but similar accuracy (minimum RMSD = 4.60). Supervising the AVP RMSD to the experimental complex rather than the distance (Figure S3c) and using the SMscore (Equation 1) as the acceptance method (Figure S3d) worsened the prediction. Supervising distance and RMSD at the same time (Figure S3e), employing the DMscore (Equation 2), recovered accuracy (minimum RMSD = 4.60 Å) but not precision (distribution mode RMSD = 12.40 Å). Interestingly, decreasing the time window duration from 600 ps to 100 ps impaired the SuMD ability to predict the experimental complex (Figure 1a), but enhanced mwSuMD accuracy and precision (Figure 1b-d). The combination of RMSD as the supervised metric and SMscore produced the best results in terms of minimum RMSD and distribution mode RMSD, 3.85 Å and 4.40 Å, respectively (Figure 1d, Video S1), in agreement with the AVP deviations in the equilibrium MD simulation of the X-ray AVP:V2R complex (Figure S2). These results confirm the inherent complexity of reproducing the AVP:V2R complex via dynamic docking and suggest that short time windows can improve mwSuMD performance on this system. However, it is necessary to know the final bound state to employ the RMSD as the supervised metric, while the distance is required to dynamically dock ligands with unknown bound conformation as previously reported1,23. Both distance and RMSD-based simulations delivered insights into the binding path and the residues involved along the recognition route. For example, mwSuMD pinpointed V2R residues E184ECL2, P298ECL3, and E303ECL3 (Figure S4a) as involved during AVP binding, although not in contact with the ligand in the orthosteric complex. None of them are yet characterized through mutagenesis studies according to the GPCRdb33.

AVP SuMD and mwSuMD binding simulations to V2R (100 ps time windows).

For each set of settings (a-d) the RMSD of AVP Cα atoms to the cryo-EM structure 7DW9 is reported during the time course of each SuMD (a) or mwSuMD (b-d) replica alongside the RMSD values distribution and the snapshot corresponding to the lowest RMSD values (AVP from the cryo-EM structure 7DW9 is in a cyan stick representation, while AVP from simulations is in a tan stick representation). A complete description of the simulation settings is reported in Table S1 and the Methods section. The dashed red line indicates the AVP RMSD during a classic (unsupervised) equilibrium MD simulation of the X-ray AVP:V2R complex (Figure S2).

Further to binding, a SuMD approach was previously employed to reconstruct the unbinding path of ligands from several GPCRs2,34. We assessed mwSuMD’s capability to simulate AVP unbinding from V2R. Five mwSuMD and five SuMD replicas were collected using 100 ps time windows (Table S1). Overall, mwSuMD outperformed SuMD in terms of time required to complete a dissociation (Figure S5, Video S2), producing dissociation paths almost 10 times faster than SuMD. Such rapidity in dissociating inherently produces a limited sampling of metastable states along the pathway, which can be compensated by seeding classic (unsupervised) MD simulations from configurations extracted from the unbinding pathway35,36. Here, the V2R residues involved during the dissociation were comparable to the binding (Figure S4b), although ECL2 and ECL3 were slightly more involved during the association than the dissociation, in analogy with other class A and B GPCRs20,35.

PF06882961 binding and GLP-1R activation

The GLP-1R has been captured by cryo-electron microscopy (cryo-EM) in both the inactive apo (ligand-free) and the active (Gs-bound) conformations and in complex with either peptide or non-peptide agonists3742. In the inactive apo GLP-1R, residues forming the binding site for the non-peptide agonist PF06882961 are dislocated and scattered due to the structural reorganization of the transmembrane domain (TMD) and extracellular domain (ECD) (Figure S6) that occurs on activation. Moreover, GLP-1R in complex with GLP-1 or different agonists present distinct structural features, even amongst structurally related ligands (Figure S7). This complicates the scenario and suggests divergent recognition mechanisms amongst different agonists. We simulated the binding of PF06882961 using multistep supervision on different system metrics (Figure 2) to model the structural hallmark of GLP-1R activation (Video S5, Video S6).

mwSuMD simulation of PF06882961 binding to GLP-1R and receptor activation.

Each panel reports the root mean square deviation (RMSD) to the position of the ligand in the active state (top panel) or a GLP-1R structural element, over the time course (all but ECL3 converging to the active state). ECD: extracellular domain; TM: transmembrane helix; ECL: extracellular loop. The mwSuMD simulation was performed with four different settings over 1 microsecond in total. The red dashed lines show the initial RMSD value for reference.

Several metrics were supervised consecutively. Firstly, the distance between PF06882961 and the TMD as well as the RMSD of the ECD to the active state (stage 1); secondly, the RMSD of ECD and ECL1 to the active state (stage 2); thirdly, the RMSD of PF06882961 and ECL3 to the active state (stage 3); lastly, only the RMSD of TM6 (residues I345-F367, Cα atoms) to the active state (stage 4). The combination of these supervisions produced a conformational transition of GLP-1R towards the active state (Figure 2, Video S6). Noteworthy, mwSuMD like any other CV-based technique requires some knowledge of the simulated system. The sequence of these supervisions was arbitrary and does not necessarily reflect the right order of the steps involved in GLP-1R activation. This kind of planned multistep approach is feasible when the end-point receptor inactive and active structures are available, and the inherent flexibility of different domains is known. In class B GPCRs, the ECD is the most dynamic sub-structure, followed by the ECL1 and ECL3 which display high plasticity during ligand binding20,43. For this reason, we first supervised these elements of GLP-1R, leaving the bottleneck of activation, TM6 outward movement, as the last step. However, the protocol employed can be tweaked to study how each conformational transition takes place and influences the receptor domains. Structural elements not directly supervised, such as TM1 or TM7, were influenced by the movement of supervised helixes or loops and therefore displayed an RMSD reduction to the active state. For example, the supervision of ECL3 (stage 3) and TM6 (stage 4) facilitated the spontaneous rearrangement of the ECD to an active-like conformation after the ECD had previously experienced transient high flexibility during stages 2 and 3 (Figure 2). These results suggest a concerted conformational transition for ECD and ECL1 during the binding of PF06882961 and an allosteric effect between ECL3 and the bottom of TM6. While the intracellular polar interactions were destabilized by the ECL3 transition to an active-like conformation (stages 2 and 3), the outward movement of TM6 (stage 4) did not favor the closure of ECL3 towards PF06882961, which appears to be driven by direct interactions between the ligand and R3105.40 or R3807.35. Interestingly, the mwSuMD simulation during Stage 4 (TM6 supervision) sampled a counterclockwise helix rotation (Figure S8a) consistent with the GLP-1R cryo-EM structures in the active, Gs-coupled state40,44.

It is worth noting that 6LN2, the only inactive GLP-1R structure available complete with the ECD, was stabilized and solved with an antibody bound to the ECD. This strategy might have forced the ECD into a closed conformation that engages the EC vestibule of GLP-1R and possibly restrained the whole TMD in an altered conformation that deviates from the physiological conditions. This might explain why the RMSD of the TMD elements monitored during the simulation rarely reach values lower than 3 or 4 Å.

During the supervision of ECL3 and PF06882961 (stage 3), we observed a loosening of the intracellular polar interactions that stabilize GLP-1R TM6 in the inactive state. As a result, the subsequent supervision of TM6 (residues I345-F367, Cα atoms) rapidly produced the outward movement of TM6 towards the active state, in the last step of the mwSuMD simulation (stage 4). A more detailed analysis revealed that the central polar network, which is pivotal for mediating GLP-1 signalling45, and the residues at the TM6 kink level adopted active-like conformations during the final stage of the simulation (Figure 3a). In particular, the central polar network (E3646.53, H3636.52, and Q3947.49) experienced side chain rearrangements (Figure 3b) while extensive TM6 kink dislocation occurred at L3606.49, P3586.48, L3576.47 and I3576.46 (Figure 3c).

GLP-1R key structural motifs during mwSuMD GLP-1R activation.

a) RMSD to the active state GLP-1R (7LCI) of the residues forming the central polar network (magenta) and TM6 kink (orange) during mwSuMD of receptor activation; at the end of the simulations minimum values were reached (dashed square). b) The position of the polar network within the core of TMD (left-hand panel) and comparison between inactive, active and mwSuMD final states for the side chains of the residues forming of the polar network; c) The position of the TM6 kink (right-hand panel) and comparison between inactive, active and mwSuMD final states for the side chains of the residues forming the TM6 kink.

Gs protein binding to GLP-1R and GDP release

We then focused on simulating the Gs binding to GLP1-R, after activation, without energy input. For this purpose, we first tested the binding between the prototypical class A receptor β2 adrenoreceptor (β2 AR), and the stimulatory G protein (Gs) (Video S3, Figure S9a,b) by supervising the distance between Gs helix 5 (α5) and β2AR as well as the RMSD of the intracellular end of TM6 to the fully active state of the receptor (see Supplementary Methods). During two out of three replicas, both Gα and Gβ achieved distance values close to 5 Å (minimum RMSD = 3.94 Å and 3.96 Å respectively), in good agreement with the reference (the β2 AR:Gs complex, PDB 3SN6, Figure S9c). A possible pitfall is that G proteins bear potential palmitoylation and myristoylation sites that anchor the inactive trimer to the plasma membrane46,47, de facto restraining possible binding paths to the receptor. To address this point and test different conditions, we prepared the adenosine A1 receptor (A1R), and its principal effector, the inhibitory G protein (Gi) considering the G residue C3 and Gγ residue C65 as palmitoylated and geranylgeranylated respectively and hence inserted in the membrane. Recently, the Gi binding to A1R was simulated by combining the biased methods GaMD with SuMD48 but without considering membrane-anchoring post-translational modifications. Both classic (unsupervised) and mwSuMD simulations were performed on this system for comparison (Video S4, Figure S9d). In about 50 ns of mwSuMD, the G subunit engaged its intracellular binding site on A1R and formed a complex in good agreement with the cryo-EM structure (PDB 6D9H, RMSD β 5 Å). For comparison, 1 μs of cMD did not produce a productive engagement as the G remained at RMSD values > 40 Å (Figure S9d), suggesting the effectiveness of mwSuMD in sampling G protein binding rare events without the input of energy. The membrane anchoring affected the overall Gi binding and the final complex, which was rotated compared to the experimental structure due to the lipidation of G and Gγ (Figure S9e).

Encouraged by results obtained on Gs and Gi binding to β2 AR and A1R (Figure S9), we extracted the GLP-1R active conformation described above and simulated the Gs binding to its intracellular side. Starting from the inactive, membrane-anchored Gs, we performed three independent mwSuMD replicas by supervising the distance between Gs helix α5 and GLP-1R residues located at the intracellular binding interface (Figure 4a,e). All three mwSuMD replicas showed the Gs approaching GLP-1R, with two out of three reaching an RMSD of the G subunit close to or less than 10 Å, compared to the experimental complex 7LCI (Figure 4e). Replica 2, in particular, well reproduced the cryo-EM GLP-1R:Gs complex as suggested by RMSDs to 7LCI of 7.59 ± 1.58 Å, 12.15 ± 2.13 Å, and 13.73 ± 2.24 Å for Gα, Gβ and Gγ respectively. Such values are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-149 (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R.

GLP-1R activation and Gs binding.

a-d) sequence of simulated events during the mwSuMD Gs:GLP-1R simulations. e) RMSD of Gsα to the experimental GLP-1R:Gs complex (PDB 7LCJ) during three mwSuMD replicas; the RMSD to the experiential bound conformation (7LCJ) during the second part of Replica2 (red dashed line) is reported for each Gs subunit. RMSDs were computed on Gα residues 11 to 43 and 205 to 394 to the experimental structure 7LCI after superimposition on GLP-1R residues 140-240 Cα atoms. f) RMSD of the αG-α4 loop (purple), the distance between K3426.31 and D323 (salmon), and the distance between GDP and D295 (green) during Gs binding, AHD opening and GDP dissociation; g) and h) comparison between states extracted from before and after AHD opening. Before AHD opening (a), GLP-1R ICL3 interacted with D323 and D295 interacted with GDP; after AHD opening (b) and αG-α4 loop reorganization (curved red arrow), αG and D295 moved away from GDP (straight red arrow), destabilizing its binding to Gs.

According to the model of G protein activation, the G protein binding has the effect of allosterically stabilizing the orthosteric agonist in complex with the receptor49 and destabilizing the guanosine-diphosphate (GDP) bound to Gα, triggering its release and exchange with guanosine-triphosphate (GTP)50, upon opening of the G protein alpha-helical domain (AHD). Following this model, PF06882961 and GDP were respectively stabilized and destabilized during the simulated Gs association (Figure S8b,c). The analysis of atomic contacts along the binding path of Gs to GLP-1R highlights a few persistent interactions not observed in the equilibrium MD simulations of GLP-1R:Gs cryo-EM complexes49; for example, we propose the hidden interaction between D3446.33 and R385α5 to be important for Gs coupling (Table S2), which would explain the GLP-1 EC50 reduction upon mutation of position 344 to Ala51.

Extending Replica 2, we further investigated the Gs activation mechanism by supervising the opening of the Gs alpha-helical domain (AHD), which is considered a necessary step to allow GDP release from the Ras-like domain52. We first easily obtained the opening of AHD (Figure 4b) and successively supervised the GDP unbinding in a further three replicas, seeded after the AHD opening. In one of these three mwSuMD simulations, the nucleotide dissociated from Gs (Figure 4d). Video S7 shows the full Gs binding, AHD opening, and GDP release. mwSuMD suggested several structural changes as implicated in GDP dissociation (Figure 4f-h): i) the AHD opening; ii) the conformational change of αG-α4 loop (residues A303-P332), in concert with a loosening of interactions between D323 and K3426.31, and iii) the rupture of the hydrogen bond between GDP and D295 triggered by the movement of αG away from the GDP binding site (Figure 4f). The involvement of αG through the αG-α4 during the release of GDP from Gs is supported by hydrogen/deuterium exchange experiments53, while the role of D275 has been probed with functional assays on the Gi isoform mutant D272GαiA54. Moreover, a similar αG behavior was recently suggested by MD simulations of the Gs binding pathway to β2AR55. We also note that the αG-α4 loop length and aminoacidic composition diverge among G protein isoforms, further suggesting a role in G protein selectivity consistent with the hypothesis that, in different G proteins, distinct domains of the Gα subunit could be responsible for receptor selectivity56. However, a more subtle allosteric communication through internal structural elements like the β2-β3 strands, prompted by α5 tilting, could have weakened GDP phosphate binding as previously suggested by other groups5759. Interestingly, no significant conformational changes of the β6-α5 loop happened before or during the GDP dissociation, suggesting that its conformational change as captured in the nucleotide-free GLP-1R:Gs complex (Figure S10) occurs after the GDP release as a result of the loss of binding stabilization, rather than being an initiator of the GDP dissociation.

Discussion

Classic MD simulations sample phase space with an efficiency that depends on the energy barrier between neighboring minima. Processes like (un)binding and protein activation require the system to overcome numerous energy barriers, some creating bottlenecks that slow the transition down to the millisecond, or second, time scale. To reduce some of these limits, we have developed, and tested on complex structural events characterizing GPCRs, an energetically unbiased adaptive sampling algorithm, namely multiple walker SuMD, which is based on traditional SuMD, while drawing on parallel multiple replica methods60,61,

Our simulations propose that remarkable predictivity can be obtained with distance-driven mwSuMD, as demonstrated by the lowest deviation from the experimental AVP:V2R complex. The dissociation of AVP from V2R was simulated much more rapidly by mwSuMD than by SuMD, suggesting it is an efficient tool for studying the dissociation of ligands from GPCRs. This is due to the more extensive sampling obtainable by seeding multiple parallel short simulations instead of a single simulation for batch.

mwSuMD performed similarly to SuMD for the dynamic docking of AVP to V2R when time windows of 600 ps were employed. Time windows of 100 ps remarkably improved mwSuMD. Usually, dynamic docking is performed to either predict the geometry of complexes or sample the binding path of an already-known intermolecular complex, or both. The RMSD of AVP to the experimental coordinates as the supervised metric produced the best results. Consequently, the RMSD should be the metric of choice to study the binding path of well-known intermolecular complexes. The distance, on the other hand, is necessary when limited structural information about the binding mode is available. In the absence of structural information regarding the final bound state, it is possible to sample numerous binding events employing mwSuMD and evaluate the final bound states rank by applying end-point free energy binding methods like the molecular mechanics energies combined with the Poisson– Boltzmann or generalized Born and surface area continuum solvation (MM/PBSA and MM/GBSA62) models.

We increased the complexity of binding simulations by considering GLP-1R and the non-peptide agonist PF06882961. Using mwSuMD, we obtained a binding of PF06882961in good agreement with the cryo-EM structure, followed by an active-like conformational transition of GLP-1R. The choice of the metrics supervised was driven by the structural data available40 and extensive preparatory MD simulations. However, binding routes are possible from either the bulk solvent or the membrane35,63,64. These results show the power of the mwSuMD method, indicating that future applications could include ligand binding from the membrane, or alternative apo receptor conformations to improve the sampling for more difficult receptors.

mwSuMD enabled us to simulate the Gs binding to the active GLP-1R and the subsequent GDP release. Our results suggest a concerted effect on GDP binding produced by AHD opening, αG-α4 loop rearrangement, and αG shift away from the GDP site. The full rotation and elongation of α5 as in cryo-EM structures would occur after the GDP release, supporting the role of hidden, metastable interactions as the driving force of G protein coupling and selectivity, as per recent work on GLP-1R51.

We stress that a complete understanding of a complex molecular event like G protein coupling requires the collection of numerous G protein binding paths and GDP dissociation events. mwSuMD is designed to yield a mechanistic description of structural events. For this reason, it integrates well with mutagenesis and kinetics experiments. We note that mwSuMD trajectories, since they describe the sequential states along a transition pathway, can represent a precious backbone for further MD sampling aimed at quantifying or predicting the kinetics of the transition. Approaches such as path collective variable metadynamics65, Markov State Models66, or machine learning models67,68 can be informed by mwSuMD. We will address these novel opportunities created by mwSuMD in future work.

In summary, we showcased the applicability domain of mwSuMD to key, but scarcely understood, aspects of GPCR structural biology, pharmacology, and drug design hitherto unaddressed by unbiased simulations. Given the generality and simplicity of its implementation, we anticipate that mwSuMD can be employed to study a wide range of structural phenomena characterizing both membrane and cytosolic proteins. mwSuMD is an ongoing project updated on https://github.com/pipitoludovico/mwSuMD_multiEngine and in the recent implementation can exploit ACEMD69, NAMD70, GROMACS71 and OPENMM72 as graphic processing units (GPU)-based MD engines while offering the option to run also on CPUs with GROMACS.

Methods

Multiple walker SuMD (mwSuMD) protocol

The supervised MD (SuMD) is an adaptive sampling method73 for speeding up the simulation of binding events between small molecules (or peptides74,75) and proteins1,18 without the introduction of any energetic bias. Briefly, during SuMD, a series of short unbiased MD simulations are performed, and after each simulation, the distances between the centers of mass (or the geometrical centers) of the ligand and the predicted binding site (collected at regular time intervals) are fitted to a linear function. If the resulting slope is negative (showing progress towards the target) the next simulation step starts from the last set of coordinates and velocities, otherwise, the simulation is restarted by randomly assigning the atomic velocities.

mwSuMD is designed to increase the sampling from a specific configuration by seeding user-decided parallel replicas (walkers) rather than one short simulation as per SuMD. Since one replica for each batch of walkers is always considered productive, mwSuMD gives more control than SuMD on the total wall-clock time used for a simulation. On the flip side, to maximise mwSuMD it is optimal to assign one walker per GPU, requiring multiple GPUs to be effective. However, modern multi-threaded GPUs can still employ mwSuMD with a smaller cost in GPU performance. In the implementation for ACEMD used in this work, mwSuMD needs as input the initial coordinates of the system as a pdb file, the coordinates, and the atomic velocities of the system from the equilibration stage, the topology file of the system, and all the necessary force field parameters. The user can decide to supervise one (X) or two metrics (X’, X’’) of the simulated system over short simulations seeded in batches, called walkers. In the former case, either the slope of the linear function interpolating the metric values or a score can be adopted to decide whether to continue the mwSuMD simulation. When the user decides to supervise two metrics, then a specific score is used. In the present work, distances between centroids, RMSDs, or the number of atomic contacts between two selections were supervised (Table S1). The choice of the metrics is system and problem-dependent, as the RMSD is most useful when the final state is known, while the distance is required when the target state is unknown; details on the scores are given below. The decision to restart or continue mwSuMD after any short simulation is postponed until all the walkers of a batch are collected. The best short simulation is selected and extended by seeding the same number of walkers, with the same duration as the step before.

For each walker, the score for the supervision of a single metric (SMscore) is computed as the square root of the product between the metric value in the last frame (Xlast frame) and the average metric value over the short simulation (X̅):

If the metric is set to decrease (e.g. binding or dimerization) the walker with the lowest SMscore is continued, otherwise (e.g. unbinding or outwards opening of domains), the walker with the highest score is continued. Using the SMscore rather than the slope should give more weight to the final state of each short simulation, as it is the starting point for the successive batch of simulations. Considering the average of the metric should favor short simulations consistently evolving in the desired direction along the metric.

If both X’ and X’’ are set to increase during the mwSuMD simulations, the score for the supervision of two metrics (DMscore) on each walker is computed as follows:

Where X’last frame and X’’last frame are the metrics values in the last frame, while X̅batch walkers and X̅’’batch walkers represent the average value of the two metrics over all the walkers in the batch. Subtracting the value 1 to the metric ratio ensures that if one of the two metrics from the last frame (X’last frame or X’’last frame) is equal to the average (X̅’’batch walkers or X̅’’batch walkers) then that metric addend is null and DMscore depends only on the remaining metric. If any of the two metrics is set to decrease, then the corresponding component in Equation 2 is multiplied by -1 to maintain a positive score. Considering the average value of the two metrics over all the walkers rather than only over the considered walker should be more representative of the system evolution along the defined metric. In other words, the information about the metric is taken from all the walkers to better describe the evolution of the system.

The DMScore is designed to preserve some degree of independence between the two metrics supervised. Indeed, if the variation of one of them slows down and gets close to zero, the other metric is still able to drive the system’s evolution. It should be noted that DMScore works at its best if the two metrics have similar variations over time, as in the case of distance and RMSD (both of which are distance-based). Notably, differently from SuMD, when a walker is extended by seeding a new batch of short simulations and the remaining walkers are stopped, the atomic velocities are not reassigned. This allows the simulations to be as short as a few picoseconds if desired, without introducing artifacts due to the thermostat latency to reach the target temperature (usually up to 10-20 ps when a simulation is restarted reassigning the velocities of the atoms).

The current implementation of mwSuMD is for python3 and exploits MDAnalysis76 and MDTRaj77 modules.

Force field, ligands parameters, and general systems preparation

The CHARMM3678,79/CGenFF 3.0.18082 force field combination was employed in this work. Initial ligand force field, topology and parameter files were obtained from the ParamChem webserver80. Restrained electrostatic potential (RESP)83 partial charges were assigned to all the non-peptidic small molecules but adrenaline and guanosine-5’-diphosphate (GDP) using Gaussian09 (HF/6-31G* level of theory) and AmberTools20.

Six systems were prepared for MD (Table S1). Hydrogen atoms were added using the pdb2pqr84 and propka85 software (considering a simulated pH of 7.0); the protonation of titratable side chains was checked by visual inspection. The resulting receptors were separately inserted in a 1-palmitoyl-2-oleyl-sn-glycerol-3-phosphocholine (POPC) bilayer (previously built by using the VMD Membrane Builder plugin 1.1, Membrane Plugin, Version 1.1. at: http://www.ks.uiuc.edu/Research/vmd/plugins/membrane/), through an insertion method86. Receptor orientation was obtained by superposing the coordinates on the corresponding structure retrieved from the OPM database87. Lipids overlapping the receptor transmembrane helical bundle were removed and TIP3P water molecules88 were added to the simulation box employing the VMD Solvate plugin 1.5 (Solvate Plugin, Version 1.5. at <http://www.ks.uiuc.edu/Research/vmd/plugins/solvate/). Finally, overall charge neutrality was reached by adding Na+/Cl- counter ions up to the final concentration of 0.150 M), using the VMD Autoionize plugin 1.3 (Autoionize Plugin, Version 1.3. at <http://www.ks.uiuc.edu/Research/vmd/plugins/autoionize/).

System equilibration and general MD settings

The MD engine ACEMD369 was employed for both the equilibration and productive simulations. The equilibration was achieved in isothermal-isobaric conditions (NPT) using the Berendsen barostat89 (target pressure 1 atm) and the Langevin thermostat90 (target temperature 300 K) with low damping of 1 ps-1. For the equilibration (integration time step of 2 fs): first, clashes between protein and lipid atoms were reduced through 1500 conjugate-gradient minimization steps, then a positional constraint of 1 kcal mol-1 Å-2 on all heavy atoms was gradually released over different time windows: 2 ns for lipid phosphorus atoms, 60 ns for protein atoms other than alpha carbon atoms, 80 ns for alpha carbon atoms; a further 20 ns of equilibration was performed without any positional constraints.

Productive trajectories (Table S1) were computed with an integration time step of 4 fs in the canonical ensemble (NVT). The target temperature was set at 300 K, using a thermostat damping of 0.1 ps-1; the M-SHAKE algorithm91,92 was employed to constrain the bond lengths involving hydrogen atoms. The cut-off distance for electrostatic interactions was set at 9 Å, with a switching function applied beyond 7.5 Å. Long-range Coulomb interactions were handled using the particle mesh Ewald summation method (PME)93 by setting the mesh spacing to 1.0 Å.

Vasopressin binding simulations

The vasopressin 2 receptor (V2R) in complex with vasopressin (AVP) and the Gs protein94 was retrieved from the Protein Data Bank95 (PDB 7DW9). The Gs was removed from the system and the missing residues on ECL2 (G185-G189) were modeled from scratch using Modeller 9.1996, considering the solution with the lowest DOPE score out of ten conformations produced. AVP was placed away from V2R in the extracellular bulk and the resulting system was prepared for MD simulations and equilibrated as reported above.

During SuMD simulations, the distance between the centroids of AVP residues C1-Q4 (backbone and side chains), anticipated to bind deep into V2R, and the V2R residues lining the peptide binding site Q96, Q174, Q291, and L312 (Cα atoms only) was supervised over time windows of 600 ps or 100 ps (Table S1).mwSuMD simulations considered the same distance, the RMSD of AVP residues C1-Q4 to the experimental bound complex or the combination of the two during time windows of 600 ps (3 walkers) or 100 ps (10 walkers) (Table S1). Slope, SMscore, or DMscore (see Methods section mwSuMD protocol) was used in the different mwSuMD replicas performed (Table S1). Simulations were stopped after 300 ns (time window duration = 600 ps) or 50 ns (time window duration = 100 ps) of total SuMD or mwSuMD simulation time.

Vasopressin unbinding simulations

The V2R:AVP complex was prepared for MD simulations and equilibrated as reported above. During both SuMD and mwSuMD simulations (Table S1), the distance between the centroids of AVP residues C1-Q4 (backbone and side chains) and V2R residues Q96, Q174, Q291, and L312 (Cα atoms only) was supervised over time windows of 100 ps (10 walkers seeded for mwSuMD simulations). Replicas were stopped when the AVP-V2R distance reached 40 Å.

GLP-1R:PF06882961 binding simulations

The inactive, ligand-free glucagon-like peptide receptor (GLP-1R) was retrieved from the Protein Data Bank95 (PDB 6LN2)97. Missing residues in the stalk (129-134) and ICL2 (256 to 263) were modeled with Modeller 9.19, considering the solutions with the lowest DOPE score out of ten conformations produced. The PF06882961 initial conformation was extracted from the complex with the fully active GLP-1R44 (PDB 7LCJ) and placed away from GLP-1R in the extracellular bulk. The resulting system was prepared for MD simulations and equilibrated as reported above. CGenFF dihedral force field parameters of PF06882961 with the highest penalties (dihedrals NG2R51-CG321-CG3C41-CG3C41 (penalty=143.5) and NG2R51-CG321-CG3C41-OG3C51 (penalty=152.4)) were optimized (Figure S11) employing Gaussian09 (geometric optimization and dihedral scan at HF/6-31g(d) level of theory) and the VMD force field toolkit plugin98.

Four classic MD replicas, for a total of 8 μs, were performed on the inactive, ligand-free receptor (prepared for MD simulations and equilibrated as reported above) to assess the possible binding path to the receptor TMD and therefore decide the initial position of PF06882961 in the extracellular bulk of the simulation box. A visual inspection of the trajectories suggested three major conformational changes that could allow ligand access to the TMD (Figure S12). Transitory openings of the ECD (distance Q47ECD - S310ECL2), TM6-TM7 (distance H3636.52 - F3907.45), and TM1-ECL1 (distance E1381.33 and W214ECL1) were observed. Since the opening of TM1-ECL1 was observed in two replicas out of four, we placed the ligand in a favorable position for crossing that region of GLP-1R.

mwSuMD simulations (Table S1) were performed stepwise to dock the ligand within GLP-1R first and then relax the receptor towards the active state. The PF06882961 binding was obtained by supervising at the same time the distance between the ligand’s heavy atoms centroid and the centroid of GLP-1R TM7 residues L3797.34-F3817.36 (Cα atoms only), which are part of the orthosteric site, and the RMSD of the ECD (residues W33ECD-W120ECD, Cα atoms only) to the active state (PDB 7LCJ) until the former distance reached 4 Å. In the second phase of mwSuMD, the RMSD of the ECD (residues W33ECD-W120ECD, Cα atoms only) and the ECL1 to the active state (PDB 7LCJ, Cα atoms of residues M2042.74-L2243.27) were supervised until the latter reached less than 4 Å. During the third phase, the RMSD of PF06882961, as well as the RMSD of ECL3 (residues A3686.57-T3787.33, Cα atoms), were supervised until the former reached values lower than 3 Å. In the last mwSuMD step, only the RMSD of TM6 (residues I3456.34-F3676.56, Cα atoms) to the active state (PDB 7LCJ) was supervised until less than 5 Å. RMSDs were computed after superimposition on TM2, ECL1, and TM3 residues 170-240 (Cα atoms), which is the GLP-1R less flexible part49.

Membrane-anchored Gs protein:GLP-1R simulations and GDP dissociation

The PDB 6EG8 was processed through Charmm-GUI99 to palmitoylate residue C3Gαi and geranylgeranylate residue C65. The resulting system was inserted into a 120 x 120 Å POPC membrane and previously built by using the VMD Membrane Builder plugin 1.1, Membrane Plugin, Version 1.1. at: http://www.ks.uiuc.edu/Research/vmd/plugins/membrane. Lipids overlapping the palmitoyl and geranylgeranyl groups were removed and TIP3P water molecules88 were added to the simulation box by means of the VMD Solvate plugin 1.5 (Solvate Plugin, Version 1.5. at <http://www.ks.uiuc.edu/Research/vmd/plugins/solvate/). Finally, overall charge neutrality was reached by adding Na+/Cl- counter ions up to the final concentration of 0.150 M), using the VMD Autoionize plugin 1.3 (Autoionize Plugin, Version 1.3. at <http://www.ks.uiuc.edu/Research/vmd/plugins/autoionize/). The first stage of equilibration was performed as reported above (Methods section System equilibration and general MD settings) for 120 ns, followed by a second stage in the NVT ensemble for a further 1 μs without any restraints to allow the membrane-anchored heterotrimeric Gs protein to stabilize within the intracellular side of the simulation box. After this two-stage long equilibration, GLP-1R from the final frame of the activation simulation (in complex with PF06882961) was manually inserted into the equilibrated membrane above the Gs protein using the corresponding structure retrieved from the OPM database as a reference, and the system further equilibrated for 120 ns as reported above (Methods section System equilibration and general MD settings). The GLP-1R-Gs system was then subjected to three simulations (Table S1). Each mwSuMD replica was interrupted by 500 ns of classic MD twice, to relax the system during the transition. In the supervised stages, the distance between residues M386-L394 Gαs (all-atoms centroid) of helix 5 (α5) and the GLP-1R intracellular residues R1762.46, R3486.37, S3526.41, and N4057.60 (Cα atoms only) was monitored, seeding three walkers of 200 ps each.

The AHD opening was simulated starting from the GLP-1R:Gs binding mwSuMD replica with the final lowest Gs RMSD, the lowest PF06882961 binding energy and the highest GDP binding energy (Replica 2 in Figure 4e) by supervising the distance between AHD residues G70-R199 Gαs and K300-L394Gαs (all-atoms centroids) during three walkers of 100 ps each. 300 ns of classic MD was performed to relax the system. Finally, the GDP unbinding was supervised as the distance between GDP (all-atoms centroid) and residues E50Gαs, K52Gαs, T55Gαs, K293Gαs, and V367Gαs (Cα atoms only) of Gαs; five walkers were used in a 50 ps long mwSuMD simulations.

MD Analysis

Interatomic distances were computed through MDAnalysis76; root mean square deviations (RMSD) were computed using VMD100 and MDAnalysis76. Interatomic contacts and ligand-protein hydrogen bonds were detected using the GetContacts scripts tool (https://getcontacts.github.io), setting a hydrogen bond donor-acceptor distance of 3.3 Å and an angle value of 120° as geometrical cut-offs. Contacts and hydrogen bond persistency are quantified as the percentage of frames (over all the frames obtained by merging the different replicas) in which protein residues formed contacts or hydrogen bonds with the ligand.

The MMPBSA.py101 script, from the AmberTools20 suite (The Amber Molecular Dynamics Package, at http://ambermd.org/), was used to compute molecular mechanics energies combined with the generalized Born and surface area continuum solvation (MM/GBSA) method or the molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) approach, after transforming the CHARMM psf topology files to an Amber prmtop format using ParmEd (documentation at <http://parmed.github.io/ParmEd/html/index.html). Supplementary Videos were produced employing VMD and avconv (at https://libav.org/avconv.html). Molecular graphics images were produced using the UCSF Chimera102 (v1.14).

Numbering system

Throughout the manuscript, the Ballesteros-Weinstein residues numbering system for class A103 and the Wootten residues numbering system for class B GPCRs104 are adopted.

Associated content

All the MD trajectories (stripped of POPC, water molecules, and ions) and topology files (psf and pdb) are available here:

https://zenodo.org/record/7944479

The mwSuMD software version used in this study is available at:

https://github.com/pipitoludovico/mwSuMD

Acknowledgements

GD is a member of the GPCRs-focused European COST action ERNEST. CAR is grateful for a Royal Society Industry Fellowship. GD and CAR are grateful for support from the BBSRC (BB/W016974/1) and Diabetes UK (BDA 20/0006307).

Additional information

Competing interest

All authors declare that they have no conflicts of interest.

Author contributions

GD and CAR conceived and supervised the project; GD designed and implemented the software and planned the simulations; GD, LP, RMR, TW, and PG carried out the simulations; GD analyzed the data; GD, AC, SM, and CAR interpreted the results, GD wrote the manuscript with input from CAR, AC, and SM; all the authors edited and reviewed the final version of the manuscript.