Introduction

Intrinsically disordered proteins (IDPs) comprise a class of proteins that lack a unique, well-defined three-dimensional structure under physiological conditions. Defying the classical structure-function paradigm, 1,2 they are involved in crucial cellular processes such as regulation of cell cycle signalling and act as hubs in interaction networks mainly due to their promiscuous binding nature.35 Emergent studies present disordered proteins as an important component of biomolecular condensates that play regulatory roles in several key cellular functions.69

Considering the broad range of functionality of IDPs, their disregulation and altered abundance can lead to a number of diseases including cancer, diabetes, cardiovascular and neurodegenerative disorders.1,2,10 In rational structure-based drug design, exploration of the target proteins is a crucial step based on which the interactions of small molecules are optimized to have stabilising interactions with a structured ligand-binding pocket11,12 and is typically applied to folded proteins. IDPs, however, are considered undruggable since they exist as an ensemble of conformations that precludes the usage of these traditional drug-design strategies. 13 Nevertheless, a few monomeric disordered proteins such as the oncogenic transcription factor c-Myc, 14 Abeta42,1519 Tau,20 PTP1B21 and osteopontin22 have been successfully targeted by small molecules.23

The IDP of this article’s interest, alpha-synuclein (αS) has remained an important drug-target2427 owing to its long-rooted association with neurodegenerative diseases. Often, the early aggregation stages of αS into small oligomers is implicated in the causation of Parkinson’s disease2831 and a potential therapeutic strategy has been the stabilization of αS in soluble monomeric form. In particular, the small molecule fasudil (focus of the present investigation) has been shown to interact with αS and retard its aggregation. 26 The anti-aggregative effect of fasudil on αS assembly was tested both in vivo and in vitro and it was found to significantly reduce αS aggregation in both scenarios. Characterisation of the effect of small molecule binding on the ensemble properties of IDPs and the molecular details of the stabilizing interactions can provide the basis for selectivity and can be effectively calibrated to induce the desired effect to alleviate disease conditions. In particular, latest effort by Robustelli and coworkers32 had produced an unprecedentedly long-time scale (1500 µs) all-atom molecular dynamics (MD) simulation trajectory of small molecule drug fasudil with monomeric αS, in which the predicted protein-fasudil interactions were found to be in good agreement with previously reported NMR chemical shift data. However, a routine comparison of monomeric αS ensemble in neat water33 with that in presence of fasudil does not bring out significant difference that is a customary signature of the dynamic IDP ensemble.

Characterization of the peculiar nature of IDPs such as αS using the conventional experimental and computational approaches have proven to be challenging. IDPs can be best described by an ensemble of conformations potentially sampled by the protein along with their associated statistical/thermodynamic weights. While biophysical methods can provide an ensemble averaged conformation of an IDP, computer simulations can extensively sample the conformational ensembles of IDPs at atomic resolutions. However, biophysical methods lack the potential to provide an ensemble view and there are inherent limitations of force fields used in computer simulations to model IDPs in agreement with experiments. Integrative approaches are therefore gaining importance in IDP studies3337 and have also been employed in studies of small-molecule binding to IDPs.19,32 In recent years, Markov State Models (MSMs) have been exploited to elucidate a states-and-rates view of biomolecular dynamics i.e. finding the kinetically relevant states and the transition rates between them.3840 The first step in the process of building a kinetic model / MSM of any complex biomolecular ensemble is to identify a suitable metric or collective variable (CV) that efficiently captures its conformational space. CVs such as torsion angles, distances, chemical shifts and so forth, can characterize the complex molecular motion in a high number of dimensions. However, this presents a challenge to effectively analyze trajectories generated from MD simulation. In order to address this, dimension reduction is performed to obtain a set of slow variables that can be further subjected to clustering into kinetically discrete metastable states. Since the high-dimensional input features describing the heterogeneous conformational landscape of IDPs can have non-linear relationships, non-linear dimension reduction techniques are most suitable. Recent advances in the field of machine learning have shown better results in capturing such non-linear relationships.41 In this study, we employed a deep neural network, specifically β-variational autoencoder42,43 to perform dimension reduction and further utilized this machine-learnt latent dimension for building MSM in order to better understand the conformational landscape of IDPs in presence and in absence of a small molecule.

In the current work, to elucidate the small-molecule mediated modulatory effect on αS ensemble, we harnessed deep-learning based β-Variational AutoEncoder (β-VAE) and Markov State Models (MSM) to dissect αS ensemble in water and in aqueous fasudil solution. A projection of the atomistically simulated conformation ensemble of αS in its apo state and in presence of Fasudil, along their respective latent space describes the conformational land-scape, which is further refined by developing individual MSMs. Our result reveals that, compared to the macrostates identified in water, the presence of fasudil led to an increase in the number of metastable states. Characterisation of small-molecule interaction with the protein indicates that fasudil has differential interactions with each of these macrostates. These macrostates exhibit clear distinctions as evident from their Cα contact maps which was further analysed using a convolutional VAE. An entropic perspective at the global (whole protein) and local (residue) level reveals that fasudil binding can potentially trap the protein and the resultant states may be disordered or ordered in nature. The effect is mainly manifested at the backbone level with a decrease in entropy at the fasudil binding hotspots in some states. The observations reported in this study demonstrate how the binding of a small molecule, fasudil, modulates the conformational properties of the disordered protein, αS, manifested as a change in the backbone entropy of the fasudil binding hotspots.

Results and Discussion

1. Modulation of free energy landscape of αS in presence of fasudil

Simulations of αS monomer in the presence of the small molecule fasudil and 50 mM NaCl have been reported in a previous study by Robustelli et al. 32 This study showed no large-scale differences between the bound and unbound states of αS. In our study, in order to compare the αS-fasudil ensemble with an apo ensemble at same salt concentration, we spawned multiple all-atom MD simulations of αS in neat water 50 mM NaCl from multiple starting conformations to generate a cumulative ensemble of 62 µs.

In a clear departure from the classical view of ligand binding to a folded globular protein, the visual change in αS ensemble due to the presence of small molecule is not so strikingly apparent. In order to understand the underlying ensemble modulatory effect of the small-molecule binding events, the complex and fuzzy conformational ensemble of αS (see Figure 1a) needs to be delineated using a suitable spatial and temporal decomposition into its key sub-ensembles or metastable states. This prompted us to analyse the two ensembles using the framework of Markov State Model (MSM). As a suitable input feature to build the MSM, we estimated the set of inter-residue Cα pairwise distances, as this feature largely incorporates the conformational space of αS. In a standard MSM analysis, this high-dimensional input feature requires further processing such as dimensionality reduction, before discretisation using clustering for the construction of an MSM.

(a) Conformational ensemble view of αS and αS-Fasudil ensemble, (b) A schematic of β-Variational Autoencoder (β-VAE), (c) Training and validation loss for β-VAE and (d) RMSE as a function of the β parameter. The red-annotated β parameter was used for the investigation

We started our investigation by searching for an optimized representation of high-dimensional and heterogeneous ensemble of monomeric αS conformation in presence of fasudil. As the datasets, particularly in the context of IDPs, is more complex and large in size, the need for effective dimensionality reduction techniques becomes more conspicuous. Dimension reduction using principal component analysis, independent component analysis, 44 singular value decomposition45 and linear discriminant analysis46 linearly transform the high dimensional data into a lower dimensional manifold. However, non-linear methods such as kernel PCA,47 multidimensional scaling,48 isomap,49 fastmap,50 locally linear embedding51 etc52 have outperformed such linear transformation methods and have been used for free energy calculations53 and determining timescales to capture slow conformational changes.54,55 In the present work, we opted to draw inspiration from artificial deep neural network based framework to employ data-agnostic and mostly unsupervised non-linear approach to derive optimized non-linear latent feature space that can be employed for statistical state-space decomposition of IDP such as αS.

Over the recent years, unsupervised machine learning algorithm such as autoencoders have been effective as a dimension reduction tool, owing to non-linear activations of the neurons that allows them to capture complex relationships and non-linear patterns in the high dimensional data.41,56 This high dimensional input is represented into a lower dimensional latent space, which is then used to reconstruct the input. However, the latent space is “deterministic”, i.e. it provides only a fixed mapping of the input to latent space, which might not account for variations or uncertainty present in the data. Consequently, this could generate an inaccurate result for a point that does not exist in the latent space. Thus, when projecting new, similar data onto the latent space, a deterministic autoencoder will map similar inputs to nearly identical points in the latent space, potentially lacking the diversity and capturing only a single mode of the data distribution. In contrast, the latent space of a Variational Autoencoder (VAE) is “probabilistic” in nature, thereby avoiding this limitation of autoencoders. The technical details have been documented in the Method section. However, here we provide a brief rationale for the choice of the β-VAE model in this work.

Variational autoencoder (VAE)42,43 (figure 1b) is also an unsupervised machine learning algorithm, which although is closely related to autoencoder, it is inextricably linked with variational bayesian methods. A VAE consists of an encoder network and a decoder network, much like a traditional autoencoder. The encoder takes input data and maps it to a probabilistic distribution in the latent space, while the decoder reconstructs the input data from samples drawn from this distribution. The probabilistic nature of the latent space is a result of applying variational inference, which aims to estimate the actual posterior using an approximate distribution, parameterized by a mean vector and a variance vector. This estimation is achieved by minimizing the Kullback-Leibler (KL) divergence (also known as relative entropy or information gain) between the two distributions. As a result, instead of producing a single point in the latent space for each input, the encoder outputs a distribution. The latent variable is then sampled from this distribution, which is then used by the decoder to reconstruct the input. This makes the latent space continuous, thereby improving its ability to interpolate novel, unseen data. The VAE model can be further improved by enhancing the probability of generating an actual data while keeping the distance between the true and the posterior distribution under a small threshold. This results in the weighting of the KL divergence term in the loss function. The factor determining this weight is represented by β and hence the name β-VAE, which is used in this study. The implementation of the β-VAE is made publicly available in our group’s Github page (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder). The loss function of β-VAE is given as,

The β-VAE model was trained over a number of β values ranging between 5 and 1×1015. A β value of 1×1012 was chosen as this value gave us the minimum RMSE (see Figure 1 d). This model was further used to project the αS data in its apo state for further analysis.

In particular, as described in detail in the Methods section, we trained a large array of inter-residue pairwise distances using the β-VAE architecture. Our attempt to reconstruct the large-dimensional input feature via a β-VAE resulted in compression of the 1.5 millisecond MD simulation data into a two-dimensional latent space. The loss function steadily decreased and eventually plateaued providing a model optimized for this system (see Figure 1 c). The rugged nature of the free energy landscape of alpha synuclein, is evident from its projection along the latent space dimension of simulated trajectory either in aqueous media (see Figure 2a) or in presence of fasudil (see Figure 2b). The free energy landscape represents a large number of spatially close local minima representative of energetically competitive conformations inherent in αS, which is also the source of the conformational heterogeneity and a hallmark of an IDP(see Figure 2 a-b). Next we wanted to compare the changes in the conformational landscape of the apo αS conformations. Subsequent projection of the apo state on the low-dimensional subspace also brought out a spatially heterogeneous landscape.

The free energy landscape of (a) αS and (b) αS-Fasudil system as determined from the latent dimensions of β-VAE

However, a close comparison of free energy landscapes indicates that a set of local minima representing conformations of αS are less clustered and appear in small patches in the apo state. Moreover, a relative comparison of the two free energy landscapes suggests that the αS spans a significantly larger space in presence of fasudil even in this low dimensional subspace, hinting at an expansion of conformational repertoire of this IDP in presence of the small molecule (see Figure 2 a-b). Individual, one-dimensional projection of the conformational landscape along each of the two latent dimensions reveals that that the presence of the small molecule would shift part of the conformational landscape to a distinct location, suggesting that fasudil would modulate the existing conformational ensemble as well as would create new conformational space. For a more discrete and clearer state-space decomposition of the conformational ensemble of αS (in apo or in ligand-bound form), we attempt to enumerate the optimum number of distinct metastable macrostates with non-negligible equilibrium populations by constructing MSM as explained in the following section.

2. Markov State Model elucidates distinct binding competing states of αS in presence of the small-molecule drug

The two-dimensional feature obtained from β-VAE was used as the input feature for the MSM estimator provided by PyEMMA.57 A flowchart of the implemented protocol is drawn in Figure 3a. A geometric clustering approach, k -means, was used to discretise the ensemble. Analysis of the implied time scales as a function of lag-time predicted a Markovian model of αS in water with three kinetically separated states (see Figure 3d for relative population). On the other hand, a similar analysis of αS in presence of fasudil (Figure 3c)) indicated a clear increase in the number of spatially and temporally resolved conformational metastable states. Thus, a three-state MSM was built for the αS ensemble in water whereas a six-state model was built for the αS-fasudil ensemble. Hereafter, we will refer to the macrostates created in presence of fasudil prefixed with FS and those generated in water prefixed with MS. A bootstrapping analysis was performed to estimate the mean and standard deviations of the equilibrium populations of these metastable states (Figure 3d).

(a) A flowchart of the process of building a Markov State Model (MSM). (b, c) Implied timescales plot of the αS and αS-Fasudil systems. (c) Macrostate populations of the 3-state and 6-state MSM of the αS and αS-Fasudil systems. Bootstrapping was used to estimate the mean and standard deviations of the macrostate populations.

The MSM was also further validated for Markovianity i.e. ability to predict estimates at longer time scales using the Chapman-Kolmogorov (CK) test, depicted in Figure S1. This test further validates the quality of the model that it estimates long-time scale behavior with reasonable accuracy. The MSM analyses essentially indicate that the small-molecule binding interactions with αS led to sampling or generation of additional conformational states than the states populated in the apo αS ensemble.

A pertinent question arises: Are the appearances of newer states in αS in presence of fasudil a result of the presence of longer trajectory in fasudil solution (1500 µs) than that of the ensemble populated in neat water (62 µs)? We verified this by building an MSM for 60 µs of data of fasudil system using 10 µs and 70 µs segment of the trajectory (see Figure S2). Very interestingly, the MSM built using lesser data (and same amount of data as in water) also indicated the presence of six states of αS in presence of fasudil, as was observed in the MSM of the full trajectory. Together, this exercise invalidates the sampling argument and suggests that the increase in the number of metastable macrostates of αS in fasudil solution relative to that in water is a direct outcome of the interaction of αS with the small molecule.

3. Structural characterisation of metastable states of αS monomer in presence of fasudil

We characterised the residue-wise intramonomer contact maps of the metastable states to identify the differences in the interaction patterns in these states that can be attributed to interactions of fasudil with the monomer. The average inter-residue contact probability maps for each of the metastable states populated in the presence of fasudil and in water are depicted in Figure 4 and Figure 5, respectively. The residue-wise percentage secondary structure in each of the macrostates are presented in Figure S3 and S4, respectively.

Intrapeptide residue-wise contact probability maps of (a) six macrostates (FS1 to FS6) in the presence of fasudil. Axes denote the residue numbers. The color scale for the contact probability is shown at the extreme right of each panel of maps. The color bar along the axes of the plots represents the segments in the αS monomer. Specific contact regions are marked by boxes and numbered. These contacts are illustrated by representative snapshots and the corresponding contacts are similarly marked and numbered.

Intrapeptide residue-wise contact probability maps of the three macrostates (MS1 to MS3) in neat water. Axes denote the residue numbers. The color scale for the contact probability is shown at the extreme right. The color bar along the axes of the plots represents the segments in the αS monomer. Specific contact regions are marked by boxes and numbered. These contacts are illustrated by representative snapshots and the corresponding contacts are similarly marked and numbered.

In state FS1, antiparallel β-sheet interactions are formed within the N-terminal region. This β-sheet network also includes long-range interactions of the H2 region of the N-terminal with the negatively charged C-terminal region. Residues 70-80 in the hydrophobic NAC region are also involved in antiparallel β-sheet interactions with residues 120-130 in the C-terminal region. Short parallel β-sheet interactions also exist between the NAC and C-terminal residues with the N-terminal residues. The residues in the C-terminal exhibit 40-90 % of β-sheet propensity (see Figure S3). Long range interactions between N-terminal and C-terminal are relatively diminished in state FS2. This state has multiple parallel β-sheet interactions between short stretches of residues in the NAC region and C-terminal region with the N-terminal region. In addition. antiparallel β-sheet interactions are also present in NAC:N-terminal, NAC:C-terminal and within the N-terminal region. Residues 8 to 13 in the N-terminal and 86 to 92 in the NAC region exhibit significant helicity. State FS3 is characterised by 20 % of β-sheet interactions across the entire protein. Hydrophobic and polar, uncharged residues in the range 61 to 80 in the NAC region are involved in β-sheet interactions with the N-terminal (residues 1 to 10) and C-terminal (100 to 120). Moreover, β-sheet interaction networks are also present within the NAC and N-terminal regions. The C-terminal region 100 to 120 forms long-range interactions with residues 1 to 10 in the N-terminal region. Extensive short and long-range β-sheet networks are prevalent in state FS4. The entire NAC region forms antiparallel β-sheet network with the N-terminal. The C-terminal forms parallel β-sheets with the NAC and N-terminal. In the most populated state FS5, the key interactions are β-sheets formed between the oppositely charged termini. The NAC region also exhibits α-helical propensity. Finally, in state FS6, the NAC region (residues 60 to 80) forms antiparallel β-sheet network with the residues 30 to 60 of the N-terminal segment as well as the C-terminal region 100 to 120. These residues have 40 % β-sheet propensity.

On the other hand, the metastable states that are populated in neat water are distinct from those formed in the presence of fasudil. In state MS1, the NAC region residues 61 to 80 interacts with the N-terminal region by both parallel and antiparallel β-sheet interactions. The β-sheet network of NAC also incorporates interactions with residues 100 to 130 in the C-terminal region. These residues have 60 % β-sheet propensity (see Figure S4). State MS2, the most populated macrostate in neat water, is characterised by β-sheet interactions of the NAC region with H2 region (residues 30 to 60) of the N-terminal. Furthermore, the N and C-termini also form β-sheet interactions. State MS3 mainly has long-range interactions as β-sheets between the N and C-terminal regions. Moreover, a β-sheet network also exists within the NAC region and NAC:N-terminal (H2) interface. The residues in the NAC region have 50% β-sheet propensity in this macrostate.

We note that the differences in the structural features i.e. intra-protein contacts of the macrostates arise from the interactions of fasudil with the protein residues. For instance, in state FS2 in which fasudil interacts with multiple sites across the entire length of the protein, the secondary structure properties of this state are dictated by β-sheets formed by short stretches of residues rather than an extensive network. Furthermore, owing to the relatively stronger small-molecule binding interactions with hydrophobic residues in the N-terminal, the long-range interactions of the oppositely charged terminal regions are suppressed. Similarly, in state FS4, since fasudil primarily interacts with residues 120 to 140 of the C-terminal and other interactions are transient, the conformational properties of this metastable state is marked by a network of β-sheets across the rest of the protein. Essentially, small-molecule binding interactions via stabilisation of various intra-protein interactions modulate the conformational properties of αS.

4. Mapping the macrostates in αS and αS-Fasudil ensembles using Denoising Convolutional Variational Autoencoder

To ascertain the distinctions among these contact maps for the αS and αS-Fasudil system, we chose to filter the essential features and patterns by representing it in a lower dimensional space. Initially, this was attempted using a Convolutional Variational AutoEncoder (CVAE) with the contact maps of αS and αS-Fasudil macrostates as inputs. The model was initially trained using the six contact maps of αS-Fasudil system as a test case.

However, the model displayed a tendency to overfit by learning an identity function between the input and output, as evident from the large reconstruction error of the test dataset (see Figure S5 (a)). This overfitting issue was addressed by training a Denoising CVAE (DCVAE), where the contact map was intentionally corrupted with noise prior to training. This approach successfully mitigated the problem of overfitting as indicated by the loss profile of the test dataset (see Figure S5 (b)). An implementation is provided in GitHub (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder)

The denoising performance was measured by calculating the structural similarity index measure (SSIM) between the original and denoised contact map. SSIM measures the similarity between two images as perceived by the human eye by considering the luminance, contrast and structure of the image. SSIM values can range from -1 to 1, which represents perfect anti-correlation and perfect similarity, respectively. 58 The SSIM values is tabulated in Figure 6. We find that the SSIM values for all the metastable states of αS and αS-Fasudil system are close to 1 (see Figure 6), thereby indicating that the denoised contact map as obtained from the trained DCVAE model is very close to the quality of the original contact map (see Figure S6-S8)). This is also supported by the high Peak signal-to-noise ratio (PSNR). PSNR is the ratio of the maximum possible power of a signal to the power of corrupting noise that measures the fidelity of a reconstructed image compared to its original in decibels (dB).58 A higher value of PSNR indicates smaller difference between the original and denoised contact map (see Figure 6). Upon visualizing the latent space, it became evident that the contact map of the metastable states of αS and αS-Fasudil occupy distinct locations within the latent space, which indicates that the DCVAE model effectively captured the distinct patterns among the metastable states.

(a) Schematic of denoising convolutional variational autoencoder (b) Latent space of the contact map of αS and αS-Fasudil metastable states. (c) SSIM and PSNR values for αS and αS-Fasudil metastable states.

5. Fasudil exhibits conformation-dependent interactions with individual metastable states

We next sought to understand the specificities of the interactions of fasudil with the different metastable states of αS. Accordingly, we calculated the contact probabilities between fasudil and each αS residue. A contact between fasudil and a protein residue is considered to be present when the minimum distance between any heavy atom of fasudil and a protein residue is less than 0.6 nm. This is calculated for the entire population of each macrostate to determine the contact probabilities. The αS-Fasudil contact maps for the six macrostates are presented in Figure 7a and snapshots of the interactions are presented in Figure 7b.

(a) Contact probability map of residue-wise interaction of fasudil with αS. (b) Overlay of representative conformations from the six macrostates (FS1 to FS6) along with the bound fasudil molecules. The conformations are colored segment-wise as shown in the legend. The fasudil molecules, in licorice representation, are colored yellow.

The residue-interaction profiles of each of the six individual macrostates of αS with fasudil revealed distinct motifs of interaction across the length of the protein for the different macrostates (Figure 7). Precedent investigations26,32 have reported dominant fasudil-αS interaction via the C-terminal region of the protein while it weakly interacts with the entire αS sequence. In state FS1, fasudil dominantly interacts with residues 133-139 of the C-terminal region, which consists of negatively charged residues D135 and E137 as well as the polar, neutral tyrosine residues (Y133 and Y136).

In addition, there are also non-specific interactions of fasudil with hydrophobic residues in the NAC region (residues 71 to 74). In state FS2, fasudil interactions are spread across the entirety of the protein. These comprise hydrophobic residues in the N-terminal region, neutral polar residues (62 to 66) in the NAC region and residues 130 to 136 in the C-terminal region. Fasudil interacts strongly with the NAC region consisting of hydrophobic and polar, neutral residues in state FS3. Interactions also prevail with the negatively charged C-terminal region and residues 1 to 6 of the N-terminal region. While interacting non-specifically with residues in the amplipathic N-terminal region of state FS4, fasudil preferentially interacts with residues 121-140 of the C-terminal region via charge-charge and π-stacking interactions with the negatively charged residues (D135, E137, E139) and tyrosine side chains (Y125, Y133, Y136). In the most populated state FS5, fasudil interacts weakly with the hydrophobic residues and glutamic acid residues while interactions with the C-terminal region 107 to 140 comprising of tyrosines and acidic amino acids are dominant. Lastly, in state FS6, the strongest interactions of fasudil comprise hydrophobic and polar residues in the range 118 to 121. There are also weak interactions with hydrophobic residues in the N-terminal (11 to 16, 38, 39) and NAC regions (71 to 74, 80 to 95).

Robustelli et al.32 have reported that in the bound state, fasudil interactions with αS are favorable charge-charge and π-stacking interactions that form and break in a mechanism, they term as dynamic shuttling. The stacking interactions arise from the favourable orientation of fasudil’s isoquinoline ring and the aromatic ring in the side chain of tyrosine residue. Analysis of the time-series of the formation of different intermolecular interactions indicated the formation of charge-charge and π-stacking interaction with residues Y125, Y133 and Y136 and the shuttling among these interactions causes fasudil to remain localized to the C-terminal region. Notably, we observe that each of the metastable, while preferentially interacting with the C-terminal also has distinct yet significant interactions with hydrophobic residues in the N-terminal and NAC regions.

6. Entropic Signatures of Small Molecule Binding

The overall analysis of the conformational ensemble of αS in presence of fasudil demonstrates that small-molecule binding substantially modulates the ensemble of αS. Our results indicate that the interaction of fasudil with αS residues governs the structural features of the protein. Estimation of the metastable states of αS shows that compared to aqueous conditions in which three metastable states are identified, the ensemble populated in the presence of fasudil in solution contains six metastable states. Since IDPs are described as an ensemble unlike folded proteins with a fixed structure, the effect of small molecules binding on disordered proteins or regions, is manifested as modulation of the whole ensemble by shifting the population of the states leading to a change in its conformational entropy.5961 The common mechanism that is generally postulated is the entropic collapse or folding upon binding model.62 It describes the disorder-to-order transition that IDPs typically undergo upon interacting with its biological target. This transition leads to the shift in population to a more ordered, predominant state causing a loss of conformational entropy that is compensated by a corresponding enthalpic gain. Conversely, small-molecule binding may also lead to entropic expansion in which newer conformational states are populated that are otherwise undetectable.60 Additionally, in cases where there is no gross change in the conformational entropy and the binding is considered fuzzy, 34 referred to as isentropic shift.

The observations reported in the present investigation for αS monomer plausibly conform with the entropy-expansion model. 59 According to this model, the interactions of small molecules with IDPs can promote the exploration of conformations with discernible probability, thereby expanding the disorderedness (conformational entropy) of the ensemble. 19,59,60,63 The small molecule interaction with αS operates transiently, and not strongly as seen in structured proteins. These transient local interactions stabilise the conformations that are inaccessible in the absence of the small molecule.

To gain deeper insights into the entropic effect of small-molecule binding to αS, we further estimated protein conformational entropy as well as water (solvent) entropy. We evaluated the entropy of water using the 2PT64,65 method. First, we determined the molar entropy of pure water that yielded an entropy value of 55.5 J mol1 K 1, which is close to the reported value of 54.4 J mol1 K 1 from free energy perturbation calculation.66 However, the molar entropy is marginally reduced in all the macrostates of both αS and αS-fasudil systems, with values ranging between 55.01 J mol1 K 1 to 55.25 J mol1 K 1, with difference from that of bulk water being negligible. Upon closer investigation of the translational and rotational components of the total water entropy, we find that that the decrease of the translational component is more pronounced as compared to the rotational component. This implies that αS in its apo state or in the presence of fasudil has a comparatively greater impact on the translational component than on the rotational component of the entropy of water (see Figure S9). Nonetheless, the entropy analysis suggests that the presence of fasudil has an insignificant impact on the entropy of water.

Next, we estimated the protein conformational entropy using the PDB2ENTROPY program67 that implements the Maximum Information Spanning Tree (MIST) approach68,69 on sets of torsional angles of the protein. The entropy is calculated from probability distributions of torsional degrees of freedom of the protein relative to random distribution i.e. corresponding to a fully flexible protein chain. The total protein entropy values for the metastable states in water (MS1 to MS3) and fasudil-bound states (FS1 to FS6) are shown in Figure 8. While the conformational entropies of the ligand-free states, MS1 to MS3, range from -72 to -75 J mol1 K1, the entropy of fasudil-bound states (FS1 to FS6) vary from -70 to -86 J mol1 K1, indicating that binding of fasudil to αS leads to a broadening of the span of entropy of these states. The highest entropy among the metastable states in the two environments correspond to FS5 and MS2 with values of -70.8 J mol1 K1 and -72.3 J mol1 K1, respectively. These states are also the most populated metastable states. MS1 and MS3 states have entropy values of -73.9 J mol1 K1 and -74.5 J mol1 K1, respectively. Notably, the presence of fasudil and its interactions with the protein chains can elicit the exploration of states that are more ordered in nature or increasingly deviated from a random chain. The largest deviation from a fully disordered chain is attained by the state FS1 with entropy value of -85.3 J mol1 K1, followed by FS2 having entropy of -83.1 J mol1 K1. FS4 is also relatively more ordered than FS5 with entropy of -76.2 J mol1 K1. FS3 and FS6 are closely disordered as FS5 with entropy values of -73.2 J mol1 K1 and -72.5 J mol1 K1, respectively. These observations of the total extent of disorderedness of the metastable states clearly indicate that the interactions of fasudil with αS residues can restrict their conformational degrees of freedom thus entrapping the protein in specific conformational states, thus manifesting as overall decrease in the protein entropy.

(a) Total Protein entropy of the metastable states of αS and αS-Fasudil calculated from the torsion angles, relative to a fully flexible chain (b) Total water entropy in αS and αS-Fasudil system and (c) Residue-wise backbone entropy within the αS and αS-Fasudil states.

Considering the diffuse binding of fasudil across several residues along the entire length of the protein, we further estimated and analysed how the entropic contributions are affected at the residue level (shown in Figure 8 and Figure S10, S11). The total residue-wise entropy presented in Figure S10 shows that the residues in the N-terminal region of fasudil-bound states FS1 and FS2, exhibit greater entropic fluctuations i.e. more negative values from the apo metastable states (MS1 to MS3), suggesting more ordered or restricted fluctuations of these residues. We note that the propensity of fasudil binding to the N-terminal residues is higher in FS1 and FS2, relative to other states (see Figure 7). We further decoupled the residue entropy into backbone and sidechain components; this analysis is depicted in Figure 8 and Figure S11, respectively. Comparison of the sidechain entropies, computed from the χ torsion angles, for the unbound and bound states, does not show significant alteration upon small-molecule binding. Conversely, increased fluctuations in residue backbone entropies are observed in response to small-molecule binding (see Figure 8). The metastable states, MS1 to MS3, populated in water have minimally fluctuating residue entropies amongst the states. In contrast, significant fluctuations in residue entropies are noted predominantly in the N-terminal region (residues 1 to 60) and to some extent in the NAC region for states FS1 and FS2. These residues in the N-terminal region in FS1 and FS2 experience a drop in backbone entropy owing to the restricted degrees of freedom from fasudil binding. The decrease in residue backbone entropy correlates with fasudil binding hotspots and therefore can be ascribed to small-molecule interactions. Hydrophobic residues in the NAC region show decrease in entropy in ligand-binding locations mainly in state FS1, FS2 and FS4. However, intriguingly, we note that the correlation between fasudil interactions and residue entropy is not observed in state FS3. Finally, in the negatively charged C-terminal region, where fasudil preferentially targets the entire region in the αS protein, residue-wise entropic fluctuations are minimal compared to the apo state. Robustelli et al. refer to the interaction of fasudil with the C-terminal region as dynamic shuttling, in which breaking and formation of interactions with nearby residues occur simultaneously. This dynamic shuttling mechanism could possibly minimise the backbone entropic fluctuations compared to the relatively specific binding in the N-terminal and NAC regions.

7. Exploring interplay of conformational entropy and Mean First Passage Time in Protein Conformational Transitions

The Mean First Passage Time (MFPT) for transition among the metastable states provides valuable information for understanding the dynamics and conformational changes in the system. MFPT rate defines the average time it takes for a system to transition between one metastable state to another. A lower MFPT value suggests a favourable transition among the states. The transition timescales for αS indicate that in the presence and absence of fasudil, the transition to the most populated state is preferred (see Figure 9). For αS-Fasudil system, the transition to the most populated states takes place over timescales ranging from 74 µs and 766.5 µs, whereas for the apo state it occurs between 29.8 µs and 37.4 µs. Upon comparing these transition times with those to other macrostates, it becomes evident that these timescales represent the shortest durations. This observation suggests that the transition to the MS2 and FS5 state in αS and αS-Fasudil system is the most favoured over all other states in the system.

The rate network obtained from MFPT analysis for the (a) 3-state MSM of the αS ensemble in neat water and (b) 6-state MSM of the αS-Fasudil ensemble. The sizes of the discs is proportional to the stationary population of the respective state. The thickness of the arrows connecting the states is proportional to the transition rate (1/MFPT) between the two states and the MFPT values are shown on the arrows. (c, d) Projection of conformation sub-ensemble PCCA+ for αS and αS-Fasudil system in latent space, with respective protein entropy values (in unit of J mol1 K 1) annotated on top of it. (e) Correlation between protein entropy and transition time to the major state for αS and αS-Fasudil system. The inset plot corresponds to the correlation between entropy and transition to the macrostate for αS and αS-Fasudl system.

While MFPT offers insights into the kinetics of the metastable states, the thermodynamic property, entropy, characterizes the diversity and disorder within the system. Although a state with higher population might not always have the higher entropy, we observed that in our case, the state with the highest population also has the highest protein entropy, for both αS and αS-Fasudil system. States with higher entropy are thermodynamically favored which in turn can affect the rates of state transitions, potentially influencing MFPT values. To decipher this, we have compared the transition times of each metastable state to the most populated state with the entropy of the state, and found a strong degree of correlation (see Figure 9e). This suggests that a state with higher entropy, which also has more number of accessible microstates, potentially provide multiple pathways for other macrostates to access this state for both αS and αS-Fasudil system. However, as compared to αS in its apo state, the apparent increase ( 2-20 times) in the time to transition to the major state in αS-Fasudil system, suggests that fasudil may potentially trap the protein conformations in the intermediate states, thereby slowing down αS in exploring the large conformational space in presence of fasudil.

Conclusions

The conformationally dynamic disordered proteins challenge the application of conventional structure-based drug-design methods that relies on the specificity of a binding pocket, based on which the drug molecule is designed and optimized. A general framework describes the effect of small-molecule binding to IDPs by modulation of the disordered ensemble by increasing or decreasing its conformational entropy. Here, we characterised the effect of the small molecule, fasudil, on the conformational characteristics of the quintessential IDP, α-synuclein. Fasudil has been experimentally reported to curb the aggregation of αS both in vivo and in vitro. The ensemble view of long time scale atomic simulations of fasudil binding to αS monomer displays a fuzzy ensemble of αS with preferential binding of fasudil to the C-terminal region via charge-charge interactions and aromatic stacking. Using a deep neural-network based machine learning approach, variational autoencoder, we reconstructed a high dimensional feature i.e. Cα-pairwise distance into a two-dimensional latent space. We built a Markov State Model using this data to delineate the metastable conformational states of αS and identify the differences that arise from small-molecule interactions. Comparison of the metastable states of αS in the presence and absence of the small molecule fasudil revealed that the small-molecule binding navigates the structural landscape of the protein. Importantly, more number of metastable states are populated indicating that small-molecule mediated conformational modulation led to entropy expansion (see Figure 9d-e).

Atomic-level MD simulations have slowly emerged as a a valuable tool for complementing experimental measurements of disordered proteins and providing detailed descriptions of their conformational ensembles. 7073 With improvements in force fields33,74,75 that model disordered proteins, the accuracy of MD simulations has dramatically increased, as assessed by their agreement with a wide variety of experimental measurements. Integrative approaches use a combination of experimental restraints such as chemical shifts, paramagnetic relaxation enhancement (PRE),70,71,73,76 residual dipolar coupling constants (RDCs),7678 small-angle X-ray scattering (SAXS)79,80 and computational models from MD simulations to generate a more accurate description of the ensemble of disordered proteins. More recent advances include statistical approaches such as Bayesian inference models like Metainference,34 Experimental Inferential Structure Determination (EISD)81,82 and Maximum Entropy formulations8385 that also take into account the experimental and back-calculation model errors and uncertainties. Such improved approaches to leverage MD simulations with improved force fields and experimental knowledge-based ensemble generation have shown tremendous potential for describing molecular recognition mechanisms of IDPs with other protein partners; scenarios such as folding-upon-binding62,86 or self-dimerisation. More importantly, a set of recent initiatives have focussed on molecular characterisation of the interaction of small-molecules with IDPs via effective combination of atomistic simulation and biophysical experiments.19,32

In this regard, the present investigation takes a step ahead via a quantitative and statistical dissection of the αS ensemble in presence of a small-molecule. The implementation of Markov state model in the present work clearly brings out the feasibility of metastable states that would have a finite ‘stationary’ or equilibrium population in presence of small-molecule or in neat water. The uniqueness of these states is supported from the projection of their inter-residue Cα contact map of the macrostates using DCVAE model. Our systematic investigation of the impact of fasudil on αS monomer ensemble reveals the altering effects of small-molecule interaction on αS ensemble by tuning the entropy of these states. Insights from studies targeting structural ensemble modulation by small-molecule can be effectively harnessed in designing drug-design strategies for aggregation diseases.

Computational Methods

Monomer Simulation Protocol

A 1500 µs long all-atom MD simulation trajectory of αS monomer in aqueous fasudil solution was simulated by D. E. Shaw Research with GPU/Desmond in Anton supercomputer that is specially purposed for running long-time-scale simulations. 32 The protein was solvated in a cubic water-box with 108 Å sides and containing 50 mM concentration of Na or Cl ions. Protein, water molecules and ions present in the system were parameterised with the a99SB-disp force field.33 The small-molecule, fasudil, was parameterised using the generalised Amber force-field (GAFF). Production run of the system was performed in the NPT ensemble. Further details of the MD simulation can be obtained by referring to the original work by Robustelli et al. 32 The extensive simulation trajectories were generously made available by D. E. Shaw Research upon request.

To generate αS ensemble in water, we performed multiple simulations of αS monomer. The starting conformations of these simulations were randomly selected from a previously reported33 multi-µs αS trajectory in water. The simulation details are described below. Each protein conformation is solvated in a cubic box of explicit water molecules and Na or Cl ions were added to a concentration of 50 mM, to maintain the same conditions as used in the simulations with fasudil. The all-atom a99SB-disp force field and water model33 were used to simulate the protein and solvent. The unbiased MD simulations were performed using the Gromacs simulation package. 87 A timestep of 2 fs and the leap-frog integrator was used. The simulation was performed in the isothermal-isobaric (NPT) ensemble at an average temperature of 300 K, maintained using the v-rescale thermostat88 with a relaxation time of 0.1 ps and a pressure of 1 bar using the Parrinello-Rahman barostat89 with a time constant of 2 ps for coupling. The Verlet cutoff scheme90 with 1.0 nm cutoff was applied for Lennard-Jones interactions and short-range electrostatic interactions. Long-range electrostatic interactions were computed using the Particle-Mesh Ewald (PME) summation method91 and covalent bonds involving hydrogen atoms were constrained using the LINCS algorithm.92 All the systems were minimized using the steepest descent algorithm, followed by equilibration in the isothermal-isochoric (NVT) and subsequently in the NPT ensemble with position restraints on the heavy atoms of the protein. Twenty-three simulations, each starting from different conformations, were performed. These simulation timescales are variable, ranging from 1 to 4 µs, amounting to a cumulative length of 62 µs.

Dimension reduction using β-Variational Autoencoder

The probability distribution p(x) of a multi-dimensional variable x can be determined using various approaches,93,94 one of them being the latent variable model. In this model, the probability distribution is modeled as a joint distribution p(x, z) of the observable variable x and the hidden latent variable z which is given as p(x, z) = p(x|z)p(z), where p(z) is the prior distribution over the latent variables z and p(x|z) is the likelihood of generating a true data, given z. The p(x) is determined by marginalization over all the latent variables giving us,

However, this integral is often intractable thereby constraining the calculation of the true posterior distribution p(z|x). This can be solved using, variational inference, which aims to estimate the true posterior using an approximate distribution q(z|x), using the encoder part of VAE which parameterizes the distribution using a mean vector and a variance vector. The goal is to determine q(z|x) which is closest to the true posterior by minimizing the Kullback-Leibler (KL) divergence between them which is given as

Upon expanding and rearranging, we have

where p(x) is the evidence. The VAE is trained so that we maximize the log-likelihood of the evidence and minimize the KL divergence between the estimated and true posterior. The loss function of VAE is given by the negation of equation 2 and is given as,

The training objective of VAE being minimizing this loss function to generate a realistic data. The VAE model can be further improved by keeping the distance between the approximated and the true posterior under a small distance, δ. This is achieved by maximizing the equation

which modifies the loss function to

where β is the Lagrangian multiplier (under the Karush-Kuhn-Tucker conditions). This enhances the probability of generating an actual data and is known as β-VAE.95 The expectation term in the loss function of β-VAE requires a point z to be sampled from the posterior distribution. This is performed using the reparametrization trick

where E is sampled from a Standard Normal distribution. This makes the VAE model trainable using the backpropagation algorithm.

In our work we have used Cα-Cα pairwise distance (excluding i, i + 1 and i + 2 distances) sampled at every 9 ns of the entire 1.5 millisecond of the αS protein in presence of fasudil trajectory as our input. We have built our model using Tensorflow backend96 with Keras library. Training was performed with 80% of the data while 20% was used for testing. We have employed a symmetric VAE where the input layer consists of 9453 nodes followed by four hidden layers having 4096, 512, 128 and 16 nodes respectively, and 2 nodes in the latent layer. Hyperbolic tangent (tanh) is used as our activation function with weights being initialized using the glorot_uniform97 method in all the layers except for the output layer where we have used sigmoid activation. Adaptive Moment Estimation (Adam98) was used as the optimizer with a learning rate of 1×104. The model was trained for 300 epochs with a batch size of 64. The relevant Python implementations are provided in Github (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder).

Building the Denoising Convolutional Variational AutoEncoder

The Denoising Convolutional Variational Autoencoder (DCVAE) model employs five convolutional layers which incorporate filters of sizes 16, 32, 64, 96, and 128, with corresponding kernels of size 3×3. Strides are set at 1, 2, 2, 2, 2, and 1 for efficient feature extraction. This is followed by six densely connected layers with neuron counts of 4096, 2048, 1024, 256, 64 and 16 in the successive dense layers. The latent layer is comprised of 2 neurons. The decoder mirrors the encoder architecture, to reconstruct back the input contact maps. Hyperbolic tangent activation is used in all the layers, excluding the output layer where sigmoid activation is used. Adam optimizer with a learning rate of 5×104 is used for training the DCVAE. The model was trained for 300 epochs with a batch size of 32 with β = 1×1012. Training was performed with 90% of the data while 10% was used for testing.

At first, we added a noise to 200 replicas of contact map for each of the states of αS and αS-fasudil. The noise was generated stochastically from a standard normal distribution which was multiplied by a noise factor of 0.035 and added to each of the replicas, followed by scaling the dataset so that the values range between 0 and 1 using the formula

where xmin and xmax are the minimum and maximum values of the input contact map. In order to assess the denoising performance, we have calculated the structural similarity index measure (SSIM) values.58 The SSIM value can range between -1 to +1, thereby indicating the structural dissimilarity or similarity between the original (x) and denoised (y) contact maps. Mathematically, this is expressed as

where the functions l(x, y), c(x, y),and s(x, y) compare the luminance, contrast and structure between the contact maps and is given as

The variables µi and σi represent the pixel sample mean and variance of i, where i = {x, y}. σxy is the covariance of the x and y contact maps. The exponents α, β and γ controls the weighted contribution to the SSIM value and are set to 1. The constants C1 = (k1L)2, C2 = (k2L)2 are small positive values introduced to prevent instability and division by zero. They are used for numerical stability and to avoid problems when the means and variances of the contact maps are close to zero. We have set k1 = 0.01, k2 = 0.03 and C3 = C2/2 as per convention.99

In addition, we have also calculated the peak Signal-to-Noise Ratio to evaluate our denoising performance and is calculated as

where, R is the maximum pixel value (255 in our case) and MSE is the mean-square error between the original and denoised image and is given as

where M is the height (number of rows) of the image, N is the width (number of columns) of the image and Oij and Dij is the pixel intensity at position (i, j) for the original and denoised image respectively. The relevant Python implementations are provided in Github (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder).

Building a Markov State Model of monomer ensemble

The 2D latent space reconstructed using β-VAE described in the previous subsection was then used to build a Markov State Model to elicit the metastable states underlying the ensembles. We employed PyEMMA,57 a Markov State Model (MSM) building and analysis package. The 2D data is subjected to k -means clustering into 180 microstates. A transition matrix was then built by counting the number of transitions among the microstates at a specific lag time. To choose the lag time, the implied timescale (ITS) or relaxation timescales of the systems were calculated over a range of lag times and plotted as a function of lag time. The timescale at which the ITS plot levels off is chosen as the lag time to build the final MSM. The ITS plots corresponding to the neat water and aqueous fasudil systems are presented in Figure 3. Accordingly, MSM lag times of 36 and 32 ns were selected for neat water fasudil systems, respectively. At these lag times, by identifying the gaps between the ITS, a three-state model for neat water and a six-state model for aqueous fasudil system was built. Lastly, the transition path theory100102 was used to ascertain the transition paths, fluxes and timescales in these models. Bootstrapping was performed to estimate the errors associated with the state populations and transition timescales. In ten iterations, the model was rebuilt after eliminating a randomly selected trajectory followed by calculation of the state populations and transition timescales. The mean value and standard deviations of the values collected by bootstrapping are reported. A Chapman-Kolmogorov test was performed for the MSMs of the neat water and small molecule ensembles, that tests the validity of the model prediction at longer time scales (see Figure S1).

Entropy of water

In order to calculate the entropy of water, simulations were performed using GROMACS (version 2022.3). A single snapshot was chosen from each macrostate of α-synuclein in its apo state and in presence of fasudil, which were then solvated in a cubic box, placed within 1 nm from the box edge. The system was neutralized using NaCl and the salt concentration was kept at 50 mM. The system was energy minimized using the steepest descent algorithm, followed by equilibration in the canonical ensemble for 2 ps. The average temperature was maintained at 300 K using velocity rescale thermostat with a time constant of 0.1 ps. Following this, NPT equilibration was performed for 2 ps to maintain the pressure at 1.0 bar using the Berendsen barostat. Finally, three independent production runs were performed from the equilibrated structures in the NPT ensemble using the velocity-verlet integrator with a timestep of 1 fs for a timescale of 20 ps with frames saved at every 4 fs. In addition, we have performed three simulations of neat water using the parameters as described for the macrostates.

The entropy of water was estimated using the DoSPT program103,104 that computes the thermodynamic properties from MD simulation in the framework of the 2-phase-thermodynamics (2PT) method.64,65 This is achieved from calculating the density of state (DoS) function, I(v) which is the Fourier transform of the velocity auto correlation function,

where ml is the mass of atom l, vk is the velocity of atom l along the k direction and N is the total number of atoms in the system. DoS can be represented as a sum of translational (trn), rotational (rot) and vibrational (vib) motions. 65

For water, the 2PT model overcomes the anharmonic nature of the low frequency modes (diffusive motions and libration) by decomposing the DoS into solid-like (Is) and gas-like (Ig) contribution for the translational and rotational components. The solid-like DoS is treated using the harmonic oscillator model, whereas the gas-like DoS is described using the Enskog hard sphere (HS) theory for translation and the rigid rotor (RR) model for rotation. The entropy is calculated by integrating the DoS, weighted by the corresponding weighting functions W,

where k = trn, rot, or vib and the weighting functions are given as

where β = (kBT)1 and h is the planck constant.105

Protein conformational entropy

We calculated the protein conformational entropy using the program PDB2ENTROPY.67 This program is based on the nearest-neighbor approach from probability distributions of torsion angles at a given temperature relative to uniform distributions. The entropy is computed using the Maximum Information Spanning Tree (MIST) approach.68,69

All supplemental figures described in the article (PDF)

Code Availability

The implementations of Machine learning protocols are publicly available in our group GitHub webpage (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder)

Acknowledgements

We sincerely thank D. E. Shaw research for providing us with access to long simulation trajectories of monomeric alpha-synuclein in presence of fasudil which seeded the project. We acknowledge support of the Department of Atomic Energy, Government of India, under Project Identification No. RTI 4007. JM acknowledges Core Research grants provided by the Department of Science and Technology (DST) of India (CRG/2023/001426).