An Integrated Machine Learning Approach Delineates Entropy-mediated Conformational Modulation of α-synuclein by Small Molecule

Sneha Menon; Subinoy Adhikari; Jagannath Mondal

doi:10.7554/eLife.97709.1

eLife assessment

This study describes the application of machine learning and Markov state models to characterize the binding mechanism of alpha-Synuclein to the small molecule Fasudil. The results suggest that entropic expansion can explain such binding. However, the simulations and analyses in their present form are inadequate.

https://doi.org/10.7554/eLife.97709.1.sa3

Strength of evidence

inadequate: Methods, data and analyses do not support the primary claims

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

The misfolding and aggregation of intrinsically disordered proteins (IDPs) such as α-synuclein (αS) underlie the pathogenesis of various neurodegenerative disorders. However, targeting αS with small molecules is challenging because its disordered structure lacks defined ligand-binding pockets. Here, we implement a deep artificial neural network based machine learning approach that statistically distinguishes the fuzzy ensemble of conformational substates of αS in neat water from those in aqueous fasudil (the small molecule of interest) solution. In particular, the presence of fasudil in the milieu either modulates pre-existing states of αS or gives rise to new conformational states of αS, akin to an ensemble-expansion mechanism. The ensembles display strong conformation dependence in their residue-wise interactions with the small molecule. A thermodynamic analysis indicates that the small molecule modulates the structural repertoire of αS by tuning the protein backbone entropy while leaving the entropic ordering of the surrounding solvent unperturbed. Together, this study sheds light on the intricate interplay between small molecules and IDPs, offering insights into entropic modulation and ensemble expansion as key biophysical mechanisms driving potential therapeutics.

Introduction

Intrinsically disordered proteins (IDPs) comprise a class of proteins that lack a unique, well-defined three-dimensional structure under physiological conditions. Defying the classical structure-function paradigm,^1,2 they are involved in crucial cellular processes such as regulation of cell-cycle signalling and act as hubs in interaction networks, mainly owing to their promiscuous binding.^3–5 Emerging studies present disordered proteins as an important component of biomolecular condensates that play regulatory roles in several key cellular functions.^6–9

Considering the broad range of functionality of IDPs, their dysregulation and altered abundance can lead to a number of diseases including cancer, diabetes, cardiovascular and neurodegenerative disorders.^1,2,10 In rational structure-based drug design, which is typically applied to folded proteins, exploration of the target protein is a crucial step, on the basis of which small molecules are optimized to form stabilising interactions with a structured ligand-binding pocket.^11,12 IDPs, however, are considered undruggable since they exist as an ensemble of conformations, which precludes the use of these traditional drug-design strategies.¹³ Nevertheless, a few monomeric disordered proteins such as the oncogenic transcription factor c-Myc,¹⁴ Abeta42,^15–19 Tau,²⁰ PTP1B²¹ and osteopontin²² have been successfully targeted by small molecules.²³

The IDP of interest in this article, alpha-synuclein (αS), has remained an important drug target^24–27 owing to its long-standing association with neurodegenerative diseases. The early stages of αS aggregation into small oligomers are often implicated in the causation of Parkinson’s disease,^28–31 and a potential therapeutic strategy has been the stabilization of αS in its soluble monomeric form. In particular, the small molecule fasudil (the focus of the present investigation) has been shown to interact with αS and retard its aggregation.²⁶ The anti-aggregative effect of fasudil on αS assembly was tested both in vivo and in vitro, and it was found to significantly reduce αS aggregation in both scenarios. Characterisation of the effect of small-molecule binding on the ensemble properties of IDPs, together with the molecular details of the stabilizing interactions, can provide the basis for selectivity and can be calibrated to induce the desired effect to alleviate disease conditions. In particular, a recent effort by Robustelli and coworkers³² produced an unprecedentedly long (1500 µs) all-atom molecular dynamics (MD) simulation trajectory of the small-molecule drug fasudil with monomeric αS, in which the predicted protein-fasudil interactions were found to be in good agreement with previously reported NMR chemical shift data. However, a routine comparison of the monomeric αS ensemble in neat water³³ with that in the presence of fasudil does not bring out a significant difference; this lack of contrast is a customary signature of the dynamic IDP ensemble.

Characterization of the peculiar nature of IDPs such as αS using conventional experimental and computational approaches has proven to be challenging. IDPs are best described by an ensemble of conformations potentially sampled by the protein along with their associated statistical/thermodynamic weights. While biophysical methods can provide an ensemble-averaged conformation of an IDP, computer simulations can extensively sample the conformational ensembles of IDPs at atomic resolution. However, biophysical methods cannot provide an ensemble view, and the force fields used in computer simulations have inherent limitations in modelling IDPs in agreement with experiments. Integrative approaches are therefore gaining importance in IDP studies^33–37 and have also been employed in studies of small-molecule binding to IDPs.^19,32 In recent years, Markov State Models (MSMs) have been exploited to elucidate a states-and-rates view of biomolecular dynamics, i.e. finding the kinetically relevant states and the transition rates between them.^38–40 The first step in building a kinetic model (MSM) of any complex biomolecular ensemble is to identify a suitable metric or collective variable (CV) that efficiently captures its conformational space. CVs such as torsion angles, distances, chemical shifts and so forth can characterize complex molecular motion, but only in a large number of dimensions, which makes it challenging to analyze trajectories generated from MD simulations effectively. To address this, dimension reduction is performed to obtain a set of slow variables that can then be clustered into kinetically discrete metastable states. Since the high-dimensional input features describing the heterogeneous conformational landscape of IDPs can have non-linear relationships, non-linear dimension reduction techniques are most suitable. Recent advances in the field of machine learning have shown better results in capturing such non-linear relationships.⁴¹ In this study, we employed a deep neural network, specifically a β-variational autoencoder,^42,43 to perform dimension reduction and further utilized the machine-learnt latent dimensions to build MSMs in order to better understand the conformational landscape of IDPs in the presence and absence of a small molecule.

In the current work, to elucidate the small-molecule mediated modulatory effect on the αS ensemble, we harnessed a deep-learning based β-Variational AutoEncoder (β-VAE) and Markov State Models (MSMs) to dissect the αS ensemble in water and in aqueous fasudil solution. A projection of the atomistically simulated conformational ensemble of αS in its apo state and in the presence of fasudil along their respective latent spaces describes the conformational landscape, which is further refined by developing individual MSMs. Our results reveal that, compared to the macrostates identified in water, the presence of fasudil leads to an increase in the number of metastable states. Characterisation of the small-molecule interaction with the protein indicates that fasudil interacts differentially with each of these macrostates. The macrostates exhibit clear distinctions, as evident from their C_α contact maps, which were further analysed using a convolutional VAE. An entropic perspective at the global (whole protein) and local (residue) level reveals that fasudil binding can potentially trap the protein and that the resultant states may be disordered or ordered in nature. The effect is mainly manifested at the backbone level, with a decrease in entropy at the fasudil binding hotspots in some states. The observations reported in this study demonstrate how the binding of a small molecule, fasudil, modulates the conformational properties of the disordered protein αS, manifested as a change in the backbone entropy of the fasudil binding hotspots.

Results and Discussion

1. Modulation of free energy landscape of αS in presence of fasudil

Simulations of αS monomer in the presence of the small molecule fasudil and 50 mM NaCl have been reported in a previous study by Robustelli et al.³² That study showed no large-scale differences between the bound and unbound states of αS. In our study, in order to compare the αS-fasudil ensemble with an apo ensemble at the same salt concentration, we spawned multiple all-atom MD simulations of αS in water with 50 mM NaCl from multiple starting conformations to generate a cumulative ensemble of ∼62 µs.

In a clear departure from the classical view of ligand binding to a folded globular protein, the visual change in αS ensemble due to the presence of small molecule is not so strikingly apparent. In order to understand the underlying ensemble modulatory effect of the small-molecule binding events, the complex and fuzzy conformational ensemble of αS (see Figure 1a) needs to be delineated using a suitable spatial and temporal decomposition into its key sub-ensembles or metastable states. This prompted us to analyse the two ensembles using the framework of Markov State Model (MSM). As a suitable input feature to build the MSM, we estimated the set of inter-residue Cα pairwise distances, as this feature largely incorporates the conformational space of αS. In a standard MSM analysis, this high-dimensional input feature requires further processing such as dimensionality reduction, before discretisation using clustering for the construction of an MSM.
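The input feature described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the Cα coordinates are already available as an array and that every residue pair is used; the function name and the array layout are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def pairwise_distance_features(ca_xyz):
    """Flatten the upper triangle of per-frame Ca-Ca distance matrices.

    ca_xyz: array of shape (n_frames, n_residues, 3), coordinates in nm.
    Returns an array of shape (n_frames, n_residues * (n_residues - 1) // 2).
    """
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    # Difference vectors between every residue pair in every frame.
    diff = ca_xyz[:, :, None, :] - ca_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep each unordered pair once (i < j), dropping the zero diagonal.
    i, j = np.triu_indices(ca_xyz.shape[1], k=1)
    return dist[:, i, j]

# Toy input: 5 random frames of a 140-residue chain (the length of alpha-synuclein).
rng = np.random.default_rng(0)
features = pairwise_distance_features(rng.normal(size=(5, 140, 3)))
print(features.shape)  # (5, 9730)
```

With 140 residues, the full set of pairs already gives 9730 dimensions per frame, which is why a reduction step precedes clustering.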

(a) Conformational ensemble view of αS and αS-Fasudil ensemble, (b) A schematic of β-Variational Autoencoder (β-VAE), (c) Training and validation loss for β-VAE and (d) RMSE as a function of the β parameter. The red-annotated β parameter was used for the investigation

We started our investigation by searching for an optimized representation of the high-dimensional and heterogeneous ensemble of monomeric αS conformations in the presence of fasudil. As datasets, particularly in the context of IDPs, become larger and more complex, the need for effective dimensionality reduction techniques becomes more conspicuous. Dimension reduction using principal component analysis, independent component analysis,⁴⁴ singular value decomposition⁴⁵ and linear discriminant analysis⁴⁶ linearly transforms the high-dimensional data onto a lower-dimensional manifold. However, non-linear methods such as kernel PCA,⁴⁷ multidimensional scaling,⁴⁸ isomap,⁴⁹ fastmap,⁵⁰ locally linear embedding⁵¹ and others⁵² have outperformed such linear transformation methods and have been used for free energy calculations⁵³ and for determining the timescales of slow conformational changes.^54,55 In the present work, we drew on an artificial deep neural network based framework to employ a data-agnostic and mostly unsupervised non-linear approach for deriving an optimized latent feature space that can be used for statistical state-space decomposition of an IDP such as αS.

In recent years, unsupervised machine learning algorithms such as autoencoders have been effective as dimension reduction tools, owing to the non-linear activations of their neurons, which allow them to capture complex relationships and non-linear patterns in high-dimensional data.^41,56 The high-dimensional input is compressed into a lower-dimensional latent space, which is then used to reconstruct the input. However, this latent space is “deterministic”, i.e. it provides only a fixed mapping of the input to the latent space, which might not account for variations or uncertainty present in the data. Consequently, it could generate an inaccurate result for a point that does not exist in the latent space. Thus, when projecting new, similar data onto the latent space, a deterministic autoencoder will map similar inputs to nearly identical points, potentially lacking diversity and capturing only a single mode of the data distribution. In contrast, the latent space of a Variational Autoencoder (VAE) is “probabilistic” in nature, thereby avoiding this limitation of autoencoders. The technical details are documented in the Methods section; here we provide a brief rationale for the choice of the β-VAE model in this work.

The variational autoencoder (VAE)^42,43 (Figure 1b) is also an unsupervised machine learning algorithm; although closely related to the autoencoder, it is inextricably linked with variational Bayesian methods. A VAE consists of an encoder network and a decoder network, much like a traditional autoencoder. The encoder takes input data and maps it to a probability distribution in the latent space, while the decoder reconstructs the input data from samples drawn from this distribution. The probabilistic nature of the latent space is a result of applying variational inference, which aims to estimate the actual posterior using an approximate distribution parameterized by a mean vector and a variance vector. This estimation is achieved by minimizing the Kullback-Leibler (KL) divergence (also known as relative entropy or information gain) between the two distributions. As a result, instead of producing a single point in the latent space for each input, the encoder outputs a distribution. The latent variable is sampled from this distribution and is then used by the decoder to reconstruct the input. This makes the latent space continuous, thereby improving its ability to interpolate to novel, unseen data. The VAE model can be further improved by enhancing the probability of generating actual data while keeping the distance between the true and the posterior distribution under a small threshold. This results in the weighting of the KL divergence term in the loss function. The factor determining this weight is represented by β, hence the name β-VAE, which is used in this study. The implementation of the β-VAE is publicly available on our group’s GitHub page (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder). The loss function of β-VAE is given as,
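
In its commonly quoted form, which is stated here as an assumption and is not necessarily the exact expression used in this work, the β-VAE objective for an encoder $q_\phi(z \mid x)$, a decoder $p_\theta(x \mid z)$ and a prior $p(z)$ reads

$$
\mathcal{L}(\theta, \phi; x) = -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + \beta\, D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big).
$$

For a diagonal Gaussian encoder with mean $\mu_j$ and variance $\sigma_j^2$ in latent dimension $j$, and a standard normal prior, the KL term has the closed form

$$
D_{\mathrm{KL}} = -\tfrac{1}{2} \sum_j \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big).
$$

The first term rewards faithful reconstruction, the second pulls the encoded distribution towards the prior, and β sets the balance between the two.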

The β-VAE model was trained over a number of β values ranging between 5 and 1×10⁻¹⁵. A β value of 1×10⁻¹² was chosen as this value gave us the minimum RMSE (see Figure 1 d). This model was further used to project the αS data in its apo state for further analysis.
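
To make the role of β concrete, the following is a minimal NumPy sketch of a β-weighted VAE loss. It assumes a mean-squared-error reconstruction term and a diagonal Gaussian encoder with a standard normal prior; the function name and these choices are illustrative assumptions and may differ from the authors' implementation.

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, log_var, beta):
    """Reconstruction error plus beta-weighted KL term, averaged over samples."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    mu, log_var = np.asarray(mu, float), np.asarray(log_var, float)
    # Mean squared reconstruction error per sample (assumed reconstruction term).
    recon = np.mean((x - x_hat) ** 2, axis=1)
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I) per sample.
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + beta * kl))
```

With a weight as small as the β of 1×10⁻¹² quoted above, the KL contribution is numerically negligible unless the KL term itself is enormous, so such a model is driven almost entirely by reconstruction accuracy.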

In particular, as described in detail in the Methods section, we trained the β-VAE on a large array of inter-residue pairwise distances. Our attempt to reconstruct the high-dimensional input feature via the β-VAE resulted in compression of the 1.5 millisecond MD simulation data into a two-dimensional latent space. The loss function steadily decreased and eventually plateaued, providing a model optimized for this system (see Figure 1c). The rugged nature of the free energy landscape of αS is evident from the projection of the simulated trajectory along the latent space dimensions, either in aqueous media (see Figure 2a) or in the presence of fasudil (see Figure 2b). The free energy landscape features a large number of spatially close local minima, representative of the energetically competitive conformations inherent in αS, which are also the source of the conformational heterogeneity and a hallmark of an IDP (see Figure 2a-b). Next, we examined how the conformational landscape changes for the apo αS conformations. Subsequent projection of the apo state onto the low-dimensional subspace also brought out a spatially heterogeneous landscape.
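
A free energy landscape of this kind is typically obtained by Boltzmann inversion of the latent-space density, F = -k_B T ln P. The sketch below assumes that approach; the bin count, the 300 K temperature and the function name are illustrative assumptions, not values reported in the text.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann (gas) constant in kJ/(mol K)

def free_energy_surface(z, bins=50, temperature=300.0):
    """Return F = -kT ln P on a 2D grid, shifted so that min(F) = 0.

    z: array of shape (n_frames, 2) holding the two latent coordinates.
    """
    z = np.asarray(z, dtype=float)
    counts, xedges, yedges = np.histogram2d(z[:, 0], z[:, 1], bins=bins)
    prob = counts / counts.sum()
    free_energy = np.full(prob.shape, np.inf)  # unvisited bins stay at +inf
    visited = prob > 0
    free_energy[visited] = -KB * temperature * np.log(prob[visited])
    free_energy -= free_energy[visited].min()
    return free_energy, xedges, yedges
```

Because the estimate depends on the histogram resolution, the apparent ruggedness of such a surface should be checked against a few different bin counts.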

The free energy landscape of (a) αS and (b) αS-Fasudil system as determined from the latent dimensions of β-VAE

However, a close comparison of the free energy landscapes indicates that a set of local minima representing conformations of αS are less clustered and appear in small patches in the apo state. Moreover, a relative comparison of the two free energy landscapes suggests that αS spans a significantly larger space in the presence of fasudil even in this low-dimensional subspace, hinting at an expansion of the conformational repertoire of this IDP in the presence of the small molecule (see Figure 2a-b). Individual one-dimensional projections of the conformational landscape along each of the two latent dimensions reveal that the presence of the small molecule shifts part of the conformational landscape to a distinct location, suggesting that fasudil modulates the existing conformational ensemble and also creates new conformational space. For a more discrete and clearer state-space decomposition of the conformational ensemble of αS (in apo or in ligand-bound form), we attempt to enumerate the optimum number of distinct metastable macrostates with non-negligible equilibrium populations by constructing MSMs, as explained in the following section.

2. Markov State Model elucidates distinct binding competing states of αS in presence of the small-molecule drug

The two-dimensional feature obtained from the β-VAE was used as the input feature for the MSM estimator provided by PyEMMA.⁵⁷ A flowchart of the implemented protocol is shown in Figure 3a. A geometric clustering approach, k-means, was used to discretise the ensemble. Analysis of the implied timescales as a function of lag time predicted a Markovian model of αS in water with three kinetically separated states (see Figure 3d for relative populations). On the other hand, a similar analysis of αS in the presence of fasudil (Figure 3c) indicated a clear increase in the number of spatially and temporally resolved conformational metastable states. Thus, a three-state MSM was built for the αS ensemble in water whereas a six-state model was built for the αS-fasudil ensemble. Hereafter, we refer to the macrostates found in the presence of fasudil with the prefix FS and those found in water with the prefix MS. A bootstrapping analysis was performed to estimate the mean and standard deviations of the equilibrium populations of these metastable states (Figure 3d).
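
The implied-timescale analysis mentioned above can be illustrated without any MSM package. The sketch below is plain NumPy and is not the PyEMMA pipeline used in the study; it assumes a single discrete trajectory of cluster labels, uses a simple symmetrised count matrix as a crude stand-in for a reversible estimator, and assumes every state is visited. The function names are hypothetical.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Row-normalised transition counts at the given lag (in frames, lag >= 1)."""
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Count every observed jump i -> j separated by `lag` frames.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    # Symmetrise the counts: a crude way to enforce detailed balance.
    counts = 0.5 * (counts + counts.T)
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(tmat, lag):
    """t_i = -lag / ln(lambda_i) for the positive non-stationary eigenvalues."""
    eigvals = np.sort(np.linalg.eigvals(tmat).real)[::-1]
    slow = eigvals[1:]          # drop the stationary eigenvalue (equal to 1)
    slow = slow[slow > 0]       # negative eigenvalues have no real timescale
    return -lag / np.log(slow)
```

Repeating the calculation over a range of lag times and looking for a plateau in the slowest timescales, together with the gap separating them from the faster ones, is what motivates the choice of lag time and of the number of metastable states.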

(a) A flowchart of the process of building a Markov State Model (MSM). (b, c) Implied timescales plots of the αS and αS-Fasudil systems. (d) Macrostate populations of the 3-state and 6-state MSMs of the αS and αS-Fasudil systems. Bootstrapping was used to estimate the mean and standard deviations of the macrostate populations.

The MSM was further validated for Markovianity, i.e. the ability to predict estimates at longer timescales, using the Chapman-Kolmogorov (CK) test, depicted in Figure S1. This test confirms that the model estimates long-timescale behavior with reasonable accuracy. The MSM analyses essentially indicate that the small-molecule binding interactions with αS lead to the sampling or generation of conformational states in addition to those populated in the apo αS ensemble.
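
In its standard textbook form, the Chapman-Kolmogorov test compares the transition matrix estimated directly at a lag time of $k\tau$ with the prediction obtained by propagating the model built at lag time $\tau$:

$$
T(k\tau) \approx \big[T(\tau)\big]^{k}, \qquad k = 1, 2, 3, \dots
$$

Agreement within statistical error over increasing $k$ indicates that the dynamics between the chosen states are approximately memoryless at lag time $\tau$. In practice the comparison is usually made on the probabilities of remaining in, or moving between, the metastable sets.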

A pertinent question arises: is the appearance of new states of αS in the presence of fasudil a result of the longer trajectory in fasudil solution (1500 µs) compared with that of the ensemble populated in neat water (62 µs)? We tested this by building an MSM from 60 µs of data of the fasudil system, using the segment of the trajectory between 10 µs and 70 µs (see Figure S2). Interestingly, the MSM built using less data (and the same amount of data as in water) also indicated the presence of six states of αS in the presence of fasudil, as was observed in the MSM of the full trajectory. Together, this exercise invalidates the sampling argument and suggests that the increase in the number of metastable macrostates of αS in fasudil solution relative to that in water is a direct outcome of the interaction of αS with the small molecule.

3. Structural characterisation of metastable states of αS monomer in presence of fasudil

We characterised the residue-wise intramonomer contact maps of the metastable states to identify the differences in the interaction patterns in these states that can be attributed to interactions of fasudil with the monomer. The average inter-residue contact probability maps for each of the metastable states populated in the presence of fasudil and in water are depicted in Figure 4 and Figure 5, respectively. The residue-wise percentage secondary structure in each of the macrostates is presented in Figures S3 and S4, respectively.

Intrapeptide residue-wise contact probability maps of (a) six macrostates (FS1 to FS6) in the presence of fasudil. Axes denote the residue numbers. The color scale for the contact probability is shown at the extreme right of each panel of maps. The color bar along the axes of the plots represents the segments in the αS monomer. Specific contact regions are marked by boxes and numbered. These contacts are illustrated by representative snapshots and the corresponding contacts are similarly marked and numbered.

Intrapeptide residue-wise contact probability maps of the three macrostates (MS1 to MS3) in neat water. Axes denote the residue numbers. The color scale for the contact probability is shown at the extreme right. The color bar along the axes of the plots represents the segments in the αS monomer. Specific contact regions are marked by boxes and numbered. These contacts are illustrated by representative snapshots and the corresponding contacts are similarly marked and numbered.

In state FS1, antiparallel β-sheet interactions are formed within the N-terminal region. This β-sheet network also includes long-range interactions of the H2 region of the N-terminal with the negatively charged C-terminal region. Residues 70-80 in the hydrophobic NAC region are also involved in antiparallel β-sheet interactions with residues 120-130 in the C-terminal region. Short parallel β-sheet interactions also exist between the N-terminal residues and both the NAC and C-terminal residues. The residues in the C-terminal exhibit 40-90 % β-sheet propensity (see Figure S3). Long-range interactions between the N-terminal and C-terminal are relatively diminished in state FS2. This state has multiple parallel β-sheet interactions between short stretches of residues in the NAC and C-terminal regions and the N-terminal region. In addition, antiparallel β-sheet interactions are also present in NAC:N-terminal, NAC:C-terminal and within the N-terminal region. Residues 8 to 13 in the N-terminal and 86 to 92 in the NAC region exhibit significant helicity. State FS3 is characterised by ∼20 % of β-sheet interactions across the entire protein. Hydrophobic and polar, uncharged residues in the range 61 to 80 in the NAC region are involved in β-sheet interactions with the N-terminal (residues 1 to 10) and C-terminal (100 to 120). Moreover, β-sheet interaction networks are also present within the NAC and N-terminal regions. The C-terminal region 100 to 120 forms long-range interactions with residues 1 to 10 in the N-terminal region. Extensive short- and long-range β-sheet networks are prevalent in state FS4. The entire NAC region forms an antiparallel β-sheet network with the N-terminal. The C-terminal forms parallel β-sheets with the NAC and N-terminal. In the most populated state FS5, the key interactions are β-sheets formed between the oppositely charged termini. The NAC region also exhibits α-helical propensity. Finally, in state FS6, the NAC region (residues 60 to 80) forms an antiparallel β-sheet network with residues 30 to 60 of the N-terminal segment as well as the C-terminal region 100 to 120. These residues have ∼40 % β-sheet propensity.

On the other hand, the metastable states that are populated in neat water are distinct from those formed in the presence of fasudil. In state MS1, the NAC region residues 61 to 80 interact with the N-terminal region through both parallel and antiparallel β-sheet interactions. The β-sheet network of NAC also incorporates interactions with residues 100 to 130 in the C-terminal region. These residues have ∼60 % β-sheet propensity (see Figure S4). State MS2, the most populated macrostate in neat water, is characterised by β-sheet interactions of the NAC region with the H2 region (residues 30 to 60) of the N-terminal. Furthermore, the N- and C-termini also form β-sheet interactions. State MS3 mainly has long-range β-sheet interactions between the N- and C-terminal regions. Moreover, a β-sheet network also exists within the NAC region and at the NAC:N-terminal (H2) interface. The residues in the NAC region have ∼50 % β-sheet propensity in this macrostate.

We note that the differences in the structural features, i.e. the intra-protein contacts, of the macrostates arise from the interactions of fasudil with the protein residues. For instance, in state FS2, in which fasudil interacts with multiple sites across the entire length of the protein, the secondary structure properties are dictated by β-sheets formed by short stretches of residues rather than by an extensive network. Furthermore, owing to the relatively stronger small-molecule binding interactions with hydrophobic residues in the N-terminal, the long-range interactions of the oppositely charged terminal regions are suppressed. Similarly, in state FS4, since fasudil primarily interacts with residues 120 to 140 of the C-terminal and other interactions are transient, the conformational properties of this metastable state are marked by a network of β-sheets across the rest of the protein. Essentially, small-molecule binding interactions modulate the conformational properties of αS by stabilising various intra-protein interactions.

4. Mapping the macrostates in αS and αS-Fasudil ensembles using Denoising Convolutional Variational Autoencoder

To ascertain the distinctions among these contact maps for the αS and αS-Fasudil systems, we chose to filter the essential features and patterns by representing them in a lower-dimensional space. Initially, this was attempted using a Convolutional Variational AutoEncoder (CVAE) with the contact maps of the αS and αS-Fasudil macrostates as inputs. The model was initially trained using the six contact maps of the αS-Fasudil system as a test case.

However, the model displayed a tendency to overfit by learning an identity function between the input and output, as evident from the large reconstruction error on the test dataset (see Figure S5 (a)). This overfitting issue was addressed by training a Denoising CVAE (DCVAE), in which the contact maps were intentionally corrupted with noise prior to training. This approach successfully mitigated the overfitting, as indicated by the loss profile of the test dataset (see Figure S5 (b)). An implementation is provided on GitHub (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder).
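
The corruption step of a denoising autoencoder can be sketched as follows. The noise model is not specified in this section, so additive Gaussian noise with a standard deviation of 0.1 is an assumption made purely for illustration, as is the function name.

```python
import numpy as np

def corrupt_contact_map(contact_map, noise_std=0.1, seed=None):
    """Return a noisy copy of a contact-probability map, clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    contact_map = np.asarray(contact_map, dtype=float)
    # Additive Gaussian noise (assumed noise model); the input is left untouched.
    noisy = contact_map + rng.normal(0.0, noise_std, size=contact_map.shape)
    # Contact probabilities must stay within their physical range.
    return np.clip(noisy, 0.0, 1.0)
```

During training the corrupted map is fed to the encoder while the loss is computed against the clean map, so that simply copying the input no longer minimises the loss.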

The denoising performance was measured by calculating the structural similarity index measure (SSIM) between the original and denoised contact maps. SSIM measures the similarity between two images as perceived by the human eye by considering the luminance, contrast and structure of the images. SSIM values range from -1 to 1, representing perfect anti-correlation and perfect similarity, respectively.⁵⁸ The SSIM values are tabulated in Figure 6. We find that the SSIM values for all the metastable states of the αS and αS-Fasudil systems are close to 1 (see Figure 6), indicating that the denoised contact maps obtained from the trained DCVAE model are very close in quality to the original contact maps (see Figures S6-S8). This is also supported by the high peak signal-to-noise ratio (PSNR). PSNR is the ratio of the maximum possible power of a signal to the power of the corrupting noise; it measures the fidelity of a reconstructed image compared to its original in decibels (dB).⁵⁸ A higher PSNR indicates a smaller difference between the original and denoised contact maps (see Figure 6). Upon visualizing the latent space, it became evident that the contact maps of the metastable states of αS and αS-Fasudil occupy distinct locations within the latent space, which indicates that the DCVAE model effectively captured the distinct patterns among the metastable states.
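
Both metrics are straightforward to compute. The sketch below is a simplified illustration: PSNR follows its usual definition, whereas the SSIM shown here is evaluated once over the whole image rather than averaged over local windows as in common image-processing libraries, so its values will differ somewhat from those reported. The function names, the data range of 1 and the stabilising constants 0.01 and 0.03 (the conventional defaults) are assumptions.

```python
import numpy as np

def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def global_ssim(reference, test, data_range=1.0):
    """SSIM evaluated over the whole image as a single window."""
    x, y = np.asarray(reference, float), np.asarray(test, float)
    # Small constants that stabilise the ratios when means or variances vanish.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

For contact-probability maps the natural data range is 1, since every entry lies between 0 and 1.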

(a) Schematic of denoising convolutional variational autoencoder (b) Latent space of the contact map of αS and αS-Fasudil metastable states. (c) SSIM and PSNR values for αS and αS-Fasudil metastable states.

5. Fasudil exhibits conformation-dependent interactions with individual metastable states

We next sought to understand the specificities of the interactions of fasudil with the different metastable states of αS. Accordingly, we calculated the contact probabilities between fasudil and each αS residue. A contact between fasudil and a protein residue is considered to be present when the minimum distance between any heavy atom of fasudil and a protein residue is less than 0.6 nm. This is calculated for the entire population of each macrostate to determine the contact probabilities. The αS-Fasudil contact maps for the six macrostates are presented in Figure 7a and snapshots of the interactions are presented in Figure 7b.
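
The contact criterion described above translates directly into code. The sketch below assumes heavy-atom coordinates are already available as NumPy arrays, that each residue has at least one atom, and that periodic-image corrections have been applied beforehand; the function name and array layout are hypothetical.

```python
import numpy as np

CUTOFF_NM = 0.6  # contact cutoff quoted in the text

def residue_contact_probability(ligand_xyz, protein_xyz, residue_index,
                                n_residues, cutoff=CUTOFF_NM):
    """Fraction of frames in which the ligand contacts each residue.

    ligand_xyz:    (n_frames, n_ligand_atoms, 3) heavy-atom coordinates, nm
    protein_xyz:   (n_frames, n_protein_atoms, 3) heavy-atom coordinates, nm
    residue_index: (n_protein_atoms,) 0-based residue id of each protein atom
    """
    ligand_xyz = np.asarray(ligand_xyz, float)
    protein_xyz = np.asarray(protein_xyz, float)
    residue_index = np.asarray(residue_index)
    # Distance of every protein atom to every ligand atom, frame by frame.
    diff = protein_xyz[:, :, None, :] - ligand_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # (frames, protein atoms, ligand atoms)
    atom_min = dist.min(axis=2)               # distance to the closest ligand atom
    prob = np.zeros(n_residues)
    for res in range(n_residues):
        # Minimum distance over all atoms belonging to this residue.
        res_min = atom_min[:, residue_index == res].min(axis=1)
        prob[res] = np.mean(res_min < cutoff)
    return prob
```

Applying the function separately to the frames assigned to each macrostate gives one contact profile per state.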

(a) Contact probability map of residue-wise interaction of fasudil with αS. (b) Overlay of representative conformations from the six macrostates (FS1 to FS6) along with the bound fasudil molecules. The conformations are colored segment-wise as shown in the legend. The fasudil molecules, in licorice representation, are colored yellow.

The residue-interaction profiles of each of the six individual macrostates of αS with fasudil revealed distinct motifs of interaction across the length of the protein for the different macrostates (Figure 7). Previous investigations^26,32 have reported that fasudil interacts with αS dominantly via the C-terminal region of the protein, while interacting weakly with the entire αS sequence. In state FS1, fasudil dominantly interacts with residues 133-139 of the C-terminal region, which consists of negatively charged residues D135 and E137 as well as the polar, neutral tyrosine residues (Y133 and Y136).

In addition, there are also non-specific interactions of fasudil with hydrophobic residues in the NAC region (residues 71 to 74). In state FS2, fasudil interactions are spread across the entirety of the protein. These comprise hydrophobic residues in the N-terminal region, neutral polar residues (62 to 66) in the NAC region and residues 130 to 136 in the C-terminal region. Fasudil interacts strongly with the NAC region consisting of hydrophobic and polar, neutral residues in state FS3. Interactions also prevail with the negatively charged C-terminal region and residues 1 to 6 of the N-terminal region. While interacting non-specifically with residues in the amphipathic N-terminal region of state FS4, fasudil preferentially interacts with residues 121-140 of the C-terminal region via charge-charge and π-stacking interactions with the negatively charged residues (D135, E137, E139) and tyrosine side chains (Y125, Y133, Y136). In the most populated state FS5, fasudil interacts weakly with the hydrophobic residues and glutamic acid residues, while interactions with the C-terminal region 107 to 140, comprising tyrosines and acidic amino acids, are dominant. Lastly, in state FS6, the strongest interactions of fasudil involve hydrophobic and polar residues in the range 118 to 121. There are also weak interactions with hydrophobic residues in the N-terminal (11 to 16, 38, 39) and NAC regions (71 to 74, 80 to 95).

Robustelli et al.³² have reported that in the bound state, fasudil interactions with αS are favorable charge-charge and π-stacking interactions that form and break in a mechanism they term dynamic shuttling. The stacking interactions arise from the favourable orientation of fasudil’s isoquinoline ring and the aromatic ring in the side chain of a tyrosine residue. Analysis of the time-series of the formation of different intermolecular interactions indicated the formation of charge-charge and π-stacking interactions with residues Y125, Y133 and Y136, and the shuttling among these interactions causes fasudil to remain localized to the C-terminal region. Notably, we observe that each of the metastable states, while preferentially interacting with the C-terminal region, also has distinct yet significant interactions with hydrophobic residues in the N-terminal and NAC regions.

6. Entropic Signatures of Small Molecule Binding

The overall analysis of the conformational ensemble of αS in presence of fasudil demonstrates that small-molecule binding substantially modulates the ensemble of αS. Our results indicate that the interaction of fasudil with αS residues governs the structural features of the protein. Estimation of the metastable states of αS shows that, compared to aqueous conditions in which three metastable states are identified, the ensemble populated in the presence of fasudil in solution contains six metastable states. Since IDPs are described as an ensemble, unlike folded proteins with a fixed structure, the effect of small-molecule binding on disordered proteins or regions is manifested as a modulation of the whole ensemble: the populations of the states shift, leading to a change in conformational entropy.^59–61 The commonly postulated mechanism is the entropic collapse or folding-upon-binding model.⁶² It describes the disorder-to-order transition that IDPs typically undergo upon interacting with their biological targets. This transition shifts the population to a more ordered, predominant state, causing a loss of conformational entropy that is compensated by a corresponding enthalpic gain. Conversely, small-molecule binding may also lead to entropic expansion, in which newer conformational states are populated that are otherwise undetectable.⁶⁰ Additionally, cases where there is no gross change in the conformational entropy and the binding is considered fuzzy³⁴ are referred to as an isentropic shift.

The observations reported in the present investigation for αS monomer plausibly conform with the entropy-expansion model. ⁵⁹ According to this model, the interactions of small molecules with IDPs can promote the exploration of conformations with discernible probability, thereby expanding the disorderedness (conformational entropy) of the ensemble. ^19,59,60,63 The small molecule interaction with αS operates transiently, and not strongly as seen in structured proteins. These transient local interactions stabilise the conformations that are inaccessible in the absence of the small molecule.

To gain deeper insights into the entropic effect of small-molecule binding to αS, we further estimated protein conformational entropy as well as water (solvent) entropy. We evaluated the entropy of water using the 2PT^64,65 method. First, we determined the molar entropy of pure water, which yielded a value of 55.5 J mol⁻¹ K⁻¹, close to the reported value of 54.4 J mol⁻¹ K⁻¹ from free energy perturbation calculation.⁶⁶ The molar entropy is marginally reduced in all the macrostates of both αS and αS-fasudil systems, with values ranging from 55.01 J mol⁻¹ K⁻¹ to 55.25 J mol⁻¹ K⁻¹; the difference from that of bulk water is negligible. Upon closer investigation of the translational and rotational components of the total water entropy, we find that the decrease of the translational component is more pronounced than that of the rotational component. This implies that αS, in its apo state or in the presence of fasudil, has a comparatively greater impact on the translational component than on the rotational component of the entropy of water (see Figure S9). Nonetheless, the entropy analysis suggests that the presence of fasudil has an insignificant impact on the entropy of water.

Next, we estimated the protein conformational entropy using the PDB2ENTROPY program⁶⁷ that implements the Maximum Information Spanning Tree (MIST) approach^68,69 on sets of torsional angles of the protein. The entropy is calculated from probability distributions of torsional degrees of freedom of the protein relative to a random distribution, i.e. one corresponding to a fully flexible protein chain. The total protein entropy values for the metastable states in water (MS1 to MS3) and fasudil-bound states (FS1 to FS6) are shown in Figure 8. While the conformational entropies of the ligand-free states, MS1 to MS3, range from -72 to -75 J mol⁻¹ K⁻¹, the entropies of the fasudil-bound states (FS1 to FS6) vary from -70 to -86 J mol⁻¹ K⁻¹, indicating that binding of fasudil to αS leads to a broadening of the span of entropy of these states. The highest entropies among the metastable states in the two environments correspond to FS5 and MS2, with values of -70.8 J mol⁻¹ K⁻¹ and -72.3 J mol⁻¹ K⁻¹, respectively. These states are also the most populated metastable states. MS1 and MS3 states have entropy values of -73.9 J mol⁻¹ K⁻¹ and -74.5 J mol⁻¹ K⁻¹, respectively. Notably, the presence of fasudil and its interactions with the protein chain can elicit the exploration of states that are more ordered in nature, i.e. that deviate increasingly from a random chain. The largest deviation from a fully disordered chain is attained by the state FS1 with an entropy value of -85.3 J mol⁻¹ K⁻¹, followed by FS2 with an entropy of -83.1 J mol⁻¹ K⁻¹. FS4 is also relatively more ordered than FS5, with an entropy of -76.2 J mol⁻¹ K⁻¹. FS3 and FS6 are nearly as disordered as FS5, with entropy values of -73.2 J mol⁻¹ K⁻¹ and -72.5 J mol⁻¹ K⁻¹, respectively. These observations of the total extent of disorderedness of the metastable states indicate that the interactions of fasudil with αS residues can restrict their conformational degrees of freedom, entrapping the protein in specific conformational states and manifesting as an overall decrease in the protein entropy of those states.

(a) Total Protein entropy of the metastable states of αS and αS-Fasudil calculated from the torsion angles, relative to a fully flexible chain (b) Total water entropy in αS and αS-Fasudil system and (c) Residue-wise backbone entropy within the αS and αS-Fasudil states.

Considering the diffuse binding of fasudil across several residues along the entire length of the protein, we further estimated and analysed how the entropic contributions are affected at the residue level (shown in Figure 8 and Figures S10, S11). The total residue-wise entropy presented in Figure S10 shows that the residues in the N-terminal region of fasudil-bound states FS1 and FS2 exhibit greater entropic fluctuations, i.e. more negative values than the apo metastable states (MS1 to MS3), suggesting more ordered or restricted fluctuations of these residues. We note that the propensity of fasudil binding to the N-terminal residues is higher in FS1 and FS2, relative to other states (see Figure 7). We further decoupled the residue entropy into backbone and sidechain components; this analysis is depicted in Figure 8 and Figure S11, respectively. Comparison of the sidechain entropies, computed from the χ torsion angles, for the unbound and bound states does not show significant alteration upon small-molecule binding. Conversely, increased fluctuations in residue backbone entropies are observed in response to small-molecule binding (see Figure 8). The metastable states, MS1 to MS3, populated in water have minimally fluctuating residue entropies amongst the states. In contrast, significant fluctuations in residue entropies are noted predominantly in the N-terminal region (residues 1 to 60) and to some extent in the NAC region for states FS1 and FS2. These residues in the N-terminal region in FS1 and FS2 experience a drop in backbone entropy owing to the restricted degrees of freedom from fasudil binding. The decrease in residue backbone entropy correlates with fasudil binding hotspots and therefore can be ascribed to small-molecule interactions. Hydrophobic residues in the NAC region show a decrease in entropy at ligand-binding locations, mainly in states FS1, FS2 and FS4. Intriguingly, however, the correlation between fasudil interactions and residue entropy is not observed in state FS3. Finally, in the negatively charged C-terminal region, which fasudil preferentially targets, residue-wise entropic fluctuations are minimal compared to the apo state. Robustelli et al. refer to the interaction of fasudil with the C-terminal region as dynamic shuttling, in which breaking and formation of interactions with nearby residues occur simultaneously. This dynamic shuttling mechanism could possibly minimise the backbone entropic fluctuations compared to the relatively specific binding in the N-terminal and NAC regions.

7. Exploring interplay of conformational entropy and Mean First Passage Time in Protein Conformational Transitions

The Mean First Passage Time (MFPT) for transitions among the metastable states provides valuable information for understanding the dynamics and conformational changes in the system. The MFPT is the average time it takes for a system to transition from one metastable state to another. A lower MFPT value suggests a more favourable transition between the states. The transition timescales for αS indicate that, in both the presence and absence of fasudil, the transition to the most populated state is preferred (see Figure 9). For the αS-Fasudil system, the transition to the most populated state takes place over timescales ranging from 74 µs to 766.5 µs, whereas for the apo state it occurs over 29.8 µs to 37.4 µs. Comparing these transition times with those to other macrostates shows that these timescales are the shortest. This observation suggests that the transitions to the MS2 and FS5 states, in the αS and αS-Fasudil systems respectively, are favoured over transitions to all other states.
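To make the MFPT definition concrete, the sketch below computes the mean first passage time into a chosen target state from a row-stochastic transition matrix by solving the usual linear system. It is a minimal illustration, not the procedure used for Figure 9: the function name `mfpt_to_target` and the toy 3-state matrix are hypothetical, and the numbers are made up for demonstration only.

```python
import numpy as np

def mfpt_to_target(T, target, lag_time=1.0):
    """MFPT from every state into `target` for a row-stochastic matrix T.

    Solves m_i = lag_time + sum_{j != target} T_ij * m_j for all i != target,
    with m_target = 0. The result is in the units of `lag_time`.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    others = [i for i in range(n) if i != target]
    # (I - T_restricted) m = lag_time * 1
    A = np.eye(n - 1) - T[np.ix_(others, others)]
    m = np.linalg.solve(A, np.full(n - 1, lag_time))
    out = np.zeros(n)
    out[others] = m
    return out

# Toy 3-state transition matrix (illustrative numbers, not from the paper)
T = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])
mfpt = mfpt_to_target(T, target=1, lag_time=1.0)
rates = np.divide(1.0, mfpt, out=np.zeros_like(mfpt), where=mfpt > 0)
```

The reciprocal `rates` corresponds to the 1/MFPT quantity that sets the arrow thickness in a rate network.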

The rate network obtained from MFPT analysis for the (a) 3-state MSM of the αS ensemble in neat water and (b) 6-state MSM of the αS-Fasudil ensemble. The sizes of the discs are proportional to the stationary population of the respective state. The thickness of the arrows connecting the states is proportional to the transition rate (1/MFPT) between the two states and the MFPT values are shown on the arrows. (c, d) Projection of the PCCA+ conformational sub-ensembles for the αS and αS-Fasudil systems in latent space, with the respective protein entropy values (in units of J mol⁻¹ K⁻¹) annotated on top. (e) Correlation between protein entropy and transition time to the major state for the αS and αS-Fasudil systems. The inset plot corresponds to the correlation between entropy and transition to the macrostate for the αS and αS-Fasudil systems.

While MFPT offers insights into the kinetics of the metastable states, the thermodynamic property, entropy, characterizes the diversity and disorder within the system. Although a state with a higher population might not always have the higher entropy, we observed that in our case the state with the highest population also has the highest protein entropy, for both the αS and αS-Fasudil systems. States with higher entropy are thermodynamically favored, which in turn can affect the rates of state transitions, potentially influencing MFPT values. To examine this, we compared the transition times of each metastable state to the most populated state with the entropy of the state, and found a strong degree of correlation (see Figure 9e). This suggests that a state with higher entropy, which also has a larger number of accessible microstates, potentially provides multiple pathways for other macrostates to access it, for both the αS and αS-Fasudil systems. However, compared to αS in its apo state, the apparent increase (∼ 2-20 times) in the time to transition to the major state in the αS-Fasudil system suggests that fasudil may trap the protein conformations in the intermediate states, thereby slowing down the exploration of the large conformational space by αS in the presence of fasudil.

Conclusions

The conformationally dynamic disordered proteins challenge the application of conventional structure-based drug-design methods that rely on the specificity of a binding pocket, based on which the drug molecule is designed and optimized. A general framework describes the effect of small-molecule binding to IDPs as a modulation of the disordered ensemble that increases or decreases its conformational entropy. Here, we characterised the effect of the small molecule, fasudil, on the conformational characteristics of the quintessential IDP, α-synuclein. Fasudil has been experimentally reported to curb the aggregation of αS both in vivo and in vitro. The ensemble view of long-time-scale atomic simulations of fasudil binding to αS monomer displays a fuzzy ensemble of αS with preferential binding of fasudil to the C-terminal region via charge-charge interactions and aromatic stacking. Using a deep neural-network based machine learning approach, the variational autoencoder, we reduced a high-dimensional feature, i.e. the Cα-pairwise distances, to a two-dimensional latent space. We built a Markov State Model using this data to delineate the metastable conformational states of αS and identify the differences that arise from small-molecule interactions. Comparison of the metastable states of αS in the presence and absence of the small molecule fasudil revealed that small-molecule binding reshapes the structural landscape of the protein. Importantly, a larger number of metastable states are populated, indicating that small-molecule mediated conformational modulation led to entropy expansion (see Figure 9d-e).

Atomic-level MD simulations have slowly emerged as a valuable tool for complementing experimental measurements of disordered proteins and providing detailed descriptions of their conformational ensembles.^70–73 With improvements in force fields^33,74,75 that model disordered proteins, the accuracy of MD simulations has dramatically increased, as assessed by their agreement with a wide variety of experimental measurements. Integrative approaches use a combination of experimental restraints such as chemical shifts, paramagnetic relaxation enhancement (PRE),^70,71,73,76 residual dipolar coupling constants (RDCs),^76–78 small-angle X-ray scattering (SAXS)^79,80 and computational models from MD simulations to generate a more accurate description of the ensemble of disordered proteins. More recent advances include statistical approaches such as Bayesian inference models like Metainference,³⁴ Experimental Inferential Structure Determination (EISD)^81,82 and Maximum Entropy formulations^83–85 that also take into account the experimental and back-calculation model errors and uncertainties. Such improved approaches to leverage MD simulations with improved force fields and experimental knowledge-based ensemble generation have shown tremendous potential for describing molecular recognition mechanisms of IDPs with other protein partners, in scenarios such as folding-upon-binding^62,86 or self-dimerisation. More importantly, a set of recent initiatives have focussed on molecular characterisation of the interaction of small molecules with IDPs via effective combination of atomistic simulation and biophysical experiments.^19,32

In this regard, the present investigation takes a step ahead via a quantitative and statistical dissection of the αS ensemble in presence of a small molecule. The implementation of a Markov state model in the present work brings out the feasibility of metastable states that would have a finite ‘stationary’ or equilibrium population in presence of the small molecule or in neat water. The uniqueness of these states is supported by the projection of the inter-residue C_α contact maps of the macrostates using the DCVAE model. Our systematic investigation of the impact of fasudil on the αS monomer ensemble reveals that small-molecule interaction alters the αS ensemble by tuning the entropy of these states. Insights from studies targeting structural ensemble modulation by small molecules can be effectively harnessed in devising drug-design strategies for aggregation diseases.

Computational Methods

Monomer Simulation Protocol

A 1500 µs long all-atom MD simulation trajectory of αS monomer in aqueous fasudil solution was generated by D. E. Shaw Research with GPU/Desmond in Anton supercomputer that is specially purposed for running long-time-scale simulations.³² The protein was solvated in a cubic water box with 108 Å sides, containing a 50 mM concentration of Na or Cl ions. Protein, water molecules and ions present in the system were parameterised with the a99SB-disp force field.³³ The small molecule, fasudil, was parameterised using the generalised Amber force field (GAFF). The production run of the system was performed in the NPT ensemble. Further details of the MD simulation can be obtained by referring to the original work by Robustelli et al.³² The extensive simulation trajectories were generously made available by D. E. Shaw Research upon request.

To generate the αS ensemble in water, we performed multiple simulations of αS monomer. The starting conformations of these simulations were randomly selected from a previously reported³³ multi-µs αS trajectory in water. The simulation details are described below. Each protein conformation was solvated in a cubic box of explicit water molecules, and Na or Cl ions were added to a concentration of 50 mM to maintain the same conditions as used in the simulations with fasudil. The all-atom a99SB-disp force field and water model³³ were used to simulate the protein and solvent. The unbiased MD simulations were performed using the Gromacs simulation package.⁸⁷ A timestep of 2 fs and the leap-frog integrator were used. The simulations were performed in the isothermal-isobaric (NPT) ensemble at an average temperature of 300 K, maintained using the v-rescale thermostat⁸⁸ with a relaxation time of 0.1 ps, and a pressure of 1 bar, maintained using the Parrinello-Rahman barostat⁸⁹ with a coupling time constant of 2 ps. The Verlet cutoff scheme⁹⁰ with a 1.0 nm cutoff was applied for Lennard-Jones interactions and short-range electrostatic interactions. Long-range electrostatic interactions were computed using the Particle-Mesh Ewald (PME) summation method⁹¹ and covalent bonds involving hydrogen atoms were constrained using the LINCS algorithm.⁹² All the systems were minimized using the steepest descent algorithm, followed by equilibration in the isothermal-isochoric (NVT) and subsequently in the NPT ensemble with position restraints on the heavy atoms of the protein. Twenty-three simulations, each starting from a different conformation, were performed. The simulation lengths are variable, ranging from 1 to 4 µs, and amount to a cumulative length of ∼62 µs.
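For readers who wish to set up a comparable production run, the sketch below collects the run parameters stated above into a GROMACS-style `.mdp` fragment, rendered from a Python dictionary. It is only an illustration: the actual input files are not given in the text, and the entries marked "assumed" (coupling groups, isotropic pressure coupling, compressibility) are placeholders that do not appear in the description above.

```python
# Production-run settings taken from the description above; entries marked
# "assumed" are NOT stated in the text and are illustrative placeholders.
MDP_SETTINGS = {
    "integrator": "md",                 # leap-frog integrator
    "dt": 0.002,                        # 2 fs timestep (in ps)
    "cutoff-scheme": "Verlet",
    "rvdw": 1.0,                        # Lennard-Jones cutoff (nm)
    "rcoulomb": 1.0,                    # short-range electrostatics cutoff (nm)
    "coulombtype": "PME",
    "constraints": "h-bonds",           # bonds involving hydrogen atoms
    "constraint-algorithm": "lincs",
    "tcoupl": "v-rescale",
    "tc-grps": "System",                # assumed
    "tau-t": 0.1,                       # ps
    "ref-t": 300,                       # K
    "pcoupl": "Parrinello-Rahman",
    "pcoupltype": "isotropic",          # assumed
    "tau-p": 2.0,                       # ps
    "ref-p": 1.0,                       # bar
    "compressibility": 4.5e-5,          # assumed (water, bar^-1)
}

def render_mdp(settings):
    """Render a dict of options as 'key = value' lines, one per option."""
    width = max(len(key) for key in settings)
    lines = [f"{key:<{width}} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"

mdp_text = render_mdp(MDP_SETTINGS)
```

Writing `mdp_text` to a file would give a starting point that still needs the run length, output frequencies and any system-specific options.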

Dimension reduction using β-Variational Autoencoder

The probability distribution p(x) of a multi-dimensional variable x can be determined using various approaches,^93,94 one of them being the latent variable model. In this model, the probability distribution is modeled as a joint distribution p(x, z) of the observable variable x and the hidden latent variable z, which is given as p(x, z) = p(x|z)p(z), where p(z) is the prior distribution over the latent variables z and p(x|z) is the likelihood of generating a true data point, given z. The distribution p(x) is determined by marginalization over all the latent variables, giving us,

$$p(x) = \int p(x|z)\,p(z)\,dz$$

However, this integral is often intractable, which constrains the calculation of the true posterior distribution p(z|x). This can be addressed using variational inference, which aims to estimate the true posterior with an approximate distribution q(z|x); the encoder part of the VAE parameterizes this distribution using a mean vector and a variance vector. The goal is to determine the q(z|x) that is closest to the true posterior by minimizing the Kullback-Leibler (KL) divergence between them, which is given as

$$\mathrm{KL}\left[q(z|x)\,\|\,p(z|x)\right] = \int q(z|x)\,\log\frac{q(z|x)}{p(z|x)}\,dz$$

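Upon expanding and rearranging, we have

$$\log p(x) - \mathrm{KL}\left[q(z|x)\,\|\,p(z|x)\right] = \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - \mathrm{KL}\left[q(z|x)\,\|\,p(z)\right] \tag{2}$$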

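where p(x) is the evidence. The VAE is trained so that we maximize the log-likelihood of the evidence and minimize the KL divergence between the estimated and true posterior. The loss function of the VAE is given by the negation of the right-hand side of equation 2,

$$\mathcal{L}_{\mathrm{VAE}} = -\mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] + \mathrm{KL}\left[q(z|x)\,\|\,p(z)\right]$$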

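The training objective of the VAE is to minimize this loss function so as to generate realistic data. The VAE model can be further improved by keeping the KL divergence between the approximate posterior q(z|x) and the prior p(z) below a small value, δ. This is achieved by maximizing the equation

$$\mathcal{F} = \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - \beta\left(\mathrm{KL}\left[q(z|x)\,\|\,p(z)\right] - \delta\right)$$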

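which modifies the loss function to

$$\mathcal{L}_{\beta\text{-VAE}} = -\mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] + \beta\,\mathrm{KL}\left[q(z|x)\,\|\,p(z)\right]$$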

where β is the Lagrangian multiplier (under the Karush-Kuhn-Tucker conditions). This enhances the probability of generating realistic data and is known as β-VAE.⁹⁵ The expectation term in the loss function of β-VAE requires a point z to be sampled from the posterior distribution. This is performed using the reparametrization trick

$$z = \mu + \sigma \odot \epsilon$$

where ε is sampled from a standard normal distribution. This makes the VAE model trainable using the backpropagation algorithm.
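The sampling step and the β-weighted loss described in this subsection can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation (which is available in the linked repository): the function names are hypothetical, the encoder is assumed to output a mean and a log-variance, the prior is assumed to be a standard normal, and a squared-error reconstruction term stands in for the negative log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparametrization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL[N(mu, sigma^2) || N(0, I)], summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def beta_vae_loss(x, x_recon, mu, log_var, beta):
    """Batch-averaged reconstruction error plus beta times the KL term.

    The squared-error reconstruction term is an assumption made for this
    sketch; other likelihoods (e.g. binary cross-entropy) are equally common.
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return float(np.mean(recon + beta * kl_to_standard_normal(mu, log_var)))

# Toy usage with a 2-dimensional latent space and a batch of 4 samples
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))
z = reparameterize(mu, log_var, rng)
```

Because the randomness enters only through `eps`, gradients with respect to `mu` and `log_var` are well defined, which is what makes the model trainable by backpropagation.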

In our work, we used C_α-C_α pairwise distances (excluding i, i + 1 and i + 2 distances), sampled every 9 ns from the entire 1.5 ms trajectory of αS in the presence of fasudil, as our input. We built our model using the TensorFlow backend⁹⁶ with the Keras library. Training was performed with 80% of the data while 20% was used for testing. We employed a symmetric VAE where the input layer consists of 9453 nodes followed by four hidden layers having 4096, 512, 128 and 16 nodes, respectively, and 2 nodes in the latent layer. Hyperbolic tangent (tanh) is used as the activation function, with weights initialized using the glorot_uniform⁹⁷ method, in all the layers except for the output layer, where we used sigmoid activation. Adaptive Moment Estimation (Adam⁹⁸) was used as the optimizer with a learning rate of 1×10⁻⁴. The model was trained for 300 epochs with a batch size of 64. The relevant Python implementations are provided on GitHub (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder).
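The input dimension of 9453 is consistent with a 140-residue chain once pairs closer than three residues in sequence are dropped, since 137 × 138 / 2 = 9453. The sketch below builds such a feature vector for a single frame with NumPy. The function name `ca_distance_features` is hypothetical and the coordinates are random placeholders rather than simulation data.

```python
import numpy as np

def ca_distance_features(ca_xyz, min_sep=3):
    """Flattened C-alpha distances for residue pairs with j - i >= min_sep.

    ca_xyz: array of shape (n_residues, 3) holding C-alpha coordinates of one
    frame. min_sep=3 drops the i/i, i/i+1 and i/i+2 pairs.
    """
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    n_res = ca_xyz.shape[0]
    i, j = np.triu_indices(n_res, k=min_sep)
    return np.linalg.norm(ca_xyz[i] - ca_xyz[j], axis=1)

# Placeholder coordinates for a 140-residue chain (random, for shape checking)
rng = np.random.default_rng(0)
fake_ca = rng.normal(size=(140, 3))
features = ca_distance_features(fake_ca)
print(features.shape)  # (9453,)
```

Stacking one such vector per frame gives the (n_frames, 9453) matrix that an encoder with a 9453-node input layer would consume.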

Building the Denoising Convolutional Variational AutoEncoder

The Denoising Convolutional Variational Autoencoder (DCVAE) model employs five convolutional layers which incorporate filters of sizes 16, 32, 64, 96, and 128, with corresponding kernels of size 3×3. Strides are set at 1, 2, 2, 2, 2, and 1 for efficient feature extraction. This is followed by six densely connected layers with neuron counts of 4096, 2048, 1024, 256, 64 and 16 in the successive dense layers. The latent layer is comprised of 2 neurons. The decoder mirrors the encoder architecture, to reconstruct back the input contact maps. Hyperbolic tangent activation is used in all the layers, excluding the output layer where sigmoid activation is used. Adam optimizer with a learning rate of 5×10⁻⁴ is used for training the DCVAE. The model was trained for 300 epochs with a batch size of 32 with β = 1×10⁻¹². Training was performed with 90% of the data while 10% was used for testing.

At first, we added noise to 200 replicas of the contact map for each of the states of αS and αS-fasudil. The noise was generated stochastically from a standard normal distribution, multiplied by a noise factor of 0.035 and added to each of the replicas. The dataset was then scaled so that the values range between 0 and 1 using the formula

$$x_{\mathrm{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where x_min and x_max are the minimum and maximum values of the input contact map. In order to assess the denoising performance, we calculated the structural similarity index measure (SSIM) values.⁵⁸ The SSIM value can range from -1 to +1, thereby indicating the structural dissimilarity or similarity between the original (x) and denoised (y) contact maps. Mathematically, this is expressed as

$$\mathrm{SSIM}(x, y) = \left[l(x, y)\right]^{\alpha}\,\left[c(x, y)\right]^{\beta}\,\left[s(x, y)\right]^{\gamma}$$

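where the functions l(x, y), c(x, y) and s(x, y) compare the luminance, contrast and structure between the contact maps and are given as

$$l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$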

The variables µ_i and σ_i represent the pixel sample mean and standard deviation of i, where i = {x, y}. σ_xy is the covariance of the x and y contact maps. The exponents α, β and γ control the weighted contribution to the SSIM value and are set to 1. The constants C₁ = (k₁L)² and C₂ = (k₂L)², where L is the dynamic range of the pixel values, are small positive values introduced for numerical stability, to prevent division by zero when the means and variances of the contact maps are close to zero. We have set k₁ = 0.01, k₂ = 0.03 and C₃ = C₂/2 as per convention.⁹⁹
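The noise injection, rescaling and SSIM comparison described above can be sketched as follows. This is a simplified illustration and an assumption on our part: SSIM is evaluated here over the whole map as a single window, whereas common implementations average it over local sliding windows, so the values will differ from those of a windowed SSIM. The function name `ssim_global` and the random "contact map" are hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM with alpha = beta = gamma = 1 and C3 = C2 / 2."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)
    cov_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return float(luminance * contrast * structure)

def min_max_scale(a):
    """Rescale an array so that its values span [0, 1]."""
    return (a - a.min()) / (a.max() - a.min())

# Placeholder map with values in [0, 1); not data from the paper
rng = np.random.default_rng(0)
clean = rng.random((140, 140))
noisy = min_max_scale(clean + 0.035 * rng.standard_normal(clean.shape))
score = ssim_global(clean, noisy)
```

An identical pair of maps gives a score of 1, and the score drops as the compared map departs from the reference.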

In addition, we calculated the peak signal-to-noise ratio (PSNR) to evaluate the denoising performance. It is calculated as

$$\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{R^2}{\mathrm{MSE}}\right)$$

where R is the maximum pixel value (255 in our case) and MSE is the mean-square error between the original and denoised image, given as

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(O_{ij} - D_{ij}\right)^2$$

where M is the height (number of rows) of the image, N is the width (number of columns) of the image, and O_ij and D_ij are the pixel intensities at position (i, j) for the original and denoised image, respectively. The relevant Python implementations are provided on GitHub (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder).
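As a small worked example of the two quantities defined above, the following sketch computes the MSE and PSNR for a pair of images with NumPy. The function names are hypothetical, and the 4 × 4 arrays are arbitrary test values chosen so that the result can be checked by hand (MSE = 25.5² = 650.25, hence PSNR = 10 log₁₀(255² / 650.25) = 20 dB).

```python
import numpy as np

def mean_squared_error(original, denoised):
    """Mean of squared pixel differences over all M x N positions."""
    original = np.asarray(original, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return float(np.mean((original - denoised) ** 2))

def psnr(original, denoised, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = mean_squared_error(original, denoised)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Hand-checkable example: every pixel differs by 25.5 (10% of the 255 range)
original = np.zeros((4, 4))
denoised = np.full((4, 4), 25.5)
example_psnr = psnr(original, denoised)  # 20 dB
```

If the maps have been rescaled to the range 0 to 1, `max_value` should be set to match that range so that the PSNR remains meaningful.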

Building a Markov State Model of monomer ensemble

The 2D latent space reconstructed using β-VAE described in the previous subsection was then used to build a Markov State Model to elicit the metastable states underlying the ensembles. We employed PyEMMA,⁵⁷ a Markov State Model (MSM) building and analysis package. The 2D data was subjected to k-means clustering into 180 microstates. A transition matrix was then built by counting the number of transitions among the microstates at a specific lag time. To choose the lag time, the implied timescales (ITS), or relaxation timescales, of the systems were calculated over a range of lag times and plotted as a function of lag time. The lag time at which the ITS plot levels off is chosen to build the final MSM. The ITS plots corresponding to the neat water and aqueous fasudil systems are presented in Figure 3. Accordingly, MSM lag times of 36 and 32 ns were selected for the neat water and aqueous fasudil systems, respectively. At these lag times, by identifying the gaps between the ITS, a three-state model for the neat water system and a six-state model for the aqueous fasudil system were built. Lastly, transition path theory^100–102 was used to ascertain the transition paths, fluxes and timescales in these models. Bootstrapping was performed to estimate the errors associated with the state populations and transition timescales. In each of ten iterations, the model was rebuilt after eliminating a randomly selected trajectory, followed by calculation of the state populations and transition timescales. The mean values and standard deviations of the quantities collected by bootstrapping are reported. A Chapman-Kolmogorov test, which tests the validity of the model prediction at longer time scales, was performed for the MSMs of the neat water and small-molecule ensembles (see Figure S1).
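The core of the workflow above (counting transitions at a lag time, normalizing to a transition matrix, and reading off implied timescales and stationary populations) can be illustrated with plain NumPy. This is a simplified sketch rather than the PyEMMA pipeline used in this work: it uses a simple row-normalized, non-reversible estimator, assumes every state is visited, and all function names and the toy trajectory are hypothetical.

```python
import numpy as np

def count_matrix(dtrajs, n_states, lag):
    """Count i -> j transitions separated by `lag` frames in each trajectory."""
    counts = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        d = np.asarray(dtraj, dtype=int)
        np.add.at(counts, (d[:-lag], d[lag:]), 1.0)
    return counts

def transition_matrix(counts):
    """Row-normalize counts (assumes every state has outgoing transitions)."""
    return counts / counts.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return pi / pi.sum()

def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for the non-stationary eigenvalues."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(np.abs(eigvals[1:]))

# Toy discrete trajectory over two microstates (illustrative only)
toy_counts = count_matrix([[0, 0, 1, 1, 0]], n_states=2, lag=1)
toy_T = transition_matrix(toy_counts)
```

Repeating the implied-timescale calculation over a range of lag times, and rebuilding the model with one trajectory left out at a time, mirrors the lag-time selection and bootstrapping steps described above.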

Entropy of water

In order to calculate the entropy of water, simulations were performed using GROMACS (version 2022.3). A single snapshot was chosen from each macrostate of α-synuclein in its apo state and in presence of fasudil; each snapshot was then solvated in a cubic box, placed within 1 nm from the box edge. The system was neutralized using NaCl and the salt concentration was kept at 50 mM. The system was energy minimized using the steepest descent algorithm, followed by equilibration in the canonical ensemble for 2 ps. The average temperature was maintained at 300 K using the velocity-rescale thermostat with a time constant of 0.1 ps. Following this, NPT equilibration was performed for 2 ps to maintain the pressure at 1.0 bar using the Berendsen barostat. Finally, three independent production runs were performed from the equilibrated structures in the NPT ensemble using the velocity Verlet integrator with a timestep of 1 fs for a timescale of 20 ps, with frames saved every 4 fs. In addition, we performed three simulations of neat water using the same parameters as described for the macrostates.

The entropy of water was estimated using the DoSPT program^103,104 that computes thermodynamic properties from MD simulations in the framework of the 2-phase-thermodynamics (2PT) method.^64,65 This is achieved by calculating the density of states (DoS) function, I(ν), which is the Fourier transform of the velocity autocorrelation function,

$$I(\nu) = \frac{2}{k_B T}\lim_{\tau\to\infty}\int_{-\tau}^{\tau}\sum_{l=1}^{N}\sum_{k=1}^{3} m_l\,\left\langle v_l^k(t)\,v_l^k(0)\right\rangle e^{-i 2\pi\nu t}\,dt$$

where m_l is the mass of atom l, v_l^k is the velocity of atom l along the k direction and N is the total number of atoms in the system. The DoS can be represented as a sum of translational (trn), rotational (rot) and vibrational (vib) motions.⁶⁵

$$I(\nu) = I_{\mathrm{trn}}(\nu) + I_{\mathrm{rot}}(\nu) + I_{\mathrm{vib}}(\nu)$$

For water, the 2PT model overcomes the anharmonic nature of the low-frequency modes (diffusive motions and libration) by decomposing the DoS into solid-like (I^s) and gas-like (I^g) contributions for the translational and rotational components. The solid-like DoS is treated using the harmonic oscillator model, whereas the gas-like DoS is described using the Enskog hard sphere (HS) theory for translation and the rigid rotor (RR) model for rotation. The entropy is calculated by integrating the DoS, weighted by the corresponding weighting functions W,

$$S_k = k_B\left[\int_0^{\infty} I_k^{s}(\nu)\,W_k^{s}(\nu)\,d\nu + \int_0^{\infty} I_k^{g}(\nu)\,W_k^{g}(\nu)\,d\nu\right]$$

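where k = trn, rot, or vib and the weighting functions are given as

$$W_{\mathrm{trn}}^{s}(\nu) = W_{\mathrm{rot}}^{s}(\nu) = W_{\mathrm{vib}}^{s}(\nu) = \frac{\beta h \nu}{e^{\beta h \nu} - 1} - \ln\left(1 - e^{-\beta h \nu}\right), \qquad W_{\mathrm{trn}}^{g}(\nu) = \frac{1}{3}\frac{S^{\mathrm{HS}}}{k_B}, \qquad W_{\mathrm{rot}}^{g}(\nu) = \frac{1}{3}\frac{S^{\mathrm{RR}}}{k_B}$$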

where β = (k_BT)⁻¹ and h is the Planck constant.¹⁰⁵
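For the solid-like component, the entropy weighting commonly used in 2PT is that of a quantum harmonic oscillator, W(ν) = βhν/(e^{βhν} − 1) − ln(1 − e^{−βhν}). The sketch below evaluates this function with NumPy, assuming that this standard form is the one intended here; the gas-like (hard-sphere and rigid-rotor) weights are not covered, and the function name `harmonic_entropy_weight` is hypothetical.

```python
import numpy as np

H_PLANCK = 6.62607015e-34   # Planck constant, J s
K_BOLTZ = 1.380649e-23      # Boltzmann constant, J K^-1

def harmonic_entropy_weight(nu_hz, temperature):
    """Dimensionless harmonic-oscillator entropy weight at frequency nu (Hz).

    W = x / (exp(x) - 1) - ln(1 - exp(-x)), with x = h * nu / (kB * T).
    Defined for nu > 0 only; the weight diverges logarithmically as nu -> 0.
    """
    x = H_PLANCK * np.asarray(nu_hz, dtype=float) / (K_BOLTZ * temperature)
    return x / np.expm1(x) - np.log(-np.expm1(-x))

# Frequency at which h * nu equals kB * T at 300 K (about 6.25 THz)
nu_thermal = K_BOLTZ * 300.0 / H_PLANCK
w_thermal = harmonic_entropy_weight(nu_thermal, 300.0)
```

Low-frequency modes receive a large weight and high-frequency modes a small one, which is why the treatment of the diffusive, low-frequency part of the DoS matters most for the entropy.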

Protein conformational entropy

We calculated the protein conformational entropy using the program PDB2ENTROPY.⁶⁷ This program is based on the nearest-neighbor approach from probability distributions of torsion angles at a given temperature relative to uniform distributions. The entropy is computed using the Maximum Information Spanning Tree (MIST) approach.^68,69
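The meaning of an entropy "relative to a uniform distribution" can be illustrated with a first-order, histogram-based estimate for a single torsion angle. This sketch is not the PDB2ENTROPY algorithm: it ignores correlations between torsions (which the MIST approach accounts for) and uses binning rather than a nearest-neighbor estimator. The function name and the test angles are hypothetical.

```python
import numpy as np

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1

def torsion_entropy_relative_to_uniform(angles_deg, n_bins=36):
    """Histogram entropy of one torsion angle minus that of a uniform angle.

    Returns R * (H - ln n_bins), where H is the Shannon entropy of the binned
    distribution. The value is 0 for a uniformly sampled torsion and becomes
    more negative as the distribution narrows.
    """
    counts, _ = np.histogram(angles_deg, bins=n_bins, range=(-180.0, 180.0))
    p = counts / counts.sum()
    nonzero = p > 0
    shannon = -np.sum(p[nonzero] * np.log(p[nonzero]))
    return float(R_GAS * (shannon - np.log(n_bins)))

# A torsion sampled evenly over the circle versus one locked near 60 degrees
uniform_angles = np.arange(-175.0, 180.0, 10.0)
locked_angles = np.full(100, 60.0)
s_uniform = torsion_entropy_relative_to_uniform(uniform_angles)
s_locked = torsion_entropy_relative_to_uniform(locked_angles)
```

Because the reference is the fully flexible (uniform) case, all values are zero or negative, in line with the negative conformational entropies reported for the metastable states.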

All supplemental figures described in the article (PDF)

Code Availability

The implementations of Machine learning protocols are publicly available in our group GitHub webpage (https://github.com/JMLab-tifrh/Protein_Ligand_Variational_Autoencoder)

Supporting information

Supplemental figures

Acknowledgements

We sincerely thank D. E. Shaw Research for providing us with access to long simulation trajectories of monomeric alpha-synuclein in presence of fasudil, which seeded the project. We acknowledge support of the Department of Atomic Energy, Government of India, under Project Identification No. RTI 4007. JM acknowledges Core Research grants provided by the Department of Science and Technology (DST) of India (CRG/2023/001426).

References

(1)
1. Maroteaux L.
2. Campanelli J. T.
3. Scheller R. H
2015Intrinsically disordered proteins in cellular signalling and regulationNat. Rev. Mol. Cell. Biol 16:18–29Google Scholar
(2)
1. Babu M. M.
2. van der Lee R.
3. de Groot N. S.
4. Gsponer J
2011Intrinsically disordered proteins: regulation and diseaseCurrent Opinion in Structural Biology 21:432–440Google Scholar
(3)
1. Csizmok V.
2. Follis A. V.
3. Kriwacki R. W.
4. Forman-Kay J. D
2016Dynamic Protein Interaction Networks and New Structural Paradigms in SignalingChemical Reviews 116:6424–6462PubMed Google Scholar
(4)
1. Wang Y.
2. Fisher J. C.
3. Mathew R.
4. Ou L.
5. Otieno S.
6. Sublet J.
7. Xiao L.
8. Chen J.
9. Roussel M. F.
10. Kriwacki R. W
2011Intrinsic disorder mediates the diverse regulatory functions of the Cdk inhibitor p21Nature chemical biology 7:214–221Google Scholar
(5)
1. Dyson H. J
2016Making sense of intrinsically disordered proteinsBiophysical journal 110:1013Google Scholar
(6)
1. Alberti S.
2. Gladfelter A.
3. Mittag T
2019Considerations and challenges in studying liquid-liquid phase separation and biomolecular condensatesCell 176:419–434Google Scholar
(7)
1. Chong P. A.
2. Forman-Kay J. D
2016Liquid–liquid phase separation in cellular signaling systemsCurrent opinion in structural biology 41:180–186Google Scholar
(8)
1. Shin Y.
2. Brangwynne C. P
2017Liquid phase condensation in cell physiology and diseaseScience 357:eaaf4382Google Scholar
(9)
1. O’Flynn B. G.
2. Mittag T
2021The role of liquid–liquid phase separation in regulating enzyme activityCurrent opinion in cell biology 69:70–79Google Scholar
(10)
1. Uversky V. N.
2. Oldfield C. J.
3. Dunker A. K
2008Intrinsically disordered proteins in human diseases: introducing the D2 conceptAnnual review of biophysics 37:215–246Google Scholar
(11)
2011Bronowska, A. Thermodynamics of Ligand-Protein Interactions: Implications for Molecular Design, Thermodynamics - Interaction Studies - Solids, Liquids and Gases; 2011.Google Scholar
(12)
1. Klebe G
2015Applying thermodynamic profiling in lead finding and optimizationNat. Rev. Drug Discov 14:95–110Google Scholar
(13)
1. Metallo S. J
2010Intrinsically disordered proteins are potential drug targetsCurrent Opinion in Chemical Biology 14:481–488Google Scholar
(14)
2008Structural rationale for the coupled binding and unfolding of the c-Myc oncoprotein by small moleculesChemistry and Biology 15:1149–1155Google Scholar
(15)
1. Ono K.
2. Li L.
3. Takamura Y.
4. Yoshiike Y.
5. Zhu L.
6. Han H.
7. Mao X.
8. Ikeda T.
9. Takasaki T.
10. Nishijo H
2012Phenolic compounds prevent amyloid β-protein oligomerization and synaptic dysfunction by site-specific bindingJ. Biol. Chem 287:14631–14643Google Scholar
(16)
1. Zhu M.
2. De Simone A.
3. Schenk D.
4. Toth G.
5. Dobson C. M.
6. Vendruscolo M
2013Identification of small-molecule binding pockets in the soluble monomeric form of the Abeta42 peptideThe Journal of Chemical Physics 139Google Scholar
(17)
1. Attanasio F.
2. Convertino M.
3. Magno A.
4. Caflisch A.
5. Corazza A.
6. Haridas H.
7. Esposito G.
8. Cataldo S.
9. Pignataro B.
10. Milardi D.
11. Rizzarelli E
2013Carnosine Inhibits Abeta-42 Aggregation by Perturbing the H-Bond Network in and around the Central Hydrophobic ClusterChemBioChem 14:583–592Google Scholar
(18)
1. Ehrnhoefer D. E.
2. Bieschke J.
3. Boeddrich A.
4. Herbst M.
5. Masino L.
6. Lurz R.
7. Engemann S.
8. Pastore A.
9. Wanker E. E
2008EGCG redirects amyloidogenic polypeptides into unstructured, off-pathway oligomersNature structural and molecular biology 15:558–566Google Scholar
(19)
1. Heller G. T.
2. et al.
2020Small-molecule sequestration of amyloid-β as a drug discovery strategy for Alzheimer’s diseaseScience Advances 6:eabb5924Google Scholar
(20)
1. Akoury E.
2. Gajda M.
3. Pickhardt M.
4. Biernat J.
5. Soraya P.
6. Griesinger C.
7. Mandelkow E.
8. Zweckstetter M
2013Inhibition of Tau Filament Formation by Conformational ModulationJournal of the American Chemical Society 135:2853–2862Google Scholar
(21)
1. et al.
2014Targeting the disordered C terminus of PTP1B with an allosteric inhibitorNature chemical biology 10:558–566Google Scholar
(22)
1. Kurzbach D.
2. Schwarz T. C.
3. Platzer G.
4. Hafler S.
5. Hinderberger D.
6. Konrat R
2014Compensatory Adaptations of Structural Dynamics in an Intrinsically Disordered Protein ComplexAngewandte Chemie International Edition 53:3840–3843Google Scholar
(23)
1. Biesaga M.
2. Frigol-Vivas M.
3. Salvatella X
2021Intrinsically disordered proteins and biomolecular condensates as drug targetsCurrent Opinion in Chemical Biology 62:90–100Google Scholar
(24)
1. Fields C. R.
2. Bengoa-Vergniory N.
3. Wade-Martins R
2019Targeting Alpha-Synuclein as a Therapy for Parkinson’s DiseaseFrontiers in Molecular Neuroscience 12Google Scholar
(25)
1. Tóth G.
2. et al.
2014Targeting the Intrinsically Disordered Structural Ensemble of alpha-Synuclein by Small Molecules as a Potential Therapeutic Strategy for Parkinson’s DiseasePLOS ONE 9:1–11Google Scholar
(26)
1. Tatenhorst L.
2. Eckermann K.
3. Dambeck V.
4. Fonseca-Ornelas L.
5. Walle H.
6. da Fonseca Lopes
7. Koch T.
8. Becker J. C.
9. Tönges S.
10. Bähr L.
2016Fasudil attenuates aggregation of α-synuclein in models of Parkinson’s diseaseActa neuropathologica communications 4:1–17Google Scholar
(27)
1. Cao K.
2. Zhu Y.
3. Hou Z.
4. Liu M.
5. Yang Y.
6. Hu H.
7. Dai Y.
8. Wang Y.
9. Yuan S.
10. Huang G.
2022α-Synuclein as a Target for Metallo-Anti-Neurodegenerative AgentsAngewandte Chemie International Edition Google Scholar
(28)
1. Winner B.
2. Jappelli R.
3. Maji S. K.
4. Desplats P. A.
5. Boyer L.
6. Aigner S.
7. Hetzer C.
8. Loher T.
9. Vilar M.
10. Campioni S
2011In vivo demonstration that α-synuclein oligomers are toxicProceedings of the National Academy of Sciences 108:4194–4199Google Scholar
(29)
1. Emin D.
2. Zhang Y. P.
3. Lobanova E.
4. Miller A.
5. Li X.
6. Xia Z.
7. Dakin H.
8. Sideris D. I.
9. Lam J. Y.
10. Ranasinghe R. T
2022Small soluble α-synuclein aggregates are the toxic species in Parkinson’s diseaseNature communications 13:1–15Google Scholar
(30)
1. Stephens A. D.
2. Kölbel J.
3. Moons R.
4. Chung C. W.
5. Ruggiero M. T.
6. Mahmoudi N.
7. Shmool T. A.
8. McCoy T. M.
9. Nietlispach D.
10. Routh A. F.
2022Decreased Water Mobility Contributes To Increased α-Synuclein AggregationAngewandte Chemie International Edition Google Scholar
(31)
1. Ubbiali D.
2. Fratini M.
3. Piersimoni L.
4. Ihling C. H.
5. Kipping M.
6. Heilmann I.
7. Iacobucci C.
8. Sinz A
2022Direct Observation of Elongated Conformational States in α-Synuclein upon Liquid-Liquid Phase SeparationAngewandte Chemie 134:e202205726Google Scholar
(32)
1. Robustelli P.
2. Ibanez-de Opakua A.
3. Campbell-Bezat C.
4. Giordanetto F.
5. Becker S.
6. Zweckstetter M.
7. Pan A. C.
8. Shaw D. E
2022Molecular Basis of Small-Molecule Binding to alpha-SynucleinJournal of the American Chemical Society 144:2501–2510Google Scholar
(33)
1. Robustelli P.
2. Piana S.
3. Shaw D. E
2018Developing a molecular dynamics force field for both folded and disordered protein statesProceedings of the National Academy of Sciences 115:E4758–E4766Google Scholar
(34)
1. Bonomi M.
2. Camilloni C.
3. Cavalli A.
4. Vendruscolo M
2016Metainference: A Bayesian inference method for heterogeneous systemsScience advances 2:e1501177Google Scholar
(35)
1. Bottaro S.
2. Lindorff-Larsen K
2018Biophysical experiments and biomolecular simulations: A perfect match?Science 361:355–360Google Scholar
(36)
1. Gomes G.-N. W.
2. Krzeminski M.
3. Namini A.
4. Martin E. W.
5. Mittag T.
6. Head-Gordon T.
7. Forman-Kay J. D.
8. Gradinaru C. C
2020Conformational ensembles of an intrinsically disordered protein consistent with NMR, SAXS, and single-molecule FRETJournal of the American Chemical Society 142:15697–15710Google Scholar
(37)
1. Stelzl L. S.
2. Pietrek L. M.
3. Holla A.
4. Oroz J.
5. Sikora M.
6. Köfinger J.
7. Schuler B.
8. Zweckstetter M.
9. Hummer G
2022Global structure of the intrinsically disordered protein tau emerges from its local structureJACS Au 2:673–686Google Scholar
(38)
1. Prinz J.-H.
2. Wu H.
3. Sarich M.
4. Keller B.
5. Senne M.
6. Held M.
7. Chodera J. D.
8. Schütte C.
9. Noé F
2011Markov models of molecular kinetics: Generation and validationJ. Chem. Phys 134:174105Google Scholar
(39)
1. Bowman G. R.
2. Beauchamp K. A.
3. Boxer G.
4. Pande V. S
2009Progress and challenges in the automated construction of Markov state models for full protein systemsJ. Chem. Phys 131:124101Google Scholar
(40)
1. Husic B. E.
2. Pande V. S
2018Markov State Models: From an Art to a ScienceJournal of the American Chemical Society 140:2386–2396PubMed Google Scholar
(41)
1. Adhikari S.
2. Mondal J
2023Machine Learning Subtle Conformational Change due to Phosphorylation in Intrinsically Disordered ProteinsThe Journal of Physical Chemistry B Google Scholar
(42)
1. Kingma D. P.
2. Welling M.
Auto-encoding variational bayesGoogle Scholar
(43)
1. Rezende D. J.
2. Mohamed S.
3. Wierstra D
2014Stochastic backpropagation and approximate inference in deep generative modelsInternational conference on machine learning :1278–1286Google Scholar
(44)
1. Hyvärinen A
2013Independent component analysis: recent advances. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371:20110534Google Scholar
(45)
1. Klema V.
2. Laub A
1980The singular value decomposition: Its computation and some applicationsIEEE Transactions on automatic control 25:164–176Google Scholar
(46)
1. Fisher R. A
1936The use of multiple measurements in taxonomic problemsAnnals of eugenics 7:179–188Google Scholar
(47)
1. Schölkopf B.
2. Smola A.
3. Müller K.-R
1997Kernel principal component analysisArtificial Neural Networks-ICANN 97: 7th International Conference Lausanne, Switzerland :583–588Google Scholar
(48)
1. Kruskal J. B.
2. Wish M.
1978Multidimensional scalingGoogle Scholar
(49)
1. Tenenbaum J. B.
2. Silva V. d.
3. Langford J. C.
2000A global geometric framework for nonlinear dimensionality reductionScience 290:2319–2323Google Scholar
(50)
1. Faloutsos C.
2. Lin K.-I
1995FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasetsProceedings of the 1995 ACM SIGMOD international conference on Management of data :163–174Google Scholar
(51)
1. Roweis S. T.
2. Saul L. K
2000Nonlinear dimensionality reduction by locally linear embeddingscience 290:2323–2326Google Scholar
(52)
1. Sumithra V.
2. Surendran S
2015A review of various linear and non linear dimensionality reduction techniquesInt. J. Comput. Sci. Inf. Technol 6:2354–2360Google Scholar
(53)
1. Das P.
2. Moll M.
3. Stamati H.
4. Kavraki L. E.
5. Clementi C
2006Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reductionProceedings of the National Academy of Sciences 103:9885–9890Google Scholar
(54)
1. Schwantes C. R.
2. Pande V. S
2015Modeling molecular kinetics with tICA and the kernel trickJournal of chemical theory and computation 11:600–608Google Scholar
(55)
1. Wehmeyer C.
2. Noé F
2018Time-lagged autoencoders: Deep learning of slow collective variables for molecular kineticsThe Journal of chemical physics 148Google Scholar
(56)
1. Hinton G. E.
2. Salakhutdinov R. R
2006Reducing the dimensionality of data with neural networksscience 313:504–507Google Scholar
(57)
1. Scherer M. K.
2. Trendelkamp-Schroer B.
3. Paul F.
4. Pérez-Hernández G.
5. Hoffmann M.
6. Plattner N.
7. Wehmeyer C.
8. Prinz J.-H.
9. Noé F
2015PyEMMA 2: A software package for estimation, validation, and analysis of Markov modelsJournal of chemical theory and computation 11:5525–5542Google Scholar
(58)
1. Hore A.
2. Ziou D
2010Image quality metrics: PSNR vs. SSIM. 2010 20th international conference on pattern recognition:2366–2369Google Scholar
(59)
1. Heller G. T.
2. Bonomi M.
3. Vendruscolo M
2018Structural Ensemble Modulation upon Small-Molecule Binding to Disordered ProteinsJournal of Molecular Biology 430:2288–2292Google Scholar
(60)
1. Heller G. T.
2. Sormani P.
3. Vendruscolo M
2015Targeting disordered proteins with small molecules using entropyTrends Biochem. Sci 40:491–496Google Scholar
(61)
1. Flock T.
2. Weatheritt R. J.
3. Latysheva N. S.
4. Babu M. M
2014Controlling entropy to tune the functions of intrinsically disordered regionsCurrent Opinion in Structural Biology 26:62–72Google Scholar
(62)
1. Dyson H.
2. Wright P. E
2002Coupling of folding and binding for unstructured proteinsCurrent Opinion in Structural Biology 12:54–60Google Scholar
(63)
1. Löhr T.
2. Kohlhoff K.
3. Heller G. T.
4. Camilloni C.
5. Vendruscolo M
2022A Small Molecule Stabilizes the Disordered Native State of the Alzheimer’s Aβ PeptideACS Chemical Neuroscience 13:1738–1745Google Scholar
(64)
1. Lin S.-T.
2. Blanco M.
3. Goddard W. A
2003The two-phase model for calculating thermodynamic properties of liquids from molecular dynamics: Validation for the phase diagram of Lennard-Jones fluidsThe Journal of chemical physics 119:11792–11805Google Scholar
(65)
1. Lin S.-T.
2. Maiti P. K.
3. Goddard W. A
2010Two-phase thermodynamic model for efficient and accurate absolute entropy of water from molecular dynamics simulationsThe Journal of Physical Chemistry B 114:8191–8198Google Scholar
(66)
1. Mukherjee S.
2. Schäfer L. V
2023Thermodynamic forces from protein and water govern condensate formation of an intrinsically disordered protein domainNature Communications 14:5892Google Scholar
(67)
1. Fogolari F.
2. Maloku O.
3. Dongmo Foumthuim C. J.
4. Corazza A.
5. Esposito G
2018PDB2ENTROPY and PDB2TRENT: Conformational and Translational–Rotational Entropy from Molecular EnsemblesJournal of Chemical Information and Modeling 58:1319–1324Google Scholar
(68)
1. King B. M.
2. Tidor B
2009MIST: Maximum Information Spanning Trees for dimension reduction of biological data setsBioinformatics 25:1165–1172Google Scholar
(69)
1. King B. M.
2. Silver N. W.
3. Tidor B
2012Efficient calculation of molecular configurational entropies using an information theoretic approximationThe Journal of Physical Chemistry B 116:2891–2904Google Scholar
(70)
1. Dedmon M. M.
2. Lindorff-Larsen K.
3. Christodoulou J.
4. Vendruscolo M.
5. Dobson C. M
2005Mapping long-range interactions in α-synuclein using spin-label NMR and ensemble molecular dynamics simulationsJournal of the American Chemical Society 127:476–477Google Scholar
(71)
1. Allison J. R.
2. Varnai P.
3. Dobson C. M.
4. Vendruscolo M
2009Determination of the free energy landscape of α-synuclein using spin label nuclear magnetic resonance measurementsJournal of the American Chemical Society 131:18314–18326Google Scholar
(72)
1. Ullman O.
2. Fisher C. K.
3. Stultz C. M
2011Explaining the structural plasticity of α-synucleinJournal of the american chemical society 133:19536–19546Google Scholar
(73)
1. Esteban-Martín S.
2. Silvestre-Ryan J.
3. Bertoncini C. W.
4. Salvatella X
2013Identification of fibril-like tertiary contacts in soluble monomeric α-synucleinBiophysical journal 105:1192–1198Google Scholar
(74)
1. Best R. B.
2. Zheng W.
3. Mittal J
2014Balanced protein–water interactions improve properties of disordered proteins and non-specific protein associationJournal of chemical theory and computation 10:5113–5124Google Scholar
(75)
1. Huang J.
2. Rauscher S.
3. Nawrocki G.
4. Ran T.
5. Feig M.
6. De Groot B. L.
7. Grubmüller H.
8. MacKerell A. D
2017CHARMM36m: an improved force field for folded and intrinsically disordered proteinsNature methods 14:71–73Google Scholar
(76)
1. Bertoncini C. W.
2. Jung Y.-S.
3. Fernandez C. O.
4. Hoyer W.
5. Griesinger C.
6. Jovin T. M.
7. Zweckstetter M
2005Release of long-range tertiary interactions potentiates aggregation of natively unstructured α-synucleinProceedings of the National Academy of Sciences 102:1430–1435Google Scholar
(77)
1. Bernadó P.
2. Blanchard L.
3. Timmins P.
4. Marion D.
5. Ruigrok R. W.
6. Blackledge M
2005A structural model for unfolded proteins from residual dipolar couplings and small-angle x-ray scatteringProceedings of the National Academy of Sciences 102:17002–17007Google Scholar
(78)
1. Marsh J. A.
2. Baker J. M.
3. Tollinger M.
4. Forman-Kay J. D
2008Calculation of residual dipolar couplings from disordered state ensembles using local alignmentJournal of the American Chemical Society 130:7804–7805Google Scholar
(79)
1. Bernadó P.
2. Mylonas E.
3. Petoukhov M. V.
4. Blackledge M.
5. Svergun D. I
2007Structural characterization of flexible proteins using small-angle X-ray scatteringJournal of the American Chemical Society 129:5656–5664Google Scholar
(80)
1. Ahmed M. C.
2. Skaanning L. K.
3. Jussupow A.
4. Newcombe E. A.
5. Kragelund B. B.
6. Camilloni C.
7. Langkilde A. E.
8. Lindorff-Larsen K
2021Refinement of α-synuclein ensembles against SAXS data: Comparison of force fields and methodsFrontiers in molecular biosciences 8Google Scholar
(81)
1. Brookes D. H.
2. Head-Gordon T
2016Experimental inferential structure determination of ensembles for intrinsically disordered proteinsJournal of the American Chemical Society 138:4530–4538Google Scholar
(82)
1. Lincoff J.
2. Krzeminski M.
3. Haghighatlari M.
4. Teixeira J.
5. Gomes G.-N. W.
6. Gradinaru C. C.
7. Forman-Kay J. D.
8. Head-Gordon T.
Extended Experimental Inferential Structure Determination Method for Evaluating the Structural Ensembles of Disordered Protein StatesGoogle Scholar
(83)
1. Roux B.
2. Weare J
2013On the statistical equivalence of restrained-ensemble simulations with the maximum entropy methodThe Journal of chemical physics 138:02B616Google Scholar
(84)
1. Boomsma W.
2. Ferkinghoff-Borg J.
3. Lindorff-Larsen K
2014Combining experiments and simulations using the maximum entropy principlePLoS computational biology 10:e1003406Google Scholar
(85)
1. Cavalli A.
2. Camilloni C.
3. Vendruscolo M
2013Molecular dynamics simulations with replica-averaged structural restraints generate structural ensembles according to the maximum entropy principleThe Journal of chemical physics 138:03B603Google Scholar
(86)
1. Robustelli P.
2. Piana S.
3. Shaw D. E
2020Mechanism of Coupled Folding-upon-Binding of an Intrinsically Disordered ProteinJournal of the American Chemical Society 142:11092–11101PubMed Google Scholar
(87)
1. Lindahl E.
2. Abraham M.
3. Hess B.
4. van der Spoel D.
2020GROMACS 2020.1 Source codeGoogle Scholar
(88)
1. Bussi G.
2. Donadio D.
3. Parrinello M.
2007Canonical sampling through velocity rescalingThe Journal of chemical physics 126:014101Google Scholar
(89)
1. Parrinello M.
2. Rahman A
1981Polymorphic transitions in single crystals: A new molecular dynamics methodJournal of Applied physics 52:7182–7190Google Scholar
(90)
1. Páll S.
2. Hess B
2013A flexible algorithm for calculating pair interactions on SIMD architecturesComputer Physics Communications 184:2641–2650Google Scholar
(91)
1. Darden T.
2. York D.
3. Pedersen L
1993The effect of long-range electrostatic interactions in simulations of macromolecular crystals–a comparison of the ewald and truncated list methodsJ. Chem. Phys 99Google Scholar
(92)
1. Hess B.
2. Bekker H.
3. Berendsen H. J.
4. Fraaije J. G
1997LINCS: a linear constraint solver for molecular simulationsJournal of computational chemistry 18:1463–1472Google Scholar
(93)
1. Bengio S.
2. Bengio Y
2000Taking on the curse of dimensionality in joint distributions using neural networksIEEE Transactions on Neural Networks 11:550–557Google Scholar
(94)
1. Larochelle H.
2. Murray I.
2011The neural autoregressive distribution estimator. Proceedings of the fourteenth international conference on artificial intelligence and statistics:29–37Google Scholar
(95)
1. Higgins I.
2. Matthey L.
3. Pal A.
4. Burgess C.
5. Glorot X.
6. Botvinick M.
7. Mohamed S.
8. Lerchner A.
2016beta-vae: Learning basic visual concepts with a constrained variational framework. International conference on learning representationsGoogle Scholar
(96)
1. et al.
Tensorflow: Large-scale machine learning on heterogeneous distributed systemsGoogle Scholar
(97)
1. Glorot X.
2. Bengio Y
2010Understanding the difficulty of training deep feedforward neural networksProceedings of the thirteenth international conference on artificial intelligence and statistics :249–256Google Scholar
(98)
1. Kingma D. P.
2. Ba J.
2014Adam: A method for stochastic optimizationGoogle Scholar
(99)
1. Wang Z.
2. Bovik A. C.
3. Sheikh H. R.
4. Simoncelli E. P
2004Image quality assessment: from error visibility to structural similarityIEEE transactions on image processing 13:600–612Google Scholar
(100)
1. Weinan E.
2. Vanden-Eijnden E
2006Towards a theory of transition pathsJournal of statistical physics 123:503–523Google Scholar
(101)
1. Metzner P.
2. Schütte C.
3. Vanden-Eijnden E
2009Transition path theory for Markov jump processesMultiscale Modeling & Simulation 7:1192–1219Google Scholar
(102)
1. Noé F.
2. Schütte C.
3. Vanden-Eijnden E.
4. Reich L.
5. Weikl T. R
2009Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulationsProceedings of the National Academy of Sciences 106:19011–19016Google Scholar
(103)
1. Caro M. A.
2. Laurila T.
3. Lopez-Acevedo O
2016Accurate schemes for calculation of thermodynamic properties of liquid mixtures from molecular dynamics simulationsThe Journal of chemical physics 145Google Scholar
(104)
1. Caro M. A.
2. Lopez-Acevedo O.
3. Laurila T
2017Redox potentials from ab initio molecular dynamics and explicit entropy calculations: Application to transition metals in aqueous solutionJournal of chemical theory and computation 13:3432–3441Google Scholar
(105)
1. Huang S.-N.
2. Pascal T. A.
3. Goddard W. A.
4. Maiti P. K.
5. Lin S.-T
2011Absolute entropy and energy of carbon dioxide using the two-phase thermodynamic modelJournal of chemical theory and computation 7:1893–1901Google Scholar

Article and author information

Author information

Sneha Menon
Tata Institute of Fundamental Research, Hyderabad 500046, India
- These authors contributed equally to this work
Subinoy Adhikari
Tata Institute of Fundamental Research, Hyderabad 500046, India
- These authors contributed equally to this work
Jagannath Mondal
Tata Institute of Fundamental Research, Hyderabad 500046, India
- jmondal@tifrh.res.in,+914020203091

Version history

Preprint posted: March 27, 2024
Sent for peer review: March 27, 2024
Reviewed Preprint version 1: June 3, 2024
Reviewed Preprint version 2: November 8, 2024
Version of Record published: December 18, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.97709. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Rosana Collepardo
University of Cambridge, Cambridge, United Kingdom
Senior Editor
Qiang Cui
Boston University, Boston, United States of America

Reviewer #1 (Public Review):

Summary:

This is a well-conducted study about the mechanism of binding of a small molecule (fasudil) to a disordered protein (alpha-synuclein). Since this type of interaction has puzzled researchers for the last two decades, the results presented are welcome as they offer relevant insight into the physical principles underlying this interaction.

Strengths:

The results show convincingly that the mechanism of entropic expansion can explain the previously reported binding of fasudil to alpha-synuclein. In this context, the analysis of the changes in the entropy of the protein and of water is highly relevant. The combined use of machine learning for dimensionality reduction and of Markov State Models could become a general procedure for the analysis of other systems where a compound binds a disordered protein.

Weaknesses:

It would be important to underscore the computational nature of the results, since the experimental evidence that fasudil binds alpha-synuclein is not entirely clear, at least to my knowledge.

https://doi.org/10.7554/eLife.97709.1.sa2

Reviewer #2 (Public Review):

The manuscript by Menon et al. describes a set of simulations of alpha-Synuclein (aSYN) and analyses of these and previous simulations in the presence of a small molecule.

While I agree with the authors that the questions addressed are interesting, I am not sure how much we learn from the present simulations and analyses. In parts, the manuscript reads more like an attempt to apply a whole range of tools rather than with a goal of answering any specific questions.

There's a lot going on in this paper, and I am not sure it is useful for the authors, readers or me to spell out all of my comments in detail. But here are at least some points that I found confusing, etc.

Major concerns

p. 5 and elsewhere:
I lack a serious discussion of convergence and the statistics of the differences between the two sets of simulations. On p. 5 it is described how the authors ran multiple simulations of the ligand-free system for a total of 62 µs; that is about 25 times less than for the ligand system. I acknowledge that running 1.5 ms is unfeasible, but at a bare minimum the authors should discuss and analyse the consequences of the relatively small amount of sampling. Here it is important to say that while 62 µs may sound like a lot, it is probably not enough to sample the relevant properties of a 140-residue long disordered protein.

p. 7:
The authors make it sound like a bad thing that some methods are deterministic. Why is that the case? What kind of uncertainty in the data do they mean? One can certainly have deterministic methods and still deal with uncertainty. Again, this seems like a somewhat ad hoc argument for the choice of the method used.

p. 8:
The authors should make it clear (i) what the reconstruction loss and KL divergence are calculated over and (ii) what the RMSD is calculated over.
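
For orientation, the generic β-VAE objective (ref. 95) to which these two terms usually belong is written out below. This is the textbook form under the common assumption of a Gaussian encoder with a standard normal prior. Whether the manuscript uses exactly this form, and over which input x (for example, the full inter-residue distance map of a frame) the terms are summed, is exactly what the question above asks.

```latex
\mathcal{L}(\theta,\phi;x)
  = \underbrace{-\,\mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]}_{\text{reconstruction loss}}
  \;+\; \beta\,\underbrace{D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\|\,p(z)\big)}_{\text{KL term}},
\qquad
D_{\mathrm{KL}} = -\tfrac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\right)
```

Here d is the latent dimension (2 in the manuscript), and μ_j and σ_j are the encoder outputs for input x.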

p. 9/figure 1:
The authors select a beta value that may be the minimum, but is just below a big jump in the cross-validation error. Why does the error jump so much, and isn't it slightly dangerous to pick a value close to such a large jump?

p. 10:
Why was a 2-dimensional representation used in the VAE? What evidence do the authors have that the representation is meaningful? The authors state "The free energy landscape represents a large number of spatially close local minima representative of energetically competitive conformations inherent in αS" but they do not say what they mean by "spatially close". In the original space? If so, where is the evidence?

p. 10:
It is not clear from the text whether the VAEs are the same for both aSYN and aSYN-Fasudil. I assume they are. Given that the Fasudil dataset is 25x larger, presumably the VAE is mostly driven by that system. Is the VAE an equally good representation of both systems?

p. 10/11:
Do the authors have any evidence that the latent space representation preserves relevant kinetic properties? This is a key point because the entire analysis is built on this. The choice of using z1 and z2 to build the MSM seems somewhat ad hoc. What do the auto-correlation functions of z1 and z2 look like? Are they related to the dynamics of some key structural properties like Rg or transient helical structure?
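
A check of this kind can be sketched with a normalized autocorrelation function applied to each latent coordinate and to a structural observable such as Rg. The function name and the input time series are assumptions for illustration; nothing here is taken from the authors' code.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k) = <dx(t) dx(t+k)> / <dx^2> for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    var = np.mean(dx * dx)
    # Each lag averages over the pairs of frames that are k steps apart.
    return np.array([np.mean(dx[: len(dx) - k] * dx[k:]) / var
                     for k in range(max_lag + 1)])
```

If z1 and z2 decay on timescales similar to those of Rg or helical content, that would support using them as kinetic coordinates. A much faster decay would suggest that the latent space misses the slow motions.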

p. 11:
What's the argument for not building an MSM with states shared for aSYN ± Fasudil?

p. 12:
Fig. 3b/c show quite clearly that the implied timescales are not converged at the chosen lag time (incidentally, it would have been useful to show the timescales in physical time). The CK test is stated to be validated with "reasonable accuracy", though it is unclear what that means.
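
For readers unfamiliar with the test, implied timescales follow from the eigenvalues λ_i of the transition matrix estimated at lag time τ, as t_i = -τ / ln|λ_i|. They should become independent of τ once the model is Markovian. The minimal sketch below uses a plain row-normalized count matrix with no reversibility constraint and assumes every state is visited. The function names and the `dt` argument (physical time per saved frame) are assumptions for illustration, and this is not the PyEMMA estimator used in the manuscript.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag (in frames)."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # Sliding-window count of transitions i -> j separated by `lag` frames.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(T, lag, dt=1.0):
    """t_i = -lag*dt / ln|lambda_i| for the non-stationary eigenvalues, slowest first."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag * dt / np.log(eigvals[1:])
```

Evaluating `implied_timescales` over a range of lags and plotting the result against `lag * dt` gives the convergence plot in physical time.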

p. 12:
In Fig. 3d, what are the authors bootstrapping over? What are the errors if the authors analyse sampling noise (e.g. bootstrap over simulation blocks)?
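
A block bootstrap of the kind suggested here can be sketched as follows: the time series is cut into contiguous blocks, blocks are resampled with replacement, and the statistic is recomputed on each resample. The function name, the number of blocks and the use of a simple scalar statistic are assumptions for illustration.

```python
import numpy as np

def block_bootstrap(series, n_blocks, statistic, n_resamples=1000, seed=0):
    """Resample contiguous blocks with replacement; return the statistic per resample."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.asarray(series), n_blocks)
    out = np.empty(n_resamples)
    for r in range(n_resamples):
        picks = rng.integers(0, n_blocks, size=n_blocks)
        out[r] = statistic(np.concatenate([blocks[i] for i in picks]))
    return out
```

The spread of the returned values estimates the sampling error, provided the blocks are longer than the correlation time. For kinetic quantities such as MSM populations, the resampled blocks would more safely be treated as separate trajectories rather than concatenated, since concatenation introduces artificial transitions at block boundaries.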

p. 13:
I appreciate that the authors build an MSM using only a subset of the fasudil simulations. Here, it would be important that this analysis includes the entire workflow so that the VAE is also rebuilt from scratch. Is that the case?

p. 18:
I don't understand the goal of building the CVAE and DCVAE. Am I correct that the authors are building a complex ML model using only 3/6 input images? What is the goal of this analysis? As it stands, it reads a bit like simply wanting to apply some ML method to the data. Incidentally, the table in Fig. 6C is somewhat opaque.

p. 22:
"Our results indicate that the interaction of fasudil with αS residues governs the structural features of the protein."
What results indicate this?

p. 23:
The authors should add some (realistic) errors to the entropy values quoted. Fig. 8 has some error bars, though they seem unrealistically small. Also, is the water value quoted from the same force field and conditions as for the simulations?

p. 23:
Has PDB2ENTROPY been validated for use with disordered proteins?

p. 23/24:
It would be useful to compare (i) the free energies of the states (from their populations), (ii) the entropies (as calculated) and (iii) the enthalpies (as calculated e.g. as the average force field energy). Do they match up?
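
The comparison asked for above can be sketched as a simple consistency check: relative free energies from the macrostate populations, ΔG_i = -RT ln(p_i / p_ref), are compared against ΔH_i - TΔS_i built from the separately computed enthalpies and entropies. The function names, the 300 K default and the units (kJ/mol for enthalpies, kJ/(mol K) for entropies) are assumptions for illustration.

```python
import numpy as np

R = 8.314462618e-3  # gas constant in kJ/(mol K)

def relative_free_energies(populations, temperature=300.0):
    """G_i - G_ref = -RT ln(p_i / p_ref), with the most populated state as reference."""
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()
    return -R * temperature * np.log(p / p.max())

def consistency_gap(populations, enthalpies, entropies, temperature=300.0):
    """Per-state difference between dG from populations and dH - T*dS (kJ/mol)."""
    p = np.asarray(populations, dtype=float)
    H = np.asarray(enthalpies, dtype=float)
    S = np.asarray(entropies, dtype=float)
    ref = int(np.argmax(p))
    dG = relative_free_energies(p, temperature)
    return dG - ((H - H[ref]) - temperature * (S - S[ref]))
```

Gaps much larger than the statistical error of the three quantities would indicate that the computed entropies and enthalpies do not account for the observed populations.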

p. 31:
It is unclear which previous simulation the new aSYN simulations were launched from. What is the size of the box used?

https://doi.org/10.7554/eLife.97709.1.sa1

Reviewer #3 (Public Review):

Summary:

In this manuscript Menon, Adhikari, and Mondal analyze explicit solvent molecular dynamics (MD) computer simulations of the intrinsically disordered protein (IDP) alpha-synuclein in the presence and absence of a small molecule ligand, Fasudil, previously demonstrated to bind alpha-synuclein by NMR spectroscopy without inducing folding into more ordered structures. In order to provide insight into the binding mechanism of Fasudil, the authors analyze an unbiased 1500 µs MD simulation of alpha-synuclein in the presence of Fasudil previously reported by Robustelli et al. (Journal of the American Chemical Society, 144(6), pp.2501-2510). The authors compare this simulation to a very different set of apo simulations: 23 separate 1-4 µs simulations of alpha-synuclein seeded from different apo conformations taken from another simulation previously reported by Robustelli et al. (PNAS, 115 (21), E4758-E4766), for a total of ~62 µs.

To analyze the conformational space of alpha-synuclein, the authors employ a variational auto-encoder (VAE) to reduce the dimensionality of Cα-Cα pairwise distances to 2 dimensions, and use the latent space projection of the VAE to build Markov state models. The authors utilize k-means clustering to cluster the sampled states of alpha-synuclein in each condition into 180 microstates on the VAE latent space. They then coarse-grain these 180 microstates into a 3-macrostate model for apo alpha-synuclein and a 6-macrostate model for alpha-synuclein in the presence of fasudil using the PCCA+ coarse-graining method. Few details are provided to explain the hyperparameters used for PCCA+ coarse-graining and the rationale for selecting the final number of macrostates.

The authors analyze the properties of each of the alpha-synuclein macrostates from their final MSMs - examining intramolecular contacts, secondary structure propensities, and in the case of alpha-synuclein:Fasudil holo simulations - the contact probabilities between Fasudil and alpha-synuclein residues.

The authors utilize an additional variational autoencoder (a denoising convolutional VAE) to compare denoised contact maps of each macrostate, and project onto an additional latent space. The authors conclude that their apo and holo simulations are sampling distinct regions of the conformational space of alpha-synuclein projected on the denoising convolutional VAE latent space.

Finally, the authors calculate water entropy and protein conformational entropy for each macrostate. To facilitate water entropy calculations, the authors take a single structure from each macrostate and run a 20 ps simulation at a finer timestep (4 femtoseconds), analyzed using a previously published method (DoSPT), which computes thermodynamic properties of water from MD simulations using autocorrelation functions of water velocities. The authors report that the water entropies calculated from these individual 20 ps simulations are very similar.

For each macrostate the authors compute protein conformational entropy using a previously published Maximum Information Spanning tree approach based on torsion angle distributions - and observe that the estimated protein conformational entropy is substantially more negative for the macrostates of the holo ensemble.

The authors calculate mean first passage times from their Markov state models and report a strong correlation between the protein conformational entropy of each state and the mean first passage time from each state to the highest populated state.
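
For reference, mean first passage times to a target state can be obtained from a row-stochastic transition matrix T by solving the linear system m_i = τ + Σ_{k≠target} T_ik m_k with m_target = 0. The sketch below assumes a discrete-time model at lag time τ. The function name and arguments are illustrative, and this is not the routine used in the manuscript.

```python
import numpy as np

def mfpt_to_target(T, target, lag_time=1.0):
    """Mean first passage time from every state to `target` (same units as lag_time)."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    others = [i for i in range(n) if i != target]
    # Solve (I - T_restricted) m = lag_time * 1 over the non-target states.
    A = np.eye(n - 1) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.full(n - 1, lag_time))
    return m
```

With the target set to the most populated macrostate, the resulting vector can be correlated against the per-state conformational entropies, for example with `np.corrcoef`.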

As the authors observe that the conformational entropy estimated from macrostates of holo alpha-synuclein:Fasudil is greater than that estimated from macrostates of apo alpha-synuclein, they suggest that the driving force of Fasudil binding is an increase in the conformational entropy of alpha-synuclein. No consideration or quantification of the enthalpy of alpha-synuclein:Fasudil binding is presented.

Strengths:

The authors utilize MD simulations run with a force field appropriate for IDPs (a99SB-disp and the a99SB-disp water model; Robustelli et al., PNAS, 115 (21), E4758-E4766), which has previously been used to perform MD simulations of alpha-synuclein that have been validated with extensive NMR data.

The contact probability between Fasudil and each alpha-synuclein residue observed in the previously performed 1500us MD simulation of alpha-synuclein in the presence of Fasudil (Robustelli et al., Journal of the American Chemical Society, 144(6), pp.2501-2510) was previously found to be in good agreement with experimental NMR chemical shift perturbations upon Fasudil binding, suggesting that this simulation is a reasonable choice for understanding IDP:small molecule interactions.

Weaknesses:

Major Weakness 1: Simulations of apo alpha-synuclein and holo simulations of alpha-synuclein and fasudil are not comparable.

The most robust way to determine how the presence of Fasudil affects the conformational ensemble of alpha-synuclein is to run apo and holo simulations of the same length from the same starting structures using the same simulation parameters.

The 23 independent 1-4us simulations of apo alpha-synuclein and the long unbiased 1500us simulation of alpha-synuclein in the presence of fasudil are not directly comparable. The starting structures of the simulations used to build a Markov state model describing apo alpha-synuclein were taken from a previously reported 73us MD simulation of alpha-synuclein run with the a99SB-disp force field and water model with 100mM NaCl (Robustelli et al., PNAS, 115 (21), E4758-E4766). As the holo simulation of alpha-synuclein and Fasudil was run in 50mM NaCl, snapshots from the original apo alpha-synuclein simulation were resolvated with 50mM NaCl, and new simulations were run.

No justification is offered for how the starting structures were selected. We have no sense of the conformational variability of the selected starting structures, and no sense of how these conformations compare to the alpha-synuclein conformations sampled in the holo simulation in terms of standard structural descriptors such as tertiary contacts, secondary structure, radius of gyration (Rg), and solvent-exposed surface area; we only see a comparison of projections on an uninterpretable non-linear latent space and average contact maps. Additionally, 1-4us is a relatively short timescale for a simulation of a 140-residue IDP, and one is unlikely to see substantial evolution of many structural properties of interest (i.e. secondary structure, radius of gyration, tertiary contacts) in simulations this short. Without any information about the conformational space sampled in the 23 apo simulations (aside from a projection on an uninterpretable latent space), we have no way to determine whether we observe transitions between distinct states in these short simulations, and therefore whether it is possible to construct a meaningful MSM from them.
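As an illustration of the kind of interpretable descriptor meant here, the sketch below computes an unweighted radius of gyration and an end-to-end distance from a list of coordinates. The function names and the four-point toy "chains" are assumptions made for illustration; in practice the coordinates would be, for example, C-alpha positions from each simulation frame.

```python
import math


def radius_of_gyration(coords):
    """Unweighted Rg of a list of (x, y, z) positions."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def end_to_end_distance(coords):
    """Distance between the first and last position in the chain."""
    return math.dist(coords[0], coords[-1])


# Two hypothetical 4-bead conformations: one extended, one compact.
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
compact = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

rg_extended = radius_of_gyration(extended)
rg_compact = radius_of_gyration(compact)
ee_extended = end_to_end_distance(extended)
ee_compact = end_to_end_distance(compact)
print(rg_extended, rg_compact, ee_extended, ee_compact)
```

Histograms of such per-frame values for the apo starting structures, the apo trajectories, and the holo trajectory would let a reader judge directly whether the ensembles overlap.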

If the structures used for apo simulations are on average more compact or contain more tertiary contacts - then it is unsurprising that in short independent simulations they sample a smaller region of conformational space. Similarly, if the starting structures have similar dimensions - but we only observe extremely local sampling around starting structures in apo simulations in the short simulation times - it would also not be surprising that we sample a smaller amount of conformational space. By only presenting comparisons of conformational states on an uninformative VAE latent space - it is not possible for a reader to ask simple questions about how the conformational ensembles compare.

It is noted that the authors attempt to address questions about sampling by building an MSM of a single contiguous 60us portion of the holo simulation of alpha-synuclein and Fasudil, noting that:

"the MSM built using lesser data (and same amount of data as in water) also indicated the presence of six states of alphaS in presence of fasudil, as was observed in the MSM of the full trajectory. Together, this exercise invalidates the sampling argument and suggests that the increase in the number of metastable macrostates of alphaS in fasudil solution relative to that in water is a direct outcome of the interaction of alphaS with the small molecule."

However, the authors present no data to support this assertion, and readers have no sense of how the conformational space sampled in this portion of the trajectory compares to the conformational space sampled in the independent apo simulations or the full holo simulation. As the analyzed 60us portion of the holo trajectory may have no overlap with the conformational space sampled in the independent apo simulations, it is unclear whether this control provides any information. There is no quantification of the conformational entropy of the 6 states obtained from this portion of the holo trajectory or of the full conformational space sampled. No information is presented to determine whether similar states are observed in the shorter portion of the holo trajectory. Furthermore, as the authors provide almost no justification for the criteria used to select the final number of macrostates for any of the MSMs reported in this work, and the number of macrostates is effectively a free parameter in the PCCA+ method, arriving at an MSM with 6 macrostates does not convey any information about the conformational entropy of alpha-synuclein in the presence or absence of ligands. Indeed, the implied timescale plot for the 60us holo MSM (Figure S2) shows that at least 10 processes are resolved in the 120-microstate model, and no information is provided to explain or justify how a final 6-macrostate model was determined. The authors also do not project the conformations sampled in this sub-trajectory onto the latent space of the final VAE.

One certainly expects that an MSM built with 1/20th of the simulation data should have substantial differences from an MSM built from the full trajectory - so failing additional information and hyperparameter justification - one wonders if the emergence of a 6-state model could be the direct result of hardcoded VAE and MSM construction hyperparameter choices.

Required Controls For Supporting the Conclusions of the Study: The authors should initiate apo and holo simulations from the same starting structures - using the same simulation software and parameters. This could be done by adding a Fasudil ligand to the apo structures - or by removing the Fasudil ligand from a subset of holo structures. This would enable them to make apples-to-apples comparisons about the effect of Fasudil on alpha-synuclein conformational space.

Failing direct apples-to-apples comparisons, which would be required to truly support the study's conclusions, the authors should at least compare the conformational space sampled in the independent apo simulations and the holo simulation using standard interpretable IDP order parameters (i.e. Rg, end-to-end distance, secondary structure order parameters) and/or principal components from PCA or tICA obtained from the holo simulation. The authors should quantify the number of transitions observed between conformational states in their apo simulations. The authors could also perform more appropriate holo controls, without additional calculations, by taking batches of short 1-4us segments of the holo simulation, similar in number to the simulations used to compute the apo MSMs, and examining how the parameters and macrostates of the resulting holo MSMs vary across random selections of the input.
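The transition count requested above could be obtained along the following lines. This is only a sketch under stated assumptions: `dtrajs` is a hypothetical list of discretized trajectories (one list of state labels per independent simulation), and the toy labels are invented for illustration.

```python
from collections import Counter


def count_transitions(dtrajs, lag=1):
    """Count state-to-state jumps (a -> b, a != b) at a given lag (lag >= 1).

    Each trajectory is counted separately, so no artificial transition is
    introduced between the end of one simulation and the start of the next.
    """
    counts = Counter()
    for traj in dtrajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            if a != b:
                counts[(a, b)] += 1
    return counts


# Hypothetical discretized trajectories from three short simulations.
dtrajs = [
    [0, 0, 1, 1, 0],
    [2, 2, 2, 2],
    [1, 1, 2, 2, 2],
]
counts = count_transitions(dtrajs, lag=1)
total_transitions = sum(counts.values())
# Simulations that never leave their starting state contribute no kinetics.
n_static = sum(1 for traj in dtrajs if len(set(traj)) == 1)
print(dict(counts), total_transitions, n_static)
```

Reporting the per-pair counts together with the number of trajectories that never leave their starting state would show whether the short apo simulations actually connect the states of the apo MSM.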

Major Weakness 2: There is little justification of how the MSM hyperparameters were selected. It is unclear if the results of the study depend on arbitrary hyperparameter selections such as the final number of macrostates in each model.

It is unclear what criteria were used to determine the appropriate number of microstates and macrostates for each MSM. Most importantly, as all analyses of water entropy and conformational entropy are restricted to the final macrostates, the criteria used to select the final number of macrostates with PCCA+ are extremely important to the conclusions of the study. From examining the ITS plots in Figure 3, it seems both MSMs show the same number of resolved processes (at least 11), suggesting that a 10-state model could be appropriate for both systems. If one were to simply select a large number of macrostates for the 20x longer holo simulation, do these states converge to the same conformational entropy as the states seen in the short apo simulations? Is there some MSM quality metric used to determine what number of macrostates is more appropriate?
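One commonly used criterion, shown here only as a sketch and not as the method used in the manuscript, is to convert the transition-matrix eigenvalues into implied timescales, t_i = -τ / ln|λ_i|, and to place the macrostate cutoff at the largest gap between consecutive timescales. The eigenvalues, the lag time, and the function names below are hypothetical.

```python
import math


def implied_timescales(eigenvalues, lag):
    """t_i = -lag / ln|lambda_i|, skipping the stationary eigenvalue (1.0)."""
    ordered = sorted(eigenvalues, key=abs, reverse=True)
    return [-lag / math.log(abs(lam)) for lam in ordered[1:]]


def macrostates_from_largest_gap(timescales):
    """Number of macrostates implied by the largest timescale ratio.

    If the largest ratio t_k / t_(k+1) follows the k-th slowest process,
    then k slow processes are retained, corresponding to k + 1 macrostates.
    """
    ratios = [
        timescales[i] / timescales[i + 1] for i in range(len(timescales) - 1)
    ]
    best = max(range(len(ratios)), key=lambda i: ratios[i])
    return best + 2  # (best + 1) slow processes -> (best + 2) macrostates


# Hypothetical eigenvalues of a transition matrix estimated at lag = 1.
eigenvalues = [1.0, 0.99, 0.95, 0.5, 0.4]
timescales = implied_timescales(eigenvalues, lag=1.0)
n_macrostates = macrostates_from_largest_gap(timescales)
print(timescales, n_macrostates)
```

Stating a criterion of this kind, and showing that it yields the reported number of macrostates for both the apo and holo models, would make the choice reproducible.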

Required Controls For Supporting the Conclusions of the Study: The authors should specify the criteria used to determine the appropriate number of microstates and macrostates for their MSMs and present controls that demonstrate that the conformational entropies calculated for their final states are not simply a function of the ratio of the number of macrostates chosen to represent very disparate amounts of conformational sampling.

Major Weakness 3: The use of variational autoencoders (VAEs) obscures insights into the underlying conformational ensembles of apo and holo alpha-synuclein rather than providing new ones.

No rationale is offered for the selection of the VAE architecture or hyperparameters used to reduce the dimensionality of alpha-synuclein conformational space.

It is not clear that the VAEs employed in this study are providing any new insight into the conformational ensembles and binding mechanisms of Fasudil to alpha-synuclein, or whether the underlying latent space of the VAEs is more informative or kinetically meaningful than standard linear dimensionality reduction techniques like PCA and tICA. The initial VAE is used to reduce the dimensionality of alpha-synuclein conformational ensembles to 2 degrees of freedom, but it is unclear if this projection is structurally or kinetically meaningful. It is not clear why the authors chose to use a 2-dimensional projection instead of a higher number of dimensions to build their MSMs. Can they produce a more kinetically and structurally meaningful model using a higher-dimensional VAE latent space?

Additionally - it is not clear what insights are provided by the Denoising Convolutional Variational Autoencoder. The authors appear to be noising-and-denoising the contact maps of each macrostate, and then projecting the denoised values onto a new latent space - and commenting that they are different. Does this provide additional insight that looking at the contact maps in Figures 4&5 does not? Is this more informative than examining the distribution of the Radii of gyration or the secondary structure propensities of each ensemble? It is not clear what insight this analysis adds to the manuscript.

Suggested controls to improve the study: The authors should project interpretable IDP structural descriptors (i.e. radius of gyration, secondary structure content, # of intramolecular contacts, # of intermolecular contacts between alpha-synuclein and Fasudil) onto this latent space to illustrate whether any of these properties are meaningfully separated by the VAE projection. The authors should compare these projections, and MSMs built from these projections, to projections and MSMs built using standard linear dimensionality reduction techniques like PCA and tICA.

Major Weakness 4: The MSMs produced in this study have large discrepancies with MSMs previously produced on the same dataset by the same authors that are not discussed.

Previously, two of the authors of this manuscript (Menon and Mondal) authored a preprint titled "Small molecule modulates α-synuclein conformation and its oligomerization via Entropy Expansion" (https://www.biorxiv.org/content/10.1101/2022.10.20.513005v1.full) that analyzed the same 1500us holo simulation of alpha-synuclein binding Fasudil. In that study, they utilized the variational approach to Markov processes (VAMP) to build an MSM using a 1D order parameter (the radius of gyration) as input, first discretizing the conformational space into 300 microstates before similarly building a 6-macrostate model. From examining the contact maps and secondary structure propensities of the holo MSMs from the current study and the previous study, some of the macrostates appear similar; however, there appear to be orders of magnitude differences in the timescales of conformational transitions between the two models. The timescales of conformational transitions in the previous MSM are on the order of 10s of microseconds, while the timescales of transitions in this manuscript are 100s-1000s of microseconds. In the previous manuscript, a 3-state MSM is built for apo α-synuclein from a continuous 73us unbiased MD simulation of alpha-synuclein run at a different salt concentration (100mM) and an additional 33us of shorter simulations. The apo MSM from the previous study similarly reports very fast timescales of transitions between apo states (on the order of ~1us), while the timescales in the MSM reported in the current study (Figure 9) are on the order of 10s-100s of microseconds.

These discrepancies raise further concerns that the properties of the MSMs built on these systems are extremely sensitive to the chosen projection methods, MSM modeling choices, and hyperparameters, and that neither model may be an accurate description of the true underlying dynamics.

Suggestions to improve the study: The authors should discuss the discrepancies with the MSMs reported in their previous studies.

https://doi.org/10.7554/eLife.97709.1.sa0
