The interaction between the Heat Shock Proteins 70 and 40 is at the core of the ATPase regulation of the chaperone machinery that maintains protein homeostasis. However, the structural details of the interaction remain elusive and contrasting models have been proposed for the transient Hsp70/Hsp40 complexes. Here we combine molecular simulations based on both coarse-grained and atomistic models with coevolutionary sequence analysis to shed light on this problem by focusing on the bacterial DnaK/DnaJ system. The integration of these complementary approaches resulted in a novel structural model that rationalizes previous experimental observations. We identify an evolutionarily conserved interaction surface formed by helix II of the DnaJ J-domain and a structurally contiguous region of DnaK, involving lobe IIA of the nucleotide binding domain, the inter-domain linker, and the β-basket of the substrate binding domain.
https://doi.org/10.7554/eLife.23471.001
The 70 kDa and 40 kDa Heat Shock Proteins (Hsp70/Hsp40) form the core of a chaperone machinery that plays essential roles in proteostasis and proteolytic pathways (Daugaard et al., 2007; Hartl et al., 2011; Mayer, 2013). Hsp70 chaperones, and their cochaperone partners Hsp40, are highly conserved ubiquitous proteins, present in multiple paralogs in virtually all known organisms (Daugaard et al., 2007; Kampinga and Craig, 2010). The chaperoning role of this machinery is based on the ability of Hsp70s to bind client proteins in non-native states, thereby preventing and reverting aggregation, unfolding misfolded proteins, assisting protein degradation and translocation (De Los Rios et al., 2006; Proctor and Lorimer, 2011; Rampelt et al., 2012; Sharma et al., 2010).
Members of the Hsp70 family are composed of two domains, connected by a flexible linker: the N-terminal nucleotide binding domain (NBD) binds and hydrolyzes ATP, whereas the C-terminal substrate binding domain (SBD) interacts with client proteins (Zuiderweg et al., 2013). The nature of the bound nucleotide induces dramatically different conformations of Hsp70: in the ADP-bound state, the two domains are mostly detached and behave almost independently (Bertelsen et al., 2009), whereas in the ATP-bound state, the SBD splits into two sub-domains that dock onto the NBD (Kityk et al., 2012; Qi et al., 2013). Therefore, nucleotide hydrolysis and exchange induce large-scale conformational dynamics that regulate the chaperone interaction with client proteins (Mayer, 2013).
Hsp40s are also called J-Domain Proteins, as they are invariantly characterized by the presence of a 70-residue signature domain (J-domain), within a variable multi-domain architecture. This J-domain is composed of four helices (Figure 1A). The two central helices II and III form an antiparallel bundle, connected by a flexible loop with a highly conserved distinctive histidine-proline-aspartate (HPD) motif (Pellecchia et al., 1996). Several studies have indicated the essential role of the J-domain and of the HPD motif in Hsp40/Hsp70 interactions (Greene et al., 1998; Mayer et al., 1999; Suh et al., 1998; Tsai and Douglas, 1996). While the structural diversity of Hsp40s mirrors the functional versatility of this complex machinery, the common conserved J-domain is strictly necessary for enhancing ATP hydrolysis by Hsp70 (Kampinga and Craig, 2010). Modulation of Hsp70 ATPase activity through formation of transient Hsp70/Hsp40/client complexes regulates the chaperone affinity for client proteins (De Los Rios and Barducci, 2014; Kellner et al., 2014) and is hence essential for its multiple cellular functions. Continuous switching between multiple conformations means that the chaperone-cochaperone interaction is intrinsically highly dynamic. Understanding the complex interplay between Hsp70 and Hsp40 at the mechanistic level is therefore a crucial task to gain a deeper functional insight into the chaperone machinery (Mapa et al., 2010).
Extensive experimental evidence of Hsp70/40 interactions has been accumulated over the last two decades (Alderson et al., 2016). Mutagenesis, surface plasmon resonance and NMR experiments have identified multiple putative interacting regions of the J-domain and Hsp70, mostly focusing on the E. coli DnaK/DnaJ system (Ahmad et al., 2011; Genevaux et al., 2002; Greene et al., 1998; Suh et al., 1998, 1999). In spite of this considerable effort, the dynamic and transient nature of the Hsp70/Hsp40 complex has posed severe challenges to its structural characterization, and no consensus view of the complex has yet been reached.
To date, the only available high-resolution structure has been obtained by means of X-ray crystallography of the NBD of bovine Hsc70 (Hsp70) covalently linked to the J-domain of bovine auxilin (Hsp40) (Jiang et al., 2007). However, this structure cannot be easily reconciled with NMR and mutagenesis data collected on DnaK/DnaJ, thus suggesting either major differences in the binding modes of bacterial and eukaryotic Hsp70/40s or the trapping of a sparsely populated state that is influenced by non-native contacts (Sousa et al., 2012; Zuiderweg and Ahmad, 2012). More recently, solution PRE-NMR experiments identified an alternative highly dynamic interface between ADP-bound DnaK and DnaJ (Ahmad et al., 2011).
Here we relied on both multi-scale molecular modeling and statistical analysis of protein sequences to shed light on the Hsp70/Hsp40 interactions. By combining these complementary techniques, we propose a structural model of the binding of bacterial DnaK/DnaJ, which is in good agreement with available experimental data and greatly extends our understanding of this elusive yet fundamental process.
We characterized DnaK/DnaJ interactions by means of Monte Carlo simulations based on a coarse-grained (CG) potential energy validated against structural and thermodynamic properties of protein complexes with low binding affinity (Kim and Hummer, 2008; Kim et al., 2008; Różycki et al., 2011). Binding partners are modeled as rigid bodies, using one interaction site per amino acid (residue) located at the Cα position in the experimental structure. Intermolecular energy functions are based on statistical contact potentials and long-range Debye-Hückel electrostatic interactions (Kim and Hummer, 2008). A replica exchange Monte Carlo simulation protocol is adopted to exhaustively sample all relevant bound conformations. We took advantage of this approach and of the availability of high-resolution structures of the individual binding partners to investigate complexes formed by the J-domain of E. coli DnaJ (JD) with the DnaK NBD, both in its ADP- and ATP-bound conformations (NBD(ADP), NBD(ATP)). Moreover, we extended this analysis to full-length ATP-bound DnaK (FL(ATP)) in order to unveil a possible role of the SBD in the binding process.
CG trajectories were analyzed to determine the binding affinity of the three DnaK constructs for JD and to characterize the most favorable complex conformations. Calculated binding affinities (KD = 540 ± 60 μM for NBD(ADP), KD = 370 ± 35 μM for NBD(ATP), KD = 23 ± 3 μM for FL(ATP)) are compatible with previous experimental determinations (Ahmad et al., 2011; Greene et al., 1998; Wittung-Stafshede et al., 2003), and their significant dependence on the presence of the SBD and linker suggests a stabilizing role of this region in the DnaK/JD complex. The analysis of the conformational ensembles corresponding to bound complexes revealed several distinctive features of the DnaK/JD binding process, along with a certain degree of conformational heterogeneity (Figure 1—figure supplement 3). The free energy surfaces as a function of NBD-centered spherical coordinates (Figure 1C,E,G) clearly indicate that a specific binding site predominates in all the simulated DnaK constructs, irrespective of the bound nucleotide and of the presence of the SBD and linker.
To better characterize this favored binding interface, we calculated the probability of each DnaK residue to be in direct contact with JD and mapped it onto the NBD structure (Figure 1B). These results suggest that the formation of DnaK/DnaJ complexes mostly involves a DnaK region located on lobe IIA of the NBD, in a negatively charged narrow groove formed by a β-sheet and a short loop (Figures 1B and 2A,B). The complementary interface on JD suggests that its interaction with DnaK is mostly mediated by the positively charged helix II and a few residues on helix I (Figure 1A).
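The per-residue contact probability mapped in Figure 1B can be sketched as below. This minimal illustration makes three assumptions:

- Bead coordinates from the bound ensemble are stored as NumPy arrays.
- The contact criterion reuses, hypothetically, the 8 Å bead-bead cutoff that Materials and methods applies to define bound complexes.
- Periodic images are ignored.

```python
import numpy as np


def contact_probability(dnak_frames: np.ndarray, jd_frames: np.ndarray,
                        cutoff: float = 8.0) -> np.ndarray:
    """Fraction of frames in which each DnaK bead lies within `cutoff` (Å) of any JD bead.

    dnak_frames: (n_frames, n_dnak, 3) bead coordinates
    jd_frames:   (n_frames, n_jd, 3) bead coordinates
    The 8 Å default is an assumed contact criterion, not a value from the text.
    """
    # pairwise DnaK-JD distances for every frame: (n_frames, n_dnak, n_jd)
    diff = dnak_frames[:, :, None, :] - jd_frames[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    in_contact = (dist < cutoff).any(axis=2)  # (n_frames, n_dnak)
    return in_contact.mean(axis=0)
```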
We further analyzed the conformational ensembles obtained by CG simulations to identify the most relevant orientations of the JD in the bound complexes. The free energy surfaces as a function of Euler angles measuring the relative orientation of the binding partners reveal two major conformational sub-ensembles in all the simulated systems (Figure 1D,F,H, Figure 1—figure supplement 2). Cluster analysis of the CG trajectories suggests that these distinct intermolecular arrangements correspond to complexes with similar binding interfaces but opposite orientations of the JD with respect to DnaK (Figure 2, Figure 2—figure supplement 1). The conserved HPD loop of the JD points outwards in one conformational sub-ensemble (HPD-OUT, Figure 2A,C), whereas in the other it is close to the groove on the NBD where the inter-domain linker docks (HPD-IN, Figure 2B,D). These two arrangements were observed in all the simulated systems, with only minor perturbations due to the presence of the SBD, and together they account for more than 91% of the populations in the bound ensembles. In all the simulated systems, the population of the HPD-OUT conformation is higher than that of HPD-IN. However, the observed free energy differences between the two binding modes (1.5 kcal/mol) are comparable with the expected uncertainty of the CG model (Kim and Hummer, 2008).
Statistical analysis of the covariation in multiple sequence alignments (MSAs) represents an extremely valuable approach to investigate protein structure by identifying residue-residue interactions that are evolutionarily conserved (de Juan et al., 2013; Marks et al., 2012). Particularly, direct coupling analysis (DCA) (Morcos et al., 2011; Weigt et al., 2009) of paired MSAs of interacting proteins has been successfully applied to predict interfaces of protein complexes (Hopf et al., 2014; Ovchinnikov et al., 2014). The canonical matching algorithms for generating paired MSAs are based on intergenic distances and, unfortunately, they cannot be directly applied to the Hsp70/40 interaction, because of promiscuous interactions that cannot strictly be predicted by operon structure in this family. To circumvent this difficulty, we adopted an alternative approach based on generation of an ensemble of stochastically matched MSAs. In this context, the statistical reliability of inter-residue couplings can be related to their frequency of appearance within the DCA predictions obtained from all the realizations of the MSA ensemble (see Materials and methods for details).
We took advantage of the large sizes of the Hsp70/40 families (Hsp70: 20061 sequences, Hsp40: 26254 sequences, distributed in all kingdoms, see Materials and methods) to evaluate inter-residue evolutionary couplings between the Hsp40 JD and the Hsp70 NBD. The high degree of conservation of these domains enables high-quality alignments, a prerequisite for accurate coevolutionary inference. This analysis identified three inter-protein residue pairs that stand out among coevolving pairs in the Hsp40 and Hsp70 families (Figure 3A), corresponding to N187-K23, D208-K26 and T189-R19 in E. coli DnaK and DnaJ. The spatial proximity of N187/D208/T189 on the DnaK NBD, and the proximity between K23/K26/R19 on the JD, suggest the presence of well-defined evolutionarily conserved binding patches across Hsp40/70 families.
Remarkably, these patches overlap perfectly with the binding regions predicted by CG modeling, that is, helix II in DnaJ and a sub-region of lobe IIA in the DnaK NBD. Thus, DCA predictions can be used to evaluate the HPD-IN and HPD-OUT binding modes suggested by CG simulations (Figure 3B,C). Quantitative assessment is limited by several factors, such as the difficulty of translating coevolutionary couplings into exact distance restraints, the limited resolution of the residue-based CG model and the dynamic nature of the JD/DnaK complexes. Nevertheless, while both intermolecular arrangements might be compatible with DCA predictions, the specific binding pattern predicted by coevolutionary analysis matches the JD orientation observed in HPD-IN significantly better (Figure 3C). Moreover, in this conformation, the location of D35 in the HPD motif is compatible with a putative interaction with R167 on the DnaK NBD, as previously suggested by mutagenesis experiments (Suh et al., 1998). The overall better agreement of the HPD-IN conformation is further supported by the observation that the average distance between T189 and R19 in HPD-OUT seems too large to justify a direct interaction, and thus a strong statistical coupling between those residues.
We repeated the DCA analysis on two subsets restricted to either bacterial or eukaryotic sequences, to further investigate the origin of the detected coevolutionary signal. The results obtained on the bacterial subset (Figure 3—figure supplement 1A) are in perfect agreement with the results obtained on the full dataset, whereas no strong coevolutionary couplings are detected using the eukaryotic subset (Figure 3—figure supplement 1B). This observation indicates that the coevolutionary signal of the observed Hsp70/Hsp40 interface mostly originates from the bacterial sequences in the dataset.
We performed atomistic explicit-solvent molecular dynamics (MD) simulations to investigate the stability and the dynamics of the DnaK/DnaJ docked conformations obtained with CG modeling.
Firstly, we assessed the reliability of the DnaK/DnaJ complexes by performing 10 MD runs of 30 ns for each system (JD:NBD(ADP), JD:NBD(ATP) and JD:FL(ATP)) in both HPD-IN and HPD-OUT binding modes (see Materials and methods). These simulations showed a certain degree of conformational dynamics in all cases (Figure 4—figure supplement 1) and provided information about the relative stability of the various complexes. In particular, the Cα distance root mean square deviation (dRMS) and the average angular deviation of the JD with respect to the starting frame (Table 1, Figure 4—figure supplements 2–3) revealed that the NBD(ADP):JD and NBD(ATP):JD complexes in the HPD-OUT conformation displayed high structural variability, whereas the HPD-IN conformations were significantly more stable on this time scale. This difference was much less pronounced in the simulations of full-length DnaK, likely because of the stabilizing effect of JD-SBD interactions.
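The distance root mean square deviation (dRMS) compares all intramolecular pair distances of a conformation with those of a reference frame, so it needs no structural superposition. The sketch below implements that standard definition. Which Cα atoms enter the calculation (the whole complex or a subset) is an assumption left to the caller.

```python
import numpy as np


def pair_distances(x: np.ndarray) -> np.ndarray:
    """All unique intramolecular distances of an (N, 3) coordinate array."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)  # each unordered pair counted once
    return d[iu]


def drms(coords: np.ndarray, ref: np.ndarray) -> float:
    """Distance RMSD between two conformations of the same N atoms (same ordering)."""
    diff = pair_distances(coords) - pair_distances(ref)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Because only internal distances enter, a rigid rotation or translation of the whole selection leaves the dRMS at zero.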
We then focused on the FL(ATP):JD complex in the HPD-IN conformation, which stands out among the other systems as it involves full-length DnaK and is compatible with coevolutionary and mutagenesis data. In particular, we performed three MD simulations of 1 μs to better probe its conformational dynamics. The results confirmed the stability of the HPD-IN arrangement on this more extended time scale but unveiled the presence of multiple, distinct conformational states within this overall binding mode (Figure 4A). While an exhaustive characterization of the conformational space exceeds the capabilities of all-atom MD, the broad structural ensembles suggest a significant degree of conformational dynamics on the microsecond timescale. This picture is consistent with the broad, multi-modal distributions of the atomic distances corresponding to relevant intermolecular interactions, such as the three evolutionarily conserved contacts and D35-R167 (Figure 4B). These interactions thus appear to be transiently populated in the context of a highly dynamical intermolecular interface.
To shed further light on the molecular determinants of the DnaK/JD interaction, we then evaluated the energetic contributions of individual residues to the protein-protein binding energy in the MD trajectories, using a generalized Born surface area (GBSA) approximation (see Materials and methods). The per-residue decomposition of the binding energy highlighted four fragments of DnaK that contribute most strongly to the stabilization of the DnaK/JD complexes (Figure 5A). The residues corresponding to three of these spots form an almost continuous patch covering the upper cleft between lobes II and III of the NBD (residues 206–219, 329–335) and a segment of the inter-domain linker (residues 391–393) (Figure 5A–B). Reciprocally, the energetic analysis of the JD residues identified helix II and the HPD loop as the principal regions involved in the energetic stabilization of the complex (Figure 5—figure supplement 1). These findings confirm that the JD strongly interacts with the docked linker and its neighboring residues, suggesting that the JD stabilizes this linker arrangement. Remarkably, the energetic analysis also unveiled that a stretch of the SBD β-basket (residues 414–423) plays an important role in securing the DnaK/JD interface.
To investigate the functional relevance of the interaction between the JD and the SBD, we then extended the DCA analysis to full-length Hsp70 sequences. This analysis confirmed the significance of the previously observed NBD/JD contacts and predicted two inter-protein contacts involving the SBD (H422-E75 and Q424-K51 in E. coli DnaK/DnaJ numbering, Figure 5C and Figure 5—figure supplement 2). Interestingly, the two residues on the SBD belong to the SBD region energetically involved in JD binding, suggesting that this interaction has been conserved through evolution. Of the two corresponding residues on the JD, one is located on helix III (K51), in excellent agreement with the HPD-IN binding mode, while the other (E75) lies in the unstructured C-terminal region of the JD, which was not included in the structural model (see Materials and methods). Taken together, these energetic and evolutionary analyses strongly indicate that the J-domain directly interacts with the inter-domain linker in its docked conformation, as well as with the SBD.
The integration of complementary approaches such as coevolutionary sequence analysis and molecular modeling at the coarse-grained and atomistic scale allowed us to shed light on the structural details of the crucial interaction of DnaJ with DnaK.
Indeed, molecular simulations based on a CG model specifically suited to study low-affinity protein binding identified the positively charged helix II of the JD and a region close to lobe IIA of the DnaK NBD as the most relevant interaction sites in the formation of DnaK/JD complexes. This prediction was corroborated by statistical sequence analysis showing that several inter-protein contacts across this interface strongly coevolve in the Hsp40/70 family. These findings are in good agreement with much experimental evidence collected in the last 20 years. Indeed, a major role for helix II of JD in the DnaK/DnaJ interaction has been suggested both by NMR and mutagenesis experiments (Greene et al., 1998; Suh et al., 1998). Furthermore, our prediction is in excellent agreement with a recent PRE-NMR investigation of the interaction of JD with ADP-bound DnaK that identified the sequence 206EIDEVDGEKTFEVLAT221 as the main binding region on DnaK (Ahmad et al., 2011). The observation that the same interaction site was present in all the simulated systems (ATP- and ADP-NBD and ATP-bound full-length DnaK) strongly suggests that this region is likely to play a primary role throughout the chaperone functional cycle, thus greatly extending its physiological relevance. Interestingly, the predicted bound conformations located the J-domain in close proximity to the docked inter-domain linker in FL(ATP) (Figure 2), which has been shown to play a central role in the allosteric coupling of the two domains in the Hsp70 cycle (Alderson et al., 2014; Vogel et al., 2006; Zhuravleva et al., 2012).
Beyond a detailed characterization of the binding regions on DnaK and the JD, our integrated approach provided valuable information about the inter-protein arrangement in the transient DnaK/JD complexes. Specifically, CG modeling suggested two possible binding modes characterized by opposite orientations of the JD (Figure 2). Both these putative conformations were only minimally affected by structural differences in the NBD upon ATP/ADP binding or by inter-domain docking in full-length ATP-bound DnaK. Direct comparison of these results with the interaction pattern inferred from coevolutionary analysis reveals an excellent agreement for one of the conformations (HPD-IN). Further elements supporting the relevance of this structure can be found by taking into account the role of the highly conserved HPD loop of the JD. Indeed, several mutagenesis studies have shown that the HPD loop is fundamental for functional chaperone/cochaperone interactions (Landry, 2003; Suh et al., 1998). NMR investigations have reported conflicting evidence about the actual involvement of the HPD region in the Hsp70/Hsp40 interface (Ahmad et al., 2011; Greene et al., 1998; Kim et al., 2014). However, the observation that the DnaK R167H mutation could suppress the deleterious effect of the DnaJ D35N mutation strongly pointed toward a direct, yet possibly transient, interaction of these residues during the chaperone functional cycle (Suh et al., 1998). Strikingly, this experimental evidence is perfectly compatible with the spatial proximity of DnaJ D35 and DnaK R167 in the HPD-IN conformation (Figure 3C), whereas it cannot be easily reconciled with the orientation of the HPD in the current structural model of the DnaK:JD complex based on PRE-NMR experiments (Ahmad et al., 2011). The HPD-IN conformation hence provides a novel, suggestive model for the elusive DnaK/DnaJ complex that best recapitulates the most relevant experimental evidence on prokaryotic Hsp70/Hsp40 systems.
The insights obtained by combining CG modeling and coevolutionary sequence analysis were further confirmed and enriched by explicit-solvent, atomistic simulations. Indeed, MD trajectories confirmed the overall stability of the HPD-IN complex on the microsecond time scale and showed the transient interaction of D35-R167 and of the coevolving pairs (Figure 4). The transient nature of these contacts is fully compatible with the dynamical interface suggested by NMR experiments (Ahmad et al., 2011). Furthermore, energetic analysis of the atomistic simulations unveiled the residues that contribute most to the formation of this dynamical complex. In particular, the interaction of JD helix II with the DnaK surface composed of the docked inter-domain linker and adjacent β-strands plays a key role in stabilizing the binding interface. Remarkably, the MD analysis highlighted that a few residues of the SBD β-basket contribute significantly to JD binding. DCA performed with full-length Hsp70 sequences further strengthened this observation by showing that the SBD/JD interface observed in the simulations contains pairs of coevolving residues in the Hsp70/Hsp40 family. Altogether, our results suggest that although the overall DnaK/JD arrangement is determined by interactions with the NBD, specific contacts with both the SBD and the inter-domain linker may significantly increase the complex stability, thus rationalizing the decreased affinity observed for the isolated NBD (Kim et al., 2014).
To put our findings into context, we have to take into account the current understanding of allosteric signal transmission in DnaK. Much experimental evidence has indicated that ATP-bound DnaK undergoes large-scale structural fluctuations with significant inter-domain rearrangements (Mayer, 2010; Mapa et al., 2010). Within this conformational ensemble, NMR and mutagenesis studies suggested that allosterically active conformers with high ATPase activity are characterized by a docked inter-domain linker but very limited SBD-NBD contacts (Zhuravleva et al., 2012; Kityk et al., 2015; Jiang et al., 2007). Remarkably, we find that the docked inter-domain linker and neighboring residues correspond to a hotspot for Hsp70/Hsp40 interaction, suggesting a stabilization of this linker conformation in the transient DnaK/JD complex. Furthermore, the energetically favorable and evolutionarily conserved interactions between DnaK SBD and JD suggest an additional mechanism for altering the conformational ensemble of ATP-bound DnaK upon DnaJ binding. Our structural model is thus compatible with the intriguing hypothesis that the docking of the JD affects the SBD dynamics through direct interactions, shifting DnaK towards an allosterically active conformation (Zhuravleva et al., 2012). Therefore, while the dynamical interplay among NBD, inter-domain linker, SBD, and JD remains to be fully elucidated, our analysis provides insights about the regulatory role of J-domain proteins in the Hsp70 cycle, a topic of great interest for understanding the role of the Hsp70 machinery in the global chaperone network (Kravats et al., 2017) as well as for designing allosteric inhibitors (Li et al., 2016).
The alternative arrangement observed in the bovine auxilin:Hsc70 complex (Jiang et al., 2007), and its poor agreement with NMR/mutagenesis data on DnaK/DnaJ (Ahmad et al., 2011; Greene et al., 1998) raise the question of the uniqueness of the Hsp70/Hsp40 binding mode (Garimella et al., 2006). Whether these major structural differences are caused by artifacts introduced by the artificial cross-linking, or point to the existence of multiple dynamic interaction interfaces, or to phylogenetic differentiation of Hsp70/40s, remains an essential yet unsolved question. In this respect, the successful combination of coevolutionary and molecular modeling analysis proposed here paves the way for further analysis to tackle these challenges.
We used the coarse-grained model introduced in Kim and Hummer (2008) to simulate the binding of DnaJ to DnaK constructs. Both proteins were treated as rigid bodies, at a resolution of one bead per residue centered on the Cα atoms. We modeled NBD(ADP) using the structured region (residues 4–380) of ADP-bound E. coli DnaK (pdb: 2kho [Bertelsen et al., 2009]), whereas we relied on the X-ray structure of ATP-bound E. coli DnaK (pdb: 4jne [Qi et al., 2013]) for both NBD(ATP) (residues 4–380) and FL(ATP) (residues 1–600). The J-domain was modeled based on the structured region of the E. coli DnaJ (pdb: 1xbl [Pellecchia et al., 1996], residues 2–70). We defined the structured part of the J-domain by aligning multiple J-domains (pdb: 1xbl, 4j7z, 2m6y, 2n04, 2qsa, 2lgw, 2och, 2dn9, 2dmx, 1hdj, 1faf, 2ctw) and considering their common structured part. We therefore removed the last six C-terminal residues from the 1xbl structure to define the maximal common structured region of the J-domain. Our definition of the J-domain corresponds to the one used in Ahmad et al. (2011).
Conformations were sampled from the equilibrium distribution using a replica-exchange Monte Carlo (REMC) algorithm in a periodic box, with 20 replicas distributed in the temperature range 200–395 K. A total of MC-steps were performed for each replica and samples were recorded every 100 steps. Dissociation constants were calculated by measuring the fraction of bound conformations, and simulations were repeated with five increasing box sizes (240–360 Å for NBD, 300–420 Å for FL(ATP)). Bound conformations were extracted by selecting all complexes in which the two proteins had at least one pair of beads within 8 Å and a total interaction energy equal to or below . All subsequent analyses on the CG complexes have been performed on the ensemble of bound complexes. The algorithm introduced by Daura et al. (1999), with a cutoff radius of 5 Å, was used to perform cluster analysis of the CG trajectories.
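The clustering scheme of Daura et al. (1999) works greedily:

1. Take the structure with the most neighbors within the cutoff as a cluster center.
2. Remove that center together with its neighbors.
3. Repeat until no structures remain.

The sketch below is a minimal version of this procedure. It assumes a precomputed symmetric matrix of pairwise structural distances between bound CG complexes, for example RMSD values in Å. The function name is hypothetical.

```python
import numpy as np


def daura_clustering(dist: np.ndarray, cutoff: float = 5.0) -> list[list[int]]:
    """Greedy clustering in the spirit of Daura et al. (1999).

    dist: symmetric (n x n) matrix of pairwise distances (e.g. RMSD in Å).
    Returns clusters as lists of frame indices; the first index is the center.
    """
    n = dist.shape[0]
    neighbors = dist <= cutoff            # diagonal is 0, so each frame neighbors itself
    remaining = np.ones(n, dtype=bool)
    clusters = []
    while remaining.any():
        # count neighbors among frames that are not yet assigned
        counts = (neighbors & remaining[None, :]).sum(axis=1)
        counts[~remaining] = -1           # assigned frames cannot become centers
        center = int(np.argmax(counts))
        members = np.flatnonzero(neighbors[center] & remaining)
        clusters.append([center] + [int(m) for m in members if m != center])
        remaining[members] = False        # removes the center and its neighbors
    return clusters
```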
To characterize the angular orientation of the binary complexes, we used two sets of angular coordinates. The binding site of the JD on the DnaK NBD was first characterized by the spherical coordinates of its center of mass. Let $\mathbf{R}_{\mathrm{JD}}$ and $\mathbf{R}_{\mathrm{NBD}}$ denote the centers of mass of the JD and NBD, respectively (computed over all atoms). Furthermore, let $\mathbf{e}^{\mathrm{JD}}_i$ and $\mathbf{e}^{\mathrm{NBD}}_i$ ($i = 1, 2, 3$, where $i = 1$ corresponds to the largest moment of inertia) denote the three normalized axes of inertia of the JD and NBD, respectively. The spherical coordinates describing the binding site of the JD on the NBD are then defined by the usual pair of spherical angles of the vector $\mathbf{R}_{\mathrm{JD}} - \mathbf{R}_{\mathrm{NBD}}$, in the reference coordinate system defined by the inertia axes of the NBD.
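The spherical coordinates of the binding site can be sketched with NumPy as follows. The following choices are assumptions, not taken from the original:

- All beads have equal masses.
- Principal axes are ordered by decreasing moment of inertia.
- A right-handed frame is enforced by flipping the last axis if needed.
- The polar angle is measured from the third NBD axis.

Eigenvector signs are arbitrary, so a production analysis would fix the NBD frame once on the rigid reference structure.

```python
import numpy as np


def center_of_mass(coords: np.ndarray) -> np.ndarray:
    """Unweighted center (equal bead masses assumed)."""
    return np.asarray(coords, float).mean(axis=0)


def inertia_axes(coords: np.ndarray) -> np.ndarray:
    """Normalized principal axes as columns, ordered by decreasing moment of inertia."""
    x = np.asarray(coords, float) - center_of_mass(coords)
    r2 = (x ** 2).sum(axis=1)
    # inertia tensor I = sum_k (|r_k|^2 * Id - r_k r_k^T) with unit masses
    inertia = np.eye(3) * r2.sum() - x.T @ x
    evals, evecs = np.linalg.eigh(inertia)    # eigenvalues in ascending order
    axes = evecs[:, np.argsort(evals)[::-1]]  # largest moment first
    if np.linalg.det(axes) < 0:               # enforce a right-handed frame
        axes[:, 2] *= -1.0
    return axes


def binding_site_angles(jd_coords: np.ndarray, nbd_coords: np.ndarray):
    """Polar and azimuthal angles (radians) of the JD center in the NBD inertia frame."""
    axes = inertia_axes(nbd_coords)
    v = axes.T @ (center_of_mass(jd_coords) - center_of_mass(nbd_coords))
    r = np.linalg.norm(v)
    theta = np.arccos(np.clip(v[2] / r, -1.0, 1.0))  # polar angle from the third axis
    phi = np.arctan2(v[1], v[0])                     # azimuth in the plane of axes 1 and 2
    return theta, phi
```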
The relative orientation of the JD with respect to the NBD is characterized by three Euler angles, computed with respect to the reference frame of the NBD, as follows (see Figure 1—figure supplement 4):
where atan2 denotes the quadrant-checking arctangent function.
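The Euler angles can be sketched as follows, assuming the common ZYZ convention. The code extracts them from the rotation that maps the NBD inertia frame onto the JD inertia frame, using `atan2` as the quadrant-checking arctangent. The convention used in the original analysis may differ, and both function names are hypothetical. The extraction assumes that β lies strictly between 0 and π; at β = 0 or π the decomposition is degenerate (gimbal lock).

```python
import numpy as np


def zyz_matrix(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Rotation matrix R = Rz(alpha) @ Ry(beta) @ Rz(gamma)."""
    def rz(t):
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def ry(t):
        c, s = np.cos(t), np.sin(t)
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

    return rz(alpha) @ ry(beta) @ rz(gamma)


def relative_euler_zyz(axes_nbd: np.ndarray, axes_jd: np.ndarray):
    """ZYZ Euler angles of the JD inertia frame expressed in the NBD inertia frame.

    axes_*: 3x3 matrices whose columns are orthonormal, right-handed inertia axes.
    """
    R = axes_nbd.T @ axes_jd                       # JD axes written in NBD coordinates
    beta = np.arccos(np.clip(R[2, 2], -1.0, 1.0))  # R[2,2] = cos(beta)
    alpha = np.arctan2(R[1, 2], R[0, 2])           # (sin a sin b, cos a sin b)
    gamma = np.arctan2(R[2, 1], -R[2, 0])          # (sin b sin g, sin b cos g)
    return alpha, beta, gamma
```

Because only the product `axes_nbd.T @ axes_jd` enters, rotating both frames together leaves the angles unchanged.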
To perform direct-coupling analysis, we used the same sequence extraction protocol as Malinverni et al. (2015), reported hereafter:
• Initial seeds were built for both protein families, containing sequences from all kingdoms: Bacteria, Eukaryotes and Archaea.
• Hidden Markov models of the alignments were built, using the hmmbuild utility of the HMMER (version 3.1b2) (Mistry et al., 2013) suite, with default parameters.
• The union of the Swiss-Prot and TrEMBL databases (release 2015_08) was scanned against these two profiles, using the hmmsearch (Mistry et al., 2013) utility, with default parameters.
• For both retrieved MSAs, all sequences having more than 10% gapped positions were removed from the datasets.
• The Hsp70 sequences were restricted to the NBD and linker region, by trimming the C-terminal part of the alignment.
• Taxonomic identifiers for all sequences were retrieved from the NCBI taxonomy database.
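The gap filter in the fourth step can be sketched in Python as below. The sketch assumes that each aligned sequence has already been restricted to the alignment's match columns, with '-' marking gaps; insert states in hmmsearch output would need to be removed first. The dictionary input and the function name are illustrative.

```python
def filter_gapped(alignment: dict[str, str], max_gap_fraction: float = 0.10) -> dict[str, str]:
    """Keep aligned sequences whose fraction of gap characters ('-') does not exceed the limit.

    alignment: mapping from sequence identifier to aligned sequence (match columns only).
    Sequences with more than 10% gapped positions are dropped by default.
    """
    kept = {}
    for name, seq in alignment.items():
        gap_fraction = seq.count("-") / len(seq)
        if gap_fraction <= max_gap_fraction:
            kept[name] = seq
    return kept
```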
This resulted in multiple-sequence-alignments for the Hsp70 and Hsp40 families containing, respectively, 20061 and 26254 homologues, with the following taxonomic distribution:
The number of paralogs per organism varies widely (Table 2, Figure 6), with bacteria having fewer paralogs for both Hsp40 and Hsp70. All organisms were included in the full dataset, both those having multiple paralogs and organisms possessing a single copy of Hsp40-Hsp70 pairs (3701 organisms).
The resulting MSAs covered the following ranges of the E. coli DnaK/DnaJ proteins: DnaK (Uniprot ID: P0A6Y8) I4-T395, DnaJ (Uniprot ID: P08622) K3-G78. Both sequence alignments and taxonomic identifiers for the Hsp40 and Hsp70 families are available as supplementary material (Supplementary files 1–5).
Direct-coupling analysis (DCA) was performed on each of the 1000 stochastically concatenated MSAs using the asymmetric version of the pseudo-likelihood method (Ekeberg et al., 2014), with standard parameters (maximum 90% sequence identity, regularization parameters ). In practice, the parameters $J_{ij}(a,b)$ and $h_i(a)$ of the generalized Potts model (Equation 1) are numerically fitted to the data in the MSAs in the pseudo-likelihood approximation (Ekeberg et al., 2014):

$$P(\mathbf{a}) = \frac{1}{Z}\exp\Big(\sum_{i<j} J_{ij}(a_i, a_j) + \sum_{i} h_i(a_i)\Big) \qquad (1)$$

where $\mathbf{a} = (a_1, \dots, a_L)$ denotes an amino-acid sequence and $Z$ the normalizing partition function. The raw DCA scores (Equation 2), quantifying the statistical coupling strength between two positions $i$ and $j$ in the MSA, are defined as the Frobenius norm of the local 21 × 21 coupling matrices:

$$S_{ij} = \lVert J_{ij} \rVert_F = \sqrt{\sum_{a,b=1}^{21} J_{ij}(a,b)^2} \qquad (2)$$
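The sketch below illustrates Equations 1 and 2. It evaluates the unnormalized log-probability of an integer-encoded sequence under a Potts model and computes the Frobenius-norm score of every coupling block. It assumes that the couplings are stored as a NumPy array of shape (L, L, 21, 21) with `J[i, j, a, b] == J[j, i, b, a]`. Fitting the parameters by pseudo-likelihood is not shown. Many plmDCA implementations also shift the couplings to a zero-sum gauge before taking the norm, a step this sketch omits.

```python
import numpy as np

Q = 21  # 20 amino acids + gap


def potts_log_weight(seq_idx: np.ndarray, h: np.ndarray, J: np.ndarray) -> float:
    """Unnormalized log-probability: sum_i h_i(a_i) + sum_{i<j} J_ij(a_i, a_j).

    seq_idx: (L,) integer states in [0, Q); h: (L, Q); J: (L, L, Q, Q).
    """
    L = len(seq_idx)
    total = h[np.arange(L), seq_idx].sum()
    for i in range(L):
        for j in range(i + 1, L):
            total += J[i, j, seq_idx[i], seq_idx[j]]
    return float(total)


def frobenius_scores(J: np.ndarray) -> np.ndarray:
    """Raw DCA scores S_ij = ||J_ij||_F over each Q x Q block; diagonal set to 0."""
    S = np.sqrt((J ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(S, 0.0)
    return S
```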
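Finally, we apply the average product correction (APC) (Dunn et al., 2008) (Equation 3) to the raw DCA scores, to correct for a bias in position-specific mutation rates. To account for variable mutation rates in two different protein families, we follow the modification to the APC introduced by Ovchinnikov et al. (2014), taking the averages over the two protein segments independently:

$$S^{\mathrm{APC}}_{ij} = S_{ij} - \frac{\langle S\rangle_{i\cdot}\,\langle S\rangle_{\cdot j}}{\langle S\rangle} \qquad (3)$$

where $\langle S\rangle_{i\cdot}$ and $\langle S\rangle_{\cdot j}$ denote the averages over row $i$ and column $j$ of the matrix $S$, and $\langle S\rangle$ its overall average.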
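The sketch below applies Equation 3 to the inter-protein block of the score matrix. It assumes a block-wise reading of the Ovchinnikov et al. (2014) modification, in which the row, column and overall averages are computed over that block alone. The function name is hypothetical.

```python
import numpy as np


def apc_interprotein(S_inter: np.ndarray) -> np.ndarray:
    """Average product correction of an inter-protein score block (L_A x L_B).

    Row, column and overall means are taken over this block only (assumed reading).
    """
    row = S_inter.mean(axis=1, keepdims=True)  # <S>_{i.}
    col = S_inter.mean(axis=0, keepdims=True)  # <S>_{.j}
    return S_inter - row * col / S_inter.mean()
```

A constant block is mapped to zero, as expected for a purely background signal.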
To detect inter-protein coevolving residue pairs, concatenated MSAs of interacting protein sequence pairs must be built. Given the lack of knowledge on the interaction network of Hsp40s and Hsp70s and the lack of conservation of the number of paralogs throughout species, no trivial matching could be performed. Furthermore, the approach of matching interacting sequence pairs based on their genomic proximity (Feinauer et al., 2016; Hopf et al., 2014; Ovchinnikov et al., 2014) failed because of a lack of operon organization of Hsp70s and Hsp40s. We therefore employed a stochastic approach to the sequence-matching problem, which consists of the following steps:
For each organism:
Randomly select a sequence of Hsp70 and randomly match it to a single Hsp40 sequence of the same organism.
Remove these two sequences from the pool of available sequences in the current organism.
Repeat this procedure until there are no more Hsp40 or Hsp70 sequences to match in the current organism.
Repeat the procedure for all organisms possessing at least one Hsp40 and Hsp70 sequence.
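The per-organism steps above can be sketched in Python as follows. The input format (dictionaries from a hypothetical organism identifier to lists of sequence identifiers) is an assumption; shuffling both paralog lists and pairing them index by index is equivalent to the repeated random draw without replacement described in the list.

```python
import random

def random_matching(hsp70_by_org, hsp40_by_org, rng=None):
    """One stochastic realization of the Hsp70/Hsp40 pairing.

    hsp70_by_org, hsp40_by_org: dicts mapping an organism identifier to the
    list of sequence identifiers of that family (hypothetical input format).
    Each sequence is used at most once; leftover paralogs stay unmatched.
    """
    if rng is None:
        rng = random.Random()
    pairs = []
    # Only organisms with at least one Hsp70 and one Hsp40 contribute.
    for org in sorted(hsp70_by_org.keys() & hsp40_by_org.keys()):
        hsp70 = list(hsp70_by_org[org])
        hsp40 = list(hsp40_by_org[org])
        rng.shuffle(hsp70)
        rng.shuffle(hsp40)
        # zip stops when the smaller pool is exhausted.
        pairs.extend(zip(hsp70, hsp40))
    return pairs
```

Calling it with 1000 independently seeded generators, e.g. `random.Random(seed)` for `seed` in `range(1000)`, yields 1000 matched MSAs; organisms with a single copy of each family are paired identically in every realization.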
This procedure generated a stochastic realization of a matched MSA in which each sequence appears only once. Matching each sequence only once avoids a combinatorial explosion of the size of the random MSAs and, consequently, the dilution of the coevolutionary signal by an overwhelming majority of non-interacting protein pairs. We generated 1000 such random MSAs and performed DCA on each of them individually. For each DCA realization, we then extracted the strongest inter-protein coevolving pairs using a criterion introduced by Hopf et al. (2014): all inter-protein average-product corrected DCA scores $F^{\mathrm{APC}}_{ij}$ are renormalized using the minimum of these scores, the length $L$ of the MSA and the effective number of sequences $N_{\mathrm{eff}}$ in the MSA (taking into account the reweighting at a maximum of 90% identity). The minimum is taken over the inter-protein (interface) scores only. This renormalization partially corrects for the dependence of inter-protein DCA scores on the alignment width ($L$) and depth ($N_{\mathrm{eff}}$), and therefore allows an easier comparison of DCA scores across different protein families (Hopf et al., 2014). It does not change the relative ranking of inter-protein DCA contacts, as it is merely a convenient rescaling of the scores. For each of the 1000 realizations, we collected all inter-protein DCA pairs with a normalized score above 0.8. We then computed the selection frequency of each contact (Figure 3A) and retained the most frequently selected residue pairs for subsequent analysis.
The rationale behind this procedure was that contacts appearing repeatedly in multiple random realizations were robust to matching noise in the MSAs and should therefore reflect a strong underlying coevolutionary signal.
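The frequency-based selection can be illustrated with the sketch below. It assumes that the normalized inter-protein scores of each realization (computed as in Hopf et al., 2014, and not reproduced here) are available as a dictionary from residue pairs to scores; `selection_frequencies` is a hypothetical helper, and the 0.8 threshold is the one stated above.

```python
from collections import Counter

def selection_frequencies(scores_per_realization, threshold=0.8):
    """Fraction of realizations in which each residue pair is selected.

    scores_per_realization: iterable of dicts mapping (i, j) residue pairs to
    normalized inter-protein DCA scores, one dict per random matching.
    Returns a dict ordered from most to least frequently selected pair.
    """
    counts = Counter()
    n_realizations = 0
    for scores in scores_per_realization:
        n_realizations += 1
        counts.update(pair for pair, s in scores.items() if s > threshold)
    return {pair: c / n_realizations for pair, c in counts.most_common()}
```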
As mentioned above, some organisms contain only a single copy of each of Hsp40 and Hsp70. These Hsp70/Hsp40 pairs were systematically matched and added to all the randomly generated MSAs. To investigate how our results depend on this choice, we performed DCA on a reduced MSA composed solely of these single-copy organisms. In this case, only a fraction of the strongest coevolutionary signals predicted from the full dataset was recovered.
Recently, two methods that simultaneously pair interacting paralogs and predict inter-protein coevolving contacts have been developed (Bitbol et al., 2016; Gueudré et al., 2016). Both methods use iterative schemes to tackle the combined problem of inferring the paralog interaction network and identifying coevolving residues across protein interfaces. We observed that the strongest inter-protein contacts predicted by these two methods (Bitbol et al., 2016; Gueudré et al., 2016) strongly overlap with those obtained with our random-matching strategy (Figure 7—figure supplement 1). To allow comparison with other inter-protein interactions studied by DCA, we also report the empirical inter-protein score introduced by Feinauer et al. (2016) and Gueudré et al. (2016), which characterizes the overall strength of inter-protein coevolution as the average of the DCA scores of the four strongest inter-protein pairs (Frobenius norm after APC). We obtained empirical inter-protein scores of 0.11, 0.17, and 0.24 for the random matching, PPM (Gueudré et al., 2016), and IPA (Bitbol et al., 2016), respectively. For the random matching, the empirical inter-protein score is averaged over the 1000 realizations of the matching procedure. Since our interest in this work was restricted to predicting coevolving inter-protein contacts, we applied the simpler random-matching strategy described above.
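As a sketch, the empirical inter-protein score can be computed from the APC-corrected inter-protein block of a DCA score matrix (for example, the DnaK × DnaJ sub-matrix); for the random matching it is additionally averaged over realizations. The helper names are hypothetical.

```python
import numpy as np

def empirical_interprotein_score(F_apc_inter, k=4):
    """Mean of the k strongest APC-corrected inter-protein DCA scores."""
    flat = np.sort(np.ravel(F_apc_inter))
    return float(flat[-k:].mean())

def averaged_empirical_score(inter_blocks, k=4):
    """Average of the empirical score over several matching realizations."""
    return float(np.mean([empirical_interprotein_score(b, k) for b in inter_blocks]))
```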
The selection frequencies of all inter-protein contacts were computed over the 1000 realizations (Figure 3C), and the most frequently appearing contacts were selected for further analysis. To set a threshold on the number of selected contacts, we computed the solvent accessible surface area (SASA) of the residues involved in the most frequent contacts, and selected all ranked contacts preceding the first contact pair that contains a buried residue (SASA < 1 Å²). This resulted in the selection of three significantly conserved DCA-predicted contacts (Figure 3A,B,C). Note that similarly small numbers of contacts are generally considered in DCA predictions of protein-protein interactions (Hopf et al., 2014; Ovchinnikov et al., 2014).
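The SASA-based cutoff amounts to walking down the frequency-ranked contact list and stopping at the first pair that contains a buried residue. A minimal sketch, assuming per-residue SASA values (in Å²) have already been computed for DnaK and DnaJ and are stored in dictionaries keyed by residue number (a hypothetical format):

```python
def contacts_before_first_buried(ranked_pairs, sasa_dnak, sasa_dnaj, cutoff=1.0):
    """Keep ranked (DnaK residue, DnaJ residue) contacts until a buried one.

    ranked_pairs: contacts sorted by decreasing selection frequency.
    sasa_dnak, sasa_dnaj: per-residue SASA in Å^2 (hypothetical dicts).
    A residue is considered buried if its SASA is below `cutoff`.
    """
    selected = []
    for k_res, j_res in ranked_pairs:
        if sasa_dnak[k_res] < cutoff or sasa_dnaj[j_res] < cutoff:
            break  # first buried residue ends the selection
        selected.append((k_res, j_res))
    return selected
```

For an extended analysis that keeps all frequent contacts and only discards the buried ones, the `break` would be replaced by skipping the offending pair.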
To validate our conservative threshold and further assess the robustness of our results, we extended the analysis to all DCA-predicted contacts with an appearance frequency of at least 20% (Figure 7). Among the nine predicted contacts, two involve a buried residue (SASA < 1 Å², Table 3) and were discarded from further analysis. The seven remaining DCA-predicted inter-protein contacts are depicted in Figure 7B–D. Five of these seven contacts are concentrated in the binding interface observed in the CG simulations and already identified by the three strongest contacts, and the additional contacts do not qualitatively change the predictions reported in the Results section (Figure 7, Table 3). This extended analysis thus confirms the robustness of the results obtained with the stringent selection criterion and further supports the DCA-predicted binding interface.
For all simulated systems, we used the RosettaDock protocol (Chaudhury et al., 2011; Gray et al., 2003) to obtain atomistic structures from the low-resolution CG conformations corresponding to the HPD-IN and HPD-OUT binding modes. In particular, we used the multi-scale docking protocol (Chaudhury et al., 2011) to generate 1000 all-atom conformations from the CG structures at the center of each HPD-IN and HPD-OUT cluster. We then selected the 10 best-scoring structures among those within a deviation equivalent to the cluster radius (Cα RMSD ≤ 5 Å), and solvated them in dodecahedral boxes containing approximately 26,000 and 60,000 water molecules for the NBD:JD and FL:JD complexes, respectively. MD simulations were performed with the GROMACS 5 MD package (Abraham et al., 2015), using the AMBER14 force field (Case et al., 2014) and the TIP3P water model (Jorgensen et al., 1983). Given the large internal dynamics of Hsp70, we applied harmonic restraints to the backbone atoms of the Hsp70 constructs (NBD(ADP), NBD(ATP) and FL(ATP)) in the 30 ns runs to focus on the inter-protein dynamics. The 1 µs simulations used the same parameters, with the harmonic restraints removed.
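The decoy selection step can be expressed as a filter followed by a sort. This sketch assumes each RosettaDock decoy has been summarized as a `(name, score, ca_rmsd)` tuple (a hypothetical format), where lower Rosetta scores are better and the Cα RMSD is measured against the CG cluster-center structure.

```python
def select_decoys(decoys, max_rmsd=5.0, n_best=10):
    """Return the n_best lowest-scoring decoys within max_rmsd (Å).

    decoys: iterable of (name, score, ca_rmsd) tuples (hypothetical format).
    """
    within = [d for d in decoys if d[2] <= max_rmsd]
    return sorted(within, key=lambda d: d[1])[:n_best]
```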
All simulations were performed in a dodecahedral box with periodic boundary conditions. Simulations were carried out with the following protocol:
Starting structures were solvated with TIP3P water molecules and subsequently energy minimized by steepest descent.
A first NVT equilibration phase (1 ns) was performed with full restraints on all protein, ATP (when present) and Mg atoms.
A second NPT equilibration phase (1 ns) was performed with the same restraints as in the NVT equilibration phase.
Subsequently, another NPT equilibration was performed (10 ns), putting restraints on the protein backbone only (DnaK and DnaJ).
Finally, production runs were carried out for 30 ns, keeping only restraints on the DnaK backbone.
For the 1 µs simulations, we continued from the last frame of the 30 ns runs, without any restraints.
Temperature was kept constant (T = 300 K) using the v-rescale thermostat (Bussi et al., 2007), and NPT (p = 1 atm) simulations used a Parrinello-Rahman barostat (Parrinello and Rahman, 1981). The equations of motion were integrated with a time step of 2 fs. All covalent bonds were constrained to their equilibrium lengths using the LINCS algorithm (Hess et al., 1997). Electrostatic interactions were computed with the Particle Mesh Ewald algorithm, and a cutoff of 10 Å (1 nm) was used both for the Lennard-Jones interactions and for the real-space Coulomb contribution.
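To make the staged protocol concrete, the sketch below generates GROMACS `.mdp` text for each stage. The option names are standard GROMACS `.mdp` keys, but several values are assumptions not stated in the text: the coupling constants `tau_t` and `tau_p`, the compressibility, the use of NPT for the production runs, a 1.0 nm (10 Å) real-space cutoff, and the `-DPOSRES_*` restraint names, which are hypothetical and would have to match `#ifdef` blocks in the topology.

```python
DT_PS = 0.002  # 2 fs integration time step, in ps

COMMON = {
    "integrator": "md",
    "dt": DT_PS,
    "constraints": "all-bonds",        # constrain all covalent bonds
    "constraint_algorithm": "lincs",
    "coulombtype": "PME",
    "rcoulomb": 1.0,                   # nm, assumed real-space cutoff
    "rvdw": 1.0,                       # nm, same cutoff for Lennard-Jones
    "tcoupl": "V-rescale",
    "tc-grps": "System",
    "tau_t": 0.1,                      # ps, assumption
    "ref_t": 300,                      # K
}

NPT = {
    "pcoupl": "Parrinello-Rahman",
    "pcoupltype": "isotropic",
    "tau_p": 2.0,                      # ps, assumption
    "ref_p": 1.01325,                  # bar (1 atm)
    "compressibility": 4.5e-5,         # bar^-1, assumption (water)
}

# (name, length in ns, NPT?, hypothetical position-restraint define)
STAGES = [
    ("nvt_eq", 1, False, "-DPOSRES_ALL"),        # proteins, ATP and Mg
    ("npt_eq1", 1, True, "-DPOSRES_ALL"),
    ("npt_eq2", 10, True, "-DPOSRES_BACKBONE"),  # DnaK and DnaJ backbone
    ("prod_30ns", 30, True, "-DPOSRES_DNAK_BB"), # DnaK backbone only
    ("prod_1us", 1000, True, None),              # no restraints
]

def nsteps(length_ns):
    """Number of integration steps for a run of length_ns nanoseconds."""
    return round(length_ns * 1000 / DT_PS)

def mdp_text(length_ns, npt, define):
    """Assemble the .mdp text for one stage of the protocol."""
    opts = dict(COMMON, nsteps=nsteps(length_ns))
    opts.update(NPT if npt else {"pcoupl": "no"})
    if define:
        opts["define"] = define
    return "\n".join(f"{key} = {value}" for key, value in opts.items()) + "\n"
```

Each entry of `STAGES` can then be written to its own `.mdp` file and passed to `gmx grompp` together with a topology containing the corresponding restraint blocks.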
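The distance root mean square (dRMS) was calculated as

$$\mathrm{dRMS}(t) = \sqrt{\frac{1}{N_{\mathrm{JD}}\,N_{\mathrm{K}}}\sum_{i \in \mathrm{JD}}\;\sum_{j \in \mathrm{DnaK}} \left(d_{ij}(t) - d_{ij}(0)\right)^2}$$

where $i$ (resp. $j$) are indices of the residues belonging to the J-domain (resp. DnaK), $N_{\mathrm{JD}}$ and $N_{\mathrm{K}}$ are the corresponding numbers of residues, and $d_{ij}(t)$ denotes the distance between the atoms of residue $i$ of the J-domain and residue $j$ of DnaK at time $t$, with $t = 0$ the reference (starting) structure. The dRMS values are then time-averaged over the last 10 ns of the MD trajectories (results reported in Table 1).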
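A minimal NumPy sketch of the dRMS, assuming the coordinates of the atoms used for each residue are available as `(n, 3)` arrays per frame and that the first frame serves as the reference structure; both the reference choice and the function names are assumptions.

```python
import numpy as np

def pair_distances(a, b):
    """All pairwise distances between (n, 3) and (m, 3) coordinate arrays."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def drms(jd_t, dnak_t, jd_ref, dnak_ref):
    """dRMS between J-domain/DnaK distances at time t and in the reference."""
    diff = pair_distances(jd_t, dnak_t) - pair_distances(jd_ref, dnak_ref)
    return float(np.sqrt(np.mean(diff ** 2)))

def time_averaged_drms(jd_traj, dnak_traj, n_last):
    """Average dRMS over the last n_last frames, with frame 0 as reference."""
    n_frames = len(jd_traj)
    values = [drms(jd_traj[f], dnak_traj[f], jd_traj[0], dnak_traj[0])
              for f in range(n_frames - n_last, n_frames)]
    return float(np.mean(values))
```

Because the dRMS depends only on inter-protein distances, it is insensitive to rigid-body motion of the whole complex and needs no prior superposition.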
Similarly, the angular stability ($\theta$ in the text, see Table 1) was computed as

$$\theta(t) = \arccos\left(\mathbf{u}(t)\cdot\mathbf{u}(0)\right)$$

where $\mathbf{u}(t)$ denotes the unit vector along the inertia axis of the JD associated with the largest moment of inertia (see Figure 1—figure supplement 4), computed after aligning the DnaK NBD in all frames. The values reported in Table 1 are averaged over the last 10 ns of the MD trajectories.
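The angular stability can be sketched as follows, assuming the J-domain coordinates have already been superposed on the DnaK NBD in every frame. The axis is the eigenvector of the inertia tensor with the largest eigenvalue; because eigenvectors are defined only up to sign, the sketch takes the absolute value of the dot product, which restricts the angle to 0–90°. Whether the original analysis handled the sign this way is not stated, so treat this as an assumption.

```python
import numpy as np

def largest_inertia_axis(xyz, masses=None):
    """Unit vector along the principal axis with the largest moment of inertia."""
    xyz = np.asarray(xyz, dtype=float)
    m = np.ones(len(xyz)) if masses is None else np.asarray(masses, dtype=float)
    r = xyz - np.average(xyz, axis=0, weights=m)  # center-of-mass frame
    # Inertia tensor: I = sum_k m_k (|r_k|^2 * Id - r_k r_k^T)
    inertia = np.eye(3) * np.sum(m * np.sum(r ** 2, axis=1)) - (r * m[:, None]).T @ r
    _, vecs = np.linalg.eigh(inertia)  # eigenvalues in ascending order
    return vecs[:, -1]

def axis_angle_deg(u_t, u_ref):
    """Angle in degrees between two axes; abs() removes the eigenvector sign."""
    cos = abs(float(np.dot(u_t, u_ref)))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))
```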
The binding energies of the JD/DnaK complexes were calculated with the MM-GBSA method implemented in AmberTools (Case et al., 2014). The polar contribution to the solvation energy was calculated using the Generalized Born approximation with a counterion concentration of 0.01 M in solution. The SASA was computed using the LCPO method.
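For reference, the standard MM-GBSA decomposition is summarized below. This is a schematic, assuming the common single-trajectory setup in which complex, DnaK and J-domain energies are evaluated on the same complex frames and the solute entropy term is neglected; the text does not state these details.

$$\Delta G_{\mathrm{bind}} \approx \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{DnaK}} \rangle - \langle G_{\mathrm{JD}} \rangle$$

$$G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}}, \qquad G_{\mathrm{SA}} = \gamma\,\mathrm{SASA} + b$$

Here $E_{\mathrm{MM}}$ is the gas-phase molecular-mechanics energy (bonded, van der Waals and electrostatic terms), $G_{\mathrm{GB}}$ the polar solvation energy from the Generalized Born model, and $G_{\mathrm{SA}}$ the nonpolar solvation energy, proportional to the LCPO surface area with surface-tension coefficient $\gamma$ and offset $b$; angle brackets denote averages over frames.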
Amber 14. San Francisco: University of California.
The nature of the accessible and buried surfaces in proteins. Journal of Molecular Biology 105:1–12. https://doi.org/10.1016/0022-2836(76)90191-1
Peptide folding: when simulation meets experiment. Angewandte Chemie International Edition 38:236–240. https://doi.org/10.1002/(SICI)1521-3773(19990115)38:1/2<236::AID-ANIE236>3.0.CO;2-M
Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. Journal of Computational Physics 276:341–356. https://doi.org/10.1016/j.jcp.2014.07.024
Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. Journal of Molecular Biology 331:281–299. https://doi.org/10.1016/S0022-2836(03)00670-3
LINCS: a linear constraint solver for molecular simulations. Journal of Computational Chemistry 18:1463–1472. https://doi.org/10.1002/(SICI)1096-987X(199709)18:12<1463::AID-JCC4>3.0.CO;2-H
Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics 79:926. https://doi.org/10.1063/1.445869
The HSP70 chaperone machinery: J proteins as drivers of functional specificity. Nature Reviews Molecular Cell Biology 11:579–592. https://doi.org/10.1038/nrm2941
Nucleotide-dependent interactions within a specialized Hsp70/Hsp40 complex involved in Fe-S cluster biogenesis. Journal of the American Chemical Society 136:11586–11589. https://doi.org/10.1021/ja5055252
Coarse-grained models for simulations of multiprotein complexes: application to ubiquitin binding. Journal of Molecular Biology 375:1416–1433. https://doi.org/10.1016/j.jmb.2007.11.063
Pathways of allosteric regulation in Hsp70 chaperones. Nature Communications 6:8308. https://doi.org/10.1038/ncomms9308
Interaction of E. coli Hsp90 with DnaK involves the DnaJ binding region of DnaK. Journal of Molecular Biology 429:858–872. https://doi.org/10.1016/j.jmb.2016.12.014
Structure and energetics of an allele-specific genetic interaction between dnaJ and dnaK: correlation of nuclear magnetic resonance chemical shift perturbations in the J-domain of Hsp40/DnaJ with binding affinity for the ATPase domain of Hsp70/DnaK. Biochemistry 42:4926–4936. https://doi.org/10.1021/bi027070y
Targeting allosteric control mechanisms in heat shock protein 70 (Hsp70). Current Topics in Medicinal Chemistry 16:2729–2740. https://doi.org/10.2174/1568026616666160413140911
Protein structure prediction from sequence variation. Nature Biotechnology 30:1072–1080. https://doi.org/10.1038/nbt.2419
Investigation of the interaction between DnaK and DnaJ by surface plasmon resonance spectroscopy. Journal of Molecular Biology 289:1131–1144. https://doi.org/10.1006/jmbi.1999.2844
Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Research 41:e121. https://doi.org/10.1093/nar/gkt263
Polymorphic transitions in single crystals: a new molecular dynamics method. Journal of Applied Physics 52:7182–7190. https://doi.org/10.1063/1.328693
NMR structure of the J-domain and the Gly/Phe-rich region of the Escherichia coli DnaJ chaperone. Journal of Molecular Biology 260:236–250. https://doi.org/10.1006/jmbi.1996.0395
Allosteric opening of the polypeptide-binding site when an Hsp70 binds ATP. Nature Structural & Molecular Biology 20:900–907. https://doi.org/10.1038/nsmb.2583
Metazoan Hsp70 machines use Hsp110 to power protein disaggregation. The EMBO Journal 31:4221–4235. https://doi.org/10.1038/emboj.2012.264
The kinetic parameters and energy cost of the Hsp70 chaperone as a polypeptide unfoldase. Nature Chemical Biology 6:914–920. https://doi.org/10.1038/nchembio.455
Structural features required for the interaction of the Hsp70 molecular chaperone DnaK with its cochaperone DnaJ. Journal of Biological Chemistry 274:30534–30539. https://doi.org/10.1074/jbc.274.43.30534
A conserved HPD sequence of the J-domain is necessary for YDJ1 stimulation of Hsp70 ATPase activity at a site distinct from substrate binding. The Journal of Biological Chemistry 271:9347–9354. https://doi.org/10.1074/jbc.271.16.9347
Allosteric regulation of Hsp70 chaperones involves a conserved interdomain linker. Journal of Biological Chemistry 281:38705–38711. https://doi.org/10.1074/jbc.M609020200
Axel T Brunger, Reviewing Editor; Stanford University Medical Center, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Modeling Hsp70/Hsp40 interaction by multi-scale molecular simulations and co-evolutionary sequence analysis" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Arup Chakraborty as the Senior Editor. The reviewers have opted to remain anonymous.
Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, major revisions are required before the manuscript can be considered further. Please note that there is no guarantee for acceptance.
Your paper aims to establish an integrative bioinformatic pipeline, bringing together co-evolutionary modeling with molecular simulations based on both coarse-grained and atomistic models. There is considerable discussion in the field regarding the relevant interface between DnaK/DnaJ and homologous Hsp70/Hsp40 complexes. Crystal structures and NMR experiments have produced conflicting results. The simulations seem to generally support the conclusions drawn from the NMR studies of the DnaK/DnaJ complex. However, there is no further experimental validation of the computational results, and the study currently lacks new insights into the functional roles of the J protein/Hsp70 interaction. One reviewer commented that she/he could not reproduce the co-evolutionary analyses (although we do not have the MSA). Many other technical points noted below need to be addressed.
1) While we find the molecular simulation part very convincing, we are a bit perplexed, possibly due to lack of clarity in the manuscript, about the co-evolutionary analysis. In particular we failed to reproduce some of the results presented. Lacking the multiple sequence alignment (MSA) of the two protein families, we are not sure that we followed the same pipeline as you did. The MSA must be deposited and the pipeline used clearly described. Also, please provide more information about the alignments of the two proteins. How many species are included? What are the statistics of paralogs? How many species with unique copies of both proteins exist in the alignment (how are they correctly matched)? Are there cases of proteins coded in operons or of certified interaction which can be imposed in the matching? You cite Malinverni et al., 2015 for how the MSA was obtained, but this reference does not provide sufficient detail.
2) When you note: "We built two separate seeds containing Hsp70 and Hsp40 sequences, covering a broad portion of the tree of life" did you include sequences other than those for Prokaryotes? If you did so, how could you justify this inclusion as it well known, and also acknowledged by you in the fifth paragraph of the Introduction, that the Bacterial system seems to be incompatible with the Eukaryotic one?
3) Apart from the seed, it is also not clear if eventually non-bacterial sequences were removed from the final MSA. If you did not do so, it would be very important to present the same analysis only on bacterial sequences and discuss differences in the inference (if any).
On a related note, it would be interesting to verify your claim that the statistics over repeated random matchings is informative, by applying your method to PPI treated in past work where the operon-based matching is known. This will provide insights into the generalizability of the approach to other protein systems lacking operons. The same problem has been recently addressed by two papers by Bitbol et al. and by Gueudre et al. in PNAS (2016). Both propose rather involved matching schemes. Would the application of your methods improve the results? In case these papers provide their codes, this would be an easy analysis to be added. The paper has applied the selection criteria of Hopf et al., 2014 with a cutoff of 0.8. The quantity subject to this cutoff is not mentioned. Why should the cutoff established for operonic matchings be applicable to random matchings? We are rather surprised (positively) that a random matching produces similarly strong signal.
4) The method used to match the two protein families is interesting but recently two other methods have been published tackling the issue of concatenating MSAs with a new computational approach: Thomas Gueudré et al. "Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis", vol. 113 no. 43, 12186-12191, doi:10.1073/pnas.1607570113, Published online before print October 11, 2016. Anne-Florence Bitbol, Robert S. Dwyer, Lucy J. Colwell, and Ned S. Wingreen Inferring interaction partners from protein sequences PNAS 2016 113 (43) 12180-12185; published ahead of print September 23, 2016, doi:10.1073/pnas.1606762113. We encourage you to try one or both of the methods mentioned to verify the robustness of the random matching.
5) Together with the selection criterion introduced in Hopf et al., 2014, it could be interesting to show in general a histogram across the 1000 stochastically concatenated MSAs of the original residue-residue score (Ekeberg, Hartonen and Aurell, 2014) in order to figure out whether the score is doing more than largely producing top-scoring pairs. Similarly, how many protein pairs typically have a score larger than 0.8?
6) The long all-atom MD runs are only performed for the HPD-IN conformation. Would runs on the HPD-OUT conformation confirm the preference for the IN conformation over the OUT conformation?
7) A critically important overarching question is what have we learned from this study that was not known previously? The interaction site on DnaK is not unexpected, the region on the J-domain, especially the HPD, was known to be key to this interaction, and the dynamic nature of the complex was expected. Thus, the work, while laudable, in its current form, does not move the field significantly forward. You need to attempt to address some of the functional underlying questions: how do J-proteins modulate Hsp70s to affect their allosteric cycle? What is the role of the diversity of J-proteins and how involved is the SBD?
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Modeling Hsp70/Hsp40 interaction by multi-scale molecular simulations and co-evolutionary sequence analysis" for further consideration at eLife. Your revised article has been favorably evaluated by Arup Chakraborty (Senior Editor), a Reviewing Editor, and three reviewers.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
1) In previous literature (namely Feinauer et al., 2016; Gueudré et al., 2016), an empirical coevolutionary score for protein pairs was introduced: starting from the Average Product Corrected (APC) inter-protein residue pair score (i.e. the restriction of the APC coupling score to all pairs of residues i, j for which i belongs to one protein and j to the second one), and considering the mean over the 4 largest. It would be extremely interesting to show this score for the random matching strategy and also for PPM and IPA, to compare the "strength" of the DnaJ/DnaK coupling with that of other known protein pairs presented in the literature.
2) While it becomes clear from the manuscript and from the authors reply that the central interest is in the Hsp70/Hsp40 system and not in the development of a general-purpose methodology, two results reported in the rebuttal letter but not in the manuscript or its supplement should be reported in the supplement, with a short reference from the new last paragraph of the subsection “3. Random Paralog matching”:
The random matching procedure is successful even if applied to the two component system used in Bitbol et al., 2016 and Gueudré et al., 2016.
The matching procedure of Bitbol et al., 2016 and Gueudré et al., 2016 produces strongly overlapping results with the random-matching procedure.
3) A minor remark concerns the results of the 3701 always matched protein pairs, where both sequences are single copy in their genomes. It is interesting that this case recovers part of the strongest signal, but the sampling of random matchings is able to enhance the coevolutionary signal beyond the one found for the uniquely matched pairs. Again, this is a nice detail, it might be introduced at the beginning of the Methods section on random matching, or at the end of Sec. II.B. “Coevolutionary Analysis predicts conserved DnaK-DnaJ contacts”.
4) HMMer should be cited in the first paragraph of the subsection “1. Sequence Extraction and Preprocessing”.
https://doi.org/10.7554/eLife.23471.039
- Duccio Malinverni
- Paolo De Los Rios
- Alfredo Jost Lopez
- Gerhard Hummer
- Alessandro Barducci
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
DM thanks the Swiss National Science Foundation (http://www.snf.ch/) for grants 2012_149278 and 20020_163042/1. AJL and GH were supported by the Max Planck Society. AB acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-14-ACHN-0016. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project s684.
© 2017, Malinverni et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.