1. Biophysics and Structural Biology
  2. Computational and Systems Biology
Download icon

Modeling Hsp70/Hsp40 interaction by multi-scale molecular simulations and coevolutionary sequence analysis

  1. Duccio Malinverni
  2. Alfredo Jost Lopez
  3. Paolo De Los Rios
  4. Gerhard Hummer
  5. Alessandro Barducci Is a corresponding author
  1. Faculté de Sciences de Base, École Polytechnique Fédérale de Lausanne, Switzerland
  2. Max Planck Institute of Biophysics, Germany
  3. École Polytechnique Fédérale de Lausanne, Switzerland
  4. Institut für Biophysik, Johann Wolfgang Goethe Universität Frankfurt, Germany
  5. Inserm, U1054, France
  6. Université de Montpellier, CNRS, UMR 5048, Centre de Biochimie Structurale, France
Research Article
Cited
1
Views
863
Comments
0
Cite as: eLife 2017;6:e23471 doi: 10.7554/eLife.23471

Abstract

The interaction between the Heat Shock Proteins 70 and 40 is at the core of the ATPase regulation of the chaperone machinery that maintains protein homeostasis. However, the structural details of the interaction remain elusive and contrasting models have been proposed for the transient Hsp70/Hsp40 complexes. Here we combine molecular simulations based on both coarse-grained and atomistic models with coevolutionary sequence analysis to shed light on this problem by focusing on the bacterial DnaK/DnaJ system. The integration of these complementary approaches resulted in a novel structural model that rationalizes previous experimental observations. We identify an evolutionarily conserved interaction surface formed by helix II of the DnaJ J-domain and a structurally contiguous region of DnaK, involving lobe IIA of the nucleotide binding domain, the inter-domain linker, and the β-basket of the substrate binding domain.

https://doi.org/10.7554/eLife.23471.001

Introduction

The 70 kDa and 40 kDa Heat Shock Proteins (Hsp70/Hsp40) form the core of a chaperone machinery that plays essential roles in proteostasis and proteolytic pathways (Daugaard et al., 2007; Hartl et al., 2011; Mayer, 2013). Hsp70 chaperones, and their cochaperone partners Hsp40, are highly conserved ubiquitous proteins, present in multiple paralogs in virtually all known organisms (Daugaard et al., 2007; Kampinga and Craig, 2010). The chaperoning role of this machinery is based on the ability of Hsp70s to bind client proteins in non-native states, thereby preventing and reverting aggregation, unfolding misfolded proteins, assisting protein degradation and translocation (De Los Rios et al., 2006; Proctor and Lorimer, 2011; Rampelt et al., 2012; Sharma et al., 2010).

Members of the Hsp70 family are composed of two domains, connected by a flexible linker: the N-terminal nucleotide binding domain (NBD) binds and hydrolyzes ATP, whereas the C-terminal substrate binding domain (SBD) interacts with client proteins (Zuiderweg et al., 2013). The nature of the bound nucleotide induces dramatically different conformations of Hsp70: in the ADP-bound state, the two domains are mostly detached and behave almost independently (Bertelsen et al., 2009), whereas in the ATP-bound state, the SBD splits into two sub-domains that dock onto the NBD (Kityk et al., 2012; Qi et al., 2013). Therefore, nucleotide hydrolysis and exchange induce large-scale conformational dynamics that regulate the chaperone interaction with client proteins (Mayer, 2013).

Hsp40s are also called J-Domain Proteins, as they are invariantly characterized by the presence of a 70 residue signature domain (J-domain), within a variable multi-domain architecture. This J-domain is composed of four helices (Figure 1A). The two central helices II and III form an antiparallel bundle, connected by a flexible loop with a highly conserved distinctive histidine-proline-aspartate (HPD) motif (Pellecchia et al., 1996). Several studies have indicated the essential role of the J-domain and of the HPD motif in Hsp40/Hsp70 interactions (Greene et al., 1998; Mayer et al., 1999; Suh et al., 1998; Tsai and Douglas, 1996). While the structural diversity of Hsp40s mirrors the functional versatility of this complex machinery, the common conserved J-domain is strictly necessary for enhancing ATP hydrolysis by Hsp70 (Kampinga and Craig, 2010). Modulation of Hsp70 ATPase activity through formation of transient Hsp70/Hsp40/client complexes regulates the chaperone affinity for client proteins (De Los Rios and Barducci, 2014; Kellner et al., 2014) and is hence essential for all its multiple cellular functions. Continuous switching between multiple conformations means that the chaperone-cochaperone interaction is intrinsically highly dynamic. Understanding the complex interplay between Hsp70 and Hsp40 at the mechanistic level is therefore a crucial task to gain a deeper functional insight into the chaperone machinery (Mapa et al., 2010).

Figure 1 with 4 supplements see all
Binding modes of DnaK/DnaJ.

(A) Probabilities of JD residues to be in contact with the DnaK (NBD(ATP) case). Helices I-IV of the JD are highlighted. See (Figure 1—figure supplement 1) for the NBD(ADP) and FL(ATP) cases. (B) Probabilities of DnaK residues to be in contact with the JD (NBD(ATP) case). Blue: low, Red: high, see scalebar. (C,E,G) Orientational free energy as a function of the spherical polar angles (Φcom,Ωcom) of the JD center of mass for (A) NBD(ADP), (C) NBD(ATP), (E) FL(ATP). The origin and the reference axes are defined by NBD center of mass and inertia axes, respectively. (D,F,H) The free energy surface as a function of Euler angles (Ω,Ψ) defining the relative orientation of J-domain w.r.t. the NBD for (B) NBD(ADP), (D) NBD(ATP), (F) FL(ATP). See Materials and methods and Figure 1—figure supplement 4 for details on the angular definitions. Iso-lines are drawn at 1 kBT free-energy intervals. See (Figure 1—figure supplement 2) for the third Euler angle.

https://doi.org/10.7554/eLife.23471.002

Extensive experimental evidence of Hsp70/40 interactions has been accumulated over the last two decades (Alderson et al., 2016). Mutagenesis, surface plasmon resonance and NMR experiments have identified multiple putative interacting regions of the J-domain and Hsp70, mostly focusing on the E. coli DnaK/DnaJ system (Ahmad et al., 2011; Genevaux et al., 2002; Greene et al., 1998; Suh et al., 1998, 1999). In spite of this considerable effort, the dynamic and transient nature of the Hsp70/Hsp40 complex has posed severe challenges to its structural characterization and no consensus view has yet been reached on this.

To date, the only available high-resolution structure has been obtained by means of X-ray crystallography of the NBD of bovine Hsc70 (Hsp70) covalently linked to the J-domain of bovine auxilin (Hsp40) (Jiang et al., 2007). However, this structure cannot be easily reconciled with NMR and mutagenesis data collected on DnaK/DnaJ, thus suggesting either major differences in the binding modes of bacterial and eukaryotic Hps70/40s or the trapping of a sparsely populated state that is influenced by non-native contacts (Sousa et al., 2012; Zuiderweg and Ahmad, 2012). More recently, solution PRE-NMR experiments identified an alternative highly dynamic interface between ADP-bound DnaK and DnaJ (Ahmad et al., 2011).

Here we relied on both multi-scale molecular modeling and statistical analysis of protein sequences to shed light on the Hsp70/Hsp40 interactions. By combining these complementary techniques, we propose a structural model of the binding of bacterial DnaK/DnaJ, which is in good agreement with available experimental data and greatly extends our understanding of this elusive yet fundamental process.

Results

Coarse-grained simulations identify binding regions and suggest structural models for the DnaK/DnaJ complex

We characterized DnaK/DnaJ interactions by means of Monte Carlo simulations based on a coarse-grained (CG) potential energy validated against structural and thermodynamic properties of protein complexes with low binding affinity (Kim and Hummer, 2008; Kim et al., 2008; Różycki et al., 2011). Binding partners are modeled as rigid bodies, using one interaction site per amino-acid (residue) located at the Cα position of the experimental structure. Intermolecular energy functions are based on statistical contact potentials and long-range Debye-Hückel electrostatic interactions (Kim and Hummer, 2008). A replica exchange Monte Carlo simulation protocol is adopted to exhaustively sample all relevant bound conformations. We took advantage of this approach and of the availability of high-resolution structures of the individual binding partners to investigate complexes formed by the J-domain of E. coli DnaJ (JD) with the DnaK NBD, both in its ADP- and ATP-bound conformations (NBD(ADP), NBD(ATP)). Moreover, we extended this analysis to full-length ATP-bound DnaK (FL(ATP)) in order to unveil a possible role of the SBD in the binding process.

CG trajectories were analyzed to determine the binding affinity of the three DnaK constructs for JD and to characterize the most favorable complex conformations. Calculated binding affinities (KD = 540 μM ± 60 NBD(ADP), KD = 370 μM ± 35 NBD(ATP), KD = 23 μM ± 3 FL(ATP)) are compatible with previous experimental determinations (Ahmad et al., 2011; Greene et al., 1998; Wittung-Stafshede et al., 2003), and their significant dependence on the presence of the SBD and linker suggests a stabilizing role of this region in the DnaK/JD complex. The analysis of the conformational ensembles corresponding to bound complexes revealed several distinctive features of the DnaK/JD binding process, along with a certain degree of conformational heterogeneity (Figure 1—figure supplement 3). The free energy surfaces as a function of NBD-centered spherical coordinates (Figure 1C,E,G) clearly indicate that a specific binding site predominates in all the simulated DnaK constructs, irrespective of the bound nucleotide and of the presence of the SBD and linker.

To better characterize this favored binding interface, we calculated the probability of each DnaK residue to be in direct contact with JD and mapped it onto the NBD structure (Figure 1B). These results suggest that the formation of DnaK/DnaJ complexes mostly involves a DnaK region located on lobe IIA of the NBD, in a negatively charged narrow groove formed by a beta-sheet and a short loop (Figures 1B and 2A,B). The complementary interface on JD suggests that its interaction with DnaK is mostly mediated by the positively charged helix II and few residues on helix I (Figure 1A).

Figure 2 with 1 supplement see all
Conformations of the DnaK:JD complex.

Most representative HPD-OUT/IN conformations for the FL(ATP):JD system (HPD-OUT: (A) and (C), HPD-IN: (B) and (D)). The four lobes forming the sub-structures of the NBD are highlighted. The SBD is in brown. The docked inter-domain linker is in purple. The HPD tripeptide of the J-domain is depicted as magenta spheres. The JD is depicted in blue (HPD-OUT) or red (HPD-IN). See (Figure 2—figure supplement 1) for the NBD(ADP) and NBD(ATP) systems. For readability, the NBD in the rotated panels C) and D) is uniformly colored in green.

https://doi.org/10.7554/eLife.23471.007

We further analyzed the conformational ensembles obtained by CG simulations to identify the most relevant orientations of the JD in the bound complexes. The free energy surfaces as a function of Euler angles measuring the relative orientation of the binding partners reveal two major conformational sub-ensembles in all the simulated systems (Figure 1D,F,H, Figure 1—figure supplement 2). Cluster analysis of the CG trajectories suggests that these distinct intermolecular arrangements correspond to complexes with similar binding interfaces but opposite orientations of the JD with respect to DnaK (Figure 2, Figure 2—figure supplement 1). The conserved HPD loop of the JD points outwards in one conformational sub-ensemble (HPD-OUT, Figure 2A,C), whereas in the other it is close to the groove on the NBD where the inter-domain linker docks (HPD-IN, Figure 2B,D). These two arrangements were observed in all the simulated systems with very limited perturbations because of the presence of SBD, and together they account for more than 91% of the populations in the bound ensembles. In all the simulated systems, the population of the HPD-OUT conformation is higher than that of HPD-IN. However, the observed free energy differences between the two binding modes (1.5 kcal/mol) are comparable with the expected uncertainty of the CG model (Kim and Hummer, 2008).

Coevolutionary analysis predicts conserved DnaK-DnaJ contacts

Statistical analysis of the covariation in multiple sequence alignments (MSAs) represents an extremely valuable approach to investigate protein structure by identifying residue-residue interactions that are evolutionarily conserved (de Juan et al., 2013; Marks et al., 2012). Particularly, direct coupling analysis (DCA) (Morcos et al., 2011; Weigt et al., 2009) of paired MSAs of interacting proteins has been successfully applied to predict interfaces of protein complexes (Hopf et al., 2014; Ovchinnikov et al., 2014). The canonical matching algorithms for generating paired MSAs are based on intergenic distances and, unfortunately, they cannot be directly applied to the Hsp70/40 interaction, because of promiscuous interactions that cannot strictly be predicted by operon structure in this family. To circumvent this difficulty, we adopted an alternative approach based on generation of an ensemble of stochastically matched MSAs. In this context, the statistical reliability of inter-residue couplings can be related to their frequency of appearance within the DCA predictions obtained from all the realizations of the MSA ensemble (see Materials and methods for details).

We took advantage of the large sizes of the Hsp70/40 families (Hsp70: 20061 sequences, Hsp40: 26254 sequences, distributed in all kingdoms, see Materials and methods) to evaluate inter-residue evolutionary couplings between the Hsp40 JD and the Hsp70 NBD. The high degree of conservation of these domains guarantees high-quality alignments and thus accurate results. This analysis identified three inter-protein residue pairs that stand out among coevolving pairs in the Hsp40 and Hsp70 families (Figure 3A), corresponding to N187-K23, D208-K26 and T189-R19 in E. coli DnaK and DnaJ. The spatial proximity of N187/D208/T189 on the DnaK NBD, and the proximity between K23/K26/R19 on the JD, suggest the presence of well-defined evolutionarily conserved binding patches across Hsp40/70 families.

Figure 3 with 1 supplement see all
Contacts from coevolutionary analysis.

(A) Frequency of appearance of the coevolutionary inter-protein contacts (see Materials and methods). The three most frequent contacts are highlighted (N187-K23 magenta, D208-K26 orange and T189-R19 blue. Numbering refers to the E. coli DnaK-DnaJ (Uniprot IDs: DnaK P0A6Y8, DnaJ P08622)). (B–C) The same three contacts represented on the HPD-OUT (blue, panel B) and HPD-IN (red, panel C) conformations of the NBD(ATP):JD complex. Coevolving residues are depicted by spheres, following the color scheme of panel A. Gray spheres represent D35 of the 33HPD35 motif on JD and R167 of the NBD. (See Materials and methods and Figure 7 for an extended DCA analysis and validation.)

https://doi.org/10.7554/eLife.23471.009

Remarkably, these patches are perfectly overlapping with the binding regions predicted by CG modeling, that is, helix II in DnaJ and a sub-region of lobe IIA in DnaK NBD. Thus, DCA predictions can be used to evaluate the HPD-IN and HPD-OUT binding modes suggested by CG simulations (Figure 3B,C). Quantitative assessment is limited by several factors such as the difficulty of translating coevolutionary couplings into exact distance restraints, the limited resolution of the residue-based CG model and the dynamic nature of the JD/DnaK complexes. Nevertheless, while both the intermolecular conformations might be compatible with DCA predictions, the specific binding pattern predicted by coevolutionary analysis matches significantly better the JD orientation observed in HPD-IN (Figure 3C). Moreover in this conformation, the location of D35 in the HPD motif is compatible with a putative interaction with R167 on DnaK NBD, as previously suggested by mutagenesis experiments (Suh et al., 1998). The overall better agreement of the HPD-IN conformation can be further strengthened by the observation that the average Cα-Cα distance observed in HPD-OUT for the pair T189-R19 seems too large (>20Å) to justify a direct interaction, and thus a strong statistical coupling between those residues.

We repeated the DCA analysis on two subsets restricted to either bacterial or eukaryotic sequences, to further investigate the origin of the detected coevolutionary signal. The results obtained on the bacterial subset (Figure 3—figure supplement 1A) are in perfect agreement with the results obtained on the full dataset, whereas no strong coevolutionary couplings are detected using the eukaryotic subset (Figure 3—figure supplement 1B). This observation indicates that the coevolutionary signal of the observed Hsp70/Hsp40 interface mostly originates from the bacterial sequences in the dataset.

Atomistic MD simulations highlight a dynamical interface

We performed atomistic explicit-solvent molecular dynamics (MD) simulations to investigate the stability and the dynamics of the DnaK/DnaJ docked conformations obtained with CG modeling.

Firstly, we assessed the reliability of the DnaK/DnaJ complexes by performing 10 MD runs of 30 ns for each system (JD:NBD(ADP), JD:NBD(ATP) and JD:FL(ATP)) in both HPD-IN and HPD-OUT binding modes (see Materials and methods). These simulations showed a certain degree of conformational dynamics in all cases (Figure 4—figure supplement 1), and provided information about the relative stability of the various complexes. Particularly, the Cα distance root mean square deviation (dRMS) and the average angular deviation Θ of the JD with respect to the starting frame (Table 1, Figure 4—figure supplements 23) revealed that NBD(ADP):JD and NBD(ATP):JD complexes in the HPD-OUT conformations displayed high structural variability in contrast with HPD-IN conformations, which were significantly more stable on this time scale. This difference was significantly less pronounced in the simulations of full-length DnaK, likely because of the stabilizing effect of JD-SBD interactions.

Table 1

Stability analysis of atomistic MD. dRMS is the distance root mean square deviation between Cα atoms of the JD and DnaK . Θ is the angle formed by the principal axis of the JD with respect to the principal axis of the JD in the starting frame. Standard deviations over the 10 MD trajectories are reported in parentheses. See Materials and methods for details.

https://doi.org/10.7554/eLife.23471.011
dRMS [Å]Θ [°]
NBD(ADP)HPD-OUT9.2(2.1)46.9(15.8)
HPD-IN3.7(0.9)15.6(6.1)
NBD(ATP)HPD-OUT12.3(5.0)40.9(17.4)
HPD-IN6.1(3.0)19.9(12.9)
FL(ATP)HPD-OUT6.2(2.7)18.6(8.5)
HPD-IN5.2(2.0)21.8(5.1)

We then focused on the FL(ATP):JD complex in the HPD-IN conformation, which stands out among the other systems as it involves full-length DnaK and is compatible with coevolutionary and mutagenesis data. Particularly, we performed three MD simulations of 1 μs to better probe its conformational dynamics. The results confirmed the stability of the HPD-IN arrangement on a more extended time scale but unveiled the presence of multiple, distinct conformational states within this overall binding mode (Figure 4A). While an exhaustive characterization of the conformational space exceeds the capabilities of all-atom MD, the broad structural ensembles are suggestive of a significant degree of conformational dynamics in the μs timescale. This picture is consistent with the broad, multi-modal distributions of the atomic distances corresponding to relevant intermolecular interactions, such as the three evolutionarily conserved contacts and D35-R167 (Figure 4B). These interactions thus appear to be transiently populated in the context of a highly dynamical intermolecular interface.

Figure 4 with 4 supplements see all
All-atom 1μs simulations of FL-ATP.

(A) 10 snapshots of the three long atomistic MD simulations of FL-ATP DnaK bound to the DnaJ JD. Green: NBD, magenta: linker, brown SBD, red/cyan/purple: JD for the three trajectories. For ease of visualization, only helices II and III of the JD are depicted. (B) Distributions of the distances of the three coevolving contacts and the D35-R167 contact. The distance distributions for the three cases follow the color scheme of panel A (red/cyan/purple). Shaded areas are the sum of the distributions for each case. See (Figure 4—figure supplement 4) for traces of the trajectories.

https://doi.org/10.7554/eLife.23471.012

To shed further light on the molecular determinants of the DnaK/JD interaction, we then evaluated the energetic contributions of individual residues to the protein-protein binding energy in the 1μs trajectories, using a generalized born surface area (GBSA) approximation (see Material and Materials and methods). The per-residue decomposition of the binding energy highlighted four fragments of DnaK that contribute most strongly to the stabilization of the DnaK/JD complexes (Figure 5A). The residues corresponding to three of these spots form an almost continuous patch covering the upper-cleft between lobes II and III of the NBD (residues 206–219, 329–335) and a segment of the inter-domain linker (residues 391–393) Figure 5A–B). Reciprocally, the energetic analysis of the JD residues predicted helix II and the HPD loop as being the principal region involved in energetic stabilization of the complex (Figure 5—figure supplement 1). These findings confirm that the JD strongly interacts with the docked linker and its neighboring residues, suggesting the role of JD at stabilizing this linker arrangement. Remarkably, the energetic analysis also unveiled that a stretch of the SBD beta-basket plays an important role in securing the DnaK/JD interface (residues 414–423).

Figure 5 with 2 supplements see all
J-domain - SBD interactions.

(A) DnaK per-residue contribution to the binding energy with the J-domain. The NBD, linker, and SBD regions are highlighted in green, pink, and ochre, respectively. The dashed line denotes the threshold for which residues are depicted in panel B (−1kcal/mol). (B) Structural view of the residues most contributing to the binding energy with the J-domain. The subdomains of DnaK (NBD, linker, and SBD) follow the same color scheme as in panel A. Residues significantly contributing to the binding energy (ΔE < −1 kcal/mol) are depicted in blue surface representation. (C) Coevolutionary contacts predicted on the full-length Hsp70 sequences. The three blue contacts are the same as those reported in Figure 3. The two contacts involving the SBD of Hsp70 are shown in yellow. The dotted circle represents residue E75 of the J-domain, absent from the structures used in the simulations (see Materials and methods). The depicted conformation is the final frame of one of the three 1-μs HPD-IN FL-ATP simulations.

https://doi.org/10.7554/eLife.23471.017

To investigate the functional relevance of the interaction between the JD and the SBD, we then extended the DCA analysis to full-length Hsp70 sequences. This analysis confirmed the significance of the previously observed NBD/JD contacts and predicted two inter-protein contacts involving the SBD (H422-E75 and Q424-K51 in E. coli DnaK/DnaJ numbering, Figure 5C and Figure 5—figure supplement 2). Interestingly, the two residues on the SBD correspond to the SBD region energetically involved in the binding of the JD, and thus show that their interaction has been conserved through evolution. Of the two corresponding residues on the JD, one is located on helix III (K51), in excellent agreement with the HPD-IN binding mode, while the second (E74) lies in the unstructured C-terminal region of the JD, which was not included in the structural model (see Materials and methods). Taken together, these energetic and evolutionary analyses strongly indicate that the J-domain directly interacts with the inter-domain linker in its docked conformation, as well as with the SBD.

Discussion

The integration of complementary approaches such as coevolutionary sequence analysis and molecular modeling at the coarse-grained and atomistic scale allowed us to shed light on the structural details of the crucial interaction of DnaJ with DnaK.

Indeed, molecular simulations based on a CG model specifically suited to study low-affinity protein binding identified the positively charged helix II of the JD and a region close to lobe IIA of the DnaK NBD as the most relevant interaction sites in the formation of DnaK/JD complexes. This prediction was corroborated by statistical sequence analysis showing that several inter-protein contacts across this interface strongly coevolve in the Hsp40/70 family. These findings are in good agreement with much experimental evidence collected in the last 20 years. Indeed, a major role for helix II of JD in the DnaK/DnaJ interaction has been suggested both by NMR and mutagenesis experiments (Greene et al., 1998; Suh et al., 1998). Furthermore, our prediction is in excellent agreement with recent PRE-NMR investigation of the interaction of JD with ADP-bound DnaK that identified the sequence 206EIDEVDGEKTFEVLAT221 as the main binding region on DnaK (Ahmad et al., 2011). The observation that the same interaction site was present in all the simulated systems (ATP- and ADP-NBD and ATP-bound full-length DnaK) strongly suggests that this region is likely to play a primary role throughout the chaperone functional cycle, thus greatly extending its physiological relevance. Interestingly, the predicted bound conformations located the J-domain in near proximity to the docked inter-domain linker in FL(ATP) (Figure 2), which has been shown to play a central role in the allosteric coupling of the two domains in the Hsp70 cycle (Alderson et al., 2014; Vogel et al., 2006; Zhuravleva et al., 2012).

Beyond a detailed characterization of the binding regions on DnaK and the JD, our integrated approach provided precious information about the inter-protein arrangement in the transient DnaK/JD complexes. Effectively, CG modeling suggested two possible binding modes characterized by opposite orientations of the JD (Figure 2). Both these putative conformations were only minimally affected by structural differences in the NBD upon ATP/ADP binding or by inter-domain docking in full-length ATP-bound DnaK. Direct comparison of these results with the interaction pattern inferred from coevolutionary analysis reveals an excellent agreement for one of the conformations (HPD-IN). Further elements supporting the relevance of this structure can be found by taking into account the role of the highly conserved HPD loop of the JD. Indeed, several mutagenesis studies have shown that the HPD loop is fundamental for functional chaperone/cochaperone interactions (Landry, 2003; Suh et al., 1998). NMR investigations have reported conflicting evidence about the actual involvement of the HPD region in the Hsp70/Hsp40 interface (Ahmad et al., 2011; Greene et al., 1998; Kim et al., 2014). However, the observation that the DnaK R167H mutation could suppress the deleterious effect of the DnaJ D35N mutation strongly pointed toward a direct, yet possibly transient, interaction of these residues during the chaperone functional cycle (Suh et al., 1998). Strikingly, this experimental evidence is perfectly compatible with the spatial proximity of DnaJ D35 and DnaK R167 in the HPD-IN conformation (Figure 3C), whereas it cannot be easily reconciled with the orientation of the HPD in the current structural model of the DnaK:JD complex based on PRE-NMR experiments (Ahmad et al., 2011). The HPD-IN conformation hence provides a novel, suggestive model for the elusive DnaK/DnaJ complex that best recapitulates the most relevant experimental evidence on prokaryotic Hsp70/Hsp40 systems.

The insights obtained combining CG modeling and coevolutionary sequence analysis were further confirmed and enriched by explicit solvent, atomistic simulations. Indeed, MD trajectories confirmed the overall stability of the HPD-IN complex on the μs time scale and showed the transient interaction of D35-R167 and of the coevolving pairs (Figure 4). The transient nature of these contacts is perfectly compatible with the dynamical interface suggested by NMR experiments (Ahmad et al., 2011). Furthermore, energetic analysis of the atomistic simulations unveiled the residues that contribute most to the formation of this dynamical complex. Particularly, we notice that the interaction of JD helix II with the DnaK surface composed of the docked intermolecular linker and adjacent β-strands has a key role in stabilization of the binding interface. Remarkably, the MD analysis highlighted that a few residues of the SBD β-basket contribute significantly to JD binding. DCA analysis performed with full-length Hsp70 sequences further strengthened this observation by showing that the SBD/JD interface observed in the simulations actually contains pairs of coevolving residues in the Hsp70/Hsp40 family. Altogether, our results suggest that although the overall DnaK/JD arrangement is determined by interactions with the NBD, specific contacts with both the SBD and the inter-domain linker may significantly increase the complex stability, thus rationalizing the decreased affinity observed for isolated NBD (Kim et al., 2014).

To put our findings into context, we have to take into account the current understanding of allosteric signal transmission in DnaK. Much experimental evidence has indicated that ATP-bound DnaK undergoes large-scale structural fluctuations with significant inter-domain rearrangements (Mayer, 2010; Mapa et al., 2010). Within this conformational ensemble, NMR and mutagenesis studies suggested that allosterically active conformers with high ATPase activity are characterized by a docked inter-domain linker but very limited SBD-NBD contacts (Zhuravleva et al., 2012; Kityk et al., 2015; Jiang et al., 2007). Remarkably, we find that the docked inter-domain linker and neighboring residues correspond to a hotspot for Hsp70/Hsp40 interaction, suggesting a stabilization of this linker conformation in the transient DnaK/JD complex. Furthermore, the energetically favorable and evolutionarily conserved interactions between DnaK SBD and JD suggest an additional mechanism for altering the conformational ensemble of ATP-bound DnaK upon DnaJ binding. Our structural model is thus compatible with the intriguing hypothesis that the docking of the JD affects the SBD dynamics through direct interactions, shifting DnaK towards an allosterically active conformation (Zhuravleva et al., 2012). Therefore, while the dynamical interplay among NBD, inter-domain linker, SBD, and JD remains to be fully elucidated, our analysis provides insights about the regulatory role of J-domain proteins in the Hsp70 cycle, a topic of great interest for understanding the role of the Hsp70 machinery in the global chaperone network (Kravats et al., 2017) as well as for designing allosteric inhibitors (Li et al., 2016).

The alternative arrangement observed in the bovine auxilin:Hsc70 complex (Jiang et al., 2007), and its poor agreement with NMR/mutagenesis data on DnaK/DnaJ (Ahmad et al., 2011; Greene et al., 1998) raise the question of the uniqueness of the Hsp70/Hsp40 binding mode (Garimella et al., 2006). Whether these major structural differences are caused by artifacts introduced by the artificial cross-linking, or point to the existence of multiple dynamic interaction interfaces, or to phylogenetic differentiation of Hsp70/40s, remains an essential yet unsolved question. In this respect, the successful combination of coevolutionary and molecular modeling analysis proposed here paves the way for further analysis to tackle these challenges.

Materials and methods

Coarse-grained simulations

Simulation protocol

We used the coarse-grained model introduced in Kim and Hummer (2008) to simulate the binding of DnaJ to DnaK constructs. Both proteins were treated as rigid bodies, at a resolution of one bead per residue centered on the Cα atoms. We modeled NBD(ADP) using the structured region (residues 4–380) of ADP-bound E. coli DnaK (pdb: 2kho [Bertelsen et al., 2009]), whereas we relied on the X-ray structure of ATP-bound E. coli DnaK (pdb: 4jne [Qi et al., 2013]) for both NBD(ATP) (residues 4–380) and FL(ATP) (residues 1–600). The J-domain was modeled based on the structured region of the E. coli DnaJ (pdb: 1xbl [Pellecchia et al., 1996], residues 2–70). We defined the structured part of the J-domain, by aligning multiple J-domains (pdb:1xbl, 4j7z, 2m6y, 2n04, 2qsa, 2lgw, 2och, 2dn9, 2dmx, 1hdj, 1faf, 2ctw) and considering the common structured part. We therefore removed the last six C-terminal residues from the 1xbl structure to define the maximal common structured region of the J-domain. Our definition of the J-domain corresponds to the one used in Ahmad et al. (2011).

Conformations were sampled from the equilibrium distribution using a replica-exchange Monte-Carlo (REMC) algorithm in a prediodic box, with 20 replicas distributed in the temperature range 200–395K. A total of 2106 MC-steps were performed for each replica and samples were recorded every 100 steps. Dissociation constants were calculated by measuring the fraction of bound conformations, and simulations were repeated with five increasing box sizes (240–360 Å for NBD, 300–420 Å for FL(ATP)). Bound conformations were extracted by selecting all complexes in which the two proteins had at least one pair of beads within 8 Å distance and total interaction energy equal or below -2kBT. All subsequent analyses on the CG complexes have been performed on the ensemble of bound complexes. The algorithm introduced in (Daura et al., 1999) with a cutoff radius of 5 Å was used to perform cluster analysis of the CG trajectories.

Angular analysis

To characterize the angular orientation of the binary complexes, we used two sets of angular coordinates. The binding site of the JD on the DnaK NBD was first characterized by the spherical coordinates of its center of mass. Let JCM and KCM denote the center of mass of the JD and NBD, respectively (computed over all Cα atoms). Furthermore, let IiJ and IiK (i=1,2,3, i=1 corresponds to the largest moment of inertia) denote the three (normalized) axes of inertia of the JD and NBD, respectively. The spherical coordinates ΦCOM,ΩCOM describing the binding site of the JD on the NBD are then defined by the usual pair of spherical angles, in the reference coordinate system defined by the inertia axis of the NBD.

The relative orientation of the JD with respect to the NBD is characterized by three Euler angles Θ,Ω,Ψ, computed with respect to the reference frame of the NBD, as follows (see Figure 1—figure supplement 4):

Θ=180πasin(I1JI2K)
Ω=180πatan2(I2JI2Kcos(Θ),I3JI2Kcos (Θ))
Ψ=180πatan2(I1JI3Kcos(Θ),I1JI1Kcos(Θ))

where atan2 denotes the quadrant-checking arctangent function.

Coevolutionary analysis

Sequence extraction and preprocessing

To perform direct-coupling analysis we used the same sequence extraction protocol as in (Malinverni et al., 2015), reported hereafter:

  • • Initial seeds were built for both protein families, containing sequences from all kingdoms: Bacteria, Eukaryotes and Archaea.

  • • Hidden Markov models of the alignments were built, using the hmmbuild utility of the HMMER (version 3.1b2) (Mistry et al., 2013) suite, with default parameters.

  • • The union of the Swissprot and the Trembl databases (release 2015_08) was scanned against these two profiles , using the hmmsearch (Mistry et al., 2013) utility, with default parameters.

  • • For both retrieved MSAs, all sequences having more than 10% gapped positions were removed from the datasets.

  • • The Hsp70 sequences were restricted to the NBD and linker region, by trimming the C-terminal part of the alignment.

  • • Taxonomic identifiers for all sequences were retrieved from the NCBI taxonomy database.

This resulted in multiple-sequence-alignments for the Hsp70 and Hsp40 families containing, respectively, 20061 and 26254 homologues, with the following taxonomic distribution:

The number of paralogs per organism varies widely (Table 2, Figure 6), with bacteria having fewer paralogs for both Hsp40 and Hsp70. All organisms were included in the full dataset, both those having multiple paralogs and organisms possessing a single copy of Hsp40-Hsp70 pairs (3701 organisms).

Table 2

Summary of the taxonomic composition of the Hsp40 and Hsp70 alignments. Entries of the table represent the number of sequences found in each taxonomic group. The number of organisms in each taxonomic group is indicated in parenthesis.

https://doi.org/10.7554/eLife.23471.020
EukaryotesBacteriaArchaeaVirusesOtherTotal
Hsp4014369 (1093)11379 (7837)311 (273)36 (22)159 (13)26254 (9238)
Hsp707881 (1933)11819 (8272)273 (258)25 (17)63 (13)20061 (10493)
Distribution of number of paralogs per organism, for Hsp40 and Hsp70.
https://doi.org/10.7554/eLife.23471.021

The resulting MSAs covered the following ranges of the E. coli DnaK/DnaJ proteins: DnaK (Uniprot ID: P0A6Y8) I4-T395, DnaJ (Uniprot ID: P08622) K3-G78. Both sequence alignments and taxonomic identifiers for the Hsp40 and Hsp70 families are available as supplementary material (Supplementary file 1−5).

Direct-coupling analysis

Direct-coupling analysis (DCA) was performed on each of the 1000 stochastically concatenated MSAs using the asymmetric version of the pseudo-likelihood method (Ekeberg et al., 2014), with standard parameters (maximum 90% sequence identity, regularization parameters λH=λJ=0.01). In practice, the parameters Jij(A,B) and hi(A) of the generalized Potts model (Equation 1) are numerically fitted to the data in the MSAs in the Pseudo-Likelihood approximation (Ekeberg et al., 2014)

(1) P(x)=1Zexp(ijJij(Xi,Xj)+ihi(Xi))

where x denotes an amino-acid sequence, Z the normalizing partition function. The raw DCA scores Sijraw (Equation 2), quantifying the statistical coupling strength between two positions in the MSA, are defined as the Frobenius norm of the local 21 × 21 coupling matrices Jij(A,B)

(2) Sijraw=A,BJij(A,B)2

Finally, we apply the average product correction (APC) (Dunn et al., 2008) (Equation 3) to the raw DCA scores, to correct for a bias in position specific mutation rate. To account for variable mutation rates in two different protein families, we follow the modification to the APC introduced in (Ovchinnikov et al., 2014), by taking the average over the two protein segments independently

(3) Sij=Sijraw-Si.rawS.jrawS..raw

where . denotes the average over the row/column of the matrix Sij.

Random paralog matching

To detect inter-protein coevolving residue pairs, concatenated MSAs of interacting protein sequence pairs must be built. Given the lack of knowledge on the interaction network of Hsp40s and Hsp70s and the lack of conservation of the number of paralogs throughout species, no trivial matching could be performed. Furthermore, the approach of matching interacting sequence pairs based on their genomic proximity (Feinauer et al., 2016; Hopf et al., 2014; Ovchinnikov et al., 2014) failed because of a lack of operon organization of Hsp70s and Hsp40s. We therefore employed a stochastic approach to the sequence-matching problem, which consists of the following steps:

  • For each organism:

    • Randomly select a sequence of Hsp70 and randomly match it to a single Hsp40 sequence of the same organism.

    • Remove these two sequences from the pool of available sequences in the current organism.

    • Repeat this procedure until there are no more Hsp40 or Hsp70 sequences to match in the current organism.

    • Repeat the procedure for all organisms possessing at least one Hsp40 and Hsp70 sequence.

This procedure generated a stochastic realization of a matched MSA, ensuring that each sequence was present only once in the MSA. This constraint of matching each sequence only once avoided the combinatorial explosion of the size of the random MSAs and, consequently, the dilution of the coevolutionary signal because of the presence of an overwhelming majority of non-interacting protein pairs. We generated 1000 such random MSAs and performed DCA on each of them individually. For each DCA realization, we then extracted the strongest inter-protein coevolving pairs, using a criterion introduced in Hopf et al. (2014), which briefly goes as follows: all inter-protein DCA scores are renormalized as

(4) Sij~=Sij|min(SijInter)|(1+NNeff)

where Sij denotes the average-product corrected DCA score, N the length of the MSA and Neff the effective number of sequences in the MSA (taking into account the reweighting by maximum 90% identity). Note that the minimum is taken restricted to the interface scores. This renormalization partially corrects for dependencies of the inter-protein DCA scores on the alignment width (N) and depth Neff, and therefore permits an easier comparison of DCA scores across different protein families (Hopf et al., 2014). Note that the introduction of this score does not change the relative ranking of inter-protein DCA contacts, as it is merely a convenient renormalization of the scores. For each of the 1000 realizations, we collected all the inter-protein DCA pairs which had a normalized score Sij~ above 0.8. We then computed the selection frequency for each contact (Figure 3A) and retained the residue pairs that were selected most frequently for subsequent analysis.

The rationale behind this procedure was that contacts appearing repeatedly in multiple random realizations were robust to matching noise in the MSAs and should therefore reflect a strong underlying coevolutionary signal.

As mentioned above, only a single copy of Hsp40 and Hsp70 sequences were retrieved in some organisms. The corresponding Hsp70/Hsp40 pairs were systematically matched and added to all the the randomly generated MSAs. To investigate the dependence of our results on this choice, we performed a DCA analysis on a limited MSA composed solely of single-copy organisms. We observed that in this case, only a fraction of the strongest coevolutionary signals predicted from the full-dataset are recovered.

Recently, two methods to simultaneously pair interacting paralogs and predict inter-protein coevolving contacts have been developed (Bitbol et al., 2016; Gueudré et al., 2016). Both these involved methods tackle the combined objective of deciphering the paralog interaction network as well as determining coevolving residue across protein interfaces through iterative schemes. We observed that the strongest inter-protein contacts predicted by the two methods (Bitbol et al., 2016; Gueudré et al., 2016) strongly overlap with those obtained using our random-matching strategy (Figure 7—figure supplement 1). To allow comparison with other inter-protein interactions studied by DCA, we report here an empirical inter-protein score introduced in (Feinauer et al., 2016; Gueudré et al., 2016), which consists of characterizing the overall strength of inter-protein coevolution by the average of the DCA scores of the four strongest inter-protein pairs (scored by the Frobenius norm and after APC correction). We obtained empirical inter-protein scores of 0.11, 0.17, and 0.24 for the random matching, PPM, and IPA, respectively. Note that in the case of the random matching, the empirical inter-protein score is averaged over the 1000 realizations of the matching procedure. In the context of this work, our interest was restricted to predicting coevolving inter-protein contacts, and we thus applied the simpler random matching strategy discussed above.

Figure 7 with 1 supplement see all
Extended analysis of coevolutionary predictions and threshold validation.

DCA predicted contacts with threshold selection threshold above 0.2. (A) Frequency of appearance of the coevolutionary inter-protein contacts (see Materials and methods). All nine predicted contacts which appear in more than 20% (dashed red line) of the random matchings are reported. Contacts are colored by their selection frequency (blue: most frequent, white: less frequent). The two contacts containing a buried residue are underlined. (B–C) The seven surface exposed residues among the nine selected, reported on the DnaK NBD. (D–E) The seven surface exposed residues among the nine selected, reported on the JD. Panels B-E follow the same color scheme as panel (A).

https://doi.org/10.7554/eLife.23471.022

Coevolutionary contact selection

The selection frequencies of all inter-protein contacts were computed over the 1000 realizations (Figure 3C) and the most frequently appearing contacts were selected for further analysis. To set a threshold on the number of selected contacts, we computed the solvent accessible surface area (SASA) of the pairs of residues involved in the most frequent contacts. We then selected all ranked contacts before the appearance of a buried residue (SASA < 1 Å2) in the contact pair. This resulted in the selection of three significantly conserved DCA predicted contacts (Figure 3A,B,C). Note that similar limited numbers of contacts is generally considered in the DCA prediction of protein-protein interactions (Hopf et al., 2014; Ovchinnikov et al., 2014).

To validate our conservative threshold choice and further asses the robustness of our results, we extend here the analysis to all DCA predicted contacts with an appearance frequency >20% (Figure 7). Among the nine predicted contacts, two involve a buried residue (SASA < 1 Å, Table 3) and are discarded from further analysis. The seven remaining DCA predicted inter-protein contacts are depicted in Figure 7B–D. We observe that five out of the seven contacts are concentrated in the binding interface observed in the CG simulations that was already identified by the three strongest one and that these additional contacts do not qualitatively change the predictions reported in the results section (Figure 7, Table 3). This extended analysis of DCA predictions thus confirms the robustness of the results reported in the Results section using a stringent selection criterion, and further supports the DCA predicted binding interface.

Table 3

Solvent accessible surface area (SASA) of the residue involved in the nine predicted DCA contacts with selection frequency higher than 20%. The residue numbering refers to the E. coli numbering of DnaK resp. DnaJ. Numbers in parentheses denote the SASA of the amino-acids involved in the contacts normalized to the average amino-acid SASA (data from [Chothia, 1976]). Residue pairs involving a buried amino-acid are colored in gray. δ denotes the average Cα distances of the corresponding residue pairs in FL(ATP) conformations from the CG simulations in the HPD-IN/OUT orientations.

https://doi.org/10.7554/eLife.23471.024
ContactSasa 1 [Å2]Sasa 1 [Å2]δOUT [Å]δIN [Å]Selection frequency
N187 - K23102.7 (0.64)119.8 (0.60)13.711.40.996
D208 - K2641.6 (0.27)144.9 (0.72 )8.510.00.976
T189 - R1917.3 (0.12)177.9 (0.79)23.013.70.587
A176 - A640.6 (0.005)34.6 (0.30)--0.414
L392 - A2989.5 (0.52)11.1 (0.10)11.79.70.334
L219 - E5544.7 (0.26)69.8 (0.37)22.617.70.311
D224 - K5140.7 (0.27)95.2 (0.48)33.731.60.287
L320 - M300.7 (0.004)129.2 (0.70)--0.285
F356 - R3648.5 (0.23)214.6 (0.95)23.339.90.232

Atomistic simulations

Atomistic simulations protocol

For all the simulated systems, we used the RosettaDock (Chaudhury et al., 2011; Gray et al., 2003) protocol to obtain atomistic structures from the low-resolution CG conformations corresponding to the HPD-IN and HPD-OUT binding modes. Particularly, we took advantage of the multi-scale docking protocol (Chaudhury et al., 2011) to generate 1000 all-atom conformations from the CG structures corresponding to the center of each HPD-IN and HPD-OUT cluster. We then selected the 10 best scoring structures among those within a deviation equivalent to the radius of the clusters (Cα RMSD ≤ 5 Å) and we solvated them in dodecahedral boxes containing approximately 26,000 and 60,000 water molecules for NBD:JD and FL:JD complexes, respectively. MD simulations were performed using the GROMACS 5 MD package (Abraham et al., 2015), with the AMBER14 force-field (Case et al., 2014) and TIP3P water model (Jorgensen et al., 1983). Given the large internal dynamics of Hsp70, we used harmonic restraints on the backbone atoms of the Hsp70 constructs (NBD(ADP),NBD(ATP) and FL(ATP)) to focus on the inter-protein dynamics in the 30 ns runs. The μs simulations used the same parameters, with the harmonic restraints removed.

All simulations were performed in a dodecahedral box with periodic boundary conditions. Simulations were carried out with the following protocol:

  • Starting structures were solvated with TIP3P water molecules and subsequently energy minimized by steepest descent.

  • A first NVT equilibration phase (1 ns) was performed, putting full restraints on all proteins, ATP (when present) and MG atoms.

  • A second NPT equilibration phase (1 ns) was performed keeping the same restraints as in the 1ns NVT equilibration phase.

  • Subsequently, another NPT equilibration was performed (10 ns), putting restraints on the protein backbone only (DnaK and DnaJ).

  • Finally, production runs were carried out for 30 ns, keeping only restraints on the DnaK backbone.

  • For the 1μs simulations, we continued from the last frame of the 30 ns runs, without any restraints.

Temperature was kept constant (T = 300 K) using the v-rescale thermostat (Bussi et al., 2007) and NPT (p=1 atm) simulations relied on a Parrinello-Rahman barostat (Parrinello and Rahman, 1981). The equations of motion were integrated with a time step of 2 fs. All covalent bonds were constrained to their equilibrium values using the LINCS algorithm (Hess et al., 1997). The electrostatic interactions were calculated by the Particle Mesh Ewald algorithm, and a cutoff of 10 nm was used both for Lennard-Jones interaction and for the real-space coulomb contribution.

The distance root mean square (dRMS) measurements were calculated by

(5) dRMS(t)=1NiNj(i,j)(Ni,Nj)(dij(t)dij(0))2

where i (resp. j) are indices of the residues belonging to the J-domain (resp. DnaK), and dij(t) denotes the distance between the Cα atoms of residue i of the J-domain and residue j of DnaK at time t. The dRMSs are then time-averaged over the last 10 ns of the MD trajectories (results reported in Table 1).

Similarly the angular stability (θ in the text, see Table 1) was computed by

(6) θ(t)=180πacos(I1J(t)I1J(0))

where I1J denotes the inertia axis of the JD associated to the largest moment of inertia computed on the Cα (see Figure 1—figure supplement 4), for aligned DnaK NBD at all frames. The values reported in Table 1 are averaged over the last 10 ns of the MD trajectories.

The binding energies of the JD/DnaK complexes were calculated by the MM-GBSA method, implemented in Ambertools (Case et al., 2014). The polar contribution to the solvation energy was calculated using the Generalized Born approximation, with 0.01M counterion concentration in solution. The SASA computation was performed using the LCPO method.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    Scanning mutagenesis identifies amino acid residues essential for the in vivo activity of the Escherichia coli DnaJ (Hsp40) J-domain
    1. P Genevaux
    2. F Schwager
    3. C Georgopoulos
    4. WL Kelley
    (2002)
    Genetics 162:1045–1053.
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64

Decision letter

  1. Axel T Brunger
    Reviewing Editor; Stanford University Medical Center, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Modeling Hsp70/Hsp40 interaction by multi-scale molecular simulations and co-evolutionary sequence analysis" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Arup Chakraborty as the Senior Editor. The reviewers have opted to remain anonymous.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, major revisions are required before the manuscript can be considered further. Please note that there is no guarantee for acceptance.

Summary:

Your paper aims to establish an integrative bioinformatic pipeline, bringing together co-evolutionary modeling with molecular simulations based on both coarse-grained and atomistic models. There is considerable discussion in the field as regards to the relevant interface between DnaK/DnaJ and homologous Hsp70/Hsp40 complexes. Crystal structures and NMR experiments have produced conflicting results. The simulations seem to generally support the conclusions drawn from the NMR studies of the DnaK/DnaJ complex. However, there is no further experimental validation of the computational results, and the study currently lacks new insights into the functional roles of the J protein/Hsp70 interaction. One reviewer commented that she/he could not reproduce the co-evolutionary analyses (although we do not have the MSA). Many other technical points noted below need to be addressed.

Essential revisions:

1) While we find the molecular simulation part very convincing, we are a bit perplexed, possibly due to lack of clarity in the manuscript, about the co-evolutionary analysis. In particular we failed to reproduce some of the results presented. Lacking the multiple sequence alignment (MSA) of the two protein families, we are not sure that we followed the same pipeline as you did. The MSA must be deposited and the pipeline used clearly described. Also, please provide more information about the alignments of the two proteins. How many species are included? What is the statistics of paralogs? How many species with unique copies of both proteins exist in the alignment (how are they correctly matched)? Are there cases of proteins coded in operons or of certified interaction which can be imposed in the matching? You cite Malinverni et al., 2015 for how the MSA was obtained, but this reference does not provide sufficient detail.

2) When you note: "We built two separate seeds containing Hsp70 and Hsp40 sequences, covering a broad portion of the tree of life" did you include sequences other than those for Prokaryotes? If you did so, how could you justify this inclusion as it well known, and also acknowledged by you in the fifth paragraph of the Introduction, that the Bacterial system seems to be incompatible with the Eukaryotic one?

3) Apart from the seed, it is also not clear if eventually non-bacterial sequences were removed from the final MSA. If you did not do so, it would be very important to present the same analysis only on bacterial sequences and discuss differences in the inference (if any).

On a related note, it would be interesting to verify your claim that the statistics over repeated random matchings is informative, by applying your method to PPI treated in past work where the operon-based matching is known. This will provide insights into the generalizability of the approach to other protein systems lacking operons. The same problem has been recently addressed by two papers by Bitbol et al. and by Gueudre et al. in PNAS (2016). Both propose rather involved matching schemes. Would the application of your methods improve the results? In case these papers provide their codes, this would be an easy analysis to be added. The paper has applied the selection criteria of Hopf et al., 2014 with a cutoff of 0.8. The quantity subject to this cutoff is not mentioned. Why should the cutoff established for operonic matchings be applicable to random matchings? We are rather surprised (positively) that a random matching produces similarly strong signal.

4) The method used to match the two protein families is interesting but recently two other methods have been published tackling the issue of concatenating MSAs with a new computational approach: Thomas Gueudré et al. "Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis", vol. 113 no. 43, 12186-12191, doi:10.1073/pnas.1607570113, Published online before print October 11, 2016. Anne-Florence Bitbol, Robert S. Dwyer, Lucy J. Colwell, and Ned S. Wingreen Inferring interaction partners from protein sequences PNAS 2016 113 (43) 12180-12185; published ahead of print September 23, 2016, doi:10.1073/pnas.1606762113. We encourage you to try one or both of the methods mentioned to verify the robustness of the random matching.

5) Together with the selection criterion introduced in Hopf et al., 2014, it could be interesting to show in general a histogram across the 1000 stochastically concatenated MSAs of the original residue-residue score (Ekeberg, Hartonen and Aurell, 2014) in order to figure out whether the score is doing more than largely producing top-scoring pairs. Similarly, how many protein pairs typically have a score larger than 0.8?

6) The long all-atom MD runs are only performed for the HPD-IN conformation. Would runs on the HPD-OUT conformation confirm the preference for the IN conformation over the OUT conformation?

7) A critically important overarching question is what have we learned from this study that was not known previously? The interaction site on DnaK is not unexpected, the region on the J-domain, especially the HPD, was known to be key to this interaction, and the dynamic nature of the complex was expected. Thus, the work, while laudable, in its current form, does not move the field significantly forward. You need to attempt to address some of the functional underlying questions: how do J-proteins modulate Hsp70s to affect their allosteric cycle? What is the role of the diversity of J-proteins and how involved is the SBD?

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Modeling Hsp70/Hsp40 interaction by multi-scale molecular simulations and co-evolutionary sequence analysis" for further consideration at eLife. Your revised article has been favorably evaluated by Arup Chakraborty (Senior Editor), a Reviewing Editor, and three reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) In previous literature (namely Feinauer et al., 2016; Gueudré et al., 2016), an empirical coevolutionary score for protein pairs was introduced: starting from the Average Product Corrected (APC) inter-protein residue pair score (i.e. the restriction of the APC coupling score to all pair of residues i, j for which i belongs to one protein and j to the second one), and consider the mean over the 4 largest. It would be extremely interesting to show this score for the random matching strategy and also for PPM and IPA to compare the "strength" the DnaJ/DnaK coupling in comparison with other known protein pairs presented in literature.

2) While it becomes clear from the manuscript and from the authors reply that the central interest is in the Hsp70/Hsp40 system and not in the development of a general-purpose methodology, two results reported in the rebuttal letter but not in the manuscript or its supplement should be reported in the supplement, with a short reference from the new last paragraph of the subsection “3. Random Paralog matching”:

The random matching procedure is successful even if applied to the two component system used in Bitbol et al., 2016 and Gueudré et al., 2016.

The matching procedure of Bitbol et al., 2016 and Gueudré et al., 2016 produces strongly overlapping results with the random-matching procedure.

3) A minor remark concerns the results of the 3701 always matched protein pairs, where both sequences are single copy in their genomes. It is interesting that this case recovers part of the strongest signal, but the sampling of random matchings is able to enhance the coevolutionary signal beyond the one found for the uniquely matched pairs. Again, this is a nice detail, it might be introduced at the beginning of the Methods section on random matching, or at the end of Sec. II.B. “Coevolutionary Analysis predicts conserved DnaK-DnaJ contacts”.

4) HMMer should be cited in the first paragraph of the subsection “1. Sequence Extraction and Preprocessing”.

https://doi.org/10.7554/eLife.23471.039

Author response

Essential revisions:

1) While we find the molecular simulation part very convincing, we are a bit perplexed, possibly due to lack of clarity in the manuscript, about the co-evolutionary analysis. In particular we failed to reproduce some of the results presented. Lacking the multiple sequence alignment (MSA) of the two protein families, we are not sure that we followed the same pipeline as you did. The MSA must be deposited and the pipeline used clearly described. Also, please provide more information about the alignments of the two proteins. How many species are included? What is the statistics of paralogs? How many species with unique copies of both proteins exist in the alignment (how are they correctly matched)? Are there cases of proteins coded in operons or of certified interaction which can be imposed in the matching? You cite Malinverni et al., 2015 for how the MSA was obtained, but this reference does not provide sufficient detail.

We agree with the reviewers that the protocol of sequence extraction and the coevolutionary analysis need further explanations. As requested, the multiple sequence alignments of the two protein families, as well as the taxonomic identifiers of all sequences, have now been uploaded as supplementary material available online. Furthermore, we added a more detailed presentation of the sequence extraction and preprocessing steps in the Materials and methods section and we provided details about the statistics of the multiple sequence alignments. We report here for convenience in Author response table 1 the number of sequences and organisms in the alignments used in this work.

Author response table 1

Summary of the taxonomic composition of the two alignments used in the work. Entries of the table represent the number of sequences found in each taxonomic group. The number of organisms in each taxonomic group is indicated in parenthesis.

https://doi.org/10.7554/eLife.23471.030
EukaryotesBacteriaArchaeaVirusesOtherTotal
Hsp4014369 (1093)11379 (7837)311 (273)36 (22)159 (13)26254 (9238)
Hsp707881 (1933)11819 (8272)273 (258)25 (17)63 (13)20061 (10493)
Author response image 1
Distribution of number of paralogs per organism for Hsp40 and Hsp70.
https://doi.org/10.7554/eLife.23471.031

The number of Hsp40 and Hsp70 paralogs per organism varies widely. There are in total 8856 organisms for which both proteins have been retrieved. Among these, 3701 organisms have a single (retrieved) copy of both Hsp40 and Hsp70. These single sequence pairs were added to the matched MSAs without any special treatment. To assert the influence of the single copy pairs on the coevolutionary results, we performed DCA on an MSA composed solely of the 3701 organisms having a single copy of Hsp40 and Hsp70. Note that due to the finite sensitivity of the sequence extraction protocol, there is no guarantee that these organisms have truly a unique pair of Hsp40/Hsp70. The two strongest DCA contacts predicted on this reduced set (Author response table 2) correspond to the two most frequent contacts in the complete set. We however observed no overlap between lower-rank predicted inter-protein contacts in this reduced set and other frequently selected contacts in the full set. These observations lead to two conclusions: First, the organisms containing single pairs positively contribute to the most predicted contacts in the full dataset. Second, not the entire coevolutionary signal detected in the full dataset is contained in the reduced dataset containing only single pair organisms. The combination of both organisms with single and multiple paralogs is thus beneficial for the overall prediction of inter-protein contacts.

Author response table 2

10 first DCA predicted contacts using the reduced set consisting of the 3701 single copy pairs organisms. The second column denotes the DCA score of the predicted contact. The rank is computed over all contacts, including the intra-chain predictions.

https://doi.org/10.7554/eLife.23471.032
ContactScoreRankSASA [Å2]
N187-K230.31324102.7 – 119.8
D208-K260.2538542.0 – 145.0
A17-R630.253920 – 84.0
E306-R630.2440641.1 – 84.0
I338-I210.234480.8 – 1.0
N222-K310.2246314.8 – 145.0
L219-E550.2247844.7 – 69.8
L390-A290.2057889.5 – 11.1
T215-E550.1959726.6 – 69.8
L131-A640.1961747.2 – 34.6

There are cases of Hsp40s and Hsp70s that appear in operons in bacterial organisms (e.g. members of the DnaJ/DnaK and HscB/HscA subfamilies). We initially tested the operon based matching presented in (Ovchinnikov et al., eLife 2014), where sequences are matched if they are separated by less than 20 genes on the genome. Although a substantial number of matched pairs are retrieved (11293), we observed that these pairs of sequences present very high sequence identity, which is reflected by the effective number of sequences they represent (Neff=1045, with similarity threshold of maximum 90% sequence identity). This was reflected by overall poor DCA predictions, and particularly weak inter-protein couplings. Furthermore, there are some known interacting paralogs for a very small set of model organisms. However, these sets are not exhaustive (they do not cover all paralogs of the model organisms, even in the case of E. coli).

Given the reasons above, and the quality of the inter-protein predictions based on the random matching strategy, we decided to keep the pairing procedure as generic and stochastic as possible, without introducing any additional pairing restraints.

2) When you note: "We built two separate seeds containing Hsp70 and Hsp40 sequences, covering a broad portion of the tree of life" did you include sequences other than those for Prokaryotes? If you did so, how could you justify this inclusion as it well known, and also acknowledged by you in the fifth paragraph of the Introduction, that the Bacterial system seems to be incompatible with the Eukaryotic one?

We thank the reviewers for pointing to the importance of the possible differences between Bacteria and Eukaryotes regarding the Hsp40-Hsp70 interactions. As discussed above (see point 1), no sequences were discarded based on taxonomic information. We kept sequences from all taxonomic origins in our main dataset to incorporate as much variability and exhaustiveness as possible in our dataset. We repeated the analysis on bacterial sequences only (200 random matchings). For completeness, we also performed the same analysis on eukaryotic sequences only (200 random matchings). Results of the coevolutionary analysis restricted to bacterial and eukaryotic organisms separately show that the origin of the inter-protein contacts detected by DCA in the full alignment mostly stems from the bacterial sequences (Author response image 2, left) while no strong coevolutionary signal is retrieved from eukaryotic sequences alone (Author response image 2, right). The contacts extracted from bacterial data alone are in excellent agreement with the predictions from the full dataset (Author response table 3). This last observation highlights the robustness of the matching procedure, as the addition of sequences from eukaryotic organisms does not perturb the ranking of the strongest coevolving DCA contacts.

Author response image 2
Frequency of appearance of coevolutionary inter-protein contacts.

Left: Only bacterial sequences. Right: Only eukaryotic sequences. Inset: Zoom on the vertical axis between 0 and 5%. The five most frequent contacts in the bacterial analysis reported in Tab.R3 are highlighted. The colored circles coincide with the three most frequently selected contacts in the full dataset, reported in the main text.

https://doi.org/10.7554/eLife.23471.033

The five most frequent contacts in the bacterial analysis (Author response image 2, left) are reported in Author response table 3. We observe that the four first contacts (denoted by (*) in the table) are among the first five contacts predicted in the complete dataset. These observations support the idea that the coevolutionary signal that we observe in the complete dataset stems from the bacterial sequences in our sample. We added these results, based on separate bacterial and eukaryotic sequences to the manuscript (Figure 3—figure supplement 1) and commented on them in the main text.

Author response table 3

Five most frequent contacts in the coevolutionary analysis restricted to bacterial sequences. Contacts denoted by (*) are among the five first predicted contacts in the full dataset.

https://doi.org/10.7554/eLife.23471.034
ContactFrequency
D208 – K26 (*)1
N187 – K23 (*)1
T189 – R19 (*)0.83
L392 – A29 (*)0.81
V215 – Y540.75

3) Apart from the seed, it is also not clear if eventually non-bacterial sequences were removed from the final MSA. If you did not do so, it would be very important to present the same analysis only on bacterial sequences and discuss differences in the inference (if any).

On a related note, it would be interesting to verify your claim that the statistics over repeated random matchings is informative, by applying your method to PPI treated in past work where the operon-based matching is known. This will provide insights into the generalizability of the approach to other protein systems lacking operons. The same problem has been recently addressed by two papers by Bitbol et al. and by Gueudre et al. in PNAS (2016). Both propose rather involved matching schemes. Would the application of your methods improve the results? In case these papers provide their codes, this would be an easy analysis to be added. The paper has applied the selection criteria of Hopf et al., 2014 with a cutoff of 0.8. The quantity subject to this cutoff is not mentioned. Why should the cutoff established for operonic matchings be applicable to random matchings? We are rather surprised (positively) that a random matching produces similarly strong signal.

We addressed above (point 2) the question regarding the consequences of limiting the DCA analysis to bacterial (or non-bacterial) sequences. For the sake of clarity, we discuss here the application of our random matching algorithm to an alternative system as requested by the reviewers. Please find below our considerations about the application of the IPA (Iterative Pairing Algorithm, IPA, Bitbol et al.) and PPM (Progressive Paralog Matching, PPM, Guedré et al.) schemes to the Hsp40/Hsp70 case (point 4) and the selection criteria (point 5).

We want to stress that the scope of our work is to investigate Hsp70/Hsp40 interactions and not to propose a novel, general approach to investigate PPIs where operon based matching is not available. However, we are pleased by the fact that the reviewers are positively surprised by the results obtained with the random matching algorithm proposed here and we tried to address their curiosity about its generalizability to other systems. Following their suggestion, we assessed the robustness of the random matching strategy by performing the same analysis on the bacterial two-component system HK-RR, which was used by both Guedré et al., and by Bitbol et al. as a benchmark and validation set. Unfortunately, we could not access the MSAs used in these two studies that apparently were not made publicly available. We then reconstructed them following the procedure for the data extraction and preprocessing described in the work by Bitbol et al. and briefly summarized here:

Two component systems sequences were retrieved from the P2CS database. We retained sequences belonging to the Pfam domain profiles of the Histidine Kinase (HisKA, PF00512) or the Response Regulator (Response_reg, PF00072). After removing organisms containing a single pair of HK/RR and removing hybrid and unorthodox sequences, the interacting partners were retrieved as adjacent HK and RR on the genome. Finally, a reduced dataset is under-sampled, by extracting randomly 459 organisms (see Bitbol et al.). All organisms possessing a single pair of HK-RR are discarded. We verified that the average number of paralogs and the distribution of number of paralogs match with those presented in their work.

We report in Author response image 3 the results obtained with the random matching strategy (1000 random matchings) on the resulting HK/RR dataset. Interestingly, we observe the presence of a limited set of predicted contacts that are observed very frequently (Author response image 3, left). This is particularly noteworthy in the case of the HK/RR dataset, which has a broader distribution of paralogs per organism compared to our dataset of Hsp40/Hsp70, and thus a higher variability in the matchings. We also notice that the most frequent couplings (Author response image 3, right) either correspond to native contacts in the crystal structure (PDB: 3DGE) (5 out of the 10 most frequent), or lie very close to native contacts. The random matching strategy is thus generalizable to the bacterial HK-RR system for coevolutionary inter-protein contact predictions.

Author response image 3
Random matching results obtained on the HK/RR dataset.

Left: Frequence of appearances of the contacts. The 15 most frequent contacts are highlighted by red circles for visualization. Right: The 10 most frequent contacts plotted onto the contact map of the HK/RR hetero-dimer (PDB:3DGE). Structural contacts from the PDB are defined using a threshold of 8.5Å between heavy atoms of two residues. Green (resp. red) DCA predicted contacts correspond to true (resp. false) predictions with respect the crystal structure. The blue dotted line represents the limit between HK and RR.

https://doi.org/10.7554/eLife.23471.035

We want to acknowledge here that both IPA and PPM tackle an ambitious objective, namely the simultaneous identification of both the coevolving residues forming the interaction interface and the interaction network between multiple paralogs. Both studies propose technically involved and elegant numerical solutions for tackling these coupled problems. In our case, the objective was restricted to determining a set of coevolving residues between Hsp40 and Hsp70. While of crucial importance and great interest, the identification of the interaction network between Hsp40 and Hsp70 goes beyond the scope of this work, and will probably require a more detailed analysis of the effect of highly promiscuous interactions on coevolving interfaces.

4) The method used to match the two protein families is interesting but recently two other methods have been published tackling the issue of concatenating MSAs with a new computational approach: Thomas Gueudré et al. "Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis", vol. 113 no. 43, 12186-12191, doi:10.1073/pnas.1607570113, Published online before print October 11, 2016. Anne-Florence Bitbol, Robert S. Dwyer, Lucy J. Colwell, and Ned S. Wingreen Inferring interaction partners from protein sequences PNAS 2016 113 (43) 12180-12185; published ahead of print September 23, 2016, doi:10.1073/pnas.1606762113. We encourage you to try one or both of the methods mentioned to verify the robustness of the random matching.

We thank the reviewers for highlighting the two recent methods proposed for the matching of interacting partners in the presence of paralogs.

In order to assess the robustness of our results with respect to multiple methods, we performed two additional analyses, using the PPM algorithm presented in Guedré et al., and the IPA algorithm presented in Bitbol et al. We implemented both IPA and PPM in Matlab (from scratch for IPA, based on the Julia implementation and pseudo-code provided for PPM). Both algorithms had to be adapted to allow for an unequal number of paralogs of both families. In the case of IPA, we started from a random pairing seed as presented in their original work. Both algorithms were run with their default parameters as presented in their resp. papers. To best compare the contact prediction quality, we used IPA/PPM to build (near-) optimal paralog matchings. The resulting MSAs were then used as input to the same pseudo-likelihood based DCA algorithm used in the random matching strategy, which is known to perform slightly better than the mean-field methods used in IPA/PPM when performing contact predictions.

We report in Author reponse table 4 the nine most frequent contacts in the random matching approach, as well as the nine strongest coevolving contacts in the DCAs obtained from the MSAs resulting from the IPA/PPM procedures. We highlighted in green the contacts that appear in both the Random matching and either the IPA or the PPM procedures. In yellow are contacts that are defined as “similar” (one residue being predicted by two methods and the second one in a nearby region between the two predictions). As seen, the results obtained by the random matching strategy are strongly overlapping with the results obtained by IPA and PPM. In particular, the four most frequent contacts identified by the random matching are also among the strongest contacts identified by the other methods. The 5th and 6th most frequent contacts in the random matching are similar (see above) to strong contacts identified by IPA or PPM. Taken together, these results highlight the strong overlap of all three methods in this case.

Author response table 4

Comparison of top inter-protein contact predictions between Random Matching, PPM and IPA. In green are highlighted the overlapping predictions between Random matching and PPM/IPA. Contacts highlighted in yellow denote similar contacts, defined as having one identical residue, and the second one in proximity.

https://doi.org/10.7554/eLife.23471.036
Random MatchingPPMIPA
K23 – N187K23 – N187K23 – N187
K26 – D208E55 – T215R19 – T189
R19 – T189K26 – D208K26 – D208
A64 – A176R63 – A17D59 – I39
A29 – L392R19 – T189A64 – A176
E55 – L219Y25 – A191A29 – L382
K51 – D224A29 – L382K50 – Y193
M30 – L320Y25 – I338A24 – G358
R36 – F356Y54 – A376I21 – I338

To further investigate whether IPA or PPM could improve the predictions on the bacterial/eukaryotic subsets, we repeated the analysis on these two separate subsets. Our results (Author response image 4) are in perfect agreement with our observations obtained using the random matching strategy (reported again in Author response image 4 for visual comparison, top row). We observe a strong overlap between all methods in the case of the full dataset and the dataset restricted to bacterial sequences (left and middle column), and a large spread of predictions over three different lobes with all methods in the case of the dataset restricted to eukaryotic sequences (right column).

Author response image 4
Comparison of inter-protein predictions between the three methods, on the full dataset (left column), bacterial dataset (central column) and eukaryotic dataset (right column).

The residues on the NBD involved in the absolute top five contacts predicted by all methods are reported, without any filtering. The color-code denotes the rank of the predictions, from dark blue (rank=1) to dark red (rank=5).

https://doi.org/10.7554/eLife.23471.037

Altogether, given the strong agreement for the top predictions between the three methods, we can conclude that the random matching strategy used in this work produces results fully compatible with the two state-of-the art methods. Furthermore, we observe that the use of more involved methods currently does not allow disentangling the more complex eukaryotic coevolutionary networks between paralogs. Indeed, both IPA and PPM were developed and benchmarked on datasets with fundamentally binary interactions, i.e. the underlying assumption of both methods is to find a unique one-to-one matching for all paralogs. While this is perfectly legit and documented for some systems, the situation for the Hsp40-Hsp70 interaction network is notoriously more complex. Several cases of promiscuous interactions of Hsp70s with multiple different J-proteins have been experimentally reported. The promiscuous one-to-many, and possible many-to-many, interactions in this system is highlighted by the unbalanced distribution of paralogs per organism between Hsp70 and Hsp40 (e.g. 15 Hsp70 vs 50 Hsp40 in H. Sapiens). The approaches consisting in finding the optimal one-to-one matching, as implemented in IPA and PPM is therefore not guaranteed to be an optimal solution for more complex interaction networks such as Hsp40 and Hsp70.

5) Together with the selection criterion introduced in Hopf et al., 2014, it could be interesting to show in general a histogram across the 1000 stochastically concatenated MSAs of the original residue-residue score (Ekeberg, Hartonen and Aurell, 2014) in order to figure out whether the score is doing more than largely producing top-scoring pairs. Similarly, how many protein pairs typically have a score larger than 0.8?

We agree with the reviewers that the selection criterion used needs further clarifications. We extensively discussed in detail the definition and the meaning of the selection threshold used.

As discussed in the revised manuscript, the selection score pertains to the selection of inter-protein residue pairs for a given MSA realization. In practice, all inter-protein DCA scores are normalized (see Equation 4 of the revised manuscript), and a threshold is set on the normalized scores to select the significant predicted contacts for each random realization of the random matching. It must be noted that the normalization does not change the relative rank of the inter-protein contacts, as it is merely a convenient way to rescale the scores, which can in general vary with the depth and width of the MSAs across different protein families. Alternatively, one could set a threshold on the number of highest scoring contacts selected for each realization, which leads to virtually the same results.

6) The long all-atom MD runs are only performed for the HPD-IN conformation. Would runs on the HPD-OUT conformation confirm the preference for the IN conformation over the OUT conformation?

We would like to underline that the long all-atom MD simulations were not performed to discriminate between the alternative HPD-IN and HPD-OUT conformations. Their objective was to further assess the stability of the proposed model on longer time-scales. We selected the HPD-IN conformation as a model for the DnaK/DnaJ complex due to its better agreement both with the coevolutionary predictions and with previous experimental evidences, such as the involvement of the HPD tripeptide in the stimulation of ATP hydrolysis by Hsp70.

However, we followed the suggestion of the reviewers and we further investigated the stability of the HPD-OUT conformations of the full-length complex using longer explicit-solvent atomistic MD runs. To this aim, we selected the two most stable HPD-OUT conformations in the 30ns runs and simulated them on longer time-scales using the same protocol as for the long HPD-IN simulations. Given the considerable computational cost of these simulations (approximatively 200’000 atoms), we could perform two ~750 ns MD runs for the HPD-OUT conformations.

In this timescale, we did not observe the full detachment of the J-domain from the docked state. This outcome is not totally surprising even in the case of weak, non-specific interactions as it is well known that the current force-fields have a propensity to over-estimate the stickiness of proteins and under-estimate the solvent-protein interactions (Petrov and Zagrovic, PLoS Comp. Biol, 2014, Abriata and Dal Peraro, Sci. Rep., 2015). Therefore, we relied on a standard, implicit solvation approach (MM-GBSA) to estimate the differential stability of the HPD-IN and HPD-OUT complex from the atomistic MD trajectories. The results, shown in Author response image 5, indicate a striking difference in the inter-protein binding enthalpy between the HPD-IN and HPD-out trajectories. Although this analysis does not account for entropic differences, the observed difference clearly supports a stronger binding in the HPD-IN conformation.

Author response image 5
Binding energy computed by GBSA.

The blue bars denote the three micro-second runs of the HPD-IN conformations. The red bars denote the two 750 ns runs of the HPD-OUT conformations.

https://doi.org/10.7554/eLife.23471.038

7) A critically important overarching question is what have we learned from this study that was not known previously? The interaction site on DnaK is not unexpected, the region on the J-domain, especially the HPD, was known to be key to this interaction, and the dynamic nature of the complex was expected. Thus, the work, while laudable, in its current form, does not move the field significantly forward. You need to attempt to address some of the functional underlying questions: how do J-proteins modulate Hsp70s to affect their allosteric cycle? What is the role of the diversity of J-proteins and how involved is the SBD?

We respectfully disagree with the reviewers on this point. Although it is true that there are multiple experimental evidences about the interaction between J-proteins and Hsp70s, only two structural models are currently available to rationalize these observations: the X-ray structure of a cross-linked eukaryotic NBD-JD complex (Jiang et al., 2007) and a model for the interaction between DnaJ JD with ADP-bound DnaK based on PRE-NMR data (Ahmad et al. 2011). While both these studies are insightful, they clearly suffer from some limitations, as discussed in the manuscript. To date no structural model is available for the complex formed by DnaJ and ATP-bound DnaKi.e. for the most studied Hsp40/Hsp70 system in the functionally-relevant nucleotide state (as stated for example in the recent review by Zuiderweg et al., Cell Stress and Chaperones, February 2017, section “Complexes with J proteins, p.182-182). Furthermore, a detailed structural characterization of the Hsp70/Hsp40 complex is critical both for better understanding interactions with other chaperone families, such as Hsp90s (Kravats et al.,2016, 429 (6), p. 858 J Mol Biol) and for designing allosteric inhibitors (Li et al., 2016, 16 (25), p. 2729, Curr Top Med Chem). Here, we propose a novel structural model for the transient DnaK/DnaJ complex that is i) in excellent agreement with co-evolutionary data, and thus expected to be functionally-relevant and ii) compatible with the main experimental observations available in the literature at variance with previous models (Ahmad et al. 2011). We acknowledge that these points have probably not been satisfactorily explained in the manuscript and we modified the revised version to address this issue.

Importantly, we were inspired by the insightful comments of the reviewers and we thus extended our work to investigate the involvement of SBD in J-protein binding. Moreover, we took advantage of our findings to propose a mechanistic hypothesis about the role of J-protein binding in Hsp70 functional cycle. Particularly, we calculated the energetic contribution of individual residues to the binding interface for elucidating the structural determinants of the dynamical DnaK/JD interactions. This analysis assessed the importance of the interdomain linker and neighbouring NBD residues for effective JD binding and also unveiled a potential role of SBD in the stabilization of the complex. The relevance of SBD contribution was then further confirmed by an additional co-evolutionary analysis taking into account full-length Hsp70 sequences. Combining our findings and recent results on the allosteric signal transmission in DnaK, we then formulated an hypothesis about the molecular mechanism at the basis of JD-induced stimulation of DnaK ATPase activity. Indeed, our structural, coevolutionary and energetic analyses suggest a scenario where JD binding shifts the equilibrium of ATP-bound DnaK, which dynamically populates multiple conformational states, toward “allosterically active” conformers with the highest ATPase activity.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) In previous literature (namely Feinauer et al., 2016; Gueudré et al., 2016), an empirical coevolutionary score for protein pairs was introduced: starting from the Average Product Corrected (APC) inter-protein residue pair score (i.e. the restriction of the APC coupling score to all pair of residues i, j for which i belongs to one protein and j to the second one), and consider the mean over the 4 largest. It would be extremely interesting to show this score for the random matching strategy and also for PPM and IPA to compare the "strength" the DnaJ/DnaK coupling in comparison with other known protein pairs presented in literature.

We now reported in the manuscript the inter-protein scores introduced in Feinauer et al., 2016 and Gueudré et al., 2016 obtained for Hsp40-Hsp70 using the random matching, IPA and PPM. Nevertheless, we note that as the random matching results in an ensemble of MSAs, we could only report the empirical score averaged over the 1000 random realizations.

2) While it becomes clear from the manuscript and from the authors reply that the central interest is in the Hsp70/Hsp40 system and not in the development of a general-purpose methodology, two results reported in the rebuttal letter but not in the manuscript or its supplement should be reported in the supplement, with a short reference from the new last paragraph of the subsection “3. Random Paralog matching”:

The random matching procedure is successful even if applied to the two component system used in Bitbol et al., 2016 and Gueudré et al., 2016.

The matching procedure of Bitbol et al., 2016 and Gueudré et al., 2016 produces strongly overlapping results with the random-matching procedure.

As suggested by the reviewers, we included and discussed the comparison of the results obtained with IPA, PPM and random matching approaches for the Hsp70/Hsp40 families.

As discussed with the editorial board, we decided not to discuss in the main manuscript the positive results obtained on the unrelated HK/RR dataset.

3) A minor remark concerns the results of the 3701 always matched protein pairs, where both sequences are single copy in their genomes. It is interesting that this case recovers part of the strongest signal, but the sampling of random matchings is able to enhance the coevolutionary signal beyond the one found for the uniquely matched pairs. Again, this is a nice detail, it might be introduced at the beginning of the Methods section on random matching, or at the end of Sec. II.B. “Coevolutionary Analysis predicts conserved DnaK-DnaJ contacts”.

We discussed the results on the separate dataset containing only the 3701 single-copy pairs in the random matching sub-section of the Methods section.

4) HMMer should be cited in the first paragraph of the subsection “1. Sequence Extraction and Preprocessing”.

We apology for our oversight of the citation of the HMMER software suite and added the citation to the revised manuscript.

https://doi.org/10.7554/eLife.23471.040

Article and author information

Author details

  1. Duccio Malinverni

    Laboratoire de Biophysique Statistique, Faculté de Sciences de Base, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
    Contribution
    DM, Conceptualization, Formal analysis, Investigation, Writing—original draft, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  2. Alfredo Jost Lopez

    Max Planck Institute of Biophysics, Frankfurt am Main, Germany
    Contribution
    AJL, Formal analysis, Investigation, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  3. Paolo De Los Rios

    1. Laboratoire de Biophysique Statistique, Faculté de Sciences de Base, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
    2. Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
    Contribution
    PDLR, Conceptualization, Resources, Funding acquisition, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0002-5394-5062
  4. Gerhard Hummer

    1. Max Planck Institute of Biophysics, Frankfurt am Main, Germany
    2. Institut für Biophysik, Johann Wolfgang Goethe Universität Frankfurt, Frankfurt am Main, Germany
    Contribution
    GH, Conceptualization, Resources, Formal analysis, Funding acquisition, Writing—review and editing
    Competing interests
    The authors declare that no competing interests exist.
  5. Alessandro Barducci

    1. Inserm, U1054, Montpellier, France
    2. Université de Montpellier, CNRS, UMR 5048, Centre de Biochimie Structurale, Montpellier, France
    Contribution
    AB, Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Writing—original draft, Writing—review and editing
    For correspondence
    alessandro.barducci@cbs.cnrs.fr
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0002-1911-8039

Funding

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (2012_149278)

  • Duccio Malinverni
  • Paolo De Los Rios

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (20020_163042/1)

  • Duccio Malinverni
  • Paolo De Los Rios

Max-Planck-Gesellschaft

  • Alfredo Jost Lopez
  • Gerhard Hummer

Agence Nationale de la Recherche (ANR-14-ACHN-0016)

  • Alessandro Barducci

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

DM thanks the Swiss National Science Foundation (http://www.snf.ch/) for grants 2012_149278 and 20020_163042/1. AJL and GH were supported by the Max Planck Society. AB acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-14-ACHN-0016. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project s684.

Reviewing Editor

  1. Axel T Brunger, Reviewing Editor, Stanford University Medical Center, United States

Publication history

  1. Received: November 22, 2016
  2. Accepted: May 10, 2017
  3. Accepted Manuscript published: May 12, 2017 (version 1)
  4. Version of Record published: July 20, 2017 (version 2)

Copyright

© 2017, Malinverni et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 863
    Page views
  • 311
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Cell Biology
    2. Developmental Biology and Stem Cells
    Hirohito Shimizu et al.
    Research Article