Structural Biology and Molecular Biophysics

A parametrized two-domain thermodynamic model explains diverse mutational effects on protein allostery

Zhuang Liu
Thomas Gillis
Srivatsan Raman
Qiang Cui

Department of Physics, Boston University, Boston, United States
Department of Biochemistry, University of Wisconsin, Madison, United States
Department of Chemistry, University of Wisconsin, Madison, United States
Department of Bacteriology, University of Wisconsin, Madison, United States
Department of Chemistry, Boston University, Boston, United States

https://doi.org/10.7554/eLife.92262.2

Open access

Figures and data

Schematic illustration of the two-domain statistical thermodynamic model of TetR allostery. (A) The crystal structure of TetR(B) in complex with minocycline and magnesium (PDB code: 4AC0). The red residues are the hotspots identified in the deep mutational scanning (DMS) study of Leander et al. (2020). (B) Four possible conformations of a two-domain TetR molecule with their corresponding free energies (G). G of the L_I D_I state is set to 0. Blue/green circle (square) denotes the inactive (active) state of LBD/DBD. (C) A simple repression scheme of TetR function. Binding of the ligand (inducer) favors the inactive state of DBD in TetR, which then releases the DNA operator and enables the transcription of the downstream gene. (D) Schematic free energy diagram of the possible binding states of TetR at fixed ligand and operator concentrations. Red, orange and purple arrows show how a mutation can disrupt allostery by (1) increasing ɛ_L; (2) decreasing ɛ_D; and (3) decreasing γ. The L-L_A D_A state is not explicitly shown in the last column as the doubly-bound L-L_A D_A-D state is expected to have a lower free energy. Note that mutations that change the binding affinities of the active LBD/DBD to ligand/operator are not discussed here as we focus on the intrinsic allosteric properties of the TF itself.
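As a rough numerical illustration of panel (B), the four conformations can be assigned free energies and converted to Boltzmann populations. This is a minimal sketch, not the authors' implementation. It assumes, since the caption does not spell it out, that G(L_A D_I) = ɛ_L, G(L_I D_A) = ɛ_D and G(L_A D_A) = ɛ_L + ɛ_D + γ, with all energies in units of k_B T and γ acting as a coupling penalty when both domains are active. The function name and the example values are hypothetical.

```python
import math

def state_populations(eps_L, eps_D, gamma):
    """Boltzmann populations of the four conformations, with no ligand or operator bound.

    Assumed free energies in units of k_B T (not given in the caption):
      L_I D_I = 0, L_A D_I = eps_L, L_I D_A = eps_D,
      L_A D_A = eps_L + eps_D + gamma.
    """
    free_energies = {
        "L_I D_I": 0.0,
        "L_A D_I": eps_L,
        "L_I D_A": eps_D,
        "L_A D_A": eps_L + eps_D + gamma,
    }
    # Statistical weight of each state is exp(-G / k_B T).
    weights = {state: math.exp(-g) for state, g in free_energies.items()}
    partition = sum(weights.values())
    return {state: w / partition for state, w in weights.items()}

# Hypothetical parameter values, for illustration only.
coupled = state_populations(eps_L=2.0, eps_D=-1.0, gamma=3.0)
uncoupled = state_populations(eps_L=2.0, eps_D=-1.0, gamma=0.0)
```

Under these assumptions, setting γ to 0 makes the two domains statistically independent, which is one way to read the loss of allostery indicated by the purple arrow in panel (D).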

Distances to the DNA operator and ligand of the 21 residues under mutational study.

Bayesian inference results with different prior distributions of γ.

Schematic illustration of the characteristic effects of perturbations in the three biophysical parameters on the free energy landscape and the corresponding induction curves of a two-domain allosteric system. Panels A, B and C illustrate how changing ɛ_L, ɛ_D, and γ alone affects the free energy landscape for the binding states shown in Figure 1D and the induction curve. For the black induction curve in (C), the values of ɛ_L and ɛ_D are also adjusted to aid visualization of the negative monotonicity of the gene expression level (fold change) as a function of ligand concentration. The green shade in the middle column separates the DNA-bound states from the rest. In the free energy landscapes shown in the middle column, ligand- or DNA-binding is always assumed when the corresponding domain is in the active conformation.
Figure 2—figure supplement 1. Statistical weights of promoter occupancy states and repressor states.
Figure 2—figure supplement 2. Equilibria among different conformational and binding states of the repressor.
Figure 2—figure supplement 3. Extended parametric study of main text Equation 1.

Induction data of 15 TetR mutants and the corresponding parameter estimation results. (A) Shaded blue curves in each plot show the percentiles of the simulated fold-change measurements using the inferred posterior parameters of the mutant. The white data points represent the corresponding experimental induction measurements of four or more biological replicates (three replicates for C203V and G102D-HQQ). (B) The inferred parameter values of the 15 mutants. The error bars of ɛ_L and γ represent the 95th percentile of the Bayesian posterior samples, while the error bar of ɛ_D is calculated based on the standard error of the mean (SEM) of the corresponding leakiness measurement. The horizontal lines indicate the WT parameter values for reference.
Figure 3—figure supplement 1. Sequence and structural distributions of the 21 residues chosen for the mutation analyses in this work.
Figure 3—figure supplement 2. Prior probability distributions and prior predictive check.
Figure 3—figure supplement 3. Distributions of the prior predictive parameters and the corresponding posterior distributions.
Figure 3—figure supplement 4. Distributions of rank statistics of the prior predictive parameters relative to the corresponding posterior samples.
Figure 3—figure supplement 5. Sensitivity analysis for model parameter inference.
Figure 3—figure supplement 6. Posterior predictive check of mutant G102D-Y42M-I57N.
Figure 3—figure supplement 7. Theoretical induction curves of R49G, D53H, P105M and G143M with WT γ value.
Figure 3—figure supplement 8. Sorting scheme to identify dead variants.
Figure 3—figure supplement 9. Distances to the DNA operator and ligand of the 21 residues under mutational study.
Figure 3—figure supplement 10. Bayesian inference results with different prior distributions of γ.

The induction curves for the eight combined TetR mutants from experimental measurement and prediction from the modified additive model. In each plot, black, orange and white points represent the experimental data for mutant 1, mutant 2 and the combined mutant (named mutant 1-mutant 2), specified in the legend and title. The blue band shows the 95th percentile of the induction curve prediction from the modified additive model. The modification to the basic additive model in each plot is specified by the six weights {α_1,ɛ_D, α_2,ɛ_D, α_1,ɛ_L, α_2,ɛ_L, α_1,γ, α_2,γ} (see Equation 4), which are (A) {1, 1, 1, 1, 1, 1}; (B) {1, 1, 0.5, 1, 1, 1}; (C) {1, 1, 0.5, 1, 1, 1}; (D) {1, 1, 0, 1, 1, 1}; (E) {0, 1, 1, 1, 0, 1}; (F) {0, 1, 1, 1, 0, 1}; (G) {0, 1, 1, 1, 1, 1}; (H) {0, 1, 1, 1, 1, 1}.
Figure 4—figure supplement 1. The induction curves of the eight combined mutants calculated using the basic additive model.
Figure 4—figure supplement 2. Induction curves of the eight combined mutants and the corresponding parameter estimation results as well as the basic additive model predictions.
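The six weights listed in the Figure 4 caption can be read as scale factors on each single mutant's parameter shift relative to WT. The sketch below is one plausible reading, not a reproduction of Equation 4, which does not appear in this section. It assumes the form p_combined = p_WT + α_1,p (p_mut1 − p_WT) + α_2,p (p_mut2 − p_WT) for p in {ɛ_D, ɛ_L, γ}. All numerical values and function names are hypothetical.

```python
PARAMS = ("eps_D", "eps_L", "gamma")

def unpack_weights(six):
    """Split six weights, ordered as in the caption, into per-mutant dicts.

    Caption order: (a1_epsD, a2_epsD, a1_epsL, a2_epsL, a1_gamma, a2_gamma).
    """
    alpha1 = dict(zip(PARAMS, six[0::2]))
    alpha2 = dict(zip(PARAMS, six[1::2]))
    return alpha1, alpha2

def combine(wt, mut1, mut2, six_weights=(1, 1, 1, 1, 1, 1)):
    """Predict combined-mutant parameters under the assumed weighted additive form."""
    alpha1, alpha2 = unpack_weights(six_weights)
    return {
        p: wt[p] + alpha1[p] * (mut1[p] - wt[p]) + alpha2[p] * (mut2[p] - wt[p])
        for p in PARAMS
    }

# Hypothetical parameter values in units of k_B T, for illustration only.
wt = {"eps_D": -4.0, "eps_L": 2.0, "gamma": 5.0}
m1 = {"eps_D": -3.0, "eps_L": 4.0, "gamma": 4.0}
m2 = {"eps_D": -4.5, "eps_L": 3.0, "gamma": 2.0}

basic = combine(wt, m1, m2)                         # all weights 1, as in panel (A)
modified = combine(wt, m1, m2, (1, 1, 0, 1, 1, 1))  # weight pattern of panel (D)
```

With all weights equal to 1 this reduces to the basic additive model. A weight of 0 drops that mutant's contribution to the corresponding parameter entirely.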

Distribution of the 23 investigated TetR mutants in the parameter space of the two-domain model illustrates the correlation between perturbations in ɛ_L and γ. Color of the contour plot encodes the fold change at c = 1000 nM calculated for each point in the two-dimensional space of ɛ_L and γ with the WT ɛ_D value. The color within the data point of each mutant is based on the FC¹⁰⁰⁰ calculated with its specific ɛ_D value. The notation of each mutant is abbreviated based on the 1-letter codes of the residues that are mutated. The specific mutations corresponding to each letter code from upper left to lower right are C: C203V; Y: Y132A; GL: G102D-L146A; GT: G102D-T26A; R: R49G; D: D53H; GK: G102D-K98Q; E: E150Y; QE: Q32A-E147G; G: G143M; P: P105M; GYI: G102D-Y42M-I57N; GHQQ: G102D-H44F-Q47S-Q76K; PIF: P176N-I174K-F177S. The least-squares regression line between ɛ_L and γ is shown with their Pearson correlation coefficient (ρ).

Statistical weights of promoter occupancy states and repressor states. (A) Statistical weights of the promoter occupancy states, with the empty promoter state taken as reference. P is the average number of RNAP per cell. R₁, R₂, R₃ and R₄ denote the average number of repressors in the L_I D_I, L_A D_I, L_I D_A and L_A D_A state per cell, respectively. N_Ns is the number of non-specific DNA binding sites in the cell. Δɛ_P, Δɛ_RA and Δɛ_RI represent the energy differences between specific and non-specific DNA binding of RNAP, repressor with DBD in the active and inactive conformations, respectively. (B) Statistical weights of the allosteric states of the repressor, with the L_I D_I state taken as reference. K_A and K_I are the dissociation constants of ligand to the repressor with LBD in the active and inactive conformations, respectively, and c is ligand concentration. Partial binding of ligand is ignored in the symmetric model. All energy terms in the exponents are evaluated in the unit of k_B T.

Equilibria among different conformational and binding states of the repressor. Here, L and O represent ligand and operator, respectively. ɛ_R0^A and ɛ_R0^I are the free energies of operator binding for repressor with DBD in the active and inactive conformations, respectively. R₁, R₂, R₃ and R₄ denote the L_I D_I, L_A D_I, L_I D_A and L_A D_A states of the repressor, respectively. All energy terms in the exponents are evaluated in units of k_B T.

Extended parametric study of main text Equation 1. The four colored curves show that a flat induction curve can result from tuning each one of the three main biophysical parameters of the two-domain model alone (by decreasing ɛ_D or increasing ɛ_L from the WT value, or by setting γ to 0). The white points show the experimental induction data for the mutant G102D (measurements of 4 biological replicates at each ligand concentration). Note that the colored curves in the figure are generated for better visualization of the model parameters’ effect on the induction curve and not to be compared with the experimental data of G102D. The leakiness of G102D (0.0173) is higher than that of the WT (0.0086) determined in experiments.

Sequence and structural distributions of the 21 residues chosen for the mutation analyses in this work. The upper panel shows the sequence of TetR (residue 2-203), where the 21 residues chosen for mutation analyses in this work are colored red, orange, green or blue (while the other residues are colored grey). The lower panel shows the crystal structure of TetR(B) in complex with minocycline and magnesium (PDB code: 4AC0). Here, the two identical monomers of TetR are colored white and grey respectively, while the residues chosen for mutation analyses are colored in the same way as in the sequence above in the white monomer. Specifically, the 5 red residues from top to bottom are C203, Y132, I57, R49 and T26. The 4 orange residues from left to right are H44, P105, G102 and L146. The 5 blue residues from top to bottom are Q76, G143, E147, Q47 and Q32. The 7 green residues from left to right are Y42, D53, K98, E150, F177, P176 and I174. Some of the 21 residues are presented in the stick format to aid visualization while all other residues are presented in the cartoon format.

Prior probability distributions and prior predictive check. (A) Density functions of the prior distributions of leakiness, σ, ɛ_L and γ. The black dots above the prior distributions show the 1000 prior predictive draws of the corresponding parameter. (B) Percentiles of the simulated fold changes using the 1000 sets of parameters shown in (A).

Probability distributions of ɛ_L, γ and σ values in the 1000 sets of prior predictive draws (ground truth) and the average of the corresponding 1000 sets of inferred posterior distributions of the parameters (inferred).

Distributions of rank statistics of the prior predictive draws relative to the corresponding posterior samples. (A) Histograms (20 bins); (B) ECDF plots; (C) ECDF difference plots of rank statistics of ɛ_L, γ and σ. The green bands in (A)-(C) show the 99th percentile expected from a true uniform distribution.
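The rank statistic used in this kind of simulation-based check is simple to compute: for each prior predictive draw, count how many posterior samples fall below the true value. The sketch below is a generic illustration with hypothetical function names, not the authors' code. If inference is well calibrated, the ranks should be approximately uniform over 0 to N for N posterior samples, which is what the histograms and ECDF plots assess.

```python
def rank_statistic(true_value, posterior_samples):
    """Number of posterior samples strictly below the ground-truth draw."""
    return sum(1 for s in posterior_samples if s < true_value)

def rank_histogram(ranks, n_samples, n_bins=20):
    """Bin ranks (each in 0..n_samples) into n_bins equal-width bins."""
    counts = [0] * n_bins
    for r in ranks:
        # Ranks take n_samples + 1 possible values; map them onto the bins.
        b = min(r * n_bins // (n_samples + 1), n_bins - 1)
        counts[b] += 1
    return counts

# Toy example with three posterior samples per draw (illustration only).
example_rank = rank_statistic(0.5, [0.1, 0.2, 0.9])
example_hist = rank_histogram([0, 3], n_samples=3, n_bins=4)
```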

Sensitivity analysis for model parameter inference. Posterior z-score and posterior contraction of the inferred posterior distribution for each of the 1000 prior predictive draws of parameters and the corresponding simulated data except: (A) parameters of flat induction curves (see Supplementary file section 3 and Eq. (32)); (B) parameters of flat induction curves or μ(c=1000 nM)>0.97 (see Supplementary file section 3 and Eq. (34)). Accordingly, there are 970 and 523 data points for each parameter in (A) and (B), respectively.
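For readers unfamiliar with the two diagnostics, the sketch below uses their commonly cited definitions, which are assumed here rather than taken from the Supplementary file: the posterior z-score is (posterior mean − true value) / posterior standard deviation, and the posterior contraction is 1 − posterior variance / prior variance. Function names and numbers are hypothetical.

```python
import statistics

def posterior_z_score(posterior_samples, true_value):
    """How many posterior standard deviations the posterior mean lies from the truth."""
    mean = statistics.mean(posterior_samples)
    sd = statistics.stdev(posterior_samples)
    return (mean - true_value) / sd

def posterior_contraction(posterior_samples, prior_variance):
    """Fraction of prior variance removed by the data (values near 1 mean informative data)."""
    return 1.0 - statistics.variance(posterior_samples) / prior_variance

# Toy numbers for illustration only.
samples = [1.0, 2.0, 3.0]
z = posterior_z_score(samples, true_value=2.0)
contraction = posterior_contraction(samples, prior_variance=4.0)
```

A z-score near 0 together with a contraction near 1 indicates that the parameter is both accurately and precisely recovered from the simulated data.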

Posterior predictive check of mutant G102D-Y42M-I57N. (A) 1000 sets of posterior samples of {ɛ_L, γ, σ}. The scatter plots show the joint distributions of the parameters, colored by the log probability of the parameter combinations. Blue/red corresponds to low/high probability. The histograms show the marginal distributions of the corresponding individual parameters. (B) Percentiles of the simulated fold change measurements using the 1000 sets of posterior samples based on Supplementary file Eq. (24). The white data points show the experimental induction data of four biological replicates.

Theoretical induction curves of the four dead mutants when their γ values are set to the WT value while using their respective ɛ_D and ɛ_L values (taken from Figure 3B in the main text). The theoretical induction curve is calculated with main text Equation 1.

Sorting scheme to identify dead variants. Each panel is a tiled segment of single-site saturation mutants of defined length along the TetR gene. The residue numbers spanned by each segment are shown above the panel. The gray distribution denotes the uninduced population of cells containing mutants. The dark green distribution on the same panel denotes the population of cells responding to anhydrotetracycline. The population of mutant-containing cells that remained uninduced when the inducer was added was sorted and reflowed; it is shown outside the main panel as the light green distribution.

The induction curves of the eight combined mutants calculated using the basic additive model (i.e., with α_1,p = α_2,p = 1 in main text Equation 4). In each plot, the white data points show the experimental induction data of the combined mutants, while the black and orange ones show data of the individual mutants. The blue band shows the 95th percentile of the induction curve prediction from the additive model.

Induction curves of the eight combined mutants and the corresponding parameter estimation results as well as the basic additive model predictions. (A) Percentiles of the simulated fold change measurements using the inferred posterior parameters of each mutant based on Supplementary file Eq. (24). The white data points show the experimental induction data of four biological replicates (three replicates for C203V-PIF). (B) The inferred parameter values of WT and the 8 combined mutants. Error bars of ɛ_L and γ represent the upper and lower bounds of the 95 percent credible region. For each combined mutant, the parameters shown by the lighter and darker bar plots are the results from the basic additive model and direct fitting, respectively. The horizontal lines show the WT parameter values for reference.
