A parameterized two-domain thermodynamic model explains diverse mutational effects on protein allostery

  1. Zhuang Liu
  2. Thomas G Gillis
  3. Srivatsan Raman
  4. Qiang Cui (corresponding author)
  1. Department of Physics, Boston University, United States
  2. Department of Biochemistry, University of Wisconsin, United States
  3. Department of Chemistry, University of Wisconsin, United States
  4. Department of Bacteriology, University of Wisconsin, United States
  5. Department of Chemistry, Boston University, United States
5 figures, 2 tables and 1 additional file

Figures

Figure 1
Schematic illustration of the two-domain statistical thermodynamic model of TetR allostery.

(A) The crystal structure of TetR(B) in complex with minocycline and magnesium (PDB code: 4AC0). Red residues are the hotspots identified in the deep mutational scanning (DMS) study (Leander et al., 2020). (B) Four possible conformations of a two-domain TetR molecule and their corresponding free energies (G); G of the LIDI state is set to 0. Blue/green circles (squares) denote the inactive (active) state of the ligand-binding/DNA-binding domain (LBD/DBD). (C) A simple repression scheme of TetR function. Binding of the ligand (inducer) favors the inactive state of the DBD, so TetR releases the DNA operator and the downstream gene is transcribed. (D) Schematic free energy diagram of the possible binding states of TetR at fixed ligand and operator concentrations. Red, orange, and purple arrows show how a mutation can disrupt allostery by (1) increasing εL, (2) decreasing εD, or (3) decreasing γ. The L-LADA state is not explicitly shown in the last column because the doubly bound L-LADA-D state is expected to have a lower free energy. Mutations that change the binding affinities of the active LBD/DBD for ligand/operator are not discussed here, as we focus on the intrinsic allosteric properties of the transcription factor (TF) itself.
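Panel B assigns each of the four conformations a free energy G, with G(LIDI) = 0 as the reference. Given those free energies in units of kBT, the equilibrium population of each conformation follows from Boltzmann weighting. The sketch below assumes the four G values are supplied directly; how each G depends on εL, εD, and γ is defined in the main text and is not reproduced here, so the function name and the numbers in the usage lines are hypothetical.

```python
import math


def conformation_populations(free_energies):
    """Equilibrium populations of the four two-domain conformations.

    free_energies: dict mapping a state label ("LIDI", "LADI", "LIDA", "LADA")
        to its free energy G in units of kBT, with G(LIDI) = 0 as reference.
    Returns a dict with the same keys whose values sum to 1.
    """
    if free_energies.get("LIDI") != 0.0:
        raise ValueError("G of the LIDI state is the reference and must be 0")
    # Boltzmann weight of each conformation; G is already expressed in kBT.
    weights = {s: math.exp(-g) for s, g in free_energies.items()}
    z = sum(weights.values())  # partition function over the four conformations
    return {s: w / z for s, w in weights.items()}


# Hypothetical free energies (kBT), chosen only to illustrate the calculation.
example = {"LIDI": 0.0, "LADI": 2.0, "LIDA": -1.0, "LADA": 3.0}
pops = conformation_populations(example)
for state, p in pops.items():
    print(f"{state}: {p:.3f}")
```

Lower free energy means higher population, so in this made-up example the LIDA conformation dominates.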

Figure 2 with 3 supplements
Schematic illustration of the characteristic effects of perturbations in the three biophysical parameters on the free energy landscape and the corresponding induction curves of a two-domain allosteric system.

Panels A, B, and C illustrate how changing εL, εD, and γ alone affects the free energy landscape for the binding states shown in Figure 1D and the induction curve. For the black induction curve in (C), the values of εL and εD are also adjusted to aid visualization of the negative monotonicity of the gene expression level (fold change) as a function of ligand concentration. The green shade in the middle column separates the DNA-bound states from the rest. In the free energy landscapes shown in the middle column, ligand or DNA binding is always assumed when the corresponding domain is in the active conformation.

Figure 2—figure supplement 1
Statistical weights of promoter occupancy states and repressor states.

(A) Statistical weights of the promoter occupancy states, with the empty promoter state taken as reference. P is the average number of RNA polymerase (RNAP) molecules per cell. R1, R2, R3, and R4 denote the average numbers of repressors per cell in the LIDI, LADI, LIDA, and LADA states, respectively. NNS is the number of non-specific DNA-binding sites in the cell. ΔεP, ΔεRA, and ΔεRI are the energy differences between specific and non-specific DNA binding for RNAP and for the repressor with its DNA-binding domain (DBD) in the active and inactive conformations, respectively. (B) Statistical weights of the allosteric states of the repressor, with the LIDI state taken as reference. KA and KI are the dissociation constants of the ligand for the repressor with its ligand-binding domain (LBD) in the active and inactive conformations, respectively, and c is the ligand concentration. Partial binding of ligand is ignored in the symmetric model. All energy terms in the exponents are in units of kBT.
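The promoter weights in (A) can be turned into a fold change. This is a minimal sketch, assuming (as in standard thermodynamic models of simple repression) that the promoter is empty, RNAP-bound, or repressor-bound, that expression is proportional to the probability of RNAP occupancy, and that fold change is expression with repressor divided by expression without it. Repressors in the LIDI and LADI states (R1, R2) bind with ΔεRI and those in the LIDA and LADA states (R3, R4) with ΔεRA. The function name and all numerical inputs are hypothetical.

```python
import math


def fold_change(P, repressors, n_ns, d_eps_p, d_eps_ra, d_eps_ri):
    """Fold change from the promoter-occupancy statistical weights.

    P: average number of RNAP per cell.
    repressors: (R1, R2, R3, R4), average repressors per cell in the
        LIDI, LADI, LIDA, LADA states; the DBD is inactive in R1 and R2
        and active in R3 and R4.
    n_ns: number of non-specific DNA-binding sites.
    d_eps_p, d_eps_ra, d_eps_ri: specific minus non-specific binding
        energies (kBT) of RNAP and of repressor with active/inactive DBD.
    """
    r1, r2, r3, r4 = repressors
    # Statistical weights relative to the empty promoter (weight 1).
    w_rnap = (P / n_ns) * math.exp(-d_eps_p)
    w_rep = ((r1 + r2) / n_ns) * math.exp(-d_eps_ri) + ((r3 + r4) / n_ns) * math.exp(-d_eps_ra)
    # Probability that RNAP occupies the promoter, with and without repressor.
    p_bound = w_rnap / (1.0 + w_rnap + w_rep)
    p_bound_no_rep = w_rnap / (1.0 + w_rnap)
    return p_bound / p_bound_no_rep


# Hypothetical inputs, for illustration only.
fc = fold_change(P=1000, repressors=(5, 0, 40, 5), n_ns=4.6e6,
                 d_eps_p=-5.0, d_eps_ra=-15.0, d_eps_ri=0.0)
print(f"fold change = {fc:.4f}")
```

Shifting repressors from the DBD-active states (R3, R4) into the DBD-inactive states (R1, R2), which is what ligand binding does in this model, raises the fold change whenever ΔεRA is more favorable than ΔεRI.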

Figure 2—figure supplement 2
Equilibria among different conformational and binding states of the repressor.

Here, L and O represent the ligand and the operator, respectively. εRAO and εRIO are the free energies of operator binding for the repressor with its DNA-binding domain (DBD) in the active and inactive conformations, respectively. R1, R2, R3, and R4 denote the LIDI, LADI, LIDA, and LADA states of the repressor, respectively. All energy terms in the exponents are in units of kBT.

Figure 2—figure supplement 3
Extended parametric study of main text Equation 1.

The four colored curves show that a flat induction curve can result from tuning any one of the three main biophysical parameters of the two-domain model alone (decreasing εD or increasing εL from the wildtype (WT) value, or setting γ to 0). The white points show the experimental induction data for the mutant G102D (four biological replicates at each ligand concentration). The colored curves are generated to visualize the effect of each model parameter on the induction curve and are not meant to be compared with the G102D data. The experimentally determined leakiness of G102D (0.0173) is higher than that of the WT (0.0086).

Figure 3 with 8 supplements
Induction data of 15 TetR mutants and the corresponding parameter estimation results.

(A) Shaded blue curves in each plot show the percentiles of the simulated fold change measurements using the inferred posterior parameters of the mutant. White data points represent the corresponding experimental induction measurements from four or more biological replicates (three replicates for C203V and G102D-HQQ). (B) Inferred parameter values of the 15 mutants. Error bars of εL and γ span the 95% credible interval of the Bayesian posterior samples, while the error bar of εD is calculated from the standard error of the mean (SEM) of the corresponding leakiness measurement. Horizontal lines indicate the wildtype (WT) parameter values for reference.

Figure 3—figure supplement 1
Sequence and structural distributions of the 21 residues chosen for the mutation analyses in this work.

The upper panel shows the sequence of TetR (residues 2–203), with the 21 residues chosen for mutation analyses in this work colored red, orange, green, or blue and all other residues colored gray. The lower panel shows the crystal structure of TetR(B) in complex with minocycline and magnesium (PDB code: 4AC0). The two identical TetR monomers are colored white and gray, respectively, and the selected residues are colored in the white monomer using the same scheme as in the sequence. Specifically, the 5 red residues from top to bottom are C203, Y132, I57, R49, and T26. The 4 orange residues from left to right are H44, P105, G102, and L146. The 5 blue residues from top to bottom are Q76, G143, E147, Q47, and Q32. The 7 green residues from left to right are Y42, D53, K98, E150, F177, P176, and I174. Some of the 21 residues are shown as sticks for clarity; all other residues are shown in cartoon representation.

Figure 3—figure supplement 2
Prior probability distributions and prior predictive check.

(A) Density functions of the prior distributions of leakiness, σ, εL, and γ. The black dots above the prior distributions show the 1000 prior predictive draws of the corresponding parameter. (B) Percentiles of the simulated fold changes using the 1000 sets of parameters shown in (A).

Figure 3—figure supplement 3
Probability distributions of εL, γ, and σ values in the 1000 sets of prior predictive draws (ground truth) and the average of the corresponding 1000 sets of inferred posterior distributions of the parameters (inferred).

Figure 3—figure supplement 4
Distributions of rank statistics of the prior predictive draws relative to the corresponding posterior samples.

(A) Histograms (20 bins), (B) empirical cumulative distribution function (ECDF) plots, and (C) ECDF difference plots of the rank statistics of εL, γ, and σ. The green bands in (A)–(C) show the 99% interval expected under a truly uniform distribution.
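The rank statistic behind these plots can be sketched as in simulation-based calibration: for each prior predictive draw, count how many of the matching posterior samples fall below the ground-truth value, then check that the ranks are uniform. The caption does not give the exact rank convention, so the strict "less than" count, the function names, and the toy inputs below are assumptions.

```python
import numpy as np


def sbc_ranks(truths, posterior_draws):
    """Rank of each ground-truth (prior predictive) draw among its posterior samples.

    truths: shape (n_sims,), one ground-truth value of a parameter per simulation.
    posterior_draws: shape (n_sims, n_samples), posterior samples obtained by
        fitting the data simulated from the matching ground truth.
    Returns integer ranks between 0 and n_samples (inclusive).
    """
    truths = np.asarray(truths, dtype=float)[:, None]
    draws = np.asarray(posterior_draws, dtype=float)
    return np.sum(draws < truths, axis=1)


def rank_histogram(ranks, n_samples, n_bins=20):
    """Bin counts of the ranks; roughly flat counts indicate good calibration."""
    counts, _ = np.histogram(ranks, bins=n_bins, range=(0, n_samples + 1))
    return counts


# Toy check: if the "posterior" equals the prior (uninformative data),
# the ranks are uniform by construction.
rng = np.random.default_rng(1)
n_sims, n_samples = 1000, 99
truths = rng.normal(size=n_sims)
draws = rng.normal(size=(n_sims, n_samples))
ranks = sbc_ranks(truths, draws)
print(rank_histogram(ranks, n_samples))
```

With 99 posterior samples there are 100 possible ranks, so each of the 20 bins covers exactly 5 rank values.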

Figure 3—figure supplement 5
Sensitivity analysis for model parameter inference.

Posterior z-score and posterior contraction of the inferred posterior distribution for each of the 1000 prior predictive draws of parameters and the corresponding simulated data, excluding (A) draws that give flat induction curves (see Appendix 1 section ‘Model parameter estimation’ and Equation 37) and (B) draws that give flat induction curves or μ(c = 1000 nM) > 0.97 (see Appendix 1 ‘Model parameter estimation’ and Equation 39). Accordingly, (A) and (B) contain 970 and 523 data points per parameter, respectively.
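The two sensitivity metrics are commonly defined as the posterior z-score, (posterior mean − ground truth) / posterior standard deviation, and the posterior contraction, 1 − posterior variance / prior variance. The caption does not spell out its definitions, so this is a sketch under those common conventions (sign conventions for the z-score vary); the function names and sample values are hypothetical, and the prior standard deviation of 2.5 kBT is borrowed from the γ prior listed in Table 2 only as an example.

```python
import numpy as np


def posterior_zscore(truth, posterior_samples):
    """(posterior mean - ground truth) / posterior standard deviation."""
    post = np.asarray(posterior_samples, dtype=float)
    return (post.mean() - truth) / post.std(ddof=1)


def posterior_contraction(posterior_samples, prior_sd):
    """1 - posterior variance / prior variance; near 1 means the data are informative."""
    post = np.asarray(posterior_samples, dtype=float)
    return 1.0 - post.var(ddof=1) / prior_sd**2


samples = [4.0, 5.0, 6.0]  # hypothetical posterior draws of gamma (kBT)
print(posterior_zscore(4.5, samples))        # 0.5
print(posterior_contraction(samples, 2.5))   # prior sd of 2.5 kBT
```

A well-identified parameter shows z-scores clustered near 0 and contraction close to 1.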

Figure 3—figure supplement 6
Posterior predictive check of mutant G102D-Y42M-I57N.

(A) 1000 sets of posterior samples of {εL, γ, σ}. The scatter plots show the joint distributions of the parameters, colored by the log probability of each parameter combination (blue, low; red, high). The histograms show the marginal distributions of the individual parameters. (B) Percentiles of the simulated fold change measurements using the 1000 sets of posterior samples, based on Appendix 1 Equation 29. White data points show the experimental induction data of four biological replicates.

Figure 3—figure supplement 7
Theoretical induction curves of the four dead mutants when their γ values are set to the wildtype (WT) value while using their respective εD and εL values (taken from Figure 3B in the main text).

The theoretical induction curve is calculated with main text Equation 1.

Figure 3—figure supplement 8
Sorting scheme to identify dead variants.

Each panel is a tiled segment of defined length along the TetR gene containing single-site saturation mutants; the residue numbers spanned by each segment are shown above the panel. The gray distribution denotes the uninduced population of cells containing mutants. The dark green distribution in the same panel denotes the population of cells responding to anhydrotetracycline. The population of mutant-containing cells that remained uninduced when anhydrotetracycline was added was sorted and reflowed; it is shown outside the main panel as the light green distribution.

Figure 4 with 2 supplements
Induction curves of the eight combined TetR mutants: experimental measurements and predictions from the modified additive model.

In each plot, black, orange, and white points represent the experimental data for mutant 1, mutant 2, and the combined mutant (named mutant 1-mutant 2), as specified in the legend and title. The blue band shows the 95% interval of the induction curve predicted by the modified additive model. The modification to the basic additive model in each plot is specified by the six weights {α1,εD; α2,εD; α1,εL; α2,εL; α1,γ; α2,γ} (see Equation 4), which are (A) {1,1,1,1,1,1}; (B) {1,1,0.5,1,1,1}; (C) {1,1,0.5,1,1,1}; (D) {1,1,0,1,1,1}; (E) {0,1,1,1,0,1}; (F) {0,1,1,1,0,1}; (G) {0,1,1,1,1,1}; (H) {0,1,1,1,1,1}.
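Equation 4 is not reproduced on this page. As a hedged reading of the caption, the sketch below assumes that each combined-mutant parameter p is the WT value plus weighted perturbations from the two constituent mutants, p = p_WT + α1,p(p1 − p_WT) + α2,p(p2 − p_WT), so that setting every weight to 1 recovers the basic additive model. That functional form, the function names, and the choice of example mutants are assumptions; the example inputs are the prior-1 medians of εL and γ from Table 2 (εD is not listed there, so it is left out).

```python
CAPTION_ORDER = ("eps_D", "eps_L", "gamma")


def unpack_weights(six):
    """Map the six caption weights (a1,eD; a2,eD; a1,eL; a2,eL; a1,g; a2,g) to a dict."""
    a = tuple(six)
    if len(a) != 6:
        raise ValueError("expected six weights")
    return {p: (a[2 * i], a[2 * i + 1]) for i, p in enumerate(CAPTION_ORDER)}


def additive_prediction(wt, mut1, mut2, weights):
    """ASSUMED form: p = p_WT + a1,p * (p_1 - p_WT) + a2,p * (p_2 - p_WT).

    wt, mut1, mut2: dicts of parameter values (kBT) keyed by parameter name.
    weights: dict parameter -> (a1, a2); parameters absent from wt are skipped.
    """
    return {
        p: wt[p] + a1 * (mut1[p] - wt[p]) + a2 * (mut2[p] - wt[p])
        for p, (a1, a2) in weights.items()
        if p in wt
    }


# Prior-1 medians from Table 2 (kBT); eps_D is omitted because Table 2 lacks it.
wt = {"eps_L": 6.61, "gamma": 5.29}
y132a = {"eps_L": 2.35, "gamma": 6.23}
r49g = {"eps_L": 8.62, "gamma": 1.66}
basic = unpack_weights((1, 1, 1, 1, 1, 1))  # basic additive model
print(additive_prediction(wt, y132a, r49g, basic))
```

Under this assumed form, a weight of 0 (as for α1,εD and α1,γ in panels E and F) simply drops mutant 1's perturbation of that parameter.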

Figure 4—figure supplement 1
The induction curves of the eight combined mutants calculated using the basic additive model (i.e., with α1,p = α2,p = 1 in main text Equation 4).

In each plot, the white data points show the experimental induction data of the combined mutants, while the black and orange points show data of the individual mutants. The blue band shows the 95% interval of the induction curve predicted by the additive model.

Figure 4—figure supplement 2
Induction curves of the eight combined mutants and the corresponding parameter estimation results as well as the basic additive model predictions.

(A) Percentiles of the simulated fold change measurements using the inferred posterior parameters of each mutant, based on Appendix 1 Equation 29. The white data points show the experimental induction data of four biological replicates (three replicates for C203V-PIF). (B) The inferred parameter values of wildtype (WT) and the eight combined mutants. Error bars of εL and γ represent the upper and lower bounds of the 95% credible region (estimated from 1000 posterior samples). For each combined mutant, the lighter and darker bars show the parameters from the basic additive model and from direct fitting, respectively. The horizontal lines show the WT parameter values for reference. PIF, P176N-I174K-F177S.

Figure 5
Distribution of the 23 investigated TetR mutants in the parameter space of the two-domain model illustrates the correlation between perturbations in εL and γ.

The color of the contour plot encodes the fold change at c = 1000 nM (FC1000), calculated for each point in the two-dimensional space of εL and γ using the wildtype (WT) εD value. The color inside each mutant's data point is the FC1000 calculated with that mutant's own εD value. Each mutant is abbreviated by the one-letter codes of its mutated residues. From upper left to lower right, the abbreviations are C: C203V; Y: Y132A; GL: G102D-L146A; GT: G102D-T26A; R: R49G; D: D53H; GK: G102D-K98Q; E: E150Y; QE: Q32A-E147G; G: G143M; P: P105M; GYI: G102D-Y42M-I57N; GHQQ: G102D-H44F-Q47S-Q76K; PIF: P176N-I174K-F177S. The least-squares regression line between εL and γ is shown together with their Pearson correlation coefficient (ρ).
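The regression line and ρ in this figure can be reproduced from per-mutant parameter estimates with two standard NumPy calls. This is a sketch, assuming γ is regressed on εL (the caption does not state the direction of the fit) and that one point estimate per mutant is used; the function name and toy data are hypothetical, and the toy values are not the mutant estimates.

```python
import numpy as np


def fit_and_correlate(eps_l, gamma):
    """Least-squares line gamma = slope * eps_L + intercept, plus Pearson rho."""
    x = np.asarray(eps_l, dtype=float)
    y = np.asarray(gamma, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # ordinary least squares, degree 1
    rho = np.corrcoef(x, y)[0, 1]               # Pearson correlation coefficient
    return slope, intercept, rho


# Toy data lying exactly on a line with negative slope.
slope, intercept, rho = fit_and_correlate([1, 2, 3, 4], [1.5, 1.0, 0.5, 0.0])
print(f"slope={slope:.2f}, intercept={intercept:.2f}, rho={rho:.2f}")
```

Unlike the slope, ρ is symmetric in εL and γ, so it does not depend on the regression direction.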

Tables

Table 1
Distances to the DNA operator and ligand of the 21 residues under mutational study.
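| Residue number | Distance to DNA operator (Å) | Distance to ligand (Å) |
|---|---|---|
| 26 | 7.3 | 24.7 |
| 32 | 12.6 | 30.4 |
| 42 | 8.1 | 25.0 |
| 44 | 9.4 | 21.6 |
| 47 | 7.7 | 21.9 |
| 49 | 11.4 | 17.8 |
| 53 | 17.0 | 12.1 |
| 57 | 22.5 | 7.0 |
| 76 | 45.7 | 17.9 |
| 98 | 19.9 | 15.6 |
| 102 | 18.1 | 14.2 |
| 105 | 24.8 | 7.7 |
| 132 | 39.3 | 16.0 |
| 143 | 29.4 | 16.9 |
| 146 | 25.8 | 17.6 |
| 147 | 26.8 | 16.0 |
| 150 | 23.2 | 19.0 |
| 174 | 34.5 | 14.9 |
| 176 | 38.8 | 19.5 |
| 177 | 35.5 | 17.5 |
| 203 | 55.1 | 28.7 |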
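Table 1 does not say how the distances were measured. One common convention is the minimum interatomic distance between a residue and its binding partner; the sketch below computes that quantity with NumPy broadcasting, assuming the coordinates have already been extracted from a structure file. The definition, the function name, and the example coordinates are assumptions for illustration.

```python
import numpy as np


def min_interatomic_distance(residue_xyz, partner_xyz):
    """Smallest distance (same units as the input, e.g. Å) between two atom sets.

    residue_xyz: (n, 3) coordinates of one residue's atoms.
    partner_xyz: (m, 3) coordinates of the ligand or DNA operator atoms.
    """
    a = np.asarray(residue_xyz, dtype=float)[:, None, :]
    b = np.asarray(partner_xyz, dtype=float)[None, :, :]
    # Broadcast to an (n, m) matrix of pairwise Euclidean distances.
    d = np.sqrt(((a - b) ** 2).sum(axis=-1))
    return float(d.min())


# Hypothetical coordinates, for illustration only.
residue = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
ligand = [[3.0, 4.0, 0.0], [20.0, 0.0, 0.0]]
print(min_interatomic_distance(residue, ligand))  # 5.0
```

Other conventions (for example, Cα-to-centroid distances) give systematically larger values, so the numbers are only comparable within one definition.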
Table 2
Bayesian inference results with different prior distributions of γ.
| Mutant | εL (prior 1) | εL (prior 2) | γ (prior 1) | γ (prior 2) |
|---|---|---|---|---|
| WT | 6.61 (6.51–6.73) | 6.62 (6.51–6.73) | 5.29 (5.20–5.38) | 5.29 (5.20–5.38) |
| Q32A-E147G | 7.29 (7.02–7.53) | 7.27 (7.02–7.53) | 2.45 (2.38–2.52) | 2.45 (2.38–2.52) |
| R49G | 8.62 (8.40–8.84) | 8.62 (8.42–8.85) | 1.66 (1.60–1.72) | 1.65 (1.59–1.71) |
| D53H | 5.46 (5.09–5.80) | 5.46 (5.05–5.79) | 1.62 (1.57–1.66) | 1.62 (1.57–1.67) |
| P105M | 7.23 (6.70–7.77) | 7.24 (6.59–7.76) | 0.92 (0.86–0.98) | 0.92 (0.86–0.98) |
| Y132A | 2.35 (2.15–2.55) | 2.36 (2.17–2.55) | 6.23 (5.97–6.54) | 6.22 (5.96–6.55) |
| G143M | 6.92 (6.67–7.18) | 6.91 (6.68–7.15) | 1.52 (1.47–1.56) | 1.52 (1.47–1.56) |
| E150Y | 6.39 (6.01–6.73) | 6.38 (5.99–6.73) | 1.72 (1.66–1.79) | 1.72 (1.66–1.78) |
| PIF | 11.05 (10.92–11.17) | 11.05 (10.90–11.18) | 1.54 (1.49–1.59) | 1.54 (1.49–1.59) |
| C203V | 4.51 (4.27–4.74) | 4.53 (4.27–4.78) | 6.95 (5.89–10.23) | 7.47 (6.01–15.84) |
| G102D-T26A | 7.23 (7.09–7.36) | 7.22 (7.11–7.35) | 4.31 (4.16–4.51) | 4.31 (4.15–4.52) |
| G102D-Y42M-I57N | 8.21 (8.09–8.34) | 8.21 (8.09–8.34) | 2.83 (2.78–2.89) | 2.83 (2.78–2.88) |
| G102D-K98Q | 5.96 (5.51–6.36) | 5.97 (5.55–6.37) | 2.14 (2.01–2.28) | 2.14 (2.00–2.29) |
| G102D-L146A | 5.70 (5.51–5.88) | 5.70 (5.52–5.88) | 5.43 (5.03–6.05) | 5.42 (5.03–6.05) |
| G102D-HQQ | 7.56 (7.30–7.83) | 7.57 (7.23–7.86) | 1.54 (1.48–1.61) | 1.54 (1.48–1.60) |
| Y132A-G102D-T26A | 9.55 (8.77–10.25) | 9.67 (8.92–10.51) | 2.96 (2.58–3.39) | 3.02 (2.66–3.47) |
| Y132A-R49G | 5.77 (5.56–5.98) | 5.78 (5.58–5.98) | 3.11 (3.05–3.16) | 3.11 (3.05–3.16) |
| Y132A-PIF | 8.71 (8.48–8.93) | 8.70 (8.47–8.93) | 3.69 (3.57–3.84) | 3.69 (3.57–3.83) |
| Y132A-C203V | 4.14 (3.94–4.32) | 4.15 (3.96–4.34) | 7.82 (6.46–11.12) | 9.30 (6.76–16.72) |
| C203V-R49G | 5.40 (5.14–5.63) | 5.39 (5.13–5.66) | 2.38 (2.33–2.43) | 2.38 (2.33–2.44) |
| C203V-D53H | 5.26 (4.98–5.53) | 5.26 (4.97–5.51) | 1.42 (1.39–1.46) | 1.43 (1.39–1.46) |
| C203V-G102D-L146A | 3.13 (2.81–3.43) | 3.16 (2.78–3.49) | 6.74 (6.00–9.01) | 6.99 (6.03–13.63) |
| C203V-PIF | 8.30 (7.95–8.61) | 8.30 (7.94–8.65) | 3.49 (3.36–3.64) | 3.49 (3.35–3.64) |

Prior 1 and prior 2 denote the Bayesian inference results obtained with a Gaussian prior distribution of γ centered at 5 kBT with a standard deviation of 2.5 kBT or 5 kBT, respectively. Entries are the medians of the inferred posterior distributions (in kBT), with the lower and upper bounds of the 95% credible region (estimated from 1000 posterior samples) in parentheses.
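Each table entry condenses 1000 posterior samples into a median and a 95% credible region. The footnote does not say whether the region is equal-tailed or highest-density; this sketch uses the equal-tailed version (2.5th and 97.5th percentiles). The function name and the simulated draws are hypothetical.

```python
import numpy as np


def summarize_posterior(samples, level=0.95):
    """Median and equal-tailed credible interval of a set of posterior samples."""
    samples = np.asarray(samples, dtype=float)
    tail = 100.0 * (1.0 - level) / 2.0  # 2.5 for a 95% interval
    lower, median, upper = np.percentile(samples, [tail, 50.0, 100.0 - tail])
    return float(median), float(lower), float(upper)


# Hypothetical posterior draws of gamma (kBT); 1000 samples, as in Table 2.
rng = np.random.default_rng(0)
gamma_draws = rng.normal(loc=5.3, scale=0.05, size=1000)
med, lo, hi = summarize_posterior(gamma_draws)
print(f"gamma = {med:.2f} ({lo:.2f}-{hi:.2f}) kBT")
```

For skewed posteriors, such as γ for C203V, the equal-tailed interval is asymmetric around the median, which matches the shape of those table entries.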


Cite this article

Liu Z, Gillis TG, Raman S, Cui Q (2024) A parameterized two-domain thermodynamic model explains diverse mutational effects on protein allostery. eLife 12:RP92262. https://doi.org/10.7554/eLife.92262.3