Schematic illustration of the two-domain statistical thermodynamic model of TetR allostery. (A) The crystal structure of TetR(B) in complex with minocycline and magnesium (PDB code: 4AC0). The red residues are the hotspots identified in the DMS study of Leander et al. (2020). (B) Four possible conformations of a two-domain TetR molecule with their corresponding free energies (G). G of the LI DI state is set to 0. Blue/green circles (squares) denote the inactive (active) state of the LBD/DBD. (C) A simple repression scheme of TetR function. Binding of the ligand (inducer) favors the inactive state of the DBD in TetR, which then releases the DNA operator and enables transcription of the downstream gene. (D) Schematic free energy diagram of the possible binding states of TetR at fixed ligand and operator concentrations. Red, orange and purple arrows show how a mutation can disrupt allostery by (1) increasing ϵL, (2) decreasing ϵD, or (3) decreasing γ. The L-LADA state is not explicitly shown in the last column because the doubly bound L-LADA-D state is expected to have a lower free energy. Note that mutations that change the binding affinities of the active LBD/DBD for ligand/operator are not discussed here, as we focus on the intrinsic allosteric properties of the TF itself.

Schematic illustration of the characteristic effects of perturbations in the three biophysical parameters on the free energy landscape and the corresponding induction curves of a two-domain allosteric system. Panels A, B and C illustrate how changing ϵL, ϵD, and γ alone affects the free energy landscape for the binding states shown in Figure 1D and the induction curve. For the black induction curve in (C), the values of ϵL and ϵD are also adjusted to aid visualization of the negative monotonicity of the gene expression level (fold change) as a function of ligand concentration. The green shade in the middle column separates the DNA-bound states from the rest. In the free energy landscapes shown in the middle column, ligand- or DNA-binding is always assumed when the corresponding domain is in the active conformation.

Induction data of 15 TetR mutants and the corresponding parameter estimation results. (A) Shaded blue curves in each plot show the percentiles of the simulated fold-change measurements using the inferred posterior parameters of the mutant. The white data points represent the corresponding experimental induction measurement of 4 or more biological replicates (three replicates for C203V and G102D-HQQ). (B) The inferred parameter values of the 15 mutants. The error bars of ϵL and γ represent the 95th percentile of the Bayesian posterior samples, while the error bar of ϵD is calculated based on the standard error of the mean (SEM) of the corresponding leakiness measurement. The horizontal lines indicate the WT parameter values for reference.

Induction curves for the eight combined TetR mutants: experimental measurements and predictions from the modified additive model. In each plot, black, orange and white points represent the experimental data for mutant 1, mutant 2 and the combined mutant (named mutant 1-mutant 2), as specified in the legend and title. The blue bands show the 95th percentile of the induction curve prediction from the modified additive model. The modification to the basic additive model in each plot is specified by the 6 weights {α1,ϵD, α2,ϵD, α1,ϵL, α2,ϵL, α1,γ, α2,γ} (see Equation 4), which are (A) {1, 1, 1, 1, 1, 1}; (B) {1, 1, 0.5, 1, 1, 1}; (C) {1, 1, 0.5, 1, 1, 1}; (D) {1, 1, 0, 1, 1, 1}; (E) {0, 1, 1, 1, 0, 1}; (F) {0, 1, 1, 1, 0, 1}; (G) {0, 1, 1, 1, 1, 1}; (H) {0, 1, 1, 1, 1, 1}.
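The weighting scheme in this caption can be sketched as follows. Equation 4 is not reproduced in this section, so the functional form below is an assumption: each parameter p of the combined mutant is taken to be the WT value plus the weighted sum of the two single-mutant shifts. The function name `combine` and all numerical values are hypothetical.

```python
def combine(wt, mut1, mut2, w1, w2):
    """Weighted additive prediction for a combined mutant (assumed form).

    For each parameter p: p_combined = p_WT + w1[p]*(p_1 - p_WT) + w2[p]*(p_2 - p_WT).
    Setting every weight to 1 gives the basic additive model.
    """
    return {
        p: wt[p] + w1[p] * (mut1[p] - wt[p]) + w2[p] * (mut2[p] - wt[p])
        for p in wt
    }

# Hypothetical parameter values (in kBT), for illustration only.
wt = {"eps_D": -3.0, "eps_L": 2.0, "gamma": 5.0}
mut1 = {"eps_D": -4.0, "eps_L": 3.0, "gamma": 4.0}
mut2 = {"eps_D": -3.5, "eps_L": 2.5, "gamma": 4.5}

ones = {"eps_D": 1.0, "eps_L": 1.0, "gamma": 1.0}
basic = combine(wt, mut1, mut2, ones, ones)

# Weights analogous to panels (G) and (H): mutant 1's shift in eps_D is switched off.
w1_modified = {"eps_D": 0.0, "eps_L": 1.0, "gamma": 1.0}
modified = combine(wt, mut1, mut2, w1_modified, ones)
```

With all weights equal to 1 the shifts simply add; a weight of 0 removes one mutant's contribution to that parameter.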

Distribution of the 23 investigated TetR mutants in the parameter space of the two-domain model illustrates the correlation between perturbations in ϵL and γ. The color of the contour plot encodes the fold change at c = 1000 nM calculated for each point in the two-dimensional space of ϵL and γ with the WT ϵD value. The color within the data point of each mutant is based on the FC1000 calculated with its specific ϵD value. The notation of each mutant is abbreviated based on the 1-letter codes of the residues that are mutated. The specific mutations corresponding to each letter code from upper left to lower right are C: C203V; Y: Y132A; GL: G102D-L146A; GT: G102D-T26A; R: R49G; D: D53H; GK: G102D-K98Q; E: E150Y; QE: Q32A-E147G; G: G143M; P: P105M; GYI: G102D-Y42M-I57N; GHQQ: G102D-H44F-Q47S-Q76K; PIF: P176N-I174K-F177S. The least-squares regression line between ϵL and γ is shown with their Pearson correlation coefficient (ρ).
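For reference, a least-squares line and Pearson correlation coefficient of the kind mentioned in this caption can be computed as in the minimal sketch below. The function name and the input values are hypothetical; they are not the inferred ϵL and γ values of the mutants.

```python
import math

def fit_line_and_pearson(x, y):
    """Return (slope, intercept, rho) for an ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rho = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, rho

# Made-up, perfectly collinear points used only to exercise the function.
slope, intercept, rho = fit_line_and_pearson([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```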

Statistical weights of promoter occupancy states and repressor states. (A) Statistical weights of the promoter occupancy states, with the empty promoter state taken as reference. P is the average number of RNAP per cell. R1, R2, R3 and R4 denote the average number of repressors in the LI DI, LADI, LI DA and LADA state per cell, respectively. NNS is the number of non-specific DNA binding sites in the cell. ΔϵP, ΔϵRA and ΔϵRI represent the energy differences between specific and non-specific DNA binding of RNAP, repressor with DBD in the active and inactive conformations, respectively. (B) Statistical weights of the allosteric states of the repressor, with the LI DI state taken as reference. KA and KI are the dissociation constants of ligand to the repressor with LBD in the active and inactive conformations, respectively, and c is ligand concentration. Partial binding of ligand is ignored in the symmetric model. All energy terms in the exponents are evaluated in the unit of kBT.
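The repressor-state weights in panel (B) can be illustrated with a short sketch. The exact expressions appear only in the figure, so the forms below are assumptions: state free energies of 0, ϵL, ϵD and ϵL + ϵD + γ for LI DI, LADI, LI DA and LADA, and a ligand factor of 1 + (c/K)^2 per state, which is one reading of "partial binding of ligand is ignored". All names and parameter values are hypothetical.

```python
import math

def state_probabilities(c, eps_L, eps_D, gamma, K_A, K_I):
    """Boltzmann probabilities of the four repressor states (energies in kBT)."""
    ligand_A = 1.0 + (c / K_A) ** 2  # LBD active: unbound or doubly bound (assumed)
    ligand_I = 1.0 + (c / K_I) ** 2  # LBD inactive: unbound or doubly bound (assumed)
    weights = {
        "LI_DI": ligand_I,  # reference state, G = 0
        "LA_DI": math.exp(-eps_L) * ligand_A,
        "LI_DA": math.exp(-eps_D) * ligand_I,
        "LA_DA": math.exp(-(eps_L + eps_D + gamma)) * ligand_A,
    }
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}

def dbd_active_fraction(c, **params):
    """Probability that the DBD is in the active (DNA-binding) conformation."""
    p = state_probabilities(c, **params)
    return p["LI_DA"] + p["LA_DA"]

# Hypothetical parameter values, chosen only for illustration.
params = dict(eps_L=3.0, eps_D=-2.0, gamma=5.0, K_A=10.0, K_I=1000.0)
low_ligand = dbd_active_fraction(0.0, **params)
high_ligand = dbd_active_fraction(1000.0, **params)
```

Under these assumptions, the active-DBD fraction falls as ligand concentration rises when γ > 0, and it becomes independent of c when γ = 0.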

Equilibria among different conformational and binding states of the repressor. Here, L and O represent ligand and operator respectively. ϵRA O and ϵRI O are the free energies of operator binding for repressor with DBD in the active and inactive conformation, respectively. R1, R2, R3 and R4 denote the LI DI, LADI, LI DA and LADA state of the repressor, respectively. All energy terms in the exponents are evaluated in the unit of kBT.

Extended parametric study of main-text Equation 1. The four colored curves show that a flat induction curve can result from tuning any one of the three main biophysical parameters of the two-domain model alone (by decreasing ϵD or increasing ϵL from the WT value, or by setting γ to 0). The white points show the experimental induction data for the mutant G102D (measurements of 4 biological replicates at each ligand concentration). Note that the colored curves in the figure are generated to better visualize the effect of the model parameters on the induction curve and are not intended for comparison with the experimental data of G102D. The experimentally determined leakiness of G102D (0.0173) is higher than that of the WT (0.0086).

Sequence and structural distributions of the 21 residues chosen for the mutation analyses in this work. The upper panel shows the sequence of TetR (residues 2-203), where the 21 residues chosen for mutation analyses in this work are colored red, orange, green or blue (while the other residues are colored grey). The lower panel shows the crystal structure of TetR(B) in complex with minocycline and magnesium (PDB code: 4AC0). Here, the two identical monomers of TetR are colored white and grey, respectively, while the residues chosen for mutation analyses are colored in the white monomer in the same way as in the sequence above. Specifically, the 5 red residues from top to bottom are C203, Y132, I57, R49 and T26. The 4 orange residues from left to right are H44, P105, G102 and L146. The 5 blue residues from top to bottom are Q76, G143, E147, Q47 and Q32. The 7 green residues from left to right are Y42, D53, K98, E150, F177, P176 and I174. Some of the 21 residues are presented in the stick format to aid visualization while all other residues are presented in the cartoon format.

Prior probability distributions and prior predictive check. (A) Density functions of the prior distributions of leakiness, σ, ϵL and γ. The black dots above the prior distributions show the 1000 prior predictive draws of the corresponding parameter. (B) Percentiles of the simulated fold changes using the 1000 sets of parameters shown in (A).

Probability distributions of ϵL, γ and σ values in the 1000 sets of prior predictive draws (ground truth) and the average of the corresponding 1000 sets of inferred posterior distributions of the parameters (inferred).

Distributions of rank statistics of the prior predictive draws relative to the corresponding posterior samples. (A) Histograms (20 bins); (B) ECDF plots; (C) ECDF difference plots of rank statistics of ϵL, γ and σ. The green bands in (A)-(C) show the 99th percentile expected from a true uniform distribution.
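The rank statistic behind these plots can be sketched as follows: for each prior predictive draw, count how many posterior samples fall below it. If inference is well calibrated, the ranks should be uniformly distributed. The example below is a toy stand-in, not the inference pipeline used for the figure: the "posterior" samples are drawn from the same distribution as the "prior" draw, and the sample sizes are assumptions.

```python
import random

def rank_statistic(prior_draw, posterior_samples):
    """Number of posterior samples smaller than the prior (ground-truth) draw."""
    return sum(1 for s in posterior_samples if s < prior_draw)

random.seed(0)
n_simulations = 1000   # number of prior predictive draws
n_posterior = 99       # posterior samples per draw, so ranks lie in 0..99

ranks = []
for _ in range(n_simulations):
    truth = random.gauss(0.0, 1.0)
    # Toy stand-in for a fitted posterior: identical to the prior by construction.
    posterior = [random.gauss(0.0, 1.0) for _ in range(n_posterior)]
    ranks.append(rank_statistic(truth, posterior))

# Histogram with 20 bins, each covering 5 consecutive rank values.
counts = [0] * 20
for r in ranks:
    counts[r // 5] += 1
```

Bin counts that stay within the band expected for a uniform distribution indicate no systematic bias or miscalibrated width in the posteriors.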

Sensitivity analysis for model parameter inference. Posterior z-score and posterior contraction of the inferred posterior distribution for each of the 1000 prior predictive draws of parameters and the corresponding simulated data except: (A) parameters of flat induction curves (see Supplementary file section 3 and Eq. (32)); (B) parameters of flat induction curves or μ(c=1000 nM)>0.97 (see Supplementary file section 3 and Eq. (34)). Accordingly, there are 970 and 523 data points for each parameter in (A) and (B), respectively.
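The two diagnostics named in this caption are commonly defined as shown in the sketch below; whether the figure uses exactly these definitions is an assumption. The posterior z-score measures how far the posterior mean lies from the ground-truth value in units of the posterior standard deviation, and the posterior contraction measures how much the posterior variance shrinks relative to the prior variance. The function names and sample values are hypothetical.

```python
import statistics

def posterior_z_score(posterior_samples, true_value):
    """(posterior mean - ground truth) / posterior standard deviation."""
    mean = statistics.fmean(posterior_samples)
    return (mean - true_value) / statistics.stdev(posterior_samples)

def posterior_contraction(posterior_samples, prior_variance):
    """1 - posterior variance / prior variance; values near 1 mean informative data."""
    return 1.0 - statistics.variance(posterior_samples) / prior_variance

# Toy posterior samples, for illustration only.
samples = [1.0, 2.0, 3.0]
z = posterior_z_score(samples, true_value=1.0)
contraction = posterior_contraction(samples, prior_variance=4.0)
```

A z-score near 0 with a contraction near 1 corresponds to an accurate and well-identified parameter; a contraction near 0 indicates that the data barely constrain it.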

Posterior predictive check of mutant G102D-Y42M-I57N. (A) 1000 sets of posterior samples of {ϵL, γ, σ}. The scatter plots show the joint distributions of the parameters, colored by the log probability of the parameter combinations. Blue/red corresponds to low/high probability. The histograms show the marginal distributions of the corresponding individual parameters. (B) Percentiles of the simulated fold change measurements using the 1000 sets of posterior samples based on Supplementary file Eq. (24). The white data points show the experimental induction data of four biological replicates.

Theoretical induction curves of the four dead mutants when their γ values are set to the WT value while using their respective ϵD and ϵL values (taken from Figure 3B in the main text). The theoretical induction curves are calculated with main-text Equation 1.

Sorting scheme to identify dead variants. Each panel is a tiled segment of single-site saturation mutants of defined length along the TetR gene. The residue numbers spanned by each segment are shown above the panel. The gray distribution denotes the uninduced population of cells containing mutants. The dark green distribution on the same panel denotes the population of cells responding to anhydrotetracycline. The population of mutant-containing cells that remained uninduced when anhydrotetracycline was added was sorted and reflowed; it is shown outside the main panel as the light green distribution.

The induction curves of the eight combined mutants calculated using the basic additive model (i.e., with α1,p = α2,p = 1 in main-text Equation 4). In each plot, the white data points show the experimental induction data of the combined mutants, while the black and orange ones show the data of the individual mutants. The blue bands show the 95th percentile of the induction curve prediction from the additive model.

Induction curves of the eight combined mutants and the corresponding parameter estimation results as well as the basic additive model predictions. (A) Percentiles of the simulated fold change measurements using the inferred posterior parameters of each mutant based on Supplementary file Eq. (24). The white data points show the experimental induction data of four biological replicates (three replicates for C203V-PIF). (B) The inferred parameter values of WT and the 8 combined mutants. Error bars of ϵL and γ represent the upper and lower bounds of the 95 percent credible region. For each combined mutant, the parameters shown by the lighter and darker bar plots are the results from the basic additive model and direct fitting, respectively. The horizontal lines show the WT parameter values for reference.