(A) Nonfluorescent cells in the TetR, TtgR, MphR, and RolR single-mutant library were sorted (gray bar) in the presence (light shade) and absence (dark shade) of 1 µM anhydrotetracycline (aTC), 500 …
Protein-wide, single-site saturation mutagenesis of four TetR-like family allosteric transcription factors (aTFs) – TetR, TtgR, RolR, and MphR – using reporter-based screening followed by deep …
Heatmaps detailing the effect of all single mutants at every position in (A) TetR, (B) TtgR, (C) MphR, and (D) RolR are shown. Wild-type residues are black, mutations that do not affect protein …
The distribution of weighted scores for every position in (A) TetR, (B) TtgR, (C) MphR, and (D) RolR is shown. Box and whisker plots above each histogram illustrate the spread of the data where …
The correlation of weighted scores for every position using a ×5 or ×10 read count threshold is shown for (A) TetR, (B) TtgR, (C) MphR, and (D) RolR. The red and orange lines illustrate the spread …
The percent of hotspots in the four main structural regions of the TetR homologs. Regions were broken into groups based on the crystal structures of TetR (PDB ID: 4AC0), TtgR (PDB ID: 2UXU), MphR …
Average conservation score of all positions considered inactive or having no effect in (A) TetR, (B) TtgR, (C) MphR, and (D) RolR. Data are shown as mean ± SEM.
Allosteric hotspots of TetR, TtgR, MphR, and RolR determined by experiments (A, C, E, G) and the Ohm webserver (B, D, F, H) differ significantly. The Ohm webserver identifies critical residues along …
(A) Residue-residue contact map showing LRIs within each homolog. The LRIs are grouped by color, following standard k-means clustering, representing different regions of the protein. Inset shows …
The percent of hotspot and non-hotspot residues participating in long-range interactions (LRIs) in each homolog protein.
The optimal number of clusters to use for the k-means clustering of long-range interactions (LRIs) in each homolog was determined by iteratively calculating the variance within clusters for 1–25 …
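The elbow procedure described above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. Each LRI is treated as a hypothetical 2-D point (the residue indices i and j of the contacting pair). k-means is a plain Lloyd's iteration with deterministic farthest-point seeding, which is our choice. `elbow_curve` (a hypothetical helper) returns the within-cluster sum of squares (WCSS) for each k so that the bend in the curve can be inspected.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not necessarily the authors'):
    start from the first point, then repeatedly add the point farthest
    from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return [list(c) for c in centers]

def kmeans_wcss(points, k, n_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def elbow_curve(points, k_max):
    """WCSS for k = 1..k_max; the 'elbow' is where the decrease levels off."""
    return {k: kmeans_wcss(points, k) for k in range(1, k_max + 1)}
```

In practice a library implementation such as scikit-learn's `KMeans`, whose `inertia_` attribute holds the same quantity, would normally replace the hand-written loop.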
(A) Percentage of mutations (final mutated state) among dead (red) and not-dead (gray) variants from deep mutational scanning (DMS) data for all four homologs combined. (B) Comparison of …
Mutations in (A) TetR, (B) TtgR, (C) MphR, and (D) RolR were separated based on their effect on protein function, dead (red) or no effect (gray), and the proportion of each of the 20 amino acids …
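The percentages in these panels come down to counting the final mutated residue within each phenotype class. Below is a minimal sketch that assumes each variant is available as a hypothetical `(mutant_aa, is_dead)` pair. The function name `aa_proportions` is illustrative only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

def aa_proportions(variants):
    """variants: iterable of (mutant_aa, is_dead) pairs (hypothetical format).
    Returns {'dead': {aa: %}, 'not_dead': {aa: %}} over the 20 amino acids."""
    counts = {"dead": Counter(), "not_dead": Counter()}
    for aa, is_dead in variants:
        counts["dead" if is_dead else "not_dead"][aa] += 1
    result = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        # Percentage of this class's variants whose final residue is `aa`.
        result[label] = {
            aa: (100.0 * counter[aa] / total if total else 0.0)
            for aa in AMINO_ACIDS
        }
    return result
```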
(A) The full list of 27 features is shown at the top. The F scores (a measure of importance) of the features for each of the four allosteric transcription factors (aTFs) are shown below. (B) Frequency …
The full list of 27 features is shown at the top. The JSDs (a measure of importance) of the features for each of the four allosteric transcription factors (aTFs) are shown below. JSD is a measure of …
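One way to score a feature's importance this way is to bin its hotspot and non-hotspot values on shared edges and compare the two histograms with the Jensen-Shannon divergence. Below is a pure-Python sketch of that approach. The binning scheme (equal-width bins over the pooled range) and log base 2, which bounds JSD to [0, 1], are assumptions and may differ from how the values above were computed. `feature_jsd` is a hypothetical helper.

```python
import math

def histogram(values, edges):
    """Normalized histogram over shared bin edges (last bin is closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so 0 <= JSD <= 1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def feature_jsd(hotspot_vals, non_hotspot_vals, n_bins=10):
    """Bin both groups on equal-width edges spanning the pooled range."""
    pooled = list(hotspot_vals) + list(non_hotspot_vals)
    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    return jsd(histogram(hotspot_vals, edges),
               histogram(non_hotspot_vals, edges))
```

A value near 0 means the feature is distributed similarly in hotspots and non-hotspots. A value near 1 means the two groups barely overlap.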
The 27 plots correspond to the distributions of TetR’s hotspots’ (hs) and non-hotspots’ (non-hs) z-scored feature values for features 1–27, as labeled in the plot titles. The distributions of hotspots …
The 27 plots correspond to the distributions of MphR’s hotspots’ (hs) and non-hotspots’ (non-hs) z-scored feature values for features 1–27, as labeled in the plot titles. The distributions of hotspots …
The 27 plots correspond to the distributions of TtgR’s hotspots’ (hs) and non-hotspots’ (non-hs) z-scored feature values for features 1–27, as labeled in the plot titles. The distributions of hotspots …
The 27 plots correspond to the distributions of RolR’s hotspots’ (hs) and non-hotspots’ (non-hs) z-scored feature values for features 1–27, as labeled in the plot titles. The distributions of hotspots …
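The z-score is z = (x - μ) / σ, where μ and σ are the mean and standard deviation of a feature. Below is a minimal sketch that applies it to one feature and then splits the result into hotspot and non-hotspot groups. It assumes standardization is done over all residues of a protein with the population standard deviation. The captions do not state either choice. Both helper names are hypothetical.

```python
import statistics

def z_score(values):
    """Standardize to mean 0 and population standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]

def split_by_label(z_values, is_hotspot):
    """Separate z-scored values into hotspot and non-hotspot lists."""
    hs = [z for z, h in zip(z_values, is_hotspot) if h]
    non_hs = [z for z, h in zip(z_values, is_hotspot) if not h]
    return hs, non_hs
```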
The plots show the average and best F1 scores for 4–10 feature combinations as a function of generation in the genetic algorithm feature selection for the four homologous allosteric transcription …
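The per-generation average and best scores tracked in these plots can be illustrated with a small genetic algorithm over fixed-size feature subsets. This is a sketch under loudly stated assumptions and is not the GA-NN method used here. The fitness function `toy_fitness` is a hypothetical stand-in for the cross-validated F1 of a trained network. Truncation selection, union-based crossover, single-swap mutation, and elitism are our illustrative choices. Elitism guarantees that the best score never decreases between generations.

```python
import random

N_FEATURES = 27

def toy_fitness(combo):
    """HYPOTHETICAL stand-in for a model's cross-validated F1 score;
    it simply rewards features 3, 7 and 12."""
    target = {3, 7, 12}
    return len(target & set(combo)) - 0.01 * len(combo)

def random_combo(rng, size):
    return tuple(sorted(rng.sample(range(1, N_FEATURES + 1), size)))

def crossover(rng, a, b, size):
    """Child draws `size` distinct features from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, size)))

def mutate(rng, combo, rate=0.2):
    """With probability `rate`, swap one feature for an unused one."""
    if rng.random() >= rate:
        return combo
    unused = [f for f in range(1, N_FEATURES + 1) if f not in combo]
    out = list(combo)
    out[rng.randrange(len(out))] = rng.choice(unused)
    return tuple(sorted(out))

def ga_select(size, fitness, pop_size=30, generations=40, elite=2, seed=0):
    """Return the best combination and per-generation (average, best) fitness."""
    rng = random.Random(seed)
    pop = [random_combo(rng, size) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        scores = [fitness(c) for c in scored]
        history.append((sum(scores) / len(scores), scores[0]))
        parents = scored[: pop_size // 2]   # truncation selection
        children = scored[:elite]           # elitism keeps the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(rng, crossover(rng, a, b, size)))
        pop = children
    return max(pop, key=fitness), history
```

Running `ga_select(size, fitness)` once for each size from 4 to 10 would give one average-and-best curve per combination size.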
Frequency of appearance of the 27 features in the top ten 1–10 feature combinations ranked by F1 score for each protein (labeled on top). Rows 2–28 correspond to features 1–27; row 1 is the average …
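The ranking and counting described here can be sketched as two small functions. The first computes binary F1 = 2PR / (P + R), assuming hotspots are the positive class. The second counts how often each feature appears among the highest-scoring combinations. The input format for `feature_frequency`, a list of `(combo, f1)` pairs, is a hypothetical assumption. The counts could be divided by `top` to give fractions.

```python
from collections import Counter

def f1_score(y_true, y_pred):
    """Binary F1 = 2PR / (P + R), with hotspot = 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def feature_frequency(scored_combos, top=10, n_features=27):
    """scored_combos: list of (combo, f1) pairs. Count how often each
    feature 1..n_features appears among the `top` highest-F1 combinations."""
    ranked = sorted(scored_combos, key=lambda x: x[1], reverse=True)[:top]
    counts = Counter(f for combo, _ in ranked for f in combo)
    return [counts[f] for f in range(1, n_features + 1)]
```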
Plots of centrality against residue number for each protein (labeled in the title), with four red stars marking the positions of centrality peaks 1–4 from left to right. The centrality peaks are …
(A) Best cross-protein predictions without (CPP, yellow) and with transfer learning (CPP_TL, green) achieved for each protein using models trained with 1–10 features and different training data. The …
(A) Correlation between relative performance and the identity between the template protein and the target protein for modeling. (B) Correlation between relative performance and the root mean squared …
Correlation of sequence identity and RMSD between the four allosteric transcription factors (aTFs) and their corresponding templates used in generating homology models. R squared shows the …
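For a single predictor with a least-squares line, R squared equals the square of the Pearson correlation coefficient. Below is a minimal sketch of that calculation. It assumes the inputs are per-protein sequence identities and RMSDs between each aTF and its template, and that a simple linear fit is used. The captions do not specify the fit. `r_squared` is a hypothetical helper.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y ~ a + b*x.
    For simple linear regression this equals the squared Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)
```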
| | TetR | MphR | RolR | TtgR |
|---|---|---|---|---|
| UniRep1900 | 0.50±0.01 | 0.65±0.00 | 0.57±0.01 | 0.43±0.01 |
| feat1927 | 0.53±0.00 | 0.69±0.00 | 0.59±0.02 | 0.44±0.00 |
| random | 0.11 | 0.12 | 0.09 | 0.07 |
a. Performance is evaluated as the average of five repeats of fivefold cross-validation. UniRep1900 and feat1927 give the best NN performance using only UniRep features and using UniRep features combined with the 27 physical features, respectively. Data are presented as average ± std.
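Five repeats of fivefold cross-validation can be sketched as follows. Each repeat reshuffles the samples, splits them into five folds, and scores every fold once as the test set. Two points here are assumptions. The `evaluate` callback, which would train the network and return a score, is hypothetical. The std is taken across the five per-repeat averages; the footnote does not specify whether the std is over repeats or over all 25 folds.

```python
import random
import statistics

def repeated_kfold(n_samples, k=5, repeats=5, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independent shuffles,
    each split into k folds of near-equal size."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test

def cv_summary(evaluate, n_samples, k=5, repeats=5, seed=0):
    """evaluate(train_idx, test_idx) -> score (HYPOTHETICAL callback).
    Average the k fold scores within each repeat, then report the
    mean and sample std across the repeats."""
    splits = list(repeated_kfold(n_samples, k, repeats, seed))
    per_repeat = []
    for r in range(repeats):
        chunk = splits[r * k:(r + 1) * k]
        per_repeat.append(statistics.mean([evaluate(tr, te) for tr, te in chunk]))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)
```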
| | TetR | MphR | RolR | TtgR |
|---|---|---|---|---|
| feat27 | 0.83±0.02 | 0.82±0.02 | 0.64±0.02 | 0.54±0.03 |
| UniRep1900 | 0.61±0.07 | 0.50±0.02 | 0.32±0.03 | 0.35±0.03 |
| random | 0.19 | 0.16 | 0.26 | 0.21 |
a. Feat27 represents the fitness of the best-performing feature combination that emerged from feature selection with the GA-NN approach. Performance is evaluated as the average of five repeats of fivefold cross-validation and presented as average ± std.
Pairwise sequence identity and similarity.
R squared correlation of dead variants identified at each position between replicates.
Cluster rankings.
Template information.
Cross-protein prediction of mutation phenotype.