Predicting the sequence-dependent backbone dynamics of intrinsically disordered proteins

  1. Sanbo Qin
  2. Huan-Xiang Zhou (corresponding author)
  1. Department of Chemistry, University of Illinois Chicago, United States
  2. Department of Physics, University of Illinois Chicago, United States
9 figures, 2 tables and 1 additional file

Figures

Figure 1
Clock-like tree plot showing lack of homology among the 45 IDPs.

The level of homology between two sequences is measured by the distance from their convergence point to the center of the clock. The highest level of apparent identity is between A1-LCD and TDP-43, at 25%, but these two proteins differ in both secondary structure formation and R2 characteristics. There is, however, a 20-residue overlap between the N-terminus of MBP-xα2 and the C-terminus of rmBG21.

Figure 2
Representative conformations of five IDPs.

(A–E) MKK4, α-synuclein, Mev-PNTD, Sev-NT, and CBP-ID4. Conformations were initially generated using TraDES (http://trades.blueprint.org; Feldman and Hogue, 2002) and selected to have a radius of gyration close to that predicted by the scaling function Rg = 2.54 N^0.522 (Å), where N is the number of residues (Bernadó and Blackledge, 2009). Residues predicted as helical by PsiPred plus filtering were then rebuilt as an ideal helix. Finally, residues are colored on a scale ranging from green for low predicted R2 to red for high predicted R2.
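The selection step in this caption can be illustrated with a minimal Python sketch. The prefactor 2.54 Å and exponent 0.522 come from the caption; the function names and the 1 Å tolerance are hypothetical choices made for illustration and are not taken from the authors' workflow.

```python
def predicted_rg(n_residues: int, prefactor: float = 2.54, exponent: float = 0.522) -> float:
    """Radius of gyration (in Å) from the scaling law Rg = prefactor * N**exponent."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return prefactor * n_residues ** exponent


def select_by_rg(rg_values, n_residues, tolerance=1.0):
    """Indices of conformations whose Rg is within `tolerance` Å of the prediction.

    The tolerance is an assumed, illustrative cutoff.
    """
    target = predicted_rg(n_residues)
    return [i for i, rg in enumerate(rg_values) if abs(rg - target) <= tolerance]


if __name__ == "__main__":
    # α-synuclein has 140 residues (Table 1); the Rg values below are made up.
    print(round(predicted_rg(140), 1))
    print(select_by_rg([30.0, 33.4, 40.0], 140))
```

Because the exponent is below 1, the predicted Rg grows sublinearly with chain length, so doubling N increases Rg by a factor of about 2^0.522.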

Figure 3 with 1 supplement
Properties of the 45 IDPs in the training set.

(A) Histograms of the mean (R¯2) and standard deviation (σR2) of R2, calculated for individual proteins. Curves are drawn to guide the eye. Inset: correlation between R¯2 and σR2. (B) Experimental mean scaled R2 (msR2) and SeqDYN q parameters for the 20 types of amino acids. Note that Pro residues have low msR2 because they lack a backbone amide proton. Amino acids are listed in descending order of q.

Figure 3—figure supplement 1
Possible effects of sequence length, temperature, and magnetic field on R2.

(A) Lack of dependence of R¯2 on sequence length. (B) Counts of IDPs with R2 measured at various temperatures. (C) Matching of R2 profiles of Sev-NT measured at two temperatures after uniform scaling. (D) Matching of R2 profiles of A1-LCD measured at two temperatures after uniform scaling. (E) Counts of IDPs with R2 measured at various magnetic fields.
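Panels (C) and (D) match R2 profiles measured at two temperatures by a single uniform scaling factor. How that factor was determined is not stated here; the sketch below assumes a least-squares fit, which is one common choice, and uses made-up profiles. The function names are hypothetical.

```python
from math import sqrt


def uniform_scale_factor(reference, other):
    """Least-squares factor s that minimizes sum((reference - s * other)**2)."""
    if len(reference) != len(other):
        raise ValueError("profiles must have the same length")
    denominator = sum(o * o for o in other)
    if denominator == 0:
        raise ValueError("cannot scale an all-zero profile")
    return sum(r * o for r, o in zip(reference, other)) / denominator


def rms_difference(a, b):
    """Root-mean-square difference between two equal-length profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Hypothetical residue-by-residue R2 values (s^-1) at two temperatures.
    low_temp = [6.0, 8.0, 10.0, 7.0]
    high_temp = [3.1, 3.9, 5.2, 3.4]
    s = uniform_scale_factor(low_temp, high_temp)
    scaled = [s * v for v in high_temp]
    print(s, rms_difference(low_temp, scaled))
```

If a single factor brings the two profiles into agreement, temperature changes the overall magnitude of R2 but not its sequence dependence, which is the point the panels make.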

Figure 4 with 1 supplement
SeqDYN model parameters.

(A) Correlation between msR2 and q. The values are also displayed as bars in Figure 3B. (B) Correlation of msR2 and q with amino-acid molecular mass. (C) Correlation of msR2 and q with bulkiness. (D) The optimal correlation length and deterioration of SeqDYN prediction as the correlation length is moved away from the optimal value.

Figure 4—figure supplement 1
t-test on the q parameters for pairs of amino acids.

q parameters were obtained from five-fold cross-validation training, resulting in five independent values for each q parameter. Means are shown as red bars and standard deviations as error bars. *, p<0.05; **, p<0.01; ***, p<0.001; ns, not significant. q parameters for all neighboring pairs not explicitly indicated are not significantly different.

Figure 5 with 1 supplement
Quality of SeqDYN predictions.

(A) Histogram of RMSE(–1). Letters indicate RMSE(–1) values of the IDPs to be presented in panels (B–F). (B–F) Measured (bars) and predicted (curves) R2 profiles for MKK4, α-synuclein, Mev-PNTD, Sev-NT, and CBP-ID4. In (E) and (F), green curves are SeqDYN predictions and red curves are obtained after a helix boost.

Figure 5—figure supplement 1
Close reproduction (curve) of the measured R2 profile (bars) of CBP-ID4 when that set of data alone was used to parameterize SeqDYN.

The resulting model has no value for predicting R2 for other proteins.

Figure 6
Measured (bars) and predicted (curves) R2 profiles for ChiZ N-terminal region, TIA1 prion-like domain, Pdx1 C-terminal region, synaptobrevin-2, α-endosulfine, YAP, AMOTL1, FtsQ, and CAHS-8.

In (C), R2 does not fall off at the N-terminus because the sequence is preceded by an expression tag MGSSHHHHHHHHHHHHS. In (H) and (I), green curves are SeqDYN predictions and red curves are obtained after a helix boost.

Figure 7
R2 profiles predicted (curves) by SeqDYN show close agreement with those measured (bars) on structured proteins in the unfolded state.

(A) Wild-type lysozyme (8 M urea; pH 2; cysteine-methylated). (B) Lysozyme with Trp62 to Gly mutation (pH 2). Methylated cysteines were treated as Ala in the SeqDYN predictions. (C) Apomyoglobin (8 M urea; pH 2.3). (D) Ubiquitin (8 M urea; pH 2).

Figure 8
Comparison between SeqDYN predictions (curves) and effective transverse relaxation rates (bars) from a 1H relaxation dispersion experiment.

(A) R2,eff in the high-ωeff limit. (B) R2,eff at low ωeff.

Figure 9
Correlation between the stickiness parameters (λ) and the NMR relaxation parameters (q).

The regression line is shown as dashes.
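A regression line and correlation coefficient of the kind shown in this figure can be computed with an ordinary least-squares fit. The sketch below is a generic implementation and not the authors' code; the function name and the λ and q values are hypothetical, and which parameter the figure places on which axis is not specified here.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical per-amino-acid values, for illustration only.
    lam = [0.10, 0.35, 0.50, 0.72, 0.95]
    q = [0.85, 0.97, 1.02, 1.15, 1.21]
    print(linear_fit(lam, q))
```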

Tables

Table 1
Experimental conditions, mean and standard deviation of measured R2, and SeqDYN prediction RMSE.
| Protein name | # of res | Temp (K) | B0 (MHz) | R¯2 (s–1) | σR2 (s–1) | RMSE (s–1) | PMID; ref |
|---|---|---|---|---|---|---|---|
| Training set (45 IDPs) * | | | | | | | |
| A1-LCD | 131 | 298 | 800 | 2.68 | 0.46 | 0.60 | 32029630; Martin et al., 2020 |
| Aβ40 | 40 | 278 | 600 | 3.40 | 0.92 | 0.38 | 31181936; Rezaei-Ghaleh et al., 2019 |
| Ash1 | 83 | 278 | 800 | 9.80 | 1.40 | 1.41 | 27807972; Martin et al., 2016 |
| Beclin1 | 165 | 288 | 800 | 5.37 | 1.03 | 1.14 | 27288992; Yao et al., 2016 |
| CAPRIN1 | 103 | 303 | 600 | 5.34 | 0.88 | 0.72 | 31898464; Wong et al., 2020 |
| CBP-ID4 | 207 | 283 | 700 | 5.45 | 2.55 | 2.01; 1.90 | 29790640; Murrali et al., 2018 |
| GbnD4-DHD | 91 | 280 | 700 | 6.81 | 1.55 | 1.28 | 29309054; Jenner et al., 2018 |
| ERD14 | 185 | 288 | 600 | 3.96 | 0.87 | 0.54 | 21336827; Szalainé Ágoston et al., 2011 |
| ExsE | 88 | 298 | 600 | 3.18 | 0.88 | 0.76 | 22138394; Zheng et al., 2012 |
| FCP1 | 85 | 298 | 500 | 2.94 | 0.54 | 0.43 | 26286791; Lawrence and Showalter, 2012 |
| FUS | 163 | 298 | 850 | 3.48 | 0.51 | 0.54 | 26455390; Burke et al., 2015 |
| GAb1 | 82 | 298 | 500 | 3.99 | 0.88 | 0.89 | 34929201; Gruber et al., 2022 |
| hACTR | 69 | 304 | 600 | 3.26 | 0.47 | 0.49 | 18177052; Ebert et al., 2008 |
| Hahellin | 92 | 298 | 800 | 9.94 | 2.69 | 2.85 | 24671380; Patel et al., 2014 |
| hCSD1 | 141 | 298 | 500 | 3.56 | 0.93 | 0.99 | 18537264; Kiss et al., 2008 |
| HOX-DFD | 90 | 298 | 600 | 6.98 | 3.15 | 1.99 | 30802457; Maiti et al., 2019 |
| hZIP4-ICL2 | 100 | 283 | 800 | 9.54 | 2.37 | 1.58 | 30793391; Bafaro et al., 2019 |
| Jaburetox | 94 | 298 | 800 | 6.01 | 2.30 | 2.27 | 25605001; Lopes et al., 2015 |
| KRS-NT | 72 | 303 | 600 | 3.26 | 0.93 | 0.83 | 24983501; Cho et al., 2014 |
| MBP-xα2 | 70 | 295 | 600 | 3.83 | 0.60 | 0.54 | 25343306; De Avila et al., 2014 |
| MKK4 | 86 | 278 | 850 | 4.49 | 1.42 | 0.63 | 29276882; Delaforge et al., 2018 |
| N-Cby | 63 | 298 | | 4.19 | 1.20 | 1.25 | 21182262; Mokhtarzada et al., 2011 |
| Niv-PNTD | 406 | 288 | 700 | 5.41 | 1.82 | 1.66 | 33177626; Schiavina et al., 2020 |
| NS5A-D2D3 | 268 | 278 | 800 | 8.62 | 3.85 | 2.14 | 26445449; Sólyom et al., 2015 |
| NUPR1 | 93 | 298 | 600 | 2.98 | 0.82 | 0.76 | 31325636; Neira et al., 2019 |
| OPN | 220 | 310 | 800 | 2.59 | 0.82 | 0.54 | 31794728; Mateos et al., 2020 |
| p53TAD | 73 | 298 | 850 | 2.72 | 0.66 | 0.33 | 30240067; Xie et al., 2018 |
| PDEγ | 87 | 298 | | 3.96 | 1.05 | 0.71 | 18230733; Song et al., 2008 |
| PKIα | 75 | 300 | 900 | 3.41 | 0.87 | 0.52 | 32338601; Olivieri et al., 2020 |
| Mev-PNTD | 304 | 298 | 950 | 2.92 | 0.59 | 0.48 | 30140745; Milles et al., 2018 |
| ProTα | 113 | 283 | 800 | 3.40 | 0.56 | 0.43 | 29466338; Borgia et al., 2018 |
| Pup | 64 | 298 | 850 | 2.66 | 0.51 | 0.43 | 30240067; Xie et al., 2018 |
| rmBG21 | 199 | 300 | 600 | 4.06 | 0.90 | 0.63 | 17676872; Ahmed et al., 2007 |
| RPB1 | 201 | 277 | 850 | 6.48 | 1.74 | 1.33 | 28945358; Janke et al., 2018 |
| securin | 202 | 283 | 500 | 5.49 | 1.13 | 1.08 | 19053469; Csizmok et al., 2008 |
| Sev-NT | 124 | 298 | 600 | 3.20 | 1.42 | 0.76; 0.38 | 27112095; Abyzov et al., 2016 |
| Sic1 | 92 | 278 | 500 | 3.34 | 0.59 | 0.48 | 20399186; Mittag et al., 2010 |
| SKIPN | 71 | 298 | | 5.64 | 1.05 | 1.46 | 20007319; Wang et al., 2010 |
| SLBP-NT | 113 | 298 | 600 | 3.96 | 1.40 | 1.61 | 15260482; Thapar et al., 2004 |
| α-synuclein | 140 | 298 | 600 | 2.96 | 0.53 | 0.44 | 30184304; Rezaei-Ghaleh et al., 2018 |
| SOCS5-JIR | 70 | 303 | 800 | 4.32 | 2.36 | 1.91 | 26173083; Chandrashekaran et al., 2015 |
| tau K18 | 129 | 283 | 700 | 4.12 | 0.95 | 0.83 | 23740819; Barré and Eliezer, 2013 |
| TC1 | 106 | 298 | 600 | 4.65 | 1.61 | 1.24 | 23189168; Cino et al., 2012 |
| TDP-43 | 151 | 283 | 500 | 4.07 | 1.51 | 0.96 | 27545621; Conicella et al., 2016 |
| γ-tubulin-CT | 39 | 288 | 500 | 2.23 | 0.35 | 0.27 | 29127738; Harris et al., 2018 |
| Test set (9 IDPs) | | | | | | | |
| AMOTL1 | 207 | 283 | 800 | 8.45 | 2.55 | 2.04 | 35481651; Vogel et al., 2022 |
| CAHS-8 | 233 | 303 | 850 | 4.43 | 3.25 | 2.36; 1.92 | 34750927; Malki et al., 2022 |
| ChiZ | 64 | 298 | 800 | 4.33 | 0.89 | 0.74 | 32585849; Hicks et al., 2020 |
| α-endosulfine | 121 | 298 | 800 | 3.21 | 0.81 | 0.48 | 34346186; Thapa et al., 2022 |
| FtsQ | 99 | 305 | 800 | 6.44 | 3.78 | 2.32; 1.71 | 36959324; Smrt et al., 2023 |
| Pdx1 | 83 | 298 | 500 | 2.98 | 0.70 | 0.76 | 30525611; Cook et al., 2019 |
| synaptobrevin-2 | 96 | 278 | 600 | 5.54 | 1.80 | 0.72 | 30975750; Lakomek et al., 2019 |
| TIA-1 | 91 | 310 | 800 | 4.01 | 0.89 | 0.55 | 36112647; Sekiyama et al., 2022 |
| YAP | 122 | 298 | 800 | 3.19 | 1.44 | 1.23 | 35378854; Feichtinger et al., 2022 |
  1. * For the training set, RMSE is calculated for predictions based on leave-one-out training (using 44 IDPs).

  2. Where two RMSE values are listed, the first is for the SeqDYN prediction and the second is after applying a helix boost.
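The three summary statistics reported per protein in Table 1 (mean R2, standard deviation of R2, and prediction RMSE) can be computed as in the sketch below. It assumes the population form of the standard deviation; whether the authors used the population or sample form is not stated here. The R2 values are made up for illustration.

```python
from math import sqrt
from statistics import fmean, pstdev


def profile_stats(measured):
    """Mean and population standard deviation of a measured R2 profile (s^-1)."""
    return fmean(measured), pstdev(measured)


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted R2 values."""
    if len(measured) != len(predicted):
        raise ValueError("profiles must have the same length")
    return sqrt(fmean((m - p) ** 2 for m, p in zip(measured, predicted)))


if __name__ == "__main__":
    # Hypothetical per-residue R2 values (s^-1).
    measured = [2.5, 3.0, 4.2, 3.8, 2.9]
    predicted = [2.7, 3.1, 3.9, 3.6, 3.0]
    mean_r2, sigma_r2 = profile_stats(measured)
    print(mean_r2, sigma_r2, rmse(measured, predicted))
```

Comparing RMSE with σR2 gives a useful baseline: a flat prediction equal to the mean of the measured profile has an RMSE equal to the population standard deviation, so an RMSE below σR2 indicates that the prediction captures some of the residue-to-residue variation.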

Table 2
RMSEs (s–1) of R2 predictions by SeqDYN and MD for 10 IDPs.
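| IDP name | SeqDYN | MD |
|---|---|---|
| A1-LCD | 0.60* | 0.59 §, |
| Aβ40 | 0.38* | 0.38 § |
| HOX-DFD | 1.99* | 1.40 § |
| α-synuclein | 0.44* | 0.50 § |
| p53TAD | 0.33* | 1.04 ** |
| Pup | 0.43* | 1.00 ** |
| Sev-NT | 0.38*, | 1.10 §,†† |
| tau K18 | 0.83* | 0.80 § |
| ChiZ | 0.74 | 1.40 ‡ ‡ |
| FtsQ | 1.71 §, | 1.70 § § |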
  1. * Based on leave-one-out training (using 44 IDPs).
  2. Helix boost applied.
  3. Based on training by the full training set (45 IDPs).
  4. §
  5. RMSE is scaled down by a factor of 2.39, to correct for the effect of temperature (MD at 288 K; see Figure 3—figure supplement 1C).
  6. **
  7. †† RMSE is scaled down by a factor of 2.99, to correct for the effects of temperature and magnetic field (MD at 274 K and 850 MHz; see Figure 3—figure supplement 1B).
  8. ‡ ‡ Originally calculated in Hicks et al., 2020 with correction in Hicks et al., 2021.
  9. § §

Additional files

Cite this article

Sanbo Qin, Huan-Xiang Zhou (2024) Predicting the sequence-dependent backbone dynamics of intrinsically disordered proteins. eLife 12:RP88958. https://doi.org/10.7554/eLife.88958.3