DIRseq as a method for predicting drug-interacting residues of intrinsically disordered proteins from sequences
eLife Assessment
This important study presents a sequence-based method for predicting drug-interacting residues in intrinsically disordered proteins (IDPs), addressing a significant challenge in understanding small-molecule:IDP interactions. The findings have solid support through examples underscoring the role of aromatic interactions. While predicted binding sites remain coarse, validation was done on a total of 10 IDPs at varying depths. The method builds on the authors' previous work and, with ad hoc modifications, is poised to benefit this emerging field.
https://doi.org/10.7554/eLife.107470.3.sa0
Important: Findings that have theoretical or practical implications beyond a single subfield
Solid: Methods, data and analyses broadly support the claims with only minor weaknesses
During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).
Abstract
Intrinsically disordered proteins (IDPs) are now well-recognized as drug targets. Identifying drug-interacting residues is valuable for both optimizing compounds and elucidating the mechanism of action. Currently, NMR chemical shift perturbation and all-atom molecular dynamics (MD) simulations are the primary tools for this purpose. Here, we present DIRseq, a fast method for predicting drug-interacting residues from the amino-acid sequence. All residues contribute to the propensity of a particular residue to be drug-interacting; the contributing factor of each residue has an amplitude that is determined by its amino-acid type and attenuates with increasing sequence distance from the particular residue. DIRseq predictions match well with drug-interacting residues identified by NMR chemical shift perturbation and other methods, including residues L22WK24 and Q52WFT55 in the tumor suppressor protein p53. These successes augur well for deciphering the sequence code for IDP-drug binding. DIRseq is available as a web server at https://zhougroup-uic.github.io/DIRseq/ and has many applications, such as virtual screening against IDPs and designing IDP fragments for in-depth NMR and MD studies.
Introduction
Intrinsically disordered proteins (IDPs) are now recognized as important drug targets (Joshi and Vendruscolo, 2015; Saurabh et al., 2023; Wang et al., 2023; Uversky, 2024). For structured protein targets, a crucial step in characterizing drug binding is identifying the drug-binding pocket. Although an IDP can be locked into a specific conformation by a drug molecule in rare cases (Peterson et al., 2004), the prevailing scenario is that the protein remains disordered upon drug binding (Woods et al., 2011; Ono et al., 2012; Jin et al., 2013; De Mol et al., 2016; Heller et al., 2020; Iwaya et al., 2020; Zhao et al., 2021; Robustelli et al., 2022; Heller et al., 2024). Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues (Jin et al., 2013; Zhao et al., 2021; Robustelli et al., 2022). Such drug-interacting residues, akin to binding pockets in structured proteins, are key to optimizing compounds (Robustelli et al., 2022; Iconaru et al., 2015; Basu et al., 2023) and elucidating the mechanism of action (Woods et al., 2011; De Mol et al., 2016; Zhao et al., 2021; Iconaru et al., 2015; Basu et al., 2023).
NMR chemical shift perturbation (CSP) is the best experimental method for identifying drug-interacting residues and has been applied to many IDPs (Ono et al., 2012; De Mol et al., 2016; Iwaya et al., 2020; Zhao et al., 2021; Robustelli et al., 2022; Heller et al., 2024; Iconaru et al., 2015; Basu et al., 2023; Hammoudeh et al., 2009; Lamberto et al., 2009; Krishnan et al., 2014; Tatenhorst et al., 2016; Ahmed et al., 2017; Kocis et al., 2017; Neira et al., 2017; Ruan et al., 2021; Iruela et al., 2022; Xu et al., 2022). Aromatic residues are frequently among drug-interacting residues (Zhao et al., 2021; Robustelli et al., 2022; Iconaru et al., 2015; Basu et al., 2023; Lamberto et al., 2009; Tatenhorst et al., 2016; Ruan et al., 2021). This is understandable as drug molecules typically are rich in aromatic rings (Figure 1 and Supplementary file 1A), which can form π-π interactions with aromatic residues. As recently pointed out by Heller et al., 2024, CSPs of IDPs elicited by drug binding can be small enough to fall within the spectral resolution of NMR spectroscopy, therefore making it difficult to unequivocally identify drug-interacting residues. The small magnitude of CSPs may arise because the drug does not have a strong preference for interacting with any residues. Another scenario may be that drug binding induces a conformational shift such as secondary structure formation or even partial folding (De Mol et al., 2016; Iwaya et al., 2020; Robustelli et al., 2022), so CSPs spread from the directly interacting residues to the rest of the IDP, adding to the difficulty in identifying drug-interacting residues.

Four intrinsically disordered proteins (IDPs) and drugs that bind to them.
(a) p27 and SJ403. (b) p21 and SJ403. (c) p53 and epigallocatechin-3-gallate (EGCG). (d) α-Synuclein and Fasudil. For each IDP, the DIRseq propensities are rendered by a color spectrum from yellow for low values to red for high values. Predicted drug-interacting residues are shown with sidechains rendered in stick.
All-atom molecular dynamics (MD) simulations have presented atomic details in many IDP-drug systems (Jin et al., 2013; Heller et al., 2020; Zhao et al., 2021; Robustelli et al., 2022; Basu et al., 2023; Ruan et al., 2021; Yu et al., 2016; Herrera-Nieto et al., 2020; Wang et al., 2022; Zhu et al., 2022; Mehta and Goyal, 2024). These simulation studies have highlighted the frequent engagement of drug molecules with aromatic residues, particularly in π-π interactions (Zhao et al., 2021; Robustelli et al., 2022; Basu et al., 2023; Ruan et al., 2021; Herrera-Nieto et al., 2020; Wang et al., 2022; Zhu et al., 2022; Mehta and Goyal, 2024). MD simulations also revealed the emergence of a compact subpopulation of the N-terminal disordered region of the tumor suppressor protein p53 upon binding the drug epigallocatechin-3-gallate (EGCG) (Zhao et al., 2021) and reduced solvent exposure of hydrophobic residues in the 42-residue amyloid-β peptide (Aβ42) upon binding the drug 10074-G5 (Heller et al., 2020). However, MD simulations of IDPs still suffer from the perennial issues of inaccurate force fields and insufficient sampling, which are exacerbated when drug molecules are included. For example, two replicate simulations may identify different drug-interacting residues (Lama et al., 2024).
Here, we present a sequence-based method, DIRseq, as a complement to NMR CSP and MD simulations. This method was motivated by our observation that drug-interacting residues seem to overlap with residues exhibiting elevated transverse relaxation rates (R2) (Qin and Zhou, 2024; Muhammedkutty et al., 2025), as exemplified by the C-terminal 20 residues of α-synuclein (Robustelli et al., 2022). Elevated R2 in IDPs is caused by either local inter-residue interactions or residual secondary structures (Dey et al., 2022). Based on this understanding, we developed a sequence-based method, SeqDYN, to predict R2 of IDPs (Qin and Zhou, 2024). As suggested previously (Muhammedkutty et al., 2025), the propensities of residues to form intramolecular interactions and therefore elevate R2 should be similar to those for forming intermolecular interactions with drug molecules. DIRseq is therefore an adaptation of SeqDYN. DIRseq predictions match well with drug-interacting residues identified by NMR CSP and other methods, including residues L22WK24 and Q52WFT55 in p53 and C-terminal residues in α-synuclein. DIRseq is available as a web server at https://zhougroup-uic.github.io/DIRseq/ and has many applications.
Results
Retooling of SeqDYN into DIRseq
SeqDYN (Qin and Zhou, 2024) predicts the R2 value of residue n by accumulating a contributing factor, f(i; n), from each residue i:
$$R_2(n) = \Upsilon \prod_{i=1}^{N} f(i; n)$$
where N is the total number of residues in the IDP and Υ is a uniform scale parameter. The contributing factor has an amplitude, q(i), that is determined by the amino-acid type of residue i and attenuates with increasing sequence distance, s = |i − n|, from residue n:
$$f(i; n) = 1 + \frac{q(i) - 1}{1 + b s^{2}}$$
The 20 q parameters (one for each amino-acid type) and b were trained on the measured R2 values for 45 IDPs. The original q values (Figure 1—figure supplement 1a) show that aromatic (Trp, Tyr, His, and Phe), Arg, and long aliphatic (Ile and Leu) amino acids are interaction-prone and tend to elevate R2. (The original SeqDYN also has an option of applying a helix boost; we did not apply this boost here.) We transformed SeqDYN into DIRseq by implementing four changes.
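As a concrete reading of the accumulation described above, the following Python sketch computes a profile as a product of per-residue factors. It assumes the Lorentzian attenuation 1/(1 + b s^2) of SeqDYN (Qin and Zhou, 2024); the amplitudes in `Q_DEMO` are made-up placeholders rather than the trained values, and the function names are hypothetical.

```python
# Sketch of a SeqDYN-style profile. Q_DEMO holds illustrative placeholder
# amplitudes, NOT the trained parameters of Qin and Zhou (2024).
Q_DEMO = {"W": 1.30, "Y": 1.20, "F": 1.20, "R": 1.15, "G": 0.90}
Q_DEFAULT = 1.00  # placeholder amplitude for residue types not listed above


def contributing_factor(q_i, s, b):
    """Factor of one residue at sequence distance s: 1 + (q - 1) / (1 + b*s^2)."""
    return 1.0 + (q_i - 1.0) / (1.0 + b * s * s)


def profile(seq, b=0.0316, scale=1.0):
    """Return scale * prod_i f(i; n) for every position n of seq."""
    values = []
    for n in range(len(seq)):
        prod = scale
        for i, aa in enumerate(seq):
            q_i = Q_DEMO.get(aa, Q_DEFAULT)
            prod *= contributing_factor(q_i, abs(i - n), b)
        values.append(prod)
    return values


print(profile("AAWAA", b=0.3))
```

With all amplitudes equal to 1 the profile is flat; a single high-amplitude residue raises the values of its neighbors, with the reach set by b.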
First, we reassigned four of the amino-acid amplitude parameters (q). We lowered the q values of the three long aliphatic amino acids, Leu, Ile, and Met, to the q value of a short aliphatic amino acid, Val, because long aliphatic amino acids primarily participate in hydrophobic interactions, which may be less important for stabilizing the binding of a small molecule in sites largely exposed to water. At the same time, we increased the q value of Asp to be the same as that of Glu, to increase the role of Asp’s electrostatic interactions. Both the downgrade of hydrophobic interactions and the upgrade of electrostatic interactions were motivated by observations on drug binding to α-synuclein in MD simulations by Robustelli et al., 2022. The modified set of q parameters is displayed in Figure 1—figure supplement 1b.
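The four reassignments amount to a small table update, sketched below. The input is any mapping from one-letter amino-acid codes to amplitudes; the numbers in `demo` are placeholders chosen for illustration, not the published values, and the function name is hypothetical.

```python
def retool_amplitudes(q):
    """Return a copy of q with the four reassignments described above."""
    new_q = dict(q)
    for aa in ("L", "I", "M"):  # long aliphatic residues take the Val value
        new_q[aa] = q["V"]
    new_q["D"] = q["E"]  # Asp takes the Glu value
    return new_q


# Placeholder amplitudes for illustration only
demo = {"L": 1.10, "I": 1.12, "M": 1.05, "V": 0.98, "D": 0.90, "E": 0.95, "W": 1.30}
print(retool_amplitudes(demo))
```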
Second, the original value of the attenuation parameter b, 0.0316, corresponds to a correlation half-length, $b^{-1/2}$, of 5.6 residues. Given the small size of drug molecules, we increased the b value to 0.3, corresponding to a correlation half-length of 1.8 residues. That is, we expect a drug molecule to interact with 3–4 residues at a time. Third, the SeqDYN-predicted R2 profile, capturing experimental observations (Klein-Seetharaman et al., 2002), falls off at both termini, because no residues beyond the termini are present to provide a contributing factor to R2. However, whereas intra-IDP interactions experience such a terminal effect, IDP-drug interactions do not. To eliminate the terminal effect, we padded the original IDP sequence by a stretch of 12 Gln residues at each terminus. Gln was selected because its q value is at the middle of the 20 q values (Figure 1—figure supplement 1b).
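The two half-lengths quoted above follow from the attenuation parameter under the assumption that the attenuation is 1/(1 + b s^2), which falls to one half at a sequence distance of b^(-1/2). The sketch below checks that arithmetic and shows the Gln padding; the function names are hypothetical, and dropping the padded positions from the output is an assumption.

```python
import math


def half_length(b):
    """Distance at which 1 / (1 + b*s^2) drops to one half, i.e. b ** -0.5."""
    return 1.0 / math.sqrt(b)


def pad_with_gln(seq, n_pad=12):
    """Pad both termini with Gln (Q) so terminal residues have neighbors."""
    return "Q" * n_pad + seq + "Q" * n_pad


def unpad(values, n_pad=12):
    """Drop the values computed for the padding residues."""
    return values[n_pad:len(values) - n_pad]


print(round(half_length(0.0316), 1))  # 5.6
print(round(half_length(0.3), 1))     # 1.8
```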
Lastly, we converted high and low R2 values into high and low drug-interacting propensities, respectively, using a sigmoid function in which the midpoint and width of the transition region are determined by the mean (m) and standard deviation (SD) of the R2 values over the entire sequence. Specifically, a threshold multiplier places the midpoint at m plus that multiple of SD, and a sharpness parameter sets the width of the transition region. We chose the threshold multiplier to be 1.5 and the sharpness parameter to be 14.0. On the DIRseq web server, users can either keep the default values for b and these two sigmoid parameters, or enter values of their choice. The source code has been deposited on GitHub under the file name DIRseq.js (https://github.com/hzhou43/DIRseq/; Zhou, 2025). The drug-interacting propensities range from 0 to 100; the propensity is 50 when R2 is at the ‘threshold’ value m+1.5 SD. Residues with propensities above 50 are predicted to be drug-interacting, but an isolated residue with a propensity above 50 is discarded, as it can be reasonably expected that interactions with at least two consecutive (or nearby) residues are needed to generate sufficient drug binding stability.
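The conversion and the removal of isolated hits can be sketched as follows. This is not the code in DIRseq.js: the logistic form, the way the sharpness parameter enters (transition width taken as SD divided by the sharpness), the use of the population SD, and the `max_gap` definition of "nearby" are all assumptions made for illustration, as are the function names.

```python
import math
import statistics


def propensities(scores, threshold_mult=1.5, sharpness=14.0):
    """Map raw scores to 0-100 with a logistic centred at mean + threshold_mult * SD.

    Assumption: the transition width is SD / sharpness; the expression used
    in DIRseq.js may differ.
    """
    m = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population SD, an assumption
    if sd == 0.0:
        raise ValueError("flat profile: the threshold is undefined")
    mid = m + threshold_mult * sd
    out = []
    for v in scores:
        x = sharpness * (v - mid) / sd
        x = max(-700.0, min(700.0, x))  # keep math.exp within float range
        out.append(100.0 / (1.0 + math.exp(-x)))
    return out


def predicted_residues(p, cutoff=50.0, max_gap=2):
    """Indices with p above cutoff, dropping hits with no other hit within max_gap."""
    hits = [i for i, v in enumerate(p) if v > cutoff]
    return [i for i in hits if any(j != i and abs(j - i) <= max_gap for j in hits)]


print(predicted_residues([0, 60, 70, 0, 0, 0, 80, 0]))  # [1, 2]
```

In the usage line, the hit at index 6 has no other hit within two positions and is therefore discarded.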
Detailed assessment of DIRseq on four IDPs
Drug-interacting residues in four IDPs or intrinsically disordered regions (IDRs) have been thoroughly characterized by NMR CSP. In Figure 1 (left images), we show these IDPs in a single conformation, with DIRseq propensities rendered in a color spectrum (from low in yellow to high in red); predicted drug-interacting residues (i.e., those with propensities above 50) are shown with sidechains in stick. Drug molecules that bind to these IDPs are also displayed in Figure 1 (right images). We display NMR CSPs as blue bars and DIRseq propensities as a red curve in Figure 2. Unless otherwise noted, we use m+1.5 SD as the threshold for identifying drug-interacting residues, for both CSP and DIRseq; this threshold is indicated by a horizontal dashed line in Figure 2. Experimentally identified drug-interacting residues are further indicated by cyan shading. Below, we present a detailed comparison between CSP and DIRseq.
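The m+1.5 SD criterion applied to a CSP profile can be sketched as below. Whether the population or sample standard deviation is used is not stated here, so the population form is an assumption, as are the function name and the toy CSP values.

```python
import statistics


def above_threshold(csp, k=1.5):
    """Return indices whose CSP exceeds mean + k * SD of the whole profile."""
    m = statistics.fmean(csp)
    sd = statistics.pstdev(csp)  # population SD, an assumption
    cut = m + k * sd
    return [i for i, v in enumerate(csp) if v > cut]


# Toy CSP values (arbitrary units), for illustration only
toy_csp = [1, 1, 1, 1, 1, 1, 1, 1, 1, 10]
print(above_threshold(toy_csp))  # [9]
```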

Comparison of DIRseq propensities with NMR chemical shift perturbations (CSPs).
(a) p27. (b) p21. (c) p53. (d) α-Synuclein. CSPs are displayed as blue bars and in units of ppb; CSP-identified drug-interacting residues are indicated by cyan shading. DIRseq predictions are shown as red curves. The ordinate scales are chosen so that the m+1.5 SD threshold for CSP is at the same height as the 50% threshold for DIRseq, indicated by a horizontal dashed line.
Figure 2—source data 1. Source data for Figure 2: https://cdn.elifesciences.org/articles/107470/elife-107470-fig2-data1-v1.xlsx
The kinase inhibitory domain (residues 22–105) of the cell cycle regulator p27 harbors three aromatic-centered motifs that interact with a group of compounds represented by SJ403 (Figure 1a), as found by Iconaru et al., 2015. These motifs, W60N61, E75WQ77, and Y88Y89 (Figure 2a, cyan shading), are correctly predicted by DIRseq (Figure 2a, portions of the red curve above the 50% threshold). Specifically, the predicted drug-interacting residues are R58KWN61, Y74EWQ77, and Y88Y89 (Figure 1a, left image). Moreover, the DIRseq propensity profile matches well the CSP profile along the sequence (Figure 2a). The two parameters show a very strong correlation, with a Pearson correlation coefficient (r) of 0.82 (Figure 2—figure supplement 1a).
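For reference, the Pearson correlation coefficient between a CSP profile and a propensity profile can be computed as in the sketch below. The per-residue values are toy numbers, not the p27 data, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy per-residue values, for illustration only
csp = [2.0, 3.0, 9.0, 8.0, 2.5]
propensity = [5.0, 10.0, 80.0, 60.0, 8.0]
print(round(pearson_r(csp, propensity), 2))
```

The function divides by zero if either profile is constant, so a flat profile needs separate handling.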
Iconaru et al. also studied SJ403 binding to a related cell cycle regulator p21. Their CSP data identified the first two corresponding motifs, W49N50 and F63A64, but not the third as drug-interacting residues (Figure 2b, cyan shading). DIRseq correctly predicts both the first two motifs above the 50% threshold and the third motif below the threshold (Figure 2b, red curve; Figure 1b, left image). As noted by Iconaru et al., the p21 counterpart to the p27 third motif F87YY89 is L76YL78; the two Leu residues in p21 are no match for F87 and Y89 in p27. For the DIRseq method, Leu has a lower value than Tyr and Phe (Figure 1—figure supplement 1b). CSP and DIRseq propensity show a strong correlation (r=0.66; Figure 2—figure supplement 1b).
NMR CSP revealed two motifs, W23K24 and D49IEQWFT55, in the N-terminal region of p53 as interacting with EGCG (Zhao et al., 2021; Figure 2c, cyan shading). MD simulations supported these drug-binding sites. DIRseq again correctly predicts these two motifs (Figure 2c, red curve). Specifically, the predicted drug-interacting motifs are L22WK24 and E51QWFT55 (Figure 1c, left image). Here also CSP and DIRseq propensity show a strong correlation (r=0.67; Figure 2—figure supplement 1c).
Using the m+1.5 SD threshold, CSPs of Robustelli et al., 2022 select residues A124YEMPS129 and Y133QDYE137 of α-synuclein as interacting with the drug Fasudil (Figure 2d, cyan shading). These two C-terminal motifs are also identified by DIRseq, with residues A124YE126 and Y133QDYEP138 above the 50% threshold (Figure 2d, red curve; Figure 1d, left image). There is a moderate correlation (r=0.51; Figure 2—figure supplement 1d) between CSP and DIRseq propensity. Robustelli et al. further characterized the IDP-drug interactions by all-atom simulations, using both full-length α-synuclein and a C-terminal fragment (residues 121–140). Further simulations of the C-terminal fragment binding with additional compounds led to a compound known as Ligand 47 with a somewhat higher affinity than Fasudil. The CSP profiles elicited by Fasudil and Ligand 47 are similar, though with larger amplitudes around Y125 and Y136 by the latter; their correlation coefficient is 0.78 (Figure 2—figure supplement 2a). Because DIRseq does not consider any information about the drug molecule, this correlation coefficient between CSPs elicited by two different compounds might be viewed as an upper bound of what can be achieved for the correlation between CSP and DIRseq propensity.
Dependences of DIRseq prediction accuracies on model parameters
We now use the CSP data of the above four IDPs to assess the dependences of DIRseq prediction accuracies on model parameters (Supplementary file 1B). We use two complementary measures for accuracy: the Pearson correlation between CSP and drug-interacting propensity and the difference between true positive (TP) and false positive (FP). The four IDPs have a total of 31 CSP-identified drug-interacting residues. Recall that DIRseq uses a sigmoid function to convert the output of SeqDYN into drug-interacting propensities. Using the original 20 q parameters along with the present values for b and the two sigmoid parameters, the predictions are already reasonable: the correlations for the four IDPs range from 0.46 to 0.79 (sum = 2.43) and TP outnumbers FP by 7 residues. Indeed, this initial success was anticipated (Qin and Zhou, 2024; Muhammedkutty et al., 2025) and validated the premise of DIRseq, i.e., intra-IDP interactions that elevate R2 also tend to mediate drug interactions. As stated above, we tweaked 4 of the 20 q parameters to arrive at the final DIRseq; now the r sum increases to 2.66 and TP – FP increases to 15.
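Both accuracy measures are simple to compute. The sketch below sums the four correlation coefficients reported above and counts TP and FP for p27 from the motifs listed earlier (CSP: W60N61, E75WQ77, Y88Y89; DIRseq: R58KWN61, Y74EWQ77, Y88Y89). The function name is hypothetical, and treating every predicted residue outside the CSP set as an FP is an assumed reading of the definition.

```python
def tp_fp(predicted, observed):
    """Count true and false positives among predicted residue numbers."""
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    return tp, fp


# p27 residue numbers taken from the motifs quoted in the text
csp_p27 = [60, 61, 75, 76, 77, 88, 89]
dirseq_p27 = [58, 59, 60, 61, 74, 75, 76, 77, 88, 89]
tp, fp = tp_fp(dirseq_p27, csp_p27)
print(tp, fp, tp - fp)  # 7 3 4

# Sum of the correlation coefficients reported for p27, p21, p53, and alpha-synuclein
r_sum = round(0.82 + 0.66 + 0.67 + 0.51, 2)
print(r_sum)  # 2.66
```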
Examining the four parameter changes one at a time, the downgrade of the amplitude parameter q for Leu increases both r sum (to 2.53) and TP – FP (to 16) relative to the counterparts with the original SeqDYN parameters. In comparison, the downgrade of q for Ile does not affect r sum and yields a small increase in TP – FP (to 10), the upgrade of q for Asp yields small increases in both r sum (to 2.52) and TP – FP (to 8), but the downgrade of q for Met actually decreases both r sum (to 2.37) and TP – FP (to 6). The latter result shows that we made these parameter tweaks not solely for increasing accuracy but for physical reasons, i.e., to reduce the role of hydrophobic interactions but elevate the role of electrostatic interactions in IDP-drug binding, as suggested by MD simulations.
We also tested alternative models for the amino-acid amplitude (q) parameters. Given the prominence of aromatic amino acids in drug binding as revealed by CSP, we wondered how a model that solely emphasizes aromatic amino acids would perform. To that end, we tested a model with only two different q values for the 20 amino acids: a high value (same as that of Trp in SeqDYN) for all three aromatic amino acids (Trp, Tyr, and Phe) and a low value (same as that of Gly in SeqDYN) for all other amino acids. This is similar to a sticker-spacer model for simulating liquid-liquid phase separation (Martin et al., 2020). This aromatic model achieves the same r sum, 2.66, as DIRseq, but its FP outnumbers its TP, such that TP – FP = –1. Tesei and Lindorff-Larsen, 2022 parameterized a coarse-grained force field to simulate liquid-liquid phase separation. Their ‘stickiness’ parameters have a good correlation with the SeqDYN original q parameters (Zhou et al., 2024). We replaced the q parameters by these stickiness parameters (after scaling to the same numerical range as the q parameters); the resulting model has both a low r sum (1.21) and a low TP – FP (–21). This outcome suggests that, unlike liquid-liquid phase separation where stickiness is the main drive, drug binding is more selective in the type of intermolecular interactions. Lastly, we tested a model where an average hydropathy scale (compiled by Tesei et al., 2021) was used in place of the q parameters; the resulting model has very little predictive value (r sum = 0.44 and TP – FP = –20). This last outcome is in line with our downgrade of hydrophobic interactions in DIRseq. The test results from all these alternative models indicate that the q parameters in the final DIRseq model capture the appropriate balance among aromatic, electrostatic, and hydrophobic interactions in IDP-drug binding.
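The two-value aromatic model described above can be written as a lookup table, as in the sketch below. The `high` and `low` numbers are placeholders standing in for the SeqDYN values of Trp and Gly, which are not listed here, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AROMATIC = frozenset("WYF")  # Trp, Tyr, Phe; His is not included in this model


def two_value_table(high, low):
    """Assign `high` to the three aromatic residues and `low` to all others."""
    return {aa: (high if aa in AROMATIC else low) for aa in AMINO_ACIDS}


# Placeholder amplitudes; the text uses the SeqDYN values of Trp and Gly
table = two_value_table(high=1.3, low=0.9)
print(table["W"], table["L"])
```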
In addition to the 20 q parameters, DIRseq has 3 other parameters. The b parameter determines the number of residues that simultaneously interact with a drug molecule. In Supplementary file 1, we list the performance measures when b is varied. These results show that our final choice, 0.3, for b is optimal for both r sum and TP – FP. The corresponding number of residues, 3–4, that simultaneously interact with a drug molecule fits with the narrow range of drug molecule sizes in the present study (Supplementary file 1A; molecular weights: 360±130 Da). The last two parameters, a threshold multiplier and a sharpness parameter, are in the sigmoid function that converts the SeqDYN output into drug-interacting propensities. The threshold multiplier sets the threshold for labeling a residue as drug-interacting. Again, our final choice, 1.5, corresponding to m+1.5 SD for the threshold, achieves an optimum for r sum and TP – FP. A lower threshold leads to a high FP and also a slight deterioration in r sum, whereas a higher threshold leads to a low FP and possibly a minuscule improvement in r sum. The sharpness parameter controls the sharpness of the sigmoid function in the transition region and affects r sum but not TP – FP. r sum increases with increasing sharpness; we chose a value, 14.0, at which r sum is nearly at the saturation level.
DIRseq as a complement to CSP for assigning drug-interacting residues
After assessing the achievable accuracy of DIRseq, we now consider an application: its combination with CSP to make robust assignments of drug-interacting residues. We note that the above CSP-identified drug-interacting motifs are all anchored on one or more aromatic residues, and this feature likely contributes to the good performance of DIRseq. When a clear ‘aromatic signal’ is not present, CSP or DIRseq alone may not be able to conclusively identify drug-interacting residues. However, a consensus identification by CSP and DIRseq may be reliable. MD simulations have played such a complementary role to CSP in several studies (Zhao et al., 2021; Robustelli et al., 2022; Basu et al., 2023; Zhu et al., 2022).
De Mol et al., 2016 reported the CSPs of an IDR called AF-1* (residues 142–448) in the androgen receptor elicited by EPI-001 and its stereoisomers, including EPI-002; small but reproducible CSPs were found in three subregions, R1 (residues 341–371), R2 (residues 391–414), and R3 (residues 426–446). Conversely, AF-1* caused changes in the 1H NMR spectrum of EPI-001, but the individual peptides corresponding to R1, R2, and R3 did not, suggesting that, rather than separately interacting with the three subregions, EPI-001 induces a partial folding of the three subregions. MD simulations captured the partial folding of the R2-R3 fragment (residues 391–446) induced by EPI-002 and underscored the importance of aromatic residues in drug binding (Zhu et al., 2022). Basu et al., 2023 reported the CSPs of the transactivation unit 5 (Tau-5*, residues 336–448) in AF-1* by EPI-001 and a more potent variant, 1aa (Figure 3a, Figure 2—figure supplement 2b). The contact probabilities of the R2-R3 fragment with 1aa in MD simulations reaffirmed the importance of aromatic residues.

Drug-binding sites identified by combining chemical shift perturbation (CSP) or mutation data with DIRseq predictions.
(a) Tau-5*. (b) NS5A-D2D3. (c) β2 microglobulin. Display items in panels (a, b) have the same meanings as in Figure 2, except that cyan shading indicates consensus identification; in panel (c), vertical lines indicate mutation sites.
Figure 3—source data 1. Source data for Figure 3: https://cdn.elifesciences.org/articles/107470/elife-107470-fig3-data1-v1.xlsx
De Mol et al., 2016 and Basu et al., 2023 were careful not to name any drug-interacting residues based on CSPs. In Figure 3a, we compare the 1aa-elicited CSPs and DIRseq propensities of Tau-5*. In agreement with both NMR studies, DIRseq identifies drug-interacting residues in the middle of each of R1-R3: R360DYY363, A396WAA399, and W433H434. In addition, the latter two motifs showed the highest drug-contact probabilities in separate MD simulations of the R2-R3 fragment (Basu et al., 2023; Zhu et al., 2022). For the R2 and R3 subregions, CSPs above the m+1.5 SD threshold are observed at residues downstream of the DIRseq identifications, so we propose to expand the drug-interacting motifs to A396WAAAAAQ403 and W433HTLF437. The three putative drug-interacting motifs are indicated as cyan shading in Figure 3a.
Heller et al., 2024 used 19F transverse relaxation measurements to determine the binding affinity of the disordered domains 2 and 3 of the hepatitis C virus NS5A protein (NS5A-D2D3, residues 247–466) for 5-fluoroindole. They also measured 1H-15N CSPs at two ligand concentrations but described them as ‘nearly undetectable’ (Figure 3b). We speculate that the small CSPs may be due to the small size of the ligand, making it difficult to interact with multiple residues simultaneously and thus achieve sufficient binding stability. In any event, CSPs above the m+1.5 SD threshold were largely isolated (and thus appeared to be random), and there was not much overlap between residues having these above-the-threshold CSPs at the two ligand concentrations. The one exception is a motif around W312, which had above-the-threshold CSPs at both ligand concentrations, and nearby residues A308 and A313 also had above-the-threshold CSPs at one of the ligand concentrations. DIRseq predicts the motif P310AWARPD316 as drug-interacting residues. We propose the expanded motif, A308LPAWARPD316, as residues interacting with 5-fluoroindole (Figure 3b, cyan shading). DIRseq also predicts two more motifs, E323SWRRPDY330 and R352RRR355, as drug-interacting residues, which remain to be tested.
The fibrillation of acid-denatured β2 microglobulin is inhibited by rifamycin SV (Woods et al., 2011). An aromatic-rich motif, W60SFYLLYYTEF70, was implicated in the nucleation of fibrillation and also involved in ligand binding, as a triple mutation, F62A/Y63A/Y67A, significantly weakened binding. Low intensities of NMR peaks from residues 58–79 (possibly due to the formation of residual structures) prevented the measurement of CSPs. DIRseq predicts D59WSFY63 as drug-interacting residues. Combined with the mutational data of Woods et al., 2011, we propose the expanded motif, D59WSFYLLYY67, as the major drug-binding site (Figure 3c, cyan shading). DIRseq also predicts an additional motif, K94WDR97, as drug-interacting residues.
The aggregation of human islet amyloid polypeptide (hIAPP; 37 residues) is inhibited by the small molecule YX-I-1 (Xu et al., 2022). CSPs elicited by this molecule were small (Figure 4a). In addition, CSPs of short IDPs may not exhibit strong disparities because amino acids may be too well mixed along the sequence or drug binding may induce a conformational shift. We thus reduce the threshold for identifying drug-interacting residues to m+1.0 SD when the number of residues is ≤50. With this threshold, three residues are identified by CSP as drug-interacting residues: R11 and V17H18. In comparison, DIRseq identifies T9QRLA13 and F15 as drug-interacting residues, which partially overlap with the CSP identifications. Combining the two types of data, we propose T9QRLANFLVH18 as the primary drug-binding site (Figure 4a, cyan shading). We note that this motif is also prone to α-helix formation (Apostolidou et al., 2008).
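The length-dependent threshold and the merging of CSP and DIRseq identifications can be sketched as below. The merging rule (take the union and join residues at most two positions apart) is a hypothetical formalization; the proposed sites in the text were assigned by inspection of the data. The hIAPP residue numbers are those given above.

```python
def threshold_multiplier(n_residues):
    """Use m + 1.0 SD for sequences of 50 residues or fewer, m + 1.5 SD otherwise."""
    return 1.0 if n_residues <= 50 else 1.5


def merge_sites(dirseq_hits, csp_hits, max_gap=2):
    """Union the two hit lists and join residues at most max_gap positions apart."""
    residues = sorted(set(dirseq_hits) | set(csp_hits))
    spans = []
    for r in residues:
        if spans and r - spans[-1][1] <= max_gap:
            spans[-1][1] = r  # extend the current span
        else:
            spans.append([r, r])  # start a new span
    return [tuple(s) for s in spans]


# hIAPP: DIRseq identifies T9-A13 and F15; CSP identifies R11 and V17-H18
dirseq_hiapp = [9, 10, 11, 12, 13, 15]
csp_hiapp = [11, 17, 18]
print(threshold_multiplier(37))             # 1.0
print(merge_sites(dirseq_hiapp, csp_hiapp))  # [(9, 18)]
```

Under this rule the hIAPP identifications merge into a single span covering residues 9 to 18, matching the proposed site T9QRLANFLVH18.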

Drug-binding sites identified by combining chemical shift perturbation (CSP) or mutation data with DIRseq predictions.
(a) hIAPP. (b) Aβ42. (c) c-Myc. Display items have the same meanings as in Figure 2, but with the following exceptions. (1) In panel (b), vertical lines indicate residues with prominent CSPs; those accompanied by NMR peak broadening have their vertical lines in dark color. (2) In panel (c), three CSP-identified drug-interacting regions are indicated by cyan, olive, and yellow shading. (3) The threshold for identifying drug-interacting residues is lowered to m+1.0 SD.
Figure 4—source data 1. Source data for Figure 4: https://cdn.elifesciences.org/articles/107470/elife-107470-fig4-data1-v1.xlsx
Ono et al., 2012 acquired 1H-15N heteronuclear single quantum coherence spectra of the 42-residue amyloid-β (Aβ42) in the absence and presence of the oligomerization-blocking compound myricetin. CSPs were not calculated, but chemical shift movements were most pronounced at R5, V12H13, K16LVFFAED23, and I31I32 (vertical lines in Figure 4b). In addition, NMR cross peaks suffered broadening upon ligand binding at four of these residues: R5, V12, K16, and V18 (dark vertical lines in Figure 4b), implicating elevated probabilities of ligand interactions. DIRseq predicts E3FRH6 and H13H14 as drug-interacting residues. Combined with the NMR data, we propose E3FRH6 and V12HHQKLV18 as the primary ligand-binding sites (Figure 4b, cyan shading). Aβ42 CSPs elicited by several other compounds were also widely distributed over the sequence, such that Iwaya et al., 2020 ‘failed to identify’ drug-interacting residues, implicating a conformational shift.
Hammoudeh et al., 2009 identified three distinct drug binding sites in a C-terminal IDR (residues 363–412) of the oncoprotein c-Myc. This region is disordered on its own but forms a helix-loop-helix structure upon heterodimerization with Max. These authors measured the CSPs of three overlapping fragments (residues 363–381, 370–409, and 402–412), each in the presence of a single compound and of the full IDR in the presence of all three compounds. Using 20 parts per billion (ppb) as the threshold, Hammoudeh et al. named residues R366RNELKRSFF375, F375ALRDQIPELE385, and Y402ILSVQAE409 as the binding sites for 10074-G5, 10074-A4, and 10058-F4, respectively. We present their CSP data for the full IDR in Figure 4c. Using the m+1.0 SD threshold, only six residues are identified as drug-interacting residues: R367, L370, F374F375, A399, and S405. DIRseq predicts two motifs, E363RQRR367 and K371RSFFA376, that overlap with the first two sites identified by CSP; a third motif, K397KATAY402, that corresponds to the third CSP-identified site has moderate drug-interacting propensities. Combining CSP and DIRseq, we revise the three drug-binding sites to be E363RQRRNELKRSFF375, S373FFALRDQI381, and K397KATAYILS405 (Figure 4c, cyan, olive, and yellow shading, respectively). In addition to 10074-G5, 10074-A4, and 10058-F4, many compounds bind to these three sites (Yu et al., 2016; Han et al., 2019; Shirey et al., 2021; Li et al., 2024).
Discussion
We have presented the first sequence-based method, DIRseq, to predict drug-interacting residues of IDPs. Assessment against NMR CSP demonstrates the accuracy of DIRseq. Drug-binding motifs are anchored on one or more aromatic residues for forming π-π interactions with drug-like molecules that are rich in aromatic groups. The success of DIRseq comes without any specific information on the drug molecules, suggesting that IDPs may have a relatively simple sequence code for drug binding.
The notion that drug-interacting residues may be agnostic to the molecular details of drug compounds is supported by the fact that the same drug can bind to different IDPs. For example, in addition to p53 (Zhao et al., 2021; Figure 2c), the polyphenol EGCG also binds to many other IDPs, including α-synuclein (Ehrnhoefer et al., 2008), hIAPP (Young et al., 2014; Meng et al., 2010), Aβ40 (Ahmed et al., 2017) and Aβ42 (Ehrnhoefer et al., 2008), tau (Wobst et al., 2015), and merozoite surface protein 2 (Chandrashekaran et al., 2010). Likewise, 10074-G5 binds to c-Myc (Hammoudeh et al., 2009; Figure 4c) but also to Aβ42 (Heller et al., 2020). On the other hand, c-Myc represents a case where different compounds bind to distinct sites on a single IDP (Hammoudeh et al., 2009). A related example is presented by p27, where SJ403 typifies a group of compounds that share the same three binding sites (Figure 2a). Another group of compounds, typified by SJ710, binds only to the third site. Chemically, the presence of nitrogen atoms in the rings of SJ403 enhances its aromaticity and thus strengthens π-π interactions; in addition, the electronegative groups of SJ403 project in different directions, making it less restricted when forming electrostatic interactions (Figure 2—figure supplement 3). These features may explain why SJ403 can bind to all three sites, whereas SJ710 can bind only to the third site, F87YYR90, where three consecutive aromatic residues followed by a basic residue ensure that SJ710 can form both π-π and electrostatic interactions. When more data on multiple drugs binding to a single IDP become available, it will be important to use such data to train the next generation of DIRseq, in which the parameters are drug-specific. As a simple example, the number of residues that can simultaneously bind a drug molecule may grow with the latter’s size; this dependence can be modeled by making the relevant parameter dependent on drug molecule size. The drug molecules studied in the present work have molecular weights of 360±130 Da and thus span a relatively narrow size range.
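The idea of a size-dependent parameter can be illustrated with a minimal sketch. It follows only the qualitative description given in the Abstract (each residue contributes with an amplitude set by its amino-acid type, attenuating with sequence distance). Everything specific here is an assumption for illustration: the Lorentzian form of the attenuation, the amplitude values, the name `b` for the attenuation parameter, and the rule that `b` scales inversely with molecular weight. None of these are the fitted DIRseq parameters or the authors' implementation.

```python
# Hypothetical amplitudes for illustration only; NOT the fitted DIRseq values.
AMPLITUDE = {"W": 1.0, "F": 0.8, "Y": 0.8, "R": 0.5, "K": 0.4, "L": 0.4}
DEFAULT_AMPLITUDE = 0.1


def attenuation_rate(mol_weight_da, b_ref=0.3, mw_ref=360.0):
    """Assumed rule: a larger drug contacts more residues at once,
    so contributions attenuate more slowly (smaller b).
    b_ref is a made-up value; mw_ref is the mean molecular weight
    quoted in the text (360 Da)."""
    return b_ref * mw_ref / mol_weight_da


def propensity(sequence, mol_weight_da=360.0):
    """Per-residue score: sum of contributions from all residues,
    each with a type-dependent amplitude and an assumed Lorentzian
    attenuation in sequence distance."""
    b = attenuation_rate(mol_weight_da)
    scores = []
    for i in range(len(sequence)):
        total = 0.0
        for j, aa in enumerate(sequence):
            amp = AMPLITUDE.get(aa, DEFAULT_AMPLITUDE)
            total += amp / (1.0 + b * (i - j) ** 2)
        scores.append(total)
    return scores


if __name__ == "__main__":
    seq = "GGGGWGGGG"  # toy sequence with a single aromatic residue
    for mw in (360.0, 720.0):
        print(mw, [round(s, 2) for s in propensity(seq, mw)])
```

In this toy model, doubling the molecular weight halves `b`, which broadens the peak around the aromatic residue; a drug-specific DIRseq would presumably fit such a dependence to data rather than assume it.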
We have illustrated the combination of DIRseq with NMR CSP to make robust identifications of drug-binding sites in IDPs. Indeed, CSP, MD simulations, and DIRseq are three orthogonal approaches that have great potential in complementing each other, not only for identifying drug-binding sites but also for elucidating the roles of amino acids, their sequence context, and different types of noncovalent interactions in forming such sites. DIRseq is fast and offers a simple, direct link between sequence motif and propensity for drug binding.
Another application of DIRseq is to define IDP fragments for in-depth study by MD simulations, as shorter constructs both enable the use of a smaller simulation box and reduce the size of the conformational space. For example, based on CSP data and initial MD simulations of full-length α-synuclein, Robustelli et al., 2022 chose a 20-residue C-terminal fragment for simulations of binding with additional compounds, leading to the identification of Ligand 47 as a stronger binder than the original Fasudil. Similarly, based on CSP data from longer constructs of the androgen receptor, both Zhu et al., 2022 and Basu et al., 2023 chose the 56-residue R2-R3 fragment for MD simulations of drug binding. DIRseq can now play a similar role in selecting fragments for MD simulations when CSP data are unavailable. Longer constructs may also present challenges to NMR experiments, such as resonance assignment, so well-chosen fragments guided by DIRseq can also benefit NMR studies.
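One way such fragment selection could be automated is sketched below. It assumes a list of per-residue propensity scores (for example, exported from the web server) and picks the strongest cluster of above-threshold residues, padded with flanking residues. The function name `select_fragment`, the threshold, the flank length, and the gap tolerance are all hypothetical choices for illustration, not part of DIRseq.

```python
def select_fragment(scores, threshold, flank=5, max_gap=10):
    """Return a 1-based inclusive (start, end) residue range covering the
    strongest cluster of predicted drug-interacting residues, or None if
    no residue reaches the threshold.

    scores    : per-residue propensity values (index 0 = residue 1)
    threshold : score at or above which a residue counts as predicted
    flank     : residues added on each side of the cluster (assumed value)
    max_gap   : largest sequence separation between consecutive predicted
                residues that still merges them into one cluster (assumed)
    """
    hits = [i for i, s in enumerate(scores) if s >= threshold]
    if not hits:
        return None

    # Group predicted residues into clusters separated by more than max_gap.
    clusters = [[hits[0]]]
    for i in hits[1:]:
        if i - clusters[-1][-1] <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])

    # Keep the cluster with the largest summed score.
    best = max(clusters, key=lambda c: sum(scores[i] for i in c))

    # Pad with flanking residues and clamp to the sequence ends.
    start = max(0, best[0] - flank)
    end = min(len(scores) - 1, best[-1] + flank)
    return start + 1, end + 1


if __name__ == "__main__":
    toy = [0.1] * 40
    toy[3] = 0.9                       # isolated weak site
    toy[20] = toy[21] = toy[22] = 1.0  # stronger three-residue motif
    print(select_fragment(toy, threshold=0.5))
```

In practice the flank length would be chosen to retain enough sequence context around the motif, since neighboring residues contribute to the propensity of a site.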
Lastly, virtual screening has been conducted against conformational ensembles of IDPs (Ruan et al., 2021; Dhar et al., 2025); drug-binding sites predicted by DIRseq can be used to guide such screening. As a simple illustration, we present poses of EGCG generated by screening against the two DIRseq-predicted binding sites in p53 in Figure 5. As IDPs sample a vast conformational space, knowledge of the binding site can drastically reduce the computational cost. The subset of conformations that generate high docking scores for a given drug at the known site can also provide insight into the mechanism of drug action.
Methods
The sequences of the IDPs studied here and the drugs that bind to them are listed in Supplementary file 1A. All DIRseq predictions were obtained using the web server at https://zhougroup-uic.github.io/DIRseq/. Conformations of IDPs were generated using the TraDES method (Feldman and Hogue, 2002).
Docking of EGCG onto p53 was performed via the SwissDock web server at https://www.swissdock.ch/ (Bugnon et al., 2024) utilizing the AutoDock Vina docking engine (Eberhardt et al., 2021). The SMILES string for EGCG from PubChem (CID 65064) and several conformations of p53 were used as input. A cubic region (13–20 Å in side length) around the center of each drug-interacting residue was selected for docking.
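The box definition described above can be sketched as follows. This is an illustrative helper, not the procedure used with the SwissDock server: it assumes the residue center is the unweighted geometric center of its atoms, uses a default side length of 16 Å (an arbitrary value within the stated 13–20 Å range), and returns the box using the `center_*` and `size_*` option names of an AutoDock Vina configuration. The function name `docking_box` is hypothetical.

```python
def docking_box(atom_coords, side_length=16.0):
    """Build a cubic docking box around one residue.

    atom_coords : list of (x, y, z) tuples in Å for the residue's atoms
    side_length : cube edge in Å (assumed default; the text uses 13-20 Å)

    Returns a dict keyed by AutoDock Vina box option names.
    The center is the unweighted geometric center (an assumption).
    """
    n = len(atom_coords)
    if n == 0:
        raise ValueError("residue has no atoms")

    cx = sum(x for x, _, _ in atom_coords) / n
    cy = sum(y for _, y, _ in atom_coords) / n
    cz = sum(z for _, _, z in atom_coords) / n

    return {
        "center_x": cx, "center_y": cy, "center_z": cz,
        "size_x": side_length, "size_y": side_length, "size_z": side_length,
    }


if __name__ == "__main__":
    # Toy coordinates standing in for the atoms of one residue.
    residue_atoms = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
    for key, value in docking_box(residue_atoms).items():
        print(f"{key} = {value:.3f}")
```

The printed `key = value` lines follow the layout of a Vina configuration file; the same center and size values could instead be entered in a web interface.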
Data availability
Figure 2-source data 1, Figure 3-source data 1, and Figure 4-source data 1 contain the numerical data used to generate the figures. The source code for DIRseq can be downloaded at https://github.com/hzhou43/DIRseq/ with file name DIRseq.js (copy archived at Zhou, 2025).
References
- Molecular mechanism for the (-)-Epigallocatechin gallate-induced toxic to nontoxic remodeling of Aβ oligomers. Journal of the American Chemical Society 139:13720–13734. https://doi.org/10.1021/jacs.7b05012
- Structure of alpha-helical membrane-bound human islet amyloid polypeptide and its implications for membrane-mediated misfolding. The Journal of Biological Chemistry 283:17205–17210. https://doi.org/10.1074/jbc.M801383200
- Rational optimization of a transcription factor activation domain inhibitor. Nature Structural & Molecular Biology 30:1958–1969. https://doi.org/10.1038/s41594-023-01159-5
- SwissDock 2024: major enhancements for small-molecule docking with Attracting Cavities and AutoDock Vina. Nucleic Acids Research 52:W324–W332. https://doi.org/10.1093/nar/gkae300
- Sequence-dependent backbone dynamics of intrinsically disordered proteins. Journal of Chemical Theory and Computation 18:6310–6323. https://doi.org/10.1021/acs.jctc.2c00328
- AutoDock Vina 1.2.0: new docking methods, expanded force field, and Python bindings. Journal of Chemical Information and Modeling 61:3891–3898. https://doi.org/10.1021/acs.jcim.1c00203
- EGCG redirects amyloidogenic polypeptides into unstructured, off-pathway oligomers. Nature Structural & Molecular Biology 15:558–566. https://doi.org/10.1038/nsmb.1437
- Probabilistic sampling of protein conformations: new hope for brute force? Proteins 46:8–23.
- Multiple independent binding sites for small-molecule inhibitors on the oncoprotein c-Myc. Journal of the American Chemical Society 131:7390–7401. https://doi.org/10.1021/ja900616b
- Picosecond dynamics of a small molecule in its bound state with an intrinsically disordered protein. Journal of the American Chemical Society 146:2319–2324. https://doi.org/10.1021/jacs.3c11614
- Small molecule modulation of intrinsically disordered proteins using molecular dynamics simulations. Journal of Chemical Information and Modeling 60:5003–5010. https://doi.org/10.1021/acs.jcim.0c00381
- Principal component analysis of data from NMR titration experiment of uniformly 15N labeled amyloid beta (1-42) peptide with osmolytes and phenolic compounds. Archives of Biochemistry and Biophysics 690:108446. https://doi.org/10.1016/j.abb.2020.108446
- Ligand clouds around protein clouds: a scenario of ligand binding with intrinsically disordered proteins. PLOS Computational Biology 9:e1003249. https://doi.org/10.1371/journal.pcbi.1003249
- In: Intrinsically Disordered Proteins Studied by NMR Spectroscopy (book). Springer International Publishing. https://doi.org/10.1007/978-3-319-20164-1_13
- Targeting the disordered C terminus of PTP1B with an allosteric inhibitor. Nature Chemical Biology 10:558–566. https://doi.org/10.1038/nchembio.1528
- A druggable conformational switch in the c-MYC transactivation domain. Nature Communications 15:1865. https://doi.org/10.1038/s41467-024-45826-7
- MYC-targeting inhibitors generated from a stereodiversified bicyclic peptide library. Journal of the American Chemical Society 146:1356–1363. https://doi.org/10.1021/jacs.3c09615
- Atomistic molecular dynamics simulations of intrinsically disordered proteins. Current Opinion in Structural Biology 92:103029. https://doi.org/10.1016/j.sbi.2025.103029
- Phenolic compounds prevent amyloid β-protein oligomerization and synaptic dysfunction by site-specific binding. The Journal of Biological Chemistry 287:14631–14643. https://doi.org/10.1074/jbc.M111.325456
- Chemical inhibition of N-WASP by stabilization of a native autoinhibited conformation. Nature Structural & Molecular Biology 11:747–755. https://doi.org/10.1038/nsmb796
- Molecular basis of small-molecule binding to α-synuclein. Journal of the American Chemical Society 144:2501–2510. https://doi.org/10.1021/jacs.1c07591
- Synthetic fluorescent MYC probe: Inhibitor binding site elucidation and development of a high-throughput screening assay. Bioorganic & Medicinal Chemistry 42:116246. https://doi.org/10.1016/j.bmc.2021.116246
- Fasudil attenuates aggregation of α-synuclein in models of Parkinson’s disease. Acta Neuropathologica Communications 4:39. https://doi.org/10.1186/s40478-016-0310-y
- How to drug a cloud? Targeting intrinsically disordered proteins. Pharmacological Reviews 1:PHARMREV-AR-2023-001113. https://doi.org/10.1124/pharmrev.124.001113
- Rational drug design targeting intrinsically disordered proteins. WIREs Computational Molecular Science 13:e1685. https://doi.org/10.1002/wcms.1685
- Ligand binding to distinct states diverts aggregation of an amyloid-forming protein. Nature Chemical Biology 7:730–739. https://doi.org/10.1038/nchembio.635
- Ion mobility spectrometry-mass spectrometry defines the oligomeric intermediates in amylin amyloid formation and the mode of action of inhibitors. Journal of the American Chemical Society 136:660–670. https://doi.org/10.1021/ja406831n
- Fundamental aspects of phase-separated biomolecular condensates. Chemical Reviews 124:8550–8595. https://doi.org/10.1021/acs.chemrev.4c00138
Article and author information
Funding
National Institute of General Medical Sciences (GM118091)
- Huan-Xiang Zhou
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by National Institutes of Health Grant GM118091.
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.107470. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, MacAinsh et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.