Introduction

For decades, cellular organization was primarily understood through the lens of membranebound compartmentalization. However, it has become increasingly clear that membrane-less organelles –such as stress granules, nuclear speckles, and Cajal bodies– are also widespread within cells. 1 At least in some cases, these organelles form by phase separation of their components,2 primarily disordered proteins and nucleic acids, which partition into dense and dilute phases. The condensates thus formed retain properties of liquids, like fusion, dripping and wetting.3 This phenomenon has generated significant interest in the physical mechanisms governing the phase behavior of biomolecular mixtures.4

Many proteins undergoing phase separation share common characteristics, including deviations from typical compositions of folding proteins and simple sequences with multiple amino acid repeats. 5 Specifically, in the sequences of proteins that experience upper critical solution temperature (UCST) transitions, like the low-complexity regions of FUS, hnRNPA1, Ddx4, LAF-1 or atGRP7, we observe long stretches rich in polar residues interspersed with aromatics or positively charged residues. Borrowing language from the theory of associative polymers,6 these units of sequence have been termed “stickers”, in the case of aromatics and positively charged residues, and “spacers”, for the polar residue repeats.79 Spacers act as linkers that lend flexibility to the polypeptide mesh in the protein-dense phase. In contrast, stickers play a crucial role in determining both the single-chain properties of the polymer and the phase behaviour of the condensate through interactions involving their aromatic or charged groups.10 Recent experiments have quantified the influence of different types of stickers in the polymer properties and phase behaviour. 8 This raises a crucial question: what is the fundamental origin of the relative strengths of different stickers?

This question has recently been addressed for the cationic amino acids lysine (Lys) and arginine (Arg).1115 Despite having the same net charge, these residues turn out not to be interchangeable. Mutagenesis experiments on LAF-1 have shown that substituting arginine with lysine completely suppresses phase separation. 12 This distinction is functionally relevant, as Arg to Lys substitutions affect speckle formation. 14 Additionally, experiments on the intrinsically disordered region (IDR) of Ddx4 indicate that LLPS is favoured by Arg relative to Lys.11,12,16 Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations.11 These models are often parametrized using hydrophobicity scales that capture the “stickiness” of different amino acid residues.1720 However, by replacing the hydrophobicity scales in the coarse-grained model with interaction energies from amino acid contact matrices –derived from statistical analysis of the PDB– the correct trends were captured. 11 A key to the greater propensity for LLPS in the case of Arg may derive from the pseudo-aromaticity of this residue.10,21,22

Here we focus on the distinct roles of the main aromatic residues acting as stickers –tyrosine (Tyr) and phenylalanine (Phe)– excluding tryptophan due to its much lower abundance.23 Given that they only differ in a hydroxyl group, one could expect Tyr and Phe to be equally relevant to phase separation, especially considering the finding from double-mutant cycle that Tyr-Tyr and Phe-Phe pairs make nearly identical contributions to protein stability.24 However, experimental evidence on various proteins suggests that Tyr is a stronger driver of condensation.8,10,12,25 This is demonstrated by the reduced propensity to phase separate of Tyr-to-Phe mutants of FUS and LAF-110,12,25 and the opposite effect in Phe-to-Tyr mutants in hnRNPA1.8,10 Understanding the origin of the different sticker strengths of Phe and Tyr is important due to its functional relevance. In a large sample of hnRNPA1 variants, the fractions of Phe and Tyr have been found to co-vary, suggesting that evolutionary control of composition fine-tunes the properties of condensates. 8

The properties of Phe and Tyr may give important insights on the distinct behaviour as stickers in biomolecular condensates. Hydrophobicity scales typically rank Phe as the most hydrophobic residue,20,26 consistent with the greater hydration free energy of Tyr relative to Phe27,28 (see Figure 1A-B). As in the case of Arg and Lys, using hydrophobicity as a proxy for interaction energy in bead simulation models proved insufficient to explain the relative strengths of Phe and Tyr as stickers. 11 However, in this case statistical contact matrices also could not capture the correct order of stickiness. In the Miyazawa and Jernigan statistical potential, Tyr-Tyr contacts are weaker than Phe-Phe (with energies of -4.17 and -7.26 in RT units, respectively; see Figure 1C).29 We note that a different hydrophobicity scale based on peptides undergoing inverse temperature transitions (the Urry scale, 30 where Tyr is more hydrophobic) can account for the correct rank order in saturation concentration. 19 On the other hand, the potentials of mean force between amino acids calculated with atomistic force fields have a deeper free energy well for the Tyr-Tyr pair than for Phe-Phe. 31,32 A final data point of interest is the extremely low solubility of Tyr, over an order of magnitude smaller than that of Phe (0.045 and 2.79 g/100 g of water, respectively; see Figure 1D). 33 This has led to the suggestion that tyrosine-tyrosine hydrogen bonds are stronger than those formed by tyrosine and water.34

Bibliographic values for different properties of phenylalanine and tyrosine. (A) Solvation free energy (∆Gsolv).28 (B) Probability distributions of min-maxed normalized hydropathy values λ from bibliographic hydrophobicity scales.20 (C) Self-interaction energy (εii) from the Miyazawa-Jernigan contact matrix. 29 (D) Solubility in water at 25°C.33

In summary, experimental results on condensates containing Phe/Tyr variants do not seem to align with either solvation free energies, most hydrophobicity scales or statistical contact potentials, but are consistent with atomistic force fields in solution and solubilities in water. One possible explanation for these conflicting findings is that, due to their level of hydration, molecular condensates may differ significantly from the tightly-packed cores of folded protein structures. 11,25 Here we address this paradox using a combination of classical molecular dynamics (MD) simulations and quantum chemical calculations. First, we estimate transfer free energies of peptides including these aromatic residues into model peptide condensates and find that they are more favourable for Tyr than for Phe, an effect that is reversed when we perform the same transformation in apolar media. DFT calculations confirm that the interaction energies in Tyr-Tyr pairs are stronger than those between Phe residues. However, the transfer free energy contribution dominates at sufficiently low dielectric constants, making Phe-Phe pairs more favourable. These findings recapitulate the right rank order of interaction strengths of aromatic stickers in biomolecular condensates but also their crossover in low dielectric media like the hydrophobic cores of folded proteins.

Materials and Methods

Classical molecular dynamics simulations

Molecular models

In this work, we report simulations of terminally-capped GGXGG peptides with X=F/Y in different media, including water, organic solvents and molecular condensates. This peptide family has been characterized extensively in the past, both from experiment35 and simulation.36 We have built structures for the peptides using Ambertools37 and used the Amber 99sb⋆ force field, including corrections for backbone torsions,38 and TIP3P39 water. Our condensates are formed by an equimolar mixture of Gly, Ser and either Tyr or Phe “dipeptides” (in fact, terminally capped amino acids), as before. 40 For the simulations with different organic solvents, we used the GAFF parameters derived by Mobley and co-workers.41

Thermodynamic cycle

To estimate differences in the transfer free energy (∆∆GTransfer) of our short peptides into a different medium –either a condensate or a different solvent– we use an alchemical (i.e. non-physical) transformation.42 Specifically, the thermodynamic cycle we construct involves four different states for the GGXGG peptide, with X=F (λ = 0) or X=Y (λ = 1), which is inserted either in water or in the alternative medium. We illustrate this thermodynamic cycle in the case of transfer into a condensate in Figure 2A. The vertical branches of the thermodynamic cycle are associated to the free energy of transfer of the model peptides from water into the condensate ( and ), while the horizontal branches involve an alchemical transformation (∆GF→Y(H2O) and ∆GF→Y(Cond)). Using simulations, we can easily calculate the free energies corresponding to the horizontal branches. This enables us to estimate the desired quantity as

(a) Thermodynamic cycle used in this study for the estimation of free energy differences upon mutation for the insertion of a peptide in a molecular condensate. (B) Schematic of the process of contact formation between two molecules i and j used in the quantum chemical calculations. We consider that contact formation involves the transfer from water (blue) to a different medium (orange) and the interaction in this medium between the entities involved.

Negative ∆∆GTransfer values indicate that transferring the Tyr peptide into the condensate is more favourable than transferring the Phe peptide, whereas positive values indicate the opposite.

Simulation setup

For the simulation runs in pure solvents, we simply inserted the peptide in an octahedral box leaving at least 1 nm of distance to each wall. Then the boxes were filled with the corresponding solvent molecules. For the simulations in the model condensate, we first inserted the peptide at the central position of a rectangular box with dimensions 8.5 nm×4.5 nm×4.5 nm, containing an equimolar mixture of Gly, Ser and either Tyr or Phe dipeptides, which we term GSY and GSF, respectively. Then, the box was expanded up to 20 nm and filled with water molecules. This results in a simulation box where the peptide of interest is immersed in a protein-dense slab in contact with a dilute phase.

The equilibration protocol is identical in all cases. First, we ran an energy minimization using a steepest descent algorithm, followed by short simulations in the NVT and NPT ensembles, using the Berendsen43 and velocity rescaling44 thermostats, respectively, to set the temperature at 298 K, and the Berendsen barostat 43 to set the pressure at 1 bar on the latter. Finally, we performed equilibrium simulations of in the NPT ensemble using a leap-frog stochastic dynamics integrator with a time-step of 2 fs at 298 K and setting the pressure to 1 bar with the Parrinello-Rahman barostat.45

The duration of the equilibrium runs was 1-2 μs for the condensate simulations and 500 ns for all other solvents. Note that for each system-solvent combination, two different sets of simulations had to be performed, one for each of the alchemical (λ) states corresponding to the central peptide residue being Phe (λ = 0) or Tyr (λ = 1). For the alchemical transitions, we use a non-equilibrium switching method4648 often used for the calculation of differences in binding affinities49,50 or free energy changes upon mutation.51,52 We run short (50 ps) simulation trajectories where the λ-state was switched forward (λ = 0 → 1) or backward (λ = 1 → 0). We have run additional simulations of the solvents in the absence of the solute using the same protocols. To study interactions in the biomolecular condensates, we run simulations of equimolar GSY and GSF mixtures in a slab configuration, which we later sliced to keep only the dense phase. This was then briefly re-equilibrated as described above. Production simulations of the condensates were run for 1 μs.

For all simulations, we have used the Gromacs software package53 (v.2024). To generate hybrid topologies for the different λ-states, we have used the PMX software4648 (v. 3.0).

Analysis

We have analyzed the non-equilibrium switching simulations using the PMX package tools.4648 Specifically, we calculate the work values from the forward and backward transitions (i.e. ) and then estimate free energy differences between the initial and final states using Bennet’s Acceptance Ratio54 (for more details, we refer to the PMX papers4648). For our free energy estimates, we report bootstrap errors.

The simulations have been analyzed using a combination of analysis programs in the Gromacs suite53 and in-house Python scripts that make extensive use of the MDTraj library. 55 Specifically, we used gmx dipoles to calculate the dielectric constant ε for condensates and different solvents from the fluctuations in the dipole moment of the simulation box, M, which is defined as56

In this expression, ε0 is the vacuum permittivity, kB is Boltzmann’s constant and V is the volume of the simulation box. Error analysis has been performed by blocking.

Quantum chemical calculations

We performed quantum mechanical calculations using the Gaussian 16 program to study the interaction energy of the different complexes. All calculations were made at the Density Functional Theory (DFT) level, using the ωB97XD functional57 and Pople’s 6-311++G(d,p) basis set58 for geometry optimization. Energetics were refined with the 6-311++G(3df,2p) basis set. Geometry optimizations were done in different solvents, using the Polarizable Continuum Model approach.59 The interaction energy in a given solvent was calculated according to

where i and j specify the interacting monomers. We consider p-cresol to represent the sidechain of tyrosine (Y, hereafter), toluene to represent the sidechain of phenylalanine (F hereafter) and N-methyl-acetamide to represent an amide bond group (A, hereafter). Dimers ij include different combinations of Y, F, and A in various orientations.

To calculate the free energy transfer of a monomer from water to a given solvent, we used the SMD solvation model60 to calculate accurate solvation-free energies of the monomers in different solvents, using B3LYP/6-31+G(d,p) level of theory. 61,62 Thus, the transfer free energy from water to a given solvent s is calculated according to

As solvents, we have considered three different groups:

  1. Alcohols: methanol, ethanol, 1-propanol, 1-butanol, 1-poentanol, 1-hexanol, 1-nonanol and 1-decanol.

  2. Aliphatic solvents: 2-nitropropane, 2-bromopropane, 1-bromopropane, 1-bromopentane, 1-fluorooctane, n-pentane, n-decane nitrobenzene, o-dichlorobenzene, nromobenzene, iodobenzene, ethylbenzene, benzene, toluene and cyclohexane.

  3. Water with varying dielectric values.

These solvents span a broad range of values of the dielectric constant and can be viewed to represent different chemical environments where contact formation can take place.

Results and Discussion

Transfer free energies into a condensate slab

Our first goal is to find whether differences in transfer free energy for Phe and Tyr variants capture the hierarchy in sticker strength in biomolecular condensates. Ideally one would calculate the transfer free energies for the protein of interest inside a realistic protein condensate. However, this would involve simulations of multiple copies of an IDR with tens or hundreds of amino acids, resulting in a system with very slow conformational dynamics. 6365 As a computationally efficient alternative, we use a simple model peptide with sequence GGXGG, with X being either F or Y, which is transferred from water into the dense mixtures of short peptides we have recently characterized.40 Peptide condensates are useful model systems as they retain many properties of protein condensates but allow for much greater computational efficiency. 40,63,66,67

Free energy estimates for the horizontal branches in the thermodynamic cycle in Figure 2 were obtained from alchemical transformations, which we performed using a state-of-the-art non-equilibrium switching method4648 (see Methodology). The first step in this procedure involves running long equilibrium simulations at the initial λ-states (i.e. λ = 0 and λ = 1). While these calculations are straightforward for a peptide in solution, within the condensate we must first ensure that the condensate slab has converged to a stable density. Also the peptide that experiences the transformation must remain buried within the condensate for all the snapshots that we use as initial frames, so that we do not average the work in the dilute and dense phases. In Figure 3A we show the solvent accessible surface area (SASA) of all protein residues in a GSY condensate with the model peptide submerged. After an initial increase in SASA, at around 100 ns the values stabilize and remain flat for the remainder of the simulation, suggesting that equilibrium between the dense and dilute phases has been reached. We also show the density profiles for all the amino acid residues in the condensate and for the GGXGG peptide alone (Fig. 3B). For both λ values, we find that during the whole duration of the simulation, the GGXGG peptide remains buried within the condensate (see a representative snapshot in Fig. 3C). The protein density in the GSY condensate is comparable to the values reported before with a different force field (i.e. ∼ 1000 mg/mL protein and ∼200 mg/mL water),40 and slightly larger than in full-length IDP condensates. 64

After preparing our system, we selected initial configurations for non-equilibrium switching from evenly spaced snapshots of the λ = 0 and λ = 1 trajectories starting at 100 ns. We ran 200 independent simulations forward and backwards, collecting the values of (∂H/∂λ). Alchemical transformations were also performed in the same way for the peptide in TIP3P water (see Methods). Then we estimate transfer free energy differences by combining data from the transformations inside the condensate and in water. For the GSY condensate, we find that ∆∆GTransfer is small and negative (−1.91 ± 0.53 kJ/mol), indicating that transfer into a peptide condensate is slightly more favourable for the Tyr-containing peptide than for the Phe variant. This result is consistent with mounting experimental evidence that points to a greater propensity to form condensates for Tyr rather than Phe in variants of proteins like FUS, LAF1 and hnRNPA.

(A) Time series for the accessible surface area of the GGXGG in the GSY condensate for different values of λ. (B) Time series for the density profiles of all peptides (blue) and the GGXGG peptide (red) in the coexistence simulations. (C) Representative simulation box with a fully solvated condensate in slab geometry including a GGXGG peptide (spheres) and the capped amino acid mixture (G: white, S: yellow and Y: green).

Similar protein-protein interactions in Tyr and Phe condensates

One possible explanation for this result is the formation of preferential sticker-sticker interactions between the GGYGG peptide and the GSY condensate. While the hydroxyl group in tyrosine in the pentapeptide may form hydrogen bonds with other tyrosines in GSY, these should be depleted for the GGFGG variant. Conversely, if the condensate were composed of Phe instead of Tyr (i.e. the GSF condensate), there would be no reason for a preference for GGYGG. Additionally, from hydrophobicity arguments alone, transfer to the GSF condensate should be favoured for Phe. To test this hypothesis, we have run the same procedure for GSF instead of GSY. We show the time series data for SASA and protein density for the GSF simulations in the Supporting Information Figure S1. We find that ∆∆GTransfer = −2.75 ± 0.58 kJ/mol, hence also favourable to Tyr and of magnitude comparable to the result for transfer to GSY.

To gain further insight into the dominant interactions within condensates formed by different stickers, we have run additional simulations of isolated dense phases for GSY and GSF condensates (see Figure 4A). First we look at the relative number of different interaction pairs inside the condensates. We find that the interaction patterns are very similar in both cases, with a greater number of contacts formed by the aromatic stickers and smaller contributions from Gly and Ser (see Fig. 4B). This result recapitulates observations in atomistic simulations of full-length IDRs64 and is consistent with the scaffold-client relationship between aromatic and polar residues.68 To focus on ππ interactions between pairs of aromatic side chains, we calculated the mean inter-side chain distances and the angles between vectors normal to the aromatic rings. Figure 4C shows their distribution, which is similar in both condensates. Even though aromatic ππ stacking is feasible, it does not play a dominant role (see the regions with low inter-residue distance and angles close to 0 or π). We also examine sp2π contacts formed between aromatic and peptide bond groups, 69 which are much more common. This is consistent with previous simulation work64,70,71 and also with a statistical analysis of experimental PDB structures which showed that these interactions are particularly prevalent in disordered regions of folded proteins.69 Interestingly, also for these sp2π interactions, no significant differences were found between GSY and GSF condensates.

(A) Representative snapshots from the simulations of dense phases. (B) Interaction matrix for the normalized number of contacts between different pairs of amino acid residues in the condensates (top) and for each type of amino acid (bottom). (C) Density plots for side chain interactions between aromatic side chains, as characterised by the mean inter-residue distance and the angle θ between the vectors normal to the rings.72 (D) Density plots for sp2-π interactions between amide bonds and aromatic side chains, as characterised by the mean inter-group distance and the angle θ between the vectors normal to the peptide bond and ring planes. In all panels, results for GSY and GSF condensates are shown on the top and bottom, respectively. Representative snapshots of relevant interactions for each type of pair are shown.

These findings suggest that the preference for Tyr over Phe to form condensates, as captured by ∆∆GTransfer, is not primarily driven by differences in protein-protein interaction patterns. Although interactions may be slightly more favourable for Tyr than for Phe, as suggested by atomistic potentials of mean force, 31,32 these residues have similar contact statistics both for sticker-sticker and for sticker-spacer contacts, and both exhibit a preference for sp2π rather than ππ interactions within the condensate.

Condensates as hydrated media

An alternative explanation could involve the solvent effects on the Phe/Tyr peptides. To investigate this, we begin by analysing the distribution of water molecules around the distinct side chains in the GSF and GSY dense phase simulations, revealing notable differences. Figure 5A presents the radial distribution function (g(r)) for water oxygen around the Cζ in Phe and Tyr side chains. In the case of Tyr, a prominent peak at short distances suggests the presence of water molecules involved in hydrogen bonds, which are enthalpically favourable and entropically unfavourable due to the water structuring around the Tyr hydroxyl group. 25 These thermodynamic contributions are intertwined in the calculated ∆∆GTransfer and will vary depending on whether transfer of the peptide from water occurs into a condensate, into the core of a folded protein or an altogether different solvent, capable or not of hydrogen bonding.

(A) Radial distribution function for water oxygen around the Cζ in Phe/Tyr for GSF and GSY condensates. We show a representative overlay of simulation snapshots where water molecules are hydrogen-bonded to the Tyr side chain. (B) Transfer free energy differences from water to a different medium between Tyr and Phe. We consider condensates (green), polar solvents (orange) and apolar solvents (blue). (C) Same as a function of calculated dielectric constant, ε.

To investigate this dependence, we performed the same type of alchemical calculations using solvents such as octanol or cyclohexane –used in the past as models for the interior of folded proteins33,7376– as well as shorter alcohols or acetone. The results of these calculations are shown in Figure 5B. In the case of a purely non-polar solvent like cyclohexane, the difference in free energy from Phe to Tyr is large and positive (∆∆GTransfer = 13.29 ± 0.20 kJ/mol), indicating that Phe is favoured. This is also the case for aromatic solvents like benzene and toluene. This result is consistent with the stronger interaction strength of Phe-Phe interactions in the Miyazawa and Jernigan potential29 (see Fig. 1C), derived from contacts in the apolar cores of proteins where hydrophobicity dominates. The difference in free energy is smaller for alcohols and decreases gradually as we reduce the length of the aliphatic chain. For acetone, Tyr is slightly more favourable (∆∆GTransfer = −0.91 ± 0.25 kJ/mol). Clearly, the ∆∆GTransfer values have a dependence on the dielectric constant (ε), which we have also calculated for the GSY and GSF condensates (see Figure 5C). Our ε values for the condensates (39 ± 5 for GSY and 47 ± 3 for GSF) are surprisingly close to that derived from experiments on Ddx condensates (45±13),77 and appear at the rightmost part of the diagram. In summary, for non-polar solvents with low ε Phe is preferred to Tyr, while for polar solvents, including the condensates, the trend is reversed. At intermediate values of ε, there is a crossover between a regime dominated by hydrophobicity and one favoured by hydrogen bonding by Tyr.

Quantum calculations confirm a crossover in interaction strength

To characterize these subtle effects in greater detail we resort to quantum chemical calculations, which can more accurately determine interaction strengths for specific configurations involving aromatic residues.72,78 We consider different modes of interaction between the sidechains of Phe and Tyr, represented by toluene (F) and p-cresol (Y), respectively. We considered various orientations in the optimization process and selected the two most stable structures for each case. For F-F, the most stable structures correspond to cross and antiparallel configurations, whereas in the case of Y-Y, H-bonded and antiparallel are most favoured (see Fig. 6A and C). Our results are consistent with previous work on the toluene dimer, where the energy minima also corresponded to the cross and antiparallel structures, with only minor differences in their energies. 79,80 Other structures, like parallel or T-shaped, are less stable for toluene, although in the benzene dimer, T-shaped conformers can be favored. 81 In our optimizations, T-shaped F-F structures tended to collapse to less stable parallel structures, and hence we do not include them here. The interaction energy values we obtained are qualitatively similar to those in the literature, 72 though slightly more negative, and they follow the same rank order (i.e. Y-Y H-bonded < antiparallel Y-Y < cross F-F). We also calculate the energies for Phe and Tyr sidechains interacting with an amide group representing a peptide bond in the prevalent sp2-π interactions69 (see Fig. 6C). In the case of Tyr, we have also analyzed the configuration where it forms a hydrogen bond with the amide through the alcohol group of p-cresol.

(A) and (B) for different interaction pairs ij and different solvents as a function of the dielectric constant, ε. (C) Optimized geometries from quantum chemical calculations for Phe and Tyr interaction pairs. (D) with respect to the cross-F-F interaction for different solvents as a function of the dielectric. Different symbols correspond to the different solvent types. Filled: alcohols; empty: alkanes/benzenes; lines: water with different dielectric.

To explore the environmental dependence in contact formation for amino acid pairs, we have calculated values of the interaction energies considering a set of protic and non-protic solvents (see Fig. 6A). Additionally, we have run the same calculations using a modified water with the dielectric constant as a free parameter. The overall trends in interaction energies are very similar across interaction types. In all cases, the interaction energy tends to become stronger as the value of the dielectric decreases. Also, the relative strength of interactions is consistent across solvents. The strongest interaction corresponds to the Y-A hydrogen-bonded interaction. Next comes a series of partial π-stacking configurations with contributions of hydrogen bonds and sp2-π interactions, namely, Y-Y H-bond, Y-A sp2-π and F-A sp2-π structures, which we have found to be particularly abundant in our MD runs. Finally, the π-stacked F-F cross is the weakest interaction among those included in this analysis.

Therefore, if we consider these interaction energies as the primary driving force for condensate formation, Tyr should be more adhesive than Phe independently of the environment. However, as realized by Miyazawa and Jernigan, 29,82 contact formation, either within the core of folded proteins or in the case of molecular condensates, involves two processes: first, the transfer of these groups from water to an environment with a lower dielectric; and second, the interaction between side chain groups (or with the backbone), which is captured by (see Figure 2B). Hence, for a given solvent, the total energy can be calculated from the sum of these two contributions as

The last two terms on the right-hand side of this expression capture the propensity of a monomer to be transferred from water to the solvent, which can be modelled using polarizable continuum models such as SMD.60 Notice that when the solvent is a low dielectric non-protic solvent such as cyclohexane, recapitulates the results from a standard hydrophobicity scale.26

When both contributions are considered, the picture changes dramatically (see Fig. 6B). In non-protic aliphatic solvents, such as alkanes and benzene derivatives, interaction for F-F pairs becomes more favourable than in Y-Y pairs, irrespective of the dielectric constant. On the other hand, in alcohols, tends to favour Y-A interactions at high ε and F-F pairs at low ε. In other words, at intermediate dielectrics, we find a crossover between F and Y in the propensity to form pairwise contacts in solvents. In our calculations, we have also considered a water-like solvent where the dielectric constant can be tuned as a free parameter (see Fig. 6C). Also, we find a similar tendency in this case, but now the crossover is obtained at slightly lower values of ε. In Figure 6D, we show values, where we are using as reference the energy of the F-F cross configuration, for three selected interactions: Y-Y H-bond, Y-A H-bond, and F-A, for the three types of environments. Negative values of indicate a preference of the corresponding interaction relative to F-F cross, while positive values indicate that cross-F-F is stronger. Clearly, there is a crossover between a regime where toluene forms stronger interactions and one where p-cresol is preferred, which is highly influenced by the type of environment. In a non-protic solvent, where hydrogen bonding is not possible, or in a protic solvent with low dielectric constant, we find that F-F interactions are favoured. In contrast, in high dielectric environments, Y-Y and Y-A interactions dominate.

Conclusions

In this work, we have aimed to reconcile the apparent contradictions between the lower solvation-free energy27,28 and stronger hydrophobicity of Phe20,26 and the greater contact energies assigned to Phe-Phe interactions in proteins29 with the higher sticker strength of Tyr in biomolecular condensates. 8,10,12,25 In the past, the different propensities of positively charged stickers to form condensates had been explained from the fundamental properties of Lys and Arg.10,11,21,22 However, an explanation for the different strengths of Phe and Tyr remained elusive. 11 Through a combination of classical MD simulations and quantum mechanical calculations, we have found that, in addition to the distinct types of interactions formed by Phe and Tyr, a crucial factor in their overall propensity for contact formation across different environments is the transfer free energy. At one extreme, we find environments such as the well-packed core of a protein, where the dielectric constant can be as low as 2-4.83 This type of environment can be modelled using aliphatic solvents, an approach followed extensively in the past.33,7376 In this low-dielectric regime, Phe-Phe interactions are the most favourable, as evidenced by both our classical transfer free energies (Fig. 5C) and the estimated quantum values (Fig. 6). This explains why Phe forms the strongest interactions in contact matrices like that by Miyazawa and Jernigan, which were derived from statistics from 3D structures of proteins29 and are known to be dominated by hydrophobicity effects.84 At the other extreme, in environments with high dielectric constants such as aqueous or polar solvent conditions, Tyr-Tyr hydrogen bonds become more favourable. Biomolecular condensates, due to their high water content (∼ 200 mg/mL in our simulations, but up to ∼600 mg/mL in experiments64), are best modelled as a solvent with an intermediate dielectric. Hence, they operate in the second regime, although the influence of the dielectric is subtle, due to the possibility of crossover as a function of the dielectric.

The conclusions from our work are necessarily limited due to our reductionist approach. First, in our calculations, we are considering very short peptides including a single sticker, and hence we cannot account for the influence of multivalency in protein IDRs.7 Also, we have used a minimal alphabet of possible interaction pairs limited to two types of amino acid residues. We do not consider effects relative to cation-π interactions that have been characterized by others.11,16,49,72,85 Also, results may vary somewhat when considering different classical force fields. For example, our condensates are more protein-dense than found in experiments, possibly due to the propensity of the TIP3P water model to induce compact states.86 However the methodology proposed in the present work can be easily extended to other types of interactions. The overall qualitative conclusions on the crossover in interaction strengths as a function of the environment are likely to be maintained if different quantum approaches are used for contact calculations or different force fields are used for the classical simulations.

Acknowledgements

The authors gratefully acknowledge conversations with Priya R. Banerjee, Rohit Pappu and Tanja Mittag, which inspired this research. The work has been financed by the Spanish Ministry of Science and Innovation through project PID2021-127907NB-I00, and the Basque Government (Project IT1584-22). The authors also thank the IZO-SGI SGIker (UPV/EHU/ERDF,EU) and DIPC for technical and human support and for the allocation of computational resources.

Additional files

Supporting Information