Protein compactness and interaction valency define the architecture of a biomolecular condensate across scales
Abstract
Non-membrane-bound biomolecular condensates have been proposed to represent an important mode of subcellular organization in diverse biological settings. However, the fundamental principles governing the spatial organization and dynamics of condensates at the atomistic level remain unclear. The Saccharomyces cerevisiae Lge1 protein is required for histone H2B ubiquitination and its N-terminal intrinsically disordered fragment (Lge1_{1-80}) undergoes robust phase separation. This study connects single- and multi-chain all-atom molecular dynamics simulations of Lge1_{1-80} with the in vitro behavior of Lge1_{1-80} condensates. Analysis of modeled protein-protein interactions elucidates the key determinants of Lge1_{1-80} condensate formation and links configurational entropy, valency, and compactness of proteins inside the condensates. A newly derived analytical formalism, related to colloid fractal cluster formation, describes condensate architecture across length scales as a function of protein valency and compactness. In particular, the formalism provides an atomistically resolved model of Lge1_{1-80} condensates on the scale of hundreds of nanometers starting from individual protein conformers captured in simulations. The simulation-derived fractal dimensions of condensates of Lge1_{1-80} and its mutants agree with their in vitro morphologies. The presented framework enables a multiscale description of biomolecular condensates and embeds their study in a wider context of colloid self-organization.
Editor's evaluation
In this work, the authors introduce and develop upon a computational model to investigate and quantify the effect of protein conformations and valence of interaction sites as organizers of structure within biomolecular condensates. The authors integrate their findings with new and emerging concepts regarding the coupling between phase separation and percolation as a determinant of driving forces and internal organization of condensates. The key insight that emerges from the current work pertains to the structure that prevails across length scales.
https://doi.org/10.7554/eLife.80038.sa0
Introduction
Biomolecular condensates, such as P-bodies, nucleoli, and stress granules, are membraneless structures that contribute to the compartmentalization of the cell interior (Brangwynne et al., 2009; Feng et al., 2019; Lafontaine et al., 2021; Mitrea and Kriwacki, 2016; Mittag and Pappu, 2022; Molliex et al., 2015). They have been implicated in diverse biological functions including transcription, signaling, and ribosome biogenesis (Banani et al., 2017; Boeynaems et al., 2018) and have also been linked with different pathologies (Alberti and Dormann, 2019). Importantly, major efforts have been invested in understanding the general physicochemical principles behind the formation of biomolecular condensates (Alberti and Hyman, 2021; Banani et al., 2017; Brady et al., 2017; Brangwynne et al., 2015; Bremer et al., 2022; Dignon et al., 2018; Martin et al., 2020; Mitrea and Kriwacki, 2016; Pappu et al., 2023; Zeng et al., 2022). While an early paradigm for understanding condensate formation has been liquid-liquid phase separation, mounting evidence suggests that a close coupling between segregative phase separation and associative network transition, that is, percolation, may be important in many cases (Choi et al., 2020a; Choi et al., 2020b; Mittag and Pappu, 2022; Pappu et al., 2023; Schmit et al., 2020; Seim et al., 2022).
Formation of biomolecular condensates has been observed for proteins (Nott et al., 2015; Wang et al., 2018; Wei et al., 2017), DNA (King and Shakya, 2021), RNA (Jain and Vale, 2017), and their mixtures (Garcia-Jove Navarro et al., 2019). Intrinsically disordered proteins (IDPs), in particular, feature extensively in many condensates and are thought to contribute to their formation via transient intermolecular contacts (Banani et al., 2017; Uversky, 2021). An important challenge in this regard has been to provide a multiscale picture of the spatial organization and dynamics of IDP condensates, connecting the conformational properties and interaction patterns of individual polypeptides in a crowded environment with the features of the condensates they build. Considering that IDP condensates are extremely dynamic and structurally heterogeneous, it is clear that such a picture needs to capture the statistical, ensemble-level aspects of how matter inside the condensates is organized.
While an atomistic view of individual proteins in crowded environments is currently beyond the reach of high-resolution experimental techniques, molecular dynamics (MD) simulations have been developed with precisely this aim in mind (Dror et al., 2012). Although limited in terms of sampling efficiency as compared to coarse-grained approaches (Benayad et al., 2021; Dignon et al., 2018; Martin et al., 2020), atomistic MD simulations can provide an accurate picture of protein structure, dynamics, and interactions with sub-Ångstrom resolution. For example, atomistic simulations have been used to model the dynamics of different peptides such as an elastin-like peptide (Rauscher and Pomès, 2017) or different fragments of NDDX4 (Paloni et al., 2020) in a condensate environment. Such simulations have also been combined with coarse-grained approaches and experiments (Zheng et al., 2020) and have provided a detailed view of the key interactions behind protein condensate formation (Conicella et al., 2020; Murthy et al., 2019; Ryan et al., 2018). Despite these important advances, however, the study of biomolecular condensates at atomistic resolution is still in its early stages, and many questions related to their most generalizable features are only starting to be addressed (Conicella et al., 2020; Li et al., 2022; Murthy et al., 2019; Ryan et al., 2018). These, in particular, concern the complex interplay between structure, dynamics, and thermodynamics of biomolecules in crowded environments.
In general, the concepts and methods of polymer physics have been widely used to understand the formation of biomolecular condensates (Alberti and Hyman, 2021; Brady et al., 2017; Brangwynne et al., 2015; Dignon et al., 2018; Dzuricky et al., 2020; Guillén-Boixet et al., 2020; Martin et al., 2020; Mitrea and Kriwacki, 2016; Pappu et al., 2023). However, in contrast to the dense, continuous phase observed upon macroscopic phase separation in simple polymers, many protein condensates tend to have a lower density and be enriched in water (Alberti and Hyman, 2021; Keating and Pappu, 2021; Wei et al., 2017; Zaslavsky and Uversky, 2018), making them closer in organization to typical colloids (Slomkowski et al., 2011). According to the definition used in colloidal chemistry, biological condensates share features with weak gels, which undergo a transition between a population of finite-size pre-percolation clusters, or sol, and an infinitely large cluster, or gel (Stauffer et al., 1982). In the case of biological condensates, phase separation coupled to percolation results in finite-sized colloidal clusters and the appearance of surface tension. In particular, under the requirement that the saturation concentration (c_{sat}) for phase separation is lower than the percolation concentration (c_{perc}), phase separation leads to an increase in local protein concentration and defines the phase boundary, while a percolation transition establishes network connectivity (Mittag and Pappu, 2022).
Importantly, starting with the seminal work of Forrest and Witten, 1979, it has been recognized that aggregates or clusters of colloidal particles typically exhibit fractal scaling, that is, that the cluster mass scales with cluster size according to a non-integer power law with the so-called fractal dimension d_{f} as the exponent (Lazzari et al., 2016). In contrast to the regular geometric fractals, colloids generally build statistical fractals in which scaling laws hold between average values of mass and cluster size (Havlin and Ben-Avraham, 1987; Stanley, 1984). Overall, statistical fractal properties have been demonstrated under different conditions for many non-biological colloidal systems including silica, polystyrene, and gold colloids (Lazzari et al., 2016). Moreover, by using scattering techniques such as static and dynamic light scattering (Lazzari et al., 2016; Lin et al., 1989), or confocal and scanning electron microscopy (Khatun et al., 2020), the fractal nature has been associated with several biological colloidal systems, including different protein fibrils (Knowles et al., 2007; Nicoud et al., 2015), whey protein isolates (Kharlamova et al., 2020), and clusters of lysozyme (da Silva and Arêas, 2005) and amylin (Khatun et al., 2020). Finally, a fractal model has recently been used to interpret the results of coarse-grained simulations and characterize the aggregation of a phase-separating intrinsically disordered huntingtin fragment (Ruff et al., 2014).
Following the paradigm set by fractal colloidal systems (Lazzari et al., 2016), we derive here a general analytical framework for modeling statistical fractal cluster formation involving biomolecules. As a key result, we connect the interaction valency and compactness of individual biomolecules in condensates with the fractal dimension d_{f}, which in turn enables us to describe the structural organization of the condensate at an arbitrarily chosen length scale. We apply the above framework to the N-terminal 80-residue fragment of Lge1, a scaffolding protein required for histone H2B ubiquitination during transcription (Gallego et al., 2020). Lge1_{1-80} exhibits a strong compositional bias shared by many known phase-separating proteins (enriched in Y, R, and G, Figure 1A), is fully disordered, and undergoes phase separation readily (Gallego et al., 2020), making it a powerful system to study the general features of condensate formation in IDPs. We combine a detailed atomistic characterization of a model condensate containing 24 copies of Lge1_{1-80}, obtained via microsecond MD simulations, with our newly developed fractal formalism in order to propose an atomistic model of the Lge1_{1-80} condensate extending to micrometer scale. To probe sequence determinants of Lge1_{1-80} liquid-liquid phase separation (LLPS), we investigate its all-R-to-K (R>K) mutant, where the ability to form condensates in vitro is modulated, and its all-Y-to-A (Y>A) mutant, which impairs phase separation in vitro (Gallego et al., 2020; Figure 1B). Finally, we compare our theoretical predictions against the phase behavior of these different systems as studied experimentally via light microscopy. Our analysis provides a detailed, multiscale model of a biologically relevant condensate and establishes a general, experimentally testable framework for studying other similar systems.
Results
Lge1_{1-80} condensate formation critically depends on tyrosine residues
We have first experimentally explored the solubility diagrams of WT Lge1_{1-80} and its R>K and Y>A mutants by light microscopy using recombinant Lge1_{1-80} peptides (Figure 1—figure supplement 1A). WT Lge1_{1-80} phase separates already at a protein concentration of 1 µM in 0.1 M NaCl (Figure 1B and C). Its phase separation is robust and can be inhibited at high salt concentration (>2 M NaCl). The presence of tyrosine amino acids, which could provide competing π-π interactions, has no significant impact on WT condensate formation in the concentration range tested, while imidazole disrupts it at concentrations above 0.5 M. Mutating all arginine residues in WT Lge1_{1-80} to lysine (R>K mutant) increases the c_{sat} for phase separation by approximately 10-fold under the conditions tested (Figure 1B and C). The droplet-like WT and R>K condensates (Figure 1B, Figure 1—figure supplement 1B) display normal fusion behavior (Videos 1 and 2) and exhibit high circularity (Figure 1—figure supplement 1C). On the other hand, the substitution of the arginine guanidinium groups, which are prone to π-π interactions, by the lysine amino groups, which lack sp2 electrons and do not participate in π-π interactions, makes the R>K mutant more sensitive to aromatic agents such as tyrosine (moderate effect) or imidazole (strong effect) (Figure 1C). At the same time, Y>A mutations strongly impair phase separation (Figure 1B and C, Figure 1—figure supplement 1B, see also Gallego et al., 2020), suggesting that tyrosines play a critical role in WT and R>K phase separation (Figure 1C) and highlighting the importance of π-π interactions in biomolecular condensate formation, as proposed earlier (Vernon et al., 2018). Moreover, the solubility diagrams are also in line with the observation that phase separation is related to the number of tyrosine and arginine residues in FUS and other phase-separating proteins (Bremer et al., 2022; Dzuricky et al., 2020; Wang et al., 2018).
Finally, we note that at a protein concentration of 45 μM and above, the Y>A mutant results in sporadic amorphous precipitates (Figure 1C).
We have used fluorescence recovery after photobleaching (FRAP) to characterize the overall dynamics of the condensates formed by WT and R>K Lge1_{1-80} variants. First, the applied FRAP protocol was tested for LAF-1 (see Methods) and a fluorescence recovery of up to 80% in approximately 20 min was demonstrated (Figure 1—figure supplement 2H, I), in agreement with the previously published data (Taylor et al., 2019). While the Y>A mutant was shown to be unable to form droplet-like clusters, the dynamic WT and R>K condensates were studied using different bleaching strategies, including bleaching at the center, the periphery, or across the whole condensate (Figure 1D, Figure 1—figure supplement 2A–G). Calculation of the recovery half-times by using different single-exponential models did not provide quality fits (Figure 1—figure supplement 2H, Figure 1—figure supplement 2—source data 1). Using double-exponential fitting allowed us to improve the quality of the fits and accurately describe the data (Figure 1E, Figure 1—figure supplement 2A–C, and Figure 1—figure supplement 2—source data 1). Thus, the WT and R>K recovery half-times are in the range of 100 s for partial bleaching, and increase up to 350 s for bleaching of the whole condensates under the conditions tested. These values are similar to those previously reported for the in vitro condensates of other phase-separating proteins (Lin et al., 2015; Taylor et al., 2019). Importantly, the recovery half-time for the whole bleached WT condensates is approximately 30% higher than for R>K (Figure 1E), suggesting a potentially different internal organization of the two types of condensates.
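As an illustration of why a double-exponential model needs a numerically defined half-time, the following is a minimal sketch, assuming a recovery curve of the form F(t) = A1(1 − exp(−t/τ1)) + A2(1 − exp(−t/τ2)). The functional form, the function names, and the parameter values are assumptions made for illustration; the fitted models actually used are given in the source data and are not reproduced here.

```python
import math


def double_exp_recovery(t, a1, tau1, a2, tau2):
    """Recovered fluorescence at time t for an assumed two-component model."""
    return a1 * (1.0 - math.exp(-t / tau1)) + a2 * (1.0 - math.exp(-t / tau2))


def recovery_half_time(a1, tau1, a2, tau2, n_iter=200):
    """Time at which recovery reaches half of its plateau (a1 + a2).

    A two-component curve has no closed-form half-time, so the root is
    located by bisection on the monotonically increasing recovery curve.
    """
    target = 0.5 * (a1 + a2)
    lo, hi = 0.0, 1.0
    # Expand the bracket until it contains the half-recovery point.
    while double_exp_recovery(hi, a1, tau1, a2, tau2) < target:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if double_exp_recovery(mid, a1, tau1, a2, tau2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical amplitudes and time constants (seconds), not fitted values.
half_time = recovery_half_time(a1=0.4, tau1=30.0, a2=0.4, tau2=300.0)
print(f"half-time: {half_time:.1f} s")
```

When both time constants coincide, the result reduces to the familiar single-exponential value τ·ln 2, which gives a quick sanity check on the routine.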
To rationalize the biochemical effects of the mutations, we have calculated the interaction strengths of selected pairwise contacts (YY, RY, KY). To this end, the binding free energies (ΔG) for these side-chain analog pairs were determined using all-atom Monte Carlo (MC) simulations in chloroform, methanol, DMSO, and water separately (Figure 1—figure supplement 1D and the corresponding source data). Interestingly, ΔG(YY) is independent of the polarity of the environment (approx. –5 kcal/mol on average), while ΔG(RY) and ΔG(KY) depend significantly on the dielectric permittivity of the medium (Figure 1—figure supplement 1D), as shown for other aromatic side-chain analogs (F, W) (Polyansky et al., 2009) and nucleobases (de Ruiter et al., 2017) interacting with R and K. Overall, both ΔG(RY) and ΔG(KY) are lowest in the apolar environment (approx. –10 kcal/mol on average), intermediate in bulk water (approx. –7.5 kcal/mol on average), and highest at intermediate polarity, where they are comparable to ΔG(YY). The strong dependence of ΔG(RY) on the properties of the environment and the fact that, regardless of the conditions, ΔG(YY) is never more favorable than ΔG(RY) were also recently reported for simulations of complete amino acids (Krainer et al., 2021), although the exact values are difficult to compare due to differences in the exact nature of the simulated systems, the force fields used, and the method by which the ΔGs were derived.
In our simulations, significantly stronger RY binding as compared to KY was observed only in the environments of intermediate polarity (Figure 1—figure supplement 1D). This could contribute to the observed increase in the concentration required for Lge1_{1-80} R>K phase separation as compared to the WT, especially if Lge1 condensates exhibit a lower dielectric constant than that of bulk water, as observed elsewhere (Nott et al., 2015). The effect of R>K substitutions on phase-separation behavior was also examined by others for different IDPs (Bremer et al., 2022; Dzuricky et al., 2020; Schuster et al., 2020; Wang et al., 2018) and was, furthermore, linked with the general differences in the intrinsic physicochemical properties of R and K (Dubreuil et al., 2019; Fisher and Elbaum-Garfinkle, 2020; Fossat et al., 2021; Hong et al., 2022; Paloni et al., 2021; Zeng et al., 2022). Our aim here is to explore it further in the context of Lge1_{1-80} single- and multi-chain protein-protein interactions. Finally, the Y>A substitution likely causes a strong thermodynamic effect due to the removal of all possible RY and YY intermolecular contacts.
π-π interactions shape protein clustering in Lge1_{1-80} condensates
To better understand the effect of mutations on the organization of Lge1 condensates, we have performed three 1-µs-long MD simulations with 24 copies each of Lge1_{1-80} WT, Y>A, and R>K mutants in the mM concentration range as well as control simulations of the three proteins present as single copies (see Table 1 for details). In our simulations, proteins tend to form clusters that are characterized by pronounced structural heterogeneity and dynamics (Figure 2A; Video 3). Specifically, the WT 24-copy system forms a single percolating cluster over the last 0.3 µs (Figure 2C) with all copies of the protein engaged. On the other hand, the mutants do not form such an extensive interaction network, but rather associate in multiple, differently sized clusters. The large continuous WT cluster is shaped by π-π interactions (Figure 2A and B) between the most abundant amino acids (R, Y, G), whereby RY contacts dominate (10% of all possible pairs, Figure 2—figure supplement 1A), followed closely by GY and YY contacts (7% of all possible pairs each, Figure 2—figure supplement 1B). In particular, both RY and YY contacts are strongly enriched over the sequence background (see Methods for definition), especially in the intermolecular context, that is, between different protein chains in multi-chain WT simulations (Figure 2—figure supplement 1A, B). While glycine residues contribute significantly to the intermolecular interactions in WT with high absolute frequencies of GY and GR contacts (Figure 2B), these contacts are either only slightly enriched over the expected sequence background (GY) or are even significantly depleted (GR), as indicated in the corresponding source data. The latter suggests that in WT R and Y prefer to contact residues in the sequence other than G, which is also the case for single-chain interactions (Figure 2—figure supplement 1B).
Interestingly, R>K substitutions increase the importance of GY contacts with respect to multi-chain interactions, where they rank top and are even more enriched than KY contacts. At the same time, for both WT and R>K Lge1_{1-80} variants the YY contacts are enriched with respect to intramolecular interactions in single-chain simulations and become even more enriched in intermolecular interactions in multi-chain ones (Figure 2—figure supplement 1A). However, the opposite is seen for other homotypic contacts such as RR in WT and Y>A or GG in all three variants, since these contacts are enriched only for intramolecular interactions in single-chain simulations and are depleted for intermolecular interactions in multi-chain ones (Figure 2—figure supplement 1B). Altogether, our results show that RY is the top contact type in WT and together with YY drives the multi-chain association of this IDR, in agreement with recent observations (Bremer et al., 2022). The results also point at a specific set of contacts which undergo significant rewiring when going from intramolecular interactions in single-chain simulations to intermolecular interactions in crowded systems, as further discussed below.
Lge1_{1-80} interaction valency is defined by its sequence composition
Interaction valency, defined as the average number of binding partners per protein molecule, stabilizes in the course of WT simulations and reaches the average value of approximately 4 over the last 0.3 µs (Figure 2D), with individual WT copies having anywhere between 1 and 9 partners at some point (Figure 2E). In contrast, multiple non-interacting protein copies are still observed for the two mutants throughout the simulations, with the average valencies plateauing around 2 and the maximum number of partners not exceeding 6 in either case (Figure 2E). Thus, mutants with impaired phase separation as detected experimentally exhibit a significantly lower valency of protein-protein interactions in our simulations (see Supplementary file 1 for details). Differences in valency can also be translated into different probabilities of contact formation, a key concept in percolation theory. We have estimated contact probabilities from simulations under the assumption of a well-mixed system, that is, that all chains in the simulation box can in principle establish contacts with all the other chains. As shown in Figure 2—figure supplement 1C, contact probabilities evolve in direct proportion to valency, with a plateau over the last 0.3 µs. Notably, the WT contact probability reaches a level that is ~1.5-fold higher than for either mutant. This contact probability is sufficient to form a single percolating cluster with all 24 copies interconnected, and is thus higher than the critical contact probability at which one expects the network transition, which is not seen for the two Lge1_{1-80} mutants (Figure 2C).
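The relationship between valency, contact probability, and cluster connectivity can be made concrete with a small sketch. It assumes that chain-chain contacts in a single frame are available as a list of index pairs and that, for a well-mixed system of N chains, the contact probability is the fraction of the N(N − 1)/2 possible pairs that are in contact. This definition, the function names, and the toy contact list are illustrative assumptions rather than the procedure described in Methods.

```python
from collections import deque


def build_partners(contacts, n_chains):
    """Map each chain index to the set of distinct chains it contacts."""
    partners = {i: set() for i in range(n_chains)}
    for i, j in contacts:
        if i != j:
            partners[i].add(j)
            partners[j].add(i)
    return partners


def mean_valency(contacts, n_chains):
    """Average number of distinct binding partners per chain."""
    partners = build_partners(contacts, n_chains)
    return sum(len(p) for p in partners.values()) / n_chains


def contact_probability(contacts, n_chains):
    """Fraction of all possible chain pairs in contact (well-mixed assumption)."""
    pairs = {frozenset((i, j)) for i, j in contacts if i != j}
    return len(pairs) / (n_chains * (n_chains - 1) / 2)


def largest_cluster_size(contacts, n_chains):
    """Size of the largest connected component of the contact graph (BFS)."""
    partners = build_partners(contacts, n_chains)
    seen, best = set(), 0
    for start in range(n_chains):
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in partners[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best


# Toy frame with 6 chains (hypothetical data, not taken from the simulations).
toy_contacts = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]
print(mean_valency(toy_contacts, 6))
print(contact_probability(toy_contacts, 6))
print(largest_cluster_size(toy_contacts, 6))
```

Under this definition the contact probability equals the mean valency divided by N − 1, which is one way to see why the two quantities evolve in direct proportion. A cluster is percolating in the sense used above when the largest component contains all chains.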
The interactions between IDPs in our simulations are characterized by a dynamic binding mode where the interacting sequence motifs (‘stickers’) and the non-interacting sequence motifs (‘spacers’) (Martin et al., 2020) can be described only statistically, and no well-defined, specific structural organization of protein complexes is detected (Figure 3A). For WT, the typical binding mode, which corresponds to the average valency of 4, is characterized by different partners binding along the sequence in multiple, highly interactive regions (Figure 3A and B). In line with the decrease in valency, both mutants display fewer prominent interaction motifs, whereby for the phase separation-disruptive Y>A mutant intermolecular binding is detected for the C-terminal third of the molecule only (Figure 3A).
Interestingly, the dynamic modes identified in the intramolecular context are clearly different for all modeled proteins (Figure 3—figure supplement 1A), which is also partially observed at the level of pairwise contact statistics (Figure 2—figure supplement 1A, B), as discussed above. Specifically, intra- and intermolecular interactions rely on a similar pool of contacts by amino-acid type, but differ significantly if one analyzes the specific sequence location of the interacting residues involved (Figure 2—figure supplement 1A, B). For example, one observes a high correlation between the frequencies of different contacts by amino-acid type when comparing intramolecular contacts in single-chain simulations and intermolecular contacts in multi-chain simulations (Figure 3—figure supplement 1B). This correlation is completely lost if one analyzes position-resolved statistics (2D pairwise contact maps) or statistically defined interaction modes (Figure 3A, Figure 3—figure supplement 1A, C). Interestingly, the core of intramolecular interactions observed for a single molecule at infinite dilution and in the crowded context remains approximately the same, as reflected in the high correlation between intramolecular modes obtained in single- and multi-chain simulations (Figure 3—figure supplement 1C). The latter suggests that proteins maintain core self-contacts and establish new ones with neighbors, but do not lose self-identity as expected in a polymer melt. The observed ‘symmetry breaking’ between intra- and intermolecular modes of IDR interactions is in line with the recent studies by Bremer et al., 2022 and Martin et al., 2020. In particular, Lge1_{1-80} exhibits a relatively high net charge per residue (0.075), a non-uniform patterning of tyrosines (Ω_{aro} = 0.47, p = 0.57, see Martin et al. for methodological details), and a high abundance of arginines, all of which could contribute to symmetry breaking, as proposed in these studies.
Lge1_{1-80} sequence impacts its conformational behavior and dynamics
The perturbation of intra- and intermolecular interaction networks by mutations results in a different conformational behavior of the resulting Lge1_{1-80} variants. At the level of single molecules, extensive interactions involving R and Y residues result in a substantial compaction of the WT chain with the average radius of gyration <Rg_{MD}> = 1.58 ± 0.12 nm (Figure 4A, see also Supplementary file 1 for details). In contrast, the Rg_{MD} distributions of the R>K and Y>A variants cover a wider range and display significantly higher average values (1.81 ± 0.35 and 1.71 ± 0.29 nm, respectively, Figure 4A). In all three cases, the chains are more compact than predicted for a random coil of the same length (<Rg_{rc}> = 2.50 nm, Bernadó and Blackledge, 2009). On the other hand, the difference between WT and the two mutants is directly related to the extent of intramolecular interactions: the looser the interaction network, that is, the fewer long-range sequence contacts there are, the larger the <Rg> (Figure 4—figure supplement 1A). In the crowded environment, WT again adopts a compact organization with an almost invariant <Rg_{MD}> = 1.60 ± 0.22 nm when averaged over all 24 copies (Figure 4B), while the weakly self-interacting Y>A mutant unwinds toward more extended conformations (<Rg_{MD}> = 1.97 ± 0.43 nm). At the same time, R>K, being most loosely packed in the single-molecule context, comes closer to WT values in the crowded environment (<Rg_{MD}> = 1.67 ± 0.27 nm, Figure 4B). The latter can potentially be explained by the repulsive nature of KK contacts, which are enriched in the single-molecule context and depleted in the crowded phase (Figure 2—figure supplement 1A and B).
Note here that, unlike KK pairs, RR pairs can engage in π-π interactions as reflected in relatively higher fractions and enrichments of these contacts for Y>A, particularly in the single-molecule context (Figure 2—figure supplement 1B), as previously observed in PDB structures (Vernon et al., 2018). These results highlight the fact that the conformational behavior of IDPs in the bulk or in the crowded phase displays a clear sequence-specific character and cannot easily be generalized.
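For readers who want to reproduce the compaction comparison on their own conformers, the radius of gyration of a single frame can be computed as sketched below. The sketch assumes the common mass-weighted definition, Rg² = Σ m_i |r_i − r_com|² / Σ m_i; whether the reported <Rg_{MD}> values use mass weighting is an assumption here, and the coordinates are toy values.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one conformer.

    coords: list of (x, y, z) tuples in nm; masses: list of atomic masses.
    """
    total_mass = sum(masses)
    # Center of mass, computed per Cartesian component.
    com = [
        sum(m * r[k] for m, r in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((r[k] - com[k]) ** 2 for k in range(3))
        for m, r in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy conformer: eight unit masses on the corners of a cube (hypothetical).
cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
print(radius_of_gyration(cube, [1.0] * 8))
```

Averaging this quantity over frames, and over the 24 copies in the multi-chain case, gives an ensemble value that can be set against the random-coil estimate quoted in the text.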
Single-molecule translational diffusion coefficients of Lge1_{1-80} variants obtained from fitting of MSD curves with an applied finite-size PBC correction and solvent viscosity rescaling (see Methods for details) are ~120 µm^{2}/s for all three single-chain simulations or anywhere between 100 and 150 µm^{2}/s for multi-chain simulations and different Lge1_{1-80} variants (Figure 4C and Supplementary file 2). In comparison, the diffusion constant of the similarly sized ubiquitin (Rg = 1.32 nm, 76 aa, 8.6 kDa) at the protein concentration of 8.6 mg/ml is 149 µm^{2}/s (Altieri et al., 1995), while that of GFP (Rg = 2.8 nm, 238 aa, 27 kDa) at the concentration of 0.5–3 mg/ml is ~90 µm^{2}/s (Baum et al., 2014). This suggests that the diffusional dynamics captured by our simulations may be realistic. The obtained viscosity values (see Methods for details) in single-chain simulations of the three Lge1_{1-80} variants (effective concentration of 2.3 mg/ml) are all similar to each other and are close to the calculated solvent viscosity for TIP4P-D water/0.1 M NaCl of 0.83 mPa·s (Figure 4—figure supplement 1F and Supplementary file 2). In the crowded multi-chain systems (effective concentration of 6–7 mg/ml), the viscosity systematically increases by about 20% and is again similar for all three Lge1_{1-80} variants (Figure 4—figure supplement 1F and Supplementary file 2). These calculated values are in the range of reported values for other similar systems, for example serum albumin (Gonçalves et al., 2016).
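The steps named above (MSD fit, finite-size correction, viscosity rescaling) can be sketched as follows. The sketch assumes the Einstein relation MSD = 6Dt for three-dimensional diffusion and the widely used Yeh-Hummer correction for cubic periodic boxes, D_0 = D_PBC + ξ k_B T / (6π η L) with ξ ≈ 2.837297. Whether this specific correction and this form of rescaling match the procedure in Methods is an assumption, and the temperature, box length, and reference viscosity below are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
XI_CUBIC = 2.837297  # lattice constant for a cubic periodic box


def diffusion_from_msd(times, msd):
    """Least-squares slope of MSD(t) divided by 6 (Einstein relation, 3D)."""
    n = len(times)
    t_mean = sum(times) / n
    m_mean = sum(msd) / n
    num = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd))
    den = sum((t - t_mean) ** 2 for t in times)
    return (num / den) / 6.0


def finite_size_corrected(d_pbc, temperature, viscosity, box_length):
    """Add the hydrodynamic finite-size term for a cubic box (SI units)."""
    return d_pbc + XI_CUBIC * K_B * temperature / (
        6.0 * math.pi * viscosity * box_length
    )


def viscosity_rescaled(d, eta_model, eta_reference):
    """Rescale D from the model solvent viscosity to a reference viscosity."""
    return d * eta_model / eta_reference


# Synthetic MSD in m^2 versus time in s (hypothetical, D = 1e-10 m^2/s).
times = [i * 1e-9 for i in range(1, 11)]
msd = [6.0 * 1e-10 * t for t in times]
d_pbc = diffusion_from_msd(times, msd)
# Hypothetical 298 K and 10 nm box; 0.83 mPa*s is the model solvent value.
d_0 = finite_size_corrected(d_pbc, 298.0, 0.83e-3, 10e-9)
# Hypothetical reference viscosity of 0.89 mPa*s.
d_scaled = viscosity_rescaled(d_0, 0.83e-3, 0.89e-3)
print(d_pbc * 1e12, d_0 * 1e12, d_scaled * 1e12)  # values in um^2/s
```

Because the correction scales inversely with box length, it is largest for small simulation boxes, which is why it matters when comparing single-chain and multi-chain systems of different size.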
In both single- and multi-chain simulations, the WT translational diffusion coefficients are somewhat lower than for either mutant (Figure 4C and Supplementary file 2). This effect does not appear to be related to protein size (<Rg_{MD}>, Figure 4B) or viscosity (Figure 4—figure supplement 1F), but may reflect protein slowdown due to more extensive interactions with partners, at least in the crowded environment (Figure 2D). For instance, the WT diffusion coefficient drops by approximately 20% over the last 0.3 µs of the trajectory (Figure 4C), which correlates with the formation of a single percolating cluster in the system (Figure 2C). At the same time, the R>K diffusion constant does not change during the multi-chain simulation and is similar to the single-chain one (Figure 4C), likely due to electrostatic repulsion in the crowded environment. For Y>A there is no clear trend, whereby the diffusion constant over the last 0.3 µs of multi-chain simulations is similar to the single-chain one, which may be related to its larger size as compared to other variants (Figure 4B).
We have next quantified the effect of R and Y mutations on the Lge1_{1-80} conformational dynamics by estimating the configurational entropy (S_{conf}) and its changes in different contexts via the maximum information spanning tree (MIST) formalism. Due to the representation of molecules in internal bond-angle-torsion (BAT) coordinates (see Methods), MIST is well suited for unstructured protein ensembles, as shown before (Fleck et al., 2018). S_{conf} displays a reasonable convergence between the individual replicas of the single-chain simulation on the 1 µs time scale (<0.1 kJ/mol/K), especially in the case of the weakly self-interacting Y>A mutant (Figure 4—figure supplement 1B). Interestingly, in the crowded multi-chain environment, we observe a significant decrease in S_{conf} for all three variants, with the biggest change seen for WT and the smallest for Y>A (Figure 4D). These results suggest that the crowding involves a conformational reorganization of the molecules toward decreasing the available free volume, that is, increasing their compactness (volume fraction) φ, defined here as the ratio between the van der Waals and the hydrodynamic volume of a molecule. Finally, there is a substantial increase in S_{conf} associated with the Y>A mutation of the WT in both single- and multi-chain contexts, in contrast with the R>K mutation, where a much weaker effect is observed (Figure 4E).
In the crowded environment, φ reaches plateau values over the last 0.3 µs (Figure 4—figure supplement 1C) with the average value fluctuating within a 2–4% interval for different averaging blocks (Supplementary file 1). Importantly, an increase in φ in the crowded environment correlates directly with an unfavorable ΔS_{conf} (Figure 4F, Pearson R is –0.6), an effect which can potentially be compensated for by the favorable enthalpy upon forming the extensive multivalent interaction network as observed for the WT. Conversely, ΔS_{conf} correlates poorly with the average number of bound partners (valency; Pearson R is –0.3) and there is no correlation between the compactness φ and the valency of interactions (not shown). Being mutually largely independent, these two characteristics of protein molecules – compactness and valency – are the key parameters describing the organization of the corresponding crowded phase, which generally reflect the entropic and the enthalpic contributions to self-organization, respectively.
Describing condensate architecture via a fractal scaling model
To extrapolate the atomistic-level properties of the crowded protein phase to larger length scales, we modeled the assembly of phase-separated condensates as an iterative, fractal process (Figure 5A, see also Appendix 1). Colloid fractal models typically start with an Ansatz capturing the power-law dependence between mass and size across different scales (Carpineti and Giglio, 1992; Lazzari et al., 2016). For conceptual clarity and to demonstrate where the power-law dependence comes from, our derivation starts from a simple physical picture of associating clusters and yields the known scaling relationship bottom-up. Thus, we assume that initially individual protein molecules, characterized by a given volume fraction φ, interact to form clusters of a given average valency $n$. In the next iteration, these clusters arrange into higher-order clusters, whereby the average valency of each cluster and the volume fraction occupied by the clusters formed in the previous iteration remain constant at all levels of organization. For instance, if a single protein molecule binds four other proteins ($n$ = 4), this results in the second-iteration cluster consisting of 5 protein molecules (Figure 5A). In the next iteration, this 5-mer arranges with four other 5-mers into a new cluster with the same valency of 4, resulting in a larger cluster with 25 molecules. According to this model, the smaller clusters from iteration $i-1$ always occupy the same volume fraction φ of the apparent volume of the larger cluster from iteration $i$ (taken as 2/3 in Figure 5A). This scenario results in a simple fractal formalism whose benefit is that it yields exact solutions for different features of the clusters at each iteration (see Appendix 1).
Thus, for each iteration $i$, the formalism returns the number of molecules (Equation 1), the apparent volume (${V}_{i}$, Equation 4), the size (${R}_{i}$, Equation 5), the mass (${M}_{i}$, Equation 8), and the effective molar concentration of molecules in the cluster (${C}_{i}$, Equation 6). While the above derivation uses, for simplicity, a constant volume fraction and valency at each iteration, we emphasize that in the case of structurally heterogeneous statistical fractal clusters, these parameters would be equivalent to their time- and ensemble-averaged counterparts.
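The iteration scheme can be made concrete in a few lines of Python. This is a minimal sketch reconstructed from the verbal description above, not a transcription of the equations in Appendix 1. It assumes that iteration 1 is a single molecule, that each iteration groups $n$ + 1 clusters of the previous level, and that those clusters fill a fraction φ of the new apparent volume. The function names are illustrative.

```python
import math


def molecules_in_cluster(n, i):
    """Number of molecules at iteration i (assumption: iteration 1 = one molecule)."""
    return (n + 1) ** (i - 1)


def apparent_volume(n, phi, i, v_molecule):
    """Apparent cluster volume at iteration i.

    v_molecule is the van der Waals volume of one molecule. Dividing by phi
    gives its hydrodynamic volume (iteration 1); every further iteration
    multiplies the volume by (n + 1) / phi.
    """
    return (v_molecule / phi) * ((n + 1) / phi) ** (i - 1)


def fractal_dimension(n, phi):
    """Exponent d_f in M ~ R^d_f implied by the two growth rules above."""
    return 3 * math.log(n + 1) / math.log((n + 1) / phi)
```

With $n$ = 4, the sketch gives 1, 5, and 25 molecules for the first three iterations, as in Figure 5A. With φ = 0.638 and $n$ = 3.76, it gives a fractal dimension of about 2.33, which matches the regression slope reported for the simulated clusters.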
We compared the predictions of the fractal model with what is seen in the actual Lge1_{180} simulations. Importantly, the simulations in the first instance just give the average valency and compactness of individual chains in the dense phase. The fractal formalism, which is conceptually independent from the simulations, subsequently provides the dependence of condensate mass on its radius, M(R), at any desired length scale. This, in turn, enables one to directly test the predictions of the fractal formalism in the case of the actual clusters seen in the simulations. Thus, over the last 0.3 µs of MD simulations, the WT multichain system displays a narrow distribution of sizes for the single molecule (Figure 4B) and leads to a single percolating cluster (Figure 2D, Figure 5—figure supplement 1A) with an average radius R = 6.27 ± 0.24 nm. The latter point agrees closely with the predictions of the fractal formalism when the average values for φ and $n$ obtained over the last 0.3 µs of MD simulations are used (Figure 5—figure supplement 2). Thus, a single cluster consisting of 24 chains observed in simulation directly corresponds to the third iteration of the model (Figure 5—figure supplement 1A). Moreover, the slope (A, Equation 10) and the intercept (B, Equation 11) of the linear regression for the log R vs log Mw plots are also similar between the simulations and the model (2.33 vs 2.42 and 1.13 vs 1.11, respectively) (Figure 5—figure supplement 1B). Finally, the latter parameters allow the exact calculation of the characteristic φ and $n$ values (Equations 12, 13; 0.638 and 3.76, respectively), which again are very similar to those obtained directly from all-atom MD (Figure 5—figure supplement 2). These non-trivial correspondences suggest that fractal organization is present even at the shortest scale, that is, at the level of MD simulation boxes.
The model also enables one to explore the space of φ and $n$ parameters for a condensate with a fixed size and protein concentration (Equation 7, Figure 5B). For instance, for a 1 µm condensate with a 1 mM apparent protein concentration, the corresponding φ and $n$ values are generally in the range observed in MD for the crowded WT system. Generally, in order to keep the size-to-concentration ratio fixed, protein compactness must decrease nonlinearly with increasing valency (Figure 5B, dashed lines). Accordingly, perturbation of compactness and valency due to the two types of studied mutations can result in a decrease of the apparent concentration in condensates of fixed size. Importantly, the compactness of IDPs is not a stable parameter and is tunable by different factors (temperature, pH, ionic strength, etc.) (Uversky, 2009). This, in turn, suggests that IDP concentration inside condensates may also be adaptable and tunable. Thus, unwinding or compaction of an IDP due to any factor would change the apparent concentration and density in the condensates. The latter also opens up the possibility of microphase transitions inside phase-separated droplets, in analogy to those known for lipid membranes (Lewis and McElhaney, 2013). Finally, protein concentration as a function of condensate size can be directly estimated from the model (Figure 5—figure supplement 1C). Notably, the complex topology of condensates (see also below) as proposed by the model allows for the formation of droplets with a very low apparent protein concentration.
Valency and compactness define fractal dimension and condensate topology
The fractal model provides a direct relationship between size and mass of different clusters which captures the scaling behavior of the condensate matter. Thus, the slope of the line in the log R vs. log Mw plot (A) is equal to the fractal dimension d_{f}, which describes the topology of molecular clusters (Carpineti and Giglio, 1992). The fractal dimensions equaling exactly 1, 2, or 3 correspond to the objects exhibiting 1D, 2D, or 3D organization, respectively, while systems with noninteger d_{f} have an intermediate dimensionality. The proposed model facilitates a direct investigation of the scaling properties in condensates for a molecule with defined characteristic compactness and valency of interactions. Most importantly, the fractal dimension of a condensate is completely defined by φ and $n$ (Equation 10), reflecting the predictive potential of the proposed model. For instance, the average values of φ and $n$ derived from MD simulations in the crowded state result in a different scaling behavior for WT, R>K and Y>A proteins, whereby their dimensionality respectively decreases (Figure 5C) and results in a different morphology of the corresponding condensates (Figure 6B), as observed experimentally (Figure 1B and C). Specifically, while the WT condensate simulated here exhibits a d_{f} of 2.42 and is also experimentally found to undergo robust phase separation, the R>K mutant exhibits a lower d_{f} of 2.09 and undergoes phase separation under a more limited set of conditions. Even more extremely, d_{f} is 1.63 for the Y>A mutant (Figure 5C), which does not undergo phase separation and can be thought of as an object between 1D and 2D.
Reconstructing the condensate architecture across scales
There exist several algorithms in the colloid literature that enable one to reconstruct the geometry of fractal clusters starting from a given fractal dimension d_{f} (Kätzel et al., 2008; Morán et al., 2019; Thouy and Jullien, 1994) (and prefactor k_{f}, which is equal to 1 in the present model; see Appendix 1 for derivation). Recently, Morán et al., 2019, have proposed a robust and tunable algorithm, FracVAL, for modeling the formation of clusters consisting of polydisperse primary particles. We have used the values of d_{f} derived from our simulations for the three Lge1_{180} variants in combination with FracVAL to generate individual realizations of the respective condensate structure on the length scale of hundreds of nanometers. In Figure 6A, we demonstrate this procedure for WT Lge1_{180}: FracVAL produces cluster geometries using spherical particles with radii corresponding to the respective <Rg_{MD}> values, which are then computationally replaced by the realistic protein conformations obtained from our simulations. In agreement with its micrometer-scale behavior observed in vitro, the modeled WT Lge1_{180} condensate exhibits a densely packed fractal geometry. In contrast, the reconstructed Y>A cluster exhibits an elongated, filamentous topology of low dimensionality (Figure 6B), which may preclude the formation of well-defined phase-separated droplets (Figure 1B and Figure 1—figure supplement 1B). The R>K cluster exhibits an intermediate topology. The clusters shown in Figure 6B were generated using 1024 primary particles in all three cases: clearly, the Y>A cluster occupies a significantly larger volume as compared to the R>K cluster and especially the WT. In particular, the three reconstructed systems exhibit holes and cavities of different sizes (see Videos 4–6 to zoom in on the internal organization of the model condensates), with Y>A being most extreme in this regard.
Finally, note that the three reconstructions shown in Figure 6A and B are individual snapshots of the local architecture; the full fractal model entails an ensemble of such snapshots, all configurationally different, yet still conforming to the same scaling pattern.
Discussion
We have provided a multiscale description of Lge1_{180} condensates extending from a detailed analysis of constituent molecules at the atomistic level all the way to the micrometer-sized droplets involving thousands of individual molecules, which can be observed in vitro. We have shown that mutations of R and Y residues induce perturbations at the level of both intra- and intermolecular interaction networks, and result in conformational effects that can be related to the phase behavior and the ability to form condensates as detected by light microscopy. Specifically, the characteristic descriptors of protein behavior in the crowded phase, valency and compactness (volume fraction), are shown to be sufficient to describe the structural organization of condensates across length scales in the context of the proposed analytical fractal model. Importantly, the studied mutations substantially change either just the valency (R>K) or both the valency and compactness (Y>A) of Lge1_{180} polypeptides, as shown in MD simulations.
The applied simulation protocol reproduces the level of diffusive protein dynamics expected from molecules of Lge1_{180} size. As a further indication of its general quality, the values of valency and compactness obtained from simulations are consistent with the difference in FRAP recovery dynamics observed for WT and R>K. Namely, the accurate fitting of FRAP data is possible only if using at least two components (Figure 1—figure supplement 2). According to Sprague and McNally, 2005, these components reflect the contribution of particle diffusion and interactions. Thus, the recovery in centrally bleached condensates is faster for WT than for the R>K mutant, which can be related to the higher compactness of WT particle across scales, as compared to R>K (Figure 1E, Figure 1—figure supplement 2A). On the other hand, the FRAP results for the condensates bleached in the peripheral area highlight the contribution of valency to condensate formation. Indeed, the recovery is about three times faster for the R>K mutant (Figure 1—figure supplement 2B), which could potentially be related to the lower valency of interactions and the ease of replacement of inactivated fluorescent species or/and exchange with proteins in the bulk. Indeed, a similar behavior with faster recovery for the R>K is observed when bleaching the whole condensate (Figure 1E, Figure 1—figure supplement 2C).
In the fractal model, the changes in protein valency and compactness translate to different scaling behavior and, subsequently, different topology of the condensates. Brangwynne and coworkers have successfully adopted the theoretical formalism of patchy colloids to capture the relative contributions of oligomerization, RNA binding, and structural disorder in the formation of stress granules and P-bodies (Sanders et al., 2020). In their analysis, they emphasized the role of the valency of colloid particles as a key parameter defining specificity and tunability of condensate features. Furthermore, Collepardo-Guevara and coworkers have used a coarse-grained patchy-particle colloid model to study the determinants of condensate stability and have highlighted the role of valency, while also clearly demonstrating the general impact of volume fraction (Espinosa et al., 2020). Our study now provides a general framework for assessing the importance of these two parameters for the formation of biomolecular condensates. Importantly, the proposed model predicts the existence and provides a quantitative characterization of topological properties of pre-percolation finite-size clusters that are in line with the recent findings (Kar et al., 2022; Mittag and Pappu, 2022; Pappu et al., 2023). More generally, the model provides the fractal dimension (d_{f}) of protein clusters and enables evaluation of different scale-dependent properties of clusters of arbitrary size, including protein density as a function of cluster size (Figure 5—figure supplement 1C, Figure 5C). Finally, MD simulations of proteins in the crowded context on the length scale of tens of nanometers can be used in combination with cluster-cluster aggregation algorithms to derive atomistically resolved models of the 3D organization of fractal clusters of any chosen size (Figure 6A and B, and Videos 4–6).
As discussed above, an emerging paradigm for biomolecular condensate formation is that of phase separation coupled to percolation. Importantly, fractal behavior, as explored in the present work, is naturally related to percolation phenomena. For instance, direct links between the fractal dimension defining scaling principles in self-similar clusters and critical exponents in different percolation models have been provided (Kapitulnik et al., 1984; Stauffer et al., 1982). While a similar formal derivation for our model is out of the scope of the present study, we can provide an illustration of how our results and the key parameters of the model can be interpreted according to percolation theory. A key concept in percolation theory is that of contact probability between components in the system. The network transition appears and a percolating cluster is formed if the contact probability exceeds a particular threshold, that is, the critical contact probability (p_{crit}). For instance, in the simplest case according to the Flory-Stockmayer theory, p_{crit} = 1/(n − 1), where n is the number of bonds formed by each monomer and is related to the valency in our model. Thus, interaction valency contributes directly to the spatial organization of pre-percolating clusters (the fractal dimension) and defines the threshold of the percolation. Therefore, our analysis shows that WT Lge1_{180} displays a robust network transition due to high contact probability, which depends on its valency and, more generally, on the topology of its clusters. This perspective suggests that concentrations at which network transition is expected should be lower for WT than for either mutant, as indeed observed. Of note, the ability of IDRs to form low-dimensional fractal structures (d_{f} in the range of 1.6–1.9) upon the disruption of their tendency to phase separate by a polyalanine insertion was demonstrated experimentally for synthetic elastin-like polypeptides (Roberts et al., 2018).
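The Flory-Stockmayer threshold quoted above is simple enough to evaluate directly. The sketch below is only an illustration of that formula. How the model valency maps onto the number of bonds per monomer is left open in the text, so passing a valency straight in as `n` is an assumption, and the function name is illustrative.

```python
def critical_contact_probability(n):
    """Flory-Stockmayer percolation threshold p_crit = 1 / (n - 1).

    n is the number of bonds each monomer can form; it must exceed 1,
    otherwise no network can form and the threshold is undefined.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    return 1.0 / (n - 1)
```

For example, monomers that can form three bonds percolate once half of the possible contacts are made, whereas four bonds lower the threshold to one third. A higher valency therefore means a lower critical contact probability.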
The fractal formalism also provides a framework for approaching the question of protein concentration inside phase-separated condensates, which covers a large range extending to tens of mM and beyond (Brady et al., 2017; McCall et al., 2020; Ryan et al., 2018). However, some proteins form condensates with extremely low concentrations in the dense phase. For example, the measured binodals of LAF-1 indicated that the concentration of the protein inside the droplet is 86.5 μM, which corresponds to an average separation between molecular centers of mass of 27 nm (Wei et al., 2017). Considering that the average Rg of LAF-1 is approximately 4 nm, it is not immediately clear how the molecules inside the droplet are organized in order to simultaneously establish intermolecular contacts and also form low-density droplets. Fractal organization provides a simple resolution of this apparent conundrum. Namely, fractal systems are characterized by a remarkable property that their packing density is a function of the length scale on which it is examined (Stanley, 1984). For example, the density of a Sierpinski gasket, which is self-similar on all length scales, decreases as a power of the length scale, with an exponent of d_{f} − d, where d is the dimensionality of the space. Translated to the question of biomolecular condensates, fractal organization enables high local concentration of biomolecules at short length scales and, simultaneously, low global concentration at long length scales, as illustrated in Figure 5B and Figure 5—figure supplement 1C.
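Both numerical statements in this paragraph can be checked with a short calculation. The sketch below converts a molar concentration into a mean center-to-center separation, assuming each molecule occupies a cube of volume 1/(number density), and evaluates the relative density of a fractal object at two length scales from the exponent d_{f} − d. The function names and the cubic-cell assumption are illustrative and are not taken from the cited works.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITER = 1e24       # cubic nanometers in one liter


def mean_separation_nm(conc_molar):
    """Mean spacing (nm) between molecules at a given molar concentration."""
    number_density = conc_molar * AVOGADRO / NM3_PER_LITER  # molecules per nm^3
    return (1.0 / number_density) ** (1.0 / 3.0)


def relative_density(length, ref_length, d_f, d=3):
    """Density at scale `length` relative to `ref_length` for a fractal object.

    Uses the power law rho(L) ~ L^(d_f - d); both lengths share one unit.
    """
    return (length / ref_length) ** (d_f - d)
```

For 86.5 μM, `mean_separation_nm(86.5e-6)` returns approximately 27 nm, in line with the value quoted above. For any d_{f} below 3, `relative_density` is smaller than 1 whenever `length` exceeds `ref_length`, which is the dilution with length scale that the paragraph describes.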
Importantly, fractal systems are characterized by the existence of holes, that is, unoccupied regions of space, whose size covers a wide range of length scales (Stanley, 1984). The observation that the µm-sized WT Lge1_{180} droplets are fully permeable to dextrans in vitro (Gallego et al., 2020), even up to Mw = 2000 kDa or Rg ~27 nm (Armstrong et al., 2004; Figure 6—figure supplement 1), suggests that their organization allows for large holes (~55 nm in diameter or more). This is consistent with our fractal model, which proposes that the dimensionality of WT condensates is below 3 (d_{f} = 2.42, Figure 6, Video 2). Finally, the multiscale nature of biomolecular condensates, as embodied in the statistical fractal model, also points to the possible formation of clusters with sizes well below the resolution limit of light microscopy. This may relate to some of the open questions in the condensate field, especially when it comes to their in vivo function (McSwiggen et al., 2019; Musacchio, 2022). Having said this, it should be emphasized that the proposed fractal model may be applicable to varying degrees in different systems. In particular, it was shown that in vitro Lge1 assembles a liquid-like core that is surrounded by an enzymatic outer shell formed by the E3 ubiquitin ligase Bre1 (Gallego et al., 2020). Whereas such condensates can exhibit a diameter of 1 µm or more when grown in vitro, they are expected to be smaller in cells (i.e. low nm range) (Gallego et al., 2020). Hence, the relevance of the proposed model will need to be tested experimentally with respect to nm-sized core-shell condensates in cells. Moreover, future studies must include the spherical Bre1 shell as a boundary condition, which presumably constrains the 3D orientation of the Lge1 C-terminus and thereby impinges on the geometry of the Lge1 meshwork.
Finally, the fractal model predicts the coexistence of differently sized clusters within a condensate, as reported recently (Kar et al., 2022), which have a characteristic scaling of mass with condensate size in the nm to µm range. This prediction of the model can be tested using static light scattering (SLS) techniques and will be a subject of our future work: a linearly decreasing intensity as a function of the scattering vector in a log-log representation, as frequently seen for different colloidal systems, is expected by the fractal model (Lazzari et al., 2016; Lin et al., 1989). In fact, fractal dimension can be estimated from SLS experiments as the limiting value of scattering curves for high values of the product of the scattering vector q and the average cluster size <Rg> (Hagiwara et al., 1996). In addition, techniques such as DLS and MALS can be used to measure independently the masses and sizes of LLPS condensates in vitro. It will also be important to analyze to what extent these features are retained in more complex, biologically relevant contexts.
Colloidal cluster formation is typically discussed in the context of two limiting regimes (Klein et al., 1990; Lazzari et al., 2016; Lin et al., 1989). In diffusion-limited cluster aggregation (DLCA), the rate of cluster formation is determined by the time it takes for colloidal particles to encounter each other, and every encounter leads to binding. In reaction-limited cluster aggregation (RLCA), particles need to overcome a repulsive barrier before binding, and not every encounter is productive. Both regimes result in fractal behavior, with DLCA leading to looser structural organization and lower fractal dimensions and RLCA leading to more compact structural organization and higher fractal dimensions. Computer simulations and scattering experiments show that d_{f} is ~1.8 for DLCA and ~2.1 for RLCA (Lazzari et al., 2016; Lin et al., 1989). Importantly, Meakin and colleagues have shown that the two regimes of colloid cluster formation are universal and do not depend on the chemical nature of the underlying particles, making them an attractive paradigm for modeling the multiscale structure of biomolecular condensates as applied here (Lin et al., 1989). However, for flexible, multivalent molecules like IDPs, the exact regime of cluster formation would depend on the values of φ and $n$ and may be a tunable feature of the exact conditions. In Figure 6C, we present the nonlinear dependence of the fractal dimension d_{f} on φ and $n$, which exhibits certain general trends. For example, high values of either compactness φ or valency $n$ or both, such as in the case of WT Lge1_{180}, result in high values of d_{f} and may be more associated with the RLCA model, while low values of both parameters may be more associated with the DLCA model (Figure 6C). A more quantitative analysis of the connection between the exact mechanism of cluster formation and the underlying parameters of compactness and valency will be a topic of future work.
Overall, our results provide an atomistic framework for understanding the role of valency and compactness of IDPs in condensate stability and architecture across scales. This presents an opportunity for the rational, quantitatively founded design of phase-separating agents with predefined condensate properties. Indeed, recent studies have demonstrated the possibility to tune condensate properties in vitro and in vivo by manipulating the aromatic residue content and molecular weight in IDPs (Dzuricky et al., 2020) or the size of disordered linkers and valency in modular proteins (Lasker et al., 2021). Finally, it is our hope that our results may help to critically embed the field of biomolecular condensates in the wider context of colloid chemistry. We expect that the powerful theoretical, computational, and experimental tools of colloid chemistry could propel the study of biomolecular condensates to the next level of fundamental understanding.
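The log-log linearity mentioned above suggests a simple way to read a fractal dimension off a scattering curve. The sketch below assumes the standard power-law regime I(q) ∝ q^{−d_f}, which holds only for scattering vectors between the inverse cluster size and the inverse primary-particle size, and fits a straight line to log I versus log q by ordinary least squares. It is an illustration of the idea, not the analysis protocol of the cited studies, and the function name is illustrative.

```python
import math


def fractal_dimension_from_scattering(q_values, intensities):
    """Estimate d_f as minus the least-squares slope of log I versus log q.

    Assumes all points lie in the fractal (power-law) regime of the curve.
    """
    xs = [math.log(q) for q in q_values]
    ys = [math.log(i) for i in intensities]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -s_xy / s_xx
```

Applied to synthetic intensities generated as q^{−2.42}, the function recovers 2.42. With real data, the q range entering the fit would have to be chosen so that it excludes the Guinier and Porod regions.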
Methods
Protein expression and purification
All proteins were expressed in Escherichia coli BL21 CodonPlus (DE3) RIL cells. 6His-Lge1_{180}-StrepII constructs were induced by addition of 0.5 mM isopropyl 1-thio-β-D-galactopyranoside at OD_{600} = 0.8 at 23 °C for 3 hr and purified as published elsewhere (Gallego et al., 2020) with Talon Superflow beads (Cytiva) in a final elution buffer (10 mM Tris, 1 M NaCl, 1 mM TCEP, 1 M imidazole, 10% vol/vol glycerol, pH 7.5) and stored at –80 °C. For protein labeling with Dylight 488 NHS-Ester (Thermo Scientific), the final elution of the Lge1_{180} constructs was performed in 10 mM HEPES, 1 M NaCl, 1 mM TCEP, 1 M imidazole, 10% vol/vol glycerol, pH 7.5. Labeling was performed during the elution step for 45 min according to the manufacturer’s instructions. Unbound dye was removed by sequential buffer exchange in centrifugal filters Amicon Ultra 0.5 ml 3 K (Merck Millipore). Lge1_{180}-Dylight-labeled protein was stored at –80 °C.
6His-LAF-1 was expressed and purified as described (Elbaum-Garfinkle et al., 2015) with some modifications as follows. Lysis buffer included 20 mM HEPES, 500 mM NaCl, 10% vol/vol glycerol, 14 mM β-mercaptoethanol, 10 mM imidazole, 1% vol/vol Triton 100, pH 7.5, and was supplemented with 0.5 mg/ml lysozyme, DNase I, and protein inhibitor mix HP (Serva). After washing and eluting from Ni-NTA Sepharose 6 FastFlow beads (GE Healthcare) in elution buffer (20 mM HEPES, 1 M NaCl, 10% vol/vol glycerol, 14 mM β-mercaptoethanol, 250 mM imidazole, pH 7.5), protein was labeled with Dylight 488 NHS-Ester (Thermo Scientific) for 30 min. Unbound dye was removed by sequential buffer exchange in centrifugal filters Amicon Ultra 0.5 ml 30 K (Merck Millipore). Finally, LAF-1-Dylight-labeled protein was stored at –80 °C.
Protein quality was assessed by SDS-PAGE (4–12% gel, MOPS buffer) and Coomassie staining. Total purity of the protein was calculated by densitometry analysis of the gel bands with ImageJ. The percentage of full-length protein was calculated in relation to all the bands present after purification.
Solubility diagrams
Different concentrations of purified 6His-Lge1_{180}-StrepII proteins in a total volume of 100 µl were added to a bottom-clear 96-well plate (Greiner Bio-One) in a buffer containing 25 mM Tris, 100 mM NaCl, 100 mM imidazole, 1% vol/vol glycerol, and 1 mM DTT, pH 7.5. Varying concentrations of NaCl (100 mM to 3 M), imidazole (100 mM to 2.25 M), or tyrosine (0.25 mM to 2.5 mM) were analyzed as indicated in Figure 1C. Plates with the protein mix were incubated for 10 min at 20 °C. Turbidity of the samples was measured at 450 nm in a Victor Nivo plate reader (Perkin Elmer) at 20 °C. Assessment of protein phase separation or aggregation was performed by applying a total volume of 20 µl of the sample to a pretreated (Gallego et al., 2020) 16-well glass-bottom ChamberSLIP slide (Grace Bio-Labs). DIC imaging was performed as previously described (Gallego et al., 2020).
Circularity
Morphology of Dylight-labeled Lge1_{180} particles was assessed by studying circularity, calculated using the formula: circularity = 4π × (area / perimeter^{2}).
Image analysis was done in Fiji/ImageJ by applying the particle analyzershape descriptor plugin, and statistical analysis was conducted in GraphPad Prism v 7.0e. For each construct, at least four independent images at each respective protein concentration were analyzed (1 µM for WT and 10 µM for R>K). Total number of the particles analyzed (n) is included in the figure legend (Figure 1—figure supplement 1C).
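For reference, the circularity descriptor reported by the ImageJ shape-descriptor measurements is 4π × area / perimeter². The sketch below evaluates this expression for a single particle. It assumes this standard definition is the one used here, and the function name is illustrative.

```python
import math


def circularity(area, perimeter):
    """Shape circularity, 4*pi*area / perimeter**2.

    Equals 1.0 for a perfect circle and approaches 0 for elongated shapes.
    Area and perimeter must use consistent units (e.g. µm^2 and µm).
    """
    return 4.0 * math.pi * area / perimeter ** 2
```

A circle of any radius gives 1.0, while a square gives π/4, so values well below 1 indicate irregular or elongated particles.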
Fluorescence recovery after photobleaching
FRAP experiments were performed on a temperature-controlled DeltaVision Elite microscope as previously described (Gallego et al., 2020). Lge1_{180} WT-Dylight condensates were formed by 100% labeled protein at a final concentration of 10 µM in 25 mM Tris pH 7.5, 100 mM NaCl, 1% vol/vol glycerol, 1 mM TCEP, 100 mM imidazole. Lge1_{180} R>K-Dylight was mixed with 50% of unlabeled protein (given the enrichment in lysine residues that are labeled) to a final protein concentration of 30 µM. LAF-1-Dylight was mixed with 30% of unlabeled protein and processed as described (Elbaum-Garfinkle et al., 2015) to a final concentration of 8 µM in 25 mM Tris, pH 7.5, 100 mM NaCl, 1 mM DTT. Bleaching was performed in protein condensates incubated for 30 min on pretreated 16-well glass-bottom slides (Gallego et al., 2020) by applying 20% power of a 50 mW 488 nm laser for 5 ms. Fluorescence intensity before bleaching was recorded for one frame prior to the bleach. Recovery of the bleach spot (central bleach, peripheral bleach, or whole condensate bleach) was recorded at increasing time intervals to limit photobleaching (initially every 7 s, then 14 s, 30 s, and finally 60 s), for a total of 32 images over 20 min. Intensity traces were corrected for photobleaching. Recovery was calculated as published elsewhere (Taylor et al., 2019), normalized, and fitted to a double exponential function of the form:
Finally, the recovery half times were obtained by the numerical solution of the fitting equation:
using WolframAlpha online (https://www.wolframalpha.com).
Total area of the bleached spot was calculated in ImageJ by relating it to the whole area of the condensate. Fluorescence intensity profiles for whole bleached condensates were acquired in ImageJ. For LAF-1, in addition to the double exponential function, several specific single-exponent fits (Taylor et al., 2019) were tested in terms of quality of the fit.
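The numerical step of extracting a half time from a two-component recovery curve can be sketched as follows. The functional form used here, f(t) = a1(1 − e^{−t/τ1}) + a2(1 − e^{−t/τ2}) with positive amplitudes, is a common parameterization and is an assumption, since it may differ from the fitting equation used in this study. The half time is taken as the time at which f reaches half of its plateau a1 + a2, and is found by bisection. All names are illustrative.

```python
import math


def double_exponential(t, a1, tau1, a2, tau2):
    """Assumed two-component recovery curve (amplitudes a1, a2 > 0)."""
    return a1 * (1.0 - math.exp(-t / tau1)) + a2 * (1.0 - math.exp(-t / tau2))


def recovery_half_time(a1, tau1, a2, tau2, tol=1e-9):
    """Time at which the curve reaches half of its plateau, by bisection."""
    target = 0.5 * (a1 + a2)
    lo, hi = 0.0, max(tau1, tau2)
    # Expand the bracket until the curve at `hi` is at or above the target.
    while double_exponential(hi, a1, tau1, a2, tau2) < target:
        hi *= 2.0
    # The curve increases monotonically, so bisection converges to the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if double_exponential(mid, a1, tau1, a2, tau2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When both components share the same time constant τ, the result reduces to the single-exponential half time τ ln 2, which provides a quick sanity check of the root finder.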
Dextran experiments
Experiments were performed as published elsewhere (Gallego et al., 2020). TRITC-dextrans with a final concentration of 0.05 mg/ml were added to the samples containing 6His-Lge1_{180}-StrepII (2 µM) in a final buffer containing 25 mM Tris, 100 mM NaCl, 100 mM imidazole, 1% vol/vol glycerol, and 1 mM DTT, pH 7.5. Samples were incubated for 15 min at 20 °C on pretreated 16-well glass-bottom ChamberSLIP slides prior to imaging.
Pairwise interaction free energy calculations
Pairwise interaction free energies were calculated using all-atom Monte Carlo (MC) simulations according to the previously established framework (Polyansky et al., 2009). MC simulations were carried out in TIP4P water (Jorgensen et al., 1983), methanol, dimethylsulfoxide, and chloroform for the side-chain analogs of tyrosine, arginine, and lysine using the OPLS force field (Jorgensen et al., 1996). Initial structures of the molecules were optimized in vacuo using the AM1 semiempirical molecular orbital method (Dewar et al., 1985) and placed in rectangular boxes with explicit solvent. All calculations were performed with the BOSS 4.2 program (Jorgensen and Tirado-Rives, 2005) with periodic boundary conditions in the NPT ensemble at 298 K and 1 bar. Standard procedures were employed, including the Metropolis criterion and preferential sampling for the solutes (Jorgensen and Ravimohan, 1985). The potential of mean force computations for Y-Y, Y-K, and Y-R pairs were performed by gradually moving the solute molecules apart in steps of 0.05 Å along an axis defined by the particular atoms or centers of geometry, while both solute molecules were allowed to rotate around this axis. Nonbonded interactions were truncated with spherical cutoffs of 12 Å. The free energy changes for a particular pair were calculated in a series of consecutive MC simulations using statistical perturbation theory and double-wide sampling (Jorgensen and Ravimohan, 1985). Each simulation consisted of 3×10^{6} configurations used for equilibration, followed by 6×10^{6} configurations used for averaging.
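Statistical perturbation theory estimates the free energy difference between a reference state and a perturbed state from energy differences sampled in the reference ensemble, ΔG = −k_B T ln⟨exp(−ΔU / k_B T)⟩. The sketch below implements only this exponential average for one perturbation step. It is not the BOSS implementation, the energy unit (kcal/mol) is an assumption, and the function name is illustrative. In double-wide sampling, the same reference simulation is used to evaluate both the forward and the backward perturbation, so the function would simply be called twice per window.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def perturbation_free_energy(delta_u, temperature=298.0):
    """Free energy change from sampled energy differences (Zwanzig average).

    delta_u holds U_perturbed - U_reference (kcal/mol, assumed unit) for
    configurations sampled in the reference ensemble.
    """
    beta = 1.0 / (K_B * temperature)
    average = sum(math.exp(-beta * du) for du in delta_u) / len(delta_u)
    return -math.log(average) / beta
```

If every sampled energy difference is identical, the estimate equals that value. Otherwise it is weighted toward the lowest energy differences, which is why small perturbation steps such as the 0.05 Å displacements described above are typically used.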
MD simulations
The fulllength structure of Lge1 was modeled de novo using Phyre2 web portal (Kelley et al., 2015). The disordered Nterminal 1–80 aa fragment of Lge1 (Lge1_{180}) in the modeled structure lacked any secondary structure and was used as an initial configuration in further Lge1 simulations. Allatom MD simulations were performed for WT Lge1_{180} fragment as well as for its Y>A and R>K mutants. Singleproteincopy MD simulations were carried out in 9×9×9 nm^{3} water boxes using two independent 1 µs replicas for each of the three Lge1_{180} variants. The same initial configuration (see above) was used for all three. All systems had zero net charge and effective NaCl concentration of 0.1 M. Systems with 24 protein copies were simulated in 18×18×18 nm^{3} (WT, Y>A) or 19×19×19 nm^{3} (R>K) water boxes. Initial configurations for these simulations were generated as follows. Four different protein conformers were selected from the initial 100 ns parts of the two independent singlechain MD simulations (two conformations from each run) based on the criteria of having been the centers of the most highly populated clusters after clustering analysis performed using cluster utility (GROMACS) with the applied RMSD cutoff for backbone atoms of neighboring structures of 1.5 Å. The cells containing four copies were assembled manually and translated six times in different directions, resulting in a protein grid containing 24 protein copies. A total of 1 µs of MD statistics were collected for each large system. All MD simulations and the analysis were performed using GROMACS 5.1.4 package (Abraham et al., 2015) and Amber99SBILDN force field (LindorffLarsen et al., 2010). After initial energy minimization, all systems were solvated in an explicit aqueous solvent using TIP4PD water model (Piana et al., 2015), which was optimized for simulating IDPs. The final NaCl concentration was 0.1 M (Table 1). 
The solvated systems were again energy-minimized and subjected to an MD equilibration of 30,000 steps using a 0.5 fs time step with position restraints applied to all protein atoms (restraining force constants Fx = Fy = Fz = 1000 kJ/mol/nm), followed by 250,000 steps using a 1 fs time step without any restraints. Finally, production runs were carried out for all systems using a 2 fs time step. A twin-range (10/12 Å) spherical cutoff function was used to truncate van der Waals interactions. Electrostatic interactions were treated using particle-mesh Ewald summation with a real-space cutoff of 12 Å and a 1.2 Å grid with fourth-order spline interpolation. MD simulations were carried out using 3D periodic boundary conditions in the isothermal-isobaric (NPT) ensemble with an isotropic pressure of 1.013 bar and a constant temperature of 310 K. The temperature and pressure were controlled using a Nosé-Hoover thermostat (Hoover, 1985) and a Parrinello-Rahman barostat (Parrinello and Rahman, 1981) with 0.5 and 10 ps relaxation parameters, respectively, and a compressibility of 4.5×10^{−5} bar^{−1} for the barostat. Protein and solvent molecules were coupled to both thermostat and barostat separately. Bond lengths were constrained using LINCS (Hess et al., 1997).
Radii of gyration (Rg) of simulated proteins were calculated using the GROMACS gyrate utility. The average number of interaction partners per protein and the detailed statistics of intermolecular contacts were evaluated using the GROMACS mindist and pairdist utilities, respectively, with an applied distance cutoff of 3.5 Å, while intramolecular contact maps were generated using the mdmat utility. Statistical significance of the difference between calculated parameters was evaluated using the Wilcoxon rank-sum test with a continuity correction in R (version 3.2.3). Protein structures were visualized using PyMol (Schrodinger and DeLano, 2020).
Pairwise MD contact statistics and the dynamic interaction mode
Frequencies of pairwise contacts between different positions in protein sequences were collected over the last 0.3 µs of MD trajectories independently for every simulated protein chain. These frequencies can be represented as position-resolved 2D maps or can be collapsed as total interaction preferences at each position along the sequence (1D interaction profile or dynamic interaction mode). Finally, they can be grouped by contact type and converted to pairwise frequencies and enrichments. An enrichment for a pairwise contact AB is calculated as:
where $f_{MD}$ is the observed MD frequency of contacts between A and B and $f_{exp}$ is the frequency of such contacts expected at random, given the sequence composition of the chain, that is, $f_{exp}(A,B) = f(A) \times f(B)$, where $f(A)$ is the frequency of A in the sequence. Individual 1D interaction profiles were obtained for each simulated protein considering only intramolecular protein contacts (INTRA) or only contacts with partners (INTER). The representative INTER mode was determined from four individual interaction profiles of proteins that had a number of partners corresponding to the average valency in the system over the last 0.3 µs and that displayed the highest mutual correlations. Interaction profiles averaged between the individual MD replicas of single-chain simulations were used for the determination of a representative INTRA mode.
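The expected-frequency term defined above can be sketched in a few lines of Python. The exact functional form of the enrichment is not reproduced in this text, so the plain ratio of observed to expected frequency used below is an assumption (a log-ratio would be an equally plausible convention). The function names and the toy sequence are hypothetical.

```python
from collections import Counter

def expected_pair_frequency(sequence, a, b):
    """f_exp(A,B) = f(A) * f(B), with f(X) the fraction of residue X
    in the sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return (counts[a] / n) * (counts[b] / n)

def enrichment_ratio(f_md, sequence, a, b):
    """Observed over expected contact frequency.
    ASSUMPTION: a plain ratio; the published definition may differ."""
    f_exp = expected_pair_frequency(sequence, a, b)
    if f_exp == 0.0:
        raise ValueError("residue type absent from the sequence")
    return f_md / f_exp

# Toy example (hypothetical sequence and observed frequency):
toy_sequence = "YRGGYKRG"
ratio = enrichment_ratio(0.125, toy_sequence, "Y", "R")
```

In the toy sequence, Y and R each make up a quarter of the residues, so the expected Y-R frequency is 0.0625 and an observed frequency of 0.125 corresponds to a twofold enrichment under the assumed ratio form.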
Theoretical estimate of Rg
The estimation was done according to the scaling model used in polymer theory to connect Rg and the chain length of a random coil as follows:
$$R_g = R_0 N^{\nu}$$
where $N$ is the length of Lge1_{1-80} ($N$ = 80) and $R_0$ and $\nu$ are the empirical parameters refined for IDPs ($R_0$ = 0.254 nm; $\nu$ = 0.522) (Bernadó and Blackledge, 2009).
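As a quick numerical check, the estimate can be evaluated directly, assuming the standard power-law form $R_g = R_0 N^{\nu}$ with the parameters quoted above. The function name is illustrative only.

```python
def random_coil_rg(n_residues, r0_nm=0.254, nu=0.522):
    """Random-coil radius of gyration (nm) from the power law Rg = R0 * N**nu."""
    return r0_nm * n_residues ** nu

rg_lge1 = random_coil_rg(80)  # 80-residue fragment
```

For $N$ = 80 this gives an Rg of approximately 2.5 nm.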
Cluster analysis
The largest protein-protein interaction clusters in the 24-copy simulated system were identified using hierarchical clustering. For this purpose, minimum-distance matrices were calculated from each MD trajectory sampled every 100 ps using GROMACS mindist. The clustering was done in MATLAB (R2009) using the function cluster with an applied distance cutoff of 3.5 Å.
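The clustering step can be sketched in plain Python under one explicit assumption: that the hierarchical clustering uses single linkage, in which case cutting the tree at 3.5 Å is equivalent to finding connected components of the contact graph. The linkage method is not stated in the text, and the function names and the toy matrix are hypothetical.

```python
def largest_cluster(min_dist, cutoff=3.5):
    """Indices of the largest group of chains connected by contacts.

    min_dist: symmetric matrix of minimum inter-chain distances (Angstrom).
    ASSUMPTION: single-linkage clustering, i.e. two chains share a cluster
    if a path of pairwise contacts (distance <= cutoff) connects them."""
    n = len(min_dist)
    parent = list(range(n))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if min_dist[i][j] <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return max(clusters.values(), key=len)

def partners_per_chain(min_dist, cutoff=3.5):
    """Number of direct interaction partners (valency) of each chain."""
    n = len(min_dist)
    return [sum(1 for j in range(n) if j != i and min_dist[i][j] <= cutoff)
            for i in range(n)]

# Toy 4-chain frame: chains 0-1-2 form a connected cluster, chain 3 is free.
toy = [[0.0, 3.0, 9.0, 9.0],
       [3.0, 0.0, 3.4, 9.0],
       [9.0, 3.4, 0.0, 9.0],
       [9.0, 9.0, 9.0, 0.0]]
```

Applied frame by frame to the minimum-distance matrices, the same routine would yield both the largest cluster and the per-chain valency.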
Entropy calculations
The configurational entropy was evaluated by applying the MIST approximation (King et al., 2012) using the PARENT suite (Fleck et al., 2016), a collection of programs for the computation-intensive estimation of configurational entropy by information-theoretical approaches on parallel architectures. All MD trajectories were first converted from Cartesian to BAT coordinates. To assess the convergence of configurational entropy (S_{conf}), cumulative plots were generated for single-copy systems using a 50 ns time step. Due to the relatively slow convergence of the entropy values (Figure 4—figure supplement 1B), the final entropy calculations were performed for the entire 1 µs trajectories. Note that the absolute S_{conf} values are negative and carry arbitrary units because the constant momentum part of the configurational entropy integrals is excluded from the calculations; they are reported only to illustrate the convergence of the entropy values as a function of simulated time. However, upon subtraction of these absolute values (i.e. for single and crowded systems), the relative entropy (ΔS_{conf}) carries correct physical units and is equal to the total configurational entropy change between the two systems. The relative entropies of a protein in single-chain and multi-chain systems were averaged over all possible combinations of the two single-protein copies and the 24 crowded-system copies. To estimate the effect of mutations on configurational heterogeneity, the corresponding entropy differences were calculated as follows:
where ${N}_{mut}$ and ${N}_{WT}$ represent the numbers of atoms in mutant and WT proteins, respectively. The final values of configurational entropy differences were multiplied by the temperature (T = 310 K) and converted to kcal/mol units.
Estimation of diffusion coefficients and shear viscosity
Diffusion coefficients of individual protein chains were calculated following the procedure described elsewhere (von Bülow et al., 2019), together with viscosity estimation (Hess, 2002), application of corrections for size-dependent effects (Yeh and Hummer, 2004), and rescaling against experimentally comparable values (Fennell et al., 2018).
Thus, translational diffusion coefficients $D_t^{PBC}$ were extracted for individual molecules by analyzing center-of-mass mean-square displacement (MSD) curves, considering that:
$$\mathrm{MSD}(\tau) = \left\langle \left| \mathbf{r}(t+\tau) - \mathbf{r}(t) \right|^{2} \right\rangle = 6 D_t^{PBC} \tau$$
for τ approaching infinity. The above equation was fitted in a linear regime of MSD between 20 ns and 40 ns for the 24-copy systems (Figure 4—figure supplement 1D), and between 5 ns and 15 ns for the single molecule. As previously suggested (Yeh and Hummer, 2004), the thus obtained diffusion coefficients were corrected for size-dependent effects that arise from periodic boundary conditions (PBC). Applying this correction, the diffusion coefficient $D_t$ can be determined as:
$$D_t = D_t^{PBC} + \frac{k_B T \xi}{6 \pi \eta L}$$
where $k_B$ is the Boltzmann constant, $T$ is the temperature, $L$ is the edge length of the simulation box, $\eta$ is the viscosity of the system that the particle is simulated in, and $\xi$ = 2.837297, a term arising from the cubic lattice (Yeh and Hummer, 2004). The latter correction requires estimation of shear viscosity values in the system. For this purpose, a short 10 ns NVT MD simulation was performed for each system, starting from the last snapshot of the 1 µs simulation, with detailed output to the GROMACS energy file (every 10 fs). Shear viscosities were extracted from these NVT simulations using the Green-Kubo formula (Hess, 2002):
$$\eta_{ij} = \frac{V}{k_B T} \int_{0}^{\infty} C_{ij}(t)\, dt$$
where $V$ denotes the volume of the simulation box and $C_{ij}(t)$ is the autocorrelation function:
$$C_{ij}(t) = \left\langle P_{ij}(0)\, P_{ij}(t) \right\rangle$$
of the pressure tensor elements $P_{ij} = P_{xy}, P_{xz}, P_{yz}, \frac{P_{xx} - P_{yy}}{2}, \frac{P_{xx} - P_{zz}}{2}$, and $\frac{P_{yy} - P_{zz}}{2}$.
The autocorrelation function was numerically integrated between 0 ps and 1 ps, followed by analytical integration up to infinity. The analytical part of the integral was determined by a double exponential fit of the data between 1 ps and 5 ps (Figure 4—figure supplement 1E):
Shear viscosity η was then determined by averaging over the η_{ij} of the evaluated pressure tensor elements. Finally, the corrected diffusion coefficients were rescaled by the ratio of the simulated and experimentally determined water viscosities (Fennell et al., 2018):
For the experimental water viscosity at 310 K and 0.1 M salt, a value of $\eta_{expt}$ = 0.69 mPa·s (Fennell et al., 2018) was used. The simulated value of viscosity ($\eta_{sim}$ = 0.83 mPa·s) in a TIP4P-D water box at the same conditions was obtained previously by us (in preparation) using a series of 100 ns NVT MD simulations for cubic boxes of different size (3, 4, and 5 nm).
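The Green-Kubo viscosity estimate described above combines a numerical integral of the pressure autocorrelation function with an analytical tail. A minimal sketch follows. It assumes the double exponential has the form $a_1 e^{-t/\tau_1} + a_2 e^{-t/\tau_2}$ (the parameterization used in the original fit is not given here), that the fit parameters have already been obtained elsewhere, and that all quantities are in SI units. The function names are hypothetical.

```python
import math

KB = 1.380649e-23  # Boltzmann constant, J/K

def trapezoid(y, dt):
    """Trapezoidal integral of equally spaced samples."""
    return dt * (sum(y) - 0.5 * (y[0] + y[-1]))

def double_exp_tail(a1, tau1, a2, tau2, t0):
    """Integral from t0 to infinity of a1*exp(-t/tau1) + a2*exp(-t/tau2).
    ASSUMPTION: this parameterization of the double exponential."""
    return a1 * tau1 * math.exp(-t0 / tau1) + a2 * tau2 * math.exp(-t0 / tau2)

def green_kubo_viscosity(acf, dt, fit_params, volume, temperature=310.0):
    """eta = V / (kB T) * integral of C(t) dt.

    acf: autocorrelation samples from t = 0 up to the switch time (Pa^2)
    dt: sampling interval (s); fit_params: (a1, tau1, a2, tau2)
    volume: box volume (m^3). Returns viscosity in Pa s."""
    t0 = dt * (len(acf) - 1)
    integral = trapezoid(acf, dt) + double_exp_tail(*fit_params, t0)
    return volume / (KB * temperature) * integral

def mean_viscosity(eta_values):
    """Average over the evaluated pressure tensor elements."""
    return sum(eta_values) / len(eta_values)
```

Splitting the integral this way avoids integrating the noisy long-time part of the autocorrelation function numerically.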
MSD curves for complete 1 µs MD trajectories or only their last 0.3 µs fragments were calculated using the msd utility from the GROMACS package. Pressure tensors were obtained using the energy utility from the GROMACS package for the analysis of NVT simulations. Viscosities were calculated as described above using MATLAB (R2009) scripts written specifically for this purpose.
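The diffusion workflow described in this section (linear MSD fit, finite-size correction, viscosity rescaling) can be summarized in a short sketch. It assumes the Einstein relation MSD = 6Dτ in three dimensions, the Yeh-Hummer correction for a cubic box, and rescaling by the factor $\eta_{sim}/\eta_{expt}$. Function names are hypothetical and SI units are used throughout.

```python
import math

KB = 1.380649e-23   # Boltzmann constant, J/K
XI = 2.837297       # cubic-lattice constant (Yeh and Hummer, 2004)

def linear_slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def diffusion_from_msd(tau, msd, t_min, t_max):
    """D_PBC from the slope of MSD(tau) within [t_min, t_max]: slope / 6."""
    pts = [(t, m) for t, m in zip(tau, msd) if t_min <= t <= t_max]
    ts, ms = zip(*pts)
    return linear_slope(ts, ms) / 6.0

def yeh_hummer_correct(d_pbc, eta, box_length, temperature=310.0):
    """Add the finite-size term kB*T*xi / (6*pi*eta*L)."""
    return d_pbc + KB * temperature * XI / (6.0 * math.pi * eta * box_length)

def rescale_to_experiment(d, eta_sim=0.83e-3, eta_expt=0.69e-3):
    """Rescale by the ratio of simulated to experimental water viscosity.
    ASSUMPTION: the ratio is applied as eta_sim / eta_expt."""
    return d * eta_sim / eta_expt
```

For example, with the fitting window of the 24-copy systems one would call `diffusion_from_msd(tau, msd, 20e-9, 40e-9)` with times in seconds and MSD in m², then pass the result through the two correction functions in turn.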
Modeling and visualization of condensates topology
To generate a visual representation of a condensate with a fractal dimension obtained by the self-propagation model (see Appendix 1), the FracVAL algorithm was used (Morán et al., 2019). FracVAL is a tunable algorithm for generation of fractal structures of aggregates of polydisperse primary particles, which preserves the predefined fractal dimension (${d}_{f}$) and the fractal prefactor (${k}_{f}$) to generate aggregates of desired size. The scaling law in this case can be defined as shown in Equation 14. The prefactor ${k}_{f}$ is equal to 1 in the present model (see Appendix 1). The ${d}_{f}$ values for generating cluster models in FracVAL were calculated according to Equation 10 using the averaged φ and $n$ values over the last 0.3 µs of the 24-copy MD simulations, while <Rg_{MD}> values over the last 0.3 µs were taken as effective sizes of primary particles (detailed parameters are listed in Figure 5—figure supplement 2). For visualization purposes, the size of condensates generated by FracVAL was defined as 1024 molecules. FracVAL cluster models were transformed to all-atom resolution by using selected protein MD conformations with the representative Rg values corresponding to the respective <Rg_{MD}> and scripts specially written for this purpose. The obtained coarse-grained and all-atom structures were visualized using PyMol (Schrodinger and DeLano, 2020).
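How valency and compactness could jointly set a fractal dimension can be illustrated with a minimal sketch of a self-similar assembly. The packing rule used here is an assumption made for illustration and is not necessarily the form of Equations 4–13 in Appendix 1: each iteration is taken to combine $n+1$ sub-clusters that fill a fraction φ of the new cluster's volume. All function names are hypothetical.

```python
import math

def molecules_in_cluster(n, i):
    """Geometric growth: (n + 1)**i molecules after i iterations."""
    return (n + 1) ** i

def cluster_radius(n, phi, i, r0):
    """Cluster size after i iterations, starting from a unit of size r0.
    ASSUMPTION: the volume grows by a factor (n + 1) / phi per iteration
    and the cluster is spherical, so R_i = r0 * ((n + 1) / phi)**(i / 3)."""
    return r0 * ((n + 1) / phi) ** (i / 3.0)

def fractal_dimension(n, phi):
    """d_f implied by the assumed packing rule: the exponent for which
    N_i = (R_i / r0)**d_f holds at every iteration."""
    return 3.0 * math.log(n + 1) / math.log((n + 1) / phi)

# Toy parameters (hypothetical): valency 3, volume fraction 0.25
d_f = fractal_dimension(3, 0.25)
```

Under this rule the scaling law holds with a prefactor of exactly 1, since $(R_i/r_0)^{d_f} = (n+1)^i$ at every iteration. Higher valency or higher compactness both push $d_f$ toward 3.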
Appendix 1
Fractal model of condensate assembly
The number of molecules in a cluster at iteration $i$ can be calculated according to a simple geometrical progression:
$$N_i = (n+1)^{i} \qquad (1)$$
where $n$ is the valency of interactions or, more generally, the coordination number. An apparent volume $v$ of a single molecule, whereby atoms occupy the fraction $\phi$ of that volume, can be expressed as:
$$v = \frac{V_{mol}}{\phi} \qquad (2)$$
where $V_{mol}$ is the molecular volume, which in turn is proportional to the molecular weight ($M_W$) with a factor $\kappa$ ($\kappa = 1.21$ used for all calculations; Harpaz et al., 1994):
$$V_{mol} = \kappa M_W \qquad (3)$$
Thus, a combination of Equations 1 and 2 allows one to estimate an apparent volume of the condensate (${V}_{i}$) at iteration $i$:
Under the assumption of a spherical geometry, a characteristic size of the condensate at iteration $i$ (${R}_{i}$) can be derived from Equation 4:
An effective molar concentration (${C}_{i}$) of molecules in a condensate at iteration $i$ then can be estimated as:
where ${N}_{a}$ is the Avogadro number. A combination of Equations 5 and 6 gives the volume fraction as a function of valency for a condensate with a given concentration ($C$) and size ($R$):
The above formalism allows one to extract the characteristic parameters $\phi $ and $n$ from the linear regression of an empirical log R vs. log M plot. First, the mass ${M}_{i}$ of a condensate at iteration $i$ is given as:
A combination of Equations 5 and 8 then gives analytical expressions for a slope ($A$) and an intercept ($B$) for the linear regression plot:
Finally, a combination of Equations 10 and 11 gives equations for $\phi $ and $n$ using the slope A and the intercept B for the linear regression (9):
Note that A, also known as the ‘fractal dimension’ (${d}_{f}$) (Carpineti and Giglio, 1992), is the key parameter describing the condensate organization. The scaling law in this case is typically defined as:
$$N = k_f \left( \frac{R}{R_g} \right)^{d_f} \qquad (14)$$
where $N$ is the number of molecules in each cluster with the corresponding size $R$, while $R_g$ is an effective size of an individual molecule, and ${k}_{f}$ is the fractal prefactor. In the present model of condensate assembly, ${k}_{f}$ is equal to 1, as proven below. Thus, according to Equation 14, the number of molecules for a cluster obtained at iteration $i$ can be calculated as:
$$N_i = \left( \frac{R_i}{R_g} \right)^{d_f} \qquad (15)$$
where ${d}_{f}$ is a slope ($A$) for log R vs. log M linear regression and is defined in Equation 10. A combination of Equations 1 and 5 gives:
then simplification leads to:
and by using Equation 13 to substitute $n+1$ one derives:
Data availability
All data generated or analysed during this study are included in the manuscript and supporting files (Supplementary Files 1 and 2); source data files have been provided for Figure 2 (Figure 2 —source data 1), Figure 1—figure supplement 1 (Figure 1—figure supplement 1—source data 2), Figure 1—figure supplement 2 (Figure 1—figure supplement 2—source data 1), Figure 5—figure supplement 2 (Figure 5—figure supplement 2—source data 1); compressed folders containing source data files have been provided for Figure 1 (Figure 1 —source data 1), Figure 2 (Figure 2 —source data 2), Figure 3 (Figure 3 —source data 1), Figure 4 (Figure 4 —source data 1), Figure 5 (Figure 5 —source data 1), Figure 6 (Figure 6 —source data 1), Figure 1—figure supplement 1 (Figure 1—figure supplement 1—source data 1), Figure 2—figure supplement 1 (Figure 2—figure supplement 1—source data 1), Figure 3—figure supplement 1 (Figure 3—figure supplement 1—source data 1), Figure 4—figure supplement 1 (Figure 4—figure supplement 1—source data 1), Figure 6—figure supplement 1 (Figure 6—figure supplement 1—source data 1). These source files contain the numerical data used to generate the figures.
References

Liquid-Liquid Phase Separation in Disease. Annual Review of Genetics 53:171–194. https://doi.org/10.1146/annurevgenet112618043527
Biomolecular condensates at the nexus of cellular stress, protein aggregation disease and ageing. Nature Reviews. Molecular Cell Biology 22:196–213. https://doi.org/10.1038/s41580020003266
Association of Biomolecular Systems via Pulsed Field Gradient NMR Self-Diffusion Measurements. Journal of the American Chemical Society 117:7566–7567. https://doi.org/10.1021/ja00133a039
Biomolecular condensates: organizers of cellular biochemistry. Nature Reviews. Molecular Cell Biology 18:285–298. https://doi.org/10.1038/nrm.2017.7
Simulation of FUS Protein Condensates with an Adapted Coarse-Grained Model. Journal of Chemical Theory and Computation 17:525–537. https://doi.org/10.1021/acs.jctc.0c01064
Protein Phase Separation: A New Phase in Cell Biology. Trends in Cell Biology 28:420–435. https://doi.org/10.1016/j.tcb.2018.02.004
Polymer physics of intracellular phase transitions. Nature Physics 11:899–904. https://doi.org/10.1038/nphys3532
Spinodal-type dynamics in fractal aggregation of colloidal clusters. Physical Review Letters 68:3327–3330. https://doi.org/10.1103/PhysRevLett.68.3327
Physical Principles Underlying the Complex Biology of Intracellular Phase Transitions. Annual Review of Biophysics 49:107–133. https://doi.org/10.1146/annurevbiophys121219081629
Solvent-induced lysozyme gels: rheology, fractal analysis, and sol-gel kinetics. Journal of Colloid and Interface Science 289:394–401. https://doi.org/10.1016/j.jcis.2005.04.026
Dependence of Binding Free Energies between RNA Nucleobases and Protein Side Chains on Local Dielectric Properties. Journal of Chemical Theory and Computation 13:4504–4513. https://doi.org/10.1021/acs.jctc.6b01202
Development and use of quantum mechanical molecular models. Journal of the American Chemical Society 107:3902–3909. https://doi.org/10.1021/ja00299a024
Biomolecular simulation: A computational microscope for molecular biology. Annual Review of Biophysics 41:429–452. https://doi.org/10.1146/annurevbiophys042910155245
Protein Abundance Biases the Amino Acid Composition of Disordered Regions to Minimize Non-functional Interactions. Journal of Molecular Biology 431:4978–4992. https://doi.org/10.1016/j.jmb.2019.08.008
Formation of biological condensates via phase separation: Characteristics, analytical methods, and physiological implications. The Journal of Biological Chemistry 294:14823–14835. https://doi.org/10.1074/jbc.REV119.007895
Computational Signaling Protein Dynamics and Geometric Mass Relations in Biomolecular Diffusion. The Journal of Physical Chemistry. B 122:5599–5609. https://doi.org/10.1021/acs.jpcb.7b11846
Tunable multiphase dynamics of arginine and lysine liquid condensates. Nature Communications 11:4628. https://doi.org/10.1038/s4146702018224y
PARENT: A Parallel Software Suite for the Calculation of Configurational Entropy in Biomolecular Systems. Journal of Chemical Theory and Computation 12:2055–2065. https://doi.org/10.1021/acs.jctc.5b01217
Self-Consistent Framework Connecting Experimental Proxies of Protein Dynamics with Configurational Entropy. Journal of Chemical Theory and Computation 14:3796–3810. https://doi.org/10.1021/acs.jctc.8b00100
Long-range correlations in smoke-particle aggregates. Journal of Physics A 12:L109–L117. https://doi.org/10.1088/03054470/12/5/008
Uncovering Differences in Hydration Free Energies and Structures for Model Compound Mimics of Charged Side Chains of Amino Acids. The Journal of Physical Chemistry. B 125:4148–4161. https://doi.org/10.1021/acs.jpcb.1c01073
Fractal Analysis of Aggregates Formed by Heating Dilute BSA Solutions Using Light Scattering Methods. Bioscience, Biotechnology, and Biochemistry 60:1757–1763. https://doi.org/10.1271/bbb.60.1757
Diffusion in disordered media. Advances in Physics 36:695–798. https://doi.org/10.1080/00018738700101072
LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry 18:1463–1472. https://doi.org/10.1002/(SICI)1096987X(199709)18:12<1463::AIDJCC4>3.0.CO;2H
Determining the shear viscosity of model liquids from molecular dynamics simulations. The Journal of Chemical Physics 116:209–217. https://doi.org/10.1063/1.1421362
Canonical dynamics: Equilibrium phase-space distributions. Physical Review. A, General Physics 31:1695–1697. https://doi.org/10.1103/physreva.31.1695
Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics 79:926–935. https://doi.org/10.1063/1.445869
Monte Carlo simulation of differences in free energies of hydration. The Journal of Chemical Physics 83:3050–3054. https://doi.org/10.1063/1.449208
Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. Journal of the American Chemical Society 118:11225–11236. https://doi.org/10.1021/ja9621760
Molecular modeling of organic and biomolecular systems using BOSS and MCPRO. Journal of Computational Chemistry 26:1689–1700. https://doi.org/10.1002/jcc.20297
On the Fractal dimension and correlations in percolation theory. Journal of Statistical Physics 36:807–814. https://doi.org/10.1007/BF01012940
Dynamic Light Scattering for the Characterization of Polydisperse Fractal Systems: I. Simulation of the Diffusional Behavior. Particle & Particle Systems Characterization 25:9–18. https://doi.org/10.1002/ppsc.200700004
Liquid-Liquid Phase Separation: A Widespread and Versatile Way to Organize Aqueous Solutions. The Journal of Physical Chemistry Letters 12:10994–10995. https://doi.org/10.1021/acs.jpclett.1c03352
The Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols 10:845–858. https://doi.org/10.1038/nprot.2015.053
Gelation of whey protein fractal aggregates induced by the interplay between added HCl, CaCl2 and NaCl. International Dairy Journal 111:104824. https://doi.org/10.1016/j.idairyj.2020.104824
Fractal self-assembly and aggregation of human amylin. Soft Matter 16:3143–3153. https://doi.org/10.1039/c9sm02463h
Efficient calculation of molecular configurational entropies using an information theoretic approximation. The Journal of Physical Chemistry. B 116:2891–2904. https://doi.org/10.1021/jp2068123
Phase separation of DNA: From past to present. Biophysical Journal 120:1139–1149. https://doi.org/10.1016/j.bpj.2021.01.033
Theory of scattering from colloidal aggregates. In: Zulauf M, Lindner P, Terech P, editors. Trends in Colloid and Interface Science IV. Steinkopff. pp. 161–168. https://doi.org/10.1007/BFb0115545
The nucleolus as a multiphase liquid condensate. Nature Reviews. Molecular Cell Biology 22:165–182. https://doi.org/10.1038/s4158002002726
Fractal-like structures in colloid science. Advances in Colloid and Interface Science 235:1–13. https://doi.org/10.1016/j.cis.2016.05.002
Membrane lipid phase transitions and phase organization studied by Fourier transform infrared spectroscopy. Biochimica et Biophysica Acta 1828:2347–2358. https://doi.org/10.1016/j.bbamem.2012.10.018
Modeling the structure and interactions of intrinsically disordered peptides with multiple replica, metadynamics-based sampling methods and force-field combinations. Journal of Chemical Theory and Computation 18:1915–1928. https://doi.org/10.1021/acs.jctc.1c00889
Evaluating phase separation in live cells: diagnosis, caveats, and functional consequences. Genes & Development 33:1619–1634. https://doi.org/10.1101/gad.331520.119
Phase separation in biology; functional organization of a higher order. Cell Communication and Signaling 14:1. https://doi.org/10.1186/s1296401501257
Molecular interactions underlying liquid-liquid phase separation of the FUS low-complexity domain. Nature Structural & Molecular Biology 26:637–648. https://doi.org/10.1038/s415940190250x
Fragmentation of amyloid fibrils occurs in preferential positions depending on the environmental conditions. The Journal of Physical Chemistry. B 119:4644–4652. https://doi.org/10.1021/acs.jpcb.5b01160
Unraveling molecular interactions in liquid-liquid phase separation of disordered proteins by atomistic simulations. The Journal of Physical Chemistry. B 124:9009–9016. https://doi.org/10.1021/acs.jpcb.0c06288
Arginine multivalency stabilizes protein/RNA condensates. Protein Science 30:1418–1426. https://doi.org/10.1002/pro.4109
Phase transitions of associative biomacromolecules. Chemical Reviews 123:8945–8987. https://doi.org/10.1021/acs.chemrev.2c00814
Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics 52:7182–7190. https://doi.org/10.1063/1.328693
Water dispersion interactions strongly influence simulated structural properties of disordered protein states. The Journal of Physical Chemistry. B 119:5113–5123. https://doi.org/10.1021/jp508971m
Adaptation of a membrane-active peptide to heterogeneous environment. I. Structural plasticity of the peptide. The Journal of Physical Chemistry. B 113:1107–1119. https://doi.org/10.1021/jp803640e
Protein network structure enables switching between liquid and gel states. Journal of the American Chemical Society 142:874–883. https://doi.org/10.1021/jacs.9b10066
Terminology of polymers and polymerization processes in dispersed systems (IUPAC Recommendations 2011). Pure and Applied Chemistry 83:2229–2259. https://doi.org/10.1351/PACREC100603
FRAP analysis of binding: proper and fitting. Trends in Cell Biology 15:84–91. https://doi.org/10.1016/j.tcb.2004.12.001
Application of fractal concepts to polymer statistics and to anomalous transport in randomly porous media. Journal of Statistical Physics 36:843–860. https://doi.org/10.1007/BF01012944
Gelation and critical phenomena. In: Dušek K, editors. Polymer Networks. Berlin, Heidelberg: Springer. pp. 103–158. https://doi.org/10.1007/3540114718_4
A cluster-cluster aggregation model with tunable fractal dimension. Journal of Physics A 27:2953–2963. https://doi.org/10.1088/03054470/27/9/012
System-size dependence of diffusion coefficients and viscosities from molecular dynamics simulations with periodic boundary conditions. The Journal of Physical Chemistry B 108:15873–15879. https://doi.org/10.1021/jp0477147
Molecular details of protein condensates probed by microsecond long atomistic simulations. The Journal of Physical Chemistry. B 124:11671–11679. https://doi.org/10.1021/acs.jpcb.0c10489
Decision letter

Rohit V Pappu, Reviewing Editor; Washington University in St. Louis, United States

Aleksandra M Walczak, Senior Editor; CNRS, France
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Protein compactness and interaction valency define the architecture of a biomolecular condensate across scales" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by Rohit Pappu as Reviewing Editor and José Faraldo-Gómez as Senior Editor. The reviewers have opted to remain anonymous.
The Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
1) Many of the experimental details need careful scrutiny. Both reviewers raise specific concerns and make specific requests. Please respond with all the details that the reviewers are requesting.
2) Both reviewers are concerned about the rather sweeping generalizations and statements made that do not square with the state of the art. First and foremost, it is now clear that the phase transitions in question are not purely segregative LLPS type phenomena. That the field has overused this term and done so without a care is not a good enough reason to perpetuate the false notion of a pure LLPS type behavior. One of the earlier studies demonstrating the coupling of segregative transitions viz., phase separation, and associative transitions, viz., percolation, was published in eLife and elsewhere. Please see: https://elifesciences.org/articles/30294 and http://iopscience.iop.org/article/10.1088/13672630/aab8d9. Given the considerable progress made by several labs, and especially those of Mittag and Pappu on the topic of IDR phase transitions, it is imperative that the motivation for the current work not be that LLPS is "poorly understood" (see comments by Reviewer 2).
3) Please provide a coherent motivation/justification for co-opting concepts such as the fractal analysis from colloidal chemistry for the analysis of the simulations. Please note that this is not the first time fractal analyses have been brought to bear in studying phase separating IDRs. They've been previously deployed in studies of ultra-coarse-grained simulations of mimics of the exon 1 encoded region of huntingtin (http://www.sciencedirect.com/science/article/pii/S0006349514007371).
4) The issues of convergence, statistical robustness, and finite size effects need careful consideration. How do the images interact with one another in the dilute and dense phase simulations? For simple liquids, even the earliest simulations deployed ca. 100 molecules for querying properties of neat liquids. How then does one justify the use of 24 copies for a complex fluid? In lattice-based simulations, the effects of finite size were systematically analyzed, and the inference was that one needs at least 100+ molecules to get to coherent descriptions of two coexisting phases. The current work does not simulate coexisting phases but approaches each phase separately. So, fewer molecules are reasonable, but it must still be the case that a reasonably rigorous assessment of finite size effects is provided.
5) Regarding the differences between Arg and Lys, the sources of differences are intrinsic and context dependent. As Reviewer 2 notes, this is not as enigmatic as the authors note. Please see recent contributions that have demonstrated clear differences of Arg vs. Lys as drivers of speckle formation (https://doi.org/10.1016/j.molcel.2020.01.025), the realization that Arg and Lys are very different in terms of their intrinsic free energies of hydration (https://pubs.acs.org/doi/abs/10.1021/acs.jpcb.1c01073), and that these differences contribute directly to the cation-specificity of IDP conformational ensembles (https://www.pnas.org/doi/full/10.1073/pnas.2200559119). These physical principles and the distinction of Arg being sticky vs. Lys being non-sticky appear to also contribute to relative abundance and amino acid compositions of IDRs (https://doi.org/10.1016/j.jmb.2019.08.008).
6) The size and shape analyses (please see comments of Reviewer 1) need a lot of work and thought. It would help to probe these effects at higher resolution and greater precision.
7) Finally, the asymmetry between interactions that determine single chain dimensions vs. collective phase behavior is puzzling, as noted by Reviewer 2. Please see why this is physically unexpected for simple systems (https://www.sciencedirect.com/science/article/pii/S0006349520304884) and how an asymmetry can arise as discussed recently for prionlike low complexity domains (https://doi.org/10.1038/s4155702100840w). Are the principles uncovered by Bremer et al., operative with the specific IDR studied here? If not, how is the symmetry broken?
Reviewer #1 (Recommendations for the authors):
– The manuscript will be strengthened by improving the experimental part. The phase diagrams are not "phase diagrams" in a true sense; they are solubility diagrams or state diagrams of the protein. Phase diagrams are characterized by binodal and tie-lines, which were not measured here. Please see the measured phase diagrams reported in Martin et al. (ref # 18). doi:10.1126/science.aaw8653.
– The use of BF microscopy to distinguish the "morphology" of the condensed phase provides shallow insight into the morphology and dynamics of these assemblies.
– The phase-separated condensates are considered network fluids where percolation and phase separation go hand-in-hand. A discussion on this should be included in the current manuscript and should include how different variants that the authors studied show distinct percolation behavior. Please see https://doi.org/10.1016/j.molcel.2022.05.018.
– The idea that comes across from reading this manuscript is that IDPs/IDRs are the main driver of phase separation of proteins. This may not be true. It is fine to study an isolated IDR and its phase behavior, but one needs to acknowledge that the full-length protein may display a more nuanced behavior through a combination of its IDR and other domains.
– The colloidal cluster formalism, while interesting, should be compared with experimentally determined observables, as the authors point out. Without such data, I am unsure how one can conclude that this formalism provides "a potentially universal foundation" in studying the phase separation of proteins.
– Interactions and solubility of "stickers" and "spacers" have been recently studied by Mittag, Pappu, and coworkers. Such discussions would be helpful to include here since the authors focus on a similar set of residues (R, G, Y, K). Please see https://www.nature.com/articles/s4155702100840w
https://doi.org/10.7554/eLife.80038.sa1
Author response
Essential revisions:
1) Many of the experimental details need careful scrutiny. Both reviewers raise specific concerns and make specific requests. Please respond with all the details that the reviewers are requesting.
Following the suggestions of the Editor and the Reviewers, we have significantly extended the experimental part of the study, including the analysis of sample purity (Figure 1—figure supplement 1A), FRAP-based estimation of condensate dynamic properties (Figure 1D, E and Figure 1—figure supplement 2), analysis of fusion behavior (Videos 1, 2) and circularity analysis of condensate shapes (Figure 1—figure supplement 1C). We have also carried out additional analysis of the previously reported experiments (Figure 1—figure supplement 1A, B and C). Moreover, we have significantly extended the computational/theoretical part of the manuscript, including the analysis of simulation convergence and finite-size effects (Supplementary File 1 “Technical summary”, Figure 4—figure supplement 1C), estimation of translational diffusion coefficients and viscosity (Figure 4C, Figure 4—figure supplement 1D–F, Supplementary File 2) and evaluation of contact probabilities and comparison with percolation theory (Figure 2—figure supplement 1C). Please find further details below.
2) Both reviewers are concerned about the rather sweeping generalizations and statements made that do not square with the state of the art. First and foremost, it is now clear that the phase transitions in question are not purely segregative LLPS type phenomena. That the field has overused this term and done so without a care is not a good enough reason to perpetuate the false notion of a pure LLPS type behavior. One of the earlier studies demonstrating the coupling of segregative transitions viz., phase separation, and associative transitions, viz., percolation, was published in eLife and elsewhere. Please see: https://elifesciences.org/articles/30294 and http://iopscience.iop.org/article/10.1088/13672630/aab8d9. Given the considerable progress made by several labs, and especially those of Mittag and Pappu on the topic of IDR phase transitions, it is imperative that the motivation for the current work not be that LLPS is "poorly understood" (see comments by Reviewer 2).
We thank the Editor and the Reviewers for the detailed, constructive and candid suggestions and criticism. In response, we have extensively revised the manuscript to reflect the current state of knowledge regarding the formation of biomolecular condensates. We fully agree with the Editor that the concept of purely segregative LLPS has been misused in the condensate literature and have paid particular attention in the revision to properly contextualize our work and cite the relevant literature, including the works mentioned by the Editor. Importantly, as further discussed below and in the revised manuscript, the idea of phase separation coupled to percolation (PSCP) provides a natural connection with the notion of fractal scaling, widely observed in different colloidal systems and explored in our manuscript in order to understand the spatial organization of IDP clusters. In combination with percolation theory, the fractal model provides a predictive structural and quantitative perspective on condensate formation and the polydisperse nature of IDR self-association, as further discussed below. It is precisely this connection between the condensate field and colloidal chemistry that we feel is underexplored and that our work attempts to contribute to. In the revised text, we have articulated these ideas in more detail on pp. 4-5 and 18-19.
3) Please provide a coherent motivation/justification for co-opting concepts such as the fractal analysis from colloidal chemistry for the analysis of the simulations. Please note that this is not the first time fractal analyses have been brought to bear in studying phase separating IDRs. They've been previously deployed in studies of ultra-coarse-grained simulations of mimics of the exon 1 encoded region of huntingtin (http://www.sciencedirect.com/science/article/pii/S0006349514007371).
The principal motivation behind our work comes from the known universality of fractal behavior in colloidal systems. Given the close parallels between biomolecular condensates and colloids in many respects, it is natural to critically explore theoretical frameworks that have proven their worth in the latter case. In particular, while fractal scaling has been used to describe the multiscale structure in different colloids, ours is, to the best of our knowledge, the first application of such a formalism to describe the spatial organization of a condensate at an arbitrary scale starting from the atomistic simulations performed on the scale of tens of nanometers. We have discussed these points in the revised manuscript on p. 5. We also thank the Editor for pointing out the above reference, which we now cite and discuss in the revised manuscript. Indeed, the fractal model has been used to interpret the results of coarse-grained simulations and characterize the aggregation of an intrinsically disordered huntingtin fragment. While in the cited article evidence of fractal behavior was found for a subset of scattering vectors, no high-resolution structural details could be provided, primarily because the simulations were coarse-grained. As a complement and an extension of these efforts, our model starts with the atomistic picture and provides a direct prediction of the features of the spatial organization of Lge1_{180} condensates at an arbitrary length scale and the impact of different mutations. Of note, the ability of IDRs to form low-dimensional fractal structures upon the disruption of their LLPS tendency by a poly-alanine insertion was recently demonstrated experimentally for synthetic elastin-like polypeptides (https://doi.org/10.1038/s4156301801826), as cited in the revised manuscript.
4) The issues of convergence, statistical robustness, and finite size effects need careful consideration. How do the images interact with one another in the dilute and dense phase simulations? For simple liquids, even the earliest simulations deployed ca. 100 molecules for querying properties of neat liquids. How then does one justify the use of 24 copies for a complex fluid? In lattice-based simulations, the effects of finite size were systematically analyzed, the inference was that one needs at least 100+ molecules to get to coherent descriptions of two coexisting phases. The current work does not simulate coexisting phases but approaches each phase separately. So, fewer molecules are reasonable, but it must still be the case that a reasonably rigorous assessment of finite size effects is provided.
We thank the Editor for this highly relevant comment. Indeed, we did not attempt to capture the process of phase separation or characterize two coexisting phases, for which much larger ensembles would be needed. Rather, our aim was to study the conformational behavior of individual protein chains in the context of a crowded protein mixture, taken as a model for the dense phase, and then use fractal scaling to provide a model of spatial organization of a condensate at an arbitrary length scale. Having said this, it is important to address how converged the key observables are, given the finite size of the all-atom simulation setup and the limited sampling used. In the revised manuscript, we have included an additional analysis of convergence of our simulations and could show that both key MD-derived parameters required by the fractal model, protein compactness and valency, display convergent behavior over the last 0.3 µs of MD in the 24-copy systems (Supplementary File 1, Figure 4—figure supplement 1C). For example, the block averages of compactness and valency exhibit a standard deviation of only 2-4% and 4-8%, respectively, over the last 0.3 µs of MD simulations. Moreover, it should be emphasized that we are interested in single-chain features in the context of a crowded mixture and, thus, our sampling over this range corresponds effectively to 24 × 0.3 µs = 7.2 µs. Finally, a detailed analysis of convergence in conformational sampling was performed for single-copy simulations using calculations of configurational entropy as evaluated by the MIST formalism (Figure 4—figure supplement 1B). Using this measure in the case of the weakly self-interacting Y>A, we do indeed observe a close convergence between two independent replicas over 1 µs trajectories. However, we still recognize the possibility that with longer simulation times and/or more protein copies per simulation, the simulated systems may show a qualitatively different behavior, as discussed on pp. 12-13 of the revised manuscript.
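The block-averaging check described above can be illustrated with a minimal Python sketch. It is not the authors' analysis script: the function name `block_average`, the number of blocks and the synthetic valency series are all hypothetical, and the spread is reported as the standard deviation of the block means relative to their overall mean.

```python
import numpy as np

def block_average(series, n_blocks):
    """Split a time series into equal blocks and return the block means
    together with their relative standard deviation (SD / mean)."""
    x = np.asarray(series, dtype=float)
    usable = (len(x) // n_blocks) * n_blocks   # frames that fit into whole blocks
    blocks = x[len(x) - usable:].reshape(n_blocks, -1)  # drop leftover early frames
    means = blocks.mean(axis=1)
    rel_sd = means.std(ddof=1) / means.mean()
    return means, rel_sd

# Hypothetical per-frame valency over the final stretch of a trajectory.
rng = np.random.default_rng(0)
valency = 4.0 + 0.2 * rng.standard_normal(3000)
means, rel_sd = block_average(valency, n_blocks=3)
print(means, rel_sd)
```

A small relative spread between blocks suggests that the observable has stopped drifting on the sampled time scale; it does not by itself rule out slower relaxation beyond the simulated window.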
Regarding the interaction between simulation images, we should first point out that we employ Ewald summation for the treatment of long-range electrostatics, an approach which by default includes an interaction between every particle and all of its periodic-boundary-condition (PBC) images. We have also explicitly analyzed direct van der Waals contacts between simulation images for each protein in our systems (Supplementary File 1). Importantly, in all 24-copy systems, the average separation between images of protein atoms lies consistently in the 12-15 nm interval and no direct contacts between PBC images are observed. Such large distances also mitigate potential spurious effects due to the usage of Ewald summation. In single-copy WT simulations, the average distance between images in both cases remains above 4 nm and not a single direct contact between images is detected. For the less compact variants (both replicas of Y>A and 1 replica of R>K), the average distance is above 3.6 nm (Supplementary File 1) with a small frequency of transient contacts between the images (less than 0.2%).
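To make the image-contact check concrete, the following Python sketch computes the shortest distance between the atoms of one molecule and any of its 26 neighboring periodic images. It assumes an orthorhombic box and coordinates in nm; the function name `min_image_contact_distance` is hypothetical, and the all-pairs distance matrix is only practical for modest atom counts.

```python
import itertools
import numpy as np

def min_image_contact_distance(coords, box):
    """Shortest distance between any atom of a molecule and any atom of
    its periodic images (orthorhombic box assumed, zero shift excluded)."""
    coords = np.asarray(coords, dtype=float)   # shape (n_atoms, 3)
    box = np.asarray(box, dtype=float)         # box edge lengths (Lx, Ly, Lz)
    best = np.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue                           # skip the molecule itself
        image = coords + np.array(shift) * box
        # All atom-to-image-atom distances for this shift vector.
        d = np.linalg.norm(coords[:, None, :] - image[None, :, :], axis=-1)
        best = min(best, float(d.min()))
    return best

# Two atoms near opposite faces of a 10 nm box approach each other through PBC.
print(min_image_contact_distance([[1.0, 0.0, 0.0], [9.0, 0.0, 0.0]], [10.0, 10.0, 10.0]))
```

Comparing this distance with a contact cutoff, frame by frame, gives the frequency of direct image contacts.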
In order to further analyze how realistic our simulations are, we have carried out a detailed analysis of protein translational diffusion and viscosity in the simulations (Figure 4C and Figure 4—figure supplement 1D-F, details in Methods). Our analysis shows signatures of realistic diffusive dynamics in the modeled all-atom systems. Specifically, single-molecule translational diffusion coefficients of Lge1_{180} variants obtained from fitting of MSD curves with an applied finite-size PBC correction and solvent viscosity rescaling (see Methods for details) are ~120 µm^{2}/s for single-copy simulations or between 100-150 µm^{2}/s for 24-copy simulations and different Lge1_{180} variants (Figure 4C and Supplementary File 2). This corresponds well to the experimentally measured values for proteins of similar size. For instance, the diffusion constant of the similarly sized ubiquitin (Rg = 1.32 nm, 76 aa, 8.6 kDa) at the concentration of 8.6 mg/ml is 149 µm^{2}/s (https://doi.org/10.1038/s41467021211819), while that of GFP (Rg = 2.8 nm, 238 aa, 27 kDa) at the concentration of 0.53 mg/ml is ~90 µm^{2}/s (https://doi.org/10.1038/ncomms5494). Interestingly, the obtained viscosity values (see Methods for details) in single-chain simulations of the three Lge1_{180} variants (effective concentration of 2.3 mg/ml) are all similar to each other and are close to the reported solvent viscosity for TIP4P-D water/0.1 M NaCl of 0.83 mPa*s (Figure 4—figure supplement 1F and Supplementary File 2). In the crowded 24-copy systems (effective concentration of 67 mg/ml), the viscosity systematically increases by about 20% and is again similar for all three Lge1_{180} variants (Figure 4—figure supplement 1F and Supplementary File 2). This reflects the experimental trend that viscosity of protein solutions depends primarily on protein concentration. Importantly, the calculated values correspond well to the experimentally obtained values for e.g. serum albumin in this concentration range (~1.1 mPa*s for 10 mg/ml, https://doi.org/10.1039/C5RA21068B).
Note that diffusion coefficients can be more accurately defined in the 24-copy systems than in the single-copy ones. This is due to both the limited system size and the limitations of MD sampling. For instance, the average diffusion coefficient obtained on the complete MD trajectories for the 24-copy systems relies on 24 µs of MD statistics in total vs. just 2 µs for the single copies. At the same time, finite-size corrections in the case of 24-copy systems are relatively small (35-60%), while they are almost an order of magnitude higher (450-530%) for the single-copy ones (Supplementary File 2). Thus, the size-related effects are found to be less dramatic for the modeled 24-copy systems.
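The text above does not spell out the correction formula (it refers to Methods), so the Python sketch below should be read as an assumption: it uses the Einstein relation MSD = 6Dt for the fit and the widely used Yeh-Hummer hydrodynamic correction for cubic periodic boxes, D_0 = D_PBC + ξ k_B T / (6π η L) with ξ ≈ 2.837297. The function names and all input values are hypothetical, quantities are in SI units, and the solvent viscosity rescaling mentioned in the text is not included.

```python
import numpy as np

K_B = 1.380649e-23   # Boltzmann constant, J/K
XI = 2.837297        # cubic-box constant in the Yeh-Hummer correction

def diffusion_from_msd(t, msd):
    """Fit MSD(t) = 6*D*t + c and return D (3D Einstein relation)."""
    slope, _ = np.polyfit(t, msd, 1)
    return slope / 6.0

def yeh_hummer(d_pbc, box_length, viscosity, temperature):
    """Add the finite-size hydrodynamic term to a PBC diffusion coefficient."""
    return d_pbc + XI * K_B * temperature / (6.0 * np.pi * viscosity * box_length)

# Synthetic, perfectly linear MSD corresponding to D = 1e-10 m^2/s (100 um^2/s).
t = np.linspace(0.0, 1e-7, 50)        # s
msd = 6.0 * 1e-10 * t                 # m^2
d_pbc = diffusion_from_msd(t, msd)
d_corr = yeh_hummer(d_pbc, box_length=1e-8, viscosity=0.83e-3, temperature=298.0)
print(d_pbc, d_corr)
```

Because the correction scales as 1/L, it is proportionally larger for small boxes, which is in line with the much larger relative corrections reported for the single-copy systems.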
5) Regarding the differences between Arg and Lys, the sources of differences are intrinsic and context dependent. As Reviewer 2 notes, this is not as enigmatic as the authors note. Please see recent contributions that have demonstrated clear differences of Arg vs. Lys as drivers of speckle formation (https://doi.org/10.1016/j.molcel.2020.01.025), the realization that Arg and Lys are very different in terms of their intrinsic free energies of hydration (https://pubs.acs.org/doi/abs/10.1021/acs.jpcb.1c01073), and that these differences contribute directly to the cation-specificity of IDP conformational ensembles (https://www.pnas.org/doi/full/10.1073/pnas.2200559119). These physical principles and the distinction of Arg being sticky vs. Lys being non-sticky appear to also contribute to relative abundance and amino acid compositions of IDRs (https://doi.org/10.1016/j.jmb.2019.08.008).
We thank the Editor and Reviewer 2 for bringing up these studies, which we now cite and discuss in the revised manuscript on pp. 6-8. We should also emphasize that we never intended to claim that the difference between Arg and Lys is generally enigmatic and poorly understood. Namely, the sentence that the Editor and Reviewer 2 refer to (“Therefore, the effect of R>K substitution on LLPS should be further explored in the context of protein-protein interactions.”) was poorly phrased on our part and was only meant in relation to the present study and not in relation to the wider literature on the topic. We simply wanted to refer to the fact that the binding free energies for individual residues do not provide sufficient information about interactions between protein chains. Following the comments of the Editor and Reviewer 2 and to improve clarity, we have rephrased this part and included and discussed additional references (p. 8). Importantly, in agreement with these and other studies (https://doi.org/10.1073/pnas.200022311, https://doi.org/10.1038/s4146702018224y, https://doi.org/10.1038/s41467022350011, https://doi.org/10.1002/pro.4109), we do see a significant difference in condensate behavior for the R>K mutant in both simulation and experiment. A direct analysis of contact statistics reveals that R-Y is the most dominant type of intermolecular contact in the crowded mixture of Lge1_{180} (see updated statistics in Figure 2—figure supplement 1A and B) and, together with Y-Y, may be the driver of condensate formation in Lge1_{180}, in agreement with previous observations by Bremer and co-workers (https://doi.org/10.1038/s4155702100840w). As expected, the contribution of Y-Y to protein-protein interactions is substantially increased in crowded mixtures of the R>K mutant. Finally, in contrast to the majority of other studies, which address the general properties of individual amino acids, here we present amino-acid/amino-acid interaction propensities as a function of the polarity of the environment (Figure 1—figure supplement 1D). This complements the recent results by Krainer et al. (https://doi.org/10.1038/s41467021211819) and provides a quantitative foundation for analyzing the role of cation-pi interactions in condensate formation.
6) The size and shape analyses (please see comments of Reviewer 1) need a lot of work and thought. It would help to probe these effects at higher resolution and greater precision.
Following the comments of the Editor and Reviewer 1, we have purified and fluorescently labelled different constructs (Figure 1—figure supplement 1B) and have used them to characterize different microscopic structures in solution and to compare WT with the R>K and Y>A mutants. Furthermore, we have used circularity analysis to quantify the shape of condensates (Figure 1—figure supplement 1C, details in Methods). In agreement with the predictions of our model, we observe a reduction of the propensity to form condensates in the Y>A mutant. Notably, we have observed amorphous precipitates at high protein concentration of this mutant (45 mM and above), but their material properties (and possible influence of sample impurities) remain unclear. Moreover, due to their sporadic nature, one lacks proper statistics for an adequate quantitative analysis. Hence, we refrain from commenting on these infrequently observed precipitates in the revised manuscript. For further details, please see our replies to the comments of Reviewer 1 below.
7) Finally, the asymmetry between interactions that determine single chain dimensions vs. collective phase behavior is puzzling, as noted by Reviewer 2. Please see why this is physically unexpected for simple systems (https://www.sciencedirect.com/science/article/pii/S0006349520304884) and how an asymmetry can arise as discussed recently for prion-like low complexity domains (https://doi.org/10.1038/s4155702100840w). Are the principles uncovered by Bremer et al., operative with the specific IDR studied here? If not, how is the symmetry broken?
We thank the Editor and Reviewer 2 for pointing out these relevant studies, which we now cite and discuss. As all constructs in our study have the same net charge, we were not able to analyze the coupling between single-chain and phase behavior with respect to changes in net charge per residue (NCPR) as in the aforementioned studies, and we see this as an important area for future investigation. Having said this, we have compared in detail the sequence composition of Lge1_{180} with that of A1-LCD variants studied by Bremer et al. (https://doi.org/10.1038/s4155702100840w) and indeed the principles uncovered by these authors shed light on the asymmetry between single-chain and collective phase behavior of Lge1_{180}. In particular, when it comes to aromatic composition, Lge1 is most similar to the 12F+12Y mutant of A1-LCD, and by this token, i.e. the high frequency of sticker tyrosines, should exhibit a strong coupling between single-chain and phase behavior. However, the Lge1_{180} NCPR of 0.075 is greater than that of A1-LCD (0.059) and this could contribute to the extent of decoupling as suggested by Bremer et al. Moreover, Lge1 is extremely abundant in Arg (13.5% as compared to 7.4% in A1-LCD), and in terms of Arg and Lys abundance is most similar to the +7R A1-LCD mutant, which showed the greatest degree of decoupling between single-chain and phase behavior in Bremer et al., in agreement with what we see here. While these authors have shown that NCPR is the primary determinant of the extent of decoupling in the case of A1-LCD mutants, including the A1-LCD +7R, their analysis showed that the nature of positive and negative residues involved also makes a significant difference. In particular, the significant excess of Arg residues, as a context-dependent auxiliary sticker, could create the asymmetry between interactions that determine single chain dimensions vs. collective phase behavior.
Furthermore, Martin et al. (https://doi.org/10.1126/science.aaw8653) have shown that an approximately uniform distribution of stickers along the sequence is required for the correspondence between the driving forces behind coil-to-globule transitions and phase separation to hold. We have analyzed the patterning of Tyr residues along the Lge1_{180} sequence using the Ω_{aro} parameter used by Martin et al. (note that Tyr is the only aromatic in the Lge1_{180} sequence). Interestingly, Ω_{aro} of the native Lge1 sequence (0.47) falls in the middle of the distribution for its shuffled variants (p=0.57), in contrast to the highly patterned sequences such as that of A1-LCD with p>0.99. Taken together, in addition to NCPR, symmetry breaking in the case of Lge1_{180} could be a consequence of its complex sequence composition, including both the non-uniform patterning of tyrosines and a high abundance of arginines. Provided that our simulations are long enough to provide an equilibrium picture and are on the length scale of a single protein not strongly influenced by finite-size effects (these potential artifacts cannot be discounted), they actually can be seen as a demonstration of such symmetry breaking in a heteropolymer, as Reviewer 2 accepts. The above comparisons and discussion, while extremely important, are outside the scope of the present manuscript and will be treated in a separate manuscript.
Furthermore, analysis of pairwise contacts suggests that intra- and intermolecular interactions rely on a similar pool of contacts by amino-acid type, but differ significantly if one analyzes the specific sequence location of the interacting residues involved (Figure 2—figure supplement 1A and B). For example, one observes a high correlation between the frequencies of different contacts by amino-acid type when comparing intramolecular contacts in single-copy simulations and intermolecular contacts in 24-copy simulations (Figure 3—figure supplement 1B). This correlation is completely lost (Figure 3—figure supplement 1C) if one analyzes position-resolved statistics (2D pairwise contact maps) or statistically defined interaction modes (Figure 3A, and Figure 3—figure supplement 1A). For example, although Tyr-Tyr interactions dominate in both cases, in single-copy simulations of WT Lge1_{180} the C-terminal Tyr_{80} barely participates in any intramolecular interactions with other residues (Figure 3—figure supplement 1A), while in 24-copy simulations it is one of the most intermolecularly interactive residues (Figure 3A). In other words, while the symmetry between intra- and intermolecular interactions can be observed at the level of pairwise contact types (similar contact types are used in both), the distribution of these contacts along the peptide sequence is clearly different in the two cases. Finally, it should be mentioned that the parallel between single-copy and phase behavior in both homopolymers and heteropolymers is observed primarily at the level of thermodynamic variables such as the LLPS critical temperature (T_{c}), the coil-to-globule transition temperature (T_{θ}) or the Boyle temperature (T_{B}). It is possible that the noted correspondence extends primarily to such and similar thermodynamic variables, while more structural, topological features of the globule in the single-molecule case and the network in the collective phase case remain uncoupled.
Interestingly, the core of intramolecular interactions observed for a single molecule at infinite dilution and in the crowded context remains approximately the same, as reflected in the high correlation between intramolecular modes obtained in single- and multi-chain simulations. Namely, proteins keep core self-contacts and establish new ones with neighbors, but do not lose “self-identity”, as in homopolymer melts. Similar effects have also been observed elsewhere: https://doi.org/10.1073/pnas.2000223117, https://doi.org/10.1073/pnas.1804177115.
Reviewer #1 (Recommendations for the authors):
– The manuscript will be strengthened by improving the experimental part. The phase diagrams are not "phase diagrams" in a true sense, they are solubility diagrams or state diagrams of the protein. Phase diagrams are characterized by binodal and tie-lines, which were not measured here. Please see the measured phase diagrams reported in Martin et al. (ref # 18). doi:10.1126/science.aaw8653.
We indeed could not measure complete binodal and tie-lines, due in part to the experimental limitations when it comes to working with high protein concentrations for Lge1_{180} constructs. Therefore, as suggested by the Reviewer, we now refer to the presented diagrams as “solubility diagrams”.
– The use of BF microscopy to distinguish the "morphology" of the condensed phase provides shallow insight into the morphology and dynamics of these assemblies.
Following the Reviewer’s comment, we have purified and fluorescently labelled different constructs. This has allowed us to use fluorescence microscopy for a more detailed analysis of the condensed phase (Figure 1—figure supplement 1B; please see also above). Different morphologies have been assessed by quantifying the circularity of the observed condensates (Figure 1—figure supplement 1C). Moreover, we have included the analysis of the dynamics of these assemblies at different levels by fusion experiments (Video 1 and Video 2) and FRAP (Figure 1D, E and Figure 1—figure supplement 2). Given the fact that protein precipitates of the Y>A mutant are present at extremely low abundance and only at concentrations of 45 µM and above, fusion behavior could not be studied in that case.
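As an illustration of what a circularity measure quantifies, the Python sketch below uses the common definition 4πA/P² (area A, perimeter P), which equals 1 for a perfect circle and decreases for irregular outlines. Whether this exact definition matches the one in Methods is an assumption; the function name `circularity` and the polygonal outlines are hypothetical.

```python
import math

def circularity(points):
    """Circularity 4*pi*A/P^2 of a closed polygon given as (x, y) vertices."""
    n = len(points)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]       # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0    # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2

# A near-circular outline versus a square outline.
circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360)) for k in range(360)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(circularity(circle), circularity(square))
```

In practice the outline would come from segmenting the microscopy image, and pixelation of the contour lowers the measured value slightly even for round droplets.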
– The phase-separated condensates are considered network fluids where percolation and phase separation go hand-in-hand. A discussion on this should be included in the current manuscript and should include how different variants that the authors studied show distinct percolation behavior. Please see https://doi.org/10.1016/j.molcel.2022.05.018.
Following the Reviewer’s suggestion, we have thoroughly revised the manuscript in order to connect and contrast our findings with the percolation/phase separation framework. Indeed, the fractal behavior is naturally related to percolation phenomena. According to the definition used in colloidal chemistry, biological condensates can be described as a weak gel, which undergoes a transition between a population of finite-size clusters (‘pre-percolation clusters’) or sol, and an infinitely large cluster or gel (https://doi.org/10.1007/3540114718_4). In the case of biological condensates, phase separation coupled to percolation results in finite-sized droplets and the appearance of surface tension. Under the requirement that c_{sat} < c_{perc} < c_{dense}, phase separation leads to an increase in local concentration of IDPs and defines the phase boundary, while percolation transition establishes the network connectivity (https://doi.org/10.1016/j.molcel.2022.05.018). Consequently, the spatial organization of both finite-sized and infinite clusters is directly connected to a networking transition (percolation). For instance, direct links between the fractal dimension defining scaling principles in self-similar clusters and critical exponents in different percolation models have been provided (https://doi.org/10.1007/3540114718_4, https://doi.org/10.1007/BF01012940).
While a similar formal derivation for our model is out of the scope of the present study, we can provide an illustration of how our results and the key parameters of the model can be interpreted according to percolation theory. A key concept in percolation theory is that of contact probability between components in the system. The network transition appears, and a percolating cluster is formed if the contact probability exceeds a particular threshold i.e. the critical contact probability (p_{crit}). For instance, in the simplest case according to the Flory-Stockmayer theory p_{crit} = 1/(n - 1), where n is the number of bonds formed by each monomer and is related to the valency in our model. Thus, interaction valency contributes directly to the spatial organization of pre-percolating clusters (the fractal dimension) and defines the threshold of percolation. In the revised manuscript, we have estimated the contact probabilities (p_{c}) for all three Lge1_{180} variants, under the assumption of a well-mixed system, that is, under the assumption that all chains in the simulation box can, in principle, establish contacts with all the other chains. As shown in Figure 2—figure supplement 1C, contact probabilities evolve in direct proportion to valency, with a plateau over the last 0.3 µs. Importantly, the WT contact probability reaches a level that is ~1.5-fold higher than for either mutant. While we cannot independently estimate the value of p_{c} in our simulated systems, the fact that on the simulation time scale WT forms a single percolating cluster and the two mutants do not (Figure 2C), is consistent with this difference in contact probability. We have highlighted these points on p. 19 of the revised manuscript.
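The Flory-Stockmayer threshold quoted above, p_{crit} = 1/(n - 1), can be tabulated with a few lines of Python. This is only a sketch of the simplest textbook case: the function names and the example contact probabilities are hypothetical and are not the values obtained in the simulations.

```python
def critical_contact_probability(n_bonds):
    """Flory-Stockmayer threshold p_crit = 1/(n - 1) for monomers
    that can each form n_bonds bonds (n_bonds must exceed 1)."""
    if n_bonds <= 1:
        raise ValueError("n_bonds must be greater than 1")
    return 1.0 / (n_bonds - 1)

def exceeds_threshold(p_contact, n_bonds):
    """True if the contact probability lies above the percolation threshold."""
    return p_contact > critical_contact_probability(n_bonds)

# The threshold drops as the number of bonds per monomer grows.
for n in (2, 3, 4, 6):
    print(n, critical_contact_probability(n))

# Hypothetical contact probabilities for a monomer forming four bonds.
print(exceeds_threshold(0.40, 4), exceeds_threshold(0.30, 4))
```

The inverse dependence on n makes the qualitative point in the text explicit: a higher valency lowers the contact probability needed for a system-spanning network.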
– The idea that comes across from reading this manuscript is that IDPs/IDRs are the main driver of phase separation of proteins. This may not be true. It is fine to study an isolated IDR and its phase behavior, but one needs to acknowledge the full-length protein may display a more nuanced behavior through a combination of its IDR and other domains.
We fully agree with the Reviewer that IDPs/IDRs are not the only drivers of protein phase separation and that folded domains play a critical role in many systems. We have emphasized this point on pp. 5-6, 20 of the revised manuscript. Stimulated by the Reviewer’s comment, we have also realized that the role of the Lge1 IDR in driving phase separation merits a clearer explanation. In response, we now provide a more extensive discussion of the contribution of different domains of Lge1 to both its phase behavior and biological function (pp. 5-6, 20). Specifically, in a previous study by Gallego et al. (https://doi.org/10.1038/s415860202097z), it was shown that the first 80 aa of Lge1 are the main driver of its phase separation, with the rest of the IDR also contributing to phase separating properties. Moreover, the other key domain of Lge1 – its C-terminal predicted coiled coil – has an essential function in vivo due to its interaction with the E3 ligase Bre1, which is required for histone H2B ubiquitination. We should emphasize that in the present manuscript, the Lge1_{180} fragment and its mutants were used as a model system to study the physicochemical features of the respective condensates/precipitates, with the biological interpretations being out of the scope of the present study.
– The colloidal cluster formalism, while interesting, should be compared with experimentally determined observables, as the authors point out. Without such data, I am unsure how one can conclude that this formalism provides "a potentially universal foundation" in studying the phase separation of proteins.
A key contribution of the present work is the development and the definition of a quantitative model that treats the spatial organization of a biomolecular condensate across scales using just two key parameters that capture the behavior of individual polymer chains in the condensate (valency and compactness). As the Reviewer correctly points out, the extensive quantitative predictions of the model should be thoroughly tested against equally detailed experimental data. In the present manuscript, we have taken the first, largely qualitative steps in this direction. First, the model can be used to reconstruct the spatial organization of clusters of arbitrary size at the atomistic level (Figure 5A and B, Videos 4, 5, and 6), enabling a structural understanding of the organization of the condensate interior. A direct application of such understanding concerns the nature of cavity sizes and interpretation of dextran partitioning experiments. Second, as pointed out above, differences in morphology of protein clusters propagate across scales, and can be qualitatively characterized by the analysis of microscopic images i.e. presence or absence of detectable condensates as a function of the fractal dimension d_{F} (see also discussion above). In particular, the model correctly predicts the difference in the behavior of WT and R>K as opposed to Y>A variants, solely based on the predicted fractal dimension they exhibit. Finally, we could show that the MD simulations indeed match the predictions of fractal scaling for the three smallest clusters. Here, it is important to understand that MD simulations in the first instance just give the average valency and compactness of individual chains in the dense phase. These values are then input into the fractal scaling formalism, which is conceptually fully independent from MD simulations, to obtain the dependence of condensate mass on its radius, M(R), at any desired length scale. The analysis presented in Figure 5—figure supplement 1B and discussed on p. 16 shows that the predictions of fractal scaling for the first three smallest clusters indeed correspond to what is seen in MD. This is a nontrivial correspondence and can be taken as direct evidence that fractal organization is present even at the shortest scale, i.e. at the level of MD simulation boxes. Ultimately, static light scattering experiments would give the best possibility to test the model directly. However, these experiments are beyond the scope of the present work, concerned with presenting the quantitative model and linking it with MD simulations, and will be the topic of our future work. In particular, the fractal formalism predicts significant regions of linear behavior in such curves in log-log representation, while the fractal dimension d_{F} provides a quantitative point of comparison between theoretical predictions and experimental measurements. Following the Reviewer’s comment, we have rephrased the above statement to better reflect these points (p. 22).
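The statement that fractal scaling appears as a straight line in a log-log plot can be sketched in Python. The example assumes the generic mass-radius power law M(R) = k R^{d_F} and recovers d_F as the slope of log M against log R. It does not reproduce the derivation of d_F from valency and compactness in the manuscript; the function name and the synthetic data (d_F = 2.5, k = 3) are hypothetical.

```python
import numpy as np

def fit_fractal_dimension(radii, masses):
    """Least-squares fit of log M = d_F * log R + log k.
    Returns (d_F, k) for the assumed power law M = k * R**d_F."""
    slope, intercept = np.polyfit(np.log(radii), np.log(masses), 1)
    return slope, np.exp(intercept)

# Synthetic cluster sizes following an exact power law (arbitrary units).
radii = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
masses = 3.0 * radii ** 2.5
d_f, prefactor = fit_fractal_dimension(radii, masses)
print(d_f, prefactor)
```

With measured data, the fit would be restricted to the range of scales where the log-log curve is actually linear, and the resulting slope is the quantity to compare with the predicted d_F.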
– Interactions and solubility of "stickers" and "spacers" have been recently studied by Mittag, Pappu, and coworkers. Such discussions would be helpful to include here since the authors focus on a similar set of residues (R, G, Y, K). Please see https://www.nature.com/articles/s4155702100840w
Indeed, a number of findings in our study are in line with Bremer et al. (https://doi.org/10.1038/s4155702100840w), as discussed above in our reply to the Editor’s comments. As already pointed out, this concerns similar trends in amino-acid interaction preferences, whereby R-Y and Y-Y contacts are the drivers of IDR self-association while R>K mutations modulate the LLPS potential, as well as symmetry breaking between intra- and intermolecular interaction modes. We have included the corresponding discussion in the revised manuscript on pp. 8-9 and 11.
https://doi.org/10.7554/eLife.80038.sa2
Article and author information
Author details
Funding
Austrian Science Fund (P 30550)
 Bojan Zagrovic
Austrian Science Fund (P 30680B21)
 Bojan Zagrovic
NOMIS Stiftung (Pioneering Research Grant)
 Alwin Köhler
Austrian Science Fund (F79)
 Alwin Köhler
Ministry of Science and Higher Education of the Russian Federation (agreement No. 075152020773)
 Roman G Efremov
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by Austrian Science Fund FWF Standalone Grants P 30550 and P 30680B21 (to BZ); a NOMIS Pioneering Research Grant and a grant of the Austrian Science Fund (FWF, project F79) (to AK).
Senior Editor
 Aleksandra M Walczak, CNRS, France
Reviewing Editor
 Rohit V Pappu, Washington University in St. Louis, United States
Version history
 Preprint posted: February 19, 2022
 Received: May 6, 2022
 Accepted: July 18, 2023
 Accepted Manuscript published: July 20, 2023 (version 1)
 Version of Record published: August 7, 2023 (version 2)
 Version of Record updated: August 16, 2023 (version 3)
Copyright
© 2023, Polyansky, Gallego et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.