Data generation and representation of the dataset used to probe scaling laws of PLD phase behavior

(A) Representation of the prion-like domains (grey-shaded regions) of each protein considered, together with the number of variants simulated for each, for a total of 140 sequences. (B) Composition of the wild-type PLD sequences simulated, expressed as the percentage of neutral (A, C, I, L, M, P, S, T, and V; no net charge at pH 7 and no π electrons in the side chain), glycine (G), negative (D, E), positive (K, H, R), neutral-π (N and Q; no net charge at pH 7 but with π electrons in the side chain), and aromatic (F, W, Y) residues. (C) Top panel: the simulation model, in which each amino acid is represented by a single bead, bonded interactions are modeled with springs, and non-bonded interactions are modeled with a combination of the Wang–Frenkel potential and a Coulomb term with Debye–Hückel screening. Middle panel: the Direct Coexistence method in a slab geometry, in which the protein-rich and protein-depleted phases are simulated simultaneously. Bottom panel: to obtain a data point in the concentration–temperature phase space, the density profile is computed at a given temperature, and the concentrations of the rich and depleted phases are obtained by a suitable fit (Methods). (D) The complete computational data set. Orange points denote variants in which charged or aromatic residues (i.e., stickers) were mutated; cyan points denote all other types of mutations studied (i.e., spacers).
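The sketch below shows the non-bonded pair energy described in panel C. It combines a short-range Wang–Frenkel term with a Debye–Hückel screened Coulomb term. The Wang–Frenkel form follows the published definition, which uses exponents μ and ν, a cutoff r_c, and a prefactor α chosen so that the well depth equals ε. Every parameter default here is an assumption, including μ = ν = 1, r_c = 3σ, εr = 80, a 0.795 nm screening length, and a 3.5 nm electrostatic cutoff. None of these is the Mpipi per-residue parameter set. The helper `pair_energy` is a hypothetical name.

```python
import math

# Assumed units: distances in nm, energies in kJ/mol, charges in units of e.
COULOMB_CONST = 138.935458  # 1/(4*pi*eps0) in kJ mol^-1 nm e^-2


def wang_frenkel(r, eps, sigma, mu=1, nu=1, r_cut=None):
    """Wang-Frenkel pair energy: zero at r = sigma and for r >= r_cut, minimum of -eps."""
    if r_cut is None:
        r_cut = 3.0 * sigma  # placeholder cutoff, not the Mpipi parameter set
    if r >= r_cut:
        return 0.0
    ratio = (r_cut / sigma) ** (2 * mu)
    # alpha rescales the potential so that its well depth is exactly eps
    alpha = 2 * nu * ratio * ((1 + 2 * nu) / (2 * nu * (ratio - 1))) ** (2 * nu + 1)
    return eps * alpha * ((sigma / r) ** (2 * mu) - 1) * ((r_cut / r) ** (2 * mu) - 1) ** (2 * nu)


def wang_frenkel_rmin(sigma, mu=1, nu=1, r_cut=None):
    """Distance at which the Wang-Frenkel potential reaches its minimum."""
    if r_cut is None:
        r_cut = 3.0 * sigma
    ratio = (r_cut / sigma) ** (2 * mu)
    return r_cut * ((1 + 2 * nu) / (1 + 2 * nu * ratio)) ** (1 / (2 * mu))


def debye_huckel(r, q_i, q_j, eps_r=80.0, debye_length=0.795, r_cut=3.5):
    """Screened Coulomb energy between two point charges, truncated at r_cut."""
    if r >= r_cut:
        return 0.0
    return COULOMB_CONST * q_i * q_j / (eps_r * r) * math.exp(-r / debye_length)


def pair_energy(r, eps, sigma, q_i, q_j, **wf_kwargs):
    """Hypothetical helper: short-range Wang-Frenkel term plus screened electrostatics."""
    return wang_frenkel(r, eps, sigma, **wf_kwargs) + debye_huckel(r, q_i, q_j)
```

Unlike a truncated Lennard-Jones potential, the Wang–Frenkel form goes smoothly to zero at its cutoff. No shift or switching function is needed.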

Mpipi model quantitatively captures phase behavior for a large set of hnRNPA1 PLD mutants.

(A) Aromatic, (B) Arg/Lys, (C) charged, and (D) Gly/Ser sequence variants of the hnRNPA1 PLD, compared with experimental results. The hnRNPA1 PLD was used to validate the accuracy of the model because critical temperatures can be extracted from experimental phase diagrams of a large set of its variants [35]. To assess how well the Mpipi model performs for PLDs, we compared the critical temperatures obtained from simulations with those extracted from the experimental data for the variants where these were available. The comparison is shown in the top right of each binodal plot, where r is the Pearson correlation coefficient and D is the root-mean-square deviation between simulated and experimental values. For the variants whose experimental critical temperature could not be determined, we instead compare the critical temperature with the saturation concentration, since the two are expected to be inversely correlated. The naming system for the variants is described in detail in the main text.
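The two agreement metrics quoted in each panel, the Pearson coefficient r and the root-mean-square deviation D, can be sketched in plain Python as below. The function names are hypothetical. The sketch assumes simulated and experimental critical temperatures are paired by variant, in the same order and the same units.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmsd(x, y):
    """Root-mean-square deviation between paired values (e.g. simulated vs. experimental Tc)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

The same `pearson_r` can be applied to critical temperature versus saturation concentration. There, an inverse relationship shows up as a negative r.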

Complete set of 140 binodals, in the temperature versus density phase space, generated via large-scale molecular dynamics simulations.

Binodals are grouped by mutation type (Aromatic, Arg/Lys, Gln/Asn (Polar), Ala/Ser, and Thr/Gly/Ser (Neutral)), and each set is further grouped by PLD.
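These captions do not state how a critical temperature is read off each binodal. A common approach in Direct Coexistence studies combines two relations. The first is the critical scaling law ρ_dense − ρ_dilute = d(1 − T/Tc)^β, with the 3D Ising exponent β ≈ 0.325. The second is the law of rectilinear diameters for the critical density. The sketch below assumes that procedure; it may differ from the authors' Methods. The function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def critical_point(temps, rho_dense, rho_dilute, beta=0.325):
    """Estimate (Tc, rho_c) from coexisting densities at several temperatures below Tc."""
    # Critical scaling: (rho_dense - rho_dilute)**(1/beta) is linear in T and vanishes at Tc.
    y = [(hi - lo) ** (1.0 / beta) for hi, lo in zip(rho_dense, rho_dilute)]
    slope, intercept = linear_fit(temps, y)
    tc = -intercept / slope
    # Rectilinear diameter: (rho_dense + rho_dilute) / 2 = rho_c + s * (T - Tc).
    diameters = [(hi + lo) / 2 for hi, lo in zip(rho_dense, rho_dilute)]
    s, b = linear_fit(temps, diameters)
    rho_c = b + s * tc
    return tc, rho_c
```

Raising the density gap to the power 1/β turns the scaling law into a straight line in T. A single linear regression then gives Tc, with no nonlinear optimizer required.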

Mutations in aromatic amino acids have strong effects on the critical solution temperature of PLDs.

(A) Tyr to Phe mutations. Number of Tyr residues mutated to Phe (x-axis) versus the change in critical temperature (y-axis), computed as the critical temperature of the variant minus that of the corresponding wild-type PLD sequence. The trendline defines the simplest, dominant scaling law for this mutation. The R² value of 0.88 indicates good agreement between our data and the scaling law defined by Equation 4. (B) Tyr or Phe to Trp mutations. Here, the overall trendline, defined by Equation 5, has R² = 0.99. (C) Variants in which aromatic residues are mutated to uncharged, non-aromatic amino acids (i.e., aromatic deletions). The y-axis shows the change in critical temperature of the variant relative to the wild-type sequence. The x-axis shows a renormalized measure of mutations, the fraction of aromatic residues mutated times the, to account for the competing physical interactions between PLDs. Error bars indicate the standard error across simulation replicas.
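The quantities plotted in these panels can be sketched as follows. The sketch computes ΔTc from replica critical temperatures, with its standard error, and then fits a trendline and its R². It assumes the replicas are independent, so the standard errors of variant and wild type are combined in quadrature. It also assumes a straight-line trendline; the exact forms of Equations 4 and 5 are not reproduced here. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def tc_with_sem(replicas):
    """Mean critical temperature and its standard error over simulation replicas."""
    return mean(replicas), stdev(replicas) / math.sqrt(len(replicas))


def delta_tc(variant_replicas, wt_replicas):
    """Tc(variant) - Tc(WT) with standard errors combined in quadrature (independent replicas)."""
    tv, sv = tc_with_sem(variant_replicas)
    tw, sw = tc_with_sem(wt_replicas)
    return tv - tw, math.hypot(sv, sw)


def fit_with_r2(xs, ys):
    """Least-squares line and its coefficient of determination R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot
```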

Critical effects of Arg mutations on the phase transition temperature of the condensates

(A) Critical solution temperature of PLDs with Arg to Lys mutations. Number of Arg to Lys mutations divided by PLD length (x-axis) versus the change in critical solution temperature, Tc(variant) − Tc(WT) (y-axis). The trendline, representing Equation 7, fits the data with R² = 0.94 and defines the stability measure for this perturbation. (B) Critical solution temperature of PLDs with Arg mutated to uncharged, non-aromatic amino acids chosen to maintain the WT compositional percentages (mainly Gly, Ser, and Ala; see Supporting Information for details). Arg deletion fraction (x-axis) versus Tc(variant) − Tc(WT) (y-axis). The trendline, which fits the data with R² = 0.96, defines the stability measure given by Equation 8.
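Because ΔTc is zero by definition for the wild type (zero mutations), a length-normalized trendline can be constrained to pass through the origin. The sketch below assumes such a linear, through-origin form. Equations 7 and 8 may be nonlinear, and the function names are hypothetical. R² is computed with the usual centered total sum of squares. For through-origin fits, other R² conventions also exist.

```python
def mutation_fraction(n_mutations, pld_length):
    """Renormalized x-axis: number of mutations divided by PLD length."""
    return n_mutations / pld_length


def fit_through_origin(xs, ys):
    """Least-squares slope a for y = a * x (no intercept) and a centered R^2."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    my = sum(ys) / len(ys)
    ss_res = sum((y - a * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, 1 - ss_res / ss_tot
```

For example, 10 Arg to Lys mutations in a 200-residue PLD give x = 0.05. Normalizing by length lets PLDs of different sizes share one trendline.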

Polar and neutral amino acid mutations subtly modulate the critical solution temperatures of PLDs

(A) Critical solution temperature of PLDs with Asn to Gln mutations. Number of Gln to Asn mutations divided by PLD length (x-axis) versus the temperature change Tc(variant) − Tc(WT) (y-axis). The trendline, which fits the data with R² = 0.84, follows the scaling law of Equation 9. (B) Number of Gly/Ser to Thr mutations (x-axis) versus the temperature change Tc(variant) − Tc(WT) (y-axis). The trendlines, representing Equations 10 and 11, fit the data with R² = 0.7. (C) Number of Ala to Ser mutations (x-axis) versus the temperature change Tc(variant) − Tc(WT) (y-axis). The trendline, Equation 12, fits the data with R² = 0.77.

Prion-like domains featuring sequences with dispersed aromatic residues show greater propensities for phase separation.

(A) Aromatic amino acids present in the wild-type sequence of each PLD considered. (B) Locations of the positively charged (red) and negatively charged (blue) amino acids in the wild-type sequence of each PLD considered. (C) Critical solution temperature versus the σ order parameter defined in Equation 13. For groups of variants sharing the same σ value, the average critical temperature is highlighted and the individual data points are shown in lighter shading. Lower values of σ correspond to higher critical temperatures, and vice versa. This indicates that variants with a more homogeneous distribution of aromatic amino acids form condensates that remain stable at higher temperatures.
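Equation 13 is not reproduced in these captions, so the sketch below is not the σ order parameter. It is a purely hypothetical stand-in that illustrates one way to quantify how evenly aromatic residues are spread along a sequence. The metric is the coefficient of variation of the gaps between consecutive aromatic residues (F, W, Y), and it is 0 for perfectly even spacing. The function names are hypothetical.

```python
from statistics import pstdev

AROMATIC = set("FWY")


def aromatic_gaps(seq):
    """Spacings between consecutive aromatic residues in a one-letter sequence."""
    positions = [i for i, aa in enumerate(seq) if aa in AROMATIC]
    return [b - a for a, b in zip(positions, positions[1:])]


def gap_dispersion(seq):
    """Hypothetical patterning metric: population std. dev. of gaps divided by mean gap."""
    gaps = aromatic_gaps(seq)
    if len(gaps) < 2:
        raise ValueError("need at least three aromatic residues")
    return pstdev(gaps) / (sum(gaps) / len(gaps))
```

An evenly patterned sequence such as `GYGGYGGYGGY` scores 0. A clustered one such as `YYYGGGGGGY` scores higher, mirroring the dispersed-versus-clustered contrast in panel C.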

Convergence tests for the WT sequence of the FUS PLD. Density profiles from different, independent simulations are plotted along the axis perpendicular to the condensate interfaces. The simulations were run for different durations (as specified in the legend) to check for proper convergence.
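Profiles like these are the input to the fitting step mentioned for the bottom of panel C in the first figure, where the coexisting concentrations are extracted. The captions only say a "suitable fitting" is used. The sketch below assumes a common choice, a double hyperbolic-tangent profile for a slab with two interfaces. It also assumes the slab is centered and does not cross the periodic boundary. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def slab_profile(z, rho_dense, rho_dilute, z_left, z_right, width):
    """Dense plateau between z_left and z_right, dilute plateau outside, tanh interfaces."""
    return rho_dilute + 0.5 * (rho_dense - rho_dilute) * (
        np.tanh((z - z_left) / width) - np.tanh((z - z_right) / width)
    )


def fit_coexisting_densities(z, rho):
    """Fit the double-tanh profile and return (rho_dense, rho_dilute)."""
    mid = 0.5 * (rho.max() + rho.min())
    inside = np.where(rho > mid)[0]  # bins belonging to the dense slab
    p0 = [rho.max(), rho.min(), z[inside[0]], z[inside[-1]], 1.0]
    params, _ = curve_fit(slab_profile, z, rho, p0=p0)
    return params[0], params[1]
```

Fitting the whole profile uses every histogram bin, including those in the interfacial regions. This is usually less sensitive to noise than averaging hand-picked plateau regions.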

Predicted effect of single point mutations on the critical temperature (in K) of PLDs as a function of PLD length, L.
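Suppose, as a simplifying assumption, that a length-normalized scaling law such as Equation 7 or 9 is linear in the mutation fraction n/L, with slope a (in K). The exact forms of those equations are not reproduced here. Under that assumption, the effect of one point mutation follows by setting n = 1:

$$
\Delta T_c(n) \approx a\,\frac{n}{L}
\quad\Longrightarrow\quad
\Delta T_c(1) \approx \frac{a}{L}
$$

For a hypothetical slope a = −300 K, a 150-residue PLD shifts by about −2 K per mutation, and a 300-residue PLD by about −1 K. Under this assumption, the single-mutation effect decays as 1/L.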

Convergence tests for the WT sequence of the FUS PLD. Density profiles from different, independent simulations are plotted along the axis perpendicular to the condensate interfaces. The simulations were performed with different system sizes (i.e., numbers of chains, as specified in the legend) to check for proper convergence.

Normalized contact maps of different R to K variants of PLDs: (A) TDP, (B) FUS, (C) hnRNPA1.

Normalized contact maps of different aromatic variants of PLDs: (A) FUS, (B) hnRNPA1.
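The captions do not specify the contact cutoff or normalization used for these maps. The sketch below builds an intermolecular residue-residue contact map under three hypothetical choices. A contact is any bead pair on different chains closer than `cutoff`. Counts are accumulated over frames and both chain orderings. The map is normalized so its maximum is 1. Periodic boundaries are ignored for brevity. The function name is hypothetical.

```python
import numpy as np


def intermolecular_contact_map(coords, cutoff):
    """coords: array (n_frames, n_chains, n_residues, 3); returns (n_residues, n_residues) map."""
    n_frames, n_chains, n_res, _ = coords.shape
    counts = np.zeros((n_res, n_res))
    for frame in coords:
        for a in range(n_chains):
            for b in range(n_chains):
                if a == b:
                    continue  # intermolecular contacts only
                # Pairwise distances between every residue of chain a and every residue of chain b
                d = np.linalg.norm(frame[a][:, None, :] - frame[b][None, :, :], axis=-1)
                counts += d < cutoff
    peak = counts.max()
    return counts / peak if peak > 0 else counts
```

Counting both chain orderings makes the map symmetric, so residue i of one chain touching residue j of another is credited at both (i, j) and (j, i).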