Data generation and representation of the dataset for probing scaling laws of PLD phase behavior
(A) Representation of the prion-like domains (grey-shaded regions) of each protein considered, alongside the number of variants simulated for each, for a total of 140 sequences. (B) Composition of the wild-type sequence of each simulated PLD, given as the percentage of neutral amino acids (A, C, I, L, M, P, S, T, and V; no net charge at pH 7 and no π electrons in the side chain), glycine (G), negative (D, E), positive (K, H, R), neutral-π (N and Q; no net charge at pH 7, with π electrons in the side chain), and aromatic (F, W, Y) residues. (C) Top panel: Representation of the simulation model. Each amino acid of the protein is represented by a single bead, bonded interactions are modeled with springs, and non-bonded interactions are modeled with a combination of the Wang–Frenkel potential and a Coulomb term with Debye–Hückel screening. Middle panel: The direct coexistence simulation method is employed in the slab geometry, so that the protein-rich and protein-depleted phases are simulated simultaneously. Bottom panel: To obtain a data point in the concentration–temperature phase space, the density profile is computed at a given temperature, and the concentrations of the protein-rich and protein-depleted phases are obtained via a suitable fit (Methods). (D) Representation of the entire computational dataset. Orange data points represent variants in which charged or aromatic residues (i.e., stickers) were mutated, while cyan data points represent all other types of mutations studied (i.e., spacers).
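The residue grouping used in panel (B) can be illustrated with a short sketch. This is not the authors' analysis code. It is a minimal example that assumes a one-letter amino-acid sequence as input, and the names `RESIDUE_CLASSES` and `composition_percent` are hypothetical. Only the six residue classes themselves are taken from the caption.

```python
from collections import Counter

# Residue classes as defined in panel (B) of the caption.
# The dictionary name and layout are illustrative assumptions.
RESIDUE_CLASSES = {
    "neutral": set("ACILMPSTV"),   # no net charge at pH 7, no pi electrons
    "glycine": set("G"),
    "negative": set("DE"),
    "positive": set("KHR"),
    "neutral-pi": set("NQ"),       # no net charge at pH 7, with pi electrons
    "aromatic": set("FWY"),
}


def composition_percent(sequence):
    """Return the percentage of residues in each class for a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")

    # Reject anything outside the 20 standard amino acids so that
    # the percentages always sum to 100.
    known = set().union(*RESIDUE_CLASSES.values())
    unknown = set(seq) - known
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")

    counts = Counter(seq)
    return {
        name: 100.0 * sum(counts[res] for res in residues) / len(seq)
        for name, residues in RESIDUE_CLASSES.items()
    }


# Toy sequence for illustration only (not one of the PLDs in the dataset).
toy = composition_percent("GGSY")
```

For the toy sequence `GGSY`, the function assigns half of the residues to the glycine class and a quarter each to the neutral and aromatic classes. Applying it to a wild-type PLD sequence would give the kind of per-class percentages summarized in panel (B).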