1. Structural Biology and Molecular Biophysics
  2. Computational and Systems Biology

Pi-Pi contacts are an overlooked protein feature relevant to phase separation

  1. Robert McCoy Vernon
  2. Paul Andrew Chong
  3. Brian Tsang
  4. Tae Hun Kim
  5. Alaji Bah
  6. Patrick Farber
  7. Hong Lin
  8. Julie Deborah Forman-Kay (corresponding author)
  1. Hospital for Sick Children, Canada
  2. University of Toronto, Canada
Research Article
Cite as: eLife 2018;7:e31486 doi: 10.7554/eLife.31486
8 figures, 1 table and 3 additional files

Figures

Figure 1 with 2 supplements
PDB statistics for planar pi-pi interactions.

(A) Average number of sp2 groups involved in planar pi-pi contacts per 100 protein residues binned by crystal structure resolution. Values are shown for contacts defined by the nature of the involved sp2 groups, with all groups in black, aromatic to non-aromatic sp2 in blue, non-aromatic to non-aromatic in pink, backbone to backbone in gray, and aromatic to aromatic in orange. Error bars show bootstrap SEM. (B) Planar pi-pi contact interaction frequencies for each residue type, with the average across all residue types shown as a red line, and (C) frequency of each residue type in contributing to planar pi-pi interactions, with bars showing overall frequency colored proportionally by the nature of the contact partners. Figure 1—source data 1 and 2.
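The bootstrap SEM used for the error bars in panel A can be illustrated with a minimal sketch. It assumes the resampling unit is a list of per-structure values within one resolution bin; the caption does not state the resampling unit, and the function name `bootstrap_sem` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_sem(values, n_resamples=1000, seed=0):
    """Estimate the standard error of the mean by bootstrap resampling.

    `values` is assumed to hold one number per structure in a bin
    (for example, contacts per 100 residues); this is an illustrative
    assumption, not the authors' documented procedure.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = [
        statistics.fmean(rng.choices(values, k=n))
        for _ in range(n_resamples)
    ]
    # The spread of the resampled means approximates the SEM.
    return statistics.stdev(means)
```

A fixed seed keeps the estimate reproducible between runs; a larger `n_resamples` gives a more stable estimate at proportionally higher cost.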

https://doi.org/10.7554/eLife.31486.002
Figure 1—source data 1

Pi-Pi contact annotations for the full PDB set.

Text file listing the pi-pi contacts observed across our non-redundant PDB set, with contact types shown by residue annotations where single amino acid names refer to sidechains and pairs of amino acids refer to the backbone peptide bond between residue i and residue i + 1.

https://doi.org/10.7554/eLife.31486.005
Figure 1—source data 2

Residue and amino acid counts for the full PDB set.

Text file listing the residues assessed in each individual PDB chain, used for calculating contact frequencies.

https://doi.org/10.7554/eLife.31486.006
Figure 1—figure supplement 1
Proportion of sidechain to backbone VDW contacts that satisfy planar contact criterion.

To examine relative contact enrichment, sidechain contacts to the backbone are normalized against the total number of contacts satisfying the same VDW criterion (two pairs of atoms within 4.9 Å), with comparison between (left) planar sp2 sidechain groups (for W, F, Y, H, R, Q, N, E and D) and (right) selected sp3 planar surfaces (for C, S, M, T, K, L, V, I). The sp3 planar surfaces were chosen as a control by taking sets of atoms describing exposed planar surfaces, as described in the Materials and methods. Comparing relative planar contact frequency, we observe the majority of sp2 sidechain types show clear enrichment relative to the sp3 controls.
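The VDW criterion named above (two pairs of atoms within 4.9 Å) can be sketched as a simple distance test. This is an illustrative reading only: the function names, the tuple-based coordinate format, and the choice to count every atom pair between the two groups are assumptions, and the planar-orientation test described in the Materials and methods is not reproduced here.

```python
import math
from itertools import product

VDW_CUTOFF = 4.9  # Å, the distance quoted in the caption


def count_close_pairs(group_a, group_b, cutoff=VDW_CUTOFF):
    """Count atom pairs (one atom from each group) within the cutoff.

    Each group is a list of (x, y, z) coordinates in Å.
    """
    return sum(
        1 for a, b in product(group_a, group_b)
        if math.dist(a, b) <= cutoff
    )


def satisfies_vdw_criterion(group_a, group_b, min_pairs=2):
    """Return True if at least `min_pairs` atom pairs are within the cutoff."""
    return count_close_pairs(group_a, group_b) >= min_pairs
```

For example, two parallel two-atom groups 3.5 Å apart satisfy the criterion, while a group with only one atom inside the cutoff does not.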

https://doi.org/10.7554/eLife.31486.003
Figure 1—figure supplement 2
Selected sidechain-to-sidechain contact frequencies by resolution.

The percentage of residues involved in planar contacts is shown in red, and the percentage in any other non-planar VDW contact is shown in blue, with panels showing contacts by sidechain group (for panels A-F: R to R, R to K, H to R, H to K, Q to R, and Q to K). We observe that the increase in planar pi-pi contacts to arginine at higher resolution comes at the expense of non-planar VDW contacts (panels A, C and E). In contrast, contacts made to an arbitrary surface plane at the end of lysine sidechains do not show this increase in planar orientation with resolution (panels B, D and F).

https://doi.org/10.7554/eLife.31486.004
Figure 2
Examples of planar pi-pi contacts in folded protein structures.

Pi-pi interactions shown using rods to describe the normal vector of the plane. Rods extend to a carbon VDW radius of 1.7 Å, colored by category with sidechain groups in purple, backbone in blue, small molecule ligands in orange, and RNA in gray. Ligand molecules are green, with relevant water molecules shown as red spheres and hydrogen bonds as yellow lines. (A) Arginine ladder motif in Porin P (PDB:2o4v). (B) Catalytic site from arginine kinase (PDB:1m15). (C) Network of interactions in nitrogenase (PDB: 3u7q). (D) Backbone/sidechain contacts at the ends of secondary structure elements (PDB:4b93). (E) RNA-binding interactions (PDB: 4lgt). (F) Interaction network stacked between disulfide bonds (PDB: 4v2a).

https://doi.org/10.7554/eLife.31486.007
Figure 3 with 2 supplements
Correlation of planar pi-pi interactions with solvent and lack of secondary structure.

(A) Contact frequency for sidechain groups (red) and backbone (blue) increases with the total number of solved water molecules within 4.9 Å of the residue, based on structures with >1 water oxygen per residue, including all molecules within 8 Å of the chain of interest, including symmetry partners. (B) Representative example of a pi-stacked sidechain in contact with 11 water molecules (PDB:4u98), showing how the interaction does not appear to compete with solvent. (C) Mean contact frequency vs. sequence distance from regular secondary structure and loop/turn regions. (D) Example of the range of interactions found >10 residues from helix/strand secondary structure (PDB:4b4h).

https://doi.org/10.7554/eLife.31486.008
Figure 3—figure supplement 1
Effect of solvation on pi-pi category frequencies.

Effects of solvation, measured by the total number of water molecules within 4.9 Å of a given residue, on the overall frequency of different types of interactions, categorizing contacts by the identities of the solvent contact tested residue and its partner, where the solvated residue is listed first (green for aromatic to aromatic, blue for aromatic to non-aromatic, orange for non-aromatic to aromatic, and pink for non-aromatic to non-aromatic). Note that non-aromatic includes backbone interactions.

https://doi.org/10.7554/eLife.31486.009
Figure 3—figure supplement 2
Enrichment of pi-pi contacts, relative to overall VDW contacts, as a function of the number of interactions with water.

Water contacts are measured to residue A, and the percentage of pi-pi contacts per VDW contact is measured for all contacts from residue A to residue B. Panel A shows the change in percentage of pi-pi contacts per VDW contact by number of waters for each sidechain-sidechain interaction, with pi-contact enrichment with solvation being a consistent property of the majority of interactions involving at least one non-aromatic sidechain. Panels B-F show slope measurements for a selection of examples, Phe to Phe, Arg to Arg, Phe to Arg, Arg to Glu and Phe to Glu, respectively.

https://doi.org/10.7554/eLife.31486.010
Figure 4
Sidechain contacts at interface positions.

Contact frequencies are shown for the nine sp2-containing sidechain types, split into three bars based on interface proximity. From left to right, these bars are i) no other chain within 4.9 Å of any sidechain atom, ii) within 4.9 Å VDW contact distance of any atoms in a different chain within the unit cell of the crystal, iii) within 4.9 Å of any atoms in a chain from a neighboring unit cell, as determined by crystal symmetry data. Bars are colored by the proportion of total contacts contributed by three categories, bottom/black corresponding to local (sequence separation ≤4 residues) intrachain contacts, middle/blue to non-local intrachain contacts, and top/pink to interchain contacts, showing that overall contact frequencies and local contact frequencies remain similar and that the non-local contacts do not discriminate between intra and interchain.
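The three contact categories used to color the bars can be expressed as a short classification rule. The sketch below assumes each contact is described by a chain identifier and a residue number for both partners; the function name and argument layout are hypothetical, and only the sequence separation threshold of 4 residues comes from the caption.

```python
def classify_contact(chain_a, resnum_a, chain_b, resnum_b, local_max_separation=4):
    """Assign a contact to one of the three categories in the caption."""
    if chain_a != chain_b:
        return "interchain"
    # Same chain: local if the partners are at most 4 residues apart.
    if abs(resnum_a - resnum_b) <= local_max_separation:
        return "local intrachain"
    return "non-local intrachain"
```

Residue numbers are compared directly, so this sketch does not account for insertion codes or numbering gaps in real PDB files.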

https://doi.org/10.7554/eLife.31486.011
Figure 5 with 2 supplements
Prediction of phase separation based on planar pi-pi interactions.

(A) Reliability plot showing average predicted and observed contact frequencies for percentile bins by pi-pi contact prediction for proteins in the PDB, with PDB sequences used for training in blue and the leave out set in red. Bars show SEM. (B) Highest number of contacts predicted, by window, for two phase separation predictor training sets and three test sets, for the unoptimized predictor. (C) Modified ROC curve showing the final predictor’s performance on three test sets vs. the human proteome, with the full set in pink (N = 62), the full set minus the insufficient for phase separation set shown in green (N = 44), and the sufficient for phase separation set in blue (N = 32). (D) Results for the final predictor (as for panel B) plotted with the predictor’s phase separation propensity scores (PScore). Data underlying panels B-D are included in Figure 5—source data 1 and Figure 5—source data 2.
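The binning behind a reliability plot such as panel A can be sketched as follows. This is a generic illustration under assumed inputs (paired lists of predicted and observed contact frequencies); the function name `reliability_bins` and the equal-count binning scheme are assumptions, not the authors' code.

```python
import statistics


def reliability_bins(predicted, observed, n_bins=10):
    """Return (mean predicted, mean observed) for each percentile bin.

    Items are sorted by predicted value and split into `n_bins`
    equal-count bins, so each bin covers one percentile range.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    bins = []
    for i in range(n_bins):
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue  # fewer items than bins
        mean_pred = statistics.fmean([p for p, _ in chunk])
        mean_obs = statistics.fmean([o for _, o in chunk])
        bins.append((mean_pred, mean_obs))
    return bins
```

A well-calibrated predictor gives points close to the diagonal, where the mean predicted value in each bin matches the mean observed value.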

https://doi.org/10.7554/eLife.31486.012
Figure 5—source data 1

Phase separation training, testing and designed protein test sets.

Excel table containing identification and literature references for proteins in the phase separation test and training sets, with sheet one showing the training set proteins, two showing proteomic test set proteins, and three showing synthetic test set proteins.

https://doi.org/10.7554/eLife.31486.015
Figure 5—source data 2

Additional phase separation propensity scores used in final ROC analysis.

Excel table containing protein IDs and predicted propensity scores, with different datasets on each sheet. Sheets 1–3 have full predictions for the human, E. coli, S. cerevisiae proteomes, respectively. Sheet four repeats the subset of human proteins found in the DisProt database. Sheet five shows scores for the protein sequences found in our non-redundant PDB set, and sheet six repeats the subset of PDB sequences withheld from predictor training.

https://doi.org/10.7554/eLife.31486.016
Figure 5—figure supplement 1
Contrasting behavior of disorder prediction algorithms and the phase separation prediction.

DISOPRED3 (Jones and Cozzetto, 2015) derived disorder predictions are shown on the y axis and PScores are shown on the x axis for four different test sets: (A) our PDB test set, representing a negative set for both phase separation and disorder, (B) a random sample of 4385 sequences from the human proteome, (C) the subset of the human proteome annotated as containing disorder in the DisProt database (Piovesan et al., 2017), representing a positive set for disorder, and (D) our full phase separation test set. Results are split into four categories separated by PScore = 4 and Disorder = 0.8, with the percentage of sequences in each category inset in blue. The majority of known phase-separating proteins are associated with disorder, and are predicted to be disordered, but sequences predicted to phase separate represent a small subset of both the known and the predicted disordered proteins.
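The four-category split at PScore = 4 and Disorder = 0.8 can be sketched as below. The thresholds come from the caption; the function names, the input format of one (PScore, disorder) pair per sequence, and the treatment of values exactly on a threshold as "high" are assumptions.

```python
from collections import Counter


def quadrant(pscore, disorder, pscore_cut=4.0, disorder_cut=0.8):
    """Name the category a sequence falls into for the two thresholds."""
    ps = "high PScore" if pscore >= pscore_cut else "low PScore"
    dis = "high disorder" if disorder >= disorder_cut else "low disorder"
    return f"{ps}, {dis}"


def quadrant_percentages(records):
    """Percentage of sequences in each category.

    `records` is a non-empty list of (pscore, disorder) pairs.
    """
    counts = Counter(quadrant(p, d) for p, d in records)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Categories that contain no sequences are simply absent from the returned dictionary.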

https://doi.org/10.7554/eLife.31486.013
Figure 5—figure supplement 2
Comparison of scores used in generating phase separation predictions.

(A) Highest number of short-range backbone contacts predicted, by window, for the PDB test set, the human proteome, the set of disordered human proteins from DisProt, and the full phase separation test set (N = 121), where percentile ranges are shown in colored boxes. (B) Highest number of long-range backbone contacts predicted, as for panel A. (C) Results for the final predictor plotted with the predictor’s phase separation propensity scores (PScore). Prediction of long-range backbone contacts provides the majority of the discrimination seen in the final predictor.

https://doi.org/10.7554/eLife.31486.014
Figure 6
Association of phase separation propensity scores with protein interactions, splice isoforms, PTMs, and GO localization, process, and function terms.

(A) Protein-protein interaction enrichment by the PScore of partner 1 vs. the PScore of partner 2. The color gradient shows the natural logarithm of the observed over expected ratio. (B) Percentage of human proteins at each PScore range that are detected in more than 10% of AP-MS negative control experiments. (C) Score ranges for alternative splicing variants shown as vertical lines sorted by reference sequence values. (D) Number of PTMs vs. average relative PScore, with methylation shown in red, phosphorylation in green, and ubiquitination in blue.
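The color scale in panel A, the natural logarithm of the observed over expected ratio, can be illustrated with a small sketch. It assumes that observed interactions are tabulated as counts per pair of PScore bins and that the expected count is the one implied by independent row and column totals; the caption does not say how expected values were derived, so that null model and both function names are assumptions.

```python
import math


def expected_counts(observed):
    """Expected counts if partner 1 and partner 2 bins were independent.

    `observed` is a non-empty table (list of rows) with a nonzero total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def log_enrichment(observed):
    """ln(observed / expected) per cell; cells with no observations give -inf."""
    expected = expected_counts(observed)
    return [
        [math.log(o / e) if o > 0 else float("-inf")
         for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
```

Positive values mark bin pairs that interact more often than the null model predicts, negative values mark depletion, and zero marks no difference.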

https://doi.org/10.7554/eLife.31486.017
Figure 7
PScore enrichment by gene ontology annotation for subcellular localization (A), biological process (B), and molecular function (C).

The color gradient shows the natural logarithm of the observed over expected ratio. Heatmaps show enrichment in vertebrate sequences across six defined score ranges, with the highest score range (PScore ≥4) labeled with human enrichment values calculated using PANTHER (see Materials and methods).

https://doi.org/10.7554/eLife.31486.018
Figure 8 with 1 supplement
Visual confirmation of phase separation.

(A) Test tubes containing transparent or turbid solutions of 1 mM FMR1 C-terminus (residues 445–632) along with their corresponding DIC microscopy images taken at room temperature or 4°C, respectively. (B) 1 mM FMR1 C-terminus forms droplets exhibiting liquid fusion properties at 4°C. (C) 40 µM solutions of human cytomegalovirus pAP along with corresponding microscopy images taken at room temperature or 80°C, respectively.

https://doi.org/10.7554/eLife.31486.019
Figure 8—figure supplement 1
Visual confirmation of phase separation, using 20 mg/ml Ficoll as a crowding agent.

(A) 200 µM FMR1 C-terminus shows reversible droplet formation between 2°C and RT. (B) 220 µM engrailed-2 shows reversible droplet formation between 2°C and 35°C. DIC images taken at 63x magnification, where shading reflects the differences in position relative to the focal plane of the free-floating droplets. Scale shown as black bars sized to 10 µm.

https://doi.org/10.7554/eLife.31486.020

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Recombinant DNA reagent | His-SUMO-Ddx4 1-236 | PMID 25747659 | | Expression vector (His-Sumo tagged) for Ddx4 residues 1–236, sequence from UID: Q9NQI0-1 (UniProt identification) |
| Recombinant DNA reagent | His-SUMO-Ddx4 1-236(9FtoA) | PMID 25747659 | | Expression vector (His-Sumo tagged) for Ddx4 residues 1–236, sequence from UID: Q9NQI0-1, 9 out of 14 phenylalanines mutated to alanine |
| Recombinant DNA reagent | His-SUMO-Ddx4 1-236(14FtoA) | PMID 28894006 | | Expression vector (His-Sumo tagged) for Ddx4 residues 1–236, sequence from UID: Q9NQI0-1, all phenylalanines mutated to alanine |
| Recombinant DNA reagent | His-SUMO-Ddx4 1-236(RtoK) | PMID 28894006 | | Expression vector (His-Sumo tagged) for Ddx4 residues 1–236, sequence from UID: Q9NQI0-1, all arginines mutated to lysine |
| Recombinant DNA reagent | His-SUMO-FMR1 445-632 | This paper | | Expression vector (His-Sumo tagged) for FMR1 residues 445–632, sequence from UID: Q06787-1 |
| Recombinant DNA reagent | His-SUMO-FMR1 445-632(RtoK) | This paper | | Expression vector (His-Sumo tagged) for FMR1 residues 445–632, sequence from UID: Q06787-1, all arginines mutated to lysine |
| Recombinant DNA reagent | His-SUMO-pAP A341Q | This paper | | Expression vector (His-Sumo tagged) for SCAF isoform pAP, sequence from UID: P16753-2, alanine 341 mutated to glutamine |
| Recombinant DNA reagent | His-SUMO-EN2 | This paper | | Expression vector (His-Sumo tagged) for Engrailed-2, sequence from UID: P19622-1 |

Additional files

Source code 1

Python scripts for identifying PDB contacts.

Pi-pi contact identification scripts suitable for reproducing the annotation data contained in Figure 1—source data 1 and 2.

https://doi.org/10.7554/eLife.31486.021
Source code 2

Final predictor code package.

Python script and associated database files for the final phase separation propensity predictor.

https://doi.org/10.7554/eLife.31486.022
Transparent reporting form
https://doi.org/10.7554/eLife.31486.023
