1. Structural Biology and Molecular Biophysics

Sequence co-evolution gives 3D contacts and structures of protein complexes

  1. Thomas A Hopf
  2. Charlotta P I Schärfe
  3. João P G L M Rodrigues
  4. Anna G Green
  5. Oliver Kohlbacher
  6. Chris Sander (corresponding author)
  7. Alexandre M J J Bonvin (corresponding author)
  8. Debora S Marks (corresponding author)
  1. Harvard University, United States
  2. Technische Universität München, Germany
  3. University of Tübingen, Germany
  4. Bijvoet Center for Biomolecular Research, Utrecht University, Netherlands
  5. Memorial Sloan Kettering Cancer Center, United States
Research Article
Cite this article as: eLife 2014;3:e03430 doi: 10.7554/eLife.03430
7 figures, 1 table and 7 additional files


Figure 1 with 1 supplement
Co-evolution of residues across protein complexes from the evolutionary sequence record.

(A) Evolutionary pressure to maintain protein–protein interactions leads to the co-evolution of residues between interacting proteins in a complex. By analyzing patterns of amino acid co-variation in an alignment of putatively interacting homologous proteins (left), evolutionary couplings between co-evolving inter-protein residue pairs can be identified (middle). By defining distance restraints on these pairs, the 3D structure of the protein complex can be inferred using docking software (right). (B) Distribution of E. coli protein complexes of known and unknown 3D structure where both subunits are close on the bacterial genome (left), allowing sequence pair matching by genomic distance. For a subset of these complexes, sufficient sequence information is available for evolutionary couplings analysis (dark blue bars). As more genomic information becomes available through ongoing sequencing efforts, larger fractions of the E. coli interactome become accessible for EVcomplex (right). A detailed version of the workflow used to calculate all E. coli complexes for which there is currently enough sequence information is shown in Figure 1—figure supplement 1.
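
As a rough illustration of the sequence pair matching by genomic distance mentioned in panel B, the sketch below pairs homologs of two subunits within each genome by choosing the two genes that lie closest together, then concatenates their aligned sequences. The record layout, the function name `pair_by_genomic_distance`, and the `max_gap` cutoff are assumptions made for this example; they are not taken from the EVcomplex pipeline.

```python
from collections import defaultdict


def pair_by_genomic_distance(hits_a, hits_b, max_gap=20):
    """Pair homologs of subunits A and B that lie close together on a genome.

    Each hit is a tuple (genome_id, gene_index, aligned_sequence), where
    gene_index is the ordinal position of the gene on its genome
    (a hypothetical layout chosen for this sketch).
    Returns {genome_id: concatenated aligned sequence}, keeping only the
    closest A/B pair per genome and dropping pairs further apart than max_gap.
    """
    by_genome_b = defaultdict(list)
    for genome, idx, seq in hits_b:
        by_genome_b[genome].append((idx, seq))

    best = {}  # genome_id -> (gap, concatenated sequence)
    for genome, idx_a, seq_a in hits_a:
        for idx_b, seq_b in by_genome_b.get(genome, []):
            gap = abs(idx_a - idx_b)
            if gap > max_gap:
                continue  # genes too far apart to be matched in this sketch
            if genome not in best or gap < best[genome][0]:
                best[genome] = (gap, seq_a + seq_b)
    return {genome: seq for genome, (gap, seq) in best.items()}


# Toy input: genome g1 has two candidate A genes, g2 has a distant B gene.
hits_a = [("g1", 10, "MKV-"), ("g1", 500, "MRV-"), ("g2", 3, "MKI-")]
hits_b = [("g1", 11, "AC-D"), ("g2", 900, "ACED")]
paired = pair_by_genomic_distance(hits_a, hits_b)
```

In this toy input only genome `g1` yields a pair (the genes at positions 10 and 11); the `g2` genes are too far apart and are dropped.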

Figure 1—figure supplement 1
Details of the EVcomplex Pipeline.
Figure 2 with 8 supplements
Evolutionary couplings capture interacting residues in protein complexes.

(A) Inter- and intra-EC pairs with high coupling scores largely correspond to proximal pairs in 3D, but only if they lie above the background level of the coupling score distribution. To estimate this background noise, a symmetric range around 0 is considered, with its width defined by the minimum inter-EC score. For the protein complexes in the evaluation set, this distribution is compared to the distances in the known 3D structure of the complex, shown here for the methionine transporter complex MetNI. (Plots for all complexes in the evaluation set are shown in Figure 2—figure supplements 1 and 2.) (B) A larger distance from the background noise (ratio of EC score over background noise line) gives more accurate contacts. Additionally, the higher the number of sequences in the alignment, the more reliable the inferred coupling pairs are, which in turn reduces the required distance from noise (different shades of blue). Residue pairs with a minimum atom distance within 8 Å are defined as true positive contacts, and precision = TP/(TP + FP). The plot is limited to the range (0,3), which excludes the histidine kinase–response regulator complex (HK–RR), a single outlier with an extremely high number of sequences. (C) To allow comparison across protein complexes and to estimate the average inter-EC precision for a given score threshold independent of sequence numbers, the raw couplings score is normalized for the number of sequences in the alignment, resulting in the EVcomplex score. In this work, inter-ECs with an EVcomplex score ≥0.8 are used. Note: the plot is cut off at a score of 2 in order to zoom in on the phase change region, and the high sequence coverage outlier HK–RR is excluded. (D) For complexes in the benchmark set, inter-EC pairs with EVcomplex score ≥0.8 give predictions of interacting residue pairs between the complex subunits to varying accuracy (8 Å TP distance cutoff).
All predicted interacting residues for complexes in the benchmark set that had at least one inter-EC above 0.8 are shown as contact maps in Figure 2—figure supplements 3–8.
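
The quantities in panels A and B can be sketched in a few lines. The example below takes the background noise level as the absolute value of the minimum inter-EC score, computes the ratio of each score over that level, and evaluates precision = TP/(TP + FP) with an 8 Å minimum atom distance cutoff. Function names and the toy numbers are illustrative assumptions, as is treating a distance of exactly 8 Å as a contact. The normalization by sequence number that yields the EVcomplex score (panel C) is not reproduced here because the caption does not give its formula.

```python
def score_over_noise(inter_ec_scores):
    """Ratio of each inter-EC score over the background noise level.

    The noise level is |minimum inter-EC score|, i.e. the half-width of the
    symmetric range around 0 described for panel A.
    """
    noise = abs(min(inter_ec_scores))
    if noise == 0:
        raise ValueError("minimum inter-EC score is 0; noise level undefined")
    return [score / noise for score in inter_ec_scores]


def precision(ratios, min_atom_distances, ratio_cutoff, contact_cutoff=8.0):
    """precision = TP / (TP + FP) over pairs with ratio >= ratio_cutoff.

    A selected pair counts as a true positive when its minimum atom distance
    in the known structure is at most contact_cutoff (in Å).
    Returns None when no pair passes the ratio cutoff.
    """
    selected = [d for r, d in zip(ratios, min_atom_distances) if r >= ratio_cutoff]
    if not selected:
        return None
    true_positives = sum(1 for d in selected if d <= contact_cutoff)
    return true_positives / len(selected)


# Toy data (not from the paper): raw inter-EC scores and structure distances.
scores = [1.5, 1.0, 0.75, 0.25, -0.5]
distances = [4.0, 6.5, 15.0, 22.0, 30.0]
ratios = score_over_noise(scores)
```

With these toy values, raising the ratio cutoff from 1.0 to 2.0 removes the one distant pair among the selected couplings, which mirrors the trend described for panel B.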

Figure 2—figure supplement 1
Distribution and accuracy of raw EC scores for all complexes in evaluation set.
Figure 2—figure supplement 2
Distribution and accuracy of raw EC scores for all complexes in evaluation set (2).
Figure 2—figure supplement 3
Contact maps of all complexes with solved 3D structure with inter-ECs above EVcomplex score of 0.8.

Predicted coevolving residue pairs with an EVcomplex score ≥0.8 and all inter-ECs up to the rank of the last included inter-EC are visualized in complex contact maps (red dots: inter-ECs, green and blue dots: intra-ECs for monomer 1 and 2, respectively). Top left and bottom right quadrants: intra-ECs; top right and bottom left quadrants: inter-ECs. Inter- and intra-protein crystal structure contacts at minimum atom distance cutoffs of 5/8/12 Å are shown as dark/middle/light gray dots, respectively; missing data in the crystal structure as shaded blue rectangles.
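
A minimal sketch of how the gray shading in these contact maps could be derived from a structure, assuming each residue is given as a list of (x, y, z) atom coordinates in Å. The helper names `min_atom_distance` and `contact_shade` are hypothetical, and treating a distance equal to a cutoff as inside that bin is an assumption.

```python
import math


def min_atom_distance(atoms_i, atoms_j):
    """Smallest distance (Å) between any atom of residue i and any atom of residue j."""
    return min(math.dist(a, b) for a in atoms_i for b in atoms_j)


def contact_shade(distance,
                  cutoffs=((5.0, "dark"), (8.0, "middle"), (12.0, "light"))):
    """Map a minimum atom distance to the gray shade used in the contact maps.

    Cutoffs are checked from tightest to loosest; pairs beyond the last
    cutoff are not drawn and return None.
    """
    for cutoff, shade in cutoffs:
        if distance <= cutoff:
            return shade
    return None


# Toy residues with two atoms each (coordinates are made up).
residue_i = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
residue_j = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
closest = min_atom_distance(residue_i, residue_j)
shade = contact_shade(closest)
```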

Figure 2—figure supplement 4
Contact maps of all complexes with solved 3D structure with inter-ECs above EVcomplex score of 0.8 (2).
Figure 2—figure supplement 5
Contact maps of all complexes with solved 3D structure with inter-ECs above EVcomplex score of 0.8 (3).
Figure 2—figure supplement 6
Contact maps of all complexes with solved 3D structure with inter-ECs above EVcomplex score of 0.8 (4).
Figure 2—figure supplement 7
Contact maps of all complexes with solved 3D structure with inter-ECs above EVcomplex score of 0.8 (5).
Figure 2—figure supplement 8
Contact maps of all complexes with solved 3D structure with inter-ECs above EVcomplex score of 0.8 (6).
Figure 3 with 1 supplement
Blinded prediction of evolutionary couplings between complex subunits with known 3D structure.

Inter-ECs with EVcomplex score ≥0.8 on a selection of benchmark complexes (monomer subunits in green and blue, inter-ECs in red, pairs closer than 8 Å by solid red lines, dashed otherwise). The predicted inter-ECs for these ten complexes were then used to create full 3D models of the complex using protein–protein docking. For the fifteen complexes for which 3D structures were predicted using docking, energy funnels are shown in Figure 3—figure supplement 1.

Figure 3—figure supplement 1
Comparison of Interface RMSD to HADDOCK score.

The HADDOCK scores of docked models are plotted against their iRMSDs to the crystal structure of the bound complex. Gray data points correspond to models created without any ECs as unambiguous restraints, whereas blue dots correspond to models created using all inter-couplings with EVcomplex score ≥0.8. HADDOCK score outliers with scores >100 are not shown, and any model with an iRMSD >35 Å is displayed as iRMSD = 35 Å for visualization purposes.
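
The two display rules in this caption amount to a filter and a cap. The sketch below applies them to a list of (HADDOCK score, iRMSD) pairs; the function name and the input layout are assumptions made for illustration.

```python
def prepare_funnel_points(models, max_score=100.0, irmsd_cap=35.0):
    """Apply the plot's display rules to (haddock_score, irmsd) pairs.

    Models with a HADDOCK score above max_score are dropped, and iRMSD
    values above irmsd_cap (Å) are drawn at irmsd_cap.
    """
    return [
        (score, min(irmsd, irmsd_cap))
        for score, irmsd in models
        if score <= max_score
    ]


# Toy models (values are made up): one score outlier, one iRMSD outlier.
models = [(-120.0, 1.4), (150.0, 2.0), (-30.0, 48.2)]
points = prepare_funnel_points(models)
```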

Figure 4
Evolutionary couplings give accurate 3D structures of complexes.

EVcomplex predictions and comparison to crystal structure for (A) the methionine-importing transmembrane transporter heterocomplex MetNI from E. coli (PDB: 3tui) and (B) the gamma/epsilon subunit interaction of E. coli ATP synthase (PDB: 1fs0). Left panels: complex contact map comparing predicted inter-ECs with EVcomplex score ≥0.8 (red dots, upper right quadrant) and intra-ECs (up to the last chosen inter-EC rank; green and blue dots, top left and lower right triangles) to close pairs in the complex crystal (dark/mid/light gray points for minimum atom distance cutoffs of 5/8/12 Å for inter-subunit contacts and dark/mid gray for 5/8 Å within the subunits). Inter-ECs with an EVcomplex score ≥0.8 are also displayed on the spatially separated subunits of the complex (red lines on green and blue cartoons, couplings closer than 8 Å in solid red lines, dashed otherwise, lower left). Right panels: superimposition of the top ranked model from 3D docking (green/blue cartoon, left) onto the complex crystal structure (gray cartoon) and close-up of the interface region with highly coupled residues (green/blue spheres).

Figure 5 with 3 supplements
Evolutionary couplings in complexes of unknown 3D structure.

Inter-ECs for five de novo prediction candidates without E. coli or interaction homolog complex 3D structure (Subunits: blue/green cartoons; inter-ECs with EVcomplex score ≥0.8: red lines). For complex subunits which homomultimerize (light/dark green cartoon), inter-ECs are placed arbitrarily on either of the monomers to enable the identification of multiple interaction sites. Contact maps for all complexes with unsolved structures are provided in Figure 5—figure supplements 1 and 2. Left to right: (1) the membrane subunit of methionine-importing transporter heterocomplex MetI (PDB: 3tui) together with its periplasmic binding protein MetQ (Swissmodel: P28635); (2) the large and small subunits of acetolactate synthase IlvB (Swissmodel: P08142) and IlvN (PDB: 2lvw); (3) pantothenate synthase PanC (PDB: 1iho) together with ketopantoate hydroxymethyltransferase PanB (PDB: 1m3v); (4) subunits a and b of ATP synthase (model for subunit a predicted with EVfold-membrane, PDB: 1b9u for subunit b), for detailed information see Figure 6; and (5) the complex of UmuC (model created with EVfold) with one possible conformation of UmuD (PDB: 1i4v) involved in DNA repair and SOS mutagenesis. For an alternative UmuD conformation, see Figure 5—figure supplement 3.

Figure 5—figure supplement 1
Contact maps of all complexes without solved 3D structure with at least one inter-EC above EVcomplex score of 0.8.

Inter-ECs are shown as red dots in the top right and bottom left quadrant while intra-ECs of the two monomers are shown in green and blue in the top left and bottom right quadrant, respectively.

Figure 5—figure supplement 2
Contact maps of all complexes without solved 3D structure with at least one inter-EC above EVcomplex score of 0.8 (2).
Figure 5—figure supplement 3
Details of the predicted UmuCD interaction residues.
Figure 6 with 1 supplement
Predicted interactions between the a-, b-, and c-subunits of ATP synthase.

(A) The a- and b-subunits of E. coli ATP synthase are known to interact, but the monomer structures of subunits a and b and the structure of their interaction in the complex are unknown. (B) EVcomplex prediction (right matrix) for ATP synthase subunit interactions compared to experimental evidence (left matrix), which is either strong (left, solid blue squares) or indicative (left, crosshatched squares). Interactions that have experimental evidence but are not predicted at the 0.8 threshold are indicated as yellow dots. (C) Left panel: residue detail of predicted residue–residue interactions (dotted lines) between subunits a and b (residue numbers at the boundaries of transmembrane helices in gray). Right panel: proposed helix–helix interactions between ATP synthase subunits a (green), b (blue, homodimer), and the c ring (gray). The proposed structural arrangement is based on analysis of the full map of inter-subunit ECs with EVcomplex score ≥0.8 (Figure 6—figure supplement 1).

Figure 6—figure supplement 1
Contact map of predicted ECs in the ATP synthase a and b subunits.

Inter-ECs are shown as red dots in the top right and bottom left quadrant while intra-ECs of the two monomers are shown in green and blue in the top left and bottom right quadrant, respectively.

Author response image 1


Table 1

EVcomplex predictions and docking results for 15 protein complexes

Column groups: EVcomplex contacts (Seqs, ECs, TP rate§); Docking quality, iRMSD (Top ranked model#, Best model).

Complex name | Subunits | Seqs | ECs | TP rate§ | Top ranked model# | Best model
Carbamoyl-phosphate synthase | CarB:CarA | 2.3 | 17 | 0.88 | 1.9 | 1.9
Aminomethyltransferase/Glycine cleavage system H protein | GcsH:GcsT | 2.9 | 5 | 0.2 | 5.4 | 5.4
Histidine kinase/response regulator | KdpD:CheY (T. maritima) | 95.4 | 78 | 0.72 | 2.1 | 2.0
Ubiquinol oxidase | CyoB:CyoA | 1.0 | 11 | 0.55 | 1.8 | 1.2
Outer membrane usher protein/Chaperone protein | FimD:FimC | 3.6 | 6 | 0.83 | 3.2 | 3.0
Molybdopterin synthase | MoaD:MoaE | 3.6 | 8 | 1.0 | 4.4 | 4.1
Methionine transporter complex | MetN:MetI | 1.9 | 14 | 0.86 | 1.5 | 1.2
Dihydroxyacetone kinase | DhaL:DhaK | 1.4 | 12 | 0.42 | 6.7 | 2.4
Vitamin B12 uptake system | BtuC:BtuF | 3.
Vitamin B12 uptake system | BtuC:BtuD | 9.8 | 21 | 0.88 | 1.1 | 0.9
ATP synthase γ and ε subunits | AtpE:AtpG | 2.9 | 15 | 0.53 | 1.4 | 1.4
IIA-IIB complex of the N,N'-diacetylchitobiose (Chb) transporter | PtqA:PtqB | 3.
30 S Ribosomal proteins | RS3:RS14 | 1.4 | 11 | 0.91 | 1.1 | 1.1
Succinatequinone oxido-reductase flavoprotein/iron-sulfur subunits | SdhB:SdhA | 3.0 | 8 | 0.62 | 1.4 | 1.4
30 S Ribosomal proteins | RS10:RS14 | 1.
  1. Seqs: number of non-redundant sequences in concatenated alignment normalized by alignment length.

  2. ECs: inter-ECs with EVcomplex score ≥0.8.

  3. § True Positive rate for inter-ECs above score threshold.

  4. # iRMSD positional deviation of model from known structure, for docked model with best HADDOCK score.

  5. Best model: lowest iRMSD observed across all models.

Additional files

Supplementary file 1

Benchmark data set and results.

Supplementary file 2

De novo prediction data set and results.

Supplementary file 3

Docking results.

Supplementary file 4

Predicted inter-ECs for complexes in de novo prediction data set with EVcomplex score ≥0.8.

Supplementary file 5

ATP synthase interaction predictions.

Supplementary file 6

Comparison of ATP synthase EVcomplex predictions of a and b subunit with cross-linking studies.

Supplementary file 7

PDB identifiers used for comparison of predicted evolutionary couplings to known 3D structures.

