Inferring joint sequence-structural determinants of protein functional specificity

Abstract

Residues responsible for allostery, cooperativity, and other subtle but functionally important interactions remain difficult to detect. To aid such detection, we employ statistical inference based on the assumption that residues distinguishing a protein subgroup from evolutionarily divergent subgroups often constitute an interacting functional network. We identify such networks with the aid of two measures of statistical significance. One measure aids identification of divergent subgroups based on distinguishing residue patterns. For each subgroup, a second measure identifies structural interactions involving pattern residues. Such interactions are derived either from atomic coordinates or from Direct Coupling Analysis scores, used as surrogates for structural distances. Applying this approach to N-acetyltransferases, P-loop GTPases, RNA helicases, synaptojanin-superfamily phosphatases and nucleases, and thymine/uracil DNA glycosylases yielded results congruent with biochemical understanding of these proteins, and also revealed striking sequence-structural features overlooked by other methods. These and similar analyses can aid the design of drugs targeting allosteric sites.

https://doi.org/10.7554/eLife.29880.001

Introduction

Residues remote from an enzyme’s active site can influence catalytic activity and substrate specificity. It has been proposed that an enzyme generally has multiple conformational states that modulate its function, with residues remote from the active site often shifting the enzyme’s conformational equilibrium to favor interactions associated with specific substrates or reactions (Ramanathan et al., 2014; Bhabha et al., 2015; Whitney et al., 2016; Campbell et al., 2016). Computational methods can help identify such functionally relevant non-active-site residues and their interactions. For example, direct coupling analysis (DCA) (Morcos et al., 2011), which predicts structural contacts from covarying residue pairs in a multiple sequence alignment (MSA), has been used to infer major conformational transitions for Hsp70 chaperones (Malinverni et al., 2015) and to explain the conformational heterogeneity seen in molecular dynamics simulations (Sutto et al., 2015). Statistical Coupling Analysis (SCA) (Lockless and Ranganathan, 1999) seeks to identify structural pathways of ‘energetic connectivity’ by applying principal component analysis to a covariance matrix to identify groups of coevolving residue positions (Halabi et al., 2009). SCA has been used to design proteins (Reynolds et al., 2013) and to predict surface sites (Reynolds et al., 2011) and hydrophobic cavities (Tanwar et al., 2013) involved in allosteric regulation. Here, we investigate residue interaction networks by combining two correlation analysis methods distinct from DCA and SCA (see Figure 7): Bayesian Partitioning with Pattern Selection (BPPS) (Neuwald, 2014a; Neuwald, 2014b), which identifies arbitrarily large correlated residue patterns arising through evolutionary divergence, and Structurally Interacting Pattern Residues’ Inferred Significance (SIPRIS), which we first describe here.

BPPS relies on the observation that protein superfamilies often diverge into subgroups, each adapting the superfamily’s structural core to fill a functional niche. Often a subgroup G diverges further into smaller subgroups, each conserving residues constrained by G’s function, as well as other residues constrained by more specialized functions. Repeated rounds of such divergence have led to hierarchically arranged subgroups, each of which conserves distinctive residues at particular positions. BPPS identifies and characterizes these subgroups by partitioning an MSA into a hierarchically nested series of MSAs, a hiMSA, based on correlated residue patterns that are distinctive of each subgroup and that often include non-active site residues.

For each subgroup of interest, the SIPRIS program takes a BPPS-defined residue pattern as input, as well as structural coordinates for a protein from that subgroup. It then identifies the statistically most significant network of pattern residues embedded within a structurally defined cluster, with a view to suggesting hypotheses for experimental investigation. Such a network is doubly significant inasmuch as BPPS identifies significant residue patterns in the absence of structural data, whereas SIPRIS defines structural clusters in the absence of sequence data. In this way, SIPRIS may statistically validate the output of BPPS or other sequence-based methods. Of course, a set of residues identified by a sequence-based method may still be biologically relevant despite a lack of SIPRIS-assigned significance. However, as we illustrate, BPPS-SIPRIS analyses often elucidate sequence/structural properties that conventional computational and experimental approaches have failed to detect.

Results

SIPRIS takes as input: (1) structural coordinates for a protein of interest; (2) a set of residues defined by BPPS; and, optionally, (3) a predefined cluster of residues, or a starting residue defined either explicitly or as the residue closest to a ‘focal point’ molecule or atom. If the third input is absent, SIPRIS uses each of the BPPS-defined residues as a starting residue in turn and returns the most significant result. Nested clusters are defined around a starting residue in one of three ways: (i) ‘Spherical expansion’, which sequentially adds the residues closest to the starting residue, which thus forms the center of each cluster. (ii) ‘Core expansion’, which sequentially adds the residue closest to a residue within the cluster’s ‘core’. This core is defined as the starting residue R plus all cluster residues whose distance to their k^th closest cluster residue is less than R’s distance to its k^th closest cluster residue (with k = 7 by default; this value was selected empirically to avoid both spherical and tentacle-shaped clusters). In this case, the cluster typically expands less symmetrically. (iii) ‘Hydrogen-bond-network expansion’, which sequentially adds a residue forming the closest sidechain-to-sidechain or sidechain-to-backbone hydrogen bond with a cluster residue. In addition, for spherical or core clustering, SIPRIS may take DCA scores (Marks et al., 2011, 2012) as a surrogate for 3D structural distances. SIPRIS assigns a p-value to the intersection between each cluster and the BPPS-defined residue set.
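The spherical and core expansion modes described above can be sketched in a few lines of Python. This is a minimal illustration, not the SIPRIS implementation: it assumes each residue is reduced to a single representative 3D point (the text does not say which atoms SIPRIS measures between), all function names are hypothetical, and the handling of clusters with fewer than k other members in core expansion is our own assumption.

```python
import math

def spherical_expansion(coords, start):
    """Return residues ordered by distance from `start`.

    Every prefix of the returned list is one nested cluster,
    each centered on the starting residue.
    """
    return sorted(coords, key=lambda r: math.dist(coords[r], coords[start]))

def kth_closest_distance(res, cluster, coords, k):
    """Distance from `res` to its k-th closest other residue in `cluster`.

    Assumption: infinity while the cluster has fewer than k other
    members, so that only the starting residue forms the core.
    """
    d = sorted(math.dist(coords[res], coords[o]) for o in cluster if o != res)
    return d[k - 1] if len(d) >= k else math.inf

def core_expansion(coords, start, k=7):
    """Grow a cluster by adding the outside residue closest to the core."""
    cluster = [start]
    outside = set(coords) - {start}
    while outside:
        ref = kth_closest_distance(start, cluster, coords, k)
        # Core: the start plus members more tightly packed than the start.
        core = [start] + [r for r in cluster[1:]
                          if kth_closest_distance(r, cluster, coords, k) < ref]
        nxt = min(sorted(outside),
                  key=lambda r: min(math.dist(coords[r], coords[c]) for c in core))
        cluster.append(nxt)
        outside.remove(nxt)
    return cluster

def pattern_hits(order, pattern):
    """Number of pattern residues inside each nested cluster (prefix)."""
    hits, counts = 0, []
    for r in order:
        hits += 1 if r in pattern else 0
        counts.append(hits)
    return counts

# Toy example: five residues on a line, pattern residues 2 and 4.
toy = {1: (0, 0, 0), 2: (1, 0, 0), 3: (2, 0, 0), 4: (3, 0, 0), 5: (10, 0, 0)}
order = spherical_expansion(toy, start=3)
counts = pattern_hits(order, pattern={2, 4})
```

Each entry of `counts` pairs with a cluster size (its index plus one), which is the kind of cluster-versus-pattern overlap that a significance measure would then score.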

We applied BPPS-SIPRIS to a GCN5-like N-acetyltransferase (GNAT), several P-loop GTPases, an RNA Superfamily-II helicase, several members of the Synaptojanin/Exonuclease-Endonuclease-Phosphatase (EEP) superfamily, and two uracil/thymine DNA glycosylases. These results are summarized in Table 1. (Go to sipris.igs.umaryland.edu for BPPS output alignments.)

Table 1

Summary of BPPS-SIPRIS results for the most significant cluster in each test case.

https://doi.org/10.7554/eLife.29880.002

Protein	PDB structure	SIPRIS mode^*	Focal point^†	Dist.^‡	Init.^‡	Term.^‡	SIPRIS p-value	Tree level^§	Interpretive comments^#
Gna1	4ag9A	p=BDF	-	22	57	71	8.5 × 10⁻⁷	1	Substrate and homodimeric interfaces
		S	CoA	17	41	87	6.8 × 10⁻⁵	0	CoA-binding subdomain
		S	-	23	56	72	9.3 × 10⁻⁶	1	DCA-based clustering
		S	-	14	21	107	2.5 × 10⁻⁴	1	Structure-based clustering
Rho1	3refB	B	-	20	53	100	8.3 × 10⁻⁵	1	(Active site secondary shell)
		C	-	22	55	98	7.8 × 10⁻⁷	1	“ “ “ “
Rab4	1z0kA	S	-	10	11	153	2.1 × 10⁻⁵	1	(Active site secondary shell)
		C	-	25	91	73	2.6 × 10⁻⁶	1	“ “ “ “
		p=B	-	14	23	141	2.9 × 10⁻⁸	2	Interface with Rabenosyn-5
		S	-	22	42	122	4.8 × 10⁻¹⁰	2	“ “ “ “
Rab8	3qbtA	p=B	-	13	23	139	5.2 × 10⁻⁷	2	Interface with Ocrl1
		p=B	-	12	23	139	6.1 × 10⁻⁶	3	Interface with Ocrl1 helix
	4lhwB	p=A	-	10	14	148	8.7 × 10⁻⁷	2	Homodimeric interface
EF-Tu	1ob5A	S	-	18	33	150	1.4 × 10⁻⁷	1	(GTP to tRNA allosteric link)
		S	-	23	71	112	1.0 × 10⁻⁶	2	(GTP/tRNA allosteric link to β-barrel)
		S	1B	22	81	102	1.3 × 10⁻⁵	1	Cluster around 5’ base 1 of tRNA
		S	2B	18	47	136	2.6 × 10⁻⁶	1	Cluster around 5’ base 2 of tRNA
	1efuA	S	81B	14	49	128	5.2 × 10⁻⁵	1	(Nucleotide exchange allosteric network)
	4zv4A	S	291C	21	66	109	0.0060	1	(Mediates hijacking by Tse6 toxin)
CysN	1zunB	S	-	23	79	118	6.3 × 10⁻⁵	2	(Allosteric link to β-barrel domain)
eIF4AIII	3ex7H	p=J	-	11	18	128	6.4 × 10⁻⁶	1	(ATP to RNA allosteric link)
		S	4J	13	18	128	5.1 × 10⁻⁷	1	Cluster around RNA rotation bond
		S	5J	16	41	105	5.5 × 10⁻⁴	1	“ “ “ “ “
APE1	5dfiA	H	11P	9	13	238	5.2 × 10⁻⁶	0	Abasic site H-bond network
		H	11P	22	99	152	1.6 × 10⁻⁶	1	“ “ “ “
		H	-	25	137	114	1.7 × 10⁻⁶	1	(Active site secondary shell)
		H	9P	25	137	114	1.9 × 10⁻⁷	1	H-bond network positioning abasic site
		H	12P	23	119	132	7.6 × 10⁻⁶	1	“ “ “ “ “
Inpp5b	4cmlA	S	-	24	69	216	5.8 × 10⁻¹³	0	Active site core residues
		S	-	21	77	208	3.9 × 10⁻⁷	1	(Substrate recognition with allosteric link)
		S	-	12	30	255	0.0022	2	(Membrane substrate sequestration)
Inpp5b	3mtcA	S	-	22	91	194	8.0 × 10⁻⁷	1	(Substrate recognition with allosteric link)
		S	-	12	29	256	0.0015	2	(Membrane substrate sequestration)
Inpp5e	2xswA	S	-	25	140	148	3.7 × 10⁻⁷	1	(Substrate recognition with allosteric link)
		S	-	9	13	275	3.6 × 10⁻⁴	2	(Membrane substrate sequestration)
SHIP2	4a9cA	S	-	17	38	260	6.0 × 10⁻⁸	1	(Substrate recognition with allosteric link)
		S	-	4	4	294	0.30	2	(Membrane substrate sequestration)
TDG	5hf7A	H	17D	19	97	76	4.1 × 10⁻⁴	1	H-bond network around excised base
		H	-	20	98	75	3.5 × 10⁻⁵	1	H-bond network around catalytic water
UDG	2dp6A	B	-	13	17	121	1.7 × 10⁻⁵	1	H-bond network distinct from TDG

^*Modes: S, spherical expansion; C, core expansion; H, hydrogen bond expansion (involving sidechain interactions); B, hydrogen bond expansion (also involving backbone-to-backbone interactions); p=, predefined clustering (residues in the cluster are those interacting with the chain(s) whose pdb identifiers are given to the right of the equal sign).

^†Focal points defining starting residue(s): ‘-’, analysis was optimized over multiple starting residues (i.e., no focal point); CoA, cluster initiated from the residue closest to Coenzyme A; others, cluster initiated from the residue closest to the indicated position and chain (e.g., 1B = position 1 in pdb chain B).
^‡Nature of the optimum cluster: dist., the number of distinguishing residues within the cluster (total = 25); init., the total number of residues within the cluster; term., the number of residues outside of the cluster.

^§Codes designate pattern residue class: 0, superfamily; 1, family; 2, subfamily; 3, sub-subfamily. In the figures, these correspond to residues with yellow, red, orange and green sidechains, respectively.
^#Comments in parentheses indicate possible functions.

Distinct N-acetyltransferase cofactor- and substrate-binding subdomains

GNATs catalyze the transfer of a carboxylic acyl group from Coenzyme A (acyl-CoA) to a diversity of substrates. Previously, a BPPS analysis of glucosamine-6-phosphate N-acetyltransferase (Gna1) led to two observations (Neuwald and Altschul, 2016a) (Figure 1): (1) Within the homodimeric structure of Gna1 (pdb: 4ag9) (Dorfmueller et al., 2012), BPPS-defined residues for this family are contributed by both subunits to form the dimeric interface and the active site for each subunit. In contrast, within a single subunit most of these residues are far from the active site and face away from it. Thus, the BPPS analysis implicates family-specific residues in the formation of this unusual substrate-binding pocket between subunits. (2) Residues conserved in the GNAT superfamily cluster within an acyl-CoA binding subdomain distinct from the homodimer/substrate interacting subdomain. This raises the question: How likely is such a structural distribution of these family and superfamily residues to have occurred by chance?

Figure 1

BPPS-SIPRIS analysis of the GNAT superfamily and Gna1-family based on structural coordinates for Gna1 (pdb: 4ag9) (Dorfmueller et al., 2012).

SIPRIS clearly associates Gna1-residues with the substrate and homodimeric interfaces (p=8.5 × 10⁻⁷). Color scheme: homodimer subunits A and B, green and blue backbones, respectively; BPPS-defined Gna1-family residues in subunits A and B, magenta and red sidechains, respectively (glycine residues are shown as C_α atom spheres); GNAT superfamily residues, yellow sidechains; ligands, cyan. Lys116 (shown in light red) is outside of the SIPRIS-defined cluster, but forms a hydrogen bond to a CoA phosphate group. BPPS-SIPRIS spherical clustering identified the GNAT superfamily residues shown (p=1.7 × 10⁻⁵). The following figure supplement and source data are available for Figure 1.

https://doi.org/10.7554/eLife.29880.003

Figure 1—source data 1 Contrast alignments for Gna1 N-acetyltransferase.: https://doi.org/10.7554/eLife.29880.005

SIPRIS returns a p-value of 8.5 × 10⁻⁷ for the intersection between Gna1-family residues and the predefined cluster of 57 residues contacting either the substrate or the other subunit (for residues conserved across GNATs, the corresponding p-value was 0.96). Among the 25 Gna1-family residues defined by BPPS, 22 intersect with the structurally defined cluster. The three remaining residues may perform complementary functions: Gly35 and Gly101 by imparting backbone flexibility and Lys116 by helping properly position CoA via interaction with a CoA phosphate group.
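As a rough sanity check on these counts, one can ask how often 22 or more of the 25 pattern residues would fall inside a 57-residue cluster if they were scattered at random over all 128 residues (57 inside plus 71 outside the cluster, per Table 1). The sketch below computes that hypergeometric upper tail. It is a deliberately simplified stand-in: the statistic that SIPRIS itself uses is not specified in this section, so the result should not be expected to reproduce the reported p-value, and the function name is ours.

```python
from math import comb

def hypergeom_upper_tail(total, pattern, cluster, overlap):
    """P(X >= overlap) when `cluster` residues are drawn without
    replacement from `total`, of which `pattern` are pattern residues."""
    max_overlap = min(pattern, cluster)
    favourable = sum(
        comb(pattern, x) * comb(total - pattern, cluster - x)
        for x in range(overlap, max_overlap + 1)
    )
    return favourable / comb(total, cluster)

# Gna1 predefined cluster (Table 1): 22 of 25 pattern residues lie in a
# cluster of 57 residues, with 71 residues outside the cluster.
p = hypergeom_upper_tail(total=57 + 71, pattern=25, cluster=57, overlap=22)
print(f"simplified hypergeometric tail: {p:.2e}")
```

Under this simplified null model the observed overlap is far above the expected value of roughly 11 residues (25 × 57 / 128), which agrees qualitatively with the strong enrichment reported above.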

SIPRIS returns a p-value of 6.8 × 10⁻⁵ for the intersection between a (spherical) CoA-centered cluster and the set of residues conserved in all GNATs. (The corresponding p-value for Gna1-family residues is >0.99.) Of the 25 residues most distinctive of GNATs, 17 are among the 41 residues of this CoA-centered cluster. Hence, in the absence of explicit structural information, BPPS detects structurally and presumably biologically relevant features: GNAT-residues that map to an acetyl-CoA-binding module and Gna1-family residues that map to a substrate-specific ‘reaction chamber’ facilitating acetylation of glucosamine-6-phosphate.

DCA-based SIPRIS analysis

Spherical clustering using residue-to-residue pseudo-distances based on DCA pairwise scores (instead of actual structural distances) likewise identifies these Gna1 structural features. In fact, the DCA-based p-value for Gna1-family residues (9.3 × 10⁻⁶) was more significant than the corresponding structurally based p-value (2.5 × 10⁻⁴). We suggest two possible reasons for this. First, DCA scores are based on multiple sequences (1200 in this case) and thus implicitly on multiple structures rather than one. Second, DCA scores should be affected by pairwise contacts between homodimeric subunits, whereas SIPRIS currently considers distances only within a single subunit. Thus, DCA- and structurally-based analyses provide somewhat different perspectives.
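To illustrate how coupling scores could stand in for distances in spherical clustering, the sketch below orders residues by decreasing DCA score with the starting residue, treating a higher score as a shorter pseudo-distance. The text does not state how SIPRIS converts scores into pseudo-distances, so this rank-based rule, the input layout, and the function name are all assumptions made for illustration.

```python
def dca_spherical_order(scores, start, residues):
    """Order residues around `start` using DCA scores as pseudo-distances.

    `scores` maps frozenset({i, j}) to a pairwise DCA score.
    Assumption: a higher score means a shorter pseudo-distance, and
    pairs without a score are treated as maximally distant.
    """
    def score_with_start(r):
        return scores.get(frozenset((r, start)), float("-inf"))

    others = sorted(
        (r for r in residues if r != start),
        key=score_with_start,
        reverse=True,
    )
    # Every prefix of this list is one nested, DCA-based cluster.
    return [start] + others

# Toy scores for four alignment positions (values are made up).
toy_scores = {
    frozenset((10, 11)): 0.8,
    frozenset((10, 12)): 0.1,
    frozenset((10, 13)): 0.5,
    frozenset((11, 12)): 0.9,
}
dca_order = dca_spherical_order(toy_scores, start=10, residues=[10, 11, 12, 13])
```

Because only the ranking of scores matters here, any monotonically decreasing transform from score to pseudo-distance would give the same nested clusters.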

Likely determinants of GTPase family and subfamily functional specificity

P-loop GTPases, upon binding to GTP versus GDP, undergo a conformational change in their so-called switch I and switch II regions that depends on the presence of a γ-phosphate group; this acts as a signal to downstream cellular components. We applied SIPRIS to two major subgroups: Rab/Rho/Ras/Ran GTPases (termed R⁴) and translation factor (TF) GTPases (Figure 2A).

Figure 2

BPPS-SIPRIS analysis of R⁴ P-loop GTPases.

Bound guanine nucleotide (shown in cyan) allows orientation of each subfigure relative to the others. (A). BPPS-defined hierarchical relationships among the GTPases examined here. (B). *Entamoeba histolytica* Rho1 GTPase (pdb: 3refB) (Bosch et al., 2011). Color scheme: R⁴-specific residues forming a BPPS-SIPRIS-defined hydrogen-bond network (p=8.3 × 10⁻⁵), red sidechains; residues conserved in P-loop GTPases and interacting with bound guanine nucleotide, yellow sidechains; atoms forming hydrogen bonds, CPK coloring. Modeled hydrogen atoms were generated using the Reduce program (Word et al., 1999). (C). Rab4 bound to GTP and to the Rab-binding domain of Rabenosyn (pdb: 1z0kA) (Eathiraj et al., 2005). BPPS-SIPRIS-defined residues distinctive of R⁴ (red sidechains) and Rab (orange) have core and Rabenosyn-contacting predefined cluster p-values of 2.6 × 10⁻⁶ and 2.9 × 10⁻⁸, respectively. The sensor threonine (Thr40) has substantial van der Waals contact with Glu44; Thr40 is an R⁴-specific (red) residue outside of the SIPRIS-defined cluster. (D). Rab8a in complex with the GTP analog, GNP, and with Ocrl1 (residues 540–678) (pdb: 3qbtA) (Hou et al., 2011). Residues distinctive of Rab GTPases (orange) and of the Rab8 subgroup (green) are enriched at the Ocrl1 interface (p=5.2 × 10⁻⁷ and 6.1 × 10⁻⁶, respectively). (E). Rab8a homodimeric complex (pdb: 4lhwAB) (Guo et al., 2013). Rab-specific residues (orange) are enriched at the homodimeric interface (p=8.7 × 10⁻⁷). The following source data are available for Figure 2.

https://doi.org/10.7554/eLife.29880.006

Figure 2—source data 1 Contrast alignments for Rab8, Rab4 and Rho1 GTPases.: https://doi.org/10.7554/eLife.29880.007

R⁴ GTPases function as on/off switches regulating cellular processes. GTPase activating proteins (GAPs) facilitate hydrolysis of bound GTP (the ‘on’ state) to GDP (the ‘off’ state). Guanine nucleotide exchange factors (GEFs) turn GTPases back on by stimulating replacement of GDP with GTP. SIPRIS identifies a significant network of BPPS-defined R⁴ residues. In Rho1 GTPases, this appears within a hydrogen-bond cluster (p=8.3 × 10⁻⁵; Figure 2B) or within a core cluster (p=7.8 × 10⁻⁷). In Rab GTPases, this network usually appears within a spherical or core cluster (e.g., Figure 2C) and only rarely within a hydrogen-bond cluster (e.g. Rab9, pdb:1s8f [Wittmann and Rudolph, 2004]; p=9.0 × 10⁻⁴). We postulate that a significant hydrogen-bond network forms only in certain conformations. These R⁴ sequence/structural configurations correspond to features identified through previous analyses, including: (i) Several aromatic-CH-π interactions proposed to stabilize β-strands (Merkel and Regan, 1998) associated with the P-loop and with the guanine-binding loop, and to facilitate guanine nucleotide exchange (Neuwald, 2009a) (Phe99-Gly131 and Trp114-Gly27 in Figure 2B). (ii) A salt bridge also associated with the guanine-binding loop (Arg137-Glu163 in Figure 2B). (iii) Residues forming a switch II ‘charge dipole pocket’ proposed to facilitate conformational changes associated with the switching mechanism (Neuwald, 2009b). (iv) Glutamine and glutamate residues proposed to function in GTP hydrolysis (Vetter and Wittinghofer, 2001) and nucleotide exchange (Gasper et al., 2008), respectively. We propose that, together, these residues, which adjoin the GTP-binding site from the guanine-binding loop to the γ-phosphate interacting switch II region, constitute in large part the R⁴ switching mechanism.

SIPRIS identifies a network of residues distinctive of the Rab subfamily of R⁴ GTPases within a spherical cluster in the switch I and II regions (p=4.8 × 10⁻¹⁰ for Rab4). Rab subfamily residues also intersect with those residues contacting Rab-binding domains, with high significance based on predefined clustering: p=2.9 × 10⁻⁸ for Rab4-Rabenosyn-5 (Figure 2C) (Eathiraj et al., 2005) and p=5.2 × 10⁻⁷ for Rab8a-Ocrl1 (Figure 2D) (Hou et al., 2011). This occurs despite the Rabenosyn and Ocrl1 domains being structurally distinct. Rab subfamily residues are similarly enriched at the Rab8a homodimeric interface (p=8.7 × 10⁻⁷) (Figure 2E) (Guo et al., 2013), supporting the notion that these residues can interact with diverse structural folds. For the Rab4 structure in Figure 2C, Thr40, another R⁴-specific residue, albeit one outside of the SIPRIS-defined cluster, corresponds to the switch I residue that senses the γ-phosphate of GTP. This residue establishes its greatest contact area (45 Å²) with Glu44, one of the Rab-specific residues contacting Rabenosyn-5; thus Thr40 and Glu44 may link sensing of the γ-phosphate to substrate binding. For Rab8a, both Rab- and Rab8-specific residues appear to mediate binding to the Ocrl1 domain (Figure 2D); in all, 19 of the 23 Rab8-Ocrl1 interface residues are distinctive of either the Rab subfamily or the Rab8 sub-subfamily. Many of the Rab8-residues interact with an N-terminal helix extending out of the Ocrl1 β-sandwich domain, perhaps thereby compensating for the lack of binding specificity of Rab-subfamily residues.

BPPS grouped translation factor (TF) GTPases into a single family (Figure 2A), which includes initiation factors (e.g. IF2 and eIF5B), sulfate adenylyltransferases (CysN), ribosome-releasing factor 2, peptide chain release factor 3, elongation factors EF-Tu, EF1α and selenocysteine-specific elongation factor, EF4, aEF2, and EF-G (Leipe et al., 2002). Within Thermus aquaticus EF-Tu complexed with a GTP analog, Phe-tRNA, and the antibiotic enacyloxin IIA (Parmeggiani et al., 2006), TF-specific residues (Figure 3A) spherically cluster around the switch I and II and P-loop regions (p=1.4 × 10⁻⁷); this differs from the R⁴-residue arrangement in Figure 2B. The two 5’-terminal tRNA nucleotide bases, which base-pair with the 3’ strand to which the aminoacyl group is attached, establish the greatest contact with the EF-Tu GTPase domain among all the bases of the tRNA. TF-specific residues cluster around these 5’ bases (p=1.3 × 10⁻⁵ and 2.6 × 10⁻⁶, respectively) and link the 5’ region of aa-tRNA to the GTP γ-phosphate; this cluster includes Thr62, which senses γ-phosphate. We hypothesize that, upon correct tRNA-anticodon pairing with its mRNA codon, these TF residues assist in coupling GTP hydrolysis to coordinated conformational changes that dissociate EF-Tu from the ribosome and from tRNA, which can then fully enter the ribosomal A site.

Figure 3

BPPS-SIPRIS analysis of translation-associated P-loop NTPases.

(A). *Thermus aquaticus* EF-Tu complexed with the antibiotic enacyloxin IIA, a GTP analog, and Phe-tRNA (pdb: 1ob5) (Parmeggiani et al., 2006). Color scheme: BPPS-SIPRIS defined GTPase-, TF- and EF-Tu/CysN-specific residues, yellow, red, and orange sidechains, respectively; GTPase domain backbone, green; C-terminal β-barrel domains, gray; Phe-tRNA, teal; 5’ end nucleotide bases, light cyan; guanine nucleotide, cyan; enacyloxin IIA, greenish-cyan. Spheres indicate glycine C_α atoms. (B). BPPS-SIPRIS cluster of EF-Tu TF-residues centered on EF-Ts Phe81 at the EF-Tu/EF-Ts interface (pdb: 1efu) (Kawashima et al., 1996). Regions in EF-Ts conserved between *E. coli* and cow are shown in cyan both in the figure and in the corresponding alignment below it. (C). *P. aeruginosa* EF-Tu bound to the Tse6 toxin domain (pdb: 4zv4) (Whitney et al., 2015). EF-Tu His20, which corresponds to His19 in (B), appears to form a salt bridge with Glu291 of Tse6. In light pink are regions of Tse6 contacting EF-Tu. Spherically clustered residues (p=0.0060) centered on Glu291 of Tse6 are shown with red sidechains. (D). Spherically clustered EF-Tu/CysN residues (orange; p=6.3 × 10⁻⁵) within the CysND complex (pdb: 1zun) (Mougous et al., 2006). (E). Spherically clustered EF-Tu/CysN-residues in EF-Tu (pdb: 1ob5) (p=1.0 × 10⁻⁶). (F). Human eIF4AIII bound to RNA, ADP, and the γ-phosphate transition state mimic AlF₃ (pdb: 3ex7) (Nielsen et al., 2009). Color scheme: eIF4AIII N- and C-terminal domains, violet and green, respectively; RNA and ADP, cyan; AlF₃, light cyan; superfamily-conserved catalytic residues, yellow sidechains; RNA helicase-specific residues clustered on (light cyan-colored) RNA bases 4–5, red; other RNA helicase-specific residues, light red; C-terminal catalytic residues, bright green. The following source data are available for Figure 3.

https://doi.org/10.7554/eLife.29880.008

Figure 3—source data 1 Contrast alignments for EF-Tu GTPase and eIF4AIII RNA helicase.: https://doi.org/10.7554/eLife.29880.009

TF-specific residues may also be important for guanine nucleotide exchange mediated by EF-Ts. Within the structure of EF-Tu bound to EF-Ts (pdb: 1efu) (Kawashima et al., 1996), 14 TF-residues form a (spherical) cluster (p=5.2 × 10⁻⁵; Figure 3B) centered on Phe81 of EF-Ts, the residue with the greatest area of contact with EF-Tu. These TF-residues, which include His19, His84, and Gln114 of EF-Tu, adjoin two regions of EF-Ts contacting EF-Tu and are conserved across bacteria and eukaryotes (Figure 3B). His19, which is located in the P-loop of EF-Tu, is the residue that is most characteristic of these translation factors. Both His19 and Gln114 have been implicated in nucleotide exchange (Zhang et al., 1998), and in destabilization of Mg²⁺ coordination (leading to guanine nucleotide release) upon intrusion of EF-Ts Phe81 near His84 of EF-Tu (Schümmer et al., 2007). Given recent evidence for an EF-Tu/Ts·GTP·aa-tRNA quaternary complex (Burnett et al., 2014), we conjecture that TF-residues may help couple GTP-hydrolysis-mediated loading of aa-tRNA onto the ribosome with nucleotide exchange by EF-Ts. P. aeruginosa Tse6 toxin (Whitney et al., 2015) appears to have hijacked this TF interaction interface with EF-Ts (Figure 3C).

BPPS partitions EF-Tu and CysN into a common subfamily within the TF family, consistent with earlier analysis supporting their specific relationship (Leipe et al., 2002; Inagaki et al., 2002). CysN, together with the catalytic CysD subunit, forms a sulfate adenylyltransferase complex involved in sulfur assimilation. The CysND-catalyzed reaction is analogous to the first step in charging a tRNA, and CysN’s contact sites with CysD are similar to, and include residues homologous to, EF-Tu’s contact sites with aa-tRNA. Within the CysND complex (pdb: 1zun) (Mougous et al., 2006), EF-Tu/CysN-residues cluster around the switch I and II regions (p=6.3 × 10⁻⁵; Figure 3D). In CysN, these residues adjoin contact regions with CysD and with the CysN C-terminal linker and β-barrel domains. Analogously in EF-Tu, they are proximal to the contact region with aa-tRNA and the EF-Tu C-terminal linker and β-barrel domains (Figure 3E). Within EF-Tu these residues are also located between the bound antibiotic enacyloxin IIA and the GTPase- and TF-specific residues (Figure 3A). Because enacyloxin IIA hinders the release of EF-Tu-GDP from the ribosome (Parmeggiani et al., 2006), we hypothesize that these residues may help mediate this process.

Comparison of two P-loop NTPase superfamilies: eIF4AIII RNA helicase

For comparison, we analyzed another nucleic-acid-associated P-loop NTPase, the Superfamily II RNA helicase eIF4AIII, which is a component of the exon junction complex (EJC). The EJC deposits onto spliced mRNAs and plays an important role in mRNA transport, translation, and quality control. RNA helicases are part of a huge group of NTPases that undergo ATP-hydrolysis-coupled conformational changes to unwind double-stranded nucleic acids, translocate nucleic acids or re-distribute protein complexes on nucleic acids (Anantharaman et al., 2002; Bourgeois et al., 2016; Lohman et al., 2008; Northall et al., 2016). For the transition state structure of eIF4AIII bound to RNA, a predefined cluster of RNA helicase-specific residues contacting RNA is highly significant (p=6.4 × 10⁻⁶; Figure 3F). Focal point spherical clustering indicates that these residues are centered on RNA bases 4 and 5 (p=5.1 × 10⁻⁷ and p=5.5 × 10⁻⁴, respectively), which establish the greatest contact with the ATPase domain. These observations and a rotated bond between bases 4 and 5 suggest that these residues help couple ATP hydrolysis to disruption of duplex RNA. Clusters centered on other bases are not significant (p>0.9). Most of the remaining RNA helicase-specific residues surround key active site residues or interact with C-terminal domain catalytic residues, including two arginine fingers (Figure 3F). Given this configuration, ATP hydrolysis seems likely to shift the relative orientations of the N- and C-terminal domains, both of which interact with RNA.

Residue networks adapting the EEP catalytic core to diverse substrates

EEP enzymes cleave phosphodiester bonds in substrates that include nucleic acids and phospholipids. To identify residues likely responsible for EEP functional divergence, we applied BPPS-SIPRIS to APE1, an exonuclease III-like apurinic/apyrimidinic endonuclease (exoIII-AP-endo), and several inositol polyphosphate 5-phosphatases (INPP5) (Figure 4A).

Figure 4

BPPS-SIPRIS analysis of synaptojanin/EEP domains.

(A). The two major groups of the BPPS-defined EEP hierarchy examined here. (B). Human APE1 phosphorothioate substrate complex (pdb: 5dfi) (Freudenthal et al., 2015). Replacement of the phosphodiester bond with phosphorothioate prohibits cleavage by APE1 at the abasic site (circled). Cys310, which is nitrosated, is indicated. Color scheme: APE1 backbone trace, green; DNA strand containing the abasic site, cyan; complementary strand, marine blue; the BPPS-SIPRIS-defined residues distinctive of the EEP superfamily and of the exoIII-AP-endo family, yellow and red sidechains, respectively; basic residues within a loop interacting with the major groove of DNA, purple. (C). Close up of the APE1 active site. EEP-specific residues forming a hydrogen-bond network are shown with yellow sidechains. For clarity, only a few of the EEP- and exoIII-AP-endo-specific residues in the network are shown. The following source data are available for Figure 4.

https://doi.org/10.7554/eLife.29880.010

Figure 4—source data 1 Contrast alignments for APE1 endonuclease.: https://doi.org/10.7554/eLife.29880.011

APE1 participates in the DNA excision repair pathway by incising the apurinic/apyrimidinic (AP) site phosphodiester backbone; this generates a single nucleotide DNA gap with 3’-hydroxyl and 5’-deoxyribose phosphate termini—a cytotoxic intermediate substrate that is then processed by DNA polymerase β (Liu et al., 2007). A proposed mechanism for APE1 (Mol et al., 2000) involves superfamily-conserved active site residues forming hydrogen bonds with the oxygen atoms of the phosphate group at the abasic site. Consistent with this, SIPRIS identifies a superfamily-conserved hydrogen-bond network centered on the abasic site (p=5.2 × 10⁻⁶) within a structure of APE1 bound to DNA harboring an abasic site phosphate group analog (phosphorothioate) in one strand (Figure 4B,C). Centering on adjacent bases in the same strand was less significant (p>0.003). For exoIII-AP-endo-conserved residues SIPRIS identifies a significant hydrogen-bond network centered on the abasic site (p=1.6 × 10⁻⁶) or on adjacent bases 8–9 and 12–13 (p=1.9 × 10⁻⁷ to 7.6 × 10⁻⁶); these residues may contextually position catalytic residues around the abasic site. In particular, regions associated with these residues insert into the DNA major and minor grooves on either side of the abasic site, and form a kink in and engulf the target DNA strand (Figure 4B). Thus, exoIII-AP-endo residues appear to form a substrate-specific ‘reaction chamber’, as might be expected. They also tend to aggregate between the catalytic core and a loop containing basic residues that interact with the major groove of DNA (Figure 4B). Modification by nitric oxide (nitrosation) of one of these residues, Cys310, results in dissociation of APE1 from DNA and relocation to the cytoplasm (Qu et al., 2007); thus, the associated hydrogen-bond network may communicate the nitrosation signal to the DNA-binding site.

BPPS-SIPRIS-defined INPP5-residues also form a significant hydrogen bond network (p=1.1 × 10⁻⁷) adjacent to the superfamily-conserved cluster (Figure 5A,B). We hypothesize that this network recognizes inositol polyphosphates harboring phosphate groups at positions 4 and 5 of the inositol ring. INPP5 phosphatases cleave the 5-phosphate, but require for recognition the 4-phosphate, which directly interacts with three network-associated basic residues—perhaps thereby mediating substrate recognition (Figure 5C). In some structures, the INPP5 network residues most remote from the catalytic core are part of a cleft accommodating a phosphate or a glycerol (Figure 5D,E), suggesting that these may form another (unknown) membrane interaction site or an allosteric site that binds a molecule similar to the known substrate.

Figure 5

BPPS-SIPRIS analysis of synaptojanin/EEP domains within INPP5 proteins.

Color code: EEP-residues, yellow sidechains; INPP5 residues, red sidechains; INPP5B-, INPP5E- and SHIP2-subfamily residues, orange sidechains; ligands, cyan; atoms involved in hydrogen bonds, CPK coloring. (A). Human INPP5B in complex with phosphatidylinositol 3,4-bisphosphate (pdb: 4cml) (Trésaugues et al., 2014), which is associated with cytosolic and mitochondrial membranes (Speed et al., 1995). BPPS-SIPRIS results: EEP spherical cluster, p=5.8 × 10⁻¹³; INPP5 spherical cluster, p=3.9 × 10⁻⁷; INPP5B spherical cluster, p=0.0021. (B). INPP5 hydrogen bond network within human INPP5B (pdb: 3mtc) (unpublished). (C). View of INPP5-residues (in 3mtc) that bind the 4-phosphate group required for substrate recognition. (D). Human INPP5B with phosphate bound to a possible membrane interaction or allosteric site (Mills et al., 2016). (E). Human INPP5B Ocrl with glycerol bound to the same site as indicated in (D) (Trésaugues et al., 2014). (F). INPP5 subgroups within the BPPS-defined hierarchy. (G). Human INPP5E (pdb: 2xsw) (unpublished), which is associated with the primary cilium, an organelle involved in signal transduction (Jacoby et al., 2009) (spherical cluster, p=3.6 × 10⁻⁴). (H). Human SHIP2 (pdb: 4a9c) (Mills et al., 2012), which is associated with membrane ruffle formation (Hasegawa et al., 2011) (spherical cluster, p=0.30). The following source data are available for Figure 5.

https://doi.org/10.7554/eLife.29880.012

Figure 5—source data 1 Contrast alignments for INPP5 phosphatases.: https://doi.org/10.7554/eLife.29880.013

INPP5 proteins regulate diverse cellular processes, including postsynaptic vesicular trafficking, insulin signaling, cell growth and survival, and endocytosis. With this in mind, we examined three INPP5 subfamilies: INPP5B, INPP5E and SHIP2 (Figure 5F). Residues that most distinguish the INPP5B subfamily form a cluster between the proposed membrane interacting region (Trésaugues et al., 2014) and the EEP catalytic core (Figure 5A). INPP5E- and SHIP2-specific residues also cluster in this same region (Figure 5G,H)—although the SHIP2 cluster is not statistically significant. This suggests a possible role for these residues in sequestering specific membrane-associated phosphoinositide substrates from the lipid bilayer.
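As a rough illustration of what a ‘spherical cluster’ p-value such as those in Figure 5 measures, the sketch below asks whether pattern residues are over-represented among the residues lying within a given radius of a chosen center. It is a minimal sketch under stated assumptions, not the SIPRIS statistic itself: it substitutes a plain hypergeometric tail probability, omits the multiple-testing adjustment that any scan over centers and radii would require, and uses hypothetical function names and toy coordinates.

```python
import math

def hypergeom_tail(n_total, n_pattern, n_drawn, n_hits):
    """P(X >= n_hits) when n_drawn of n_total residues are drawn without
    replacement and n_pattern of the n_total are pattern residues."""
    upper = min(n_pattern, n_drawn)
    favorable = sum(
        math.comb(n_pattern, i) * math.comb(n_total - n_pattern, n_drawn - i)
        for i in range(n_hits, upper + 1)
    )
    return favorable / math.comb(n_total, n_drawn)

def sphere_enrichment(coords, pattern, center, radius):
    """coords: {residue_id: (x, y, z)}; pattern: set of residue ids.
    Returns the residues inside the sphere, the number of pattern residues
    among them, and a hypergeometric enrichment p-value (an assumption,
    standing in for the SIPRIS significance measure)."""
    inside = {res for res, xyz in coords.items() if math.dist(xyz, center) <= radius}
    hits = len(inside & pattern)
    p = hypergeom_tail(len(coords), len(pattern & set(coords)), len(inside), hits)
    return inside, hits, p

# Toy data (hypothetical): ten residues spaced 1 Å apart along a line,
# with the three pattern residues adjacent to one another.
coords = {i: (float(i), 0.0, 0.0) for i in range(10)}
pattern = {0, 1, 2}
inside, hits, p = sphere_enrichment(coords, pattern, center=(1.0, 0.0, 0.0), radius=1.5)
print(sorted(inside), hits, p)
```

In this toy case all three residues inside the sphere are pattern residues, so the tail probability is 1/C(10,3) = 1/120. A real analysis would use representative atom coordinates from a structure file and would need to account for the many spheres examined.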

Family-specific catalysis: thymine DNA glycosylases

Uracil DNA glycosylases (UDGs) remove uracil from DNA, thereby initiating the DNA base excision repair pathway (Aravind and Koonin, 2000). Uracil may be incorporated into DNA by DNA polymerase or may arise through cytosine deamination. Thymine DNA glycosylases (TDGs) initiate base excision repair by removing T from G·T mispairs, which can result from deamination of 5-methylcytosine (5mC). These enzymes also remove oxidized derivatives of 5mC such as 5-formylcytosine and 5-carboxylcytosine, which are epigenetic marks or intermediates in the reset of 5mC marks by the TET enzymes (Pastor et al., 2013). Within the structure of human TDG (Pidugu et al., 2016), BPPS-SIPRIS identifies a significant hydrogen-bond network associated with TDG-family residues (Figure 6A,B); this network also includes residues that BPPS assigns to a metazoan TDG subfamily. As in APE1, network residues appear to position loops containing basic residues that, in this case, interact with both the major and minor grooves of bound DNA (Figure 6C). Network residues also form hydrogen bonds to DNA oxygen atoms on either side of the thymine base being excised, suggesting that they may help position the substrate for catalysis by sensing particular sequence contexts (Figure 6B). Near the center of this network and in contact with the targeted thymine base is the residue most distinctive of metazoan TDGs, Asn230 (Figure 6B and Figure 6—source data 1); in other TDG subfamilies, a hydrophobic residue occurs at this position. Other TDG-residues in this network encase a water molecule believed to function as a nucleophile in catalysis (Pidugu et al., 2016) (Figure 6D). Hence, for TDG, family-specific residues may play a critical catalytic role. UDG harbors a hydrogen bond network distinct from that of TDG (Figure 6E), indicating a mechanistic divergence.

Figure 6

BPPS-SIPRIS analysis of DNA glycosylases.

(A). Thymine DNA glycosylase (TDG) family (red sidechains) and metazoan subfamily (orange sidechains) residues forming a significant hydrogen bond network (p=3.5 × 10⁻⁵) within human TDG (pdb: 5hf7) (Pidugu et al., 2016). (B). TDG H-bond network consisting of residues distinctive both of all TDGs (red sidechains) and of metazoan TDGs (orange sidechains). This network includes hydrogen bonds to DNA oxygen atoms on either side of the thymine base to be excised (cyan); note that Phe238 and Tyr235 appear to position the N-terminus of their helix to hydrogen bond to substrate backbone oxygens; another such hydrogen bond involves Ser273, a residue generally conserved in the entire superfamily. The water molecule shown may act as the nucleophile in the reaction. For clarity, not all of the BPPS-SIPRIS-defined residues are shown. (C). TDG hydrogen-bond network residues may help position basic residues (green sidechains) interacting with the minor and major grooves of DNA. (D). TDG family-specific hydrogen-bond network residues surrounding a proposed catalytic water molecule (red sphere with dot cloud). (E). A BPPS-SIPRIS-defined H-bond network (p=1.7 × 10⁻⁵) distinct from that of TDG within *Thermus thermophilus* uracil DNA glycosylase (UDG) (pdb: 2dp6). The following source data are available for Figure 6.

https://doi.org/10.7554/eLife.29880.014

Figure 6—source data 1 Contrast alignments for DNA glycosylases.: https://doi.org/10.7554/eLife.29880.015

Applying SIPRIS with other methods

Applying SIPRIS in conjunction with various protein function determining residue (FDR) methods (Casari et al., 1995; Ye et al., 2008; Pirovano et al., 2006; Kalinina et al., 2004; Hannenhalli and Russell, 2000; Livingstone and Barton, 1996; Mihalek et al., 2004; Mirny and Gelfand, 2002; Lichtarge et al., 1996; Sankararaman and Sjölander, 2008; Fischer et al., 2008; Kalinina et al., 2009; Janda et al., 2012; Janda et al., 2014; Marttinen et al., 2006; Kolesov and Mirny, 2009; Wilkins et al., 2012; Chakraborty and Chakrabarti, 2015; Gaucher et al., 2002; Xin and Radivojac, 2011; Capra and Singh, 2008) is straightforward in principle. However, several factors complicate comparisons to BPPS-SIPRIS. First, a fair number of published FDR methods are no longer available as source code, as executables, or as web servers (e.g. INTREPID [Sankararaman and Sjölander, 2008] and MINER [La and Livesay, 2005]). Second, many FDR methods (e.g. GroupSim [Capra and Singh, 2008]) require user-provided input, such as an MSA, a phylogenetic tree, or prespecified categories with corresponding sequence assignments for each category. This confounds the comparison because the contribution of each user-provided component to overall performance is unclear. In contrast, BPPS-SIPRIS requires no input beyond the query and database sequences, and its algorithmic components are statistically coherent. Third, those FDR methods that do not require user-generated input are typically based on a phylogenetic tree; this makes them infeasible to apply to large sequence sets, and the use of such sets is key to SIPRIS’s ability to detect biologically relevant features. Our attempts to input even moderately large sequence sets to various FDR programs resulted in runtime errors. By focusing on a hierarchy of subgroups, each defined by a correlated residue pattern, BPPS eliminates the need for a phylogenetic tree, which would introduce more complexity than is necessary or can be reliably inferred.
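To make concrete what it means for a subgroup to be ‘defined by a correlated residue pattern’ rather than by a branch of a tree, the following minimal sketch scores an aligned sequence against several patterns and reports the best match. It is only an illustration under stated assumptions: BPPS itself infers both the patterns and the partition by Bayesian sampling, which is not reproduced here, and the function names, the 0.75 cutoff, and the toy patterns are all hypothetical.

```python
def match_fraction(aligned_seq, pattern):
    """Fraction of pattern positions at which the sequence carries an allowed
    residue. pattern: {alignment_column: set of allowed residue letters}."""
    matched = sum(1 for col, allowed in pattern.items() if aligned_seq[col] in allowed)
    return matched / len(pattern)

def best_subgroup(aligned_seq, patterns, min_fraction=0.75):
    """Return (subgroup name or None, best match fraction).
    min_fraction is an arbitrary illustrative cutoff, not a BPPS parameter."""
    name, frac = max(
        ((n, match_fraction(aligned_seq, p)) for n, p in patterns.items()),
        key=lambda item: item[1],
    )
    return (name if frac >= min_fraction else None), frac

# Toy patterns over an 8-column alignment (hypothetical).
patterns = {
    "subgroup_A": {1: {"K", "R"}, 3: {"D"}, 5: {"N"}, 6: {"F", "Y"}},
    "subgroup_B": {1: {"E"}, 3: {"G"}, 5: {"S", "T"}, 6: {"W"}},
}
result_a = best_subgroup("AKLDPNFG", patterns)
result_b = best_subgroup("AELGPQWA", patterns)
result_none = best_subgroup("AQLAPQAG", patterns)
print(result_a, result_b, result_none)
```

Because each sequence is scored against patterns independently of every other sequence, the cost grows linearly with the number of sequences, which hints at why a pattern-based hierarchy scales to sequence sets that are impractical for tree inference.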

Finally, BPPS-SIPRIS aims to identify biologically relevant interaction networks whose functions are not necessarily known, whereas FDR methods generally try to identify residues responsible for well-characterized functions (such as catalysis or substrate recognition) that can be experimentally benchmarked (Chakrabarti and Panchenko, 2009). However, as has been noted (Dessimoz et al., 2013; Jiang et al., 2014), we lack reliable gold standards for many functionally relevant residues, due to a lack of experimental characterization. Consequently, methods designed to identify residues with specific, known functions, if successful, will tend to penalize residues involved in unknown functions. In contrast, the goal of BPPS-SIPRIS is to recognize such residues of unknown function as well.

With this in mind, we compared the BPPS-SIPRIS analyses in this study to SIPRIS analyses based on the FRpred (Fischer et al., 2008), CLIPS-1D (Janda et al., 2012), and Evolutionary Trace (ET) (Lichtarge et al., 1996; Wilkins et al., 2012) programs, which define residue sets given only a query sequence. These and similar methods differ from BPPS in that they do not classify sequences into divergent subgroups per se. Instead, FRpred seeks to classify residues as catalytic, ligand-binding, or subtype-specific. FRpred catalytic and ligand-binding residues generally correspond to superfamily-conserved residues, whereas FRpred subtype-specific residues fail to correspond to any BPPS subgroups. For example, when we ran the Rab4 analysis as in Figure 2C using FRpred-defined residue sets instead of BPPS-defined sets, the first two FRpred categories overlapped almost entirely with each other and with the Rab4 structural core; the subtype-specific category failed to return a significant cluster (p>0.05). SIPRIS analyses of other protein domains yielded similar results. CLIPS-1D defines catalytic, ligand-binding and structural categories, which likewise fail to correspond to BPPS subgroups. ET assigns residue functional importance scores without splitting residues into categories, and thus fails to differentiate between BPPS subgroups. As previously noted (Madabushi et al., 2002), high ET-scoring residues are often clustered structurally, which SIPRIS analyses confirm. Due to methodological differences, however, BPPS-SIPRIS clustering identifies sequence/structural features distinct from those identified by these other methods, as illustrated in Figure 1—source data 1. Although other methods may identify biologically relevant residues different from those identified here, this study suggests that by characterizing divergent subgroups, BPPS-SIPRIS analyses can identify significant, otherwise overlooked sequence/structural properties.
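Statements such as ‘nearly entirely overlapped’ can be quantified with simple set statistics. The sketch below is a minimal, hypothetical illustration of how two residue sets returned by different methods might be compared; the residue numbers are invented for the example and are not taken from the Rab4 analysis.

```python
def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|): equals 1.0 when one set contains the other."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|: equals 1.0 only when the two sets are identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical residue-number sets, for illustration only.
catalytic = {15, 17, 21, 63, 67}
ligand_binding = {15, 17, 21, 63, 120, 121}
subtype_specific = {40, 44, 88}

oc = overlap_coefficient(catalytic, ligand_binding)
jc = jaccard(catalytic, ligand_binding)
oc_subtype = overlap_coefficient(catalytic, subtype_specific)
print(oc, jc, oc_subtype)
```

The overlap coefficient is the more forgiving of the two when the sets differ in size, so reporting both gives a fuller picture of how redundant two categories are.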

Discussion

Active site residues directly involved in catalysis are believed often to communicate with a network of other functionally important residues, some of which may be far from the active site (Sunden et al., 2015). The problem of identifying these networks is fundamental for understanding how proteins work. As illustrated here, BPPS-SIPRIS analyses can reveal information relevant to functional specialization by identifying statistically significant interaction networks. These include, for example: (1) The nitrosation-associated network in APE1 of the synaptojanin (EEP) superfamily. (2) The protein-protein interaction interfaces for diverse R⁴ GTPases. (3) The protein-protein interaction interface in EF-Tu, which can be hijacked by the P. aeruginosa Tse6 toxin (Whitney et al., 2015). In each of these cases, the residue networks identified by our analysis suggest features congruent with current biochemical understanding of these proteins. Additionally, our analyses generated the following hypotheses: (1) Family-specific residues form hydrogen bonds (Figure 4C) responsible for APE1 abasic site substrate specificity. (2) INPP5 family-specific and subfamily-specific residues (Figure 5E–F) mediate, respectively, allosteric regulation and sequestration of specific membrane-associated phosphoinositide substrates from the lipid bilayer. (3) A hydrogen bond network associated with the residue most distinctive of metazoan TDGs, Asn230 in humans, mediates substrate-specific catalysis in DNA glycosylases, perhaps related to the discrimination of epigenetic marks present in metazoan DNA (Pastor et al., 2013; Zhang et al., 2012), such as 5-fC and 5-caC.

More generally, our analyses suggest: (1) Family-specific residues often form a substrate-specific ‘reaction chamber’ associated with the structural core and active site, as seen for Gna1-related acetyltransferases, phosphoesterases related to APE1, and DNA glycosylases. (2) Subfamily-specific residues serve subordinate roles, such as mediating interactions with effector proteins, or coupling conformational changes to signaling. In this way, the same basic structural core and catalytic mechanism may accommodate a wide variety of cellular functions.

The SIPRIS clustering strategies described here accommodate further development. For example, one might use consensus distances from multiple structures to reduce noise. An open question is the significance of multiple BPPS-SIPRIS networks for a single subgroup, analogous to that for multiple regions of similarity between two sequences (Karlin and Altschul, 1993). Additional strategies include: applying BPPS-SIPRIS to functionally interacting proteins, treating them as a single sequence; and defining clusters using features such as secondary structure, surface accessibility or electrostatic potential. BPPS identifies correlated residue patterns presumably associated with functional specialization, and SIPRIS identifies correlations between defined residue sets and structural features. In contrast, DCA identifies correlations between pairs of residues that presumably interact structurally. Combining BPPS-SIPRIS with DCA may improve protein modeling and the characterization of functional interactions. Given the statistical and information theoretic foundation of these methods, one should be able to combine them in a principled manner.
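One way the suggestion of consensus distances from multiple structures might be realized is sketched below: inter-residue distances are computed separately in each structure and the per-pair median is taken, so that a single outlying conformation does not dominate. This is a sketch under assumptions, not a described SIPRIS feature; the choice of the median, the `min_support` threshold, and all names and coordinates are hypothetical, and residues are assumed to be already mapped to shared alignment positions.

```python
import math
from itertools import combinations
from statistics import median

def pairwise_distances(coords):
    """coords: {aligned_position: (x, y, z)} for one structure."""
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(sorted(coords), 2)
    }

def consensus_distances(structures, min_support=2):
    """Median distance per residue pair over the structures in which both
    residues are resolved; pairs seen in fewer than min_support structures
    are dropped."""
    pooled = {}
    for coords in structures:
        for pair, d in pairwise_distances(coords).items():
            pooled.setdefault(pair, []).append(d)
    return {pair: median(ds) for pair, ds in pooled.items() if len(ds) >= min_support}

# Toy structures (hypothetical); the third lacks position 3 and places
# position 2 unusually far from position 1.
structures = [
    {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (0.0, 4.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0), 3: (0.0, 6.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0)},
]
consensus = consensus_distances(structures)
strict = consensus_distances(structures, min_support=3)
print(consensus, strict)
```

For the pair (1, 2), the three observed distances are 3, 4 and 10 Å, and the median of 4 Å is unaffected by the outlier, whereas a mean would be pulled to about 5.7 Å.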

In summary, the BPPS-SIPRIS system should aid the characterization of functionally interacting residues remote from protein active sites.

Superfamily	structures*		RMSD^† (Å)				Domain length^‡			Resolution (Å)
	% ID	No.	Avg	Min	Max	S.D.	MSA	Avg	S.D.	Avg	Max
GNAT	27	16	3.25	1.0	6.7	1.4	125	139.8	17.0	1.94	2.61
GTPases	30	20	3.96	0.6	14.7	3.5	164	195.9	41.6	2.31	3.10
Helicases	40	12	6.39	2.6	9.8	1.8	466	482.8	60.7	2.86	3.56
EEP	40	16	3.02	0.8	5.2	0.95	241	259.0	27.6	2.07	2.99
UDG/TDG	40	8	2.54	1.1	3.6	0.69	125	135.9	12.7	1.83	2.58

Superfamily	Subgroup	# Sequences	% Identity^*	# Nodes in subtree	Minimum subtree size
GNAT		237,359	98	44	200
	Gna1 family	1243		1
GTPases		127,418	95	121	500
	R⁴ family	18,901		26
	Rab subfamily	7002		12
	Rab8 sub-subfamily	3312		7
	TF family	25,224		10
	EFTu/CysN subfamily	4429		3
Helicases		131,321	98	47	300
	RNA helicases	36,788		8
EEP		45,799	99	166	100
	exoIII-AP-endo	13,711		47
	INPP5	3855		14
TDG/UDG		23,592	98	47	100
	TDG	1639		6
	UDG	376		1

Superfamily	# subgroups		BPPS	Annotated by SFLD			BSG^‡	BPPS conflicts^§			Maximum
	SFLD	BPPS	min.^*	No	Yes	expt^†		Error	?	Correct	% errors^#
radical SAM	49	17	800	52,608	17,680	12	13,676	10	6	326	0.12
glutathione transferase	26	15	100	6921	3633	0	1945	0	0	0	0
peroxiredoxin	6	11	100	3870	5521	0	5255	0	1	0	0.02
haloacid dehalogenase	24	28	200	21,768	33,379	9	26,589	35	66	27	0.38
isoprenoid synthase I	9	7	200	9666	1604	55	1536	0	0	0	0
isoprenoid synthase II	3	5	100	6974	671	38	591	1	0	0	0.17
nitroreductase	110	11	200	0	17,318	0	7242	20	11	0	0.43
enolase	8	8	800	26,227	2267	7	2143	0	0	0	0
				total:	82,073	121	58,977	66	84	353	avg: 0.14
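The final column of the table above appears to be the sum of the ‘Error’ and ‘?’ conflict counts expressed as a percentage of the BSG count. That reading is an inference from the tabulated numbers rather than a definition stated here, so the short check below should be taken as a sketch: it recomputes the column from the other three and compares the result with the published values.

```python
# (BSG count, 'Error' conflicts, '?' conflicts, published maximum % errors),
# copied from the table rows above.
rows = {
    "radical SAM":             (13676, 10,  6, 0.12),
    "glutathione transferase": (1945,   0,  0, 0.00),
    "peroxiredoxin":           (5255,   0,  1, 0.02),
    "haloacid dehalogenase":   (26589, 35, 66, 0.38),
    "isoprenoid synthase I":   (1536,   0,  0, 0.00),
    "isoprenoid synthase II":  (591,    1,  0, 0.17),
    "nitroreductase":          (7242,  20, 11, 0.43),
    "enolase":                 (2143,   0,  0, 0.00),
}

def max_error_pct(bsg, errors, uncertain):
    """Assumed definition: every uncertain ('?') conflict is counted as an error."""
    return 100.0 * (errors + uncertain) / bsg

recomputed = {name: round(max_error_pct(b, e, u), 2) for name, (b, e, u, _) in rows.items()}
average = round(sum(max_error_pct(b, e, u) for b, e, u, _ in rows.values()) / len(rows), 2)

for name, (_, _, _, published) in rows.items():
    print(f"{name:25s} recomputed={recomputed[name]:.2f} published={published:.2f}")
print("average:", average)
```

Under this assumed definition every row, and the 0.14 average, agrees with the table to two decimal places, which supports interpreting the column as a worst-case error rate.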

Subgroup IDs		SFLD&	SFLD^#
BPPS	SFLD^‡	BPPS^§	Total	%
root^†	various	1531	1618	96.2
	1138	82	833	9.8
34	0	200	21768	0.9
	1129	3	129	2.3
23	0	101	21768	0.5
	1124	1	495	0.2
	1135	125	9423	1.3
21	0	158	21768	0.7
	1124	91	495	18.4
	1145	2	43	4.7
20	0	46	21768	0.2
	1131	162	201	80.6
25	0	76	21768	0.3
	1135	311	9423	3.3
2	0	1915	21768	8.8
	2	10091	11846	85.2
3	0	937	21768	4.3
	1129	4	129	3.1
	1134	1	866	0.1
	1135	4500	9423	47.8
	1139	4	1851	0.2
	1140	1	821	0.1
4	0	2422	21768	11.1
	2	1	11846	0.0
	1137	9	1430	0.6
	1140	3	821	0.4
	1141	53	278	19.1
	1142	2	236	0.8
	1144	2497	2759	90.5
5	0	229	21768	1.1
	2	986	11846	8.3
6	0	342	21768	1.6
	1124	360	495	72.7
7	0	330	21768	1.5
	1134	628	866	72.5
33	0	100	21768	0.5
	1134	153	866	17.7
8	0	57	21768	0.3
	1133	400	400	100

Summary of BPPS-SIPRIS results for the most significant cluster in each test case.

BPPS-SIPRIS analysis of the GNAT superfamily and Gna1-family based on structural coordinates for Gna1 (pdb: 4ag9) (Dorfmueller et al., 2012).

Figure 1—source data 1

BPPS-SIPRIS analysis of R4 P-loop GTPases.

Figure 2—source data 1

BPPS-SIPRIS analysis of translation-associated P-loop NTPases.

Figure 3—source data 1

BPPS-SIPRIS analysis of synaptojanin/EEP domains.

Figure 4—source data 1

BPPS-SIPRIS analysis of synaptojanin/EEP domains within INPP5 proteins.

Figure 5—source data 1

BPPS-SIPRIS analysis of DNA glycosylases.

Figure 6—source data 1

Overview of BPPS-SIPRIS analysis.

Structural diversity among proteins identified and aligned by MAPGAPS.

Summary of BPPS results for five superfamilies.

Eleven haloacid dehalogenase sequences that the SFLD assigned to SG1129, but that are more closely related to SG1130 sequences.

Summary of SFLD benchmarking of BPPS.

Correspondence between BPPS and SFLD subgroups for haloacid dehalogenases*.

Haloacid dehalogenase SG1129 sequences that BPPS assigned to distinct subgroups (BSG).

Average percentage of matches to various BPPS subgroup (BSG) patterns for haloacid dehalogenase sequences assigned to SFLD subgroup SG1135.

BPPS-SIPRIS analyses using MAPGAPS (MG) versus Jackhmmer (JH) generated MSAs as input.

Author details

Andrew F Neuwald

L Aravind

Stephen F Altschul
