Structural assembly of the bacterial essential interactome

  1. Jordi Gómez Borrego
  2. Marc Torrent Burgas (corresponding author)
  1. Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Spain

Abstract

The study of protein interactions in living organisms is fundamental for understanding biological processes and central metabolic pathways. Yet, our knowledge of the bacterial interactome remains limited. Here, we combined gene deletion mutant analysis with deep-learning protein folding using AlphaFold2 to predict the core bacterial essential interactome. We predicted and modeled 1402 interactions between essential proteins in bacteria and generated 146 high-accuracy models. Our analysis reveals previously unknown details about the assembly mechanisms of these complexes, highlighting the importance of specific structural features in their stability and function. Our work provides a framework for predicting the essential interactomes of bacteria and highlights the potential of deep-learning algorithms in advancing our understanding of the complex biology of living organisms. Also, the results presented here offer a promising approach to identify novel antibiotic targets.

Editor's evaluation

This important study uses AlphaFold2 to predict the structures of bacterial protein complexes that the authors classify as "essential". The evidence supporting the conclusions is convincing, as the authors have tested the approach on an external dataset of 140 experimentally solved bacterial protein-protein complexes, 81% of which were predicted with high accuracy. This paper will be of general interest to a wide audience in the field of biosciences and in particular for molecular biologists.

https://doi.org/10.7554/eLife.94919.sa0

Introduction

Bacteria carry out a wide range of essential functions for their survival. These vital cellular activities are referred to as “core biological processes” and include energy production, DNA replication, transcription, translation, cell division, and cell wall synthesis, among others. These processes are executed by multiprotein complexes, which require the coordinated action of multiple essential proteins to function properly. In the absence of these proteins, the complexes cannot work, with the consequent loss of cell viability. Therefore, understanding the essential protein-protein interactions (PPIs) is critical for understanding how core biological processes are regulated and how they contribute to the cell’s overall function (Koonin, 2000; Carro, 2018; Cossar et al., 2020). By investigating these pathways and their associated proteins, we can gain insight into bacterial growth and survival mechanisms (de Groot et al., 2020; Gómez Borrego and Torrent Burgas, 2022).

Proteomic techniques such as yeast two-hybrid and tandem affinity purification coupled with mass spectrometry have identified millions of PPIs. However, the high number of false positives in high-throughput screenings makes the results less reliable (Rao et al., 2014; Zhao et al., 2020). A useful way to deal with false positives in interactomic data is to consider the three-dimensional structure of proteins, which provides insights into their function and architecture. The scientific community has experimentally determined thousands of protein structures at atomic resolution using X-ray crystallography, NMR, and cryo-EM. However, the structures of most protein complexes have not yet been determined. Recently, novel deep-learning models such as AlphaFold2 (AF2) and RosettaFold have outperformed previous methods in predicting protein structures, providing results with similar precision to experimental methods in successful cases (Jumper et al., 2021; Baek et al., 2021). AF2 can fold protein monomers and protein complexes, outperforming standard docking approaches (Evans et al., 2021). Therefore, we posit that AF2 can effectively differentiate between genuine interactions and false positive cases.

The topological analysis of pathogen interactomes is a powerful method for exploring the function of interacting proteins, uncovering the evolutionary conservation of protein interactions, or identifying essential hubs (Dong et al., 2020; Crua Asensio et al., 2017; Macho Rendón et al., 2022). Therefore, developing a complete map of the essential interactome is a powerful strategy to study the functional organization of proteins and to identify new targets for discovering new antibiotics. Here, we used AF2 to predict the Gram-negative and Gram-positive essential interactomes, comprising a total of 1402 interactions, and we report the global confidence scores of the binary complexes predicted by AF2. We also discuss how these structures can provide insight into new mechanisms of action and identify interesting PPIs to target for discovering novel antibiotics.

Results and discussion

The average bacterial proteome is composed of ~4000–5000 proteins, which means that the interactome could potentially span around 20 million interactions. Based on recent estimates, there are approximately 12,000 physical interactions in Escherichia coli, which indicates that only about 0.1% of potential interactions may occur (Rajagopala et al., 2014). However, not all these interactions are expected to be essential for bacterial survival. If we were to selectively disrupt each interaction without impacting any other factors, only a small subset of interactions would likely be classified as essential. How can we identify these essential interactions without the enormous effort of performing all these experiments? We reasoned that a given interaction can be essential only if both proteins forming the complex are essential (Figure 1a). While this simple approximation does not give us the exact answer, it does provide an upper bound for the essential interactome.
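The pair arithmetic and the essentiality rule above can be sketched in a few lines of Python. The helper names (`pair_counts`, `essential_upper_bound`) and the toy protein labels are illustrative assumptions, not part of the authors' pipeline. Note that the ~20 million figure matches n² (ordered pairs, self-pairs included) for n ≈ 4500; counting each unordered pair of distinct proteins once gives about 10 million, against which 12,000 interactions is roughly 0.1%.

```python
def pair_counts(n_proteins: int) -> tuple[int, int]:
    """Return (ordered pairs incl. self-pairs, unordered pairs of distinct proteins)."""
    ordered = n_proteins ** 2
    unordered = n_proteins * (n_proteins - 1) // 2
    return ordered, unordered


def essential_upper_bound(interactions, essential):
    """Keep interactions whose two partners are both essential.

    Encodes the rule that an interaction can be essential only if both
    proteins are essential; the result is an upper bound, not a final set.
    """
    return [(a, b) for a, b in interactions if a in essential and b in essential]


# Toy example with hypothetical protein labels.
ordered, unordered = pair_counts(4500)
print(ordered, unordered)                  # 20250000 10122750
print(f"{100 * 12_000 / unordered:.2f}%")  # 0.12%
print(essential_upper_bound([("A", "B"), ("A", "C"), ("C", "D")], {"A", "B", "D"}))
# [('A', 'B')]
```

The filter is deliberately permissive: it can only remove candidates, so any interaction it discards cannot be essential under this rule, while the survivors still need structural or experimental support.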

Figure 1 (with 7 figure supplements)
Analysis of essential binary complexes predicted by AlphaFold2 (AF2).

(a) Representation of protein-protein interactions (PPIs) based on their essentiality. This study focuses on interactions between essential proteins, highlighted by a green rectangle. (b) Pipeline used to construct the essential interactomes. (c) Cumulative distribution function of ipTM scores in selected (orange) and randomly generated PPIs (cyan). A two-sample Kolmogorov-Smirnov test was performed to assess the statistical significance of the difference between the two distributions. (d) Histograms displaying ipTM scores in selected complexes compared to random PPIs. Chi-square test p-values: <0.05*, <0.01**, <0.001***. (e) Accessible surface area of AF2 binary complexes grouped by ipTM score. (f) Conservation score comparison between interface and surface residues. Wilcoxon test p-values: <0.05*, <0.01**, <0.001***. (g) Network representation of side-chain residue contacts in high-accuracy binary models. Nodes represent residue types, and edges indicate interactions between residues. The color of the edges reflects the number of occurrences.

Using this premise, we retrieved a list of all essential Gram-negative and Gram-positive proteins from previous studies (Figure 1b), and considered as essential only those proteins that are present in at least two different species (Baba et al., 2006; Gerdes et al., 2003; Goodall, 2018; Liberati et al., 2006; Gallagher et al., 2011; Poulsen et al., 2019; Ramage et al., 2017; Bai et al., 2021; Commichau et al., 2013; Dembek et al., 2015; Ji et al., 2001; Chaudhuri et al., 2009; Santiago et al., 2015; Liu et al., 2017). Next, we retrieved all PPIs between these proteins with experimental evidence (experimental score >0.15) and/or high confidence (score >0.7) from the STRING database (Szklarczyk et al., 2021). Additionally, we incorporated all of the synthetic lethal interactions identified in E. coli K-12 BW25113, as recorded in the Mslar database (Zhu et al., 2023a), to capture interactions between non-essential proteins that become essential in combination. We filtered out interactions that include ribosomal subunits and tRNA ligases. Using this pipeline, we modeled 722 unique Gram-negative essential PPIs (involving 216 proteins), 680 essential Gram-positive PPIs (involving 167 proteins), and 28 synthetic lethal PPIs (involving 45 proteins) using AF2-Multimer (Evans et al., 2021). To assess the confidence of the predictions, we used the ipTM scores to classify the models, as previously reported (Figure 1—figure supplements 1 and 2, Source data 1; Evans et al., 2021; Mackay et al., 2007; Bryant et al., 2022a). Concurrently, as negative controls, we modeled 722 Gram-negative and 680 Gram-positive PPIs generated by randomly pairing the selected proteins, to evaluate the ability of AF2 to distinguish between correct and incorrect models. To define an appropriate ipTM score cutoff, we calculated the cumulative distribution function of the ipTM scores for the selected and random complexes.
The analysis revealed a significant difference between the two distributions (Figure 1c). Based on these results, we classified the models into three categories: unlikely (ipTM<0.4), plausible (0.4≤ipTM≤0.6), and high accuracy (ipTM>0.6). Of the 722 Gram-negative PPIs, 549 (76.04%) were classified as unlikely, 74 (10.25%) as plausible, and 99 (13.71%) as high accuracy. For the 680 Gram-positive PPIs, 576 (84.71%) were classified as unlikely, 57 (8.38%) as plausible, and 47 (6.91%) as high accuracy (Figure 1d). We also validated our predicted models using crosslinking data that were available for 14 complexes (Source data 1). The identified distance restraints (crosslinked lysines are ~15–20 Å apart) are compatible with our models in 93% of the cases. Hence, despite the limited overlap between the crosslinking datasets and our list of interactions, our models were consistent with the experimental data for the complexes that did match. These findings support the notion that AF2 can distinguish between incorrect and high-accuracy models, which is consistent with previous observations in other applications (Mackay et al., 2007). Thus, our results suggest that many of the essential PPIs retrieved from databases could be false positives, likely reflecting the high false-positive rate of large-scale screening experiments, which may also capture indirect interactions (Zhu et al., 2023b). We also compared ipTM scores with both pDockQ (Akdel et al., 2022) and pDockQ2 (Bryant et al., 2022b). The correlation between ipTM and pDockQ was low (R=0.328), whereas the correlation between ipTM and pDockQ2 was stronger (R=0.649, Figure 1—figure supplement 1). Notably, some complexes with high ipTM values (>0.8) had very low pDockQ2 scores, some of them virtually 0. However, these interactions showed improved pDockQ2 scores when modeled alongside accessory proteins (Figure 1—figure supplement 2), suggesting a better recall performance for ipTM.
We conclude that pDockQ2 is a very accurate but restrictive metric. Therefore, we selected ipTM for assessing predicted interactions. Nonetheless, pDockQ and pDockQ2 scores for all predicted complexes can be found in Source data 1.
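The three-way ipTM classification and the resulting percentages can be reproduced with a short tally. This is a minimal sketch assuming the plausible bin is inclusive on both ends (0.4 ≤ ipTM ≤ 0.6); `classify_iptm` and `summarize` are hypothetical helper names, and the synthetic scores below only stand in for real AF2 outputs so that the Gram-positive counts (576, 57, and 47 of 680) are recovered.

```python
from collections import Counter

LABELS = ("unlikely", "plausible", "high accuracy")


def classify_iptm(iptm: float) -> str:
    """Bin a model: unlikely (<0.4), plausible (0.4-0.6 inclusive), high accuracy (>0.6)."""
    if iptm < 0.4:
        return "unlikely"
    if iptm <= 0.6:
        return "plausible"
    return "high accuracy"


def summarize(scores):
    """Return {label: (count, percentage rounded to 2 decimals)} for a list of ipTM scores."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(classify_iptm(s) for s in scores)
    total = len(scores)
    return {label: (counts[label], round(100 * counts[label] / total, 2))
            for label in LABELS}


# Synthetic scores reproducing the Gram-positive counts (576 / 57 / 47 of 680).
gram_pos = [0.1] * 576 + [0.5] * 57 + [0.8] * 47
print(summarize(gram_pos))
# {'unlikely': (576, 84.71), 'plausible': (57, 8.38), 'high accuracy': (47, 6.91)}
```

Summing the high-accuracy bins of both datasets (99 + 47) gives the 146 high-accuracy models mentioned in the abstract.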

To test AF2’s predictive capabilities in bacterial complexes, we conducted a thorough validation of 140 bacterial protein-protein complexes from the PDB (Supplementary file 1). This dataset encompasses structures published after the latest release of AF2, sharing less than 30% sequence identity with all other complexes in the PDB. According to our criteria (ipTM>0.6), we observed that 81% (113 out of 140) of these structures were accurately predicted by AF2. Of all models generated, 83% (116 out of 140) were almost identical to the native structures in terms of correct folding (TM-score>0.8). Most interestingly, 72% (101 out of 140) of the predicted structures were similar in terms of root mean square deviation at the interaction interface (i-RMSD<4 Å), and 56% (79 out of 140) of the interfaces were virtually identical to the real structures (i-RMSD<2 Å), highlighting the excellent prediction power of AF2.
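The benchmark percentages amount to counting how many targets pass each threshold. A minimal sketch, assuming per-target metrics (ipTM, TM-score, i-RMSD) have already been computed with external tools; the `Benchmark` record and `success_rates` helper are hypothetical. For the 140 PDB complexes, the counts reported above (113, 116, 101, 79) correspond to 80.7%, 82.9%, 72.1%, and 56.4%, which round to the percentages in the text.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    """Per-target metrics for one predicted complex (hypothetical record)."""
    target: str
    iptm: float
    tm_score: float
    irmsd: float  # interface RMSD in angstroms


# Thresholds as stated in the text.
CRITERIA = {
    "ipTM > 0.6": lambda r: r.iptm > 0.6,
    "TM-score > 0.8": lambda r: r.tm_score > 0.8,
    "i-RMSD < 4 A": lambda r: r.irmsd < 4.0,
    "i-RMSD < 2 A": lambda r: r.irmsd < 2.0,
}


def success_rates(results):
    """Return {criterion: (n_passing, n_total)} over a list of Benchmark records."""
    n = len(results)
    return {name: (sum(1 for r in results if passes(r)), n)
            for name, passes in CRITERIA.items()}


# Percentages implied by the counts reported for the 140-complex benchmark.
for k in (113, 116, 101, 79):
    print(f"{k}/140 = {100 * k / 140:.1f}%")
```

Keeping the thresholds in one table makes it easy to report several stringency levels (for example both i-RMSD cutoffs) from the same set of models.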

The interface solvent accessible surface area (SASA) of our selected models showed moderate correlation with the ipTM score, suggesting that larger interfaces were more likely to have better model accuracies (Figure 1e). Additionally, we considered the conservation of the interface residues, which is frequently used as a proxy to identify protein-binding sites (Guharoy and Chakrabarti, 2010). As expected, the residues in the interface were significantly more conserved than those located at the surface, suggesting that the predicted models are reliable (Figure 1f, Figure 1—figure supplements 3–5). We also analyzed the residue types of the interface in high-confidence models (Figure 1g, 4.5 Å distance cutoff). The most abundant interface residues were involved in electrostatic interactions, particularly between arginines and negatively charged residues. There was also a significant contribution of hydrophobic interactions, particularly involving leucine and isoleucine residues, as well as between the hydrophobic moiety of the arginine side chain and these two residues.
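The residue-contact network in Figure 1g can be illustrated with a small sketch that counts inter-chain side-chain contacts within the 4.5 Å cutoff and tallies them by residue-type pair, which gives the edge weights. It assumes a hypothetical flat atom list of (chain, residue name, residue number, atom name, coordinates) and treats every non-backbone atom as side chain, so glycine contributes no contacts; the authors' exact contact definition may differ.

```python
import math
from collections import Counter
from itertools import product

# Assumption: backbone atoms are excluded; everything else counts as side chain.
BACKBONE = {"N", "CA", "C", "O"}


def side_chain_contact_network(atoms, cutoff=4.5):
    """Count inter-chain residue contacts by residue-type pair.

    atoms: iterable of (chain, resname, resseq, atom_name, (x, y, z)) tuples
    (a hypothetical flat format). A residue pair is counted once, however many
    of its side-chain atom pairs fall within `cutoff` angstroms.
    """
    by_chain = {}
    for a in atoms:
        if a[3] not in BACKBONE:
            by_chain.setdefault(a[0], []).append(a)
    chains = sorted(by_chain)

    residue_pairs = set()
    for i, ci in enumerate(chains):
        for cj in chains[i + 1:]:
            for a, b in product(by_chain[ci], by_chain[cj]):
                if math.dist(a[4], b[4]) <= cutoff:
                    # Identify residues by (chain, number, name) so each pair is unique.
                    residue_pairs.add(((a[0], a[2], a[1]), (b[0], b[2], b[1])))

    # Edge weights: number of contacting residue pairs per residue-type pair.
    edges = Counter()
    for (_, _, name_a), (_, _, name_b) in residue_pairs:
        edges[tuple(sorted((name_a, name_b)))] += 1
    return edges
```

Counting residue pairs rather than atom pairs keeps large side chains such as arginine from dominating the network simply because they contribute more atoms.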

In summary, we assembled a high-accuracy essential interactome for both Gram-negative (Figure 2a) and Gram-positive bacteria (Figure 2b) that will enable us to identify protein hubs and investigate the importance of these interactions. Here, we focus on new structures involving essential complexes, where we can gain mechanistic insight from a detailed understanding of the structure (Table 1).

Figure 2 (with 1 figure supplement)
Essential interactomes.

(a) Gram-negative essential interactome; (b) Gram-positive essential interactome. Nodes represent essential proteins, and edges indicate interactions between them. The color of the edges reflects the ipTM score as calculated by AlphaFold2 (AF2). The most representative biological processes are highlighted in the figure.

Table 1
Protein complexes discussed in this work.

The ipTM score is shown along with the PDB accessions for the cases where the structure has already been solved. The AlphaFold2 (AF2) predictions are structurally aligned with the experimental structures in Figure 2—figure supplement 1 except for SecYEDF-YidC, which is discussed in Figure 6.

| Protein | ipTM* | PDB | ModelArchive ID | Function |
|---|---|---|---|---|
| AccB-BirA | 0.841 | – | ma-sysbio-bei-02 | Fatty acid synthesis |
| AccABCD | 0.809 | – | ma-sysbio-bei-01 | Fatty acid synthesis |
| AcpP-FabG | 0.757 | – | ma-sysbio-bei-06 | Fatty acid synthesis |
| AcpP-FabI | 0.753 | 2FHS | ma-sysbio-bei-07 | Fatty acid synthesis |
| AcpP3-GlmU3 | 0.908 | – | ma-sysbio-bei-03 | Lipopolysaccharide synthesis |
| AcpP3-LpxA3 | 0.940 | – | ma-sysbio-bei-04 | Lipopolysaccharide synthesis |
| AcpP3-LpxD3 | 0.957 | 4IHF | ma-sysbio-bei-05 | Lipopolysaccharide synthesis |
| LptC-LptD | 0.695 | – | ma-sysbio-bei-24 | Lipopolysaccharide transport |
| LptCAD | 0.600 | – | ma-sysbio-bei-23 | Lipopolysaccharide transport |
| SecYEDF-YidC | 0.642 | 5MG3 | ma-sysbio-bei-27 | Outer membrane protein transport |
| SecYEDFA-YidC | 0.632 | – | ma-sysbio-bei-26 | Outer membrane protein transport |
| LolA-LolC | 0.809 | 6F3Z | ma-sysbio-bei-22 | Lipoprotein transport |
| LolA-LolB | 0.838 | – | ma-sysbio-bei-21 | Lipoprotein transport |
| FtsA3 | 0.761 | – | ma-sysbio-bei-13 | Cell division |
| FtsZ3 | 0.614 | – | ma-sysbio-bei-18 | Cell division |
| FtsA3-FtsZ3 | 0.542 | – | ma-sysbio-bei-14 | Cell division |
| FtsQLBWIN | 0.727 | – | ma-sysbio-bei-17 | Cell division |
| FtsQLBK | 0.572 | – | ma-sysbio-bei-16 | Cell division |
| FtsE2-FtsX2 | 0.856 | – | ma-sysbio-bei-15 | Cell division |
| MreB4CD-RodZ-MrdAB | 0.764 | – | ma-sysbio-bei-12 | Cell division |
| DnaA4 | 0.545 | – | ma-sysbio-bei-08 | DNA replication |
| DnaN-PolA | 0.813 | – | ma-sysbio-bei-11 | DNA replication |
| DnaB-DnaI | 0.750 | – | ma-sysbio-bei-10 | DNA replication |
| DnaB-DnaC | 0.650 | 6KZA | ma-sysbio-bei-09 | DNA replication |
| NrdE-NrdF | 0.856 | – | ma-sysbio-bei-25 | DNA replication |
| GyrA-GyrB | 0.715 | 6RKU | ma-sysbio-bei-20 | DNA replication |
| GyrA-FolP | 0.847 | – | ma-sysbio-bei-19 | DNA replication |
| UbiEFGHIJK | 0.806 | – | ma-sysbio-bei-28 | Ubiquinone synthesis |

* Complexes FtsA3-FtsZ3 and FtsQLBK have an ipTM score <0.6 because they contain large intrinsically disordered segments that, although they do not participate in the interaction, lower the global ipTM score.

Complexes involved in the endogenous fatty acid synthesis

The biosynthesis of fatty acids (FA) is a crucial process for membrane biosynthesis and plays a pivotal role in related processes, such as the biosynthesis of lipid A, lipoic acid, and phosphatidic acid (Yao and Rock, 2013). The initial step in FA biosynthesis involves the transfer of biotin from the biotin protein ligase (BPL) BirA to the Acc complex via AccB. This is followed by the generation of malonyl-CoA through the catalytic action of the Acc complex. The resulting malonyl-CoA is then transferred to AcpP, which couples to each step of the elongation cycle catalyzed by the Fab family of proteins, ultimately resulting in the production of FA (Chan and Vogel, 2010; Figure 3a).

Figure 3 (with 1 figure supplement)
Core enzymes in fatty acid (FA) synthesis.

(a) FA synthesis pathway. (b) Proposed structural rearrangements in the BirA-AccB complex. Initially, the yellow arginine-rich loop and the green loop encapsulate the substrate in the BirA pocket (closed state, left). (1) Upon interaction, Lys122 in AccB repels the arginine-rich loop in BirA (open state, right), (2) facilitating the covalent binding of the substrate to Lys122. The brown thumb loop likely interacts with the arginine-rich loop, contributing to complex stabilization. (c) Proposed mechanism of AccB shuttle in the Acc complex. Initially, the C-terminal domain of holo-AccB exhibits stronger affinity for AccC. Once the biotinyl group of AccB is carboxylated, the same domain may shuttle to AccA, facilitating the transfer of the carboxyl group to an acetyl-CoA molecule. The dotted line represents the flexible loop of AccB that would allow it to shuttle between AccA and AccC. All represented protein structures are AlphaFold2 (AF2) models. Uniprot codes used for AF2: AccA: P0ABD5, AccB: P0ABD8, AccC: P24182, AccD: P0A9Q5, and BirA: P06709.

Currently, the structure of the BirA-AccB binary complex remains unsolved. Hence, our model provides valuable functional insights into this complex. We show that the BPL catalytic domain of BirA aligns with the biotinyl-binding (BB) domain of AccB. Within the structure of the complex, two BirA loops play a significant role: the first loop, spanning residues 218–226, interacts with the substrate, while the second loop, consisting of residues 116–121, is enriched in arginine and helps stabilize the substrate’s negative charge (Figure 3b). Based on our model, we propose that these loops act together to encapsulate the biotin moiety within the catalytic pocket of BirA, creating a closed state. Upon interaction with AccB, BirA engages with two specific AccB loops: the β-hairpin loop, which contains the key residue Lys122, and the “thumb motif”, comprising residues 94–102. The presence of Lys122 near the substrate leads to electrostatic repulsion of the arginine-rich loop, creating an open state. The biotin molecule can then covalently attach to Lys122 of AccB, which presents it to the essential Acc complex. Our model is compatible with mutagenesis studies performed in BirA, where mutations M310L and P143T were found to induce a superrepressor phenotype, i.e., BirA lacks the capacity to biotinylate AccB (Chakravartty and Cronan, 2012). The effect of these mutations, which do not significantly affect the BirA active site, can be explained by the destabilization of the BirA-AccB interface.

The Acc complex, composed of four subunits, catalyzes two half-reactions. First, AccC carboxylates the biotin group attached to Lys122 of AccB. In the second step, the AccAD complex transfers the carboxyl group from Lys122-carboxybiotin to acetyl-CoA to form malonyl-CoA (Figure 3a). While crystal structures of all the subunits have been solved (AccAD: 2F9Y, AccB: 1BDO, AccC: 3RV4), the full structure of the Acc complex remains unknown. The accepted stoichiometry for the Acc complex is AccB4C2A2D2, although a dimeric form of AccB has also been reported (Chakravartty and Cronan, 2012; Cronan, 2021). When testing various AccBC stoichiometries, we found that the dimeric form of AccB led to higher accuracies. Our predicted models suggest that the BB domain of AccB can interact with the catalytic pockets of AccA and AccC, while the N-terminal domain can only be attached to AccC (Figure 3c). Additionally, the essential AccB ‘thumb motif’ interacts with the N-terminus of AccA and the loop comprising residues 192–195 of AccC, in agreement with previous mutational and structural studies (Tong, 2005). These studies concluded that the thumb region is critical for identifying Acc proteins, as only biotin-dependent enzymes involved in the synthesis of malonyl-CoA contain thumb domains (Tong, 2005). Other studies also suggest that the thumb domain may act as a mobile lid that fits tightly into the AccC and AccA active sites (Cronan, 2001). While the heterotetrameric AccAD has already been crystallized, we identified a new, unsolved, high-accuracy interaction between AccC and AccD, which is consistent with coevolutionary studies (Broussard et al., 2013). We hypothesize that this interaction is crucial for keeping AccBC close in space to AccAD, allowing the BB domain of AccB to shuttle dynamically from AccC to AccA (Figure 3c).
The binding affinity of the BB domain to either AccC or AccA can be influenced by the carboxylation state of the biotin moiety. The introduction of a negative charge to biotin through carboxylation may decrease the affinity for AccC, leading to the binding of the BB domain to AccA. The structural information obtained from these interfaces is consistent with the bi-substrate ping-pong mechanism followed by the Acc complex (Cronan, 2001).
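For reference, the textbook initial-rate law for a two-substrate ping-pong (bi-bi) mechanism, measured in the absence of products, is shown below, with A and B the substrates of the first and second half-reactions and K_m^A, K_m^B their Michaelis constants. It is included only to make the term concrete: the Acc reaction involves more substrates (biotin carboxylation uses ATP and bicarbonate), so this simplified form is an illustration rather than a kinetic model of the complex.

```latex
v = \frac{V_{\max}\,[A]\,[B]}{K_m^{B}\,[A] + K_m^{A}\,[B] + [A]\,[B]}
\qquad\Longleftrightarrow\qquad
\frac{1}{v} = \frac{K_m^{A}}{V_{\max}}\,\frac{1}{[A]} + \frac{K_m^{B}}{V_{\max}}\,\frac{1}{[B]} + \frac{1}{V_{\max}}
```

Because the denominator has no constant term coupling the two substrates, double-reciprocal plots of 1/v against 1/[A] at different fixed [B] are parallel lines, the classic kinetic signature of a ping-pong mechanism.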

The malonyl-CoA produced by the Acc complex is then loaded onto AcpP by FabD, and FA synthesis is initiated by the catalytic reaction of FabH. The FA elongation process is cyclic and requires several Fab proteins, adding two carbons to the FA intermediate in each cycle (Figure 3a; Cong et al., 2019). The interaction of AcpP with each Fab protein is essential for the cycle to proceed, as FA intermediates are tethered to and transported by AcpP (Yao and Rock, 2015). Along these lines, many AcpP-Fab protein complexes have been solved (AcpP-FabD: 6UOJ, AcpP-FabF: 7L4E, AcpP-FabB: 6OKC, AcpP-FabA: 4KEH, AcpP-FabI: 2FHS, AcpP-FabZ: 4ZJB), but the structure of the AcpP-FabG complex remains unknown, despite the similarity between FabG and FabI (Bartholow et al., 2021; Dodge et al., 2019). Both FabG and FabI contain Rossmann folds composed of twisted β-sheets surrounded by α-helices (Masoudi et al., 2014). To investigate these interactions, we generated models of homodimeric FabG and FabI and analyzed their interactions with AcpP (Figure 3—figure supplement 1). The interfaces between the Fab homodimers were highly similar, but the interaction between AcpP and the Fab partner displayed some distinct features. In both cases, Ser36 of AcpP was positioned near the active site of the FabG/FabI pocket, where the catalytic activity takes place. However, the exact binding location of AcpP appeared to differ, possibly due to the presence of FabI’s C-terminal region, which also interacts with the catalytic site and is absent in FabG (Figure 3—figure supplement 1). It is worth noting that the crystal structure of the FabI-AcpP complex does not show AcpP’s Ser36 facing the catalytic site, whereas in our model, Ser36 is positioned in the correct orientation. These findings provide valuable insights into the selectivity of AcpP for different Fab partners, particularly for the uncharacterized AcpP-FabG complex.

Complexes involved in LPS synthesis

Lipopolysaccharide (LPS) is a crucial molecule that forms the outer leaflet of the Gram-negative outer membrane (OM). It consists of lipid A, the O-antigen polysaccharide, and a core oligosaccharide connecting both parts. The OM is an asymmetric lipid bilayer, with LPS making up the outer leaflet and phospholipids forming the inner leaflet. The biosynthesis of lipid A, also called the Raetz pathway, is highly conserved in Gram-negative bacteria and involves several enzymes of the Lpx family (Shanbhag, 2019; Whitfield and Trent, 2014). In E. coli, LpxA binds to AcpP to transfer β-hydroxymyristoyl, one of the many substrates of FabA/FabZ, to UDP-N-acetylglucosamine, which is synthesized by GlmU. Next, LpxC deacetylates the LpxA product, and LpxD transfers a second β-hydroxymyristoyl group, which is also carried by AcpP. The Raetz pathway requires six more reactions to convert the initial UDP-N-acetylglucosamine into Kdo2-lipid A before it is translocated to the outer leaflet of the inner membrane (IM) by the MsbA flippase (Figure 4a; Shanbhag, 2019; Mahalakshmi et al., 2014).

Figure 4 (with 1 figure supplement)
Common mechanism in initial steps of lipopolysaccharide (LPS) synthesis pathway.

(a) Simplified Raetz pathway. (b) Top view (left), front view (center), and magnified interface (right) of GlmU-AcpP, LpxA-AcpP, and LpxD-AcpP predicted AlphaFold2 (AF2) models. GlmU contains an N-terminal uridyltransferase domain (UDT, yellow) while LpxA incorporates a C-terminal acetyltransferase domain (ACT, cyan) forming a collapsed helix that does not interact with the other LpxA monomers. LpxD incorporates a uridine-binding domain (UBD, green) and a C-terminal acetyltransferase domain forming a 3-helix bundle. The common left-handed β-helix domain is colored in pink, the extruding loop is highlighted in blue, AcpP in orange, and AcpP’s Ser36 in red. Uniprot codes used for AF2: GlmU: P0ACC7, LpxA: P0A722, LpxD: P21645, AcpP: P0A6A8.

The crystal structures of homotrimeric LpxA3 (6P9S), LpxD3 (6P89), and GlmU3 (2OI6) contain left-handed β-helix domains, with different structural features characterizing each protein (Figure 4b). Though the LpxD3-AcpP3 structure is already known (4IHF), the LpxA3-AcpP3 and GlmU3-AcpP3 complexes remain unsolved. The interfaces in our predicted models for both complexes consistently display the critical Ser36 residue of AcpP (located in the universal recognition helix or helix II) placed in the catalytic chamber, resembling the LpxD3-AcpP3 crystal structure. Interestingly, our models reveal a hydrophobic patch that accommodates the lipid moiety of the ligand (Figure 4—figure supplement 1) with a size proportional to the substrate’s length. These structures reveal that all the complexes contain an extruding loop derived from the left-handed β-helix domain, which could act as a lid, facilitating ligand recognition. Therefore, we propose that a shared mechanism mediated by the extruding loop of the left-handed β-helix domain defines substrate specificity in these three complexes.

Complexes involved in LPS transport

The lipid A-core synthesis and transport in bacteria must be tightly coupled. The lipid A-core region of LPS is synthesized in the cytoplasm and transported to the periplasmic face of the IM by the MsbA flippase. The O-antigen is then ligated to the lipid A-core by the WaaL ligase to form the LPS molecule. Subsequently, LPS is carried from the IM to the OM by the LPS transport (Lpt) protein complex (LptA-G), which plays a vital role in cellular function (Olsen et al., 2007; Hicks and Jia, 2018; Putker et al., 2015).

To extract the LPS from the IM, the LptB2FG complex, an ATP-binding cassette (ABC) transporter, hydrolyzes ATP to induce conformational changes in the transmembrane (TM) LptFG complex. The LptFG periplasmic β-jellyroll (βJR) domains are arranged in an antiparallel manner, creating a conduit for the LPS to move from the hydrophobic pocket of LptFG to the βJR domains of LptFC (Figure 5a). Once inside the LptFC complex, LptA facilitates the unidirectional transport of LPS to LptD in the OM. For this transport, the formation of a physical bridge in the periplasm between LptC, LptA, and LptD is essential (Li et al., 2019). Hence, LPS undergoes a two-portal mechanism, moving from LptA to the N-terminal βJR fold of LptD, and then to the C-terminal TM β-barrel domain. There, the LptDE complex forms a plug-and-barrel structure, with LptE inserted into the β-barrel of LptD, effectively blocking a portion of the extracellular opening to maintain membrane impermeability (Figure 5a; Okuda et al., 2016).

Figure 5
Model of Lpt bridge.

(a) Schematic representation of the Lpt complex. Initially, the LptB2FGC complex extracts the LPS from the inner membrane (IM). The LPS molecule then moves from the hydrophobic pocket of LptFG to LptC. The LptCAD periplasmic bridge shields the LPS molecule and facilitates its insertion into the outer membrane (OM) by LptDE. Key compartments include the IM, OM, periplasm (P), cytoplasm (C), and extracellular space (ECS). LPS refers to lipopolysaccharide. (b) AlphaFold2 (AF2) models of Lpt bridges with varying LptA stoichiometries; each LptA subunit measures approximately 40 Å in length. (c) A view of the interior of the Lpt bridge reveals a hole with a diameter ranging from 10 to 15 Å in all three cases. The structures are presented in the same order as in panel (b): LptCD, LptCAD, and LptCA2D. Uniprot codes used for AF2: LptA: P0ADV1, LptC: P0ADV9, LptD: P31554.

While the cryo-EM and crystal structures of LptA (6GD5), LptB2FGC (6MK7), and LptDE (4RHB) have been extensively studied, the structure of the bridge formed by LptCAD remains ill-defined. Additionally, the exact number of LptA molecules that make up the periplasmic bridge is still unknown, although previous research suggests that LptA molecules in isolation can form polymers of up to eight subunits (Malojčić et al., 2014; Merten et al., 2012). In our study, we generated a high-accuracy model of the periplasmic bridge by computationally predicting the structure of the LptCAD complex. Our model supports the formation of a head-to-tail LptCAD complex (Figure 5b) and suggests that a single LptA monomer is enough to form a bridge spanning approximately 15 nm, which corresponds to the average thickness of the periplasm in E. coli. It should be noted that the width of the periplasmic space can vary depending on environmental conditions, contracting or expanding during stress (Santambrogio et al., 2013). Consequently, the oligomeric state of LptA may adapt to these changes, allowing the formation of longer bridges. By modeling different LptA oligomers (such as LptA2 and LptA3), we generated models consistent with previously reported structures (Merten et al., 2012; Sochacki et al., 2011), indicating that LptA can transiently oligomerize in the periplasm, facilitating the formation of extended bridges (Figure 5b). Furthermore, under certain conditions, the periplasmic space can shrink significantly (to approximately 10 nm), consistent with the loss of a single LptA molecule. In our analysis, we identified a high-accuracy interaction between LptC and LptD, which involves the interface region of their βJR domains, analogous to previously characterized complexes, suggesting that the formation of the complex without LptA is also feasible.
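The bridge-length argument is simple additive arithmetic. The sketch below assumes, as a simplification, that each LptA subunit adds its full ~40 Å (4 nm) length end to end, and it anchors the estimate on the ~15 nm LptCAD span reported above; `bridge_span_nm` is a hypothetical helper for back-of-envelope estimates, not a structural calculation.

```python
# Assumptions taken from the text: ~40 A (4 nm) per LptA subunit and a ~15 nm
# span for the single-LptA LptCAD bridge. Subunits are treated as rigid rods
# stacked end to end, which ignores overlap at the head-to-tail interfaces.
LPTA_LENGTH_NM = 4.0
LPTCAD_SPAN_NM = 15.0


def bridge_span_nm(n_lpta: int) -> float:
    """Estimated span of an LptC-(LptA)n-LptD bridge, in nanometers."""
    if n_lpta < 0:
        raise ValueError("n_lpta must be non-negative")
    return LPTCAD_SPAN_NM + (n_lpta - 1) * LPTA_LENGTH_NM


for n in range(4):
    print(n, bridge_span_nm(n))
# 0 11.0
# 1 15.0
# 2 19.0
# 3 23.0
```

Under these assumptions, removing the single LptA (n = 0, the LptCD complex) gives about 11 nm, which is in the range of the ~10 nm shrunken periplasm discussed above.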

Complexes involved in OMP transport

Outer membrane proteins (OMPs) are β-barrel proteins that are synthesized in the cytoplasm and require translocases to be transported to the OM (Suits et al., 2008). This transport is mediated by the Sec complex, which drives the translocation of unfolded peptides across the IM, and the Bam machinery, which mediates the insertion and folding of β-barrel proteins into the OM. High-resolution cryo-EM structures of the Bam complex are available, but only a single low-resolution structure (5MG3, 14 Å) of the Sec holo-translocon (HTL; SecYEGDF-YidC) has been reported.

The export of nascent OMPs can occur co-translationally if the proteins contain signal peptides or post-translationally through the action of SecA (Knyazev et al., 2018). Translocation relies on the essential components SecY and SecE, whereas SecG stimulates the process but is not indispensable. SecY and SecE interact with other accessory proteins such as SecDF, a secretion factor that uses the proton motive force to facilitate protein secretion into the periplasm, and YidC, an integral membrane protein that functions as a chaperone and insertase for membrane protein biogenesis (Figure 6a; Ma et al., 2019). Crystal and cryo-EM data have provided valuable insights into the structure and function of sub-complexes such as SecYEA (6ITC), SecYEG (6R7L), SecDF (3AQ0), and YidC (6AL2), but limited information is available regarding the conformational rearrangements carried out by YidC within the overall structure of the translocon (Suits et al., 2008; du Plessis et al., 2011; Oswald et al., 2021).

Figure 6.
Organization of the Sec translocon.

(a) Schematic representation of the Sec translocon and its crosstalk with the Bam translocon. During protein translocation, the preprotein engages with the central cavity of SecY, where the N-terminal helix of YidC is accommodated. Subsequently, the plug domain is displaced, allowing the preprotein to be released into the periplasm through the lateral gate. Crosstalk between the Sec and Bam translocons may occur via indirect interactions facilitated by periplasmic chaperones. Key compartments include the inner membrane (IM), outer membrane (OM), periplasm (P), and cytoplasm (C). (b) Front and top views of the cryo-EM structure (top) and the AlphaFold2 (AF2) model (bottom), providing different perspectives on the Sec translocon organization. (c) Schematic representation of the Sec translocon showing the relative orientation of the corresponding subunits in the cryo-EM structure (top) and our AF2 model (bottom). Uniprot codes used for AF2: secD: P0AG90, secE: P0AG96, secF: P0AG93, secY: P0AGA2, YidC: P25714.

To gain a more comprehensive understanding of translocon assembly, we generated a model of the HTL encompassing SecYEDF and YidC and compared it with the low-resolution cryo-EM structure (Figure 6b; Veenendaal et al., 2004). Interestingly, the model places the previously uncharacterized N-terminal helix of YidC inside the central cavity, where it may stabilize the complex in a specific state (Figure 6—figure supplement 1). In the cryo-EM structure, the C-terminal domain of SecE encircles SecY from the external face (Figure 6b, top). In the model, however, SecE embraces the two SecY halves diagonally, with the hinge facing the central cavity and the C-terminal region facing the TM domains of YidC (Figure 6—figure supplement 1). The cryo-EM structure shows close contacts between SecF and YidC, which constrain the complex and prevent the formation of the central cavity. In contrast, our model shows only a weak interaction between SecF and the N-terminal helix of YidC. In addition, SecF is distant from the TM and periplasmic domains, with SecD positioned between the two subunits. Furthermore, the crystal structures of SecDF and YidC closely match our model but align poorly with the cryo-EM structure (RMSD against our model: 0.512 Å for YidC and 3.552 Å for SecDF; against the cryo-EM structure: 14.060 Å and 15.336 Å, respectively).
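The RMSD values above summarize how far matched atoms deviate once two structures are superposed. A minimal sketch of the metric, RMSD = sqrt((1/N) Σ |x_i − y_i|²), assuming the coordinates have already been optimally superposed (for example with a Kabsch fit, not shown) and that atoms are paired one-to-one, such as matched Cα atoms. The text does not state which atoms or alignment tool were used, so those choices are assumptions.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation over matched atom pairs.

    Assumes both coordinate lists are already superposed and that model[i]
    corresponds to reference[i] (e.g. matched C-alpha atoms).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(m, r) ** 2 for m, r in zip(model, reference))
    return math.sqrt(total / len(model))


# Toy example: one of two atoms is displaced by 1 Å.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]))
```

Because RMSD depends on the superposition, a value such as 0.5 Å against one structure and 14 Å against another mainly says which reference the fitted domain agrees with, not where the error lies.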

The subunit organization in our model is consistent with a proposed mechanism in which the preprotein enters the SecY pocket, displaces the plug domain, and is then released through the lateral exit gate, with the dynamic periplasmic domains coordinating its release into the periplasm. Previous studies of the dynamics of the SecY lateral gate (formed by TM2 and TM7) concluded that it fluctuates substantially, irrespective of the bound ligand and the experimental conditions (du Plessis et al., 2011). In the cryo-EM structure, the lateral gate is closed and faces the membrane, whereas in our model it faces the TM region of YidC (Figure 6b).

We also modeled the HTL together with SecA, because several mechanisms have been proposed to explain post-translational translocation in bacteria (Figure 6—figure supplement 1; Knyazev et al., 2018). Tight interactions between SecA and the β-hairpin loop of SecY (residues 247–262) could explain some of the rearrangements in SecY that mediate its open and closed states, allowing the preprotein to move from the SecA-SecY pocket into the SecY pore. Notably, when SecA binds SecY, the central cavity is not formed and the N-terminal helix of YidC is positioned near the lateral exit gate of SecY, in agreement with earlier work (Figure 6—figure supplement 1; Botte et al., 2016). The arrangement of the Sec translocon therefore appears to vary considerably, depending on its interactions with SecA and the ribosome and on whether translocation is YidC-dependent or -independent. Based on our models, SecA is essential for propelling the polypeptide during the initial stages, after which the preprotein is transported to the lateral exit gate, where YidC is located. In the absence of SecA, a different mechanism may be used to translocate the preprotein (Knyazev et al., 2018; Steudle et al., 2021; Alvira et al., 2020), in which the N-terminal helix of YidC found in the central cavity may play a crucial role.

Complexes involved in lipoprotein transport

Lipoproteins are integral components of the OM that play essential roles in cell wall synthesis, secretion systems, and antibiotic efflux pumps (Tsirigotaki et al., 2017). The transport of lipoproteins from the IM to the OM is facilitated by the Lol pathway, which involves five essential proteins: LolA, LolB, LolC, LolD, and LolE (Figure 7a; Grabowicz and Silhavy, 2017). However, recent studies suggest that in certain species, the involvement of LolA and LolB in lipoprotein trafficking may not be essential, indicating the existence of alternative pathways (Tsirigotaki et al., 2017).

Figure 7.
Organization of the Lol complex.

(a) Schematic depiction of the Lol complex. The outer membrane (OM), inner membrane (IM), periplasm (P), and cytoplasm (C) are highlighted in the figure. The structures of LolA and LolB are shown in green and yellow, respectively. The LolCD2E complex and the lipoprotein are represented in a schematic manner. (b) Predicted AF2 models of LolAB and LolAC. The protruding loops of LolB and LolC are highlighted in red for clarity. Uniprot codes used for AF2: lolA: P61316, lolB: P61320, lolC: P0ADC3.

In the Lol pathway, lipoproteins are extracted from the IM by the ABC transporter LolCD2E and transferred to the periplasmic lipoprotein carrier LolA. ATP hydrolysis by the LolD dimer drives structural rearrangements that enable LolC to recruit LolA (Figure 7b, bottom; Narita and Tokuda, 2017). LolA then accepts the lipoprotein moiety. Although LolC and LolE are structurally homologous, they have distinct roles: LolC specifically binds LolA, whereas LolE interacts with lipoproteins (Kaplan et al., 2018). To gain insight into the specific role of each subunit, we compared the solved LolAC structure (6F3Z) with a hypothetical LolAE complex (Figure 7—figure supplement 1). LolC and LolE share the same overall fold, except for a β-hairpin located at the interface. The β-hairpin loop of LolC is smaller and fits easily within the β-barrel of LolA, whereas the loop of LolE is larger and cannot be accommodated inside the β-barrel. This comparison indicates that the β-hairpin loop may be responsible for the specific interaction between LolA and LolC.

After the lipoprotein is loaded into LolA, the lipoprotein-LolA complex crosses the periplasm and interacts with LolB, which accepts the lipoprotein and inserts it into the OM. LolA and LolB both contain a β-barrel domain, but LolB also accommodates a helix inside its β-barrel (Kaplan et al., 2022). Surprisingly, the crystal structure of LolAB remains unsolved. Our LolAB model shows an interface strikingly similar to that of LolAC: in both, the protruding β-hairpin loop sits inside the hydrophobic cavity of the β-barrel, suggesting that the two complexes share a similar mechanism (Figure 7b, top). Moreover, Leu68 of LolB, which is crucial for receiving lipoproteins and localizing them to the OM, lies at the interface (Takeda et al., 2003). Modeling the interaction between LolB and LolC yields an incorrect fold (Figure 7—figure supplement 1), because the protruding β-hairpin loops of the two subunits face each other instead of following a ‘mouth-to-mouth’ arrangement. The helix inside the LolB β-barrel probably allows LolC to distinguish between LolA and LolB as binding partners. In summary, these data are consistent with a model in which the periplasmic chaperone LolA accepts and delivers lipoproteins through a ‘mouth-to-mouth’ mechanism, interacting specifically with LolC and LolB (Narita and Tokuda, 2017).

Complexes involved in cell division

Bacterial cell division is a highly regulated and dynamic process that involves the coordinated action of numerous proteins. It begins with the formation of the Z-ring, a circular structure at midcell composed of polymerized tubulin-like FtsZ proteins, which marks the division site. FtsA and ZipA anchor FtsZ to the membrane (Hayashi et al., 2014). Current models suggest that other proteins, such as FtsN, FtsK, and the FtsQLB complex, are recruited when FtsA switches from a polymeric to a monomeric state, a transition promoted by FtsEX (Hayashi et al., 2014; Mahone and Goley, 2020). These recruited proteins are important for initiating membrane constriction. Later, FtsN recruits FtsW, which polymerizes glycan strands, and FtsI, which cross-links peptide side chains at the sites where peptidoglycan (PG) is needed (Figure 8). Together, FtsW and FtsI synthesize and modify the cell wall during cell division (Pichoff et al., 2019; Rohs et al., 2018).

Figure 8.
Divisome and elongasome predicted complexes.

The initial step of cell division involves the binding of polymeric FtsZ to the inner membrane protein FtsA. FtsEX assists in converting FtsA from its polymeric to its monomeric form, which promotes the recruitment of FtsK, FtsQLB, FtsWI, and FtsN. On the left, the AlphaFold2 (AF2) model shows the interaction between FtsQLBWIN and FtsA2. Previous research suggested that monomeric FtsA recruits the divisome proteins, whereas the AF2 model indicates that dimeric FtsA could also play a role in this recruitment. In the center, the interactions between the transmembrane domains of FtsK and FtsQLB are shown, along with the long linker and DNA-binding domain of FtsK. This interaction likely occurs before the recruitment of FtsN, preventing DNA entrapment during division. On the right, the AF2-predicted elongasome complex is displayed. For more detailed depictions of the divisome and elongasome complexes, see Figure 8—figure supplement 2 and Figure 8—figure supplement 3, respectively. Notations: PG, peptidoglycan; P, periplasm; C, cytoplasm. All represented protein structures are AF2 predictions. Uniprot codes used for AF2: ftsA: Q02KT7, ftsB: A0A0H2ZE93, ftsE: A0A0H2ZGN1, ftsH: A0A0H2ZC79, ftsI: A0A0H2ZFM0, ftsK: P46889, ftsQ: A0A0H2ZGP2, ftsN: P29131, ftsW: A0A0H2ZGG8, ftsY: A0A0H2ZKT5, ftsZ: A0A0H2ZM25, mrdA: P0AD65, mrdB: P0ABG7, mreB: P0A9X4, mreC: P16926, mreD: P0ABH4, rodZ: P27434.

The crystal structure of Thermotoga maritima FtsA bound to the C-terminal helix of FtsZ has been solved (4A2A), but it lacks the N-terminal GTPase domain of FtsZ and the long unfolded linker that connects the two FtsZ domains. AF2 allowed us to predict the FtsA-FtsZ binary complex, including the interface between the GTPase domain of FtsZ and FtsA that is absent from the crystal structure. After testing multiple stoichiometries, we found that trimeric and tetrameric assemblies of FtsA and FtsZ were the most confident states based on the ipTM score. The FtsA4-FtsZ3 complex shows the C-terminal region of FtsZ bound in the pockets formed between two FtsA monomers (Figure 8).
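Choosing among stoichiometries, as described above, amounts to running one prediction per candidate copy number and ranking the runs by ipTM. The sketch assumes the ipTM of each run has already been extracted into a dictionary keyed by (FtsA copies, FtsZ copies); the function names are hypothetical and no real scores are included.

```python
from typing import Dict, List, Tuple

# (copies of FtsA, copies of FtsZ) for one AF-Multimer run
Stoichiometry = Tuple[int, int]


def rank_by_iptm(iptm_by_state: Dict[Stoichiometry, float]) -> List[Stoichiometry]:
    """Return tested stoichiometries from most to least confident by ipTM."""
    return sorted(iptm_by_state, key=lambda state: iptm_by_state[state], reverse=True)


def label(state: Stoichiometry) -> str:
    """Format a stoichiometry in the FtsA4-FtsZ3 style used in the text."""
    n_ftsa, n_ftsz = state
    return f"FtsA{n_ftsa}-FtsZ{n_ftsz}"
```

Calling `label(rank_by_iptm(scores)[0])` on the per-run scores names the top state. When several states score close together, as the trimeric and tetrameric assemblies above suggest, it is worth inspecting the top few rather than only the maximum.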

Although FtsZ plays a central role in cell division, divisome assembly depends on the recruitment of multiple scaffold proteins and is influenced by the polymerization states of FtsA and FtsZ. Moreover, some essential proteins, such as FtsN and FtsX, were not included in our essential interactome because they were identified as essential in only one species, E. coli. To better understand the cell division process, we included these proteins in our models. We also obtained a high-confidence model of the experimentally unsolved FtsEX complex, an ABC transporter involved in coordinating PG synthesis and hydrolysis and in recruiting divisome proteins (Figure 8—figure supplement 1; Mahone and Goley, 2020). Recent studies suggest that FtsEX acts on FtsA, promoting the transition from polymeric to monomeric FtsA, which in turn activates the constriction pathway through its interaction with FtsN (Hayashi et al., 2014; Mahone and Goley, 2020). Unfortunately, our attempts to predict the interfaces between FtsEX and FtsA/FtsZ were unsuccessful. We also modeled the binary complexes FtsQB and FtsBL, which strongly support the formation of the FtsQLB complex. FtsLB adopts a helical coiled-coil conformation, while FtsQB shows the C-terminal domain of FtsB bound to FtsQ, consistent with other experimental findings (Figure 8—figure supplement 2; Vicente et al., 2006). In addition, we explored the interactions between FtsK and FtsQLB and found that their binding is primarily mediated by the N-terminal TM domains of FtsK and FtsQ (Figure 8). We also observed contacts between the C-terminal domain of FtsK and the periplasmic domains of FtsQLB. These findings suggest that FtsKQ could help connect chromosome segregation and PG synthesis, ensuring that DNA is not trapped during membrane constriction.

Our interactome highlights the central role of FtsW, which participates in multiple PPIs. As mentioned above, FtsW and FtsI form a well-studied glycosyltransferase-transpeptidase (GTase-TPase) pair involved in PG synthesis (Pichoff et al., 2019; Craven et al., 2022). The current model of membrane constriction proposes that FtsQLB localizes FtsWI to midcell and triggers the final steps of constriction, although the structure of this assembly has not been experimentally determined (Vicente et al., 2006). We obtained confident models of FtsW with FtsL and with FtsB, consistent with a model in which FtsQLB regulates FtsWI, as proposed in previous studies (Vicente et al., 2006). Finally, FtsN is an essential protein involved in initiating membrane constriction through interactions with the FtsQLB and FtsWI sub-complexes (Hayashi et al., 2014). We therefore extended our analysis to predict the structures of the FtsWIN and FtsQLBWIN complexes. As shown in Figure 8—figure supplement 2, the N-terminal helix of FtsN interacts with the TM helices of FtsW, while the helix and loop comprising residues 98–140 bind the C-terminal domain of FtsI. The SPOR domain of FtsN does not participate in protein interactions. In addition, we obtained only a low-confidence FtsQLBN model, suggesting that FtsN binds exclusively to FtsWI. Notably, superposing the FtsWIN and FtsQLBWI models showed that the SPOR domain of FtsN (in the FtsWIN model) occupies the same site on FtsWI as FtsLB (in the FtsQLBWI model). We therefore suggest that PG synthesis occurs when FtsQLB binds FtsWI and displaces the SPOR domain, which is then free to bind PG, facilitating the transport of the complex to regions where PG is required.
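Checking whether two partners occupy the same site on superposed models can be framed as comparing two sets of interface residues on FtsWI. A minimal sketch, assuming both models are already superposed on FtsWI with shared residue numbering and using a 5 Å atom-atom distance cutoff, which is a common convention rather than a value stated in the text. The data layout and function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: residue number -> atom coordinates for one chain or chain group.
Residues = Dict[int, List[Coord]]


def interface_residues(receptor: Residues, partner: Residues, cutoff: float = 5.0) -> Set[int]:
    """Receptor residues with any atom within `cutoff` Å of any partner atom."""
    partner_atoms = [atom for atoms in partner.values() for atom in atoms]
    return {
        res_id
        for res_id, atoms in receptor.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def site_overlap(site_a: Set[int], site_b: Set[int]) -> float:
    """Jaccard index of two binding sites on the same receptor (0 = disjoint, 1 = identical)."""
    union = site_a | site_b
    return len(site_a & site_b) / len(union) if union else 0.0
```

Applied to the FtsWI residues contacted by the SPOR domain (FtsWIN model) and by FtsLB (FtsQLBWI model), an overlap close to 1 would support a shared site. The all-pairs distance loop is fine for a sketch; a spatial index would be preferable for full-size complexes.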

Complexes involved in cell elongation

The elongasome forms when the actin-like protein MreB polymerizes and recruits various proteins of the Mre and Rod families, which are critical for maintaining the shape of rod-shaped bacteria such as E. coli (Hayashi et al., 2014; Sjodt et al., 2020). In these bacteria, elongation and cell division are closely coordinated to avoid changes in shape that may affect cell survival (van Teeseling, 2021). The elongasome and divisome share important similarities: in both, the polymerization of an actin-like protein (MreB and FtsA, respectively) signals the assembly of membrane-associated protein complexes anchored in the IM (van Teeseling, 2021). These proteins form dynamic filaments with an actin-like nucleotide-binding domain that hydrolyzes ATP to initiate polymerization (van Teeseling, 2021). Both complexes also contain specific GTase-TPase sub-complexes that polymerize and cross-link glycan chains: FtsWI in the divisome and MrdAB in the elongasome. However, whereas MrdAB is found mainly in the lateral wall and at midcell, FtsWI is localized at the division septum (Szwedziak and Löwe, 2013). Despite these similarities, the two complexes differ structurally in several ways. The divisome contains the tubulin-like protein FtsZ, which assembles into a ring-like structure and recruits several Fts proteins, such as FtsWI, FtsEX, FtsQLB, FtsK, and FtsN (Hayashi et al., 2014). In contrast, the elongasome contains the actin-like protein MreB, which forms membrane-attached patches and interacts with proteins such as RodZ, MreBCD, and MrdAB (Graham et al., 2021). Moreover, although MreB is undoubtedly an essential component of the elongasome, its specific function remains unclear (Sjodt et al., 2020).

Based on biochemical and interaction studies and on the confidence of the binary complexes, we modeled the elongasome incorporating MreBCD and MrdAB (Figure 8; Graham et al., 2021). Several studies have revealed connections between MreC and MreD, MrdA and MrdB, and MreB and MreC, emphasizing the central role of MreB (Graham et al., 2021; Liu et al., 2020; Banzhaf et al., 2012), which forms filament-like oligomers on the cytoplasmic leaflet of the IM and recruits elongasome proteins (Pichoff et al., 2019). The predicted elongasome model suggests direct interactions between the MreB filament and the TM domains of MrdAB, but not with the other accessory proteins (Figure 8, Figure 8—figure supplement 3). The model also incorporates the MreCD-RodZ sub-complex, which is crucial for maintaining bacterial morphology. The cytoplasmic N-terminal domain of RodZ, which contains a helix-turn-helix motif, likely mediates interactions with MreB, while its C-terminal domain may interact with periplasmic proteins to regulate bacterial morphology. The two sub-complexes are expected to interact with each other through their TM domains, likely via MrdB and MreD, and through the periplasmic domains of MrdA and MreC (Figure 8—figure supplement 3). These findings suggest that the cytoplasmic regions of MreB first recruit the MrdAB GTase-TPase sub-complex, after which MreCD-RodZ binds to MrdAB. Interestingly, the overall arrangement of the elongasome model resembles that of the divisome sub-complex FtsQLBWI. For instance, the contacts between the periplasmic domains of MreC and MrdA in the elongasome resemble the interactions between FtsB and FtsI in the divisome, and the binding between the TM domains of MreCD and MrdB may play a role comparable to the interactions of FtsQLB with FtsW in the divisome.

Complexes involved in DNA replication

DNA replication involves the duplication of DNA during cell division to pass it on to the next generation. This intricate process is divided into three steps: initiation, elongation, and termination, which are carried out by conserved and dynamic protein machineries called replisomes. Despite progress made in characterizing the architecture of prokaryotic replisomes, the highly dynamic nature of replication makes the structural characterization challenging (van der Ploeg et al., 2013; Reyes-Lamothe et al., 2010).

The replication initiator protein DnaA self-oligomerizes in the presence of ATP at the replication origin (oriC) (Xu and Dixon, 2018). This facilitates the formation of a DNA bubble, enabling helicase loading and recruitment of the DNA polymerase III complex (Reyes-Lamothe et al., 2010). First, the DnaBC complex, comprising 12 subunits, inhibits the unwinding of double-stranded DNA. Subsequent binding of the DnaG primase to DnaB promotes the dissociation of DnaC, resulting in DNA unwinding (Reyes-Lamothe et al., 2010). Experimentally solved structures of the DnaBC complex are available (6KZA), but data on oligomeric DnaA or on DnaBG interactions are limited, as these can vary with bacterial species, cell cycle stage, and the presence of ATP or ADP (Reyes-Lamothe et al., 2010; Xu and Dixon, 2018). Previous studies have suggested that high concentrations of ATP-DnaA are required for DnaA to adopt a helical filament-like structure that fully engages oriC. In our AF2 model of tetrameric DnaA, the monomers are arranged in a bent filament, with domain III of each monomer interacting in a head-to-tail manner and domain IV facing the DNA (Figure 9—figure supplement 1; Xu and Dixon, 2018; Katayama et al., 2017). Unfortunately, we were unable to obtain larger oligomers or highly reliable models of DnaG bound to DnaBC. One possible explanation is that these interactions require a DNA molecule or accessory proteins such as DiaA.

DNA elongation is facilitated by the DNA polymerase III holoenzyme, which is a complex composed of three sub-complexes: the αεθ polymerase core, the β2 sliding clamp, and the δτnγ3-nδ'ψχ clamp loader (Reyes-Lamothe and Sherratt, 2019). Detailed structural insights into these subassemblies have been obtained through cryo-EM studies, shedding light on their underlying mechanisms. However, modeling these large and dynamic complexes is challenging, especially in the absence of DNA molecules. Despite these inherent limitations, we identified an intriguing unresolved complex involving the interaction between the sliding clamp DnaN and DNA polymerase I (Figure 9a). The existence of this interaction suggests that DnaN may serve as a recruiter for DNA polymerase I at the replication fork, facilitating its attachment to the DNA. This finding highlights the crucial role of DnaN in coordinating the activities of multiple polymerases at the replication fork, thereby ensuring the efficiency and accuracy of DNA synthesis (Reyes-Lamothe et al., 2010).

Figure 9.
Complexes involved in DNA replication and synthesis.

(a) Predicted interface between DNA polymerase I (PolA) and DnaN2. (b) Models of GyrAB and GyrA-FolP (top). Close-up view of the GyrA-FolP interface and comparison with the crystal structure of FolP (bottom; 1AJ0). The notable difference between the two structures is the loop region spanning residues 22–36, indicated in yellow/blue. (c) Predicted binary complexes DnaBI and DnaBC. The DnaBC predicted model is aligned to the solved crystal structure 6KZA (Figure 2—figure supplement 1). (d) Close-up view of the AlphaFold2 (AF2) predicted interface between NrdE and NrdF, highlighting important aromatic residues and cysteines involved in nucleotide reduction. Uniprot codes used for AF2: DnaB (DnaBI): A0A062WMW9, DnaB (DnaBC): P0ACB0, DnaC: P0AEF0, DnaI: Q8CWP7, DnaN: P0A988, GyrA: P0AES4, GyrB: P0AES6, FolP: P0AC13, NrdE: A0A0B7LYQ0, NrdF: A0A062WM39.

During DNA replication, gyrases and topoisomerases IV form heterotetramers (GyrA2B2, ParC2E2) that modulate DNA topology by transiently cutting one or both DNA strands (Fijalkowska et al., 2012; Badshah and Ullah, 2018). Interestingly, we have discovered a potential connection between type II topoisomerases and the folate metabolism, facilitated by the GyrA-FolP interaction. As illustrated in Figure 9b, FolP and the C-terminal domain of GyrB share a similar interface with GyrA, indicating that FolP might compete with GyrB, thus exerting regulatory control over the complex. By exploring different stoichiometries, we have developed a model that suggests a complex comprising two GyrA and four FolP copies. When aligning our model with the FolP crystal structure bound to its substrate (1AJ0; Figure 9b, bottom), we observed a significant difference in the loop region spanning residues 22–36. In our model, this loop obstructs the catalytic site, whereas in the experimentally resolved structure, the pocket is accessible. This rearrangement of the loop, likely induced by the presence of the substrate, may be crucial in facilitating its interaction with GyrA while impeding its interaction with GyrB. Although the exact nature and significance of the interplay between these complexes remain incompletely understood, it is conceivable that this interaction plays a role in regulating DNA topology and preserving genome stability, given the vital role of folate metabolism in nucleotide synthesis.

Our analysis of the Gram-positive interactome reveals a substantial representation of both topoisomerases and replisome proteins. Notably, we identified an interaction specific to Gram-positive bacteria between the replication initiator DnaB and DnaI in Bacillus subtilis and Streptococcus pneumoniae. This PPI is absent in Gram-negative bacteria, which lack a DnaI homolog and regulate replication initiation through a different mechanism (Hooper and Jacoby, 2016). In certain Gram-positive bacteria, DnaI interacts with DnaB, helping to coordinate DNA replication initiation with the activities of the replication machinery. The predicted interface reveals close contacts between the N-terminal region of DnaI and the C-terminal domain of DnaB, resembling the structure of DnaBC (Figure 9c). Furthermore, our analysis predicts highly reliable binary interactions involved in DNA synthesis (nrdEF) and transcription (rpoCZ, rpoC-greA, and rpoC-sigA). While the subunits of the DNA-dependent RNA polymerase have been extensively characterized, with cryo-EM structures available at high resolution, no high-resolution structure of the binary complex formed by the two components of the ribonucleotide reductase enzyme (NrdEF) has been determined. The predicted interface highlights the importance of the C-terminal loop of NrdF, where a “thumb motif” containing two phenylalanine residues interacts with four tyrosines in the catalytic site of NrdE, probably to stabilize the nucleotide substrate (Figure 9d). These findings agree with previous studies proposing that a thiyl radical forms on Cys382 and that the nucleotide is reduced through the cooperation of two cysteines in the catalytic pocket, Cys172 and Cys409, which act as reducing agents (Jameson and Wilkinson, 2017).

Complexes involved in the synthesis of ubiquinone

Ubiquinone, also known as coenzyme Q, plays a vital role in the electron transport chain, driving ATP synthesis in numerous organisms. In E. coli, ubiquinone is synthesized from the precursors chorismate and octaprenyl diphosphate through a series of enzymatic steps performed by ubiquinone biosynthesis (Ubi) proteins (Figure 10a; Thomas et al., 2019). While some Ubi proteins function independently, the final six reactions are performed by the Ubi metabolon (UbiE-I). This metabolon comprises three hydroxylases (UbiI, UbiH, and UbiF) and two methyltransferases (UbiG and UbiE) (Abby et al., 2020). The metabolon enhances catalytic efficiency by organizing sequential enzymes of the same metabolic pathway and by encapsulating reactive ubiquinone intermediates, thereby protecting against oxidative damage (Abby et al., 2020). However, the overall structure of this obligatory Ubi metabolon remains poorly defined. Two accessory factors, UbiJ and UbiK, are also present; UbiJ binds ubiquinone and other lipids non-specifically. How octaprenylphenol exits the membrane and binds UbiJ in the soluble Ubi complex (potentially facilitated by UbiB), and how the final product is transported to the membrane, remain unclear.

Figure 10.
Organization of the Ubi metabolon.

(a) Simplified ubiquinone synthesis pathway from 4-HB. 4-HB: 4-hydroxybenzoic acid, OPP: octaprenyl diphosphate. (b) Architecture of the Ubi metabolon. The numbers indicate the six reactions carried out by the Ubi metabolon, and the arrows depict the path followed by the lipid intermediate transported by UbiJ. In the first step, UbiJ shields the lipid intermediate and binds to UbiI, which catalyzes the first reaction. In the following steps, the flexible UbiJ transports the biosynthetic intermediates to the next enzyme. (c) AlphaFold2 (AF2) model of the Ubi metabolon. Uniprot codes used for AF2: ubiA: P0AGK1, ubiE: P0A887, ubiF: P75728, ubiG: P17993, ubiH: P25534, ubiI: P25535, ubiJ: P0ADP7, ubiK: Q46868.

Through our analysis, we have identified high-confidence binary complexes involved in consecutive enzymatic steps, supporting the existence of the Ubi metabolon complex. Furthermore, we have predicted the UbiE-K assembly, shedding light on the structural arrangement of this previously unexplored metabolon. Based on the predicted interfaces, UbiE and UbiH interact with UbiG and UbiI to form a heterotetramer. In addition, UbiF seems to interact only with UbiI (Figure 10b and c). Additionally, the accessory proteins UbiJ and UbiK adopt a coiled-coil structure, which suggests their association with the membrane to facilitate the delivery of ubiquinone. Moreover, the SCP2 domain of UbiJ creates a lipophilic environment that accommodates lipid intermediates within the Ubi complex, consistent with previous findings (Hajj Chehade et al., 2019). Our model further suggests that the presence of two α-hairpin domains in UbiJ facilitates its interaction with UbiK, with the loops assisting the movement of the SCP2 domain between different subunits. The initial reaction catalyzed by the metabolon is likely initiated by the interaction between UbiJ and UbiI (Abby et al., 2020; Hajj Chehade et al., 2019). Subsequently, the lipid intermediate is sequentially transported to UbiG, UbiH, UbiE, UbiF, and ultimately to UbiG to catalyze the final reaction (Figure 10b and c). Interestingly, the initial reaction involves a hydroxylase, succeeded by a methyltransferase, and this process is reiterated once, ultimately concluding with another hydroxylase. Additionally, the three hydroxylases share a very similar structure, and likewise, the two methyltransferases also display structural homology. It should be noted that the quaternary structure of our model suggests the possibility of Ubi subunit polymerization, as it deviates significantly from the 1 MDa Ubi metabolon suggested by Abby et al., 2020. 
This initial model of the complete Ubi metabolon provides valuable insights into the complex’s mechanism, emphasizing the role of UbiJ in transporting lipid intermediates between different subunits.

Conclusion

Advances in deep learning are poised to revolutionize many fields of the life sciences, particularly structural bioinformatics. Comprehensive interactomes hold great promise for identifying potential targets for the discovery of novel antibiotics. By combining deep-learning model confidence scores with interactome data, we can address the issue of high false-positive rates. The structural insights presented in this study shed light on the mechanisms underlying crucial biological processes in prokaryotes. Many of the complexes discussed here lacked prior structural characterization, making these findings valuable for structure-based drug discovery. To further enrich our interactomes, we could incorporate protein interaction data from other species or information about the quaternary structure of the complexes. We expect that continued training of deep-learning models on larger datasets will yield more accurate and confident protein complex models in the near future.
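One way to use model confidence against false positives is to calibrate an ipTM cutoff on pairs presumed not to interact, such as the random essential-protein pairs described in the Methods. The sketch assumes a target false-positive rate (`max_fpr`, an illustrative parameter) and picks the lowest cutoff that keeps at most that fraction of random pairs above it. The criterion actually used to set the cutoff is not given in this section, so this is one possible approach, not the authors' procedure.

```python
from typing import Sequence


def iptm_cutoff_for_fpr(random_iptm: Sequence[float], max_fpr: float = 0.05) -> float:
    """Lowest ipTM cutoff such that at most a fraction `max_fpr` of random
    (presumed non-interacting) pairs score strictly above it."""
    if not random_iptm:
        raise ValueError("need at least one random-pair score")
    scores = sorted(random_iptm, reverse=True)
    allowed = int(max_fpr * len(scores))  # random pairs tolerated above the cutoff
    if allowed >= len(scores):
        return 0.0  # every random pair may pass, so no filtering is needed
    # At most the `allowed` highest random scores exceed scores[allowed].
    return scores[allowed]


def fraction_above(scores: Sequence[float], cutoff: float) -> float:
    """Fraction of scores strictly above the cutoff (e.g. candidate PPIs retained)."""
    return sum(s > cutoff for s in scores) / len(scores) if scores else 0.0
```

Applying `fraction_above` to the scores of the candidate essential PPIs at that cutoff then shows how many predictions survive the filter at the chosen false-positive rate.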

We also acknowledge the limitations of the methodology used in this study. First, protein essentiality depends on the conditions under which bacteria are cultured. The essential proteins reported in the literature were identified in bacteria grown in rich medium, yet protein complexes are dynamic entities that can rearrange in response to changing conditions and cellular stress, so these interactions must be interpreted in the appropriate biological context. Second, studying isolated binary complexes may misrepresent the complete architecture, because accessory proteins are absent or the correct stoichiometry is omitted. Finally, the performance of the AF-Multimer algorithm tends to decrease as the number of chains increases and for heteromeric complexes. Homomeric structures typically possess internal symmetry, which yields identical interfaces between chains and consistent interface quality, whereas heteromeric complexes are more susceptible to variation in confidence scores because their interface regions are irregular. Despite these constraints, AF2 showed remarkable accuracy in modeling bacterial protein-protein complexes, generating high-confidence models for almost 90% of the complexes tested. Overall, our results provide an initial description of the essential interactome that can help researchers better understand the fundamental processes of bacterial cells. As more data become available and new methods improve the accuracy of protein multimer prediction, structural biology will greatly deepen our understanding of the cell interactome.

Methods

Compilation of essential proteins and processing the data

First, we compiled the essential proteins reported in previous studies for four Gram-negative species, Acinetobacter baumannii (Bai et al., 2021), E. coli (Baba et al., 2006; Gerdes et al., 2003; Goodall et al., 2018), Klebsiella pneumoniae (Ramage et al., 2017) and Pseudomonas aeruginosa (Liberati et al., 2006; Gallagher et al., 2011; Poulsen et al., 2019), and four Gram-positive species, Bacillus subtilis (Commichau et al., 2013), Clostridium difficile (Dembek et al., 2015), Staphylococcus aureus (Ji et al., 2001; Chaudhuri et al., 2009; Santiago et al., 2015) and S. pneumoniae (Liu et al., 2017) (Source data 1; Figure 1—figure supplements 6-7). In addition, we retrieved all synthetically lethal interactions found in E. coli K-12 BW25113 from the Mlsar database (Zhu et al., 2023a). Then, we mapped the UniProt ID, the locus tag and the gene name of each essential protein using the UniProt ID mapping tool, to keep a consistent annotation across all entries and to facilitate comparisons in subsequent mapping steps (Source data 1). We used eggNOG-mapper v2 (Launay et al., 2022) to retrieve the orthologs of all compiled proteins. Mapping orthologs allowed us to link proteins across species.
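
Linking essential proteins across species through orthology can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes that each species' essential proteins have already been annotated with an eggNOG orthologous group (OG) identifier, and the function name, input layout and toy identifiers are hypothetical.

```python
from collections import defaultdict


def essential_og_counts(og_by_species):
    """Count in how many species each orthologous group (OG) is essential.

    og_by_species maps a species name to {protein_id: og_id}, listing only
    that species' essential proteins (hypothetical input layout).
    """
    species_per_og = defaultdict(set)
    for species, mapping in og_by_species.items():
        for og_id in mapping.values():
            species_per_og[og_id].add(species)
    # Collapse each set of species into a count.
    return {og_id: len(species) for og_id, species in species_per_og.items()}


# Toy example with made-up protein and OG identifiers.
gram_negative = {
    "A. baumannii": {"prot_ab1": "OG1", "prot_ab2": "OG2"},
    "E. coli": {"prot_ec1": "OG1", "prot_ec2": "OG3"},
    "K. pneumoniae": {"prot_kp1": "OG1"},
    "P. aeruginosa": {"prot_pa1": "OG3"},
}
counts = essential_og_counts(gram_negative)
print(counts)  # {'OG1': 3, 'OG2': 1, 'OG3': 2}
```

Working with shared OG identifiers rather than species-specific UniProt accessions is what allows the same protein, and later the same interaction, to be recognised in different species.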

To retrieve the essential PPIs, we used the ‘Multiple protein’ search on the STRING database v11.0 website (Szklarczyk et al., 2021; https://version-11-0.string-db.org). We selected interactions with a high-confidence score (combined score > 0.7) and/or those supported purely by experimental data (experimental score > 0.15), and downloaded the short version of the output, which contains only one-way edges. Because the networks downloaded from STRING can also include interactions involving non-essential proteins, we filtered these out. In addition, to increase confidence in the selected essential interactions, we shortlisted the Gram-negative and Gram-positive PPIs identified in at least two of the four species in each group. Finally, ribosome-related proteins and tRNA ligases were discarded, because they form very large multiprotein complexes and/or are too large to be predicted by AF2 in our setup. In total, 722 Gram-negative and 680 Gram-positive essential PPIs were modeled. Furthermore, 722 Gram-negative and 680 Gram-positive random pairs of essential proteins were generated to test whether AF2 can discriminate between high-accuracy and incorrect folds and to define an ipTM score cutoff. We verified that none of the randomly generated PPIs was present in the positive dataset.
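
The filtering steps above can be sketched in Python as a hedged illustration, not the authors' pipeline. It assumes STRING scores are loaded as floats on a 0-1 scale and that proteins are expressed as identifiers comparable across species (for example orthologous groups); all function names and toy identifiers are hypothetical, and a single essential set is shared by both toy species for brevity. The thresholds (combined > 0.7, experimental > 0.15, at least two species) come from the text, and the random-pair sampler mirrors the check that no negative pair occurs in the positive set.

```python
import random
from collections import defaultdict
from math import comb


def filter_edges(edges, essential, combined_min=0.7, experimental_min=0.15):
    """Keep STRING edges between two essential proteins that pass either cutoff.

    edges: iterable of (protein_a, protein_b, combined, experimental) tuples,
    with scores assumed to be on a 0-1 scale. Returns unordered pairs.
    """
    kept = set()
    for a, b, combined, experimental in edges:
        if a == b or a not in essential or b not in essential:
            continue  # drop self-pairs and edges touching a non-essential protein
        if combined > combined_min or experimental > experimental_min:
            kept.add(frozenset((a, b)))
    return kept


def conserved_pairs(pairs_by_species, min_species=2):
    """Return the pairs observed in at least `min_species` species."""
    seen = defaultdict(int)
    for pairs in pairs_by_species.values():
        for pair in pairs:
            seen[pair] += 1
    return {pair for pair, n in seen.items() if n >= min_species}


def random_pairs(proteins, positives, n, seed=0):
    """Sample n random pairs of essential proteins absent from the positive set."""
    pool = sorted(proteins)
    universe = set(pool)
    n_pos = sum(1 for p in positives if p <= universe)
    if n > comb(len(pool), 2) - n_pos:
        raise ValueError("not enough candidate negative pairs")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(pool, 2))
        if pair not in positives:  # never reuse a positive interaction
            negatives.add(pair)
    return negatives


# Toy example (hypothetical identifiers; "X" is non-essential).
essential = {"A", "B", "C", "D"}
edges_sp1 = [("A", "B", 0.95, 0.0), ("A", "C", 0.40, 0.20), ("B", "X", 0.99, 0.90)]
edges_sp2 = [("A", "B", 0.80, 0.0), ("C", "D", 0.50, 0.05)]
by_species = {
    "species_1": filter_edges(edges_sp1, essential),
    "species_2": filter_edges(edges_sp2, essential),
}
positives = conserved_pairs(by_species)
negatives = random_pairs(essential, positives, n=2)
```

Here A-C passes only through its experimental score and is kept in species_1, but it is dropped by the two-species requirement, leaving A-B as the single conserved positive.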

Compilation of experimentally solved PPIs not included in the training dataset of AlphaFold 2.3.1

We compiled all bacterial protein complexes from the PDB (accessed on 2023-09-15) that were not included in the training set of AF v2.3 (complexes released until 2021-09-30). We selected heterodimers released after 2021-09-30 and determined by either X-ray crystallography or cryo-EM at a resolution of 2 Å or better. We then grouped the polymer entities by UniProt accession, retrieving a total of 425 structures. To eliminate redundancy, we clustered these structures using the ‘easy-cluster’ utility of Foldseek with an alignment coverage cutoff of 0.9 and kept one representative structure per cluster, resulting in 304 representative structures. Next, we used the ‘easy-complexsearch’ module of Foldseek to align these structures against the AF training set and retained only those with a sequence identity below 30% to complexes in the training set, obtaining a total of 140 low-homology structures. We calculated TM-scores with the TM-align package (https://zhanggroup.org/TM-align/), and DockQ and iRMS scores with the ‘DockQ.py’ script (https://github.com/bjornwallner/DockQ; Wallner, 2016; Basu and Wallner, 2016).
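
For readers unfamiliar with DockQ, the score combines the fraction of native contacts (Fnat), the interface RMSD (iRMS) and the ligand RMSD (LRMS) into a single value between 0 and 1 (Basu and Wallner, 2016). The sketch below reproduces that published formula, with scaling distances of 1.5 Å for iRMS and 8.5 Å for LRMS and the usual quality bins, from three precomputed quantities. It is only an illustration: the ‘DockQ.py’ script also performs the superposition and contact counting that produce those inputs, and the function names are hypothetical.

```python
def scaled_rms(rms, d0):
    """Map an RMSD in Å to (0, 1]; equals 0.5 when rms == d0."""
    return 1.0 / (1.0 + (rms / d0) ** 2)


def dockq(fnat, irms, lrms):
    """DockQ from precomputed Fnat, iRMS (Å) and LRMS (Å)."""
    return (fnat + scaled_rms(irms, 1.5) + scaled_rms(lrms, 8.5)) / 3.0


def dockq_class(score):
    """Quality bins commonly used with DockQ."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


score = dockq(fnat=0.5, irms=1.5, lrms=8.5)
print(score, dockq_class(score))  # 0.5 medium
```

Because each RMSD term saturates instead of growing without bound, a single badly placed loop cannot push DockQ below zero, which makes the score convenient for averaging over a benchmark such as the 140 complexes described above.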

Prediction of binary protein complexes and interactomes

We used AlphaFold v2.3.1 (https://github.com/deepmind/alphafold; Jumper et al., 2022) to predict the structures of our essential PPIs. AF2 was installed locally on a cluster with the following node configuration: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz and an NVIDIA GeForce RTX 3080 Ti GPU. The database versions used for the predictions were: UniRef90 v2022_01, MGnify v2022_05, Uniclust30 v2021_03, BFD (the only version available), PDB (downloaded on 2023-01-10) and PDB70 (downloaded on 2023-01-10). The FASTA files containing the sequences of the essential proteins were fetched from UniProt. To run AF-Multimer, we executed the Python script ‘run_alphafold.py’ on the FASTA files with the ‘model_preset=multimer’ flag. From the five predicted models, whose scores are stored in the ‘ranking_debug.json’ file, we retrieved the model with the best ipTM score and computed its pDockQ and pDockQ2 scores (Akdel et al., 2022; Bryant et al., 2022b). The PPIs and scores were collected in tabular format (Source data 1) and imported into Cytoscape to build the essential interactomes (Figure 2). One protein partner was defined as the ‘Source node’ and the other as the ‘Target node’ to establish the interactions (undirected edges) between the proteins (nodes). The ipTM score was set as an ‘Edge attribute’ to map edge colors and widths to ipTM values. When possible, models were compared with experimental structures deposited in the PDB.
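
Model selection and pDockQ scoring can be sketched in Python as follows, under stated assumptions rather than as the authors' scripts. The sketch assumes the AF-Multimer layout of ‘ranking_debug.json’, in which an "iptm+ptm" entry maps each model name to its ranking confidence (0.8·ipTM + 0.2·pTM); if raw ipTM values are collected from the per-model outputs instead, any {model: score} mapping can be passed under another key. The pDockQ sigmoid uses the constants published by Bryant et al. (2022); the function names are hypothetical.

```python
import json
import math


def load_ranking(path):
    """Read a ranking_debug.json file into a dict."""
    with open(path) as fh:
        return json.load(fh)


def select_best(ranking, key="iptm+ptm"):
    """Return (model_name, score) with the highest score under `key`.

    The default key is an assumption about the AF-Multimer output layout.
    """
    scores = ranking[key]
    name = max(scores, key=scores.get)
    return name, scores[name]


def pdockq(interface_plddt, n_contacts):
    """pDockQ sigmoid (Bryant et al., 2022).

    interface_plddt: pLDDT values of interface residues; n_contacts: number of
    inter-chain contacts (Cb atoms within 8 Å in the original definition).
    """
    if n_contacts == 0:
        return 0.0  # no interface, so no docking confidence
    x = (sum(interface_plddt) / len(interface_plddt)) * math.log10(n_contacts)
    return 0.724 / (1.0 + math.exp(-0.052 * (x - 152.611))) + 0.018


ranking = {"iptm+ptm": {"model_1": 0.62, "model_2": 0.81, "model_3": 0.47}}
print(select_best(ranking))  # ('model_2', 0.81)
```

pDockQ is bounded between about 0.018 and 0.742, and it rises with both interface confidence and interface size.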

Protein interface and surface analysis

We analyzed the interfaces with the ‘GetInterfaces.py’ Python script from the Oxford Protein Informatics Group (OPIG; Krawczyk, 2013) to obtain interacting and interface residues, using a contact distance of 4.5 Å and an interface distance of 10 Å. Surface residues were identified with the findSurfaceAtoms PyMOL function using a cutoff of 6.5 Å² (de Groot et al., 2020). Per-residue conservation scores were computed using VESPA (Cantalapiedra et al., 2021), whose scores range from 1 (most variable) to 9 (most conserved). Solvent-accessible surface area (SASA) was computed using the FreeSASA Python module (Mitternacht, 2016). Statistical analyses were carried out using R v4.2.1 and Python v3.9. Molecular graphics were generated with PyMOL.
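
The contact criterion can be illustrated with a minimal NumPy sketch. This is not ‘GetInterfaces.py’: it simply assumes that a residue is interacting when any of its atoms lies within the 4.5 Å contact distance of any atom of the partner chain, and it takes plain coordinate arrays and per-atom residue identifiers (a hypothetical input layout).

```python
import numpy as np


def interface_residues(coords_a, res_a, coords_b, res_b, cutoff=4.5):
    """Residues of chains A and B with any atom within `cutoff` Å of the other chain.

    coords_*: (N, 3) atom coordinates; res_*: one residue identifier per atom.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    # Full atom-atom distance matrix, shape (len(A), len(B)).
    dists = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    close = dists <= cutoff
    hits_a = {res_a[i] for i in np.flatnonzero(close.any(axis=1))}
    hits_b = {res_b[j] for j in np.flatnonzero(close.any(axis=0))}
    return hits_a, hits_b


# Toy chains: one atom of residue 1 (chain A) lies 3 Å from residue 10 (chain B).
coords_a = [[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
coords_b = [[3.0, 0.0, 0.0], [50.0, 0.0, 0.0]]
contacts = interface_residues(coords_a, [1, 2], coords_b, [10, 11])
print(contacts)  # ({1}, {10})
```

Calling the same function with `cutoff=10.0` yields a broader residue set analogous to the 10 Å interface distance, assuming the script applies the same any-atom rule. The dense distance matrix is adequate for binary complexes; a spatial index would scale better for very large assemblies.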

Data availability

All models described in this paper are available on ModelArchive (https://modelarchive.org, dataset ID: ma-sysbio-bei) with accession codes in Table 1. The scores of selected and random binary PPIs and the annotations of the essential proteins are provided in Source data 1.

The following data sets were generated
    1. Gómez Borrego J
    2. Torrent Burgas M
    (2024) ModelArchive
    Structural assembly of the bacterial essential interactome.
    https://doi.org/10.5452/ma-sysbio-bei

References

    1. Goodall ECA
    (2018)
    The essential genome of Escherichia coli K-12
    mBio 9:e02096-17.

    1. Xu ZQ
    2. Dixon NE
    (2018)
    Bacterial replisomes
    Current Opinion in Structural Biology 53:159–168.
    https://doi.org/10.1016/j.sbi.2018.09.006

Decision letter

  1. Alan Talevi
    Reviewing Editor; National University of La Plata, Argentina
  2. Volker Dötsch
    Senior Editor; Goethe University, Germany

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

https://doi.org/10.7554/eLife.94919.sa1

Author response

Reviewer #1

The paper provides models of essential complexes formed in bacteria. These models have been predicted by AlphaFold2 and in some of the models, information from existing experimental structures is utilized. The predicted models have been calculated based on standard workflow procedures which are explained in detail and can be reproduced by others. The figures are informative and clear.

We are grateful for the reviewer's insightful comments, which have significantly contributed to improving our manuscript.

Suggestions for improvement:

a. The PDB accession codes of the experimental structures should be provided.

b. A comparison of the predicted models with the experimental structures should be provided (e.g., same orientation, superposition). In Figure 6 for example, a figure with superposition or use of the same orientation would be more informative.

As suggested by the reviewer, we have included a new table (Table 1) listing all experimental structures discussed in the main text, with the corresponding PDB codes. All predictions are listed in Source data 1. For instances with available PDB codes, we compared the predicted structures with the experimental ones (new Figure 2—figure supplement 1). In Figure 6, the structures were difficult to superimpose because the subunits in the complexes have different relative orientations. To help compare both models, we have added a schematic representation (new Figure 6c).

The paper will certainly generate many hypotheses based on the predicted models. In this respect, it would be useful for a wide audience in the bioscience field. However, the discussed models will need experimental verification by various techniques, such as X-ray crystallography, cryo-EM, SAXS, and structural proteomics. A more thorough analysis of the literature may help to improve the paper in this respect.

We acknowledge that we may have under-emphasized this aspect in our manuscript and understand the need to validate the predictions to determine the general validity of our AlphaFold (AF2) models. To address this issue, we conducted a thorough validation using 140 bacterial protein-protein complexes from the PDB. This dataset contains all structures published after the last release of AF2 that share less than 30% sequence identity with the complexes in the AF2 training set. Hence, this dataset serves as the strongest possible test of AF2's prediction accuracy for bacterial complexes. We found that 81% (113 out of 140) of these structures were accurately predicted by AF2 according to our criterion (ipTM > 0.6). Of all models generated, 83% (116 out of 140) were almost identical to the actual structures in terms of correct folding (TM-score > 0.8). Most interestingly, 72% (101 out of 140) of the predicted structures were acceptable in terms of root mean square deviation at the interaction interface (i-RMSD < 4 Å), and 56% (79 out of 140) were virtually identical to the real structures (i-RMSD < 2 Å), highlighting the excellent predictive power of AF2. This high precision likely reflects the vast number of bacterial sequences available in databases, from which coevolutionary relationships among protein residues can be inferred with great accuracy. These results provide strong support for our findings. We have included all this new data in the revised manuscript, both in the main text (Results and Discussion, page 3) and in Supplementary file 1, which contains all the results.

Also, we investigated whether our complexes have known crosslinks in the xlinkdb database (https://xlinkdb.gs.washington.edu/xlinkdb/). We also searched the literature for further datasets that include crosslinking data and found an additional study that was potentially useful to validate our results (Lenz et al., Nat. Commun., 2021). Overall, we found information for 14 bacterial complexes. In all cases but one, the distance restraints identified (crosslinked lysines are ~15-20 Å apart) are compatible with our models. Hence, although the overlap between these datasets and our list of validated interactions was not extensive, our models were consistent with the experimental data for the complexes that did match. We have incorporated these results into the main text (Results and Discussion, page 3) and Source data 1, which contains all the results.

Finally, we identified several mutations in BirA, documented in the literature, that affect its interaction with AccB. The BirA mutations M310L and P143T were found to induce a superrepressor phenotype (BirA lacks the capacity to biotinylate AccB). These mutations do not significantly affect the BirA active site but can destabilize the BirA-AccB interface (Chakravartty and Cronan, J. Bacteriol., 2012), supporting our results. We have added this information to the main text (page 8).

Reviewer #2

This study attempts to identify the 'essential interactome' by combining presence/absence genomics across bacteria, information in the STRING database, and predictions from AlphaFold. Overall, the strategy is clear, and I do not have concerns about reproducibility and clarity.

We value the reviewer's constructive evaluation of our manuscript, and we would like to thank the reviewer's feedback as it has significantly helped us in improving our manuscript.

Strengths: Clever approach to get at the essential interactome.

Weaknesses: Putative impact. It is clear why understanding which interactions are present is important. But even as the authors suggest, interactions are dynamic and there are plenty of other tools that people could use to find interactions (including AA Coev that the authors themselves cite). The counterargument the authors bring up is the high false positive rate of interactions, which is solved by this method. While true, the stringency criteria for what constitutes an interaction in this paper are remarkably high: each protein within the interaction needs to be essential and needs to have a high confidence score in STRING, and then there is a hyperparameter that dictates the level at which AlphaFold 2 is providing confident answers. In this sense, this is less about an 'essential' interactome and more about an interactome with the highest true positive rate (trading off the ability to discover new interactions at a reasonable breadth).

  • We appreciate the reviewer's insights concerning the stringency criteria for defining interactions. Here, we provide a detailed justification for our selection criteria and show how it aligns with our goal of identifying high-confidence interactions.

  • Protein essentiality: In our model, interactions are considered essential if, and only if, both proteins involved are essential, providing a conservative estimate for the essential interactome. In our revised manuscript, we explored the possibility for two non-essential proteins to form an essential interaction by investigating synthetically lethal interactions. Out of all synthetic lethal interactions in E. coli, only 28 interactions were identified, and only two could be modeled with an ipTM score > 0.6. Likely, these non-essential proteins operate in parallel or compensatory pathways instead of interacting directly. These findings lend support to our hypothesis and suggest that our interactome encompasses most essential interactions. These results were included in the revised manuscript.

Author response image 1
Author response table 1
Performance of state-of-the-art PPI prediction methods (Huang et al., 2023).

Method          AUPRC*
SGPPI           0.422
Profppikernel   0.359
PIPR            0.342
PIPE2           0.220
SigProd         0.264

*AUPRC denotes the average AUPRC value of 10-fold cross-validation.

It is clear from the data that such methods are not mature enough to be used as confident predictors. Hence, we decided to rely on validated interactions in the STRING database, one of the most comprehensive PPI databases. In this revised version, we have expanded our dataset to include all experimentally labelled interactions in STRING, even those with a low probability (experimental score > 0.15). These new interactions increased the total number of interactions tested from 1089 to 1402 and generated 38 new models for Gram-negative species (13 with high accuracy) and 275 new models for Gram-positive bacteria (18 with high accuracy). All interactions are now included in Source data 1, and high-accuracy models will be deposited on ModelArchive after acceptance.

  • AlphaFold (AF2) criterion for complex prediction. Although AF2 has its limitations, its accuracy in predicting bacterial complexes is consistently high. Additionally, as detailed in the response to Reviewer 1, we conducted a thorough validation of 140 bacterial protein-protein complexes from the PDB whose structures were published after the last release of AF2. AF2 predicted 81% of these complexes with high accuracy, in line with other reported studies (Evans et al., 2021; Yin et al., 2022). While there might be some minor deviations, AF2 can capture most of the bacterial essential interactome accurately. In the revised version, we compare the pDockQ and pDockQ2 metrics with our ipTM criterion for defining confident models. Both pDockQ and pDockQ2 identified highly reliable complexes, but they also disregarded actual complexes (Figure 1—figure supplements 1-2). Thus, we retained our initial ipTM-based criterion, which is consistent with other authors who used similar ipTM thresholds to model bacterial interactions (e.g., O’Reilly et al., 2023).

In summary, although our methodology has inherent limitations, we believe that our approach is sound and can give a comprehensive and realistic view of the bacterial essential interactome. We hope that these new insights further substantiate our approach.

I don’t know of too many studies that use AlphaFold 2 in this way. This was clever. However, there are plenty of studies that use phylogenomic information to infer interactions. In this sense, the core idea of the paper is not intrinsically novel.

We thank the reviewer for valuing our approach. Although other methods have been used to predict interactomes, our study, to the best of our knowledge, provides the first high-quality essential interactome for bacteria. We used experimental data (analysis of single deletion mutants) to define the essential interactions in bacteria. Other methods that use phylogenomic information and/or deep learning tools to infer interactions perform poorly, as illustrated in the preceding table. These methods often yield a high number of interactions and, in many cases, show a bias towards entries overrepresented in the positive databases used to train the predictors (Macho Rendón et al., 2022). Also, whereas other methods lack detailed structural insights into the interactions, we offer structural models for every interaction tested.

Overall, I do feel this would be worth publishing as an exposé of what AF2 is capable of. I'm not sure of the impact it will have on researchers, however.

We appreciate the reviewer's positive feedback on our manuscript. Using AF2, we identified key interactions starting only from gene deletion mutant data. This manuscript reveals new insights into the assembly of essential bacterial complexes, providing specific structural details to understand their stability and function. Additionally, our work seeks to establish a methodology applicable to all bacterial species, guiding future research in this field. The approach taken in this study may expand drug targeting opportunities and accelerate the development of more effective antibiotics aimed at disrupting these essential interactions. In conclusion, the impact of the paper lies in its novel use of AlphaFold2 to understand essential bacterial protein interactions, providing key insights into assembly mechanisms and identifying new potential drug targets.

Response to Reviewer #3

The selection of "essential" interactions is a bit arbitrary, given that their main criterion for selection is that both proteins are essential. Unfortunately, it's not always clear where the essential protein data is coming from. Authors cite Mateus et al. (ref 15) as source for E. coli, but I don't see an explicit list of essential genes in this paper (nor its supplement). For Pseudomonas the citation doesn't contain author information and for Acinetobacter essentiality only seems to refer to "essentiality" in the lung.

As a minimum, the author should provide a table with summary statistics for the essential proteins they are using, as this is the basis for the whole paper. Such a table should include the names of the species, the number of genes that are considered as essential, a very brief characterization of how essentiality was determined and the source for this information. For instance, are the genes listed in the Supplementary File congruent with the genes in the Database of Essential Genes (DEG) for these organisms? Finally, authors should indicate in that table which (essential) protein pairs are conserved across species, as this is another one of their selection criteria. Conservation is not necessary for an essential interaction, but it certainly makes it more likely.

We understand the reviewer's concerns regarding the selection of essential interactions and the need for a more thorough description of the sources of essential protein data. To address these concerns in the revised manuscript:

We included a clear explanation of the sources of the essential protein data, with proper citations for each organism, in Source data 1. The selected studies were primarily sourced from the DEG database; where data were unavailable, we searched the literature for relevant studies. The DEG database was last updated on September 1, 2020. A graphical summary of the datasets, showing the overlap between the different studies, has been included in Figure 1—figure supplements 6-7.

Also, the reviewer was right in pointing out that the Acinetobacter baumannii study was conducted in the lung, which may bias the results, as all other studies were performed in vitro. To solve this, we replaced this study with Bai et al., 2021, which was performed in rich medium.

Authors should also state whether they have verified that none of the random pairs are in the positive set.

We thank the reviewer for this comment. We certainly checked that none of the random pairs was present in the positive dataset. This clarification has now been added to the methods section.

This is also relevant because authors "retrieved all high-confidence PPIs between these proteins from the STRING database" which provides compound scores for interactions but that has often little to do with physical interactions (given that the scores factor in co-expression and several other criteria). In fact, I find STRING scores difficult to interpret for that very reason.

We appreciate the reviewer's comment regarding the use of combined interaction scores from the STRING database. We agree that STRING combined scores are somewhat difficult to interpret because they combine different types of evidence. We decided to use them to include interactions that may lack direct experimental evidence but are likely to occur according to other information (e.g., co-expression). However, to complete our interactome, the revised version also includes all interactions with experimental evidence in STRING. As mentioned in the response to Reviewer 2, we expanded the tested interactions from 1089 to 1402. This resulted in 38 new models for Gram-negative species, 13 of them highly accurate, and 275 for Gram-positive bacteria, 18 of them highly accurate. All interactions are now included in Source data 1, and high-accuracy models will be deposited on ModelArchive after acceptance.

The authors "reasoned that a given interaction would only be essential if and only if both proteins forming the complex are essential" – this sounds reasonable but doesn't capture synthetically lethal (genetic) interactions, that is, interactions between two proteins that are both non-essential but are essential in combination. Admittedly, I don't have a number of how many such cases exist, but there are such cases in the literature (e.g. Hannum et al. 2009, PLoS Genet 5[12]: e1000782, for yeast).

We thank the reviewer for bringing this point into discussion. We acknowledge that our reasoning does not capture synthetic lethality, which occurs when the loss of either of two genes individually has no effect on cell survival, but the simultaneous loss of both leads to cell death. In this case, the two genes or proteins are non-essential individually but become essential in combination. To cover synthetic lethality, we retrieved all synthetically lethal interactions found in Escherichia coli strain K-12 BW25113 from the Mlsar database and included them in our pipeline. We identified 28 synthetically lethal PPIs (involving 45 proteins) and modeled them with AF2. Only two interactions displayed an ipTM score > 0.6 (nadA-pncB and nuoG-purA). Hence, interactions arising from synthetic lethality seem to contribute little to the overall interactome. We believe that synthetic lethal partners often function in parallel or compensatory pathways rather than directly interacting with each other. For example, in yeast, the genes RAD9 and RAD24 are synthetic lethal. RAD9 is involved in cell cycle checkpoints, while RAD24 is involved in the DNA damage response. They function in related pathways but do not encode proteins that directly interact with each other. Hence, finding specific examples of proteins that are both synthetic lethal and directly interact might be challenging, as the synthetic lethal relationship often reveals functional rather than physical interactions.

Apart from that, one could question the selection method more generally, given that essential and non-essential proteins always work together in a biological process, so I wonder why the authors didn't include additional proteins known to be involved in specific processes, as this could make their predictions much more biologically meaningful.

We agree with the reviewer that accessory proteins are important to understand the biological context of interactions. In fact, in several sections of our manuscript, we included accessory proteins to fully describe the essential complexes. For example, in the cell division complex, we incorporated proteins like MreCD-RodZ from the elongasome to enhance the structural context of the interactions. However, a comprehensive explanation of all identified interactions and accessory proteins would extend beyond the scope of this manuscript and further lengthen an already extensive document. In our study, we sought to describe the fundamental interactions for both Gram-negative and Gram-positive bacteria. We anticipate that our findings will prompt additional research to confirm our hypotheses and enhance knowledge of these protein complexes within the proper cellular context.

In any case, to understand their choice better, authors should provide a table (in the main text) summarizing the proteins they actually analyze and discuss in more detail in their models. This would allow a reader to see which proteins are considered essential and which ones are missing. I would organize this by function / pathway / process, so these proteins are listed in a functional context.

We added Table 1 in the main text, listing all interactions described in the text. Table 1 includes the proteins involved in each complex, the ipTM score of the interaction, whether a PDB code is available for comparison and the functional classification of the interaction.

With regard to docking, please also discuss why you focus on ipTM, as there are other metrics derived from AF2 scores, such as pDockQ based on if_plddt (e.g., Bryant et al., 2022), as well as metrics external to AF2 (physics-based methods such as Rosetta). Another option may be a modified version of AF2-Multimer, such as AFSample, which produces a greater diversity of models, allowing for more "shots on goal" and ultimately a higher success rate, assuming one has a reliable QC filter (I wonder how those compare to ipTM).

We did not use AFsample because it is a very computationally expensive approach that would require too many resources for the batch prediction of more than 2,000 complexes. AFsample generates 240 times as many models and, including the extra recycles, is overall around 1,000 times more costly than the baseline. However, we acknowledge that other metrics can be useful to further evaluate our models. Hence, we investigated how the pDockQ and pDockQ2 metrics compare with the ipTM score. pDockQ correlates poorly with ipTM (R = 0.328), whereas the improved pDockQ2 metric correlates much better (R = 0.649). All complexes described in the manuscript with an ipTM score above our threshold (0.6) also have a pDockQ2 score above 0.23, except for six interactions with lower pDockQ2 scores. However, these scores improve when the interactions are modeled with accessory proteins in the complex, which suggests that the ipTM metric better captures binary interactions when these are excluded from their context. It is possible, however, that pDockQ scores are better at discriminating false positive interactions than ipTM scores. Based on the strong correlation between the two metrics and the observation that ipTM may better capture binary interactions, we decided to keep our method in the manuscript. Other authors have employed analogous ipTM thresholds to model bacterial interactions (e.g., O’Reilly et al., 2023). Nevertheless, we also included the pDockQ and pDockQ2 metrics in Source data 1 so that readers can evaluate complexes based on these metrics.

P. 1, third last line: “the essential interactome is a potentially powerful strategy to […] identify new targets for discovering new antibiotics”

Figures and figure legends need to be explicit which species is represented (ideally with a Uniprot ID) and which structure was predicted by alphafold and which one has an experimental structure. Known structures should be indicated in a table, as suggested above.

Figure 5: LptF is too dark when printed, so a lighter color may be better.

Figure 6: The cryo-EM and AlphaFold structures look quite different, so please discuss the discrepancies between them (in terms of prediction or cryo-EM modelling). A schematic may be helpful to illustrate the differences more clearly.

Figure 7: LolC is also too dark when printed. Make lighter.

Maybe in some cases it would be worthwhile looking at ConSurf structures to see whether the predicted interfaces are indeed more conserved than other parts of the protein surface.

We thank the reviewer for his/her insightful feedback on our manuscript. We have addressed all these comments as follows:

The statement on page 1 was revised as suggested.

The main significance of this study is its potential use for a better understanding of the protein complexes described in more detail (and the fact that AlphaFold can be applied in a similar fashion to many other complexes). This is why the individual sections need to be evaluated by process-specific experts (disclaimer: I have only worked on some of the complexes, but I am not an expert on any of them). I wonder if it would make more sense to break out some of the sections on individual complexes into separate papers, and then discuss them in more detail and with more context from previous studies. Complexes such as the divisome have a huge body of literature, and it may be worth reviewing which structures are known and which ones are not. However, the dynamic and labile nature of these complexes has made it difficult for both crystallography and modelling to achieve a good structural understanding, but some of the models proposed here may be useful for overcoming some of these hurdles.

We appreciate the reviewer's suggestion. While we acknowledge the complexity of some of the individual complexes, such as the divisome, and the wealth of existing literature, we believe that the current manuscript provides a valuable comprehensive view of how AF2 can be used to predict essential protein complexes in bacteria. In our opinion, dividing the manuscript into separate pieces might dilute its scope. Nonetheless, we are exploring the interactions detailed in the manuscript in our laboratory, aiming to further expand the knowledge on these important complexes and their potential as targets for new antimicrobials.

References:

Bai J, Dai Y, Farinha A, et al. Essential Gene Analysis in Acinetobacter baumannii by High-Density Transposon Mutagenesis and CRISPR Interference. J Bacteriol. 2021; 203(12):e0056520.

Chakravartty V, Cronan JE. Altered regulation of Escherichia coli biotin biosynthesis in BirA superrepressor mutant strains. J Bacteriol. 2012; 194:1113-1126.

Evans R, O’Neill M, Pritzel A, et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv. 2021; 2021.10.04.463034.

Huang Y, Wuchty S, Zhou Y, Zhang Z. SGPPI: structure-aware prediction of protein-protein interactions in rigorous conditions with graph convolutional network. Brief Bioinform. 2023; 24(2):bbad020.

Lenz S, Sinn LR, O’Reilly FJ, et al. Reliable identification of protein-protein interactions by crosslinking mass spectrometry. Nat Commun. 2021; 12:3564.

Macho Rendón J, Rebollido-Ríos R, Torrent Burgas M. HPIPred: Host-pathogen interactome prediction with phenotypic scoring. Comput Struct Biotechnol J. 2022; 20:6534-6542.

O'Reilly FJ, Graziadei A, Forbrig C, et al. Protein complexes in cells by AI-assisted structural proteomics. Mol Syst Biol. 2023; 19(4):e11544.

Potvin E, Lehoux DE, Kukavica-Ibrulj I, et al. In vivo functional genomics of Pseudomonas aeruginosa for high-throughput screening of new virulence factors and antibacterial targets. Environ Microbiol. 2003; 5:1294-1308.

Wang N, Ozer EA, Mandel MJ, Hauser AR. Genome-wide identification of Acinetobacter baumannii genes necessary for persistence in the lung. mBio. 2014; 5(3):e01163-14.

Yin R, Feng BY, Varshney A, Pierce BG. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci. 2022; 31(8):e4379.

https://doi.org/10.7554/eLife.94919.sa2

Article and author information

Author details

  1. Jordi Gómez Borrego

    Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Writing – original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1963-5455
  2. Marc Torrent Burgas

    Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    marc.torrent@uab.cat
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6567-3474

Funding

Ministerio de Ciencia e Innovación (PDC2021-121544-I00)

  • Marc Torrent Burgas

European Society of Clinical Microbiology and Infectious Diseases (ESCMID2022)

  • Marc Torrent Burgas

Ministerio de Ciencia e Innovación (PID2020-114627RB-I00)

  • Marc Torrent Burgas

Generalitat de Catalunya (Joan Oró Fellowship 2023 FI-100278)

  • Jordi Gómez Borrego

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was funded by a Research Grant 2022 of the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) and the Spanish Ministerio de Ciencia e Innovación (PDC2021-121544-I00 funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR, and project PID2020-114627RB-I00 funded by MCIN/AEI/10.13039/501100011033), all to MT. This work has been co-financed by the Spanish Ministry of Science and Innovation with funds from the European Union NextGenerationEU, from the Recovery, Transformation and Resilience Plan (PRTR-C17.I1) and from the Autonomous Community of Catalonia within the framework of the Biotechnology Plan Applied to Health. JGB is a recipient of a Joan Oró Fellowship from the Generalitat de Catalunya (2023 FI-100278). We would like to thank Dr. Enea Sancho Vaello for her comments on this work.

Senior Editor

  1. Volker Dötsch, Goethe University, Germany

Reviewing Editor

  1. Alan Talevi, National University of La Plata, Argentina

Version history

  1. Preprint posted: June 14, 2023 (view preprint)
  2. Received: November 30, 2023
  3. Accepted: December 22, 2023
  4. Accepted Manuscript published: January 16, 2024 (version 1)
  5. Version of Record published: February 13, 2024 (version 2)

Copyright

© 2024, Gómez Borrego and Torrent Burgas

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

Gómez Borrego J, Torrent Burgas M (2024) Structural assembly of the bacterial essential interactome. eLife 13:e94919. https://doi.org/10.7554/eLife.94919

