Introduction

Around 10,000 to 20,000 different types of proteins are encoded in the genome of most organisms, catalyzing the vast majority of physico-chemical reactions in cells1. Many proteins have specialized functions and they are often regulated through protein-protein interactions, where the formation of protein complexes can activate, inhibit, or stabilize their partners. Furthermore, protein-protein interactions can recruit target proteins to specific locations where they will function or regulate the mobility of the protein complex2. Within cells, proteins are thought to exist in a crowded environment and frequently interact with other molecules3. Thus, characterizing protein-protein interactions is fundamental for understanding protein function and regulation. Large-scale analyses of protein-protein interactions have been carried out, including Tandem Affinity Purification coupled with Mass Spectrometry (TAP-MS) for the yeast proteome4 and the comprehensive 2-hybrid screening for the Human Reference Interactome (HuRI)5. Despite these extensive studies, the overall protein-protein interactions are still not fully understood in many organisms.

The binding between proteins is significantly influenced by their three-dimensional (3D) structures. The characteristics of their interfaces, including hydrogen bonds, salt bridges, and hydrophobicity, determine the interactions6. Therefore, to analyze protein-protein interactions physically and chemically, information on the individual 3D structures of proteins is necessary. The 3D structures of proteins have been determined through experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy7. However, these techniques demand considerable labor and time. The recently developed AlphaFold2 program can predict the 3D structure of a protein from its amino acid sequence with high accuracy8. This tool has not only been utilized in computational studies but has also become a valuable resource in experimental sciences for predicting protein complexes, as demonstrated with yeast protein complexes9.

In this study, we attempted a rapid screening of protein interactions using AlphaFold2 predictions, focusing primarily on components of nuage, a germline-specific, non-membrane organelle that involves a wide variety of proteins containing unique motifs and domains in Drosophila melanogaster10. Nuage is known to serve as the production and amplification site for small non-coding piRNAs, which are bound to PIWI-family proteins. The piRNAs and the PIWI-family proteins function to repress mobile genetic elements, or transposons, that disrupt genomes through their active transposition11. In addition to proteins involved in piRNA production, translation repressor proteins including Me31B, Cup, and Trailer hitch (Tral) also localize in nuage12. Previous studies have shown that the localization of several components in nuage depends on their partners in a hierarchical manner13. However, the interactions and organization among nuage components remain unclear.

By using AlphaFold2 predictions, we investigated 20 of the nuage-localizing or piRNA-related proteins for pairwise interactions. We confirmed the novel interactions of candidate pairs including Spindle-E (Spn-E)_Squash (Squ), by co-immunoprecipitation assay using cultured cells. In addition, a Squ mutant, which disrupts the salt bridges predicted at the interface with Spn-E, failed to interact with Spn-E, validating the accuracy of the predicted dimer structure. This screening was expanded for direct interacting pairs between piRNA-related proteins and proteins involved in oogenesis, as well as Piwi and other Drosophila proteins. This in silico approach not only streamlines the identification of interaction partners but also bridges the gap between bioinformatics predictions and experimental validation in biological research.

Result/Discussion

The nuage-localizing proteins and piRNA-related proteins used in the AlphaFold2 screening

Several dozen proteins engaged in piRNA production in germline cells exert their function by recruiting piRNA precursors and interacting with their partner proteins, forming a non-membrane structure called nuage10,13. Previous studies reported that many piRNA-related proteins localize to nuage and that some localize to mitochondria (Table 1). In addition, protein components of processing bodies and sponge bodies, which are involved in the translation, storage, degradation, and transportation of mRNAs, such as Me31B, Cup, and Tral, also localize to nuage12 (Table 1). However, the details of how these proteins interact and organize themselves within the nuage remain unclear.

The piRNA production-related proteins used in this study

In this study, we used the AlphaFold2 program to screen for interactions among 20 proteins that are localized in the nuage and/or involved in piRNA production in Drosophila (Table 1). The monomeric structures of these 20 proteins, ranging in size from 20 kDa to 250 kDa, have already been predicted and are registered in databases14. This set includes both well-structured proteins and those that are largely disordered with numerous loops (Supplemental Fig. S1A). Of those, eight proteins feature one or more Tudor domains or extended Tudor (eTud) domains. The Tudor domain contains approximately 60 residues and folds into an antiparallel β-sheet with five strands forming a barrel-like fold, while the eTud domains include an additional Oligonucleotide/oligosaccharide-Binding fold domain15. Both Tudor and eTud domains are known to bind predominantly to methylated lysine or arginine residues. In addition, five RNA helicases, such as Vasa (Vas) and Spn-E, the fly homolog of Tdrd9, which are essential for piRNA processing, are also included (Table 1). The C-terminal region of Vas is known to bind to the Lotus domain shared by two nuage components, Tejas (Tej) and Tapas. Spn-E was also recently shown to interact with Tej16. Among those 20 proteins, the Molecular Interaction Search Tool (MIST), a conventional database of protein-protein interactions, registers eight interacting pairs as direct binding, and 28 interactions that are direct or indirect (Table 1, Supplemental Fig. S1B, C)17.

Screening for the protein-protein interactions by AlphaFold2

We used the AlphaFold2 program to predict direct protein-protein interactions and the 3D structures of the complexes. Assuming 1:1 binding among the 20 proteins, a total of 400 dimer predictions (20 × 20 ordered pairs, including homodimers) were calculated on a supercomputer. The prediction flow of AlphaFold2 consisted of two main parts8. Initially, a multiple sequence alignment was performed for each query protein and stored for future use. Subsequently, the AlphaFold2 program predicted 3D dimer structures based on the co-evolution inferred from the multiple sequence alignments. For each dimer prediction, five different structure models with varying parameters were generated. Among these, the model with the highest prediction confidence score (pcScore) was selected as the final prediction result. The pcScore combines two evaluations, one of the overall structure (pTM) and one of the dimeric interface (ipTM), with emphasis on the interface, as represented by the following formula18:

pcScore = 0.8 × ipTM + 0.2 × pTM
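
As a minimal worked sketch of the formula above, the following Python snippet computes a pcScore from one pair of ipTM and pTM values. The function name and the example values are illustrative assumptions and are not taken from the screening data.

```python
def pc_score(iptm: float, ptm: float) -> float:
    """Weighted confidence score: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


# Hypothetical values for illustration only: the interface term (ipTM)
# carries 80% of the weight, the overall fold term (pTM) carries 20%.
example = pc_score(iptm=0.70, ptm=0.80)
print(f"{example:.2f}")
```

Because the pTM term carries only a 0.2 weight, a well-folded protein with a high pTM raises the pcScore only modestly when the interface score is low.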

These three values, pcScore, ipTM, and pTM, for each predicted pair were visualized in separate heatmaps (Fig. 1A). In general, pcScore and ipTM values showed similar trends, although a well-structured protein (e.g. Spn-E) tended to have a higher pTM value, which slightly elevated the pcScore. Based on this, we used the pcScore as an indicator of protein-protein interaction in this study. Each heterodimeric pair was calculated twice in the pairwise screening (e.g. proteins A_B and B_A), and the pcScores were plotted (Fig. 1B). The results showed that there was significant variance in the pairs with lower pcScores, while pairs with pcScores above 0.6 had relatively higher reproducibility. Consequently, we set a threshold of 0.6 and considered protein pairs with pcScores above 0.6 as likely complex-forming candidates. This approach identified 13 pairs; seven of these were already known to form complexes, confirming the effectiveness of AlphaFold2 in predicting complex formation (Table 2). The pair with the highest pcScore was the Zuc homodimer, possibly because AlphaFold2 had learned from the crystal structure of the Zuc homodimer registered in the database19. For the remaining 12 pairs, the predicted 3D structures and the Predicted Aligned Error (PAE) plots are shown in Fig. 1C. Consistent with a previous report using the silkworm Bombyx mori20, both Argonaute 3 (AGO3) and Aub, members of the PIWI-family proteins sharing 50%-60% amino acid sequence similarity, were predicted to form dimers with Maelstrom (Mael) (Fig. 1C-i, ii, Table 2). AGO3 and Aub appeared to be well-folded proteins except for their flexible N-terminal regions. In contrast, the Mael protein was divided into three parts: an N-terminal HMG domain, a middle MAEL domain, and a C-terminal disordered region21 (Fig. 1C-i, ii). AlphaFold2 predicted that the MAEL domain interacted with AGO3 and Aub.
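
The pairwise design and the 0.6 cutoff described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the protein labels and scores are hypothetical, and keeping a pair when either of its two predictions (A_B or B_A) exceeds the threshold is an assumed rule that the text does not specify.

```python
from itertools import product

THRESHOLD = 0.6  # cutoff stated in the text


def ordered_pairs(proteins):
    """All ordered pairs, including homodimers: n proteins give n * n pairs."""
    return list(product(proteins, repeat=2))


def candidate_pairs(scores, threshold=THRESHOLD):
    """Collect pairs whose best pcScore exceeds the threshold.

    `scores` maps an ordered pair (A, B) to its pcScore. Each unordered
    pair is reported once with both scores, (A_B, B_A), so that their
    reproducibility can be inspected. Using the maximum of the two
    scores is an assumption of this sketch.
    """
    candidates = {}
    for (a, b), score_ab in scores.items():
        if a > b:
            continue  # handle each unordered pair once, in sorted order
        score_ba = scores.get((b, a), score_ab)
        if max(score_ab, score_ba) > threshold:
            candidates[(a, b)] = (score_ab, score_ba)
    return candidates


# Hypothetical scores for three placeholder proteins.
scores = {
    ("A", "B"): 0.72, ("B", "A"): 0.70,  # reproducible, above threshold
    ("A", "C"): 0.30, ("C", "A"): 0.55,  # variable, below threshold
    ("A", "A"): 0.65,                    # homodimer
}
print(len(ordered_pairs(range(20))))
print(candidate_pairs(scores))
```

Reporting both scores for each pair, rather than a single merged value, keeps the A_B versus B_A comparison available for a reproducibility plot such as the one in Fig. 1B.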

The 1:1 dimer structure prediction by AlphaFold2 for piRNA-related proteins

(A) Heatmaps of the prediction confidence scores (pcScore, Green), pTM values (Blue), and ipTM values (Red) provided by AlphaFold2. The 20 types of proteins are aligned from top to bottom and left to right in the same order. Boxes on the diagonal represent homodimers.

(B) Scatter plot of the pcScores. The scores from the first and second predictions for each heterodimer pair are plotted on the X and Y axes, respectively.

(C i-xii) The predicted 3D structures (top panels) and the Predicted Aligned Error (PAE) plots (bottom panels) for each candidate heterodimer scoring above 0.6. The PAE plot displays the positional errors between all amino acid residue pairs, formatted in a matrix layout.

(D) Co-immunoprecipitation assays using tagged proteins to verify interactions between specific pairs: Spn-E_Squ (i), Aub_Vret (ii), Spn-E_BoYb (iii), BoYb_Shu (iv), and Me31B_Vret (v). Singly transfected cells expressing only Myc-tagged but not Flag-tagged proteins are used as negative controls for each set. Box and whisker plots show the intensity ratio between immunoprecipitated and input bands (n=3).

The screening for the interacting proteins (prediction confidence score, pcScore > 0.6)

Me31B, Tral, and Cup are recognized as RNA regulators localized to the nuage and/or sponge body, though they are not directly involved in the piRNA pathway. Previous studies have indicated that these proteins form complexes12,22,23. Me31B is a well-conserved RNA helicase and showed a tightly folded structure composed of two concatenated RecA helicase domains24. On the other hand, Tral and Cup were predicted to be largely disordered, with some secondary structures (Fig. 1C-iii, iv). The predicted dimer structures of Me31B_Tral and Cup_Me31B showed scores of 0.74 and 0.68, respectively (Table 2). Consistent with the previous study23, AlphaFold2 predicted that the FDF motif of Tral, which contains a Phe-Asp-Phe sequence folded into two α-helices from residue 405 to 537, was associated with Me31B (Fig. 1C-iii). In addition, an α-helix and loop regions of Cup were predicted to make contact with Me31B (Fig. 1C-iv). BoYb and Vret are both eTud domain-containing proteins25, and their direct interaction has been suggested by the high retrieval rate for BoYb in the immunoprecipitate of Vret from the ovary26. The predicted structure revealed that both BoYb and Vret proteins consist of two domains, one at the N-terminus and the other at the C-terminus, connected by a flexible region (Fig. 1C-v). Interactions were predicted between their N-terminal domains and between their C-terminal domains, respectively. It has been reported that Tej, known as Tdrd5 in mammals, binds directly to Vas through its N-terminal Lotus domain27 (Fig. 1C-vi) and to Spn-E through the loop region adjoining its eTud domain16 (Fig. 1C-vii). The predicted structures of Tej_Vas and Spn-E_Tej were consistent with their previously reported binding properties.

The remaining five pairs, previously unreported as directly interacting, were considered novel binding pairs (Table 2, Fig. 1C-viii-xii). These interactions were experimentally examined using Drosophila S2 cultured cells, which are derived from embryonic somatic cells and lack germline-specific proteins. Previously, Squ was co-immunoprecipitated with Spn-E along with other nuage components from ovarian lysate28, but whether this interaction was direct had not been examined. In a co-immunoprecipitation assay in S2 cells, Myc-Spn-E was strongly detected in the precipitate of Flag-Squ by Western blotting, possibly supporting a direct interaction between Spn-E and Squ in S2 cells devoid of germline proteins (Fig. 1D-i). Similarly, AlphaFold2 predicted a direct interaction between Aub and Vret, which was corroborated by co-immunoprecipitation assays (Fig. 1D-ii). The binding capability of another pair, BoYb-Shutdown (Shu), was also confirmed in S2 cells (Fig. 1D-iv). Interactions were thus confirmed for three of the five candidate pairs, supporting the effectiveness of AlphaFold2 in identifying binding partners. However, BoYb-Spn-E and Me31B-Vret did not show interaction in these assays (Fig. 1D-iii, v), possibly suggesting weak interactions that co-immunoprecipitation may have failed to detect.

Evaluation of Spn-E and Squ interaction in culture cells and ovaries

Among the binding candidates, we focused on the predicted dimer structure of the Spn-E and Squ pair. Spn-E is an evolutionarily conserved RNA helicase that is expressed in germline cells. It plays a crucial role in piRNA production and transposon suppression in germline cells28,29. Similarly, Squ is also expressed in the ovary and testis and is involved in piRNA production, although its molecular role is less defined29,30. While squ is conserved across Drosophila species (Supplemental Fig. 2A, B), vertebrate orthologs remain unidentified. Spn-E contains four domains: DEAD/DEAH helicase, Hel-C, HA2, and eTud domains (Fig. 2A). Its predicted 3D structure was well folded and contained few flexible regions (Fig. 1C-viii). In contrast, Squ was predicted to be largely disordered, consisting of three α-helices and two β-strands (Fig. 2A). The middle parts of Squ were in close contact with Spn-E, showing lower PAE values, suggestive of their interaction (Fig. 1C-viii, 2A). AlphaFold2 predicts five structure models for each query using different initial model parameters (models 1-5), and a pcScore is given to each model. For the Spn-E_Squ pair, the pcScores ranged from 0.74 to 0.77. The 3D structures of Spn-E were very similar across all five models, superimposing almost perfectly (Fig. 2B). The middle region of Squ was consistently positioned relative to Spn-E, although the N- and C-terminal regions of Squ remained flexible (Fig. 2B).

Interaction between Spn-E and Squ

(A) Schematic of Spn-E domain structures defined in SMART44. Boxes (α-helix: orange) and arrow (β-sheet: green) for Squ structure. The predicted interacting regions between Spn-E and Squ are indicated in gray boxes. Tej interaction site of Spn-E is also shown16.

(B) The predicted five models of heterodimer of Spn-E (in gray) and Squ (in magenta). Spn-E molecules in all five models are superimposed.

(C) 3D structure of the Spn-E_Squ dimer colored by Spn-E domains as indicated in (A), with Squ in magenta. The enlarged image of the interface indicated by box is also shown.

(D) The predicted salt bridges at the interface, with Spn-E in gray and Squ in magenta. The residues forming salt bridges are depicted in stick model.

(E) Co-immunoprecipitation assay using S2 cell lysate to examine the interaction between Myc-Spn-E and the Flag-Squ mutant (4A), whose salt bridge-forming residues are mutated to Ala. S2 cells expressing Myc-Spn-E alone are used as a control. The ratios of the band intensity (IP/input) are shown in a box and whisker plot (n=3).

(F) The heterotetramer model of Spn-E_RNA_Squ_Tej predicted by AlphaFold3. Spn-E is shown as a space filled model in gray, Squ in magenta, Tej in cyan, and RNA in yellow. The model on the left is rotated 180° in the Y axis to produce the image on the right.

Closer examination of the Spn-E_Squ dimer interface revealed that a short α-helix of Squ (106th-116th residues) fitted into a groove on the Spn-E surface, while the anti-parallel β-sheet (140th-153rd) was also predicted to interact with Spn-E (Fig. 2A, C). Physico-chemical structural analysis using the PDBePISA server (EMBL-EBI) identified salt bridges between Spn-E and Squ (Supplemental Table S2, S3)31. To validate these predicted interactions, we generated Squ mutants in which each residue involved in the four salt bridges (E107, E109, R115, and K163) was substituted with alanine (Fig. 2D, Supplemental Fig. S2B) and assessed their interactions by co-immunoprecipitation in S2 cells expressing the tagged proteins Myc-Spn-E and Flag-Squ. The assay revealed that while the E107A single mutation did not affect the interaction, the other single mutations mildly reduced the binding affinity of Squ to Spn-E (Supplemental Fig. S3A). Furthermore, the localization of GFP-tagged Squ and mKate2 (mK2)-tagged Spn-E was examined in S2 cells. When only Squ was expressed, it was dispersed in the cytosol (Supplemental Fig. S3B). On the other hand, when only Spn-E was expressed, it localized in the nucleus as reported previously16. When co-expressed with wild-type Squ or the single mutants, Spn-E moved to the cytoplasm and formed granules together with Squ, suggesting an interaction between them. Although the single mutants could still bind to Spn-E, the Squ quadruple mutant (Squ4A) completely lost the binding (Fig. 2E) and did not co-localize with Spn-E in S2 cells (Supplemental Fig. S3B). These results suggest that the salt bridges are important for the interaction between Spn-E and Squ and support the accuracy of their dimer structure predicted by AlphaFold2.
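
To illustrate what an interface salt-bridge search involves, the sketch below scans a predicted dimer in PDB format for oppositely charged side-chain atoms on different chains that lie within a distance cutoff. It is a simplified stand-in, not the PDBePISA procedure used in this study: the 4.0 Å cutoff, the atom lists, and the function names are all assumptions made for illustration.

```python
import math

# Side-chain atoms that typically carry charge at neutral pH.
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}
CUTOFF = 4.0  # in angstroms; an assumed cutoff, not the PDBePISA criterion


def parse_atoms(pdb_text):
    """Extract (chain, residue name, residue number, atom name, xyz)
    from the fixed-width ATOM records of a PDB file."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atoms.append((
            line[21],
            line[17:20].strip(),
            int(line[22:26]),
            line[12:16].strip(),
            (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


def salt_bridges(atoms, chain_a, chain_b, cutoff=CUTOFF):
    """Return sorted (acidic residue, basic residue) pairs that bridge
    the two chains, each residue given as (chain, name, number)."""
    found = set()
    for c1, res1, num1, name1, xyz1 in atoms:
        if (res1, name1) not in ACIDIC:
            continue
        for c2, res2, num2, name2, xyz2 in atoms:
            if (res2, name2) not in BASIC:
                continue
            if {c1, c2} != {chain_a, chain_b}:
                continue  # keep only contacts between the two chains
            if math.dist(xyz1, xyz2) <= cutoff:
                found.add(((c1, res1, num1), (c2, res2, num2)))
    return sorted(found)
```

Given the text of a predicted dimer file, `salt_bridges(parse_atoms(text), "A", "B")` would list candidate residue pairs, which could then guide the choice of residues for alanine substitution; dedicated tools such as PDBePISA apply more complete criteria.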

While the RNA binding site of Spn-E has not been extensively studied, it is presumed to be near the helicase domain, similar to the Vas helicase-RNA complex32. In addition, Lin et al. demonstrated that the Hel-C domain of Spn-E interacted with the eSRS region of Tej, which recruits Spn-E to nuage16, a site distinct from the predicted Squ binding sites (Fig. 2A). Interestingly, a tetramer complex of Spn-E_Squ_Tej_RNA predicted by the recently available AlphaFold333 placed the single-stranded RNA (ssRNA) near the helicase domain of Spn-E (Fig. 2F), aligning with the ssRNA binding position found in Vas (Supplemental Fig. S3C). The predicted tetramer model suggests that Squ binding to Spn-E does not inhibit, but may regulate, the interaction of Spn-E with Tej or RNA by stabilizing the domain orientation of Spn-E (Fig. 2F).

We investigated whether Spn-E also interacts with Squ within the Drosophila ovary. The antibody against Squ detected a specific band at the expected size by Western blotting in the heterozygous control ovarian lysate, which was absent in the transheterozygous mutant, squPP32/HE47 (Fig. 3A)30. Consistent with the previous report using a transgenic line expressing HA-Squ30, immunostaining of ovaries revealed the localization of Squ in nuage, where it overlapped with Spn-E endogenously tagged with mK2 (Fig. 3B). Spn-E was co-immunoprecipitated together with Squ from ovarian lysate, indicating an interaction between Squ and Spn-E (Fig. 3C). While the previous mass spectrometry analysis detected the PIWI-family proteins Piwi, Aub, and AGO3 in Spn-E immunoprecipitates28, these three proteins were not present in the immunoprecipitate of Squ (Fig. 3C), further supporting a direct interaction between Squ and Spn-E.

Spn-E and Squ interact in Drosophila ovary

(A) Western blotting analysis using anti-Squ antibody reveals a specific band at the expected size (approximately 28 kDa) for endogenous Squ in Drosophila ovarian lysates of the heterozygous control. This band is absent in the transheterozygote, squPP32/HE47.

(B) Immunostaining of Drosophila egg chambers with anti-Squ antibody and anti-mKate2 (mK2) antibody demonstrates colocalization of Squ and Spn-E-mK2 in nuage, a perinuclear granule in germline cells. The enlarged images of nuclei are shown in the panels below. Scale bars: 10 μm (top row), 2.5 μm (enlarged images).

(C) Immunoprecipitation of the endogenous Squ from ovarian lysate revealed the interaction with Spn-E protein. Proteins were detected by western blotting analysis using the specific antibody for each protein. The negative control was performed without anti-Squ antibody (beads only).

Screening oogenesis-related proteins for interaction with nuage proteins

Given the role of nuage in piRNA biogenesis and germline development, interactions between nuage-localized proteins and those involved in oogenesis were expected. We employed AlphaFold2 to predict these interactions using Vas, Squ, and Tej, representative nuage components that remain poorly characterized, as baits. For 430 proteins in the oogenesis pathway34, dimeric binding was predicted for 1,290 pairs (430 proteins × 3 baits) (Supplemental Table S4), with 18 pairs showing dimer structures scoring above 0.6 (Table 3). Among those, co-immunoprecipitation in S2 cells confirmed interactions of three pairs, Mei-W68_Squ, CSN3_Squ, and Pka-C1_Tej (Fig. 4A, B, Table 3). For the Mei-W68_Squ dimer, scoring 0.63, the binding site of Squ for Mei-W68 was predicted at α-helices in its middle region, which overlaps with the site interacting with Spn-E (Table 3, Fig. 4A-i, compare with Fig. 1C-viii). Mei-W68 is a topoisomerase, known as Spo11 in many organisms, which is required for the formation of double-strand breaks during meiosis35.

Squ- and Tej-interacting proteins predicted by AlphaFold2

(A i-iii) The predicted dimer structures (top) and Predicted Aligned Error (PAE) plots (bottom) of Mei-W68 in blue and Squ in magenta (i), CSN3 in green and Squ in magenta (ii), Pka-C1 in orange and Tej in cyan (iii). The PAE plot displays the positional errors between all amino acid residue pairs, formatted in a matrix layout.

(B i-iii) Co-immunoprecipitation assays using tagged proteins to verify interactions between specific pairs: Mei-W68_Squ (i), CSN3_Squ (ii), and Pka-C1_Tej (iii). Singly transfected cells expressing only Myc-tagged but not Flag-tagged proteins are used as negative controls for each set. Box and whisker plots show the intensity ratio between immunoprecipitated and input bands (n=3).

The binding candidates predicted by AlphaFold2

Interestingly, Squ also plays a role in the DNA damage response pathway and showed a genetic interaction with chk2, a meiotic checkpoint gene30. These results suggest that the binding of Squ to Mei-W68 may regulate the enzymatic activity of Mei-W68 in order to suppress the excessive formation of double-strand breaks. Another confirmed pair was CSN3_Squ, scoring 0.62 (Fig. 4A-ii, B-ii). CSN3, a component of the COP9 signalosome, which removes Nedd8 modifications from target proteins, is required for the self-renewal of germline stem cells36. Pka-C1, a cAMP-dependent protein kinase involved in axis specification, rhythmic behavior, and synaptic transmission37, was predicted to bind to the N-terminal Lotus domain of Tej (score 0.64, Fig. 4A-iii, B-iii), which is also known as the binding site for Vas27. This suggests a potential competitive interaction between Pka-C1 and Vas for Tej. Although the success rate of confirmed interactions was low (3 out of 18) (Table 3, Supplemental Fig. S4), the results indicate that these protein pairs could interact within cells if co-expressed in vivo.

Screening all Drosophila proteins for Piwi-interacting proteins

Given the crucial role of Piwi in piRNA biogenesis, heterochromatin formation, and germline stem cell (GSC) maintenance, we employed AlphaFold2 to screen all proteins in Drosophila melanogaster for potential Piwi interactions. Piwi, the founding member of the PIWI-family proteins, is not only essential for binding piRNAs and regulating complementary mRNAs but also plays a critical role in GSC self-renewal38. Studies have shown that Piwi lacking the N-terminal moiety containing the nuclear localization signal (NLS) still retains GSC self-renewal capabilities. Its function in GSC self-renewal is realized independently in the cytoplasm of GSC niche cells, separate from its role in transposon repression. The crystal structures of Drosophila Piwi and silkworm Siwi have been solved and revealed the organization of four domains (N, PAZ, MID, and PIWI)39,40. Recently, the ternary structure of piRNA, target RNA, and MILI, a mouse ortholog of Piwi, has been reported, in which the bound piRNA is threaded through the channel between the N-PAZ and MID–PIWI lobes (Supplemental Fig. S5A)41.

To identify novel Piwi-binding proteins, we conducted a 1:1 interaction screening involving approximately 12,000 Drosophila proteins, excluding any proteins over 2,000 amino acid residues due to computational limits. The pcScores from AlphaFold2 were primarily low, with over 98% being below 0.6, suggesting a low likelihood of interaction between Piwi and the vast majority of the proteins (Fig. 5A). Approximately 1.5% of the pairs, totaling 164 pairs, scored above 0.6 and were expected to contain novel binding partners (Supplemental Table S5). The top 24 candidates with pcScores greater than 0.75 are listed in Table 4. This list contained many metabolic enzymes and three piRNA-related proteins, Asterix (Arx), Mael, and Hen1. The interactions between Mael and PIWI-family proteins have already been reported20. Arx, known as Gtsf1 in mammals and integral to Piwi–piRISC-mediated transcriptional silencing in the nucleus42, had a high pcScore (0.83, Table 4). Although the three-dimensional structure of Arx has been determined by NMR spectroscopy43, the structure of the Arx_Piwi complex has remained elusive. AlphaFold2 predicted that while Arx lacked a compact domain, the majority of the Arx protein associated around the PIWI domain, except for the flexible C-terminal region (130th-167th residues) (Fig. 5B-i). Three Arx paralogs in Drosophila (CG34283, CG32625, and CG14036) were also predicted to bind to Piwi with high pcScores, suggesting that they interact within cells (Supplemental Fig. S5B). Although CG34283 is not expressed, CG32625 and CG14036 are moderately and highly expressed in the ovary, respectively37. However, unlike arx, knockdown of each paralogous gene did not result in de-repression of a transposon, mdg142, suggesting that they may be pseudogenes or possess redundant roles.

Screening for Piwi-interacting proteins in Drosophila proteome

(A) Pie chart displaying the distribution of pcScores from the AlphaFold2 screening for Piwi-interacting proteins among those encoded by Drosophila genome.

(B i-v) The predicted dimer structures (top) and PAE plots (bottom) for Piwi and the binding candidates (in red): Arx (i), Hen1 (ii), CG33703 (iii), Twf (iv), and Brn (v). Piwi is shown in the same colors as in Supplemental Fig. S5A.

(C) Co-immunoprecipitation assays using tagged proteins to verify interactions between Piwi and the binding candidates, Twf and Brn. Singly transfected cells expressing only Flag-Piwi are used as a negative control. Box and whisker plots show the intensity ratio between immunoprecipitated and input bands (n=3).

Piwi-interacting proteins predicted by AlphaFold2 (Score >= 0.75).

Hen1 is a methyltransferase known to mediate methylation of the terminal 2’ hydroxyl group of small interfering RNAs and piRNAs, thereby enhancing the stability of the small RNAs. Consistent with the previous report showing Hen1 binding to Piwi42, the dimer structure of Hen1_Piwi was predicted with a high pcScore of 0.77. This prediction further suggests that Hen1 is recruited to Piwi, thereby positioning it closer to the piRNA substrate (Fig. 5B-ii). Another potential interacting protein for Piwi was CG33703, a protein whose function remains uncharacterized despite having 75 paralogs listed in the Drosophila genome37. Together with three of these paralogs (CG33783, CG33647, and CG33644), CG33703 was predicted to form a dimer with Piwi (pcScore 0.82) (Table 4, Supplemental Fig. S5C). The domain of unknown function, DUF109144, shared by these paralogs was predicted to associate with the PIWI domain (Fig. 5B-iii). Although these proteins are generally not expressed under normal conditions37, their potential to bind Piwi suggests a regulatory role in abnormal or stress conditions where CG33703 or its paralogs are expressed. In addition, we investigated two oogenesis-related proteins, Twinfilin (Twf, pcScore 0.64, Fig. 5B-iv) and Brainiac (Brn, pcScore 0.63, Fig. 5B-v), for their binding with Piwi through co-immunoprecipitation (Fig. 5C, Supplemental Table S5). While no binding was observed with Twf, significant binding was detected with Brn, which is involved in dorsal-ventral polarity determination in follicle cells45.

In this study, we have identified several potential partners for novel protein interactions, though the physiological relevance of these pairs remains to be elucidated. The expression patterns of these candidate proteins within the organism are crucial for further validation of our findings. It is likely that these proteins interact when co-expressed in the same cellular context. Under typical growth conditions, these interactions might not occur; however, in stress or disease states where these proteins are upregulated, the likelihood of interaction increases, potentially implicating these interactions in the disruption of normal cellular functions and contributing to disease or tumorigenesis. Furthermore, in silico screening proves extremely valuable, especially when dealing with toxic bait proteins, as it allows us to narrow down the list of potential candidates and reduce the need for hazardous experimental procedures. Ultimately, establishing these potential interactions in vivo could significantly advance our understanding of protein functions under both normal and pathological conditions.

Materials and Methods

Antibodies

The anti-Squ antibody was generated as follows. His-tagged full-length Squ was expressed in the Escherichia coli BL21(DE3) strain from a plasmid in which the squ coding region was subcloned into the pDEST17 vector (Thermo Fisher Scientific). His-Squ was solubilized with 6 M urea in PBS, purified using Nickel Sepharose beads (GE Healthcare) following the manufacturer’s protocol, and subsequently used for immunization in rats. The antibodies used for Western blotting analysis were rat anti-Spn-E16 (1:500), rat anti-Ago316 (1:200), guinea pig anti-Aub46 (1:1000), mouse monoclonal anti-Piwi (G-1, sc-390946, Santa Cruz Biotechnology (United States)), and mouse monoclonal anti-α-Tubulin (DM1A, sc-32293, Santa Cruz Biotechnology). The secondary antibodies used in this study were HRP-conjugated goat anti-guinea pig (Dako, Cat.# P0141), HRP-conjugated goat anti-rat (Dako, Cat.# P0450), HRP-conjugated goat anti-mouse (BioRad, Cat.# 1706516) and HRP-conjugated goat anti-rabbit (BioRad, Cat.# 1706515). HRP-conjugated anti-DDDDK-tag antibody (MBL, Cat.#M185-7) and HRP-conjugated anti-Myc-tag antibody (MBL, Cat.#M192-7) were used to detect FLAG-tagged and Myc-tagged proteins, respectively.

AlphaFold2 prediction for the direct interacting protein pairs

Amino acid sequences for Drosophila proteins were obtained from FlyBase37. For proteins annotated with multiple isoforms, only the longest isoform was selected. Proteins exceeding 2,000 residues were excluded due to computational limitations. The AlphaFold v2.2 program was installed on the Supercomputer for Quest to Unsolved Interdisciplinary Datascience (SQUID) at the Cyber Media Center of Osaka University. All necessary protein sequence databases for AlphaFold2 were stored on an SSD device connected to the SQUID system.
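
The sequence selection rules above (longest isoform per gene, no proteins over 2,000 residues) can be sketched as follows. This is a minimal illustration rather than the script used in the study: the input layout of (gene, isoform, sequence) tuples and the function name are hypothetical, and parsing of FlyBase FASTA headers is not shown.

```python
MAX_LENGTH = 2000  # residue limit stated in the text


def select_sequences(isoforms, max_length=MAX_LENGTH):
    """Keep the longest isoform per gene, then drop overly long proteins.

    `isoforms` is an iterable of (gene_id, isoform_id, sequence) tuples,
    a hypothetical layout chosen for this sketch. When two isoforms of a
    gene have the same length, the first one encountered is kept.
    """
    longest = {}
    for gene_id, isoform_id, sequence in isoforms:
        current = longest.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            longest[gene_id] = (isoform_id, sequence)
    return {
        gene_id: entry
        for gene_id, entry in longest.items()
        if len(entry[1]) <= max_length
    }


# Placeholder records: geneX has two isoforms, geneY exceeds the limit.
records = [
    ("geneX", "X-PA", "M" * 300),
    ("geneX", "X-PB", "M" * 450),
    ("geneY", "Y-PA", "M" * 2500),
]
selected = select_sequences(records)
print({gene: (iso, len(seq)) for gene, (iso, seq) in selected.items()})
```

Applying the length filter after isoform selection means that a gene whose longest isoform exceeds the limit is excluded entirely, rather than being represented by a shorter isoform; whether the study handled such genes this way is not stated in the text.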

The AlphaFold2 prediction process was divided into two steps: generation of the multiple sequence alignment (MSA) and prediction of the 3D structure. The MSAs were computed on a SQUID CPU node and stored for reuse. For dimer structure prediction, the two MSAs corresponding to the dimer pair were placed in the msas/A and msas/B directories. The calculations were performed on a GPU node with the options -t 2022-05-14 -m multimer -l 1 -p true. AlphaFold2 generates five structural models for each prediction. To speed up the prediction, the five computations were assigned to five GPU units, whereas the original AlphaFold2 program computes the five models one at a time. A prediction confidence score (pcScore) was provided for each model, and the highest pcScore among the five models was used as the prediction score for the corresponding dimer structure. PAE plots for dimer structures were drawn by extracting the data from the pkl files generated by AlphaFold2. The list of protein pairs scoring above 0.6 and the corresponding PAE plots and PDB structures are available on GitHub (https://dme-research.github.io/AF2_2/).
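The scoring rule described above (take the highest pcScore among the five models, then list pairs scoring above 0.6) can be sketched as follows. This is an illustrative sketch only: the input dictionary, the placeholder protein names, and the function names best_pc_score and pairs_above_threshold are hypothetical, and the per-model scores are assumed to have already been read from the AlphaFold2 output.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
THRESHOLD = 0.6  # pairs scoring above this value were listed


def best_pc_score(model_scores: List[float]) -> float:
    """Return the highest per-model score, used as the score of the dimer."""
    if not model_scores:
        raise ValueError("at least one model score is required")
    return max(model_scores)


def pairs_above_threshold(
    scores_by_pair: Dict[Pair, List[float]],
    threshold: float = THRESHOLD,
) -> List[Tuple[Pair, float]]:
    """Return (pair, best score) for pairs scoring above threshold, best first."""
    best = {pair: best_pc_score(scores) for pair, scores in scores_by_pair.items()}
    # Strict comparison: a pair scoring exactly at the threshold is not listed.
    hits = [(pair, score) for pair, score in best.items() if score > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up scores for two placeholder pairs, five models each.
example = {
    ("ProtA", "ProtB"): [0.41, 0.72, 0.55, 0.68, 0.70],
    ("ProtC", "ProtD"): [0.20, 0.30, 0.10, 0.25, 0.60],
}
print(pairs_above_threshold(example))
```

Because only the maximum is kept, a pair is listed if any one of its five models scores above the threshold, even when the other four models score low.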

AlphaFold3 prediction for the RNA-containing complex structure

The structure of Spn-E_Squ_Tej complexed with RNA, 5’-CUGACUACCGAAGUACUACG-3’, was predicted by the AlphaFold3 prediction server (https://golgi.sandbox.google.com/)33.

Analysis of protein 3D structure

The protein 3D structure was visualized using ChimeraX software47. The SpnE_Squ dimer interface was analyzed with the ‘Protein interfaces, surfaces and assemblies’ service (PISA) at the European Bioinformatics Institute (http://www.ebi.ac.uk/pdbe/prot_int/pistart.html)31.

Fly stocks

All stocks were maintained at 25°C using standard methods. Mutant alleles of squ (squpp32 and squHE47) were used in this study30. The mK2-tagged Spn-E-mK2 knock-in fly was previously generated16. The y w strain served as the control.

Western blotting

Ovaries were homogenized in ice-cold PBS and denatured in SDS sample buffer at 95°C for 5 min. The samples were then subjected to SDS-PAGE and transferred to a ClearTrans SP PVDF membrane (Wako). The primary and secondary antibodies described above were diluted in the Signal Enhancer reagent HIKARI (Nacalai Tesque). Chemiluminescence was induced with the Chemi-Lumi One reagent kit (Nacalai Tesque) and detected with a ChemiDoc Touch (Bio-Rad). The bands were quantified using ImageJ48 or Image Lab software (Bio-Rad).

Co-immunoprecipitation in S2 cells

Drosophila Schneider S2 cells were cultured at 28°C in Schneider’s medium supplemented with 10% (v/v) fetal bovine serum and antibiotics (penicillin and streptomycin). Protein coding regions were cloned into the pENTR vector (Thermo Fisher Scientific) and then transferred into the pAFW or pAMW destination vectors. S2 cells (0.2-2 × 10^6 cells/ml) were seeded in 12-well plates overnight and transfected using Hilymax (Dojindo Molecular Technologies, Japan). After 36-48 hours, S2 cells were resuspended in 360 μl of ice-cold PBS containing 0.02% Triton X-100 and 1x protease inhibitor cocktail (Roche), and sonicated (0.5 sec, 5 times). The resulting lysate was clarified by centrifugation at 15,000 × g for 15 min at 4°C. Then, 300 μl of the supernatant was incubated with 6 μl of prewashed anti-FLAG magnetic beads (MBL) or anti-Myc magnetic beads (Thermo Fisher Scientific) for 1.5 h at 4°C with gentle rotation. After incubation, the beads were washed three times with 800 μl of ice-cold PBS containing 0.02% Triton X-100, denatured in SDS sample buffer, and subjected to SDS-PAGE and Western blotting. As input samples, 1% of the total lysate was loaded.

Co-localization assay in S2 cells

Construction of GFP-tagged or mKate2-tagged proteins and transfection were conducted as described in the previous section. At 48 h after transfection, the cells were placed onto concanavalin A-coated coverslips for 20 min, fixed with PBS containing 4% (w/v) paraformaldehyde for 15 min at room temperature, permeabilized twice for 10 min each with PBX [PBS containing 0.2% (v/v) Triton X-100], stained with DAPI (1:1000), and mounted with Fluoro-Keeper Antifade Reagent (Nacalai Tesque). Images were acquired on a ZEISS LSM 900 with Airyscan 2 using a 63x oil objective (NA 1.4) and processed using ZEISS ZEN 3.0 and ImageJ48.

Crosslinking immunoprecipitation (CL-IP)

As previously described16, 100 ovaries from y w flies were dissected in ice-cold PBS, fixed in PBS containing 0.1% (w/v) paraformaldehyde for 20 min on ice, quenched in 125 mM glycine for 20 min, and then homogenized in CL-IP lysis buffer. The lysate was incubated at 4°C for 20 min and then sonicated. After centrifugation at maximum speed for 10 min at 4°C, the supernatant was collected and diluted with an equal volume of CL-IP wash buffer. For pre-clearing, 10 μl of pre-washed Dynabeads Protein G/A mixture (1:1) (Invitrogen) was added and incubated at 4°C for 1 h. Anti-Squ antibody was added to the cleared supernatant at a 1:500 dilution and incubated at 4°C overnight. Then, 20 μl of pre-washed Dynabeads Protein G/A mixture (1:1) (Invitrogen) was added for binding and incubated at 4°C for 3 h. After being washed three times with CL-IP wash buffer, the beads were collected and 50 μl of CL-IP wash buffer containing SDS sample buffer was added. The beads were heated at 95°C for 5 min and subjected to SDS-PAGE and Western blotting analysis.

Immunostaining of ovaries

As previously described16,46, ovaries were dissected, fixed, permeabilized with PBX, and immunostained. The primary and secondary antibodies were the anti-Squ antibody (generated in this study, 1:500) and Alexa Fluor 488-conjugated anti-rat IgG (Thermo Fisher Scientific, 1:200), respectively. Images were acquired on a ZEISS LSM 900 with Airyscan 2 using a 63x oil objective (NA 1.4) and processed using ZEISS ZEN 3.0 and ImageJ48.

Data availability statement

PDB files and PAE plots for the protein dimers with pcScores above 0.6 have been deposited and are available at GitHub Pages (https://dme-research.github.io/AF2_2/).

Acknowledgements

The AlphaFold2 predictions were performed using the large-scale computer system Supercomputer for Quest to Unsolved Interdisciplinary Datascience (SQUID) at the Cybermedia Center, Osaka University, through the Research Proposal-based Use, Large-Scale High-Performance Computing Projects to K.S. (Cybermedia Center, Osaka University) and the High Performance Computing Infrastructure (HPCI) System Research Project (Project ID: hp240099 to K.S.). We thank Dr Trudi Schüpbach (Princeton University) for the generous gift of squ mutant flies. We also thank the FBS Core Facility at Osaka University for providing access to the LSM 900 and ChemiDoc Touch. We appreciate the insightful discussions and suggestions from all the members of KT’s laboratory.

Additional information

Author contributions

Conceptualization: S.K.;

Methodology: S.K., X.X., S.T., D.S.;

Software: S.K., S.T., Y.K., K.R., S.R., D.S.;

Validation: S.K.;

Formal analysis: S.K., X.X.;

Investigation: S.K., X.X.;

Resources: S.K., T.K.;

Data curation: S.K.;

Writing - original draft: S.K., X.X., T.K.;

Visualization: S.K., X.X.;

Supervision: S.K., T.K.;

Project administration: S.K., T.K.;

Funding acquisition: S.K., T.K.

Competing Interests Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding Statement

This work was supported by TAKEDA Bioscience Research Grant [J191503009 to K.T.]; Grant-in-Aid for Transformative Research Areas (A) [21H05275 to K.T.]; and Osaka University Institute for Datability Science “Transdisciplinary Research Project” [Na22990007 to K.T. and K.S.].