Topology-driven protein-protein interaction network analysis detects genetic sub-networks regulating reproductive capacity
Abstract
Understanding the genetic regulation of organ structure is a fundamental problem in developmental biology. Here, we use egg-producing structures of insect ovaries, called ovarioles, to deduce systems-level gene regulatory relationships from quantitative functional genetic analysis. We previously showed that Hippo signalling, a conserved regulator of animal organ size, regulates ovariole number in Drosophila melanogaster. To comprehensively determine how Hippo signalling interacts with other pathways in this regulation, we screened all known signalling pathway genes, and identified Hpo-dependent and Hpo-independent signalling requirements. Network analysis of known protein-protein interactions among screen results identified independent gene regulatory sub-networks regulating one or both of ovariole number and egg laying. These sub-networks predict involvement of previously uncharacterised genes with higher accuracy than the original candidate screen. This shows that network analysis combining functional genetic and large-scale interaction data can predict function of novel genes regulating development.
Introduction
The final shape and size of an organ is critical to organismal function and viability. Defects in human organ morphology cause a multitude of pathologies, including cancers, organ hypertrophies and atrophies (e.g. Yang and Xu, 2011). It is thus critical to understand the regulatory mechanisms underlying the stereotypic shape and size of organs. To this end, assessing the genetic regulation of size is significantly facilitated by using quantifiable changes in organ size and shape.
The Drosophila melanogaster female reproductive system is a useful paradigm to study quantitative anatomical traits. In these organs, the effects of multiple genes and the environment combine to produce a quantitative phenotype: a species-specific average number of egg-producing ovarian tubes called ovarioles. Fruit fly ovaries can contain as few as one and as many as 50 ovarioles per ovary, depending on the species (Kambysellis and Heed, 1971; King, 1970; Markow et al., 2009; Sarikaya et al., 2019), with each ovariole capable of producing eggs. Ovariole number, therefore, may affect the reproductive fitness of Drosophila species by determining the potential of an adult female to produce eggs (Klepsatel et al., 2013b; R' kha et al., 1997). While ovariole number within a species can vary across temperatures (Azevedo et al., 1996), altitudinal and latitudinal clines (Capy et al., 1994; David and Bocquet, 1975), under constant environmental conditions ovariole number is highly stereotypic (Capy et al., 1993; Klepsatel et al., 2013a; R'Kha et al., 1991; R' kha et al., 1997). The reproducibility of ovariole number thus indicates a strong genetic component (Sarikaya et al., 2019). Genome wide association studies and quantitative trait locus mapping have demonstrated that the ovariole number is a highly polygenic trait (Bergland et al., 2008; Lobell et al., 2017; Orgogozo et al., 2006; Wayne et al., 2001; Wayne et al., 1997; Wayne and McIntyre, 2002). In contrast, functional genetic studies have identified only a small number of genes whose activity regulates ovariole number (discussed below). Thus, the complexity of the genetic regulation of this important trait remains largely unknown.
The determination of ovariole number in D. melanogaster occurs during late larval and pupal development (King et al., 1968). Each ovariole in the adult fly arises from a single primordial structure called a terminal filament (TF), which forms in the late third instar larval ovary (Godt and Laski, 1995) by convergent extension (Keller, 2006) of the terminal filament cells (TFCs) (Godt and Laski, 1995; Sahut-Barnola et al., 1996). TFCs are first specified from an anterior population of somatic cells in the larval ovary by the expression of transcription factors including Bric-à-brac 1/2 (bric-à-brac 1/2; bab1/2) and Engrailed (engrailed; en) (Godt and Laski, 1995; Sahut-Barnola et al., 1995). Initially a loosely arranged group in the anterior of the larval ovary, TFCs undergo morphogenetic movements to give rise to the ordered columns of cells that are TFs. Cell intercalation during convergent extension is dependent on the actin regulators Cofilin (twinstar; tsr) and the large Maf factor Traffic Jam (traffic jam; tj), and on E-cadherin dependent adhesion (Chen et al., 2001; Godt and Laski, 1995). Regulation of ovariole number is thus largely dependent on the specification of the TFCs and their rearrangement into TFs (Sarikaya and Extavour, 2015).
We previously showed that the regulation of both TFC and TF number is dependent on the Hippo signalling pathway (Sarikaya and Extavour, 2015), a pan-metazoan regulator of organ and tissue size (Hilman and Gat, 2011; Sebé-Pedrós et al., 2012). At the core of the Hippo kinase cascade are two protein kinases, Hippo (hippo; hpo) and Warts (warts; wts), which prevent the nuclear localisation of the transcriptional co-activator Yorkie (yorkie; yki). Yki and the transcription factor Scalloped (scalloped; sd) together initiate the transcription of multiple target genes, including those that promote cell proliferation and survival. In the D. melanogaster larval ovary, loss of Hpo in the somatic cells causes an increase in nuclear Yki, leading to an increase in TFCs, TFs, ovariole number and egg laying in adults (Sarikaya and Extavour, 2015).
Production of fertile eggs from a stereotypic number of ovarioles requires a spatially and temporally coordinated interplay of signalling between the somatic and germ line cells of the ovary. Thus, signalling amongst somatic and germ line cells in the larval ovary is crucial to all stages of ovarian development (Ables and Drummond-Barbosa, 2017; Gilboa, 2015; Green et al., 2011; LaFever and Drummond-Barbosa, 2005; LaFever et al., 2010; Sarikaya and Extavour, 2015). For instance, disruptions in insulin or Tor signalling affect both somatic and germ line cell proliferation (Gancz and Gilboa, 2013; Green and Extavour, 2012; Hsu and Drummond-Barbosa, 2009; LaFever and Drummond-Barbosa, 2005; LaFever et al., 2010; Sarikaya et al., 2012). Similarly, ecdysone pulses from the prothoracic gland regulate the timely differentiation of the primordial germ cells (PGCs) and the somatic TFCs (Gancz et al., 2011; Hodin and Riddiford, 1998; Hodin and Riddiford, 2000b). Both Hpo and ecdysone signalling also control the proportion of germ line to somatic cells by differentially regulating proliferation of both cell types (Gancz et al., 2011; Sarikaya and Extavour, 2015).
Although it is clear that genes function together in regulatory networks (Gonzalez and Kann, 2012), determining how the few genes functionally verified as required for ovariole development and function work together to coordinate ovariole number and ovarian function more generally is a challenge, because most genes or pathways have been considered individually. An alternative approach that is less often applied to animal developmental genetics, is a systems biology representation of complex biological systems as networks (Barabási and Albert, 1999; Watts and Strogatz, 1998). Protein-protein interaction networks (PINs) are such an example (Albert and Barabási, 2002). The availability of high-throughput molecular biology datasets from, for example, yeast two-hybrid, protein CHiP and microarrays has allowed for the emergence of large scale interaction networks representing both functional and physical molecular interactions (Barabási and Oltvai, 2004; Berger et al., 2007; Giot et al., 2003; Gonzalez and Kann, 2012).
With ample evidence that signalling in the ovary can affect ovarian development, but few genes functionally verified to date, we aimed to identify novel regulators of ovariole development by functionally testing all known members of all characterized D. melanogaster signalling pathways. We used tissue-specific RNAi to systematically knock down 463 genes in the larval ovary and looked for modifiers of the hpo loss of function egg laying and ovariole number phenotypes. To analyse the results of this phenotypic analysis, we used topology-driven network analysis to identify genetic networks regulating these phenotypes, thus generating hypotheses about the relationships between these networks. With this systems biology approach, we identify not only signalling pathway genes, but also previously untested genes that affect these reproductive traits. Functional testing showed that these novel genes affect ovariole number and/or egg laying, providing us with a novel in silico method to identify target genes that affect ovarian development and function. We use these findings to propose putative developmental regulatory networks underlying one or both of ovariole formation and egg laying.
Results
An RNAi modifier screen for signalling pathway involvement in ovariole number
To systematically ascertain the function of signalling pathway genes and their interactions with Hippo signalling in the development of the D. melanogaster ovary, we first curated a list of all known and predicted signalling genes (Gramates et al., 2017; Kanehisa et al., 2010; Mbodj et al., 2013). We identified 475 genes belonging to the 14 developmental signalling pathways characterised in D. melanogaster (Table 1; Supplementary file 1), and obtained UAS:RNAi lines for 463 of these genes from the Vienna Drosophila RNAi centre (VDRC) or the TRiP collections at the Bloomington Drosophila Stock centre (BDSC) (all D. melanogaster genetic lines used are listed in Materials and methods).
We previously showed that reducing the levels of hpo in the somatic cells of the larval ovary using traffic jam Gal4 (tj:Gal4) driving hpo[RNAi] increased both ovariole number and egg laying of adult female flies (Sarikaya and Extavour, 2015). To identify genes that modify these phenotypes, we used tj:Gal4 to drive simultaneous hpo[RNAi] and RNAi against a signalling candidate gene, and quantified the phenotypic change (Figure 1a–d). We observed that on driving two copies of hpo[RNAi] using tj:Gal4, we obtained a further increase in both egg laying and ovariole number (Figure 1e). This indicates that ovaries have further potential to increase ovariole number and egg laying beyond the increase induced by tj:Gal4 driving one copy of hpo[RNAi], and that tj:Gal4 can drive the expression of two RNAi constructs, indicating that our screen could identify both enhancers and suppressors of the tj:Gal4 >hpo[RNAi] phenotype.
We proceeded to identify modifiers of the tj:Gal4 >hpo[RNAi] phenotype by crossing males of each of the 463 candidate genes RNAis individually with tj:Gal4 >hpo[RNAi] females, and performing three phenotypic screens on the offspring. In the first screen (Figure 1a), we measured egg laying of three F1 female offspring (tj:Gal4 >hpo[RNAi], signalling candidate[RNAi]) over five days. To address batch variation (Figure 1—figure supplement 1), we standardised egg laying measurements by calculating the Z scores (Zgene = number of standard deviations from the mean) for each candidate line relative to its batch controls. 190 genes had an egg laying |Zgene| below 1. Previous studies have shown that the egg laying of newly eclosed adult mated females correlates with ovariole number during the first five days (Klepsatel et al., 2013b). We therefore eliminated these 190 genes from subsequent screening, because the change in egg laying was so modest that we considered these candidates were unlikely to show changes in ovariole number when compared to controls.
In the second screen (Figure 1b), we measured egg laying in a wild-type background (tj >signalling candidate[RNAi]) for the 273 remaining candidate genes. For the third screen (Figure 1c), we quantified the ovariole number of tj:Gal4 >hpo[RNAi], signalling candidate[RNAi] F1 adult females for the same 273 candidate genes. To choose candidates from the second and third screens for further study, we wished to account for the fact that the two screens had different effective numbers of data points. This was because egg laying data were obtained from individual vials of three females over five days, while ovariole numbers were obtained from 20 ovaries from ten females (see Materials and methods). We therefore selected the 67 genes with a |Zgene| above two for ovariole number (Figure 1c and d; Table 2), and the 49 genes with a more conservative |Zgene| above five for egg laying (Figure 1a, b and d; Table 2), for a total of 116 positive candidates for subsequent analyses.
Ovariole number is weakly correlated with egg laying
Standardisation of the results from the three screens using Z scores allowed us to compare the effects of individual genes on one or both of egg laying and ovariole number. We performed a pairwise comparison of the Zgene values for all combinations of screens, and considered genes with |Zgene| values that were above the thresholds set for the phenotype in each screen (above two for ovariole number, above five for egg laying; green dots in Figure 2a–c). Across all three screens, loss of function of our positive candidates yielded reductions in ovariole number and egg laying more commonly than increases (Figure 2a–c). Comparing the |Zgene| values of egg laying and ovariole number of tj:Gal4 >hpo[RNAi], signalling candidate[RNAi] adult females revealed that genes that caused a change in egg laying did not always similarly affect ovariole number, and vice versa (Figure 2a). We therefore hypothesise that egg laying and ovariole number may be regulated by genetically separable mechanisms. This hypothesis notwithstanding, we observed a weak but statistically significant correlation between egg laying and ovariole number (p=1e10−5; Figure 2d), and this correlation was most significant in adult females that had a drastic reduction in both phenotypes (Figure 2a).
No single signalling pathway dominates regulation of ovariole number or egg laying
We found that at least some genes from all tested signalling pathways could affect both egg laying and ovariole number (Figure 3). To determine if some pathway(s) appeared to play a more important role than others in these processes, we asked whether any of our screens were enriched for genes from a specific signalling pathway. To measure enrichment, we compared the distribution of individual pathway genes among the positive candidates in each screen, to a randomly sampled null distribution of pathway genes among a group of the same number of genes randomly selected from our curated list of 463 signalling genes (Figure 3a). Involvement of a pathway in the regulation of a phenotype would be reflected in a difference between the representation of pathway genes in an experimentally derived list and a randomly selected group of signalling genes. We found that rather than only one or a few pathways showing functional evidence of regulating ovariole number or egg laying, nearly all pathways affected both phenotypes (Figure 3a). We further tested this result by calculating the hypergeometric p-value for the enrichment of each signalling pathway, in each of the three groups of genes. Consistent with the results of the random sampling approach (Figure 3a), we found that most pathway members were not significantly enriched for egg laying or ovariole number phenotypes (Figure 3b). The absence of significant enrichment of any specific pathway is not simply attributable to the pool of genes that were screened, because our experimental manipulations of ovariole number and egg laying did cause a change in the distribution of signalling pathway members (Figure 3—figure supplement 1). Instead, both phenotypes appeared to be regulated by members of most or all signalling pathways (Figure 3). The only two exceptions to this trend were a greater than twofold enrichment of (1) genes from the Notch signalling pathway in the regulation of ovariole number (p-value<0.05, pink bar in Figure 3a,b), and (2) members of the Hedgehog (Hh) signalling pathway in the regulation of Hippo-dependent egg laying (p-value<0.05, brown bar in Figure 3a and b; Figure 3—figure supplement 2). In summation, our analyses of the enrichment of signalling pathways within the different screens indicated that both ovariole number and egg laying are regulated by genes from nearly all described animal signalling pathways (Figure 3a), rather than being dominated by any single pathway.
Comparing the results of the Egg Laying screens performed in a wild type background (Figure 1b) or in a hpo[RNAi] background (Figure 1a), revealed that most of the genes that met a threshold of |Zgene| > 5 in one screen, did not meet that threshold in the other screen (Figure 2c). This result suggests the existence of both Hippo-dependent and Hippo-independent mechanisms of regulation of egg laying. The interpretations of separable Hippo-dependent and -independent regulation of egg laying, and of the separable regulation of ovariole number and egg laying, were supported by the results of the network analysis described in the following section.
Centrality of genes in the ovarian protein-protein interaction networks can predict the likelihood of loss of function phenotypic effects
The finding that these reproductive traits were regulated by the genes of all signalling pathways led us to consider the broader topology of putative gene regulatory networks in the analysis of our data. Previously characterised genes in the ovary are often pleiotropic and can regulate both ovariole number and egg laying (Gilboa, 2015; Sarikaya and Extavour, 2015). As with proteins in a linear pathway, proteins in a protein-protein interaction network (PIN) are more likely to function in conjunction with genes that are connected to them within the network (e.g. Ideker and Sharan, 2008; Jeong et al., 2001). Centrality is one measure of the connectedness of a gene in the PIN and can be used to identify the most important functional centres within a protein network (Hahn and Kern, 2005; Ma'ayan, 2011). Most centrality measures use path length, which is a measure of the number of other proteins required to link any two proteins in the network. Here, we used four commonly used metrics to quantify gene centrality, each measuring slightly different properties (Jalili et al., 2016; Koschützki and Schreiber, 2008). (1) Degree centrality is proportional to the number of proteins that a given protein directly interacts with. (2) Betweenness centrality measures the number of shortest paths amongst all the shortest paths between all pairs of proteins that require passing through a particular protein. (3) Closeness centrality measures the average shortest path that connects a given protein to all other proteins in the network. (4) Eigenvector centrality is a measure of the closeness of a given protein to other highly connected proteins within the network.
We hypothesised that if the candidate genes we identified in our screen as playing roles in ovarian function worked together as a PIN, then the degree of centrality of a gene might be an indicator of function. To test this hypothesis, we first compiled a PIN consisting of all described interactions between D. melanogaster proteins, from the combination of publicly available protein-protein interaction (PPI) studies in the DroID database (see Materials and methods). We then calculated the four centrality measures described above for all genes within the D. melanogaster PIN (Supplementary file 1). We rank ordered only the genes tested in each screen by their score for each centrality measure, and asked whether their rank order correlated with the results of the screen, plotting these results as a receiver operating characteristic (ROC) curve. Positive correlations between centrality (a continuous variable) and phenotype (a binary variable: above or below the |Zgene| threshold) are reflected in an area under the curve (AUC) of more than 0.5. We found that the higher the centrality score, the greater the likelihood that a gene had |Zgene| values above our threshold for effects on ovariole number and egg laying (Figure 4a; Figure 4—source data 1). This supports the premise that the positive candidates identified in our screen function together as a network in the regulation of either ovariole number or egg laying. Interestingly, while the centrality of genes did predict whether a gene would affect our phenotypes of interest, it could only weakly predict the strength of that effect (p-value<0.05 in Figure 4—figure supplement 1).
Genes regulating egg laying and ovariole number regulation form non-random gene interaction networks
The centrality analyses above suggested that the genes implicated in ovariole number and egg laying displayed characteristics of a functional network. PINs can often be further sorted into a collection of sub-networks. A sub-network is a smaller selection of proteins from the PIN. Examples of such sub-networks could be proteins within the same subcellular organelle (Foster et al., 2006) or genes that are expressed at the same time (Spellman et al., 1998), thus making them likely to function together (Srinivasan et al., 2007). A putative module is a sub-network that can perform regulatory functions as a unit, independent of other sub-networks, and has key measurable features (Barabási and Oltvai, 2004; Hartwell et al., 1999; Ravasz et al., 2002; Yook et al., 2004). Genes and interactions between genes are not mutually exclusive to such putative modules and can be shared between putative modules. We therefore asked if our sub-networks, consisting of genes that showed similar mutant phenotypes, might display features of modularity. To determine whether genes that were implicated in regulation of ovariole number and egg laying interacted with each other in specific groups more than would be expected by chance, we created four lists of genes, called ‘seed’ lists, based on their individual phenotypic effects based on our screen results: (1) the core seed list, including genes positive in all three screens (Figure 4b); (2) the egg laying seed list, including genes positive in the wild type background egg laying screen (Figure 1b; Figure 4c); (3) the hpo[RNAi] egg laying seed list, including genes positive in the hpo[RNAi] background egg laying screen (Figure 1a; Figure 4c); and (4) the hpo[RNAi] ovariole seed list, including genes positive in the hpo[RNAi] background ovariole number screen (Figure 1c; Figure 4c). Interestingly, the core seed list, comprising genes that affected all three measured phenotypes, only consisted of genes that caused a reduction in both ovariole number and egg laying (Figure 4b).
We then asked whether the members of these four seed lists were more connected than would be expected by chance. In other words, we formally tested them for modularity as defined above. Meeting our criteria for modularity would suggest that the genes in these phenotypically separated seed lists might operate together as putative functional modules within the Drosophila PIN. We performed our modularity test using four commonly measured network metrics: (1) Largest Connected Component (LCC) (the number of proteins or nodes connected together by at least one interaction), (2) network density (the relative number of edges as compared to the theoretical maximum), (3) total number of edges, and (4) average shortest path (average of the minimum distances connecting any two proteins). We considered a sub-network to show modular features if they showed most of the following properties: higher LCC, higher network density, more edges, and shorter average shortest path length, when compared to a similarly sized, randomly sampled selection of genes from the PIN.
To determine whether these criteria would correctly identify signalling genes, which are known to function together as a module, we measured these four parameters in the original set of genes (all signalling genes) used in this study (Table 1). We found that the signalling genes display features of modularity when compared both to a randomly selected set of genes, as well as to a degree-controlled list of genes (Figure 4—figure supplement 2a). We then used this approach to test the modularity of the four phenotypic sub-networks, when compared to two different ‘control sub-networks’ consisting of a group of the same number of genes as contained in the sub-network, one chosen randomly from among the candidate genes from our initial screen list (Table 1), and the second chosen from a degree-controlled list of genes selected from the entire PIN (see Materials and methods: Building degree-controlled randomised networks). We found that the four predicted phenotypic sub-networks showed higher LCC, higher network density, and more edges (Figure 4—figure supplement 2b), compared to both ‘control sub-networks’. This result suggests that these sub-networks display many features of modularity (although their average shortest path length is higher than controls, rather than lower) and may function as putative modules within the PIN to regulate one or both of ovariole number or egg laying.
Based on published molecular interactions, in addition to the four criteria described above, further evidence for putative functional modules of genes can also be obtained by applying algorithms that use either the shortest path method (Bromberg et al., 2008) or the Steiner Tree approach (Huang and Fraenkel, 2009). Such methods identify and predict functional connections between the seed proteins, as well as additional nodes (proteins or genes) that have not been experimentally tested within the given parameters, but are known to interact with the seed genes in the PIN (Albert and Albert, 2004; Yu et al., 2006). This process can provide evidence for or against the existence of a predicted functional module, and subsequent experimental testing of this predicted module can confirm its functionality. Given its recent success in predicting gene modules, we applied the previously published Seed Connector Algorithm (SCA), a member of the Steiner Tree algorithm family (Wang et al., 2017; Wang and Loscalzo, 2018), to the groups of genes that had similar phenotypic effects in our screens (seed genes; Figure 4b and c). The SCA connects seed genes and previously untested novel genes (connectors) to each other using a known PIN, producing the largest possible connected putative module given the data. Using the PIN and the aforementioned four lists of seed genes, we applied a custom python implementation of the SCA (Materials and methods: 04_Seed-Connector.ipynb) to build and extract the largest possible (given our PIN) connected putative modules that regulate egg laying and ovariole number.
This SCA method yielded four putative modules, one for each seed list, which we initially referred to as the Core Module (Figure 5b), hpo[RNAi] Egg Laying module (Figure 5—figure supplement 1), Egg Laying Module (Figure 5—figure supplement 2), and hpo[RNAi] Ovariole Number Module (Figure 5—figure supplement 3) respectively. Each of these four putative modules contained seed genes, which had been functionally evaluated in our screens (green and red circles in Figure 5), as well as connector genes, which were genes newly predicted as regulators of these phenotypes (green and red triangles in Figure 5). Of the four putative modules generated by the SCA, we found that the Core module had higher centrality measures than the other three putative modules (Figure 5—figure supplement 4). We interpret this to mean that the genes regulating these ‘Core phenotypes’ are more strongly connected to each other.
We found, however, that these four groups of genes produced by the SCA did not have increased LCC values, increased network density, more edges nor decreased average shortest path (Figure 5—figure supplement 5), compared to our ‘control sub-networks’. This result shows that the SCA in this instance does not provide evidence for putative functional modules from the four phenotypic sub-networks in our system, above and beyond the evidence provided by the application of the four network metrics discussed above. To be conservative in our description of these results, we therefore henceforth refer to these four groups of genes united by phenotype and with strong predicted interactions, as sub-networks rather than as modules. We noted that each of these four sub-networks contains genes from most, if not all, known signalling pathways, rather than only genes from a single pathway (Figure 5—figure supplement 6).
Low edge densities between sub-networks suggest genetically separable mechanisms of ovariole number and egg laying
Our network analysis identified four highly connected sub-networks of genes that regulate two distinct developmental processes, together with or independently of Hippo signalling activity: ovariole number determination, which occurs primarily during larval development, and egg laying, which takes place in adult life (Figure 5). We wished to assess the degree to which there might be any shared genetic components between these four phenotypic sub-networks, and whether the addition of connector genes by the SCA had any impact on this. To understand potential interactions between the phenotypic sub-networks in the regulation of both ovariole number and egg laying, we constructed a composite network of all genes in each of the four phenotypic sub-networks (Figure 5b; Figure 5—figure supplement 1; Figure 5—figure supplement 2; Figure 5—figure supplement 3), which we refer to as the ‘meta network’ (Figure 6a). We then grouped the genes of the meta network into seven bins based on their phenotypic effects as measured in the three screens, resulting in sub-groups I through VII shown in Figure 6a. To ask whether the genes in these phenotypic groupings showed any notable interaction patterns, we compared the connectivity between genes assigned to the same phenotypic group, to the connectivity of a group of the same size randomly assembled from the genes of the meta network (Figure 6—figure supplement 1). As a measure of connectivity, we used an edge density map, which reflects the number of interactions between the genes within a group and between groups. We quantified the deviation between the edge density of each of groups I through VII, and their corresponding randomly assigned groups of the same size, by computing their respective Z score. When we included only seed genes in each of groups I through VII, we found that the edge density values of these groups were somewhat lower (Z score <−1.5) than those of the randomised groups (Figure 6—figure supplement 1a). The single exception to this was group IV, whose members shared more edges with each other than did the members of its randomised comparison group of the same size (Figure 6—figure supplement 1a). In other words, these groups of phenotypically binned seed genes were not notably more connected to each other then we would expect by chance.
In contrast, expanding each of the seven phenotypic sub-groups to include both seed genes and the connector genes predicted by the SCA changed the edge densities of these groups relative to their randomised control groups. Specifically, edge densities were much lower between groups I, II and III (Z score < −3), and much higher within group IV (Z score >3) (Figure 6—figure supplement 1). This shows that applying the SCA to these phenotypically binned groups increased the non-random differences in connectivity between them that were already present within the seed genes (Figure 6—figure supplement 1a), thus clarifying the internal structure of the meta network.
We then asked if these seven sub-groups were as connected to each other, as were the genes within each of the sub-groups, again using the edge density assessment as described above (Figure 6b). This analysis yielded three principal findings. First, edge densities between the three groups corresponding to the three scored phenotypes (I, II and III in Figure 6a) were very low (Figure 6b).This implies that the genes in each of the groups that regulate only one phenotype (I, II and III in Figure 6a) share more interactions with themselves than with genes in the other two groups, suggesting that each of these initially scored phenotype can be largely regulated by an independent, non-interacting set of genes. Second, the core group (IV in Figure 6a) displayed a higher edge density with the other three groups (I, II and III in Figure 6a) than any of those three groups did with each other (Figure 6b). Consistent with the definition of core genes as regulating all three scored reproductive phenotypes, this result suggests that the core genes, in contrast to those from the other three groups, may share substantial functional interactions with genes of the other groups. Finally, three small additional groups emerged from this analysis (V, VI and VII in Figure 6a), suggesting small networks of genes that might work together to regulate two of the three scored phenotypes. In sum, this meta network analysis supports the hypothesis of three potentially largely non-interacting genetic networks that regulate Hippo-dependent ovariole number, Hippo-dependent egg laying, and Hippo-independent egg laying, respectively. The presence of smaller sub-networks (V, VI and VII in Figure 6a) that interact with each other further supports the observation that the putative modules predicted by the SCA – which we refer to as sub-networks – could include genes that function within more than one such sub-network (Figure 5—figure supplement 4). Moreover, each of these genetically separable sub-networks included genes in multiple signalling pathways (Figure 6c).
Network analysis predicts novel genes involved in egg laying and ovariole number
The four predicted phenotypic sub-networks produced by the SCA approach included connector genes that were not included in our original screen, and thus had not been tested for possible effects on our phenotypes of interest (triangles in Figure 5b; Figure 5—figure supplements 1–3). Given that prior work in human disease models showed that predicted disease modules can correctly identify gene involvement in the relevant diseases (Chen et al., 2006; Gonzalez et al., 2007; Wang et al., 2017; Wang and Loscalzo, 2018), we asked whether our deployment of the SCA had likewise successfully predicted novel, functionally important genes in each sub-network. To do this, we used UAS:RNAi lines for each connector, driven by tj:Gal4 to measure the effects of knocking down each of the connector genes (triangles in Figure 5b and Figure 5—figure supplement 1; Figure 5—figure supplement 2; Figure 5—figure supplement 3) both on phenotypes within the sub-network where they were predicted (Figure 7a–b, Figure 7—source data 1), and on either of the other two tested phenotypes (Figure 7c, Figure 7—source data 2).
Of the ten predicted novel connectors within the Core sub-network, loss of function of several of these had significant effects on at least one of the three scored phenotypes. Five affected ovariole number, two affected Hpo-dependent egg laying and one affected Hpo-independent egg laying. However, only one of them significantly altered all three scored phenotypes (Figure 7a; Figure 7—source data 1).
The predicted connector genes from two of the other three phenotypic sub-networks showed high positive prediction rates for novel genes within the sub-networks. RNAi against seven out of 18 of the hpo[RNAi] Egg Laying connectors, three out of 11 of the hpo[RNAi] Ovariole Number connectors, and none of the 11 Egg Laying connectors, significantly affected the sub-network phenotype (Figure 7—source data 1). Thus, although the Egg Laying connectors failed to impact this phenotype in our assay, 41.1% and 27.2% of the connectors from the other two sub-networks were correctly predicted (Figure 7b; Figure 7—source data 1).
In sum, taken across all sub-networks, this methodology correctly identified genes regulating at least one of the scored reproductive phenotypes, at significantly higher rates than those obtained in the original screen of 463 members of all known signalling pathways (Figure 7c; Figure 7—source data 2). By this measure, testing network-predicted novel genes derived from experimentally obtained data was even more successful than testing signalling pathways as a means of identifying novel genes that regulate ovariole number and egg laying.
Discussion
In this study, we have identified many novel genes that regulate one or both of egg laying and ovariole number. Though the development of the insect ovary has been studied for over 100 years, our understanding of the genetic mechanisms that regulate the development of the ovary is sparse. The female reproductive system and its ability to produce eggs are one of the key determinants for the survival of a species in an ecological niche. The genes we have uncovered here are possible targets for the regulation of the construction and function of the reproductive system in D. melanogaster, and potentially in other species of insects as well. Understanding the gene regulatory networks that regulate egg laying and ovariole development could provide a framework to understand the key regulatory steps during this process that may be modified over evolutionary time, to yield the wide diversity of ovariole numbers and fecundities displayed by extant insects. We suggest that, given our success in applying a network approach to the results of a traditional forward genetic screen, the field of developmental genetics should find it fruitful to apply network analyses to the interpretation of large scale transcriptomic and proteomic data.
Identification of regulatory sub-networks for ovariole development and egg laying
The D. melanogaster ovary is a commonly studied model for organogenesis (Chen et al., 2001; Godt and Laski, 1995; Lobell et al., 2017; Sarikaya and Extavour, 2015), stem cell maintenance (Gilboa, 2015) and interactions of development and ecology (Cohet and David, 1978; Hodin and Riddiford, 2000a; Klepsatel et al., 2013a; Sarikaya et al., 2019). Nevertheless, our understanding of the genetic mechanisms that regulate these processes remains fragmentary. In this paper, we have identified four distinct protein interaction sub-networks that regulate ovariole number and egg laying in the D. melanogaster ovary. These sub-networks consist of both novel and previously characterised genes that regulate either ovariole number or egg laying or both, thus enhancing our understanding of the genetic underpinnings of this reproductive system.
Of the four sub-networks, the Core sub-network affects both ovariole number and egg laying. The Core sub-network contains numerous housekeeping genes, including regulators of transcription, translation and cell division such as polo (Llamazares et al., 1991), cyclin K (Edwards et al., 1998), nucleosome assembly protein 1 (Ito et al., 1996) and eukaryotic translation release factor 1 (Chao et al., 2003). While polo and eukaryotic translation release factor one are members of signalling pathways, cyclin K and nucleosome assembly protein one are genes predicted by the SCA. Given that the Core sub-network largely consists of genes whose loss of function decreases both of these parameters, we hypothesise that these are essential genes for the basic structure and function of the ovaries. Essential genes are more interconnected in a PIN with higher centrality measures (Jeong et al., 2001; but see Yu et al., 2008) and interestingly, we find that the genes in the Core sub-network also have higher connectivity than those in the other three sub-networks (Figure 5—figure supplement 4).
In addition to genes that regulate basic cellular processes, the Core sub-network is enriched for the central components of the Hh signalling cascade, namely patched (ptc), smoothened (smo) and costa (cos) (Lee et al., 2016). However, we find that the loss of Hh ligand, which is expressed in the TF cells in the developing larval ovary (Lai et al., 2017), does not significantly affect either ovariole number or egg laying. Though surprising, ligand-independent activation of Hedgehog signalling has been observed before. For example, in the Drosophila eye, loss of either ptc or cos in clones leads to non-cell autonomous proliferation in wild type cells, as well as growth disadvantages in the mutant tissue (Christiansen et al., 2012). In another example, sufficient intracellular smo levels can also activate downstream transcription of Hh pathway targets, showing that Hh itself is not always required to activate the cascade (Jiang et al., 2018).
Development of the larval ovary
The hpo[RNAi] Ovariole Number sub-network is composed of genes that affect the Hippo signalling activity-dependent determination of ovariole number during development. Establishment of ovariole number occurs largely during the third instar stage of larval development in D. melanogaster (Godt and Laski, 1995; Hodin and Riddiford, 1998; King et al., 1968; Sahut-Barnola et al., 1996). During this period, the TFCs are specified in the anterior of the ovary and undergo rearrangement into stacks of cells called TFs, each of which gives rise to an ovariole (Godt and Laski, 1995; Sahut-Barnola et al., 1995). TF specification requires the expression of engrailed (En) (Bolívar et al., 2006) and the transcription factors Bab1 and Bab2, encoded by the bric-à-brac locus (Couderc et al., 2002; Godt et al., 1993). A third transcription factor, Lmx1a, was recently found to be necessary for the specification of the TFCs (Allbee et al., 2018). Our hpo[RNAi] Ovariole Number sub-network identifies numerous additional novel transcription factors including bunched (bun) and retinoblastoma-family protein (rbf), which we hypothesise could also be involved in the specification of ovariole number. bun and rbf have been implicated in the migration (Dobens et al., 2005) and endoreplication (Cayirlioglu et al., 2003) of the follicle cells during oogenesis, but have not, to our knowledge, been previously identified as playing a role in the context of larval ovary development.
The TFCs specified in the larval ovary undergo a process of convergent extension to form TFs. This process of convergent extension requires cell intercalation, and the actin depolymerising factor Cofilin, encoded by the gene twinstar, is essential to this process (Chen et al., 2001). During intercalation, the cells also dynamically modify their actin cytoskeleton and their expression of E-cadherin (Godt and Laski, 1995). Our hpo[RNAi] Ovariole Number sub-network further identifies Rho1 (Barrett et al., 1997) and Rho kinase (Rok) (Mizuno et al., 1999) as necessary for correct ovariole number. During the extension of the D. melanogaster embryonic germ band, a commonly studied model of convergent extension, the localised activation of the actin-myosin network facilitated by Rho1 and Rok is necessary for cell intercalation (Kasza et al., 2014). Given the known roles of Rho1 and Rok as regulators of the actin cytoskeleton (Ridley, 2006), we propose that TF assembly in the ovary requires both these proteins for correct cell intercalation. A third actin cytoskeleton regulator, misshapen (msn), was also identified by our hpo[RNAi] Ovariole Number sub-network. msn encodes a MAP kinase previously shown to regulate the polarisation of the actin cytoskeleton during oogenesis (Lewellyn et al., 2013), but has not, to our knowledge, been studied to date in the context of larval ovarian development.
We propose that the polarity of the somatic cells in the ovary is also necessary for correct larval ovary development, given the presence of the lateral membrane proteins discs large 1 (dlg1) and prickle (pk) in the ovariole sub-network. During the maturation of the TFs during larval development, the TFCs undergo significant cell shape changes, coincident with localised expression of beta-catenin and actin to the lateral edges of the TFCs (Godt and Laski, 1995). Restriction of the E-cadherin domain in epithelia requires establishment of the basolateral domain (Harris and Peifer, 2004) and we propose that testing a similar requirement for dlg1 and pk in the larval ovary would be a fruitful avenue for future studies.
Network analysis as a tool in developmental biology
Using a systems biology approach to analyse RNAi screening data has proven fruitful, providing us with new insights into the development and function of the D. melanogaster ovary by identifying novel and previously understudied genes that regulate this process. Systematic analysis of the function of single genes in development has been a historical convention and has provided valuable and precise genetic interaction information (Jansen et al., 2002; von Mering et al., 2002). With the advent of genome-wide analysis, however, we can use data from a larger number of genes to predict the identity of additional functionally significant genes with relative ease (Yu et al., 2006). We note that the novel gene prediction rate within each individual sub-network ranged from as high as 41.1% from the hpo[RNAi] Ovariole Number sub-network to as low as 0% from the Egg Laying sub-network (Figure 7b; Figure 7—source data 1). We suggest that this may be due to multiple factors. Firstly, the possible incompleteness of the PIN is expected to lead to some areas of the network being sparse or non-existent (von Mering et al., 2002). If the sub-network of interest happens to fall in such regions of the PIN, prediction algorithms will fail. Secondly, the initial restriction of tested genes to signalling pathway members might have provided a seed list too sparse to usefully predict functional connectors. Finally, it could be the case that ‘Egg Laying’ is such a complex phenotype that its gene regulation cannot be adequately captured within a highly connected network of the type suited for identification by the analyses we have used here.
Ovariole number in D. melanogaster is the outcome of a discrete developmental process with a clear beginning and end, comprising a specific series of cellular behaviours that take place in the confines of one organ (Godt and Laski, 1995; Hodin and Riddiford, 2000a; Sahut-Barnola et al., 1996). Once established during larval life, ovariole number in Drosophila remains unaltered through to and during adulthood, even if oogenesis within those ovarioles suffers congenital or age-related defects (King, 1970). Because previous work suggested that ovariole number in Drosophila could have at least some predictive relation to egg laying (Cohet and David, 1978; Klepsatel et al., 2013b; Sarikaya and Extavour, 2015), we reasoned that scoring the latter phenotype in a primary screen (Figure 1a) could be an effective way to uncover ovariole number regulators (Figure 1c). While our results showed that this was true in many cases, it was also clear that these two traits can vary independently (Figure 2), highlighting the fact that ovariole number is not the only determinant of egg laying. Egg-laying dynamics, even during the limited five-day assay used in our study, are likely influenced not just by a single anatomical parameter such as ovariole number, but rather by many biological, biomechanical, hormonal and behavioural processes. Consequently, the sub-network we were able to extract from the results of this screen (Figure 1a and b) might be too coarse to extract novel genes that participate in the potentially complex gene interactions regulating egg laying. Furthermore, genes predicted within each of the sub-networks are unlikely to function exclusively within just one sub-network. This conclusion is supported by our observation that genes predicted to function in any of the sub-networks, also function in at least one of the four sub-networks at a higher rate than genes selected for screening by their presence in a signalling pathway (Figure 7c). We also observe that although substantial regions of the meta network do not share interactions with genes in the other sub-networks (Figure 6b), we do find smaller sub-networks where there is some overlap, further indicating pleiotropy of some genes in both egg laying and ovariole number regulation.
The predictive rates of the approach we have used here, although encouraging, are likely limited by the degree of noise in the high-throughput data used to generate the PIN (Li et al., 2010), the sparseness of the PIN, and the degree of misidentification of protein interactions (Zhang et al., 2017). Addressing one or more of these parameters could improve the outcomes of future network predictions from developmental genetics data. For example, the problem of sparseness, which is a paucity of high confidence detectable interactions relative to all biologically relevant interactions, has been addressed in other studies by using an ‘Interolog PIN’ (Matthews et al., 2001) in place of an organism-specific PIN. An Interolog PIN combines known interactions from multiple organisms and has been used successfully to identify, for example, gene modules relevant in squamous carcinoma, based on a starting dataset of microarray data on differentially expressed genes between cancer cells and the surrounding tissue (Wachi et al., 2005). Future studies applying such an Interolog PIN to the outcomes of genetic screens for developmental processes of interest could potentially overcome the problem of sparseness, as well as the biases towards proteins that are more heavily studied and thus better represented in organism-specific PINs.
Materials and methods
Lead contact and materials availability
Request a detailed protocolThis study did not generate new unique reagents. This study generated new python3 code available on GitHub: https://github.com/extavourlab/hpo_ova_eggL_screen; Kumar, 2020 (copy archived at https://github.com/elifesciences-publications/hpo_ova_eggL_screen).
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Cassandra G. Extavour (extavour@oeb.harvard.edu).
Experimental model and subject details
Request a detailed protocolWild type and mutant lines of Drosophila melanogaster were obtained from publicly accessible stock centers and maintained as described in ‘Fly Stocks’ below. Genotypes and provenance are provided in the Key Resource Table. Candidate genes were randomly assigned to batches for screening (see the Supplementary file 1 for which genes were in each batch). F1 animals from the same cross were randomly assigned to experimental groups for phenotyping in all screens.
Method details
Fly stocks
Request a detailed protocolFlies were reared at 25°C at 60% humidity with standard Drosophila food (Sarikaya et al., 2012) containing yeast and in uncrowded conditions as previously defined (Sarikaya and Extavour, 2015). RNAi lines were obtained from the TRiP RNAi collection at the Bloomington Drosophila Stock Centre (BDSC) and from the Vienna Drosophila Resource Centre (VDRC). See Key Resources Table for complete list of stocks used in this study. Oregon R was used as a wild-type strain. The genotype of the traffic jam:Gal4 line used in the screen was y w; P{w[+mW.hs]=GawB}NP1624 (Kyoto Stock Center, K104–055; abbreviated hereafter as tj:Gal4). The hippo RNAi line used in the screen was y[1] v[1]; P{y[+t7.7]v[+t1.8]=TRiP. HMS00006}attP2 (BDSC:33614; abbreviated hereafter as hpo[RNAi]).
Egg and ovariole number counts
Request a detailed protocolAdult egg laying was quantified by crossing three virgin females of the desired genotype (see ‘Screen design’ below) with two males in a vial containing standard food and yeast granules (day one) and then transferring them into a fresh food vial without yeast granules for a 24 hr period. Eggs from vials were then counted by visual inspection of the surface of the food in the vial. Males and females were transferred to fresh food vials without yeast granules, every day thereafter until day six. All egg -laying measurements reported and analysed in the paper are the sum of the eggs laid by three adult female flies over the five days of this assay (days two through six without yeast granules). Data from any vial in which either a female or male died, during the course of the experiment, were not included in the analysis.
Ovariole number was quantified by mating ten virgin adult females with five virgin adult Oregon R males for three days post-eclosion in vials with yeast at 25°C and 60% humidity. After this three-day mating period, all 20 adult ovaries from the mated females were dissected in 1X PBS with 0.1% Triton-X-100 and stained with 1 µg/ml Hoechst 33321 (1:10,000 of a 10 mg/ml stock solution). Ovarioles were separated from each other with No. 5 forceps (Fine Science Tools) and counted by counting the number of germaria under a ZEISS Stemi 305 compact stereo microscope with a NIGHTSEA stereo microscope UV Fluorescence adaptor.
Screen design
Request a detailed protocolIn the primary screen (Figure 1a: hpo[RNAi] Egg Laying), 463 candidate genes (Supplementary file 1) were screened for the effect of an RNAi-induced loss of gene function in a hpo[RNAi] background on the number of eggs laid in the first five days of mating (see ‘Egg and ovariole number counts’ above) by adult females. These females were the F1 offspring of UAS:candidate gene RNAi males crossed to P{w[+mW.hs]=GawB}NP1624; P{y[+t7.7] v[+t1.8]=TRiP.HMS00006}attP2 (tj:Gal4; UAS:hpo[RNAi]) virgin adult females (Figure 1a: hpo[RNAi] Egg Laying). All genes that yielded an egg laying count with a |Zgene| > 1 (see ‘Gene selection based on Z score and batch standardisation’ below) were selected to undergo two secondary screenings (n = 273, Table 2; Figure 1d). First, these genes were screened for effects on the egg laying of mated adult female offspring from a cross of UAS:candidate gene[RNAi] males and tj:Gal4 virgin adult females (Figure 1b: Egg Laying). Secondly, these genes were screened for effects on ovariole number in a hpo[RNAi] background. All 20 ovaries from ten adult female F1 offspring of a cross between UAS:candidate gene[RNAi] males to P{w[+mW.hs]=GawB}NP1624; P{y[+t7.7] v[+t1.8]=TRiP.HMS00006}attP2 (tj:Gal4; UAS:hpo[RNAi]) virgin adult females were scored for ovariole number (see ‘Egg and ovariole number counts’ above). (Figure 1c: hpo[RNAi] Ovariole Number).
Gene selection based on Z score and batch standardisation
Request a detailed protocolCandidate genes were screened in batches with an average size of 50 genes. For each batch, control flies were the female F1 offspring of Oregon R males crossed to P{w[+mW.hs]=GawB}NP1624; P{y[+t7.7] v[+t1.8]=TRiP.HMS00006}attP2 (tj:Gal4; UAS:hpo[RNAi]) virgin adult females. Because the control group in each batch had slightly different distributions of egg laying and ovariole number values (Figure 1—figure supplement 1), it was inappropriate to compare absolute mean values between genes that were scored in different batches. Instead, comparisons of the Z score of each candidate (Zgene) to its batch control group was used as a discriminant. This approach standardises for batch effects and allows the comparison of all genotypes within and across the primary and secondary screens with a single metric (Zgene).
Firstly, the mean and standard deviation of the eggs laid by the control genotype for a batch were calculated as μb and σb respectively. Then, using the number of eggs laid by adult females of a candidate gene RNAi (xgene) of the same batch, the Z score for the egg laying count of that gene (Zgene) was calculated as . The same standardisation protocol was applied to both egg laying and ovariole number counts of every gene and its corresponding batch control.
Ovariole numbers were derived from counts of the number of ovarioles per ovary for 20 ovaries per candidate gene, and a threshold of |Zgene| > 2 was applied for ovariole number phenotype. Egg laying counts were derived from measurements of three females in a single vial per gene. We therefore chose to be more conservative in our Z score comparisons for the egg laying phenotype, than for ovariole number phenotype, and applied a stringent threshold of |Zgene| > 5 to select genes of interest. All genes with |Zgene| values above these thresholds are referred to throughout the study as ‘positive candidates’. (See Ipython notebooks 02_Z_score_calculation.ipynb and 02.2_Z_score_calculation_prediction.ipynb for code implementation and calculation of Z scores, and 06_Screen Analysis.ipynb for batch effects.)
Signalling pathway enrichment analysis
Request a detailed protocolTo study the enrichment of a particular signalling pathway in a group of candidate genes that had similar phenotypic effects revealed by the screen, custom scripts (see 07_Signaling_pathway_analysis.ipynb for code implementation) were generated to implement two different methods (Figure 3a and b; Figure 3—figure supplement 1; Figure 5—figure supplement 6a,b).
The first method is a numerical method that uses random sampling to calculate the null distribution of the number of members (M) of a signalling pathway (S) that would be expected at random in a set of genes of size (N). The script randomly sampled N genes from among the 463 tested D. melanogaster signalling genes 10,000 times, and counted the number of genes (M) that were members of the signalling pathway S. Positive candidates in each of the three screens were sorted by their presence in signalling pathways and counted. The Z score was then calculated by comparing the experimentally observed number of positive candidates in each signalling pathway against the randomly sampled null distribution.
The second method used the hypergeometric p-value to calculate the probability of M members of a signalling pathway being in a group of N genes, given a starting population of 463 tested D. melanogaster signalling genes, and the known attribution to a pathway S of each gene.
Protein-Protein Interaction Network (PIN) building
Request a detailed protocolThere is no standard complete Protein-Protein Interaction network (PIN) available for Drosophila melanogaster. However, there exist many smaller networks from different screens, as well as literature extractions. We therefore combined data from these sources and then created a PIN for use in the present study, as follows:
Step 1
Request a detailed protocolSeveral screens assessing protein-protein interactions have been centralised in a database called DroID: http://www.droidb.org. The version DroID_v2018_08 was used. All available datasets were first downloaded from that database using this link: http://www.droidb.org/Downloads.jsp. The description of all of these datasets can be found here: http://www.droidb.org/DBdescription.jsp.
Step 2
Request a detailed protocolWe used the datasets from all screens that assessed direct protein-protein interactions and did not use the interolog database (predicted protein interaction based on mouse human and yeast PPIs). These direct assessment screens were seven in total, as follows:
Finley Yeast Two-Hybrid Data (size 2.0 MB; 3610 Nodes and 9007 Edges)
Curagen Yeast Two-Hybrid Data (size 4.6 MB; 6678 Nodes and 19506 Edges)
Hybrigenics Yeast Two-Hybrid Data (size 381 KB; 1269 Nodes & 1842 Edges)
Perrimon co-AP complex (size 108 KB; 252 Nodes and 384 Edges)
DPiM co-AP complex (size 6.3 MB; 3732 Nodes and 17652 Edges)
PPI from other databases (size 16.2 MB; 7524 Nodes and 47471 Edges)
PPI curated by FlyBase (size 7.4 MB; 5125 Nodes and 31491 Edges)
We did not consider self-loop edges from proteins predicted to interact with themselves (homotypic or self-interactions). An important element to note is that the PPIs curated by FlyBase is a literature-based PPIs. FlyBase protein-protein interactions are experimentally derived physical interactions curated from the literature by FlyBase and does not include FlyBase-curated genetic interactions.
Step 3
Request a detailed protocolWe concatenated the seven datasets listed above into a single unique database. A custom python script was created that downloads and reads each of the above seven unique PPI tables, and generates a single PIN (see 01_PIN_builder.ipynb). From this concatenation, a single-edge undirected network was created and saved. This network is hereafter referred to as the PIN (see 01_PIN_builder.ipynb). The PIN contains 10,632 proteins (nodes) and 85,019 interactions (edges), giving a network density of 0.0015.
Network metric computations
Request a detailed protocolThe centrality of a node is often used as a measure of a node’s importance in a network. Within a PIN, the centrality of a gene reflects the number of interactions in which the gene directly or indirectly participates. Four different centrality metrics were computed for all genes in the PIN using the networkx python library:
Betweenness reflects the number of shortest paths passing through a gene.
Eigenvector is a measure of the influence of a gene in the network.
Closeness measures the sum of shortest distance of a gene to all the other genes.
Degree centrality corresponds to the normalised number of edges of a gene in the network.
While there exist more centrality measures, these four are commonly used to assess biological networks. These computed centrality parameters of the genes measured in the screen were computed with 03_ROC_curve_analysis_of_network_metrics.ipynb, and are reported in the Supplementary file 1 (see 09_Making_the_database_table.ipynb).
Receiver Operating Characteristic (ROC) curves
Request a detailed protocolTo check whether the centrality of a gene in the network could predict the phenotypic effect produced by RNAi against that gene, ROC curves were plotted for the four aforementioned centrality measures of each gene in each screen. A ROC analysis is used to measure the correlation between a continuous variable (centrality) and a binary outcome (above or below Z score threshold). Therefore, for each screen, measured genes were rank ordered from high centrality to low centrality, and plotted against the binary outcome of |Zgene| being above or below the appropriate |Z score| threshold (>5 for egg laying and >2 for ovariole number). The Area Under the Curve (AUC) measures the extent of correlation between centrality and effect of a gene on measured phenotype. AUC above or below 0.5 indicates a positive or negative correlation respectively, while an AUC of 0.5 indicates no correlation of the parameters. The scikit-learn python package was used to calculate the AUC of each ROC curve plotted (see 03_ROC_curve_analysis_of_network_metrics.ipynb).
Building degree-controlled randomised networks
Request a detailed protocolWe assessed the modularity of the networks by comparing the network metrics of each sub-network to a degree-controlled randomly sampled network. To generate this degree controlled random network, we applied a previously developed method (Guney et al., 2016). In short, nodes in the PPI are binned by degree with the minimum size of each bin being set at 100 nodes. Bins are constructed iteratively from the lowest degree to the highest degree in the network. To sample a set of nodes, the sub-network degree distribution is computed, using the bin cut-off, from the PPI. Then, nodes are randomly selected from each bin to match this degree distribution (see 05.2_Degree_Controlled_Testing.ipynb for code implementation).
Assessing the utility of the Seed Connector Algorithm in building network modules
Request a detailed protocolNetwork modules were assessed using the previously published Seed Connector Algorithm (SCA) (Wang et al., 2017; Wang and Loscalzo, 2018), implemented here in python (see 04_Seed-Connector.ipynb) and illustrated in Figure 5a. Creating a module using the SCA requires a list of seed genes and a PIN. From each of the three screens, we selected the genes whose |Zgene| value was above the threshold and created three seed lists, respectively (Figure 4c: Egg laying, hpo[RNAi] egg laying and hpo[RNAi] ovariole ‘seed’ list). A fourth list consisting of the intersection of the aforementioned seed lists was also collated and called the core ‘seed’ list (Figure 4b). Genes were assigned in the core list if they passed the Z threshold in all three screens. The SCA was then executed on each of these seed lists using the PIN. Not all genes in the four seed lists were found in the PIN (specifically, CG12147 in the hpo[RNAi] Egg Laying seed list and CG6104 in the hpo[RNAi] Ovariole number seed list were absent from the PIN) and were therefore eliminated from further network analysis. The removal of these two genes accounts for the variation in the number of positive candidates in Table 2 and the number of seed genes in the module. Modules were obtained for each seed list (Figure 5b; Figure 5—figure supplements 1–3) consisting of the seed genes (circles in Figure 5b and Figure 5—figure supplements 1–3) and previously untested genes added by the SCA (squares in Figure 5b and Figure 5—figure supplements 1–3) to increase the LCC size that we refer to as connector genes (see 04_Seed-Connector.ipynb). The results of the algorithm are summarised in the Supplementary file 1.
The modularity of the sub-networks was then assessed using four network metrics namely Largest Connected Component (LCC), number of edges, network density and average shortest path in the LCC. Each metric for each module was assessed using distance of the network metric to a null distribution. Initially, the null distribution was calculated by taking 1000 samples of 463 genes randomly selected from the PIN and calculating the above metrics. We found that the 463 genes selected in the signalling screen were already more connected than the null distribution of sets of 463 genes randomly selected from the PIN (Figure 5—figure supplement 4a). Therefore, to avoid a false positive detection of modularity, the four experimentally obtained sub-networks were compared to null distributions obtained by randomly sampling an equal number of genes from the 463 signalling candidate genes selected for our screen. For each of the four modules, comparison of the metrics was performed on the seed lists and the sub-network after the SCA. Most metrics were enriched in the seed group when compared to the null distribution with the exception of the Average shortest path (Figure 5—figure supplement 4b, light red line). The sub-networks obtained from the SCA further increased all four metrics suggesting the modularity of the four sub-networks (Figure 5—figure supplement 4b, dark red line; see 05_Network_Module_testing.ipynb for code implementation).
Meta network
Request a detailed protocolTo build the meta network, the genes from all four sub-networks were concatenated into one network. This network was then visually sorted in an approach akin to projecting the network onto a Venn Diagram. The meta network was sorted by which of the three screens the gene was positive in. The intersections were genes whose |Zgene| value was above the threshold in more than one and possibly all three of the screening paradigms. For example, if a gene was found in the hpo[RNAi] Ovariole Number and Egg Laying sub-networks it is then assigned to the dual positive group hpo[RNAi] Ovariole Number/Egg Laying (Figure 6a, sub-network VI). After applying this grouping strategy, the connectivity across the groups was studied by calculating the edge density between all groups (). Finally, the proportion of each signalling candidate in each of those groups was calculated by taking the number of members of a signalling pathway divided by the total members of a group (see Ipython notebook 08_MetaModule_Analysis.ipynb). A single gene, sloppy paired 1, was a seed in the Egg Laying sub-network and also a connector in the hpo[RNAi] Egg Laying sub-network; it fell within sub-network VII in the meta network, and is marked as a seed (grey) in Figure 6a.
Quantification and statistical analysis
Number of samples
Request a detailed protocolThe number of samples across the different screens were as follows:
Hpo[RNAi] Egg Laying and Egg Laying screens
Controls: five vials of three females and two males
Sample: one vial of three females and two males
Hpo[RNAi] Ovariole number screen
Controls: 20 flies, two ovaries per fly considered as independent measurements
Sample: 10 flies, two ovaries per fly considered as independent measurements
Correction of batch effect
Request a detailed protocolDespite best efforts to maintain the exact same condition between each experiment, some variation was measured between the batches. Control flies showed variations in both measured phenotypes, ovariole number and egg laying (Figure 1—figure supplement 1). In order to compare the values measured across different batches, each sample was standardised by calculating its Z score (Zgene) to the control distribution. For each batch, the measurements for controls were pooled into a distribution, and the mean and standard deviation was computed. Then each sample was compared to its respective batch and its Z score computed (see ‘Gene selection based on Z score and batch standardisation’ for formula).
Statistical analysis
Request a detailed protocolAll statistical analyses were performed using the scipy stats module (https://www.scipy.org/) and scikit-learn (https://scikit-learn.org/) python package. Statistical tests and p-values are reported in the figure legends. All statistical tests can be found in the Ipython notebooks mentioned below.
Data and code availability
Request a detailed protocolThis study generated a series of python3 Ipython notebook files that perform the entire analysis presented in this study. All the results presented in this paper, including the figures with the exception of the network visualisations, which were created using Cytoscape3 (https://cytoscape.org/) can be reproduced by running the aforementioned python3 code. The raw data, calculations made with these data, and code used for calculations and analyses (Ipython notebooks) are available as supplementary information. For ease of access, legibility and reproducibility, the code and datasets have been deposited in a GitHub repository available at https://github.com/extavourlab/hpo_ova_eggL_screen.
Data availability
This study did not generate new unique reagents. This study generated new python3 code available on GitHub: https://github.com/extavourlab/hpo_ova_eggL_screen (copy archived at https://github.com/elifesciences-publications/hpo_ova_eggL_screen).
References
-
Statistical mechanics of complex networksReviews of Modern Physics 74:47–97.https://doi.org/10.1103/RevModPhys.74.47
-
Network biology: understanding the cell's functional organizationNature Reviews Genetics 5:101–113.https://doi.org/10.1038/nrg1272
-
Genetic dissection of a stem cell niche: the case of the Drosophila ovaryDevelopmental Dynamics 235:2969–2979.https://doi.org/10.1002/dvdy.20967
-
Mutations in eukaryotic release factors 1 and 3 act as general nonsense suppressors in DrosophilaGenetics 165:601–612.
-
ConferenceMining alzheimer disease relevant proteins from integrated protein interactome dataPacific Symposium on Biocomputing. pp. 367–378.https://doi.org/10.1142/9789812701626_0034
-
The bric a brac locus consists of two paralogous genes encoding BTB/POZ domain proteins and acts as a homeotic and morphogenetic regulator of imaginal development in DrosophilaDevelopment 129:2419–2433.
-
Organizing stem cell units in the Drosophila ovaryCurrent Opinion in Genetics & Development 32:31–36.https://doi.org/10.1016/j.gde.2015.01.005
-
Pattern formation in the limbs of Drosophila: bric à brac is expressed in both a gradient and a wave-like pattern and is required for specification and proper segmentation of the tarsusDevelopment 119:799–812.
-
Mechanisms of cell rearrangement and cell recruitment in Drosophila ovary morphogenesis and the requirement of bric à bracDevelopment 121:173–187.
-
ConferenceMining gene-disease relationships from biomedical literature: weighting protein-protein interactions and connectivity measuresPacific Symposium on Biocomputing. pp. 28–39.https://doi.org/10.1142/9789812772435_0004
-
Chapter 4: protein interactions and diseasePLOS Computational Biology 8:e1002819.https://doi.org/10.1371/journal.pcbi.1002819
-
FlyBase at 25: looking to the futureNucleic Acids Research 45:D663–D671.https://doi.org/10.1093/nar/gkw1016
-
Counting in oogenesisCell and Tissue Research 344:207–212.https://doi.org/10.1007/s00441-011-1150-5
-
Network-based in silico drug efficacy screeningNature Communications 7:10331.https://doi.org/10.1038/ncomms10331
-
Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networksMolecular Biology and Evolution 22:803–806.https://doi.org/10.1093/molbev/msi072
-
The evolutionary history of YAP and the hippo/YAP pathwayMolecular Biology and Evolution 28:2403–2417.https://doi.org/10.1093/molbev/msr065
-
The ecdysone receptor and ultraspiracle regulate the timing and progression of ovarian morphogenesis during Drosophila metamorphosisDevelopment Genes and Evolution 208:304–317.https://doi.org/10.1007/s004270050186
-
Drosophila NAP-1 is a core histone chaperone that functions in ATP-facilitated assembly of regularly spaced nucleosomal arraysMolecular and Cellular Biology 16:3112–3124.https://doi.org/10.1128/MCB.16.6.3112
-
An intracellular activation of Smoothened that is independent of Hedgehog stimulation in DrosophilaJournal of Cell Science 131:jcs211367.https://doi.org/10.1242/jcs.211367
-
KEGG for representation and analysis of molecular networks involving diseases and drugsNucleic Acids Research 38:D355–D360.https://doi.org/10.1093/nar/gkp896
-
Mechanisms of elongation in embryogenesisDevelopment 133:2291–2302.https://doi.org/10.1242/dev.02406
-
The development of the female Drosophila reproductive systemJournal of Morphology 124:143–165.https://doi.org/10.1002/jmor.1051240203
-
Reproductive and post-reproductive life history of wild-caught Drosophila melanogaster under laboratory conditionsJournal of Evolutionary Biology 26:1508–1520.https://doi.org/10.1111/jeb.12155
-
Centrality analysis methods for biological networks and their application to gene regulatory networksGene Regulation and Systems Biology 2:193–201.https://doi.org/10.4137/GRSB.S702
-
Hedgehog signaling establishes precursors for germline stem cell niches by regulating cell adhesionJournal of Cell Biology 216:1439–1453.https://doi.org/10.1083/jcb.201610063
-
Misshapen decreases integrin levels to promote epithelial motility and planar polarity in DrosophilaThe Journal of Cell Biology 200:721–729.https://doi.org/10.1083/jcb.201209129
-
Polo encodes a protein kinase homolog required for mitosis in DrosophilaGenes & Development 5:2153–2165.https://doi.org/10.1101/gad.5.12a.2153
-
Introduction to network analysis in systems biologyScience Signaling 4:tr5.https://doi.org/10.1126/scisignal.2001965
-
Egg size, embryonic development time and ovoviviparity in Drosophila speciesJournal of Evolutionary Biology 22:430–434.https://doi.org/10.1111/j.1420-9101.2008.01649.x
-
Logical modelling of Drosophila signalling pathwaysMolecular BioSystems 9:2248–2258.https://doi.org/10.1039/c3mb70187e
-
Rho GTPases and actin dynamics in membrane protrusions and vesicle traffickingTrends in Cell Biology 16:522–529.https://doi.org/10.1016/j.tcb.2006.08.006
-
Terminal filament cell organization in the larval ovary of Drosophila melanogaster: ultrastructural observations and pattern of divisionsRoux's Archives of Developmental Biology 205:356–363.https://doi.org/10.1007/BF00377215
-
The roles of cell size and cell number in determining ovariole number in DrosophilaDevelopmental Biology 363:279–289.https://doi.org/10.1016/j.ydbio.2011.12.017
-
Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridizationMolecular Biology of the Cell 9:3273–3297.https://doi.org/10.1091/mbc.9.12.3273
-
Current progress in network research: toward reference networks for key model organismsBriefings in Bioinformatics 8:318–332.https://doi.org/10.1093/bib/bbm038
-
Network-Based disease module discovery by a novel seed connector algorithm with pathobiological implicationsJournal of Molecular Biology 430:2939–2950.https://doi.org/10.1016/j.jmb.2018.05.016
Article and author information
Author details
Funding
National Institutes of Health (1R01-HD073499)
- Tarun Kumar
- Cassandra G Extavour
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2020, Kumar et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 2,408
- views
-
- 251
- downloads
-
- 13
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Computational and Systems Biology
- Microbiology and Infectious Disease
Timely and effective use of antimicrobial drugs can improve patient outcomes, as well as help safeguard against resistance development. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is currently routinely used in clinical diagnostics for rapid species identification. Mining additional data from said spectra in the form of antimicrobial resistance (AMR) profiles is, therefore, highly promising. Such AMR profiles could serve as a drop-in solution for drastically improving treatment efficiency, effectiveness, and costs. This study endeavors to develop the first machine learning models capable of predicting AMR profiles for the whole repertoire of species and drugs encountered in clinical microbiology. The resulting models can be interpreted as drug recommender systems for infectious diseases. We find that our dual-branch method delivers considerably higher performance compared to previous approaches. In addition, experiments show that the models can be efficiently fine-tuned to data from other clinical laboratories. MALDI-TOF-based AMR recommender systems can, hence, greatly extend the value of MALDI-TOF MS for clinical diagnostics. All code supporting this study is distributed on PyPI and is packaged at https://github.com/gdewael/maldi-nn.
-
- Computational and Systems Biology
- Genetics and Genomics
Enhancers and promoters are classically considered to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as the datasets of ChIP-seq assays of TFs have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs with often no detectable correlation between ChIP-seq peaks and DNA-binding motif presence. Here, we used a set of 1003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by the sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on the affinities to bind to promoters and enhancers we detected five distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation thus challenging the classical model of enhancer activity and propose a model of HOT locus formation based on the existence of large transcriptional condensates.