Studies of the genetic basis and evolution of complex social behavior emphasize either conserved or novel genes. To begin to reconcile these perspectives, we studied how the evolutionary conservation of genes associated with social behavior depends on regulatory context, and whether genes associated with social behavior exist in distinct regulatory and evolutionary contexts. We identified modules of co-expressed genes associated with age-based division of labor between nurses and foragers in the ant Monomorium pharaonis, and we studied the relationship between molecular evolution, connectivity, and expression. Highly connected and expressed genes were more evolutionarily conserved, as expected. However, compared to the rest of the genome, forager-upregulated genes were much more highly connected and conserved, while nurse-upregulated genes were less connected and more evolutionarily labile. Our results indicate that the genetic architecture of social behavior includes both highly connected and conserved components as well as loosely connected and evolutionarily labile components.https://doi.org/10.7554/eLife.04775.001
Animal species vary widely in their degree of social behavior. Some species live solitarily, and others, such as ants and humans, form large societies. Many researchers have tried to understand the genetic changes underlying the evolution of social behavior. Some researchers suggest that it involves recycling existing genes that also have other conserved functions. Others propose that the evolution of social behavior involves completely new genes that are not found in related but solitary species.
Ants are one of the best-studied social animals. An established colony can contain many 1000s of individuals that live and work together and perform different roles. The queen's job is to lay eggs, while the worker ants do everything else, including collecting food, caring for the young, and protecting the colony. In some species of ant—including the pharaoh ant—a worker's role changes as it ages. Younger workers tend to stay in the nest and nurse the brood, while older workers tend to leave the nest and forage for food.
Mikheyev and Linksvayer asked: which genes are responsible for this age-based division of labor? And how did this aspect of social behavior evolve? First, after observing pharaoh ants from two colonies set up in the laboratory, they confirmed that workers nursing the brood were on average almost a week younger than those seen collecting food. Next Mikheyev and Linksvayer identified which genes were expressed in ants of different ages, or ants engaged in different tasks. Specific sets of genes were expressed more (or ‘up-regulated’) in nurse workers, while others were up-regulated in foraging workers.
Mikheyev and Linksvayer then investigated how rapidly these genes had evolved by comparing them to related genes found in other social insects (fire ants and honey bees). They also determined the ‘connectivity’ of these genes by asking how many other genes showed similar expression patterns. In many organisms, how rapidly a gene evolves depends on how tightly connected its expression is to the expression of other genes; highly connected genes evolve more slowly.
The genes that were expressed more in the older foraging workers were both more highly connected and more evolutionarily conserved in the other social insects. Genes that were up-regulated in the younger nurse workers were more loosely connected and rapidly evolving.
Mikheyev and Linksvayer's findings show that the evolution of social behavior in animals involves both new genes, which tend to be loosely connected, and conserved genes, which tend to be more highly connected.https://doi.org/10.7554/eLife.04775.002
The main conclusion of a decade of sociogenomic research with a range of solitary and social animal species is that highly conserved genes underpinning core physiological processes can also influence behavioral state (Amdam et al., 2004, 2006; Toth and Robinson, 2007; Toth et al., 2007, 2010; Woodard et al., 2011; Woodard et al., 2014). For example, the insulin signaling pathway, which mediates an organism's response to its internal nutritional state, also influences its behavior (Ament et al., 2008). The genetic toolkit hypothesis and related hypotheses propose that a conserved set of genes and gene pathways involved in core physiological processes such as metabolism and reproduction has been repeatedly used in the evolution of complex social behavior in diverse lineages (West-Eberhard, 1996; Amdam et al., 2004, 2006; Toth and Robinson, 2007; Toth et al., 2007). This hypothesis stems from findings in Evolutionary Developmental Biology that morphological innovation in disparate lineages often involves the convergent use of a conserved set of genes (e.g., Hox genes) (Carroll et al., 2001; Toth and Robinson, 2007; Wilkins, 2013).
However, social behavior and other social traits are commonly viewed as having unique genetic features and evolutionary dynamics, including especially rapid evolution (West-Eberhard, 1983; Tanaka, 1996; Moore et al., 1997; Wolf et al., 1999; Nonacs, 2011; Bailey and Moore, 2012; Van Dyken and Wade, 2012). Could the molecular mechanisms underlying social interactions (e.g., social signal production and response) and social behavior, together with the process of social evolution result in distinct genetic architectures for social traits compared with other traits? Recent comparative transcriptomic and genomic studies find low overlap in genes associated with social behavior in different highly social animals and instead highlight the importance of novel genes and rapid evolution of social traits (Johnson and Tsutsui, 2011; Ferreira et al., 2013; Simola et al., 2013; Wissler et al., 2013; Feldmeyer et al., 2014; Harpur et al., 2014; Sumner, 2014; Jasper et al., 2015), in accordance with recent studies emphasizing the ubiquity of taxonomically restricted genes (Domazet-Loso and Tautz, 2003; Khalturin et al., 2009; Tautz and Domazet-Loso, 2011). Perhaps social evolution does not consistently use sets of highly conserved genes to the same degree as morphological evolution? The novel social genes hypothesis proposes that genes underlying social behavior are often novel socially acting genes or are genes with novel social functions not present in solitary ancestors (Johnson and Linksvayer, 2010; Johnson and Tsutsui, 2011; Sumner, 2014).
Research supporting the genetic toolkit hypothesis has stressed the significant signal of highly conserved genes affecting core physiological processes in transcriptomic data sets for social behavior (Robinson et al., 2008; Toth et al., 2010; Fischman et al., 2011; Woodard et al., 2011, 2014; Toth et al., 2014). In contrast, research supporting the novel social genes hypothesis has stressed the overall low proportional overlap of genes underlying social behavior in divergent lineages as well as the apparently general low degree of transcriptomic and genomic conservation in divergent lineages (Johnson and Tsutsui, 2011; Ferreira et al., 2013; Simola et al., 2013; Wissler et al., 2013; Feldmeyer et al., 2014; Harpur et al., 2014; Jasper et al., 2015; Sumner, 2014).
We sought to build on these previous results by examining how transcriptional regulatory context influences evolutionary conservation for genes associated with ant social behavior, and further whether genes associated with ant social behavior exist in distinct regulatory and selective contexts compared to the rest of the genome. Research in a range of model organisms demonstrates that the degree of a gene's connectivity to the rest of the regulatory network and its level of expression is often negatively correlated with its rate of molecular evolution (Krylov et al., 2003; Hahn and Kern, 2005; Jovelin and Phillips, 2009; Ramsay et al., 2009). For example, highly connected ‘hub’ genes are often highly expressed and evolutionarily conserved. Previous research has compared rates of molecular evolution for genes associated with reproductive division of labor in social insects (Hunt et al., 2010, 2013; Harpur et al., 2014), as well as other conditionally expressed phenotypes in other organisms (Brisson and Nuzhdin, 2008; Leichty et al., 2012; Purandare et al., 2014), indicating that genes associated with the expression of worker traits experience elevated rates of molecular evolution. However, the relationships among molecular evolution, connectivity, and expression have been little explored in social insects and are generally little understood for genes associated with social behavior. As a result, it is unclear if observed differences in rates of molecular evolution are caused by differences in regulatory architecture, expression, or perhaps result from distinct evolutionary mechanisms such as kin selection, which may operate differentially on genes associated with social behavior relative to the rest of the genome (Linksvayer and Wade, 2009; Hall and Goodisman, 2012). We further sought to identify modules of co-expressed genes that may be composed of both conserved and novel genes and may contribute to the expression and evolution of social complexity.
We studied the genetic basis and evolution of a fundamental aspect of social insect behavior, age-based division of labor (age polyethism). Age polyethism involves the progression of workers from in-nest tasks such as nursing to outside-nest tasks such as foraging. Because age polyethism is a trait expressed by the functionally sterile worker caste, it is expected to be shaped primarily through kin selection (Hamilton, 1964). While age polyethism plays a central role in the functioning of many eusocial systems (Hölldobler and Wilson, 2009), the molecular underpinnings have only been well studied in the honey bee Apis mellifera (Whitfield et al., 2006; Ament et al., 2008; Chandrasekaran et al., 2011), so that the genetic and evolutionary basis of age polyethism is not generally understood outside of honey bees. We identified transcriptional modules of co-regulated genes associated with worker age polyethism in the pharaoh ant Monomorium pharaonis; we identified the degree that these genes overlap with genes involved in age polyethism in two other social insects (Alaux et al., 2009; Manfredini et al., 2014); and we studied the relationship between expression level, connectivity and rates of molecular evolution at these genes compared to the rest of the genome.
We tracked cohorts of age-marked workers and recorded their behavior and location inside and outside the nest. In order to identify differentially expressed genes associated with age-based division of labor, we collected age-marked workers and workers observed performing specific behaviors. The observed location of workers from different age classes changed with both nest location and behavior (glm with quasipoisson errors and log link, both p < 0.01) (Figure 1, Figure 1—figure supplements 1, 2). In concordance with the expected pattern of age polyethism, the average age of workers observed in the different locations increased as distance from the brood area increased (Figure 1—figure supplement 2). Of the 15 behaviors observed more than 15 total times (Supplementary file 1), the likelihood of observing workers performing the behaviors ‘nurse’, ‘groom’, ‘rest’, ‘trophallaxis’, ‘walk’, and ‘forage’ depended on worker age (Figure 1A; glm with binomial errors and logit link, all nominal p < 0.0002, α = 0.003, controlling for multiple testing). Nursing and foraging were at the two extremes: the average age of workers observed nursing was 6.94 days and the average age of workers observed foraging (i.e., in the act of collecting food) was 13.04 days. There appeared to be a marked transition from nursing to foraging between 9 and 12 days of age (Figure 1A), with 75% of nursing observations made for workers less than 10 days old and 75% of foraging observations made for workers over 10 days old (Figure 1—figure supplement 1).
There was a trade-off in the assemblies between N50 and overall assembly lengths, as a function of kmer size. We chose k = 69 as a compromise between these two metrics, resulting in a scaffolded assembly of 284 mb, with a N50 of 19.0 kb. Although there is no M. pharaonis genome size estimate, the assembly is in the range of genome sizes typical of other myrmicine ants (Tsutsui et al., 2008). CEGMA analysis (Parra et al., 2009) found complete sequences for 92% of the ultra-conserved eukaryotic genes, and partial sequences for 97%. Most reads (97.6%) could be re-mapped to the genome assembly, resulting in a coverage estimate of 40×. Cufflinks assembly identified 22,385 transcribed loci. 74.9 ± 18% (median 85.1%) of the reads for each sample could be re-mapped to predicted transcripts extracted from the reference. After the reads were re-mapped to the assembled transcripts using the RSEM pipeline, each library had 10,602,832 ± 2,925,898 expected counts.
The complete analysis of gene expression data, including R code and output, is available in the Supplementary file 2 (with the complete R markdown script as Source code 1), and it is summarized below. We wished to examine which of the four worker behavioral samples (nursing larvae, foraging, grooming larvae, and worker–worker trophallaxis [i.e., exchanging liquid food]) had distinct expression profiles vs all of the others. We used linear contrasts to determine the number of differentially expressed genes between the focal behavioral category and the other behaviors. Of these contrasts, only foragers and nurses had significantly different gene expression patterns, when compared to the rest, that is, there was no evidence that workers engaged in grooming and trophallaxis had distinct transcriptional states. Consequently, we focused subsequent analysis on nurse and forager behavioral categories, except in the construction of the co-expression networks, where all behavioral category and age class samples were used (see below). There were 1217 forager-upregulated, 1247 nurse-upregulated transcripts, and 14,907 transcripts that were not differentially expressed.
Qualitatively, gene expression patterns mirrored the behavioral transition from nursing to foraging that we observed around day 10 (Figure 1A,B). To quantify these observations, we used a supervised learning approach (K-nearest neighbor classifier or KNN) to check whether genes differentially expressed in nurses and foragers could be used to differentiate the age class data into two clusters. After the KNN was trained on nurse and forager profiles, it clearly separated workers into two distinct classes based on age, assigning those younger than 12 days into the nurse class, and the rest into the forager class (Supplementary file 2 pages 14–15), suggesting a fairly discrete transcriptomic transition between the two behaviors.
The proportion of genes with identified orthologs in the fire ant Solenopsis invicta differed between behavioral categories (Manfredini et al., 2014), with forager-upregulated genes having a higher proportion (0.54) relative to nurse-upregulated (0.43) and non-differentially expressed (0.43) (multiple comparison Kruskal–Wallis, p < 0.05). Similarly, the proportion of genes with identified honey bee A. mellifera orthologs was higher for forager-upregulated genes (0.50), relative to nurse-upregulated (0.38), and non-differentially expressed genes (0.38) (multiple comparison Kruskal–Wallis, p < 0.05) (note we used a less conservative BLAST threshold for the honey bee so that the proportions of honey bee and fire ant orthologs are not directly comparable, see ‘Materials and methods’). Furthermore, approximately half of non-differentially expressed (0.51) and nurse-upregulated (0.50) genes did not have orthologs identified in either the fire ant or honey bee genomes, but this proportion was lower for forager-upregulated genes (0.39); correspondingly, the proportion of forager-upregulated genes with orthologs identified from both fire ants and honey bees was higher (0.43) compared to nurse-upregulated and non-differentially expressed genes (0.32) (X2 = 71.42, df = 6, p < 10−13).
Genes previously detected as upregulated in nurses and foragers of S. invicta were more likely to have identified M. pharaonis orthologs up-regulated in these contexts as well (p = 0.0022 and p = 0.040, respectively). However, the actual percentage of genes differentially expressed in the same context in these two ant data sets was small: 3.8% (47/1247) of nurse genes and 3.2% (39/1217) of forager genes; or if only considering genes with orthologs identified in both species, 8.6% (47/549) nurse genes and 5.9% (39/657) forager genes. While there was low overlap in the lists of differentially expressed genes, there could still be stronger overlap in genome-wide expression profiles when comparing nurse and forager samples between S. invicta and M. pharaonis. Thus, we estimated the correlation in the change of expression between nurse and forager samples (i.e., log fold change) between the S. invicta and M. pharaonis datasets for all genes with identifiable homologs. There was a significant correlation in the change of expression for nurse and forager samples, but one that explained only 2% of the variance (Spearman's rho = 0.14, 6324 genes, p < 10−16).
In contrast to the fire ant and pharaoh ant comparison, previously identified forager- and nurse-upregulated honey bee A. mellifera genes (Alaux et al., 2009) were not more likely to have M. pharaonis orthologs expressed in the same context (p = 0.99, p = 0.98, respectively), consistent with a previous comparison between S. invicta and A. mellifera (Manfredini et al., 2014). The actual overlap in honey bee and pharaoh ant gene lists was higher (71 nurse-upregulated genes and 46 forager-upregulated genes) due to the less conservative BLAST threshold we used for identifying honey bee orthologs, but the honey bee lists were also larger (Alaux et al., 2009) and the overlap was not significant.
Nurse-upregulated genes were strongly enriched for a range of GO terms associated with metabolism (nearly 50 metabolism-related terms with p < 10−5; Supplementary file 3). Forager-upregulated genes had a more diffuse signal, being relatively more weakly enriched for various GO terms, for example, associated with signal transduction and gland morphogenesis. Forager-upregulated genes showed a more consistent signal for underrepresented terms, for example, GO terms associated with metabolic processes and chromatin modification (Supplementary file 3).
The number of modules produced by WGCNA can vary based on several thresholding parameters, which we left as defaults (Supplementary file 2, pages 20–21). These settings resulted in 14 co-expression modules, ranging in size from 83 to 4218 genes (Figure 1C; Figure 1—figure supplement 3). A module's overall expression can be characterized by its eigengene. Correlations between eigengenes and traits in the original data suggest the involvement of corresponding modules in these traits. Eigengenes in two of the modules—1 and 14, which contained the most nurse and forager genes, respectively—were strongly correlated with worker age, although in opposite directions, suggesting their role in aging and age-based division of labor (r = −0.95, r = 0.91 and with FDR-adjusted p-values 0.0038, 0.023, respectively) (Supplementary file 2, page 24). Other modules showed complex patterns of age and behavior specific expression, with most of them showing a peak in expression once or twice during the lifetime of a worker (Supplementary file 2, page 26). Interestingly, most module eigengenes switched signs during the period between 9 and 12 days, corresponding to the behavioral transition from nursing to foraging. In other words, there appeared to be a major reprogramming step, where modules initially showing low expression became up-regulated, while modules initially showing high expression were down-regulated.
Forager-upregulated genes were concentrated in just a few modules, with only two modules containing more than 100 forager-upregulated genes (Figure 1—figure supplement 3). In contrast, nurse-upregulated genes were more widely distributed, with five modules having more than 100 nurse-upregulated genes (Figure 1—figure supplement 3). These five modules were mainly enriched for GO terms associated with metabolism and development (Figure 1—figure supplement 3; Supplementary file 4). Module 5, which contained 116 nurse-upregulated genes, was also enriched for terms associated with female gonad development, which is surprising given that M. pharaonis workers lack ovaries and are completely sterile. The modules containing forager-upregulated genes were enriched for a broad range of GO terms, for example associated with regulation of signaling, development and neurogenesis, and gene expression (Figure 1—figure supplement 3; Supplementary file 4). The proportion of module genes with identified S. invicta orthologs ranged from 0.28 to 0.53 (Figure 1—figure supplement 3), suggesting that in addition to being involved in different functions, the modules are composed of different proportions of conserved and taxonomically restricted genes.
Forager-upregulated genes were much more connected than nurse or non-differentially expressed genes, while nurse-upregulated genes were less connected than non-differentially expressed genes (Figure 2A) (multiple comparison Kruskal–Wallis, p < 0.05). There was a small but significant difference in evolutionary rate dN/dS (Figure 2C), with nurse-upregulated genes evolving more rapidly than non-differentially expressed genes (multiple comparison Kruskal–Wallis, p < 0.05). Nurse and forager genes were also more highly expressed (Figure 2B) than non-differentially expressed genes (Kruskal–Wallis, p < 0.05), although this last comparison is likely biased because differential expression is more easily detected in highly expressed genes.
Co-expression network connectivity and expression level were overall negatively associated with evolutionary rate, such that highly connected and highly expressed genes had decreased rates of molecular evolution (Figure 2D,E; evolutionary rate and connectivity, r = −0.15, p < 2 × 10−16; evolutionary rate and expression, measured in terms of transcriptional abundance, fragments per million reads mapped, FPKM, r = −0.12, p < 2 × 10−16); and connectivity and expression were positively correlated (r = 0.30, p < 2 × 10−16). In a full model considering how a gene's rate of molecular evolution depended on its gene expression level, network connectedness, and behavioral category, the largest effects were main effects of expression (z = −7.42, p = 1.29 × 10−13) and connectivity (z = −3.69, p = 0.00023).
We also studied the effects of gene category (i.e., upregulated in nurses or foragers, or not differentially expressed), expression level, and connectivity on whether a given M. pharaonis gene had an identifiable fire ant S. invicta and honey bee A. mellifera orthologs. Overall, genes with orthologs in the fire ant or honey bee had greater connectivity and expression (Figure 3, Figure 3—figure supplement 1). In considering a model with both main and interaction effects of behavioral category, expression level, and connectivity, connectivity had the strongest effect (glm with quasibinomial residuals: t = 24.5, p < 10−16, for the presence of S. invicta orthologs; t = 32.2, p < 10−16, for the presence of A. mellifera orthologs), with more highly connected genes being more likely to have an ortholog. There were also much smaller interaction effects indicating that nurse-upregulated genes had fewer orthologs than expected given their connectivities (i.e., connectivity had a weaker effect on nurse-upregulated genes than other genes, Figure 3 and Figure 3—figure supplement 1; t = −3.17, p = 0.0015 for S. invicta orthologs; t = −2.76, p = 0.0057 for A. mellifera orthologs), and forager-upregulated genes had fewer orthologs than expected given their expression (t = −2.33, p = 0.02 for S. invicta orthologs; t = −2.58, p = 0.0098 for A. mellifera orthologs; Figure 3 and Figure 3—figure supplement 1).
Pharaoh ant workers showed a clearly defined age-based transition from nursing to foraging, in terms of both behavioral and transcriptional patterns, with nurses and foragers having strongly differentiated sets of upregulated genes (Figure 1). We recovered the commonly observed genome-wide relationship between a gene's rate of molecular evolution, its expression level, and its network connectivity (Krylov et al., 2003; Hahn and Kern, 2005; Jovelin and Phillips, 2009; Ramsay et al., 2009). Specifically, the rate of molecular evolution (dN/dS) as well as the likelihood a gene had identified fire ant and honey bee orthologs was negatively correlated with its expression level and connectivity within co-expression networks, while expression and connectivity were positively correlated (Figures 2, 3). In addition to these genome-wide patterns, nurse- and forager-upregulated genes had distinct regulatory and evolutionary patterns relative to each other and to the rest of the transcriptome (Figures 2, 3). Most strikingly, forager-upregulated genes were much more highly connected and correspondingly more conserved, while nurse-upregulated genes were less connected, and more rapidly evolving and less conserved.
Previous studies of the evolutionary genetic basis of social behavior have focused on the overlap of genes lists associated with social traits in different lineages. We found significant but seemingly low (<4%) overlap in lists of differentially expressed genes and the correlation in genome-wide expression profiles (r = 0.14) when comparing gene expression in nurse and forager samples between the pharaoh ant and fire ant, S. invicta. Such low overlap seems surprising, given that these two ants are in closely related ant genera, having diverged on the order of 50 Mya (Ward et al., 2014). However, the comparison is not perfect, given substantial differences between the two studies in methodology used to characterize the behaviors, and in the technology used to measure gene expression (i.e., microarray vs RNA sequencing) (Manfredini et al., 2014). We did not find significant overlap between lists of honey bee and pharaoh ant genes associated with age polyethism, consistent with results reported by the earlier fire ant study (Manfredini et al., 2014). While we expected decreased overlap given that honey bees and ants diverged longer ago, ∼170 Mya (Ronquist et al., 2012), and represent independent origins of eusociality, the ant-honey bee comparison is also more problematic because the honey bee data are based on brain gene expression profiles whereas the fire ant and pharaoh ant data are based on whole body gene expression profiles.
Past studies have often interpreted significant but similarly low overlap in lists of genes associated with social behavior from different lineages as supporting the genetic toolkit hypothesis (Toth et al., 2010, 2014; Woodard et al., 2014). In contrast, other authors have recently interpreted low overlap as being consistent with the novel social genes hypothesis, which emphasizes the importance of taxonomically restricted genes (Ferreira et al., 2013; Feldmeyer et al., 2014; Sumner, 2014). The contrasting emphasis of authors on either conserved or novel genes begs the question: what degree of conservation in gene lists is necessary for confirmation or rejection of these two hypotheses? For example, the fact that nurse-upregulated genes in M. pharaonis are more rapidly evolving than the rest of the genome and that 50% of nurse-upregulated genes do not have identifiable fire ant or honey bee orthologs suggests that novel genes may have important nurse-specific functions. At the same time, the significant overlap of fire ant and pharaoh ant gene lists and the strong enrichment of nurse-upregulated genes for gene ontology terms associated with metabolism and development suggests that conserved genes involved in core physiological processes also play important roles in nurse function and the evolution of division of labor. Thus, our results are generally consistent with both hypotheses. We suggest that neither of these two hypotheses has yet been formulated in a way that is readily tested, in part because it is unclear what precise genes are expected to be included or excluded from a genetic toolkit (Wilkins, 2013). Furthermore, these hypotheses are not mutually exclusive, since both conserved and novel genes likely play roles in the evolution of all new traits (Johnson and Linksvayer, 2010; Woodard et al., 2011).
We suggest that shifting the focus, from lists of genes to modules of co-expressed genes in the context of genome-wide transcriptional and evolutionary patterns, can help to elucidate how social evolution has produced social complexity. In this way, one question we can ask is whether we see any simple molecular signature of social evolution, for example due to kin selection? As Monomorium ant workers are obligately sterile, all worker traits are expected to be shaped exclusively by indirect selection (i.e., kin selection) (Hamilton, 1964). All-else-equal, such indirect selection is weaker than direct selection, proportional to relatedness (Hamilton, 1964), and a priori is expected to produce relaxed selective constraint and elevated rates of molecular evolution for all genes associated with worker traits (Linksvayer and Wade, 2009). Past studies have found different rates of molecular evolution for worker-biased and queen-biased genes, with most studies finding that worker-biased genes are more rapidly evolving (Ferreira et al., 2013; Feldmeyer et al., 2014; Harpur et al., 2014; but see; Hunt et al., 2010). Some researchers have interpreted different patterns between lineages as being consistent with simple kin selection predictions based on differences in within-colony relatedness (Hall and Goodisman, 2012), but most studies have emphasized the association between conditional expression and relaxed selection (Hunt et al., 2011, 2013), as well as genes associated with worker traits simply experiencing stronger positive selection (Hunt et al., 2010; Ferreira et al., 2013; Feldmeyer et al., 2014; Harpur et al., 2014). We observed weakly elevated rates of molecular evolution at nurse-upregulated genes compared to the rest of the genome, but much more notable was the distinct connectivity and corresponding differences in gene conservation for forager-upregulated genes relative to nurse-upregulated and non-differentially expressed genes. These results suggest that social evolution does not just have simple genome-wide effects such as relaxed effective selection associated with kin selection, but instead shapes complex social traits while acting within general systems-level constraints imposed by regulatory architecture.
The common perception that social evolution often involves rapid evolutionary dynamics (West-Eberhard, 1983; Tanaka, 1996; Moore et al., 1997; Wolf et al., 1999; Nonacs, 2011; Bailey and Moore, 2012; Van Dyken and Wade, 2012) may result from the fact that genes influencing many key social traits are not only conditionally expressed, but are also located peripherally within regulatory networks, and so are relatively unconstrained. For example, we expect that traits associated with social signal production (e.g., pheromone and glandular secretions) are often located peripherally within regulatory networks and as a result may be evolutionarily labile (Jasper et al., 2015), as is the case more generally with secreted proteins (Julenius and Pedersen, 2006; Liao et al., 2010; Nogueira et al., 2012). More core and conserved components are also certain to be important to the expression of these traits, but their contribution to trait evolution may be minimized by virtue of the fact that they are highly connected. These arguments suggest how both conserved, toolkit genes, as well as rapidly evolving and taxonomically restricted novel genes, likely play important roles in the evolution of social novelty, with novel genes being added peripherally to regulatory networks. Our results are consistent with this interpretation, because M. pharaonis age-based division of labor seems to have a complex genetic basis with some components that are highly connected and conserved, and other components that are more loosely connected and evolutionarily labile.
Our findings that nurse-upregulated genes are more rapidly evolving and less conserved among social insect lineages relative to forager-upregulated genes suggest that nurse traits have been a major focus of evolutionary innovation between social insect lineages. This result seems surprising given that foragers of different lineages experience diverse environments outside the nest compared to the relatively constant within-nest environment experienced by nurses and could be expected to experience more diverse selective pressures. One explanation is that the physiological mechanisms associated with metabolically costly foraging activities and older adult life (M. pharaonis workers usually only live several weeks [Peacock and Baxter, 1950], so that foragers which start right before their second week of age may already be senescing) may be relatively conserved and simple. Nursing behavior, occurring during very early adult life, may involve more diverse physiological and developmental processes, and nursing itself may also involve more diverse behaviors and physiological processes, such as food processing and the synthesis of glandular secretions that are fed to larvae. Perhaps the relatively more complex genetic architecture (less tightly connected, involving more modules, and diverse processes) has served as less of a constraint and facilitated more evolutionary change for nurse-related genes. If so, we predict that nurse-specific functions and functions for early adult life may be generally more evolutionarily labile as well as more physiologically and behaviorally labile within and across lineages than forager-specific functions. Note that this prediction is opposite the typical expectation that genes acting early in development have more pleiotropic effects and are thus especially constrained (Roux and Robinson-Rechavi, 2008; Piasecka et al., 2013), but obligate sterility may, in part, release workers from these constraints on the evolution of genes acting early in worker development.
We identified two discrete sets of genes with distinct genetic architecture associated with age-based division of labor. The majority of forager-upregulated genes were contained within a single gene module (module 14; Figure 1—figure supplement 3) that was significantly positively associated with age. Another module with expression negatively associated with age contained the largest number of nurse genes, but nurse genes were also broadly spread out across a number of other modules with complex expression patterns across age and behavioral groups. Interestingly, the modules differed in the proportion of constituent genes which had identifiable S. invicta and A. mellifera orthologs, indicating that modules vary in the degree to which they are composed of conserved genes and gene networks vs rapidly evolving genes with unknown function. That said, the modules were enriched for various gene ontology terms, providing some insight into their putative functional importance (Supplementary file 4).
By explicitly studying regulatory architecture and inferring modules of tightly connected genes in other species as well as M. pharaonis, it will be possible to further identify what network components contribute to the expression of social traits, how rapidly these components are evolving within populations, and how they have contributed to phenotypic differences between divergent lineages. Building on the genetic toolkit conceptual framework, it will be possible to ask to what degree diverse lineages repeatedly use the same modules, and importantly approaches already exist for quantifying module overlap in the absence of functional information (Oldham et al., 2006; Langfelder et al., 2011). Similarly, after finding non-significant overlap in lists of genes associated with queen- and worker-caste development in paper wasps and honey bees, Berens et al. (2014) recently invoked a ‘looser’ version of the genetic toolkit hypothesis by examining the overlap of inferred functional enrichment of gene lists (i.e., via gene ontology analysis). Focus on co-expressed modules may actually improve the feasibility of inferring the function of co-expressed genes based on observed expression patterns together with standard functional information inferred from the subset of conserved annotated genes with identifiable orthologs from model systems. It will also be possible to determine the relative contribution of conserved vs taxonomically restricted genes to co-expression modules.
Two replicate M. pharaonis observation colonies were established, each with 10 queens, approximately 4000 workers, and 1000 brood, representing a random subsample of a larger source colony. Each colony was established from a separate source colony, which came from a stock of approximately 40 colonies that have been repeatedly mixed across generations so that they are genetically similar. Observation nests were constructed of two pieces of 5 × 15 cm glass separated by 1.5 mm thick plastic sheeting. Colonies were given water in cotton-plugged test tubes, 50% honey solution, beef liver, egg yolk, and mealworms ad libitum, replaced twice a week. Colonies were maintained at 27 C and 65% relative humidity in climate controlled rooms at the University of Pennsylvania.
Every 3 days, 600 newly eclosed callow workers, which were inferred to be approximately 0–1 days old, were collected from 8–10 stock colonies. These callow workers were briefly anesthetized with CO2 and individually paint marked on the gaster with a unique age cohort color dot using a Sharpie extra fine oil based paint pen, and then 300 were added to each of the observation colonies. Five uniquely marked age cohorts were thus added to the colonies on days 1, 4, 7, 10, and 13 of the study. Nestmate recognition is at most weak and transient in M. pharaonis (Schmidt et al., 2010), and callows in particular are readily accepted. We also set up a camera to automatically take images of the nest areas of each colony once every 20 min for the entire period of the study, although we do not further discuss these images.
Previous literature indicates that M. pharaonis workers are expected to live 9–10 weeks (Peacock and Baxter, 1950), but our preliminary trials with our setup indicated that workers tend to die or lose their paint marks after several weeks. We ran the study for 1 month, expecting to capture the major age-based transitions in worker behavior (e.g., the nursing to foraging transition observed in other species), but it is possible that we missed late behavioral transitions that occurred towards the end of workers' lives. In practice, such late transitions are difficult to detect as sample size necessarily declines as increasing numbers of workers die.
A behavioral scan of each colony was completed once each day for the duration of the month-long study by recording the instantaneous behavior and location observed for every visible paint-marked worker. Each behavioral scan was performed at 20× magnification with a Nikon SMZ800 stereomicroscope. We recorded 30 distinct behaviors, but only 15 were observed more than 15 total times during the study period (Supplementary file 1). We defined an individual as foraging if it was observed on a food or water source or actually carrying food (i.e., foraging included the behaviors ‘on honey’, ‘on liver’, ‘on water’, or ‘carrying food’; Supplementary file 1). Each experimental colony contained four identifiable locations that were redefined prior to each behavioral scan: brood area, brood periphery, remaining nest area, and foraging area. The brood area was defined as the central area within the nest containing all brood and queens (Edwards, 1991). The nest periphery was defined as the region directly adjacent to the brood area, where workers were dense in aggregation but not in contact with any of the brood. The nest area was defined as the sparsely occupied remainder of the space within the nest, not including the brood area and nest periphery. The foraging area included all areas outside of the nest. Analyses of behavioral data were conducted in R (www.r-project.org).
Every 3 days, whole bodies of five individuals from each available uniquely paint marked age cohort were collected, flash frozen in liquid nitrogen, and stored at −80°C. This sampling scheme resulted in seven groups of individuals of known age (0, 3, 6, 9, 12, 15, and 18+ days old). 20 individuals of each of these age category were pooled for whole body RNA extraction for each of the two replicate observation colonies. In addition, for each of the two replicate observation colonies, we collected and pooled 20 non-paint marked workers in the act of the following five behaviors: nursing larvae, grooming larvae, engaged in trophallaxis with other workers, foraging for protein (collecting egg, mealworm, or liver), and foraging for carbohydrates (collecting honey solution). RNA was extracted from pools of worker samples of known age or observed behavior using Qiagen RNeasy kits with standard protocols. RNA sequencing libraries were constructed at the University of Arizona Genetics Core (UAGC) with RNA TruSeq library construction kits following standard protocols. In total there were 24 libraries: 2 colony replicates × (7 age groups + 5 behavioral groups). RNA sequencing was conducted at the University of Arizona Genetics Core on an Illumina HiSeq2000 with 100 bp paired ends reads, with six samples multiplexed per lane, randomly distributed across four lanes. Sequences were post-processed by cutadapt (Martin, 2011) to remove Illumina adapter sequences and ConDeTri (Smeds and Künstner, 2011) to remove low-quality bases.
DNA from a single haploid male (183 ng) was used to prepare a TruSeq library, which was sequenced in multiplex on an Illumina HiSeq 2000, yielding 70,894,179 million 100 bp read pairs. Raw genomic reads were quality and adaptor trimmed using ConDeTri and cutadapt (Martin, 2011; Smeds and Künstner, 2011), producing 57,002,951 read pairs and 8,361,560 single reads (12.3 Gb total). The assembly was carried out using ABYSS, with a range of kmers from 53 to 91 (Simpson et al., 2009). We then chose the assembly with the longest N50 as the reference for transcriptome assembly. Genome assembly quality was evaluated using the CEGMA pipeline (Parra et al., 2009), and by re-mapping the paired end trimmed reads using bowtie2 (Langmead and Salzberg, 2012).
The transcriptome was mapped to the reference using Tophat 2, and assembled into transcripts using Cufflinks 2.1 (Roberts et al., 2011; Kim et al., 2013). Gene expression data were obtained by re-mapping the transcript reads to the extracted transcripts using RSEM and calculating the expected counts at the gene level (Li and Dewey, 2011). When multiple isoforms of a single locus were found, only the longest transcript was used for gene annotation. Assembled transcripts were annotated using BLASTX from the non-redundant NCBI database with expectation values of E = 10−5. These results were used to assign Gene Ontology (GO) profiles with Blast2go (Conesa et al., 2005).
Transcript counts were filtered by abundance, removing those with less than 1 fragment per kilobase mapped (FPKM) in more than half of the libraries (Mortazavi et al., 2008). Differential gene expression analysis was carried out in edgeR, using a GLM fit to the count data and identifying differentially expressed genes using planned linear contrasts (Robinson et al., 2010). In order to infer co-expression modules and gain an insight into network structure of gene interactions, we performed a weighted gene co-expression network analysis (WGCNA) on the count data (Langfelder and Horvath, 2008). WGCNA was conducted on the entire transcript set, after filtering out the low-abundance transcripts. This analysis relies on patterns of gene co-expression, but has been shown to reconstruct protein–protein interaction networks with reasonable accuracy (Zhao et al., 2010; Allen et al., 2012). We used total connectivity as a measure of gene interaction strength, because it is not as sensitive to module assignments, and most likely reflects the overall selective pressures acting on the gene, beyond those imposed by its role in age polyethism. As with most gene expression analysis, WGCNA provides better estimates for highly abundant genes, and in particular for genes showing variation in their expression levels. Consequently, low-abundance and invariant genes will show lower connectivity.
GO term enrichment analysis was performed using the R package GOstats (Falcon and Gentleman, 2007). We report GO terms as enriched when p < 0.05.
Fire ant (S. invicta) orthologs for each gene were determined using reciprocal best BLASTP, using cutoffs of 10−10. This parameterization allowed for high specificity, though at the cost of sensitivity, since paralogs were ignored (Chen et al., 2007). These results were used to predict the M. pharaonis coding sequence using ORFPredictor (Min et al., 2005). We then computed the pairwise dN/dS ratios for each gene using the branch model in PAML (v. 4.6). Using the list of differentially expressed genes in foragers vs nest workers in the fire ant (Manfredini et al., 2014), Fisher's exact tests were used to examine whether genes differentially expressed in these categories of workers were more likely conserved, than expected by chance. We repeated the analysis above using honey bee (A. mellifera) genes, except that the BLAST cutoff was lowered to 10−5 to increase the chance of identifying orthologs in the more divergent honey bee.
To initially study whether evolutionary rate (dN/dS), connectivity (kTotal), and expression (FPKM) differed between behavioral categories (nurse-upregulated, forager-upregulated, and non-differentially expressed), we used a Kruskal–Wallis test, adjusted for multiple comparisons (kruskalmc function in the R package pgirmess). Finally, to study the main and interaction effects of connectivity, expression, and behavioral category on evolutionary rate, we used a linear model log transformed rate as the dependent variable, log transformed connectivity and expression as continuous predictors, and behavioral category as a categorical predictor.
Statistical analysis was performed with R. Means are presented ± their standard deviations. p-value cutoffs of 0.05 were used throughout the analysis. In the case of differential gene expression, data analyses were corrected for multiple comparisons using the Benjamini-Hochberg (FDR) procedure (Benjamini and Hochberg, 1995).
Scripts for the bioinformatic analyses, and a README explaining the workflow can be found at https://github.com/mikheyev/monomorium-polyethism. Most of the workflow and output is shown in Supplementary file 2, with the corresponding R script shown in Source code 1. All behavioral and gene expression data, including a MySQL database for the gene expression data have been deposited to Dryad, doi:10.5061/dryad.cv0q3 (Mikheyev and Linksvayer, 2014). Raw reads and the genome assembly are available at the DNA Data Bank of Japan, DDBJ BioProject PRJDB3164.
Honey bee aggression supports a link between gene regulation and behavioral evolutionProceedings of the National Academy of Sciences of USA 106:15400–15405.https://doi.org/10.1073/pnas.0907043106
Reproductive ground plan may mediate colony-level selection effects on individual foraging behavior in honey beesProceedings of the National Academy of Sciences of USA 101:11350–11355.https://doi.org/10.1073/pnas.0403073101
Insulin signaling is involved in the regulation of worker division of labor in honey bee coloniesProceedings of the National Academy of Sciences of USA 105:4226–4231.https://doi.org/10.1073/pnas.0800630105
Controlling the false discovery rate - a practical and powerful approach to multiple testingJournal of the Royal Statistical Society Series B-Statistical Methodology 57:289–300.
Comparative transcriptomics of convergent evolution: different genes but conserved pathways underlie caste phenotypes across lineagse of eusocial insectsMolecular Biology and Evolution, 10.1093/molbev/msu330.
From DNA to diversity: molecular genetics and the evolution of animal designMalden, MA: Blackwell Science.
Behavior-specific changes in transcriptional modules lead to distinct and predictable neurogenomic statesProceedings of the National Academy of Sciences of USA 108:18020–18025.https://doi.org/10.1073/pnas.1114093108
An evolutionary analysis of orphan genes in DrosophilaGenome Research 13:2213–2219.https://doi.org/10.1101/gr.1311003
Using GOstats to test gene lists for GO term associationBioinformatics 23:257–258.https://doi.org/10.1093/bioinformatics/btl567
Molecular evolutionary analyses of insect societiesProceedings of the National Academy of Sciences of USA 108:10847–10854.https://doi.org/10.1073/pnas.1100301108
Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networksMolecular Biology and Evolution 22:803–806.https://doi.org/10.1093/molbev/msi072
Population genomics of the honey bee reveals strong signatures of positive selection on worker traitsProceedings of the National Academy of Sciences of USA 111:2614–2619.https://doi.org/10.1073/pnas.1315506111
The superorganism: the beauty, elegance, and strangeness of insect societiesNew York: Norton.
Evolution at two levels in fire ants: the relationship between patterns of gene expression and protein sequence evolutionMolecular Biology and Evolution 30:263–271.https://doi.org/10.1093/molbev/mss234
Relaxed selection is a precursor to the evolution of phenotypic plasticityProceedings of the National Academy of Sciences of USA 108:15936–15941.https://doi.org/10.1073/pnas.1104825108
Sociality is linked to rates of protein evolution in a highly social insectMolecular Biology and Evolution 27:497–500.https://doi.org/10.1093/molbev/msp225
Large scale coding sequence change underlies the evolution of post-developmental novelty in honey beesMolecular Biology and Evolution 32:334–346.https://doi.org/10.1093/molbev/msu292
Deconstructing the superorganism: social physiology, groundplans, and sociogenomicsQuarterly Review of Biology 85:57–79.https://doi.org/10.1086/650290
Relaxed genetic constraint is ancestral to the evolution of phenotypic plasticityIntegrative and Comparative Biology 52:16–30.https://doi.org/10.1093/icb/ics049
Impact of extracellularity on the evolutionary rate of mammalian proteinsGenome Biology and Evolution 2:39–43.https://doi.org/10.1093/gbe/evp058
Molecular and social regulation of worker division of labour in fire antsMolecular Ecology 23:660–672.https://doi.org/10.1111/mec.12626
Data from: genes associated with ant social behavior show distinct transcriptional and evolutionary patternsDryad Digital Repository..
OrfPredictor: predicting protein-coding regions in EST-derived sequencesNucleic Acids Research 33:W577–W680.
Mapping and quantifying mammalian transcriptomes by RNA-SeqNature Publishing Group 5:621–628.https://doi.org/10.1038/nmeth.1226
Kinship, greenbeards, and runaway social selection in the evolution of social insect cooperationProceedings of the National Academy of Sciences of USA 108:10808–10815.https://doi.org/10.1073/pnas.1100297108
Conservation and evolution of gene colexpression networks in human and chimpanzee brainsProceedings of the National Academy of Sciences of USA 103:17973–17978.https://doi.org/10.1073/pnas.0605938103
Studies in Pharaoh's ant, Monomorium pharaonis (L.), 3: life historyEntomologist's Monthly Magazine 86:171–178.
Accelerated evolution of Morph-biased genes in pea aphidsMolecular Biology and Evolution 31:2073–2083.https://doi.org/10.1093/molbev/msu149
The correlation of evolutionary rate with pathway position in plant terpenoid biosynthesisMolecular Biology and Evolution 26:1045–1053.https://doi.org/10.1093/molbev/msp021
ABySS: a parallel assembler for short read sequence dataGenome Research 19:1117–1123.https://doi.org/10.1101/gr.089532.108
The importance of genomic novelty in social evolutionMolecular Ecology 23:26–28.https://doi.org/10.1111/mec.12580
Brain transcriptomic analysis in paper wasps identifies genes associated with behaviour across social insect lineagesProceedings Biological Sciences / The Royal Society 277:2139–2148.https://doi.org/10.1098/rspb.2010.0090
Sexual selection, social competition, and speciationQuarterly Review of Biology 58:155–183.https://doi.org/10.1086/413215
Wasp societies as microcosms for the study of development and evolutionIn: S Turillazzi, MJ West-Eberhard, editors. Natural history and evolution of paper wasps. New York: Oxford University Press. pp. 290–317.
Genomic dissection of behavioral maturation in the honey beeProceedings of the National Academy of Sciences of USA 103:16068–16075.https://doi.org/10.1073/pnas.0606909103
Advances in evolutionary developmental biologyAdvances in evolutionary developmental biology, Hoboken, NJ, John Wiley & Sons.
Mechanisms and dynamics of orphan gene Emergence in insect genomesGenome Biology and Evolution 5:439–455.https://doi.org/10.1093/gbe/evt009
Molecular heterochrony and the evolution of sociality in bumblebees (Bombus terrestris)Proceedings Biological Sciences / The Royal Society 281:20132419.https://doi.org/10.1098/rspb.2013.2419
Genes involved in convergent evolution of eusociality in beesProceedings of the National Academy of Sciences of USA 108:7472–7477.https://doi.org/10.1073/pnas.1103457108
Weighted gene coexpression network analysis: state of the artJournal of Biopharmaceutical Statistics 20:281–300.https://doi.org/10.1080/10543400903572753
Philipp KhaitovichReviewing Editor; Partner Institute for Computational Biology, China
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for sending your work entitled “Genes associated with ant social behavior show distinct transcriptional and evolutionary patterns” for consideration at eLife. Your article has been favorably evaluated by Diethard Tautz (Senior editor), a Reviewing Editor, and three reviewers.
The Reviewing editor and the reviewers discussed their comments before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.
While all three reviewers are enthusiastic about the study, several important concerns were voiced by all of them. All reviewers pointed out that this paper does not provide the necessary statistical details to be able to assess the quality of the work. This is certainly the most important concern, as hypothesis acceptance/rejection—which is the central message of the study—fully depends on gene expression analysis and its interpretation. The respective comments are combined below and it will be necessary to address them before a final decision about acceptance can be reached.
The authors investigate the “novel social genes” vs the “gene toolkit” hypotheses. This is certainly interesting and worthwhile. However, one general issue I have with this (and most other) studies on the topic is that it is not clear what “exactly” constitutes evidence for one hypothesis or the other.
For example, the authors here state that the percentage of genes differentially expressed in the same context in the two ants is small (∼3%). However, the overlap is marginally significant (fifth paragraph of the Results section). So the question is, what percent overlap exactly would constitute support for the toolkit hypothesis? 50%? 25%? 10% 5%?
The point is that it isn't clear how and when authors reject or fail to reject the toolkit/novel hypotheses because there is never an explicit significance or overlap threshold provided. The authors are not alone in having to deal with this issue, and so I don't want to lay this completely at their feet, but it would be nice if they stated explicitly what exactly would constitute evidence, or lack thereof, for the toolkit hypothesis somewhere before they get to the results.
Related to this issue, I wanted more detail about the analyses comparing expression patterns between taxa. Did the authors just ask if the same genes were differentially expressed between behavioral types in the two taxa to be viewed as 'consistent' with the toolkit model? Or did a particular gene have to be differentially expressed in the same direction in both taxa? (The latter, I think, although this wasn't clear.) And did a gene need to be significantly differentially expressed to be counted? Or was it sufficient that it show the same directionality in expression, regardless of significance? Directionality without significance is as important as significance, given that the studies in the taxa had different power, used different methods, etc. These are not trivial issues and they may affect the outcome and interpretation of the results. I urge the authors to look into this more closely.
In the third paragraph: The authors state they are interested in the “molecular mechanisms of social interactions (e.g. social signal production, reception and response)”. They refer to genes related to social behavior throughout the manuscript. But are their expression profiles indeed reflecting social information producing/processing skills? Or morphological changes related to other functions, such as exposure to exterior environments? They do not state they used whole ants or just ant heads for transcriptome profiling, which is highly crucial for interpreting the results (especially given that some other studies have used the animal head).
In the third paragraph of the Results section: “Of these contrasts, only foragers and nurses had significantly different gene expression patterns.” This is not well explained. Four categories are compared and 2 of these are said to have different expression. What is the reference here; are the other two categories (grooming and trophallaxis) not different from these two? Or perhaps I am missing something?
This can be partly followed in the Supplementary Material, but should be referred in the main text, e.g. it seems as if all samples were grouped together in the DE analysis. The foragers and nurses were most different as they represent the youngest and oldest. I would have stated this explicitly.
I am also concerned somewhat about the PCA in the Supplementary Material: There seems to be two groups emerging, but this is likely technical (I would guess sample processing dates). It might be difficult to control for this, but if possible, could improve DE analysis significantly.
In the third paragraph of the Results section: “There were 1217 forager- and 1247 nurse-upregulated genes”. What was the p-value cutoff? How did the authors control for multiple-testing? (This can also be followed in the Supplementary Material, but should be referred in the main text.)
In the fourth paragraph of the Results section: “(…) it separated workers into two distinct classes based on age”. If I understand what was done, I think the authors might be overinterpreting: the algorithm will separate the profiles into 2 classes if k=2, and n classes if k=n. Thus, without additional analysis I think one cannot decide on the existence of distinct classes. The authors could consider applying some other test; e.g. check the slope of the expression-age curve.
In the fifth and sixth paragraphs of the Results section: In the gene expression conservation analysis, we are given no information how many genes are used in the comparisons (i.e, the number of genes showing DE in both this and the other datasets, as well as background genes). If the numbers are low, they could instead check the effect size of orthologous genes identified as DE for honeybee, for example. Was the honeybee data generated by Manfredini et al., 2014? If not, the authors should state that.
Most importantly, if the honeybee data was generated from the brain (as done by quite a few studies) and the data in this study from the whole body, this could also be a reason for finding limited overlap.
In comparisons with the Fisher's exact test, it would be useful to state what the background is (non-DE genes, genes up-regulated in the other category, or both?).
The expression “whether genes differentially expressed in these categories of workers were more likely conserved” is a bit confusing, as it also implies sequence conservation, but I think the authors mean conservation with respect to correlated changes.
In the seventh paragraph of the Results section: Connectivity—this could be more explicitly defined, such as emphasizing that the prediction comes from transcription data correlations (e.g. not protein-protein interaction data), and that it depends on how the modules are defined. I think the authors could also discuss potential biases here. Depending on the signal/noise ratio of a gene and the module size, how would connectivity be affected? One would want to make sure that these factors are not influential on the reported result.
Figure 2: Would it not be informative to add a violin plot (similar to A and B) for dN/dS? Especially so, as lower conservation among up-regulated genes is one of the paper's main points. But no information is given regarding the magnitude of the effect. The authors could also plot expression versus connectivity.
In the sixth paragraph of the Results section: There is little discussion on the GO analysis. Does the UV response pathway have to do with sudden exposure to the sun? At least would one not expect to see the same pathway up-regulated in foragers of other taxa?
Please indicate the p-value cutoffs for the GO analysis. This is also found in the Supplementary Material, but should be in the main text or Methods.
It would be helpful if the authors addressed the following:
What is the estimated genome size? What was the CEGMA assembly score for the de novo genome assembly? What was the average coverage per sample for the genomic and transcriptomic data?
The main conclusion that “genes unregulated in foragers and nurses were on average less connected and more rapidly evolving” (ninth paragraph of the Results section) relies heavily on the assumption that they are working with a high-quality transcriptome and that their orthology assignments are correct.
How did they evaluate this? A table with summary statistics would be very useful. How many transcripts had homology to the fire ant and/or the honey bee? How was the paralog problem dealt with, particularly with respect to the molecular evolution analyses?
Similarly, for the network analyses: Were these co-expression networks calculated only on significant transcripts or on all transcripts? How was a significant “network” determined? Two of the modules had > 8000 transcripts in each of them. Does that mean all 8000 transcripts show tightly-correlated expression levels?
Finally, why didn't the authors include Polistes in their comparative analyses? There are at least 2 studies on Polistes, both of which are already cited in this manuscript. This seems like it would be another independent data point worth discussing.https://doi.org/10.7554/eLife.04775.015
- Timothy A Linksvayer
The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Sandra Rehan and Nadeesha Perera collected the behavioral observation data and did RNA extractions. We thank Fabio Manfredini for help with the fire ant data. We are grateful to Yannick Wurm, Eyal Privman, and Laurent Keller for providing the raw M. pharaonis genomic reads and to Yannick Wurm for bioinformatic assistance early on in the project. This project was funded in part by a University of Pennsylvania University Research Foundation grant to TAL.
- Philipp Khaitovich, Reviewing Editor, Partner Institute for Computational Biology, China
© 2015, Mikheyev and Linksvayer
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.