Experimental evidence that chronic outgroup conflict reduces reproductive success in a cooperatively breeding fish

  1. Ines Braga Goncalves (corresponding author)
  2. Andrew N Radford
  Affiliation: School of Biological Sciences/Life Sciences, University of Bristol, United Kingdom

Abstract

Conflicts with conspecific outsiders are common in group-living species, from ants to primates, and are argued to be an important selective force in social evolution. However, whilst an extensive empirical literature exists on the behaviour exhibited during and immediately after interactions with rivals, only a few observational studies have considered the cumulative fitness consequences of outgroup conflict. Using a cooperatively breeding fish, the daffodil cichlid (Neolamprologus pulcher), we conducted the first experimental test of the effects of chronic outgroup conflict on reproductive investment and output. ‘Intruded’ groups received long-term simulated territorial intrusions by neighbours that generated consistent group-defence behaviour; matched ‘Control’ groups (each the same size and with the same neighbours as an Intruded group) received no intrusions over the same period. Intruded groups had longer inter-clutch intervals and produced eggs with progressively less protein than Control groups. Despite this lower egg investment, Intruded groups provided more parental care and achieved hatching success similar to that of Control groups. Ultimately, however, Intruded groups had fewer and smaller surviving offspring than Control groups at 1 month post-hatching. We therefore provide experimental evidence that outgroup conflict can decrease fitness via cumulative effects on reproductive success, confirming the selective potential of this empirically neglected aspect of sociality.

Editor's evaluation

This paper experimentally investigates the fitness consequences of intergroup conflict in social fish. It finds that groups that face frequent territorial intrusion suffer costs in terms of both fertility and number of surviving offspring, despite behavioral compensation through increased parental care. These results provide clear and compelling evidence that intergroup conflict leads to lower fitness, and are therefore of substantial interest for understanding social evolution, where the importance of between-group competition has long been debated.

https://doi.org/10.7554/eLife.72567.sa0

Introduction

In social species, conspecific outsiders often challenge groups and their members for resources and reproductive opportunities (Birch et al., 2019; Braga Goncalves and Radford, 2019; Kitchen and Beehner, 2007; Radford, 2008; Thompson et al., 2017). This ‘outgroup conflict’—conflict with one or more outsiders, of which ‘intergroup conflict’ (that between two groups) is a subset—is theorised to be a driving force in the evolution of territoriality, social structure, group dynamics, and cooperation (Bowles, 2009; Gaston, 1978; Rusch, 2014; Wrangham, 1980). Empirical research in non-human animals has traditionally focussed on aggressive outgroup contests, such as factors determining which individuals participate, their level of contribution, and who wins (Arseneau-Robar et al., 2016; Green et al., 2020; Kitchen and Beehner, 2007), as well as the immediate fitness costs arising from loss of life, breeding position, or territory (Goodall, 1986; Morris-Drake et al., 2022; Spong et al., 2008). Recently, studies have begun to explore the effects of outgroup conflict beyond periods of active confrontation, documenting short-term behavioural changes (e.g. increased within-group affiliation) in the aftermath of single interactions with rivals (Birch et al., 2019; Braga Goncalves and Radford, 2019; Mirville et al., 2020; Preston et al., 2020; Radford, 2008). However, the cumulative build-up of threat posed by outsiders is also likely a potent stressor (Eckardt et al., 2016; Samuni et al., 2019) and may disrupt within-group relationships (Anderson et al., 2020; Hellmann and Hamilton, 2019), potentially generating long-term fitness consequences even in the absence of immediate direct effects from individual aggressive contests (Braga Goncalves et al., 2022; Morris-Drake et al., 2022). 
Three observational studies have found associations between outgroup conflict and reproductive success beyond the direct death of offspring during aggressive interactions: greater intergroup conflict during pregnancy was associated with improved foetal survival in both crested macaques (Macaca nigra) and banded mongooses (Mungos mungo) (Kerhoas et al., 2014; Thompson et al., 2017), but with longer inter-birth intervals and reduced infant survival in chimpanzees (Pan troglodytes verus) (Lemoine et al., 2020). However, experiments are needed to test for a causal link between outgroup conflict and reproductive success, that is, to rule out potential confounding explanations associated with natural observations.

Here, we use the daffodil cichlid (Neolamprologus pulcher), a model species for the study of sociality, to test experimentally the reproductive consequences of chronically elevated outgroup conflict. N. pulcher is a highly territorial, cooperatively breeding fish species native to Lake Tanganyika. In the wild, groups comprising a breeding pair and 0–20 subordinates of both sexes (Wong and Balshine, 2011) defend territories from predators, heterospecific competitors, and conspecific intruders (Desjardins et al., 2008; Taborsky and Limberger, 1981), with multiple small, contiguous territories often clustered together (Taborsky, 1984). Although individuals develop dear-enemy relationships with neighbours (Frostman and Sherman, 2004; Sogawa et al., 2016), aggressive disputes at shared borders are common (Balshine et al., 2001) and intruding neighbours can be attacked by all group members (Balshine-Earn et al., 1998). Single intrusion events are known to cause short-term changes in within-group behavioural interactions (Braga Goncalves and Radford, 2019; Bruintjes et al., 2016; Taborsky, 1985). Crucially, N. pulcher is a highly tractable experimental system: it is easily maintained in captivity, where groups display natural behaviour and breed regularly (Heg and Hamilton, 2008; Jindal et al., 2017; Wong and Balshine, 2011), allowing controlled manipulations and detailed monitoring over extended periods.

To investigate the cumulative effect of elevated outgroup conflict on reproductive rate, investment (in eggs and parental care) and output, we simulated intrusions by neighbours at territorial borders in two long-term, laboratory experiments. We predicted that because repeated territorial intrusions are likely stressful and may destabilise social groups (Arseneau-Robar et al., 2016; Morris-Drake et al., 2022), there would be effects on breeding and investment decisions as parents are selected to respond to prevailing ecological (environmental and social) conditions (Carlisle, 1982; McGinley et al., 1987; Roff, 1992). For instance, organisms faced with prolonged challenging conditions may maximise survival at the cost of other functions such as reproduction (Beldade et al., 2017; Jørgensen et al., 2006; Love and Williams, 2008; Schreck et al., 2001), so we predicted a negative impact of outgroup conflict on reproductive rate in terms of spawning likelihood, latency to spawn, number of clutches produced, and inter-clutch interval. Female fish can modify egg investment depending on current intrinsic (e.g. maternal state and stress) and extrinsic (e.g. ecological) factors (Faria et al., 2018; McCormick, 1998; Schreck, 2010), with trade-offs between egg number and quality apparent in stressful conditions (Faria et al., 2018). So, we predicted an effect of outgroup conflict on egg number, size (weight and volume), and nutritional (lipid and protein) content. Parental care extends the opportunity for adjustment of reproductive investment, allowing compensation for lower offspring quality (Carlisle, 1982; Clutton-Brock, 1991) and poorer environmental conditions (Östlund and Ahnesjö, 1998). So, we predicted an effect of outgroup conflict on clutch visits and egg caring effort. 
Since maternal and early-life stress, initial egg size and quality, and parental care can all influence offspring characteristics (Bell et al., 2016; Boogert et al., 2013; Jonsson and Jonsson, 2014; McCormick, 1998), we predicted that outgroup conflict would directly or indirectly have a negative effect on reproductive output in terms of offspring survival, behaviour, and size.

Results

In both our experiments (which used different fish), 30 tanks with breeding shelters were arranged linearly in triplets: the end ‘focal’ tanks (one for each of two treatments: 10 ‘Intruded’ and 10 ‘Control’ tanks per experiment) contained a dominant pair and a subordinate; the middle tank contained a dominant pair that served as the shared neighbours used as intruders (Figure 1A). Within each triplet, the same-sex dominants in all tanks and the subordinates in focal end tanks were size matched (using standard body length) at the start (see Materials and methods). In Experiment I (the longer of the two experiments; see below), focal individuals in the two treatments remained size matched at the end of the study, in both standard length (paired t-test, dominant male [DM]: t₉ = 0.10, p = 0.920; dominant female [DF]: t₉ = 1.67, p = 0.130; Wilcoxon signed-rank test, subordinate: V = 15.5, n = 12, p = 0.343) and body mass (paired t-test, DM: t₉ = 0.64, p = 0.540; DF: t₉ = 0.11, p = 0.914; Wilcoxon signed-rank test, subordinate: V = 18, n = 12, p = 0.156).
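Because each Intruded group was matched with a Control group in the same triplet, the size comparisons above are paired. The sketch below shows how a paired t statistic and its degrees of freedom (t₉ for 10 pairs) can be computed with the Python standard library. It is illustrative only: the function name `paired_t` is hypothetical and the lengths are invented, not the study's data.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic for matched samples.

    Returns (t, df): the mean within-pair difference divided by its
    standard error, and the degrees of freedom (number of pairs - 1).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_d = statistics.mean(diffs)
    # Standard error of the mean difference: sample SD / sqrt(number of pairs).
    se_d = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_d / se_d, len(diffs) - 1


# Hypothetical standard lengths (mm) of dominant males in 10 triplets;
# each Intruded value is matched with the Control value at the same index.
intruded_sl = [52.1, 49.8, 55.0, 50.3, 47.9, 53.6, 51.2, 48.4, 54.1, 50.7]
control_sl = [51.8, 50.2, 54.6, 50.9, 48.1, 53.0, 51.5, 48.0, 54.4, 50.2]

t_stat, df = paired_t(intruded_sl, control_sl)
print(f"t({df}) = {t_stat:.2f}")
```

Pairing removes between-triplet variation (for example, in the size of the shared neighbours) from the comparison, which is the point of the matched design.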

Figure 1. Experimental setup and timelines.

(A) Experimental setup showing a linearly arranged tank triplet, with size-matched experimental groups (Intruded and Control) at the ends and the tank with the neighbour pair at the centre. The female neighbour is presented in the Intruded group tank to illustrate an intrusion along the border of the resident group’s territory. (B) Timelines show measures of general reproductive behaviour (orange), egg and parental-care investment (green), and reproductive output (purple).

Each week in both experiments, Intruded groups experienced several 10-min territorial intrusions by one of their neighbouring individuals (mean ± standard error [SE] intrusions per week, Experiment I: 4.8 ± 0.1; Experiment II: 3.9 ± 0.3); both neighbouring individuals were used for intrusions pseudo-randomly (see Materials and methods). During intrusions, the focal group and the intruder were physically separated by a transparent partition, which prevented adversaries from injuring each other and intruders from cannibalising the focal group’s young. Focal groups typically respond to the presence of intruders with aggressive behaviours, including attacks (rams and bites) and displays (aggressive postures, frontal displays, and fast approaches); intruders reciprocate with aggressive displays and attacks, but also with submissive postures and displays and by turning away from the focal group (Reddon et al., 2015; Sopinka et al., 2009). Control groups did not receive intrusions but did experience the same procedural disturbances as Intruded groups, in terms of the placement and removal of barriers (see Materials and methods). The aim of Experiment I was to assess the effects of chronic outgroup conflict on reproductive rate, parental care of eggs, hatching success, and offspring survival, behaviour, and size. So, following the first three intrusions (and the equivalent time in Control groups), we allowed each group to raise all clutches produced over a 13-week period (Figure 1B). Experiment II complemented Experiment I, aiming to assess how chronic outgroup conflict affects reproductive investment in terms of egg size and nutritional content (Figure 1B); these measures can only be obtained by removing clutches. We therefore collected the first clutch produced by each group after 13 days of treatment, a period representing approximately half of a reproductive cycle (Jindal et al., 2017); the experiment ended for a focal group once that clutch was collected. 
For all response measures analysed, we provide in the main text the effect of treatment (Intruded and Control) and, where significant or near-significant, the interaction between treatment and treatment duration, as the factors of core interest. Full model outputs including all tested variables (significant or not) are detailed in the Supplementary files.
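The weekly intrusion rates reported above are summarised as mean ± SE. As a minimal sketch, assuming the SE is the sample standard deviation divided by the square root of the number of groups, the summary can be computed as follows; `mean_and_se` is a hypothetical helper and the rates are invented for illustration.

```python
import math
import statistics


def mean_and_se(values):
    """Return the sample mean and its standard error (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical average intrusions per week for each of 10 Intruded groups.
weekly_rates = [4.6, 5.0, 4.9, 4.7, 4.8, 5.1, 4.5, 4.9, 4.8, 4.7]
m, se = mean_and_se(weekly_rates)
print(f"{m:.1f} ± {se:.1f} intrusions per week")
```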

Defensive behaviour

To assess whether the presence of a neighbour in their territory was perceived as an intrusion, we recorded defensive behaviour by focal groups in Experiment I. Intruded groups exhibited 4.5 times more defensive actions than Control groups at the start of the study (Wilcoxon signed-rank test: V = 0, n = 20, p = 0.002; Figure 2A); this difference persisted until the end of the experiment (4.7 times more defence; paired t-test: t₉ = 4.33, p = 0.002; Figure 2A). Our protocol was therefore likely to have elevated the perceived level of outgroup conflict throughout the experimental period, as intended. Within the Intruded treatment, dominant individuals, but not subordinates, increased their defensive efforts between the start and the end of the study (Wilcoxon signed-rank test, DF: V = 4.5, n = 20, adjusted-p = 0.043; paired t-test, DM: t₉ = 3.50, adjusted-p = 0.020; subordinates: t₅ = −0.05, p = 0.963; Figure 2B); this pattern was not driven by any significant change in intruder responsiveness to the focal group (t₉ = 0.55, p = 0.594). Greater defensive efforts by dominant individuals at the end of the study may be a by-product of age- or size-mediated behavioural changes (a positive effect of body size on defensive efforts has been documented in at least dominant males in this species; Ligocki et al., 2019), or may have been due to the presence of offspring that needed safeguarding from intruders (Dyble et al., 2019).
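The Wilcoxon signed-rank statistic V reported above is the sum of the ranks of the positive within-pair differences, so V = 0 means that no matched pair differed in the positive direction. The sketch below computes V with the standard library; it drops zero differences and gives tied absolute differences their average rank, and it does not compute a p-value. The helper name, the ordering of the two samples, and the counts are all assumptions for illustration, as is the use of a ratio of means to express a "times more" comparison.

```python
import statistics


def signed_rank_v(first, second):
    """Wilcoxon signed-rank statistic V for matched samples.

    V is the sum of the ranks of the positive differences (first - second).
    Zero differences are dropped, and tied absolute differences share the
    average of the ranks they span.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(range(len(diffs)), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return sum(rank for rank, d in zip(ranks, diffs) if d > 0)


# Hypothetical defensive actions per 10-min trial for 10 matched pairs
# (Control first, Intruded second; the same index means the same triplet).
control = [2, 1, 3, 0, 2, 1, 2, 3, 1, 2]
intruded = [9, 7, 12, 6, 10, 8, 11, 9, 7, 10]

v = signed_rank_v(control, intruded)
fold = statistics.mean(intruded) / statistics.mean(control)
print(f"V = {v}, Intruded/Control ratio of means = {fold:.1f}")
```

In this invented example every Control group shows fewer defensive actions than its Intruded partner, so all differences are negative and V is 0.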

Figure 2. Defensive actions.

(A) Group defensive actions displayed per 10-min trial, towards a transparent partition (blue) or the intruding female neighbour (red) in weeks 1 and 11 of Experiment I (N = 20 groups). (B) The number of defensive actions displayed by the dominant female (DF, N = 10), dominant male (DM, N = 10), and subordinate (S, N = 6) towards the intruding female neighbour during weeks 1 (green) and 11 (yellow) of Experiment I. Boxplots show medians and 25th and 75th percentiles (first and third quartiles), with whiskers representing 95% confidence intervals; dots are raw data, with lines connecting matched groups (A) or repeated measures on individuals (B).

Figure 2—source data 1

Number of group defensive actions displayed per 10-minute trial, towards a transparent partition (Control treatment) or the intruding female neighbour (Intruded treatment) in weeks 1 and 11 of Experiment I (n = 20 groups); and number of defensive actions displayed by the dominant female (DF, n = 10), dominant male (DM, n = 10), and subordinate (S, n = 6) per 10-minute trial towards the intruding female neighbour during weeks 1 and 11 of Experiment I.

https://cdn.elifesciences.org/articles/72567/elife-72567-fig2-data1-v1.xlsx

Reproductive rate

In Experiment I, repeated territorial intrusions did not significantly affect the likelihood of spawning (McNemar test: χ²₁ = 0.36, p = 0.547), the latency to first spawn (linear mixed model [LMM]: χ²₁ = 1.02, p = 0.312; Supplementary file 1a) or the number of clutches produced (Wilcoxon signed-rank test: V = 15, N = 10, p = 0.396). However, Intruded groups had inter-clutch intervals that were 40% longer than those of Control groups (LMM: χ²₁ = 3.89, p = 0.049; Supplementary file 1b; Figure 3A). This finding aligns qualitatively with previous work on N. pulcher, in which higher neighbour densities were associated with longer spawning latencies (Taborsky et al., 2007), and with studies on other species documenting how various chronic stressors may negatively impact fish reproductive rates (Ali and Wootton, 1999; Mileva et al., 2011). Longer inter-clutch intervals likely result in fewer breeding attempts per season in the wild, which is why inter-birth interval is a life-history trait commonly used to assess female reproductive success across taxa (Clutton-Brock et al., 1984).
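Spawning likelihood is a yes/no outcome measured on matched pairs, which is why a McNemar test is used: only discordant pairs (one group spawned, its partner did not) carry information. The sketch below implements the textbook McNemar statistic with the usual continuity correction, plus the conversion from a χ² statistic with 1 degree of freedom to a p-value. The discordant counts are invented, and the helper names are hypothetical.

```python
import math


def chi2_1df_p(statistic):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom.

    A chi-squared variable with 1 df is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


def mcnemar(b, c, continuity=True):
    """McNemar test for paired yes/no outcomes (e.g. spawned or not).

    b: pairs in which only the Control group spawned.
    c: pairs in which only the Intruded group spawned.
    Concordant pairs (both or neither spawned) do not enter the statistic.
    Returns (chi-squared statistic, p-value) with 1 df.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    # The continuity correction subtracts 1 from |b - c|, floored at zero.
    diff = max(abs(b - c) - (1 if continuity else 0), 0)
    stat = diff ** 2 / (b + c)
    return stat, chi2_1df_p(stat)


# Hypothetical discordant counts.
print(mcnemar(b=4, c=1))
# The same 1-df conversion applied to a statistic reported in the text.
print(round(chi2_1df_p(3.89), 3))
```

Applying `chi2_1df_p` to the statistic reported for inter-clutch interval (χ²₁ = 3.89) gives p ≈ 0.049, consistent with the text, which is a quick way to check 1-df tests reported elsewhere in this section.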

Figure 3. Effects of chronically elevated outgroup conflict on reproductive rate and egg investment.

Effects of chronically elevated outgroup conflict (red) relative to control conditions (blue) on: (A) inter-clutch interval (N = 21 intervals); (B) clutch size (N = 34 clutches); (C) egg volume (N = 15 clutches); and (D) egg protein content (N = 15 clutches). All panels depict predicted means and associated 95% confidence intervals; dots are raw data.

Figure 3—source data 1

Inter-clutch intervals (days, n = 21 intervals); clutch size (number of eggs, n = 34 clutches); egg volume (mm³, n = 15 clutches); and egg protein content (micrograms, n = 15 clutches), of clutches produced by Control and Intruded groups.

https://cdn.elifesciences.org/articles/72567/elife-72567-fig3-data1-v1.xlsx

Investment in eggs

In Experiment I, there was a near-significant effect of the interaction between treatment and treatment duration on clutch size (LMM, parameter estimate [PE] = 0.85, 95% confidence interval [CI] = −0.04 to 1.70, χ²₁ = 3.55, p = 0.059; Supplementary file 2). Contrary to our expectation, Intruded females tended to produce larger clutches over time (i.e. with longer exposure to outgroup conflict) relative to Control females (Figure 3B). As conditions that reduce the probability of future reproductive success may select for increased current reproductive investment (Carlisle, 1982; McGinley et al., 1987; Olofsson et al., 2009), it is possible that the repeated territorial intrusions reduced the perceived prospects of future reproductive success, at least for dominant females.
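The clutch-size result is an interaction between treatment and treatment duration, that is, a difference between the two treatments in how clutch size changes over time. The sketch below illustrates that meaning under a simplifying assumption: an ordinary least-squares model without the random effects of the study's LMM. In a fully interacted OLS model (y ~ treatment × duration), the interaction coefficient equals the Intruded slope minus the Control slope. Helper names and data are hypothetical.

```python
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def interaction_estimate(days_control, y_control, days_intruded, y_intruded):
    """Treatment x duration interaction of a fully interacted OLS model.

    With y ~ treatment * duration and no random effects, the interaction
    coefficient equals the Intruded slope minus the Control slope.
    """
    return ols_slope(days_intruded, y_intruded) - ols_slope(days_control, y_control)


# Hypothetical clutch sizes against days of treatment at spawning.
days_c, sizes_c = [5, 20, 40, 60], [48, 51, 47, 50]
days_i, sizes_i = [6, 22, 41, 63], [45, 58, 70, 92]
print(round(interaction_estimate(days_c, sizes_c, days_i, sizes_i), 2))
```

A positive value, as in the study's PE of 0.85, means clutch size rose faster (or fell more slowly) with treatment duration in the Intruded treatment than in the Control treatment.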

Experiment II allowed us to consider how increased outgroup conflict affected egg size and nutrient allocation because we removed clutches for detailed assessment. There was no significant effect of treatment on egg weight (LMM: χ²₁ = 0.85, p = 0.356; Supplementary file 3a). However, egg volume was affected by the interaction between treatment and treatment duration (χ²₁ = 4.34, p = 0.037; Supplementary file 3b; Figure 3C): females who took longer to spawn produced eggs with larger volume in the Control treatment (PE = 0.05, 95% CI = 0.02–0.09, χ²₁ = 7.30, p = 0.007), but there was no significant effect in the Intruded treatment (PE = 0.009, 95% CI = −0.019 to 0.037, χ²₁ = 0.51, p = 0.474). There was no significant treatment difference in egg lipid content (χ²₁ = 0.07, p = 0.797; Supplementary file 3c). However, egg protein content was affected by the interaction between treatment and treatment duration (χ²₁ = 18.50, p < 0.001; Supplementary file 3d; Figure 3D): mean egg protein content decreased with treatment duration in the Intruded treatment (PE = −0.97, 95% CI = −1.65 to −0.30, χ²₁ = 6.30, p = 0.012), but did not change significantly in the Control treatment (PE = 0.41, 95% CI = −0.47 to 1.30, χ²₁ = 0.99, p = 0.320). Previous work on N. pulcher described a positive correlation between the time taken to produce a clutch and mean egg size (Taborsky et al., 2007). Outgroup conflict thus increasingly disrupted this relationship over the period of intrusions, as well as causing a progressive decline in egg quality, at least with respect to protein content. It is possible that the much higher concentration of protein than of lipids in eggs made an effect easier to detect for protein alone, but reductions in egg protein content alone have also been reported in female eastern fence lizards, Sceloporus undulatus, exposed experimentally to elevated glucocorticoids during egg production (Ensminger et al., 2018). 
Protein is often the most abundant dry constituent of eggs (Blaxter, 1969), provides amino acids for tissue growth and energy via catabolic processes (Blaxter, 1969), and has been found to be positively correlated with fertilisation, hatching success, and early-life survival (Kamler, 2005).
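Several results in this section are reported as a treatment × treatment-duration interaction followed by a separate slope (PE) within each treatment. The block below is a generic sketch of that structure, written under our own assumptions: an indicator $I_g$ (1 for Intruded, 0 for Control), treatment duration $D_{gk}$ for clutch $k$ of group $g$, and a single random intercept $u_g$ per group. The full models, including all tested variables, are detailed in the Supplementary files rather than here.

```latex
\begin{aligned}
y_{gk} &= \beta_0 + \beta_1 I_g + \beta_2 D_{gk} + \beta_3 I_g D_{gk} + u_g + \varepsilon_{gk},\\
u_g &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{gk} \sim \mathcal{N}(0, \sigma^2),\\
\left.\frac{\partial\, \mathbb{E}[y_{gk} \mid u_g]}{\partial D_{gk}}\right|_{I_g = 0} &= \beta_2 \quad \text{(Control slope)},\\
\left.\frac{\partial\, \mathbb{E}[y_{gk} \mid u_g]}{\partial D_{gk}}\right|_{I_g = 1} &= \beta_2 + \beta_3 \quad \text{(Intruded slope)}.
\end{aligned}
```

A significant $\beta_3$ says only that the two slopes differ; the within-treatment estimates then show which slope departs from zero. For egg protein content, for example, the Intruded slope was negative whilst the Control slope was not significantly different from zero.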

Investment in parental care

In Experiment I, repeated territorial intrusions had an increasingly positive effect on parental-care behaviour, measured as the number of clutch visits (a proxy for protection effort) and the combined number of egg-fanning and egg-cleaning events (an estimate of caring effort) (Supplementary file 4). The time spent on these activities was similarly affected (Supplementary file 5), but we present only the results for the number of events here. The number of clutch visits was affected by the interaction between treatment and treatment duration (generalised linear mixed model [GLMM]: χ²₁ = 5.17, p = 0.023; Supplementary file 4a; Figure 4A): clutch visits increased over time in the Intruded treatment (PE = 0.01, 95% CI = 0.003–0.20, χ²₁ = 5.77, p = 0.016), but did not change significantly in the Control treatment (PE = 0.001, 95% CI = −0.008 to 0.005, χ²₁ = 0.13, p = 0.719). Similarly, the number of care behaviours performed was affected by the interaction between treatment and treatment duration (GLMM: χ²₁ = 4.54, p = 0.033; Supplementary file 4b; Figure 4B): caring behaviour increased over time in the Intruded treatment (PE = 0.02, 95% CI = 0.008–0.031, χ²₁ = 8.82, p = 0.003), but did not change significantly in the Control treatment (PE = 0.004, 95% CI = −0.004 to 0.011, χ²₁ = 0.85, p = 0.356). Our results contrast with those of studies that have shown reductions in parental (Vitousek et al., 2014; Vitousek et al., 2018) and alloparental (Mares et al., 2012) care in response to immediate or short-term stressful situations; however, as the effects here emerged over the course of the experiment, it is possible that acute and chronic stressors elicit opposing behavioural responses. The increased parental care in the Intruded treatment may have at least partly compensated for the lower relative egg investment (see above), because there was no significant treatment difference in hatching success (χ²₁ = 0.08, p = 0.782; Supplementary file 6; Figure 4C). 
Similar hatching success does not, however, necessarily equate to similar offspring quality (Botterill-James et al., 2019).
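The parental-care slopes above come from GLMMs fitted to counts, so each PE is on the model's link scale. This section does not state the link function; assuming a log link (common for count models), a slope translates into a multiplicative change in the expected count, exp(PE × duration). The sketch below applies that conversion to the caring-behaviour slope reported for the Intruded treatment, additionally assuming that treatment duration is measured in days; `rate_ratio` is a hypothetical helper.

```python
import math


def rate_ratio(log_slope, days):
    """Multiplicative change in the expected count after `days`, assuming a
    log link, so that exp(slope * days) is the rate ratio."""
    return math.exp(log_slope * days)


# Caring-behaviour slope reported for the Intruded treatment (PE = 0.02),
# with treatment duration assumed to be in days.
for days in (10, 30, 60):
    print(days, round(rate_ratio(0.02, days), 2))
```

Under these assumptions, a per-day slope that looks tiny on the link scale can still compound into a substantial increase in care over a multi-week treatment.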

Figure 4. Effect of chronically elevated outgroup conflict on parental-care investment and hatching success.

Effects of chronically elevated outgroup conflict (red) relative to control conditions (blue) on: (A) number of clutch visits (N = 33 clutches); (B) number of clutch caring (cleaning and fanning) events per 10-min observation (N = 33 clutches); and (C) offspring hatching success (N = 21 clutches). All panels depict predicted means and associated 95% confidence intervals; dots are raw data.

Figure 4—source data 1

Number of clutch visits (n = 33 clutches) and number of clutch caring events (n = 33 clutches) provided to, and offspring hatching success (n = 21 clutches) of, clutches produced by Control and Intruded groups.

https://cdn.elifesciences.org/articles/72567/elife-72567-fig4-data1-v1.xlsx

Reproductive output

The absolute number of offspring surviving to 1 month in Experiment I was affected by the interaction between treatment and treatment duration (LMM: χ²₁ = 13.31, p < 0.001; Supplementary file 7; Figure 5A): the number of surviving young decreased with more exposure to outgroup conflict (Intruded treatment: PE = −1.82, 95% CI = −2.50 to −0.99, χ²₁ = 9.70, p = 0.002), but there was no significant change over time in the Control treatment (PE = 0.01, 95% CI = −0.49 to 0.52, χ²₁ = 0.003, p = 0.956). This amounted to about 1.8 fewer offspring surviving to 1 month per experimental day in the Intruded treatment relative to the Control treatment. Neither food competition nor heterospecific predation can explain the lower offspring number in the Intruded treatment, because all tanks were provided with food ad libitum daily and there were no predators in our laboratory setting. Adults in the Intruded treatment may have displayed filial cannibalism (Jindal et al., 2017), but we are not aware of reports of within-group fry or juvenile cannibalism in this species. Alternatively, low early-life survival may have resulted from stress effects on parents that reduced offspring quality (Anderson et al., 2020; Beldade et al., 2017; Love and Williams, 2008; McCormick, 1998), from stress transmission between group members that was particularly detrimental to young (Noguera et al., 2017), from the direct effects on offspring of experiencing outgroup conflict early in life (Jonsson and Jonsson, 2014), or from any combination of these factors.
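The "about 1.8 fewer offspring per experimental day" figure is the difference between the two simple slopes reported above. A minimal sketch of that arithmetic, assuming (as the linear model does) that both slopes are constant across the experiment; `projected_gap` is a hypothetical helper, and the projection says nothing about values outside the range of durations actually observed.

```python
# Simple slopes reported for the number of offspring surviving to 1 month
# (change per experimental day, from the LMM).
SLOPE_INTRUDED = -1.82
SLOPE_CONTROL = 0.01


def projected_gap(days):
    """Projected Intruded-minus-Control difference in offspring surviving to
    1 month per clutch after `days` of treatment, relative to the start,
    assuming both slopes are linear and constant over that span."""
    return (SLOPE_INTRUDED - SLOPE_CONTROL) * days


for days in (7, 30, 60):
    print(days, round(projected_gap(days), 1))
```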

Figure 5. Effect of chronically elevated outgroup conflict on reproductive output.

Effect of chronically elevated outgroup conflict (red) compared to control conditions (blue) on: (A) number of offspring surviving to 1 month (N = 32 clutches); (B) latency to first movement of offspring post-stimulus (N = 21 clutches); (C) mean offspring standard length (N = 24 clutches); and (D) mean offspring dry weight (N = 24 clutches). All panels depict predicted means and associated 95% confidence intervals; dots are partial residuals (A) or raw data (B–D).

Figure 5—source data 1

Number of offspring surviving to 1-month post hatching (n = 32 clutches); latency for first movement of offspring post-stimulus (n = 21 clutches) in a startling stimulus test; mean offspring standard length (n = 24 clutches); and mean offspring dry weight (n = 24 clutches), from clutches produced by Control and Intruded groups.

https://cdn.elifesciences.org/articles/72567/elife-72567-fig5-data1-v1.xlsx

To investigate offspring behaviour, we placed 5–10 young (aged 1 month) from Experiment I clutches in a novel environment (a 20 × 20 cm container). Following a 30-min settling period, there was no significant treatment difference in offspring baseline activity level (proportion of active individuals; LMM: χ²₁ = 1.69, p = 0.194; Supplementary file 8a) or mean nearest-neighbour distance (χ²₁ = 0.11, p = 0.754; Supplementary file 8b). After this undisturbed period, we assessed the response to a startle stimulus; normally, on detecting danger, young N. pulcher swiftly sink to the bottom and remain motionless (Taborsky, 1984; Watve and Taborsky, 2019). Treatment had no significant effect on offspring latency to freeze immediately post-stimulus (χ²₁ = 0.06, p = 0.806; Supplementary file 8c). However, the latency for the first offspring to move post-stimulus was affected by the interaction between treatment and treatment duration (χ²₁ = 4.84, p = 0.028; Supplementary file 8d; Figure 5B): post-stimulus latency to move decreased over time in the Intruded treatment (PE = −0.35, 95% CI = −0.68 to −0.07, χ²₁ = 5.43, p = 0.020), but there was no significant change in the Control treatment (PE = 0.29, 95% CI = −0.20 to 0.78, χ²₁ = 1.49, p = 0.223). The speed of return to activity is thought to determine vulnerability to predation, because prey movement enhances detection and attack rates by predators (Lima and Dill, 1990; Middlemis Maher et al., 2013).
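The two baseline measures above, proportion of active individuals and mean nearest-neighbour distance, can be computed from a single scored frame. The sketch below assumes that each young fish's position is available as (x, y) coordinates in cm and that activity is scored as a yes/no flag per individual; the scoring protocol itself is described in Materials and methods, not here, and the helper names and coordinates are hypothetical.

```python
import math
import statistics


def mean_nearest_neighbour_distance(positions):
    """Mean, over individuals, of the distance to each one's nearest neighbour.

    positions: list of (x, y) coordinates, e.g. in cm within the container.
    """
    if len(positions) < 2:
        raise ValueError("need at least two individuals")
    nearest = []
    for i, p in enumerate(positions):
        # Shortest Euclidean distance from individual i to any other individual.
        nearest.append(min(math.dist(p, q) for j, q in enumerate(positions) if j != i))
    return statistics.mean(nearest)


def proportion_active(moving_flags):
    """Share of individuals scored as active (True) in an observation."""
    return sum(moving_flags) / len(moving_flags)


# Hypothetical frame with five young in a 20 x 20 cm container.
frame = [(2.0, 3.5), (4.1, 3.0), (10.2, 12.8), (11.0, 14.1), (17.5, 6.3)]
print(round(mean_nearest_neighbour_distance(frame), 2))
print(proportion_active([True, False, True, True, False]))
```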

Immediately following the behavioural tests, we sacrificed the offspring to measure standard length and dry weight. Compared with Control offspring, Intruded offspring were 19% shorter (LMM: χ²₁ = 4.85, p = 0.028; Supplementary file 9a; Figure 5C) and 30% lighter, although the difference in weight was not statistically significant (χ²₁ = 2.89, p = 0.089; Supplementary file 9b; Figure 5D). The smaller size at 1 month of age indicates that outgroup conflict hinders early-life growth, despite unlimited access to food. Size in early life is considered a key determinant of survival (Ahnesjö, 1992; Candolin et al., 2022), for instance through lower susceptibility to starvation (Marsh, 1998) and greater ability to escape predators (Ahnesjö, 1992), and of reproductive success in adulthood (Royle et al., 2005). Compensatory growth does, however, occur in fishes (Eriksen et al., 2015; Royle et al., 2005). In the daffodil cichlid in particular, adult body size is a key determinant of lifetime reproductive success because it correlates positively with fecundity in females and with dominance acquisition in both sexes (Heg et al., 2011; Wong and Balshine, 2011).
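The "19% shorter" and "30% lighter" figures are differences expressed relative to the Control value. Whether they were computed from raw means or model-predicted means is not stated here, so the sketch below simply shows the relative-difference arithmetic with invented values; `percent_difference` is a hypothetical helper.

```python
def percent_difference(treatment_mean, control_mean):
    """Difference of a treatment mean relative to the control mean, in percent.

    Negative values mean the treatment mean is smaller than the control mean.
    """
    return (treatment_mean - control_mean) / control_mean * 100


# Hypothetical mean standard lengths (mm) of 1-month-old offspring.
print(round(percent_difference(treatment_mean=8.1, control_mean=10.0), 1))
```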

Discussion

Our experimental results show that chronic elevation of outgroup conflict negatively affected inter-clutch intervals and investment in egg quality; whilst there was a compensatory increase in parental care, the increased outgroup threat ultimately resulted in fewer, smaller offspring surviving to 1 month of age. Our findings are in line with previous correlational work showing that greater levels of intergroup threat were associated with reductions in chimpanzee reproductive rate and offspring survival (Lemoine et al., 2020), but contrast with the observational studies of crested macaques and banded mongooses that found a positive correlation between intergroup conflict and foetal survival (Kerhoas et al., 2014; Thompson et al., 2017). Tests of causal links between outgroup conflict and reproductive success have previously been lacking because of the inherent difficulty of carrying out long-term manipulations that allow cumulative fitness impacts to be assessed. We overcame this by using a model fish species that enables both extended manipulations in controlled conditions and the collection of behavioural and reproductive data (Wong and Balshine, 2011). In doing so, we showed that chronic outgroup conflict is likely to reduce fitness across multiple generations: adults suffer reductions in current reproductive output, whilst the lower quality of surviving offspring means that they are potentially less likely to achieve dominance, and thus direct reproductive success, later in life (Royle et al., 2005; Wong and Balshine, 2011).

We uncovered outgroup-conflict effects in several aspects of reproductive investment and output that became stronger the longer the experimental treatment continued. These findings highlight that, to ascertain the full fitness consequences in social animals, it is important to consider both direct and indirect effects and to assess cumulative impacts in addition to those arising from single contests (Morris-Drake et al., 2022). Since outgroup conflict can be a potent social stressor, capable of stimulating acute and chronic stress responses (Samuni et al., 2019), stress is a likely mechanism underpinning the reproductive consequences observed. Studies examining natural stress responses (Beldade et al., 2017; Creel et al., 2009), and experimental manipulations of glucocorticoids (Ensminger et al., 2018; Eriksen et al., 2015; McCormick, 1998), have demonstrated how stress can affect adult reproductive measures in ways similar to those uncovered in our work. For instance, chronic anemone bleaching and predation risk led to increased levels of stress hormones and reduced reproductive rates in the anemonefish, Amphiprion chrysopterus, and in elk, Cervus elaphus, respectively (Beldade et al., 2017; Creel et al., 2009). Experimental increases in maternal cortisol levels negatively impacted the egg protein content of eastern fence lizards and the larval length of the coral reef fish Pomacentrus amboinensis (Ensminger et al., 2018; McCormick, 1998). Higher rates of territorial defence have previously been associated with elevated cortisol levels in N. pulcher dominant females (Culbert et al., 2021), so treatment effects on maternal physiology could plausibly have contributed to the effects seen in our study. Likewise, early-life stressors can have long-lasting effects on offspring phenotypes and fitness (Antunes and Taborsky, 2020; Jonsson and Jonsson, 2014), so both direct and indirect effects may be important. 
Some of the fitness costs that we document here may also have been indirectly caused by outgroup conflict-mediated disruptions to within-group social relationships. Within-group aggression is promoted by the mere presence of neighbours in complex time-, rank-, and sex-dependent ways (Hellmann and Hamilton, 2019) and may be further dependent on the extent of interactions between residents and neighbours (Hamilton and Heg, 2005). Additionally, territorial intrusions can cause (at least) short-term changes to within-group social interactions (Braga Goncalves and Radford, 2019; Bruintjes et al., 2016). Cumulative effects of outgroup conflict are likely to have a wide influence, beyond reproductive success, and thus deserve further research attention moving forward (Morris-Drake et al., 2022).

There are obvious trade-offs in conducting captive versus field studies. In the context of cumulative outgroup-conflict consequences, long-term experimental manipulations are logistically and ethically challenging, which is why the few previous studies have reported only correlational data (Kerhoas et al., 2014; Lemoine et al., 2020; Thompson et al., 2017). Our captive experiments allowed tight control of the environment and close monitoring of a variety of response measures, but the ecological validity of the intrusion regime needs consideration. Whilst intrusions in the wild might be, on average, shorter than our 10-min experimental presentations, there are several reasons to believe that outgroup conflict could be more severe in natural conditions. In the wild, N. pulcher groups are often surrounded by several adjacent groups (Taborsky, 1984), so territories can be intruded upon from any direction by neighbours and by outsiders from further afield. Intrusions therefore likely take place several times daily, requiring continuous vigilance and quick behavioural responses, risking physical injury, and leaving residents more vulnerable to predation (Balshine-Earn et al., 1998; Balshine et al., 2001; Hess et al., 2016). Our setup also prevented individuals from injuring each other and the intruder from committing infanticide, sneaking a breeding attempt, or taking over part or all of the territory; plentiful food meant that individuals did not have to trade off foraging time against vigilance; and there was no predation. Thus, relative to wild conditions, our laboratory intrusions were likely longer but less frequent, came from only one pair of neighbours, and posed a relatively low threat to focal groups within an overall less challenging environment.
Whilst chronically increasing the perception of outgroup conflict allowed us to assess its cumulative indirect effects (beyond injury, death, and infanticide), our findings are potentially conservative. Ultimately, field experiments are needed to confirm the causal impact of chronic outgroup conflict on reproductive success, although many challenges will need to be overcome first.

We demonstrate that chronically increased outgroup conflict can reduce reproductive success, even in the absence of physical fights with rivals that may also result in injury, death, or loss of territory or breeding position (Goodall, 1986; Spong et al., 2008; Thompson et al., 2017), or cause offspring death (Dyble et al., 2019; Thompson et al., 2017). Outgroup conflict can clearly, therefore, influence fitness in myriad ways, lending support to the theory that it is likely a powerful selective force in social evolution with respect to, for example, group dynamics, social structure, and cooperation (Bowles, 2009; Gaston, 1978; Wrangham, 1980). Given the widespread taxonomic occurrence of outgroup conflict, we advocate further experimental testing of what is arguably the most neglected aspect of sociality.

Materials and methods

Fish husbandry

We used a captive population of daffodil cichlids, N. pulcher, at the University of Bristol; work was approved by the University of Bristol Ethical Committee (University Investigator Number: UB/16/049 + UB/19/059). All fish groups were housed in 70 l tanks (width × length × height: 30 × 61 × 38 cm) that formed their territory (as in Braga Goncalves et al., 2021; Braga Goncalves and Radford, 2019). Each tank contained 2–3 cm of sand (Sansibar river sand), a 75 W heater (Eheim), a filter (Eheim Ecco pro 130), a thermometer (Eheim), two flowerpot halves (10 cm wide) that served as breeding shelters at the centre of the territory, an artificial plant, and a small tube hanging close to the water surface to provide extra shelter for the subordinate in focal groups. Fish were fed twice daily: in the mornings from Monday to Friday, food alternated between frozen brine shrimp, water fleas, prawns, mosquito larvae, mysid shrimp, bloodworms, cichlid diet, spirulina, copepods and krill; in the evenings and at weekends, fish received dry fish flakes. Water temperature was kept constant (mean ± SE, Experiment I: 26.9 ± 0.08°C, Experiment II: 26.7 ± 0.10°C) and room lights were set on a 13 h light:11 h dark cycle (daylight from 7 am to 8 pm). Water quality tests (pH, nitrates, nitrites, conductivity, and ammonia) and 10% water changes were performed weekly to maintain water quality.

Experimental setup

In both experiments, we organised tanks into triplets (N = 10 per experiment), positioned in a line with their short sides about 0.5 cm apart so that neighbours could always see each other. In each triplet, the central tank housed a breeding pair that was a common neighbour to the two focal groups (each comprising a breeding pair and an adult helper) on either side. Groups of three, although at the low end of the natural range of group sizes for this species, are common in nature (Balshine et al., 2001) and frequently used in laboratory studies (Braga Goncalves and Radford, 2019; Fischer et al., 2017; Hamilton and Ligocki, 2012; Mileva et al., 2011; Zöttl et al., 2013). We formed groups separately for each experiment (with different fish used in the two experiments) using standard procedures (Braga Goncalves and Radford, 2019) approximately 2 weeks before the start of each experiment. In Experiment I, the 20 focal groups included 12 with female helpers and 8 with male helpers; within a triplet, both focal groups had same-sex helpers. In Experiment II, all 20 focal groups had female helpers. Female helpers were favoured because, although all subordinates face an increasing risk of eviction as they grow (Taborsky, 1985), males are more likely to parasitise reproductive events, which can result in their eviction from the group (Dierkes et al., 1999) and so render groups less stable; moreover, group size has been shown to affect egg parameters (Taborsky et al., 2007). To minimise same-sex aggression and to aid individual identification, we ensured that each dominant was at least 5 mm larger than the same-sex subordinate in its focal group (mean ± SE size difference, Experiment I: 10.7 ± 1.0 mm; Experiment II: 22.4 ± 1.3 mm). To minimise potential size-difference effects between treatments, and to control for the influence of female size on reproductive measures (Heg et al., 2011), we size-matched (by standard length) the breeders in all three tanks of a triplet.
At the start of Experiment I, focal-group dominant males were 55.2 ± 1.3 mm, dominant females were 53.4 ± 0.9 mm, subordinate males were 41.8 ± 0.9 mm and subordinate females were 44.3 ± 1.3 mm long, while males were 59.3 ± 2.6 mm and females were 54.8 ± 1.6 mm long in the middle-tank breeding pairs. At the start of Experiment II, focal-group dominant males were 73.1 ± 1.3 mm, dominant females were 64.4 ± 0.8 mm and subordinate females were 41.8 ± 1.0 mm long, while males were 71.4 ± 2.7 mm and females were 63.6 ± 1.8 mm long in the middle-tank breeding pairs.

For each triplet, we randomly allocated (by coin flip) the side tanks to one of two experimental treatments, Control and Intruded; the central tank provided the intruders used in the Intruded treatment. Experiment I spanned 13 weeks to allow for multiple spawning bouts; we destroyed clutches produced before groups had experienced at least three intrusions (Control = 1 clutch, Intruded = 2 clutches), as groups were unlikely to have been significantly affected by treatment within such a short timeframe. In Experiment II, groups received eight intrusions (or the equivalent Control procedure) during the first 13 days, and the first clutch produced by each group after this period marked the end of treatment for that group; 2 weeks represents approximately half a regular breeding cycle (Desjardins et al., 2011; Jindal et al., 2017). Eleven clutches laid during the initial 13 days of treatment were destroyed (Control = 8 clutches, Intruded = 3 clutches). Previous destruction of a clutch had no significant impact on egg dry weight (χ²₁ = 0.21, p = 0.650), egg volume (χ²₁ = 0.05, p = 0.833), lipid content (χ²₁ = 0.16, p = 0.692), or protein content (χ²₁ = 0.89, p = 0.345). Experiment II ran for 11 weeks to maximise the number of focal groups that produced a clutch.

Simulated territorial intrusions

We selected the side of the focal Intruded tank where each intrusion took place (near or far from the neighbour tank) and the identity of the intruder (male or female neighbour) pseudo-randomly: by coin flip, but with no more than four consecutive intrusions on the same side or by the same sex. In the wild, dominant individuals of both sexes undertake regular forays to nearby territories (Jungwirth et al., 2015) and may thus return to their own territories from any direction. At the start of an intrusion (or the equivalent procedure in Control tanks), we slid one transparent and one opaque flexible partition (0.75 mm white ViPrint) down single-channel PVC tracks glued to the long walls, 8 cm from the tank edge, creating a side compartment at the edge of the focal group's territory (Braga Goncalves et al., 2021; Braga Goncalves and Radford, 2019). We then netted the pre-selected neighbour and placed it in the side compartment of the focal tank, hidden from the resident group, for a 5-min settling period. Brief handling does not adversely affect behaviour in this species (Braga Goncalves and Radford, 2019; Mileva et al., 2009). After the settling period, we removed the opaque partition in the focal tank to reveal the intruder to the resident group for 10 min, a trial duration that falls comfortably within the range (5–20 min) often used in laboratory and field studies of this species (Desjardins et al., 2008; Jungwirth et al., 2015; Reyes-Contreras et al., 2019). At the end of the intrusion period, we replaced the opaque partition in the focal tank, placed another opaque partition between the focal and neighbour tanks, netted the intruder and returned it to the neighbour tank out of sight of the focal group. Concurrently, in the matched Control group, we conducted the same sequence of placement and removal of opaque and transparent partitions as in the Intruded tank, but without an intruder.
To minimise the impact of human presence on the behaviour of the experimental subjects, the experimenter hid behind a curtain during the period that the intruder was visible to the resident group.
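The pseudo-random rule for intrusion side and intruder sex (a coin flip, but never more than four consecutive intrusions on the same side or by the same sex) can be sketched as a constrained draw. This is a minimal Python illustration, not the authors' procedure: the helper names, drawing side and sex as two independent sequences, and forcing the opposite outcome when a fifth repeat would occur are all assumptions.

```python
import random


def constrained_sequence(n, options=("near", "far"), max_run=4, rng=None):
    """Draw n coin-flip outcomes, forbidding runs longer than max_run.

    Hypothetical helper illustrating 'by coin flip, but no more than four
    of the same side or sex in a row'; not the authors' own code.
    """
    if rng is None:
        rng = random.Random()
    seq = []
    for _ in range(n):
        choice = rng.choice(options)  # the fair coin flip
        # If the last max_run draws all equal this choice, take the other option.
        if len(seq) >= max_run and all(s == choice for s in seq[-max_run:]):
            choice = options[1] if choice == options[0] else options[0]
        seq.append(choice)
    return seq


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = run = 0
    prev = object()  # sentinel that never equals a real item
    for item in seq:
        run = run + 1 if item == prev else 1
        prev = item
        best = max(best, run)
    return best


# Example: independent schedules for side and intruder sex (hypothetical seeds).
sides = constrained_sequence(40, ("near", "far"), rng=random.Random(1))
sexes = constrained_sequence(40, ("male", "female"), rng=random.Random(2))
```

With only two options, forcing the opposite outcome is equivalent to re-flipping until the run is broken.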

In Experiment I, we filmed (Sony Handycam HDR-XR520) two simulated intrusions, one at the start (week 1) and one towards the end (week 11) of the study, to assess how focal groups responded to the presence of a neighbour in their territory and how their response changed over time. For consistency, all filmed intrusions used the female neighbour as the intruder, placed on the side of the tank closest to the neighbour tank. Similarly, we filmed the behavioural responses of the Control groups to the presence of the transparent partition in their territory. Subordinate sample size was smaller at the end of the study (N = 12) owing to deaths unrelated to the experiment (fish jumping out of the tank) and evictions (Control = 4, Intruded = 4). Following previously established protocols for this species (Reddon et al., 2015; Sopinka et al., 2009), we used JWatcher (v1.0; Macquarie University, Sydney, Australia) to record the frequencies of aggressive behaviours, including attacks (rams and bites) and aggressive displays (aggressive postures, frontal displays, and fast approaches), directed by each group member at the intruder or the transparent partition. From the Intruded videos, we also recorded intruder responsiveness towards the focal group by measuring the proportion of time the intruder was active and facing the focal group (as in Braga Goncalves and Radford, 2019).

Reproductive rate

To assess the impact of outgroup conflict on the likelihood of spawning, latency to spawn, number of clutches produced, and mean inter-clutch interval (Experiment I), we scanned all focal tanks every day for new clutches.
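The reproductive-rate measures follow directly from the days on which the daily scans found new clutches. A minimal Python sketch, assuming day 0 is the first day of treatment and that each group's clutch days are available as a list; the function name and input format are hypothetical.

```python
def spawning_summary(clutch_days):
    """Summarise daily-scan results for one focal group.

    clutch_days: hypothetical list of treatment days (day 0 = first day of
    treatment) on which a new clutch was found.
    Returns (latency to first clutch, number of clutches,
    mean inter-clutch interval, or None if there are fewer than two clutches).
    """
    days = sorted(clutch_days)
    if not days:
        return None, 0, None  # group never spawned
    # Intervals between successive clutches, in days.
    intervals = [later - earlier for earlier, later in zip(days, days[1:])]
    mean_interval = sum(intervals) / len(intervals) if intervals else None
    return days[0], len(days), mean_interval
```

For example, clutches found on days 10, 30 and 52 give a latency of 10 days, three clutches and a mean inter-clutch interval of 21 days.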

Investment in eggs

To evaluate the impact of outgroup conflict on clutch size, we photographed all clutches in Experiment I (N = 34) the day after they were laid and counted the eggs using ImageJ (version 1.46r, National Institutes of Health, USA). Hatching success could be assessed only in a subset of clutches (N = 21) because the hatchlings of several clutches took cover on the sand or on the plant and could not be counted reliably.

To evaluate the impact of outgroup conflict on egg size and nutritional content, we separated the following from each clutch in Experiment II: (1) 10–12 eggs for assessment of egg size; (2) two samples of 6–11 eggs for lipid analysis; and (3) two samples of 5 eggs for protein analysis. Two clutches (1 Control, 1 Intruded) were laid over 2 days; we collected two samples for egg-size assessment from these clutches. Samples (2) and (3) were stored at −20°C until extraction.

Using the egg-size sample, we measured the length and width of each egg with a stereo microscope (×2 magnification) and a graticule. We then calculated the effective diameter of each egg (i.e. its diameter if it were perfectly spherical) as the cube root of (length × width²), and used the effective diameter to calculate egg volume assuming a spherical shape (as per Coleman, 1984). We used all egg volumes from each sample in the analysis. After the eggs had been measured individually, all eggs from the same clutch were placed together in a petri dish with wax paper, dried in a heating cupboard at 70°C for 48 hr, weighed twice (ME5, Sartorius, Göttingen, Germany), returned to the heating cupboard for another 24 hr and weighed twice again; all four measurements were used in the analysis.
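The egg-volume calculation can be written out explicitly. In this Python sketch (function names are illustrative), the effective diameter is d = (L × W²)^(1/3) and the volume of a sphere with that diameter is πd³/6, which simplifies to πLW²/6, the volume of a prolate spheroid with axes L, W and W. Units follow whatever the graticule readings are converted to (e.g. lengths in mm give volumes in mm³).

```python
import math


def effective_diameter(length, width):
    """Diameter of the sphere with the same volume as the egg: (L * W**2) ** (1/3)."""
    return (length * width ** 2) ** (1 / 3)


def egg_volume(length, width):
    """Volume of a sphere of effective diameter d, i.e. pi * d**3 / 6 (= pi * L * W**2 / 6)."""
    d = effective_diameter(length, width)
    return math.pi * d ** 3 / 6
```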

To assess the total lipid content of eggs, we used the colorimetric sulfo-phosphato-vanillin method for microquantities (Alqurashi et al., 2019). Briefly, we dried the samples at 70°C for 24 hr and weighed them twice to the nearest microgram (CPA26P, Sartorius, Göttingen, Germany). We transferred the samples into 15 ml round-bottom glass test tubes (16 × 150 mm) and crushed them with a glass rod before adding 10 ml of chloroform–methanol solution (1:1, vol/vol). We transferred 0.5 ml of supernatant from each sample into new test tubes and placed them in a dry bath (LSE single block digital; Corning Ltd, Barry, UK) at 100°C for 15 min to evaporate the solvent. We then added 0.2 ml of sulphuric acid to the samples and returned them to the dry bath for a further 10 min. Once the samples had cooled to room temperature, we added 4.8 ml of vanillin–ortho-phosphoric acid (85%) reagent (1.2 g/l), mixed the samples thoroughly for 30 s with a vortex mixer (Bibby Scientific, Stone, UK), and transferred 1 ml of the solution into 1 ml polystyrene semi-micro cuvettes. Using a spectrophotometer (WPA Biowave UV/Vis; Biochrom Ltd, Cambridge, UK) at 525 nm, calibrated with a vanillin/phosphoric acid-only blank, we took three absorbance readings from each sample and calculated their mean. The mean absorbance of each sample was read against a standard curve to estimate its total lipid content (µg). The standard curve was prepared using eight serial dilutions of analytical soybean oil solution (0.917 g/ml, Sigma Aldrich) in methanol:chloroform (1:1).
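Reading a sample's mean absorbance against the standard curve amounts to fitting a line to the standards and inverting it. The Python sketch below assumes a linear relationship between absorbance and lipid amount over the range of the eight standards (the Methods do not specify the curve fit), and the function names are hypothetical. Estimates outside the range of the standards would be true extrapolations and are best avoided.

```python
def fit_standard_curve(amounts, absorbances):
    """Ordinary least-squares line: absorbance = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(readings, slope, intercept):
    """Average the replicate absorbance readings, then invert the fitted line."""
    mean_abs = sum(readings) / len(readings)
    return (mean_abs - intercept) / slope


# Usage with invented standards (illustration only, not the study's values):
# slope, intercept = fit_standard_curve([0, 5, 10, 15, 20, 25, 30, 35],
#                                       [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75])
# estimate_amount([0.24, 0.25, 0.26], slope, intercept)  # about 10 (same units as the standards)
```

The same fit-and-invert logic would apply to the Bradford protein standards, with any dilution factor applied afterwards.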

To assess the total protein content of eggs, we used the Bradford method (Bradford, 1976) following the specifications of Alasmari and Wall, 2020. The egg samples were transferred into 15 ml round-bottom borosilicate glass test tubes (16 × 150 mm) and crushed with a glass rod. To dissolve the sample, we added 0.5 ml of phosphate buffer (100 mM monopotassium phosphate [KH2PO4], 1 mM ethylenediaminetetraacetic acid, and 1 mM dithiothreitol in distilled water, mixed with an aqueous solution of dipotassium phosphate [K2HPO4] to achieve pH 7.4), followed by another 0.5 ml of buffer to rinse the glass rod. After thoroughly mixing the samples for 30 s with a vortex mixer (Bibby Scientific, Stone, UK), we transferred 0.1 ml into new tubes, added 2.9 ml of buffer, and vortexed the new solutions for another 30 s to obtain 1:30 dilutions. From each dilution, we transferred 1 ml into a new tube, added 1 ml of Bradford reagent, and vortexed the resulting solution for 1 min. After 5 min at room temperature, we transferred 1 ml of the solution into 1 ml polystyrene semi-micro cuvettes and read the absorbance on a spectrophotometer at 595 nm, calibrated with a buffer and Bradford reagent-only blank. Each sample was measured three times and the mean value was read against a standard curve to estimate the total protein content of the sample. The standard curve was prepared using eight serial dilutions of bovine serum albumin (1 mg/ml, Sigma), using volumes between 0 and 50 µl made up to a total volume of 1 ml with buffer. Protein content estimates were multiplied by 30 to express protein content per egg (µg).

Investment in parental care

To assess the impact of outgroup conflict on parental care, we filmed (Sony Handycam HDR-XR520) each focal group in Experiment I for 10 min on the morning after a clutch was laid, before they experienced an intrusion that day. Parental care at the egg stage, consisting of fanning and cleaning the eggs, aids embryonic development and survival (Taborsky, 1984). From the videos, we recorded the number of clutch visits (visits to within a body length of the clutch) and parental-care behaviours (egg-cleaning and clutch-fanning combined) displayed by all group members, as well as the total time spent in clutch visits and in parental-care behaviour.

Reproductive output

To assess the impact of outgroup conflict on offspring survival, we visually counted the surviving fry on the day they reached 1 month post-hatching in Experiment I (N = 32 broods). To count larger broods reliably, we temporarily divided the tanks into three sections using transparent partitions and counted the offspring in each section separately. All counts were done twice to confirm totals; where the two values differed, we counted the young a third time and used either the number confirmed by the third count or the mean of the three counts.
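One reading of the fry-counting rule (two counts; if they differ, a third count, taking the value it confirms or otherwise the mean of all three) is sketched below in Python. The interpretation that "confirmed" means the third count matches one of the first two is an assumption, and the function name is hypothetical.

```python
def reconcile_counts(first, second, third=None):
    """Hypothetical reconciliation of repeated fry counts for one brood."""
    if first == second:
        return first  # the two initial counts agree
    if third is None:
        raise ValueError("counts differ: a third count is needed")
    if third in (first, second):
        return third  # the third count confirms one of the earlier counts
    return (first + second + third) / 3  # no two counts agree: use the mean
```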

To assess the impacts of outgroup conflict on offspring activity levels and response to a sudden stimulus, we tested offspring at 1 month post-hatching. At this age, offspring actively explore the territory (i.e. swim around the tank) and, when they perceive a conflict, they sink to the substrate and remain immobile to blend in with the sandy background (Taborsky, 1984; Watve and Taborsky, 2019). A sample of 5–10 offspring from each clutch (N = 21 clutches) was transferred from their home tank to a test container (20 × 20 × 10 cm) filled with 3 l of water from the home tank, and left to settle for 30 min before the start of a trial. Each trial was filmed (Sony Handycam HDR-XR520) from above. After an initial undisturbed 5-min pre-stimulus period, we released a small glass marble down a 60-cm plastic tube positioned at a right angle to, and touching, the side of the container, so that the marble produced sudden vibrations and noise as it hit the container. We then filmed offspring behaviour for a 5-min post-stimulus period. We calculated the mean pre-stimulus activity level by observing 3 s of film every 20 s and recording how many offspring were actively swimming during that time. We also took screenshots every 20 s during the 5-min pre-stimulus period, from which we measured the mean nearest-neighbour distance of the offspring. From the post-stimulus period, we recorded the latency for all individuals to become immobile in response to the stimulus and the latency for the first offspring to become active again.
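The two pre-stimulus measures can be computed as follows. In this minimal Python sketch, a 5-min period scanned every 20 s yields roughly 15 samples; each activity sample is the number of fry seen swimming in its 3-s window, and each screenshot is represented as a list of hypothetical (x, y) fry positions, since the Methods do not state how positions were digitised. A fry's nearest-neighbour distance is its distance to the closest other fry; these are averaged within a screenshot, and averaging the screenshot means (not shown) would give a per-trial value.

```python
import math


def mean_activity(active_counts):
    """Mean number of fry swimming across the scan samples (one per 20 s)."""
    return sum(active_counts) / len(active_counts)


def mean_nearest_neighbour_distance(points):
    """Mean, over fry in one screenshot, of the distance to the closest other fry."""
    if len(points) < 2:
        raise ValueError("need at least two fry")
    nearest = []
    for i, p in enumerate(points):
        nearest.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    return sum(nearest) / len(nearest)
```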

After the startle-stimulus trials, we euthanised the offspring with an overdose of tricaine methanesulfonate (MS222, 12 ml/100 ml tank water) and stored them in a 30% ethanol freshwater solution, which has been shown to preserve body tissues while minimising shrinkage (Gagliano et al., 2006). We measured offspring standard length (from the tip of the snout to the end of the caudal peduncle, ±0.01 mm) using Leica Application Suite (version 4.4.0 [Build:454], Leica Microsystems Limited, Switzerland) connected to a camera (Greenough Stereozoom ×0.8 manual) mounted on a stereo microscope (EZ4 HD, eyepiece ×10/21B). Immediately after measuring the offspring, we dried them for 36 hr at 70°C before weighing them individually on a Mettler scale (AE260, Delta Range, ±0.1 mg). We then used the individual measurements to calculate mean offspring standard length and dry weight per clutch.

Statistical analyses

All statistical analyses were conducted using RStudio (version 1.2.5033, RStudio, 2020). In Experiment I, we used paired t-tests where datasets met the assumptions of parametric testing, and Wilcoxon signed-rank tests where they did not, to compare intruder behaviour and defensive contributions between treatments and within individuals between the start and end of the study; p-values for the within-individual comparisons were adjusted using the Bonferroni–Holm sequential method (Dobson and Barnett, 2008; Holm, 1979). We used a McNemar test and a Wilcoxon rank sum test to assess treatment differences in the likelihood of spawning and in the number of clutches produced, respectively.
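The Bonferroni–Holm sequential method (Holm, 1979) ranks the m p-values from smallest to largest, multiplies the i-th smallest by (m − i + 1), caps the result at 1, and then enforces monotonicity so that no adjusted value is smaller than the one before it. A minimal Python sketch of that procedure; it should agree with R's p.adjust(p, method = "holm"), although the Methods do not say which implementation the authors used.

```python
def holm_adjust(pvalues):
    """Holm (1979) step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # adjusted p-values never decrease
        adjusted[i] = running_max
    return adjusted
```

For example, raw p-values of 0.01, 0.04 and 0.03 become 0.03, 0.06 and 0.06.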

For all other analyses, we used mixed-effects models. We visually assessed model assumptions and performance using the package ‘performance’ (Lüdecke et al., 2021). Where appropriate, we fitted linear mixed models (LMMs) with Gaussian error distributions to the raw data (package ‘lme4’ version 1.1-21, Bates et al., 2015). Where datasets did not meet the assumptions of linear models, we fitted appropriate generalised linear mixed models (GLMMs) where possible: a GLMM with a negative binomial error distribution and log-link function (package ‘glmmTMB’ version 1.1.2.3, Brooks et al., 2017) for the analyses of the number of clutch visits and clutch-care events; and a GLMM with a Gaussian error distribution and log-link function (package ‘lme4’ version 1.1-21, Bates et al., 2015) for the analyses of time spent on clutch care. We log10-transformed latency to spawn to meet linearity assumptions. The analysis of the number of offspring surviving to 1 month of age included clutch size at laying as an offset. All models included tank-triplet identity and, where relevant, focal-group identity nested within tank-triplet identity as random factors, to control for the shared neighbours within each triplet and for repeated observations of groups, respectively. Our main fixed factors of interest were treatment (Intruded and Control) and its interaction with treatment duration (a covariate); we controlled for the effects of several other covariates in different models as appropriate (full details of the factors used in each analysis are provided in Supplementary file 1).
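Including clutch size at laying as an offset turns a model of survivor counts into a model of survival per egg laid. Assuming the offset enters on the log-link scale as log(clutch size), the usual convention for count models with a log link (the Methods do not state the error family for this analysis), the expected count is clutch size × exp(linear predictor), with the offset's coefficient fixed at 1 rather than estimated. A minimal Python sketch with a hypothetical function name:

```python
import math


def expected_survivors(clutch_size, linear_predictor):
    """Expected number of 1-month survivors under a log link with an offset.

    Assumes log(mu) = log(clutch_size) + linear_predictor, i.e. the offset
    log(clutch_size) has its coefficient fixed at 1.
    """
    return clutch_size * math.exp(linear_predictor)
```

Under this assumption, doubling clutch size doubles the expected number of survivors, and a treatment coefficient b multiplies expected survival per egg by exp(b) regardless of clutch size.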

In each model, we assessed term significance by comparing models with and without the specific term using likelihood ratio tests (chi-square tests using the R function ‘anova’; Dobson and Barnett, 2008). Non-significant interaction terms were removed to enable us to assess the effects of the main factors independently (Dobson and Barnett, 2008; Engqvist, 2005); the resulting final models contained all main factors and significant interaction terms. Significant interactions between treatment and treatment duration were teased apart by analysing the effect of treatment duration in each treatment separately. In the main text, we provide significant parameter estimates (PEs) and associated 95% confidence intervals (CIs), or treatment effect sizes, for the main factors of interest (i.e. treatment and its interaction with treatment duration); all PEs, CIs, and associated statistical outputs are provided in the Supplementary files.
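Each likelihood ratio test compares the maximised log-likelihoods of the models with and without a term: the statistic is 2 × (logLik_full − logLik_reduced), referred to a chi-square distribution with degrees of freedom equal to the number of parameters dropped. The minimal Python sketch below covers only the one-degree-of-freedom case (for example, dropping the treatment × duration interaction when duration is a continuous covariate), using the fact that a chi-square variable with 1 df is the square of a standard normal; for other df, scipy.stats.chi2.sf would be the usual tool. By default, lme4's anova() refits REML-fitted models by maximum likelihood before comparing them.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic: twice the gain in maximised log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x):
    """Upper-tail p-value for chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))
```

For example, chi2_sf_df1(3.8415) is about 0.05, and chi2_sf_df1(0.21) is about 0.65, in line with the χ²₁ = 0.21, p = 0.650 reported for egg dry weight in Experiment II.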

Data availability

Source data files have been uploaded for each of our figures and all other datasets are uploaded as Supplementary files.

References

  1. Blaxter JHS (1969) 4 Development: eggs and larvae. In: Hoar WS, Randall DJ, editors. Fish Physiology. Academic Press. pp. 177–252. https://doi.org/10.1016/S1546-5098(08)60114-4
  2. Goodall J (1986) The Chimpanzees of Gombe: Patterns of Behavior. Belknap Press.
  3. Holm S (1979) A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6:65–70.
  4. Roff DA (1992) The Evolution of Life Histories: Theory and Analysis. Chapman & Hall.
  5. Taborsky M, Limberger D (1981) Helpers in fish. Behavioral Ecology and Sociobiology 8:143–145. https://doi.org/10.1007/BF00300826

Decision letter

  1. Jenny Tung
    Reviewing Editor; Duke University, United States
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Biology, Tübingen, Germany
  3. Mark Dyble
    Reviewer; University College London, United Kingdom

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Experimental evidence that chronic outgroup conflict reduces reproductive success in a cooperatively breeding fish" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Detlef Weigel as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Mark Dyble (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Eliminate the stepwise model selection procedure to minimize the risk of false positives. Report either betas and p-values from the full model, or the result of removing the effect of interest from the full model and comparing model fit (e.g., using likelihood ratio tests).

2) Explain the experimental paradigm early in the manuscript to facilitate interpretation by readers. Key points include the fact that the intrusion paradigm involved no actual physical contact, and that in Experiment 2, all clutches prior to Day 13 were destroyed. Clearly motivate the analyses that are later presented in the Results section by reference to description of the experimental paradigm.

3) Intergroup conflict effects on the number of surviving offspring, the closest proxy to fitness used here, are one of the most important results of the paper. If possible, present a straightforward comparison of this outcome for the treatment versus control groups across the duration of the experiment, along with a simple analysis of the magnitude of this difference (e.g., using a paired t-test). Clarify the degree to which conflict effects on surviving offspring are magnified over time (e.g., in terms of the % of additional surviving offspring in the control group, across a given number of days since the start of the experiment).

4) Pay close attention to the reviewer recommendations on statistical analysis below, especially consistency in declaring effects "significant"/"near significant" and reporting of "near-significant" effects.

Reviewer #2 (Recommendations for the authors):

– To address my concern about stepwise model reduction, the authors should report the significance of each effect by comparing the fit of a full model to that of a model without that effect. A second option would be to remove only non-significant interaction terms, if the goal is mainly to investigate the fixed effects (Engqvist 2005, Anim. Behav.; but see Schielzeth 2010, Methods Ecol. Evol.). I often take the second approach in my own work, but am realizing that the first approach may be more appropriate.

– To address my concern about a lack of clear hypotheses, the authors could outline why each specific metric was measured and why each relates to fitness early on (e.g., at the beginning of each paragraph or at the end of the Introduction).

Comments below are line-by-line

Title: "outgroup conflict" might be a less widely-used term than "intergroup conflict" in this literature. For example, the titles of most of the first 10 references in this paper use "intergroup" instead of "outgroup". Consider using "intergroup" instead of "outgroup".

– Line 43: consider citing Green et al., 2020, which is a review of intergroup contest studies and how they focus on factors like the determinants of intergroup contest success. Green PA, Briffa M, Cant MA. 2020. Assessment during intergroup contests. Trends Ecol Evol. 36(2):139-150.

– Line 47: consider citing Preston et al., 2020, which looked at short-term and longer-term (3 days) changes in grooming and aggressive behavior within groups after simulated intergroup conflict. Preston EFR, Thompson FJ, Ellis S, Kyambulima S, Croft DP, Cant MA. 2020. Network‐level consequences of outgroup threats in banded mongooses: Grooming and aggression between the sexes. J Anim Ecol.

– Lines 54-55: I had trouble telling whether there was a timescale difference in these three studies. Did the studies truly differ in a focus on "foetal survival" (first two) versus "infant survival" (third)?

– Lines 56-57: It would be useful to have some more detail about why we really need these experiments. What is missing from those observational studies that an experimental study could really help us learn? Put another way, can you give more details on the importance of this "causal link"?

– Lines 88-96: It's unclear to me from these results and Figure 2 why a Wilcoxon test versus a t-test was used for certain tests. My assumption is that this relates to the normality of the data distributions. Perhaps a simpler way to get at the same question would be to build a linear mixed model predicting # of defensive behaviors from treatment, time, and individual type (dominant male/female, subordinate)?

– Results in general: I'm a bit confused as to why significant effects are reported for certain interaction terms, and then also for each level of the interaction. For example, lines 180-184 detail the significant interaction (and associated P-value from a Chi-squared test), as well as reporting t-values and P-values from each level of the interaction (e.g., treatment duration effects within each treatment). I believe just reporting the P-value from the Chi-squared test for the interaction term shows that the interaction is significant, which should then be followed by simpler reporting of the estimate for each level of the interaction. It seems unnecessary to test whether each level of the interaction is also significant.

– Results in general: There are several examples where the authors report "trending" or nearly-significant results that match their predictions (e.g., lines 161-163; 193-194; 216-217; 266). However, there are also instances in the reporting of the statistical models in which similarly trending factors were removed from statistical models (e.g., Table S2; S4; S6; S8; S9). I believe some of these removed values were reported in the results (e.g., line 266 matches to Table S9). However, it is confusing to understand why certain terms were removed from the statistical models, especially if they are reported in the Results (i.e. we can assume the authors consider these are important). See above for comments regarding statistical analyses.

– Figure 3 and all scatterplots: are these lines showing best fit from the LMM, i.e. the estimates from the LMM? Or are they plotting simply using a function like geom_smooth in ggplot? Showing the actual predicted estimates from the model on the scatterplot would give the reader a better idea of both the raw data (from the points) and the model estimates (from the lines and associated ribbons of confidence intervals). There are also methods to do this for boxplots. All of these methods can be accomplished using the "effects" package in R.

– Lines 166-168: This is a bit confusing, because I thought the protein and lipid analyses were only conducted in Experiment II on eggs from the first clutch? So, how can conclusions about protein/lipids in the eggs from later clutches be made if only the first clutch was analyzed? Or, is this because clutches were analyzed across groups, i.e., several clutches for a given focal group were not analyzed, but across the entire experiment the first clutch from each group gave a time effect?

– Figure 4B: There appears to be an outlier with an incredibly high number of cleaning events at approximately 60 days treatment duration in the Intruded group. It would be useful to know if there are any biological differences between this observation and the others. Also, if this is a statistical outlier, it would be useful to know whether removing it affects the results. While I don't suggest the removal of statistical outliers simply because they're outliers (i.e., there should be a biological reason for data to be removed), it's important to take note of any outlier values like this.

– Lines 219-224: I might be being dense here, but I'm unsure of what this sentence is saying. How is the relationship (or lack thereof) between the number of hatched fry and the number of offspring reaching 1 month of age not offspring survival? It seems from Table S7 that there was a difference between offspring survival to 1 month and survival from hatching to one month. I guess I'm having trouble understanding why we need both of these measures.

– Lines 257-259: Could another interpretation of the return to activity data be regarding boldness? That is, offspring from Intruded groups were bolder later on in the experiment?

– Lines 277-280: However, your results are counter to those of other studies you described in the Introduction, which found no or positive effects of intergroup conflict on reproduction and survival. Consider re-reporting that work here, as well, for consistency and clarity.

– Paragraph lines 311-340: I'm a bit confused as to the point of this paragraph. I understand the authors may want to compare the pros and cons of their experimental approach to a more field-based, correlational approach. However, the result of this comparison mainly convinces me that the experiment under-stressed the study animals, as compared to what they might experience in the wild. Further, the reporting of the lack of impacts on adult body mass, egg mass, etc. essentially makes the point that the treatment did not affect groups much. This is counter to the rest of the paper's point, showing that groups were significantly affected. Do the authors have any hypotheses as to why they found specific results for the variables they did (e.g., egg protein, but not egg mass)? Do the authors have any thoughts or data on the relative reproductive success of these experimental groups compared to groups in the wild? I.e., are these experiments actually replicating stressors in the wild?

Methods line-by-line

– Lines 408-411: If one of the points of the present experiment was understanding the effects of short- versus long-term intergroup conflict, why destroy clutches produced before three intrusions? Wouldn't these give valuable information on short-term effects (or the lack thereof)?

– Line 549: was the number of offspring in the experiment considered as a factor predicting responses? In Table S8, I believe "number of offspring" is used as a fixed effect, but I was unclear as to whether that was number of offspring produced, or number used in the startle experiment. One might expect that the experimentally-designed groups with 10 offspring were different from those with 5.

– Lines 550-551: Was there any sand in the bottom of this startle tank? I would imagine that offspring sinking responses, if meant to blend into the substrate, would be different if there were no substrate to blend in to.

– Lines 585-589: Stepwise model reduction has come under heavy criticism in the past 15 years. See, for example, Forstmeier and Schielzeth (2011), Mundry and Nunn (2009), and Harrison et al., (2018). This process can greatly increase the likelihood of false-positive results, and can inflate parameter estimates. To my knowledge, removal of non-significant interaction terms is OK, if one is interested in achieving accurate parameter estimates for main effects. However, it is rarely advised to remove main effects. Ideally, the authors would report significance tests from the full model, or use information-theoretic approaches. However, the authors could consider removing only non-significant interaction terms, as a means of getting better estimates for the main effects (Engqvist 2005).

Forstmeier W, Schielzeth H. 2011. Cryptic multiple hypotheses testing in linear models: Overestimated effect sizes and the winner's curse. Behav Ecol Sociobiol. 65(1):47-55.

Mundry R, Nunn CL. 2009. Stepwise model fitting and statistical inference: Turning noise into signal pollution. Am Nat. 173(1):119-123.

Harrison XA, Donaldson L, Correa-Cano ME, Evans J, Fisher DN, Goodwin CED, Robinson BS, Hodgson DJ, Inger R. 2018. A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ. 6:e4794.

Engqvist L. 2005. The mistreatment of covariate interaction terms in linear model analyses of behavioural and evolutionary ecology studies. Anim Behav. 70(4):967-971.

– Lines 603-607: I would appreciate learning more about this approach from the authors. It's very unclear to me why you wouldn't simply fit an appropriate response variable distribution in the initial model fitting step. For example, why not just use a Poisson distribution for the number of caring events response variable? This reads as though all response variables were forced into a Gaussian distribution. Other distributions are appropriate for other types of data, and do not require this step. I might be misunderstanding, however.
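To make this suggestion concrete, here is a minimal Python sketch (standard library only) of fitting a Poisson model directly to counts rather than transforming them towards normality. It assumes one count of caring events per observation, a single two-level treatment factor and no random effects or covariates, so it is much simpler than a mixed model; the function names and counts are hypothetical, not data from the study. Because the maximum-likelihood Poisson rate is the sample mean, a pooled-rate model and a rate-per-treatment model can be compared by a likelihood ratio test in closed form.

```python
import math

def poisson_loglik(counts, rate):
    """Poisson log-likelihood of counts under a single rate parameter."""
    # y * log(rate) is taken as 0 when y == 0, which also covers rate == 0.
    return sum((y * math.log(rate) if y > 0 else 0.0) - rate - math.lgamma(y + 1)
               for y in counts)

def poisson_treatment_lrt(control, intruded):
    """Likelihood ratio test (1 df) for a difference in Poisson rates between two treatments."""
    pooled = control + intruded
    ll_null = poisson_loglik(pooled, sum(pooled) / len(pooled))  # one shared rate
    ll_alt = (poisson_loglik(control, sum(control) / len(control))
              + poisson_loglik(intruded, sum(intruded) / len(intruded)))  # one rate per treatment
    stat = max(2.0 * (ll_alt - ll_null), 0.0)  # guard against rounding just below 0
    # Upper tail of a chi-squared distribution with 1 df: erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts of caring events per observation session.
stat, p = poisson_treatment_lrt([4, 6, 5, 7, 3], [8, 9, 7, 10, 6])
print(f"LRT = {stat:.2f}, P = {p:.3f}")
```

In a real analysis the same idea would be applied with a generalised linear mixed model, so that repeated observations of the same group are handled by a random effect.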

– Lines 609-612: I'm unclear as to why these clutches with zero hatching success were removed. The reasons these clutches had zero hatching success seem incredibly relevant to the study: if a clutch was eaten it seems evidence of cannibalism by the focal group (see lines 227-230 for a statement relevant to cannibalism in this species). If the clutch had a fungal infection, it would seem that the focal group's lack of care could have resulted in an infection.

Reviewer #3 (Recommendations for the authors):

My recommendations are superficial. My main comment is that I was frustrated by the cursory description of the experimental set up prior to the results. I know the eLife format is for the methods to go at the end of the manuscript but I really do think that at the very least the authors ought to explain how the simulated intrusions were enacted. The results didn't really come into focus for me until after I had read the methods.

Line comments:

L70: on natural behaviour in the lab: have multiple groups been kept in the same tank and formed territories? If so, would be worth mentioning.

L84: This is where I would like to know how these territorial intrusions were simulated.

L100: In general, the paper is clearly written. However, I found this paragraph confusing and had to re-read several times. The description of the experimental set-up in the methods, in contrast, reads very clearly.

L126: '40%'. Can you give a 95% CI here?

L423-441: This is the critical detail on how the incursions were set up that I would have liked to see in the main text.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Experimental evidence that chronic outgroup conflict reduces reproductive success in a cooperatively breeding fish" for further consideration by eLife. Your revised article has been evaluated by Detlef Weigel (Senior Editor) and a Reviewing Editor.

The manuscript is much improved and close to publication-ready. However, the reviewers offered a few suggestions that may be helpful in communicating with your audience, so we are returning the draft to you to address these comments at your discretion.

1. Addressing Reviewer 2's notes about consistent reporting of significant parameter estimates.

2. Clarifying the set of behaviors coded as "defensive action".

Reviewer #2 (Recommendations for the authors):

Overall, I think the authors have done a great job of addressing my and the other reviewers' concerns. I am glad that the statistical suggestions have helped simplify the message, which is still an engaging and exciting one. I also greatly appreciate the laying out of predictions and the details of the methods early on. This will hopefully help readers get oriented. I have one main point, which I lay out first, and then a few line-by-line comments. I don't believe any of these comments are likely to change the main message of the manuscript.

Main comment:

On line ~680-681 in the Methods, the authors say they "provide significant parameter estimates…in the main text." However, looking at the Supplemental Tables in comparison to what is reported in the main text, only some significant effects are reported in the main text, not all. I'm curious as to why this is. For example, in the analysis of inter-clutch interval (Supp Table 1), female size had a highly significant effect (P = 0.017) but only the effect of treatment was discussed in the Results (lines 180-181). Similarly, in the analysis of egg investment, only the near-significant effect of the interaction between treatment and treatment duration was discussed (lines 195-197), but not the significant fixed effect of treatment. Also, an effect of clutch size on protein content and an effect of treatment x female size on protein content (Supp Table 3) were unreported in the Results. If these significant, but unreported, effects are not of interest, then one might ask why they were included in the model in the first place (aside from where necessary, like the treatment main effect when there is an interaction of interest). Or, are these omissions due to space constraints? There is likely a simple way to include a fuller reporting of these results. E.g., around line 195 the authors could report that "…there was a significant effect of treatment duration on clutch size (STATS HERE). This was likely driven by the interaction between treatment and treatment duration, in which Intruded females…." Or, an alternative would be in the Methods to state clearly that the authors only report selected results relevant to their hypotheses, but that some significant findings are reported only in the Supplementary Tables.

Line by line:

Lines 103 and 107: For the other predictions the authors make directional predictions, e.g. a negative effect. Can they specify the direction of the effect here? Or, are there no directional predictions?

Lines 132-151: These details are incredibly helpful! Thanks for including them.

Figure 3D: There are two red points at the end of treatment duration, right in the path of the line for the control (blue) effect. Are these points miscolored? It could just be an incredible chance that the line for control goes right through these intruded points, but I would also expect the 95% CI ribbon for intruded to be wider if these points are from the intruded treatment.

Supp Table 4A, Figure 4A, and lines 656-663: I could be wrong, but I would imagine that the number of clutch visits analysis should also be done using negative binomial error distribution, or a Poisson distribution. It is also count data, just like the number of clutch caring events.
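One way to choose between the two distributions mentioned here is to check a Poisson fit for overdispersion: the Pearson chi-squared statistic divided by its residual degrees of freedom should be close to 1 if the Poisson variance assumption holds, whereas values well above 1 favour a negative binomial model. A minimal Python sketch, assuming a Poisson model with one rate per treatment and no random effects; the function name and counts are hypothetical, and no particular cut-off is implied.

```python
def pearson_dispersion(groups):
    """Pearson chi-squared / residual df for a Poisson model with one rate per group."""
    chi2, n_obs = 0.0, 0
    for counts in groups:
        mu = sum(counts) / len(counts)  # maximum-likelihood Poisson rate for this group
        n_obs += len(counts)
        if mu == 0:
            continue  # an all-zero group has zero residuals
        chi2 += sum((y - mu) ** 2 / mu for y in counts)
    return chi2 / (n_obs - len(groups))  # one fitted rate per group

# Hypothetical clutch-visit counts for Control and Intruded groups.
print(pearson_dispersion([[4, 6, 5, 7, 3], [8, 9, 7, 10, 6]]))     # well below 1
print(pearson_dispersion([[0, 12, 1, 15, 2], [20, 1, 3, 25, 1]]))  # well above 1
```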

All figures with categorical predictors (e.g., "treatment"): It would be useful to include some type of asterisk system (e.g., * = P < 0.05, ** = P < 0.01, NS = P > 0.05…) for delineating P values in these figures. Especially in Figure 5, where 5C is significant (P = 0.028) and 5D is not (P = 0.089), the reader cannot see these differences at a glance. The authors do something like this in Figure 2, but not in other figures. Of course, we can discuss issues about whether P values really represent biological significance, but since that's the framework we're working under here, it would be useful to reflect in the figures.
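The asterisk scheme described here can be expressed as a small lookup. A minimal Python sketch; the function name is hypothetical, the *** level for P < 0.001 is a common convention added as an assumption, and P exactly equal to 0.05 is treated as NS.

```python
def significance_label(p):
    """Map a P value to an asterisk code for annotating figure panels."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie in [0, 1]")
    if p < 0.001:
        return "***"  # assumed extra level, not stated in the comment
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"

print(significance_label(0.028))  # "*"  (Figure 5C)
print(significance_label(0.089))  # "NS" (Figure 5D)
```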

Lines 332-336: Though I think this point is likely correct, the authors did not actually show how the experiment impacted the later dominance and reproductive success of the offspring. Therefore, I might suggest revising this to "…outgroup conflict likely negatively impacts the fitness of multiple generations…."

Methods lines 534-536: This is incredibly nit-picky, but I'm mostly curious. Why were there such varying numbers of eggs removed for the lipid versus protein analyses?

Reviewer #3 (Recommendations for the authors):

I enjoyed reading this paper again, and the authors now do a much better job of explaining the experimental set-up prior to the results. The only thing I still think is missing is a brief description (prior to the results) of how the animals behave during the experimentally induced conflicts and a definition of what "defensive actions" involve. A few small additions should suffice, and add important context. I would suggest promoting the descriptions of the behaviours coded as defensive (headbutting, displays, etc) from L514 to the results paragraph starting at L129. It would also be good to hear a little more about the behaviour of the "intruder". Did they frequently antagonise their neighbours, or exhibit aggressive displays or attacks?

Otherwise, the authors have satisfactorily addressed my comments and, for my part, I see no reason that this paper shouldn't be published – it is very interesting!

https://doi.org/10.7554/eLife.72567.sa1

Author response

Essential revisions:

1) Eliminate stepwise model selection procedure to minimize the risk of false positives. Report either betas and p-values from the full model, or via removal of the effect of interest from the full model, followed by comparison of model fit (e.g., using likelihood ratio tests).

We have revised our statistical analyses as requested. We have removed only non-significant interaction terms from our models (as per Engqvist 2005 and the suggestion of Reviewer #2 below), before reporting the statistical output from the remaining final models (lines 665–668). All factor significance assessments were done using likelihood ratio tests (LRTs) (lines 663–665). Throughout the main manuscript, we report parameter estimates and associated 95% confidence intervals, and the results of LRTs for treatment and, where significant or near-significant, its interaction with treatment duration (lines 144–147). When an interaction between treatment and treatment duration was found to be statistically significant, we further present the parameter estimates, confidence intervals and LRTs for each treatment separately (lines 668–670). Full outputs from analyses are presented in the Supplementary Tables (lines 147–148).
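As a hedged illustration of the procedure described above (compare a model with the treatment × treatment duration interaction against one without it using an LRT, and drop the interaction only if it is non-significant), the Python sketch below uses ordinary least squares with a single continuous covariate and no random effects, so it is far simpler than the mixed models in the paper. With Gaussian errors and maximum-likelihood variance estimates, the LRT statistic reduces to n·ln(RSS_reduced / RSS_full), compared with a chi-squared distribution on 1 df. The function names and toy data are hypothetical.

```python
import math

def _centred_sums(xs, ys):
    """Return (Sxx, Sxy, Syy) about the means of xs and ys."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy

def interaction_lrt(groups):
    """LRT for a treatment x duration interaction in a Gaussian linear model.

    groups maps treatment -> (durations, responses). The full model fits a
    separate intercept and slope per treatment; the reduced model keeps the
    separate intercepts but forces one common slope (drops the interaction).
    """
    if len(groups) != 2:
        raise NotImplementedError("sketch handles two treatments (1 df) only")
    sums = [_centred_sums(xs, ys) for xs, ys in groups.values()]
    n = sum(len(xs) for xs, _ in groups.values())
    rss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    sxx_tot = sum(s[0] for s in sums)
    sxy_tot = sum(s[1] for s in sums)
    rss_reduced = sum(s[2] for s in sums) - sxy_tot ** 2 / sxx_tot  # pooled slope
    stat = max(n * math.log(rss_reduced / rss_full), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-squared(1) upper tail

toy = {"Control": ([10, 20, 30, 40, 50], [5.0, 5.1, 4.9, 5.2, 5.0]),
       "Intruded": ([10, 20, 30, 40, 50], [5.1, 4.7, 4.4, 4.0, 3.6])}
stat, p = interaction_lrt(toy)
print(f"interaction LRT = {stat:.2f}, P = {p:.2g}")
```

If P were non-significant, the interaction would be dropped and treatment assessed on its own from the reduced model; the sketch does not handle a perfect fit (zero residual sum of squares).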

2) Explain the experimental paradigm early in the manuscript to facilitate interpretation by readers. Key points include the fact that the intrusion paradigm involved no actual physical contact, and that in Experiment 2, all clutches prior to Day 13 were destroyed. Clearly motivate the analyses that are later presented in the Results section by reference to description of the experimental paradigm.

We have included explanation of the experimental paradigm at the start of the Results section (line 114ff.), including key requested information that there could be no physical contact between focal groups and intruders (lines 130–132), and that in Experiment II clutches prior to Day 13 were removed (lines 141–144). We have included the reason for each of the experiments, in terms of what response measurements were obtained in each case (Experiment I: lines 135–137; Experiment II: lines 139–141).

3) Intergroup conflict effects on the number of surviving offspring, the closest proxy to fitness used here, are one of the most important results of the paper. If possible, present a straightforward comparison of this outcome for the treatment versus control groups across the duration of the experiment, along with a simple analysis of the magnitude of this difference (e.g., using a paired t-test). Clarify the degree to which conflict effects on surviving offspring are magnified over time (e.g., in terms of the % of additional surviving offspring in the control group, across a given number of days since the start of the experiment).

In redoing our analyses (see response to point 1 above), the storyline relating to number of offspring surviving became simpler: the absolute number of offspring surviving decreased with treatment duration in the Intruded treatment but not in the Control treatment (lines 260–265). This provides a very straightforward comparison (see Figure 5A). [NB It is not possible to use a simple t-test because several groups produced multiple clutches with offspring surviving to 1-month, hence our use of mixed models.] We have provided a simple indicator of the reduced offspring survival likelihood in Intruded groups cf. Control groups across time (lines 265–267).
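For completeness, a minimal Python sketch of the comparison suggested in point 3, assuming exactly one count of surviving offspring per group and one Control group matched to each Intruded group (a condition that, as explained above, the study's data do not meet, hence the mixed models). It returns the paired t statistic with its degrees of freedom, and the percentage of additional surviving offspring in Control relative to Intruded groups; the P value would come from a t distribution with n - 1 df, which the Python standard library does not provide. The function name and counts are hypothetical.

```python
import math

def paired_comparison(control, intruded):
    """Paired t statistic (df = n - 1) and % extra surviving offspring in Control groups.

    control[i] and intruded[i] must come from the same matched pair
    (same group size and same neighbours).
    """
    if len(control) != len(intruded) or len(control) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [c - i for c, i in zip(control, intruded)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    pct_extra = 100.0 * (sum(control) - sum(intruded)) / sum(intruded)
    return t_stat, n - 1, pct_extra

t_stat, df, pct = paired_comparison([12, 9, 15, 10], [8, 7, 11, 9])
print(f"t = {t_stat:.2f} (df = {df}), Control groups had {pct:.1f}% more surviving offspring")
```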

4) Pay close attention to the reviewer recommendations on statistical analysis below, especially consistency in declaring effects "significant"/"near significant" and reporting of "near-significant" effects.

We have made sure that we are consistent in the presentation of results. We have made it clear that in the main text we are presenting the main treatment effect (whether significant or not) and, when significant or near-significant, the interaction between treatment and treatment duration (lines 144–147); these are the factors of core interest. All other factors in models are those that we are controlling for and so are provided in full in the Supplementary Tables (lines 147–148).

Reviewer #2 (Recommendations for the authors):

– To address my concern about stepwise model reduction, the authors should report the significance of each effect by comparing the fit of a full model to that of a model without that effect. A second option would be to remove only non-significant interaction terms, if the goal is mainly to investigate the fixed effects (Engqvist 2005 Anim. Behav.; but see Schielzeth 2010 Methods Ecol. Evol.). I often take the second approach in my own work, but am realizing that the first approach may be more appropriate.

We have taken the second approach suggested by the reviewer, because our main terms of interest were the factor ‘treatment’ and its interaction with ‘treatment duration’; we therefore wanted to be able to assess the effect of treatment on its own when interactions involving treatment were not significant. We have explained this approach in the Methods (lines 663–670).

– To address my concern about a lack of clear hypotheses, the authors could outline why each specific metric was measured and why each relates to fitness early on (e.g., at the beginning of each paragraph or at the end of the Introduction).

We have included a new paragraph at the end of the Introduction (line 86ff.) providing explicit reasons for specific predictions relating to our response variables.

Comments below are line-by-line

Title: "outgroup conflict" might be a less widely-used term than "intergroup conflict" in this literature. For example, the titles of most of the first 10 references in this paper use "intergroup" instead of "outgroup". Consider using "intergroup" instead of "outgroup".

In our view, ‘outgroup’ conflict encompasses all conflicts involving the whole or some members of a focal group, and outsiders, whether they be a group or a single individual. We consider ‘intergroup’ conflict as a subset of outgroup conflict, in which two groups are in conflict. We have added an explanation of our terminology (lines 40–42). As our intruder was a single individual, we believe the term outgroup is the most appropriate and so have left that in the title.

– Line 43: consider citing Green et al., 2020, which is a review of intergroup contest studies and how they focus on factors like the determinants of intergroup contest success. Green PA, Briffa M, Cant MA. 2020. Assessment during intergroup contests. Trends Ecol Evol. 36(2):139-150.

We have cited this review (line 47).

– Line 47: consider citing Preston et al., 2020, which looked at short-term and longer-term (3 days) changes in grooming and aggressive behavior within groups after simulated intergroup conflict. Preston EFR, Thompson FJ, Ellis S, Kyambulima S, Croft DP, Cant MA. 2020. Network‐level consequences of outgroup threats in banded mongooses: Grooming and aggression between the sexes. J Anim Ecol.

We have cited this paper (line 53).

– Lines 54-55: I had trouble telling whether there was a timescale difference in these three studies. Did the studies truly differ in a focus on "foetal survival" (first two) versus "infant survival" (third)?

We believe that there is a difference between these studies. Our understanding is that all three of them considered (at least as part of their studies) the impact of intergroup conflict during pregnancy (which is the focus of our sentence; line 60). Whereas Kerhoas et al., 2014 and Thompson et al., 2017 found positive correlations of prenatal intergroup conflict with foetal survival, Lemoine et al., 2020 did not obviously consider that response measure but did find a negative correlation of prenatal intergroup conflict on inter-birth intervals and post-birth (infant) survival. So, we have left this sentence as is.

– Lines 56-57: It would be useful to have some more detail about why we really need these experiments. What is missing from those observational studies that an experimental study could really help us learn? Put another way, can you give more details on the importance of this "causal link"?

It is a fundamental tenet of scientific research that correlations found in observational data could arise due to confounding effects. For instance, a positive correlation between outgroup conflict and improved foetal survival might be because groups with better-quality territories engage in more interactions with neighbours (who are after those resources) but those resources mean that mothers are in better condition; outgroup conflict per se is not necessarily driving the reproductive effect seen. Experiments are needed to test causality because they can isolate the effect of a particular variable (in this case, outgroup conflict). We have added a sentence to the Introduction explaining the general need for experiments (lines 64–66).

– Lines 88-96: It's unclear to me from these results and Figure 2 why a Wilcoxon test versus a t-test was used for certain tests. My assumption is that this relates to the normality of the data distributions. Perhaps a simpler way to get at the same question would be to build a linear mixed model predicting # of defensive behaviors from treatment, time, and individual type (dominant male/female, subordinate)?

Yes, we chose the most appropriate statistical test (parametric vs non-parametric) depending on consideration of the relevant assumptions (lines 638–639). We were specifically interested in whether any or all categories of individuals had shown signs of having habituated to our simulated intrusions. The fact that none of the categories of individuals decreased their defensive contributions (indeed, both dominants increased their defensive actions towards the intruder) shows that groups did not habituate to our treatment and that they continued to perceive the intruder in their territory as a threat (lines 158–159). Using a linear mixed model with a three-way interaction term would not, in our view, be a simpler way of considering this; since this is also not the core aspect of the paper, but rather a check that intruders were indeed viewed as a threat throughout the experiment, we would rather leave this as is.

– Results in general: I'm a bit confused as to why significant effects are reported for certain interaction terms, and then also for each level of the interaction. For example, lines 180-184 detail the significant interaction (and associated P-value from a Chi-squared test), as well as reporting t-values and P-values from each level of the interaction (e.g., treatment duration effects within each treatment). I believe just reporting the P-value from the Chi-squared test for the interaction term shows that the interaction is significant, which should then be followed by simpler reporting of the estimate for each level of the interaction. It seems unnecessary to test whether each level of the interaction is also significant.

Testing each level of treatment is key to interpreting the results of the main analyses; without testing the treatments separately following a significant treatment*treatment duration interaction, it is not possible to state conclusively whether and how the simulated intrusions impacted the groups over time. For instance, while egg volume increases over time in the Control treatment but remains stable in the Intruded treatment (Figure 3C), egg protein content remains stable in the Control treatment but decreases significantly over time in the Intruded treatment (Figure 3D). Reporting just a significant interaction does not by itself describe the nature of the effect.
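A minimal Python sketch of this follow-up step, under the simplifying assumptions of a plain linear regression (one continuous covariate, no random effects or other covariates, hypothetical data and function names): after a significant interaction, the response is regressed on treatment duration within each treatment separately, and the slope, its standard error and t-value are reported per treatment. In the mixed models used in the paper, the equivalent quantities would come from separate per-treatment models rather than from this closed-form OLS.

```python
import math

def slope_by_treatment(groups):
    """Per-treatment OLS slope of response on duration, with its SE and t-value."""
    results = {}
    for treatment, (xs, ys) in groups.items():
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        intercept = my - slope * mx
        rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        se = math.sqrt(rss / (n - 2) / sxx)  # residual variance on n - 2 df
        results[treatment] = {"slope": slope, "se": se, "t": slope / se}
    return results

toy = {"Control": ([10, 20, 30, 40, 50], [5.0, 5.1, 4.9, 5.2, 5.0]),
       "Intruded": ([10, 20, 30, 40, 50], [5.1, 4.7, 4.4, 4.0, 3.6])}
for treatment, r in slope_by_treatment(toy).items():
    print(f"{treatment}: slope = {r['slope']:.4f}, SE = {r['se']:.4f}, t = {r['t']:.2f}")
```

With these toy numbers the Control slope is close to zero while the Intruded slope is clearly negative, which is the kind of pattern a significant interaction alone does not reveal.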

– Results in general: There are several examples where the authors report "trending" or nearly-significant results that match their predictions (e.g., lines 161-163; 193-194; 216-217; 266). However, there are also instances in the reporting of the statistical models in which similarly trending factors were removed from statistical models (e.g., Table S2; S4; S6; S8; S9). I believe some of these removed values were reported in the results (e.g., line 266 matches to Table S9). However, it is confusing to understand why certain terms were removed from the statistical models, especially if they are reported in the Results (i.e. we can assume the authors consider these are important). See above for comments regarding statistical analyses.

In redoing our analyses in light of earlier comments, this issue has been resolved (see earlier responses).

– Figure 3 and all scatterplots: are these lines showing best fit from the LMM, i.e. the estimates from the LMM? Or are they plotted simply using a function like geom_smooth in ggplot? Showing the actual predicted estimates from the model on the scatterplot would give the reader a better idea of both the raw data (from the points) and the model estimates (from the lines and associated ribbons of confidence intervals). There are also methods to do this for boxplots. All of these methods can be accomplished using the "effects" package in R.

We have replotted all result figures in light of the feedback. We plotted figure 2 panels using the package “ggPaired” that allowed us to connect the datapoints from the same group/individual. Where the treatment-by-treatment duration interaction was significant or near-significant, we used the package “interactions” to plot predicted means with associated 95% confidence intervals alongside the raw data (Figure 3B–D, Figure 4A, B, Figure 5B) or partial residuals (Figure 5A). To visualise main treatment effects (or lack thereof), we used the package “jtools” to plot predicted means and associated 95% confidence intervals alongside the raw data (Figure 3A, Figure 4C, Figure 5C, D).
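As a hedged illustration of what a plotted line and confidence ribbon of predicted means represent (not of the R packages named above), the Python sketch below computes the fitted mean and its confidence interval at chosen treatment durations for a simple linear regression within one treatment, assuming ordinary least squares with no random effects. The critical value is supplied by the caller (e.g., the 0.975 quantile of a t distribution with n - 2 df) because the Python standard library has no t quantile function; the function name and data are hypothetical.

```python
import math

def fitted_mean_ci(xs, ys, new_xs, crit):
    """Fitted mean and confidence band of a simple OLS regression at new_xs.

    crit is the critical value for the band, e.g. the 0.975 quantile of a t
    distribution with len(xs) - 2 degrees of freedom for a 95% interval.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    s2 = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    band = []
    for x0 in new_xs:
        fit = intercept + slope * x0
        se = math.sqrt(s2 * (1.0 / n + (x0 - mx) ** 2 / sxx))  # SE of the mean at x0
        band.append((x0, fit, fit - crit * se, fit + crit * se))
    return band

# Hypothetical data; 3.182 is approximately the 0.975 quantile of t with 3 df.
for x0, fit, lo, hi in fitted_mean_ci([10, 20, 30, 40, 50], [5.1, 4.7, 4.4, 4.0, 3.6], [10, 30, 50], 3.182):
    print(f"duration {x0}: {fit:.2f} [{lo:.2f}, {hi:.2f}]")
```

The band is narrowest at the mean duration and widens towards the ends, which is why ribbons around model predictions flare out at the edges of the data.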

– Lines 166-168: This is a bit confusing, because I thought the protein and lipid analyses were only conducted in Experiment II on eggs from the first clutch? So, how can conclusions about protein/lipids in the eggs from later clutches be made if only the first clutch was analyzed? Or, is this because clutches were analyzed across groups, i.e., several clutches for a given focal group were not analyzed, but across the entire experiment the first clutch from each group gave a time effect?

We have resolved this confusion by clearly delineating results arising from Experiment I and Experiment II, and not combining them in the same sentence. It is correct that protein and lipid analyses were only from Experiment II (lines 139–141); these results can be found in a Results paragraph relating only to that Experiment (line 203ff.).

– Figure 4B: There appears to be an outlier with an incredibly high number of cleaning events at approximately 60 days treatment duration in the Intruded group. It would be useful to know if there are any biological differences between this observation and the others. Also, if this is a statistical outlier, it would be useful to know whether removing it affects the results. While I don't suggest the removal of statistical outliers simply because they're outliers (i.e., there should be a biological reason for data to be removed), it's important to take note of any outlier values like this.

There was nothing noteworthy about this group or this particular clutch to justify its removal from the dataset: it was the third clutch produced by the female during the study, was the 8th largest clutch recorded and had the 13th highest hatching success; only three fry survived to 1 month (so no behavioural data were collected), but they were of average size.

– Lines 219-224: I might be being dense here, but I'm unsure of what this sentence is saying. How is the relationship (or lack thereof) between the number of hatched fry and the number of offspring reaching 1 month of age not offspring survival? It seems from Table S7 that there was a difference between offspring survival to 1 month and survival from hatching to one month. I guess I'm having trouble understanding why we need both of these measures.

We agree that including both these measures was superfluous and somewhat confusing. In redoing our analyses in the ways suggested (see above), a simpler story emerged in relation to offspring survival. We thus now simply present one, clear-cut analysis (lines 260–267) and associated figure (Figure 5A).

– Lines 257-259: Could another interpretation of the return to activity data be regarding boldness? That is, offspring from Intruded groups were bolder later on in the experiment?

In principle, repeated territorial intrusions could lead to the production of increasingly bolder offspring. However, we did not find any complementary evidence for boldness (e.g., greater pre-stimulus activity levels or increased nearest-neighbour distances) so do not think there is strong justification for including this possibility.

– Lines 277-280: However, your results are counter to those of other studies you described in the Introduction, which found no or positive effects of intergroup conflict on reproduction and survival. Consider re-reporting that work here, as well, for consistency and clarity.

We have included explicit mention of those other studies at this point in the Discussion (lines 320–322).

– Paragraph lines 311-340: I'm a bit confused as to the point of this paragraph. I understand the authors may want to compare the pros and cons of their experimental approach to a more field-based, correlational approach. However, the result of this comparison mainly convinces me that the experiment under-stressed the study animals, as compared to what they might experience in the wild. Further, the reporting of the lack of impacts on adult body mass, egg mass, etc. essentially makes the point that the treatment did not affect groups much. This is counter to the rest of the paper's point, showing that groups were significantly affected. Do the authors have any hypotheses as to why they found specific results for the variables they did (e.g., egg protein, but not egg mass)? Do the authors have any thoughts or data on the relative reproductive success of these experimental groups compared to groups in the wild? I.e., are these experiments actually replicating stressors in the wild?

We have simplified the message here. The core point of our paper is that the experimental tests showed various impacts of outgroup conflict on reproduction, ultimately resulting in fewer and smaller young surviving to 1 month of age. Whilst our experimental intrusions were possibly longer than would be seen in the wild, there are several reasons to expect greater effects in the wild (lines 373–375). Ultimately, future studies would beneficially conduct wild-based experiments, although those are logistically very challenging and have important ethical considerations (lines 390–392).

Methods line-by-line

– Lines 408-411: If one of the points of the present experiment was understanding the effects of short- versus long-term intergroup conflict, why destroy clutches produced before three intrusions? Wouldn't these give valuable information on short-term effects (or the lack thereof)?

At the outset, our main aim was to compare reproductive rate, investment and output between the two treatments (Intruded and Control). We therefore removed clutches produced before at least three intrusions had occurred, because we assumed that groups would not have been substantially affected by the treatment in that period and that including these clutches would have reduced our ability to assess the effect of treatment on response variables such as latency to spawn. In analysing our data, we sometimes found that the interaction between treatment and treatment duration was important, but this had not been the core focus when we set up the experiments. Only three clutches were destroyed in Experiment I (lines 458–461): two on the morning of the first day, before the groups had experienced any intrusions, and one on the second day, before the group had experienced its second intrusion. There is therefore likely no great loss in terms of understanding.

– Line 549: was the number of offspring in the experiment considered as a factor predicting responses? In Table S8, I believe "number of offspring" is used as a fixed effect, but I was unclear as to whether that was number of offspring produced, or number used in the startle experiment. One might expect that the experimentally-designed groups with 10 offspring were different from those with 5.

Yes, the factor “number of offspring” refers to the number of individuals used in the test, and not the number of surviving offspring. We have clarified this point in Supplementary Table S8 by using the term ‘number of offspring in test’.

– Lines 550-551: Was there any sand in the bottom of this startle tank? I would imagine that offspring sinking responses, if meant to blend into the substrate, would be different if there were no substrate to blend in to.

No. We did not add sand to the tank, so that it could be cleaned easily and any olfactory cues from other broods, which could have influenced the focal individuals’ behaviour during the startle test, could be removed. It is possible that the absence of sand influenced offspring behaviour, but the vast majority of individuals did display sinking behaviour in response to the startle stimulus, and we expect that any effect of the absence of sand would have applied to offspring from Control and Intruded groups alike.

– Lines 585-589: Stepwise model reduction has come under heavy criticism in the past 15 years. See, for example, Forstmeier and Schielzeth (2011), Mundry and Nunn (2009), and Harrison et al., (2018). This process can greatly increase the likelihood of false-positive results, and can inflate parameter estimates. From my knowledge, removal of non-significant interaction terms is OK, if one is interested in achieving accurate parameter estimates for main effects. However, it is rarely advised to remove main effects. Ideally, the authors would report significance tests from the full model, or use information-theoretic approaches. However, the authors could consider removing only non-significant interaction terms, as a means of getting better estimates for the main effects (Engqvist 2005).

Forstmeier W, Schielzeth H. 2011. Cryptic multiple hypotheses testing in linear models: Overestimated effect sizes and the winner's curse. Behav Ecol Sociobiol. 65(1):47-55.

Mundry R, Nunn CL. 2009. Stepwise model fitting and statistical inference: Turning noise into signal pollution. Am Nat. 173(1):119-123.

Harrison XA, Donaldson L, Correa-Cano ME, Evans J, Fisher DN, Goodwin CED, Robinson BS, Hodgson DJ, Inger R. 2018. A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ. 6:e4794.

Engqvist L. 2005. The mistreatment of covariate interaction terms in linear model analyses of behavioural and evolutionary ecology studies. Anim Behav. 70(4):967-971.

Thank you for the advice and suggested literature. We agree with the reviewer and have revised our analyses accordingly: we removed only non-significant interactions (lines 665–668), allowing us to assess the effect of treatment more accurately when the factor was not involved in significant interactions (Supplementary Tables S1–S9).

– Lines 603-607: I would appreciate learning more about this approach from the authors. It's very unclear to me why you wouldn't simply fit an appropriate response variable distribution in the initial model fitting step. For example, why not just use a Poisson distribution for the number of caring events response variable? This reads as though all response variables were forced into a Gaussian distribution. Other distributions are appropriate for other types of data, and do not require this step. I might be misunderstanding, however.

We agree with the reviewer that other distributions may be more appropriate for some datasets and have revised our analyses accordingly. The number of caring events has now been analysed using a GLMM with a negative binomial distribution, and time spent caring was analysed using a GLMM with a Gaussian distribution and log-link function (lines 650–655); results were qualitatively the same as before. For latency to spawn, the GLMM with a gamma distribution had computational issues that could not be fixed, so we log10-transformed the raw data before using an LMM (lines 649–654). Formal comparison of possible models using the package “performance” confirmed that our chosen method performed better than the alternatives.

– Lines 609-612: I'm unclear as to why these clutches with zero hatching success were removed. The reasons these clutches had zero hatching success seem incredibly relevant to the study: if a clutch was eaten it seems evidence of cannibalism by the focal group (see lines 227-230 for a statement relevant to cannibalism in this species). If the clutch had a fungal infection, it would seem that the focal group's lack of care could have resulted in an infection.

We agree and have included all clutches in our analyses; the revised results are qualitatively similar to the previous results (line 524, Table S6).

Reviewer #3 (Recommendations for the authors):

My recommendations are superficial. My main comment is that I was frustrated by the cursory description of the experimental set up prior to the results. I know the eLife format is for the methods to go at the end of the manuscript but I really do think that at the very least the authors ought to explain how the simulated intrusions were enacted. The results didn't really come in to focus for me until after I had read the methods.

Line comments:

L70: on natural behaviour in the lab: have multiple groups been kept in the same tank and formed territories? If so, would be worth mentioning.

No, groups were kept in separate tanks in both experiments (lines 114–118, 410).

L84: This is where I would like to know how these territorial intrusions were simulated.

We have included information on territorial intrusions (lines 130–135).

L100: In general, the paper is clearly written. However, I found this paragraph confusing and had to re-read several times. The description of the experimental set-up in the methods, in contrast, reads very clearly.

We have included more, and clearer, information about our methods early in the Results (line 114ff.).

L126: '40%'. Can you give a 95% CI here?

This value is the relative difference between the treatment means and so there is not a 95% CI value to provide as well.

L423-441: This is the critical detail on how the incursions were set up that I would have liked to see in the main text.

We have included this information in the Results (lines 130–135).

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

The manuscript is much improved and close to publication-ready. However, the reviewers offered a few suggestions that may be helpful in communicating with your audience, so we are returning the draft to you to address these comments at your discretion.

1. Addressing Reviewer 2's notes about consistent reporting of significant parameter estimates.

We explicitly state in lines 149–152 that we report in the main text only the effect of treatment and, where significant or near-significant, the effect of the treatment by treatment duration interaction, because these are the focus of our work. See below for full response to Reviewer 2.

2. Clarifying the set of behaviors coded as "defensive action".

We have added a description of the classes of behaviours coded as “defensive action” in lines 133–137, as well as a brief explanation of the types of behaviours typically adopted by intruders.

Reviewer #2 (Recommendations for the authors):

Overall, I think the authors have done a great job of addressing my and the other reviewers' concerns. I am glad that the statistical suggestions have helped simplify the message, which is still an engaging and exciting one. I also greatly appreciate the laying out of predictions and the details of the methods early on. This will hopefully help readers get oriented. I have one main point, which I lay out first, and then a few line-by-line comments. I don't believe any of these comments are likely to change the main message of the manuscript.

Main comment:

On line ~680-681 in the Methods, the authors say they "provide significant parameter estimates…in the main text." However, looking at the Supplemental Tables in comparison to what is reported in the main text, only some significant effects are reported in the main text, not all. I'm curious as to why this is. For example, in the analysis of inter-clutch interval (Supp Table 1), female size had a highly significant effect (P = 0.017) but only the effect of treatment was discussed in the Results (lines 180-181). Similarly, in the analysis of egg investment, only the near-significant effect of the interaction between treatment and treatment duration was discussed (lines 195-197), but not the significant fixed effect of treatment. Also, an effect of clutch size on protein content and an effect of treatment x female size on protein content (Supp Table 3) were unreported in the Results. If these significant, but unreported, effects are not of interest, then one might ask why they were included in the model in the first place (aside from where necessary, like the treatment main effect when there is an interaction of interest). Or, are these omissions due to space constraints? There is likely a simple way to include a fuller reporting of these results. E.g., around line 195 the authors could report that "…there was a significant effect of treatment duration on clutch size (STATS HERE). This was likely driven by the interaction between treatment and treatment duration, in which Intruded females…." Or, an alternative would be in the Methods to state clearly that the authors only report selected results relevant to their hypotheses, but that some significant findings are reported only in the Supplementary Tables.

We explicitly state in lines 149–152 that we only provide in the main text the effect of treatment and, where significant or near-significant, the effect of the interaction between treatment and treatment duration, because these are the factors of core interest. We have added a sentence there stating that the Supplementary files contain full model outputs, including all tested variables, whether significant or not (lines 152–153). We have also amended the Methods to clarify that only these core results are detailed in the main text and that all model results are detailed in the Supplementary files (lines 676–680). Other factors, such as female size and clutch size, are included in some models because these traits are known to affect some response variables. Their inclusion allows us to assess the effect of our treatment above and beyond their expected influence on the response variable, but any discussion of such significant confounding effects would be tangential to our main story, so we have not included it in the main text.

Line by line:

Lines 103 and 107: For the other predictions the authors make directional predictions, e.g. a negative effect. Can they specify the direction of the effect here? Or, are there no directional predictions?

While we generally predicted a decrease in reproductive investment and success due to outgroup conflict, we did not have directional predictions for each of these variables due to the potential for behavioural and physiological trade-offs to occur.

Lines 132-151: These details are incredibly helpful! Thanks for including them.

Thank you for the original suggestion to do so.

Figure 3D: There are two red points at the end of treatment duration, right in the path of the line for the control (blue) effect. Are these points miscolored? It could just be an incredible chance that the line for control goes right through these intruded points, but I would also expect the 95% CI ribbon for intruded to be wider if these points are from the intruded treatment.

The datapoints are for the Intruded treatment and the 95% CI ribbon is correctly estimated.

Supp Table 4A, Figure 4A, and lines 656-663: I could be wrong, but I would imagine that the number of clutch visits analysis should also be done using negative binomial error distribution, or a Poisson distribution. It is also count data, just like the number of clutch caring events.

Thank you for pointing this out. We have replaced it with a negative binomial GLMM, which produced very similar results. We have revised Table S4 and Figure 4A, the Results text (lines 242–245) and the Statistical Methods (line 656) accordingly.

All figures with categorical predictors (e.g., "treatment"): It would be useful to include some type of asterisk system (e.g., * = P < 0.05, ** = P < 0.01, NS = P > 0.05…) for delineating P values in these figures. Especially in Figure 5, where 5C is significant (P = 0.028) and 5D is not (P = 0.089), the reader cannot see these differences at a glance. The authors do something like this in Figure 2, but not in other figures. Of course, we can discuss issues about whether P values really represent biological significance, but since that's the framework we're working under here, it would be useful to reflect in the figures.

We have added p-values to all panels in Figures 3–5, in line with Figure 2.

Lines 332-336: Though I think this point is likely correct, the authors did not actually show how the experiment impacted the later dominance and reproductive success of the offspring. Therefore, I might suggest revising this to "…outgroup conflict likely negatively impacts the fitness of multiple generations…."

Done (line 332).

Methods lines 534-536: This is incredibly nit-picky, but I'm mostly curious. Why were there such varying numbers of eggs removed for the lipid versus protein analyses?

The number of eggs collected for the different analyses was based on literature searches and consultations with colleagues who have used the same protocols, albeit on different taxa.

Reviewer #3 (Recommendations for the authors):

I enjoyed reading this paper again, and the authors now do a much better job of explaining the experimental set-up prior to the results. The only thing I still think is missing is a brief description (prior to the results) of how the animals behave during the experimentally induced conflicts and a definition of what "defensive actions" involve. A few small additions should suffice, and add important context. I would suggest promoting the descriptions of the behaviours coded as defensive (headbutting, displays, etc) from L514 to the results paragraph starting at L129. It would also be good to hear a little more about the behaviour of the "intruder". Did they frequently antagonise their neighbours, or exhibit aggressive displays or attacks?

We have added this information as requested (lines 133–137). Intruders often respond with aggressive postures and occasional attacks, but also with submissive postures and displays, and with evasive behaviour. We did not quantify these behaviours for the intruder and therefore cannot comment on the frequency of the different behaviours.

https://doi.org/10.7554/eLife.72567.sa2

Article and author information

Author details

  1. Ines Braga Goncalves

    School of Biological Sciences/Life Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    ines.goncalves@bristol.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-0659-9029
  2. Andrew N Radford

    School of Biological Sciences/Life Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Funding acquisition, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5470-3463

Funding

European Research Council (682253)

  • Andrew N Radford

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Richard Wall for access to his laboratory facilities, Shatha Alqurashi and Saeed Alasmari for assistance in developing protein and lipid extraction protocols, Martin Aveling for the artwork in Figure 1, Barbara Taborsky and Michael Taborsky for detailed discussions about this project and the study species across the years, and Stephanie King, Ben Ashton, and Patrick Kennedy for comments on the manuscript. This work was supported by a European Research Council Consolidator Grant (project no. 682253) awarded to ANR.

Ethics

Our work was approved by the University of Bristol Ethical Committee (University Investigator Number: UB/16/049 + UB/19/059).

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Biology, Tübingen, Germany

Reviewing Editor

  1. Jenny Tung, Duke University, United States

Reviewer

  1. Mark Dyble, University College London, United Kingdom

Publication history

  1. Received: July 28, 2021
  2. Preprint posted: August 12, 2021
  3. Accepted: August 31, 2022
  4. Version of Record published: September 14, 2022 (version 1)

Copyright

© 2022, Braga Goncalves and Radford

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Braga Goncalves I, Radford AN (2022) Experimental evidence that chronic outgroup conflict reduces reproductive success in a cooperatively breeding fish. eLife 11:e72567. https://doi.org/10.7554/eLife.72567
