Rank orders and signed interactions in evolutionary biology

Kristina Crona (corresponding author), American University, United States
Research Advance
Cite this article as: eLife 2020;9:e51004 doi: 10.7554/eLife.51004

Abstract

Rank orders have been studied in evolutionary biology for almost a hundred years. Constraints on the order in which mutations accumulate are known from cancer drug treatment, and order constraints for species invasions are important in ecology. However, current theory on rank orders in biology is somewhat fragmented. Here, we show how our previous work on inferring genetic interactions from comparative fitness data (Crona et al., 2017) is related to an influential approach to rank orders based on sign epistasis. Our approach depends on order perturbations that indicate interactions. We apply our results to malaria parasites and find that order perturbations beyond sign epistasis are prevalent in the antimalarial drug-resistance landscape. This finding agrees with the observation that reversed evolution back to the ancestral type is difficult. Another application concerns the adaptation of bacteria to a methanol environment.

Introduction

Rank orders of genotypes with respect to fitness are central in evolutionary biology. Key concepts, such as peaks and mutational trajectories, refer to an underlying rank order. The impact of suboptimal peaks has been discussed ever since fitness landscapes were introduced (Wright, 1932). A major advantage of rank orders, as compared to other statistics for fitness landscapes, is that order data are robust to variation in experimental methods and measurements. In fact, one does not necessarily need to measure fitness at all. If a rank order can be established by a competition experiment or by any type of rank-order-preserving fitness proxy, then one can apply theory on rank orders. Results from different empirical studies can therefore be compared; fitness or log-fitness, radius or area, change per minute or per generation, makes no difference if rank orders are considered.

The concept of sign epistasis, introduced by Weinreich et al. (2005), has been important for recent work on rank orders in evolutionary biology (Kaznatcheev, 2019; Crona et al., 2017; Wu et al., 2016; de Visser and Krug, 2014; Crona et al., 2013; Poelwijk et al., 2011). A system has sign epistasis if the sign of the effect of a mutation, whether positive or negative, depends on genetic background. The graphs in Figure 1 illustrate the absence of sign epistasis (A), sign epistasis (B), and reciprocal sign epistasis (C) for two-locus systems. Each arrow points toward the genotype of higher fitness. The importance of rank order data for two-locus subsystems is that they carry information about global aspects of fitness landscapes, and therefore about long-term evolution. For instance, all multi-peaked fitness landscapes have reciprocal sign epistasis (Poelwijk et al., 2011). Conversely, if subsystems of type C exist, but no subsystems of type B, then the landscape is multi-peaked (Crona et al., 2013).

Figure 1. The graphs illustrate (A) no sign epistasis, (B) sign epistasis, and (C) reciprocal sign epistasis.

The peaks are marked red. Under the assumption that 00 has minimal fitness, and that the genotypes are positioned as in the figure, the three types can be characterized as graphs with no arrows down, one arrow down, or two arrows down.

The squares representing a two-locus subsystem (see the orange square in Figure 2) are clearly informative. However, larger rectangles can improve the precision of the analysis. Each arrow in Figure 2 points toward the genotype of higher fitness. The undirected edges connect mutational neighbors, that is, genotypes that differ at one locus only. The blue rectangles in Figure 2 concern replacements of the type $00 \mapsto 11$ for the second and third loci. If the sign of the effect of such a replacement depends on background, then the long arrows have different directions (Figure 2B and D). Such a perturbation is similar to sign epistasis, except that two loci are replaced.

Figure 2. Each arrow points toward the genotype of higher fitness.

The undirected edges connect mutational neighbors and carry no information about fitness differences. The graph 2A is compatible with additive fitness. The other graphs are not compatible with additive fitness because at least one pair of parallel arrows point in different directions. The lower graphs indicate sign epistasis, as is clear from the short arrows. The right graphs indicate size two perturbations, as is clear from the long arrows.

We propose a broader perspective on perturbations that takes advantage of all rectangles in the cube (or hypercube). Differently expressed, in addition to single mutations it is sometimes useful to consider the effects of double mutations and any higher order mutations, as well as replacements $0 \mapsto 1$ and $1 \mapsto 0$ for any selected subset of loci. If the sign of the effect of such a replacement depends on background, then the system has a rectangular perturbation. As proof of principle, we demonstrate that the large rectangles give new insights for two empirical studies (as compared to an analysis of sign epistasis). The first concerns antimalarial drug resistance (Ogbunugafor and Hartl, 2016), whereas the second example involves bacteria adapting to a methanol environment (Chou et al., 2011).

As in Crona et al. (2017), fitness is additive if the fitness effects of individual mutations sum; otherwise the system has epistasis (or gene interactions). By a rank-order-induced, or signed, interaction, we mean that the rank order reveals epistasis. Questions about how different approaches to signed interactions relate and how it is possible to determine whether a rank order is compatible with additive fitness were briefly discussed by the authors. As eLife reviewers pointed out, the signed interactions in Crona et al. (2017) are not generalizations of sign epistasis, but there is clearly some connection. The remaining open problems were the starting point for this work. The main idea for resolving the problems was to consider rectangles such as those in Figure 2 (no knowledge of Walsh coefficients, polytopes, or similar concepts is assumed here).

Results

We consider biallelic n-locus systems. The fitness of a genotype $g$ is denoted $w_g$. For simplicity, we assume that there exists a total order of the genotypes with respect to fitness, here referred to as a rank order (no two genotypes have equal fitness).

For two-locus systems, the expression $u = w_{00} + w_{11} - w_{10} - w_{01}$ measures epistasis. It is easy to verify that $u = 0$ if fitness is additive. By definition, the system has an interaction if $u \neq 0$, and a signed interaction if the rank order implies that $u > 0$ or $u < 0$. For instance, the order $w_{11} > w_{00} > w_{10} > w_{01}$ implies that $u > 0$.

For a systematic treatment of order perturbations, it is necessary to write the information in compact form. Notice that a rank order where w00>w10 and w01<w11 implies sign epistasis, as the effect of a mutation at the first locus can increase or decrease fitness. Reciprocal sign epistasis for a two-locus system can be defined as a system with two peaks (Figure 1C). In particular, parallel arrows point in different directions.

It is straightforward to check that a two-locus system has sign epistasis exactly if at least one of the following expressions is negative, and reciprocal sign epistasis if both of them are negative (Figure 1).

(1) (w00-w10)(w01-w11)
(2) (w00-w01)(w10-w11)

A negative sign means that a pair of parallel arrows point in different directions. Large systems can be analyzed similarly.
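The two-locus criteria above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, ranks stand in for fitness values, and the test for whether a rank order forces the sign of u uses a simple dominance argument (both genotypes on one side of u outrank, pairwise after sorting, the two on the other side) rather than any procedure stated in the text.

```python
from itertools import permutations

GENOTYPES = ("00", "10", "01", "11")

def fitness_from_order(order):
    """Map each genotype to a rank-based score (higher score = higher fitness)."""
    return {g: len(order) - i for i, g in enumerate(order)}

def classify_two_locus(w):
    """Classify a two-locus system by the signs of expressions (1) and (2)."""
    e1 = (w["00"] - w["10"]) * (w["01"] - w["11"])
    e2 = (w["00"] - w["01"]) * (w["10"] - w["11"])
    negatives = (e1 < 0) + (e2 < 0)
    labels = ("no sign epistasis", "sign epistasis", "reciprocal sign epistasis")
    return labels[negatives]

def implied_sign_of_u(order):
    """Return +1 or -1 if the rank order alone forces the sign of
    u = w00 + w11 - w10 - w01, and 0 if the order leaves the sign open."""
    rank = fitness_from_order(order)
    pos = sorted((rank["00"], rank["11"]))
    neg = sorted((rank["10"], rank["01"]))
    if pos[0] > neg[0] and pos[1] > neg[1]:
        return 1
    if neg[0] > pos[0] and neg[1] > pos[1]:
        return -1
    return 0

# Tabulate all 24 rank orders of a two-locus system.
for order in permutations(GENOTYPES):
    label = classify_two_locus(fitness_from_order(order))
    print(">".join(order), label, implied_sign_of_u(order))
```

For the order $w_{11} > w_{00} > w_{10} > w_{01}$ used as an example above, the sketch reports that u is forced to be positive and that both expressions (1) and (2) are negative.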

A system has a rectangular perturbation if the sign of the effect of replacing a subset of loci, according to the rule $0 \mapsto 1$ and $1 \mapsto 0$, depends on background. The size of the perturbation refers to the number of loci replaced.

In particular, a rectangular perturbation of size 1 is a case of sign epistasis. The expression u was used to analyze gene interactions in the two-locus case, and a similar approach works in general. We define 𝒞 as the set of all expressions of the type

$$r = w_g + w_{g'} - w_{g''} - w_{g'''},$$

for genotypes $g, g', g'', g'''$ such that $r = 0$ if fitness is additive. For instance, for $n = 3$, the expressions

$$w_{110} + w_{000} - w_{100} - w_{010} \quad \text{and} \quad w_{111} + w_{000} - w_{100} - w_{011}$$

associated with two of the rectangles in Figure 2 (i.e., the leftmost side and the blue rectangle) are included in 𝒞. Note that each expression in 𝒞 can be obtained by going around a rectangle in the cube and assigning coefficients with alternating signs along the way (each w-coefficient is +1 or –1).

We call the elements in 𝒞 circuits, with reference to theory presented in Beerenwinkel et al. (2007b) (see also 'Materials and methods'; no knowledge of circuits is assumed). By a rank-order-induced circuit, or signed circuit, we mean that the rank order implies that the circuit is positive or negative (exactly as in the two-locus case).

For n=3, consider the circuit w111+w000-w100-w011 and the related expressions (the four genotypes from the circuit can be combined in two different ways):

(3) (w000-w100)(w011-w111)
(4) (w000-w011)(w100-w111).

If the rank order implies that expression (3) is negative, then the system has sign epistasis, and if expression (4) is negative, the system has an order perturbation of size 2. The short blue arrows in Figure 2 point in different directions in the first case, and the long blue arrows do so in the latter. Similarly, any rectangular perturbation corresponds to two parallel arrows (along opposite sides of a rectangle) that point in different directions.

Remark 1. The relation between signed circuits and sign epistasis can be summarized as follows for n=2.

  1. A rank order of the genotypes 00, 10, 01 and 11 with respect to fitness implies that u>0 or u<0 exactly if the system has sign epistasis.

  2. From the information that the rank order implies u>0 (or u<0) alone, one cannot determine whether there are one or two order perturbations, that is, whether or not the system has reciprocal sign epistasis.

  3. The system has reciprocal sign epistasis if both expressions (1) and (2) are negative, and sign epistasis if at least one of them is negative.

Remark 2. A similar observation holds for signed circuits and any n.

  1. Each case of sign epistasis is associated with a signed interaction for a circuit in 𝒞, that is the rank order of genotypes with respect to fitness implies that the form is positive or negative.

  2. More generally, each rectangular perturbation is associated with a signed circuit interaction for a circuit in 𝒞.

  3. Each circuit in 𝒞 corresponds to exactly two potential rectangular perturbations.

In particular, in order to check a three-locus system for perturbations it is sufficient to check the signs of 24 expressions, including expression (3) and (4), associated with 12 circuits (see 'Materials and methods').

Applications

The first application concerns a study of the malaria-causing parasite Plasmodium vivax (Ogbunugafor and Hartl, 2016). The original study investigates a four-locus system exposed to different concentrations of the antimalarial drug pyrimethamine (PYR). The quadruple mutant denoted 1111 has the highest degree of drug resistance, whereas the genotype 0000 has the highest fitness among all genotypes in the drug-free environment. Several concentrations of the drug were tested in the original study. We compared the highest concentration of the drug and the drug-free environment. Sign epistasis was almost equally frequent in both fitness landscapes (55 versus 54 cases). However, the drug-free environment had about twice as many perturbations of size 2 and 3 as the drug-exposed environment. The summary statistics (Table 1) indicate that adaptation is more difficult in the drug-free environment, because few replacements of pieces of code (regardless of size) are universally beneficial.

Table 1
Rectangular perturbations for drug-exposed and drug-free malaria fitness landscapes.

The third line shows the total number of expressions checked. The prevalence of sign epistasis is similar for the landscapes, whereas the remaining perturbations differ by a factor of two.

| Perturbation size | 1 | 2 | 3 | Sizes 1-3 |
|---|---|---|---|---|
| Drug-exposed | 55 | 21 | 5 | 81 |
| Drug-free | 54 | 39 | 9 | 102 |
| Expressions checked | 112 | 72 | 16 | 200 |

This finding agrees well with the authors’ observation that resistance development is a relatively straightforward process, whereas reversed evolution from the mutant 1111 back to the ancestral type is difficult. Interestingly, the complete statistics of order perturbations was better able to distinguish the landscapes than sign epistasis alone.

The second application concerns a study of the bacterium Methylobacterium extorquens, which adapts to a methanol environment (Chou et al., 2011). For the sake of the argument, we assume that the published measurements are exact (a discussion about the impact of measurement errors can be found in 'Materials and methods'). The fitness landscape does not have sign epistasis. However, the system has size 2 perturbations (Table 2). In particular, $w_{1000} > w_{0100}$ and $w_{1011} < w_{0111}$, so that the sign of the effect of the change $10 \mapsto 01$ at the first pair of loci depends on background. The perturbations show that the rank order is incompatible with additive fitness.

Table 2
Order perturbations of size 1 and 2 for Methylobacterium extorquens.

The landscape has no sign epistasis. However, perturbations of size 2 reveal that the landscape is not additive.

| Perturbation size | 1 | 2 |
|---|---|---|
| Number of perturbations | 0 | 3 |
| Expressions checked | 112 | 72 |

A complete analysis of perturbations for an n-locus system requires an investigation of $2\left(\frac{6^n}{8} - 4^{n-1} + 2^{n-3}\right)$ expressions for $n \geq 3$ (Theorem 1, 'Materials and methods'). Statistics on order perturbations should always be considered estimates because of measurement errors. However, that is true for all rank-order-based concepts, and a peak count is sensitive to noise. One could naively believe that a broader perspective makes the problem worse. By contrast, rectangular perturbations can sometimes be helpful in separating signal and noise.

For instance, assume that fitness is additive and that all mutations decrease fitness (Figure 3). The graphs show the proportion of unexpected observations, that is, increased fitness, for replacements of the type $0 \mapsto 1$, $00 \mapsto 11$ and $000 \mapsto 111$ for two error distributions (errors follow a normal distribution in the upper graph and a Student's t-distribution with df = 3 in the lower; see 'Materials and methods' for more details). In both cases, the proportion of unexpected results decreases with the number of loci replaced. The reason is that the noise level is similar in all cases, whereas the effect size is not.

Figure 3. The fitness landscapes are additive and each mutation $0 \mapsto 1$ decreases fitness.

The graphs compare replacements $0 \mapsto 1$, $00 \mapsto 11$ and $000 \mapsto 111$. Because of measurement errors, some replacements appear to increase fitness. The graphs show the proportion of such cases.

This observation can be useful. For instance, it is difficult to exclude additive fitness for a collection of detrimental mutations by observing sign epistasis alone (apparent sign epistasis can result from noise). However, an investigation of the three types of replacements described could be conclusive. In particular, similar proportions of unexpected results would constitute a strong argument against additive fitness.

Rank orders and perturbations are important for problems beyond natural selection. Constraints on the order in which mutations occur are known for the development of cancer drug resistance (Hosseini et al., 2019; Beerenwinkel et al., 2007a). In brief, a typical case would be that some mutation B is not selected for unless a mutation A has occurred, even though AB has high fitness. If the constraint holds universally, it can be described as $w_{01*} < w_{00*} < w_{10*} < w_{11*}$, where $*$ is an arbitrary sequence. For an n-locus system, such a constraint implies $2^{n-2}$ order perturbations if fitness is additive for the remaining loci.

Additivity and rank orders

The problem of determining if rank orders are compatible with additive fitness is already interesting for n=3.

For the fitness landscape

w000=1,w100=1.1,w010=1.12,w001=1.09,
w110=1.2,w101=1.22,w011=1.19,w111=1.3,

the rank order is $w_{111} > w_{101} > w_{110} > w_{011} > w_{010} > w_{100} > w_{001} > w_{000}$. The change $0 \mapsto 1$ at any locus increases fitness regardless of background, so there is no sign epistasis. However, the sign of the effect of the change $10 \mapsto 01$ at the first pair of loci depends on background, as $w_{100} < w_{010}$ and $w_{101} > w_{011}$. In particular, the rank order is not compatible with additive fitness.
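The check just described can be automated. The following Python sketch enumerates every replacement of a subset of loci and every pair of backgrounds, and reports the pairs where the effect changes sign. The function name and the tuple layout of the results are hypothetical choices made for illustration; the fitness values are the ones listed above.

```python
from itertools import combinations, product

def rectangular_perturbations(w, n):
    """List rectangular perturbations of a biallelic n-locus landscape.

    w maps genotype strings such as "010" to fitness values. Each result is
    (size, word, replacement, loci, background_a, background_b), with loci
    given as 0-based positions.
    """
    found = []
    for k in range(1, n):  # size n has a single background, so nothing to compare
        for loci in combinations(range(n), k):
            rest = [i for i in range(n) if i not in loci]
            # Only words starting with 0 are used, so that each pair
            # {word, replacement} is visited once.
            for tail in product("01", repeat=k - 1):
                word = "0" + "".join(tail)
                repl = "".join("1" if c == "0" else "0" for c in word)
                effects = []
                for bg in product("01", repeat=n - k):
                    g, h = [None] * n, [None] * n
                    for i, c in zip(rest, bg):
                        g[i] = h[i] = c
                    for i, c, d in zip(loci, word, repl):
                        g[i], h[i] = c, d
                    effect = w["".join(h)] - w["".join(g)]
                    effects.append(("".join(bg), effect))
                for (bg_a, ea), (bg_b, eb) in combinations(effects, 2):
                    if ea * eb < 0:  # the effect changes sign with background
                        found.append((k, word, repl, loci, bg_a, bg_b))
    return found

# Fitness values of the three-locus example above.
w = {"000": 1.0, "100": 1.1, "010": 1.12, "001": 1.09,
     "110": 1.2, "101": 1.22, "011": 1.19, "111": 1.3}
for item in rectangular_perturbations(w, 3):
    print(item)
```

For this landscape the sketch finds no perturbation of size 1, in line with the absence of sign epistasis, and it reports the size 2 perturbation at the first pair of loci described above (written as $01 \mapsto 10$, which is the same replacement read in the opposite direction).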

As remarked in Crona et al. (2017), exactly 384 orders are compatible with the absence of sign epistasis for n=3. By applying order perturbations (and by inspection), we verified that exactly 96 out of the 384 orders are compatible with additive fitness, or 0.24 percent of all 40,320 rank orders for three-locus systems. After relabeling (see 'Materials and methods' for details), only the following two orders are compatible with additive fitness:

w111>w110>w101>w011>w100>w010>w001>w000
w111>w110>w101>w100>w011>w010>w001>w000.
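The count of 96 can be reproduced with a brute-force sketch in Python. The approach below is an assumption made for illustration, not the method used in the text: it assigns small non-zero integer effects to the three mutations, computes additive fitness, discards landscapes with ties, and collects the resulting rank orders. The bound `max_effect=4` is a hypothetical parameter that happens to be large enough for n=3.

```python
from itertools import product

def additive_rank_orders(n=3, max_effect=4):
    """Collect rank orders realized by additive landscapes with small integer effects."""
    genotypes = list(product((0, 1), repeat=n))
    effect_values = [e for e in range(-max_effect, max_effect + 1) if e != 0]
    orders = set()
    for effects in product(effect_values, repeat=n):
        fitness = {g: sum(e * x for e, x in zip(effects, g)) for g in genotypes}
        if len(set(fitness.values())) < len(genotypes):
            continue  # ties between genotypes: not a rank order
        order = tuple(sorted(genotypes, key=fitness.get, reverse=True))
        orders.add(order)
    return orders

orders = additive_rank_orders()
print(len(orders))                  # number of distinct additive rank orders found
print(100 * len(orders) / 40320)    # percentage of all rank orders for n=3
```

Every order found this way is compatible with additive fitness by construction. The two representative orders listed above are obtained, for instance, from the effects (4, 3, 2) and (4, 2, 1).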

Theory on rank orders and additivity has been developed independently of biological applications and is still an active research area (Searles and Slinko, 2015; Maclagan, 1998). In principle, one can determine whether a specific rank order is compatible with additive fitness by solving a system of linear inequalities obtained from the order. It is straightforward to verify that a rank order is compatible with additive fitness unless it has rectangular perturbations for n=4. However, counterexamples to the analogous statement for n=5 exist (Kraft et al., 1959) (see 'Materials and methods' for more details).

Discussion

A rank order of genotypes, from highest to lowest fitness, is informative about gene interactions (Crona et al., 2017). We introduced rectangular perturbations to resolve problems on how different approaches to rank order induced (or signed) interactions relate. Sign epistasis concerns the effects of single mutations, and the new perturbations also concern the effects of multiple mutations.

We have established that sign epistasis implies that a system has signed circuit interactions, but the converse is not true. Strictly speaking, signed circuits are not generalizations of sign epistasis, neither are signed circuits refinements of sign epistasis. However, sign epistasis can be seen as rectangular perturbations of minimal size.

A rank order is compatible with additive fitness unless there is sign epistasis for a two-locus system. We have provided a counterexample for the analogous claim for three loci. For four-locus systems, a rank order is compatible with additive fitness unless there are rectangular perturbations. However, counterexamples exist for five loci (Maclagan, 1998; Kraft et al., 1959). In general, only a very small proportion of all rank orders are compatible with additive fitness, and the theoretical understanding for the property is limited.

As a proof of principle, we applied rectangular perturbations to empirical studies on antimalarial drug resistance (Ogbunugafor and Hartl, 2016) and on bacteria adapting to a methanol environment (Chou et al., 2011). Rectangular perturbations have the capacity to detect epistasis when conventional rank-order-based methods cannot. The perturbations capture evolutionarily important properties beyond local aspects. A complete analysis of rectangular perturbations requires an investigation of a large number of expressions (of the order $6^n$). Often, it is meaningful to consider selected perturbations, depending on the context.

In general, rank orders are quite informative about gene interactions, and also regarding evolutionary potential (in a qualitative sense) provided one assumes the Strong Selection Weak Mutation regime (Gillespie, 1984). If the available information from an empirical study is a rank order, for instance from a competition experiment, then it is obviously useful to have methods for interpreting rank orders (further motivations are discussed in Crona et al., 2017). However, rank-order methods have obvious limitations. Rank orders are insufficient for determining the effect of genetic recombination (de Visser et al., 2009; Otto and Lenormand, 2002). Evolutionary predictability is sensitive to population parameters, and the importance of accessible mutational trajectories is not universal (de Visser and Krug, 2014; Krug, 2019).

It would be interesting to determine the extent to which rectangular perturbations are helpful for relating local and global properties of fitness landscapes, similar to results on sign epistasis (Weinreich et al., 2005; Poelwijk et al., 2011; Crona et al., 2013), and also in analyzing statistical aspects further. Algorithms and exact formulas for potential perturbations are provided in the 'Materials and methods'.

Materials and methods

General circuits

For a biallelic n-locus system, we have discussed a set 𝒞 defined as all expressions of the form

$$r = w_g + w_{g'} - w_{g''} - w_{g'''},$$

for genotypes $g, g', g'', g'''$, such that $r = 0$ if fitness is additive. In order to connect to general theory, we need some concepts. Starting with the two-locus case, for the genotypes 00, 10, 01, 11, one can form vectors in $\mathbb{R}^3$ by adding an extra coordinate 1 for each genotype

(0,0,1),(1,0,1),(0,1,1),(1,1,1).

The four vectors are linearly dependent since

(1,1,1)+(0,0,1)-(1,0,1)-(0,1,1)=(0,0,0).

The dependence relation corresponds exactly to the linear form

u=w11+w00-w10-w01.

Similarly, one can form vectors in $\mathbb{R}^{n+1}$ from the vertices of the n-cube by adding a coordinate 1. Circuits are defined as the minimal dependence relations, in the sense that each proper subset of the vectors (with non-zero coefficients) is linearly independent.

There are in total 20 circuits for n=3, including 𝒞, but also for instance the circuit

w000+2w111-w110-w101-w011.

In this terminology, one can describe 𝒞 as the set of circuits that have exactly four non-zero coefficients for the variables wg. Note that all circuits are zero if fitness is additive. In that sense, circuits measure epistasis. An analysis of all circuits provides very complete information on gene interactions (Beerenwinkel et al., 2007b).
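The correspondence between dependence relations and linear forms can be checked mechanically. The Python sketch below uses hypothetical helper names; it tests whether a set of coefficients defines a dependence relation among the lifted vectors and evaluates the corresponding form on a landscape. It does not test minimality, so it confirms that an expression is a dependence relation, not that it is a circuit.

```python
def is_dependence_relation(coeffs):
    """coeffs maps genotype strings to integer coefficients.

    Each genotype is lifted to a vector by appending the coordinate 1;
    the coefficients define a dependence relation if the weighted sum
    of the lifted vectors is the zero vector.
    """
    n = len(next(iter(coeffs)))
    total = [0] * (n + 1)
    for genotype, c in coeffs.items():
        lifted = [int(x) for x in genotype] + [1]
        for i, v in enumerate(lifted):
            total[i] += c * v
    return all(t == 0 for t in total)

def evaluate(coeffs, w):
    """Value of the linear form with the given coefficients on landscape w."""
    return sum(c * w[g] for g, c in coeffs.items())

u = {"11": 1, "00": 1, "10": -1, "01": -1}
five_term = {"000": 1, "111": 2, "110": -1, "101": -1, "011": -1}
print(is_dependence_relation(u), is_dependence_relation(five_term))

# An additive three-locus landscape with arbitrary integer effects (2, 3, 7).
additive = {f"{a}{b}{c}": 5 + 2 * a + 3 * b + 7 * c
            for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(evaluate(five_term, additive))
```

Both forms pass the dependence test, and the five-term form evaluates to zero on the additive landscape, as expected for any circuit.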

Counting rectangular perturbations

To count rectangular perturbations, one needs to find all rectangles with vertices in an n-cube. We provide an explicit formula with proof for the reader’s convenience, even though the result is elementary. The proof depends on Stirling numbers of the second kind. The Stirling number S(n,k) is defined as the number of ways to partition a set of n objects into k non-empty subsets. We refer to Grimaldi (2006) for more background.

Theorem 1. The total number of (potential) rectangular perturbations for an n-locus system is

$$2\left(\frac{6^n}{8} - 4^{n-1} + 2^{n-3}\right) \quad \text{for } n \geq 3.$$

Moreover, the number of rectangular perturbations of size k (exactly k loci are replaced) equals

$$2^{k-1} \binom{n}{k} \binom{2^{n-k}}{2}.$$

Corollary. A complete investigation of sign epistasis for an n-locus system requires that one checks the signs of $n \binom{2^{n-1}}{2}$ expressions.

Lemma 1. For Stirling numbers of the second kind S(n,k), the following identities hold.

$$S(n,2) = 2^{n-1} - 1$$
$$S(n,3) = \frac{1}{6}\left(3^n - 3 \cdot 2^n + 3\right).$$

Proof. The first formula holds since there are $2^n - 2$ non-empty proper subsets of n elements, and each partition corresponds to exactly two subsets. Similarly, the second formula can be derived from the observation that one can construct three labeled subsets of n elements in $3^n$ ways. After reducing for all cases with empty sets, the number of alternatives is

$$3^n - 3(2^n - 2) - 3 = 3^n - 3 \cdot 2^n + 3.$$

Each partition corresponds to six alternatives, which completes the argument.
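As a sanity check, the two identities of Lemma 1 can be compared with the standard recurrence $S(n,k) = k\,S(n-1,k) + S(n-1,k-1)$. The Python sketch below is illustrative only, and the function name is a hypothetical choice.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

for n in range(3, 8):
    closed_2 = 2 ** (n - 1) - 1
    closed_3 = (3 ** n - 3 * 2 ** n + 3) // 6
    print(n, stirling2(n, 2), closed_2, stirling2(n, 3), closed_3)
```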

Lemma 2. Let $n \geq 3$. There are

$$\frac{6^n}{8} - 4^{n-1} + 2^{n-3}$$

rectangles with vertices in an n-cube.

Proof. For each vertex $s_1 \dots s_n$ in an n-cube, one can construct a rectangle with vertices on the n-cube as follows. Distribute the set of n loci into three subsets $S_1$, $S_2$ and $S_3$, where the intersection of each pair of sets is empty, and where $S_1$ and $S_2$ are non-empty.

From the vertex $s_1 \dots s_n$, one constructs the remaining vertices by replacing sets of loci, according to the rule $0 \mapsto 1$ and $1 \mapsto 0$. Specifically, one vertex is obtained by replacing all elements in $S_1$, one by replacing all elements in $S_2$, and the last by replacing all elements in $S_1 \cup S_2$. If $S_3$ is empty, one can construct $S(n,2)$ rectangles. If all three sets are non-empty, then one can choose $S_3$ in three ways, and consequently construct $3S(n,3)$ rectangles. In total, one obtains $3S(n,3) + S(n,2)$ rectangles starting from a particular vertex. There are $2^n$ vertices in the n-cube and each rectangle has four vertices. By the previous lemma, the number of rectangles is

$$\frac{2^n}{4}\left(3S(n,3) + S(n,2)\right) = \frac{6^n}{8} - 4^{n-1} + 2^{n-3},$$

which completes the proof.
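The count in Lemma 2 can be compared with a direct enumeration for small n. In the Python sketch below, vertices are encoded as integers whose bits are the alleles, and a rectangle is generated from a vertex v and two disjoint non-empty sets of loci a and b (bit masks). These encodings and the function names are illustrative assumptions.

```python
def rectangles_in_cube(n):
    """Enumerate rectangles with vertices in the n-cube as sets of four vertices."""
    rects = set()
    for v in range(2 ** n):
        for a in range(1, 2 ** n):
            for b in range(a + 1, 2 ** n):
                if a & b == 0:  # the two sets of replaced loci are disjoint
                    rects.add(frozenset((v, v ^ a, v ^ b, v ^ a ^ b)))
    return rects

def lemma2_formula(n):
    """Closed form of Lemma 2 (the division is exact for n >= 3)."""
    return 6 ** n // 8 - 4 ** (n - 1) + 2 ** (n - 3)

for n in (3, 4, 5):
    print(n, len(rectangles_in_cube(n)), lemma2_formula(n))
```

For n=3 both counts equal 12, which matches the 12 circuits in 𝒞 mentioned in the main text.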

We can now prove the main result.

Proof of Theorem 1. The total number of (potential) rectangular perturbations for an n-locus system is

$$2\left(\frac{6^n}{8} - 4^{n-1} + 2^{n-3}\right).$$

A rectangle with vertices in the n-cube corresponds to exactly two rectangular perturbations, one for each pair of parallel edges. Consequently, the result follows from Lemma 2.

The second part of the theorem states that the number of rectangular perturbations where exactly k loci are replaced is equal to

$$2^{k-1} \binom{n}{k} \binom{2^{n-k}}{2}.$$

The positions of the k loci that change can be chosen in $\binom{n}{k}$ different ways. There are $2^k$ words of length k, and therefore $2^{k-1}$ pairs consisting of a word and its replacement (for instance, the replacement of 110 is 001). Finally, there are $2^{n-k}$ different backgrounds, so that a pair of backgrounds can be chosen in $\binom{2^{n-k}}{2}$ ways.
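The two formulas of Theorem 1 are easy to evaluate, and summing the size-k counts over k should recover the total. The Python sketch below (function names are hypothetical) also reproduces the 'Expressions checked' rows of Tables 1 and 2 for n=4.

```python
from math import comb

def perturbations_of_size(n, k):
    """Number of potential rectangular perturbations that replace exactly k loci."""
    return 2 ** (k - 1) * comb(n, k) * comb(2 ** (n - k), 2)

def total_perturbations(n):
    """Total number of potential rectangular perturbations (n >= 3)."""
    return 2 * (6 ** n // 8 - 4 ** (n - 1) + 2 ** (n - 3))

for n in (3, 4, 5):
    by_size = [perturbations_of_size(n, k) for k in range(1, n)]
    print(n, by_size, sum(by_size), total_perturbations(n))
```

For n=4 the counts by size are 112, 72 and 16, with total 200. For n=3 they are 18 and 6, with total 24.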

As mentioned, a single signed circuit may correspond to two cases of sign epistasis. Indeed, a two-locus system with reciprocal sign epistasis is an example. It is thus of interest to identify all order perturbations, rather than identifying signed circuits only. Theorem 1 and its proof indicate how one can find all order perturbations. In particular, the complete list for identifying rectangular perturbations for n=3 consists of the following 24 expressions, where the first 18 expressions concern sign epistasis, and the remaining six size 2 perturbations.

(w000-w100)(w010-w110),(w000-w100)(w001-w101),(w000-w100)(w011-w111),
(w010-w110)(w001-w101),(w010-w110)(w011-w111),(w001-w101)(w011-w111),
(w000-w010)(w100-w110),(w000-w010)(w001-w011),(w000-w010)(w101-w111),
(w100-w110)(w001-w011),(w100-w110)(w101-w111),(w001-w011)(w101-w111),
(w000-w001)(w100-w101),(w000-w001)(w010-w011),(w000-w001)(w110-w111),
(w100-w101)(w010-w011),(w100-w101)(w110-w111),(w010-w011)(w110-w111),
(w000-w110)(w001-w111),(w000-w101)(w010-w111),(w000-w011)(w100-w111),
(w100-w010)(w101-w011),(w100-w001)(w110-w011),(w010-w001)(w110-w101).

Rank orders compatible with additive fitness

Different rank orders may differ only 'by labels'. For instance, the rank orders

$$w_{11} > w_{10} > w_{01} > w_{00} \quad \text{and} \quad w_{00} > w_{01} > w_{10} > w_{11}$$

differ by the map $0 \mapsto 1$ and $1 \mapsto 0$, applied to both loci for each genotype. By definition, a cube isomorphism preserves the adjacency structure of the cube (a pair of mutational neighbors is mapped to a pair of neighbors).

Two rank orders are considered equivalent if a cube isomorphism induces a map between them. Differently expressed, for any given rank order (for a two-locus system), one can obtain an equivalent order by assigning the label 00 to a genotype of choice (among four alternatives) and then 10 to one of its neighbors (among two alternatives). After that, the adjacency condition determines the labels of the remaining genotypes. The new rank order is identical to the original one, except that the genotypes have new labels as described. It follows that each equivalence class consists of $8 = 4 \cdot 2$ orders. The 24 rank orders for a two-locus system can be partitioned into $24/8 = 3$ equivalence classes.

For general n, each equivalence class consists of $2^n \cdot n!$ rank orders by a similar argument. In particular, for n=3, each equivalence class consists of 48 rank orders, and the $8!$ rank orders can be partitioned into 840 equivalence classes. Questions on rank orders for n=3 are manageable as it suffices to check 840 orders.
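The class counts can be confirmed by brute force for small n. The Python sketch below represents a cube isomorphism as a permutation of the loci combined with a flip of a chosen subset of loci, which is one standard way to describe these maps; the function names are hypothetical.

```python
from itertools import permutations, product

def cube_isomorphisms(n):
    """Relabelings of genotypes: permute the loci, then flip a subset of loci."""
    maps = []
    for perm in permutations(range(n)):
        for flips in product((0, 1), repeat=n):
            maps.append(lambda g, perm=perm, flips=flips: tuple(
                g[perm[i]] ^ flips[i] for i in range(n)))
    return maps

def count_equivalence_classes(n):
    """Count rank orders of the n-cube up to relabeling by cube isomorphisms."""
    genotypes = list(product((0, 1), repeat=n))
    maps = cube_isomorphisms(n)
    seen = set()
    classes = 0
    for order in permutations(genotypes):
        if order in seen:
            continue
        classes += 1
        for f in maps:  # mark every relabeled copy of this order
            seen.add(tuple(f(g) for g in order))
    return classes

print(len(cube_isomorphisms(2)), count_equivalence_classes(2))
print(len(cube_isomorphisms(3)), count_equivalence_classes(3))
```

The enumeration for n=3 visits all 40,320 rank orders, which is still fast. Larger n would require a smarter approach.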

To determine whether a specific rank order is compatible with additive fitness, one has to solve a system of inequalities. For simplicity, we illustrate the argument for a two-locus system. Consider the order w10>w01>w11>w00. We can assume that

$$w_{00} = 1, \quad w_{10} = 1 + a_1, \quad w_{01} = 1 + a_2, \quad \text{where } a_1, a_2 > 0.$$

Additive fitness would imply $w_{11} = 1 + a_1 + a_2$, and then the rank order (specifically $w_{10} > w_{11}$) implies $1 + a_1 > 1 + a_1 + a_2$, which is a contradiction because $a_2 > 0$. It follows that the rank order is incompatible with additive fitness. In general, a rank order combined with an additive assumption determines a system of linear inequalities. The rank order is compatible with additive fitness exactly if the system has a solution. It is not difficult to find software for solving such a system of inequalities. In particular, any software for solving linear programming problems can be used.
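As an illustration, the feasibility question can also be decided without a linear programming solver. The Python sketch below swaps in Fourier-Motzkin elimination, an exact method for small systems, so that no external library is needed. Each pair of consecutive genotypes g > h in the rank order gives one strict inequality in the unknown mutation effects $a_1, \dots, a_n$. The function names are hypothetical, and the approach is only practical for small n.

```python
from functools import reduce
from math import gcd

def _normalize(row):
    """Divide a row by the gcd of its entries so that duplicates collapse."""
    d = reduce(gcd, (abs(c) for c in row), 0)
    return tuple(c // d for c in row) if d else tuple(row)

def compatible_with_additive_fitness(order):
    """order lists genotype strings from highest to lowest fitness.

    Consecutive genotypes g > h give sum_j a_j * (g_j - h_j) > 0, where a_j
    is the unknown effect of the mutation at locus j. The variables are
    eliminated one at a time (Fourier-Motzkin elimination).
    """
    n = len(order[0])
    rows = {_normalize([int(x) - int(y) for x, y in zip(g, h)])
            for g, h in zip(order, order[1:])}
    for _ in range(n):
        pos = [r for r in rows if r[-1] > 0]
        neg = [r for r in rows if r[-1] < 0]
        rows = {r[:-1] for r in rows if r[-1] == 0}
        for p in pos:
            for q in neg:
                # A positive combination that cancels the last variable.
                combo = [-q[-1] * a + p[-1] * b
                         for a, b in zip(p[:-1], q[:-1])]
                rows.add(_normalize(combo))
    # With no variables left, any surviving row reads 0 > 0.
    return not rows

print(compatible_with_additive_fitness(["10", "01", "11", "00"]))
print(compatible_with_additive_fitness(["11", "10", "01", "00"]))
```

The first call concerns the order $w_{10} > w_{01} > w_{11} > w_{00}$ discussed above and reports that it is incompatible with additive fitness, whereas the second order is compatible.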

The case $n \leq 3$ was discussed in the main text. For n=4, one can verify computationally that there exist 14 rank orders (up to equivalence) with no order perturbations. As outlined, one can verify that all 14 rank orders are compatible with additive fitness. For n=5, the analogous statement is not true, which was first shown by Kraft et al. (1959). An explicit counterexample is given in Maclagan (1998). This author studies Boolean term orders, in our terminology perturbation-free rank orders, and refers to an order as being coherent if it is compatible with additive fitness. (The counterexample is described in slightly different notation, and the translation to our notation is

$$\emptyset \mapsto 00000, \quad \{1\} \mapsto 10000, \quad \{2\} \mapsto 01000, \quad \{3\} \mapsto 00100, \quad \{1,2\} \mapsto 11000, \quad \{1,2,3\} \mapsto 11100,$$

and analogously.)

As explained, in principle, one can check whether a given rank order is compatible with additive fitness. However, the theoretical understanding of rank orders and additive fitness is still limited.

Rank orders and statistical significance

As noted in the main text, rank-order methods are sensitive to measurement errors. A sufficient number of tests is necessary for reliable results, and elementary probability theory provides some guidance. If, for instance, genotype $g$ beats genotype $g'$ in 9 out of 10 comparisons, then it is reasonable to reject a null hypothesis of equal fitness.
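The example can be made concrete with a binomial tail probability. The Python sketch below assumes that, under the null hypothesis of equal fitness, each comparison is an independent fair coin flip; this modeling choice and the function name are assumptions made for illustration.

```python
from math import comb

def binomial_tail(wins, trials, p=0.5):
    """Probability of at least `wins` successes in `trials` independent comparisons."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(wins, trials + 1))

one_sided = binomial_tail(9, 10)
print(one_sided)        # probability of 9 or more wins out of 10
print(2 * one_sided)    # two-sided version, by symmetry of the fair coin
```

Under these assumptions the one-sided probability is 11/1024, or about 0.011, which supports rejecting the null hypothesis at conventional significance levels.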

Research on inferring rank orders from pairwise comparisons has a long history because of applications to sports and games (Wauthier et al., 2013; Boyd and Silk, 1983; Albers and de Vries, 2001; Bradley and Terry, 1952; Thurstone, 1927). However, statistical significance for rank orders has received considerably less attention, and it would be interesting to develop more theory on the topic.

Simulations. The upper graph in Figure 3 was obtained by assuming additive fitness, a fitness decrease by 0.01 for each mutation, and errors sampled from a Gaussian distribution, where the magnitude of the errors is a value on the list 0.0010, 0.0011, 0.0012, …, 0.01. The assumptions were the same for the lower graph, except that a Student’s t-distribution with df = 3 was used.
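A simulation of this kind can be sketched as follows in Python. Several details are assumptions, since the text does not specify them: the 'magnitude' of the errors is read as a scale factor (the standard deviation in the Gaussian case), each of the two compared genotypes receives an independent error, the reference fitness is set to 1, and the number of trials, the seed and the function name are arbitrary choices.

```python
import random

def unexpected_proportion(k, scale, dist="normal", effect=0.01,
                          trials=20000, seed=0):
    """Proportion of replacements of k loci that appear to increase fitness.

    True fitness drops by k * effect, and each measurement carries an
    independent error drawn from the chosen distribution.
    """
    rng = random.Random(seed)

    def error():
        z = rng.gauss(0.0, 1.0)
        if dist == "normal":
            return scale * z
        # Student's t with df = 3: standard normal over sqrt(chi-square / df),
        # where the chi-square variate is a gamma variate (shape 1.5, scale 2).
        chi2 = rng.gammavariate(1.5, 2.0)
        return scale * z / (chi2 / 3.0) ** 0.5

    hits = 0
    for _ in range(trials):
        before = 1.0 + error()
        after = 1.0 - k * effect + error()
        hits += after > before
    return hits / trials

for dist in ("normal", "t"):
    print(dist, [unexpected_proportion(k, 0.01, dist) for k in (1, 2, 3)])
```

With the largest error magnitude on the list (0.01), the proportion of apparent fitness increases drops as more loci are replaced for both error distributions, in qualitative agreement with the description of Figure 3.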

References

  1. Beerenwinkel N, Pachter L, Sturmfels B (2007b). Epistasis and shapes of fitness landscapes. Statistica Sinica 17:1317–1342.
  2. Grimaldi RP (2006). Discrete and Combinatorial Mathematics (5th Edition). Pearson Education India.
  3. Searles D, Slinko A (2015). Noncoherent initial ideals in exterior algebras. Beiträge Zur Algebra Und Geometrie / Contributions to Algebra and Geometry 56:759–762. https://doi.org/10.1007/s13366-015-0239-5
  4. Thurstone LL (1927). The method of paired comparisons for social values. The Journal of Abnormal and Social Psychology 21:384–400. https://doi.org/10.1037/h0065439
  5. Wauthier F, Jordan M, Jojic N (2013). Efficient ranking from pairwise comparisons. International Conference on Machine Learning. pp. 109–117.
  6. Wright S (1932). The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proceedings of the Sixth International Congress of Genetics. pp. 356–366.

Decision letter

  1. Joachim Krug
    Reviewing Editor; University of Cologne, Germany
  2. Diethard Tautz
    Senior Editor; Max-Planck Institute for Evolutionary Biology, Germany
  3. Joachim Krug
    Reviewer; University of Cologne, Germany
  4. Luca Ferretti
    Reviewer; The Pirbright Institute

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This work extends a previous paper in eLife, where it was shown how the sign of certain epistatic interaction coefficients can be inferred from partial rank orders of fitness values. The purpose of the manuscript is to clarify the relation of this inference with the concept of sign epistasis; Crona et al., 2017, had shown that sign epistasis was a sufficient, but not necessary, condition for rank order to be informative about at least some signed interactions. Here the range of informative rank orders is extended by introducing the concept of rectangular perturbations, which generalizes the concept of sign epistasis by asking for the background dependence of the effect of mutational events that modify several loci at once. In the revision, the presentation was improved so that it has become more accessible for (evolutionary) biologists with a non-mathematical background.

Decision letter after peer review:

Thank you for submitting your article “Rank orders and signed interactions in evolutionary biology” for consideration by eLife. Your article has been reviewed by three peer reviewers, including Joachim Krug as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Diethard Tautz as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Luca Ferretti (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This submission refers to the authors' previous eLife publication, Crona et al., 2017, where it was shown how the sign of certain epistatic interaction coefficients can be inferred from partial rank orders of fitness values. The purpose of the manuscript is to clarify the relation of this inference with the concept of sign epistasis. Crona et al., 2017 showed that sign epistasis is a sufficient, but not a necessary, condition for a rank order to be informative about at least some signed interactions. Here the range of informative rank orders is extended by introducing the concept of rectangular perturbations, which generalizes the concept of sign epistasis by asking for the background dependence of the effect of mutational events that modify several loci at once. The exploratory application of the new concept to two empirical data sets suggests that it does give access to additional information about the topography of the landscape.

Overall, the reviewers found that, while the manuscript contains some interesting ideas and results that are worth reporting in the form of a Research Advance, substantial revisions are necessary. Most importantly, the results should be presented in a way that is reasonably accessible for (evolutionary) biologists with a non-mathematical background, and the presentation also needs to be more systematic. A detailed list of the issues that should be addressed is given below.

Essential revisions:

1) All three reviewers found that the manuscript was lacking in clarity and context. Specifically, they felt that both the motivating question of the manuscript (as it arises from the original eLife publication of Crona et al., 2017), and the final answer that is achieved need to be spelled out more clearly and concisely. Moreover, it was not clearly stated how the new method of epistasis analysis goes beyond more established approaches, and in what sense it “add(s) precision to the analysis”.

2) Reviewers #2 and #3 addressed the generalizability of the results. One was concerned that the scaling of the number of perturbations as 6n would limit the practical applicability of the method, and both asked whether and how the approach is generalizable to multi-allelic sequence spaces.

3) Another issue related to the practical application of the approach concerns the effect of measurement error. If measurement errors are not considered and there are many values close to each other, then the ranks could change significantly and the rank order could be heavily influenced by random noise, generating spuriously inferred interactions. The author states that “the results are not sensitive for a few false positives or negatives”, but this statement is not supported in the text aside from its assertion. This must be addressed (maybe through simulation?) before publication.

4) Reviewers #1 and #3 were concerned that the text assumes too much previous knowledge about fitness landscapes and specifically the polytope theory of Beerenwinkel et al., 2007, on the part of the reader. One reviewer found that the introduction of the “circuits from polytope theory” is not helpful and probably incomprehensible to readers who are not already familiar with the concept, and the other wondered how to understand remark 2.3 about “the class of circuits with non-zero coefficients for exactly four elements” when the word “element” has never been used before in the text. Perhaps a formal definition of circuits can be avoided by simply saying that circuits are linear combinations of the 4 genotypic fitness values with coefficients +/-1, arranged in such a way that the expression vanishes when fitness is additive?

5) Reviewers #1 and #2 found that the discussion of the two rank orders that are compatible with additive fitness in the n=3 case in “Example 2.4” requires further clarification. What is “relabeling” in this context? And is it correct that the allowed orders are those where either the fitness is monotonic in the number of mutations, or at most one pair of genotypes violates this monotonicity, and this is a pair of genotypes with 1 and 2 mutations?

6) Reviewers #2 and #3 addressed the comparison to empirical data. Reviewer #2 was not convinced that the proposed method allows one to gain “new insights” beyond previous findings, and asks for further specification of the unique contribution of the method. Reviewer #3 suggests placing the results into the context of previous analyses of the same empirical landscapes [Ferretti et al., JTB 2016; Blanquart and Bataillon, Genetics 2016]. Moreover, it was not clear whether the proposed method (as applied to empirical landscapes) singles out the wild type sequence, or whether it treats all genotypes on the same footing.

7) Reviewer #1 requested further clarification of the precise relation between sign epistasis and signed interactions for different values of n (“Example 2.4”). It appears that the information contained in the manuscript regarding this point can be summarized as follows:

i) For n=2 loci instances of sign epistasis and signed interaction coincide.

ii) For n=3 and 4, any signed interaction corresponds either to an instance of sign epistasis or to a rectangular perturbation of higher order. If neither of these are present, the rank order is compatible with additive fitness.

iii) For more than 4 loci, the characterization by rectangular perturbations is insufficient to decide whether a rank order is compatible with additive fitness.

Statement (i) is obvious and well known, and the case n=3 has apparently been covered by the author through exhaustive enumeration. However, it is clear neither where the statement for n=4 comes from, nor in what sense the paper of Kraft et al., 1959, is relevant in the present context. Is there a characterization beyond rectangular perturbations that would apply for n > 4, and what kind of objects would it involve?

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Rank orders and signed interactions in evolutionary biology" for further consideration by eLife. Your revised article has been evaluated by Diethard Tautz (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Major issue

The Results section does not read well in its present form and needs to be reorganized. Presently the section begins with a partial explanation of the theory, followed by the discussion of experimental and hypothetical examples, and the complete account of the theory is found at the end. As a consequence, several concepts and definitions (e.g., the expression for the interaction u and the conditions for sign epistasis and rectangular perturbations) are introduced twice. I suggest subdividing the section into subsections to improve readability, and presenting the theory fully in the first subsection. I would also suggest emphasizing the simple geometric meaning of the various expressions in terms of the squares and rectangles in Figures 1 and 2: The circuits in the set C are obtained by going around one of the squares or rectangles and giving the genotype fitness values alternating signs along the way. Similarly, the conditions for sign epistasis and higher order perturbations amount to comparing fitness differences (= directions of arrows) on two opposite sides of a square/rectangle. Regarding the hypothetical examples, there is something wrong with landscape B, since the genotypes with three 1's are not peaks. Also, I did not find the discussion of the epistasis measure of Ferretti et al. very illuminating (and I did not understand why it is called \lambda instead of \gamma as in the original paper). It seems to me that both the hypothetical examples and the discussion of Ferretti et al. could be omitted without losing much content.

https://doi.org/10.7554/eLife.51004.sa1

Author response

Essential revisions:

1) All three reviewers found that the manuscript was lacking in clarity and context. Specifically, they felt that both the motivating question of the manuscript (as it arises from the original eLife publication of Crona et al., 2017), and the final answer that is achieved need to be spelled out more clearly and concisely. Moreover, it was not clearly stated how the new method of epistasis analysis goes beyond more established approaches, and in what sense it "add(s) precision to the analysis".

In line with the reviewers' suggestions, I have expanded the text in the Introduction on how the results connect to open problems from Crona et al., 2017, added a separate Discussion section that summarizes conclusions, and also added a discussion about measurement errors to the Materials and methods section. Most importantly, I have provided more evidence that the rectangular perturbation approach goes beyond established approaches.

A clarification regarding “added precision” is probably necessary. If the available information is complete fitness measurements of the preferred kind (say Wrightian fitness), then rank order methods cannot add precision. There are good reasons to consider rank orders anyway. Observations such as many peaks, prevalent sign epistasis and few trajectories to the global peak, provide intuition for the evolutionary potential.

The “added precision” refers to how rectangular perturbations perform as compared to other rank order based concepts, such as summary statistics on sign epistasis, accessible mutational trajectories, peaks or fitness graphs:

A) The new method can detect epistasis in landscapes even if there is no sign epistasis. An empirical example has been added as proof of principle.

B) Similarly, the new method reveals evolutionarily important differences between landscapes that cannot be distinguished by frequently used methods. Explicit examples have been added.

C) In addition, the revised manuscript describes a case where one can apply rectangular perturbations for handling a problem with measurement errors in sign epistasis data.

Note: One example from the original manuscript (on Aspergillus niger) was deleted, since the added examples probably are more instructive. For the other original example (on malaria), the more complete statistics on rectangular perturbations, which include replacements 00 → 11, seem more informative than statistics of sign epistasis alone. In general, it seems rather obvious that checking the effects of replacements 00 → 11, in addition to 0 → 1 (sign epistasis), is meaningful, especially since no effort is required.

2) Reviewers #2 and #3 addressed the generalizability of the results. One was concerned that the scaling of the number of perturbations as 6n would limit the practical applicability of the method, and both asked whether and how the approach is generalizable to multi-allelic sequence spaces.

In many cases it makes sense to consider selected perturbations, depending on context. For very large systems, one can also sample genotypes and sets of loci and check for perturbations. (For a collection of [mostly] deleterious mutations, it would be interesting to check if replacements at several loci sometimes increase fitness.)
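
The sampling idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the article: a "rectangle" is taken here to be a background genotype g together with two disjoint sets of loci A and B, and a perturbation is counted when replacing the loci in A changes fitness in opposite directions on the backgrounds g and g with B replaced. The function names and the two toy fitness functions are hypothetical.

```python
import random

def flip(g, loci):
    """Genotype g with the alleles at the given loci switched (0 <-> 1)."""
    return tuple(1 - b if i in loci else b for i, b in enumerate(g))

def is_perturbation(w, g, A, B):
    """True if replacing the loci in A raises fitness on one background
    (g, or g with B replaced) and lowers it on the other."""
    effect_on_g = w(flip(g, A)) - w(g)
    effect_on_gB = w(flip(g, A | B)) - w(flip(g, B))
    return effect_on_g * effect_on_gB < 0

def sampled_perturbation_rate(w, n, samples, seed=0):
    """Share of randomly sampled rectangles (g, A, B) that show a perturbation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        g = tuple(rng.randint(0, 1) for _ in range(n))
        loci = rng.sample(range(n), rng.randint(2, n))
        cut = rng.randint(1, len(loci) - 1)
        A, B = set(loci[:cut]), set(loci[cut:])
        hits += is_perturbation(w, g, A, B)
    return hits / samples

effects = (3, -1, 2, -4, 1, -2)  # hypothetical integer effects, one per locus
additive = lambda g: sum(e * b for e, b in zip(effects, g))
parity = lambda g: sum(g) % 2    # strongly epistatic toy landscape

print(sampled_perturbation_rate(additive, 6, 1000))  # additive: no perturbations
print(sampled_perturbation_rate(parity, 6, 1000))
```

Because fitness is passed as a function rather than a table, the same sketch applies when the full landscape is too large to enumerate and only sampled genotypes are measured.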

Rectangular perturbations can be defined for multi-allelic sequence spaces. However, a lot of theory on epistasis for biallelic n-locus systems (Walsh-coefficients, applications of the Fourier transform, fitness graphs) cannot easily be generalized to the multi-allelic case. Foundational work would be necessary for extending theoretical results to the multi-allelic case.

3) Another issue related to the practical application of the approach concerns the effect of measurement error. If measurement errors are not considered and there are many values close to each other, then the ranks could change significantly and the rank order could be heavily influenced by random noise, generating spuriously inferred interactions. The author states that "the results are not sensitive for a few false positives or negatives", but this statement is not supported in the text aside from its assertion. This must be addressed (maybe through simulation?) before publication.

I think the claim “the results are not sensitive for a few false positives or negatives” caused some confusion (what I intended to say was merely that the conclusion for this particular example would not change because of a small number of false positives/negatives). I have deleted the claim and added a discussion about measurement errors to the Materials and methods section. In addition, the revised manuscript describes a case where one can apply rectangular perturbations for handling a problem with measurement errors for sign epistasis data (Figure 3 in the revised manuscript shows a related simulation).

4) Reviewers #1 and #3 were concerned that the text assumes too much previous knowledge about fitness landscapes and specifically the polytope theory of Beerenwinkel et al., 2007, on the part of the reader. One reviewer found that the introduction of the “circuits from polytope theory” is not helpful and probably incomprehensible to readers who are not already familiar with the concept, and the other wondered how to understand remark 2.3 about “the class of circuits with non-zero coefficients for exactly four elements” when the word “element” has never been used before in the text. Perhaps a formal definition of circuits can be avoided by simply saying that circuits are linear combinations of the 4 genotypic fitness values with coefficients +/-1, arranged in such a way that the expression vanishes when fitness is additive?

The manuscript is intended to be much easier to read than Beerenwinkel et al., 2007, and several other cited articles. Figure 2 is intended to explain everything a reader needs to know for understanding the main idea. I have followed the reviewers' suggestion and replaced my original definition of the circuits we use with a brief description, and moved the discussion about general circuits to the Materials and methods section (some readers will appreciate the full context).

5) Reviewers #1 and #2 found that the discussion of the two rank orders that are compatible with additive fitness in the n=3 case in “Example 2.4” requires further clarification. What is "relabeling" in this context? And is it correct that the allowed orders are those where either the fitness is monotonic in the number of mutations, or at most one pair of genotypes violates this monotonicity, and this is a pair of genotypes with 1 and 2 mutations?

In brief, two rank orders differ by labels only if there exists a cube isomorphism that induces a map between the rank orders. A detailed description has been added to the Materials and methods section.

Yes, the claim is correct. For clarity, if we assume that ω000 < ωg for each g ≠ 000, then additivity would imply that ω111 > ωg for all g ≠ 111. If we also assume ω100 > ω010 > ω001, then the additivity assumption will impose further conditions on the rank order. A remaining question is whether ω100 > ω011 or ω100 < ω011. If that question has been answered, the order is completely determined.
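
The claim can be checked by brute force. The sketch below is an illustration, not the author's enumeration: it assumes additive fitness with the fitness of 000 set to 0, gives the three loci integer effects a > b > c > 0 (so that 100 beats 010, which beats 001), and collects the distinct rank orders that arise. Ties between 100 and 011 are skipped.

```python
from itertools import product

GENOTYPES = ["".join(map(str, bits)) for bits in product((0, 1), repeat=3)]

def rank_order(a, b, c):
    """Genotypes sorted from lowest to highest additive fitness (fitness of 000 is 0)."""
    effects = (a, b, c)
    fitness = {
        g: sum(e for e, bit in zip(effects, g) if bit == "1") for g in GENOTYPES
    }
    return tuple(sorted(GENOTYPES, key=fitness.get))

# Hypothetical grid of effects with a > b > c > 0.
orders = {
    rank_order(a, b, c)
    for a in range(1, 13)
    for b in range(1, a)
    for c in range(1, b)
    if a != b + c  # skip ties between genotypes 100 and 011
}
for order in sorted(orders):
    print(" < ".join(order))
# prints:
# 000 < 001 < 010 < 011 < 100 < 101 < 110 < 111
# 000 < 001 < 010 < 100 < 011 < 101 < 110 < 111
```

Only two orders appear, and they differ exactly in the relative position of 100 and 011, in line with the statement that answering this one question determines the order.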

6) Reviewers #2 and #3 addressed the comparison to empirical data. Reviewer #2 was not convinced that the proposed method allows one to gain “new insights” beyond previous findings, and asks for further specification of the unique contribution of the method. Reviewer #3 suggests placing the results into the context of previous analyses of the same empirical landscapes [Ferretti et al., JTB 2016; Blanquart and Bataillon, Genetics 2016]. Moreover, it was not clear whether the proposed method (as applied to empirical landscapes) singles out the wild type sequence, or whether it treats all genotypes on the same footing.

My response to comment 1 answers this question as well. In particular, an analysis of sign epistasis alone has neither the same ability to rule out additive fitness nor the same ability to capture global aspects as a complete analysis of rectangular perturbations.

For clarity, I have rephrased the text in one place in the revised manuscript. The new text is: “As proof of principle, we demonstrate that the large rectangles give new insights for two empirical studies (as compared to an analysis of sign epistasis)”.

I have added a remark about the measures λ and λ* from Ferretti et al., 2016. The proposed method does not single out the wild type sequence. If the genotypes of highest and lowest fitness have maximal distance, it would perhaps be most natural to assign the zero-string label to the genotype of lowest fitness (regardless of whether it is the wild type or not).

7) Reviewer #1 requested further clarification of the precise relation between sign epistasis and signed interactions for different values of n (“Example 2.4”). It appears that the information contained in the manuscript regarding this point can be summarized as follows:

i) For n=2 loci instances of sign epistasis and signed interaction coincide.

ii) For n=3 and 4, any signed interaction corresponds either to an instance of sign epistasis or to a rectangular perturbation of higher order. If neither of these are present, the rank order is compatible with additive fitness.

iii) For more than 4 loci, the characterization by rectangular perturbations is insufficient to decide whether a rank order is compatible with additive fitness.

Statement (i) is obvious and well known, and the case n=3 has apparently been covered by the author through exhaustive enumeration. However, it is clear neither where the statement for n=4 comes from, nor in what sense the paper of Kraft et al., 1959, is relevant in the present context. Is there a characterization beyond rectangular perturbations that would apply for n > 4, and what kind of objects would it involve?

If the rank order implies that u > 0 for n = 2, we do not know whether or not the system has reciprocal sign epistasis. All we know is that the system has sign epistasis. In other words, signed circuits do not reveal complete information about sign epistasis, not even for n = 2. Similarly, a signed circuit interaction for c ∈ C tells us that there are one or two perturbations corresponding to c (differently expressed, one or two pairs of parallel arrows in the corresponding rectangle disagree). Note that potential rectangular perturbations are exactly twice as many as the number of circuits in C.
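
The two-locus statement can be illustrated by enumerating all 24 rank orders. The sketch assumes the interaction has the form u = w00 + w11 − w01 − w10 (alternating signs around the square, as described in the decision letter). To decide whether a rank order forces u > 0, it uses a standard summation-by-parts argument, which is an addition here and not quoted from the article: reading genotypes from fittest to least fit, every running sum of the coefficients must be non-negative and at least one must be positive.

```python
from itertools import permutations

GENOTYPES = ("00", "01", "10", "11")
COEFF = {"00": 1, "11": 1, "01": -1, "10": -1}  # assumed: u = w00 + w11 - w01 - w10

def implies_u_positive(order_high_to_low):
    """Rank order forces u > 0 iff all running coefficient sums are >= 0 and one is > 0."""
    total, seen_positive = 0, False
    for g in order_high_to_low:
        total += COEFF[g]
        if total < 0:
            return False
        seen_positive = seen_positive or total > 0
    return seen_positive

def sign_epistasis_type(order_high_to_low):
    """Classify a rank order as 'none', 'simple' or 'reciprocal' sign epistasis."""
    rank = {g: -i for i, g in enumerate(order_high_to_low)}  # higher value = fitter
    locus1 = (rank["10"] - rank["00"]) * (rank["11"] - rank["01"]) < 0
    locus2 = (rank["01"] - rank["00"]) * (rank["11"] - rank["10"]) < 0
    if locus1 and locus2:
        return "reciprocal"
    if locus1 or locus2:
        return "simple"
    return "none"

counts = {}
for order in permutations(GENOTYPES):
    if implies_u_positive(order):
        kind = sign_epistasis_type(order)
        counts[kind] = counts.get(kind, 0) + 1
print(counts)
```

Under these assumptions, eight of the 24 orders force u > 0: four show simple and four show reciprocal sign epistasis, and none is free of sign epistasis. Knowing only that u > 0 is implied therefore guarantees sign epistasis but does not reveal which kind.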

As for additivity, checking rectangular perturbations is sufficient for n < 4. The case n = 4 is reasonably straightforward from a computational point of view (the Materials and methods section describes how one can verify the claim). The reviewer’s last question is very interesting. Some relevant results are in Maclagan, 1999, and work that cites the paper (in particular detailed statistics for n = 5). However, the theoretical understanding of rank orders and additivity is limited. Because of the questions I have made several changes:

a) I have added explanations and clarifications for n = 3 and n = 4.

b) In addition to my reference to Kraft et al., 1959, a reference to Maclagan, 1999, has been added. Maclagan gives an explicit counterexample (which is probably easier to understand), and I have provided a “dictionary” that explains concepts and notation in the paper so that an interested reader can check.

c) I have described how one can determine if a given rank order is compatible with additivity in the Materials and methods section.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Major issue

The Results section does not read well in its present form and needs to be reorganized. Presently the section begins with a partial explanation of the theory, followed by the discussion of experimental and hypothetical examples, and the complete account of the theory is found at the end. As a consequence, several concepts and definitions (e.g., the expression for the interaction u and the conditions for sign epistasis and rectangular perturbations) are introduced twice. I suggest subdividing the section into subsections to improve readability, and presenting the theory fully in the first subsection. I would also suggest emphasizing the simple geometric meaning of the various expressions in terms of the squares and rectangles in Figures 1 and 2: The circuits in the set C are obtained by going around one of the squares or rectangles and giving the genotype fitness values alternating signs along the way. Similarly, the conditions for sign epistasis and higher order perturbations amount to comparing fitness differences (= directions of arrows) on two opposite sides of a square/rectangle. Regarding the hypothetical examples, there is something wrong with landscape B, since the genotypes with three 1's are not peaks. Also, I did not find the discussion of the epistasis measure of Ferretti et al. very illuminating (and I did not understand why it is called \lambda instead of \gamma as in the original paper). It seems to me that both the hypothetical examples and the discussion of Ferretti et al. could be omitted without losing much content.

I have restructured the Results section accordingly. However, there are still some “repeats” of concepts, because I first discuss the two-locus case in some detail, and then (immediately after in the revised version) the general case, so as not to overwhelm the reader.

I have removed the hypothetical examples and the discussion of the measures introduced in Ferretti et al., 2016, in line with the suggestions. (My reason for adding the examples in the first place was that I thought reviewers wanted more evidence that the proposed method differs from checking for sign epistasis, but I don’t insist on the examples.)

I have also added the clarifications requested (see below).

I would also suggest emphasizing the simple geometric meaning of the various expressions in terms of the squares and rectangles in Figures 1 and 2: The circuits in the set C are obtained by going around one of the squares or rectangles and giving the genotype fitness values alternating signs along the way.

A similar text has been added after the definition of rectangular perturbations.

Similarly, the conditions for sign epistasis and higher order perturbations amount to comparing fitness differences (= directions of arrows) on two opposite sides of a square/rectangle.

A similar comment has been added right before the remarks in the Results section.

https://doi.org/10.7554/eLife.51004.sa2

Article and author information

Author details

  1. Kristina Crona

    Mathematics and Statistics, American University, Washington DC, United States
    Contribution
    Conceptualization, Formal analysis, Methodology
    For correspondence
    kcrona@american.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1819-474X

Funding

No external funding was received for this work

Acknowledgements

We are grateful to Casey Aguilar-Gervase, Tonia Bell, Payal Dudheida and David Dunleavy for their studies on rank orders of genotypes for four-locus systems, and to Ethan Christensen for his work on antimalarial drug resistance.

Senior Editor

  1. Diethard Tautz, Max-Planck Institute for Evolutionary Biology, Germany

Reviewing Editor

  1. Joachim Krug, University of Cologne, Germany

Reviewers

  1. Joachim Krug, University of Cologne, Germany
  2. Luca Ferretti, The Pirbright Institute

Publication history

  1. Received: August 14, 2019
  2. Accepted: January 5, 2020
  3. Accepted Manuscript published: January 14, 2020 (version 1)
  4. Version of Record published: February 4, 2020 (version 2)

Copyright

© 2020, Crona

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
