1. Evolutionary Biology
  2. Genetics and Genomics
Download icon

Repeated origins, widespread gene flow, and allelic interactions of target-site herbicide resistance mutations

  1. Julia M Kreiner  Is a corresponding author
  2. George Sandler
  3. Aaron J Stern
  4. Patrick J Tranel
  5. Detlef Weigel
  6. John R Stinchcombe
  7. Stephen I Wright
  1. Department of Ecology and Evolutionary Biology, University of Toronto, Canada
  2. Graduate Group in Computational Biology, University of California, Berkeley, United States
  3. Department of Crop Sciences, University of Illinois Urbana-Champaign, United States
  4. Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Germany
Research Article
  • Cited 0
  • Views 880
  • Annotations
Cite this article as: eLife 2022;11:e70242 doi: 10.7554/eLife.70242

Abstract

Causal mutations and their frequency in agricultural fields are well-characterized for herbicide resistance. However, we still lack understanding of their evolutionary history: the extent of parallelism in the origins of target-site resistance (TSR), how long these mutations persist, how quickly they spread, and allelic interactions that mediate their selective advantage. We addressed these questions with genomic data from 19 agricultural populations of common waterhemp (Amaranthus tuberculatus), which we show to have undergone a massive expansion over the past century, with a contemporary effective population size estimate of 8 x 107. We found variation at seven characterized TSR loci, two of which had multiple amino acid substitutions, and three of which were common. These three common resistance variants show extreme parallelism in their mutational origins, with gene flow having shaped their distribution across the landscape. Allele age estimates supported a strong role of adaptation from de novo mutations, with a median age of 30 suggesting that most resistance alleles arose soon after the onset of herbicide use. However, resistant lineages varied in both their age and evidence for selection over two different timescales, implying considerable heterogeneity in the forces that govern their persistence. Two such forces are intra- and inter-locus allelic interactions; we report a signal of extended haplotype competition between two common TSR alleles, and extreme linkage with genome-wide alleles with known functions in resistance adaptation. Together, this work reveals a remarkable example of spatial parallel evolution in a metapopulation, with important implications for the management of herbicide resistance.

Editor's evaluation

This paper studies the evolution of herbicide resistance in Amaranthus tuberculatus, a widespread agricultural weed. By illuminating how adaptive mutations arose and spread in this remarkable example of rapid human-induced adaptation, the study will be of interest to a broad audience, ranging from plant biologists interested in herbicide resistance to evolutionary biologists and population geneticists studying the fundamental factors and processes that govern rapid adaptation. The paper applies innovative population genetic methodology to support its primary finding that resistance mutations have evolved multiple times in parallel.

https://doi.org/10.7554/eLife.70242.sa0

Introduction

The evolution of resistance in agricultural pest populations occurs rapidly and repeatedly in response to herbicide and pesticide applications. Reports of herbicide resistance across agricultural landscapes have been steadily growing, threatening crop productivity and greatly raising costs for agricultural production (Peterson et al., 2018). These reports put a lower limit on the estimated number of unique resistance cases—over 500 across the globe—based on just the occurrence of resistance to different herbicide modes-of-action across different species (Heap, 2014) and barring the probably minor role of interspecific hybridization. For acetolactate synthase (ALS) inhibiting herbicides alone, over 160 species have evolved resistance since the first report in 1986, which was only five years after their initial introduction (Comai and Stalker, 1986; Heap, 2014; Whitcomb, 1999). These numbers are likely a vast underestimate of the repeatability of herbicide resistance evolution. For ALS herbicides, for example, non-synonymous substitutions at eight distinct codons confer resistance, with most of them found in multiple species (Tranel and Wright, 2002), and with multiple independent causal mutations often occurring in the same population (Heap, 2014; Kreiner et al., 2018). In addition to repeated resistance evolution through distinct causal resistance loci, it is likely that for a single locus, resistance mutations have arisen repeatedly within a species (Kreiner et al., 2019). While these observations suggest herbicide resistance may be among the most extreme cases of contemporary parallel evolution in plants, it remains unclear how often resistance is spread across the range through gene flow versus repeated independent origins.

Population genomic approaches can greatly help to understand the origin and spread of herbicide resistance. Genomic methods have tested for differences in population structure among resistant and susceptible agricultural populations (Küpper et al., 2018), reconstructed complex genomic regions associated with resistance (Molin et al., 2017), and investigated patterns of selection on and the extent of convergence between loci conferring non-target site resistance (Kreiner et al., 2020; Van Etten et al., 2019). But even for validated resistance mutations that occur within the gene whose product is targeted by the herbicide (target-site resistance [TSR] mutations), investigations of their recent evolutionary history are sparse (but see Flood et al., 2016; Kreiner et al., 2019). With large-effect mutations identified as being causal for conferring target-site resistance to nine herbicides at 19 loci across many species (Murphy and Tranel, 2019), the field is ripe for the application of population genomic techniques for resolving the evolutionary history of herbicide resistance and informing integrative management strategies in weed populations.

In contrast to most of the selective sweep literature coming from within-host studies of drug resistance in HIV (e.g. Feder et al., 2016; Pennings et al., 2014)—where sweeps occur in a closed-system, often starting from a single founding viral lineage and evolving within individual patients—evolutionary patterns of resistance to herbicides across a relevant agricultural landscape are by no means expected to be as tidy (but see Feder et al., 2017; Feder et al., 2019 for spatial structure in HIV evolution). Weedy agricultural populations themselves, or at least genotype compositions, may be transient in space and time due to widespread gene flow and changing selection regimes through rotations of both focal crops and herbicide applications (Holst et al., 2007; Naylor, 2003; Neve et al., 2009). Consequently, persistent agricultural weed populations likely comprise a collection of resistant haplotypes that have arisen and dispersed across the landscape, following a model of spatial parallel mutation in an interconnected network of populations (Ralph and Coop, 2010). Recent population genomic evidence supports this prediction for a subset of newly problematic glyphosate-resistant agricultural populations of common waterhemp (Amaranthus tuberculatus) in Ontario, Canada, where both multiple origins and long-distance dispersal contributed to the spread of glyphosate resistance (Kreiner et al., 2019).

Amaranthus tuberculatus is a major challenge for agricultural practices in the Midwestern US, and is among the most problematic weeds worldwide in terms of its capacity for evolving resistance to multiple herbicides (Tranel, 2021). Conforming to classic hypotheses about successful weeds (Baker, 1974), the species has large census population sizes, a widespread distribution, and considerable seedbanks (Costea et al., 2005). The dioecious, wind-pollinated A. tuberculatus additionally offers an obligately outcrossing mating system, providing more independent backgrounds on which new mutations can arise (Costea et al., 2005; Kreiner et al., 2018) and an effective dispersal system (Liu et al., 2017). The species has not always been troublesome—the plant is native to North America, where it likely has grown in riparian habitats long before the advent of modern agriculture (Sauer, 1957). While it persists in these habitats, in the past 100 years or so, A. tuberculatus has become strongly associated with agricultural fields (Costea et al., 2005; Tranel and Trucco, 2009). Over the past three decades, A. tuberculatus has evolved resistance to seven chemical modes-of-action (Heap, 2021; Tranel, 2021), including resistance to both ALS-inhibiting herbicides and protoporphyrinogen oxidase (PPO)-inhibiting herbicides (Shoup et al., 2003).

ALS-inhibiting herbicides have been among the most popular mode-of-action for weed control in crops since their introduction in the 1980s (Brown, 1990), and are widely used in both corn and soy production systems. They were rapidly adopted due to their application rates being an order of magnitude lower than previous herbicides and thus increased affordability, along with low toxicity and broad-spectrum weed control (Mazur and Falco, 1989), but quickly became notorious for their ability to select for resistant weeds (Tranel and Wright, 2002). Use of ALS herbicides thus decreased in the 1990s, coinciding with the widespread adoption of Round-up ready cropping systems, of which glyphosate herbicides are an essential component (Green, 2007). While historically not as popular as ALS-inhibiting herbicides, PPO-inhibiting herbicides have been used for nearly 50 years for the control of dicotyledonous (broadleaf) weeds, at its peak in the early 1990s representing 10% of annual applications in the USA but dropping to 1.5% by 2006 (U.S. Department of Agriculture, National Agricultural Statistics Service (USDA-NASS), 2012). However, PPO-inhibiting herbicides have since seen a resurgence for control of weeds that have evolved resistance to heavily-used herbicides such as glyphosate and ALS inhibitors (Dayan et al., 2018; Tranel, 2021; Tranel and Wright, 2002; Zhao et al., 2020).

Here, we investigate the evolutionary histories of mutations that have been previously demonstrated to confer target-site resistance in A. tuberculatus, focusing on TSR mutations within ALS and PPO. We infer the number of TSR mutational origins across populations and their distribution across the landscape, examining the signals left behind by both mutation and recombination. Specifically, we implement a method that infers the ancestral recombination graph (ARG) and that offers a powerful approach for inference of selective history, by providing near-complete information on relatedness among haplotypes (Rasmussen et al., 2014). Coupled with estimates of effective population size (Ne) through time based on coalescent rates across the genome (Speidel et al., 2019), these methods allow for powerful hypothesis testing on the role of standing variation versus new mutation. We assess heterogeneity in whether independent resistant lineages are associated with pronounced signals of selection based on a tree-based test and selective sweep signals, some of which may be mediated by intra- and inter-locus allelic interactions. We also examine signatures of the haplotype competition between common ALS resistance alleles, and the extent that extreme selection from herbicides on TSR mutations has impacted diversity across the genome. Our detailed population genomic analysis describing the repeatability of and heterogeneity in target-site herbicide resistance evolution advances our understanding of rapid adaptation of multicellular organisms to extreme selective pressure, while providing evolutionary informed priorities for agricultural weed management.

Results

Types of target-site mutations

To test hypotheses about the origins of TSR in Amaranthus tuberculatus, we used whole-genome sequence information from 19 agricultural fields in the Midwestern US and Southwestern Ontario, Canada. It is important to note that these populations were obtained from fields where A. tuberculatus was only poorly controlled, potentially overrepresenting the frequency of resistance across the landscape.

Having previously characterized two types of target-site glyphosate resistance in these samples (coding sequence substitutions and gene amplification; Kreiner et al., 2019), here we focus on all other characterized mutations known to confer resistance in the genus Amaranthus. We examined our sequence data for the presence of eight such substitutions in ALS, three in PPO, and one in photosystem II protein D1 (psbA - the target of a class of herbicides that inhibits electron transfer). Across 152 individuals, we found segregating variation at six out of eight known ALS mutations, and one of the three known PPO mutations (Table 1). We did not find any mutation in psbA.

Table 1
Number and frequency of resistant individuals and mutations at loci known to be causal for resistance to PPO and ALS herbicides, both totals, and within each agricultural region.

Relative frequencies given in parentheses.

PPOALS
ΔGly210Trp-574-LeuSer-653-AsnSer-653-ThrGly-654-PhePro-197-LeuPro-197-HisAla-122-SerAsp-376-Glu
Total number of individuals with TSR mutations22 (0.145)80(0.526)48 (0.316)2(0.013)1(0.007)1(0.007)1(0.007)2(0.013)3(0.020)
Total number ofTSR mutations25 (0.082)106 (0.349)59 (0.194)2(0.007)1(0.003)1(0.003)1(0.003)2(0.007)3(0.010)
TSR mutations in Walpole,
Ontario, Canada
012(0.162)12(0.162)1(0.0135)00000
TSR mutations in Essex,
Ontario, Canada
1(0.0125)23(0.288)35(0.438)01(0.0125)01(0.0125)2(0.025)3(0.0375)
TSR mutations in Midwestern US24(0.16)71(0.47)12(0.08)001(0.0067)000

The nine unique PPO and ALS target-site resistant mutations occur at seven distinct codons, with two positions segregating for multiallelic resistance: two non-synonymous changes at codons 197 and 653 in the ALS gene. Six out of nine variants are rare, with five or fewer instances, in contrast to the common Trp-574-Leu and Ser-653-Asn nonsynonymous substitutions in ALS, and the PPO ΔGly210 deletion (Table 1). Notably, the most common resistance mutation (referring to identity-by-state), Trp-574-Leu, is found in 53% of agricultural individuals, and the second most common, Ser-653-Asn, in 32% of individuals (Table 1). From these two most frequent ALS mutations alone, 74% of individuals sampled had resistance to ALS-inhibiting herbicides. Accounting for rare ALS resistance mutations only increases this percentage to 75%, because these rare variants are almost exclusively found in individuals already harboring one of the two common ALS mutations.

Regardless of the geographic region (within Essex County, Walpole Island, and the Midwestern US), multiple causal changes confer ALS resistance. Furthermore, the majority of populations (5/8 populations within the Midwestern US, 5/5 populations in Essex County, and 4/6 populations in Walpole) harbor multiple causal ALS mutations (Table 1). Thus, at just the level of these distinct mutational types, we observe genetic convergence in adaptation to ALS-inhibiting herbicides at global, regional, and population scales.

Regional selective sweep signals

To learn how and how often the individual mutations might have arisen, we first visualized regional selective sweep patterns at PPO and ALS genes—two genes that are located only ~250 kb apart in the genome—with respect to the common Trp-574-Leu, Ser-653-Asn, ΔGly210 alleles. In particular, we assayed the extent to which selection from herbicides at these genes has led to reductions in diversity, and increases in homozygosity and linkage across the haplotype, as would be expected if TSR alleles have increased in frequency rapidly enough that recombination has yet to unlink these alleles from the background on which they arose. We found that corresponding selective sweep signals appear to be highly heterogeneous across geographic regions and across resistance mutations (Figure 1). The most pronounced selective sweep signal at the regional level is for the ALS Ser-653-Asn mutation, in our large collection of nearby populations from Essex County. These resistant haplotypes show a dramatic excess of homozygosity over susceptible haplotypes for nearly 10 Mb (XPEHH, Figure 1 top-left green line). The breadth of the impact of selection on local chromosome-wide linkage disequilibrium (LD) is worth noting—this extended sweep signal is even larger than what was seen for an EPSPS-related gene amplification whose selective sweep in response to glyphosate herbicides spanned 6.5 Mb in these same individuals (Kreiner et al., 2019). This pattern of a hard selective sweep is also apparent in patterns of pairwise diversity and in LD of the focal Ser-653-Asn mutation with missense SNPs (r2) (Figure 1, top middle/right). In contrast, the Trp-574-Leu mutation in Essex actually shows a slight excess of heterozygosity and excess diversity compared to susceptible haplotypes, but nearly as strong LD with other missense SNPs (Figure 1, purple line top row).

Sweep-scan summary statistics by geographic region.

(Left) Difference in integrated haplotype homozygosities (XPEHH) between haplotypes carrying the focal TSR mutation and susceptible haplotypes. (Middle) Difference in mean pairwise diversity between haplotypes carrying the focal TSR mutation and susceptible haplotypes. (Right) r2 of other missense mutations with focal TSR mutation on genotype, rather than haplotype data. In all columns, dashed vertical lines denote PPO (left) and ALS (right) genes, which are only 250kb apart in the genome.

Selective sweep signals are much subtler in Walpole and especially in the Midwestern US compared to Essex. In Walpole as in Essex, ALS resistance shows a stronger signal of selection for Ser-653-Asn than for Trp-574-Leu, whereas signatures of selection for either mutation are almost completely lacking from the Midwestern US except for a slight peak in r2 for ALS Ser-653-Asn. The PPO ΔGly210 mutation is found at considerable frequencies only in the Midwestern US, but regional sweep signals based on homozygosity, diversity, and LD are absent with respect to the deletion (Figure 1, pink line bottom row).

Despite inconsistent sweep signals, the mutations we describe here are extremely likely to have experienced selection over their history, but varying over space and time. We know from previous functional validation that these mutations are causal for resistance to ALS or PPO inhibiting herbicides [in Amaranthus tuberculatus for the PPO deletion, as well as ALS Trp-574-Leu, and both Ser-653-Asn and Ser-653-Thr substitutions (Foes et al., 2017; Patzoldt and Tranel, 2017; Shoup et al., 2003), and in congeners for the remaining mutations (McNaughton et al., 2001; Nakka et al., 2017; Singh et al., 2018; Whaley et al., 2004)]. Thus, we set out to identify the extent to which repeated origins and gene flow have influenced regional signatures of selection, as well as identify key processes that may underlie heterogeneity in their recent evolutionary histories.

Inferring the genealogical history of target-site resistance mutations

We first took a gene tree approach to reconstruct the evolutionary history of TSR mutations, based on phased haplotypes inferred from performing the most up-to-date joint population and read-backed phasing methods (SHAPEIT4 Delaneau et al., 2019; WhatsHap v1.0 Martin et al., 2016). We found that patterns of similarity among phased haplotypes at ALS and PPO (including 5 kb upstream and downstream of either gene) indicated numerous origins for every common resistance mutation: PPO ΔGly210, ALS Trp-574-Leu, and Ser-653-Asn (Figure 2—figure supplement 1). A gene tree based on raw pairwise differences between haplotypes, as illustrated here, sets an upper limit on the number of independent origins for each mutation. Because recombination causes ancestral haplotypes to decay in size as they are passed down through time, linked sites may not necessarily have identical genealogies as a single mutational origin may be recombined onto distinct haplotypes. In cases such as this, ancestral recombination graphs (ARGs) can allow for more accurate inferences of genealogical history by generalizing the inference of coalescent history along a recombining unit (Griffiths and Marjoram, 1997; Griffiths and Marjoram, 1996; Hudson, 1983).

We reconstructed the ARG for 20,000 SNPs encompassing both ALS and PPO genes (a ~ 1 and ~ 10 kb gene, respectively, separated by 250 kb on the same chromosome) using ARGweaver (Hubisz and Siepel, 2020; Rasmussen et al., 2014). We assessed the likelihood of the ARG inferences under varying constant recombination rates and over two time step parameters (Figure 2—figure supplements 2 and 3). From our most likely parameter values (recombination = 10–8, time steps = 30), and based on the MCMC sample that maximizes the likelihood of our data across 1,250 iterations, we extracted the tree corresponding to each focal TSR locus. For all three common TSR mutations, ALS Trp-574-Leu, ALS Ser-653-Asn, and PPO ΔGly210, we found evidence for multiple independent origins producing the same resistant variant—three for ALS Trp-574-Leu, two for ALS Ser-653-Asn, and two for PPO ΔGly210 (Figure 2A). Support for these origins was generally very high, with 5/7 origins showing that 100% of MCMC samples were consistent with each cluster of haplotypes being monophyletic. The two origins with less than full support were haplotypes harbouring the ALS Trp-574-Leu, where one high-frequency origin had 85% support (105/125 MCMC samples) and a low-frequency origin had 45% support (56/125), implying that occasionally haplotypes mapping to these origins belonged to other groupings across MCMC samples. The findings of multiple origins of identical resistance mutations, and most specific origin scenarios, are consistent even across less likely recombination rate parameterizations (Figure 2—figure supplements 2 and 3). Compared to the gene trees based on the average mutational history across the PPO and ALS (Figure 2—figure supplements 1 and 4), accounting for recombination has clearly further resolved the origins of these target-site resistance alleles. In comparison to susceptible haplotypes, resistant haplotypes at each origin are apparently more homozygous but retain some diversity, presumably driven by both mutation and recombination subsequent to their origin (Figure 2—figure supplements 46).

Figure 2 with 6 supplements see all
Repeated-independent origins and range-wide distribution of three target-site resistance mutations, along with their associated significance of selection over two different timescales.

(A) Trees at focal TSR loci corresponding to an ARG estimated across 20 kb SNPs. Bold numbers around trees identify clusters of resistant haplotypes consistent with independent origins. The presence of an asterisk at each origin number implies significant evidence of selection since the mutation arose de novo at p < 0.05 against the null distribution, as in C. Support for monophyly for each origin across 125 samples of 1250 MCMC iterations is depicted by the fraction found outside each cluster. (B) Geographic distribution of haplotypes originating from distinct mutational lineages as inferred from A. TSR mutational lineages are found across numerous populations and agricultural regions, although regions show clear differences in the frequency of some mutations. (C) Results of tree-based tests of non-neutral allele frequency change (Speidel et al., 2019) from each mutational origin of TSR under two alternative models of selection; selection on a mutation since its origin versus selection over more recent timescales (on the last 0.01% of the tree). The horizontal dashed line denotes the p-value cutoff of α = 0.05, after false discovery rate correction.

Figure 2—source data 1

Tree-based coalescent test for selection under two scenarios; selection on since the mutation first arose and selection even more recent timescales (i.e.the last 1% of the tree).

All bolded significant p-values remain significant after a 5% FDR correction. Values in parentheses in the p-value columns illustrate the ranking of each origin for significance in each test.

https://cdn.elifesciences.org/articles/70242/elife-70242-fig2-data1-v2.txt
Figure 2—source data 2

Tree sequence corresponding to ALS Trp-574-Leu, ALS Ser-653-Asn, and PPO ΔGly210 extracted from the most likely iteration of the ARGweaver MCMC.

https://cdn.elifesciences.org/articles/70242/elife-70242-fig2-data2-v2.zip
Figure 2—source data 3

Resistance status at ALS Trp-574-Leu, ALS Ser-653-Asn, and PPO ΔGly210 for haplotypes mapped in Figure 2.

https://cdn.elifesciences.org/articles/70242/elife-70242-fig2-data3-v2.txt

When haplotypes belonging to distinct mutational lineages are mapped across populations (Figure 2B), it is clear that, despite the many independent origins, gene flow has also played a major role in the spread of resistance across the landscape. Haplotypes from the three most common origins of resistance to ALS herbicides —Trp-574-Leu #1, #2, and Ser-653-Asn #5 (corresponding to 39, 25, and 47 haplotypes, respectively)—were identified in 15, 10, and 14 populations. To test if mutational lineages were more geographically structured than expected given their frequency, we performed a permutation of haplotype assignment to a geographic region. All but two mutational lineages were consistent with expectations under this null, suggesting near panmixia between the Midwestern US and Ontario (Essex + Walpole) for most resistant lineages. The exceptions were ALS Trp-574-Leu #2, which is exclusive to the Midwestern US (44/47 resistant haplotypes mapping to the Midwest or 94%; 95% CIs of regional permutations [0.383,0.638]), and Ser-653-Asn #5, which is at a much higher frequency in Ontario populations (87% found in either Essex or Walpole; 95% CIs of regional permutation [0.370, 0.630]), suggesting that these mutations arose locally in these geographic regions and have not yet had the chance to spread extensively.

We next performed a tree-based test of non-neutral allele frequency change to examine whether TSR alleles have experienced consistent or shifting selection over their histories. Specifically, we implemented a tree-based statistic that relies on the order of coalescent events (Speidel et al., 2019), in addition to a modified version of this statistic that evaluates the probability of selection on more recent timescales (Materials and methods, Tree-based tests for selection). We approached these tests of selection one unique mutational origin at a time, excluding all other resistant lineages from the tree, such that our estimates of the probability of selection for a given mutational origin are relative to all other susceptible lineages.

Under the scenario of consistent selection since the origin of the mutation, four out of seven mutational origins we tested were significant at α = 0.05 after a 5% false discovery rate (FDR) correction (Figure 2C, Figure 2—source data 1). Since it arose, the Midwestern US high-frequency ALS Trp-574-Leu #2 variant showed the strongest signature of selection across all origins and all resistant loci (p = 0.0058), followed by the widespread high-frequency ALS Trp-574-Leu #1 variant (p = 0.0089), the common Midwestern US PPO variant #7 (p = 0.00434), and lastly, the high-frequency Ontario ALS Ser-653-Asn #5 variant (p = 0.02831).

In addition, a test for selection over the most recent 1% of the tree showed that for three of the five selected lineages that predated this time cutoff, there is even stronger evidence for selection on more recent timescales, after FDR correction (Figure 2C, Figure 2—source data 1). The most obvious of these is the ALS Ser-653-Asn variant #5, which, while having the weakest evidence of consistent selection over its history, shows the strongest evidence of selection over recent timescales (p = 3.44 × 10–7). These tests therefore illustrate a strong role for fluctuating selection, intensifying over recent timescales.

While our previous test provides insight into the consistency of selection across the course of a mutational lineage’s history, a conceptually related approach is to directly assess the role of resistance adaptation from standing genetic variation or new mutation based on allele age estimates relative to the onset of the selection. Allele age estimates depend greatly on the accuracy of effective population size estimates over the relevant evolutionary timescale. Namely, for herbicide resistance evolution, we posit that the relevant Ne is most likely the effective population size over the last half-century or less, corresponding with the introduction of agronomic pesticide regimes. While we have previously used δaδi (Gutenkunst et al., 2009) to model species-wide demography in a two-epoch model and found a large population size expansion (historical epoch Ne ~500,000; recent epoch var. rudis Ne ~5,000,000; Kreiner et al., 2019), we now used Relate to infer effective population size through time from genome-wide tree sequence data on even more recent timescales (Speidel et al., 2021; Speidel et al., 2019). Historical Ne between 100 and 1,000,000 years ago appears to have stayed relatively consistent, with a harmonic mean of ~63,000 (standard error across chromosomes ± 7000 years) (Figure 3A). However, our samples show evidence for massive recent population expansion over the last 100 years, with the contemporary geometric mean Ne estimate 3–4 orders of magnitude higher than the historical Ne (Figure 3A)—80,000,000 over the timescale of ALS herbicide use (approximately the last 40 years).

Contemporary population expansion of A. tuberculatus and corresponding ages of TSR variants.

(A) Relate-inferred effective population size through time, illustrating a remarkable population expansion occurring over the last 100 years. The bold line indicates results from genome-wide SNPs, whereas thinner lines represent results from chromosome-by-chromosome analyses, with the shaded area showing the bounds of the variance in the chromosome-by-chromosome data. (B) Allele age inferred from the geometric mean effective population size estimate over the timescale of contemporary herbicide use ( < 50 years ago, GM[Ne] = 83,294,700). Horizontal dashed lines for ALS Trp-574-Leu and ALS Ser-653-Asn, and PPO origins represent the approximate onset of ALS and PPO herbicide use, respectively.

Figure 3—source data 1

Effective population size estimates for each chromosome, and genome-wide, from Relate.

https://cdn.elifesciences.org/articles/70242/elife-70242-fig3-data1-v2.txt
Figure 3—source data 2

Unscaled allele age estimates corresponding to the seven inferred origins of ALS Trp-574-Leu, ALS Ser-653-Asn, and PPO ΔGly210.

Scaling was performed by dividing the raw age (based on an assumed Ne = 500,000) by 166.5894 (corresponding to the geometric mean contemporary Ne estimate of 83,294,700).

https://cdn.elifesciences.org/articles/70242/elife-70242-fig3-data2-v2.csv

Based on our contemporary Ne estimates relevant to the timescale of herbicide use, we rescaled allelic ages for distinct mutational origins across our ARG-inferred trees, accounting for variation across 1250 MCMC ancestral recombination graph iterations. We found considerable variation in the estimated age of resistance mutations across distinct lineages of the same mutation and across the three different TSR loci, according to haplotype groupings from the most likely ARG (Figure 3B). Across all mutational lineages, even for those with slightly lower support for monophyly (ALS Trp-574-Leu #1, #3), the 95% confidence interval of allele ages show these estimates are highly consistent across MCMC sampling of the ARG (Figure 3B). Our estimates of the age of the PPO ΔGly210 lineages are extremely recent and much less variable compared to ALS Trp-574-Leu and ALS Ser-653-Asn lineages (median PPO ΔGly210 age = 3.7 years [SE = 0.2]; ALS Ser-653-Asn = 37.8 [SE = 7.1]; ALS Trp-574-Leu = 37.2 [SE = 106.3]). With ALS-inhibiting herbicides having been introduced in the early 1980s, only one lineage (ALS Trp-574-Leu #3) appears to substantially predate herbicide, with an estimated age of ~350 years. However, this lineage is also the one with the least support for monophyly, which will upwardly bias estimates of allele age. Thus, while the exact timescales highly depend on accurate estimation of contemporary effective population sizes, the results are generally consistent with most mutations arising very recently after the onset of herbicide use.

In aggregate, our analyses have uncovered multiple independent origins of large-effect resistance mutations, along with heterogeneity in their evolutionary histories, from the timescale over which they have persisted to their associated signatures of selection. The spread of these parallel origins across the landscape further allows us to observe how these alleles interact when they meet (Ralph and Coop, 2010), and how this interaction may be modified by other alleles across the genome.

Haplotype competition and inter-locus interactions of target-site resistance mutations

While 16 individuals harbour both the common ALS Trp-574-Leu and ALS Ser-653-Asn mutations, haplotype-level analyses indicate that no single haplotype harbours both mutations (Figure 2—figure supplements 1 and 4). This lack of double resistant haplotypes is a strong violation of expectations under linkage equilibrium (χ2df=1 = 16.18, p = 5.77 × 10–5), further suggesting that no or very little recombination has occurred between these sites. Given how globally common these resistance alleles are (Table 1; 53% and 32% of individuals harbour ALS Trp-574-Leu or Ser-653-Asn), their coexistence, yet independence, suggests that allelic competition may be important in adaptation to ALS-inhibiting herbicides.

Locally in Essex, where haplotypes carrying either the ALS Trp-574-Leu or ALS Ser-653-Asn mutations segregate at intermediate frequencies (29% and 44% respectively; Table 1), signed linkage disequilibrium (LD) between genotypes at these two sites is considerably negative (r = –0.67). In comparison to other non-synonymous SNPs at similar frequencies (minor allele frequency > 0.20) and separated by similar distances (≤ 500 bp) across the genome, this level of repulsion is unexpected (one-tailed p = 0.033; Figure 4—figure supplement 1). However, this repulsion is not restricted to these two resistance mutations, which are only 237 bp apart. Rather, when we visualized the distribution of signed LD between all bi-allelic missense SNPs around the ALS locus, and the two TSR mutations, we observed a predominance of positive association with the focal ALS Trp-574-Leu mutation (n = 19), but a negative association with the ALS Ser-653-Asn mutation (n = 34), across an extended 10 Mb region (Figure 4A). This long-range repulsion is not seen for other pairs of loci that are outliers for particularly strong repulsion within 500 bps (Figure 4—figure supplement 2). The divergence associated with these two TSR mutations segregating in Essex is also apparent with a genotype-based PCA—structure that is otherwise absent across the genome (Figure 4B, Figure 4—figure supplement 3). Together, this indicates that selection for these alternative resistance variants from ALS-inhibiting herbicides within Essex must occur through the competition of extended haplotypes up to ⅓ the size of this focal chromosome.

Figure 4 with 4 supplements see all
Signals of intra- and inter-chromosomal allelic interactions with target-site resistance mutations.

(A) The breadth of haplotype competition between TSR mutations, illustrated by repulsion linkage disequilibrium (opposite signed LD, r) between two target-site-resistance mutations and bi-allelic missense SNPs surrounding them on Scaffold 11 in Essex. Each point shows mean LD in non-overlapping 10 kb windows. A smoothing spline shows that missense SNPs tend to be in positive LD with ALS Trp-574-Leu but negative LD with ALS Ser-653-Asn in Essex. (B) Signatures of population structure for 2 Mb around ALS compared to genome-wide, based on PCAs of genotypes in Essex. Ellipses represent 95% CIs assuming a multivariate distribution. (C) Distribution of mean signed LD of ALS TSR resistance mutations (ALS 574 or 653) with 1 Mb windows genome-wide in Essex, excluding the ALS containing Scaffold 11. Upper and lower 1% quantile indicated by dashed vertical lines. (D) Distribution of p-values from top 2% of genome-wide windows with high absolute signed LD with ALS TSR mutations, from permuting individual assignment within genomic windows and recalculating LD 1,000 times. (E) GO terms significantly enriched for biological process after FDR correction from the set of 348 genes mapping to the top 13, 1 Mb windows that show significantly extreme LD with ALS TSR mutations in Essex. Number of asterisks represent significance level after bonferroni correction (** = p < 0.01, * = p < 0.05).

Figure 4—source data 1

Signed LD in 1 kb windows with two target-site-resistance mutations and bi-allelic missense SNPs surrounding them on Scaffold 11 in Essex.

https://cdn.elifesciences.org/articles/70242/elife-70242-fig4-data1-v2.txt
Figure 4—source data 2

Local PCA at 2 Mb around ALS, along with a genome-wide PCA (excluding the ALS containing scaffold 11), in Essex.

https://cdn.elifesciences.org/articles/70242/elife-70242-fig4-data2-v2.zip
Figure 4—source data 3

Signed LD of 1 Mb windows across the genome (aside from scaffold 11) with either ALS Trp-574-Leu and ALS Ser-653-Asn.

https://cdn.elifesciences.org/articles/70242/elife-70242-fig4-data3-v2.txt

While the competition of haplotypes harbouring these TSR mutations may be important for shaping the distribution of resistance alleles across populations, the selective advantage of a given TSR haplotype may also depend on other modifier loci across the genome. In particular, we might expect that individuals that have withstood many generations of herbicide applications due to large-effect resistance mutations may have also accumulated compensatory and tolerance-conferring mutations across the genome (stacking of resistance alleles; Busi et al., 2013; ffrench-Constant et al., 2004; Kreiner et al., 2020; Petit et al., 2010; Preston, 2003). Considering that haplotype competition seems to have manifested itself in patterns of signed LD within the ALS containing chromosome, we posited that physically unlinked modifiers of resistance (Busi et al., 2013; ffrench-Constant et al., 2004; Kreiner et al., 2020; Petit et al., 2010; Preston, 2003) could be identified due to strong linkage with focal TSR loci.

To identify such interactions in populations in Essex, we calculated mean signed LD between focal ALS Trp-574-Leu or ALS Ser-653-Asn mutations and bi-allelic missense alleles in 1 Mb non-overlapping windows across the genome in Essex (Figure 4C). We then focused on the upper 1% and lower 1% of windows with particularly extreme signed LD with either TSR mutation (24/1156 1 Mb windows). On each outlier window, we performed a permutation in which we randomized TSR allele assignment among individuals 1000 times, to test whether observed LD with resistance genotypes was more extreme than expected. Compared to the null expectation, the 1 Mb window with the strongest ALS TSR association showed a significant excess of positive inter-chromosomal signed LD with ALS Trp-574-Leu (one-tailed p < 0.0001, r = + 0.068) but negative signed LD with respect to the ALS Ser-653-Asn mutation (r = –0.132), consistent with repulsion between TSR alleles. Upon further inspection, this 1 Mb region is directly centred on a cytochrome P450 gene, CYP82D47, that has been implicated in conferring non-target site resistance in Ipomoea purpurea (Leslie and Baucom, 2014).

Of the 24 outlier windows, 13 had p-values consistent with significantly extreme LD with TSR loci after FDR correction at α = 0.05 (Figure 4D). These 13 windows included 348 genes, 120 of which have Arabidopsis thaliana orthologs. These 120 orthologs were enriched for six GO biological processes belonging to two unique hierarchical categories after FDR correction, four of which were enriched even after Bonferroni correction: cellular amine and amino acid metabolic process, programmed cell death, and plant-type hypersensitive response (Figure 4E), seemingly directly related to the function of ALS—amino acid synthesis. Two particularly interesting examples from our set of genes in strong inter-chromosome LD with ALS resistance mutations are those encoding GCN2 (general control non-repressible 2) and KIN10 (SNF1 kinase homolog 10). Both proteins have been previously identified as playing key regulatory roles in response to herbicides, with GCN2 directly involved in homeostatic tolerance to ALS and glyphosate herbicides through regulating autophagy and amino acid signaling (Faus et al., 2015; Zhao et al., 2018). Similarly, KIN10, a key positive regulator of autophagy in A. thaliana, is activated in response to photosystem II herbicides (Baena-González et al., 2007; Chen et al., 2017; Fujiki et al., 2001).

Discussion

The application of herbicides to manage agronomically important weeds has led to one of the best-studied examples of parallel evolution outside the laboratory, with target-site-resistance mutations to ALS-inhibiting herbicides identified in more than 150 species (Heap, 2014). We have studied the evolution of resistance mutations at two genes, ALS and PPO, from a genome-wide perspective across a large fraction of the range of one of the most problematic weeds in the US and Canada, A. tuberculatus. We found rampant evidence for both independent origins and gene flow, competition among resistant haplotypes, and the interaction of large-effect TSR mutations with physically unlinked alleles with resistance-related functions. These results paint a picture of the rise, spread, and fate of adaptive alleles in the face of extreme selection, with important implications for the management of herbicide-resistant agricultural weeds.

Repeated origins and the spread of resistance via gene flow

We detected strong evidence for parallel evolution to herbicides within A. tuberculatus agricultural weed populations at multiple levels. Target-site mutations conferring resistance to PPO and ALS herbicides in the sampled population were found at seven distinct codons, with nine distinct variants, three of which are common. These three common mutations have arisen repeatedly seven times across our sampled populations, based on ARG inference (Table 1, Figure 2), consistent with the largely soft selective sweep signals we observe at a regional scale (Figure 1). ARGs have seen limited implementation outside of human populations for examining patterns of local adaptation but recently have been used to infer the evolutionary processes that govern islands of differentiation across birds (Hejase et al., 2020). In comparison to gene trees that illustrate the average coalescent history of these genomic regions (Figure 2—figure supplement 1), we show that accounting for intragenic recombination through ARG inference has been extremely valuable for further resolving independent origins of adaptive mutations.

From a mutation-limited view of adaptation, the extent of parallelism in target-site herbicide resistance that we observe here is particularly extreme. However, given the estimate of North American A. tuberculatus 4Neµ ~ Θπ = 0.041 for putatively neutral sites (Kreiner et al., 2019), a new TSR mutation at any of the eight adaptive ALS mutations should arise at a rate of Θπ/2—once every six generations (i.e. 0.041/2 x 8 known TSR loci = 0.164 TSR mutations per generation; see also Charlesworth, 2009; Karasov et al., 2010). Furthermore, the estimated Θπ we use to infer the rate of new mutations may even be an underestimate given contemporary population size, which may be closer to census size than long-term estimates of Ne from neutral polymorphism, should determine the mutational supply for rapid adaptation under models of evolutionary rescue (Bell, 2013; Karasov et al., 2010; Kreiner et al., 2018; Neve et al., 2014). Indeed, if we modify this value to reflect the contemporary estimate of Ne over the last 50 years (~8 × 108) and assume an A. thaliana mutation rate of 7 × 10–9 (Ossowski et al., 2010; Weng et al., 2019), our Θ becomes >1 and a new mutation at any TSR codon should arise every generation—consistent with the remarkably parallel mutational origins we describe here. From these inferences, parallelism in simple target-site herbicide resistance adaptation in A. tuberculatus appears to be on par with prokaryotic adaptation and pesticide resistance adaptation in Drosophila melanogaster, where population sizes on the order of Θ ~ 1 facilitate adaptation to occur rapidly, without being limited by mutational input at single sites (Karasov et al., 2010).

In the context of such extreme recurrent evolution, we still find an important role of gene flow in the spread of herbicide resistance across the range. Not only do agricultural regions and populations within them harbour multiple origins of TSR, but distinct recombinational units harbouring these mutational origins also map to many populations (Figure 2B). The widespread impact of gene flow is further consistent with our inference of near panmixia for all but two resistant lineages, although our permutation test is limited in power for rare origins. In part, widespread movement of A. tuberculatus and TSR variants across the North American range is likely to reflect the massive recent expansion we see here (Figure 3A)—population size increasing by four orders of magnitude over the last 100 years. This expansion also corresponds well with A. tuberculatus’s contemporary agricultural association (Sauer, 1957), suggesting that agronomic regimes are likely to have in large part facilitated the success of this weed species. Thus, both extreme mutational parallelism and a complex network of haplotype sharing, via gene flow and colonization, characterize the distribution of herbicide resistance across our sampled agricultural populations.

While the role of repeated origins and widespread gene flow we characterized here fit well with the cosmopolitan and highly convergent nature of herbicide resistance adaptation, the patterns we observe may be influenced by the sensitivity of ARG inference to both phasing quality and fine-scale recombination rate variation. We performed a two-step phasing method, performing population-level phasing with SHAPEIT4, a powerful up-to-date method (Delaneau et al., 2019), after performing read-backed phasing with WhatsHap (Martin et al., 2016). Nonetheless, phase switching remains a challenge for haplotype inference in naturally occurring populations in lieu of long-read population resequencing. Phase-switching between haplotypes is likely to be interpreted as a recombination event during ARG inference, however, by explicitly modeling how these ‘recombinational’ units relate to one another, ARG inference should still be better able to resolve independent origins of adaptive haplotypes than traditional reconstruction methods. Nonetheless, to adjust for the potential phase-switching that may artificially inflate recombination rates, we ran ARGweaver over three magnitudes of recombination rate values (r = 10–7 to 10–9), as well as two resolutions of discrete time steps (t = 20, 30) (Figure 2—figure supplement 2), to find the parameter values that maximized the likelihood of our phased data (r = 10–8 and t = 30). Even across less likely parameter values, we find consistent support for multiple origins across all TSR loci (Figure 2—figure supplement 3).

The selective history of target-site resistance lineages

Detrimental effects of monogenic resistance mutations as a result of pleiotropic tradeoffs and fluctuating ecological selective pressures (Lenormand et al., 2018) have led to the question of whether such costs could be leveraged to prevent the persistence of resistance mutations (Vila-Aiub, 2019). We were able to evaluate evidence for TSR alleles that predated the onset of herbicide usage, and thus learn about how long these alleles persist, by rescaling allelic age estimates by the geometric mean effective population size estimate over the last 50 years (Figure 3). We found evidence for one mutational lineage vastly predating the onset of ALS herbicide use, arising nearly 350 years ago; however, this lineage only showed 45% support for a monophyletic origin across MCMC iterations of the ARG inference. By and large, our results suggest that the ALS resistance mutations we have sampled arose repeatedly, soon after the introduction of ALS-inhibiting herbicides in the 1980s. The haplotypes on which they arose have since been subject to widespread gene flow, which can facilitate a rapid response to selection across a large geographic range. In comparison to these non-synonymous substitutions within the ALS gene, the single-codon deletion that confers resistance to PPO-inhibiting herbicides appears to be much younger, estimated to have only arisen within the last 3–4 years prior to population sampling.

While these allele age estimates provide a powerful test of the extent of adaptation from standing variation versus from new mutations, and the timescale over which resistance mutations may persist, these estimates are an approximation based on the geometric mean Ne over the last 50 years, and do not fully account for the monumental population expansion this species shows. Furthermore, this rescaling depends on the accuracy of our Ne estimate through time, as inferred by Relate (Speidel et al., 2019). Previously, we have also implemented δaδi (Gutenkunst et al., 2009), which uses the site frequency spectrum, to infer demographic history in these samples and similarly found evidence for large recent population expansion in a two epoch model (Kreiner et al., 2019). Relate is a better method for Ne inference in this scenario, however, as it provides estimates of Ne on contemporary timescales (0–100 years ago). Further, it has been shown to be accurate in the face of phasing error, with only a slight overestimation of Ne on recent timescales (Speidel et al., 2019). Nonetheless, this slight bias towards larger Ne estimates in the face of phasing error may suggest that our rescaled allele ages tend to underestimate the true age of these mutations.

Given the challenges of allele age estimation, we also used a conceptually related approach, testing for evidence of selection over two different timescales with a tree-based statistic that is robust to population size misspecification. Reassuringly, this tree-based test clearly shows evidence of selection since a mutation arose (consistent with de novo adaptation) for the history of four out of seven lineages. However, this test further demonstrates that these lineages are even more likely to have been under selection on more recent timescales, with rank order shifts in support across resistant lineages as compared to support for consistent selection over their history. Thus, while by our estimates, adaptation to PPO- and ALS-inhibiting herbicides relies predominantly on de novo mutation, spatially and temporally varying selection has resulted in muted signals of selection over the course of many mutational lineages’ histories—selection that has intensified over timescales even more recent than the onset of herbicide use in some geographic regions.

While the intensification of herbicide use over the last half-century may be one source of temporally varying selection, rotating herbicide and cropping regimes may also contribute to fine-scale fluctuations in selection for or against particular TSR mutations. For example, in corn and soy production systems (where the focal crop alternates each season), PPOs were typically used only in soy, whereas ALS herbicides were heavily used in both crops (Salas et al., 2016; Tranel and Wright, 2002). Thus, the lower net-positive selection from PPO-inhibiting herbicides along with their recent resurgence in popularity (Tranel, 2021) may explain the more recent origins and thus shorter persistence of the PPO ΔGly210 alleles. Similarly, while the ALS Trp-574-Leu mutation tends to confer resistance broadly across all ALS herbicides, the ALS Ser-653-Asn mutation confers resistance to fewer types of ALS-inhibiting herbicides, which also happen to be used more commonly in soy (Patzoldt and Tranel, 2017). This may contribute to the relatively lower frequency of ALS Ser-653-Asn compared to ALS Trp-574-Leu across the A. tuberculatus range, or even suggest that a lack of both focal crop and herbicide rotation facilitated strong recent selection on the high-frequency Ontario ALS Ser-653-Asn lineage.

Intra and inter-locus allelic interactions shape the history of TSR mutations

The outcome of parallel adaptation in a continuous species range has been thoroughly described by Ralph and Coop, 2010. When the geographic spread of an adaptive mutation is migration limited, partial sweeps for parallel adaptive mutational origins that occur in distinct geographic regions will be common. However, as ‘waves of advance’ of these distinct mutational origins expand, eventually coming into contact, beneficial haplotypes will compete on their way to fixation (Ralph and Coop, 2010). Given our evidence for highly parallel TSR adaptation across the range, we expect that this scenario fits the evolution of resistance particularly well. The widespread gene flow we observe sets up a scenario where independent origins of TSR mutations have now met and must interact. Under such a scenario, further background-dependent fitness effects, additive and epistatic interactions with both physically linked and unlinked loci, may shape the success of particular mutational lineages.

The coexistence of mutations of independent origin under strong selection in a given locale, notably of the ALS Trp-574-Leu and ALS Ser-653-Asn mutations in Essex, has allowed us to observe the signature of such intra-locus interactions. The 10 Mb stretch of repulsion and strong haplotypic divergence we observe (Figure 4A and B) likely reflects the unique spatial origins of each focal ALS TSR mutation and local selection. However, these mutations now co-occur across many agricultural regions. The ongoing repulsion of these alleles, combined with selection being constrained to act on independent allelic combinations (Otto, 2021), means that further adaptation to ALS-inhibiting herbicides must occur through the competition (sensu Mather, 1969) of these extended resistance haplotypes nearly 10 Mb long.

One question that arises based on our observations of haplotypic competition is why recombination has not eroded the signal of LD, and further, why we do not observe a single haplotype with both ALS mutations? The former is clearly not because of a lack of opportunity for recombination, since trans-heterozygous individuals that carry both mutations in opposite phases are not uncommon, and both mutations segregate at considerable frequencies. However, it is possible that recombinant double mutants are only rarely generated through recombination. The local LD-based population recombination rate estimate of ρ = 4 Ner = 0.0575 in a region of 100 kb on either side of ALS implies 3.4 new recombination events per generation in the distance between these two mutants (237 bp x (0.057/4)). Accounting for the local frequencies of ALS Trp-574-Leu and ALS Ser-653-Asn alleles (0.29 × 0.44), in a panmictic population, ~ 1 of these recombination events should generate a double resistant mutant every other generation. However, recombination is preferred in promoter regions (Kent et al., 2017; Sandler et al., 2020), suggesting that these calculations may overestimate intragenic levels of recombination and thus the opportunity to generate recombinant resistant haplotypes.

Understanding the fitness consequences of these alleles, both separately and in combination, will determine how this competition will be resolved. If one allele confers significantly greater resistance (and assuming similar costs in the absence of herbicides), we may expect it to eventually reach fixation, outcompeting the alternate haplotype. Alternatively, if the alleles are selectively neutral with regard to each other, stochastic processes may predominate (Ralph and Coop, 2010). Given enough time, however, recombination should create haplotypes carrying both TSR mutations. It is possible that the joint effect of non-synonymous mutations at codon 574 and 653 on ALS protein structure results in negative epistasis, which would keep such double resistant haplotypes rare (as observed for antibiotic resistance alleles, Porse et al., 2020). Alternatively, additive or positive epistatic effects between these mutations would favour fixation of a double mutant haplotype, suggesting that the current observed haplotypic competition is in fact a form of Hill-Robertson interference (where the rate of adaptation is slowed due to linkage) which can eventually be resolved through recombination (Cooper, 2007; Hill and Robertson, 1966; Otto, 2021).

While we find that intra-chromosomal interactions such as competition between alleles have influenced the selective trajectory of individual TSR alleles, we were also interested in the extent to which interactions with physically unlinked loci may have facilitated herbicide resistance evolution. We find evidence that selection on Essex haplotypes containing ALS TSR mutations has likely been affected by such second-site modifiers (Figure 4C–E). We find particularly extreme signed LD between TSR mutations and alleles on different chromosomes. LD between resistance alleles on different chromosomes has been interpreted as epistasis (Gupta et al., 2021), but LD between alleles that are not physically linked may also result from the effects of additive alleles with correlated responses to selection (e.g. the stacking of resistance alleles; Busi et al., 2013).

Alleles in windows on different chromosomes with the strongest evidence of linkage with ALS target-site resistance mutations function in biological processes related to known organismal responses to ALS herbicides. ALS-inhibiting herbicides disrupt biosynthesis of branched amino acids, and a rapid response after exposure leads to amino-acid remobilization through enhanced protein degradation (autophagy) and reduced synthesis (Orcaray et al., 2011; Trenkamp et al., 2009; Zhao et al., 2018; Zulet et al., 2013). We observe significant enrichment for functions in both amino acid production and cell death. These alleles may thus provide additional levels of tolerance on the large-effect TSR background, by compensating for homeostatic disturbances caused by ALS exposure or the potential costs of large-effect resistance mutations (as seen for antibiotic resistance, MacLean et al., 2010).

Conclusion

In conclusion, adaptation to herbicides and the emergence of well-characterized target-site resistance mutations provide a powerful system for characterizing rapid and repeated evolution in plant populations, as well as the consequences of extreme selection on local and genome-wide patterns of diversity. Studies of herbicide resistance evolution have highlighted how extreme selection can modify life-history and plant mating systems (Kuester et al., 2017; Van Etten et al., 2020) and vice versa (Kreiner et al., 2018), as well as the role of small- versus large-effect mutations (or monogenic versus polygenic adaptation) (Kreiner et al., 2021; reviewed in Délye, 2013; Powles and Yu, 2010), costs of adaptation under fluctuating environments (Vila-Aiub, 2019; Vila-Aiub et al., 2009), and mutational repeatability (e.g. Heap, 2014; Menchari et al., 2006) (see Baucom, 2019). The work here contributes to this literature by characterizing not only the parallel origins and spatial distribution of target-site-resistance alleles across a broad collection of agricultural populations, but also heterogeneity in their evolutionary history and key contributing processes, such as fluctuating selection, haplotype competition, and cross chromosomal linkage with putative resistance alleles.

From a practical perspective, this work highlights A. tuberculatus as a worst-case scenario for controlling the evolution of herbicide resistance and containing its spread. Large census and effective population sizes facilitate extreme convergence for repeated selection of identical nucleotide polymorphisms conferring resistance. One key priority for thwarting new resistance mutations from arising and spreading should thus be containing A. tuberculatus population sizes as much as possible. Our findings suggest that most resistance mutations are of very recent origin and that they can persist for several decades with average herbicide usage. Across growing seasons, our best hope is to reduce the fitness advantage of resistant types by decreasing reliance on herbicides, rotating herbicide chemical compounds and mode-of-actions, and using herbicide mixtures. However, one must also be wary of neighboring weed populations. Gene flow of resistance mutations across the range is widespread, facilitating a rapid response to selection from herbicides, potentially even in naive populations. In particular, we find strong regional patterns in the distribution of resistant lineages, suggesting coordinated herbicide management regimes across farms, land-use planning, and hygienic machine-sharing will be important tools for efficient control of herbicide-resistant weeds.

Materials and methods

Amaranthus tuberculatus sequence data

Request a detailed protocol

Sequencing and resequencing data were from a published study (Kreiner et al., 2019). Whole-genome Illumina sequencing data are available at European Nucleotide Archive (ENA), while the reference genome and its annotation are available on CoGe (reference ID = 54057). The analyses in this paper focus on herbicide resistance in 158 agricultural samples, collected from eight fields with high A. tuberculatus densities across Missouri and Illinois in the Midwestern US (collected 2010), and from newly infested counties in Ontario, Canada, Walpole Island, and Essex County (collected 2016). The eight Midwestern US populations previously had been surveyed for resistance to glyphosate (Chatham et al., 2017). Ten additional samples collected from natural populations in Ontario, Canada are also included, but only for tree-based inference. These samples have been recently analyzed with respect to the convergent evolutionary origins of amplification of the glyphosate-targeted gene, 5-enolpyruvylshikimate-3-phosphate (Kreiner et al., 2019), as well as the polygenic architecture of glyphosate resistance (Kreiner et al., 2020).

SNP calling and phasing genotypes

Request a detailed protocol

Filtered VCFs were obtained from Kreiner et al., 2019 for all analyses. Briefly, freebayes-called SNPs were filtered based on missing data ( > 80% present), repeat content, allelic bias ( > 0.25 and < 0.75), read paired status, and mapping quality (> Q30). Six individuals were removed due to excess missing data, leaving 152 for agricultural and 10 natural samples for further analyses.

Known TSR mutations were assayed for presence/absence in our set of 162 A. tuberculatus individuals. At the time, that meant checking for known TSR mutations at eight ALS codons (Yu and Powles, 2014), three PPO codons (Giacomini et al., 2017; Rousonelos et al., 2017; Varanasi et al., 2017), one PsbA (conferring resistance to photosystem II inhibitors) codon (Lu et al., 2019), and three EPSPS codons (Perotti et al., 2019). To assay these mutations in our samples, we referred to the literature on previously verified TSR mutations, extracting the sequence surrounding a given focal TSR mutation, and BLAST (Altschul et al., 1990) searched our reference genome to locate its position.

To phase genotypes into haplotypes, we first used WhatsHap (Martin et al., 2016), which performs readback phasing, and passed these phase sets to SHAPEIT4 (Delaneau et al., 2013) which further phases haplotypes based on population-level information. Since phasing is very sensitive to data quality, we also applied a more stringent threshold of no more than 10% missing data for each SNP. SHAPEIT4 also requires a genetic map; with no recombination map for A. tuberculatus yet available, we used LDhat to infer recombination rates across the genome in our samples (as in Kreiner et al., 2019). Specifically, we used the interval function to estimate variable recombination rates within each of the 16 chromosomes of the pseudoassembly, using a precomputed lookup table for a θ of 0.01 for 192 chromosomes. We then converted rho estimates to genetic distance-based recombination rates (100/4Ner; Ne = 500,000), and used a monotonic spline to extrapolate genetic distance to each SNP in our VCF. We provided SHAPEIT4 an effective population size estimate of 500,000, inferred from previous demographic modeling in dadi (Kreiner et al., 2019).

Tree inference

Request a detailed protocol

Gene trees were inferred based on haplotypes within focal target-site genes (ALS and PPO), and 5 kb on either side around them. This resulted in 296 SNPs and 622 SNPs for inference of ALS and PPO trees respectively. Using the phased data around these genes, we first converted each phased haplotype to FASTA format, performed realignment with clustal omega (Madeira et al., 2019), and then ran clustal-w2 (Rédei, 2008) to infer the maximum likelihood tree, once for each gene, with 1,000 bootstraps. We then plotted mutational status for each focal TSR mutation (ALS Trp-574-Leu, ALS Ser-653-Asn, and PPO ΔGly210) at each tip of both gene trees using ggtree (Yu, 2020).

We ran ARGweaver (Hubisz and Siepel, 2020; Rasmussen et al., 2014) on a region of 20,000 SNPs centered between the ALS and PPO genes on Scaffold 11 across our phased haplotypes. We used the settings -N (effective population size) 500,000 -m (mutation rate) 1.8e-8 --ntimes (estimated timepoints) 30 --maxtime (max coalescent time) 100e3 -c (bp compression rate) 1 -n (MCMC samples) 1,250. We used an effective population size of 500,000, based on the best fitting demographic model previously inferred from this dataset with δaδi (Gutenkunst et al., 2010; Kreiner et al., 2019). To account for any bias introduced by phasing error in the form of inflated recombination rates, we ran multiple iterations of ARGweaver at varying constant recombination rates, from r = 7e–7 to 7e–9, drawing our inferences from the parameter values that maximized the likelihood of observing our data (r = 7e–8). Similarly, we also tested out two parameter values of --ntimes (20, 30), and used t = 30 for our inference, together with r = 7e–8. We then extracted the most likely ARG sample from the MCMC chain (sample 1240), and the local trees corresponding to each of our three focal TSR mutations using arg-summarise. The arg-summarize function of ARGweaver was used to estimate the mean and 95% confidence intervals of the age of each mutational origin (based on clusters inferred from the most likely trees in the previous step) across all MCMC samples of the ARG. Since by default, arg-summarise --allele-age will infer the age of only the oldest allele under a scenario of multiple origins, we subsetted the dataset one mutational origin at a time (including all susceptible haplotypes) to obtain age estimates for each origin. ARGweaver runs on phased bi-allelic SNP data. Therefore to obtain a tree for the PPO ΔGly210 deletion, we recoded the VCF entry at this indel as an SNP, changing the reference allele at this site to a single bp record matching the reference call at the start position (G) with an alternate single base-pair call (A).

As a test for panmixia, we performed a permutation test of the proportion of a mutational lineage (i.e. haplotypes belonging to a particular origin) mapping to Ontario (Essex and Walpole) compared to the Midwest. To do so, we randomized haplotype assignment to either Ontario or the Midwest, and recalculated the proportion of haplotypes belonging to each origin found in each region, 1000x. We then calculated the 95% CI of the proportion of haplotypes mapping to either Essex or the Midwest, as the null expectation under panmixia. When our observed value for a given origin exceeded the null expectation, we took this as significant evidence for stratification in the geographic distribution of a mutational lineage.

Coalescent tree-based tests for selection

Request a detailed protocol

RELATE (Speidel et al., 2021; Speidel et al., 2019), a scalable method for estimating ARGs across large genomic datasets implements a tree-based test for detecting positive selection (Griffiths and Tavaré, 1998; Speidel et al., 2019). Under the standard coalescent model (i.e. assuming selective neutrality of mutations), the number of descendants in a particular lineage is exchangeable. Thus, one can compute the probability of some observed skew in the number of descendants using the hypergeometric distribution (Griffiths and Tavaré, 1998; Speidel et al., 2019). This approach gives us a p-value for this skew under the null (i.e. no selection). We did this on an origin-by-origin basis, comparing the rate of allele frequency change across the tree for a focal resistant lineage compared to all other susceptible haplotypes across the tree. Since this statistic is simply based on the order of coalescents, rather than branch lengths, it should be robust to misspecified effective population sizes used to infer our ARG (Speidel et al., 2019). Since RELATE assumes an infinite sites model and thus is unsuitable for testing hypotheses about multiple origins, we performed our own implementation of this method for trees outputted from Argweaver (Rasmussen et al., 2014).

Briefly, the statistic works as follows. Let fN be the number of carriers of our focal mutation in the current day, N be the total present-day sample size, and kS be the number of susceptible lineages present when the mutation increases in count from 1 to 2. We sum each individual probability that a mutation spreads to at least a given frequency, from fN to N-ks + 2.

(1) pR,denovo=f=fNNkS+2(f1)(Nf1kS3)N1kS1

The null hypothesis, that allele frequency change occurred under drift, is rejected when this one-side p-value is sufficiently small (i.e. α < 0.05), implying selection has governed the spread of this mutation since it first arose.

We also modified this statistic to test for selection on more recent timescales, and thus the scenario of adaptation from standing genetic variation. Here, we need to define kR, the number of resistant lineages at some time (t) before the present day, in addition to kS(t).

(2) pR,sgv=f=fNNkS+2f1Nf1(kR(t)1)(kS(t)1)(N1kR(t)+kS(t)1)

The null hypothesis that the frequency change (between the current day and some time in the past more recent than when the mutation first arose (t) happened under random drift (and hence no selective pressures)) is rejected if this p-value is sufficiently small.

Ne estimation through time

Request a detailed protocol

We used RELATE 1.1.6 to estimate tree sequence from distinct recombinational units across the genome from our phased dataset. Relate requires polarized ancestral allele calls, such that alternate alleles represent the derived state. To do so, we performed a multiple alignment of our A. tuberculatus genome to A. palmeri (Montgomery et al., 2020) using lastz (Harris, 2007), retained the best orthologous chain from the alignment, and extracted variant sites. We modified the A. tuberculatus reference genome with the derived allele states from our multiple alignment, using this modified reference to polarize allele calls. On each chromosome, we then ran RelateParallel.sh --mode All, using the output from all chromosomes to first estimate mutation rate (RelateMutationRate --mode Avg), reestimate branch lengths with this updated mutation rate (ReEstimateBranchLengths), and then lastly estimate population size through time (EstimatePopulationSize.sh). Population sizes were estimated from 0 years ago to 10,000,000 years ago, in epochs timesteps of 100.25 years, to obtain particularly fine-scale estimates in the recent past.

Selection scans and LD-based analyses

Request a detailed protocol

The phased data used as input for ARGweaver was also used to extract selective sweep summary statistics in selscan (Szpiech and Hernandez, 2014). In selscan, we estimated both XPEHH (Sabeti et al., 2007), in this case, the difference in the integrated extended haplotype homozygosity between resistant and susceptible haplotypes, and mean pairwise difference estimates. For both of these statistics, we provided LD-based recombination maps, inferred from LDhat. Because some individuals in Essex always carried at least one resistant ALS haplotype through either mutations at ALS Trp-574-Leu or ALS Ser-653-Asn, to compare patterns of selection associated with resistance and susceptibility, these statistics were calculated at the haplotype, rather than individual level. For each independent origin as inferred from ARGweaver, we similarly inferred XPEHH, as well as H12 (Garud and Petrov, 2016; Garud and Rosenberg, 2015) across the chromosome containing ALS.

We used plink (v1.90b3.46) (Purcell et al., 2007) to calculate both signed LD (r) between each focal TSR mutation and missense mutations on the same chromosome, and with all other bi-allelic missense SNPs across the genome. We used nonsynonymous SNPs as we expected them to be less influenced by population structure and admixture (Good, 2020) compared to synonymous SNPs, but present the correlation between genome-wide LD with synonymous and nonsynonymous SNPs in Figure 4—figure supplement 4. We performed these calculations with respect to a given TSR mutation by using the --ld-snp options to specify a focal mutation. To visualize patterns of signed LD between TSR mutations and other missense SNPs, we split the genome into non-overlapping 10 kb windows and calculated the average LD among all SNPs in each window. All LD calculations were polarized by rarity (e.g. minor alleles segregating on the same haplotypes were regarded as being in positive LD). In Essex, despite being considerably common, both ALS 574 and ALS 653 had a frequency less than 50%, so LD values with all missense alleles for each of these focal TSR mutations are directly comparable. As another visualization of the haplotype structure in this region, we conducted two genotype-based PCAs using the R package SNPrelate (Zheng et al., 2012), for genotypes spanning 2 Mb around ALS, 10 Mb around ALS, and on genome-wide genotypes.

To test whether the negative LD we observed between ALS Trp-574-Leu and ALS Ser-653-Asn was particularly extreme in Essex, we compared this value to pairs of either nonsynonymous or synonymous SNPs of similar frequency (minor allele frequency >0.20) and physical distance apart (< 500 bp). To test whether the top 2% of 1 Mb windows of nonsynonymous SNPs with particularly low or high signed LD with ALS TSR mutations (either ALS Ser-653-Asn or ALS Trp-574-Leu) was significantly different from the null expectation, we used a permutation approach whereby we randomly shuffled the assignment of the focal ALS TSR mutation between all individuals and calculated mean LD (with respect to the permuted TSR mutations) in the window of interest. We repeated this permutation 1000 times to generate a null distribution for comparison to the real average signed LD value of each region. This permutation test explicitly evaluates whether a TSR mutation and missense mutations in a focal window are more likely to be found together in the specific set of individuals containing the focal TSR mutation than any other set of individuals of the same size. Thus, this test is robust to variance in data quality across windows. The proportion of permuted observations with a mean absolute signed LD exceeding the observed signed LD was taken as the two-tailed p-value for cross-chromosome LD. Lastly, we found the intersect of these windows with the closest gene according to our genome annotation, and found their A. thaliana orthologs (Emms and Kelly, 2015). We used the set of A. thaliana orthologs found across all 13 significantly enriched 1 Mb windows in a Gene Ontology (GO) Enrichment analysis for biological processes.

Data availability

Sequencing data used in this paper were previously deposited in ENA under project number PRJEB31711, and reference genome is available on CoGe (reference ID = 54057). Code used to generate results in this paper is available at https://github.com/jkreinz/TSRevolution (copy archived at swh:1:rev:93ccf9e9d8d1fa25d1e4ef37a62bbb3c91df35dd).

The following previously published data sets were used
    1. Kreiner JM
    2. Giacomini D
    3. Bemm F
    4. Waithaka B
    5. Regalado J
    6. Lanz C
    7. Hildebrandt J
    8. Sikkema PH
    9. Tranel PJ
    10. Weigel D
    11. Stinchcombe JR
    12. Wright SI
    (2019) EBI
    ID PRJEB31711. Multiple modes of convergent adaptation in the spread of glyphosate-resistant Amaranthus tuberculatus.

References

    1. Comai L
    2. Stalker D
    (1986)
    Mechanism of action of herbicides and their molecular manipulation
    Surveys of Plant Molecular and Cell Biology 3:166–195.
    1. Griffiths RC
    2. Marjoram P
    (1996) Ancestral inference from samples of DNA sequences with recombination
    Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 3:479–502.
    https://doi.org/10.1089/cmb.1996.3.479
  1. Book
    1. Griffiths RC
    2. Marjoram P
    (1997) An Ancestral Recombination Graph
    In: Tavare S, editors. Progress in Population Genetics and Human Evolution. Springer. pp. 16–329.
    https://doi.org/10.1007/978-1-4757-2609-1_16
  2. Book
    1. Heap I
    (2014) Herbicide Resistant Weeds
    In: Pimentel D, Peshin R, editors. Integrated Pest Management. Springer Netherlands. pp. 281–301.
    https://doi.org/10.1007/978-94-007-7796-5_12
    1. Mazur BJ
    2. Falco SC
    (1989) The Development of Herbicide Resistant Crops
    Annual Review of Plant Physiology and Plant Molecular Biology 40:441–470.
    https://doi.org/10.1146/annurev.pp.40.060189.002301
  3. Conference
    1. Tranel PJ
    2. Trucco F
    (2009) Weedy and invasive plant genomics
    21st-century weed science: a call for Amaranthus genomics. pp. 53–81.
    https://doi.org/10.1002/9780813806198

Decision letter

  1. Philipp W Messer
    Reviewing Editor; Cornell University, United States
  2. Molly Przeworski
    Senior Editor; Columbia University, United States
  3. Philipp W Messer
    Reviewer; Cornell University, United States
  4. Pleuni S Pennings
    Reviewer; San Francisco State University, United States

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Repeated origins, gene flow, and allelic interactions of herbicide resistance mutations in a widespread agricultural weed" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, including Philipp W Messer as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Molly Przeworski as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Pleuni S Pennings (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

Both reviewers agree that this paper is interesting and should warrant publication in eLife, assuming that the authors can perform the following essential revisions. These points are laid out in more detail in the "recommendations for the authors" sections below.

1) Please add a discussion of what the results mean for preventing resistance in weeds, as pointed out by reviewer 2.

2) Reviewer 1 had some concerns about the robustness of ARG inferences to potential phasing errors. This should be discussed in more detail. Similarly, it should be tested whether results are robust to potential misspecification of the recombination rate at the resistance loci.

3) Please follow the suggestions provided by reviewer 2 regarding the analyses shown in Figure 2. It might be helpful to show the actual haplotypes.

4) Given that the finding of the recent population expansion is critical for the interpretation of allele ages, and therefore the question of whether adaptation occurred from SGV or de novo mutations, it would be reassuring if this result could be confirmed with a different inference method such as DaDi, stairway plot, or SMC++. If not possible, or if results are inconsistent, the claims about allele ages and sweep origins would need to be qualified accordingly.

5) Both reviewers were not entirely convinced by the finding of possible haplotype competition. Please clarify more precisely what you mean by interference / competition / etc. It should also be made more clear what the null expectation here is; for example, how often would ALS574 and ALS 653 be expected to occur together on the same haplotype in a model of free recombination?

Reviewer #1 (Recommendations for the authors):

The need for phased data for all RELATE, ARGweaver, and selection scan analyses raises some concerns about how potential phasing errors could affect the results. I feel the authors should at least discuss this. Ideally, additional analyses would be performed to test for such effects explicitly, although it is not clear to this reviewer what the most appropriate model would be. Maybe respective analyses were already performed in the papers that introduced the methods, in which case those results could simply be discussed here.

The recombination rate inference with LDhat could also be problematic at the resistance loci due to the presumably strong selection these loci have experienced, which may have affected LD patterns significantly. If, for example, sweeps have led to substantially elevated levels of linkage disequilibrium at these loci, LDhat would presumably interpret this as a lower recombination rate. My suggestion would be to run potentially affected analyses at the sweep loci not only with the recombination map inferred by LDhat, but also scenarios with a constant recombination rate set to a range of values, in order to test how robust the results would be to potential recombination rate misspecification.

The results from the demographic inference are intriguing and also play an important role for the interpretation of the results, as they provide the basis for the rescaling used to estimate both the population-level adaptive mutation rate and allele ages. If there would be a substantial error in the estimate of the recent effective population size, interpretations about adaptation from de novo mutation versus standing genetic variation could change quite dramatically. The demographic inference is currently based on the RELATE method. I wonder whether the authors have considered confirming these results with alternative methods to check how robust they are between different methods. For example, it would be interesting to compare this with SFS-based methods such as stairway plot and DaDi, or hybrid approaches such as SMC++. If the results from different methods agree, this would greatly increase trust in their accuracy. By contrast, if there are large discrepancies, possible reasons for this and potential implications for the interpretation of results would need to be discussed.

The arguably most speculative part of this study are the results on negative LD between common resistance mutations, which is interpreted by the authors as being caused by either haplotype competition, negative epistasis, or selective interference. Again, phasing could be somewhat problematic here. Maybe there are enough homozygotes in the data set that the authors could at least confirm that some of the findings hold even for unphased data? Also, I'm concerned that unknown population structure could potentially play into these results. This would be difficult to test, obviously, given that it's unclear what models one should test specifically. However, one question that I think could be more easily answered is how likely it is to find such negative LD at other genomic loci. Are the resistance loci truly genomic outliers in this regard? I hope the authors can add some discussion about whether they think population structure may or may not provide a potential alternative explanation for the observed negative LD.

Reviewer #2 (Recommendations for the authors):

I think this is a super interesting paper. It shows evidence for multiple origins (9) of drug resistance and also widespread migration (transmission) of resistance alleles between local populations. That in and of itself is worth publishing for me.

I feel like what is missing from the paper is a discussion of what these results mean for preventing resistance in weeds. I think that it means that when we want to prevent resistance, we need to be concerned with preventing mutation (control pop size / selection locally) and we need to be concerned with gene flow between populations (do not share equipment and staff?). I think the paper could benefit from more translation of the results to a non-evolutionary genetics audience. I also would love to see more easy-to-digest information about the relevant herbicides and the weed itself. This way, you can make the paper more interesting for evolutionary geneticists who have never thought about herbicide resistance and you can introduce those who are interested in plants and agriculture to evolutionary genetics.

In my opinion, some major improvements could be made to the presentation. Part of that is eye for detail in figures (e.g., use same colors and same notation throughout) and text (e.g., line 118 why mention psbA here, but nowhere else? Line 126 why mention glyphosate here? These things make the text hard to follow.…).

Some of the analysis on allele ages / evidence of recent selection should be presented differently before I am convinced.

The same is true for the analysis on clonal interference. This should start with clarifying what is meant by interference / competition / etc.

Instead of TSR I would write "resistance allele" – I think that will help other readers.

Figure 1: could Figure 1 be remade using the origins from Figure 2? Once you know that there are multiple origins, doing an analysis that ignores these origins doesn't make too much sense, I think. Can you use the same nomenclature as Table 1 and Figure 2?

Figure 2:

1. First, use ARG to find origins. Conclusion: there are multiple origins!

2. Then plot origins and fractions on a map. Conclusion: there is migration as well, though not panmixia (would it make sense to test that?).

3. Finally test for recent selection. Here I would love to see some kind of strength of selection statistic rather than just p-value.

For one allele there is evidence for recent selection, but not selection since origin (right?) – ALS 574#4. Could you show a zoom in for the tree with the 0.02% cut-off for that allele?

Conclusion of figure 2: multiple origins and gene flow are both important. For some alleles evidence that there has been selection since origin – suggests de novo evolution of resistance. But for some possibly standing genetic variation? I am not so convinced of that part of the analysis. Maybe it'd help to show haplotypes?

Suggestion: can you show haplotypes (like Harpak et al. on rats or Garud et al. on Drosophila or Williams and Pennings on HIV?)

Please label MidWest and Ontario. Could the pies have the same scale in the two regions?

Suggestion: Use the same color and nomenclature scheme throughout. For example, the 653_7 origin should have that name and color throughout the figure and other figures.

Figure 3:

The increase in pop size is clear and not surprising.

The allele ages are interesting. Could they be plotted?

I am not convinced that there is evidence for SGV for 653#7, because its age is only about 30 years according to the figure. 3B and 2C are hard to reconcile in my head.

Figure 4:

This is about interactions between ALS574 and ALS 653. These are 300-ish bp apart.

The resistance mutations never occur on the same haplotype which is surprising (how surprising? Could we get a prediction for how common this should have been if there was free recombination given the age of the alleles?).

Now, I'd say the next question is: how often do they occur in the same individual, given their commonality in each population. Like a HW-test. Is there selection against carrying two resistance alleles?

I am not entirely convinced that there is evidence for competition. Or maybe it is not clear to me what the authors mean exactly by competition / interference.

I am not sure my brain can follow Figure 4A – what would be the expectation here given multiple origins at both loci?

https://doi.org/10.7554/eLife.70242.sa1

Author response

Essential revisions:

Both reviewers agree that this paper is interesting and should warrant publication in eLife, assuming that the authors can perform the following essential revisions. These points are laid out in more detail in the "recommendations for the authors" sections below.

1) Please add a discussion of what the results mean for preventing resistance in weeds, as pointed out by reviewer 2.

We have added a paragraph to the conclusions discussing the implications of this work for preventing the evolution and spread of resistance in weeds. In the introduction, we have expanded the background on the weed species and relevant herbicide use.

2) Reviewer 1 had some concerns about the robustness of ARG inferences to potential phasing errors. This should be discussed in more detail. Similarly, it should be tested whether results are robust to potential misspecification of the recombination rate at the resistance loci.

We performed extensive reanalysis to address this important issue, which greatly contributed to the time this work was under revision. We have rephased our entire dataset, using the most up-to-date phasing method, as well as performed ARG inference under multiple constant recombination rates (and time-step parameters) to focus our inference on the parameter set that maximizes the likelihood of observing our data and assess the extent to which the conclusions are robust to uncertainty in recombination rates. The main findings of this manuscript have remained consistent—we find evidence for multiple origins along with a widespread role of gene flow in their spread, and this is robust under multiple recombination rate values. We have also expanded our discussion of the limitations of these inferences (Lines 558-574, 594-607).

3) Please follow the suggestions provided by reviewer 2 regarding the analyses shown in Figure 2. It might be helpful to show the actual haplotypes.

We have included a visualization of the haplotypes corresponding to each origin as inferred from the ARG (Figure —figure supplement 4). As suggested, we also used the H12 (and the XPEHH statistic) to evaluate how these origins correspond to signatures of selection as you move across the genome (Figure 2—figure supplement 5,6), although given the nature of the algorithm used to infer these origins, we expect the signals to have been somewhat decayed due to both subsequent mutation and recombination.

4) Given that the finding of the recent population expansion is critical for the interpretation of allele ages, and therefore the question of whether adaptation occurred from SGV or de novo mutations, it would be reassuring if this result could be confirmed with a different inference method such as DaDi, stairway plot, or SMC++. If not possible, or if results are inconsistent, the claims about allele ages and sweep origins would need to be qualified accordingly.

In a previous publication, we used DaDi to perform demographic modelling of Amaranthus tuberculatus using the same WGS dataset. Qualitatively, this inference was consistent in that it inferred a large population expansion. However, a limitation of this demographic inference and most such inference methods in the context of this work, is that it is difficult to gain estimates over contemporary timescales, <100 or even 1,000 years ago. This is one reason why we were particularly excited about using Relate for such purposes. The expansion we observed over the last hundred years with Relate is consistent with the contemporary success of this plant as an agricultural weed, especially in light of reported densities up to 360 plants/m2 in heavily infested fields (Costea, Weaver, and Tardif 2005), and high levels of diversity in the species (Kreiner et al. 2019). The robustness of Relate has been tested under phasing error, for which it showed little bias in response to phase switching (Speidel et al. 2019). Nonetheless, we have qualified the interpretation of these results accordingly, especially as they pertain to allele age estimates (second paragraph of “The selective history of target-site resistance lineages” section of the Discussion; lines 594-607).

More generally, we recognize the difficulty in allele age estimation, and this was a major motivation for a complementary set of analyses we performed—a test for evidence of selection over two different timescales across the tree that is robust to effective population size misspecification (Figure 3B, Methods section “Coalescent tree-based tests for selection”). We used this tree-based test for selection to evaluate consistent selection over the entire history of a resistant lineage (conceptually similar to selection on a de novo mutation) as well as selection on timescales more recent than a mutation's origin. Consistent with our allele age estimates, these tree-based tests provide support for most alleles showing evidence of consistent selection since those alleles arose (de novo). In contrast to allele-age estimates, this test also provides evidence for fluctuating selection. While the majority of alleles show evidence for selection since they arose, we also observe generally stronger evidence of selection on even more recent timescales along with rank order shifts in significance across mutational lineages, suggesting the strengthening of selection from herbicides on recent timescales.

5) Both reviewers were not entirely convinced by the finding of possible haplotype competition. Please clarify more precisely what you mean by interference / competition / etc. It should also be made more clear what the null expectation here is; for example, how often would ALS574 and ALS 653 be expected to occur together on the same haplotype in a model of free recombination?

We have revised the writing of both the results and discussion of this topic greatly, to clarify what we mean by the intralocus interactions we describe as haplotype competition, and how it pertains to a model of the geographic spread of spatial parallel origins as in Ralph and Coop 2010 (Results, lines 337-339, 347-350; Discussion, lines 600-613). We suggest that the combination of parallel origins and subsequent geographic spread with strong selection has resulted in the competition of remarkable large resistant haplotypes (i.e., “genotypic selection” (Neher and Shraiman 2009), as opposed to fine-scale “allelic selection”) in the face of selection from herbicides.

We highlight that (in accordance with reviewer comments) distinct, spatial origins and strong local selection alone likely lead to the origin of the signature of negative LD and extreme structure that we observed between these local genotypes (Figure 4A, B). However, these mutations are no longer distinct in the space which they occupy, and coexist both globally and locally (Figure 2B). Thus, while the distinct spatial origins of these resistance mutations may have created a signal of haplotypic divergence, the fact that such a signal is only persistent in a large region around the ALS locus (locally in Essex where both mutations are at intermediate frequencies; Figure 4B), suggests that haplotypic competition between these mutations is present, and preserving this signal of divergence. In particular, we observe heightened LD across genotypes on this chromosome for a 10 Mb region, between genotypes associated with one resistance allele or another (i.e. repulsion). Further support for the independence of and thus competition between these coexisting alleles comes from haplotype-level inference. Despite observing 16 individuals that harbour both ALS Trp-574-Leu and ALS Ser-653-Asn globally, we observe no recombinant, double resistant haplotype. This lack of recombinant haplotypes is a strong violation of expectations under linkage equilibrium (2df=1 = 16.18, p = 5.77e-5). Given that these resistance mutations are selectively redundant to a number of ALS-inhibiting herbicides, the local patterns of genotypic and haplotypic structure we describe appears consistent with the gamut of the Ralph and Coop (2010) model: parallel spatially-distinct origins, strong selection, spread via gene flow, and haplotype competition.

Lastly, we speculate in the discussion about what additional processes could influence the ultimate outcome of this haplotypic competition based on potential fitness effects when these two alleles are combined onto the same background (Discussion lines 637-650).

Reviewer #1 (Recommendations for the authors):

The need for phased data for all RELATE, ARGweaver, and selection scan analyses raises some concerns about how potential phasing errors could affect the results. I feel the authors should at least discuss this. Ideally, additional analyses would be performed to test for such effects explicitly, although it is not clear to this reviewer what the most appropriate model would be. Maybe respective analyses were already performed in the papers that introduced the methods, in which case those results could simply be discussed here.

The recombination rate inference with LDhat could also be problematic at the resistance loci due to the presumably strong selection these loci have experienced, which may have affected LD patterns significantly. If, for example, sweeps have led to substantially elevated levels of linkage disequilibrium at these loci, LDhat would presumably interpret this as a lower recombination rate. My suggestion would be to run potentially affected analyses at the sweep loci not only with the recombination map inferred by LDhat, but also scenarios with a constant recombination rate set to a range of values, in order to test how robust the results would be to potential recombination rate misspecification.

We agree with these concerns and have performed extensive revisions to the methods and discussion (lines 558-574; 594-607) to address this comment.

The reviewer points out that issues with phasing, such as phase switching, may bias inferences from sweep scans and ARG inference. In response to this, we revisited our phasing method, and updated our approach from phasing with SHAPEIT2 to read-backed phasing in WhatsHap + population-level phasing with SHAPTEIT4. Intuitively, we find these updated methods have improved our inference as we are able to resolve a higher magnitude of difference in haplotype homozygosity and difference in mean pairwise diversity between susceptible and resistant haplotypes (e.g. XPEHH y-axis increased from 2 to 3). This improvement highlights that phase switching (e.g., swapping a resistant mutation onto a susceptible haplotype) can degrade signals of selection. However, it is clear that we are able to resolve strong signals of selection nonetheless, for example, in Essex where a selective sweep signal extends for ~10 Mb. We also performed a scan based on LD across genotypes, rather than haplotypes, which picks up on very similar patterns as our haplotype-based scans despite potential phasing bias (Figure 1).

While recombination rate inference is required for nearly all phasing software, we made sure to use a constant recombination rate for ARGweaver inference at our focal resistance loci. The ARGweaver manuscript (Rasmussen et al. 2014) specifically mentions how phasing errors will lead the algorithm to infer higher recombination rate values, although we see phase switching as less of an issue for the reconstruction of evolutionary history. In this case, high rates of phase switching will result in shortening of a given recombinational unit and thus less information supporting a focal tree. While this may be the case, we still find that this breadth of information is better able to resolve origins of resistance than gene-level inference.

Following your suggestions, we additionally tested ARGweaver over 3 orders of magnitudes of constant recombination rate values (as well as two preciseness of timesteps across the tree) to find the parameter sets that maximized the likelihood of observing our data (Results lines 211-233; Figure 2 – supplemental figure 2-4). In particular, we focus our inference on t=30, r=10-8, which maximizes the likelihood of observing our data. However, we additionally show for other such parameter sets that a scenario of multiple origins is supported for all TSR loci (Figure 2 – supplemental figure 3), suggesting these inferences are robust to possible phase-switching inflating recombination rate. With a recombination rate either one order of magnitude lower or higher than maximizes our data, in all runs the origins inferred are completely consistent for ALS Ser-653-Asn and PPO210. Only for ALS Trp-574-Leu do we see some differences in the number of origins inferred, where using the most likely parameter set is able to resolve fewer origins (Figure 2 – supplemental figure 3).

Generally, while phasing error is a limitation of these ARG based analyses (discussed on lines 558-574), we find the application of ARGs to this dataset a valuable and powerful example of their use in genomic datasets. Compared to our gene-tree-based inference (Figure 2 supplemental figure 1), ARGs have been effective in bringing together origins that were separated out in the gene tree, whether they were being separated out as a result of recombination events subsequent to their origins or phase switching.

The results from the demographic inference are intriguing and also play an important role for the interpretation of the results, as they provide the basis for the rescaling used to estimate both the population-level adaptive mutation rate and allele ages. If there would be a substantial error in the estimate of the recent effective population size, interpretations about adaptation from de novo mutation versus standing genetic variation could change quite dramatically. The demographic inference is currently based on the RELATE method. I wonder whether the authors have considered confirming these results with alternative methods to check how robust they are between different methods. For example, it would be interesting to compare this with SFS-based methods such as stairwayplot and DaDi, or hybrid approaches such as SMC++. If the results from different methods agree, this would greatly increase trust in their accuracy. By contrast, if there are large discrepancies, possible reasons for this and potential implications for the interpretation of results would need to be discussed.

In a previous manuscript, we used DaDi to perform demographic inference in these samples of A. tuberculatus. Qualitatively, this two-epoch model showed a similar pattern, in that both major lineages of A. tuberculatus have experienced a recent population size expansion (Kreiner et al. 2019). However, a major limitation of this DaDi analysis, and other mentioned Ne size estimation software, is the timescale of this Ne reconstruction. In particular, we were drawn to Relate as it allows for Ne estimation over timescales relevant to herbicide use (the past ~50 years). To our knowledge, this remains a limitation of most other methods used for demographic inference. We have expanded our discussion of this Ne estimation and how bias in Ne estimates through time may influence allelic age estimates. In particular, while Relate has been shown to be relatively insensitive to phasing error (Speidel et al. 2019; Sup Figure 4), phase-switching may result in slightly overestimated effective population size, thus shifting our allele-age estimates to more recent timescales. Because of this limitation (discussed at lines 594-611), we also performed a conceptually related test robust to effective-population size misspecification, which examines evidence of selection across the tree over two different timescales.

The arguably most speculative part of this study are the results on negative LD between common resistance mutations, which is interpreted by the authors as being caused by either haplotype competition, negative epistasis, or selective interference. Again, phasing could be somewhat problematic here. Maybe there are enough homozygotes in the data set that the authors could at least confirm that some of the findings hold even for unphased data? Also, I'm concerned that unknown population structure could potentially play into these results. This would be difficult to test, obviously, given that it's unclear what models one should test specifically. However, one question that I think could be more easily answered is how likely it is to find such negative LD at other genomic loci. Are the resistance loci truly genomic outliers in this regard? I hope the authors can add some discussion about whether they think population structure may or may not provide a potential alternative explanation for the observed negative LD.

We agree that phasing is an issue if one infers evidence of these processes by just assessing whether or not we observed double resistant haplotypes. However, another main line of evidence is patterns of LD based on genotypes, rather than haplotypes (e.g. Figure 4A; using the plink1.9 algorithm). This analysis shows that based on genotypic associations, genotypes at ALS Trp-574-Leu are in negative LD with genotypes at ALS Ser-653-Asn (with an observed LD (r) value of -0.67 in Essex). Furthermore, it highlights repulsion between genotypes associated with either allele for nearly a 10 Mb stretch surrounding ALS. For these revisions, we compared the repulsion we see between these two ALS loci to similarly common (minor allele frequency > 0.20) and physically close (<500 bp apart) pairs of loci, finding that the repulsion we observe is more extreme than expected (p = 0.033). Beyond these pairwise comparisons, such a stretch of the repulsion across a 10 Mb segment is not observed (Figure 4—figure supplement 2).

While these regions are clear genomic outliers, we very much agree with the reviewer that the most likely explanation relates to a combination of local selective sweeps in separate populations followed by their spread. We have substantially revised the wording in the results (lines 386-389, 405-407) and discussion (lines 639-653) to make our inferences clearer, as well as added an analysis comparing local and genome-wide patterns of population structure through PCAs (Figure 4B). To reiterate, we believe that distinct spatial origins (and thus too some extent, population structure) contribute to the local structure we see around ALS. Strong selection on the haplotypes where these alleles arose before their contact has generated the signal of extended LD and repulsion between these local origins. However, given the co-occurrence of these alleles both globally (35% for ALS Trp-574-Leu and 19% for ALS Ser-653-Asn) and locally (within Essex, 29% and 44%, respectively), and even their co-segregation within 15 individuals, it is clear that these alleles have gone beyond their local origins and have been in contact for some time. It is clear that haplotype divergence in this region has been remarkably preserved, as individuals carrying different resistance mutations are not otherwise diverged (Figure 4B). Therefore, the evolution of target-site resistance seems to not only fit the first half of the Ralph and Coop (2010) model of spatial parallel origins (multiple origins and local increase), but also the latter half, the meeting and competing of these alleles. In particular, the breadth of the repulsion we see highlights that selection among resistant haplotypes must be occurring at this scale, through the competition of nearly 10 Mb-long haplotypes/genotypes associated with either ALS Trp-574-Leu or ALS Ser-653-Asn substitutions.

It is interesting to reflect on what may facilitate or prolong this haplotype competition: whether this may reflect inefficient selection and thus selective interference, whether the interaction of these alleles are neutral, or whether negative epistasis might drive selection against double resistant types. We agree this exercise is completely speculative, and so remove any conclusions about the underlying process and just offer these alternative scenarios in the discussion.

Reviewer #2 (Recommendations for the authors):

I think this is a super interesting paper. It shows evidence for multiple origins (9) of drug resistance and also widespread migration (transmission) of resistance alleles between local populations. That in and of itself is worth publishing for me.

I feel like what is missing from the paper is a discussion of what these results mean for preventing resistance in weeds. I think that it means that when we want to prevent resistance, we need to be concerned with preventing mutation (control pop size / selection locally) and we need to be concerned with gene flow between populations (do not share equipment and staff?). I think the paper could benefit from more translation of the results to a non-evolutionary genetics audience. I also would love to see more easy-to-digest information about the relevant herbicides and the weed itself. This way, you can make the paper more interesting for evolutionary geneticists who have never thought about herbicide resistance and you can introduce those who are interested in plants and agriculture to evolutionary genetics.

We appreciate the suggestion. We have added a paragraph to the conclusion (last paragraph, lines 729-743) and changed our introduction to provide more background and easy-to-digest information about A. tuberculatus and relevant herbicide use.

In my opinion, some major improvements could be made to the presentation. Part of that is eye for detail in figures (e.g., use same colors and same notation throughout) and text (e.g., line 118 why mention psbA here, but nowhere else? Line 126 why mention glyphosate here? These things make the text hard to follow.…).

We have gone through the text to remove any extraneous and distracting information. Thanks for pointing this out. We have also double-checked for colour consistency throughout the figures in the paper.

While we have clarified the first paragraph of the results you pointed out, we have kept our reference to psbA and glyphosate and clarified their context, as we initially assayed for all known TSR mutations (lines 153-159).

Some of the analysis on allele ages / evidence of recent selection should be presented differently before I am convinced.

The same is true for the analysis on clonal interference. This should start with clarifying what is meant by interference / competition / etc.

We address these comments below in response to specific suggestions

Instead of TSR I would write "resistance allele" – I think that will help other readers.

TSR is the common language for these types of mutations in the weed science literature, however, to make it more readable to a wider audience, in many instances, we have changed TSR to resistance allele or lineage when appropriate. Thanks for the suggestion.

Figure 1: could Figure 1 be remade using the origins from Figure 2? Once you know that there are multiple origins, doing an analysis that ignores these origins doesn't make too much sense, I think. Can you use the same nomenclature as Table 1 and Figure 2?

We think it interesting to illustrate the regional sweep signals, as this reflects the influence that herbicide use has had on patterns of diversity and homozygosity at these regional scales. However, we take your point and have additionally provided sweep scan signals corresponding to these independent origins in Figure 2 – supplemental figure 5 and 6. In some cases these signals are relatively muted, perhaps reflecting both recombinational and mutational processes subsequent to their origins.

We have also made sure to use the same nomenclature for resistance mutations throughout (i.e. ALS Trp-574-Leu for ALS Trp-574-Leu etc).

Figure 2:

1. First, use ARG to find origins. Conclusion: there are multiple origins!

2. Then plot origins and fractions on a map. Conclusion: there is migration as well, though not panmixia (would it make sense to test that?).

To test whether the stratification we observe is more extreme than expected given the number of haplotypes mapping to each origin, we performed a permutation in which we randomized assignment of haplotypes across geographic regions (Ontario versus the Midwestern US), repeated 1,000 times, and recalculated the proportion of haplotypes of a given origin mapping to each geographic region (methods lines 824-831). This provides an expectation for the random or panmictic distribution of lineages corresponding to each origin given their frequency. Across the seven origins we characterize, we find only two that have more extreme stratification than expected under this null, further emphasizing the role of gene flow in the spread of resistance.

3. Finally test for recent selection. Here I would love to see some kind of strength of selection statistic rather than just p-value.

We agree that this would be very nice. However, this is not a feature of ARGweaver or Relate inherently. We had implemented another method (Stern, Wilton, and Nielsen 2019) to infer the strength of selection for one origin, and presented it in the previous version of this manuscript. However, when analyzing each origin, we find this approach yields highly unlikely results, and thus we believe our data must not be suitable for such a test. We have removed this analysis from our revised manuscript. We hope to be able to address the question of selection coefficients associated with herbicide resistance in the future, with a dataset more appropriate for such questions (i.e., randomly sampled with respect to herbicide resistance).

For one allele there is evidence for recent selection, but not selection since origin (right?) – ALS 574#4. Could you show a zoom in for the tree with the 0.02% cut-off for that allele?

Conclusion of figure 2: multiple origins and gene flow are both important. For some alleles evidence that there has been selection since origin – suggests de novo evolution of resistance. But for some possibly standing genetic variation? I am not so convinced of that part of the analysis. Maybe it'd help to show haplotypes?

We agree that this would be very nice. However, this is not a feature of ARGweaver or Relate inherently. We had implemented another method (Stern, Wilton, and Nielsen 2019) to infer the strength of selection for one origin, and presented it in the previous version of this manuscript. However, when analyzing each origin, we find this approach yields highly unlikely results, and thus we believe our data must not be suitable for such a test. We have removed this analysis from our revised manuscript. We hope to be able to address the question of selection coefficients associated with herbicide resistance in the future, with a dataset more appropriate for such questions (i.e., randomly sampled with respect to herbicide resistance).

Suggestion: can you show haplotypes (like Harpak et al. on rats or Garud et al. on Drosophila or Williams and Pennings on HIV?)

One important addition to our ARG-based inference is including support for each monophyletic origin across MCMC samples (i.e., in Figure 2A). Based on our allele age rescaling, while we still find that one ALS Trp-574-Leu mutational lineage predates the onset of herbicide use, this lineage also shows relatively low support for monophyletic clustering across MCMC samples. Accordingly, we emphasize the lack of confidence we have in evidence for standing genetic variation for these alleles. In support of adaptation from de novo mutations, our estimates put all other ALS mutational lineages arising right around the onset of herbicide use suggesting that resistance alleles may only persist with recurrent selection from herbicides.

We have provided a visualization of the haplotypes corresponding to different ARG-inferred origins (Figure 2 – supplemental figure 4), along with figures showing how XPEHH and H12 change as we move across the chromosome for haplotypes mapping to each origin (Figure 2 supplemental figure 5 and 6).

Please label MidWest and Ontario. Could the pies have the same scale in the two regions?

We have labelled the Midwest and Ontario agricultural regions (Essex and Walpole). We have left the different pie size legends as we feel the different scales make a bigger impact on readability.

Suggestion: Use the same color and nomenclature scheme throughout. For example, the 653_7 origin should have that name and color throughout the figure and other figures.

We have double-checked that our colour scheme and nomenclature are consistent throughout.

Figure 3:

The increase in pop size is clear and not surprising.

The allele ages are interesting. Could they be plotted?

The allele ages are plotted in Figure 3b.

I am not convinced that there is evidence for SGV for 653#7, because its age is only about 30 years according to the figure. 3B and 2C are hard to reconcile in my head.

Again, these results have changed slightly with our revised results following rephasing and ARG inference with the most likely parameter set. We no longer see substantial evidence for adaptation from standing variation based on allele age estimates.

Figure 3B (the explicit allele age estimates) and Figure 2C (tree-based tests for selection) present the results from distinct tests and while they may be conceptually related, one is explicitly testing evidence for standing variation versus de novo mutation, while the other tests for non-neutral allele frequency change across a mutational lineage’s entire evolutionary history versus over just more recent timescales. In particular, our results highlight a scenario where these tests illustrated processes over different timescales – while we find evidence of de novo adaptation from allele age estimates, we also find that signatures of selection are stronger on timescales even more recent than the onset of herbicide use.

Additionally, it is reassuring to see consistent signatures of selection since a mutational lineage arose for the majority of mutations (i.e. Figure 2C), as this largely supports our inferences about adaptation on standing variation versus de novo mutation based on allele age estimates (i.e. Figure 3C), but without any assumptions about effective population size. Furthermore, the tree-based test illustrates heterogeneity in signatures of selection across origins, in that some origins show no sign of strong selection (ALS Trp-574-Leu #3, ALS Ser-653-Asn #3, PPO ΔGly210 #7) despite their inferred recent origins.

We have revised our discussion of these two approaches to emphasize our motivation for performing both tests (lines 303-310; Results), and how their interpretation differs (lines 609-620; Discussion).

Figure 4:

This is about interactions between ALS574 and ALS 653. These are 300-ish bp apart.

The resistance mutations never occur on the same haplotype which is surprising (how surprising? Could we get a prediction for how common this should have been if there was free recombination given the age of the alleles?).

We have a paragraph in the discussion which discusses the likelihood of finding a double resistant haplotype given the local recombination rate and their frequencies (lines 661-674).

Now, I'd say the next question is: how often do they occur in the same individual, given their commonality in each population. Like a HW-test. Is there selection against carrying two resistance alleles?

We have now provided the number of individuals that these alleles are found co-occurring within in the results = 16 / 152 sampled individuals (line 382). Furthermore, we find that our observed genotype frequencies are in clear violation of expectations under linkage equilibrium (line 345-347). However, this is not strong evidence for selection against carrying two resistance alleles, as we still don’t know whether recombination has had the chance to recombine these haplotypes on the same background based on their contact time, or what the fitness effects of the combined haplotype are compared to single mutations (discussed at lines 676-689).

I am not entirely convinced that there is evidence for competition. Or maybe it is not clear to me what the authors mean exactly by competition / interference.

I am not sure my brain can follow Figure 4A – what would be the expectation here given multiple origins at both loci?

We have clarified what we mean by haplotype competition in the results at lines 386-389, 405-407 and in the discussion at lines 639-653 (additionally, in earlier responses in this document). We suspect that independent (spatially structured) origins are likely playing a key role in driving this signal. We point out that repulsion disequilibrium would be expected if these alleles were to have arisen on distinct backgrounds. Since their origins, however, these alleles have spread and met (both locally and globally), and given their independence from one another (e.g. lack of haplotypes harbouring both mutations, long-range repulsion), these resistant alleles along with their linked genetic variation across a 10 Mb haplotype, must be competing on their way towards fixation.

It is likely that multiple distinct origins of ALS Trp-574-Leu in Essex have slightly weakened our ability to detect signals of competition between mutations ALS Ser-653-Asn and ALS Trp-574-Leu (in fact it is possible that haplotypic competition is occurring between the unique origins of each mutation too). The larger number of origins of ALS Trp-574-Leu haplotypes mapping to Essex likely explains why individuals harboring ALS Trp-574-Leu are more spread out in our genotypic PCA compared to individuals harboring ALS Ser-653-Asn (in a 2Mb region around ALS, Figure 4B). However, given that one origin of ALS Trp-574-Leu does predominate within Essex (likewise for ALS Ser-653-Asn), it appears that we have enough power to detect a signal of haplotypic differentiation around the ALS locus regardless.

References:

Costea, Mihai, Susan E. Weaver, and François J. Tardif. 2005. “The Biology of Invasive Alien Plants in Canada. 3. Amaranthus Tuberculatus (Moq.) Sauer Var. Rudis (Sauer) Costea and Tardif.” Canadian Journal of Plant Science. Revue Canadienne de Phytotechnie 85 (2): 507–22.

Kreiner, Julia M., Darci Ann Giacomini, Felix Bemm, Bridgit Waithaka, Julian Regalado, Christa Lanz, Julia Hildebrandt, et al. 2019. “Multiple Modes of Convergent Adaptation in the Spread of Glyphosate-Resistant Amaranthus Tuberculatus.” Proceedings of the National Academy of Sciences of the United States of America 116 (42): 21076–84.

Neher, Richard A., and Boris I. Shraiman. 2009. “Competition between Recombination and Epistasis Can Cause a Transition from Allele to Genotype Selection.” Proceedings of the National Academy of Sciences of the United States of America 106 (16): 6866–71.

Rasmussen, Matthew D., Melissa J. Hubisz, Ilan Gronau, and Adam Siepel. 2014. “Genome-Wide Inference of Ancestral Recombination Graphs.” PLoS Genetics 10 (5): e1004342.

Speidel, Leo, Marie Forest, Sinan Shi, and Simon R. Myers. 2019. “A Method for Genome-Wide Genealogy Estimation for Thousands of Samples.” Nature Genetics 51 (9): 1321–29.

Stern, Aaron J., Peter R. Wilton, and Rasmus Nielsen. 2019. “An Approximate Full-Likelihood Method for Inferring Selection and Allele Frequency Trajectories from DNA Sequence Data.” PLoS Genetics 15 (9): e1008384.

https://doi.org/10.7554/eLife.70242.sa2

Article and author information

Author details

  1. Julia M Kreiner

    Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
    Present address
    1. Department of Botany, University of British Columbia, Vancouver, Canada
    2. Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    julia.kreiner@ubc.ca
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8593-1394
  2. George Sandler

    Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
    Contribution
    Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9420-1521
  3. Aaron J Stern

    Graduate Group in Computational Biology, University of California, Berkeley, Berkeley, United States
    Contribution
    Methodology, Software
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7368-5520
  4. Patrick J Tranel

    Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, United States
    Contribution
    Data curation, Funding acquisition, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Detlef Weigel

    Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
    Contribution
    Data curation, Funding acquisition, Writing – review and editing
    Competing interests
    Deputy editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2114-7963
  6. John R Stinchcombe

    Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
    Contribution
    Data curation, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review and editing
    Contributed equally with
    Stephen I Wright
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3349-2964
  7. Stephen I Wright

    Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review and editing
    Contributed equally with
    John R Stinchcombe
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9973-9697

Funding

Natural Sciences and Engineering Research Council of Canada (PGS-D)

  • Julia M Kreiner

Society for the Study of Evolution (Rosemary Grant Advanced Award)

  • Julia M Kreiner

Natural Sciences and Engineering Research Council of Canada (Discovery Grant)

  • John Stinchcombe
  • Stephen I Wright

Canada Research Chairs (Population Genomics)

  • Stephen I Wright

Max Planck Society

  • Detlef Weigel

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We gratefully acknowledge discussion and feedback on the manuscript from Matt Osmond, Aneil Agrawal, Rob Ness, Tyler Kent, and Sally Otto, and our two reviewers, Pleuni Pennings and Philipp Messer. Thanks to Leo Speidel and Melissa Hubisz for input on RELATE and ARGweaver implementation, and Peter Sikkema and lab for their support on sampling Ontario populations. JMK was supported by the Rosemary Grant Advanced Award from the Society for the Study of Evolutionary Biology and NSERC PGS-D, DW was supported by the Max Planck Society, JRS and SIW were supported by Discovery Grants from NSERC Canada, and SIW was additionally supported by a Canada Research Chair in Population Genomics.

Senior Editor

  1. Molly Przeworski, Columbia University, United States

Reviewing Editor

  1. Philipp W Messer, Cornell University, United States

Reviewers

  1. Philipp W Messer, Cornell University, United States
  2. Pleuni S Pennings, San Francisco State University, United States

Publication history

  1. Preprint posted: May 10, 2021 (view preprint)
  2. Received: May 11, 2021
  3. Accepted: January 16, 2022
  4. Accepted Manuscript published: January 17, 2022 (version 1)
  5. Version of Record published: January 28, 2022 (version 2)

Copyright

© 2022, Kreiner et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 880
    Page views
  • 145
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Evolutionary Biology
    2. Genetics and Genomics
    Osvaldo Villa et al.
    Short Report Updated

    The influence of genetic variation on the aging process, including the incidence and severity of age-related diseases, is complex. Here, we define the evolutionarily conserved mitochondrial enzyme ALH-6/ALDH4A1 as a predictive biomarker for age-related changes in muscle health by combining Caenorhabditis elegans genetics and a gene-wide association scanning (GeneWAS) from older human participants of the US Health and Retirement Study (HRS). In a screen for mutations that activate oxidative stress responses, specifically in the muscle of C. elegans, we identified 96 independent genetic mutants harboring loss-of-function alleles of alh-6, exclusively. Each of these genetic mutations mapped to the ALH-6 polypeptide and led to the age-dependent loss of muscle health. Intriguingly, genetic variants in ALDH4A1 show associations with age-related muscle-related function in humans. Taken together, our work uncovers mitochondrial alh-6/ALDH4A1 as a critical component to impact normal muscle aging across species and a predictive biomarker for muscle health over the lifespan.

    1. Evolutionary Biology
    Linda H Lidborg et al.
    Research Article Updated

    Humans are sexually dimorphic: men and women differ in body build and composition, craniofacial structure, and voice pitch, likely mediated in part by developmental testosterone. Sexual selection hypotheses posit that, ancestrally, more ‘masculine’ men may have acquired more mates and/or sired more viable offspring. Thus far, however, evidence for either association is unclear. Here, we meta-analyze the relationships between six masculine traits and mating/reproductive outcomes (96 studies, 474 effects, N = 177,044). Voice pitch, height, and testosterone all predicted mating; however, strength/muscularity was the strongest and only consistent predictor of both mating and reproduction. Facial masculinity and digit ratios did not significantly predict either. There was no clear evidence for any effects of masculinity on offspring viability. Our findings support arguments that strength/muscularity may be sexually selected in humans, but cast doubt regarding selection for other forms of masculinity and highlight the need to increase tests of evolutionary hypotheses outside of industrialized populations.