Not so local: the population genetics of convergent adaptation in maize and teosinte

  1. Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
  2. Dept. of Evolution and Ecology, University of California, Davis, CA 95616, USA
  3. Center for Population Biology, University of California, Davis, CA 95616, USA
  4. Department of Plant Sciences, University of California, Davis, CA 95616, USA
  5. Dept. of Integrative Genetics and Genomics, University of California, Davis, CA 95616, USA
  6. United States Department of Agriculture – Agriculture Research Service, Raleigh, NC 27695 USA
  7. Centro Universitario de Ciencias Biológicas y Agropecuarias, Universidad de Guadalajara, Zapopan, Jalisco CP45110, México
  8. Department of Ecology, Evolution, and Organismal Biology; Genome Informatics Facility, Iowa State University, Ames, IA 50011, USA
  9. Génétique Quantitative et Evolution - Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, 91190, Gif-sur-Yvette, France
  10. Genome Center, University of California, Davis, CA 95616, USA

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Read more about eLife’s peer review process.

Editors

  • Reviewing Editor
    Detlef Weigel
    Max Planck Institute for Biology Tübingen, Tübingen, Germany
  • Senior Editor
    Detlef Weigel
    Max Planck Institute for Biology Tübingen, Tübingen, Germany

Reviewer #1 (Public Review):

Summary:
This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the patterns are interesting, the strength of evidence in support of the conclusions drawn from these patterns is weak overall. Most of the main conclusions are not supported by convincing analyses.

Strengths:
The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each.

Weaknesses:
There were issues with many parts of the paper, especially with the strength of conclusions that can be drawn from the analyses. I list the major issues in the order in which they appear in the paper.

1. Gene flow and demography.
The f4 tests of introgression (Figure 1E) are not independent of one another. So how should we interpret these: as gene flow everywhere, or just one event in an ancestral population? More importantly, almost all the significant points involve one population (Crucero Lagunitas), which suggests that the results do not simply represent gene flow between the sub-species. There was also no signal of increased migration between sympatric pairs of populations. Overall, the evidence for gene flow presented here is not convincing. Can some kind of supporting evidence be presented?

The paper also estimates demographic histories (changes in effective population sizes) for each population, and each sub-species together. The text (lines 191-194) says that "all histories estimated a bottleneck that started approximately 10 thousand generations ago" but I do not see this. Figure 2C (not 2E, as cited in the text) shows that teosinte had declines in all populations 10,000 generations ago, but some of these declines were very minimal. Maize has a similar pattern that started more recently, but the overall species history shows no change in effective size at all. There's not a lot of signal in these figures overall.

I am also curious: how does the demographic model inferred by mushi address inbreeding and homozygosity by descent (lines 197-202)? In other words, why does a change in Ne necessarily affect inbreeding, especially when all effective population sizes are above 10,000?

2. Proportion of adaptive mutations.
The paper estimates alpha, the proportion of nonsynonymous substitutions fixed by positive selection, using two different sampling schemes for polymorphism. One uses range-wide polymorphism data and one uses each of the single populations. Because the estimates using these two approaches are similar, the authors conclude that there is little local adaptation. However, this conclusion is not justified.

There is little information as to how the McDonald-Kreitman test is carried out, but it appears that polymorphism within either teosinte or maize (using either sampling scheme) is compared to fixed differences with an outgroup. These species might be Z. luxurians or Z. diploperennis, as both are mentioned as outgroups. Regardless of which is used, this sampling means that almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte, and on the branch leading to the outgroup. Therefore, it should not be surprising that alpha does not change based on the sampling scheme, as this should barely change the number of fixed differences (no numbers are reported).

The lack of differences in results has little to do with range-wide vs restricted adaptation, and much more to do with how MK tests are constructed. Should we expect an excess of fixed amino acid differences on very short internal branches of each sub-species tree? It makes sense that there is more variation in alpha in teosinte than maize, as these branches are longer, but they all seem quite short (it is hard to know precisely, as no Fst values or similar are reported).

3. Shared and private sweeps.
In order to make biological inferences from the number of shared and private sweeps, there are a number of issues that must be addressed.

One issue is false negatives and false positives. If sweeps occur but are missed, then they will appear to be less shared than they really are. Table S3 reports very high false negative rates across much of the parameter space considered, but is not mentioned in the main text. How can we make strong conclusions about the scale of local adaptation given this? Conversely, while there is information about the false positive rate provided, this information doesn't tell us whether it's higher for population-specific events. It certainly seems likely that it would be. In either case, we should be cautious saying that some sweeps are "locally restricted" if they can be missed more than 85% of the time in a second population or falsely identified more than 25% of the time in a single population.

A second, opposite, issue is shared ancestral events. Maize populations are much more closely related than teosinte (Figure 2B). Because of this, a single, completed sweep in the ancestor of all populations could much more readily show a signal in multiple descendant populations. This is consistent with the data showing more shared events (and possibly more events overall). There also appear to be some very closely (phylogenetically) related teosinte populations. What if there's selection in their shared ancestor? For instance, Los Guajes and Palmar Chico are the two most closely related populations of teosinte and have the fewest unique sweeps (Figure 4B). How do these kinds of ancestrally shared selective events fit into the framework here?

These analyses of shared sweeps are followed by an analysis of sweeps shared by sympatric pairs of teosinte and maize. Because there are not more events shared by these pairs than expected, the paper concludes that geography and local environment are not important. But wouldn't it be better to test for shared sweeps according to the geographic proximity of populations of the same sub-species? A comparison of the two sub-species does not directly address the scale of adaptation of one organism to its environment, and therefore it is hard to know what to conclude from this analysis.

4. Convergent adaptation
My biggest concern involves the apparent main conclusion of the paper about the sources of "convergent adaptations". I believe the authors are misapplying the method of Lee and Coop (2017), and have not seriously considered the confounding factors of this method as applied. I am unconvinced by the conclusions that are made from these analyses.

The method of Lee and Coop (referred to as rdmc) is intended to be applied to a single locus (or very tightly linked loci) that shows adaptation to the same environmental factor in different populations. From their paper: "Geographically separated populations can convergently adapt to the same selection pressure. Convergent evolution at the level of a gene may arise via three distinct modes." However, in the current paper, we are not considering such a restricted case. Instead, genome-wide scans for sweep regions have been made, without regard to similar selection pressures or to whether events are occurring in the same gene. Instead, the method is applied to large genomic regions not associated with known phenotypes or selective pressures.

I think the larger worry here is whether we are truly considering the "same gene" in these analyses. The methods applied here attempt to find shared sweep regions, not shared genes (or mutations). Even then, there are no details that I could find as to what constitutes a shared sweep. The only relevant text (lines 802-803) describes how a single region is called: "We merged outlier regions within 50,000 Kb of one another and treated as a single sweep region." (It probably doesn't mean "50,000 kb", which would be 50 million bases.) However, no information is given about how to identify overlap between populations or sub-species, nor how likely it is that the shared target of selection would be included in anything identified as a shared sweep. Is there a way to gauge whether we are truly identifying the same target of selection in two populations?

The question then is, what does rdmc conclude if we are simply looking at a region that happened to be a sweep in two populations, but was not due to shared selection or similar genes? There is little testing of this application here, especially its accuracy. Testing in Lee and Coop (2017) is all carried out assuming the location of the selected site is known, and even then there is quite a lot of difficulty distinguishing among several of the non-neutral models. This was especially true when standing variation was only polymorphic for a short time, as is estimated here for many cases, and would be confused for migration (see Lee and Coop 2017). Furthermore, the model of Lee and Coop (2017) does not seem to consider a completed ancestral sweep that has signals that persist into current populations (see point 3 above). How would rdmc interpret such a scenario?

Overall, there simply doesn't seem to be enough testing of this method, nor are many caveats raised in relation to the strange distributions of standing variation times (bimodal) or migration rates (opposite between maize and teosinte). It is not clear what inferences can be made with confidence, and certainly the Discussion (and Abstract) makes conclusions about the spread of beneficial alleles via introgression that seem to outstrip the results.

Reviewer #2 (Public Review):

Summary:
The authors sampled multiple populations of maize and teosinte across Mexico, aiming to characterise the geographic scale of local adaptation, patterns of selective sweeps, and modes of convergent evolution between populations and subspecies.

Strengths & Weaknesses:
The population genomic methods are standard and appropriate, including Fst, Tajima's D, α, and selective sweep scans. The whole genome sequencing data seems high quality. However, limitations exist regarding limited sampling, potential high false-positive sweep detection rates, and weak evidence for some conclusions, like the role of migration in teosinte adaptation.

Aims & Conclusions:
The results are interesting in supporting local adaptation at intermediate geographic scales, widespread convergence between populations, and standing variation/gene flow facilitating adaptation. However, more rigorous assessments of method performance would strengthen confidence. Connecting genetic patterns to phenotypic differences would also help validate associations with local adaptation.

Impact & Utility:
This work provides some of the first genomic insights into local adaptation and convergence in maize and teosinte. However, the limited sampling and need for better method validation currently temper the utility and impact. Broader sampling and connecting results to phenotypes would make this a more impactful study and valuable resource. The population genomic data itself provides a helpful resource for the community.

Additional Context:
Previous work has found population structure and phenotypic differences consistent with local adaptation in maize and teosinte. However, genomic insights have been lacking. This paper takes initial steps to characterise genomic patterns but is limited by sampling and validation. Additional work building on this foundation could contribute to understanding local adaptation in these agriculturally vital species.

  1. Howard Hughes Medical Institute
  2. Wellcome Trust
  3. Max-Planck-Gesellschaft
  4. Knut and Alice Wallenberg Foundation