The geographic distribution, population structure, and gene flow of maize and teosinte populations.

(A and D) Admixture proportions among populations within subspecies. The dominant cluster in each population is colored by sampling location. (B) The unrooted tree of maize and teosinte populations. (C) Geographic sampling locations for the studied maize and teosinte populations. (E) f4 tests to quantify evidence of gene flow between the subspecies for allopatric and sympatric population pairs. Each point in (E) reports the absolute Z-score for an f4 test, where a given focal population was partnered with another population of the same subspecies as a sister node, and two other populations from the other subspecies as a sister clade (see Methods for further details). Black points show f4 tests that included maize from Crucero Lagunitas; otherwise, points are colored by focal population. The dotted line corresponds to our chosen significance threshold (p = 0.001).

Inbreeding, diversity, and demography.

The distribution of π (A) and Tajima’s D (B) calculated in 100Kb windows for maize and teosinte populations. Dashed lines show the median values for the two subspecies. Filled white points show the median values of each statistic generated from coalescent simulations under the demographic history inferred for each population. Colors for each population are as in Figure 1 and are shown at the bottom of the figure. (C) The inferred demography for each population. (D) The quantile of observed Homozygosity By Descent (HBD) lengths (cM) versus those simulated under each population demography. Dashed lines show the 1:1 correspondence between the axes. (E) The distribution of inbreeding coefficients in each population. Filled white points are the average values for each population.

The proportion of mutations fixed by natural selection.

Estimated values of the proportion of mutations fixed by natural selection (α) by population. Vertical lines show the 95% credible interval.

The distribution of shared and private selective sweeps.

(A) The total number of sweeps inferred in each population. (B) The proportion of sweeps that are unique to each population. (C) The negative log10 p values for hypergeometric tests to identify maize-teosinte population pairs that shared more sweeps than expected by chance (see Methods). P values were adjusted for multiple tests using the Benjamini and Yekutieli method. Populations along the y axis are maize (order matches the legend below, with Amatlán de Cañas at the bottom), while the point color designates the teosinte population each maize population was paired with. Points with a black outline highlight the sympatric population comparisons. Point size is scaled by the number of shared sweeps identified in each pair. The dotted line indicates our chosen significance level (p = 0.05). (D) Counts of shared and unique sweeps broken down by how many maize and teosinte populations they occurred in. Grey boxes show sweeps shared across the two subspecies.
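As an illustration of the test described for panel (C), the following is a minimal sketch of an upper-tail hypergeometric test followed by a Benjamini and Yekutieli adjustment. It is not the analysis code used here: the function names and the way the counts are defined (a total number of candidate regions `N`, `K` sweeps in one population, `n` in the other, `k` shared) are assumptions made for illustration, and the exact formulation is given in Methods.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N,
    of which K are 'successes'. The mapping of N, K, n, k onto sweep
    counts is an assumption for illustration."""
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def benjamini_yekutieli(pvals):
    """Benjamini-Yekutieli adjusted p values, returned in input order."""
    m = len(pvals)
    c_m = sum(1.0 / j for j in range(1, m + 1))  # harmonic correction term
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Adjusted p values from such a procedure would then be compared to the 0.05 threshold shown by the dotted line.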

Modes of convergent adaptation and affiliated parameters for shared selective sweeps.

(A) The difference in composite likelihood scores for the best supported mode of convergent adaptation (colors in top legend) compared to the next best mode (black points), and for the best mode compared to the neutral model (other end of each line segment above or below the black point). (B) Selection coefficients colored by the most likely mode of convergent adaptation. (C) Number of shared sweeps for both subspecies that were inferred to be from each convergent adaptation mode. (D) The most likely source population for shared sweeps that converged via migration. Bars are colored by population (bottom legend) and are outlined in black for teosinte and grey for maize. (E) Observed frequency of the inferred time in generations that each selected allele persisted prior to selection for models of convergent adaptation via standing variation. (F) Observed frequency of each inferred migration rate value for models of convergent adaptation via migration. Panels C, D, E and F are partitioned by which subspecies shared the sweep.

Population sampling location information.

f4 tests including the maize Crucero Lagunitas population are significantly elevated compared to those without.

Significant f4 tests. Each row of the table reports the number of significant f4 tests that occurred with a given focal and secondary population, where the two other tip positions were filled with each of the remaining populations for each subspecies. Rows that are left blank in the secondary column are used to report the total number of significant trees for a given focal population.

Predicted values of α across mutation types.

Grey bands for each mutation type show the 95% credible intervals averaged over each population.

Treemix phylogeny including both subsamples of Palmar Chico.

Another potential explanation for lowered sweep sharing between replicates is that sweeps vary in their detectability based on their characteristics. Namely, sweeps that were weakly selected, incomplete, and/or started at a high initial frequency prior to the onset of selection (soft sweeps) may differ in how readily they are detected by the methods we employed. We conducted a simulation experiment to better understand the potential causes of the low shared proportion, and to measure performance in detecting different kinds of sweeps more generally. We used discoal (Kern and Schrider 2016) to simulate sweeps in a 400Kb region using the average genome-wide maize mutation and recombination rates under the inferred demographic history of the maize population from Palmar Chico (Figure 2). We simulated four distinct scenarios: classical hard sweeps, where selection acts to fix an adaptive mutation; soft sweeps, where selection is initiated after the adaptive allele reaches a specified frequency; incomplete sweeps, where a hard sweep simulation is stopped at a specified frequency; and neutral simulations without selection. For soft sweeps, we varied the initial frequency by drawing from a beta distribution with shape parameters 1 and 20. Incomplete sweeps finished when the adaptive allele reached a frequency of 0.5. For all three types of sweeps, we also varied the strength of selection by setting the parameter α = 4N0s to 10, 50, or 100, where N0 is the present day effective population size and s is the selection coefficient. In addition to matching demography and other parameters, we used the same sampling scheme, simulating 50 individuals, then randomly choosing two non-overlapping subsets of 10 individuals (https://github.com/silastittes/ms_sub).
From the simulations we assessed the true/false positive/negative rates for each combination of sweep type and strength of selection (α), as well as the distribution of base pair overlap between sweep regions inferred in the two random subsamples. The same sweep inference methods and parameters were used for these simulations and the empirical samples (see Methods). Overall, we found that the sweep characteristics we explored did affect our power to detect sweeps, as well as the amount of overlap between the sweep regions. Namely, weakly selected sweeps had consistently lower True Positive Rates (Table S3).
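The sampling scheme and the performance summaries described above can be sketched as follows. This is an illustrative outline only, not the code in the ms_sub repository: the function names, the use of integer indices for individuals, and the representation of each replicate as a pair of booleans (whether a sweep was simulated, whether a sweep was called) are assumptions.

```python
import random


def split_subsamples(n_total=50, n_sub=10, seed=0):
    """Randomly choose two non-overlapping subsets of n_sub individuals
    from n_total simulated individuals (indexed 0..n_total-1)."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_total), 2 * n_sub)  # sampled without replacement
    return sorted(chosen[:n_sub]), sorted(chosen[n_sub:])


def classification_rates(truth, called):
    """Compute TPR, FNR, TNR, and FPR across simulation replicates.

    truth[i]  : True if replicate i was simulated with a sweep
    called[i] : True if a sweep was inferred in replicate i
    """
    tp = sum(1 for t, c in zip(truth, called) if t and c)
    fn = sum(1 for t, c in zip(truth, called) if t and not c)
    tn = sum(1 for t, c in zip(truth, called) if not t and not c)
    fp = sum(1 for t, c in zip(truth, called) if not t and c)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both sweep and neutral replicates")
    return {
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (tn + fp),
    }
```

In this sketch, the neutral simulations supply the replicates used for the true negative and false positive rates, while each sweep type and value of α would be summarized separately.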

Performance to detect simulated hard, soft, and incomplete sweeps under varying strengths of selection under the maize Palmar Chico population demography. TPR, TNR, FNR, and FPR stand for true positive, true negative, false negative, and false positive rates, respectively.

Performance to detect simulated hard, soft, and incomplete sweeps under varying strengths of selection under the maize Palmar Chico population demography.

Each panel shows a combination of sweep type (hard, soft, or incomplete) and strength of selection (α = 4Nes = 10, 50, or 100).

Degree of overlap between simulated sweep regions taken from two downsampled replicates under the maize Palmar Chico population demography.

Positive values show the amount of overlap in base pairs between sweep regions, while negative values represent the space between them. Panel structure follows that of S4.
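One way to obtain a single value that is positive for overlapping regions and negative for separated ones is sketched below. The function name and the half-open (start, end) coordinate convention are assumptions made for illustration; this is not necessarily how the values in the figure were computed.

```python
def signed_overlap(region_a, region_b):
    """Return the overlap in base pairs between two (start, end) regions.

    A positive value is the length of the shared interval; a negative
    value is the size of the gap between the regions (sign flipped).
    Coordinates are assumed to be half-open: start inclusive, end exclusive.
    """
    a_start, a_end = region_a
    b_start, b_end = region_b
    return min(a_end, b_end) - max(a_start, b_start)
```

For example, under these assumptions regions (100, 200) and (150, 300) give 50, while regions (100, 200) and (250, 300) give -50.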

Frequency of each population as the mutation source for sweeps shared via migration.

The order of populations along the x axis matches that of the source populations labeled for each strip along the top.

Inferred sweeps shared between subspecies via migration.

The x axis is sorted by the number of populations each sweep was found in. Populations are sorted along the y axis first by subspecies then by their number of sweeps.