Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Detlef Weigel, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Senior Editor: Detlef Weigel, Max Planck Institute for Biology Tübingen, Tübingen, Germany
Reviewer #1 (Public review):
Summary:
In this manuscript, the authors describe a good-quality ancient maize genome from 15th-century Bolivia and try to link the genome characteristics to Inca influence. Overall, the manuscript is below the standard in the field. In particular, the geographic origin of the sample and its archaeological context are not well evidenced. While the dating of the sample and the authentication of ancient DNA have been evidenced robustly, the downstream genetic analyses do not support the conclusion that genomic changes can be attributed to Inca influence. Furthermore, sections of the manuscript are written incoherently and contain logical mistakes. In its current form, this paper is not robust and possibly of very narrow interest.
Strengths:
Technical data related to the maize sample are robust. Radiocarbon dating strongly evidenced the sample age, estimated to be around 1474 AD. Authentication of ancient DNA has been done robustly. Spontaneous C-to-T substitutions, which are present in all ancient DNA, are visible in the reported sample with the expected pattern. Despite a low fraction of C-to-T at the 1st base, this number could be consistent with the cool and dry climate in which the sample was preserved. The distribution of DNA fragment sizes is consistent with expectations for a sample of this age.
Weaknesses:
(1) Archaeological context for the maize sample is weakly supported by speculation about the origin and has unreasonable claims weighing on it. Perhaps those findings would be more convincing if the authors were to present evidence that supports their conclusions: i) a map of all known tombs near La Paz, ii) evidence supporting the stone tomb origins of this assemblage, and iii) evidence supporting non-Inca provenance of the tomb.
(2) The dismissal of admixture in the reported samples is not correctly supported. The population f3 statistic with an outgroup is indeed one of the most robust metrics for sample relatedness; however, it should not be used as a test of admixture. For an admixture test, the population f3 statistic should be used in the form: i) target population, ii) one possible parental population, iii) another possible parental population. This is typically done iteratively with all combinations of possible parental populations. Even in such a form, the population f3 statistic is not very sensitive to admixture in cases of strong genetic drift; the population f4 statistic (with an outgroup) is the recommended test for admixture instead.
(3) The geographic placement of the sample based on genetic data is not robust. To make use of the method correctly, it would be necessary to validate that genetic samples in this region follow the assumption of the 'isolation-by-distance' with dense sampling, which has not been done. Additionally, the authors posit that "This suggests that aBM might not only be genetically related to the archaeological maize from ancient Peru, but also in the possible geographic location." The method used to infer the location is based on pure genetic estimation. The above conclusion is not supported by this method, and it directly contradicts the authors' suggestion that the sample comes from Bolivia.
(4) The conclusion that Ancient Andean maize is genetically similar to European varieties and hence shares a similar evolutionary history is not well supported. The PCA plot in Figure 4 merely represents sample similarity based on two components (jointly responsible for about 20% of the variation explained), and European samples could be very distant based on other components. Indeed, the direct test using the outgroup f3 statistic does not support that European varieties are particularly closely related to ancient Andean maize. Perhaps these are more closely related to Brazil? We do not know, as this has not been measured.
(5) The conclusion that long branches in the phylogenetic tree are due to selection under local adaptation has no evidence. Long branches could be the result of missing data, nucleotide misincorporations, genetic drift, or simply the inability of phylogenetic trees to model complex population-level relationships such as admixture or incomplete lineage sorting. Additionally, the caption to Figure S3 does not explain the colour-coding.
(6) The conclusion that the selection detected in the aBM sample is due to Inca influence has no support. Firstly, a selection signature can be due to environmental or other factors. To disentangle those, the authors would need to generate data for a large number of samples from similar cultural contexts and from a wide range of environmental contexts, followed by a formal statistical test. Secondly, an increase in allele frequency can be attributed to selection or to demographic processes, and alone is not sufficient evidence for selection. The presented XP-EHH method seems more suitable. Overall, the methods used in this paper raise some concerns: i) how accurate are allele-frequency tests of selection when only a single individual is used as a proxy for a whole population, ii) the significance threshold has been arbitrarily fixed to an absolute number based on other studies, but the standard is to use, for example, the top fifth percentile. Finally, linking selection to particular GO terms is not strong evidence, as correlation does not imply causation, and the links are unclear anyway.
In sum, this manuscript presents new data that seems to be of high quality, but the analyses are frequently inappropriate and/or over-interpreted.
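To illustrate the f3 and f4 configurations discussed in points (2) above, the following is a minimal sketch, not the authors' or the reviewers' pipeline. It assumes per-population allele frequencies at shared SNPs are already available, uses made-up toy frequencies, and omits the finite-sample correction and the block-jackknife significance test that a real analysis would need. The function and variable names are hypothetical.

```python
from statistics import mean


def f3(target, source_a, source_b):
    """Admixture-form f3(target; A, B): mean over SNPs of (c - a) * (c - b).

    Simplified estimator on allele frequencies, without the
    finite-sample (heterozygosity) correction.
    """
    return mean((c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b))


def f4(pop_a, pop_b, pop_c, pop_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return mean((a - b) * (c - d)
                for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d))


# Toy allele frequencies at four SNPs (illustrative values only).
source_a = [0.0, 1.0, 0.2, 0.8]
source_b = [1.0, 0.0, 0.8, 0.2]
outgroup = [0.1, 0.9, 0.6, 0.4]

# A target built as a 50/50 mixture of the two candidate sources.
admixed = [(a + b) / 2 for a, b in zip(source_a, source_b)]

# Admixture form: the target goes first, candidate parents second and third.
f3_admixed = f3(admixed, source_a, source_b)

# A target identical to one source gives exactly zero in this toy setup.
f3_unadmixed = f3(source_a, source_a, source_b)

# f4 with an outgroup in the first position.
f4_value = f4(outgroup, admixed, source_a, source_b)

print(f"f3(admixed; A, B)  = {f3_admixed:.3f}")
print(f"f3(A; A, B)        = {f3_unadmixed:.3f}")
print(f"f4(O, admixed; A, B) = {f4_value:.3f}")
```

A clearly negative admixture-form f3 is evidence that the target is a mixture of populations related to the two sources, whereas a non-negative value does not rule admixture out, which is why the value has to be computed for every pair of candidate parental populations and complemented with f4 or D statistics.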
Reviewer #2 (Public review):
Summary:
The manuscript presents valuable new datasets from two ancient maize seeds that contribute to our growing understanding of the maize evolution and biodiversity landscape in pre-colonial South America. Some of the analyses are robust, but the selection elements are not supported.
Strengths:
The data collection is robust, and the data appear to be of sufficiently high quality to carry out some interesting analytical procedures. The central finding that aBM maize is closely related to maize from the core Inca region is well supported, although the directionality of dispersal is not supported.
Weaknesses:
The selection results are not justified, see examples in the detailed comments below.
(1) The manuscript mentions cultural and natural selection (line 76), but then only gives a couple of examples of selecting for culinary/use traits. There are many examples of selection to tolerate diverse environments that could be relevant for this discussion, if desired.
(2) I would be extremely cautious about interpreting the observations of a Spanish colonizer (lines 95-99) without very significant caveats. Indigenous agriculture and foodways would have been far more nuanced than what could be captured in this context, and the genocidal activities of the Europeans would have impacted food production activities to a degree, and any contemporaneous accounts need to be understood through that lens.
(3) The f3 stats presented in Figure 2 are not set up to test any specific admixture scenarios, so it is unsupported to conclude that the aBM maize is not admixed on this basis (lines 201-202). The original f3 publication (Patterson et al, 2012) describes some scenarios where f3 characteristics associate with admixture, but in general, there are many caveats to this approach, and it's not the ideal tool for admixture testing, compared with e.g., f4 and D (abba-baba) statistics.
(4) I'm a little bit skeptical that the Locator method adds value here, given the small training sample size and the wide geographic spread and genetic diversity of the ancient samples that include Central America. The paper describing that method (Battey et al 2020 eLife) uses much larger datasets, and while the authors do not specifically advise on sample sizes, they caution about small sample size issues. We have already seen that the ancient Peruvian maize has the most shared drift with aBM maize on the basis of the f3 stats, and the Locator analysis seems to just be reiterating that. I would advise against putting any additional weight on the Locator results as far as geographic origins, and personally I would skip this analysis in this case.
(5) The overlap in PCA should not be used to confirm that aBM is authentically ancient, because with proper data handling, PCA placement should be agnostic to modern/ancient status (see lines 224-226). It is somewhat unexpected that the ancient Tehuacan maize (with a major teosinte genomic component) falls near the ancient South American maize, but this could be an artifact of sampling throughout the PCA and the lack of teosinte samples that might attract that individual.
(6) What has been established (lines 250-251) is genetic similarity to the Inca core area, not necessarily the directionality. Might aBM have been part of a cultural region supplying maize to the Inca core region, for example? Without a specific test of dispersal directionality, which I don't think is possible with the data at hand, this is somewhat speculative.
(7) Singleton SNPs are not a typical criterion for identifying selection; this method needs some citations supporting the exact approach and validation against neutral expectations (line 278). Without Datasets S2 and S3, which are not included with this submission, it is difficult to assess this result further. However, it is very unexpected that ~18,000 out of ~49,000 SNPs would be unique to the aBM lineage. This most likely reflects some data artifact (unaccounted damage, paralogs not treated for high coverage, which are extremely prevalent in maize, etc). I'm confused about unique SNPs in this context. How can they be unique to the aBM lineage if the SNPs used overlap the Grzybowski set? The GO results do not include any details of the exact method used or a statistical assessment of the results. It is not clear if the GO terms noted are statistically enriched.
(8) The use of XP-EHH with pseudohaplotype variant calls is not viable (line 293). It is not clear what exact implementation of XP-EHH was used, but this method generally relies on phased or sometimes unphased diploid genotype calls to observe shared haplotypes, and some minimum population size to derive statistical power. No implementation of XP-EHH to my knowledge is appropriate for application to this kind of dataset.
Reviewer #3 (Public review):
Summary:
The authors seek to place archaeological maize samples (2 kernels) from Bolivia into genetic and geographical context and to assess signatures of selection. The kernels were dated to the end of the Incan empire, just prior to European colonization. Genetic data and analyses were used to characterize the distance from other ancient and modern maize samples and to predict the origin of the sample, which was discovered in a tomb near La Paz, Bolivia. Given the conquest of this region by the Incan empire, it is possible that the sample could be genetically similar to populations of maize in Peru, the center of the Incan empire. Signatures of selection in the sample could help reveal various environmental variables and cultural preferences that shaped maize genetic diversity in this region at that time.
Strengths:
The authors have generated substantial genetic data from these archaeological samples and have assembled a data set of published archaeological and modern maize samples that should help to place these samples in context. The samples are dated to an interesting time in the history of South America during a period of expansion of the Incan empire and just prior to European colonization. Much could be learned from even this small set of samples.
Weaknesses:
(1) Sample preparation and sequencing:
Details of the quality of the samples, including the percentage of endogenous DNA, are missing from the methods. The low percentage of mapped reads suggests endogenous DNA was low, and this would be useful to characterize more fully. Morphological assessment of the samples and comparison to morphological data from other maize varieties is also missing. It appears that the two kernels were ground separately and that DNA was isolated separately, but data were ultimately pooled across these genetically distinct individuals for analysis. Pooling would violate assumptions of downstream analysis, which included genetic comparison to single archaeological and modern individuals.
(2) Genetic comparison to other samples:
The authors did not meaningfully address the varying ages of the other archaeological samples and modern maize when comparing the genetic distance of their samples. The archaeological samples were as old as >5000 BP to as young as 70 BP and therefore have experienced varying extents of genetic drift from ancestral allele frequencies. For this reason, age should explicitly be included in their analysis of genetic relatedness.
(3) Assessment of selection in their ancient Bolivian sample:
This analysis relied on the identification of alleles that were unique to the ancient sample and inferred selection based on a large number of unique SNPs in two genes related to internode length. This could be a technical artifact due to poor alignment of sequence data, evidence supporting pseudogenization, or within an expected range of genetic differentiation based on population structure and the age of the samples. More rigor is needed to indicate that these genetic patterns are consistent with selection. This analysis may also be affected by the pooling of the Bolivian archaeological samples.
(4) Evidence of selection in modern vs. ancient maize:
In this analysis, samples were pooled into modern and ancient samples and compared using the XP-EHH statistic. One gene related to ovule development was identified as being targeted by selection, likely during modern improvement. Once again, ancient samples span many millennia and South, Central, and North America. These, and the modern samples included, do not represent meaningfully cohesive populations, likely explaining the extremely small number of loci differentiating the groups. This analysis is also complicated by the pooling of the Bolivian archaeological samples.