On the limits of fitting complex models of population history to f-statistics
Abstract
Our understanding of population history in deep time has been assisted by fitting admixture graphs ('AGs') to data: models that specify the ordering of population splits and mixtures, which, along with the amount of genetic drift on each lineage and the proportions of mixture, is the only information needed to predict the patterns of allele frequency correlation among populations. Not needing to specify population size changes, split times, or whether admixture events were sudden or drawn out simplifies the space of models that need to be searched. However, the space of possible AGs relating populations is vast and cannot be sampled fully, and thus most published studies have identified fitting AGs through a manual process driven by prior hypotheses, leaving the vast majority of alternative models unexplored. Here, we develop a method for systematically searching the space of all AGs that can incorporate non-genetic information in the form of topology constraints. We implement this findGraphs tool within a software package, ADMIXTOOLS 2, which is a reimplementation of the ADMIXTOOLS software with new features and large performance gains. We apply this methodology to identify alternative models to AGs that played key roles in eight published studies and find that graphs modeling more than six populations and two or three admixture events are often not unique, with many alternative models fitting nominally or significantly better than the published one. Our results suggest that strong claims about population history from AGs should only be made when all well-fitting and temporally plausible models share common topological features.
Our re-evaluation of published data also provides insight into the population histories of humans, dogs, and horses, identifying features that are stable across the models we explored, as well as scenarios of population relationships that differ in important ways from models that have been highlighted in the literature, that fit the allele frequency correlation data, and that are not obviously wrong.
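To illustrate the statement that topology, drift lengths, and mixture proportions are enough to predict allele frequency correlations, the following minimal Python sketch computes expected f2 and f4 statistics for a toy admixture graph. It is not code from ADMIXTOOLS 2: the graph, its node names (`R`, `X`, `P`, `Y`, `M`), the drift lengths, and the 0.3/0.7 mixture proportions are all hypothetical. It assumes the standard result that an expected f4 is a sum over edges of drift length times the product of differences in the fraction of each population's ancestry passing through that edge.

```python
from collections import defaultdict

# Hypothetical toy graph: child -> list of (parent, mixture proportion).
# C descends from M, which is 30% from the A-related node P and 70% from Y.
PARENTS = {
    "O": [("R", 1.0)],
    "X": [("R", 1.0)],
    "P": [("X", 1.0)],
    "A": [("P", 1.0)],
    "Y": [("X", 1.0)],
    "B": [("Y", 1.0)],
    "M": [("P", 0.3), ("Y", 0.7)],
    "C": [("M", 1.0)],
}

# Hypothetical drift lengths per (parent, child) edge; admixture edges get 0.
DRIFT = {
    ("R", "O"): 0.10, ("R", "X"): 0.05,
    ("X", "P"): 0.02, ("P", "A"): 0.01,
    ("X", "Y"): 0.03, ("Y", "B"): 0.01,
    ("P", "M"): 0.0, ("Y", "M"): 0.0,
    ("M", "C"): 0.02,
}


def edge_weights(pop, parents):
    """Fraction of pop's ancestry that passes through each edge."""
    weights = defaultdict(float)
    stack = [(pop, 1.0)]
    while stack:
        node, w = stack.pop()
        for parent, mix in parents.get(node, []):
            weights[(parent, node)] += w * mix
            stack.append((parent, w * mix))
    return weights


def f4(a, b, c, d, parents=PARENTS, drift=DRIFT):
    """Expected f4(a, b; c, d) implied by the graph."""
    wa, wb, wc, wd = (edge_weights(p, parents) for p in (a, b, c, d))
    return sum(
        length * (wa[e] - wb[e]) * (wc[e] - wd[e])
        for e, length in drift.items()
    )


def f2(a, b, parents=PARENTS, drift=DRIFT):
    """Expected f2(a, b), which equals f4(a, b; a, b)."""
    return f4(a, b, a, b, parents, drift)


if __name__ == "__main__":
    print(f4("O", "A", "B", "C"))  # nonzero because C carries A-related ancestry
    print(f2("A", "B"))            # total drift on the path between A and B
```

In this toy graph, f4(O, A; B, C) equals the mixture proportion from P multiplied by the drift on the X to P edge, and it drops to zero if M is given Y as its only parent. No population sizes or split times enter the calculation.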
Data availability
As indicated in the 'Materials availability statement', the ancient human genome newly reported in this manuscript (Table S2) is freely available at the European Nucleotide Archive in the form of an alignment of reads to the hg19 human reference genome (project accession number PRJEB58199). All the other data we analyze are previously reported. As we state in the 'Materials availability statement', the exact versions of the published archaeogenetic datasets re-analyzed in this manuscript were kindly shared by the corresponding authors of the following publications upon our requests:
1. Bergström A, Frantz L, Schmidt R, et al. Initial Upper Palaeolithic humans in Europe had recent Neanderthal ancestry. Nature. 2021 Apr;592(7853):253-257. doi: 10.1038/s41586-021-03335-3.
2. Lazaridis I, Patterson N, Mittnik A, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014 Sep 18;513(7518):409-13. doi: 10.1038/nature13673.
3. Librado P, Khan N, Fages A, et al. The origins and spread of domestic horses from the Western Eurasian steppes. Nature. 2021 Oct;598(7882):634-640. doi: 10.1038/s41586-021-04018-9.
4. Lipson M, Ribot I, Mallick S, et al. Ancient West African foragers in the context of African population history. Nature. 2020 Jan;577(7792):665-670. doi: 10.1038/s41586-020-1929-1.
5. Shinde V, Narasimhan VM, Rohland N, et al. An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers. Cell. 2019 Oct 17;179(3):729-735.e10. doi: 10.1016/j.cell.2019.08.048.
6. Sikora M, Pitulko VV, Sousa VC, et al. The population history of northeastern Siberia since the Pleistocene. Nature. 2019 Jun;570(7760):182-188. doi: 10.1038/s41586-019-1279-z.
7. Wang CC, Yeh HY, Popov AN, et al. Genomic insights into the formation of human populations in East Asia. Nature. 2021 Mar;591(7850):413-419. doi: 10.1038/s41586-021-03336-2.
Various statistics for these re-used datasets are summarized in Table S1.
Article and author information
Author details
Funding
Czech Ministry of Education, Youth and Sports (project no. LL2103)
- Pavel Flegontov
- Olga Flegontova
- Piya Changmai
Czech Ministry of Education, Youth and Sports (LM2015070)
- Pavel Flegontov
- Piya Changmai
Czech Ministry of Education, Youth and Sports (project no. LTAUSA18153)
- Pavel Flegontov
- Piya Changmai
National Institutes of Health (GM100233)
- Robert Maier
- David Reich
National Institutes of Health (HG012287)
- Robert Maier
- David Reich
John Templeton Foundation (grant 61220)
- Robert Maier
- David Reich
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2023, Maier et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.