On the limits of fitting complex models of population history to f-statistics

  1. Robert Maier (corresponding author)
  2. Pavel Flegontov (corresponding author)
  3. Olga Flegontova
  4. Ulas Isildak
  5. Piya Changmai
  6. David Reich (corresponding author)
  1. Harvard University, United States
  2. University of Ostrava, Czech Republic
  3. Broad Institute, United States

Abstract

Our understanding of population history in deep time has been assisted by fitting admixture graphs ('AGs') to data. These models specify the ordering of population splits and mixtures, which, along with the amount of genetic drift on each lineage and the proportions of mixture, is the only information needed to predict the patterns of allele frequency correlation among populations. Not needing to specify population size changes, split times, or whether admixture events were sudden or drawn out simplifies the space of models that need to be searched. However, the space of possible AGs relating populations is vast and cannot be sampled fully, and thus most published studies have identified fitting AGs through a manual process driven by prior hypotheses, leaving the vast majority of alternative models unexplored. Here, we develop a method for systematically searching the space of all AGs, which can incorporate non-genetic information in the form of topology constraints. We implement this findGraphs tool within a software package, ADMIXTOOLS 2, which is a reimplementation of the ADMIXTOOLS software with new features and large performance gains. We apply this methodology to identify alternative models to AGs that played key roles in eight published studies and find that graphs modeling more than six populations and two or three admixture events are often not unique, with many alternative models fitting nominally or significantly better than the published one. Our results suggest that strong claims about population history from AGs should only be made when all well-fitting and temporally plausible models share common topological features. Our re-evaluation of published data also provides insight into the population histories of humans, dogs, and horses, identifying features that are stable across the models we explored, as well as scenarios of population relationships that differ in important ways from models that have been highlighted in the literature, that fit the allele frequency correlation data, and that are not obviously wrong.
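
As a rough illustration of the claim that topology, drift lengths, and mixture proportions suffice to predict allele frequency correlations, the sketch below computes expected f2- and f4-statistics for a toy admixture graph. It is not the ADMIXTOOLS 2 implementation: the graph, its node names, the drift values, the mixture proportion ALPHA, and the helper functions are all hypothetical and chosen only for illustration. It assumes the standard path-weight formulation, in which the expected f4(A, B; C, D) is the sum over edges of the edge's drift multiplied by the difference in ancestry weight between A and B and the difference between C and D.

```python
from collections import defaultdict

# Hypothetical toy graph (not from the paper): child -> [(parent, drift, weight)].
# "drift" is the drift length on the edge; "weight" is the fraction of the
# child's ancestry that passes through that edge (1.0 for non-admixture edges).
ALPHA = 0.5  # assumed mixture proportion of the P-related source in C
GRAPH = {
    "O": [("R", 0.10, 1.0)],   # outgroup
    "X": [("R", 0.05, 1.0)],
    "P": [("X", 0.02, 1.0)],
    "Q": [("X", 0.03, 1.0)],
    "A": [("P", 0.01, 1.0)],
    "B": [("Q", 0.04, 1.0)],
    "M": [("P", 0.0, ALPHA), ("Q", 0.0, 1.0 - ALPHA)],  # admixture node
    "C": [("M", 0.02, 1.0)],
}


def edge_weights(graph, pop):
    """Fraction of pop's ancestry passing through each edge, summed over all paths to the root."""
    weights = defaultdict(float)

    def walk(node, w):
        for parent, _drift, frac in graph.get(node, []):
            weights[(parent, node)] += w * frac
            walk(parent, w * frac)

    walk(pop, 1.0)
    return weights


def expected_f4(graph, a, b, c, d):
    """Expected f4(a, b; c, d): drift-weighted overlap of the a-b and c-d ancestry differences."""
    wa, wb, wc, wd = (edge_weights(graph, p) for p in (a, b, c, d))
    total = 0.0
    for child, parents in graph.items():
        for parent, drift, _frac in parents:
            e = (parent, child)
            total += drift * (wa[e] - wb[e]) * (wc[e] - wd[e])
    return total


def expected_f2(graph, a, b):
    """Expected f2(a, b) is the special case f4(a, b; a, b)."""
    return expected_f4(graph, a, b, a, b)


if __name__ == "__main__":
    print("f2(A, B)       =", expected_f2(GRAPH, "A", "B"))
    print("f2(A, C)       =", expected_f2(GRAPH, "A", "C"))
    print("f4(O, A; B, C) =", expected_f4(GRAPH, "O", "A", "B", "C"))
```

In this toy graph, the expected f4(O, A; B, C) equals ALPHA times the drift on the X to P edge, so it is zero when C receives no ancestry from the P lineage and grows with the mixture proportion. No population sizes or split times enter the calculation.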

Data availability

As indicated in the 'Materials availability statement', the ancient human genome newly reported in this manuscript (Table S2) is freely available at the European Nucleotide Archive in the form of an alignment of reads to the hg19 human reference genome (project accession number PRJEB58199). All the other data we analyze were previously reported. The exact versions of the published archaeogenetic datasets re-analyzed in this manuscript were kindly shared, upon our request, by the corresponding authors of the following publications:

  1. Bergström A, Frantz L, Schmidt R, et al. Initial Upper Palaeolithic humans in Europe had recent Neanderthal ancestry. Nature. 2021 Apr;592(7853):253-257. doi: 10.1038/s41586-021-03335-3.
  2. Lazaridis I, Patterson N, Mittnik A, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014 Sep 18;513(7518):409-13. doi: 10.1038/nature13673.
  3. Librado P, Khan N, Fages A, et al. The origins and spread of domestic horses from the Western Eurasian steppes. Nature. 2021 Oct;598(7882):634-640. doi: 10.1038/s41586-021-04018-9.
  4. Lipson M, Ribot I, Mallick S, et al. Ancient West African foragers in the context of African population history. Nature. 2020 Jan;577(7792):665-670. doi: 10.1038/s41586-020-1929-1.
  5. Shinde V, Narasimhan VM, Rohland N, et al. An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers. Cell. 2019 Oct 17;179(3):729-735.e10. doi: 10.1016/j.cell.2019.08.048.
  6. Sikora M, Pitulko VV, Sousa VC, et al. The population history of northeastern Siberia since the Pleistocene. Nature. 2019 Jun;570(7760):182-188. doi: 10.1038/s41586-019-1279-z.
  7. Wang CC, Yeh HY, Popov AN, et al. Genomic insights into the formation of human populations in East Asia. Nature. 2021 Mar;591(7850):413-419. doi: 10.1038/s41586-021-03336-2.

Various statistics for these re-used datasets are summarized in Table S1.

Article and author information

Author details

  1. Robert Maier

    Harvard University, Cambridge, United States
    For correspondence
    rmaier@broadinstitute.org
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-3044-090X
  2. Pavel Flegontov

    Department of Biology and Ecology, University of Ostrava, Ostrava, Czech Republic
    For correspondence
    Pavel_Flegontov@hms.harvard.edu
    Competing interests
    The authors declare that no competing interests exist.
  3. Olga Flegontova

    Department of Biology and Ecology, University of Ostrava, Ostrava, Czech Republic
    Competing interests
    The authors declare that no competing interests exist.
  4. Ulas Isildak

    Department of Biology and Ecology, University of Ostrava, Ostrava, Czech Republic
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-6497-6254
  5. Piya Changmai

    Department of Biology and Ecology, University of Ostrava, Ostrava, Czech Republic
    Competing interests
    The authors declare that no competing interests exist.
  6. David Reich

    Program in Medical and Population Genetics, Broad Institute, Cambridge, United States
    For correspondence
    reich@genetics.med.harvard.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-7037-5292

Funding

Czech Ministry of Education, Youth and Sports (project no. LL2103)

  • Pavel Flegontov
  • Olga Flegontova
  • Piya Changmai

Czech Ministry of Education, Youth and Sports (LM2015070)

  • Pavel Flegontov
  • Piya Changmai

Czech Ministry of Education, Youth and Sports (project no. LTAUSA18153)

  • Pavel Flegontov
  • Piya Changmai

National Institutes of Health (GM100233)

  • Robert Maier
  • David Reich

National Institutes of Health (HG012287)

  • Robert Maier
  • David Reich

John Templeton Foundation (grant 61220)

  • Robert Maier
  • David Reich

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Magnus Nordborg, Gregor Mendel Institute, Austria

Version history

  1. Preprint posted: May 8, 2022
  2. Received: December 10, 2022
  3. Accepted: April 5, 2023
  4. Accepted Manuscript published: April 14, 2023 (version 1)
  5. Version of Record published: June 29, 2023 (version 2)

Copyright

© 2023, Maier et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 3,909 views
  • 725 downloads
  • 40 citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Cite this article

  1. Robert Maier
  2. Pavel Flegontov
  3. Olga Flegontova
  4. Ulas Isildak
  5. Piya Changmai
  6. David Reich
(2023)
On the limits of fitting complex models of population history to f-statistics
eLife 12:e85492.
https://doi.org/10.7554/eLife.85492
