
Demographic history mediates the effect of stratification on polygenic scores

  1. Arslan A Zaidi (corresponding author)
  2. Iain Mathieson (corresponding author)
  1. University of Pennsylvania, United States
Research Article
Cite this article as: eLife 2020;9:e61548 doi: 10.7554/eLife.61548


Population stratification continues to bias the results of genome-wide association studies (GWAS). When these results are used to construct polygenic scores, even subtle biases can cumulatively lead to large errors. To study the effect of residual stratification, we simulated GWAS under realistic models of demographic history. We show that when population structure is recent, it cannot be corrected using principal components of common variants because they are uninformative about recent history. Consequently, polygenic scores are biased in that they recapitulate environmental structure. Principal components calculated from rare variants or identity-by-descent segments can correct this stratification for some types of environmental effects. While family-based studies are immune to stratification, the hybrid approach of ascertaining variants in GWAS but re-estimating effect sizes in siblings reduces but does not eliminate stratification. We show that the effect of population stratification depends not only on allele frequencies and environmental structure but also on demographic history.
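The basic mechanism behind stratification bias can be illustrated with a minimal sketch. This is not the simulation framework used in the article; it is a toy example under stated assumptions: two subpopulations with different allele frequencies at a SNP that has no causal effect, and a purely environmental phenotype shift in one subpopulation. The function name `estimate_effects` and all parameter values are hypothetical. The known population label stands in for a covariate, such as a principal component, that captures the structure perfectly; the article's point is that principal components of common variants may fail to do so when the structure is recent.

```python
import numpy as np

def estimate_effects(n_per_pop=5000, freq_a=0.2, freq_b=0.8,
                     env_shift=1.0, seed=0):
    """Toy illustration of stratification bias at a non-causal SNP.

    All parameter values are hypothetical and chosen for illustration only.
    Returns (naive_effect, adjusted_effect).
    """
    rng = np.random.default_rng(seed)

    # Population label: 0 for the first subpopulation, 1 for the second.
    pop = np.repeat([0.0, 1.0], n_per_pop)

    # Genotypes (0/1/2 allele counts) drawn with population-specific frequencies.
    freqs = np.where(pop == 0, freq_a, freq_b)
    genotype = rng.binomial(2, freqs).astype(float)

    # The phenotype depends only on the environment (population) plus noise;
    # the SNP has no causal effect.
    phenotype = env_shift * pop + rng.normal(size=pop.size)

    # Naive association test: regress phenotype on genotype with an intercept.
    x_naive = np.column_stack([np.ones_like(pop), genotype])
    naive = np.linalg.lstsq(x_naive, phenotype, rcond=None)[0][1]

    # Adjusted test: add the population label as a covariate. The label is an
    # idealized stand-in for a principal component that captures the structure.
    x_adj = np.column_stack([np.ones_like(pop), genotype, pop])
    adjusted = np.linalg.lstsq(x_adj, phenotype, rcond=None)[0][1]

    return naive, adjusted

naive, adjusted = estimate_effects()
print(f"naive effect estimate:    {naive:.3f}")
print(f"adjusted effect estimate: {adjusted:.3f}")
```

In this sketch the naive estimate is clearly non-zero even though the SNP is not causal, while the adjusted estimate is close to zero. If the covariate captured the structure only partially, some of the bias would remain, and a polygenic score summing many such estimates would accumulate it.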

Article and author information

Author details

  1. Arslan A Zaidi

    Department of Genetics, University of Pennsylvania, Philadelphia, United States
    For correspondence
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-2155-8367
  2. Iain Mathieson

    Department of Genetics, University of Pennsylvania, Philadelphia, United States
    For correspondence
    Competing interests
    The authors declare that no competing interests exist.


Funding

National Institute of General Medical Sciences (R35GM133708)

  • Iain Mathieson

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. George H Perry, Pennsylvania State University, United States

Publication history

  1. Received: July 29, 2020
  2. Accepted: November 16, 2020
  3. Accepted Manuscript published: November 17, 2020 (version 1)


© 2020, Zaidi & Mathieson

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.



