Limitations of principal components in quantitative genetic association models for human studies

  1. Yiqi Yao
  2. Alejandro Ochoa (corresponding author)
  1. Duke University, United States

Abstract

Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results and unclear guidance, and have several limitations, including not varying the number of principal components (PCs), simulating simple population structures, and inconsistent use of real data and power evaluations. We evaluate PCA and LMM, both with varying numbers of PCs, in realistic genotype and complex trait simulations including admixed families, subpopulation trees, and real multiethnic human datasets with simulated traits. We find that LMM without PCs usually performs best, with the largest effects in family simulations and real human datasets and traits without environment effects. Poor PCA performance on human datasets is driven more by large numbers of distant relatives than by the smaller number of closer relatives. While PCA was known to fail on family data, we report strong effects of family relatedness in genetically diverse human datasets, which are not avoided by pruning close relatives. Environment effects driven by geography and ethnicity are better modeled with an LMM that includes those labels instead of PCs. This work better characterizes the severe limitations of PCA compared to LMM in modeling the complex relatedness structures of multiethnic human data for association studies.
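To make the PCA side of the comparison concrete, the following is a minimal sketch of a PC-adjusted association test: the trait is regressed on the tested locus plus the top PCs of the genotype matrix. It is not the authors' pipeline (their code is at the repository listed under Data availability). The function names `top_pcs` and `pca_assoc_t`, the two-subpopulation toy simulation, and all parameter values are assumptions made for illustration. An LMM would instead model relatedness through a kinship-based random effect, which is not shown here.

```python
import numpy as np

def top_pcs(X, r):
    """Top r PCs of individuals from an m x n genotype matrix X (loci x individuals).

    Assumption: loci are only centered, not variance-standardized.
    """
    Xc = X - X.mean(axis=1, keepdims=True)  # center each locus
    # Rows of Vt are right singular vectors, one coordinate per individual.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:r].T  # n x r

def pca_assoc_t(x, y, pcs):
    """t-statistic of locus x in the OLS fit y ~ intercept + x + PCs."""
    n = len(y)
    D = np.column_stack([np.ones(n), x, pcs])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    dof = n - D.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return beta[1] / np.sqrt(cov[1, 1])

# Toy data (hypothetical): two subpopulations with shifted allele frequencies.
rng = np.random.default_rng(0)
m, n = 500, 200
p_anc = rng.uniform(0.2, 0.8, size=m)
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[None, :] == 0, p_anc[:, None] + 0.1, p_anc[:, None] - 0.1)
X = rng.binomial(2, P)  # m x n genotypes coded 0/1/2

# Trait with one causal locus (index 0) and unit-variance noise.
y = 2.0 * X[0] + rng.normal(size=n)

pcs = top_pcs(X, r=2)
t_causal = pca_assoc_t(X[0], y, pcs)
```

The number of PCs `r` is the quantity that the study varies. In this sketch it is a plain argument, so the same test can be rerun over a range of values.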

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. Code is available at https://github.com/OchoaLab/pca-assoc-paper


Article and author information

Author details

  1. Yiqi Yao

    Department of Biostatistics and Bioinformatics, Duke University, Durham, United States
    Competing interests
    Yiqi Yao is affiliated with BenHealth Consulting. The author has no financial interests to declare.
  2. Alejandro Ochoa

    Department of Biostatistics and Bioinformatics, Duke University, Durham, United States
    For correspondence
    alejandro.ochoa@duke.edu
    Competing interests
    No competing interests declared.
    ORCID: 0000-0003-4928-3403

Funding

Whitehead Foundation

  • Alejandro Ochoa

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2023, Yao & Ochoa

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Yiqi Yao
  2. Alejandro Ochoa
(2023)
Limitations of principal components in quantitative genetic association models for human studies
eLife 12:e79238.
https://doi.org/10.7554/eLife.79238

