Limitations of principal components in quantitative genetic association models for human studies
Abstract
Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results and unclear guidance, and have several limitations, including not varying the number of principal components (PCs), simulating only simple population structures, and inconsistent use of real data and power evaluations. We evaluate PCA and LMM, both with varying numbers of PCs, in realistic genotype and complex trait simulations including admixed families, subpopulation trees, and real multiethnic human datasets with simulated traits. We find that LMM without PCs usually performs best, with the largest effects in family simulations and in real human datasets and traits without environment effects. Poor PCA performance on human datasets is driven more by large numbers of distant relatives than by the smaller number of closer relatives. While PCA was known to fail on family data, we report strong effects of family relatedness in genetically diverse human datasets, which are not avoided by pruning close relatives. Environment effects driven by geography and ethnicity are better modeled with an LMM that includes those labels instead of PCs. This work better characterizes the severe limitations of PCA compared to LMM in modeling the complex relatedness structures of multiethnic human data for association studies.
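To make the PCA association model concrete, the following is a minimal sketch and not the authors' implementation (their code is in the repository listed under Data availability). It assumes a complete genotype matrix coded 0/1/2, a quantitative trait, and a simple standardized kinship estimate for computing PCs. The function name `pca_association`, the simulated data, and all parameter values are hypothetical. The LMM, which models relatedness through a random effect instead of fixed PC covariates, is not shown.

```python
import numpy as np

def pca_association(X, y, n_pcs):
    """Per-locus OLS effect sizes and t-statistics, adjusting for top PCs.

    X: (n_individuals, n_loci) genotypes coded 0/1/2, no missing values (assumption).
    y: (n_individuals,) quantitative trait.
    n_pcs: number of top principal components included as covariates.
    """
    n = X.shape[0]
    # Standardize polymorphic loci by their sample allele frequency.
    p = X.mean(axis=0) / 2
    poly = (p > 0) & (p < 1)
    Z = (X[:, poly] - 2 * p[poly]) / np.sqrt(2 * p[poly] * (1 - p[poly]))
    # Simple kinship-like matrix (an assumed estimator) and its top eigenvectors.
    K = Z @ Z.T / Z.shape[1]
    _, evecs = np.linalg.eigh(K)          # eigenvalues come in ascending order
    pcs = evecs[:, ::-1][:, :n_pcs]       # reverse to take the largest first
    # Covariates: intercept plus PCs.
    C = np.column_stack([np.ones(n), pcs])
    # Residualize trait and genotypes against the covariates; by the
    # Frisch-Waugh-Lovell theorem this matches the full multiple regression.
    Q, _ = np.linalg.qr(C)
    y_r = y - Q @ (Q.T @ y)
    X_r = X - Q @ (Q.T @ X)
    df = n - C.shape[1] - 1               # residual degrees of freedom
    xx = (X_r ** 2).sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        beta = (X_r.T @ y_r) / xx         # NaN for loci with no variation left
        rss = (y_r ** 2).sum() - beta ** 2 * xx
        se = np.sqrt(rss / df / xx)
        t = beta / se
    return beta, t

# Toy usage on unstructured simulated data (illustration only).
rng = np.random.default_rng(0)
n_ind, n_loci = 200, 500
freqs = rng.uniform(0.1, 0.9, size=n_loci)
X = rng.binomial(2, freqs, size=(n_ind, n_loci))
y = 2.0 * X[:, 0] + rng.normal(size=n_ind)   # locus 0 is the only causal locus
beta, t = pca_association(X, y, n_pcs=2)
print("top locus:", int(np.nanargmax(np.abs(t))))
```

Varying `n_pcs` in a sketch like this is the PCA-side knob that the evaluation above explores. With `n_pcs=0` the model reduces to a regression with an intercept only.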
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. Code is available at https://github.com/OchoaLab/pca-assoc-paper
- 1000 Genomes Project, high-coverage version. International Genome Sample Resource.
Article and author information
Funding
Whitehead Foundation
- Alejandro Ochoa
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2023, Yao & Ochoa
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.