Limitations of principal components in quantitative genetic association models for human studies

  1. Yiqi Yao
  2. Alejandro Ochoa (corresponding author)
  1. Duke University, United States

Abstract

Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM), sometimes in combination, are the most common genetic association models. Previous PCA-LMM comparisons give mixed results and unclear guidance, and have several limitations, including not varying the number of principal components (PCs), simulating only simple population structures, and using real data and power evaluations inconsistently. We evaluate PCA and LMM, both with varying numbers of PCs, in realistic genotype and complex trait simulations including admixed families and subpopulation trees, and in real multiethnic human datasets with simulated traits. We find that LMM without PCs usually performs best, with the largest effects in family simulations and in real human datasets and traits without environment effects. Poor PCA performance on human datasets is driven more by the large number of distant relatives than by the smaller number of closer relatives. While PCA was known to fail on family data, we report strong effects of family relatedness in genetically diverse human datasets, which are not avoided by pruning close relatives. Environment effects driven by geography and ethnicity are better modeled with an LMM that includes those labels instead of PCs. This work better characterizes the severe limitations of PCA compared to LMM in modeling the complex relatedness structures of multiethnic human data for association studies.
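
To make the PCA association model concrete, the following is a minimal Python sketch, not the authors' code (which is at the repository listed under Data availability). It tests each locus by ordinary least squares with an intercept and the top r PCs as covariates, on a toy null simulation with two subpopulations and a subpopulation-driven trait shift. The function names (`simulate_null`, `top_pcs`, `pca_assoc`), the simulation parameters, and the normal approximation to the t distribution are all assumptions made for illustration. The LMM is not shown.

```python
import math
import numpy as np

def simulate_null(n=200, m=500, seed=0):
    """Toy data (illustrative assumption): two subpopulations, no causal loci."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)           # subpopulation of each individual
    p_anc = rng.uniform(0.2, 0.8, m)             # ancestral allele frequencies
    delta = rng.uniform(-0.15, 0.15, m)          # per-locus differentiation
    P = np.where(labels[:, None] == 0, p_anc + delta, p_anc - delta)
    X = rng.binomial(2, P).astype(float)         # n x m genotypes in {0, 1, 2}
    y = 2.0 * labels + rng.normal(size=n)        # trait shifted by subpopulation only
    return X, y

def top_pcs(X, r):
    """Top r PCs of individuals from an n x m genotype matrix."""
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)
    keep = sd > 0                                # drop fixed loci before scaling
    Xs = Xc[:, keep] / sd[keep]
    # Left singular vectors of the standardized genotypes are the PCs
    U, _, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :r]

def pca_assoc(X, y, r):
    """Per-locus t statistics and approximate p-values, adjusting for r PCs."""
    n = X.shape[0]
    C = np.column_stack([np.ones(n), top_pcs(X, r)])   # intercept + r PCs
    # Residualize trait and genotypes on the covariates (Frisch-Waugh-Lovell)
    Q, _ = np.linalg.qr(C)
    y_res = y - Q @ (Q.T @ y)
    X_res = X - Q @ (Q.T @ X)
    df = n - C.shape[1] - 1                      # covariates plus the tested locus
    ssx = (X_res ** 2).sum(axis=0)
    ssx = np.where(ssx > 1e-12, ssx, np.nan)     # untestable loci become NaN
    beta = (X_res.T @ y_res) / ssx
    rss = np.maximum((y_res ** 2).sum() - beta ** 2 * ssx, 0.0)
    t = beta / np.sqrt(rss / df / ssx)
    # Two-sided p-value from a normal approximation to the t distribution
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return t, p

X, y = simulate_null()
for r in (0, 1, 10):
    t, _ = pca_assoc(X, y, r)
    # Under a calibrated null the mean squared statistic should be near 1
    print(r, float(np.nanmean(t ** 2)))
```

In this toy setting, comparing the mean squared statistic across values of r shows how unadjusted structure inflates null test statistics. The simulation is far simpler than the admixed family and real multiethnic data evaluated in the study, where the abstract reports that PCs alone perform poorly.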

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. Code is available at https://github.com/OchoaLab/pca-assoc-paper

The following previously published data sets were used

Article and author information

Author details

  1. Yiqi Yao

    Department of Biostatistics and Bioinformatics, Duke University, Durham, United States
    Competing interests
    Yiqi Yao is affiliated with BenHealth Consulting. The author has no financial interests to declare.
  2. Alejandro Ochoa

    Department of Biostatistics and Bioinformatics, Duke University, Durham, United States
    For correspondence
    alejandro.ochoa@duke.edu
    Competing interests
    No competing interests declared.
    ORCID: 0000-0003-4928-3403

Funding

Whitehead Foundation

  • Alejandro Ochoa

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2023, Yao & Ochoa

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

https://doi.org/10.7554/eLife.79238
