Discovery of runs-of-homozygosity diplotype clusters and their associations with diseases in UK Biobank

  1. Ardalan Naseri
  2. Degui Zhi (corresponding author)
  3. Shaojie Zhang (corresponding author)
  1. Department of Computer Science, University of Central Florida, United States
  2. Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, United States
6 figures, 2 tables and 7 additional files

Figures

Figure 1 with 3 supplements
Runs-of-homozygosity (ROH)-DICE enables the discovery of loci-specific association signals of ROH diplotypes.

The actual ROH contents (a), including the locations and sequence identities of ROH (indicated by different colors), are lost in traditional ROH analysis pipelines (b), which aggregate the ROH contents per individual and thereby lose the chance to identify associated loci. ROH-DICE (c) reveals ROH diplotype clusters that are sufficiently long and wide, thus enabling the mapping of loci associated with phenotypes.

Figure 1—figure supplement 1
Evaluation of runs-of-homozygosity (ROH) clusters using simulated genotype data with and without genotyping errors.

(a) and (b) show the detection power and accuracy for different L and W values without genotyping errors; (c) and (d) show the same metrics for different L and W values with a genotyping error rate of 0.1%.

Figure 1—figure supplement 2
Evaluation of runs-of-homozygosity (ROH) clusters using different cut-off values for the same target length (L) and width (W).

The target was set to L = 100 and W = 20. Various length and width values were used as cut-offs.

Figure 1—figure supplement 3
Power of runs-of-homozygosity (ROH)-DICE vs traditional genome-wide association studies (GWAS) for finding associations between phenotypes and ROH clusters using 200 samples, a 10 Mbp region, and 100 consecutive causal variant sites.

Figure 2
A simple schematic of searching for runs-of-homozygosity (ROH) diplotype clusters in a genotype panel.

The input is a genotype panel in which each line represents an individual; heterozygous sites are depicted in violet. The input genotypes are converted into a binarized genotype panel in which homozygous sites are preserved. Matching blocks (clusters) are then found using consensus PBWT (cPBWT). A matching block is defined by a minimum number of sites, a minimum number of individuals, and an objective function, which either maximizes the number of individuals or maximizes the number of sites. Clusters of matches are highlighted in different colors: red marks a cluster with the maximized number of individuals, and blue marks a cluster with the maximized number of sites.
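The clustering criterion in this schematic can be illustrated with a brute-force sketch. It is not cPBWT: it simply slides a window of exactly L sites across a binarized panel and groups individuals who are homozygous at every site of the window and share an identical consensus, keeping groups of at least W individuals. The dosage encoding (0 = 0/0, 1 = 0/1, 2 = 1/1) and the names `HET`, `binarize`, and `window_clusters` are hypothetical, and the sketch does not extend blocks to make them length- or width-maximal.

```python
from collections import defaultdict

# Hypothetical encoding: genotype dosages 0 (0/0), 1 (0/1), 2 (1/1).
HET = -1  # marker for a heterozygous site in the binarized panel


def binarize(genotypes):
    """Keep homozygous sites as 0/1 alleles and mark heterozygous sites as HET."""
    code = {0: 0, 1: HET, 2: 1}
    return [[code[g] for g in row] for row in genotypes]


def window_clusters(panel, L, W):
    """Brute-force search (not cPBWT): for every window of exactly L sites,
    group individuals that are homozygous at all L sites and share the same
    consensus, keeping groups of at least W individuals.
    Returns a list of (start, end, consensus, member_rows), end inclusive."""
    n_sites = len(panel[0])
    clusters = []
    for end in range(L - 1, n_sites):
        start = end - L + 1
        groups = defaultdict(list)
        for i, row in enumerate(panel):
            window = tuple(row[start:end + 1])
            if HET not in window:  # a heterozygous site breaks the ROH
                groups[window].append(i)
        for consensus, members in groups.items():
            if len(members) >= W:
                clusters.append((start, end, consensus, members))
    return clusters


# Toy panel: four individuals, five sites (made-up data).
genotypes = [
    [0, 2, 2, 0, 1],
    [0, 2, 2, 0, 0],
    [0, 2, 2, 0, 2],
    [1, 2, 2, 0, 0],
]
found = window_clusters(binarize(genotypes), L=3, W=3)
print(found)
# [(0, 2, (0, 1, 1), [0, 1, 2]), (1, 3, (1, 1, 0), [0, 1, 2, 3])]
```

The brute-force scan costs roughly (number of sites) × (number of individuals) × L, which is why an indexed structure such as cPBWT is needed at biobank scale.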

Figure 3 with 1 supplement
Total number of detected runs-of-homozygosity (ROH) diplotype clusters in each autosomal chromosome (a) and the detected ROH clusters in the major histocompatibility complex (MHC) region (chr6:28477797–33448354) (b) in hg19.

Some regions may contain multiple overlapping clusters comprising different sets of individuals. The minimum length of the ROH regions was set to 100 sites and the minimum number of individuals to 100.

Figure 3—figure supplement 1
Total number of detected runs-of-homozygosity (ROH) diplotype clusters in each autosomal chromosome in UK Biobank with a minimum length (L) of 100 sites, a minimum genetic length of 0.1 cM, and a minimum width (W) of 100 samples.

Figure 4 with 1 supplement
Detected runs-of-homozygosity (ROH) diplotype clusters with at least 100 individuals sharing the same consensus over at least 100 SNPs.

Chromosome 18 has the lowest peak in the number of individuals sharing an ROH diplotype. Chromosomes 2, 6, and 8 contain diplotypes shared by more than 100,000 individuals.

Figure 4—figure supplement 1
The number of individuals sharing the same runs-of-homozygosity (ROH) consensus over at least 100 SNPs after filtering out segments shorter than 0.1 cM.

Figure 5
Associations between runs-of-homozygosity (ROH) diplotypes and COVID-19 mortality.

(a) Manhattan plot of associations between ROH diplotypes across all chromosomes and COVID-19 mortality. Diplotypes with fewer than 10 cases were discarded. (b) UCSC genome browser (https://genome.ucsc.edu) view of the region containing the diplotype with a significant p-value on chromosome 4.

Figure 6
Consensuses of haplotype matches with a minimum length (L) of 3 and a minimum width (W) of 3.

(a) Clusters of haplotypes under two different objectives: maximizing the number of sites and maximizing the number of individuals. The green rectangle ending at site 4 highlights a cluster that meets the requirement of W ≥ 3 and L ≥ 3 while maximizing the number of individuals (width-maximal). The blue rectangles ending at site 4 maximize the number of sites (length-maximal). The blue rectangles ending at site 8 show a cluster with W ≥ 3 and L ≥ 3 that maximizes both the number of sites and the number of individuals. This cluster is length-maximal because adding either column 5 or column 9 would introduce a mismatch; it is also width-maximal because adding the third haplotype would introduce a mismatch. (b) Two clusters (clusters A and B) with the same starting and ending positions but different consensuses; therefore, these two clusters are not merged and are treated as separate clusters. Each line represents one individual; 0/0 alleles are highlighted in gray and 1/1 alleles in black.

Tables

Table 1
Clusters of the runs-of-homozygosity (ROH) diplotypes with the lowest p-values in the HLA region for self-reported diseases using the British population in UK Biobank.

Detailed diplotype consensus sequences are available in Supplementary file 5. The p-values were calculated using PHESANT. Only the region with the lowest p-value is included for each disease. Beta is the effect size reported by PHESANT, and D′ measures the non-random association (linkage disequilibrium) between an ROH cluster and the overlapping SNP.

Disease (binary trait) | Diplotype ID | Position (on chr6) | p-value | Beta | Carrier frequency (%) | Odds ratio | Genetic length (cM) | GWAS p-value* | GWAS beta* | GWAS lead SNP* | D′
Ankylosing spondylitis | 1 | 31431031–31464050 | 4.62 × 10^−34 | 0.121 | 0.29 | 8.66 | 0.071198 | 0 | 1.45 × 10^−2 | rs113340460 | 0.61
Hemochromatosis | 2 | 25969631–26108168 | 8.02 × 10^−120 | 0.417 | 0.09 | 24.51 | 0.011597 | - | - | - | -
Malabsorption/coeliac disease | 3 | 32564985–32629755 | 3.41 × 10^−259 | 0.315 | 4.12 | 1.64 | 0.005408 | 0 | 7.74 × 10^−3 | rs9271352 | 1
Multiple sclerosis | 4 | 32410215–32554129 | 4.36 × 10^−45 | 0.192 | 0.37 | 3.79 | 0.012736 | 1.05 × 10^−107 | 4.58 × 10^−3 | rs9268925 | 0.99
Polymyalgia rheumatica | 5 | 31710968–31794592 | 7.31 × 10^−9 | 0.080 | 0.23 | 5.90 | 0.006808 | 1.59 × 10^−8 | 6.80 × 10^−3 | rs1150748 | 1
Prostate problem (not cancer) | 6 | 34607958–35163974 | 2.84 × 10^−8 | 0.082 | 0.18 | 6.94 | 0.034889 | 9.81 × 10^−4 | 9.81 × 10^−4 | rs76117834 | 0.03
Psoriasis | 7 | 31254263–31263216 | 1.20 × 10^−122 | 0.214 | 1.21 | 2.73 | 3.07 × 10^−5 | 0 | 1.93 × 10^−2 | rs13214872 | 1
Psoriatic arthropathy | 8 | 33072522–33115762 | 8.54 × 10^−12 | 0.122 | 0.20 | 3.97 | 0.008708 | 4.76 × 10^−10 | 1.01 × 10^−3 | rs17221401 | 1
Rheumatoid arthritis | 9 | 32412539–32573760 | 8.15 × 10^−122 | 0.208 | 0.23 | 2.34 | 0.01293 | 6.96 × 10^−124 | 8.24 × 10^−3 | rs188575117 | 0.98
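The D′ column reports a normalized linkage-disequilibrium measure between a cluster and a SNP. As a sketch, assuming both are encoded as per-individual binary indicators (carrier of the ROH diplotype; carrier of the SNP allele), Lewontin's D′ can be computed as follows. The encoding and the function name `d_prime` are assumptions; the exact way D′ was computed for this table may differ.

```python
def d_prime(x, y):
    """Lewontin's D' between two binary indicators measured on the same
    individuals (e.g. x = carries the ROH diplotype, y = carries the SNP
    allele; this encoding is an assumption). Returns D / D_max in [-1, 1]."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(1 for a, b in zip(x, y) if a and b) / n
    d = p_xy - p_x * p_y  # raw disequilibrium coefficient D
    if d > 0:
        d_max = min(p_x * (1 - p_y), (1 - p_x) * p_y)
    else:
        d_max = min(p_x * p_y, (1 - p_x) * (1 - p_y))
    return 0.0 if d_max == 0 else d / d_max


# Made-up example: both ROH carriers also carry the SNP allele.
roh = [1, 1, 0, 0, 0, 0, 0, 0]
snp = [1, 1, 1, 0, 0, 0, 0, 0]
print(d_prime(roh, snp))  # 1.0
```

Because D′ is normalized by the largest disequilibrium allowed by the marginal frequencies, it can reach 1 even when the ROH diplotype is much rarer than the SNP allele, as in this toy example.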
Table 2
Clusters of the runs-of-homozygosity (ROH) diplotypes with the lowest p-values outside of the HLA region for self-reported diseases using the British population in UK Biobank.

The p-values were calculated using PHESANT.

Disease (binary trait) | Diplotype ID | Position | p-value | Beta | Carrier frequency (%) | Odds ratio | Genetic length (cM) | GWAS p-value* | GWAS beta* | GWAS lead SNP* | D′
Deep venous thrombosis (dvt) | 10 | chr1:169075589–169528830 | 3.10 × 10^−21 | 0.039 | 2.08 | 10.49 | 0.56 | 7.41 × 10^−166 | −3.13 × 10^−2 | rs6025 | 1
Eczema/dermatitis | 11 | chr1:151515188–151902494 | 1.52 × 10^−27 | 0.044 | 2.85 | 7.31 | 0.36 | 3.45 × 10^−36 | 1.43 × 10^−2 | rs55875222 | 1
Eczema/dermatitis | 12 | chr1:151940401–152280032 | 9.46 × 10^−24 | 0.053 | 11.76 | 2.07 | 0.12 | 1.35 × 10^−64 | 1.84 × 10^−2 | rs61815559 | 1
Eczema/dermatitis | 13 | chr1:152493154–152964479 | 1.53 × 10^−21 | 0.039 | 2.85 | 7.35 | 0.36 | 1.01 × 10^−42 | 1.62 × 10^−2 | rs61813875 | 1
Hypothyroidism/myxoedema | 14 | chr12:111910219–112874179 | 4.51 × 10^−21 | 0.062 | 5.06 | 1.25 | 0.04 | 1.88 × 10^−80 | 9.87 × 10^−3 | rs7137828 | 0.99
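The carrier-frequency and odds-ratio columns in Tables 1 and 2 can be related to a 2 × 2 table of diplotype carriers vs non-carriers and cases vs controls. The sketch below computes the crude (unadjusted) odds ratio; the function names and the example counts are hypothetical, and the reported values may come from a different, covariate-adjusted model.

```python
def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Crude odds ratio of disease for diplotype carriers vs non-carriers
    from a 2 x 2 table of counts (no covariate adjustment)."""
    if carrier_controls == 0 or noncarrier_cases == 0:
        raise ZeroDivisionError("odds ratio undefined: zero cell in the denominator")
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


def carrier_frequency_percent(n_carriers, n_total):
    """Percentage of individuals who carry the ROH diplotype."""
    return 100.0 * n_carriers / n_total


# Hypothetical counts: 100 carriers (20 cases) among 10,000 individuals,
# and 100 cases among the 9,900 non-carriers.
print(odds_ratio(20, 80, 100, 9800))           # 24.5
print(carrier_frequency_percent(100, 10_000))  # 1.0
```

A large odds ratio with a low carrier frequency (for example, hemochromatosis in Table 1) means the association rests on relatively few carriers, so such estimates are more sensitive to small count changes.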

Additional files

Supplementary file 1

Number of individuals in detected runs-of-homozygosity (ROH) clusters in autosomal chromosomes of UKBB.

https://cdn.elifesciences.org/articles/81698/elife-81698-supp1-v2.zip
Supplementary file 2

The overlapping hotspots between runs-of-homozygosity (ROH)-DICE and Pemberton et al., 2012.

https://cdn.elifesciences.org/articles/81698/elife-81698-supp2-v2.zip
Supplementary file 3

The overlapping coldspots between runs-of-homozygosity (ROH)-DICE and Pemberton et al., 2012.

https://cdn.elifesciences.org/articles/81698/elife-81698-supp3-v2.zip
Supplementary file 4

100 clusters of the runs-of-homozygosity (ROH) diplotypes with the lowest p-values for self-reported non-cancerous diseases using the British population in UK Biobank.

https://cdn.elifesciences.org/articles/81698/elife-81698-supp4-v2.xlsx
Supplementary file 5

Runs-of-homozygosity (ROH) diplotype consensuses of the clusters with the lowest p-values.

https://cdn.elifesciences.org/articles/81698/elife-81698-supp5-v2.xlsx
Supplementary file 6

cPBWT algorithms for finding width- and length-maximal matches.

https://cdn.elifesciences.org/articles/81698/elife-81698-supp6-v2.pdf
MDAR checklist
https://cdn.elifesciences.org/articles/81698/elife-81698-mdarchecklist1-v2.docx

Cite this article

Ardalan Naseri, Degui Zhi, Shaojie Zhang (2024) Discovery of runs-of-homozygosity diplotype clusters and their associations with diseases in UK Biobank. eLife 13:e81698. https://doi.org/10.7554/eLife.81698