An overview of our study of the impact of stability considerations on genetic fine-mapping. A. The two ways in which we perform fine-mapping, the first of which (colored in green) prioritizes the stability of variant discoveries to subpopulation perturbations. The data illustrates the case where there are two distinct environments, or subpopulations (denoted E1 and E2), that split the observations. B. Key steps in our comparison of the stability-guided approach with the popular residualization approach.

A list of 378 functional annotations across which the biological significance of stable and top fine-mapped single nucleotide polymorphisms is compared. For annotations that report multiple scores, the total number of scores reported is shown in parentheses. Scores mined from the FAVOR database (Zhou et al. 2022) are indicated by an asterisk. (TSS = Transcription Start Site, bp = base pair)

Venn diagram showing the number of matching and non-matching variants for Potential Set 1.

Top Row. CADD scores. A. Empirical cumulative distribution functions of raw CADD scores of matching and non-matching variants across all genes, for Potential Set 1. Non-matching variants are further divided into stable and top variants, with a score lower threshold of 1.0 and upper threshold of 5.0 used to improve visualization. B. For a given deleteriousness cutoff, the percent of (i) all matching variants, (ii) all non-matching top variants, and (iii) all non-matching stable variants that are classified as deleterious. We use a sliding cutoff threshold ranging from 10 to 20, as recommended by the CADD authors. Bottom Row. Empirical cumulative distribution functions of perturbation scores of the Enformer-predicted H3K27me3 ChIP-seq track. A score upper threshold of 0.015 and an empirical CDF lower threshold of 0.5 are used to improve visualization. C. Perturbation scores computed from predictions based on centering input sequences on the gene TSS as well as its two flanking positions. D. Perturbation scores computed from predictions based on centering input sequences on the gene TSS only.
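
The sliding-cutoff summary in Panel B can be illustrated with a minimal sketch. It assumes the scores are on the scale to which the 10 to 20 cutoffs apply, and that a variant counts as deleterious when its score is at or above the cutoff; the function name `percent_deleterious` and the example scores are hypothetical and are not taken from the study.

```python
def percent_deleterious(scores, cutoffs):
    """Map each cutoff to the percent of scores at or above it.

    Assumption: "deleterious" means score >= cutoff; the study may
    use a strict inequality instead.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    n = len(scores)
    return {c: 100.0 * sum(s >= c for s in scores) / n for c in cutoffs}


# Hypothetical scores for one group of variants (illustration only).
example_scores = [3.2, 11.5, 15.0, 22.1]

# Slide the cutoff over the integers 10, 11, ..., 20.
curve = percent_deleterious(example_scores, range(10, 21))
```

Calling the same function once per group (matching, non-matching top, non-matching stable) would give the three curves compared in the panel.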

A. Paired scatterplot of raw CADD scores of the top and stable variants for each gene, for Potential Set 1. B. Percent of genes that are classified as (i) having a deleterious top variant only, (ii) having a deleterious stable variant only, and (iii) having both top and stable variants deleterious, using a sliding cutoff threshold ranging from 10 to 20, as recommended by the CADD authors.

List of 6 moderating factors considered.

Visual summary of the PICS algorithm described in Subsection 4.1. A. Breakdown of the calculation of the probability of a focal SNP A1 being causal. B. Illustration of the permutation procedure used to generate the null distribution. An example N × P genotype array with N = P = 6 is used, with two valid row shuffles, or permutations, of the original array shown. Entries affected by the shuffle are highlighted, as is the focal SNP (A3).
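
A row shuffle of the kind shown in Panel B can be sketched as follows. This is an illustration only: it treats the genotype array as a list of N rows with P entries each and permutes whole rows, and it does not model whatever additional condition makes a shuffle "valid" in Subsection 4.1. The helper name `shuffle_rows` and the toy array are hypothetical.

```python
import random


def shuffle_rows(genotypes, rng):
    """Return a copy of the genotype array with its rows permuted.

    Whole rows (individuals) are moved together, so each row's
    entries across the P SNPs stay intact.
    """
    order = list(range(len(genotypes)))
    rng.shuffle(order)
    return [list(genotypes[i]) for i in order]


# Toy 6 x 6 genotype array (N = P = 6) with entries in {0, 1, 2}.
rng = random.Random(0)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(6)] for _ in range(6)]

# One permuted copy; repeating this yields draws for a null distribution.
permuted = shuffle_rows(genotypes, rng)
```

Because rows move as units, each SNP column keeps the same allele counts after the shuffle; only the assignment of genotype rows to individuals changes.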

List of annotations used in our study, alongside their interpretations.

Pair density plot of the posterior probabilities of the top variant and the stable variant, for cases in which they match.

Pair density plot of the posterior probabilities of the top variant and the stable variant, for cases in which they do not match.