The impact of stability considerations on genetic fine-mapping

  1. Alan J Aw
  2. Lionel Chentian Jin
  3. Nilah Ioannidis (corresponding author)
  4. Yun S Song (corresponding author)
  1. Department of Statistics, University of California, United States
  2. Center for Computational Biology, University of California, United States
  3. McKinsey & Company, United States
  4. Computer Science Division, University of California, United States
12 figures, 14 tables and 1 additional file

Figures

An overview of our study of the impact of stability considerations on genetic fine-mapping.

(A) The two ways in which we perform fine-mapping, the first of which (colored in green) prioritizes the stability of variant discoveries to subpopulation perturbations. The data illustrates the case where there are two distinct environments, or subpopulations (denoted E1 and E2), that split the observations. (B) Key steps in our comparison of the stability-guided approach with the popular residualization approach.

Figure 2 with 32 supplements
Simulation study results.

(A) The frequency with which at least one causal variant is recovered in Potential Set 1 by Plain PICS and Stable PICS, across 1440 simulated gene expression phenotypes that incorporate ancestry-mediated environmental heterogeneity. Recovery frequencies are stratified by the number of causal variants used in simulations, and the Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. (B) The frequency with which at least one causal variant is recovered in Potential Set 1 by Combined PICS, Stable PICS, and Top PICS, across 2400 simulated gene expression phenotypes. Recovery frequencies are stratified by the SNR parameter ϕ used in simulations, and the Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. (C) The frequency with which at least one causal variant is recovered in Credible Set 1 by Stable SuSiE and Top SuSiE. The Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. (D) The frequency with which matching and non-matching variants in the first credible or potential set recover a causal variant, obtained by comparing the top and stable approaches for each algorithm. We report approximate 95% confidence intervals for each point estimate, computed as the estimate plus or minus 1.96 times its standard error.
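
The interval construction mentioned in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the caption does not say how the standard error was computed, so the binomial standard error used here is an assumption, and the function name `wald_interval` and the counts are hypothetical.

```python
import math

def wald_interval(successes: int, trials: int, z: float = 1.96):
    """Approximate 95% CI for a recovery frequency: estimate +/- z * SE."""
    p_hat = successes / trials
    # Assumption: binomial standard error of a proportion.
    se = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    lower = max(0.0, p_hat - z * se)  # clip to the valid range [0, 1]
    upper = min(1.0, p_hat + z * se)
    return p_hat, lower, upper

# Hypothetical counts: a causal variant recovered in 150 of 200 simulations.
estimate, lower, upper = wald_interval(150, 200)
```

The clipping to [0, 1] is a design choice of this sketch; it matters only for frequencies near 0 or 1.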

Figure 2—figure supplement 1
Plain PICS vs Stable PICS (Potential Sets 2 and 3).

Frequency with which at least one causal variant is recovered in Potential Sets 2 and 3 by Plain PICS and Stable PICS, across 1440 simulated gene expression phenotypes. Recovery frequencies are stratified by simulations differing in the number of causal variants, but Venn diagrams report the number of matching and non-matching variants across all simulations.

Figure 2—figure supplement 2
Performance of PICS algorithms (Potential Sets 2 and 3).

Frequency with which at least one causal variant is recovered in Potential Sets 2 and 3 by Combined PICS, Stable PICS, and Top PICS, across 2400 simulated gene expression phenotypes. Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ, but Venn diagrams report the number of matching and non-matching variants across all simulations.

Figure 2—figure supplement 3
Performance of SuSiE algorithms (Credible Sets 2 and 3).

Frequency with which at least one causal variant is recovered in Credible Sets 2 and 3 by Stable SuSiE and Top SuSiE, across 2400 simulated gene expression phenotypes. Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ, but Venn diagrams report the number of matching and non-matching variants across all simulations.

Figure 2—figure supplement 4
Matching vs non-matching variants (Potential and Credible Sets 2 and 3).

Frequencies with which matching and non-matching variants in the credible or potential set recover a causal variant, obtained by comparing the top and stable approaches for each algorithm. Analysis is performed over 2400 simulated gene expression phenotypes, and recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ. (A) Credible or Potential Set 2. (B) Credible or Potential Set 3.

Figure 2—figure supplement 5
Stable PICS vs Stable SuSiE (one causal variant).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by stability-guided algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 6
Stable PICS vs Stable SuSiE (two causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by stability-guided algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 7
Stable PICS vs Stable SuSiE (three causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by stability-guided algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
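
The empirical distributions described in these supplements can be tabulated along the following lines. This is a minimal sketch under stated assumptions: each credible or potential set is assumed to contribute one selected variant, and the function names and variant labels are hypothetical.

```python
from collections import Counter

def count_recovered(selected_variants, causal_variants, n_sets):
    """Count distinct causal variants among the variants chosen from the first n_sets sets."""
    return len(set(selected_variants[:n_sets]) & set(causal_variants))

def recovery_distribution(simulations, n_causal, n_sets):
    """Empirical probabilities of recovering 0, 1, ..., n_causal causal variants."""
    counts = [count_recovered(sel, causal, n_sets) for sel, causal in simulations]
    tally = Counter(counts)
    return {k: tally[k] / len(counts) for k in range(n_causal + 1)}

# Hypothetical simulations with two causal variants each:
# (variants selected from sets 1, 2, 3; true causal variants).
simulations = [
    (["rsA", "rsB", "rsC"], ["rsA", "rsC"]),
    (["rsD", "rsE", "rsF"], ["rsE", "rsX"]),
    (["rsG", "rsH", "rsI"], ["rsY", "rsZ"]),
    (["rsJ", "rsK", "rsL"], ["rsJ", "rsL"]),
]
one_set = recovery_distribution(simulations, n_causal=2, n_sets=1)
three_sets = recovery_distribution(simulations, n_causal=2, n_sets=3)
```

Calling the function with increasing `n_sets` corresponds to moving from the top row to the bottom row of each panel grid.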

Figure 2—figure supplement 8
Matching Top vs Stable SNP posterior probabilities.

Posterior probabilities of matching top and stable variants across 2400 simulated gene expression phenotypes. Points are colored by the number of causal variants, S ∈ {1, 2, 3}, set in simulations.

Figure 2—figure supplement 9
Non-matching Top vs Stable SNP posterior probabilities.

Posterior probabilities of non-matching top and stable variants across 2400 simulated gene expression phenotypes. Points are colored by the number of causal variants, S ∈ {1, 2, 3}, set in simulations.

Figure 2—figure supplement 10
Stable PICS vs SuSiE (one causal variant).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by Stable PICS or Top SuSiE in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 11
Stable PICS vs SuSiE (two causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by Stable PICS or Top SuSiE in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 12
Stable PICS vs SuSiE (three causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Stable PICS or Top SuSiE in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 13
Distribution of the number of variants recovered by PICS (one causal variant).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by PICS algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a larger number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 14
Distribution of the number of variants recovered by PICS (two causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by PICS algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a larger number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 15
Distribution of the number of variants recovered by PICS (three causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by PICS algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 16
Distribution of the number of variants recovered by SuSiE (one causal variant).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by SuSiE algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 17
Distribution of the number of variants recovered by SuSiE (two causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by SuSiE algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 18
Distribution of the number of variants recovered by SuSiE (three causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by SuSiE algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 19
Variant recovery frequency of PICS and SuSiE matching and non-matching variants (one causal variant).

Frequency with which the causal variant is recovered by a matching variant, non-matching top variant, or non-matching stable variant in simulations involving one causal variant. Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ.

Figure 2—figure supplement 20
Distribution of the number of variants recovered by PICS matching and non-matching variants (two causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by matching Top and Stable PICS variants, non-matching Top PICS variants, and non-matching Stable PICS variants, in simulations involving two causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Figure 2—figure supplement 21
Distribution of the number of variants recovered by SuSiE matching and non-matching variants (two causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by matching Top and Stable SuSiE variants, non-matching Top SuSiE variants, and non-matching Stable SuSiE variants, in simulations involving two causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Figure 2—figure supplement 22
Distribution of the number of variants recovered by PICS matching and non-matching variants (three causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by matching Top and Stable PICS variants, non-matching Top PICS variants, and non-matching Stable PICS variants, in simulations involving three causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Figure 2—figure supplement 23
Distribution of the number of variants recovered by SuSiE matching and non-matching variants (three causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by matching Top and Stable SuSiE variants, non-matching Top SuSiE variants, and non-matching Stable SuSiE variants, in simulations involving three causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Figure 2—figure supplement 24
Plain PICS vs Stable PICS in environmental heterogeneity simulations (one causal variant).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by Plain or Stable PICS in simulations with one causal variant and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 25
Plain PICS vs Stable PICS in environmental heterogeneity simulations (two causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by Plain or Stable PICS in simulations with two causal variants and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 26
Plain PICS vs Stable PICS in environmental heterogeneity simulations (three causal variants).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in simulations with three causal variants and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a larger number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Figure 2—figure supplement 27
Plain PICS vs Stable PICS in ‘variance shift (t = 8)’ environmental heterogeneity simulations.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t=8’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Figure 2—figure supplement 28
Plain PICS vs Stable PICS in ‘variance shift (t = 16)’ environmental heterogeneity simulations.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t=16’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Figure 2—figure supplement 29
Plain PICS vs Stable PICS in ‘variance shift (t = 128)’ environmental heterogeneity simulations.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t=128’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Figure 2—figure supplement 30
Plain PICS vs Stable PICS in ‘variance shift (t = 256)’ environmental heterogeneity simulations.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t=256’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Figure 2—figure supplement 31
Plain PICS vs Stable PICS in ‘mean shift (|i − 3|)’ environmental heterogeneity simulations.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘|i − 3|’ simulations involving environmental heterogeneity (mean shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Figure 2—figure supplement 32
Plain PICS vs Stable PICS in ‘mean shift (i = 3)’ environmental heterogeneity simulations.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘i=3’ simulations involving environmental heterogeneity (mean shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Figure 3 with 2 supplements
Venn diagram showing the number of matching and non-matching variants for Potential Set 1 in GEUVADIS fine-mapped variants.

Figure 3—figure supplement 1
Matching GEUVADIS Top vs Stable SNP posterior probabilities.

Pair density plot of posterior probabilities of the top variant and the stable variant, for cases in which they match.

Figure 3—figure supplement 2
Non-matching GEUVADIS Top vs Stable SNP posterior probabilities.

Pair density plot of posterior probabilities of the top variant and the stable variant, for cases in which they do not match.

Distribution of computational VEP scores across matching and non-matching variants.

Top row: CADD scores. (A) Empirical cumulative distribution functions of raw CADD scores of matching and non-matching variants across all genes, for Potential Set 1. Non-matching variants are further divided into stable and top variants, with a lower score threshold of 1.0 and an upper score threshold of 5.0 used to improve visualization. (B) For each deleteriousness cutoff, the percentage of (1) all matching variants, (2) all non-matching top variants, and (3) all non-matching stable variants that are classified as deleterious. We use a sliding cutoff threshold ranging from 10 to 20, as recommended by the CADD authors. For each value along the x-axis, 95% confidence intervals for the point estimates on the y-axis were obtained using the Sison-Glaz method for constructing simultaneous confidence intervals for multinomial proportions (R command DescTools::MultinomCI(...)). Bottom row: Empirical cumulative distribution functions of perturbation scores for the Enformer-predicted H3K27me3 ChIP-seq track. A score upper threshold of 0.015 and an empirical CDF lower threshold of 0.5 are used to improve visualization. (C) Perturbation scores computed from predictions based on centering input sequences on the gene TSS as well as its two flanking positions. (D) Perturbation scores computed from predictions based on centering input sequences on the gene TSS only.
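
The sliding-cutoff summary in panel (B) can be sketched as below. This is an illustration only: the scores are made up, the function name `percent_deleterious` is hypothetical, and treating a score at or above the cutoff as deleterious is an assumption. The Sison-Glaz intervals from the caption are not reproduced here.

```python
def percent_deleterious(scores, cutoffs=range(10, 21)):
    """Percent of variants whose score meets or exceeds each cutoff."""
    n = len(scores)
    # Assumption: a variant counts as deleterious when score >= cutoff.
    return {c: 100.0 * sum(s >= c for s in scores) / n for c in cutoffs}

# Hypothetical scores for one group of variants (not real data).
matching_scores = [3.2, 11.5, 8.7, 15.0, 22.1, 9.9, 12.4, 19.8]
curve = percent_deleterious(matching_scores)
```

Running the same function on each of the three variant groups would give the three curves compared along the x-axis.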

Comparison of CADD scores across non-matching top and stable variants.

(A) Paired scatterplot of raw CADD scores of the top and stable variants for each gene, for Potential Set 1. (B) Percent of genes that are classified as (1) having a deleterious top variant only, (2) having a deleterious stable variant only, and (3) having both top and stable variants deleterious, using a sliding cutoff threshold ranging from 10 to 20, as recommended by the CADD authors.

Visual summary of the PICS algorithm described in Probabilistic Identification of Causal SNPs.

(A) Breakdown of the calculation of the probability of a focal SNP Ai being causal. (B) Illustration of the permutation procedure used to generate the null distribution. An example N×P genotype array with N=P=6 is used, with two valid row shuffles, or permutations, of the original array shown. Entries affected by the shuffle are highlighted, as is the focal SNP (A3).

Author response image 1
Author response image 2
Author response image 3
Author response image 4
Author response image 5
Author response image 6

Tables

Table 1
A list of 378 functional annotations across which the biological significances of stable and top fine-mapped single nucleotide polymorphisms are compared.

Annotations that report multiple scores have the total number of scores reported shown in parentheses. Scores mined from the FAVOR database (Zhou et al., 2023) are indicated by an asterisk (TSS = transcription start site, bp = base pair).

| Functional annotation type | Functional annotation |
| --- | --- |
| Ensembl | Distance to Canonical TSS (Cunningham et al., 2022) |
| | Regulatory Features (6; Cunningham et al., 2022) |
| Computational predictions | CADD∗ (2; Rentzsch et al., 2019) |
| | SIFTVal∗ (Ng and Henikoff, 2003) |
| | FATHMM-XF∗ (Rogers et al., 2018) |
| | LINSIGHT∗ (Huang et al., 2017) |
| | Polyphen∗ (Adzhubei et al., 2010) |
| | PhyloP∗ (3; Pollard et al., 2010) |
| | Gerp∗ (2; Davydov et al., 2010) |
| | B Statistic∗ (McVicker et al., 2009) |
| | FunSeq2∗ (Fu et al., 2014) |
| | ALoFT∗ (Balasubramanian et al., 2017) |
| | Percent CpG in 75 bp window∗ (Rentzsch et al., 2019) |
| | Percent GC in 75 bp window∗ (Rentzsch et al., 2019) |
| | FIRE (Ioannidis et al., 2017) |
| | Enformer (177 tracks × 2 scores per track; Avsec et al., 2021) |

Table 2
List of six moderating factors considered.
| Moderator | Quantity/statistic computed |
| --- | --- |
| (1) Degree of Stability | No. subpopulations for which stable variant has positive probability |
| (2) Population Diversity | Maximum of pairwise allele frequency difference between subpopulations for which stable variant has positive posterior probability |
| (3) Population Differentiation | Maximum FST between subpopulations for which stable variant has positive posterior probability |
| (4) Inclusion of Distal Subpopulations (Top) | Whether or not the top variant also had positive probability in Yoruban subpopulation when the stability-guided approach was used |
| (5) Inclusion of Distal Subpopulations (Stable) | Whether or not the stable variant had positive probability in Yoruban subpopulation when the stability-guided approach was used |
| (6) Degree of Certainty of Causality Using Residualization Approach | Posterior probability of top variant |
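
Moderators (1) and (2) can be computed along the following lines. This is a minimal sketch, not the authors' implementation: the allele frequencies are invented, the function name `max_pairwise_af_difference` is hypothetical, and returning 0.0 when fewer than two subpopulations are supported is a choice made for this sketch.

```python
from itertools import combinations

def max_pairwise_af_difference(allele_freqs, supported):
    """Largest |AF_a - AF_b| over pairs of subpopulations in `supported`."""
    pairs = combinations(sorted(supported), 2)
    diffs = [abs(allele_freqs[a] - allele_freqs[b]) for a, b in pairs]
    # Sketch choice: no pair available means no difference to report.
    return max(diffs) if diffs else 0.0

# Hypothetical allele frequencies for one stable variant (not real data).
freqs = {"YRI": 0.12, "TSI": 0.30, "GBR": 0.34, "FIN": 0.41, "CEU": 0.33}
# Hypothetical subpopulations in which the stable variant has positive posterior probability.
supported = {"YRI", "GBR", "FIN"}

degree_of_stability = len(supported)                                 # moderator (1)
population_diversity = max_pairwise_af_difference(freqs, supported)  # moderator (2)
```

Moderator (3) would follow the same pattern with a pairwise FST estimate in place of the absolute frequency difference; the choice of FST estimator is not stated in the table and is left out here.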
Appendix 12—table 1
Plain and Stable PICS matching frequencies.

The table below reports the frequencies with which Plain and Stable PICS have matching variants for the same potential set. The number of matching variants for each SNR scenario is reported in parentheses. The bottom two rows show matching frequencies when results are stratified by posterior probability (PP) of the Plain PICS variant. The number of matching variants for each PP stratum is reported in parentheses.

Stratified by signal-to-noise ratio (SNR) of simulations

| | Potential Set 1 | Potential Set 2 | Potential Set 3 |
| --- | --- | --- | --- |
| SNR = 0.053 | 0.736 (265) | 0.803 (289) | 0.797 (287) |
| SNR = 0.111 | 0.775 (279) | 0.753 (271) | 0.758 (273) |
| SNR = 0.25 | 0.903 (325) | 0.714 (257) | 0.728 (262) |
| SNR = 0.667 | 0.906 (326) | 0.753 (271) | 0.744 (268) |

Stratified by posterior probability (PP) of plain PICS variant

| | Potential Set 1 | Potential Set 2 | Potential Set 3 |
| --- | --- | --- | --- |
| p > 0.9 | 0.978 (441) | 0.899 (286) | 0.927 (307) |
| p ≤ 0.9 | 0.762 (754) | 0.715 (802) | 0.706 (783) |
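
Each entry in the table pairs a matching frequency with its underlying count. A minimal sketch of that calculation follows; the function name and toy variant labels are hypothetical, and the denominator of 360 simulations per SNR setting is an assumption (1440 simulations split evenly across four SNR values), not a figure stated in the table.

```python
def matching_frequency(plain_variants, stable_variants):
    """Return (frequency, count) of simulations where both methods pick the same variant."""
    matches = sum(p == s for p, s in zip(plain_variants, stable_variants))
    return matches / len(plain_variants), matches

# Toy example: variants selected for Potential Set 1 in four hypothetical simulations.
plain = ["rs1", "rs2", "rs3", "rs4"]
stable = ["rs1", "rs9", "rs3", "rs4"]
toy_frequency, toy_count = matching_frequency(plain, stable)

# Consistency check against the first table entry, 0.736 (265),
# assuming 360 simulations per SNR setting.
check = round(265 / 360, 3)
```

Under the same assumption, the other SNR rows are consistent with their counts, for example 325 / 360 rounds to 0.903.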
Appendix 12—table 2
Stable and Top PICS matching frequencies.

The table below reports the frequencies with which Stable and Top PICS have matching variants for the same potential set. The number of matching variants for each SNR/‘No. Causal Variants’ scenario is reported in parentheses.

Stratified by signal-to-noise ratio (SNR) and No. Causal Variants (S) in simulations

| No. Causal Variants | SNR | Potential Set 1 | Potential Set 2 | Potential Set 3 |
| --- | --- | --- | --- | --- |
| One causal variant (S = 1) | SNR = 0.053 | 0.695 (139) | 0.41 (82) | 0.22 (44) |
| | SNR = 0.111 | 0.75 (150) | 0.405 (81) | 0.24 (48) |
| | SNR = 0.25 | 0.825 (165) | 0.45 (90) | 0.26 (52) |
| | SNR = 0.667 | 0.895 (179) | 0.405 (81) | 0.275 (55) |
| Two causal variants (S = 2) | SNR = 0.053 | 0.545 (109) | 0.36 (72) | 0.225 (45) |
| | SNR = 0.111 | 0.68 (136) | 0.38 (76) | 0.215 (43) |
| | SNR = 0.25 | 0.79 (158) | 0.435 (87) | 0.27 (54) |
| | SNR = 0.667 | 0.78 (156) | 0.41 (82) | 0.26 (52) |
| Three causal variants (S = 3) | SNR = 0.053 | 0.565 (113) | 0.37 (74) | 0.245 (49) |
| | SNR = 0.111 | 0.655 (131) | 0.36 (72) | 0.265 (53) |
| | SNR = 0.25 | 0.72 (144) | 0.39 (78) | 0.22 (44) |
| | SNR = 0.667 | 0.785 (157) | 0.48 (96) | 0.255 (51) |
Appendix 12—table 3
Stable and Top SuSiE matching frequencies.

The table below reports the frequencies with which Stable and Top SuSiE have matching variants for the same potential set. The number of matching variants for each SNR/‘No. Causal Variants’ scenario is reported in parentheses.

Stratified by signal-to-noise ratio (SNR) and No. Causal Variants (S) in simulations

| No. Causal Variants | SNR | Potential Set 1 | Potential Set 2 | Potential Set 3 |
| --- | --- | --- | --- | --- |
| One causal variant (S = 1) | SNR = 0.053 | 0.55 (110) | 0.115 (23) | 0.095 (19) |
| | SNR = 0.111 | 0.725 (145) | 0.14 (28) | 0.095 (19) |
| | SNR = 0.25 | 0.84 (168) | 0.145 (29) | 0.17 (34) |
| | SNR = 0.667 | 0.875 (175) | 0.14 (28) | 0.19 (38) |
| Two causal variants (S = 2) | SNR = 0.053 | 0.505 (101) | 0.095 (19) | 0.065 (13) |
| | SNR = 0.111 | 0.73 (146) | 0.14 (28) | 0.095 (19) |
| | SNR = 0.25 | 0.875 (175) | 0.33 (66) | 0.105 (21) |
| | SNR = 0.667 | 0.875 (175) | 0.425 (85) | 0.115 (23) |
| Three causal variants (S = 3) | SNR = 0.053 | 0.345 (69) | 0.075 (15) | 0.085 (17) |
| | SNR = 0.111 | 0.68 (136) | 0.23 (46) | 0.145 (29) |
| | SNR = 0.25 | 0.795 (159) | 0.37 (74) | 0.185 (37) |
| | SNR = 0.667 | 0.85 (170) | 0.585 (117) | 0.25 (50) |
Appendix 12—table 4
Off-diagonal matching frequencies and causal variant recovery.

The table below reports the number of Stable and Top PICS non-matching variants that match across different, or ‘off-diagonal’, potential sets. Counts are computed across simulations with the same number of causal variants (S = 1, 2, or 3), with numbers along the yellow-shaded diagonal reporting the number of non-matching variants between the same potential sets. Each off-diagonal element reports both the number of matching variants for the pair of potential sets listed and the percentage of these matches that also correspond to the causal variant.

Rows: Stable PICS potential set. Columns: Top PICS potential set compared against.

Simulations with one causal variant

| Stable PICS potential set | Top PICS Potential Set 1 | Top PICS Potential Set 2 | Top PICS Potential Set 3 |
| --- | --- | --- | --- |
| Potential Set 1 | 167 | 5 (60%) | 3 (67%) |
| Potential Set 2 | 4 (25%) | 466 | 104 (0.96%) |
| Potential Set 3 | 0 | 94 (0%) | 601 |

Simulations with two causal variants

| Stable PICS potential set | Top PICS Potential Set 1 | Top PICS Potential Set 2 | Top PICS Potential Set 3 |
| --- | --- | --- | --- |
| Potential Set 1 | 241 | 24 (46%) | 5 (60%) |
| Potential Set 2 | 29 (52%) | 483 | 88 (10%) |
| Potential Set 3 | 9 (11%) | 84 (13%) | 606 |

Simulations with three causal variants

| Stable PICS potential set | Top PICS Potential Set 1 | Top PICS Potential Set 2 | Top PICS Potential Set 3 |
| --- | --- | --- | --- |
| Potential Set 1 | 255 | 29 (66%) | 7 (14%) |
| Potential Set 2 | 30 (40%) | 480 | 79 (18%) |
| Potential Set 3 | 4 (0%) | 65 (12%) | 603 |
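
The layout of this table can be reproduced with a small tally. This is a minimal sketch under stated assumptions, not the authors' code: each potential set is assumed to contribute one selected variant per simulation, and the function name and variant labels are hypothetical.

```python
def off_diagonal_table(stable_picks, top_picks, causal_sets, n_sets=3):
    """Tally diagonal non-matches and off-diagonal matches between Stable and Top picks.

    counts[i][i]      : simulations where Stable set i+1 and Top set i+1 differ.
    counts[i][j], i!=j: simulations where Stable set i+1 equals Top set j+1.
    causal_hits[i][j] : how many of those off-diagonal matches are causal variants.
    """
    counts = [[0] * n_sets for _ in range(n_sets)]
    causal_hits = [[0] * n_sets for _ in range(n_sets)]
    for stable, top, causal in zip(stable_picks, top_picks, causal_sets):
        for i in range(n_sets):
            for j in range(n_sets):
                if i == j:
                    counts[i][j] += stable[i] != top[j]
                elif stable[i] == top[j]:
                    counts[i][j] += 1
                    causal_hits[i][j] += stable[i] in causal
    return counts, causal_hits

# Three hypothetical simulations (variant labels are placeholders).
stable_picks = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
top_picks = [["a", "c", "b"], ["e", "d", "f"], ["g", "h", "x"]]
causal_sets = [{"b"}, {"e"}, {"g"}]
counts, causal_hits = off_diagonal_table(stable_picks, top_picks, causal_sets)
```

As a consistency check on the diagonal, assuming 800 simulations per number of causal variants (2400 split evenly across S = 1, 2, 3): the Potential Set 1 matches for S = 1 in Appendix 12—table 2 sum to 139 + 150 + 165 + 179 = 633, and 800 − 633 = 167, which agrees with the first diagonal entry above.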
Appendix 12—table 5
List of matching variants with low stable posterior probability.

The table below summarizes the genes and potential sets for which Stable and Top PICS returned matching variants, along with SNP-level and fine-mapping features for interpretation. Five statistics are reported: posterior probability of the stable variant (Stable PP); posterior probability of the top variant (Top PP); posterior probability support size, defined as the number of variants with positive probability (Support Size); the number of ancestry slices, including the ALL slice, for which the stable variant had positive posterior probability from running Stable PICS (Number of Slices); and the maximum difference in allele frequency between any pair of subpopulations among YRI, TSI, GBR, FIN, and CEU (Max AF Difference).

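| Potential Set | Gene | Matching variant | Stable PP | Top PP | Support size | Number of slices | Max AF Difference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | ENSG00000134762.11 | rs61731921 | 0.0028 | 0.76 | 23 | 4 | 0.22 |
| 1 | ENSG00000197847.8 | rs7130955 | 0.0075 | 0.23 | 45 | 3 | 0.18 |
| 1 | ENSG00000255284.1 | rs12224894 | 0.0067 | 0.65 | 23 | 6 | 0.14 |
| 1 | ENSG00000104442.5 | rs6995242 | 0.0092 | 0.31 | 42 | 4 | 0.34 |
| 1 | ENSG00000146733.9 | rs10239528 | 0.0031 | 0.53 | 27 | 5 | 0.24 |
| 1 | ENSG00000248468.1 | rs9853505 | 0.0099 | 0.29 | 39 | 3 | 0.43 |
| 1 | ENSG00000122224.10 | rs57449 | 0.0089 | 0.50 | 25 | 4 | 0.31 |
| 1 | ENSG00000134262.8 | rs17464525 | 0.0030 | 0.45 | 23 | 4 | 0.15 |
| 2 | ENSG00000216522.3 | rs5751902 | 0.0052 | 1 | 7 | 3 | 0.16 |
| 2 | ENSG00000108592.9 | rs9912201 | 0.0022 | 0.32 | 32 | 6 | 0.27 |
| 2 | ENSG00000134551.7 | rs7315843 | 0.0019 | 0.58 | 10 | 5 | 0.22 |
| 2 | ENSG00000221947.3 | rs3103860 | 0.0018 | 0.99 | 4 | 4 | 0.084 |
| 2 | ENSG00000081791.4 | rs2270113 | 0.0059 | 0.77 | 11 | 4 | 0.33 |
| 3 | ENSG00000140368.8 | rs62027296 | 0.0069 | 0.29 | 21 | 4 | 0.15 |
| 3 | ENSG00000254614.1 | rs625750 | 0.0017 | 0.64 | 4 | 3 | 0.22 |
| 3 | ENSG00000133835.9 | rs2451818 | 0.0036 | 0.40 | 32 | 3 | 0.41 |
| 3 | ENSG00000158234.8 | rs693293 | 0.0069 | 0.55 | 24 | 4 | 0.13 |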
Appendix 12—table 6
List of variant annotations with interpretations.
| Functional annotation | Interpretation |
| --- | --- |
| Distance to Canonical Transcription Start Site (TSS) | - |
| Percent CpG in 75 bp window centered on variant position | - |
| Percent GC in 75 bp window centered on variant position | - |
| CTCF Binding Enrichment | Whether the variant lies within a CTCF binding site region as predicted by Ensembl |
| Enhancer Enrichment | Whether the variant lies within an enhancer region as predicted by Ensembl |
| Open Chromatin Enrichment | Whether the variant lies within an open chromatin region as predicted by Ensembl |
| Promoter Enrichment | Whether the variant lies within a promoter region as predicted by Ensembl |
| TF Binding Enrichment | Whether the variant lies within a TF-binding site region as predicted by Ensembl |
| Promoter Flanking Enrichment | Whether the variant lies within a promoter flanking region as predicted by Ensembl |
| CADD (2 scores) | Whether the variant is likely to be simulated or not, and hence likely deleterious or not. One score is raw while the other is rank-normalized |
| SIFTVal | Whether the variant affects protein function, and hence deleterious |
| Polyphen2 | Posterior probability that the variant is damaging |
| LINSIGHT | Probability that the variant site is under selection, thus having functional consequence |
| PhyloP (3 scores) | Substitution rates measuring cross-species evolutionary conservation at the site of the variant. Each score is computed with respect to a clade (vertebrate, mammal, primate) |
| GerpN | Estimated neutral substitution rate at variant position, with higher value implying greater conservation |
| GerpS | Estimated rejected substitution rate at variant position, with positive value implying a deficit in substitutions |
| B Statistic | Background selection at variant position, with smaller value indicating larger impact of selection |
| FATHMM-XF | Integrative score measuring deleteriousness of the variant |
| Funseq2 | Integrative score measuring deleteriousness of the variant |
| ALoft | Integrative score measuring loss of function associated with the variant |
| FIRE | Integrative score measuring deleteriousness of the variant |
| Magnitude of Effect on Enformer Track Prediction (177 tracks) | Change in prediction of a gene regulatory track when performing in silico mutagenesis on the variant in a 196,608 bp sequence |
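Author response table 1

| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| --- | --- | --- | --- | --- | --- |
| -0.999 | -0.342 | -0.107 | -0.117 | 0.067 | 0.949 |

Author response table 2

| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| --- | --- | --- | --- | --- | --- |
| -1.000 | -0.334 | -0.0741 | -0.0778 | 0.162 | 0.976 |

Author response table 3

| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| --- | --- | --- | --- | --- | --- |
| -0.998 | -0.327 | -0.0564 | -0.0629 | 0.191 | 0.968 |

Author response table 4

| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| --- | --- | --- | --- | --- | --- |
| -0.994 | -0.266 | -0.00645 | -0.109 | 0.049 | 0.924 |

Author response table 5

| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| --- | --- | --- | --- | --- | --- |
| -0.998 | -0.376 | -0.0944 | -0.121 | 0.115 | 0.903 |

Author response table 6

| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
| --- | --- | --- | --- | --- | --- |
| -0.998 | -0.389 | -0.100 | -0.114 | 0.146 | 0.915 |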

Alan J Aw, Lionel Chentian Jin, Nilah Ioannidis, Yun S Song (2026)
The impact of stability considerations on genetic fine-mapping
eLife 12:RP88039.
https://doi.org/10.7554/eLife.88039.3