Figures and data

An overview of our study of the impact of stability considerations on genetic fine-mapping.
A. The two ways in which we perform fine-mapping, the first of which (colored in green) prioritizes the stability of variant discoveries to subpopulation perturbations. The data illustrates the case where there are two distinct environments, or subpopulations (denoted E1 and E2), that split the observations. B. Key steps in our comparison of the stability-guided approach with the popular residualization approach.

A list of 378 functional annotations across which the biological significances of stable and top fine-mapped single nucleotide polymorphisms are compared.
Annotations that report multiple scores have the total number of scores reported shown in parentheses. Scores mined from the FAVOR database (Zhou et al., 2023) are indicated by an asterisk. (TSS = Transcription Start Site, bp = base pair)

Simulation study results.
A. The frequency with which at least one causal variant is recovered in Potential Set 1 by Plain PICS and Stable PICS, across 1,440 simulated gene expression data that incorporate ancestry-mediated environmental heterogeneity. Recovery frequencies are stratified by simulations differing in the number of causal variants, and the Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. B. The frequency with which at least one causal variant is recovered in Potential Set 1 by Combined PICS, Stable PICS and Top PICS, across 2,400 simulated gene expression data. Recovery frequencies are stratified by the SNR parameter ϕ used in simulations, and the Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. C. The frequency with which at least one causal variant is recovered in Credible Set 1 by Stable SuSiE and Top SuSiE. Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. D. The frequency with which matching and non-matching variants in the first credible or potential set recover a causal variant, obtained from comparing top and stable approaches to an algorithm.
Figure 2—figure supplement 1. Plain PICS vs Stable PICS (Potential Sets 2 and 3)
Figure 2—figure supplement 2. Performance of PICS Algorithms (Potential Sets 2 and 3)
Figure 2—figure supplement 3. Performance of SuSiE Algorithms (Credible Sets 2 and 3)
Figure 2—figure supplement 4. Matching vs Non-matching Variants (Potential and Credible Sets 1 and 2)
Figure 2—figure supplement 5. Stable PICS vs Stable SuSiE (1 causal variant)
Figure 2—figure supplement 6. Stable PICS vs Stable SuSiE (2 causal variants)
Figure 2—figure supplement 7. Stable PICS vs Stable SuSiE (3 causal variants)
Figure 2—figure supplement 8. Matching Top vs Stable SNP posterior probabilities
Figure 2—figure supplement 9. Non-matching Top vs Stable SNP posterior probabilities
Figure 2—figure supplement 10. Stable PICS vs SuSiE (1 causal variant)
Figure 2—figure supplement 11. Stable PICS vs SuSiE (2 causal variants)
Figure 2—figure supplement 12. Stable PICS vs SuSiE (3 causal variants)
Figure 2—figure supplement 13. Distribution of number of variants recovered by PICS (1 causal variant)
Figure 2—figure supplement 14. Distribution of number of variants recovered by PICS (2 causal variants)
Figure 2—figure supplement 15. Distribution of number of variants recovered by PICS (3 causal variants)
Figure 2—figure supplement 16. Distribution of number of variants recovered by SuSiE (1 causal variant)
Figure 2—figure supplement 17. Distribution of number of variants recovered by SuSiE (2 causal variants)
Figure 2—figure supplement 18. Distribution of number of variants recovered by SuSiE (3 causal variants)
Figure 2—figure supplement 19. Variant recovery frequency of PICS and SuSiE matching and non-matching variants (1 causal variant)
Figure 2—figure supplement 20. Distribution of number of variants recovered by PICS matching and non-matching variants (2 causal variants)
Figure 2—figure supplement 21. Distribution of number of variants recovered by SuSiE matching and non-matching variants (2 causal variants)
Figure 2—figure supplement 22. Distribution of number of variants recovered by PICS matching and non-matching variants (3 causal variants)
Figure 2—figure supplement 23. Distribution of number of variants recovered by SuSiE matching and non-matching variants (3 causal variants)
Figure 2—figure supplement 24. Plain PICS vs Stable PICS in environmental heterogeneity simulations (1 causal variant)
Figure 2—figure supplement 25. Plain PICS vs Stable PICS in environmental heterogeneity simulations (2 causal variants)
Figure 2—figure supplement 26. Plain PICS vs Stable PICS in environmental heterogeneity simulations (3 causal variants)
Figure 2—figure supplement 27. Plain PICS vs Stable PICS in “variance shift (t = 8)” environmental heterogeneity simulations
Figure 2—figure supplement 28. Plain PICS vs Stable PICS in “variance shift (t = 16)” environmental heterogeneity simulations
Figure 2—figure supplement 29. Plain PICS vs Stable PICS in “variance shift (t = 128)” environmental heterogeneity simulations
Figure 2—figure supplement 30. Plain PICS vs Stable PICS in “variance shift (t = 256)” environmental heterogeneity simulations
Figure 2—figure supplement 31. Plain PICS vs Stable PICS in “mean shift (|i − 3|)” environmental heterogeneity simulations
Figure 2—figure supplement 32. Plain PICS vs Stable PICS in “mean shift (i = 3)” environmental heterogeneity simulations
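
The two summaries reported for the simulation study (a stratified recovery frequency and the matching versus non-matching counts behind a Venn diagram) can be illustrated with a minimal sketch. The record layout, the field names (`stratum`, `selected`, `causal`) and the toy variant IDs are hypothetical and chosen only for illustration; this is not the authors' code.

```python
from collections import defaultdict

def recovery_frequency(simulations):
    """Per stratum, the fraction of simulations whose selected set
    contains at least one causal variant."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sim in simulations:
        stratum = sim["stratum"]  # e.g. number of causal variants or SNR level
        totals[stratum] += 1
        if set(sim["selected"]) & set(sim["causal"]):
            hits[stratum] += 1
    return {s: hits[s] / totals[s] for s in totals}

def venn_counts(top_variants, stable_variants):
    """Count simulations in which the top and stable picks agree or differ.
    Assumes one pick per simulation from each approach."""
    matching = sum(t == s for t, s in zip(top_variants, stable_variants))
    return {"matching": matching, "non_matching": len(top_variants) - matching}

# Toy records (hypothetical): three simulations in two strata.
sims = [
    {"stratum": 1, "selected": ["rs1"], "causal": ["rs1"]},
    {"stratum": 1, "selected": ["rs2"], "causal": ["rs3"]},
    {"stratum": 2, "selected": ["rs4"], "causal": ["rs4", "rs5"]},
]
freq = recovery_frequency(sims)
venn = venn_counts(["rs1", "rs2", "rs4"], ["rs1", "rs9", "rs4"])
```

In this toy input, stratum 1 recovers a causal variant in one of two simulations and stratum 2 in its single simulation, while two of the three top and stable picks match.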

Venn diagram showing the number of matching and non-matching variants for Potential Set 1 in GEUVADIS fine-mapped variants.
Figure 3—figure supplement 1. Matching GEUVADIS Top vs Stable SNP posterior probabilities
Figure 3—figure supplement 2. Non-matching GEUVADIS Top vs Stable SNP posterior probabilities

Top Row. CADD scores. A. Empirical cumulative distribution functions of raw CADD scores of matching and non-matching variants across all genes, for Potential Set 1. Non-matching variants are further divided into stable and top variants, with a score lower threshold of 1.0 and upper threshold of 5.0 used to improve visualization. B. For a deleteriousness cutoff, the percent of (i) all matching variants, (ii) all non-matching top variants, and (iii) all non-matching stable variants, which are classified as deleterious. We use a sliding cutoff threshold ranging from 10 to 20 as recommended by CADD authors. Bottom Row. Empirical cumulative distribution functions of perturbation scores of Enformer-predicted H3K27me3 ChIP-seq track. Score upper threshold of 0.015 and empirical CDF lower threshold of 0.5 used to improve visualization. C. Perturbation scores computed from predictions based on centering input sequences on the gene TSS as well as its two flanking positions. D. Perturbation scores computed from predictions based on centering input sequences on the gene TSS only.
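
As a rough illustration of the two quantities plotted in this figure, the sketch below computes an empirical cumulative distribution function and the percent of variants classified as deleterious at a given cutoff. It assumes that a variant counts as deleterious when its score is greater than or equal to the cutoff (the caption does not say whether the inequality is strict), and the score values are made up.

```python
from bisect import bisect_right

def ecdf(scores):
    """Return F, where F(x) is the fraction of scores that are <= x."""
    ordered = sorted(scores)
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F

def percent_deleterious(scores, cutoff):
    """Percent of scores at or above the cutoff (assumed convention)."""
    return 100.0 * sum(s >= cutoff for s in scores) / len(scores)

# Made-up scores, for illustration only.
F = ecdf([1, 2, 2, 4])
toy_scores = [5, 12, 18, 25]
# Sliding cutoff over the 10 to 20 range mentioned in the caption.
curve = {c: percent_deleterious(toy_scores, c) for c in range(10, 21)}
```

Evaluating `percent_deleterious` separately for the matching, non-matching top and non-matching stable groups at each cutoff would give the three curves that panel B compares.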

A. Paired scatterplot of raw CADD scores of the top and stable variants for each gene, for Potential Set 1. B. Percent of genes that are classified as (i) having a deleterious top variant only, (ii) having a deleterious stable variant only, and (iii) having both top and stable variants deleterious, using a sliding cutoff threshold ranging from 10 to 20 as recommended by CADD authors.
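
The gene-level classification in panel B can be sketched as follows. This is a minimal illustration: it assumes each gene contributes one (top score, stable score) pair, that a variant is deleterious when its score is at or above the cutoff, and it adds a "neither" category so that the percentages sum to 100. The example pairs are invented.

```python
def classify_genes(pairs, cutoff):
    """Percent of genes whose top variant only, stable variant only,
    both, or neither are at or above the cutoff (assumed convention)."""
    counts = {"top_only": 0, "stable_only": 0, "both": 0, "neither": 0}
    for top_score, stable_score in pairs:
        top_del = top_score >= cutoff
        stable_del = stable_score >= cutoff
        if top_del and stable_del:
            counts["both"] += 1
        elif top_del:
            counts["top_only"] += 1
        elif stable_del:
            counts["stable_only"] += 1
        else:
            counts["neither"] += 1
    n = len(pairs)
    return {key: 100.0 * value / n for key, value in counts.items()}

# Invented (top, stable) score pairs for four genes.
toy_pairs = [(12, 8), (9, 15), (22, 21), (3, 4)]
at_10 = classify_genes(toy_pairs, 10)
at_20 = classify_genes(toy_pairs, 20)
```

Repeating the call for each cutoff between 10 and 20 traces how the three reported categories change as the threshold becomes stricter.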

List of 6 moderating factors considered.

Visual summary of the PICS algorithm described in Probabilistic Identification of Causal SNPs.
A. Breakdown of the calculation of the probability of a focal SNP Ai being causal. B. Illustration of the permutation procedure used to generate the null distribution. An example N × P genotype array with N = P = 6 is used, with two valid row shuffles, or permutations, of the original array shown. Entries affected by the shuffle are highlighted, as is the focal SNP (A3).
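
To make the row-shuffle idea in panel B concrete, here is a minimal sketch that reorders the rows (individuals) of a small genotype array. It treats any reordering of rows as a shuffle and does not reproduce whatever criterion makes a shuffle "valid" in the figure, since the caption does not state it. The function name and the example array are hypothetical.

```python
import random

def row_shuffles(genotypes, n_shuffles, seed=0):
    """Return n_shuffles copies of an N x P array with rows reordered.
    Each row is moved as a unit, so every individual's genotypes
    across the P SNPs stay together."""
    rng = random.Random(seed)
    shuffles = []
    for _ in range(n_shuffles):
        order = list(range(len(genotypes)))
        rng.shuffle(order)
        shuffles.append([list(genotypes[i]) for i in order])
    return shuffles

# Hypothetical 6 x 6 genotype array (entries are allele counts 0, 1 or 2).
G = [
    [0, 1, 2, 0, 1, 0],
    [1, 1, 0, 2, 0, 1],
    [2, 0, 1, 1, 1, 0],
    [0, 2, 2, 0, 0, 1],
    [1, 0, 0, 1, 2, 2],
    [2, 1, 1, 0, 0, 0],
]
permuted = row_shuffles(G, n_shuffles=2, seed=1)
```

Because whole rows move together, each column keeps the same genotype counts and the correlation structure between SNPs is unchanged; only the alignment between rows and any externally held phenotype vector changes.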

Frequency with which at least one causal variant is recovered in Potential Sets 2 and 3 by Plain PICS and Stable PICS, across 1,440 simulated gene expression phenotypes.
Recovery frequencies are stratified by simulations differing in the number of causal variants, but Venn diagrams report the number of matching and non-matching variants across all simulations.

Frequency with which at least one causal variant is recovered in Potential Sets 2 and 3 by Combined PICS, Stable PICS and Top PICS, across 2,400 simulated gene expression phenotypes.
Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ, but Venn diagrams report the number of matching and non-matching variants across all simulations.

Frequency with which at least one causal variant is recovered in Potential Sets 2 and 3 by Stable SuSiE and Top SuSiE, across 2,400 simulated gene expression phenotypes.
Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ, but Venn diagrams report the number of matching and non-matching variants across all simulations.

Frequencies with which matching and non-matching variants in the credible or potential set recover a causal variant, obtained from comparing top and stable approaches to an algorithm.
Analysis is performed over 2,400 simulated gene expression phenotypes, and recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ. A. Credible or Potential Set 2. B. Credible or Potential Set 3.

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by stability-guided algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
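
The empirical distributions described in this caption can be sketched as follows. The sketch assumes each simulation stores an ordered list of credible or potential sets (each a collection of variant IDs) together with its causal variants; the field names `sets` and `causal` and the toy variants are hypothetical.

```python
from collections import Counter

def recovered_count(sets, causal, n_sets):
    """Number of distinct causal variants found in the first n_sets sets."""
    included = set().union(*sets[:n_sets])
    return len(included & set(causal))

def empirical_distribution(simulations, n_sets, n_causal):
    """Empirical probability of recovering 0, 1, ..., n_causal causal
    variants when the first n_sets sets are included."""
    counts = Counter(
        recovered_count(sim["sets"], sim["causal"], n_sets) for sim in simulations
    )
    total = len(simulations)
    return {k: counts.get(k, 0) / total for k in range(n_causal + 1)}

# Toy simulations (hypothetical) with two causal variants each.
toy_sims = [
    {"sets": [["a"], ["b"], ["c"]], "causal": ["a", "b"]},
    {"sets": [["x"], ["a"], ["b"]], "causal": ["a", "b"]},
]
by_n_sets = {k: empirical_distribution(toy_sims, k, n_causal=2) for k in (1, 2, 3)}
```

Moving from one included set to three shifts probability mass toward recovering more causal variants, which corresponds to the top-to-bottom progression in the figure rows.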

Empirical discrete probability distributions over the number of causal variants (0, 1 or 2) recovered by stability-guided algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by stability-guided algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Posterior probabilities of matching top and stable variants across 2,400 simulated gene expression phenotypes.
Points are coloured by the number of causal variants, S ∈ {1, 2, 3}, set in simulations.

Posterior probabilities of non-matching top and stable variants across 2,400 simulated gene expression phenotypes.
Points are coloured by the number of causal variants, S ∈ {1, 2, 3}, set in simulations.

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by Stable PICS or Top SuSiE in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1 or 2) recovered by Stable PICS or Top SuSiE in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Stable PICS or Top SuSiE in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by PICS algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1 or 2) recovered by PICS algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by PICS algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by SuSiE algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1 or 2) recovered by SuSiE algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by SuSiE algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more credible sets on the distribution is shown (increasing number of included sets from top to bottom).

Frequency with which the causal variant is recovered by a matching variant, non-matching top variant or non-matching stable variant in simulations involving one causal variant.
Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ.

Empirical discrete probability distributions over the number of causal variants (0, 1 or 2) recovered by matching Top and Stable PICS variants, non-matching Top PICS variants and non-matching Stable PICS variants, in simulations involving two causal variants.
Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Empirical discrete probability distributions over the number of causal variants (0, 1 or 2) recovered by matching Top and Stable SuSiE variants, non-matching Top SuSiE variants and non-matching Stable SuSiE variants, in simulations involving two causal variants.
Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by matching Top and Stable PICS variants, non-matching Top PICS variants and non-matching Stable PICS variants, in simulations involving three causal variants.
Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by matching Top and Stable SuSiE variants, non-matching Top SuSiE variants and non-matching Stable SuSiE variants, in simulations involving three causal variants.
Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter ϕ (increasing SNR from left to right).

Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by Plain or Stable PICS in simulations with one causal variant and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1 or 2) recovered by Plain or Stable PICS in simulations with two causal variants and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Plain or Stable PICS in simulations with three causal variants and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right).
The impact of including more potential sets on the distribution is shown (increasing number of included sets from top to bottom).

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Plain or Stable PICS in “t = 8” simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right).
Each row reports the distribution for simulations with a specific number of causal variants (1, 2 or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Plain or Stable PICS in “t = 16” simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right).
Each row reports the distribution for simulations with a specific number of causal variants (1, 2 or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Plain or Stable PICS in “t = 128” simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right).
Each row reports the distribution for simulations with a specific number of causal variants (1, 2 or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Plain or Stable PICS in “t = 256” simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right).
Each row reports the distribution for simulations with a specific number of causal variants (1, 2 or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Plain or Stable PICS in “|i−3|” simulations involving environmental heterogeneity (mean shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right).
Each row reports the distribution for simulations with a specific number of causal variants (1, 2 or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Empirical discrete probability distributions over the number of causal variants (0, 1, 2 or 3) recovered by Plain or Stable PICS in “i = 3” simulations involving environmental heterogeneity (mean shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right).
Each row reports the distribution for simulations with a specific number of causal variants (1, 2 or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.

Pair density plot of posterior probabilities of the top variant and the stable variant, for cases in which they match.
