The impact of stability considerations on genetic fine-mapping
Figures
An overview of our study of the impact of stability considerations on genetic fine-mapping.
(A) The two ways in which we perform fine-mapping, the first of which (colored in green) prioritizes the stability of variant discoveries to subpopulation perturbations. The data illustrate the case of two distinct environments, or subpopulations, that split the observations. (B) Key steps in our comparison of the stability-guided approach with the popular residualization approach.
Simulation study results.
(A) The frequency with which at least one causal variant is recovered in Potential Set 1 by Plain PICS and Stable PICS, across 1440 simulated gene expression phenotypes that incorporate ancestry-mediated environmental heterogeneity. Recovery frequencies are stratified by simulations differing in the number of causal variants, and the Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. (B) The frequency with which at least one causal variant is recovered in Potential Set 1 by Combined PICS, Stable PICS, and Top PICS, across 2400 simulated gene expression phenotypes. Recovery frequencies are stratified by the SNR parameter used in simulations, and the Venn diagram reports the number of matching and non-matching variants in Potential Set 1 across all simulations. (C) The frequency with which at least one causal variant is recovered in Credible Set 1 by Stable SuSiE and Top SuSiE. The Venn diagram reports the number of matching and non-matching variants in Credible Set 1 across all simulations. (D) The frequency with which matching and non-matching variants in the first credible or potential set recover a causal variant, obtained from comparing top and stable approaches to an algorithm. We report approximate 95% confidence intervals for each point estimate, computed as the estimate plus or minus 1.96 times its standard error.
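The interval construction described at the end of this caption can be sketched as follows. This is a minimal illustration, assuming the standard error of a recovery frequency is the usual binomial one, sqrt(p(1 − p)/n); the caption does not state which standard error was used. The function name and the counts are hypothetical.

```python
import math

def recovery_ci(n_recovered, n_simulations, z=1.96):
    """Return (estimate, lower, upper) for a recovery frequency.

    Assumption: the standard error is the binomial one, sqrt(p * (1 - p) / n).
    """
    p_hat = n_recovered / n_simulations
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_simulations)
    half_width = z * se  # 1.96 standard errors gives an approximate 95% interval
    # Clip to [0, 1] because a frequency cannot leave that range.
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical counts chosen only for illustration (not taken from the study).
estimate, lower, upper = recovery_ci(n_recovered=150, n_simulations=200)
print(f"{estimate:.3f} [{lower:.3f}, {upper:.3f}]")
```

The normal approximation behind the 1.96 multiplier becomes less reliable when the estimate is close to 0 or 1, which is why the bounds are clipped here.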
Plain PICS vs Stable PICS (Potential Sets 2 and 3).
Frequency with which at least one causal variant is recovered in Potential Sets 2 and 3 by Plain PICS and Stable PICS, across 1440 simulated gene expression phenotypes. Recovery frequencies are stratified by simulations differing in the number of causal variants, but Venn diagrams report the number of matching and non-matching variants across all simulations.
Performance of PICS algorithms (Potential Sets 2 and 3).
Frequency with which at least one causal variant is recovered in Potential Sets 2 and 3 by Combined PICS, Stable PICS, and Top PICS, across 2400 simulated gene expression phenotypes. Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter, but Venn diagrams report the number of matching and non-matching variants across all simulations.
Performance of SuSiE algorithms (Credible Sets 2 and 3).
Frequency with which at least one causal variant is recovered in Credible Sets 2 and 3 by Stable SuSiE and Top SuSiE, across 2400 simulated gene expression phenotypes. Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter, but Venn diagrams report the number of matching and non-matching variants across all simulations.
Matching vs non-matching variants (Potential and Credible Sets 2 and 3).
Frequencies with which matching and non-matching variants in the credible or potential set recover a causal variant, obtained from comparing top and stable approaches to an algorithm. Analysis is performed over 2400 simulated gene expression phenotypes, and recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter. (A) Credible or Potential Set 2. (B) Credible or Potential Set 3.
Stable PICS vs Stable SuSiE (one causal variant).
Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by stability-guided algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
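The empirical distributions in this and the following figures can be computed along the lines below. This is a sketch under stated assumptions: each credible or potential set is represented by the variants it reports, a causal variant counts as recovered if it appears in any of the first k sets, and the data layout, function names, and variant identifiers are all hypothetical.

```python
from collections import Counter

def n_causal_recovered(sets, causal, k):
    """Count causal variants that appear in the union of the first k sets."""
    included = set().union(*sets[:k])
    return len(included & set(causal))

def empirical_distribution(simulations, k, n_causal):
    """Empirical probability of recovering 0, 1, ..., n_causal causal variants."""
    counts = Counter(
        n_causal_recovered(sim["sets"], sim["causal"], k) for sim in simulations
    )
    total = len(simulations)
    return {m: counts.get(m, 0) / total for m in range(n_causal + 1)}

# Toy simulations with two causal variants each (illustrative only).
toy_simulations = [
    {"causal": {"rs1", "rs2"}, "sets": [{"rs1"}, {"rs9"}, {"rs2"}]},
    {"causal": {"rs3", "rs4"}, "sets": [{"rs7"}, {"rs3"}, {"rs8"}]},
    {"causal": {"rs5", "rs6"}, "sets": [{"rs5"}, {"rs6"}, {"rs0"}]},
    {"causal": {"rs1", "rs4"}, "sets": [{"rs2"}, {"rs3"}, {"rs9"}]},
]

# One distribution per number of included sets, as in the rows of the figure.
by_k = {k: empirical_distribution(toy_simulations, k, n_causal=2) for k in (1, 2, 3)}
```

Including more sets can only move probability mass toward larger recovered counts, which is the top-to-bottom trend the panels are designed to show.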
Stable PICS vs Stable SuSiE (two causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by stability-guided algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Stable PICS vs Stable SuSiE (three causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by stability-guided algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Matching Top vs Stable SNP posterior probabilities.
Posterior probabilities of matching top and stable variants across 2400 simulated gene expression phenotypes. Points are colored by the number of causal variants, S, set in simulations.
Non-matching Top vs Stable SNP posterior probabilities.
Posterior probabilities of non-matching top and stable variants across 2400 simulated gene expression phenotypes. Points are colored by the number of causal variants, S, set in simulations.
Stable PICS vs SuSiE (one causal variant).
Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by Stable PICS or Top SuSiE in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Stable PICS vs SuSiE (two causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by Stable PICS or Top SuSiE in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Stable PICS vs SuSiE (three causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Stable PICS or Top SuSiE in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible or potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Distribution of the number of variants recovered by PICS (one causal variant).
Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by PICS algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a larger number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Distribution of the number of variants recovered by PICS (two causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by PICS algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a larger number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Distribution of the number of variants recovered by PICS (three causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by PICS algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Distribution of the number of variants recovered by SuSiE (one causal variant).
Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by SuSiE algorithms in simulations with one causal variant, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible sets on the distribution is shown (increasing number of included sets from top to bottom).
Distribution of the number of variants recovered by SuSiE (two causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by SuSiE algorithms in simulations with two causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible sets on the distribution is shown (increasing number of included sets from top to bottom).
Distribution of the number of variants recovered by SuSiE (three causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by SuSiE algorithms in simulations with three causal variants, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of credible sets on the distribution is shown (increasing number of included sets from top to bottom).
Variant recovery frequency of PICS and SuSiE matching and non-matching variants (one causal variant).
Frequency with which the causal variant is recovered by a matching variant, non-matching top variant, or non-matching stable variant in simulations involving one causal variant. Recovery frequencies are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter.
Distribution of the number of variants recovered by PICS matching and non-matching variants (two causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by matching Top and Stable PICS variants, non-matching Top PICS variants, and non-matching Stable PICS variants, in simulations involving two causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter (increasing SNR from left to right).
Distribution of the number of variants recovered by SuSiE matching and non-matching variants (two causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by matching Top and Stable SuSiE variants, non-matching Top SuSiE variants, and non-matching Stable SuSiE variants, in simulations involving two causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter (increasing SNR from left to right).
Distribution of the number of variants recovered by PICS matching and non-matching variants (three causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by matching Top and Stable PICS variants, non-matching Top PICS variants, and non-matching Stable PICS variants, in simulations involving three causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter (increasing SNR from left to right).
Distribution of the number of variants recovered by SuSiE matching and non-matching variants (three causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by matching Top and Stable SuSiE variants, non-matching Top SuSiE variants, and non-matching Stable SuSiE variants, in simulations involving three causal variants. Empirical distributions are stratified by simulations differing in the signal-to-noise ratio (SNR) parameter (increasing SNR from left to right).
Plain PICS vs Stable PICS in environmental heterogeneity simulations (one causal variant).
Empirical discrete probability distributions over the number of causal variants (0 or 1) recovered by Plain or Stable PICS in simulations with one causal variant and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Plain PICS vs Stable PICS in environmental heterogeneity simulations (two causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, or 2) recovered by Plain or Stable PICS in simulations with two causal variants and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a greater number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Plain PICS vs Stable PICS in environmental heterogeneity simulations (three causal variants).
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in simulations with three causal variants and environmental heterogeneity, stratified by the SNR parameter used in simulations (increasing SNR from left to right). The impact of including a larger number of potential sets on the distribution is shown (increasing number of included sets from top to bottom).
Plain PICS vs Stable PICS in ‘variance shift (t = 8)’ environmental heterogeneity simulations.
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t = 8’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.
Plain PICS vs Stable PICS in ‘variance shift (t = 16)’ environmental heterogeneity simulations.
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t = 16’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.
Plain PICS vs Stable PICS in ‘variance shift (t = 128)’ environmental heterogeneity simulations.
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t = 128’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.
Plain PICS vs Stable PICS in ‘variance shift (t = 256)’ environmental heterogeneity simulations.
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘t = 256’ simulations involving environmental heterogeneity (variance shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.
Plain PICS vs Stable PICS in ‘mean shift (|i − 3|)’ environmental heterogeneity simulations.
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘|i − 3|’ simulations involving environmental heterogeneity (mean shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.
Plain PICS vs Stable PICS in ‘mean shift (i = 3)’ environmental heterogeneity simulations.
Empirical discrete probability distributions over the number of causal variants (0, 1, 2, or 3) recovered by Plain or Stable PICS in ‘i = 3’ simulations involving environmental heterogeneity (mean shift scenario), stratified by the SNR parameter used in simulations (increasing SNR from left to right). Each row reports the distribution for simulations with a specific number of causal variants (1, 2, or 3), and we use all three potential sets to compute the number of causal variants recovered in each case.
Venn diagram showing the number of matching and non-matching variants for Potential Set 1 in GEUVADIS fine-mapped variants.
Matching GEUVADIS Top vs Stable SNP posterior probabilities.
Pair density plot of posterior probabilities of the top variant and the stable variant, for cases in which they match.
Non-matching GEUVADIS Top vs Stable SNP posterior probabilities.
Pair density plot of posterior probabilities of the top variant and the stable variant, for cases in which they do not match.
Distribution of computational VEP scores across matching and non-matching variants.
Top row. CADD scores. (A) Empirical cumulative distribution functions of raw CADD scores of matching and non-matching variants across all genes, for Potential Set 1. Non-matching variants are further divided into stable and top variants, with a score lower threshold of 1.0 and upper threshold of 5.0 used to improve visualization. (B) For a deleteriousness cutoff, the percent of (1) all matching variants, (2) all non-matching top variants, and (3) all non-matching stable variants, which are classified as deleterious. We use a sliding cutoff threshold ranging from 10 to 20 as recommended by CADD authors. For each value along the x-axis, 95% confidence intervals for point estimates on the y-axis were obtained using the Sison-Glaz method for constructing multinomial distribution standard errors (R command DescTools::MultinomCI(...)). Bottom row. Empirical cumulative distribution functions of perturbation scores of Enformer-predicted H3K27me3 ChIP-seq track. Score upper threshold of 0.015 and empirical CDF lower threshold of 0.5 used to improve visualization. (C) Perturbation scores computed from predictions based on centering input sequences on the gene TSS as well as its two flanking positions. (D) Perturbation scores computed from predictions based on centering input sequences on the gene TSS only.
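The sliding-cutoff classification in panel (B) can be sketched as follows. This is a minimal illustration, assuming a variant is called deleterious when its score is at least the cutoff (the inclusive inequality is an assumption) and that the scores are on the scale to which the 10 to 20 cutoffs apply. The function name and the toy scores are hypothetical, and the sketch does not reproduce the Sison-Glaz intervals.

```python
def percent_deleterious(scores, cutoffs=range(10, 21)):
    """Percent of variants with score >= cutoff, for each integer cutoff from 10 to 20."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return {c: 100.0 * sum(s >= c for s in scores) / len(scores) for c in cutoffs}

# Toy scores for the three variant groups compared in the figure (illustrative only).
toy_scores = {
    "matching": [3.2, 11.5, 15.0, 22.1],
    "non_matching_top": [1.0, 9.9, 12.4, 18.7],
    "non_matching_stable": [0.5, 10.0, 16.3, 25.0],
}

# One curve per group: cutoff on the x-axis, percent deleterious on the y-axis.
curves = {group: percent_deleterious(vals) for group, vals in toy_scores.items()}
```

Each curve is non-increasing in the cutoff, so differences between groups can be read off at any threshold in the recommended range.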
Comparison of CADD scores across non-matching top and stable variants.
(A) Paired scatterplot of raw CADD scores of the top and stable variants for each gene, for Potential Set 1. (B) Percent of genes classified as (1) having a deleterious top variant only, (2) having a deleterious stable variant only, and (3) having both top and stable variants deleterious, using a sliding cutoff threshold ranging from 10 to 20 as recommended by CADD authors.
Visual summary of the PICS algorithm described in Probabilistic Identification of Causal SNPs.
(A) Breakdown of the calculation of the probability of a focal SNP being causal. (B) Illustration of the permutation procedure used to generate the null distribution. An example genotype array is used, with two valid row shuffles, or permutations, of the original array shown. Entries affected by the shuffle are highlighted, as is the focal SNP.
Tables
A list of 378 functional annotations across which the biological significances of stable and top fine-mapped single nucleotide polymorphisms are compared.
Annotations that report multiple scores have the total number of scores reported shown in parentheses. Scores mined from the FAVOR database (Zhou et al., 2023) are indicated by an asterisk (TSS = transcription start site, bp = base pair).
| Functional annotation type | Functional annotation |
|---|---|
| Ensembl | Distance to Canonical TSS (Cunningham et al., 2022) |
| | Regulatory Features (6; Cunningham et al., 2022) |
| Computational predictions | CADD∗ (2; Rentzsch et al., 2019) |
| | SIFTVal∗ (Ng and Henikoff, 2003) |
| | FATHMM-XF∗ (Rogers et al., 2018) |
| | LINSIGHT∗ (Huang et al., 2017) |
| | Polyphen∗ (Adzhubei et al., 2010) |
| | PhyloP∗ (3; Pollard et al., 2010) |
| | Gerp∗ (2; Davydov et al., 2010) |
| | B Statistic∗ (McVicker et al., 2009) |
| | FunSeq2∗ (Fu et al., 2014) |
| | ALoFT∗ (Balasubramanian et al., 2017) |
| | Percent CpG in 75 bp window∗ (Rentzsch et al., 2019) |
| | Percent GC in 75 bp window∗ (Rentzsch et al., 2019) |
| | FIRE (Ioannidis et al., 2017) |
| | Enformer (177 tracks × 2 scores per track; Avsec et al., 2021) |
List of six moderating factors considered.
| Moderator | Quantity/statistic computed |
|---|---|
| (1) Degree of Stability | No. subpopulations for which stable variant has positive probability |
| (2) Population Diversity | Maximum of pairwise allele frequency difference between subpopulations for which stable variant has positive posterior probability |
| (3) Population Differentiation | Maximum F_ST between subpopulations for which stable variant has positive posterior probability |
| (4) Inclusion of Distal Subpopulations (Top) | Whether or not the top variant also had positive probability in Yoruban subpopulation when the stability-guided approach was used |
| (5) Inclusion of Distal Subpopulations (Stable) | Whether or not the stable variant had positive probability in Yoruban subpopulation when the stability-guided approach was used |
| (6) Degree of Certainty of Causality Using Residualization Approach | Posterior probability of top variant |
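Moderators (1) and (2) can be computed along the following lines. This is a sketch only: the dictionary layout, function names, allele frequencies, and posterior probabilities are hypothetical, and returning 0.0 when fewer than two subpopulations qualify is an assumption rather than the study's convention.

```python
from itertools import combinations

def supported_subpopulations(posterior_probs):
    """Subpopulations in which the stable variant has positive posterior probability."""
    return [pop for pop, pp in posterior_probs.items() if pp > 0]

def degree_of_stability(posterior_probs):
    """Moderator (1): number of subpopulations with positive probability."""
    return len(supported_subpopulations(posterior_probs))

def max_pairwise_af_difference(allele_freqs, posterior_probs):
    """Moderator (2): largest allele frequency gap among supported subpopulations."""
    supported = supported_subpopulations(posterior_probs)
    if len(supported) < 2:
        return 0.0  # assumption: no pair exists, so report no difference
    return max(
        abs(allele_freqs[a] - allele_freqs[b]) for a, b in combinations(supported, 2)
    )

# Toy values for one stable variant (illustrative only).
toy_af = {"YRI": 0.10, "CEU": 0.32, "FIN": 0.28, "GBR": 0.30, "TSI": 0.35}
toy_pp = {"YRI": 0.0, "CEU": 0.4, "FIN": 0.2, "GBR": 0.0, "TSI": 0.1}

stability = degree_of_stability(toy_pp)
max_diff = max_pairwise_af_difference(toy_af, toy_pp)
```

In this toy case YRI is excluded because its posterior probability is zero, so the large YRI allele frequency gap does not enter the maximum.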
Plain and Stable PICS matching frequencies.
The table below reports the frequencies with which Plain and Stable PICS have matching variants for the same potential set. The numbers of matching variants for each SNR scenario are reported in parentheses. The bottom two rows show matching frequencies when results are stratified by posterior probability (PP) of the Plain PICS variant. The numbers of matching variants for each PP stratum are reported in parentheses.
| | Potential Set 1 | Potential Set 2 | Potential Set 3 |
|---|---|---|---|
| Stratified by signal-to-noise ratio (SNR) of simulations | | | |
| SNR = 0.053 | 0.736 (265) | 0.803 (289) | 0.797 (287) |
| SNR = 0.111 | 0.775 (279) | 0.753 (271) | 0.758 (273) |
| SNR = 0.25 | 0.903 (325) | 0.714 (257) | 0.728 (262) |
| SNR = 0.667 | 0.906 (326) | 0.753 (271) | 0.744 (268) |
| Stratified by posterior probability (PP) of Plain PICS variant | | | |
| p > 0.9 | 0.978 (441) | 0.899 (286) | 0.927 (307) |
| p ≤ 0.9 | 0.762 (754) | 0.715 (802) | 0.706 (783) |
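Each entry pairs a matching frequency with its count, so the two can be cross-checked. The sketch below shows how such a frequency might be computed and verifies the Potential Set 1 column, assuming the 1440 simulations are split evenly across the four SNR values (360 per row); that split is inferred from the reported numbers rather than stated in the table. The function name and the toy variant identifiers are hypothetical.

```python
def matching_frequency(plain_variants, stable_variants):
    """Fraction (and count) of simulations in which the two approaches agree."""
    if len(plain_variants) != len(stable_variants):
        raise ValueError("both approaches must cover the same simulations")
    matches = sum(p == s for p, s in zip(plain_variants, stable_variants))
    return matches / len(plain_variants), matches

# Toy example: the approaches agree in three of four simulations.
toy_freq, toy_count = matching_frequency(
    ["rs1", "rs2", "rs3", "rs4"], ["rs1", "rs9", "rs3", "rs4"]
)

# Cross-check of the reported Potential Set 1 entries.
SIMS_PER_SNR = 1440 // 4  # assumption: equal split across SNR values
reported_set1 = {0.053: (0.736, 265), 0.111: (0.775, 279),
                 0.25: (0.903, 325), 0.667: (0.906, 326)}
consistent = all(
    round(n_match / SIMS_PER_SNR, 3) == frequency
    for frequency, n_match in reported_set1.values()
)
```

Under the even-split assumption every reported frequency equals its count divided by 360, to three decimal places.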
Stable and Top PICS matching frequencies.
The table below reports the frequencies with which Stable and Top PICS have matching variants for the same potential set. The numbers of matching variants for each SNR/‘No. Causal Variants’ scenario are reported in parentheses.
| No. Causal Variants (S) | SNR | Potential Set 1 | Potential Set 2 | Potential Set 3 |
|---|---|---|---|---|
| One causal variant (S = 1) | SNR = 0.053 | 0.695 (139) | 0.41 (82) | 0.22 (44) |
| | SNR = 0.111 | 0.75 (150) | 0.405 (81) | 0.24 (48) |
| | SNR = 0.25 | 0.825 (165) | 0.45 (90) | 0.26 (52) |
| | SNR = 0.667 | 0.895 (179) | 0.405 (81) | 0.275 (55) |
| Two causal variants (S = 2) | SNR = 0.053 | 0.545 (109) | 0.36 (72) | 0.225 (45) |
| | SNR = 0.111 | 0.68 (136) | 0.38 (76) | 0.215 (43) |
| | SNR = 0.25 | 0.79 (158) | 0.435 (87) | 0.27 (54) |
| | SNR = 0.667 | 0.78 (156) | 0.41 (82) | 0.26 (52) |
| Three causal variants (S = 3) | SNR = 0.053 | 0.565 (113) | 0.37 (74) | 0.245 (49) |
| | SNR = 0.111 | 0.655 (131) | 0.36 (72) | 0.265 (53) |
| | SNR = 0.25 | 0.72 (144) | 0.39 (78) | 0.22 (44) |
| | SNR = 0.667 | 0.785 (157) | 0.48 (96) | 0.255 (51) |
Stable and Top SuSiE matching frequencies.
The table below reports the frequencies with which Stable and Top SuSiE have matching variants for the same credible set. The numbers of matching variants for each SNR/‘No. Causal Variants’ scenario are reported in parentheses.
| No. Causal Variants (S) | SNR | Credible Set 1 | Credible Set 2 | Credible Set 3 |
|---|---|---|---|---|
| One causal variant (S = 1) | SNR = 0.053 | 0.55 (110) | 0.115 (23) | 0.095 (19) |
| | SNR = 0.111 | 0.725 (145) | 0.14 (28) | 0.095 (19) |
| | SNR = 0.25 | 0.84 (168) | 0.145 (29) | 0.17 (34) |
| | SNR = 0.667 | 0.875 (175) | 0.14 (28) | 0.19 (38) |
| Two causal variants (S = 2) | SNR = 0.053 | 0.505 (101) | 0.095 (19) | 0.065 (13) |
| | SNR = 0.111 | 0.73 (146) | 0.14 (28) | 0.095 (19) |
| | SNR = 0.25 | 0.875 (175) | 0.33 (66) | 0.105 (21) |
| | SNR = 0.667 | 0.875 (175) | 0.425 (85) | 0.115 (23) |
| Three causal variants (S = 3) | SNR = 0.053 | 0.345 (69) | 0.075 (15) | 0.085 (17) |
| | SNR = 0.111 | 0.68 (136) | 0.23 (46) | 0.145 (29) |
| | SNR = 0.25 | 0.795 (159) | 0.37 (74) | 0.185 (37) |
| | SNR = 0.667 | 0.85 (170) | 0.585 (117) | 0.25 (50) |
Off-diagonal matching frequencies and causal variant recovery.
The table below reports the number of Stable and Top PICS non-matching variants that match across different, or ‘off-diagonal’, potential sets. Counts are computed across simulations with the same number of causal variants (S = 1, 2, or 3), with numbers along the diagonal reporting the number of non-matching variants between the same potential sets. Each off-diagonal element reports both the number of matching variants for the pair of potential sets listed and the percentage of these matches that also correspond to a causal variant.
| Simulations | Stable PICS potential set | Top PICS Potential Set 1 | Top PICS Potential Set 2 | Top PICS Potential Set 3 |
|---|---|---|---|---|
| One causal variant | Potential Set 1 | 167 | 5 (60%) | 3 (67%) |
| | Potential Set 2 | 4 (25%) | 466 | 104 (0.96%) |
| | Potential Set 3 | 0 | 94 (0%) | 601 |
| Two causal variants | Potential Set 1 | 241 | 24 (46%) | 5 (60%) |
| | Potential Set 2 | 29 (52%) | 483 | 88 (10%) |
| | Potential Set 3 | 9 (11%) | 84 (13%) | 606 |
| Three causal variants | Potential Set 1 | 255 | 29 (66%) | 7 (14%) |
| | Potential Set 2 | 30 (40%) | 480 | 79 (18%) |
| | Potential Set 3 | 4 (0%) | 65 (12%) | 603 |
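A cross-set tally of this kind could be built as sketched below. The sketch assumes each approach reports one variant per potential set in each simulation, which is a simplification introduced here; the data layout, function name, and variant labels are hypothetical.

```python
def off_diagonal_table(simulations, n_sets=3):
    """Tally cross-set matches between Stable (rows) and Top (columns) variants.

    Diagonal cells count simulations in which the two approaches disagree for
    the same set. Off-diagonal cells count matches across different sets, and
    causal_hits records how many of those matches are causal variants.
    """
    counts = [[0] * n_sets for _ in range(n_sets)]
    causal_hits = [[0] * n_sets for _ in range(n_sets)]
    for sim in simulations:
        stable, top, causal = sim["stable"], sim["top"], sim["causal"]
        for i in range(n_sets):
            for j in range(n_sets):
                if i == j:
                    counts[i][i] += int(stable[i] != top[i])
                elif stable[i] == top[j]:
                    counts[i][j] += 1
                    causal_hits[i][j] += int(stable[i] in causal)
    return counts, causal_hits

# Toy simulations with one causal variant each (illustrative only).
toy = [
    {"stable": ["a", "b", "c"], "top": ["a", "c", "b"], "causal": {"a"}},
    {"stable": ["x", "y", "z"], "top": ["y", "x", "z"], "causal": {"x"}},
]
counts, causal_hits = off_diagonal_table(toy)
```

Dividing each off-diagonal entry of `causal_hits` by the corresponding entry of `counts` gives the percentage shown in parentheses in the table.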
List of matching variants with low stable posterior probability.
The table below summarizes the genes and potential sets for which Stable and Top PICS returned matching variants, along with SNP-level and fine-mapping features for interpretation. Five statistics are reported: posterior probability of the stable variant (Stable PP); posterior probability of the top variant (Top PP); posterior probability support size, defined as the number of variants with positive probability (Support Size); the number of ancestry slices, including the ALL slice, for which the stable variant had positive posterior probability from running Stable PICS (Number of Slices); and the maximum difference in allele frequency between any pair of subpopulations among YRI, TSI, GBR, FIN, and CEU (Max AF Difference).
| Potential Set | Gene | Matching variant | Stable PP | Top PP | Support size | Number of slices | Max AF Difference |
|---|---|---|---|---|---|---|---|
| 1 | ENSG00000134762.11 | rs61731921 | 0.0028 | 0.76 | 23 | 4 | 0.22 |
| 1 | ENSG00000197847.8 | rs7130955 | 0.0075 | 0.23 | 45 | 3 | 0.18 |
| 1 | ENSG00000255284.1 | rs12224894 | 0.0067 | 0.65 | 23 | 6 | 0.14 |
| 1 | ENSG00000104442.5 | rs6995242 | 0.0092 | 0.31 | 42 | 4 | 0.34 |
| 1 | ENSG00000146733.9 | rs10239528 | 0.0031 | 0.53 | 27 | 5 | 0.24 |
| 1 | ENSG00000248468.1 | rs9853505 | 0.0099 | 0.29 | 39 | 3 | 0.43 |
| 1 | ENSG00000122224.10 | rs57449 | 0.0089 | 0.50 | 25 | 4 | 0.31 |
| 1 | ENSG00000134262.8 | rs17464525 | 0.0030 | 0.45 | 23 | 4 | 0.15 |
| 2 | ENSG00000216522.3 | rs5751902 | 0.0052 | 1 | 7 | 3 | 0.16 |
| 2 | ENSG00000108592.9 | rs9912201 | 0.0022 | 0.32 | 32 | 6 | 0.27 |
| 2 | ENSG00000134551.7 | rs7315843 | 0.0019 | 0.58 | 10 | 5 | 0.22 |
| 2 | ENSG00000221947.3 | rs3103860 | 0.0018 | 0.99 | 4 | 4 | 0.084 |
| 2 | ENSG00000081791.4 | rs2270113 | 0.0059 | 0.77 | 11 | 4 | 0.33 |
| 3 | ENSG00000140368.8 | rs62027296 | 0.0069 | 0.29 | 21 | 4 | 0.15 |
| 3 | ENSG00000254614.1 | rs625750 | 0.0017 | 0.64 | 4 | 3 | 0.22 |
| 3 | ENSG00000133835.9 | rs2451818 | 0.0036 | 0.40 | 32 | 3 | 0.41 |
| 3 | ENSG00000158234.8 | rs693293 | 0.0069 | 0.55 | 24 | 4 | 0.13 |
List of variant annotations with interpretations.
| Functional annotation | Interpretation |
|---|---|
| Distance to Canonical Transcription Start Site (TSS) | - |
| Percent CpG in 75 bp window centered on variant position | - |
| Percent GC in 75 bp window centered on variant position | - |
| CTCF Binding Enrichment | Whether the variant lies within a CTCF binding site region as predicted by Ensembl |
| Enhancer Enrichment | Whether the variant lies within an enhancer region as predicted by Ensembl |
| Open Chromatin Enrichment | Whether the variant lies within an open chromatin region as predicted by Ensembl |
| Promoter Enrichment | Whether the variant lies within a promoter region as predicted by Ensembl |
| TF Binding Enrichment | Whether the variant lies within a TF-binding site region as predicted by Ensembl |
| Promoter Flanking Enrichment | Whether the variant lies within a promoter flanking region as predicted by Ensembl |
| CADD (2 scores) | Whether the variant is likely to be simulated or not, and hence likely deleterious or not. One score is raw while the other is rank-normalized |
| SIFTVal | Whether the variant affects protein function, and hence deleterious |
| Polyphen2 | Posterior probability that the variant is damaging |
| LINSIGHT | Probability that the variant site is under selection, thus having functional consequence |
| PhyloP (3 scores) | Substitution rates measuring cross-species evolutionary conservation at the site of the variant. Each score is computed with respect to a clade (vertebrate, mammal, primate) |
| GerpN | Estimated neutral substitution rate at variant position, with higher value implying greater conservation |
| GerpS | Estimated rejected substitution rate at variant position, with positive value implying a deficit in substitutions |
| B Statistic | Background selection at variant position, with smaller value indicating larger impact of selection |
| FATHMM-XF | Integrative score measuring deleteriousness of the variant |
| Funseq2 | Integrative score measuring deleteriousness of the variant |
| ALoft | Integrative score measuring loss of function associated with the variant |
| FIRE | Integrative score measuring deleteriousness of the variant |
| Magnitude of Effect on Enformer Track Prediction (177 tracks) | Change in prediction of a gene regulatory track when performing in silico mutagenesis on the variant in a 196,608 bp sequence |
| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| -0.999 | -0.342 | -0.107 | -0.117 | 0.067 | 0.949 |
| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| -1.000 | -0.334 | -0.0741 | -0.0778 | 0.162 | 0.976 |
| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| -0.998 | -0.327 | -0.0564 | -0.0629 | 0.191 | 0.968 |
| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| -0.994 | -0.266 | -0.00645 | -0.109 | 0.049 | 0.924 |
| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| -0.998 | -0.376 | -0.0944 | -0.121 | 0.115 | 0.903 |
| Min | 1st Qu. | Median | Mean | 3rd Qu. | Max |
|---|---|---|---|---|---|
| -0.998 | -0.389 | -0.100 | -0.114 | 0.146 | 0.915 |