Discovering non-additive heritability using additive GWAS summary statistics

  1. Samuel Pattillo Smith
  2. Gregory Darnell
  3. Dana Udwin
  4. Julian Stamp
  5. Arbel Harpak
  6. Sohini Ramachandran
  7. Lorin Crawford (corresponding author)
  1. Center for Computational Molecular Biology, Brown University, United States
  2. Department of Ecology and Evolutionary Biology, Brown University, United States
  3. Department of Integrative Biology, The University of Texas at Austin, United States
  4. Department of Population Health, The University of Texas at Austin, United States
  5. Institute for Computational and Experimental Research in Mathematics, Brown University, United States
  6. Department of Biostatistics, Brown University, United States
  7. Data Science Institute, Brown University, United States
  8. Microsoft, United States
4 figures, 2 tables and 6 additional files

Figures

Figure 1 with 5 supplements
Power of the i-LDSC framework to detect tagged pairwise genetic interaction effects on simulated data.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 1%, 5%, and 10% of the total number of SNPs genome-wide (see the x-axis in each panel). These interact with the group #2 SNPs, which are selected to be variants within a ± 10 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with no minor allele frequency dependency, α=0 (see Materials and methods). Panels (A) and (B) are results of simulations using a heritability H2=0.3, while panels (C) and (D) were generated with H2=0.6. We also varied the proportion of heritability contributed by additive effects to (A, C) ρ=0.5 and (B, D) ρ=0.8, respectively. Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. Results are based on 100 simulations per parameter combination and the horizontal bars represent standard errors. Generally, the performance of i-LDSC increases with larger heritability and lower proportions of additive variation. Note that LDSC is not shown here because it does not search for tagged interaction effects in summary statistics.
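To make the roles of H2 and ρ concrete, the following is a minimal sketch of a generative model of this kind, not the authors' simulation code. It assumes standardized genotypes, Gaussian effect sizes, and variance components rescaled so that additive effects explain H2·ρ and pairwise interactions explain H2·(1-ρ) of the trait variance. The function name `simulate_trait`, the `pairs` argument, the toy binomial genotypes, and the use of adjacent-index pairs in place of a ± 10 kb window are all illustrative assumptions.

```python
import numpy as np

def simulate_trait(X, pairs, H2=0.3, rho=0.5, seed=0):
    """Simulate y = additive + pairwise interaction + noise.

    X     : (n, p) standardized genotype matrix
    pairs : list of (j, k) column-index pairs of interacting SNPs (hypothetical input)
    H2    : share of trait variance explained by all genetic effects
    rho   : share of H2 contributed by additive effects
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Additive component: every SNP has an effect; rescale to variance H2 * rho.
    g_add = X @ rng.standard_normal(p)
    g_add *= np.sqrt(H2 * rho / g_add.var())

    # Interaction component: element-wise products of the paired SNPs,
    # centered and rescaled to variance H2 * (1 - rho).
    W = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    g_int = W @ rng.standard_normal(W.shape[1])
    g_int -= g_int.mean()
    g_int *= np.sqrt(H2 * (1 - rho) / g_int.var())

    # Environmental noise takes up the remaining variance 1 - H2.
    eps = rng.standard_normal(n)
    eps *= np.sqrt((1 - H2) / eps.var())
    return g_add + g_int + eps, g_add, g_int

# Toy genotypes (an assumption; the captions use real UK Biobank data).
toy_rng = np.random.default_rng(1)
G = toy_rng.binomial(2, 0.3, size=(400, 30)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)
pairs = [(j, j + 1) for j in range(10)]  # stand-in for a cis window
y, g_add, g_int = simulate_trait(X, pairs, H2=0.6, rho=0.5)
```

With H2=0.6 and ρ=0.5, each of the two genetic components is scaled to a variance of 0.3. The three components are not forced to be uncorrelated, so the variance of `y` is only approximately one.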

Figure 1—figure supplement 1
Power calculations for the i-LDSC framework to detect tagged pairwise genetic interaction effects on simulated data using a ± 10 kilobase (kb) window to generate cis-interactions around a focal SNP with a moderate minor allele frequency dependency α=-0.5 for effect sizes.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 1%, 5%, and 10% of the total number of SNPs genome-wide (see the x-axis in each panel). These interact with the group #2 SNPs, which are selected to be variants within a ± 10 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with minor allele frequency dependency α=-0.5 (see Materials and methods). Panels (A) and (B) are results of simulations where the total heritability explained by additive SNP effects and cis-interaction effects is H2=0.3, while panels (C) and (D) were generated with H2=0.6. We also varied the proportion of genetic variance explained by additive SNP effects to (A, C) ρ=0.5 and (B, D) ρ=0.8, respectively. Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. Results are based on 100 simulations per parameter combination and the horizontal black bars represent standard errors.
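The caption defers the definition of α to Materials and methods, which is not reproduced here. As a hedged illustration only, the sketch below uses one common parameterization of minor allele frequency (MAF) dependence, in which the effect-size variance of a SNP with MAF f is proportional to [2f(1-f)]^α. Both this parameterization and the function name `maf_dependent_effects` are assumptions rather than the authors' stated model.

```python
import numpy as np

def maf_dependent_effects(maf, alpha=-0.5, seed=0):
    """Draw Gaussian effect sizes with variance [2 f (1 - f)]^alpha.

    alpha = 0 gives no MAF dependence; negative alpha gives rarer
    variants larger effects on average. (Assumed parameterization.)
    """
    rng = np.random.default_rng(seed)
    maf = np.asarray(maf, dtype=float)
    sd = (2.0 * maf * (1.0 - maf)) ** (alpha / 2.0)
    return rng.standard_normal(maf.size) * sd

maf = np.array([0.1, 0.3])
flat = maf_dependent_effects(maf, alpha=0.0, seed=1)     # no dependence
strong = maf_dependent_effects(maf, alpha=-1.0, seed=1)  # strong dependence
```

Under this assumed form, α=0 leaves every SNP with unit effect-size variance, while α=-1 inflates the scale of the MAF 0.1 SNP more than that of the MAF 0.3 SNP.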

Figure 1—figure supplement 2
Power calculations for the i-LDSC framework to detect tagged pairwise genetic interaction effects on simulated data using a ± 10 kilobase (kb) window to generate cis-interactions around a focal SNP with a strong minor allele frequency dependency α=-1 for effect sizes.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 1%, 5%, and 10% of the total number of SNPs genome-wide (see the x-axis in each panel). These interact with the group #2 SNPs, which are selected to be variants within a ± 10 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with minor allele frequency dependency α=-1 (see Materials and methods). Panels (A) and (B) are results of simulations where the total heritability explained by additive SNP effects and cis-interaction effects is H2=0.3, while panels (C) and (D) were generated with H2=0.6. We also varied the proportion of genetic variance explained by additive SNP effects to (A, C) ρ=0.5 and (B, D) ρ=0.8, respectively. Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. Results are based on 100 simulations per parameter combination and the horizontal black bars represent standard errors.

Figure 1—figure supplement 3
Power calculations for the i-LDSC framework to detect tagged pairwise genetic interaction effects on simulated data using a ± 10 kilobase (kb) window to generate cis-interactions around a focal SNP with no minor allele frequency dependency α=0 for effect sizes.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 1%, 5%, and 10% of the total number of SNPs genome-wide (see the x-axis in each panel). These interact with the group #2 SNPs, which are selected to be variants within a ± 10 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with no minor allele frequency dependency, α=0 (see Materials and methods). Panels (A) and (B) are results of simulations where the total heritability explained by additive SNP effects and cis-interaction effects is H2=0.3, while panels (C) and (D) were generated with H2=0.6. We also varied the proportion of genetic variance explained by additive SNP effects to (A, C) ρ=0.5 and (B, D) ρ=0.8, respectively. Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. Results are based on 100 simulations per parameter combination and the horizontal black bars represent standard errors.

Figure 1—figure supplement 4
Power calculations for the i-LDSC framework to detect tagged pairwise genetic interaction effects on simulated data using a ± 100 kilobase (kb) window to generate cis-interactions around a focal SNP with a moderate minor allele frequency dependency α=-0.5 for effect sizes.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 1%, 5%, and 10% of the total number of SNPs genome-wide (see the x-axis in each panel). These interact with the group #2 SNPs, which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with minor allele frequency dependency α=-0.5 (see Materials and methods). Panels (A) and (B) are results of simulations where the total heritability explained by additive SNP effects and cis-interaction effects is H2=0.3, while panels (C) and (D) were generated with H2=0.6. We also varied the proportion of genetic variance explained by additive SNP effects to (A, C) ρ=0.5 and (B, D) ρ=0.8, respectively. Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. Results are based on 100 simulations per parameter combination and the horizontal black bars represent standard errors.

Figure 1—figure supplement 5
Power calculations for the i-LDSC framework to detect tagged pairwise genetic interaction effects on simulated data using a ± 100 kilobase (kb) window to generate cis-interactions around a focal SNP with a strong minor allele frequency dependency α=-1 for effect sizes.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 1%, 5%, and 10% of the total number of SNPs genome-wide (see the x-axis in each panel). These interact with the group #2 SNPs, which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with minor allele frequency dependency α=-1 (see Materials and methods). Panels (A) and (B) are results of simulations where the total heritability explained by additive SNP effects and cis-interaction effects is H2=0.3, while panels (C) and (D) were generated with H2=0.6. We also varied the proportion of genetic variance explained by additive SNP effects to (A, C) ρ=0.5 and (B, D) ρ=0.8, respectively. Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. Results are based on 100 simulations per parameter combination and the horizontal black bars represent standard errors.

Figure 2 with 2 supplements
The i-LDSC framework is well-calibrated under the null hypothesis and does not identify evidence of tagged non-additive effects when polygenic traits are generated by only additive effects.

In these simulations, synthetic trait architecture is made up of only additive genetic variation (i.e. ρ=1). Coefficients for additive effects were simulated with no minor allele frequency dependency, α=0 (see Materials and methods). Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. (A) Mean type I error rate using the i-LDSC framework across an array of estimation window sizes for the cis-interaction LD scores. This is determined by assessing the p-value of the cis-interaction coefficient (ϑ) in the i-LDSC regression model and checking whether p < 0.05. (B) Estimates of the cis-interaction coefficient (ϑ). Since traits were simulated with only additive effects, these estimates should be centered around zero. (C) Estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) where the true additive variance is set to H2ρ=0.6. (D) QQ-plot of the p-values for the cis-interaction coefficient (ϑ) in i-LDSC. Results are based on 100 simulations per parameter combination and the horizontal bars represent standard errors.
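The type I error rate in panel (A) is the fraction of null simulations whose p-value for ϑ falls below 0.05. A minimal sketch of that calculation follows; the function name `type_one_error` and the binomial standard error are illustrative assumptions, since the caption does not say how its standard errors were computed.

```python
import numpy as np

def type_one_error(p_values, level=0.05):
    """Empirical type I error rate across null simulations.

    Returns the fraction of p-values below `level` together with a
    binomial standard error (an assumed choice of error estimate).
    """
    p = np.asarray(p_values, dtype=float)
    rate = np.mean(p < level)
    se = np.sqrt(rate * (1.0 - rate) / p.size)
    return rate, se

# Four made-up p-values, one of which is below 0.05.
rate, se = type_one_error([0.01, 0.2, 0.5, 0.9])
```

A well-calibrated test should give a rate close to the nominal level of 0.05 when the number of null simulations is large.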

Figure 2—figure supplement 1
The i-LDSC framework is well-calibrated under the null hypothesis and does not identify evidence of tagged non-additive effects when polygenic traits are generated by only additive effects and a moderate minor allele frequency dependency α=-0.5 for effect sizes.

In these simulations, synthetic trait architecture is made up of only additive genetic variation (i.e. ρ=1). Coefficients for additive effects were simulated with minor allele frequency dependency α=-0.5 (see Materials and methods). Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. (A) Mean type I error rate using the i-LDSC framework across an array of estimation window sizes for the cis-interaction LD scores. This is determined by assessing the p-value of the cis-interaction coefficient (ϑ) in the i-LDSC regression model and checking whether p < 0.05. (B) Estimates of the cis-interaction coefficient (ϑ). Since traits were simulated with only additive effects, these estimates should be centered around zero. (C) Estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) where the true additive variance is set to H2ρ=0.6. (D) QQ-plot of the p-values for the cis-interaction coefficient (ϑ) in i-LDSC. Results are based on 100 simulations per parameter combination and the horizontal black bars represent standard errors.

Figure 2—figure supplement 2
The i-LDSC framework is well-calibrated under the null hypothesis and does not identify evidence of tagged non-additive effects when polygenic traits are generated by only additive effects and a strong minor allele frequency dependency α=-1 for effect sizes.

In these simulations, synthetic trait architecture is made up of only additive genetic variation (i.e. ρ=1). Coefficients for additive effects were simulated with minor allele frequency dependency α=-1 (see Materials and methods). Here, we are blind to the parameter settings used in the generative model and run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5 (green), ± 10 (orange), ± 25 (purple), and ± 50 (pink) SNPs. (A) Mean type I error rate using the i-LDSC framework across an array of estimation window sizes for the cis-interaction LD scores. This is determined by assessing the p-value of the cis-interaction coefficient (ϑ) in the i-LDSC regression model and checking whether p < 0.05. (B) Estimates of the cis-interaction coefficient (ϑ). Since traits were simulated with only additive effects, these estimates should be centered around zero. (C) Estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) where the true additive variance is set to H2ρ=0.6. (D) QQ-plot of the p-values for the cis-interaction coefficient (ϑ) in i-LDSC. Results are based on 100 simulations per parameter combination and the horizontal black bars represent standard errors.

Figure 3 with 13 supplements
Compared to LDSC, i-LDSC robustly and accurately estimates the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) in simulated polygenic traits because it accounts for interaction effects tagged in additive GWAS summary statistics.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 10% of the total number of SNPs genome-wide. These interact with the group #2 SNPs, which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with no minor allele frequency dependency, α=0 (see Materials and methods). Here, we assume a heritability of (A) H2=0.3 or (B) H2=0.6 (marked by the black dotted lines, respectively), and we vary the proportion contributed by additive effects with ρ={0.2,0.4,0.6,0.8}. The grey dotted lines represent the total contribution of additive effects in the generative model for the synthetic traits (H2ρ). i-LDSC outperforms LDSC in recovering heritability across each scenario. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 1
i-LDSC robustly and accurately estimates the proportions of phenotypic variance explained (PVE) by genetic effects in polygenic traits by accounting for interaction effects tagged by GWAS summary statistics.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 10% of the total number of SNPs genome-wide. These interact with the group #2 SNPs, which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. Coefficients for additive and cis-interaction effects were simulated with no minor allele frequency dependency, α=0 (see Materials and methods). Here, we assume the total heritability explained by additive SNP and cis-interaction effects to be (A) H2=0.3 or (B) H2=0.6 (marked by the black dotted lines, respectively), and we vary the proportion contributed by additive effects with ρ={0.2,0.4,0.6,0.8}. The grey dotted line represents the total contribution of additive effects in the generative model for the synthetic traits (H2ρ). We run i-LDSC while computing the cis-interaction LD scores using different estimation windows of ± 5, ± 10, ± 25, and ± 50 SNPs, respectively. These results help motivate the selection of scores calculated using a ± 50 SNP window in our empirical analyses. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 2
Performance of LDSC and i-LDSC on simulated polygenic traits with architectures that are determined by additive, cis-interaction, and gene-by-environment (G×E) effects.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 10% of the total number of SNPs genome-wide. These interact with the group #2 SNPs, which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. G×E effects were simulated using an amplification model (Zhu et al., 2023; see Materials and methods) where we split the sample population in half to emulate two subsets of individuals coming from different environments. We randomly draw variant effect sizes for the first environment from a standard Gaussian distribution. Then effect sizes for the second environment are set to be the product of the effect sizes from the first environment with an amplifier w=[1.1,1.2,…,2] (see the x-axis in each panel). Both the cis-interaction and G×E effects were set to explain a quarter of the total phenotypic variation and the remaining half was explained by additive SNP effects. Panels (A) and (B) show estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) from LDSC and i-LDSC, respectively. Panels (C) and (D) show i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e. estimates of ϑ). We assume the total heritability explained by all genetic effects to be (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination.
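The amplification step described in this caption can be sketched as follows. This is an illustrative reading rather than the authors' code: the function name `amplified_genetic_values`, the assignment of the second half of the rows to the second environment, and the random toy genotype matrix are assumptions, and the sketch omits the rescaling that sets each component's share of the phenotypic variance.

```python
import numpy as np

def amplified_genetic_values(X, w, seed=0):
    """Genetic values under a two-environment amplification model.

    Effect sizes in environment 1 are standard Gaussian draws; effect
    sizes in environment 2 are the same draws multiplied by `w`.
    The first half of the rows of X is treated as environment 1.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_env1 = rng.standard_normal(p)
    beta_env2 = w * beta_env1
    in_env2 = np.arange(n) >= n // 2
    g = np.where(in_env2, X @ beta_env2, X @ beta_env1)
    return g, in_env2

# Toy standardized genotypes (an assumption for illustration).
X = np.random.default_rng(1).standard_normal((6, 4))
g_same, _ = amplified_genetic_values(X, w=1.0)   # no amplification
g_amp, in_env2 = amplified_genetic_values(X, w=2.0)
```

Setting w=1 removes the G×E signal entirely, while larger amplifiers scale the genetic values of individuals in the second environment by the same factor.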

Figure 3—figure supplement 3
Performance of LDSC and i-LDSC on simulated polygenic traits with architectures that are determined by additive, cis-interaction, and gene-by-ancestry (G×Ancestry) effects with principal components (PCs) included in the GWAS model to correct for additional structure.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 10% of the total number of SNPs genome-wide. These interact with the group #2 SNPs, which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. G×Ancestry effects were simulated as the product of individual genotypes and the SNP loadings for each of the first 10 PCs (see the x-axis in each panel). Both the cis-interaction and G×Ancestry effects were set to explain a quarter of the total phenotypic variation and the remaining half was explained by additive SNP effects. The proportion of genotypic variance explained by each PC is shown in green. Panels (A) and (B) show estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) from LDSC and i-LDSC, respectively. Panels (C) and (D) show i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e. estimates of ϑ). We assume the total heritability explained by all genetic effects to be (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 4
Performance of LDSC and i-LDSC on simulated polygenic traits with architectures that are determined by additive, cis-interaction, and gene-by-ancestry (G×Ancestry) effects without correcting for the additional structure in the GWAS analysis.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 10% of the total number of SNPs genome-wide. These interact with the group #2 SNPs, which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. G×Ancestry effects were simulated as the product of individual genotypes and the SNP loadings for each of the first 10 PCs (see the x-axis in each panel). Both the cis-interaction and G×Ancestry effects were set to explain a quarter of the total phenotypic variation and the remaining half was explained by additive SNP effects. The proportion of genotypic variance explained by each PC is shown in green. Panels (A) and (B) show estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) from LDSC and i-LDSC, respectively. Panels (C) and (D) show i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e. estimates of ϑ). We assume the total heritability explained by all genetic effects to be (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 5
Performance of LDSC and i-LDSC on simulated polygenic traits with architectures that are determined by only additive and gene-by-environment (G×E) effects.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). G×E effects were simulated using an amplification model (Zhu et al., 2023; see Materials and methods) where we split the sample population in half to emulate two subsets of individuals coming from different environments. We randomly draw variant effect sizes for the first environment from a standard Gaussian distribution. Then effect sizes for the second environment are set to be the product of the effect sizes from the first environment with an amplifier w=[1.1,1.2,…,2] (see the x-axis in each panel). Additive and G×E effects were set to explain half of the phenotypic variation. Note that unlike results depicted in Figure 3—figure supplement 2, there are no cis-interaction effects that affect trait architecture. Here, panels (A) and (B) show estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) from LDSC and i-LDSC, respectively. Panels (C) and (D) show i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e. estimates of ϑ). We assume the total heritability explained by all genetic effects to be (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 6
Performance of LDSC and i-LDSC on simulated polygenic traits with architectures that are determined by only additive and gene-by-ancestry (G×Ancestry) effects with principal components (PCs) included in the GWAS model to correct for additional structure.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). G×Ancestry effects were simulated as the product of individual genotypes and the SNP loadings for each of the first 10 PCs (see the x-axis in each panel). Additive and G×Ancestry effects were set to explain half of the phenotypic variation. The proportion of genotypic variance explained by each PC is shown in green. Note that unlike results depicted in Figure 3—figure supplement 3, there are no cis-interaction effects that affect trait architecture. Here, panels (A) and (B) show estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) from LDSC and i-LDSC, respectively. Panels (C) and (D) show i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e. estimates of ϑ). We assume the total heritability explained by all genetic effects to be (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 7
Performance of LDSC and i-LDSC on simulated polygenic traits with architectures that are determined by only additive and gene-by-ancestry (G×Ancestry) effects without correcting for the additional structure in the GWAS analysis.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e. creating a polygenic trait architecture). G×Ancestry effects were simulated as the product of individual genotypes and the SNP loadings for each of the first 10 PCs (see the x-axis in each panel). Additive and G×Ancestry effects were set to explain half of the phenotypic variation. The proportion of genotypic variance explained by each PC is shown in green. Note that unlike results depicted in Figure 3—figure supplement 4, there are no cis-interaction effects that affect trait architecture. Here, panels (A) and (B) show estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) from LDSC and i-LDSC, respectively. Panels (C) and (D) show i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e. estimates of ϑ). We assume the total heritability explained by all genetic effects to be (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 8
Performance of LDSC and i-LDSC on simulated traits with sparse architectures that are determined by only additive effects.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. Here, traits were generated with solely additive effects where only variants with the top or bottom {1,5,10,25,50,100} percentile of LD scores were given nonzero coefficients in the generative model (see the x-axis in each panel). Panels (A) and (B) show estimates of the proportions of phenotypic variance explained (PVE) by genetic effects (i.e. estimated heritability) from LDSC and i-LDSC, respectively. Panels (C) and (D) show i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e. estimates of ϑ). We assume the total heritability explained by all genetic effects to be (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination. The overall takeaway is that breaking the assumed relationship between LD scores and chi-squared test statistics (i.e. that they are generally positively correlated) led to unbounded estimates of heritability for both LDSC and i-LDSC in all but the (polygenic) scenario when 100% of SNPs contributed to phenotypic variation.

Figure 3—figure supplement 9
The non-additive component estimates in i-LDSC are robust to unobserved additive effects in a haplotype.

Synthetic trait architectures are simulated such that a substantial proportion of genetic variance is explained by an additive effect that is not directly observed. The goal of these simulations was to assess how these unobserved effects influence the estimation of the non-additive variance component in the i-LDSC model. In each simulation, we generated haplotypes that each contain 5000 variants. Next, we select either (A, B) a single causal variant with only an additive effect or (C, D) a set of ten causal variants with only additive effects. In each case, the causal variants have a MAF that is randomly selected between: (i) (0.01, 0.1), (ii) (0.1, 0.2), (iii) (0.2, 0.3), (iv) (0.3, 0.4), or (v) (0.4, 0.5) as depicted on the x-axis. The corresponding additive effect size for each causal variant across the haplotypes is simulated to be inversely proportional to its MAF (Schoech et al., 2019). On the y-axis, we measure the difference (Δ) between i-LDSC coefficient estimates when every variant is included in the model versus when the haplotype causal variants are omitted for two different trait architectures with broad-sense heritability set to (A, C) H2=0.6 and (B, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 10
The i-LDSC framework protects against the false discovery of non-additive genetic variance when causal interacting SNPs are unobserved and the proportion of genetic variance explained by additive effects is equal to ρ=0.5.

Synthetic trait architectures are simulated such that a substantial proportion of genetic variance is explained by pairwise genetic interaction effects that are not directly observed. The goal of these simulations was to assess how these unobserved effects influence the estimation of the non-additive variance component in the i-LDSC model. In each simulation, we generated haplotypes that each contain 5000 variants. Every SNP in the genome had at least a small additive effect. The corresponding additive effect size for each variant across the haplotypes is simulated to be inversely proportional to its MAF (Schoech et al., 2019). We then set (A, C) 1% or (B, D) 5% of causal variants in each haplotype to have non-zero interaction effects. On the y-axis, we measure the difference (Δ) between i-LDSC coefficient estimates when every variant is included in the model versus when the specified percentage of variants with pairwise genetic interaction effects are omitted for two different trait architectures with broad-sense heritability set to (A, B) H2=0.6 and (C, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 11
The i-LDSC framework protects against the false discovery of non-additive genetic variance when causal interacting SNPs are unobserved and the proportion of genetic variance explained by additive effects is equal to ρ=0.8.

Synthetic trait architectures are simulated such that a substantial proportion of genetic variance is explained by pairwise genetic interaction effects that are not directly observed. The goal of these simulations was to assess how these unobserved effects influence the estimation of the non-additive variance component in the i-LDSC model. In each simulation, we generated haplotypes that each contain 5000 variants. Every SNP in the genome had at least a small additive effect. The corresponding additive effect size for each variant across the haplotypes is simulated to be inversely proportional to its MAF (Schoech et al., 2019). We then set (A, C) 1% or (B, D) 5% of causal variants in each haplotype to have non-zero interaction effects. On the y-axis, we measure the difference (Δ) between i-LDSC coefficient estimates when every variant is included in the model versus when the specified percentage of variants with pairwise genetic interaction effects are omitted for two different trait architectures with broad-sense heritability set to (A, B) H2=0.6 and (C, D) H2=0.3. Results are based on 100 simulations per parameter combination.

Figure 3—figure supplement 12
Bias in LDSC and i-LDSC estimates when the additive and interaction effect sizes in the generative model of complex traits are correlated.

To simulate synthetic trait architectures, we first simulated additive effects for each variant to be MAF-dependent (i.e., α=-1). Here, we set the corresponding interaction effect sizes to have a correlation with the additive effect sizes equal to r={-1,-0.8,-0.6,…,0.6,0.8,1} (labeled across the x-axis). On the y-axis, we measure the bias in the LDSC and i-LDSC estimates of phenotypic variance explained (PVE) by genetic effects. In each simulation, we generate traits with an equal proportion of variance explained by additive and interaction effects and a total broad-sense heritability set to (A) H2=0.6 and (B) H2=0.3. Results are based on 100 simulations for each parameter value.
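
One way to obtain interaction effect sizes with a target correlation r to the additive effect sizes is sketched below. It is an assumption-laden illustration rather than the authors' generative code: the additive effects are standardized to unit variance (not MAF-dependent) so that the mixing formula r·β + sqrt(1 − r²)·ε yields correlation r, and all names and seeds are hypothetical.

```python
import math
import random


def correlated_interaction_effects(additive, r, seed=0):
    """Mix unit-variance additive effects with independent noise to target correlation r."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(max(0.0, 1.0 - r * r))
    return [r * b + noise_scale * rng.gauss(0.0, 1.0) for b in additive]


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Standard-normal additive effects (a simplifying assumption for this sketch).
rng = random.Random(42)
additive = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Check a few of the correlation settings shown on the x-axis.
observed = {
    r: pearson(additive, correlated_interaction_effects(additive, r, seed=7))
    for r in (-1.0, -0.6, 0.6, 1.0)
}
for r, value in observed.items():
    print(r, round(value, 3))
```

At r = ±1 the interaction effects are exact (signed) copies of the additive effects; intermediate values of r blend in independent noise.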

Figure 3—figure supplement 13
Bias in LDSC and i-LDSC estimates when interaction effect sizes in the generative model of complex traits are a linear or squared function of the additive effects.

To simulate synthetic trait architectures, we first simulated additive effects for each variant to be MAF-dependent (i.e., α=-1). Here, we set the corresponding interaction effect sizes to be either (A, C) a linear function or (B, D) a squared function of the additive effects with a scaling factor q={0.1,0.2,…,0.8,1} (labeled across the x-axis). On the y-axis, we measure the bias in the LDSC and i-LDSC estimates of the phenotypic variance explained (PVE) by genetic effects. In each simulation, we generate traits with an equal proportion of variance explained by additive and interaction effects and a total broad-sense heritability set to (A, B) H2=0.6 and (C, D) H2=0.3. Results are based on 100 simulations for each parameter value.
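
The two functional forms in this legend can be written down directly. The sketch below assumes the simplest reading, namely interaction effect = q·β for the linear case and q·β² for the squared case; the function name and the toy inputs are hypothetical, and any later rescaling to match the target variance proportions is omitted.

```python
def interaction_from_additive(additive, q, form="linear"):
    """Map additive effects to interaction effects with scaling factor q.

    form="linear"  -> q * beta      (keeps the sign of the additive effect)
    form="squared" -> q * beta ** 2 (non-negative whenever q > 0)
    """
    if form == "linear":
        return [q * b for b in additive]
    if form == "squared":
        return [q * b * b for b in additive]
    raise ValueError("form must be 'linear' or 'squared'")


toy_additive = [1.0, -2.0]
linear_effects = interaction_from_additive(toy_additive, q=0.5, form="linear")
squared_effects = interaction_from_additive(toy_additive, q=0.5, form="squared")
print(linear_effects, squared_effects)
```

The linear form keeps interaction and additive effects perfectly correlated, while the squared form discards the sign of the additive effect.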

Figure 4 with 1 supplement
The i-LDSC framework recovers heritability and provides estimates of tagged cis-interactions in GWAS summary statistics (ϑ) for 25 quantitative traits in the UK Biobank and BioBank Japan.

(A) In both the UK Biobank (green) and BioBank Japan (purple), estimates of phenotypic variance explained (PVE) by genetic effects from i-LDSC and LDSC are highly correlated for 25 different complex traits. The Spearman correlation coefficients between heritability estimates from LDSC and i-LDSC for the UK Biobank and BioBank Japan are r2=0.989 and r2=0.850, respectively. The y=x dotted line represents the values at which estimates from both approaches are the same. (B) PVE estimates from the UK Biobank are better correlated with those from the BioBank Japan across 25 traits using LDSC (Spearman r2=0.848) than i-LDSC (Spearman r2=0.666). (C) Both the original and stratified LDSC models recover the same amount of PVE when the cis-interaction LD score is included as an additional component in the UK Biobank analysis (Spearman r2=0.989). These models are listed as i-LDSC and s+i-LDSC, respectively. For s+i-LDSC, we included 97 functional annotations from Gazal et al. to estimate heritability. (D) Estimates of non-additive variance components in i-LDSC versus s+i-LDSC (Spearman r2=0.184). While not statistically significant in the stratified analysis with the additional annotations, the non-additive component still makes nonzero contributions to the PVE estimation for all 25 traits in the UK Biobank (see Tables 1 and 2).
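
The rank-correlation comparison in panel (A) can be illustrated with the rounded UK Biobank estimates listed in Table 1. The sketch below is a plain-Python Spearman calculation with average ranks for ties; the helper names are hypothetical, and because the inputs are the rounded table values the result need not match the figure's reported statistic exactly.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# UK Biobank heritability estimates from Table 1, in table order (Basophil ... WBC).
ukb_ldsc = [0.0250, 0.1757, 0.0954, 0.0354, 0.0940, 0.1521, 0.1055, 0.0906, 0.1599,
            0.3675, 0.1078, 0.1177, 0.0802, 0.0402, 0.1361, 0.0317, 0.1630, 0.0788,
            0.1102, 0.1992, 0.1574, 0.0954, 0.1061, 0.1217, 0.0962]
ukb_ildsc = [0.0315, 0.2349, 0.0974, 0.0414, 0.1203, 0.1999, 0.1375, 0.1083, 0.1768,
             0.4815, 0.1352, 0.1433, 0.0859, 0.0501, 0.1597, 0.0364, 0.1902, 0.0955,
             0.1391, 0.2447, 0.1933, 0.1201, 0.1204, 0.1550, 0.1250]

rho = spearman(ukb_ldsc, ukb_ildsc)
print(rho, rho ** 2)
```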

Figure 4—figure supplement 1
Additional results from applying LDSC and i-LDSC for 25 quantitative traits in the UK Biobank and BioBank Japan.

(A) i-LDSC estimates of the phenotypic variation explained by tagged non-additive genetic effects using the cis-interaction LD score (i.e., estimates of ϑ) between traits in the UK Biobank and BioBank Japan (Spearman r2=0.372). (B) Estimates of i-LDSC and LDSC intercept terms for 25 traits analyzed in the UK Biobank and BioBank Japan. Intercept terms using LDSC and i-LDSC are highly correlated in both the UK Biobank (Spearman r2=0.888) and BioBank Japan (Spearman r2=0.813). The x=y dotted line represents points for when the two sets of estimates are equal.

Tables

Table 1
i-LDSC heritability estimates and p-values highlighting statistically significant contributions of tagged pairwise genetic interaction effects for 25 traits in the UK Biobank and BioBank Japan.

Here, LDSC heritability estimates are included as a baseline. The difference between the approaches is that the i-LDSC heritability estimates include proportions of phenotypic variation that are explained by tagged non-additive variation (see columns with estimates of ϑ). Note that all 25 traits analyzed in the UK Biobank and 23 of the 25 traits analyzed in BioBank Japan have a statistically significant amount of tagged non-additive genetic effects as detected by the cis-interaction LD score (p < 0.05). The two traits without significant tagged non-additive genetic effects in BioBank Japan were HDL (p = 0.081) and Triglyceride (p = 0.110). These traits are indicated by *. The i-LDSC p-values are related to the estimates of the ϑ coefficients which are also displayed in Figure 4.

Trait | UKB (LDSC) | UKB (i-LDSC) | UKB ϑ̂ | UKB p-value | BBJ (LDSC) | BBJ (i-LDSC) | BBJ ϑ̂ | BBJ p-value
Basophil | 0.0250 | 0.0315 | 0.0065 | 1.572 × 10^−12 | 0.0684 | 0.1548 | 0.0864 | 0.025
BMI | 0.1757 | 0.2349 | 0.0592 | 3.083 × 10^−84 | 0.1667 | 0.2656 | 0.0989 | 2.438 × 10^−18
Cholesterol | 0.0954 | 0.0974 | 0.0020 | 1.821 × 10^−16 | 0.0629 | 0.1268 | 0.0639 | 2.740 × 10^−4
CRP | 0.0354 | 0.0414 | 0.0060 | 9.845 × 10^−12 | 0.0202 | 0.1625 | 0.1423 | 0.020
DBP | 0.0940 | 0.1203 | 0.0263 | 1.118 × 10^−65 | 0.0605 | 0.1267 | 0.0662 | 1.675 × 10^−7
EGFR | 0.1521 | 0.1999 | 0.0478 | 1.187 × 10^−46 | 0.1010 | 0.1225 | 0.0215 | 4.232 × 10^−5
Eosinophil | 0.1055 | 0.1375 | 0.0320 | 1.230 × 10^−18 | 0.0785 | 0.1973 | 0.1188 | 0.001
HBA1C | 0.0906 | 0.1083 | 0.0177 | 1.578 × 10^−26 | 0.1057 | 0.1308 | 0.0251 | 0.031
HDL* | 0.1599 | 0.1768 | 0.0169 | 9.636 × 10^−37 | 0.1590 | 0.1838 | 0.0248 | 0.081
Height | 0.3675 | 0.4815 | 0.1140 | 1.038 × 10^−64 | 0.3941 | 0.7336 | 0.3395 | 7.433 × 10^−33
Hematocrit | 0.1078 | 0.1352 | 0.0274 | 2.479 × 10^−25 | 0.0752 | 0.0928 | 0.0176 | 3.689 × 10^−5
Hemoglobin | 0.1177 | 0.1433 | 0.0256 | 4.284 × 10^−27 | 0.0702 | 0.0752 | 0.0050 | 9.037 × 10^−4
LDL | 0.0802 | 0.0859 | 0.0057 | 5.087 × 10^−13 | 0.0745 | 0.1438 | 0.0693 | 0.018
Lymphocyte | 0.0402 | 0.0501 | 0.0099 | 4.906 × 10^−19 | 0.0844 | 0.1757 | 0.0913 | 5.479 × 10^−5
MCH | 0.1361 | 0.1597 | 0.0236 | 1.785 × 10^−25 | 0.1536 | 0.2831 | 0.1295 | 1.042 × 10^−5
MCHC | 0.0317 | 0.0364 | 0.0047 | 3.730 × 10^−12 | 0.0571 | 0.0650 | 0.0079 | 0.027
MCV | 0.1630 | 0.1902 | 0.0272 | 1.180 × 10^−29 | 0.1530 | 0.2818 | 0.1288 | 1.042 × 10^−5
Monocyte | 0.0788 | 0.0955 | 0.0167 | 5.257 × 10^−18 | 0.0888 | 0.1549 | 0.0661 | 0.004
Neutrophil | 0.1102 | 0.1391 | 0.0289 | 1.777 × 10^−33 | 0.1191 | 0.2114 | 0.0923 | 5.050 × 10^−5
Platelet | 0.1992 | 0.2447 | 0.0455 | 2.303 × 10^−37 | 0.1565 | 0.2436 | 0.0871 | 7.724 × 10^−9
RBC | 0.1574 | 0.1933 | 0.0359 | 3.292 × 10^−31 | 0.1203 | 0.2068 | 0.0865 | 5.972 × 10^−8
SBP | 0.0954 | 0.1201 | 0.0247 | 8.660 × 10^−75 | 0.0769 | 0.1604 | 0.0835 | 9.075 × 10^−10
Triglycerides* | 0.1061 | 0.1204 | 0.0143 | 1.410 × 10^−26 | 0.1171 | 0.2670 | 0.1499 | 0.110
Urate | 0.1217 | 0.1550 | 0.0333 | 9.642 × 10^−38 | 0.1395 | 0.3462 | 0.2067 | 0.015
WBC | 0.0962 | 0.1250 | 0.0288 | 9.866 × 10^−34 | 0.1024 | 0.2266 | 0.1242 | 1.346 × 10^−8
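
The significance counts quoted in the Table 1 legend can be checked against the BioBank Japan p-value column with a few lines of code. The sketch below simply transcribes that column and applies the p < 0.05 threshold stated in the legend; the variable names are hypothetical.

```python
# BioBank Japan p-values for the tagged non-additive component, transcribed from Table 1.
bbj_p_values = {
    "Basophil": 0.025, "BMI": 2.438e-18, "Cholesterol": 2.740e-4, "CRP": 0.020,
    "DBP": 1.675e-7, "EGFR": 4.232e-5, "Eosinophil": 0.001, "HBA1C": 0.031,
    "HDL": 0.081, "Height": 7.433e-33, "Hematocrit": 3.689e-5,
    "Hemoglobin": 9.037e-4, "LDL": 0.018, "Lymphocyte": 5.479e-5,
    "MCH": 1.042e-5, "MCHC": 0.027, "MCV": 1.042e-5, "Monocyte": 0.004,
    "Neutrophil": 5.050e-5, "Platelet": 7.724e-9, "RBC": 5.972e-8,
    "SBP": 9.075e-10, "Triglycerides": 0.110, "Urate": 0.015, "WBC": 1.346e-8,
}

THRESHOLD = 0.05  # significance level used in the legend

significant = sorted(t for t, p in bbj_p_values.items() if p < THRESHOLD)
not_significant = sorted(t for t, p in bbj_p_values.items() if p >= THRESHOLD)

print(len(significant), "of", len(bbj_p_values), "traits significant")
print("not significant:", not_significant)
```

The two traits left over are the ones marked with * in the table.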
Table 2
Comparison of s-LDSC and i-LDSC estimates of phenotypic variance explained (PVE) by genetic effects for 25 complex traits in the UK Biobank.

Here, we use stratified LD score regression (s-LDSC) to partition heritability across different genomic elements (Finucane et al., 2015). We used 97 functional annotations from Gazal et al. to estimate heritability in 25 traits. We then appended cis-interaction LD scores as an additional annotation to obtain heritability estimates (this method is referred to as s+i-LDSC in the table). p-values for the s+i-LDSC model detailing the contributions of tagged non-additive genetic effects for 25 traits are provided in the last column. Note that, while not statistically significant in this stratified analysis with the additional annotations, the non-additive component still makes nonzero contributions to the PVE estimation for all 25 traits.

Trait | UKB PVE (s-LDSC) | UKB PVE (s+i-LDSC) | s+i-LDSC p-value
Basophil | 0.0363 | 0.0375 | 0.4728
BMI | 0.2100 | 0.2482 | 0.8126
Cholesterol | 0.1042 | 0.1358 | 0.6202
CRP | 0.0452 | 0.0524 | 0.6483
DBP | 0.1228 | 0.1441 | 0.6125
EGFR | 0.1826 | 0.2105 | 0.8507
Eosinophil | 0.1403 | 0.1578 | 0.1867
HBA1C | 0.1040 | 0.1275 | 0.6917
HDL | 0.1820 | 0.2373 | 0.5754
Height | 0.4315 | 0.4726 | 0.5224
Hematocrit | 0.1416 | 0.1646 | 0.3956
Hemoglobin | 0.1504 | 0.1795 | 0.2299
LDL | 0.0858 | 0.1131 | 0.8812
Lymphocyte | 0.0545 | 0.0651 | 0.1453
MCH | 0.1497 | 0.1545 | 0.0968
MCHC | 0.0450 | 0.0496 | 0.3728
MCV | 0.1814 | 0.1930 | 0.1530
Monocyte | 0.1085 | 0.1431 | 0.5421
Neutrophil | 0.1320 | 0.1599 | 0.2499
Platelet | 0.2317 | 0.2628 | 0.7371
RBC | 0.1933 | 0.2223 | 0.3197
SBP | 0.1206 | 0.1419 | 0.1100
Triglycerides | 0.1335 | 0.1621 | 0.5301
Urate | 0.1530 | 0.1736 | 0.1177
WBC | 0.1221 | 0.1482 | 0.5155
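
The two observations in the Table 2 legend (no trait reaches significance, yet every trait gains some PVE once the cis-interaction LD score is appended) can be checked from the table values. The sketch below transcribes the three numeric columns; the 0.05 threshold is an assumption carried over from Table 1, and the variable names are hypothetical.

```python
# (trait, PVE from s-LDSC, PVE from s+i-LDSC, s+i-LDSC p-value), transcribed from Table 2.
table2 = [
    ("Basophil", 0.0363, 0.0375, 0.4728), ("BMI", 0.2100, 0.2482, 0.8126),
    ("Cholesterol", 0.1042, 0.1358, 0.6202), ("CRP", 0.0452, 0.0524, 0.6483),
    ("DBP", 0.1228, 0.1441, 0.6125), ("EGFR", 0.1826, 0.2105, 0.8507),
    ("Eosinophil", 0.1403, 0.1578, 0.1867), ("HBA1C", 0.1040, 0.1275, 0.6917),
    ("HDL", 0.1820, 0.2373, 0.5754), ("Height", 0.4315, 0.4726, 0.5224),
    ("Hematocrit", 0.1416, 0.1646, 0.3956), ("Hemoglobin", 0.1504, 0.1795, 0.2299),
    ("LDL", 0.0858, 0.1131, 0.8812), ("Lymphocyte", 0.0545, 0.0651, 0.1453),
    ("MCH", 0.1497, 0.1545, 0.0968), ("MCHC", 0.0450, 0.0496, 0.3728),
    ("MCV", 0.1814, 0.1930, 0.1530), ("Monocyte", 0.1085, 0.1431, 0.5421),
    ("Neutrophil", 0.1320, 0.1599, 0.2499), ("Platelet", 0.2317, 0.2628, 0.7371),
    ("RBC", 0.1933, 0.2223, 0.3197), ("SBP", 0.1206, 0.1419, 0.1100),
    ("Triglycerides", 0.1335, 0.1621, 0.5301), ("Urate", 0.1530, 0.1736, 0.1177),
    ("WBC", 0.1221, 0.1482, 0.5155),
]

THRESHOLD = 0.05  # assumed significance level, as in Table 1

# Gain in PVE when the cis-interaction LD score is added as an annotation.
pve_gain = {trait: round(with_i - without_i, 4) for trait, without_i, with_i, _ in table2}
significant = [trait for trait, _, _, p in table2 if p < THRESHOLD]

print("traits with positive PVE gain:", sum(g > 0 for g in pve_gain.values()))
print("traits significant at 0.05:", significant)
```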

Additional files

Supplementary file 1

Comparison of LDSC and i-LDSC estimates of the proportion of phenotypic variance explained (PVE) by genetic effects (i.e., estimated heritability) when the true heritability is set to H2=0.3 for polygenic traits.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e., creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 10% of the total number of SNPs genome-wide. These interact with the group #2 SNPs which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with no minor allele frequency dependency α=0 (see Materials and Methods). Here, we assume a heritability H2=0.3 and vary the proportion contributed by additive effects with ρ={0.2,0.4,0.6,0.8}. We run i-LDSC while computing the cis-interaction LD scores using different estimating windows of ± 5, ± 10, ± 25, and ± 50 SNPs. The “average” column represents results using model averaging over the different estimating windows (see Materials and Methods). We report the mean estimates of heritability (with standard errors in the parentheses) and use mean absolute error (MAE) to quantify the difference between the two methods. Results are based on 100 simulations per parameter combination. As shown in Figure 3—figure supplements 3 and 1, LDSC does not capture the contribution of non-additive genetic effects to trait variation.
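
As a small illustration of the MAE summary mentioned in this legend, the sketch below computes the mean absolute error of a set of heritability estimates against a reference value. It assumes the reference is the simulated heritability (the legend does not spell this out), and the estimates shown are made-up placeholders rather than results from the supplementary file.

```python
def mean_absolute_error(estimates, reference):
    """Average absolute deviation of the estimates from a single reference value."""
    return sum(abs(e - reference) for e in estimates) / len(estimates)


TRUE_H2 = 0.3  # simulated heritability for this supplementary file

# Hypothetical per-replicate estimates, for illustration only.
example_estimates = [0.28, 0.31, 0.30]

mae = mean_absolute_error(example_estimates, TRUE_H2)
print(mae)
```

A lower MAE indicates estimates that sit closer, on average, to the reference value.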

https://cdn.elifesciences.org/articles/90459/elife-90459-supp1-v1.xlsx
Supplementary file 2

Comparison of LDSC and i-LDSC estimates of the proportion of phenotypic variance explained (PVE) by genetic effects (i.e., estimated heritability) when the true heritability is set to H2=0.6.

Synthetic trait architecture was simulated using real genotype data from individuals of self-identified European ancestry in the UK Biobank. All SNPs were considered to have at least an additive effect (i.e., creating a polygenic trait architecture). Next, we randomly select a set of interacting variants and divide them into two groups. The group #1 SNPs are chosen to be 10% of the total number of SNPs genome-wide. These interact with the group #2 SNPs which are selected to be variants within a ± 100 kilobase (kb) window around each SNP in group #1. Coefficients for additive and interaction effects were simulated with no minor allele frequency dependency α=0 (see Materials and Methods). Here, we assume a heritability H2=0.6 and vary the proportion contributed by additive effects with ρ={0.2,0.4,0.6,0.8}. We run i-LDSC while computing the cis-interaction LD scores using different estimating windows of ± 5, ± 10, ± 25, and ± 50 SNPs. The “average” column represents results using model averaging over the different estimating windows (see Materials and Methods). We report the mean estimates of heritability (with standard errors in the parentheses) and use mean absolute error (MAE) to quantify the difference between the two methods. Results are based on 100 simulations per parameter combination. As shown in Figure 3—figure supplements 3 and 1, LDSC does not capture the additional contribution of non-additive genetic effects to trait variation.

https://cdn.elifesciences.org/articles/90459/elife-90459-supp2-v1.xlsx
Supplementary file 3

Abbreviations used throughout this study for 14 of the quantitative traits analyzed.

The remaining 11 traits analyzed were Basophil count, Cholesterol, Eosinophil count, Height, Hematocrit, Hemoglobin, Lymphocyte count, Monocyte count, Neutrophil count, and Triglyceride levels. These are not abbreviated in the main text.

https://cdn.elifesciences.org/articles/90459/elife-90459-supp3-v1.xlsx
Supplementary file 4

Trait-specific α parameters for each of the 25 traits analyzed.

Here, α values are used to weight each variant based on its minor allele frequency to account for frequency-dependent architectures in each trait. The ∗ indicates α parameters that were taken directly from Schoech et al. The α parameters for other traits were calculated using the protocol used in that paper. Expansions of trait abbreviations are given in Supplementary file 3.

https://cdn.elifesciences.org/articles/90459/elife-90459-supp4-v1.xlsx
Supplementary file 5

Number of individuals and total SNPs included in the analysis of each trait in BioBank Japan.

https://cdn.elifesciences.org/articles/90459/elife-90459-supp5-v1.xlsx
MDAR checklist
https://cdn.elifesciences.org/articles/90459/elife-90459-mdarchecklist1-v1.docx

Cite this article

  1. Samuel Pattillo Smith
  2. Gregory Darnell
  3. Dana Udwin
  4. Julian Stamp
  5. Arbel Harpak
  6. Sohini Ramachandran
  7. Lorin Crawford
(2024)
Discovering non-additive heritability using additive GWAS summary statistics
eLife 13:e90459.
https://doi.org/10.7554/eLife.90459