Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies

  1. Mashaal Sohail (corresponding author)
  2. Robert M Maier (corresponding author)
  3. Andrea Ganna
  4. Alex Bloemendal
  5. Alicia R Martin
  6. Michael C Turchin
  7. Charleston WK Chiang
  8. Joel Hirschhorn
  9. Mark J Daly
  10. Nick Patterson
  11. Benjamin Neale (corresponding author)
  12. Iain Mathieson (corresponding author)
  13. David Reich (corresponding author)
  14. Shamil R Sunyaev (corresponding author)
  1. Brigham and Women’s Hospital and Harvard Medical School, United States
  2. Harvard Medical School, United States
  3. Broad Institute of MIT and Harvard, United States
  4. Massachusetts General Hospital, United States
  5. Karolinska Institutet, Sweden
  6. University of Helsinki, Finland
  7. Brown University, United States
  8. University of Southern California, United States
  9. Boston Children’s Hospital, United States
  10. University of Pennsylvania, United States
  11. Howard Hughes Medical Institute, Harvard Medical School, United States
5 figures and 8 additional files

Figures

Figure 1 with 6 supplements
Polygenic height scores and tSDS scores based on GIANT and UK Biobank GWAS.

(a) Polygenic scores in present-day and ancient European populations are shown, centered by the average score across populations and standardized by the square root of the additive variance. …

https://doi.org/10.7554/eLife.39702.002
Figure 1—source data 1

Polygenic height scores and tSDS scores based on GIANT and UK Biobank GWAS.

https://doi.org/10.7554/eLife.39702.009
Figure 1—figure supplement 1
Beta concordance between GIANT and UK Biobank by P value bin.

SNPs intersecting between GIANT and UKB were LD-pruned (using PLINK 1.9 with parameters r^2 = 0.1, window size = 1 Mb, step size 5) and grouped into P value bins of 500 SNPs each, for P values from …

https://doi.org/10.7554/eLife.39702.003
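
The P value binning described in the caption above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes LD pruning has already been done, it measures concordance as the fraction of SNPs whose two beta estimates share a sign (an assumption, since the caption is truncated), and the tuple layout and the name `concordance_by_p_bin` are hypothetical.

```python
from typing import List, Tuple

# Each SNP is (p_value, beta_study_a, beta_study_b).
# This field layout is hypothetical and chosen only for illustration.
Snp = Tuple[float, float, float]


def concordance_by_p_bin(snps: List[Snp], bin_size: int = 500) -> List[float]:
    """Sort SNPs by P value, cut them into fixed-size bins, and return the
    fraction of SNPs in each bin whose two beta estimates share a sign."""
    ordered = sorted(snps, key=lambda s: s[0])
    fractions = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]  # last bin may be smaller
        same_sign = sum(1 for _, a, b in chunk if a * b > 0)
        fractions.append(same_sign / len(chunk))
    return fractions
```

With `bin_size=500`, the first element of the result corresponds to the 500 SNPs with the smallest P values.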
Figure 1—figure supplement 2
Polygenic height scores based on GIANT and UK Biobank GWAS for clumped SNPs in present-day and ancient Europeans.

Scores are shown, centered by the average score across modern and ancient populations respectively and standardized by the square root of the additive variance. SNPs were LD-pruned with PLINK’s …

https://doi.org/10.7554/eLife.39702.004
Figure 1—figure supplement 3
Polygenic height scores in 1000 Genomes European populations using clumped SNPs and effect sizes from different summary statistics.

Polygenic scores in modern European populations are shown using SNPs LD-pruned with PLINK’s clumping procedure with parameters: (a) r^2 < 0.1, 1 Mb, p < 0.01, and (b) r^2 < 0.1, 1 Mb, p < 5×10^−8. Scores are …

https://doi.org/10.7554/eLife.39702.005
Figure 1—figure supplement 4
Polygenic height scores in 1000 Genomes Project European populations using ~1700 independent SNPs and effect sizes from different summary statistics.

Polygenic scores in modern European populations are shown using SNPs LD-pruned by picking the SNP with the lowest P value in each of ~1700 LD-independent blocks genome-wide. Scores are centered by …

https://doi.org/10.7554/eLife.39702.006
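
Picking the SNP with the lowest P value in each LD-independent block, as the caption above describes, can be sketched like this. The block labels are assumed to be precomputed elsewhere; the record layout and the name `lowest_p_per_block` are hypothetical and not taken from the article.

```python
from typing import Dict, Iterable, Tuple

# (snp_id, block_id, p_value). block_id stands in for a precomputed
# LD-independent block label, which this sketch does not derive.
Record = Tuple[str, int, float]


def lowest_p_per_block(records: Iterable[Record]) -> Dict[int, str]:
    """Return, for each block, the ID of the SNP with the smallest P value.
    Ties keep the SNP that was seen first."""
    best: Dict[int, Tuple[float, str]] = {}
    for snp_id, block_id, p_value in records:
        current = best.get(block_id)
        if current is None or p_value < current[0]:
            best[block_id] = (p_value, snp_id)
    return {block: snp for block, (_, snp) in best.items()}
```

Applied to roughly 1700 blocks, this yields one SNP per block, so the selected set contains about 1700 approximately independent markers.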
Figure 1—figure supplement 5
Polygenic height scores in ancient populations using ~1700 independent SNPs and effect sizes from different summary statistics.

Polygenic scores in ancient meta-populations are shown using SNPs LD-pruned by picking the SNP with the lowest P value in each of ~1700 LD-independent blocks genome-wide. Scores are centered by the …

https://doi.org/10.7554/eLife.39702.007
Figure 1—figure supplement 6
Polygenic height scores in ancient and global modern populations using three different GWAS.

All scores are centered by the average score across all populations (μ_GIANT = 0.645, μ_LOH = 0.219, μ_NEALELAB = 0.259) and standardized by the square root of the additive variance. Error bars are drawn at 95% credible intervals. Modern …

https://doi.org/10.7554/eLife.39702.008
Figure 2 with 4 supplements
Evidence of stratification in height summary statistics.

Top row: Pearson correlation coefficients of (a) PC loadings and height beta coefficients from GIANT and UKB, and (b) PC loadings and SDS (pre-computed in the UK10K) across all SNPs. PCs were …

https://doi.org/10.7554/eLife.39702.010
Figure 2—source data 1

Evidence of stratification in height summary statistics.

https://doi.org/10.7554/eLife.39702.015
Figure 2—figure supplement 1
Pearson correlation coefficients of PC loadings and height beta coefficients for different summary statistics.

PCs were computed in all 1000 Genomes phase one samples. Colors indicate the correlation of each PC loading with the allele frequency difference between GBR and TSI, a proxy for the European …

https://doi.org/10.7554/eLife.39702.011
Figure 2—figure supplement 2
Heat map of mean beta coefficients for different summary statistics.

All SNPs are binned by GBR and TSI minor allele frequency. Only bins with at least 300 SNPs are shown. Panel 7 (as well as panels 2, 3 and 4) shows stratification effects in the opposite direction to those in …

https://doi.org/10.7554/eLife.39702.012
Figure 2—figure supplement 3
Effect of GBR-TSI allele frequency difference on beta estimates and P values.

SNPs with MAF >0.2 (based on mean between TSI and GBR) were grouped into GBR-TSI allele frequency difference deciles, with the first decile representing SNPs less common in GBR and the last decile …

https://doi.org/10.7554/eLife.39702.013
Figure 2—figure supplement 4
Height (cm) in the UKB as a function of GBR-TSI score.

We computed the relative number of GBR to TSI related alleles in each sample by multiplying the allele frequency difference by the number of alternative alleles in each sample in the UKB (GBR-TSI …

https://doi.org/10.7554/eLife.39702.014
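
The per-sample GBR-TSI score described in the caption above can be illustrated with a short sketch. The caption is truncated, so summing the per-SNP products across SNPs is an assumption, as are the argument names and the function name `gbr_tsi_score`.

```python
from typing import Sequence


def gbr_tsi_score(
    freq_gbr: Sequence[float],
    freq_tsi: Sequence[float],
    alt_allele_counts: Sequence[int],
) -> float:
    """Multiply each SNP's GBR-minus-TSI allele frequency difference by the
    sample's count of alternative alleles (0, 1 or 2) and sum over SNPs.
    The summation step is an assumption of this sketch."""
    if not (len(freq_gbr) == len(freq_tsi) == len(alt_allele_counts)):
        raise ValueError("all inputs must cover the same SNPs")
    return sum(
        (gbr - tsi) * count
        for gbr, tsi, count in zip(freq_gbr, freq_tsi, alt_allele_counts)
    )
```

Under this convention, a positive score means the sample carries relatively more alleles that are common in GBR than alleles that are common in TSI.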
Figure 3 with 5 supplements
Height tSDS results for different summary statistics.

(a) Mean tSDS of the height increasing allele in each P value bin for six different summary statistics. The first two panels are computed analogously to Figure 4A and Figure S22 of Field et al. …

https://doi.org/10.7554/eLife.39702.016
Figure 3—source data 1

Height tSDS results for different summary statistics.

https://doi.org/10.7554/eLife.39702.022
Figure 3—figure supplement 1
tSDS for height-increasing alleles using effect sizes from different summary statistics.

SNPs were ordered by GWAS P value and grouped into bins of 1000 SNPs each. The mean tSDS score within each P value bin is shown on the y-axis. In contrast to Figure 3, where Spearman correlation …

https://doi.org/10.7554/eLife.39702.017
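
The binned mean tSDS described in the caption above can be sketched as follows. The sketch assumes that tSDS is obtained by flipping the sign of SDS whenever the reported allele decreases height, and that SDS and the GWAS beta refer to the same allele; both are assumptions of this illustration. The tuple layout and the name `mean_tsds_by_p_bin` are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# (gwas_p_value, gwas_beta, sds); SDS and beta are assumed to refer to the
# same allele. This layout is hypothetical.
Snp = Tuple[float, float, float]


def mean_tsds_by_p_bin(snps: List[Snp], bin_size: int = 1000) -> List[float]:
    """Order SNPs by GWAS P value, group them into fixed-size bins, and
    return the mean trait-polarized SDS in each bin."""
    # SNPs with a beta of exactly zero have no trait-increasing allele.
    ordered = sorted((s for s in snps if s[1] != 0), key=lambda s: s[0])
    means = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        # Flip the sign of SDS when the reported allele decreases the trait.
        tsds = [sds if beta > 0 else -sds for _, beta, sds in chunk]
        means.append(mean(tsds))
    return means
```

Because the bins contain SNPs that may be in LD, the bin means are not independent observations; the LD-pruned variant of this figure addresses that concern.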
Figure 3—figure supplement 2
Allele frequency difference for height-increasing alleles using different summary statistics.

SNPs were ordered by GWAS P value and grouped into bins of 1000 SNPs each. The gray line indicates the null-expectation, and the colored lines are the linear regression fit. The lowest P value bin …

https://doi.org/10.7554/eLife.39702.018
Figure 3—figure supplement 3
tSDS for LD-pruned height-increasing alleles using effect sizes from different summary statistics.

Binning SNPs by P value can lead to spurious results at the low P value bins when SNPs are in LD (Figure 3—figure supplement 5). Here, LD-pruned SNPs were ordered by GWAS P value and grouped into …

https://doi.org/10.7554/eLife.39702.019
Figure 3—figure supplement 4
Allele frequency difference for LD-pruned height-increasing alleles using different summary statistics.

Binning SNPs by P value can lead to spurious results at the low P value bins when SNPs are in LD (Figure 3—figure supplement 5). Here, LD-pruned SNPs were ordered by GWAS P value and grouped into …

https://doi.org/10.7554/eLife.39702.020
Figure 3—figure supplement 5
Number of independent regions per GWAS P value bin in the UK Biobank.

SDS results in Field et al. as well as in Figure 3 in this article are visualized by grouping non-independent SNPs into bins according to their P value. This may lead to unpredictable patterns at …

https://doi.org/10.7554/eLife.39702.021
Figure 4 with 4 supplements
Polygenic height scores in POPRES populations show a residual albeit attenuated signal of polygenic adaptation for height.

Standardized polygenic height scores from four summary statistics for 19 POPRES populations with at least 10 samples per population, ordered by latitude (see Supplementary file 4). The grey line is …

https://doi.org/10.7554/eLife.39702.023
Figure 4—source data 1

Polygenic height scores in POPRES populations show a residual albeit attenuated signal of polygenic adaptation for height.


https://doi.org/10.7554/eLife.39702.028
Figure 4—figure supplement 1
Polygenic height scores in POPRES for different summary statistics.

Standardized polygenic height score from diverse summary statistics for 19 POPRES populations with at least 10 samples per population, ordered by latitude (see Supplementary file 4). Confidence …

https://doi.org/10.7554/eLife.39702.024
Figure 4—figure supplement 2
Test statistics for Qx (left) and latitude correlation (right) in the POPRES dataset for different summary statistics.

The numbers indicate P values and the number of SNPs, and numbers in bold highlight nominal significance (p<0.05).

https://doi.org/10.7554/eLife.39702.025
Figure 4—figure supplement 3
P value calibration in the POPRES dataset for Qx and latitude covariance tests.

Random sets of around 1700 independent markers were drawn in 100 repetitions for four summary statistics and Qx and latitude P values were computed. In UK Biobank sibling estimates this resulted in …

https://doi.org/10.7554/eLife.39702.026
Figure 4—figure supplement 4
Spearman correlations between polygenic height scores in the POPRES dataset computed from different summary statistics.

Spearman correlation coefficients of mean population polygenic score ranking for all pairs of summary statistics at different SNP selections. Polygenic scores from independent SNPs which are …

https://doi.org/10.7554/eLife.39702.027

Additional files

Supplementary file 1

Description of 11 GWAS summary statistics.

https://doi.org/10.7554/eLife.39702.029
Supplementary file 2

Table of ancient and 1000 Genomes modern populations used with sample sizes.

https://doi.org/10.7554/eLife.39702.030
Supplementary file 3

Supplementary note on characterization of stratification effects in GIANT and UK Biobank.

https://doi.org/10.7554/eLife.39702.031
Supplementary file 4

Table of POPRES populations used with sample sizes and latitude.

https://doi.org/10.7554/eLife.39702.032
Supplementary file 5

LD Score regression estimates for 11 different summary statistics.

LD score regression (LDSC) can detect residual stratification effects in summary statistics, so we tested whether it confirms our hypothesis of residual stratification. We detect a greatly inflated intercept estimate of 9.42 in UKB all no PCs, but only a moderately increased intercept in GIANT and an intercept of less than one in NG2015 sibs. The relatively small GIANT intercept can be explained by cohort-wise lambda-GC correction, while the low intercept in NG2015 sibs is possibly caused by the adaptive permutation procedure, which does not compute precise P values for non-significant associations. In both cases, LDSC cannot be expected to pick up stratification effects, because the way the summary statistics were generated does not match the LDSC model.

https://doi.org/10.7554/eLife.39702.033
Supplementary file 6

Correlation of beta estimates at all 86,153 shared SNPs.

https://doi.org/10.7554/eLife.39702.034
Supplementary file 7

Correlation of beta estimates at 2251 shared SNPs which are significant in the UK Biobank.

https://doi.org/10.7554/eLife.39702.035
Transparent reporting form
https://doi.org/10.7554/eLife.39702.036
