1. Evolutionary Biology
  2. Genetics and Genomics
Download icon

Reduced signal for polygenic adaptation of height in UK Biobank

  1. Jeremy J Berg  Is a corresponding author
  2. Arbel Harpak
  3. Nasa Sinnott-Armstrong
  4. Anja Moltke Joergensen
  5. Hakhamanesh Mostafavi
  6. Yair Field
  7. Evan August Boyle
  8. Xinjun Zhang
  9. Fernando Racimo
  10. Jonathan K Pritchard  Is a corresponding author
  11. Graham Coop  Is a corresponding author
  1. Columbia University, United States
  2. Stanford University, United States
  3. University of Copenhagen, Denmark
  4. University of California, Davis, United States
  5. Howard Hughes Medical Institute, Stanford University, United States
Research Communication
  • Cited 11
  • Views 4,450
  • Annotations
Cite this article as: eLife 2019;8:e39725 doi: 10.7554/eLife.39725

Abstract

Several recent papers have reported strong signals of selection on European polygenic height scores. These analyses used height effect estimates from the GIANT consortium and replication studies. Here, we describe a new analysis based on the the UK Biobank (UKB), a large, independent dataset. We find that the signals of selection using UKB effect estimates are strongly attenuated or absent. We also provide evidence that previous analyses were confounded by population stratification. Therefore, the conclusion of strong polygenic adaptation now lacks support. Moreover, these discrepancies highlight (1) that methods for correcting for population stratification in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of differences in polygenic scores between populations should be treated with caution until these issues are better understood.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

https://doi.org/10.7554/eLife.39725.001

Introduction

In recent years, there has been great progress in understanding the polygenic basis of a wide variety of complex traits. One significant development has been the advent of ‘polygenic scores’, which aim to predict the additive genetic component of individual phenotypes using a linear combination of allelic contributions to a given trait across many sites. An important application of polygenic scores has been the study of polygenic adaptation—the adaptive change of a phenotype through small allele frequency shifts at many sites that affect the phenotype.

Thus far, the clearest example of polygenic adaptation in humans has come from analyses of polygenic scores for height in Europe. However, as we will show here, this signal is strongly attenuated or absent using new data from the UK Biobank (Bycroft et al., 2018), calling this example into question.

Starting in 2012, a series of papers identified multiple lines of evidence suggesting that average polygenic scores for height increase from south-to-north across Europe (Table 1). Analyses from multiple groups have concluded that the steepness of this cline is inconsistent with a neutral model of evolution, suggesting that natural selection drove these differences in allele frequencies and polygenic scores (Turchin et al., 2012; Berg and Coop, 2014; Robinson et al., 2015; Zoledziewska et al., 2015; Berg et al., 2017; Racimo et al., 2018; Guo et al., 2018). Significant differences in polygenic scores for height have also been reported among ancient populations, and these are also argued to have been driven by selection (Mathieson et al., 2015; Martiniano et al., 2017; Berg et al., 2017). In parallel, (Field et al., 2016) developed the Singleton Density Score (SDS)—which compares the distance to the nearest singleton on two alternative allelic backgrounds—to infer recent changes in allele frequencies, and used it to analyze a large sample of British individuals (the UK10K; UK10K Consortium, 2015). They found a significant covariance of SDS and effect on height, suggesting that natural selection drove a concerted rise in the frequency of height-increasing alleles in the ancestors of modern British individuals during the last 2,000 years (Field et al., 2016).

Table 1
Studies reporting signals of height adaptation in Europeans.

Prior to the UK Biobank dataset, studies consistently found evidence for polygenic adaptation of height. Notes: Most of the papers marked as having ‘strong’ signals report p-values <10-5, and sometimes 10-5. In the present paper, the UK Biobank analyses generally yield p-values >10-3.

https://doi.org/10.7554/eLife.39725.002
GWASApproachSignalReference
GIANT 2010European frequency cline of top SNPsstrongTurchin et al., 2012
validation: Framingham sibs
GIANT 2010Polygenic measures of pop. frequency differencesstrongBerg and Coop, 2014
GIANTPolygenic measures of pop. frequency differencesstrongBerg et al., 2017
strongRacimo et al., 2018
strongGuo et al., 2018
Polygenic diffs between ancient and modern populationsstrongMathieson et al., 2015
GIANTHeterogeneity of polygenic scores among populationsstrongRobinson et al., 2015
validation: R15-sibs
Sardinia cohortLow polygenic height scores in Sardinians. Effect estimates from Sardinian cohort at GIANT hit SNPsstrongZoledziewska et al., 2015
GIANT and R15-sibsSingleton density (SDS) in UK sample vs GWASstrongField et al., 2016
Also: LD Score regression (SDS vs GWAS)strong
UK BiobankPopulation frequency differencesweak or absentThis paper*
Singleton density (SDS) in UK sampleweak or absentThis paper*
LD Score regression (SDS vs GWAS)weakThis paper*
  1. *See also results from Sohail et al., 2019.

All such studies rely on estimates of individual allelic effects on height, as calculated from genome-wide association studies (GWAS). These GWAS estimates are then combined with population-genetic analysis to test for selection. Under a null model of no directional change, we would not expect ‘tall’ alleles to increase (or decrease) in frequency in concert; thus, loosely speaking, a systematic shift in frequency of ‘tall’ alleles in the same direction has been interpreted as evidence for selection.

While our focus here is on the the distribution of height polygenic scores in Europe, we see this as a case study for understanding the challenges in comparing polygenic scores across populations in general. Compared to other complex traits, height is particularly well-characterized, and the evidence for adaptation of height in Europeans seemed clear. Thus our work highlights a need for caution in this area until these issues are more fully understood (Novembre and Barton, 2018).

GWAS data used to study adaptation of height

Until recently, the largest height GWAS dataset came from the GIANT consortium (253,288 individuals as of 2014; Wood et al., 2014). This is the primary GWAS underlying most studies of adaptation of height. Additionally, several groups have used other, smaller, datasets to replicate signals found using GIANT (Turchin et al., 2012; Robinson et al., 2015; Zoledziewska et al., 2015; Field et al., 2016; Berg et al., 2017). In particular, because it is known that population structure may be a confounder in GWAS, leading to false positive inferences of polygenic adaptation, several groups sought to replicate signals using family-based analyses, which protect against confounding due to stratification (Allison et al., 1999; Spielman and Ewens, 1998; Abecasis et al., 2000).

The first replication, by Turchin et al. (2012), showed that the effect sizes of the top 1,400 GIANT associations (based on an earlier version of GIANT, published by (Lango Allen et al., 2010) were statistically consistent with effect sizes re-estimated in a smaller sibling-based regression approach using data from the Framingham Heart Study (4819 individuals across 1761 nuclear sibships from Splansky et al., 2007). Sibling-based regression is considered to be immune to confounding by population structure, and so the agreement of effect sizes between studies was taken as validation of the north-south gradient observed when using the GIANT effect sizes.

The second, partially independent, replication came from Zoledziewska et al. (2015), who selected 691 height-associated SNPs on the basis of the GIANT association study, and then computed polygenic scores using effect sizes re-estimated in a cohort of 6307 individuals of Sardinian ancestry. They determined that the average polygenic score of Sardinian individuals was significantly lower than observed for other European populations, consistent with the previously reported north-south gradient of polygenic scores.

A third replication was performed by Robinson et al. (2015), who used a different, larger sibling-based GWAS to identify associations (17,500 sibling pairs from Hemani et al., 2013). We refer to this sibling-based dataset as ‘R15-sibs’. These authors showed that the north-south frequency gradient replicates using SNPs ascertained from the sibling-based GWAS. This replication is stronger than that performed by either Turchin et al. (2012), or that by Zoledziewska et al. (2015), as the cohort is larger and the SNP ascertainment did not rely on GIANT. As pointed out in the supplementary note of Robinson et al. (2015), this two-step procedure—ascertaining with a large but potentially biased GWAS like GIANT, before switching to a less powerful but hopefully unbiased replication GWAS—has the potential to introduce an ascertainment bias, even if the effects are correctly estimated in the replication study (we note that a small fraction of the GIANT samples are contained within the R15-sibs analysis, so the effect sizes are not strictly independent; however, because of the sibling design, any bias due to stratification in GIANT should be absent in R15-sibs). The R15-sibs study was also used by Field et al. (2016) to verify a signal of recent selection in ancestors of the present day British population. Field et al. (2016) found that the signal of selection was fully replicated when using R15-sibs data.

Lastly, Field et al. (2016) also used LD Score regression to test for height adaptation in the British while controlling for population structure (Bulik-Sullivan et al., 2015a; Bulik-Sullivan et al., 2015b). While LD Score regression is typically used to estimate genetic covariance between two phenotypes, Field et al. (2016) used it to test for a relationship between height effects and a recent increase in frequency (measured by SDS)—and found a strong covariance of the two consistent with selection driving allele frequency change at height loci.

Here, we reassess these previously reported signals using data from the UK Biobank (UKB) with genotype and phenotype data for nearly 500,000 residents of the United Kingdom (Bycroft et al., 2018). UKB has recently become a key resource for GWAS, thanks to its large sample, the relatively unstructured population (compared to international studies such as GIANT), and the opportunity for researchers to work directly with the genotype data rather than with summary statistics.

This paper has two aims. First, we will show that previously reported signals of directional selection on height in European populations generally do not replicate when using GWAS effect estimates from the UK Biobank. Similar findings have been obtained independently by other groups working in parallel (Sohail et al., 2019; Uricchio et al., 2019). Second, we will show that both the GIANT and R15-sibs GWAS are confounded due to stratification along the North-South gradient where signals of selection were previously reported. Signals detected using R15-sibs effect estimates were previously used as a significant source of evidence in favor of adaptation by Field et al. (2016), as well as in (Berg et al., 2017). However, the investigators leading the (Robinson et al., 2016) GWAS have now confirmed that the effect size estimates released from their 2015 study were strongly affected by population structure due to a computational bug (Robinson and Visscher, 2018). We include an analysis of the R15-sibs GWAS here to document how these spurious signals affected previous inferences, as well as the evidence that indicated the presence of confounding in the data prior to detection of the bug.

The conclusion that adaptation signals in GIANT were spurious has broader implications for GWAS analysis, as it indicates that standard approaches for population structure correction may not always be sufficient, and that further study is needed to understand their limitations. While we anticipate that current methods are likely adequate for many applications, in particular for identification and broad-scale localization of strong genotype-phenotype associations—they may be insufficient for applications such as phenotypic prediction and the detection of polygenic adaptation as these can be highly prone to the cumulative bias through uncorrected structure. Such analysis should be undertaken with great care.

Results

GWAS datasets

We downloaded or generated seven different height GWAS datasets, each relying on different subsets of individuals or using different analysis methods. The bold-faced text give the identifiers by which we will refer to each dataset throughout this paper. These include two previous datasets that show strong evidence for polygenic adaptation, as well as an updated version of the R15-sibs dataset released in response to results in the initial preprint version of this manuscript:

GIANT: (n = 253 k) 2014 GIANT consortium meta-analysis of 79 separate GWAS for height in individuals of European ancestry, with each study independently controlling for population structure via the inclusion of principal components as covariates (Wood et al., 2014).

R15-sibs: (n = 35 k) Family-based sib-pair analysis of data from European cohorts (Hemani et al., 2013; Robinson et al., 2015). The effect sizes associated with this paper were publicly released in 2016 (Robinson et al., 2016).

R15-sibs-updated: (n = 35 k) In November 2018, while this paper was in the final stages of revision, the authors of the R15-sibs data reported that their earlier data release failed to correct properly for structure confounding. They released this corrected dataset as a replacement (Robinson and Visscher, 2018).

We also considered four different GWAS analyses of the UK Biobank data, using different subsets of individuals and different processing pipelines:

UKB-GB: (n = 337 k) Linear regression controlling for 10 principal components of ancestry (unrelated British ancestry individuals only) (Churchhouse and Neale, 2017).

UKB-Eur: (n = 459 k) All individuals of European ancestry, including relatives. Structure correction was performed using a Linear Mixed Model (LMM) approach, which controls for genetic stratification effects by modeling the genetic background as a random effect with covariance structure given by the kinship matrix. Mild amounts of environmental stratification are subsumed into this term, and therefore controlled implicitly (Loh et al., 2017).

UKB-GB-NoPCs: (n = 337 k) Linear regression without any structure correction—with only genotype, age, sex and sequencing array as covariates (unrelated British ancestry individuals only) [newly calculated by us, see Materials and methods].

UKB-sibs: (n = 35 k) Family-based sib-pair analysis [newly calculated by us, see Materials and methods].

To understand the extent to which these different datasets capture a shared signal, we treated each set of summary statistics as if it were derived from a GWAS of a different phenotype and estimated the genetic correlation between them using bivariate LD Score regression (Bulik-Sullivan et al., 2015a). We find that all of these studies show high pairwise genetic correlations , consistent with the view that all of them estimate a largely-similar genetic basis of height (Table 2).

Table 2
Pairwise genetic correlations between GWAS datasets.

Genetic correlation estimates (lower triangle) and their standard errors (upper triangle) between each of the height datasets, estimated using LD Score regression (Bulik-Sullivan et al., 2015a). All trait pairs show a strong genetic correlation, as expected for different studies of the same trait.

https://doi.org/10.7554/eLife.39725.003
GiantR15-sibsUKB-EurUKB-GB-NoPCsUkb-gbUKB-sibs
GIANT(0.04)(0.01)(0.01)(0.01)(0.05)
R15-sibs0.98(0.04)(0.04)(0.05)(0.08)
UKB-Eur1.030.87(0.004)(0.004)(0.05)
UKB-GB-NoPCs1.010.821.00(0.002)(0.05)
UKB-GB1.030.891.021.00(0.05)
UKB-sibs1.020.931.061.021.06

Signal of selection across Eurasia

One well-studied signal of adaptation of height in Europe has been the observation that, among height-associated SNPs, the ‘tall’ alleles tend to be at higher frequencies in northern populations. Equivalently, the average polygenic scores of individuals in northern populations tend to be higher than individuals in southern populations. To evaluate this signal for each dataset, we independently ascertained the SNP with the smallest p-value within each of 1700 approximately independent LD blocks (Berisa and Pickrell, 2016; Berg et al., 2017) (subject to the constraint that MAF > 0.05 within the GBR 1000 Genomes population). We used these loci to calculate average polygenic scores for each of a set of European population samples taken from the 1000 Genomes and Human Origins panels (Lazaridis et al., 2014; 1000 Genomes Project Consortium et al., 2015) (see Materials and methods for statistical details).

As expected, we find highly significant latitudinal gradients in both the GIANT and R15-sibs data. However, this signal does not replicate in any of the four UK Biobank datasets (Figure 1, top row), or in the R15-sibs-updated dataset (Figure 1—figure supplement 1).

Figure 1 with 1 supplement see all
Polygenic scores across Eurasian populations, for different GWAS datasets.

The top row shows European populations from the combined 1000 Genomes plus Human Origins panel, plotted against latitude, while the bottom row shows all Eurasian populations from the same combined dataset, plotted against longitude.

https://doi.org/10.7554/eLife.39725.004

We also tested whether the polygenic scores are over-dispersed compared to a neutral model, without requiring any relationship with latitude (the QX test from Berg and Coop, 2014). Here we find a similar pattern: we strongly reject neutrality using both the GIANT and R15-sibs datasets, but see little evidence against neutrality among the UK Biobank datasets, or the R15-sibs-updated dataset (Figure 1). The sole exception is for the UKB-GB dataset, though the rejection of neutrality in this dataset is marginal compared to that observed with GIANT and R15-sibs, and it does not align with latitude.

While most studies have focused on a latitudinal cline in Europe, a preprint by Berg et al. (2017) also recently reported a cline of polygenic scores decreasing from west to east across all of Eurasia. Extending this analysis across all six datasets, we observe similarly inconsistent signals (Figure 1, bottom row). Only the GIANT dataset shows the clear longitudinal signal reported by Berg et al. (2017), though the R15-sibs dataset is again strongly over-dispersed in general, and retains some of the longitudinal signal. Interestingly, we find a weakly significant relationship between longitude and polygenic score in the UKB-Eur dataset (though not in the other UKB datasets), suggesting there may be systematic differences between the results based on British-only and pan-European samples, even when state of the art corrections for population structure are applied.

We also experimented with a larger number of SNPs using a procedure similar to Robinson et al. (2015) (Appendix 1). We found that in some cases this led to significant values of QX when using UKB-GB effect sizes to ascertain SNPs. However, this signal was sensitive to the particular method of ascertainment, and seems to be diffuse (i.e. spread out across all axes of population structure, Appendix 1–figure 6), with part of the signal coming from closely linked SNPs. Thus we conclude that this signal is not robust and may, at least partially, arise from a violation of the assumption of independence among SNPs that underlies our neutral model. We also tested different frequency, effect size and probability-of-association cutoffs to determine which SNPs we included in the computation of the scores, but found none of these cutoffs affected the discrepancy observed between the GIANT and UKB-GB datasets (Appendix 2).

SDS signal of selection in Britain

We next evaluated the Singleton Density Score (SDS) signal of selection for increased height in the British population, previously reported by Field et al. (2016). SDS estimates recent changes in allele frequencies at each SNP within a population by comparing the distances to the nearest singleton variants linked to each of the focal SNP’s alleles. (Field et al., 2016) applied SDS calculated across the UK10K sample (UK10K Consortium, 2015) to investigate allele frequency changes in the ancestors of modern British individuals. SDS can be polarized according to the sign of a GWAS effect at each SNP–this is denoted trait-SDS, or tSDS. Here, tSDS > 0 indicates that a height-increasing allele has risen in frequency in the recent past; tSDS < 0 correspondingly indicates a decrease in frequency of the height-increasing allele. A systematic pattern of tSDS > 0 is consistent with directional selection for increased height.

Using both GIANT and R15-sibs, Field et al. (2016) found a genome-wide pattern of positive tSDS, indicating that on average, height-increasing alleles have increased in frequency in the last ∼75 generations. tSDS also showed a steady increase with the significance of a SNP’s association with height. We replicate these trends in Figure 2A,B.

Figure 2 with 1 supplement see all
SDS signals for recent selection, assessed using different height GWAS.

(A–F) Each point shows the average tSDS (SDS polarized to height-increasing allele) of 1000 consecutive SNPs in the ordered list of GWAS p-values. Positive values of tSDS are taken as evidence for selection for increased height, and a global monotonic increase—as seen in panels A and B—suggests highly polygenic selection. (G–L) tSDS distribution for the most significant SNPs in each GWAS, thinned according to LD to represent approximately independent signals. Dashed vertical lines show tSDS = 0, as expected under the neutral null; solid vertical lines show mean tSDS. A significantly positive mean value of tSDS suggests selection for increased height.

https://doi.org/10.7554/eLife.39725.006

This tSDS trend is greatly attenuated in all four GWAS versions performed on the UK Biobank sample (Figure 2C–F), as well as the R15-sibs-updated dataset (Figure 2—figure supplement 1). The correlation between UKB-GB GWAS p-value and tSDS is weak (Spearman ρ=0.009, block-jackknife p=0.04). This correlation is stronger for the UKB-Eur GWAS (ρ=0.018, p=5×107). Since the UKB-Eur GWAS is not limited to British individuals—but instead includes all European ancestry individuals—this might suggest that residual European population structure continues to confound UKB-Eur effect estimates, despite the use of LMM correction for structure, similar to the longitudinal signal detected above for this same dataset (Figure 1).

We wondered whether the main reason for the weakened trend in UKB-GB is an overly conservative PC-correction. This could occur if the genetic contribution to height is highly correlated with population structure axes. If this were the case, we would expect the correlation between GWAS p-value and tSDS to still be observed in a UKB GWAS without population structure correction (namely, in UKB-GB-NoPCs). However, we see no evidence for this correlation (block jackknife p = 0.6). Taken together with the UKB-GB-NoPCs polygenic score analysis (Figure 1), the lack of signal in UKB-GB-NoPCs suggests that the main reason that UKB is less confounded by population structure than GIANT is the relatively-homogeneous ancestry of the UKB British sample—rather than differences in GWAS correction procedures.

Lastly, we examined tSDS at the most significant height-associated SNPs of each UKB GWAS (as before, ascertained in approximately-independent LD blocks). Significant SNPs show a positive average tSDS (Figure 2I,K,L; t-test p < 0.05)—with the exception of the UKB-GB-NoPCs GWAS (Figure 2J) in which the average tSDS is not significantly different from zero (t-test p = 0.6).

Relationship between GWAS estimates and European population structure

We have now shown that signals of polygenic adaptation of height are greatly reduced in the UKB data relative to the GIANT and R15-sibs datasets. To better understand the differences among the datasets, we ascertained 1,652 approximately-independent lead SNPs based on the GIANT p-values to form the basis of comparison between the GIANT and UKB-GB datasets.

Figure 3A shows the effect sizes of ancestral alleles, as estimated using GIANT (x-axis) and UKB-GB (y-axis). The two datasets are highly correlated (r2=0.78, p<2.2×1016), consistent with the strong genetic correlation estimated in Table 2. The fact that the slope is <1 probably reflects, at least in part, the standard winner’s curse effect for SNPs ascertained in one study and replicated in another.

Figure 3 with 2 supplements see all
Effect size estimates and population structure.

Top Row: SNPs ascertained using GIANT compared with UKB-GB. (A) The x- and y-axes show the estimated effect sizes of SNPs in GIANT and in UKB-GB. Note that the signals are highly correlated overall, indicating that these partially capture a shared signal (presumably true effects of these SNPs on height). (B) The x-axis shows the difference in ancestral allele frequency for each SNP between 1000 Genomes GBR and TSI; the y-axis shows the difference in effect size as estimated by GIANT and UKB-GB. These two variables are significantly correlated, indicating that a component of the difference between GIANT and UKB-GB is related to the major axis of population structure across Europe. Bottom Row: SNPs ascertained using R15-sibs compared with UKB-GB. (C) The same plot as panel (A), but ascertaining with and plotting R15-sibs effect sizes rather than GIANT. Here, the correlation between effect size estimates of the two studies is reduced relative to panel A–likely due to the lower power of the R15-sibs study compared to GIANT (D). Similarly, the same as (B), but with the R15-sibs ascertainment and effect sizes.

https://doi.org/10.7554/eLife.39725.008

Importantly however, we also see clear evidence that the differences between the GIANT and UKB-GB effect sizes are correlated with European population structure (Figure 3B). Specifically, for each SNP we plotted the difference in effect size between GIANT and UKB-GB against the difference in allele frequency between northern and southern European samples (specifically, between the British (GBR) and Tuscan (TSI) subsets of 1000 Genomes) . These differences have a significant correlation (r2=0.06, p<2.2×1016), indicating that alleles that are more frequent in GBR, compared to TSI, tend to have more positive effect sizes in GIANT than in UKB-GB, and vice versa. We also observed a similar signal for frequency differences between TSI and the Han Chinese in Beijing (CHB, Figure 3—figure supplement 1), suggesting that longitudinal patterns observed by Berg et al. (2017) were also likely driven by incompletely controlled stratification in GIANT.

Similar patterns are present in a comparison of the R15-sibs and UKB-GB datasets when ascertaining from R15-sibs p-values (Figure 3, panels C and D; 1,642 SNPs). Here, the correlation between effect size estimates is much lower (r2=0.14, p<2.2×1016), likely due to the much smaller sample size of R15-sibs, and therefore elevated winner’s curse effects. However, the correlation between the effect-size difference and the GBR-TSI allele frequency difference remains (r2=0.07, p<2.2×1016). In contrast, when SNPs are ascertained on the basis of their UKB-GB p-value, these patterns are considerably weaker in both the GIANT and R15-sibs datasets (Figure 3—figure supplement 2).

Finally, an unexpected feature of the R15-sibs dataset can be seen in Figure 3D: there is a strong skew for the ancestral allele to be associated with increased height (1,201 out of the 1,642 SNPs ascertained with R15-sibs p-values have positive effect sizes in R15-sibs). This pattern is not present in the R15-sibs-updated dataset (851 out of 1,699 SNPs with positive effects), or any other dataset we analyzed, suggesting that it is likely a result of the failure to control for population structure.

Together, these observations suggest that while all of the datasets primarily capture real signals of association with height, both the GIANT and R15-sibs effect size estimates suffer from confounding along major axes of variation in Europe and Eurasia. This could drive false positive signals in geographic-based analyses of polygenic adaptation. Furthermore, since SDS measured in Britain correlates with north-south frequency differences (Field et al., 2016), this could also drive false positives for SDS.

To explore this further, we next turn to an analysis of the datasets based on LD Score regression.

LD Score regression signal

Another line of evidence from Field et al. (2016) came from LD Score regression (Bulik-Sullivan et al., 2015a; Bulik-Sullivan et al., 2015b). LD Score regression applies the principle that, under a polygenic model, SNPs in regions of stronger LD (quantified by a SNP's ‘LD Score’) should tag more causal variants and therefore have larger squared effect size estimates. Similarly, if two traits share a genetic basis, then the correlation between estimated effect sizes of these traits should increase with LD Score. Meanwhile, confounders such as population structure are argued to affect SNPs of different LD Score equally, and therefore affect the intercept but not the slope of a linear regression to LD Score (we return to this point below; Bulik-Sullivan et al., 2015b).

While LD Score regression is commonly used to estimate the genetic covariance between pairs of phenotypes (Bulik-Sullivan et al., 2015a), Field et al. (2016) used it to test for a relationship between height and SDS. SDS is similar to GWAS effect estimates in that the expected change in frequency of an allele depends on both direct selection it experiences due to its own fitness effect as well as correlated selection due to the effects of those in linkage disequilibrium with it. Field et al. (2016) predicted that the covariance between estimated marginal height effect and SDS should increase with LD Score—and found this to be the case using both GIANT and R15-sibs. This provided further evidence for polygenic adaptation for increased stature in Britain.

Here, we revisit Field et al. (2016)’s observations (Figure 4A,B). Both GIANT and R15-sibs exhibit a highly significant LD Score regression slope (scaled GIANT slope =0.17, p=5×10-9; scaled R15-sibs slope =0.46, p=7×1017), as well as a highly significant intercept (GIANT intercept =0.093, p=4×1071; R15-sibs intercept =0.119, p=2×1087). These large intercepts suggest that both GWAS suffer from stratification along an axis of population structure that is correlated with SDS in the British population. In contrast, in LD Score regression with the UKB-GB GWAS, the intercept is not significant (p=0.10), suggesting that UKB-GB is not strongly stratified (or at least, not along an axis that correlates with SDS). The slope is ∼1/3 as large as in GIANT, though still modestly significant (p=1.2×102, Figure 4C). There is no significant slope (p=0.389) or intercept (p=0.405) in R15-sibs-updated (Figure 4—figure supplement 1B), and analyses of other UKB datasets give similar results to those for UKB-GB (Figure 4—figure supplement 2).

Figure 4 with 3 supplements see all
LD Score regression analyses.

(A), (B), and (C) LD Score covariance analysis of SDS with GIANT, R15-sibs, and UKB-GB, respectively. The x-axis of each plot shows LD Score, and the y-axis shows the average value of the product of effect size on height and SDS, for all SNPs in a bin. Genetic correlation estimates are a function of slope, reference LD Scores, and the sample size Bulik-Sullivan et al. (2015a). Both the slope and intercept are substantially attenuated in UKB-GB. (D), (E) and (F) Genetic covariance between GBR-TSI frequency differences vs. GIANT, R15-sibs, and UKB-GB. GIANT and R15-sibs show highly significant nonzero intercepts, consistent with a signal of population structure in both datasets, while UKB-GB does not. In addition, R15-sibs shows a significant slope with LD Score.

https://doi.org/10.7554/eLife.39725.011

LD Score regression of population frequency differences

We next extended Field et al.’s LD Score rationale from SDS to test whether SNP effects on height affected allele frequency differentiation between northern and southern Europe. We used 1000 Genomes British (GBR) and Tuscan (TSI) samples as proxies for northern and southern ancestry respectively. To control for the correlation between allele frequencies and LD Score, we normalized the frequency differences to have variance 1 within 1% average minor allele frequency bins. For shorthand, we refer to this measure as [GBR-TSI]. Under a model of selection driving allele frequency differences, we would expect the covariance of [GBR-TSI] and effect sizes to increase with LD Score. To test this, we regressed the product [GBR-TSI] × effect size (estimated in previous and UKB GWAS) against LD Score.

In contrast to SDS, we find that none of the GWAS datasets show a strongly positive slope (Figure 4D–F): the slope is approximately zero in GIANT, weakly positive in R15-sibs (p=0.002), and weakly negative in UKB-GB (p=0.09) Results were similar for the other UKB datasets (Figure 4—figure supplement 3), and for R15-sibs-updated (Figure 4—figure supplement 1A). We see extremely strong evidence for positive intercepts in GIANT (p = 4×10-80) and R15-sibs (p = 9×10161), but not in UKB-GB (p=0.05), R15-sibs-updated (p=0.848) or the other UKB datasets. The large intercepts in GIANT and R15-sibs are consistent with stratification affecting both of these GWAS, as the North-South allele frequency difference is systematically correlated with the effect sizes in these GWAS independently of LD Score (see the Materials and methods for a more technical discussion). However, the relative lack of slope in these analyses suggests that the LD Score signal for SDS must be driven by a component of frequency change that is largely uncorrelated with the [GBR-TSI] axis of variation.

Population structure confounds LD Score regression slope

The original LD Score regression paper noted that in the presence of linked selection, allele frequency differentiation might plausibly increase with LD Score. However, they concluded that this effect was negligible in the examples they considered (Bulik-Sullivan et al., 2015b), and subsequent applications of the LD Score regression approach have generally assumed that the two are independent. We find that in bivariate LD Score analyses of SDS, both the intercept and the slope differ significantly from zero for precisely the same GWAS datasets that show evidence of stratification (GIANT and R15-sibs), while both the slope and intercept are much reduced in all of the UK Biobank datasets. This suggests that the LD Score regression slope may not be as robust to stratification as hoped, prompting us to revisit the assumption of independence.

We find that the squared allele frequency difference [GBR-TSI]2 is significantly correlated with LD Score (p=2.5×10-5, Figure 5A), as are squared allele frequency contrasts for much lower levels of differentiation (i.e. between self-identified ‘Irish’ and ‘White British’ individuals in the UK Biobank (p=2.5×10-7, Figure 5—figure supplement 1), and SDS2 (p=2.9×10-42, Figure 5B). Strikingly, squared measures of more recent allele frequency change (i.e. SDS2 and the squared Irish vs. White British contrast) are much more tightly correlated with LD Score than that of more diverged populations [GBR-TSI]2, suggesting that the LD Score regression slope may be equally vulnerable to stratification involving closely related populations than for those that are more distantly related. Finally, the product [GBR-TSI] × SDS is also correlated with LD Score (Figure 5C), demonstrating that the general signal of greater allele frequency change in regions of stronger LD is also shared between [GBR-TSI] and SDS.

Figure 5 with 1 supplement see all
Population allele frequencies show genetic correlation with European height GWAS.

(A), (B) and (C) Magnitude of squared GBR-TSI allele frequency differences, squared SDS effect sizes, and the product of allele frequency and SDS increase with LD Score. Both SDS and GBR-TSI frequency difference are standardized and normalized within 1% minor allele frequency bins..

https://doi.org/10.7554/eLife.39725.015

Background selection and LD Score

As noted above, the correlation we observe between allele frequency differentiation and LD Score could be generated by the genome-wide effects of linked selection. While a range of different modes of linked selection likely act in humans, one of the simplest is background selection (Charlesworth et al., 1993; Charlesworth, 1998; McVicker et al., 2009). Background selection (BGS) on neutral polymorphisms results from the purging of linked, strongly deleterious alleles. Because any neutral allele that is in strong LD with a deleterious mutation will also be purged from the population, the primary effect of BGS is a reduction in the number of chromosomes that contribute descendants in the next generation. The impact of BGS can therefore be approximately thought of as increasing the rate of genetic drift in genomic regions of strong LD relative to regions of weak LD. Therefore, SNPs with larger LD Scores will experience stronger BGS and a higher rate of genetic drift, and this effect could generate a positive relationship between LD Scores and allele frequency differentiation.

In Appendix 3, we derive a simple model for the effect of BGS on the relationship between allele frequency divergence and LD Scores. Empirically, we find that LD Scores are positively correlated with the strength of background selection (Appendix 3—figure 1) (McVicker et al., 2009), and that our simple model of background selection is capable of explaining much of the relationship between LD Score and allele frequency divergence that we observe in Figure 5 (Appendix 3—figures 13). Further, in the presence of BGS, bivariate regression of a stratified GWAS together with a measure of allele frequency differentiation can result in a positive slope, provided that the axis of stratification is correlated with the chosen measure of allele frequency divergence (Appendix 3).

Summary of LD Score regression results

What conclusions should we draw from our LD Score regression analyses? The significant positive intercepts observed for LD Score regression of both GIANT and R15-sibs with [GBR-TSI] suggest that both datasets suffer from confounding due to stratification along a north-to-south axis within Europe. These observations are consistent with the evidence presented in Figure 3. A positive slope in such analyses was previously interpreted as evidence of positive selection either on height or a close genetic correlate (and presumed to be robust to stratification). However, BGS can, and empirically does, violate the assumptions of LD Score regression in a way that may generate a positive slope. We therefore interpret the positive slopes observed for the LD Score regression signals for GIANT and R15-sibs with SDS as likely resulting from a combination of stratification and BGS. A similar conclusion applies to the positive slope observed for R15-sibs × [GBR-TSI]. It is unclear why stratification plus BGS should have elevated the slope for GIANT × SDS, but not for GIANT × [GBR-TSI]. This may suggest that the apparent SDS selection signal found in GIANT may be driven by an axis of variation that is not strongly correlated with [GBR-TSI]. We view this as an area worthy of further exploration.

Discussion

To summarize the key observations, we have reported the following:

  • Multiple analyses based on GIANT and R15-sibs indicate strong signals of selection on height.

  • However, the same signals of selection are absent or greatly attenuated in UK Biobank data. In some, but not all, analyses of frequency differentiation and SDS we still detect weakly significant signals of polygenic adaptation (Figures 1 and 2).

  • The GIANT height GWAS is overall highly correlated with UKB-GB, but differs specifically by having an additional correlation with the main gradient of allele frequency variation across Europe, as modeled by frequency differences between GBR and TSI (Figure 3). LD Score analysis of [GBR-TSI] × GIANT effect-size also suggests that GIANT is stratified along this axis (positive intercept in Figure 4D).

  • Selection signals in the R15-sibs data are consistent with, and in some cases even stronger, than the corresponding signals in GIANT, but are inconsistent with analyses using UK Biobank data. While correctly implemented sib-based studies are designed to be impervious to population structure, [GBR-TSI] × R15-sibs effect size also shows a highly positive intercept in the LD Score analysis presented in Figure 4E. As discussed below, the R15-sibs authors have recently identified a bug in the pipeline that generated this dataset. Analysis of corrected summary statistics does not show such a signal (Figure 4—figure supplement 1).

  • LD Score analyses show a much stronger relationship between SDS and GIANT or R15-sibs than between SDS and UKB-GB. LD Score regression is generally considered to be robust to population structure (but see the discussion in Bulik-Sullivan et al., 2015b). However, the intensity of background selection increases with LD Score (Figure 5, Appendix 3), and this has likely inflated the LD Score-based signal of selection in GIANT and R15-sibs.

In principle, it is possible that height in the UKB is confounded in a way that suppresses the signal of height adaptation. Instead, multiple lines of evidence strongly suggest that population-structure confounding in GIANT and R15-sibs is the main driver of the discrepancy with UKB-based analyses.

The sib design used by Robinson et al offered a strong independent replication of the polygenic adaptation signal, which should have been impervious to population structure concerns. However, our analyses highlight multiple signs of stratification in this study. Robinson et al have now confirmed (as of November 2018) that effect sizes they released from their 2015 study were strongly affected by population stratification due to a bug. Furthermore, they have now stated that the effect sizes that they released publicly in 2016 were not the effect sizes used in their 2015 paper. As part of our own investigation, we have independently confirmed that sib-studies conducted using PLINK v1.90b5 are robust to environmental stratification (Appendix 4–figure 1). Our analyses using the newly rerun effect sizes released by Robinson and Visscher, 2018 show no consistent signal of selection (Figure 1—figure supplement 1, Figure 2—figure supplement 1. and Figure 4—figure supplement 1), in line with our UKB-based analyses.

GIANT was conducted as a collaboration among a large number of research groups that provided summary statistics to the overall consortium. While the overall value of this pioneering dataset is not in question, it would not be surprising in retrospect if this GWAS were impacted by residual stratification along major axes of population structure.

Lastly, we must conclude that the strong signal of LD Score genetic covariance between SDS and both GIANT and R15-sibs is largely spurious. This would imply that the LD Score regression slope is not robust to population structure confounding. Specifically, we demonstrated that background selection—through its correlation with LD Score—can potentially generate a spurious LD Score regression slope.

Taken together, these observations lead us to conclude that what once appeared an ironclad example of population genetic evidence for polygenic adaptation now lacks any strong support. That said, there is still strong evidence that typical GWAS, including GIANT, do capture genuine signals of genotype-phenotype associations. For example, GWAS datasets regularly show strong functional enrichments of heritability within active chromatin from trait-relevant tissues (Finucane et al., 2015), and the observation that top SNPs identified in GIANT tend to replicate in UKB-GB (Figure 3A), together with the high genetic correlations among all of the datasets (Table 2), suggest that the vast majority of the signal captured by GIANT is real.

Nonetheless, we have shown that GIANT effect-size estimates contain a component arising from stratification along a major axis of European population structure (Figure 3B), and one would like to know the extent to which the conclusions from other analyses of GIANT, or other GWAS, may be affected. A complete investigation of this is beyond the scope of this study, and will depend on the nature of the analyses performed. The problem is likely most acute for the analysis of polygenic scores in samples drawn from heterogeneous ancestry. This is because while the bias in detection and effect sizes at any individual locus is small, the systematic nature of biases across many loci compound to significant errors at the level of polygenic scores. This error substantially inflates the proportion of the variance in polygenic scores that is among populations. Individual level prediction efforts therefore suffer dramatically from stratification bias, as even small differences in ancestry will be inadvertently translated into large differences in predicted phenotype (Kerminen et al., 2018). This seems likely to remain a difficult complication even within datasets such as the UK Biobank, though we suspect that meta-analyses such as GIANT, which collate summary statistics from many sources, may be particularly sensitive to structure confounding.

These issues are apparent even within our UK Biobank results, where we see marked differences between results based on UKB-GB and UKB-Eur (Figure 2C,I vs. D, J and Figure 4—figure supplement 3). The study subjects in the two datasets were largely overlapping, and both were computed using widely-accepted structure-correction methods, suggesting that in the more demanding setting of broad European ancestry variation, the linear mixed model approach did not provide complete protection against stratification. This highlights a need for renewed exploration of the robustness of these methods, especially in the context of polygenic prediction.

The study of polygenic scores across ancestry and environmental gradients offers a range of promises and pitfalls (Berg and Coop, 2014; Novembre and Barton, 2018). Looking forward, we recommend that studies of polygenic adaptation should focus on datasets that minimize population structure (such as subsets of UKB), and where the investigators have access to full genotype data, including family data, so that they can explore sensitivity to different datasets and analysis pipelines.

Materials and methods

Newly calculated GWAS

Figure 1 and Figure 2 display analyses based on six different GWAS. Two of these GWAS were newly calculated by us using UK Biobank data. Below, we describe the specifics of these two GWAS.

UKB-GB-NoPCs

Request a detailed protocol

To preform this GWAS, we used the following plink v. 2.0 (Purcell et al., 2007) with command line as follows:

  • plink2 --memory 64G --threads 16 --linear

  • --bpfile UKB imputed SNPs bp file

  • --keep id list of individuals self-identified as ‘White British’

  • --out output file

  • --pheno standing height phenotype file (UKB phenotype 50.0.0)

  • ---covar covariates file

The covariates file included only the sex, age and sequencing array for each individual id. We filtered all AG or CT SNPs–to prevent the possibility of strand errors. Finally, we excluded SNPs for which SDS was not calculated in Field et al. (2016).

UKB-sibs

Request a detailed protocol

We used the estimated kinship coefficient (ϕ) and the proportion of SNPs for which the individuals share no allele (IBS0) provided by the UK Biobank, to call siblings as pairs with

125/2<ϕ<123/2

and IBS0 >0.0012—following the conditions used by Bycroft et al. (2018). We further filtered sibling pairs such that both individuals were ‘White British’, their reported sex matched their inferred sex, were not identified by the UK Biobank as ‘outliers’ based on heterozygosity and missing rate nor had an excessive number relatives in the data, and had height measurements. We standardized height values for each sex based on its mean and standard deviation (SD) values in the sample of 336,810 unrelated British ancestry individuals: mean 175.9 cm and SD 6.7 cm for males, and mean 162.7 cm and SD 6.2 cm for females. We also removed pairs if one of the siblings was more than 5 SD away from the mean. After applying all filters, 19,268 sibling pairs remained, equaling 35,524 individuals in 17,275 families. We performed an association analysis on 10,879,183 biallelic SNPs included in UKB-GB (converting dosages from imputation to genotype calls using no hard calling threshold), using plink v. 1.9 (Purcell et al., 2007) QFAM procedure with the following command:

  • plink --bfile UKB hard-called SNPs file

  • -out output file

  • -qfam mperm = 100000

The family relationships, as well as the phenotypic values, were encoded in plink FAM files.

GBR-TSI allele frequency differences

Request a detailed protocol

Individuals from the GBR and TSI populations from 1000G Phase 3 (N = 189) (1000 Genomes Project Consortium et al., 2015) were assigned binary phenotype labels and a χ2 test was run using plink (Purcell et al., 2007) with a Hardy-Weinberg equilibrium cutoff of 1e-6 (--hwe 1e-6) and missing genotype rate of 0.05 (--geno 0.05), but otherwise with default parameters. Additionally, a firth adjusted logistic regression (Firth, 1993) was run and produced qualitatively similar results (data not shown).

IRL-GBR allele frequency differences

Request a detailed protocol

Unrelated individuals, defined using estimates from KING (Manichaikul et al., 2010), who self-identified as White British or White Irish in the UK Biobank were compared with distinct phenotype labels. Logistic regression (Hill et al., 2017) was run on the genotyped SNP set using plink2 (Chang et al., 2015) with a Hardy-Weinberg equilibrium cutoff of 1e-6 (--hwe 1e-6) and missing genotype rate of 0.05 (--geno 0.05).

Polygenic score analyses

Population genetic datasets

Request a detailed protocol

1000 Genomes Phase 3 VCF files were downloaded from the 1000 Genomes website, and VCF files for the Human Origins dataset were downloaded from the ‘Affymetrix Human Origins fully public dataset’ link on the Reich lab website and subsequently imputed to full genomes using the Michigan imputation server (Das et al., 2016). Because the Human Origins panel includes some 1000 Genomes populations, individual IDs were compared between the two datasets, and any duplicates were removed from the Human Origins dataset. Individuals were then clustered into populations based on groupings provided by each data resource, and allele frequencies were calculated using VCFtools version 0.1.15.

Neutrality tests

Request a detailed protocol

In Figure 1, we employ two separate tests to assess the evidence that the distribution of polygenic scores among populations is driven in part by adaptive divergence. Both are based on a simple null model introduced by Berg and Coop (2014), which states that the distribution of polygenic scores under neutrality should be approximately multivariate normal. Here, we give a brief overview of the assumptions and calculations underlying the null model, before describing the two tests used in Figure 1. For a more complete treatment, see Berg and Coop (2014).

Let p be the vector of population allele frequencies at SNP , while α is the effect size for SNP {1,,L}. Then, population level polygenic scores are given by

(1) Z=2αp.

Under neutrality, the distribution of polygenic scores among populations should be approximately

(2) ZMVN(μ1,2VAF)

where

(3) μ=2αϵ
(4) VA=2α2ϵ(1ϵ)

where ϵ is the mean of p across populations. The matrix 𝐅 gives the population level co-ancestry among populations. Here, we calculate the matrix 𝐅 directly from the same set of SNPs used to calculate polygenic scores, which is a conservative procedure. Concretely, let

(5) x=p-ϵϵ(1-ϵ).

Then, if 𝐗 is a matrix with the x as columns, we have

(6) 𝐅=1L-1𝐗𝐗T.

Now, based on this null model, we perform two separate neutrality tests. One is a general over-dispersion test (i.e. the 'QX test’ from Berg and Coop, 2014), for which the test statistic is

(7) QX=(Z-μ)T𝐅-1(Z-μ)2VA.

For M populations, this statistic is expected to have a χM-12 distribution under the multivariate normal null model (Equation 2). An unusually large value of QX indicates that the neutral null model is a poor fit, and is therefore taken as evidence in favor of selection.

We also apply a second, more specific test, to test for evidence of a correlation with a specific geographic axis that is unusually strong compared to the neutral expectation. For any vector Y, if Z has a multivariate normal distribution given by Equation 2, then

(8) YTZN(μYT1,2VAYTFY)

and therefore

(9) (YTZμYT12VAYTFY)2χ12

under the multivariate normal null. This fact can be used to test for an unexpectedly strong association between polygenic scores and a geographic axis by choosing Y to be the vector of latitudes or longitude across populations.

tSDS vs. GWAS significance

Polarizing SDS into tSDS

Request a detailed protocol

To analyze tSDS as a function of GWAS p-value, we first divided SNPs into 5% minor allele frequency bins. We standardized SDS values—subtracted the mean and divided by the standard deviation—within each bin. While SDS values were already standardized in a similar manner by Field et al. (2016), we re-standardized SDS because the post-filtering composition of SNPs after in each GWAS was variable across GWAS. We then assigned tSDS values to each SNP by polarizing SDS to the tall allele. In other words, we set

(10) tSDS:={SDS,{derived} = {tall}SDS,otherwise

where derived is the derived allele in UK10K (by which SDS was polarized in Field et al., 2016), and tall is the height increasing allele in the GWAS. We only used sites for which SDS values are available. Notably, this implicitly means that sites with minor allele frequency lower than 5% in UK10K were filtered out, due to the filtering used in Field et al. (2016).

Assessing significance of the correlation between GWAS p-value and tSDS

Request a detailed protocol

Figure 2 illustrates the correlation between tSDS and GWAS p-value (p-value for the strength of association with height). We assessed the significance of the correlation between the two while accounting for LD between SNPs. To do this, we used a blocked-jackknife approach (Kunsch, 1989; Busing et al., 1999) to estimate the standard error of our Spearman correlation point estimate, ρ^. For each GWAS, SNPs were assigned to one of b = 200 contiguous blocks based on concatenated genomic coordinates. tSDS values should not be correlated across such large blocks. For each block i, we computed the Spearman correlation in the i’th jackknife sample, ρ^(-i)b—that is the Spearman correlation across all SNPs but the SNPs in block i. We then estimated the standard error of the point Spearman estimate by σ^, where

σ^2=b1bi=1b(ρ^(i)bρb¯),

and.

ρb¯=1bi=1bρ^(i)b

is the average of jackknife samples. Finally, we compute a p-value for the null hypothesis.

H0:ρ=0,

by approximating ρ^ as Normally distributed under the null with standard deviation σ^, namely.

ρ^N(0,σ^).

LD Score regression

Request a detailed protocol

Summary statistics for traits were filtered and allele flipped using munge_sumstats.py (a python program provided by Bulik-Sullivan et al., 2015b), with the default filters. All regressions were performed using the LD Score Regression package, using the LD Scores derived from the 489 unrelated European individuals in 1000 Genomes Phase III and a modified SNP set that excluded the HLA, LCT, and chromosome eight inversion loci.

For genetic correlations of traits presented in Table 2, raw summary statistics were used. For other analyses, effect sizes of SNPs within each 1% minor allele frequency bin (as estimated by the 489 Europeans) were normalized to mean 0 and standard deviation 1, and those normalized statistics were used for downstream analyses. The standard two-step regression method from LD Score regression was used, with the default of 200 jackknife bins and a chi-square cutoff of 30, though results with UKB-GB were reasonably robust to a wide range of bin sizes and cutoffs.

Data availability statement

Request a detailed protocol

The GWAS generated from the UK Biobank for this paper have been uploaded to Dryad: https://doi.org/10.5061/dryad.mg1rr36

The study also makes use of various publicly available GWAS datasets:

The data from the GIANT consortium GWAS, conducted by Wood et al. (2014) is available at the GIANT consortium website: https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files

The UK Biobank GWAS of individuals of ‘White British’ ancestry only (UKB-GB), conducted by Churchhouse and Neale (2017), is available at:http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank

The UK Biobank GWAS of individuals of broadly European ancestral (UKB-Eur), conducted by Loh et al. (2017), is available at: https://data.broadinstitute.org/alkesgroup/UKBB/

Sibling GWAS data from Robinson et al. (2015) released in 2016 and now known to have been impacted by a computational bug, (R15-sibs, (Robinson et al., 2016)) as well as the newly rerun 2018 data (R15-sibs-updated, (Robinson and Visscher, 2018)) are both available at: http://cnsgenomics.com/data.html

Copies of these datasets are independently archived at: https://github.com/jjberg2/height-data.

Appendix 1

Expanded SNP Sets

Some analyses of polygenic score variation among populations have used many more than the SNPs we use in our main text analyses, in the hope of increasing power to detect adaptive divergence (e.g. Robinson et al., 2015). Here, we use three alternative ascertainment schemes that increase the number of SNPs used, and apply them to the UKB-GB GWAS to determine the resulting effect on the signal of selection:

20k

19,848 genotyped SNPs ascertained from the UKB-GB dataset by running plink’s clumping procedure with r2<0.1, a maximum clump size of 1Mb, p<0.01, and using 10,000 randomly selected unrelated British ancestry individuals as the reference for LD structure.

5k

4,880 SNPs with the smallest p values subsampled from the 20k ascertainment.

HapMap5k

5,675 SNPs ascertained from UKB-GB GWAS SNPs after first restricting to HapMap3 SNPs (International HapMap 3 Consortium, 2010), using the same plink clumping procedure as the 20k ascertainment. This HapMap3 ascertainment was performed in order to mimic the ascertainment in Robinson et al. (2015).

We also tested two alternative ascertainments of the R15-sibs-updated dataset, described immediately below. While the majority of this appendix focuses on analyses of the three alternate ascertainments of the UKB-GB datasets, described above, we include a brief analysis of these R15-sibs-updated alternate ascertainments.

R15-sibs-updated-3.5k

3,579 SNPs ascertained from the R15-sibs-updated dataset by running plink’s clumping procedure with r2<0.1 a maximum clump size of 1Mb, p<0.01, using the same set of 10,000 randomly selected unrelated British ancestry individuals as the reference for LD structure.

R15-sibs-updated-22k

22,243 SNPs ascertained under the same plink setting as R15-sibs-updated-3.5k, but with the p value threshold relaxed to p<0.1.

For each expanded SNP set, we applied both the general QX test for overdispersion, as well as the specific test for a correlation with latitude (both tests are outlined in the Materials and methods). In all three datasets, the relationship between polygenic scores and latitude was consistent with neutrality. However, in both the 20k and HapMap5k datasets, we can reject the neutral model, as the QX p value is 1.68 × 10−3 for 20k and 9.88 × 10−9 for HapMap5k. On the other hand, 5k is not significant, with a p value of 0.75.

We were concerned that the rejection of the neutral null with 20k and HapMap5k ascertainments may be partly due to the higher proximity of SNPs included—leading to deviations from the independent evolution assumption of the neutral model underlying the QX hypothesis test. To investigate this, we leveraged a decomposition of the QX statistic in terms of the underlying loci used to calculate the polygenic scores. Specifically, Berg and Coop (2014) showed that QX can be expressed in terms of and 'FST-like' component, which describes the extent to which loci underlying the polygenic scores are marginally overdispersed, and an 'LD-like' component, which describes the extent to which pairs of loci which affect the trait covary in their allele frequencies across populations. This decomposition can be written as

(A1) QX=(M1)2α2Var(p)VAFSTlike term+(M1)2ααCov(p,p)VALD-like term.

Here, we have assumed that the allele frequencies, pl, have been transformed so as to remove the influence of population structure. See the discussion surrounding equations 12-14 in Berg and Coop (2014) for a more complete explanation of this transformation.

Here, we extend this decomposition further, breaking the LD-like term into components as a function of the degree of physical separation of SNPs along the chromsome. Specifically, we define a set of partial QX statistics (pQX(k)), such that pQX(k) gives the contribution to from sites which are k SNPs apart on the chromosome:

(A2) pQX(k)=(M1)2,SkααCov(p,p)VA

where Sk denotes the set of SNP pairs which are k SNPs apart on the same chromosome (note that only SNPs included in the a given ascertianment are included for the purposes of counting how many SNPs apart any two SNPs are). So pQX(0) would give the ' FST term', while pQX(1) gives the component of the 'LD term' that comes from covariance between pairs of SNPs which do not have another SNP (that is included in the polygenic scores) physically located between them. pQX(2) gives the component that comes from covariance between pairs of SNPs separated by exactly one other SNP included in the polygenic scores, pQX(3) the component from SNPs separated by exactly two intervening SNPs, etc. We let S be the set of pairs which are on separate chromosomes, so that pQX() gives the contribution to QX coming from pairs of SNPs on different chromosomes. This decomposition retains the property that

(A3) QX=k=0KmaxpQX(k)+pQX(),

where Kmax is the maximum separation of two SNPs on any chromosome. We note that the pQX(k) terms are not independent of one another, but they are uncorrelated under the neutral null.

Appendix 1—figure 1
pQX(k) statistics for k=1:450 for the 20 k dataset.

The x axis gives the average physical distance between all pairs of SNPs contributing to a given pQX(k) statistic. The uptick in pQX(k) on the left side of the plot (i.e. small values of k) indicates that SNPs which are physically close to one another and have the same sign in their effect on height covary across population disproportionately as compared to more distant pairs of SNPs. Note that the number of pairs of SNPs (|Sk|) contributing to a given pQX(k) decreases as k increases, as smaller chromosomes have fewer pairs at larger distances than they do at shorter distances. This leads to a decrease in the variance of pQX(k) under the null as k increases. However, this decline in variance is not responsible for the decay in signal as k increases, as |Sk| remains approximately constant until well past the dashed vertical line, which indicates the distance between between the ends of chromosome 21 (the shortest chromosome, and therefore the first to drop out of the pQX(k) calculation).

https://doi.org/10.7554/eLife.39725.019
Appendix 1—figure 2
pQX(k) statistics for k=1:150 for the 5 k dataset.

The x axis gives the average physical distance between all pairs of SNPs contributing to a given pQX(k) statistic. The boxplots give an empirical null distribution of pQX(k) statistics derived from permuting the signs of all effect sizes independently (this empirical null was omitted from Appendix 1—figure 1 due to computational expense). In this case, SNPs that are physically close to one another do not contribute disproportionately to the signal.

https://doi.org/10.7554/eLife.39725.020
Appendix 1—figure 3
pQX(k) statistics for k=1:150 for the HapMap5k dataset.

The x axis gives the average physical distance between all pairs of SNPs contributing to a given pQX(k) statistic. The boxplots give an empirical null distribution of pQX(k) statistics derived from permuting the signs of all effect sizes independently (this empirical null was omitted from Appendix 1—figure 1 due to computational expense). The uptick in signal from pairs of SNPs physically nearby to one another is present in this dataset, again suggesting a role for physical linkage in contributing to the signal. However, note that in contrast to the 20 k and 5 k ascertainments, the HapMap5k ascertainment also has a large amount of signal from pQX(), which cannot be explained by linkage.

https://doi.org/10.7554/eLife.39725.021

In Appendix 1—figure 4 and 5 of this Appendix, we show the pQx statistics for various k values in these three different ascertainments. In both the 20 k and the HapMap5k ascertainments, pQX is higher for low k values–that is there is more signal coming from covariance among SNP pairs which are physically close to one another on the chromosome than from distant pairs. This indicates a role for linkage in generating the signals detected in these two ascertainments (we also observed these sort of signals in the R15-sibs-updated-3.5k and R15-sibs-updated-22k ascertainments; Appendix 1—figure 4 and 5). In contrast, we see no linkage-associated signal in the 5 k ascertainment (in fact, we see no signal whatsoever). The major difference between the signal we observe in the 20 k ascertainment and that in the HapMap5k ascertainment is that pQX() is strongly positive for the HapMap5k ascertainment, whereas it is weakly negative for the 20 k ascertainment. This difference in the strength of between-population LD between loci on separate chromosomes is largely responsible for the fact that the neutral null hypothesis is strongly rejected for the 20 k ascertainment, but only weakly so for the HapMap5k ascertainment.

Appendix 1—figure 4
pQX(k) statistics for the R15-sibs-updated-3.5k ascertainment.

Similar to the expanded UKB-GB ascertainments, the elevated signal from covariance among SNPs in adjacent bins suggests that the independence assumption of the neutral model is being violated.

https://doi.org/10.7554/eLife.39725.022
Appendix 1—figure 5
pQX(k) statistics for the R15-sibs-updated-22k.

Similar to Appendix 1—figure 4, the elevated signal from covariance among SNPs in adjacent or nearby bins suggests that the independence assumption of the neutral model is being violated. Similar to Appendix 1—figure 1, we omitted the sign flipping null due to computational expense.

https://doi.org/10.7554/eLife.39725.023

This heterogeneity of signals across different ascertainments suggests that the signals we do observe are unlikely to be the result of selection—but rather result from some other process or phenomenon which we do not fully understand. Perhaps the most unusual observation is the fact that the among chromosome component of QX (i.e. pQX()) is so strong from HapMap5k, when it is absent under all other ascertainments. This suggests a role for some ascertainment bias impacting SNPs included in the HapMap3 SNP set. This seems plausible, as SNPs included in the HapMap3 SNP set have an elevated minor allele frequency as compared to a genome-wide sample. While it seems plausible that patterns of among population LD would differ for SNPs included on genotyping platforms, it is not clear why among-population LD should be systematically positive with respect to the SNPs’ effect on height.

To better understand the signal observed in the HapMap5k ascertainment, we make use of an alternate decomposition of the QX statistic (Berg et al., 2017; Josephs et al., 2018). First, write the eigenvector decomposition of 𝐅 as 𝐔𝚲𝐔𝐓. The mth column of 𝐔 (Um) gives the mth eigenvector of 𝐅, and the mth diagonal entry of 𝚲 (λm) gives the mth eigenvalue of 𝐅. Note that because this eigen-decomposition is performed on the population level covariance matrix, they capture only the major axes of variation among our pre-specified population labels, in contrast to how PCA is usually done at the individual level in demographic inference applications. Now, we can define a statistic

(A4) QU(m)=((Z-μ)TUm)22λmVA

which has a χ12 distribution under the neutral null hypothesis. These statistics, like the pQX statistics, have the property that QX is given simply by their sum:

(A5)QX=(Zμ)TF1(Zμ)2VA(A6)=(Zμ)TUΛ1UT(Zμ)2VA(A7)=m((Zμ)TUm)22λmVA(A8)=mQU(m).

An unusually large value of QU(m) for a given choice of m is an indication that the polygenic scores are more strongly correlated with the mth axis of population structure than expected under the neutral null model. Therefore, once a signal is detected with QX, the QU statistics can be used to understand which specific axes of divergence among populations are responsible for generating the signal in QX.

In Appendix—figure 6, we show a quantile-quantile plot of the -log10 p values for the HapMap5k ascertainment, derived from comparing these QU statistics from the European set of populations to the χ12 distribution. It is particularly noteworthy that the signal in this ascertainment is diffuse, resulting from inflation of nearly all of the QU statistics, rather than just a few. This is a statement that the signal detected in the HapMap5k ascertainment results from the polygenic scores simply being more variable along all axes, rather than one particular axis of population structure. In general, we are skeptical that this represents a real signal of selection, particularly given how sensitive it is to ascertainment.

Appendix 1—figure 6
The QQ plot of -log10 P values for the QU statistics calculated a within-Europe sample using the HapMap5k ascertainment.

The systematic inflation of QU indicates a non-specific rejection of neutrality: polygenic scores are more variable in all directions than expected under the null model. This pattern is not expected under adaptive divergence of labeled populations.

https://doi.org/10.7554/eLife.39725.024

One biological hypothesis is that the HapMap5k ascertainment could suggest ancient assortative mating on the basis of height. Specifically, our neutral null model assumes that all loci drift independently. However, assortative mating on the basis of a phenotype will lead to a build-up of within population LD that is positive with respect to the direction of allelic effects on the trait—even among distant or unlinked alleles. As populations drifted apart, within-population LD due to assortative mating would get converted into among-population LD—causing a deviation from our null-model assumption of independent evolution across all loci. This phenomenon would result in populations drifting apart in height-associated loci faster than expected by the rest of the genome. This hypothesis is consistent with the diffusion of QX across all QU terms in HapMap5k. This hypothesis is also consistent with higher pQX for physically proximate SNPs, as assortative mating would leads to a stronger buildup of trait LD among pairs of loci which are tightly linked than for those that are not, which would lead to stronger among population LD among these loci as populations diverge

However, under this hypothesis, it is not clear why we would expect the uptick in pQx(k) for small k to be present in the HapMap5k and 20 k datasets but not the 5 k dataset, or why the pQx() signal should be present in only the HapMap5k dataset. At this point, we leave the assortative mating hypothesis outlined above as purely speculative, and leave further investigations for future work.

It also possible that all of the signals seen here are entirely the result of a violation of the assumption of independence among SNPs under the null model. This may be the case even for pQx() signals. Consider for example 3 SNPs, with SNPs 1 and 2 adjacent to one another on a chromosome, and SNP three located on a different chromosome. pQx() would include the covariance between SNP 1 and 3, and that between 2 and 3. However, if SNPs 1 and 2 covary, then these two covariance terms will be correlated with one another. Therefore, the variance of the pQx() term will be underestimated by a null that assumes independence among SNPs, even though all of the terms that contribute to it come from covariance among SNPs on separate chromosomes.

https://doi.org/10.7554/eLife.39725.018

Appendix 2

Robustness of differences in differentiation signal to filtering schemes

Here, we explore whether the failure to replicate GIANT signals with UKB-GB could be explained as a result of differences in filtering of SNPs in one or the other dataset. This analysis was performed by Anja Moltke Jørgensen and Fernando Racimo, and doubles as a demonstration of the failure to replicate the GIANT signal of excess among population variance (Turchin et al., 2012; Berg and Coop, 2014; Robinson et al., 2015; Zoledziewska et al., 2015; Berg et al., 2017) that was performed independently from that in the main text.

We focused on present-day populations from phase 3 of the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015). We divided the genome into 1700 approximately independent LD blocks, using fgwas (Pickrell, 2014; Berisa and Pickrell, 2016), and extracted, for each of the two GWAS for height, the SNP with the highest posterior probability of association (PPA) from each block, using. This resulted in a total of 1700 SNPs (one per block). Unless otherwise stated, we computed scores using the subset of these SNPs that were located in blocks with high per-block posterior probability of association (PPA>95%), after retrieving the allele frequencies of these SNPs in the 1000 Genomes population panels, using glactools (Renaud, 2018). We tested different types of filters to assess how they influenced the results.

Appendix 2—figure 2 (upper row) shows that genetic scores computed for each of the 1000 Genomes phase three populations. In each plot below in which we report a P-value, this P-value comes from calculating the QX statistic, and assuming this statistic is chi-squared distributed (Berg and Coop, 2014; Berg et al., 2017). The candidate SNPs used for calculating the genetic scores were filtered so that the average minor allele frequency across populations was more than or equal to 5%.

To investigate the effect of the per-block posterior probability of association (Block PPA) on the genetic scores, we also used two alternative PPA thresholds for including a block in the computation of the PPA score: 0 (i.e. including all blocks, lower row of Appendix 2—figure 2) and 0.5 (middle row of Appendix 2—figure 2) shows that this filtering has little effect in the difference in results between the two GWASs.

Appendix 2—figure 1
Present-day populations from 1000 Genomes Project Phase 3 used to build population-level polygenic scores, colored by their respective super-population code.
https://doi.org/10.7554/eLife.39725.026
Appendix 2—figure 2
Genetic scores in present-day populations, colored by their super-population code, and created using different block-PPA thresholds.

Left column: Wood et al. (2014) GWAS. Right column: Neale lab UK Biobank GWAS.

https://doi.org/10.7554/eLife.39725.027

To visualize the contribution of each SNP to the difference in scores between two populations with high differentiation in the Wood et al. GWAS (CHB and CEU), we produced a contour plot in which we display the absolute effect size of each SNP contributing in the computation of the genetics scores, plotted as a function of the difference in the frequency of the trait-increasing allele for that SNP in the two populations (Appendix 2—figure 3).

Appendix 2—figure 3
Distribution of the absolute value of effect sizes (y-axis) plotted as a function of the difference in frequency of the trait-increasing allele between CEU and CHB (x-axis), for candidate SNPs used to build genetic scores.

Top left: trait-associated SNPs from Wood et al., with effect sizes from the same GWAS. Top right: trait-associated SNPs from the Neale lab GWAS, with effect sizes from the same GWAS. Bottom left: trait-associated SNPs from Wood et al., but with their corresponding effect sizes from the Neale lab GWAS. Bottom right: trait-associated SNPs from the Neale lab GWAS, but with their corresponding effect sizes from Wood et al. Contour colors denote the density of SNPs in different regions of each plot.

https://doi.org/10.7554/eLife.39725.028

Appendix 2—figure 3 shows that the distribution of the difference in scores between the two populations is shifted in favor of CEU when using the Wood et al. dataset, but not when using the UKB dataset. When selecting SNPs via PPAs from the Wood et al. dataset but using their UKB effect sizes, the distribution of differences is also shifted in favor of CEU, but this does not occur when performing the converse: using PPAs from UKB to select SNPs, but plotting their effect sizes from Wood et al.

This figure also reveals that there are a number of SNPs in the UKB dataset with high effect sizes and very small differences in allele frequency between the two populations. These SNPs tend to have allele frequencies near the boundaries of extinction or fixation in both populations, suggesting they could possibly be under the influence of negative selection. To investigate the contribution of these high-effect SNPs on the overall genetic scores with the UKB dataset, we removed their corresponding blocks from the score computation, and re-calculated the genetic scores for all populations. We chose a minimum absolute effect size equal to 0.08 for removal of SNPs, and the 6 SNPs in the UKB dataset which are above this threshold were therefore excluded from the analysis. This filtering, however, does not seem to serve to recover the Wood et al. signal (Appendix 2—figure 4).

Appendix 2—figure 4
Genetic scores for present-day populations, after excluding 6 high-effect SNPs from UKB, colored by super-population code.

Left: Wood et al. GWAS. Right: Neale lab UK Biobank GWAS.

https://doi.org/10.7554/eLife.39725.029

In Appendix 2—figure 5 we restrict the candidate SNPs used, by only allowing SNPs that have minor allele frequencies larger than 0.05 in all populations. This is different from our previous default allele frequency filtering, in which we only required the average of the minor allele frequency across populations to be larger than 0.05. Nevertheless, this filtering does not recover the Wood et al. signal either.

Appendix 2—figure 5
Genetic scores computed only with SNPs that have minor allele frequencies larger than 0.05 in all populations.

Left: Wood et al. GWAS. Right: Neale lab UK Biobank GWAS.

https://doi.org/10.7554/eLife.39725.030

We also looked into whether the candidate SNPs found using the UK Biobank dataset were also present in the Wood et al. GWAS, but perhaps with much smaller effect sizes, and this was somehow affecting the genetic scores made using the UKB data. In Appendix 2—figure 6 all UK Biobank candidate SNPs that were also found in Wood et al. were evaluated and if a SNP’s absolute effect size in Wood et al. was smaller than or equal to 0.05, the SNP was excluded from the UK Biobank candidate set.

Appendix 2—figure 6
Genetic scores computed using the UK Biobank data, after removing SNPs with absolute effect sizes smaller than or equal to 0.05 in Wood et al.
https://doi.org/10.7554/eLife.39725.031

We also excluded all UKB-candidate SNPs found in Wood et al. with absolute effect sizes smaller than or equal to 0.01, and recomputed the scores using the UK Biobank effect sizes (Appendix 2—figure 7).

Appendix 2—figure 7
Genetic scores computed using the UK Biobank data, after removing SNPs with absolute effect sizes smaller than or equal to 0.01 in Wood et al.
https://doi.org/10.7554/eLife.39725.032
https://doi.org/10.7554/eLife.39725.025

Appendix 3

LD Score regression and linked selection

In this section we discuss how linked selection, specifically background selection (BGS), may be a potential confounder of LD Score regression. In the first section we discuss the intuition behind univariate LD Score regression and how BGS can cause a correlation between LD Score and allele frequency differentiation. In the second section we show empirically how LD Score and BGS covary across the genome, and how this can account for the empirical patterns of LD Score correlating with allele frequency differentiation. In the third section we show the BGS confounding of the slope and intercept of the univariate LD Score regression. In the final section we work through bivariate LD Score regression and show that it can be used to highlight the confounding of GWAS by specific axes of population structure.

Through this supplement we discuss the potential issue with linked selection in terms of BGS. However, it is likely that basic intuition of theses results, that is that linked selection is confounder of LD Score regression, apply more generally to other models of linked selection (e.g. selective sweeps).

Background

Bulik-Sullivan et al. (2015a) and Bulik-Sullivan et al. (2015b) introduced LD Score regression as a robust way to assess the impact of population structure confounding on GWAS, and to robustly assess heritabilities and genetic correlations in GWAS even in the presence of such confounding. The LD Score of a SNP (i) is found by summing up LD (R2) in a genomic window of W surrounding SNPs:

(A9) i=j=0WRi,j2.

Following the logic laid out in the appendix of Bulik-Sullivan et al. (2015b), consider a GWAS done using a sample drawn from two populations, with a sample of N/2 draws from each population. The trait is controlled by a very large number of loci (M), and the total narrow-sense heritability of the trait is hg2. The GWAS is partially confounded by population structure, as the squared difference in mean phenotype between the populations is a, and the allele frequency differentiation between the populations is FST. The expected χi2 statistic of the trait association of the ith SNP is

(A10) 𝐄[χi2]=Nhg2Mi+1+aNFST,

following Equation 2.7 of Bulik-Sullivan et al. (2015b).

The basic idea of LD Score regression is that we regress χi2 on i, the deviation of the estimated intercept away from 1 gives aNFST, the confounding by population structure, while the slope of the regression gives Nhg2M. Underlying this separation of the confounding effects of population structure (aFST) and the heritability (hg2) is the assumption that FST is not correlated with LD Score. However, as noted by Bulik-Sullivan et al. (2015b) this assumption may be violated by background selection (BGS). In short, regions of low recombination (and thus higher LD Score) experience more BGS—which in turn drives higher FST (Charlesworth, 1998).

To a first approximation, the effects of strong BGS in a well-mixed, constant-sized population can be modeled by a reduction in the effective population size, as the rate of drift increases in regions subject to BGS. We can express this mathematically by saying that SNP i experiences an effective population size BiNe, where Ne is the effective population size in the absence of BGS and Bi is the reduction due to BGS. The expected LD between SNP i and another SNP Lbp apart is

𝐄(R2)1/(1+4NeBirBP,iL)

where rBP,i is the recombination rate surrounding SNP i.

FST, in turn, is a decreasing function of NeBi. For example, if the two populations at hand split T generations ago, without subsequent gene-flow or population size changes,

(A11) 𝐄(FST)T/(4NeBi)

(this approximation holds for small values of T/Ne). Similar inverse dependences of FST on Bi can be derived in other models of weak population structure (Charlesworth, 1998).

Empirical results on LD Score and BGS

To explore the empirical relationship between LD Score, recombination rate and BGS we make use of the B values estimated along the human genome by McVicker et al. (2009). We use the 1000 Genomes CEU LD Scores (Bulik-Sullivan et al., 2015b), and the Kong et al. (2010) recombination rates (the latter are standardized by the genome-wide average recombination rate).

In Figure 1 we plot the LD Score, averaged in 100 kb windows, as a function of recombination rates and McVicker’s B values. As expected, LD Scores are higher in regions of low recombination and regions of stronger background selection (lower B). Based on a simple model of BGS (Equation A11), FST1/B. Therefore in Figure 2 we plot the relationship between LD Scores and 1/B values each averaged in 30 quantiles of LD Score.

In the main text (Figure 5A and Figure 5—figure supplement 1) we plotted the relationship between LD Score and the χ2 statistic for allele frequency differentiation. To make our χ2 statistic comparable to FST we standardized it. To do this we note that because population membership is not a genetic trait, setting h2=0 in Equation A10 we obtain

(A12) E[χi2]=1+aNFST,

Therefore, to make our χi2 statistic comparable to FST we standardize our χi2 as:

(A13) (χi21)/χi2¯,

where the overbar in the denominator signifies a genome-wide average. In Figure 2 we plot the expected relationship between LD Score and standardized χi2 predicted under our simple BGS model (using McVicker B values as an estimate for the intensity of background selection). We compare it to the the empirical relationship between LD Score and the standardized χi2 statistics for the Irish-British and GBR-TSI allele frequency differences. The agreement between the empirical results and the BGS-theoretical predictions is reasonable, suggesting that a model of BGS, as parameterized by McVicker’s B, could explain the confounding in LD Score regression by linked selection.

Appendix 3—figure 1
Windows with lower recombination rates and B values have higher LD Scores.

The autosome is divided into 100 kb windows and the average LD Score, B-value, and standardized recombination rate is calculated in each bin. The red lines are a lowess fit as a guide to the eye.

https://doi.org/10.7554/eLife.39725.034
Appendix 3—figure 2
A plot across 30 quantiles of genome-wide LD Score of our simple BGS model of differentiation, parameterized by McVicker’s B (Equation A11).
https://doi.org/10.7554/eLife.39725.035
Appendix 3—figure 3
A plot across 30 quantiles of LD Score a standardized χ2 (Equation A11) of allele frequency differentiation (black dots) and that expected under our simple BGS model parameterized by McVicker’s B (red dots, Equation A11, standardized by its genome-wide mean).

Note that the red dots are the same values in both panels, and match those given in Figure 2.

https://doi.org/10.7554/eLife.39725.036

Predicted effect on linked selection on the slope and intercept of LD Score regression.

The expectations of the slope and intercept of univariate LD Score were derived in the absence of linked selection. In this section we show how these expectations can be distorted by BGS.

In the regression of χi2i the slope is:

(A14)βχ2;=Cov(χi2,i)Var(i))(A15)=Nhg2MVar(i)+aNCov(i,FST,i)Var(i)(A16)=Nhg2M+aNβFST;

where βFST; is the slope of FST regressed on LD Score. Therefore the slope of the univariate LD Scor regression is biased upwards by linked selection. The intercept is

(A17) αχ2;=χ2¯βχ2;¯=1+aN(FST¯βFST;¯),

where the bars denote genome-wide averages. In other words, the intercept is suppressed by aNβFST;¯. Another useful way to write the intercept is

(A18) αχ2;=aNFST¯(1βFST;FST¯¯),

as βFST;FST¯ is the slope of the FSTiFST¯i¯ regression—that is the effect of LD Score on the relative reduction in FST from its mean.

Using LD Score regression to assess ‘genetic correlations’ with allele frequency differentiation.

In the main text we plot the (Height GWAS effect size) × (Allele frequency difference) LD Score regression (Figure 4D–F and Figure 4—figure supplement 3). In a number of cases we see a strong intercept for this regression, and in some cases a significant slope. Here we show how a non-zero intercept may be a signal of stratification in the original GWAS along the axis represented by the allele frequency difference, while a non-zero slope may demonstrate that this stratification has interacted with BGS.

The logic of assessing genetic correlations via LD Score regression (Bulik-Sullivan et al., 2015a) is that at each SNP (i) we have a pair (Zi,1,Zi,2): scores for phenotypes 1 and 2 and the genetic correlation (ρg) between the phenotypes is captured by the slope of the regression (Zi,1Zi,2)i. Imagine that these Z’s were estimated by conducting a GWAS of the two traits in a sample of size N1 and N2 respectively, with a sample overlap of Ns individuals. The intercept of this regression, under the assumptions of Bulik-Sullivan et al. (2015a), is determined by the phenotypic correlation (ρ) in the NS overlapping samples. Bulik-Sullivan et al. (2015a) show that under their assumptions of no stratification and no linked selection,

(A19) 𝐄[Zi,1Zi,2]=N1N2ρgMi+NsρN1N2

Yengo et al. (2018) extended this to the case of a phenotype from a stratified population. Consider as before a population that consists of two equally sized samples from two populations with allele frequency differentiation FST. The difference in mean phenotype 1 and 2 between the two populations are σ1 and σ2 respectively. Yengo et al. (2018) show that

(A20) 𝐄[Zi,1Zi,2]=N1N2ρgMi+NsρN1N2+ρgFST2N1N2+Ns2FSTσ1σ2N1N2.

(This is equation (17) of Yengo et al. (2018), up to slight differences in notation.)

Let us return to our case of the LD Score regression of (Height GWAS effect size) × (Allele frequency difference). Assume for the moment that our ‘Allele frequency difference’ (e.g. [GBR-TSI]) measures the difference in allele frequency between the two populations stratifying our GWAS. In our case, let phenotype 1 be a phenotype (e.g. height) and let 2 be an individual’s population membership (e.g. 1 if in population 1 and 0 if in population 2) Zi,H and the Zi,P score-proxy of the allele frequency difference. The two phenotypes are measured in the same cohort (such that N1=N2=NS. The difference in mean phenotype (height) between the two populations is σ1. The mean difference in population membership is 1. As we can assume that population membership is not a genetic trait it follows that ρg=0. However, there is a ‘phenotypic’ correlation between population membership and height, as height differs between our two populations stratifying our GWAS (ρ=σ1×1). Following the logic of Equation A20 then

(A21) 𝐄[Zi,HZi,P]Aσ1+CFSTσ1

where A and C are constants. Note the strong similarity of Equation A21 to the univariate LD Score regression for allele frequency χ2 (Equation A12). In reality the population samples (GBR and TSI) used to assess European N-S allele frequencies differences, in Figure 4D–F, and related figures, are not the population samples used in the GWAS. However, the spirit of Equation A21 holds if the confounding in a GWAS falls along this N-S axis. A significant intercept of this regression potentially indicates that some portion of the phenotypic variance (e.g. height) in the GWAS samples was confounded by residual N-S population structure and this problem has been transmitted through into the GWAS effect sizes. This LD Score regression is not necessarily expected to have any slope as Equation A21 does not include the LD Score (i). However, if the population structure confounding (FST) in the GWAS samples is correlated with LD Score (i), for example due to BGS, then a slope will be induced (in a manner similar to Equation A16).

https://doi.org/10.7554/eLife.39725.033

Appendix 4

Testing QFAM’s immunity to population stratification confounding

We set out to evaluate how effect sizes estimated in the sibling-based GWAS as implemented in plink’s QFAM procedure are affected by population stratification. To this end, we added an artificial bias to the height of UK Biobank individuals along PC5-axis (we used PC5, among the top 40 PCs provided by the UK Biobank, along which the British ancestry individuals are most variable). We considered two cases. First, we added a bias proportional to the mean PC5 score in the family. Specifically, we set

(A22) Yifam-bias=Yi+2|F(i)|ΣjF(i)γj

where Yi is an individuals actual recorded height in the UK Biobank, F(i) indexes all siblings in individual i’s family, and γj gives individual j’s projection onto PC5. This induced bias mimics an environmental contribution to the trait that varies with genetic ancestry across families but not within a family.

Second, we added a bias proportional to the individual’s PC5 score:

(A23) Yiind-bias=Yi+2γi.

This mimics a scenario where there is a real genetic gradient in height along PC5. Height and PC5 score values were standardized as described in the section on newly calculated GWAS under Materials and methods.

In the first case, QFAM within-family effect size estimates are identical with and without including the bias, illustrating that plink v1.9b5’s implementation correctly accounts for cross-family population structure (Appendix 4—figure 1A). Further, there is no correlation between the effect sizes and SNP loadings on PC5 (Figure Appendix 4—figure 1B, Pearson p = 0.81).

Appendix 4—figure 1
QFAM effect size estimates, under two population stratification scenarios.

Top Row: Height values made biased along PC5-axis, proportional to the mean PC5 scores within family. (A) The x- and y-axes show effect size estimates without and with the added bias, respectively. (B) The x-axis shows the SNP loadings on PC5. Bottom Row: Height values made biased along PC5-axis, proportional to individuals’ PC5 scores. (C) The same plot as panel (A), but with individual-level bias. (D) The same plot as panel (B), but with individual-level bias. All results are shown for 11,611 SNPs on chromosome one for which PC loadings where provided by the UK Biobank.

https://doi.org/10.7554/eLife.39725.038

In the second case, the within-family effect size estimates are biased (Figure Appendix 4—figure 1C), proportional to the SNP contributions to PC5 (Figure Appendix 4—figure 1D, Pearson p∼10-67). These results, however, do not reflect an issue with plink’s implementation. Rather, they show that even correctly implemented family based studies can lead to biased effect size estimates if variation in ancestry segregates among siblings, provided that the different ancestries have different mean genetic contributions to the phenotype. There is no reason to think that this phenomenon is responsible for patterns seen in any of the real sibling datasets we analyze, but we present it here for completeness.

https://doi.org/10.7554/eLife.39725.037

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
    Delete-m Jackknife for Unequal m
    1. F Busing
    2. E Meijer
    3. R Leeden
    4. R Van Der Leeden
    (1999)
    Statistics and Computing 9:3–8.
  10. 10
  11. 11
    Second-generation PLINK: rising to the challenge of larger and richer datasets
    1. CC Chang
    2. CC Chow
    3. LC Tellier
    4. S Vattikuti
    5. SM Purcell
    6. JJ Lee
    (2015)
    GigaScience, 4, 10.1186/s13742-015-0047-8, 25722852.
  12. 12
    The effect of deleterious mutations on neutral molecular variation
    1. B Charlesworth
    2. MT Morgan
    3. D Charlesworth
    (1993)
    Genetics 134:1289–1303.
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
    Hundreds of variants clustered in genomic loci and biological pathways affect human height
    1. H Lango Allen
    2. K Estrada
    3. G Lettre
    4. SI Berndt
    5. MN Weedon
    6. F Rivadeneira
    7. CJ Willer
    8. AU Jackson
    9. S Vedantam
    10. S Raychaudhuri
    11. T Ferreira
    12. AR Wood
    13. RJ Weyant
    14. AV Segrè
    15. EK Speliotes
    16. E Wheeler
    17. N Soranzo
    18. JH Park
    19. J Yang
    20. D Gudbjartsson
    21. NL Heard-Costa
    22. JC Randall
    23. L Qi
    24. A Vernon Smith
    25. R Mägi
    26. T Pastinen
    27. L Liang
    28. IM Heid
    29. J Luan
    30. G Thorleifsson
    31. TW Winkler
    32. ME Goddard
    33. K Sin Lo
    34. C Palmer
    35. T Workalemahu
    36. YS Aulchenko
    37. A Johansson
    38. MC Zillikens
    39. MF Feitosa
    40. T Esko
    41. T Johnson
    42. S Ketkar
    43. P Kraft
    44. M Mangino
    45. I Prokopenko
    46. D Absher
    47. E Albrecht
    48. F Ernst
    49. NL Glazer
    50. C Hayward
    51. JJ Hottenga
    52. KB Jacobs
    53. JW Knowles
    54. Z Kutalik
    55. KL Monda
    56. O Polasek
    57. M Preuss
    58. NW Rayner
    59. NR Robertson
    60. V Steinthorsdottir
    61. JP Tyrer
    62. BF Voight
    63. F Wiklund
    64. J Xu
    65. JH Zhao
    66. DR Nyholt
    67. N Pellikka
    68. M Perola
    69. JR Perry
    70. I Surakka
    71. ML Tammesoo
    72. EL Altmaier
    73. N Amin
    74. T Aspelund
    75. T Bhangale
    76. G Boucher
    77. DI Chasman
    78. C Chen
    79. L Coin
    80. MN Cooper
    81. AL Dixon
    82. Q Gibson
    83. E Grundberg
    84. K Hao
    85. M Juhani Junttila
    86. LM Kaplan
    87. J Kettunen
    88. IR König
    89. T Kwan
    90. RW Lawrence
    91. DF Levinson
    92. M Lorentzon
    93. B McKnight
    94. AP Morris
    95. M Müller
    96. J Suh Ngwa
    97. S Purcell
    98. S Rafelt
    99. RM Salem
    100. E Salvi
    101. S Sanna
    102. J Shi
    103. U Sovio
    104. JR Thompson
    105. MC Turchin
    106. L Vandenput
    107. DJ Verlaan
    108. V Vitart
    109. CC White
    110. A Ziegler
    111. P Almgren
    112. AJ Balmforth
    113. H Campbell
    114. L Citterio
    115. A De Grandi
    116. A Dominiczak
    117. J Duan
    118. P Elliott
    119. R Elosua
    120. JG Eriksson
    121. NB Freimer
    122. EJ Geus
    123. N Glorioso
    124. S Haiqing
    125. AL Hartikainen
    126. AS Havulinna
    127. AA Hicks
    128. J Hui
    129. W Igl
    130. T Illig
    131. A Jula
    132. E Kajantie
    133. TO Kilpeläinen
    134. M Koiranen
    135. I Kolcic
    136. S Koskinen
    137. P Kovacs
    138. J Laitinen
    139. J Liu
    140. ML Lokki
    141. A Marusic
    142. A Maschio
    143. T Meitinger
    144. A Mulas
    145. G Paré
    146. AN Parker
    147. JF Peden
    148. A Petersmann
    149. I Pichler
    150. KH Pietiläinen
    151. A Pouta
    152. M Ridderstråle
    153. JI Rotter
    154. JG Sambrook
    155. AR Sanders
    156. CO Schmidt
    157. J Sinisalo
    158. JH Smit
    159. HM Stringham
    160. G Bragi Walters
    161. E Widen
    162. SH Wild
    163. G Willemsen
    164. L Zagato
    165. L Zgaga
    166. P Zitting
    167. H Alavere
    168. M Farrall
    169. WL McArdle
    170. M Nelis
    171. MJ Peters
    172. S Ripatti
    173. JB van Meurs
    174. KK Aben
    175. KG Ardlie
    176. JS Beckmann
    177. JP Beilby
    178. RN Bergman
    179. S Bergmann
    180. FS Collins
    181. D Cusi
    182. M den Heijer
    183. G Eiriksdottir
    184. PV Gejman
    185. AS Hall
    186. A Hamsten
    187. HV Huikuri
    188. C Iribarren
    189. M Kähönen
    190. J Kaprio
    191. S Kathiresan
    192. L Kiemeney
    193. T Kocher
    194. LJ Launer
    195. T Lehtimäki
    196. O Melander
    197. TH Mosley
    198. AW Musk
    199. MS Nieminen
    200. CJ O'Donnell
    201. C Ohlsson
    202. B Oostra
    203. LJ Palmer
    204. O Raitakari
    205. PM Ridker
    206. JD Rioux
    207. A Rissanen
    208. C Rivolta
    209. H Schunkert
    210. AR Shuldiner
    211. DS Siscovick
    212. M Stumvoll
    213. A Tönjes
    214. J Tuomilehto
    215. GJ van Ommen
    216. J Viikari
    217. AC Heath
    218. NG Martin
    219. GW Montgomery
    220. MA Province
    221. M Kayser
    222. AM Arnold
    223. LD Atwood
    224. E Boerwinkle
    225. SJ Chanock
    226. P Deloukas
    227. C Gieger
    228. H Grönberg
    229. P Hall
    230. AT Hattersley
    231. C Hengstenberg
    232. W Hoffman
    233. GM Lathrop
    234. V Salomaa
    235. S Schreiber
    236. M Uda
    237. D Waterworth
    238. AF Wright
    239. TL Assimes
    240. I Barroso
    241. A Hofman
    242. KL Mohlke
    243. DI Boomsma
    244. MJ Caulfield
    245. LA Cupples
    246. J Erdmann
    247. CS Fox
    248. V Gudnason
    249. U Gyllensten
    250. TB Harris
    251. RB Hayes
    252. MR Jarvelin
    253. V Mooser
    254. PB Munroe
    255. WH Ouwehand
    256. BW Penninx
    257. PP Pramstaller
    258. T Quertermous
    259. I Rudan
    260. NJ Samani
    261. TD Spector
    262. H Völzke
    263. H Watkins
    264. JF Wilson
    265. LC Groop
    266. T Haritunians
    267. FB Hu
    268. RC Kaplan
    269. A Metspalu
    270. KE North
    271. D Schlessinger
    272. NJ Wareham
    273. DJ Hunter
    274. JR O'Connell
    275. DP Strachan
    276. HE Wichmann
    277. IB Borecki
    278. CM van Duijn
    279. EE Schadt
    280. U Thorsteinsdottir
    281. L Peltonen
    282. AG Uitterlinden
    283. PM Visscher
    284. N Chatterjee
    285. RJ Loos
    286. M Boehnke
    287. MI McCarthy
    288. E Ingelsson
    289. CM Lindgren
    290. GR Abecasis
    291. K Stefansson
    292. TM Frayling
    293. JN Hirschhorn
    (2010)
    Nature 467:832–838.
    https://doi.org/10.1038/nature09410
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
    Defining the role of common variation in the genomic and biological architecture of adult human height
    1. AR Wood
    2. T Esko
    3. J Yang
    4. S Vedantam
    5. TH Pers
    6. S Gustafsson
    7. AY Chu
    8. K Estrada
    9. J Luan
    10. Z Kutalik
    11. N Amin
    12. ML Buchkovich
    13. DC Croteau-Chonka
    14. FR Day
    15. Y Duan
    16. T Fall
    17. R Fehrmann
    18. T Ferreira
    19. AU Jackson
    20. J Karjalainen
    21. KS Lo
    22. AE Locke
    23. R Mägi
    24. E Mihailov
    25. E Porcu
    26. JC Randall
    27. A Scherag
    28. AA Vinkhuyzen
    29. HJ Westra
    30. TW Winkler
    31. T Workalemahu
    32. JH Zhao
    33. D Absher
    34. E Albrecht
    35. D Anderson
    36. J Baron
    37. M Beekman
    38. A Demirkan
    39. GB Ehret
    40. B Feenstra
    41. MF Feitosa
    42. K Fischer
    43. RM Fraser
    44. A Goel
    45. J Gong
    46. AE Justice
    47. S Kanoni
    48. ME Kleber
    49. K Kristiansson
    50. U Lim
    51. V Lotay
    52. JC Lui
    53. M Mangino
    54. I Mateo Leach
    55. C Medina-Gomez
    56. MA Nalls
    57. DR Nyholt
    58. CD Palmer
    59. D Pasko
    60. S Pechlivanis
    61. I Prokopenko
    62. JS Ried
    63. S Ripke
    64. D Shungin
    65. A Stancáková
    66. RJ Strawbridge
    67. YJ Sung
    68. T Tanaka
    69. A Teumer
    70. S Trompet
    71. SW van der Laan
    72. J van Setten
    73. JV Van Vliet-Ostaptchouk
    74. Z Wang
    75. L Yengo
    76. W Zhang
    77. U Afzal
    78. J Arnlöv
    79. GM Arscott
    80. S Bandinelli
    81. A Barrett
    82. C Bellis
    83. AJ Bennett
    84. C Berne
    85. M Blüher
    86. JL Bolton
    87. Y Böttcher
    88. HA Boyd
    89. M Bruinenberg
    90. BM Buckley
    91. S Buyske
    92. IH Caspersen
    93. PS Chines
    94. R Clarke
    95. S Claudi-Boehm
    96. M Cooper
    97. EW Daw
    98. PA De Jong
    99. J Deelen
    100. G Delgado
    101. JC Denny
    102. R Dhonukshe-Rutten
    103. M Dimitriou
    104. AS Doney
    105. M Dörr
    106. N Eklund
    107. E Eury
    108. L Folkersen
    109. ME Garcia
    110. F Geller
    111. V Giedraitis
    112. AS Go
    113. H Grallert
    114. TB Grammer
    115. J Gräßler
    116. H Grönberg
    117. LC de Groot
    118. CJ Groves
    119. J Haessler
    120. P Hall
    121. T Haller
    122. G Hallmans
    123. A Hannemann
    124. CA Hartman
    125. M Hassinen
    126. C Hayward
    127. NL Heard-Costa
    128. Q Helmer
    129. G Hemani
    130. AK Henders
    131. HL Hillege
    132. MA Hlatky
    133. W Hoffmann
    134. P Hoffmann
    135. O Holmen
    136. JJ Houwing-Duistermaat
    137. T Illig
    138. A Isaacs
    139. AL James
    140. J Jeff
    141. B Johansen
    142. Å Johansson
    143. J Jolley
    144. T Juliusdottir
    145. J Junttila
    146. AN Kho
    147. L Kinnunen
    148. N Klopp
    149. T Kocher
    150. W Kratzer
    151. P Lichtner
    152. L Lind
    153. J Lindström
    154. S Lobbens
    155. M Lorentzon
    156. Y Lu
    157. V Lyssenko
    158. PK Magnusson
    159. A Mahajan
    160. M Maillard
    161. WL McArdle
    162. CA McKenzie
    163. S McLachlan
    164. PJ McLaren
    165. C Menni
    166. S Merger
    167. L Milani
    168. A Moayyeri
    169. KL Monda
    170. MA Morken
    171. G Müller
    172. M Müller-Nurasyid
    173. AW Musk
    174. N Narisu
    175. M Nauck
    176. IM Nolte
    177. MM Nöthen
    178. L Oozageer
    179. S Pilz
    180. NW Rayner
    181. F Renstrom
    182. NR Robertson
    183. LM Rose
    184. R Roussel
    185. S Sanna
    186. H Scharnagl
    187. S Scholtens
    188. FR Schumacher
    189. H Schunkert
    190. RA Scott
    191. J Sehmi
    192. T Seufferlein
    193. J Shi
    194. K Silventoinen
    195. JH Smit
    196. AV Smith
    197. J Smolonska
    198. AV Stanton
    199. K Stirrups
    200. DJ Stott
    201. HM Stringham
    202. J Sundström
    203. MA Swertz
    204. AC Syvänen
    205. BO Tayo
    206. G Thorleifsson
    207. JP Tyrer
    208. S van Dijk
    209. NM van Schoor
    210. N van der Velde
    211. D van Heemst
    212. FV van Oort
    213. SH Vermeulen
    214. N Verweij
    215. JM Vonk
    216. LL Waite
    217. M Waldenberger
    218. R Wennauer
    219. LR Wilkens
    220. C Willenborg
    221. T Wilsgaard
    222. MK Wojczynski
    223. A Wong
    224. AF Wright
    225. Q Zhang
    226. D Arveiler
    227. SJ Bakker
    228. J Beilby
    229. RN Bergman
    230. S Bergmann
    231. R Biffar
    232. J Blangero
    233. DI Boomsma
    234. SR Bornstein
    235. P Bovet
    236. P Brambilla
    237. MJ Brown
    238. H Campbell
    239. MJ Caulfield
    240. A Chakravarti
    241. R Collins
    242. FS Collins
    243. DC Crawford
    244. LA Cupples
    245. J Danesh
    246. U de Faire
    247. HM den Ruijter
    248. R Erbel
    249. J Erdmann
    250. JG Eriksson
    251. M Farrall
    252. E Ferrannini
    253. J Ferrières
    254. I Ford
    255. NG Forouhi
    256. T Forrester
    257. RT Gansevoort
    258. PV Gejman
    259. C Gieger
    260. A Golay
    261. O Gottesman
    262. V Gudnason
    263. U Gyllensten
    264. DW Haas
    265. AS Hall
    266. TB Harris
    267. AT Hattersley
    268. AC Heath
    269. C Hengstenberg
    270. AA Hicks
    271. LA Hindorff
    272. AD Hingorani
    273. A Hofman
    274. GK Hovingh
    275. SE Humphries
    276. SC Hunt
    277. E Hypponen
    278. KB Jacobs
    279. MR Jarvelin
    280. P Jousilahti
    281. AM Jula
    282. J Kaprio
    283. JJ Kastelein
    284. M Kayser
    285. F Kee
    286. SM Keinanen-Kiukaanniemi
    287. LA Kiemeney
    288. JS Kooner
    289. C Kooperberg
    290. S Koskinen
    291. P Kovacs
    292. AT Kraja
    293. M Kumari
    294. J Kuusisto
    295. TA Lakka
    296. C Langenberg
    297. L Le Marchand
    298. T Lehtimäki
    299. S Lupoli
    300. PA Madden
    301. S Männistö
    302. P Manunta
    303. A Marette
    304. TC Matise
    305. B McKnight
    306. T Meitinger
    307. FL Moll
    308. GW Montgomery
    309. AD Morris
    310. AP Morris
    311. JC Murray
    312. M Nelis
    313. C Ohlsson
    314. AJ Oldehinkel
    315. KK Ong
    316. WH Ouwehand
    317. G Pasterkamp
    318. A Peters
    319. PP Pramstaller
    320. JF Price
    321. L Qi
    322. OT Raitakari
    323. T Rankinen
    324. DC Rao
    325. TK Rice
    326. M Ritchie
    327. I Rudan
    328. V Salomaa
    329. NJ Samani
    330. J Saramies
    331. MA Sarzynski
    332. PE Schwarz
    333. S Sebert
    334. P Sever
    335. AR Shuldiner
    336. J Sinisalo
    337. V Steinthorsdottir
    338. RP Stolk
    339. JC Tardif
    340. A Tönjes
    341. A Tremblay
    342. E Tremoli
    343. J Virtamo
    344. MC Vohl
    345. P Amouyel
    346. FW Asselbergs
    347. TL Assimes
    348. M Bochud
    349. BO Boehm
    350. E Boerwinkle
    351. EP Bottinger
    352. C Bouchard
    353. S Cauchi
    354. JC Chambers
    355. SJ Chanock
    356. RS Cooper
    357. PI de Bakker
    358. G Dedoussis
    359. L Ferrucci
    360. PW Franks
    361. P Froguel
    362. LC Groop
    363. CA Haiman
    364. A Hamsten
    365. MG Hayes
    366. J Hui
    367. DJ Hunter
    368. K Hveem
    369. JW Jukema
    370. RC Kaplan
    371. M Kivimaki
    372. D Kuh
    373. M Laakso
    374. Y Liu
    375. NG Martin
    376. W März
    377. M Melbye
    378. S Moebus
    379. PB Munroe
    380. I Njølstad
    381. BA Oostra
    382. CN Palmer
    383. NL Pedersen
    384. M Perola
    385. L Pérusse
    386. U Peters
    387. JE Powell
    388. C Power
    389. T Quertermous
    390. R Rauramaa
    391. E Reinmaa
    392. PM Ridker
    393. F Rivadeneira
    394. JI Rotter
    395. TE Saaristo
    396. D Saleheen
    397. D Schlessinger
    398. PE Slagboom
    399. H Snieder
    400. TD Spector
    401. K Strauch
    402. M Stumvoll
    403. J Tuomilehto
    404. M Uusitupa
    405. P van der Harst
    406. H Völzke
    407. M Walker
    408. NJ Wareham
    409. H Watkins
    410. HE Wichmann
    411. JF Wilson
    412. P Zanen
    413. P Deloukas
    414. IM Heid
    415. CM Lindgren
    416. KL Mohlke
    417. EK Speliotes
    418. U Thorsteinsdottir
    419. I Barroso
    420. CS Fox
    421. KE North
    422. DP Strachan
    423. JS Beckmann
    424. SI Berndt
    425. M Boehnke
    426. IB Borecki
    427. MI McCarthy
    428. A Metspalu
    429. K Stefansson
    430. AG Uitterlinden
    431. CM van Duijn
    432. L Franke
    433. CJ Willer
    434. AL Price
    435. G Lettre
    436. RJ Loos
    437. MN Weedon
    438. E Ingelsson
    439. JR O'Connell
    440. GR Abecasis
    441. DI Chasman
    442. ME Goddard
    443. PM Visscher
    444. JN Hirschhorn
    445. TM Frayling
    446. Electronic Medical Records and Genomics (eMEMERGEGE) Consortium
    447. MIGen Consortium
    448. PAGEGE Consortium
    449. LifeLines Cohort Study
    (2014)
    Nature Genetics 46:1173–1186.
    https://doi.org/10.1038/ng.3097
  49. 49
  50. 50

Decision letter

  1. Magnus Nordborg
    Reviewing Editor; Austrian Academy of Sciences, Austria
  2. Mark I McCarthy
    Senior Editor; University of Oxford, United Kingdom
  3. Magnus Nordborg
    Reviewer; Austrian Academy of Sciences, Austria
  4. Nicholas H Barton
    Reviewer; Institute of Science and Technology Austria, Austria
  5. Joachim Hermisson
    Reviewer; University of Vienna, Austria

In the interests of transparency, eLife includes the editorial decision letter, peer reviews, and accompanying author responses.

[Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed.]

Thank you for submitting your article "Reduced signal for polygenic adaptation of height in UK Biobank" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Magnus Nordborg as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Mark McCarthy as the Senior Editor. The following individuals involved in review of your submission have also agreed to reveal their identity: Nicholas H Barton (Reviewer #2) and Joachim Hermisson (Reviewer #3).

The Reviewing Editor has summarized the major concerns shared by all reviewers, and we have also included the separate reviews below for your consideration.

If you have any questions, please do not hesitate to contact us.

Summary:

This is one of two papers demonstrating that published signals of selection on human height cannot be replicated in the recently released UK Biobank data, apparently because these signals were caused by confounding population structure that is absent in UK Biobank data.

Major concerns:

We were struck by how both papers focus on spurious signals of selection rather than the underlying cause, which is that the GWAS effect-size estimates are confounded. The former is a somewhat esoteric question, but the latter may have enormous implications for much of human genetics, and these papers are likely to be heavily cited because of this. However, the papers seem to go out of their way to avoid discussing this topic. Of course we are not the authors, but, for the record, it looks odd.

Furthermore, the papers seem to suggest that confounding is not present in the UK Biobank data, but isn't it more likely that the magnitude is simply smaller?

The papers also present evidence that a sib-based study by Robinson et al., 2015, that was meant to eliminate confounding did no such thing. This is disturbing, and while we understand that identifying the reason may be beyond the present papers, the general implications should again probably be discussed.

Finally, while this is a carefully written and extremely scholarly paper, it relies heavily on population genetics jargon, and will be difficult to read for outsiders. We suggest substantial editing with this in mind.

Separate reviews (please respond to each point):

Reviewer #1:

This one of at least two papers appearing simultaneously and reaching exactly the same conclusion. It is well written, although perhaps a bit too technical for people not directly working in the field of human population genetics.

The only thing that surprises me about this paper is that it, as well as the other one I have seen, focuses on the relatively obscure issue of whether height has been under selection, tiptoeing around the much bigger issue (the elephant in the room) that the reason the claims for selection do not stand is that the GWAS estimates of effect sizes are biased because of population structure. It is not just the selection signals that do not replicate, but the polygenic scores. I'm not surprised, but, as you know, there are probably at least a hundred papers out there that are based on the infallibility of LD Score regression and genomic prediction. I understand the need for caution before attacking this edifice, but I nonetheless think some clarification is unavoidable.

Reviewer #2:

This is a carefully argued and scholarly examination of why estimates of selection, based on "polygenic scores", are weaker in the UK Biobank data than in earlier studies. This is an important issue, and so even though the causes of the discrepancy are not at all clear, the paper will be an important contribution. I have various comments below, and on the annotated PDF; several of these ask for a clearer definition of terminology that will make the paper more accessible to those not immersed in the field.

I would be encouraged by seeing a simulation of a stratified population, in which the standard correction for stratification fails, as is claimed to be the case here. One would also like to have some idea of what strength of allele frequency differentiation is needed to generate the observed differences in polygenic scores, and whether (as claimed) this indeed does not much affect GWAS estimates. However, this might be a non-trivial exercise, so I would not insist on it. I would like to see some discussion of how such a study could be approached, though.

The lesson from this is that it is extremely hard to reliably detect selection through the accumulation of slight changes in allele frequency. However, one should bear in mind that selection on polygenic traits is effective through precisely this process, and is largely unaffected by population structure. This is remarkable, but not paradoxical: natural selection can be effective even when its genetic basis cannot be identified.

Minor Comments:

Abstract: "polygenic differences netween populations" is perhaps ambiguous. You are not referring to differences in breeding value here, but to differences in BV inferred from SNP frequencies. Perhaps that can be said more directly.

Introduction first paragraph: Actually, polygenic scores predict an individual's additive genetic component, which is not the same as "individual phenotypes". Perhaps this could be explained: other studies of QG in the wild emphasise the importance of this distinction.

Subsection “GWAS data used to study adaptation of height” paragraph five: SDS only gets defined in the Table, it needs explanation in the text.

Subsection “Weak replication in UK Biobank” How confident can we that "current methods are likely sufficient for most applications" ? As noted above, I would have more confidence in this if we understood better how population structure confounds signals of polygenic adaptation.

Results section (and elsewhere): "genetic correlations" needs to be defined explicitly, correlations of what with what? Do these strong correlations imply that there is no detectable difference in effect size between populations? What bounds can one put on effect size differences

Figure 1: Best to label axes "Latitude, European" and "Longitude, Eurasian"

Subsection “SDS signal of selection in Britain.”: Define LMM! Since the correction for structure is crucial, it needs to be explained clearly early on.

Subsection “Relationship between GWAS estimates and European population structure”: Can one estimate the predicted strength of the "winner's curse"?

Subsection “LD Score regression signal”: "Meanwhile, confounders such as population structure are expected to affect SNPs of different LD Score equally": It is not obvious to me that this is the case.

Subsection “Summary of LD Score Regression Results”, final sentence: Delete "going forward"

Discussion section paragraph five: Nevertheless, one would like to know how much stratification may influence the accuracy of GWAS estimates.

Subsection “Expanded SNP Sets” final paragraph: One should be able to quantify this. We know the strength of assortment for height, and its effect on Va. Therefore, we can estimate its effect on pairwise LD, which will be very small because there are so may pairs, but may nevertheless lead to some overall bias. More directly, is there any correlation in polygenic scores between parents, consistent with the known assortment? One might be able to separate out causes of assortment in this way: stratification vs true assortment, causally due to height.

Reviewer #3:

Both manuscripts by Berg et al. and Sohail et al. present thorough and insightful analyses with highly relevant results for current and future GWAS studies. Even prior to publication, the manuscripts have considerable impact. They will be widely read and cited. I do not think that further analyses are needed, with the potential exception of the third point below. All other points concern the discussion, in particular the guidance for further research that will surely emerge from these studies.

# How safe are results based on the UK Biobank data?

This refers to the weak signals reported (with much caution) in the present studies, but also to potential future results on other traits. You recommend using data "such as UKB" and we will certainly see many more studies based on this resource. I would therefore appreciate a more specific discussion of risks connected to this particular data set.

1) Stratification even within the UKB-GB data: It is well known that height and socioeconomic status are correlated in modern societies (e.g. BMJ 2016; 352:i582), and social status correlates with descent. In the UK, both factors are also geographically stratified, with people living in the north of the country having lower socioeconomic status and shorter stature, on average, than those in the south. Furthermore, the percentage of Anglo-Saxon admixture varies across the UK. How could these factors influence results based on UKB data, both here and otherwise?

2) Potential influence of GxE interactions: The manuscripts focus (for good reason) on issues connected with stratification. However, if polygenic scores depend on the environment (e.g., due to countergradient variation), GxE interactions are an alternative confounding factor. Importantly, use of a homogeneous detection panel (to avoid stratification), such as UKB-GB, could increase these effects. Maybe this should be briefly discussed in the context of the present results and mentioned as a necessary caveat also for future studies that use detection panels from narrow geographic regions.

# What, exactly, causes the problems with the previous data?

3) There seem to be two relevant differences of the GIANT data relative to UKB: 1) UKB is much more homogeneous and 2) GIANT is a meta-study, collecting summary statistics from many sources that are individually corrected for stratification. One would like to know better which factor is decisive. This could be further addressed by combining summaries from sub-samples of the "UKB-all" data in an artificial meta-study.

4) The Robinson et al., 2015 GWAS: Sib-based studies are done to avoid/minimize stratification effects and the Robinson 2015 data have been used as a proof of robustness in several previous studies. The fact that you find clear signs of stratification is sobering and one would like to know what has gone wrong. You may not currently have any explanation and this is fair enough. However, the discussion should be clearer and say upfront that results based on these data cannot be trusted until we understand the issues.

Minor Comments:

a) The very low r2 in Figure 3C is strange. Any idea?

b) Figure 4: I was looking for the analysis using the other GB datasets, turns out that this is done in the supplementary information, but you should briefly comment on the results in the results part of the main text.

Additional data files and statistical comments:

Everything is well explained and all newly generated GWAS data are on Dryad.

https://doi.org/10.7554/eLife.39725.050

Author response

Major concerns:

We were struck by how both papers focus on spurious signals of selection rather than the underlying cause, which is that the GWAS effect-size estimates are confounded. The former is a somewhat esoteric question, but the latter may have enormous implications for much of human genetics, and these papers are likely to be heavily cited because of this. However, the papers seem to go out of their way to avoid discussing this topic. Of course we are not the authors, but, for the record, it looks odd.

One issue for us in writing this paper is that the effect at individual SNPs is relatively small, but systematic, therefore it is unclear how strongly various other types of analyses based on GWAS will be affected. This uncertainty means that in laying out the implications in a reasonable way we erred toward being conservative. We have now expanded our discussion to more broadly discuss the implications for polygenic score predictions, beyond just the spurious signals of selection, where the implications are clearer.

Furthermore, the papers seem to suggest that confounding is not present in the UK Biobank data, but isn't it more likely that the magnitude is simply smaller?

As discussed below we now point to evidence of residual confounding in the biobank.

The papers also present evidence that a sib-based study by Robinson et al., 2015, that was meant to eliminate confounding did no such thing. This is disturbing, and while we understand that identifying the reason may be beyond the present papers, the general implications should again probably be discussed.

Two weeks ago, Robinson and colleagues informed us that the effect sizes they released in 2016 had serious stratification issues due to a bug (see statement posted here: http://cnsgenomics.com/data.html). Furthermore, they revealed that these effect sizes that they released associated with their 2015 paper, were not in fact the effect sizes they used their 2015 paper. They also released a third set of effect sizes produced by reanalyzing the sibling data. This news obviously came to us late in our revisions process, but we have now run their new 2018 set of effect sizes through our pipeline. We now see no consistent signal of selection in Robinson et al’s new set of effect sizes, consistent with our BioBank analyses. Thus at least one mystery has now been solved.

Finally, while this is a carefully written and extremely scholarly paper, it relies heavily on population genetics jargon, and will be difficult to read for outsiders. We suggest substantial editing with this in mind.

We have substantially edited the paper to define terms, and make sure that it is more accessible.

Separate reviews (please respond to each point):

Reviewer #1:

This one of at least two papers appearing simultaneously and reaching exactly the same conclusion. It is well written, although perhaps a bit too technical for people not directly working in the field of human population genetics.

The only thing that surprises me about this paper is that it, as well as the other one I have seen, focuses on the relatively obscure issue of whether height has been under selection, tiptoeing around the much bigger issue (the elephant in the room) that the reason the claims for selection do not stand is that the GWAS estimates of effect sizes are biased because of population structure. It is not just the selection signals that do not replicate, but the polygenic scores. I'm not surprised, but, as you know, there are probably at least a hundred papers out there that are based on the infallibility of LD Score regression and genomic prediction. I understand the need for caution before attacking this edifice, but I nonetheless think some clarification is unavoidable.

We have now extended our discussion of the issues for polygenic scores in the discussion, and how stratification is a strong source of false among-population variance for studies for samples with variation along axes of European variation.

Reviewer #2:

This is a carefully argued and scholarly examination of why estimates of selection, based on "polygenic scores", are weaker in the UK Biobank data than in earlier studies. This is an important issue, and so even though the causes of the discrepancy are not at all clear, the paper will be an important contribution. I have various comments below, and on the annotated PDF, several of these ask for a clearer definition of terminology that will make the paper more accessible to those not immersed in the field.

We have incorporated the reviewer’s comments about clarifying terminology throughout the document.

I would be encouraged by seeing a simulation of a stratified population, in which the standard correction for stratification fails, as is claimed to be the case here. One would also like to have some idea of what strength of allele frequency differentiation is needed to generate the observed differences in polygenic scores, and whether (as claimed) this indeed does not much affect GWAS estimates.. However, this might be a non-trivial exercise, so I would not insist on it. I would like to see some discussion of how such a study could be approached, though.

We agree that it would be useful to offer some simulations to show how stratification affected the original GWAS. However, the issue is that the source of population stratification in the meta-analysis is far from clear, as it involves many studies, each of which performed some level of correction for stratification. As such we feel that any simulation would not really provide much intuition about the level of confounding involved in the original GWAS.

The lesson from this is that it is extremely hard to reliably detect selection through the accumulation of slight changes in allele frequency. However, one should bear in mind that selection on polygenic traits is effective through precisely this process, and is largely unaffected by population structure. This is remarkable, but not paradoxical: natural selection can be effective even when its genetic basis cannot be identified.

Minor Comments:

Abstract: "polygenic differences netween populations" is perhaps ambiguous. You are not referring to differences in breeding value here, but to differences in BV inferred from SNP frequencies. Perhaps that can be said more directly.

Changed to “differences in polygenic score between populations”

Introduction first paragraph: Actually, polygenic scores predict an individual's additive genetic component, which is not the same as "individual phenotypes". Perhaps this could be explained: other studies of QG in the wild emphasise the importance of this distinction.

Changed to “additive genetic component of individual phenotypes”

Subsection “GWAS data used to study adaptation of height” paragraph five: SDS only gets defined in the Table, it needs explanation in the text.

Fixed.

Subsection “Weak replication in UK Biobank”: How confident can we that "current methods are likely sufficient for most applications" ? As noted above, I would have more confidence in this if we understood better how population structure confounds signals of polygenic adaptation.

Changed “most” to “many”, and outlined a few contrasting cases where current methods seem likely sufficient, vs. those where they may not be.

Results section (and elsewhere): "genetic correlations" needs to be defined explicitly, correlations of what with what? Do these strong correlations imply that there is no detectable difference in effect size between populations? What bounds can one put on effect size differences

Added text explaining the procedure for calculating “genetic correlations”, and how we are interpreting them.

Figure 1: Best to label axes "Latitude, European" and "Longitude, Eurasian"

Done.

Subsection “SDS signal of selection in Britain.”: Define LMM! Since the correction for structure is crucial, it needs to be explained clearly early on.

Done.

Subsection “Relationship between GWAS estimates and European population structure”: Can one estimate the predicted strength of the "winner's curse"?

It is possible, but not trivial (e.g. https://www.sciencedirect.com/science/article/pii/S0002929707610970, https://onlinelibrary.wiley.com/doi/abs/10.1002/gepi.20398), to correct for the winner’s curse, but it’s unclear to us in our current application what the practical advantage of doing so would be.

Subsection “LD Score regression signal”: "Meanwhile, confounders such as population structure are expected to affect SNPs of different LD Score equally": It is not obvious to me that this is the case.

This expectation traces its origin to the original LD Score paper (Bulik Sullivan et al. 2015), which argued that the rate of drift and LD Score should be uncorrelated. They acknowledged that background selection could potentially violate this argument, and performed a few simulations in small samples of northern Europeans to support it. This assertion of independence between rate of drift and LD Score has been generally accepted by the human genetics community. However, as we show later in this paper, this assumption does not necessarily hold.

We replaced the word “expected” with “argued” to clarify that we do not ourselves necessarily expect this to hold, but that it has been argued.

Subsection “Summary of LD Score Regression Results”, final sentence: Delete "going forward"

Done.

Discussion section paragraph five: Nevertheless, one would like to know how much stratification may influence the accuracy of GWAS estimates.

We have updated this section of the paper to more thoroughly discuss our current state of knowledge.

Subsection “Expanded SNP Sets” final paragraph: One should be able to quantify this. We know the strength of assortment for height, and its effect on Va. Therefore, we can estimate its effect on pairwise LD, which will be very small because there are so may pairs, but may nevertheless lead to some overall bias. More directly, is there any correlation in polygenic scores between parents, consistent with the known assortment? One might be able to separate out causes of assortment in this way: stratification vs true assortment, causally due to height.

This is an interesting idea that we think may be worth pursuing in the future. However, we note that the accelerated rate of drift relative to our neutral expectation depends on the degree of assortative mating in the past, when the specified drift was taking place, and may therefore differ from the rate that would be expected given the present day degree of assortative mating. The best that can likely be estimated from this sort of signature is some sort of time averaged rate of assortative mating over the period during which the populations under study were diverging.

This does raise the point that, so long as stabilizing selection is not too strong, strong assortative mating over short periods of time may be essentially indistinguishable from bouts of selection. This is a topic some of us are interested in exploring in the future.

Reviewer #3:

Both manuscripts by Berg et al. and Sohail et al. present thorough and insightful analyses with highly relevant results for current and future GWAS studies. Even prior to publication, the manuscripts have considerable impact. They will be widely read and cited. I do not think that further analyses are needed, with the potential exception of the third point below. All other points concern the discussion, in particular the guidance for further research that will surely emerge from these studies.

# How safe are results based on the UK Biobank data?

This refers to the weak signals reported (with much caution) in the present studies, but also to potential future results on other traits. You recommend using data "such as UKB" and we will certainly see many more studies based on this resource. I would therefore appreciate a more specific discussion of risks connected to this particular data set.

1) Stratification even within the UKB-GB data: It is well known that height and socioeconomic status are correlated in modern societies (e.g. BMJ 2016; 352:i582), and social status correlates with descent. In the UK, both factors are also geographically stratified, with people living in the north of the country having lower socioeconomic status and shorter stature, on average, than those in the south. Furthermore, the percentage of Anglo-Saxon admixture varies across the UK. How could these factors influence results based on UKB data, both here and otherwise?

We now have a brief discussion of remaining stratification in the UK Biobank, and point to other papers on this topic.

2) Potential influence of GxE interactions: The manuscripts focus (for good reason) on issues connected with stratification. However, if polygenic scores depend on the environment (e.g., due to countergradient variation), GxE interactions are an alternative confounding factor. Importantly, use of a homogeneous detection panel (to avoid stratification), such as UKB-GB, could increase these effects. Maybe this should be briefly discussed in the context of the present results and mentioned as a necessary caveat also for future studies that use detection panels from narrow geographic regions.

This is a fair point, but also one that has been covered fairly extensively in the Berg and Coop 2014 paper, as well as the recent 2018 commentary from Novembre and Barton. We now include a comment regarding the promises and pitfalls of studying polygenic phenotypes across ancestry and environmental gradients in our closing paragraph, and direct readers to these two papers.

# What, exactly, causes the problems with the previous data?

3) There seem to be two relevant differences of the GIANT data relative to UKB: 1) UKB is much more homogeneous and 2) GIANT is a meta-study, collecting summary statistics from many sources that are individually corrected for stratification. One would like to know better which factor is decisive. This could be further addressed by combining summaries from sub-samples of the "UKB-all" data in an artificial meta-study.

We agree this would be interesting. However, a thorough investigation of this issue could easily constitute an entire paper of its own, and without access to the original GIANT cohort data, would bring us no closer to determining what actually happened in that dataset.

4) The Robinson et al., 2015 GWAS: Sib-based studies are done to avoid/minimize stratification effects and the Robinson 2015 data have been used as a proof of robustness in several previous studies. The fact that you find clear signs of stratification is sobering and one would like to know what has gone wrong. You may not currently have any explanation and this is fair enough. However, the discussion should be clearer and say upfront that results based on these data cannot be trusted until we understand the issues.

Robinson et al. have now informed us that the summary statistics they posted online in 2016 were A) not the same statistics actually used in their 2015 paper, and B) afflicted by a plink bug which resulted in mishandling of family structured data, and therefore no control for population structure (http://cnsgenomics.com/data.html). We have updated the paper throughout to reflect this, and we added an analysis of a new set of corrected summary statistics they released.

Minor Comments:

a) The very low r2 in Figure 3C is strange. Any idea?

This is due to the substantially lower power of the Robinson sib study relative to GIANT. This should result in a greater distance, on average, between the chosen lead SNP and the causal variation it tags, leading to much weaker replication of effect sizes. We have added a comment on the difference in power to the figure caption.

b) Figure 4: I was looking for the analysis using the other GB datasets, turns out that this is done in the supplementary information, but you should briefly comment on the results in the results part of the main text

Added.

https://doi.org/10.7554/eLife.39725.051

Article and author information

Author details

  1. Jeremy J Berg

    Department of Biological Sciences, Columbia University, New York, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Arbel Harpak and Nasa Sinnott-Armstrong
    For correspondence
    jeremy.jackson.berg@gmail.com
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5411-6840
  2. Arbel Harpak

    1. Department of Biological Sciences, Columbia University, New York, United States
    2. Department of Biology, Stanford University, Stanford, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Jeremy J Berg and Nasa Sinnott-Armstrong
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3655-748X
  3. Nasa Sinnott-Armstrong

    Department of Genetics, Stanford University, Stanford, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Jeremy J Berg and Arbel Harpak
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4490-0601
  4. Anja Moltke Joergensen

    Lundbeck GeoGenetics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Validation, Investigation, Visualization, Writing—original draft
    Competing interests
    No competing interests declared
  5. Hakhamanesh Mostafavi

    Department of Biological Sciences, Columbia University, New York, United States
    Contribution
    Data curation, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1060-2844
  6. Yair Field

    Department of Genetics, Stanford University, Stanford, United States
    Contribution
    Conceptualization
    Competing interests
    No competing interests declared
  7. Evan August Boyle

    Department of Genetics, Stanford University, Stanford, United States
    Contribution
    Conceptualization
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4494-9771
  8. Xinjun Zhang

    Department of Anthropology, University of California, Davis, Davis, United States
    Contribution
    Conceptualization
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1298-3545
  9. Fernando Racimo

    Lundbeck GeoGenetics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Conceptualization, Supervision, Investigation, Writing—original draft, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5025-2607
  10. Jonathan K Pritchard

    1. Department of Biology, Stanford University, Stanford, United States
    2. Howard Hughes Medical Institute, Stanford University, Stanford, United States
    Contribution
    Conceptualization, Supervision, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    pritch@stanford.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8828-5236
  11. Graham Coop

    1. Center for Population Biology, University of California, Davis, Davis, United States
    2. Department of Evolution and Ecology, University of California, Davis, Davis, United States
    Contribution
    Conceptualization, Supervision, Project administration
    For correspondence
    gmcoop@ucdavis.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8431-0302

Funding

National Institutes of Health (R01 GM115889)

  • Jeremy J Berg

National Institutes of Health (F32 GM126787)

  • Jeremy J Berg

Stanford Center for Computational, Evolutionary and Human Genomics (Fellowship)

  • Arbel Harpak

U.S. Department of Defense (National Defense Science and Engineering Gran)

  • Nasa Sinnott-Armstrong

Stanford University (Graduate Fellowship)

  • Nasa Sinnott-Armstrong

U.S. Department of Defense (National Defense Science and Engineering Grant)

  • Nasa Sinnott-Armstrong

National Institutes of Health (R01 GM121372)

  • Hakhamanesh Mostafavi

Villum Fonden (Young Investigator award)

  • Fernando Racimo

National Institutes of Health (R01 HG008140)

  • Jonathan K Pritchard

National Institutes of Health (R01 GM108779)

  • Graham Coop

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This research has been conducted using the UK Biobank resource under applications Number 11138 and 24983. We thank Manuel Rivas for assistance with the accessing this resource. We thank the Coop lab, Pritchard lab, Przeworski lab, Sella lab, Doc Edge, John Novembre, Guy Sella, Molly Przeworski, Joshua Schraiber, Loic Yengo and the authors of Robinson et al. (2015) and Sohail et al., 2019 for helpful conversations and feedback on earlier drafts. We also thank Magnus Nordborg, Nicholas Barton, and Joachim Hermisson for helpful comments during the review process. JJB was supported in part by NIH R01 GM115889 to Guy Sella and in part by NIH Grant F32 GM126787 to JJB. AH was supported, in part, by a fellowship from the Stanford Center for Computational, Evolutionary and Human Genomics. NS-A was supported by a Stanford Graduate Fellowship and by the Department of Defense through a National Defense Science and Engineering Grant. HM was supported by NIH R01 GM121372 to Molly Przeworski. FR was supported by a Villum Fonden Young Investigator award. GC was partially supported by NIH R01 GM108779. Work in the Pritchard lab was supported in part by NIH R01 HG008140.

Senior Editor

  1. Mark I McCarthy, University of Oxford, United Kingdom

Reviewing Editor

  1. Magnus Nordborg, Austrian Academy of Sciences, Austria

Reviewers

  1. Magnus Nordborg, Austrian Academy of Sciences, Austria
  2. Nicholas H Barton, Institute of Science and Technology Austria, Austria
  3. Joachim Hermisson, University of Vienna, Austria

Publication history

  1. Received: July 2, 2018
  2. Accepted: January 15, 2019
  3. Version of Record published: March 21, 2019 (version 1)
  4. Version of Record updated: March 22, 2019 (version 2)

Copyright

© 2019, Berg et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 4,450
    Page views
  • 509
    Downloads
  • 11
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Evolutionary Biology
    2. Genetics and Genomics
    Nick Barton et al.
    Insight

    Great care is needed when interpreting claims about the genetic basis of human variation based on data from genome-wide association studies.

    1. Evolutionary Biology
    2. Microbiology and Infectious Disease
    Peter Major et al.
    Research Article Updated