Research Communication

Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies

Brigham and Women’s Hospital and Harvard Medical School, United States
Harvard Medical School, United States
Broad Institute of MIT and Harvard, United States
Massachusetts General Hospital, United States
Karolinska Institutet, Sweden
University of Helsinki, Finland
Brown University, United States
University of Southern California, United States
Boston Children’s Hospital, United States
University of Pennsylvania, United States
Howard Hughes Medical Institute, Harvard Medical School, United States

Mar 21, 2019

Open access

Abstract

Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

https://doi.org/10.7554/eLife.39702.001

Introduction

Most human complex traits are highly polygenic (Yang et al., 2010; Boyle et al., 2017). For example, height has been estimated to be modulated by as much as 4% of human allelic variation (Boyle et al., 2017; Zeng et al., 2018). Polygenic traits are expected to evolve differently from monogenic ones, through slight but coordinated shifts in the frequencies of a large number of alleles, each with mostly small effect. In recent years, multiple methods have sought to detect selection on polygenic traits by evaluating whether shifts in the frequency of trait-associated alleles are correlated with the signed effects of the alleles estimated by genome-wide association studies (GWAS) (Turchin et al., 2012; Berg and Coop, 2014; Mathieson et al., 2015; Robinson et al., 2015; Berg et al., 2017; Racimo et al., 2018; Guo et al., 2018).

Here we focus on a series of recent studies—some involving co-authors of the present manuscript—that have reported evidence of polygenic adaptation at alleles associated with height in Europeans. One set of studies observed that height-increasing alleles are systematically elevated in frequency in northern compared to southern European populations, a result that has subsequently been extended to ancient DNA (Turchin et al., 2012; Berg and Coop, 2014; Mathieson et al., 2015; Robinson et al., 2015; Berg et al., 2017; Racimo et al., 2018; Guo et al., 2018; Simonti et al., 2017). Another study using a very different methodology (singleton density scores, SDS) found that height-increasing alleles have systematically more recent coalescence times in the United Kingdom (UK) consistent with selection for increased height in the last few thousand years (Field et al., 2016a). In the present work, we assess polygenic adaptation on human height as a particular case of the effects that uncorrected population structure in GWAS can have on studies of complex traits.

Most of these previous studies have been based on SNP associations and effect sizes (summary statistics) reported by the GIANT Consortium, which most recently combined 79 individual GWAS through meta-analysis, including a total of 253,288 individuals (Lango Allen et al., 2010; Wood et al., 2014). Here, we show that the selection effects described in these studies are severely attenuated and in some cases no longer significant when using summary statistics derived from the UK Biobank, an independent and larger study that includes 336,474 genetically unrelated individuals who derive their recent ancestry almost entirely from the British Isles (identified as ‘white British ancestry’ by the UK Biobank) (Supplementary file 1). The UK Biobank analysis is based on a single cohort drawn from a relatively homogeneous population enabling better control of population stratification. Both datasets have high concordance even for low P value SNPs which do not reach genome-wide significance (Figure 1—figure supplement 1; genetic correlation between the two height studies is 0.94 [se = 0.0078]). Despite this concordance, we observe that small but systematic biases lead to the two datasets yielding qualitatively different conclusions with respect to signals of polygenic adaptation.

Results

Discrepancies in GWAS: population-level differences in height

To study population level differences among ancient and present-day European samples, we began by estimating ‘polygenic height scores’ as sums of allele frequencies at independent SNPs weighted by their effect sizes from GIANT. We used a set of different significance thresholds and strategies to correct for linkage disequilibrium as employed by previous studies, and replicated their signals for significant differences in genetic height across populations (Turchin et al., 2012; Berg and Coop, 2014; Mathieson et al., 2015; Robinson et al., 2015; Berg et al., 2017; Racimo et al., 2018; Guo et al., 2018; Simonti et al., 2017) (Figure 1a, Figure 1—figure supplement 2). We then repeated the analysis using summary statistics from a GWAS for height in the UK Biobank restricting to individuals of British Isles ancestry (hereafter referred to as the ‘white British' (WB) subset) and correcting for population stratification based on the first ten principal components (UK Biobank [UKB]; also referred to as ‘UKB Neale’ in the supplementary figures) (Churchhouse et al., 2017). This analysis resulted in a dramatic attenuation of differences in polygenic height scores (Figure 1a, Figure 1—figure supplements 2–4). The differences between ancient European populations also greatly attenuated (Figure 1a, Figure 1—figure supplement 5). Strikingly, the ordering of the scores for populations also changed depending on which GWAS was used to estimate genetic height both within Europe (Figure 1a, Figure 1—figure supplements 2–5) and globally (Figure 1—figure supplement 6), consistent with reports from a recent simulation study (Martin et al., 2017). The height scores were qualitatively similar only when we restricted to independent genome-wide significant SNPs in GIANT and the UK Biobank (p<5×10⁻⁸) (Figure 1—figure supplement 2b). 
This replicates the originally reported significant north-south difference in the allele frequency of the height-increasing allele (Turchin et al., 2012) or in genetic height (Berg and Coop, 2014) across Europe, as well as the finding of greater genetic height in ancient European steppe pastoralists than in ancient European farmers (Mathieson et al., 2015), although the signals are attenuated even here. Our observations suggest that tests of polygenic adaptation based on genome-wide significant SNPs are relatively consistent across different GWAS (Figure 1—figure supplement 2b) and that our concern is primarily directed towards the use of sub-significant SNPs in polygenic scores (Figure 1a, Figure 1—figure supplement 2a).
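
As an illustration of the score construction described above, the following minimal sketch computes effect-size-weighted sums of allele frequencies per population and centers them by the average score across populations. The function name, the toy frequencies and effect sizes, the factor of 2 for diploid genotypes, and the additive-variance formula used for standardization are all assumptions made for this example; they are not taken from the analysis code of this study.

```python
from math import sqrt

def polygenic_scores(freqs, betas):
    """Return centered, standardized polygenic scores per population.

    freqs: dict mapping population name -> list of effect-allele frequencies
    betas: list of GWAS effect sizes, one per SNP, in the same SNP order
    """
    # Raw score: effect-size-weighted sum of allele frequencies.
    # The factor 2 (diploid genotypes) is an assumed convention.
    raw = {pop: 2.0 * sum(b * p for b, p in zip(betas, ps))
           for pop, ps in freqs.items()}
    # Center by the average score across populations.
    mean_score = sum(raw.values()) / len(raw)
    # Assumed additive variance: 4 * sum(beta^2 * pbar * (1 - pbar)),
    # where pbar is the mean allele frequency across populations.
    n_pops = len(freqs)
    pbar = [sum(ps[i] for ps in freqs.values()) / n_pops
            for i in range(len(betas))]
    v_a = 4.0 * sum(b * b * p * (1.0 - p) for b, p in zip(betas, pbar))
    return {pop: (z - mean_score) / sqrt(v_a) for pop, z in raw.items()}

# Toy, made-up inputs: 3 SNPs in 2 hypothetical populations.
toy_freqs = {"north": [0.6, 0.5, 0.3], "south": [0.4, 0.5, 0.3]}
toy_betas = [0.05, -0.02, 0.01]
scores = polygenic_scores(toy_freqs, toy_betas)
```

In this toy case only the first SNP differs in frequency between the two populations, so the entire score difference comes from that SNP's estimated effect. Any bias in that estimate carries directly into the population difference.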

Figure 1 with 6 supplements

Polygenic height scores and tSDS scores based on GIANT and UK Biobank GWAS.

(a) Polygenic scores in present-day and ancient European populations are shown, centered by the average score across populations and standardized by the square root of the additive variance. Independent SNPs for the polygenic score from both GIANT (*red*) and the UK Biobank [UKB] (*blue*) were selected by picking the SNP with the lowest P value in each of 1700 independent LD blocks similarly to refs. (Berg et al., 2017; Racimo et al., 2018) (see Materials and methods). Present-day populations are shown from Northern Europe (CEU, GBR) and Southern Europe (IBS, TSI) from the 1000 genomes project; Ancient populations are shown in three meta-populations (HG = Hunter Gatherer (n = 162 individuals), EF = Early Farmer (n = 485 individuals), and SP = Steppe Ancestry (n = 465 individuals)) (see Supplementary file 2). Error bars are drawn at 95% credible intervals. See Figure 1—figure supplement 1 for analyses of concordance of effect size estimates between GIANT and UKB. See Figure 1—figure supplements 2–6 for polygenic height scores computed using other linkage disequilibrium pruning procedures, significance thresholds, summary statistics and populations. (b) tSDS for height-increasing allele in GIANT (left) and UK Biobank (right). The tSDS method was applied using pre-computed Singleton Density Scores for 4,451,435 autosomal SNPs obtained from 3195 individuals from the UK10K project (Field et al., 2016a; Field et al., 2016b) for SNPs associated with height in GIANT and the UK Biobank. SNPs were ordered by GWAS P value and grouped into bins of 1000 SNPs each. The mean tSDS score within each P value bin is shown on the y-axis. The Spearman correlation coefficient between the tSDS scores and GWAS P values, as well as the correlation standard errors and P values, were computed on the un-binned data. The gray line indicates the null-expectation, and the colored lines are the linear regression fit. 
The correlation is significant for GIANT (Spearman r = 0.078, p=1.55×10⁻⁶⁵) but not for UK Biobank (Spearman r = −0.009, p=0.077). See Figure 1—source data 1 for figure data.

https://doi.org/10.7554/eLife.39702.002

Figure 1—source data 1. Polygenic height scores and tSDS scores based on GIANT and UK Biobank GWAS. https://doi.org/10.7554/eLife.39702.009

Discrepancies in GWAS: height evolution within a single population

Next, we assessed whether an independent measure, the ‘singleton density score’ (SDS), which uses a coalescent approach to infer adaptation within a population, is equally susceptible to biases in GWAS (Field et al., 2016a; Field et al., 2016b). SDS can be combined with GWAS effect size estimates to infer polygenic adaptation on complex traits (generating a ‘tSDS score’ by aligning the SDS sign to the trait-increasing allele). A tSDS score larger than zero for height-increasing alleles implies that these alleles have been increasing in frequency in a population over time due to natural selection. We replicate the original finding that SDS scores of the height-increasing allele computed in the UK population (using the UK10K dataset) increase with stronger association of the alleles to height as inferred by GIANT (Field et al., 2016a) across the entire P value spectrum (Spearman’s ρ = 0.078, p=1.55×10⁻⁶⁵, Figure 1b). However, we observed that this signal of polygenic adaptation in the UK, measured using a Spearman correlation across all GWAS SNPs, disappeared when we used the UK Biobank height effect size estimates (ρ = −0.009, p=0.077, Figure 1b). These observations suggest that concerns about sub-significant SNPs should be directed not only towards population-level differences using polygenic scores but also towards analyses of adaptation within a single population.
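
The sign alignment and rank correlation described above can be sketched as follows. This is a simplified illustration in plain Python: the helper names and the toy values are hypothetical, association strength is represented here as −log10(P) (an assumed convention), and the jackknife P values used in the paper are not reproduced.

```python
from math import log10

def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def tsds(sds, beta):
    """Flip the SDS sign so it refers to the trait-increasing allele."""
    return [s if b > 0 else -s for s, b in zip(sds, beta)]

# Toy, made-up values for four SNPs.
toy_sds = [1.2, -0.5, 0.3, -0.1]
toy_beta = [0.04, -0.03, -0.01, 0.002]
toy_pval = [1e-9, 1e-5, 0.02, 0.6]

aligned = tsds(toy_sds, toy_beta)
strength = [-log10(p) for p in toy_pval]
rho = spearman(aligned, strength)
```

A positive `rho` means that tSDS tends to be larger at more strongly associated SNPs, which is the pattern reported for GIANT.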

Population structure underlying discrepancies in GWAS

Discrepancies between GIANT and UK Biobank

We propose that the qualitative difference between the polygenic adaptation signals in GIANT and the UK Biobank is due to the cumulative effect of subtle biases in the effect sizes of individual SNPs estimated in GIANT. This bias can arise due to incomplete control of the population structure in GWAS (Novembre and Barton, 2018). For example, if height were differentiated along a north-south axis because of differences in environment, any variant that is differentiated in frequency along the same axis would have an artificially large effect size estimated in the GWAS. Population structure is substantially less well controlled for in the GIANT study than in the UK Biobank study. This is both because the GIANT study population is more heterogeneous than that in the UK Biobank, and because population structure in the GIANT meta-analysis may not have been well controlled in some component cohorts due to their relatively small sizes (the ability to detect and correct for population structure depends on sample size; Patterson et al., 2006; Price et al., 2006). The GIANT meta-analysis also found that such stratification effects worsen as SNPs below genome-wide significance are used to estimate height scores (Wood et al., 2014), consistent with our finding that the differences in genetic height among populations increase when including these SNPs.
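
The north-south example above can be made concrete with a small simulation. In this sketch a SNP has no true effect on height, but its frequency and the environmental contribution to height both differ between two hypothetical populations. All parameter values are invented for illustration, and centering within population is used as a simple stand-in for the principal component correction applied in real GWAS.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def center_within(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def simulate(n_per_pop=5000, seed=1):
    """Two populations; the SNP has NO true effect on the trait."""
    rng = random.Random(seed)
    geno, height, pop = [], [], []
    # (label, allele frequency, environmental height shift): all made up.
    for label, freq, env in (("north", 0.7, 1.0), ("south", 0.3, 0.0)):
        for _ in range(n_per_pop):
            g = (rng.random() < freq) + (rng.random() < freq)  # 0, 1 or 2
            geno.append(g)
            height.append(env + rng.gauss(0.0, 1.0))
            pop.append(label)
    return geno, height, pop

geno, height, pop = simulate()
# Ignoring structure: the frequency difference masquerades as an effect.
naive_beta = slope(geno, height)
# Accounting for structure: the spurious effect largely disappears.
adjusted_beta = slope(center_within(geno, pop), center_within(height, pop))
```

Under these assumed parameters the uncorrected estimate is clearly positive even though the variant is causally inert, whereas the within-population estimate stays close to zero.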

We obtained direct confirmation that population structure is more correlated with effect size estimates in GIANT than with those in the UK Biobank. Figure 2a shows that the effect sizes estimated in GIANT, in contrast to those in the UKB, are highly correlated with the SNP loadings of several principal components of population structure (PC loadings). We also find that the UK Biobank estimates including individuals of diverse ancestry and not correcting for population structure (UKB all no PCs) show the same stratification effects as GIANT (Figure 2—figure supplements 1–3). Further, in line with our intuition regarding the effects of residual stratification on GWAS effect size estimates, we find that alleles that are more common in the Great Britain population (1000 genomes GBR) than in the Tuscan population from Italy (1000 genomes TSI) tend to be preferentially estimated as height-increasing according to the GIANT study but not according to the UKB study (Figure 2c, Figure 2—figure supplements 2–3).
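
One way to quantify such a correlation together with its uncertainty is a delete-one-block jackknife over contiguous blocks of SNPs. The sketch below is a generic illustration, assuming the SNPs are already sorted by genomic position and split into equally sized blocks. The function names and toy values are hypothetical, and this is not the authors' code.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def block_jackknife_corr(x, y, n_blocks):
    """Correlation of x and y plus a delete-one-block jackknife SE.

    x, y: lists ordered by genomic position (an assumption of this
    sketch), e.g. PC loadings and GWAS effect sizes per SNP.
    """
    n = len(x)
    full = pearson(x, y)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    leave_one_out = []
    for b in range(n_blocks):
        lo, hi = bounds[b], bounds[b + 1]
        # Recompute the correlation with block b removed.
        leave_one_out.append(pearson(x[:lo] + x[hi:], y[:lo] + y[hi:]))
    mean_loo = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum(
        (r - mean_loo) ** 2 for r in leave_one_out)
    return full, var ** 0.5

# Toy, made-up "PC loadings" and "betas" for 20 SNPs.
toy_loadings = [float(i) for i in range(20)]
toy_betas = [i + (1.0 if i % 2 else -1.0) for i in range(20)]
r, se = block_jackknife_corr(toy_loadings, toy_betas, n_blocks=5)
```

Dropping whole blocks rather than single SNPs is meant to respect the dependence between neighbouring SNPs caused by linkage disequilibrium.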

Figure 2 with 4 supplements

Evidence of stratification in height summary statistics.

Top row: Pearson Correlation coefficients of (a) PC loadings and height beta coefficients from GIANT and UKB, and (b) PC loadings and SDS (pre-computed in the UK10K) across all SNPs. PCs were computed in all 1000 genomes phase one samples (Abecasis et al., 2012). Colors indicate the correlation of each PC loading with the allele frequency difference between GBR and TSI, a proxy for the European North-South genetic differentiation. PC 4 and 11 are most highly correlated with the GBR - TSI allele frequency difference. Confidence intervals and P values are based on Jackknife standard errors (1000 blocks). Open circles indicate correlations significant at alpha = 0.05, stars indicate correlations significant after Bonferroni correction in 20 PCs (p<0.0025). Bottom row: Heat map after binning all SNPs by GBR and TSI minor allele frequency of (c) mean beta coefficients from GIANT and UKB, and (d) SDS scores for all SNPs. Only bins with at least 300 SNPs are shown. While the stratification effect in SDS is not unexpected, it can lead to false conclusions when applied to summary statistics that exhibit similar stratification effects. See Figure 2—figure supplements 1–3 for analyses of stratification effects in different summary statistics, and Supplementary file 3 for further description of stratification effects. UKB height betas exhibit stratification effects that are weaker, and in the opposite direction of the stratification effects in GIANT (see Figure 2—figure supplement 4 for a possible explanation). See Figure 2—source data 1 for figure data.

https://doi.org/10.7554/eLife.39702.010

Figure 2—source data 1. Evidence of stratification in height summary statistics. https://doi.org/10.7554/eLife.39702.015

Effect size estimates from previously published family-based height GWAS

We analyzed previously released family-based effect size estimates based on an approach of Robinson et al. (2015) (NG2015 sibs). Surprisingly, we found that while these summary statistics produced significant polygenic adaptation signals, they were also correlated with PC loadings as well as with GBR-TSI allele frequency differences (Figure 2—figure supplements 1–3). This suggests that these estimates are also affected by population structure despite being computed within families and, therefore, in principle, robust to structure. Our own family-based estimates in the UK Biobank (UKB sibs all, UKB sibs WB) appear unconfounded and do not produce significant adaptation signals across the spectrum of associated SNPs (Figure 2—figure supplements 1–3). The residual structure in the original NG2015 sibs dataset is likely to reflect a technical artifact (personal communication from Peter Visscher, and note on their website [Program in Complex Trait Genomics, 2018]). Berg and colleagues (Berg et al., 2019) show that the updated NG2015 sibs summary statistics (posted in the public domain [Program in Complex Trait Genomics, 2018] in November 2018 during the revision of this manuscript) do not show significant signals of polygenic adaptation using either polygenic score differences in Europe or the tSDS metric in the UK.

Population structure within the UK Biobank

We also note that the white British subset of the UKB data is not completely free of population stratification (as shown previously [Haworth et al., 2019]), although the magnitude of the potential confounding is much smaller than in the Continental European population (Figure 2—figure supplements 1–2). Interestingly, the north-south genetic cline in the UK tracks the height gradient in the opposite direction to that in Continental Europe (Figure 2—figure supplements 2 and 4), and after correcting with principal components, we do not observe any evidence of residual stratification in comparison with the 1000 genomes data (Figure 2a,c). However, we cannot exclude the possibility of uncorrected population stratification, even in the UK Biobank, along axes not captured by the principal components of the 1000 genomes project data. For example, even for genome-wide significant SNPs (Figure 1—figure supplement 2b), polygenic scores for both modern and ancient individuals change when UKB summary statistics (WB ancestry controlling for 10 PCs) are used instead of GIANT. This shift, seen for example in the ancient European hunter-gatherer polygenic score, is troubling because different European populations have been shown to carry variable amounts of genetic ancestry from ancient ‘hunter-gatherer’, ‘early farmer’ and ‘steppe ancestry’ populations (Haak et al., 2015; Galinsky et al., 2016); it could reflect residual stratification in the UKB GWAS not captured by the 1000 genomes PCs.

Effects of population structure on within-population adaptation inference

We proceeded to investigate the effects of uncontrolled population stratification in GWAS discussed above on a coalescent approach such as tSDS that relies on singleton density (Field et al., 2016a). In principle, this approach is robust to the type of population stratification that affects the allele-frequency based tests. However, there is a north-south cline in singleton density in Europe due to lower genetic diversity in northern than in southern Europeans, leading to singleton density being lower in northern than in southern regions (Sohail et al., 2017). As a consequence, SDS tends to be higher (corresponding to fewer singletons) in alleles more common in GBR than in TSI (Figure 2d). This cline in singleton density coincidentally parallels the phenotypic cline in height and the major axis of genome-wide genetic variation. Therefore, when we perform the tSDS test using GIANT, we find a higher SDS around the inferred height-increasing alleles, which tend, due to the uncontrolled population stratification in GIANT, to be at high frequency in northern Europe (Figure 2c). This effect does not appear when we use UK Biobank summary statistics because of the much lower level of population stratification and more modest variation in height. We find that SDS is not only correlated with GBR-TSI allele frequency differences, but with several principal component loadings across all SNPs (Figure 2b), and that these SDS-PC correlations often coincide with correlations between GIANT-estimated effect sizes and PC loadings (Figure 2a). We further find that the tSDS signal which is observed across the whole range of P values in some GWAS summary statistics can be mimicked by replacing SDS with GBR-TSI allele frequency differences (Figure 3a and c, Figure 3—figure supplements 1–4), suggesting that the tSDS signal at non-significant SNPs may be driven in part by residual population stratification.
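
The comparison described above amounts to running the same binned summary twice, once on SDS and once on GBR-TSI allele frequency differences. The following sketch orders SNPs by GWAS P value, groups them into fixed-size bins, and averages a per-SNP statistic that has been sign-aligned to the height-increasing allele. The function names and toy numbers are hypothetical. The analyses in this study use bins of 1000 SNPs, while the toy example uses bins of 2.

```python
def align_to_increasing_allele(stat, beta):
    """Flip the statistic's sign where the effect allele decreases height."""
    return [s if b > 0 else -s for s, b in zip(stat, beta)]

def binned_means(pvals, stat, bin_size=1000):
    """Mean of `stat` in consecutive bins of SNPs ordered by P value."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    means = []
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        means.append(sum(stat[i] for i in idx) / len(idx))
    return means

# Toy, made-up values for four SNPs.
toy_pval = [0.5, 1e-8, 0.01, 0.9]
toy_beta = [-0.01, 0.05, 0.02, 0.003]
# GBR minus TSI frequency of the effect allele (could equally be SDS).
toy_freq_diff = [0.02, 0.08, 0.04, -0.01]

aligned_diff = align_to_increasing_allele(toy_freq_diff, toy_beta)
bin_means = binned_means(toy_pval, aligned_diff, bin_size=2)
```

If the binned profile obtained from allele frequency differences resembles the one obtained from SDS, the trend across P value bins is compatible with stratification and need not reflect selection.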

Figure 3 with 5 supplements

Height tSDS results for different summary statistics.

(a) Mean tSDS of the height-increasing allele in each P value bin for six different summary statistics. The first two panels are computed analogously to Figure 4A and Figure S22 of Field et al. (2016a). In contrast to those figures and to Figure 1b, the displayed betas and P values correspond to the slope and P value of the linear regression across all un-binned SNPs (rather than the Spearman correlation coefficient and Jackknife P values). The y-axis has been truncated at 0.75, and does not show the top bin for UKB all no PCs, which has a mean tSDS of 1.5. (b) tSDS distribution of the height-increasing allele in 506 LD-independent SNPs which are genome-wide significant in a UKB height GWAS, where the beta coefficient is taken from a within-sibling analysis in the UKB. The gray curve represents the standard normal null distribution, and we observe a significant shift. (c) Allele frequency difference between GBR and TSI of the height-increasing allele in each P value bin for six different summary statistics. Betas and P values correspond to the slope and P value of the linear regression across all un-binned SNPs. The lowest P value bin in UKB all no PCs with a y-axis value of 0.06 has been omitted. (d) Allele frequency difference between GBR and TSI of the height-increasing allele in 329 LD-independent SNPs which are genome-wide significant in a UKB height GWAS and were intersected with our set of 1000 genomes SNPs. There is no significant difference in frequency in these two populations, suggesting that the tSDS shift at the genome-wide significant SNPs is not driven by population stratification at least due to this particular axis. The patterns shown here suggest that the positive tSDS values across the whole range of P values are a consequence of residual stratification. At the same time, the increase in tSDS at genome-wide significant, LD-independent SNPs in (b) cannot be explained by GBR - TSI allele frequency differences as shown in (d). See Figure 3—figure supplements 1–4 for other GWAS summary statistics for unpruned and LD-pruned SNPs. Binning SNPs by P value without LD-pruning can lead to unpredictable patterns at the low P value end, as the SNPs at the low P value end are less independent of each other than higher P value SNPs (Figure 3—figure supplement 5). See Figure 3—source data 1 for figure data.

https://doi.org/10.7554/eLife.39702.016

Figure 3—source data 1. Height tSDS results for different summary statistics. https://doi.org/10.7554/eLife.39702.022

A residual signal of polygenic adaptation on height?

For polygenic adaptation within a population, a small but significant tSDS signal is observed in the UK when we restrict to genome-wide significant SNPs (p<5×10⁻⁸). This effect persists when using UK Biobank family-based estimates (UKB sibs WB) for genome-wide significant SNPs (Figure 3b), and is not driven by allele frequency differences between GBR and TSI (Figure 3d), suggesting an attenuated signal of polygenic adaptation in the UK that is driven by a much smaller number of SNPs than previously thought. Indeed, under most genetic architectures, a tSDS signal which is driven by natural selection is not expected to lead to an almost linear increase over the whole P value range in a well-powered GWAS. Instead, we would expect to see a greater difference between highly significant SNPs and non-significant SNPs, similar to the pattern observed in the UK Biobank (Figure 3a).

For population-level differences in height, we assessed whether any remaining variation in height polygenic scores among populations is driven by polygenic adaptation by testing against a null model of genetic drift (Berg and Coop, 2014). We re-computed polygenic height scores in the POPRES dataset to increase power for this analysis as it has larger sample sizes of northern and southern Europeans than the 1000 Genomes project (Nelson et al., 2008). We computed height scores using independent SNPs that are 1) genome-wide significant in the UK Biobank (‘gw-sig’, p<5×10⁻⁸) and 2) sub-significantly associated with height (‘sub-sig’, p<0.01) in different GWAS datasets. For each of these, we tested if population differences were significant due to an overall overdispersion (P_Qx), and if they were significant along a north-south cline (P_lat) (Figure 4, Figure 4—figure supplements 1–2). Both gw-sig and sub-sig SNP-based scores computed using GIANT effect sizes showed significant overdispersion of height scores overall and along a latitude cline, consistent with previous results (Figure 4, Figure 4—figure supplements 1–2). However, the signal attenuated dramatically between sub-sig (Q_x = 1100, P_Qx = 1×10⁻²²⁰) and gw-sig (Q_x = 48, P_Qx = 2×10⁻⁴) height scores. In comparison, scores that were computed using the UK Biobank (UKB) effect sizes showed substantially attenuated differences using both sub-sig (Q_x = 64, P_Qx = 5×10⁻⁷) and gw-sig (Q_x = 33, P_Qx = 0.02) SNPs, and a smaller difference between the two scores. This suggests that the attenuation of the signal in GIANT is not only driven by a loss of power when using fewer gw-sig SNPs, but also reflects a decrease in stratification effects. The overdispersion signal disappeared entirely when the UK Biobank family based effect sizes were used (Figure 4, Figure 4—figure supplements 1–2). 
Moreover, Q_x P values based on randomly ascertained SNPs and UK Biobank summary statistics are not uniformly distributed as would be expected if the theoretical null model is valid and if population structure is absent (Figure 4—figure supplement 3). The possibility of residual stratification effects even in the UK Biobank is also supported by a recent study (Haworth et al., 2019). Therefore, we remain cautious about interpreting any residual signals as ‘real’ signals of polygenic adaptation.
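
The calibration argument above rests on the fact that P values from a valid null model should be uniformly distributed on [0, 1]. One simple way to measure departure from uniformity is the Kolmogorov-Smirnov distance between the empirical distribution of the P values and the uniform distribution. The sketch below is offered only as an illustration of that idea; the text does not state which diagnostic was used, and the function name and toy P values are hypothetical.

```python
def ks_distance_from_uniform(pvals):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    P values and the Uniform(0, 1) CDF."""
    xs = sorted(pvals)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Compare x with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d

# Toy, made-up P values from repeated tests on randomly chosen SNPs.
evenly_spread = [0.1, 0.3, 0.5, 0.7, 0.9]
skewed_low = [0.001, 0.002, 0.01, 0.02, 0.03]

d_even = ks_distance_from_uniform(evenly_spread)
d_skewed = ks_distance_from_uniform(skewed_low)
```

A large distance for P values computed on randomly ascertained SNPs, which should carry no adaptation signal, would indicate that the null model is miscalibrated or that residual structure is present.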

Figure 4 with 4 supplements

Polygenic height scores in POPRES populations show a residual albeit attenuated signal of polygenic adaptation for height.

Standardized polygenic height scores from four summary statistics for 19 POPRES populations with at least 10 samples per population, ordered by latitude (see Supplementary file 4). The grey line is the linear regression fit to the mean polygenic scores per population. Error bars represent 95% confidence intervals and are calculated in the same way as in Figure 1. SNPs which were overlapping between each set of the summary statistics and the POPRES SNPs were clumped using PLINK 1.9 with parameters r^2 < 0.1, 1 Mb distance, p<1. (Top) A number of independent SNPs was chosen for each summary statistic to match the number of SNPs which remained when clumping UKB at p<0.01. (Bottom) A set of independent SNPs with p<5×10⁻⁸ in the UK Biobank was selected and used to compute polygenic scores along with effect size estimates from each of the different summary statistics. The numbers on each plot show the Q_x P value and the latitude covariance P value respectively for each summary statistic. See Figure 4—figure supplements 1–4 for other clumping strategies and GWAS summary statistics. See Figure 4—source data 1 for figure data.
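
For readers unfamiliar with clumping, the following sketch shows a greedy procedure that mirrors the stated parameters (r^2 < 0.1, 1 Mb distance, p<1). It is a simplified illustration and not PLINK's implementation: the tuple layout, the precomputed pairwise r^2 lookup and the SNP identifiers are all hypothetical.

```python
def clump(snps, r2, r2_min=0.1, window_bp=1_000_000, p_max=1.0):
    """Greedy LD clumping.

    snps: list of (snp_id, chrom, pos, pval) tuples
    r2:   dict mapping frozenset({id_a, id_b}) -> pairwise r^2;
          pairs that are absent are treated as unlinked (r^2 = 0)
    Returns the ids of retained SNPs, most significant first.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s[3]):
        sid, chrom, pos, pval = snp
        if pval > p_max:
            continue
        # Drop the SNP if it is in LD with an already retained,
        # more significant SNP within the distance window.
        linked = any(
            chrom == k_chrom
            and abs(pos - k_pos) <= window_bp
            and r2.get(frozenset((sid, k_id)), 0.0) >= r2_min
            for k_id, k_chrom, k_pos, _ in kept
        )
        if not linked:
            kept.append(snp)
    return [s[0] for s in kept]

# Toy, made-up SNPs: snp_b is in strong LD with the more significant snp_a.
toy_snps = [
    ("snp_a", 1, 100_000, 1e-9),
    ("snp_b", 1, 150_000, 1e-4),
    ("snp_c", 1, 5_000_000, 0.003),
    ("snp_d", 2, 120_000, 0.2),
]
toy_r2 = {frozenset(("snp_a", "snp_b")): 0.8}
independent = clump(toy_snps, toy_r2)
```

Here `snp_b` is dropped because it lies within 1 Mb of `snp_a` and exceeds the r^2 threshold, while `snp_c` and `snp_d` are retained.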

https://doi.org/10.7554/eLife.39702.023

Figure 4—source data 1. Polygenic height scores in POPRES populations show a residual albeit attenuated signal of polygenic adaptation for height. https://doi.org/10.7554/eLife.39702.028

Discussion

We have shown, by conducting a detailed analysis of human height, that estimates of population differences in polygenic scores are reduced when using the UK Biobank GWAS data relative to claims of previous studies that used GWAS meta-analyses such as GIANT. We find some evidence for population-level differences in genetic height, but it can only be robustly seen at highly significant SNPs, because any signal at less significant P values is dominated by the effect of residual population stratification. Even genome-wide significant SNPs in these analyses may be subtly affected by population structure, leading to continued overestimation of the effect. Thus, it is difficult to arrive at any quantitative conclusion regarding the proportion of the population differences that are due to statistical biases vs. population stratification of genetic height. Further, estimates of the number of independent genetic loci contributing to complex trait variation are sensitive to and likely confounded by residual population stratification.

We conclude that while effect estimates are highly concordant between GIANT and the UK Biobank when measured individually (Supplementary files 5–7, Figure 1—figure supplement 1), they are also influenced by residual population stratification that can mislead comparisons of complex traits across populations and inferences about polygenic adaptation. Although these biases are subtle, in the context of tests for polygenic adaptation, which are driven by small systematic shifts in allele frequency, they can create highly significant artificial signals, especially when SNPs that are not genome-wide significant are used to estimate genetic height. Our results do not question the reliability of the genome-wide significant associations discovered in the GIANT cohort. However, we urge caution in the interpretation of signals of polygenic adaptation or between-population differences that are based on large numbers of sub-significant SNPs, particularly when using effect sizes derived from meta-analyses of heterogeneous cohorts, which may be unable to fully control for population structure.

Our results have implications in other areas of human genetics research. For example, there is growing interest in polygenic scores that predict complex phenotypes from the aggregate effects of all allelic variants (Wray et al., 2007; Purcell et al., 2009; Vilhjálmsson et al., 2015; Chun et al., 2018). The observation that individuals with extreme values of polygenic scores exhibit many-fold elevated risk of common diseases raises hopes for their potential clinical utility (Ganna et al., 2013; Khera et al., 2018), and use for sociogenomics applications (Lee et al., 2018; Savage et al., 2018; Nagel et al., 2018). It is already clear that polygenic scores derived from European populations do not translate across populations on a global scale (Martin et al., 2017). Our analysis further suggests that subtle population structure, especially in GWAS that are meta-analyses of independent cohorts, could be an additional source of error in polygenic scores and affect their applicability even within populations. We also note that other factors such as gene by environment interactions can be an alternative confounding factor for GWAS effect sizes and polygenic scores.

Materials and methods

Genome-wide association studies (GWAS)

We analyzed height using publicly available summary statistics that were obtained either by meta-analysis of multiple GWAS or by a GWAS performed on a single large population. We used results from the GIANT Consortium (N = 253,288) (Wood et al., 2014) and a GWAS performed on individuals of the UK Biobank (‘UKB Neale’ or simply ‘UK Biobank (UKB)’, N = 336,474) (Churchhouse et al., 2017) who derive their ancestry almost entirely from the British Isles (identified as ‘white British ancestry (WB)’ by the UK Biobank). The Neale lab’s GWAS uses a linear model with sex and 10 principal components as covariates. We also used an independent GWAS that included all UK Biobank European samples, allowing related individuals as well as population structure (‘UKB Loh’, N = 459,327) (Loh et al., 2018). Loh et al.’s GWAS uses a BOLT-LMM Bayesian mixed model (Loh et al., 2018). Association signals from the three studies are generally correlated for SNPs that are genome-wide significant in GIANT (see Yengo et al., 2018).

We also used previously published family-based effect size estimates (Robinson et al., 2015) (‘NG2015 sibs’) as well as a number of test summary statistics on the UK Biobank that we generated to study the effects of population stratification. These are: ‘UKB Neale new’ (similar to UKB Neale, with a less stringent ancestry definition and 20 PCs calculated within sample), ‘UKB all no PCs’ (all UK Biobank samples included in the GWAS without correction by principal components), ‘UKB all 10 PCs’ (all UK Biobank samples included in the GWAS with correction by 10 principal components), ‘UKB WB no PCs’ (only ‘white British ancestry’ samples included in the GWAS without correction by principal components), ‘UKB WB 10 PCs’ (only ‘white British ancestry’ samples included in the GWAS with correction by 10 principal components), ‘UKB sibs all’ (all UK Biobank siblings included in the GWAS), and ‘UKB sibs WB’ (only UK Biobank ‘white British ancestry’ siblings included in the GWAS). See Supplementary file 1 for sample sizes and other details.

Population genetic data for ancient and modern samples

We analyzed ancient and modern populations for which genotype data are publicly available. For ancient samples (Haak et al., 2015; Mathieson et al., 2018), we computed scores after dividing populations into three previously described broad ancestry labels (HG = Hunter Gatherer (n = 162 individuals), EF = Early Farmer (n = 485 individuals), and SP = Steppe Ancestry (n = 465 individuals)). For modern samples available through the 1000 genomes phase three release (Auton et al., 2015), we computed scores in two populations each from Northern Europe (GBR, CEU), Southern Europe (IBS, TSI), Africa (YRI, LWK), South Asia (PJL, BEB) and East Asia (CHB, JPT) (Figure 1a). In total, we analyzed 1112 ancient individuals, and 1005 modern individuals from 10 different populations in the 1000 genomes project (Supplementary file 2). We used the allele frequency differences between the GBR and TSI populations for a number of analyses to study population stratification (Figures 2–3). We also analyzed 19 European populations from the POPRES (Nelson et al., 2008) dataset with at least 10 samples per population (Figure 4—figure supplement 4).

All ancient samples had ‘pseudo-haploid’ genotype calls at 1240k sites generated by selecting a single sequence randomly for each individual at each SNP (Mathieson et al., 2018). Thus, there is only a single allele from each individual at each site, but adjacent alleles might come from either of the two haplotypes of the individual. We also re-computed scores in present-day 1000 genomes individuals using only pseudo-haploid calls at 1240k sites to allow for a fair comparison between ancient and modern samples (Figure 1—figure supplement 6).
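
As an illustration only, the following Python sketch shows one way a pseudo-haploid call could be drawn from a diploid genotype, as would be needed for the present-day comparison. It is not the pipeline used for the ancient data, where a single sequence is sampled at each site; the function names and the genotype encoding (0, 1 or 2 copies of the alternate allele) are assumptions made for this example.

```python
import random


def pseudo_haploid_call(alt_allele_count, rng):
    """Draw one allele (0 = reference, 1 = alternate) from a diploid genotype.

    alt_allele_count must be 0, 1 or 2. Heterozygotes yield either allele
    with probability 1/2; homozygotes always yield their only allele.
    """
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("diploid genotype must be 0, 1 or 2")
    return 1 if rng.random() < alt_allele_count / 2 else 0


def pseudo_haploid_frequency(genotypes, rng):
    """Alternate-allele frequency from one sampled allele per individual."""
    calls = [pseudo_haploid_call(g, rng) for g in genotypes]
    return sum(calls) / len(calls)


rng = random.Random(0)  # fixed seed so the example is reproducible
genotypes_at_site = [0, 1, 2, 1, 0, 2, 2, 1]  # made-up diploid genotypes
print(pseudo_haploid_frequency(genotypes_at_site, rng))
```

Because each individual contributes a single allele, the frequency estimate at a site rests on half as many alleles as a diploid call set of the same size.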

Polygenic scores

The polygenic scores, confidence intervals and test statistics (against the null model of genetic drift) were computed based on the methodology developed in references Berg and Coop, 2014 and Berg et al., 2017. We computed the polygenic score (Z) for a trait in a population by taking the sum of allele frequencies in that population across all L sites associated with the trait, weighting each allele’s frequency ( $p_{l}$ ) by its effect on the trait ( $β_{l}$ ).

$$Z = \sum_{l=1}^{L} β_{l} p_{l}$$

All polygenic scores are plotted in centered standardized form ( $\frac{Z - μ}{\sqrt{V_{A}}}$ ),

where $μ = \sum_{l} β_{l} \bar{p_{l}}$ , $V_{A} = \sum_{l} β_{l}^{2} \bar{p_{l}} (1 - \bar{p_{l}})$ , and $\bar{p_{l}}$ is the mean allele frequency across all populations analyzed. Source code repositories for the polygenic score analysis and computing scripts and source data for all the main figures have been made available at https://github.com/msohail88/polygenic_selection (Sohail, 2018; copy archived at https://github.com/elifesciences-publications/polygenic_selection) and https://github.com/uqrmaie1/sohail_maier_2019 (Sohail, 2019; copy archived at https://github.com/elifesciences-publications/sohail_maier_2019).
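
The score and its standardization defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the code in the repositories; the function names, population labels and numerical values are made up for the example.

```python
from math import sqrt


def polygenic_score(betas, freqs):
    """Z = sum over SNPs of effect size times allele frequency."""
    return sum(b * p for b, p in zip(betas, freqs))


def standardized_scores(betas, freqs_by_pop):
    """Return {population: (Z - mu) / sqrt(V_A)}.

    freqs_by_pop maps a population label to its allele frequencies,
    listed in the same SNP order as betas.
    """
    n_pops = len(freqs_by_pop)
    n_snps = len(betas)
    # Mean allele frequency across all populations analyzed (p-bar).
    mean_freqs = [
        sum(freqs[l] for freqs in freqs_by_pop.values()) / n_pops
        for l in range(n_snps)
    ]
    mu = sum(b * p for b, p in zip(betas, mean_freqs))
    v_a = sum(b * b * p * (1 - p) for b, p in zip(betas, mean_freqs))
    return {
        pop: (polygenic_score(betas, freqs) - mu) / sqrt(v_a)
        for pop, freqs in freqs_by_pop.items()
    }


example_betas = [0.10, -0.20, 0.05]  # made-up effect sizes
example_freqs = {
    "pop_A": [0.5, 0.3, 0.8],  # made-up allele frequencies
    "pop_B": [0.4, 0.5, 0.6],
}
print(standardized_scores(example_betas, example_freqs))
```

Since μ is built from the unweighted mean frequency across populations, the centered scores sum to zero over the populations included.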

Polygenic scores were computed using independent GWAS SNPs associated with height in three main ways: (1) The genome was divided into ~1700 non-overlapping linkage disequilibrium (LD) blocks (using the approximately independent linkage disequilibrium blocks in the EUR population computed in Berisa and Pickrell, 2015), and the SNP with the lowest P value within each block was picked to give a set of ~1700 independent SNPs for each height GWAS used (all SNPs for which effect sizes are available were considered) similar to the analysis in Berg et al., 2017. In (2) and (3), Plink’s (Chang et al., 2015; Purcell and Chang, 2015) clumping procedure was used to make independent ‘clumps’ of SNPs for each GWAS at different P value thresholds. This procedure selects SNPs below a given P value threshold as index SNPs to start clumps around, and then reduces all SNPs below a given P value threshold that are in LD with these index SNPs (above an r² threshold, 0.1) and within a physical distance of them (1 Mb) into clumps with them. Clumps are preferentially formed around index SNPs with the lowest P value in a greedy manner. The index SNP from each clump is then picked for further polygenic score analyses. The algorithm is also greedy such that each SNP will only appear in one clump if at all. We clumped each GWAS to obtain (2) a set of independent sub-significant SNPs associated with height (p<0.01) similarly to Robinson et al. (2015), and (3) a set of independent genome-wide significant SNPs associated with height (p<5×10⁻⁸). The 1000 genomes phase three dataset was used as the reference panel for computing LD for the clumping procedure.

The estimated effect sizes for these three sets of SNPs from each GWAS were used to compute scores. Only autosomal SNPs were used for all analyses to avoid creating artificial mean differences between populations with different numbers of males and females.
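
The greedy logic of the clumping step described above can be illustrated with a simplified Python sketch. It is not PLINK's implementation: it uses a single P value threshold for both index and clumped SNPs, and it takes LD from a hypothetical lookup function `r2` supplied by the caller. The SNP identifiers and values are invented for the example.

```python
def greedy_clump(snps, r2, p_threshold=0.01, r2_threshold=0.1,
                 window_bp=1_000_000):
    """Return the index SNP ids, one per clump, in order of increasing P value.

    snps: list of dicts with keys "id", "chrom", "pos" and "p".
    r2:   callable (id_a, id_b) -> LD r^2 from a reference panel
          (hypothetical helper, assumed for this sketch).
    """
    # Only SNPs below the P value threshold can start or join a clump.
    candidates = sorted(
        (s for s in snps if s["p"] < p_threshold), key=lambda s: s["p"]
    )
    assigned = set()
    index_snps = []
    for index in candidates:  # lowest P value first (greedy)
        if index["id"] in assigned:
            continue
        assigned.add(index["id"])
        index_snps.append(index["id"])
        for other in candidates:
            if other["id"] in assigned or other["chrom"] != index["chrom"]:
                continue
            close = abs(other["pos"] - index["pos"]) <= window_bp
            if close and r2(index["id"], other["id"]) > r2_threshold:
                assigned.add(other["id"])  # each SNP joins at most one clump
    return index_snps


example_snps = [
    {"id": "snp_a", "chrom": 1, "pos": 1_000, "p": 1e-9},
    {"id": "snp_b", "chrom": 1, "pos": 2_000, "p": 1e-4},
    {"id": "snp_c", "chrom": 1, "pos": 5_000_000, "p": 1e-3},
    {"id": "snp_d", "chrom": 2, "pos": 1_000, "p": 0.5},
]
ld_table = {frozenset(("snp_a", "snp_b")): 0.8}  # made-up LD values


def example_r2(id_a, id_b):
    """Look up r^2 for a pair of SNPs; unlisted pairs are treated as unlinked."""
    return ld_table.get(frozenset((id_a, id_b)), 0.0)


print(greedy_clump(example_snps, example_r2))
```

In this toy input, `snp_b` is absorbed into the clump of `snp_a`, `snp_c` lies outside the 1 Mb window and starts its own clump, and `snp_d` does not pass the P value threshold.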

The 95% credible intervals were constructed by assuming that the posterior of the underlying population allele frequency is independent across loci and populations and follows a beta distribution. We updated a Uniform prior distribution with allele counts from ancient and modern populations to obtain the posterior distribution at each locus in each population. We estimated the variance of the polygenic score $V_{Z}$ using the variance of the posterior distribution at each locus, and computed the width of 95% credible intervals as $1.96 \sqrt{V_{Z}}$ for each population.
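
A minimal Python sketch of this calculation is given below. It assumes that the Uniform prior is Beta(1, 1), so that observing k copies of an allele among n sampled alleles gives a Beta(1 + k, 1 + n − k) posterior, and it treats $1.96 \sqrt{V_{Z}}$ as the distance from the score to each interval bound. Both points, and the function names, are assumptions of this example rather than details taken from the analysis code.

```python
from math import sqrt


def beta_posterior_variance(allele_count, n_alleles):
    """Variance of the Beta posterior for an allele frequency.

    Assumes a Uniform, i.e. Beta(1, 1), prior updated with allele_count
    copies of the allele out of n_alleles sampled alleles.
    """
    a = 1 + allele_count
    b = 1 + n_alleles - allele_count
    return a * b / ((a + b) ** 2 * (a + b + 1))


def score_variance(betas, allele_counts, n_alleles):
    """V_Z = sum over loci of beta^2 times the posterior variance of p."""
    return sum(
        b * b * beta_posterior_variance(k, n)
        for b, k, n in zip(betas, allele_counts, n_alleles)
    )


def credible_interval_extent(betas, allele_counts, n_alleles):
    """1.96 * sqrt(V_Z), assuming independence across loci."""
    return 1.96 * sqrt(score_variance(betas, allele_counts, n_alleles))


# Made-up example: three loci in one population.
print(credible_interval_extent([0.10, -0.20, 0.05], [12, 30, 7], [40, 40, 38]))
```

With few sampled alleles, as for many ancient populations, the posterior variance at each locus is large and the interval widens accordingly.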

The $Q_{X}$ test statistic measures the degree of overdispersion of the population mean polygenic scores compared to a null model of genetic drift. It assumes that the vector of mean-centered population mean polygenic scores follows a multivariate normal distribution: $Z \sim MVN(0, 2 V_{A} F)$, where $V_{A}$ is the additive genetic variance of the ancestral population and F is a square matrix describing the population structure. This is equivalent to the univariate case of the test statistic used in Robinson et al. (2015). The latitude test statistic assumes that $Y'Z \sim N(0, 2 V_{A} Y'FY)$, where Y is a mean-centered vector of latitudes for each population (Berg et al., 2019).
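
For orientation, the quadratic forms implied by these two null distributions can be written out as below. This is a sketch derived from the stated normal models, with a prime denoting the transpose and K denoting the dimension of Z; the statement that K equals the number of populations minus one reflects our reading of Berg and Coop (2014), where one population is dropped after mean centering, and should be checked against that paper.

```latex
% If Z ~ MVN(0, 2 V_A F) with F invertible and of dimension K, then
Q_{X} = \frac{Z' F^{-1} Z}{2 V_{A}} \sim \chi^{2}_{K}

% If Y'Z ~ N(0, 2 V_A Y'FY), then the squared, scaled projection satisfies
\frac{(Y'Z)^{2}}{2 V_{A} \, Y'FY} \sim \chi^{2}_{1}
```

Values of either statistic in the upper tail of the corresponding chi-squared distribution indicate more divergence among population scores than expected under drift alone.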

tSDS analysis

The Singleton Density Score (SDS) method identifies signatures of recent positive selection based on a maximum likelihood estimate of the log-ratio of the mean tip-branch length of the derived vs. the ancestral allele at a given SNP. The tip-branch lengths are inferred from the average distance of each allele to the nearest singleton SNP across all individuals in a sequencing panel. When the sign of the SDS scores is aligned with the trait-increasing or trait-decreasing allele in the effect estimates of a GWAS, the Spearman correlation between the resulting tSDS scores and the GWAS P values has been proposed as an estimate of recent positive selection on polygenic traits.

Here, we applied the tSDS method using pre-computed Singleton Density Scores for 4,451,435 autosomal SNPs obtained from 3195 individuals from the UK10K project (Field et al., 2016a; Field et al., 2016b) for SNPs associated with height in GIANT and the UK Biobank (Figure 1b) and in different summary statistics (Figure 3). After normalizing SDS scores in each 1% allele frequency bin to mean zero and unit variance, excluding SNPs from the MHC region on chromosome 6 and aligning the sign of the SDS scores to the height-increasing alleles (resulting in tSDS scores), we computed the Spearman correlation coefficient between the tSDS score and the GWAS P value. The tSDS Spearman correlation standard errors and P values were computed using a block-jackknife approach, where each block of 1% of all SNPs ordered by genomic location was left out and the Spearman correlation coefficient was computed on the remaining SNPs. We also compared the tSDS score distributions for only genome-wide significant SNPs (Figure 3b).
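
The correlation and block-jackknife steps can be sketched in Python as follows. The sketch assumes that the tSDS scores have already been normalized and sign-aligned and that SNPs are ordered by genomic location. It uses the standard delete-one-block jackknife variance formula, which the description above does not spell out, so that choice is an assumption; all names and values are invented for the example.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def block_jackknife_se(tsds, pvals, n_blocks=100):
    """Standard error of the Spearman correlation, leaving out one block at a time.

    n_blocks=100 corresponds to blocks holding 1% of the SNPs each.
    """
    n = len(tsds)
    bounds = [round(b * n / n_blocks) for b in range(n_blocks + 1)]
    estimates = []
    for b in range(n_blocks):
        lo, hi = bounds[b], bounds[b + 1]
        estimates.append(
            spearman(tsds[:lo] + tsds[hi:], pvals[:lo] + pvals[hi:])
        )
    mean_est = sum(estimates) / n_blocks
    return sqrt(
        (n_blocks - 1) / n_blocks * sum((e - mean_est) ** 2 for e in estimates)
    )


# Made-up values for ten SNPs, ordered by position.
example_tsds = [0.5, -0.2, 1.1, 0.3, -0.7, 0.9, 0.1, -0.4, 1.4, 0.0]
example_pvals = [0.01, 0.6, 0.001, 0.2, 0.9, 0.03, 0.4, 0.7, 0.0005, 0.5]
print(spearman(example_tsds, example_pvals))
print(block_jackknife_se(example_tsds, example_pvals, n_blocks=5))
```

Leaving out contiguous blocks rather than single SNPs is what allows the standard error to reflect correlation between neighboring SNPs.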

Population structure analysis

To compute SNP loadings of the principal components of population structure (PC loadings) in the 1000 genomes data (Figure 2), we first computed PC scores for each individual. We used SNPs that had matching alleles in 1000 genomes, GIANT and UK Biobank, that had minor allele frequency >5% in 1000 genomes, and that were not located in the MHC locus, the chromosome 8 inversion region, or regions of long LD. After LD pruning to SNPs with r² < 0.2 relative to each other, PCA was performed in PLINK on the 187,160 remaining SNPs. In order to get SNP PC loadings for more SNPs than those that were used to compute PC scores, we performed linear regressions of the PC scores on the genotype allele count of each SNP (after controlling for sex) and used the resulting regression coefficients as the SNP PC loading estimates. The 1000 genomes phase one dataset (Abecasis et al., 2012) was used to compute the PC loadings.
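
The per-SNP regression can be illustrated with the short Python sketch below. It obtains the coefficient on allele count from a regression of the PC score on allele count and sex by first removing the effect of sex from both variables, which gives the same coefficient as fitting the full model. The function names and the toy data are assumptions made for this example.

```python
def residualize(values, covariate):
    """Residuals of values after regression on an intercept and one covariate."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    var_c = sum((c - mean_c) ** 2 for c in covariate)
    slope = sum(
        (c - mean_c) * (v - mean_v) for c, v in zip(covariate, values)
    ) / var_c
    return [
        (v - mean_v) - slope * (c - mean_c)
        for v, c in zip(values, covariate)
    ]


def snp_pc_loading(pc_scores, allele_counts, sex):
    """Coefficient on allele count in the model pc_score ~ allele_count + sex."""
    pc_res = residualize(pc_scores, sex)
    g_res = residualize(allele_counts, sex)
    return sum(g * p for g, p in zip(g_res, pc_res)) / sum(g * g for g in g_res)


# Toy data: six individuals, with PC scores built so that the true
# coefficient on allele count is 2.
example_counts = [0, 1, 2, 1, 0, 2]
example_sex = [0, 1, 0, 1, 1, 0]
example_pc = [2 * g + 3 * s + 1 for g, s in zip(example_counts, example_sex)]
print(snp_pc_loading(example_pc, example_counts, example_sex))
```

Because each SNP is regressed separately, loadings can be estimated for SNPs that were removed by LD pruning before the PCA.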

Data availability

All newly generated UK Biobank height GWAS summary statistics have been made available at http://dx.doi.org/10.5061/dryad.8g5g6j4. Results from the GIANT Consortium (GWAS Anthropometric 2014 Height) were downloaded from https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files#GWAS_Anthropometric_2014_Height. GWAS results from the UK Biobank ("UKB" or "UKB Neale") were downloaded from http://www.nealelab.is/uk-biobank. The previously published family-based effect size estimates ("NG2015 sibs") can be accessed here http://cnsgenomics.com/data/robinson_et_al_2015_ng/withinfam_summary_ht_bmi_release_March2016.tar.gz. The independent mixed model association analysis that included all UK Biobank individuals of European ancestry ("UKB Loh") was downloaded from https://data.broadinstitute.org/alkesgroup/UKBB/body_HEIGHTz.sumstats.gz. Approximately independent linkage disequilibrium blocks in human populations were downloaded for the EUR population from https://bitbucket.org/nygcresearch/ldetect-data/overview. Source code repositories for the polygenic score analysis in this manuscript and computing scripts and source data for all the main figures have been made available at https://github.com/msohail88/polygenic_selection and https://github.com/uqrmaie1/sohail_maier_2019 (copies archived at https://github.com/elifesciences-publications/polygenic_selection and https://github.com/elifesciences-publications/sohail_maier_2019, respectively).

The following data sets were generated

(2018) Dryad Digital Repository
Data from: Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies.

https://doi.org/10.5061/dryad.8g5g6j4

References

(2012) An integrated map of genetic variation from 1,092 human genomes
Nature 491:56–65.

https://doi.org/10.1038/nature11632
(2015) A global reference for human genetic variation
Nature 526:68–74.

https://doi.org/10.1038/nature15393
Preprint
1. Berg JJ
2. Zhang X
3. Coop G
(2017) Polygenic adaptation has impacted multiple anthropometric traits
BioRxiv.

https://doi.org/10.1101/167551
1. Berg JJ
2. Harpak A
3. Sinnott-Armstrong N
4. Joergensen AM
5. Mostafavi H
6. Field Y
7. Boyle EA
8. Zhang X
9. Racimo F
10. Pritchard JK
11. Coop G
(2019) Reduced signal for polygenic adaptation of height in UK Biobank
eLife 8:e39725.

https://doi.org/10.7554/eLife.39725
1. Berg JJ
2. Coop G
(2014) A population genetic signal of polygenic adaptation
PLOS Genetics 10:e1004412.

https://doi.org/10.1371/journal.pgen.1004412
1. Berisa T
2. Pickrell JK
(2015) Approximately independent linkage disequilibrium blocks in human populations
Bioinformatics 32:btv546.

https://doi.org/10.1093/bioinformatics/btv546
(2017) An expanded view of complex traits: from polygenic to omnigenic
Cell 169:1177–1186.

https://doi.org/10.1016/j.cell.2017.05.038
1. Chang CC
2. Chow CC
3. Tellier LC
4. Vattikuti S
5. Purcell SM
6. Lee JJ
(2015) Second-generation PLINK: rising to the challenge of larger and richer datasets
GigaScience 4:1–16.

https://doi.org/10.1186/s13742-015-0047-8
Preprint
(2018) Non-parametric polygenic risk prediction using partitioned GWAS summary statistics
BioRxiv.

https://doi.org/10.1101/370064
Website
1. Churchhouse C
2. Neale BM
3. Abbott L
4. Anttila V
5. Aragam K
6. Baumann A
7. Bloom J
8. Bryant S
9. Churchhouse C
10. Cole J
11. Daly MJ
12. Damian R
13. Ganna A
14. Goldstein J
15. Haas M
16. Hirschhorn J
17. Howrigan D
18. Jones E
19. King D
(2017) Rapid GWAS of thousands of phenotypes for 337,000 samples in the UK Biobank
Accessed February 11, 2018.

https://sites.google.com/broadinstitute.org/ukbbgwasresults/home?authuser=0
1. Field Y
2. Boyle EA
3. Telis N
4. Gao Z
5. Gaulton KJ
6. Golan D
7. Yengo L
8. Rocheleau G
9. Froguel P
10. McCarthy MI
11. Pritchard JK
(2016a) Detection of human adaptation during the past 2000 years
Science 354:760–764.

https://doi.org/10.1126/science.aag0776
Data
1. Field Y
2. Boyle E
3. Telis N
4. Gao Z
5. Gaulton K
6. Golan D
7. Yengo L
8. Rocheleau G
9. Froguel P
10. McCarthy M
11. Pritchard J
(2016b) Data from: detection of human adaptation during the past 2000 years
Dryad Digital Repository.

https://doi.org/10.5061/dryad.kd58f
(2016) Population structure of UK biobank and ancient eurasians reveals adaptation at genes influencing blood pressure
The American Journal of Human Genetics 99:1130–1139.

https://doi.org/10.1016/j.ajhg.2016.09.014
1. Ganna A
2. Magnusson PK
3. Pedersen NL
4. de Faire U
5. Reilly M
6. Arnlöv J
7. Sundström J
8. Hamsten A
9. Ingelsson E
(2013) Multilocus genetic risk scores for coronary heart disease prediction
Arteriosclerosis, Thrombosis, and Vascular Biology 33:2267–2272.

https://doi.org/10.1161/ATVBAHA.113.301218
1. Guo J
2. Wu Y
3. Zhu Z
4. Zheng Z
5. Trzaskowski M
6. Zeng J
7. Robinson MR
8. Visscher PM
9. Yang J
(2018) Global genetic differentiation of complex traits shaped by natural selection in humans
Nature Communications 9:1–9.

https://doi.org/10.1038/s41467-018-04191-y
1. Haak W
2. Lazaridis I
3. Patterson N
4. Rohland N
5. Mallick S
6. Llamas B
7. Brandt G
8. Nordenfelt S
9. Harney E
10. Stewardson K
11. Fu Q
12. Mittnik A
13. Bánffy E
14. Economou C
15. Francken M
16. Friederich S
17. Pena RG
18. Hallgren F
19. Khartanovich V
20. Khokhlov A
21. Kunst M
22. Kuznetsov P
23. Meller H
24. Mochalov O
25. Moiseyev V
26. Nicklisch N
27. Pichler SL
28. Risch R
29. Rojo Guerra MA
30. Roth C
31. Szécsényi-Nagy A
32. Wahl J
33. Meyer M
34. Krause J
35. Brown D
36. Anthony D
37. Cooper A
38. Alt KW
39. Reich D
(2015) Massive migration from the steppe was a source for Indo-European languages in Europe
Nature 522:207–211.

https://doi.org/10.1038/nature14317
1. Haworth S
2. Mitchell R
3. Corbin L
4. Wade KH
5. Dudding T
6. Budu-Aggrey A
7. Carslake D
8. Hemani G
9. Paternoster L
10. Smith GD
11. Davies N
12. Lawson DJ
13. J Timpson N
(2019) Apparent latent structure within the UK biobank sample has implications for epidemiological analysis
Nature Communications 10:.

https://doi.org/10.1038/s41467-018-08219-1
1. Khera AV
2. Chaffin M
3. Aragam KG
4. Haas ME
5. Roselli C
6. Choi SH
7. Natarajan P
8. Lander ES
9. Lubitz SA
10. Ellinor PT
11. Kathiresan S
(2018) Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations
Nature Genetics 50:1219–1224.

https://doi.org/10.1038/s41588-018-0183-z
1. Lango Allen H
2. Estrada K
3. Lettre G
4. Berndt SI
5. Weedon MN
6. Rivadeneira F
7. Willer CJ
8. Jackson AU
9. Vedantam S
10. Raychaudhuri S
11. Ferreira T
12. Wood AR
13. Weyant RJ
14. Segrè AV
15. Speliotes EK
16. Wheeler E
17. Soranzo N
18. Park JH
19. Yang J
20. Gudbjartsson D
21. Heard-Costa NL
22. Randall JC
23. Qi L
24. Vernon Smith A
25. Mägi R
26. Pastinen T
27. Liang L
28. Heid IM
29. Luan J
30. Thorleifsson G
31. Winkler TW
32. Goddard ME
33. Sin Lo K
34. Palmer C
35. Workalemahu T
36. Aulchenko YS
37. Johansson A
38. Zillikens MC
39. Feitosa MF
40. Esko T
41. Johnson T
42. Ketkar S
43. Kraft P
44. Mangino M
45. Prokopenko I
46. Absher D
47. Albrecht E
48. Ernst F
49. Glazer NL
50. Hayward C
51. Hottenga JJ
52. Jacobs KB
53. Knowles JW
54. Kutalik Z
55. Monda KL
56. Polasek O
57. Preuss M
58. Rayner NW
59. Robertson NR
60. Steinthorsdottir V
61. Tyrer JP
62. Voight BF
63. Wiklund F
64. Xu J
65. Zhao JH
66. Nyholt DR
67. Pellikka N
68. Perola M
69. Perry JR
70. Surakka I
71. Tammesoo ML
72. Altmaier EL
73. Amin N
74. Aspelund T
75. Bhangale T
76. Boucher G
77. Chasman DI
78. Chen C
79. Coin L
80. Cooper MN
81. Dixon AL
82. Gibson Q
83. Grundberg E
84. Hao K
85. Juhani Junttila M
86. Kaplan LM
87. Kettunen J
88. König IR
89. Kwan T
90. Lawrence RW
91. Levinson DF
92. Lorentzon M
93. McKnight B
94. Morris AP
95. Müller M
96. Suh Ngwa J
97. Purcell S
98. Rafelt S
99. Salem RM
100. Salvi E
101. Sanna S
102. Shi J
103. Sovio U
104. Thompson JR
105. Turchin MC
106. Vandenput L
107. Verlaan DJ
108. Vitart V
109. White CC
110. Ziegler A
111. Almgren P
112. Balmforth AJ
113. Campbell H
114. Citterio L
115. De Grandi A
116. Dominiczak A
117. Duan J
118. Elliott P
119. Elosua R
120. Eriksson JG
121. Freimer NB
122. Geus EJ
123. Glorioso N
124. Haiqing S
125. Hartikainen AL
126. Havulinna AS
127. Hicks AA
128. Hui J
129. Igl W
130. Illig T
131. Jula A
132. Kajantie E
133. Kilpeläinen TO
134. Koiranen M
135. Kolcic I
136. Koskinen S
137. Kovacs P
138. Laitinen J
139. Liu J
140. Lokki ML
141. Marusic A
142. Maschio A
143. Meitinger T
144. Mulas A
145. Paré G
146. Parker AN
147. Peden JF
148. Petersmann A
149. Pichler I
150. Pietiläinen KH
151. Pouta A
152. Ridderstråle M
153. Rotter JI
154. Sambrook JG
155. Sanders AR
156. Schmidt CO
157. Sinisalo J
158. Smit JH
159. Stringham HM
160. Bragi Walters G
161. Widen E
162. Wild SH
163. Willemsen G
164. Zagato L
165. Zgaga L
166. Zitting P
167. Alavere H
168. Farrall M
169. McArdle WL
170. Nelis M
171. Peters MJ
172. Ripatti S
173. van Meurs JB
174. Aben KK
175. Ardlie KG
176. Beckmann JS
177. Beilby JP
178. Bergman RN
179. Bergmann S
180. Collins FS
181. Cusi D
182. den Heijer M
183. Eiriksdottir G
184. Gejman PV
185. Hall AS
186. Hamsten A
187. Huikuri HV
188. Iribarren C
189. Kähönen M
190. Kaprio J
191. Kathiresan S
192. Kiemeney L
193. Kocher T
194. Launer LJ
195. Lehtimäki T
196. Melander O
197. Mosley TH
198. Musk AW
199. Nieminen MS
200. O'Donnell CJ
201. Ohlsson C
202. Oostra B
203. Palmer LJ
204. Raitakari O
205. Ridker PM
206. Rioux JD
207. Rissanen A
208. Rivolta C
209. Schunkert H
210. Shuldiner AR
211. Siscovick DS
212. Stumvoll M
213. Tönjes A
214. Tuomilehto J
215. van Ommen GJ
216. Viikari J
217. Heath AC
218. Martin NG
219. Montgomery GW
220. Province MA
221. Kayser M
222. Arnold AM
223. Atwood LD
224. Boerwinkle E
225. Chanock SJ
226. Deloukas P
227. Gieger C
228. Grönberg H
229. Hall P
230. Hattersley AT
231. Hengstenberg C
232. Hoffman W
233. Lathrop GM
234. Salomaa V
235. Schreiber S
236. Uda M
237. Waterworth D
238. Wright AF
239. Assimes TL
240. Barroso I
241. Hofman A
242. Mohlke KL
243. Boomsma DI
244. Caulfield MJ
245. Cupples LA
246. Erdmann J
247. Fox CS
248. Gudnason V
249. Gyllensten U
250. Harris TB
251. Hayes RB
252. Jarvelin MR
253. Mooser V
254. Munroe PB
255. Ouwehand WH
256. Penninx BW
257. Pramstaller PP
258. Quertermous T
259. Rudan I
260. Samani NJ
261. Spector TD
262. Völzke H
263. Watkins H
264. Wilson JF
265. Groop LC
266. Haritunians T
267. Hu FB
268. Kaplan RC
269. Metspalu A
270. North KE
271. Schlessinger D
272. Wareham NJ
273. Hunter DJ
274. O'Connell JR
275. Strachan DP
276. Wichmann HE
277. Borecki IB
278. van Duijn CM
279. Schadt EE
280. Thorsteinsdottir U
281. Peltonen L
282. Uitterlinden AG
283. Visscher PM
284. Chatterjee N
285. Loos RJ
286. Boehnke M
287. McCarthy MI
288. Ingelsson E
289. Lindgren CM
290. Abecasis GR
291. Stefansson K
292. Frayling TM
293. Hirschhorn JN
(2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height
Nature 467:832–838.

https://doi.org/10.1038/nature09410
1. Lee JJ
2. Wedow R
3. Okbay A
4. Kong E
5. Maghzian O
6. Zacher M
7. Nguyen-Viet TA
8. Bowers P
9. Sidorenko J
10. Karlsson Linnér R
11. Fontana MA
12. Kundu T
13. Lee C
14. Li H
15. Li R
16. Royer R
17. Timshel PN
18. Walters RK
19. Willoughby EA
20. Yengo L
21. Alver M
22. Bao Y
23. Clark DW
24. Day FR
25. Furlotte NA
26. Joshi PK
27. Kemper KE
28. Kleinman A
29. Langenberg C
30. Mägi R
31. Trampush JW
32. Verma SS
33. Wu Y
34. Lam M
35. Zhao JH
36. Zheng Z
37. Boardman JD
38. Campbell H
39. Freese J
40. Harris KM
41. Hayward C
42. Herd P
43. Kumari M
44. Lencz T
45. Luan J
46. Malhotra AK
47. Metspalu A
48. Milani L
49. Ong KK
50. Perry JRB
51. Porteous DJ
52. Ritchie MD
53. Smart MC
54. Smith BH
55. Tung JY
56. Wareham NJ
57. Wilson JF
58. Beauchamp JP
59. Conley DC
60. Esko T
61. Lehrer SF
62. Magnusson PKE
63. Oskarsson S
64. Pers TH
65. Robinson MR
66. Thom K
67. Watson C
68. Chabris CF
69. Meyer MN
70. Laibson DI
71. Yang J
72. Johannesson M
73. Koellinger PD
74. Turley P
75. Visscher PM
76. Benjamin DJ
77. Cesarini D
78. 23andMe Research Team COGENT (Cognitive Genomics Consortium) Social Science Genetic Association Consortium
(2018) Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals
Nature Genetics 50:1112–1121.

https://doi.org/10.1038/s41588-018-0147-3
1. Loh PR
2. Kichaev G
3. Gazal S
4. Schoech AP
5. Price AL
(2018) Mixed-model association for biobank-scale datasets
Nature Genetics 50:906–908.

https://doi.org/10.1038/s41588-018-0144-6
1. Martin AR
2. Gignoux CR
3. Walters RK
4. Wojcik GL
5. Neale BM
6. Gravel S
7. Daly MJ
8. Bustamante CD
9. Kenny EE
(2017) Human demographic history impacts genetic risk prediction across diverse populations
The American Journal of Human Genetics 100:635–649.

https://doi.org/10.1016/j.ajhg.2017.03.004
1. Mathieson I
2. Lazaridis I
3. Rohland N
4. Mallick S
5. Patterson N
6. Roodenberg SA
7. Harney E
8. Stewardson K
9. Fernandes D
10. Novak M
11. Sirak K
12. Gamba C
13. Jones ER
14. Llamas B
15. Dryomov S
16. Pickrell J
17. Arsuaga JL
18. de Castro JM
19. Carbonell E
20. Gerritsen F
21. Khokhlov A
22. Kuznetsov P
23. Lozano M
24. Meller H
25. Mochalov O
26. Moiseyev V
27. Guerra MA
28. Roodenberg J
29. Vergès JM
30. Krause J
31. Cooper A
32. Alt KW
33. Brown D
34. Anthony D
35. Lalueza-Fox C
36. Haak W
37. Pinhasi R
38. Reich D
(2015) Genome-wide patterns of selection in 230 ancient eurasians
Nature 528:499–503.

https://doi.org/10.1038/nature16152
1. Mathieson I
2. Alpaslan-Roodenberg S
3. Posth C
4. Szécsényi-Nagy A
5. Rohland N
6. Mallick S
7. Olalde I
8. Broomandkhoshbacht N
9. Candilio F
10. Cheronet O
11. Fernandes D
12. Ferry M
13. Gamarra B
14. Fortes GG
15. Haak W
16. Harney E
17. Jones E
18. Keating D
19. Krause-Kyora B
20. Kucukkalipci I
21. Michel M
22. Mittnik A
23. Nägele K
24. Novak M
25. Oppenheimer J
26. Patterson N
27. Pfrengle S
28. Sirak K
29. Stewardson K
30. Vai S
31. Alexandrov S
32. Alt KW
33. Andreescu R
34. Antonović D
35. Ash A
36. Atanassova N
37. Bacvarov K
38. Gusztáv MB
39. Bocherens H
40. Bolus M
41. Boroneanţ A
42. Boyadzhiev Y
43. Budnik A
44. Burmaz J
45. Chohadzhiev S
46. Conard NJ
47. Cottiaux R
48. Čuka M
49. Cupillard C
50. Drucker DG
51. Elenski N
52. Francken M
53. Galabova B
54. Ganetsovski G
55. Gély B
56. Hajdu T
57. Handzhyiska V
58. Harvati K
59. Higham T
60. Iliev S
61. Janković I
62. Karavanić I
63. Kennett DJ
64. Komšo D
65. Kozak A
66. Labuda D
67. Lari M
68. Lazar C
69. Leppek M
70. Leshtakov K
71. Vetro DL
72. Los D
73. Lozanov I
74. Malina M
75. Martini F
76. McSweeney K
77. Meller H
78. Menđušić M
79. Mirea P
80. Moiseyev V
81. Petrova V
82. Price TD
83. Simalcsik A
84. Sineo L
85. Šlaus M
86. Slavchev V
87. Stanev P
88. Starović A
89. Szeniczey T
90. Talamo S
91. Teschler-Nicola M
92. Thevenet C
93. Valchev I
94. Valentin F
95. Vasilyev S
96. Veljanovska F
97. Venelinova S
98. Veselovskaya E
99. Viola B
100. Virag C
101. Zaninović J
102. Zäuner S
103. Stockhammer PW
104. Catalano G
105. Krauß R
106. Caramelli D
107. Zariņa G
108. Gaydarska B
109. Lillie M
110. Nikitin AG
111. Potekhina I
112. Papathanasiou A
113. Borić D
114. Bonsall C
115. Krause J
116. Pinhasi R
117. Reich D
(2018) The genomic history of southeastern europe
Nature 555:197–203.

https://doi.org/10.1038/nature25778
(2018) Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways
Nature Genetics 50:920–927.

https://doi.org/10.1038/s41588-018-0151-7
1. Nelson MR
2. Bryc K
3. King KS
4. Indap A
5. Boyko AR
6. Novembre J
7. Briley LP
8. Maruyama Y
9. Waterworth DM
10. Waeber G
11. Vollenweider P
12. Oksenberg JR
13. Hauser SL
14. Stirnadel HA
15. Kooner JS
16. Chambers JC
17. Jones B
18. Mooser V
19. Bustamante CD
20. Roses AD
21. Burns DK
22. Ehm MG
23. Lai EH
(2008) The population reference sample, POPRES: a resource for population, disease, and pharmacological genetics research
The American Journal of Human Genetics 83:347–358.

https://doi.org/10.1016/j.ajhg.2008.08.005
1. Novembre J
2. Barton NH
(2018) Tread lightly interpreting polygenic tests of selection
Genetics 208:1351–1355.

https://doi.org/10.1534/genetics.118.300786
(2006) Population structure and eigenanalysis
PLOS Genetics 2:e190.

https://doi.org/10.1371/journal.pgen.0020190
(2006) Principal components analysis corrects for stratification in genome-wide association studies
Nature Genetics 38:904–909.

https://doi.org/10.1038/ng1847
Website
1. Program in Complex Trait Genomics
(2018) Program in complex trait genomics
Accessed December 2, 2018.

http://cnsgenomics.com/data.html
(2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder
Nature 460:748.

https://doi.org/10.1038/nature08185
Software
1. Purcell S
2. Chang C
(2015)
PLINK 1

GigaScience.
(2018) Detecting polygenic adaptation in admixture graphs
Genetics 208:1565–1584.

https://doi.org/10.1534/genetics.117.300489
1. Robinson MR
2. Hemani G
3. Medina-Gomez C
4. Mezzavilla M
5. Esko T
6. Shakhbazov K
7. Powell JE
8. Vinkhuyzen A
9. Berndt SI
10. Gustafsson S
11. Justice AE
12. Kahali B
13. Locke AE
14. Pers TH
15. Vedantam S
16. Wood AR
17. van Rheenen W
18. Andreassen OA
19. Gasparini P
20. Metspalu A
21. Berg LH
22. Veldink JH
23. Rivadeneira F
24. Werge TM
25. Abecasis GR
26. Boomsma DI
27. Chasman DI
28. de Geus EJ
29. Frayling TM
30. Hirschhorn JN
31. Hottenga JJ
32. Ingelsson E
33. Loos RJ
34. Magnusson PK
35. Martin NG
36. Montgomery GW
37. North KE
38. Pedersen NL
39. Spector TD
40. Speliotes EK
41. Goddard ME
42. Yang J
43. Visscher PM
(2015) Population genetic differentiation of height and body mass index across Europe
Nature Genetics 47:1357–1362.

https://doi.org/10.1038/ng.3401
1. Savage JE
2. Jansen PR
3. Stringer S
4. Watanabe K
5. Bryois J
6. de Leeuw CA
7. Nagel M
8. Awasthi S
9. Barr PB
10. Coleman JRI
11. Grasby KL
12. Hammerschlag AR
13. Kaminski JA
14. Karlsson R
15. Krapohl E
16. Lam M
17. Nygaard M
18. Reynolds CA
19. Trampush JW
20. Young H
21. Zabaneh D
22. Hägg S
23. Hansell NK
24. Karlsson IK
25. Linnarsson S
26. Montgomery GW
27. Muñoz-Manchado AB
28. Quinlan EB
29. Schumann G
30. Skene NG
31. Webb BT
32. White T
33. Arking DE
34. Avramopoulos D
35. Bilder RM
36. Bitsios P
37. Burdick KE
38. Cannon TD
39. Chiba-Falek O
40. Christoforou A
41. Cirulli ET
42. Congdon E
43. Corvin A
44. Davies G
45. Deary IJ
46. DeRosse P
47. Dickinson D
48. Djurovic S
49. Donohoe G
50. Conley ED
51. Eriksson JG
52. Espeseth T
53. Freimer NA
54. Giakoumaki S
55. Giegling I
56. Gill M
57. Glahn DC
58. Hariri AR
59. Hatzimanolis A
60. Keller MC
61. Knowles E
62. Koltai D
63. Konte B
64. Lahti J
65. Le Hellard S
66. Lencz T
67. Liewald DC
68. London E
69. Lundervold AJ
70. Malhotra AK
71. Melle I
72. Morris D
73. Need AC
74. Ollier W
75. Palotie A
76. Payton A
77. Pendleton N
78. Poldrack RA
79. Räikkönen K
80. Reinvang I
81. Roussos P
82. Rujescu D
83. Sabb FW
84. Scult MA
85. Smeland OB
86. Smyrnis N
87. Starr JM
88. Steen VM
89. Stefanis NC
90. Straub RE
91. Sundet K
92. Tiemeier H
93. Voineskos AN
94. Weinberger DR
95. Widen E
96. Yu J
97. Abecasis G
98. Andreassen OA
99. Breen G
100. Christiansen L
101. Debrabant B
102. Dick DM
103. Heinz A
104. Hjerling-Leffler J
105. Ikram MA
106. Kendler KS
107. Martin NG
108. Medland SE
109. Pedersen NL
110. Plomin R
111. Polderman TJC
112. Ripke S
113. van der Sluis S
114. Sullivan PF
115. Vrieze SI
116. Wright MJ
117. Posthuma D
(2018) Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence
Nature Genetics 50:912–919.

https://doi.org/10.1038/s41588-018-0152-6
Preprint
1. Simonti C
2. Stein J
3. Thompson P
4. Fisher SE
5. Dan J
(2017) Polygenic selection underlies evolution of human brain structure and behavioral traits
BioRxiv.

https://doi.org/10.1101/164707
(2017) Negative selection in humans and fruit flies involves synergistic epistasis
Science 356:539–542.

https://doi.org/10.1126/science.aah5238
Software
1. Sohail M
(2018) Scripts to compute polygenic scores for height using GIANT and UK Biobank GWAS, version 3a75120
GitHub.

https://github.com/msohail88/polygenic_selection
Software
1. Sohail M
(2019) sohail_maier_2019, version 7e84c66
GitHub.

https://github.com/uqrmaie1/sohail_maier_2019
(2012) Evidence of widespread selection on standing variation in Europe at height-associated SNPs
Nature Genetics 44:1015–1019.

https://doi.org/10.1038/ng.2368
(2015) Modeling linkage disequilibrium increases accuracy of polygenic risk scores
The American Journal of Human Genetics 97:576–592.

https://doi.org/10.1016/j.ajhg.2015.09.001
1. Wood AR
2. Esko T
3. Yang J
4. Vedantam S
5. Pers TH
6. Gustafsson S
7. Chu AY
8. Estrada K
9. Luan J
10. Kutalik Z
11. Amin N
12. Buchkovich ML
13. Croteau-Chonka DC
14. Day FR
15. Duan Y
16. Fall T
17. Fehrmann R
18. Ferreira T
19. Jackson AU
20. Karjalainen J
21. Lo KS
22. Locke AE
23. Mägi R
24. Mihailov E
25. Porcu E
26. Randall JC
27. Scherag A
28. Vinkhuyzen AA
29. Westra HJ
30. Winkler TW
31. Workalemahu T
32. Zhao JH
33. Absher D
34. Albrecht E
35. Anderson D
36. Baron J
37. Beekman M
38. Demirkan A
39. Ehret GB
40. Feenstra B
41. Feitosa MF
42. Fischer K
43. Fraser RM
44. Goel A
45. Gong J
46. Justice AE
47. Kanoni S
48. Kleber ME
49. Kristiansson K
50. Lim U
51. Lotay V
52. Lui JC
53. Mangino M
54. Mateo Leach I
55. Medina-Gomez C
56. Nalls MA
57. Nyholt DR
58. Palmer CD
59. Pasko D
60. Pechlivanis S
61. Prokopenko I
62. Ried JS
63. Ripke S
64. Shungin D
65. Stancáková A
66. Strawbridge RJ
67. Sung YJ
68. Tanaka T
69. Teumer A
70. Trompet S
71. van der Laan SW
72. van Setten J
73. Van Vliet-Ostaptchouk JV
74. Wang Z
75. Yengo L
76. Zhang W
77. Afzal U
78. Arnlöv J
79. Arscott GM
80. Bandinelli S
81. Barrett A
82. Bellis C
83. Bennett AJ
84. Berne C
85. Blüher M
86. Bolton JL
87. Böttcher Y
88. Boyd HA
89. Bruinenberg M
90. Buckley BM
91. Buyske S
92. Caspersen IH
93. Chines PS
94. Clarke R
95. Claudi-Boehm S
96. Cooper M
97. Daw EW
98. De Jong PA
99. Deelen J
100. Delgado G
101. Denny JC
102. Dhonukshe-Rutten R
103. Dimitriou M
104. Doney AS
105. Dörr M
106. Eklund N
107. Eury E
108. Folkersen L
109. Garcia ME
110. Geller F
111. Giedraitis V
112. Go AS
113. Grallert H
114. Grammer TB
115. Gräßler J
116. Grönberg H
117. de Groot LC
118. Groves CJ
119. Haessler J
120. Hall P
121. Haller T
122. Hallmans G
123. Hannemann A
124. Hartman CA
125. Hassinen M
126. Hayward C
127. Heard-Costa NL
128. Helmer Q
129. Hemani G
130. Henders AK
131. Hillege HL
132. Hlatky MA
133. Hoffmann W
134. Hoffmann P
135. Holmen O
136. Houwing-Duistermaat JJ
137. Illig T
138. Isaacs A
139. James AL
140. Jeff J
141. Johansen B
142. Johansson Å
143. Jolley J
144. Juliusdottir T
145. Junttila J
146. Kho AN
147. Kinnunen L
148. Klopp N
149. Kocher T
150. Kratzer W
151. Lichtner P
152. Lind L
153. Lindström J
154. Lobbens S
155. Lorentzon M
156. Lu Y
157. Lyssenko V
158. Magnusson PK
159. Mahajan A
160. Maillard M
161. McArdle WL
162. McKenzie CA
163. McLachlan S
164. McLaren PJ
165. Menni C
166. Merger S
167. Milani L
168. Moayyeri A
169. Monda KL
170. Morken MA
171. Müller G
172. Müller-Nurasyid M
173. Musk AW
174. Narisu N
175. Nauck M
176. Nolte IM
177. Nöthen MM
178. Oozageer L
179. Pilz S
180. Rayner NW
181. Renstrom F
182. Robertson NR
183. Rose LM
184. Roussel R
185. Sanna S
186. Scharnagl H
187. Scholtens S
188. Schumacher FR
189. Schunkert H
190. Scott RA
191. Sehmi J
192. Seufferlein T
193. Shi J
194. Silventoinen K
195. Smit JH
196. Smith AV
197. Smolonska J
198. Stanton AV
199. Stirrups K
200. Stott DJ
201. Stringham HM
202. Sundström J
203. Swertz MA
204. Syvänen AC
205. Tayo BO
206. Thorleifsson G
207. Tyrer JP
208. van Dijk S
209. van Schoor NM
210. van der Velde N
211. van Heemst D
212. van Oort FV
213. Vermeulen SH
214. Verweij N
215. Vonk JM
216. Waite LL
217. Waldenberger M
218. Wennauer R
219. Wilkens LR
220. Willenborg C
221. Wilsgaard T
222. Wojczynski MK
223. Wong A
224. Wright AF
225. Zhang Q
226. Arveiler D
227. Bakker SJ
228. Beilby J
229. Bergman RN
230. Bergmann S
231. Biffar R
232. Blangero J
233. Boomsma DI
234. Bornstein SR
235. Bovet P
236. Brambilla P
237. Brown MJ
238. Campbell H
239. Caulfield MJ
240. Chakravarti A
241. Collins R
242. Collins FS
243. Crawford DC
244. Cupples LA
245. Danesh J
246. de Faire U
247. den Ruijter HM
248. Erbel R
249. Erdmann J
250. Eriksson JG
251. Farrall M
252. Ferrannini E
253. Ferrières J
254. Ford I
255. Forouhi NG
256. Forrester T
257. Gansevoort RT
258. Gejman PV
259. Gieger C
260. Golay A
261. Gottesman O
262. Gudnason V
263. Gyllensten U
264. Haas DW
265. Hall AS
266. Harris TB
267. Hattersley AT
268. Heath AC
269. Hengstenberg C
270. Hicks AA
271. Hindorff LA
272. Hingorani AD
273. Hofman A
274. Hovingh GK
275. Humphries SE
276. Hunt SC
277. Hypponen E
278. Jacobs KB
279. Jarvelin MR
280. Jousilahti P
281. Jula AM
282. Kaprio J
283. Kastelein JJ
284. Kayser M
285. Kee F
286. Keinanen-Kiukaanniemi SM
287. Kiemeney LA
288. Kooner JS
289. Kooperberg C
290. Koskinen S
291. Kovacs P
292. Kraja AT
293. Kumari M
294. Kuusisto J
295. Lakka TA
296. Langenberg C
297. Le Marchand L
298. Lehtimäki T
299. Lupoli S
300. Madden PA
301. Männistö S
302. Manunta P
303. Marette A
304. Matise TC
305. McKnight B
306. Meitinger T
307. Moll FL
308. Montgomery GW
309. Morris AD
310. Morris AP
311. Murray JC
312. Nelis M
313. Ohlsson C
314. Oldehinkel AJ
315. Ong KK
316. Ouwehand WH
317. Pasterkamp G
318. Peters A
319. Pramstaller PP
320. Price JF
321. Qi L
322. Raitakari OT
323. Rankinen T
324. Rao DC
325. Rice TK
326. Ritchie M
327. Rudan I
328. Salomaa V
329. Samani NJ
330. Saramies J
331. Sarzynski MA
332. Schwarz PE
333. Sebert S
334. Sever P
335. Shuldiner AR
336. Sinisalo J
337. Steinthorsdottir V
338. Stolk RP
339. Tardif JC
340. Tönjes A
341. Tremblay A
342. Tremoli E
343. Virtamo J
344. Vohl MC
345. Amouyel P
346. Asselbergs FW
347. Assimes TL
348. Bochud M
349. Boehm BO
350. Boerwinkle E
351. Bottinger EP
352. Bouchard C
353. Cauchi S
354. Chambers JC
355. Chanock SJ
356. Cooper RS
357. de Bakker PI
358. Dedoussis G
359. Ferrucci L
360. Franks PW
361. Froguel P
362. Groop LC
363. Haiman CA
364. Hamsten A
365. Hayes MG
366. Hui J
367. Hunter DJ
368. Hveem K
369. Jukema JW
370. Kaplan RC
371. Kivimaki M
372. Kuh D
373. Laakso M
374. Liu Y
375. Martin NG
376. März W
377. Melbye M
378. Moebus S
379. Munroe PB
380. Njølstad I
381. Oostra BA
382. Palmer CN
383. Pedersen NL
384. Perola M
385. Pérusse L
386. Peters U
387. Powell JE
388. Power C
389. Quertermous T
390. Rauramaa R
391. Reinmaa E
392. Ridker PM
393. Rivadeneira F
394. Rotter JI
395. Saaristo TE
396. Saleheen D
397. Schlessinger D
398. Slagboom PE
399. Snieder H
400. Spector TD
401. Strauch K
402. Stumvoll M
403. Tuomilehto J
404. Uusitupa M
405. van der Harst P
406. Völzke H
407. Walker M
408. Wareham NJ
409. Watkins H
410. Wichmann HE
411. Wilson JF
412. Zanen P
413. Deloukas P
414. Heid IM
415. Lindgren CM
416. Mohlke KL
417. Speliotes EK
418. Thorsteinsdottir U
419. Barroso I
420. Fox CS
421. North KE
422. Strachan DP
423. Beckmann JS
424. Berndt SI
425. Boehnke M
426. Borecki IB
427. McCarthy MI
428. Metspalu A
429. Stefansson K
430. Uitterlinden AG
431. van Duijn CM
432. Franke L
433. Willer CJ
434. Price AL
435. Lettre G
436. Loos RJ
437. Weedon MN
438. Ingelsson E
439. O'Connell JR
440. Abecasis GR
441. Chasman DI
442. Goddard ME
443. Visscher PM
444. Hirschhorn JN
445. Frayling TM
446. Electronic Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE Consortium, LifeLines Cohort Study
(2014) Defining the role of common variation in the genomic and biological architecture of adult human height
Nature Genetics 46:1173–1186.

https://doi.org/10.1038/ng.3097
(2007) Prediction of individual genetic risk to disease from genome-wide association studies
Genome Research 17:1520–1528.

https://doi.org/10.1101/gr.6665407
1. Yang J
2. Benyamin B
3. McEvoy BP
4. Gordon S
5. Henders AK
6. Nyholt DR
7. Madden PA
8. Heath AC
9. Martin NG
10. Montgomery GW
11. Goddard ME
12. Visscher PM
(2010) Common SNPs explain a large proportion of the heritability for human height
Nature Genetics 42:565–569.

https://doi.org/10.1038/ng.608
Preprint
1. Yengo L
2. Sidorenko J
3. Kemper KE
4. Zheng Z
5. Wood AR
6. Weedon MN
7. Frayling TM
8. Hirschhorn J
9. Yang J
10. Visscher PM
(2018) Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry
BioRxiv.

https://doi.org/10.1101/274654
1. Zeng J
2. de Vlaming R
3. Wu Y
4. Robinson MR
5. Lloyd-Jones LR
6. Yengo L
7. Yap CX
8. Xue A
9. Sidorenko J
10. McRae AF
11. Powell JE
12. Montgomery GW
13. Metspalu A
14. Esko T
15. Gibson G
16. Wray NR
17. Visscher PM
18. Yang J
(2018) Signatures of negative selection in the genetic architecture of human complex traits
Nature Genetics 50:746–753.

https://doi.org/10.1038/s41588-018-0101-4

Article and author information

Author details

Mashaal Sohail
1. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
2. Department of Biomedical Informatics, Harvard Medical School, Boston, United States
3. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
Present address
National Laboratory of Genomics for Biodiversity (UGA-LANGEBIO), Cinvestav, Irapuato, Mexico

Contribution
Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing

Contributed equally with
Robert M Maier

For correspondence
mashaal33@gmail.com

Competing interests
No competing interests declared

ORCID iD: 0000-0002-6586-4403

Robert M Maier
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
3. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
Contribution
Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing

Contributed equally with
Mashaal Sohail

For correspondence
rmaier@broadinstitute.org

Competing interests
No competing interests declared

ORCID iD: 0000-0002-3044-090X

Andrea Ganna
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
3. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
4. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
5. Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
Contribution
Formal analysis, Writing—review and editing

Competing interests
No competing interests declared
Alex Bloemendal
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
3. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
Contribution
Methodology, Writing—review and editing

Competing interests
No competing interests declared
Alicia R Martin
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
3. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
Contribution
Data curation, Writing—review and editing

Competing interests
No competing interests declared
Michael C Turchin
1. Center for Computational Molecular Biology, Brown University, Providence, United States
2. Department of Ecology and Evolutionary Biology, Brown University, Providence, United States
Contribution
Validation, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0003-3569-1529

Charleston WK Chiang

Department of Preventive Medicine, Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, United States

Contribution
Validation, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-0668-7865

Joel Hirschhorn
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Departments of Pediatrics and Genetics, Harvard Medical School, Boston, United States
3. Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, United States
Contribution
Validation, Methodology, Writing—review and editing

Competing interests
No competing interests declared
Mark J Daly
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
3. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
4. Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
Contribution
Methodology, Writing—review and editing

Competing interests
No competing interests declared
Nick Patterson
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Department of Genetics, Harvard Medical School, Boston, United States
Contribution
Methodology, Writing—review and editing

Competing interests
No competing interests declared
Benjamin Neale
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, United States
3. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, United States
Contribution
Supervision, Visualization, Methodology, Writing—review and editing

Contributed equally with
Iain Mathieson, David Reich and Shamil R Sunyaev

For correspondence
bneale@broadinstitute.org

Competing interests
Ben Neale is a member and on the scientific advisory board of Deep Genomics, a consultant for Camp4 Therapeutics Corporation, a consultant for Merck & Co., a consultant for Takeda Pharmaceutical, and a consultant for Avanir Pharmaceuticals. None of these entities played a role in determining the content of this paper.
Iain Mathieson

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States

Contribution
Data curation, Supervision, Investigation, Methodology, Writing—review and editing

Contributed equally with
Benjamin Neale, David Reich and Shamil R Sunyaev

For correspondence
mathi@pennmedicine.upenn.edu

Competing interests
No competing interests declared
David Reich
1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
2. Department of Genetics, Harvard Medical School, Boston, United States
3. Howard Hughes Medical Institute, Harvard Medical School, Boston, United States
Contribution
Conceptualization, Supervision, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing

Contributed equally with
Benjamin Neale, Iain Mathieson and Shamil R Sunyaev

For correspondence
reich@genetics.med.harvard.edu

Competing interests
No competing interests declared
Shamil R Sunyaev
1. Department of Biomedical Informatics, Harvard Medical School, Boston, United States
2. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, United States
3. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
Contribution
Conceptualization, Supervision, Methodology, Writing—original draft, Project administration, Writing—review and editing

Contributed equally with
Benjamin Neale, Iain Mathieson and David Reich

For correspondence
ssunyaev@rics.bwh.harvard.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0001-5715-5677

Funding

National Institutes of Health (HG009088)

Mashaal Sohail
Robert M Maier
Benjamin Neale
Shamil R Sunyaev

National Institutes of Health (MH101244)

Mashaal Sohail
Robert M Maier
Benjamin Neale
Shamil R Sunyaev

Alfred P. Sloan Foundation (Sloan Research Fellowship)

Iain Mathieson

Charles E Kaufman Foundation (New Investigator Research Grant)

Iain Mathieson

Paul Allen Foundation (Allen Discovery Center)

David Reich

National Institutes of Health (GM100233)

David Reich

National Institutes of Health (HG006399)

David Reich

Howard Hughes Medical Institute (Investigator)

David Reich

National Institutes of Health (GM127131)

Shamil R Sunyaev

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Alkes Price, Jeremy Berg, Graham Coop, Jonathan Pritchard, Matthew Robinson, Jian Yang, Peter Visscher, Hilary Finucane, John Novembre and Raymond Walters for useful discussions and comments that significantly improved the manuscript. The study was supported by National Institutes of Health grants HG009088 and MH101244 (MS, RM, BN and SS) and GM127131 (SS). DR was supported by National Institutes of Health grants GM100233 and HG006399, an Allen Discovery Center grant from the Paul Allen Foundation, and the Howard Hughes Medical Institute. IM was supported by a Sloan Research Fellowship and a New Investigator Research Grant from the Charles E Kaufman Foundation.

This research was conducted using the UK Biobank Resource under applications 18597, 11898 and 31063.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.