Population Genetics: Why structure matters

  1. Nick Barton (corresponding author)
  2. Joachim Hermisson (corresponding author)
  3. Magnus Nordborg (corresponding author)
  1. IST Austria, Austria
  2. University of Vienna, Austria
  3. Austrian Academy of Sciences, Vienna BioCenter, Austria
Insight
Cite this article as: eLife 2019;8:e45380 doi: 10.7554/eLife.45380

Abstract

Great care is needed when interpreting claims about the genetic basis of human variation based on data from genome-wide association studies.

Main text

Human height is the classic example of a quantitative trait: its distribution is continuous, presumably because it is influenced by variation at a very large number of genes, most with a small effect (Fisher, 1918). Yet height is also strongly affected by the environment: average height in many countries increased during the last century and the children of immigrants are often taller than relatives in their country of origin – in both cases presumably due to changing diet and other environmental factors (Cavalli-Sforza and Bodmer, 1971; Grasgruber et al., 2016; NCD Risk Factor Collaboration, 2016). This makes it very difficult to determine the cause of geographic patterns for height, such as the ‘latitudinal cline’ seen in Europe (Figure 1).

Figure 1. Distribution of average male height in Europe, calculated from studies performed between 1999 and 2013.

In general, southern Europeans tend to be shorter than northern Europeans. Image reproduced from Grasgruber et al., 2014 (CC BY 3.0).

Are such patterns caused by environmental or genetic differences – or by a complex combination of both? And to the extent that genetic differences are involved, do they reflect selection or simply random history? A number of recent papers have relied on so-called Genome-Wide Association Studies (GWAS) to address these questions, and reported strong evidence for both genetics and selection. Now, in eLife, two papers – one by Jeremy Berg, Arbel Harpak, Nasa Sinnott-Armstrong and colleagues (Berg et al., 2019); the other by Mashaal Sohail, Robert Maier and colleagues (Sohail et al., 2019) – independently reject these conclusions. Even more importantly, they identify problems with GWAS that have broader implications for human genetics.

As the name suggests, GWAS scan the genome for variants – typically single nucleotide polymorphisms (SNPs) – that are associated with a particular condition or trait (phenotype). The first GWAS for height found a small number of SNPs that jointly explained only a tiny fraction of the variation. Because this contrasted with the high heritability seen in twin studies, it was dubbed ‘the missing heritability problem’ (reviewed in Yang et al., 2010). It was suggested that the problem was simply due to a lack of statistical power to detect polymorphisms of small effect. Subsequent studies with larger sample sizes have supported this explanation: more and more loci have been identified, although most of the variation remains ‘unmappable’, presumably because sample sizes on the order of a million are still not large enough (Yengo et al., 2018).
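A rough power calculation illustrates why loci of small effect can stay ‘unmappable’ even in very large samples. This sketch is not taken from the article: it assumes a single-SNP test whose χ² noncentrality is N q²/(1 − q²), where N is the sample size and q² the fraction of phenotypic variance explained by the SNP, and it uses the conventional genome-wide significance threshold of 5 × 10⁻⁸. The effect size in the usage example is hypothetical.

```python
from statistics import NormalDist

# Conventional genome-wide significance threshold (an assumption, not from the article).
GENOME_WIDE_ALPHA = 5e-8


def gwas_power(n: int, q2: float, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate power of a single-SNP association test.

    n:  sample size
    q2: fraction of phenotypic variance explained by the SNP
    The test statistic is treated as normal with mean sqrt(NCP), where
    NCP = n * q2 / (1 - q2) is the chi-square (1 df) noncentrality parameter.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = (n * q2 / (1 - q2)) ** 0.5
    # Probability that |Z + shift| exceeds the critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    q2 = 1e-5  # hypothetical locus explaining 0.001% of height variance
    for n in (10_000, 100_000, 1_000_000, 10_000_000):
        print(f"n={n:>10,}  power={gwas_power(n, q2):.3f}")
```

Under these assumptions, a locus explaining 0.001% of the variance is detected only rarely with a million samples but almost always with ten million, which is the sense in which a lack of power can account for ‘missing’ heritability.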

One way to include the unmappable component of genetic variation in a statistical measure is via so-called polygenic scores. These scores sum the estimated contributions to the trait across many SNPs, including those whose effects, on their own, are not statistically significant. Polygenic scores thus represent a shift from the goal of identifying major genes to that of predicting phenotype from genotype. Originally developed for plant and animal breeding, polygenic scores can, in principle, also be used to study the genetic basis of differences between individuals and groups.
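At its simplest, a polygenic score is a weighted sum. The sketch below assumes purely additive effects, genotypes coded as the number of copies (0, 1 or 2) of the effect allele, and per-allele effect estimates supplied from elsewhere (for example, a GWAS). The SNP effects and genotypes in the example are invented for illustration.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[int], effects: Sequence[float]) -> float:
    """Sum of estimated per-allele effects weighted by allele counts.

    dosages: copies (0, 1 or 2) of the effect allele at each SNP
    effects: estimated per-allele effect of each SNP (e.g. in cm of height)
    """
    if len(dosages) != len(effects):
        raise ValueError("need exactly one effect estimate per SNP")
    if any(g not in (0, 1, 2) for g in dosages):
        raise ValueError("dosages must be allele counts: 0, 1 or 2")
    return sum(g * b for g, b in zip(dosages, effects))


if __name__ == "__main__":
    # Hypothetical per-allele effect estimates for five SNPs. Most would not be
    # significant on their own, but every one of them enters the score.
    effects = [0.20, -0.05, 0.03, 0.01, -0.02]
    person_a = [2, 0, 1, 1, 0]
    person_b = [0, 2, 1, 0, 2]
    print(polygenic_score(person_a, effects))
    print(polygenic_score(person_b, effects))
```

Because every SNP contributes to the sum, any error in an individual effect estimate also ends up in the score.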

This, however, requires accurate and unbiased estimation of the effects of all SNPs included in the score, which is difficult in a structured (non-homogeneous) population when environmental differences cannot be controlled. To see why this is a problem, consider the classic example of chopstick-eating skills (Lander and Schork, 1994). While there surely are genetic variants affecting our ability to handle chopsticks, most of the variation for this trait across the globe is due to environmental differences (cultural background), and a GWAS would mostly identify variants that had nothing to do with chopstick skills, but simply happened to differ in frequency between East Asia and the rest of the world.
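The chopstick example can be made concrete with a toy simulation. Everything in it is an assumption chosen for illustration: two regions, a SNP that has no effect on the trait but has allele frequency 0.8 in one region and 0.2 in the other, and a purely cultural difference in mean skill. A pooled regression finds a strong ‘effect’, whereas regressions within each region do not.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_chopsticks(n_per_region=5000, seed=1):
    """Genotypes at one SNP and chopstick skill in two regions (hypothetical numbers).

    The SNP has NO effect on skill; only the allele frequency and the cultural
    (environmental) contribution to skill differ between regions.
    """
    rng = random.Random(seed)
    data = {}
    for region, freq, culture in (("East Asia", 0.8, 2.0), ("elsewhere", 0.2, 0.0)):
        genotypes = [int(rng.random() < freq) + int(rng.random() < freq)
                     for _ in range(n_per_region)]
        skill = [culture + rng.gauss(0.0, 1.0) for _ in range(n_per_region)]
        data[region] = (genotypes, skill)
    return data


def chopstick_slopes(data):
    """Return the pooled 'GWAS' slope and the slope within each region."""
    pooled_g = [g for genotypes, _ in data.values() for g in genotypes]
    pooled_y = [y for _, skill in data.values() for y in skill]
    within = {region: ols_slope(g, y) for region, (g, y) in data.items()}
    return ols_slope(pooled_g, pooled_y), within


if __name__ == "__main__":
    pooled, within = chopstick_slopes(simulate_chopsticks())
    print(f"pooled slope: {pooled:.2f}")
    for region, slope in within.items():
        print(f"{region}: within-region slope {slope:.2f}")
```

With these settings the pooled slope is large simply because allele frequency and culture vary together, while the within-region slopes hover around zero.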

Several methods for dealing with this problem have been proposed. When a GWAS is carried out to identify major genes, it is relatively simple to avoid false positives by eliminating associations outside major loci, regardless of whether they are due to confounding by population structure or to an unmappable polygenic background (Vilhjálmsson and Nordborg, 2013). However, if the goal is to make predictions, or to understand differences among populations (such as the latitudinal cline in height), we need accurate and unbiased estimates for all SNPs. Accomplishing this is extremely challenging, and it is also difficult to know whether one has succeeded.

One possibility is to compare the population estimates with estimates taken from sibling data, which should be relatively unbiased by environmental differences. In one of many examples of this, Robinson et al. used data from the GIANT Consortium (Wood et al., 2014) together with sibling data to estimate that genetic variation contributes significantly to height variation across Europe (Robinson et al., 2015). They also argued that selection must have occurred, because the differences were too large to have arisen by chance. Using estimated effect sizes provided by Robinson et al., a more sophisticated analysis by Field et al. found extremely strong evidence for selection for height across Europe (p = 10⁻⁷⁴; Field et al., 2016). Several other studies reached the same conclusion based on the GIANT data (reviewed in Berg et al., 2019; Sohail et al., 2019).
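The logic of the sibling comparison can be sketched in a few lines. Siblings share parents, and largely share environment and ancestry, so regressing the difference in phenotype between two siblings on their difference in genotype removes confounding that acts at the level of families or populations, while Mendelian segregation still creates genotypic differences between them. The setup below is hypothetical (two populations that differ in both allele frequency and environment, and a true per-allele effect of 0.1) and is not the design used by Robinson et al.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def child_genotype(mother, father, rng):
    """Mendelian transmission: one random allele (coded 0/1) from each parent."""
    return rng.choice(mother) + rng.choice(father)


def simulate_sib_pairs(n_families=20_000, beta=0.1, seed=2):
    """Return (population-level estimate, sibling-difference estimate) of beta."""
    rng = random.Random(seed)
    pop_g, pop_y, diff_g, diff_y = [], [], [], []
    for i in range(n_families):
        # Alternate between two hypothetical populations that differ in both
        # allele frequency and a non-genetic (environmental) effect on the trait.
        freq, env = (0.7, 1.0) if i % 2 == 0 else (0.3, 0.0)
        mother = (int(rng.random() < freq), int(rng.random() < freq))
        father = (int(rng.random() < freq), int(rng.random() < freq))
        sibs = []
        for _ in range(2):
            g = child_genotype(mother, father, rng)
            sibs.append((g, beta * g + env + rng.gauss(0.0, 1.0)))
        (g1, y1), (g2, y2) = sibs
        pop_g += [g1, g2]
        pop_y += [y1, y2]
        # Siblings share env, so it cancels in the within-pair differences.
        diff_g.append(g1 - g2)
        diff_y.append(y1 - y2)
    return ols_slope(pop_g, pop_y), ols_slope(diff_g, diff_y)


if __name__ == "__main__":
    pop_est, sib_est = simulate_sib_pairs()
    print(f"population estimate: {pop_est:.3f}")
    print(f"sibling estimate:    {sib_est:.3f}")
```

Under these assumptions the population-level estimate is inflated several-fold, whereas the sibling estimate stays close to the true value. The price is extra noise, because only heterozygous parents generate genotypic differences between their children.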

Berg et al. (who are based at Columbia University, Stanford University, UC Davis and the University of Copenhagen) and Sohail et al. (who are based at Harvard Medical School, the Broad Institute, and other institutes in the US, Finland and Sweden) now re-examine these conclusions using the recently released data from the UK Biobank (Sudlow et al., 2015). Estimating effect sizes from these data allows possible biases due to population structure confounding to be investigated, because the UK Biobank data come from a (supposedly) more homogeneous population than the GIANT data.

Using these new estimates, Berg et al. and Sohail et al. independently found that evidence for selection vanishes – along with evidence for a genetic cline in height across Europe. Instead, they show that the previously published results were due to the cumulative effects of slight biases in the effect-size estimates in the GIANT data. Surprisingly, they also found evidence for confounding in the sibling data used as a control by Robinson et al. and Field et al. This turned out to be due to a technical error in the data distributed by Robinson et al. after they published their paper.
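Some simple arithmetic shows how slight biases can add up to a spurious cline. For this sketch, assume that no SNP has any true effect, but each estimated effect carries a bias proportional to the SNP's allele-frequency difference d between two populations (the pattern residual stratification along the same axis would produce), plus sampling noise. The difference in mean polygenic score between the populations is then Σ 2d·β̂, and its z-score grows with the square root of the number of SNPs even though each SNP's bias is a small fraction of its standard error. All parameter values are illustrative.

```python
import random
from math import sqrt


def score_difference_z(n_snps=10_000, bias=0.02, se=0.01, seed=3):
    """z-score for the difference in mean polygenic score between two populations.

    Illustrative assumptions: every SNP has a true effect of zero, and its
    estimated effect is bias * d plus normal noise with standard deviation se,
    where d is the allele-frequency difference between the populations.
    Returns (z of the summed score difference, largest per-SNP bias in units of se).
    """
    rng = random.Random(seed)
    diff = 0.0      # difference in mean score: sum over SNPs of 2 * d * beta_hat
    var_diff = 0.0  # its sampling variance given the d's: sum of (2 * d * se) ** 2
    max_snp_bias_z = 0.0
    for _ in range(n_snps):
        d = rng.uniform(-0.1, 0.1)
        beta_hat = bias * d + rng.gauss(0.0, se)
        diff += 2 * d * beta_hat
        var_diff += (2 * d * se) ** 2
        max_snp_bias_z = max(max_snp_bias_z, abs(bias * d) / se)
    return diff / sqrt(var_diff), max_snp_bias_z


if __name__ == "__main__":
    z, worst = score_difference_z()
    print(f"largest per-SNP bias: {worst:.2f} standard errors")
    print(f"z-score of the summed score difference: {z:.1f}")
```

Under these assumptions no single SNP is biased by more than about a fifth of its standard error, yet the expected z-score of the summed difference is bias·√(Σd²)/se, about 11.5 with the default values.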

This means we still do not know whether genetics and selection are responsible for the pattern of height differences seen across Europe. That genetics plays a major role in height differences between individuals is not in doubt, and it is also clear that the signal from GWAS is mostly real. The issue is that there is no perfect way to control for complex population structure and environmental heterogeneity. Biases at individual loci may be tiny, but they become highly significant when summed across thousands of loci – as is done in polygenic scores. Standard methods to control for these biases, such as principal component analysis, may work well in simulations but are often insufficient when confronted with real data. Importantly, no natural population is unstructured: indeed, even the data in the UK Biobank seems to contain significant structure (Haworth et al., 2019).
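A minimal sketch of principal-component adjustment, the standard correction mentioned above, shows why it can look reassuring in simulations. The scenario is an assumption chosen to be tidy: two discrete populations, a trait that differs between them only environmentally, and no causal SNPs. The code finds the leading principal component of the centred genotype matrix by power iteration, then estimates each SNP's effect after regressing both trait and genotype on that component (equivalent to including it as a covariate).

```python
import random
from math import sqrt
from statistics import fmean


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_per_pop=150, n_snps=300, seed=4):
    """Two populations differing in allele frequencies and, separately, in
    environment. No SNP has any causal effect on the trait."""
    rng = random.Random(seed)
    base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    genotypes, trait, labels = [], [], []
    for pop in (0, 1):
        freqs = [min(max(p + rng.gauss(0.0, 0.1), 0.05), 0.95) for p in base]
        for _ in range(n_per_pop):
            genotypes.append([int(rng.random() < f) + int(rng.random() < f) for f in freqs])
            trait.append(1.0 * pop + rng.gauss(0.0, 1.0))  # environment only
            labels.append(pop)
    return genotypes, trait, labels


def top_pc(genotypes, n_iter=25, seed=0):
    """Leading principal component (one score per individual) of the centred
    genotype matrix X, found by power iteration on X X^T."""
    rng = random.Random(seed)
    n, m = len(genotypes), len(genotypes[0])
    means = [fmean(row[j] for row in genotypes) for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]  # X^T v
        v = [sum(xi[j] * w[j] for j in range(m)) for xi in x]          # X X^T v
        norm = sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return v


def residualize(y, covariate):
    """Remove the mean and the linear effect of a covariate from y."""
    b = ols_slope(covariate, y)
    my, mc = fmean(y), fmean(covariate)
    return [(a - my) - b * (c - mc) for a, c in zip(y, covariate)]


def compare(seed=4):
    """Mean |slope| over all (null) SNPs, without and with PC1 adjustment."""
    genotypes, trait, labels = simulate(seed=seed)
    pc1 = top_pc(genotypes)
    trait_adj = residualize(trait, pc1)
    naive, adjusted = [], []
    for j in range(len(genotypes[0])):
        g = [row[j] for row in genotypes]
        naive.append(abs(ols_slope(g, trait)))
        adjusted.append(abs(ols_slope(residualize(g, pc1), trait_adj)))
    return fmean(naive), fmean(adjusted), pc1, labels


if __name__ == "__main__":
    naive, adjusted, _, _ = compare()
    print(f"mean |slope| without adjustment: {naive:.3f}")
    print(f"mean |slope| adjusted for PC1:   {adjusted:.3f}")
```

Here a single component captures the structure almost perfectly and the adjusted slopes shrink to sampling noise. Real populations rarely have structure this clean, and environmental gradients need not line up with the leading components, so success in a setting like this says little about residual confounding in real data.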

Berg et al. and Sohail et al. demonstrate the potential for population structure to create spurious results, especially when using methods that rely on large numbers of small effects, such as polygenic scores. Caution is clearly needed when interpreting and using the results of such studies. For clinical predictions, risks must be weighed against benefits (Rosenberg et al., 2019). In some cases, such as recommendations for more frequent medical checkups for patients found at higher ‘genetic’ risk of a condition, it may not matter greatly whether predictors are confounded as long as they work. By contrast, the results of behavioral studies of traits such as IQ and educational attainment (Plomin and von Stumm, 2018) must be presented carefully, because while the benefits are far from obvious, the risks of such results being misinterpreted and misused are quite clear. The problem is worsened by the tendency of popular media to ignore caveats and uncertainties of estimates.

Finally, although quantitative genetics has proved highly successful in plant and animal breeding, it should be remembered that this success has been based on large pedigrees, well-controlled environments, and short-term prediction. When these methods have been applied to natural populations, even the most basic predictions fail, in large part due to poorly understood environmental factors (Charmantier et al., 2014). Natural populations are never homogeneous, and it is therefore misleading to imply there is a qualitative difference between ‘within-population’ and ‘between-population’ comparisons – as was recently done in connection with James Watson’s statements about race and IQ (Harmon, 2019). With respect to confounding by population structure, the key qualitative difference is between controlling the environment experimentally, and not doing so. Once we leave an experimental setting, we are effectively skating on thin ice, and whether the ice will hold depends on how far out we skate.

References

  1. 1
  2. 2
    The Genetics of Human Populations
    1. LL Cavalli-Sforza
    2. WF Bodmer
    (1971)
    San Francisco: WH Freeman.
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
    Defining the role of common variation in the genomic and biological architecture of adult human height
    1. AR Wood
    2. T Esko
    3. J Yang
    4. S Vedantam
    5. TH Pers
    6. S Gustafsson
    7. AY Chu
    8. K Estrada
    9. J Luan
    10. Z Kutalik
    11. N Amin
    12. ML Buchkovich
    13. DC Croteau-Chonka
    14. FR Day
    15. Y Duan
    16. T Fall
    17. R Fehrmann
    18. T Ferreira
    19. AU Jackson
    20. J Karjalainen
    21. KS Lo
    22. AE Locke
    23. R Mägi
    24. E Mihailov
    25. E Porcu
    26. JC Randall
    27. A Scherag
    28. AA Vinkhuyzen
    29. HJ Westra
    30. TW Winkler
    31. T Workalemahu
    32. JH Zhao
    33. D Absher
    34. E Albrecht
    35. D Anderson
    36. J Baron
    37. M Beekman
    38. A Demirkan
    39. GB Ehret
    40. B Feenstra
    41. MF Feitosa
    42. K Fischer
    43. RM Fraser
    44. A Goel
    45. J Gong
    46. AE Justice
    47. S Kanoni
    48. ME Kleber
    49. K Kristiansson
    50. U Lim
    51. V Lotay
    52. JC Lui
    53. M Mangino
    54. I Mateo Leach
    55. C Medina-Gomez
    56. MA Nalls
    57. DR Nyholt
    58. CD Palmer
    59. D Pasko
    60. S Pechlivanis
    61. I Prokopenko
    62. JS Ried
    63. S Ripke
    64. D Shungin
    65. A Stancáková
    66. RJ Strawbridge
    67. YJ Sung
    68. T Tanaka
    69. A Teumer
    70. S Trompet
    71. SW van der Laan
    72. J van Setten
    73. JV Van Vliet-Ostaptchouk
    74. Z Wang
    75. L Yengo
    76. W Zhang
    77. U Afzal
    78. J Arnlöv
    79. GM Arscott
    80. S Bandinelli
    81. A Barrett
    82. C Bellis
    83. AJ Bennett
    84. C Berne
    85. M Blüher
    86. JL Bolton
    87. Y Böttcher
    88. HA Boyd
    89. M Bruinenberg
    90. BM Buckley
    91. S Buyske
    92. IH Caspersen
    93. PS Chines
    94. R Clarke
    95. S Claudi-Boehm
    96. M Cooper
    97. EW Daw
    98. PA De Jong
    99. J Deelen
    100. G Delgado
    101. JC Denny
    102. R Dhonukshe-Rutten
    103. M Dimitriou
    104. AS Doney
    105. M Dörr
    106. N Eklund
    107. E Eury
    108. L Folkersen
    109. ME Garcia
    110. F Geller
    111. V Giedraitis
    112. AS Go
    113. H Grallert
    114. TB Grammer
    115. J Gräßler
    116. H Grönberg
    117. LC de Groot
    118. CJ Groves
    119. J Haessler
    120. P Hall
    121. T Haller
    122. G Hallmans
    123. A Hannemann
    124. CA Hartman
    125. M Hassinen
    126. C Hayward
    127. NL Heard-Costa
    128. Q Helmer
    129. G Hemani
    130. AK Henders
    131. HL Hillege
    132. MA Hlatky
    133. W Hoffmann
    134. P Hoffmann
    135. O Holmen
    136. JJ Houwing-Duistermaat
    137. T Illig
    138. A Isaacs
    139. AL James
    140. J Jeff
    141. B Johansen
    142. Å Johansson
    143. J Jolley
    144. T Juliusdottir
    145. J Junttila
    146. AN Kho
    147. L Kinnunen
    148. N Klopp
    149. T Kocher
    150. W Kratzer
    151. P Lichtner
    152. L Lind
    153. J Lindström
    154. S Lobbens
    155. M Lorentzon
    156. Y Lu
    157. V Lyssenko
    158. PK Magnusson
    159. A Mahajan
    160. M Maillard
    161. WL McArdle
    162. CA McKenzie
    163. S McLachlan
    164. PJ McLaren
    165. C Menni
    166. S Merger
    167. L Milani
    168. A Moayyeri
    169. KL Monda
    170. MA Morken
    171. G Müller
    172. M Müller-Nurasyid
    173. AW Musk
    174. N Narisu
    175. M Nauck
    176. IM Nolte
    177. MM Nöthen
    178. L Oozageer
    179. S Pilz
    180. NW Rayner
    181. F Renstrom
    182. NR Robertson
    183. LM Rose
    184. R Roussel
    185. S Sanna
    186. H Scharnagl
    187. S Scholtens
    188. FR Schumacher
    189. H Schunkert
    190. RA Scott
    191. J Sehmi
    192. T Seufferlein
    193. J Shi
    194. K Silventoinen
    195. JH Smit
    196. AV Smith
    197. J Smolonska
    198. AV Stanton
    199. K Stirrups
    200. DJ Stott
    201. HM Stringham
    202. J Sundström
    203. MA Swertz
    204. AC Syvänen
    205. BO Tayo
    206. G Thorleifsson
    207. JP Tyrer
    208. S van Dijk
    209. NM van Schoor
    210. N van der Velde
    211. D van Heemst
    212. FV van Oort
    213. SH Vermeulen
    214. N Verweij
    215. JM Vonk
    216. LL Waite
    217. M Waldenberger
    218. R Wennauer
    219. LR Wilkens
    220. C Willenborg
    221. T Wilsgaard
    222. MK Wojczynski
    223. A Wong
    224. AF Wright
    225. Q Zhang
    226. D Arveiler
    227. SJ Bakker
    228. J Beilby
    229. RN Bergman
    230. S Bergmann
    231. R Biffar
    232. J Blangero
    233. DI Boomsma
    234. SR Bornstein
    235. P Bovet
    236. P Brambilla
    237. MJ Brown
    238. H Campbell
    239. MJ Caulfield
    240. A Chakravarti
    241. R Collins
    242. FS Collins
    243. DC Crawford
    244. LA Cupples
    245. J Danesh
    246. U de Faire
    247. HM den Ruijter
    248. R Erbel
    249. J Erdmann
    250. JG Eriksson
    251. M Farrall
    252. E Ferrannini
    253. J Ferrières
    254. I Ford
    255. NG Forouhi
    256. T Forrester
    257. RT Gansevoort
    258. PV Gejman
    259. C Gieger
    260. A Golay
    261. O Gottesman
    262. V Gudnason
    263. U Gyllensten
    264. DW Haas
    265. AS Hall
    266. TB Harris
    267. AT Hattersley
    268. AC Heath
    269. C Hengstenberg
    270. AA Hicks
    271. LA Hindorff
    272. AD Hingorani
    273. A Hofman
    274. GK Hovingh
    275. SE Humphries
    276. SC Hunt
    277. E Hypponen
    278. KB Jacobs
    279. MR Jarvelin
    280. P Jousilahti
    281. AM Jula
    282. J Kaprio
    283. JJ Kastelein
    284. M Kayser
    285. F Kee
    286. SM Keinanen-Kiukaanniemi
    287. LA Kiemeney
    288. JS Kooner
    289. C Kooperberg
    290. S Koskinen
    291. P Kovacs
    292. AT Kraja
    293. M Kumari
    294. J Kuusisto
    295. TA Lakka
    296. C Langenberg
    297. L Le Marchand
    298. T Lehtimäki
    299. S Lupoli
    300. PA Madden
    301. S Männistö
    302. P Manunta
    303. A Marette
    304. TC Matise
    305. B McKnight
    306. T Meitinger
    307. FL Moll
    308. GW Montgomery
    309. AD Morris
    310. AP Morris
    311. JC Murray
    312. M Nelis
    313. C Ohlsson
    314. AJ Oldehinkel
    315. KK Ong
    316. WH Ouwehand
    317. G Pasterkamp
    318. A Peters
    319. PP Pramstaller
    320. JF Price
    321. L Qi
    322. OT Raitakari
    323. T Rankinen
    324. DC Rao
    325. TK Rice
    326. M Ritchie
    327. I Rudan
    328. V Salomaa
    329. NJ Samani
    330. J Saramies
    331. MA Sarzynski
    332. PE Schwarz
    333. S Sebert
    334. P Sever
    335. AR Shuldiner
    336. J Sinisalo
    337. V Steinthorsdottir
    338. RP Stolk
    339. JC Tardif
    340. A Tönjes
    341. A Tremblay
    342. E Tremoli
    343. J Virtamo
    344. MC Vohl
    345. P Amouyel
    346. FW Asselbergs
    347. TL Assimes
    348. M Bochud
    349. BO Boehm
    350. E Boerwinkle
    351. EP Bottinger
    352. C Bouchard
    353. S Cauchi
    354. JC Chambers
    355. SJ Chanock
    356. RS Cooper
    357. PI de Bakker
    358. G Dedoussis
    359. L Ferrucci
    360. PW Franks
    361. P Froguel
    362. LC Groop
    363. CA Haiman
    364. A Hamsten
    365. MG Hayes
    366. J Hui
    367. DJ Hunter
    368. K Hveem
    369. JW Jukema
    370. RC Kaplan
    371. M Kivimaki
    372. D Kuh
    373. M Laakso
    374. Y Liu
    375. NG Martin
    376. W März
    377. M Melbye
    378. S Moebus
    379. PB Munroe
    380. I Njølstad
    381. BA Oostra
    382. CN Palmer
    383. NL Pedersen
    384. M Perola
    385. L Pérusse
    386. U Peters
    387. JE Powell
    388. C Power
    389. T Quertermous
    390. R Rauramaa
    391. E Reinmaa
    392. PM Ridker
    393. F Rivadeneira
    394. JI Rotter
    395. TE Saaristo
    396. D Saleheen
    397. D Schlessinger
    398. PE Slagboom
    399. H Snieder
    400. TD Spector
    401. K Strauch
    402. M Stumvoll
    403. J Tuomilehto
    404. M Uusitupa
    405. P van der Harst
    406. H Völzke
    407. M Walker
    408. NJ Wareham
    409. H Watkins
    410. HE Wichmann
    411. JF Wilson
    412. P Zanen
    413. P Deloukas
    414. IM Heid
    415. CM Lindgren
    416. KL Mohlke
    417. EK Speliotes
    418. U Thorsteinsdottir
    419. I Barroso
    420. CS Fox
    421. KE North
    422. DP Strachan
    423. JS Beckmann
    424. SI Berndt
    425. M Boehnke
    426. IB Borecki
    427. MI McCarthy
    428. A Metspalu
    429. K Stefansson
    430. AG Uitterlinden
    431. CM van Duijn
    432. L Franke
    433. CJ Willer
    434. AL Price
    435. G Lettre
    436. RJ Loos
    437. MN Weedon
    438. E Ingelsson
    439. JR O'Connell
    440. GR Abecasis
    441. DI Chasman
    442. ME Goddard
    443. PM Visscher
    444. JN Hirschhorn
    445. TM Frayling
    446. Electronic Medical Records and Genomics (eMERGE) Consortium; MIGen Consortium; PAGE Consortium; LifeLines Cohort Study
    (2014)
    Nature Genetics 46:1173–1186.
    https://doi.org/10.1038/ng.3097
  19. 19
  20. 20

Article and author information

Author details

  1. Nick Barton

    Nick Barton is at IST Austria, Klosterneuburg, Austria

    For correspondence
    nick.barton@ist-austria.ac.at
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8548-5240
  2. Joachim Hermisson

    Joachim Hermisson is at the Department of Mathematics and at the Max F. Perutz Laboratories, University of Vienna, Vienna, Austria

    For correspondence
    joachim.hermisson@univie.ac.at
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7476-9283
  3. Magnus Nordborg

    Magnus Nordborg is at the Gregor Mendel Institute, Austrian Academy of Sciences, Vienna BioCenter, Vienna, Austria

    For correspondence
    magnus.nordborg@gmi.oeaw.ac.at
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7178-9748

Acknowledgements

We thank Jeremy Berg and Peter Visscher for answering our questions, and Molly Przeworski for helpful discussions.

Publication history

  1. Version of Record published: March 21, 2019 (version 1)

Copyright

© 2019, Barton et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 10,483
  • Downloads: 518
  • Citations: 2

Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.
