1. Genetics and Genomics
Download icon

Novel genetic loci affecting facial shape variation in humans

  1. Ziyi Xiong
  2. Gabriela Dankova
  3. Laurence J Howe
  4. Myoung Keun Lee
  5. Pirro G Hysi
  6. Markus A de Jong
  7. Gu Zhu
  8. Kaustubh Adhikari
  9. Dan Li
  10. Yi Li
  11. Bo Pan
  12. Eleanor Feingold
  13. Mary L Marazita
  14. John R Shaffer
  15. Kerrie McAloney
  16. Shu-Hua Xu
  17. Li Jin
  18. Sijia Wang
  19. Femke MS de Vrij
  20. Bas Lendemeijer
  21. Stephen Richmond
  22. Alexei Zhurov
  23. Sarah Lewis
  24. Gemma C Sharp
  25. Lavinia Paternoster
  26. Holly Thompson
  27. Rolando Gonzalez-Jose
  28. Maria Catira Bortolini
  29. Samuel Canizales-Quinteros
  30. Carla Gallo
  31. Giovanni Poletti
  32. Gabriel Bedoya
  33. Francisco Rothhammer
  34. André G Uitterlinden
  35. M Arfan Ikram
  36. Eppo Wolvius
  37. Steven A Kushner
  38. Tamar EC Nijsten
  39. Robert-Jan TS Palstra
  40. Stefan Boehringer
  41. Sarah E Medland
  42. Kun Tang
  43. Andres Ruiz-Linares
  44. Nicholas G Martin
  45. Timothy D Spector
  46. Evie Stergiakouli
  47. Seth M Weinberg
  48. Fan Liu  Is a corresponding author
  49. Manfred Kayser  Is a corresponding author
  50. On behalf of the International Visible Trait Genetics (VisiGen) Consortium
  1. Erasmus MC University Medical Center Rotterdam, Netherlands
  2. University of Chinese Academy of Sciences (CAS), China
  3. University of Bristol, United Kingdom
  4. University of Pittsburgh, United States
  5. King’s College London, United Kingdom
  6. Leiden University Medical Center, Netherlands
  7. QIMR Berghofer Medical Research Institute, Australia
  8. University College London, United Kingdom
  9. Chinese Academy of Sciences (CAS), China
  10. Plastic Surgery Hospital, China
  11. ShanghaiTech University, China
  12. Chinese Academy of Sciences, China
  13. Fudan University, China
  14. Cardiff University, United Kingdom
  15. Instituto Patagonico de Ciencias Sociales y Humanas, CENPAT-CONICET, Argentina
  16. Universidade Federal do Rio Grande do Sul, Brazil
  17. Unidad de Genomica de Poblaciones Aplicada a la Salud, Mexico
  18. Universidad Peruana Cayetano Heredia, Peru
  19. Universidad de Antioquia, Colombia
  20. Universidad de Tarapaca, Chile
  21. Aix-Marseille Université, CNRS, EFS, ADES, France
Research Article
  • Cited 0
  • Views 861
  • Annotations
Cite this article as: eLife 2019;8:e49898 doi: 10.7554/eLife.49898

Abstract

The human face represents a combined set of highly heritable phenotypes, but knowledge on its genetic architecture remains limited, despite the relevance for various fields. A series of genome-wide association studies on 78 facial shape phenotypes quantified from 3-dimensional facial images of 10,115 Europeans identified 24 genetic loci reaching study-wide suggestive association (p < 5 × 10−8), among which 17 were previously unreported. A follow-up multi-ethnic study in additional 7917 individuals confirmed 10 loci including six unreported ones (padjusted < 2.1 × 10−3). A global map of derived polygenic face scores assembled facial features in major continental groups consistent with anthropological knowledge. Analyses of epigenomic datasets from cranial neural crest cells revealed abundant cis-regulatory activities at the face-associated genetic loci. Luciferase reporter assays in neural crest progenitor cells highlighted enhancer activities of several face-associated DNA variants. These results substantially advance our understanding of the genetic basis underlying human facial variation and provide candidates for future in-vivo functional studies.

Introduction

The human face represents a multi-dimensional set of correlated, mostly symmetric, complex phenotypes with high heritability. This is illustrated by a large degree of facial similarity between monozygotic twins and, albeit less so, between other relatives (Djordjevic et al., 2016), stable facial features within and differences between major human populations, and the enormous diversity among unrelated persons almost at the level of human individualization. Understanding the genetic basis of human facial variation has important implications for several disciplines of fundamental and applied sciences, including human genetics, developmental biology, evolutionary biology, medical genetics, and forensics. However, current knowledge on the specific genes influencing human facial appearance and the underlying molecular mechanisms forming facial morphology is limited.

Over recent years, nine separate genome-wide association studies (GWASs) have each highlighted several but largely non-overlapping genetic loci associated with facial shape phenotypes (Adhikari et al., 2016; Cha et al., 2018; Claes et al., 2018; Cole et al., 2016; Lee et al., 2017; Liu et al., 2012; Paternoster et al., 2012; Pickrell et al., 2016; Shaffer et al., 2016). The overlapping loci from independent facial GWASs, thus currently representing the most established genetic findings, include DNA variants in or close to 10 genes that is CACNA2D3 (Paternoster et al., 2012; Pickrell et al., 2016), DCHS2 (Adhikari et al., 2016; Claes et al., 2018), EPHB3 (Claes et al., 2018; Pickrell et al., 2016), HOXD cluster (Claes et al., 2018; Pickrell et al., 2016), PAX1 (Adhikari et al., 2016; Shaffer et al., 2016), PAX3 (Adhikari et al., 2016; Claes et al., 2018; Liu et al., 2012; Paternoster et al., 2012; Pickrell et al., 2016), PKDCC (Claes et al., 2018; Pickrell et al., 2016), SOX9 (Cha et al., 2018; Claes et al., 2018; Pickrell et al., 2016), SUPT3H (Adhikari et al., 2016; Claes et al., 2018; Pickrell et al., 2016), and TBX15 (Claes et al., 2018; Pickrell et al., 2016), which were identified mostly in Europeans.

Indicated by the small effect size of the previously identified face-associated DNA variants, together with the high heritability of facial phenotypes, the true number of genes that influence polygenetic facial traits is likely to be much larger than currently known. Despite the previous work, our understanding of the complex genetic architecture of human facial morphology thus remains largely incomplete. This emphasizes on the need for well-powered GWASs that can be achieved through large collaborative efforts, which has been recently demonstrated for appearance traits involving pigmentation variation (Hysi et al., 2018; Visconti et al., 2018), but is currently missing for the human face.

A most recent study (Claes et al., 2018) suggested that variants at genetic loci implicated in human facial shape are involved in the epigenetic regulation of human neural crest cells, suggesting that at least some of the functional variants may reside within regulatory elements such as enhancers. Although this is in line with evidence for other appearance phenotypes such as human pigmentation traits, where experimental studies have proven enhancer effects of pigmentation-associated DNA variants (Visser et al., 2012; Visser et al., 2014; Visser et al., 2015), experimental evidence for understanding the functional basis of DNA variants for which facial phenotype associations have been previously established is missing as of yet.

Led by the International Visible Trait Genetics (VisiGen) Consortium and together with its study partners, the current study represents a collaborative effort to identify novel genetic variants involved in human facial variation in the largest set of multi-ethnic samples available thus far. Moreover, it demonstrates the functional consequences of identified face-associated DNA variants based on in-silico work and in-vitro experimental evidence.

Results

Facial phenotypes

The current study included seven cohorts, totaling 18,032 individuals. Demographic characteristics and phenotyping details for all cohorts are provided in Materials and methods and Supplementary file 1 - Table 1. A variety of different phenotyping methods are available for 3D facial image data, where some approaches (Claes et al., 2018) require access to the underlying raw image datasets, which often cannot be shared between different research groups. In order to maximize sample size via a collaborative study, we focused on linear distance measures of facial features that can be accurately derived from facial image datasets. In our cohorts with 3D facial surface images, we focused on distances between a common set of thirteen well-defined facial landmarks (Figure 1). In the Rotterdam Study (RS) and the TwinsUK study (TwinsUK), we used the same ensemble method for automated 3D face landmarking that was specifically designed for large cohort studies with the flexibility of landmark choice and high robustness in dealing with various image qualities (de Jong et al., 2018; de Jong et al., 2016). Facial landmarking in Avon Longitudinal Study of Parents and their Children (ALSPAC) and Pittsburgh 3D Facial Norms study (PITT) was done manually. Generalized Procrustes Analysis (GPA) was used to remove affine variations due to shifting, rotation and scaling (Figure 1—figure supplement 1). A total of 78 Euclidean distances involving all combinatorial pairs of the 13 facial landmarks were considered as facial phenotypes.

Figure 1 with 5 supplements see all
Manhattan plot from meta-analysis discovery GWAS (N = 10,115 Europeans) for 78 Euclidean distances between 13 facial landmarks.

Results from meta-analysis of four GWASs in Europeans (N = 10,115), which were separately conducted in RS, TwinsUK, ALSPAC, and PITT for 78 facial shape phenotypes that are displayed in a signal ‘composite’ Manhattan plot at the bottom of the figure. All loci consisting of SNPs reaching study-wide suggestive significance (p<5 × 10−8, red line) were nominated according to the nearby candidate genes in the associated loci, and the top 10 loci were highlighted in colors. The black line indicates study-wide significance after multiple trait correction (p<1.2×10−9). Novel and previously established face-associated genetic loci were differentiated using colored and gray text, respectively. Annotation of the 13 facial landmarks used to derive the 78 facial phenotypes and the associated facial phenotypes are illustrated in the upper part of the figure.

Detailed phenotype analyses were conducted in RS. In RS, almost all (72 out of 78, 92%,) of the facial phenotypes followed the normal distribution as formally investigated by applying the Shapiro-Wilk normality test and obtaining non-significant results (Supplementary file 1 - Table 2). For only six phenotypes, this test revealed significant results; however, their histogram looked largely normal. A highly significant sex effect (min p=2.1 × 10−151) and aging effect (min p=8.0 × 10−44) were noted, together explained up to 21% of the variance per facial phenotype (e.g. between alares and Ls; Figure 1—figure supplement 2A–2C, Supplementary file 1 - Table 3). Intra-organ facial phenotypes (e.g. within nose or within eyes) showed on average higher correlations than inter-organ ones (e.g. between nose and eyes), and symmetric facial phenotypes showed higher correlations than non-symmetric ones. Genetic correlations (Zhou and Stephens, 2014) estimated using genome-wide data were similar to correlations obtained directly from phenotype data, with on average higher absolute values seen based on genotypes (Figure 1—figure supplement 3, Supplementary file 1 - Tables 4 and 5).

Unsupervised hierarchical clustering analysis identified four distinct clusters of facial shape phenotypes, for example many distances related to pronasale were clustered together (Figure 1—figure supplement 2D). Twin heritability of all facial phenotypes was estimated in TwinsUK as h2 mean(sd)=0.52 (0.15), max = 0.86 and in Queensland Institute of Medical Research study (QIMR) as h2 mean(sd)=0.55 (0.19), max = 0.91 (Supplementary file 1 - Table 6). The most heritable features were linked to the central face (i.e., the outer- and inter- orbital distances, the nose breadth, and the distance between the subnasion and the inner eye corners; Figure 1—figure supplement 2E and F). These twin-derived heritability estimates were generally higher than those obtained from genome-wide SNPs in the same cohorts (Supplementary file 1 - Table 6). These results were overall consistent with expectations and suggest that the phenotype correlation within the human face may be explained in part by sets of shared genetic components.

GWASs and replications

We conducted a series of GWASs and meta-analyses to test the genetic association of 7,029,494 autosomal SNPs with 78 face phenotypes in RS, TwinsUK, ALSPAC, and PITT, totaling 10,115 subjects of European descent, used as the discovery dataset. Inflation factors were in an acceptable range in all meta-analyses (average λ = 1.015, sd = 0.006; Figure 1—figure supplement 4), and were thus not further considered. A matrix spectral decomposition analysis estimated the effective number of independent facial traits as 43, which corresponds to a Bonferroni corrected study-wide significant threshold of 1.2 × 10−9. A power analysis showed that under this threshold, our discovery data had over 90% power to detect an allelic effect explaining 0.54% variance for individual phenotypes (Figure 1—figure supplement 5A). In the discovery GWAS meta-analysis, a total of 221 SNPs from six distinct loci showed study-wide significant association (p<1.2 × 10−9) with facial phenotypes, among which two have not been previously reported: 4q28.1 INTU (top SNP rs1250495 p=3.03 × 10−10), and 12q24.21 TBX3 (rs1863716 p=1.47 × 10−10). The other four have been described in previous GWASs of facial variation, including 2q36.1 PAX3, 4q31.3 SFRP2, 17q24.3 CASC17, and 20p11.22 PAX1 (Figure 1 and Supplementary file 1 - Table 7) (Liu et al., 2012; Paternoster et al., 2012; Adhikari et al., 2016; Shaffer et al., 2016; Claes et al., 2018). Considering the traditional genome-wide significant threshold of 5 × 10−8 as our study-wide suggestive significance threshold for replication studies, the power analysis showed that under this threshold, our discovery data had over 90% power to detect an allelic effect explaining 0.43% variance for individual phenotypes (Figure 1—figure supplement 5B). In addition to the study-wide significant ones, 273 SNPs from 18 distinct loci passed the 5 × 10−8 threshold (Supplementary file 1 - Table 7), of which 15 were novel and 3 (1p12 TBX15, 9q22.31 ROR2 and 17q24.3 SOX9) have been reported in previous studies (Cha et al., 2018; Claes et al., 2018; Pickrell et al., 2016). Thus, in total, we identified 24 face-associated genetic loci, of which 17 were not reported in previous face GWASs, that we followed-up via a multi-ethnic replication analysis (see below). In general, no significant heterogeneity was observed for the 24 top SNPs of the study-wide significant and suggestive associated loci, that is only one SNP (rs3403289 at 2q36.1 PAX3) showed nominally significant (Cochran's Q test p=0.04, Table 1) heterogeneity between the discovery cohorts, but was not significant after multiple testing correction.

Table 1
SNPs significantly associated (p<5×10−8) with facial shape phenotypes from European discovery GWAS meta-analysis (RS, TwinksUK, ALSPAC, and PITT), and their multi-ethnic replication (UYG, CANDELA and QIMR).
Discovery meta-analysisReplication
(N = 10,115)(N = 7,917)
RegionSNPNearest GeneEAOATraitBetaPQCom. P
Novel face-associated loci
1p36.22rs143353512CASZ1AGPrn-AlL−0.296.44 × 10−90.730.0001
1p36.13rs200243292ARHGEF19ITEnR-ChL−0.111.46 × 10−80.540.0640
1p31.2rs77142479RPE65CAEnL-Sn−0.087.65 × 10−90.900.0121
2p12rs10202675LRRTM4TCEnR-AlR−0.264.43 × 10−80.400.0226
2q31.1rs2884836KIAA1715TCExR-ChR−0.093.04 × 10−80.520.1096
3q12.1rs113663609CMSS1AGEnR-ChR−0.123.46 × 10−80.960.0782
4q28.1rs12504954INTUAGPrn-AlL−0.093.03 × 10−100.520.0015
6p22.3rs2225718RNF144BCTEnR-AlR−0.092.97 × 10−80.600.0071
6p21.2rs7738892KIF6TCEnR-N0.111.34 × 10−80.090.0018
8p23.2rs1700048CSMD1CAExR-ChL−0.223.94 × 10−80.730.1131
8q21.3rs9642796DCAF4L2AGAlL-Ls−0.114.52 × 10−80.070.3090
10q22.1rs201719697SUPV3L1ADExL-AlL−0.283.42 × 10−80.66NA
12q24.21rs1863716TBX3ACPrn-AlR−0.091.47 × 10−100.532.45 × 10−5
13q14.3rs7325564LINC00371TCN-Prn0.091.34 × 10−80.160.3519
14q32.2rs1989285C14orf64GCN-Prn−0.104.54 × 10−80.820.0019
16q12.1rs16949899SALL1TGExR-ChL−0.133.35 × 10−80.760.8912
16q12.2rs7404301RPGRIP1LGAAlL-Ls−0.093.49 × 10−90.432.87 × 10−7
Previously reported face-associated loci
1p12rs1229119TBX15TCExR-ChR0.082.25 × 10−80.260.1219
2q36.1rs34032897PAX3GAEnR-N0.141.96 × 10−170.040.0020
4q31.3rs6535972SFRP2CGEnL-AlL0.123.51 × 10−120.125.89 × 10−12
9q22.31rs2230578ROR2CTEnL-Sn0.099.02 × 10−90.661.09 × 10−6
17q24.3rs8077906CASC17AGPrn-EnL−0.093.00 × 10−110.240.0505
17q24.3rs35473710SOX9GAAlR-ChL0.193.95 × 10−80.370.2634
20p11.22rs4813454PAX1TCAlL-AlR−0.102.32 × 10−110.580.0002
  1. Except rs2230578 (3'-UTR of ROR2) and rs7738892 (a synonymous variant of KIF6), all SNPs are intronic or intergenic. Gene symbols in bold indicate successful replication after Bonferroni correction of 24 loci (p<2.1×10−3). Meta P in bold indicates study-wide significance in the discovery analysis after multiple trait correction (p<1.2×10−9). Replication P in bold means significant after Bonferroni correction for multiple tests (p<2.1×10−3). Q: p-value from the Cochran's Q test for testing heterogeneity between discovery cohorts. EA and OA, the effect allele and the another allele. Com. P: p-values from a combined test of dependent tests. NA, not available.

The association signals at the 24 genetic loci involved multiple facial phenotypes clustered to the central region of the face above the upper lip (Figure 1), and closely resembled the twins-based face heritability map (Figure 1—figure supplement 2E and F). All seven previously reported loci showed largely consistent allele effects on the same or similar sets of facial phenotypes as described in previous GWASs (Supplementary file 1 - Table 8). Four (2q36.1 PAX3, 9q22.31 ROR2, 17q24.3 CASC17, and 20p11.22 PAX1) out of the 24 face-associated loci have been noted in the GWAS catalog for face morphology or related phenotypes that is nose size, chin dimples, monobrow thickness, and male pattern baldness (Supplementary file 1 - Table 9). Five out of the 24 loci showed genome-wide significant association (p<5e-8) with non-facial phenotypes and/or diseases in the GeneATLAS database (Canela-Xandri et al., 2018) (Supplementary file 1 - Table 10), suggesting pleiotropic effects. Six out of the 24 loci showed a high degree of diversity among the major continental groups in terms of both allele frequency differences (>0.2) and Fst values (empirical p<0.05), including 1p31.2 RPE65, 2q36.1 PAX3, 4q28.1 INTU, 16q12.1 SALL1, 16q12.2 RPGRIP1L and 17q24.3 CASC17 (Supplementary file 1 - Tables 11 and 12). Three out of the 24 face-associated loci showed signals of positive selection in terms of the integrated haplotype scores (iHS, empirical p<0.01), including RPE65, RPGRIP1L and CASC17 (Supplementary file 1 - Table 12). None of the 24 loci overlapped with the previously reported signals of singleton density score (SDS, empirical p<0.01) in Europeans (Field et al., 2016), which may indicate that the observed selection signals are unlikely the consequences from the recent selective pressures detectable via SDS analysis.

We selected the 24 top-associated SNPs (Table 1), one SNP per for each of the identified 24 genetic loci passing the study-wide suggestive significance threshold (p<5×10−8), for replication studies in a total of 7917 multi-ethnic subjects from three additional cohorts that is the Consortium for the Analysis of the Diversity and Evolution of Latin America (CANDELA) study involving Latin Americans known to be of admixed Native American and European ancestry, the Xinjiang Uyghur (UYG) study from China known to be of admixed East Asian and European ancestry, and the Queensland Institute of Medical Research (QIMR) study from Australia involving Europeans. We used a ‘combined test of dependent tests’ (Kost and McDermott, 2002) to account for the incompatibility among facial traits from the three replication cohorts, which tests for H0: no association for all facial phenotypes in all replication cohorts vs. H1: associated with at least one facial phenotypes in at least one replication cohorts. After Bonferroni correction of multiple testing of 24 SNPs, significant evidence of replication (p<2.1×10−3) was obtained for 10 SNPs, among which six were novel loci: 1p36.22 CASZ1, 4q28.1 INTU, 6p21.2 KIF6, 12q24.21 TBX3, 14q32.2 C14orf64, and 16q12.2 RPGRIP1L (Figure 2), and four have been described in previous face GWASs: 2q36.1 PAX3, 4q31.3 SFRP2, 9q22.31 ROR2, and 20p11.22 PAX1 (Adhikari et al., 2016; Claes et al., 2018; Paternoster et al., 2012; Pickrell et al., 2016; Shaffer et al., 2016).

Figure 2 with 24 supplements see all
Six novel genetic loci associated with facial shape phenotypes and successfully replicated.

The figure is composed by six parts, (A) 1p36.22 CASZ1 region, (B) 4q28.1 INTU region, (C) 6p21.2 KIF6 region, (D) 12q24.21 TBX3 region, (E) 14q32.2 C14orf64 region, and (F) 16q12.2 RPGRIP1L region. Each part includes two figures, Face map (left) denoting all of the top-SNP-associated phenotypes weighted by the statistical significance (-log10P). The dashed line means that the P value was nominally significant (p<0.01) but did not reach study-wide suggestive significance (p>5e-8). LocusZoom (right) shows regional association plots for the top-associated facial phenotypes with candidate genes aligned below according to the chromosomal positions (GRCh37.p13) followed by linkage disequilibrium (LD) patterns (r2) of EUR in the corresponding regions. Similar figures for all 24 study-wide suggestively significant loci are provided in Figure 2—figure supplements 124.

The allelic effects of the replicated SNPs were highly consistent across all participating cohorts, while the individual cohorts were underpowered (Figure 2—figure supplements 16), further emphasizing on the merit of our collaborative study. The associated phenotypes of the replicated SNPs demonstrated a clear pattern of facial symmetry that is the same association was observed on both sides of the face, while such a symmetric pattern was less clear for most phenotypes involved in the non-replicated loci (Figure 2—figure supplements 717). Several novel replicated loci showed effects on the similar sets of facial shape phenotypes; for example, 1p36.22 CASZ1 and 4q28.1 INTU (Figure 2A and C) both affected the distances between pronasion and alares. Moreover, several novel replicated loci affected phenotypes in multiple facial regions for example 4q32.1 C14orf64 and 12q24.1 TBX3 (Figure 2D and E) both affected many distances among eyes, alares and mouth. It should be noted however, that the discovery dataset is of European ancestry while the replication dataset is multi-ethnic and the facial phenotypes could not be obtained in the same way as in the discovery set; hence, the replication results were rather conservative, that is the possibility of false negatives in this replication study cannot be excluded. This was evident from the results of the replication attempts of the seven known loci (1p12 TBX15, 2q36.1 PAX3, 4q31.3 SFRP2, 9q22.31 ROR2, 17q24.3 CASC17, 17q24.3 SOX9, 20p11.22 PAX1) (Figure 2—figure supplements 1824). Out of these 7, only four were replicated (PAX3, SFRP2, ROR2 and PAX1, Table 1). For this reason, we considered all of the 494 SNPs from the 24 loci passing the study-wide suggestive threshold in the subsequent polygenic score analyses and in the functional annotation analyses.

Integration with previous literature knowledge

In our discovery GWAS dataset, we also investigated 122 face-associated SNPs reported in previously published facial GWASs (Adhikari et al., 2016; Cha et al., 2018; Claes et al., 2018; Cole et al., 2016; Lee et al., 2017; Liu et al., 2012; Paternoster et al., 2012; Pickrell et al., 2016; Shaffer et al., 2016). Out of those, a total of 44 (36%) SNPs showed significant association on either the same set of facial phenotypes or at least one facial phenotypes in a combined test, after Bonferroni correction for the number of SNPs (adjusted p<0.05) (Supplementary file 1 - Table 8). Notably, none of the 5 SNPs reported in a previous African GWAS (Cole et al., 2016) was replicated in our European dataset. This is unlikely explained by the allele frequency differences between Sub-Saharan African (AFR) and Europeans (EUR) (absolute frequency differences between AFR and EUR were less than 0.1 in the 1000 Genomes project) but more likely due to different phenotypes, that is in the previous study these loci were primarily associated with facial size, which was not tested on our study.

We also investigated the potential links between facial phenotypes and 60 SNPs in 38 loci previously implicated in non-syndromic cleft lip and palate (NSCL/P) phenotype, which is linked with normal variation of facial morphology as suggested previously (Beaty et al., 2010; Beaty et al., 2011; Birnbaum et al., 2009; Grant et al., 2009; Leslie et al., 2017; Ludwig et al., 2017; Ludwig et al., 2012; Mangold et al., 2010; Sun et al., 2015; Yu et al., 2017). Among these, 17 SNPs in 13 loci showed significant face-associations in the discovery cohorts after Bonferroni correction for the number of SNPs and phenotypes, and many (11 SNPs in eight loci) involving cleft related facial landmarks as expected, for example subnasale, labliale superius, and labiale inferius (Supplementary file 1 - Table 13).

Multivariable model fitting and polygenic face scores

In the RS cohort, a multiple-regression conditioning on the effects of the 24 lead SNPs identified 31 SNPs showing significant (adjusted p<0.05) independent effects on sex- and age-adjusted face phenotypes (Supplementary file 1 - Table 14). These 31 SNPs individually explain less than 1%, and together explain up to 4.62%, of the variance for individual phenotypes (Figure 3A and B). The top-explained phenotypes clustered to the central region of the face, that is the distances between nasion, subnasale, pronasale, left endocanthion and right endocanthion. The variance explained was potentially overestimated in RS cohort since the SNP selection and model building were conducted using the same dataset. However, potential overfitting is unexpected to have a drastic effect, given that some of the selected SNPs showed weaker effects in RS than in other cohorts used (Supplementary file 1 - Table 7). We then used the fitted models to derive a set of polygenetic face scores for EUR, AFR and East Asian (EAS) subjects (total N = 1,668) from the 1000-Genomes Project. The polygenic face scores based on the 31 SNPs were largely reversely distributed between EUR and AFR, with EAS situated in between (Figure 3C). EUR showed the longest distances among the vertical phenotypes, while AFR had the widest of the horizontal phenotypes. EAS was consistently in-between with the most differential phenotype scores observed for nose wing breadth (AFR > EAS > EUR) and the nose length (EUR > EAS > AFR). An exception was for mouth phenotypes, where AFR had the largest values in both mouth width and lip thickness (Figure 3D). The nose-related score distribution, for example nose breadth and length, thus largely assembled the facial features in major continental groups, consistent with the hypotheses regarding adaptive evolution of human facial morphology (Noback et al., 2011; Weiner, 1954), for example nose width and nose length have been found to correlate with temperature and humidity. It should be noted, however, that the possibility of potential bias cannot be excluded due to the fact that the discovery GWAS was conducted in only Europeans. Notably, the polygenic scores of the NSCL/P-associated SNPs explained up to 1.2% of the age- and sex- adjusted phenotypic variance for phenotypes mostly between nose and mouth, for example right alare – right cheilion and pronasale – right cheilion (Figure 3—figure supplement 1).

Figure 3 with 1 supplement see all
Polygenic scores of 78 facial phenotypes.

(A) Facial phenotype variance is explained by a multivariable model consisting of 31 face-associated SNPs with independent effects. (B) Unsupervised clustering reveals the details on the explained phenotype variance, where the SNPs were attributed to the candidate genes in the associated regions. All phenotypes with the total explained variance > 2% are shown. SNPs were colored according to their raw p-values and those passing FDR < 0.05 were indicated in squares. (C) Mean standardized polygenic scores calculated by applying the multivariable model trained in the RS discovery sample to Sub-Saharan Africans (AFR), East Asians (EAS) and Europeans (EUR) in the 1000 Genomes Project data. (D) Examples of the distributions of the face scores in samples of different ancestry origin/country.

Gene ontology of face-associated genetic loci

A gene ontology enrichment analysis for the 24 face-associated genetic loci (Table 1) highlighted a total of 67 biological process terms and seven molecular function terms significantly enriched with different genes group (FDR < 0.01 or q value < 0.01, Supplementary file 1 - Table 15), with the top significant terms being ‘embryonic digit morphogenesis’ and ‘embryonic appendage morphogenesis’ (FDR < 1 × 10−8, Figure 4—figure supplement 1). Biological process terms were categorized in four broader categories, ‘morphogenesis’, ‘differentiation’, ‘development’ and ‘regulation’. These results underlined the important role of embryonic developmental and regulatory processes in shaping human facial morphology. Furthermore, cis-eQTL effects of the 494 face-associated SNPs at the 24 loci (Supplementary file 1 - Table 7) were investigated using the GTEx data. Significant multi-tissue cis-eQTL effects (adjusted p<0.05) were found around three genes, ARHGEF19, RPE65, and TBX15 (Supplementary file 1 - Table 16). No significant tissue-specific eQTL effects were observed, likely due to the lack of cell types indexed within the GTEx database that are directly involved in craniofacial morphology.

Expression of face-associated genetic loci in embryonic cranial neural crest cells

Embryonic cranial neural crest cells (CNCCs) arise during weeks 3–6 of human gestation from the dorsal part of the neural tube ectoderm and migrate into the branchial arches. They later form the embryonic face, consequently establishing the central plan of facial morphology and determining species-specific and individual facial variation (Bronner and LeDouarin, 2012; Cordero et al., 2011). Recently, CNCCs have been successfully derived in-vitro from human embryonic stem cells or induced pluripotent stem cells (Prescott et al., 2015). Taking advantage of the available CNCC RNA-seq data (Prescott et al., 2015) and other public RNA-seq datasets compiled from GTEx (Lonsdale et al., 2013) and ENCODE (ENCODE Project Consortium, 2012), we compared 24 genes, which were nearest to the top-associated SNPs at the 24 face-associated genetic loci identified in our discovery GWAS meta-analysis, with the background genes over the genome for their expression in CNCCs and 49 other cell types. In 89% of 9 embryonic stem cells, 65% of 20 different tissue cells, and 15% of 20 primary cells these 24 genes showed significantly higher expression in all cell types (p<0.05) after multiple testing correction compared to the background genes (Figure 4A), and the most significant difference was observed in CNCCs (p=3.8 × 10−6). Furthermore, these 24 genes also showed preferential expression in CNCCs compared to their expression in other cell types, that is in 65% of 20 different tissue cells, in 60% of 20 primary cells, and in 22% of 9 embryonic stem cells (Figure 4A). Specifically, 16 (66.7%) of these 24 genes showed preferential expression in CNCCs compared to 49 other cell types. The top-ranked three preferentially expressed genes were PAX3 (p=1.3 × 10−29), INTU (p=8.3 × 10−25), and ROR2 (p=4.5 × 10−23, Figure 4B). Furthermore, some mid-preferentially expressed genes like DCAF4L2 and non-preferentially expressed genes like RPE65 became top-ranked preferentially expressed genes when testing sub-groups of cell types (Figure 4B). Although not necessarily is the nearest gene the causative gene and not necessarily is there only one causative gene per locus, the observation of significant preferential expression of the nearby genes considered in this analysis does support the hypothesis that variation of facial shape may indeed originate during early embryogenesis (Claes et al., 2018), and provide a priority list in different stages of cell differentiation with genes likely involved in embryo ectoderm derived phenotypes such as facial variation for future in-vivo functional studies.

Figure 4 with 1 supplement see all
Expression of 24 genes located at the identified 24 face-associated genetic loci in 50 cell types.

Expression levels of the genes nearest to the 24 top-associated SNPs (see Table 1) in 50 cell types were displayed in boxplots based on published data from the study of Prescott et al. (2015), the GTEx database (Lonsdale et al., 2013) and the ENCODE database (ENCODE Project Consortium, 2012). The cell types included CNCC, 20 other tissue cells, 20 primary cells and nine embryonic stem cells. (A) Boxplots of normalized RNA-seq VST values for the 24 genes (in orange) and all genome background genes (~50,000 genes, in blue). Expression difference between the genome background genes and the 24 genes was tested in each cell type using the unpaired Wilcoxon rank-sum test. Distribution of the expression levels in CNCC showed smaller variation than those of several embryonic stem cells which showed higher median. The expression of the 24 genes in CNCCs was also iteratively compared with that in other cell types using paired Wilcoxon rank-sum test. Statistical significance was indicated: *p<0.05 after multiple correcting test of cell lines amount. (B) Difference significance of normalized RNA-seq VST values in each of 24 genes were displayed using barplots between CNCCs and all 49 types of cells (purple), CNCCs and 20 tissue cells (blue), CNCCs and 20 primary cells (green), CNCCs and nine embryonic stem cells (orange). The difference between the value of CNCCs and other cell type groups was tested using the one-sample Student’s t test. Dotted line represents Bonferroni corrected significant threshold (p<0.0021). Significant gene labels were in color compared to non-significant gene labels in black.

Cis-regulatory signals in face-associated genetic loci

It is becoming increasingly clear that cis-regulatory changes play a central role in determining human complex phenotypes (Carroll, 2008; Wray, 2007). In our study, apart from those at KIF6 and ROR2, all identified face-associated DNA variants were non-exonic. With preferential expression pattern validated in CNCCs, we explored the potential cis-regulatory activity of the face-associated DNA variants using the CNCC epigenomic mapping datasets (Prescott et al., 2015). This includes ChIP-seq data on transcription factor (TFAP2A and NR2F1) binding, general coactivator (p300) binding, or histone modifications associated with active regulatory elements (H3K4me1, H3K4me3, and H3K27ac), as well as ATAC-seq data on chromatin accessibility. Abundant entries of cis-regulatory signals were detected surrounding the face-associated loci. In particular, 6 (25%) of the 24 face-associated loci, that is 1p36.13 ARHGEF19, 2q36.1 PAX3, CMSS1, 4q28.1 INTU, 16q12.1 SALL1, and 16q12.2 RPGRIP1L, showed significant enrichment of cis-regulatory entries at the top 0.1% of the genome level in at least one epigenomic tracks after Bonferroni correction (adjusted p<0.05, Figure 5—figure supplement 1). We selected 5 SNPs as candidate regulatory SNPs (Figure 5A and B, Supplementary file 1 - Table 17) as they not only overlapped with the putative enhancers defined in CNCCs (Prescott et al., 2015) and penis foreskin melanocyte primary cells (Chadwick, 2012; ENCODE Project Consortium, 2012), but also showed high LD (r2 > 0.7) with at least one of the face-associated SNPs at the study-wide suggestive significance level. These five candidate regulatory SNPs were further studied via functional experiments as described in the following section.

Figure 5 with 1 supplement see all
Three examples of genetic loci associated with facial shape phenotypes.

Three examples of facial shape associated loci we discovered are depicted in details, including two novel loci (INTU; RPGRIP1L) and one (PAX3) that overlaps with previous face GWAS findings. The figure is composed by three layers (A–C) organized from the top to the bottom. (A) Denotes the linkage disequilibrium (LD) patterns (r2) of EUR in the corresponding regions and epigenetic annotation of CNCC’s regulatory elements in corresponding regions using WashU Epigenome Browser. Chip-seq profiles display histone modifications associated with active enhancers (H3K27ac, H3K4me1) or promoters (H3K4me3), and binding of general coactivator p300 and transcription factor TFAP2A. The ATAC-seq track shows chromatin accessibility. The PhyloP track indicates cross-species conservation. (B) SNPs in LD (r2 > 0.25) with the top-associated SNP and all base-pairs in vicinity (within 20 kb, p<0.05) are displayed according to their log scaled quantile rank (-log10p, axis scale) in the circular figure, where the rank is calculated using the Chip-seq values in the whole genome. In addition, SNPs associated with facial phenotypes in our meta-analysis of GWASs were highlighted according to their association p values (PMeta, color scale) and their LD (r2) with the top-associated SNP in the region (points size). (C) Shows results of luciferase reporter assays in CD271+ neural crest progenitor cells by which we tested allele-specific enhancer activity of putative enhancers surrounding the selected SNPs using the Student´s t-test. Data are presented as relative luciferase activity (firefly/renilla activity ratio) fold change compared to the construct containing less active allele. Similar figures for layers A and B are provided for all face associated loci in Figure 2—figure supplements 124.

Confirmation of cis-regulatory enhancer effects in neural crest progenitor cells

We performed in-vitro luciferase reporter assays to quantify the potential regulatory effect of the five face-associated SNPs suggested by the in-silico analyses outlined above as regulatory candidates that is rs13410020 and rs2855266 from the 2q36.1 PAX3 region, rs17210317 and rs6828035 from the 4q28.1 INTU region, and rs4783818 from the 16q12.2 RPGRIP1L region (Supplementary file 1 - Table 17). Luciferase reporter assays in CD271+ neural crest progenitor cells demonstrated that both SNPs in 2q36.1 and 4q28.1, respectively, showed statistically significant allele-specific effects on enhancer activity (Figure 5C). In the 4q28.1 region, the derived SNP alleles were associated with decreased luciferase expression (33.5 ×~ 10.4 × , Supplementary file 1 - Table 18) as well as a reduced length of some nose features (−0.087 < beta < −0.078, Supplementary file 1 - Table 7). In the 2q36.1 region, the derived SNP alleles were associated with increased luciferase expression (1.5 ×~ 3.7 × , Supplementary file 1 - Table 18) as well as an increased length of some nasion-related features (0.131 < beta < 0.149, Supplementary file 1 - Table 7). Using a luciferase assay for rs4783818 from the 16q12.2 region in CD271+ neural crest progenitor cells, we obtained an allele-specific effect on enhancer activity, although it was not statistically significant (Figure 5C). Additional luciferase reporter assays using G361 melanoma cells provided similar results, indicating that the regulatory activity associated with the DNA variants is conserved across the neural crest cell lineage (Supplementary file 1 - Table 18). These results provide evidence that the tested face-associated SNPs impact the activity of enhancer elements and thus have regulatory function.

Discussion

This study represents the largest genome scan to date examining facial shape phenotypes in humans quantified from 3D facial images. We identified 24 genetic loci showing study-wide suggestive association with normal-range facial shape phenotypes in Europeans, including 17 novel ones, of which 10 were successfully replicated in additional multi-ethnic population cohorts including six novel loci. Significant gene enrichment in morphogenesis related pathways and cis-regulation signals with preferential CNCC expression activities were observed at these genetic loci, underlining their involvement in facial morphology. Our findings were additionally supported by a strong integration with previous genetic association and functional studies. Moreover, via in-vitro cell culture experiments, we functionally exemplified the regulatory activity of selected face-associated SNPs at two newly identified (INTU and RPGRIP1L) and one established (PAX3) face shape genes.

The six replicated novel face-associated loci included 1p36.22 CASZ1 4q28.1 INTU 6p21.2 KIF6, 12q24.21 TBX3, 14q32.2 C14orf64, and 16q12.2 RPGRIP1L. For 4q28.1 INTU and 16q12.2 RPGRIP1L, we demonstrated an impact of selected DNA variants on enhancer activity. INTU, also known as planar cell polarity (PCP) effector, has been reported to compromise proteolytic processing of Gli3 by regulating Hh signal transduction (Zeng et al., 2010). Notably, SNPs in the GLI3 region have been previously associated with human nose wing breadth (Adhikari et al., 2016), supporting a link between INTU and facial morphogenesis. In addition, rare frameshift mutations in INTU have been found to cause human Oral-Facial-Digital (OFD) syndromes as characterized by abnormalities of the face and oral cavity (Bruel et al., 2017). A previous study found that Rpgrip1l mutant mouse embryos displayed facial abnormalities in cleft upper lips and hypoplastic lower jaws (Delous et al., 2007). FTO, a proximate gene to RPGRIP1L, has also been found essential for normal bone growth and bone mineralization in mice (Sachse et al., 2018). TBX3, a member of the T-box gene family, causes Ulnar-mammary syndrome which is a rare disorder characterized by distinct chin and nose shape together with many other abnormal morphological changes (Bamshad et al., 1997; Joss et al., 2011). Intriguingly, TBX3 and TBX15 associated facial phenotypes showed an asymmetric pattern, particularly for nose phenotypes (TBX3) and eye-mouth distances (TBX15). These two genes are thus good candidates for further studying the genetic basis of facial asymmetry and related disorders. A synonymous variant (rs7738892) in the transcript coding region of KIF6 was associated with facial traits. It has been shown that the homologous Kif6 transcript sequence was necessary for spine development in zebrafish, likely due to its role in cilia, which is known to utilize neural crest cells for regulating ventral forebrain morphogenesis (Buchan et al., 2014). For the remaining two novel loci (CASZ1 and C14orf64), there is current lack of existing literature evidence supporting their involvement in facial variation.

The four replicated previously reported face-associated loci included 2q36.1 PAX3, 4q31.3 SFRP2, 9q22.31 ROR2, and 20p11.22 PAX1. Their effects observed in the current study were all consistent with the previous findings, for example PAX3 was associated with position of the nasion (Liu et al., 2012; Paternoster et al., 2012), SFRP2 with distance between inner eye corners and alares (Adhikari et al., 2016; Claes et al., 2018), PAX1 with nose wing breadth (Adhikari et al., 2016; Shaffer et al., 2016), and ROR2 with nose size (Pickrell et al., 2016). The face-associated rs2230578 at 9q22.31 is in the 3'-UTR of ROR2. A Ror2 knockout mouse has been used as a model for the developmental pathology of autosomal recessive Robinow syndrome, a human dwarfism syndrome characterized by limb shortening, vertebral and craniofacial malformations, and small external genitalia (Schwabe et al., 2004). Among the 11 non-replicated novel loci, two are worth mentioning: KIAA1715 maps to about 100kbp upstream of the HOXD cluster (homeobox D cluster), which has been highlighted by two independent facial variation GWAS (Claes et al., 2018; Pickrell et al., 2016). DCAF4L2 gene variants were previously associated with NSCL/P (Leslie et al., 2017; Ludwig et al., 2012; Yu et al., 2017) and we revealed a significant effect on the distance between alares and the upper-lip. We therefore regard it as likely that true positive findings exist among the 11 non-replicated novel loci. This is further supported by our finding that among the total of 14 non-replicated loci, three loci (21%) that is TBX15 (Claes et al., 2018; Pickrell et al., 2016), CASZ17 (Claes et al., 2018; Pickrell et al., 2016), and SOX9 (Cha et al., 2018; Claes et al., 2018; Pickrell et al., 2016) correspond to previously identified face-associated loci. As such, their re-identification in our discovery GWAS meta-analysis represents a confirmation per se, even without significant evidence from our replication analysis. Moreover, pleiotropic effects of TBX15 on skeleton, for example stature, were previously demonstrated by a large-sized GWAS (Wood et al., 2014). Loss of function of TBX15 in mice has been shown to cause abnormal skeletal development (Singh et al., 2005). Thus, the fact that these three previously reported face-associated loci were identified in our discovery GWAS, but were not confirmed in our replication analysis, likely is explained by relatively low statistical power, especially for the European replication dataset (N = 1,101) being most relevant here as these three genes were highlighted in Europeans in the previous and in our study. Moreover, for the European QIMR samples in the replication set, no direct replication analysis was possible due to the availability of 2D images, but not 3D images as in the discovery dataset. Therefore, future studies shall further investigate these 11 novel face-associated loci.

Of the previously identified face-associated genetic loci, several candidate genes for example Pax3 (Zalc et al., 2015), Prdm16 (Bjork et al., 2010), Tp63 (Thomason et al., 2008), Sfrp2 (Satoh et al., 2008), and Gli3 (Hui and Joyner, 1993) have been shown to play important roles in cranial development of embryonic mice. Moreover, in adult mice only Edar has been investigated for its effect on normal facial variation (Adhikari et al., 2016), but the respective face-associated missense EDARV370A is nearly absent in Europeans (Kamberov et al., 2013). Here we showed via direct experimental evidence in neural crest cells that face-associated SNPs in INTU, PAX3, and RPGRIP1L are likely involved in enhancer functions and regulate gene activities. Future studies shall demonstrate functional evidence for more SNPs and genes identified here or previously to be significantly association with facial shape phenotypes. As demonstrated here, this shall go beyond evidence on the bioinformatics level (Claes et al., 2018) and shall involve direct functional testing of the associated DNA variants in suitable cell lines or mouse models.

Our results indirectly support the hypothesis that the facial differentiation observed among worldwide human populations is shaped by positive selection. For some facial features, it is reasonable to speculate a primary selective pressure on the morphological effects of the associated variants, although many of these variants may have pleiotropic effects on other fitness-related phenotypes. For example, nasal shape, which is essential for humidifying and warming the air before it reaches the sensitive lungs, has been found to significantly correlate with climatic variables such as temperature and humidity in human populations (Noback et al., 2011). Notably, many of the SNPs highlighted in our study are involved in nasal phenotypes. The overall effect of these SNPs as measured by polygenic scores largely assembled the nasal shape differences among major continental groups, which implies that human facial variation in part is caused by genetic components that are shared between major human populations. However, there are existing examples of population-specific effects, for example, EDAR clearly demonstrates an Asian specific effect, which impacts on various ectoderm-derived appearance traits such as hair morphology, size of the breast, facial shape in Asians (Adhikari et al., 2015; Kamberov et al., 2013; Kimura et al., 2009; Tan et al., 2013). The respective face-associated missense DNA variant is rare in European and African populations, demonstrating regional developments in Asia.

Nevertheless, the polygenic face scores only explained small proportions of the facial phenotype variance with up to 4.6%. This implies that many more face-predictive genetic information remains to be discovered in future studies, which - in case enough - may provide the prerequisite for practical applications of predicting human facial information from genomic data such as in forensics or anthropology. On the other hand, the increasing capacity of revealing personal information from genomic data may also have far-reaching ethical, societal and legal implications, which shall be broadly discussed by the various stakeholders alongside the genomic and technological progress made here and to be made in future studies.

In summary, this study identified multiple novel and confirmed several previously reported genetic loci influencing facial shape variation in humans. Our findings extend knowledge on natural selection having shaped the genetic differentiation underlying human facial variation. Moreover, we demonstrated the functional effects of several face-associated genetic loci as well as face-associated DNA variants with in-silico bioinformatics analyses and in-vitro cell line experiments. Overall, our study strongly enhances genetic knowledge on human facial variation and provides candidates for future in-vivo functional studies.

Materials and methods

Cohort details

Request a detailed protocol

The four discovery cohorts, totaling 10,115 individuals, were Rotterdam Study (RS, N = 3,193, North-Western Europeans from the Netherlands), TwinsUK (N = 1,020, North-Western Europeans from the UK), ALSPAC (N = 3,707, North-Western Europeans from the UK), and Pittsburgh 3D Facial Norms study (PITT, N = 2,195, European ancestry from the United States). The three replication cohorts, totaling 7917 individuals, were CANDELA Study (N = 5,958, Latin Americans with ancestry admixture estimated at 48% European, 46% Native American and 6% African), Xinjiang Uyghur Study (UYG, N = 858, Uyghurs from China with ancestry admixture estimated at 50% East Asian and 50% European), and Queensland Institute of Medical Research Study (QIMR, N = 1,101, North-Western European ancestry from Australia). Three-dimensional facial surface images were acquired in all discovery cohorts and UYG, while frontal 2-dimensional photographs were used in CANDELA and QIMR.

Rotterdam Study (RS)

Request a detailed protocol

The RS is a population based cohort study of 14,926 participants aged 45 years and older, living in the same suburb of Rotterdam, the Netherlands (Hofman et al., 2013). The present study includes 3193 participants of Dutch European ancestry, for whom high-resolution 3dMDface digital photographs were taken. Genotyping was carried out using the Infinium II HumanHap 550K Genotyping BeadChip version 3 (Illumina, San Diego, California USA). Collection and purification of DNA have been described previously (Kayser et al., 2008). All SNPs were imputed using MACH software (www.sph.umich.edu/csg/abecasis/MaCH/) based on the 1000-Genomes Project reference population information (Abecasis et al., 2012). Genotype and individual quality controls have been described in detail previously (Lango Allen et al., 2010). After all quality controls, the current study included a total of 6,886,439 autosomal SNPs (MAF > 0.01, imputation R2 > 0.8, SNP call rate > 0.97, HWE > 1e-4). The Rotterdam Study has been approved by the Medical Ethics Committee of the Erasmus MC (registration number MEC 02.1015) and by the Dutch Ministry of Health, Welfare and Sport (Population Screening Act WBO, license number 1071272–159521 PG). The Rotterdam Study has been entered into the Netherlands National Trial Register (NTR; R; www.trialregister.nl) an) and into the WHO International Clinical Trials Registry Platform (ICTRP; P; www.who.int/ictrp/network/primary/en/) und under shared catalogue number NTR6831. All participants provided written informed consent to participate in the study and to have their information obtained from treating physicians.

TwinsUK study

Request a detailed protocol

The TwinsUK study included 1020 phenotyped participants (all female and all of Caucasian ancestry) within the TwinsUK adult twin registry based at St. Thomas’ Hospital in London. All participants provided fully informed consent under a protocol reviewed by the St. Thomas’ Hosptal Local Research Ethics Committee. Genotyping of the TwinsUK cohort was done with a combination of Illumina HumanHap300 and HumanHap610Q chips. Intensity data for each of the arrays were pooled separately and genotypes were called with the Illuminus32 calling algorithm, thresholding on a maximum posterior probability of 0.95 as previously described (The MuTHER Consortium, 2011). Imputation was performed using the IMPUTE 2.0 software package using haplotype information from the 1000 Genomes Project (Phase 1, integrated variant set across 1092 individuals, v2, March 2012). After all quality controls, the current study included a total of 4,699,858 autosomal SNPs (MAF > 0.01, imputation R2 > 0.8, SNP call rate > 0.97, HWE > 1e-4) and 1020 individuals.

ALSPAC study

Request a detailed protocol

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a longitudinal study that recruited pregnant women living in the former county of Avon in the United Kingdom, with expected delivery dates between 1st April 1991 and 31st December 1992. The initial number of enrolled pregnancies was 14,541 resulting in 14,062 live births and 13,988 children alive at the age of 1. When the oldest children were approximately 7 years of age, additional eligible cases who had failed to join the study originally were added to the study. Full details of study enrolment have been described in detail previously (Boyd et al., 2013; Fraser et al., 2013; Golding et al., 2001). A total of 9,912 ALSPAC children were genotyped using the Illumina HumanHap550 quad genome-wide SNP genotyping platform. Individuals were excluded from further analysis based on incorrect sex assignments; heterozygosity (0.345 for the Sanger data and 0.330 for the LabCorp data); individual missingness (>3%); cryptic relatedness (>10% IBD) and non-European ancestry (detected by a multidimensional scaling analysis seeded with HapMap two individuals). The resulting post-quality control dataset contained 8237 individuals. The post-quality control ALSPAC children were combined with the ALSPAC mothers cohort (Fraser et al., 2013) and imputed together using a subset of markers common to both the mothers and the children. The combined sample was pre-phased using ShapeIT (v2.r644) (Delaneau et al., 2012) and imputed to the 1000 Genomes reference panel (Phase 1, Version3) (Auton et al., 2015) using IMPUTE3 V2.2.2 (Howie et al., 2009). The present study sample included 3707 individuals with 3D facial images, genotype and covariate data. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool: http://www.bristol.ac.uk/alspac/researchers/our-data/.

Pittsburgh 3D Facial Norms (PITT) sample, United States

Request a detailed protocol

The current study included 2195 individuals from the 3D Facial Norms dataset (Weinberg et al., 2016). These individuals were recruited at four US sites (Pittsburgh, Seattle, Houston, and Iowa City), were of self-identified European ancestry, and ranged in age from 3 to 49 years. Exclusion criteria included any individuals with a history of significant facial trauma, congenital defects of the face, surgical alteration of the face, or any medical conditions affecting the facial posture. DNA was extracted from saliva samples and genotyped along with 72 HapMap control samples for 964,193 SNPs on the Illumina (San Diego, CA) HumanOmniEx- press+Exome v1.2 array plus 4,322 SNPs of custom content by the Center for Inherited Disease Research (CIDR). Samples were interrogated for genetic sex, chromosomal aberrations, relatedness, genotype call rate, and batch effects. SNPs were interrogated for call rate, discordance among duplicate samples, Mendelian errors among HapMap controls (parent-offspring trios), deviations from Hardy-Weinberg equilibrium, and sex differences in allele frequencies and heterozygosity. To assess population structure, we performed principal component analysis (PCA) using subsets of uncorrelated SNPs. Based on the scatterplots of the principal components (PCs) and scree plots of the eigenvalues, we determined that population structure was captured in four PCs of ancestry. Imputation of unobserved variants was performed using haplotypes from the 1000 Genomes Project Phase three as the reference. Imputation was performed using IMPUTE2. We used an info score of > 0.5 at the SNP level and a genotype probability of > 0.9 at the SNP-per-participant level as filters for imputed SNPs. Masked variant analysis, in which genotyped SNPs were imputed in order to assess imputation quality, indicated high accuracy of imputation. Institutional Review Board (IRB) approval was obtained at each recruitment center and all subjects gave written informed consent prior to participation: University of Pittsburgh IRB #PRO09060553 and IRB0405013; Seattle Children’s IRB #12107; University of Iowa Human Subjects Office/IRB #200912764 and #200710721; UT Health Committee for the Protection of Human Subjects #HSC-DB-09–0508 and #HSC-MS-03–090.

Xinjiang uyghur (UYG) study

Request a detailed protocol

The UYG samples were collected at Xinjiang Medical University in 2013–2014. In total, 858 individuals (including 333 males and 525 females, with an age range of 17–25) were enrolled. The research was conducted with the official approval from the Ethics Committee of the Shanghai Institutes for Biological Sciences, Shanghai, China. All participants had provided written consent. All samples were genotyped using the Illumina HumanOmniZhongHua-8 chips, which interrogates 894,517 SNPs. Individuals with more than 5% missing data, related individuals, and the ones that failed the X-chromosome sex concordance check or had ethnic information incompatible with their genetic information were excluded. SNPs with more than 2% missing data, with a minor allele frequency smaller than 1%, and the ones that failed the Hardy–Weinberg deviation test (p<1e-5) were also excluded. After applying these filters, we obtained a dataset of 709 samples with 810,648 SNPs for the Uyghurs. The chip genotype data were firstly phased using SHAPEIT (O'Connell et al., 2014). IMPUTE2 (Howie et al., 2009) was then used to impute genotypes at un-genotyped SNPs using the 1000 Genomes Phase three data as reference. Finally, a total of 6,414,304 imputed SNPs passed quality control and were combined with 810,648 genotyped SNPs for further analyses.

CANDELA study

Request a detailed protocol

CANDELA samples (Ruiz-Linares et al., 2014) consist of 6630 volunteers recruited in five Latin American countries (Brazil, Colombia, Chile, México and Perú). Ethics approval was obtained from: The Universidad Nacional Autónoma de México (México), the Universidad de Antioquia (Colombia), the Universidad Peruana Cayetano Heredia (Perú), the Universidad de Tarapacá (Chile), the Universidade Federal do Rio Grande do Sul (Brazil), and the University College London (UK). All participants provided written informed consent. Individuals with dysmorphologies, a history of facial surgery or trauma, or with BMI over 33 were excluded (due to the effect of obesity on facial features). DNA samples were genotyped on Illumina’s Omni Express BeadChip. After applying quality control filters 669,462 SNPs and 6357 individuals were retained for further analyses (2922 males, 3435 females). Average admixture proportions for this sample were estimated as: 48% European, 46% Native American and 6% African, but with substantial inter-individual variation. After all genomic and phenotypic quality controls this study included 5958 individuals. The genetic PCs were obtained from the LD-pruned dataset of 93,328 SNPs using PLINK 1.9. These PCs were selected by inspecting the proportion of variance explained and checking scatter and scree plots. The final imputed dataset used in the GWAS analyses included genotypes for 9,143,600 SNPs using the 1000 Genomes Phase I reference panel. Association analysis on the imputed dataset were performed using the best-guess imputed genotypes in PLINK 1.9 using linear regression with an additive genetic model incorporating age, sex and five genetic PCs as covariates.

Queensland Institute of Medical Research (QIMR) study

Request a detailed protocol

After phenotype and genotype quality control, data were available for 1101 participants who were mainly twins aged 16 years. All participants, and where appropriate their parent or guardian, gave informed consent, and all studies were approved by the QIMR Berghofer Human Research Ethics Committee. Participants were genotyped on the Illumina Human610-Quad and Core+Exome SNP chips. Genotype data were screened for genotyping quality (GenCall < 0.7), SNP and individual call rates (<0.95), HWE failure (p<1e-6) and MAF (<0.01). Data were checked for pedigree, sex and Mendelian errors and for non-European ancestry. Imputation was performed on the Michigan Imputation Server using the SHAPEIT/minimac Pipeline according to the standard protocols.

Facial shape phenotyping

Request a detailed protocol

In our study, we focused on 13 facial landmarks available in all cohorts with 3-dimensional digital photographs. These included right (ExR) and left (ExL) exocanthion: points of bilateral outer canthus; right (EnR) and left (EnL) endocanthion: points of bilateral inner canthus; nasion (Nsn): the point where the bridge of the nose meets the forehead; pronasale (Prn): the point of the nose tip; subnasale (Sbn): the point where the base of the nasal septum meets the philtrum; right (AlR) and left (AlL) alare: points of bilateral nose wing; labliale superius (Ls): the point of labial superius; labiale inferius (Li): the point of labial inferius; right (ChR) and left (ChL) cheilion: points of bilateral angulus oris. Slightly different image processing algorithms were applied to 3dMD images to capture facial landmarks in the participating cohorts. In RS and TwinsUK, the raw 3D facial images were acquired using a 3D photographic scanning system manufactured by 3dMD (http://www.3dmd.com/). Participants were asked to keep their mouths closed and adopt a neutral expression during the acquisition of the 3D scans. Ensemble methods for automated 3D face landmarking were then used to derived 21 landmarks (de Jong et al., 2018; de Jong et al., 2016). In short, 3D surface model of participants’ face obtained with commercial photogrammetry systems for faces called 3dMDface were input to the algorithm. Then a map projection of these face surfaces transformed the 3D data set into 3D relief maps and 2D images were generated from these 3D relief maps. These images were subjected to a 2D landmarking method like Elastic Bunch Graph Matching. Finally, registered 2D landmarks are mapped back into 3D, inverting the projection. 13 shared facial landmarks coordinate data were processed using GPA to remove variations due to scaling, shifting and rotation. Euclidean distances were calculated between all landmarks. Outliers (>3 sd) were removed before the subsequent analysis. Shapiro-Wilk test (Royston, 1995) was used to test the normality of phenotype distribution. At 15 years of age, the ALSPAC children were recalled (5253 attended) and 3D facial scans where taken using two high-resolution Konica Minolta VI-900 laser scanners (lenses with a focal length 14.5 mm). The set of left and right facial scans of each individual were processed, registered, and merged using a locally developed algorithm implemented as a macro in Rapidform 2006 (Kau et al., 2004; Paternoster et al., 2012; Toma et al., 2009). 22 landmarks were recorded manually. GPA was used to remove affine variations due to shifting, rotation and scaling. 4747 individuals had usable images (506 individuals did not complete the assessment or facial scans were removed due to poor quality). The coordinates of 22 facial landmarks were derived from the scans, of which 13 were shared across all other studies. 3D Euclidean distances were calculated between the facial landmarks. 3D facial models from PITT were obtained with 3dMD digital stereo-photogrammetry systems. Facial landmarks were collected on the facial surfaces by trained raters and assessed for common placement errors during subsequent quality control. A set of 78 linear distance variables was then calculated from a subset of 13 of these landmarks, chosen because they were shared among all of the discovery cohorts in this study. These distances were screened for outliers. In UYG, the 3DMDface system (www.3dmd.com/3dMDface) was used to collect high-resolution 3D facial images from participants. A dense registration was applied to align all the 3D images as described elsewhere (Guo et al., 2013). In brief, 15 salient facial landmarks were first automatically annotated based on the PCA projection of texture and shape information. Afterwards, a facial image of high quality and smooth shape surface was chosen as the reference, and its mesh was re-sampled to achieve an even density of one vertex in each 1mm × 1 mm grid. The reference face was then warped to register every sample face by matching all the 15 landmarks, via a non-rigid thin-plate spline (TPS) transformation. The mesh points of the reference face were then projected to the sample surface to find their one-to-one correspondents. The resulting points of projection were used to define the mesh of the sample facial surface. After the alignment, each sample face was represented by a set of 32,251 3D points and 15 target points were ready to be analyzed. A set of 78 linear distance variables was calculated from a subset of 13 of these 15 landmarks. In CANDELA, ordinal phenotyping was described in prior GWAS (Adhikari et al., 2016). In short, right side and frontal 2D photographs of participants were used to score 14 facial traits, Intra-class correlation coefficients (ICCs) were calculated to indicate a moderate-to-high intra-rater reliability of the trait scores. All photographs were scored by the same rater. In QIMR, two independent researchers manually evaluated all 2D portrait photos and located the coordinates for 36 facial landmarks. Satisfactory inter-rater concordance was obtained (Pearson’s correlation coefficient, mean r = 0.76). The average landmark coordinates between the two researchers were used for Euclidian distance calculation.

Phenotype characteristics

Request a detailed protocol

Phenotype characteristics were explored for gender and aging effect, phenotypic and genetic correlations, clustering, and twins-heritability. Gender and aging effects were estimated using beta, p, and R2 values from linear models. Phenotypic correlations were estimated using Pearson’s correlation coefficients. Genetic correlations were estimated using genome-wide SNPs and the multivariate linear mixed model algorithm implemented in the GEMMA software package (Zhou & Stephens, 2014). Manual clustering was according to the involvement of intra- and inter- organ landmarks. Unsupervised hierarchical clustering was conducted using the 1-abs(correlation) as the dissimilarity matrix and each iteration was updated using the Lance - Williams formula (Ward, 1963). we calculated twins heritability for TwinsUK and QIMR twins using the phenotypic correlation derived from monozygotic twins (MZ) and dizygotic twins (DZ), that is h2=2[corMZ-cor(DZ)]. SNP-based heritability was estimated for twins cohorts using the restricted estimated maximum likelihood method implemented in the GCTA software package (Yang et al., 2011).

GWAS, meta-analysis, and replication studies

Request a detailed protocol

GWASs for face phenotypes were independently carried out in RS, TwinsUK, ALSPAC, PITT, CANDELA, UYG, and QIMR based on linear or logistic models assuming an additive allele effect adjusted for potential confounders in each cohorts, such as family relationships, sex, age, BMI and genetic PCs using software packages including PLINK (Purcell et al., 2007), GEMMA (Zhou and Stephens, 2012), and Rare Metal Worker (Feng et al., 2014). It should be noted that we used the scaled General Procrustes Analysis (GPA) to eliminate potential confounding effect of BMI although in some cohorts BMI was additionally included as a covariate, which shouldn’t have affected the results of our meta-analysis. For RS, association tests were run between the 78 3D facial distance measurements (after scaled GPA) and 6,886,439 genotyped and imputed SNPs with MAF > 0.01, imputation R2 > 0.8, SNP call rate > 0.97 and HWE > 0.0001. GWAS was performed using linear regression under the additive genetic model while adjusting for sex, age, and the first four genomic PCs derived using GCTA (Yang et al., 2011). For TwinsUK, quality control was performed on SNPs (MAF > 0.01, imputation R2 > 0.8, SNP call rate > 0.99). One of each pair of monozygotic twins (MZ) were excluded from association tests to reduce relatedness. The final sample included 1020 individuals and 4,699,859 autosomal genotyped and imputed variants. GWAS was performed between the 78 3D facial distance measurements (after scaled GPA) and genotyped and imputed SNPs using linear regression under the additive genetic model while adjusting for age and the first four genomic PCs. For ALSPAC, 3941 study participants had both genotype and 3D facial phenotype data. GCTA was used to remove individuals with excessive relatedness (GRM > 0.05) and quality control was performed on SNPs (MAF > 0.01, imputation R2 > 0.8, SNP call rate > 0.99). The final sample included 3707 individuals and 4,627,882 autosomal genotyped and imputed SNPs. GWAS was performed between the 78 3D facial distance measurements (after scaled GPA) and genotyped and imputed SNPs using linear regression under the additive genetic model while adjusting for age, sex, and the first four genomic PCs using PLINK 1.9 (Purcell et al., 2007). For PITT, association tests were run between the 78 3D facial distance measurements (after scaled GPA) and 9,478,231 genotyped and imputed SNPs with MAF > 1% and HWE > 0.0001. GWAS was performed using linear regression under the additive genetic model while adjusting for sex, age, age2, height, weight and the first four genomic PCs using PLINK 1.9. For the analysis of the X-chromosome, genotypes were coded 0, 1, and two as per the additive genetic model for females, and coded 0, two for males in order to maintain the same scale between sexes. Note that in the meta-analysis of all cohorts, the X-chromosome was not included. For CANDELA, PLINK 1.9 was used to perform the primary genome-wide association tests for each phenotype using multiple linear regression with an additive genetic model incorporating age, sex, BMI and five genetic PCs as covariates. Association analyses were performed on the imputed dataset with two approaches: using the best-guess imputed genotypes in PLINK, and using the IMPUTE2 genotype probabilities in SNPTEST v2.5. Both were consistent with each other and with the results from the chip genotype data. GWAS results with using best-guess genotypes were used in the subsequent meta-analysis. For UYG, GWAS were carried out using PLINK 1.9 and a linear regression model with additive SNP effects was used. Details was described in prior GWAS (Qiao et al., 2018). For QIMR, the GWAS was conducted using Rare Metal Worker with linear regression adjusted for sex, age, BMI and the first five genomic PCs.

Inverse variance fixed-effect meta-analysis were carried out using PLINK to combine GWAS results for samples of European origin and having 3dMDface data, that is RS, TwinsUK, ALSPAC, and PITT, which included 7,029,494 autosomal SNPs overlapping between the 4 discovery cohorts with physical positions aligned according to NCBI human genome sequence build GRCh37.p13. X chromosome was not included in meta-analyses. Summary statistics of all SNPs and all facial phenotypes from the GWAS meta-analysis from the four cohorts of European descent (RS, TwinsUK, ALSPAC and PITT) are available via figshare https://doi.org/10.6084/m9.figshare.10298396 and will be made available through the NHGRI-EBI GWAS Catalog https://www.ebi.ac.uk/gwas/downloads/summary-statistics. Meta-analysis results were visualized using superposed Manhattan plots and Q-Q plots. All genomic inflation factors were close to 1 and not further considered. Because many of the 78 facial phenotypes were correlated, we applied the matrix decomposition method described by Li and Ji (2005) to determine the effective number of independent phenotypes as 43. We then set our study-wide significant threshold at p<1.2×10−9 using Bonferroni correction of 43 phenotypes, considering significant threshold for single phenotype GWAS as P<5×10−8). We considered the threshold of 5×10−8 as the threshold for replication studies in CANDELA, UYG, and QIMR. Since the 3dMDface data were available in UYG, exact replications were conducted, that is the genetic association was tested for the corresponding phenotypes in the meta-analysis. Since only 2D photos were available in CANDELA and QIMR, trait-wise replications were conducted to test H0: no association for all face phenotypes vs. H1: associated with at least one face phenotypes. We used a combined test of dependent tests to combine genetic association tests involving highly correlated face phenotypes (Kost and McDermott, 2002). In short, the test statistic of the combined test follows a scaled chi-squared distribution when tests are dependent. The mean of the χ2 values is:

E(χ2)=2k, and the variance can be estimated as

var(χ2)=4k+21i<jkcov(2logPi, 2logPj), where

k is the number of phenotypes, Pi is the p value of the ith test. The covariance term cov(-2logPi, -2logPj)  can be approximated by the following formula using the phenotype correlation matrix:

cov-2logPi, -2logPj3.263rij+0.71rij2+0.027rij3

This approximation has been shown nearly identical to the exact value with difference less than 0.0002 for −0.98 <rij < 0.98 (Kost and McDermott, 2002). We then again used the combined test of dependent tests to combine replication evidence from all three cohorts. Multiple testing was corrected for the number of selected SNPs using the Bonferroni method. Regional linkage disequilibrium r2 values were calculated in PLINK and visualized using R scripting. Regional association plots were produced using LocusZoom (Pruim et al., 2010).

Genetic diversity and positive selection signal analysis

Request a detailed protocol

To test for signals of allele frequency diversity and positive selection in the face associated genomic regions, we derived two commonly used statistics in population genetics, that is the fixation index (FST) and the integrated haplotype score (iHS) for all SNPs in the associated regions. The FST is a parameter in population genetics for quantifying the differentiation due to genetic structure between a pair of groups by measuring the proportion of genetic diversity due to allele frequency differences among populations potentially as a result of divergent natural selection (Holsinger and Weir, 2009). We performed a genome-wide FST analysis for three pairs of populations (AFR-EAS, EAS-EUR, and EUR-AFR) in the samples from the 1000 Genome Project using the PLINK software. The empirical significance cutoff for each of the three population pairs was set to the value corresponding to the top 1% FST values in the genome. The iHS is the integrate of the extended haplotype homozygosity (EHH) statistic to highlight recent positive selected variants that have not yet reached fixation, and large values suggest the adaption to the selective pressures in the most recent stage of evolution (Sabeti et al., 2002; Voight et al., 2006). The EHH statistic was calculated for all SNPs in AFR, EAS and EUR samples. Then the iHS statistic, integral of the decay of EHH away from each SNP until EHH reaches 0.05, was calculated for all SNPs, in which an allele frequency bin of 0.05 was used to standardize iHS scores against other SNPs of the same frequency class. Finally, we calculated empirical significance threshold over the genome assuming a Gaussian distribution of iHS scores under the neutral model. The cutoff was based on the top 1% absolute iHS scores calculated in three samples respectively as the absolute score indicate the selection intensity.

Multivariable and polygenic score (PS) analysis

Request a detailed protocol

Multivariable and polygenic score analyses were conducted in the RS cohort. For all SNPs showing study-wide suggestively significant association with face phenotypes (p<5x10−8), we performed a multivariable regression analysis aiming to select SNPs that still be significant after regressing out the effects of the 24 lead SNPs by including them as covariates in the model, which resulted in 31 SNPs for the subsequent modeling. A forward step-wise regression was conducted to access the independent effects of the selected SNPs on sex- and age- adjusted face phenotypes, where the R2 contribution of each SNP was estimated by comparing the current model fitting R2 with the R2 in a previous iteration. Next, we calculated PS for AFR, EUR, EAS individuals of the 1000 Genomes Project using the selected SNPs and the effect sizes estimated from the multivariable analysis, that is PS=i=1nβiAi. The mean values and distributions of the standardized PS values for all face phenotypes in each continental group was visualized using a face map and multiple boxplots, in which the differences between groups were highlighted according to the p-values from the ANOVA test, which tests H0: PS values are the same in all continental groups and H1: PS values are different in at least one of the groups.

Gene ontology analysis, expression analysis and SNP functional annotation

Request a detailed protocol

To further study the involved functions and regulation relationships of the genome-wide significantly associated SNP markers and candidate genes nearby, the genes located within 100 kb up- and down-stream of the SNPs were selected to perform biological process enrichment analysis based on the Gene Ontology (GO) database. The enrichment analysis was performed using the R package ‘clusterProfiler’ (Yu et al., 2012). RNA-seq data from the study of Prescott et al. (2015), GTEx (Lonsdale et al., 2013) and ENCODE (ENCODE Project Consortium, 2012) were used to compare the differences in gene expression levels between our face-associated genes and all genes over the genome in CNCCs. We examined entries in GWAS catalog (Welter et al., 2014) to explore the potential pleotropic effects in our face associated regions. The GTEx (Lonsdale et al., 2013) dataset was used to annotate our face associated SNPs for significant single-tissue cis-eQTL effects in all 48 available tissues. We used the CNCC data from Prescott et al. and UCSC (Aken et al., 2016) to perform a variety of functional annotations for the face associated SNPs in non-coding regions using the WashU Epigenome Browser, including histone modifications (H3K27ac, H3K4me1 and H3K4me3 tracks), chromatin modifier (p300 tracks), transcription factors (TFAP2A tracks), chromatin accessibility (ATACseq tracks) and conserved regions (PhyloP (Siepel et al., 2005) tracks) from UCSC. To investigate the pattern of regulation within each association region in details, we calculated the log scaled quantile rank and variation of all LD SNP (r2 > 0.25) in three replicated measurements from 6 types of Chip-seq data, and for all significant base-pair in vicinity (within 20 kb) of all top-associated SNPs. All analyses were conducted using R Environment for Statistical Computing (version 3.3.1) unless otherwise specified.

Cell culture

Request a detailed protocol

Neural progenitors were derived from human induced pluripotent stem cells as described previously (male, age 57 years, derived from primary skin fibroblasts) (Gunhanlar et al., 2018). Written informed consent was obtained in accordance with the Medical Ethical Committee of the Erasmus University Medical Center. Neural progenitors contain a subpopulation of neural crest progenitor cells that were enriched by fluorescence-activated cell sorting for CD271 (BD Bioscience 560927, BD FACSAria III) (Dupin and Coelho-Aguiar, 2013; Lee et al., 2007). CD271 (p75) is a surface antigen marking neural crest progenitor cells which give rise to craniofacial tissues as well as melanocyte primary cells (Morrison et al., 1999). CD271+ cells were grown in DMEM:F12, 1% N2 supplement, 2% B27 supplement, 1% P/S, 20 ng/ml FGF at 37°C/5% CO2. G361 cells were cultured in DMEM/10% FCS at 37°C/5% CO2. The cell line was tested negative for mycoplasma contamination and its identity was authenticated by STR profiling using PowerPlex 16 System (Promega).

In vitro functional experiments: Luciferase reporter assay

Request a detailed protocol

We explored the effect of the five selected SNPs on gene expression in CD271+ and G361 melanoma cells by a luciferase reporter assay. Custom 500 bp gBlocks Gene fragments (IDT) containing putative enhancers surrounding each of the five selected SNPs were cloned into a modified pGL3 firefly luciferase reporter vector (SV40 promoter and 3´UTR replaced by HSP promoter and 3´UTR). For each SNP, constructs containing both ancestral and derived alleles were tested. Equimolar amount of constructs (measured by qPCR detecting the firefly luciferase gene using GoTaq qPCR Master Mix (Promega) were co-transfected with renilla luciferase control vector into CD271+ and G361 cells using Lipofectamine LTX (Invitrogen). Luciferase activity was measured 48 hr after transfection using the Dual-Glo Luciferase Assay System (Promega). Data are presented as relative luciferase activity (firefly/renilla activity ratio) fold change compared to the construct containing the less active allele and represent at least three independent replicates. Student´s t-test was performed to test differences in gene expression. The Key Resources Table summarizing reagents and resources used in the in-vitro functional experiments is included in the Supplementary file 1 - Table 19 (see Supplementary file 1).

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
    Hundreds of variants clustered in genomic loci and biological pathways affect human height
    1. H Lango Allen
    2. K Estrada
    3. G Lettre
    4. SI Berndt
    5. MN Weedon
    6. F Rivadeneira
    7. CJ Willer
    8. AU Jackson
    9. S Vedantam
    10. S Raychaudhuri
    11. T Ferreira
    12. AR Wood
    13. RJ Weyant
    14. AV Segrè
    15. EK Speliotes
    16. E Wheeler
    17. N Soranzo
    18. JH Park
    19. J Yang
    20. D Gudbjartsson
    21. NL Heard-Costa
    22. JC Randall
    23. L Qi
    24. A Vernon Smith
    25. R Mägi
    26. T Pastinen
    27. L Liang
    28. IM Heid
    29. J Luan
    30. G Thorleifsson
    31. TW Winkler
    32. ME Goddard
    33. K Sin Lo
    34. C Palmer
    35. T Workalemahu
    36. YS Aulchenko
    37. A Johansson
    38. MC Zillikens
    39. MF Feitosa
    40. T Esko
    41. T Johnson
    42. S Ketkar
    43. P Kraft
    44. M Mangino
    45. I Prokopenko
    46. D Absher
    47. E Albrecht
    48. F Ernst
    49. NL Glazer
    50. C Hayward
    51. JJ Hottenga
    52. KB Jacobs
    53. JW Knowles
    54. Z Kutalik
    55. KL Monda
    56. O Polasek
    57. M Preuss
    58. NW Rayner
    59. NR Robertson
    60. V Steinthorsdottir
    61. JP Tyrer
    62. BF Voight
    63. F Wiklund
    64. J Xu
    65. JH Zhao
    66. DR Nyholt
    67. N Pellikka
    68. M Perola
    69. JR Perry
    70. I Surakka
    71. ML Tammesoo
    72. EL Altmaier
    73. N Amin
    74. T Aspelund
    75. T Bhangale
    76. G Boucher
    77. DI Chasman
    78. C Chen
    79. L Coin
    80. MN Cooper
    81. AL Dixon
    82. Q Gibson
    83. E Grundberg
    84. K Hao
    85. M Juhani Junttila
    86. LM Kaplan
    87. J Kettunen
    88. IR König
    89. T Kwan
    90. RW Lawrence
    91. DF Levinson
    92. M Lorentzon
    93. B McKnight
    94. AP Morris
    95. M Müller
    96. J Suh Ngwa
    97. S Purcell
    98. S Rafelt
    99. RM Salem
    100. E Salvi
    101. S Sanna
    102. J Shi
    103. U Sovio
    104. JR Thompson
    105. MC Turchin
    106. L Vandenput
    107. DJ Verlaan
    108. V Vitart
    109. CC White
    110. A Ziegler
    111. P Almgren
    112. AJ Balmforth
    113. H Campbell
    114. L Citterio
    115. A De Grandi
    116. A Dominiczak
    117. J Duan
    118. P Elliott
    119. R Elosua
    120. JG Eriksson
    121. NB Freimer
    122. EJ Geus
    123. N Glorioso
    124. S Haiqing
    125. AL Hartikainen
    126. AS Havulinna
    127. AA Hicks
    128. J Hui
    129. W Igl
    130. T Illig
    131. A Jula
    132. E Kajantie
    133. TO Kilpeläinen
    134. M Koiranen
    135. I Kolcic
    136. S Koskinen
    137. P Kovacs
    138. J Laitinen
    139. J Liu
    140. ML Lokki
    141. A Marusic
    142. A Maschio
    143. T Meitinger
    144. A Mulas
    145. G Paré
    146. AN Parker
    147. JF Peden
    148. A Petersmann
    149. I Pichler
    150. KH Pietiläinen
    151. A Pouta
    152. M Ridderstråle
    153. JI Rotter
    154. JG Sambrook
    155. AR Sanders
    156. CO Schmidt
    157. J Sinisalo
    158. JH Smit
    159. HM Stringham
    160. G Bragi Walters
    161. E Widen
    162. SH Wild
    163. G Willemsen
    164. L Zagato
    165. L Zgaga
    166. P Zitting
    167. H Alavere
    168. M Farrall
    169. WL McArdle
    170. M Nelis
    171. MJ Peters
    172. S Ripatti
    173. JB van Meurs
    174. KK Aben
    175. KG Ardlie
    176. JS Beckmann
    177. JP Beilby
    178. RN Bergman
    179. S Bergmann
    180. FS Collins
    181. D Cusi
    182. M den Heijer
    183. G Eiriksdottir
    184. PV Gejman
    185. AS Hall
    186. A Hamsten
    187. HV Huikuri
    188. C Iribarren
    189. M Kähönen
    190. J Kaprio
    191. S Kathiresan
    192. L Kiemeney
    193. T Kocher
    194. LJ Launer
    195. T Lehtimäki
    196. O Melander
    197. TH Mosley
    198. AW Musk
    199. MS Nieminen
    200. CJ O'Donnell
    201. C Ohlsson
    202. B Oostra
    203. LJ Palmer
    204. O Raitakari
    205. PM Ridker
    206. JD Rioux
    207. A Rissanen
    208. C Rivolta
    209. H Schunkert
    210. AR Shuldiner
    211. DS Siscovick
    212. M Stumvoll
    213. A Tönjes
    214. J Tuomilehto
    215. GJ van Ommen
    216. J Viikari
    217. AC Heath
    218. NG Martin
    219. GW Montgomery
    220. MA Province
    221. M Kayser
    222. AM Arnold
    223. LD Atwood
    224. E Boerwinkle
    225. SJ Chanock
    226. P Deloukas
    227. C Gieger
    228. H Grönberg
    229. P Hall
    230. AT Hattersley
    231. C Hengstenberg
    232. W Hoffman
    233. GM Lathrop
    234. V Salomaa
    235. S Schreiber
    236. M Uda
    237. D Waterworth
    238. AF Wright
    239. TL Assimes
    240. I Barroso
    241. A Hofman
    242. KL Mohlke
    243. DI Boomsma
    244. MJ Caulfield
    245. LA Cupples
    246. J Erdmann
    247. CS Fox
    248. V Gudnason
    249. U Gyllensten
    250. TB Harris
    251. RB Hayes
    252. MR Jarvelin
    253. V Mooser
    254. PB Munroe
    255. WH Ouwehand
    256. BW Penninx
    257. PP Pramstaller
    258. T Quertermous
    259. I Rudan
    260. NJ Samani
    261. TD Spector
    262. H Völzke
    263. H Watkins
    264. JF Wilson
    265. LC Groop
    266. T Haritunians
    267. FB Hu
    268. RC Kaplan
    269. A Metspalu
    270. KE North
    271. D Schlessinger
    272. NJ Wareham
    273. DJ Hunter
    274. JR O'Connell
    275. DP Strachan
    276. HE Wichmann
    277. IB Borecki
    278. CM van Duijn
    279. EE Schadt
    280. U Thorsteinsdottir
    281. L Peltonen
    282. AG Uitterlinden
    283. PM Visscher
    284. N Chatterjee
    285. RJ Loos
    286. M Boehnke
    287. MI McCarthy
    288. E Ingelsson
    289. CM Lindgren
    290. GR Abecasis
    291. K Stefansson
    292. TM Frayling
    293. JN Hirschhorn
    (2010)
    Nature 467:832–838.
    https://doi.org/10.1038/nature09410
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
    Nose shape and climate
    1. JS Weiner
    (1954)
    American Journal of Physical Anthropology 12:615–618.
    https://doi.org/10.1002/ajpa.1330120412
  88. 88
  89. 89
    Defining the role of common variation in the genomic and biological architecture of adult human height
    1. AR Wood
    2. T Esko
    3. J Yang
    4. S Vedantam
    5. TH Pers
    6. S Gustafsson
    7. AY Chu
    8. K Estrada
    9. J Luan
    10. Z Kutalik
    11. N Amin
    12. ML Buchkovich
    13. DC Croteau-Chonka
    14. FR Day
    15. Y Duan
    16. T Fall
    17. R Fehrmann
    18. T Ferreira
    19. AU Jackson
    20. J Karjalainen
    21. KS Lo
    22. AE Locke
    23. R Mägi
    24. E Mihailov
    25. E Porcu
    26. JC Randall
    27. A Scherag
    28. AA Vinkhuyzen
    29. HJ Westra
    30. TW Winkler
    31. T Workalemahu
    32. JH Zhao
    33. D Absher
    34. E Albrecht
    35. D Anderson
    36. J Baron
    37. M Beekman
    38. A Demirkan
    39. GB Ehret
    40. B Feenstra
    41. MF Feitosa
    42. K Fischer
    43. RM Fraser
    44. A Goel
    45. J Gong
    46. AE Justice
    47. S Kanoni
    48. ME Kleber
    49. K Kristiansson
    50. U Lim
    51. V Lotay
    52. JC Lui
    53. M Mangino
    54. I Mateo Leach
    55. C Medina-Gomez
    56. MA Nalls
    57. DR Nyholt
    58. CD Palmer
    59. D Pasko
    60. S Pechlivanis
    61. I Prokopenko
    62. JS Ried
    63. S Ripke
    64. D Shungin
    65. A Stancáková
    66. RJ Strawbridge
    67. YJ Sung
    68. T Tanaka
    69. A Teumer
    70. S Trompet
    71. SW van der Laan
    72. J van Setten
    73. JV Van Vliet-Ostaptchouk
    74. Z Wang
    75. L Yengo
    76. W Zhang
    77. U Afzal
    78. J Arnlöv
    79. GM Arscott
    80. S Bandinelli
    81. A Barrett
    82. C Bellis
    83. AJ Bennett
    84. C Berne
    85. M Blüher
    86. JL Bolton
    87. Y Böttcher
    88. HA Boyd
    89. M Bruinenberg
    90. BM Buckley
    91. S Buyske
    92. IH Caspersen
    93. PS Chines
    94. R Clarke
    95. S Claudi-Boehm
    96. M Cooper
    97. EW Daw
    98. PA De Jong
    99. J Deelen
    100. G Delgado
    101. JC Denny
    102. R Dhonukshe-Rutten
    103. M Dimitriou
    104. AS Doney
    105. M Dörr
    106. N Eklund
    107. E Eury
    108. L Folkersen
    109. ME Garcia
    110. F Geller
    111. V Giedraitis
    112. AS Go
    113. H Grallert
    114. TB Grammer
    115. J Gräßler
    116. H Grönberg
    117. LC de Groot
    118. CJ Groves
    119. J Haessler
    120. P Hall
    121. T Haller
    122. G Hallmans
    123. A Hannemann
    124. CA Hartman
    125. M Hassinen
    126. C Hayward
    127. NL Heard-Costa
    128. Q Helmer
    129. G Hemani
    130. AK Henders
    131. HL Hillege
    132. MA Hlatky
    133. W Hoffmann
    134. P Hoffmann
    135. O Holmen
    136. JJ Houwing-Duistermaat
    137. T Illig
    138. A Isaacs
    139. AL James
    140. J Jeff
    141. B Johansen
    142. Å Johansson
    143. J Jolley
    144. T Juliusdottir
    145. J Junttila
    146. AN Kho
    147. L Kinnunen
    148. N Klopp
    149. T Kocher
    150. W Kratzer
    151. P Lichtner
    152. L Lind
    153. J Lindström
    154. S Lobbens
    155. M Lorentzon
    156. Y Lu
    157. V Lyssenko
    158. PK Magnusson
    159. A Mahajan
    160. M Maillard
    161. WL McArdle
    162. CA McKenzie
    163. S McLachlan
    164. PJ McLaren
    165. C Menni
    166. S Merger
    167. L Milani
    168. A Moayyeri
    169. KL Monda
    170. MA Morken
    171. G Müller
    172. M Müller-Nurasyid
    173. AW Musk
    174. N Narisu
    175. M Nauck
    176. IM Nolte
    177. MM Nöthen
    178. L Oozageer
    179. S Pilz
    180. NW Rayner
    181. F Renstrom
    182. NR Robertson
    183. LM Rose
    184. R Roussel
    185. S Sanna
    186. H Scharnagl
    187. S Scholtens
    188. FR Schumacher
    189. H Schunkert
    190. RA Scott
    191. J Sehmi
    192. T Seufferlein
    193. J Shi
    194. K Silventoinen
    195. JH Smit
    196. AV Smith
    197. J Smolonska
    198. AV Stanton
    199. K Stirrups
    200. DJ Stott
    201. HM Stringham
    202. J Sundström
    203. MA Swertz
    204. AC Syvänen
    205. BO Tayo
    206. G Thorleifsson
    207. JP Tyrer
    208. S van Dijk
    209. NM van Schoor
    210. N van der Velde
    211. D van Heemst
    212. FV van Oort
    213. SH Vermeulen
    214. N Verweij
    215. JM Vonk
    216. LL Waite
    217. M Waldenberger
    218. R Wennauer
    219. LR Wilkens
    220. C Willenborg
    221. T Wilsgaard
    222. MK Wojczynski
    223. A Wong
    224. AF Wright
    225. Q Zhang
    226. D Arveiler
    227. SJ Bakker
    228. J Beilby
    229. RN Bergman
    230. S Bergmann
    231. R Biffar
    232. J Blangero
    233. DI Boomsma
    234. SR Bornstein
    235. P Bovet
    236. P Brambilla
    237. MJ Brown
    238. H Campbell
    239. MJ Caulfield
    240. A Chakravarti
    241. R Collins
    242. FS Collins
    243. DC Crawford
    244. LA Cupples
    245. J Danesh
    246. U de Faire
    247. HM den Ruijter
    248. R Erbel
    249. J Erdmann
    250. JG Eriksson
    251. M Farrall
    252. E Ferrannini
    253. J Ferrières
    254. I Ford
    255. NG Forouhi
    256. T Forrester
    257. RT Gansevoort
    258. PV Gejman
    259. C Gieger
    260. A Golay
    261. O Gottesman
    262. V Gudnason
    263. U Gyllensten
    264. DW Haas
    265. AS Hall
    266. TB Harris
    267. AT Hattersley
    268. AC Heath
    269. C Hengstenberg
    270. AA Hicks
    271. LA Hindorff
    272. AD Hingorani
    273. A Hofman
    274. GK Hovingh
    275. SE Humphries
    276. SC Hunt
    277. E Hypponen
    278. KB Jacobs
    279. MR Jarvelin
    280. P Jousilahti
    281. AM Jula
    282. J Kaprio
    283. JJ Kastelein
    284. M Kayser
    285. F Kee
    286. SM Keinanen-Kiukaanniemi
    287. LA Kiemeney
    288. JS Kooner
    289. C Kooperberg
    290. S Koskinen
    291. P Kovacs
    292. AT Kraja
    293. M Kumari
    294. J Kuusisto
    295. TA Lakka
    296. C Langenberg
    297. L Le Marchand
    298. T Lehtimäki
    299. S Lupoli
    300. PA Madden
    301. S Männistö
    302. P Manunta
    303. A Marette
    304. TC Matise
    305. B McKnight
    306. T Meitinger
    307. FL Moll
    308. GW Montgomery
    309. AD Morris
    310. AP Morris
    311. JC Murray
    312. M Nelis
    313. C Ohlsson
    314. AJ Oldehinkel
    315. KK Ong
    316. WH Ouwehand
    317. G Pasterkamp
    318. A Peters
    319. PP Pramstaller
    320. JF Price
    321. L Qi
    322. OT Raitakari
    323. T Rankinen
    324. DC Rao
    325. TK Rice
    326. M Ritchie
    327. I Rudan
    328. V Salomaa
    329. NJ Samani
    330. J Saramies
    331. MA Sarzynski
    332. PE Schwarz
    333. S Sebert
    334. P Sever
    335. AR Shuldiner
    336. J Sinisalo
    337. V Steinthorsdottir
    338. RP Stolk
    339. JC Tardif
    340. A Tönjes
    341. A Tremblay
    342. E Tremoli
    343. J Virtamo
    344. MC Vohl
    345. P Amouyel
    346. FW Asselbergs
    347. TL Assimes
    348. M Bochud
    349. BO Boehm
    350. E Boerwinkle
    351. EP Bottinger
    352. C Bouchard
    353. S Cauchi
    354. JC Chambers
    355. SJ Chanock
    356. RS Cooper
    357. PI de Bakker
    358. G Dedoussis
    359. L Ferrucci
    360. PW Franks
    361. P Froguel
    362. LC Groop
    363. CA Haiman
    364. A Hamsten
    365. MG Hayes
    366. J Hui
    367. DJ Hunter
    368. K Hveem
    369. JW Jukema
    370. RC Kaplan
    371. M Kivimaki
    372. D Kuh
    373. M Laakso
    374. Y Liu
    375. NG Martin
    376. W März
    377. M Melbye
    378. S Moebus
    379. PB Munroe
    380. I Njølstad
    381. BA Oostra
    382. CN Palmer
    383. NL Pedersen
    384. M Perola
    385. L Pérusse
    386. U Peters
    387. JE Powell
    388. C Power
    389. T Quertermous
    390. R Rauramaa
    391. E Reinmaa
    392. PM Ridker
    393. F Rivadeneira
    394. JI Rotter
    395. TE Saaristo
    396. D Saleheen
    397. D Schlessinger
    398. PE Slagboom
    399. H Snieder
    400. TD Spector
    401. K Strauch
    402. M Stumvoll
    403. J Tuomilehto
    404. M Uusitupa
    405. P van der Harst
    406. H Völzke
    407. M Walker
    408. NJ Wareham
    409. H Watkins
    410. HE Wichmann
    411. JF Wilson
    412. P Zanen
    413. P Deloukas
    414. IM Heid
    415. CM Lindgren
    416. KL Mohlke
    417. EK Speliotes
    418. U Thorsteinsdottir
    419. I Barroso
    420. CS Fox
    421. KE North
    422. DP Strachan
    423. JS Beckmann
    424. SI Berndt
    425. M Boehnke
    426. IB Borecki
    427. MI McCarthy
    428. A Metspalu
    429. K Stefansson
    430. AG Uitterlinden
    431. CM van Duijn
    432. L Franke
    433. CJ Willer
    434. AL Price
    435. G Lettre
    436. RJ Loos
    437. MN Weedon
    438. E Ingelsson
    439. JR O'Connell
    440. GR Abecasis
    441. DI Chasman
    442. ME Goddard
    443. PM Visscher
    444. JN Hirschhorn
    445. TM Frayling
    446. Electronic Medical Records and Genomics (eMEMERGEGE) Consortium
    447. MIGen Consortium
    448. PAGEGE Consortium
    449. LifeLines Cohort Study
    (2014)
    Nature Genetics 46:1173–1186.
    https://doi.org/10.1038/ng.3097
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97

Decision letter

  1. Mark I McCarthy
    Senior Editor; Genentech, United States
  2. Andrew P Morris
    Reviewing Editor; University of Liverpool, United Kingdom
  3. Andrew P Morris
    Reviewer; University of Liverpool, United Kingdom
  4. Seongwon Cha
    Reviewer

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The reviewers appreciated the presentation of results of the meta-analysis of genome-wide association studies (GWAS) of 3D facial phenotypes in large sample sizes across cohorts of European ancestry, with replication in an additional three cohorts of diverse ancestry. The reviewers felt the detailed modelling of the facial phenotypes was a particular strength of the study. The authors identified 24 loci at stud-wide significance, 17 of which had not been previously reported: of these 10 loci were replicated (including six of the novel loci). Subsequent integration of these GWAS results with epigenomic data generated in cranial neural crest cells highlighted enrichment in cis-regulatory elements across the identified loci, and luciferase reporter assays demonstrated enhancer activity of several associated variants. Taken together, the reviewers felt that these data have substantially advanced the understanding of the genetic contribution to facial phenotypes, and have pinpointed potential causal genes and molecular mechanisms underlying facial variation that warrant further functional investigation.

Decision letter after peer review:

Thank you for submitting your article "Novel genetic loci affecting facial shape variation in humans" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Andrew P Morris as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Mark McCarthy as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Seongwon Cha (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Xiong et al. present results of meta-analysis of genome-wide association studies (GWAS) of facial phenotypes across cohorts of European ancestry, with replication in an additional three cohorts of diverse ancestry. The authors identified 24 loci at genome-wide significance, 17 of which had not been previously reported: of these 13 loci were replicated (including 9 of the novel loci). Integration of GWAS results with epigenomic data generated in cranial neural crest cells highlighted enrichment in cis-regulatory elements across the identified loci, and luciferase reporter assays demonstrated enhancer activity of variants in linkage disequilibrium with lead SNPs at three loci.

Essential revisions:

1) Genome-wide significance and replication. Given that 78 traits are tested here, using the traditional genome-wide significance threshold of 5x10-8 seems anti-conservative – some adjustment for multiple traits should be considered. This is even more essential as the replication is not ideal – only one of the three replication studies has exactly the same traits as discovery, and for the other two replication studies, a composite p-value for association with any facial trait is reported. The replication stage also does not take account of multiple testing of 24 SNPs. Only the composite p-value should be reported in replication. The authors could consider a meta-analysis of the p-value across the three replication studies using Fishers' method to give an overall assessment of replication evidence.

2) Elsewhere in the manuscript, "significant" is used, without any report of the threshold for significance (in the Results or the Materials and methods) or appropriate justification: "nominally significant" association with facial phenotypes; "significant" multi-tissue eQTL effects; no justification for p<0.05 in subsection “Preferential expression of face-associated genetic loci in embryonic cranial neural crest cells” and “Cis-regulatory signals in face-associated genetic loci”.

3) Signals of selection. More details of these findings should be reported in the Results section (and shorter comments in the Discussion). Since Fst and iHS are not well powered to detect selection, overlap with singleton density score (SDS) results in Europeans should be reported (Field et al., 2016).

4) Selection of "candidate regulatory SNPs" is not clear. The authors highlight SNPs that are in LD with the lead SNP at some loci. However, tag SNPs with r2 of 0.3 with the lead SNP do not seem to be in strong LD, and they may not actually show a strong association with the facial trait for that locus – as a result, the fact that the authors demonstrate enhancer activity for this SNP is irrelevant, as it will not be a good candidate to be driving the association signal. It also wasn't clear where the 5 SNPs came from, since 7 are reported in the preceding text. A better approach would be to define 99% credible sets of variants at each locus, which account for 99% of the probability of driving the association, and then assess the evidence for regulatory activity for these.

5) Multivariable modelling of phenotypes. The authors have considered all SNPs at genome-wide significance, and then LD pruned to a set of independent SNPs (r2<0.5). This does not define SNPs with independent effects on the phenotypes. The authors would be better to actually condition out the effects of the 24 lead SNPs by including them as covariates in the model, and then identify SNPs that remain at genome-wide significance. This could be done in exactly in RS, or could be done through approximate conditional analysis implemented in GCTA.

https://doi.org/10.7554/eLife.49898.sa1

Author response

Essential revisions:

1) Genome-wide significance and replication. Given that 78 traits are tested here, using the traditional genome-wide significance threshold of 5x10-8 seems anti-conservative – some adjustment for multiple traits should be considered. This is even more essential as the replication is not ideal – only one of the three replication studies has exactly the same traits as discovery, and for the other two replication studies, a composite p-value for association with any facial trait is reported. The replication stage also does not take account of multiple testing of 24 SNPs. Only the composite p-value should be reported in replication. The authors could consider a meta-analysis of the p-value across the three replication studies using Fishers' method to give an overall assessment of replication evidence.

We followed the reviewer’s suggestion, performed the requested analyses and revised our manuscript accordingly. In particular, a matrix spectral decomposition analysis of the total of 78 facial distance traits estimated the effective number of independent traits as 43, which corresponds to a Bonferroni corrected study-wide significant threshold of 1.2×10-9. We considered the traditional genome-wide significant threshold of 5×10-8 as our study-wide suggestive threshold for replication analysis. In the revised manuscript the study-wide significant and suggestive results were separately described. In addition, we applied the combined test method suggested by the reviewers to combine the evidence from the three replication datasets, and used the Bonferroni method to correct for the multiple testing of the 24 SNPs that passed the study-wide suggestive association threshold in the discovery analysis by applying the P<2.1x10-3 threshold. We now also applied Bonferroni correction to other analysis e.g. cis-qQTL, preferential expression, cis regulatory signals etc. The revised manuscript was adjusted accordingly.

2) Elsewhere in the manuscript, "significant" is used, without any report of the threshold for significance (in the Results or the Materials and methods) or appropriate justification: "nominally significant" association with facial phenotypes; "significant" multi-tissue eQTL effects; no justification for p<0.05 in subsection “Preferential expression of face-associated genetic loci in embryonic cranial neural crest cells” and “Cis-regulatory signals in face-associated genetic loci”.

As requested by the reviewer, we revised our association analysis using strict Bonferroni correction and reported the adjusted p-values. The revised manuscript was adjusted accordingly.

3) Signals of selection. More details of these findings should be reported in the Results section (and shorter comments in the Discussion). Since Fst and iHS are not well powered to detect selection, overlap with singleton density score (SDS) results in Europeans should be reported (Field et al., 2016).

As suggested by the reviewer, we looked-up our 24 face-associated SNPs in the External Online Resources of Field et al., 2016 and found that none of them overlapped with the previously reported signals of singleton density score (SDS, empirical p<0.01) in Europeans (Field et al., 2016). This may indicate that the observed selection signals are unlikely the consequences from the recent selective pressures as far as detectable via SDS analysis. We added this information to the revised manuscript.

4) Selection of "candidate regulatory SNPs" is not clear. The authors highlight SNPs that are in LD with the lead SNP at some loci. However, tag SNPs with r2 of 0.3 with the lead SNP do not seem to be in strong LD, and they may not actually show a strong association with the facial trait for that locus – as a result, the fact that the authors demonstrate enhancer activity for this SNP is irrelevant, as it will not be a good candidate to be driving the association signal. It also wasn't clear where the 5 SNPs came from, since 7 are reported in the preceding text. A better approach would be to define 99% credible sets of variants at each locus, which account for 99% of the probability of driving the association, and then assess the evidence for regulatory activity for these.

We agree with the reviewer that the description of this work was not clear in the submitted version. We made efforts to better clarify this in the revised version as follows. “In particular, 6 (25%) of the 24 face-associated loci, i.e., 1p36.13 ARHGEF19, 2q36.1 PAX3, CMSS1, 4q28.1 INTU, 16q12.1 SALL1, and 16q12.2 RPGRIP1L, showed significant enrichment of cis-regulatory entries at the top 0.1% of the genome level in at least one epigenomic tracks after Bonferroni correction (adjusted P<0.05, Figure 5—figure supplement 1). We selected 5 SNPs as candidate regulatory SNPs (Figures 5B, Supplementary file 1—table 17) as they not only overlapped with the putative enhancers defined in CNCCs (Prescott et al., 2015) and penis foreskin melanocyte primary cells (Chadwick, 2012; Consortium, 2012) (Figures 5A), but also showed high LD (r2>0.7) with at least one of the face-associated SNPs at the study-wide suggestive significance level. These were rs13410020 and rs2855266 from the 2q36.1 PAX3 region, rs17210317 and rs6828035 from the 4q28.1 INTU region, and rs4783818 from the 16q12.2 RPGRIP1L region. These 5 candidate regulatory SNPs were further studied via functional experiments as described in the following section.” We believe that our approach is sufficiently systematic, while we also agree with the alternative approach suggested by the reviewer. However, because of the substantial amount of experimental work required to follow-up on the selected candidate SNPs, fully executing the reviewer’s approach including the necessary experimental follow-up exceeds the resources we currently have available. However, we will keep the reviewer’s suggestion in mind for future work.

5) Multivariable modelling of phenotypes. The authors have considered all SNPs at genome-wide significance, and then LD pruned to a set of independent SNPs (r2<0.5). This does not define SNPs with independent effects on the phenotypes. The authors would be better to actually condition out the effects of the 24 lead SNPs by including them as covariates in the model, and then identify SNPs that remain at genome-wide significance. This could be done in exactly in RS, or could be done through approximate conditional analysis implemented in GCTA.

Following the reviewer’s comment, we conducted a multiple-regression analysis in RS condition out the effects of the 24 lead SNPs by including them as covariates in the model. This analysis identified 31 SNPs showing significant (adjusted p<0.05) independent effects on facial traits, explaining up to 4.62% variance for individual phenotypes. The revised manuscript was adjusted accordingly. Performing approximate conditional analysis implemented in GCTA we do not feel comfortable because of the lack of individual-level data.

https://doi.org/10.7554/eLife.49898.sa2

Article and author information

Author details

  1. Ziyi Xiong

    1. Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    2. Department of Epidemiology, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    3. CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, University of Chinese Academy of Sciences (CAS), Beijing, China
    Contribution
    Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Manuscript preparation and revision, Final approval of the version to be published
    Contributed equally with
    Gabriela Dankova
    Competing interests
    No competing interests declared
  2. Gabriela Dankova

    Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Data curation, Formal analysis, Investigation, Visualization, Methodology, Data acquisition, Manuscript preparation and revision, Final approval of the version to be published
    Contributed equally with
    Ziyi Xiong
    Competing interests
    No competing interests declared
  3. Laurence J Howe

    Medical Research Council Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation, Formal analysis, Investigation, Funding acquisition, Manuscript preparation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  4. Myoung Keun Lee

    Center for Craniofacial and Dental Genetics, Department of Oral Biology, University of Pittsburgh, Pittsburgh, United States
    Contribution
    Data curation, Formal analysis, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  5. Pirro G Hysi

    Department of Twin Research and Genetic Epidemiology, King’s College London, London, United Kingdom
    Contribution
    Data curation, Formal analysis, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  6. Markus A de Jong

    1. Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    2. Department of Oral & Maxillofacial Surgery, Special Dental Care, and Orthodontics, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    3. Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
    Present address
    Department of Computer Science, VU University Amsterdam, Amsterdam, Netherlands
    Contribution
    Data curation, Software, Formal analysis, Investigation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  7. Gu Zhu

    QIMR Berghofer Medical Research Institute, Brisbane, Australia
    Contribution
    Data curation, Formal analysis, Validation, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  8. Kaustubh Adhikari

    Department of Genetics, Evolution, and Environment, University College London, London, United Kingdom
    Contribution
    Data curation, Formal analysis, Validation, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5825-4191
  9. Dan Li

    1. CAS Key Laboratory of Computational Biology, Chinese Academy of Sciences (CAS), Shanghai, China
    2. CAS-MPG Partner Institute for Computational Biology (PICB), Chinese Academy of Sciences (CAS), Shanghai, China
    3. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
    Contribution
    Formal analysis, Validation, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  10. Yi Li

    CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, University of Chinese Academy of Sciences (CAS), Beijing, China
    Contribution
    Formal analysis, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  11. Bo Pan

    Department of Auricular Reconstruction, Plastic Surgery Hospital, Beijing, China
    Contribution
    Formal analysis, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  12. Eleanor Feingold

    Center for Craniofacial and Dental Genetics, Department of Oral Biology, University of Pittsburgh, Pittsburgh, United States
    Contribution
    Data curation, Formal analysis, Final approval of the version to be published
    Competing interests
    No competing interests declared
  13. Mary L Marazita

    1. Center for Craniofacial and Dental Genetics, Department of Oral Biology, University of Pittsburgh, Pittsburgh, United States
    2. Department of Human Genetics, University of Pittsburgh, Pittsburgh, United States
    Contribution
    Resources, Data curation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2648-2832
  14. John R Shaffer

    1. Center for Craniofacial and Dental Genetics, Department of Oral Biology, University of Pittsburgh, Pittsburgh, United States
    2. Department of Human Genetics, University of Pittsburgh, Pittsburgh, United States
    Contribution
    Resources, Data curation, Formal analysis, Funding acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1897-1131
  15. Kerrie McAloney

    QIMR Berghofer Medical Research Institute, Brisbane, Australia
    Contribution
    Resources, Data curation, Formal analysis, Validation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  16. Shu-Hua Xu

    1. CAS Key Laboratory of Computational Biology, Chinese Academy of Sciences (CAS), Shanghai, China
    2. CAS-MPG Partner Institute for Computational Biology (PICB), Chinese Academy of Sciences (CAS), Shanghai, China
    3. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
    4. School of Life Science and Technology, ShanghaiTech University, Shanghai, China
    5. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
    Contribution
    Resources, Supervision, Funding acquisition, Validation, Project administration, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  17. Li Jin

    1. CAS Key Laboratory of Computational Biology, Chinese Academy of Sciences (CAS), Shanghai, China
    2. CAS-MPG Partner Institute for Computational Biology (PICB), Chinese Academy of Sciences (CAS), Shanghai, China
    3. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
    4. State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
    5. Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China
    Contribution
    Resources, Supervision, Funding acquisition, Validation, Project administration, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  18. Sijia Wang

    1. CAS Key Laboratory of Computational Biology, Chinese Academy of Sciences (CAS), Shanghai, China
    2. CAS-MPG Partner Institute for Computational Biology (PICB), Chinese Academy of Sciences (CAS), Shanghai, China
    3. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
    4. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
    Contribution
    Resources, Data curation, Supervision, Funding acquisition, Validation, Project administration, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  19. Femke MS de Vrij

    Department of Psychiatry, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Resources, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0825-3806
  20. Bas Lendemeijer

    Department of Psychiatry, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Resources, Investigation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  21. Stephen Richmond

    Applied Clinical Research and Public Health, University Dental School, Cardiff University, Cardiff, United Kingdom
    Contribution
    Resources, Supervision, Investigation, Data acquisition, Manuscript preparation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  22. Alexei Zhurov

    Applied Clinical Research and Public Health, University Dental School, Cardiff University, Cardiff, United Kingdom
    Contribution
    Data curation, Formal analysis, Investigation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  23. Sarah Lewis

    Medical Research Council Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Resources, Data curation, Investigation, Funding acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  24. Gemma C Sharp

    1. Medical Research Council Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, United Kingdom
    2. School of Oral and Dental Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Resources, Data curation, Investigation, Funding acquisition, Project administration, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2906-4035
  25. Lavinia Paternoster

    Medical Research Council Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Resources, Data curation, Supervision, Project administration, Final approval of the version to be published
    Competing interests
    No competing interests declared
  26. Holly Thompson

    Medical Research Council Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Resources, Data curation, Supervision, Final approval of the version to be published
    Competing interests
    No competing interests declared
  27. Rolando Gonzalez-Jose

    Instituto Patagonico de Ciencias Sociales y Humanas, CENPAT-CONICET, Puerto Madryn, Argentina
    Contribution
    Resources, Data curation, Validation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  28. Maria Catira Bortolini

    Departamento de Genetica, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
    Contribution
    Resources, Data curation, Validation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  29. Samuel Canizales-Quinteros

    UNAM-Instituto Nacional de Medicina Genomica, Facultad de Quımica, Unidad de Genomica de Poblaciones Aplicada a la Salud, Mexico City, Mexico
    Contribution
    Resources, Data curation, Validation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  30. Carla Gallo

    Laboratorios de Investigacion y Desarrollo, Facultad de Ciencias y Filosofıa, Universidad Peruana Cayetano Heredia, Lima, Peru
    Contribution
    Resources, Data curation, Validation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8348-0473
  31. Giovanni Poletti

    Laboratorios de Investigacion y Desarrollo, Facultad de Ciencias y Filosofıa, Universidad Peruana Cayetano Heredia, Lima, Peru
    Contribution
    Resources, Data curation, Validation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  32. Gabriel Bedoya

    GENMOL (Genetica Molecular), Universidad de Antioquia, Medellın, Colombia
    Contribution
    Resources, Data curation, Validation, Data acquisition, Funding acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  33. Francisco Rothhammer

    Instituto de Alta Investigacion, Universidad de Tarapaca, Arica, Chile
    Contribution
    Resources, Data curation, Validation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  34. André G Uitterlinden

    1. Department of Epidemiology, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    2. Department of Internal Medicine, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Resources, Supervision, Funding acquisition, Project administration, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  35. M Arfan Ikram

    Department of Epidemiology, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Resources, Data curation, Supervision, Funding acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  36. Eppo Wolvius

    Department of Oral & Maxillofacial Surgery, Special Dental Care, and Orthodontics, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Resources, Supervision, Final approval of the version to be published
    Competing interests
    No competing interests declared
  37. Steven A Kushner

    Department of Psychiatry, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Resources, Supervision, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9777-3338
  38. Tamar EC Nijsten

    Department of Dermatology, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Resources, Data curation, Supervision, Project administration, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  39. Robert-Jan TS Palstra

    Department of Biochemistry, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Conceptualization, Resources, Supervision, Methodology, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  40. Stefan Boehringer

    Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
    Contribution
    Conceptualization, Resources, Software, Supervision, Methodology, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9108-9212
  41. Sarah E Medland

    QIMR Berghofer Medical Research Institute, Brisbane, Australia
    Contribution
    Resources, Supervision, Funding acquisition, Validation, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  42. Kun Tang

    1. CAS Key Laboratory of Computational Biology, Chinese Academy of Sciences (CAS), Shanghai, China
    2. CAS-MPG Partner Institute for Computational Biology (PICB), Chinese Academy of Sciences (CAS), Shanghai, China
    3. Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS), Shanghai, China
    Contribution
    Data curation, Formal analysis, Funding acquisition, Validation, Investigation, Methodology, Data acquisition, Final approval of the version to be published
    Competing interests
    No competing interests declared
  43. Andres Ruiz-Linares

    1. State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
    2. Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China
    3. Aix-Marseille Université, CNRS, EFS, ADES, Marseille, France
    Contribution
    Resources, Supervision, Funding acquisition, Validation, Project administration, Data acquisition, Manuscript preparation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  44. Nicholas G Martin

    QIMR Berghofer Medical Research Institute, Brisbane, Australia
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Validation, Methodology, Project administration, Data acquisition, Manuscript preparation, Final approval of the version to be published
    Competing interests
    No competing interests declared
  45. Timothy D Spector

    Department of Twin Research and Genetic Epidemiology, King’s College London, London, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Methodology, Project administration, Data acquisition, Manuscript preparation, Final approval of the version to be published
    Contributed equally with
    Evie Stergiakouli, Seth M Weinberg, Fan Liu and Manfred Kayser
    Competing interests
    No competing interests declared
  46. Evie Stergiakouli

    1. Medical Research Council Integrative Epidemiology Unit, Population Health Sciences, University of Bristol, Bristol, United Kingdom
    2. School of Oral and Dental Sciences, University of Bristol, Bristol, United Kingdom
    Contribution
    Resources, Data curation, Supervision, Funding acquisition, Methodology, Project administration, Data acquisition, Manuscript preparation, Final approval of the version to be published
    Contributed equally with
    Timothy D Spector, Seth M Weinberg, Fan Liu and Manfred Kayser
    Competing interests
    No competing interests declared
  47. Seth M Weinberg

    1. Center for Craniofacial and Dental Genetics, Department of Oral Biology, University of Pittsburgh, Pittsburgh, United States
    2. Department of Human Genetics, University of Pittsburgh, Pittsburgh, United States
    3. Department of Anthropology, University of Pittsburgh, Pittsburgh, United States
    Contribution
    Resources, Data curation, Supervision, Funding acquisition, Methodology, Project administration, Data acquisition, Manuscript preparation, Final approval of the version to be published
    Contributed equally with
    Timothy D Spector, Evie Stergiakouli, Fan Liu and Manfred Kayser
    Competing interests
    No competing interests declared
  48. Fan Liu

    1. Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    2. CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, University of Chinese Academy of Sciences (CAS), Beijing, China
    Contribution
    Conceptualization, Resources, Software, Supervision, Funding acquisition, Visualization, Methodology, Data acquisition, Manuscript preparation and revision, Final approval of the version to be published
    Contributed equally with
    Timothy D Spector, Evie Stergiakouli, Seth M Weinberg and Manfred Kayser
    For correspondence
    liufan@big.ac.cn
    Competing interests
    No competing interests declared
  49. Manfred Kayser

    Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Project administration, Data acquisition, Manuscript preparation and revision, Final approval of the version to be published
    Contributed equally with
    Timothy D Spector, Evie Stergiakouli, Seth M Weinberg and Fan Liu
    For correspondence
    m.kayser@erasmusmc.nl
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4958-847X

Funding

Horizon 2020 - Research and Innovation Framework Programme (740580 (VISAGE))

  • Manfred Kayser

National Science Foundation of China (91651507)

  • Fan Liu

Chinese Academy of Sciences (Strategic Priority Research Program (XDC010400100))

  • Fan Liu

China Scholarship Council (PhD Fellowship)

  • Ziyi Xiong

Netherlands Organisation for Scientific Research (1750102005011)

  • M Arfan Ikram

Wellcome Trust

  • Timothy D Spector

Medical Research Council (102215/2/13/2)

  • Evie Stergiakouli

National Institute of Dental and Craniofacial Research (U01-DE020078)

  • Mary L Marazita
  • Seth M Weinberg

Leverhulme Trust (F/07 134/DF)

  • Andres Ruiz-Linares

National Natural Science Foundation of China (91631307)

  • Sijia Wang

National Natural Science Foundation of China (91731303)

  • Shu-Hua Xu

National Natural Science Foundation of China (30890034)

  • Li Jin

National Health and Medical Research Council (APP103119)

  • Nicholas G Martin

National Health and Medical Research Council (Fellowship (APP1103623))

  • Sarah E Medland

National Natural Science Foundation of China (31771388)

  • Shu-Hua Xu

National Natural Science Foundation of China (315014)

  • Shu-Hua Xu

National Natural Science Foundation of China (31711530221)

  • Shu-Hua Xu

National Natural Science Foundation of China (31271338)

  • Li Jin

National Institute of Dental and Craniofacial Research (R01-DE027023)

  • John R Shaffer
  • Seth M Weinberg

National Institute of Dental and Craniofacial Research (R01-DE016148)

  • Mary L Marazita
  • Seth M Weinberg

National Institute of Dental and Craniofacial Research (X01-HG007821)

  • Eleanor Feingold
  • Mary L Marazita
  • Seth M Weinberg

Netherlands Organisation for Scientific Research (911-03-012)

  • M Arfan Ikram

Ministry of Science and Technology of the People's Republic of China (National Key R&D Program of China (2017YFC083501))

  • Fan Liu

Shanghai Municipal Science and Technology Commission (Major Project 2017SHZDZX01)

  • Sijia Wang
  • Shu-Hua Xu

BBSRC (BB/I021213/1)

  • Andres Ruiz-Linares

Universidad de Antioquia (CPT-1602, Estrategia para sostenibilidad 2016-2018 grupo GENMOL)

  • Gabriel Bedoya

Science and Technology Commission of Shanghai Municipality (16JC1400504)

  • Li Jin
  • Sijia Wang

Chinese Academy of Sciences (QYZDJ-SSW-SYS009)

  • Shu-Hua Xu

National Basic Research Program of China (2015FY111700)

  • Li Jin
  • Sijia Wang

National Key Research and Development Program China (2016YFC0906403)

  • Shu-Hua Xu

National Key Research and Development Program China (2017YFC0803501-Z04)

  • Kun Tang

National Program for Top-notch Young Innovative Talents of The “Wanren Jihua” Project China

  • Shu-Hua Xu

Thousand Talents Plan

  • Sijia Wang

Max Planck-CAS Paul Gerson Unna Independent Research Group Leadership Award

  • Sijia Wang

Medical Research Council (MC_UU_00011/1)

  • Laurence J Howe
  • Sarah Lewis
  • Gemma C Sharp
  • Evie Stergiakouli

Medical Research Council

  • Timothy D Spector

Netherlands Organisation for Scientific Research (175.010.2005.011)

  • André G Uitterlinden

Netherlands Organisation for Scientific Research (911-03-012)

  • André G Uitterlinden

Netherlands Organisation for Scientific Research (050-060-810)

  • André G Uitterlinden

Netherlands Genomics Initiative (050-060-810)

  • André G Uitterlinden

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to all participants and their families who took part in the RS, TwinsUK, ALSPAC, PITT, UYG, CANDELA, and QIMR studies as well as the many individuals involved in the running of the studies and recruitment, which include midwives, interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, nurses and other professional. MK additionally thanks Maren von Köckritz-Blickwede for earlier discussions on PAD knockout mice. QIMR would like to thank Natalie Garden and Jessica Adsett for data collection and Scott Gordon for data management.

The ALSPAC study was supported by The UK Medical Research Council and Wellcome (Grant ref: 102215/2/13/2) and the University of Bristol. A comprehensive list of grants funding for ALSPAC is available online (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). This publication is the work of the authors and L.H. and E.S. will serve as guarantors for the contents of this paper. GWAS data was generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. The TwinsUK study was supported by the Wellcome Trust, Medical Research Council, European Union (FP7/2007-2013), the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. The Rotterdam Study is supported by the Erasmus MC and Erasmus University Rotterdam; the Netherlands Organization for Scientific Research (NWO); the Netherlands Organization for Health Research and Development (ZonMw); the Research Institute for Diseases in the Elderly (RIDE) the Netherlands Genomics Initiative (NGI); the Ministry of Education, Culture and Science; the Ministry of Health Welfare and Sport; the European Commission (DG XII); and the Municipality of Rotterdam. The generation and management of GWAS genotype data for the Rotterdam Study were executed by the Human Genotyping Facility of the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC. For the CANDELA study, the following institutions kindly provided facilities for the assessment of volunteers: Escuela Nacional de Antropología e Historia and Universidad Nacional Autónoma de México (México); Pontificia Universidad Católica del Perú, Universidad de Lima and Universidad Nacional Mayor de San Marcos (Perú); Universidade Federal do Rio Grande do Sul (Brazil); 13° Companhia de Comunicações Mecanizada do Exército Brasileiro (Brazil). The PITT study received additional support from The Centers for Disease Control and Prevention (https://www.cdc.gov/) provided funding through the following grant: R01- DD000295. Funding for initial genomic data cleaning by the University of Washington was provided by contract #HHSN268201200008I from the National Institute for Dental and Craniofacial Research (http://www.nidcr.nih.gov/) awarded to the Center for Inherited Disease Research (CIDR). The QIMR study received additional support by an Australian Research Council Linkage Grant (LP 110100121 to Dennis McNevin, University of Canberra).

Ethics

Human subjects: All cohort participants gave informed consent and consent to publish. The different cohort studies involved have been approved by their local ethics committees and in part higher institutions such as ministries, as described in the Materials and methods section. Protocol numbers can be found for each cohort in the Materials and methods section.

Senior Editor

  1. Mark I McCarthy, Genentech, United States

Reviewing Editor

  1. Andrew P Morris, University of Liverpool, United Kingdom

Reviewers

  1. Andrew P Morris, University of Liverpool, United Kingdom
  2. Seongwon Cha

Publication history

  1. Received: July 3, 2019
  2. Accepted: November 22, 2019
  3. Accepted Manuscript published: November 25, 2019 (version 1)
  4. Accepted Manuscript updated: November 26, 2019 (version 2)
  5. Version of Record published: December 11, 2019 (version 3)

Copyright

© 2019, Xiong et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 861
    Page views
  • 221
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Genetics and Genomics
    2. Human Biology and Medicine
    Roger Pique-Regi et al.
    Research Article
    1. Genetics and Genomics
    Jacob M Garrigues et al.
    Research Article