1. Genetics and Genomics
Download icon

Genetic predisposition to uterine leiomyoma is determined by loci for genitourinary development and genome stability

  1. Niko Välimäki
  2. Heli Kuisma
  3. Annukka Pasanen
  4. Oskari Heikinheimo
  5. Jari Sjöberg
  6. Ralf Bützow
  7. Nanna Sarvilinna
  8. Hanna-Riikka Heinonen
  9. Jaana Tolvanen
  10. Simona Bramante
  11. Tomas Tanskanen
  12. Juha Auvinen
  13. Outi Uimari
  14. Amjad Alkodsi
  15. Rainer Lehtonen
  16. Eevi Kaasinen
  17. Kimmo Palin
  18. Lauri A Aaltonen  Is a corresponding author
  1. University of Helsinki, Finland
  2. University of Helsinki and Helsinki University Hospital, Finland
  3. Faculty of Medicine, University of Oulu, Finland
  4. Oulu University Hospital, University of Oulu, Finland
  5. Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Sweden
Research Article
  • Cited 1
  • Views 550
  • Annotations
Cite this article as: eLife 2018;7:e37110 doi: 10.7554/eLife.37110

Abstract

Uterine leiomyomas (ULs) are benign tumors that are a major burden to women’s health. A genome-wide association study on 15,453 UL cases and 392,628 controls was performed, followed by replication of the genomic risk in six cohorts. Effects of the risk alleles were evaluated in view of molecular and clinical characteristics. 22 loci displayed a genome-wide significant association. The likely predisposition genes could be grouped to two biological processes. Genes involved in genome stability were represented by TERT, TERC, OBFC1 - highlighting the role of telomere maintenance - TP53 and ATM. Genes involved in genitourinary development, WNT4, WT1, SALL1, MED12, ESR1, GREB1, FOXO1, DMRT1 and uterine stem cell marker antigen CD44, formed another strong subgroup. The combined risk contributed by the 22 loci was associated with MED12 mutation-positive tumors. The findings link genes for uterine development and genetic stability to leiomyomagenesis, and in part explain the more frequent occurrence of UL in women of African origin.

https://doi.org/10.7554/eLife.37110.001

eLife digest

Fibroids – also known as uterine leiomyomas, or myomas – are a very common form of benign tumor that grows in the muscle wall of the uterus. As many as 70% of women develop fibroids in their lifetime. About a fifth of women report symptoms including severe pain, heavy bleeding during periods and complications in pregnancy. In the United States, the cost of treating fibroids is estimated to be $34 billion each year.

Despite the prevalence of fibroids in women, there are few treatments available. Drugs to target them have limited effect and often an invasive procedure such as surgery is needed to remove the tumors. However, a better understanding of the genetics of fibroids could lead to a way to develop better treatment options.

Välimäki, Kuisma et al. used a genome-wide association study to seek out DNA variations that are more common in people with fibroids. Using data from the UK Biobank, the genomes of over 15,000 women with fibroids were analyzed against a control population of over 392,000 individuals. The analysis revealed 22 regions of the genome that were associated with fibroids. These regions included genes that may well contribute to fibroid development, such as the gene TP53, which influences the stability of the genome, and ESR1, which codes for a receptor for estrogen – a hormone known to play a role in the growth of fibroids. Variation in a set of genes known to control development of the female reproductive organs was also identified in women with fibroids.

The findings are the result of the largest genome-wide association study on fibroids, revealing a set of genes that could influence the development of fibroids. Studying these genes could lead to more effective drug development to treat fibroids. Revealing this group of genes could also help to identify women at high risk of developing fibroids and help to prevent or manage the condition.

https://doi.org/10.7554/eLife.37110.002

Introduction

Uterine leiomyomas (ULs), also known as fibroids or myomas, are benign smooth muscle tumors of the uterine wall. They are extremely common; approximately 70% of women develop ULs before menopause (Stewart et al., 2017). The symptoms, occurring in one fifth of women, include excessive menstrual bleeding, abdominal pain and pregnancy complications (Stewart et al., 2017). In most cases, durable treatment options are invasive (Stewart, 2015). ULs cause a substantial human and economic burden, and the annual cost of treating these tumors has been approximated to be as high as $34 billion in the United States, higher than the combined cost of treating breast and colon cancer (Cardozo et al., 2012).

Earlier studies have indicated strong genetic influence in UL susceptibility based on linkage (Gross, 2000), population disparity (Wise et al., 2012) and twin studies (Luoto et al., 2000). The most striking UL predisposing condition thus far characterized is hereditary leiomyomatosis and renal cell cancer (HLRCC) syndrome, caused by high-penetrance germline mutations in the Fumarate hydratase (FH) gene (Multiple Leiomyoma Consortium et al., 2002; Launonen et al., 2001). Genome-wide association studies (GWAS) have proposed several low-penetrance risk loci but few unambiguous predisposing genes have emerged. Cha et al. reported loci in chromosome regions 10q24.33, 11p15.5 and 22q13.1 based on a Japanese patient cohort (Cha et al., 2011). The 11p15.5 locus - near the Bet1 golgi vesicular membrane trafficking protein like (BET1L) gene - was later replicated in Caucasian ancestry (Edwards et al., 2013a). The 22q13.1 locus has been replicated in Caucasian, American and Saudi Arabian populations suggesting trinucleotide repeat containing 6B (TNRC6B) as a possible target gene (Edwards et al., 2013a; Aissani et al., 2015; Bondagji et al., 2017). Further UL predisposition loci have been suggested at 1q42.2 and 2q32.2 by Zhang et al (Zhang et al., 2015). and, at 3p21.31, 10p11.21 and 17q25.3 by Eggert et al (Eggert et al., 2012). A recent work reported cytohesin 4 (CYTH4) at 22q13.1 as a novel candidate locus in African ancestry (Hellwege et al., 2017). While multiple loci and genes have been implicated through these valuable studies it is not straightforward to connect any of them mechanistically to UL development.

Most ULs show somatic site-specific mutations at exons 1 and 2 of the mediator complex subunit 12 (MED12) gene (Mäkinen et al., 2011; Heinonen et al., 2014). These observations together with further scrutiny of driver mutations, chromosomal aberrations, gene expression, and clinicopathological characteristics have led to identification of at least three mutually exclusive UL subtypes; MED12 mutant, Fumarate Hydratase (FH) deficient, as well as High Mobility Group AT-Hook 2 (HMGA2) overexpressing lesions (Mehine et al., 2016).

Here we report the most powerful GWAS on uterine leiomyoma to date, and novel genome-wide significant UL susceptibility loci with plausible adjacent predisposition genes. These genes associate UL genesis to two distinct biological mechanisms: Genome stability related processes are implicated by genes Tumor Protein P53 (TP53) and ATM Serine/Threonine Kinase (ATM) together with the telomere maintenance genes Telomerase Reverse Transcriptase (TERT), Telomerase RNA Component (TERC) and STN1-CST Complex Subunit (OBFC1). The other prominent group is genes relevant for genitourinary development, specifically Wnt Family Member 4 (WNT4), Wilms Tumor 1 (WT1), Spalt Like Transcription Factor 1 (SALL1), Estrogen Receptor 1 (ESR1 or ERα), Growth Regulation By Estrogen In Breast Cancer 1 (GREB1), Forkhead Box O1 (FOXO1), Doublesex and Mab-3 Related Transcription Factor 1 (DMRT1) and CD44 Molecule (CD44). Our analysis of the X chromosome identifies a risk allele near MED12 that drives UL tumorigenesis towards somatic MED12 mutations. We report altogether 22 genome-wide significant susceptibility loci and compile them into a polygenic risk score. The UL association is then replicated in six independent cohorts of different ethnic origins: individuals of African origin are characterized by the highest risk load. Finally, we investigate the risk alleles’ association to clinical features, molecular UL subtypes, telomere length, gene expression and DNA methylation.

Results

Identification of predisposition loci

Figure 1 provides an outline of this study. At discovery stage 1,428 SNPs emerging from 22 distinct genetic loci passed the genome-wide significance level of 5 × 10−8. Figure 2 displays a Manhattan plot of these associations (15,453 UL cases and 392,628 controls; linear mixed model). Two of the significant loci (359/1,428 SNPs) were found on the X chromosome. After linkage disequilibrium (LD; r2 ≤0.3) pruning the significant SNPs, a total of 50 LD-independent associations remained: the resulting SNPs are given in Appendix 1—table 2, and the lead SNPs are summarized in Table 1.

Outline of the study stages and genotyping cohorts.

GRS, genomic risk score. NFBC, Northern Finland Birth Cohort.

https://doi.org/10.7554/eLife.37110.003
Overview of the uterine leiomyoma risk loci and the effect of increased number of MED12-mutated lesions per rs5937008 risk allele.

(A), Manhattan plot of the UK Biobank GWAS on 15,453 UL cases and 392,628 controls. On Y-axis, logarithm transformed association values, and on X-axis, autosomes and the X chromosome. The blue horizontal line denotes genome-wide significance (p=5 × 10−8). Gene symbols shown for reference. (B), MED12 region in more detail. ENCODE tracks (details in Supplementary Methods) are shown for reference. (C), The risk allele near MED12 (rs5937008) is observed with a significant increase in number of MED12-mutation-positive tumors (p=0.009; negative binomial regression; RR = 1.23 per risk allele; n = 457 Helsinki cohort patients).

https://doi.org/10.7554/eLife.37110.004
Table 1
Predisposition loci for uterine leiomyoma.
https://doi.org/10.7554/eLife.37110.005
ChrPositionrs-codeABB freqORPLikely disease gene
6152,562,271rs58415480CG0.1551.186.0E-29ESR1
X131,312,089rs5930554TC0.3111.144.3E-25?
177,571,752rs78378222 #TG0.0131.539.7E-25TP53
1132,370,380rs10835889GA0.1591.145.5E-19WT1
11108,149,207rs141379009TG0.0271.322.0E-18ATM
9802,228rs7027685AT0.4021.113.8E-18DMRT1
122,450,487rs2235529 #CT0.1571.141.1E-17WNT4/CDC42
X70,093,038rs5937008CT0.5200.915.6E-16MED12
51,283,755rs72709458 #CT0.2061.126.9E-16TERT
11225,196rs507139 *GA0.0740.843.2E-13?
454,546,192rs62323680GA0.0671.168.3E-13?
3169,514,585rs10936600 #AT0.2440.916.4E-12TERC
1341,179,798rs7986407AG0.3101.091.2E-11FOXO1
3197,623,337rs143835293AG0.0021.751.8E-11?
1246,831,129rs12832777TC0.7011.092.3E-11?
2240,669,648rs733381 *AG0.2131.105.7E-11?
1651,481,596rs66998222GA0.2010.918.9E-11SALL1
470,634,441rs2202282CT0.4971.078.7E-10?
211,702,661rs10929757AC0.5791.081.2E-09GREB1
1135,085,453rs2553772TG0.5381.074.4E-09CD44
5176,450,837rs2456181CG0.4841.076.3E-09?
10105,674,854rs1265164AG0.8690.911.0E-08OBFC1
  1. The numbers for B allele frequency (B Freq), odds-ratio (OR, where B is the effect allele) and association (P) are based on the UKBB cohort (15,453 UL cases). Gene symbols are shown for reference. The genomic coordinates follow hg19 and dbSNP build 147. All genome-wide significant (p<5 × 10−8) loci and their highest-association SNP are shown.

    * Previously implicated predisposition to ULs.

  2. # Previous associations to endometriosis, lung adenocarcinoma, glioma or telomere length; see literature in Appendix 1—table 11

Appendix 1—figure 1 displays the regional structure of each locus together with flanking association values, linkage disequilibrium (LD) and genome annotation. Annotation tracks are included for tissue-specific data on open chromatin, topologically associating domains (TAD) and other regulatory features (details in Supplementary Methods).

Genomic risk score

A polygenic risk score (Abraham and Inouye, 2015) was compiled based on the discovery stage associations. After LD pruning (r2 ≤0.3) the discovery-stage SNPs, 50 SNPs from the 22 distinct loci passed for the initial genomic risk score (GRS; Appendix 1—table 4). The SNP weights were based on UKBB log-odds. We applied this initial GRS of 50 SNPs to the Helsinki cohort and identified a significant association to the UL phenotype (p=8.3 × 10−10; adjusted p=1.1 × 10−8; one-tailed Wilcoxon rank-sum; W = 1.69 × 106; 457 cases and 8899 female controls).

Meta-analysis

The second stage GWAS combined the UKBB and Helsinki cohorts for a meta-analysis approach. The genome-wide statistics revealed rs117245733, at 13q14.11, as the only SNP with a suggestive (p<10−5) association in both the UKBB (OR = 1.26; p=4.2 × 10−9) and Helsinki (OR = 1.82; p=8.1 × 10−6) cohorts. Figure 3 shows the regional structure and combined association (fixed effect model p=3.1 × 10−12) at the locus: the SNP resides on a gene poor region, at a conserved element that displays activity in uterus-specific H3K27ac and DNaseI data (see ENCODE track details in Supplementary Methods). The SNP is independent of the group of associations at FOXO1 (r2 = 0.0; Figure 3).

Meta-analysis of UL risk revealed rs117245733 at a gene poor region of 13q14.11.

(A) meta-analysis P-values and the genomic context at the locus. Gene symbols and ENCODE tracks (details in Supplementary Methods) are shown for reference; coordinates follow hg19. (B) Hi-C, TADs and CpG methylation around the locus with a 1 Mb flank. The needle plot shows the meQTL associations (dashed lines at 10% FDR; green line denotes the SNP; gray ticks denote all CpGs tested; blue needle for positive coefficient, red for negative coefficient) for tumors (above x-axis; nAA = 53, nAB = 3) and normals (below x-axis; nAA = 33, nAB = 2). (C) UCSC genome browser tracks related to conservation and regulation at the locus.

https://doi.org/10.7554/eLife.37110.006

The meta-analysis identified altogether 112 genome-wide significant SNPs not seen in the discovery stage: seven of those were LD-independent (r2 ≤0.3; Appendix 1-table 3) and their UKBB log-odds weights were appended to the initial GRS model. The final GRS model of 57 SNPs and their UKBB-based weights is given in Appendix 1—table 4. Supplementary file 1 gives further details on the meta-analysis results and heterogeneity statistics.

Replication of the GWAS and GRS

The third stage replicated the observations in NFBC and in five different ethnic groups. In NFBC, the SNP identified in the stage two meta-analysis, rs117245733 at 13q14.11, was replicated (p=0.034; linear mixed model; OR = 1.50; 95% CI 1.03 – 2.19). Additional analysis of all 57 SNPs did not reveal other associations: Supplementary file 2 gives further details on the meta-analysis results and heterogeneity statistics. The association between the GRS and UL phenotype was significant (p=1.1 ×10−5; Wilcoxon rank-sum; adjusted p=1.1 × 10−4; one-tailed; W = 4.7 × 105) in NFBC. These case-control distributions of GRS are displayed in Figure 4.

The genomic risk score is elevated in patients with MED12-mutated lesions and in respect to the UL phenotype in the six follow-up cohorts.

On top, GRS association to MED12 mutation status. The rest show GRS association to the UL phenotype in six independent replication cohorts. Associations (P) and test statistics (W) are from Wilcoxon rank-sum tests. Only females were included as the control samples. The X-axes show the GRS distributions for each phenotype.

https://doi.org/10.7554/eLife.37110.007

UL susceptibility is known to vary by ancestry (Wise et al., 2012). Five different ethnic groups - African, Caribbean, Irish, Indian and ‘other white’ background - were available from the UKBB cohort. A total of 2,212 UL cases and 21,054 female controls could be utilized for replication (Appendix 1—table 1). Supplementary file 3 includes all the 57 SNPs and their summary statistics in these five cohorts, together with heterogeneity estimates. Due to the small cohort sizes, none of the single-SNP associations passed genome-wide significance. The GRS model replicated with a significant phenotype association in all five ethnicities (Appendix 1-Table 6). A summary of test statistics, GRS distributions and the numbers of cases and controls for each population is given in Figure 4. A more detailed summary of the GRS model and receiver operating characteristic (ROC) curve of each cohort are given in Appendix 1—figure 5.

The self-reported ‘Black African’ (mean GRS 4.83) had an outstanding risk-load compared to Caucasian (self-reported ‘White Irish’; mean GRS 4.04) background (Figure 4; Wilcoxon rank-sum p<10−15). As expected (Wise et al., 2012), the African ethnicity displayed an increased prevalence (19%) compared to the Irish (6%). Assuming that the observed GRS weights have a linear relationship to the true risk, the GRS difference between African and Irish ancestries explains 9.0% of the increased prevalence in the African population.

Similar population-specific GRSs could be estimated for the seven populations in the gnomAD database (Appendix 1—table 7). Appendix 1—figure 6 shows an overview of the GRS for each of the populations. African ancestry has been shown to carry a two-to-three times higher prevalence when compared to Caucasian ancestry (Wise et al., 2012). Based on the gnomAD frequencies, the increased GRS of African ancestry explains between 8 – 16% of this population difference.

Association to clinical variables

The number of ULs per patient had a significant positive association to GRS (negative binomial regression p=0.001; adjusted p=0.0032; rate ratio 1.25; 95% CI 1.09 – 1.43 for one-unit increase in GRS; Appendix 1—figure 7). No association was found between GRS and age at hysterectomy (Appendix 1—table 6). Testing the 57 GRS SNPs separately did not reveal any associations that pass FDR (Appendix 1—table 5).

Association to MED12 mutated tumors

Our UL set of 1481 lesions included 1159 (78%) mutation-positive and 322 mutation-negative tumors. The occurrence of mutant tumors did not distribute evenly among the 457 patients. In total 221 (48%) and 123 (27%) patients had all their tumors identified as either MED12-mutation-positive or -negative, respectively, suggesting that genetic or environmental factors contribute to the preferred UL type in affected individuals, as previously observed (Mäkinen et al., 2011). Indeed, the 221 mutation positive patients were found to have a significantly higher GRS (Wilcoxon rank-sum p=7.9 × 10−4; adjusted p=0.0032; two-sided; W = 1.6 × 104). This difference in GRS distributions is visualized in Figure 4.

Comparison against the population controls (n = 8899 females) revealed that the above-mentioned patient groups differ by their effect size: the MED12-mutation-positive (221) subset of patients had an odds ratio of 2.28 for one-unit increase in GRS (95% CI 1.80 – 2.88) compared to the controls, while the mutation-negative (123) subset had an odds ratio of 1.20 (95% CI 0.88 – 1.66). Thus, the majority of the compiled case-control association signal had arisen from the MED12-mutation-positive subset of the patients.

The number of MED12-mutation-positive tumors per patient had a significant positive association to GRS (p=3.2 × 10−4; adjusted p=0.002; negative binomial model rate ratio 1.43; 95% CI 1.13 – 3.83 for one-unit increase in GRS; Appendix 1—figure 8). No association between the number of MED12-mutation-negative tumors and GRS was found (adjusted p=0.053; Appendix 1—figure 8).

The GWAS signal near MED12 was inspected for any associations to somatic MED12 mutations. Strikingly, the risk allele (rs5937008) did significantly increase the number of MED12-mutation-positive tumors (p=0.0087; negative binomial model rate ratio 1.23; 95% CI 1.05 – 1.44). Among our 457 patients, the median number of MED12-mutation-positive tumors increased from one to two for the risk allele carriers. The risk locus and its effect on the number of MED12-mutation-positive tumors is visualized in Figure 2. An additional analysis of each of the 57 GRS SNPs did not reveal any further associations (Appendix 1—table 5).

Association to gene expression

All the genome-wide significant SNPs from UKBB and the meta-analysis stage (altogether 1,540 SNPs) were tested with a permutation based approach. In total 34 and 24 genes passed the local permutation significance threshold (p<0.05) for tumor and matched myometrium data, respectively (Appendix 1—table 8). Among the hits in tumors were WNT4 (p=0.01; permutation test) and CDC42 (p=0.03) at 1 p, TNRC6B (p=0.02) at 22q, FOXO1 (p=0.03) at 13q, and DMRT1 (p=0.04) at 9 p. None of the local associations passed a genome-wide FDR of 10%. No significant association was observed between the risk allele and MED12 expression (rs5936989; Appendix 1—figure 11). The full list of eQTL statistics can be found from Supplementary file 4.

Association to DNA methylation

Our analysis of the 57 GRS SNPs revealed altogether 17,030 (9,466 in tumors and 7564 in matched myometrium) cis methylation quantitative trait loci (cis-meQTL) with nominal p<0.05. Of these, 145 passed a 10% FDR. Of the plausible predisposition genes, FOXO1, TERT and WNT4 showed significant meQTL associations (Appendix 1—table 9). All the cis-meQTLs and annotation for their genomic context are in Supplementary file 5.

Association to telomere length and structural variants

The UL predisposition loci at TERT, TERC and OBFC1 were examined for an effect on telomere length. Overall the telomere length was significantly shorter in tumors than in adjacent matched myometrium (p=0.01; Kruskal-Wallis), as previously reported (Rogalla et al., 1995; Bonatz et al., 1998). One of the risk alleles at TERT (rs2736100) was significantly associated with shorter telomere length (p=0.01; Kruskal-Wallis) (Appendix 1—figure 12). Adjusting for the patient age did not explain away the association. The association was not seen in myometrium. The other two LD-independent SNPs at TERT, rs72709458 and rs2853676, or the SNPs at TERC (rs10936600) and OBFC1 (rs1265164) did not show association to telomere length (p=0.24, p=0.57, p=0.07 and p=0.48, respectively; Kruskal-Wallis). The combined effect of TERT (rs72709458, rs2736100, rs2853676), TERC (rs10936600) and OBFC1 (rs1265164) had a negative trend with telomere length (p=0.055; linear model 95% CI −408.5 – 4.7 per one risk allele; see Appendix 1—figure 13). In whole genome sequencing data, no association was detected between genotype and the number of somatic structural variants.

Pathway enrichment

The DEPICT framework (Pers et al., 2015) was ran using the genome-wide significant SNPs from the UKBB cohort, in total 1,069/1,428 autosomal SNPs. The resulting target gene prioritization, pathway enrichment and tissue enrichment results are given in Supplementary file 6. The analysis did not reveal any significant enrichments with the exception of one pathway related to induced stress. ATM was the highest ranking target gene, and uterus/myometrium were among the highest ranking tissue types.

Previously proposed UL predisposition loci

Previous UL association studies (Cha et al., 2011; Zhang et al., 2015; Eggert et al., 2012; Hellwege et al., 2017) have reported altogether seven genome-wide significant UL susceptibility loci. Two out of the seven loci - that is, 22q13.1 (at TNRC6B) and 11p15.5 (at BET1L) - replicated in UKBB using 15,453 cases and 392,628 controls. Cha et al (Cha et al., 2011). highlight OBFC1 (at 10q24.33) as a candidate gene and, while the SNP that they reported does not replicate in UKBB, the OBFC1 region is identified in our discovery stage (rs1265164; Table 1). See Appendix 1—table 10 for a summary of these results.

Discussion

The UK Biobank genotype-phenotype data revealed 22 novel predisposition loci for UL, most of them in close proximity to highly plausible predisposition genes. The combined UL risk of these loci was replicated in a subsequent analysis of the polygenic risk score (GRS) in six independent cohorts from different ethnic backgrounds. Our multi-ethnic replication implies that the discovered loci are indeed involved in UL development, and the early UL association studies have likely been underpowered to detect them. Three previously reported loci, at OBFC1 (Cha et al., 2011), TNRC6B (Cha et al., 2011; Edwards et al., 2013a; Aissani et al., 2015; Bondagji et al., 2017) and BET1L (Cha et al., 2011; Edwards et al., 2013b), were also validated, however, the mechanistic connection to UL development remains obscure for the latter two.

Though simple association is not sufficient to formally prove causality, 14 out of the 22 risk loci harbor plausible predisposition genes. These genes can be divided into two groups: TERT, TERC, OBFC1 (all involved in telomere length), ATM and TP53 guard stability of the genome. ESR1, GREB1, WT1, MED12, WNT4, FOXO1, DMRT1, SALL1, and CD44 play a role in genitourinary development.

Estrogen is a well-known inducer of UL growth (Borahay et al., 2015). The top association at 6q25.2 (rs58415480) resides within intron 107 of Spectrin Repeat Containing Nuclear Envelope Protein 1 (SYNE1), 130 kb downstream of ESR1, the latter being the only gene that resides completely within the topologically associating domain (TAD; Appendix 1—figure 1). While the role of estrogen in leiomyomagenesis has been firmly established, this is the first genetic evidence to this end. The lead SNP at 2 p resides in the third exon of the gene GREB1. GREB1 is an essential regulatory factor of ESR1 (Mohammed et al., 2013).

WT1, WNT4 and FOXO1 are central factors in uterine development and in the preparation for pregnancy (decidualization) in endometrium (Biason-Lauber and Konrad, 2008; Hill, 2018; Kaya Okur et al., 2016; Tamura et al., 2017). Perturbations in their function are known to have neoplastic potential. The strongest association at 11p13 (rs10835889) is 40 kb downstream of the closest gene WT1, at a region with enhancer activity (Appendix 1—figure 1). WT1 is a transcription factor that acts as both a tumor suppressor and an oncogene (Yang et al., 2007). The lead SNP at 1p36.12 (rs2235529) resides at the second intron of WNT4. The risk allele is associated with suggestive upregulation of WNT4 (Figure 5C). WNT4 is known to be overexpressed in uterine leiomyomas with MED12 mutations (Markowski et al., 2012), and knock-down of MED12 in UL cells reduces WNT4 expression (Al-Hendy et al., 2017). The risk locus in 1p36.12 was also associated with several meQTLs suggesting that methylation may have a role in WNT4 regulation (Figure 5). WNT4 encodes a signaling protein that has a crucial role in sex-determination (Vainio et al., 1999), and the WNT signaling pathway has a well-established role in various malignancies such as breast and ovarian cancer (Peltoketo et al., 2004). Of note, recent GWAS on gestational duration suggested that binding of the estrogen receptor at WNT4 is altered by rs3820282 (r2 = 0.92 with our lead SNP) (Zhang et al., 2017). Both WNT4 and FOXO1 are decidualization markers regulated by ESR1 (Kaya Okur et al., 2016). Though these considerations support WNT4 as a candidate predisposition gene at this locus, the near-by CDC42 has been shown to play a role in uterine pathology, in particular endometriosis (Powell et al., 2016), and should not be overlooked in further work.

Methylation and expression differences in WNT4.

(A), Hi-C, TADs and CpG methylation around the locus with an 1 Mb flank. The needle plot shows the meQTL associations (dashed lines at 10% FDR; green lines denote the two SNPs, rs2235529 and rs2092315; gray ticks denote all CpGs tested; blue needle for positive coefficient, red for negative coefficient) for tumors (above x-axis; nAA = 40, nAB = 15, nBB = 1 for rs2235529 and nAA = 32, nAB = 23 for rs2092315) and normals (below x-axis; nAA = 23, nAB = 12 and nAA = 17, nAB = 17, nBB = 1). (B), Methylation differences in tumors (n = 56) at CpG chr1:22456326 by SNP rs2235529. (C), WNT4 expression differences in tumors (n = 41) stratified by the rs12042083 genotype. B is the risk allele, and the P-value is corrected for local multiple testing (permutation based test).

https://doi.org/10.7554/eLife.37110.008

Also MED12 has been implicated in uterine development in a mouse model (Wang et al., 2017a). DMRT1 is a transcription factor associated with male sex-development (Lindeman et al., 2015). CD44 is a plausible fibroid stem cell marker (Mas et al., 2015). Mutations in SALL1 and a deletion at the GWAS signal have been associated with Townes-Brocks syndrome, a condition associated with kidney malformations (Stevens and May, 2016). Thus genes involved in genitourinary development are strikingly associated with UL predisposition.

ATM, TP53, TERT, TERC and OBFC1 could be involved in uterine neoplasia predisposition through genetic instability and telomere maintenance. The lead SNP at 11q (rs141379009) resides in the 22nd intron of ATM, and the SNP at 17 p in the 3′-untranslated region of TP53. ATM and TP53 are involved in DNA damage response (Guleria and Chandna, 2016), and they are among the relatively few genes that have been found to be recurrently mutated in leiomyosarcoma (Lee et al., 2017). TERT and TERC encode subunits of the telomerase enzyme, which guards chromosomal stability by elongating telomeres (Blasco, 2005). In addition OBFC1 has been associated with telomere maintenance (Lee et al., 2013). TERT is expressed in germ cells as well as in many types of cancers (Blasco, 2005). The neoplasia predisposing effect of the risk alleles at the TERT locus (rs72709458; rs2736100; rs2853676) has been overwhelmingly documented (Appendix 1—table 11). Previous studies have reported contradicting observations on the effect of rs2736100 on telomere length (Liu et al., 2014; ENGAGE Consortium Telomere Group et al., 2014; Lan et al., 2013; Melin et al., 2012; Choi et al., 2015). ULs have been shown to display shortened telomeres (Rogalla et al., 1995; Bonatz et al., 1998), potentially provoking chromosomal instability as the lengths of chromosome telomeres are diminished. In our patient cohort, the risk allele at TERT (rs2736100) is significantly associated with shorter telomere length (Appendix 1—figure 12), whereas the combined effect of SNPs at TERT, TERC and OBFC1 did not reach statistical significance.

GRS associated merely with a susceptibility to the most common UL subtype, MED12 mutation positive tumors. Indeed it has been known that MED12-mutation-positive tumors do not distribute randomly among patients (Mäkinen et al., 2011), and our data provide at least a partial explanation to this intriguing finding. An outstanding susceptibility locus was identified 250 kb upstream of MED12: our in-house patient cohort - together with a mutation-screening of their 1481 tumors - revealed that the risk allele could facilitate selection of somatic MED12 mutations. It may be that environmental factors contribute more significantly to genesis of MED12 wild-type lesions. In our recent study this tumor type was associated with a history of pelvic inflammatory disease, and thus infectious agents could be one underlying factor (Heinonen et al., 2017). Obviously, also the power of GWAS to detect genetic associations to rare UL subtypes – such as the HMGA2 overexpressing or FH deficient subtypes – is reduced.

This work highlights several new genetic cornerstones of UL formation, highlights genitourinary development and maintenance of genomic stability as key processes associated with it, and represents another step towards a much-improved understanding of its molecular basis. The proposed risk score can stratify the female population to low and high-risk quartiles that differ by two-fold in their UL risk. The population-specific risk score was inflated towards the African and Caribbean cohorts, which connects the predisposition loci to the excess UL prevalence in these ethnicities. While the increased risk appears minor on an individual level, the population-level burden to women’s health arising from these risk loci is highly significant considering the incidence of the condition. Together with the recent progress in molecular tumor characterization and subclassification, the identification of the genetic components of UL predisposition should pave the way towards more sophisticated prevention and management strategies for these extremely common tumors. The risk SNP with the most immediate potential value is that at estrogen receptor alpha, and our findings should fuel much further work on the interplay between individual germline genetics, endogenous and exogenous hormonal exposure, and occurrence and growth rate of UL.

Materials and methods

Genome-wide association study

Figure 1 provides an outline of the four stages that were implemented. The discovery stage was conducted with UK Biobank resources (UKBB; project #32506; accessed on April 10, 2018). The resource included pre-imputed genotypes (version 3; March 2018) for a total of 487,409 samples (486,757 samples for the X chromosome) and 96 million SNPs. The background information on the imputation and data quality control (QC) can be found through the UKBB documentation (www.ukbiobank.ac.uk).

The UL cases were identified on the basis of both the self-reported uterine leiomyoma (UL) phenotype (UKBB data-field 20002: Non-cancer illness code 1351) and International Classification of Diseases (ICD) codes (data-fields 41202 – 41205: Main and secondary diagnosis for ICD10 code D25 and ICD9 code 218). These phenotype data resulted in a total of 20,106 UL cases prior to any sample/genotype QC.

Sample QC was based on the UKBB annotation as follows. In total 409,692 samples passed the initial QC on ethnic grouping (UKBB data-field 22006): self-identified as ‘White British’, and similar genetic ancestry based on a principal component analysis (PCA) of the genotypes. Further sample QC excluded excess kinship (field 22021; 408,797 samples passed), sex-chromosome aneuploidy (field 22019; 408,241) and inconsistent gender (fields 31 and 22001, and one male with self-reported ULs; 408,081). In total 15,453 UL cases and 392,628 population-matched controls (205,157 females and 187,471 males) passed all these criteria.

Raw genotype calls (UKBB version 2; Affymetrix UK BiLEVE Axiom, or Affymetrix UKBB Axiom array) were available for 805,426 SNPs: after filtering out low genotyping rate (<95%), Hardy-Weinberg equilibrium (p<10−10) and minor allele frequency (MAF) <0.001, the remaining 611,887 autosomal genotypes were used to train the mixed model for association testing. Imputed SNPs with MAF <0.001 and imputation score (INFO) <0.3 were excluded. Further SNPs were excluded due to imputation panel differences between cohorts, and the remaining 8.3 million SNPs (Haplotype Reference Consortium, HRC1.1 panel) were tested for case-control association with BoltLMM (version 2.3.2) (Loh et al., 2015). The default linear, infinitesimal mixed model was used to adjust for any underlying population structure. The model included categorical covariates for the 22 UK Biobank assessment centres and two genotyping arrays.

Meta-analysis

The second stage meta-analysis utilized the genome-wide summary statistics from UKBB and the Helsinki cohort of 457 UL cases and 15,943 controls. Details on the Helsinki cohort’s imputation, sample and genotype QC are given in the Supplementary Methods. A total of 8.3 million SNPs passed imputation QC and were utilized in the meta-analysis with PLINK (version 1.90b3i) (Chang et al., 2015).

Replication

The SNPs were tested for association in six independent cohorts: Northern Finland Birth Cohort (NFBC) and five non-overlapping subsets of UKBB. In addition to the single-SNP association tests, a polygenic risk score (Abraham and Inouye, 2015Abraham and Inouye, 2015) was compiled as follows. The genomic risk score (GRS) was computed as a sum over SNP dosages weighted by their observed log-odds: LD pruning (r2 ≤0.3) was applied in the order of UKBB association, and the remaining, genome-wide significant SNPs were chosen for the GRS. The log-odds weights were taken from the UKBB statistics (i.e. logarithm of the dosage-based ORs). The resulting GRS model was evaluated using R (3.3.1) and the packages PredictABEL (1.2 – 2) and MASS (7.3 – 45).

The Northern Finland Birth Cohort (NFBC) had in total 459 UL cases and 4943 controls; details of the imputation, sample and genotype QC are given in the Supplementary Methods.

Five non-overlapping, self-reported population-strata were available from UKBB (data-field 21000) and could be utilized as an independent replication: the five self-reported ancestries were ‘Black African’, ‘Black Caribbean’, ‘Indian’, ‘White Irish’ and ‘Other white background’. Sample QC excluded excess kinship (field 22021), sex-chromosome aneuploidy (field 22019) and inconsistent gender (fields 31 and 22001). The numbers of cases and controls that passed the sample QC can be found in Figure 1. A summary of background variables is given in Appendix 1—table 1. These five sample subsets did not overlap with the discovery GWAS individuals. A collection of ancestry-informative genotypes was utilized to assess the genetic homogeneity of each of the self-reported ancestry (details in Supplementary Methods).

Patient and tumor material

Our in-house patient and tumor data were investigated regarding the identified risk loci. All tumors of ≥1 cm diameter had been harvested and stored fresh-frozen (details in Supplementary Methods). MED12 mutations were screened by Sanger sequencing the MED12 exons 1 and 2 and their flanking sequences (60 bp) from all uterine leiomyoma and matching normal myometrium samples (Mäkinen et al., 2011; Heinonen et al., 2014). The resulting sequence graphs were inspected manually and with Mutation Surveyor software (Softgenetics, State College, PA). Clinical patient data was available for the number of ULs, menopause status, parity, body mass index (BMI) and age at hysterectomy (Appendix 1—figure 2). This study was conducted in accordance with the Declaration of Helsinki and approved by the Finnish National Supervisory Authority for Welfare and Health, National Institute for Health and Welfare (THL/151/5.05.00/2017), and the Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS/177/13/03/03/2016).

Expression quantitative trait loci analysis

For the cis expression quantitative trait loci (cis-eQTL) analysis, genes with less than six reads in over 80% of the samples were filtered out. The between sample normalization was done with Relative Log Expression (RLE) normalization and each gene was inverse normal transformed. The eQTL analysis was run with FastQTL (version 2.184) (Ongen et al., 2016) separately for 60 tumors and 56 patient-matched unaffected, adjacent myometrium samples using permutation approach. The permutation parameter was set to ‘1000 10000’. Sequencing batch was used as a covariate. The cis-region was set to be 2 Mb. FDR correction was applied for tumors and matched myometrium separately.

Methylation quantitative trait loci analysis

DNA methylation was studied in 56 tumors and 36 matched myometrium samples. The methylation calls were analyzed with bsseq (version 1.12.2) (Hansen et al., 2012). Only the methylation in CpG context was considered. Every locus was required to have the coverage of ≥2 in at least 90% of samples. The association between methylation and genotype was studied with MatrixEQTL (version 2.1.1) using a linear regression model (Shabalin, 2012). The LD-independent (r2 ≤0.3) SNPs from the discovery stage (Appendix 1—table 2) and meta-analysis (Appendix 1—table 3) were considered (altogether 57 SNPs). The SNPs with MAF <0.05 in the methylation samples were filtered out. This resulted in 44 SNPs in tumors and 45 SNPs in matched myometrium. Cis methylation quantitative trait loci (cis-meQTL) was determined to be within 1 Mb flank from the SNP of interest. To annotate the CpGs with genomic context, the overlap between UCSC’s gene track (hg19) and known CpG islands was studied. As the role of promoter methylation is well known, promoter methylation was studied in addition to gene body methylation. Core promoter was defined as a region −2 kb and +1 kb from the transcription start site. The methylation analysis was performed separately for tumors and matched normal myometrium to study whether the changes in methylation could be observed in both tissues.

Whole genome analysis

The whole genome sequenced (WGS) samples, in total 71 tumors (48 Illumina, 23 Complete Genomics) and 51 matched myometrium samples (28 Illumina and 23 Complete Genomics), were prepared following Illumina and Complete Genomics protocols and processed as described previously (Mehine et al., 2013). Structural variation was defined as a structural rearrangement (e.g deletion, inversion or translocation) not detectable in matched normal myometrium. Structural variation was detected as described in Mehine et al. (Mehine et al., 2013) The mean telomere length was estimated for Illumina samples using Computel (version 0.3) (Nersisyan and Arakelyan, 2015) with the default settings. Clonally related tumors were excluded from the analysis by randomly sampling one tumor to represent each clonally related tumor group. Clonally related tumors had identical changes in driver genes and shared at least a subset of somatic copy-number changes and/or copy neutral loss of heterozygosity (see Mehine et al. (Mehine et al., 2015) for further details in identification of the clonally related tumors). Kruskal-Wallis test was used to assess the telomere length differences between tumors and matched myometrium as well as between genotypes. Linear model was used to calculate the association between number of risk alleles and telomere length.

Pathway enrichment

Pathway enrichment of all genome-wide significant SNPs was tested with DEPICT (version 1 release 194) following the default settings (Pers et al., 2015). The tool is designed to integrate multiple GWAS loci for in silico target gene prioritization, pathway enrichment and tissue-specific expression profiling. In short, the DEPICT framework combines phenotype-free co-expression networks, predefined pathways and protein-protein interaction networks in order to reveal functionally connected genes among the multiple risk loci. The tool is restricted to autosomal SNPs.

Statistical analysis

Meta-analysis was implemented with an inverse-variance weighted, fixed effect model. Associations between the risk alleles and other variables were tested assuming an additive genotype model unless otherwise noted. The DHARMa (0.1.5) package was applied to evaluate the goodness-of-fit of the binomial and negative binomial models. The contribution of GRS to prevalence was estimated by [(Ea/Pi-1)/(Pa/Pi-1)], where Ea = Pi*GRSa/GRSi assumes a linear relationship between GRS and the true risk, and Px and GRSx are the population-specific prevalence and mean GRS, respectively. All statistical tests were two-tailed unless otherwise noted.

Summary statistics were collected from each of the study stages and are available as Appendix 1—table 2 and Supplementary file 13. For each SNP, we report its allele frequency, effect size estimates and association based on the default linear, infinitesimal mixed model. For the meta-analysis stages, we also report the Cochrane’s Q statistic and I2 heterogeneity index in addition to the fixed-effects meta-analysis association and effect size (random-effects meta-analysis is included for reference). BoltLMM reported lambda (λGC) 1.055, 1.045 and 1.016 for UKBB, Helsinki and NFBC, respectively. The X chromosome associations were processed separately and included only the female controls (λGC 1.052, 1.049 and 1.005 for UKBB, Helsinki and NFBC, respectively).

For GWAS, p<5 × 10−8 was reported as significant. The GRS association tests (Appendix 1—table 6) were controlled for family-wise error rate (FWER) and reported significant for Holm-Bonferroni adjusted p<0.05. Large families of association tests were controlled for false discovery rate (FDR; Benjamini-Hochberg method) and noted significant at FDR < 10%. In the six telomere length association tests and the two structural variation association tests, p<0.05 was considered statistically significant.

Appendix 1

Supplementary methods

UK Biobank

The UK Biobank (UKBB) individuals were divided into six distinct subsets based on their self-reported ancestry. A summary of the background variables for each of the six ancestries is given in Appendix 1—table 1.

Sample QC for the discovery subset was readily available from the UKBB annotation (data-field 22006): self-identified as ‘White British’, and similar genetic ancestry based on a principal components analysis (PCA) of the genotypes.

The small replication cohorts were assessed for genetic homogeneity based on PCA of ancestry informative markers as follows. We pooled together 21 panels of ancestry informative markers - in total 1396 unique, autosomal SNPs, see Soundararajan et al (Soundararajan et al., 2016). for references - and compared the resulting PCs against the self-reported ancestry information. As expected, the genetic differences at these informative markers separated each self-reported ancestry into a distinct, homogeneous cluster. See Appendix 1—figure 4 for the resulting clustering of the follow-up cohorts.

Helsinki cohort

Our patient cohort (Helsinki cohort) was collected in accordance with the Declaration of Helsinki and approved by the Finnish National Supervisory Authority for Welfare and Health, National Institute for Health and Welfare (THL/151/5.05.00/2017), and the Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS/177/13/03/03/2016). In total 1577 uterine leiomyoma and corresponding normal myometrial samples were collected as fresh-frozen from 480 patients undergoing hysterectomy as previously described (Mäkinen et al., 2011; Heinonen et al., 2014; Heinonen et al., 2017). See below details on numbers of samples that passed imputation QC.

The number of ULs per patient was determined by the number of ULs harvested from the hysterectomy specimens. The study materials were derived from six different tissue collections, one consisting of anonymous patients and the other five collections consisting of patients who signed an informed consent before entering the study. Pathologists dissected the hysterectomy specimens and collected all visible tumours from each patient, with the exception of one tissue collection in which all feasible distinct tumours ≥ 1 cm in diameter were harvested. The smallest lesion used in this study had a diameter of 4 mm. All the specimens underwent routine diagnostic pathological scrutiny, and the histopathological diagnosis for the study samples was retrieved from the pathology reports.

Population-matched control data were obtained from the National FINRISK Study containing 16,048 genotyped individuals (https://www.thl.fi/fi/web/thlfi-en/research-and-expertwork/population studies/the-national-finrisk-study).

Northern Finland cohort

The Northern Finland Birth Cohort (NFBC) is a prospective collection of Oulu and Lapland region individuals born in 1966. For further validation of our results, the NFBC Project Center (University of Oulu) provided phenotype information on both clinical ICD disease records and 46 year follow-up questionnaires. The genotyped individuals (dbGaP Study Accession: phs000276.v2.p1) had in total 459 UL cases and 4943 population-matched controls.

Genotyping arrays

The Helsinki cohort was genotyped with Illumina Infinium HumanCore-24 BeadChip. The control samples in the Helsinki cohort were genotyped with the Illumina HumanCoreExome SNP array. The NFBC individuals were genotyped with the Illumina Infinium SNP array. All genomic coordinates follow GRCh37 and dbSNP build 147.

SNP array data processing and imputation

The Helsinki cohort B-allele frequencies and log-R ratios were extracted with Illumina GenomeStudio software, and somatic allelic imbalance (AI) regions were calculated for all tumors using BAFsegmentation (Kim et al., 2015) with default parameters.

Quality control (QC) was implemented using PLINK (v1.90b3i; http://www.cog-genomics.org/plink/1.9/). The Helsinki cohort was inspected for outliers, close relatedness and low genotyping rate: 457 UL cases (representing 1481 tumors collected) and 15,943 controls passed the initial genotyping control. Further genotype QC was implemented to exclude SNPs with low genotyping rate (<95%), excess homozygosity (i.e. homozygotes that exceed respective heterozygotes), rare homozygotes (minor allele frequency, MAF <0.02), Hardy-Weinberg equilibrium (p<10−3) or incorrect strand assignment based on LD. The remaining 211,967 SNPs were imputed using the Haplotype Reference Consortium panel (HRC; release 1.1) at the Sanger Imputation service (EAGLE2 +PBWT; https://imputation.sanger.ac.uk). All the reported alleles follow the strand orientation in HRC1.1 and GRCh37 coordinates. The same quality controls were applied to the NFBC data: all 5402 samples passed QC, and the GRS SNPs were imputed following the same process as for the Helsinki cohort. Post-imputation QC of Helsinki and NFBC data excluded SNPs with MAF <0.005 and imputation score (INFO) <0.4. A total of 8.3 million SNPs passed imputation quality.

Clinical background information

Clinical data were collected based on a retrospective review of medical records of the Helsinki study subjects. Details are provided in Heinonen et al (Heinonen et al., 2017). Information on parity, BMI, number of leiomyomas and age at hysterectomy were quantified for a total of n = 367, 366, 457 and 392 patients, respectively. Menopausal status was recorded for 367 patients as either pre, post or current HRT (hormone replacement therapy). Appendix 1—figure 2 shows the summary statistics of each background variable.

RNA sequencing

RNA-seq libraries were prepared according to the standard quality requirements for Illumina TruSeq Stranded total-RNA (RiboZero) kit. Paired-end Illumina (HiSeq2500) sequencing produced around 60 to 70 million 2 × 125 bp reads per sample. Data were aligned to human reference transcript (GRCh37) with HISAT2 (2.1.0) using parameter -dta and setting rna-strandness to RF (Kim et al., 2015). The alignments were assembled with StringTie (1.3.3b) using parameters -e and -rf (Pertea et al., 2015). Data quality was controlled by checking HISAT2 mapping statistics, ribosomal RNA contamination (based on Ensembl annotation release 75) and batch effects (principal components analysis).

DNA methylation

The SureSelect target enrichment system (Agilent Technologies, Inc., CA, USA) covering 84.5 Mb of the genome was used to prepare bisulfite-sequencing samples. Sample preparations were done according to the manufacturer’s instructions. Illumina paired-end sequencing of 56 tumor and 36 normal samples was done in Karolinska Institutet (Sweden) using 100 bp read length and the HiSeq2000 platform.

Raw sequencing reads were quality and adapter trimmed with cutadapt version 1.3 in Trim Galore. Low-quality ends trimming was done using Phred score cutoff 30. Adapter trimming was performed using the first 13 bp of the standard Illumina paired-end adapters with stringency overlap two and error rate 0.1. Read alignment was done against hg19/GRCh37 reference genome downloaded from UCSC Genome Browser with Bismark (version v0.10.0) (Krueger and Andrews, 2011) and Bowtie 2 (version 2.0.0-beta6) (Langmead and Salzberg, 2012). Duplicates were removed using the Bismark deduplicate function. Extraction of methylation calls was done with Bismark methylation extractor discarding the first 10 bp of both reads and reading methylation calls of overlapping parts of the paired reads from the first read (--no_overlap parameter).

LocusZoom visualization and ENCODE tracks

LocusZoom (http://locuszoom.org/) plots include the ENCODE tracks for DNase I hypersensitive sites for hTERT-HM (ENCFF001SPI, ENCFF001UXF) and uterus (ENCFF689EGI). RAMPAGE tracks were included for uterus (ENCFF979EGO), myometrium (ENCFF605TKQ) and endometrial microvascular endothelial cells (ENCFF440YZN) experiments. Topologically associating domains from endometrial microvascular endothelial cells (HiC; ENCFF633ORE) and H3K27ac ChIP-seq from uterus (stable peaks; ENCFF045LNJ) were included. Additional uterus-specific ChIP-seq data for CTCF (ENCFF282BOE, ENCFF634DDY) and POLR2A (ENCFF822OTY, ENCFF164YIY) were also included. Hi-C figures were produced with the ‘3D genome browser’ (http://promoter.bx.psu.edu/hi-c/; GM12878) (Wang et al., 2017b).

Appendix 1—table 1
Summary of the UKBB cohort individuals.

The UKBB cohort was split into six disjoint, self-reported ancestries. For each ancestry, we summarized the background information for the phenotype (number of UL cases and female controls), proportion of cases (%), age at first assessment visit (mean and SD), number of live births (mean and SD), body mass index (BMI; mean and SD), proportion of hysterectomy cases (%) and age at hysterectomy (mean and SD).

https://doi.org/10.7554/eLife.37110.017
Age at first assessment visitNumber of live birthsBMIAge at hysterectomy
Self-reported ancestryPhenotypeNCases (%)MeanSDMeanSDMeanSDEver had hyst. (%)MeanSD
White Britishcases154537.056.97.51.71.227.75.231.946.19.0
controls20515756.77.91.81.227.05.17.042.210.3
Black Africancases29619.149.66.82.01.731.25.622.145.18.8
controls125651.88.12.81.831.35.76.536.415.4
Blackcases66824.750.66.81.51.430.06.324.041.212.1
Caribbeancontrols204153.08.22.21.729.85.99.737.214.8
White
Irish
cases3986.056.57.61.81.427.85.232.646.67.2
controls620856.48.11.91.426.95.07.542.910.8
Asian Indiancases2037.353.87.92.01.127.64.233.544.111.9
controls256753.78.02.01.227.14.86.639.813.9
Other whitecases6476.754.87.91.31.226.75.323.646.58.6
 backgroundcontrols898254.68.31.61.226.45.25.442.111.0
Appendix 1—table 2
Discovery stage GWAS for the UKBB cohort.

The information for B allele frequency (B Freq), odds-ratio (OR) and association (Beta; standard error of Beta; χ2 and P) were collected from the UKBB cohort. All genome-wide significant, LD-independent (r2 ≤0.3 pruned in order of UKBB association) SNPs that passed imputation QC are shown. SNPs with r2 = NA are the lead-SNPs of each distinct locus. The A and B alleles are on GRCh37 forward strand. OR was computed from SNP dosages and B as the effect allele. Rows are sorted by genomic position.

https://doi.org/10.7554/eLife.37110.018
rs-codeChrPositionABr2 (reference
SNP)
B freqORBetaSEχ2P
rs2235529122450487CTNA0.1571.14−0.0055.81E-0473.3731.1E-17
rs2092315122507684CT0.14
(rs2235529)
0.2481.07−0.0034.90E-0430.4653.4E-08
rs10929757211702661ACNA0.5791.08−0.0034.32E-0436.9471.2E-09
rs11674184211721535TG0.30
(rs10929757)
0.3860.940.0024.36E-0430.0824.1E-08
rs109366003169514585ATNA0.2440.910.0034.91E-0447.1916.4E-12
rs1438352933197623337AGNA0.0021.75−0.0436.33E-0345.1431.8E-11
rs62323680454546192GANA0.0671.16−0.0068.49E-0451.2128.3E-13
rs2202282470634441CTNA0.4971.07−0.0034.22E-0437.5938.7E-10
rs7270945851283755CTNA0.2061.12−0.0045.28E-0465.1656.9E-16
rs273610051286516CA0.23 (rs72709458)0.4980.910.0034.23E-0460.8796.1E-15
rs285367651288547TC0.23 (rs2736100)0.7310.910.0034.77E-0449.7611.7E-12
rs24561815176450837CGNA0.4841.07−0.0024.24E-0433.7436.3E-09
rs48700846152543949CT0.29 (rs6904757)0.1890.920.0035.44E-0432.9419.5E-09
rs69283636152546094GA0.17 (rs58415480)0.4850.920.0034.24E-0446.8497.7E-12
rs584154806152562271CGNA0.1551.18−0.0075.89E-04124.6726.0E-29
rs755102046152592680TG0.07 (rs58415480)0.0121.29−0.0122.07E-0332.6641.1E-08
rs69047576152593102AG0.08 (rs6928363)0.3630.930.0034.42E-0441.0351.5E-10
rs1444445836152684585TC0.30 (rs58415480)0.1281.12−0.0046.37E-0445.2861.7E-11
rs1388210789674217CG0.13 (rs10975820)0.0211.23−0.0081.50E-0331.8191.7E-08
rs109758209684160GA0.00 (rs7027685)0.1421.12−0.0046.08E-0448.6023.1E-12
rs70276859802228ATNA0.4021.11−0.0044.33E-0475.4243.8E-18
rs1146803319815682TC0.12 (rs7027685)0.1001.11−0.0047.21E-0435.5772.5E-09
rs47424489826585CG0.28 (rs7027685)0.4581.07−0.0034.33E-0433.7146.4E-09
rs22771639827224AG0.07 (rs7027685)0.9470.870.0059.51E-0433.2118.3E-09
rs126516410105674854AGNA0.8690.910.0046.27E-0432.7721.0E-08
rs1124600311213723TG0.00 (rs507139)0.0440.850.0061.03E-0332.3371.3E-08
rs50713911225196GANA0.0740.840.0068.11E-0453.0823.2E-13
rs22075481132368744CA0.24 (rs10835889)0.4231.09−0.0034.30E-0462.3602.9E-15
rs108358891132370380GANA0.1591.14−0.0055.81E-0479.2345.5E-19
rs71204831132406983GC0.30 (rs10835889)0.1201.12−0.0046.51E-0444.4812.6E-11
rs110317831132459923CA0.28 (rs10835889)0.2041.09−0.0035.25E-0436.6871.4E-09
rs25537721135085453TGNA0.5381.07−0.0024.24E-0434.4584.4E-09
rs5902156511107999907CG0.27 (rs141379009)0.0911.11−0.0047.38E-0430.8822.7E-08
rs14137900911108149207TGNA0.0271.32−0.0111.30E-0376.7192.0E-18
rs498802311108168995AC0.00 (rs141379009)0.1440.890.0046.01E-0443.6993.8E-11
rs1222338111108354102CT0.12 (rs59021565)0.4061.07−0.0024.31E-0430.2343.8E-08
rs96694031246798900GA0.28 (rs12832777)0.4021.07−0.0034.35E-0436.9981.2E-09
rs128327771246831129TCNA0.7011.09−0.0034.61E-0444.7342.3E-11
rs1172457331340723944GA0.00 (rs7986407)0.0161.26−0.0101.74E-0334.5194.2E-09
rs79864071341179798AGNA0.3101.09−0.0034.56E-0445.9431.2E-11
rs669982221651481596GANA0.2010.910.0035.28E-0442.0538.9E-11
rs78378222177571752TGNA0.0131.53−0.0201.93E-03105.4579.7E-25
rs7333812240669648AGNA0.2131.10−0.0035.16E-0442.9195.7E-11
rs5936989X70022420TA0.27 (rs5937008)0.7820.920.0059.30E-0432.6061.1E-08
rs5937008X70093038CTNA0.5200.910.0067.68E-0465.5705.6E-16
rs7059898X70149078CA0.27 (rs5937008)0.3591.07−0.0058.11E-0431.7561.7E-08
rs7888560X131171122AG0.19 (rs5930554)0.2281.08−0.0059.16E-0431.8301.7E-08
rs5930554X131312089TCNA0.3111.14−0.0098.29E-04107.0764.3E-25
rs5933158X131578034AG0.21 (rs5930554)0.5861.08−0.0057.98E-0438.8334.6E-10
rs5975338X131626317AG0.28 (rs5930554)0.1291.10−0.0061.14E-0332.2011.4E-08
Appendix 1—table 3
Meta-analysis of the UKBB and Helsinki cohorts.

All the genome-wide significant, LD-independent (r2 ≤0.3) associations from the meta-analysis stage. Seven SNPs were LD-independent when compared to the discovery stage SNPs, and rs117245733 is shown for reference. The numbers for regression coefficients (Beta), standard error of beta (SE) and association (P) were collected from the UKBB and Helsinki summary statistics and their fixed effect model meta-analysis. The bolded SNP was the only one to reach a suggestive association (p<10−5) in both cohorts.

https://doi.org/10.7554/eLife.37110.019
UKBB cohortHelsinki cohortMeta-analysis (fixed eff.)
rs-codeChrPositionABr2 (reference
SNP)
BetaSEPBetaSEPBetaSEP
rs17631680267090367TCNA0.0040.00072.1E-070.0050.00296.2E-020.0040.00074.29E-08
rs17355373128122820TCNA−0.0030.00051.2E-07−0.0040.00218.7E-02−0.0030.00053.01E-08
rs67751869454568834TC0.14 (rs62323680)−0.0050.00099.9E-08−0.0110.00393.7E-03−0.0050.00095.05E-09
rs69016316152567047TC0.26 (rs6904757)0.0030.00065.4E-080.0050.00311.2E-010.0030.00061.76E-08
rs117904089876418GT0.10 (rs4742448)0.0020.00046.5E-080.0020.00193.9E-010.0020.00044.89E-08
rs1172457331340723944GA0.00 (rs7986407)−0.0100.00174.2E-09−0.0240.00538.1E-06−0.0120.00173.18E-12
rs104153911922652436CTNA−0.0040.00072.8E-07−0.0060.00282.0E-02−0.0040.00072.87E-08
rs621328011949267882ATNA0.0040.00073.3E-070.0060.00292.8E-020.0040.00074.20E-08
Appendix 1—table 4
Genomic risk score.

Summary of the GRS and its weights based on the discovery and meta-analysis stages. Dosage-based odds-ratios (OR) and log-odds were collected from the UKBB summary statistics. The A and B alleles are on GRCh37 forward strand, and the B allele is the effect allele. In total 50 SNPs from stage 1 and seven SNPs from stage 2.

https://doi.org/10.7554/eLife.37110.020
rs-codeChrPositionABORLog-oddsStage *
rs2235529122450487CT1.140.132Stage 1
rs2092315122507684CT1.070.072Stage 1
rs10929757211702661AC1.080.073Stage 1
rs11674184211721535TG0.94−0.067Stage 1
rs109366003169514585AT0.91−0.095Stage 1
rs1438352933197623337AG1.750.558Stage 1
rs62323680454546192GA1.160.153Stage 1
rs2202282470634441CT1.070.071Stage 1
rs7270945851283755CT1.120.111Stage 1
rs273610051286516CA0.91−0.090Stage 1
rs285367651288547TC0.91−0.089Stage 1
rs24561815176450837CG1.070.066Stage 1
rs48700846152543949CT0.92−0.087Stage 1
rs69283636152546094GA0.92−0.078Stage 1
rs584154806152562271CG1.180.167Stage 1
rs755102046152592680TG1.290.257Stage 1
rs69047576152593102AG0.93−0.078Stage 1
rs1444445836152684585TC1.120.111Stage 1
rs1388210789674217CG1.230.205Stage 1
rs109758209684160GA1.120.112Stage 1
rs70276859802228AT1.110.103Stage 1
rs1146803319815682TC1.110.108Stage 1
rs47424489826585CG1.070.066Stage 1
rs22771639827224AG0.87−0.140Stage 1
rs126516410105674854AG0.91−0.097Stage 1
rs1124600311213723TG0.85−0.167Stage 1
rs50713911225196GA0.84−0.171Stage 1
rs22075481132368744CA1.090.091Stage 1
rs108358891132370380GA1.140.133Stage 1
rs71204831132406983GC1.120.112Stage 1
rs110317831132459923CA1.090.084Stage 1
rs25537721135085453TG1.070.069Stage 1
rs5902156511107999907CG1.110.109Stage 1
rs14137900911108149207TG1.320.281Stage 1
rs498802311108168995AC0.89−0.113Stage 1
rs1222338111108354102CT1.070.064Stage 1
rs96694031246798900GA1.070.071Stage 1
rs128327771246831129TC1.090.086Stage 1
rs1172457331340723944GA1.260.231Stage 1
rs79864071341179798AG1.090.084Stage 1
rs669982221651481596GA0.91−0.098Stage 1
rs78378222177571752TG1.530.427Stage 1
rs7333812240669648AG1.100.091Stage 1
rs5936989X70022420TA0.92−0.081Stage 1
rs5937008X70093038CT0.91−0.094Stage 1
rs7059898X70149078CA1.070.068Stage 1
rs7888560X131171122AG1.080.077Stage 1
rs5930554X131312089TC1.140.129Stage 1
rs5933158X131578034AG1.080.074Stage 1
rs5975338X131626317AG1.100.094Stage 1
rs17631680267090367TC0.90−0.102Stage 2
rs17355373128122820TC1.070.071Stage 2
rs67751869454568834TC1.130.121Stage 2
rs69016316152567047TC0.91−0.097Stage 2
rs117904089876418GT0.94−0.063Stage 2
rs104153911922652436CT1.100.098Stage 2
rs621328011949267882AT0.90−0.105Stage 2
  1. * Discovered in stage 1 (UKBB GWAS) or in stage 2 (meta-analysis of UKBB and Helsinki)

Appendix 1—table 5
Summary of association tests for SNPs.

Each of the 57 GRS SNPs was tested for an additive effect to age at hysterectomy, degree of somatic allelic imbalance (AI) and tumor counts. Somatic allele imbalance was defined as the mean of the length of somatic allelic imbalance over all tumors of a patient. Tests of MED12 mutation positive and negative tumors are denoted by MED12mut + and MED12mut-, respectively. The numbers for regression coefficient (Beta), standard error of beta (SE), test statistic (z) and association (P) were taken by fitting either a linear regression or negative binomial (NB) regression model (response ~predictor). The nominal P-values were adjusted for FDR (Q). The risk allele was used as the effect allele for Beta. In total 228 tests, all p<0.05 are shown.

https://doi.org/10.7554/eLife.37110.021
PredictorResponseModelBetaSEStatisticPQ
12:46798900:G:AMED12mut + countNB−0.220.08−2.880.0040.51
9:674217:C:GMED12mut + countNB−0.690.26−2.630.0080.51
X:70093038:C:TMED12mut + countNB0.210.082.620.0090.51
X:70022420:T:AMED12mut + countNB0.200.082.560.0110.51
11:32459923:C:AMED12mut + countNB0.230.092.540.0110.51
4:54568834:T:Clog(somatic AI basepairs)Linear−0.250.10−2.480.0130.51
17:7571752:T:GMED12mut- countNB−0.920.41−2.240.0250.56
6:152546094:G:AMED12mut- countNB−0.190.08−2.240.0250.56
11:107999907:C:GAge at hysterectomyLinear3.001.342.240.0260.56
13:40723944:G:AMED12mut + countNB0.360.172.170.0300.56
22:40669648:A:GMED12mut + countNB0.190.092.150.0310.56
5:1286516:C:AMED12mut + countNB0.170.082.150.0310.56
22:40669648:A:GAge at hysterectomyLinear1.360.632.160.0320.56
1:22450487:C:TAge at hysterectomyLinear−1.480.72−2.050.0410.65
16:51481596:G:Alog(somatic AI basepairs)Linear0.170.082.020.0440.65
5:1288547:T:CAge at hysterectomyLinear1.320.662.000.0460.65
Appendix 1—table 6
Summary of all the association tests for GRS.

All GRS related tests from the main text. The notation of MED12mut + and MED12mut- refer to the numbers of MED12-mutation-positive and -negative tumors, respectively. The tests include Wilcoxon rank-sum and models for linear and negative binomial (NB) regression (variable ~GRS). The P values were adjusted for FWER with the Holm-Bonferroni method (Q). Significant associations (Q < 0.05) are shown bolded. Note that population association tests include only the female controls.

https://doi.org/10.7554/eLife.37110.022
GRS *CohortVariableTestN casesN controlsRate ratioPQ
 Stage 1HelsinkiUL phenotypeRank-sum (one-tailed)4578899-8.3e-101.1e-08
 Stage 2NFBCUL phenotypeRank-sum (one-tailed)4592351-1.1e-051.1e-04
 Stage 2HelsinkiTotal number of ULsNB457-1.250.001050.0032
 Stage 2HelsinkiAge at hysterectomyLinear392-0.500.480.48
 Stage 2HelsinkiNumber of MED12mut+NB457-1.433.2e-040.002
 Stage 2HelsinkiNumber of MED12mut-NB457-0.790.02660.053
 Stage 2HelsinkiOne-or-more MED12mut+Rank-sum334123-5.3e-040.0026
 Stage 2HelsinkiAll MED12mut+Rank-sum221123-7.9e-040.0032
 Stage 2African #UL phenotypeRank-sum (one-tailed)2961256-1.3e-051.2e-04
 Stage 2Caribbean#UL phenotypeRank-sum (one-tailed)6682041-6.7e-055.4e-04
 Stage 2Irish #UL phenotypeRank-sum (one-tailed)3986208-8.3e-069.1e-05
 Stage 2Indian #UL phenotypeRank-sum (one-tailed)2032567-2.9e-040.0020
 Stage 2Other white #UL phenotypeRank-sum (one-tailed)6478982-6.9e-098.3e-08
  1. *Based either on stage 1 (UKBB GWAS) or stage 2 (meta-analysis of UKBB and Helsinki)

    # An independent subset of UKBB data (stratified based on self-reported UKBB annotation).

Appendix 1—table 7
Population specific estimates for GRS.

Allele frequencies were collected from gnomAD (version 2.0.1). GRS weights were taken from UKBB summary statistics. The estimate for population specific GRS follows from weighting the allele frequencies with log-odds. The final values were scaled with a factor two to make them comparable with SNP dosage-based GRS. See Appendix-figure 6 for an explanation of the population acronyms.

https://doi.org/10.7554/eLife.37110.023
rs-codeChrPosABAF_AFRAF_AMRAF_ASJAF_EASAF_FINAF_NFEAF_OTH
 rs584154806152562271CG0.2280.0660.0900.0000.2370.1610.158
 rs78378222177571752TG0.0020.0040.0000.0000.0200.0180.015
 rs108358891132370380GA0.3840.0910.2020.0520.0690.1500.124
 rs14137900911108149207TG0.0050.0140.0230.0000.0070.0220.015
rs70276859802228AT0.5350.6020.3570.7380.5560.4130.462
rs2235529122450487CT0.0290.3130.0460.4840.1640.1580.171
 rs7270945851283755CT0.1390.1370.1830.1340.2210.1880.180
rs22075481132368744CA0.5530.2280.3930.1790.3210.4110.378
rs273610051286516CA0.5500.5970.4770.6060.5070.5040.539
rs50713911225196GA0.0280.0580.0790.0070.0940.0720.084
 rs62323680454546192GA0.0200.0320.0990.0020.0450.0700.060
rs285367651288547TC0.7650.8010.6130.8560.7830.7390.763
 rs109758209684160GA0.6690.3970.2120.5860.2570.1710.255
 rs109366003169514585AT0.0940.4060.1950.5390.2740.2490.267
rs69283636152546094GA0.6920.6930.5410.8510.5750.5080.568
rs79864071341179798AG0.4460.4590.2770.7390.3840.3110.367
 rs1444445836152684585TC0.0500.0450.0440.0060.2300.1390.155
 rs1438352933197623337AG0.0000.0040.0030.0000.0150.0020.006
 rs128327771246831129TC0.9180.7400.7620.9020.6510.6750.669
rs71204831132406983GC0.2420.0870.1520.0250.0740.1120.098
rs498802311108168995AC0.0220.0440.1420.0060.2360.1460.149
rs7333812240669648AG0.0950.4710.1690.2470.2580.2160.239
 rs669982221651481596GA0.0330.1380.2650.0010.1660.2130.167
rs69047576152593102AG0.4430.5790.4010.6250.3790.3920.401
rs2202282470634441CT0.2300.5490.4200.7260.4370.4910.466
 rs10929757211702661AC0.1160.5050.5560.5170.6740.5910.556
rs96694031246798900GA0.6150.4760.4060.2660.3960.3820.369
 rs110317831132459923CA0.2620.3640.2880.7480.1970.2040.241
 rs1146803319815682TC0.0650.1780.0670.1280.1900.1150.133
 rs1172457331340723944GA0.0010.0040.0030.0000.0300.0240.027
rs25537721135085453TG0.3420.6560.6000.7690.5240.5440.569
rs24561815176450837CG0.7960.3550.5330.4810.5000.5110.505
rs47424489826585CG0.8380.6290.4730.7850.5480.4710.503
rs22771639827224AG0.9590.8580.9640.8250.9660.9480.954
rs48700846152543949CT0.1000.4210.1750.1880.2340.2110.238
rs126516410105674854AG0.4690.8910.8250.9850.8900.8630.873
 rs755102046152592680TG0.0020.0020.0000.0000.0150.0110.011
 rs1124600311213723TG0.0600.0470.0330.1490.0220.0350.038
 rs1388210789674217CG0.0020.0040.0000.0000.0170.0250.014
 rs5902156511107999907CG0.0380.1120.1190.0490.0500.0860.054
rs2092315122507684CT0.1320.3230.2420.5040.2000.2430.210
 rs1222338111108354102CT0.4130.6060.4770.4440.3900.4140.457
 rs11674184211721535TG0.2590.4760.3340.4140.4030.3840.389
rs5930554X131312089TC0.7520.1740.2960.1040.3200.3190.311
rs5937008X70093038CT0.1450.3830.4140.5630.4860.5200.485
rs5933158X131578034AG0.7100.5880.5000.6860.6500.6050.741
rs5936989X70022420TA0.7420.8880.7670.9940.6770.7780.810
rs5975338X131626317AG0.4090.1030.0990.0910.1610.1450.151
rs7059898X70149078CA0.1430.1670.3550.0010.3400.3420.276
rs7888560X131171122AG0.1420.0900.1990.0000.2440.2170.201
 rs67751869454568834TC0.0850.0300.0990.0220.0530.0620.067
rs69016316152567047TC0.2560.2490.1820.1220.1020.1200.127
 rs104153911922652436CT0.2020.0740.1100.2250.1300.1070.144
rs17355373128122820TC0.5230.3850.2620.4250.2490.2510.288
 rs621328011949267882AT0.0130.0700.0530.0090.1120.1030.092
 rs17631680267090367TC0.0180.0610.0500.0010.1000.1020.082
 rs117904089876418GT0.1100.3070.3870.2460.3550.3800.353
AFRAMRASJEASFINNFEOTH
GRS4.7184.0594.0684.1964.1684.0914.116
Appendix 1—table 8
eQTLs in tumor and normal tissue.

Here, B is the effect allele. All local permutation p<0.05 are shown. Full table of eQTLs is given in Supplementary file 4.

https://doi.org/10.7554/eLife.37110.024
TissueGeneIDNBest SNPDistanceNominal
P
Permutation
P
FDR
TRP11-
816J6.3
ENSG00000269889.182rs67795055424579.10E-054.03E-040.224
TRUFY3ENSG00000018189.88rs7660770−9729457.84E-046.68E-040.370
TTRIP13ENSG00000071539.922rs27360993945811.63E-042.27E-031
NRN7SL832PENSG00000243819.328rs75709798869584.06E-042.75E-031
NLNX1ENSG00000072201.925rs623236742123007.12E-043.22E-031
NRP11-
849F2.8
ENSG00000269928.15rs143094271−6650382.43E-034.00E-031
TMIR5006ENSG00000264190.1158rs9549254−9076606.10E-044.42E-031
NRP11-
423H2.3
ENSG00000249684.150rs183686−8832353.36E-035.55E-031
NSLC7A3ENSG00000165349.7148rs43604509656.75E-047.38E-031
TALPLENSG00000162551.952.5213708.48E-047.59E-031
NPRRG4ENSG00000135378.3167rs11031737−4787183.99E-048.59E-031
NRP11-
791G15.2
ENSG00000272275.128rs64322208027141.24E-038.81E-031
TZDHHC11BENSG00000206077.622rs27360995768649.74E-041.02E-021
TACAP1ENSG00000072818.75rs783782223319034.83E-031.04E-021
NHNRNPA1P48ENSG00000224578.320rs55881068−2122664.97E-031.12E-021
TWNT4ENSG00000162552.1052rs12042083289331.30E-031.16E-021
TLRRIQ4ENSG00000188306.682rs13074500258602.96E-031.20E-021
TTEX11ENSG00000120498.9148rs117956272086501.30E-031.23E-021
TAC011994.1ENSG00000265583.128rs11685032−732361.84E-031.32E-021
TFRMD7ENSG00000165694.5215rs59331583670128.40E-041.33E-021
NTRAPPC1ENSG00000170043.75rs138420351−1336017.23E-031.44E-021
TRP11-
423H2.1
ENSG00000170089.1172rs353496−8065119.37E-031.51E-021
NSDHAP3ENSG00000185986.1022rs11742908−2976551.42E-031.58E-021
TCDC42-IT1ENSG00000230068.252rs109171671184002.10E-031.80E-021
NRP11-
362K14.6
ENSG00000270096.182rs12638862−352454.49E-031.96E-021
TTNRC6BENSG00000100354.1632rs124849512622538.29E-031.97E-021
NDNAH2ENSG00000183914.105rs143094271−1575719.73E-031.98E-021
NSALL1ENSG00000103449.720rs129336863089139.53E-032.13E-021
TXXyac-
YRM2039.2
ENSG00000181404.1371rs1388210786597051.69E-032.22E-021
NITPRIPENSG00000148841.118rs9419958−3959491.43E-022.39E-021
TRNASEK-
C17orf49
ENSG00000161939.145.5016851.19E-022.43E-021
TUQCRFS1P1ENSG00000226085.232rs18075792569261.06E-022.44E-021
TFOXO1ENSG00000150907.6166rs95492601242992.94E-032.61E-021
TNDUFS6ENSG00000145494.722rs7734992−5213872.81E-032.82E-021
TRP11-
259O2.3
ENSG00000249731.122rs7710703−6807042.89E-032.84E-021
TCDC42ENSG00000070831.1152rs41307810−250763.35E-032.91E-021
TRP3-
363L9.1
ENSG00000227064.1215rs3764771−8300732.03E-032.97E-021
TRP11-
362K14.7
ENSG00000270135.182rs67795055186117.45E-033.07E-021
NAMIGO2ENSG00000139211.5155rs9706162−6068086.28E-033.08E-021
TLPIN1ENSG00000134324.728rs11685032−1434275.06E-033.33E-021
TMBNL3ENSG00000076770.10215rs12631551907182.55E-033.37E-021
NTOX3ENSG00000103460.125rs12325192−9837893.24E-023.38E-021
NDANCRENSG00000226950.223rs623254828858238.24E-033.54E-021
TDMRT1ENSG00000137090.771rs12004436−1582613.09E-033.64E-021
NNT5C2ENSG00000076685.148rs12651648289132.05E-023.75E-021
TTMEM256-
PLSCR3
ENSG00000187838.125rs1384203514070161.77E-023.81E-021
TRP11-
401P9.4
ENSG00000261685.220rs129336867990791.64E-023.84E-021
NCTC1ENSG00000178971.95rs143094271−6670901.89E-024.03E-021
NSULT1B1ENSG00000173597.419rs13133166133181.49E-024.04E-021
NTNK1ENSG00000174292.85rs1430942711792482.24E-024.21E-021
TSOX15ENSG00000129194.35rs1384203512085662.02E-024.30E-021
TUGT2A1ENSG00000173610.719rs15877661129592.06E-024.31E-021
TRP4-
607I7.1
ENSG00000255521.134rs2553783−820852.66E-024.41E-021
NSENP3ENSG00000161956.85rs783782221065592.19E-024.45E-021
NKDM6BENSG00000132510.65rs143094271−2801212.16E-024.54E-021
TSKILENSG00000136603.982rs34194057−5259161.13E-024.63E-021
TEPHB2ENSG00000133216.1252.−6801046.04E-034.97E-021
Appendix 1—table 9
cis-meQTLs in candidate genes where FDR < 0.1.

The statistics were calculated with an additive linear model. Here B is used as the effect allele. Full Table of cis-meQTLs with nominal significance (p-value<0.05) is given in Supplementary file 5.

https://doi.org/10.7554/eLife.37110.025
TissueSNPCpG*Betat-statp-valueFDRGeneCpG islandPromoterAAABBB
N13_
41179798
13_
41189143
−0.413−19.8376.57E-204.88E-15FOXO1nono15164
T13_
41179798
13_
41189143
−0.333−8.2963.29E-117.74E-07FOXO1nono28235
T5_12837555_
1285974
−0.427−7.1672.21E-093.46E-05TERTnono36173
N5_
1283755
5_
1285974
−0.461−7.461.42E-081.11E-04TERTnono18143
T5_
1286516
5_
1277576
−0.289−6.3584.53E-083.99E-04TERTyesno72920
T1_
22450487
1_
22456326
−0.123−5.5071.04E-066.36E-03WNT4nono40151
N5_
1286516
5_
1285974
0.4265.4774.51E-061.76E-02TERTnono6227
T5_
1286516
5_
1285974
0.2994.7391.61E-054.82E-02TERTnono72920
N13_
41179798
13_
41223110
−0.174−4.9572.10E-055.36E-02FOXO1nono15164
N5_
1283755
5_
1282413
−0.17−4.8163.16E-057.12E-02TERTnono18143
T11_
225196
11_
195431
0.2224.4524.31E-059.21E-02BET1LnoODF34880
  1. * Minimum CpG coverage of ≥2 required in ≥90% of samples. CpG in 1 Mb flank from the SNP.

Appendix 1—table 10
Replication of previously suggested UL predisposition loci.

For study details, see the literature references in the main text. The population, locus, gene and risk allele (RA) information were collected from the original studies. The associations, P (UKBB) column, were taken from the UKBB summary statistics for 15,453 UL cases. The bolded values pass FWER (Bonferroni for seven independent loci; p<0.05/7).

https://doi.org/10.7554/eLife.37110.026
StudyPopulationLocusSuggested geneSNPRAMethodP (UKBB)
Cha et al.Japanese22q13.1TNRC6Brs12484776GGWAS1.0E-10
Edwards et al.European Americans22q13.1TNRC6Brs12484776GGWAS1.0E-10
Cha et al.Japanese11p15.5BET1Lrs2280543GGWAS2.2E-08
Edwards et al.European Americans11p15.5BET1Lrs2280543GGWAS2.2E-08
Zhang et al.African Americans1q42.2PCNXL2rs7546784-Admixture3.6E-02
Zhang et al.African Americans2q32.2PMS1rs256552-Admixture1.8E-01
Cha et al.Japanese10q24.33SLKrs7913069AGWAS3.4E-01
Hellwege et al.African American22q13.1CYTH4rs5995416CGWAS5.3E-01
Hellwege et al.African American22q13.1CYTH4rs739187CGWAS6.2E-01
Hellwege et al.African American22q13.1CYTH4rs713939CGWAS8.0E-01
Hellwege et al.African American22q13.1CYTH4rs4821628GGWAS8.1E-01
Eggert et al.Multiple17q25.3FASNrs4247357ALinkage
and GWAS
8.1E-01
Appendix 1—table 11
GWAS catalog and references to earlier literature on GRS SNPs.

The GRS SNPs rs10936600, rs11674184, rs2235529, rs2736100, rs2853676, rs72709458 and rs78378222 were found in the GWAS catalog (version 1.0.1; www.ebi.ac.uk/gwas). The numbers for allele frequency (AF), odds-ratio (OR) and association (P) follow those reported in the GWAS catalog.

https://doi.org/10.7554/eLife.37110.027
PubmedDateJournalGene symbolRisk alleleAFORP
188358602008-10-01J Med GenetIdiopathic pulmonary fibrosisTERTrs2736100-A0.412.113.00E-08
195783672009-07-05Nat GenetGliomaTERTrs2736100-G0.491.272.00E-17
195783672009-07-05Nat GenetGliomaTERTrs2853676-A0.731.264.00E-14
198360082009-10-15Am J Hum GenetLung adenocarcinomaTERTrs2736100-G0.51.122.00E-10
201399782010-02-07Nat GenetRed blood cell countTERTrs2736100-G0.40.073.00E-08
205438472010-06-13Nat GenetTesticular germ cell cancerTERTrs2736100-T0.491.338.00E-15
207004382010-08-05PLoS GenetLung adenocarcinomaTERTrs2736100-G0.391.462.00E-22
208715972010-09-26Nat GenetLung adenocarcinomaTERTrs2736100-C0.391.273.00E-11
215317912011-04-29Hum Mol GenetGliomaTERTrs2736100-?-1.251.00E-14
217253082011-07-03Nat GenetLung cancerTERTrs2736100-C0.411.271.00E-27
218276602011-08-09BMC Med GenomicsGliomaTERTrs2736100-?--7.00E-09
219463512011-09-25Nat GenetBasal cell carcinomaTP53rs78378222-C-2.162.00E-20
228865592012-08-11Hum GenetGliomaTERTrs2736100-G0.4941.304.00E-09
231436012012-11-11Nat GenetLung cancerTERTrs2736100-G0.41.384.00E-27
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-T0.1521.287.00E-09
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-C0.7091.196.00E-06
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-T0.1341.258.00E-07
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-A0.1531.303.00E-09
235839802013-04-14Nat GenetInterstitial lung diseaseTERTrs2736100-A0.491.372.00E-19
244030522014-01-08Hum Mol GenetBasal cell carcinomaTP53rs78378222-G-2.244.00E-22
244654732014-01-21PLoS OneTelomere lengthTERTrs2736100-C0.084.00E-06
249082482014-06-08Nat GenetGlioma (high-grade)TERTrs2736100-C0.511.391.00E-15
 249454042014-06-19PLoS GenetBone mineral density (paediatric, upper limb)WNT4rs2235529-C0.850.121.00E-08
 249454042014-06-19PLoS GenetBone mineral density (paediatric, upper limb)WNT4rs2235529-C-0.123.00E-07
 258551362015-04-09Nat CommunBasal cell carcinomaTP53rs78378222-G0.0182.071.00E-20
 264240502015-10-01Nat CommunGlioblastoma-rs72709458-T-1.686.00E-24
273636822016-07-01Nat CommunMultiple myeloma-rs10936600-A-1.206.00E-15
275017812016-08-09Nat CommunEGFR mutation-positive lung adenocarcinomaTERTrs2736100-G0.3871.422.00E-31
275398872016-08-19Nat CommunBasal cell carcinomaTP53rs78378222-G0.011.412.00E-10
278632522016-11-17CellMean corpuscular hemoglobinTP53rs78378222-G0.01210.106.00E-09
278632522016-11-17CellMean corpuscular hemoglobinTERTrs2736100-A0.49820.045.00E-34
278632522016-11-17CellPlatelet countTERTrs2736100-A0.49840.033.00E-20
280173752016-12-22Am J Hum GenetMean corpuscular volumeTERTrs2736100-A-0.003.00E-06
280173752016-12-22Am J Hum GenetMean corpuscular volumeTERTrs2736100-?--2.00E-11
280173752016-12-22Am J Hum GenetMean corpuscular hemoglobinTERTrs2736100-?--1.00E-08
281352442017-01-30Nat GenetPulse pressureTP53rs78378222-T0.990.902.00E-10
283464432017-03-27Nat GenetGliomaTP53rs78378222-G0.0132.539.00E-38
283464432017-03-27Nat GenetGlioblastomaTP53rs78378222-G0.0132.635.00E-29
283464432017-03-27Nat GenetNon-glioblastoma gliomaTP53rs78378222-G0.0132.735.00E-27
285372672017-05-24Nat CommunEndometriosisGREB1rs11674184-T0.611.133.00E-17
285372672017-05-24Nat CommunEndometriosisGREB1rs11674184-T0.611.123.00E-14
286047282017-06-12Nat GenetTesticular germ cell tumorTERTrs2736100-A0.511.289.00E-25
286047322017-06-12Nat GenetTesticular germ cell tumorTERTrs2736100-A0.51.298.00E-20
Appendix 1—figure 1
Structure of UL predisposition loci.

The lead SNPs from stage one and their genomic context. The LD estimates (r2) were taken from UK10k ALSPAC. Associations are based on the UKBB cohort. Gene symbols and ENCODE tracks (details in Supplementary Methods) are shown for reference; coordinates follow hg19. See main text Table 1 for more information. In total 22 figures, ordered by genomic position.

https://doi.org/10.7554/eLife.37110.028
Appendix 1—figure 2
Clinical background information for the Helsinki cohort patients.

(A,) the number of ULs per patient (n = 457). (B), patient’s age at hysterectomy (n = 392). (C), parity (n = 367). (D), body mass index (BMI; n = 366). (E), menopause status (n = 367).

https://doi.org/10.7554/eLife.37110.029
Appendix 1—figure 3
MED12 mutation status distributions for the Helsinki cohort patients.

On left, mutation-positive tumors per patient (n = 457), and on right, mutation-negative tumors per patient (n = 457).

https://doi.org/10.7554/eLife.37110.030
Appendix 1—figure 4
Genetic separation between the self-reported ancestries in UK Biobank.

Principal components (PC) analysis of ancestry informative markers. In total 1396 autosomal, ancestry informative SNPs were used and the resulting first two PCs are shown: top-left plot shows all 23,266 samples colored according to their self-reported ancestry. The subsequent plots show the same data divided by the self-reported ancestry. Bottom-left plot displays the variance explained by the first ten PCs. Bottom-right violin plot displays the distribution of the first principal component for cases (n = 2,212) and controls (n = 21,054). The phenotype is more prevalent in individuals with increased African ancestry (p<2.2 × 10−16; Wilcoxon rank sum W = 3 × 107).

https://doi.org/10.7554/eLife.37110.031
Appendix 1—figure 5
Association between GRS and phenotype.

One somatic phenotype (MED12 mutation status) and six independent case-control replication cohorts. On left, density plots (bandwidth = 0.2) for each phenotype. X-axis gives GRS values scaled to unit variance with respect to the controls. Dashed lines denote the mean GRS for cases and controls. Associations (P) and test statistics (W) are from Wilcoxon rank-sum tests. C-scores, Nagelkerke’s R2 (pseudo R2) and Receiver operating characteristic (ROC; on right) are from a logistic regression model. Positive predictive values (PPV) were computed from the highest GRS quartile, and odds ratios (OR and 95% CI) from the top and bottom GRS quartiles.

https://doi.org/10.7554/eLife.37110.032
Appendix 1—figure 6
Genomic risk scores in gnomAD populations.

Population-specific genomic risk scores (GRS) as seen based on gnomAD allele frequencies. Y-axis shows the cumulative effect of each of the 57 GRS SNPs; the X-axis is ordered by risk allele frequency. The population names displayed here follow the naming convention of the gnomAD database.

https://doi.org/10.7554/eLife.37110.033
Appendix 1—figure 7
Association between GRS and number of ULs per patient.

The numbers are based on the Helsinki cohort data of 457 patients with all distinct tumors of ≥1 cm diameter harvested at hysterectomy; details on sample collection are given in Supplementary Methods. A, summary of the model. B, diagnostic plots for the model.

https://doi.org/10.7554/eLife.37110.034
Appendix 1—figure 8
Association between GRS and MED12 mutations.

The numbers are based on the Helsinki cohort data (457 patients). (A) summary of the negative binomial model for MED12-mutation-positive tumor counts. (B) diagnostic plots for the negative binomial model. (C, D) similar model for the MED12-mutation-negative tumor counts.

https://doi.org/10.7554/eLife.37110.035
Appendix 1—figure 9
meQTLs at TERT region.

Hi-C, TADs and CpG methylation around rs2736100 with an 1Mbp flank. The needle plot shows the meQTL associations (dashed lines at 10% FDR; green line denotes the two SNPs; gray ticks denote all CpGs tested; blue needle for positive coefficient, red for negative coefficient) for tumors (above x-axis; nAA = 8, nAB = 32, nBB = 16) and normals (below x-axis; nAA = 5, nAB = 22, nBB = 7). The only significant meQTLs were seen around TERT.

https://doi.org/10.7554/eLife.37110.036
Appendix 1—figure 10
Overview of the CpG methylation around TERT in tumor tissue.

There are several CpGs in TERT whose methylation level is associated with rs2736100 genotype (nominal p<0.05). Some of them are also detected in normal tissue (Appendix 1—Table 9). Hypomethylation refers to decreased methylation in BB vs. AA genotype and hypermethylation the opposite.

https://doi.org/10.7554/eLife.37110.037
Appendix 1—figure 11
MED12 expression.

Linear model. Here, A is the risk allele. Beta 0.247 ± 0.177.

https://doi.org/10.7554/eLife.37110.038
Appendix 1—figure 12
Telomere lengths in tumors and normals.

Here A denotes the risk allele (see Appendix 1—Table 1 for the allele information). In tumors, shorter telomere length is significantly associated with the risk allele at 5p15.33 (rs2736100) (Kruskal-Wallis test).

https://doi.org/10.7554/eLife.37110.039
Appendix 1—figure 13
The association between risk allele count and telomere length On X-axis, total number of risk alleles at TERT (rs72709458, rs2736100, rs2853676), TERC (rs10936600) and OBFC1 (rs1265164).

On Y-axis, estimated telomere length. Linear regression model p=0.055; 95% CI −408.511 – 4.704 per one risk allele.

https://doi.org/10.7554/eLife.37110.040

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
    Obstetrics & Gynecology
    1. K Gross
    (2000)
    Finding genes for uterine fibroids, Obstetrics & Gynecology, 95, Elsevier, 10.1016/S0029-7844(00)00715-8.
  18. 18
  19. 19
  20. 20
    MED12 mutation frequency in unselected sporadic uterine leiomyomas
    1. H-R Heinonen
    2. NS Sarvilinna
    3. J Sjöberg
    4. K Kämpjärvi
    5. E Pitkänen
    6. P Vahteristo
    7. N Mäkinen
    8. LA Aaltonen
    (2014)
    Fertility and sterility 102:1137–1142.
  21. 21
  22. 22
  23. 23
     Uterus Development
    1. MA Hill
    (2018)
    Embryology.
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
    Genome wide association and linkage analyses identified three loci-4q25, 17q23.2, and 10q11.21-associated with variation in leukocyte telomere length: the long life family study
    1. JH Lee
    2. R Cheng
    3. LS Honig
    4. M Feitosa
    5. CM Kammerer
    6. MS Kang
    7. N Schupf
    8. SJ Lin
    9. JL Sanders
    10. H Bae
    11. T Druley
    12. T Perls
    13. K Christensen
    14. M Province
    15. R Mayeux
    (2013)
    Frontiers in Genetics 4:e310.
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
    The 3D genome browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions
    1. Y Wang
    2. B Zhang
    3. L Zhang
    4. L An
    5. J Xu
    6. D Li
    7. MNK Choudhary
    8. Y Li
    9. M Hu
    10. R Hardison
    11. T Wang
    12. F Yue
    (2017)
    BioRxiv, 10.1101/112268.
  62. 62
  63. 63
  64. 64
  65. 65

Decision letter

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Genetic predisposition to uterine leiomyoma is determined by loci for genitourinaty development and genome stability" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Eduardo Franco as the Reviewing Editor and Charles Sawyers as the Senior Editor.

As a key feature of eLife's review process, the reviewers have discussed your paper and their critiques with one another in an online forum moderated by the Reviewing Editor and the Senior Editor. This process fosters consensus and provides additional guidance to assist authors in improving their work. Although your submission elicited much interest, our decision is that it needs major revisions before we can consider it for publication. We have captured in this decision letter the main points in the reviewers' critiques and in the subsequent discussion. Please keep in mind that the revised version may be shown to the reviewers of the original submission.

Summary:

Välimäki et al. performed a genome-wide association study (GWAS) of uterine leiomyomas with 15,453 cases and 392,628 controls to identify common genetic variants associated with uterine fibroids. In a discovery-phase GWAS across white women from the UK Biobank and Helsinki cohorts, the authors identify 22 independent loci to have significant associations with the phenotype of interest. Based on LD-pruning (r2 < 0.3), a set of 57 representative genetic variants were selected for quantifying the overall polygenic risk score/genomic risk score (GRS) in subsequent analyses. Rather than replicating the individual loci's associations with uterine fibroids, the authors demonstrate the reproducibility of elevated composite GRS scores in cases compared to controls in several analyses stratified by ancestry, clinical features, and MED12 mutation status. Follow-up functional studies in the manuscript are reported for gene expression, DNA methylation, and telomere length association analyses. The authors conclude that their reported observations indicate a genetic predisposition to uterine fibroids, a heritable component to MED12 somatic mutations in uterine fibroids, and an explanation for the increased prevalence of genitourinary disease in women of African origin.

Essential revisions:

1) Conclusions are not sufficiently justified by the findings:

This is a very interesting study that leverages several powerful resources to evaluate the relationship between GWAS variants and risk for uterine leiomyoma. This GWAS is poised to be the largest GWAS of uterine fibroids yet published. The authors identify several GWAS variants in discovery stage, but unfortunately, the effects did not replicate in stage 3 analyses. They did find strong associations between their polygenic risk score and uterine leiomyoma outcome and some promising findings from their secondary functional evaluates via tissue gene expression, DNA methylation, and telomere length analyses.

Primary concerns are with regard to the rigor and completeness of the post-GWAS studies. Current expectations of human genetics research require sufficient follow-up analysis to provide more comprehensive insight into the biological correlates of genetic associations. The numerous directions the authors took in the post-GWAS analyses made the paper lack a sense of synthesis or cohesion. There were a lot of analyses that were performed but the write up lacked focus and contained statements that were not by results. The reviewers recommended a more in-depth interpretation of findings and additional analyses (e.g., network or pathway analysis), in addition to tidying up the results.

Although the authors pursue a number of different follow-up analyses, none of the analyses are yet carried to completion, and therefore none are sufficiently mature to provide a strong, cohesive functional explanation of how the genes they have identified influence the risk for uterine fibroids. A more well-developed knowledge of the functional role of the associated loci in uterine fibroids will be an exciting addition to understanding the biology of these common neoplasms.

2) Group comparisons and other methodological details:

The authors do not provide sufficient detail on the tumor tissue experiments in the main text or how their experiments were performed. Some of these details are presented in the supplementary material but a minimal amount of additional details should be provided in the main text to allow readers to assess the overall quality of the analyses. Also, the supplementary materials describe the QC and processing details but details are missing regarding the statistical analyses of these tumor tissue studies.

Many of their functional analyses (e.g., gene expression, telomere length, and DNA methylation) involve comparing uterine fibroid samples to 'control' samples that are not well-defined. The authors should indicate whether the control tissue is patient-matched unaffected, adjacent myometrium or myometrial tissue isolated from a completely unaffected control.

In the discussion of gene expression analyses, the authors mention accessing 'myometrium' tissue on the GTEx Portal (Discussion, fourth paragraph). Currently, GTEx only provides data taken from the entire uterus. Moreover, given the nature of the individuals from whom GTEx tissue is derived (i.e., predominantly older individuals dying of natural causes), the relevance of these analyses to uterine gene expression in reproductive-aged women is questionable.

3) Statistical analysis:

In general the “Statistical analysis” Materials and methods section is not presented with sufficient detail in the main text.

Why did the author's perform Bonferroni corrections of the GWAS but FDR on other analyses? Would those analyses survive a Bonferroni Multiple testing correction?

GWAS meta-analyses across all three stages should be presented including heterogeneity statistics.

4) Ancestry and risk:

The authors use self-reported ancestry in their follow-up analyses. PCA-based analysis of ancestry-informative markers is necessary for ensuring homogeneity of the individual follow-up cohorts. The discovery and stage 2 meta-analyses were performed primarily in white European populations and did not replicate in the final stage 3 that included multiple ancestries, including those of African ancestry. The author's main conclusions are that their findings explain the occurrence of leiomyoma in women of African origin (both in the abstract and final discussion). Correlation between increased percentage of African ancestry and probability of reporting uterine fibroids is a necessary analysis for the conclusion that "[Their] findings… in part explain the more frequent occurrence of UL in women of African origin."

The authors should describe the frequencies of the variants evaluated in stage 2 and 3 for their respective populations. Is it possible that some of the variants that associated in stage 1 in European ancestry individuals are rare in other ancestral groups?

The interchanging usage of 'African American', 'Admixed American', and 'African' in the text should be addressed.

5) Correlations of findings with biology:

Recent work investigating the role of the WNT4/CDC42 locus in endometriosis has shown that CDC42 – not WNT4 – contributes to the endometriosis phenotype. In light of the evidence drawn from analyses in endometriosis, the work presented in this manuscript provides insufficient evidence to implicate WNT4 in uterine fibroids. Additional work to either verify or strengthen the specific implication of WNT4 (from the CDC42/WNT4 locus) is necessary, else the revised manuscript should more fully justify the conclusions around WNT4.

A more thorough and solid functional validation to prove the conclusions should be performed. Given the high frequency of MED12 mutations in leiomyomas the heritability to MED12 mutations is possibly one of the strongest points of the manuscript, however no real validation has been demonstrated in the study. The authors show a "trend" of an over-expression which cannot be taken as enough evidence for such a claim. What was the effect size in patients with both MED12 mutated positive and negative leiomyoma? A strong association between the risk allele rs5937008 and the MED12 status is described and the authors argue that the risk allele could facilitate selection of somatic MED12 mutations, possibly through increased expression.

UL can be differentiated in 3 mutually exclusive subtypes, with the MED12 mutant subtype been the most common. In the manuscript the authors refer and analyze carefully associations between risk alleles and MED12 mutation positive samples, however there is no mention to the other two subtypes. From the set of 1481 lesions included in the study, what was the distribution of tumors and patients with FH deficient or HMGA2 overexpressing lesions? Were FH mutation carriers included in the study? Was there a risk allele from the 22 significant loci that associated to any of these subtypes?

Although I understand that not all associations can be experimentally proven, a functional evaluation of this phenomenon and how would the rs5937008 drive MED12 mutations would greatly improve the manuscript and provide strong evidence of the rs5937008 allele and UL development. These data would be of great importance towards clinical management of patients.

TERT SNPs rs72709458; rs2736100; rs2853676 are known to be related to many different neoplasias. How did those relate to MED12 mutated tumors?

In the same fashion that for rs5937008, it would be of great interest to confirm a functional effect by which rs58415480 regulates ESR1 expression or the repercussion of the lead SNP at 2p on GREB1 mediated regulation of ESR1.

The authors divide the 22 risk loci into two global biological processes one of which is genomic instability and telomere maintenance. Genetic instability is a common mechanism driving tumorigenesis and molecular signatures of genomic instability can be extrapolated from tumor profiling, in this context, a molecular profile of the leiomyomas associated with ATM, TP53, TERT, TERC or OBFC1 could reveal strong evidence to validate the role of genetic instability in UL development.

6) Additional issues in interpretation:

The follow-up analyses rely solely on the composite GRS as a surrogate measurement for genetic association with the phenotype of interest. It is assumed that this limited metric is selected due to the underpowered nature of the small follow-up cohorts.

None of the results presented in the 'Association to Gene Expression' section are actually significant. The authors should consider removing this section from the main portion of the manuscript. Similarly, the 'negative trend' observed in the 'combined effect of TERT, TERC, and OBFC1' on telomere length is also insignificant. The fact that the combined effect of three major genes still is insignificant indicates some concern with interpretation of the results.

The authors state that they replicated in NFBC and in the other ethnic groups but the replication results for the other five ethnic groups were not in the supplementary material. Those should be provided. In addition, the authors state (subsection “Replication of the GWAS and GRS”) that the third stage replicated the results but later on in the same paragraph state that no associations passed multiple testing corrections for the single SNP associations. The authors should state clearly that one of the associations replicated in stage 3, based on the evidence they present.

There are multiple locations where the authors state that results significantly associated but later on state that the results did not pass multiple testing corrections. The authors should make it clear that the there was no evidence for significant association if effects did not pass multiple testing corrections. This language is also present in the Discussion, where results are described as replicating in stage 3.

https://doi.org/10.7554/eLife.37110.046

Author response

Essential revisions:

1) Conclusions are not sufficiently justified by the findings:

[…] Primary concerns are with regard to the rigor and completeness of the post-GWAS studies. Current expectations of human genetics research require sufficient follow-up analysis to provide more comprehensive insight into the biological correlates of genetic associations. The numerous directions the authors took in the post-GWAS analyses made the paper lack a sense of synthesis or cohesion. There were a lot of analyses that were performed but the write up lacked focus and contained statements that were not by results. The reviewers recommended a more in-depth interpretation of findings and additional analyses (e.g., network or pathway analysis), in addition to tidying up the results.

Although the authors pursue a number of different follow-up analyses, none of the analyses are yet carried to completion, and therefore none are sufficiently mature to provide a strong, cohesive functional explanation of how the genes they have identified influence the risk for uterine fibroids. A more well-developed knowledge of the functional role of the associated loci in uterine fibroids will be an exciting addition to understanding the biology of these common neoplasms.

We thank the reviewers for their encouraging comments and have now improved the presentation throughout the manuscript as requested. We put specific emphasis on making the follow-up analyses easier to follow and unified the writing. Our study indeed extends well beyond a traditional GWAS. We combine population, patient and tumor derived resources in order to give a truly unique view into the hereditary risk of ULs. The wide-range of our post-GWAS analyses will serve as an exceptional basis for the scientific community to design future, in-depth functional studies to better characterize and develop models for UL tumorigenesis.

The reviewers suggested an additional pathway-based analysis to help the interpretation of the risk loci. We followed this recommendation by applying the DEPICT framework (Pers et al., 2015), which is designed to integrate multiple GWAS loci for in silico target gene prioritization, pathway enrichment and tissue-specific expression profiling. For analysis details, see the main text and supplementary material. In brief, applying DEPICT on the set of genome-wide significant SNPs did not reveal any significant enrichment with the exception of one pathway related to induced stress. ATM was the highest-ranking target gene, and uterus/myometrium were among the highest-ranking tissue types.

2) Group comparisons and other methodological details:

The authors do not provide sufficient detail on the tumor tissue experiments in the main text or how their experiments were performed. Some of these details are presented in the supplementary material but a minimal amount of additional details should be provided in the main text to allow readers to assess the overall quality of the analyses. Also, the supplementary materials describe the QC and processing details but details are missing regarding the statistical analyses of these tumor tissue studies.

We relocated the key methodological details into the main text and refined the description of the processing details to address these issues. The Materials and methods section now includes implementation details of the UKBB sample and genotype QC, criteria use-d to identify UL cases, and the model used for association testing. Further key details on the meta-analysis, genomic risk score (GRS) and follow-up analyses were also relocated from supplementary material to the main text.

Many of their functional analyses (e.g., gene expression, telomere length, and DNA methylation) involve comparing uterine fibroid samples to 'control' samples that are not well-defined. The authors should indicate whether the control tissue is patient-matched unaffected, adjacent myometrium or myometrial tissue isolated from a completely unaffected control.

We have clarified the text regarding the sample types involved in the follow-up analysis. All the controls were patient-matched, unaffected myometrium. Please see the Materials and methods section for exact numbers of each sample type.

In the discussion of gene expression analyses, the authors mention accessing 'myometrium' tissue on the GTEx Portal (Discussion, fourth paragraph). Currently, GTEx only provides data taken from the entire uterus. Moreover, given the nature of the individuals from whom GTEx tissue is derived (i.e., predominantly older individuals dying of natural causes), the relevance of these analyses to uterine gene expression in reproductive-aged women is questionable.

We understand this point. To be on the safe side, these references and the respective parts of discussion were removed from the Discussion section.

3) Statistical analysis:

In general the “Statistical analysis” Materials and methods section is not presented with sufficient detail in the main text.

We relocated the most relevant sections of the supplementary methods into the main text. This included all the statistical methods used for GWAS, meta-analysis and GRS association, as well as details on the eQTL and meQTL identification.

Why did the author's perform Bonferroni corrections of the GWAS but FDR on other analyses? Would those analyses survive a Bonferroni Multiple testing correction?

FDR was chosen whenever the total number of tests was expected to be large, that is, a family of one hundred or more multiple hypotheses. Controlling for FWER at this scale would lack discovery power.

GWAS meta-analyses across all three stages should be presented including heterogeneity statistics.

We added supplementary files with the requested information: for each cohort, we report the allele frequency, association effect size and P-value of each of the 57 GRS SNPs. Supplementary file 1 gives these statistics together with heterogeneity estimates, Cochrane's Q and I2 index, for the UKBB and Helsinki cohorts. Supplementary file 2 gives the SNP statistics for UKBB, Helsinki and NFBC. Supplementary file 3 contains these information for the five UKBB follow-up cohorts.

4) Ancestry and risk:

The authors use self-reported ancestry in their follow-up analyses. PCA-based analysis of ancestry-informative markers is necessary for ensuring homogeneity of the individual follow-up cohorts. The discovery and stage 2 meta-analyses were performed primarily in white European populations and did not replicate in the final stage 3 that included multiple ancestries, including those of African ancestry. The author's main conclusions are that their findings explain the occurrence of leiomyoma in women of African origin (both in the abstract and final discussion). Correlation between increased percentage of African ancestry and probability of reporting uterine fibroids is a necessary analysis for the conclusion that "[Their] findings… in part explain the more frequent occurrence of UL in women of African origin."

The requested PCA-based analysis is now included: a short summary was appended to the Materials and methods section, together with new supplementary material. In short, we combined 21 panels of ancestry informative markers – in total 1,396 unique SNPs – and compared the resulting PCs against the self-reported ancestry. As expected, the genetic differences at these informative markers separated each self-reported ancestry into a distinct, homogeneous cluster. Of note, the first principal component differentiated both the ‘African’ and ‘Black Caribbean’ cohorts from the others. Please see Appendix—Figure 4 for the resulting clustering.

To address reviewer’s concern on “correlation between increased percentage of African ancestry and probability of reporting ULs”, we note the following: First, the observed proportions of reported UL cases in the ‘African’ and ‘White British’ cohorts (19.1% and 7.0%, respectively) do match with the literature estimate of a 2-3 times higher prevalence of ULs in African compared to Caucasian ancestry (see also the references in the Results section).

Second, a correlation between increased percentage of African ancestry and reported ULs has seen overwhelming evidence in literature, and to no surprise, we see a strong support for it also in our data. We extended the above ancestry informative marker analysis: the first principal component was chosen as a proxy for the degree of African ancestry, and a comparison of the 2,212 UL cases and 21,054 controls did show a significant difference in their distributions, that is, the prevalence increases in degree of African ancestry (P<2.2⨉10-16; Wilcoxon rank sum W=3⨉107) as expected. See Appendix—Table 1 for the background information on each follow-up cohort.

The authors should describe the frequencies of the variants evaluated in stage 2 and 3 for their respective populations. Is it possible that some of the variants that associated in stage 1 in European ancestry individuals are rare in other ancestral groups?

We added supplementary files with the requested information. The follow-up cohorts do carry some differences regarding their allele frequency and heterogeneity of effect size as expected (Supplementary file 1-3). To answer reviewer’s concern, we note that the mean GRS was relatively even across the different population-strata among both the follow-up cohorts (Figure 4) and gnomAD (Appendix—Figure 6). That is, none of the follow-up populations displayed a deflated GRS or exceptionally rare risk alleles.

The interchanging usage of 'African American', 'Admixed American', and 'African' in the text should be addressed.

We unified the terminology as requested.

5) Correlations of findings with biology:

Recent work investigating the role of the WNT4/CDC42 locus in endometriosis has shown that CDC42 – not WNT4 – contributes to the endometriosis phenotype. In light of the evidence drawn from analyses in endometriosis, the work presented in this manuscript provides insufficient evidence to implicate WNT4 in uterine fibroids. Additional work to either verify or strengthen the specific implication of WNT4 (from the CDC42/WNT4 locus) is necessary, else the revised manuscript should more fully justify the conclusions around WNT4.

WNT4 (and WNT/β-catenin signaling pathway) plays a significant role in the myomagenesis of MED12 mutated leiomyomas. In addition, the evidence in our tumor data – risk SNP associating with increased WNT4 expression but not CDC42 expression – leans towards WNT4 as the predisposing gene. However we do agree that this evidence is not unambiguous and have revised the discussion to better acknowledge the possible role of CDC42 (see also Table 1 and Figure 2).

A more thorough and solid functional validation to prove the conclusions should be performed. Given the high frequency of MED12 mutations in leiomyomas the heritability to MED12 mutations is possibly one of the strongest points of the manuscript, however no real validation has been demonstrated in the study. The authors show a "trend" of an over-expression which cannot be taken as enough evidence for such a claim. What was the effect size in patients with both MED12 mutated positive and negative leiomyoma? A strong association between the risk allele rs5937008 and the MED12 status is described and the authors argue that the risk allele could facilitate selection of somatic MED12 mutations, possibly through increased expression.

We recognize that the expression trend is not enough evidence for the claims and thus such wording is removed from the discussion. The effect size with regard to expression was larger in wild type tumors compared to MED12 mutated tumors (0.03 and 0.003 respectively). Thus if the expression difference is of significance it would seem that the risk SNP confers higher MED12 expression, but after the mutation this addition loses some of its selective power (as the impact of expression level is secondary as compared to the effect of the mutation). This is of course very speculative and indeed the relation between MED12 expression and rs5937008 is not discussed in the revised paper.

UL can be differentiated in 3 mutually exclusive subtypes, with the MED12 mutant subtype been the most common. In the manuscript the authors refer and analyze carefully associations between risk alleles and MED12 mutation positive samples, however there is no mention to the other two subtypes. From the set of 1481 lesions included in the study, what was the distribution of tumors and patients with FH deficient or HMGA2 overexpressing lesions? Were FH mutation carriers included in the study? Was there a risk allele from the 22 significant loci that associated to any of these subtypes?

Other subtype-specific associations were not evaluated due to the GRS association merely to patients with MED12 mutated lesions. Efforts to scrutinize the more rare subtypes will also require larger numbers of these subtype cases as discussed in the manuscript.

Although I understand that not all associations can be experimentally proven, a functional evaluation of this phenomenon and how would the rs5937008 drive MED12 mutations would greatly improve the manuscript and provide strong evidence of the rs5937008 allele and UL development. These data would be of great importance towards clinical management of patients.

In the same fashion that for rs5937008, it would be of great interest to confirm a functional effect by which rs58415480 regulates ESR1 expression or the repercussion of the lead SNP at 2p on GREB1 mediated regulation of ESR1.

These are truly important aspects but, unfortunately, not feasible in the scope of this manuscript. Typically, taking GWAS findings to this level has taken years of work.

TERT SNPs rs72709458; rs2736100; rs2853676 are known to be related to many different neoplasias. How did those relate to MED12 mutated tumors?

We made no a priori hypothesis regarding these specific SNPs and MED12 mutated tumors. That said, when controlling for multiple testing over all 57 GRS SNPs, the above TERT SNPs did not show significant associations to the MED12 status. See Appendix—Table 5 for the details and any nominal associations.

The authors divide the 22 risk loci into two global biological processes one of which is genomic instability and telomere maintenance. Genetic instability is a common mechanism driving tumorigenesis and molecular signatures of genomic instability can be extrapolated from tumor profiling, in this context, a molecular profile of the leiomyomas associated with ATM, TP53, TERT, TERC or OBFC1 could reveal strong evidence to validate the role of genetic instability in UL development.

As mentioned in the manuscript – and further clarified in the revised version – we were not able to see a correlation between genetic instability and the risk SNPs. This could be due to the lack of power in whole-genome analysis as well as the technical limitations (e.g. inability to detect short copy number changes and copy neutral translocations) of SNP array data.

6) Additional issues in interpretation:

The follow-up analyses rely solely on the composite GRS as a surrogate measurement for genetic association with the phenotype of interest. It is assumed that this limited metric is selected due to the underpowered nature of the small follow-up cohorts.

The composite GRS approach has seen a wide range of applications in genetics research and, indeed, the approach was chosen due to the limitations of the follow-up cohorts.

None of the results presented in the 'Association to Gene Expression' section are actually significant. The authors should consider removing this section from the main portion of the manuscript. Similarly, the 'negative trend' observed in the 'combined effect of TERT, TERC, and OBFC1' on telomere length is also insignificant. The fact that the combined effect of three major genes still is insignificant indicates some concern with interpretation of the results.

We thank the reviewers for these considerations. We choose to retain both the eQTL and the telomere length results. In the eQTL analysis, our discovery power is limited due to the combination of a large family of hypotheses and restricted number of samples per genotype. While restricted number of samples is also a problem in the telomere analysis, some of the associations do reach statistical significance (e.g. the effect of rs2736100 on telomere length). All in all, we trust that the reported data does add value towards designing future and more focused experiments.

The authors state that they replicated in NFBC and in the other ethnic groups but the replication results for the other five ethnic groups were not in the supplementary material. Those should be provided. In addition, the authors state (subsection “Replication of the GWAS and GRS”) that the third stage replicated the results but later on in the same paragraph state that no associations passed multiple testing corrections for the single SNP associations. The authors should state clearly that one of the associations replicated in stage 3, based on the evidence they present.

The supplementary material did already include all the GRS-based analyses and summary statistics in Appendix—Table 6(“Summary of all the association tests for GRS”) and Appendix—Figure 5 (“Association between GRS and phenotype”). These show the case-control distribution of GRS in each of the six follow-up cohorts and the resulting association statistics. The GRS association to the UL phenotype did replicate in NFBC and in the five other ethnic cohorts.

Regarding the single SNP association tests, we followed the reviewer’s request and extended the supplementary material to include SNP summary statistics for all the cohorts. We have clarified the statement regarding single SNP associations and multiple testing correction (subsection “Replication of the GWAS and GRS”, second paragraph). Due to the limited size of the follow-up cohorts, a single SNP was underpowered to capture the phenotype association, thus, the above mentioned composite GRS was utilized to model UL risk and replicate the phenotype association. That said, we have now clarified the main text to make a more clear separation between the single-SNP and GRS based results and their interpretation.

There are multiple locations where the authors state that results significantly associated but later on state that the results did not pass multiple testing corrections. The authors should make it clear that the there was no evidence for significant association if effects did not pass multiple testing corrections. This language is also present in the Discussion, where results are described as replicating in stage 3.

We have improved the presentation and made the interpretation of results easier to follow. The reviewer’s concern was addressed by making a more clear distinction between single SNP and GRS based results and their interpretation.

https://doi.org/10.7554/eLife.37110.047

Article and author information

Author details

  1. Niko Välimäki

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis, Writing—original draft
    Contributed equally with
    Heli Kuisma
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9200-9560
  2. Heli Kuisma

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis, Writing—original draft
    Contributed equally with
    Niko Välimäki
    Competing interests
    No competing interests declared
  3. Annukka Pasanen

    Department of Pathology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Contribution
    Resources
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0079-9807
  4. Oskari Heikinheimo

    Department of Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Contribution
    Resources
    Competing interests
    No competing interests declared
  5. Jari Sjöberg

    Department of Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Contribution
    Resources
    Competing interests
    No competing interests declared
  6. Ralf Bützow

    Department of Pathology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Contribution
    Resources
    Competing interests
    No competing interests declared
  7. Nanna Sarvilinna

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    3. Department of Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    4. Institute of Biomedicine, Biochemistry and Developmental Biology, University of Helsinki, Helsinki, Finland
    Contribution
    Resources
    Competing interests
    No competing interests declared
  8. Hanna-Riikka Heinonen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  9. Jaana Tolvanen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1183-4943
  10. Simona Bramante

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  11. Tomas Tanskanen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  12. Juha Auvinen

    1. Northern Finland Birth Cohorts' Project Center, Faculty of Medicine, University of Oulu, Oulu, Finland
    2. Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu, Finland
    Contribution
    Resources
    Competing interests
    No competing interests declared
  13. Outi Uimari

    Department of Obstetrics and Gynecology, PEDEGO Research Unit, Medical Research Center Oulu, Oulu University Hospital, University of Oulu, Oulu, Finland
    Contribution
    Resources
    Competing interests
    No competing interests declared
  14. Amjad Alkodsi

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3528-4683
  15. Rainer Lehtonen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  16. Eevi Kaasinen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    3. Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  17. Kimmo Palin

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Supervision
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4621-6128
  18. Lauri A Aaltonen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Contribution
    Supervision
    For correspondence
    lauri.aaltonen@helsinki.fi
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6839-4286

Funding

Terveyden Tutkimuksen Toimikunta (1250345)

  • Lauri A Aaltonen

European Research Council (695727)

  • Lauri A Aaltonen

Syöpäjärjestöt

  • Lauri A Aaltonen

Sigrid Juséliuksen Säätiö

  • Lauri A Aaltonen

Jane ja Aatos Erkon Säätiö

  • Lauri A Aaltonen

Luonnontieteiden ja Tekniikan Tutkimuksen Toimikunta (287665)

  • Niko Välimäki
  • Niko Välimäki

NordForsk (62721)

  • Kimmo Palin

Terveyden Tutkimuksen Toimikunta (312041)

  • Lauri A Aaltonen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are thankful to Sini Marttinen, Sirpa Soisalo, Marjo Rajalaakso, Inga-Lill Åberg, Iina Vuoristo, Alison Ollikainen, Elina Pörsti, Salla Välipakka and Heikki Metsola for their technical support. We also thank Pirjo Ikonen and the rest of the staff of the Kätilöopisto Maternity Hospital, and the staff of the Department of Pathology, University of Helsinki for technical assistance. We thank Minna Männikkö, Tuula Ylitalo, and the rest of the staff of the Northern Finland Birth Cohort Studies, Faculty of Medicine, University of Oulu for technical assistance. The study was supported by grants from Academy of Finland (Finnish Center of Excellence Program 2012 – 2017, 2018 – 2025, No. 1250345 and 312041), European Research Council (ERC, 695727), Cancer Society of Finland, Sigrid Juselius Foundation and Jane and Aatos Erkko Foundation. NV received a grant from the Academy of Finland (No. 287665). KP received a grant from Nordic Information for Action eScience Center (NIASC), the Nordic Center of Excellence financed by NordForsk (No. 62721). This research has been conducted using the UK Biobank Resource, project #32506.

Ethics

Human subjects: The anonymous patient samples (65) were collected according to Finnish laws and regulations by permission of the director of the health care unit. For the rest of the patients, an informed consent was obtained. This study was conducted in accordance with the Declaration of Helsinki and approved by the Finnish National Supervisory Authority for Welfare and Health, National Institute for Health and Welfare (THL/151/5.05.00/2017), and the Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS/177/13/03/03/2016).

Publication history

  1. Received: March 29, 2018
  2. Accepted: September 17, 2018
  3. Accepted Manuscript published: September 18, 2018 (version 1)
  4. Version of Record published: October 26, 2018 (version 2)

Copyright

© 2018, Välimäki et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 550
    Page views
  • 152
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Developmental Biology
    2. Genetics and Genomics
    Young Sun Hwang et al.
    Short Report Updated
    1. Chromosomes and Gene Expression
    2. Genetics and Genomics
    Jürgen Jänes et al.
    Tools and Resources Updated