Genetic predisposition to uterine leiomyoma is determined by loci for genitourinary development and genome stability

  1. Niko Välimäki
  2. Heli Kuisma
  3. Annukka Pasanen
  4. Oskari Heikinheimo
  5. Jari Sjöberg
  6. Ralf Bützow
  7. Nanna Sarvilinna
  8. Hanna-Riikka Heinonen
  9. Jaana Tolvanen
  10. Simona Bramante
  11. Tomas Tanskanen
  12. Juha Auvinen
  13. Outi Uimari
  14. Amjad Alkodsi
  15. Rainer Lehtonen
  16. Eevi Kaasinen
  17. Kimmo Palin
  18. Lauri A Aaltonen  Is a corresponding author
  1. University of Helsinki, Finland
  2. University of Helsinki and Helsinki University Hospital, Finland
  3. Faculty of Medicine, University of Oulu, Finland
  4. Oulu University Hospital, University of Oulu, Finland
  5. Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Sweden


Uterine leiomyomas (ULs) are benign tumors that are a major burden to women’s health. A genome-wide association study on 15,453 UL cases and 392,628 controls was performed, followed by replication of the genomic risk in six cohorts. Effects of the risk alleles were evaluated in view of molecular and clinical characteristics. 22 loci displayed a genome-wide significant association. The likely predisposition genes could be grouped to two biological processes. Genes involved in genome stability were represented by TERT, TERC, OBFC1 - highlighting the role of telomere maintenance - TP53 and ATM. Genes involved in genitourinary development, WNT4, WT1, SALL1, MED12, ESR1, GREB1, FOXO1, DMRT1 and uterine stem cell marker antigen CD44, formed another strong subgroup. The combined risk contributed by the 22 loci was associated with MED12 mutation-positive tumors. The findings link genes for uterine development and genetic stability to leiomyomagenesis, and in part explain the more frequent occurrence of UL in women of African origin.

eLife digest

Fibroids – also known as uterine leiomyomas, or myomas – are a very common form of benign tumor that grows in the muscle wall of the uterus. As many as 70% of women develop fibroids in their lifetime. About a fifth of women report symptoms including severe pain, heavy bleeding during periods and complications in pregnancy. In the United States, the cost of treating fibroids is estimated to be $34 billion each year.

Despite the prevalence of fibroids in women, there are few treatments available. Drugs to target them have limited effect and often an invasive procedure such as surgery is needed to remove the tumors. However, a better understanding of the genetics of fibroids could lead to a way to develop better treatment options.

Välimäki, Kuisma et al. used a genome-wide association study to seek out DNA variations that are more common in people with fibroids. Using data from the UK Biobank, the genomes of over 15,000 women with fibroids were analyzed against a control population of over 392,000 individuals. The analysis revealed 22 regions of the genome that were associated with fibroids. These regions included genes that may well contribute to fibroid development, such as the gene TP53, which influences the stability of the genome, and ESR1, which codes for a receptor for estrogen – a hormone known to play a role in the growth of fibroids. Variation in a set of genes known to control development of the female reproductive organs was also identified in women with fibroids.

The findings are the result of the largest genome-wide association study on fibroids, revealing a set of genes that could influence the development of fibroids. Studying these genes could lead to more effective drug development to treat fibroids. Revealing this group of genes could also help to identify women at high risk of developing fibroids and help to prevent or manage the condition.


Uterine leiomyomas (ULs), also known as fibroids or myomas, are benign smooth muscle tumors of the uterine wall. They are extremely common; approximately 70% of women develop ULs before menopause (Stewart et al., 2017). The symptoms, occurring in one fifth of women, include excessive menstrual bleeding, abdominal pain and pregnancy complications (Stewart et al., 2017). In most cases, durable treatment options are invasive (Stewart, 2015). ULs cause a substantial human and economic burden, and the annual cost of treating these tumors has been approximated to be as high as $34 billion in the United States, higher than the combined cost of treating breast and colon cancer (Cardozo et al., 2012).

Earlier studies have indicated strong genetic influence in UL susceptibility based on linkage (Gross, 2000), population disparity (Wise et al., 2012) and twin studies (Luoto et al., 2000). The most striking UL predisposing condition thus far characterized is hereditary leiomyomatosis and renal cell cancer (HLRCC) syndrome, caused by high-penetrance germline mutations in the Fumarate hydratase (FH) gene (Multiple Leiomyoma Consortium et al., 2002; Launonen et al., 2001). Genome-wide association studies (GWAS) have proposed several low-penetrance risk loci but few unambiguous predisposing genes have emerged. Cha et al. reported loci in chromosome regions 10q24.33, 11p15.5 and 22q13.1 based on a Japanese patient cohort (Cha et al., 2011). The 11p15.5 locus - near the Bet1 golgi vesicular membrane trafficking protein like (BET1L) gene - was later replicated in Caucasian ancestry (Edwards et al., 2013a). The 22q13.1 locus has been replicated in Caucasian, American and Saudi Arabian populations suggesting trinucleotide repeat containing 6B (TNRC6B) as a possible target gene (Edwards et al., 2013a; Aissani et al., 2015; Bondagji et al., 2017). Further UL predisposition loci have been suggested at 1q42.2 and 2q32.2 by Zhang et al (Zhang et al., 2015). and, at 3p21.31, 10p11.21 and 17q25.3 by Eggert et al (Eggert et al., 2012). A recent work reported cytohesin 4 (CYTH4) at 22q13.1 as a novel candidate locus in African ancestry (Hellwege et al., 2017). While multiple loci and genes have been implicated through these valuable studies it is not straightforward to connect any of them mechanistically to UL development.

Most ULs show somatic site-specific mutations at exons 1 and 2 of the mediator complex subunit 12 (MED12) gene (Mäkinen et al., 2011; Heinonen et al., 2014). These observations together with further scrutiny of driver mutations, chromosomal aberrations, gene expression, and clinicopathological characteristics have led to identification of at least three mutually exclusive UL subtypes; MED12 mutant, Fumarate Hydratase (FH) deficient, as well as High Mobility Group AT-Hook 2 (HMGA2) overexpressing lesions (Mehine et al., 2016).

Here we report the most powerful GWAS on uterine leiomyoma to date, and novel genome-wide significant UL susceptibility loci with plausible adjacent predisposition genes. These genes associate UL genesis to two distinct biological mechanisms: Genome stability related processes are implicated by genes Tumor Protein P53 (TP53) and ATM Serine/Threonine Kinase (ATM) together with the telomere maintenance genes Telomerase Reverse Transcriptase (TERT), Telomerase RNA Component (TERC) and STN1-CST Complex Subunit (OBFC1). The other prominent group is genes relevant for genitourinary development, specifically Wnt Family Member 4 (WNT4), Wilms Tumor 1 (WT1), Spalt Like Transcription Factor 1 (SALL1), Estrogen Receptor 1 (ESR1 or ERα), Growth Regulation By Estrogen In Breast Cancer 1 (GREB1), Forkhead Box O1 (FOXO1), Doublesex and Mab-3 Related Transcription Factor 1 (DMRT1) and CD44 Molecule (CD44). Our analysis of the X chromosome identifies a risk allele near MED12 that drives UL tumorigenesis towards somatic MED12 mutations. We report altogether 22 genome-wide significant susceptibility loci and compile them into a polygenic risk score. The UL association is then replicated in six independent cohorts of different ethnic origins: individuals of African origin are characterized by the highest risk load. Finally, we investigate the risk alleles’ association to clinical features, molecular UL subtypes, telomere length, gene expression and DNA methylation.


Identification of predisposition loci

Figure 1 provides an outline of this study. At discovery stage 1,428 SNPs emerging from 22 distinct genetic loci passed the genome-wide significance level of 5 × 10−8. Figure 2 displays a Manhattan plot of these associations (15,453 UL cases and 392,628 controls; linear mixed model). Two of the significant loci (359/1,428 SNPs) were found on the X chromosome. After linkage disequilibrium (LD; r2 ≤0.3) pruning the significant SNPs, a total of 50 LD-independent associations remained: the resulting SNPs are given in Appendix 1—table 2, and the lead SNPs are summarized in Table 1.

Outline of the study stages and genotyping cohorts.

GRS, genomic risk score. NFBC, Northern Finland Birth Cohort.
Overview of the uterine leiomyoma risk loci and the effect of increased number of MED12-mutated lesions per rs5937008 risk allele.

(A), Manhattan plot of the UK Biobank GWAS on 15,453 UL cases and 392,628 controls. On Y-axis, logarithm transformed association values, and on X-axis, autosomes and the X chromosome. The blue horizontal line denotes genome-wide significance (p=5 × 10−8). Gene symbols shown for reference. (B), MED12 region in more detail. ENCODE tracks (details in Supplementary Methods) are shown for reference. (C), The risk allele near MED12 (rs5937008) is observed with a significant increase in number of MED12-mutation-positive tumors (p=0.009; negative binomial regression; RR = 1.23 per risk allele; n = 457 Helsinki cohort patients).
Table 1
Predisposition loci for uterine leiomyoma.
ChrPositionrs-codeABB freqORPLikely disease gene
177,571,752rs78378222 #TG0.0131.539.7E-25TP53
122,450,487rs2235529 #CT0.1571.141.1E-17WNT4/CDC42
51,283,755rs72709458 #CT0.2061.126.9E-16TERT
11225,196rs507139 *GA0.0740.843.2E-13?
3169,514,585rs10936600 #AT0.2440.916.4E-12TERC
2240,669,648rs733381 *AG0.2131.105.7E-11?
  1. The numbers for B allele frequency (B Freq), odds-ratio (OR, where B is the effect allele) and association (P) are based on the UKBB cohort (15,453 UL cases). Gene symbols are shown for reference. The genomic coordinates follow hg19 and dbSNP build 147. All genome-wide significant (p<5 × 10−8) loci and their highest-association SNP are shown.

    * Previously implicated predisposition to ULs.

  2. # Previous associations to endometriosis, lung adenocarcinoma, glioma or telomere length; see literature in Appendix 1—table 11

Appendix 1—figure 1 displays the regional structure of each locus together with flanking association values, linkage disequilibrium (LD) and genome annotation. Annotation tracks are included for tissue-specific data on open chromatin, topologically associating domains (TAD) and other regulatory features (details in Supplementary Methods).

Genomic risk score

A polygenic risk score (Abraham and Inouye, 2015) was compiled based on the discovery stage associations. After LD pruning (r2 ≤0.3) the discovery-stage SNPs, 50 SNPs from the 22 distinct loci passed for the initial genomic risk score (GRS; Appendix 1—table 4). The SNP weights were based on UKBB log-odds. We applied this initial GRS of 50 SNPs to the Helsinki cohort and identified a significant association to the UL phenotype (p=8.3 × 10−10; adjusted p=1.1 × 10−8; one-tailed Wilcoxon rank-sum; W = 1.69 × 106; 457 cases and 8899 female controls).


The second stage GWAS combined the UKBB and Helsinki cohorts for a meta-analysis approach. The genome-wide statistics revealed rs117245733, at 13q14.11, as the only SNP with a suggestive (p<10−5) association in both the UKBB (OR = 1.26; p=4.2 × 10−9) and Helsinki (OR = 1.82; p=8.1 × 10−6) cohorts. Figure 3 shows the regional structure and combined association (fixed effect model p=3.1 × 10−12) at the locus: the SNP resides on a gene poor region, at a conserved element that displays activity in uterus-specific H3K27ac and DNaseI data (see ENCODE track details in Supplementary Methods). The SNP is independent of the group of associations at FOXO1 (r2 = 0.0; Figure 3).

Meta-analysis of UL risk revealed rs117245733 at a gene poor region of 13q14.11.

(A) meta-analysis P-values and the genomic context at the locus. Gene symbols and ENCODE tracks (details in Supplementary Methods) are shown for reference; coordinates follow hg19. (B) Hi-C, TADs and CpG methylation around the locus with a 1 Mb flank. The needle plot shows the meQTL associations (dashed lines at 10% FDR; green line denotes the SNP; gray ticks denote all CpGs tested; blue needle for positive coefficient, red for negative coefficient) for tumors (above x-axis; nAA = 53, nAB = 3) and normals (below x-axis; nAA = 33, nAB = 2). (C) UCSC genome browser tracks related to conservation and regulation at the locus.

The meta-analysis identified altogether 112 genome-wide significant SNPs not seen in the discovery stage: seven of those were LD-independent (r2 ≤0.3; Appendix 1-table 3) and their UKBB log-odds weights were appended to the initial GRS model. The final GRS model of 57 SNPs and their UKBB-based weights is given in Appendix 1—table 4. Supplementary file 1 gives further details on the meta-analysis results and heterogeneity statistics.

Replication of the GWAS and GRS

The third stage replicated the observations in NFBC and in five different ethnic groups. In NFBC, the SNP identified in the stage two meta-analysis, rs117245733 at 13q14.11, was replicated (p=0.034; linear mixed model; OR = 1.50; 95% CI 1.03 – 2.19). Additional analysis of all 57 SNPs did not reveal other associations: Supplementary file 2 gives further details on the meta-analysis results and heterogeneity statistics. The association between the GRS and UL phenotype was significant (p=1.1 ×10−5; Wilcoxon rank-sum; adjusted p=1.1 × 10−4; one-tailed; W = 4.7 × 105) in NFBC. These case-control distributions of GRS are displayed in Figure 4.

The genomic risk score is elevated in patients with MED12-mutated lesions and in respect to the UL phenotype in the six follow-up cohorts.

On top, GRS association to MED12 mutation status. The rest show GRS association to the UL phenotype in six independent replication cohorts. Associations (P) and test statistics (W) are from Wilcoxon rank-sum tests. Only females were included as the control samples. The X-axes show the GRS distributions for each phenotype.

UL susceptibility is known to vary by ancestry (Wise et al., 2012). Five different ethnic groups - African, Caribbean, Irish, Indian and ‘other white’ background - were available from the UKBB cohort. A total of 2,212 UL cases and 21,054 female controls could be utilized for replication (Appendix 1—table 1). Supplementary file 3 includes all the 57 SNPs and their summary statistics in these five cohorts, together with heterogeneity estimates. Due to the small cohort sizes, none of the single-SNP associations passed genome-wide significance. The GRS model replicated with a significant phenotype association in all five ethnicities (Appendix 1-Table 6). A summary of test statistics, GRS distributions and the numbers of cases and controls for each population is given in Figure 4. A more detailed summary of the GRS model and receiver operating characteristic (ROC) curve of each cohort are given in Appendix 1—figure 5.

The self-reported ‘Black African’ (mean GRS 4.83) had an outstanding risk-load compared to Caucasian (self-reported ‘White Irish’; mean GRS 4.04) background (Figure 4; Wilcoxon rank-sum p<10−15). As expected (Wise et al., 2012), the African ethnicity displayed an increased prevalence (19%) compared to the Irish (6%). Assuming that the observed GRS weights have a linear relationship to the true risk, the GRS difference between African and Irish ancestries explains 9.0% of the increased prevalence in the African population.

Similar population-specific GRSs could be estimated for the seven populations in the gnomAD database (Appendix 1—table 7). Appendix 1—figure 6 shows an overview of the GRS for each of the populations. African ancestry has been shown to carry a two-to-three times higher prevalence when compared to Caucasian ancestry (Wise et al., 2012). Based on the gnomAD frequencies, the increased GRS of African ancestry explains between 8 – 16% of this population difference.

Association to clinical variables

The number of ULs per patient had a significant positive association to GRS (negative binomial regression p=0.001; adjusted p=0.0032; rate ratio 1.25; 95% CI 1.09 – 1.43 for one-unit increase in GRS; Appendix 1—figure 7). No association was found between GRS and age at hysterectomy (Appendix 1—table 6). Testing the 57 GRS SNPs separately did not reveal any associations that pass FDR (Appendix 1—table 5).

Association to MED12 mutated tumors

Our UL set of 1481 lesions included 1159 (78%) mutation-positive and 322 mutation-negative tumors. The occurrence of mutant tumors did not distribute evenly among the 457 patients. In total 221 (48%) and 123 (27%) patients had all their tumors identified as either MED12-mutation-positive or -negative, respectively, suggesting that genetic or environmental factors contribute to the preferred UL type in affected individuals, as previously observed (Mäkinen et al., 2011). Indeed, the 221 mutation positive patients were found to have a significantly higher GRS (Wilcoxon rank-sum p=7.9 × 10−4; adjusted p=0.0032; two-sided; W = 1.6 × 104). This difference in GRS distributions is visualized in Figure 4.

Comparison against the population controls (n = 8899 females) revealed that the above-mentioned patient groups differ by their effect size: the MED12-mutation-positive (221) subset of patients had an odds ratio of 2.28 for one-unit increase in GRS (95% CI 1.80 – 2.88) compared to the controls, while the mutation-negative (123) subset had an odds ratio of 1.20 (95% CI 0.88 – 1.66). Thus, the majority of the compiled case-control association signal had arisen from the MED12-mutation-positive subset of the patients.

The number of MED12-mutation-positive tumors per patient had a significant positive association to GRS (p=3.2 × 10−4; adjusted p=0.002; negative binomial model rate ratio 1.43; 95% CI 1.13 – 3.83 for one-unit increase in GRS; Appendix 1—figure 8). No association between the number of MED12-mutation-negative tumors and GRS was found (adjusted p=0.053; Appendix 1—figure 8).

The GWAS signal near MED12 was inspected for any associations to somatic MED12 mutations. Strikingly, the risk allele (rs5937008) did significantly increase the number of MED12-mutation-positive tumors (p=0.0087; negative binomial model rate ratio 1.23; 95% CI 1.05 – 1.44). Among our 457 patients, the median number of MED12-mutation-positive tumors increased from one to two for the risk allele carriers. The risk locus and its effect on the number of MED12-mutation-positive tumors is visualized in Figure 2. An additional analysis of each of the 57 GRS SNPs did not reveal any further associations (Appendix 1—table 5).

Association to gene expression

All the genome-wide significant SNPs from UKBB and the meta-analysis stage (altogether 1,540 SNPs) were tested with a permutation based approach. In total 34 and 24 genes passed the local permutation significance threshold (p<0.05) for tumor and matched myometrium data, respectively (Appendix 1—table 8). Among the hits in tumors were WNT4 (p=0.01; permutation test) and CDC42 (p=0.03) at 1 p, TNRC6B (p=0.02) at 22q, FOXO1 (p=0.03) at 13q, and DMRT1 (p=0.04) at 9 p. None of the local associations passed a genome-wide FDR of 10%. No significant association was observed between the risk allele and MED12 expression (rs5936989; Appendix 1—figure 11). The full list of eQTL statistics can be found from Supplementary file 4.

Association to DNA methylation

Our analysis of the 57 GRS SNPs revealed altogether 17,030 (9,466 in tumors and 7564 in matched myometrium) cis methylation quantitative trait loci (cis-meQTL) with nominal p<0.05. Of these, 145 passed a 10% FDR. Of the plausible predisposition genes, FOXO1, TERT and WNT4 showed significant meQTL associations (Appendix 1—table 9). All the cis-meQTLs and annotation for their genomic context are in Supplementary file 5.

Association to telomere length and structural variants

The UL predisposition loci at TERT, TERC and OBFC1 were examined for an effect on telomere length. Overall the telomere length was significantly shorter in tumors than in adjacent matched myometrium (p=0.01; Kruskal-Wallis), as previously reported (Rogalla et al., 1995; Bonatz et al., 1998). One of the risk alleles at TERT (rs2736100) was significantly associated with shorter telomere length (p=0.01; Kruskal-Wallis) (Appendix 1—figure 12). Adjusting for the patient age did not explain away the association. The association was not seen in myometrium. The other two LD-independent SNPs at TERT, rs72709458 and rs2853676, or the SNPs at TERC (rs10936600) and OBFC1 (rs1265164) did not show association to telomere length (p=0.24, p=0.57, p=0.07 and p=0.48, respectively; Kruskal-Wallis). The combined effect of TERT (rs72709458, rs2736100, rs2853676), TERC (rs10936600) and OBFC1 (rs1265164) had a negative trend with telomere length (p=0.055; linear model 95% CI −408.5 – 4.7 per one risk allele; see Appendix 1—figure 13). In whole genome sequencing data, no association was detected between genotype and the number of somatic structural variants.

Pathway enrichment

The DEPICT framework (Pers et al., 2015) was ran using the genome-wide significant SNPs from the UKBB cohort, in total 1,069/1,428 autosomal SNPs. The resulting target gene prioritization, pathway enrichment and tissue enrichment results are given in Supplementary file 6. The analysis did not reveal any significant enrichments with the exception of one pathway related to induced stress. ATM was the highest ranking target gene, and uterus/myometrium were among the highest ranking tissue types.

Previously proposed UL predisposition loci

Previous UL association studies (Cha et al., 2011; Zhang et al., 2015; Eggert et al., 2012; Hellwege et al., 2017) have reported altogether seven genome-wide significant UL susceptibility loci. Two out of the seven loci - that is, 22q13.1 (at TNRC6B) and 11p15.5 (at BET1L) - replicated in UKBB using 15,453 cases and 392,628 controls. Cha et al (Cha et al., 2011). highlight OBFC1 (at 10q24.33) as a candidate gene and, while the SNP that they reported does not replicate in UKBB, the OBFC1 region is identified in our discovery stage (rs1265164; Table 1). See Appendix 1—table 10 for a summary of these results.


The UK Biobank genotype-phenotype data revealed 22 novel predisposition loci for UL, most of them in close proximity to highly plausible predisposition genes. The combined UL risk of these loci was replicated in a subsequent analysis of the polygenic risk score (GRS) in six independent cohorts from different ethnic backgrounds. Our multi-ethnic replication implies that the discovered loci are indeed involved in UL development, and the early UL association studies have likely been underpowered to detect them. Three previously reported loci, at OBFC1 (Cha et al., 2011), TNRC6B (Cha et al., 2011; Edwards et al., 2013a; Aissani et al., 2015; Bondagji et al., 2017) and BET1L (Cha et al., 2011; Edwards et al., 2013b), were also validated, however, the mechanistic connection to UL development remains obscure for the latter two.

Though simple association is not sufficient to formally prove causality, 14 out of the 22 risk loci harbor plausible predisposition genes. These genes can be divided into two groups: TERT, TERC, OBFC1 (all involved in telomere length), ATM and TP53 guard stability of the genome. ESR1, GREB1, WT1, MED12, WNT4, FOXO1, DMRT1, SALL1, and CD44 play a role in genitourinary development.

Estrogen is a well-known inducer of UL growth (Borahay et al., 2015). The top association at 6q25.2 (rs58415480) resides within intron 107 of Spectrin Repeat Containing Nuclear Envelope Protein 1 (SYNE1), 130 kb downstream of ESR1, the latter being the only gene that resides completely within the topologically associating domain (TAD; Appendix 1—figure 1). While the role of estrogen in leiomyomagenesis has been firmly established, this is the first genetic evidence to this end. The lead SNP at 2 p resides in the third exon of the gene GREB1. GREB1 is an essential regulatory factor of ESR1 (Mohammed et al., 2013).

WT1, WNT4 and FOXO1 are central factors in uterine development and in the preparation for pregnancy (decidualization) in endometrium (Biason-Lauber and Konrad, 2008; Hill, 2018; Kaya Okur et al., 2016; Tamura et al., 2017). Perturbations in their function are known to have neoplastic potential. The strongest association at 11p13 (rs10835889) is 40 kb downstream of the closest gene WT1, at a region with enhancer activity (Appendix 1—figure 1). WT1 is a transcription factor that acts as both a tumor suppressor and an oncogene (Yang et al., 2007). The lead SNP at 1p36.12 (rs2235529) resides at the second intron of WNT4. The risk allele is associated with suggestive upregulation of WNT4 (Figure 5C). WNT4 is known to be overexpressed in uterine leiomyomas with MED12 mutations (Markowski et al., 2012), and knock-down of MED12 in UL cells reduces WNT4 expression (Al-Hendy et al., 2017). The risk locus in 1p36.12 was also associated with several meQTLs suggesting that methylation may have a role in WNT4 regulation (Figure 5). WNT4 encodes a signaling protein that has a crucial role in sex-determination (Vainio et al., 1999), and the WNT signaling pathway has a well-established role in various malignancies such as breast and ovarian cancer (Peltoketo et al., 2004). Of note, recent GWAS on gestational duration suggested that binding of the estrogen receptor at WNT4 is altered by rs3820282 (r2 = 0.92 with our lead SNP) (Zhang et al., 2017). Both WNT4 and FOXO1 are decidualization markers regulated by ESR1 (Kaya Okur et al., 2016). Though these considerations support WNT4 as a candidate predisposition gene at this locus, the near-by CDC42 has been shown to play a role in uterine pathology, in particular endometriosis (Powell et al., 2016), and should not be overlooked in further work.

Methylation and expression differences in WNT4.

(A), Hi-C, TADs and CpG methylation around the locus with an 1 Mb flank. The needle plot shows the meQTL associations (dashed lines at 10% FDR; green lines denote the two SNPs, rs2235529 and rs2092315; gray ticks denote all CpGs tested; blue needle for positive coefficient, red for negative coefficient) for tumors (above x-axis; nAA = 40, nAB = 15, nBB = 1 for rs2235529 and nAA = 32, nAB = 23 for rs2092315) and normals (below x-axis; nAA = 23, nAB = 12 and nAA = 17, nAB = 17, nBB = 1). (B), Methylation differences in tumors (n = 56) at CpG chr1:22456326 by SNP rs2235529. (C), WNT4 expression differences in tumors (n = 41) stratified by the rs12042083 genotype. B is the risk allele, and the P-value is corrected for local multiple testing (permutation based test).

Also MED12 has been implicated in uterine development in a mouse model (Wang et al., 2017a). DMRT1 is a transcription factor associated with male sex-development (Lindeman et al., 2015). CD44 is a plausible fibroid stem cell marker (Mas et al., 2015). Mutations in SALL1 and a deletion at the GWAS signal have been associated with Townes-Brocks syndrome, a condition associated with kidney malformations (Stevens and May, 2016). Thus genes involved in genitourinary development are strikingly associated with UL predisposition.

ATM, TP53, TERT, TERC and OBFC1 could be involved in uterine neoplasia predisposition through genetic instability and telomere maintenance. The lead SNP at 11q (rs141379009) resides in the 22nd intron of ATM, and the SNP at 17 p in the 3′-untranslated region of TP53. ATM and TP53 are involved in DNA damage response (Guleria and Chandna, 2016), and they are among the relatively few genes that have been found to be recurrently mutated in leiomyosarcoma (Lee et al., 2017). TERT and TERC encode subunits of the telomerase enzyme, which guards chromosomal stability by elongating telomeres (Blasco, 2005). In addition OBFC1 has been associated with telomere maintenance (Lee et al., 2013). TERT is expressed in germ cells as well as in many types of cancers (Blasco, 2005). The neoplasia predisposing effect of the risk alleles at the TERT locus (rs72709458; rs2736100; rs2853676) has been overwhelmingly documented (Appendix 1—table 11). Previous studies have reported contradicting observations on the effect of rs2736100 on telomere length (Liu et al., 2014; ENGAGE Consortium Telomere Group et al., 2014; Lan et al., 2013; Melin et al., 2012; Choi et al., 2015). ULs have been shown to display shortened telomeres (Rogalla et al., 1995; Bonatz et al., 1998), potentially provoking chromosomal instability as the lengths of chromosome telomeres are diminished. In our patient cohort, the risk allele at TERT (rs2736100) is significantly associated with shorter telomere length (Appendix 1—figure 12), whereas the combined effect of SNPs at TERT, TERC and OBFC1 did not reach statistical significance.

GRS associated merely with a susceptibility to the most common UL subtype, MED12 mutation positive tumors. Indeed it has been known that MED12-mutation-positive tumors do not distribute randomly among patients (Mäkinen et al., 2011), and our data provide at least a partial explanation to this intriguing finding. An outstanding susceptibility locus was identified 250 kb upstream of MED12: our in-house patient cohort - together with a mutation-screening of their 1481 tumors - revealed that the risk allele could facilitate selection of somatic MED12 mutations. It may be that environmental factors contribute more significantly to genesis of MED12 wild-type lesions. In our recent study this tumor type was associated with a history of pelvic inflammatory disease, and thus infectious agents could be one underlying factor (Heinonen et al., 2017). Obviously, also the power of GWAS to detect genetic associations to rare UL subtypes – such as the HMGA2 overexpressing or FH deficient subtypes – is reduced.

This work highlights several new genetic cornerstones of UL formation, highlights genitourinary development and maintenance of genomic stability as key processes associated with it, and represents another step towards a much-improved understanding of its molecular basis. The proposed risk score can stratify the female population to low and high-risk quartiles that differ by two-fold in their UL risk. The population-specific risk score was inflated towards the African and Caribbean cohorts, which connects the predisposition loci to the excess UL prevalence in these ethnicities. While the increased risk appears minor on an individual level, the population-level burden to women’s health arising from these risk loci is highly significant considering the incidence of the condition. Together with the recent progress in molecular tumor characterization and subclassification, the identification of the genetic components of UL predisposition should pave the way towards more sophisticated prevention and management strategies for these extremely common tumors. The risk SNP with the most immediate potential value is that at estrogen receptor alpha, and our findings should fuel much further work on the interplay between individual germline genetics, endogenous and exogenous hormonal exposure, and occurrence and growth rate of UL.

Materials and methods

Genome-wide association study

Request a detailed protocol

Figure 1 provides an outline of the four stages that were implemented. The discovery stage was conducted with UK Biobank resources (UKBB; project #32506; accessed on April 10, 2018). The resource included pre-imputed genotypes (version 3; March 2018) for a total of 487,409 samples (486,757 samples for the X chromosome) and 96 million SNPs. The background information on the imputation and data quality control (QC) can be found through the UKBB documentation (

The UL cases were identified on the basis of both the self-reported uterine leiomyoma (UL) phenotype (UKBB data-field 20002: Non-cancer illness code 1351) and International Classification of Diseases (ICD) codes (data-fields 41202 – 41205: Main and secondary diagnosis for ICD10 code D25 and ICD9 code 218). These phenotype data resulted in a total of 20,106 UL cases prior to any sample/genotype QC.

Sample QC was based on the UKBB annotation as follows. In total 409,692 samples passed the initial QC on ethnic grouping (UKBB data-field 22006): self-identified as ‘White British’, and similar genetic ancestry based on a principal component analysis (PCA) of the genotypes. Further sample QC excluded excess kinship (field 22021; 408,797 samples passed), sex-chromosome aneuploidy (field 22019; 408,241) and inconsistent gender (fields 31 and 22001, and one male with self-reported ULs; 408,081). In total 15,453 UL cases and 392,628 population-matched controls (205,157 females and 187,471 males) passed all these criteria.

Raw genotype calls (UKBB version 2; Affymetrix UK BiLEVE Axiom, or Affymetrix UKBB Axiom array) were available for 805,426 SNPs: after filtering out low genotyping rate (<95%), Hardy-Weinberg equilibrium (p<10−10) and minor allele frequency (MAF) <0.001, the remaining 611,887 autosomal genotypes were used to train the mixed model for association testing. Imputed SNPs with MAF <0.001 and imputation score (INFO) <0.3 were excluded. Further SNPs were excluded due to imputation panel differences between cohorts, and the remaining 8.3 million SNPs (Haplotype Reference Consortium, HRC1.1 panel) were tested for case-control association with BoltLMM (version 2.3.2) (Loh et al., 2015). The default linear, infinitesimal mixed model was used to adjust for any underlying population structure. The model included categorical covariates for the 22 UK Biobank assessment centres and two genotyping arrays.


Request a detailed protocol

The second stage meta-analysis utilized the genome-wide summary statistics from UKBB and the Helsinki cohort of 457 UL cases and 15,943 controls. Details on the Helsinki cohort’s imputation, sample and genotype QC are given in the Supplementary Methods. A total of 8.3 million SNPs passed imputation QC and were utilized in the meta-analysis with PLINK (version 1.90b3i) (Chang et al., 2015).


Request a detailed protocol

The SNPs were tested for association in six independent cohorts: Northern Finland Birth Cohort (NFBC) and five non-overlapping subsets of UKBB. In addition to the single-SNP association tests, a polygenic risk score (Abraham and Inouye, 2015Abraham and Inouye, 2015) was compiled as follows. The genomic risk score (GRS) was computed as a sum over SNP dosages weighted by their observed log-odds: LD pruning (r2 ≤0.3) was applied in the order of UKBB association, and the remaining, genome-wide significant SNPs were chosen for the GRS. The log-odds weights were taken from the UKBB statistics (i.e. logarithm of the dosage-based ORs). The resulting GRS model was evaluated using R (3.3.1) and the packages PredictABEL (1.2 – 2) and MASS (7.3 – 45).

The Northern Finland Birth Cohort (NFBC) had in total 459 UL cases and 4943 controls; details of the imputation, sample and genotype QC are given in the Supplementary Methods.

Five non-overlapping, self-reported population-strata were available from UKBB (data-field 21000) and could be utilized as an independent replication: the five self-reported ancestries were ‘Black African’, ‘Black Caribbean’, ‘Indian’, ‘White Irish’ and ‘Other white background’. Sample QC excluded excess kinship (field 22021), sex-chromosome aneuploidy (field 22019) and inconsistent gender (fields 31 and 22001). The numbers of cases and controls that passed the sample QC can be found in Figure 1. A summary of background variables is given in Appendix 1—table 1. These five sample subsets did not overlap with the discovery GWAS individuals. A collection of ancestry-informative genotypes was utilized to assess the genetic homogeneity of each of the self-reported ancestry (details in Supplementary Methods).

Patient and tumor material

Request a detailed protocol

Our in-house patient and tumor data were investigated regarding the identified risk loci. All tumors of ≥1 cm diameter had been harvested and stored fresh-frozen (details in Supplementary Methods). MED12 mutations were screened by Sanger sequencing the MED12 exons 1 and 2 and their flanking sequences (60 bp) from all uterine leiomyoma and matching normal myometrium samples (Mäkinen et al., 2011; Heinonen et al., 2014). The resulting sequence graphs were inspected manually and with Mutation Surveyor software (Softgenetics, State College, PA). Clinical patient data was available for the number of ULs, menopause status, parity, body mass index (BMI) and age at hysterectomy (Appendix 1—figure 2). This study was conducted in accordance with the Declaration of Helsinki and approved by the Finnish National Supervisory Authority for Welfare and Health, National Institute for Health and Welfare (THL/151/5.05.00/2017), and the Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS/177/13/03/03/2016).

Expression quantitative trait loci analysis

Request a detailed protocol

For the cis expression quantitative trait loci (cis-eQTL) analysis, genes with less than six reads in over 80% of the samples were filtered out. The between sample normalization was done with Relative Log Expression (RLE) normalization and each gene was inverse normal transformed. The eQTL analysis was run with FastQTL (version 2.184) (Ongen et al., 2016) separately for 60 tumors and 56 patient-matched unaffected, adjacent myometrium samples using permutation approach. The permutation parameter was set to ‘1000 10000’. Sequencing batch was used as a covariate. The cis-region was set to be 2 Mb. FDR correction was applied for tumors and matched myometrium separately.

Methylation quantitative trait loci analysis

Request a detailed protocol

DNA methylation was studied in 56 tumors and 36 matched myometrium samples. The methylation calls were analyzed with bsseq (version 1.12.2) (Hansen et al., 2012). Only the methylation in CpG context was considered. Every locus was required to have the coverage of ≥2 in at least 90% of samples. The association between methylation and genotype was studied with MatrixEQTL (version 2.1.1) using a linear regression model (Shabalin, 2012). The LD-independent (r2 ≤0.3) SNPs from the discovery stage (Appendix 1—table 2) and meta-analysis (Appendix 1—table 3) were considered (altogether 57 SNPs). The SNPs with MAF <0.05 in the methylation samples were filtered out. This resulted in 44 SNPs in tumors and 45 SNPs in matched myometrium. Cis methylation quantitative trait loci (cis-meQTL) was determined to be within 1 Mb flank from the SNP of interest. To annotate the CpGs with genomic context, the overlap between UCSC’s gene track (hg19) and known CpG islands was studied. As the role of promoter methylation is well known, promoter methylation was studied in addition to gene body methylation. Core promoter was defined as a region −2 kb and +1 kb from the transcription start site. The methylation analysis was performed separately for tumors and matched normal myometrium to study whether the changes in methylation could be observed in both tissues.

Whole genome analysis

Request a detailed protocol

The whole genome sequenced (WGS) samples, in total 71 tumors (48 Illumina, 23 Complete Genomics) and 51 matched myometrium samples (28 Illumina and 23 Complete Genomics), were prepared following Illumina and Complete Genomics protocols and processed as described previously (Mehine et al., 2013). Structural variation was defined as a structural rearrangement (e.g deletion, inversion or translocation) not detectable in matched normal myometrium. Structural variation was detected as described in Mehine et al. (Mehine et al., 2013) The mean telomere length was estimated for Illumina samples using Computel (version 0.3) (Nersisyan and Arakelyan, 2015) with the default settings. Clonally related tumors were excluded from the analysis by randomly sampling one tumor to represent each clonally related tumor group. Clonally related tumors had identical changes in driver genes and shared at least a subset of somatic copy-number changes and/or copy neutral loss of heterozygosity (see Mehine et al. (Mehine et al., 2015) for further details in identification of the clonally related tumors). Kruskal-Wallis test was used to assess the telomere length differences between tumors and matched myometrium as well as between genotypes. Linear model was used to calculate the association between number of risk alleles and telomere length.

Pathway enrichment

Request a detailed protocol

Pathway enrichment of all genome-wide significant SNPs was tested with DEPICT (version 1 release 194) following the default settings (Pers et al., 2015). The tool is designed to integrate multiple GWAS loci for in silico target gene prioritization, pathway enrichment and tissue-specific expression profiling. In short, the DEPICT framework combines phenotype-free co-expression networks, predefined pathways and protein-protein interaction networks in order to reveal functionally connected genes among the multiple risk loci. The tool is restricted to autosomal SNPs.

Statistical analysis

Request a detailed protocol

Meta-analysis was implemented with an inverse-variance weighted, fixed effect model. Associations between the risk alleles and other variables were tested assuming an additive genotype model unless otherwise noted. The DHARMa (0.1.5) package was applied to evaluate the goodness-of-fit of the binomial and negative binomial models. The contribution of GRS to prevalence was estimated by [(Ea/Pi-1)/(Pa/Pi-1)], where Ea = Pi*GRSa/GRSi assumes a linear relationship between GRS and the true risk, and Px and GRSx are the population-specific prevalence and mean GRS, respectively. All statistical tests were two-tailed unless otherwise noted.

Summary statistics were collected from each of the study stages and are available as Appendix 1—table 2 and Supplementary file 13. For each SNP, we report its allele frequency, effect size estimates and association based on the default linear, infinitesimal mixed model. For the meta-analysis stages, we also report the Cochrane’s Q statistic and I2 heterogeneity index in addition to the fixed-effects meta-analysis association and effect size (random-effects meta-analysis is included for reference). BoltLMM reported lambda (λGC) 1.055, 1.045 and 1.016 for UKBB, Helsinki and NFBC, respectively. The X chromosome associations were processed separately and included only the female controls (λGC 1.052, 1.049 and 1.005 for UKBB, Helsinki and NFBC, respectively).

For GWAS, p<5 × 10−8 was reported as significant. The GRS association tests (Appendix 1—table 6) were controlled for family-wise error rate (FWER) and reported significant for Holm-Bonferroni adjusted p<0.05. Large families of association tests were controlled for false discovery rate (FDR; Benjamini-Hochberg method) and noted significant at FDR < 10%. In the six telomere length association tests and the two structural variation association tests, p<0.05 was considered statistically significant.

Appendix 1

Supplementary methods

UK Biobank

The UK Biobank (UKBB) individuals were divided into six distinct subsets based on their self-reported ancestry. A summary of the background variables for each of the six ancestries is given in Appendix 1—table 1.

Sample QC for the discovery subset was readily available from the UKBB annotation (data-field 22006): self-identified as ‘White British’, and similar genetic ancestry based on a principal components analysis (PCA) of the genotypes.

The small replication cohorts were assessed for genetic homogeneity based on PCA of ancestry informative markers as follows. We pooled together 21 panels of ancestry informative markers - in total 1396 unique, autosomal SNPs, see Soundararajan et al (Soundararajan et al., 2016). for references - and compared the resulting PCs against the self-reported ancestry information. As expected, the genetic differences at these informative markers separated each self-reported ancestry into a distinct, homogeneous cluster. See Appendix 1—figure 4 for the resulting clustering of the follow-up cohorts.

Helsinki cohort

Our patient cohort (Helsinki cohort) was collected in accordance with the Declaration of Helsinki and approved by the Finnish National Supervisory Authority for Welfare and Health, National Institute for Health and Welfare (THL/151/5.05.00/2017), and the Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS/177/13/03/03/2016). In total 1577 uterine leiomyoma and corresponding normal myometrial samples were collected as fresh-frozen from 480 patients undergoing hysterectomy as previously described (Mäkinen et al., 2011; Heinonen et al., 2014; Heinonen et al., 2017). See below details on numbers of samples that passed imputation QC.

The number of ULs per patient was determined by the number of ULs harvested from the hysterectomy specimens. The study materials were derived from six different tissue collections, one consisting of anonymous patients and the other five collections consisting of patients who signed an informed consent before entering the study. Pathologists dissected the hysterectomy specimens and collected all visible tumours from each patient, with the exception of one tissue collection in which all feasible distinct tumours ≥ 1 cm in diameter were harvested. The smallest lesion used in this study had a diameter of 4 mm. All the specimens underwent routine diagnostic pathological scrutiny, and the histopathological diagnosis for the study samples was retrieved from the pathology reports.

Population-matched control data were obtained from the National FINRISK Study containing 16,048 genotyped individuals ( studies/the-national-finrisk-study).

Northern Finland cohort

The Northern Finland Birth Cohort (NFBC) is a prospective collection of Oulu and Lapland region individuals born in 1966. For further validation of our results, the NFBC Project Center (University of Oulu) provided phenotype information on both clinical ICD disease records and 46 year follow-up questionnaires. The genotyped individuals (dbGaP Study Accession: phs000276.v2.p1) had in total 459 UL cases and 4943 population-matched controls.

Genotyping arrays

The Helsinki cohort was genotyped with Illumina Infinium HumanCore-24 BeadChip. The control samples in the Helsinki cohort were genotyped with the Illumina HumanCoreExome SNP array. The NFBC individuals were genotyped with the Illumina Infinium SNP array. All genomic coordinates follow GRCh37 and dbSNP build 147.

SNP array data processing and imputation

The Helsinki cohort B-allele frequencies and log-R ratios were extracted with Illumina GenomeStudio software, and somatic allelic imbalance (AI) regions were calculated for all tumors using BAFsegmentation (Kim et al., 2015) with default parameters.

Quality control (QC) was implemented using PLINK (v1.90b3i; The Helsinki cohort was inspected for outliers, close relatedness and low genotyping rate: 457 UL cases (representing 1481 tumors collected) and 15,943 controls passed the initial genotyping control. Further genotype QC was implemented to exclude SNPs with low genotyping rate (<95%), excess homozygosity (i.e. homozygotes that exceed respective heterozygotes), rare homozygotes (minor allele frequency, MAF <0.02), Hardy-Weinberg equilibrium (p<10−3) or incorrect strand assignment based on LD. The remaining 211,967 SNPs were imputed using the Haplotype Reference Consortium panel (HRC; release 1.1) at the Sanger Imputation service (EAGLE2 +PBWT; All the reported alleles follow the strand orientation in HRC1.1 and GRCh37 coordinates. The same quality controls were applied to the NFBC data: all 5402 samples passed QC, and the GRS SNPs were imputed following the same process as for the Helsinki cohort. Post-imputation QC of Helsinki and NFBC data excluded SNPs with MAF <0.005 and imputation score (INFO) <0.4. A total of 8.3 million SNPs passed imputation quality.

Clinical background information

Clinical data were collected based on a retrospective review of medical records of the Helsinki study subjects. Details are provided in Heinonen et al (Heinonen et al., 2017). Information on parity, BMI, number of leiomyomas and age at hysterectomy were quantified for a total of n = 367, 366, 457 and 392 patients, respectively. Menopausal status was recorded for 367 patients as either pre, post or current HRT (hormone replacement therapy). Appendix 1—figure 2 shows the summary statistics of each background variable.

RNA sequencing

RNA-seq libraries were prepared according to the standard quality requirements for Illumina TruSeq Stranded total-RNA (RiboZero) kit. Paired-end Illumina (HiSeq2500) sequencing produced around 60 to 70 million 2 × 125 bp reads per sample. Data were aligned to human reference transcript (GRCh37) with HISAT2 (2.1.0) using parameter -dta and setting rna-strandness to RF (Kim et al., 2015). The alignments were assembled with StringTie (1.3.3b) using parameters -e and -rf (Pertea et al., 2015). Data quality was controlled by checking HISAT2 mapping statistics, ribosomal RNA contamination (based on Ensembl annotation release 75) and batch effects (principal components analysis).

DNA methylation

The SureSelect target enrichment system (Agilent Technologies, Inc., CA, USA) covering 84.5 Mb of the genome was used to prepare bisulfite-sequencing samples. Sample preparations were done according to the manufacturer’s instructions. Illumina paired-end sequencing of 56 tumor and 36 normal samples was done in Karolinska Institutet (Sweden) using 100 bp read length and the HiSeq2000 platform.

Raw sequencing reads were quality and adapter trimmed with cutadapt version 1.3 in Trim Galore. Low-quality ends trimming was done using Phred score cutoff 30. Adapter trimming was performed using the first 13 bp of the standard Illumina paired-end adapters with stringency overlap two and error rate 0.1. Read alignment was done against hg19/GRCh37 reference genome downloaded from UCSC Genome Browser with Bismark (version v0.10.0) (Krueger and Andrews, 2011) and Bowtie 2 (version 2.0.0-beta6) (Langmead and Salzberg, 2012). Duplicates were removed using the Bismark deduplicate function. Extraction of methylation calls was done with Bismark methylation extractor discarding the first 10 bp of both reads and reading methylation calls of overlapping parts of the paired reads from the first read (--no_overlap parameter).

LocusZoom visualization and ENCODE tracks

LocusZoom ( plots include the ENCODE tracks for DNase I hypersensitive sites for hTERT-HM (ENCFF001SPI, ENCFF001UXF) and uterus (ENCFF689EGI). RAMPAGE tracks were included for uterus (ENCFF979EGO), myometrium (ENCFF605TKQ) and endometrial microvascular endothelial cells (ENCFF440YZN) experiments. Topologically associating domains from endometrial microvascular endothelial cells (HiC; ENCFF633ORE) and H3K27ac ChIP-seq from uterus (stable peaks; ENCFF045LNJ) were included. Additional uterus-specific ChIP-seq data for CTCF (ENCFF282BOE, ENCFF634DDY) and POLR2A (ENCFF822OTY, ENCFF164YIY) were also included. Hi-C figures were produced with the ‘3D genome browser’ (; GM12878) (Wang et al., 2017b).

Appendix 1—table 1
Summary of the UKBB cohort individuals.

The UKBB cohort was split into six disjoint, self-reported ancestries. For each ancestry, we summarized the background information for the phenotype (number of UL cases and female controls), proportion of cases (%), age at first assessment visit (mean and SD), number of live births (mean and SD), body mass index (BMI; mean and SD), proportion of hysterectomy cases (%) and age at hysterectomy (mean and SD).
Age at first assessment visitNumber of live birthsBMIAge at hysterectomy
Self-reported ancestryPhenotypeNCases (%)MeanSDMeanSDMeanSDEver had hyst. (%)MeanSD
White Britishcases154537.
Black Africancases29619.
Asian Indiancases2037.353.
Other whitecases6476.754.
Appendix 1—table 2
Discovery stage GWAS for the UKBB cohort.

The information for B allele frequency (B Freq), odds-ratio (OR) and association (Beta; standard error of Beta; χ2 and P) were collected from the UKBB cohort. All genome-wide significant, LD-independent (r2 ≤0.3 pruned in order of UKBB association) SNPs that passed imputation QC are shown. SNPs with r2 = NA are the lead-SNPs of each distinct locus. The A and B alleles are on GRCh37 forward strand. OR was computed from SNP dosages and B as the effect allele. Rows are sorted by genomic position.
rs-codeChrPositionABr2 (reference
B freqORBetaSEχ2P
rs273610051286516CA0.23 (rs72709458)0.4980.910.0034.23E-0460.8796.1E-15
rs285367651288547TC0.23 (rs2736100)0.7310.910.0034.77E-0449.7611.7E-12
rs48700846152543949CT0.29 (rs6904757)0.1890.920.0035.44E-0432.9419.5E-09
rs69283636152546094GA0.17 (rs58415480)0.4850.920.0034.24E-0446.8497.7E-12
rs755102046152592680TG0.07 (rs58415480)0.0121.29−0.0122.07E-0332.6641.1E-08
rs69047576152593102AG0.08 (rs6928363)0.3630.930.0034.42E-0441.0351.5E-10
rs1444445836152684585TC0.30 (rs58415480)0.1281.12−0.0046.37E-0445.2861.7E-11
rs1388210789674217CG0.13 (rs10975820)0.0211.23−0.0081.50E-0331.8191.7E-08
rs109758209684160GA0.00 (rs7027685)0.1421.12−0.0046.08E-0448.6023.1E-12
rs1146803319815682TC0.12 (rs7027685)0.1001.11−0.0047.21E-0435.5772.5E-09
rs47424489826585CG0.28 (rs7027685)0.4581.07−0.0034.33E-0433.7146.4E-09
rs22771639827224AG0.07 (rs7027685)0.9470.870.0059.51E-0433.2118.3E-09
rs1124600311213723TG0.00 (rs507139)0.0440.850.0061.03E-0332.3371.3E-08
rs22075481132368744CA0.24 (rs10835889)0.4231.09−0.0034.30E-0462.3602.9E-15
rs71204831132406983GC0.30 (rs10835889)0.1201.12−0.0046.51E-0444.4812.6E-11
rs110317831132459923CA0.28 (rs10835889)0.2041.09−0.0035.25E-0436.6871.4E-09
rs5902156511107999907CG0.27 (rs141379009)0.0911.11−0.0047.38E-0430.8822.7E-08
rs498802311108168995AC0.00 (rs141379009)0.1440.890.0046.01E-0443.6993.8E-11
rs1222338111108354102CT0.12 (rs59021565)0.4061.07−0.0024.31E-0430.2343.8E-08
rs96694031246798900GA0.28 (rs12832777)0.4021.07−0.0034.35E-0436.9981.2E-09
rs1172457331340723944GA0.00 (rs7986407)0.0161.26−0.0101.74E-0334.5194.2E-09
rs5936989X70022420TA0.27 (rs5937008)0.7820.920.0059.30E-0432.6061.1E-08
rs7059898X70149078CA0.27 (rs5937008)0.3591.07−0.0058.11E-0431.7561.7E-08
rs7888560X131171122AG0.19 (rs5930554)0.2281.08−0.0059.16E-0431.8301.7E-08
rs5933158X131578034AG0.21 (rs5930554)0.5861.08−0.0057.98E-0438.8334.6E-10
rs5975338X131626317AG0.28 (rs5930554)0.1291.10−0.0061.14E-0332.2011.4E-08
Appendix 1—table 3
Meta-analysis of the UKBB and Helsinki cohorts.

All the genome-wide significant, LD-independent (r2 ≤0.3) associations from the meta-analysis stage. Seven SNPs were LD-independent when compared to the discovery stage SNPs, and rs117245733 is shown for reference. The numbers for regression coefficients (Beta), standard error of beta (SE) and association (P) were collected from the UKBB and Helsinki summary statistics and their fixed effect model meta-analysis. The bolded SNP was the only one to reach a suggestive association (p<10−5) in both cohorts.
UKBB cohortHelsinki cohortMeta-analysis (fixed eff.)
rs-codeChrPositionABr2 (reference
rs67751869454568834TC0.14 (rs62323680)−0.0050.00099.9E-08−0.0110.00393.7E-03−0.0050.00095.05E-09
rs69016316152567047TC0.26 (rs6904757)0.0030.00065.4E-080.0050.00311.2E-010.0030.00061.76E-08
rs117904089876418GT0.10 (rs4742448)0.0020.00046.5E-080.0020.00193.9E-010.0020.00044.89E-08
rs1172457331340723944GA0.00 (rs7986407)−0.0100.00174.2E-09−0.0240.00538.1E-06−0.0120.00173.18E-12
Appendix 1—table 4
Genomic risk score.

Summary of the GRS and its weights based on the discovery and meta-analysis stages. Dosage-based odds-ratios (OR) and log-odds were collected from the UKBB summary statistics. The A and B alleles are on GRCh37 forward strand, and the B allele is the effect allele. In total 50 SNPs from stage 1 and seven SNPs from stage 2.
rs-codeChrPositionABORLog-oddsStage *
rs2235529122450487CT1.140.132Stage 1
rs2092315122507684CT1.070.072Stage 1
rs10929757211702661AC1.080.073Stage 1
rs11674184211721535TG0.94−0.067Stage 1
rs109366003169514585AT0.91−0.095Stage 1
rs1438352933197623337AG1.750.558Stage 1
rs62323680454546192GA1.160.153Stage 1
rs2202282470634441CT1.070.071Stage 1
rs7270945851283755CT1.120.111Stage 1
rs273610051286516CA0.91−0.090Stage 1
rs285367651288547TC0.91−0.089Stage 1
rs24561815176450837CG1.070.066Stage 1
rs48700846152543949CT0.92−0.087Stage 1
rs69283636152546094GA0.92−0.078Stage 1
rs584154806152562271CG1.180.167Stage 1
rs755102046152592680TG1.290.257Stage 1
rs69047576152593102AG0.93−0.078Stage 1
rs1444445836152684585TC1.120.111Stage 1
rs1388210789674217CG1.230.205Stage 1
rs109758209684160GA1.120.112Stage 1
rs70276859802228AT1.110.103Stage 1
rs1146803319815682TC1.110.108Stage 1
rs47424489826585CG1.070.066Stage 1
rs22771639827224AG0.87−0.140Stage 1
rs126516410105674854AG0.91−0.097Stage 1
rs1124600311213723TG0.85−0.167Stage 1
rs50713911225196GA0.84−0.171Stage 1
rs22075481132368744CA1.090.091Stage 1
rs108358891132370380GA1.140.133Stage 1
rs71204831132406983GC1.120.112Stage 1
rs110317831132459923CA1.090.084Stage 1
rs25537721135085453TG1.070.069Stage 1
rs5902156511107999907CG1.110.109Stage 1
rs14137900911108149207TG1.320.281Stage 1
rs498802311108168995AC0.89−0.113Stage 1
rs1222338111108354102CT1.070.064Stage 1
rs96694031246798900GA1.070.071Stage 1
rs128327771246831129TC1.090.086Stage 1
rs1172457331340723944GA1.260.231Stage 1
rs79864071341179798AG1.090.084Stage 1
rs669982221651481596GA0.91−0.098Stage 1
rs78378222177571752TG1.530.427Stage 1
rs7333812240669648AG1.100.091Stage 1
rs5936989X70022420TA0.92−0.081Stage 1
rs5937008X70093038CT0.91−0.094Stage 1
rs7059898X70149078CA1.070.068Stage 1
rs7888560X131171122AG1.080.077Stage 1
rs5930554X131312089TC1.140.129Stage 1
rs5933158X131578034AG1.080.074Stage 1
rs5975338X131626317AG1.100.094Stage 1
rs17631680267090367TC0.90−0.102Stage 2
rs17355373128122820TC1.070.071Stage 2
rs67751869454568834TC1.130.121Stage 2
rs69016316152567047TC0.91−0.097Stage 2
rs117904089876418GT0.94−0.063Stage 2
rs104153911922652436CT1.100.098Stage 2
rs621328011949267882AT0.90−0.105Stage 2
  1. * Discovered in stage 1 (UKBB GWAS) or in stage 2 (meta-analysis of UKBB and Helsinki)

Appendix 1—table 5
Summary of association tests for SNPs.

Each of the 57 GRS SNPs was tested for an additive effect to age at hysterectomy, degree of somatic allelic imbalance (AI) and tumor counts. Somatic allele imbalance was defined as the mean of the length of somatic allelic imbalance over all tumors of a patient. Tests of MED12 mutation positive and negative tumors are denoted by MED12mut + and MED12mut-, respectively. The numbers for regression coefficient (Beta), standard error of beta (SE), test statistic (z) and association (P) were taken by fitting either a linear regression or negative binomial (NB) regression model (response ~predictor). The nominal P-values were adjusted for FDR (Q). The risk allele was used as the effect allele for Beta. In total 228 tests, all p<0.05 are shown.
12:46798900:G:AMED12mut + countNB−0.220.08−2.880.0040.51
9:674217:C:GMED12mut + countNB−0.690.26−2.630.0080.51
X:70093038:C:TMED12mut + countNB0.210.082.620.0090.51
X:70022420:T:AMED12mut + countNB0.200.082.560.0110.51
11:32459923:C:AMED12mut + countNB0.230.092.540.0110.51
4:54568834:T:Clog(somatic AI basepairs)Linear−0.250.10−2.480.0130.51
17:7571752:T:GMED12mut- countNB−0.920.41−2.240.0250.56
6:152546094:G:AMED12mut- countNB−0.190.08−2.240.0250.56
11:107999907:C:GAge at hysterectomyLinear3.001.342.240.0260.56
13:40723944:G:AMED12mut + countNB0.360.172.170.0300.56
22:40669648:A:GMED12mut + countNB0.
5:1286516:C:AMED12mut + countNB0.
22:40669648:A:GAge at hysterectomyLinear1.360.632.160.0320.56
1:22450487:C:TAge at hysterectomyLinear−1.480.72−2.050.0410.65
16:51481596:G:Alog(somatic AI basepairs)Linear0.
5:1288547:T:CAge at hysterectomyLinear1.320.662.000.0460.65
Appendix 1—table 6
Summary of all the association tests for GRS.

All GRS related tests from the main text. The notation of MED12mut + and MED12mut- refer to the numbers of MED12-mutation-positive and -negative tumors, respectively. The tests include Wilcoxon rank-sum and models for linear and negative binomial (NB) regression (variable ~GRS). The P values were adjusted for FWER with the Holm-Bonferroni method (Q). Significant associations (Q < 0.05) are shown bolded. Note that population association tests include only the female controls.
GRS *CohortVariableTestN casesN controlsRate ratioPQ
 Stage 1HelsinkiUL phenotypeRank-sum (one-tailed)4578899-8.3e-101.1e-08
 Stage 2NFBCUL phenotypeRank-sum (one-tailed)4592351-1.1e-051.1e-04
 Stage 2HelsinkiTotal number of ULsNB457-1.250.001050.0032
 Stage 2HelsinkiAge at hysterectomyLinear392-0.500.480.48
 Stage 2HelsinkiNumber of MED12mut+NB457-1.433.2e-040.002
 Stage 2HelsinkiNumber of MED12mut-NB457-0.790.02660.053
 Stage 2HelsinkiOne-or-more MED12mut+Rank-sum334123-5.3e-040.0026
 Stage 2HelsinkiAll MED12mut+Rank-sum221123-7.9e-040.0032
 Stage 2African #UL phenotypeRank-sum (one-tailed)2961256-1.3e-051.2e-04
 Stage 2Caribbean#UL phenotypeRank-sum (one-tailed)6682041-6.7e-055.4e-04
 Stage 2Irish #UL phenotypeRank-sum (one-tailed)3986208-8.3e-069.1e-05
 Stage 2Indian #UL phenotypeRank-sum (one-tailed)2032567-2.9e-040.0020
 Stage 2Other white #UL phenotypeRank-sum (one-tailed)6478982-6.9e-098.3e-08
  1. *Based either on stage 1 (UKBB GWAS) or stage 2 (meta-analysis of UKBB and Helsinki)

    # An independent subset of UKBB data (stratified based on self-reported UKBB annotation).

Appendix 1—table 7
Population specific estimates for GRS.

Allele frequencies were collected from gnomAD (version 2.0.1). GRS weights were taken from UKBB summary statistics. The estimate for population specific GRS follows from weighting the allele frequencies with log-odds. The final values were scaled with a factor two to make them comparable with SNP dosage-based GRS. See Appendix-figure 6 for an explanation of the population acronyms.
Appendix 1—table 8
eQTLs in tumor and normal tissue.

Here, B is the effect allele. All local permutation p<0.05 are shown. Full table of eQTLs is given in Supplementary file 4.
TissueGeneIDNBest SNPDistanceNominal
Appendix 1—table 9
cis-meQTLs in candidate genes where FDR < 0.1.

The statistics were calculated with an additive linear model. Here B is used as the effect allele. Full Table of cis-meQTLs with nominal significance (p-value<0.05) is given in Supplementary file 5.
TissueSNPCpG*Betat-statp-valueFDRGeneCpG islandPromoterAAABBB
  1. * Minimum CpG coverage of ≥2 required in ≥90% of samples. CpG in 1 Mb flank from the SNP.

Appendix 1—table 10
Replication of previously suggested UL predisposition loci.

For study details, see the literature references in the main text. The population, locus, gene and risk allele (RA) information were collected from the original studies. The associations, P (UKBB) column, were taken from the UKBB summary statistics for 15,453 UL cases. The bolded values pass FWER (Bonferroni for seven independent loci; p<0.05/7).
StudyPopulationLocusSuggested geneSNPRAMethodP (UKBB)
Cha et al.Japanese22q13.1TNRC6Brs12484776GGWAS1.0E-10
Edwards et al.European Americans22q13.1TNRC6Brs12484776GGWAS1.0E-10
Cha et al.Japanese11p15.5BET1Lrs2280543GGWAS2.2E-08
Edwards et al.European Americans11p15.5BET1Lrs2280543GGWAS2.2E-08
Zhang et al.African Americans1q42.2PCNXL2rs7546784-Admixture3.6E-02
Zhang et al.African Americans2q32.2PMS1rs256552-Admixture1.8E-01
Cha et al.Japanese10q24.33SLKrs7913069AGWAS3.4E-01
Hellwege et al.African American22q13.1CYTH4rs5995416CGWAS5.3E-01
Hellwege et al.African American22q13.1CYTH4rs739187CGWAS6.2E-01
Hellwege et al.African American22q13.1CYTH4rs713939CGWAS8.0E-01
Hellwege et al.African American22q13.1CYTH4rs4821628GGWAS8.1E-01
Eggert et al.Multiple17q25.3FASNrs4247357ALinkage
and GWAS
Appendix 1—table 11
GWAS catalog and references to earlier literature on GRS SNPs.

The GRS SNPs rs10936600, rs11674184, rs2235529, rs2736100, rs2853676, rs72709458 and rs78378222 were found in the GWAS catalog (version 1.0.1; The numbers for allele frequency (AF), odds-ratio (OR) and association (P) follow those reported in the GWAS catalog.
PubmedDateJournalGene symbolRisk alleleAFORP
188358602008-10-01J Med GenetIdiopathic pulmonary fibrosisTERTrs2736100-A0.412.113.00E-08
195783672009-07-05Nat GenetGliomaTERTrs2736100-G0.491.272.00E-17
195783672009-07-05Nat GenetGliomaTERTrs2853676-A0.731.264.00E-14
198360082009-10-15Am J Hum GenetLung adenocarcinomaTERTrs2736100-G0.51.122.00E-10
201399782010-02-07Nat GenetRed blood cell countTERTrs2736100-G0.40.073.00E-08
205438472010-06-13Nat GenetTesticular germ cell cancerTERTrs2736100-T0.491.338.00E-15
207004382010-08-05PLoS GenetLung adenocarcinomaTERTrs2736100-G0.391.462.00E-22
208715972010-09-26Nat GenetLung adenocarcinomaTERTrs2736100-C0.391.273.00E-11
215317912011-04-29Hum Mol GenetGliomaTERTrs2736100-?-1.251.00E-14
217253082011-07-03Nat GenetLung cancerTERTrs2736100-C0.411.271.00E-27
218276602011-08-09BMC Med GenomicsGliomaTERTrs2736100-?--7.00E-09
219463512011-09-25Nat GenetBasal cell carcinomaTP53rs78378222-C-2.162.00E-20
228865592012-08-11Hum GenetGliomaTERTrs2736100-G0.4941.304.00E-09
231436012012-11-11Nat GenetLung cancerTERTrs2736100-G0.41.384.00E-27
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-T0.1521.287.00E-09
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-C0.7091.196.00E-06
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-T0.1341.258.00E-07
234721652013-03-05PLoS OneEndometriosisWNT4rs2235529-A0.1531.303.00E-09
235839802013-04-14Nat GenetInterstitial lung diseaseTERTrs2736100-A0.491.372.00E-19
244030522014-01-08Hum Mol GenetBasal cell carcinomaTP53rs78378222-G-2.244.00E-22
244654732014-01-21PLoS OneTelomere lengthTERTrs2736100-C0.084.00E-06
249082482014-06-08Nat GenetGlioma (high-grade)TERTrs2736100-C0.511.391.00E-15
 249454042014-06-19PLoS GenetBone mineral density (paediatric, upper limb)WNT4rs2235529-C0.850.121.00E-08
 249454042014-06-19PLoS GenetBone mineral density (paediatric, upper limb)WNT4rs2235529-C-0.123.00E-07
 258551362015-04-09Nat CommunBasal cell carcinomaTP53rs78378222-G0.0182.071.00E-20
 264240502015-10-01Nat CommunGlioblastoma-rs72709458-T-1.686.00E-24
273636822016-07-01Nat CommunMultiple myeloma-rs10936600-A-1.206.00E-15
275017812016-08-09Nat CommunEGFR mutation-positive lung adenocarcinomaTERTrs2736100-G0.3871.422.00E-31
275398872016-08-19Nat CommunBasal cell carcinomaTP53rs78378222-G0.011.412.00E-10
278632522016-11-17CellMean corpuscular hemoglobinTP53rs78378222-G0.01210.106.00E-09
278632522016-11-17CellMean corpuscular hemoglobinTERTrs2736100-A0.49820.045.00E-34
278632522016-11-17CellPlatelet countTERTrs2736100-A0.49840.033.00E-20
280173752016-12-22Am J Hum GenetMean corpuscular volumeTERTrs2736100-A-0.003.00E-06
280173752016-12-22Am J Hum GenetMean corpuscular volumeTERTrs2736100-?--2.00E-11
280173752016-12-22Am J Hum GenetMean corpuscular hemoglobinTERTrs2736100-?--1.00E-08
281352442017-01-30Nat GenetPulse pressureTP53rs78378222-T0.990.902.00E-10
283464432017-03-27Nat GenetGliomaTP53rs78378222-G0.0132.539.00E-38
283464432017-03-27Nat GenetGlioblastomaTP53rs78378222-G0.0132.635.00E-29
283464432017-03-27Nat GenetNon-glioblastoma gliomaTP53rs78378222-G0.0132.735.00E-27
285372672017-05-24Nat CommunEndometriosisGREB1rs11674184-T0.611.133.00E-17
285372672017-05-24Nat CommunEndometriosisGREB1rs11674184-T0.611.123.00E-14
286047282017-06-12Nat GenetTesticular germ cell tumorTERTrs2736100-A0.511.289.00E-25
286047322017-06-12Nat GenetTesticular germ cell tumorTERTrs2736100-A0.51.298.00E-20
Appendix 1—figure 1
Structure of UL predisposition loci.

The lead SNPs from stage one and their genomic context. The LD estimates (r2) were taken from UK10k ALSPAC. Associations are based on the UKBB cohort. Gene symbols and ENCODE tracks (details in Supplementary Methods) are shown for reference; coordinates follow hg19. See main text Table 1 for more information. In total 22 figures, ordered by genomic position.
Appendix 1—figure 2
Clinical background information for the Helsinki cohort patients.

(A,) the number of ULs per patient (n = 457). (B), patient’s age at hysterectomy (n = 392). (C), parity (n = 367). (D), body mass index (BMI; n = 366). (E), menopause status (n = 367).
Appendix 1—figure 3
MED12 mutation status distributions for the Helsinki cohort patients.

On left, mutation-positive tumors per patient (n = 457), and on right, mutation-negative tumors per patient (n = 457).
Appendix 1—figure 4
Genetic separation between the self-reported ancestries in UK Biobank.

Principal components (PC) analysis of ancestry informative markers. In total 1396 autosomal, ancestry informative SNPs were used and the resulting first two PCs are shown: top-left plot shows all 23,266 samples colored according to their self-reported ancestry. The subsequent plots show the same data divided by the self-reported ancestry. Bottom-left plot displays the variance explained by the first ten PCs. Bottom-right violin plot displays the distribution of the first principal component for cases (n = 2,212) and controls (n = 21,054). The phenotype is more prevalent in individuals with increased African ancestry (p<2.2 × 10−16; Wilcoxon rank sum W = 3 × 107).
Appendix 1—figure 5
Association between GRS and phenotype.

One somatic phenotype (MED12 mutation status) and six independent case-control replication cohorts. On left, density plots (bandwidth = 0.2) for each phenotype. X-axis gives GRS values scaled to unit variance with respect to the controls. Dashed lines denote the mean GRS for cases and controls. Associations (P) and test statistics (W) are from Wilcoxon rank-sum tests. C-scores, Nagelkerke’s R2 (pseudo R2) and Receiver operating characteristic (ROC; on right) are from a logistic regression model. Positive predictive values (PPV) were computed from the highest GRS quartile, and odds ratios (OR and 95% CI) from the top and bottom GRS quartiles.
Appendix 1—figure 6
Genomic risk scores in gnomAD populations.

Population-specific genomic risk scores (GRS) as seen based on gnomAD allele frequencies. Y-axis shows the cumulative effect of each of the 57 GRS SNPs; the X-axis is ordered by risk allele frequency. The population names displayed here follow the naming convention of the gnomAD database.
Appendix 1—figure 7
Association between GRS and number of ULs per patient.

The numbers are based on the Helsinki cohort data of 457 patients with all distinct tumors of ≥1 cm diameter harvested at hysterectomy; details on sample collection are given in Supplementary Methods. A, summary of the model. B, diagnostic plots for the model.
Appendix 1—figure 8
Association between GRS and MED12 mutations.

The numbers are based on the Helsinki cohort data (457 patients). (A) summary of the negative binomial model for MED12-mutation-positive tumor counts. (B) diagnostic plots for the negative binomial model. (C, D) similar model for the MED12-mutation-negative tumor counts.
Appendix 1—figure 9
meQTLs at TERT region.

Hi-C, TADs and CpG methylation around rs2736100 with an 1Mbp flank. The needle plot shows the meQTL associations (dashed lines at 10% FDR; green line denotes the two SNPs; gray ticks denote all CpGs tested; blue needle for positive coefficient, red for negative coefficient) for tumors (above x-axis; nAA = 8, nAB = 32, nBB = 16) and normals (below x-axis; nAA = 5, nAB = 22, nBB = 7). The only significant meQTLs were seen around TERT.
Appendix 1—figure 10
Overview of the CpG methylation around TERT in tumor tissue.

There are several CpGs in TERT whose methylation level is associated with rs2736100 genotype (nominal p<0.05). Some of them are also detected in normal tissue (Appendix 1—Table 9). Hypomethylation refers to decreased methylation in BB vs. AA genotype and hypermethylation the opposite.
Appendix 1—figure 11
MED12 expression.

Linear model. Here, A is the risk allele. Beta 0.247 ± 0.177.
Appendix 1—figure 12
Telomere lengths in tumors and normals.

Here A denotes the risk allele (see Appendix 1—Table 1 for the allele information). In tumors, shorter telomere length is significantly associated with the risk allele at 5p15.33 (rs2736100) (Kruskal-Wallis test).
Appendix 1—figure 13
The association between risk allele count and telomere length On X-axis, total number of risk alleles at TERT (rs72709458, rs2736100, rs2853676), TERC (rs10936600) and OBFC1 (rs1265164).

On Y-axis, estimated telomere length. Linear regression model p=0.055; 95% CI −408.511 – 4.704 per one risk allele.

Data availability

The UKBB data is available through the UK Biobank ( The NFBC data can be requested from the Northern Finland Birth Cohorts' Project Center at the Medical Faculty, University of Oulu ( The summary statistics that support the findings presented in this work are included in Supplementary Tables and Supplementary Data.

The following previously published data sets were used


    1. Gross K
    Obstetrics & Gynecology
    Finding genes for uterine fibroids, Obstetrics & Gynecology, 95, Elsevier, 10.1016/S0029-7844(00)00715-8.
    1. Heinonen H-R
    2. Sarvilinna NS
    3. Sjöberg J
    4. Kämpjärvi K
    5. Pitkänen E
    6. Vahteristo P
    7. Mäkinen N
    8. Aaltonen LA
    MED12 mutation frequency in unselected sporadic uterine leiomyomas
    Fertility and sterility 102:1137–1142.
    1. Hill MA
     Uterus Development
    1. Lee JH
    2. Cheng R
    3. Honig LS
    4. Feitosa M
    5. Kammerer CM
    6. Kang MS
    7. Schupf N
    8. Lin SJ
    9. Sanders JL
    10. Bae H
    11. Druley T
    12. Perls T
    13. Christensen K
    14. Province M
    15. Mayeux R
    Genome wide association and linkage analyses identified three loci-4q25, 17q23.2, and 10q11.21-associated with variation in leukocyte telomere length: the long life family study
    Frontiers in Genetics 4:e310.

Article and author information

Author details

  1. Niko Välimäki

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis, Writing—original draft
    Contributed equally with
    Heli Kuisma
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9200-9560
  2. Heli Kuisma

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis, Writing—original draft
    Contributed equally with
    Niko Välimäki
    Competing interests
    No competing interests declared
  3. Annukka Pasanen

    Department of Pathology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0079-9807
  4. Oskari Heikinheimo

    Department of Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Competing interests
    No competing interests declared
  5. Jari Sjöberg

    Department of Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Competing interests
    No competing interests declared
  6. Ralf Bützow

    Department of Pathology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    Competing interests
    No competing interests declared
  7. Nanna Sarvilinna

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    3. Department of Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    4. Institute of Biomedicine, Biochemistry and Developmental Biology, University of Helsinki, Helsinki, Finland
    Competing interests
    No competing interests declared
  8. Hanna-Riikka Heinonen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis
    Competing interests
    No competing interests declared
  9. Jaana Tolvanen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1183-4943
  10. Simona Bramante

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis
    Competing interests
    No competing interests declared
  11. Tomas Tanskanen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis
    Competing interests
    No competing interests declared
  12. Juha Auvinen

    1. Northern Finland Birth Cohorts' Project Center, Faculty of Medicine, University of Oulu, Oulu, Finland
    2. Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu, Finland
    Competing interests
    No competing interests declared
  13. Outi Uimari

    Department of Obstetrics and Gynecology, PEDEGO Research Unit, Medical Research Center Oulu, Oulu University Hospital, University of Oulu, Oulu, Finland
    Competing interests
    No competing interests declared
  14. Amjad Alkodsi

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3528-4683
  15. Rainer Lehtonen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Formal analysis
    Competing interests
    No competing interests declared
  16. Eevi Kaasinen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    3. Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
    Formal analysis
    Competing interests
    No competing interests declared
  17. Kimmo Palin

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4621-6128
  18. Lauri A Aaltonen

    1. Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
    2. Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6839-4286


Terveyden Tutkimuksen Toimikunta (1250345)

  • Lauri A Aaltonen

European Research Council (695727)

  • Lauri A Aaltonen


  • Lauri A Aaltonen

Sigrid Juséliuksen Säätiö

  • Lauri A Aaltonen

Jane ja Aatos Erkon Säätiö

  • Lauri A Aaltonen

Luonnontieteiden ja Tekniikan Tutkimuksen Toimikunta (287665)

  • Niko Välimäki
  • Niko Välimäki

NordForsk (62721)

  • Kimmo Palin

Terveyden Tutkimuksen Toimikunta (312041)

  • Lauri A Aaltonen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


We are thankful to Sini Marttinen, Sirpa Soisalo, Marjo Rajalaakso, Inga-Lill Åberg, Iina Vuoristo, Alison Ollikainen, Elina Pörsti, Salla Välipakka and Heikki Metsola for their technical support. We also thank Pirjo Ikonen and the rest of the staff of the Kätilöopisto Maternity Hospital, and the staff of the Department of Pathology, University of Helsinki for technical assistance. We thank Minna Männikkö, Tuula Ylitalo, and the rest of the staff of the Northern Finland Birth Cohort Studies, Faculty of Medicine, University of Oulu for technical assistance. The study was supported by grants from Academy of Finland (Finnish Center of Excellence Program 2012 – 2017, 2018 – 2025, No. 1250345 and 312041), European Research Council (ERC, 695727), Cancer Society of Finland, Sigrid Juselius Foundation and Jane and Aatos Erkko Foundation. NV received a grant from the Academy of Finland (No. 287665). KP received a grant from Nordic Information for Action eScience Center (NIASC), the Nordic Center of Excellence financed by NordForsk (No. 62721). This research has been conducted using the UK Biobank Resource, project #32506.


Human subjects: The anonymous patient samples (65) were collected according to Finnish laws and regulations by permission of the director of the health care unit. For the rest of the patients, an informed consent was obtained. This study was conducted in accordance with the Declaration of Helsinki and approved by the Finnish National Supervisory Authority for Welfare and Health, National Institute for Health and Welfare (THL/151/5.05.00/2017), and the Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS/177/13/03/03/2016).

Version history

  1. Received: March 29, 2018
  2. Accepted: September 17, 2018
  3. Accepted Manuscript published: September 18, 2018 (version 1)
  4. Version of Record published: October 26, 2018 (version 2)


© 2018, Välimäki et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 3,992
  • 433
  • 59

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Niko Välimäki
  2. Heli Kuisma
  3. Annukka Pasanen
  4. Oskari Heikinheimo
  5. Jari Sjöberg
  6. Ralf Bützow
  7. Nanna Sarvilinna
  8. Hanna-Riikka Heinonen
  9. Jaana Tolvanen
  10. Simona Bramante
  11. Tomas Tanskanen
  12. Juha Auvinen
  13. Outi Uimari
  14. Amjad Alkodsi
  15. Rainer Lehtonen
  16. Eevi Kaasinen
  17. Kimmo Palin
  18. Lauri A Aaltonen
Genetic predisposition to uterine leiomyoma is determined by loci for genitourinary development and genome stability
eLife 7:e37110.

Share this article

Further reading

    1. Chromosomes and Gene Expression
    2. Genetics and Genomics
    Lisa Baumgartner, Jonathan J Ipsaro ... Julius Brennecke
    Research Advance

    Members of the diverse heterochromatin protein 1 (HP1) family play crucial roles in heterochromatin formation and maintenance. Despite the similar affinities of their chromodomains for di- and tri-methylated histone H3 lysine 9 (H3K9me2/3), different HP1 proteins exhibit distinct chromatin-binding patterns, likely due to interactions with various specificity factors. Previously, we showed that the chromatin-binding pattern of the HP1 protein Rhino, a crucial factor of the Drosophila PIWI-interacting RNA (piRNA) pathway, is largely defined by a DNA sequence-specific C2H2 zinc finger protein named Kipferl (Baumgartner et al., 2022). Here, we elucidate the molecular basis of the interaction between Rhino and its guidance factor Kipferl. Through phylogenetic analyses, structure prediction, and in vivo genetics, we identify a single amino acid change within Rhino’s chromodomain, G31D, that does not affect H3K9me2/3 binding but disrupts the interaction between Rhino and Kipferl. Flies carrying the rhinoG31D mutation phenocopy kipferl mutant flies, with Rhino redistributing from piRNA clusters to satellite repeats, causing pronounced changes in the ovarian piRNA profile of rhinoG31D flies. Thus, Rhino’s chromodomain functions as a dual-specificity module, facilitating interactions with both a histone mark and a DNA-binding protein.

    1. Genetics and Genomics
    2. Neuroscience
    Yifei Weng, Shiyi Zhou ... Coleen T Murphy
    Research Article

    Cognitive decline is a significant health concern in our aging society. Here, we used the model organism C. elegans to investigate the impact of the IIS/FOXO pathway on age-related cognitive decline. The daf-2 Insulin/IGF-1 receptor mutant exhibits a significant extension of learning and memory span with age compared to wild-type worms, an effect that is dependent on the DAF-16 transcription factor. To identify possible mechanisms by which aging daf-2 mutants maintain learning and memory with age while wild-type worms lose neuronal function, we carried out neuron-specific transcriptomic analysis in aged animals. We observed downregulation of neuronal genes and upregulation of transcriptional regulation genes in aging wild-type neurons. By contrast, IIS/FOXO pathway mutants exhibit distinct neuronal transcriptomic alterations in response to cognitive aging, including upregulation of stress response genes and downregulation of specific insulin signaling genes. We tested the roles of significantly transcriptionally-changed genes in regulating cognitive functions, identifying novel regulators of learning and memory. In addition to other mechanistic insights, a comparison of the aged vs young daf-2 neuronal transcriptome revealed that a new set of potentially neuroprotective genes is upregulated; instead of simply mimicking a young state, daf-2 may enhance neuronal resilience to accumulation of harm and take a more active approach to combat aging. These findings suggest a potential mechanism for regulating cognitive function with age and offer insights into novel therapeutic targets for age-related cognitive decline.