Abstract
Parent-of-origin effect (POE) is a phenomenon whereby an allele’s effect on a phenotype depends both on its allelic identity and parent from whom the allele is inherited, as exemplified by the polar overdominance in the ovine callypyge locus and the human obesity DLK1 locus. Systematic studies of POE of expression quantitative trait loci (eQTL) are lacking. In this study we use trios among participants in the Framingham Heart Study to examine to what extend POE exists for gene expression of whole blood using whole genome sequencing and RNA sequencing. For each gene and the SNPs in cis, we performed eQTL analysis using genotype, paternal, maternal, and joint models, where the genotype model enforces the identical effect sizes on paternal and maternal alleles, and the joint model allows them to have different effect sizes. We compared models using Bayes factors to identify paternal, maternal, and opposing eQTL, where paternal and maternal effects have opposite directions. The resultant variants are collectively called POE eQTL. The highlights of our study include: 1) There are more than 2, 000 genes harbor POE eQTL and majority POE eQTL are not in the vicinity of known imprinted genes; 2) Among 180 genes harboring opposing eQTL, 99 harbor exclusively opposing eQTL, and 58 of the 99 are phosphoprotein coding genes, reflecting significant enrichment; 3) Paternal eQTL are enriched with GWAS hits, and genes harboring paternal eQTL are enriched with drug targets. Our study demonstrates the abundance of POE in gene expression, illustrates the complexity of gene expression regulation, and provides a resource that is complementary to existing resources such as GTEx. We revisited two previous POE findings in light of our POE results. A SNP residing in KCNQ1 that is maternally associated with diabetes is a maternal eQTL of CDKN1C, not KCNQ1. A SNP residing in DLK1 that showed paternal polar overdominance for human obesity is a maternal eQTL of MEG3, offering an explanation for the baseline risk of homozygous samples through association between MEG3 expression and obesity. Finally, we advised caution on conducting Mendelian randomization using gene expression as the exposure.
1 Background
Parent-of-origin effect (POE) is a phenomenon whereby an allele’s effect on a phenotype not only depends on its allelic identity, but also on the parent from whom the allele is inherited (DeChiara et al., 1991; Cockett et al., 1996). POE can be driven by factors such as sex bias in transmission of genetic variants (Tomé et al., 2011) and maternal genetic effects that influences the environment in which the offspring develops (Hager et al., 2008). The main driver, however, is genomic imprinting, which is a phenomenon where one of the two alleles at a locus is functionally silenced by methylation. Importantly, which parental copy of the allele being silenced is highly consistent for that gene across samples. Imprinting appears to play an important role in embryonic and placental development and social behavior (Reik and Walter, 2001; Garfield et al., 2011). Genes underpinning syndromic disorders such as Prader-Willi, Beckwtih-Weidemann, and Angelman show POE (Lawson et al., 2013). The same is also true for common disease phenotypes, such as breast cancer, diabetes, and cardiovascular disease (Kong et al., 2009; Hanson et al., 2013; Mozaffari et al., 2019).
In genetic association studies, particularly genome-wide association studies (GWAS), the absence of POE is usually assumed implicitly. This assumption is largely out of convenience, as in most GWAS, the parent-of-origin of an allele is not directly observable, and not inferable due to lack of first and second degree relatives in the sample. Exceptions include an Icelandic cohort, with their extensively documented pedigrees and the abundance of closely related samples (Kong et al., 2009), a Hutterite pedigree (Mozaffari et al., 2019), the Framingham Heart Study with three-generations of participants (Kannel et al., 1979), and the biobank datasets, for which parent-of-origin can be inferred using distant relatives (Hofmeister et al., 2022). Another contributing factor to the implicit assumption may be the presumed scarcity of documented POE in humans. Although POE is believed to be associated with a wide range of complex traits and diseases (Lawson et al., 2013), significant associations reported in GWAS setting are of limited quantity. For example, in a Hutterite study, Mozaffari et al. (2019) examined 21 phenotypes and produced 18 significant associations with POE, fewer than one association per phenotype.
Mouse studies demonstrated that, despite a limited number of imprinted genes, most phenotypes display POE, and loci that show POE were not enriched for known imprinted genes. Using reciprocal F1 crosses, Mott et al. (2014) showed that non-imprinted genes can generate POE by interactions with imprinted loci. Focusing on metabolic traits and using a mouse population at different levels of intercrossing, Macias-Velasco et al. (2022) identified a network comprised of three imprinted and six non-imprinted genes that show POE. In humans, however, loci with POE in relation to phenotypes are either near imprinted regions by design (Kong et al., 2009; Hanson et al., 2013), or near regions with characteristic of imprinting from a genome-wide scan (Mozaffari et al., 2019). To what extend POE contributes to phenotypic variation in humans and how widespread the loci with POE in the human genome remains elusive.
Using whole genome sequencing and RNA sequencing data from the Framingham Heart Study, and taking advantage of its relative abundance of trios, we set to investigate to what extent and degree gene expressions is affected by parent-of-origin. There exist multiple studies on POE on gene expression (Zhabotynsky et al., 2019; Jadhav et al., 2019; Deng et al., 2020) or POE of methylation and gene expression jointly (Zink et al., 2018), but these studies are either underpowered due to small sizes or focus on analysis of allele specific gene expression instead of eQTL. A systematic study of POE eQTL is lacking and our study aims to fill this void.
2 Results
2.1 Overview of data processing
We identified 1477 trios with whole genome sequencing (WGS) data and children within the trios having RNAseq data. After routine QC of genotype data, for each trio, we first phased the child based on rules of Mendelian inheritance, which resolved all non triple heterozygous genotypes. The genotype data has negligible Mendelian error rate, and see details in Methods on how to handle them.
We then marked the remaining triple heterozygous genotypes as missing, which produced phased paternal and maternal haplotypes for each child in the trio. Next we imputed missing alleles into their paternal and maternal haplotype background, and the parental haplotype with a higher imputed allele dosage was assigned the reference allele, the other haplotype was assigned the alternative allele. Simulation studies showed this mask and impute phasing to be highly accurate (Supplementary Table S1). Thus at each SNP, we have a genotype vector, a paternal vector, and a maternal vector.
To process RNAseq data, we started from Transcript Per Million (TPM) values, and selected 16, 824 genes that have < 5% of 0 TMP values. We then corrected for the GC content bias using an approach that is similar to the one used by EDAseq (Risso et al., 2011). The corrected gene expression values were then quantile normalized.
2.2 Bayes factors as evidence for association
For genetic association we used a linear mixed model, fitted by a novel and efficient method that is designed for analyzing multiomics datasets (Guan and Levy, 2024a). The genetic relatedness matrix (twice of kinship matrix) used for the linear mixed model was estimated by Kindred (Guan and Levy, 2024c). The covariates included age, sex, body mass index, and white blood cell composition. This study focuses on genetic association between gene expression and genetic variants in cis, defined as within 1Mb from transcription start site. We elected to use Bayes factors (Kass and Raftery, 1995) instead of p-values as the evidence of association for the following considerations: 1) Bayes factor has been successfully applied in high profile studies as evidence for genetic association (Wellcome Trust Case Control Consortium, 2007). 2) Bayes factor and p-values have a nice monotone relationship for a fixed allele frequency (Zhou and Guan, 2018; Guan and Levy, 2024b). If an eQTL is significant with Bayes factor, it is likely also significant with p-value, but not vice versa (due to the next point). 3) Bayes factors panelize genetic variants with small allele frequencies to reduce false positives (Guan and Stephens, 2011; Zhou and Guan, 2018). In other words, variants of small allele frequencies may have significant p-values, but unsignificant Bayes factors. This is useful as paternal or maternal allele frequencies can be low by chance even the genotype minor allele frequencies are chosen to be ≥ 0.01. 4) Informally, Bayes factor accounts for multiple testing through prior odds in a manner similar to Bonferroni correction: a smaller prior odds requires larger Bayes factor to maintain the same level of posterior odds for association. 5) Parent-of-origin effects require comparing multiple non-nested modles (e.g., maternal vs. paternal effects). Bayes factors provide a coherent framework for such comparisons, while frequentist approaches lack an equivalent (Stephens and Balding, 2009).
For each gene and its cis-SNP, we computed four Bayes factors: genotype Bayes factor (BFg), paternal Bayes factor (BF1), maternal Bayes factor (BF0), and joint Bayes factors (BFj). When computing joint Bayes factor, we also computed paternal and maternal effect sizes and attached a p-value to test whether they differ significantly. Examples of test statistics are provided in Table 1. BFg, BF0 and BF1 are one degree of freedom tests, but BFj is a two degree of freedom test. Thus to achieve BFj > BFg, the paternal and maternal effect sizes have to differ significantly to compensate for the extra degree of freedom. Examples in Table 1 are chosen because their paternal and maternal effects differ significantly, and we have BFj >> BFg. Another useful observation is that BF1 is correlated with |β1|, and BF0 is correlated with |β0|. But BF1 and BF0 are oblivious to the signs of β1 and β0, and signs are informative and interesting in our context.

Example of eQTL statistics.
lgBFx columns are log10 BFx. Column βj’s are effect estimates in joint analysis used to compute BFj. P column contains − log10 p-value with the null hypothesis β0 = β1 and alternative hypothesis β0 ≠ β1. Examples chosen here have most significant P among sentinel eQTL. Transcript start site (TSS) is in Mb, and the coordinate is from HG38. Bona fide imprinted genes are colored in blue.
We used log10 BF = 4 as the significance threshold. If the prior odds for a cis-eQTL is 1 out of 1000, this threshold gives the posterior probability of association (PPA) of 0.91. If the prior odds for a cis-eQTL is 1 out of 100, this threshold gives the PPA of 0.99. 1 – PPA is the Bayesian counterpart of the local false discovery rate (c.f. Soloff et al., 2024).
2.3 Overview from sentinel eQTL of joint analysis
Table 1 contains 10 genes and test statistics of their sentinel eQTL. These eQTL are chosen because they have most significant column P (see caption for details). A sentinel eQTL for a gene is defined as the eQTL with largest BFj for that gene. A gene with significant eQTL is called eGene. These sentinel eQTL are either BF1 >> BF0 ≈ 1 such as NDN, or BF0 >> BF1 ≈ 1 such as MEG3, or BFj >> BFg ≈ 1 such as NECAB3. In Table 1 the sentinel eQTL were ordered according to column P, which measures the significance of differences between paternal and maternal effects. Reassuringly, six out of top ten sentinel eQTL with most significant P are from bona fide imprinted genes (colored in blue) according to http://geneimprint.com.
Figure 1 plots paternal effect vs maternal effect for all sentinel eQTL whose log10 BFj > 3. The color of dots corresponding to degree of difference between paternal and maternal effects. The marked sentinel eQTL include known imprinted genes such as maternally expressed ZNF331 and MEG3, and paternally expressed FAM50B, GNAS, SGCE, NDN, SNURF, SNRPN, and PEG10. Imprinted genes are located along the x and y-axes. At least two genes appear to be novel imprinted genes: PPIEL and CCR9. Gene ZNF890P provides an example of paternal and maternal effects being in the same direction, but their sizes differ significantly. Finally, NECAB3 and LSM7 are two examples of paternal and maternal effects that are in opposite directions.

Comparison of paternal effects and maternal effects of sentinel eQTL in joint analysis.
Each point is the most prominent eQTL with log10 BFj > 3 of gene. The paternal effect is on x-axis and maternal effect y-axis. The coloring reflects significance of differentials between paternal effects and maternal effects, with red more significant than blue.
This approach based on the sentinel eQTL suggests that 1) the combined effect of paternal and maternal allele can be across all 360 degrees; 2) when paternal and maternal effects are in opposite directions, both effects sizes are modest compared to when they are in the same direction; and 3) there are quite a few sentinel eQTL whose paternal and maternal effects show opposite directions.
2.4 NECAB3 : opposite paternal and maternal effects
The existence of eGenes whose eQTL show opposite paternal and maternal effects is intriguing. We focused on the example of NECAB3 to examine further details. Figure 2 left panel shows that all eQTL of NECAB3 have opposite paternal and maternal effects. Consequently, the joint Bayes factors BFj’s are much larger than their corresponding genotype Bayes factors BFg’s (right bottom).

An example of gene that whose paternal and maternal eQTL have opposite effect sizes.
On the left is paternal vs maternal effect sizes normalized by their corresponding standard deviation. The effect sizes were estimated by joint analysis. The non-significant eQTL are colored in gray and significant eQTL colored in black. An eQTL marked in blue was chosen to show boxplots with gene expression (phenotype) of NECAB3. The three panels of boxplots are for genotypes, paternal alleles, and maternal alleles. For this SNP rs4911348, the genotypes has no association with phenotype, but both paternal alleles and maternal alleles are associated with the phenotype, and the paternal and maternal effects are in opposite direction. Right panel bottom show Manhattan plot of cis-eQTL of the gene NECAB3. The genotype Bayes factor were colored in gray and joint test Bayes factor were colored in black. SNP rs4911348 was highlighted in blue.
We selected as an example SNP rs4911348, which is different from the sentinel eQTL shown in Figure 1, and looked into details of its association with the gene expression of NECAB3 (phenotype). Three panels of boxplots (right top) show that genotype has no association with the phenotype, and both paternal alleles and maternal alleles are associated with the phenotype, but in opposite directions. Opposite paternal and maternal effects were also observed in Hutterite POE study, where ten associations of opposite effects across nine traits were reported (Mozaffari et al., 2019).
2.5 POE eQTL are abundant
Next we looked beyond the sentinel eQTL and examined all significant eQTL. We noted that there are 15, 893 eQTL from 14, 733 SNPs and 1, 824 eGenes that are insignificant for the SNP test, but significant for the paternal, or maternal, or joint test. That is, log10 BFg < 4, but either log10 BF1 > 4, or log10 BF0 > 4, or log10 BFj > 4. In other words, these eQTL cannot be detected without analyzing POE, which implies that POE can contribute to recover missing heritability.
With reference Figure 1, we proposed criteria to define subsets of eQTL. The first subset eQTL locate along the x-axis, mimicking sentinel eQTL of NDN and FAM50B; the second subset locate along the y-axis, mimicking sentinel eQTL of ZNF331 and MEG3; the third subset locate along the secondary diagonal, mimicking sentinel eQTL of NECAB3 and LSM7; and the fourth subset follows (blues dots) along the diagonal line. Note Figure 1 only involves BFj, our criteria to define gene sets also takes into account of BF1, BF0, and BFg (details in Methods).
The first set of eQTL have significant paternal effects but insignificant maternal effects. We identified 15, 576 such paternal eQTL (set SP) from 14, 372 SNPs associated with 1, 188 eGenes. The second set of eQTL have significant maternal effects but insignificant paternal effects. We identified 14, 783 such maternal eQTL (set SM) from 13, 293 SNPs associated with 1, 209 eGenes. We refer to these paternal and maternal eQTL as imprinting eQTL. The third set of eQTL have opposite paternal and maternal effects, such that joint Bayes factors are much larger than genotype Bayes factor BFj >> BFg. We identified 688 such opposing eQTL (set SO) from 485 SNPs that are associated with 180 eGenes. Imprinting eQTL and opposing eQTL are referred to as POE eQTL.
The fourth set of eQTL require that both paternal and maternal alleles alone are associated with an expression phenotype, and the two effects are in the same direction. For these eQTL BFg > BFj because similar effect sizes favor one degree of freedom test. We identified 884, 119 such genotype eQTL from 577, 701 SNPs and 4940 eGenes. The percent of eGenes with genotype eQTL is on par, but smaller than GTEx study (Consortium et al., 2020), presumably because log10 BFg > 4 is a more stringent threshold for eQTL analysis.
2.6 Paternal eQTL are enriched with GWAS hits
Using ANNOVAR (Wang et al., 2010), we annotated all four sets of eQTL (Table 2). We made the following observations: 1) The median distance to transcription start site (TSS) for opposing eQTL is 673.2Kb, much larger than other types of eQTL, and majority of opposing eQTL SO are located in introns of nearby genes. 2) The percent of GWAS hits for SNPs in SO is 0.022, significantly lower than that of SNPs in genotype eQTL SG’s 0.054 (test for proportion P = 0.001). This is reasonable as GWAS mainly use genotype test and are agnostic to opposing eQTL. 3) The precent of GWAS hits for SNPs in paternal eQTL SP is higher than that of SNPs in SG (test for proportion P = 9 × 10−11), and the percent of GWAS hits among SNPs in maternal eQTL SM has no significant difference to that of SNPs in SG (test for proportion P = 0.051).

Top tabular: Annotation of eQTL. SO is a set of opposing eQTL, SP is a set of paternal eQTL, SM is a set of maternal eQTL, and SG is a set of genotype eQTL. The column eQTL contain counts of eQTL in each set, and column SNP contains counts of distinct SNPs of eQTL in the set. The column eGene is the number of genes associated with eQTL in the set. The column of Len is the median length in Kb of the eGenes. (The pattern is the same with the mean length.) The column DT SS contains median distance in Kb to transcription start site. The column GWAS contains percent of GWAS hits among SNPs, with GWAS p-value threshold of 5 × 10−8. Bottom tabular: A SNP set Pk contain SNPs in SG that are eQTL of at least k eGenes. (Note P1 = SG.) Percent of GWAS hits for SNP set Pk increases with k.
Number of eQTL per SNP (r) measures degree of pleiotropy of a set of SNPs. SNPs in SP and SNPs in SM have similar pleiotropy r ≈ 1.1, SNPs in SG have the largest r = 1.53, and SNPs in SO display r = 1.41. In other words, SNPs are more exclusive for imprinting eQTL, less exclusive for genotype eQTL, and somewhere in between for opposing eQTL. Table 2 (bottom) compares how the degree of pleiotropy affects their annotation for SNPs in SG, where set Pk contains SNPs in SG that are eQTL of at least k eGenes. The most interesting observation is that the percent of GWAS hits increases with rising SNP pleiotropy. This feature is only observed in genotype eQTL.
2.7 POE eGenes are abundant
There are 1, 188 eGenes harboring paternal eQTL and 1, 209 eGenes harboring maternal eQTL. Their union contains 2, 139 eGenes harbor imprinting eQTL, and their intersection contains 258 eGenes harboring both paternal and maternal eQTL. There are 4, 940 eGenes harboring genotype eQTL, and 1, 867 eGenes harboring both genotype eQTL and imprinting eQTL. Supplementary Figure S1 showed an example of an eGene harboring both maternal and paternal eQTL: GZMH, and an example of an eGene harboring both genotype eQTL and imprinting eQTL: DSE. The most prominent feature is that paternal eQTL and maternal eQTL cluster in separate genomic regions, and genotype eQTL and imprinting eQTL also cluster in separate genomic region.
2.8 eGenes that harbor exclusively imprinting eQTL
There are 129 eGenes (set G1) that harbor exclusively paternal eQTL, and 139 eGenes (set G0) that harbor exclusively maternal eQTL. Naturally, G1 contain many bona fide imprinted genes with paternal expression such as FAM50B, NDN, SGCE, SNRPN, SNURF and PEG10. Remaining genes in G1 are candidates for imprinted genes with paternal expression, for example CCR9 and NCOA2. G0 contain bona fide imprinted genes with maternal expression such as GRB10, ZNF331 and MEG3. The remaining in G0 are candidates for imprinted genes with maternal expression, for example COA8 and ZNF888.
The Supplementary Material provides a full list of the candidate imprinted genes from the eQTL analysis. Note however, that not all imprinted genes have eQTL, therefore this list is incomplete. Moreover, a gene on the list is considered behaving like an imprinted gene, but may not be a bona fide imprinted gene. For example, a gene that has no cis-eQTL on its own, may “acquires” a cis-eQTL through gene-gene interaction with a putative imprinted gene such that the cis-eQTL is in fact its trans-eQTL.
We used G1 and G0 as proxies for paternal and maternal imprinted genes, and used cytoBands these genes occupies as proxies for imprinted regions to examine to what extend paternal and maternal eQTL locate in the same cytoBands as these proxy imprinted genes. There are 811 autosome cytoBands in humans, 765 of them larger than 1 Mb, and the median size is 3.2 Mb (Cheung et al., 2001). The 129 proxy paternal imprinted genes occupy 89 cytoBands (set
2.9 eGenes that harbor exclusively opposing eQTL
There are 99 eGenes (set G2) that harbor only opposing eQTL (Supplementary). Among them 58 encode phosphoproteins (Supplementary), a significant enrichment (FDR = 5.5 × 10−4) according to DAVID, a web server for functional enrichment analysis (Sherman et al., 2022). A phosphoprotein is a protein that is posttranslationally modified by the attachment of either a single phosphate group, or a complex molecule such as 5’-phospho-DNA, through a phosphate group. Because phosphorylation often serves as an on/off switch, targeting phosphoproteins or the enzymes regulating them can restore normal cell signaling in diseases. In addition, drugs can modulate the functions of these proteins, either inhibiting or enhancing their activities. This makes phosphoprotein attractive drug targets. Indeed, according to a therapeutic target database (Zhou et al., 2024), among 58 phosphoproteins genes that harbor exclusively opposing eQTL, there are three successful drug targets ITPR1, ITPR2, and REL, and three additional clinical trial targets CD46, NOTCH3, and XPO1.
Other notable phosphoproteins genes include CDKN2A, PHIP, and LEP, among others. Where CDKN2A produces two major proteins p16(INK4), which is a cyclin-depnednet kinase inhibitor, and p14(ARF), which binds the p53-stabilizing protein MDM2 (Serrano et al., 2000); PHIP is associated with Chung-Jansen syndrome, featuring behavioral problems, intellectual disability, obesity, and dysmorphia (Webster et al., 2016; Jansen et al., 2018); and LEP encodes leptin, a protein that plays a critical role in the regulation of body weight. Leptin is secreted by white adipocytes, and it binds to the leptin receptor in the brain, which in turn inhibits appetite and promotes energy expenditure (Maffei et al., 1995; Weigle et al., 1997).
2.10 Paternal eGenes are enriched with drug targets
We used a therapeutic target database (Zhou et al., 2024) to examine the enrichment with drug targets of the four sets of eGenes, grouped by their eQTL. Table 3 contains the enrichment of the four gene sets in either successful targets, clinical trial targets, or combined targets. The significant enrichment that (following Bonferroni correction of 12 tests) is highlighted. Both paternal eGenes and genotype eGenes are enriched in clinical trial targets and combined targets, but paternal eGenes are significantly more enriched than genotype eGenes (one-sided test for proportion P = 0.035). The enrichment of drug targets for paternal eGenes echoes the enrichment of GWAS hits for paternal eQTL (Table 2). This asymmetry between paternal and maternal eGenes perhaps finds its root in the conflicting interests of two sets of genes in relation to transfer of nutrients from the mother to her offspring (Moore and Haig, 1991). It is known that paternal alleles may have distinct evolutionary and functional roles compared to maternal alleles. Maze serves as an extreme example, where gene expression of the hybrid is regulated exclusively by the paternally transmitted alleles (Swanson-Wagner et al., 2009).

Enrichment of drug target genes.
GO contains eQTL with both paternal and maternal effects but they are in opposite direction. GP contains eQTL of paternal effects. GM contains eQTL of maternal effects. GG contain eQTL that have both paternal and maternal effects and they are in the same direction. Column nGene are sizes of gene sets, Column nTarget are counts of successful drug targets. The last column contains one-sided test-of-proportion p-values. The highlighted p-values are significant after Bonferroni correction of 12 tests.
3 Discussion
Our study provides a comprehensive analysis of POE eQTL in human gene expression using data from the Framingham Heart Study. The extensive presence and complexity of POE uncovered in this study have significant implications for our understanding of genetic regulation and its role in complex traits and diseases. To illustrate the implications in the context of existing research, we revisit two previous findings in light of our new results, and advise caution on conducting Mendelian randomization using gene expression as the exposure.
3.1 T2D and maternal allele of rs2237892
SNP rs2237892 in the last intron of KCNQ1 was found in association with type 2 diabetes (T2D) in a Japanese cohort (Yasuda et al., 2008), and the association was later confirmed in a Chinese cohort (Liu et al., 2009). Three other SNPs (rs2283228, rs2237895, and rs2237897) in the same last intron of KCNQ1 were found in association with T2D in another Japanese cohort (Unoki et al., 2008). Both rs2237895 and rs2237897 were replicated in a Singaporean cohort, a Danish cohort (Unoki et al., 2008), and a Chinese cohort (Liu et al., 2009). In these studies, KCNQ1 was identified as T2D candidate gene.
Based on these results from the genotype test, Kong et al. (2009) demonstrated that SNP rs2237892 was maternally associated with T2D in an Iceland cohort. This maternal association was replicated with a larger odds ratio in a Pima Amerindian cohort (Hanson et al., 2013). According to data from GTEx, none of these SNPs are eQTL of KCNQ1 and CDKN1C in any tissue, consistent with the fact that these are imprinted genes and GTEx study is oblivious to parent of origin. Our eQTL analysis shows that SNPs rs2237892, rs2283228, and rs2237897 are maternal eQTL of CDKN1C, a down-stream neighbor of KCNQ1, and none of these SNPs are eQTL of KCNQ1.
We therefore suggest that CDKN1C, instead of KCNQ1, is a T2D candidate gene, for the following reasons: 1) These GWAS SNPs connect T2D and CDKN1C quantitatively, but connect T2D and KCNQ1 only geographically; 2) A study suggests that CDKN1C mutations may represent a novel monogenic form of diabetes (Kerns et al., 2014); 3) A boy carrying a frameshift mutation in CDKN1C was diabetic from week 29 (Berland et al., 2022); and 4) Targeted demethylation at the CDKN1C/p57 locus induces human β cell replication, while the loss of insulin-secreting β cell is characteristic among T1D and T2D (Ou et al., 2019).
3.2 Polar overdominance and rs1802710
Overdominance is a pattern of inheritance such that both homozygous AA and BB have the same baseline trait value, while heterozygous AB has a higher trait value. Polar over-dominance (Cockett et al., 1996) introduces asymmetry into overdominance to separates AB into two groups according to parent-of-origin of A (or B), such that AB with paternal A (or B) has a high trait value, while AB with maternal A (or B) has the baseline trait value.
Wermter et al. (2008) studied trios of extremely obese offspring and identified rs1802710 in exon 5 of DLK1, homologous to the ovine callipyge locus (Cockett et al., 1996), whose allelic transmission pattern was consistent with polar overdominance: frequent transmission of the paternal C allele to obese children, but the relative risk for carriers of the homozygous CC genotype was not increased compared to the reference TT genotype.
Our eQTL results show that DLK1 has no significant eQTL, but the maternal C allele of rs1802710 reduces expression of MEG3, 53Kb downstream of DLK1. Both MEG3 and DLK1 are imprinted with MEG3 maternally expressed and DLK1 paternally expressed. The expression of MEG3 was shown to be significantly higher in the obese group (Daneshmoghadam et al., 2021). Therefore the maternal C allele is associated with reduced risks of obesity, which balances out the increased risk conferred by paternal C allele, thus offering an explanation for the baseline risk of homozygous CC samples in (Wermter et al., 2008). Since rs1802710 is not an eQTL of any gene in any tissue according to GTEx data, our study is critical to link rs1802710 with MEG3, albeit not in the adipose tissue.
3.3 Mendelian randomization
Mendelian randomization (MR) refers to the random allocation of alleles at the time of gamete formation. Observational epidemiology studies use MR to infer the causal effect of an exposure on an phenotype (Smith and Ebrahim, 2003), as if the random allocation of alleles is comparable to a randomized clinical trial. An important assumption for MR, among many other important assumptions, is that the phenotype conferred by a specific genetic variant is homogeneous in the population, and exchangeable between paternally and maternally inherited alleles. POE is a direct violation of this assumption (Bochud et al., 2008).
There is a growing interest in using gene expression as an exposure to perform MR across various traits (Porcu et al., 2019; van der Graaf et al., 2020). If an exposure has eQTL with POE, ignoring the parent-of-origin will bias prediction of a subset of samples. If the bias is severe it will change the ranking of the exposure. Consequently, it either compromises the power of the MR or leads to a false conclusion of a causal relationship between the exposure and the phenotype.
Our study demonstrated the abundance of POE in gene expressions, and therefore advised caution when conducting MR using gene expression as the exposure. Our suggestion is checking the list of POE SNPs and eGenes we provided in the Supplementary Material and excluding those that show POE towards the exposure. The same suggestion is also applicable to computing polygenic risk scores, and imputing gene expression such as PrediXcan (Gamazon et al., 2015) and a Bayesian method motivated by it (Qi et al., 2018).
3.4 Limitations of the Study
Despite the significant findings, our study has several limitations. First, the analysis is restricted to whole blood gene expression, which may not capture POE in other tissues. Gene expression and POE can be tissue-specific, and further studies are needed to investigate POE across different tissues to gain a comprehensive understanding of its impact.
Second, the study relies on the availability of trios with both whole genome sequencing and RNA sequencing data, which limits the sample size. Although the Framingham Heart Study provides a relatively large number of trios, increasing the sample size and including more diverse populations would enhance the generalizability of our findings.
Finally, the functional implications of the identified POE eQTL require further validation through experimental studies. While our study provides a comprehensive catalog of POE eQTL, understanding their biological significance and mechanisms of action will require detailed experimental investigations.
3.5 Conclusion
Our study highlights the abundance and the complexity of POE in gene expression, providing a valuable resource that complements existing databases such as GTEx. These findings have significant implications for genetic association studies, the interpretation of complext traits, and the development of targeted therapies. By expanding the understanding of POE, this study contributes to a deeper comprehension of genetic regulation and its impact on human health.
4. Methods
4.1 Genotype data
We first used pedigree and availability of RNAseq data to narrow the samples down to 2955 individuals and identified all bi-allelic SNPs following a TOPMed protocol (Taliun et al., 2021). We then used Kindred to infer pairwise kinship for each chromosome, and computed mean and standard deviation (SD) of the kinship. A sample pair that is parent-offspring can be distinguished from full sibs by the SD (Supplementary Figure S2). Trios are consisted of sample A, B, and C such that ϕ(A, B) ≈ 0 and ϕ(A, C) ≈ ϕ(B, C) ≈ 0.25 where ϕ(·, ·) is the kinship. In the end, we identified 1477 trios that have whole genome sequencing data and whose children have RNAseq data. In total, we had 7, 752, 281 autosomal biallelic SNPs with minor allele frequency > 0.01 (not corrected for relatedness). Across all 1477 trios, the Mendelian error is negligible 4.76 per 100, 000 SNPs per trio. To handle Mendelian errors, we assumed that the child was correctly genotyped, and errors occurred in the parents, but if the child is heterozygous, we assumed this to be triple heterozygous SNP. On average 1477 trios had 65 triple heterozygous genotypes per SNP. The triple heterozygous SNPs cannot be phased by Mendelian inheritance, but can be phased based on a linkage disequilibrium model (below).
4.2 Inference of parent of origin
For each trio, we first phased all markers that are not triple heterozygous using rules of Mendelian inheritance, where for the child in each trio we obtained paternal and maternal haplotypes with triple heterozygous markers whose phase were unresolved. We marked those triple heterozygous markers as missing. For markers that showed Mendelian incompatibility, we assumed that the child’s genotype was correct and marked it as missing if it was heterozygous. Using a hidden Markov model that was designed to model haplo-type variation (Guan, 2014), we imputed missing markers in each haplotypes to obtain dosage estimates (between 0 and 1). Then for each marker that was marked as missing, we compared imputed dosages between paternal and maternal haplotypes, and assigned the large dosage as 1 and small dosage as 0. We thus obtained paternal and maternal haplo-types with no missing data. We simulated trios using haplotypes from the 1000 Genomes project (Auton et al., 2015), and investigated the accuracy of the phasing by the mask and imputation approach described above. The phasing was highly accurate, the error rate was less than 1 out of 200 triple heterozygous SNPs (Supplementary Table S1).
4.3 RNAseq data
The RNAseq data of 1477 children was obtained in three batches with 1381 in batch 1; 14 in batch 2, and 82 in batch 3. Our primary goal was to correct for GC content bias. To this end, we first removed genes whose proportion of 0 TPM values was greater than 5%, which retained 16, 969 genes out of total 58103, a majority of which are coding genes. Among those, 16, 824 genes have GC content. To correct for GC content bias, we took an approach used by EDAseq to regress out the GC content from log(1+TMP), but instead of using local linear regression as documented in EDAseq, we used local quadratic regression implemented in loess in R (Supplementary Figure S3). Specifically, let y = log(1 + TMP) be a gene expression and g be the GC content, we fit ly = loess(y, g) and compute ŷ = median(y) + ly$residuals. We then quantile normalized ŷ by qqnorm(ŷ, plot.it = F)$y (in R) separately for each gene.
4.4 eQTL analysis
We used IDUL to perform eQTL analysis. IDUL fits linear mixed models to achieve exact optimal. It was specifically developed to analyze multi-omics data to achieve high efficiency by reusing the intermediate computations (Guan and Levy, 2024a). Previous analysis showed that for the Framingham Heart Study participants, the genomic inflation was well controlled using a linear mixed model with kinship matrix computed by Kindred (Guan and Levy, 2024c). In this study, we also controlled age, sex, BMI, and white blood cell compositions in peripheral blood. We now briefly describe how Bayes factors were computed. Consider a model
where W contains conventional covariates such as age and sex, including a column of 1, x contains genetic variant(s) to be tested for association, u is the random effect with Z as its loading matrix and twice of kinship K as its covariance (both Z and K are known), MVNn denotes an n-dimensional multivariate normal distribution, In is n-dimensional identity matrix. Denote X = (W, x) and b = (a, β), then Xb is the fixed effect, and we assume X has a full rank c. In genetic association studies, the random effect Zu is a nuisance term that absorbs part of the phenotype y that is attributable to population stratification and relatedness. The maximum likelihood estimate (MLE) of η can be efficiently obtained (Guan and Levy, 2024a), plug
and let Va → ∞, κ1 → 0 and κ2 → 0, Bayes factor can be evaluated efficiently in a closed form. In this study we used σ = 0.5, following prior work of Bayes factors for linear models (Servin and Stephens, 2007). Complete details on computing Bayes factors for linear mixed model can be found in (Guan and Levy, 2024b).
4.5 P-value for difference between paternal maternal effects
For joint analysis, we have
with the prior for β1 and β2 as
The Bayes factor can be similarly computed in a closed form. We obtain the posterior estimates of
4.6 Threshold for eQTL sets and gene sets
For paternal eQTL set SP, we required log10 BF1 > 4 and log10 BF0 < θ; For maternal eQTL set SM, we required log10 BF0 > 4 and log10 BF1 < θ; For opposing eQTL set SO, we required log10 BFj − log10 BFg > 4 and log10 BF1 > θ and log10 BF0 > θ, and β1 ∗ β0 < 0; For genotype eQTL set SG, we required log10 BFg > 4 and log10 BF1 > θ and log10 BF0 > θ, and β1 ∗ β0 > 0. It’s easy to see for a particular type of eQTL, different θ produced nested eQTL sets. We tried θ = 0, log10 2, and log10 3 to obtain eQTL sets (Supplementary Table S2), from which we obtained gene sets harboring those eQTL GP, GM, GA and GG, from which we obtained genes harboring exclusively paternal eQTL G1 = GP \ (GM ∪ GA ∪ GG), genes harboring exclusive maternal eQTL G0 = GM \ (GP ∪ GA ∪ GG), and genes harboring exclusively opposing eQTL G0 = GA \(GP ∪ GM ∪ GG). We compared known imprinted genes with paternal expression with G1, and known imprinted genes with maternal expression with G0. θ = log10 2 produced minimum G1 and G0 that contain known imprinted genes.
4.7 Availability of data and materials
This study analyzed existing datasets from public domain and simulated datasets. Data from the 1000 Genomes project can be found at https://www.internationalgenome.org, and datasets from Framingham Heart Study can be obtained via dbGaP. The pipeline and software used to conduct analysis is available at https://github.com/haplotype/poe. The results (POE eQTLs) generated from current study will be made available to general public once the manuscript is in print.
Acknowledgements
This research is supported by the Division of Intramural Research of the National Heart, Lung, and Blood Institute, Bethesda, MD (D.L. Principal Investigator).
Additional files
References
- A global reference for human genetic variationNature 526:68–74Google Scholar
- Deep exploration of a CDKN1C mutation causing a mixture of Beckwith-Wiedemann and IMAGe syndromes revealed a novel transcript associated with developmental delayJ. Med. Genet 59:155–164Google Scholar
- A cautionary note on the use of mendelian randomization to infer causation in observational epidemiologyInt. J. Epidemiol 37:414–6Google Scholar
- Integration of cytogenetic landmarks into the draft sequence of the human genomeNature 409:953–958Google Scholar
- Polar overdominance at the ovine callipyge locusScience 273:236–238Google Scholar
- The gtex consortium atlas of genetic regulatory effects across human tissuesScience 369:1318–1330Google Scholar
- The gene expression of long non-coding RNAs (lncRNAs): MEG3 and H19 in adipose tissues from obese women and its association with insulin resistance and obesity indicesJ. Clin. Lab. Anal 35:e23741Google Scholar
- Parental imprint-ing of the mouse insulin-like growth factor II geneCell 64:849–859Google Scholar
- Joint modeling of eQTLs and parent-of-origin effects using an orthogonal framework with RNA-seq dataHum. Genet 139:1107–1117Google Scholar
- A gene-based association method for mapping traits using reference transcriptome dataNat. Genet 47:1091–1098Google Scholar
- Distinct physiological and behavioural functions for parental alleles of imprinted grb10Nature 469:534–538Google Scholar
- Detecting structure of haplotypes and local ancestryGenetics 196:625–642Google Scholar
- Asymptotically exact fit for linear mixed model in genetic association studiesGenetics 228:iyae143Google Scholar
- Bayes factor for linear mixed model in genetic association studiesbioRxiv Google Scholar
- Estimation of inbreeding and kinship coefficients via latent identity-by-descent statesBioinformatics 40:btae082Google Scholar
- Bayesian variable selection regression for genome-wide association studies and other large-scale problemsThe Annals of Applied Statistics 5:1780–1815Google Scholar
- Maternal effects as the cause of parent-of-origin effects that mimic genomic imprintingGenetics 178:1755–1762Google Scholar
- Strong parent-of-origin effects in the association of KCNQ1 variants with type 2 diabetes in american indiansDiabetes 62:2984–2991Google Scholar
- Parent-of-Origin inference for biobanksNat. Commun 13:6668Google Scholar
- RNA-Seq in 296 phased trios provides a high-resolution map of genomic imprintingBMC Biol 17:50Google Scholar
- A genotype-first approach identifies an intellectual disability-overweight syndrome caused by PHIP haploinsufficiencyEur. J. Hum. Genet 26:54–63Google Scholar
- An investigation of coronary heart disease in families. the framingham offspring studyAm. J. Epidemiol 110:281–290Google Scholar
- Bayes factorsJournal of the American Statistical Association 90:773–795Google Scholar
- A novel variant in CDKN1C is associated with intrauterine growth restriction, short stature, and early-adulthood-onset diabetesJ. Clin. En-docrinol. Metab 99:E2117–22Google Scholar
- Parental origin of sequence variants associated with complex diseasesNature 462:868–874Google Scholar
- Genomic imprinting and parent-of-origin effects on complex traitsNat. Rev. Genet 14:609–617Google Scholar
- Variants in KCNQ1 are associated with susceptibility to type 2 diabetes in the population of mainland chinaDiabetologia 52:1315–1321Google Scholar
- Parent-of-origin effects propagate through networks to shape metabolic traitseLife 11Google Scholar
- Increased expression in adipocytes of ob RNA in mice with lesions of the hypothalamus and with mutations at the db locusProc. Natl. Acad. Sci. U. S. A 92:6957–6960Google Scholar
- Genomic imprinting in mammalian development: a parental tug-of-warTrends in Genetics 7:45–49Google Scholar
- The architecture of parent-of-origin effects in miceCell 156:332–342Google Scholar
- Parent-of-origin effects on quantitative phenotypes in a large hutterite pedigreeCommun. Biol 2:28Google Scholar
- Targeted demethylation at the cdkn1c/p57 locus induces human beta cell replicationThe Journal of Clinical Investigation 129:209–214Google Scholar
- Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traitsNat. Commun 10:3300Google Scholar
- Tissue-specific gene expression prediction associates vitiligo with suox through an active enhancerbioRxiv Google Scholar
- Genomic imprinting: parental influence on the genomeNat. Rev. Genet 2:21–32Google Scholar
- GC-content normalization for RNA-Seq dataBMC Bioinformatics 12:480Google Scholar
- Alterations in the p16INK4a/CDKN2A tumor suppressor gene in gastrinomasJ. Clin. Endocrinol. Metab 85:4146–4156Google Scholar
- Imputation-based analysis of association studies: Candidate regions and quantitative traitsPLOS Genetics 3:1–13Google Scholar
- DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update)Nucleic Acids Res 50:W216–W221Google Scholar
- ‘mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease?Int. J. Epidemiol 32:1–22Google Scholar
- The edge of discovery: Controlling the local false discovery rate at the marginThe Annals of Statistics 52:580–601Google Scholar
- Bayesian statistical methods for genetic association studiesNat. Rev. Genet 10:681–690Google Scholar
- Paternal dominance of trans-eQTL influences gene expression patterns in maize hybridsScience 326:1118–1120Google Scholar
- Sequencing of 53,831 diverse genomes from the NHLBI TOPMed programNature 590:290–299Google Scholar
- Maternal germline-specific effect of DNA ligase I on CTG/CAG instabilityHum. Mol. Genet 20:2131–2143Google Scholar
- SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes in east asian and european populationsNat. Genet 40:1098–1102Google Scholar
- Mendelian randomization while jointly modeling cis genetics identifies causal relationships between gene expression and lipidsNat. Commun 11:4930Google Scholar
- ANNOVAR: functional annotation of genetic variants from high-throughput sequencing dataNucleic Acids Res 38:e164Google Scholar
- De novo PHIP-predicted deleterious variants are associated with developmental delay, intellectual disability, obesity, and dysmorphic featuresCold Spring Harb Mol Case Stud 2:a001172Google Scholar
- Effect of regional fat distribution and Prader-Willi syndrome on plasma leptin levelsJ. Clin. Endocrinol. Metab 82:566–570Google Scholar
- Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controlsNature 447:661–678Google Scholar
- Preferential reciprocal transfer of paternal/maternal DLK1 alleles to obese children: first evidence of polar overdominance in humansEur. J. Hum. Genet 16:1126–1134Google Scholar
- Variants in KCNQ1 are associated with susceptibility to type 2 diabetes mellitusNat. Genet 40:1092–1097Google Scholar
- A statistical method for joint estimation of cis-eQTLs and parent-of-origin effects under family trio designBiometrics 75:864–874Google Scholar
- On the null distribution of bayes factors in linear regressionJournal of the American Statistical Association 113:1362–1371Google Scholar
- TTD: Therapeutic target database describing target druggability informationNucleic Acids Res 52:D1465–D1477Google Scholar
- Insights into imprinting from parent-of-origin phased methylomes and transcriptomesNat. Genet 50:1542–1552Google Scholar
Article and author information
Author information
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.107920. This DOI represents all versions, and will always resolve to the latest one.
Copyright
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Metrics
- views
- 0
- downloads
- 0
- citations
- 0
Views, downloads and citations are aggregated across all versions of this paper published by eLife.