Genetics of trans-regulatory variation in gene expression

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Heritable variation in gene expression forms a crucial bridge between genomic variation and the biology of many traits. However, most expression quantitative trait loci (eQTLs) remain unidentified. We mapped eQTLs by transcriptome sequencing in 1012 yeast segregants. The resulting eQTLs accounted for over 70% of the heritability of mRNA levels, allowing comprehensive dissection of regulatory variation. Most genes had multiple eQTLs. Most expression variation arose from trans-acting eQTLs distant from their target genes. Nearly all trans-eQTLs clustered at 102 hotspot locations, some of which influenced the expression of thousands of genes. Fine-mapped hotspot regions were enriched for transcription factor genes. While most genes had a local eQTL, most of these had no detectable effects on the expression of other genes in trans. Hundreds of non-additive genetic interactions accounted for small fractions of expression variation. These results reveal the complexity of genetic influences on transcriptome variation in unprecedented depth and detail.

https://doi.org/10.7554/eLife.35471.001

eLife digest

Every individual’s genome is unique, with variations in the DNA sequence at many thousands of points. Each difference is a change in one or more ‘letters’ of the DNA code. Some of these DNA letter variations have consequences for the way the individual looks or behaves. They can influence these traits either by changing the sequence of a protein encoded by a gene; or by changing when, where or how much a gene is active.

Studying how individual differences in the DNA influence gene activity requires a very large amount of data on many individuals within a species. Only recently have such large datasets become available. These have made it possible to study these regulatory differences in unprecedented detail.

Albert, Bloom et al. set out to map as many regulatory genetic variants as possible in budding yeast – a popular model organism used in many branches of science. The approach involved measuring how active every gene in the genome was, and which genetic variants influenced whether each gene’s activity was turned up or down, in more than 1,000 different strains of yeast. Thousands of regions of the DNA turned out to influence regulation of genes. The analysis revealed that almost every gene is influenced by a complex set of regulatory regions all over the genome. Some hotspot regions were found to influence thousands of genes at once.

The findings provide the most complete set of data for studying the effects of variation in DNA sequence on genetic regulation in any species, and can act as a model for researchers to carry out similar experiments in other species. Ultimately, these results could help understand exactly how differences in genome sequence help to make individuals unique.

https://doi.org/10.7554/eLife.35471.002

Introduction

Differences in gene expression among individuals arise in part from DNA sequence differences in regulatory elements and in regulatory genes. Regions of the genome that contain regulatory variants can be identified by tests of genetic linkage or association between mRNA levels and DNA polymorphisms in large collections of individuals. Regions for which such tests show statistical significance are known as eQTLs (Albert and Kruglyak, 2015). Regulatory variation is widespread in the species for which it has been studied; indeed, in humans, the expression of nearly every gene appears to be influenced by one or more eQTL (Aguet et al., 2017; Battle et al., 2014).

In humans, eQTLs are typically mapped by genome-wide association studies (GWAS) in unrelated individuals. To cover the genome, human GWAS must test a very large number of variants, resulting in a high multiple-testing burden and low statistical power. As a result, most human eQTL GWAS have been limited to searches for ‘local’ eQTLs that are located close to the genes they influence (GTEx Consortium, 2015; Lappalainen et al., 2013). The power to detect local eQTLs is higher because focused local tests reduce the multiple-testing burden, and because local eQTLs tend to have larger effect sizes. However, genome-wide estimates show that most regulatory variation does not arise from local eQTLs. Instead, it arises from ‘distant’ eQTLs, which are located far from the genes they influence, typically on different chromosomes, and which exert their effects through trans-acting factors (Grundberg et al., 2012; Wright et al., 2014). Although trans-acting human eQTLs have been discovered (Aguet et al., 2017; Battle et al., 2014; Brynedal et al., 2017; Fehrmann et al., 2011; Grundberg et al., 2012; Heinig et al., 2010; Lee et al., 2014; Small et al., 2011; Wright et al., 2014; Yao et al., 2017), the vast majority remains unknown. As a consequence, we know relatively little about this crucial source of regulatory genetic variation.

In model organisms, eQTLs can be identified by linkage analysis in panels of offspring obtained from crosses of genetically different individuals (Brem et al., 2002). Whereas GWAS studies are powered to test only genetic variants found at high frequency in the population (e.g. [Kita et al., 2017]), linkage studies can assay both common and rare variants that differ between the parental strains. In addition, longer blocks of linkage reduce the number of statistical tests required to cover the genome. As a result, many local and distant eQTLs have been discovered in such studies. However, even in linkage studies, sample size limitations have to date resulted in insufficient statistical power to detect most eQTLs. This limitation has manifested itself as ‘missing heritability’: detected eQTLs tend to account for only a fraction of the measured heritable component of gene expression variation. Here, we addressed this limitation by carrying out an eQTL study in a large panel of segregants from a cross between two yeast strains. The high power of our study allowed us to identify eQTLs that account for the great majority of heritable expression variation in this cross, and to characterize the distant component of regulatory variation in unprecedented depth and detail.

Results

Deep eQTL mapping explains most gene expression heritability

We developed an experimental pipeline for high-throughput generation of RNA-seq data in yeast and obtained high-quality expression measurements (Source data 1 and Source data 2) for 5720 genes in 1012 segregants from a cross between a laboratory and a wine strain (hereafter, BY and RM, respectively). We obtained high-confidence genotypes at 11,530 variant sites from low-coverage whole-genome sequences of the segregants (Bloom et al., 2013) (Source data 3). We used the genotype and RNA-seq data for eQTL mapping and identified 36,498 eQTLs for 5643 genes at a false discovery rate (FDR) of 5% (Source data 4). Only 77 genes had no detected eQTL. Among the genes with at least one detected eQTL, the median number was 6, with a maximum of 21 (Figure 1A; Supplementary Discussion 1 describes the five genes with 21 eQTLs). Previous eQTL mapping in 112 segregants from this cross detected an average of less than one eQTL per gene as a consequence of much lower statistical power (Brem et al., 2002; Smith and Kruglyak, 2008). That data set was used to obtain indirect estimates of the distribution of the number of eQTLs per gene (Brem and Kruglyak, 2005), and these agree closely with the distribution of directly detected eQTLs observed in the current study. For example, Brem and Kruglyak (Brem and Kruglyak, 2005) estimated that at most 3% of genes would be influenced by a single eQTL (we observed 2.6% of such genes), and half of genes would require >5 eQTLs (we observed a median of 6 eQTLs per gene). While Brem and Kruglyak estimated that one third of genes would require more than 8 eQTLs, we observed only 23% such genes. Additional eQTLs of very small effect missed by our study likely account for this discrepancy. The observed distribution of the number of loci also closely matched the distribution we reported for loci influencing 160 protein levels studied with the highly powered X-pQTL approach (Albert et al., 2014b). Our results provide direct demonstration that variation in expression levels of nearly all genes has a complex genetic basis.

Figure 1 with 2 supplements see all

Download asset Open asset

eQTL detection and transcriptome heritability.

(A) Histogram showing the number of eQTLs per gene. (B) Most additive heritability for transcript abundance variation is explained by detected eQTLs. The total variance explained by detected eQTLs for each transcript (y-axis) is plotted against the additive heritability (h²). The diagonal line represents a scenario under which the variance explained by eQTLs exactly matches the heritability. (C) Power to detect eQTLs as a function of effect size, and distributions of observed local and distant eQTL effects. The black curve corresponds to the statistical power (right y-axis) for eQTL detection at a genome-wide significance threshold. Colored areas show the density of individual significant eQTLs (left y-axis) that explain a given fraction of phenotypic variance (x-axis) for distant (blue) and local (red) eQTLs. Note that the x-axis is truncated at 20% variance explained to aid visualization of smaller effects, and omits a long tail of few eQTLs with large effects. (D) Histogram showing the fraction of h² explained by the sum of the eQTLs for each gene.

https://doi.org/10.7554/eLife.35471.003

We used our data to estimate the additive heritability of the expression level of each gene (i.e. the fraction of expression variance attributable to genetic factors; Figure 1—figure supplement 1; Source data 5). We observed a median heritability of 26%, with a maximum of 95% (Figure 1B). Our estimates are similar to those from population-based studies of gene expression in humans (Grundberg et al., 2012; Lloyd-Jones et al., 2017; Wheeler et al., 2016; Wright et al., 2014). The estimates are lower than heritabilities typically seen for organismal traits in this yeast cross (Bloom et al., 2013, 2015), suggesting a greater contribution of environmental and stochastic factors to gene expression variation. Across genes, heritability was positively correlated with mean expression and with expression variance, and negatively correlated with the number of protein-protein and synthetic genetic interaction partners, as well as with gene essentiality (p≤0.005) (Figure 1—figure supplement 2; Supplementary Discussion 2 and 3; Supplementary file 1; Source data 6).

In contrast to previous eQTL studies, the detected eQTLs explained most of the estimated additive gene expression heritability (a median across genes of 71.5%) (Figure 1B and D 10-fold cross-validation). Low missing heritability in our data is explained by the high power of our experiment. We had greater than 90% power to detect eQTLs that explain at least 2.5% of expression variance (Figure 1C). The distribution of effect sizes of detected eQTLs is strongly weighted toward small effects (median 1.9% of variance explained; Figure 1C), suggesting that the remaining missing heritability is explained by undetected eQTLs with even smaller effects. These results are similar to those observed for organismal traits in this cross (Bloom et al., 2013, 2015). Thus, we have discovered most eQTLs with substantial effects that segregate in this cross, and these jointly account for the great majority of the observed genetic variation in the transcriptome.

Genetic expression variation arises primarily from trans-acting hotspots

We found that 2884 genes (50% of 5720 expressed genes) had a local eQTL (defined as an eQTL whose confidence interval includes the gene it influences) at genome-wide significance (Figure 2A). This number rose to 4241 genes (74% of expressed genes) when we performed eQTL analysis with only one nearby marker per gene in order to reduce the multiple testing burden (FDR < 5%). Thus, the single pair of yeast isolates used here harbors sufficient local regulatory variation to alter the expression of more than half the genes in the genome. Comparisons with allele-specific expression data (Albert et al., 2014a) support previous results (Doss et al., 2005; Ronald et al., 2005) that most but not all local eQTLs act in cis (Figure 2—figure supplement 1, Figure 2—figure supplement 2, Supplementary Discussion 4; Supplementary files 2, 3 and 4; Source data 7).

Figure 2 with 2 supplements see all

Download asset Open asset

Contribution of local and distant eQTLs to expression variance.

(A) A stacked barplot showing for each gene with at least one eQTL the amount of phenotypic variance from local and distant eQTLs. Genes are sorted first by the amount of variance from the local eQTL, followed by the amount of variance from the strongest distant eQTL. (B) Violin plots of the distributions of fractions of phenotypic variance explained by summed local and distant eQTLs, respectively. This panel was generated using genes with at least one local and one distant eQTL.

https://doi.org/10.7554/eLife.35471.006

The vast majority of the genome-wide significant eQTLs did not overlap the genes they influenced (92%; 33,529 of 36,498); indeed, 86% were located on a different chromosome. Nearly every expressed gene (98%; 5606) had at least one such distant, trans-acting eQTL (Figure 2A). The individual effect sizes of the trans eQTLs were smaller than those of local eQTLs (median variance explained 2.8-fold less, T-test p<2.2e-16; Figure 1C). However, for the 2846 genes that had both a local eQTL and at least one distant eQTL, the aggregate effect of the distant eQTLs per gene was larger than that of the local eQTL (median 2.6-fold more variance explained; paired T-test p<2.2e-16; Figure 2B). Distant eQTLs accounted for the majority of eQTL variance for 85% of genes. Our results directly demonstrate the importance of trans acting variation.

The trans eQTLs were not uniformly distributed across the genome (Figure 3A). Instead, they clustered at 102 hotspot loci, each of which affected the expression of many genes (Brem et al., 2002; Ghazalpour et al., 2008; Orozco et al., 2012) (Figure 3B). These hotspots contained over 90% of all trans eQTLs. The eQTLs that mapped outside of the hotspots also clustered more than expected by chance (randomization p<0.001), suggesting the existence of additional hotspots that affect the expression of too few genes to pass the stringent criteria used to define the set of 102. Isolated trans-acting loci that affect the expression of one or a few genes appear to be uncommon.

Figure 3

Download asset Open asset

Locations of eQTLs in the genome.

(A) Map of local and distant eQTLs. The genomic locations of eQTL peaks (x-axis) are plotted against the genomic locations of the genes whose expression they influence (y-axis). The strong diagonal band corresponds to local eQTLs. The many vertical bands correspond to eQTL hotspots. Point size is scaled as a function of eQTL effect size, measured in fraction of phenotypic variance explained. (B) The number of gene expression traits linking to each of 102 identified eQTL hotspots (Methods) are shown as vertical bars. Text labels identify genes in hotspots referred to in the text.

https://doi.org/10.7554/eLife.35471.009

The 102 hotspots affected a median of 425 genes, ranging from 26 (a newly discovered hotspot at position 166,390 bp on chromosome III) to 4594 at the previously reported MKT1 hotspot (Zhu et al., 2008) (82% of 5629 genes with any signal at a hotspot; Figure 3B). Three additional hotspots each affected more than half of all genes. They include a previously described hotspot at the HAP1 gene (Brem et al., 2002) (3640 genes affected), as well as two newly detected hotspots on chromosome XIV. A hotspot at 372,376 bp affected 4172 genes and is likely caused by a variant that recently arose in the KRE33 gene in the RM parent used in our cross (Jerison et al., 2017). A hotspot at position 267,161 bp affected 3169 genes and spans the genes GCR2, YNL198C, WHI3 and SLZ1. These results indicate that hotspots can have extraordinarily wide-reaching effects on the transcriptome, with some influencing the expression of the majority of all genes.

Widespread effects caused by single loci likely arise from a cascade of effects in which strong primary effects spread through the cellular regulatory network. For example, the BY allele of the transcriptional activator HAP1 carries a transposon insertion that reduces HAP1 function. As expected, the BY allele strongly reduced the expression of known transcriptional targets of HAP1: 26 out of the 69 HAP1 targets present in our data were among the 50 genes with the largest reduction in expression in segregants carrying the BY allele of HAP1 (p<2.2e-16, odds ratio = 138). In total, only 75 direct transcriptional HAP1 targets are known. Unless previous work missed thousands of HAP1 targets, the vast majority of the 3640 trans eQTLs at HAP1 must reflect indirect, secondary consequences of the direct transcriptional effects. HAP1 is an activator of genes involved in cellular respiration. Thus, the many secondary effects of the BY HAP1 allele on gene expression may be mediated by cellular responses to altered metabolism arising from reduced respiration.

Our Supplementary Files and Datasets provide detailed information about each hotspot. Source data 8 contains a table that gives an overview of the hotspots, including their location, genes affected (details in Source data 9), and analyses of function (details in Source data 10) and transcriptional regulation (details in Source data 11) of the target genes of each hotspot. Supplementary file 5 visually represents each hotspot region, and Supplementary file 6 displays gene networks formed by the strongest target genes of each hotspot.

Causal genes underlying hotspots

Functional analysis of eQTL hotspots requires identification of the underlying causal genes, which has been challenging to do systematically. We developed a multivariate fine-mapping algorithm that narrows hotspot positions by leveraging information across the genes that map to each hotspot (Materials and methods). Briefly, we used all genes with an eQTL on a given chromosome and regressed out genetic factors on all other chromosomes and additional non-genetic factors. We reduced the dimensionality of the residual expression levels by singular value decomposition to capture linear combinations of traits that account for most of the variance in residual expression. We scanned the given chromosome for genetic influences on the multivariate distribution of these linear combinations, while controlling FDR via permutations. Finally, we used bootstraps to compute confidence intervals for hotspot locations (Figure 4 and 5).

Figure 4 with 4 supplements see all

Download asset Open asset

Genes located in hotspot regions.

(A) Histogram showing the number of genes located in the hotspot regions. (B) A hotspot on chromosome VIII maps to the gene *STB5*. From top to bottom: the general region on the chromosome, the empirical frequency distribution of hotspot peak locations from 1000 bootstrap samples (Materials and methods), locations of BY/RM sequence variants (red: variants with ‘high’ impact such as premature stop codons (McLaren et al., 2016); orange: ‘moderate’ impact such as nonsynonymous variants; grey: ‘low’ impact such as synonymous or intergenic variants), and gene locations. The light blue area shows the 95% confidence interval of the hotspot location as determined from the bootstraps. The red line shows the position of the most frequent bootstrap marker. (C) Genes for which the BY allele at the *STB5* hotspot is linked to lower expression are enriched for *STB5* transcription factor (TF) binding sites in their promoter regions. The figure shows enrichment results for all annotated TFs (grey dots), with the strength of enrichment (odds ratio) on the x-axis vs. significance of the enrichment on the y-axis. The *STB5* result is highlighted in red.

https://doi.org/10.7554/eLife.35471.010

With this approach, we resolved the locations of 26 hotspots to regions containing three or fewer genes (a total of 58 genes; Figure 4A). Three hotspots contained exactly one gene (GIS1, STB5, and MOT3). We previously identified and experimentally confirmed the causal genes at several major hotspots (MKT1 (Zhu et al., 2008), HAP1 (Brem et al., 2002), IRA2 (Smith and Kruglyak, 2008), GPA1 (Yvert et al., 2003), and the mating-type locus [Brem et al., 2002]). These were all correctly localized by the algorithm, validating this fine-mapping strategy.

The 58 genes at 26 high-resolution hotspots are highly enriched for the causal genes underlying the hotspots, making it possible to systematically study the functions of hotspot regulators. These genes were less likely to be essential or to have a human homolog than other yeast genes but did not differ from other genes in their expression level or the number of physical or genetic interaction partners (Supplementary file 7). The hotspot genes had highly significant enrichments for gene ontology (GO) terms related to transcriptional regulation (e.g. GO:0006357 ‘regulation of transcription from RNA polymerase II promoter’: 19 genes among the 58; 4 expected; p=4e-9; Figure 4—figure supplement 1; Source data 12), as well as weaker enrichments for terms related to response to nutrient levels (GO:0031669; ‘cellular response to nutrient levels’: 8 genes, 1 expected, p=6e-6)

These analyses indicate that causal hotspot genes are disproportionately involved in transcriptional regulation, a signal that was not picked up in an earlier study with fewer, less-well-resolved hotspots (Yvert et al., 2003). For example, we fine-mapped a new hotspot that affected 382 genes to a single gene, the transcription factor STB5 (Figure 4B). STB5 is a transcriptional activator of multidrug resistance genes (Kasten and Stillman, 1997). A previous analysis suggested reduced activity of the STB5 BY allele compared to the RM allele (Lee and Bussemaker, 2010). Consistent with this observation, we found that the promoters of genes whose expression was lower in the presence of the STB5 BY allele were strongly enriched for STB5 binding sites (Figure 4C).

To further examine the role of sequence variation in transcription factor genes, we focused on transcription factors with variants predicted to be damaging to protein function such as premature stop codons or frameshifts. There are eight such genes in our data. Remarkably, six of these (GAT1, HMS1, PUT3, RFX1, SRD1, TBS1) were located in a hotspot, often very close to the estimated peak location (Figure 4—figure supplement 2). None of these expression hotspots have been reported previously, although variation at GAT1 has been reported to influence traits relevant for wine production (Salinas et al., 2012), and a premature stop codon in the RM allele of RFX1 has been linked to reduced activity of this transcriptional repressor (Lee and Bussemaker, 2010). The remaining two transcription factors with predicted damaging mutations did not overlap a hotspot. Of these, a predicted frameshift in YRM1 is very close to the end of the coding region, while a predicted frameshift in STB4 appears to be an annotation artifact: it resides in a region that is annotated as coding but that does not in fact appear to be transcribed (Figure 4—figure supplement 3, [Albert et al., 2014a]).

Hotspot genes can also influence mRNA levels more indirectly – for instance, by shaping the cellular response to external stimuli such as nutrient availability. For example, we fine-mapped a hotspot, which influenced 645 genes, to an interval on chromosome VIII containing six genes (Figure 4—figure supplement 4A). One of these is ERC1, which encodes a transmembrane transporter. BY but not RM carries a frameshift in this gene, which removes the last two out of 12 predicted transmembrane helices of the protein (Fehrmann et al., 2013). This variant is known to reduce cell-to-cell variability (or ‘noise’) in the expression of a MET17 gene tagged with green fluorescent protein (Fehrmann et al., 2013). We found that the BY allele at this hotspot reduced the expression of genes that are highly enriched for the GO category ‘methionine biosynthetic process’ (GO:0009086, p=2e-22; Source data 8 and Source data 10). Thus, in addition to reducing MET17 expression noise, the ERC1 frameshift variant is linked to reduced mean expression levels of multiple genes in the methionine biosynthesis pathway (the MET regulon; Figure 4—figure supplement 4B). While the precise compounds that are imported or exported by Erc1p are not known, the ERC1 BY allele reduces cellular levels of S-Adenosylmethionine (SAM) (Breunig et al., 2014), a key component of methionine and cysteine amino acid metabolism (Sadhu et al., 2014). The ERC1 BY allele may down-regulate the MET regulon via its effects on SAM, triggering further transcriptional changes in hundreds of genes.

Relationship of local eQTLs and trans eQTLs

Most known causal variants underlying yeast eQTL hotspots are coding (HAP1 (Brem et al., 2002), MKT1 (Zhu et al., 2008), GPA1, AMN1 (Yvert et al., 2003), SSY1 (Brown et al., 2008); [Fay, 2013]); however, change in the expression of a trans-acting factor by a local eQTL is another plausible causal mechanism (Sudarsanam and Cohen, 2014; Yao et al., 2017). We found that a higher proportion of hotspots contained genes with a local eQTL than expected by chance (p=0.007; Figure 5A). The median effect size of the strongest local eQTL in these hotspots was larger than expected (p=0.003). These enrichments are consistent with some hotspots being caused by local eQTLs that alter the expression of a gene located at the hotspot position, which in turn leads to changes in the other transcript levels that map to the hotspot.

Figure 5

Download asset Open asset

Relationship of local eQTLs and distant eQTL hotspots.

(A) The fraction of hotspots that contain a genome-wide significant local eQTL. The black histogram shows the distribution observed in 1000 random, size-matched regions of the genome. Because of the high number of local eQTLs, most hotspots are expected to contain a local eQTL even by chance. The observed fraction (red line) still exceeds this random expectation. (B) Distribution of *trans* eQTLs at local eQTLs outside of hotspot regions. The genome was divided into non-overlapping bins centered on local eQTLs that did not overlap a hotspot. We counted the number of *trans*-eQTL peaks in each bin. The figure shows the frequency of bins with a given number of *trans*-eQTLs. The distribution observed in real data is shown by red lines, and distributions obtained in 1000 randomizations of *trans*-eQTL positions is show by clouds of black circles. The inset shows the observed less the expected frequency for each bin. Error bars indicate the 95% range from the randomizations.

https://doi.org/10.7554/eLife.35471.015

On the other hand, the majority of local eQTLs (60%) did not overlap any of the hotspots. Evidently, the expression changes caused by these local eQTLs did not in turn lead to detectable trans effects on many unlinked genes, within the limits of our statistical power.

To quantify this observation further, we focused on ‘non-hotspot’ local eQTLs that did not overlap a hotspot and asked how they related to the 10% of non-hotspot trans eQTLs that did not overlap a hotspot. We created non-overlapping genomic bins, each centered on a non-hotspot local eQTL. We then counted how many non-hotspot trans eQTL peaks fell into these bins (Figure 5B). The resulting distribution roughly matched the distribution expected if non-hotspot trans peaks occurred at random locations. To the extent that the distribution differed from random, we found an excess of bins with six or more trans eQTLs (p<0.001). There was also an excess of bins with zero trans peaks (p<0.001). This class comprised the great majority of the distribution (Figure 5B). The genetic architecture that is most consistent with these observations is one in which some local eQTLs are accompanied by multiple trans eQTLs, while most local eQTLs have no detectable trans consequences on the expression of other genes.

Even when the causal gene in a hotspot has a local eQTL, it does not automatically follow that this is the causal mechanism. For example, STB5 and ERC1 each had a local eQTL. However, STB5 did not show allele-specific expression, and while there was weak allele-specific expression for ERC1, it was in the opposite direction of the local ERC1 eQTL (Source data 7). Therefore, these local eQTLs are unlikely to be caused by cis-acting variants.

STB5 and ERC1 carry protein-altering variants between BY and RM, including the known causal ERC1 frameshift in BY. Altered protein activity due to these coding variants may be responsible for the many distant linkages to these hotspots and may also cause the observed local eQTLs in trans, as previously shown for AMN1 (Ronald et al., 2005). The Stb5p transcription factor is predicted to target its own promoter (MacIsaac et al., 2006), such that its altered activity could influence its own expression. For the transmembrane transporter encoded by ERC1, the local eQTL might reflect a more indirect mechanism. For each of these hotspots, it seems plausible that a change in protein function, rather than change in gene expression, underlies the hotspot.

Genetics of mRNA vs. protein levels

The degree to which mRNA-based eQTLs also affect the protein levels of their target genes is a fundamental open question (Battle et al., 2015; Chick et al., 2016; Foss et al., 2007; Ghazalpour et al., 2011; Picotti et al., 2013) that has been difficult to resolve as a consequence of low statistical power in eQTL and protein QTL (pQTL) studies. Low power is expected to lead to poor overlap between eQTLs and pQTLs solely as a result of high false-negative rates. We compared our eQTLs to pQTLs that we had identified earlier for 160 proteins using a powerful bulk segregant approach (Albert et al., 2014b) (Source data 13). Here, we present results comparing the distant QTLs in both datasets because – by design of our earlier study – the set of distant pQTLs is much larger than the set of local pQTLs. Results for local QTLs are broadly consistent with those for distant QTLs (Supplementary Discussion 5).

Distant pQTLs clustered at hotspots, which broadly mirrored the mRNA hotspots identified here (Figure 6A). However, differences in hotspot architecture exist. For example, a hotspot on chromosome II showed strong pQTL effects (Albert et al., 2014b) but only weak effects on mRNA levels for the same genes, none of which rose to genome-wide significance.

Figure 6

Download asset Open asset

eQTLs and pQTLs.

(A) Distant eQTL and pQTL hotspots. The figure shows the fraction of 154 genes (Albert et al., 2014b) that have an eQTL or pQTL in a given bin along the genome. eQTLs from the current dataset are shown in the upper half of the figure, and pQTLs from (Albert et al., 2014b) are shown in the bottom half with an inverted scale. Chromosome III is omitted from the figure because no pQTLs can be detected on this chromosome due to the experimental design of (Albert et al., 2014b). (**B–D**) Comparison of individual distant eQTLs and pQTLs. Each panel shows the effect size of linkage of mRNA levels for a given gene to a given genomic position (x-axis; correlation coefficient between mRNA level and marker genotype) compared to the effect size of linkage of protein levels for the same gene to the same genomic position (y-axis; difference in frequency of the BY allele between segregant pools with high and low expression of the protein [Albert et al., 2014b]). Positive values indicate higher expression in RM compared to BY. Only distant QTLs located on different chromosomes than their target gene are shown. (B) All distant eQTLs, irrespective of significance in pQTL data. Dot size scales as a function of eQTL effect size. Red circles: eQTLs that overlap a significant pQTL. Blue circles: strong eQTLs that do not overlap a pQTL (Supplementary file 8); extreme cases are indicated by QTL location and the name of the affected gene. (C) Overlapping significant eQTLs and significant pQTLs. Blue circles: overlapping QTLs with different direction of effect (Supplementary file 9); extreme cases are indicated. (D) All distant pQTLs, irrespective of significance in eQTL data. Dot size scales as a function of pQTL effect size. Red circles: pQTLs that overlap a significant eQTL. Blue circles: strong pQTLs that do not overlap an eQTL (Supplementary file 10); extreme cases are indicated.

https://doi.org/10.7554/eLife.35471.016

In order to avoid downward bias in the overlap between eQTLs and pQTLs caused by false negatives, we focused on strong QTLs in each dataset and asked if they overlapped a significant QTL in the other dataset (Materials and methods; see Supplementary Note 5 for results based on all distant QTLs). Of the 236 strongest eQTLs, 47% (111) overlapped a pQTL for the same gene. Of the 218 strongest pQTLs, 50% (108) overlapped an eQTL for the same gene. As a more sensitive alternative to QTL overlap analyses, we computed estimates of agreement based on the π₁ statistic (Storey and Tibshirani, 2003). Of the strong eQTLs, 92% were estimated to match a pQTL. Of strong pQTLs, 63% were estimated to match an eQTL. Thus, while nearly all eQTLs with strong effects on mRNA levels also affect protein levels for the same gene, a larger fraction of strong pQTLs appear to be specific to protein levels.

Strong eQTLs without a pQTL clustered primarily at the HAP1 and MKT1 hotspots (Supplementary file 8; Figure 6B). These two hotspots also showed the clearest examples of overlapping eQTLs and pQTLs with opposite direction of effect on the same genes (Supplementary file 9; Figure 6C). Thus, while these hotspots influence both mRNA and protein levels of many genes, their effects on mRNA vs. protein levels of a given gene can be quite different. Strong pQTLs without an eQTL were more widely distributed across the genome (Supplementary file 10; Figure 6D).

Detection of non-additive eQTL interactions from a genome-wide search

The contribution of non-additive or ‘epistatic’ genetic interactions to trait variation is a topic of ongoing debate (Hill et al., 2008; Mackay, 2014; Mäki-Tanila and Hill, 2014). In particular, demonstration of non-additive effects on human gene expression has been challenging (Becker et al., 2012; Brown et al., 2014; Buil et al., 2015; Fish et al., 2016; Hemani et al., 2014; Wood et al., 2014). Although clear examples of epistasis have been revealed for yeast gene expression (Brem and Kruglyak, 2005; Storey et al., 2005), the limited power of earlier studies had necessitated targeted search strategies rather than a full genome-by-genome scan.

We reasoned that the high power of our current dataset should permit a more unbiased view of the contribution of epistasis to mRNA expression variation. We carried out a genome-by-genome scan for non-additive interaction effects on the expression levels of all genes and detected 387 eQTL-eQTL interactions influencing 306 genes (FDR = 10%; Source data 14). To our knowledge, this is the first unequivocal identification of eQTL interactions from an unbiased genome-by-genome scan. Targeted scans with a reduced multiple testing burden identified larger numbers of interacting pairs of loci: a total of 784 from a scan for interactions between genome-wide significant additive eQTLs and the genome, and a total of 1464 interactions between significant additive eQTLs.

We examined the 387 eQTL-eQTL interactions detected in the genome-wide scan in more detail. The locations of interacting eQTLs clustered at certain positions in the genome, generally overlapping the hotspots described above (Figure 7A). In particular, many epistatic interactions involved the HAP1 hotspot (79 interactions), the KRE33 hotspot (66 interactions), as well as hotspots containing MKT1, GAP1, the mating type locus, and IRA2. Many interactions connected these hotspots with each other (Figure 7B). For example, 30 genes shared an eQTL interaction between HAP1 and KRE33, 14 genes shared an eQTL interaction between HAP1 and MKT1, 13 genes shared an eQTL interaction between KRE33 with IRA2, and 14 genes shared eQTL interactions between GPA1 and the mating-type locus (Brem et al., 2005).

Figure 7 with 3 supplements see all

Download asset Open asset

Non-additive interactions between eQTLs.

(A) Locations of markers of epistatic pairs (pointing downward) compared to those of additive eQTLs (pointing upward). Epistatic hotspots discussed in the text are highlighted. (B) Interactions between two *trans* loci. The plot shows the genome broken up into chromosomes (indicated as roman numerals), with arches connecting two interacting loci. Arches are shaded such that multiple overlapping interactions appear darker. Epistatic hotspots are indicated as in panel A. The outer histogram shows the density of additive eQTLs. (C) Expression levels of *SAG1* as a function of genotypes at the mating type locus and *GPA1*.

https://doi.org/10.7554/eLife.35471.017

The fact that interacting eQTLs colocalize with additive hotspots suggests that epistatic interactions often involve eQTLs that also have additive effects. Indeed, of the 774 markers in the 387 epistatic pairs, 558 (72%) were within 10 kb of a genome-wide significant additive eQTL influencing the same gene. An estimate based on the π₁ statistic (Storey and Tibshirani, 2003) showed that at least 84% of epistatic markers have additive effects. An example is SAG1, which encodes the Alpha-agglutinin of cells with the alpha mating type. In our cross, the alpha mating type was carried by the RM parent. Consequently, expression of SAG1 was higher in segregants that carried the RM allele at the mating type locus (Figure 7C). In addition to this additive effect, the mating type locus was also involved in an interaction with the GPA1 locus, replicating the finding from a targeted search that a BY-specific S469I variant in GPA1 modulates SAG1 expression in alpha but not a cells (Brem et al., 2005).

Of the 387 interactions, 158 (41%) involved ‘cis by trans’ interactions between distant and local loci (Figure 7—figure supplement 1A). For example, the HAP1 hotspot interacted with local eQTLs for 15 genes (Figure 7—figure supplement 1A). Among these, the RM alleles of HAP1 and SCM4 both increased SCM4 expression, and segregants with the RM genotype at both loci had higher expression levels than expected from additive effects (Figure 7—figure supplement 1B). The HAP1 BY allele is less active due to a transposon insertion (Brem et al., 2002). The local SCM4 eQTL may arise from variants that disrupt a HAP1 binding site (Ter Linde and Steensma, 2002) in BY, and this allelic difference may result in a stronger expression difference in the presence of the more active RM Hap1p. In agreement with this model, an epistatic interaction between inferred Hap1p activity and SCM4 expression has been reported (Parts et al., 2011).

Interestingly, SCM4 is the only annotated direct HAP1 target among the local eQTLs that interact with HAP1 (Harbison et al., 2004). Further, the effect sizes of local eQTLs that interacted with HAP1 were not statistically different in segregants that carried the more active RM vs. the less active BY allele at HAP1 (T-test, p=0.9). In all the cases we examined (KRE33, MKT1, GPA1, and IRA2), the local eQTLs that interacted with these hotspots did not show consistently larger or smaller effect sizes depending on the hotspot allele (all p≥0.3). This argues against a scenario in which most cis by trans interactions are due to variants in transcription factor binding sites whose effect increases with a more active trans regulator. Instead, most cis by trans interaction effects may be mediated through more indirect mechanisms.

We quantified the fraction of variation in gene expression that is contributed by epistatic interactions. Pairwise interactions typically explained about 1/10th as much expression variance as did additive loci (Figure 7—figure supplement 2), with a median of 2.2% of expression variance per pair. Thus, genetic interactions contributed only a small minority of trait variance for gene expression levels, which is consistent with what we previously reported for organism-level traits (Bloom et al., 2015).

We asked if any of the epistatic pairs identified from our unbiased search corresponded to ‘purely’ epistatic interactions without any additive effects. Of the 387 pairs, 111 (29%) were not found in the search between additive loci. Given the small effects of most interactions, this group is likely enriched for false positives from the full search as well as false negatives from the search between additive loci. We searched for specific epistatic pairs in which neither locus had an additive effect (p>0.05), overlapped an additive hotspot (to avoid any potential undetected additive hotspot effects), and where the affected gene was on a different chromosome from both interacting markers (to avoid any potential undetected additive local eQTLs). Only three such pairs existed in our data (Figure 7—figure supplement 3). In each case, the two genotype classes with non-parental genotype combinations (BY-RM and RM-BY) had similar expression levels that differed from those in the two parental genotype combinations (BY-BY and RM-RM), such that the marginal genotype effects canceled out. Even in these three cases, the variance resulting from the interaction was only 2.6–3.6%. We conclude that any contribution of ‘purely’ epistatic pairs to transcriptome variation must be small.

Discussion

The high power of our study allowed us to identify genome-wide significant eQTLs that jointly explain over 70% of gene expression heritability. Thus, our eQTL map shows a high degree of completeness, capturing most genetic sources of transcriptome variation in this cross. In particular, our results allowed us to examine the contribution of trans-acting regulatory variation in much greater detail than previously possible in any species.

We showed that trans-acting eQTLs are the predominant source of expression variation in this cross. Specifically, trans-eQTLs contribute 2.6-fold more to gene expression variance than local eQTLs, remarkably similar to genome-wide estimates in humans of 1.8-fold (Grundberg et al., 2012) and 3.4-fold (Wright et al., 2014). While based on different experimental designs and study populations, these results from yeast and humans clearly demonstrate the importance of trans-acting variation in eukaryotic species.

The vast majority of trans-eQTLs are concentrated at a limited number of hotspot regions that harbor variants with widespread effects on the expression of other genes. The strongest of these hotspots affected the expression of most genes in the genome. The minority of trans eQTLs that fell outside the statistically defined hotspots also clustered more than expected by chance. We saw little evidence for isolated trans-acting loci that affect the expression of one or a few genes. The high density of sequence differences in this cross might have been expected to produce a more diffuse trans-regulatory architecture, with many loci each influencing one or a few genes. Instead, trans-acting regulatory variation arises primarily from several dozen specific loci.

The large number and fine resolution of our hotspots allowed us to conduct unbiased analyses of the genes located in these regions. Unlike earlier work examining fewer, less well localized hotspots (Yvert et al., 2003), we found a strong enrichment for transcription factors, a class of genes with obvious relevance to gene regulation. Guided by this enrichment, we noticed that almost all transcription factor genes with predicted damaging variants in this cross appear to produce a trans-acting hotspot, with the exception of two genes in which the damaging variants may affect gene function only weakly if at all. This result clearly illustrates the importance of variation in transcription factors in shaping gene expression in trans. At the same time, many trans-acting hotspots are caused by variation in genes other than transcription factors. These hotspots illustrate the importance of indirect trans effects that alter cellular networks and physiology, which in turn results in gene expression change.

In an unbiased search, we revealed hundreds of non-additive genetic interactions that influence mRNA levels. Many of these eQTL-eQTL interactions were between eQTLs at two different hotspot regions, or interactions between an eQTL at a hotspot and a local eQTL. Hotspots may play such prominent roles in epistatic interactions because their wide-reaching effects effectively alter the global state of the cell. The effects of many other variants may be larger in some cellular states than others, similar to what is seen when yeast are grown in different environments (Lewis et al., 2014; Smith and Kruglyak, 2008).

Previous searches for epistatic interactions on gene expression levels in this cross were targeted to genes and loci chosen based on prior additive scans (Brem et al., 2005; Storey et al., 2005). Our unbiased search confirmed the validity of this strategy. The vast majority of epistatic markers we detected also had additive effects, and there was little evidence for interactions without any additive effects. Conversely, many additive eQTLs, including several major additive hotspots, were involved in few if any epistatic interactions (Figure 7A). Thus, epistatic interactions cannot be simply assumed to exist for all additive loci, and the most fruitful strategy to detect epistasis is to test loci with additive effects for interactions with other loci.

In our cross between two yeast strains, we identified over one hundred hotspot regions, and found evidence for additional, weaker hotspots. Studies in many multicellular organisms, including plants (Cubillos et al., 2012; Fu et al., 2009; West et al., 2007; Zhang et al., 2011), nematodes (Li et al., 2006; Rockman et al., 2010), mice (Hasin-Brumshtein et al., 2016; Kelly et al., 2012; Orozco et al., 2012) and rats (Heinig et al., 2010; Hubner et al., 2005; Kaisaki et al., 2016) have also observed trans-eQTL hotspots, showing that this phenomenon is not restricted to yeast. Compared to our yeast cross, human populations harbor orders of magnitudes more variants, and the human genome has many more genes and more extensive and complex regulatory regions that offer a large target space for regulatory variation. We would expect these characteristics to result in human trans-eQTL architectures even more complex than that we have uncovered here in yeast.

Recent studies have made progress in identifying trans-eQTLs in humans. Many of the human trans-eQTLs discovered to date tend to influence the expression of multiple genes (Battle et al., 2014; Brynedal et al., 2017; Fehrmann et al., 2011; Grundberg et al., 2012; Heinig et al., 2010; Lee et al., 2014; Small et al., 2011; Wright et al., 2014; Yao et al., 2017). Specifically, work in blood and blood-derived cells from up to ~5000 individuals (Battle et al., 2014; Fairfax et al., 2012; Fehrmann et al., 2011; Pierce et al., 2014; Westra et al., 2013; Wright et al., 2014; Yao et al., 2017) revealed SNPs that were associated with multiple genes in trans, including the HLA locus (Fehrmann et al., 2011), LZY in monocytes, and KLF4 in B-cells (Fairfax et al., 2012). Multiple genes linking to single SNPs were also found in human adipose tissue (e.g.in the KLF14 transcription factor gene (Aguet et al., 2017; Hore et al., 2016; Small et al., 2011, 2018), skin and lymphoblastoid cell lines (Grundberg et al., 2012), and other tissues (Aguet et al., 2017). In immune cells, condition-specific hotspots at known regulators of the immune response were seen upon stimulation, and these altered expression of specific immune response pathways (Fairfax et al., 2014; Lee et al., 2014). Shared, weak effects across transcripts (Brynedal et al., 2017; Hore et al., 2016) indicated that ‘hundreds of trans-eQTLs each affect hundreds of transcripts’ (Brynedal et al., 2017). Together, these observations show that trans-eQTL hotspots exist in humans.

Whether trans-eQTL hotspots are as predominant in humans as they are in yeast (where 90% of our trans-eQTLs mapped to a hotspot) remains an open question. Human trans-eQTLs could be distributed more uniformly, such that hotspots make up a smaller fraction of all trans-eQTLs, leaving room for more gene-specific trans-eQTLs. There could also be more hotspots, each affecting fewer genes than the strongest hotspots in our cross. While it is difficult to extrapolate from the current, still underpowered human studies, it may be informative to consider that the initial eQTL searches in our yeast cross only found eight hotspots, each affecting at most dozens of genes (Brem et al., 2002). The observation that hotspots are the main source of trans variation required the higher statistical power of the present dataset. It remains to be seen if human trans-eQTL discovery in larger samples will follow a similar trajectory.

The recently proposed omnigenic model (Boyle et al., 2017) for the genetic basis of complex trait variation posits that gene regulatory networks are sufficiently densely connected that the change in expression of any one gene, caused by a local eQTL, will ‘percolate’ through the network and alter the expression of all other genes that are expressed in a given cell type. The hotspot loci we described here offer evidence that some regulatory variants can indeed have widespread effects on the transcriptome, in some cases altering the expression of the majority of genes in the genome through precisely the combination of strong direct effects on ‘core’ genes in specific pathways and weak indirect effects on other ‘peripheral’ genes envisioned in the omnigenic model. Hundreds of hotspots may exist in the human genome (Brynedal et al., 2017), providing a rich substrate through which regulatory variation may influence complex traits.

On the other hand, although we detected local eQTLs for most genes in our cross, the majority of these had no detectable trans effects on the expression of other genes, within the limits of our statistical power. Given that our study had sufficient power to detect weak indirect effects of trans-eQTL hotspots, we believe that most local eQTLs indeed have no meaningful downstream consequences for gene expression. By extension, such local eQTLs may be unlikely to contribute to variation in complex traits. Consistent with this conclusion, modest expression changes for dozens of yeast genes have been found to result in minimal fitness effects (Duveau et al., 2017; Keren et al., 2016). These results argue against the simplest form of the omnigenic model, in which a variant that changes the expression of any one gene has meaningful effects on every other gene. Instead, we observed that trans-eQTLs effects preferentially arise from variation in certain classes of genes. Given the crucial importance of regulatory variation for many complex traits (Albert and Kruglyak, 2015), the organismal consequences of expression changes caused by different types of eQTLs remain a key area for further research.

Materials and methods

Unless otherwise specified, all computational analyses were performed in R. Analysis code is available at https://github.com/joshsbloom/eQTL_BYxRM (Bloom and Albert, 2018; copy archived at https://github.com/elifesciences-publications/eQTL_BYxRM). Supplementary Data files are also available at https://figshare.com/s/83bddc1ddf3f97108ad4.

Share this article

Cite this article

eQTL detection and transcriptome heritability.

Contribution of local and distant eQTLs to expression variance.

Locations of eQTLs in the genome.

Genes located in hotspot regions.

Relationship of local eQTLs and distant eQTL hotspots.

eQTLs and pQTLs.

Non-additive interactions between eQTLs.

Author details

Frank Wolfgang Albert

Contribution

Contributed equally with

For correspondence

Competing interests

Joshua S Bloom

Contribution

Contributed equally with

For correspondence

Competing interests

Jake Siegel

Contribution

Competing interests

Laura Day

Contribution

Competing interests

Leonid Kruglyak

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism