1. Genetics and Genomics

Extensive impact of low-frequency variants on the phenotypic landscape at population-scale

  1. Téo Fournier
  2. Omar Abou Saada
  3. Jing Hou
  4. Jackson Peter
  5. Elodie Caudal
  6. Joseph Schacherer (corresponding author)
  1. Université de Strasbourg, CNRS, GMGM UMR 7156, France
Research Article
Cite this article as: eLife 2019;8:e49258 doi: 10.7554/eLife.49258

Abstract

Genome-wide association studies (GWAS) allow the dissection of complex traits and the mapping of genetic variants, which often explain relatively little of the heritability. One potential reason is the preponderance of undetected low-frequency variants. To increase their allele frequency and assess their phenotypic impact in a population, we generated a diallel panel of 3025 yeast hybrids, derived from pairwise crosses between natural isolates, and examined a large number of traits. Parental versus hybrid regression analysis showed that while most phenotypic variance is explained by additivity, a third is governed by non-additive effects, with complete dominance having a key role. By performing GWAS on the diallel panel, we found that associated variants with low frequency in the initial population are overrepresented, explain a fraction of the phenotypic variance similar to that of common variants, and have a similar effect size. Overall, we highlighted the relevance of low-frequency variants to phenotypic variation.

https://doi.org/10.7554/eLife.49258.001

Introduction

Natural populations are characterized by an astonishing phenotypic diversity. Variation observed among individuals of the same species represents a powerful raw material to develop better insight into the relationship existing between genetic variants and complex traits (Mackay et al., 2009). The recent advances in high-throughput sequencing and phenotyping technologies greatly enhance the ability to determine the genetic basis of traits in various organisms (Alonso-Blanco et al., 2016; Auton et al., 2015; Mackay et al., 2012; Peter et al., 2018). Dissection of the genetic mechanisms underlying natural phenotypic diversity is within easy reach when using classical mapping approaches such as linkage analysis and genome-wide association studies (GWAS) (Mackay et al., 2009; Visscher et al., 2017). Alongside these major advances, however, it must be noted that there are some limitations. All genotype-phenotype correlation studies in humans and other model eukaryotes have identified causal loci in GWAS explaining relatively little of the observed phenotypic variance of most complex traits (Eichler et al., 2010; Hindorff et al., 2009; Manolio et al., 2009; Shi et al., 2016; Stahl et al., 2012; Wood et al., 2014; Zuk et al., 2014).

Despite the efforts made to find the genetic variants responsible for complex traits, the variants found explain only a small part of the heritability, that is, of the fraction of the phenotypic variance explained by the underlying genetic variability. One of the most striking examples is observed with human height. This trait is estimated to be 60–80% heritable (Speed et al., 2017; Visscher et al., 2008) but close to 700 variants found in an analysis based on more than 250,000 individuals only explain 20% of this total heritability (Wood et al., 2014). Multiple explanations for this so-called missing heritability have been suggested, including the presence of low-frequency variants (Gibson, 2012; Hindorff et al., 2009; Manolio et al., 2009; Pritchard, 2001; Walter et al., 2015), structural variants (e.g. copy number variants) (Peter et al., 2018), small effect variants, as well as the low power to estimate non-additive effects (Cordell, 2009; Mackay, 2014; Zuk et al., 2012).

Variants present in less than 5% of the individuals are termed low-frequency variants and are known to be involved in a large number of rare Mendelian disorders (Gibson, 2012). However, rare variants are also pervasively implicated in common diseases and other complex traits. Assessing the impact and effect of low-frequency variants at a population scale and on a large phenotypic spectrum will provide better insight into the genetic architecture of the phenotypic variation in a species. As GWAS cannot deal with low-frequency and rare variants due to statistical limitations, except for very large sample sizes, their effect has often been overlooked.

Among model organisms, the budding yeast Saccharomyces cerevisiae is especially well suited to dissect variations observed across natural populations (Fay, 2013; Peter and Schacherer, 2016). S. cerevisiae isolates can be found in a broad array of biotopes both human-associated (e.g. wine, sake, beer and other fermented beverages, food, human body) or wild (e.g. plants, soil, insects) and are distributed world-wide (Peter et al., 2018). Phenotypic diversity among yeast isolates is significant and the S. cerevisiae species presents a high level of genetic diversity (π = 3×10−3), much greater than that found in humans (Lek et al., 2016). Because of their small and compact genomes, an unprecedented number of 1,011 S. cerevisiae natural isolates has recently been sequenced (Peter et al., 2018). Yeast genome-wide association analyses have revealed functional Single Nucleotide Polymorphisms (SNPs), explaining a small fraction of the phenotypic variance (Peter et al., 2018). However, these analyses highlighted the importance of the copy number variants (CNVs), which account for a larger proportion of the phenotypic variance and have greater effects on phenotypes compared to the SNPs. Nevertheless, even when CNVs and SNPs are taken together, the phenotypic variance explained is still low (approximately 17% on average) and consequently a large part of it is unexplained.

Interestingly, most of the genetic polymorphisms detected in the 1011 yeast genomes dataset are low-frequency variants, with approximately 92.7% of the polymorphic sites associated with a minor allele frequency (MAF) lower than 0.05. This trend is similar to that observed in the human population (Auton et al., 2015; Walter et al., 2015) and raises the question of the impact of low-frequency variants on the phenotypic landscape within a population and on the missing heritability (Zuk et al., 2014). Here, we investigated the genetic architecture underlying phenotypic variation and sought to unravel part of the missing heritability by accounting for low-frequency genetic variants at a population-wide scale and for non-additive effects controlled by a single locus. For this purpose, we generated and examined a large set of traits in 3025 hybrids, derived from pairwise crosses between a subset of natural isolates from the 1,011 S. cerevisiae population. This diallel crossing scheme allowed us to capture the fraction of the phenotypic variance controlled by both additive and non-additive phenomena as well as infer the main modes of inheritance for each trait. We also took advantage of the intrinsic power of this diallel design to perform GWAS and assess the role of the low-frequency variants on complex traits.

Results

Diallel panel and phenotypic landscape

Based on the genomic and phenotypic data from the 1,011 S. cerevisiae isolate collection (Peter et al., 2018), we selected a subset of 55 isolates that were diploid, homozygous, genetically diverse (Figure 1a), and originated from a broad range of ecological sources (Figure 1b) (e.g. tree exudates, Drosophila, fruits, fermentation processes, clinical isolates) as well as geographical origins (Europe, America, Africa and Asia) (Figure 1c and Supplementary file 1). A full diallel cross panel was constructed by systematically crossing the 55 selected isolates in a pairwise manner (Figure 1d). In total, we generated 3025 hybrids, representing 2970 heterozygous hybrids with a unique parental combination and 55 homozygous hybrids. All 3025 hybrids were viable, indicating no dominant lethal interactions existed between the parental isolates. We then screened the entire set of the parental isolates and hybrids for quantification of mitotic growth abilities across 49 conditions that induce various physiological and cellular responses (Figure 1—figure supplement 1, Figure 1—figure supplement 2, Supplementary file 2). We used growth as a proxy for fitness traits (see Materials and methods). Ultimately, this phenotyping step led to the characterization of 148,225 hybrid/trait combinations.
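The panel sizes quoted above follow directly from the crossing scheme, and a few lines of Python make the arithmetic explicit. This is only a counting sketch: the variable names are illustrative, and it assumes that the 2970 heterozygous hybrids count each MATa × MATα pairing separately (so reciprocal crosses are distinct), which is what 55 × 54 implies.

```python
from itertools import product

n_parents = 55      # selected natural isolates
n_conditions = 49   # growth conditions screened

# Every MATa parent is paired with every MATalpha parent,
# including the one derived from its own genetic background.
crosses = list(product(range(n_parents), repeat=2))
homozygous = [(a, alpha) for a, alpha in crosses if a == alpha]
heterozygous = [(a, alpha) for a, alpha in crosses if a != alpha]

n_hybrids = len(crosses)
n_combinations = n_hybrids * n_conditions

print(n_hybrids, len(heterozygous), len(homozygous), n_combinations)
# prints: 3025 2970 55 148225
```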

Figure 1
Diversity of the 55 selected natural isolates and diallel design.

(a) Pairwise sequence diversity between each pair of parental strains. (b) Ecological origins of the selected strains. See also Supplementary file 1. (c) Geographical origins of the selected strains. (d) Generation of the diallel hybrid panel. 55 natural isolates available as both mating types as stable haploids were crossed in a pairwise manner to obtain 3025 hybrids. This panel was then phenotyped on 49 growth conditions impacting various cellular processes.

https://doi.org/10.7554/eLife.49258.002
Figure 1—source data 1

Growth ratios for every hybrid and parental isolate on each growth condition.

Each value for a given hybrid is the median of 6 replicates. Each value for the haploid parental strains ‘control.a’ and ‘control.b’ is the median of 54 replicates.

https://doi.org/10.7554/eLife.49258.006

Estimation of genetic variance components using the diallel panel (additive vs. non-additive)

The diallel cross design allows for the estimation of additive vs. non-additive genetic components contributing to the variation in each trait by calculating the combining abilities following Griffing’s model (Griffing, 1956). For each trait, the General Combining Ability (GCA) for a given parent refers to the average fitness contribution of this parental isolate across all of its corresponding hybrid combinations, whereas the Specific Combining Ability (SCA) corresponds to the residual variation unaccounted for from the sum of GCAs from the parental combination. Consequently, the phenotype of a given hybrid can be formulated as µ + GCAparent1 + GCAparent2 + SCAhybrid, where µ is the mean fitness of the population for a given trait. We found a near perfect correlation (Pearson’s r = 0.995, p-value<2.2e-16) between expected and observed phenotypic values, confirming the accuracy of the model used (see Materials and methods). Using GCA and SCA values, we estimated both broad- (H2) and narrow-sense (h2) heritabilities for each trait (Figure 1). Broad-sense heritability is the fraction of phenotypic variance explained by genetic contribution. In a diallel cross, the total genetic variance is equal to the sum of the GCA variance of both parents and the SCA variance in each condition. Narrow-sense heritability refers to the fraction of phenotypic variance that can be explained only by additive effects and corresponds to the variance of the GCA in each condition (Figure 2a). The H2 values for each condition ranged from 0.64 to 0.98, with the lowest value observed for fluconazole (1 µg.ml−1) and the highest for sodium meta-arsenite (2.5 mM), respectively. The additive part (h2 values) ranged from 0.12 to 0.86, with the lowest value for fluconazole (1 µg.ml−1) and the highest for sodium meta-arsenite (2.5 mM), respectively. 
While broad- and narrow-sense heritabilities are variable across conditions, we also observed that on average, most of the phenotypic variance can be explained by additive effects (mean h2 = 0.55). However, non-additive components contribute significantly to some traits, explaining on average one third of the phenotypic variance observed (mean H2 - h2 = 0.29) (Figure 2b). Despite a good correlation between broad- and narrow-sense heritabilities (Pearson’s r = 0.809, p-value=1.921e-12) (Figure 2c), some traits display a larger non-additive contribution, such as in galactose (2%) or ketoconazole (10 µg.ml−1). Interestingly, these two conditions proved to be mainly controlled by dominance (see below). Altogether, our results highlight the main role of additive effects in shaping complex traits at a population-scale and clearly show that this is not restricted to the single yeast cross where this trend was first observed (Bloom et al., 2013; Bloom et al., 2015). Nonetheless, non-additive effects still explain a third of the observed phenotypic variance. This result also corroborates at a species-wide level the extensive impact of non-additive effects on phenotypic variance (Forsberg et al., 2017; Yadav et al., 2016).
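The decomposition of a hybrid phenotype into µ + GCA(parent 1) + GCA(parent 2) + SCA(hybrid) can be sketched in a few lines of Python. This is a minimal illustration, not the estimation procedure used in the study: it uses simple row and column means on a complete square matrix, and the helper names (`combining_abilities`, `heritabilities`) are hypothetical. The heritability helper assumes that the additive variance is twice the GCA variance and that an error variance is supplied separately; the exact estimators are described in the Materials and methods and may differ.

```python
from statistics import mean


def combining_abilities(y):
    """Decompose a full diallel matrix y[i][j] into mu, GCA and SCA.

    y[i][j] is the phenotype of the hybrid between parent i and parent j.
    Simple mean-based estimators are used here (an assumption of this sketch).
    """
    n = len(y)
    mu = mean(y[i][j] for i in range(n) for j in range(n))
    # Average over both parental roles so reciprocal crosses are treated alike.
    gca = [mean((y[i][j] + y[j][i]) / 2 for j in range(n)) - mu for i in range(n)]
    # SCA is what remains once mu and both parental GCAs are removed.
    sca = [[y[i][j] - (mu + gca[i] + gca[j]) for j in range(n)] for i in range(n)]
    return mu, gca, sca


def heritabilities(var_gca, var_sca, var_error):
    """Return (H2, h2) from variance components (assumed formulas, see lead-in)."""
    genetic = 2 * var_gca + var_sca   # GCA variance of both parents plus SCA variance
    total = genetic + var_error
    return genetic / total, 2 * var_gca / total


# Toy example: three parents with purely additive effects, so every SCA is ~0.
true_mu, true_gca = 1.0, [0.10, -0.05, -0.05]
toy = [[true_mu + gi + gj for gj in true_gca] for gi in true_gca]
mu_hat, gca_hat, sca_hat = combining_abilities(toy)
print(round(mu_hat, 6), [round(g, 6) for g in gca_hat])
```

On a purely additive toy matrix the estimated GCAs match the simulated ones and all SCAs vanish, which is the sense in which SCA captures the residual, non-additive part of a cross.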

Heritability measurements.

(a) The whole bar represents the overall heritability (H2) for each condition tested. The orange part of each bar represents the narrow-sense heritability h2, that is the fraction of phenotypic variance explained by additive effects, while the blue part depicts the fraction of phenotypic variance explained by non-additive effects. (b) Overall mean additive and non-additive effects for every tested growth condition. (c) Representation of H2 as a function of h2 showing the relative additive versus non-additive effects for each condition. Outlier conditions in terms of non-additive variance lie further away from the linear regression line. Pearson’s r (95% confidence interval: 0.684–0.889) with the corresponding p-value is displayed.

https://doi.org/10.7554/eLife.49258.007

Relevance of dominance for non-additive effects

To have a precise view of the non-additive components, the mode of inheritance and the relevance of dominance for genetic variance, we focused on the deviation of the hybrid phenotypes from the expected value under a full additive model. Under this model, the hybrid phenotype is expected to be equal to the mean of the two parental phenotypes, hereinafter referred to as the Mean Parental Value or Mid-Parent Value (MPV). Deviation from this MPV allowed us to infer the respective mode of inheritance for each hybrid/condition combination (Lippman and Zamir, 2007), that is additivity, partial or complete dominance towards one or the other parent and finally overdominance or underdominance (Figure 3a–b, see Materials and methods). Only 17.4% of all hybrid/condition combinations showed enough phenotypic separation between the parents and the corresponding hybrid to allow complete partitioning into the seven above-mentioned modes of inheritance. For the remaining 82.6% of cases, only overdominance and underdominance can be distinguished (Figure 3c). Interestingly, these events are not as rare as previously described (Zörgö et al., 2012), with 11.6% of overdominance and 10.1% of underdominance (Figure 3d). When a clear separation is possible (Figure 3e), one third of the condition/cross combinations detected were purely additive whereas the rest displayed a deviation towards one of the two parents, with no bias (Figure 3e). When looking at the inheritance mode in each condition, most of the studied growth conditions (32 out of 49) showed a prevalence of additive effects (Figure 3f). However, 17 conditions were not predominantly additive throughout the population. Indeed, a total of 12 conditions were detected as mostly dominant with 4 cases of best parent dominance, including galactose (2%) and ketoconazole (10 µg.ml−1), and 8 of worst parent dominance. The remaining five conditions displayed a majority of partial dominance (Figure 3f). 
These results confirm the importance of additivity in the global architecture of traits, but more importantly, they clearly demonstrate the major role of dominance as a driver for non-additive effects. Nevertheless, the presence of conditions with a high proportion of partial dominance combined with the cases of over and underdominance may indicate a strong and pervasive impact of epistasis on phenotypic variation.
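The logic of classifying a hybrid by its deviation from the MPV can be sketched as follows. This is a hedged illustration only: the function name `inheritance_mode` and all numeric cutoffs (`min_separation`, `additive_band`, `dominance_band`) are hypothetical choices made for the example, whereas the criteria actually used to separate parents and hybrids are those given in the Materials and methods. The deviation is expressed as a ratio of the hybrid's distance from the MPV to half the parental difference, so that 0 corresponds to additivity and ±1 to complete dominance.

```python
def inheritance_mode(parent1, parent2, hybrid,
                     min_separation=0.1, additive_band=0.2, dominance_band=0.2):
    """Classify one hybrid/condition combination (illustrative cutoffs only)."""
    low, high = sorted((parent1, parent2))

    if high - low < min_separation:
        # Parents are not separated: only over- and underdominance can be told apart.
        if hybrid > high + min_separation:
            return "overdominance"
        if hybrid < low - min_separation:
            return "underdominance"
        return "unresolved"

    mpv = (low + high) / 2            # mid-parent value
    half_range = (high - low) / 2
    ratio = (hybrid - mpv) / half_range

    if abs(ratio) <= additive_band:
        return "additive"
    side = "best parent" if ratio > 0 else "worst parent"
    if abs(ratio) < 1 - dominance_band:
        return f"partial dominance ({side})"
    if abs(ratio) <= 1 + dominance_band:
        return f"complete dominance ({side})"
    return "overdominance" if ratio > 0 else "underdominance"


for hybrid_value in (0.5, 0.75, 1.0, 1.5, 0.0, -0.5):
    print(hybrid_value, inheritance_mode(1.0, 0.0, hybrid_value))
```

With separated parents the function can return any of the seven modes; with unseparated parents it can only report overdominance, underdominance or an unresolved case, mirroring the two situations distinguished in the text.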

Mode of inheritance.

(a) Representation of the different modes of inheritance depending on the hybrid value when a separation can be achieved between parental strains and (b) if a clear separation cannot be achieved between parental strains. (c) Percentage of parental phenotypes separated from each other for which a complete partition of different inheritance modes can be achieved. (d) Inheritance modes for every cross and condition where no separation can be achieved between the two homozygous parents. (e) Inheritance modes for every cross and condition where a clear phenotypic separation can be achieved between the two homozygous parents. (f) The number of conditions in each main inheritance mode.

https://doi.org/10.7554/eLife.49258.008

Diallel design allows mapping of low-frequency variants in the population using GWAS

Next, we explored the contribution of low-frequency genetic variants (MAF <0.05) to the observed phenotypic variation in our population. Genetic variants considered by GWAS must have a relatively high frequency in the population to be detectable, usually over 0.05 for relatively small datasets (Visscher et al., 2017). Consequently, low-frequency variants are excluded from classical GWAS. However, the diallel crossing scheme stands as a powerful design to assess the phenotypic impact of low-frequency variants present in the initial population, as each parental genome is represented several times, creating haplotype mixing across the matrix and preserving the detection power in GWAS.

To avoid issues due to population structure, we selected a subset of hybrids from 34 unrelated isolates in the original panel to perform GWAS (see Materials and methods, Supplementary file 1). By combining known parental genomes, we constructed 595 hybrid genotypes in silico, matching one half matrix of the diallel plus the 34 homozygous diploids. We built a matrix of genetic variants for this panel and filtered SNPs to only retain biallelic variants with no missing calls. In addition, due to the small number of unique parental genotypes, extensive long-distance linkage disequilibrium was also removed (see Materials and methods), leaving a total of 31,632 polymorphic sites in the diallel population. Overall, 3.8% (a total of 1,180 SNPs) had a MAF lower than 0.05 in the initial population of the 1,011 S. cerevisiae isolates but surpassed this threshold in the diallel panel, reaching a MAF of 0.32 (Figure 4a–b).
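The in silico construction of hybrid genotypes, and the way the diallel raises the frequency of an allele that is rare in the species, can be illustrated with a short Python sketch. It assumes homozygous parents coded 0 or 1 for the allele they carry at one biallelic SNP; the function names and the choice of a hypothetical allele carried by 3 of the 34 parents are illustrative and not taken from the study's pipeline.

```python
from itertools import combinations_with_replacement


def hybrid_genotypes(parent_alleles):
    """Alternate-allele counts (0, 1 or 2) for one SNP across the panel.

    parent_alleles[i] is 0 or 1, the allele carried by homozygous parent i.
    The panel is one half matrix of the diallel plus the homozygous diploids.
    """
    n = len(parent_alleles)
    return [parent_alleles[i] + parent_alleles[j]
            for i, j in combinations_with_replacement(range(n), 2)]


def minor_allele_frequency(genotypes):
    """MAF from diploid alternate-allele counts."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


# Hypothetical SNP: 3 of the 34 unrelated parents carry the alternate allele.
parents = [1] * 3 + [0] * 31
panel = hybrid_genotypes(parents)

print(len(panel))                                 # 595
print(round(minor_allele_frequency(panel), 3))    # 0.088
```

In this design the allele frequency in the panel equals the fraction of parents carrying the allele, so a variant found in only a few percent of the 1,011 isolates passes the 0.05 threshold as soon as two or more of the 34 parents carry it.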

Figure 4
Rare and low-frequency variants detection.

(a) Comparison of MAF for each SNP between the whole population (1011 strains) and the hybrid diallel matrix used for GWAS. Hollow blue circles represent the MAF of all SNPs common to the initial population and the diallel hybrids (31,632). Full orange circles show the MAF of significantly associated SNPs. Vertical orange line shows the 5% MAF threshold. (b) Proportion of SNPs with a MAF below 0.05. (c) Proportion of significantly associated SNPs with a MAF below 0.05. (d) Fraction of heritability explained for common and low-frequency variants. P-value was calculated using a two-sided Mann-Whitney-Wilcoxon test, difference in location of −4.5e−3 (95% confidence interval −7.9e−3 -1.4e−3). (e) Absolute effect size of common and low-frequency variants.

https://doi.org/10.7554/eLife.49258.009
Figure 4—source data 1

Significantly associated SNPs.

SNPs without MAF are SNPs that were not biallelic in the initial population of 1011 isolates (Peter et al., 2018).

https://doi.org/10.7554/eLife.49258.011

To map additive as well as non-additive variants impacting phenotypic variation, we performed GWAS using two different models (Seymour et al., 2016) (see Materials and methods). We used a classical additive model, in which each genotype at a locus receives a distinct encoding and a linear relationship between trait and genotype is assessed. To account for non-additive inheritance, we also used an overdominant model, which only considers differences between heterozygous and homozygous genotypes, thus revealing overdominant and dominant effects. For each of these two models, we performed mixed-model association analysis of the 49 growth conditions with FaST-LMM (Lippert et al., 2011; Widmer et al., 2015). Overall, GWAS revealed 1723 significantly associated SNPs (Figure 4—source data 1), with 2 to 103 significant SNPs per condition and an average of 39 SNPs per condition. Minor allele frequencies of the significantly associated SNPs were determined in the 1011 sequenced genomes, from which the diallel parents were selected (Figure 4). Interestingly, 16.3% of the significant SNPs (281 in total) corresponded to low-frequency variants (MAF <0.05), with 19.5% of them (55 SNPs) being rare variants (MAF <0.01). This trend holds for both models, with 19.3% and 15.2% of low-frequency variants for the additive and overdominant models, respectively. It is important to note that the diallel scheme raises the MAF of low-frequency variants to a detectable threshold in the panel, making it possible to query their effects, but this remains difficult for truly rare variants (MAF <0.01), which probably leads to an underestimation of their contribution. Nevertheless, these results clearly show that low-frequency variants indeed play a significant part in the phenotypic variance at the population scale. 
We then estimated the contribution of the significant variants to total phenotypic variation (see Materials and methods) in our panel and found that detected SNPs could explain 15% to 32% of the variance, with a median of 20% (Figure 4d). When looking at the variance explained by each variant as a function of its allele frequency, it is noteworthy that low-frequency variants explained roughly the same proportion of the phenotypic variation (median of 20.2%) as the common SNPs (median of 19.6%) (Figure 4d). In addition, the variance explained by the associated rare variants was also higher on average than that of the rest of the detected SNPs (Figure 4—figure supplement 1a). This trend was robust and conserved across the two encoding models implemented, accounting for additive and overdominant effects (Figure 4—figure supplement 1a). However, these results cannot be extrapolated to the whole population and only hold in the scope of our diallel population, where these variants are now overrepresented compared to the natural population. Indeed, variance explained is related to the surveyed population because its value relies on the MAF of the variants. Therefore, in the whole natural population of 1011 isolates, their contribution to the phenotypic variance will be less important because of their lower MAF. To obtain a value that is unrelated to the studied population, we measured their respective effect size (Figure 4e). Here again we found that on average, low-frequency variants have about the same effect size (mean of 0.23 sd) as the common variants (mean of 0.25 sd).
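The two SNP encodings used for the association analyses can be written down explicitly. The sketch below assumes the common numeric convention of coding genotypes by their count of alternate alleles (0, 1, 2); the function names and the specific numeric codes are assumptions of this example, since the text only states that the additive model gives each genotype a distinct value and that the overdominant model contrasts heterozygous with homozygous genotypes.

```python
def additive_encoding(alt_allele_count):
    """Additive model: each genotype keeps its own value (0, 1 or 2)."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("expected an alternate-allele count of 0, 1 or 2")
    return alt_allele_count


def overdominant_encoding(alt_allele_count):
    """Overdominant model: heterozygotes (1) versus both homozygotes (0)."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("expected an alternate-allele count of 0, 1 or 2")
    return 1 if alt_allele_count == 1 else 0


genotypes = [0, 1, 2, 1, 0]
print([additive_encoding(g) for g in genotypes])      # [0, 1, 2, 1, 0]
print([overdominant_encoding(g) for g in genotypes])  # [0, 1, 0, 1, 0]
```

Under the second encoding the two homozygous classes are indistinguishable, so a significant association reflects a heterozygote-specific deviation rather than a per-allele trend.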

To gain insight into the biological relevance of the set of associated SNPs, we first examined their distribution across the genome and found that 62.5% of them are in coding regions (with coding regions representing a total of 72.9% of the S. cerevisiae genome) (Figure 4—figure supplement 1b), with all of these SNPs distributed over a set of 546 genes. Over the last decade, an impressive number of quantitative trait locus (QTL) mapping experiments were performed on a myriad of phenotypes in yeast leading to the identification of 145 quantitative trait genes (QTG) (Peltier et al., 2019) and we found that 19 of the genes we detected are included in this list (Figure 4—figure supplement 1c). In addition, 22 associated genes were also found as overlapping with a recent large-scale linkage mapping survey in yeast (Bloom et al., 2019) (Figure 4—figure supplement 1c). We then asked whether the associated genes were enriched for specific gene ontology (GO) categories (Supplementary file 3). This analysis revealed an enrichment (p-value=5.39×10−5) in genes involved in ‘response to stimulus’ and ‘response to stress’, which is in line with the different tested conditions leading to various physiological and cellular responses.

SGD1 and the mapping of a low-frequency variant

Finally, we focused on one of the most strongly associated genetic variants out of the 281 low-frequency variants significantly associated with a phenotype. The chosen variant was characterized by two adjacent SNPs in the SGD1 gene and was detected in 6-azauracil (100 µg.ml−1) with a p-value of 2.75e-8 with the overdominant encoding and 6.26e-5 with the additive encoding. Their MAF in the initial population is only 2.5% and reaches 9% in the diallel panel, with three genetically distant strains carrying it (Figure 5a). The SNPs are in the coding sequence of SGD1, an essential gene encoding a nuclear protein. The minor allele (AA) induces a synonymous change (TTG (Leu) → TTA (Leu)) for the first position and a non-synonymous mutation (GAA (Glu) → AAA (Lys)) for the second position (Figure 5a). The phenotypic advantage conferred by this allele was observed as a significant difference between hybrids homozygous for the minor allele, heterozygous hybrids and hybrids homozygous for the major allele (Figure 5b). To functionally validate the phenotypic effect of this low-frequency variant, CRISPR-Cas9 genome-editing was used in the three strains carrying the minor allele (AA) in order to switch it to the major allele (GG) and assess its phenotypic impact. Both mating types were assessed for each strain. When phenotyping the wildtype strains containing the minor allele and the mutated strains with the major allele, we observed that the minor allele confers a phenotypic advantage of 0.2 in growth ratio compared to the major allele (Figure 5c), therefore validating the important phenotypic impact of this low-frequency variant. However, no assumptions can be made regarding the exact effect of this allele at the protein level because no precise characterization has ever been carried out on Sgd1p and no particular domain has been highlighted.

Low-frequency variant functional validation in 6-azauracil 100 µg.ml−1.

(a) Schematic representation of SGD1 with the relative position of the detected SNPs. The minor allele is represented in orange with its MAF in the population and in the diallel cross panel. (b) Boxplot and density plot of the normalized growth ratios for each genotype on 6-azauracil 100 µg.ml−1. The number of observations is displayed in the boxplots. (c) Phenotypic validation after allele replacement of the minor allele with the major allele using CRISPR-Cas9 in the strains carrying the minor allele. Error bars represent median absolute deviation (four replicates).

https://doi.org/10.7554/eLife.49258.012

Discussion

Understanding the source of the missing heritability is essential to precisely address and dissect the genetic architecture of complex traits. Over the years, the diallel hybrid panel design has proven its strength to dissect part of the genetic architecture of traits in populations. One of the main advantages of using such experimental design is the ability to precisely isolate the part of phenotypic variance that is controlled by additive effects from the one controlled by non-additive effects. While our analysis revealed that an important part of the phenotypic variance is linked to additive effects, about a third remains ruled by non-additive interactions encompassing dominance and epistasis. These results are in line with previous findings.

However, care should be taken with the classification of the mode of inheritance. Indeed, as we do not know how many loci are involved for each hybrid’s phenotype, we can only assess the final phenotypic outcome of all the genetic variants involved and not on a locus by locus basis. This classification does not take into account their number, effect size and interactions. Consequently, the mode of inheritance that we described here solely reflects how the phenotype of the hybrid varies with respect to its parents. For example, several interactions could take place with opposite effect, leading to a final phenotype that appears as being controlled by an additive mode of inheritance (i.e. the hybrid phenotype equal to the mid parent value). However, in the cases where dominance was detected as a mode of inheritance, this might reflect the presence of a single locus having a strong phenotypic impact acting dominantly thus being responsible by itself for the phenotype. Yet, if two hybrids show a complete dominance in the same condition, it does not mean that the same alleles are involved in both.

Although few low-frequency and rare variants were considered in our GWAS (4%) due to stringent filtering conditions, a strong enrichment in these variants has been observed in the significantly associated ones (16%), demonstrating the ubiquity of low-frequency variants with important phenotypic impact. However, when looking at the population level, even though they do have effect sizes similar to common variants, they will not explain a large part of the variance, because variance explained depends on both effect size and allele frequency. A good example of this phenomenon has been seen with a study of human height in more than 700,000 individuals. A total of 83 significantly associated rare and low-frequency variants with effect sizes up to 2 cm have been mapped (Marouli et al., 2017). On average, they explained the same amount of phenotypic variation as common variants, which displayed much smaller effect sizes of about 1 mm. Our results suggest that a high number of low-frequency variants play a decisive role in the phenotypic landscape of a population both in terms of number and effect size. Taken one by one, they do not explain a lot of phenotypic variance in a large population. Yet, altogether, they might actually explain a greater part of the variation than the one explained by common variants.
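The dependence of variance explained on both effect size and allele frequency can be made concrete with the textbook expression 2p(1 − p)β² for a single biallelic SNP with allele frequency p and additive effect β. The sketch below rests on assumptions that go beyond the text: it supposes an additive model and Hardy-Weinberg proportions, which do not describe a diallel panel, and it is not the method used to compute variance explained in the study. The function name and the example frequencies are illustrative.

```python
def variance_explained(maf, effect_size):
    """Variance contributed by one SNP under an additive model: 2p(1-p)*beta^2.

    Assumes Hardy-Weinberg genotype proportions (a simplification).
    """
    if not 0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2 * maf * (1 - maf) * effect_size ** 2


# Same effect size (in sd units), different allele frequencies.
effect = 0.23
for maf in (0.01, 0.025, 0.09, 0.30):
    print(maf, variance_explained(maf, effect))
```

For a fixed effect size the contribution shrinks roughly in proportion to the allele frequency when the allele is rare, which is why a variant with a sizeable effect can still account for little variance in the natural population.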

The contribution of rare and low-frequency variants to traits is largely unexplored. In humans, these genetic variants are widespread but only a few of them have been associated with specific traits and diseases (Walter et al., 2015). Recently, it has been shown that the missing heritability of height and body mass index is accounted for by rare variants (Wainschtein et al., 2019). We also recently found in yeast that most of the previously identified Quantitative Trait Nucleotides (QTNs) using linkage mapping were at low allele frequency in the 1,011 S. cerevisiae population (Hou et al., 2016; Hou et al., 2019; Peltier et al., 2019; Peter et al., 2018). A total of 284 QTNs were identified by linkage mapping and 150 of them are present at a low frequency in the population of 1011 isolates (Peltier et al., 2019; Peter et al., 2018). However, these QTNs were mapped with mostly closely related genetic backgrounds, encompassing a total of 59 strains with 30% of them coming from laboratory and 41% coming from the wine cluster, which has a very low genetic diversity (Peter et al., 2018). Moreover, experimentally validated QTNs are, most of the time, genetic variants with the most important phenotypic impact, which has been previously recognized as inducing an ascertainment bias (Rockman, 2012). It also raised the question of whether these rare and large effect size alleles discovered in specific crosses are really relevant to the variation across most of the population.

Here, we quantified the contribution of low-frequency variants across a large number of growth conditions and found that among all the genetic variants detected by GWAS on a diallel panel, 16.3% have a low frequency in the initial population and explain a significant part of the phenotypic variance (21% on average). This particular diallel design also presents an intrinsic power to evaluate the additive vs. non-additive genetic components contributing to the phenotypic variation. We assessed the effect of intra-locus dominance on the non-additive genetic component and showed that dominance at the single locus level contributed to the phenotypic variation observed. However, other more complicated inter-loci interactions may still be involved. Altogether, these results have major implications for our understanding of the genetic architecture of traits in the context of unexplained heritability. In parallel to a recent large-scale linkage mapping survey in yeast (Bloom et al., 2019), our study highlights the extensive role of low-frequency variants in phenotypic variation.

Materials and methods

Construction of the diallel panel

Selection of the S. cerevisiae isolates

Out of the collection of 1011 strains (Peter et al., 2018), a total of 53 natural isolates were carefully selected to be representative of the S. cerevisiae species. We selected isolates from a broad range of ecological origins and prioritized strains that were diploid, homozygous, euploid and as genetically diverse as possible, that is with up to 1% sequence divergence. All the isolate details, including ecological and geographical origins, are listed in Supplementary file 1. In addition to these 53 isolates, we included two laboratory strains, namely Σ1278b and the reference S288c strain.

Generation of stable haploids

For each selected parental strain, stable haploid strains were obtained by deleting the HO locus. The HO deletions were performed using PCR fragments containing drug resistance markers flanked by homology regions upstream and downstream of the HO locus, using a standard yeast transformation method. Two resistance cassettes, KanMX and NatMX, were used for MATa and MATα haploids, respectively. The mating type (MATa or MATα) of antibiotic-resistant clones was determined using tester strains of known mating type. For each genetic background, we selected a MATa and a MATα clone resistant to G418 and nourseothricin, respectively.

Phenotyping of the parental haploid strains was performed to check for mating type-specific fitness effects. All MATa and MATα parental strains were tested on all 49 growth conditions (see below) using the same procedure as the phenotyping assay of the hybrid matrix. The overall correlation between the MATa and MATα parental strains was 0.967 (Pearson, p-value<1e-324), with an average correlation per strain of 0.976 across different conditions (Figure 1—figure supplement 3). No significant mating type specificity was identified.

Diallel scheme

Parental strains were arrayed and pregrown in liquid YPD (1% yeast extract, 2% peptone and 2% glucose) overnight. Mating was performed with the ROTOR (Singer Instruments) by pinning and mixing MATa over MATα parental strains on solid YPD. The parental strains, that is 55 MATa HO::∆KanMX and 55 MATα HO::∆NatMX strains, were arrayed and mated in a pairwise manner on YPD for 24 hr at 30°C. The mating mixtures were replicated on YPD supplemented with G418 (200 µg.ml−1) and nourseothricin (100 µg.ml−1) for double selection of hybrid individuals. After 24 hr, plates were replicated again on the same media to eliminate any residual non-hybrid cells. In total, we generated 3025 hybrids, representing 2970 heterozygous hybrids with a unique parental combination and 55 homozygous hybrids.
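
The hybrid counts follow directly from the 55 × 55 mating layout. As a minimal sketch (the strain labels are placeholders, not the actual isolate names), enumerating every MATa × MATα pairing reproduces them:

```python
from itertools import product

# Placeholder labels standing in for the 55 parental backgrounds.
parents = [f"strain_{k:02d}" for k in range(1, 56)]

# Every MATa parent is mated with every MATalpha parent (55 x 55 matrix).
crosses = list(product(parents, repeat=2))

# Same background on both sides gives a homozygous hybrid.
homozygous = [(a, alpha) for a, alpha in crosses if a == alpha]
heterozygous = [(a, alpha) for a, alpha in crosses if a != alpha]

print(len(crosses), len(heterozygous), len(homozygous))  # prints: 3025 2970 55
```

This count treats the two reciprocal pairings of a given pair of backgrounds (A as MATa with B as MATα, and the reverse) as separate hybrids.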

High-throughput phenotyping and growth quantification

Quantitative phenotyping was performed using endpoint colony growth on solid media. Strains were pregrown in liquid YPD medium and pinned onto a solid SC (Yeast Nitrogen Base with ammonium sulfate 6.7 g.l−1, amino acid mixture 2 g.l−1, agar 20 g.l−1, glucose 20 g.l−1) matrix plate in a 1536-density format using the replicating ROTOR robot (Singer Instruments). Two biological replicates (coming from independent cultures) of each parental haploid strain were present on every plate and six biological replicates were present for each hybrid. As 27 plates were needed to phenotype all the hybrids, 27 technical replicates (same culture on different plates) of the parents were present. The resulting matrix plates were incubated overnight to allow sufficient growth and were then replicated onto 49 media conditions, plus SC as a pinning control (Figure 1—figure supplement 1, Supplementary file 2). The selected conditions impact a broad range of cellular responses, and multiple concentrations were tested for each compound (Figure 1—figure supplement 2). Most tested conditions displayed distinctive phenotypic patterns, suggesting a different genetic basis for each of them (Figure 1—figure supplement 2). The plates were incubated for 24 hr at 30°C (except for the 14°C phenotyping) and were scanned at a resolution of 600 dpi in 16-bit grayscale. Colony size was quantified using the R package Gitter (Wagih and Parts, 2014), and the fitness of each strain in a given condition was measured as the normalized growth ratio between the colony size on that condition and the colony size on SC. As each hybrid is present in six replicates, the value considered for its phenotype is the median of all its replicates, which smooths the effects of pinning defects or contamination. This phenotyping step led to the determination of 148,225 hybrid/trait combinations (Figure 1—source data 1).
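
The fitness measure can be illustrated with a short sketch. It assumes that each replicate colony on a test condition is paired with the same replicate on the SC control plate, and that the ratio is taken per replicate before the median; the function name and the example colony sizes are hypothetical.

```python
from statistics import median

def fitness(condition_sizes, sc_sizes):
    """Median of per-replicate growth ratios (condition / SC control)."""
    ratios = [c / s for c, s in zip(condition_sizes, sc_sizes)]
    return median(ratios)

# Six replicates of one hybrid; the last colony failed to pin on the test plate.
condition = [90, 100, 110, 95, 105, 0]
control = [100, 100, 100, 100, 100, 100]
hybrid_fitness = fitness(condition, control)
print(hybrid_fitness)
```

Because the median is used, the single failed colony barely moves the estimate, whereas a mean would be pulled down noticeably.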

Diallel combining abilities and heritabilities

Combining ability values were calculated using a half diallel with unique parental combinations, excluding homozygous hybrids from identical parental strains. For each hybrid individual, the fitness value is expressed using Griffing’s model (Griffing, 1956):

$$z_{ij} = \mu + g_i + g_j + s_{ij} + e$$

Where $z_{ij}$ is the fitness value of the hybrid resulting from the combination of the ith and jth parental strains, $\mu$ is the mean population fitness, $g_i$ and $g_j$ are the general combining abilities for the ith and jth parental strains, $s_{ij}$ is the specific combining ability associated with the $i \times j$ hybrid, and $e$ is the error term (i = 1…N, j = 1…N, N = 55). General combining ability for the ith parent is calculated as:

$$\hat{g}_i = \frac{N-1}{N-2} \times (\bar{z}_i - \mu)$$

Where N is the total number of parental types, $\bar{z}_i$ is the mean fitness value of all half-sibling hybrids involving the ith parent, and $\mu$ is the population mean. The error term associated with $g_i$ is:

$$e_{g_i} = \sqrt{\frac{(N-1) \times \sigma^2_{z_{ij}}}{n \times N \times (N-2)}}$$

Where N is the total number of parental types, n is the number of replicates for the $i \times j$ hybrid, and $\sigma^2_{z_{ij}}$ is the variance of fitness values from a full-sib family involving the ith and jth parents, which is expressed as:

$$\sigma^2_{z_{ij}} = \sigma^2_{z_i} + \sigma^2_{z_j} + \sigma^2_{z_{ij}} + 2 \times \mathrm{cov}(z_i, z_j)$$

Specific combining ability for the $i \times j$ hybrid combination is therefore:

$$\hat{s}_{ij} = \bar{z}_{ij} - \hat{g}_i - \hat{g}_j - \mu$$

The error term associated with $\hat{s}_{ij}$ is:

$$e_{s_{ij}} = \sqrt{\frac{(N-3) \times \sigma^2_{z_{ij}}}{n \times (N-1)}}$$

Using combining ability estimates, broad- and narrow-sense heritabilities can be calculated. Narrow-sense heritability ($h^2$) accounts for the part of phenotypic variance explained only by additive variance, expressed as the additive variance ($\sigma_A^2$) over the total phenotypic variance observed ($\sigma_P^2$):

$$h^2 = \frac{\sigma_A^2}{\sigma_P^2} = \frac{\sigma^2_{(g_i+g_j)}}{\sigma^2_{(g_i+g_j)} + \sigma^2_{s_{ij}} + \sigma_e^2}$$

Where $\sigma^2_{(g_i+g_j)}$ is the sum of GCA variances, $\sigma^2_{s_{ij}}$ is the SCA variance and $\sigma_e^2$ is the variance due to measurement error, which is expressed as:

σe2=N-2egi-+egj--2+N2-N2-1N2-N2+N-3× esij-2

On the other hand, broad-sense heritability ($H^2$) depicts the part of the phenotypic variance explained by the total genetic variance $\sigma_G^2$:

$$H^2 = \frac{\sigma_G^2}{\sigma_P^2} = \frac{\sigma^2_{(g_i+g_j)} + \sigma^2_{s_{ij}}}{\sigma^2_{(g_i+g_j)} + \sigma^2_{s_{ij}} + \sigma_e^2}$$

Phenotypic variance explained by non-additive variance is therefore equal to the difference between H2 and h2. All calculations were performed in R using custom scripts.
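
The combining-ability and heritability estimates can be sketched in a few lines. This is an illustrative re-implementation, not the custom R scripts used in the study. It assumes a symmetric matrix `z` of hybrid fitness values whose diagonal is ignored, and it takes the measurement-error variance `var_e` as a plain input instead of deriving it from the error terms. All function and variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance

def combining_abilities(z):
    """Estimate mu, GCA per parent and SCA per hybrid from a half diallel.

    z is an N x N symmetric matrix (list of lists); the diagonal is not used.
    """
    n = len(z)
    pairs = list(combinations(range(n), 2))
    mu = mean(z[i][j] for i, j in pairs)  # population mean over unique hybrids
    gca = []
    for i in range(n):
        # Mean fitness of all half-sibling hybrids involving parent i.
        zbar_i = mean(z[i][j] for j in range(n) if j != i)
        gca.append((n - 1) / (n - 2) * (zbar_i - mu))
    sca = {(i, j): z[i][j] - gca[i] - gca[j] - mu for i, j in pairs}
    return mu, gca, sca

def heritabilities(gca, sca, var_e):
    """Return (h2, H2) from GCA/SCA estimates and an error variance."""
    var_a = variance([gca[i] + gca[j] for i, j in sca])  # additive part
    var_s = variance(list(sca.values()))                 # non-additive part
    var_p = var_a + var_s + var_e
    return var_a / var_p, (var_a + var_s) / var_p

# Toy example: four parents with purely additive effects summing to zero.
g_true = [1.0, -0.5, 0.25, -0.75]
z = [[10 + g_true[i] + g_true[j] for j in range(4)] for i in range(4)]
mu, gca, sca = combining_abilities(z)
h2, H2 = heritabilities(gca, sca, var_e=0.0)
```

With purely additive toy data the GCA estimates recover the simulated parental effects, every SCA is numerically zero, and both heritabilities equal one, so H2 minus h2 (the non-additive share) vanishes.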

Calculation of mid-parent values and classification of mode of inheritance

Mid-Parent Value (MPV) is expressed as the mean fitness value of both diploid homozygous parental phenotypes:

$$MPV = \frac{P1 + P2}{2}$$

Comparing the hybrid phenotypic value (Hyb) to its respective parents’ allows for an inference of the mode of inheritance for each hybrid/trait combination (Figure 3a–b). To obtain a robust classification, confidence intervals for each class were based on the standard deviation of hybrid (six replicates) and parents (54 replicates). P2 is the phenotypic value of the fittest parent while P1 is the phenotypic value of the least fit parent.

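Inheritance mode | Formula
Underdominance | $Hyb < P1 - (\sigma_{P1} + \sigma_{Hyb})$
Dominance P1 | $P1 - (\sigma_{P1} + \sigma_{Hyb}) < Hyb < P1 + (\sigma_{P1} + \sigma_{Hyb})$
Partial dominance P1 | $P1 + (\sigma_{P1} + \sigma_{Hyb}) < Hyb < MPV - (\frac{\sigma_{P1} + \sigma_{P2}}{2} + \sigma_{Hyb})$
Additivity | $MPV - (\frac{\sigma_{P1} + \sigma_{P2}}{2} + \sigma_{Hyb}) < Hyb < MPV + (\frac{\sigma_{P1} + \sigma_{P2}}{2} + \sigma_{Hyb})$
Partial dominance P2 | $MPV + (\frac{\sigma_{P1} + \sigma_{P2}}{2} + \sigma_{Hyb}) < Hyb < P2 - (\sigma_{P2} + \sigma_{Hyb})$
Dominance P2 | $P2 - (\sigma_{P2} + \sigma_{Hyb}) < Hyb < P2 + (\sigma_{P2} + \sigma_{Hyb})$
Overdominance | $P2 + (\sigma_{P2} + \sigma_{Hyb}) < Hyb$

When a clear separation is possible between the two parental phenotypic values ($P1 + \sigma_{P1} < P2 - \sigma_{P2}$), the full decomposition into the seven above-mentioned categories is possible (Figure 3a). However, in most cases, the two parental phenotypic values are not separated enough to achieve this, but it is still possible to distinguish between overdominance and underdominance (Figure 3b, Figure 3d). All calculations were performed in R using custom scripts.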
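
A minimal sketch of this classification is given here as an illustration, not as the authors' R implementation. It takes additivity to be the band centred on MPV, with partial dominance on either side of it. The function name, the argument names and the "unresolved" label (for hybrids that fall exactly on a boundary, or whose parents are not separated enough) are hypothetical.

```python
def classify_inheritance(hyb, p1, p2, sd_hyb, sd_p1, sd_p2):
    """Classify one hybrid/trait combination.

    p1 is the least fit parent, p2 the fittest; sd_* are standard deviations.
    """
    mpv = (p1 + p2) / 2
    m1 = sd_p1 + sd_hyb                  # margin around P1
    m2 = sd_p2 + sd_hyb                  # margin around P2
    mm = (sd_p1 + sd_p2) / 2 + sd_hyb    # margin around MPV

    # Over- and underdominance can be called even when parents overlap.
    if hyb < p1 - m1:
        return "underdominance"
    if hyb > p2 + m2:
        return "overdominance"
    # The intermediate classes need clearly separated parents.
    if not (p1 + sd_p1 < p2 - sd_p2):
        return "unresolved"

    bands = [
        ("dominance P1", p1 - m1, p1 + m1),
        ("partial dominance P1", p1 + m1, mpv - mm),
        ("additivity", mpv - mm, mpv + mm),
        ("partial dominance P2", mpv + mm, p2 - m2),
        ("dominance P2", p2 - m2, p2 + m2),
    ]
    for name, low, high in bands:
        if low < hyb < high:
            return name
    return "unresolved"
```

For example, with parents at 1.0 and 2.0 and all standard deviations equal to 0.05, a hybrid at 1.5 is classified as additive and a hybrid at 2.3 as overdominant.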

Genome-wide association studies on the diallel panel

Whole genome sequences for the parental strains were obtained from the 1002 yeast genome project (Peter et al., 2018). Sequencing was performed on an Illumina HiSeq 2000 with a read length of 102 bases. Reads were then mapped to the S288c reference genome using bwa (v0.7.4-r385) (Li and Durbin, 2009). Local realignment around indels and variant calling were performed with GATK (v3.3–0) (McKenna et al., 2010). The genotypes of the F1 hybrids were constructed in silico using 34 parental genome sequences. We retained only the biallelic polymorphic sites, resulting in a matrix containing 295,346 polymorphic sites encoded using the ‘recode12’ function in PLINK (Chang et al., 2015). Those genotypes correspond to a half-matrix of pairwise crosses with unique parental combinations, including the diagonal, that is the 34 homozygous parental genotypes. For each cross, we combined the genotypes of both parents to generate the hybrid diploid genome. As a result, heterozygous sites correspond to sites for which the two parents had different allelic versions. Because the low number of founder parental genotypes creates long-range linkage disequilibrium in the diallel matrix, we removed the affected sites by discarding haplotype blocks that are shared more than twice across the population, resulting in a final dataset containing 31,632 polymorphic sites.

We performed GWA analyses with different encodings (Seymour et al., 2016). In the additive model, the genotypes of the F1 progeny were simply the concatenation of the genotypes from the parents. As homozygous parental alleles were encoded as 1 or 2, the possible alleles for each site in the F1 genotype were ‘11’ and ‘22’ for homozygous sites and ‘12’ for heterozygous sites. We also used an overdominant genotype encoding, where both the homozygous minor and homozygous major alleles were encoded as ‘11’ and the heterozygous genotype was encoded as ‘22’.
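
The two encodings can be sketched as follows. This is a toy illustration assuming each homozygous parent is given as a list of allele codes (1 or 2) per site; the function names are hypothetical and the real genotype matrix was handled with PLINK.

```python
def hybrid_genotype(parent_a, parent_b):
    """Additive encoding: concatenate the parental alleles at each site.

    Alleles are sorted so that a heterozygous site is always written '12'.
    """
    return ["".join(sorted((str(a), str(b)))) for a, b in zip(parent_a, parent_b)]

def overdominant_encoding(genotypes):
    """Overdominant encoding: both homozygotes become '11', heterozygotes '22'."""
    return ["22" if g == "12" else "11" for g in genotypes]

parent_a = [1, 2, 2, 1]
parent_b = [1, 2, 1, 2]
additive = hybrid_genotype(parent_a, parent_b)
overdominant = overdominant_encoding(additive)
print(additive)      # prints: ['11', '22', '12', '12']
print(overdominant)  # prints: ['11', '11', '22', '22']
```

Under the overdominant encoding the test contrasts heterozygotes with both homozygous classes pooled, which is why the two homozygous genotypes share a single code.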

Mixed-model association analysis was performed using the FaST-LMM Python library version 0.2.32 (https://github.com/MicrosoftGenomics/FaST-LMM) (Widmer et al., 2015). As FaST-LMM expects normally distributed phenotypes, we normalized the phenotypes by replacing each observed value with the corresponding quantile from a standard normal distribution. The command used for association testing was the following: single_snp(bedFiles, pheno_fn, count_A1 = True), where bedFiles is the path to the PLINK-formatted SNP data and pheno_fn is the PLINK-formatted phenotype file. By default, for each SNP tested, this method excludes the chromosome on which the SNP is found from the analysis in order to avoid proximal contamination. FaST-LMM also computes the fraction of heritability explained by each SNP. The mixed model adds a polygenic term to the standard linear regression, designed to circumvent the effects of relatedness and population stratification.
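
The phenotype normalization can be sketched as a rank-based inverse normal transform. The exact rank-to-quantile rule used in the study is not stated, so the (rank − 0.5)/n offset below is an assumption, as are the function name and the arbitrary ordering of tied values.

```python
from statistics import NormalDist

def inverse_normal_transform(values):
    """Replace each value by the standard normal quantile of its rank.

    Assumption: quantile level (rank - 0.5) / n; ties keep their input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        transformed[idx] = std_normal.inv_cdf((rank - 0.5) / n)
    return transformed

raw = [0.3, 1.2, 0.7]
normalized = inverse_normal_transform(raw)
print(normalized)
```

The transform keeps the ordering of the observations but discards their spacing, so the middle value of an odd-sized sample always maps to zero.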

We estimated a condition-specific p-value threshold by permuting phenotypic values between individuals 100 times. The significance threshold was the 5% quantile (the 5th lowest p-value from the permutations). With this method, variants passing the threshold have a 5% family-wise error rate. However, we do not have any estimate of the false positive rate. Taken together, the GWA analyses revealed 1723 significantly associated SNPs (Figure 4—source data 1), with 1273 and 450 SNPs for the overdominant and additive models, respectively.
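
The permutation procedure can be sketched as below. Here `association_fn` is a hypothetical stand-in for a full association run on one permuted phenotype vector (for instance a wrapper around the FaST-LMM call) that returns one p-value per SNP. Keeping only the smallest p-value of each permutation is an assumption; the text states only that the threshold is the 5th lowest p-value from 100 permutations.

```python
import random

def permutation_threshold(phenotypes, association_fn, n_perm=100, alpha=0.05, seed=0):
    """Return the p-value threshold for one condition.

    association_fn(permuted_phenotypes) must return an iterable of p-values.
    """
    rng = random.Random(seed)
    minima = []
    for _ in range(n_perm):
        shuffled = list(phenotypes)
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        minima.append(min(association_fn(shuffled)))
    minima.sort()
    k = max(1, round(alpha * n_perm))  # 5 when n_perm = 100 and alpha = 0.05
    return minima[k - 1]               # the k-th lowest minimum
```

A SNP is then called significant for that condition when its p-value on the real phenotypes falls below the returned threshold.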

Variance explained and effect size

Variance explained by each SNP is calculated by PLINK. Note that the variance explained by all SNPs cannot be obtained by summing the variance explained by each individual SNP, because SNPs are not completely independent of one another.

The effect size was calculated using the formula for Cohen's d:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{sd_{Pooled}}$$

Where the pooled standard deviation is calculated with the following formula:

$$sd_{Pooled} = \sqrt{\frac{sd_1^2 + sd_2^2}{2}}$$

Under the additive model, the heterozygote phenotype is equidistant from both possible homozygote phenotypes (minor allele and major allele), so our calculation of the effect size could compare the heterozygotes either with the homozygotes for the minor allele or with the homozygotes for the major allele. We chose the latter since the major allele grants us more statistical power. The formula we used to obtain the effect size for a given SNP under this model is the following:

$$\text{Effect size} = \frac{\bar{x}_{Heterozygous} - \bar{x}_{Major}}{sd_{Pooled}}$$

Under the overdominant model, the heterozygote phenotype is compared to the phenotype of the group of both homozygotes (minor and major), so the formula we used to obtain the effect size for a given SNP under this model is the following:

$$\text{Effect size} = \frac{\bar{x}_{Heterozygous} - \bar{x}_{Homozygous}}{sd_{Pooled}}$$
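
The effect-size calculation can be sketched as follows, assuming the phenotypes of each genotype class are available as plain lists and that sample standard deviations are used (the text does not specify sample versus population). The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def pooled_sd(group1, group2):
    """Pooled standard deviation: root of the mean of the two variances."""
    return sqrt((stdev(group1) ** 2 + stdev(group2) ** 2) / 2)

def cohens_d(group1, group2):
    """Cohen's d: difference of means divided by the pooled standard deviation."""
    return (mean(group1) - mean(group2)) / pooled_sd(group1, group2)

def additive_effect(het, hom_major):
    """Additive model: heterozygotes versus major-allele homozygotes."""
    return cohens_d(het, hom_major)

def overdominant_effect(het, hom_minor, hom_major):
    """Overdominant model: heterozygotes versus both homozygous classes pooled."""
    return cohens_d(het, list(hom_minor) + list(hom_major))

d_additive = additive_effect([2, 4, 6], [1, 3, 5])
print(d_additive)  # prints: 0.5
```

In this toy case both groups have a standard deviation of 2 and means that differ by 1, so the effect size is 0.5.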

Gene ontology analysis

GO term enrichment was performed using SGD GO Term Finder (https://www.yeastgenome.org/goTermFinder) with the 546 unique genes containing significantly associated SNPs (Figure 4—source data 1 and Supplementary file 3). Significant enrichment is considered under ‘Process’ ontology with a p-value cutoff of 0.05.

CRISPR-Cas9 allele editing

The pAEF5 plasmid, containing the Cas9 endonuclease and the guide RNA targeting SGD1, was co-transformed with a 100-nucleotide repair fragment containing the desired allele. Transformed cells were then plated on YPD supplemented with 200 µg.ml−1 hygromycin at 30°C to select for transformants. Colonies were then arrayed on a 96-well plate with 100 µl YPD and grown for 24 hr to induce plasmid loss. The plate was then pinned back onto solid YPD for 24 hr and replica plated to YPD supplemented with 200 µg.ml−1 hygromycin to check for plasmid loss. Allele-specific PCR was performed on colonies that had lost the plasmid (Wangkumhang et al., 2007) to distinguish the correctly edited allele from the wildtype allele. Strains that showed amplification for the edited allele and no amplification for the wildtype allele were phenotyped (four technical replicates and four biological replicates) on the corresponding condition to measure differences with their wildtype counterparts.

Statistical tests

Pearson’s correlation test was used to assess the linear correlation between two sets.

The Wilcoxon–Mann–Whitney test was used to determine whether two independent samples have the same distribution.

Correlogram of all tested growth conditions. Numbers in each cell represent 100 x Pearson’s r value.

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1 and 4.

The following previously published data sets were used

References

    Marouli E, Graff M, Medina-Gomez C, Lo KS, Wood AR, et al. (2017) Rare and low-frequency coding variants alter human adult height. Nature 542:186–190. https://doi.org/10.1038/nature21039

    Wood AR, Esko T, Yang J, Vedantam S, Pers TH, et al. (2014) Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics 46:1173–1186. https://doi.org/10.1038/ng.3097

Decision letter

  1. Christian R Landry
    Reviewing Editor; Université Laval, Canada
  2. Naama Barkai
    Senior Editor; Weizmann Institute of Science, Israel

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The authors examine the relationship between the frequency of genetic variants in natural populations and their effects on complex growth traits using the budding yeast as a model. They find that high-impact variants tend to be rare and that their effects often combine in a non-additive manner. Their results contribute to a better understanding of phenotypic diversity and will help future developments in the use of natural populations for the mapping of genetic variation underlying complex traits such as those using GWAS in which low-frequency variants represent a particular challenge. Their observations are therefore of interest to a large community of scientists interested in evolution, genetics and particularly in the architecture of complex traits. The data produced and approach developed also represent an important resource for the community.

Decision letter after peer review:

Thank you for submitting your article "Extensive impact of low-frequency variants on the phenotypic landscape at population-scale" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Naama Barkai as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Your paper examines the correlation between allele frequencies and their effects on quantitative characters using QTL mapping and the analysis of a large number of genomes. You find that rare variants explain an unexpectedly large proportion of phenotypic variance. Your study is one of the first to examine this association systematically. Overall, the reviewers found the work of interest and to be a potentially important contribution. One major concern that emerged from the reviews and the discussions among the reviewers is that the importance of the work will not be obvious for non-specialists. One reviewer also mentions that similar conclusions could have been obtained from a meta-analysis of the existing literature. Since eLife is a generalist journal, it would be crucial to better articulate why the study is important and how the findings will impact the field of genetics and maybe evolution in general. More theoretical background as to why variants with large impacts on phenotypes should be rare or vice-versa would be useful. The manuscript is currently very short so you have plenty of space to extend on these points in the Introduction and in the Discussion. One reviewer also suggested you extend the analysis and text on the implication of the conditions tested for yeast biology, which I believe would strengthen the paper as well in terms of impact.

I collated below the other comments of the reviewers that are essential points to consider if you want to submit a revised version.

Essential revisions:

1) For a polygenic trait, the distinction between dominance and additivity isn't a relevant one. For example, you could have 100 loci, each completely dominant, but if they are additive between loci, the hybrid test will appear additive. The later GWAS results suggest that a lot of variants have an over-dominant effect (at least some over-dominant component). I can see what the authors are trying to do here, i.e., to assess the contribution of additivity versus other non-additive effects, but I think as long as there are many loci and there is some degree of additivity between loci, everything will appear additive. I think the distinction between additive and non-additive effects is only relevant when discussing one locus. If you had a panel of near-isogenic lines, a diallel experiment could answer the question of additivity versus non-additivity. The results from this analysis are still useful and I would suggest the authors simply report the results without invoking the terms additivity versus dominance. Alternatively, clearly state the caveats so readers don't misread the interpretation.

2) I have a somewhat different interpretation of the rare versus common comparison. There are a few facts nicely presented. 1) although there are fewer rare variants in the diallel than common ones, rare variants are more likely to be associated with the traits. This is a major finding. 2) On a per variant basis, common and low-frequency variants explain about the same amount of variation. This means the effect size should be larger for rare variants than common variants. I don't think the statistical significance in Figure 4D is worth highlighting, the difference was minimal (20.2% versus 19.6% with a large variance). Power is proportional to variance explained so it's expected that these two groups produce more or less equal variance on a per variant basis if using the same threshold. However, in the diallel, there are way more common variants than rare variants. This means in the diallel, more variance is explained by common variants as a whole. I can see that if rare variants are more likely to be associated with traits, then in an outbred population, they could also be disproportionally associated with traits but more difficult to detect. I would appreciate some discussion on the contribution by a per-variant basis and overall contribution.

3) The main conclusion of the manuscript is that rare variants significantly contribute to genetic variance. In my view, this conclusion is biased as these rare causal variants are being analyzed in genetic backgrounds in which they are no longer rare; actually, these variants are biallelic. Several studies have shown that a rare variant of MKT1(89A) is a significant contributor to phenotypic variation whenever it is present in segregating populations. However, the MKT1(89A) allele is hardly identified when one of the parents is not S288c, the strain which harbours this allele. So the extension that if the rare variant has a significant effect in a sub-population, its effect size would be similar in a large heterogeneous population is false. Furthermore, the authors conclude that in their larger 55-strain population, a representative sample of the 1000-strain collection, most of the variants have additive effects. This, the authors claim, is a revalidation of other previous studies (Bloom et al., 2013, 2015), which found that most of the causal variants between BYxRM had additive effects. However, subsequent papers (Forsberg et al. 2017, PMID 28250458; Yadav et al. 2016) showed that variance mapping in BYxRM segregants helped to account for genetic interactions and showed how non-additive interactions also contribute significantly to phenotypic variation. One of the results in the manuscript, that non-additive effects contribute 1/3rd of phenotypic variance, indicates that additive effects do not explain all effects, with dominance, a non-additive interaction, being a significant contributor. Also, the authors fail to explain why dominance is so frequently observed in their diallel panel. A possible reason could be that one variant is selected for a trait better than the other, and in combination with a weaker or neutral allele, it shows dominance.

4) I find that just doing a few more strains does not make this manuscript a significant advance over the previous studies. One can argue that by taking into account all causal variants identified to date (Fay, 2013), one can identify what frequency of rare variants have been identified, e.g. a typical example being the MKT1(89A) allele as causal, even though their effect size will not be identified using this strategy. Peltier et al., 2019, show that 284 rare QTN variants have been identified to date and that these functional variants are private to a subpopulation, possibly due to their adaptive role in a specific environment. Moreover, this conclusion can be made without these extensive experimental crosses.

https://doi.org/10.7554/eLife.49258.020

Author response

Your paper examines the correlation between allele frequencies and their effects on quantitative characters using QTL mapping and the analysis of a large number of genomes. You find that rare variants explain an unexpectedly large proportion of phenotypic variance. Your study is one of the first to examine this association systematically. Overall, the reviewers found the work of interest and to be a potentially important contribution. One major concern that emerged from the reviews and the discussions among the reviewers is that the importance of the work will not be obvious for non-specialists. One reviewer also mentions that similar conclusions could have been obtained from a meta-analysis of the existing literature.

We performed such an analysis in the framework of the 1002 Yeast Genomes Project and this analysis was mentioned in the first version of the manuscript. More recently, we were involved in a larger analysis (Peltier et al., 2019), which was not cited because it was unpublished at that time. A proper citation has now been included and we commented on this specific point in the Discussion.

Although such analyses are insightful, we think that the subset of QTNs detected in yeast by linkage mapping is biased for two reasons. First, in terms of the genetic backgrounds studied, most linkage mapping studies were performed on largely the same set of isolates. Second, experimentally validated QTNs are often prioritized based on their effect size.

Our study allows for a more global and quantitative approach as the variants are taken from a representative, genetically diverse and larger population. The subset of genetic variants is also much larger. Overall, this dataset gives a precise as well as a quantitative global view of the role of low-frequency variants on the phenotypic diversity in a population.

Since eLife is a generalist journal, it would be crucial to better articulate why the study is important and how the findings will impact the field of genetics and maybe evolution in general. More theoretical background as to why variants with large impacts on phenotypes should be rare or vice-versa would be useful. The manuscript is currently very short so you have plenty of space to extend on these points in the Introduction and in the Discussion.

As suggested, we modified the Introduction by adding more background on the missing heritability problem as well as on the role of low-frequency and rare variants in human diseases. We also expanded the Discussion to address several points raised during the review process (see below).

One reviewer also suggested you extend the analysis and text on the implication of the conditions tested for yeast biology, which I believe would strengthen the paper as well in terms of impact.

The goal of our study was to have a myriad of complex traits to study. Consequently we selected a large number of conditions for which the phenotypic variance was broad in our population. These conditions were already tested in the framework of the 1002 Yeast Genomes Project (Peter et al., 2018). Most of them show a normal distribution, meaning that they correspond to complex traits. A good dissection and analysis of the implication of the tested conditions for yeast biology require an additional step, namely the determination of inheritance patterns in the progeny. This is actually something that is intended as a logical follow-up to this study.

Essential revisions:

1) For a polygenic trait, the distinction between dominance and additivity isn't a relevant one. For example, you could have 100 loci, each completely dominant, but if they are additive between loci, the hybrid test will appear additive. The later GWAS results suggest that a lot of variants have an over-dominant effect (at least some over-dominant component). I can see what the authors are trying to do here, i.e., to assess the contribution of additivity versus other non-additive effects, but I think as long as there are many loci and there is some degree of additivity between loci, everything will appear additive. I think the distinction between additive and non-additive effects is only relevant when discussing one locus. If you had a panel of near-isogenic lines, a diallel experiment could answer the question of additivity versus non-additivity. The results from this analysis are still useful and I would suggest the authors simply report the results without invoking the terms additivity versus dominance. Alternatively, clearly state the caveats so readers don't misread the interpretation.

As we only look at the final phenotype of the hybrid, we agree that the distinction between additivity and dominance only reflects the combined effects of all the genes and that no distinction between the effects of individual loci can be made. However, one can argue that if dominance is indeed detected as the main mode of inheritance, it might suggest the presence of a locus of high phenotypic impact acting dominantly. It is also possible that two hybrids displaying complete dominance towards a parent do not involve the same locus. As suggested, we clearly stated the caveats and added a paragraph in the Discussion to clarify this point.

2) I have a somewhat different interpretation of the rare versus common comparison. There are a few facts nicely presented.

1) although there are fewer rare variants in the diallel than common ones, rare variants are more likely to be associated with the traits. This is a major finding.

We thank the reviewer for this comment. It is indeed true that low-frequency variants are disproportionately associated with the traits (i.e. they are overrepresented), and we now place more emphasis on that point in the Abstract and the Results section.

2) On a per variant basis, common and low frequency variants explain about the same amount of variation. This means the effect size should be larger for rare variants than common variants. I don't think the statistical significance in Figure 4D is worth highlighting, the difference was minimal (20.2% versus 19.6% with a large variance). Power is proportional to variance explained so it's expected that these two groups produce more or less equal variance on a per variant basis if using the same threshold. However, in the diallel, there are way more common variants than rare variants. This means in the diallel, more variance is explained by common variants as a whole. I can see that if rare variants are more likely to be associated with traits, then in an outbred population, they could also be disproportionally associated with traits but more difficult to detect. I would appreciate some discussion on the contribution by a per-variant basis and overall contribution.

We thank the reviewer for these comments. This is only true if we look at it in the same population. However, in our diallel panel, the variants that are at low frequency in the initial population are no longer rare because their allele frequency has shifted. For example, a variant with a MAF of 3% in the 1,011 isolates can rise to 25% in the diallel. Thus, the fraction of variance explained in the diallel won’t be linked to the MAF in the initial population.

To address this issue, we computed the effect size of the significantly associated variants. Effect size is a metric that is independent of allele frequency, which makes it better suited to extrapolation to a different population. We added a paragraph about this point in the Results section, together with a figure (Figure 3E), and discussed it in the Discussion.
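
The contrast between effect size and variance explained can be illustrated with a minimal sketch. It assumes a single biallelic locus with a purely additive effect under Hardy-Weinberg proportions, for which the textbook additive variance is 2p(1 - p)a². These assumptions, the function name `additive_variance` and the unit effect size are illustrative only and are not taken from the study, whose diallel hybrids need not follow Hardy-Weinberg proportions.

```python
def additive_variance(maf: float, effect: float) -> float:
    """Additive variance 2p(1-p)a^2 of one biallelic locus.

    Assumes Hardy-Weinberg genotype proportions and a purely additive
    per-allele effect `effect` (hypothetical simplification).
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    return 2.0 * maf * (1.0 - maf) * effect ** 2


effect = 1.0  # hypothetical per-allele effect, arbitrary phenotype units

# Same variant, same effect size, two allele frequencies from the example above.
var_population = additive_variance(0.03, effect)  # MAF of 3% in the natural isolates
var_diallel = additive_variance(0.25, effect)     # MAF of 25% in the diallel panel

ratio = var_diallel / var_population
print(f"population: {var_population:.4f}")
print(f"diallel:    {var_diallel:.4f}")
print(f"ratio:      {ratio:.2f}")
```

Under these assumptions the effect size is identical in both settings, while the variance attributable to the variant is several times larger at the higher frequency, which is why the fraction of variance explained does not transfer between populations but the effect size may.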

Concerning the fraction of variance explained by common and low-frequency associated SNPs, we agree that the difference is minimal. As suggested, we no longer highlight that point in the new version.

3) The main conclusion of the manuscript is that rare variants significantly contribute to genetic variance. In my view, this conclusion is biased as these rare causal variants are being analyzed in genetic backgrounds in which they are no longer rare; actually, these variants are biallelic. Several studies have shown that a rare variant of MKT1(89A) is a significant contributor to phenotypic variation whenever it is present in segregating populations. However, the MKT1(89A) allele is hardly identified when one of the parents is not S288c, the strain which harbours this allele. So the extension that if the rare variant has a significant effect in a sub-population, its effect size would be similar in a large heterogeneous population is false.

This point is related to what we mentioned previously. Indeed, the effect size of this variant would be roughly the same in a different population; however, it is true that the fraction of variance explained by such a variant could be different. Consequently, we computed the effect size of the significantly associated variants and showed that the effect size of low-frequency variants is not much different from that of common variants.

Furthermore, the authors conclude that in their larger 55-strain population, a representative sample of the 1000-strain collection, most of the variants have additive effects. This, the authors claim, is a revalidation of other previous studies (Bloom et al., 2013, 2015), which found that most of the causal variants between BYxRM had additive effects. However, subsequent papers (Forsberg et al. 2017, PMID 28250458; Yadav et al. 2016) showed that variance mapping in BYxRM segregants helped to account for genetic interactions and showed how non-additive interactions also contribute significantly to phenotypic variation. One of the results in the manuscript, that non-additive effects contribute 1/3rd of phenotypic variance, indicates that additive effects do not explain all effects, with dominance, a non-additive interaction, being a significant contributor. Also, the authors fail to explain why dominance is so frequently observed in their diallel panel. A possible reason could be that one variant is selected for a trait better than the other, and in combination with a weaker or neutral allele, it shows dominance.

As suggested, we added the references in the text. One hypothesis that could explain the importance of dominance in our dataset is the presence, in some strains, of genetic variants with a strong phenotypic effect that act dominantly and are responsible for most of the phenotypic variance in all crosses that are heterozygous at this particular locus. We have now added this point in the Discussion section.

4) I find that just doing a few more strains does not make this manuscript a significant advance over the previous studies. One can argue that by taking into account all causal variants identified to date (Fay, 2013), one can identify what frequency of rare variants have been identified, e.g. a typical example being the MKT1(89A) allele as causal, even though their effect size will not be identified using this strategy. Peltier et al., 2019, show that 284 rare QTN variants have been identified to date and that these functional variants are private to a subpopulation, possibly due to their adaptive role in a specific environment. Moreover, this conclusion can be made without these extensive experimental crosses.

As already discussed above, we strongly believe that our study corresponds to a more global and systematic approach than the concatenation of results from different linkage mapping studies. We exhaustively examined and compared the fraction of variance explained and the effect sizes of a large dataset of associated genetic variants, which were not chosen based on their effect size.

https://doi.org/10.7554/eLife.49258.021

Article and author information

Author details

  1. Téo Fournier

    Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-4860-6728
  2. Omar Abou Saada

    Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
    Contribution
    Software, Formal analysis, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Jing Hou

    Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
    Contribution
    Conceptualization, Software, Formal analysis, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Jackson Peter

    Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
    Contribution
    Software, Formal analysis
    Competing interests
    No competing interests declared
  5. Elodie Caudal

    Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
    Contribution
    Resources, Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  6. Joseph Schacherer

    Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
    Contribution
    Conceptualization, Supervision, Funding acquisition, Validation, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    schacherer@unistra.fr
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6606-6884

Funding

National Institutes of Health (R01 GM101091-01)

  • Joseph Schacherer

European Research Council (Consolidator grants (772505))

  • Joseph Schacherer

Fondation pour la Recherche Médicale (Graduate student grant)

  • Téo Fournier

Institut Universitaire de France

  • Joseph Schacherer

University of Strasbourg Institute for Advanced Study

  • Joseph Schacherer

Ministère de l’Enseignement Supérieur et de la Recherche

  • Téo Fournier

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Joshua Bloom and Leonid Kruglyak for insightful discussions, comments on the manuscript as well as for sharing their unpublished manuscript. We thank Maitreya Dunham and the members of the Schacherer laboratory for comments and suggestions. We also thank Gilles Fischer for providing the pAEF5 plasmid. This work was supported by a National Institutes of Health (NIH) grant R01 (GM101091-01) and a European Research Council (ERC) Consolidator grant (772505). TF is supported in part by a grant from the Ministère de l’Enseignement Supérieur et de la Recherche and in part by a fellowship from the medical association la Fondation pour la Recherche Médicale. JS is a Fellow of the University of Strasbourg Institute for Advanced Study (USIAS) and a member of the Institut Universitaire de France.

Senior Editor

  1. Naama Barkai, Weizmann Institute of Science, Israel

Reviewing Editor

  1. Christian R Landry, Université Laval, Canada

Publication history

  1. Received: June 12, 2019
  2. Accepted: October 23, 2019
  3. Accepted Manuscript published: October 24, 2019 (version 1)
  4. Version of Record published: December 4, 2019 (version 2)

Copyright

© 2019, Fournier et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
