Rare variants contribute disproportionately to quantitative trait variation in yeast

  1. Joshua S Bloom (corresponding author)
  2. James Boocock
  3. Sebastian Treusch
  4. Meru J Sadhu
  5. Laura Day
  6. Holly Oates-Barker
  7. Leonid Kruglyak (corresponding author)
  1. University of California, Los Angeles, United States
  2. Howard Hughes Medical Institute, University of California, Los Angeles, United States
Research Article
Cite this article as: eLife 2019;8:e49212 doi: 10.7554/eLife.49212

Abstract

How variants with different frequencies contribute to trait variation is a central question in genetics. We use a unique model system to disentangle the contributions of common and rare variants to quantitative traits. We generated ~14,000 progeny from crosses among 16 diverse yeast strains and identified thousands of quantitative trait loci (QTLs) for 38 traits. We combined our results with sequencing data for 1011 yeast isolates to show that rare variants make a disproportionate contribution to trait variation. Evolutionary analyses revealed that this contribution is driven by rare variants that arose recently, and that negative selection has shaped the relationship between variant frequency and effect size. We leveraged the structure of the crosses to resolve hundreds of QTLs to single genes. These results refine our understanding of trait variation at the population level and suggest that studies of rare variants are a fertile ground for discovery of genetic effects.

https://doi.org/10.7554/eLife.49212.001

Introduction

A detailed understanding of the sources of heritable variation is a central goal of modern genetics. Genome-wide association studies (GWAS) in humans (Visscher et al., 2017) have implicated tens of thousands of DNA sequence variants in disease risk and quantitative trait variation, but these variants fail to account for the entire heritability of diseases and traits. One key question is the relative contribution of DNA sequence variants with different allele frequencies in a population to trait variation. GWAS by design only test common DNA sequence variants; however, recent studies underscore the likely importance of the contribution of rare variants to heritable variation (Wainschtein et al., 2019). Theoretical analyses have explored how factors such as mutational target size, pleiotropy, and the strength of selection shape the relationship between variant frequency and effect size (Eyre-Walker, 2010; Robinson et al., 2014; Simons et al., 2018). In particular, purifying selection against variants that negatively affect fitness is expected to keep them at low frequencies in a population, resulting in a predicted inverse relationship between effect sizes and allele frequencies for variants that influence fitness-related traits (Gibson, 2012; Goldstein et al., 2013; Kryukov et al., 2007; Pritchard, 2001).

Empirical results have been consistent with the theoretical expectation that rare variants should have larger effect sizes, or, equivalently, that variants implicated in trait variation should be shifted to lower frequencies relative to all variants. An increased burden of ultra-rare protein-truncating variants has been observed in human diseases (Ganna et al., 2018; Exome Aggregation Consortium et al., 2016), and multiple studies have found that GWAS variants with lower allele frequencies have larger effect sizes (Marouli et al., 2017; Park et al., 2011). A negative correlation between allele frequency and effect size has also been observed in maize GWAS (Wallace et al., 2014), and our previous work in yeast suggested that variants that contribute to trait variation are shifted to lower frequencies when compared to all sequence variants (Ehrenreich et al., 2012).

Recent studies employed indirect variance partitioning approaches to uncover appreciable contributions of lower frequency variants to heritability of complex traits in humans, including prostate cancer susceptibility (Mancuso et al., 2016), height (Wainschtein et al., 2019; Yang et al., 2015), and body mass index (Wainschtein et al., 2019). However, a direct comprehensive comparison of the effects of rare and common variants has been lacking in humans for two principal reasons. First, rare variants cannot be detected by GWAS by design, and sequencing studies have not reached sufficient sample sizes to find them with high statistical power (Zuk et al., 2014). As a result, most rare variants have to date escaped detection. Second, the power to detect a variant with any given effect size decreases with the frequency of the variant in the study, simply because fewer individuals in the sample carry a less-frequent variant (Zuk et al., 2014). This statistical artifact shifts the effect sizes of those rare variants that are detected upwards, confounding effect size and allele frequency and biasing any effort to measure the underlying relationship between the two.

Here, we report a comprehensive study in yeast designed to overcome these limitations. We built a mapping population consisting of approximately one thousand progeny from each of 16 biparental crosses. In this mapping population, even variants that are rare in the yeast population and occur in only a single parental strain are present in approximately 1000 progeny, resulting in high power to detect them. We mapped thousands of QTLs that account for most of the heritable variation in 38 quantitative traits and measured the QTL effect sizes. We then decoupled variant frequency from effect size by measuring the population allele frequencies of QTL lead variants detected in our panel in a separate large catalog of sequenced yeast isolates (Peter et al., 2018). Analysis of these large complementary data sets enabled us to directly and comprehensively examine the relationship between QTL effect sizes and variant frequency, characterize the genetic architecture of quantitative traits on a population scale, and improve mapping resolution, in many cases to single genes.

Results

To investigate the genetic basis of quantitative traits in the yeast population, we selected 16 highly diverse S. cerevisiae strains that capture much of the known genetic diversity of this species. Specifically, they contain both alleles at 82% of biallelic SNPs and small indels observed at minor allele frequency >5% in a collection of 1011 S. cerevisiae strains (Peter et al., 2018). We sequenced the 16 strains to high coverage in order to obtain a comprehensive set of genetic variants. We constructed a panel of 13,950 individual recombinant haploid yeast segregants by crossing each parental strain to two different strains and collecting an average of 872 progeny per cross (Figure 1; Figure 1—source data 1; Supplementary file 1). We genotyped these segregants by highly multiplexed whole-genome sequencing, with median 2.3-fold coverage per base per individual. Genotypes were called at 298,979 genetic variants, with an average of 71,117 genetic variants segregating in a single cross. We measured the growth of each segregant in 38 different environments in duplicate by automated assays and quantitative imaging (Materials and methods). Because the growth measurements in different environments are not strongly correlated, we treat them as separate phenotypes or traits (Bloom et al., 2013). The resulting genotype-by-phenotype matrix (over half a million phenotypic measurements and 158 billion combinations of genotype and phenotype) formed the basis for all downstream analyses.
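The scale of this data set can be checked by multiplying out the counts quoted above. The short Python sketch below uses only numbers stated in the text; the variable names are illustrative, and reading "phenotypic measurements" as one replicate-averaged value per segregant and trait is an assumption.

```python
# Counts reported in the text.
n_segregants = 13_950
n_crosses = 16
n_variants = 298_979
n_traits = 38

# Average number of progeny per cross.
progeny_per_cross = n_segregants / n_crosses

# One trait value per segregant and trait (assumes replicates are averaged).
trait_values = n_segregants * n_traits

# Every segregant genotype call paired with every trait.
genotype_phenotype_pairs = n_segregants * n_variants * n_traits

print(round(progeny_per_cross))           # 872
print(trait_values)                       # 530100
print(f"{genotype_phenotype_pairs:.3e}")  # 1.585e+11
```

These values agree with the reported average of 872 progeny per cross, "over half a million" phenotypic measurements, and 158 billion genotype-phenotype combinations.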

Multiparental cross design with 16 diverse progenitor yeast strains.

16 parental strains were chosen to represent the diversity of the S. cerevisiae population, as illustrated by their positions on a neighbor-joining tree based on 1011 sequenced isolates (Peter et al., 2018). These strains were crossed in a single round-robin design, with each strain crossed to two other strains, as depicted by lines connecting the colored circles. Colors indicate the ecological origins of the parental strains.

https://doi.org/10.7554/eLife.49212.002

We used a variance components model (Bloom et al., 2015; de los Campos et al., 2015; Yang et al., 2010) to show that, on average, additive genetic effects accounted for just over half of the total phenotypic variance, while pairwise genetic interactions accounted for 8%, approximately 1/6 as much as additive effects (Figure 2 inset; Supplementary file 2; Figure 2—source data 1). We carried out QTL mapping to find the specific loci contributing additively to trait variation. We used a joint mapping approach that leverages information across the entire panel of 13,950 segregants (Materials and methods). We mapped 4552 QTLs at a false discovery rate (FDR) of 5%, with an average of 120 (range 52–195) QTLs per trait (Supplementary file 3; Figure 3—source data 1). The detected QTLs explain a median of 73% of the additive heritability per trait and cross, showing that we can account for most of the genetic contribution to trait variation with specific loci (Figure 2; Figure 2—source data 1). We complemented the joint analysis with QTL mapping within each cross and found a median of 12 QTLs per trait at the same FDR of 5%. The detected loci explained a median of 68% of the additive heritability (Figure 2—source data 1). The joint analysis was more powerful, explaining an additional 5% of trait variance and uncovering 458 QTLs not detected within individual crosses. Consistent with the higher statistical power of the joint analysis, these additional QTLs had smaller effect sizes (median of 0.071 SD units vs 0.083 SD units; Wilcoxon rank sum test W = 1e6, p=9e-5). All subsequent results are based on the QTLs detected in the joint analysis.

Most heritable variation is explained by detected QTLs.

Whole-genome estimates of additive genetic variance (X-axis) are plotted against cross-validated estimates of trait variance explained by detected QTLs (Y-axis) for each trait-cross combination. Red points show values for the BY-RM cross. The diagonal line corresponds to detected QTLs explaining all of the estimated additive genetic variance, and is shown as a visual guide. (Inset) A histogram of the ratio of non-additive to additive genetic variance for each trait-cross combination, based on estimates from a variance component model.

https://doi.org/10.7554/eLife.49212.004

To investigate the relationship between variant frequency and QTL effects, we focused on biallelic variants observed in our panel whose frequency could be measured in a large collection of 1011 sequenced yeast strains. Based on their minor allele frequency (MAF) in this collection, we designated variants as rare (MAF <0.01) or common (MAF >0.01). By this definition, 27.8% of biallelic variants in our study were rare. For each trait, we computed the relative fraction of variance explained by these two categories of variants in the segregant panel (Materials and methods) (Yang et al., 2015). Across all traits, the median contribution of rare variants was 51.7%, despite the fact that they constituted only 27.8% of all variants and that a rare variant is expected to explain less variance than a common one with the same allelic effect size. These results are consistent with rare variants having larger effect sizes and making a disproportionate contribution to trait variation. Comparing different traits, we saw a wide range of the relative contribution of rare variants, from almost none for growth in the presence of copper sulfate and lithium chloride to over 75% for growth in the presence of cadmium chloride, in low pH, at high temperature, and on minimal medium (Figure 3A; Figure 3—figure supplement 1; Figure 3—source data 2). The results for copper sulfate and lithium chloride are consistent with GWAS for these traits in the 1011 sequenced yeast strains—these two traits had the most phenotypic variance explained by detected GWAS loci, which inherently correspond to common variants, with large contributions coming from known common copy-number variation at the CUP and ENA loci, respectively (Peter et al., 2018).
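The frequency classes used above can be written as a small rule. The following Python sketch is only an illustration: the function names and example allele counts are hypothetical, it assumes one allele call per sequenced strain, and it assigns a variant with MAF of exactly 0.01 to the common class, a boundary case the text does not specify.

```python
RARE_MAF_THRESHOLD = 0.01  # threshold stated in the text


def minor_allele_frequency(alt_count, n_called):
    """Frequency of the less common allele among strains with a call."""
    if n_called <= 0:
        raise ValueError("n_called must be positive")
    alt_freq = alt_count / n_called
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf, threshold=RARE_MAF_THRESHOLD):
    """Label a variant as rare (MAF below threshold) or common."""
    return "rare" if maf < threshold else "common"


# Hypothetical alternate-allele counts in a panel of 1011 strains.
example_counts = {
    "variant_a": (5, 1011),
    "variant_b": (300, 1011),
    "variant_c": (1008, 1011),  # alternate allele is the major allele
}
classes = {
    name: frequency_class(minor_allele_frequency(alt, n))
    for name, (alt, n) in example_counts.items()
}
print(classes)
```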

Figure 3.
Effect size and contribution to trait variation of rare and common variants.

(A) Stacked bar plots of additive genetic variance explained by rare (blue) and common (gray) variants. Error bars show ±s.e. (B) Minor allele frequency (X-axis) of the lead variant at each QTL (Peter et al., 2018) is plotted against QTL effect size (Y-axis). Red points show mean QTL effect sizes for groups of approximately 100 variants binned by allele frequency. Error bars show ±s.e.m. (C) Frequency of the derived allele of each QTL lead variant (X-axis), based on comparison with S. paradoxus, is plotted against QTL effect size (Y-axis). Negative values on the Y-axis correspond to variants with effects that are detrimental for growth.

https://doi.org/10.7554/eLife.49212.006

In a complementary analysis, we investigated the relationship between the allele frequency of the lead variant at each QTL and the corresponding QTL effect size. Although the lead variant is not necessarily causal, in our study it is likely to be of similar frequency as the causal variant, and a simulation analysis showed that this approach largely preserves the relationship between frequency and effect size (Figure 3—figure supplement 2). Most QTLs had small effects (64% of QTLs had effects less than 0.1 SD units) and most lead variants were common (78%), consistent with previous linkage and association studies. We observed that QTLs with large effects were highly enriched for rare variants, and conversely, that rare variants were highly enriched for large effect sizes (Figure 3B; Figure 3—figure supplement 3; Figure 3—figure supplement 4). For instance, among QTLs with an absolute effect of at least 0.3 SD units, 145 of the corresponding lead variants were rare and only 90 were common. Rare variants were 6.7 times more likely to have an effect greater than 0.3 SD (Figure 3—source data 1, Fisher’s exact test, p<2e-16). Theoretical population genetics models show that for traits under negative selection, variant effect size is expected to be a decreasing function of minor allele frequency (Eyre-Walker, 2010; Pritchard, 2001). We empirically observe this relationship in our data for most of the traits examined, providing evidence that they have evolved under negative selection in the yeast population (Figure 3—figure supplement 5).

The existence of a close sister species of S. cerevisiae, S. paradoxus, allowed us to distinguish rare variants by their ancestral state. Variants that share the major allele with S. paradoxus are more likely to have arisen in the S. cerevisiae population recently than those that share the minor allele with S. paradoxus. We classified low-frequency variants as recent or ancient according to whether their major or minor allele was shared with S. paradoxus, respectively. Recently arising deleterious alleles have had less time to be purged by negative selection, and therefore recent variants are expected to have stronger effects on gene function, and hence manifest as QTLs with larger effects. Consistent with this expectation, we observed that recent variants were 1.8 times more likely than ancient variants to have an effect size greater than 0.1 SD units (Fisher’s exact test p=9e-5) (Figure 3C). We further examined the direction of QTL effects and found that recent variants were 1.5 times more likely to decrease fitness (Fisher’s exact test p=8e-3). Strikingly, no ancient variant decreased fitness by more than 0.5 SD units, whereas 41 recent variants did (Fisher’s exact test p=7e-3).
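The recent/ancient rule described above amounts to a comparison against the outgroup allele. The Python sketch below is illustrative only: the function names are hypothetical, and the "unresolved" label for sites where the outgroup matches neither allele is an assumption, since the text does not say how such sites were handled.

```python
def classify_by_outgroup(major_allele, minor_allele, outgroup_allele):
    """Classify a variant by which of its alleles the outgroup shares."""
    if outgroup_allele == major_allele:
        return "recent"   # minor allele is derived
    if outgroup_allele == minor_allele:
        return "ancient"  # major allele is derived
    return "unresolved"   # assumption: outgroup matches neither allele


def derived_allele_frequency(maf, age_class):
    """Convert a minor allele frequency to a derived allele frequency."""
    if age_class == "recent":
        return maf
    if age_class == "ancient":
        return 1.0 - maf
    return None  # ancestral state unknown


print(classify_by_outgroup("A", "G", "A"))
print(classify_by_outgroup("A", "G", "G"))
print(derived_allele_frequency(0.02, "ancient"))
```

Under this rule a low-frequency variant whose minor allele matches the outgroup has a derived allele frequency close to 1, which is how such variants would appear on the X-axis of a derived-allele plot like Figure 3C.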

An understanding of trait variation at the level of molecular mechanisms requires narrowing QTLs to the underlying causal genes. Such fine-mapping is a challenge because genetic linkage causes variants across an extended region to show mapping signals of similar strength. Statistical fine-mapping aims to address this challenge by estimating the probability that each variant within a QTL region is causal based on the precise pattern of genotype-phenotype correlations (Farh et al., 2015; Pasaniuc and Price, 2017; Treusch et al., 2015). Our crossing design enables us to obtain higher resolution for QTLs observed in two crosses that share a parent strain by looking for consistent inheritance patterns in both. Specifically, we focused on QTLs with effects greater than 0.14 SD units and used a Bayesian framework (Farh et al., 2015) to compute the posterior probability that each variant is causal (Figure 4A). We then aggregated these probabilities to obtain causality scores for each gene in a QTL. With this approach, we resolved 427 QTLs to single causal genes at an FDR of 20%. Because some QTLs have pleiotropic effects on multiple traits, this gene set contains 195 unique genes, greatly expanding the repertoire of causal genes in yeast. We searched the literature and found that 26 of the 195 genes identified here are supported by previous experimental evidence as causal for yeast trait variation (Fay, 2013; Jerison et al., 2017; Sadhu et al., 2016; Treusch et al., 2015; Wang and Kruglyak, 2014) (Figure 4B; Figure 4—source data 1). At a more stringent FDR of 5%, we found 105 unique causal genes, which included 24 of the 26 genes with experimental evidence.
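The general idea of turning per-variant evidence into gene-level scores can be sketched as follows. This is a generic illustration and not the authors' implementation: it assumes a single causal variant per QTL, a flat prior over variants, and that evidence from two crosses sharing a parent is combined by adding per-variant log-likelihoods for variants that segregate in both. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def combine_crosses(*per_cross_log_likelihoods):
    """Sum log-likelihoods over crosses for variants present in every cross."""
    shared = set.intersection(*(set(d) for d in per_cross_log_likelihoods))
    return {v: sum(d[v] for d in per_cross_log_likelihoods) for v in shared}


def posterior_probabilities(log_likelihoods):
    """Normalize per-variant likelihoods into posteriors (flat prior)."""
    peak = max(log_likelihoods.values())  # subtract the peak for stability
    weights = {v: math.exp(ll - peak) for v, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def gene_causality_scores(posteriors, variant_to_gene):
    """Aggregate variant posteriors into one score per gene."""
    scores = defaultdict(float)
    for variant, probability in posteriors.items():
        scores[variant_to_gene[variant]] += probability
    return dict(scores)


# Hypothetical log-likelihoods from two crosses that share a parent.
cross_1 = {"v1": -5.0, "v2": -5.2, "v3": -6.0}
cross_2 = {"v1": -5.0, "v2": -5.3, "v3": -8.0, "v4": -4.0}
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}

posteriors = posterior_probabilities(combine_crosses(cross_1, cross_2))
scores = gene_causality_scores(posteriors, variant_to_gene)
print(scores)
```

In this toy example the variant that segregates in only one cross (`v4`) is excluded, and most of the posterior mass falls on the two variants in `GENE_A`.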

QTL fine-mapping at gene-level resolution.

(A) Statistical fine-mapping of a QTL for growth in the presence of caffeine. Genetic mapping signal, shown as the coefficient of determination between genotype and phenotype (Y-axis, left), is plotted against genome position (X-axis) for crosses between 273614N and YJM981 (black) and YJM981 and CBS2888 (blue). The posterior probability of causality (PPC), plotted in red (Y-axis, right), localizes the QTL to a portion of the gene TOR1. (B) PPC is shown as black dots for 195 genes identified as causal at an FDR of 20%, sorted by PPC. Genes containing natural variants that have been experimentally validated as causal for trait variation in prior studies (Fay, 2013; Jerison et al., 2017; Sadhu et al., 2016; Treusch et al., 2015; Wang and Kruglyak, 2014) are shown in red and labeled with gene names.

https://doi.org/10.7554/eLife.49212.014

Causal genes were highly enriched for GO terms related to the plasma membrane (45 of 522, 16.5 expected, q = 1.8e-7), metal ion transport (13 of 83, 2.6 expected, q = 0.0009), and positive regulation of nitrogen compound biosynthesis (28 of 393, 12.5 expected, q = 0.0076) (Figure 4—source data 1). Strikingly, five of the six genes involved in cAMP biosynthesis were identified as causal (IRA1, IRA2, BCY1, CYR1, and RAS1; 0.19 expected, q = 0.0002). Additional genes in the RAS/cAMP signaling pathway were also identified as causal, including GPR1, which is involved in glucose sensing, SRV2, which binds adenylate cyclase, and RHO3, which encodes a RAS-like GTPase. In yeast, the RAS/cAMP pathway regulates cell cycle progression, metabolism, and stress resistance (Tisi et al., 2014). Variation in many of these genes influenced growth on alternative carbon sources. We hypothesize that the yeast population contains abundant functional variation in genes that regulate the switch from glucose to alternative carbon sources through the RAS/cAMP pathway.

Discussion

We previously used a cross between lab (BY) and vineyard (RM) strains of yeast to show that the majority of heritable phenotypic differences arise from additive genetic effects, and we were able to detect, at genome-wide significance, specific loci that together account for the majority of quantitative trait variation (Bloom et al., 2015; Bloom et al., 2013). It has been argued that the BY lab reference strain (commonly known as S288c) used in those and many other yeast studies is genetically and phenotypically atypical compared to other yeast isolates (Warringer et al., 2011). Our results here, obtained from crosses among 16 diverse strains, generalize these findings to the S. cerevisiae population and show that S288c is not exceptional from the standpoint of genetic variation and quantitative traits. We believe that the findings that the majority of the genetic variance of most traits is additive, and that there is little additive ‘missing heritability’ in studies with sufficiently large sample sizes, will apply broadly beyond yeast.

We discovered over 4500 quantitative trait loci (QTLs) that influence yeast growth in a wide variety of conditions. These loci likely capture the majority of common variants that segregate in S. cerevisiae and have appreciable phenotypic effects on growth, and therefore provide a comprehensive starting point for more fine-grained analyses of the genetic contribution to quantitative trait variation. We were able to localize approximately 8% of the QTLs to single genes based on genetic mapping information alone. Interestingly, these genes cluster in specific functional categories and pathways, suggesting that different strains of S. cerevisiae may have evolved different strategies for nutrient sensing and response as a function of specializing in particular environmental niches (Chantranupong et al., 2015). In addition to the findings described here, we anticipate that our data set will be a useful resource for further dissecting the genetic basis of trait variation at the gene and variant level (Peltier et al., 2019), and for evaluating statistical methods aimed at inferring causal genes and variants. In particular, the set of loci and genes identified here provides an ideal starting point for massively parallel editing experiments that directly test the phenotypic consequences of sequence variants (Shendure and Fields, 2016).

By combining our results with deep population sequencing in yeast (Peter et al., 2018), we were able to examine the contributions of variants in different frequency classes to trait variation. This approach avoids statistical confounding between variant frequency and effect size that occurs when both are measured in the same study sample. We observed a broad range of genetic architectures across the traits studied here, with variation in some traits dominated by common variants, while variation in others is mostly explained by rare variants. Overall, rare variants made a disproportionate contribution to trait variation as a consequence of their larger effect sizes. A complementary mapping approach in an overlapping set of yeast isolates also revealed enrichment of rare variants with larger effects (Fournier et al., 2019). These results are consistent with the finding from GWAS that common variants have small effects, as well as with linkage studies that find rare variants with large effect sizes. Our study design also revealed a substantial component of genetic variation—variants with low allele frequency and small effect size—that has been refractory to discovery in humans because both GWAS and linkage studies lack statistical power to detect this class of variants. Recent work in humans has suggested that rare variants account for a substantial fraction of heritability of complex traits and diseases (Wainschtein et al., 2019). Our study presents a more direct and fine-grained view of this component of trait variation and implies that larger sample sizes and more complete genotype information will be needed for more comprehensive studies in other systems.

Materials and methods

Data availability

Unless otherwise specified, all computational analyses were performed in R (v3.4.4). Analysis code and processing scripts are available at https://github.com/joshsbloom/yeast-16-parents (Bloom, 2019; copy archived at https://github.com/elifesciences-publications/yeast-16-parents). Additional links to generated data are also provided in the GitHub repository. The version numbers of R packages used are listed in this repository. Sequencing data for parents and segregants are available in the Sequence Read Archive (SRA) under the BioProject ID PRJNA549760.

Short-read and synthetic long read sequencing of parental strains

Parental genotypes were obtained by deep (>100X) paired-end sequencing of the 16 parental strains. A VCF file containing SNPs and small indels was generated for the parents using bwa (v0.7.1) (Li, 2013) to align to the sacCer3 reference (Engel et al., 2014), Picard (v2.12.2) (Broad Institute, 2019) to remove PCR duplicates, and the GATK HaplotypeCaller (v3.8) (Van der Auwera et al., 2013) with expected sample ploidy set to 1. A separate pipeline was developed to leverage additional synthetic long-reads (Illumina/Moleculo) to identify larger structural variants in the parents. Briefly, synthetic long-read assemblies were filtered to only include scaffolds greater than 10 kb. Scaffolds were corrected with our short-read data using Pilon (Walker et al., 2014). CNVs were discovered using custom scripts modified from scripts originally used to generate calls for testing LUMPY (Layer et al., 2014). CNVs were genotyped in all parents using the approach presented in SVTyper (Ebler et al., 2017). Scripts associated with the CNV detection pipeline are available at https://github.com/theboocock/long_read_cnv (Boocock, 2019; copy archived at https://github.com/elifesciences-publications/long_read_cnv).

Construction of haploid segregant panels

Segregants for the BY-RM cross and YPS163-YJM145 cross were obtained by sporulation of the hybrid diploid parents for 5–7 days in SPO++ sporulation medium (http://dunham.gs.washington.edu/sporulationdissection.htm) and tetrad dissection using the MSM 400 dissection microscope (Singer Instrument Company Ltd.). Four-spore tetrads were retained. For the BY-RM cross, one segregant was randomly chosen per tetrad (Bloom et al., 2013). For the YPS163-YJM145 cross, all segregants from ~250 tetrads were used. For all other crosses, the hybrid diploids were pre-grown in YPD with either G418 or cloNat, depending on which fluorescent magic marker plasmid they contained (Treusch et al., 2015). They were then sporulated in SPO++ with either cloNat or G418 for 5–7 days. A random spore prep was used to isolate haploid progeny (https://openwetware.org/wiki/McClean:Random_Spore_Prep), modified to exclude the use of glass beads for spore separation. Cells were plated on selective media, grown for two days, and colony fluorescence was visualized. Green fluorescent colonies and red fluorescent colonies corresponding to MATa and MATα haploid progeny were picked to deep-well 96-well plates and then split into frozen stocks.

Preparation of whole-genome sequencing libraries for segregants

Yeast were grown in 1 ml of yeast peptone dextrose in 2 ml deep-well 96-well plates (Thermo Scientific). Plates were sealed with Breathe-Easy gas-permeable membranes (Sigma-Aldrich). Yeast were grown without shaking for 2 days in a 30°C incubator. Cell walls were digested with Zymolyase, and DNA was extracted using either the 96-well DNeasy Blood and Tissue kits (Qiagen) for the BY-RM and YPS163-YJM145 segregants, or the 96-well E-Z 96 Tissue DNA kit, following the bacterial protocol (Omega), for all other segregants. DNA concentrations were determined using the Quant-iT dsDNA High-Sensitivity DNA quantification kit (Invitrogen) and the Bio-Tek Synergy 2 plate reader. DNA was diluted to 0.22 ng per μl using a Biomek FX liquid handling robot (Beckman Coulter). For each segregant, 5 μl of 0.22 ng per μl of DNA was added to 4 μl of 5X Nextera HWM buffer (Illumina), 6 μl of water and 5 μl of 1/35 diluted Nextera enzyme. The transposition reaction was performed for 5 min at 55°C. Directly after the tagmentation reaction and without additional sample purification, Illumina sequencing adaptors and custom indices were added by PCR. 10 μl of tagmented DNA was combined with 0.5 μl each of 10 μM index primers (one of N701-N712 plus one of 96 custom indices, see Supplementary file 1), 5 μl of 10X Ex Taq buffer, 0.375 μl Ex Taq polymerase (Takara), 4 μl of 2.5 mM dNTPs and 29.625 μl of water, and amplified with 20 cycles of PCR. Up to 1152-plex libraries were run on a HiSeq 2500 with single-end 150 bp reads, except BY-RM (Bloom et al., 2013) and YPS163-YJM145, which were sequenced with 100 bp reads.
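The reaction volumes quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the volumes and concentrations stated in the protocol; the dictionary keys and variable names are illustrative, and the final concentrations are derived values, not figures reported by the authors.

```python
# Tagmentation reaction components, in microliters.
tagmentation_ul = {
    "dna": 5.0, "buffer_5x": 4.0, "water": 6.0, "diluted_enzyme": 5.0,
}
# PCR components, in microliters.
pcr_ul = {
    "tagmented_dna": 10.0, "index_primer_1": 0.5, "index_primer_2": 0.5,
    "ex_taq_buffer_10x": 5.0, "ex_taq_polymerase": 0.375,
    "dntps_2_5_mM": 4.0, "water": 29.625,
}

tagmentation_total_ul = sum(tagmentation_ul.values())
pcr_total_ul = sum(pcr_ul.values())

# DNA input: 5 ul at 0.22 ng per ul.
dna_input_ng = tagmentation_ul["dna"] * 0.22

# Final concentrations follow from stock concentration * volume / total.
buffer_final_x = 5.0 * tagmentation_ul["buffer_5x"] / tagmentation_total_ul
primer_final_uM = 10.0 * pcr_ul["index_primer_1"] / pcr_total_ul
dntp_final_mM = 2.5 * pcr_ul["dntps_2_5_mM"] / pcr_total_ul

# 12 N7xx primers (N701-N712) combined with 96 custom indices.
max_plex = 12 * 96

print(tagmentation_total_ul, pcr_total_ul, round(dna_input_ng, 2), max_plex)
```

The components sum to 20 μl for tagmentation and 50 μl for PCR, with about 1.1 ng of input DNA per segregant, and the index combinations account for the stated maximum of 1152 libraries per run.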

Segregant genotype calling

Fastq files were demultiplexed using fastq-multx (v1.3.1) (Aronesty, 2013) and aligned to the sacCer3 version of the reference genome using bwa. Adapter sequences were trimmed from reads and Phred33 quality scores were computed with Trimmomatic (v0.32) (Bolger et al., 2014). PCR duplicates were removed using Picard and then merged into one CRAM file per cross using Picard. VCF files were generated for each cross using the GATK HaplotypeCaller (Van der Auwera et al., 2013) and genotypes were called at known variant sites between the parental strains. Additional custom R code was used to remove regions with strong mapping bias toward the reference genome (Albert et al., 2018), filter poor quality markers, and remove segregants with too many crossovers, likely corresponding to diploid contaminants. Missing segregant genotype information was imputed using a hidden Markov model (HMM) implemented in R/qtl (Arends et al., 2010). Structural variants identified in the parent VCF files were considered missing information in the segregants and the HMM was used to impute genotypes at those sites.

Phenotyping by endpoint colony growth

Segregants were arrayed to 384-well liquid plates in duplicate with different plate positions across the duplicates. Segregants were grown in YPD for approximately 48 hr without shaking and then pinned to agar plates using a BM-5 colony arraying robot (S&P Robotics). Plates were incubated for 48 hr and end-point growth was quantified by automated plate imaging using the colony arraying robot. Colony radii were calculated using functions in the EBImage R package (Pau et al., 2010), and endpoint growth measurements were filtered and normalized for plate effects as described previously (Bloom et al., 2015; Bloom et al., 2013). In addition, a manual filtering step was used to filter out aberrant colonies arising from technical artefacts, such as from wet spots on the agar plates at the time colonies were pinned. Unless otherwise specified, the average value across replicates was used per segregant for all downstream analyses.

Within-cross QTL mapping

QTL were mapped using a forward stepwise regression procedure that controls the FDR (G’Sell et al., 2013) for each trait and cross. We tested for linkage at each marker along the genome by calculating $r^2$, where $r$ is the Pearson correlation coefficient between segregant genotypes at the marker and segregant phenotypes. 10,000 permutations of phenotype to strain assignment were performed and this statistic was calculated across the genome for each of the permutations. For each of the permutations, the maximum statistic was recorded to generate an empirical null distribution of the maximum statistic (Churchill and Doerge, 1994). A p-value was calculated as the probability that the observed maximum statistic comes from the empirical null distribution of maximum statistics. If the observed statistic was greater than all of the empirical null statistics, the p-value was recorded as 1e-4. The p-value was added to a set of p-values $(p_1, \ldots, p_k)$, and the entire procedure was repeated (including permutations) with the previously identified marker(s) included as regression covariates. A ‘ForwardStop’ FDR-controlling statistic (G’Sell et al., 2013) was calculated as $-\frac{1}{k}\sum_{i=1}^{k}\log(1-p_i)$. We continued to add selected markers to a multiple regression model as long as the ‘ForwardStop’ statistic was less than or equal to 5%.
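The stopping rule described above can be sketched in a few lines of Python. This is an illustration, not the authors' R code: the function names and example p-values are hypothetical, and the permutation p-value is floored at one over the number of permutations, which reproduces the 1e-4 floor for 10,000 permutations.

```python
import math


def empirical_p_value(observed_max, null_maxima):
    """Fraction of permutation maxima at least as large as the observed one,
    floored at 1 / (number of permutations)."""
    exceed = sum(1 for m in null_maxima if m >= observed_max)
    return max(exceed, 1) / len(null_maxima)


def forward_stop_statistic(p_values):
    """ForwardStop statistic: -(1/k) * sum over i of log(1 - p_i)."""
    k = len(p_values)
    return -sum(math.log(1.0 - p) for p in p_values) / k


def n_qtl_to_keep(p_values, fdr=0.05):
    """Keep adding QTL, in the order detected, while the statistic <= fdr."""
    kept = 0
    for k in range(1, len(p_values) + 1):
        if forward_stop_statistic(p_values[:k]) <= fdr:
            kept = k
        else:
            break
    return kept


# Hypothetical p-values from successive forward-selection steps.
step_p_values = [1e-4, 1e-4, 0.003, 0.02, 0.3, 0.6]
print(n_qtl_to_keep(step_p_values))  # 4
```

In this example the statistic stays below 0.05 through the fourth step and exceeds it once the fifth p-value (0.3) is included, so four QTL are retained.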

We note that we chose to use this procedure rather than procedures we have used in the past (Albert et al., 2018; Bloom et al., 2015; Bloom et al., 2013) because it is simple, does not require exchangeability of statistics across different traits, gives very similar results as previous methods, and we verified through simulations (not shown) that it controls FDR for forward stepwise model selection under different QTL architectures.

For this within-cross QTL mapping procedure, we re-localized QTL peak positions for QTL detected by the forward selection procedure. Specifically, for each QTL peak we included all other detected QTL peaks (as detected from the forward selection procedure) as covariates in a multiple regression model, and scanned each marker on the chromosome on which the QTL peak being re-localized was detected to identify the marker that maximized the likelihood of the multiple regression model. The marker that maximized the likelihood of the multiple regression model was retained as the new, re-localized, QTL peak position.

Cross-validation procedures to estimate heritability explained by QTL

The amount of additive variance explained by detected QTLs was estimated using cross-validation. For the within-cross analysis, segregants were randomly split into 10 sets. Each set of segregants was left out of the procedure one at a time (held-out set). The within-cross QTL mapping procedure was performed for all the other sets (training set). For the QTL markers detected in this training set and with effects estimated in the training set, the amount of variance explained by the joint model of the set of significant QTL markers was estimated in the held-out set. For the joint analysis described below, we performed a similar procedure: we split the segregants within each cross into 10 sets, left one of the sets from each cross out (held-out set), identified QTL jointly across the other sets (training set), estimated their effects in each cross (training set), and then estimated the variance explained in the held-out set.
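
The held-out logic can be sketched as follows in Python. This is a simplified illustration under stated assumptions: it uses a single, already-selected marker rather than re-running the QTL mapping in each training set, it measures variance explained as 1 - SS_res/SS_tot in the held-out set, and the function names `split_into_folds` and `held_out_r2` are hypothetical.

```python
import random

def split_into_folds(n_items, n_folds=10, seed=0):
    """Randomly assign indices 0..n_items-1 to n_folds disjoint sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def held_out_r2(genotypes, phenotypes, train_idx, test_idx):
    """Fit y = a + b*g on the training set; return R^2 on the held-out set."""
    gx = [genotypes[i] for i in train_idx]
    gy = [phenotypes[i] for i in train_idx]
    mx, my = sum(gx) / len(gx), sum(gy) / len(gy)
    sxx = sum((x - mx) ** 2 for x in gx)
    slope = sum((x - mx) * (y - my) for x, y in zip(gx, gy)) / sxx
    intercept = my - slope * mx
    # Effects come from the training set only; evaluation uses held-out data.
    obs = [phenotypes[i] for i in test_idx]
    pred = [intercept + slope * genotypes[i] for i in test_idx]
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

Because effects are estimated only in the training set, the held-out estimate is not inflated by selecting and fitting QTL on the same data.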

Within-cross analysis to estimate additive and pairwise genetic interaction variance

To estimate the fraction of phenotypic variance attributable to additive genetic effects for each cross and trait we fit the model y = a+e, where y contains the segregant phenotype values and is standardized to have mean 0 and variance 1. Here, a are the additive genetic effects and the residual error is denoted as e. The distributions of these effects are assumed to be multivariate normal with mean zero and variance-covariance as follows:

$a \sim N(0, \sigma^2_{A} A)$ and $e \sim N(0, \sigma^2_{EV} I)$

Here, A is the additive relatedness matrix, the fraction of genome shared between pairs of segregants and was calculated as MM'/n where M is the n x m matrix of standardized marker genotypes, n is the number of segregants and m is the number of markers.

We also fit an expanded model to estimate the relative contribution of additive vs non-additive (pairwise epistatic) effects. For the pairwise epistatic component, we believe that the assumption that all pairs of loci contribute to trait variation with effect sizes drawn from a single normal distribution is violated when one or a few QTL-QTL interactions with large effects are present, resulting in a downward bias. We previously showed (Bloom et al., 2015) that loci involved in such stronger interactions can be detected in additive scans. Therefore, by explicitly including additive QTLs in the three components model, we avoid making the assumption that the effect sizes of all locus pairs are drawn from the same normal distribution and obtain a better estimator of total two-way epistatic variance when large-effect QTL-QTL interactions are present. This model was parameterized as:

y=βX+Zq+Za+Zf+Zg+Zi+Zp+e

The distributions of these effects are assumed to be multivariate normal with mean zero and variance-covariance as follows:

$q \sim N(0, \sigma^2_{AQTL} A_{QTL})$, $a \sim N(0, \sigma^2_{A} A)$, $f \sim N(0, \sigma^2_{AQTL*AQTL} A_{QTL} \circ A_{QTL})$, $g \sim N(0, \sigma^2_{AQTL*A} A_{QTL} \circ A)$, $i \sim N(0, \sigma^2_{A*A} A \circ A)$, $p \sim N(0, \sigma^2_{R} I_n)$, and $e \sim N(0, \sigma^2_{EV} I_m)$,

where y is a vector of length L that contains phenotypes for n segregants including replicate measurements such that L = n x [number of replicates]. β is a vector of estimated fixed effect coefficients. X is a matrix of fixed effects (here β is the overall mean, and X is a vector of ones of length L unless otherwise specified). Z is an L x n incidence matrix that maps L total measures to n total segregants. In order, the random effect terms correspond to the effects of detected QTL, effects from the whole genome, epistatic interactions between detected QTL, epistatic interactions between additive QTL and the genome, epistatic interactions between all pairs of markers across the genome, and residual repeatability, following very similar methods and syntax as described previously (Bloom et al., 2015). We also fit a model that omitted the terms for epistatic interactions between detected QTL, and epistatic interactions between additive QTL and the genome. The mixed model was fit with the regress R package (Clifford and McCullagh, 2014) using restricted maximum likelihood estimation (REML). Standard errors of variance component estimates were calculated as the square root of the diagonal of the Fisher information matrix from the iteration at convergence of the Newton-Raphson algorithm. These procedures were used for all other mixed model analyses described below. For the analysis that compared the fraction of non-additive to additive variation, we calculated $\frac{\sigma^2_{AQTL*AQTL} + \sigma^2_{AQTL*A} + \sigma^2_{A*A}}{\sigma^2_{AQTL} + \sigma^2_{A}}$.

Allele-frequency lookup in 1011 yeast isolate population

We used bcftools isec (Li et al., 2009) to intersect our VCF containing sequence variant information on the 16 parental strains with the 1011 yeast isolate VCF generated by Peter et al. (2018), and vcftools (Danecek et al., 2011) to retain only biallelic variants. This subset of 259,647 biallelic markers was used for variance components analysis and joint QTL mapping across the panel. Allele frequencies in the larger panel of 1011 yeast isolates were extracted from the provided VCF (Peter et al., 2018). Derived allele frequencies were calculated by using nucmer (Marçais et al., 2018) to perform whole genome alignment between the sacCer3 reference assembly and the CBS432 assembly of S. paradoxus. Variants were identified using the delta-filter and show-snps commands provided in nucmer. Biallelic variants in our panel were classified as ancient if the variant matched the S. paradoxus sequence and recent if not. The unfolded allele frequency was calculated as the frequency of the recent variant. We could determine ancestral status for approximately 80% of the biallelic variants. To improve power for enrichment tests, we used derived allele frequency <5% and >95% as cutoffs when comparing effect sizes and signs of effects between derived and ancestral variants.
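
The polarization step can be illustrated with a short Python sketch. The function `derived_allele_frequency` and its arguments are hypothetical, and the choice to return no value when the outgroup allele matches neither allele (or is unaligned) is an assumption of this sketch rather than a documented detail of the analysis.

```python
def derived_allele_frequency(ref, alt, alt_freq, outgroup):
    """Return the frequency of the derived (recent) allele, or None.

    ref, alt  : the two alleles of a biallelic variant
    alt_freq  : frequency of `alt` in the population panel
    outgroup  : allele at the aligned outgroup position, or None
    """
    if outgroup == ref:
        # The reference allele is ancestral, so the alternate is derived.
        return alt_freq
    if outgroup == alt:
        # The alternate allele is ancestral, so the reference is derived.
        return 1.0 - alt_freq
    # Unaligned position or a third allele: ancestral status unknown.
    return None

# Hypothetical variant: alternate allele at 2% in the panel, and the
# outgroup carries the alternate allele, so the reference is derived.
print(derived_allele_frequency("A", "G", 0.02, "G"))
```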

Genotype recoding for joint analyses

We coded the biallelic markers for which we had allele frequency data from the larger yeast isolate panel as −1 if they matched the reference strain and +1 if they did not. If a variant did not segregate in a particular cross, it was treated as missing in that cross.
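
A minimal Python sketch of this recoding for one marker within one cross is shown below. The function name `recode_marker` and the use of `None` to represent missing values are assumptions for illustration.

```python
def recode_marker(alleles_by_segregant, reference_allele):
    """Recode one marker within one cross: -1 reference, +1 non-reference.

    Returns a list of None (missing) if the marker does not segregate,
    that is, if every segregant in the cross carries the same allele.
    """
    if len(set(alleles_by_segregant)) < 2:
        return [None] * len(alleles_by_segregant)
    return [-1 if a == reference_allele else 1 for a in alleles_by_segregant]

# Hypothetical cross in which the marker segregates.
print(recode_marker(["A", "G", "A", "G"], "A"))
# Hypothetical cross in which both parents carry the same allele.
print(recode_marker(["G", "G", "G"], "A"))
```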

Mixed model analysis with allele-frequency partitioning

We fit the following mixed model per trait (jointly across the different crosses):

y=βX+r+c+e

The distributions of these effects are assumed to be multivariate normal with mean zero and variance-covariance as follows:

$r \sim N(0, \sigma^2_{R} A_{maf<1\%})$, $c \sim N(0, \sigma^2_{C} A_{maf\geq1\%})$, and $e \sim N(0, \sigma^2_{EV} I_m)$, where y is a vector of length 13,950 that contains phenotypes for segregants concatenated across the different crosses. β is a vector of estimated fixed effects of each cross. X is an incidence matrix mapping segregants to crosses. Here, the two relatedness matrices $A_{maf<1\%}$ and $A_{maf\geq1\%}$ were calculated separately for all markers with MAF < 1% and MAF ≥ 1%, respectively, in the larger panel of 1,011 yeast isolates. Per marker, the genotype values were scaled to have mean 0 and variance 1, for each of the segregants from crosses in which that marker segregates. Markers that are fixed within a cross were excluded from the subsequent calculation of genetic covariance. The rationale for excluding data for variants not segregating in a given cross is that all such variants are completely confounded with each other and with any other effects specific to that cross. Thus, their effects are more appropriately captured by including a fixed effect for each cross within the analysis. Then, with M being the n segregants by m markers matrix corresponding to the standardized genotypes for that subset of markers, we calculated the relatedness matrix as a Gower’s centered matrix (Forni et al., 2011; Kang et al., 2010; McArdle and Anderson, 2001), $\frac{MM'}{\mathrm{tr}(MM')/n}$, which has the property that the average diagonal coefficient equals 1.
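
The scaling of the relatedness matrix can be illustrated with a small pure-Python sketch. It assumes a complete matrix M of already standardized genotypes with no missing values; the function name `relatedness_matrix` and the toy genotypes are hypothetical, and a real analysis of this size would use optimized matrix routines.

```python
def relatedness_matrix(M):
    """Return MM' scaled so that the mean of its diagonal equals 1.

    M is a list of n rows (segregants), each a list of m standardized
    marker genotypes.
    """
    n = len(M)
    # G = MM': inner products between every pair of segregants.
    G = [[sum(a * b for a, b in zip(M[i], M[j])) for j in range(n)]
         for i in range(n)]
    scale = sum(G[i][i] for i in range(n)) / n   # tr(MM') / n
    return [[G[i][j] / scale for j in range(n)] for i in range(n)]

# Hypothetical genotypes for three segregants at two markers.
A = relatedness_matrix([[1, -1], [-1, 1], [1, 1]])
print(sum(A[i][i] for i in range(3)) / 3)
```

Dividing by tr(MM')/n puts matrices built from different marker subsets on a common scale, so that the variance components attached to them are comparable.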

We used the same logic to construct additional covariance matrices when more finely binning variants by allele frequency in the external panel (seven allele-frequency bins model). Bins were chosen to contain approximately equal numbers of variants. We also fit the seven allele-frequency bins model using only variants that were private to each parent (variants that only segregate in a pair of crosses). In this last model, the allele frequencies of the variants used for the analysis are all approximately the same across the panel. Therefore, this last model does not make the assumption that the variance of variant effects is inversely proportional to their frequencies in the mapping panel (Yang et al., 2010).

The procedure for fitting these models was the same as described above in the section ‘within-cross variance component analysis’.

Accounting for large effect QTL and polygenic background for all chromosomes except the chromosome of interest for joint QTL mapping

For each chromosome of interest and for each trait and cross, we calculated $y_c = CQ + a_L + s_c$, where $y_c$ is the vector of trait values for a given trait and cross, Q is a matrix of QTL genotypes at peak markers from the within-cross mapping described above, with FDR < 5%, that are not located on the chromosome of interest, C is a vector of estimated QTL effects from the section ‘within-cross QTL mapping’, and $a_L$ is the additive genetic variance from all chromosomes excluding the chromosome of interest. $a_L$ comes from the REML-based BLUP estimate of the effect of all other chromosomes, including the fixed effects of detected QTL on the other chromosomes. The goal of this step was to obtain the residual trait values $s_c$ that can be used to scan for QTLs on a chromosome of interest and that are corrected for mapped genetic sources of variation that do not arise from the chromosome of interest (Yang et al., 2014).

Joint QTL mapping

Under the assumption that a causal biallelic variant has a consistent additive effect in all the crosses in which it segregates, we implemented a model to identify such variants jointly across our entire segregant panel (McMullen et al., 2009; Stich, 2009). This procedure increases statistical power. For example, for variants that are private to one of the 16 parental strains, this procedure will approximately double the observed number of instances of the minor allele, resulting in less noisy estimates of variant effects. For variants that are shared among multiple parents, the increase in the observed number of instances of the minor allele will be greater.

For each trait and each chromosome, and then for each marker on that chromosome, we calculated a t-statistic as $t = \frac{r}{\sqrt{(1-r^2)/(n-2)}}$. Here, $r$ is the Pearson correlation between the recoded segregant genotypes across the panel and the vector s, which corresponds to the values of $s_c$ described in the previous section concatenated across the different crosses. The number of informative segregants, n, differs for each biallelic variant, and corresponds to the sum of the sample sizes for each cross in which the variant segregates. P-values were calculated using built-in R functions, with the degrees of freedom reflecting the number of informative segregants, n, for each variant. The -log10(p) was recorded. This statistic was calculated for each marker on the chromosome. We performed 1000 permutations of phenotype to strain assignment, permuting phenotype values only within each cross (we did not permute values between crosses), and calculated this statistic across the genome for each permutation. For each permutation, the maximum statistic was recorded to generate an empirical null distribution of the maximum statistic (Churchill and Doerge, 1994). A new corrected p-value was calculated as the probability that the observed maximum statistic comes from the empirical null distribution of maximum statistics. If the observed maximum statistic was greater than all of the empirical null maximum statistics, the p-value was recorded as 1e-3. The p-value was added to a set of p-values ($p_1, \ldots, p_k$), and the entire procedure was repeated (including permutations) with the previously identified marker(s) included as regression covariates. A ‘ForwardStop’ FDR-controlling statistic (G’Sell et al., 2013) was calculated as described above. We continued to add selected markers to a multiple regression model as long as the ‘ForwardStop’ statistic was less than or equal to 5%.
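
Two pieces of this procedure lend themselves to a short Python sketch: converting a correlation into a t-statistic with n - 2 degrees of freedom, and permuting phenotypes within crosses only. The function names `t_statistic` and `permute_within_crosses` are hypothetical, and the p-value lookup (done with built-in R functions in the study) is omitted here.

```python
import math
import random

def t_statistic(r, n):
    """t-statistic for a Pearson correlation r from n informative segregants.

    Has n - 2 degrees of freedom; undefined when |r| equals 1.
    """
    return r / math.sqrt((1.0 - r * r) / (n - 2))

def permute_within_crosses(phenotypes, cross_labels, rng):
    """Shuffle phenotypes among segregants of the same cross only."""
    permuted = list(phenotypes)
    positions_by_cross = {}
    for pos, label in enumerate(cross_labels):
        positions_by_cross.setdefault(label, []).append(pos)
    for positions in positions_by_cross.values():
        values = [phenotypes[p] for p in positions]
        rng.shuffle(values)
        for p, v in zip(positions, values):
            permuted[p] = v
    return permuted

# Hypothetical example: the same correlation is stronger evidence when
# the variant segregates in more crosses (larger n).
print(t_statistic(0.1, 900), t_statistic(0.1, 1800))
```

Restricting permutations to within crosses keeps differences in mean phenotype between crosses intact in the null distribution.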

Effect size estimation for joint QTL mapping

The peak markers (lead variants) from this procedure were then used for effect size estimation. For each trait and cross, the phenotypes are scaled to have mean 0 and variance 1, and effect sizes within each cross are estimated using multiple regression for the peak markers that segregate within that cross. The betas in this analysis correspond to the differences in the means between the two QTL alleles (conditional on the effects of the other segregating QTL). For peak markers that segregate in multiple crosses, the average betas over the different crosses are shown in Figure 3. Unbiased estimates of QTL effect size (Figure 3—figure supplement 4) were obtained by the same procedure except peak detection was performed in 9/10 of the data and effects estimated in the 1/10 of the data left out. Allele frequencies of the lead variants were looked up in the 1011 isolate panel.

Statistical fine-mapping to identify causal genes

We implemented the probabilistic identification of causal SNPs (PICS) procedure, a Bayesian approach to estimate the probability that a variant is causal. A thorough description of the method, including details about the logic and implementation, is given in Farh et al. (2015). We aggregated these probabilities within genes to estimate the probability that a gene contains the causal variant. We noted the position of the observed QTL peak (called the ‘lead’ variant in the GWAS literature), and its effect size for all QTL that explained more than 2% of phenotypic variance from the within-cross mapping (equivalent to 0.1414 SD units). We assumed that the prior probabilities of a variant being causal, or being identified as a lead variant, are equal. For this analysis, we only used variants that fall within a 50 kb window centered around the detected QTL peak. For each variant within this window, we simulated the observed QTL effect size on the background of noise, 500 times. Here, noise was estimated as the residual error of the within-cross QTL model for that trait and cross. Each of the simulations was generated by a different permutation of the assignment of the residual error to segregant. We then repeated our mapping procedure for the simulated data and calculated the fraction of simulations where the observed QTL peak from our trait mappings was the lead variant given the simulated causal variant. This posterior probability was estimated for each of the variants within the 50 kb window, and then normalized so that the sum of all the probabilities in the window is 1. This generated a variant-level probability of causality for each variant within the window for that trait and cross.

Next, we identified overlapping QTL. Overlapping QTL were defined as QTL from neighboring crosses that shared a parent, had 1.5 LOD drop confidence intervals that overlapped, and had QTL effect directions that were consistent between the neighboring crosses. For these overlapping QTL, we calculated the product of the causality probabilities (described above) for each variant shared between the two crosses (and segregating in both crosses) and then normalized these probabilities so that they sum to 1. To calculate the probability that a gene was causal, we summed these probabilities for all variants that fell within each gene. Here, a gene was defined as all variants that fell within the defined open-reading frame as well as variants that fell halfway between the start and stop of the adjacent open-reading frames. We calculated an FDR by sorting the observed posterior probabilities of causality per gene from highest to lowest, calculating a posterior error probability as one minus the posterior probability of causality, and calculating the cumulative mean of these probabilities (Käll et al., 2008; Storey, 2003; Storey and Tibshirani, 2003).
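
The two arithmetic steps in this paragraph, combining variant probabilities across two crosses and converting gene-level posteriors into an FDR, can be sketched in Python as follows. The function names `combine_across_crosses` and `cumulative_fdr`, the dictionary representation, and the example probabilities are hypothetical.

```python
def combine_across_crosses(probs_a, probs_b):
    """Multiply per-variant probabilities from two crosses and renormalize.

    probs_a, probs_b: dicts mapping variant id -> probability of causality.
    Only variants present in both crosses are kept.
    """
    shared = set(probs_a) & set(probs_b)
    products = {v: probs_a[v] * probs_b[v] for v in shared}
    total = sum(products.values())
    return {v: p / total for v, p in products.items()}

def cumulative_fdr(posteriors):
    """Return (posterior, FDR) pairs sorted from highest posterior down.

    The FDR at each rank is the running mean of the posterior error
    probabilities (1 - posterior) of all genes at that rank or above.
    """
    ranked = sorted(posteriors, reverse=True)
    out, running = [], 0.0
    for rank, p in enumerate(ranked, start=1):
        running += 1.0 - p
        out.append((p, running / rank))
    return out

# Hypothetical gene-level posterior probabilities of causality.
for posterior, fdr in cumulative_fdr([0.5, 0.99, 0.9]):
    print(posterior, round(fdr, 4))
```

A gene list at a chosen FDR is then the longest prefix of the ranked list whose running mean stays at or below that threshold.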

We note that the causal gene statistic is an estimate of the posterior probability that a gene is causal assuming that one causal variant in the defined window is responsible for generating a signal in two crosses that share a parent strain, that we estimate the effects of causal variants in both crosses without error, and that genotypes are called without error.

Gene ontology enrichment analyses

We tested for GO enrichments using the R package topGO (Alexa and Rahnenfuhrer, 2018), using the Fisher test for enrichment and the ‘classic’ scoring method that does not adjust the enrichments for significance of child GO terms.

References

  1. 1
  2. 2
    topGO: enrichment analysis for gene ontology
    1. A Alexa
    2. J Rahnenfuhrer
    (2018)
    topGO: enrichment analysis for gene ontology.
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
    Picard Tools
    1. Broad Institute
    (2019)
    Picard Tools.
  11. 11
  12. 12
    Empirical threshold values for quantitative trait mapping
    1. GA Churchill
    2. RW Doerge
    (1994)
    Genetics 138:963–971.
  13. 13
    The regress package
    1. D Clifford
    2. P McCullagh
    (2014)
    The regress package, https://cran.r-project.org/web/packages/regress/regress.pdf.
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
    Rare and low-frequency coding variants alter human adult height
    1. E Marouli
    2. M Graff
    3. C Medina-Gomez
    4. KS Lo
    5. AR Wood
    6. TR Kjaer
    7. RS Fine
    8. Y Lu
    9. C Schurmann
    10. HM Highland
    11. S Rüeger
    12. G Thorleifsson
    13. AE Justice
    14. D Lamparter
    15. KE Stirrups
    16. V Turcot
    17. KL Young
    18. TW Winkler
    19. T Esko
    20. T Karaderi
    21. AE Locke
    22. NGD Masca
    23. MCY Ng
    24. P Mudgal
    25. MA Rivas
    26. S Vedantam
    27. A Mahajan
    28. X Guo
    29. G Abecasis
    30. KK Aben
    31. LS Adair
    32. DS Alam
    33. E Albrecht
    34. KH Allin
    35. M Allison
    36. P Amouyel
    37. EV Appel
    38. D Arveiler
    39. FW Asselbergs
    40. PL Auer
    41. B Balkau
    42. B Banas
    43. LE Bang
    44. M Benn
    45. S Bergmann
    46. LF Bielak
    47. M Blüher
    48. H Boeing
    49. E Boerwinkle
    50. CA Böger
    51. LL Bonnycastle
    52. J Bork-Jensen
    53. ML Bots
    54. EP Bottinger
    55. DW Bowden
    56. I Brandslund
    57. G Breen
    58. MH Brilliant
    59. L Broer
    60. AA Burt
    61. AS Butterworth
    62. DJ Carey
    63. MJ Caulfield
    64. JC Chambers
    65. DI Chasman
    66. Y-DI Chen
    67. R Chowdhury
    68. C Christensen
    69. AY Chu
    70. M Cocca
    71. FS Collins
    72. JP Cook
    73. J Corley
    74. JC Galbany
    75. AJ Cox
    76. G Cuellar-Partida
    77. J Danesh
    78. G Davies
    79. PIW de Bakker
    80. GJ de Borst
    81. S de Denus
    82. MCH de Groot
    83. R de Mutsert
    84. IJ Deary
    85. G Dedoussis
    86. EW Demerath
    87. AI den Hollander
    88. JG Dennis
    89. E Di Angelantonio
    90. F Drenos
    91. M Du
    92. AM Dunning
    93. DF Easton
    94. T Ebeling
    95. TL Edwards
    96. PT Ellinor
    97. P Elliott
    98. E Evangelou
    99. A-E Farmaki
    100. JD Faul
    101. MF Feitosa
    102. S Feng
    103. E Ferrannini
    104. MM Ferrario
    105. J Ferrieres
    106. JC Florez
    107. I Ford
    108. M Fornage
    109. PW Franks
    110. R Frikke-Schmidt
    111. TE Galesloot
    112. W Gan
    113. I Gandin
    114. P Gasparini
    115. V Giedraitis
    116. A Giri
    117. G Girotto
    118. SD Gordon
    119. P Gordon-Larsen
    120. M Gorski
    121. N Grarup
    122. ML Grove
    123. V Gudnason
    124. S Gustafsson
    125. T Hansen
    126. KM Harris
    127. TB Harris
    128. AT Hattersley
    129. C Hayward
    130. L He
    131. IM Heid
    132. K Heikkilä
    133. Øyvind Helgeland
    134. J Hernesniemi
    135. AW Hewitt
    136. LJ Hocking
    137. M Hollensted
    138. OL Holmen
    139. GK Hovingh
    140. JMM Howson
    141. CB Hoyng
    142. PL Huang
    143. K Hveem
    144. MA Ikram
    145. E Ingelsson
    146. AU Jackson
    147. J-H Jansson
    148. GP Jarvik
    149. GB Jensen
    150. MA Jhun
    151. Y Jia
    152. X Jiang
    153. S Johansson
    154. ME Jørgensen
    155. T Jørgensen
    156. P Jousilahti
    157. JW Jukema
    158. B Kahali
    159. RS Kahn
    160. M Kähönen
    161. PR Kamstrup
    162. S Kanoni
    163. J Kaprio
    164. M Karaleftheri
    165. SLR Kardia
    166. F Karpe
    167. F Kee
    168. R Keeman
    169. LA Kiemeney
    170. H Kitajima
    171. KB Kluivers
    172. T Kocher
    173. P Komulainen
    174. J Kontto
    175. JS Kooner
    176. C Kooperberg
    177. P Kovacs
    178. J Kriebel
    179. H Kuivaniemi
    180. S Küry
    181. J Kuusisto
    182. M La Bianca
    183. M Laakso
    184. TA Lakka
    185. EM Lange
    186. LA Lange
    187. CD Langefeld
    188. C Langenberg
    189. EB Larson
    190. I-T Lee
    191. T Lehtimäki
    192. CE Lewis
    193. H Li
    194. J Li
    195. R Li-Gao
    196. H Lin
    197. L-A Lin
    198. X Lin
    199. L Lind
    200. J Lindström
    201. A Linneberg
    202. Y Liu
    203. Y Liu
    204. A Lophatananon
    205. Jian'an Luan
    206. SA Lubitz
    207. L-P Lyytikäinen
    208. DA Mackey
    209. PAF Madden
    210. AK Manning
    211. S Männistö
    212. G Marenne
    213. J Marten
    214. NG Martin
    215. AL Mazul
    216. K Meidtner
    217. A Metspalu
    218. P Mitchell
    219. KL Mohlke
    220. DO Mook-Kanamori
    221. A Morgan
    222. AD Morris
    223. AP Morris
    224. M Müller-Nurasyid
    225. PB Munroe
    226. MA Nalls
    227. M Nauck
    228. CP Nelson
    229. M Neville
    230. SF Nielsen
    231. K Nikus
    232. PR Njølstad
    233. BG Nordestgaard
    234. I Ntalla
    235. JR O'Connel
    236. H Oksa
    237. LMO Loohuis
    238. RA Ophoff
    239. KR Owen
    240. CJ Packard
    241. S Padmanabhan
    242. CNA Palmer
    243. G Pasterkamp
    244. AP Patel
    245. A Pattie
    246. O Pedersen
    247. PL Peissig
    248. GM Peloso
    249. CE Pennell
    250. M Perola
    251. JA Perry
    252. JRB Perry
    253. TN Person
    254. A Pirie
    255. O Polasek
    256. D Posthuma
    257. OT Raitakari
    258. A Rasheed
    259. R Rauramaa
    260. DF Reilly
    261. AP Reiner
    262. F Renström
    263. PM Ridker
    264. JD Rioux
    265. N Robertson
    266. A Robino
    267. O Rolandsson
    268. I Rudan
    269. KS Ruth
    270. D Saleheen
    271. V Salomaa
    272. NJ Samani
    273. K Sandow
    274. Y Sapkota
    275. N Sattar
    276. MK Schmidt
    277. PJ Schreiner
    278. MB Schulze
    279. RA Scott
    280. MP Segura-Lepe
    281. S Shah
    282. X Sim
    283. S Sivapalaratnam
    284. KS Small
    285. AV Smith
    286. JA Smith
    287. L Southam
    288. TD Spector
    289. EK Speliotes
    290. JM Starr
    291. V Steinthorsdottir
    292. HM Stringham
    293. M Stumvoll
    294. P Surendran
    295. LM ‘t Hart
    296. KE Tansey
    297. J-C Tardif
    298. KD Taylor
    299. A Teumer
    300. DJ Thompson
    301. U Thorsteinsdottir
    302. BH Thuesen
    303. A Tönjes
    304. G Tromp
    305. S Trompet
    306. E Tsafantakis
    307. J Tuomilehto
    308. A Tybjaerg-Hansen
    309. JP Tyrer
    310. R Uher
    311. AG Uitterlinden
    312. S Ulivi
    313. SW van der Laan
    314. AR Van Der Leij
    315. CM van Duijn
    316. NM van Schoor
    317. J van Setten
    318. A Varbo
    319. TV Varga
    320. R Varma
    321. DRV Edwards
    322. SH Vermeulen
    323. H Vestergaard
    324. V Vitart
    325. TF Vogt
    326. D Vozzi
    327. M Walker
    328. F Wang
    329. CA Wang
    330. S Wang
    331. Y Wang
    332. NJ Wareham
    333. HR Warren
    334. J Wessel
    335. SM Willems
    336. JG Wilson
    337. DR Witte
    338. MO Woods
    339. Y Wu
    340. H Yaghootkar
    341. J Yao
    342. P Yao
    343. LM Yerges-Armstrong
    344. R Young
    345. E Zeggini
    346. X Zhan
    347. W Zhang
    348. JH Zhao
    349. W Zhao
    350. W Zhao
    351. H Zheng
    352. W Zhou
    353. JI Rotter
    354. M Boehnke
    355. S Kathiresan
    356. MI McCarthy
    357. CJ Willer
    358. K Stefansson
    359. IB Borecki
    360. DJ Liu
    361. KE North
    362. NL Heard-Costa
    363. TH Pers
    364. CM Lindgren
    365. C Oxvig
    366. Z Kutalik
    367. F Rivadeneira
    368. RJF Loos
    369. TM Frayling
    370. JN Hirschhorn
    371. P Deloukas
    372. G Lettre
    (2017)
    Nature 542:186–190.
    https://doi.org/10.1038/nature21039
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
    Yeast as a Model for Ras Signalling
    1. R Tisi
    2. F Belotti
    3. E Martegani
    (2014)
    In: L Trabalzini, editors. Ras Signaling: Methods and Protocols, Methods in Molecular Biology. Totowa, NJ: Humana Press. pp. 359–390.
    https://doi.org/10.1007/978-1-62703-791-4_23
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66

Decision letter

  1. Christian R Landry
    Reviewing Editor; Université Laval, Canada
  2. Naama Barkai
    Senior Editor; Weizmann Institute of Science, Israel

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Acceptance summary:

The authors show that rare and recent variants contribute proportionally more than common ones to trait variation in budding yeast. They combine a powerful quantitative genetics approach with extensive trait phenotyping and population genomics data to illustrate how natural selection may be keeping mutations with large effects from reaching high frequency. The expected negative correlation between effect size and frequency has been studied theoretically, but much less so at the experimental level because of the complex type of data needed. The study has broad significance, from population genetics, to quantitative genetics and evolution of complex traits.

Decision letter after peer review:

Thank you for submitting your article "Rare variants contribute disproportionately to quantitative trait variation in yeast" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Naama Barkai as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Your manuscript addresses an important question in genetics and evolutionary biology: what are the relative contributions of genetic variants to phenotypic variation and do these contributions correlate with the frequency of these variants within a species? The reviewers agree that this is an important question that has rarely been comprehensively addressed. They therefore find the work of interest and the findings to be important for the field, in addition to appreciating the quality of the writing and of the presentation. However, they identify some aspects that would need to be reconsidered or better presented and interpreted (details below). In addition, your paper could be strengthened if it was extended to include a more detailed Introduction on why these questions are important for the non-specialists. You could also include more discussions on the implications of your findings, including the implications for biology and evolution in general and for yeast in particular. Since eLife is a generalist journal, the manuscript would appeal to a larger audience with these changes. Your manuscript is currently very short so this would be feasible by the addition of a few short sections in the current structure of the manuscript. You will see that one specific comment relates to the novelty of the work with respect to previous studies. This means that the novelty may not be obvious as presented. It would therefore also be important to emphasize this aspect in a revised version.

Essential revisions:

1) There are two analyses, within cross and joint analysis. I have to go back and forth between Results and Materials and methods to figure out exactly what is done. It would be nice to make clear when discussing the results from which one they are derived.

2) Because the segregants are haploid, there is only the A x A interaction. The majority of variance generated by A x A in fact goes into the additive variance, hence the non-additive variance is small. The authors did not make a big deal out of the fact that non-additive variance is only 1/6 of additive variance, but I feel it's important to stress that large additive variance is expected given the population design. In addition, when estimating the variance attributable to additive variance and epistatic variance, the authors broke the non-additive variance into three components, AQTL x A, AQTL x AQTL, and A x A. I wonder if this is necessary because there was no mention of the differences between these three components. A single component A(all) x A(all) could be fitted.

3) The two-component mixed model analysis has some caveats. There is correlation between rare and common variants, i.e., the variance components are not orthogonal. This makes any claim about the relative importance of rare versus common less reliable. For example, for the Cadmium chloride trait (Figure 3—figure supplement 1), the 7-component model seems to disagree with the 2-component model, with MAF < 0.01 explaining much less in the 7-component model than in the 2-component model in A. I think comparing the 2-component and one-component models could suffer from the same problem. Perhaps a more appropriate (but still not perfect) analysis is to fit a single-component model first and then fit two-component models, do it in two sequential orders (rare, then rare + common; versus common, then common + rare), and look at how the cumulative variance increases. This will tell you which MAF class is more important or can explain more variance. I think in this case, the one-component model is more informative than the 2-component model.

4) In Figure 3, the relationship between effects and MAF or DAF is a major result. Although many other papers have reported similar results, I think this paper (and the co-submitted paper from Fournier et al.) has the most appropriate design, i.e., the discovery panel is independent of where MAF is estimated. Given its central role in this paper, it probably deserves a bit more clarity. A few questions came to mind when reading this part of the paper. In what analysis is the effect size estimated, single crosses or joint? Could you briefly explain how the effect sizes are estimated in the Results section? If effects are estimated from the joint analysis, the t-statistic used a factor (n-2) to normalize the degrees of freedom, which is smaller for rare variants. This would lead to a Beavis effect. I believe the authors used a cross-validation strategy to estimate effects, but it's not very clear from reading the Materials and methods. Can you also plot 2pqa^2 versus MAF? Even if a is large, the variance contributed by rare variants could be small.

5) The main conclusion of the manuscript is that rare variants significantly contribute to genetic variance. In my view, this conclusion is biased as these rare causal variants are being analysed in genetic backgrounds in which they are no longer rare; actually, these variants are biallelic. Several studies have shown that a rare variant of MKT1(89A) is a significant contributor to phenotypic variation whenever it is present in segregating populations. However, the MKT1(89A) allele is hardly identified when one of the parents is not S288c, the strain which harbours this allele. So, the extension that if the rare variant has a significant effect in a sub-population, then its effect size would be similar in a large heterogeneous population is false. Furthermore, the authors conclude that in their larger 16 strain segregant populations, a representative distribution of the 1000-strain collection, most of the variants have additive effects. This the authors claim is revalidation of their other previous studies (Bloom et al., 2013, 2015), where they identified that most of the causal variants between BYxRM had additive effects. However, their subsequent paper (Forsberg et al., 2017, PMID 28250458) and another paper (Yadav et al., 2016, PMID 28172852) showed that variance mapping in BYxRM segregants helped to account for genetic interactions and showed how non-additive interactions also contribute significantly to phenotypic variation. Therefore, I find that just using a few more strains and a larger number of segregants per cross does not make this manuscript a significant advance over the previous studies. One can argue that taking into account all causal variants identified to date (Fay, 2013), one can identify what frequency of rare variants have been identified, e.g. a typical example being the MKT1(89A) allele as causal, even though their effect size will not be identified using this strategy. Peltier et al., 2019, show that 284 rare QTN variants have been identified to date, these functional variants being private to a subpopulation, possibly due to their adaptive role in a specific environment. Moreover, this conclusion can be made without these extensive experimental crosses.

https://doi.org/10.7554/eLife.49212.023

Author response

Essential revisions:

1) There are two analyses, within cross and joint analysis. I have to go back and forth between Results and Materials and methods to figure out exactly what is done. It would be nice to make clear when discussing the results from which one they are derived.

We apologize for any confusion resulting from our presentation of these analyses. We have clarified the text to emphasize that, except for the one paragraph comparing the two analyses (beginning with “We complemented the joint analysis with QTL mapping within each cross…”), the text focuses entirely on results from the joint analysis.

2) Because the segregants are haploid, there is only the A x A interaction. The majority of variance generated by A x A in fact goes into the additive variance, hence the non-additive variance is small. The authors did not make a big deal out of the fact that non-additive variance is only 1/6 of additive variance, but I feel it’s important to stress that large additive variance is expected given the population design. In addition, when estimating the additive variance and the epistatic variance, the authors broke the non-additive variance into three components, AQTL x A, AQTL x AQTL, and A x A. I wonder if this is necessary because there was no mention of the differences between these three components. A single component A(all) x A(all) could be fitted.

Our estimates of additive variance per trait and cross are not exceptional when compared with those obtained from approaches that have used pedigree or marker-based measures of relatedness for numerous traits in plants, livestock, other model organisms, and humans (e.g. Visscher et al., 2008, PMID 18319743; Yang et al., 2010, among many others). We note that our population of line cross progeny is actually expected to give a higher estimate of epistatic variance when compared to outbred populations: as Figure 2 of Mackay et al., 2014 (PMID 24296533) shows, estimates of epistatic variance are maximized as allele frequencies of the interacting loci approach 0.5 (as in our line crosses here). As the reviewer notes, another potential non-additive component, dominance variance, is not accessible in our experimental design, which uses haploids, but study designs that can estimate dominance variance have not detected a large contribution (e.g. Parts et al., 2016, PMID 27804950).

We are grateful to the reviewer for pointing out an omission in the Materials and methods section of our manuscript regarding an explanation of why we modeled the epistatic variance with three components. First, as the reviewer suggests, we have added results from a model with a single A(all) x A(all) component to Supplementary file 2. In Author response image 1 we show a visual comparison of the fraction of non-additive variance explained by the three component model (x-axis) and the one component model (y-axis) for each trait and cross (the diagonal line corresponds to identity between the two estimates). The estimates are very similar for most traits and crosses, but one can observe that the three component model occasionally gives a higher estimate. This happens because a key assumption of the one component model – that all pairs of loci contribute to trait variation with effect sizes drawn from a single normal distribution – is violated when one or a few QTL-QTL interactions with large effects are present, resulting in a downward bias. We previously showed (Bloom et al., 2015) that loci involved in such stronger interactions can be detected in additive scans. Therefore, by explicitly including additive QTLs in the three component model, we avoid making the assumption that the effect sizes of all locus pairs are drawn from the same normal distribution and obtain a better estimator of total two-way epistatic variance when large-effect QTL-QTL interactions are present. We have included this rationale in the revised manuscript, in the Materials and methods subsection “Within-cross analysis to estimate additive and pairwise genetic interaction variance”.

Author response image 1

3) The two-component mixed model analysis has some caveats. There is correlation between rare and common variants, i.e., the variance components are not orthogonal. This makes any claim about the relative importance of rare versus common variants less reliable. For example, for the cadmium chloride trait (Figure 3—figure supplement 1), the 7-component model seems to disagree with the 2-component model, with MAF < 0.01 explaining much less in the 7-component model than in the 2-component model in panel A. I think comparing the 2-component and one-component models could suffer from the same problem. Perhaps a more appropriate (but still not perfect) analysis is to fit a single-component model first and then fit two-component models, doing so in two sequential orders (rare, then rare + common; versus common, then common + rare), and look at how the cumulative variance increases. This will tell you which MAF class is more important or can explain more variance. I think in this case, the one-component model is more informative than the 2-component model.

We agree with the reviewer that genetic linkage creates a correlation between rare and common variants in genetic mapping studies. The variance component analysis performed here is based on approaches that are standard in the field and that have been extensively used in studies of human datasets that seek to address similar fundamental questions about the contribution of variants at different allele frequencies in a population (e.g. Yang et al., 2015; Gazal et al., 2018, PMID 30297966; Wainschtein et al., 2019). How the robustness of estimators obtained from these procedures is affected by the presence of genetic linkage, assumptions about the distributions of causal variant effect sizes, and the relationship between effect size and allele frequency is an active area of research (e.g. Speed et al., 2017, PMID 28530675). The reviewer is proposing a new forward stepwise variance component analysis, which to our knowledge has not been reported before in the literature, and which poses its own issues of implementation and interpretation that are beyond the scope of our paper. We agree that this is an interesting idea, and we hope that by making our dataset available, we can stimulate the development of this and other new methods.

With regard to the comparison between the estimates of the contribution of rare alleles from the two-component allele frequency model (light blue bar in Figure 3—figure supplement 1A) and the 7-component model (Figure 3—figure supplement 1B), one can see that for nearly all traits, the estimate of variance explained is very similar, with the exception of cadmium chloride pointed out by the reviewer. We note that cadmium chloride is exceptional among the traits, with nearly all the additive heritability explained by a single locus near the gene PCA1, and that the patterns of segregation in different crosses are consistent with allelic heterogeneity at this locus. Contributions of QTLs with large effects are often poorly modeled with whole-genome variance component approaches, and we believe that this accounts for the discrepancy noted by the reviewer.

We further note that the known limitations of variance component analyses were a primary motivation for our study, and that in subsequent sections we also analyzed our dataset using fixed effects models based on detected QTLs. Our study design is highly powered to detect QTL effects that jointly account for most of the heritable variance, enabling these analyses for the first time. As we show in Figure 3B, Figure 3C, Figure 3—figure supplement 2, and Figure 3—figure supplement 5, the fixed effects models lead to conclusions similar to those obtained from the variance component models.

4) In Figure 3, the relationship between effects and MAF or DAF is a major result. Although many other papers have reported similar results, I think this paper (and the co-submitted paper from Fournier et al.) has the most appropriate design, i.e., the discovery panel is independent of where MAF is estimated. Given its central role in this paper, it probably deserves a bit more clarity. A few questions came to mind when reading this part of the paper. In what analysis is the effect size estimated, single crosses or joint? Could you briefly explain how the effect sizes are estimated in the Results section? If effects are estimated from the joint analysis, the t-statistic used a factor (n-2) to normalize the degree of freedom, which is smaller for rare variants. This would lead to a Beavis effect. I believe the authors used a cross validation strategy to estimate effects, but it’s not very clear from reading the Materials and methods. Can you also plot 2pqa^2 versus MAF? Even if a is large, the variance contributed by rare variants could be small.

We appreciate the reviewer’s positive comments regarding our study design, which decouples allele frequencies in the population from allele frequencies in the mapping panel, thereby allowing us to obtain estimates of effect sizes of rare variants without the typical complications one encounters in GWAS designs regarding sample size and confounding. We welcome the opportunity to clarify the details here, in the revised main text, and in the Materials and methods. Briefly, QTL peak markers are detected in the joint analysis for each trait. Then, for each trait and cross, the phenotypes are scaled to have mean 0 and variance 1, and effect sizes within each cross are estimated using multiple regression for the peak markers that segregate within that cross. The betas in this analysis correspond to the differences in the means between the two QTL alleles. For peak markers that segregate in multiple crosses, the average betas over the different crosses are shown in Figure 3. This is now described in greater detail in the Materials and methods subsection “Effect size estimation for joint QTL mapping”.
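The estimation steps described in the preceding paragraph can be sketched roughly as follows. This is a minimal illustration on simulated data, not the authors' code: the function names `estimate_cross_betas` and `average_betas`, the 0/1 genotype coding, and the simulated effect sizes are all assumptions made for the example. With 0/1 coding, each regression coefficient is the difference in (standardized) trait means between the two alleles, conditional on the other peak markers.

```python
import numpy as np

def estimate_cross_betas(genotypes, phenotype):
    """Multiple regression of a standardized phenotype on peak markers.

    genotypes: (n_segregants, n_markers) array coded 0/1 (assumed coding).
    phenotype: (n_segregants,) raw trait values for one cross.
    Returns one beta per marker (intercept dropped).
    """
    # Scale the phenotype to mean 0 and variance 1 within the cross.
    y = (phenotype - phenotype.mean()) / phenotype.std(ddof=1)
    # Design matrix: intercept column plus all peak markers segregating here.
    X = np.column_stack([np.ones(len(y)), genotypes])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def average_betas(betas_by_cross):
    """Average each marker's beta over the crosses in which it segregates.

    betas_by_cross: list of dicts mapping a marker id to its beta in one cross.
    """
    collected = {}
    for betas in betas_by_cross:
        for marker, beta in betas.items():
            collected.setdefault(marker, []).append(beta)
    return {marker: float(np.mean(vals)) for marker, vals in collected.items()}

# Simulated example: one cross, three hypothetical peak markers.
rng = np.random.default_rng(0)
G = rng.integers(0, 2, size=(800, 3))
trait = G @ np.array([0.5, -0.3, 0.0]) + rng.normal(size=800)
cross_betas = estimate_cross_betas(G, trait)

# A marker shared by two crosses is averaged; a private marker is kept as-is.
summary = average_betas([{"m1": 0.2, "m2": 0.4}, {"m1": 0.4}])
```

Standardizing within each cross puts effects from different crosses on a common scale before they are averaged.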

The reviewer correctly points out that because we perform model selection and parameter estimation on the same data set, parameter estimates may be upwardly biased (this is known as the Beavis effect). We note that we carried out simulation analyses (Figure 3—figure supplement 2) which indicated that while some estimate inflation is present, it does not qualitatively alter the results in Figure 3. To further address the reviewer’s concern, we have now calculated unbiased estimates of effect sizes by training a model on 9/10 of the data and estimating parameters on the 1/10 of the data held out from the training procedure. The results are shown in a new supplementary figure (Figure 3—figure supplement 4) and are very similar to Figure 3, but estimates are noisier due to the smaller sample size available for unbiased estimation in this procedure. This is now described in the aforementioned Materials and methods subsection.
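The idea of separating model selection from parameter estimation can be illustrated with a short sketch. It is written under several assumptions that do not come from the manuscript: the marker-selection step is a simple correlation threshold standing in for the actual QTL mapping procedure, the split is repeated over ten folds, and the names `select_markers`, `fit_betas`, and `holdout_effects` are hypothetical.

```python
import numpy as np

def select_markers(G, y, threshold=0.1):
    """Stand-in for QTL detection (NOT the authors' mapping procedure):
    keep markers whose absolute correlation with the trait exceeds a threshold."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, np.inf, denom)  # monomorphic markers get r = 0
    r = (Gc * yc[:, None]).sum(axis=0) / denom
    return np.flatnonzero(np.abs(r) > threshold)

def fit_betas(G, y):
    """Multiple regression with an intercept; returns marker coefficients only."""
    X = np.column_stack([np.ones(len(y)), G])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def holdout_effects(G, y, n_folds=10, seed=0):
    """For each fold, select markers on the other folds (9/10 of the data)
    and estimate their effects only on the held-out fold (1/10)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    results = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        selected = select_markers(G[train_idx], y[train_idx])
        betas = fit_betas(G[test_idx][:, selected], y[test_idx])
        results.append(dict(zip(selected.tolist(), betas.tolist())))
    return results

# Simulated example: 20 markers, only marker 0 has a true effect (0.8).
rng = np.random.default_rng(1)
G = rng.integers(0, 2, size=(1000, 20))
true_beta = np.zeros(20)
true_beta[0] = 0.8
y = G @ true_beta + rng.normal(size=1000)
results = holdout_effects(G, y)
```

Because the held-out segregants play no part in choosing which markers enter the model, their effect estimates are not inflated by selection, at the cost of the extra noise that comes from estimating on only a tenth of the sample.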

We believe that it is important to proceed carefully when reporting and interpreting the relationships between allelic effect sizes, variance explained, and allele frequencies for individual QTL effects. We have modified our Introduction to give additional background as to the relevant issues. Figure 3—figure supplement 5 shows the cumulative fraction of variance explained in our mapping panel for each trait by the detected QTLs. In this calculation of variance explained, we used the allele frequency of the QTL peak marker in our mapping population. Were we to instead calculate variance explained in the larger panel of yeast isolates (Peter et al., 2018) using the allele frequencies of variants in that panel, but effect sizes estimated in our mapping population, the variance contributed by rare variants would necessarily be small because the study design severely undersamples variants that are rare and ultra-rare in this larger population (our mapping panel consists of individuals derived from only 16 of 1012 strains). We are concerned that presenting such results would be actively misleading and confusing. We believe that the results from the variance components analysis and those shown in Figure 3—figure supplement 5 would be recapitulated in the larger yeast population if we could detect and estimate the effects of all the variants present in that population, rather than the small fraction that segregates in our crosses.
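To make the dependence on allele frequency concrete, the following sketch computes the variance contributed by a single biallelic locus under textbook assumptions. The reviewer's expression 2pqa^2 is the additive variance for a diploid population with no dominance; for haploid segregants with a difference beta between allele means, the corresponding quantity is p(1 - p)beta^2. The function names and the numerical values below are illustrative assumptions, not results from the study.

```python
def variance_explained_haploid(p, beta):
    """Variance contributed by a biallelic locus in haploids: p * (1 - p) * beta**2,
    where beta is the difference in trait means between the two alleles."""
    return p * (1.0 - p) * beta ** 2

def variance_explained_diploid(p, a):
    """Additive variance in a random-mating diploid population with no dominance:
    2 * p * (1 - p) * a**2, where a is the per-allele effect."""
    return 2.0 * p * (1.0 - p) * a ** 2

# Same hypothetical effect size, two different allele frequencies.
beta = 0.3
in_cross = variance_explained_haploid(0.5, beta)    # frequency in a biparental cross
in_panel = variance_explained_haploid(0.005, beta)  # a rare allele in a large panel
```

The same effect size therefore maps to very different variance contributions depending on which population's allele frequency is plugged in, which is why the choice of frequency matters when interpreting such a plot.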

5) The main conclusion of the manuscript is that rare variants significantly contribute to genetic variance. In my view, this conclusion is biased as these rare causal variants are being analysed in genetic backgrounds in which they are no longer rare; actually, these variants are biallelic. Several studies have shown that a rare variant of MKT1(89A) is a significant contributor to phenotypic variation whenever it is present in segregating populations. However, the MKT1(89A) allele is hardly identified when one of the parents is not S288c, the strain which harbours this allele. So, the extension that if the rare variant has a significant effect in a sub-population, then its effect size would be similar in a large heterogeneous population is false. Furthermore, the authors conclude that in their larger 16 strain segregant populations, a representative distribution of the 1000-strain collection, most of the variants have additive effects. This the authors claim is revalidation of their other previous studies (Bloom et al., 2013, 2015), where they identified that most of the causal variants between BYxRM had additive effects. However, their subsequent paper (Forsberg et al., 2017, PMID 28250458) and another paper (Yadav et al., 2016, PMID 28172852) showed that variance mapping in BYxRM segregants helped to account for genetic interactions and showed how non-additive interactions also contribute significantly to phenotypic variation. Therefore, I find that just using a few more strains and a larger number of segregants per cross does not make this manuscript a significant advance over the previous studies. One can argue that taking into account all causal variants identified to date (Fay, 2013), one can identify what frequency of rare variants have been identified, e.g. a typical example being the MKT1(89A) allele as causal, even though their effect size will not be identified using this strategy. Peltier et al., 2019, show that 284 rare QTN variants have been identified to date, these functional variants being private to a subpopulation, possibly due to their adaptive role in a specific environment. Moreover, this conclusion can be made without these extensive experimental crosses.

The summary and the previous reviewer comment underscore the novel contributions of our paper, including the ability to address the empirical contribution of rare variants to trait variation in a more comprehensive manner than has previously been possible, and the decoupling of allele frequency from variant discovery. We have taken the opportunity, as requested in the summary, to significantly expand the Introduction to make these and other key points clearer to the non-specialist. We agree that theoretical approaches in evolutionary, population and quantitative trait genetics have been applied to predict the relative contributions of common and rare variants to trait variation under different sets of assumptions; indeed, we cite many relevant papers in our manuscript. We also agree that there is value in aggregating information from individual case reports in the literature – we cited Fay, 2013 in our manuscript, and we noted that the large set of candidate QTGs systematically identified in our study is enriched for QTGs reported in that study. The variants reported in Fay, 2013 represent a sparse sampling of variant effects in yeast, and were by necessity based on studies with small sample sizes, which biased this set of variants in favor of large effects. The review by Peltier et al., 2019 (published after the submission of our manuscript) is similarly based on genes and variants previously reported in the literature. We have added a citation to this paper. We believe that our systematic, comprehensive, empirical approach provides much more general insights into the relative contributions of variants of different frequencies.

https://doi.org/10.7554/eLife.49212.024

Article and author information

Author details

  1. Joshua S Bloom

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    3. Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
    4. Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    jbloom@mednet.ucla.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7241-1648
  2. James Boocock

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    3. Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
    4. Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Data curation, Software, Formal analysis, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Sebastian Treusch

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    3. Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
    4. Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
    Present address
    Intrexon, South San Francisco, United States
    Contribution
    Conceptualization, Resources
    Competing interests
    Sebastian Treusch is now affiliated with Intrexon, although all work for this study was carried out while ST was affiliated with UCLA. The author has no other competing interests to declare
  4. Meru J Sadhu

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    3. Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
    4. Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
    Present address
    National Human Genome Research Institute, National Institutes of Health, Bethesda, United States
    Contribution
    Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  5. Laura Day

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    3. Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Resources, Methodology
    Competing interests
    No competing interests declared
  6. Holly Oates-Barker

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    3. Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Resources, Methodology
    Competing interests
    No competing interests declared
  7. Leonid Kruglyak

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
    3. Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, United States
    4. Institute for Quantitative and Computational Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    LKruglyak@mednet.ucla.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8065-3057

Funding

National Institutes of Health (R01GM102308)

  • Joshua S Bloom
  • Meru J Sadhu
  • Laura Day
  • Holly Oates-Barker
  • Leonid Kruglyak

Howard Hughes Medical Institute

  • Joshua S Bloom
  • Laura Day
  • Holly Oates-Barker
  • Leonid Kruglyak

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Bogdan Pasaniuc, Frank W Albert, Olga T Schubert, Liangke Gou, Tzitziki Lemus Vergara, Matthieu Delcourt, Longhua Guo, and Eyal Ben-David for helpful manuscript feedback and edits. We thank Illumina for performing synthetic long-read sequencing of the parental yeast strains. This work was supported by funding from the Howard Hughes Medical Institute (to LK) and NIH grant R01GM102308 (to LK). The authors declare no competing financial interests.

Senior Editor

  1. Naama Barkai, Weizmann Institute of Science, Israel

Reviewing Editor

  1. Christian R Landry, Université Laval, Canada

Publication history

  1. Received: June 11, 2019
  2. Accepted: October 23, 2019
  3. Accepted Manuscript published: October 24, 2019 (version 1)
  4. Version of Record published: December 4, 2019 (version 2)

Copyright

© 2019, Bloom et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

