Although most of the 3000 million nucleotides in the human genome are the same in every person on the planet, about 90 million sites can vary between individuals. The genetic basis of phenotypic variation in humans lies in these 90 million variants and in their interactions with each other and with the environment. Identifying the genetic variants that are involved in a specific trait (such as height or disease status) is a long-standing goal in biology.
Today, researchers rely on genome-wide association studies (GWAS) to find the genetic variants that are relevant to a specific trait. In a GWAS, the genomes of many individuals are analyzed to see whether particular genetic variants are correlated with variation in a trait of interest. GWAS have identified hundreds of variants underlying phenotypic variation in humans, mice, fruit flies, rice, maize, and many other taxa. Yet, despite the large number of alleles identified using this technique, they explain just a fraction of the phenotypic variation that twin and pedigree studies predict to be heritable. For example, twin studies have shown that approximately 80% of the variation in human height can be explained by genetic factors (Silventoinen et al., 2012), but the best-powered GWAS explain only around 20% of this variation (Wood et al., 2014). This gap is known as the ‘missing heritability problem’.
Rare and low-frequency genetic variants (which have allele frequencies of <1% and 1–5%, respectively) have been proposed as one explanation for the missing heritability problem (reviewed in Gibson, 2012). Such variants are routinely excluded from GWAS because, when an allele is present in only a few individuals, the statistical tests used to detect correlations between traits and alleles are not powerful enough to yield significant results. As a consequence, around 90% of genetic variation in humans and in other organisms, such as yeast, has so far gone unexplored (Figure 1A; Auton et al., 2015; Peter et al., 2018). The missing heritability might be hiding in plain sight, but until now, studying the effect of rare alleles on traits influenced by more than one gene has been extremely challenging. Now, in eLife, two independent groups report the results of experiments on yeast which show that rare variants have a fundamental role in phenotypic variation at the population level.
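To make the power argument concrete, here is a minimal sketch based on a standard textbook approximation for a single-variant additive test; it is an assumption for illustration, not the method used by either study. A biallelic variant with allele frequency p and per-allele effect beta (in phenotypic standard deviations) explains a variance of about 2p(1-p)beta², and with N individuals the expected test statistic is roughly sqrt(N·2p(1-p)beta²). The genome-wide threshold alpha = 5e-8, the function name `gwas_power` and the example numbers are illustrative choices, not values from the studies discussed here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def gwas_power(n: int, p: float, beta: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided single-variant additive test.

    Assumes a phenotype scaled to unit variance and a small per-allele
    effect `beta`, so the variance explained is 2p(1-p)beta^2 and the
    expected z-statistic is about sqrt(n * 2p(1-p) * beta^2).
    """
    variance_explained = 2 * p * (1 - p) * beta ** 2
    mean_z = (n * variance_explained) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that |Z| exceeds the threshold when Z ~ N(mean_z, 1)
    return STD_NORMAL.cdf(mean_z - z_crit) + STD_NORMAL.cdf(-mean_z - z_crit)


# Same sample size and effect size, different allele frequencies
common = gwas_power(n=10_000, p=0.30, beta=0.1)
rare = gwas_power(n=10_000, p=0.005, beta=0.1)
print(f"common variant: {common:.2f}, rare variant: {rare:.6f}")
```

With these illustrative numbers, the common variant is detected most of the time, while a variant at 0.5% frequency with the same effect is essentially never detected. The only difference is the 2p(1-p) term, which is why low-frequency alleles are usually filtered out of standard analyses.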
In a monumental effort, the two groups independently selected sets of wild and domesticated yeast isolates from around the world and crossed them to generate genetically diverse panels of thousands of strains (Figure 1B). They then grew these strains in more than 35 different media conditions and quantified their growth by measuring colony size. Because of the crossing scheme, genetic variants that had been present in just one or a few yeast isolates were now present in hundreds of samples in the experimental panels (Figure 1B). This allowed the groups to include a large number of rare variants (up to 28% of the total) in the GWAS: many of these variants would have been excluded from traditional GWAS because of their low allele frequency.
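The effect of a crossing scheme on the number of carriers can be illustrated with an idealized full diallel, in which each of n parental isolates is crossed with every other isolate in both directions and with itself, giving n² hybrids. This design is an assumption for illustration, and the real panels in the two studies differ in detail. The parent count of 55 is likewise hypothetical, chosen only because 55 × 55 = 3025, the panel size reported by Fournier et al. The function `diallel_carriers` is a hypothetical helper.

```python
def diallel_carriers(n_parents: int, carrier_parents: set[int]) -> int:
    """Count hybrids in an idealized n x n diallel (all ordered pairs,
    including selfs) that inherit at least one copy of a variant
    carried by the given parents."""
    return sum(
        1
        for mother in range(n_parents)
        for father in range(n_parents)
        if mother in carrier_parents or father in carrier_parents
    )


n = 55  # hypothetical number of parental isolates (55 * 55 = 3025 hybrids)
private = diallel_carriers(n, {0})  # variant present in a single parent
print(private, n * n)  # prints "109 3025": 2n - 1 carriers out of n^2 hybrids
```

A parent appears in its own row and column of the n × n grid, so a variant private to that parent is carried by 2n - 1 hybrids. The proportion of carriers stays modest, but their absolute number rises from one isolate to over a hundred samples, which is what gives the association test enough signal to work with.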
Both groups independently identified thousands of genetic variants associated with growth and estimated that over half of the variance in growth can be attributed to additive effects. To determine how variants with different frequencies contributed to phenotypic effects, the variants were classified either as rare (<1%) or common (>1%) (Bloom et al., 2019), or as rare (<1%), low frequency (1–5%) or common (>5%) (Fournier et al., 2019). This classification was based on 1011 yeast genomes that represent global yeast diversity (Figure 1A; Peter et al., 2018). Strikingly, in both studies rare variants contributed disproportionately to phenotypic variation.
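The two binning schemes described above can be written down directly. In this sketch, the thresholds are the ones given in the text. How a frequency lying exactly on a boundary is assigned is an assumption, because the text only gives the inequalities. The function names are hypothetical.

```python
def classify_two_bins(freq: float) -> str:
    """Rare/common split as described for Bloom et al. (2019)."""
    return "rare" if freq < 0.01 else "common"


def classify_three_bins(freq: float) -> str:
    """Rare/low-frequency/common split as described for Fournier et al. (2019).

    Assumption: a frequency of exactly 5% is counted as low frequency.
    """
    if freq < 0.01:
        return "rare"
    if freq <= 0.05:
        return "low frequency"
    return "common"


# Frequencies would be measured in a reference panel such as the 1011 yeast genomes
for f in (0.002, 0.03, 0.20):
    print(f, classify_two_bins(f), classify_three_bins(f))
```

The same variant can fall into different classes under the two schemes. A variant at 3%, for example, is "common" in the two-bin split but "low frequency" in the three-bin split, so the rare fractions reported by the two studies are not directly interchangeable.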
In one study, Joseph Schacherer and co-workers at the University of Strasbourg, including Téo Fournier as first author, found that 16% of the variants detected by GWAS were rare, even though rare alleles made up just 4% of all the variants tested (Fournier et al., 2019). In the other study, Joshua Bloom, Leonid Kruglyak and colleagues at UCLA estimated that over half of the observed growth variation can be explained by rare variants, even though these represented only 28% of the variants used (Bloom et al., 2019). The UCLA team also found that the rare variants detected by GWAS tend to have larger effect sizes than common variants, tend to reduce growth, and tend to have arisen more recently in evolutionary time.
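The disproportion reported in both studies can be expressed as a simple enrichment ratio: the share of an outcome attributable to rare variants divided by the share of all variants that are rare. A ratio above 1 means that rare variants contribute more than their numbers alone would predict. The inputs below are the percentages quoted in the text. For Bloom et al., the "over half" figure is entered as 0.5, so that result is only a lower bound. The function name `enrichment` is hypothetical.

```python
def enrichment(share_of_outcome: float, share_of_variants: float) -> float:
    """Ratio of a variant class's share of an outcome to its share of all variants."""
    return share_of_outcome / share_of_variants


# Fournier et al. (2019): 16% of GWAS hits were rare; rare variants were 4% of those tested
fournier = enrichment(0.16, 0.04)
# Bloom et al. (2019): over half of growth variance from rare variants, which were 28% of variants
bloom_lower_bound = enrichment(0.50, 0.28)
print(f"{fournier:.1f}x, at least {bloom_lower_bound:.2f}x")
```

This prints "4.0x, at least 1.79x". In the Strasbourg data, rare variants are about four times more common among GWAS hits than among the variants tested. In the UCLA data, they explain at least 1.8 times more variance than their share of variants would suggest.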
These results add to recent efforts exploring the effect of rare variants on complex traits. For human height, rare variants have been shown to have effect sizes ten times larger than those of common variants (Marouli et al., 2017), and together they account for most of the missing heritability of this trait (Wainschtein et al., 2019). In parallel, it has been estimated that at least a quarter of the heritability of gene expression in humans is accounted for by rare variants (Hernandez et al., 2019). The growing evidence that rare variants contribute substantially to complex traits in both humans and yeast suggests that the same may be true in other species. However, addressing this question in organisms that have larger genomes and are not amenable to crossing schemes remains challenging. But rest assured, researchers will find a way.
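A rare variant with a large effect can matter as much as a common one because of how much variance a single biallelic additive variant contributes: about 2p(1-p)beta², where p is the allele frequency and beta is the per-allele effect. This formula assumes Hardy-Weinberg proportions, which is an assumption of the sketch. The numbers below are purely illustrative and are not estimates from any of the cited studies. They compare a 1% variant whose effect is ten times larger, echoing the ratio reported for height, with a 30% variant.

```python
def variance_explained(p: float, beta: float) -> float:
    """Additive variance from one biallelic variant under Hardy-Weinberg proportions."""
    return 2 * p * (1 - p) * beta ** 2


common = variance_explained(p=0.30, beta=0.05)
rare = variance_explained(p=0.01, beta=0.50)  # tenfold larger per-allele effect
print(f"rare / common = {rare / common:.1f}")
```

Because the effect enters squared, a tenfold larger effect outweighs the roughly twentyfold smaller 2p(1-p) term. In this toy comparison, the single rare variant explains about 4.7 times more variance than the common one.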
Genome-wide association studies (GWAS) allow complex traits to be dissected and genetic variants to be mapped, but these variants often explain relatively little of the heritability. One potential reason is the preponderance of undetected low-frequency variants. To increase their allele frequency and assess their phenotypic impact in a population, we generated a diallel panel of 3025 yeast hybrids, derived from pairwise crosses between natural isolates, and examined a large number of traits. Parental versus hybrid regression analysis showed that, while most of the phenotypic variance is explained by additivity, a third is governed by non-additive effects, with complete dominance having a key role. By performing GWAS on the diallel panel, we found that associated variants with low frequency in the initial population are overrepresented, explain a fraction of the phenotypic variance, and have an effect size similar to that of common variants. Overall, we highlight the relevance of low-frequency variants to phenotypic variation.
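A parental versus hybrid regression like the one described in the abstract can be sketched by regressing each hybrid's phenotype on its mid-parent value, the mean of its two parents. The squared correlation is then the share of hybrid variance that is consistent with pure additivity. This is a simplified illustration, not the authors' exact model. The parental values are hypothetical. The hybrids are constructed to show complete dominance, with each hybrid matching its better parent, to demonstrate how non-additivity lowers that share.

```python
from itertools import combinations

# Hypothetical parental growth values
parents = [1.0, 2.0, 3.0, 4.0]

mid_parent, hybrid = [], []
for a, b in combinations(parents, 2):
    mid_parent.append((a + b) / 2)
    hybrid.append(max(a, b))  # complete dominance of the better allele


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


print(f"additive share: {r_squared(mid_parent, hybrid):.2f}")  # prints "additive share: 0.75"
```

In this toy example, the mid-parent values capture 75% of the hybrid variance, and the remaining quarter comes from dominance. If every hybrid were set exactly to its mid-parent value, the same function would return 1.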
The exceptionally rich fossil record available for the equid family has provided textbook examples of macroevolutionary changes. Horses, asses, and zebras represent three extant subgenera of the Equus lineage, while the Sussemionus subgenus is another remarkable Equus lineage, which ranged from North America to Ethiopia during the Pleistocene. We sequenced 26 archaeological specimens from Holocene sites in Northern China that could be assigned morphologically and genetically to Equus ovodovi, a species representative of Sussemionus. We present the first high-quality complete genome of the Sussemionus lineage, sequenced to 13.4× depth of coverage. Radiocarbon dating demonstrates that this lineage survived until ~3500 years ago, despite continued demographic collapse during the Last Glacial Maximum and the great human expansion in East Asia. We also confirmed the Equus phylogenetic tree and found that Sussemionus diverged from the ancestor of non-caballine equids ~2.3–2.7 million years ago and possibly remained affected by secondary gene flow after divergence. We found that low genetic diversity, rather than enhanced inbreeding, limited the species’ chances of survival. Our work adds to the growing literature illustrating how ancient DNA can inform on extinction dynamics and the long-term resilience of species surviving in cryptic population pockets.