Introduction

Sexual reproduction typically involves mating between individuals that belong to separate groups of reciprocal reproductive compatibility. This partition among conspecific individuals is obvious in species where the male and female reproductive functions are performed by distinct categories of individuals, but can also be present in purely hermaphroditic species, where compatibility among individuals is governed by the segregation of various numbers of mating types in the total absence of sexual specialisation. The existence of sexes or mating-types leads to one the strongest forms of long-term balancing selection, and is often associated with clusters of polymorphisms around sex/mating-type controlling regions kept together by structural rearrangements. In some cases, such rearrangements can span almost entire chromosomes (e.g. sex chromosomes in mammals 1 or mating-type chromosomes in ascomycete fungi 2), while in others they remain limited to relatively small genomic regions (e.g. chromosomal inversions controlling male reproductive morphs in the ruff3, mating-type loci in some basidiomycete fungi, segregating indels controlling pin vs. thrum floral morphs in Primula 4). The long-term balancing selection acting on these systems is expected to lead to the accumulation of deleterious mutations in the tightly linked chromosomal segments that are “sheltered” from purifying selection by the presence of the balanced polymorphism5,6. These deleterious mutations can have drastic short- and long-term consequences for the evolution of the species, and determining the processes by which they accumulate is crucial to understand how the rearranged regions can either expand along the chromosomes or conversely remain restricted to limited genomic tracts7,8.

Self-incompatibility (SI) is a genetic mechanism allowing recognition and rejection of self-pollen by hermaphrodite individuals, thereby preventing inbreeding and promoting outcrossing in hermaphroditic plant species 9. In the Brassicaceae family, SI is controlled by a single non-recombining chromosomal region, the S-locus10,11. SI is one of the most prominent examples of long-term balancing selection 12, and as such deleterious mutations are expected to accumulate in very close genetic linkage to the S-alleles because of the indirect effects of linked selection 6. Population genetics models predict that deleterious variants should accumulate within specific S-allele lineages12,13, and should then be reshuffled among them by recombination. However, due to the technical difficulty of phasing polymorphisms, this process has rarely been characterised in detail14.

A key feature of sporophytic SI systems, also shared by sex chromosomes, is the existence of dominance interactions between S-alleles. While most individuals are heterozygous at the S-locus and thus carry two different S-alleles, only one of them is generally expressed at the phenotypic level, especially for the pollen specificity, following a complex genetic dominance hierarchy 15,16. Evolutionary properties of S-alleles vary along the dominance hierarchy because balancing selection is predicted to act more strongly on dominant than on recessive S-alleles, as the latter are often masked at the phenotypic level 17. As a result, Llaurens et al. 13 and Goubet et al. 18 suggested that the dynamics of accumulation of deleterious variation may differ in close linkage with dominant vs recessive S-alleles, similarly to Y vs. X chromosomes, respectively. Specifically, recessive S-alleles can form homozygous combinations in natural populations more often than dominant S-alleles17, such that recombination can occur occasionally between distinct gene copies of the same recessive S-allele, providing the opportunity for the linked recessive deleterious mutations to be purged from within the S-locus itself. In addition, because recessive S-alleles reach higher population frequencies15,17, purifying selection on linked deleterious variants is expected to have higher efficacy among gene copies of recessive than dominant S-alleles. This is expected to result in a higher fixation probability of deleterious variants linked to the class of dominant S-alleles than to the class of recessive S-alleles13. Empirical support for this simple prediction has been conflicting, though. Based on phenotypic measurements in A. halleri, Llaurens et al. 13 observed a decrease of fitness associated by enforced homozygosity for one of the most dominant S-alleles (Ah15) but not for the most recessive S-allele of the allelic series (Ah01). In contrast, Stift et al. 19 observed no effect of dominance on the genetic load linked to three dominant vs . recessive S-alleles in a natural population of the closely related A. lyrata. Hence, the data available so far are inconclusive, but are restricted to very small numbers of S-alleles. They are also based on inherently limited phenotypic measurements, seriously limiting the power of the comparisons, and preventing proper generalisation of the effect of dominance on the accumulation of linked deleterious variation.

In this study, we combined phenotypic, genomic and theoretical approaches to finely dissect the patterns of accumulation of deleterious variation linked to the S-locus supergene in A. halleri and A. lyrata. We first extended the phenotypic approach of Llaurens et al. 13 to a series of additional S-alleles from the same local A. halleri population to evaluate the effect of S-allele dominance on the sheltered load. We then used parent-offspring trios and targeted genome re-sequencing to directly quantify the accumulation of putative deleterious mutations linked to phased dominant vs. recessive S-alleles in two A. halleri and three A. lyrata natural populations. Finally, we used stochastic simulations to refine the theoretical predictions about the effect of S-allele dominance on the dynamics of linked deleterious mutations. Overall, our results provide a more nuanced view of the effect of the intensity of balancing selection on the sheltered load, in which the structure of the sheltered load rather than its magnitude differs among S-alleles from different dominance classes.

Results

The genetic load linked to the S-locus varies among S-alleles, but is not correlated with dominance

We first expanded the experimental approach of Llaurens et al. 13 to phenotypically evaluate the effect of S-allele dominance on the intensity of the sheltered load. The previous study focused on three S-alleles (Ah01, Ah02 and Ah15) 13. Here we included two S-alleles from the same local population (Nivelle, France): Ah03 and Ah04, and included Ah01 again for comparative purposes. In the Arabidopsis genus, S-alleles have been shown to form a complex dominance hierarchy 15,16. This hierarchy is largely associated with the phylogeny of S-alleles15, and at least four phylogenetic classes (I, II, III and IV) have been described, from the most recessive (class I) to the most dominant of S-alleles (class IV). Dominance interactions also exist among S-alleles within classes, such that these five S-alleles form the following dominance hierarchy 15,16: Ah01<Ah03<Ah02<Ah04<Ah15, from the most recessive (Ah01) to the most dominant (Ah15). To reveal the linked load, we enforced homozygosity at the S-locus using controlled crosses between parental individuals sharing a given S-allele that was masked by different dominant S-alleles (e.g., to obtain Ah xAhx homozygotes we deposited pollen from a Ah xAhy plant, where Ahy>Ahx, on pistils of a Ah xAhz plant where z≠y, or on a AhxAhx pistil when available; table S1). We obtained 399 offspring from a total of six such crosses. Note that our experimental procedure differs slightly from that of Llaurens et al. 13 in that their procedure required a CO 2 treatment to bypass the SI system and obtain selfed offspring, while here we took advantage of the dominance interactions to obtain outcrossed S-locus homozygous individuals that we phenotypically compared to their full-sibs with S-locus heterozygous genotypes. Note also that the S-locus homozygous offspring we obtained contain distinct gene copies of a given S-allele lineage. Hence, they could in principle carry distinct suites of linked deleterious mutations in case these mutations segregate within S-allele lineages.

We first tested whether homozygosity at the S-locus affected survival by measuring for each cross the proportion of homozygotes at the S-locus reaching the reproductive stage for three S-alleles (in two replicate families per S-allele, Table S1). The proportion of Ah01/Ah01 and Ah04/Ah04 homozygotes surviving to the reproductive stage was consistent with mendelian expectations in their respective families. However, we observed a significant decrease of Ah03/Ah03 homozygotes at the reproductive stage compared with Mendelian expectations (Table 1), whereas the observed proportion of the Ah03 S-allele among heterozygous individuals did not depart from expectations (2/3=0.67; Table 1). Thus, the increased mortality is associated with Ah03 homozygosity, rather than with a lower performance of individuals carrying the Ah03 S-allele itself. Overall, a genetic load was thus observed linked to the Ah03 S-alleles, which is at an intermediate level of dominance, but neither to the most dominant (Ah04) nor to the most recessive (Ah01) S-allele. Hence, these observations do not support a positive correlation between S-allele dominance and the magnitude of the sheltered load.

Proportion of S-locus homozygous offspring having reached the reproductive stage for three different S-alleles.

The test is performed relative to the expected proportion of homozygous genotypes in the offspring (25% when both parents are heterozygous; 50% when one of the parents is homozygous and the other heterozygous).

Next we measured thirteen vegetative and reproductive traits in the resulting families and compared offspring that were homozygous for their S-alleles with their full sibs that were heterozygous (Fig. 1). We first used permutations to test whether the mean trait value of homozygotes differed from that in heterozygotes. Overall, with a single exception, we found no effect of homozygosity at the S-locus on variation of the traits measured (Fig. 1; Table S2). The maximum length of flowering stems was the exception to this general pattern, with longer reproductive stems for S-locus homozygous than heterozygous genotypes, hence in the opposite direction from our expectation of lower fitness in homozygotes. For this trait, there was significant variation among replicate families for homozygotes of the recessive allele Ah01 but not of the dominant allele Ah04 (Table S3). We then used generalised linear models (GLM) to evaluate the effect of dominance (as a fixed effect) on the mean phenotypic value of homozygotes compared to heterozygotes for each trait (Table S4; treating family of origin, attacks by phytopathogens, phytophagous and oxidative stress as random effects whenever necessary). We also observed no effect of S-allele dominance on the contrast between S-locus homozygotes and heterozygotes for any of these traits. A single of the thirteen traits was an exception to this general pattern, but again the effect was in the opposite direction from our expectation, with an earlier rather than delayed appearance of the first leaf for homozygotes of more dominant S-alleles; Table S4). Overall, our phenotypic results confirmed the presence of a detectable linked load on some phenotypic traits (survival; time to produce the first leaf), but we could not replicate the observation of Llaurens et al. 13 that dominant S-alleles carry a more severe deleterious load than recessive S-alleles, even though our samples were obtained from the same local population.

Effect of homozygosity at the S-locus on 13 phenotypic traits compared to heterozygotes.

For each trait, the phenotypic values in homozygotes (in grey) were normalised relative to the mean phenotypic values in heterozygotes (in black). The distributions were obtained by 10,000 random permutations. SD: Standard Deviation.

S-alleles are associated with specific sets of tightly linked mutations

The model of the sheltered load assumes that distinct S-allele lineages carry specific sets of linked deleterious mutations, but to our knowledge this prediction was never tested directly. We combined a parent-offspring trio approach with sequencing of the S-locus flanking regions to phase the mutations segregating in the S-locus flanking regions with their respective S-alleles. Briefly, we used a previously developed sequence capture protocol specifically targeting the nucleotide sequences over 75 kb on each side of the S-locus along with a series of 100 control regions from throughout the genome20, and we analysed nucleotide sequence polymorphism (including only invariant and biallelic SNPs), based on the A. lyrata reference genome 21. We define a haplotype as a unique combination of mutations along the phased chromosome, and a S-allele lineage as the collection of gene copies of a given functional S-allele (different functional S-alleles are distinguished based on their strong sequence divergence at the S-locus pollen and pistil genes). Different gene copies within an S-allele lineage can thus be associated with distinct linked haplotypes in the flanking regions. The S-alleles were identified based on short reads sequences according to a previously published method 41. We analysed two closely related A. halleri populations from Europe (Nivelle and Mortagne) and three allogamous A. lyrata populations from North America (IND, PIN and TSS 22). Overall, we were able to reconstruct 34 haplotypes linked to a total of 12 distinct S-allele lineages in Nivelle, 38 haplotypes linked to 11 distinct S-allele lineages in Mortagne and 16, 22 and 16 haplotypes associated with 6, 7 and 5 distinct S-allele lineages in populations IND, PIN and TSS, respectively (Table S5). Nine of the S-alleles were shared between the two A. halleri populations (Ah01, Ah03, Ah04, Ah05, Ah12, Ah20, Ah24, Ah25 and Ah59). In the populations of A. lyrata, four S-alleles were shared between PIN and TSS (Ah01*, Ah03*, Ah18* and Ah63*), five S-alleles were shared between PIN and IND (Ah01*, Ah03*, Ah46* and Ah63*), four S-alleles were shared between IND and TSS (Ah01*, Ah03*, Ah31* and Ah63*), and three were shared across all three (Ah01*, Ah03* and Ah63*). Note that for convenience, we used A. halleri notations (with the addition of a *) to refer to the trans-specifically shared A. lyrata S-alleles. Altogether, we were able to obtain the phased flanking sequences of 126 S-locus haplotypes, comprising a total of 4,854 variable sites. This provides considerable power to evaluate the local accumulation of linked mutations across S-alleles of different levels of dominance and to examine their patterns of conservation between populations and between species.

Mutations in the S-locus flanking regions can be exchanged between S-alleles by recombination, and between local populations by migration 23. The relative time scale of these two processes (recombination vs. migration) determines the distribution of the linked mutations. To capture the chromosomal extent of this effect of linkage to S-alleles, we developed a new phylogenetic method comparing the likelihood of two contrasted topologies of interest in overlapping windows along the chromosome: (1) the topology clustering haplotypes by the populations where they came from vs. (2) the topology clustering them by the S-allele to which they are linked (Fig 2A). This allowed us to evaluate the progressive shift from a predominant topology by S-alleles close to the S-locus to a topology by populations further along the chromosome and in unlinked control regions (Fig. 2B). The difference in log likelihood between the two topologies decreased significantly with distance to the S-locus (Pearson coefficient = -0.015 and -0.010 for A. halleri and A. lyrata respectively; p-values <2e-16). In A. halleri, the topology grouping haplotypes by populations became more likely than the topology grouping them by S-alleles at a distance of around 30kb from the S-locus, but even at a distance of 50kb the phylogenetic structure was still different from that in regions unlinked to the S-locus used as controls for the genomic background 20 (Fig 2B). In A. lyrata, the shift was even more rapid (within 10-15kb), although we note that the phylogenetic structure of the control regions was less resolved (Fig 2B). To evaluate these patterns more directly we first examined the data using a Major Component Analysis (MCA, a modified version of PCA adapted to binary data, Fig S1 and S2) and using simple phylogenetic reconstructions (Fig S3, S4, S5 and S6). We confirmed that haplotypes linked to a given S-allele tended to cluster together in the most tightly linked region, and that this grouping by S-alleles was progressively lost in favour of a grouping by population of origin in the most distant regions. Following Kamau et al. 24, we compared the fixation index FST among local populations and among S-alleles in A. lyrata and A. halleri. In both A. halleri and A. lyrata, FST values among S-alleles were high in regions close to the S-locus and quickly decreased to reach the background level (Fig. S7) as the distance from the S-locus increased. In parallel, the differentiation among populations followed roughly the opposite pattern, i.e. it was initially low in regions close to the S-locus (as expected under strong balancing selection) and increased up to background level within the first few kilobases (Fig. S7). In line with our phylogenetic analysis, differentiation between populations started to exceed differentiation between S-alleles much closer to the S-locus in the A. lyrata than in the A. halleri populations (Fig S3, S4, S5 and S6). Finally, we explored the fine-scale patterns of association within populations between individual S-alleles and SNP in the linked and the control regions (Fig S8). As expected, the vast majority of significant associations were found for the most closely linked SNPs. With a single exception, all S-alleles were associated with unique SNPs in the 50kb region around the S-locus, albeit with substantial heterogeneity among S-alleles in the patterns and extent of associations that they show (Fig S8). Overall, our results indicate that due to limited recombination, the S-alleles carry a specific set of polymorphic sites in the linked region. This association fades away for more distant sites over a few kilobases, where population structure becomes predominant, as in the rest of the genome. Hence, different S-alleles are associated with specific sets of tightly linked mutations, but only within 10-30kb.

Linkage to the S-locus locally distorts the phylogenetic relationships.

A. The two topologies of interest cluster haplotypes either by the S-allele to which they are linked (top) or by the populations where they came (bottom). Different S-alleles are represented by symbols of different colours, different populations of origin are represented by symbols of different shapes. B. Difference in log likelihood of the two topologies of interest. Dots correspond to the difference in log likelihood for overlapping series of 50 SNPs around the S-locus for A. halleri (top panel) and A. lyrata (bottom panel). Positive values correspond to chromosomal positions where the topology by S-alleles explains the phylogeny of haplotypes better than the topology by populations. The right panels show the difference in log likelihood in the control regions. 2.5 and 97.5 percentiles of the distribution in the control regions are indicated by dashed lines.

No overall evidence that dominant S-alleles accumulate more linked deleterious mutations

Llaurens et al. 13 predicted a positive correlation between the dominance of S-alleles and their tendency to fix linked deleterious mutations. Thus, we investigated the correlation between the level of dominance of the S-alleles and their total number of 0-fold degenerate mutations (S 0f) or the ratio of 0-fold to 4-fold mutations (S0f/S4f) for the phased haplotypes, assuming that the vast majority of 0-fold degenerate mutations are deleterious. Based on the results presented above and the results of our previous study 20, for the rest of our analyses we focused on the phased haplotypes over the first 25 kb on either side of the S-locus. We found no overall effect of dominance on S 0f (p-values= 0.54 and 0.07 for A. halleri and A. lyrata respectively; Fig. 3; Table S6) or S 0f/S4f (p-values= 0.54 and 0.07 for A. halleri and A. lyrata respectively; Table S6). Extending the analysis to all non-synonymous mutations led to identical conclusions (Table S6). Overall, our genomic results did not confirm the prediction that dominant S-alleles accumulate a larger number of putatively deleterious mutations in their linked regions. We note that the particular S-allele whose sheltered load was quantified in Llaurens et al. 13 (Ah15, red arrow on Fig 3A) appears to be one of the S-alleles associated with the highest number of 0-fold degenerate mutations among all S-alleles of the most dominant class (class IV).

No overall effect of S-allele dominance on the total number of 0-fold degenerate mutations (S0f) in the linked genomic regions within 25kb.

Each dot represents the mean number of mutations observed among haplotypes linked to one S-allele in one population. The correlations evaluated by a GLM are represented by lines, with confidence intervals represented in grey. A: A. halleri. Black dots correspond to the Nivelle population, grey dots to the Mortagne population. The red arrow points to the copy of Ah15, corresponding to the S-allele whose sheltered load was phenotypically characterised by Llaurens et al. 13 in Nivelle. B: A. lyrata. Red dots correspond to the IND population, black dots to the PIN population and blue dots to the TSS population.

The structure of the linked genetic load differs between dominant and recessive S-alleles

Theory predicts that dominant S-alleles should fix linked recessive deleterious mutations with a higher probability than recessive S-alleles13, but in natural populations we observed no difference in the total number of putatively deleterious linked to dominant vs. recessive S-alleles. To clarify this discrepancy, we took advantage of our sequencing of multiple copies of S-alleles to consider separately the fixed and the segregating mutations linked to each of the S-allele lineages. For each population, we included only mutations that were segregating, and excluded those that were locally fixed. In agreement with the prediction of Llaurens et al.13, we observed that lineages of dominant S-alleles do indeed tend to fix deleterious mutations more readily (Fig. 4). The fact that they do not accumulate a larger total number of deleterious mutations is explained by the fact that the structure of the genetic load differs between dominant and recessive S-alleles: the dominant S-alleles tend to have more fixed deleterious mutations, but the recessive S-alleles compensate by accumulating a larger number of segregating mutations, resulting in similar numbers of deleterious mutations overall in most of the populations.

The number of 0-fold degenerate mutations fixed in the 25kb regions flanking the S-locus increases with dominance of the S-allele associated.

Each dot represents the value obtained for haplotypes linked to one S-allele in one population. The correlations evaluated by a GLM are represented by lines, with confidence intervals represented in grey. A: A. halleri. Black dots correspond to the Nivelle population, grey dots to the Mortagne population. B: A. lyrata. Red dots correspond to the IND population, black dots to the PIN population, blue dots to the TSS population.

Motivated by these empirical observations, we extended the model proposed by Llaurens et al. 13 to predict the dynamics of accumulation of recessive deleterious mutations linked to S-alleles, focusing not only on fixed deleterious mutations but also on those that are segregating within allelic lineages. These stochastic simulations confirmed that, at equilibrium, dominant S-alleles tend to accumulate a larger number of deleterious mutations that are fixed among gene copies within S-allele lineages (Fig. 5A). In contrast, the number of segregating linked mutations was higher for recessive than for dominant S-alleles (Fig. 5B). Our model predicted that these two effects eventually compensate each other, such that in the end the total number of linked deleterious mutations was not expected to change with dominance of the S-alleles (Fig. 5C). These predictions are in line with our genomic observations and suggest that the dominance level of S-alleles modifies the structure of the genetic load they shelter: dominant S-alleles accumulate more fixed deleterious mutations, but recessive S-alleles accumulate more segregating mutations, resulting in an equivalent total load.

Stochastic simulations confirm the contrasted architecture of the load of deleterious mutations linked to dominant vs. recessive S-alleles.

Number of fixed (A), segregating (B) and total (C) deleterious mutations linked to S-alleles at four different levels of dominance (I<II<III<IV). The means (bold lines) were estimated per S-allele dominance classes over 100 replicate simulations after discarding an initial burn-in of 100,000 generations. h=0. s=0.01.

Discussion

The genetic load linked to the S-locus is detectable and is manifested on different phenotypes

Our results contribute to a still restricted but growing body of evidence confirming that the accumulation of deleterious mutations linked to strongly balanced allelic lines can be substantial, and that their effect can be detected at the phenotypic level 13,19,2528. An interesting observation is that the phenotypes on which the load was revealed varied among these studies. Here, the effect of homozygosity at the S-locus was apparent on juvenile survival and on the length of the longest flowering stem, but we detected no effect on any other morphological measurements, including leaf and rosette traits. In the same population of A. halleri, Llaurens et al.13 detected an effect on juvenile survival and on leaf size. A study in North American outcrossing populations of A. lyrata19 detected an effect on juvenile survival, but not on any other traits that they measured. In the horsenettle Solanum carolinense, the load was associated with reduced seed viability, flower number and germination25,28. Hence, the most consistent pattern seems to be a decrease of overall juvenile survival,possiblybecauseitisahighlyintegrativemeasurementoffitness,whereasother morphological or life history traits can be associated with more specific components of overall fitness.

A unique genetic load associated with each allele in each population

The model of the sheltered load posits that each S-allele should be associated with a specific set of linked mutations13. In line with this prediction, another consistent observation is that the magnitude of the S -linked load varied among S-alleles, as the load linked to some S-alleles was phenotypically detectable, while for others it was not. This variation of the genetic load is expected since deleterious mutations associated with the different alleles are likely to hit different linked genes, and affect different phenotypic traits with different effects on fitness. Also in line with the model of the sheltered load, our phasing of a large number of variants linked to S-haplotypes in several natural populations revealed that the same suite of linked mutations was consistently associated among different copies of a given allele when sampled from within the same population, in particular for the dominant S-allele lineages under more intense balancing selection. As expected for outcrossing populations with short-scale linkage disequilibrium, this association was lost when examining sites at increasing genetic distances from the S-locus along the chromosome (see also Le Veve et al.20). However, a proper model of sheltered load taking into account recombination among S-alleles is still missing. Finally, the association with linked sites was further lost when comparing gene copies of S-alleles sampled from different local populations, suggesting that recombination within populations decouples alleles from their linked sites faster than migration can homogenise the genetic composition among these natural populations. We note that the patterns of association and phylogenetic structure differed among populations, possibly due to their contrasted demographic histories. Indeed, the A. lyrata populations colonised North America from ancestral European populations about 20-30.000 years ago 29,30, and are less diverse overall than the A. halleri populations we studied, who colonised the north of France during the last century from ancestral German populations31. The progressive decoupling between alleles and their linked sites leads to the simple prediction that S-locus homozygous genotypes formed by crossing individuals carrying identical alleles from distinct populations should not reveal as much load as when they are formed by crossing individuals within populations. Hence, the S-locus region could contribute to overall hybrid vigour. Testing this prediction will be an interesting next step.

Different properties of the linked load according to S-allele dominance

The question of whether S-allele dominance, which modifies the intensity of balancing selection in the S-locus region, could explain variation of the linked load has received conflicting support in the literature. In line with Stift et al. 19, but in contradiction with Llaurens et al. 13, we observed no overall effect of dominance on the magnitude of the load. Several technical and biological reasons could explain the contrasted results obtained in these different studies. First, phenotypic quantification of the linked load is experimentally demanding, such that these studies relied on the comparison of a limited number of alleles (three S-alleles in each of the studies) and therefore each of them had inherently low power. Second, the experimental procedures to reveal the load varied slightly. Llaurens et al.13 used CO2 treatment to by-pass the SI system and obtain homozygous progenies from crosses that would otherwise have been incompatible, whereas we used the “natural” masking by dominant S-alleles to enable the obtention of recessive homozygous genotypes. Our approach is experimentally simpler and avoids the possible contamination by offspring obtained by selfing, which may combine the effect of the sheltered load with that of genome-wide inbreeding depression (see Stift et al. 19 for a detailed discussion of this caveat). Third, a limitation of our approach is that it is restricted to S-alleles that are recessive or intermediate along the dominance hierarchy, and is thus not applicable to quantify the load associated with the most dominantS-alleles under more intense balancing selection. It is therefore possible that the S-alleles we examined did not exhibit sufficiently contrasted levels of dominance, in particular if only the most dominant ones are generating a substantial load, as suggested for fully linked deleterious mutations. In addition, since the homozygous S-allele genotypes we create correspond to different gene copies from the population, they may carry distinct sets of linked deleterious variants, especially for the more recessive alleles. The variation we observed in the phenotypic magnitude of the load among families confirms that linked deleterious variants are unlikely to be fixed within all allele lineages. Finally, we note that our genomic analysis of the genetic load shows that the dominant allele Ah15 previously associated with reduced fitness in homozygotes 13,isindeedunusualintermsofthenumberof mutations it carries. In fact, it is one of the most “loaded” alleles among all the dominant S-alleles present in this population, possibly explaining why Llaurens et al. 13 observed a significant effect despite the inherently limited experimental power of their analysis.

Our stochastic simulations and genomic analyses concur to the conclusion that the intensity of balancing selection applied to each allele affects the genetic architecture of the linked load: a larger proportion of putatively deleterious mutations are fixed among gene copies of the dominant as compared to the recessive S-alleles, while gene copies of the recessive S-alleles tend to accumulate more segregating deleterious variation. While these two processes eventually compensate one another, they may have distinct consequences for the evolution of S-alleles. Uyenoyama 12 showed that the existence of a sheltered load should influence the evolutionary dynamics of new S-alleles through self-compatible intermediates. Specifically, antagonistic interactions are expected between ancestral and derived functional specificities because they would initially share their linked deleterious mutations, slowing down the establishment of new alleles. Our observation that partially different sets of linked mutations are associated with S-alleles from the different populations raises the question of whether the (short) time scale at which recombination decouples alleles from their sets of linked mutation is sufficiently fast to impede such antagonistic interactions to take place. In other words, the effect of the load on the diversification dynamics should be most important if the two mutational steps required for the emergence of new alleles under this model take place within local populations, rather than involving a metapopulation-scale process. As shown by Stetsenko et al.32, this is expected to occur under very low dispersal only. In addition, the observation that the architecture of the sheltered load differs between dominant and recessive S-alleles suggests that their diversification dynamics may also differ. Specifically, the self-compatible intermediates required for the formation of new S-alleles33,34 are expected to be capable of selfing as well as forming homozygous genotypes that would otherwise be prevented. While the consequences of selfing may be equivalent for all alleles (because the overall number of mutations to which they are linked are equivalent), the consequences of the formation of homozygotes allowed by the crossing of separate individuals sharing a given S-allele are expected to be more severe for dominant S-alleles. The segregation of distinct deleterious variants linked to different gene copies of recessive S-alleles implies that linked recessive deleterious mutations are likely to remain masked when two distinct gene copies of a given recessive S-allele are brought together. Hence, our results lead to the prediction that in natural populations self-compatible mutants may segregate more readily for the more recessive than for the more dominant S-alleles, and more generally for allelic lineages under lower intensity of balancing selection. Considering that self-compatible mutants are a necessary intermediate stage in the formation of new S-alleles, one may predict that the diversification dynamics should be more efficient for lineages of recessive than dominant S-alleles. This prediction is in line with the deeper phylogenetic divergence among the most dominant S -alleles observed in Arabidopsis16. Detailed quantification of the presence of self-compatible variants in natural populations will now be necessary to test this hypothesis. At this stage, however, a proper model of allelic diversification taking into account dominance interactions among S-alleles is still missing.

Our observation that the genetic load varies across balanced allelic lines is not unprecedented. The classical case of Y or W sex chromosomes are indeed examples where one balanced line accumulates a greater genetic load than the other (X or Z, respectively), eventually leading to substantial genetic degeneration3537. Another example is the supergene controlling variation in male plumage phenotypes of the ruff, where the genetic load on the derived “Satellite” haplotype is higher than on the ancestral “Independent” haplotype3,38. Similarly, in the butterfly Heliconius numata, the inverted haplotypes conferring mimetic wing patterns tend to accumulate a greater load than the non-inverted haplotypes39. Interestingly, in all these cases, the haplotypes with the greatest load also act genetically in a dominant manner, establishing a clear parallel with our observations. An interesting next step will be to determine whether similar asymmetries between the balanced allelic lines are observed in the linked region for these other systems.

It is clear from our results that S-allele dominance can affect the linked load, through its effect on the intensity of balancing selection, but in turn the differences in structure of the linked load may affect the conditions under which dominance can evolve. The Brassicaceae S-locus is indeed an interesting and rather unique system, where the mechanisms by which dominance is controlled and evolves under the action of so-called “dominance modifiers” have been studied in detail 39,40 (involving the interaction between small non-coding RNAs and their target sites 16, 40). The presence of deleterious mutations linked to S-alleles has been shown to affect the evolution of dominance modifiers, favouring evolution towards greater dominance than towards greater recessivity 40. This asymmetry arises from the fact that S-alleles that become recessive (e.g. following acquisition of a recessivity modifier such as a small RNA target) will start forming homozygous genotypes, leading to expression of their linked load, while S-alleles that become dominant will not. Our observation that many deleterious mutations linked to recessive S-alleles are indeed segregating, suggests that expression of the load will be less severe for recessive than for dominant S-alleles, hence decreasing this predicted asymmetry. It will now be essential to modify models for the evolution of dominance to allow for such differential load among S-alleles.

Acknowledgements

This work was funded by the European Research Council (NOVEL project, grant #648321) and ANR TE-MoMa (grant ANR-18-CE02-0020-01). AL’s PhD thesis was funded by the ERC and the University of Lille. The authors thank Barbara Mable for sharing seeds of A. lyrata andCamilleRouxfor discussions. This work was performed using infrastructure and technical support of the Plateforme Serre, cultures et terrains expérimentaux - Université de Lille for the greenhouse/field facilities. The authors thank the UMR 8199 LIGAN-MP Genomics platform (Lille, France) which belongs to the ‘Federation de Recherche’ 3508 Labex EGID (European Genomics Institute for Diabetes; ANR-10-LABX-46) and was supported by the ANR Equipex 2010 session (ANR-10-EQPX-07-01; ‘LIGAN-MP’). The LIGAN-MP Genomics platform (Lille, France) is also supported by the FEDER and the Region des Hauts-de-France. The authors thank the GenoScreen platform (Lille, France).

Author contributions

AL, ED, VC and XV developed and designed the experiments for the study. AL, CBL and CP performed the experiments. AL and MG analysed and interpreted the data. AL, ED, VC and XV wrote the manuscript. All authors edited the manuscript.

Declaration of interests

The authors declare no competing interests

Methods STAR

Source plant material

We worked on natural accessions from two closely related species, A. halleri and A. lyrata, represented by two population samples named Mortagne (50°47’N, 3°47’E, France, n=60) and Nivelle (50°47’N, 3°47’E, France, n=61) for A. halleri, and three highly outcrossing population samples from the North American Great Lakes, named IND (Indiana Dunes National Lakeshore in Michigan, n=9), PIN (Pinery Provincial Park in Ontario, n=11) and TSS (Tobermory Provincial Park in Ontario, n=8)22 for A. lyrata (Fig. S9). The A. lyrata populations colonised North America from ancestral European populations about 20-30.000 years ago 29,30 and the A. halleri populations are peripheral and likely colonised the north of France during the last century from ancestral German populations31.

We performed 92, 91, 40, 43 and 21 controlled crosses between randomly chosen individuals within the Nivelle, Mortagne, IND, PIN and TSS populations, respectively. We successfully obtained seeds from 60, 66, 21, 21 and 10 of these crosses, respectively. Because we were not interested in estimating population frequencies of S-alleles, we instead tried to maximise the number of reconstructed haplotypes and avoid over representing the most recessive S-allele (Ah01) that tends to segregate at very high frequencies in natural populations 15. To do this, we performed PCR with S-allele-specific primers 15,18 to screen the parents of the crosses and we removed from the experiment offspring with two parents carrying allele Ah01. For A. halleri, we selected 19 individuals from the Nivelle population and 19 individuals from the Mortagne population, based on their genotype at the S-locus (Fig. S9; Table S7). We also selected one offspring of 9, 11, 5, 6 and 5 pairs of selected individuals in the Nivelle, Mortagne, IND, PIN and TSS populations respectively for the phasing of S-haplotypes (Table S8; Fig. S9). To increase sample size for the phenotypic measurements, we included offspring from five additional crosses from the Nivelle population (Table S8).

Library preparation, capture and sequencing

We used a previously developed sequence capture approach to specifically sequence genomic regions of interest 20. Briefly, indexed genomic libraries were constructed for each individual and libraries were pooled in equimolar proportions. Fragments matching a series of regions of interest (including in particular the 75kb upstream and downstream of the non-recombining S-locus region as well as a series of 100 unlinked 25kb regions used as genomic controls 20), were then enriched using synthetic 120bp RNA probes and sequenced by Illumina MiSeq (a total of 159 million paired-end reads).

For six individuals (Table S7, S8), we completed the sequencing with genome-wide resequencing (WGS) in order to distinguish the homozygous and heterozygous genotypes at the S-locus based on read depth41, which is not possible using data from the capture protocol. The prepared libraries were sequenced by Illumina NovaSeq (2x 150pb, paired-end) from the GenoScreen platform (Lille, France).

Determination of the S-locus genotypes and dominance of S-alleles

We used a dedicated pipeline for genotyping the S-locus based on short reads sequencing41 obtained from each individual (Table S7 and S8). The level of dominance of S-alleles found in our study was determined based on either previous assessment of dominance in A. lyrata and A. halleri15,19,4244 or indirectly inferred based on the observed association between the phylogeny of S-alleles and levels of dominance45.

Read mapping and variant calling in A. halleri and A. lyrata populations

Raw reads were mapped on the complete A. lyrata reference genome V1.0.23 21 using Bowtie2 v2.4.146, as described in Le Veve et al 20. File formats were then converted to BAM using samtools v1.3.147 and duplicated reads were removed with the MarkDuplicates program of picard-tools v1.119 (http://broadinstitute.github.io/picard). These steps were performed by the custom Pyt hon script sequencing_genome_vcf.py available at https://github.com/leveveaudrey/analysis-of-polymorphism-S-locus.

We obtained an average of 620 million properly mapped paired-end 300bp reads per population sample. For consistency, we conserved only reads which mapped to the S-locus flanking or control regions, even for samples sequenced by WGS, using the targetintercept option of bedtool v2.25.0 48. We called all SNPs within the chromosomal segment comprising 50 kb upstream from the first base of the gene Ubox in 3’ and 50 kb downstream from the last base of the gene ARK3 in 5’ of the S-locus using the Genome Analysis Toolkit v. 3.8 (GATK)49 with the option GVCF and a quality score threshold of 60 using vcftool v0.1.15 50. This region contains 20 annotated protein-coding genes. In this study we excluded the genes inside the S-locus itself (SCR, SRK). For each sample independently, we computed the distribution of coverage depth across control regions using samtools depth 47. We excluded sites with either less than 15 reads aligned or coverage depth above the 97.5 % percentile, asthelatterarelikelytocorrespondtorepeatedsequences(e.g.transposableelementsor paralogs). Finally, we removed SNPs fixed in each population using the script 1_fix_pos_vcf.py (https://github.com/leveveaudrey/dominance_and_sheltered_load), thus retaining only nucleotide sites that were variable in the population.

Quantifying the sheltered load of deleterious mutations

We examined the genetic load signatures based on the accumulation of mutations on 0-fold degenerate sites, the vast majority of which are considered deleterious. The 0-fold and 4-fold degenerate sites were identified and extracted from the reference genome and the gene annotation using the script NewAnnotateRef.py 51. We also examined the genetic load signatures based on the accumulation of all non-synonymous mutations. We obtained a total of 2,441 and 2,435 variable positions for the A. halleri samples from Nivelle and Mortagne respectively, and 2,360 variable positions for the A. lyrata samples.

Phasing S-haplotypes

For each of the 9, 11, 5, 6 and 5 trios analysed in the Nivelle, Mortagne, IND, PIN and TSS populations respectively, we phased mutations in the flanking regions, resulting in 130 phased haplotypes (Fig. S9). Briefly, we used sites that were heterozygous in the offspring to resolve parental haplotypes by assuming no recombination between parent and offspring, thus attributing the allelic state that was shared between a parent and its offspring to their shared S-allele, and the allelic state that was not shared to the other (untransmitted) haplotype of the parent. Twelve of the parents had been used in more than one cross, and in these cases we phased their haplotypes only once (Table S8). We implemented the phasing procedure in the script 3_phase_S_allele.py available at https://github.com/leveveaudrey/dominance_and_sheltered_load.

Study of the structure of S-haplotypes

We first developed a new method to evaluate the distortion of the phylogenetic patterns caused by linkage to S-alleles. To do this, we used phyml 52 v.3.3 to calculate the likelihood of two contrasted topologies of interest: (1) the topology clustering haplotypes by the populations where they came from vs. (2) the topology clustering them by the S-allele to which they are linked (Fig 2A). We used sliding windows of sequences with 50 SNPs to obtain the variation of the difference in log-likelih ood between these two topologies along the chromosome. We then compared these values to their distribution throughout the genome obtained by random draws of sequences with 50 SNPs from the control regions. Second, we visualised the relationships among the phased haplotypes using maximum likelihood phylogenies based on the Tamura-Nei model 53, with 1,000 replicates in MEGA X54. Third, we followed Kamau et al.’s 24 approach and examined the variation of FST among populations within each species (Nivelle and Mortagne for A. halleri and IND, PIN and TSS for A. lyrata) along the flanking region in non-overlapping windows of 5kb. We also examined the variation of FST along the flanking region obtained by grouping haplotypes by their linked S-allele rather than by population of origin. Then, we compared these FST values computed in the S-locus flanking regions with their genomic distribution as determined from the 100 control regions. The FST values were estimated with the DNAsp 6 software 55. Fourth, we performed a major component analysis (MCA) based on SNPs in the first 5kb, SNPs between 5 and 25kb and SNPs between 25 and 50kb around the S-locus, using the R packages ‘ggplot2’ (version 3.4.0), ‘factoextra’ (version 1.0.7) and ‘FactoMiner’ (version 2.7). We compared the patterns obtained by these MCAs with those obtained from identical numbers of SNP (+/- 1%) from the control regions. Finally, we analysed genetic association in each population independently between each of the locally segregating variants and the S-alleles considered as phenotypes, using STRAT 56 V1.1 combined with Structure 57 V2.3. We examined the distribution of the top 0.1% most significant associations detected specifically for each S-allele in each population.

Estimation of the number of fixed and segregating deleterious mutations within S-allele lineages

For each variable position considered in the phased haplotypes, we estimated the number of mutations on 0-fold (S 0f) and 4-fold degenerate sites (S 4f) compared with the reference genome. We distinguished SNPs that were fixed from those that were segregating within each of the allelic lines. We used GLM with a Poisson distribution to test whether the number of fixed and segregating mutations were associated with S-allele dominance, considering populations as random effects. We reiterated the GLM analysis with the number of non-synonymous (S NS) and synonymous (S S) mutations.

Estimation of the phenotypic impact of homozygosity at the S-locus for three S-alleles

To determine if the genetic sheltered load putatively linked to the S-locus has a detectable phenotypic impact, we performed 45 crosses (Table S1) between offspring of the Nivelle individuals that we chose so that they shared one S-allele (Fig. S9). Based on the dominance hierarchy in pollen16 (Table S1), these crosses should correspond to compatible partners. The general principle of the experiment was to take advantage of the dominance hierarchy to mask recessive S-alleles and generate full sibs that were either homozygous (because they inherited the S-allele that was shared by their two parents) or heterozygous at the S-locus, and thus isolate the effect of homozygosity at the S-locus. Note that all offspring in our experiments were thus ”naturally” outcrossed, whereas Llaurens et al. 13 based their comparisons on outcrossed progenies obtained by enforced incompatible crosses and Stift et al. 19 based their comparisons on enforced selfed progenies. These crosses generated 399 seeds overall, with homozygous genotypes expected for the S-alleles Ah01, Ah03 and Ah04 forming the following dominance relationship: Ah01<Ah03<Ah04.

Seedlings were grown in a greenhouse between 14.5 and 23.1°C and a photoperiod of 16 hr day/8 hr night. Offspring from the six families were placed on tables, and their position randomised every three days. After three months of growing, all the germinated plants were vernalised under a temperature between 6 and 8°C and a natural photoperiod for two months (January-February). Then, all surviving plants began reproduction in a greenhouse under temperature between 10.6 and 25.3°C and a natural photoperiod. The genotypes at the S-locus were determined in surviving plants by a PCR approach, using S-allele-specific primers for the pistil-expressed SRK gene. We assessed the reproductive success of offspring from the different crosses on the basis of fourteen phenotypic traits (detailed below) and computed the mean difference for the trait between homozygotes and heterozygotes within each family. We also tested for departures from mendelian proportions of each S-locus genotypic category in the family after the apparition of the first stem. Significant departures were interpreted as reflecting differences in survival between homozygous and heterozygous S-locus genotypes. We performed 10,000 replicate simulations of mendelian segregation based on the S-locus genotype of the parents. We used GLM to test whether the phenotypic impact of homozygosity at the S-locus increased with dominance of the S-alleles. The models used for GLM depended on the type of trait analysed (poisson for the counts like the number of leaves, flowers by stems or days; gaussian for continue traits like the lengths, widths and areas).

We measured the following fourteen phenotypic traits: the time (days) to the first leaf measured by visual control every day during seven weeks after sowing the seeds, the number of leaves, the area of the rosette (cm²), the mean length and width of leaves (cm), the standard deviation of length and width of leaves (cm) and the mean area of leaves (cm 2) measured by ImageJ58 based on photographs taken seven weeks (+/- five days) after the first leaf. At reproduction, we measured the time to the first flower bud for the end of vernalisation (day), scored by visual control every three days during nine weeks, the number of flower buds per flower stem produced during four week after the appearance of the first bud, the number of flower stems, the length of the highest flower stem produced four weeks after the appearance of the first bud (cm), and finally the total duration of buds production (days), scored by visual control every three days during eleven weeks after the appearance of the first bud. The last trait we measured was the proportion of homozygotes per family that survived until reproduction assuming mendelian proportions in the seeds. During the whole experiment, the presence of phytophagous insects, pathogens and stress markers were scored as binary variables. The presence of phytophagous insects and pathogen attacks were detected by the occurrence of gaps in leaves. Oxidative stress was scored qualitatively based on the occurrence of purple leaves. We also controlled the effect of the family on the phenotypic trait. These effects were controlled by redistributing 10,000 times the values observed in groups of the same size observed for each effect (for example, presence or absence of pathogen attack) and comparing the difference for the trait observed with the distribution of the differences obtained in the permutations. We considered the impact of the effect on the trait if the observed difference between groups was higher than the 95% percentile of the distribution obtained randomly (Table S9). When the test was significant, the effect was implemented as a random effect in the GLM. We used the same method to control for the family effect, which was included as a random effect in GLM if necessary (Table S10). The general experimental procedure is summarised in Fig. S9 and all data analyses were done in R ver. 3.1.2 (R Development Core Team 2014).

Simulations

Finally, we refined the model of Llaurens et al., 13 in several ways. We simulated a panmictic population of N diploid individuals with non-overlapping generations. Each individual was defined by its genotype in a non-recombining genomic region. This region contains the S-locus, and a D locus where deleterious mutations accumulated. For the S-locus, we used a simple model of sporophytic SI, with four dominance classes, as observed in A. halleri 41 (only three classes were considered before13), and fourteen S-alleles (eight alleles in the class IV, three in the class III, two in the class II and one allele in the class I). This distribution mirrors that of the Nivelle population (Table S7), with the exception that a class II allele has been added because its presence has been reported in previous studies 15. Alleles within classes were assumed to be codominant with each other, and dominant over all alleles of the more recessive classes, with the following linear hierarchy between classes : classI<classII<classIII<classIV). We also assumed that no new S-allelecouldappearby mutation during the simulations. The population size was 10,000 diploid individuals, so as to be large enough to avoid S-allele loss by drift during the simulations (previously it was 1,000). The “D locus” comprised one hundred fully linked biallelic positions (versus a single one in Llaurens et al. 13). Fully recessive deleterious mutations were recurrently introduced (at a rate 10 -4), and reverse mutations were possible (at a rate 10 -5). We ignored partially recessive deleterious mutations because these mutations were predicted to be effectively eliminated by natural selection in Llaurens et al. 13. The survival probability p of a zygote depended on its genotype at the D locus: p = (1 = s)n with s the selection coefficient and n the number of positions homozygous for the mutated allele. We explored different values of the selection coefficient (0.1, 0.05, 0.03, 0.01 and 0.005). Under strong selection (s=0.1, 0.05 and 0.03), the combined effect of multiple mutations led to low-fitness individuals, eventually causing population extinction. Under weak selection, (s=0.005), we observed near fixation of the deleterious mutations under the influence of asymmetrical mutation. Hence, we focused on the intermediate value of the selection coefficient (s=0.01), where deleterious mutations segregated stably in the simulations.

We first ran simulations without deleterious mutations until a deterministic equilibrium for S-allele frequencies was reached, which was considered to be attained when allelic frequencies changed by less than 10 -3 between generations. Recessive deleterious mutations were then allowed to accumulate at the positions within the D locus. Each simulation was performed with 100 independent replicates of 100,000 generations, and the frequency of the deleterious alleles was recorded every 1,000 generations. At the end of the simulation runs, we estimated the number of deleterious mutations found in each haplotype associated with each S-allele to determine the expected patterns of association between the sheltered load and dominance at the S-locus.

Data Availability

All sequence data are available in the NCBI Short Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra) with accession codes: PRJNA744343, PRJNA755829.

All scripts developed are available in Github (https://github.com/leveveaudrey/dominance_and_sheltered_load https://github.com/leveveaudrey/analysis-of-polymorphism-S-locus).

Supplementary data

Figures legend

Analysis of Major Components obtained for haplotypes of A. halleri of the Nivelle (black dots) and Mortagne (grey dots) populations) based on SNPs in the first 5kb, between 5 and 25kb, and between 25kb and 50kb away from the S-locus.

The S-allele to which the SNPs are linked are represented by different symbols. The right panels show the analysis on control regions, each time matching the number of SNPs with that of the corresponding linked regions (left panels).

Analysis of major components (AMC) obtained for haplotypes of A. lyrata (of the PIN: grey dots, IND: red dots and TSS: blue dots) populations based on the SNPs in the first 5kb, between 5 and 25kb, and between 25kb and 50kb away from the S-locus.

The S-allele to which the SNPs are linked are represented by different symbols. The right panels show the analysis on control regions, each time matching the number of SNPs with that of the corresponding linked regions (left panels).

Phylogenetic tree obtained by Maximum Likelihood for haplotypes of A. halleri (populations Nivelle and Mortagne) across the first 25kb flanking the S-locus.

The Tamura-Nei model was used and the percentage of trees in which the associated haplotypes clustered together is shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The black braces indicate haplotypes clustering by populations. The overall pattern shows a structure by S-alleles, with exceptions highlighted in yellow.

Phylogenetic tree obtained by maximum likelihood for haplotypes of A. halleri (populations Nivelle and Mortagne) based on the nucleotide positions between 25kb and 50kb away from the S-locus.

The Tamura-Nei model was used and the percentage of trees in which the associated haplotypes clustered together is shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The black braces indicate haplotypes clustering by populations. The overall pattern shows a structure by S-alleles, with exceptions highlighted in yellow.

Phylogenetic tree obtained by maximum likelihood for haplotypes of A. lyrata (populations PIN, IND, TSS) across the first 5kb flanking the S-locus.

The Tamura-Nei model was used and the percentage of trees in which the associated haplotypes clustered together is shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The black braces indicate haplotypes clustering by populations. The overall pattern shows a structure by S-alleles, with exceptions highlighted in yellow.

Phylogenetic tree obtained by maximum likelihood for haplotypes of A. lyrata (populations PIN, IND, TSS) based on the nucleotide positions between 5kb and 10kb away from the S-locus.

The Tamura-Nei model was used and the percentage of trees in which the associated haplotypes clustered together is shown next to the branches. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The black braces indicate haplotypes clustering by populations. The overall pattern shows a structure by S-alleles, with exceptions highlighted in yellow.

The genetic structure of SNPs in the S-locus flanking regions in A. halleri and A. lyrata.

Left: Mean F ST (lines) and F ST by pair (point) analysed among S-alleles (grey) or among populations (black) in 5 kb windows in the S-locus flanking regions. Right: Distribution (count) of F ST in the control regions analysed among S-alleles (grey bars) or among populations (black bars). The 95% percentiles of the distributions are represented by dotted lines and the medians by solid lines.

Patterns of genetic associations between S-alleles and SNPs across the genome.

Each dot corresponds to a SNP showing statistically significant association (top 0.1%) with a given S-allele. The left panel confirms that SNPs physically linked to the S-locus (in red) are considerably more likely to show statistical associations with S-alleles. The chromosomes were signified by alternance of black and grey. The middle panel shows a zoom on the 50kb regions flanking the S-locus and shows that the statistical association extends over long distances, at least for some S-alleles. Each region of 25 kb was delimited by vertical lines. The right panel shows that the observed number of associated SNPs in the linked regions far exceeds that in regions of identical size from control regions. The histogram shows the distribution across control regions of the mean number of significantly associated SNPs per S-allele. The vertical lines correspond to the mean number of significantly associated SNPs in the first 25kb (solid lines) and the 25-50kb interval (dashed lines) away from S-locus.

Experimental protocol.

A) We randomly crossed A. lyrata individuals from the PIN, TSS and IND populations in North America (left) and A. halleri from the Nivelle (middle) and Mortagne (right) populations. Individuals were sequenced by a capture protocol. Numbers between parentheses represent the number of individuals per dataset. B) One offspring from each cross was sequenced along with its two parents for trio haplotyping. Offspring from the Nivelle population (black circle) were conserved for the study of the impact of homozygosity at the S-locus on fitness (G1 population). C)Individuals sequenced in A and B were used to reconstruct haplotypes linked to each copy of S-allele, assuming no recombination between the S-locus and its flanking regions between parents and offspring. D) We used the dominance hierarchy between S-alleles expressed in pollen22,23 to cross G1 individuals from theNivelle population and obtained six G2 families constituted of heterozygous and homozygous individuals for the alleles Ah01, Ah03 and Ah04 . E) Description of the traits measured and the methods used to estimate the impact of homozygosity at the S-locus in homozygotes. Traits 1-8 are related to biomass and traits 10-14 are related to reproductive success.

Supplemental items

Crosses performed to obtain homozygotes for three S-alleles. For each S-allele, we obtained homozygous and heterozygous plants within two separate families (listed as separate rows).

Trait variation in S-locus homozygous individuals.

Trait variation in homozygous at the S-locus for the S-alleles Ah01 and Ah04 between families.

Effect of dominance on variation of phenotypic traits in S-locus homozygous individuals.

Number of phased haplotypes linked to S-alleles in each sample.

Effect of dominance on the accumulation of genetic load in S-flanking regions.

S-locus genotypes of individuals sequenced using the capture protocol.

S-locus genotypes of the offspring selected for haplotype phasing and the crosses for the study of phenotypic traits.

Effect of phytopathogen, phytophagous attacks and oxidative stress on the phenotypic traits.

Difference on the phenotypic traits variation between the two families for each allele tested.