Dissecting oligogenic and polygenic indirect genetic effects through the lens of neighbor genotypic identity

Yasuhiro Sato; Kosuke Hamazaki

doi:10.7554/eLife.111650.1

Introduction

Individual phenotypes often depend not only on an individual’s own genotype but also on the genotypes of other individuals within a group. This phenomenon, termed ‘indirect genetic effects’ (IGEs), is critical for evolutionary trajectories, as it may constrain the response of direct genetic effects (DGEs) to selection [1–3]. If DGEs and IGEs coevolve, this offers a valuable perspective for plant and animal breeding to optimize group performance by mitigating intraspecific competition [4–6]. However, functional traits responsible for IGEs remain elusive [7]. For example, a set of crop traits designed to reduce competition, termed an ideotype [8], includes stem height, leaf angle, root length and other dimensions of plant morphology [9,10]. To address the difficulty in identifying intermediate traits responsible for IGEs, many researchers have utilized genotype-based IGE models, which allow for trait-blind analyses of IGEs [1,5,7,11].

The application of genotype-based IGE models has been enhanced by the increasing availability of high-resolution and population-wide data of single nucleotide polymorphisms (SNPs) [3,12]. For instance, recent studies have conducted genome-wide association study (GWAS) and genomic prediction (GP) to dissect the genetic architecture of IGEs in humans [13,14], mice [7,15], Arabidopsis thaliana [11], and crops [16,17]. In laboratory mice, IGEs from cage mates have been analyzed for over 100 phenotypes and transcriptomes [7], where GWAS identified 24 quantitative trait loci (QTLs) underlying IGEs [15]. Besides mobile animals, IGEs have been analyzed for the model plant A. thaliana and crops to estimate mixing abilities among neighboring genotypes [16,18,19]. These pioneering studies provide the first step to implement GWAS and GP of IGEs. Yet, flexible implementation has not been available for joint modeling of polygenic and oligogenic IGEs.

In plants, IGEs occur among neighboring individuals as they cannot escape from their neighbors. These neighboring IGEs include multiple types of ecological interactions such as competition and facilitation [20–23]. To detect IGEs at a locus level, our previous study integerated the Ising model of ferromagnetics into a linear mixed model, named ‘Neighbor GWAS’ [24]. Analogous to physical interactions between two magnets, Neighbor GWAS formulated frequency-dependent selection at each SNP locus [25]. To reduce false positive signals, we employed a multi-kernel mixed model to correct for the population structure shaped by additional factors such as IGEs. This multi-kernel mixed model enables an efficient implementation called Reliable Association INference By Optimizing Weights (RAINBOW) [26], thereby allowing us to perform GP and GWAS by partitioning variance components and association scores between DGEs and IGEs.

In this study, we aimed to develop a flexible model of polygenic and oligogenic IGEs on a complex phenotype. We first presented a genotype-based IGE model by integrating Neighbor GWAS with RAINBOW. Subsequently, we conducted simulations to validate the performance of the proposed model against the original model. The proposed model was finally applied to three datasets of woody plant species. Based on the simulations and applications, we illustrated the feasibility of GWAS/GP of IGEs on a complex phenotype.

Results

Model development

To enable flexible modeling of DGEs and IGEs, we developed a multi-kernel mixed model encompassing oligogenic and polygenic effects on a complex phenotype. These oligogenic and polygenic effects were modeled as fixed and random effects, where the former and latter realized GWAS and GP, respectively. To model neighbor genotypic interactions, let us assume that a focal individual i are interacting with neighboring individuals j in a local space k, which are designated as k_i and k_j (Fig. 1). Following the original model of Neighbor GWAS [24], we then formulate IGEs of neighboring genotypes at q-th SNP locus on focal k_i-th individual phenotype as follows.

Overview of the proposed mixed model incorporating oligogenic and polygenic indirect genetic effects.
Panel (a) displays the locus-level equation (upper) and its mixed-model expression (lower) for direct and indirect genetic effects (DGEs and IGEs) with their interactions (DGE×IGE). Blue and red terms highlight extended components relevant to IGE and DGE×IGE, respectively. Plain (black) terms indicate DGE components (i.e., a standard GWAS/GP model). The model parameters are as follows; x_q,i ∈ {AA, Aa, aa} = {+1, 0, −1} whereby or 0; : mean allelic similarity between focal and neighboring individuals; : allele frequency bias near focal individuals; β_q,1: SNP-wise DGE; β_q,2: SNP-wise IGE based on neighbor allelic similarity; and β_q,12: SNP-wise DGE×IGE based on the interaction term between β_q,1 and β_q,2. These fixed-effect coefficients can be used for GWAS of DGE and IGE. Panel (b) shows variance-covariance structure between the random effects u₁ and u₂ based on a multi-variate normal (MVN) distribution. The bottom illustration outlines how to incorporate the local (neighboring) and global spatial genetic structure involving IGEs. The bottom right and left respectively represent neighboring space for individual k and l, which constitute the k and l-th element of the four variance-covariance matrices as K₁: Kinship matrix representing individual genetic similarity; K₂: Kinship matrix weighted by neighboring allelic similarity; and K₁₂ & K₂₁: Kinship matrix weighted by neighboring allelic frequency, with the (co)variance-component parameters : DGE genetic variance (>0); : IGE genetic variance (>0); ρ₁₂: DGE-IGE covariance (>0 or <0). See also “Model development” in the main text.

, or

, where

As for the fixed effects, and represent the genotype of focal i-th and neighbor j-th individuals in the local space k (where J_k is the total number of neighboring individuals). At the focal locus q, the two homozygotes are encoded as aa = −1 and AA = +1 with heterozygote as Aa = 0, i.e., . Accordingly, the fixed-effect coefficient β_q,2 indicates the influence of mean allelic (di)similarity [i.e., (−1)×(+1) = (+1)×(−1) = −1 or (+1)×(+1) = (−1)×(−1) = +1] between focal and all neighboring individuals on . This formulation is analogous to the Ising model of ferromagnetics [27] when inter-genotypic interactions occur among neighboring individuals in a lattice space (e.g., sessile plants). In this analogy, β_q,2 represents physical interactions among the south/north dipole of neighboring magnets while β_q,1 corresponds to an external energy (Appendix S1; Fig. S1). In the context of evolutionary biology, the coefficient β_q,2 represents positive or negative frequency-dependent selection when inter-genotypic interactions occur within a group [25] (Appendix S2 and S3; Fig. S2), which gives a specific case of population genetic models of frequency-dependent selection [28]. In addition to β_q,1 and β_q,2, β_q,12 is defined as an interaction term between β_q,1 and β_q,2 because due to {AA, Aa, aa} = {+1, 0, −1} where or when is heterozygous (Appendx S2). Theoretically, the sign of β_q,2 determines whether the weighted mean of phenotypic values can be concave or convex against neighboring allele frequency (Fig. S2a and d), whereas β_q,12 determines the slope of along the allele frequency (Fig. S2b-c and e-f). Although bistable evolutionary equilibria become possible in the presence of heterozygote (Fig. S2b and e: cf. Fig. S2 of Sato et al. [25]), this property may not be relevant to cultivars for which we can manipulate genotype frequencies and their spatial arrangement. Therefore, the three coefficients β_q,1, β_q,2, β_q,12 can be used as key parameters to distinguish between SNP-wise DEGs and IGEs in GWAS, which underpin the mean behavior of phenotypic values and consequent group performance under allelic mixture at a given locus.

Besides the fixed effects, Eq. 1 includes random effects that can represent polygenic effects (or best linear unbiased predictor: BLUP) for DGEs and IGEs. In the multi-variate normal (MVN) distribution, and denote genetic variance of DGEs and IGEs, respectively. The design matrix Z assigns individuals and genotypes, which should be an identify matrix I because inter-genotypic interactions should occur among individuals. The N × N kinship matrix K₁ represents polygenic DGEs (where N means the total number of individuals), whereas one of the other variance-covariance matrices K₂ represents polygenic IGEs. These two N × N variance-covariance matrices are defined by the cross-products and (where Q means the number of SNP loci). The n-th and q-th element of the SNP matrix X₁ corresponds to the self-genotypic value as . As we define , K₁ is scaled so that its element represents the proportion of SNP loci shared among individuals. The n-th and q-th element of N individuals × Q SNPs matrix X₂ is given as . Accordingly, the k-th and l-th element of N × N matrix K₂ is given as ). This indicates the polygenic effects (i.e., individual genetic similarity ) weighted by global allelic similarity between two local spaces k and l (i.e., a similarity metric of the mean allelic similarity in two local spaces ) (Fig. 1b), which is expected to correct for a spatial bias in allelic distribution in the GWAS of the neighbor genotypic effects β_q,2.

When not only a single locus but also many loci involve both DGE and IGE, these non-independent QTLs result in interactions between polygenic DGEs and IGEs (i.e., X₁β₁ and X₂β₂), thereby shaping covariance between X₁β₁ and X₂β₂ as Cov(X₁β₁, X₂β₂). Given that mean SNP effects are expected to be zero as E(β₁) = E(β₂) = 0, the DGE-IGE covariance is given as . When β₁ and β₂ are not independent, Cov(β₁, β₂) ≠ 0 and this covariance can be redefined as the MVN parameter ρ₁₂σ₁σ₂ = Cov(β₁, β₂). Meanwhile, can be scaled and redefined as . The k-th and l-th element of K₁₂ is , indicating the polygenic effects weighted by the local-space allele frequency in focal individual’s neighborhood. Similarly, where . The k-th and l-th element of K₂₁ is . This indicates K₁₂ ≠ K₂₁, but . Furthermore, the local-space allele frequencies and refer to the frequency of one allele at the same SNP locus. Thus, we can assume the same effect from the focal allele at a given locus as Cov(β₁, β₂) = Cov(β₂, β₁) = ρ₁₂σ₁σ₂I. Combined with the two asymmetric variance-covariance matrices (i.e., ), ρ₁₂σ₁σ₂K₁₂ and ρ₁₂σ₁σ₂K₂₁ represent asymmetric polygenic effects between focal and neighboring genotypes on . In GWAS, this random-effect covariance can be used to correct for population structure due to the locus-wise asymmetric neighbor effects β_q,12.

In the case of Eq. 1, u₁ and u₂ are not independent due to their covariance. Therefore, the total phenotypic variance Var(y) can be decomposed as:

This Eq. 2 indicates that phenotypic variations can be partitioned into those explained by DGE, IGE, and their covariance in addition to residuals, where the three parameters take the range of , and ρ₁₂ = (−∞, ∞). The sign of ρ₁₂ therefore determines whether the DGE-IGE covariance increases or decreases the phenotypic variation Var(y), which would in turn affect response to selection.

We implemented the multi-kernel mixed model (Eq. 1) based on the RAINBOW algorithm [26]. This method works by optimizing weights w₁ + w₂ = 1 for K₁ and K₂. In the present model, the variance component parameters are redefined as and , where represents the total genetic variance due to the two random effects. Accordingly, the proportion of phenotypic variation explained (PVE) by DGE, IGE, and their covariance can be partitioned respectively into PVE_DGE = w₁PVE_u, PVE_IGE = w₂PVE_u, and , where . Estimating variance component parameters yields a weighted kernel as . This weighted kernel allows us to GWAS by comparing models with or without each SNP-wise DGEs or IGEs for β_q,1, β_q,2, or β_q,12.

Benchmark simulation

Using simulated genomes and phenotypes, we tested the performance of the proposed model in variation partitioning, GP, and GWAS. To first address how correctly variance component parameters can be estimated, we performed independent iterations 30 times with varying covariance between DGE and IGE (Fig. 2; Fig. S3). These simulations showed that the medians of the covariance parameter close to those of true models (Fig. 2a and g; Fig. S3a, e and i), validating the capability of the proposed model in distinguishing the sign of ρ₁₂. In a quantitative term, the scenarios of positive DGE-IGE covariance exhibited smaller variations as the absolute value of true ρ₁₂ became large (MAE = 0.099, 0.074, and 0.022 for ρ₁₂ = 0,0.3,0.6, respectively: Fig. 2a; Fig. S3a and e). In contrast, the scenarios of negative DGE-IGE covariance showed larger variations as the absolute value of true ρ₁₂ became large (MAE = 0.210, 0.176 and 0.099 for ρ₁₂ = −0.6, −0.3,0, respectively: Fig. 2g; Fig. S3e and i). Similar trends were observed for the estimated IGE weight , such that the medians of were close to the true value regardless of the incorporation of ρ₁₂ but showed different extents of variations (MAE = 0.12, 0.076, 0.072, 0.050, 0.026 for ρ₁₂ = −0.6, −0.3,0,0.3,0.6, respectively: model = ‘cov’ in Fig. 2b and h; Fig. S3b, f and j). These results indicate that while the case of negative covariance was more difficult to estimate than that of positive ones, the proposed model is able to estimate polygenic DGE and IGE with the sign of their covariance.

We further tested whether the proposed model can be used as a tool of GP. To evaluate the predictability for simulated phenotypes, we left one on the tenth of simulated data as a test dataset while the rest as a training dataset under varying DGE-IGE covariances. In the training models, total BLUPs were highly correlated between models with and without ρ₁₂ (R² > 0.92 for ŷ between the two models: Fig. S4). Nonetheless, when the covariance was positive, the inclusion of DGE-IGE covariance ρ₁₂ slightly but better predicted the test dataset in terms of R² and MAE (Fig. 2c-d; Fig. S3c-d). When the covariance was negative, models with the DGE-IGE covariance ρ₁₂ did not perform better than those without ρ₁₂ or even slightly worse in terms of R² and MAE (Fig. 2i-j; Fig. S3k-l). For the case of ρ₁₂ = 0 (null model), we observed almost no differences between the models with and without ρ₁₂ (Fig. S3g and h). These results revealed the conditions under which the proposed model could perform better in GP.

We then tested whether the inclusion of the covariance ρ₁₂ could improve the GWAS performance to detect causative IGE SNPs (i.e., non-zero β_q,2). Under the distinct scenarios of positive or negative covariance ρ₁₂ = ±0.6, we evaluated the model performance using the receiver operating characteristic curve (ROC). In terms of the area under the ROC curve (AUC), the cases of negative and positive covariance both showed the moderate power to detect the causative SNPs (AUC>0.6 and AUC<0.8: Fig. 2e and k). This moderate power was also supported by the sensitivity at the stringent specificity (at FDR < 0.05: Fig. 2f and l). These two metrics confirmed the feasibility of GWAS of IGE but its performance did not differ considerably regardless of the DGE-IGE covariance (Fig. 2e-f and k-l). Thus, the both models with and without the DGE-IGE covariance work enough for GWAS.

Application to real data

We applied the proposed model to the three datasets — trembling aspen (Populus tremuloides), apple (Malus × domestica), and grape (Vitis vinifera) — to demonstrate its usability. All three datasets involve woody species and provide individual-level data on phenotypes, genotypes, and spatial arrangement. Based on the spatial layout of the study species, we analyzed the influence of nearest neighbors on each trait.

We first analyzed life-history and defensive traits of aspen genotypes grown in a field garden at Wisconsin of USA, called WisAsp project [29–31]. Among 18 trait records collected over two years, basal area increment at the later growth stage (BAI2024-2015) alone showed a significant negative covariance between DGE and IGE (ρ₁₂ = −0.599 and −0.629, χ² = 3.875 and 4.727, p = 0.049 and 0.030 with likelihood ratio tests for 2014 and 2015, respectively: Table S1), suggesting IGEs on growth-related traits out of diverse traits. For the late basal area increment, we conducted GP based on 10-fold cross-validation. This GP showed the similar performance between models with and without the covariance in terms of R² and MAE (Fig. S5), providing consistent evidence with the simulations (Fig. 2). GWAS of the 18 traits detected two significant IGE SNPs for the first bud burst and size growth (FDR < 0.05: Table S1), but none for the other traits including defensive chemicals (FDR > 0.05). Besides the plant traits, we analyzed the two quantitative indices of arthropod community compositions, but found neither significant DGE-IGE covariance nor IGE SNPs (Table S1). These results led us to focus on intergenotypic competition in the other tree species.

Second, we analyzed apple REFPOP data collected across multiple years and sites in Europe (Fig. 4; Fig. S6). This dataset included 20,280 individuals of 534 accessions across six European sites for 30 traits [32,33]. We quantified PVEs by DGEs, IGEs, and their covariance for these traits (Fig. 4a, c and e; Fig. S6) and focused on three traits with less missing records, i.e., flowering intensity, trunk diameter, and trunk increment (see also Table S2). The flowering intensity was the most largely regulated by DGEs among these three traits (Fig. 4a), whereas the trunk diameter and increment showed relatively large influence of IGEs (Fig. 4c and e). These results suggest that growth traits were more likely to be subjected to IGEs rather than phenological traits.

To further examine the intensity of competition among apple trees, we analyzed relationships between PVE by the DGE-IGE covariance (i.e., PVE_cov) and average tree size (Fig. 4b, d and f). The flowering intensity showed a positive but non-significant relationship between PVE_cov and mean trunk diameter (LMM, slope coef. = 0.0007, t = 0.295, p = 0.613 by Wald test: Fig. 4b), where we did not detect any significant (Table S2). In contrast, the trunk diameter showed a negative but non-significant relationship between PVE_cov and mean trunk diameter (slope coef. = −0.002, t = −1.078, p = 0.150: Fig. 4d), where significant were detected at four sites (Table S2). Remarkably, PVE_cov of the trunk increment significantly decreased as the mean trunk diameter became large (slope coef. = −0.006, t = −2.61, p = 0.013: Fig. 4f), indicating that intergenotypic competition turned intense as trees grew bigger. Regarding the trunk increment, GWAS also detected two significant IGE SNPs on the seventh chromosome at Belgium site in 2019 (Fig. 4g), which were located on the seventh chromosome at 8743248 and 8744866 base pair positions. Both SNPs had negative β_q,2 estimates, indicating the negative influence of allelic mixtures on the trunk increment ( : see also Fig. S1 and S2 for the interpretation of ±β_q,2). In total, the apple data provide supportive evidence for competitive genetic interactions among neighboring trees.

Third, we additionally analyzed the grape INNOVINE project data [34] to examine a climbing woody plant as another test case. This dataset included 3,642 individuals and 90,007 SNPs for 19 traits [34]. In this dataset, most traits did not show large influence of neighboring genotypes compared to DGEs ( for 17 out of the 19 traits: Fig. 5). Although the pruning weight and citric acid was influenced by IGEs, these two traits showed not negative but positive DGE-IGE covariance ( and 0.718, and 0.767 for the pruning weight and citric acid, respectively: Table S3). GWAS of the the citric acid detected over a thousand of significant IGE SNPs but quantile-quantile plot showed the inflation of p-values (Fig. S7), indicating the risk of false positive signals. Only the shikimic acid did we find a significantly negative covariance (, χ² = 5.207, p = 0.023), but this trait showed the small contribution of IGEs compared to DGEs ( : Table S3). These results from climbing grapevines provide limited evidence for competitive interactions among neighboring individuals, highlighting the case in which intergenotypic competitions are difficult to detect.

Discussion

We demonstrated the feasibility of GWAS/GP of IGE based on the simulations and applications. Especially, we focused on the DGE-IGE covariance ρ₁₂ as a key parameter underlying the genetic trade-off between individual and group performance. Below, we first discuss methodological advantages and potential limitations in the proposed model. Then we discuss biological findings from the real data on woody plant species.

Model performance and limitation

By formulating SNP-wise IGEs based on the Ising model, we integrated oligogenic and polygenic IGEs into a multi-kernel mixed model. The present model can be exchanged into previous IGE models using the design matrix Z instead of K₂ and β_q,2. For example, Costa e Silva et al. [5] defined interacting neighbors by the design matrix, or incidence matrix Z. This approach can be implemented using the RAINBOWR package as it allows the flexible use of Z (Appendix S4). Furthermore, it seems plausible for plants and sessile organisms to assume weaker IGEs from distant neighbors [5,35]. With 0-1 weights, the design matrix may also accomodate the spatial distance decay of neighbor genotypic effects [35]. Examples of such distance decay include the inverse, exponential, or any other decay functions of Euclidean distance. Therefore, the proposed model can implement preceding IGE models once the neighbor incidence and its distance decay are made explicit.

While we applied our models to neighboring interactions in land plants, this approach can also be applied to animal herds or subpopulations. In this case, spatial genetic interactions based on the Ising model correspond to a population genetic model of frequency-dependent selection [25] (Appendix S3; Fig. S2). By applying the proposed model to cage-reared damselflies, previous studies indeed detected negative frequency-dependent selection on female color polymorphism [25]. Practically, this type of grouping can be achieved by setting neighbor spatial scales to infinity and subpopulations as a grouping factor in Neighbor GWAS [see 25 and its vignette]. Given that IGEs have often been analyzed to mitigate intraspecific competition in livestock [4,6], our method may offer a valuable opportunity to dissect the genetic architecture of group performance in both animals and plants.

Our simulations demonstrated the performance of the proposed model in GWAS and GP. In GWAS, models with or without the covariance ρ₁₂ showed a similar performance to detect causative IGE SNPs. This is likely because total BLUPs are very similar regardless of ρ₁₂ (R² > 0.92: Fig. S2), which may not make a large difference in the weighted kernel K′. In GP, the consideration of covariance improved the predictability especially when DGE and IGE are positively correlated, but not when they are negatively correlated. This might be due to modeling issues and technical limitation in the multi-kernel mixed model. When ρ₁₂ is positive, the weighted kernel K′ is likely to be positive semi-definite because all the variance component parameters , and ρ₁₂ were all positive. When ρ₁₂ is negative, however, K′ may not be positive semi-definite due to ρ₁₂ < 0. To anticipate this issue, we approximated K′ to the nearest positive-definite matrix in this study, which might have resulted in the unstable estimation of negative ρ₁₂. Further theoretical studies are needed for the quantitative improvement of the present model and its estimation.

Genetic architecture of tree competition

To demonstrate the model applicability, we analyzed the three woody species: aspen, apple, and grape. The aspen and apple exhibited negative genetic covariance between DGEs and IGEs on their growth. Notably, these negative effects of neighboring genotypes were amplified as trees matured. These results are consistent with the previous studies on Eucalyptus trees, which detected significant negative DGE-IGE covariance at the late growth stage [5]. In contrast, the climbing woody plants, grapevines, did not exhibit large negative effects of neighboring genotypes on their growth traits. Like other climbing plants, grapes grow vertically and may therefore experience weak competition with horizontal neighbors. Furthermore, grapes possess considerable phenotypic plasticity in response to light quality, such as ultraviolet and far-red wavelengths, to alter their stem length, leaf morphology, and metabolite content [36,37]. Together with the exceptional but plausible case of grapes, the parallel evidence from aspen and apples exemplifies inter-genotypic competition within a plant species.

The present data encompass phenological, chemical, and growth traits, though we did not observe a significantly negative DGE-IGE covariance for these traits. Regarding phenological traits, flowering time is often regulated by self-genotypes as shown by high heritability within a field site [38,39]. We detected substantial phenotypic variance explained by DGEs, but limited variance explained by IGEs, for flowering time across the three woody species (e.g., “BS, bud set” in Fig. 3; “Harvesting date” in Fig. S6; and “days.upto.veraison” in Fig. 5). This suggests that phenological traits are unlikely to be substantially explained by IGEs. In contrast to phenological traits, some chemical traits exhibited large phenotypic variance explained by IGEs, accompanied by a positive DGE-IGE covariance, suggesting synergistic effects of DGEs and IGEs on chemical properties. Specifically, metabolite accumulation in ripening fruits may be facilitated by volatile chemicals, such as ethylene [40,41]. Similarly, for leaf secondary metabolites, herbivore damage often activates jasmonate signaling and the production of green leaf volatiles, which can induce defense responses in neighboring trees [42,43]. These IGEs on specific metabolites provide testable hypotheses for future studies.

Proportion of phenotypic variation explained (PVE) by direct genetic effects (DGEs), indirect genetic effects (IGEs), and their covariance (cov) in aspen.
Blue (upper), green (middle), and red (lower) bars respectively indicate PVE by IGEs, DGEs, and covariance for each trait (see PVE_IGE, PVE_DGE, and PVE_cov calculations in the main text subsection “Model development”). The left and right panels present results from data collected in 2014 and 2015, respectively. Trait abbreviations: AvgBA, average basal area; AvgSize, average tree volume; GrowthYEAR1_YEAR2, growth of AvgSize from YEAR1 to YEAR2; BAI, basal area increment; AvgEFN, average extra-floral nectary; SDEFN, standard deviation of extra-floral nectary; TotEFN, total extra-floral nectary; SLA, specific leaf area; LA, leaf area; BB, first bud break; BS, terminal bud set; GS, growing season (i.e., BB to BS); CT, condensed tannin; PG, phenolic glycoside; TotalDefChem, total defense chemicals; CN, carbon-nitrogen ratio; and N, nitrogen.

Indirect genetic effects of conspecific neighbors on apple traits across the six sites (Belgium, BEL; France, FRA; Italy, ITA; Poland, POL; Spain, ESP; and Switzerland, CHE) and three years (2018, 2019, and 2020).
(a-f) Variation partitioning of the flowering intensity (a-b), trunk diameter (c-d), and trunk increment (e-f). The left panels (a, c and e) show the proportion of phenotypic variation explained (PVE) by direct genetic effects (DGEs), indirect genetic effects (IGEs), and their covariance (cov) following the aspen example (see Fig. 3). The right panels (b, d, and f) display PVE by the covariance against the mean trunk diameter per trial. A single data point corresponds to each field trial. Red circles highlights trials with significant . Solid and dashed lines indicate significant and non-significant trends detected by linear mixed models that include the sites and years as random effects. (g) GWAS manhattan plot of IGEs on the trunk increment at the Belgium site in 2019 (BEL2019). The y-axis [-log₁₀(p₂)] indicates the association score of β_q,2, which is plotted against 17 chromosomes of apple. The horizontal dashed line indicates the genome-wide threshold at FDR = 0.05.

Proportion of phenotypic variation explained (PVE) by direct genetic effects (DGEs), indirect genetic effects (IGEs), and their covariance (cov) in grape.
Blue (upper), green (middle), and red (lower) bars respectively indicate PVE by IGEs, DGEs, and covariance for each trait (see also Fig. 3). Trait abbreviations: “nb.clusters”, the number of clusters; “datever”, date of veraison; “nb.wshoots”, the number of woody shoots, “d13C”, δ¹³C; “glufru.norm”, the sum of glucose and fructose; “gluvrai.norm”, glucose; “fruvrai.norm”, fructose; “tartrate.norm”, tartaric acid; “malate.norm”, malic acid; “shik.norm”, shikimic acid; and “citr.norm”, citric acid.

In addition to variation partitioning, GWAS detected two site-specific but significant IGE SNPs with negative for the apple trunk increment. The negative sign of these SNPs suggests that neighbors with dissimilar alleles reduce the trunk increment of focal individuals, providing evidence for intergenotypic competition at allelic levels. These two significant SNPs exhibited a complete linkage disequilibrium (R² = 1) and were closely located on the top-to-center positions of seventh chromosome. For this chromosomal region, previous publications documented QTLs underlying DGEs on fruit size [44] and basal trunk increment [45]. More specifically, our significant SNPs range between 8743248 and 8744866 bp positions on the seventh chromosome, which include a locus encoding a protein kinase superfamily protein (MD07G1084900). This locus is homologous to AT5G56790 in A. thaliana, but its biological function remains unknown. The nearest loci of MD07G1084900 include the homologs of MYB33 (MD07G1085000; homolog of AT5G06100) and SUCROSE PHOSPHATE SYNTHASE 1F (MD07G1084800; homolog of AT5G20280). Although these two genes AT5G06100 and AT5G20280 are involved in the vegetative phase change [46] and leaf CO₂ assimilation [47] in A. thaliana, respectively, their functions have yet to be validated in apples. To this end, our study uncovered a QTL and candidate genes underlying intraspecific competition.

Neighbor GWAS is widely applicable to a broad range of phenotypes and species as far as spatial information is available alongside phenotype and genotype data. While the present data include individual trees as the unit of genotype [29,33,34], our method allows for defining a genotype unit as multiple individuals of a single genotype, such as a single plot of field crops [48]. It is important to note, however, that spatial metadata may not always be complete. Even with available spatial information, it is necessary to interpret whether intergenotypic interactions operate at a given spatial scale. Although our study found little evidence of grapevine competition, this lack of evidence may have resulted from sufficient spacing to preclude competition. Further research is needed to assess the generality and specificity of intraspecific competition mediated by IGEs.

Conclusion

The present study provides a flexible model to accommodate polygenic and locus-wise effects of neighboring genotypes on a complex phenotype. This single model enables both GP and GWAS of IGEs. The theory of IGEs has frequently involved the evolution of social interactions [2,49], while empirical applications have predominantly been conducted in domesticated animals and plants [6,50,51]. Consequently, genomics of IGEs promises to contribute to both fundamental and applied issues, including multi-level selection and agricultural applications. Recent research has highlighted the growing need for an interdisciplinary approach to IGEs [3,12,22]. With the increasing availability of genome-wide polymorphism data, our methodology may enhance interdisciplinary practice by facilitating GWAS/GP of IGEs across diverse organisms.

Methods

GWAS/GP implementation

We used the RAINBOWR (>v0.1.41) [26] and rNeighborGWAS (>v1.2.4) [24] packages to implement the multi-kernel mixed model (Eq. 1). Multi-kernel mixed models Eq. 1 were solved using restricted maximum likelihood (REML) method without the fixed effects β_q,1, β_q,2, and β_q,12. The statistical significance was determined using likelihood ratio tests by dropping ρ₁₂ out of the full model. The mixed models with or without ρ₁₂ were solved using the EM3.cpp or EM3.cov functions of the RAINBOWR package [26], respectively. When the kernels did not become semi-positive definite, they were approximated by the nearest positive definite matrix using the nearPD function in R (which was enabled with the “forceApproxK=TRUE” option in the EM3.cov function). For GWAS, each SNP locus was subjected to a likelihood ratio test using an eigenvalue decomposition and diagonalization method for the weighted kernel K′, which was implemented in the lmm.diago function of the gaston package [52]. This GWAS procedure followed the nei_imm function of the rNeighborGWAS package [24]. All source codes and input data are deposited on the GitHub and Zenodo repositories (see “Data accessibility” section), which includes the vignette and subroutine functions for GWAS/GP implementations. An easy R package to implement Eq. 1 is also available at GitHub and Zenodo repositories (see “Data accessibility” section).

Simulation

We conducted simulations to examine the performance of the proposed model. This involved (i) simulating genomes and phenotypes, (ii) assessing trait predictability and causal variant detection, and (iii) comparing these abilities between the previous and newer model. We utilized previously simulated genotype data [25]. Specifically, by simulating spatial and temporally varying selection over 2,000 generations among ten subpopulations, Sato et al. [25] generated 30 independent datasets, each comprising 2,000 diploid individuals with 50 kb nucleotide sequences across three chromosomes (see also Appendix S4 of Sato et al. [25]). This simulation was performed using SLiM version 3 [53] and its outputs were deposited as variant call format (.vcf) on Dryad (https://doi.org/10.5061/dryad.zs7h44jdv). Depending on the iterations, each genotype dataset contained approximately ten causal DGE SNPs out of ca. 2,500 SNPs (see Figure S4b and c of [25]). Out of wider parameter range and combinations than described in the present study, our previous study [24] identified conditions under which the model performance differentiated. Our base parameters therefore followed those of our previous study [24].

Variance component and genomic prediction

To test the model capability in the inference and prediction of polygenic IGEs, we simulated phenotypes under two scenarios, i.e., negative and positive covariance between polygebnic DGEs and IGEs. Specifically, we assumed the true parameters of ρ₁₂ = 0, +0.3, or ±0.6 with PVE_u = 0.6; w₁ = 0.5; w₂ = 0.5; and . With these true parameters, we randomly assigned 1,600 individuals to a 40 × 40 lattice space. Then we simulated a phenotype subjected to DEGs and IGEs from the nearest neighbors along diagonals and off diagonals (i.e., Euclidean distance from a focal individual). We repeated this line of simulations for the thirty genotype datasets as described above. After fitting the proposed model to the simulated genotypes and phenotypes, we calculated the median and mean absolute errors (MAEs) of the variance component parameters among the 30-times simulations to test whether these parameters could be estimated well. To further test the applicability to genomic prediction, we divided the simulated data into a training and test dataset. One on the tenth individuals were randomly left out of the simulated data as a test dataset, whereas the rest was fitted by the proposed model as a training dataset. To determine the trait predictability using the test dataset, we calculated the coefficient of determination (R²) and MAEs between the estimated and simulated phenotypic values.

Genome-wide association study

We also tested the model performance to detect causative IGE SNPs in GWAS. Our procedure followed Sato et al. [25] while simulating the negative or positive covariance between DGEs and IGEs as ρ₁₂ = ±0.6. The other parameters were the same as described above for GP. We also assumed total 20 causative IGE SNPs, of which ten had a positive coefficient as β_q,2 = 0.1 and the rest ten had a negative coefficient as β_q,2 = −0.1. To make DGE and IGE correlated, we further assumed that the causative IGE SNPs overlapped with DGE SNPs (i.e., β_q,1 = β_q,2 = ±0.1). In addition to the non-zero β_q,1 and β_q,2, a half of the causative IGE SNPs was assumed to have non-zero β_q,12 as β_q,12 = ±0.1. With or without ρ₁₂, the proposed model Eq. 1 was fitted to simulated phenotype y to estimate the weighted kinship matrix K′. Based on the RAINBOW algorithm [26], we applied the eigenvalue decomposition to K′ and thereby performed GWAS for β_q,2 using a likelihood ratio test. After model fitting, GWAS performance was evaluated based on the false positive and negative rate to detect causative IGE SNPs. The balance between false and true positive rates was calculated using the receiver-operating characteristic curve (ROC) and its area under the curve (AUC). Based on the ROC curves, we also determined the power to capture true positives at the stringent false discovery rate (FDR) = 0.05. We utilized the pROC package implemented in R [54] for the ROC and AUC analysis.

Application

Aspen

For the WisAsp data, we curated phenotype and genotype data from Barker et al. [29] and [30], respectively. These data were deposited at Dryad (https://doi.org/10.5061/dryad.fr045hv and https://doi.org/10.5061/dryad.st463). The annotation of genotype-phenotype link was obtained from Riehl et al. [31]. We set the cut-off value of minor allele frequency (MAF) at 0.05. In total, we analyzed data from 813 individuals in 2014, comprising 169,150 SNPs, and data from 826 individuals in 2015, comprising 169,111 SNPs. In this SNP matrix, 73% of the loci were homozygous while the rest 27% were heterozygous.

Using the curated aspen data, we performed variance component analysis and GWAS of IGEs. The influential area of neighbors was assumed for the nearest neighbors along the diagonal and off-diagonal direction, i.e., 1 Euclidean distance from a focal individual. We included experimental blocks as non-genetic covariates in all the mixed models. These analyses were separately performed for the year 2014 and 2015. For each trait per year, we tested the statistical significance of the covariance parameter ρ₁₂ over the model with ρ₁₂ = 0 based on the likelihood ratio test. According to the estimated variance components, we then conducted GWAS of β_q,2 based on the weighted kinship matrix K′. Unless otherwise stated, we determined the significance threshold by FDR for GWAS. We used the CalcThreshold function implemented in the RAINBOWR package [26] to calculate FDR for GWAS.

Apple

We curated the apple REFPOP data [32,33] available via the Data INRAE repository (https://doi.org/10.15454/IOPGYF, https://doi.org/10.15454/1ERHGX, and https://doi.org/10.15454/VARJYJ). The phenotype data deposited by Jung et al. [33] included 30 traits for 20,280 individuals of 534 accessions across six European sites (Belgium, France, Italy, Poland, Spain, and Switzerland). Ten of these traits were measured at one location, while the remaining 20 traits were available from at least two locations (cf. Fig. 1 of Jung et al. [32]). Genotype information of the 534 accessions is available as imputed SNP data between 20K and 480K array [32], yielding 281,140 SNPs at MAF > 0.05. In the SNP matrix, 66% of the loci were homozygous while the rest 34% were heterozygous. The imputed genotype data are available upon reasonable request to Michaela Jung and Helene Muranty as described in Jung et al. [32]. To screen candidate genes in GWAS, we referred to the whole genome assembly and annotation of Malus x domestica GDDH13 v1.1 [55] through the Apple Genome and Epigenome project website (https://iris.angers.inra.fr/gddh13/; Last accessed on 18-Feb-2026).

Using the curated apple data, we conducted variance component analysis and GWAS per year and site. In this dataset, individuals were assigned along rows and positions. Depending on the site, planting distance within and between tree rows ranged 0.9-1.3 m and 3.2-3.6 m, respectively [32]. Therefore, we considered the nearest neighboring interactions along the position within the tree rows. For each year and site, we then performed likelihood ratio test for ρ₁₂ and GWAS of β_q,2 as described for the aspen data above. To test yearly trends in PVE by DGE-IGE covariance (PVE_cov), we used linear mixed models in which the study years and sites were considered fixed and random effects, respectively.

Grape

We downloaded the set of grape phenotype and genotype dataset from the Data INRAE repository (https://doi.org/10.15454/8DHKGL), which was all archived by Flutre et al. [34]. These data comprise 3,642 individuals of 270 accessions with 90,007 SNPs at MAF > 0.05. Of these SNP loci, 79% were homozygous while the rest 21% were heterozygous. Nineteen traits were available, some exhibited numerous missing data points (Table S3) (see also Supplementary Fig. 1 of Flutre et al. [34]).

Using the retrieved grape data, we conducted variance component analysis and GWAS for each trait. In this dataset, individuals were assigned along ranks and locations in the field, where the ranks were separated 2.5 m each other while the locations were 1 m apart between individuals within a rank [34]. Therefore, we assumed he nearest neighboring interactions along locations with ranks considered a grouping factor. We also included the experimental blocks as non-genetic covariate in all the mixed models. Then we performed likelihood ratio test for ρ₁₂ and GWAS of β_q,2 as described for the aspen data above.

Supplementary Tables

Table S1. Variance component estimates and significant GWAS SNPs of IGEs in aspen. For 2014 or 2015, shown are the effective number of individuals with trait records (n), the total genetic variance , environmental variance , DGE and IGE weights ( and ), DGE-IGE covariance (), likelihood ratio test and its p-values over a model without the covariance ρ₁₂ (χ² and p), and the number of significant IGE SNPs (#sig. SNPs). See also the caption of Figure 3 for the trait abbreviations.

Table S2. Variance component estimates and significant GWAS SNPs of IGEs in apple. Shown are the effective number of individuals with trait records (n), the total genetic variance , environmental variance , DGE and IGE weights ( and ), DGE-IGE covariance , likelihood ratio test and its p-values over a model without the covariance ρ₁₂ (χ² and p), and the number of significant IGE SNPs (#sig. SNPs). The site codes are shown as follows: Belgium, BEL; France, FRA; Italy, ITA; Poland, POL; Spain, ESP; and Switzerland, CHE. NA means not available. ^†Not available due to false convergence. ^‡Option changed as “optimizer = nlminb” in the EM3.cov function.

Table S3. Variance component estimates and significant GWAS SNPs of IGEs in grape. Shown are the effective number of individuals with trait records (n), the total genetic variance , environmental variance , DGE and IGE weights ( and ), DGE-IGE covariance , likelihood ratio test and its p-values over a model without the covariance ρ₁₂ (χ² and p), and the number of significant IGE SNPs (#sig. SNPs). ^†Not available due to false convergence.

Supplementary Figures

Ising model-based simulation of spatial patterns shaped by the nearest neighbor interactions.
The lattice space with 40 x 40 cells shows the outcome of 100-times update of the genotype value after a random configuration of two genotypes (black and white cells) following the fixed effects of Eq. S1.1, i.e., (a) The case of β_q,2 = −0.2, β_q,1 = 0, β₀ = 0. (b) The case of positive β_q,2 = 0.2, β_q,1 = 0, β₀ = 0.

Numerical examples for the theoretical expectation between the phenotypic value and the neighboring frequency of A alleles f_q,A at q-th SNP locus within a local space k.
Black and grey lines indicate the phenotypic values of A and a alleles, respectively. Dashed curves represent a group mean defined by the weighted mean of the phenotypic values between A and a alleles. Panels (a-d) exemplify respective cases of positive and/or negative β_q,n(= β_q,12) and β_q,s×n(= β_q,2) with zero or non-zero β_q,s(= β_q,1) and β₀ = 1.0.

Variance components and genomic prediction of simulated data under the scenarios of moderate positive (ρ = 0.3: a-d), null (ρ = 0: e-h), or moderate nagative (ρ = −0.3: i-l) covariance between DGE and IGE.
(a, e and i) Estimated covariance between polygenic DGE and IGE. Horizontal dashed lines indicate the true ρ₁₂ value. (b, f, and j) Estimated weight of IGE relative to that of DGE . Models with or without the covariance (cov or noncov) were fitted to simulated phenotypes. Horizontal dashed lines indicate the true w₂ value. (c-d, g-h and k-l) Genomic prediction based on the models with or without the covariance (cov or noncov). The model performance was evaluated using the coefficient of determination (R²: c, g and k) and mean absolute error (MAE: d, h and l) between the predicted phenotypic values ŷ and the observed trait values of the test dataset y. Box plots display the median and quartiles among 30 independent iterations, with whiskers extending to 1.5 × inter-quartiles.

Coefficient of determination (R²) for the estimated phenotypic values ŷ between the models with or without the covariance parameter ρ₁₂.
Box plots are shown for the simulation scenarios with varying covariance between DGE and IGE (ρ = −0.6, −0.3,0,0.3,0.6). The high correlations (R² > 0.9) indicate the very similar ŷ between the two models.

Genomic prediction of the basal area increment from 2014 to 2015 (BAI2014_2015) in aspen.
The coefficient of determination (R²: a and b) and mean absolute error (MAE: c and d) of predicted phenotypic values ŷ among 10-fold cross-validations are shown for the data collected in 2014 (left) and 2015 (right).

Proportion of phenotypic variation explained (PVE) by indirect genetic effects (IGEs), direct genetic effects (DGEs), and their covariance (cov) in 20 apple traits other than the flowering intensity, trunk diameter, and trunk increment given in the main text.
Blue (upper), green (middle), and red (lower) bars respectively indicate PVE by IGEs, DGEs, and their covariance for each trait. Blank bars indicate the absence of data.

GWAS Manhattan (a) and quantile-quantile (b) plots for the citric acid (“citr.norm” in Fig. 5) in grape.
(a) The association score of −log₁₀(p) plotted against the 19 chromosomes of grape (where chromosome 100 means unanchored chromosomes). Horizontal dashed line indicates the genome-wide significance threshold at FDR = 0.05. (b) The observed association scores −log₁₀(p) plotted against randomly expected values. Trend line and shade indicate y = x and its 95% confidence interval.

Data availability

All source codes and input data are available in the GitHub (https://github.com/yassato/ngwas2) and Zenodo (https://doi.org/10.5281/zenodo.19362850) repositories. These repositories include a vignette that demonstrates how to implement relevant models using RAINBOWR, which can be viewed at https://yassato.github.io/ngwas2/. To make the present modeling easy, we provide an R package named 'rNeighborLMM' at GitHub (https://github.com/yassato/rNeighborLMM) and Zenodo (https://doi.org/10.5281/zenodo.19362843).

Acknowledgements

We thank M. Jung and H. Muranty for providing complete genotype data of the apple REFPOP project; H. Barker and J. Riehl for addressing our questions about the aspen field arrangement and data annotation; and H. Iwata, S.E. Wuest, and M. Roth for discussions during the study. We acknowledge that the computing resource was provided by Human Genome Center at the University of Tokyo, Japan (http://sc.hgc.jp/shirokane.html). We used Paperpal by Editage (https://paperpal.com/) to correct English, but did not use any generative AI to draft the manuscript.

Additional information

Funding

This study was supported by Japan Science and Technology Agency (Grant no. JPMJFR233L to YS) and by Japan Society for the Promotion of Science (JP25H00928 to YS and KH). KH was supported by RIKEN Special Postdoctoral Researchers (SPDR) program during the study.

Author contributions

YS: Conceptualization, Methodology, Software, Data curation, Investigation, Formal analysis, Validation, Visualization, Funding acquisition, Writing - Original Draft; KH: Conceptualization, Methodology, Software, Funding acquisition, Writing - Review & Editing.

Funding

MEXT | Japan Science and Technology Agency (JST)

https://doi.org/10.52926/JPMJFR233L

Yasuhiro Sato

MEXT | Japan Society for the Promotion of Science (JSPS) (JP25H00928)

Kosuke Hamazaki

Additional files

Supplementary Tables S1-S3

Appendix S1

Original model of Neighbor GWAS

In the main text, we modified the original Neighbor GWAS model to implement a flexible mixed model including polygenic and oligogenic IGEs. Here we present the original model [24] to highlight how to model locus-wise IGEs. Following the main text (Fig. 1), we modeled IGEs of neighboring genotypes at q-th SNP locus on focal k_i-th individual phenotype at the local space k as follows:

where x_q,i and x_q,j represent the genotype of focal i-th and neighbor j-th individuals in a local space k (where J_k is the total number of neighboring individuals). For simplicity, let us assume inbred lines with two homozygotes are encoded as aa = −1 and AA = +1 Then, the fixed effect β_q,2 indicates the influence of mean allelic (di)similarity [i.e., (−1)×(+1) = (+1)×(−1) = −1 or (+1)×(+1) = (−1)×(−1) = +1] between focal and all neighboring individuals on . This is analogous to the Ising model of ferromagnetics [27], in which β_q,2 represents physical interactions among the south/north dipole of neighboring magnets while β_q,1 is an external energy. By focusing on the fixed effects of Eq. S1.1, Figure S1 exemplifies spatial patterns shaped by neighbor genetic interactions over time, in which negative and positive β_q,2 favors spatial mixing and separation of two genotypes in a neighborhood (Fig. S1a and b, respectively). In real data, we aim to estimate β_q,1 and β_q,2 by regressing observed on .

In addition to the fixed effects β_q,1 and β_q,2, denotes random effects that represent polygenic self and neighbor genotypic effects, respectively. Using vectors and matrices, Eq. S1 can also be expressed as a mixed model as follows:

In the original model [24], u consists of additive polygenic effects from self and neighbor genotypes as . The residual e_i is defined as e_i ∈ e and . Z denotes a design matrix and can be an identify matrix I when all genotypes are different. The N × N variance-covariance matrices K₁ and K₂ are the same as defined in the main text (see “Model development”).

When the two random effects u₁ and u₂ are additive and independent, the total phenotypic variance can be decomposed as . Then, the total genetic and environmental variance can be partitioned as and . The proportion of phenotypic variation explained by additive DEGs and IGEs (PVE₁₊₂) is then given as PVE₁₊₂ = V_g/(V_g + V_e). Furthermore, the net contribution of IGEs to the phenotypic variation should be calculated as , where denotes the conventional SNP heritability. Although this conservative metric was needed due to a strong element-wise correlation between K₁ and K₂ [24], this issue would be addressed by allowing the covariance parameter ρ₁₂ = (−∞, ∞) to decrease or increase the total genetic variance in the present study (see Eq. 2).

Appendix S2

Interaction term between SNP-wise self and neighbor effects

The original model Eq. S1.1 implicitly assumes symmetric effects between focal and neighboring individuals at the locus q by . This assumption can be relaxed by introducing an interaction term between DGE and IGE as follows:

When genotypes values are encoded as {AA, Aa, aa} = {+1, 0, −1}, squared-genotype values turn +1 or 0. This leads to , thereby making Eq. S2.1 simpler as follows:

This Eq. S2.2 means that in the context of the Ising model, the statistical interaction term β_q,12 can be exchanged as a main effect [56]. For clarity, we redefine β_q,1 = β_q,s; β_q,12 = β_q,n; and β_q,2 = β_q,s×n, which yields:

Similarly, Eq. S2.3 can be extended as

where the three random effects u_s, u_n, and u_s×n can be respectively defined as . To correct for the population structure due to asymmetric neighbor effects, Eq S2.4 introduced the third random effect the original model. Similar to K_s = K₁ and K_s×n = K₂, we can define K_n as the N × N variance-covariance matrix that represents a polygenic self-by-neighbor genotypic effects on . This self-by-neighbor matrix K_n is also given by the cross-product as . The element of the N individuals × Q SNPs matrix X_n consists of covariates for the self-by-neighbor genotypic effects as . After estimating the three variance component parameters , and , we can obtain a weighted kernel as . Similar to GWAS based on Eq. 1 (see Methods), this weighted kernel matrix represents the total population structure due to polygenic self, neighbor, and self-by-neighbor effects. GWAS can be performed using an eigenvalue decomposition of the weighted kernel K′ and diagonalization method implemented in the lmm.diago function of the gaston package [52].

Appendix S3

Theoretical expectation of mean phenotypic values

In terms of evolutionary biology, the Neighbor GWAS model offers a straightforward regression method for population genetic models of frequency-dependent selection [25,56], also known as pairwise interaction models [28]. Specifically, the sign of β_q,2 represents positive or negative frequency-dependent selection when is a fitness-related phenotype [25]. In other words, a sort of Eq. S1.1 and S2.1 poses a selection gradient analysis of frequency-dependent selection to estimate β_q,2 by regressing phenotypes on genotypes and their spatial similarity [25].

By focusing on allele frequency within a local space k, we can clarify how the three locus-wise coefficients β_q,s, β_q,n, and β_q,s×n affect the focal phenotypic value [25]. Given that all the three datasets analyzed in this study included non-negligible fraction of heterozygous loci (see Methods), we assumed additive effects and encoded genotype values as {AA, Aa, aa} = {+1, 0, −1}. This holds the case 2 in Appendix S2 of Sato et al. [25]. According to Sato et al. [25], let us assume random mating and interactions among AA, Aa, and aa genotypes in the local space k where f_q,A denotes the frequency of A alleles at q-th SNP locus. The phenotypic values for AA, Aa, and aa genotypes are then given as follows:

. Then, the marginal phenotypic values at allele levels are given by their weighted mean as y_A = β₀ + f_q,ARβ_q,n + β_q,s×nSR2f_q,A − 1S + β_q,sf_q,A and y_a = β₀ + R1 − f_q,ASRβ_q,n − β_q,s×nSR2f_q,A − 1S − β_q,sR1 − f_q,AS (see also Appendix S2 of Sato et al. [25]). Furthermore, the group mean of is given by the weighted mean between the two genotypes as .

Figure S2 presents numerical examples of y_A, y_a and in response to f_q,A. When β_q,n < 0, the group mean decreases as increasing f_q,A (Fig. S2c and f). Contrarily, increases as increasing f_q,A when β_q,n > 0 (Fig. S2b and e). These two cases indicate that the sign of β_q,n determines a positive or negative response of the group mean to the selection for A alleles. Regarding the other coefficients, β_q,s modulates the intercept while β_q,s×n determines relative difference (i.e., frequency-dependent selection) between y_A and y_a and resulting shape of (i.e., convex or concave: Fig. S2a and d). These examples showcase the way in which the group mean decreases or increases in response to frequency-dependent selection on .

The proposed model Eq. 1 extends β_q,s, β_q,n, and β_q,s×n to accommodate their polygenic effects. As shown by Figure S2, the sign of β_q,n (= β_q,12) determines whether the group mean decreases or increases when the AA genotypes are selected to become rare or abundant in a group k. According to the interpretation of β_q,n (= β_q,12), the covariance ρ₁₂ determines negative or positive response of self-polygenic effects to the neighboring allele frequency (see also the main text Eq. 1). Taken together, the covariance ρ₁₂ can be interpreted as a key parameter that represents negatively or positively correlated responses between polygenic DGE and IGE. This interpretation aligns with DGE-IGE covariance in the ordidnal IGE model (see Appendix S4 below).

Appendix S4

Designating indirect genetic effects by design matrices

Other than our models, neighboring individuals can be specified using a design matrix or incidence matrix Z. Ordinal IGE models adopted this way to formulate polygenic DGEs and IGEs [5,11,57]. The ordinal IGE model can also be implemented using the RAINBOWR package as it allows users to define Z. Here, let the two design matrices be Z₁ and Z₂ for polygenic DGEs and IGEs, respectively. The ordinal model is then given as follows.

where

This ordinal model defines polygenic DGE and IGE by random effects u₁ and u₂ without fixed effects for each SNP. In the random effects, the kinship matrix K₁ is shared among the variance and covariance parameters , and ρ₁₂ as denoted by the Kronecker product ⊗. In this context, the sign of the covariance ρ₁₂ determines antagonistic (negative) or synergistic (positive) responses of focal and neighboring individuals to selection [57]. To achieve GWAS, previous studies tested SNP-wise fixed effects from focal and neighboring individuals with Eq. S4 set as a null model [e.g., 11,15]. Compared to the ordinal model (Eq. S4), the proposed model (Eq. 1.) is more extensible to incorporate SNP-wise fixed effects β_q,1, β_q,2, and β_q,12 while distinguishing variance-covariance matrices among K₁, K₂, K₁₂ and K₂₁.

Significance of findings

Strength of evidence

Abstract

Introduction

Results

Model development

Overview of the proposed mixed model incorporating oligogenic and polygenic indirect genetic effects.

Benchmark simulation

Benchmark simulations under the scenarios of positive or negative covariance between DGE and IGE.

Application to real data

Discussion

Model performance and limitation

Genetic architecture of tree competition

Proportion of phenotypic variation explained (PVE) by direct genetic effects (DGEs), indirect genetic effects (IGEs), and their covariance (cov) in aspen.

Indirect genetic effects of conspecific neighbors on apple traits across the six sites (Belgium, BEL; France, FRA; Italy, ITA; Poland, POL; Spain, ESP; and Switzerland, CHE) and three years (2018, 2019, and 2020).

Proportion of phenotypic variation explained (PVE) by direct genetic effects (DGEs), indirect genetic effects (IGEs), and their covariance (cov) in grape.

Conclusion

Methods

GWAS/GP implementation

Simulation

Variance component and genomic prediction

Genome-wide association study

Application

Aspen

Apple

Grape

Supplementary Tables

Supplementary Figures

Ising model-based simulation of spatial patterns shaped by the nearest neighbor interactions.

Numerical examples for the theoretical expectation between the phenotypic value and the neighboring frequency of A alleles fq,A at q-th SNP locus within a local space k.

Variance components and genomic prediction of simulated data under the scenarios of moderate positive (ρ = 0.3: a-d), null (ρ = 0: e-h), or moderate nagative (ρ = −0.3: i-l) covariance between DGE and IGE.

Coefficient of determination (R2) for the estimated phenotypic values ŷ between the models with or without the covariance parameter ρ12.

Genomic prediction of the basal area increment from 2014 to 2015 (BAI2014_2015) in aspen.

Proportion of phenotypic variation explained (PVE) by indirect genetic effects (IGEs), direct genetic effects (DGEs), and their covariance (cov) in 20 apple traits other than the flowering intensity, trunk diameter, and trunk increment given in the main text.

GWAS Manhattan (a) and quantile-quantile (b) plots for the citric acid (“citr.norm” in Fig. 5) in grape.

Data availability

Acknowledgements

Additional information

Funding

Author contributions

Funding

Additional files

Appendix S1

Original model of Neighbor GWAS

Appendix S2

Interaction term between SNP-wise self and neighbor effects

Appendix S3

Theoretical expectation of mean phenotypic values

Appendix S4

Designating indirect genetic effects by design matrices

References

Article and author information

Author information

Yasuhiro Sato

Kosuke Hamazaki

Author Notes

Version history

Cite all versions

Copyright

Metrics

Numerical examples for the theoretical expectation between the phenotypic value and the neighboring frequency of A alleles f_q,A at q-th SNP locus within a local space k.

Coefficient of determination (R²) for the estimated phenotypic values ŷ between the models with or without the covariance parameter ρ₁₂.