Dissecting oligogenic and polygenic indirect genetic effects through the lens of neighbor genotypic identity

Yasuhiro Sato; Kosuke Hamazaki

doi:10.7554/eLife.111650.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Jonathan Flint
University of California, Los Angeles, Los Angeles, United States of America
Senior Editor
Detlef Weigel
Max Planck Institute for Biology Tübingen, Tübingen, Germany

Reviewer #1 (Public review):

This study presents a new model of phenotypic variation incorporating direct and indirect genetic effects, as well as a new implementation (RAINBOWR) for quantification, genomic prediction and GWAS. It includes a simulation study to test the model and implementation, and three applications to plant species.

The abstract describes the main novelty and significance of the study as follows: "Recent studies have utilized high-resolution polymorphism data to enable genomic prediction (GP) and genome-wide association study (GWAS) of IGEs, but unified methods remain limited". I disagree with this statement (e.g., using ASREML: https://doi.org/10.1186/s12711-018-0409-7, using LIMIX: https://doi.org/10.1186/s13059-021-02415-x; etc.).

The parameterisation of genetic effects in the model is non-standard and complex. Hence, the simulation study is key, and the results need to be presented in a very rigorous manner. I have several points to make on this:

(1) L172 says the estimated parameters are "close to" the real parameters. The results of the simulation study need to be quantitative (see https://www.biorxiv.org/content/10.64898/2026.03.10.710784v1.supplementary-material for example).

(2) Figure 2h: the estimates seem to be biased, no?

(3) Figure 2 in general: why isn't there a difference between cov and noncov? Do we not expect the inclusion or non-inclusion of a covariance term to affect the other genetic parameters and the results presented in Figure 2?

(4) Does "total BLUPs were highly correlated between models with and without 𝜌" really validate the model?

(5) As far as the GWAS is concerned, the results of the simulation study should include a figure showing whether the p-values are inflated (as observed in the grape application), and not just a ROC curve.
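As an illustration of point (5), inflation of GWAS p-values is commonly summarised by the genomic inflation factor (the median observed 1-d.f. chi-square statistic divided by its expected median under the null) alongside a QQ plot. The sketch below is not taken from the manuscript or from RAINBOWR; the function names `genomic_inflation` and `qq_points` are hypothetical, and p-values are assumed to lie in (0, 1].

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 d.f. (about 0.4549),
# obtained from the standard normal quantile at 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC for a list of p-values in (0, 1]."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-d.f. chi-square statistic via z^2,
    # where z is the normal quantile at p/2.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot."""
    n = len(p_values)
    observed = sorted(p_values)
    return [
        (-math.log10((i + 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed)
    ]
```

A value near 1 is what well-calibrated null p-values would give; values well above 1 indicate inflation of the kind the reviewer asks the authors to report for the simulations.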

The model only includes IID residuals, whereas the importance of including non-genetic social effects (IEE) has been demonstrated in many settings, and other IGE plant studies have used sophisticated spatially structured residuals (e.g. 10.1111/nph.12035). Can the authors justify why they considered only IID residuals? In the three applications presented, wouldn't it be appropriate to include spatially structured residuals and potentially other relevant covariates?

It remains unclear why the authors chose such an unconventional parameterisation of the DGE IGE models for the questions asked in this study. It seemed appropriate to study frequency-dependent selection (previous paper), but for this study, focused on IGE quantification and GWAS, the classical models (e.g. early models by P. Bijma but also more recent models that allow for distance-dependent IGE) seem appropriate: they are much simpler and easier to interpret, and have been validated in many settings. The Discussion paragraph L274-284 only strengthens my doubts.

https://doi.org/10.7554/eLife.111650.1.sa2

Reviewer #2 (Public review):

Summary:

In this study, Sato and Hamazaki have expanded upon previous work, describing quantitative genetic models for direct and indirect genetic effects and applying them to both simulated and real plant datasets of three different tree species. The methods are clearly described and accompanied by a number of R packages freely available to the wider community.

Strengths:

The main strength lies in the joint modelling of DGE, IGE and their covariance while also simultaneously modelling single-SNP fixed effects (including SNP interactions across neighbours) and a polygenic effect that goes beyond a simple kinship correction as found in many traditional GWAS models, to a compound kinship structure that accounts for DGE, IGE and their interaction.

Weaknesses:

There were some aspects that deserved more attention from the authors. For example, the authors found that a very large amount of phenotypic variation in citric acid content in grapes was explained by neighbour identity, along with over 1000 significant SNPs, yet there was little to no discussion of this result and how it could have arisen (apart from some mention of volatiles and ethylene - but without being explicit on the mechanism here). The simulation study also only considered the scenario of equal direct and indirect genetic variances, while previous studies, as well as the 3 real datasets presented in this study, show that DGE variance is almost always larger than IGE variance. A simulation study cannot be exhaustive, of course, but it seems more likely that in reality and for most traits, IGE will be more difficult to detect than DGE.
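The scenario raised above, in which IGE variance is smaller than DGE variance, could be added to a simulation grid along the following lines. This is a minimal sketch under strong simplifying assumptions that are not the authors' simulation design: individuals are treated as unrelated (no kinship matrix), and the names `draw_dge_ige` and `sample_corr` are hypothetical.

```python
import math
import random
import statistics


def draw_dge_ige(n, var_dge, var_ige, rho, seed=0):
    """Draw n (u1, u2) pairs from a bivariate normal.

    u1 has variance var_dge, u2 has variance var_ige, and their
    correlation is rho. Individuals are independent (assumption).
    """
    rng = random.Random(seed)
    sd1, sd2 = math.sqrt(var_dge), math.sqrt(var_ige)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u1 = sd1 * z1
        # Mix z1 into u2 so that corr(u1, u2) equals rho.
        u2 = sd2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((u1, u2))
    return pairs


def sample_corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical grid: IGE variance equal to, half of, and a tenth of DGE variance.
grid = {ratio: draw_dge_ige(1000, 1.0, ratio, rho=-0.5) for ratio in (1.0, 0.5, 0.1)}
```

Power to detect IGE could then be compared across the grid entries rather than only in the equal-variance case.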

https://doi.org/10.7554/eLife.111650.1.sa1

Reviewer #3 (Public review):

Summary:

The authors aimed at studying the genetics of interactions between individuals, notably the genetic architecture of indirect genetic effects. For that, they mobilized a technique known as a "genome-wide association study" (GWAS). GWASs are typically formalized as linear mixed models (LMMs) with fixed effects to identify the oligogenic component of the genetic architecture (usually SNPs tested one by one, as done here), and with random effects to quantify the overall contribution of the polygenic component of the genetic architecture (using a kinship matrix). They used an LMM with a few corrections and improvements from one of their already-published models, assessed it on data they had already simulated in a previous work, and applied it to three datasets generated and originally analyzed by others, focusing only on direct genetic effects. The results on simulated data confirmed that it was necessary to adapt their previous model. The results on real data confirmed the presence of negative correlation between direct and indirect genetic effects (for two out of three species), as was already known from other studies. They found a few SNPs with significant, indirect effects, which led them to identify candidate genes, but they did not validate them.

Strengths:

The main strength of the manuscript lies in the question tackled by the authors, i.e., related to indirect genetic effects, with the ambition to go beyond the estimation of overall effects towards the distinction between polygenic and oligogenic components of genetic architecture. They also found, in an apple dataset, a significant IGE SNP that also happens to be in a DGE-associated region.

Weaknesses:

(1) Overall, the authors do not engage sufficiently with the existing literature, and do not provide strong evidence that their approach is more powerful or more interpretable than others. Hence, this work seems rather incremental.

(2) The authors used an LMM that corresponds to a previous LMM they already published in 2021, with a few changes that appeared more like corrections than improvements. Their model raised several questions.

(3) First of all, their previous model included the polygenic component of direct genetic effects (modeled as random with a kinship matrix), but not the polygenic component of indirect genetic effects. As a consequence, the initial model did not allow both direct and indirect genetic effects to be correlated, although this correlation is the hallmark of the topic: a negative correlation can lead to selection on direct effects only to deliver a negative genetic gain (Griffing, 1967). This was corrected in their new model here, so that it is similar in this respect to the other models. They highlighted that, on simulated data, their new model could "infer a trade-off between DGEs and IGEs", but that was the very goal of introducing the correlation parameter, so it was reassuring at least to know that they could estimate it on simulated data. On real data, they found evidence for it being negative, which was already the case in Cappa and Cantet (2008) for a tree species, in Haug et al (2023) for annual crops, in Montazeaud et al (2023) for A. thaliana, etc. They tested for significance but did not provide any confidence interval. They showed the proportion of variance explained by the covariance, but did not discuss the sign or magnitude of this correlation.

(4) Although the authors included a correlation parameter between DGE and IGE in their updated model, they did not specify if the residual errors were correlated, too. In fact, they did not even specify a distribution for them. It is already known that allowing for correlated errors may not change the estimates (Haug et al, 2021), but in some settings it can be important (Bergsma et al, 2008).

(5) In appendix S4, they say that the "ordinal" model (I am not sure of what they meant by this word) "defines polygenic DGE and IGE by random effects without fixed effects for each SNP". However, this is not correct; see Baud et al (2021), for instance. In any LMM, it is straightforward to include a single fixed effect for a given SNP, and to do it one SNP at a time. Moreover, they claimed that "compared to the ordinal model (Equation S4), the proposed model (Equation 1) is more extensible to incorporate SNP-wise fixed effects while distinguishing variance-covariance matrices", without providing more evidence than this statement.

(6) The authors seemed keen to convince us that the fact that their model is analogous to the Ising model of ferromagnetism was an advantage in itself. But why would it be? Beyond the mere analogy, it should be a matter of modelling choice, and thus be clearly motivated. For instance, they chose to assess the strength of the association between the trait in the focal individual (y_{k_i}) and the average (dis)similarity between the focal individual and all its neighbors (in neighborhood k), calling the latter "indirect genetic effect". Moreover, it is not clear if what they called "IGE" is \beta_{q,2}, u_2, both, or also \beta_{q,12}, etc? Furthermore, they should have used another term as this is not the same as the "indirect genetic effects" of the other models. In these models, what is called the indirect genetic effects can be modeled as depending on group size (see Hadfield and Wilson, 2007; Bijma, 2010). In what sense would the approach of the authors be better? How does it relate to the other models? Do they have more power? Is their term more interpretable?

(7) Another way in which the authors' model may be different from the other models is in the way it models interactions between direct genetic effects and aggregate (dis)similarity between focal and neighbors. At the level of the polygenic components, other models simply have a (DGExIGE) term capturing the deviations from the additivity of DGE + IGE (e.g., Wright, 1985, in the multispecific context). Here, the authors indeed mentioned "interactions between polygenic DGEs and IGEs" and introduced the K_12 matrix, but it is not clear how different (or similar) it is from the more classical (DGExIGE) term. At the level of the oligogenic component, the authors introduced \beta_{q,12}, but it is not clear, to me at least, how it relates to K_12 and K_21.

(8) The authors checked their model on simulated data for various levels of correlation between u_1 (DGE) and u_2.

(9) It is not clear why they have higher absolute errors with negative covariance than with a positive one.

(10) As a causative IGE SNP, the authors considered one with a \beta_{q,2} significantly different from 0. However, they also have two other coefficients, \beta_{q,1} and \beta_{q,12}, for each SNP q. How is the FDR in RAINBOW controlled in such a case? This is not detailed.

(11) In their simulations, the causative IGE SNPs were also causative DGE SNPs. However, this may increase power. From the manuscript title, one could assume that the authors' goal was to distinguish between the SNPs that are both DGE and IGE, versus the ones that are IGEs only.

(12) From what I understood, the authors first estimated the (co)variance components once and for all on the model without any SNP, and they then used the values to fit the GWAS model one SNP at a time. This assumes that the inclusion of SNP effects modeled as fixed would not change anything regarding the (co)variance components, but this is not warranted.

(13) The authors applied their model to three datasets of perennial plants.

(14) They only used their model and did not provide evidence that their model gave a significant improvement compared to other models, such as the one of Baud et al (2021).

(15) In Figures 3, 4 and 5, having an indication of which cases have a significant correlation between u1 and u2 would have helped.

(16) Concerning the Aspen dataset, it is not clear why the authors claimed that "the negative effects of neighboring genotypes were amplified as trees matured" as the PVE_cov in Figure 3 in 2015 are not systematically more negative than those of Figure 3 in 2014.

(17) When discussing their results, the authors should engage more with the literature estimating DGE-IGE correlations (see some of the references above).

(18) Concerning the apple dataset, they mentioned that "metabolite accumulation in ripening fruits may be facilitated by volatile chemicals, such as ethylene", but they did not find any evidence for significant IGE SNPs localized close to a gene involved in ethylene production. The claim that these are testable hypotheses should have been made earlier, in the introduction, rather than a posteriori in the discussion.
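To make the question in point (10) concrete: when each SNP yields three tests (\beta_{q,1}, \beta_{q,2}, \beta_{q,12}), the set of SNPs called significant can depend on whether the FDR is controlled within each coefficient or across all tests pooled. The sketch below uses the standard Benjamini-Hochberg step-up procedure on made-up p-values. It is only an illustration of why the choice needs to be stated; it does not describe what the authors or their software actually do, and all names and numbers are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values per SNP, ordered as (beta_1, beta_2, beta_12).
p_by_snp = {
    "snp_a": (1e-6, 0.20, 0.60),
    "snp_b": (0.03, 1e-4, 0.45),
    "snp_c": (0.50, 0.70, 0.90),
}

# Option 1: one family containing all nine tests.
pooled = [p for triple in p_by_snp.values() for p in triple]
pooled_calls = benjamini_hochberg(pooled)

# Option 2: a separate family for each coefficient.
dge_calls = benjamini_hochberg([t[0] for t in p_by_snp.values()])
ige_calls = benjamini_hochberg([t[1] for t in p_by_snp.values()])
```

In this toy input, the \beta_{q,1} test of `snp_b` (p = 0.03) is rejected within the DGE-only family but not in the pooled family, so the two conventions give different lists of significant SNPs.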

https://doi.org/10.7554/eLife.111650.1.sa0
