Sibling Similarity Can Reveal Key Insights Into Genetic Architecture

  1. Department of Cellular Biology, SUNY Downstate Health Sciences, Brooklyn, NY, USA
  2. Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, 1 Gustave L Levy Pl, New York, NY, USA

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Alexander Young
    University of California, Los Angeles, Los Angeles, United States of America
  • Senior Editor
    Detlef Weigel
    Max Planck Institute for Biology Tübingen, Tübingen, Germany

Reviewer #1 (Public Review):

The authors sought to craft a method, applicable to biobank-scale data but without necessarily using genotyping or sequencing, to detect the presence of de novo mutations and rare variants that stand out from the polygenic background of a given trait. Their method depends essentially on sibling pairs where one sibling is in an extreme tail of the phenotypic distribution and whether the other sibling's regression to the mean shows a systematic deviation from what is expected under a simple polygenic architecture.
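The regression-to-the-mean expectation described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' implementation: phenotypes are standardized, mating is random, there is no shared environment, and the trait is purely additive, so the sibling correlation is h2/2 and the expected value of a sibling given an index sibling with value s1 is (h2/2) * s1. The function names, the heritability of 0.5, and the 1% tail threshold are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_sib_pairs(n_pairs, h2, rng):
    """Simulate standardized phenotypes for sibling pairs under an
    additive polygenic model (assumed: no shared environment)."""
    # Half of the additive genetic variance is shared between full siblings.
    shared = rng.normal(0.0, np.sqrt(h2 / 2), n_pairs)

    def one_sibling():
        # The other half comes from independent Mendelian segregation.
        segregation = rng.normal(0.0, np.sqrt(h2 / 2), n_pairs)
        environment = rng.normal(0.0, np.sqrt(1 - h2), n_pairs)
        return shared + segregation + environment

    return one_sibling(), one_sibling()


def tail_regression_check(s1, s2, h2, tail=0.01):
    """Compare the mean phenotype of siblings of upper-tail index
    individuals with the polygenic null expectation."""
    cutoff = np.quantile(s1, 1 - tail)
    in_tail = s1 >= cutoff
    observed = s2[in_tail].mean()
    expected = (h2 / 2) * s1[in_tail].mean()
    return observed, expected


rng = np.random.default_rng(0)
s1, s2 = simulate_sib_pairs(200_000, h2=0.5, rng=rng)
observed, expected = tail_regression_check(s1, s2, h2=0.5)
print(f"observed sibling mean: {observed:.3f}, null expectation: {expected:.3f}")
```

Under this null simulation the two values should agree up to sampling noise; a systematic gap between them in real data is the kind of deviation the method looks for.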

Their method is successful in that it builds on a compelling intuition, rests on a rigorous derivation, and seems to show reasonable statistical power in the UK Biobank. (More biobanks of this size will probably become available in the near future.) It is somewhat unsuccessful in that rejection of the null hypothesis does not necessarily point to the favored hypothesis of de novo or rare variants. The authors discuss the alternative possibility of rare environmental events of large effect. Maybe attention should be drawn to this in the abstract or the introduction of the paper. Nevertheless, since either of these possibilities is interesting, the method remains valuable.

Reviewer #2 (Public Review):

Souaiaia et al. attempt to use sibling phenotype data to infer aspects of genetic architecture affecting the extremes of the trait distribution. They do this by considering deviations from the expected joint distribution of siblings' phenotypes under the standard additive genetic model, which forms their null model. They ascribe excess similarity compared to the null to rare variants shared between siblings (which they term 'Mendelian') and excess dissimilarity to de novo variants. While this is a nice idea, there can be many explanations for rejection of their null model, which clouds interpretation of Souaiaia et al.'s empirical results.

The authors present their method as detecting aspects of genetic architecture affecting the extremes of the trait distribution. However, I think it would be better to characterize the method as detecting whether siblings are more or less likely to be aggregated in the extremes of the phenotype distribution than would be predicted under a common variant, additive genetic model.

Exactly how the rareness and penetrance of a genetic variant influence the conditional sibling phenotype distribution at the extremes is not made clear. The contrast between de novo and 'Mendelian' architectures is somewhat odd since these are highly related phenomena: a 'Mendelian' architecture could be due to a de novo variant of the previous generation. The fact that these two phenomena are surmised to give opposing signatures in the authors' statistical tests seems suboptimal to me: would it not be better to specify a parameter that characterizes the degree of sharing between siblings of rare factors of large effect? This could be related to the mixture components in the bimodal distribution displayed in Fig 1. In fact, won't the extremes of all phenotypes be influenced by all three types of variants (common, rare, de novo) to greater or lesser degree? By framing the problem as a hypothesis testing problem, I think the authors are obscuring the fact that the extremes of real phenotypes likely reflect a mixture of causes: common, de novo, and rare variants (and shared and non-shared environmental factors).
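The idea of a single sharing parameter can be sketched in a toy simulation. Everything here is an illustrative assumption rather than part of the manuscript or the review: the function name, the parameter `share_prob` (the probability that a sibling carries the index sibling's rare large-effect factor), the carrier frequency, the effect size, and the use of the simulated heritability rather than an estimated one. Setting `share_prob` to 0 mimics a de novo-like factor, while 0.5 mimics a rare variant inherited from one heterozygous parent.

```python
import numpy as np


def sibling_tail_gap(share_prob, n_pairs=200_000, h2=0.5,
                     carrier_freq=0.01, effect=4.0, tail=0.01, seed=0):
    """Return (observed - expected) sibling mean for upper-tail index
    individuals, where 'expected' is the polygenic null (h2/2) * s1."""
    rng = np.random.default_rng(seed)

    # Polygenic background: siblings share half the additive variance.
    shared = rng.normal(0.0, np.sqrt(h2 / 2), n_pairs)
    # Unshared segregation variance plus environment, per sibling.
    unshared_sd = np.sqrt(h2 / 2 + (1 - h2))
    s1 = shared + rng.normal(0.0, unshared_sd, n_pairs)
    s2 = shared + rng.normal(0.0, unshared_sd, n_pairs)

    # Rare factor of large effect in the index sibling; the other
    # sibling carries it with probability share_prob (hypothetical).
    carrier1 = rng.random(n_pairs) < carrier_freq
    carrier2 = carrier1 & (rng.random(n_pairs) < share_prob)
    s1 = s1 + effect * carrier1
    s2 = s2 + effect * carrier2

    in_tail = s1 >= np.quantile(s1, 1 - tail)
    expected = (h2 / 2) * s1[in_tail].mean()
    return s2[in_tail].mean() - expected


for p in (0.0, 0.5):
    print(f"share_prob={p}: observed minus expected = {sibling_tail_gap(p):+.3f}")
```

In this toy setting, unshared factors leave siblings of tail individuals closer to the population mean than the polygenic null predicts, while shared factors leave them further from it. Intermediate values of `share_prob` can move the gap toward zero, which is one way to see why a mixture of causes may be hard to separate with a two-sided hypothesis test.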

To better enable interpretation of the results of this method, a more comprehensive set of simulations is needed. Factors that may influence the conditional distribution of siblings' phenotypes beyond those considered include: non-normal distribution, assortative mating, shared environment, interactions between genetic and shared environmental factors, and genetic interactions.

In summary, I think this is a promising method that is revealing something interesting about extreme values of phenotypes. Determining exactly what is being revealed is going to take a lot more work, however.

Author Response:

The major criticism from the reviewers is that factors other than high-impact rare variants – such as environmental factors or epistasis – could have produced the complex tail architecture that we test for and detect. While we did explain this point in the Discussion, we agree with the reviewers that this should have been emphasized more and earlier in the manuscript.

Regarding suggestions for more complex simulations and methods, we absolutely agree that much more work is needed here to produce optimised inference of all the causes of complex tail architecture. We are performing multiple projects at various stages of completion that we hope will contribute to this, but we felt that this was a good stopping-point in this project to publish what we had completed so far, in order to: (1) introduce the idea of inferring complex genetic architecture from siblings without requiring genetic data, (2) outline an initial theoretical framework for inferring complex tail architecture from sibling data, (3) provide simple tests powered to identify enrichments of de novo or ‘Mendelian’ variants in the tails (albeit tests that make several strong simplifying assumptions), and (4) enable others interested in the topic to build upon this work now. However, we plan to expand our simulations and analyses in a revised manuscript based on reviewer feedback.

We thank the reviewers for their comments about the value of our work, its mathematical robustness and the promise of our method.
