Sibling similarity can reveal key insights into genetic architecture

  1. Tade Souaiaia  Is a corresponding author
  2. Hei Man Wu
  3. Clive Hoggart  Is a corresponding author
  4. Paul F O'Reilly  Is a corresponding author
  1. Department of Cellular Biology, SUNY Downstate Health Sciences, United States
  2. Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, United States

eLife assessment

The authors present a solid statistical framework for using sibling phenotype data to assess whether there is evidence for de novo or rare variants causing extreme trait values. Their valuable method is promising and will be of interest to researchers studying complex trait genetics.

https://doi.org/10.7554/eLife.87522.3.sa0

Abstract

The use of siblings to infer the factors influencing complex traits has been a cornerstone of quantitative genetics. Here, we utilise siblings for a novel application: the inference of genetic architecture, specifically that relating to individuals with extreme trait values (e.g. in the top 1%). Inferring the genetic architecture most relevant to this group of individuals is important because they are at the greatest risk of disease and may be more likely to harbour rare variants of large effect due to natural selection. We develop a theoretical framework that derives expected distributions of sibling trait values based on an index sibling’s trait value, estimated trait heritability, and null assumptions that include infinitesimal genetic effects and environmental factors that are either controlled for or have combined Gaussian effects. This framework is then used to develop statistical tests powered to distinguish between trait tails characterised by common polygenic architecture from those that include substantial enrichments of de novo or rare variant (Mendelian) architecture. We apply our tests to UK Biobank data here, although we note that they can be used to infer genetic architecture in any cohort or health registry that includes siblings and their trait values, since these tests do not use genetic data. We describe how our approach has the potential to help disentangle the genetic and environmental causes of extreme trait values, and to improve the design and power of future sequencing studies to detect rare variants.

Introduction

The fields of quantitative genetics and genetic epidemiology have exploited the shared genetics and environment of siblings in many applications, notably to estimate heritability, using theory developed a century ago (Fisher, 1919; Lush, 1940), and more recently to infer a so-called ‘household effect’ (Selzam et al., 2018), which contributes to genetic risk indirectly via correlation between genetics and household environment. Here, we leverage sibling trait data to infer genetic architecture departing from polygenicity, specifically affecting the tails of the trait distribution and consistent with enrichment in rare variants of large effect.

The genetic architecture of a complex trait is typically inferred from findings of multiple independent studies: genome-wide association studies (GWAS) identifying common variants (Yengo et al., 2022; Selvaraj et al., 2022), whole-exome or whole-genome sequencing studies detecting rare variants (Singh et al., 2022), and family sequencing studies designed to identify de novo and rare Mendelian mutations (Iyegbe and O’Reilly, 2022). The relative contribution of each type of variant to trait heritability is a function of historical selection pressures on the trait in the population (Koch et al., 2024; Uricchio, 2020). If selection has recently acted to increase the average value of a trait, then the lower tail of the trait distribution may become enriched for large-effect rare variants over time because trait-reducing alleles will be subject to negative selection, and so those with large effects – most likely to ‘push’ individuals to the lower tail – will be reduced to low frequencies (Corte et al., 2023). Similarly, if the trait is subject to stabilising selection, then both tails of the trait distribution may be enriched for rare variant aetiology (Momozawa and Mizukami, 2021). This can result in less accurate polygenic scores in the tails of the trait distribution (Chan et al., 2011) and can also produce dissimilarity between siblings beyond what is expected under polygenicity. For example, studies on the intellectual ability of sibling pairs have demonstrated similarity for average intellectual ability (Shakeshaft et al., 2015), regression-to-the-mean for siblings at the upper tail of the distribution (Shakeshaft et al., 2015), and complete discordance when one sibling is at the lower extreme tail of the distribution (Reichenberg et al., 2016). However, no theoretical framework has been developed to formally infer genetic architecture from sibling trait data.

We introduce a novel theoretical framework that allows widely available sibling trait data in population cohorts and health registries to be leveraged to perform statistical tests that can infer complex genetic architecture in the tails of the trait distribution. These tests are powered to differentiate between polygenic, de novo, and Mendelian (i.e. rare variants of large effect) architectures (see Figure 1); while these are simplifications of true complex architectures, our tests allow enrichments of the different architectures to be inferred and compared across traits. This framework establishes expectations about the trait distributions of siblings of index individuals with extreme trait values (e.g. the top 1% of the trait) according to null assumptions of polygenicity, random mating, and environmental factors that are either controlled for or have combined Gaussian effects. Critical to our framework is our derivation of the ‘conditional sibling trait distribution’, which describes the trait distribution for one individual given the quantile value of one or more ‘index’ siblings. Our statistical framework, derivation of the conditional sibling trait distribution, and simulation study allow us to develop statistical tests to infer genetic architecture from data on siblings (Hur et al., 2013) without the need for genetic data (see ‘Model and methods’). We validate the statistical power of our tests using simulated data and apply our tests across a range of traits from the UK Biobank. We also release sibArc, an open-source software tool that can be used to apply our tests to sibling trait data (sibArc:Software for Inference of Genetic Architecture, 2024). Our novel framework can be extended to applications such as estimating heritability, characterising mating patterns, and inferring historical selection pressures.

Sibling similarity under different tail architectures.

Left to right: When an individual’s extreme trait value (top 1%) is due to many alleles of small effect (‘polygenic’), then their siblings’ trait values are expected to show regression-to-the-mean (grey). When an individual’s extreme trait value is due to a de novo mutation of large effect, then their siblings are expected to have trait values that correspond to the background distribution (green). When an individual’s extreme trait value is the result of an inherited rare allele of large effect (‘Mendelian’), then their siblings are expected to have either similarly extreme trait values or trait values that are drawn from the background distribution (red), depending on whether or not they inherited the same large effect allele.

Model and methods

Here, we outline a framework that models the conditional sibling trait distribution, which describes an individual’s trait distribution conditioned on an index sibling’s trait value. In this section, we describe our derivation of the distribution for a completely polygenic trait, outline the development of statistical tests that infer enrichments of rare genetic architectures via departures from polygenicity, and describe the simulation study used to validate and benchmark our results. Full derivations that complement this high-level summary of our methodology, as well as extended results from application of our tests to UK Biobank data, are included in the Appendices.

Conditional sibling inference framework

Assuming completely polygenic (i.e. infinitesimal) architecture, siblings of individuals with extreme trait values are expected to be less extreme. This regression-to-the-mean can be understood through the two factors (Falconer and Mackay, 1983; Alberts et al., 2002) that determine inherited genetic variation: (i) average genetic contribution of the parents (mid-parent) to the trait value and (ii) random genetic reassortment occurring during meiosis. If an individual presents an upper-tail trait value due to both a high mid-parent trait value and genetic reassortment favouring a higher value, then a sibling sharing the same mid-parent trait value, but subject to independent reassortment, is likely to be less extreme. How much less extreme can be derived by first considering the conditional distribution that relates mid-parents and their offspring. In the simplest case, for a completely heritable continuous polygenic trait in a large population, the offspring trait values, $s$, are normally distributed around their mid-parent trait value, $m$, irrespective of selection and population structure (Johnson and Barton, 2005; Barton et al., 2017):

$$ p(s \mid m) = N\left(m, \sigma_f^2\right) \tag{1} $$

where $\sigma_f^2$ represents within-family variance, which is constant across the population. This result does not require the trait distribution across the population to be Gaussian. If we assume that the population distribution is Gaussian and there is random mating and no selection, then $\sigma_f^2$ equals half the population variance (Bulmer, 1998; Johnson and Barton, 2005; Baselmans et al., 2021). For a trait with heritability $h^2$, trait variance can be partitioned into genetic, $\sigma_g^2$, and environmental, $\sigma_e^2$, contributions, such that $\sigma^2 = \sigma_g^2 + \sigma_e^2$ and $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$. Assuming mean-centred genetic and environmental trait contributions, and a trait variance of 1 (a standard normalised trait), the distribution for offspring conditioned on the mid-parent genetic trait value, $m_g$, is

$$ p(s \mid m_g) = N\left(m_g, \frac{\sigma_g^2}{2} + \sigma_e^2\right) = N\left(m_g, 1 - \frac{h^2}{2}\right) \tag{2} $$

Since $p(m_g) = N(0, h^2/2)$, from Bayes’ theorem, it can be shown that the mid-parent genetic trait value conditional on offspring trait value, $s$, is distributed as follows:

$$ p(m_g \mid s) = N\left(\frac{1}{2} s h^2, \frac{h^2}{2}\left(1 - \frac{h^2}{2}\right)\right) \tag{3} $$
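As a brief sanity check on Equation 3, the standard Gaussian conjugate update can be written out explicitly. This is only a sketch of the intermediate step, using the prior $p(m_g) = N(0, h^2/2)$ and the likelihood in Equation 2; the full derivation is the one given in Appendix 1.

```latex
\begin{align*}
\frac{1}{\operatorname{Var}(m_g \mid s)}
  &= \frac{1}{h^2/2} + \frac{1}{1 - h^2/2}
   = \frac{1}{(h^2/2)\,(1 - h^2/2)}, \\
\operatorname{E}[m_g \mid s]
  &= \operatorname{Var}(m_g \mid s)\,\frac{s}{1 - h^2/2}
   = \tfrac{1}{2}\, s\, h^2 .
\end{align*}
```

The first line uses the fact that the prior and likelihood variances sum to 1 for a standardised trait, which is why the posterior variance reduces to their product.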

From Equations 2 and 3, the sibling trait distribution conditional on an index sibling can be calculated as

$$ p(s_2 \mid s_1) = \int p(s_2 \mid m_g)\, p(m_g \mid s_1)\, dm_g = N\left(\frac{1}{2} s_1 h^2, 1 - \frac{h^4}{4}\right) \tag{4} $$

A full derivation is provided in Appendix 1. This derivation via mid-parents can be generalised to cases where assumptions of random mating, Gaussian population trait distribution, and no selection do not hold. When these assumptions do hold, then the conditional sibling distribution can also be derived from the joint sibling distribution defined by the relationship matrix (Lencz et al., 2021). In the Appendices, we use the joint sibling distribution to derive the distribution conditional on two sibling values. From Equation 4, we can infer that under complete heritability the siblings of an individual with a standard normal trait value of $z$ will have a mean trait value of $z/2$, with variance equal to three quarters of the population variance. In Appendix 1, we further generalise the result to binary phenotypes, increasing the utility of this framework for further applications and theoretical development.
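The conditional distribution in Equation 4 is simple enough to evaluate directly. The following is a minimal sketch, assuming a standardised trait; the function name `conditional_sibling_params` is hypothetical and is not part of sibArc.

```python
from math import sqrt


def conditional_sibling_params(s1, h2):
    """Mean and variance of s2 given s1 under Equation 4 (standardised trait)."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie in [0, 1]")
    mean = 0.5 * s1 * h2          # regression towards the population mean of 0
    var = 1.0 - h2 ** 2 / 4.0     # conditional variance, 1 - h^4 / 4
    return mean, var


if __name__ == "__main__":
    for h2 in (1.0, 0.5):
        mean, var = conditional_sibling_params(2.0, h2)
        print(f"h2={h2}: mean={mean:.3f}, sd={sqrt(var):.3f}")
```

With complete heritability and an index value of 2, this returns a mean of 1 and a variance of 0.75, matching the statement that siblings have a mean of $z/2$ and three quarters of the population variance.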

Statistical tests for complex tail architecture

In Figure 2, the strategy employed to develop statistical tests for complex tail architecture is depicted. Our approach corresponds to testing for deviations from the expected conditional sibling trait distribution under the null hypothesis of complete polygenicity in trait tails (see above for other null assumptions): excess discordance is indicative of an enrichment of de novo mutations, while excess concordance indicates an enrichment of Mendelian variants, that is, large effect variants segregating in the population. The heritability, $h^2$, required to define the null distribution, is estimated by maximising the log-likelihood of the conditional sibling trait distribution (Equation 4) with respect to $h^2$:

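$$ \log L \propto -n \log\left(1 - \frac{h^4}{4}\right) - \frac{1}{1 - \frac{h^4}{4}} \sum_{i=1}^{n} \left(s_2^{(i)} - \frac{s_1^{(i)} h^2}{2}\right)^2 \tag{5} $$

Identifying complex tail genetic architecture.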

Conditional sibling z-values plotted against index sibling quantiles. Grey depicts complete polygenic architecture across index sibling values. In the lower tail, an extreme scenario of de novo architecture is shown in green, resulting in sibling discordance. In the upper tail, extreme Mendelian architecture is shown in red, whereby siblings are half concordant and half discordant, resulting in a bimodal conditional sibling trait distribution. Statistical tests to infer each type of complex tail architecture are designed to exploit these expected trait distributions.

where $s_1$ and $s_2$ represent index and conditional sibling trait values, respectively, and $n$ is the number of sibling pairs. This allows $h^2$ to be estimated for given quantiles of the trait distribution by restricting sibling pair observations to those index siblings in the quantile of interest. To maximise power to detect non-polygenic architecture in the tails of the trait distribution, we estimate ‘polygenic heritability’, $h_p^2$, from sibling pairs for which the index sibling trait value is between the 5th and 95th percentile (labelled ‘Distribution body’ in Figure 2).
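The estimation step in Equation 5 can be sketched with a one-dimensional grid search. This is an illustrative sketch only, not the sibArc implementation: the function names, the grid resolution, and the Gaussian pair simulator used for checking are all assumptions introduced here. The squared-error sum is expanded into three running sums so that each grid point is cheap to evaluate.

```python
import random
from math import log, sqrt
from statistics import NormalDist


def log_likelihood(h2, n, s11, s12, s22):
    """Equation 5 up to a constant, using sums of s1^2, s1*s2 and s2^2."""
    var = 1.0 - h2 ** 2 / 4.0
    rss = s22 - h2 * s12 + (h2 ** 2 / 4.0) * s11   # sum of (s2 - s1*h2/2)^2
    return -n * log(var) - rss / var


def estimate_h2(pairs, grid_size=1001):
    """Grid-search maximiser of Equation 5 over h2 in [0, 1]."""
    n = len(pairs)
    s11 = sum(s1 * s1 for s1, _ in pairs)
    s12 = sum(s1 * s2 for s1, s2 in pairs)
    s22 = sum(s2 * s2 for _, s2 in pairs)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda h2: log_likelihood(h2, n, s11, s12, s22))


def body_pairs(pairs, lo_q=0.05, hi_q=0.95):
    """Keep pairs whose index sibling lies in the distribution body."""
    nd = NormalDist()
    lo, hi = nd.inv_cdf(lo_q), nd.inv_cdf(hi_q)
    return [(s1, s2) for s1, s2 in pairs if lo <= s1 <= hi]


def simulate_pairs(h2, n, seed=0):
    """Draw (s1, s2) pairs directly from Equation 4 (for checking only)."""
    rng = random.Random(seed)
    sd = sqrt(1.0 - h2 ** 2 / 4.0)
    pairs = []
    for _ in range(n):
        s1 = rng.gauss(0.0, 1.0)
        pairs.append((s1, 0.5 * h2 * s1 + sd * rng.gauss(0.0, 1.0)))
    return pairs


if __name__ == "__main__":
    data = simulate_pairs(0.6, 50000, seed=1)
    print("all pairs:", estimate_h2(data))
    print("body only:", estimate_h2(body_pairs(data)))
```

Restricting to the body conditions only on the index sibling, so the conditional likelihood is still correctly specified under the null; the cost is a somewhat wider sampling error.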

Tests for complex architecture are then performed in relation to index siblings whose trait values are in the tails of the distribution (e.g. the lower and upper 1%). Below, $A_q$ denotes the set of sibling pairs for which the index sibling is in quantile $q$, such that $s_1 > \Phi^{-1}(1-q)$ and $s_1 < \Phi^{-1}(q)$ for the upper and lower tails, respectively, where $\Phi^{-1}$ is the inverse normal cumulative distribution function.

It should be noted that our estimate of $h^2$ in Equation 5 assumes no effects of shared environment. Polderman et al. (2015) found limited contribution of shared environment for most complex traits and, critically, our statistical tests are robust to shared environmental effects with consistent effects throughout the trait distribution (see ‘Discussion’).

Statistical test for de novo architecture

For inference of de novo architecture in the tails of the trait distribution, we introduce a parameter, $\alpha$, to the log-likelihood defined by the conditional sibling trait distribution (Equation 5):

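$$ \log L \propto -\frac{1}{2\left(1 - \frac{h^4}{4}\right)} \sum_{i \in A_q} \left(s_2^{(i)} - \left(\frac{1}{2} s_1^{(i)} h^2 + \alpha\right)\right)^2 \tag{6} $$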

Values of α>0 in the lower tail and α<0 in the upper tail indicate excess regression-to-the-mean and, thus, high sibling discordance, consistent with an enrichment of de novo mutations among the index siblings. The z-statistic of the one-sided score test for α>0 in the lower quantile, q, relative to the null of α=0 is (see Appendix 2 for derivation):

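$$ z = \frac{\sum_{i \in A_q} \left(s_2^{(i)} - \frac{1}{2} s_1^{(i)} h^2\right)}{\sqrt{n\left(1 - \frac{h^4}{4}\right)}} \tag{7} $$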

For the upper tail test of α<0, the above is multiplied by –1.
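The score statistic in Equation 7 can be computed in a few lines. The sketch below assumes standardised trait values supplied as `(s1, s2)` tuples and a heritability estimate obtained separately; the function names are hypothetical, and the one-sided p-value is taken from the standard normal as an assumption about how the z-statistic would be used.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()


def tail_pairs(pairs, q, tail):
    """Select the set A_q: pairs whose index sibling lies in the chosen tail."""
    if tail == "lower":
        cut = ND.inv_cdf(q)
        return [(s1, s2) for s1, s2 in pairs if s1 < cut]
    if tail == "upper":
        cut = ND.inv_cdf(1.0 - q)
        return [(s1, s2) for s1, s2 in pairs if s1 > cut]
    raise ValueError("tail must be 'lower' or 'upper'")


def de_novo_z(pairs, h2, q=0.01, tail="lower"):
    """One-sided score test of Equation 7; returns (z, p-value)."""
    sel = tail_pairs(pairs, q, tail)
    n = len(sel)
    if n == 0:
        raise ValueError("no index siblings in the requested tail")
    resid = sum(s2 - 0.5 * s1 * h2 for s1, s2 in sel)
    z = resid / sqrt(n * (1.0 - h2 ** 2 / 4.0))
    if tail == "upper":
        z = -z                      # upper-tail test is the mirror image
    return z, 1.0 - ND.cdf(z)


if __name__ == "__main__":
    # Four lower-tail index siblings whose siblings sit at the population mean.
    toy = [(-3.0, 0.0)] * 4 + [(0.0, 5.0)]
    print(de_novo_z(toy, h2=0.5, q=0.01, tail="lower"))
```

A positive z in either tail indicates that conditional siblings are closer to the population mean than polygenic expectation, which is the discordance signature described above.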

Statistical test for Mendelian architecture

For inference of Mendelian architecture in the tails of the trait distribution, we compare the observed and expected tail sibling concordance, defined by the number of sibling pairs for which both siblings have trait values in the tail. For each index sibling in $A_q$, we calculate the probability that the conditional sibling is also in $A_q$, which, for the upper tail, is given by

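$$ P\left(s_2 > \Phi^{-1}(1-q) \mid s_1\right) = 1 - \Phi\left(\frac{\Phi^{-1}(1-q) - \frac{s_1 h^2}{2}}{\sqrt{1 - \frac{h^4}{4}}}\right) \tag{8} $$

where $\Phi$ represents the normal cumulative distribution function and the upper-tail threshold $\Phi^{-1}(1-q)$ follows the definition of $A_q$ given earlier. Denoting the mean of $P(s_2^{(i)} > \Phi^{-1}(1-q) \mid s_1^{(i)})$ across all index siblings in $A_q$ by $\pi_0$, the expected sibling concordance is $n\pi_0$, where $n$ is the number of index siblings in $A_q$. Given an observed number of concordant siblings, $r$, the z-statistic for a one-sided score test for excess concordance is (see Appendix 2 for derivation) given by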

$$ z = \frac{r - n\pi_0}{\sqrt{n\pi_0(1 - \pi_0)}} \tag{9} $$
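The concordance test in Equations 8 and 9 can be sketched as follows. The text gives the upper-tail probability only; handling the lower tail by mirroring the data is an assumption made here for illustration, as are the function name and the normal-based one-sided p-value.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()


def mendelian_z(pairs, h2, q=0.01, tail="upper"):
    """Excess tail-concordance test (Equations 8 and 9); returns (z, p-value)."""
    if tail not in ("upper", "lower"):
        raise ValueError("tail must be 'upper' or 'lower'")
    sign = 1.0 if tail == "upper" else -1.0
    cut = ND.inv_cdf(1.0 - q)                 # upper-tail threshold
    sd = sqrt(1.0 - h2 ** 2 / 4.0)            # conditional sd from Equation 4
    # Mirror lower-tail data so both tails use the upper-tail formulas.
    sel = [(sign * s1, sign * s2) for s1, s2 in pairs if sign * s1 > cut]
    n = len(sel)
    if n == 0:
        raise ValueError("no index siblings in the requested tail")
    # Equation 8 averaged over index siblings gives pi_0.
    pi0 = sum(1.0 - ND.cdf((cut - 0.5 * s1 * h2) / sd) for s1, _ in sel) / n
    r = sum(1 for _, s2 in sel if s2 > cut)   # observed concordant pairs
    z = (r - n * pi0) / sqrt(n * pi0 * (1.0 - pi0))
    return z, 1.0 - ND.cdf(z)


if __name__ == "__main__":
    toy = [(3.0, 3.0), (3.0, 0.0), (0.0, 3.0)]
    print(mendelian_z(toy, h2=0.5, q=0.01, tail="upper"))
```

In the toy data only the first two pairs have an index sibling in the top 1%, and one of them is concordant, which is far more than the roughly 5% expected under polygenicity at this heritability.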

Simulation of conditional sibling data

We perform simulations using publicly available GWAS data on multiple traits to validate our analytical model and benchmark our tests for complex architecture. Figure 3 depicts the different stages of our simulation procedure. We start by simulating a ‘parent population’ (step A), utilising the allele frequencies and effect sizes of the first 100k SNPs from a trait GWAS to sample genotypes and subsequently trait values assuming an additive model. Next, parents are randomly paired and their genetic trait values averaged to produce mid-parent trait values (step B), and genotypes of two offspring (Equation 1) and corresponding genetic trait values, G, are calculated (step C) assuming independent reassortment of parental alleles and unlinked SNPs.

Simulation schematic.

Publicly available genome-wide association studies (GWAS) allele frequency and effect size data is used to simulate parent genetic trait value (A). Mid-parent genetic trait value (B) is simulated assuming random mating. Offspring genotype and genetic trait value (C) is simulated assuming complete recombination. Environmental variation (D) is added to compare with theoretical polygenic conditional sibling distribution. De novo and Mendelian rare-variant effects are simulated (E) to benchmark tests for complex architecture (F).

In step D, we generate offspring trait values for different degrees of heritability by adding a Gaussian environmental effect. For heritability, $h^2$, offspring trait values are given by $T = hG + E$, where $G$ is the genetic effect, standardised to have mean 0 and variance 1, and the environmental effect $E$ is drawn from a normal distribution with mean 0 and variance $(1 - h^2)$. The simulated trait has a $N(0,1)$ distribution.
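Steps B to D can be approximated without genotype data, which is useful for quick checks. The sketch below is a simplification and not the simulation described above: instead of sampling SNP genotypes from GWAS summary data, it assumes Gaussian mid-parent and segregation terms, each with half of the standardised genetic variance, and then applies $T = hG + E$. The function name is hypothetical.

```python
import random
from math import sqrt


def simulate_sibling_traits(h2, n_families, seed=0):
    """Simulate sibling pairs with T = h*G + E under a Gaussian approximation."""
    rng = random.Random(seed)
    h = sqrt(h2)
    half_sd = sqrt(0.5)
    env_sd = sqrt(1.0 - h2)
    pairs = []
    for _ in range(n_families):
        mid_parent = rng.gauss(0.0, half_sd)            # step B (assumed Gaussian)
        sibs = []
        for _ in range(2):
            g = mid_parent + rng.gauss(0.0, half_sd)    # step C: independent reassortment
            e = rng.gauss(0.0, env_sd)                  # step D: unshared environment
            sibs.append(h * g + e)
        pairs.append((sibs[0], sibs[1]))
    return pairs


if __name__ == "__main__":
    pairs = simulate_sibling_traits(0.8, 20000, seed=3)
    n = len(pairs)
    var1 = sum(a * a for a, _ in pairs) / n
    cov = sum(a * b for a, b in pairs) / n
    print(f"variance ~ {var1:.3f}, sibling covariance ~ {cov:.3f}")
```

Under this construction the trait has unit variance and the sibling covariance is $h^2/2$, which is the regression slope in Equation 4.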

In step E, we simulate the effect of complex tail architecture on the conditional sibling trait distribution. We assume that rare variants are sufficiently penetrant to move individuals into the tails of the distribution, independent of their polygenic contribution. We, thus, modify sibling trait values for individuals already in the tails (from step D) to minimise perturbation of the trait distribution. We simulate de novo tail architecture by resampling the less extreme sibling from the background distribution and simulate Mendelian tail architecture by resampling the less extreme sibling from the background distribution with probability 0.5, and from the same tail as the extreme sibling with probability 0.5.
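The resampling rule in step E can be sketched as below. Several details are assumptions made for illustration: every pair with a sibling in the tail is modified (the benchmarks instead vary the frequency of rare aetiology), the more extreme sibling is returned first so that it serves as the index, and a tail draw is obtained by inverting the normal CDF on a uniform draw within the tail mass. The function names are hypothetical.

```python
import random
from statistics import NormalDist

ND = NormalDist()


def sample_from_tail(q, upper, rng):
    """Draw a standard normal value conditional on lying in a tail of mass q."""
    u = max(rng.random() * q, 1e-12)      # guard against inv_cdf(0)
    return ND.inv_cdf(1.0 - u if upper else u)


def apply_tail_architecture(pairs, q, mode, rng):
    """Resample the less extreme sibling for pairs with a sibling in a tail."""
    if mode not in ("de_novo", "mendelian"):
        raise ValueError("mode must be 'de_novo' or 'mendelian'")
    lo, hi = ND.inv_cdf(q), ND.inv_cdf(1.0 - q)
    out = []
    for a, b in pairs:
        ext, other = (a, b) if abs(a) >= abs(b) else (b, a)
        if lo <= ext <= hi:
            out.append((a, b))            # neither sibling is in a tail
            continue
        if mode == "de_novo" or rng.random() < 0.5:
            other = rng.gauss(0.0, 1.0)   # background distribution
        else:
            other = sample_from_tail(q, ext > hi, rng)   # same tail as index
        out.append((ext, other))
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    print(apply_tail_architecture([(3.0, 1.0), (0.1, 0.2)], 0.01, "de_novo", rng))
```

Passing the random generator explicitly keeps the sketch reproducible and lets the same seed be reused across the de novo and Mendelian scenarios.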

Application to UK Biobank data

The UK Biobank includes data from over 21,000 siblings (Lello et al., 2023). To apply our tests to this dataset, we began by identifying continuous traits (at least 50 unique values) with at least 5000 sibling pairs, as defined by kinship coefficient 0.18–0.35 and >0.1% SNPs with IBS0 to distinguish from parent-offspring (Cheesman et al., 2020; Bycroft et al., 2018). After removing outliers with absolute trait value >8 standard deviations from the mean, we removed all traits with absolute skew or excess kurtosis greater than 0.5 to reduce the likelihood that skewed or heavy-tailed trait distributions impact our statistical tests. The remaining traits were standardised using rank-based inverse normal transformation and adjusted for age, sex, recruitment centre, batch covariates, and the first 40 principal components. After this, to ensure a primarily additive polygenic aetiology, we required that traits have heritability $h^2_{\mathrm{SNP}} > 0.1$ (Ni et al., 2018) and that no single SNP contribute more than 0.01 to $h^2$.
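The standardisation step can be illustrated with a generic rank-based inverse normal transformation. The exact variant used in the analysis is not specified in the text, so the rank offset of 0.5 and the averaging of tied ranks below are assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map values to normal scores via (rank - 0.5) / n, averaging tied ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0    # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]


if __name__ == "__main__":
    print(rank_inverse_normal([10.0, 30.0, 20.0, 20.0]))
```

Because only ranks are used, the transformed trait is approximately standard normal regardless of the original scale, which is what the conditional sibling model assumes.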

We applied our method to estimate heritability from sibling pairs (Equation 5) across the distribution and within the trait body (5th to 95th percentiles) on the remaining traits and selected the 18 traits for which both measures exceeded 30% for further analysis. For each of these 18 traits, siblings were randomly assigned index and conditional status and both trait tails were tested for departures from polygenicity using our de novo and Mendelian tests, as well as a general Kolmogorov–Smirnov test (Marozzi, 2013) to identify departures from the conditional sibling distribution assuming polygenicity (Appendix 2).

Results

Here, we illustrate the conditional sibling trait distribution, validate the accuracy of our analytical model using simulation, perform power analyses for our statistical tests for complex genetic tail architecture (see ‘Model and methods’), and apply our tests to trait data on thousands of siblings from the UK Biobank.

Conditional sibling trait distribution

In Figure 4A, the conditional sibling trait distribution (Equation 4) is illustrated at different index sibling trait values (ranked percentiles). For an almost entirely heritable polygenic trait (orange), siblings of individuals at the 99th percentile (z=2.32) have mean z-scores approximately halfway between the population mean and index mean (i.e. z=1.1). This regression-to-the-mean is greater when trait heritability is lower (blue), assuming (as here) independent environmental risk among siblings.

Conditional sibling trait distribution under polygenic architecture.

(A) The conditional sibling trait distribution according to Equation 4 for index siblings at the 1st, 25th, 50th, 75th, and 99th percentile of the standardised trait distribution, when heritability is high ($h^2 = 0.95$, in orange) and moderate ($h^2 = 0.5$, in blue). When heritability is 0.95, conditional sibling expectation is almost half of the index sibling z-score; when heritability is 0.5, the conditional sibling expectation is equal to 1/4 of the index sibling z-score. (B) The conditional distribution transformed into rank space. An individual whose sibling is at the 99th percentile is expected to have a trait value in the 80th percentile when heritability is high and in the 67th percentile when heritability is moderate.

In Figure 4B, the conditional sibling z-distribution is transformed into percentiles for interpretation in rank space. This distribution is skewed, especially at the tails, due to truncation at extreme quantiles (i.e. siblings cannot be more extreme than the top 1%). For a trait with $h^2 = 0.95$, siblings of individuals at the 99th percentile (z = 2.32) have a mean trait value at the 80th percentile. Note that this is less extreme than the result of transforming their expected z-value into percentile space ($\Phi(1.1) = 86\%$), which is a consequence of Jensen’s inequality (Jensen, 1906) given that the inverse cumulative distribution function of the normal is convex for positive z-values (equivalently, $\Phi$ is concave above zero).
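The gap between the two percentiles can be reproduced numerically. The sketch below assumes that a sibling's percentile is $\Phi(s_2)$ on the population scale and uses the standard identity $\operatorname{E}[\Phi(X)] = \Phi\left(\mu / \sqrt{1 + \sigma^2}\right)$ for $X \sim N(\mu, \sigma^2)$; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()


def sibling_percentiles(index_percentile, h2):
    """Return (percentile of the mean z-score, mean percentile) for the sibling."""
    z1 = ND.inv_cdf(index_percentile)
    mean = 0.5 * z1 * h2                    # conditional mean from Equation 4
    var = 1.0 - h2 ** 2 / 4.0               # conditional variance from Equation 4
    pct_of_mean = ND.cdf(mean)              # Phi(E[s2])
    mean_pct = ND.cdf(mean / sqrt(1.0 + var))   # E[Phi(s2)]
    return pct_of_mean, mean_pct


if __name__ == "__main__":
    a, b = sibling_percentiles(0.99, 0.95)
    print(f"percentile of mean z: {a:.3f}; mean percentile: {b:.3f}")
```

For an index sibling at the 99th percentile and $h^2 = 0.95$, the first value is close to 86% and the second close to 80%, in line with the figures quoted above.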

We compared the theoretical conditional sibling trait distributions to those generated from simulated data (see Appendix 3) and found that irrespective of the trait used to simulate data (e.g. fluid intelligence, height) the two distributions did not differ significantly, suggesting that our analytically derived distributions are a valid model for the conditional sibling trait distribution (Equation 4).

Power of statistical tests to identify complex tail architecture

Building on the theoretical framework introduced in the section ‘Model and methods’ and illustrated in the previous section, we develop statistical tests to identify complex architecture in the tails of the trait distribution. These tests leverage the fact that the similarity (or dissimilarity) in trait values among siblings provides information about the underlying genetic architecture (see Figure 1). For example, high-impact de novo mutations generate large dissimilarity between siblings when only one carries the unique mutant allele, while Mendelian variants can create excess similarity in the tails of the distribution when siblings both inherit the same mutant allele.

In Appendix 2, we provide detailed derivations for the statistical tests described at a high level in the section ‘Model and methods’ and explain how they identify tail signatures in contrast to a polygenic background where conditional siblings regress-to-the-mean at a rate proportional to $h^2/2$ (Equation 4). The performance of the two tests evaluated using simulated sibling data is shown in Figure 5. These tests demonstrate that power to identify de novo architecture is greatest when heritability is high, while power to identify Mendelian architecture is greatest when heritability is low. These patterns can be explained by the fact that high heritability should lead to relatively high similarity among siblings, and low heritability to low similarity, under polygenicity. When heritability is estimated near 50% and at least 0.1% of the population has high-impact rare aetiology, both tests are well-powered to identify each class of complex tail architecture.

Power to detect complex tail architecture for different heritability levels, de novo and Mendelian frequencies, and sample sizes.

Simulation assumes highly penetrant de novo and Mendelian frequencies of 0.05, 0.1, 0.2, 0.3, and 0.5%. The false-positive rate was set at 0.05. Null simulations (red dashed line) demonstrate tests are well calibrated.

Identifying complex tail architecture in UK Biobank data

We applied our statistical tests for complex tail architecture to sibling-pair data on 18 traits from the UK Biobank (Sudlow et al., 2015). Here (Figure 6), we present results from a set of six traits with varied tail architecture: Sitting Height, Forced Expiratory Volume, Urate, Ankle Spacing, Left Hand Grip Strength, and Waist Circumference. For each trait, we estimated conditional sibling heritability via Equation 5 and performed tests to identify de novo and Mendelian architecture in the lower and upper tails of the distribution of each trait. We also performed a Kolmogorov–Smirnov test (Marozzi, 2013) to provide a general test for departures from our null model. We observed expected polygenic architecture in both tails for Ankle Spacing. We inferred Mendelian architecture in the lower tail for Urate and the upper tail for Waist Circumference and Left Hand Grip Strength. De novo architecture was inferred in the lower tail for Forced Expiratory Volume and strongly inferred in both tails for Sitting Height, which is supported by evidence from deep sequencing studies indicating that rare variants play a substantial role in the genetic aetiology for this trait (Tcheandjieu et al., 2020; Bjornsdottir et al., 2022). In the lower tail for Left Hand Grip Strength, we infer the presence of both de novo (greater than expected mean) and Mendelian (more concordant siblings than expected) architecture. We note that this could occur as a result of highly penetrant variants that are only shared among some siblings or perhaps because at the extremes siblings with different handedness are unexpectedly divergent and siblings with matching handedness are unexpectedly concordant. Extended results for all 18 traits analysed can be found in Appendix 4 (Appendix 4—table 1).

Analysis of six UK Biobank traits.

Application of statistical tests for Mendelian and de novo tail architecture to sibling trait data of six UK Biobank traits. For each trait, the conditional sibling mean is plotted under polygenicity (black line) for the heritability estimated from the data. The red (high) and blue (low) bands represent the expected conditional sibling mean under polygenicity at different heritability values. Statistical tests for de novo architecture, Mendelian architecture, and general departure from polygenicity (Kolmogorov–Smirnov test, Dist P-val) were applied to conditional siblings with index siblings in the upper and lower 1% of the distribution. Significant associations for the Mendelian and de novo tests are shown in red and green, respectively. Tail architecture that is not distinct from polygenic expectation is denoted in grey.

Discussion

In this article, we present a novel approach to infer the genetic architecture of continuous traits, specifically in the tails of the distributions, from sibling trait data alone. Our approach is based on a theoretical framework that we develop, which derives the expected trait distributions of siblings conditional on the trait value of an index sibling and the trait heritability, assuming polygenicity, random mating, and environmental factors that are either controlled for or have combined Gaussian effects. The key intuition underlying the approach is that departures from the expected conditional sibling trait distribution in relation to index siblings selected from the trait tails may be due to non-polygenic architecture in the tails.

We demonstrate the validity of our conditional sibling analytical derivations through simulations and show that our tests for identifying de novo and Mendelian architecture in the tails of trait distributions are well-powered when large effect alleles are present in the population on the order of 1 out of 1000 individuals. Applying our test to a subset of well-powered traits in the UK Biobank, we find evidence of complex genetic architecture in at least one tail (α<0.05) in 16 of 18 traits and find de novo architecture occurring more frequently than Mendelian architecture (19 vs 6 of 36 total tails).

There are several areas in which our work could have short-term utility. Firstly, those individuals inferred as having rare variants of large effect could be followed up in multiple ways to gain individual-level insights. For example, they could undergo clinical genetic testing to identify potential pathogenic variants with effects beyond the examined trait, either in the form of diseases or disorders that the individual has already been diagnosed with or else that they have yet to present with but may be at high future risk for. Furthermore, investigation of their environmental risk profile may indicate an alternative – environmental – explanation for their extreme trait value (see below), rather than the rare genetic architecture inferred by our tests.

Our framework could also help refine the design of sequencing studies for identifying rare variants of large effect. Such studies either sequence entire cohorts at relatively high cost (Uffelmann et al., 2021) or else perform more targeted sequencing of individuals in the trait tails with the goal of optimising power per cost (Yang et al., 2015). However, even the latter approach is usually performed blind to evidence of enrichment of rare variant aetiology in the tails. Since our approach enables the identification of individuals that may be most likely to harbour rare variants, these individuals could be prioritised for (deep) sequencing. Moreover, our ability to distinguish between de novo and Mendelian architecture could influence the broad study design: inferred de novo architecture suggests that a family trio design may be more effective, whereas inferred Mendelian architecture may favour population sequencing. Furthermore, our approach could be applied as a screening step to prioritise those traits, and corresponding tails, most likely to harbour rare variant architecture. Finally, if sequence data have already been collected, either cohort-wide or using a more targeted design, then our approach could be utilised to increase the power of statistical methods for detecting rare variants by upweighting individuals most likely to harbour rare variants.

This study has several limitations. First and foremost, departures from the expected conditional sibling trait distributions could be due to environmental risk factors, such as medication-use or work-related exposures, rather than rare genetic architecture. Thus, while we believe that tail-specific deviations from polygenic expectation are interesting whether they arise primarily from genetic or environmental factors, we caution against over-interpretation of our results. Rejection of the null hypothesis from our tests should be considered only as indicating effects consistent with non-polygenic genetic architecture, alongside alternative explanations such as tail-specific unshared (de novo) or shared (‘Mendelian’) environmental risks. We suggest that further investigation of individuals’ clinical, environmental, and genetic profiles is required to achieve greater certainty about the causes of their extreme trait values. Nevertheless, given knowledge that rare variants of large effect contribute to complex trait architecture, we expect that traits for which we infer non-polygenic architecture will, on average, be more enriched for rare architecture in the tail(s) than other traits. Secondly, our modelling assumes that the environmental risk factors of siblings are independent of each other. If in fact shared environmental risk factors contribute significantly to trait similarity among siblings, then our heritability estimates will be upwardly biased. However, this would only impact our tests if the degree of shared environmental risk differed in the tails relative to the rest of the trait distribution. Moreover, a large meta-analysis of heritability estimates from twin studies (Polderman et al., 2015) concluded that the contribution of shared environment among siblings (even twins) is insubstantial, and so we might expect this to have a limited impact on results from our tests. 
Thirdly, our modelling assumes random mating, and so the results from our tests in relation to traits that may be the subject of assortative mating should be considered with caution. Likewise, our modelling assumes additivity of genetic effects, and so, while additivity is well-supported by much statistical genetics research (Wray et al., 2018; Hivert et al., 2021), results from our tests should be reconsidered for any traits with evidence for significant non-additive genetic effects.

Our approach not only provides a novel way of inferring genetic architecture (without genetic data) but can do so specifically in the tails of trait distributions, which are most likely to harbour complex genetic architecture, due to selection, and are a key focus in biomedical research given their enrichment for disease. This work could also have broader implications in quantitative genetics since we derive fundamental results about the relationship among family members’ complex trait values. The conditional sibling trait distribution provides a simple way of understanding the expected trait values of individuals according to their sibling’s trait value, which could be used to answer questions of societal importance and inform future research. For example, it can be used to answer questions such as: as a consequence of genetics alone, how much overlap should there be in the traits of offspring of mid-parents at the 5th and 95th percentile and how does that contrast with what we observe in highly structured societies? Moreover, further development of the theory described here could lead to a range of other applications, for example, estimating levels of assortative mating, inferring historical selection pressures, and quantifying heritability in specific strata of the population.

Appendix 1

Derivation of conditional sibling distributions

Here, we derive the distribution that describes the probability density of the ‘conditional’ sibling ($S_2$) given the genetic liability, trait value, or case status of one or more index siblings. In each case, we assume a population of unrelated parents and rely on results from the infinitesimal polygenic model, which show that offspring genetic liability within a family is normally distributed around the mid-parent genetic liability (the average of the parents) with half the ancestral trait variance, even when selection, drift, population structure, or dominance effects alter the between-family trait distribution (Barton and Keightley, 2002; Barton et al., 2017; Reichenberg et al., 2022).

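Case (1) Index liability known, continuous trait ($h^2=1$): $P(S_2 \mid S_1=s_1)$

We begin with the simplest case, a polygenic normally distributed trait which is fully heritable ($h^2=1$), where the genetic liability of an index sibling in a population is known. Throughout we denote the mid-parent, index sibling, and conditional sibling by $M$, $S_1$, and $S_2$. We begin by calculating the mid-parent distribution conditional on index liability using Bayes’ theorem:

$$
\begin{aligned}
p(M=m \mid S=s) &\propto p(S=s \mid M=m)\,p(M=m) \\
&= N\!\left(S \mid m, \tfrac{\sigma^2}{2}\right) N\!\left(M \mid 0, \sigma^2/2\right) \\
&\propto \exp\!\left\{-\frac{(s-m)^2}{\sigma^2}\right\} \exp\!\left\{-\frac{m^2}{\sigma^2}\right\} \\
&\propto N\!\left(M \mid s/2, \sigma^2/4\right)
\end{aligned}
\tag{10}
$$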

Then using the following identity (Barndorff-Nielsen et al., 1982)

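$$
\int N\!\left(x \mid \alpha z, \sigma_1^2\right) N\!\left(z \mid \mu, \sigma_2^2\right) dz = N\!\left(x \mid \alpha\mu, \sigma_1^2+\alpha^2\sigma_2^2\right) \tag{11}
$$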

We calculate the conditional sibling distribution similarly:

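$$
\begin{aligned}
p(S_2 \mid S_1=s_1) &= \int p(S_2, M=m \mid S_1=s_1)\,dm \\
&= \int p(S_2 \mid M=m, S_1=s_1)\,p(M=m \mid S_1=s_1)\,dm \\
&= \int p(S_2 \mid M=m)\,p(M=m \mid S_1=s_1)\,dm \\
&= \int N\!\left(S_2 \mid m, \tfrac{\sigma^2}{2}\right) N\!\left(M=m \mid \tfrac{s_1}{2}, \tfrac{\sigma^2}{4}\right) dm \\
p(S_2 \mid S_1=s_1) &= N\!\left(S_2 \mid \tfrac{s_1}{2}, \tfrac{3\sigma^2}{4}\right)
\end{aligned}
\tag{12}
$$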

Thus, as predicted by the infinitesimal polygenic model the conditional sibling liability is normally distributed around the mid-parent liability distribution with additional variance equal to half the population liability variance.

Case (2) Index trait value known, continuous trait ($h^2 \le 1$): $P(S_2 \mid S_1=s_1)$

In this case, which gives the primary result considered in this article, the trait z-value of an index sibling is known or, equivalently, its percentile rank where the rank-based inverse normal transformation has been applied, as in genome-wide association analyses (McCaw et al., 2020). Transformation to a Z distribution ($\sigma^2=1$) means that for heritability $h^2$ the genetic liability and environmental contributions to trait variance are $\sigma_g^2=h^2$ and $\sigma_e^2=1-h^2$, respectively. Similar to the previous case, we begin by calculating the conditional mid-parent liability from Bayes’ theorem:

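$$
\begin{aligned}
p(M_g=m_g \mid S_1=s_1) &\propto p(S_1=s_1 \mid M_g=m_g)\,p(M_g=m_g) \\
&= N\!\left(S_1=s_1 \mid m_g,\, 1-\tfrac{h^2}{2}\right) N\!\left(M_g=m_g \mid 0,\, h^2/2\right) \\
&\propto \exp\!\left\{-\frac{(s_1-m_g)^2}{2-h^2}\right\} \exp\!\left\{-\frac{m_g^2}{h^2}\right\} \\
&\propto \exp\!\left\{-\frac{2m_g^2-2m_g s_1 h^2}{2h^2(1-h^2/2)}\right\} \\
&\propto \exp\!\left\{-\frac{(m_g-s_1h^2/2)^2}{h^2(1-h^2/2)}\right\} \\
&\propto N\!\left(M_g=m_g \mid \tfrac{1}{2}s_1h^2,\, \tfrac{1}{2}h^2(1-h^2/2)\right)
\end{aligned}
\tag{13}
$$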

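Then, we again use Equation 11 to derive the conditional sibling distribution:

$$
\begin{aligned}
p(S_2=s_2 \mid S_1=s_1) &= \int p(S_2=s_2 \mid M_g=m_g)\,p(M_g=m_g \mid S_1=s_1)\,dm_g \\
&= \int N\!\left(S_2 \mid m_g,\, 1-\tfrac{h^2}{2}\right) N\!\left(M_g \mid \tfrac{1}{2}s_1h^2,\, \tfrac{1}{2}h^2\left(1-\tfrac{h^2}{2}\right)\right) dm_g \\
&= N\!\left(S_2=s_2 \mid \tfrac{1}{2}s_1h^2,\, 1-\tfrac{h^4}{4}\right)
\end{aligned}
\tag{14}
$$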
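
As a worked illustration of Equation 14, the following minimal Python sketch returns the conditional sibling distribution for a z-scored trait and uses it to ask where the sibling of an index individual at the 99th percentile is expected to fall. The function name `conditional_sibling` and the example heritability of 0.5 are illustrative assumptions and are not taken from the released software.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, N(0, 1)

def conditional_sibling(s1, h2):
    """Distribution of S2 given S1 = s1 for a z-scored trait (Equation 14)."""
    mean = 0.5 * h2 * s1              # regression towards the population mean
    variance = 1.0 - h2 ** 2 / 4.0    # does not depend on s1
    return NormalDist(mu=mean, sigma=variance ** 0.5)

# Hypothetical example: index sibling at the 99th percentile, h2 = 0.5 (assumed value)
index_z = STD_NORMAL.inv_cdf(0.99)
sibling = conditional_sibling(index_z, h2=0.5)

expected_percentile = STD_NORMAL.cdf(sibling.mean)  # percentile of the expected sibling value
prob_also_top_1pct = 1.0 - sibling.cdf(index_z)     # chance the sibling is also in the top 1%

print(f"expected sibling z = {sibling.mean:.3f}, percentile = {expected_percentile:.3f}")
print(f"P(sibling also in top 1%) = {prob_also_top_1pct:.4f}")
```

Setting `h2=1.0` recovers Case (1) with $\sigma^2=1$, that is, a conditional mean of $s_1/2$ and a variance of $3/4$.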

Case (3) Multiple index trait values known, continuous trait: $P(S_3 \mid S_1=s_1, S_2=s_2)$

The conditional sibling distribution can also be derived using the joint trait distribution for related individuals:

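$$
\mathbf{y} \sim N(\mathbf{0}, \mathbf{G})
$$

where the covariance $\mathbf{G}$ is the genetic relationship matrix. Thus for a sibling pair

$$
p(s_1, s_2) = N_2\!\left(\mathbf{0}, \begin{pmatrix} 1 & h^2/2 \\ h^2/2 & 1 \end{pmatrix}\right)
$$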

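As shown by Bernardo and Smith, 1994, if $X$ is multivariate normal $N(\mu, \lambda^{-1})$, where $\lambda=\Sigma^{-1}$ is the precision matrix, and $X$ is partitioned into $x_1$ and $x_2$, with corresponding partitions of $\mu$ and $\lambda$ of

$$
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \lambda = \begin{pmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{pmatrix}
$$

then the conditional distribution of $x_1$ given $x_2$ is also normal with mean and precision matrix:

$$
\mu_1 - \lambda_{11}^{-1}\lambda_{12}(x_2-\mu_2), \quad \lambda_{11} \tag{15}
$$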

Thus, given the joint distribution for three siblings:

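$$
p(s_1, s_2, s_3) = N_3\!\left(\mathbf{0}, \begin{pmatrix} 1 & h^2/2 & h^2/2 \\ h^2/2 & 1 & h^2/2 \\ h^2/2 & h^2/2 & 1 \end{pmatrix}\right)
$$

The precision matrix $\lambda=\Sigma^{-1}$ is

$$
\lambda = \frac{1}{2+h^2-h^4} \begin{pmatrix} 2+h^2 & -h^2 & -h^2 \\ -h^2 & 2+h^2 & -h^2 \\ -h^2 & -h^2 & 2+h^2 \end{pmatrix}
$$

And we can calculate the conditional distribution of a third sibling given two index siblings using Equation 15:

$$
p(S_3 \mid S_1=s_1, S_2=s_2) = N\!\left(S_3 \,\Big|\, \frac{h^2}{2+h^2}(s_1+s_2),\; 1-\frac{h^4}{2+h^2}\right)
$$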
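
The algebra above can be checked numerically. The short Python sketch below builds the three-sibling covariance matrix, confirms that the stated precision matrix is its inverse, and applies Equation 15 to recover the conditional mean and variance of the third sibling. The function names and the values $h^2=0.5$, $s_1=2$, $s_2=1$ are illustrative assumptions.

```python
def sibling_precision(h2):
    """Closed-form precision matrix for three siblings (unit variance, covariance h2/2)."""
    d = 2.0 + h2 - h2 ** 2
    return [[(2.0 + h2) / d if r == c else -h2 / d for c in range(3)] for r in range(3)]

def conditional_third_sibling(s1, s2, h2):
    """Mean and variance of S3 given two index siblings S1 = s1 and S2 = s2."""
    mean = h2 / (2.0 + h2) * (s1 + s2)
    variance = 1.0 - h2 ** 2 / (2.0 + h2)
    return mean, variance

h2 = 0.5  # assumed heritability for the check
sigma = [[1.0 if r == c else h2 / 2.0 for c in range(3)] for r in range(3)]
lam = sibling_precision(h2)

# Sigma multiplied by lambda should give the 3 x 3 identity matrix
product = [[sum(sigma[r][k] * lam[k][c] for k in range(3)) for c in range(3)]
           for r in range(3)]

# Equation 15 with x1 = S3 and x2 = (S1, S2), all means zero
s1, s2 = 2.0, 1.0
mean_from_precision = -(lam[2][0] * s1 + lam[2][1] * s2) / lam[2][2]
variance_from_precision = 1.0 / lam[2][2]

print(product)
print(mean_from_precision, variance_from_precision, conditional_third_sibling(s1, s2, h2))
```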

Case (4) Binary trait: $P(S_2 \mid S_1=\text{Affected})$

Here, we again assume an underlying distribution that is $N(0,1)$ and made up of genetic and environmental components. However, we only know the index sibling’s status, which, as described under the liability threshold model (Falconer, 1967), is equivalent to conditioning on the event that the index sibling’s trait value is above or below a z-value threshold $T$:

$$
p(S_2 \mid S_1=\text{Affected}) = p(S_2 \mid S_1 > T) \quad \text{and} \quad p(S_2 \mid S_1=\text{Unaffected}) = p(S_2 \mid S_1 < T) \tag{16}
$$

where $T=\Phi^{-1}(1-K)$, $\Phi^{-1}$ is the inverse normal cumulative distribution function, and $K$ is the incidence of the binary trait in the population. Thus, the conditional distribution of one sibling given an Affected index sibling can be calculated by integrating over the normal distribution truncated at $T$:

$$
p(S_2 \mid S_1 > T) = \int_T^{\infty} p(S_2 \mid S_1=s_1)\,p(S_1=s_1)\,ds_1 \tag{17}
$$

The first two moments of this truncated normal (Johnson et al., 1995) are

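$$
\begin{aligned}
E(S_1 \mid S_1 > T) &= \frac{\phi(T)}{1-\Phi(T)} = \frac{\phi(T)}{K} = i \\
V(S_1 \mid S_1 > T) &= 1 + \frac{T\phi(T)}{K} - \left(\frac{\phi(T)}{K}\right)^2 = 1 + iT - i^2
\end{aligned}
$$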

Approximating the index sibling distribution using a normal whose moments are taken from this truncated distribution, Equation 17 becomes

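$$
p(S_2=s_2 \mid S_1 > T) = \int N\!\left(S_2=s_2 \mid \tfrac{1}{2}s_1h^2,\, 1-\tfrac{h^4}{4}\right) N\!\left(S_1=s_1 \mid i,\, 1+iT-i^2\right) ds_1 \tag{18}
$$

which can be solved using the identity given in Equation 11:

$$
\begin{aligned}
p(S_2=s_2 \mid S_1 > T) &= N\!\left(S_2=s_2 \mid \tfrac{1}{2}ih^2,\, 1-\tfrac{h^4}{4}+\tfrac{1}{4}h^4\left(1+iT-i^2\right)\right) \\
&= N\!\left(S_2=s_2 \mid \tfrac{1}{2}ih^2,\, 1-\tfrac{h^4}{4}\,i(i-T)\right)
\end{aligned}
\tag{19}
$$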

Thus, conditional on an Affected sibling, the probability of concordance is

$$
\begin{aligned}
p(S_2=\text{Affected} \mid S_1=\text{Affected}) &= p(S_2 > T \mid S_1 > T) \\
&= 1-\Phi\!\left(\frac{T-ih^2/2}{\sqrt{1-h^4\,i(i-T)/4}}\right)
\end{aligned}
$$

which is equivalent to Reich’s (Reich et al., 1972) correction to Falconer’s (Falconer, 1965) approximation where the relationship between the relatives is 0.5 for siblings. The probability of discordance given an Unaffected sibling can be calculated from Bayes’ theorem:

$$
\begin{aligned}
p(S_2=\text{Affected} \mid S_1=\text{Unaffected}) &= \frac{p(S_1=\text{Unaffected} \mid S_2=\text{Affected})\,p(S_2=\text{Affected})}{p(S_1=\text{Unaffected})} \\
&= \frac{K-K\,p(S_2=\text{Affected} \mid S_1=\text{Affected})}{1-K}
\end{aligned}
$$

which allows the conditional probability of case status to be determined given an index sibling’s status.
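
The binary-trait results can be evaluated directly. The following Python sketch computes the sibling concordance probability and the risk given an Unaffected sibling for a given incidence $K$ and heritability $h^2$. The function names are hypothetical, and the sketch inherits the normal approximation used in Equation 18, so it should be read as an illustration under those assumptions rather than an exact calculation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sibling_recurrence(K, h2):
    """P(S2 Affected | S1 Affected) under the liability threshold model."""
    T = STD_NORMAL.inv_cdf(1.0 - K)       # liability threshold
    i = STD_NORMAL.pdf(T) / K             # mean liability of Affected index siblings
    mean = 0.5 * i * h2
    variance = 1.0 - (h2 ** 2 / 4.0) * i * (i - T)
    return 1.0 - STD_NORMAL.cdf((T - mean) / variance ** 0.5)

def risk_given_unaffected(K, h2):
    """P(S2 Affected | S1 Unaffected) via Bayes' theorem."""
    return (K - K * sibling_recurrence(K, h2)) / (1.0 - K)

# Hypothetical example: 1% incidence and h2 = 0.8 on the liability scale
K, h2 = 0.01, 0.8
print(f"recurrence risk      = {sibling_recurrence(K, h2):.4f}")
print(f"risk if sib is well  = {risk_given_unaffected(K, h2):.4f}")
```

When `h2` is 0 the recurrence risk reduces to the population incidence `K`, which provides a simple sanity check.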

Appendix 2

Statistical tests for complex architecture

Here, we describe our statistical tests for complex tail architecture. Our tests identify changes in the conditional sibling distribution when ascertaining an index sibling in the tail relative to polygenic expectation. To carry out these tests, we establish a null distribution built on the assumption that indexing on siblings not in the tails reduces the likelihood that either sibling phenotype in the pair is driven by rare variants of large effect. We use the region from the 5th to the 95th percentile to estimate heritability. From the $n$ sibling pairs where the index sibling $s_1^{(i)}$ lies between the 5th and 95th percentiles, we calculate the conditional likelihood from Equation 14:

$$
L(D \mid h^2) = \prod_{i=1}^{n} p\!\left(s_2^{(i)} \mid s_1^{(i)}, h^2\right) \propto \left(1-\frac{h^4}{4}\right)^{-n/2} \exp\!\left(-\frac{\sum_{i=1}^{n}\left(s_2^{(i)}-s_1^{(i)}\frac{h^2}{2}\right)^2}{2\left(1-\frac{h^4}{4}\right)}\right) \tag{20}
$$

and maximise the log-likelihood with respect to $h^2$:

$$
\log L \propto -n\log\!\left(1-\frac{h^4}{4}\right) - \frac{1}{1-\frac{h^4}{4}}\sum_{i=1}^{n}\left(s_2^{(i)}-s_1^{(i)}\frac{h^2}{2}\right)^2 \tag{21}
$$

to obtain a maximum likelihood estimate for $h^2$ that is used to define the null distribution for our statistical tests.
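
A minimal sketch of this estimation step is given below. It maximises Equation 21 by a simple grid search over $h^2 \in [0,1]$, using only pairs whose index sibling lies between the 5th and 95th percentiles of a standard normal. The grid search, the function name `estimate_h2`, and the simulated data with a true heritability of 0.5 are assumptions made for illustration; they are not a description of how the released software performs the optimisation.

```python
import math
import random
from statistics import NormalDist

def estimate_h2(pairs, lower_pct=0.05, upper_pct=0.95, steps=2000):
    """Grid-search maximiser of Equation 21 over h2 in [0, 1].

    pairs: iterable of (index sibling z-score, conditional sibling z-score).
    """
    lo = NormalDist().inv_cdf(lower_pct)
    hi = NormalDist().inv_cdf(upper_pct)
    central = [(s1, s2) for s1, s2 in pairs if lo <= s1 <= hi]
    n = len(central)
    # Sufficient statistics, so that each grid point is cheap to evaluate
    s11 = sum(s1 * s1 for s1, _ in central)
    s12 = sum(s1 * s2 for s1, s2 in central)
    s22 = sum(s2 * s2 for _, s2 in central)

    def log_lik(h2):
        var = 1.0 - h2 ** 2 / 4.0
        rss = s22 - h2 * s12 + h2 ** 2 * s11 / 4.0  # sum of (s2 - s1 * h2 / 2)^2
        return -n * math.log(var) - rss / var

    return max((k / steps for k in range(steps + 1)), key=log_lik)

# Simulated check with a known (assumed) heritability of 0.5
rng = random.Random(7)
true_h2 = 0.5
pairs = []
for _ in range(50_000):
    m_g = rng.gauss(0.0, math.sqrt(true_h2 / 2.0))  # mid-parent genetic liability
    sd = math.sqrt(1.0 - true_h2 / 2.0)             # segregation plus environment
    pairs.append((m_g + rng.gauss(0.0, sd), m_g + rng.gauss(0.0, sd)))

h2_hat = estimate_h2(pairs)
print(f"estimated h2 = {h2_hat:.3f}")
```

Because the likelihood conditions on the index sibling's value, restricting the index siblings to the central 90% of the distribution does not by itself bias the estimate under the polygenic null.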

Statistical test for de novo architecture

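We identify de novo mutations of large effect by testing for discordance between siblings relative to the polygenic null using the conditional distribution of a sibling given the index sibling from Equation 14. Since de novo mutations typically result in trait values in the tail of the distribution, the test conditions on those index siblings in a specified upper quantile $q$ of the distribution, that is, those sibling pairs $(s_1^{(i)}, s_2^{(i)})$ such that $s_1^{(i)} > \Phi^{-1}(q)$, defined as the set $A_q$. We introduce an additional parameter $\alpha$, where values of $\alpha<0$ in the right tail and $\alpha>0$ in the left tail are indicative of discordant siblings with trait values closer to the mean, giving a log-likelihood:

$$
\log L = -\frac{1}{2\left(1-\frac{h^4}{4}\right)} \sum_{i \in A_q} \left(s_2^{(i)} - \left(\tfrac{1}{2}s_1^{(i)}h^2+\alpha\right)\right)^2 \tag{22}
$$

The null hypothesis $H_0: \alpha=0$ is tested via a score test:

$$
U = \frac{d}{d\alpha}\log L = \frac{1}{1-\frac{h^4}{4}} \sum_{i \in A_q} \left(s_2^{(i)} - \left(\tfrac{1}{2}s_1^{(i)}h^2+\alpha\right)\right) \tag{23}
$$

$$
I = -\frac{d^2}{d\alpha^2}\log L = \frac{n}{1-\frac{h^4}{4}} \tag{24}
$$

And the score test for $H_0: \alpha=0$, where $n$ here denotes the number of sibling pairs in $A_q$:

$$
z = \frac{U}{\sqrt{I}} = \frac{\sum_{i \in A_q} \left(s_2^{(i)} - \tfrac{1}{2}s_1^{(i)}h^2\right)}{\sqrt{n\left(1-\frac{h^4}{4}\right)}} \tag{25}
$$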
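
Equation 25 translates into a few lines of code. The sketch below computes the score statistic for the upper tail from a list of (index, conditional) sibling z-scores. The function name, the default quantile of 0.99, and the use of a one-sided lower-tail p-value (motivated by the expectation that $\alpha<0$ in the right tail) are assumptions of this illustration.

```python
import math
from statistics import NormalDist

def de_novo_score_test(pairs, h2, q=0.99):
    """Score statistic of Equation 25 for index siblings above quantile q.

    Returns (n, z, p), where p is a one-sided lower-tail p-value
    (an assumption of this sketch).
    """
    cutoff = NormalDist().inv_cdf(q)
    tail = [(s1, s2) for s1, s2 in pairs if s1 > cutoff]
    n = len(tail)
    if n == 0:
        raise ValueError("no index siblings above the requested quantile")
    var = 1.0 - h2 ** 2 / 4.0
    z = sum(s2 - 0.5 * h2 * s1 for s1, s2 in tail) / math.sqrt(n * var)
    return n, z, NormalDist().cdf(z)

# Toy data (hypothetical): four index siblings at z = 3 whose siblings sit at the mean,
# plus ten pairs that fall outside the tail and are therefore ignored
toy_pairs = [(3.0, 0.0)] * 4 + [(0.2, 0.1)] * 10
n, z, p = de_novo_score_test(toy_pairs, h2=0.5)
print(n, round(z, 3), round(p, 4))
```

For the lower tail, the same calculation can be applied after negating both trait values.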

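Statistical test for Mendelian architecture

Here, we test for excess concordance between siblings in the tails of the distribution by testing for an excess number of observed siblings in the tail, $S_2 > \Phi^{-1}(q)$, given that the index sibling is in the tail, $S_1 > \Phi^{-1}(q)$, where $q$ is the quantile of interest. Denoting the set of index siblings in the tail by $A_q$ and the size of the set by $n$, under the null of polygenicity, we calculate the probability that each conditional sibling exceeds $\Phi^{-1}(q)$ from the normal cdf and take the mean to define the mean concordance under polygenicity, $\pi_0$:

$$
\begin{aligned}
\pi_0 &= \frac{1}{n}\sum_{i \in A_q} P\!\left(S_2^{(i)} > \Phi^{-1}(q) \mid S_1^{(i)}=s_1^{(i)}\right) \\
&= \frac{1}{n}\sum_{i \in A_q} \left(1-\Phi\!\left(\frac{\Phi^{-1}(q)-s_1^{(i)}\frac{h^2}{2}}{\sqrt{1-\frac{h^4}{4}}}\right)\right)
\end{aligned}
$$

Denoting the observed concordance (number of sibling pairs both $>\Phi^{-1}(q)$) by $r$, the binomial log-likelihood (ignoring the constant) is

$$
\begin{aligned}
\log L &= r\log\pi + (n-r)\log(1-\pi) \\
U &= \frac{d}{d\pi}\log L = \frac{r}{\pi}-\frac{n-r}{1-\pi} = \frac{r(1-\pi)-(n-r)\pi}{\pi(1-\pi)} = \frac{r-n\pi}{\pi(1-\pi)} \\
I &= -\frac{d^2}{d\pi^2}\log L = \frac{r}{\pi^2}+\frac{n-r}{(1-\pi)^2} = \frac{r(1-\pi)^2+(n-r)\pi^2}{\pi^2(1-\pi)^2}
\end{aligned}
$$

Assuming $r=n\pi$ such that $I$ is not a function of any particular observation:

$$
I = \frac{n}{\pi(1-\pi)}
$$

And the score test for $H_0: \pi=\pi_0$:

$$
z = \frac{U}{\sqrt{I}} = \frac{r-n\pi_0}{\sqrt{n\pi_0(1-\pi_0)}}
$$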
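
The concordance test can be sketched in the same way. The code below computes $\pi_0$ by averaging the per-pair exceedance probabilities, counts the observed concordant pairs $r$, and returns the score statistic. The function name, the default quantile, and the one-sided upper-tail p-value (testing for excess concordance) are assumptions of this illustration.

```python
import math
from statistics import NormalDist

def mendelian_score_test(pairs, h2, q=0.99):
    """Score test for excess upper-tail concordance.

    Returns (observed r, expected n * pi0, z, one-sided upper-tail p-value).
    """
    std = NormalDist()
    cutoff = std.inv_cdf(q)
    tail = [(s1, s2) for s1, s2 in pairs if s1 > cutoff]
    n = len(tail)
    if n == 0:
        raise ValueError("no index siblings above the requested quantile")
    sd = math.sqrt(1.0 - h2 ** 2 / 4.0)
    # Mean probability, under polygenicity, that the conditional sibling exceeds the cutoff
    pi0 = sum(1.0 - std.cdf((cutoff - 0.5 * h2 * s1) / sd) for s1, _ in tail) / n
    r = sum(1 for _, s2 in tail if s2 > cutoff)
    z = (r - n * pi0) / math.sqrt(n * pi0 * (1.0 - pi0))
    return r, n * pi0, z, 1.0 - std.cdf(z)

# Toy data (hypothetical): ten index siblings at z = 3 whose siblings are also at z = 3
r, expected, z, p = mendelian_score_test([(3.0, 3.0)] * 10, h2=0.5)
print(r, round(expected, 2), round(z, 2), p)
```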

Statistical test for non-polygenic architecture

General departures from polygenicity can be identified based on the degree of discordance in conditional sibling distribution from the expectation. Assuming polygenicity, given index and conditional sibling pairs $(s_1^{(i)}, s_2^{(i)})$, from Equation 4, we can write

$$
z_2^{(i)} = \frac{s_2^{(i)}-\tfrac{1}{2}s_1^{(i)}h^2}{\sqrt{1-\frac{h^4}{4}}} \sim N(0,1)
$$

Therefore, departures from polygenicity can be tested in trait quantiles by testing the observed distribution of $z_2^{(i)}, i \in A_q$, where $A_q$ is the set of index siblings $s_1^{(i)}$ in the quantile of interest, relative to a standard normal distribution via the Kolmogorov–Smirnov test.
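
To make the procedure concrete, the sketch below standardises the conditional sibling values and applies a one-sample Kolmogorov-Smirnov test against $N(0,1)$ using only the Python standard library. The p-value uses the asymptotic Kolmogorov series with a common small-sample correction, which is an approximation chosen here for self-containment; in practice an established routine such as `scipy.stats.kstest` could be used instead. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def standardised_residuals(pairs, h2, q=0.99):
    """z2 values for pairs whose index sibling lies above quantile q."""
    cutoff = STD_NORMAL.inv_cdf(q)
    sd = math.sqrt(1.0 - h2 ** 2 / 4.0)
    return [(s2 - 0.5 * h2 * s1) / sd for s1, s2 in pairs if s1 > cutoff]

def ks_test_standard_normal(values):
    """One-sample Kolmogorov-Smirnov test against N(0, 1).

    Returns (D, p), where p is an asymptotic approximation.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no values supplied")
    d = 0.0
    for k, x in enumerate(xs):
        f = STD_NORMAL.cdf(x)
        d = max(d, (k + 1) / n - f, f - k / n)  # largest gap to the empirical cdf
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:  # the series converges slowly here and p is 1 to many decimals
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Illustration: exact normal quantiles fit well, whereas shifted values do not
n_points = 500
exact = [STD_NORMAL.inv_cdf((k + 0.5) / n_points) for k in range(n_points)]
print(ks_test_standard_normal(exact))
print(ks_test_standard_normal([x + 1.0 for x in exact]))
```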

Appendix 3

Model evaluation

Here, we compare our theoretical derivations (Equation 4) that rely on the infinitesimal polygenic model (Barton et al., 2017) with simulated offspring data (Appendix 3—figure 1). We also compare our model to our empirical simulation (see ‘Model and methods’) that draws allele frequencies and effect sizes from publicly available GWAS data (Neale, 2018) for two traits to produce parent and offspring genotype and genetic liability (equivalent to trait value when $h^2=1$, Appendix 3—figure 2).

Appendix 3—figure 1
Theoretical and simulated conditional expectation and variance in liability (z-score) and rank across index sibling percentiles for conditional sibling, mid-parents, and index siblings.

The simulation drew 1000 parent liability values from $N(0,1)$; these were randomly paired to produce mid-parents with liability $m_i$; two offspring were subsequently drawn from $N(m_i, \tfrac{1}{2})$ and randomly assigned as index and conditional siblings.

Appendix 3—figure 2
For both Fluid Intelligence and Standing Height, genome-wide association studies (GWAS) variants (on chromosome 1) were used to simulate parent and offspring genotypes and liability values.

Plots show that for both traits the offspring distribution is normal and that the sibling distribution is multivariate normal, in line with our theoretical prediction.

These tests demonstrate that our theoretical framework accurately reflects an additive polygenic trait in an outcrossing population. Additionally, these results demonstrate that deviations in the conditional sibling distribution can be interpreted as non-polygenic architecture or quantile-specific environmental effects.
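
The first simulation described in this appendix can be sketched as follows for a fully heritable trait. Parents are drawn from $N(0,1)$, paired at random, and two offspring per family are drawn around the mid-parent value with variance 1/2. The function name, the seed, and the larger number of parents used for the numerical check are illustrative choices made to reduce sampling noise; they are not the settings used to produce the figures.

```python
import math
import random

def simulate_sibling_pairs(n_parents=1000, seed=0):
    """Fully heritable trait: parents ~ N(0, 1), offspring ~ N(mid-parent, 1/2)."""
    rng = random.Random(seed)
    parents = [rng.gauss(0.0, 1.0) for _ in range(n_parents)]
    rng.shuffle(parents)  # random mating
    pairs = []
    for a, b in zip(parents[0::2], parents[1::2]):
        m = 0.5 * (a + b)  # mid-parent liability
        sibs = [rng.gauss(m, math.sqrt(0.5)) for _ in range(2)]
        rng.shuffle(sibs)  # random index / conditional labels
        pairs.append((sibs[0], sibs[1]))
    return pairs

# Larger than 1000 parents to reduce sampling noise (an illustrative choice)
pairs = simulate_sibling_pairs(n_parents=100_000)
n = len(pairs)
slope = sum(s1 * s2 for s1, s2 in pairs) / sum(s1 * s1 for s1, _ in pairs)
resid_var = sum((s2 - slope * s1) ** 2 for s1, s2 in pairs) / n
print(f"slope = {slope:.3f} (theory 0.5), residual variance = {resid_var:.3f} (theory 0.75)")
```

The regression slope and residual variance correspond to the conditional mean $s_1/2$ and variance $3\sigma^2/4$ of Equation 12 with $\sigma^2=1$.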

Appendix 4

Software availability and extended results

We have made our code, sample data, and a brief tutorial available online at https://www.sibarc.net/ (sibArc:Software for Inference of Genetic Architecture, 2024). Below, we display the trait summaries and sibling test results for all 18 UK Biobank traits analysed and referenced in the article.

Appendix 4—table 1
Application to the UK Biobank (extended result – trait summaries).

| Trait name (field ID) | Sib pairs | Unique values | Skew | Kurtosis | Sib-$h^2$ (full) | Sib-$h^2$ (5–95) |
|---|---|---|---|---|---|---|
| LH Grip Strength (46) | 17,174 | 85 | 0.369 | 2.731 | 0.32 | 0.36 |
| Waist Circumference (48) | 17,273 | 716 | 0.42 | 3.29 | 0.49 | 0.52 |
| Ankle Spacing (3143) | 7900 | 1928 | 0.25 | 3.11 | 0.67 | 0.69 |
| Sitting Height (20015) | 17,216 | 352 | 0.04 | 3.4 | 0.73 | 0.82 |
| Forced Expiratory Vol (20150) | 9918 | 662 | 0.4 | 3.22 | 0.49 | 0.58 |
| Body Fat % (23099) | 16,785 | 577 | 0.09 | 2.56 | 0.53 | 0.57 |
| Whole Body Impedance (23106) | 16,801 | 771 | 0.24 | 2.8 | 0.57 | 0.59 |
| Right Leg Impedance (23107) | 16,793 | 350 | 0.15 | 3.34 | 0.55 | 0.58 |
| Left Leg Impedance (23108) | 16,793 | 352 | 0.14 | 3.37 | 0.54 | 0.56 |
| Right Arm Impedance (23109) | 16,983 | 475 | 0.33 | 2.63 | 0.51 | 0.53 |
| Left Arm Impedance (23110) | 16,797 | 455 | 0.34 | 2.62 | 0.51 | 0.53 |
| Trunk Fat Percentage (23127) | 16,780 | 644 | −0.08 | 3.19 | 0.51 | 0.52 |
| Red Blood Cell Count (30010) | 16,307 | 3421 | 0.11 | 3.31 | 0.48 | 0.53 |
| Haemoglobin Concentration (30020) | 16,307 | 1119 | −0.07 | 3.36 | 0.39 | 0.44 |
| Haematocrit Percentage (30030) | 16,303 | 2872 | −0.02 | 3.35 | 0.37 | 0.41 |
| Cholesterol (30690) | 5452 | 7028 | 0.37 | 3.44 | 0.47 | 0.53 |
| LDL Direct (30780) | 5432 | 5475 | 0.37 | 3.36 | 0.45 | 0.5 |
| Urate (30880) | 5439 | 5035 | 0.46 | 3.16 | 0.42 | 0.42 |

Appendix 4—table 2
Application to the UK Biobank (extended result – sibling tests).

| Trait name (field ID) | Tail | Idx sib cutoff | KS test p value | De novo obs, exp | De novo p value | Mendelian obs, exp | Mend p value |
|---|---|---|---|---|---|---|---|
| LH Grip Strength (46) | Upper | 2.22 | 0.0932 | 0.53, 0.46 | 0.8328 | 13, 6.8 | 0.0071 |
| | Lower | −2.81 | 5.95e-06 | −0.22, −0.54 | 1.59e-05 | 7, 1.9 | 0.0012 |
| Waist Circ. (48) | Upper | 2.59 | 0.0207 | 0.9, 0.79 | 0.9414 | 16, 5.8 | 9.1e-06 |
| | Lower | −2.25 | 0.0559 | −0.53, −0.64 | 0.0715 | 12, 8.6 | 0.1144 |
| Ankle Spacing (3143) | Upper | 2.41 | 0.3266 | 0.9, 0.95 | 0.2949 | 7, 4.9 | 0.1593 |
| | Lower | −2.35 | 0.1923 | −0.99, −0.93 | 0.7177 | 7, 5.1 | 0.1934 |
| Sitting Height (20015) | Upper | 2.36 | 2.56e-11 | 0.52, 1.19 | 4.16e-14 | 5, 20.0 | 0.9998 |
| | Lower | −2.58 | 3.5e-06 | −0.63, −1.13 | 2.88e-08 | 10, 11.4 | 0.6617 |
| FEV (20150) | Upper | 2.36 | 0.0594 | 0.73, 0.79 | 0.0503 | 7, 4.4 | 0.1532 |
| | Lower | −2.75 | 0.0005 | −0.42, −0.82 | 3.96e-05 | 3, 2.2 | 0.3044 |
| Body Fat (23099) | Upper | 2.33 | 0.5765 | 0.88, 0.79 | 0.8754 | 17, 9.2 | 0.0044 |
| | Lower | −2.55 | 0.004 | −0.65, −0.85 | 0.0032 | 9, 6.6 | 0.1748 |
| Body Imp. (23106) | Upper | 2.4 | 0.0565 | 0.68, 0.83 | 0.0198 | 7, 8.9 | 0.7465 |
| | Lower | −2.46 | 0.012 | −0.71, −0.87 | 0.0209 | 10, 8.2 | 0.2566 |
| Right Leg Imp. (23107) | Upper | 2.31 | 0.7491 | 0.76, 0.78 | 0.3919 | 9, 9.9 | 0.6143 |
| | Lower | −2.5 | 0.0003 | −0.59, −0.85 | 0.0003 | 6, 7.2 | 0.672 |
| Left Leg Imp. (23108) | Upper | 2.3 | 0.1393 | 0.83, 0.74 | 0.8965 | 10, 9.1 | 0.3752 |
| | Lower | −2.5 | 0.0093 | −0.65, −0.81 | 0.0181 | 5, 6.3 | 0.7078 |
| Right Arm Imp. (23109) | Upper | 2.51 | 0.001 | 0.58, 0.81 | 0.0012 | 3, 6.9 | 0.9356 |
| | Lower | −2.37 | 0.4972 | −0.72, −0.77 | 0.252 | 10, 8.7 | 0.3251 |
| Left Arm Imp (23110) | Upper | 2.54 | 0.0023 | 0.66, 0.81 | 0.0306 | 7, 6.1 | 0.3548 |
| | Lower | −2.4 | 0.599 | −0.69, −0.76 | 0.1767 | 10, 7.9 | 0.2183 |
| Trunk Fat % (23127) | Upper | 2.21 | 0.5881 | 0.68, 0.66 | 0.6269 | 11, 8.8 | 0.2204 |
| | Lower | −2.51 | 0.1247 | −0.65, −0.74 | 0.1056 | 9, 5.5 | 0.0624 |
| RBC Count (30010) | Upper | 2.29 | 0.0892 | 0.57, 0.72 | 0.0256 | 8, 9.2 | 0.653 |
| | Lower | −2.49 | 1.58e-09 | −0.34, −0.82 | 1.07e-10 | 4, 7.1 | 0.8815 |
| Haemoglobin Conc (30020) | Upper | 2.28 | 0.0217 | 0.49, 0.6 | 0.0643 | 4, 7.2 | 0.8908 |
| | Lower | −2.59 | 4.34e-06 | −0.32, −0.71 | 1.67e-07 | 5, 4.4 | 0.3878 |
| Haematocrit % (30030) | Upper | 2.29 | 0.0104 | 0.34, 0.55 | 0.0034 | 3, 6.4 | 0.9135 |
| | Lower | −2.52 | 0.0004 | −0.35, −0.63 | 0.0002 | 4, 4.3 | 0.5622 |
| Cholesterol (30690) | Upper | 2.28 | 0.4059 | 0.57, 0.71 | 0.1425 | 6, 3.1 | 0.0479 |
| | Lower | −2.35 | 0.0045 | −0.37, −0.74 | 0.0012 | 5, 3.1 | 0.1316 |
| LDL Direct (30780) | Upper | 2.3 | 0.2262 | 0.37, 0.66 | 0.0083 | 3, 3.0 | 0.5098 |
| | Lower | −2.35 | 0.0112 | −0.51, −0.68 | 0.0918 | 5, 2.6 | 0.0614 |
| Urate (30880) | Upper | 2.41 | 0.1566 | 0.48, 0.61 | 0.146 | 3, 2.0 | 0.2314 |
| | Lower | −2.47 | 0.1641 | −0.74, −0.6 | 0.8513 | 7, 1.5 | 0.001 |

Data availability

Sibling data used is from UK Biobank (Sudlow et al., 2015) https://www.ukbiobank.ac.uk/ under Application 18177. The software produced, that can be used to carry out our analysis on sibling trait data, is made available at https://www.sibarc.net/ (https://github.com/tadesouaiaia/sibArc copy archived at Souaiaia, 2025).

References

  1. Alberts B, Johnson A, Lewis J (2002) Molecular Biology of the Cell. American Society for Cell Biology.
  2. Bernardo JM, Smith AFM (1994) Bayesian Theory. Chichester: Wiley.
  3. Johnson NL, Kotz S, Balakrishnan N (1995) Continuous Univariate Distributions. John Wiley & Sons.
  4. Johnson T, Barton N (2005) Theoretical models of selection and mutation on quantitative traits. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 360:1411–1425. https://doi.org/10.1098/rstb.2005.1667
  5. Neale B (2018) Neale Lab Data. Accessed August 1, 2018.
  6. Yengo L, Vedantam S, Marouli E, et al. (2022) A saturated map of common genetic variants associated with human height. Nature 610:704–712. https://doi.org/10.1038/s41586-022-05275-y

Article and author information

Author details

  1. Tade Souaiaia

    Department of Cellular Biology, SUNY Downstate Health Sciences, Brooklyn, United States
    Contribution
    Conceptualization, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    For correspondence
    tade.souaiaia@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3922-1372
  2. Hei Man Wu

    Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York, United States
    Contribution
    Data curation, Formal analysis, Investigation, Project administration
    Competing interests
    No competing interests declared
  3. Clive Hoggart

    Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York, United States
    Contribution
    Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Paul F O'Reilly
    For correspondence
    clive.hoggart@mssm.edu
    Competing interests
    No competing interests declared
  4. Paul F O'Reilly

    Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, New York, United States
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Investigation, Writing – original draft, Project administration, Writing – review and editing
    Contributed equally with
    Clive Hoggart
    For correspondence
    paul.oreilly@mssm.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7515-0845

Funding

No external funding was received for this work.

Acknowledgements

We thank the participants in the UK Biobank and the scientists involved in the construction of this resource for making the sibling data used in this article available. The work in this article has been conducted using the UK Biobank Resource under application 18177 (Dr O’Reilly). We would also like to thank Dr. Avi Reichenberg for early discussions and Dr. Peter Visscher for highlighting key references related to the topic, and Dr. Shai Carmi for providing feedback on a draft version of the article.

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.87522. This DOI represents all versions, and will always resolve to the latest one.

Copyright

© 2023, Souaiaia et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Tade Souaiaia, Hei Man Wu, Clive Hoggart, Paul F O'Reilly (2025) Sibling similarity can reveal key insights into genetic architecture. eLife 12:RP87522. https://doi.org/10.7554/eLife.87522.3
