The use of siblings to infer the factors influencing complex traits has been a cornerstone of quantitative genetics. Here we utilise siblings for a novel application: the identification of genetic architecture, specifically that in individuals with extreme trait values (e.g. in the top 1%). Establishing genetic architecture in these individuals is important because they are at greatest risk of disease and are most likely to harbour rare variants of large effect due to natural selection. We develop a theoretical framework that derives expected trait distributions of siblings based on an index sibling’s trait value and trait heritability. This framework is used to develop statistical tests that can infer complex genetic architecture in trait tails, distinguishing between polygenic, de novo and Mendelian tail architecture. We apply our tests to UK Biobank data here, while they can be used to infer genetic architecture in any cohort or health registry that includes siblings, without requiring genetic data. We describe how our approach has the potential to help disentangle the genetic and environmental causes of extreme trait values, to identify individuals likely to carry pathogenic variants for follow-up clinical genetic testing, and to improve the design and power of future sequencing studies to detect rare variants.
The authors present valuable findings on how to determine the genetic architecture of extreme phenotype values by using data on sibling pairs. While the authors' derivations of the method are correct, the scenarios considered are incomplete, making it difficult to have confidence in the interpretation of the results as demonstrating the influence of de novo or Mendelian (rare, penetrant-variant) architectures. The method nevertheless shows promise and will be of interest to researchers studying complex trait genetics.
The fields of quantitative genetics and genetic epidemiology have exploited the shared genetics and environment of siblings in a range of applications, notably to estimate heritability, using theory first developed a century ago (1, 2), and more recently to infer a so-called ‘household effect’ (3), which contributes to genetic risk indirectly via a correlation between genetics and the household environment. Here we leverage trait information on siblings to infer the genetic architecture of those traits, not only at the population-level, but also in relation to whether the genetic liability of each individual - specifically those with extreme trait values - is a result of a few large effect alleles or many small effect alleles.
The genetic architecture of a complex trait is typically inferred from the findings of multiple studies: with genome-wide association studies (GWAS) identifying common variants (4, 5), whole exome or whole genome sequencing studies detecting rare variants (6), and family sequencing studies designed to identify de novo and rare Mendelian mutations (7). The relative contribution of each type of variant to trait heritability is a function of historical selection pressures on the trait in the population (8, 9). If selection has recently acted to increase the average value of a trait, then the lower tail of the trait distribution will be subject to negative selection and may be enriched for large effect rare variants (10), while if the trait is subject to stabilising selection, then both tails of the trait distribution may be enriched for rare variant aetiology (11). This can result in less accurate polygenic scores in the tails of the trait distribution (12), but can also produce dissimilarity between siblings beyond what is expected under polygenicity. For example, studies on the intellectual ability of sibling pairs have demonstrated similarity for average intellectual ability (13), regression-to-the-mean for siblings at the upper tail of the distribution (13), and complete discordance when one sibling is at the lower extreme tail of the distribution (14). Findings from studies such as Reichenberg et al. 2016 (14) are consistent with the presence of de novo alleles of large effect in the trait tails, alongside alternative explanations such as specific environmental exposures. However, no theoretical framework has been developed to formally infer genetic architecture from sibling trait data.
We introduce a theoretical framework that allows widely available sibling trait data in population cohorts and health registries to be leveraged to perform statistical tests that estimate complex genetic architecture in the tails of the trait distribution. These tests can differentiate between polygenic, de novo alleles and rare variants of large effect (hereafter ‘Mendelian variants’ for shorthand). This framework establishes expectations about the trait distributions of siblings of index individuals with extreme trait values (e.g. the top 1% of the trait) according to polygenic and non-polygenic tail trait architecture (see Figure 1).
Critical to our framework is our derivation of the “conditional-sibling trait distribution”, which describes the trait distribution for one individual given the quantile value of one or more “index” siblings. Our statistical framework, derivation of the conditional-sibling trait distribution, and simulation study, allow us to develop statistical tests to infer genetic architecture from sibling registries (15) without the need for genetic data (see Model & Methods). We validate the statistical power of our tests using simulated data and real data from the UK Biobank. Our novel framework can be extended to applications such as estimating heritability, inferring assortative mating, and characterising historical selection pressures.
Model & Methods
Here we outline a framework that models the conditional-sibling trait distribution, which describes an individual’s trait distribution conditioned on one or more index sibling(s). In this section we describe our derivation of the distribution for a completely polygenic trait, outline the development of statistical tests that detect complex genetic architecture via departures from polygenicity, and describe the simulation scheme used to validate and benchmark our results. Full derivations that complement this high-level summary of our methodology are included in the Appendices.
Conditional-Sibling Inference Framework
Assuming completely polygenic architecture, siblings of individuals with extreme trait values are expected to be much less extreme. This regression-to-the-mean can be understood through the two factors (16, 17) that determine inherited genetic liability: (i) average genetic liability of the parents, known as “midparent” liability, and, (ii) random genetic reassortment occurring during meiosis. Assuming that an individual presenting an upper-tail trait value does so due to both high midparent liability and genetic reassortment favouring a higher value, then a sibling sharing common midparent liability - but subject to independent reassortment - is likely less extreme. How much less extreme can be derived by first considering the conditional distribution that relates midparents and their offspring. In the simplest case, for a completely heritable continuous polygenic trait in a large randomly mating population (18), the offspring trait value, s, is normally distributed around the midparent trait value, m, as follows:
$$ s \mid m \sim \mathcal{N}\left(m,\; \frac{\sigma^2}{2}\right) \tag{1} $$
where σ2 is the population trait variance. Note that the trait variance within families is half that of the population trait variance under neutrality and random mating, which is a key property in quantitative genetics (19–21). For a trait with heritability h2, trait variance can be partitioned into genetic, $\sigma_g^2$, and environmental, $\sigma_e^2$, contributions, such that $\sigma_g^2 = h^2\sigma^2$ and $\sigma_e^2 = (1 - h^2)\sigma^2$. Assuming mean-centred genetic and environmental trait contributions, and a trait variance of 1 (a standard normalised trait), the distribution for offspring conditioned on the midparent genetic liability, mg, is:
$$ s \mid m_g \sim \mathcal{N}\left(m_g,\; 1 - \frac{h^2}{2}\right) \tag{2} $$
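Since p(mg) ∼ 𝒩(0, h2/2), from Bayes’ Theorem it can be shown that the midparent liability conditional on offspring trait value, s, is distributed as follows:
$$ m_g \mid s \sim \mathcal{N}\left(\frac{h^2 s}{2},\; \frac{h^2}{2}\left(1 - \frac{h^2}{2}\right)\right) \tag{3} $$
Integrating the offspring distribution (conditioned on midparent liability) over this conditional midparent distribution gives the conditional-sibling trait distribution for a sibling with trait value s2, given an index sibling with trait value s1:
$$ s_2 \mid s_1 \sim \mathcal{N}\left(\frac{h^2 s_1}{2},\; 1 - \frac{h^4}{4}\right) \tag{4} $$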
Here we note the relative simplicity of this distribution, which means, for example, that under complete heritability, the siblings of an individual with a standard normal trait value of z will have a mean trait value of z/2, with variance equal to three quarters of the population variance. In Results (Figure 4) we illustrate how trait heritability and index quantile determine the conditional-sibling distribution in standard trait space and in population percentile space. In Appendix 1 we provide a full derivation of this distribution (Equation 4) and generalize our analytical results across a range of scenarios, including binary phenotypes and multiple index siblings, increasing the utility of this framework for further applications and theoretical development.
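As a minimal sketch of the result described above, the following Python function returns the mean and variance of the conditional-sibling distribution for a standardised trait. It assumes the distribution is normal with mean h²z/2 and variance 1 − h⁴/4, which is the form implied by the properties just stated (mean z/2 and variance 3/4 when h² = 1). The function name and arguments are illustrative and are not taken from the original analysis.

```python
from statistics import NormalDist


def conditional_sibling(z_index, h2):
    """Mean and variance of a sibling's standardised trait value,
    given an index sibling at z_index and heritability h2.

    Assumes a fully polygenic, additive trait with random mating and
    independent environments: s2 | s1 ~ N(h2 * s1 / 2, 1 - h2**2 / 4).
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie in [0, 1]")
    mean = h2 * z_index / 2.0
    variance = 1.0 - h2 ** 2 / 4.0
    return mean, variance


# Index sibling at the 99th percentile of a standard normal trait.
z99 = NormalDist().inv_cdf(0.99)
for h2 in (0.95, 0.5):
    mean, variance = conditional_sibling(z99, h2)
    print(f"h2={h2}: mean={mean:.2f}, variance={variance:.2f}")
```

Lower heritability pulls the expected sibling value closer to the population mean and widens the distribution towards the population variance.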
Statistical Tests for Complex Tail Architecture
Figure 2 depicts the strategy used to develop statistical tests for complex tail architecture. Our approach corresponds to testing for deviations from the expected conditional-sibling trait distribution under the null hypothesis of polygenicity in the trait tails: excess discordance is indicative of an enrichment of de novo mutations, while excess concordance indicates an enrichment of Mendelian variants, i.e. large effect variants segregating in the population. The heritability, h2, required to define the null distribution, is estimated by maximising the log-likelihood of the conditional-sibling trait distribution (Equation 4) with respect to h2:
$$ \ell(h^2) = \sum_{i=1}^{n} \left[ -\frac{1}{2}\log\left(2\pi\left(1 - \frac{h^4}{4}\right)\right) - \frac{\left(s_{2,i} - \frac{h^2}{2}s_{1,i}\right)^2}{2\left(1 - \frac{h^4}{4}\right)} \right] \tag{5} $$
where s1 and s2 represent index and conditional-sibling trait values, respectively, and n is the number of sibling pairs. This estimation method allows h2 to be estimated for given quantiles of the trait distribution by restricting sibling pair observations to those index siblings in the quantile of interest. To maximise power to detect non-polygenic architecture in the tails of the trait distribution, we estimate “polygenic heritability” from sibling pairs for which the index sibling trait value is between the 5th and 95th percentile (labeled “Distribution Body” in Figure 2). Tests for complex architecture are then performed in relation to index siblings whose trait values are in the tails of the distribution (e.g. the lower and upper 1%). Below, Aq denotes the set of sibling pairs for which the index sibling is in quantile q such that s1 > Φ−1(q) and s1 < Φ−1(q) for the upper and lower tails, respectively, where Φ−1 is the inverse normal cumulative distribution function.
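The estimation step can be sketched as follows, assuming the conditional-sibling distribution is N(h²s1/2, 1 − h⁴/4) and that trait values are already standardised. The simulated data, the grid search, the fixed z cut-offs of ±1.645 (standing in for the empirical 5th and 95th percentiles), and all function names are illustrative assumptions, not the procedure used in the original analysis.

```python
import math
import random


def simulate_pairs(n, h2, seed=1):
    """Simulate n standardised sibling pairs with trait correlation h2 / 2."""
    rng = random.Random(seed)
    b = h2 / 2.0
    sd = math.sqrt(1.0 - b * b)
    pairs = []
    for _ in range(n):
        s1 = rng.gauss(0.0, 1.0)
        pairs.append((s1, b * s1 + sd * rng.gauss(0.0, 1.0)))
    return pairs


def estimate_h2(pairs, lower=-1.645, upper=1.645, grid=1000):
    """Grid-search MLE of h2 using pairs whose index value lies in the body."""
    body = [(s1, s2) for s1, s2 in pairs if lower < s1 < upper]
    n = len(body)
    # Sufficient statistics, so each likelihood evaluation is O(1).
    s11 = sum(s1 * s1 for s1, _ in body)
    s12 = sum(s1 * s2 for s1, s2 in body)
    s22 = sum(s2 * s2 for _, s2 in body)

    def loglik(h2):
        b = h2 / 2.0
        var = 1.0 - b * b
        rss = s22 - 2.0 * b * s12 + b * b * s11
        return -0.5 * n * math.log(2.0 * math.pi * var) - rss / (2.0 * var)

    candidates = [i / grid for i in range(grid + 1)]
    return max(candidates, key=loglik)


pairs = simulate_pairs(20000, h2=0.6)
print(round(estimate_h2(pairs), 2))
```

Because the likelihood is conditional on the index value, restricting to index siblings in the distribution body does not by itself bias the estimate under the polygenic model.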
Statistical Test for De Novo Architecture
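To identify de novo architecture in the tails of the trait distribution, we introduce a parameter, α, to the log-likelihood defined by the conditional-sibling trait distribution (Equation 5):
$$ \ell(\alpha) = \sum_{i \in A_q} \left[ -\frac{1}{2}\log\left(2\pi\left(1 - \frac{h^4}{4}\right)\right) - \frac{\left(s_{2,i} - \alpha - \frac{h^2}{2}s_{1,i}\right)^2}{2\left(1 - \frac{h^4}{4}\right)} \right] $$
Values of α > 0 in the lower tail and α < 0 in the upper tail indicate excess regression-to-the-mean and, thus, high sibling discordance, consistent with an enrichment of de novo mutations among the index siblings. The z-statistic of the one-sided score test for α > 0 in the lower quantile, q, relative to the null of α = 0 is (see Appendix 2 for derivation):
$$ z = \frac{\sum_{i \in A_q}\left(s_{2,i} - \frac{h^2}{2}s_{1,i}\right)}{\sqrt{n\left(1 - \frac{h^4}{4}\right)}} $$
where n is the number of sibling pairs in Aq.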
For the upper tail test of α < 0, the above is multiplied by -1.
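A minimal sketch of this test is given below. It assumes that α enters the null distribution N(h²s1/2, 1 − h⁴/4) as an additive shift of the mean, in which case the score statistic reduces to the sum of residuals divided by the square root of n(1 − h⁴/4). The function name, the `tail` argument and the input format are illustrative assumptions.

```python
import math
from statistics import NormalDist


def de_novo_z(pairs, h2, q, tail="upper"):
    """One-sided score-test z-statistic for excess sibling discordance.

    pairs: iterable of (index, sibling) standardised trait values.
    q: tail quantile, e.g. 0.99 for the upper tail or 0.01 for the lower.
    """
    cut = NormalDist().inv_cdf(q)
    if tail == "upper":
        tail_pairs = [(s1, s2) for s1, s2 in pairs if s1 > cut]
    else:
        tail_pairs = [(s1, s2) for s1, s2 in pairs if s1 < cut]
    n = len(tail_pairs)
    if n == 0:
        raise ValueError("no index siblings in the requested tail")
    var = 1.0 - h2 ** 2 / 4.0
    # Residual of each sibling from its polygenic expectation.
    resid_sum = sum(s2 - h2 * s1 / 2.0 for s1, s2 in tail_pairs)
    z = resid_sum / math.sqrt(n * var)
    # Upper tail: discordance means siblings fall below expectation.
    return -z if tail == "upper" else z


# Toy data: 50 upper-tail index siblings whose siblings sit at the mean.
toy = [(3.0, 0.0)] * 50
print(de_novo_z(toy, h2=0.5, q=0.99) > 0)
```

A large positive value is evidence of excess regression-to-the-mean in the chosen tail; the one-sided p-value follows from the standard normal upper tail.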
Statistical Test for Mendelian Architecture
To identify Mendelian architecture in the tails of the trait distribution, we compare the observed and expected tail sibling concordance, defined by the number of sibling pairs for which both siblings have trait values in the tail. For each index sibling in Aq, we calculate the probability that the conditional sibling is also in Aq, which, for the upper tail, is given by:
$$ P\left(s_2 > \Phi^{-1}(q) \mid s_1\right) = 1 - \Phi\left(\frac{\Phi^{-1}(q) - \frac{h^2}{2}s_1}{\sqrt{1 - \frac{h^4}{4}}}\right) $$
where Φ represents the normal cumulative distribution function. Denoting the mean of this probability across all index siblings in Aq by π0, the expected sibling concordance is nπ0 where n is the number of index siblings in Aq. Given an observed number of concordant siblings r, the z-statistic for a one-sided score test for excess concordance is (see Appendix 2 for derivation) given by:
$$ z = \frac{r - n\pi_0}{\sqrt{n\pi_0(1 - \pi_0)}} $$
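The concordance test can be sketched in the same style. The code below assumes the polygenic null N(h²s1/2, 1 − h⁴/4) for the conditional sibling and a binomial score statistic of the form (r − nπ0)/√(nπ0(1 − π0)); it handles the upper tail only, and the function name and inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist


def mendelian_z(pairs, h2, q):
    """One-sided score-test z-statistic for excess upper-tail concordance.

    pairs: iterable of (index, sibling) standardised trait values.
    q: upper-tail quantile, e.g. 0.99.
    """
    std_normal = NormalDist()
    cut = std_normal.inv_cdf(q)
    sd = math.sqrt(1.0 - h2 ** 2 / 4.0)
    tail_pairs = [(s1, s2) for s1, s2 in pairs if s1 > cut]
    n = len(tail_pairs)
    if n == 0:
        raise ValueError("no index siblings in the upper tail")
    # Probability, under polygenicity, that each conditional sibling is
    # also in the tail; pi0 is the mean of these probabilities.
    probs = [1.0 - std_normal.cdf((cut - h2 * s1 / 2.0) / sd)
             for s1, _ in tail_pairs]
    pi0 = sum(probs) / n
    # Observed number of concordant pairs.
    r = sum(1 for _, s2 in tail_pairs if s2 > cut)
    return (r - n * pi0) / math.sqrt(n * pi0 * (1.0 - pi0))


# Toy data: 20 of 100 tail index siblings have a sibling in the same tail.
toy = [(2.5, 2.5)] * 20 + [(2.5, 0.0)] * 80
print(mendelian_z(toy, h2=0.5, q=0.99) > 0)
```

For the lower tail the same logic applies with the inequalities reversed and the tail probability taken from the lower side of the normal distribution.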
Simulation of Conditional Sibling Data
We perform simulations using publicly available GWAS data on multiple traits to validate our analytical model and to benchmark our tests for complex architecture. Figure 3 depicts the different stages of our simulation procedure. We start by simulating a “parent population” (step A), assigning genotypes based on the allele frequencies of the first 100k SNPs from a trait GWAS. Additive parent liability can then be calculated based on the genotype effect size distribution of the GWAS. Next, parents are randomly paired, their liabilities averaged to produce midparent trait values (step B), and genotypes of two offspring (Equation 1) and corresponding genetic liabilities, G, are calculated (step C) assuming independent reassortment of parental alleles and unlinked SNPs.
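Steps A to C can be sketched as below. This is a simplified illustration, not the simulation code used for the results: allele frequencies and effect sizes are drawn at random as stand-ins for GWAS values, only 100 unlinked SNPs are used, and all function names are hypothetical. Under random mating and additivity, the genetic liabilities of the two simulated siblings should correlate at roughly 0.5.

```python
import random


def simulate_sibling_liabilities(n_families=1500, n_snps=100, seed=7):
    """Return (g1, g2): additive genetic liabilities for sibling pairs."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies and effect sizes (stand-ins for GWAS values).
    freqs = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    betas = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]

    def parent():
        # Two alleles per SNP, drawn independently (unlinked SNPs).
        return [(rng.random() < p, rng.random() < p) for p in freqs]

    def child(mother, father):
        # Each parent transmits one randomly chosen allele per SNP.
        return sum(
            beta * (m[rng.getrandbits(1)] + f[rng.getrandbits(1)])
            for beta, m, f in zip(betas, mother, father)
        )

    g1, g2 = [], []
    for _ in range(n_families):
        mother, father = parent(), parent()  # random pairing of parents
        g1.append(child(mother, father))
        g2.append(child(mother, father))
    return g1, g2


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


g1, g2 = simulate_sibling_liabilities()
print(round(correlation(g1, g2), 2))
```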
In step D, we generate offspring trait values for different degrees of heritability by adding an environmental random effect. For heritability, h2, offspring trait values are given by T = hG + E, where the environmental effect E is drawn from a normal distribution with mean 0 and variance (1 − h2). The simulated trait has a 𝒩(0, 1) distribution and the squared correlation between the genetic liability and trait is equal to the heritability.
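Step D can be illustrated with a short sketch, assuming the genetic liability G has been standardised to mean 0 and variance 1. With T = hG + E, the correlation between G and T is h, so the squared correlation recovers h². The function names and the simulated liabilities are illustrative only.

```python
import math
import random
import statistics


def simulate_trait(genetic_liability, h2, rng):
    """Add environmental noise so that the trait has heritability h2.

    genetic_liability is assumed to be standardised to N(0, 1).
    """
    h = math.sqrt(h2)
    env_sd = math.sqrt(1.0 - h2)
    return [h * g + rng.gauss(0.0, env_sd) for g in genetic_liability]


def squared_correlation(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(11)
liability = [rng.gauss(0.0, 1.0) for _ in range(50000)]
trait = simulate_trait(liability, h2=0.5, rng=rng)
print(round(statistics.pvariance(trait), 2),
      round(squared_correlation(liability, trait), 2))
```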
In step E, we simulate the effect of complex tail architecture on the conditional-sibling trait distribution. We assume that rare variants are sufficiently penetrant to move individuals into the tails of the distribution independent of polygenic liability. We modify sibling trait values for individuals already in the tails (from Step D) to minimise perturbation of the trait distribution. We simulate de novo tail architecture by resampling the less extreme sibling from the background distribution, and simulate Mendelian tail architecture by resampling the less extreme sibling from the background distribution with probability 0.5 and from the same tail as the extreme sibling with probability 0.5.
Application to UK Biobank Data
For the UK Biobank analyses, we used six continuous traits (Body Fat, Mean Corpuscular Haemoglobin, Neuroticism, Heel Bone Mineral Density, Monocyte Count, and Sitting Height) with (22) and data on > 4,500 sibling pairs (sibling pairs defined as having kinship coefficient 0.18 - 0.35 and > 0.1% SNPs with 0 IBD to distinguish from parent-offspring pairs (23, 24)). Outliers with absolute trait value > 6 standard deviations from the mean were removed and then cohort-wide trait values were standardised using a rank-based inverse normal transformation and adjusted for age, sex, recruitment centre, batch covariates and the first 40 principal components. The sub-sample corresponding to the sibling pairs was then re-normalised, and for each sibling pair one was randomly assigned as the index sibling and the other the conditional sibling. Sibling pairs were then sorted by their index trait value and each sibling binned according to trait percentile rank among all siblings.
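The rank-based inverse normal transformation mentioned above can be sketched as follows. The exact variant used in the analysis is not stated, so the rank offset (0.5 here, giving quantiles (rank − 0.5)/n) and the handling of ties (broken by input order) are assumptions made for illustration, as is the function name.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard normal scores via their ranks.

    Quantiles are (rank - offset) / (n - 2 * offset + 1); ties are broken
    by input order, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        quantile = (rank - offset) / (n - 2.0 * offset + 1.0)
        transformed[idx] = std_normal.inv_cdf(quantile)
    return transformed


print([round(z, 2) for z in rank_inverse_normal([10.0, 30.0, 20.0])])
```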
Here we illustrate the conditional-sibling trait distribution, validate the accuracy of our analytical model using simulation, perform power analyses for our statistical tests for complex genetic tail architecture (see Model & Methods), and apply our tests to trait data on thousands of siblings from the UK Biobank.
Conditional-Sibling Trait Distribution
In Figure 4:A the conditional-sibling trait distribution (Equation 4) is illustrated at different index sibling trait values (ranked percentiles). For an almost entirely heritable polygenic trait (orange), siblings of individuals at the 99th percentile (z = 2.32) have mean z-scores approximately halfway between the population mean and index mean (i.e. z = 1.1). This regression-to-the-mean is greater when trait heritability is lower (blue), assuming (as here) independent environmental risk among siblings.
In Figure 4:B the conditional-sibling z-distribution is transformed into percentiles for interpretation in rank space. This distribution is skewed, especially at the tails, due to truncation at extreme quantiles (i.e. siblings cannot be more extreme than the top 1%). For a trait with h2 = 0.95, siblings of individuals at the 99th percentile (z = 2.32) have a mean trait value at the 80th percentile. Note that this is less extreme than the result of transforming their expected z-value into percentile space (Φ(1.1) = 86%), which is a consequence of Jensen’s inequality (25) given that the normal cumulative distribution function is concave above zero (equivalently, its inverse is convex above the median).
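The gap between the two percentile summaries can be checked numerically. The sketch below assumes the conditional-sibling distribution N(h²z/2, 1 − h⁴/4) and uses the standard identity that, for Z ∼ N(μ, σ²), the expectation of Φ(Z) equals Φ(μ/√(1 + σ²)). The function name is illustrative.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def sibling_percentile_summary(index_percentile, h2):
    """Return (percentile of the mean z-score, mean percentile) for siblings."""
    z = std_normal.inv_cdf(index_percentile)
    mean = h2 * z / 2.0
    var = 1.0 - h2 ** 2 / 4.0
    # If Z ~ N(mean, var), then E[Phi(Z)] = Phi(mean / sqrt(1 + var)).
    mean_percentile = std_normal.cdf(mean / math.sqrt(1.0 + var))
    return std_normal.cdf(mean), mean_percentile


pct_of_mean, mean_pct = sibling_percentile_summary(0.99, h2=0.95)
print(f"percentile of mean z: {pct_of_mean:.2f}, mean percentile: {mean_pct:.2f}")
```

The mean percentile is lower than the percentile of the mean z-score, as Jensen's inequality requires when the expected z-value lies above zero.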
We compared the theoretical conditional-sibling trait distributions to those generated from simulated data (see Appendix 3) and found that irrespective of trait used to simulate data (e.g. fluid intelligence, height) the two distributions did not differ significantly, suggesting that our analytically derived distributions are a valid model for the conditional-sibling trait distribution (Equation 4).
Power of Statistical Tests to Identify Complex Tail Architecture
Building on the theoretical framework introduced in the Model & Methods and illustrated in the previous section, we develop statistical tests to identify complex architecture in the tails of the trait distribution. These tests leverage the fact that the similarity (or dissimilarity) in trait values among siblings provides information about the genetic architecture underlying the trait (see Figure 1). For example, high-impact de novo mutations generate large dissimilarity between siblings when only one carries the unique mutant allele, while Mendelian variants can create excess similarity in the tails of the distribution when siblings both inherit the same mutant allele.
In Appendix 2, we provide detailed derivations for the statistical tests described at a high level in Model & Methods and explain how they identify tail signatures in contrast to a polygenic background where conditional siblings regress-to-the-mean at a rate proportional to h2/2 (Equation 4). The performance of the two tests evaluated using simulated sibling data is shown in Figure 5. These tests demonstrate that power to identify de novo architecture is greatest when heritability is high, while power to identify Mendelian architecture is greatest when heritability is low. These patterns can be explained by the fact that high heritability should lead to relatively high similarity among siblings, and low heritability to low similarity, under polygenicity. When heritability is estimated near 50% and at least 0.1% of the population has high-impact rare aetiology, both tests are well-powered to identify each class of complex tail architecture.
Identifying Complex Tail Architecture in UK Biobank Data
We applied our two statistical tests of complex tail architecture to sibling-pair data on six traits from the UK Biobank (26) to illustrate the performance of our tests on real data. For each of the six traits, we tested the trait distribution for normality, estimated the (polygenic) heritability in the body of the trait distribution via Equation 5 (computed between the 5th and 95th percentiles), and performed tests to identify de novo and Mendelian architecture in the lower and upper tails of the distribution of each trait. We identified lower tail de novo architecture in Heel Bone Mineral Density and Monocyte Count, upper tail de novo architecture in Mean Corpuscular Haemoglobin and de novo architecture at both ends of the distribution in Sitting Height. Upper tail Mendelian architecture was identified in Body Fat and no complex tail architecture was detected for Neuroticism. These results support evidence from deep sequencing studies that indicate that rare variants play a substantial role in the genetic aetiology for Sitting Height (27, 28) and Heel Bone Mineral Density (29).
In this paper, we present a novel approach to infer the genetic architecture of continuous traits, specifically in the tails of their distributions, from sibling trait data alone. Our approach is based on a theoretical framework that we develop, which derives the expected trait distributions of siblings conditional on the trait value of an index-sibling and the trait heritability, assuming polygenicity. The key intuition underlying the approach is that departures from the expected conditional-sibling trait distribution in relation to index-siblings selected from the trait tails may be due to non-polygenic architecture in the tails.
We demonstrate the validity of our conditional-sibling analytical derivations through simulations and show that our tests for identifying de novo and Mendelian architecture in the tails of trait distributions are well-powered when high-impact alleles are present in the population on the order of 1 out of 1000 individuals. We apply our tests to six traits using UK Biobank data and find evidence for de novo architecture in the distribution tails of heel bone mineral density, monocyte count, mean corpuscular haemoglobin and sitting height, as well as Mendelian architecture in the upper tail of body fat.
There are several areas in which our work could have short-term utility. Firstly, those individuals inferred as having rare variants of high-impact could be followed up in multiple ways to gain individual-level insights. For example, they could undergo clinical genetic testing to identify potential pathogenic variants with effects beyond the examined trait, either in the form of diseases or disorders that the individual has already been diagnosed with or else that they have yet to present with but may be at high future risk for. Furthermore, investigation of their environmental risk profile may indicate an alternative - environmental - explanation for their extreme trait value (see below), rather than the rare genetic architecture inferred by our tests.
Our framework could also help to refine the design of sequencing studies for identifying rare variants of large effect. Such studies either sequence entire cohorts at relatively high cost (30) or else perform more targeted sequencing of individuals in the trait tails with the goal of optimising power per cost (31). However, even the latter approach is usually performed blind to evidence of enrichment of rare variant aetiology in the tails. Since our approach enables identification of individuals most likely to harbour rare variants, then these individuals could be prioritised for (deep) sequencing. Moreover, our ability to distinguish between de novo and Mendelian architecture could influence the broad study design, with the former suggesting that a family trio design may be more effective than population sequencing, which may be favoured if Mendelian architecture is inferred. Furthermore, our approach could be applied as a screening step to prioritise those traits, and corresponding tails, most likely to harbour rare variant architecture. Finally, if sequence data have already been collected, either cohort-wide or using a more targeted design, then our approach could be utilised to increase the power of statistical methods for detecting rare variants by upweighting individuals most likely to harbour rare variants.
This study has several limitations. First and foremost, departures from the expected conditional-sibling trait distributions could be due to environmental risk factors, such as medication use or work-related exposures, rather than rare genetic architecture. For this reason, rejections of the null hypothesis from our tests should be considered only as indicating effects consistent with non-polygenic genetic architecture, alongside alternative explanations such as tail-specific environmental risks. We suggest that further investigation of individuals’ clinical, environmental and genetic profiles is required to achieve greater certainty about the causes of their extreme trait values. Nevertheless, given knowledge that rare variants of high-impact contribute to complex trait architecture, we expect that traits for which we infer non-polygenic architecture will, on average, be more enriched for rare architecture in the tail(s) than other traits. Secondly, our modelling assumes that environmental risk factors of siblings are independent of each other. If in fact shared environmental risk factors contribute significantly to trait similarity among siblings, then our heritability estimates will be upwardly biased. However, this would only impact our tests if the degree of shared environmental risk differed in the trait tails relative to the rest of the trait distribution. Moreover, a large meta-analysis of heritability estimates from twin studies (32) concluded that the contribution of shared environment among siblings (even twins) is insubstantial, and so we might expect this to have limited impact on results from our tests. Thirdly, our modelling assumes random mating and so the results from our tests in relation to traits that may be the subject of assortative mating should be considered with caution.
Likewise, our modelling assumes additivity of genetic effects and so, while additivity is well-supported by much statistical genetics research (33), results from our tests should be reconsidered for any traits with evidence for significant non-additive genetic effects.
Our approach not only provides a novel way of inferring genetic architecture (without genetic data) but can do so specifically in the tails of trait distributions, which are most likely to harbour complex genetic architecture, due to selection, and are a key focus in biomedical research given their enrichment for disease. This work could also have broader implications in quantitative genetics since we derive fundamental results about the relationship among family members’ complex trait values. The conditional-sibling trait distribution provides a simple way of understanding the expected trait values of individuals according to their sibling’s trait value, which could be used to answer questions of societal importance and inform future research. For example, it can be used to answer questions such as: as a consequence of genetics alone, how much overlap should there be in the traits of offspring of midparents at the 5th and 95th percentile and how does that contrast with what we observe in highly structured societies? Moreover, further development of the theory described here could lead to a range of other applications, for example, estimating levels of assortative mating, inferring historical selection pressures, and quantifying heritability in specific strata of the population.
We thank the participants in the UK Biobank and the scientists involved in the construction of this resource for making the sibling data used in this manuscript available. The work in this manuscript has been conducted using the UK Biobank Resource under application 18177 (Dr O’Reilly). We would also like to thank Dr. Peter Visscher for highlighting some important references.
Appendix 1 Derivation of Conditional Sibling Distributions
Here we derive the distribution that describes the probability density of the “conditional” sibling (S2) given the genetic liability, z-value, case-status, or rank of one or more index siblings relative to the population. In each case we assume a population of unrelated parents and rely on the results from the infinitesimal polygenic model that show that within family variance is normally distributed around midparent genetic liability (average of parents) with half the ancestral trait variance even when selection, drift, population structure or dominance effects alter the between family trait distribution (18, 34, 35).
Case 1) Index Liability Known, Continuous Trait (h2 = 1): P(S2 | S1 = s1)
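We begin with the simplest case, a polygenic normally distributed trait which is fully heritable (h2 = 1) where the genetic liability of an index sibling in a population is known. Throughout we denote the midparent, index sibling and conditional sibling by M, S1 and S2. We begin by calculating the midparent distribution conditional on index liability using Bayes’ theorem:
$$ P(M = m \mid S_1 = s_1) = \frac{P(S_1 = s_1 \mid M = m)\,P(M = m)}{P(S_1 = s_1)} \propto \mathcal{N}\left(s_1;\, m,\, \tfrac{\sigma^2}{2}\right)\mathcal{N}\left(m;\, 0,\, \tfrac{\sigma^2}{2}\right) $$
Then using the following identity (36):
$$ \mathcal{N}(x;\, a,\, A)\,\mathcal{N}(x;\, b,\, B) = \mathcal{N}(a;\, b,\, A + B)\,\mathcal{N}(x;\, c,\, C), \qquad C = \left(A^{-1} + B^{-1}\right)^{-1},\quad c = C\left(A^{-1}a + B^{-1}b\right) \tag{11} $$
we obtain $M \mid S_1 = s_1 \sim \mathcal{N}\left(s_1/2,\; \sigma^2/4\right)$. We calculate the conditional sibling distribution similarly:
$$ P(S_2 = s_2 \mid S_1 = s_1) = \int \mathcal{N}\left(s_2;\, m,\, \tfrac{\sigma^2}{2}\right)\mathcal{N}\left(m;\, \tfrac{s_1}{2},\, \tfrac{\sigma^2}{4}\right)dm = \mathcal{N}\left(s_2;\, \tfrac{s_1}{2},\, \tfrac{3\sigma^2}{4}\right) $$
Thus, as predicted by the infinitesimal polygenic model, the conditional sibling liability is normally distributed around the midparent liability distribution with additional variance equal to half the population liability variance.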
Case 2) Index Trait Value, Continuous Trait (h2 ≠ 1): P(S2 | S1 = s1)
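In this case, the primary result considered in this manuscript, the trait z-value of an index sibling is known, or equivalently their percentile rank, as in genome-wide association studies where the rank-based inverse normal transformation has been applied (37). Transformation to a Z distribution (σ2 = 1) means that for heritability h2 the genetic liability and environmental contributions to trait variance are h2 and 1 − h2, respectively. Similar to the previous case we begin by calculating the conditional midparent liability from Bayes’ theorem:
$$ P(M_g = m \mid S_1 = s_1) \propto \mathcal{N}\left(s_1;\, m,\, 1 - \tfrac{h^2}{2}\right)\mathcal{N}\left(m;\, 0,\, \tfrac{h^2}{2}\right) \;\Rightarrow\; M_g \mid S_1 = s_1 \sim \mathcal{N}\left(\frac{h^2 s_1}{2},\; \frac{h^2}{2}\left(1 - \frac{h^2}{2}\right)\right) $$
Then, we again use Equation 11 to derive the conditional sibling distribution:
$$ P(S_2 = s_2 \mid S_1 = s_1) = \int \mathcal{N}\left(s_2;\, m,\, 1 - \tfrac{h^2}{2}\right)\mathcal{N}\left(m;\, \tfrac{h^2 s_1}{2},\, \tfrac{h^2}{2}\left(1 - \tfrac{h^2}{2}\right)\right)dm = \mathcal{N}\left(s_2;\, \frac{h^2 s_1}{2},\; 1 - \frac{h^4}{4}\right) \tag{14} $$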
Case 3) Multiple Index Trait Values, Continuous Trait: P(S3 | S1 = s1, S2 = s2)
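The previous case can also be derived using the joint trait distribution for related individuals:
$$ \mathbf{S} \sim \mathcal{N}(\mathbf{0},\, \Sigma), \qquad \Sigma = h^2 G + (1 - h^2) I $$
where Σ is the trait covariance and G is the genetic relationship matrix. Thus for a sibling pair:
$$ \begin{pmatrix} S_1 \\ S_2 \end{pmatrix} \sim \mathcal{N}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} 1 & \frac{h^2}{2} \\ \frac{h^2}{2} & 1 \end{pmatrix}\right) $$
As shown by Bernardo and Smith (38), if X is multivariate normal 𝒩(μ, λ−1), where λ = Σ−1 is the precision matrix, and X is partitioned into x1 and x2, with corresponding partitions of μ and λ of:
$$ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \lambda = \begin{pmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{pmatrix} $$
then the conditional distribution of x1 given x2 is also normal with mean and precision matrix:
$$ \mu_{1|2} = \mu_1 - \lambda_{11}^{-1}\lambda_{12}\,(x_2 - \mu_2), \qquad \lambda_{1|2} = \lambda_{11} \tag{15} $$
Thus, given the joint distribution for three siblings:
$$ \begin{pmatrix} S_1 \\ S_2 \\ S_3 \end{pmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \Sigma\right), \qquad \Sigma = \begin{pmatrix} 1 & \frac{h^2}{2} & \frac{h^2}{2} \\ \frac{h^2}{2} & 1 & \frac{h^2}{2} \\ \frac{h^2}{2} & \frac{h^2}{2} & 1 \end{pmatrix} $$
the precision matrix λ = Σ−1 is
$$ \lambda = \frac{1}{(2 - h^2)(1 + h^2)} \begin{pmatrix} 2 + h^2 & -h^2 & -h^2 \\ -h^2 & 2 + h^2 & -h^2 \\ -h^2 & -h^2 & 2 + h^2 \end{pmatrix} $$
and we can calculate the conditional distribution given two index siblings using Equation 15:
$$ S_3 \mid S_1 = s_1, S_2 = s_2 \sim \mathcal{N}\left(\frac{h^2 (s_1 + s_2)}{2 + h^2},\; \frac{(2 - h^2)(1 + h^2)}{2 + h^2}\right) $$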
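For two index siblings the conditional distribution can be written in closed form. The sketch below assumes a standardised trait in which every pair of full siblings has trait covariance h²/2, and applies standard multivariate normal conditioning; the resulting expressions, mean h²(s1 + s2)/(2 + h²) and variance (2 − h²)(1 + h²)/(2 + h²), are derived under that assumption. The function name is illustrative.

```python
def conditional_given_two_siblings(s1, s2, h2):
    """Mean and variance of a third sibling's standardised trait value,
    given two index siblings with trait values s1 and s2."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie in [0, 1]")
    a = h2 / 2.0  # expected trait covariance between full siblings
    mean = a * (s1 + s2) / (1.0 + a)
    variance = (1.0 - a) * (1.0 + 2.0 * a) / (1.0 + a)
    return mean, variance


# Two index siblings at z = 1.5 for a fully heritable trait.
print(conditional_given_two_siblings(1.5, 1.5, h2=1.0))
```

Conditioning on a second sibling both moves the expected value closer to the index siblings and narrows the distribution relative to the single-index case (variance 2/3 versus 3/4 when h² = 1).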
Case 4) Binary Trait: P(S2 | S1 = Affected)
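Here we again assume an underlying distribution that is 𝒩(0, 1) and made up of genetic and environmental components. However, we only know the index sibling’s status, which, as described under the liability threshold model (39), is equivalent to conditioning on the event that the index sibling’s trait value is above or below a z-value threshold T:
$$ S_1 = \text{Affected} \iff s_1 > T, \qquad S_1 = \text{Unaffected} \iff s_1 \le T $$
where T = Φ−1(1 − K), Φ−1 is the inverse normal cumulative distribution function, and K is the incidence of the binary trait in the population. Thus, the conditional distribution of one sibling given an Affected index sibling can be calculated by integrating over the normal distribution truncated at T:
$$ P(S_2 = s_2 \mid S_1 > T) = \frac{1}{K}\int_{T}^{\infty} P(S_2 = s_2 \mid S_1 = s_1)\,\phi(s_1)\,ds_1 \tag{17} $$
where φ is the standard normal density. The first two moments of this truncated normal (40) are:
$$ \mu_T = E[S_1 \mid S_1 > T] = \frac{\phi(T)}{K}, \qquad \sigma_T^2 = \mathrm{Var}(S_1 \mid S_1 > T) = 1 - \mu_T(\mu_T - T) $$
Approximating the index sibling distribution using a normal whose moments are taken from this truncated distribution, Equation 17 becomes:
$$ P(S_2 = s_2 \mid S_1 > T) \approx \int \mathcal{N}\left(s_2;\, \tfrac{h^2 s_1}{2},\, 1 - \tfrac{h^4}{4}\right)\mathcal{N}\left(s_1;\, \mu_T,\, \sigma_T^2\right)ds_1 $$
which can be solved using the identity given in Equation 11:
$$ S_2 \mid S_1 > T \;\sim\; \mathcal{N}\left(\frac{h^2 \mu_T}{2},\; 1 - \frac{h^4}{4}\left(1 - \sigma_T^2\right)\right) $$
Thus, conditional on an Affected sibling, the probability of concordance is:
$$ P(S_2 > T \mid S_1 > T) \approx 1 - \Phi\left(\frac{T - \frac{h^2 \mu_T}{2}}{\sqrt{1 - \frac{h^4}{4}\left(1 - \sigma_T^2\right)}}\right) $$
The probability of discordance given an Unaffected sibling can be calculated from Bayes’ theorem:
$$ P(S_2 > T \mid S_1 \le T) = \frac{P(S_1 \le T \mid S_2 > T)\,P(S_2 > T)}{P(S_1 \le T)} = \frac{K\left(1 - P(S_2 > T \mid S_1 > T)\right)}{1 - K} $$
which allows the conditional probability of case status to be determined given an index sibling’s status.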
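The binary-trait calculation can be sketched numerically as follows. The code assumes a liability threshold model on a standard normal liability, the conditional-sibling distribution N(h²s1/2, 1 − h⁴/4), and the normal approximation to the truncated index-liability distribution described above, using the standard truncated-normal mean φ(T)/K and variance 1 − μ(μ − T). Function names are illustrative.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def sibling_recurrence_risk(prevalence, h2):
    """Approximate P(sibling affected | index sibling affected)."""
    t = std_normal.inv_cdf(1.0 - prevalence)   # liability threshold T
    mu_t = std_normal.pdf(t) / prevalence      # mean liability of affected
    var_t = 1.0 - mu_t * (mu_t - t)            # variance of liability of affected
    b = h2 / 2.0
    mean = b * mu_t
    var = (1.0 - b * b) + b * b * var_t
    return 1.0 - std_normal.cdf((t - mean) / math.sqrt(var))


def risk_given_unaffected(prevalence, h2):
    """P(sibling affected | index sibling unaffected), via Bayes' theorem."""
    concordance = sibling_recurrence_risk(prevalence, h2)
    return prevalence * (1.0 - concordance) / (1.0 - prevalence)


k = 0.01
print(f"risk given affected sibling:   {sibling_recurrence_risk(k, 0.5):.3f}")
print(f"risk given unaffected sibling: {risk_given_unaffected(k, 0.5):.4f}")
```

When h² = 0 the recurrence risk collapses to the population incidence K, which provides a quick sanity check on the approximation.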
Appendix 2 Statistical Tests for Complex Architecture
Here we describe our statistical tests for complex tail architecture. Our tests identify changes in the conditional sibling distribution when ascertaining on an index sibling in the tail relative to polygenic expectation. To carry out these tests we establish a null distribution built on the assumption that indexing on siblings not in the tails reduces the likelihood that either sibling phenotype in the pair is driven by rare variants of large effect. We use the region from the 5th to the 95th percentile to estimate heritability. From the n sibling pairs where the index sibling is between the 5th and 95th percentiles, we calculate the conditional likelihood (Equation 14):
$$ L(h^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\left(1 - \frac{h^4}{4}\right)}} \exp\left(-\frac{\left(s_{2,i} - \frac{h^2}{2}s_{1,i}\right)^2}{2\left(1 - \frac{h^4}{4}\right)}\right) $$
and maximize the log-likelihood with respect to h2:
$$ \hat{h}^2 = \arg\max_{h^2} \sum_{i=1}^{n} \left[ -\frac{1}{2}\log\left(2\pi\left(1 - \frac{h^4}{4}\right)\right) - \frac{\left(s_{2,i} - \frac{h^2}{2}s_{1,i}\right)^2}{2\left(1 - \frac{h^4}{4}\right)} \right] $$
to obtain a maximum likelihood estimate for h2 that is used to define the null distribution for our statistical tests.
Statistical Test for De Novo Architecture
We identify de novo mutations of large effect by testing for discordance between sibs relative to the polygenic null using the conditional distribution of a sib given index sib from (Equation 14). Since de novo mutations typically result in trait values in the tail of the distribution the test conditions on those index sibs in a specified upper quantile q of the distribution, i.e. those sib pairs such that , defined as the set Aq. We introduce an additional parameter α where values of α < 0 in the right tail and α > 0 in the left tail are indicative of discordant sibs with trait values closer to the mean, giving a log-likelihood:
The null hypothesis H0 : α = 0 is tested via a score test:
And the score test for H0 : α = 0:
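A minimal sketch of such a score test is given below. It assumes that α enters the conditional sibling distribution as a shift in its mean, so that the sibling of an index sib in Aq is normal with mean α + h2·s1/2 and variance 1 − h2²/4; under that assumption the score at α = 0 is the sum of residuals divided by the variance, and the information is n divided by the variance. The function name `de_novo_score_test` and its return values are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def de_novo_score_test(s1, s2, h2, q=0.99):
    """Score test of H0: alpha = 0 for index sibs in the upper tail (s1 > Phi^-1(q)).

    Assumption: alpha is a mean shift in the conditional sibling distribution.
    Returns the standardised score, a one-sided p-value and the tail sample size.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    in_tail = s1 > norm.ppf(q)                    # the set A_q
    resid = s2[in_tail] - 0.5 * h2 * s1[in_tail]  # deviation from polygenic expectation
    var = 1.0 - 0.25 * h2 ** 2
    n = resid.size
    score = resid.sum() / var                     # dl/d(alpha) evaluated at alpha = 0
    info = n / var                                # Fisher information for alpha
    z = score / np.sqrt(info)
    p_value = norm.cdf(z)                         # small when sibs sit closer to the mean
    return z, p_value, n
```

In the upper tail, a strongly negative statistic indicates siblings that are closer to the mean than polygenic expectation; the lower tail would be handled by mirroring the inequality and the direction of the p-value.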
Statistical Test for Mendelian Architecture
Here we test for excess concordance between sibs in the tails of the distribution by testing for an excess number of observed siblings in the tail, S2 > Φ−1(q), given that the index sib is in the tail, S1 > Φ−1(q), where q is the quantile of interest. Denoting the set of index sibs in the tail by Aq and the size of the set by n, under the null of polygenicity we calculate the probability that each conditional sibling exceeds Φ−1(q) from the normal cdf, and average these probabilities to define the mean concordance under polygenicity, π0:
Denoting the observed concordance (number of sibling pairs both > Φ−1(q)) by r, the binomial log-likelihood (ignoring the constant) is:
Assuming r = nπ such that I is not a function of any particular observation:
And the score test for H0 : π = π0:
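The concordance test can be sketched as follows, assuming the same conditional sibling distribution as above (mean h2·s1/2, variance 1 − h2²/4) and the standard binomial log-likelihood r·log π + (n − r)·log(1 − π). Under these assumptions the score is (r − nπ0)/(π0(1 − π0)) and the information is n/(π0(1 − π0)), so the standardised statistic is (r − nπ0)/√(nπ0(1 − π0)). The function name `mendelian_score_test` is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def mendelian_score_test(s1, s2, h2, q=0.99):
    """Score test of H0: pi = pi0 for excess sibling concordance in the upper tail.

    Returns the standardised score, a one-sided p-value and the null concordance pi0.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    t = norm.ppf(q)                              # tail threshold Phi^-1(q)
    in_tail = s1 > t                             # the set A_q
    n = int(in_tail.sum())
    sd = np.sqrt(1.0 - 0.25 * h2 ** 2)
    # Probability that each conditional sibling exceeds the threshold under polygenicity.
    p_exceed = norm.sf((t - 0.5 * h2 * s1[in_tail]) / sd)
    pi0 = p_exceed.mean()                        # mean concordance under the null
    r = int((s2[in_tail] > t).sum())             # observed concordant pairs
    score = (r - n * pi0) / (pi0 * (1.0 - pi0))  # binomial score at pi0
    info = n / (pi0 * (1.0 - pi0))               # information with r replaced by n * pi
    z = score / np.sqrt(info)
    p_value = norm.sf(z)                         # small when concordance is in excess
    return z, p_value, pi0
```

A large positive statistic indicates more siblings in the tail than expected under the polygenic null.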
Appendix 3 Model Evaluation
Here we compare our theoretical derivations (Equation 4) that rely on the infinitesimal polygenic model (18) with simulated offspring data (Figure 7). We also compare our model to our empirical simulation (see Model & Methods) that draws allele frequencies and effect sizes from publicly available GWAS data (43) for two traits to produce parent and offspring genotype and genetic liability (equivalent to trait value when h2 = 1) (Figure 8).
These tests demonstrate that our theoretical framework accurately reflects an additive polygenic trait in an outcrossing population. Additionally, these results demonstrate that deviations in the conditional sibling distribution can be interpreted as non-polygenic architecture or quantile-specific environmental effects.
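A comparison of this kind can be illustrated with a small simulation. The sketch below is not the simulation used for Figures 7 and 8; it assumes random mating, no shared environment, parental genetic values drawn from a normal with variance h2, and offspring genetic values equal to the mid-parent value plus a segregation term with variance h2/2. The function names `simulate_sib_pairs` and `conditional_moments` are hypothetical.

```python
import numpy as np


def simulate_sib_pairs(n_families, h2, rng):
    """Simulate standardised trait values for two siblings per family."""
    g_mother = rng.normal(0.0, np.sqrt(h2), n_families)
    g_father = rng.normal(0.0, np.sqrt(h2), n_families)
    midparent = 0.5 * (g_mother + g_father)
    seg_sd = np.sqrt(h2 / 2.0)   # within-family segregation variance is h2 / 2
    env_sd = np.sqrt(1.0 - h2)   # independent environmental component
    sibs = [
        midparent
        + rng.normal(0.0, seg_sd, n_families)
        + rng.normal(0.0, env_sd, n_families)
        for _ in range(2)
    ]
    return sibs[0], sibs[1]


def conditional_moments(s1, s2):
    """Regression slope of s2 on s1 and the residual variance."""
    slope, intercept = np.polyfit(s1, s2, 1)
    resid_var = np.var(s2 - (slope * s1 + intercept))
    return slope, resid_var
```

Under these assumptions the slope should be close to h2/2 and the residual variance close to 1 − h2²/4, which are the conditional mean and variance expected under the polygenic model.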
- 1. XV.—The correlation between relatives on the supposition of Mendelian inheritance. Earth and Environmental Science Transactions of the Royal Society of Edinburgh 52:399–433
- 2. Intra-sire correlations or regressions of offspring on dam as a method of estimating heritability of characteristics. Journal of Animal Science 1940:293–301
- 3. Evidence for gene-environment correlation in child feeding: Links between common genetic variation for BMI in children and parental feeding practices. PLoS Genetics 14
- 4. A saturated map of common genetic variants associated with human height. Nature 610:704–712
- 5. Whole genome sequence analysis of blood lipid levels in >66,000 individuals. Nature Communications 13
- 6. Rare coding variants in ten genes confer substantial risk for schizophrenia. Nature 604:509–516
- 7. Genetic origins of schizophrenia find common ground. Nature 604. https://doi.org/10.1038/d41586-022-00773-5
- 8. Evolutionary evidence of the effect of rare variants on disease etiology. Clinical Genetics 79:199–206
- 9. Population genetics of rare variants and complex diseases. Human Heredity 74:118–128
- 10. Evolutionary perspectives on polygenic selection, missing heritability, and GWAS. Human Genetics 139:5–21
- 11. Unique roles of rare variants in the genetics of complex diseases in humans. Journal of Human Genetics 66:11–23
- 12. Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders. Nature Genetics 49:978–985
- 13. Thinking positively: the genetics of high intelligence. Intelligence 48:123–132
- 14. Discontinuity in the genetic and environmental causes of the intellectual disability spectrum. Proceedings of the National Academy of Sciences 113:1098–1103
- 15. The Nigerian twin and sibling registry. Twin Research and Human Genetics 16:282–284
- 16. Quantitative genetics
- 17. Meiosis. In Molecular Biology of the Cell
- 18. The infinitesimal model: Definition, derivation, and implications. Theoretical Population Biology 118:50–73
- 19. Galton’s law of ancestral heredity. Heredity 81:579–585
- 20. Theoretical models of selection and mutation on quantitative traits. Philosophical Transactions of the Royal Society B: Biological Sciences 360:1411–1425
- 21. Risk in relatives, heritability, SNP-based heritability, and genetic correlations in psychiatric disorders: a review. Biological Psychiatry 89:11–19
- 22. Estimation of genetic correlation via linkage disequilibrium score regression and genomic restricted maximum likelihood. The American Journal of Human Genetics 102:1185–1194
- 23. Familial influences on neuroticism and education in the UK Biobank. Behavior Genetics 50:84–93
- 24. The UK Biobank resource with deep phenotyping and genomic data. Nature 562:203–209
- 25. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica 30:175–193
- 26. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine 12
- 27. A phenome-wide association study of 26 Mendelian genes reveals phenotypic expressivity of common and rare variants within the general population. PLoS Genetics 16
- 28. Rare SLC13A1 variants associate with intervertebral disc disorder highlighting role of sulfate in disc pathology. Nature Communications 13:1–13
- 29. Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis. Nature Genetics 49:1468–1475
- 30. Genome-wide association studies. Nature Reviews Methods Primers 1
- 31. Extreme-phenotype genome-wide association study (XP-GWAS): a method for identifying trait-associated variants by sequencing pools of individuals selected from a diversity panel. The Plant Journal 84:587–596
- 32. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics 47:702–709
- 33. Common disease is more complex than implied by the core gene omnigenic model. Cell 173:1573–1580
- 34. Understanding quantitative genetic variation. Nature Reviews Genetics 3:11–21
- 35. The “new synthesis”. Proceedings of the National Academy of Sciences 119
- 36. Normal variance-mean mixtures and z distributions. International Statistical Review/Revue Internationale de Statistique :145–159
- 37. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics 76:1262–1272
- 38. Bayesian Theory
- 39. The inheritance of liability to diseases with variable age of onset, with particular reference to diabetes mellitus. Annals of Human Genetics 31:1–20
- 40. Continuous univariate distributions, volume 2
- 41. The use of multiple thresholds in determining the mode of transmission of semicontinuous traits. Annals of Human Genetics 36:163–184
- 42. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Annals of Human Genetics 29:51–76
- 43. Benjamin Neale. Neale lab data: http://www.nealelab.is/uk-biobank/, 2018.