Sibling similarity can reveal key insights into genetic architecture

  1. Tade Souaiaia (corresponding author)
  2. Hei Man Wu
  3. Clive Hoggart (corresponding author)
  4. Paul F O'Reilly (corresponding author)

  1. Department of Cellular Biology, SUNY Downstate Health Sciences, United States
  2. Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, United States
8 figures, 2 tables and 1 additional file

Figures

Sibling similarity under different tail architectures.

Left to right: When an individual’s extreme trait value (top 1%) is due to many alleles of small effect (‘polygenic’), then their siblings’ trait values are expected to show regression-to-the-mean (grey). When an individual’s extreme trait value is due to a de novo mutation of large effect, then their siblings are expected to have trait values that correspond to the background distribution (green). When an individual’s extreme trait value is the result of an inherited rare allele of large effect (‘Mendelian’), then their siblings are expected to have either similarly extreme trait values or trait values that are drawn from the background distribution (red), depending on whether or not they inherited the same large effect allele.
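
The three scenarios in this caption can be illustrated with a toy sampler. This is a minimal sketch, not the authors' simulation code: the function name, the `effect` size of the large-effect allele, and the polygenic variance term 1 − h⁴/4 (which follows from treating siblings as bivariate normal with correlation h²/2) are all assumptions made for illustration.

```python
import random

def sample_conditional_sibling(index_z, architecture, h2=0.5, effect=3.0, rng=random):
    """Draw one sibling z-value given an extreme index sibling (toy model).

    `effect` is a hypothetical large-effect allele size in SD units.
    """
    if architecture == "polygenic":
        # Regression to the mean: the sibling is centred on h2/2 of the index value.
        mean = 0.5 * h2 * index_z
        sd = (1.0 - (0.5 * h2) ** 2) ** 0.5
        return rng.gauss(mean, sd)
    if architecture == "de_novo":
        # The mutation is not shared, so the sibling follows the background distribution.
        return rng.gauss(0.0, 1.0)
    if architecture == "mendelian":
        # The sibling inherits the same large-effect allele with probability 1/2.
        carrier = rng.random() < 0.5
        return rng.gauss(0.0, 1.0) + (effect if carrier else 0.0)
    raise ValueError(f"unknown architecture: {architecture}")

if __name__ == "__main__":
    rng = random.Random(0)
    for arch in ("polygenic", "de_novo", "mendelian"):
        draws = [sample_conditional_sibling(2.5, arch, rng=rng) for _ in range(10000)]
        print(arch, round(sum(draws) / len(draws), 2))
```

Under these assumptions the Mendelian draws form a two-component mixture (carriers and non-carriers), which is the bimodality the caption describes.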

Identifying complex tail genetic architecture.

Conditional sibling z-values plotted against index sibling quantiles. Grey depicts complete polygenic architecture across index sibling values. In the lower tail, an extreme scenario of de novo architecture is shown in green, resulting in sibling discordance. In the upper tail, extreme Mendelian architecture is shown in red, whereby siblings are half concordant and half discordant, resulting in a bimodal conditional sibling trait distribution. Statistical tests to infer each type of complex tail architecture are designed to exploit these expected trait distributions.

Simulation schematic.

Publicly available genome-wide association study (GWAS) allele frequency and effect size data are used to simulate parent genetic trait values (A). Mid-parent genetic trait values (B) are simulated assuming random mating. Offspring genotypes and genetic trait values (C) are simulated assuming complete recombination. Environmental variation (D) is added to allow comparison with the theoretical polygenic conditional sibling distribution. De novo and Mendelian rare-variant effects are simulated (E) to benchmark tests for complex architecture (F).
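
Steps A to D of the schematic can be sketched as follows. This is an illustrative sketch only: the allele frequencies and effect sizes are randomly generated placeholders rather than real GWAS data, every function name is hypothetical, and the environmental variance is chosen to give a target heritability `h2` under Hardy-Weinberg proportions, which is an assumption not stated in the caption.

```python
import random

def draw_parent(rng, freqs):
    """Two haplotypes; alleles are drawn independently at each variant."""
    return [[1 if rng.random() < p else 0 for p in freqs] for _ in range(2)]

def transmit(rng, parent):
    """Gamete under complete recombination: each variant independently
    copies one of the parent's two alleles."""
    return [parent[rng.randrange(2)][j] for j in range(len(parent[0]))]

def genetic_value(person, betas):
    """Additive genetic trait value: effect size times allele count, summed."""
    return sum(b * (person[0][j] + person[1][j]) for j, b in enumerate(betas))

def simulate_family(rng, freqs, betas, h2=0.5, n_offspring=2):
    """Return the mid-parent genetic value and (genetic, phenotypic) offspring values."""
    mother, father = draw_parent(rng, freqs), draw_parent(rng, freqs)
    mid_parent = 0.5 * (genetic_value(mother, betas) + genetic_value(father, betas))
    # Expected additive genetic variance, then environmental SD for the target h2.
    var_g = sum(2 * p * (1 - p) * b * b for p, b in zip(freqs, betas))
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    offspring = []
    for _ in range(n_offspring):
        child = [transmit(rng, mother), transmit(rng, father)]
        g = genetic_value(child, betas)
        offspring.append((g, g + rng.gauss(0.0, sd_e)))
    return mid_parent, offspring

if __name__ == "__main__":
    rng = random.Random(42)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(200)]  # placeholder frequencies
    betas = [rng.gauss(0.0, 0.1) for _ in range(200)]      # placeholder effect sizes
    print(simulate_family(rng, freqs, betas))
```

Because each transmitted allele is an independent coin flip between the two parental copies, the expected genetic value of an offspring equals the mid-parent value.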

Conditional sibling trait distribution under polygenic architecture.

(A) The conditional sibling trait distribution according to Equation 4 for index siblings at the 1st, 25th, 50th, 75th, and 99th percentiles of the standardised trait distribution, when heritability is high (h² = 0.95, in orange) and moderate (h² = 0.5, in blue). When heritability is 0.95, the conditional sibling expectation is almost half of the index sibling z-score; when heritability is 0.5, the conditional sibling expectation is one quarter of the index sibling z-score. (B) The conditional distribution transformed into rank space. An individual whose sibling is at the 99th percentile is expected to have a trait value at the 80th percentile when heritability is high and at the 67th percentile when heritability is moderate.
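
The numbers in this caption can be checked with a short calculation. The sketch below assumes that the sibling pair is bivariate normal with correlation h²/2, so that the conditional sibling is N(h²z/2, 1 − h⁴/4). The mean matches the caption, while the variance term is an assumption here because Equation 4 is not reproduced on this page. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_sibling(index_percentile, h2):
    """Return (index z, conditional mean, conditional SD, expected rank)."""
    z = STD_NORMAL.inv_cdf(index_percentile)
    mean = 0.5 * h2 * z
    var = 1.0 - (0.5 * h2) ** 2
    # For X ~ N(mean, var), E[Phi(X)] = Phi(mean / sqrt(1 + var)).
    expected_rank = STD_NORMAL.cdf(mean / math.sqrt(1.0 + var))
    return z, mean, math.sqrt(var), expected_rank

if __name__ == "__main__":
    for h2 in (0.95, 0.5):
        for pct in (0.01, 0.25, 0.50, 0.75, 0.99):
            z, mean, sd, rank = conditional_sibling(pct, h2)
            print(f"h2={h2} index pct={pct:.2f} z={z:+.2f} "
                  f"mean={mean:+.2f} sd={sd:.2f} expected rank={rank:.2f}")
```

For an index sibling at the 99th percentile, the expected rank computed this way lands close to the 80th and 67th percentiles quoted in panel B.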

Power to detect complex tail architecture for different heritability levels, de novo and Mendelian frequencies, and sample sizes.

Simulations assume highly penetrant de novo and Mendelian variants at frequencies of 0.05, 0.1, 0.2, 0.3, and 0.5%. The false-positive rate was set at 0.05. Null simulations (red dashed line) demonstrate that the tests are well calibrated.

Analysis of six UK Biobank traits.

Application of statistical tests for Mendelian and de novo tail architecture to sibling trait data of six UK Biobank traits. For each trait, the conditional sibling mean is plotted under polygenicity (black line) for the heritability estimated from the data. The red (high) and blue (low) bands represent the expected conditional sibling mean under polygenicity at different heritability values. Statistical tests for de novo architecture, Mendelian architecture, and general departure from polygenicity (Kolmogorov–Smirnov test, Dist P-val) were applied to conditional siblings with index siblings in the upper and lower 1% of the distribution. Significant associations for the Mendelian and de novo tests are shown in red and green, respectively. Tail architecture that is not distinct from polygenic expectation is denoted in grey.

Appendix 3—figure 1
Theoretical and simulated conditional expectation and variance in liability (z-score) and rank across index sibling percentiles for conditional siblings, mid-parents, and index siblings.

The simulation drew 1000 parent liability values from N(0, 1); these were randomly paired to produce mid-parents with liability m_i; two offspring were subsequently drawn from N(m_i, 1/2) and randomly assigned as index and conditional siblings.
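
The procedure described in this caption can be sketched in a few lines. This is a minimal illustration rather than the authors' code. It assumes the second parameter of the offspring distribution is a variance of 1/2 (that is, a fully heritable liability), and the function names are hypothetical.

```python
import math
import random

def simulate_sibling_pairs(n_parents=1000, seed=0):
    """Return a list of (mid-parent, index sibling, conditional sibling) liabilities."""
    rng = random.Random(seed)
    parents = [rng.gauss(0.0, 1.0) for _ in range(n_parents)]
    rng.shuffle(parents)  # random pairing of parents
    families = []
    for i in range(0, n_parents - 1, 2):
        mid = 0.5 * (parents[i] + parents[i + 1])
        # Two offspring drawn around the mid-parent value with variance 1/2.
        sibs = [rng.gauss(mid, math.sqrt(0.5)) for _ in range(2)]
        rng.shuffle(sibs)  # random assignment as index and conditional sibling
        families.append((mid, sibs[0], sibs[1]))
    return families

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

if __name__ == "__main__":
    fams = simulate_sibling_pairs()
    index = [f[1] for f in fams]
    cond = [f[2] for f in fams]
    print(len(fams), "families; sibling correlation:", round(pearson(index, cond), 2))
```

Under these assumptions the mid-parent variance is 1/2 and each offspring adds a further 1/2, so offspring liabilities have unit variance and the sibling correlation is expected to be about 0.5.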

Appendix 3—figure 2
For both Fluid Intelligence and Standing Height, genome-wide association studies (GWAS) variants (on chromosome 1) were used to simulate parent and offspring genotypes and liability values.

Plots show that for both traits the offspring distribution is normal and that the sibling distribution is multivariate normal, in line with our theoretical prediction.

Tables

Appendix 4—table 1
Application to the UK Biobank (extended result – trait summaries).

| Trait name (field ID) | Sib pairs | Unique values | Skew | Kurtosis | Sib-h² (full) | Sib-h² (5–95) |
|---|---|---|---|---|---|---|
| LH Grip Strength (46) | 17,174 | 85 | 0.369 | 2.731 | 0.32 | 0.36 |
| Waist Circumference (48) | 17,273 | 716 | 0.42 | 3.29 | 0.49 | 0.52 |
| Ankle Spacing (3143) | 7900 | 1928 | 0.25 | 3.11 | 0.67 | 0.69 |
| Sitting Height (20015) | 17,216 | 352 | 0.04 | 3.4 | 0.73 | 0.82 |
| Forced Expiratory Vol (20150) | 9918 | 662 | 0.4 | 3.22 | 0.49 | 0.58 |
| Body Fat % (23099) | 16,785 | 577 | 0.09 | 2.56 | 0.53 | 0.57 |
| Whole Body Impedance (23106) | 16,801 | 771 | 0.24 | 2.8 | 0.57 | 0.59 |
| Right Leg Impedance (23107) | 16,793 | 350 | 0.15 | 3.34 | 0.55 | 0.58 |
| Left Leg Impedance (23108) | 16,793 | 352 | 0.14 | 3.37 | 0.54 | 0.56 |
| Right Arm Impedance (23109) | 16,983 | 475 | 0.33 | 2.63 | 0.51 | 0.53 |
| Left Arm Impedance (23110) | 16,797 | 455 | 0.34 | 2.62 | 0.51 | 0.53 |
| Trunk Fat Percentage (23127) | 16,780 | 644 | −0.08 | 3.19 | 0.51 | 0.52 |
| Red Blood Cell Count (30010) | 16,307 | 3421 | 0.11 | 3.31 | 0.48 | 0.53 |
| Haemoglobin Concentration (30020) | 16,307 | 1119 | −0.07 | 3.36 | 0.39 | 0.44 |
| Haematocrit Percentage (30030) | 16,303 | 2872 | −0.02 | 3.35 | 0.37 | 0.41 |
| Cholesterol (30690) | 5452 | 7028 | 0.37 | 3.44 | 0.47 | 0.53 |
| LDL Direct (30780) | 5432 | 5475 | 0.37 | 3.36 | 0.45 | 0.5 |
| Urate (30880) | 5439 | 5035 | 0.46 | 3.16 | 0.42 | 0.42 |

Appendix 4—table 2
Application to the UK Biobank (extended result – sibling tests).

| Trait name (field ID) | Tail | Idx sib cutoff | KS test p value | De novo obs, exp | De novo p value | Mendelian obs, exp | Mend p value |
|---|---|---|---|---|---|---|---|
| LH Grip Strength (46) | Upper | 2.22 | 0.0932 | 0.53, 0.46 | 0.8328 | 13, 6.8 | 0.0071 |
| | Lower | −2.81 | 5.95e-06 | −0.22, −0.54 | 1.59e-05 | 7, 1.9 | 0.0012 |
| Waist Circ. (48) | Upper | 2.59 | 0.0207 | 0.9, 0.79 | 0.9414 | 16, 5.8 | 9.1e-06 |
| | Lower | −2.25 | 0.0559 | −0.53, −0.64 | 0.0715 | 12, 8.6 | 0.1144 |
| Ankle Spacing (3143) | Upper | 2.41 | 0.3266 | 0.9, 0.95 | 0.2949 | 7, 4.9 | 0.1593 |
| | Lower | −2.35 | 0.1923 | −0.99, −0.93 | 0.7177 | 7, 5.1 | 0.1934 |
| Sitting Height (20015) | Upper | 2.36 | 2.56e-11 | 0.52, 1.19 | 4.16e-14 | 5, 20.0 | 0.9998 |
| | Lower | −2.58 | 3.5e-06 | −0.63, −1.13 | 2.88e-08 | 10, 11.4 | 0.6617 |
| FEV (20150) | Upper | 2.36 | 0.0594 | 0.73, 0.79 | 0.0503 | 7, 4.4 | 0.1532 |
| | Lower | −2.75 | 0.0005 | −0.42, −0.82 | 3.96e-05 | 3, 2.2 | 0.3044 |
| Body Fat (23099) | Upper | 2.33 | 0.5765 | 0.88, 0.79 | 0.8754 | 17, 9.2 | 0.0044 |
| | Lower | −2.55 | 0.004 | −0.65, −0.85 | 0.0032 | 9, 6.6 | 0.1748 |
| Body Imp. (23106) | Upper | 2.4 | 0.0565 | 0.68, 0.83 | 0.0198 | 7, 8.9 | 0.7465 |
| | Lower | −2.46 | 0.012 | −0.71, −0.87 | 0.0209 | 10, 8.2 | 0.2566 |
| Right Leg Imp. (23107) | Upper | 2.31 | 0.7491 | 0.76, 0.78 | 0.3919 | 9, 9.9 | 0.6143 |
| | Lower | −2.5 | 0.0003 | −0.59, −0.85 | 0.0003 | 6, 7.2 | 0.672 |
| Left Leg Imp. (23108) | Upper | 2.3 | 0.1393 | 0.83, 0.74 | 0.8965 | 10, 9.1 | 0.3752 |
| | Lower | −2.5 | 0.0093 | −0.65, −0.81 | 0.0181 | 5, 6.3 | 0.7078 |
| Right Arm Imp. (23109) | Upper | 2.51 | 0.001 | 0.58, 0.81 | 0.0012 | 3, 6.9 | 0.9356 |
| | Lower | −2.37 | 0.4972 | −0.72, −0.77 | 0.252 | 10, 8.7 | 0.3251 |
| Left Arm Imp (23110) | Upper | 2.54 | 0.0023 | 0.66, 0.81 | 0.0306 | 7, 6.1 | 0.3548 |
| | Lower | −2.4 | 0.599 | −0.69, −0.76 | 0.1767 | 10, 7.9 | 0.2183 |
| Trunk Fat % (23127) | Upper | 2.21 | 0.5881 | 0.68, 0.66 | 0.6269 | 11, 8.8 | 0.2204 |
| | Lower | −2.51 | 0.1247 | −0.65, −0.74 | 0.1056 | 9, 5.5 | 0.0624 |
| RBC Count (30010) | Upper | 2.29 | 0.0892 | 0.57, 0.72 | 0.0256 | 8, 9.2 | 0.653 |
| | Lower | −2.49 | 1.58e-09 | −0.34, −0.82 | 1.07e-10 | 4, 7.1 | 0.8815 |
| Haemoglobin Conc (30020) | Upper | 2.28 | 0.0217 | 0.49, 0.6 | 0.0643 | 4, 7.2 | 0.8908 |
| | Lower | −2.59 | 4.34e-06 | −0.32, −0.71 | 1.67e-07 | 5, 4.4 | 0.3878 |
| Haematocrit % (30030) | Upper | 2.29 | 0.0104 | 0.34, 0.55 | 0.0034 | 3, 6.4 | 0.9135 |
| | Lower | −2.52 | 0.0004 | −0.35, −0.63 | 0.0002 | 4, 4.3 | 0.5622 |
| Cholesterol (30690) | Upper | 2.28 | 0.4059 | 0.57, 0.71 | 0.1425 | 6, 3.1 | 0.0479 |
| | Lower | −2.35 | 0.0045 | −0.37, −0.74 | 0.0012 | 5, 3.1 | 0.1316 |
| LDL Direct (30780) | Upper | 2.3 | 0.2262 | 0.37, 0.66 | 0.0083 | 3, 3.0 | 0.5098 |
| | Lower | −2.35 | 0.0112 | −0.51, −0.68 | 0.0918 | 5, 2.6 | 0.0614 |
| Urate (30880) | Upper | 2.41 | 0.1566 | 0.48, 0.61 | 0.146 | 3, 2.0 | 0.2314 |
| | Lower | −2.47 | 0.1641 | −0.74, −0.6 | 0.8513 | 7, 1.5 | 0.001 |

Tade Souaiaia, Hei Man Wu, Clive Hoggart, Paul F O'Reilly (2025) Sibling similarity can reveal key insights into genetic architecture. eLife 12:RP87522. https://doi.org/10.7554/eLife.87522.3