Distribution of HLA and Tri-SNP haplotypes among TEDDY samples.

Cox PH regression results for all tested outcomes.

Log hazard ratios (log(HR)) and 95% confidence intervals for the tri-SNP 101 haplotype and the known risk factors FDR and sex, from the Cox PH models for the outcomes T1D (A), IA (B), IAA-first (C), GADA-first (D), CD (E) and CDA (F), using the entire cohort (left) or only the DR3-DQ2 homozygous individuals (right). The dashed vertical line at 0 marks an HR of 1 (log(HR) = 0), i.e. no effect on risk; estimates to its left indicate reduced risk and estimates to its right indicate increased risk. Whiskers indicate the 95% CI. The model assesses the independent risk or protection conferred by each covariate. For the categorical covariates FDR and sex, the baselines are having no FDR and female sex, respectively. Tri-SNP 101 is modeled numerically, so the reported HR is per additional 101 allele.
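The relationship between the plotted log(HR) scale and the HR scale, and the meaning of a per-allele HR under numeric coding, can be illustrated with a minimal sketch. It assumes a Wald-type 95% interval (estimate ± 1.96 standard errors on the log scale); the function names and any input values are hypothetical and are not taken from the study's models.

```python
import math

def hazard_ratio_with_ci(log_hr, se, z=1.96):
    """Convert a log hazard ratio and its standard error into
    (HR, lower, upper), assuming a Wald-type confidence interval."""
    hr = math.exp(log_hr)
    lower = math.exp(log_hr - z * se)
    upper = math.exp(log_hr + z * se)
    return hr, lower, upper

def hr_for_allele_count(log_hr_per_allele, n_alleles):
    """Under numeric (additive) coding, the log(HR) scales linearly
    with the allele count, so the HR multiplies per additional allele."""
    return math.exp(log_hr_per_allele * n_alleles)
```

Under this coding, a log(HR) of 0 corresponds to an HR of 1 regardless of allele count, and an interval that spans 0 on the log scale spans 1 on the HR scale.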

C4 gene expression values with respect to tri-SNP.

Counts per million (CPM) values in 129 DR3 homozygous individuals, showing decreasing C4A and increasing C4B gene expression as the tri-SNP 101 allele count increases. Each point represents the median CPM value across multiple samples from one individual. Boxes represent the interquartile range (IQR) and midlines mark the median value.
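How a per-individual point could be derived is sketched below. This is an illustration only: it assumes CPM is computed from the raw gene count and the total library size without further normalization, which may differ from the pipeline actually used, and the function names and input layout are hypothetical.

```python
from statistics import median

def cpm(gene_count, library_size):
    """Counts per million: gene count scaled to a library of one million reads."""
    return gene_count * 1_000_000 / library_size

def median_cpm_per_individual(samples):
    """samples: iterable of (individual_id, gene_count, library_size).
    Returns {individual_id: median CPM across that individual's samples}."""
    values_by_individual = {}
    for individual_id, gene_count, library_size in samples:
        values_by_individual.setdefault(individual_id, []).append(
            cpm(gene_count, library_size)
        )
    return {ind: median(vals) for ind, vals in values_by_individual.items()}
```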

Unique sequence read coverage in C4 region and copy number calls.

Uniquely mapping read coverage from WGS data of 188 homozygous DR3-DQ2 individuals. The C4A and C4B genes share extensive sequence identity along their length, except for a ∼3 kilobase region indicated with boxes. Reads mapping to these regions were used to estimate C4A (left column) and C4B (right column) copy numbers per sample. Samples were sorted by C4A copy number. Heatmap values were capped at a maximum of 4 to moderate high outlier values.
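The ordering and capping steps described for the heatmap can be sketched as follows. The sketch does not reproduce the copy number estimation itself, which the legend does not specify in detail; the tuple layout and the function name are hypothetical.

```python
def prepare_heatmap_rows(samples, cap=4.0):
    """samples: list of (sample_id, c4a_copy_number, values), where values
    is the list of per-position heatmap values for that sample.
    Returns rows sorted by C4A copy number with every value capped at `cap`."""
    ordered = sorted(samples, key=lambda row: row[1])
    return [
        (sample_id, c4a_cn, [min(v, cap) for v in values])
        for sample_id, c4a_cn, values in ordered
    ]
```

Capping only affects the display: values above the cap share the top of the color scale, so a few extreme samples do not compress the range used for the rest.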