Associations of ABO and Rhesus D blood groups with phenome-wide disease incidence: A 41-year retrospective cohort study of 482,914 patients

  1. Peter Bruun-Rasmussen
  2. Morten Hanefeld Dziegiel
  3. Karina Banasik
  4. Pär Ingemar Johansson
  5. Søren Brunak (corresponding author)
  1. Department of Clinical Immunology, Copenhagen University Hospital, Denmark
  2. Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark

Abstract

Background:

Whether natural selection may have contributed to the observed differences in blood group frequencies between populations remains debated. The ABO system has been associated with several diseases and, recently, with susceptibility to COVID-19 infection. Associative studies of the RhD system and disease are sparser. A large disease-wide risk analysis may further elucidate the relationship between the ABO/RhD blood groups and disease incidence.

Methods:

We performed a systematic log-linear quasi-Poisson regression analysis of the ABO/RhD blood groups across 1,312 phecode diagnoses. Unlike prior studies, we determined the incidence rate ratio for each individual ABO blood group relative to all other ABO blood groups as opposed to using blood group O as the reference. Moreover, we used up to 41 years of nationwide Danish follow-up data, and a disease categorization scheme specifically developed for diagnosis-wide analysis. Further, we determined associations between the ABO/RhD blood groups and the age at the first diagnosis. Estimates were adjusted for multiple testing.

Results:

The retrospective cohort included 482,914 Danish patients (60.4% females). The incidence rate ratios (IRRs) of 101 phecodes were found statistically significant between the ABO blood groups, while the IRRs of 28 phecodes were found statistically significant for the RhD blood group. The associations included cancers and musculoskeletal-, genitourinary-, endocrinal-, infectious-, cardiovascular-, and gastrointestinal diseases.

Conclusions:

We found disease-wide differences in susceptibility between the blood groups of the ABO and RhD systems, including for cancer of the tongue, monocytic leukemia, cervical cancer, osteoarthrosis, asthma, and HIV and hepatitis B infection. We found marginal evidence of associations between the blood groups and the age at first diagnosis.

Funding:

Novo Nordisk Foundation and the Innovation Fund Denmark

Editor's evaluation

This important retrospective analysis of nearly 500,000 hospitalized Danish patients sheds light on the possible relationships between blood type and susceptibility to a host of diseases. The Danish National Patient Register is a compelling data source, and the statistical methodology is solid. The findings reported herein provide evidence, supporting information, and potential hypotheses for researchers interested in the causes and etiology of diseases as they relate to blood type.

https://doi.org/10.7554/eLife.83116.sa0

Introduction

More than 100 years after the discovery of the ABO and Rhesus D (RhD) blood group systems, the selective forces that may have contributed to the observed blood group differences between populations remain elusive (Anstee, 2010). The pathophysiological mechanisms behind the observed relationship between blood groups and diseases are not well understood either. The ABO system has been associated with susceptibility to multiple diseases, including gastrointestinal and cardiovascular diseases and pancreatic, gastric, and ovarian cancers (Vasan et al., 2016; Liumbruno and Franchini, 2014; Wolpin et al., 2010; Groot et al., 2020; Edgren et al., 2010; Dahlén et al., 2021; Li and Schooling, 2020). The ABO system has also been associated with the susceptibility, progression, and severity of COVID-19 (Ellinghaus et al., 2020). In contrast, apart from hemolytic disease of the newborn, reported associations between the RhD blood group and disease development are sparser (Anstee, 2010).

Specifically, higher levels of factor VIII (FVIII) and von Willebrand factor (vWF) observed in individuals with a non-O blood group have been suggested to affect the development of cardiovascular disease (Jenkins and O’Donnell, 2006; Franchini and Lippi, 2015). Additionally, blood group-related antigens have been suggested to be involved in the adhesion of trophoblast, inflammatory cells, and metastatic tumor cells to the endothelial cells of the vasculature (Ravn and Dabelsteen, 2000). The endothelial cells of the vasculature have also been suggested to contribute to the initiation and propagation of severe clinical manifestations of COVID-19 (Teuwen et al., 2020).

Recently, an associative disease-wide risk analysis of the ABO and RhD blood groups was conducted in a large Swedish cohort (Dahlén et al., 2021). The study generated further support for previous findings and suggested new associations. Here, we further uncover the relationship between the ABO and RhD blood groups and disease susceptibility using a Danish cohort of 482,914 patients. In contrast to previous studies, we use up to 41 years of follow-up data, and a disease categorization scheme specifically developed for disease-wide analysis called phecodes (Wu et al., 2019). Further, we determine the uniqueness of each individual ABO blood group as opposed to using blood group O as the reference.

We estimate incidence rate ratios of 1312 phecodes (diagnoses) for the ABO and RhD blood groups. Further, we determine associations between the ABO/RhD blood groups and the age at the first diagnosis to better disclose the temporal life course element of disease development.

Methods

Study design

This retrospective cohort study was based on the integration of the Danish National Patient Registry (DNPR) and data on ABO/RhD blood groups of hospitalized patients. We included Danish patients who had their ABO/RhD blood group determined in the Capital Region or Region Zealand (covering ~45% of the Danish population; Nordjylland, 2022) between January 1, 2006, and April 10, 2018. A blood type determination is commonly done for patients who may require a blood transfusion during hospitalization, for example, anemic patients and women in labor. In the inclusion period, approximately 90% of the population in the Capital Region and 97% of the Region Zealand population were of European ancestry (Supplementary file 1). The DNPR provided the International Classification of Diseases 8th and 10th revision (ICD-8 and ICD-10) diagnosis codes, dates of diagnosis, date of birth, date of potential emigration, and sex of patients, with records dating back to 1977.

Similar to a case-control study, the patients were included retrospectively. Here, selection into the study was based on an in-hospital ABO/RhD blood group determination. That is, the person-time and the entire disease history back to 1977 of patients hospitalized between 2006 and 2018 with known ABO/RhD blood groups were included retrospectively.

We defined diseased and non-diseased individuals using the phecode mapping from ICD-10 diagnosis codes (Wu et al., 2019). Before categorizing the assigned ICD diagnosis codes into phecodes, the ICD-8 codes were converted to ICD-10 codes (Pedersen et al., 2023). Further, referral diagnoses were excluded. Pregnancy and perinatal diagnoses (ICD-10 chapters 15–16) assigned before or after age 10, respectively, were excluded or, when possible, reassigned to the mother or newborn, respectively. The disease categories of injuries, poisonings, and symptoms were deemed unlikely to be associated with the blood groups and were excluded from the analyses (phecode categories: ‘injuries and poisonings’, ‘symptoms’ and phecodes above 999). Only phecodes with at least 100 cases in the study sample were included. The patients were followed from entry in the DNPR to the date of death, emigration, the first event of the studied phecode, or the end of the study period (April 10, 2018), whichever came first. Thus, follow-up was up to 41 years. The patients were allowed to contribute events and time at risk to multiple phecode analyses.
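The follow-up rule described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the use of `None` for a date that never occurred, and the 365.25-day year are assumptions made for illustration only.

```python
from datetime import date
from typing import Optional

# End of the study period as stated in the text.
STUDY_END = date(2018, 4, 10)


def follow_up_end(death: Optional[date],
                  emigration: Optional[date],
                  first_event: Optional[date]) -> date:
    """Return the earliest of death, emigration, first event, or study end.

    A value of None means the corresponding event did not occur
    (hypothetical encoding chosen for this sketch).
    """
    candidates = [d for d in (death, emigration, first_event) if d is not None]
    candidates.append(STUDY_END)
    return min(candidates)


def person_years(entry: date, end: date) -> float:
    """Approximate time at risk in years, assuming 365.25 days per year."""
    return max((end - entry).days, 0) / 365.25
```

A patient registered at the start of the DNPR in 1977 with no death, emigration, or event would, under these assumptions, contribute a little over 41 years at risk, in line with the maximum follow-up stated above.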

Diagnosis-wide incidence rate ratios

We used a log-linear quasi-Poisson regression model to estimate incidence rate ratios (IRRs) of each phecode among individuals with blood groups A, B, AB, and O relative to the other blood groups, respectively (e.g. A vs. B, AB, and O) (Dewey et al., 1995; Ver Hoef and Boveng, 2007). Further, we compared individuals with positive RhD type relative to negative RhD type. The analyses of diseases developed by both males and females were adjusted for sex, while analyses of sex-restricted diseases (e.g. cervical cancer) only included the subgroup of individuals of the restricted sex. Sex-restricted diseases were pre-defined by the phecode terminology. Sex was adjusted for as prior studies have found sex differences in the incidence rates of multiple diseases (Westergaard et al., 2019). Further, we adjusted for the year of birth and attained age, both modeled using restricted cubic splines with five knots. Attained age was split into 1-year intervals and treated as a time-dependent covariate, thus allowing individuals to move between categories with time. Thus, age was used as the underlying time scale. Further, an interaction between attained age and sex was modeled for non-sex-restricted analyses. Patients were excluded from the analysis if they were assigned the phecode under study at the start of the DNPR. For analysis of congenital phecodes (e.g. sickle cell disease), prevalence ratios were estimated instead of IRRs by using the cohort size as the offset (see Supplementary file 2 for a list of the congenital phecodes). The analyses of ABO blood groups were adjusted for RhD type, and the RhD analyses were adjusted for the ABO blood group. Adjustment for the birth year was done to control for societal changes and was used instead of the calendar period of diagnosis. The robust quasi-Poisson variance formula was used to control for over-dispersion (Ver Hoef and Boveng, 2007).
We conducted a supplemental analysis using the same methodology but where blood group O was instead used as the reference to enable direct comparison and meta-analysis with previous studies.
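To make the quantity being estimated concrete, the following Python sketch computes a crude incidence rate ratio for one blood group versus all others, with a Wald confidence interval on the log scale. This is a deliberate simplification and not the model used in the study: it omits the adjustment for sex, birth year, attained age, and RhD/ABO type, and it does not apply the quasi-Poisson over-dispersion correction. The function name and the example counts are hypothetical.

```python
import math


def crude_irr(cases_group, py_group, cases_other, py_other, z=1.96):
    """Unadjusted incidence rate ratio with a log-scale Wald interval.

    cases_*: number of first events; py_*: person-years at risk.
    z=1.96 gives an approximate 95% interval (no multiplicity adjustment).
    """
    rate_group = cases_group / py_group
    rate_other = cases_other / py_other
    irr = rate_group / rate_other
    # Standard error of log(IRR) under a plain Poisson assumption.
    se_log = math.sqrt(1.0 / cases_group + 1.0 / cases_other)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts: blood group A versus B, AB, and O combined.
irr, lower, upper = crude_irr(cases_group=120, py_group=7_500_000,
                              cases_other=130, py_other=10_000_000)
```

In the adjusted analysis, the IRR is instead the exponentiated regression coefficient of the blood group indicator, with the logarithm of person-time as the offset.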

Age of first hospital diagnosis

We estimated differences in the age at first diagnosis of each phecode for individuals with blood group A, B, AB, and O relative to any other blood group, respectively. Similar analyses were done for RhD-positive individuals relative to RhD-negative individuals. We used a linear regression model adjusted for sex and birth year (as a restricted cubic spline with five knots). Analyses of sex-restricted phecodes were not adjusted for sex. Individuals who were assigned the studied phecode at the start date of the DNPR were excluded as the age of diagnosis was uncertain. Further, congenital and pregnancy-related phecodes were not included.

Statistical analyses were performed in R (version 3.6.2) using the survival and rms packages. p-values were two-sided. p-values and confidence intervals were adjusted for multiple testing by the false discovery rate (FDR) approach, accounting for the number of performed tests (5 blood groups times 1312 phecodes; Benjamini and Hochberg, 1995; Altman and Bland, 2011). FDR-adjusted p-values <0.05 were deemed statistically significant. The analysis pipeline was built in Python (anaconda3/5.3.0) using snakemake for reproducibility (Köster and Rahmann, 2012). The analysis code is available through https://www.github.com/peterbruun/blood_type_study (copy archived at Bruun-Rasmussen, 2023). The manuscript complies with the STROBE reporting guidelines.
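As an illustration of the FDR step, the sketch below implements the standard Benjamini-Hochberg adjustment of p-values in plain Python. It is not taken from the study's code (the analyses were run in R); the function name is an assumption, and the adjustment of confidence intervals mentioned above is not covered.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values are non-decreasing in the raw p-values and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Number of tests stated in the text: 5 blood groups times 1312 phecodes.
N_TESTS = 5 * 1312
```

An association would then be called statistically significant when its adjusted p-value is below 0.05, with all 6560 tests adjusted together.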

Results

In total, 482,914 patients (60.4% females) were included and 1312 phecodes (diagnosis codes) were examined (Figure 1 and Table 1). Across all phecode analyses, the median total follow-up time was 17,555,322 person-years (Q1-Q3: 17,324,597–17,615,142). The cohort held a wide age distribution of patients born from 1901 to 2015 (Table 1 and Supplementary file 3). The ABO/RhD blood group distribution of the patients was similar to that of a previously summarized reference population of 2.2 million Danes (Table 1; Barnkob et al., 2020; Banks, 2022).

Selection of patients for the 41-year retrospective cohort study on ABO/RhD blood groups and associations with disease incidence in 482,914 Danish patients.
Table 1
Characteristics of patients in the 41-year retrospective cohort study on ABO/RhD blood groups and associations with disease incidence in 482,914 Danish patients.
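Characteristic | Category | N=482,914
ABO, n (%) | O | 197,634 (40.9)
ABO, n (%) | A | 206,110 (42.7)
ABO, n (%) | AB | 22,111 (4.6)
ABO, n (%) | B | 57,059 (11.8)
RhD, n (%) | Negative | 74,150 (15.4)
RhD, n (%) | Positive | 408,764 (84.6)
Sex, n (%) | Female | 291,649 (60.4)
Sex, n (%) | Male | 191,265 (39.6)
Birth year, median [Q1,Q3] | | 1963 [1945,1982]
Age at entry, years, median [Q1,Q3] | | 13 [0,32]
Follow-up time, years, median [Q1,Q3] | | 40.8 [33.4,41.3]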

Incidence rate ratios

After adjustment for multiple testing, we found the incidence rate ratios (IRRs) of 101 and 28 phecodes (116 unique) to be statistically significant for the ABO and RhD blood groups, respectively. The statistically significant IRRs are given with 95% confidence intervals in Table 2. The estimates of all examined phecodes are given in Supplementary file 4. Further, Manhattan plots of the p-values and disease categories are presented in Figures 2–6.

Manhattan plot for blood group A with phecodes included by category.

The vertical axis shows the -log10 transformed FDR adjusted p-values on a log10-scale. The horizontal axis shows the phecodes by category. The red line indicates the statistically significant level of <0.05 for FDR adjusted p-values. Associations with p-value >0.8 are not displayed. Coloured and annotated associations were deemed statistically significant. The direction of the triangles indicates positive or inverse associations (upward: IRR >1, downward: IRR <1). The size of the triangles indicates the size of the incidence rate ratio.

Manhattan plot for blood group B with phecodes included by category.

The vertical axis shows the -log10 transformed FDR adjusted p-values on a log10-scale. The horizontal axis shows the phecodes by category. The red line indicates the statistically significant level of <0.05 for FDR adjusted p-values. Associations with p-value >0.8 are not displayed. Coloured and annotated associations were deemed statistically significant. The direction of the triangles indicates positive or inverse associations (upward: IRR >1, downward: IRR <1). The size of the triangles indicates the size of the incidence rate ratio.

Manhattan plot for blood group AB with phecodes included by category.

The vertical axis shows the -log10 transformed FDR adjusted p-values on a log10-scale. The horizontal axis shows the phecodes by category. The red line indicates the statistically significant level of <0.05 for FDR adjusted p-values. Associations with p-value >0.8 are not displayed. Coloured and annotated associations were deemed statistically significant. The direction of the triangles indicates positive or inverse associations (upward: IRR >1, downward: IRR <1). The size of the triangles indicates the size of the incidence rate ratio.

Manhattan plot for blood group O with phecodes included by category.

The vertical axis shows the -log10 transformed FDR adjusted p-values on a log10-scale. The horizontal axis shows the phecodes by category. The red line indicates the statistically significant level of <0.05 for FDR adjusted p-values. Associations with p-value >0.8 are not displayed. Coloured and annotated associations were deemed statistically significant. The direction of the triangles indicates positive or inverse associations (upward: IRR >1, downward: IRR <1). The size of the triangles indicates the size of the incidence rate ratio.

Manhattan plot for the Rhesus D blood group with phecodes included by category.

The vertical axis shows the -log10 transformed FDR adjusted p-values on a log10-scale. The horizontal axis shows the phecodes by category. The red line indicates the statistically significant level of <0.05 for FDR adjusted p-values. Associations with p-value >0.8 are not displayed. Coloured and annotated associations were deemed statistically significant. The direction of the triangles indicates positive or inverse associations (upward: IRR >1, downward: IRR <1). The size of the triangles indicates the size of the incidence rate ratio.

Table 2
Statistically significant incidence rate ratios for each individual blood group A, B, AB, and O relative to any other blood group (e.g. A vs. B, AB, and O combined). IRRs are also given for the RhD-positive blood group relative to the RhD-negative blood group.
Phecode | Phenotype | Cases | Person-years | Blood group A: IRR (95% CI) | p-value | Blood group B: IRR (95% CI) | p-value | Blood group AB: IRR (95% CI) | p-value | Blood group O: IRR (95% CI) | p-value | Blood group RhD: IRR (95% CI) | p-value
Infectious Diseases
010 | Tuberculosis | 2101 | 17603440 | 0.87 (0.74, 1.02) | 0.093 | 1.4 (1.18, 1.65) | <0.001 | 1.07 (0.32, 3.57) | 0.915 | 0.97 (0.59, 1.57) | 0.899 | 1.36 (1.12, 1.65) | 0.002
070 | Viral hepatitis | 6596 | 17557078 | 0.88 (0.82, 0.93) | <0.001 | 1.22 (1.12, 1.34) | <0.001 | 1 (0.81, 1.23) | 0.97 | 1.04 (0.9, 1.21) | 0.583 | 1.12 (0.99, 1.28) | 0.069
070.2 | Viral hepatitis B | 1664 | 17613572 | 0.7 (0.62, 0.79) | <0.001 | 1.66 (1.46, 1.9) | <0.001 | 1.12 (0.38, 3.26) | 0.851 | 1.06 (0.7, 1.61) | 0.794 | 1.36 (1.07, 1.71) | 0.01
071 | Human immunodeficiency virus [HIV] disease | 1182 | 17620808 | 0.81 (0.63, 1.05) | 0.109 | 1.18 (0.54, 2.59) | 0.688 | 1.19 (0.52, 2.73) | 0.693 | 1.1 (0.67, 1.81) | 0.725 | 1.49 (1.04, 2.15) | 0.031
071.1 | HIV infection, symptomatic | 1182 | 17620808 | 0.81 (0.63, 1.05) | 0.109 | 1.18 (0.54, 2.59) | 0.688 | 1.19 (0.52, 2.73) | 0.693 | 1.1 (0.67, 1.81) | 0.725 | 1.49 (1.04, 2.15) | 0.031
110.13 | Dermatophytosis of the body | 211 | 17630531 | 0.89 (0.35, 2.22) | 0.809 | 1.64 (1.08, 2.49) | 0.02 | 0.93 (0.03, 30.15) | 0.97 | 0.88 (0.35, 2.23) | 0.804 | 1.13 (0.2, 6.46) | 0.897
Neoplasms
145.2 | Cancer of tongue | 606 | 17629055 | 0.95 (0.51, 1.76) | 0.877 | 1.13 (0.71, 1.82) | 0.616 | 1.21 (0.43, 3.38) | 0.726 | 0.96 (0.46, 2.03) | 0.93 | 0.74 (0.6, 0.92) | 0.007
157 | Pancreatic cancer | 2828 | 17627948 | 1.26 (1.16, 1.36) | <0.001 | 1.02 (0.34, 3.13) | 0.97 | 1.11 (0.53, 2.33) | 0.8 | 0.76 (0.7, 0.82) | <0.001 | 1.01 (0.52, 1.98) | 0.97
180 | Cervical cancer and dysplasia | 12538 | 10308860 | 1.05 (0.99, 1.12) | 0.118 | 0.9 (0.83, 0.97) | 0.01 | 0.96 (0.66, 1.39) | 0.839 | 1 (0.99, 1.01) | 0.97 | 0.93 (0.86, 1.01) | 0.083
180.3 | Cervical intraepithelial neoplasia [CIN] [Cervical dysplasia] | 10895 | 10327107 | 1.06 (0.99, 1.13) | 0.115 | 0.89 (0.82, 0.97) | 0.009 | 0.95 (0.67, 1.34) | 0.766 | 1 (0.84, 1.19) | 0.97 | 0.93 (0.85, 1.01) | 0.102
195.1 | Malignant neoplasm, other | 7383 | 17603424 | 0.96 (0.87, 1.05) | 0.368 | 0.95 (0.77, 1.17) | 0.642 | 0.96 (0.55, 1.69) | 0.906 | 1.07 (1, 1.15) | 0.038 | 0.88 (0.82, 0.94) | <0.001
204.3 | Monocytic leukemia | 179 | 17631300 | 1.04 (0.15, 7.27) | 0.97 | 0.92 (0.06, 13.67) | 0.957 | 0.25 (0.06, 0.98) | 0.047 | 1.13 (0.44, 2.91) | 0.809 | 0.84 (0.37, 1.93) | 0.698
216 | Benign neoplasm of skin | 12993 | 17495431 | 1.02 (0.84, 1.23) | 0.883 | 0.9 (0.82, 0.98) | 0.014 | 0.99 (0.62, 1.57) | 0.97 | 1.03 (0.93, 1.14) | 0.583 | 0.95 (0.85, 1.05) | 0.32
860 | Bone marrow or stem cell transplant | 142 | 17631302 | 1.01 (0.77, 1.31) | 0.97 | 0.63 (0.15, 2.57) | 0.531 | 0.15 (0.05, 0.44) | <0.001 | 1.37 (1.02, 1.82) | 0.033 | 4.15 (2.13, 8.09) | <0.001
Endocrine/Metabolic
242 | Thyrotoxicosis with or without goiter | 9744 | 17527426 | 0.93 (0.86, 1.02) | 0.127 | 0.99 (0.59, 1.65) | 0.97 | 0.9 (0.66, 1.22) | 0.505 | 1.09 (1.02, 1.18) | 0.013 | 1.02 (0.7, 1.48) | 0.927
250 | Diabetes mellitus | 36810 | 17295033 | 1.01 (0.92, 1.11) | 0.849 | 1.09 (1.05, 1.13) | <0.001 | 1.06 (0.95, 1.18) | 0.32 | 0.95 (0.92, 0.97) | <0.001 | 1.07 (1.03, 1.11) | <0.001
250.2 | Type 2 diabetes | 32505 | 17346533 | 1 (0.82, 1.23) | 0.97 | 1.1 (1.06, 1.15) | <0.001 | 1.07 (0.96, 1.19) | 0.213 | 0.94 (0.92, 0.97) | <0.001 | 1.07 (1.02, 1.12) | 0.004
261 | Vitamin deficiency | 6674 | 17594082 | 0.94 (0.86, 1.03) | 0.183 | 1.15 (1.05, 1.26) | 0.003 | 0.93 (0.66, 1.31) | 0.683 | 1.01 (0.75, 1.36) | 0.94 | 1.06 (0.92, 1.22) | 0.405
261.4 | Vitamin D deficiency | 4105 | 17613013 | 0.91 (0.85, 0.97) | 0.007 | 1.23 (1.14, 1.34) | <0.001 | 0.95 (0.56, 1.61) | 0.86 | 1.01 (0.63, 1.63) | 0.97 | 1.14 (1.04, 1.25) | 0.006
271 | Disorders of carbohydrate transport and metabolism | 1306 | 17619038 | 0.86 (0.69, 1.07) | 0.176 | 1.28 (1, 1.64) | 0.047 | 1.07 (0.23, 4.92) | 0.934 | 1.01 (0.51, 2) | 0.97 | 1.16 (0.78, 1.72) | 0.475
271.3 | Intestinal disaccharidase deficiencies and disaccharide malabsorption | 1154 | 17621337 | 0.85 (0.68, 1.05) | 0.135 | 1.32 (1.05, 1.67) | 0.018 | 1.08 (0.24, 4.88) | 0.924 | 1.01 (0.55, 1.86) | 0.97 | 1.15 (0.73, 1.8) | 0.562
272 | Disorders of lipoid metabolism | 41222 | 17347032 | 1.06 (1.03, 1.09) | <0.001 | 1.01 (0.72, 1.42) | 0.97 | 1.03 (0.63, 1.69) | 0.92 | 0.93 (0.91, 0.96) | <0.001 | 1.02 (0.9, 1.17) | 0.733
272.11 | Hypercholesterolemia | 35565 | 17395012 | 1.06 (1.03, 1.09) | <0.001 | 1.01 (0.79, 1.29) | 0.956 | 1.03 (0.88, 1.21) | 0.709 | 0.93 (0.91, 0.96) | <0.001 | 1.03 (0.96, 1.09) | 0.433
272.13 | Mixed hyperlipidemia | 1324 | 17619911 | 1.05 (0.66, 1.67) | 0.841 | 1.04 (0.14, 7.94) | 0.97 | 1.39 (1.01, 1.91) | 0.04 | 0.87 (0.7, 1.08) | 0.211 | 1.09 (0.58, 2.04) | 0.809
Hematopoietic
282 | Hereditary hemolytic anemias | 947 | 482914* | 0.76 (0.59, 0.99)** | 0.039 | 1.36 (0.9, 2.06)** | 0.141 | 1.42 (0.72, 2.8)** | 0.316 | 1.03 (0.25, 4.26)** | 0.97 | 1.65 (1.06, 2.56)** | 0.026
282.8 | Other hemoglobinopathies | 557 | 482914* | 0.65 (0.51, 0.82)** | <0.001 | 1.56 (1.16, 2.1)** | 0.003 | 1.47 (0.77, 2.83)** | 0.248 | 1.09 (0.58, 2.07)** | 0.798 | 2.24 (1.52, 3.29)** | <0.001
286 | Coagulation defects | 4124 | 17606796 | 1.14 (1.01, 1.3) | 0.035 | 0.95 (0.56, 1.6) | 0.854 | 1.14 (0.67, 1.95) | 0.633 | 0.87 (0.76, 0.99) | 0.029 | 0.93 (0.66, 1.33) | 0.717
286.11 | Von Willebrand’s disease | 214 | 482914* | 0.71 (0.35, 1.47)** | 0.368 | 0.49 (0.17, 1.45)** | 0.199 | 0.39 (0.02, 8.74)** | 0.562 | 1.96 (1.32, 2.91)** | <0.001 | 0.68 (0.29, 1.6)** | 0.388
286.3 | Coagulation defects complicating pregnancy or postpartum | 2015 | 10502418 | 1.15 (1.08, 1.22) | <0.001 | 1.02 (0.58, 1.79) | 0.946 | 1.14 (0.95, 1.38) | 0.164 | 0.84 (0.79, 0.89) | <0.001 | 0.88 (0.8, 0.97) | 0.013
286.7 | Other and unspecified coagulation defects | 1085 | 17621421 | 1.23 (1.05, 1.45) | 0.013 | 0.98 (0.45, 2.16) | 0.97 | 1.52 (1.1, 2.08) | 0.011 | 0.74 (0.64, 0.85) | <0.001 | 1.03 (0.23, 4.72) | 0.97
Mental Disorders
300.1 | Anxiety disorder | 7985 | 17603188 | 1.03 (0.93, 1.14) | 0.575 | 0.97 (0.8, 1.16) | 0.735 | 0.96 (0.66, 1.41) | 0.858 | 0.99 (0.76, 1.29) | 0.953 | 0.92 (0.86, 0.99) | 0.027
Neurological
333.8 | Other degenerative diseases of the basal ganglia | 381 | 17630209 | 0.78 (0.57, 1.07) | 0.128 | 1.05 (0.11, 9.95) | 0.967 | 0.65 (0.24, 1.78) | 0.407 | 1.32 (1.02, 1.71) | 0.033 | 1 (0.95, 1.05) | 0.97
339 | Other headache syndromes | 3466 | 17603269 | 1.05 (0.9, 1.22) | 0.546 | 1.11 (0.95, 1.3) | 0.204 | 1.03 (0.39, 2.73) | 0.957 | 0.9 (0.83, 0.99) | 0.023 | 1.06 (0.82, 1.36) | 0.688
345 | Epilepsy, recurrent seizures, convulsions | 23469 | 17228780 | 0.99 (0.77, 1.28) | 0.97 | 0.96 (0.86, 1.08) | 0.51 | 0.97 (0.71, 1.32) | 0.853 | 1.03 (0.96, 1.1) | 0.44 | 0.94 (0.89, 1) | 0.039
345.3 | Convulsions | 14391 | 17351676 | 0.99 (0.76, 1.3) | 0.97 | 0.96 (0.79, 1.18) | 0.732 | 1 (0.88, 1.13) | 0.97 | 1.02 (0.89, 1.18) | 0.77 | 0.92 (0.85, 0.99) | 0.036
Sense Organs
361.1 | Retinal detachment with retinal defect | 3037 | 17594394 | 1.06 (0.87, 1.29) | 0.582 | 1.07 (0.69, 1.66) | 0.787 | 1.23 (0.9, 1.69) | 0.193 | 0.88 (0.79, 0.98) | 0.022 | 0.97 (0.49, 1.96) | 0.948
381 | Otitis media and Eustachian tube disorders | 22790 | 17144551 | 1.07 (1.03, 1.11) | <0.001 | 0.94 (0.86, 1.04) | 0.226 | 1.01 (0.55, 1.87) | 0.97 | 0.95 (0.91, 1) | 0.039 | 0.96 (0.87, 1.06) | 0.437
381.1 | Otitis media | 12313 | 17364091 | 1.09 (1.04, 1.14) | <0.001 | 0.95 (0.83, 1.09) | 0.449 | 0.98 (0.44, 2.22) | 0.97 | 0.94 (0.89, 1) | 0.067 | 0.97 (0.82, 1.13) | 0.688
381.2 | Eustachian tube disorders | 2358 | 17571118 | 1.15 (1.01, 1.31) | 0.033 | 0.9 (0.62, 1.29) | 0.566 | 0.87 (0.38, 2.01) | 0.765 | 0.93 (0.74, 1.17) | 0.531 | 0.88 (0.69, 1.13) | 0.32
385.5 | Tympanosclerosis and middle ear disease related to otitis media | 530 | 17625215 | 1.25 (0.95, 1.64) | 0.114 | 0.99 (0.55, 1.77) | 0.97 | 1.21 (0.37, 3.96) | 0.761 | 0.77 (0.6, 0.98) | 0.036 | 1.08 (0.29, 3.98) | 0.918
386.9 | Dizziness and giddiness (Light-headedness and vertigo) | 1060 | 17624097 | 0.84 (0.73, 0.97) | 0.016 | 1.04 (0.43, 2.55) | 0.933 | 0.82 (0.42, 1.57) | 0.555 | 1.2 (1.06, 1.37) | 0.005 | 1.08 (0.7, 1.68) | 0.738
389 | Hearing loss | 43238 | 17166114 | 0.99 (0.88, 1.12) | 0.897 | 1.01 (0.74, 1.37) | 0.97 | 1.01 (0.78, 1.3) | 0.97 | 1.01 (0.83, 1.21) | 0.96 | 1.06 (1.01, 1.1) | 0.009
389.3 | Degenerative and vascular disorders of ear | 22354 | 17419947 | 0.99 (0.81, 1.22) | 0.94 | 1 (0.85, 1.19) | 0.97 | 1.02 (0.53, 1.94) | 0.961 | 1 (0.83, 1.22) | 0.97 | 1.08 (1.02, 1.15) | 0.012
Circulatory System
394 | Rheumatic disease of the heart valves | 8422 | 17577658 | 1.08 (1.01, 1.16) | 0.029 | 0.98 (0.61, 1.57) | 0.924 | 0.93 (0.63, 1.38) | 0.744 | 0.94 (0.86, 1.03) | 0.2 | 0.96 (0.78, 1.19) | 0.729
411.2 | Myocardial infarction | 25905 | 17411193 | 1.03 (0.96, 1.11) | 0.434 | 1.06 (0.97, 1.16) | 0.188 | 1.03 (0.58, 1.8) | 0.932 | 0.94 (0.9, 0.98) | 0.008 | 1.02 (0.83, 1.26) | 0.867
415 | Pulmonary heart disease | 10870 | 17565369 | 1.2 (1.15, 1.26) | <0.001 | 1.07 (0.9, 1.27) | 0.475 | 1.14 (0.96, 1.36) | 0.125 | 0.78 (0.75, 0.82) | <0.001 | 0.95 (0.82, 1.09) | 0.482
415.11 | Pulmonary embolism and infarction, acute | 1533 | 17612465 | 1.39 (1.21, 1.59) | <0.001 | 1.11 (0.56, 2.18) | 0.781 | 1.14 (0.44, 2.98) | 0.798 | 0.65 (0.57, 0.75) | <0.001 | 0.88 (0.59, 1.32) | 0.562
425.12 | Other hypertrophic cardiomyopathy | 466 | 17628820 | 0.81 (0.61, 1.07) | 0.138 | 0.83 (0.35, 1.98) | 0.683 | 1.11 (0.12, 10.23) | 0.93 | 1.3 (1.03, 1.64) | 0.028 | 0.91 (0.32, 2.6) | 0.877
428.1 | Congestive heart failure (CHF) NOS | 8357 | 17595328 | 1.03 (0.94, 1.13) | 0.5 | 1.02 (0.76, 1.37) | 0.906 | 1.12 (0.96, 1.3) | 0.14 | 0.94 (0.89, 0.99) | 0.028 | 1.01 (0.7, 1.46) | 0.961
440 | Atherosclerosis | 10901 | 17554704 | 1.05 (0.95, 1.16) | 0.345 | 1.1 (0.96, 1.25) | 0.169 | 1.12 (0.91, 1.37) | 0.3 | 0.9 (0.85, 0.95) | <0.001 | 1.04 (0.88, 1.23) | 0.675
440.2 | Atherosclerosis of the extremities | 8348 | 17570336 | 1.06 (0.96, 1.18) | 0.258 | 1.09 (0.9, 1.32) | 0.406 | 1.08 (0.75, 1.55) | 0.686 | 0.9 (0.84, 0.96) | 0.002 | 1.03 (0.8, 1.33) | 0.823
442.2 | Aneurysm of iliac artery | 326 | 17630660 | 0.75 (0.64, 0.89) | <0.001 | 1.13 (0.63, 2.04) | 0.69 | 1.2 (0.61, 2.38) | 0.607 | 1.21 (1, 1.47) | 0.045 | 0.72 (0.6, 0.87) | <0.001
443 | Peripheral vascular disease | 13791 | 17546796 | 1.05 (0.98, 1.12) | 0.153 | 1.01 (0.59, 1.73) | 0.97 | 1.03 (0.71, 1.5) | 0.872 | 0.94 (0.89, 1) | 0.035 | 1.03 (0.91, 1.18) | 0.63
443.7 | Peripheral angiopathy in diseases classified elsewhere | 3677 | 17611414 | 1.02 (0.68, 1.51) | 0.94 | 1.1 (0.9, 1.34) | 0.37 | 1.1 (0.78, 1.55) | 0.603 | 0.93 (0.83, 1.04) | 0.214 | 1.18 (1.05, 1.31) | 0.004
444 | Arterial embolism and thrombosis | 2390 | 17614619 | 1.14 (0.98, 1.34) | 0.093 | 1.11 (0.77, 1.61) | 0.583 | 1.28 (0.93, 1.76) | 0.134 | 0.79 (0.7, 0.89) | <0.001 | 1 (0.92, 1.08) | 0.97
444.1 | Arterial embolism and thrombosis of lower extremity artery | 1286 | 17623872 | 1.13 (0.78, 1.63) | 0.533 | 1.13 (0.61, 2.13) | 0.706 | 1.38 (0.86, 2.23) | 0.185 | 0.78 (0.63, 0.97) | 0.022 | 0.98 (0.35, 2.76) | 0.97
451 | Phlebitis and thrombophlebitis | 16748 | 17479709 | 1.22 (1.18, 1.25) | <0.001 | 1.15 (1.09, 1.21) | <0.001 | 1.29 (1.18, 1.4) | <0.001 | 0.73 (0.7, 0.75) | <0.001 | 0.97 (0.84, 1.13) | 0.717
451.2 | Phlebitis and thrombophlebitis of lower extremities | 15650 | 17489528 | 1.23 (1.19, 1.27) | <0.001 | 1.14 (1.08, 1.2) | <0.001 | 1.31 (1.2, 1.42) | <0.001 | 0.72 (0.69, 0.74) | <0.001 | 0.97 (0.84, 1.12) | 0.678
452 | Other venous embolism and thrombosis | 4275 | 17607194 | 1.24 (1.16, 1.32) | <0.001 | 1.21 (1.07, 1.36) | 0.002 | 1.31 (1.1, 1.56) | 0.003 | 0.69 (0.64, 0.74) | <0.001 | 1 (0.89, 1.11) | 0.97
452.8 | Postphlebitic syndrome | 341 | 17629425 | 1.35 (1.08, 1.68) | 0.009 | 1.26 (0.76, 2.08) | 0.378 | 1.82 (1.13, 2.94) | 0.014 | 0.55 (0.45, 0.67) | <0.001 | 1.18 (0.54, 2.56) | 0.688
454 | Varicose veins | 16500 | 17381971 | 1.1 (1.01, 1.2) | 0.032 | 0.98 (0.57, 1.68) | 0.946 | 1.04 (0.51, 2.1) | 0.93 | 0.91 (0.83, 1) | 0.05 | 0.98 (0.67, 1.42) | 0.906
455 | Hemorrhoids | 9001 | 17523962 | 0.95 (0.84, 1.08) | 0.481 | 0.91 (0.78, 1.08) | 0.279 | 0.91 (0.67, 1.24) | 0.562 | 1.1 (1.03, 1.18) | 0.005 | 0.99 (0.67, 1.46) | 0.97
456 | Chronic venous insufficiency [CVI] | 925 | 17626709 | 1.24 (1.07, 1.44) | 0.004 | 1.04 (0.23, 4.7) | 0.967 | 1.37 (0.91, 2.05) | 0.131 | 0.73 (0.64, 0.83) | <0.001 | 0.94 (0.49, 1.79) | 0.858
459 | Other disorders of circulatory system | 2555 | 17616713 | 1.12 (0.99, 1.27) | 0.076 | 1.12 (0.85, 1.48) | 0.433 | 1.15 (0.75, 1.75) | 0.528 | 0.82 (0.75, 0.9) | <0.001 | 1.08 (0.8, 1.46) | 0.636
459.9 | Circulatory disease NEC | 2174 | 17618930 | 1.16 (1.02, 1.31) | 0.024 | 1.13 (0.86, 1.5) | 0.381 | 1.22 (0.85, 1.74) | 0.279 | 0.78 (0.71, 0.86) | <0.001 | 1.02 (0.38, 2.74) | 0.966
Respiratory
474 | Acute and chronic tonsillitis | 41427 | 16817602 | 1.1 (1.08, 1.13) | <0.001 | 0.93 (0.88, 0.97) | 0.002 | 1.01 (0.63, 1.63) | 0.97 | 0.94 (0.91, 0.96) | <0.001 | 0.97 (0.9, 1.05) | 0.466
474.2 | Chronic tonsillitis and adenoiditis | 27077 | 17030480 | 1.14 (1.11, 1.17) | <0.001 | 0.89 (0.84, 0.93) | <0.001 | 1.02 (0.75, 1.4) | 0.897 | 0.92 (0.89, 0.94) | <0.001 | 0.96 (0.89, 1.03) | 0.259
477 | Epistaxis or throat hemorrhage | 12337 | 17506794 | 0.93 (0.89, 0.98) | 0.006 | 0.92 (0.83, 1.02) | 0.118 | 0.94 (0.71, 1.23) | 0.648 | 1.12 (1.08, 1.16) | <0.001 | 1.01 (0.73, 1.41) | 0.948
495 | Asthma | 31106 | 17238738 | 1.04 (1, 1.08) | 0.028 | 0.94 (0.88, 0.99) | 0.023 | 1 (1, 1) | 0.97 | 0.99 (0.9, 1.08) | 0.777 | 0.99 (0.78, 1.27) | 0.97
Digestive
530.11 | GERD | 7461 | 17582158 | 0.99 (0.69, 1.43) | 0.966 | 1.12 (1.03, 1.22) | 0.007 | 1.04 (0.62, 1.72) | 0.897 | 0.95 (0.88, 1.03) | 0.223 | 1.12 (1.03, 1.21) | 0.007
530.7 | Gastroesophageal laceration-hemorrhage syndrome | 597 | 17623900 | 1.16 (0.8, 1.66) | 0.439 | 1.04 (0.15, 7.23) | 0.97 | 0.43 (0.22, 0.83) | 0.012 | 0.94 (0.44, 2.02) | 0.883 | 1.09 (0.39, 2.99) | 0.883
531 | Peptic ulcer (excl. esophageal) | 16678 | 17443704 | 0.9 (0.87, 0.94) | <0.001 | 1 (0.94, 1.07) | 0.97 | 0.92 (0.75, 1.14) | 0.471 | 1.12 (1.08, 1.16) | <0.001 | 1.02 (0.83, 1.25) | 0.86
531.1 | Hemorrhage from gastrointestinal ulcer | 6277 | 17583751 | 0.91 (0.84, 0.98) | 0.009 | 0.89 (0.75, 1.04) | 0.149 | 0.91 (0.6, 1.4) | 0.688 | 1.17 (1.11, 1.25) | <0.001 | 1.07 (0.91, 1.27) | 0.421
531.2 | Gastric ulcer | 6745 | 17555312 | 0.91 (0.85, 0.98) | 0.008 | 1.01 (0.63, 1.63) | 0.97 | 0.92 (0.62, 1.36) | 0.681 | 1.11 (1.04, 1.18) | 0.002 | 0.97 (0.73, 1.28) | 0.838
531.3 | Duodenal ulcer | 4534 | 17555684 | 0.88 (0.8, 0.96) | 0.007 | 1.06 (0.71, 1.57) | 0.803 | 0.97 (0.21, 4.56) | 0.97 | 1.12 (1.01, 1.24) | 0.028 | 1.05 (0.77, 1.43) | 0.789
531.5 | Gastrojejunal ulcer | 586 | 17628539 | 0.83 (0.61, 1.11) | 0.213 | 1.41 (1.08, 1.84) | 0.012 | 1.12 (0.23, 5.41) | 0.897 | 1.01 (0.64, 1.6) | 0.97 | 0.85 (0.56, 1.28) | 0.439
550 | Abdominal hernia | 47761 | 16976073 | 1.01 (0.94, 1.09) | 0.757 | 0.94 (0.89, 0.99) | 0.021 | 0.97 (0.81, 1.17) | 0.765 | 1.02 (0.96, 1.08) | 0.575 | 1.01 (0.77, 1.32) | 0.97
562 | Diverticulosis and diverticulitis | 16569 | 17515568 | 0.94 (0.87, 1.03) | 0.19 | 0.92 (0.82, 1.03) | 0.138 | 0.91 (0.7, 1.18) | 0.481 | 1.11 (1.06, 1.17) | <0.001 | 0.98 (0.75, 1.28) | 0.884
562.1 | Diverticulosis | 16569 | 17515568 | 0.94 (0.87, 1.03) | 0.19 | 0.92 (0.82, 1.03) | 0.138 | 0.91 (0.7, 1.18) | 0.481 | 1.11 (1.06, 1.17) | <0.001 | 0.98 (0.75, 1.28) | 0.884
571.81 | Portal hypertension | 1101 | 17627608 | 0.94 (0.66, 1.35) | 0.766 | 1.17 (0.83, 1.63) | 0.376 | 0.62 (0.43, 0.9) | 0.011 | 1.06 (0.76, 1.47) | 0.746 | 0.99 (0.52, 1.88) | 0.97
574 | Cholelithiasis and cholecystitis | 31530 | 17299206 | 1.05 (1.02, 1.09) | 0.004 | 0.98 (0.85, 1.13) | 0.766 | 0.99 (0.64, 1.53) | 0.97 | 0.96 (0.92, 1) | 0.062 | 1.01 (0.73, 1.39) | 0.957
574.1 | Cholelithiasis | 24028 | 17364917 | 1.05 (1.01, 1.1) | 0.029 | 0.98 (0.81, 1.18) | 0.809 | 0.99 (0.49, 1.96) | 0.97 | 0.96 (0.91, 1.02) | 0.183 | 1.01 (0.75, 1.35) | 0.97
574.12 | Cholelithiasis with other cholecystitis | 2650 | 17605642 | 1.12 (1, 1.25) | 0.044 | 0.88 (0.69, 1.12) | 0.32 | 0.91 (0.48, 1.7) | 0.77 | 0.95 (0.76, 1.2) | 0.688 | 0.96 (0.61, 1.52) | 0.878
574.2 | Calculus of bile duct | 8401 | 17560937 | 1.07 (1, 1.15) | 0.042 | 0.95 (0.78, 1.17) | 0.651 | 0.98 (0.46, 2.06) | 0.953 | 0.95 (0.87, 1.05) | 0.33 | 1.07 (0.94, 1.21) | 0.32
575.2 | Obstruction of bile duct | 1593 | 17626684 | 1.24 (1.1, 1.41) | <0.001 | 0.97 (0.28, 3.43) | 0.97 | 1.04 (0.18, 5.91) | 0.97 | 0.8 (0.7, 0.91) | 0.001 | 1.1 (0.71, 1.69) | 0.688
578 | Gastrointestinal hemorrhage | 20111 | 17502200 | 0.96 (0.9, 1.03) | 0.296 | 0.94 (0.83, 1.08) | 0.393 | 0.91 (0.78, 1.07) | 0.249 | 1.08 (1.03, 1.12) | <0.001 | 1.01 (0.77, 1.33) | 0.93
578.8 | Hemorrhage of rectum and anus | 11095 | 17555331 | 0.96 (0.87, 1.06) | 0.413 | 0.93 (0.79, 1.1) | 0.395 | 0.92 (0.71, 1.18) | 0.505 | 1.09 (1.03, 1.15) | 0.004 | 1 (0.8, 1.26) | 0.97
Genitourinary
614.51 | Cervicitis and endocervicitis | 660 | 10488791 | 0.97 (0.52, 1.8) | 0.918 | 1.1 (0.7, 1.73) | 0.696 | 1.4 (1.03, 1.91) | 0.033 | 0.93 (0.64, 1.34) | 0.706 | 1.16 (0.86, 1.57) | 0.326
622.2 | Mucous polyp of cervix | 1401 | 10498802 | 1.17 (1.03, 1.34) | 0.02 | 0.99 (0.64, 1.53) | 0.97 | 1.12 (0.56, 2.21) | 0.766 | 0.83 (0.73, 0.95) | 0.007 | 0.97 (0.43, 2.18) | 0.948
626.12 | Excessive or frequent menstruation | 10504 | 10375823 | 0.92 (0.86, 0.99) | 0.026 | 0.96 (0.74, 1.24) | 0.757 | 1.02 (0.48, 2.14) | 0.97 | 1.1 (1.03, 1.17) | 0.004 | 0.97 (0.76, 1.23) | 0.811
Pregnancy Complications
634.3 | Ectopic pregnancy | 5034 | 10426967 | 0.97 (0.87, 1.08) | 0.592 | 1.11 (1.01, 1.21) | 0.022 | 1.11 (0.93, 1.34) | 0.25 | 0.96 (0.86, 1.08) | 0.524 | 0.99 (0.65, 1.5) | 0.957
643 | Excessive vomiting in pregnancy | 4314 | 10470323 | 0.91 (0.85, 0.97) | 0.005 | 1.17 (1.08, 1.27) | <0.001 | 1.12 (0.83, 1.51) | 0.48 | 1 (0.87, 1.14) | 0.97 | 1 (0.89, 1.11) | 0.97
643.1 | Hyperemesis gravidarum | 3558 | 10488446 | 0.9 (0.84, 0.96) | 0.002 | 1.15 (0.97, 1.37) | 0.105 | 1.11 (0.91, 1.37) | 0.313 | 1.01 (0.76, 1.36) | 0.927 | 1.05 (0.78, 1.4) | 0.777
647.3 | Major puerperal infection | 274 | 10504689 | 1.04 (0.34, 3.13) | 0.952 | 1.37 (0.98, 1.92) | 0.068 | 0.78 (0.25, 2.45) | 0.688 | 0.86 (0.59, 1.25) | 0.439 | 0.72 (0.56, 0.93) | 0.01
649.1 | Diabetes or abnormal glucose tolerance complicating pregnancy | 5053 | 10480682 | 0.92 (0.64, 1.33) | 0.679 | 1.22 (0.95, 1.56) | 0.118 | 1.17 (0.75, 1.82) | 0.504 | 0.95 (0.55, 1.64) | 0.866 | 1.26 (1.03, 1.55) | 0.026
654.1 | Abnormality of organs and soft tissues of pelvis complicating pregnancy, childbirth, or the puerperium | 5842 | 10479533 | 0.94 (0.89, 0.99) | 0.016 | 1.09 (1.01, 1.16) | 0.019 | 0.93 (0.76, 1.13) | 0.467 | 1.04 (0.96, 1.12) | 0.378 | 1.08 (1.01, 1.16) | 0.03
654.2 | Rhesus isoimmunization in pregnancy | 808 | 10503190 | 1.12 (0.91, 1.38) | 0.294 | 0.8 (0.61, 1.06) | 0.118 | 0.82 (0.4, 1.71) | 0.612 | 1.01 (0.65, 1.57) | 0.97 | 0.26 (0.24, 0.29) | <0.001
656.1 | Isoimmunization of fetus or newborn | 508 | 482914* | 2.6 (1.57, 4.33)** | <0.001 | 1.57 (0.53, 4.61)** | 0.42 | 0.4 (0.01, 11.01)** | 0.599 | 0.24 (0.11, 0.51)** | <0.001 | 1.03 (0.21, 5.13)** | 0.97
Dermatologic
704 | Diseases of hair and hair follicles | 4102 | 17578353 | 0.99 (0.62, 1.59) | 0.97 | 1.18 (1.04, 1.34) | 0.01 | 1.03 (0.25, 4.3) | 0.97 | 0.93 (0.82, 1.06) | 0.273 | 1.04 (0.68, 1.59) | 0.876
704.2 | Hirsutism | 1778 | 17608888 | 0.94 (0.7, 1.26) | 0.688 | 1.32 (1.15, 1.5) | <0.001 | 1.26 (0.96, 1.65) | 0.093 | 0.88 (0.75, 1.04) | 0.13 | 1.13 (0.87, 1.45) | 0.369
707 | Chronic ulcer of skin | 2121 | 17617533 | 0.94 (0.74, 1.19) | 0.617 | 1.23 (1.04, 1.47) | 0.017 | 0.94 (0.29, 3.07) | 0.927 | 0.98 (0.51, 1.88) | 0.959 | 1.01 (0.59, 1.74) | 0.97
Musculoskeletal
722.7 | Intervertebral disc disorder with myelopathy | 747 | 17625461 | 0.95 (0.61, 1.47) | 0.82 | 1.26 (1.01, 1.59) | 0.042 | 0.65 (0.31, 1.35) | 0.248 | 1.01 (0.58, 1.76) | 0.97 | 0.97 (0.28, 3.31) | 0.961
727 | Other disorders of synovium, tendon, and bursa | 30487 | 17272753 | 1.02 (0.95, 1.09) | 0.627 | 0.94 (0.89, 0.99) | 0.016 | 0.99 (0.57, 1.7) | 0.97 | 1.01 (0.92, 1.11) | 0.809 | 0.97 (0.9, 1.04) | 0.352
727.7 | Contracture of tendon (sheath) | 391 | 17628006 | 0.91 (0.45, 1.84) | 0.8 | 1.14 (0.38, 3.43) | 0.826 | 1.11 (0.05, 24.01) | 0.951 | 1.02 (0.46, 2.25) | 0.97 | 0.65 (0.47, 0.9) | 0.009
728 | Disorders of muscle, ligament, and fascia | 11338 | 17510763 | 1.08 (1.02, 1.13) | 0.008 | 0.93 (0.83, 1.04) | 0.199 | 0.88 (0.74, 1.05) | 0.164 | 0.98 (0.85, 1.12) | 0.766 | 0.93 (0.85, 1.01) | 0.085
740 | Osteoarthrosis | 53711 | 17187748 | 1.01 (0.94, 1.09) | 0.815 | 0.95 (0.9, 0.99) | 0.021 | 1.02 (0.79, 1.31) | 0.911 | 1.01 (0.93, 1.1) | 0.838 | 0.97 (0.92, 1.01) | 0.132
740.11 | Osteoarthrosis, localized, primary | 40386 | 17333061 | 1.01 (0.94, 1.09) | 0.779 | 0.94 (0.9, 0.99) | 0.012 | 0.99 (0.72, 1.35) | 0.94 | 1.01 (0.95, 1.08) | 0.663 | 0.97 (0.92, 1.02) | 0.188
741 | Symptoms and disorders of the joints | 5085 | 17578450 | 0.99 (0.66, 1.48) | 0.97 | 0.85 (0.73, 0.99) | 0.032 | 0.93 (0.52, 1.69) | 0.83 | 1.09 (0.99, 1.2) | 0.072 | 1 (0.82, 1.24) | 0.97
742 | Derangement of joint, non-traumatic | 16652 | 17438176 | 1.03 (0.96, 1.11) | 0.357 | 0.9 (0.83, 0.97) | 0.004 | 0.98 (0.61, 1.56) | 0.924 | 1.02 (0.88, 1.17) | 0.838 | 0.95 (0.87, 1.04) | 0.291
742.9 | Other derangement of joint | 13669 | 17490182 | 1.02 (0.9, 1.16) | 0.757 | 0.91 (0.83, 0.99) | 0.023 | 0.98 (0.47, 2.06) | 0.97 | 1.02 (0.91, 1.15) | 0.714 | 0.95 (0.86, 1.04) | 0.26
743Osteoporosis, osteopenia and pathological fracture15875175364241.05 (1, 1.11)0.0490.94 (0.85, 1.05)0.260.92 (0.76, 1.12)0.4050.99 (0.85, 1.14)0.8581 (0.98, 1.02)0.97
743.11Osteoporosis NOS13633175539761.06 (1.02, 1.1)0.0030.93 (0.87, 1)0.0390.94 (0.78, 1.13)0.5140.98 (0.9, 1.07)0.681 (0.82, 1.21)0.97
Congenital Anomalies
747Cardiac and circulatory congenital anomalies15297482914*1.07 (1.01, 1.14)**0.0290.94 (0.8, 1.11)**0.4751.02 (0.47, 2.23)**0.9610.95 (0.88, 1.03)**0.2230.98 (0.77, 1.25)**0.884
747.1Cardiac congenital anomalies1621482914*1.19 (1.02, 1.39)**0.0230.77 (0.6, 0.98)**0.0361.03 (0.27, 3.86)**0.970.93 (0.69, 1.25)**0.6351.02 (0.48, 2.16)**0.97
755Congenital anomalies of limbs6557482914*0.99 (0.65, 1.5)**0.9570.96 (0.71, 1.29)**0.7780.96 (0.46, 1.97)**0.911.04 (0.89, 1.23)**0.6270.88 (0.8, 0.97)**0.013
755.61Congenital hip dysplasia and deformity2823482914*1.01 (0.57, 1.78)**0.970.87 (0.66, 1.16)**0.3550.89 (0.42, 1.88)**0.7661.07 (0.85, 1.35)**0.570.81 (0.71, 0.94)**0.004
a. Statistically significant IRRs are marked with bold (FDR-adjusted p-value <0.05).

b. The IRRs are adjusted for age, sex, interaction between age and sex, and birth year.

c. All other ABO blood groups and the RhD-negative blood group were used as the reference, respectively.

d. FDR-adjusted p-values and 95% confidence intervals are presented.

e. FDR-adjusted p-values above 0.97 were set to 0.97 to avoid exploding adjusted confidence intervals.

f. Phecodes are divided by PheWAS disease categories.

g. The number of events and the follow-up time in person-years for each phecode are presented.

h. For congenital phecodes, estimates marked with ** are prevalence ratios instead of IRRs, and the corresponding person-years value marked with * is the size of the cohort.

Table 2—source data 1

Associations between the ABO/RhD blood groups and phecode incidence rate ratios.

https://cdn.elifesciences.org/articles/83116/elife-83116-table2-data1-v2.zip
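
The table notes on FDR-adjusted confidence intervals can be illustrated with a minimal sketch. It assumes (this is not stated in the text) that an adjusted interval is obtained by back-calculating a standard error on the log scale from the FDR-adjusted p-value. The function name `fdr_adjusted_ci` and its arguments are hypothetical. Under this assumption, an adjusted p-value close to 1 implies a z-score close to 0 and therefore an extremely wide interval, which is why a cap such as 0.97 is useful.

```python
import math
from statistics import NormalDist


def fdr_adjusted_ci(ratio, p_adj, level=0.95, p_cap=0.97):
    """Back-calculate a confidence interval for a ratio estimate from an adjusted p-value.

    Assumption (not confirmed by the article): the adjusted standard error is
    |log(ratio)| / z_adj, where z_adj is the two-sided z-score implied by p_adj.
    """
    if not 0.0 < p_adj <= 1.0:
        raise ValueError("p_adj must be in (0, 1]")
    p = min(p_adj, p_cap)                      # cap to keep the interval finite
    z_adj = NormalDist().inv_cdf(1.0 - p / 2.0)
    beta = math.log(ratio)                     # work on the log scale
    se_adj = abs(beta) / z_adj                 # implied (adjusted) standard error
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return (math.exp(beta - z_crit * se_adj), math.exp(beta + z_crit * se_adj))


low, high = fdr_adjusted_ci(0.72, 0.01)
print(f"IRR 0.72, adjusted p = 0.01: ({low:.2f}, {high:.2f})")
```

With the rounded inputs 0.72 and 0.01, the sketch returns an interval of roughly 0.56 to 0.92, which is close to the interval (0.56, 0.93) reported in Table 2 for major puerperal infection and the RhD-positive group. The small difference is expected because the inputs are rounded.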

The numbers of statistically significant IRRs for A, B, AB, O, and RhD were 50, 38, 11, 53, and 28, respectively. However, comparing the number of statistically significant IRRs between blood groups is problematic because the analyses of blood groups A and O had the highest power, given that these blood groups were the most frequent in the study sample (Table 1). For 13 phecodes, an association was found for both the ABO blood group and the RhD blood group. The ABO blood groups were positively associated with 75 phecodes and inversely associated with 67 phecodes. The RhD-positive blood group had 16 positive and 12 inverse associations. Blood groups A and O were associated with diseases of the circulatory and digestive systems. Blood group B was associated with several infectious, metabolic, and musculoskeletal diseases. The associations of the RhD blood group included cancers, infectious diseases, and pregnancy complications. The results of the supplementary analyses in which blood group O was used as the reference are shown in Supplementary files 6 and 7.
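
The difference between the main comparison (each blood group against all other ABO blood groups) and the supplementary comparison (each blood group against blood group O) can be sketched with crude rate ratios. The counts below are hypothetical and the helper names are illustrative. The sketch is unadjusted, whereas the study used quasi-Poisson regression with covariate adjustment, so it only shows how the choice of reference changes the ratio.

```python
# Hypothetical events and person-years per ABO blood group (illustrative only).
counts = {
    "A": (420, 100_000),
    "B": (110, 25_000),
    "AB": (45, 10_000),
    "O": (380, 100_000),
}


def rate_ratio(events_1, py_1, events_0, py_0):
    """Crude incidence rate ratio of group 1 relative to group 0."""
    return (events_1 / py_1) / (events_0 / py_0)


def one_vs_rest(counts, group):
    """Rate in `group` relative to the pooled rate in all other groups."""
    events, py = counts[group]
    rest_events = sum(e for g, (e, _) in counts.items() if g != group)
    rest_py = sum(p for g, (_, p) in counts.items() if g != group)
    return rate_ratio(events, py, rest_events, rest_py)


def versus_reference(counts, group, reference="O"):
    """Rate in `group` relative to the rate in a single reference group."""
    events, py = counts[group]
    ref_events, ref_py = counts[reference]
    return rate_ratio(events, py, ref_events, ref_py)


for group in ("A", "B", "AB"):
    print(group,
          f"vs rest: {one_vs_rest(counts, group):.2f}",
          f"vs O: {versus_reference(counts, group):.2f}")
```

In this toy example the ratio for blood group A is smaller against all other groups than against blood group O alone, because the pooled reference includes groups B and AB with higher rates. Only the one-versus-rest form yields an estimate for blood group O itself.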

Age at first diagnosis

We found blood group B to be associated with a later diagnosis of viral infection. Further, blood group O was associated with a later diagnosis of phlebitis and thrombophlebitis (Table 3 and Supplementary file 5). The RhD-positive group was associated with a later diagnosis of acute and chronic tonsillitis.

Table 3
Statistically significant associations between the ABO/RhD blood groups and the age of the first diagnosis.
Phecode | Phenotype | N | Blood group A: estimate (95% CI) | p-value | Blood group B: estimate (95% CI) | p-value | Blood group AB: estimate (95% CI) | p-value | Blood group O: estimate (95% CI) | p-value | Blood group RhD: estimate (95% CI) | p-value
079 | Viral infection | 25075 | –0.26 (–3.06, 2.54) | 0.864 | 0.92 (0.13, 1.71) | 0.022 | 0.53 (–6.18, 7.23) | 0.887 | –0.23 (–3.22, 2.75) | 0.887 | 0.27 (–5.22, 5.75) | 0.93
451 | Phlebitis and thrombophlebitis | 16748 | –0.58 (–1.02, –0.13) | 0.011 | –0.24 (–5.71, 5.22) | 0.936 | –0.6 (–5.87, 4.66) | 0.833 | 0.91 (0.57, 1.25) | <0.001 | 0.01 (–0.33, 0.34) | 0.97
451.2 | Phlebitis and thrombophlebitis of lower extremities | 15650 | –0.53 (–1.08, 0.01) | 0.055 | –0.27 (–5.85, 5.31) | 0.93 | –0.7 (–6.35, 4.95) | 0.82 | 0.9 (0.55, 1.25) | <0.001 | 0.08 (–3.89, 4.06) | 0.97
474 | Acute and chronic tonsillitis | 41428 | –0.29 (–1.45, 0.87) | 0.634 | 0.42 (–1.97, 2.8) | 0.744 | 0.38 (–5.71, 6.48) | 0.909 | 0.05 (–2.15, 2.24) | 0.97 | 0.67 (0.15, 1.19) | 0.011
474.1 | Acute tonsillitis | 18162 | 0.1 (–4.42, 4.61) | 0.97 | 0.41 (–7.17, 7.98) | 0.923 | 0.75 (–7.23, 8.74) | 0.864 | –0.42 (–3.76, 2.93) | 0.82 | 1.34 (0.64, 2.04) | <0.001
a. Statistically significant effect estimates are marked with bold (FDR-adjusted p-value <0.05).

b. FDR-adjusted p-values and 95% confidence intervals are presented.

c. FDR-adjusted p-values above 0.97 were set to 0.97 to enable estimation of adjusted confidence intervals.

d. Estimates represent increases or decreases in years of age at first diagnosis.

Table 3—source data 1

Associations between the ABO/RhD blood groups and the age of the first diagnosis.

https://cdn.elifesciences.org/articles/83116/elife-83116-table3-data1-v2.zip

Discussion

We found the ABO/RhD blood groups to be associated with a wide spectrum of diseases, including cancers and musculoskeletal-, genitourinary-, endocrinal-, infectious-, cardiovascular-, and gastrointestinal diseases. Associations of the ABO blood groups included monocytic leukemia, tonsillitis, renal dialysis, diseases of the female reproductive system, and osteoarthrosis. Associations of the RhD blood group included cancer of the tongue, malignant neoplasm (other), tuberculosis, HIV infection, hepatitis B infection, type 2 diabetes, hereditary hemolytic anemias, major puerperal infection, anxiety disorders, and contracture of tendon.

The blood groups may reflect their corresponding genetic markers; thus, our findings may indicate an association between disease and the ABO locus on chromosome 9 and the RH locus on chromosome 1, respectively. Alternatively, the associations may indicate that the blood groups are involved in disease mechanisms at the molecular level, mediated either through the blood group antigens or by the blood group reactive antibodies. However, the causal interpretation of our findings is limited by the retrospective inclusion of individuals (and person-time) after an in-hospital blood group test.

Our results support several previously observed associations including positive associations between the non-O blood groups and prothrombotic diseases of the circulatory system (phecodes: 411.1–459.9), associations with gastroduodenal ulcers, associations of blood group O and lower risk of type 2 diabetes, and positive association between blood group B and tuberculosis (Vasan et al., 2016; Edgren et al., 2010; Dahlén et al., 2021; Fagherazzi et al., 2015; Rao, 2012). Further, our results support findings associating non-O blood groups with increased risk of pancreatic cancer (Liumbruno and Franchini, 2014). The role of the ABO blood group in HIV susceptibility remains controversial; we only observed a positive association for the RhD-positive blood group (Davison et al., 2020).

We found blood group B to be positively associated with ‘ectopic pregnancy’, ‘excessive vomiting in pregnancy’, and ‘abnormality of organs and soft tissues of pelvis complicating pregnancy’, indicating that blood group B mothers may be more likely to experience pregnancy complications. Further, we found positive associations of blood group A with ‘mucous polyp of cervix’ and of blood group AB with ‘cervicitis and endocervicitis’. Taken together, these findings may indicate that the ABO blood groups are associated with diseases of the female reproductive system. However, the study design does not allow for any causal interpretation.

Only a few statistically significant associations were found in the analyses of age at first diagnosis, indicating that the blood groups' involvement in disease onset may be marginal. However, we assumed a linear relationship with age because assessing potential non-linear relationships for each disease would be infeasible given the large number of tests performed. The linearity assumption may not hold for all analyses, which limits the interpretation of the estimates.

A strength of our approach is that we utilized the phecode disease classification scheme, which was specifically developed for disease-wide risk analyses (Wu et al., 2019). The phecode mapping scheme combines ICD-10 codes that clinical domain experts have deemed to cover the same disease. For example, respiratory tuberculosis (A16), tuberculosis of nervous system (A17), and miliary tuberculosis (A19) are combined into the phecode tuberculosis (phecode 10). Phecodes may therefore provide increased power and precision compared with using ICD-10 categories (Denny et al., 2010). Further, contrary to previous studies, we compared each blood group to all other blood groups instead of determining effect estimates relative to blood group O. Thus, we better capture the uniqueness of each individual ABO blood group.
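
The tuberculosis example above can be sketched as a small lookup. This is an illustration only: the dictionary contains just the three ICD-10 categories named in the text, the phecode is written as the string "010" (the text refers to it as phecode 10), and matching on the three-character ICD-10 category is a simplifying assumption rather than the rule used by the actual phecode map. The function names are hypothetical.

```python
from datetime import date

# Only the three ICD-10 categories mentioned in the text; a real map is far larger.
ICD10_TO_PHECODE = {
    "A16": "010",  # respiratory tuberculosis
    "A17": "010",  # tuberculosis of nervous system
    "A19": "010",  # miliary tuberculosis
}


def to_phecode(icd10_code):
    """Map an ICD-10 code to a phecode via its three-character category (assumption)."""
    category = icd10_code.replace(".", "").upper()[:3]
    return ICD10_TO_PHECODE.get(category)


def first_diagnoses(records):
    """Return the earliest diagnosis date per (patient, phecode) pair."""
    first = {}
    for patient_id, icd10_code, diagnosed_on in records:
        phecode = to_phecode(icd10_code)
        if phecode is None:
            continue  # code not covered by this toy map
        key = (patient_id, phecode)
        if key not in first or diagnosed_on < first[key]:
            first[key] = diagnosed_on
    return first


# Hypothetical records: (patient id, ICD-10 code, diagnosis date).
records = [
    ("p1", "A17.0", date(2005, 6, 1)),
    ("p1", "A16.2", date(2001, 3, 1)),
    ("p2", "A19.9", date(2010, 1, 15)),
    ("p2", "J45.0", date(2009, 5, 2)),
]
print(first_diagnoses(records))
```

Because several ICD-10 codes collapse into one phecode, patient p1 contributes a single tuberculosis event dated at the earliest of the two diagnoses, which is the sense in which the grouping pools events that would otherwise be split across codes.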

Limitations

Our study has some important limitations. First, the retrospective inclusion of patients and person-time may have introduced an immortal time bias from deaths before enrollment (in-hospital ABO/RhD blood group test) (Yadav and Lewis, 2021). The findings are therefore conditioned on patients surviving until the enrollment period. This implies, for example, that if a specific blood group causes a higher incidence of a deadly disease, then patients with that blood group are more likely to have died before enrollment, and therefore fewer individuals having both that blood group and the disease will be present in our cohort. If so, the estimates for deadly diseases strongly related to any blood group will have been lowered, or even reversed in direction, relative to any causal relationship. The study design, however, enabled 41 years of follow-up and was deemed reasonable because the blood groups have not been associated with mortality differences. Moreover, the blood group distribution in our cohort was almost identical to that of a reference population of 2.2 million Danish blood donors. Further, we replicated several previously reported associations between the blood groups and severe diseases, including pancreatic cancer (Vasan et al., 2016; Liumbruno and Franchini, 2014). This may indicate that the potential bias was limited. Further, controlling for year of birth likely reduced the potential effects of immortal time bias; however, this could not be tested. Immortal time bias is potentially relevant in many biobank studies, e.g., when using the UK Biobank for retrospective studies (Yadav and Lewis, 2021).

The generalizability of our findings is further limited because our cohort solely included hospitalized patients with known ABO and RhD blood groups. These are patients whom the treating doctor has deemed likely to require a blood transfusion during hospitalization. The patients under study might therefore suffer from other diseases than patients without a determined blood group and than individuals who were never hospitalized. Further, diseases that do not require hospitalization could not be examined. If the effect sizes are modified by factors that are more common in our cohort than in the general population, then the estimates may not be generalizable. However, it is unclear whether such effect modifiers exist. Lastly, it was not possible to adjust for possible confounding from the geographical distribution or ethnicity of the patients (Anstee, 2010). This may have biased some estimates because the distribution of blood groups varies between ethnicities, while ethnicity is also associated with differences in disease susceptibility. In particular, ethnicity has been associated with differences in the prevalence of infectious diseases, cardiovascular diseases, sickle cell disease, and thalassemia (Kurian and Cardarelli, 2007; McQuillan et al., 2004). Thus, the estimates for these disease groups should be interpreted with caution. The Danish population is, however, quite homogeneous, and approximately 94% of Danes have European ancestry (Supplementary file 1). Therefore, a potential bias from ethnicity may be less pronounced in our cohort than in studies of populations of more admixed origin.
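
The argument about unmeasured ethnicity can be made concrete with a small worked calculation. The blood group frequencies and disease risks below are hypothetical and chosen only for illustration; within each ancestry stratum the disease risk is identical for blood group B and non-B, so any deviation of the pooled (crude) risk ratio from 1 is due to confounding alone.

```python
def crude_risk_ratio(minority_fraction,
                     b_freq=(0.10, 0.25),
                     disease_risk=(0.01, 0.03)):
    """Crude risk ratio (blood group B vs. non-B) after pooling two ancestry strata.

    Index 0 is the majority stratum and index 1 the minority stratum.
    All default values are hypothetical. The stratum-specific ratio is 1 by construction.
    """
    weights = (1.0 - minority_fraction, minority_fraction)
    b_cases = sum(w * f * r for w, f, r in zip(weights, b_freq, disease_risk))
    b_total = sum(w * f for w, f in zip(weights, b_freq))
    other_cases = sum(w * (1 - f) * r for w, f, r in zip(weights, b_freq, disease_risk))
    other_total = sum(w * (1 - f) for w, f in zip(weights, b_freq))
    return (b_cases / b_total) / (other_cases / other_total)


for fraction in (0.0, 0.06, 0.30):
    print(f"minority fraction {fraction:.2f}: crude RR = {crude_risk_ratio(fraction):.2f}")
```

Under these made-up inputs the crude ratio is 1 in a fully homogeneous population, about 1.16 when 6% of the population belongs to the minority stratum, and about 1.33 at 30%. The bias therefore shrinks as the population becomes more homogeneous, but it need not vanish for diseases whose risk differs strongly between strata.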

In conclusion, we found the ABO/RhD blood groups to be associated with a wide spectrum of diseases, including cardiovascular-, infectious-, gastrointestinal- and musculoskeletal diseases. This may indicate that some of the potential selective pressure on the blood groups can be attributed to disease susceptibility differences. We found few associations between the blood groups and age of first diagnosis.

Data availability

Anonymized patient data was used in this study. Due to national and EU regulations, the data cannot be shared with the wider research community. However, data can be accessed upon relevant application to the Danish authorities. The Danish Patient Safety Authority and the Danish Health Data Authority have permitted the use of the data in this study; whilst currently, the appropriate authority for journal data use in research is the regional committee ("Regionsråd"). The statistical summary data used to create the tables and graphs are available as Table 2—source data 1 and Table 3—source data 1. The analysis code is publicly available through https://www.github.com/peterbruun/blood_type_study (copy archived at Bruun-Rasmussen, 2023).

References

  1. Website
    1. Banks TDB
    (2022) Blodtypes
    Accessed August 31, 2022.
    1. Dewey M
    2. Clayton D
    3. Hills M
    (1995) Statistical models in epidemiology
    Journal of the Royal Statistical Society. Series A Statistics in Society 158:343.
    https://doi.org/10.2307/2983301
    1. Ellinghaus D
    2. Degenhardt F
    3. Bujanda L
    4. Buti M
    5. Albillos A
    6. Invernizzi P
    7. Fernández J
    8. Prati D
    9. Baselli G
    10. Asselta R
    11. Grimsrud MM
    12. Milani C
    13. Aziz F
    14. Kässens J
    15. May S
    16. Wendorff M
    17. Wienbrandt L
    18. Uellendahl-Werth F
    19. Zheng T
    20. Yi X
    21. de Pablo R
    22. Chercoles AG
    23. Palom A
    24. Garcia-Fernandez A-E
    25. Rodriguez-Frias F
    26. Zanella A
    27. Bandera A
    28. Protti A
    29. Aghemo A
    30. Lleo A
    31. Biondi A
    32. Caballero-Garralda A
    33. Gori A
    34. Tanck A
    35. Carreras Nolla A
    36. Latiano A
    37. Fracanzani AL
    38. Peschuck A
    39. Julià A
    40. Pesenti A
    41. Voza A
    42. Jiménez D
    43. Mateos B
    44. Nafria Jimenez B
    45. Quereda C
    46. Paccapelo C
    47. Gassner C
    48. Angelini C
    49. Cea C
    50. Solier A
    51. Pestaña D
    52. Muñiz-Diaz E
    53. Sandoval E
    54. Paraboschi EM
    55. Navas E
    56. García Sánchez F
    57. Ceriotti F
    58. Martinelli-Boneschi F
    59. Peyvandi F
    60. Blasi F
    61. Téllez L
    62. Blanco-Grau A
    63. Hemmrich-Stanisak G
    64. Grasselli G
    65. Costantino G
    66. Cardamone G
    67. Foti G
    68. Aneli S
    69. Kurihara H
    70. ElAbd H
    71. My I
    72. Galván-Femenia I
    73. Martín J
    74. Erdmann J
    75. Ferrusquía-Acosta J
    76. Garcia-Etxebarria K
    77. Izquierdo-Sanchez L
    78. Bettini LR
    79. Sumoy L
    80. Terranova L
    81. Moreira L
    82. Santoro L
    83. Scudeller L
    84. Mesonero F
    85. Roade L
    86. Rühlemann MC
    87. Schaefer M
    88. Carrabba M
    89. Riveiro-Barciela M
    90. Figuera Basso ME
    91. Valsecchi MG
    92. Hernandez-Tejero M
    93. Acosta-Herrera M
    94. D’Angiò M
    95. Baldini M
    96. Cazzaniga M
    97. Schulzky M
    98. Cecconi M
    99. Wittig M
    100. Ciccarelli M
    101. Rodríguez-Gandía M
    102. Bocciolone M
    103. Miozzo M
    104. Montano N
    105. Braun N
    106. Sacchi N
    107. Martínez N
    108. Özer O
    109. Palmieri O
    110. Faverio P
    111. Preatoni P
    112. Bonfanti P
    113. Omodei P
    114. Tentorio P
    115. Castro P
    116. Rodrigues PM
    117. Blandino Ortiz A
    118. de Cid R
    119. Ferrer R
    120. Gualtierotti R
    121. Nieto R
    122. Goerg S
    123. Badalamenti S
    124. Marsal S
    125. Matullo G
    126. Pelusi S
    127. Juzenas S
    128. Aliberti S
    129. Monzani V
    130. Moreno V
    131. Wesse T
    132. Lenz TL
    133. Pumarola T
    134. Rimoldi V
    135. Bosari S
    136. Albrecht W
    137. Peter W
    138. Romero-Gómez M
    139. D’Amato M
    140. Duga S
    141. Banales JM
    142. Hov JR
    143. Folseraas T
    144. Valenti L
    145. Franke A
    146. Karlsen TH
    147. Severe Covid-19 GWAS Group
    (2020) Genomewide association study of severe covid-19 with respiratory failure
    The New England Journal of Medicine 383:1522–1534.
    https://doi.org/10.1056/NEJMoa2020283
    1. Kurian AK
    2. Cardarelli KM
    (2007)
    Racial and ethnic differences in cardiovascular disease risk factors: A systematic review
    Ethnicity & Disease 17:143–152.
  2. Software
    1. Pedersen MK
    2. Eriksson R
    3. Pedersen HK
    4. Collin C
    5. Reguant R
    6. Simon C
    7. Sørup FKH
    8. Damgaard KA
    9. Birch AM
    10. Larsen M
    11. Nielsen AP
    12. Kirstine Belling SB
    (2023)
    A unidirectional mapping of ICD-8 to ICD-10 codes, for harmonized longitudinal analysis of diseases
    Under Rev.

Decision letter

  1. Philip Boonstra
    Reviewing Editor; University of Michigan, United States
  2. Eduardo L Franco
    Senior Editor; McGill University, Canada
  3. Philip Boonstra
    Reviewer; University of Michigan, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Associations of ABO and Rhesus D Blood Groups with Phenome-Wide Disease Incidence: A 41-year Retrospective Cohort Study of 482,914 Patients" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Philip Boonstra as Reviewing Editor and Reviewer #3, and the evaluation has been overseen by a Senior Editor.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Reviewers 1 and 3 both point out an issue regarding the A and O subgroups dominating the analyses due to their sample size. This seems to be a very important caveat to include in the comparison of the number of statistically significant findings per blood group.

2) Reviewer 1 comments on the lack of adjustment for patient ethnicity as a confounder (or a surrogate for other confounders). Please engage with this comment, which may involve explaining why this is unlikely, or which may involve actually trying to incorporate patient ethnicity into your models.

3) All of the reviewers raise many other good points in their Comments to the Authors, which I encourage you to read and engage with, potentially adjusting your analyses if you believe appropriate.

Reviewer #1 (Recommendations for the authors):

This study aims to address the important question of how different blood types are related to disease risk and age at diagnosis.

However, a major concern is that as per the comments in the public review, the lack of adjustment for confounding due to ethnicity represents a highly substantial limitation of this work. While it is briefly mentioned in the manuscript, this is a very major limitation that leads to very limited interpretability of the results. Incorporating ethnicity as a covariate into the analyses would be crucial.

Reviewer #2 (Recommendations for the authors):

Abstract/Intro

– "we determined the uniqueness" is a bit vague, could you more explicitly say you perform tests with A, AB, B, and O blood groups each as reference group as opposed to only O as the reference group?

– "diagnosis-wide" or "disease-wide" was used but perhaps more accurate to say "phenome wide" as in the title? Both disease-wide and diagnosis-wide are also used in the introduction before phecodes are introduced, and consistency might be better here.

– Age of disease onset specifically, not just disease onset, and perhaps age at first diagnosis is the most accurate (as is used in the introduction).

Methodological considerations

– ICD9 wasn't used? Most of the phecode mapping was done in ICD9/ICD10. Am I understanding ref 16 is in preparation? It would be important to describe how this is done and how it may bias your phecodes, particularly if ref 16 is not pre-printed yet. A good sanity check is how the prevalence/incidence of a handful of traits with this mapping compares to any other population-wide prevalence/incidence measures.

– It would be great to see a sensitivity analysis using either a mixed model to adjust for cryptic relatedness, close family structure, and population structure (presuming like other countries, the DNPR has many relatives). This may not statistically work with the quasi-Poisson model, so perhaps just restricting it to <3 degree individuals and comparing findings (of course this will decrease power, but nice to see if some of the main findings remain).

– I see it in the limitations, but please comment earlier in the manuscript on who gets a blood group determination in the hospital (e.g. people who may need a transplant sooner). Is it possible to characterize the disease prevalence in the subsection of the DNPR with a blood type and without so you can identify any major disease group biases?

– Was a power calculation used to determine the need for 100 cases?

– How is emigration recorded? National registers?

– The interquartile range would be a better descriptor of follow-up time rather than just the maximum (41 years) in the methods and in the limitations section.

– Why use a log-linear quasi-Poisson regression to estimate incidence rate ratios as opposed to logistic regression and odds ratios? It could be a valuable addition to the paper to provide odds ratios as well.

– It's good to adjust for ABO and RhD when testing RhD and ABO respectively, but is it possible to use interaction terms and consider these as well?

– It's great to use birth year to adjust for any "cohort effects" in society over time. Is attained age the age at end of the study period? Wouldn't birth year and attained age be too highly correlated to use both in the model? What is the rationale for using cubic splines for these variables rather than the numerical variables themselves? I see from the code that 20-year increments were used for the knots, any rationale for this methodology would be helpful.

– By excluding patients assigned a phecode at the start of the DNPR, would these be people who had the diagnosis previously and were recorded upon the inauguration of the register? Is there any kind of washout period you are using to define "start" of the DNPR?

– What is the ancestral breakdown of the cohort? Mostly European ancestries? While our current labels for genetic ancestry are quite rough, I think this is an important piece of information given the different distributions of blood groups across global populations.

– Excellent availability of code and summary data for tables/graphs.

Results

– The audience may be less interested in the number of significant phecodes and more in patterns. It could be good to comment on the shared phecodes between the 50, 38, 11, 53, and 28 found. What large-scale disease groupings (phewas disease categories) do these tend to fall in? Does one blood group have far more cardiovascular phecodes than another? Are any phecodes significant for more than 3 blood groups? Etc.

– Personally, I prefer p-values in scientific notation rather than <0.001 but I understand Table 2 is a lot of data to present.

– The figures could benefit from larger labels for readability.

– For the Manhattan plots, it would be good to specify -log10 FDR transformed adjusted p-values on the y-axis in addition to the figure legend.

– Were any blood groups associated with an earlier onset of outcomes?

Discussion

– What was used to identify "novel associations"? A systematic literature review? Comparison to Dahlén? I would refrain from using novel unless you define specifically how it was determined to be novel.

– A systematic comparison with what seems to be the closest study, Dahlén et al., would be beneficial as a type of replication.

– I would refrain from using the term linkage in the discussion as that may lead the reader to think of chromosomal linkage, but I think the authors mean a causal association.

– I don't think the findings support the discussion point on the selective pressure.

Reviewer #3 (Recommendations for the authors):

1. Could the authors justify why choosing to fit separate models comparing one blood type against all others, e.g. A vs. all others then AB vs. all others, is the more sensible choice than fitting one model that jointly tests for A vs. AB vs. B vs. O? I understand that there are various interpretative and statistical challenges to both, but fitting separate models is not internally consistent. The 'A vs. all others' model implicitly assumes that there is no difference in incidence in the AB, B, and O groups, but then the next model ('AB vs. all others') makes a different assumption, namely that there is no difference in incidence in the A, B, and O groups.

2. A natural limitation to this analysis is that there are more statistically significant findings in the O and A blood groups because they are the more prevalent groups, and statistical significance is driven by sample size. In this sense, it would be interesting if there were a way to account for the differences in sample size between the blood groups. Is it possible to investigate whether any of the groups have disproportionately more statistically significant findings after accounting for sample size?

3. Page 6, line 124: I think the use of the word 'confounder' here is not quite right in the technical sense, as I do not read this sentence to be claiming that sex is influencing blood type.

4. Regarding the legend for Figure 2:

a. It should have triangles rather than circles. Assuming this plot was made in ggplot2, this can be done using the override.aes argument in the guides function.

b. it would be helpful to show more than 3 values on the legend.

c. It would be helpful to use the same scale across the subfigures. What I mean is, in the bloodgroup AB figure, there is no discernible difference in size between a 1.1 and a 4.0 rate ratio.

d. I realize this is very pedantic but I believe the legend is technically not showing rate ratios but rather max(rate_ratio, 1/rate_ratio).

5. Do the authors have any intuition why Figure 1 is bimodal? My interpretation of this figure is that, among those who were hospitalized in Denmark between 2006 and 2018, the plurality was born either in the immediate post-WW2 era (makes sense to me) or the 80s (doesn't make as much sense to me).

6. Page 7, line 152: reference 20 is not related to FDR. Can the authors provide a reference for their specific approach to controlling FDR?

https://doi.org/10.7554/eLife.83116.sa1

Author response

Essential revisions:

1) Reviewers 1 and 3 both point out an issue regarding the A and O subgroups dominating the analyses due to their sample size. This seems to be a very important caveat to include in the comparison of the number of statistically significant findings per blood group.

This has now been mentioned in the lines where the number of statistically significant findings per blood group is reported (lines 191-194). Further, a supplemental analysis has been included where blood group O is used as the reference (Supplementary files 6–7).

2) Reviewer 1 comments on the lack of adjustment for patient ethnicity as a confounder (or a surrogate for other confounders). Please engage with this comment, which may involve explaining why this is unlikely, or which may involve actually trying to incorporate patient ethnicity into your models.

We unfortunately do not have access to information on ethnicity. We have further elaborated on this limitation in the manuscript and pointed out that the Danish population is quite homogeneous, which makes this less of a concern in our study than in studies of populations of more admixed origin (lines 341-349). For a more detailed discussion, please see our response to the reviewers’ comments.

3) All of the reviewers raise many other good points in their Comments to the Authors, which I encourage you to read and engage with, potentially adjusting your analyses if you believe appropriate.

We have engaged with all the comments and have changed the manuscript and figures accordingly.

Reviewer #1 (Recommendations for the authors):

This study aims to address the important question of how different blood types are related to disease risk and age at diagnosis.

However, a major concern is that as per the comments in the public review, the lack of adjustment for confounding due to ethnicity represents a highly substantial limitation of this work. While it is briefly mentioned in the manuscript, this is a very major limitation that leads to very limited interpretability of the results. Incorporating ethnicity as a covariate into the analyses would be crucial.

We agree that ethnicity may introduce bias in the analysis of some diseases. We have now further elaborated on this limitation in lines 341-349. Unfortunately, we do not have information on ethnicity, and therefore an analysis adjusting for ethnicity is not possible with the available data. While we agree that ethnicity is a concern, it should be noted that ethnicity will only introduce confounding in the analysis of diagnoses that are more common in certain ethnicities and for which, at the same time, the distribution of the ABO/RhD blood groups differs between ethnicities. This may, e.g., be the case for infectious diseases, cardiovascular diseases, and bleeding disorders, as has now been mentioned in the limitations. Further, the Danish population is very homogeneous (approximately 94% have European origin). Therefore, a potential bias from not being able to adjust for ethnicity will be less pronounced in our study. In Supplement Figure 1, we have now provided a graph showing the ancestry of the complete Danish population, the Capital Region of Denmark, and Region Zealand from 2008-2016 (Supplementary file 1). From the graph it can be seen that approximately 90% of the population in the Capital Region and 97% of the Region Zealand population were of European origin. This information has now been added in the method section, lines 99-102.

Reviewer #2 (Recommendations for the authors):

Abstract/Intro

– "we determined the uniqueness" is a bit vague, could you more explicitly say you perform tests with A, AB, B, and O blood groups each as reference group as opposed to only O as the reference group?

The formulation has now been changed to “we determined the incidence rate ratios for each individual ABO blood group relative to all other ABO blood groups…” (lines 37-38).

– "diagnosis-wide" or "disease-wide" was used but perhaps more accurate to say "phenome wide" as in the title? Both disease-wide and diagnosis-wide are also used in the introduction before phecodes are introduced, and consistency might be better here.

We have chosen not to use the term “phenome wide” in the manuscript text because “phenome wide” without mention of disease is commonly understood as a PheWAS analysis, in which genetic information on a specific SNP is used. The title combines “phenome-wide” with “disease incidence” to underline that this is not a classic PheWAS study. As suggested by the reviewer, we now consistently use “disease-wide” in the introduction.

– Age of disease onset specifically, not just disease onset, and perhaps age at first diagnosis is the most accurate (as is used in the introduction).

We agree that age at first diagnosis is the most accurate formulation. This has now been changed throughout the manuscript.

Methodological considerations

– ICD9 wasn't used? Most of the phecode mapping was done in ICD9/ICD10. Am I understanding ref 16 is in preparation? It would be important to describe how this is done and how it may bias your phecodes, particularly if ref 16 is not pre-printed yet. A good sanity check is how the prevalence/incidence of a handful of traits with this mapping compares to any other population-wide prevalence/incidence measures.

ICD9 codes were never used in Denmark. Ref 16 is currently under review. I have received a copy of the manuscript from the authors, which I have now uploaded as a “related manuscript file” for the reviewers to see. Unfortunately, the authors do not wish to make it available as a preprint.

– It would be great to see a sensitivity analysis using either a mixed model to adjust for cryptic relatedness, close family structure, and population structure (presuming like other countries, the DNPR has many relatives). This may not statistically work with the quasi-Poisson model, so perhaps just restricting it to <3 degree individuals and comparing findings (of course this will decrease power, but nice to see if some of the main findings remain).

This is indeed an interesting suggestion. Unfortunately, we do not have access to information on population or family structure. However, we do not believe that family structure would introduce a substantial amount of bias given the large sample size. As mentioned in the limitations, ethnicity may however introduce a bias, which has now been further elaborated in the limitations section.

– I see it in the limitations, but please comment earlier in the manuscript on who gets a blood group determination in the hospital (e.g. people who may need a transplant sooner). Is it possible to characterize the disease prevalence in the subsection of the DNPR with a blood type and without so you can identify any major disease group biases?

We have now commented earlier in the manuscript on the kinds of patients who were included in the study (lines 99-100). Further, we have elaborated in the limitations on what this means for generalizability (lines 336-339). With the available data, it is unfortunately not possible to identify any major disease group differences between the hospitalized patients with a blood group determination and those without.

– Was a power calculation used to determine the need for 100 cases?

A power calculation was not used. As in the study by Dahlén et al., the cut-off for the number of cases was chosen arbitrarily. We have now changed the formulation in lines 111-112 so that it cannot be read as if a power calculation had been done for each of the 5x1300 analyses.

– How is emigration recorded? National registers?

Yes. This information has now been added to the manuscript (line 104).

– The interquartile range would be a better descriptor of follow-up time rather than just the maximum (41 years) in the methods and in the limitations section.

In the methods and limitations sections we referred to the length of the study period, which is the maximum length of follow-up. The median and interquartile range of follow-up time in the cohort are given in Table 1.

– Why use a log-linear quasi-Poisson regression to estimate incidence rate ratios as opposed to logistic regression and odds ratios? It could be a valuable addition to the paper to provide odds ratios as well.

A logistic regression would not take time-to-event into account. Thus, incorporating person-time using a log-linear Poisson regression enriches the analysis as compared with a logistic regression.
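The distinction can be illustrated with a minimal, unadjusted sketch. It is not the quasi-Poisson model used in the study: the function names and all counts below are hypothetical and chosen only to show how two groups with identical case proportions can still differ in incidence rate once follow-up time is taken into account.

```python
def incidence_rate(cases, person_years):
    """Crude incidence rate: events per unit of follow-up time."""
    return cases / person_years


def incidence_rate_ratio(cases_a, py_a, cases_b, py_b):
    """Ratio of two crude incidence rates (group A relative to group B)."""
    return incidence_rate(cases_a, py_a) / incidence_rate(cases_b, py_b)


def odds_ratio(cases_a, n_a, cases_b, n_b):
    """Crude odds ratio; it only sees who became a case, not when."""
    odds_a = cases_a / (n_a - cases_a)
    odds_b = cases_b / (n_b - cases_b)
    return odds_a / odds_b


# Hypothetical numbers: both groups have 1,000 people and 50 cases,
# but group A accrued only half the person-years of group B.
irr = incidence_rate_ratio(50, 5_000.0, 50, 10_000.0)
crude_or = odds_ratio(50, 1_000, 50, 1_000)

print(f"IRR = {irr:.2f}, OR = {crude_or:.2f}")
```

In this toy setting the odds ratio reports no difference, while the rate ratio reflects that the same number of cases arose in half the follow-up time.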

– It's good to adjust for ABO and RhD when testing RhD and ABO respectively, but is it possible to use interaction terms and consider these as well?

Similar to the study by Dahlén et al. we chose not to consider interactions between the ABO and RhD blood groups as there is little evidence that such interaction exists.

– It's great to use birth year to adjust for any "cohort effects" in society over time. Is attained age the age at end of the study period? Wouldn't birth year and attained age be too highly correlated to use both in the model? What is the rationale for using cubic splines for these variables rather than the numerical variables themselves? I see from the code that 20-year increments were used for the knots, any rationale for this methodology would be helpful.

Attained age is not the age at the end of the study period. If it were, attained age and birth year would indeed be highly correlated. Attained age is the underlying person-time. In a classic logistic regression, time is not incorporated. However, we are doing a time-to-event study using a log-linear Poisson regression incorporating person-time. Attained age is thus the age of each individual in each time period of the study, so we are comparing individuals of the same age over time. As explained in the methods section, attained age is divided into 1-year time intervals and used as the underlying time.
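As a rough illustration of what dividing attained age into 1-year intervals means for a single individual, the following sketch splits one follow-up period into age bands. The function name, the record layout, and the example ages are assumptions made for this illustration and are not taken from the analysis code.

```python
import math


def split_by_attained_age(entry_age, exit_age, event):
    """Split one person's follow-up into 1-year attained-age bands.

    Returns a list of (age_band, person_years, event_in_band) tuples.
    The event, if any, is assigned to the band in which follow-up ends.
    """
    if exit_age <= entry_age:
        raise ValueError("exit_age must be greater than entry_age")
    rows = []
    band = math.floor(entry_age)
    start = entry_age
    while start < exit_age:
        stop = min(band + 1, exit_age)   # end of this band or of follow-up
        is_last = stop >= exit_age
        rows.append((band, stop - start, 1 if (event and is_last) else 0))
        start = stop
        band += 1
    return rows


# Hypothetical person: enters at age 40.5, diagnosed at age 43.25.
rows = split_by_attained_age(40.5, 43.25, event=True)
for age_band, person_years, event_in_band in rows:
    print(age_band, person_years, event_in_band)
```

Each resulting row contributes its person-years to the age band it belongs to, which is how individuals of the same attained age end up being compared with one another.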

We did use the numerical variables in the models; however, instead of assuming linear relationships, we used cubic splines to allow flexible modelling that better adjusts for potential non-linear relationships. For a more detailed discussion of the benefits of modelling using restricted cubic splines, the reviewer is referred to: Gauthier, J., Q. V. Wu, and T. A. Gooley. "Cubic splines to model relationships between continuous variables and outcomes: a guide for clinicians." Bone marrow transplantation 55.4 (2020): 675-680.
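For readers unfamiliar with the idea, a restricted cubic spline replaces a single numeric covariate with a small set of basis columns. The sketch below uses the standard truncated-power form; it is an illustration only, and the knot positions (20-year increments, echoing the reviewer's remark) and the function name are assumptions rather than the exact specification of the fitted models.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k strictly increasing knots.
    The nonlinear terms are zero below the first knot and linear in x
    beyond the last knot, which is what "restricted" refers to.
    """
    if len(knots) < 3 or any(a >= b for a, b in zip(knots, knots[1:])):
        raise ValueError("need at least 3 strictly increasing knots")

    def pos3(u):
        # Truncated cubic: (u)_+^3
        return max(u, 0.0) ** 3

    t_last, t_pen = knots[-1], knots[-2]
    span = t_last - t_pen
    basis = [float(x)]
    for t_j in knots[:-2]:
        term = (pos3(x - t_j)
                - pos3(x - t_pen) * (t_last - t_j) / span
                + pos3(x - t_last) * (t_pen - t_j) / span)
        basis.append(term)
    return basis


# Hypothetical knots at 20-year increments of attained age.
example_knots = [20, 40, 60, 80]
print(rcs_basis(50, example_knots))
```

A regression model would then receive these columns in place of the raw variable, so the fitted relationship can bend between knots while staying linear in the tails.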

– By excluding patients assigned a phecode at the start of the DNPR, would these be people who had the diagnosis previously and were recorded upon the inauguration of the register? Is there any kind of washout period you are using to define "start" of the DNPR?

These are likely to be people who had the diagnosis previously, which was then registered at the time the registry was created. A few are also likely to be individuals who got the diagnosis at the very start of the registry. Because this is unknown, we chose to exclude these patients from the analysis, as this information was regarded as too noisy. There is no wash-out period. The “start” of the DNPR is defined as the official start of the registry. Few patients were excluded by this procedure, and thus the effect on the analysis is very limited.

– What is the ancestral breakdown of the cohort? Mostly European ancestries? While our current labels for genetic ancestry are quite rough, I think this is an important piece of information given the different distributions of blood groups across global populations.

The Danish population is very homogeneous and mostly of European ancestry. We have now added a table in Supplementary file 1 showing the ancestral distribution in Denmark in the inclusion period. We do unfortunately not have any information available on ancestry or ethnicity of the study sample. This fact has now been further elaborated in the limitations section lines 341-348.

– Excellent availability of code and summary data for tables/graphs.

Thanks. We have tried to make the analysis code easily accessible to allow for replication in other cohorts.

Results

– The audience may be less interested in the number of significant phecodes and more in patterns. It could be good to comment on the shared phecodes between the 50, 38, 11, 53, and 28 found. What large-scale disease groupings (phewas disease categories) do these tend to fall in? Does one blood group have far more cardiovascular phecodes than another? Are any phecodes significant for more than 3 blood groups? Etc.

We have now commented on the phewas disease groupings of the blood groups (lines 198-201).

– Personally, I prefer p-values in scientific notation rather than <0.001 but I understand Table 2 is a lot of data to present.

This was also something we considered. However, we decided that we would rather direct the reader’s attention to the confidence intervals than the p-value. Therefore, we used the <0.001 notation.

– The figures could benefit from larger labels for readability.

Larger labels have now been added to the figures.

– For the Manhattan plots, it would be good to specify -log10 FDR transformed adjusted p-values on the y-axis in addition to the figure legend.

This has now been specified on the y-axis.

– Were any blood groups associated with an earlier onset of outcomes?

In Table 3 we show the findings from the analysis of the association of the ABO and RhD blood groups with age at the first diagnosis.

Discussion

– What was used to identify "novel associations"? A systematic literature review? Comparison to Dahlén? I would refrain from using novel unless you define specifically how it was determined to be novel.

We did a literature search ourselves but agree that without a reference to a published systematic literature review the term novel should not be used. We have now removed the term “novel” from lines 266 and 268.

– A systematic comparison with what seems to be the closest study, Dahlén et al., would be beneficial as a type of replication.

We agree with the reviewer that such a comparison would be beneficial. However, a systematic comparison would be very comprehensive, and we believe it would be more appropriate as a manuscript of its own. In the discussion we have instead chosen to discuss a few selected findings, some of which were also found by Dahlén et al., as indicated by a reference to that study. For example, we highlight associations with pancreatic cancer, gastroduodenal ulcers, type 2 diabetes, and cardiovascular diseases, which were also observed by Dahlén et al. Lastly, it should be noted that the two studies cannot be compared directly, as Dahlén et al. used blood group O as the reference and we used all other blood groups as the reference.

– I would refrain from using the term linkage in the discussion as that may lead the reader to think of chromosomal linkage, but I think the authors mean a causal association.

The term linkage has now been changed to relationship (line 273).

– I don't think the findings support the discussion point on the selective pressure.

We agree, this point has now been removed from the discussion and conclusion.

Reviewer #3 (Recommendations for the authors):

1. Could the authors justify why choosing to fit separate models comparing one blood type against all others, e.g. A vs. all others then AB vs. all others, is the more sensible choice than fitting one model that jointly tests for A vs. AB vs. B vs. O? I understand that there are various interpretative and statistical challenges to both, but fitting separate models is not internally consistent. The 'A vs. all others' model implicitly assumes that there is no difference in incidence in the AB, B, and O groups, but then the next model ('AB vs. all others') makes a different assumption, namely that there is no difference in incidence in the A, B, and O groups.

The aim of the study was to determine how each individual ABO blood group is distinct from all other ABO blood groups in terms of disease susceptibility. This question can only be answered by fitting separate models that define the patients as either having the blood group under study or not. An alternative, which would answer a different research question, would be to do pairwise comparisons between the ABO blood groups, which would require 8x1,312 analyses (see response 1 to reviewer #1's comment). We do not believe that one model which jointly tests for all pairwise comparisons can be fitted, since the model would need to define a reference, which in previous studies has been blood group O. The research question asked in previous studies is thus how blood groups A, B, and AB each differ from blood group O in terms of disease susceptibility. We believe it to be more informative to ask how each blood group differs from all the other blood groups. However, we agree with the reviewer that this has its limitations because of the difference in the frequency of the individual ABO blood groups. We have now added a Supplementary Analysis where blood group O has been used as the reference (lines 145-147). The results of the supplemental analysis are presented in Supplementary files 6–7.

2. A natural limitation to this analysis is that there are more statistically significant findings in the O and A blood groups because they are the more prevalent groups, and statistical significance is driven by sample size. In this sense, it would be interesting if there were a way to account for the differences in sample size between the blood groups. Is it possible to investigate whether any of the groups have disproportionately more statistically significant findings after accounting for sample size?

We agree that this is a natural limitation of any study concerning blood groups, and we have now commented on this in lines 192-195. The only way to better find associations for the less prevalent blood groups would be to include more patients, potentially by conducting a similar study combining cohorts from several countries to increase the power. We do not know of a way to determine whether any of the blood groups have disproportionately more statistically significant findings while accounting for sample size.

3. Page 6, line 124: I think the use of the word 'confounder' here is not quite right in the technical sense, as I do not read this sentence to be claiming that sex is influencing blood type.

We agree with the reviewer that this was an incorrect formulation. We have now changed the statement.

4. Regarding the legend for Figure 2:

a. It should have triangles rather than circles. Assuming this plot was made in ggplot2, this can be done using the override.aes argument in the guides function.

We thank the reviewer for this tip. The circles have now been replaced by triangles.

b. it would be helpful to show more than 3 values on the legend.

We have now added two more legend entries so that five values are shown. However, for comparison of the IRRs, readers should instead use Table 2.

c. It would be helpful to use the same scale across the subfigures. What I mean is, in the bloodgroup AB figure, there is no discernible difference in size between a 1.1 and a 4.0 rate ratio.

We have now applied the same legends in all plots.

d. I realize this is very pedantic but I believe the legend is technically not showing rate ratios but rather max(rate_ratio, 1/rate_ratio).

This is correct. It is showing the rate ratio if rate ratio >= 1 and 1/rate ratio if the rate ratio <1. This makes it possible to compare the sizes of the rate ratios for both positive and negative associations. The direction of the triangles shows if it is a positive or inverse association as mentioned in the figure text.
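The transformation described here can be written out as a small sketch. The function names are hypothetical and the marker directions are only an illustration of the idea that size encodes magnitude while orientation encodes direction.

```python
def symmetric_ratio(rate_ratio):
    """Return max(rate_ratio, 1 / rate_ratio).

    Puts positive and inverse associations on a common scale,
    so a ratio of 0.5 is shown with the same size as a ratio of 2.0.
    """
    if rate_ratio <= 0:
        raise ValueError("rate ratio must be positive")
    return max(rate_ratio, 1.0 / rate_ratio)


def triangle_direction(rate_ratio):
    """Direction of the marker: 'up' for positive, 'down' for inverse."""
    return "up" if rate_ratio >= 1 else "down"


for rr in (2.0, 0.5, 1.1):
    print(rr, symmetric_ratio(rr), triangle_direction(rr))
```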

5. Do the authors have any intuition why Figure 1 is bimodal? My interpretation of this figure is that, among those who were hospitalized in Denmark between 2006 and 2018, the plurality was born either in the immediate post-WW2 era (makes sense to me) or the 80s (doesn't make as much sense to me).

We agree with the reviewer's first point. The distribution is likely bimodal because the older population will be hospitalized in the inclusion period, corresponding to the first peak. The second peak is likely caused by pregnant women, who would be hospitalized when giving birth and who commonly have their blood type determined in the hospital.

6. Page 7, line 152: reference 20 is not related to FDR. Can the authors provide a reference for their specific approach to controlling FDR?

Thanks for spotting this error. We have now also added a reference to the specific approach used to control for FDR.
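This response does not name the specific FDR procedure. Purely as an illustration of how FDR-adjusted p-values of the kind plotted on a -log10 scale can be computed, the sketch below implements the widely used Benjamini-Hochberg step-up adjustment. That choice is an assumption made for this example, and the p-values are invented.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: BH is used here only as a common example of FDR control.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(p):
    """Transformation typically used on a Manhattan plot y-axis."""
    return -math.log10(p)


# Invented p-values for four hypothetical phecodes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(a, 4) for a in adj])
print([round(neg_log10(a), 2) for a in adj])
```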

https://doi.org/10.7554/eLife.83116.sa2

Article and author information

Author details

  1. Peter Bruun-Rasmussen

    1. Department of Clinical Immunology, Copenhagen University Hospital, Copenhagen, Denmark
    2. Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3595-1311
  2. Morten Hanefeld Dziegiel

    Department of Clinical Immunology, Copenhagen University Hospital, Copenhagen, Denmark
    Contribution
    Conceptualization, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8034-1523
  3. Karina Banasik

    Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Conceptualization, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Pär Ingemar Johansson

    Department of Clinical Immunology, Copenhagen University Hospital, Copenhagen, Denmark
    Contribution
    Conceptualization, Supervision, Writing – review and editing
    Competing interests
    PI Johansson has received grants from the AP Møller Foundation, Innovation Fund Denmark and Novo Nordisk Foundation. The author has been issued the following patents: Publication no: 20110201553, 20110268732, 20130040898, 20130261177, 20150057325, 20160113891, 9381166, 9381243, 20160250164, 9433589, 20160303040 and US20090053193A1. PI Johansson reports ownership of stocks in Trial-Lab AB, Endothel Pharma ApS, TissueLink ApS, and MoxieLab ApS. PI Johansson declares that the financial interests listed have no impact on the submitted work. The author has no other competing interests to declare.
  5. Søren Brunak

    Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
    Contribution
    Conceptualization, Resources, Supervision, Writing – review and editing
    For correspondence
    soren.brunak@cpr.ku.dk
    Competing interests
    participates on the Danish National Genome Center advisory board and is the Chairman for the data infrastructure board. The author has stock in Intomics A/S, Hoba Therapeutics Aps, Novo Nordisk A/S, Lundbeck A/S and ALK Abello. The author participates on the board of directors for both Proscion A/S and Intomics A/S. The author has no other competing interests to declare. SB declares that the financial interests listed have no impact on the submitted work
    ORCID: 0000-0003-0316-5866

Funding

Novo Nordisk Fonden (NNF14CC0001)

  • Søren Brunak

Novo Nordisk Fonden (NNF17OC0027594)

  • Søren Brunak

Innovation Fund Denmark (5153-00002B)

  • Søren Brunak

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was performed as a part of the CAG (Clinical Academic Group) Center for Endotheliomics under the Greater Copenhagen Health Science Partners (GCHSP). The study was supported by the Novo Nordisk Foundation (grants NNF14CC0001 and NNF17OC0027594) and the Innovation Fund Denmark (grant 5153-00002B). The funders played no role in the conduct of the study.

Ethics

Human subjects: This is a register-based study and informed consent for such studies is waived by the Danish Data Protection Agency. Data access was approved by the Danish Patient Safety Authority (3-3013-1731), the Danish Data Protection Agency (DT SUND 2016-50 and 2017-57) and the Danish Health Data Authority (FSEID 00003092 and FSEID 00003724).

Senior Editor

  1. Eduardo L Franco, McGill University, Canada

Reviewing Editor

  1. Philip Boonstra, University of Michigan, United States

Reviewer

  1. Philip Boonstra, University of Michigan, United States

Publication history

  1. Received: August 31, 2022
  2. Preprint posted: September 26, 2022
  3. Accepted: March 8, 2023
  4. Accepted Manuscript published: March 9, 2023 (version 1)
  5. Version of Record published: March 27, 2023 (version 2)

Copyright

© 2023, Bruun-Rasmussen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Peter Bruun-Rasmussen
  2. Morten Hanefeld Dziegiel
  3. Karina Banasik
  4. Pär Ingemar Johansson
  5. Søren Brunak
(2023)
Associations of ABO and Rhesus D blood groups with phenome-wide disease incidence: A 41-year retrospective cohort study of 482,914 patients
eLife 12:e83116.
https://doi.org/10.7554/eLife.83116
