Associations of ABO and Rhesus D blood groups with phenome-wide disease incidence: A 41-year retrospective cohort study of 482,914 patients
Abstract
Background:
Whether natural selection may have contributed to the observed blood group frequency differences between populations remains debatable. The ABO system has been associated with several diseases and recently also with susceptibility to COVID-19 infection. Associative studies of the RhD system and diseases are sparser. A large disease-wide risk analysis may further elucidate the relationship between the ABO/RhD blood groups and disease incidence.
Methods:
We performed a systematic log-linear quasi-Poisson regression analysis of the ABO/RhD blood groups across 1,312 phecode diagnoses. Unlike prior studies, we determined the incidence rate ratio for each individual ABO blood group relative to all other ABO blood groups as opposed to using blood group O as the reference. Moreover, we used up to 41 years of nationwide Danish follow-up data, and a disease categorization scheme specifically developed for diagnosis-wide analysis. Further, we determined associations between the ABO/RhD blood groups and the age at the first diagnosis. Estimates were adjusted for multiple testing.
Results:
The retrospective cohort included 482,914 Danish patients (60.4% females). The incidence rate ratios (IRRs) of 101 phecodes were found statistically significant between the ABO blood groups, while the IRRs of 28 phecodes were found statistically significant for the RhD blood group. The associations included cancers and musculoskeletal-, genitourinary-, endocrinal-, infectious-, cardiovascular-, and gastrointestinal diseases.
Conclusions:
We found associations of disease-wide susceptibility differences between the blood groups of the ABO and RhD systems, including cancer of the tongue, monocytic leukemia, cervical cancer, osteoarthrosis, asthma, and HIV- and hepatitis B infection. We found marginal evidence of associations between the blood groups and the age at first diagnosis.
Funding:
Novo Nordisk Foundation and the Innovation Fund Denmark
Editor's evaluation
This important retrospective analysis of nearly 500,000 hospitalized Danish patients sheds light on the possible relationships between blood type and susceptibility to a host of diseases. The Danish National Patient Register is a compelling data source, and the statistical methodology is solid. The findings reported herein provide evidence, supporting information, and potential hypotheses for researchers interested in the causes and etiology of diseases as they relate to blood type.
https://doi.org/10.7554/eLife.83116.sa0
Introduction
More than 100 years after the discovery of the ABO and Rhesus D (RhD) blood group systems, the selective forces that may have contributed to the observed blood group population differences remain elusive (Anstee, 2010). The pathophysiological mechanisms behind the observed relationship between blood groups and diseases are not well understood either. The ABO system has been associated with susceptibility to multiple diseases, including gastrointestinal- and cardiovascular diseases and pancreatic-, gastric-, and ovarian cancers (Vasan et al., 2016; Liumbruno and Franchini, 2014; Wolpin et al., 2010; Groot et al., 2020; Edgren et al., 2010; Dahlén et al., 2021; Li and Schooling, 2020). The ABO system has also been associated with the susceptibility, progression, and severity of COVID-19 (Ellinghaus et al., 2020). In contrast, apart from hemolytic disease of the newborn, reported associations between the RhD blood group and disease development are sparser (Anstee, 2010).
Specifically, higher levels of factor VIII (FVIII) and von Willebrand factor (vWF) observed in individuals with a non-O blood group have been suggested to affect the development of cardiovascular disease (Jenkins and O’Donnell, 2006; Franchini and Lippi, 2015). Additionally, blood group-related antigens have been suggested to be involved in the adhesion of trophoblast, inflammatory cells, and metastatic tumor cells to the endothelial cells of the vasculature (Ravn and Dabelsteen, 2000). The endothelial cells of the vasculature have also been suggested to contribute to the initiation and propagation of severe clinical manifestations of COVID-19 (Teuwen et al., 2020).
Recently, an associative disease-wide risk analysis of the ABO and RhD blood groups was conducted in a large Swedish cohort (Dahlén et al., 2021). The study generated further support for previous findings and suggested new associations. Here, we further uncover the relationship between the ABO and RhD blood groups and disease susceptibility using a Danish cohort of 482,914 patients. In contrast to previous studies, we use up to 41 years of follow-up data, and a disease categorization scheme specifically developed for disease-wide analysis called phecodes (Wu et al., 2019). Further, we determine the uniqueness of each individual ABO blood group as opposed to using blood group O as the reference.
We estimate incidence rate ratios of 1312 phecodes (diagnoses) for the ABO and RhD blood groups. Further, we determine associations between the ABO/RhD blood groups and the age at the first diagnosis to better disclose the temporal life course element of disease development.
Methods
Study design
This retrospective cohort study was based on the integration of the Danish National Patient Registry (DNPR) and data on ABO/RhD blood groups of hospitalized patients. We included Danish patients who had their ABO/RhD blood group determined in the Capital Region or Region Zealand (covering ~45% of the Danish population; Nordjylland, 2022) between January 1, 2006, and April 10, 2018. A blood type determination is commonly done for patients who may require a blood transfusion during hospitalization, for example, anemic patients and women in labor. In the inclusion period, approximately 90% of the population in the Capital Region and 97% of the Region Zealand population were of European ancestry (Supplementary file 1). The DNPR provided the International Classification of Diseases 8th and 10th revision (ICD-8 and ICD-10) diagnosis codes, dates of diagnosis, date of birth, date of potential emigration, and sex of patients, with records dating back to 1977.
Similar to a case-control study, the patients were included retrospectively. Here, selection into the study was based on an in-hospital ABO/RhD blood group determination. That is, the person-time and the entire disease history back to 1977 of patients hospitalized between 2006 and 2018 with known ABO/RhD blood groups were included retrospectively.
We defined diseased and non-diseased individuals using the phecode mapping from ICD-10 diagnosis codes (Wu et al., 2019). Before categorizing the assigned ICD diagnosis codes into phecodes, the ICD-8 codes were converted to ICD-10 codes (Pedersen et al., 2023). Further, referral diagnoses were excluded. Pregnancy- and perinatal diagnoses (ICD-10 chapters 15–16) assigned before or after age 10 were excluded or, when possible, correctly reassigned to the mother or newborn, respectively. The disease categories of injuries, poisonings, and symptoms were deemed unlikely to be associated with the blood groups and excluded from the analyses (phecode categories: ‘injuries and poisonings’, ‘symptoms’ and phecodes above 999). Only phecodes with at least 100 cases in the study sample were included. The patients were followed from the entry in the DNPR to the date of death, emigration, the first event of the studied phecode, or the end of the study period (April 10, 2018), whichever came first. Thus, follow-up was up to 41 years. The patients were allowed to contribute events and time at risk to multiple phecode analyses.
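The follow-up rule described above (entry in the DNPR until death, emigration, first event of the studied phecode, or end of study, whichever came first) can be illustrated with a minimal sketch. The function name `follow_up`, its argument names, and the use of 365.25 days per year are illustrative assumptions and are not taken from the study's analysis code.

```python
from datetime import date

# End of the study period as stated in the text.
STUDY_END = date(2018, 4, 10)


def follow_up(entry, death=None, emigration=None, first_event=None,
              study_end=STUDY_END):
    """Return (exit_date, had_event, person_years) for one patient and one phecode.

    Hypothetical helper: follow-up ends at the earliest of death, emigration,
    first event of the studied phecode, or end of study.
    """
    candidates = [d for d in (death, emigration, first_event, study_end)
                  if d is not None]
    exit_date = min(candidates)
    # The patient counts as a case only if the event is what ended follow-up.
    had_event = first_event is not None and first_event == exit_date
    # Assumption: person-years approximated as days / 365.25.
    person_years = (exit_date - entry).days / 365.25
    return exit_date, had_event, person_years


# A patient followed from 1977 without death, emigration, or event.
print(follow_up(date(1977, 1, 1)))
```

Because each phecode is analysed separately, the same patient would be passed through such a rule once per phecode, with a different `first_event` each time.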
Diagnosis-wide incidence rate ratios
We used a log-linear quasi-Poisson regression model to estimate incidence rate ratios (IRRs) of each phecode among individuals with blood groups A, B, AB, and O relative to the other blood groups, respectively (e.g. A vs. B, AB, and O) (Dewey et al., 1995; Ver Hoef and Boveng, 2007). Further, we compared individuals with positive RhD type relative to negative RhD type. The analyses of diseases developed by both males and females were adjusted for sex, while analyses of sex-restricted diseases (e.g. cervical cancer) only included a subgroup of individuals of the restricted sex. Sex-restricted diseases were pre-defined by the phecode terminology. Sex was adjusted for as prior studies have found sex differences in the incidence rates of multiple diseases (Westergaard et al., 2019). Further, we adjusted for the year of birth and attained age, both modeled using restricted cubic splines with five knots. Attained age was split into 1-year intervals and treated as a time-dependent covariate, thus allowing individuals to move between categories with time. Age was thereby used as the underlying time scale. Further, an interaction between attained age and sex was modeled for non-sex-restricted analyses. Patients were excluded from the analysis if they were assigned the phecode under study at the start of the DNPR. For analysis of congenital phecodes (e.g. sickle cell disease), prevalence ratios were estimated instead of IRRs by using the cohort size as the offset (see Supplementary file 2 for a list of the congenital phecodes). The analyses of ABO blood groups were adjusted for RhD type, and the RhD-analyses were adjusted for the ABO blood group. Adjustment for the birth year was done to control for societal changes and was used instead of the calendar period of diagnosis. The robust quasi-Poisson variance formula was used to control for over-dispersion (Ver Hoef and Boveng, 2007).
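To make the "one blood group versus all others" contrast concrete, the following sketch computes a crude incidence rate ratio with a Wald-type 95% confidence interval on the log scale. It is a deliberate simplification and not the study's model: it omits the adjustments for sex, birth year, attained age, and RhD type, and it does not correct for over-dispersion. The function name `crude_irr` and the event counts and person-years are made up for illustration.

```python
import math


def crude_irr(events, person_years, group):
    """Crude IRR of `group` versus all other blood groups pooled.

    events, person_years: dicts keyed by blood group (hypothetical inputs).
    Returns (irr, lower, upper) with a Wald 95% interval on the log scale.
    """
    e1, t1 = events[group], person_years[group]
    # Pool every other blood group into a single reference category.
    e0 = sum(v for k, v in events.items() if k != group)
    t0 = sum(v for k, v in person_years.items() if k != group)
    irr = (e1 / t1) / (e0 / t0)
    # Standard error of log(IRR) under a plain Poisson assumption.
    se = math.sqrt(1 / e1 + 1 / e0)
    lower = math.exp(math.log(irr) - 1.96 * se)
    upper = math.exp(math.log(irr) + 1.96 * se)
    return irr, lower, upper


# Made-up counts for a single phecode, for illustration only.
events = {"A": 120, "B": 30, "AB": 10, "O": 100}
person_years = {"A": 40000, "B": 10000, "AB": 5000, "O": 45000}
for g in events:
    print(g, crude_irr(events, person_years, g))
```

With this contrast, every blood group gets its own estimate against the pooled remainder, whereas a reference-group design would yield no estimate for blood group O itself.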
We conducted a supplemental analysis using the same methodology but where blood group O was instead used as the reference to enable direct comparison and meta-analysis with previous studies.
Age of first hospital diagnosis
We estimated differences in age at first phecode diagnosis for individuals with blood group A, B, AB, and O relative to any other blood group, respectively. Similar analyses were done for RhD-positive individuals relative to RhD-negative individuals. We used a linear regression model adjusted for sex and birth year (as a restricted cubic spline with five knots). Analysis of sex-restricted phecodes was not adjusted for sex. Individuals who were assigned the studied phecode at the start date of the DNPR were excluded as the age of diagnosis was uncertain. Further, congenital- and pregnancy-related phecodes were not included.
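As a rough illustration of the age-at-first-diagnosis contrast, the sketch below fits an ordinary least-squares slope of age on a 0/1 blood-group indicator, which equals the difference in mean age between the group and everyone else. This is an unadjusted simplification: the study's model additionally adjusts for sex and for birth year as a restricted cubic spline. The function name `age_difference` and the data are hypothetical.

```python
from statistics import mean


def age_difference(ages, in_group):
    """OLS slope of age at first diagnosis on a 0/1 blood-group indicator.

    With a single binary regressor, the slope equals the difference in mean
    age between individuals in the group (1) and all others (0).
    """
    x_bar = mean(in_group)
    y_bar = mean(ages)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(in_group, ages))
    sxx = sum((x - x_bar) ** 2 for x in in_group)
    return sxy / sxx


# Made-up ages at first diagnosis; 1 marks the blood group of interest.
ages = [52, 60, 40, 50]
in_group = [1, 1, 0, 0]
print(age_difference(ages, in_group))
```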
Statistical analyses were performed in R (version 3.6.2) using the survival and rms packages. p-values were two-sided. p-values and confidence intervals were adjusted for multiple testing by the false discovery rate (FDR) approach, accounting for the number of performed tests (5 blood groups times 1312 phecodes; Benjamini and Hochberg, 1995; Altman and Bland, 2011). FDR-adjusted p-values <0.05 were deemed statistically significant. The analysis pipeline was made in Python (anaconda3/5.3.0) using snakemake for reproducibility (Köster and Rahmann, 2012). The analysis code is available through https://www.github.com/peterbruun/blood_type_study (copy archived at Bruun-Rasmussen, 2023). The manuscript complies with the STROBE reporting guidelines.
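The Benjamini-Hochberg step-up adjustment referred to above can be sketched in a few lines. This is a generic illustration of the procedure for p-values only, not the study's code, and it does not cover the accompanying adjustment of confidence intervals. The function name `bh_adjust` is an assumption; in the study the number of tests would be 5 × 1312 = 6560.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for illustration.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```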
Results
In total, 482,914 patients (60.4% females) were included and 1312 phecodes (diagnosis codes) were examined (Figure 1 and Table 1). The median follow-up time for all phecode analyses was 17,555,322 person-years (Q1-Q3: 17,324,597–17,615,142). The cohort held a wide age distribution of patients born from 1901 to 2015 (Table 1 and Supplementary file 3). The ABO/RhD blood group distribution of the patients was similar to that of a previously summarized reference population of 2.2 million Danes (Table 1; Barnkob et al., 2020; Banks, 2022).
Incidence rate ratios
After adjustment for multiple testing, we found the incidence rate ratios (IRRs) of 101 and 28 phecodes (116 unique) to be statistically significant for the ABO and RhD blood groups, respectively. The statistically significant IRRs are given with 95% confidence intervals in Table 2. The estimates of all examined phecodes are given in Supplementary file 4. Further, Manhattan plots of the p-values and disease categories are presented in Figures 2–6.
The number of statistically significant IRRs for A, B, AB, O, and RhD were 50, 38, 11, 53, and 28, respectively. However, a between-blood-group comparison of the number of statistically significant IRRs is problematic because the analyses of blood groups A and O had the highest power, given that these blood groups were most frequent in the study sample (Table 1). For 13 phecodes, an association was found for both the ABO blood group and the RhD blood group. The ABO blood groups were found positively associated with 75 phecodes and inversely associated with 67 phecodes. The RhD-positive blood group was found to have 16 positive and 12 inverse associations. Blood groups A and O were associated with diseases of the circulatory and digestive system. Blood group B was associated with several infectious, metabolic, and musculoskeletal diseases. The associations of the RhD blood group included cancers, infectious diseases, and pregnancy complications. The results of the supplementary analyses where blood group O was used as the reference are shown in Supplementary files 6 and 7.
Age at first diagnosis
We found the B blood group to be associated with a later diagnosis of viral infection. Further, blood group O was associated with a later diagnosis of phlebitis and thrombophlebitis (Table 3 and Supplementary file 5). The RhD-positive group was associated with a later diagnosis of acute and chronic tonsillitis.
Discussion
We found the ABO/RhD blood groups to be associated with a wide spectrum of diseases including cancers and musculoskeletal-, genitourinary-, endocrinal-, infectious-, cardiovascular-, and gastrointestinal diseases. Associations of the ABO blood groups included monocytic leukemia, tonsillitis, renal dialysis, diseases of the female reproductive system, and osteoarthrosis. Associations of the RhD blood group included cancer of the tongue, malignant neoplasm (other), tuberculosis-, HIV-, and hepatitis B infection, type 2 diabetes, hereditary hemolytic anemias, major puerperal infection, anxiety disorders, and contracture of tendon.
The blood groups may reflect their corresponding genetic markers; thus, our findings may indicate an association between disease and the ABO locus on chromosome 9 and the RH locus on chromosome 1, respectively. Alternatively, the associations may indicate that the blood groups are involved in disease mechanisms at the molecular level mediated either through the blood group antigens or by the blood group reactive antibodies. However, our findings have a compromised causal interpretation given the retrospective inclusion of individuals (and person-time) after an in-hospital blood group test.
Our results support several previously observed associations including positive associations between the non-O blood groups and prothrombotic diseases of the circulatory system (phecodes: 411.1–459.9), associations with gastroduodenal ulcers, associations of blood group O and lower risk of type 2 diabetes, and positive association between blood group B and tuberculosis (Vasan et al., 2016; Edgren et al., 2010; Dahlén et al., 2021; Fagherazzi et al., 2015; Rao, 2012). Further, our results support findings associating non-O blood groups with increased risk of pancreatic cancer (Liumbruno and Franchini, 2014). The role of the ABO blood group in HIV susceptibility remains controversial; we only observed a positive association for the RhD-positive blood group (Davison et al., 2020).
We found blood group B to be positively associated with ‘ectopic pregnancy’, ‘excessive vomiting in pregnancy’, and ‘abnormality of organs and soft tissues of pelvis complicating pregnancy’, indicating that blood group B mothers may be more likely to experience pregnancy complications. Further, we found positive associations of blood group A with ‘mucous polyp of cervix’, and of blood group AB with ‘cervicitis and endocervicitis’. Taken together, these findings may indicate that the ABO blood groups are associated with diseases of the female reproductive system. However, the study design does not allow for any causal interpretation.
Only a few statistically significant associations were found in the analyses of the age at first diagnosis, indicating that the blood groups' involvement in disease onset may be marginal. However, we assumed a linear relationship with age because assessing potential non-linear relationships for each disease would be infeasible given the large number of tests performed. The linearity assumption may not hold for all analyses, which limits the interpretation of the estimates.
A strength of our approach is that we utilized the phecode disease classification scheme that is specifically developed for disease-wide risk analyses (Wu et al., 2019). The phecode mapping scheme combines ICD-10 codes that clinical domain experts have deemed to cover the same disease. For example, respiratory tuberculosis (A16), tuberculosis of nervous system (A17), and miliary tuberculosis (A19) are combined into the phecode tuberculosis (phecode 10). Phecodes may therefore provide increased power and precision compared with using ICD-10 categories (Denny et al., 2010). Further, contrary to previous studies, we compared each blood group to all other blood groups, instead of determining effect estimates relative to blood group O. Thus, here we better capture the uniqueness of each individual ABO blood group.
Limitations
Our study has some important limitations. Firstly, the retrospective inclusion of patients and person-time may have introduced an immortal time bias from deaths before enrollment (in-hospital ABO/RhD blood group test) (Yadav and Lewis, 2021). The findings are therefore conditioned on patients surviving until the enrollment period. This implies, for example, that if a specific blood group causes a higher incidence of a deadly disease, then patients with that blood group are more likely to have died before enrollment, and therefore fewer individuals having both that blood group and the disease will be present in our cohort. If so, the estimates for deadly diseases strongly related to any blood group will have been lowered, or their direction even flipped, relative to any causal relationship. The study design, however, enabled 41 years of follow-up and was deemed reasonable because the blood groups have not been associated with mortality differences. Moreover, the blood group distribution in our cohort was found to be almost identical to a reference population of 2.2 million Danish blood donors. Further, we replicated several findings of associations between the blood groups and severe diseases, including pancreatic cancer (Vasan et al., 2016; Liumbruno and Franchini, 2014). This may indicate that the potential bias was less pronounced. Further, by controlling for year of birth, the potential effects of immortal time bias were likely reduced; however, this could not be tested. Immortal time biases are potentially applicable in many biobank studies, e.g. when using the UK Biobank for retrospective studies (Yadav and Lewis, 2021).
The generalizability of our findings is limited further because our cohort solely included hospitalized patients with known ABO and RhD blood groups. These are patients whom the treating doctor has deemed likely to require a blood transfusion during hospitalization. The patients under study might therefore suffer from other diseases than patients without a determined blood group and than individuals who were never hospitalized. Further, diseases that do not require hospitalization could not be examined. If the effect sizes are modified by factors that are more common in our cohort than in the general population, then the estimates may not be generalizable. However, it is unclear whether such effect modifiers exist. Lastly, it was not possible to adjust for possible confounding from the geographical distribution or ethnicity of the patients (Anstee, 2010). This may have biased some estimates because the distribution of blood groups varies between ethnicities while ethnicity is also associated with differences in disease susceptibility. In particular, ethnicity has been associated with differences in the prevalence of infectious and cardiovascular diseases, sickle cell disease, and thalassemia (Kurian and Cardarelli, 2007; McQuillan et al., 2004). Thus, the estimates for these disease groups should be interpreted with caution. The Danish population is, however, quite homogeneous, and approximately 94% of Danes have European ancestry (Supplementary file 1). Therefore, a potential bias from ethnicity may be less pronounced in our cohort than in studies of populations of more admixed origin.
In conclusion, we found the ABO/RhD blood groups to be associated with a wide spectrum of diseases, including cardiovascular-, infectious-, gastrointestinal- and musculoskeletal diseases. This may indicate that some of the potential selective pressure on the blood groups can be attributed to disease susceptibility differences. We found few associations between the blood groups and age of first diagnosis.
Data availability
Anonymized patient data was used in this study. Due to national and EU regulations, the data cannot be shared with the wider research community. However, data can be accessed upon relevant application to the Danish authorities. The Danish Patient Safety Authority and the Danish Health Data Authority have permitted the use of the data in this study; whilst currently, the appropriate authority for journal data use in research is the regional committee ("Regionsråd"). The statistical summary data used to create the tables and graphs are available as Table 2—source data 1 and Table 3—source data 1. The analysis code is publicly available through https://www.github.com/peterbruun/blood_type_study (copy archived at Bruun-Rasmussen, 2023).
References
- Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 57:289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
- Software: Blood_type_study, version swh:1:rev:411336fb15b802c4a257e448d7c12f2340f01694. Software Heritage.
- Statistical models in epidemiology. Journal of the Royal Statistical Society. Series A Statistics in Society 158:343. https://doi.org/10.2307/2983301
- Risk of gastric cancer and peptic ulcers in relation to ABO blood type: a cohort study. American Journal of Epidemiology 172:1280–1285. https://doi.org/10.1093/aje/kwq299
- Genomewide association study of severe covid-19 with respiratory failure. The New England Journal of Medicine 383:1522–1534. https://doi.org/10.1056/NEJMoa2020283
- Genetically determined ABO blood group and its associations with health and disease. Arteriosclerosis, Thrombosis, and Vascular Biology 40:830–838. https://doi.org/10.1161/ATVBAHA.119.313658
- Snakemake -- a scalable bioinformatics workflow engine. Bioinformatics 28:2520–2522. https://doi.org/10.1093/bioinformatics/bts480
- Racial and ethnic differences in cardiovascular disease risk factors: A systematic review. Ethnicity & Disease 17:143–152.
- Hemostasis, cancer, and ABO blood group: The most recent evidence of association. Journal of Thrombosis and Thrombolysis 38:160–166. https://doi.org/10.1007/s11239-013-1027-4
- Racial and ethnic differences in the seroprevalence of 6 infectious diseases in the United States: Data from NHANES III, 1988-1994. American Journal of Public Health 94:1952–1958. https://doi.org/10.2105/ajph.94.11.1952
- Software: A unidirectional mapping of ICD-8 to ICD-10 codes, for harmonized longitudinal analysis of diseases. Under Rev.
- The ABO blood group distribution and pulmonary tuberculosis. Journal of Clinical and Diagnostic Research 6:943–946. https://doi.org/10.7860/JCDR.2012.4370.2298
- COVID-19: The vasculature unleashed. Nature Reviews. Immunology 20:389–391. https://doi.org/10.1038/s41577-020-0343-0
- Mapping ICD-10 and ICD-10-CM codes to phecodes: Workflow development and initial evaluation. JMIR Medical Informatics 7:e14325. https://doi.org/10.2196/14325
Article and author information
Funding
Novo Nordisk Fonden (NNF14CC0001)
- Søren Brunak
Novo Nordisk Fonden (NNF17OC0027594)
- Søren Brunak
Innovation Fund Denmark (5153-00002B)
- Søren Brunak
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This study was performed as part of the CAG (Clinical Academic Group) Center for Endotheliomics under the Greater Copenhagen Health Science Partners (GCHSP). The study was supported by the Novo Nordisk Foundation (grants NNF14CC0001 and NNF17OC0027594) and the Innovation Fund Denmark (grant 5153-00002B). The funders played no role in the conduct of the study.
Ethics
Human subjects: This is a register-based study and informed consent for such studies is waived by the Danish Data Protection Agency. Data access was approved by the Danish Patient Safety Authority (3-3013-1731), the Danish Data Protection Agency (DT SUND 2016-50 and 2017-57) and the Danish Health Data Authority (FSEID 00003092 and FSEID 00003724).
Copyright
© 2023, Bruun-Rasmussen et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.