Ethnic and region-specific genetic risk variants of stroke and its comorbid conditions can define the variations in the burden of stroke and its phenotypic traits
Abstract
Burden of stroke differs by region, which could be attributed to differences in comorbid conditions and ethnicity. Genome-wide variation acts as a proxy marker for ethnicity and comorbid conditions. We present an integrated approach to understand this variation by considering prevalence and mortality rates of stroke and its comorbid risk for 204 countries from 2009 to 2019, and genome-wide association study (GWAS) risk variants for all these conditions. Global and regional trend analysis of rates using linear regression, correlation, and proportion analysis signifies ethnogeographic differences. Interestingly, the comorbid conditions that act as risk drivers for stroke differed by region, with more metabolic risk in America and Europe, in contrast to high systolic blood pressure in Asian and African regions. GWAS risk loci of stroke and its comorbid conditions indicate distinct population stratification for each of these conditions, signifying population-specific risk. Unique and shared genetic risk variants for stroke and its comorbid conditions, followed up with ethnic-specific variation, can help in determining regional risk drivers for stroke. Unique ethnic-specific risk variants and their distinct patterns of linkage disequilibrium further uncover the drivers for phenotypic variation. Therefore, identifying population- and comorbidity-specific risk variants might help in defining the threshold for risk, and aid in developing population-specific prevention strategies for stroke.
eLife assessment
This paper provides a useful analysis of the variation of the burden of strokes across geographic regions, finding differences in the relationship between strokes and their comorbidities. This dataset and the correlations found within will be a resource for directing the focus of future investigations. The results are technically solid, but there are cases where statistical analyses are yet to be carried out to support statements of statistical significance.
https://doi.org/10.7554/eLife.94088.3.sa0
Introduction
Stroke affects over 101 million people worldwide and is ranked the second most fatal disease in the world, with 6.5 million deaths in 2019 (GBD 2019 Stroke Collaborators, 2021). Comorbid conditions of stroke are critical contributors to the burden of stroke, and the duration of the comorbid conditions can further determine the severity of stroke risk or mortality. Prevalence of comorbid conditions ranges from 43% to 94%, and estimates can go as high as 99% above 66 years of age (Gallacher et al., 2019). Prevalence and mortality risk in stroke have often been evaluated from a socio-economic viewpoint, but it is also critical to understand the differences in drivers such as comorbid conditions. It is the accumulated risk of comorbid conditions that enhances the risk of stroke further. Are these comorbid conditions differentially impacted by socio-economic factors and ethnogeographic factors? This was clearly evident in the COVID era, when COVID-19 differentially impacted the risk of stroke, possibly due to its differential influence on the comorbidities of stroke.
Mortality in stroke, its subtypes, and their comorbid conditions have a strong ethnic bias (Tarko et al., 2022; Gardener et al., 2020; Mkoma et al., 2020). Genetics act as surrogate markers for ethnogeographic indices. It is important to understand which comorbid conditions are influenced by socio-economic indices, and how they impact the risk of stroke and their underlying genetic basis. A Danish study reported the effect sizes of association with comorbid conditions for stroke to have 15% higher mortality risk in presence of diabetes mellitus with end-organ damage, 20% for peripheral vascular disease, 25% for chronic pulmonary disease, 35% for congestive heart failure and atrial flutter, 45% for moderate to severe renal disease, and 1.8- to 2.4-fold for mild-to-severe liver disease (Schmidt et al., 2014). A UK Biobank study on stroke multimorbidity reported 1.5× higher risk of mortality in those with two additional comorbidities and a ≈2.5× higher risk of mortality in those with ≥5 comorbidities over 7 years (Gallacher et al., 2018). Thus, the differential impact of comorbid or multimorbid conditions contributing to the additive effect of illness burden needs to be addressed from an ethnogenetic perspective. Devising an appropriate strategy for prevention of stroke burden, needs a careful evaluation of the underlying genetic signature for each of these comorbid conditions and distinguishing their ethnic bias.
The objective of the study was to understand what determines the differences in stroke burden around the globe. Variations in burden of stroke could be influenced by comorbid conditions, and incidentally both stroke and its comorbid conditions can be influenced by ethnogeographic factors and genetics can act as a stable proxy marker for all. To resolve this, we considered the prevalence and mortality of a total of 11 disease conditions, consisting of stroke and its comorbid conditions, across different continents and ethnicities from 2009 to 2019. The disease conditions were further stratified as per their ethnogeographic locations and their genetic risk variants extrapolated from GWAS data. This study would provide insights on the regional patterns of the burden of stroke and its comorbid conditions, and help in resolving it from an ethnic and genetic viewpoint. These insights would further aid in developing and strategizing regional- and ethnic-specific needs for prevention of the risk of comorbid conditions and stroke.
Results
Global mortality, incidence and prevalence rates of stroke and its comorbid conditions
Globally, stroke ranks as the second most fatal disease in 2019 (84.2/100,000, 95%UI 76.8–90.2) among the eight diseases analyzed, preceded by ischemic heart disease (IHD; 117.9/100,000, 95%UI 107.8–125.9) as shown in Figure 1—figure supplement 1 and Supplementary file 1. High systolic blood pressure (high SBP) ranks as the most fatal comorbid disease condition with an age-standardized mortality rate (ASMR) of 138.9/100,000 [95%UI 121.3–155.7] among all conditions. Within stroke subtypes, ischemic stroke ranks highest globally with ASMR of 43.5/100,000 [95%UI 39.08–46.8], followed by intracerebral hemorrhage (ICH) at 36.0/100,000 [95%UI 32.9–38.7] and subarachnoid hemorrhage (SAH) at 4.7/100,000 [95%UI 4.1–5.2]. The ranking of the ASMRs of the diseases follows the same trend throughout the last decade with minor exceptions, where high body mass index (high BMI) improved its rank in 2014 by swapping with high low-density lipoprotein cholesterol (high LDL), and type 1 diabetes (T1D) dethroned chronic kidney disease (CKD) in 2019.
Global trends in crude and age-standardized incidence rates (ASIRs) show that stroke incidence ranks fourth, while IHD is on top followed by type 2 diabetes (T2D) and CKD (Figure 1—figure supplement 1, Supplementary file 1). Crude incidence rates of stroke and its subtypes increased in the last decade but ASIRs decreased, with exception of ischemic stroke, where an increase in ASIR was observed. Among other diseases, IHD, T2D, and T1D show a continuous increase in the last decade, with T2D (280.1/100,000, 95%UI 258.8–303.9) surpassing IHD (274.0/100,000, 95%UI 242.9–306.4) in 2019 in crude rates.
The global trend between crude and age-standardized prevalence rates (ASPRs) revealed that the ranking of stroke remains at sixth position throughout the years (Figure 1—figure supplement 1, Supplementary file 1). Among stroke subtypes, ischemic stroke ranks highest. The comorbid conditions high SBP and high BMI rank first and second, followed by CKD and T2D, and interestingly all rank above stroke. We also observe that though there is a continuous increase in prevalence rates in the last decade, the ranking of stroke or its comorbid conditions does not change over the years, with the exception of T1D ASPRs overtaking ICH. We were keen to understand if these global trends of ASMR and ASPR are influenced by region and ethnicity.
Ethno-regional differences in mortality and prevalence of stroke and its major comorbid conditions
We observed interesting patterns of ASMRs of stroke, its subtypes, and its major comorbidities across different regions over the years as shown in Figure 1a and Figure 1—figure supplement 2, Table 1 and Supplementary file 1. When assessed in terms of ranks, high SBP is the most fatal condition followed by IHD in all regions, except Oceania where IHD and high SBP swap ranks. Africa (206.2/100,000, 95%UI 177.4–234.2) and Middle East (198.6/100,000, 95%UI 162.8–234.4) have the highest ASMR for high SBP, even though they rank as only the third and sixth most populous continents (Figure 1—figure supplement 3), respectively. Both high SBP (−0.64% to −2.25% estimated annual percentage change [EAPC]) and IHD (−0.45% to −1.17% EAPC) show a decreasing trend for ASMRs in all regions. However, only Europe shows a significant decrease for high SBP (−2.26%; p = 0.009) and IHD (−2.37%; p = 0.006) in the decade. Stroke has a decreasing trend for ASMRs with East Asia and Europe showing a significant decrease of −2.2% (p = 0.021) and −2.6% (p = 0.03), respectively. Stroke has the highest mortality in East Asia (127.1/100,000, 95%UI 104.9–150.5 in 2019), which is the only region that ranks stroke higher than IHD. Though Europe, Middle East, and Central and South Asia have ASMRs similar to global rates for stroke, Central and South Asia ranks stroke as the third most fatal factor, while America, Europe, and Middle East rank it fifth. Oceania (62.1/100,000, 95%UI 34.1–90.2) and America (40.3/100,000, 95%UI 36.2–43.1) have the lowest rates for stroke in 2019.
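The EAPC values quoted above summarize a trend fitted to yearly rates. The text does not spell out the exact formula used, so the following is only a minimal sketch that assumes the conventional definition: fit ln(rate) = a + b × year by ordinary least squares and report 100 × (exp(b) − 1). The function name `eapc` and its inputs are illustrative and not taken from the study.

```python
import math


def eapc(years, rates):
    """Estimated annual percentage change, assuming a log-linear trend.

    Fits ln(rate) = a + b * year by ordinary least squares and
    returns 100 * (exp(b) - 1). Rates must be strictly positive.
    """
    n = len(years)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two (year, rate) pairs of equal length")
    log_rates = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(log_rates) / n
    # Least-squares slope b = Sxy / Sxx.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)
```

A rate that falls by exactly 2% every year returns an EAPC of −2%. Testing whether the slope differs from zero (the p values reported above) would need the standard error of b, which this sketch leaves out.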
Among the stroke subtypes, ICH and ischemic stroke show maximum ethnogenetic differences in mortality rates (14.1/100,000–61.4/100,000) and ranking (4th to 9th) in 2019. While ICH shows a significant decrease in ASMR in East Asia (–3.53%, p = 0.009), ischemic stroke shows a significant decrease in Europe (–2.37%, p = 0.06). High BMI and high LDL rank in the top 5 but their mortality rates differed across all regions, with the highest rates for both in the Middle East. Only Europe shows a significant decrease in high LDL (–2.53%, p = 0.03) over the decade. The Middle East has the highest ASMRs due to IHD and high SBP, followed by Africa, Central and South Asia, East Asia, and Europe, all having rates higher than global. All continents have similar mortality rates for T2D and CKD across the years, except Oceania, where the T2D rate is nearly three times CKD rate. Africa has the highest mortality rate for T1D (1.59/100,000, 95%UI 1.2–1.9).
ASPRs also showed an interesting pattern of distribution, and, in contrast to mortality, showed an increase over the decade (Figure 1b and Figure 1—figure supplement 4, Table 2 and Supplementary file 1). Highest ASPRs were observed for high SBP across all regions, except America, Middle East, and Oceania, where high BMI is most prevalent. While EAPC of prevalence of high SBP showed a significant decrease in all regions (−0.38% to −1.77%), except Central and South Asia, high BMI showed a significant increase (1.77–5.6%) in all (Table 2). The ASPR ranking of CKD and T2D rose to the top 5, in sharp contrast to their ASMR rankings. Prevalence of CKD (EAPC 0.24–0.7%) and T2D (EAPC 0.6–2.18%) is significantly increasing in all regions. For all other diseases, the pattern of ranking and rates across regions was stable with minor exceptions. Stroke ranks sixth for ASPRs in all regions, and it is interesting to note that ASPRs of all the comorbid conditions, except T1D, rank above stroke. While Europe shows a significant decline in ASPRs of stroke (–0.9%, p = 0.008) and ischemic stroke (–0.85%, p = 0.03) across the years, East Asia shows a significant increase for stroke (0.7%, p = 0.02) and ischemic stroke (1.09%, p < 0.001). Globally, T1D swapped its prevalence ranking with ICH in 2014, largely influenced by the significant increase in prevalence in Middle East (2.83%, p < 0.001). However, the highest prevalence of T1D is in Europe and Oceania. The prevalence of IHD has remained nearly constant in all continents in the last decade, except Oceania (–0.43%), America (–0.49%), and Middle East (–0.27%), which show a significant decrease. In 2019, Middle East has the highest prevalence for IHD (4843.02/100,000, 95%UI 4243.02–5442.58), while America (1695.6/100,000, 95%UI 1530.8–1871.9) has the lowest ASPR, less than half the rate of the Middle East.
We were further keen to understand if these regional differences in mortality and prevalence rates also reflect a socio-economic bias and, if so, whether this is reflected in a particular category of comorbid conditions.
Contribution of metabolic risk and hypertension in stroke based on ethnogenetic locations
When the prevalence and mortality rates of stroke and its comorbidities were grouped into three groups, namely strokes, metabolic disorders, and high SBP, we find that out of the three proportional mortalities shown in Figure 2, strokes group has the highest proportion (37.1–47.2%) across all years and regions, except Oceania and America, where instead, metabolic disorders have the highest proportion (39.1–42.2%) that is significantly higher compared to global proportion (Supplementary file 1). East Asia has the highest proportional mortality for strokes among all regions in all 3 years (44.7–47.2%). This was in sharp contrast to the prevalence proportion of strokes (4.6–9.1%), which was the least among the three groups, with the highest proportional prevalence for strokes being 8.9–9.1% in Central and South Asia. The proportional mortality for high SBP (22.0–30.5%) is very similar across the regions. Metabolic disorders have significantly higher proportional prevalence compared to global in Middle East, Oceania, and America, the highest being in America (63.9%, Supplementary file 1). Asian and African regions have lowest proportional prevalence for metabolic disorders, with Central and South Asia having significantly lower proportions. However, these regions have the highest prevalence proportion for high SBP (46.6–54.3%), with Central and South Asia having a significantly higher proportion compared to global. On the other hand, the Middle East, Oceania, and America have significantly lower proportional prevalence compared to global. We were further keen to understand the correlation among ASMR and ASPR of comorbid conditions among ethnogeographic regions.
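The grouped proportions above can be illustrated with a short calculation. The exact computation is not given in the text, so this sketch assumes that each group's proportion is its share of the summed rates of the three groups within a region and year. The function name `group_proportions` is hypothetical.

```python
def group_proportions(rates_by_group):
    """Return each group's percentage share of the summed rates.

    `rates_by_group` maps a group label (for example "strokes",
    "metabolic disorders", "high SBP") to its rate for one region
    and one year. This definition of the share is an assumption.
    """
    total = sum(rates_by_group.values())
    if total <= 0:
        raise ValueError("summed rate must be positive")
    return {group: 100.0 * rate / total for group, rate in rates_by_group.items()}
```

The shares always sum to 100% within a region, so a higher proportion for one group necessarily lowers the others. This is worth keeping in mind when comparing the prevalence and mortality proportions of the same group.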
Correlation among prevalence and mortality rates based on ethnogeographic region
Correlation between ASMRs and ASPRs for stroke and all comorbid conditions across each ethnogeographic location is shown in Figure 3. High SBP prevalence and mortality show a strong positive correlation in Central and South Asia but a strong negative correlation in the Middle East. The prevalence and mortality of high BMI have a strong negative correlation in Central and South Asia and Middle East populations, but a strong positive correlation in East Asia. Prevalence of CKD has negative correlation with mortality rates in East Asia. Prevalence of T2D has negative correlation with mortality rates in Oceania and positive correlation in America. It is interesting to note that there was not much of a correlation in mortality and prevalence rates for most of the conditions. For overall stroke, though minor correlations between various ethnicities are seen, this becomes alarmingly clear in the stroke subtypes. The correlation matrix of mortality and prevalence rates of stroke and its comorbidities does reflect strong ethnogeographic distinctions, which formed the basis of further investigation on the genetic basis of stroke and its comorbidities.
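A correlation of this kind can be computed for each condition and region from the paired yearly series of ASPR and ASMR. The text does not state which coefficient was used; the sketch below assumes Pearson's r, and the function name `pearson_r` is illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric series.

    Here xs could be yearly ASPRs and ys yearly ASMRs for one
    condition in one region (an assumed pairing).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 means prevalence and mortality rose or fell together over the years, while a value near −1 means they moved in opposite directions, as described for high SBP in the Middle East.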
Ethnogeographic stratification of stroke and its comorbidities based on GWAS data
To resolve the ethnogeographic distinctions for stroke and its major comorbid conditions based on their genetic risk, we considered all GWAS loci for stroke and its major comorbid conditions, and subjected them to stratification analysis. From the GWAS loci, we observed a distinct population structure that distinguished ethnogeographic populations based on their genetic signatures (Figure 4). For all diseases, except high BMI, the individuals clustered into five groups, each corresponding to one of the five super-populations from the 1000 Genomes Project, namely, African, East Asian, South Asian, European, and American. For high BMI, the individuals clustered into three groups corresponding to African, East Asian, and European. Though broad clustering of ancestral populations among the diseases looks similar, the proportions of ancestral populations in certain diseases vary greatly. Among stroke, IHD, and T2D risk variants, the populations structured in a similar way, while for T1D, CKD, and LDL the patterns were slightly different in European and South Asian ancestry. For high SBP, the major fluctuations were observed in South Asian and East Asian populations. This stratification was further explored using population-based clustering, where a similar pattern was observed in the principal component analysis (PCA) plots for stroke and its comorbid conditions (Figure 5). The African population seems to be a distinct outlier for most diseases, and the East Asian population comes a distant second in the cluster pattern.
Similar patterns were observed in stratification analysis after grouping the GWAS loci of stroke and its comorbidities into three groups as before, namely strokes, metabolic disorders, and high SBP. The African, East Asian, and South Asian populations had distinct structure in all three groups (Figure 4—figure supplement 1). These observations do indicate that the underlying genetic factors of stroke and its comorbid factors can be the real indicators of ethnogeographic patterns of risk for stroke and its comorbid conditions. However, we were further keen to understand the extent of shared and unique individual risk variants across stroke and its comorbid condition and how these unique or shared variants can help in distinguishing their relevance across ethnicities.
Shared and unique risk variants of stroke among the different ethnogenetic regions
The unique and shared individual variants across stroke and its comorbid conditions were identified from the GWAS data irrespective of ethnicity. We find that the majority of the risk variants were unique to a disease condition; however, several risk variants were also seen to be shared with stroke and other comorbid conditions, as seen in Figure 6. We were further keen to have a deeper insight into the distribution of risk variants in stroke across ethnicities. Stroke has only 55% of the risk variants common to all five populations, as seen in Figure 7 and Supplementary file 1. Two groups of populations share the largest number of variants, namely, the Africa–America–Europe–South Asia (6% of variants shared) group and the East Asia–America–Europe–South Asia (4%) group. Africa has the highest number of unique variants for stroke (6%), followed by Europe (3%).
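The partition of risk variants into those common to all populations, those shared by a subset, and those unique to one population can be sketched with set operations. This is an illustration under the assumption that a variant counts as present in a population when its alternate allele is observed there. The function name and the input layout are hypothetical.

```python
def partition_variants(variants_by_population):
    """Group variants by the exact set of populations that carry them.

    `variants_by_population` maps a population label to the set of
    variant identifiers observed in it. The result maps each
    frozenset of populations to the variants found in exactly
    those populations and no others.
    """
    carriers = {}
    for population, variants in variants_by_population.items():
        for variant in variants:
            carriers.setdefault(variant, set()).add(population)
    groups = {}
    for variant, populations in carriers.items():
        groups.setdefault(frozenset(populations), set()).add(variant)
    return groups
```

Variants keyed by a single-population set are the unique ones, and the key containing every population holds the variants common to all. Dividing each group's size by the total number of variants gives percentages of the kind reported above.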
South Asia has two unique variants for stroke, rs528002287 and rs148010464, which map to the gene PCSK6 and the intergenic region of PLA2G4A/LINC01036, respectively. The variants rs528002287 and rs148010464 are low-frequency variants in South Asia with a minor allele frequency (MAF) of 0.053 and 0.054, respectively. We were further keen to understand how these unique variants in South Asia are tagged to the nearby variants of different frequencies in the different ethnicities. Can the linkage disequilibrium (LD) patterns of rare and common alleles help in distinguishing the ethnogeographic distinction in phenotype variation? While comparing the LD patterns of low-frequency variants and common variants across ethnicities, we observe contrasting patterns. The LD plots between the unique variants and the low-frequency variants that are tagged to them clearly demonstrated a unique LD pattern in South Asia, compared to other populations (Figure 8). In contrast, the LD plots of common variants tagged to the risk variant of stroke unique to South Asia showed similar LD patterns among all populations (Figure 8—figure supplement 1). These differences might also reflect unique or distinct phenotypic differences among ethnicities for risk in stroke.
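The LD comparison above rests on a pairwise measure between variants. The measure used is not named in this section, so the sketch below assumes the common r² statistic computed from phased haplotypes, with D = p(AB) − p(A)p(B) and r² = D² / [p(A)(1 − p(A))p(B)(1 − p(B))]. The function name `ld_r2` and the 0/1 coding are illustrative.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic sites from phased haplotypes.

    `hap_a` and `hap_b` list the allele (0 = reference,
    1 = alternate) carried at each site by the same haplotypes,
    in the same order. Returns None when either site is
    monoallelic, because LD is undefined in that case.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying the alternate allele at both sites.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denominator
```

The `None` branch matters here: a variant whose alternate allele is absent from a population has no defined LD with its neighbors there, which is one reason LD blocks around population-specific low-frequency variants cannot appear in other populations.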
Discussion
To the best of our knowledge, this is the first study that explored the burden of stroke and its comorbid conditions across regions, stratifying and distinguishing their unique features based on their genetic background extrapolated from the GWAS risk loci. The dynamics of different rates of stroke, its subtypes, and comorbid factors do reflect ethnogeographic differences. Globally, the prevalence and incidence rates of stroke have increased, while mortality rates decreased with minor shifts in ranking in the last decade. Interestingly, the incidence and prevalence rank of stroke rates remain the same globally, but for mortality it ranks third, preceded only by IHD and high SBP. While the global stroke prevalence is nearly 15 times its mortality rate, prevalence of comorbid conditions such as high SBP, high BMI, CKD, and T2D are alarmingly 150- to 500-fold higher than their mortality rates. These comorbid conditions can drastically affect the outcome of stroke. Interestingly, these disparities in rates get further widened when evaluated from an ethnogeographic perspective. The age-standardized prevalence of stroke in 2019 ranges from lowest in Central and South Asia (858.5/100,000, 95%UI 737.6–979.4) to highest in East Asia (1513.1/100,000, 95%UI 1390.7–1635.5), in contrast to mortality rates, lowest in America (40.3/100,000, 95%UI 36.2–43.1) and highest in East Asia (127.7/100,000, 95%UI 104.9–150.5).
The rates of stroke, its subtypes, and comorbid conditions do correlate to some extent but their ranking varies significantly. In terms of ranking, stroke ranks sixth in prevalence across all ethnogeographic locations, but ranks second in mortality in East Asia and third in Central and South Asia and Africa. We find that among the considered comorbid conditions, some rank above stroke in both incidence and prevalence, however in terms of mortality, stroke ranks highest with exception to high SBP. Similarly, the ranking of the comorbid conditions also varies when the global population is stratified based on ethnogeographic locations. In the last decade, there has been tremendous development in the healthcare industry globally, but this is not reflected in the mortality or prevalence data from 2009 to 2019. Ranking of comorbid conditions by the rates is very crucial to identify ethnic-specific comorbid risk that can be helpful in guiding and managing stroke risk.
The changing dynamics of stroke or its comorbid conditions can be attributed to a multitude of factors. Often the global burden of stroke has been discussed from the point of view of socio-economic parameters. Studies indicate that half of the stroke-related deaths are attributable to poor management of modifiable risk factors (Avan et al., 2019; Baatiema et al., 2020). However, we observe that different socio-economic regions are driven by different risk factors. Considering that Europe, America, Oceania, and the Middle East represent high socio-economic regions, the comorbid conditions that drive prevalence and mortality rates here seem to be more of metabolic in nature, while for South Asians, high SBP is the prominent factor. It is evident from the correlation between prevalence and mortality for stroke, its subtypes, and comorbid conditions that there is an ethnic- and comorbid-specific correlation, which possibly does not reflect a clear socio-economic distinction. The comorbid conditions for stroke subtypes also differ to a large extent. Diabetes is a comorbid condition for stroke but not for SAH, and this risk is ethnic-specific (Lindgren et al., 2013; Koshy et al., 2010). Therefore, it is very pertinent to understand the stroke risk from an ethnic view point, beyond the boundaries of socio-economic criteria, as the drivers of comorbid risk and ethnicity rely on genetic and epigenetic components. Studies reported reduction in life expectancy in 31 of 37 high-income countries, deduced to be due to COVID-19 (Islam et al., 2021). However, it would be unfair to ignore the comorbid conditions which could also be the critical determinants for reduced life expectancy in these countries.
Stroke has a complex etiology, which is further influenced by its comorbid conditions and this impacts its phenotypic variability. A strong genetic risk drives both stroke and its comorbid conditions. Genetic risk variants for diabetes, cardiovascular disease, diabetic retinopathies and nephropathies, hypertension, inflammation, and kidney diseases have been reportedly shown to have strong ethnogenetic variation (Shoily et al., 2021). Implications of ethnogenetic differences were evident when GWAS genes for stroke and all studies of comorbid conditions were used to stratify the 1000 genome super-populations. We observed that the GWAS risk loci for stroke and its comorbid conditions like high BMI, high LDL, high SBP, T2D, T1D, and CKD could stratify the super-populations based on its ethnogenetic considerations. Fluctuations in genetic structure of stroke and its comorbid conditions signify the impact of ethnic variations on mortality and prevalence rates. Stroke accounts for approximately 20% of deaths in diabetics (Banerjee et al., 2012; Boehme et al., 2017). Diabetics and pre-diabetics, and the duration of diabetes have been reported to have increased risk of stroke, which gets aggravated in African-Americans (Banerjee et al., 2012; Boehme et al., 2017). As stroke and its comorbid conditions are heavily influenced by lifestyle, high-income countries showed evidence of metabolic disorders being the major cause of concern for both prevalence and mortality. Similar observation in a UK biobank cohort study on stroke suggests that genetic and lifestyle factors were independently associated with incident stroke, which emphasizes the benefit of entire populations adhering to a healthy lifestyle, independent of genetic risk (Rutten-Jacobs et al., 2018).
Metabolic risk variations could also be a reflection of their underlying genetic differences. Significant differences among ethnicities in metabolism of various macromolecules have been reported (Vasishta et al., 2022). Genetic variations demonstrate inter-ethnic differences in LDL levels resulting in differential impact on dyslipidemia (Klarin et al., 2018). A meta-analysis on European, East Asian, and African-American ethnicities revealed that common variants of CDH13 and ADIPOQ regulate adiponectin levels, an important component of BMI indicator (Dastani et al., 2012). ALDH2*504Lys allele has been reported to be associated with high BMI, increased tolerance of alcohol, high SBP, and decreased high-density lipoprotein in East Asians (Takeuchi et al., 2011; Xu et al., 2010). MEGASTROKE consortium on stroke and its comorbid factors reported five variants associated with blood pressure, two with LDL cholesterol and reported that all stroke subtypes were associated with a Genetic Risk Score for high SBP (Malik et al., 2018). A study on IHD, T2D in European populations with different healthcare systems, and local population substructures reported Polygenic risk score with similar accuracy across Europeans, to a lesser extent to South and East Asian populations, and very poor transferability for Africans (Mars et al., 2022).
Even stroke subtypes show a strong ethnic variation (Malik et al., 2018; Hu et al., 2022). While COL4A1 and COL4A2 were common denominators for most of the stroke subtypes, high-density lipoprotein was inversely associated with small vessel stroke (Meschia, 2020). The INTERSTROKE study on stroke subtypes reported that among comorbid factors, high SBP was significantly associated with ICH and diabetes, cardiac issues, and apolipoproteins with ischemic stroke (O’Donnell et al., 2016). Ischemic stroke associated with ApoB/ApoA1 ratio had higher (67.6%) population attributable fraction (PAF) in Southeast Asia compared to Western Europe, North America, and Australia (24.8%), while ischemic stroke associated with atrial fibrillation had lower PAF in South Asia (3.1%) compared to the rest (17.1%) (O’Donnell et al., 2016). Emerging studies like novel ethnic-specific genetic variants, such as SUMOylation pathway in Indians (Kumar et al., 2021), SFXN4 and TMEM108 in Africans (Keene et al., 2020) indicate the involvement of different pathways among different ethnicities in stroke. The A allele of c.*84G>A loci in CETP gene was found to be a risk factor for IHD in South Asians (Ganesan et al., 2016). High SBP was found to be a risk factor in all major stroke subtypes except lobar ICH (Georgakis et al., 2020). Therefore, identifying differential risk in different ethnicities for stroke and its subtypes, and its impact on comorbid conditions might also indicate different treatment modalities which can minimize adverse metabolic side effects.
Identifying the pattern of genetic variation is critical in distinguishing stroke and its endophenotypic variations. The risk variants rs528002287 (locus 15q26.3) in PCSK6 and rs148010464 (locus 1q31.1) an intergenic variant in PLA2G4A/LINC01036 for stroke were unique to South Asia, and were found to be associated with cardioembolic stroke and small vessel stroke in South Asians (Kumar et al., 2021). A recent INTERSTROKE study reported the association of short sleep duration with increased risk for stroke to be highest in the South Asian ethnicity (Odds Ratio 9.13, 95% Confidence Interval 5.86–14.66) (Mc Carthy et al., 2023). Decreased sleep quantity and quality have been reported to increase blood pressure (Sabanayagam and Shankar, 2010), prevalence for which was found to be highest for South Asians in our study. These observations are interesting as PCSK6 is known to regulate sodium homeostasis and thereby maintain diastolic blood pressure (Li et al., 2004; Chen et al., 2015). Reports also indicate 2.34-fold difference in the expression of PCSK6 during maintenance phase of hypertension (Marques et al., 2010). PCSK6 has also been reported to be involved in processing of precursors of Melanin Concentrating hormone (MCH) under certain conditions (Viale et al., 1999). MCH is known to play a central role in promoting and stabilizing sleep (Jego et al., 2013; Konadhode et al., 2013). In insomnia patients, PLA2G4A was reported to be upregulated by 1.88-fold after improvement in sleep (Livingston et al., 2015). In sleep deprived mice, glycolytic pathway and lipid metabolism were upregulated and expression of PLA2G4A was downregulated (Hinard et al., 2012). These observations are interesting as PLA2G4A is known to play a role in the metabolism of phospholipids, production of lipid mediators, and the release of arachidonic acid (Burke and Dennis, 2009; Shimizu et al., 2006). 
Arachidonic acid is involved in signaling pathways of metabolic processes like release of insulin and glucose disposal (Chen et al., 1996; Nugent et al., 2001; Wolford et al., 2003). PLA2G4A also plays a role in the production of pro-thrombotic TXA2 and thus, inhibition of PLA2G4A can reduce platelet aggregation and thromboembolism (Murakami et al., 2011). Thus, the risk of unique genetic variants in PCSK6 and PLA2G4A in South Asian ancestry may indicate a unique endophenotype for stroke, which might also indicate the influence of underlying risk variants for comorbid conditions, for example PLA2G4A in metabolic processes.
GWAS has yielded numerous common risk alleles that are associated with various human phenotypes (Manolio et al., 2009; McCarthy and Hirschhorn, 2008). Since the rationale for GWAS is the ‘common disease, common variant’ hypothesis, it has been able to identify only those common variants with a moderate effect on the associated trait and thus, these identified variants only explain a small proportion of the heritability of the trait. One of the major conclusions from the 1000 Genomes Project was that most variations in the human genome are rare and unique to specific subpopulations (Abecasis et al., 2010; Abecasis et al., 2012). From an evolutionary point of view, alleles with strong detrimental effects will be controlled by purifying selection, which keeps their frequency low. Hence, rare and low-frequency variants might be variants with large effects on traits (Bodmer and Bonilla, 2008). Examples of genes like ABCA1, PCSK9, and LDLR (Kathiresan et al., 2009; Lusis and Pajukanta, 2008), which carry both common variants with moderate effects and rare variants with large effects for lipid levels, indicate that genes can contain both types of variants associated with a complex trait. Using this logic, we were keen to identify the influence of common and rare variants in stroke and its comorbid conditions and their pattern of LD in distinguishing ethnic-specific risk.
While the majority of the GWAS variants associated with stroke are common variants, a minority are low-frequency variants with a frequency below 10%. The alternate alleles of these variants were found to be present only in specific super-populations of 1000 Genomes, while the variants are monoallelic in the other super-populations. This difference among populations is further enhanced when we look at the LD patterns of these low-frequency variants with the other low-frequency variants that act as their proxies. Distinct LD blocks containing these unique rare variants were seen in one super-population and were absent in the others. On the other hand, common variants (frequency greater than 10%) in the same regions show similar LD patterns in all populations. Thus, the rare variant hypothesis could explain a significant proportion of the differences in the burden of stroke seen across populations. The genes identified in this way could be candidate genes for finding rare and low-frequency variants that play a role in the heritability of stroke and its comorbid factors.
Conclusion
The dynamics of incidence, prevalence, and mortality rates of stroke and its subtypes, along with its comorbid risk factors, reflect strong ethnogeographic differences. Our work highlights that these ethnogeographic differences for stroke and its comorbid conditions need to be evaluated and stratified based on their ethnogenetic background. Genetic variables should be considered as primary evidence, as they define the threshold for biochemical, metabolomic, or epigenetic variables. Different socio-economic regions are driven by different risk factors for stroke, and low-frequency variants could be playing a role in the regional differences in the burden of stroke, as shown in Figure 9. Identifying population-specific unique variants for stroke and its comorbid conditions might refine the drivers of endophenotypic variation in stroke risk. We suggest that integrating public health genomics with the assessment of comorbid conditions for stroke should be considered crucial irrespective of economic status, as lower and higher socio-economic regions have different drivers of stroke risk.
Methods
Data sources
We obtained ASIRs, ASPRs, and ASMRs for a total of eight diseases and three disease conditions in 204 countries for the years 2009–2019 using the GBD Results Tool (vizhub.healthdata.org/gbd-results/) (Global Burden of Disease Collaborative Network, 2020) and from the NCD Risk Factor Collaboration (NCD-RisC, ncdrisc.org/) study (Zhou et al., 2017; NCD Risk Factor Collaboration NCD-RisC, 2017) in November 2021. GBD codes for the selected diseases were B.2.2 (Ischemic heart disease, IHD), B.2.3 (Stroke), B.2.3.1 (Ischemic stroke), B.2.3.2 (Intracerebral hemorrhage, ICH), B.2.3.3 (Subarachnoid hemorrhage, SAH), B.8.1.1 (Diabetes mellitus type 1, T1D), B.8.1.2 (Diabetes mellitus type 2, T2D), and B.8.2 (Chronic kidney disease, CKD). Three disease conditions, which are also comorbid factors of stroke, were also selected: high SBP (>110–115 mmHg), high BMI (>23.0 kg/m²), and high LDL cholesterol. For these, ASMRs in 204 countries, as well as global rates for 2009–2019, were obtained using the GBD Results Tool, and age-standardized prevalence percentages in 204 countries for the years 2009–2016 (data for 2017–2019 were not available) were obtained from the NCD-RisC website (ncdrisc.org/) (Zhou et al., 2017; NCD Risk Factor Collaboration NCD-RisC, 2017). Global crude rates and age-standardized incidence, prevalence, and mortality rates were obtained from GBD and NCD-RisC for all the diseases and disease conditions (Figure 1—figure supplement 1, Supplementary file 1). The GBD 2019 and NCD-RisC studies complied with the GATHER guidelines.
Spatio-temporal trend analysis and estimated annual percent change
The 204 countries were grouped into eight geographic regions, namely Global, America, Europe, Middle East, Africa, Central Asia and South Asia, East Asia, and Oceania, based on ethnicity (Supplementary file 1, Figure 1—figure supplement 3). The prevalence percentages were converted into ASPRs for each disease condition. ASMRs and ASPRs for each region were obtained using Bayesian model averaging of linear regression models with Markov chain Monte Carlo sampling. The Bayesian Information Criterion (BIC) was the model selection criterion. A Poisson distribution with the global rates as lambda was used as the prior distribution on the models. The population of each country was used as the weight. A total of 10,000 draws from the posterior distribution of the model parameters were used to obtain the point estimates (mean of the draws) and 95% uncertainty intervals (2.5th to 97.5th percentiles of the posterior distributions) of mortality and prevalence rates for each region for the years 2009–2019. Rate estimates presented in this paper are age-standardized rates per 100,000 population.
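The final summary step described above (point estimate as the mean of the draws, uncertainty interval from the 2.5th to 97.5th percentiles) can be illustrated with a minimal sketch. It does not reproduce the Bayesian model averaging itself; the function name `summarize_draws` and the simulated draws are assumptions made purely for illustration.

```python
import random
import statistics


def summarize_draws(draws, lower=0.025, upper=0.975):
    """Return (mean, lower percentile, upper percentile) of a list of draws."""
    ordered = sorted(draws)
    n = len(ordered)

    def percentile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (n - 1)
        below = int(pos)
        above = min(below + 1, n - 1)
        return ordered[below] + (pos - below) * (ordered[above] - ordered[below])

    return statistics.fmean(ordered), percentile(lower), percentile(upper)


# Hypothetical values standing in for 10,000 posterior draws of a regional rate.
random.seed(0)
draws = [random.gauss(100.0, 5.0) for _ in range(10_000)]
mean, ui_low, ui_high = summarize_draws(draws)
print(f"{mean:.1f} per 100,000 (95% UI {ui_low:.1f} to {ui_high:.1f})")
```

The percentile interpolation rule is one common convention; other conventions give slightly different interval bounds for small samples.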
The ASMRs and ASPRs thus obtained for the different diseases in each region were subjected to a temporal rank analysis for 2009, 2014, and 2019 using custom Perl scripts, and the change in trend was plotted as a bump plot. The size and position of the points indicate the rate and rank, respectively. To quantify the temporal trends, EAPC was modeled by Poisson regression, using a generalized linear model for the log-transformed rates: $\log(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$, where $y$ is the age-standardized rate, $x_i$ are the calendar years, and $\beta_i$ are the rate trends. Under the assumption that the log of the age-standardized rate is linear in time, $\mathrm{EAPC} = 100 \times (\exp(\hat{\beta}) - 1)$. The 95% uncertainty interval is calculated from $\hat{\beta} \pm Z_{1-\alpha/2} \times \mathrm{SE}$, where $\alpha$ is the significance level and SE is the standard error of $\hat{\beta}$.
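As a worked illustration of the EAPC calculation, the sketch below fits log(rate) against calendar year and converts the slope using the standard form EAPC = 100 × (exp(β) − 1). It is a simplification under stated assumptions: it uses ordinary least squares on log rates rather than the Poisson generalized linear model described above, it takes 1.96 as the normal quantile, and it transforms the slope bounds to the percent scale. The function name `eapc` and the example rates are hypothetical.

```python
import math


def eapc(years, rates, z=1.96):
    """Fit log(rate) = b0 + b1 * year by least squares and return
    (EAPC, lower bound, upper bound) in percent per year."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Standard error of the slope from the residual variance (n - 2 d.o.f.).
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, logs))
    se = math.sqrt(rss / (n - 2) / sxx)

    def to_percent(b):
        return 100.0 * (math.exp(b) - 1.0)

    return to_percent(slope), to_percent(slope - z * se), to_percent(slope + z * se)


# Hypothetical age-standardized rates declining by 2% per year.
years = list(range(2009, 2020))
rates = [120.0 * 0.98 ** (y - 2009) for y in years]
estimate, lower, upper = eapc(years, rates)
print(f"EAPC = {estimate:.2f}% ({lower:.2f} to {upper:.2f})")
```

A trend would be read as significant in this sketch when the interval from `lower` to `upper` excludes zero.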
EAPC for ASMRs and ASPRs was calculated for the time period 2009–2019 (Tables 1 and 2). EAPC for high SBP prevalence was calculated from 2009 to 2015 and for high BMI prevalence from 2009 to 2016. EAPC was considered statistically significant if its uncertainty interval did not cross zero. Statistical significance of spatio-temporal differences in rates was calculated using a chi-square test, assuming the rates to follow a Poisson distribution. The 2019 ASMRs (or ASPRs) in each region were compared with the global ASMR (or ASPR) to assess change over locations, and with the 2009 ASMRs (or ASPRs) to assess change over time. The 2014 ASPRs were used when 2019 data were not available. p-values (two-sided) for ASMRs and ASPRs are shown in Supplementary file 1. Chi-square tests were done using Open Source Epidemiologic Statistics for Public Health (https://www.openepi.com/Menu/OE_Menu.htm). The relation between the ASMRs and ASPRs of each region was measured using Pearson correlation. For correlation, we considered the data of 2014 because the complete spectrum of data was available for that year. All analyses, unless specified, were done using R Statistical Software (version 4.1.2) (R Development Core Team, 2021) with packages BAS (Clyde, 2022), Rcan (Laversanne and Vignat, 2020), corrplot (Wei and Simko, 2021), dplyr, and ggplot2.
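For readers unfamiliar with the correlation measure used here, a minimal sketch of the Pearson coefficient follows. The function name `pearson_r` and the example values are hypothetical and are not taken from the study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical regional mortality and prevalence rates for five conditions.
asmr = [85.0, 40.0, 12.0, 20.0, 130.0]
aspr = [1200.0, 900.0, 150.0, 600.0, 2100.0]
print(f"r = {pearson_r(asmr, aspr):.3f}")
```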
Proportional mortality and prevalence
The diseases were classified into three categories: stroke (ischemic stroke, ICH, SAH, and IHD), metabolic disorders (high BMI, high LDL, T2D, CKD, and T1D), and high SBP. To determine the proportion of each category in a region, total ASMRs were scaled to 100 for all three years. The same was done for ASPRs (high LDL cholesterol data were not available). Statistical significance of the difference in proportions was calculated using a one-sample test for a binomial proportion with the normal-theory method. The proportional mortality in each category was compared in a pairwise manner with the global proportion as well as with the other regional proportions to calculate the two-sided p-value. The same was done for proportional prevalence. p-values for proportional mortality and prevalence are shown in Supplementary file 1. Proportion comparisons were done using Open Source Epidemiologic Statistics for Public Health (https://www.openepi.com/Menu/OE_Menu.htm).
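The two steps described above, scaling category rates to 100 and testing a proportion against a reference with the normal-theory method, can be sketched as follows. This is an illustration only: the function names, the example rates, and the choice of `n = 100` as the denominator are assumptions, and the study itself used OpenEpi for these tests.

```python
import math


def scale_to_100(rates):
    """Rescale category rates so that they sum to 100."""
    total = sum(rates.values())
    return {name: 100.0 * value / total for name, value in rates.items()}


def one_sample_proportion_test(p_obs, p_ref, n):
    """Two-sided normal-theory test of an observed proportion against a reference."""
    se = math.sqrt(p_ref * (1.0 - p_ref) / n)
    z = (p_obs - p_ref) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical regional rates by category (per 100,000).
regional = scale_to_100({"stroke": 150.0, "metabolic": 90.0, "high_sbp": 60.0})
# Hypothetical global proportion for the stroke category.
z, p_value = one_sample_proportion_test(regional["stroke"] / 100.0, 0.40, n=100)
print(regional, round(z, 3), round(p_value, 4))
```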
Population structure analysis
To evaluate the ethnogenetic perspective of stroke and its comorbid conditions, we considered the risk variants associated with each disease. The risk variants were obtained from the GWAS Catalog (https://www.ebi.ac.uk/gwas/home) during the period November 2021 to August 2022. The trait IDs used to retrieve data from the GWAS Catalog for the different diseases were EFO_0000712 (stroke), EFO_0001645 (IHD), EFO_0000537 (high SBP), MONDO_0005148 (T2D), MONDO_0005147 (T1D), EFO_0003884 (CKD), EFO_0007041 (high BMI), and EFO_0004611 (high LDL). The positions of the risk variants in the GRCh37 assembly were obtained, and variants less than 10,000 bp apart were excluded using custom Perl scripts. The total number of risk variants thus obtained for each disease is shown in Supplementary file 1. The genotypes of the identified biallelic autosomal SNPs in unrelated individuals were extracted from the 1000 Genomes VCF files (https://www.internationalgenome.org/; Lusis and Pajukanta, 2008). The proportion of ancestral populations in each individual was estimated from their genotype using a model-based clustering approach (Pritchard et al., 2000). The admixture model with correlated allele frequencies was specified to cluster the individuals into either five clusters or, in the case of BMI, three clusters. The genotypes were converted to eigenvectors using principal component analysis (Purcell and Chang, 2023; Chang et al., 2015).
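The distance-based exclusion of nearby variants can be sketched as below. The study used custom Perl scripts whose exact rule is not given, so this greedy version (keep a variant only if it lies at least 10,000 bp from the previously kept variant on the same chromosome) is one possible interpretation. The function name and the variant identifiers are hypothetical.

```python
def thin_variants(variants, min_distance=10_000):
    """Keep variants at least `min_distance` bp from the last kept variant
    on the same chromosome. `variants` is an iterable of (chrom, pos, rsid)."""
    kept = []
    last_pos = {}
    for chrom, pos, rsid in sorted(variants, key=lambda v: (v[0], v[1])):
        if chrom not in last_pos or pos - last_pos[chrom] >= min_distance:
            kept.append((chrom, pos, rsid))
            last_pos[chrom] = pos
    return kept


# Hypothetical variants; identifiers are placeholders, not real rsIDs.
variants = [
    ("1", 5_000, "var_B"),
    ("1", 1_000, "var_A"),
    ("1", 12_000, "var_C"),
    ("2", 3_000, "var_D"),
]
thinned = thin_variants(variants)
print([rsid for _, _, rsid in thinned])
```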
Shared and unique risk variants among ethnicities
The numbers of GWAS risk variants for stroke that are shared among, or unique to, the five super-populations in 1000 Genomes (African, East Asian, South Asian, European, and American) were obtained (Figure 6 and Supplementary file 1). A risk variant was considered to be present in a population if its alternate allele frequency in 1000 Genomes was greater than or equal to 0.05. For each GWAS risk variant, the gene variant map was obtained, and the genes shared among the populations were estimated (Supplementary file 1).
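The presence rule above (alternate allele frequency of at least 0.05) lends itself to a short sketch that separates variants unique to one super-population from those present in all of them. The function names, variant identifiers, and allele frequencies are hypothetical.

```python
def present_populations(freqs, threshold=0.05):
    """Populations whose alternate allele frequency meets the threshold."""
    return {pop for pop, af in freqs.items() if af >= threshold}


def classify_variants(allele_freqs, threshold=0.05):
    """Return (variants unique to one population, variants present in all)."""
    unique, shared_all = {}, []
    for rsid, freqs in allele_freqs.items():
        present = present_populations(freqs, threshold)
        if len(present) == 1:
            unique.setdefault(next(iter(present)), []).append(rsid)
        elif len(present) == len(freqs):
            shared_all.append(rsid)
    return unique, shared_all


# Hypothetical alternate allele frequencies in the five super-populations.
allele_freqs = {
    "var_1": {"AFR": 0.20, "AMR": 0.15, "EAS": 0.30, "EUR": 0.25, "SAS": 0.22},
    "var_2": {"AFR": 0.00, "AMR": 0.00, "EAS": 0.00, "EUR": 0.00, "SAS": 0.07},
    "var_3": {"AFR": 0.10, "AMR": 0.01, "EAS": 0.00, "EUR": 0.02, "SAS": 0.06},
}
unique, shared_all = classify_variants(allele_freqs)
print(unique, shared_all)
```

Variants present in some but not all populations (such as `var_3` here) fall into neither group in this sketch; a fuller version would record every population combination.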
Calculation of LD of risk variants of stroke in South Asia
Proxy variants ($R^2 > 0.01$) within ±500 kb of each stroke risk variant present in the South Asian population were obtained from LDlink (Machiela and Chanock, 2015). Among the proxy variants, those with MAF less than 0.1 were selected along with the risk variant as low-frequency variants, and those with MAF greater than 0.1 were termed common variants. LD between two alleles A and B is quantified using the coefficient of LD, $D_{AB}$, calculated as $D_{AB} = p_{AB} - p_A p_B$, where $p_i$ represents the frequency of allele $i$ or haplotype $i$. To compare the level of LD between different pairs of alleles, $D$ is normalized as $D' = D / D_{\max}$, where $D_{\max} = \max\{-p_A p_B, -(1 - p_A)(1 - p_B)\}$ when $D < 0$ and $D_{\max} = \min\{p_B(1 - p_A), p_A(1 - p_B)\}$ when $D > 0$ (Lewontin, 1964). Estimates of $D'$ were calculated separately for low-frequency variants and common variants, and plotted using the R library gaston (Perdry and Dandine-Roulland, 2022).
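The LD coefficient and its normalization (Lewontin, 1964) can be computed directly from the haplotype and allele frequencies. The sketch below is illustrative only; the study used the R library gaston, and the function name and frequencies here are hypothetical.

```python
def ld_d_prime(p_ab, p_a, p_b):
    """Return (D, D') for alleles A and B given the AB haplotype frequency
    and the two allele frequencies."""
    d = p_ab - p_a * p_b
    if d < 0:
        d_max = max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
    else:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    # Guard against monoallelic sites, where D_max is zero.
    d_prime = d / d_max if d_max != 0 else 0.0
    return d, d_prime


# Hypothetical low-frequency allele A (5%) always carried with allele B (50%).
d, d_prime = ld_d_prime(p_ab=0.05, p_a=0.05, p_b=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.2f}")
```

Because D and D_max share the same sign under this definition, D′ ranges from 0 to 1.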
Materials and correspondence
The datasets analyzed during the current study are available in the GBD database, at https://vizhub.healthdata.org/gbd-results/?params=gbd-api-2019-permalink/fe8b05e8222bcf3ec3af555762006f2a.
All other materials, including computer code, will be made available upon request to the corresponding author.
Data availability
The datasets analyzed during the current study are available in the GBD database.
References
- Stroke risk factors, genetics, and prevention. Circulation Research 120:472–495. https://doi.org/10.1161/CIRCRESAHA.116.308398
- Phospholipase A2 biochemistry. Cardiovascular Drugs and Therapy 23:49–59. https://doi.org/10.1007/s10557-008-6132-9
- PCSK6-mediated corin activation is essential for normal blood pressure. Nature Medicine 21:1048–1053. https://doi.org/10.1038/nm.3920
- Software. BAS: Bayesian Variable Selection and Model Averaging Using Bayesian Adaptive Sampling, R package version 1.6.4.
- Optogenetic identification of a rapid eye movement sleep modulatory circuit in the hypothalamus. Nature Neuroscience 16:1637–1643. https://doi.org/10.1038/nn.3522
- Common variants at 30 loci contribute to polygenic dyslipidemia. Nature Genetics 41:56–65. https://doi.org/10.1038/ng.291
- Optogenetic stimulation of MCH neurons increases sleep. The Journal of Neuroscience 33:10257–10263. https://doi.org/10.1523/JNEUROSCI.1225-13.2013
- Risk factors for aneurysmal subarachnoid hemorrhage in an Indian population. Cerebrovascular Diseases 29:268–274. https://doi.org/10.1159/000275501
- Software. Rcan: Cancer Registry Data Analysis and Visualisation, R package version 1.3.82.
- The association between paired basic amino acid cleaving enzyme 4 gene haplotype and diastolic blood pressure. Chinese Medical Journal 117:382–388.
- A treasure trove for lipoprotein biology. Nature Genetics 40:129–130. https://doi.org/10.1038/ng0208-129
- Genome-wide association studies: potential next steps on a genetic journey. Human Molecular Genetics 17:R156–R165. https://doi.org/10.1093/hmg/ddn289
- Ethnic differences in incidence and mortality of stroke in Denmark. European Journal of Public Health 30:ckaa165. https://doi.org/10.1093/eurpub/ckaa165.346
- Recent progress in phospholipase A₂ research: from cells to animals to humans. Progress in Lipid Research 50:152–192. https://doi.org/10.1016/j.plipres.2010.12.001
- Arachidonic acid stimulates glucose uptake in 3T3-L1 adipocytes by increasing GLUT1 and GLUT4 levels at the plasma membrane. Evidence for involvement of lipoxygenase metabolites and peroxisome proliferator-activated receptor gamma. The Journal of Biological Chemistry 276:9149–9157. https://doi.org/10.1074/jbc.M009817200
- Software. Gaston: Genetic Data Handling (QC, GRM, LD, PCA) & Linear Mixed Models, R package version 1.5.9.
- Software. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- Cellular localization and role of prohormone convertases in the processing of pro-melanin concentrating hormone in mammals. The Journal of Biological Chemistry 274:6536–6545. https://doi.org/10.1074/jbc.274.10.6536
- ALDH2 genetic polymorphism and the risk of type II diabetes mellitus in CAD patients. Hypertension Research 33:49–55. https://doi.org/10.1038/hr.2009.178
Article and author information
Funding
Kerala State Council for Science, Technology and Environment
- Rashmi Sukumaran
Rajiv Gandhi Centre for Biotechnology, Department of Biotechnology, Ministry of Science and Technology, India
- Moinak Banerjee
The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.
Acknowledgements
RS acknowledges the support of Kerala State Council for Science, Technology and Environment (KSCSTE) for providing the research fellowship. RS and ASN acknowledge the SIUCEB support at the Department of Computational Biology and Bioinformatics, University of Kerala for providing the necessary facilities to carry out the work. MB acknowledges the Department of Biotechnology for providing intra-mural support to carry out the work.
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.94088. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2024, Sukumaran et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.