Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture

  1. Haiko Schurz (corresponding author)
  2. Vivek Naranbhai
  3. Tom A Yates
  4. James J Gilchrist
  5. Tom Parks
  6. Peter J Dodd
  7. Marlo Möller
  8. Eileen G Hoal
  9. Andrew P Morris
  10. Adrian VS Hill
  11. International Tuberculosis Host Genetics Consortium
  1. DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, South Africa
  2. Wellcome Centre for Human Genetics, University of Oxford, United Kingdom
  3. Massachusetts General Hospital, United States
  4. Dana-Farber Cancer Institute, United States
  5. Centre for the AIDS Programme of Research in South Africa, South Africa
  6. Harvard Medical School, United States
  7. Division of Infection and Immunity, Faculty of Medical Sciences, University College London, United Kingdom
  8. Department of Paediatrics, University of Oxford, United Kingdom
  9. Department of Infectious Diseases Imperial College London, United Kingdom
  10. School of Health and Related Research, University of Sheffield, United Kingdom
  11. Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, United Kingdom
  12. Jenner Institute, University of Oxford, United Kingdom

Abstract

The heritability of susceptibility to tuberculosis (TB) disease has been well recognized. Over 100 genes have been studied as candidates for TB susceptibility, and several variants were identified by genome-wide association studies (GWAS), but few replicate. We established the International Tuberculosis Host Genetics Consortium to perform a multi-ancestry meta-analysis of GWAS, including 14,153 cases and 19,536 controls of African, Asian, and European ancestry. Our analyses demonstrate a substantial degree of heritability (pooled polygenic h² = 26.3%, 95% CI 23.7–29.0%) for susceptibility to TB that is shared across ancestries, highlighting an important host genetic influence on disease. We identified one global host genetic correlate for TB at genome-wide significance (p<5 × 10⁻⁸) in the human leukocyte antigen (HLA)-II region (rs28383206, p-value=5.2 × 10⁻⁹) but failed to replicate variants previously associated with TB susceptibility. These data demonstrate the complex shared genetic architecture of susceptibility to TB and the importance of large-scale GWAS analysis across multiple ancestries experiencing different levels of infection pressure.

Editor's evaluation

This article describes an important multi-ancestry meta-analysis of genome-wide association studies of susceptibility to tuberculosis. It demonstrates substantial heritability from common genetic variants, although this varies across studies. The main finding of the article is a variant in the HLA region that affects tuberculosis risk, for which the evidence is solid. The results and methods will be of interest to infectious disease researchers and human genetics researchers. The article highlights both the promise and challenges of performing multi-ancestry genetic association studies of infectious disease risk.

https://doi.org/10.7554/eLife.84394.sa0

Introduction

Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb) and related species, remains a leading cause of death globally. Around one-quarter of the global population is estimated to show immunological evidence of prior exposure to Mtb (Houben and Dodd, 2016), and in 2019 an estimated 10 million people developed the disease, resulting in 1.4 million deaths (WHO, 2020). This disease burden could be substantially reduced with action to address the social determinants of disease and equitable scale-up of existing interventions. However, tools to prevent, diagnose, and treat TB could be improved if a better understanding of the underpinning pathophysiology could help identify those at greatest risk of the disease.

The role of host genetic factors in TB susceptibility has long been of significant interest. Over 100 candidate genes have been studied, but few associations have proven reproducible (Naranbhai, 2016). This failure to replicate may be a result of the modest size of many TB genome-wide association studies (GWAS), variability in phenotyping between studies, the impact of population-specific effects, the challenge of complex population structure in some high-burden settings (e.g., admixed individuals), and, possibly, pathogen variation (Correa-Macedo et al., 2019; Daya et al., 2014a; Luo et al., 2019; Möller and Kinnear, 2020; Müller et al., 2021; Omae et al., 2017; Schurz et al., 2018). Seventeen GWAS have been reported but only two loci replicate between studies (Daya et al., 2014a; Schurz et al., 2018; Chimusa et al., 2014; The Wellcome Trust Case Control Consortium, 2007; Curtis et al., 2015; Mahasirimongkol et al., 2012; Qi et al., 2017; Thye et al., 2010; Thye et al., 2012; Quistrebert et al., 2021; Sveinbjornsson et al., 2016; Hong et al., 2017; Li et al., 2021; Luo et al., 2019; Zheng et al., 2018; Grant et al., 2016; Png et al., 2012). The WT1 locus, identified in cohorts from Ghana and Gambia, replicated in South Africa and Russia. The ASAP1 locus identified in Russia was replicated through reanalysis of prior studies (Correa-Macedo et al., 2019; Möller and Kinnear, 2020).

To address these challenges, we established the International Tuberculosis Host Genetics Consortium (ITHGC) to study the host genetics of disease through collaborative and equitable data sharing (Naranbhai, 2016). The ITHGC includes 12 case–control GWAS from nine countries in Europe, Africa, and Asia (total of 14,153 pulmonary TB cases and 19,536 healthy controls). Inclusion of multiple ancestral groups in a multi-ancestry meta-analysis has the advantage of maximizing power and enhancing fine-mapping resolution to identify true globally associated variants that influence TB susceptibility across population groups.

Here we present the first analyses of the ITHGC dataset exploring host genetic correlates of TB susceptibility using a multi-ancestry meta-analysis approach, including fine-mapping of human leukocyte antigen (HLA) loci and estimation of genetic heritability.

Results

Study overview

In total, 12 GWAS from three major ancestral groups (European, African, and Asian) were included in this study (Table 1; a more detailed table outlining the selection of cases and controls is provided in Supplementary file 1a). All individual datasets were imputed and aligned to the same reference allele before association testing, using an additive genetic model, to obtain odds ratios (OR) and p-values to be used in the meta-analysis. For each individual study (for which we had raw genotyping data), the polygenic heritability was estimated, and HLA alleles were imputed for fine-mapping of the HLA regions.

Table 1
Summary of ITHGC TB-GWAS datasets.
| Dataset | Population | Cases/controls | TB prevalence per 100,000 pa | Estimated proportion of controls ever exposed to Mtb (±SD)* | #SNPs | Genotyping platform | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| China 1 | Asian | 483/587 | 89 | 0.302 (0.101) | 7,710,153 | Affymetrix Genome-Wide Human SNP Array 6.0 | thye@bni-hamburg.de (unpublished) |
| China 2 | Asian | 1290/1145 | 89 | 0.302 (0.101) | 9,769,029 | Illumina Human OmniZhonghua-8 chips | magdakellis@gmail.com (unpublished) |
| China 3 | Asian | 972/1537 | 89 | 0.302 (0.101) | 9,726,450 | Illumina Human OmniZhonghua-8 chips | Qi et al., 2017 |
| Thailand | Asian | 433/295 | 236 | 0.404 (0.112) | 6,723,358 | Illumina Human610-Quad | Mahasirimongkol et al., 2012 |
| Japan | Asian | 751/3199 | 23 | 0.142 (0.125) | 9,051,051 | Illumina HumanHap550 | Mahasirimongkol et al., 2012 |
| Russia | European | 5914/6022 | 109 | 0.191 (0.093) | 10,878,777 | Affymetrix Genome-Wide Human SNP Array 6.0 | Curtis et al., 2015 |
| Estonia | European | 239/7047 | 13 | 0.116 (0.093) | 10,611,556 | Illumina 370K | andres.metspalu@ut.ee (unpublished) |
| Germany | European | 586/333 | 7.8 | 0.067 (0.081) | 10,602,193 | Illumina Omni2.5+exome | thye@bni-hamburg.de (unpublished) |
| Gambia | African | 1316/1382 | 126 | 0.280 (0.089) | 18,634,017 | Affymetrix GeneChip 500K | The Wellcome Trust Case Control Consortium, 2007 |
| Ghana | African | 1359/1952 | 282 | 0.539 (0.198) | 19,029,214 | Affymetrix Genome-Wide Human SNP Array 6.0 | Thye et al., 2010 |
| RSA(A) | African | 19/577 | 717 | 0.436 (0.127) | 9,227,330 | Affymetrix 500k | Daya et al., 2014b |
| RSA(M) | African | 410/405 | 717 | 0.436 (0.127) | 11,371,838 | Illumina MEGA array | Schurz et al., 2018 |
  1. GWAS, genome-wide association studies; ITHGC, International Tuberculosis Host Genetics Consortium; Mtb, Mycobacterium tuberculosis; TB, tuberculosis.

  2. * Estimated proportion of control individuals ever infected with Mtb by age 35–44 in 2010, based on data from Houben & Dodd.

  3. Raw genotyping data available.

  4. RSA(A/M): South African admixed population (RSA) Affymetrix (A) and MEGA (M) array data.

The summary statistics from the individual GWAS of each dataset were used to conduct a combined, multi-ancestry meta-analysis using MR-MEGA and ancestry-specific (European, African, and Asian) fixed effects (FE) meta-analyses using GWAMA. Finally, the impact of infection pressure on the multi-ancestry meta-regression was assessed and the concordance in direction of effect for the reference allele between studies was investigated.
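To make the ancestry-specific fixed-effects step concrete, the following minimal sketch pools per-study odds ratios by inverse-variance weighting on the log scale. It is a generic illustration of a fixed-effects meta-analysis and is not GWAMA's actual implementation; the function name and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effects_meta(odds_ratios, standard_errors):
    """Pool per-study odds ratios with inverse-variance weights.

    odds_ratios: per-study OR for the same effect allele.
    standard_errors: per-study standard error of log(OR).
    Returns (pooled OR, lower 95% bound, upper 95% bound, two-sided p-value).
    """
    betas = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    lower = math.exp(beta - 1.96 * se)
    upper = math.exp(beta + 1.96 * se)
    return math.exp(beta), lower, upper, p_value


# Hypothetical inputs for three studies (not values from this article).
pooled = fixed_effects_meta([0.85, 0.90, 0.95], [0.05, 0.05, 0.05])
print(pooled)
```

With equal standard errors the pooled estimate reduces to the geometric mean of the study odds ratios, which is a quick sanity check on the weighting.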

Polygenic heritability estimates suggest a genetic contribution to TB disease susceptibility

Twin studies estimate the narrow-sense heritability of susceptibility to TB at up to 80% (Diehl and Von, 1936; Kallmann and Reisner, 1943; Comstock, 1978), but there are few modern estimates. Using raw (unimputed) genotyping data, and assuming population prevalence of disease in each study population equivalent to the reported WHO prevalence rates for that country (WHO, 2020), we estimated polygenic heritability of susceptibility to TB in 10 contributing studies; the estimates ranged from 5 to 36% (average of 26.3%, Supplementary file 1b). Comparisons of the heritability estimates between studies from different geographical locations do not take into consideration the differences in environmental pressures between the included studies, and as such these estimates of heritability are only interpretable if the distribution of nongenetic determinants of TB is held constant (Pearce, 2011). Furthermore, variations in phenotype definition can have an impact on heritability estimates (Supplementary file 1a). This is supported by previous research in which significant differences in polygenic heritability estimates were identified between subjects with latent TB infection (LTBI), subjects with active TB, and subjects classified as resisters (McHenry et al., 2021a). As this study includes data with varying methods of classifying TB cases and healthy controls (Supplementary file 1a), there is potential for a degree of heterogeneity and misclassification (between cases and controls) that can have an impact on the heritability estimates. Recent history has seen the near elimination of TB in several countries in association with economic development and public health action. However, while improvement of socioeconomic standing and environment has a stronger impact than host genetics, these crude estimates of polygenic heritability do indicate that TB susceptibility is, in part, heritable. These results require future, more rigorous investigations to narrow down the level of heritable risk and pinpoint genomic loci involved by accounting for population stratification to obtain more accurate heritability estimates.
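Because these estimates depend on an assumed population prevalence, it may help to see how prevalence enters the calculation. The sketch below applies a commonly used transformation from observed-scale to liability-scale heritability for case–control data. Whether the software used in this study applies exactly this formula is an assumption, and the function name and input values are hypothetical.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale case-control heritability to the liability scale.

    prevalence: assumed population prevalence K (0 < K < 1).
    case_fraction: proportion of cases P in the study sample (0 < P < 1).
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)  # normal density at the liability threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Hypothetical example: 1% prevalence with equal numbers of cases and controls.
example = liability_scale_h2(0.10, 0.01, 0.5)
print(round(example, 3))
```

The transformation is linear in the observed-scale estimate, so any uncertainty in the assumed prevalence propagates directly into the reported heritability.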

Multi-ancestry meta-analysis identifies susceptibility loci for TB

For the primary multi-ancestry meta-analysis, MR-MEGA was used as it allows for differences in allelic effects of variants on disease risk between GWAS. Principal components (PCs), derived from a matrix of similarities in allele frequencies between GWAS, were plotted and revealed distinct separation between the three main ancestral groups included in the study (Figure 4). To account for this, the first two PCs were included as covariates in MR-MEGA as they sufficiently accounted for the allele frequency differences between the study populations, as assessed via a QQ-plot and associated lambda inflation value (Figure 1—figure supplement 1, lambda = 1.00). In total, 26,620,804 variants with a minor allele frequency (MAF) > 1% and present in at least three studies were included in the analysis, of which 3,184,478 were present in all 12 datasets.
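For readers unfamiliar with the lambda inflation value, the sketch below shows one standard way to compute it from a list of association p-values: the median 1-df chi-square statistic divided by its expected median under the null. This is an illustrative calculation, not the MR-MEGA code, and the evenly spaced p-values are synthetic.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median 1-df chi-square statistic divided by its expected null median."""
    std_normal = NormalDist()
    # Each two-sided p-value (strictly between 0 and 1) maps back to a chi-square.
    chi_squares = [std_normal.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi_squares) / expected_median


# Evenly spaced p-values mimic a well-calibrated null distribution (synthetic data).
null_p = [(i + 0.5) / 1001 for i in range(1001)]
lam = genomic_inflation(null_p)
print(round(lam, 3))
```

Values close to 1 indicate that the test statistics are not systematically inflated, which is the property the QQ-plot assesses visually.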

A significant association peak on chromosome 6 was identified in the HLA class II region (Figure 1). One variant (rs28383206, OR = 0.89, CI = 0.84–0.94, p-value=8.26 × 10⁻⁹) within this peak was associated with susceptibility to TB at genome-wide significance (p<5.0e–8, Figures 1–3, Table 2). Both the residual heterogeneity (p-value=0.012) and ancestry-correlated heterogeneity (p-value=5.28e–6) are significant (p-value<0.05) for the associated variant. However, the evidence of ancestry-correlated heterogeneity is much stronger than for residual heterogeneity, indicating that genetic ancestry contributes more to differences in effect sizes between GWAS than does study design (e.g., phenotyping differences and potential case–control misclassification). The association peak encompasses many HLA-II genes, including HLA-DRB1/5 (major histocompatibility complex, class II, DR beta 1/5), HLA-DQA1 (major histocompatibility complex, class II, DQ alpha 1), and HLA-DQB3 (major histocompatibility complex, class II, DQ beta 3, Figures 1 and 2). While not reaching genome-wide significance, the HLA class I locus is also indirectly tagged through the association with rs2621322, in the TAP2 (transporter 2, ATP binding cassette subfamily B member) gene, a transporter protein that restores surface expression of MHC class I molecules and has previously been implicated in TB susceptibility (Thu et al., 2016). HLA-A, DQA1, DQB1, DRB1, and TAP2 genes have previously been linked to TB susceptibility through TB candidate gene and GWAS analysis (Thu et al., 2016; Kinnear et al., 2017; Stein et al., 2017; Sveinbjornsson et al., 2016; Zhang et al., 2021). The HLA-II locus encodes several proteins crucial in antigen presentation, including HLA-DR, HLA-DQ, and HLA-DP, which are widely implicated in susceptibility to infection and autoimmunity (Kelly and Trowsdale, 2019; Shiina et al., 2009).

Figure 1 with 5 supplements
Manhattan plot of p-values (more than three studies) from the MR-MEGA analysis of all 12 datasets with genomic control reveals one significant association in the HLA-II region of chromosome 6 (rs28383206).

Image produced using R scripts provided by MR-MEGA (Mägi et al., 2017), and source data file has been uploaded to https://doi.org/10.5061/dryad.6wwpzgn2s.

Figure 2
Regional association plot for the chromosome 6 HLA-II rs28383206 association in the multi-ancestry analysis revealing a significant peak in the HLA-II region.

Image produced using the online LocusZoom database with linkage disequilibrium (LD) mapping set to ‘all’ and p-values>0.01 removed (Boughton et al., 2021), and source data file has been uploaded to https://doi.org/10.5061/dryad.6wwpzgn2s.

Figure 3 with 1 supplement
HLA conditioning analysis.

(A) Forest plot (odds ratio and 95% confidence interval) of the significant chromosome 6 association (rs28383206) for tuberculosis (TB) susceptibility in the multi-ancestry analysis, implemented using MR-MEGA with genomic control correction (GCC). Of the 12 studies included, 8 contained this variant. Studies that did not contain the variant are included in the plot but do not have results associated with them. (B) Forest plot for HLA DQA1*02:01 for the eight studies included in the HLA association analysis. Other studies included were obtained from literature searches of previous studies where HLA imputation and association studies were performed (Sveinbjornsson et al., 2016; Li et al., 2021; Zheng et al., 2018). For source data, see Figure 3—source data 1.

Table 2
Significant and suggestive associations (p-value ≤1e–5) for the multi-ancestry analysis including data from all 12 datasets implementing MR-MEGA analysis with GCC.
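| Marker name | Chromosome | Position | Gene | Location | CADD score | EA | NEA | EAF | Sample size | Datasets | p-Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs28383206 | 6 | 32575167 | HLA-DRB1 | Intergenic | 7.6 | G | A | 0.168 | 25,059 | 8 | 8.26e–09 |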
  1. GCC, genomic control correction; EA, effect allele; EAF, effect allele frequency; NEA, noneffect allele.

HLA-II

Given the strong association peak in the HLA-II locus (Figures 1 and 2), we imputed HLA-II alleles to fine-map this association. HLA alleles were imputed using the HIBAG R package that utilizes both genotyping array and population-specific reference panels to obtain the most accurate imputations for each individual dataset. Association testing was then conducted using an additive genetic model for each individual dataset before meta-analyzing the results (Source data 1, sheets 11–15).

Notwithstanding inconsistency across populations, the strongest signal in the combined global analyses is at DQA1*02:01, revealing a protective effect (OR = 0.88, 95% CI = 0.82–0.93, p-value=1.3e–5, Figure 3B). The signal remains apparent in the six populations with the lead SNP at MAF > 2.5% and individual-level data available (p-value=0.0003). After conditioning on the lead SNP (rs28383206) in this subset, there is no residual significant association at DQA1*02:01 (p-value=0.44, Figure 3—figure supplement 1), suggesting that the classical allele is tagging the rs28383206 association. This is consistent with previous HLA analyses in Icelandic (DQA1*02:01: OR = 0.82, p-value=7.39e–4) and Han Chinese populations (DQA1*02:01: OR = 0.82, p-value=7.39e–4), although the allele showed the opposite direction of effect in another Chinese population (DQA1*02:01: OR = 1.28, p-value=0.0193, Figure 3B; Sveinbjornsson et al., 2016; Li et al., 2021; Zheng et al., 2018).

The significant HLA associations overlap with the association peak observed in the multi-ancestry meta-analysis (Figure 2) but show more consistency in the direction of effects between the input studies compared to the lead SNPs identified in the association peak. This suggests that the rs28383206 association in the meta-analysis is tagging an HLA allele, where the different linkage disequilibrium (LD) patterns from the included ancestral populations result in the differences in effect sizes between populations at the rs28383206 association.

This variation in significant associations is, in part, attributable to the observed variation in HLA allele frequencies across all the included studies and may also reflect differential tagging of at least one unknown causal variant across populations (Source data 1, sheets 16–22).

The variable role of classical HLA alleles in different populations could be partially due to unique infectious pressures that each geographical region faces and could also explain why different strains of Mtb are more or less prevalent in different regions as they adapted to the HLA profile of the population within this region. Sequencing efforts of global mycobacterial isolates find hyperconservation of class II epitopes, suggesting pathogen advantage achieved through limiting HLA-II recognition and highlighting the potentially complex interplay between pathogen and host evolution in modifying class II presentation in TB infection (Comas et al., 2010). Previous work has shown evidence of interaction between genetic variants of the host and specific strains of Mtb in Ghanaian, Ugandan, South African, and Asian populations (Möller and Kinnear, 2020; Müller et al., 2021; Correa-Macedo et al., 2019; Salie et al., 2014; Luo et al., 2015; Wampande et al., 2019; Micheni et al., 2021; McHenry et al., 2021b; McHenry et al., 2020). These interactions provide further evidence that Mtb may have undergone substantial genetic evolution, in concert with host migration and evolution of different populations (Comas et al., 2013; Coscolla and Gagneux, 2014). Some studies suggest that HLA-II epitopes may have undergone regional mutations that modify HLA-II binding, and we speculate that the heterogeneity observed in HLA-II associations between regions may, at least in part, be accounted for by different pressures exerted by varying strains of Mtb (Copin et al., 2016).

Impact of infection pressure on meta-regression

To further understand the heterogeneity across populations, we attempted to account for variation in levels of prior exposure that could serve to mask host effects given that not all controls will have been exposed to Mtb. In low transmission settings, more susceptible but unexposed individuals would be included as controls, who, had they been exposed to Mtb, might have progressed to TB disease. Overall, including each cohort’s estimated prevalence of prior exposure had a significant impact on the residual heterogeneity and association statistics of 5% of the variants included in the meta-analysis (419,460/8,355,367), which at a significance level of p-value<0.05 is what is to be expected purely by chance. Separating the results into bins according to p-values revealed that the bins where the covariate had the biggest impact were for p-values in the range of 1e–3 to 1e–5 (Figure 1—figure supplement 2), while significant and suggestive associations reported in this study did not show any significant changes in residual heterogeneity. While the proportion of variants significantly impacted when correcting for infection pressures is low and has the biggest impact on variants with larger p-values, there was still an overall reduction in the chi-square value for the residual heterogeneity (mean chi-square value reduced by 10). This suggests that, although accounting for potential lifetime infections explains some of the observed residual heterogeneity, it is most likely not the main driving force behind these residuals.

When considering the impact of force of infection, it is important to consider not only the proportion of controls ever exposed but also the impact of recurrent exposure. There is some evidence to suggest that genetic barriers to progression to TB may be overcome if the infectious dose is high (Fox et al., 1929). Repeated exposure may be observed where TB prevalence is high, as in South Africa, and could contribute to the overall lower effect sizes observed in the GWAS enrolling RSA people. Inclusion of potential lifetime infections in meta-regression could help adjust for these effects and prove useful for not only TB, but meta-analysis of infectious diseases in general, and should be further explored.

Other suggestive loci that did not reach significance

There were four loci with suggestive associations and strong peaks on the Manhattan plot (Figure 1) that did not reach significance but should still be considered as potential variants of interest (Supplementary file 1c). One chromosome 9 peak (rs4576509, p-value=7.40e–07) is intergenic (Figure 1—figure supplement 3) while the second (rs6477824, p-value=2.99e–07) is located in the 5′-UTR region of the zinc finger protein 483 (ZNF483) gene (Figure 1—figure supplement 3), previously associated with age at menarche (Demerath et al., 2013; Elks et al., 2010). The chromosome 11 peak (rs12362545, p-value=1.24e–06) is located in the PPFIA binding protein 2 (PPFIBP2) gene (Figure 1—figure supplement 4), which plays a role in axon guidance and neuronal synapse development and has previously been implicated in cancer development (Colas et al., 2011; Wu et al., 2018). The final peak (rs35787595, p-value=5.41e–06), on chromosome 16 (Figure 1—figure supplement 5), is located in the craniofacial development protein 1 (CFDP1) gene region, which is involved in chromatin organization (Messina et al., 2017). These genes have not been previously linked to TB susceptibility and a potential role is unclear; further validation of these variants is therefore needed before any conclusions on their impact on TB susceptibility can be drawn.

Ancestry-specific meta-analysis

Concordance in the direction of effects of the risk allele between the ancestry-specific meta-analyses was examined to determine whether significant enrichment (above the expected 50%) exists at different p-value thresholds. Significant enrichment in the concordance of direction of effect was only observed when using the European ancestry as reference compared to the African meta-analysis results for SNPs with p-values>0.001 and <0.01 (p-value=0.0061, Supplementary file 1d). The lack of enrichment between the ancestries suggests significant ancestry-specific associations, which could be further compounded by the differences in local infection pressures. Due to the lack of concordance and the separation of the ancestral populations in the principal component analysis (PCA) plot (Figure 4), ancestry-specific meta-analysis was done.
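The idea of testing for enrichment above the expected 50% concordance can be sketched as a one-sided exact binomial test. The snippet below is only an illustration: that the study used this particular test is an assumption, the test treats SNPs as independent, and the counts are hypothetical.

```python
from math import comb


def concordance_p_value(n_concordant, n_total):
    """One-sided exact binomial p-value for concordance above the 50% expected by chance."""
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical counts: 60 of 100 SNPs share the direction of effect in both ancestries.
p_enrichment = concordance_p_value(60, 100)
print(p_enrichment)
```

In practice SNPs in LD are not independent, so a test like this would normally be applied to an LD-pruned set of variants.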

Figure 4 with 4 supplements
Principal component analysis (PCA) plot of all 12 studies based on the MR-MEGA mean pairwise genome-wide allele frequency differences.

Image produced using the R plot function. For source data, see Figure 4—source data 1.

The PCA plot (Figure 4) for the 12 studies (based on mean pairwise genome-wide allele frequency differences calculated by MR-MEGA) illustrates distinct separation between the three major population groups (Asia, Europe, and Africa). The separation observed between the African studies (Gambia/Ghana and RSA) is due to the high level of admixture in the RSA population. The RSA population is a five-way admixed South African population with genetic contributions from Bantu-speaking African, KhoeSan, European, and South and South East Asian populations, which explains the observed shift in the PCA plot (Daya et al., 2013; Figure 4).

QQ-plots for the ancestry-specific analysis show no significant inflation or deflation. After removing associations without any clear peaks on the Manhattan plots (associations driven by a single study), we found no significant associations for the ancestry-specific analysis. However, suggestive peaks that did not reach genome-wide significance were identified in the European and Asian ancestry-specific analyses (Figure 4—figure supplements 1 and 2, Supplementary file 1e). Potential causes for the lack of associations and suggestive peaks in the African analysis (Figure 4—figure supplement 3) are the increased genetic diversity within Africa, the inclusion of admixed samples (RSA), and the smaller sample size compared to the other ancestry-specific meta-analysis. While power can be increased through inclusion of greater genetic diversity, between-subpopulation differences in allele frequency can introduce confounding. Confounding by genetic background can result in both spurious associations and the masking of true associations. Such confounding may explain why the results observed elsewhere may not replicate in admixed samples. Removing the admixed data and analyzing only the Gambian and Ghanaian datasets also did not produce any significant results, although, clearly, the sample size was smaller.

For the European analysis (Figure 4—figure supplement 1), suggestive peaks were identified on chromosomes 6 (rs28383206, p-value=7.06e–08), 8 (rs3935174, p-value=1.00e–06), and 11 (rs12362545, p-value=1.06e–07, Supplementary file 1e), while the Asian (Figure 4—figure supplement 2) analysis identified suggestive peaks on chromosome 6 (rs146049519, p-value=1.06e–06) and 8 (rs62495207, p-value=5.10e–06, Supplementary file 1e).

The suggestive peaks on chromosomes 6 and 11 in the European subgroup analysis overlap with the suggestive peaks of the multi-ancestry meta-analysis (Figure 1, Figure 4—figure supplement 4, Supplementary file 1e), but the suggestive peak on chromosome 8 is unique to this population (Figure 4—figure supplement 1, Supplementary file 1e). The strongest signal for this peak (rs3935174, OR = 0.87, p-value=1.00e–6) is located in the ArfGAP with SH3 domain, ankyrin repeat, and PH domain 1 (ASAP1) region, which encodes an ADP-ribosylation factor (ARF) GTPase-activating protein and is potentially involved in the regulation of membrane trafficking and cytoskeleton remodeling (Brown et al., 1998). Variants in ASAP1 (rs4733781 and rs10956514) have previously been linked to TB susceptibility in a TB-GWAS analysis of the same Russian population included here (Curtis et al., 2015). While these ASAP1 variants were present in all 12 studies and had consistent direction of effects, they presented with a strong signal in the European ancestry-specific analysis only (African and Asian p-values all ≥ 0.1). These differences in association were not driven by allele frequency differences as they are similar between the included study populations. A possible explanation for the association being observed only in the European meta-analysis is that the association is driven by the Russian dataset. rs4733781 has a strong signal in the Russian dataset (p-value=2.96e–7), but very weak signals in all other populations included in the analysis (p-value>0.01) and is in LD with rs3935174 (r2 = 0.6935 and D’ = 0.8791) identified in our analysis. rs4733781 also did not replicate in a previous GWAS from Iceland (Sveinbjornsson et al., 2016), further suggesting that this association is not specific to European populations, but rather driven by the large Russian dataset included in this study.
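The LD measures quoted above (r² and D′) can be computed from haplotype and allele frequencies as in the following sketch. The frequencies used here are hypothetical and are not taken from the study data; the function name is likewise illustrative.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r_squared, d_prime) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r_squared, d_prime


# Hypothetical frequencies, not taken from the study data.
r2, dp = ld_stats(0.25, 0.30, 0.35)
print(round(r2, 3), round(dp, 3))
```

D′ can be high while r² is moderate when the two variants differ in allele frequency, which is why both measures are usually reported together.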

The suggestive peak on chromosome 8 in the Asian subgroup analysis lies in an intergenic region (Figure 4—figure supplement 2, Supplementary file 1e) and the link to TB susceptibility is unclear. Finally, the suggestive region on chromosome 6 overlaps with the significant peak from the multi-ancestry analysis (Figure 1 and Figure 4—figure supplement 2) and is located in the major histocompatibility complex, class II, DR beta 1 (HLA-DRB1), as discussed above (Figure 4—figure supplement 2, Supplementary file 1e).

Prior associations

To determine whether associations from previously published TB-GWAS, TB candidate SNPs, and SNPs within candidate gene studies replicate in this meta-analysis, we extracted all significant and suggestive associations from prior analyses and compared these to our multi-ancestry and ancestry-specific meta-analysis results (Luo et al., 2019; Schurz et al., 2018; Chimusa et al., 2014; The Wellcome Trust Case Control Consortium, 2007; Curtis et al., 2015; Mahasirimongkol et al., 2012; Qi et al., 2017; Thye et al., 2010; Thye et al., 2012; Quistrebert et al., 2021; Hong et al., 2017; Zheng et al., 2018; Grant et al., 2016; Png et al., 2012; Daya et al., 2014b). In total, 44 SNPs and 36 genes were identified from the GWAS catalog, of which 33 SNPs and all candidate genes were present in our data (Source data 1, sheet 2). We also extracted the association statistics for a further 90 previously identified candidate genes from our multi-ancestry and population-specific meta-analysis results (Source data 1, sheet 2; Naranbhai, 2016).

Using a Bonferroni-corrected p-value of 0.0015 for the number of SNPs tested (33) as the significance threshold for replication, two candidate SNPs (rs4733781: p-value=3.22e–5; rs10956514: p-value=0.000118; Source data 1, sheets 3 and 4) replicated in the multi-ancestry meta-analysis, both located in the ASAP1 gene region (Curtis et al., 2015; Chen et al., 2019; Wang et al., 2018). However, as discussed in the previous section, these associations are driven by the Russian dataset, which is the same dataset in which these associations were originally discovered (Curtis et al., 2015). As only the Russian population included in our analysis presents a strong signal for these variants, and they did not replicate in any other population, there is no independent evidence for these candidate SNPs.
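The replication threshold can be reproduced with a short calculation. The sketch below assumes a family-wise alpha of 0.05 (not stated explicitly in the text, but consistent with the reported 0.0015) and uses the two p-values quoted above; the variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# alpha = 0.05 is an assumption; 33 is the number of SNPs tested.
threshold = bonferroni_threshold(0.05, 33)

# p-values quoted in the text for the multi-ancestry meta-analysis.
reported = {"rs4733781": 3.22e-5, "rs10956514": 0.000118}
replicated = [snp for snp, p in reported.items() if p < threshold]
print(round(threshold, 4), replicated)
```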

For the Asian ancestry-specific analysis, one variant replicated: rs41553512, located in the HLA-DRB5 gene (p-value=3.53e–05). HLA-DRB5 is located within the HLA-II region identified in the multi-ancestry meta-analysis (Figure 1) and was previously identified by Qi et al., 2017 in a Han Chinese population. The African ancestry-specific analysis did not replicate previous associations, with the lowest p-value at rs6786408 in the FOXP1 gene (p-value=0.023). While this variant was previously identified in a North African cohort, the fact that it does not replicate here could be because of the genetic diversity within Africa and specifically the variability introduced by the five-way admixed South African population.

Discussion

This large-scale, multi-ancestry meta-analysis of genetic susceptibility to TB, involving 14,153 cases and 19,536 controls, identified one risk locus achieving genome-wide significance, and further investigation of this region revealed significant classical HLA allele associations. This association is noteworthy given that associations for the same allele have been reported in other studies (Kinnear et al., 2017; Stein et al., 2017).

Based on the significant association, rs28383206, in the HLA region identified in this multi-ancestry analysis (Figure 3A), HLA-specific imputation and association testing were done to fine-map the region and identify potential HLA alleles driving this association. HLA DQA1*02:01 had the strongest signal in the meta-analysis across the eight included studies (Figure 3B), but this signal disappeared when conditioning on the significant SNP (rs28383206). HLA DQA1*02:01 has previously been identified in one Icelandic and two Chinese populations, but the direction of effect was not consistent (Sveinbjornsson et al., 2016; Li et al., 2021; Zheng, 2018). Despite these inconsistencies, the association between Mtb and HLA class II should be explored in more detail in future studies. A study investigating the outcomes of Mtb exposure in individuals of African ancestry identified protective effects of HLA class II alleles for individuals resistant to TB, highlighting the importance of HLA class II in susceptibility to TB (Dawkins et al., 2022). HLA class II is a key determinant of the immune response in TB, and Mtb has mechanisms to directly interfere with MHC class II antigen presentation (Sia and Rengarajan, 2019). This is supported by studies in mice, in which animals lacking the MHC class II genes died quickly when exposed to Mtb, and faster than mice in which the MHC class I genes were deleted (Sia and Rengarajan, 2019).

The p-values of residual heterogeneity in genetic effects between the studies in the multi-ancestry meta-analysis show no significant inflation between the studies. This suggests that the differences in study characteristics (phenotype definition, infection pressure, Mtb strain) are not the main contributor to the lack of significant associations. However, they certainly have an impact, which is further compounded by ancestry-correlated heterogeneity and other factors (e.g., socioeconomic standing). The ancestry-correlated heterogeneity p-values are generally lower than the residual heterogeneity p-values, suggesting that genetic ancestry has a stronger impact on the differences in effect sizes between the studies. This is supported by the fact that previous TB genetic association studies have identified significant effects of ancestry on TB susceptibility (Chimusa et al., 2014; Daya et al., 2014b). However, the effects of genetic ancestry can be confounded by other factors not accounted for in this analysis, such as the differences in socioeconomic factors (including the differences in housing, employment, poverty, and access to healthcare), phenotype definitions, and differences in infection pressure between the included study populations (Hargreaves et al., 2011; Duarte et al., 2018; Lönnroth et al., 2009). Specifically, the lack of consistency and specificity in TB diagnosis between the included studies introduces heterogeneity and the potential for misclassification of cases and controls, which can reduce the power to detect significant associations (Supplementary file 1a). While this is a limitation of this study, the fact that the residual heterogeneity is outweighed by the ancestry-correlated heterogeneity suggests that the phenotype definitions are not the main driver behind the lack of significant associations.
For the ancestry-specific analysis, fewer studies result in there being less input heterogeneity to account for, but the reduced sample size was not sufficient to detect any ancestry-specific genome-wide associations. This is particularly evident for the African ancestry-specific meta-analysis where the large degree of heterogeneity, which could be a result of the high genetic diversity within Africa, in combination with differences in socioeconomic factors compared to other populations included in this study, resulted in no observable suggestive association peaks (Campbell and Tishkoff, 2008; Peprah et al., 2015). Furthermore, the suggestive associations (Supplementary file 1c and e) reported in this study should be interpreted with care, and further validation is required before any conclusions can be drawn on the impact that they could have on TB susceptibility.

Polygenic heritability estimates revealed genetic contributions to TB susceptibility for all studies, but the level of this contribution varied greatly (5–36%), suggesting that other factors are contributing to both the lack of significant associations detected in this meta-analysis and the variation observed for the polygenic heritability estimates. These factors likely include environmental, socioeconomic, and varying levels of infection pressures, as well as genetic ancestry-specific effects between the included study populations. An individual from South Africa will face a much higher force of infection than individuals in Europe, and making the assumption that environmental circumstances are equal will significantly skew these crude heritability estimates (Pearce, 2011). This argument is sustained by the fact that increasing disease prevalence (higher infection pressure) increased the level of genetic contribution to TB susceptibility up to a certain point, presumably accounted for by increasingly informative control samples, after which further increasing the infection pressure will not further impact genetic susceptibility.

To determine the impact that force of infection has on the level of genetic contribution to TB susceptibility, we modeled values for the proportion of people ever infected with Mtb to include in the multi-ancestry meta-analysis and correct for the different force of infection faced by individuals in each country. Inclusion of this covariate, however, only resulted in a significant difference for 5% of the analyzed variants, which is what would be expected by chance alone, and as such we cannot conclude that a significant portion of the observed residual heterogeneity is explained by this. Limited metadata forced us to make several assumptions about the ages of study participants and the dates on which they were enrolled. With more precise metadata, or Mtb infection test results in controls, the potential impact of lifetime infection could be better quantified and may contribute to elucidating genetic TB susceptibility. Multi-ancestry meta-analysis of other infectious diseases could also potentially benefit from the inclusion of force of infection covariates. It would also be important to determine whether there is a level of exposure beyond which host genetic barriers to infection are overcome (Simmons et al., 2018).

Only a single significant association was identified in this multi-ancestry meta-analysis, which is few when compared to other meta-analyses of similar size. Factors contributing to this include the difficulty in analyzing multi-ancestry data, the outdated arrays and lack of suitable reference panels for the included study populations, and heterogeneity in case and control definitions between the studies. The issue of heterogeneity in definitions is especially pronounced for this study as it included unpublished data with limited information, which does not indicate how cases were confirmed and controls were collected. The complexity of TB and the generally small genetic effects suggest that larger sample sizes or alternative methods of investigation are needed. Utilizing GWAS arrays that better capture diverse populations, in combination with imputation that makes use of larger and more diverse reference panels, would allow for larger and more consistent datasets for future meta-analysis. Remapping specific areas of interest such as the HLA, ASAP1, or TLR using long-read sequencing would be invaluable. Increased amounts of genetic data will also allow for more accurate TB heritability analysis and permit analysis of polygenic risk scores and exploration of host–pathogen interactions.

In conclusion, this large-scale multi-ancestry TB GWAS meta-analysis revealed significant associations and shared genetic TB susceptibility architecture across multiple populations from different genetic backgrounds. The analysis shows the value of collaboration and data sharing to solve difficult problems and elucidate what determines susceptibility to complex diseases such as TB. We hope that this publication will encourage others to make their data available for future large-scale meta-analyses.

Methods

Data

This analysis includes 12 of the 17 published (and unpublished, Table 1, Supplementary file 1) GWAS of TB (with HIV-negative cohorts) prior to 2022 (Schurz et al., 2018; Chimusa et al., 2014; The Wellcome Trust Case Control Consortium, 2007; Curtis et al., 2015; Mahasirimongkol et al., 2012; Qi et al., 2017; Thye et al., 2010; Thye et al., 2012; Daya et al., 2014b). For unpublished works, we contacted researchers that were funded for genetic TB research and acquired data-sharing agreements to obtain summary statistics (or raw data) along with any metadata that was available. It excludes data from Iceland and Vietnam (Quistrebert et al., 2021) as they declined to share data. It excludes data from China, Korea, Peru, and Japan (Luo et al., 2019; Hong et al., 2017; Li et al., 2021; Zheng, 2018; Sveinbjornsson et al., 2016) as data-sharing agreements could not be finalized in time for this analysis. The Indonesian and Moroccan data were too sparsely genotyped and not suitable for reliable imputation. In addition, the Moroccan data was family-based and thus also not suitable for this meta-analysis as this would introduce confounding effects from the inclusion of related individuals (Grant et al., 2016; Png et al., 2012). Finally, cases and controls are also available within large-scale biobanks, for example, UK Biobank, which could also be leveraged in future iterations of this analysis (Munafò et al., 2018).

Included individuals were genotyped on a variety of genotyping arrays (Table 1, Supplementary file 1). Raw genotyping data was available for eight datasets, and for the remainder association testing summary statistics were obtained to use in the meta-analysis (Table 1, Supplementary file 1). Quality control (QC) of raw genotyping data (Table 1, Supplementary file 1) was done using Plink (v1.9), followed by pre-phasing using SHAPEIT and imputation with IMPUTE2 with the 1000 Genomes Phase 3 reference panel (Chang et al., 2015; Delaneau et al., 2013; Howie et al., 2009; Sudmant et al., 2015). QC and imputation were done as described previously (Schurz et al., 2018; Schurz et al., 2019); briefly, we used a MAF filter of 0.025 and an individual and SNP missingness filter of 0.1. The Hardy–Weinberg equilibrium threshold was set at a Bonferroni-corrected p-value according to the number of SNPs tested (0.05/number of SNPs), and samples where sex could not be determined from genotyping were also removed. Imputed data was filtered at a quality score of 0.3, prior to individual and genotype filtration steps. Prior to QC and imputation, allele orientation was corrected using Genotype Harmoniser version 1.4.15, and the genome build of all datasets was checked for consistency (GRCh37) and updated if necessary using the liftOver software from the UCSC genome browser (Deelen et al., 2014; Kent et al., 2002). The four datasets with only summary statistics available (Table 1, Supplementary file 1) were imputed and QC’d during the original investigations, but the marker names and allele orientation were checked for concordance between the summary statistics and the rest of the consortium’s imputed data.

Polygenic heritability analysis

To assess the level of genetic contribution to TB susceptibility, we estimated polygenic heritability on the individual studies for which raw genotyping data was available (Table 1, Supplementary file 1). Polygenic heritability estimates were calculated using GCTA (v1.93.2), a genomic risk prediction tool (Yang et al., 2011). The genetic relationship matrix was calculated for each autosomal chromosome. Raw genotype data was pruned for SNPs in LD using a 50 SNP window, sliding by 10 SNPs at a time and removing all variants with LD > 0.5. Samples were filtered by removing cryptic relatedness (--grm-cutoff 0.025) and assuming that the causal loci have similar distribution of allele frequencies as the genotyped SNPs (--grm-adj 0). Principal components were then calculated (--pca 20) to include as covariates prior to estimating heritability. Heritability estimations were transformed onto the liability scale using the GCTA software to account for the difference in the proportion of cases in the data compared to the population prevalence (Yang et al., 2011). The average heritability estimate was calculated by taking the mean of all estimates and the confidence intervals were estimated based on the standard error across all studies and the number of studies included.
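
The liability-scale transformation and the pooling step described above can be sketched as follows. This is not the GCTA implementation; it is the standard transformation for ascertained case-control data (Lee et al., 2011), followed by one possible reading of the pooling description (mean across studies, with a normal-approximation interval from the standard error of that mean). All function names and all numeric inputs are hypothetical and are not values from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def liability_scale_h2(h2_observed: float, sample_case_fraction: float,
                       prevalence: float) -> float:
    """Convert an observed-scale case-control h2 estimate to the liability scale.

    h2_L = h2_O * [K(1-K)]^2 / (z^2 * P(1-P)), where K is the population
    prevalence, P the proportion of cases in the sample, and z the standard
    normal density at the liability threshold for prevalence K.
    """
    std_normal = NormalDist()
    liability_threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(liability_threshold)
    k, p = prevalence, sample_case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


def pooled_mean_with_ci(estimates, z_crit: float = 1.96):
    """Mean of per-study estimates with an approximate 95% interval.

    Assumption: the interval uses the standard error of the mean across
    studies (sample SD / sqrt(number of studies)).
    """
    centre = mean(estimates)
    se = stdev(estimates) / sqrt(len(estimates))
    return centre, centre - z_crit * se, centre + z_crit * se


# Hypothetical inputs for illustration only.
example_h2 = liability_scale_h2(h2_observed=0.20, sample_case_fraction=0.45,
                                prevalence=0.05)
example_pooled = pooled_mean_with_ci([0.10, 0.20, 0.30])
```

Because the transformation depends on the assumed population prevalence, estimates from studies in settings with very different TB burdens are not directly comparable unless each uses an appropriate prevalence.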

Meta-analysis

All variants with MAF > 1% and polymorphic in at least three studies (from at least two different ancestries) were included in the primary analysis. For the GWAS summary statistics of each dataset, variants with infinite confidence intervals were removed prior to the meta-analysis. A multi-ancestry meta-analysis plus separate ancestry-specific analyses for Africa, Asia, and Europe were performed. MR-MEGA (Meta-Regression of Multi-Ethnic Genetic Association, v0.20), a meta-analysis tool that maximizes power and enhances fine-mapping when combining data across different ethnicities, was used for the multi-ancestry meta-analysis (Mägi et al., 2017). To account for the expected heterogeneity in allelic effects between populations, MR-MEGA implements a multi-ancestry meta-regression that includes covariates to represent genetic ancestry, obtained from multidimensional scaling of mean pairwise genome-wide allele frequency differences. Genomic control correction (GCC) was implemented during the MR-MEGA analysis for the individual input data (if lambda was >1.05) and output statistics, and the first two PCs, calculated from the genome-wide allele frequency differences, were included as covariates in the regression. QQ-plots of p-values and associated lambda values were used to assess the quality of results prior to downstream investigation.
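
As an illustration of the genomic control step, the sketch below computes the inflation factor lambda from a set of p-values and rescales the test statistics only when lambda exceeds 1.05, the cut-off stated above. It is a generic textbook version, not MR-MEGA's internal code; the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = 0.4549364


def p_to_chi2(p_value: float) -> float:
    """Convert a two-sided p-value to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return z * z


def genomic_inflation(p_values) -> float:
    """Lambda: observed median chi-square over its expectation under the null."""
    return median(p_to_chi2(p) for p in p_values) / MEDIAN_CHI2_1DF


def genomic_control(chi2_stats, lam: float, min_lambda: float = 1.05):
    """Divide statistics by lambda, but only when lambda exceeds min_lambda."""
    if lam <= min_lambda:
        return list(chi2_stats)
    return [stat / lam for stat in chi2_stats]
```

For p-values that are uniformly distributed, as expected under the null, `genomic_inflation` returns a value close to 1, so no correction is applied.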

For the ancestry-specific analyses, the studies were grouped by the major ancestral groups (Table 1, Supplementary file 1) and all variants with a MAF of > 1% that were observed in at least two studies were included in the meta-analysis. We performed traditional fixed-effects meta-analyses in GWAMA (v2.2.2), implementing GCC and assessed the results using QQ-plots (Mägi and Morris, 2010). The genome-wide significance threshold for all association testing was set at p-value=5 × 10-8 (Panagiotou et al., 2012).
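
A fixed-effects meta-analysis of this kind is conventionally an inverse-variance weighted average of the per-study effect estimates. The sketch below shows that standard calculation for a single variant; it is not GWAMA's code, and the function name and example inputs are hypothetical.

```python
from math import erfc, sqrt

GENOME_WIDE_THRESHOLD = 5e-8  # significance threshold stated in the text


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects estimate for one variant.

    Returns the pooled effect, its standard error, and a two-sided p-value
    from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled_beta, pooled_se, p_value


# Hypothetical log-odds ratios and standard errors from two studies.
beta, se, p = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
is_genome_wide_significant = p < GENOME_WIDE_THRESHOLD
```

Studies with smaller standard errors, typically the larger ones, receive proportionally more weight in the pooled estimate.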

HLA imputation

To fine-map HLA alleles over the HLA locus, we imputed HLA class I and II variants for all eight studies for which raw data was available (Table 1 and Supplementary file 1). HLA imputation for the HLA class I regions A, B, and C as well as the HLA class II regions DPB1, DRB1, DQB1, and DQA1 was done using the R package HIBAG (version 1.5), implemented in the R free software environment (version 4.0.5) using the predict() command for imputation (R Development Core Team, 2013; Zheng, 2018; Zheng et al., 2014).

The reference datasets for HLA imputation are both genotyping panel and population-specific, and HIBAG has a database of reference data for many genotyping arrays. Each reference panel is also available for either Asian, European, or African populations or a mixture of the three (https://hibag.s3.amazonaws.com/hlares_index.html#estimates). For each dataset included for imputation, the reference panel chosen was the same as the genotyping array used for the data and the reference population was chosen to match the data as closely as possible. Asian and European reference panels were used for the Asian and European populations and African references were used for the Gambia and Ghana datasets, while mixed datasets were implemented for the admixed RSA population.

Following imputation, the HIBAG package (hlaAssocTest) command was used to implement an additive association test for the HLA alleles across the different regions limited to alleles at MAF > 2.5%. Analyses were adjusted for the first four PCs with and without the rs28383206 genotype in the model. Association testing results for the eight included studies were then combined in a fixed-effects meta-analysis using Metasoft software (Han and Eskin, 2011). Ancestry-specific meta-analysis grouped according to the major population groups (Table 1, Supplementary file 1) was also done using the same method.

Estimation of infection pressure

To generate a covariate capturing the likely cumulative exposure to Mtb for included controls, the results of Houben and Dodd, 2016 were adapted to produce a distance matrix to feed into the meta-analysis. The approach in that article fits a Gaussian process model of infection risk history to local data. To represent uncertainty in derived results, a sample of 200 estimated histories of the annual risk of TB infection in each country was used to calculate the expected fraction of control participants ever infected with Mtb, assuming that controls were uniformly aged between 35 and 44 y in 2010, which approximates the period during which controls were recruited for most of the studies. The true age of the controls was not known for all of the datasets, but as quite a substantial skew to the age distribution would be required to have an impact on the results, we believe our choice here is justified. The resulting estimates of potential lifetime infections for each source population were included as a covariate in the MR-MEGA multi-ancestry meta-regression. To determine the impact of the covariate, a chi-square difference test was implemented, on a SNP-by-SNP basis, on the residual and association testing statistics of two sets of meta-analysis output, one including and the other excluding the potential lifetime infections covariate (Satorra and Bentler, 2001). The aim was to determine whether inclusion of potential lifetime infections in the regression explained some of the residual heterogeneity.
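
The calculation of the expected fraction ever infected can be sketched as below. This is a simplified illustration and not the code of Houben and Dodd, 2016: it assumes that each sampled history is a list of annual infection risks ending in the recruitment year, that infection risk is independent between years, and that controls are spread uniformly over ages 35 to 44. All names and the example history are hypothetical.

```python
from statistics import mean


def fraction_ever_infected(annual_risks) -> float:
    """Probability of at least one infection over the given years."""
    probability_never_infected = 1.0
    for risk in annual_risks:
        probability_never_infected *= 1.0 - risk
    return 1.0 - probability_never_infected


def expected_fraction_ever_infected(sampled_histories, ages=range(35, 45)) -> float:
    """Average over sampled risk histories and over a uniform age distribution.

    Each history lists the annual risk of infection per calendar year, with
    the last element being the recruitment year (assumed layout).
    """
    per_history = []
    for history in sampled_histories:
        if len(history) < max(ages):
            raise ValueError("history is shorter than the oldest age considered")
        # A person aged `age` has been exposed for the last `age` years.
        per_age = [fraction_ever_infected(history[-age:]) for age in ages]
        per_history.append(mean(per_age))
    return mean(per_history)


# Hypothetical example: three identical histories with a constant 1% annual risk.
example_fraction = expected_fraction_ever_infected([[0.01] * 50] * 3)
```

Averaging over the sampled histories gives a single point value per country; the spread across samples could also be retained to carry the uncertainty forward.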

Concordance of direction of effect

To determine the degree to which direction of effect is shared for SNPs between the ancestry-specific meta-analysis, we followed the methodology of Mahajan et al., 2014. First, we identified all variants present in all 12 included datasets. Among these SNPs, we then identified an independent subset of variants in the European ancestry-specific meta-analysis showing nominal evidence of association (p-value≤0.001) and separated by at least 500 kb. The identified SNPs were then extracted from the Asian and African ancestry-specific meta-analysis results to calculate the number of SNPs that had the same direction of effect as in the European analysis. To determine whether significant excess in concordance of effect direction was present, a one-sided binomial test was implemented with the expected concordance set at 50%. This analysis was then repeated for other p-value thresholds (0.001<p≤0.01; 0.01<p≤0.5; and 0.5<p≤1), and also using the African and Asian meta-analysis results as reference.
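
The concordance test described above can be sketched with an exact one-sided binomial test against an expected concordance of 50%. The function names are illustrative; the effect estimates would come from the ancestry-specific meta-analysis results for the independent SNP subset.

```python
from math import comb


def count_concordant(reference_betas, comparison_betas) -> int:
    """Number of SNPs whose effect has the same sign in both analyses."""
    return sum(1 for ref, other in zip(reference_betas, comparison_betas)
               if ref * other > 0)


def binomial_upper_tail(n_concordant: int, n_total: int,
                        p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= n_concordant) for X ~ Binomial(n_total, p_null)."""
    return sum(
        comb(n_total, k) * p_null ** k * (1.0 - p_null) ** (n_total - k)
        for k in range(n_concordant, n_total + 1)
    )


# Hypothetical effect estimates for four SNPs in two ancestry-specific analyses.
reference = [0.12, -0.08, 0.05, 0.20]
comparison = [0.03, -0.01, -0.02, 0.10]
n_same_direction = count_concordant(reference, comparison)
p_excess_concordance = binomial_upper_tail(n_same_direction, len(reference))
```

A small p-value indicates more shared effect directions than expected by chance, even when no individual SNP reaches significance in the comparison ancestry.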

Data availability

Summary statistics of all meta-analyses will be made available on Dryad (https://doi.org/10.5061/dryad.6wwpzgn2s). The summary statistics and raw data (where available) of the individual data files cannot be made available, but enquiries or requests for this data can be made through the corresponding authors or authors directly responsible for the data, listed in Table 1. As the ITHGC consortium has strict data transfer and sharing agreements with the original authors/owners of the data, we cannot ethically share the source data files in any way, be it anonymized, de-identified, or in any other form. All data that is not restricted by these data transfer and ethical agreements has been either uploaded to the online repository (https://doi.org/10.5061/dryad.6wwpzgn2s) or submitted along with this document. If any interested researchers want to apply for access to the original raw and individual GWAS datasets or any other data currently restricted, they can contact the corresponding author of this manuscript to be put in touch with the original data owners/authors, or they can contact the original data owners/authors directly through the corresponding authors listed in Table 1. Once the original authors/owners of the data have been contacted, discussions can be had to share the data using the appropriate and ethically approved methods, which could include data transfer agreements or similar application processes.

The following previously published data sets were used
    1. Schurz H
    2. Naranbhai V
    3. Yates TA
    4. Gilchrist J
    5. Parks T
    6. Dodd P
    7. Möller M
    8. Hoal EG
    9. Morris A
    10. Hill AV
    (2022) Dryad Digital Repository
    Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture.
    https://doi.org/10.5061/dryad.6wwpzgn2s

References

  1. Book
    1. Diehl K
    2. Von O
    (1936)
    Der Erbeinfluss Bei Der Tuberkulose
    Gustav Fischer.
    1. Elks CE
    2. Perry JRB
    3. Sulem P
    4. Chasman DI
    5. Franceschini N
    6. He C
    7. Lunetta KL
    8. Visser JA
    9. Byrne EM
    10. Cousminer DL
    11. Gudbjartsson DF
    12. Esko T
    13. Feenstra B
    14. Hottenga J-J
    15. Koller DL
    16. Kutalik Z
    17. Lin P
    18. Mangino M
    19. Marongiu M
    20. McArdle PF
    21. Smith AV
    22. Stolk L
    23. van Wingerden SH
    24. Zhao JH
    25. Albrecht E
    26. Corre T
    27. Ingelsson E
    28. Hayward C
    29. Magnusson PKE
    30. Smith EN
    31. Ulivi S
    32. Warrington NM
    33. Zgaga L
    34. Alavere H
    35. Amin N
    36. Aspelund T
    37. Bandinelli S
    38. Barroso I
    39. Berenson GS
    40. Bergmann S
    41. Blackburn H
    42. Boerwinkle E
    43. Buring JE
    44. Busonero F
    45. Campbell H
    46. Chanock SJ
    47. Chen W
    48. Cornelis MC
    49. Couper D
    50. Coviello AD
    51. d’Adamo P
    52. de Faire U
    53. de Geus EJC
    54. Deloukas P
    55. Döring A
    56. Smith GD
    57. Easton DF
    58. Eiriksdottir G
    59. Emilsson V
    60. Eriksson J
    61. Ferrucci L
    62. Folsom AR
    63. Foroud T
    64. Garcia M
    65. Gasparini P
    66. Geller F
    67. Gieger C
    68. GIANT Consortium
    69. Gudnason V
    70. Hall P
    71. Hankinson SE
    72. Ferreli L
    73. Heath AC
    74. Hernandez DG
    75. Hofman A
    76. Hu FB
    77. Illig T
    78. Järvelin M-R
    79. Johnson AD
    80. Karasik D
    81. Khaw K-T
    82. Kiel DP
    83. Kilpeläinen TO
    84. Kolcic I
    85. Kraft P
    86. Launer LJ
    87. Laven JSE
    88. Li S
    89. Liu J
    90. Levy D
    91. Martin NG
    92. McArdle WL
    93. Melbye M
    94. Mooser V
    95. Murray JC
    96. Murray SS
    97. Nalls MA
    98. Navarro P
    99. Nelis M
    100. Ness AR
    101. Northstone K
    102. Oostra BA
    103. Peacock M
    104. Palmer LJ
    105. Palotie A
    106. Paré G
    107. Parker AN
    108. Pedersen NL
    109. Peltonen L
    110. Pennell CE
    111. Pharoah P
    112. Polasek O
    113. Plump AS
    114. Pouta A
    115. Porcu E
    116. Rafnar T
    117. Rice JP
    118. Ring SM
    119. Rivadeneira F
    120. Rudan I
    121. Sala C
    122. Salomaa V
    123. Sanna S
    124. Schlessinger D
    125. Schork NJ
    126. Scuteri A
    127. Segrè AV
    128. Shuldiner AR
    129. Soranzo N
    130. Sovio U
    131. Srinivasan SR
    132. Strachan DP
    133. Tammesoo M-L
    134. Tikkanen E
    135. Toniolo D
    136. Tsui K
    137. Tryggvadottir L
    138. Tyrer J
    139. Uda M
    140. van Dam RM
    141. van Meurs JBJ
    142. Vollenweider P
    143. Waeber G
    144. Wareham NJ
    145. Waterworth DM
    146. Weedon MN
    147. Wichmann HE
    148. Willemsen G
    149. Wilson JF
    150. Wright AF
    151. Young L
    152. Zhai G
    153. Zhuang WV
    154. Bierut LJ
    155. Boomsma DI
    156. Boyd HA
    157. Crisponi L
    158. Demerath EW
    159. van Duijn CM
    160. Econs MJ
    161. Harris TB
    162. Hunter DJ
    163. Loos RJF
    164. Metspalu A
    165. Montgomery GW
    166. Ridker PM
    167. Spector TD
    168. Streeten EA
    169. Stefansson K
    170. Thorsteinsdottir U
    171. Uitterlinden AG
    172. Widen E
    173. Murabito JM
    174. Ong KK
    175. Murray A
    (2010) Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies
    Nature Genetics 42:1077–1085.
    https://doi.org/10.1038/ng.714
    1. Kallmann FJ
    2. Reisner D
    (1943)
    Twin studies on the significance of genetic factors in tuberculosis
    American Review of Tuberculosis 47:547–549.
    1. Mahajan A
    2. Go MJ
    3. Zhang W
    4. Below JE
    5. Gaulton KJ
    6. Ferreira T
    7. Horikoshi M
    8. Johnson AD
    9. Ng MCY
    10. Prokopenko I
    11. Saleheen D
    12. Wang X
    13. Zeggini E
    14. Abecasis GR
    15. Adair LS
    16. Almgren P
    17. Atalay M
    18. Aung T
    19. Baldassarre D
    20. Balkau B
    21. Bao Y
    22. Barnett AH
    23. Barroso I
    24. Basit A
    25. Been LF
    26. Beilby J
    27. Bell GI
    28. Benediktsson R
    29. Bergman RN
    30. Boehm BO
    31. Boerwinkle E
    32. Bonnycastle LL
    33. Burtt N
    34. Cai Q
    35. Campbell H
    36. Carey J
    37. Cauchi S
    38. Caulfield M
    39. Chan JCN
    40. Chang LC
    41. Chang TJ
    42. Chang YC
    43. Charpentier G
    44. Chen CH
    45. Chen H
    46. Chen YT
    47. Chia KS
    48. Chidambaram M
    49. Chines PS
    50. Cho NH
    51. Cho YM
    52. Chuang LM
    53. Collins FS
    54. Cornelis MC
    55. Couper DJ
    56. Crenshaw AT
    57. van Dam RM
    58. Danesh J
    59. Das D
    60. de Faire U
    61. Dedoussis G
    62. Deloukas P
    63. Dimas AS
    64. Dina C
    65. Doney AS
    66. Donnelly PJ
    67. Dorkhan M
    68. van Duijn C
    69. Dupuis J
    70. Edkins S
    71. Elliott P
    72. Emilsson V
    73. Erbel R
    74. Eriksson JG
    75. Escobedo J
    76. Esko T
    77. Eury E
    78. Florez JC
    79. Fontanillas P
    80. Forouhi NG
    81. Forsen T
    82. Fox C
    83. Fraser RM
    84. Frayling TM
    85. Froguel P
    86. Frossard P
    87. Gao Y
    88. Gertow K
    89. Gieger C
    90. Gigante B
    91. Grallert H
    92. Grant GB
    93. Grrop LC
    94. Groves CJ
    95. Grundberg E
    96. Guiducci C
    97. Hamsten A
    98. Han BG
    99. Hara K
    100. Hassanali N
    101. Hattersley AT
    102. Hayward C
    103. Hedman AK
    104. Herder C
    105. Hofman A
    106. Holmen OL
    107. Hovingh K
    108. Hreidarsson AB
    109. Hu C
    110. Hu FB
    111. Hui J
    112. Humphries SE
    113. Hunt SE
    114. Hunter DJ
    115. Hveem K
    116. Hydrie ZI
    117. Ikegami H
    118. Illig T
    119. Ingelsson E
    120. Islam M
    121. Isomaa B
    122. Jackson AU
    123. Jafar T
    124. James A
    125. Jia W
    126. Jöckel KH
    127. Jonsson A
    128. Jowett JBM
    129. Kadowaki T
    130. Kang HM
    131. Kanoni S
    132. Kao WHL
    133. Kathiresan S
    134. Kato N
    135. Katulanda P
    136. Keinanen-Kiukaanniemi KM
    137. Kelly AM
    138. Khan H
    139. Khaw KT
    140. Khor CC
    141. Kim HL
    142. Kim S
    143. Kim YJ
    144. Kinnunen L
    145. Klopp N
    146. Kong A
    147. Korpi-Hyövälti E
    148. Kowlessur S
    149. Kraft P
    150. Kravic J
    151. Kristensen MM
    152. Krithika S
    153. Kumar A
    154. Kumate J
    155. Kuusisto J
    156. Kwak SH
    157. Laakso M
    158. Lagou V
    159. Lakka TA
    160. Langenberg C
    161. Langford C
    162. Lawrence R
    163. Leander K
    164. Lee JM
    165. Lee NR
    166. Li M
    167. Li X
    168. Li Y
    169. Liang J
    170. Liju S
    171. Lim WY
    172. Lind L
    173. Lindgren CM
    174. Lindholm E
    175. Liu CT
    176. Liu JJ
    177. Lobbens S
    178. Long J
    179. Loos RJF
    180. Lu W
    181. Luan J
    182. Lyssenko V
    183. Ma RCW
    184. Maeda S
    185. Mägi R
    186. Männisto S
    187. Matthews DR
    188. Meigs JB
    189. Melander O
    190. Metspalu A
    191. Meyer J
    192. Mirza G
    193. Mihailov E
    194. Moebus S
    195. Mohan V
    196. Mohlke KL
    197. Morris AD
    198. Mühleisen TW
    199. Müller-Nurasyid M
    200. Musk B
    201. Nakamura J
    202. Nakashima E
    203. Navarro P
    204. Ng PK
    205. Nica AC
    206. Nilsson PM
    207. Njølstad I
    208. Nöthen MM
    209. Ohnaka K
    210. Ong TH
    211. Owen KR
    212. Palmer CNA
    213. Pankow JS
    214. Park KS
    215. Parkin M
    216. Pechlivanis S
    217. Pedersen NL
    218. Peltonen L
    219. Perry JRB
    220. Peters A
    221. Pinidiyapathirage JM
    222. Platou CG
    223. Potter S
    224. Price JF
    225. Qi L
    226. Radha V
    227. Rallidis L
    228. Rasheed A
    229. Rathman W
    230. Rauramaa R
    231. Raychaudhuri S
    232. Rayner NW
    233. Rees SD
    234. Rehnberg E
    235. Ripatti S
    236. Robertson N
    237. Roden M
    238. Rossin EJ
    239. Rudan I
    240. Rybin D
    241. Saaristo TE
    242. Salomaa V
    243. Saltevo J
    244. Samuel M
    245. Sanghera DK
    246. Saramies J
    247. Scott J
    248. Scott LJ
    249. Scott RA
    250. Segrè AV
    251. Sehmi J
    252. Sennblad B
    253. Shah N
    254. Shah S
    255. Shera AS
    256. Shu XO
    257. Shuldiner AR
    258. Sigurđsson G
    259. Sijbrands E
    260. Silveira A
    261. Sim X
    262. Sivapalaratnam S
    263. Small KS
    264. So WY
    265. Stančáková A
    266. Stefansson K
    267. Steinbach G
    268. Steinthorsdottir V
    269. Stirrups K
    270. Strawbridge RJ
    271. Stringham HM
    272. Sun Q
    273. Suo C
    274. Syvänen AC
    275. Takayanagi R
    276. Takeuchi F
    277. Tay WT
    278. Teslovich TM
    279. Thorand B
    280. Thorleifsson G
    281. Thorsteinsdottir U
    282. Tikkanen E
    283. Trakalo J
    284. Tremoli E
    285. Trip MD
    286. Tsai FJ
    287. Tuomi T
    288. Tuomilehto J
    289. Uitterlinden AG
    290. Valladares-Salgado A
    291. Vedantam S
    292. Veglia F
    293. Voight BF
    294. Wang C
    295. Wareham NJ
    296. Wennauer R
    297. Wickremasinghe AR
    298. Wilsgaard T
    299. Wilson JF
    300. Wiltshire S
    301. Winckler W
    302. Wong TY
    303. Wood AR
    304. Wu JY
    305. Wu Y
    306. Yamamoto K
    307. Yamauchi T
    308. Yang M
    309. Yengo L
    310. Yokota M
    311. Young R
    312. Zabaneh D
    313. Zhang F
    314. Zhang R
    315. Zheng W
    316. Zimmet PZ
    317. Altshuler D
    318. Bowden DW
    319. Cho YS
    320. Cox NJ
    321. Cruz M
    322. Hanis CL
    323. Kooner J
    324. Lee JY
    325. Seielstad M
    326. Teo YY
    327. Boehnke M
    328. Parra EJ
    329. Chambers JC
    330. Tai ES
    331. McCarthy MI
    332. Morris AP
    (2014) Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility
    Nature Genetics 46:234–244.
    https://doi.org/10.1038/ng.2897
  2. Software
    1. R Development Core Team
    (2013) R: A language and environment for statistical computing
    R Foundation for Statistical Computing, Vienna, Austria.
  3. Book
    1. Zheng X
    (2018) Imputation-based HLA typing with SNPs in GWAS studies
    In: Boegel S, editors. HLA Typing: Methods and Protocols. Springer. pp. 163–176.
    https://doi.org/10.1007/978-1-4939-8546-3

Decision letter

  1. Alexander Young
    Reviewing Editor; University of California, Los Angeles, United States
  2. Bavesh D Kana
    Senior Editor; University of the Witwatersrand, South Africa
  3. Alexander Young
    Reviewer; University of California, Los Angeles, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Alexander Young as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Bavesh Kana as the Senior Editor.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

The reviewers agreed that your paper is an important contribution to the effort to understand the genetics of host susceptibility to tuberculosis infection. However, some points in the paper were not clear and other points need to be expanded upon.

1) Please improve the clarity of the presentation of the results on SNP heritability. Please address reviewer #1's concern that the estimates should be transformed to the liability scale.

2) Please address reviewer #2's comments about what is driving the HLA association

3) Please address reviewers 2 and 3's comments about how your results relate to the existing associations/candidate genes discussed in the literature.

4) A more precise description of inclusion/exclusion criteria for studies in the meta-analysis is needed. It would also be better if study or ancestry specific summary statistics are released on publication as well as the main meta-analysis summary statistics.

Reviewer #1 (Recommendations for the authors):

The heritability estimates (as far as I can tell) are from applying GCTA to the case-control data encoded as a binary outcome. In order to make these estimates comparable across studies with different case-control ratios, the authors should transform their estimates onto the liability scale.

Why were variants showing within-ancestry heterogeneity removed?

It is hard to assess whether the test for the effect of prevalence on residual heterogeneity was well-powered enough to draw any conclusions.

The claim that there should be reduced power from inclusion of admixed samples due to increased allele frequency differences doesn't make sense to me. Greater genetic diversity should increase power (but also potentially increase confounding).

The link between finding a genome-wide significant locus in the multi-ancestry meta-analysis and the fact that tuberculosis predates the dispersal of modern humans out of Africa seems tenuous to me.

The justification for leaving UK Biobank data out of the meta-analysis doesn't seem valid. While UKB is a non-representative cohort, the case-control cohorts used in the meta-analysis are likely to be even less representative than the UKB. Why not include UKB since this could increase power substantially?

Figure S2: why is this on a different scale to the main GWAS results? Can it be put on the same scale to aid comparison with the GWAS results?

Reviewer #2 (Recommendations for the authors):

1. I think the paper would benefit from having a main text table with all of the nominal associations articulated. They refer to nominal associations – but no p-values or effect sizes are provided in the main text. Since these are important findings, this should be done. They refer to Table S3 (which I cannot find).

2. I am unsure how the supplementary tables and the Excel worksheets line up. Authors refer to Figure 5, which I cannot find. Authors should carefully make sure that supplemental tables are clearly labeled and findable, along with other materials.

3. Authors present replication of ASAP1 data. Is this offering independent evidence? Is there any independent evidence of previously reported SNP associations? If not authors should say clearly in the abstract that prior known TB SNP associations failed to replicate.

Disappointing as the message is, perhaps it is one of the most important messages.

4. Authors should repeat heritability analysis with S-LDSC (using in reference LD panel) to ensure robustness of GCTA results. Also stratified LDSC should be used to see if there are cell-type specific annotations or gene sets that are seen consistently across the data sets. That is – it may be possible that there are clear pathways that are enriched across populations with respect to heritability captured, even if no individual alleles replicate.

5. Not clear what sort of data sharing will happen? Raw data should be shared if possible – summary statistics for all of the cohorts, and of the meta-analysis.

Reviewer #3 (Recommendations for the authors):

First, given the heritability focus, it would be appropriate to also cite a recent paper that included heritability estimation of a number of TB phenotypes. This paper is especially relevant because it makes the point about the importance of phenotype definition to the eventual heritability estimate, a weakness that plagues some of the GWAS studies included in this paper: https://pubmed.ncbi.nlm.nih.gov/34871961/

Second, a recent paper examined HLA in an African population and did not find associations with TB. This might also be important to cite and discuss: https://pubmed.ncbi.nlm.nih.gov/35702824/

Third, a list of papers is cited about interaction between host genetic variants and strains of Mtb. Not only has this work been done in Ghana and South Africa, it has also been done in Uganda and a couple of Asian populations. This list of references really should be expanded.

Fourth, a few things in Table S1 need to be clarified. Several cells have identical phrases used for TB diagnosis ("AFB staining and culturing of Mtb from sputum samples"). Is it really "and" or is it sometimes "or" or "and/or"? Do all of these studies truly have identical definitions? Was chest x-ray ever used in the definition? This seems quite surprising given previous reviews that have detailed the phenotype definitions in some of these studies. A few cells have "NA" listed in the TB diagnosis. This must be spelt out in the footnote of the table. Also, do those papers really have no detail about the phenotype definition? There must be something.

Fifth, Table S2 presenting the polygenic heritability analyses really needs to be clarified. The column headers are not explained in the footnote. What is the difference between 0.1x', 1x', and 10x', and what does that have to do with heritability? There is also relatively little discussion of this rather complex table in the Results. It also must be clarified in the Methods whether the SNPs were thinned for LD (this is generally done in these sorts of analyses).

Sixth, the supplemental data file really should include some sort of readme to help the reviewer know what they are looking at. While the manuscript includes a list of citations of papers that were used in these candidate gene look-ups, it would be helpful if the Supplementary file included a list of genes looked-up. It also isn't clear if only GWAS hits were looked up, or well-studied candidate genes. If not the latter, they really should be included, given the wealth of research in this area.

Seventh, please clarify the infection pressure analysis. This is potentially a nice addition, but it isn't described very clearly. What are "a sample of 200 estimated histories"? There is an assumption that controls were "uniformly between 25-44 in 2010" – do the authors have age distributions for the controls in all these GWAS studies, and if the data were collected before 2010, how does that impact things?

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture" for further consideration by eLife. Your revised article has been evaluated by Bavesh Kana (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Please address the remaining concerns of reviewer 2. Please also include a comment on the limitation that the authors may have missed relevant datasets of which they were unaware.

Reviewer #3 (Recommendations for the authors):

The authors addressed most of the prior comments very thoroughly. A couple of issues remain:

1) The paper that I referenced earlier https://pubmed.ncbi.nlm.nih.gov/34871961/ also includes heritability of TB disease (see Table 3).

2) Thank you for including the thorough case and control definitions in Supplemental Table 1a. Please review it carefully – it seems that some of the case and control definitions are reversed (for example, China 1 and China 2, both have descriptions of healthy individuals under case and TB disease diagnosis under control). There are some important points seen in this table. First, TB was not always excluded from the control populations – often these are generic controls, and this can introduce misclassification bias. Second, the lack of specificity of TB diagnosis also introduces heterogeneity and potential misclassification as well. Not all studies used the gold standard for TB diagnosis – how does this affect interpretation?

https://doi.org/10.7554/eLife.84394.sa1

Author response

Essential revisions:

1) Please improve the clarity of the presentation of the results on SNP heritability. Please address reviewer #1's concern that the estimates should be transformed to the liability scale.

We have thoroughly gone over the comments from reviewer 1 and have addressed them to clarify the SNP heritability results and highlighted the fact that the heritability estimates were already transformed to the liability scale. As it was not clear that the estimates were transformed, we clarified this in the methods (page: 19) and Results section (page: 4) in the main manuscript. We have also addressed all other comments provided by reviewer #1, as detailed in this document (page: 2-4).

2) Please address reviewer #2's comments about what is driving the HLA association

We agree with the reviewers that the HLA section required more work and analysis. As a result, and to address the comments of the reviewers, we have completely reworked the entire HLA section to clarify the results and identify underlying factors driving the HLA association. We have also addressed all other comments and concerns brought up by the reviewer. Full responses to reviewer #2 comments are in this document (page: 4-7).

3) Please address reviewers 2 and 3's comments about how your results relate to the existing associations/candidate genes discussed in the literature.

The section on prior associations, which addresses how results relate to existing associations/candidate genes discussed in literature, has been updated to clarify how candidate genes and SNPs were selected and how they relate to the results of this meta-analysis. The exact changes to reviewer #2 are detailed in this document (page: 4-7) and responses to all comments from reviewer #3 are detailed in this document on page 7-12.

4) A more precise description of inclusion/exclusion criteria for studies in the meta-analysis is needed. It would also be better if study or ancestry specific summary statistics are released on publication as well as the main meta-analysis summary statistics.

The precise descriptions of inclusion/exclusion criteria and how unpublished studies were sought have been updated in the methods section ‘data’ of the main manuscript (page: 18) to clarify why we are limiting the analysis to the data included in this iteration of the study. Furthermore, we have updated the data sharing section of the manuscript (page: 24) to clarify which results are made available and which cannot be shared. We provide the meta-analysis output for both the global meta-analysis (which includes all studies) and the genetic ancestry-specific analyses of the European, Asian, and African populations included in this study. Unfortunately, we as the consortium cannot share the summary statistics or raw data of the individual input studies, as we do not have permission to do so. For access to the individual study summary statistics, the original authors of the datasets need to be approached. This has been specifically mentioned in the data sharing section (page: 24) of the manuscript.

Reviewer #1 (Recommendations for the authors):

The heritability estimates (as far as I can tell) are from applying GCTA to the case-control data encoded as a binary outcome. In order to make these estimates comparable across studies with different case-control ratios, the authors should transform their estimates onto the liability scale.

We thank the reviewers for pointing out that our results and methods did not convey the fact that the results were transformed onto the liability scale. For our analysis, the estimate of variance explained on the observed scale was transformed to that on the underlying liability scale by the GCTA algorithm using a linear transformation. This accounts for ascertainment bias in a case-control study, i.e., a much higher proportion of cases in the sample than in the general population. The manuscript was updated to clarify that estimates were transformed, and a footnote was added to Table S2 (now renamed supplementary file 1b) to clarify that V(G)/Vp_L represents the transformed estimate.

The manuscript has been updated as follows on page 19.

Heritability estimations were transformed onto the liability scale using the GCTA software to account for the difference in the proportion of cases in the data compared to the population prevalence.
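As an illustration of the kind of linear transformation described above, the following is a minimal sketch of the standard observed-to-liability conversion for ascertained case-control data. It is not the consortium's code and is not taken from GCTA itself; the function name, argument names, and example numbers are hypothetical. Here K is the assumed population prevalence, P is the proportion of cases in the sample, and z is the height of the standard normal density at the liability threshold.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Rescale an observed-scale heritability estimate to the liability scale.

    h2_observed   : variance explained on the observed (0/1) scale
    prevalence    : assumed population prevalence K (illustrative input)
    case_fraction : proportion of cases P in the case-control sample
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < case_fraction < 1.0):
        raise ValueError("prevalence and case_fraction must lie strictly between 0 and 1")

    std_normal = NormalDist()
    # Liability threshold: the upper-tail area beyond it equals the prevalence K.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)

    k, p = prevalence, case_fraction
    # h2_L = h2_O * K^2 (1 - K)^2 / (z^2 * P (1 - P))
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the call.
    print(observed_to_liability(h2_observed=0.30, prevalence=0.01, case_fraction=0.42))
```

The factor depends only on K and P, so the rescaling is linear in the observed-scale estimate, which is what makes estimates from studies with different case-control ratios comparable.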

Why were variants showing within-ancestry heterogeneity removed?

We thank the reviewers for raising this issue. Unfortunately, this text had been left in from an earlier version of the manuscript but has now been removed. In earlier versions of the analysis, variants showing within-ancestry heterogeneity were removed. However, when using the MR-MEGA software, which controls for population-specific effects, we did not filter based on within-ancestry heterogeneity. The manuscript has been updated (page: 5) to remove this statement.

It is hard to assess whether the test for the effect of prevalence on residual heterogeneity was well-powered enough to draw any conclusions.

We appreciate the reviewer pointing this out, and we agree that it is not easy to state exactly whether this test was sufficiently powered to draw a conclusion. For the infection prevalence analysis, we used the heterogeneity Chi-square values calculated by the MR-MEGA meta-analysis software. We used the Chi-square values from the analysis with and without the prevalence covariate added to the analysis and tested if there is a significant difference between these two Chi-square values. As this test is done on a SNP-by-SNP basis, the power depends on the number of cases and controls for each SNP. As we have a fixed amount of data, we cannot do anything to change the power for this analysis without including more data. As this is the largest TB meta-analysis to date and adding more data is not possible at this point, we are unsure what we could add to clarify the conclusions from these results, and it is important to stress that this aspect does not impact our main findings. Specifically, in our analysis prevalence has a very limited effect on the residual heterogeneity (especially compared to the heterogeneity introduced by the different ancestral backgrounds), supporting our conclusion that the background prevalence is not a major driver of heterogeneity.

The claim that there should be reduced power from inclusion of admixed samples due to increased allele frequency differences doesn't make sense to me. Greater genetic diversity should increase power (but also potentially increase confounding).

We thank the reviewer for pointing this out and giving us the opportunity to clarify our explanation. We agree that increased genetic diversity should increase the power. However, the inclusion of the admixed populations could introduce significant confounding effects due to differences in allele frequencies, particularly between the admixed RSA and other African ancestry datasets. While the GWAS analysis of the individual RSA datasets was controlled for the effects of admixture, the effects of allele frequency differences can still impact the meta-analysis. The lower sample size and reduced power of the African ancestry-specific meta-analysis compared to the other ancestry-specific meta-analyses, in combination with the confounding effects, can result in the lack of significant associations.

We have updated the manuscript to clarify this in page 11-12.

Potential causes for the lack of associations and suggestive peaks in the African analysis (Figure 4—figure supplement 3) are the increased genetic diversity within Africa, the inclusion of admixed samples (RSA), and the smaller sample size compared to the other ancestry-specific meta-analyses. While power can be increased through inclusion of greater genetic diversity, between-subpopulation differences in allele frequency can introduce confounding. Confounding by genetic background can result in both spurious associations and the masking of true associations. Such confounding may explain why results observed elsewhere may not replicate in admixed samples. Removing the admixed data and analyzing only the Gambian and Ghanaian datasets also did not produce any significant results although, clearly, the sample size was smaller.

The link between finding a genome-wide significant locus in the multi-ancestry meta-analysis and the fact that tuberculosis predates the dispersal of modern humans out of Africa seems tenuous to me.

We thank the reviewer for pointing out that our statement is too strong. We agree and removed the statement from the manuscript (page: 15).

The justification for leaving UK Biobank data out of the meta-analysis doesn't seem valid. While UKB is a non-representative cohort, the case-control cohorts used in the meta-analysis are likely to be even less representative than the UKB. Why not include UKB since this could increase power substantially?

We thank the reviewer for raising this concern and agree that inclusion of more data for this manuscript would be beneficial. However, when this project and the ITHGC were established, the UK Biobank data and other biobank data not included in this publication were not yet available and as such were not included. Including additional datasets at this point would require the entire body of work to be redone and is beyond the scope of the current manuscript. We do, however, anticipate future iterations of this work including more data.

Figure S2: why is this on a different scale to the main GWAS results? Can it be put on the same scale to aid comparison with the GWAS results?

We thank the reviewer for pointing out this mistake. We have updated the figures to have the same scale as the other forest plots.

Reviewer #2 (Recommendations for the authors):

1. I think the paper would benefit from having a main text table with all of the nominal associations articulated. They refer to nominal associations – but no p-values or effect sizes are provided in the main text. Since these are important findings, this should be done. They refer to Table S3 (which I cannot find).

We thank the reviewer for these comments. We have updated the manuscript to include the p-values for the nominal associations throughout. We did not include a table for these nominal associations in the main manuscript, as it is a long table and does not contribute significantly to the main discussion. As such, we have included the table in the supplementary tables document (file name: Supplementary file 1). Table S3 (now renamed supplementary file 1c), referenced in the main manuscript, can be found on page 4-5 in the file mentioned above.

2. I am unsure how the supplementary tables and the Excel worksheets line up. Authors refer to Figure 5, which I cannot find. Authors should carefully make sure that supplemental tables are clearly labeled and findable, along with other materials.

We thank the reviewer for pointing this out and agree that it is vital that the supplementary material is clearly integrated with the main manuscript. Accordingly, we have updated the manuscript to properly reference the supplementary tables file (Supplementary file 1) and the supplementary data Excel sheet (Source data 1). The additional Excel sheet contains information and results that are not discussed in detail in the main manuscript, but which could still be valuable for future research referencing this publication. We have also added a readme file (sheet 1 of the Excel document) and titles to clarify the results provided in the Excel sheet. Finally, the reference to Figure 5 in the main manuscript was a mistake and has been corrected to reference Figure 4.

3. Authors present replication of ASAP1 data. Is this offering independent evidence? Is there any independent evidence of previously reported SNP associations? If not authors should say clearly in the abstract that prior known TB SNP associations failed to replicate.

We thank the reviewer for this suggestion. We agree that the section on ASAP1 was not clearly explained, and we have reworked this section to clarify whether this association offers independent evidence. Looking at the results, the association in ASAP1 was driven by the Russian cohort, as this is the only dataset where there is a strong signal for ASAP1 variants. The Russian p-value for rs3935174 is 2.965610e-07 and the p-value for all other cohorts is > 0.1, except for RSA MEGA, for which it is 0.01. Based on this, we concluded that our analysis does not offer independent evidence, as the Russian cohort is the same dataset in which the ASAP1 association was originally identified.

We have updated the “Ancestry-specific meta-analysis” section to clarify this (page: 13) as shown below:

“A possible explanation for the association being observed only in the European meta-analysis is that the association is driven by the Russian dataset. rs4733781 has a strong signal in the Russian dataset (p-value = 2.96e-7), but very weak signals in all other populations included in the analysis (p-value > 0.01) and is in LD with rs3935174 (r2=0.6935 and D’=0.8791) identified in our analysis. rs4733781 also did not replicate in a previous GWAS from Iceland 19, further suggesting that this association is not specific to European populations, but rather driven by the large Russian dataset included in this study.”

The “Prior associations” section was also updated (page: 13) as shown below:

“However, as discussed in the previous section, these associations are driven by the Russian dataset, which is the same data used by Curtis et al. (2015) where these associations were originally discovered13. As the Russian population included in our analysis presents a strong signal for these variants, there is no independent evidence for these candidate SNPs, as they did not replicate in any other population.”

Finally, we added a statement to the abstract that previous associations were not replicated in this meta-analysis (page: 2) as shown below:

“We identified one global host genetic correlate for TB at genome-wide significance (p<5e-8) in the human leukocyte antigen (HLA)-II region (rs28383206, p-value = 5.2e-9), but failed to replicate variants previously associated with TB susceptibility.”

4. Authors should repeat heritability analysis with S-LDSC (using in reference LD panel) to ensure robustness of GCTA results. Also stratified LDSC should be used to see if there are cell-type specific annotations or gene sets that are seen consistently across the data sets. That is – it may be possible that there are clear pathways that are enriched across populations with respect to heritability captured, even if no individual alleles replicate.

We thank the reviewers for this suggestion. However, a detailed analysis of heritability is beyond the scope of this paper. Furthermore, valid reference populations and accurate LD scores are not available for all the populations included in this publication. In future, once appropriate reference data and LD scores are available, we plan further detailed analyses of heritability using the consortium dataset.

5. Not clear what sort of data sharing will happen? Raw data should be shared if possible – summary statistics for all of the cohorts, and of the meta-analysis.

We thank the reviewer for pointing out that the data sharing is not clear in the manuscript. Unfortunately, we cannot share the summary statistics of the individual cohorts, as this is not covered by the data sharing agreement of the consortium. Only the summary statistics for the various meta-analyses and HLA analysis are available at this point. We have updated the manuscript (page: 24) to clarify this, see below.

“Summary statistics of all meta-analyses will be made available on the Dryad online database (https://doi.org/10.5061/dryad.6wwpzgn2s). The summary statistics and raw data (where available) of the individual data files cannot be made available, but enquiries or requests for this data can be made through the corresponding authors or authors directly responsible for the data, listed in Table 1.”

Reviewer #3 (Recommendations for the authors):

First, given the heritability focus, it would be appropriate to also cite a recent paper that included heritability estimation of a number of TB phenotypes. This paper is especially relevant because it makes the point about the importance of phenotype definition to the eventual heritability estimate, a weakness that plagues some of the GWAS studies included in this paper: https://pubmed.ncbi.nlm.nih.gov/34871961/

We thank the reviewer for this suggestion. However, we do not agree that the results from the suggested paper are directly comparable to our analysis, as the resister phenotype is beyond the scope of our manuscript. While we do agree that different phenotype definitions will have an impact on heritability estimates, the case definition for all our cohorts was active TB disease. While there is some variation in how exactly active TB confirmation was obtained for all our cohorts, we believe this has only a limited impact on our heritability estimates.

We updated the manuscript to mention this in page: 4, shown below.

“Furthermore, variations in phenotype definition can have an impact on heritability estimates (supplementary file 1a).”

Second, a recent paper examined HLA in an African population and did not find associations with TB. This might also be important to cite and discuss: https://pubmed.ncbi.nlm.nih.gov/35702824/

We thank the reviewer for this suggestion. However, considering the smaller sample size of the suggested publication, it is difficult to relate the findings to our larger meta-analysis. Nonetheless, given that our study found limited evidence of allelic associations in the available African datasets, the results of the two studies are in fact broadly compatible. We have added a section to the discussion to highlight that the results found in the suggested study support future in-depth studies of HLA class II and their role in protective effects against TB. The manuscript was updated (page: 15) as shown below:

“A study investigating outcomes of Mtb exposure in individuals of African Ancestry identified protective effects of HLA class II alleles for individuals resistant to TB, highlighting the importance of HLA class II and susceptibility to TB62.”

Third, a list of papers is cited about interaction between host genetic variants and strains of Mtb. Not only has this work been done in Ghana and South Africa, it has also been done in Uganda and a couple of Asian populations. This list of references really should be expanded.

We thank the reviewer for pointing out these studies. The manuscript has been updated to include these references (page: 9).

“Previous work has shown evidence of interaction between genetic variants of the host and specific strains of Mtb in Ghanaian, Ugandan, South African and Asian populations7,8,38–44.”

Fourth, a few things in Table S1 need to be clarified. Several cells have identical phrases used for TB diagnosis ("AFB staining and culturing of Mtb from sputum samples"). Is it really "and" or is it sometimes "or" or "and/or"? Do all of these studies truly have identical definitions? Was chest x-ray ever used in the definition? This seems quite surprising given previous reviews that have detailed the phenotype definitions in some of these studies. A few cells have "NA" listed in the TB diagnosis. This must be spelt out in the footnote of the table. Also, do those papers really have no detail about the phenotype definition? There must be something.

We thank the reviewer for pointing out these issues in the table. We have updated Table S1 (now renamed supplementary file 1a), and have double-checked and clarified the diagnostic procedures of all the included datasets so that a clear phenotype definition is given for every included study.

Fifth, Table S2 presenting the polygenic heritability analyses really needs to be clarified. The column headers are not explained in the footnote. What is the difference between 0.1x', 1x', and 10x', and what does that have to do with heritability? There is also relatively little discussion of this rather complex table in the Results. It also must be clarified in the Methods whether the SNPs were thinned for LD (this is generally done in these sorts of analyses).

We thank the reviewer for pointing out these issues, and we have updated the manuscript to clarify the analysis and results. Table S2 (now renamed supplementary file 1b) in the supplementary data has been updated, as it contained results from an analysis that we have discarded. The 0.1x, 1x and 10x values are prevalence multipliers, with which we wanted to see how the heritability estimates change with the infection pressure. This analysis was not included in the final manuscript, and the table has now been updated to remove the additional columns.

We have also updated the methods section of the heritability analysis to clarify that we used un-imputed data, and that the data was pruned for LD at a 50 SNP window, sliding by 10 SNPs at a time and removing all variants with LD greater than 0.5 (page: 19).

“The genetic relationship matrix was calculated for each autosomal chromosome (un-imputed data) which were pruned for SNPs in linkage disequilibrium (LD) using a 50 SNP window, sliding by 10 SNPs at a time and removing all variants with LD greater than 0.5.”
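The windowed pruning described above (50-SNP window, step of 10 SNPs, LD threshold of 0.5) can be sketched as follows. This is a simplified, hypothetical illustration in plain Python, not the software the authors ran. It assumes that "LD" means the squared genotype correlation r², and it always drops the later SNP of a correlated pair, which may differ from the rule used by dedicated tools. The names `r_squared` and `ld_prune` are illustrative only.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic SNP: correlation is undefined, so treat it as unlinked.
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, window=50, step=10, threshold=0.5):
    """Return indices of SNPs kept after greedy sliding-window pruning.

    genotypes : list of per-SNP genotype lists, ordered by position on one chromosome
    window    : number of consecutive SNPs examined together
    step      : number of SNPs the window advances each time
    threshold : pairs with r^2 above this value lose one member
    """
    n = len(genotypes)
    removed = set()
    start = 0
    while start < n:
        in_window = [i for i in range(start, min(start + window, n)) if i not in removed]
        for pos, i in enumerate(in_window):
            if i in removed:
                continue
            for j in in_window[pos + 1:]:
                if j in removed:
                    continue
                if r_squared(genotypes[i], genotypes[j]) > threshold:
                    # Simplification: always drop the later SNP of the pair.
                    removed.add(j)
        start += step
    return [i for i in range(n) if i not in removed]


if __name__ == "__main__":
    toy = [
        [0, 1, 2, 0, 1, 2],  # SNP 0
        [0, 1, 2, 0, 1, 2],  # SNP 1: identical to SNP 0, so it is pruned
        [2, 0, 1, 1, 0, 2],  # SNP 2: uncorrelated with SNP 0, so it is kept
    ]
    print(ld_prune(toy))
```

Pruning of this kind is applied per chromosome before the genetic relationship matrix is computed, so that runs of highly correlated markers do not dominate the relatedness estimates.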

Sixth, the supplemental data file really should include some sort of readme to help the reviewer know what they are looking at. While the manuscript includes a list of citations of papers that were used in these candidate gene look-ups, it would be helpful if the Supplementary file included a list of genes looked-up. It also isn't clear if only GWAS hits were looked up, or well-studied candidate genes. If not the latter, they really should be included, given the wealth of research in this area.

We thank the reviewer for this feedback and these suggestions. For the prior association analysis, we included previous GWAS hits as well as SNPs within well-studied candidate genes previously investigated. A list of all candidate SNPs and genes has been added to the supplementary Excel data file (sheet 2), along with a readme (sheet 1 of the same file) that explains the data in the file. The manuscript has been updated to clarify that both candidate and GWAS variants were assessed and which SNPs were included in this analysis (page: 13).

“To determine if associations from previously published TB-GWAS, TB candidate SNPs, and SNPs within candidate gene studies replicate in this meta-analysis, we extracted all significant and suggestive associations from prior analyses and compared these to our multi-ancestry and ancestry-specific meta-analysis results [6,10–18,20,23–26,31].”

Seventh, please clarify the infection pressure analysis. This is potentially a nice addition, but it isn't described very clearly. What are "a sample of 200 estimated histories"? There is an assumption that controls were "uniformly between 25-44 in 2010" – do the authors have age distributions for the controls in all these GWAS studies, and if the data were collected before 2010, how does that impact things?

We thank the reviewer for highlighting this. The full methodology is described in Houben and Dodd. A simulation approach is used to sample historical trajectories of TB infection risk from a Gaussian process model of infection risk fitted to data, and this uncertainty is propagated through the calculations. Furthermore, as we do not have age data for all the included studies, we decided to model the force of infection for controls aged 35-44 years. Averaging over a non-uniform age distribution within this range is likely to give similar central estimates (perhaps with a smaller SD), and quite a substantial skew in the age distribution would be needed to change the means appreciably. We therefore propose that differences in age would not have a large impact and that our chosen range is justified. Changing the year we chose for modelling (2010) would also not have a substantial impact, as TB epidemics tend to change rather slowly (~1.5% change in infection incidence per year), particularly since prevalence reflects lifetime exposure.

To help clarify this in the text without recapitulating the methods in the reference, we have changed this sentence to read (page: 21):

“The approach in this paper fits a Gaussian process model of infection risk history to local data. To represent uncertainty in derived results, a sample of 200 estimated histories of the annual risk of TB infection in each country was used to calculate the expected fraction of control participants ever infected with Mtb, assuming that controls were uniformly aged between 35-44 years in 2010, which approximates the period during which controls were recruited for most of the studies. The true age of the controls was not known for all of the datasets, but as quite a substantial skew to the age distribution would be required to have an impact on the results we believe our choice here is justified.”
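To make the calculation concrete, the following Python sketch shows one way the expected fraction of ever-infected controls could be derived from a set of sampled infection-risk histories. It is a minimal illustration under stated assumptions, not the method of Houben and Dodd: the function names are hypothetical, each year's annual risk of infection (ARI) is treated as a constant hazard, and the toy histories are made-up constants standing in for the 200 samples drawn from the fitted Gaussian process model.

```python
import math
from statistics import mean, stdev


def prob_ever_infected(ari_by_year, birth_year, survey_year=2010):
    """Probability of at least one infection between birth and survey_year,
    treating each year's annual risk of infection (ARI) as a constant hazard."""
    cumulative_hazard = sum(ari_by_year[y] for y in range(birth_year, survey_year))
    return 1.0 - math.exp(-cumulative_hazard)


def fraction_controls_infected(histories, ages=range(35, 45), survey_year=2010):
    """Mean and SD, across sampled ARI histories, of the expected fraction of
    controls ever infected, with ages uniformly distributed over `ages`."""
    per_history = []
    for ari_by_year in histories:
        by_age = [
            prob_ever_infected(ari_by_year, survey_year - age, survey_year)
            for age in ages
        ]
        per_history.append(mean(by_age))  # uniform average over the age band
    spread = stdev(per_history) if len(per_history) > 1 else 0.0
    return mean(per_history), spread


# Toy stand-ins for sampled histories (assumed values, not fitted to any data).
toy_histories = [
    {year: 0.01 for year in range(1960, 2010)},
    {year: 0.02 for year in range(1960, 2010)},
]
central, spread = fraction_controls_infected(toy_histories)
print(f"expected fraction ever infected: {central:.3f} (SD {spread:.3f})")
```

Because the cumulative hazard sums exposure over the whole lifetime, shifting the survey year or the age band by a few years changes the result only modestly when the ARI changes slowly, which is the intuition behind the robustness argument given above.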

[Editors’ note: what follows is the authors’ response to the second round of review.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Please address the remaining concerns of reviewer 3. Please also include a comment on the limitation that the authors may have missed relevant datasets of which they were unaware.

Reviewer #3 (Recommendations for the authors):

The authors addressed most of the prior comments very thoroughly. A couple of issues remain:

1) The paper that I referenced earlier https://pubmed.ncbi.nlm.nih.gov/34871961/ also includes heritability of TB disease (see Table 3).

We thank the reviewer for bringing this paper to our attention. We added a reference in the Polygenic heritability section (page 4) in the previous iteration of this manuscript. We have now expanded on this to add an explanation and example from the suggested study to explain the impact that phenotype variations can have on heritability estimates and related this back to the inconsistent phenotype definitions of the studies included in this meta-analysis.

The manuscript has been updated as follows on pages 4-5.

“This is supported by previous research by McHenry et al. (2021) where significant differences in polygenic heritability estimates were identified between subjects with latent TB infection (LTBI), active TB and subjects classified as resisters [31]. As this study includes data with varying methods of classifying TB cases and healthy controls (Supplementary file 1a) there is potential for a degree of heterogeneity and misclassification (between cases and controls) that can have an impact on the heritability estimates.”

2) Thank you for including the thorough case and control definitions in Supplemental Table 1a. Please review it carefully – it seems that some of the case and control definitions are reversed (for example, China 1 and China 2, both have descriptions of healthy individuals under case and TB disease diagnosis under control). There are some important points seen in this table. First, TB was not always excluded from the control populations – often these are generic controls, and this can introduce misclassification bias. Second, the lack of specificity of TB diagnosis also introduces heterogeneity and potential misclassification as well. Not all studies used the gold standard for TB diagnosis – how does this affect interpretation?

We thank the reviewer for pointing out that the case and control definitions in Supplemental Table 1a were reversed. We have corrected the table so that the information now appears in the correct columns. We are also grateful that the definitions in Supplemental Table 1a were carefully reviewed, and we thank the reviewer for the suggestion to highlight the impact that heterogeneity and potential misclassification in the phenotype definitions can have. We agree that inconsistent phenotype definitions introduce heterogeneity and potential misclassification, which can reduce the power to detect significant associations. However, having carefully examined the association statistics for residual heterogeneity and ancestry-specific heterogeneity, we find that, while various factors influence the results, the ancestry-specific factors clearly have the stronger impact. We reiterate this point in the discussion and again highlight the impact that phenotype classification can have on the results of this study.

The manuscript has been updated as follows on page 16.

“Specifically, the lack of consistency and specificity in TB diagnosis between the included studies introduces heterogeneity and the potential for misclassification of cases and controls, which can reduce the power to detect significant associations (Supplementary file 1a). While this is a limitation of this study, the fact that the residual heterogeneity is overpowered by the ancestry-specific heterogeneity suggests that the phenotype definitions are not the main driver behind the lack of significant associations.”

https://doi.org/10.7554/eLife.84394.sa2

Article and author information

Author details

  1. Haiko Schurz

    DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
    Contribution
    Conceptualization, Data curation, Investigation, Methodology, Writing – original draft, Writing – review and editing
    For correspondence
    haikoschurz@gmail.com
    Competing interests
    No competing interests declared
    Additional information
    co-first authors
    ORCID: 0000-0002-0009-3409
  2. Vivek Naranbhai

    1. Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    2. Massachusetts General Hospital, Boston, United States
    3. Dana-Farber Cancer Institute, Boston, United States
    4. Centre for the AIDS Programme of Research in South Africa, Durban, South Africa
    5. Harvard Medical School, Boston, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    Additional information
    co-first authors
  3. Tom A Yates

    Division of Infection and Immunity, Faculty of Medical Sciences, University College London, London, United Kingdom
    Contribution
    Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6081-1767
  4. James J Gilchrist

    1. Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    2. Department of Paediatrics, University of Oxford, Oxford, United Kingdom
    Contribution
    Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2045-6788
  5. Tom Parks

    1. Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    2. Department of Infectious Diseases Imperial College London, London, United Kingdom
    Contribution
    Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Peter J Dodd

    School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom
    Contribution
    Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Marlo Möller

    DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
    Contribution
    Resources, Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0805-6741
  8. Eileen G Hoal

    DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
    Contribution
    Resources, Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  9. Andrew P Morris

    Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, United Kingdom
    Contribution
    Software, Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6805-6014
  10. Adrian VS Hill

    1. Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    2. Jenner Institute, University of Oxford, Oxford, United Kingdom
    Contribution
    Resources, Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  11. International Tuberculosis Host Genetics Consortium

    Contribution
    Conceptualization, Data curation, Writing – review and editing
    Competing interests
    No competing interests declared
    1. Haiko Schurz, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
    2. Vivek Naranbhai, Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    3. Tom A Yates, Division of Infection and Immunity, Faculty of Medical Sciences, University College London, London, United Kingdom
    4. James J Gilchrist, Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    5. Tom Parks, Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    6. Peter J Dodd, School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom
    7. Marlo Möller, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
    8. Eileen G Hoal, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
    9. Andrew P Morris, Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, United Kingdom
    10. Adrian VS Hill, Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    11. Reinout van Crevel, Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, Netherlands
    12. Arjan van Laarhoven, Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, Netherlands
    13. Tom HM Ottenhoff, Head Lab Dept of Infectious Diseases; Head Group Immunology and Immunogenetics of Bacterial Infectious Diseases Leiden University Medical Center, Leiden, Netherlands
    14. Andres Metspalu, Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
    15. Reedik Magi, Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
    16. Christian G Meyer, Institute of Tropical Medicine, Eberhard-Karls University Tübingen, Tübingen, Germany
    17. Magda Ellis, Tuberculosis Research Group, Centenary Institute, Sydney, Australia
    18. Thorsten Thye, School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom
    19. Surakameth Mahasirimongkol, Department of Medical Sciences, Ministry of Public Health, Nonthaburi, Thailand
    20. Ekawat Pasomsub, Virology Laboratory, Department of Pathology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
    21. Katsushi Tokunaga, Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
    22. Yosuke Omae, Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
    23. Hideki Yanai, Fukujuji Hospital and Research Institute of Tuberculosis, Japan Anti-Tuberculosis Association, Kiyose, Japan
    24. Taisei Mushiroda, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
    25. Michiaki Kubo, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
    26. Atsushi Takahashi, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
    27. Yoichiro Kamatani, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
    28. Bachti Alisjahbana, Faculty of Medicine, Universitas Padjadjaran - Hasan Sadikin Hospital, Bandung, Indonesia
    29. Wei Liu, Department of Plastic and Reconstructive Surgery, Shanghai Key Laboratory of Tissue Engineering, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University – School of Medicine, Shanghai, China
    30. A-dong Sheng, National Clinical Research Center for Respiratory Diseases, National Key Discipline of Pediatrics, Capital Medical University, Beijing, China
    31. Yurong Yang, Ningxia Medical University, Ningxia Hui Autonomous Region, Ningxia, China

Funding

National Institute for Health Research (Academic Clinical Lectureship)

  • James J Gilchrist

Versus Arthritis (21754)

  • Andrew P Morris

Medical Research Council (MR/P022081/1)

  • Peter J Dodd

National Institute for Health Research (NIHR Clinical Lecturer)

  • Tom A Yates

National Institute for Health Research (CL-2020-21-001)

  • Tom Parks

Wellcome (10.35802/222098)

  • Tom Parks

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Acknowledgements

Computation used the Oxford Biomedical Research Computing (BMRC) facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. Financial support was provided by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This work was partly supported by a Grant-in-Aid for Scientific Research (B) (KAKENHI 21406006) from the Japan Society for the Promotion of Science (JSPS). The collection of clinical information and samples in Thailand was supported in part by JSPS KAKENHI 17256005 and later by a research grant from the Ministry of Health, Labor and Welfare (MHLW) H21-aids-12. We would like to thank all the subjects and the members of the Rotary Club of Osaka-Midosuji District 2660 Rotary International in Japan who donated their DNA for this work. We thank all members of BioBank Japan, Institute of Medical Science, The University of Tokyo, and of RIKEN Center for Genomic Medicine for their contribution to the completion of our study. This work was conducted as a part of the BioBank Japan Project that was supported by the Ministry of Education, Culture, Sports, Science and Technology of the Japanese government. For the Thai samples, we thank all of the staff and collaborators of the TB/HIV Research Project, Thailand, a research project between the Research Institute of Tuberculosis, the Japan Anti-tuberculosis Association, and the Thai Ministry of Public Health for collecting clinical data and DNA samples. We thank the German Consortium 'TB or not TB Network' (https://www.tbornottb.de/), which was responsible for collecting the German TB samples.
We acknowledge the support of the DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa. This research was funded in whole, or in part, by the Wellcome Trust. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. JJG is funded by an NIHR Academic Clinical Lectureship. APM acknowledges support from Versus Arthritis (grant reference 21754). PJD was supported by a fellowship from the UK Medical Research Council (MR/P022081/1); this UK-funded award is part of the EDCTP2 program supported by the European Union. ME was supported by an NHMRC fellowship (552496). The research was supported by the NHMRC grant 1025166. AvL and RvC are supported by the National Institute of Allergy and Infectious Diseases at NIH [R01 AI136921]. TAY is an NIHR Clinical Lecturer supported by the National Institute for Health Research. TP acknowledges funding from the National Institute for Health Research (CL-2020-21-001) and the Wellcome Trust (222098/Z/20/Z). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health and Social Care. AM and RM are funded by the EU project no. 2014-2020.4.01.15-0012 'Gentransmed'. BA is supported by the 'Scientific Programme Indonesia Netherlands' (SPIN) under the Royal Academy of Arts and Sciences (KNAW), the Netherlands.

Ethics

A research collaboration agreement was signed by all contributors. Ethics approval for the meta-analysis presented here was granted by the Health Research Ethics Committee of Stellenbosch University (project registration number S17/01/013). In addition, all institutions involved in the ITHGC have ethics approval for their respective studies: China 1 and 2: The study protocol was approved by the Ethics Committee of the Beijing Chest Hospital, the 309 Hospital of the PLA, Shijiazhuang Fifth Hospital, the China PLA General Hospital, the Tongliao TB institute and the Center for Diseases Control and Prevention in Jalainuoer. China 3: Ethics approval was granted by the Ethics Committees of the Beijing Children's Hospital, the Beijing Geriatric Hospital, the Tuberculosis Hospital in Shaanxi Province, the Beijing Institute of Genomics, Chinese Academy of Sciences and the Center for Disease Control and Prevention of Jiangsu Province. Thailand: Ethics approval was granted by the Ethics Review Committee of the Ministry of Public Health in Thailand. Japan: Ethics approval was granted by the Institutional Review Board of the Center for Genomic Medicine, RIKEN. Russia: Blood samples from all participants were collected and studied with written informed consent according to the Declaration of Helsinki and with approvals from the local ethics committees in Russia (St. Petersburg and Samara) and the UK (Human Biological Resource Ethics Committee of the University of Cambridge and the National Research Ethics Service, Cambridgeshire 1 REC, 10/H0304/71). Estonia: The Estonian Bioethics and Human Research Council (EBIN) approved the Estonian Genome Center study reported in this manuscript.
Germany: The study protocol was approved by the ethics committee (EC) of the University of Luebeck, Germany (reference 07-125), and was adopted by other ethics committees covering all 18 participating centres (EC of the medical faculty of the University of Goettingen; EC of the Medical Council of Hessen, Frankfurt/Main; EC of the Medical Council Hamburg; EC of the Medical Council Lower Saxony, Hannover; EC of the Medical Faculty Carl Gustav Carus, Technical University of Dresden; EC of the Medical Council Berlin; EC of the Medical Council Bavaria, Munich; EC of the Medical Faculty, Friedrich-Alexander-University Erlangen-Nuremberg; EC of the Medical Faculty of the University of Regensburg; EC of the University of Witten/Herdecke). Gambia: Ethics approval was granted by the Medical Research Council (MRC) and the Gambian government joint ethical committee. Ghana: Ethics approval was granted by the Committee on Human Research, Publications and Ethics, School of Medical Sciences, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana, and the Ethics Committee of the Ghana Health Service, Accra, Ghana. RSA A and RSA M: Ethics approval was granted by the Health Research Ethics Committee of Stellenbosch University (project registration numbers S17/01/013, NO6/07/132 and 95/072).

Senior Editor

  1. Bavesh D Kana, University of the Witwatersrand, South Africa

Reviewing Editor

  1. Alexander Young, University of California, Los Angeles, United States

Reviewer

  1. Alexander Young, University of California, Los Angeles, United States

Version history

  1. Preprint posted: August 30, 2022 (view preprint)
  2. Received: October 23, 2022
  3. Accepted: November 23, 2023
  4. Version of Record published: January 15, 2024 (version 1)
  5. Version of Record updated: January 16, 2024 (version 2)

Copyright

© 2024, Schurz et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Haiko Schurz
  2. Vivek Naranbhai
  3. Tom A Yates
  4. James J Gilchrist
  5. Tom Parks
  6. Peter J Dodd
  7. Marlo Möller
  8. Eileen G Hoal
  9. Andrew P Morris
  10. Adrian VS Hill
  11. International Tuberculosis Host Genetics Consortium
(2024)
Multi-ancestry meta-analysis of host genetic susceptibility to tuberculosis identifies shared genetic architecture
eLife 13:e84394.
https://doi.org/10.7554/eLife.84394
