Host-pathogen genetic interactions underlie tuberculosis susceptibility in genetically diverse mice

  1. Clare M Smith (corresponding author)
  2. Richard E Baker
  3. Megan K Proulx
  4. Bibhuti B Mishra
  5. Jarukit E Long
  6. Sae Woong Park
  7. Ha-Na Lee
  8. Michael C Kiritsy
  9. Michelle M Bellerose
  10. Andrew J Olive
  11. Kenan C Murphy
  12. Kadamba Papavinasasundaram
  13. Frederick J Boehm
  14. Charlotte J Reames
  15. Rachel K Meade
  16. Brea K Hampton
  17. Colton L Linnertz
  18. Ginger D Shaw
  19. Pablo Hock
  20. Timothy A Bell
  21. Sabine Ehrt
  22. Dirk Schnappinger
  23. Fernando Pardo-Manuel de Villena
  24. Martin T Ferris
  25. Thomas R Ioerger
  26. Christopher M Sassetti (corresponding author)
  1. Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, United States
  2. Department of Molecular Genetics and Microbiology, Duke University, United States
  3. Department of Immunology and Microbial Disease, Albany Medical College, United States
  4. Department of Microbiology and Immunology, Weill Cornell Medical College, United States
  5. University Program in Genetics and Genomics, Duke University, United States
  6. Department of Genetics, University of North Carolina at Chapel Hill, United States
  7. Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, United States
  8. Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, United States
  9. Department of Computer Science and Engineering, Texas A&M University, United States

Abstract

The outcome of an encounter with Mycobacterium tuberculosis (Mtb) depends on the pathogen’s ability to adapt to the variable immune pressures exerted by the host. Understanding this interplay has proven difficult, largely because experimentally tractable animal models do not recapitulate the heterogeneity of tuberculosis disease. We leveraged the genetically diverse Collaborative Cross (CC) mouse panel in conjunction with a library of Mtb mutants to create a resource for associating bacterial genetic requirements with host genetics and immunity. We report that CC strains vary dramatically in their susceptibility to infection and produce qualitatively distinct immune states. Global analysis of Mtb transposon mutant fitness (TnSeq) across the CC panel revealed that many virulence pathways are only required in specific host microenvironments, identifying a large fraction of the pathogen’s genome that has been maintained to ensure fitness in a diverse population. Both immunological and bacterial traits can be associated with genetic variants distributed across the mouse genome, making the CC a unique population for identifying specific host-pathogen genetic interactions that influence pathogenesis.

Editor's evaluation

This work takes advantage of the genetic diversity of a panel of mice, termed the Collaborative Cross, to identify host factors that contribute to heterogeneous outcomes after tuberculosis infection. The authors infect this panel of mouse strains with pools of Mycobacterium tuberculosis transposon mutants, allowing identification of specific host genotypes that confer fitness effects on certain bacterial mutants. The resulting analyses identify loci that affect quantitative immunological phenotypes or fitness of select bacterial mutants. The study is likely to be an important resource to microbiologists in general and to those studying the host immune response to tuberculosis infection.

https://doi.org/10.7554/eLife.74419.sa0

Introduction

Infection with Mycobacterium tuberculosis (Mtb) produces heterogeneous outcomes that are influenced by genetic and phenotypic variation in both the host and the pathogen. Classic human genetic studies show that host variation influences immunity to tuberculosis (TB) (Abel et al., 2018; Comstock, 1978). Likewise, the co-evolution of Mtb with different populations across the globe has produced genetically distinct lineages that demonstrate variable virulence traits (Gagneux et al., 2006; Hershberg et al., 2008; Wirth et al., 2008). The role of genetic variation on each side of this interaction is established, yet the intimate evolutionary history of both genomes suggests that interactions between host and pathogen variants may represent an additional determinant of outcome (McHenry et al., 2020). Evidence for genetic interactions between host and pathogen genomes has been identified in several infections (Ansari et al., 2017; Berthenet et al., 2018), including TB (Caws et al., 2008; Holt et al., 2018; Thuong et al., 2016). However, the combinatorial complexity involved in identifying these relationships in natural populations has left the mechanisms largely unclear.

Mouse models have proven to be a powerful tool to understand mechanisms of susceptibility to TB. Host requirements for protective immunity were discovered by engineering mutations in the genome of standard laboratory strains of mice, such as C57BL/6 (B6), revealing a critical role of Th1 immunity. Mice lacking factors necessary for the production of Th1 cells or the protective cytokine interferon gamma (IFNγ) are profoundly susceptible to Mtb infection (Caruso et al., 1999; Cooper et al., 1993; Cooper et al., 1997; Flynn et al., 1993; Saunders et al., 2002). Defects in this same immune axis cause the human syndrome Mendelian Susceptibility to Mycobacterial Disease (MSMD) (Altare et al., 1998; Bogunovic et al., 2012; Bustamante et al., 2014; Filipe-Santos et al., 2006), demonstrating the value of knockout (KO) mice to characterize genetic variants of large effect. Similarly, the standard mouse model has been used to define Mtb genes that are specifically required for optimal bacterial fitness during infection (Bellerose et al., 2020; Sassetti and Rubin, 2003; Zhang et al., 2013).

Despite the utility of standard mouse models, it has become increasingly clear that the immune response to Mtb in genetically diverse populations is more heterogeneous than any single small animal model (Smith and Sassetti, 2018). For example, while IFNγ-producing T cells are critical for protective immunity in standard inbred lines of mice, a significant fraction of humans exposed to Mtb control the infection without producing a durable IFNγ response (Lu et al., 2019). Similarly, IL-17-producing T cells have been implicated in both protective responses and inflammatory tissue damage in TB, but IL-17 has little effect on disease progression in B6 mice, except in the context of vaccination or infection with particularly virulent Mtb (Gopal et al., 2012; Khader et al., 2007). The immunological homogeneity of standard mouse models may also explain why only a small minority of the >4000 genes that have been retained in the genome of Mtb during its natural history promote fitness in the mouse (Bellerose et al., 2020). Thus, homogeneous mouse models of TB fail to capture the distinct disease states, mechanisms of protective immunity, and selective pressures on the bacterium that are observed in natural populations.

The Collaborative Cross (CC) and Diversity Outbred (DO) mouse populations are new mammalian resources that more accurately represent the genetic and phenotypic heterogeneity observed in outbred populations (Churchill et al., 2004; Churchill et al., 2012). These mouse panels are both derived from the same eight diverse founder strains but have distinct population structures (Saul et al., 2019). DO mice are maintained as an outbred population and each animal represents a unique and largely heterozygous genome (Keller et al., 2018; Svenson et al., 2012). In contrast, each inbred CC strain’s genome is almost entirely homozygous, producing a genetically stable and reproducible population in which the phenotypic effect of recessive mutations is maximized (Shorter et al., 2019; Srivastava et al., 2017). Together, these resources have been leveraged to identify host loci underlying the immune response to infectious diseases (Noll et al., 2019). In the context of TB, DO mice have been used as individual, unique hosts to identify correlates of disease, which resemble those observed in non-human primates and humans (Ahmed et al., 2020; Gopal et al., 2013; Koyuncu et al., 2021; Niazi et al., 2015). Small panels of the reproducible CC strains have been leveraged to identify host background as a determinant of the protective efficacy of BCG vaccination (Smith et al., 2016) and a specific variant underlying protective immunity to tuberculosis (Smith et al., 2019). While these studies demonstrate the tractability of the DO and CC populations to model the influence of host diversity on infection, dissecting host-pathogen interactions requires the integration of pathogen genetic diversity.

We combined the natural but reproducible host variation of the CC panel with a comprehensive library of Mtb transposon mutants to determine whether the CC population could be used to characterize the interactions between host and pathogen. Using over 60 diverse mouse strains, we report that the CC panel encompasses a broad spectrum of TB susceptibility and immune phenotypes. By leveraging high-resolution bacterial phenotyping known as ‘Transposon Sequencing’ (TnSeq), we quantified the relative fitness of a saturated library of Mtb mutants across the CC panel and specific immunological mouse knockout strains. We report that approximately three times more bacterial genes contribute to fitness in the diverse panel than in any single mouse strain, defining a large fraction of the bacterial genome that is dedicated to adapting to distinct immune states. Association of both host immunological phenotypes and bacterial fitness traits with Quantitative Trait Loci (QTL) demonstrated the presence of discrete Host-Interacting-with Pathogen QTL (HipQTL) that represent inter-species genetic interactions that influence the pathogenesis of this infection. Together, these observations support the CC population as a tractable model of host diversity that greatly expands the spectrum of immunological and pathological states that can be modeled in the mouse.

Results

The spectrum of TB disease traits in the CC exceeds that observed in standard inbred mice

To characterize the diversity of disease states that are possible in a genetically diverse mouse population, we infected a panel of 52 CC lines and the eight founder strains with Mtb. To enable bacterial transposon sequencing (TnSeq) studies downstream, the animals were infected via the intravenous (IV) route with a saturated library of Mtb transposon mutants (infectious dose of 10⁵ CFU), which in sum produce an infection that is similar to the wild-type parental strain (Bellerose et al., 2020; Sassetti and Rubin, 2003). Groups of three to six male mice per genotype were infected and TB disease-related traits were quantified at one-month post-infection. Data from all surviving animals that were phenotyped are provided in Figure 1—source data 1. The bacterial burden after 4 weeks of infection was assessed by plating (colony-forming units, CFU) and quantifying the number of bacterial chromosomes in the tissue (chromosome equivalents, CEQ). These two metrics were highly correlated (r = 0.88) and revealed a wide variation in bacterial burden across the panel (Figure 1A and Figure 1—figure supplement 1). The phenotypes of the inbred founder strains were largely consistent with previous studies employing an aerosol infection (Smith et al., 2016), where the WSB strain was more susceptible than the more standard B6, 129S1/SvImJ (129), and NOD/ShiLtJ (NOD) strains. Across the entire CC panel, lung bacterial burden varied by more than 1000-fold, ranging from animals that are significantly more resistant than B6, to mice that harbored more than 10⁹ bacteria in their lungs (Figure 1A). Bacterial burden in the spleen also varied several thousand-fold across the panel and was moderately correlated with lung burden (r = 0.43) (Figure 1—figure supplement 1). Thus, the CC panel encompasses a large quantitative range of susceptibility.

Figure 1
The spectrum of M. tuberculosis disease-related traits across the Collaborative Cross.

(A) Average lung CFU (log10) across the CC panel at 4 weeks post-infection. Bars show mean ± SD for CFU per CC or parental strain; groups of three to six mice per genotype were infected via IV route (infectious dose of 10⁴ in the lungs and 10⁵ in the spleen as quantified by plating CFU 24 hr post-infection). To compare the field standard B6 mouse strain with the diverse CC mouse strains, bars noted with * indicate strains that were statistically different from B6 (p < 0.05; 1-factor ANOVA with Dunnett’s post-test). (B) Heatmap of the 32 disease-related traits (log10 transformed) measured including: lung and spleen colony forming units (CFU); lung and spleen chromosomal equivalents (CEQ); weight loss (% change); cytokines from lung; ‘earliness of death’ (EoD), reflecting the number of days prior to the end of experiment that moribund strains were euthanized. Mouse genotypes are ordered by lung CFU. Scaled trait values were clustered (hclust in R package heatmaply) and dendrogram nodes colored by 3 k-means. Blue node reflects correlation coefficient R > 0.7; green R = 0.3–0.6 and red R < 0.2. Source files of all measured phenotypes are available in Figure 1—source data 1. (C) Correlation of lung CFU and weight (% change) shaded by CXCL1 levels. Genotypes identified as statistical outliers for weight are noted by #; CXCL1 by † (CC030 is triangle with #†; CC040 is triangle with #; AJ is circle with #; CC056 is circle with †). (D) Correlation of lung CFU and IFNγ levels shaded by IL-17. Strains identified as outliers for IFNγ noted by # (CC055 is left circle with #, AJ is right circle with #, CC051 is triangle with #). Each point in (C) and (D) is the average value per genotype. Outlier genotypes were identified after linear regression using studentized residuals.
(E–H) Disease traits measured in a validation cohort (B6 vs CC042, CC032, CC037, and CC027) at 4 weeks after low-dose aerosol infection: (E) lung CFU (log10); (F) weight (percent change relative to uninfected); (G) CXCL1 abundance in lung (log10 pg/mL homogenate); (H) IFNγ (log10 pg/mL homogenate). Bar plots show the mean ± SD. p-Values indicate strains that were statistically different from B6 (1-factor ANOVA with Dunnett’s post-test). Source files of all measured phenotypes in the aerosol validation cohort are available in Figure 1—source data 2. Groups consist of three to six mice per genotype. All mice in the initial CC screen and validation cohort were male.

Figure 1—source data 1

CC TB disease phenotypes.

TB disease-related phenotypes measured in the CC and parental strains at one-month post-infection. Recorded values are the average and standard deviation of indicated number of mice per genotype (‘N of mice infected’ at the start of the large screen and ‘N of surviving phenotyped animals’). Mice were infected over three batches (denoted by ‘block’). ‘Freezer days’ denotes the number of days prior to the one-month end of infection timepoint that some moribund genotypes were euthanized in accordance with IACUC approved endpoints. ‘Blaze’ denotes genotypes with white head-spotting coat color trait (WSB haplotype for Kitl; used as a positive control/proof-of-concept for QTL mapping as per Aylor et al., 2011; Smith et al., 2019).

https://cdn.elifesciences.org/articles/74419/elife-74419-fig1-data1-v2.zip
Figure 1—source data 2

Aerosol validation phenotypes.

TB disease-related phenotypes measured in B6 and the susceptible CC genotypes (CC027, CC032, CC037, CC042) after infection with Mtb by low-dose aerosol infection. Recorded values are the individual measurements per mouse, designated by genotype.

https://cdn.elifesciences.org/articles/74419/elife-74419-fig1-data2-v2.zip

Comparing various measures of infection progression showed many expected correlations but also an unexpected decoupling of some phenotypes. As an initial assessment of the disease processes in these animals, we correlated bacterial burden and lung cytokine abundance with measures of systemic disease such as weight loss and sufficient morbidity to require euthanasia (‘earliness of death’). In general, correlations between these metrics indicated that systemic disease was associated with bacterial replication and inflammation (Figure 1B and Figure 1—figure supplement 1). Lung CFU was strongly correlated with weight loss, mediators that enhance neutrophil differentiation or migration (CXCL2 (MIP-2; r = 0.79), CCL3 (MIP-1α; r = 0.77), G-CSF (r = 0.78), and CXCL1 (KC; r = 0.76)), and more general proinflammatory cytokines (IL-6 (r = 0.80) and IL-1α (r = 0.76)) (Figure 1—figure supplement 1). These findings are consistent with previous work in the DO panel, which found both proinflammatory chemokines and neutrophil accumulation to be predictors of disease (Ahmed et al., 2020; Gopal et al., 2013; Koyuncu et al., 2021; Niazi et al., 2015).

The reproducibility of CC genotypes allowed us to quantitatively assess the heritability (h²) of these immunological and disease traits. The percent of the variation attributed to genotype ranged from 56% to 87% (mean = 73.4%; Appendix 1—table 1). The dominant role of genetic background in determining the observed phenotypic range allowed a more rigorous assessment of strains possessing outlier phenotypes than is possible in the DO population, based on linear regression using studentized residuals that accounts for the intragenotype variation. For example, despite the correlation between lung CFU and weight loss (r = 0.57), several strains failed to conform to this relationship (Figure 1C). In particular, CC030/GeniUnc (p = 0.003), CC040/TauUnc (p = 0.027) and A/J (p = 0.03) lost more weight than their bacterial burdens would predict (Figure 1C; noted by #). Similarly, CXCL1 abundance was higher in CC030/GeniUnc (p = 0.001) and lower in CC056/GeniUnc (p = 0.040), than the level predicted by their respective bacterial burden (Figure 1C; outlier genotypes noted by †). Thus, these related disease traits can be dissociated based on the genetic composition of the host.
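As a rough illustration of what "percent of the variation attributed to genotype" can mean, the sketch below computes the between-genotype share of the total sum of squares from replicate measurements. This is an assumption about one simple way to estimate such a fraction, not necessarily the estimator used for Appendix 1—table 1; the function name, strain labels, and values are hypothetical.

```python
from statistics import mean


def variance_explained_by_genotype(trait_by_genotype):
    """Return SS_between / SS_total for a trait measured in replicate mice.

    trait_by_genotype maps a genotype label to a list of per-mouse values.
    This is a simple one-way ANOVA decomposition (an illustrative choice).
    """
    all_values = [v for values in trait_by_genotype.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        # No variation at all, so nothing can be attributed to genotype.
        return 0.0
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in trait_by_genotype.values()
    )
    return ss_between / ss_total


# Hypothetical log10 lung CFU values for two made-up genotypes.
example = {"strain_A": [1.0, 3.0], "strain_B": [5.0, 7.0]}
print(variance_explained_by_genotype(example))
```

In this toy example most of the spread lies between the two genotypes rather than within them, so the fraction is high; more within-genotype scatter would lower it.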

The cluster of cytokines that was most notably unrelated to bacterial burden included IFNγ and the interferon-inducible chemokines CXCL10 (IP10), CXCL9 (MIG), and CCL5 (RANTES) (Red cluster in Figure 1B; Figure 1—figure supplement 1) (R < 0.3). Despite the clear protective role for IFNγ (Cooper et al., 1993; Flynn et al., 1993), high levels have been observed in susceptible mice, likely as a result of high antigen load (Barber et al., 2011; Lazar-Molnar et al., 2010). While high IFNγ levels in susceptible animals were therefore expected, it was more surprising to find a number of genotypes that were able to control bacterial replication yet had very low levels of this critically important cytokine (Figure 1D). This observation is likely due to the inclusion of two founder lines, CAST/EiJ (CAST) and PWK/PhJ (PWK), that we previously found to display this unusual phenotype (Smith et al., 2016).

To assess the reproducibility of these findings in an aerosol infection model, we tested four CC genotypes that were susceptible by IV infection: CC027, CC032, CC037, and CC042. We infected groups of 4–6 mice per genotype with the H37Rv strain via low-dose aerosol infection (~100 CFU), including B6 mice as resistant controls. At 4 weeks post-infection, we quantified lung CFU, lung cytokine abundance and weight loss as measurements of TB disease. Compared to the resistant B6 mice, the selected CC strains demonstrated higher bacterial burden in the lung (Figure 1E) and significant weight loss (Figure 1F), thus validating disease traits as consistent across both route and dose. Likewise, cytokines that were highly correlated with lung burden in the CC screen (Figure 1B, Figure 1—figure supplement 1) were consistent in the aerosol validation study (Figure 1—figure supplement 2). Notably, CXCL1 was consistently high in the susceptible genotypes, as compared to B6 (Figure 1G), and was highly correlated with lung burden by both IV (R = 0.76) and aerosol (R = 0.92) routes. IFNγ levels were variable across the strains (Figure 1H) and did not correlate with lung CFU (R = −0.22), concordant with findings from the CC screen (R = −0.21). Altogether, this survey of TB-related traits demonstrated a broad range of susceptibility and the presence of qualitatively distinct and genetically determined disease states.

TipQTL define genetic variants that control TB immunophenotypes

Tuberculosis ImmunoPhenotype Quantitative Trait Loci (TipQTL), which were associated with TB disease or cytokine traits, were identified and numbered in accordance with previously reported TipQTL (Smith et al., 2019). Of the 32 TB-disease traits, we identified nine individual metrics that were associated with a chromosomal locus. Of these, three were associated with high confidence (p ≤ 0.053), and six other QTL met a suggestive threshold as determined by permutation analysis (p < 0.2; Table 1). Several individual trait QTL occupied the same chromosomal locations. For example, spleen CFU and spleen CEQ, which are both measures of bacterial burden and highly correlated, were associated with the same interval on distal chromosome 2 (Table 1, Tip5; Figure 2A and C). IL-10 abundance was associated with two distinct QTL (Table 1). While IL-10 was only moderately correlated with spleen CFU (R = 0.48), one of its QTL fell within the Tip5 bacterial burden interval on chromosome 2 (Figure 2A and C). At this QTL, the NOD haplotype was associated with high values for all three traits (Figure 2E). Similarly, the strongly correlated traits, CXCL1 abundance and lung CFU, were individually associated with the same region on chromosome 7 (Table 1, Tip8; Figure 2B and D). In this interval, the CAST haplotype was associated with both low bacterial burden and low CXCL1 (Figure 2F). At both Tip5 and Tip8, we found no statistical evidence that the positions of the associated QTL were different (Tip5 p = 0.55; Tip8 p = 0.27; 400 bootstrap samples) (Boehm et al., 2019). These observations support the role of a single causal variant at each locus that is responsible for a pleiotropic trait. Coincident mapping can both provide additional statistical support for QTL (p values by Fisher’s combined probability test: Chr 7, p = 0.067; Chr 2, p = 0.041) and suggest potential mechanisms of disease progression.
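Fisher’s combined probability test sums −2·ln(p) over k p-values and compares the total to a chi-square distribution with 2k degrees of freedom. The minimal sketch below uses the closed-form tail probability available for even degrees of freedom. Feeding it the two chromosome 7 p-values listed in Table 1 (CXCL1, 0.106; lung CFU, 0.117) gives a value of about 0.067, which matches the reported Chr 7 figure; that these were the exact inputs used is an assumption. The function name is illustrative, and the method formally assumes independent tests.

```python
import math


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail is exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no external
    statistics library is needed.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    tail_terms = sum(half ** i / math.factorial(i) for i in range(k))
    return math.exp(-half) * tail_terms


# Chromosome 7 p-values as listed in Table 1 (CXCL1 and lung CFU).
combined = fisher_combined_p([0.106, 0.117])
print(round(combined, 3))
```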

Figure 2
Host loci underlying TB disease-related traits.

(A–B) Whole genome QTL scans of (A) spleen CEQ, spleen CFU and IL-10 and (B) lung CFU and CXCL1. (C) Zoom of the chromosome 2 locus. (D) Zoom of the chromosome 7 locus. Thresholds were determined by permutation analysis; solid line, middle dashed line, and lowest dotted lines represent p = 0.05, p = 0.1, and p = 0.2. (E–F) Scaled phenotype value per haplotype at the QTL peak marker. Each dot represents the mean value for a genotype.

Table 1
Disease-related Tuberculosis ImmunoPhenotype QTL (TipQTL).

Multiple QTL within the same interval and clear allele effects are designated with the same TipQTL number. p-Values are determined by Churchill-Doerge permutations (Churchill and Doerge, 1994). Column headings: QTL, quantitative trait loci; Chr, chromosome; LOD, logarithm of the odds; CEQ, chromosomal equivalents.

QTL    Trait       Chr  LOD   p value   Interval start (Mb)  Peak (Mb)  Interval end (Mb)
Tip5   Spleen CEQ  2    9.14  2.38E-02  174.29               178.25     178.25
Tip5   Spleen CFU  2    7.04  2.19E-01  73.98                174.29     180.10
Tip6   IL-9        2    8.61  4.52E-02  33.43                41.4       41.48
Tip6   IL-9        2    7.85  1.26E-01  22.77                24.62      25.65
Tip7   IL-17       15   7.84  5.27E-02  67.98                74.14      82.11
Tip8   CXCL1       7    7.57  1.06E-01  30.43                45.22      46.72
Tip8   Lung CFU    7    7.47  1.17E-01  31.06                37.78      45.22
Tip9   IL-10       17   7.16  1.85E-01  80.98                82.47      83.55
Tip10  Lung CFU    15   7.13  1.86E-01  77.00                78.16      78.70

A number of factors can limit the statistical significance of QTL identified in the CC population, including small effect sizes, limited genotype availability, and the genetic complexity of the trait. We took an F2 intercross approach to independently assess the importance of the lung CFU QTL on chromosomes 7 and 15 (Tip8 and Tip10, Table 1). Given that the associations at both QTL were driven by the CAST haplotype (Figure 2F), we generated an F2 population based on two CC strains, CC029/Unc and CC030/GeniUnc, that contained CAST sequence at Tip8 and Tip10, respectively (Figure 3A and B). The F2 validation cohort (n = 251 mice) was genotyped (Sigmon et al., 2020) and infected with the Mtb strain H37Rv (IV route with infectious dose of 10⁵ CFU, as per the original CC screen). At 1 month post-infection, lung CFU was quantified, and we conducted QTL mapping in R/qtl2 (Broman et al., 2019) to identify host loci underlying bacterial burden in the lung. We identified a QTL significantly associated with lung CFU (LOD = 6.81; p < 0.05; 10,000 permutations) on chromosome 7 that overlapped with Tip8 (peak position Chr7:28.6 Mb), thus validating this locus as a main driver of bacterial burden. In this reduced complexity cross, we did not observe a QTL on chromosome 15 (Tip10). This may be due to the B6 haplotype at this locus in CC030, which did not represent the strongest phenotypic contrast to CAST. Additionally, in the mapping validation study, we identified a new resistance (low lung CFU) locus on chromosome 8 (LOD = 4.08; peak position Chr8:116.1 Mb), driven by the CC029 cross partner with the CAST haplotype. This QTL was not present in the original CC screen, probably due to the low representation of the CAST haplotype at that marker in the CC cohort tested. Altogether, this intercross strategy validated Tip8 as a strong predictor of lung CFU, though rigorous validation of Tip10 may require a more optimal pairing of parental strains.
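The permutation logic behind a genome-wide significance threshold can be sketched as follows: shuffle phenotypes relative to genotypes, record the maximum LOD across all markers for each shuffle, and take the upper quantile of those maxima. This is a simplified stand-in and not the R/qtl2 implementation used in the study. It assumes additive 0/1/2 genotype codes, a single-marker regression LOD of −(n/2)·log10(1 − r²), simulated data, and only 200 permutations to keep the example fast; all names are hypothetical.

```python
import math
import random


def lod_score(genotypes, phenotypes):
    """Single-marker regression LOD: -(n/2) * log10(1 - r^2)."""
    n = len(phenotypes)
    g_mean = sum(genotypes) / n
    p_mean = sum(phenotypes) / n
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((p - p_mean) ** 2 for p in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or trait carries no signal
    sxy = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    r2 = min(sxy * sxy / (sxx * syy), 1.0 - 1e-12)  # guard against log10(0)
    return -(n / 2.0) * math.log10(1.0 - r2)


def permutation_threshold(markers, phenotypes, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold: upper-alpha quantile of per-permutation max LOD."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        max_lods.append(max(lod_score(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(math.ceil((1.0 - alpha) * n_perm) - 1, n_perm - 1)
    return max_lods[index]


# Simulated F2-like data: 60 mice, 20 markers, marker 0 affects the trait.
rng = random.Random(1)
markers = [[rng.choice([0, 1, 2]) for _ in range(60)] for _ in range(20)]
phenotypes = [2.0 * g + rng.gauss(0.0, 1.0) for g in markers[0]]

threshold = permutation_threshold(markers, phenotypes)
observed = lod_score(markers[0], phenotypes)
print(observed > threshold)
```

Taking the maximum across markers within each permutation is what makes the threshold genome-wide rather than per-marker.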

Figure 3
An F2 intercross approach to validate QTL underlying lung CFU.

(A) Haplotypes of CC030 and CC029 CC strains at Chr7 (Tip8) and (B) at Chr15 (Tip10). The F2 population (n = 251) based on these founders was genotyped, infected with Mtb (infectious dose of 10⁵ CFU by IV route, as per the original CC screen), and lung CFU was quantified at 1 month post-infection. (C) QTL mapping identified genome-wide significant (p < 0.05) loci on Chr7 (LOD = 6.81; peak position on Chr7 at 28.6 Mb) overlapping with Tip8 and a new locus on Chr8 (LOD = 4.08; peak position Chr8:116.1 Mb). Thresholds were determined by permutation analysis; solid line, middle dashed line, and lowest dotted lines represent p = 0.05, p = 0.1, and p = 0.2. Source files of F2 genotypes are available in Figure 3—source data 1; phenotypes are available in Figure 3—source data 2.

Figure 3—source data 1

F2 Intercross genotype data.

MiniMUGA genotype data from 251 F2 mice generated from CC030xCC029 intercross strategy. The infected F2 cohort included both male and female mice, as indicated.

https://cdn.elifesciences.org/articles/74419/elife-74419-fig3-data1-v2.zip
Figure 3—source data 2

F2 Intercross phenotype data.

Lung CFU data quantified by plating CFU from Mtb infected lungs from 251 F2 mice at 1 month post infection (matched with genotype data in Figure 3—source data 1). The infected F2 cohort included both male and female mice, as indicated.

https://cdn.elifesciences.org/articles/74419/elife-74419-fig3-data2-v2.zip

Mtb adapts to diverse hosts by utilizing distinct gene repertoires

This survey of disease-associated traits demonstrated that the CC panel encompasses a number of qualitatively distinct immune phenotypes. To determine if different bacterial functions were necessary to adapt to these conditions, we leveraged transposon sequencing (TnSeq) as a high-resolution phenotyping approach to estimate the relative abundance of individual Mtb mutants after selection in each CC host genotype. To serve as benchmarks of known immunological lesions, we also performed TnSeq in B6 mice that were lacking the mediators of Th1 immunity, lymphocytes (Rag2-/-) and IFNγ (Ifng-/-), or were lacking the immunoregulatory mediators that control disease by inhibiting inflammation, nitric oxide synthase (Nos2-/-) (Mishra et al., 2013) or the NADPH phagocyte oxidase (Cybb-/-) (Olive et al., 2018). The relative representation of each Mtb mutant in the input library versus the output library recovered from each mouse spleen after one-month of infection was quantified by TnSeq (Long et al., 2015). A total of 123 saturated Mtb transposon libraries (representing >50,000 independent insertion events) were sequenced, capturing 60 distinct mouse genotypes (Figure 4—source data 1).
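To make the input-versus-output comparison concrete, the sketch below computes a per-gene log2 fold change from raw insertion counts after scaling each library to the same depth. It is a deliberately simplified illustration and not the statistical pipeline used in the study; the counts-per-million scaling, the pseudocount, the function name, and the gene labels are all assumptions.

```python
import math


def log2_fold_change(input_counts, output_counts, pseudocount=1.0):
    """Per-gene log2(output / input) after scaling each library to 1e6 reads.

    Both arguments map a gene label to its summed transposon-insertion
    read count. The pseudocount keeps the ratio finite when a mutant is
    absent from the selected pool.
    """
    scale = 1e6
    input_total = sum(input_counts.values())
    output_total = sum(output_counts.values())
    lfc = {}
    for gene, count in input_counts.items():
        input_norm = count / input_total * scale
        output_norm = output_counts.get(gene, 0) / output_total * scale
        lfc[gene] = math.log2((output_norm + pseudocount) / (input_norm + pseudocount))
    return lfc


# Hypothetical counts: gene_A mutants are depleted after in vivo selection.
input_pool = {"gene_A": 100, "gene_B": 100}
mouse_pool = {"gene_A": 25, "gene_B": 175}
for gene, value in log2_fold_change(input_pool, mouse_pool).items():
    print(gene, round(value, 2))
```

A strongly negative value marks a mutant that was depleted during infection, which is the signature of a gene needed for fitness in that host.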

From this TnSeq screen, we identified 214 Mtb genes that are required for growth or survival of Mtb in B6 mice, based on significant underrepresentation of the corresponding mutant after four weeks of in vivo selection. Eighty-seven percent of these genes overlapped with a similar previous analysis in BALB/c mice (Bellerose et al., 2020), highlighting the specificity of the analysis. All but one of the genes found to be important in B6 were also required in the larger mouse panel, further increasing confidence in this Mtb gene set (Figure 4A and B). While the total number of genes found to be necessary in each genotype across the diversity panel was largely similar, the composition of these Mtb gene sets varied considerably. As more CC strains, and presumably more distinct immune states, were included in the analysis, the cumulative number of bacterial genes necessary for growth in these animals also increased. This cumulative gene set plateaued at ~750, after the inclusion of approximately 20–25 mouse genotypes (Figure 4A). Simply sampling additional libraries of B6 did not appreciably increase the number of genes identified as necessary for growth in that genotype (Figure 4—figure supplement 1), supporting the presence of alternative selective environments across the CC mice. The number of genes important for fitness in the CC panel far outnumbered the 380 genes identified in the B6 and immunodeficient KO strains combined (Figure 4B and Figure 4—source data 1).
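The accumulation described here amounts to a running set union: each added host genotype contributes its own required genes, and the cumulative count grows only when new genes appear. A minimal sketch, using made-up strain and gene labels and omitting the significance filtering applied in the actual analysis:

```python
def cumulative_required_genes(required_by_strain):
    """Return (strain, genes required in strain, cumulative unique genes).

    required_by_strain is an ordered list of (strain label, iterable of
    gene labels) pairs; the order determines the accumulation curve.
    """
    seen = set()
    rows = []
    for strain, genes in required_by_strain:
        gene_set = set(genes)
        seen |= gene_set  # add any genes not observed in earlier strains
        rows.append((strain, len(gene_set), len(seen)))
    return rows


# Hypothetical requirement calls for three host genotypes.
panel = [
    ("host_1", ["g1", "g2", "g3"]),
    ("host_2", ["g2", "g3", "g4"]),
    ("host_3", ["g1", "g4", "g5"]),
]
for strain, per_strain, cumulative in cumulative_required_genes(panel):
    print(strain, per_strain, cumulative)
```

Per-strain counts stay flat in this toy panel while the cumulative count keeps rising, which is the pattern the text describes before the curve plateaus.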

Figure 4
Mtb genetic requirements vary across diverse hosts.

(A) The number of Mtb genes required for growth or survival in each diverse mouse strain across the panel (Qval ≤0.05). Orange indicates the mutants required for each strain; turquoise shows the cumulative requirement as each new host strain is added. (B) Venn diagram showing the composition of Mtb gene sets required in each category of host (white, largest circle), only required in the CC panel (gray), required in specific immunological KO mice (blue) and genes required in B6 mice (red). Note, 1* is required in B6 and KO. In order to be called ‘essential’ in each mouse strain, Mtb genes had to be significantly over or underrepresented in at least two genotypes. (C) Each box shows log2 fold change (LFC) of individual mutants from the TnSeq screen relative to the input pool in indicated mouse strains (top); log2 fold change of the indicated deletion mutants relative to WT from a pooled mutant validation infection (middle panel); relative fitness calculated from (middle panel) to account for generation differences in each host due to differential growth rate. Bars are the average of 3–6 mice per mutant/genotype ± SD. Statistical differences between mini-pool validation groups were assessed by Welch’s t-test. (D) Lung CFU and spleen CFU from single strain low-dose aerosol infections of ∆bioA mutant or WT H37Rv strain in B6 and CAST mice at 2 or 5 weeks post-infection. Dashed line indicates the limit of detection. Each point indicates the average CFU ± SD of 4–5 mice per group. Statistical differences between groups were assessed by mixed effects models (Tukey’s test). (E) Log2 fold change of selected mutants from the TnSeq screen across the CC panel and immunological KO mice. Each dot represents the average LFC per mouse genotype; dots for KO mouse strains (on a B6 background) are shown larger for clarity.
All mice in the large CC TnSeq screen were male; mice in the ∆bioA aerosol validation were female; mice in the mini-pool validation studies were male and female with no significant differences detected. Source file of the TnSeq screen is available in Figure 4—source data 1; source count data of the TnSeq validation experiment is available in Figure 4—source data 2.

Figure 4—source data 1

TnSeq summary table.

LFC values represent the log2 fold change (LFC) between input and mouse-selected pools. Qvals represent adjusted p-values comparing mutant abundance in input and selected pools. In both columns, ‘NA’ indicates genes with fewer than three occupied TA transposon insertion sites for the indicated comparison. Required in vivo: ‘TRUE’ indicates the mutant is significantly underrepresented (Qval <0.05) after in vivo selection in at least two mouse strains. Required in B6: ‘TRUE’ indicates the mutant is significantly underrepresented (Qval <0.05) after selection in B6 mice. Required in KO mice: ‘TRUE’ indicates the mutant is significantly underrepresented (Qval <0.05) after selection in Rag-/-, Nos2-/-, Cybb-/-, or Ifnγ-/- mice. Core gene set: ‘TRUE’ indicates the mutant is significantly underrepresented (Qval <0.05) in 30 mouse strains. U = uninformative; fewer than three occupied TA transposon insertion sites in all strains in panel. F = filtered; essential in only a single strain. ‘Module’ corresponds to WGCNA module number as illustrated in Figure 5A. Mouse strains are listed in the same order as Figure 5B, with the corresponding cluster designation.

https://cdn.elifesciences.org/articles/74419/elife-74419-fig4-data1-v2.xlsx
Figure 4—source data 2

Validation counts table.

CFU and normalized barcode counts from Mtb mutant mini-pool infection in individual mice, including mbtA, glpK, pstC2, eccB1, RNaseJ, and WT (H37Rv).

https://cdn.elifesciences.org/articles/74419/elife-74419-fig4-data2-v2.xlsx

To verify that our TnSeq study accurately assessed the effect of the corresponding loss-of-function alleles, we assessed the phenotypes of selected bacterial deletion mutants in a small set of mouse genotypes that were predicted to produce differential selection. Individual Mtb mutants lacking genes necessary for ESX-1 Type VII secretion (eccB1), siderophore-mediated iron acquisition (mbtA), phosphate transport (pstC2), glycerol catabolism (glpK), and RNA processing (rnaseJ) were generated and tagged with a unique molecular barcode. These mutants were combined with a barcoded wild-type parental strain, and the resulting ‘mini-pool’ was subjected to in vivo selection after IV infection of a sub-panel of mouse strains, as in the original screen. The relative abundance of each mutant was determined by sequencing the amplified barcodes, and data from all reliably detected strains are shown in Figure 4 and Figure 4—source data 2. In each case, the difference in relative abundance predicted by TnSeq was reproduced with deletion mutants. In this simplified system, we were able to accurately quantify the expansion of the bacterial population and calculate the ‘fitness’ of each mutant relative to the wild-type strain. Fitness reflects the inferred doubling time of the mutant, where a fitness of 1 is defined as wild-type, and 0 represents a complete lack of growth. Even by this metric, the deletion mutants displayed the differences in fitness between mouse strains that were predicted by TnSeq (Figure 4C). The statistical significance of these differences in abundance or fitness was similar for each mutant (between p = 0.009 and p = 0.06), except for mbtA where the variation was higher, and confidence was modestly lower (p = 0.07 and p = 0.12).
This study also allowed us to estimate the sensitivity of the TnSeq method, which could detect even the 30% fitness defect of the ∆glpK strain between the B6 and CC018 animals (Figure 4C), a defect that was not observed in previous studies in BALB/c mice (Bellerose et al., 2019; Pethe et al., 2010).
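One way to make the fitness scale concrete is sketched below. It assumes that relative fitness is the ratio of mutant generations to wild-type generations over the infection, which is a common convention and is consistent with the description above (1 for wild-type, 0 for no growth), but it is not necessarily the exact calculation used here. The function name and the example numbers are hypothetical.

```python
import math


def relative_fitness(wt_fold_expansion: float, mutant_lfc_vs_wt: float) -> float:
    """Mutant generations divided by wild-type generations.

    wt_fold_expansion: fold increase of the wild-type population in the host.
    mutant_lfc_vs_wt: log2 fold change of the mutant relative to wild-type.
    Assumption: both strains start at equal abundance and grow exponentially.
    """
    wt_generations = math.log2(wt_fold_expansion)
    if wt_generations <= 0:
        raise ValueError("wild-type must expand for fitness to be defined")
    # A negative LFC means the mutant completed fewer doublings than wild-type.
    mutant_generations = wt_generations + mutant_lfc_vs_wt
    return mutant_generations / wt_generations


# Hypothetical numbers: wild-type expands 1024-fold (10 generations) and the
# mutant ends 8-fold less abundant (LFC = -3), i.e. it completed 7 generations.
example_fitness = relative_fitness(1024, -3)
```

Under these assumptions a value below zero would indicate net killing of the mutant rather than slow growth. Normalizing by wild-type generations is what allows hosts that support different amounts of bacterial expansion to be compared on one scale.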

To also validate TnSeq predictions in a single-strain aerosol infection model, we used a biotin biosynthetic mutant. bioA is necessary for biotin production and is essential for growth in B6 mice (Woong Park et al., 2011). Our TnSeq study (Figure 4—source data 1) predicted this mutant was less attenuated in the CAST background (ratio of input/selected = 12.1) than in the B6 strain (ratio of input/selected = 42.2). Two weeks after aerosol infection, we found that the ∆bioA mutant was cleared from the lungs and spleen of B6 mice but displayed similar growth to wild-type in the lungs of CAST mice (Figure 4D). By 6 weeks post-infection the ∆bioA mutant had also been largely cleared from the lungs of CAST (Figure 4D). Thus, while TnSeq was unable to predict long-term outcome, it provided an accurate assessment of relative growth attenuation in these host backgrounds.
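For readers comparing these ratios with the LFC values used elsewhere, the conversion is a single logarithm. The short sketch below uses only the two ratios quoted above; the helper name is hypothetical.

```python
import math


def lfc_from_input_over_selected(ratio: float) -> float:
    """Convert an input/selected abundance ratio into log2(selected/input)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return -math.log2(ratio)


lfc_cast = lfc_from_input_over_selected(12.1)  # CAST background
lfc_b6 = lfc_from_input_over_selected(42.2)    # B6 background
delta = lfc_cast - lfc_b6                      # positive: less attenuated in CAST
```

The difference is about 1.8 log2 units, which corresponds to the mutant being roughly 3.5-fold less depleted in CAST than in B6.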

The immunological diversity of CC mice is reflected in the pathogen’s genetic requirements

The distribution of Mtb’s requirements across the mouse panel suggested the presence of two broad categories of genes. A set of 136 ‘core’ virulence functions were required in the majority of mouse genotypes, and a second larger set of 607 ‘adaptive’ virulence genes were required in only a subset of lines (Figure 4—source data 1). The core functions included a number of genes previously found to be important in B6 mice, including those necessary for the synthesis of essential cofactors, such as pyridoxine (pdx1) (Dick et al., 2010); for the acquisition of nutrients, such as siderophore-bound iron (irtAB) (Ryndak et al., 2010), cholesterol (mce4) (Pandey and Sassetti, 2008), glutamine (glnQ and rv2563) (Bellerose et al., 2020) and for Type VII secretion (ESX1 genes) (Stanley et al., 2003). Despite the importance of these core functions, a large range in the relative abundance of these mutants was observed across the panel, and in some cases specific immunological requirements could be discerned. Mutants lacking the major structural components of the ESX1 system were attenuated for growth in B6 mice, as expected. This requirement was enhanced in mice lacking Rag2, Ifng, or Nos2 (Figure 4E), consistent with the preferential role of ESX1 during the initial stage of infection before the initiation of adaptive immunity (Stanley et al., 2003), which is prolonged in these immunodeficient strains. In contrast, the attenuation of mutants lacking the glnQ encoded glutamine uptake system was relieved in all four immunodeficient mouse lines (Figure 4E). In both cases, the differential mutant abundance observed in these KO mice was reproduced, or exceeded, in the CC panel.

The adaptive virulence functions included a number of Mtb genes that were previously thought to be dispensable in the mouse model and were only necessary in CC strains. For example, the alkyl hydroperoxide reductase AhpC has been proposed to function with the adjacently encoded peroxiredoxin AhpD and is critical for detoxifying reactive nitrogen intermediates in vitro (Chen et al., 1998; Hillas et al., 2000). However, deletion of ahpC has no effect on Mtb replication in B6 or BALB/c mice (Springer et al., 2001), and we confirmed that ahpC and ahpD mutations had no effect in any of the B6-derived strains. In contrast, ahpC, but not ahpD, mutants were highly attenuated in a small number of CC strains (Figure 4E). Similarly, the four phospholipase C enzymes of Mtb (plcA-D) are implicated in both fatty acid uptake and modifying host cell membranes but are dispensable for replication in B6 mice (Le Chevalier et al., 2015). Again, while we found that none of these genes were required in B6-derived KO mouse strains, the plcD mutants were specifically underrepresented in a number of CC mice (Figure 4E). These individual bacterial functions are controlled by regulatory proteins, such as the extracytoplasmic sigma factors. Despite the importance of these transcription factors in the response to stress, only sigF has consistently been shown to contribute to bacterial replication in standard inbred lines of mice (Geiman et al., 2004; Rodrigue et al., 2006). Our study assessed the importance of each sigma factor in parallel across diverse host genotypes and identified a clear role for several of these regulators. sigC, sigI, sigF, sigL, and sigM mutants were each significantly underrepresented in multiple strains of mice, and several of these phenotypes were only apparent in the diverse CC animals (Figure 4E).
In sum, the 607 adaptive functions that are differentially required across the host panel represent nearly 20% of the non-essential gene set of Mtb, suggesting that a significant fraction of the pathogen’s genome is dedicated to maintaining optimal fitness in diverse host environments.
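The core versus adaptive distinction can be expressed as a simple counting rule. The sketch below is one reading of the criteria given in the Figure 4—source data 1 description (Qval <0.05 in at least two strains for an in vivo requirement, and 30 strains for the core set). Treating 30 as a minimum, and labeling everything between the two cutoffs as adaptive, are assumptions, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def classify_gene(lfcs: Sequence[Optional[float]],
                  qvals: Sequence[Optional[float]],
                  alpha: float = 0.05,
                  min_strains: int = 2,
                  core_strains: int = 30) -> str:
    """Label a gene from its per-strain LFC and Qval vectors (None = 'NA')."""
    n_required = sum(
        1
        for lfc, q in zip(lfcs, qvals)
        # Count only strains where the mutant is significantly depleted.
        if lfc is not None and q is not None and q < alpha and lfc < 0
    )
    if n_required >= core_strains:
        return "core"
    if n_required >= min_strains:
        return "adaptive"
    return "not called"


# Toy vectors for a hypothetical 40-strain panel.
core_call = classify_gene([-2.0] * 40, [0.01] * 40)
adaptive_call = classify_gene([-2.0] * 3 + [0.1] * 37, [0.01] * 3 + [0.8] * 37)
single_call = classify_gene([-2.0] + [None] * 39, [0.01] + [None] * 39)
```

A gene that is significant in only one strain falls below the two-strain cutoff, which mirrors the ‘filtered’ category described for the source data.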

Differential genetic requirements define virulence pathways in Mtb

To more formally investigate the distinct stresses imposed on the bacterial population across this host panel, we characterized the differentially required bacterial pathways. Upon performing each possible pairwise comparison between the in vivo selected mutant pools, we found 679 mutants whose representation varied significantly (FDR < 5%) in at least two independent comparisons (Figure 4—source data 1). We then applied weighted gene correlation network analysis (WGCNA) (Langfelder and Horvath, 2008) to divide the mutants into 20 internally-correlated modules. Further enrichment of these modules for the most representative genes (intramodular connectivity >0.6) revealed that nearly all modules contained genes that are encoded adjacently in the genome and many of these modules consisted of genes dedicated to a single virulence-associated function (Figure 5A). Module 3 contains two distally encoded loci both known to be necessary for ESX1-mediated protein secretion, the primary ESX1 locus (rv3868-rv3883) and the espACD operon (rv3616c-rv3614c). Similarly, other modules consisted of genes responsible for ESX5 secretion (Module 7), mycobactin synthesis (Module 4), the Mce1 and Mce4 lipid importers (Modules 5 and 16), phthiocerol dimycocerosate synthesis (PDIM, Module 8), PDIM transport (Module 16), and phosphate uptake (Module 14). The 20 genes assigned to Module 6 included two components of an important oxidative stress resistance complex (sseA and rv3005c) and were highly enriched for mutants predicted to be involved in this same process via genetic interaction mapping (11/20 genes were identified in Nambi et al., 2015, a statistically significant overlap [p < 2.8e-10 by hypergeometric test]). Thus, each module represented a distinct biological function.
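The overlap statistic quoted for Module 6 is an upper-tail hypergeometric probability. A minimal standard-library sketch of that calculation follows. The function name is hypothetical, and the toy numbers are chosen only so the result can be checked by hand, because the background gene count and the size of the reference gene set are not stated in this section.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_draws: int, n_success: int, n_total: int) -> float:
    """P(X >= k) when n_draws genes are sampled without replacement from
    n_total genes, of which n_success belong to the reference set."""
    denom = comb(n_total, n_draws)
    upper = min(n_draws, n_success)
    numer = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draws - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


# Toy check: 10 genes, 4 in the reference set, 3 drawn, at least 2 overlapping.
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
p_toy = hypergeom_upper_tail(2, 3, 4, 10)
```

For the module analysis, k would be 11 and n_draws would be 20, with the remaining two arguments taken from the background and reference gene sets.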

Figure 5 with 1 supplement
Mtb virulence pathways associate with distinct host immune pressures.

(A) Weighted gene correlation network analysis (WGCNA) of the 679 Mtb genes that significantly vary across the diverse mouse panel. The most representative genes of each module (intramodular connectivity >0.6) are shown. (B) Mouse genotypes were clustered based on the relative abundance of the 679 variable Mtb mutants. The six major clusters (Cluster A-F) were associated with both CFU and the relative abundance of mutants in each bacterial module (1-20; right hand-side with known functions). Statistical analysis is described in Methods. Yellow shading indicates clusters associated with lung CFU. * indicate modules significantly associated with specific mouse clusters (p < 0.05).

Many pathway-specific modules contained genes that represented novel functional associations. For example, the gene encoding the sigma factor, sigC, was found in Module 1 along with a non-ribosomal peptide synthetic operon. Previous genome-wide ChIP-seq and overexpression screens support a role for SigC in regulating this operon (Minch et al., 2015; Turkarslan et al., 2015). Similarly, rv3220c and rv1626 have been proposed to comprise an unusual two-component system that is encoded in different regions of the genome (Morth et al., 2005). Both of these genes are found in Module 2, along with the PPE50 and PPE51 genes that encode at least one outer membrane channel (Wang et al., 2020; Figure 5A). In both cases, these associations support both regulatory and obligate functional relationships between these genes. Six of the 20 modules were not obviously enriched for genes of a known pathway, demonstrating that novel virulence pathways are important for adapting to changing host environments.

To explore the complexity of immune environments in the CC, we used the TnSeq profiles of the 679 differentially fit Mtb mutants to cluster the mouse panel into six major groups of host genotypes (Figure 5B). One mouse cluster was significantly associated with high CFU (Cluster F, Figure 5B), which contained susceptible Nos2-/-, Cybb-/-, Ifng-/-, and Rag2-/- animals. This high CFU cluster was associated with alterations in a diverse set of bacterial modules and corresponded to an increased requirement for lipid uptake (Modules 5 and 16) and ESX1, consistent with previous TnSeq studies in susceptible Nos2-/- and C3HeB/FeJ mice (Mishra et al., 2017). In addition, we identified a significant reduction in the requirement for oxidative stress resistance (Module 6) in the highest CFU cluster. Despite these associations between bacterial genetic requirements and susceptibility, the clustering of mouse genotypes was largely independent of overall susceptibility. Similarly, while Module 1 was significantly associated with high IFNγ levels, other bacterial fitness traits were not highly correlated with cytokine abundance (Figure 5—figure supplement 1). Instead, each major mouse cluster was associated with a distinct profile of Mtb genetic requirements. This observation supported the presence of qualitatively distinct disease states and complex genetic control of immunity.

Identification of genome-wide host interacting with pathogen QTL (HipQTL)

To investigate the host genetic determinants of the bacterial microenvironment, we leveraged TnSeq as a high-resolution phenotyping platform to associate Mtb mutant fitness profiles with variants in the mouse genome. When the relative abundance of each Mtb mutant phenotype was considered individually, the corresponding ‘Host Interacting with Pathogen QTL’ (HipQTL) were distributed across the mouse genome (Figure 6A). Forty-one of these traits reached an unadjusted p-value threshold of 0.05 and can be considered as robust for single hypothesis testing (Hip1-41, Table 2). These included HipQTL associated with ahpC (Hip12) and eccD1 (Hip22), that explain at least a portion of the observed variable abundance of these Mtb mutants (Figure 4E). In order to reduce complexity and increase the power of this analysis, we performed QTL mapping based on the first principal component of each of the previously defined modules of Mtb virulence pathways (Figure 5A). Three of these ‘eigentraits’ were associated with QTL at a similar position on chromosome 10 (Figure 6B), corresponding to Module 3 (TypeVII secretion, ESX1), Module 4 (mycobactin synthesis, mbt), or Module 16 (cholesterol uptake, mce4). In all three cases, a single mutant from the module was independently associated with a QTL at the same position as the module eigentrait (Table 2; Hip21, Hip22, Hip24), and all genes in the corresponding network cluster (Figure 5A) mapped to the same location (Figure 6C–E). While not all individual traits mapped with high confidence, the coincidence of these multiple QTL was statistically significant (Figure 6B).
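The ‘eigentrait’ idea can be sketched as the first principal component of a module's standardized mutant abundance profiles, giving one score per mouse strain that can then be mapped like any other trait. The example below uses plain power iteration from the standard library. It is an illustration under that assumption rather than the WGCNA or R/qtl2 implementation, and all names and numbers are hypothetical.

```python
import math
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Center and scale one mutant's LFC profile (assumes non-constant values)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def module_eigentrait(lfc_by_mutant: List[List[float]], n_iter: int = 200) -> List[float]:
    """First-principal-component score per strain (rows = mutants, cols = strains)."""
    z = [standardize(row) for row in lfc_by_mutant]
    n_mut, n_strain = len(z), len(z[0])
    # Slightly uneven starting weights avoid a start orthogonal to the top axis.
    w = [1.0 + 0.1 * m for m in range(n_mut)]
    for _ in range(n_iter):
        scores = [sum(w[m] * z[m][s] for m in range(n_mut)) for s in range(n_strain)]
        w = [sum(z[m][s] * scores[s] for s in range(n_strain)) for m in range(n_mut)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return [sum(w[m] * z[m][s] for m in range(n_mut)) for s in range(n_strain)]


# Two made-up mutants with perfectly correlated profiles across four strains.
profile_a = [-1.0, -2.0, -3.0, -4.0]
profile_b = [-0.5, -1.0, -1.5, -2.0]
eigentrait = module_eigentrait([profile_a, profile_b])
```

Summarizing a module this way replaces many noisy single-mutant traits with one composite trait. Note that the sign of a principal component is arbitrary, so only the relative ordering of strains is meaningful.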

Figure 6
Identification of ‘Host Interacting with Pathogen’ QTL mapping (HipQTL).

(A) Manhattan plot of single Mtb mutant QTL mapping across the mouse genome. Each dot represents an individual Mtb mutant plotted at the chromosomal location of its maximum LOD score. Red dashed line indicates p < 0.01; Blue p < 0.05. (B) Chromosome 10 QTL (in Mb) corresponding to Mtb eigentraits identified in network analysis in Figure 5. Module 3 (Type VII secretion, ESX1 operon; orange), Module 4 (Mycobactin synthesis, mbt; green) and Module 16 (Cholesterol uptake, mce4; purple) are shown. Solid and dotted lines indicate p = 0.05 and p = 0.1, respectively. Chromosomal position is in megabase units (Mb). (C–E) QTL mapping of single Mtb mutants corresponding to the (C) ESX1 module, (D) mbt module and (E) mce4 module. Coincidence of multiple QTL was assessed by the NL-method of Neto et al., 2012. Thresholds shown are for N = 9, N = 8, and N = 6 for panels C, D, and E, respectively. Chromosomal position is in megabase units (Mb). (F) Parental founder effects underlying Module 3, 4, and 16 QTL. Allele effects were calculated at the peak LOD score marker on chromosome 10. (G) Distribution of log2 fold change (LFC) of representative single mutants from each module; eccCa1 (ESX1 module), mbtE (mbt module), and mce4F (mce4 module) relative to in vitro. Each dot is the LFC of the specified mutant in each CC mouse strain. Box and whiskers plots of each trait indicate the median and interquartile range. (H) Spleen CEQ and Spleen CFU for CC strains (box plots as in G). Mouse values are grouped by the parental haplotype allele series underlying the chromosome 10 Hip42 locus (NOD/WSB vs AJ/B6/NZO). Each dot represents the average CFU/CEQ of each CC genotype. Statistical differences in disease-associated traits between distinct haplotype groups were assessed by t-test. LOD, logarithm of the odds; LFC, log2 fold change; CEQ, chromosomal equivalents; CFU, colony-forming units.

Table 2
HipQTL for single Mtb mutant QTL and eigentrait/module QTL.

Hip1-41 each represent host loci associated with the relative abundance of a single mutant (p < 0.05). Hip42-46 correspond to Mtb eigentraits identified in network analysis in Figure 5 (including significant p < 0.05 and suggestive p < 0.25). Figure column headings: QTL, quantitative trait loci; Mtb, Mycobacterium tuberculosis; Module #, module number determined from WGCNA modules; ORF, open reading frame; ID, identification number; LOD, logarithm of the odds; Chr, chromosome.

QTL | Trait | Mtb ORF ID | Module # | LOD | P value | Chr | Start (Mb) | Peak (Mb) | End (Mb)
Hip1 | rv0770 | RVBD_0770 | mod17 | 9.81 | 5.61E-03 | 1 | 40.43 | 42.73 | 43.32
Hip2 | rv0309 | RVBD_0309 | mod13 | 7.95 | 3.22E-02 | 1 | 57.99 | 58.18 | 62.79
Hip3 | rv3657c | RVBD_3657c | mod15 | 7.90 | 4.95E-02 | 1 | 136.39 | 138.24 | 143.60
Hip4 | rv0110 | RVBD_0110 | mod18 | 7.79 | 4.39E-02 | 2 | 170.67 | 174.00 | 178.84
Hip5 | rv3577 | RVBD_3577 | mod7 | 9.23 | 3.83E-02 | 3 | 3.32 | 10.03 | 14.67
Hip6 | rv3005c | RVBD_3005c | mod6 | 8.03 | 3.75E-02 | 3 | 20.31 | 26.12 | 26.12
Hip7 | dinX | RVBD_1537 | mod15 | 9.97 | 5.38E-03 | 3 | 26.99 | 30.29 | 33.85
Hip8 | fadA6 | RVBD_3556c | mod5 | 8.74 | 1.01E-02 | 3 | 29.23 | 35.22 | 37.11
Hip9 | dinX | RVBD_1537 | mod15 | 9.22 | 1.60E-02 | 3 | 36.22 | 36.83 | 38.27
Hip10 | rv2707 | RVBD_2707 | mod6 | 8.17 | 4.21E-02 | 3 | 100.90 | 103.23 | 115.82
Hip11 | rv3701c | RVBD_3701c | mod6 | 7.90 | 3.38E-02 | 4 | 74.00 | 78.25 | 87.00
Hip12 | ahpC | RVBD_2428 | mod13 | 8.12 | 2.14E-02 | 6 | 19.75 | 22.21 | 23.31
Hip13 | umaA | RVBD_0469 | mod20 | 8.32 | 2.31E-02 | 7 | 117.87 | 118.41 | 120.15
Hip14 | rv2566 | RVBD_2566 | mod15 | 7.86 | 4.55E-02 | 7 | 123.21 | 126.67 | 126.67
Hip15 | rv3173c | RVBD_3173c | mod5 | 8.17 | 3.03E-02 | 7 | 137.41 | 138.36 | 138.36
Hip16 | rv3173c | RVBD_3173c | mod5 | 8.12 | 3.28E-02 | 7 | 139.15 | 140.76 | 141.88
Hip17 | rv3502c | RVBD_3502c | mod5 | 8.16 | 3.17E-02 | 9 | 15.91 | 16.33 | 18.72
Hip18 | mycP1 | RVBD_3883c | mod3 | 9.09 | 3.66E-03 | 9 | 28.47 | 29.45 | 31.10
Hip19 | rv0057 | RVBD_0057 | mod6 | 8.39 | 3.79E-02 | 9 | 36.78 | 40.07 | 40.36
Hip20 | hycE | RVBD_0087 | mod20 | 8.21 | 1.40E-02 | 9 | 47.40 | 47.93 | 51.80
Hip21 | mbtA | RVBD_2384 | mod4 | 8.30 | 2.05E-02 | 10 | 64.48 | 68.09 | 75.42
Hip22 | eccD1 | RVBD_3877 | mod3 | 8.08 | 3.08E-02 | 10 | 64.56 | 68.12 | 71.04
Hip23 | rv2989 | RVBD_2989 | mod12 | 9.16 | 1.67E-02 | 10 | 74.30 | 77.63 | 81.03
Hip24 | mce4A | RVBD_3499c | mod16 | 7.91 | 4.12E-02 | 10 | 78.88 | 81.36 | 88.25
Hip25 | treS | RVBD_0126 | mod7 | 7.94 | 3.04E-02 | 11 | 20.80 | 36.14 | 44.06
Hip26 | pckA | RVBD_0211 | mod3 | 7.67 | 4.74E-02 | 11 | 85.95 | 89.78 | 91.75
Hip27 | aspB | RVBD_3565 | mod7 | 8.32 | 3.66E-02 | 11 | 114.69 | 116.99 | 117.08
Hip28 | rv1227c | RVBD_1227c | mod17 | 9.16 | 4.64E-02 | 12 | 25.23 | 25.23 | 28.54
Hip29 | rv0219 | RVBD_0219 | mod20 | 7.94 | 3.09E-02 | 12 | 40.65 | 42.65 | 47.22
Hip30 | rv3643 | RVBD_3643 | mod8 | 8.89 | 1.04E-02 | 13 | 95.43 | 97.08 | 97.79
Hip31 | ansA | RVBD_1538c | mod11 | 8.28 | 3.32E-02 | 13 | 96.82 | 97.79 | 99.09
Hip32 | echA19 | RVBD_3516 | mod20 | 9.68 | 3.95E-02 | 13 | 113.20 | 114.59 | 117.64
Hip33 | rv1836c | RVBD_1836c | mod15 | 9.19 | 1.75E-02 | 14 | 74.94 | 76.40 | 76.43
Hip34 | rv2183c | RVBD_2183c | mod11 | 7.75 | 4.84E-02 | 16 | 12.18 | 14.06 | 17.92
Hip35 | rv1178 | RVBD_1178 | mod6 | 8.19 | 4.76E-02 | 17 | 80.92 | 80.92 | 83.23
Hip36 | rv0492c | RVBD_0492c | mod17 | 8.90 | 3.18E-02 | 18 | 5.85 | 5.85 | 12.40
Hip37 | cysM | RVBD_1336 | mod12 | 8.47 | 8.67E-03 | 19 | 4.20 | 6.46 | 6.46
Hip38 | atsA | RVBD_0711 | mod1 | 8.58 | 1.38E-02 | 19 | 31.21 | 37.86 | 37.93
Hip39 | galE2 | RVBD_0501 | mod6 | 8.10 | 2.85E-02 | X | 6.01 | 6.01 | 9.12
Hip40 | pks11 | RVBD_1665 | mod17 | 8.25 | 1.94E-02 | X | 50.43 | 51.75 | 52.29
Hip41 | pknK | RVBD_3080c | mod17 | 8.73 | 3.79E-02 | X | 95.01 | 102.02 | 130.04
Hip42 | Module 3 | ESX1 operon | mod3 | 7.80 | 5.38E-02 | 10 | 64.7 | 68.27 | 77.07
Hip42 | Module 4 | Mycobactin (mbt) | mod4 | 7.79 | 5.05E-02 | 10 | 65.23 | 69.94 | 74.30
Hip43 | Module 16 | mce4 operon | mod16 | 7.53 | 7.97E-02 | 10 | 74.30 | 81.36 | 87.61
Hip44 | Module 19 | unclassified | Module 19 | 7.64 | 1.39E-01 | 11 | 60.87 | 62.20 | 63.26
Hip45 | Module 10 | Transcriptional regulation | Module 10 | 6.95 | 1.04E-01 | 15 | 100.39 | 102.25 | 103.36
Hip46 | Module 10 | Transcriptional regulation | Module 10 | 6.32 | 2.54E-01 | 19 | 32.74 | 32.87 | 37.48

Both the relative positions of the module-associated QTL and the associated founder haplotypes indicated that a single genetic variant controlled the abundance of ESX1 and mbt mutants (Hip42). Specifically, we found no statistical support for differentiating these QTL based on position (p = 0.93) (Boehm et al., 2019) and the same founder haplotypes were associated with extreme trait values at both loci, though they had opposite effects on the abundance of ESX1 mutants and mbt mutants (Figure 6F). We conclude that a single haplotype has a pleiotropic effect on Mtb’s environment and has opposing effects on the requirement for mycobactin synthesis and ESX1 secretion. The relationship between this variant and the mce4-associated QTL (Hip43) was less clear, as the statistical support for independent QTL was weak (ESX1 and mce4 QTL p = 0.17; mbt and mce4 QTL p = 0.08) and the effects of founder haplotypes were similar but not identical (Figure 6F). Some of this ambiguity may be related to the relatively small range in trait values for mce4, compared to either ESX1 or mbt (Figure 6G). Based on these data, we report two distinct HipQTL in this region (Hip42 and 43; Table 2).

Two TipQTL overlapped with HipQTL (Figure 7; Tip5/Hip4 on chromosome 2 and Tip9/Hip35 on chromosome 17), suggesting specific interactions between bacterial fitness and immunity. However, most Tip- and HipQTL were distinct, indicating that the fitness of sensitized bacterial mutants can be used to detect genetic variants that subtly influence the bacterial environment but do not overtly alter disease. We chose to further investigate whether HipQTL might alter overall bacterial disease using the most significant HipQTL on chromosome 10 (Hip42). We found that the founder haplotypes associated with extreme trait values at this QTL could differentiate CC strains with significantly altered total bacterial burden, and the NOD and WSB haplotypes were associated with higher bacterial numbers (p = 0.0085 for spleen CEQ; p = 0.027 for spleen CFU; Figure 6H). Thus, not only could the HipQTL strategy identify specific interactions between host and bacterial genetic variants, but it also appears to be a sensitive approach for identifying host loci that influence the trajectory of disease.

Figure 7
Visual representation of all Tip and HipQTL mapped in the CC TnSeq infection screen.

Tuberculosis ImmunoPhenotypes (Tip) QTL (QTL mapped by disease-associated traits in CC mice), are shown in green. TipQTL mapped by separate traits that share similar founder effects were considered to be the same QTL and were named accordingly. Host Interacting with Pathogen (Hip) QTL, (QTL mapped by individual TnSeq mutant relative abundance profiles), are shown in purple. After WGCNA mutant clustering and mapping with representative eigengenes from each module, QTL mapped by module eigengenes are shown in magenta.

Identifying candidate genes underlying QTL

A pipeline was designed to prioritize genetic variants based on genomic and tuberculosis disease criteria. We concentrated on three QTL: two that were highly significant and with clear allele effects (Tip5, Hip42), and the Tip8 locus which we validated by intercross. For each QTL region, we identified genes that belonged to a differentially expressed transcriptional module in mouse lungs following Mtb infection (Moreira-Teixeira et al., 2020). Next, we identified genetic variants segregating between the causal CC haplotypes in the gene bodies corresponding to these transcripts, and prioritized missense or nonsense variants.
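The filtering logic of this pipeline can be sketched as a set intersection followed by a haplotype-consistency check. The example below is a simplified illustration. The data structures, gene names, and the requirement that a variant split the founders exactly along the causal haplotype group are assumptions, and partial consistency is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

CODING = {"missense", "nonsense"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str              # e.g. "missense", "nonsense", "3_prime_UTR"
    alt_founders: FrozenSet[str]  # founder strains carrying the alternate allele


def prioritize(interval_genes: Set[str],
               infection_responsive: Set[str],
               variants: Iterable[Variant],
               causal: FrozenSet[str],
               founders: FrozenSet[str]) -> List[Tuple[str, bool]]:
    """Return (gene, has_coding_variant) pairs with coding candidates first."""
    expressed = interval_genes & infection_responsive
    best: Dict[str, bool] = {}
    for v in variants:
        if v.gene not in expressed:
            continue
        # Keep variants whose allele pattern matches the causal haplotype split.
        if v.alt_founders not in (causal, founders - causal):
            continue
        best[v.gene] = best.get(v.gene, False) or v.consequence in CODING
    return sorted(best.items(), key=lambda item: (not item[1], item[0]))


# Made-up example with a single causal founder haplotype.
founders = frozenset({"AJ", "B6", "129", "NOD", "NZO", "CAST", "PWK", "WSB"})
causal = frozenset({"NOD"})
variants = [
    Variant("GeneA", "missense", frozenset({"NOD"})),
    Variant("GeneB", "3_prime_UTR", frozenset({"NOD"})),
    Variant("GeneC", "missense", frozenset({"NOD", "CAST"})),  # wrong pattern
    Variant("GeneD", "missense", frozenset({"NOD"})),          # not responsive
]
ranked = prioritize({"GeneA", "GeneB", "GeneC", "GeneD"},
                    {"GeneA", "GeneB", "GeneC"}, variants, causal, founders)
```

Ranking coding changes ahead of other variants reflects the stated preference for missense or nonsense variants, while non-coding candidates are retained rather than discarded.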

For the Tip5 QTL underlying CEQ, CFU, and IL-10 levels, we identified nine candidate genes with regulatory or splicing variants and two genes with missense variants specific to the NOD haplotype. Of these candidates, cathepsin Z (Ctsz) encodes a lysosomal cysteine proteinase and has previously been associated with TB disease risk in humans (Adams et al., 2011; Cooke et al., 2008). The QTL underlying lung CFU and CXCL1 abundance (Tip8), which was driven solely by the genetically divergent CAST founder haplotype, contained over 50 genes (Table 3) and will need further refinement. The QTL associated with the abundance of ESX1 and mbt mutants (Hip42) had a complex causal haplotype pattern (AJ/B6/NZO vs. 129/CAST/PWK vs. NOD/WSB) suggesting multiple variants might be impacting common genes. Within this interval, we identified 13 genes expressed in response to Mtb infection, three of which had SNPs fully or partially consistent with at least one of the identified causal haplotype groups (Table 3). Ank3 contains several SNPs in the 3’ UTR and other non-coding exons that differentiated NOD/WSB from the other haplotypes. Similarly, Fam13c had two missense mutations following the same haplotype pattern. For the AJ/B6 haplotype state, we identified a missense mutation and several variants in the 3’ UTR of Rhobtb1, which belongs to the Rho family of the Ras superfamily of small GTPases (Goitre et al., 2014). Overall, the evidence supports a role for Rhobtb1 in a monogenic effect at the chromosome 10 locus. This evidence includes both protein coding differences dividing AJ/B6 from the other haplotypes, a potential expression/transcript regulatory difference that segregates the NOD/WSB state from the remaining parental haplotypes, and a plausible role for this gene in controlling intracellular trafficking (Long et al., 2020) and the opposing requirements for ESX1 and mycobactin.

Table 3
Candidate genes within QTL regions.

Prioritized candidates shown for selected QTL. Candidates were prioritized by filtering on (1) differential expression during Mtb infection, and (2) variants within TB-expressed genes that segregated between informative CC haplotypes. Genes listed below contain non-synonymous variants (i.e. amino acid changes, regulatory mutations or splicing mutations) consistent with the identified singly causal haplotype (NOD for Tip5; CAST for Tip8). Hip42 displayed a more complex haplotype pattern (WSB/NOD vs AJ/B6/NZO), and candidate selection is discussed in the main text. Genes with missense or nonsense variants are denoted by *.

Tip5: Ctsz, Tubb1, Atp5e, Prelid3b, Zfp831*, Edn3, Gm14391*, Gm6710, Zfp931

Tip8: Fxyd5*, Fxyd1, Lgi4, Fxyd3, Hpn, Scn1b, Gramd1a*, Pdcd2l*, Gpi1, 4931406P16Rik, Kctd15, Chst8, Pepd, Cebpa, Slc7a10, Lrp3*, Rhpn2, Faap24, Tdrd12*, Ankrd27*, Pdcd5, Dpy19l3, Tshz3*, Ccne1, 1600014C10Rik, Plekhf1, Vstm2b, Zfp975*, Zfp715*, Siglecg, Nkg7, Cd33*, Siglece*, Klk13, Klk8, Klk7*, Klk1b9*, Klk1, Clec11a, Shank1, Syt3, Lrrc4b, Josd2, Spib, Pold1, Napsa*, Kcnc3, Myh14, Atf5, Il4i1, Pnkp*, Ptov1, Fuz, Tsks, Cpt1c*

Hip42: Ank3, Cdk1, Tmem26, Slc16a9, Fam13c, Rhobtb1

Discussion

Our broad profiling of both host and pathogen traits after Mtb infection in a large panel of CC strains created a reproducible resource to study the diverse host-pathogen interactions that drive tuberculosis disease. The immunological analysis of the CC panel identified correlates of TB disease progression that were consistent with previous studies in both mice and human patients (Ahmed et al., 2020; Niazi et al., 2015; Zak et al., 2016). We also identified outlier strains that produce distinct immunological states, suggesting that our previous reliance on genetically homogenous lab strains of mice has oversimplified our understanding of TB pathogenesis. For example, despite the strong correlation between lung bacterial burden and weight loss, CC030/GeniUnc and CC040/TauUnc mice suffered from more inflammation and wasting than would be predicted from the number of bacteria in their lungs or spleens. This phenotype reflects a failure of disease ‘tolerance’, which is proposed to be a critical determinant of protective immunity (Ayres and Schneider, 2012; Olive et al., 2018). Similarly, we identified a number of CC genotypes that produce very low, or undetectable, levels of the protective cytokine IFNγ, but still control lung bacterial replication. While a growing body of literature suggests that immune responses distinct from the canonical Th1 response can control infection (Lu et al., 2019; Sakai et al., 2016), these CC strains are the first example of an animal model in which IFNγ appears to be dispensable. Despite the relatively small group sizes used in this initial phenotypic screen, the reproducibility of the CC strains facilitated the identification of these phenotypes and provides tractable models for further characterization.

The ability to separate aspects of the immune response from disease progression implied that these features are under distinct multigenic control. Our study demonstrated the feasibility of mapping the genetic variants that control the complex immune response to Mtb. The QTL identified in this study are generally distinct from CC loci that control immunity to viruses (Ferris et al., 2013; Gralinski et al., 2017; Noll et al., 2020) or another intracellular bacterial pathogen, Salmonella (Zhang et al., 2019). However, Tip8 and Tip10 overlap with QTL previously defined via Mtb infection of a CC001xCC042 F2 intercross population (Smith et al., 2019), suggesting that common variants may have been identified in both studies. While the specific genetic variants responsible for these QTL remain unknown, both coincident trait mapping and bioinformatic analysis suggest mechanistic explanations for some QTL-phenotype associations. For example, a single interval on chromosome 2 controls CFU levels and IL-10, and contains a variant in the Ctsz gene encoding Cathepsin Z. Ctsz is a strong candidate considering its known roles in autophagy (Amaral et al., 2018), dendritic cell differentiation and function (Obermajer et al., 2008), its upregulation in non-human primates (Ahmed et al., 2020) and human patients with Mtb (Zak et al., 2016), and the association of CTSZ variants with disease risk in human TB studies (Adams et al., 2011; Cooke et al., 2008). Regardless of the responsible variants, these data will facilitate the generation of new congenic animal models that isolate the contribution of each QTL to phenotype.

Using TnSeq as a multidimensional phenotyping method across this population provided insight into how the diversity of host-derived microenvironments has shaped the pathogen’s genome. While Mtb is an obligate pathogen with no significant environmental niche, only a minority of the genes in its genome have been found to contribute to bacterial fitness in either laboratory media or individual inbred mouse models, leaving the pressures that maintain the remaining genomic content unclear. Our study indicated that a roughly similar number of genes are important for Mtb fitness in a given mouse strain, even immunodeficient strains that likely represent the most divergent environments. While this observation may seem counterintuitive, it is consistent with previous TnSeq studies in both mouse models (Mishra et al., 2017) and in vitro conditions (Minato et al., 2019), where distinct but similarly sized gene sets are necessary for growth under very different conditions. Overall, we find that approximately three times more genes contribute to bacterial growth or survival in the CC population than in the standard B6 model. While some bacterial genetic requirements could be associated with known immune pathways, most of the differential pressures on bacterial mutants could not be attributed to these simple deficiencies in known mechanisms of immune control. Instead, it appears that the CC population produces a spectrum of novel environments, and that a relatively large fraction of the pathogen’s genome is needed to adapt to changing immune pressures. Differential pressures on these adaptive virulence functions are similarly apparent in genomic analyses of Mtb clinical isolates.
Signatures of selection have been detected in ESX1-related genes (Holt et al., 2018; Sousa et al., 2020), phoPR (Gonzalo-Asensio et al., 2014), and the oxidative stress resistance gene sseA (de Keijzer et al., 2014), suggesting that Mtb is exposed to similarly variable host pressures in genetically diverse human and mouse populations. While the combinatorial complexity of associating host and pathogen genetic variants in natural populations is daunting, the identification of HipQTL in the CC panel indicates that these inter-species genetic interactions can be important determinants of pathogenesis and can be dissected using this tractable model of diversity.

Materials and methods

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Strain, strain background (Mus musculus, male) | Collaborative Cross mice | DOI: https://doi.org/10.1038/ng1104-113 | | |
| Strain, strain background (Mycobacterium tuberculosis) | H37Rv | DOI: 10.1073/pnas.2134250100 | | |
| Genetic reagent (Mycobacterium tuberculosis) | ∆glpK; ∆pstC2; ∆eccB1; ∆mbtA | DOI: 10.1128/mBio.0146718 | | |
| Genetic reagent (Mycobacterium tuberculosis) | ∆BioA | DOI: 10.1371/journal.ppat.1002264 | | |
| Recombinant DNA reagent | pKM464 (plasmid) | DOI: 10.1128/mBio.0146718 | | |
| Recombinant DNA reagent | Barcode qtag (plasmid) | DOI: 10.1128/mSystems.0039620 | | |
| Sequence-based reagent | qtag/barcode sequencing primer sets | DOI: 10.1128/mSystems.0039620 | | Table S6 |
| Sequence-based reagent | MiniMUGA genotyping array | Neogen Inc | | |
| Sequence-based reagent | GigaMUGA genotyping array | Neogen Inc | | |
| Commercial assay or kit | 32-plex cytokine assay | Eve Technologies, Calgary, CA | | |
| Software, algorithm | R/qtl2 | DOI: 10.1534/genetics.118.301595 | | Dr. Karl Broman (University of Wisconsin-Madison) |
| Software, algorithm | WGCNA | DOI: 10.1186/1471-2105-9-559 | | Dr. Peter Langfelder (UCLA) |

Mice

Male and female Collaborative Cross parental strains (A/J #0646; C57BL/6J #0664; 129S1/SvImJ #02448; NOD/ShiLtJ #01976; NZO/HiLtJ #02105; CAST/EiJ #0928; PWK/PhJ #3715; and WSB/EiJ #01145) and single gene immunological knockout mice were purchased from The Jackson Laboratory (Nos2-/- #2609, Cybb-/- #2365, Ifnγ-/- #2287) or Taconic (RagN12) and bred at UMASS. Male mice from 52 CC strains were purchased from the UNC Systems Genetics Core Facility (SGCF) between July 2013 and August 2014. The 52 CC strains used in this study were: CC001/Unc, CC002/Unc, CC003/Unc, CC004/TauUnc, CC005/TauUnc, CC006/TauUnc, CC007/Unc, CC008/GeniUnc, CC009/Unc, CC010/GeniUnc, CC011/Unc, CC012/GeniUnc, CC013/GeniUnc, CC015/Unc, CC016/GeniUnc, CC017/Unc, CC018/Unc, CC019/TauUnc, CC021/Unc, CC022/GeniUnc, CC023/GeniUnc, CC024/GeniUnc, CC025/GeniUnc, CC027/GeniUnc, CC028/GeniUnc, CC029/Unc, CC030/GeniUnc, CC031/GeniUnc, CC032/GeniUnc, CC033/GeniUnc, CC034/Unc, CC035/Unc, CC037/TauUnc, CC038/GeniUnc, CC039/Unc, CC040/TauUnc, CC041/TauUnc, CC042/GeniUnc, CC043/GeniUnc, CC044/Unc, CC045/GeniUnc, CC046/Unc, CC047/Unc, CC051/TauUnc, CC055/TauUnc, CC056/GeniUnc, CC057/Unc, CC059/TauUnc, CC060/Unc, CC061/GeniUnc, CC065/Unc, CC068/TauUnc. More information regarding the CC strains can be found at http://csbio.unc.edu/CCstatus/index.py?run=AvailableLines.information.

CC030 x CC029 F2 mice were generated in the FPMV lab at UNC by crossing CC030 and CC029 mice (purchased from the SGCF in 2016) to generate F1s (both CC030 dam by CC029 sires as well as CC029 dam by CC030 sires). The resulting F1s were subsequently intercrossed to generate 251 F2 mice with all possible grandparental combinations. Male and female F2 mice were shipped to UMASS for Mtb infections.

All mice were housed in a specific pathogen-free facility under standard conditions (12 hr light/dark, food and water ad libitum). Mice were infected with Mtb between 8 and 12 weeks of age. Male mice were used for the initial large CC screen; male and female mice were used for the F2 validation cohort.

M. tuberculosis strains

All M. tuberculosis strains (H37Rv background) were grown in Middlebrook 7H9 medium containing oleic acid-albumin-dextrose-catalase (OADC), 0.2% glycerol, and 0.05% Tween 80 to log-phase with shaking (200 rpm) at 37 °C. Hygromycin (50 µg/ml) or kanamycin (20 µg/ml) was added when necessary. The TnSeq library consisting of Himar1 transposon mutants was described previously (Sassetti et al., 2003). The ∆bioA strain was made by homologous recombination as previously described (Woong Park et al., 2011). For pooled mutant infections, deletion strains (GlpK, PstC2, EccB1, MbtA) were constructed using ORBIT (Murphy et al., 2018), which included gene replacement by the vector pKM464 carrying unique q-Tag sequences to identify each mutant for deep sequencing. The rnaseJ mutant was also made by ORBIT and was kindly provided by Dr. Nathan Hicks and Dr. Sarah Fortune. Prior to all in vivo infections, cultures were washed, resuspended in phosphate-buffered saline (PBS) containing 0.05% Tween 80, and sonicated before diluting to the desired concentration (see below).

Mouse infections

For TnSeq experiments, 1 × 10⁶ CFU of a saturated library of Himar1 transposon mutants (Sassetti et al., 2003) was delivered via intravenous tail vein injection, resulting in an infectious dose (Day 1 CFU) of 10⁵ in the spleen and 10⁴ in the lung. For the TnSeq screen, groups of three to six mice per genotype were infected, including 52 CC strains, 8 parental strains, and single-gene knockout mice (Nos2-/-, Cybb-/-, Ifnγ-/- and RagN12). Mice were infected over three infection batches, as denoted in Figure 1—source data 1. Burden and immunological data from all surviving animals are provided in Figure 1—source data 1. At indicated time points mice were euthanized, and organs were harvested then homogenized in a FastPrep-24 (MP Biomedicals). CFU was determined by dilution plating on 7H10 agar with 20 µg/mL kanamycin. For library recovery, approximately 1 × 10⁶ CFU per mouse was plated on 7H10 agar with 20 µg/mL kanamycin. After three weeks of growth, colonies were harvested by scraping and genomic DNA was extracted. The relative abundance of each transposon mutant was estimated as described (Long et al., 2015).

Single strain validation aerosol infections were performed in a Glas-Col machine to deliver 50–150 CFU/mouse. At indicated time points, mice were euthanized, and organs were harvested then homogenized in a FastPrep-24 (MP Biomedicals). CFU was determined by dilution plating on 7H10 agar with 20 µg/mL kanamycin or 50 µg/mL hygromycin as required.

Chromosomal equivalent (CEQ) was enumerated according to previously published protocol (Lin et al., 2014; Munoz-Elías et al., 2005). Cytokines and chemokines were assayed from organ homogenates using the pro-inflammatory focused 32-plex (Eve Technologies, Calgary, CA).

For pooled mutant infections, three to five mice per genotype (B6, CC051, PWK, CC042, CC005, CC018) were infected with a pool of deletion mutants at equal ratios via the intravenous route (1 × 10⁶ CFU/mouse, resulting in an infectious dose [D1 CFU] of 1 × 10⁵ in the spleen). At indicated time points, approximately 10,000 CFU from the spleen homogenate of each mouse was plated on 7H10 agar. Genomic DNA was extracted for sequencing as described previously (Long et al., 2015). Sequencing libraries spanning the variable region of each q-Tag were generated using PCR primers binding to regions common among all q-Tags, similar to previously described protocols (Bellerose et al., 2020; Blumenthal et al., 2010; Martin et al., 2017). In each PCR, a unique molecular counter was incorporated into the sequence to allow for the accurate counting of input templates and account for PCR jackpotting. The libraries were sequenced to 1000-fold coverage on an Illumina NextSeq platform using a 150-cycle Mid-Output kit with single-end reads. Total abundance of each mutant in the library was determined by counting the number of reads for each q-Tag with a unique molecular counter. Relative abundance of each mutant in the pool was then calculated by dividing the total abundance of a mutant by the total abundance of reads for wild-type H37Rv. The relative abundance was normalized to relative abundance at initial infection (Day 0) and log2 transformed. Fitness was calculated as previously described (Palace et al., 2014). Burden and normalized counts from all Mtb mutants in each mouse are provided in Figure 4—source data 2.
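
The relative abundance calculation can be illustrated with a minimal Python sketch. It assumes that counts per q-Tag have already been collapsed by unique molecular counter; the function name, the dictionary layout, and the counts are hypothetical and are not taken from the study's pipeline. The fitness calculation of Palace et al. (2014) is not reproduced here.

```python
import math

def log2_relative_abundance(counts, counts_day0, wild_type="H37Rv"):
    """Return log2 abundance of each mutant relative to wild type,
    normalized to the same ratio in the Day 0 inoculum."""
    result = {}
    for strain, n in counts.items():
        if strain == wild_type:
            continue
        rel = n / counts[wild_type]                            # ratio at the sampled time point
        rel0 = counts_day0[strain] / counts_day0[wild_type]    # ratio at Day 0
        result[strain] = math.log2(rel / rel0)
    return result

# Hypothetical counts for illustration only (not data from the study)
day0 = {"H37Rv": 1000, "glpK": 1000, "mbtA": 1000}
later = {"H37Rv": 2000, "glpK": 2000, "mbtA": 250}
print(log2_relative_abundance(later, day0))
```

A value of 0 indicates that a mutant kept pace with wild type, and a value of -3 indicates an eightfold relative loss.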

For CC030 x CC029 F2 infections, 251 F2 mice (including equivalent numbers of male and female mice) were infected via the IV route with an infectious dose of 10⁵ CFU of TnSeq library (as described above), to replicate the original CC infection experimental conditions. Mice were sacrificed at 1 month post-infection and bacterial burden was quantified by plating for CFU (as described above).

Quantification and statistical analysis

TnSeq analysis

TnSeq libraries were prepared and counts of each transposon mutant were estimated as described (Long et al., 2015). NCBI Reference Sequence NC_018143.1 was used for H37Rv genome and annotations. A total of 123 libraries were sequenced, capturing 60 distinct mouse genotypes. In the majority of cases, two replicate mouse libraries were used per genotype. Only a single TnSeq library was obtained for CC010, CC031, CC037, CC059, CC016, and PWK/PhJ. Insertion mutant counts across all libraries were normalized by beta-geometric correction (DeJesus et al., 2015), binned by gene, and replicate values for each mouse genotype averaged. Mean values for each gene were divided by the grand mean then log2 transformed and quantile normalized. The resulting phenotype values were used for both WGCNA and QTL mapping.
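
The final two transformations can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the grand mean is taken here as the gene's mean across genotypes, and quantile normalization is applied so that every genotype's profile shares one reference distribution. The text above does not specify either choice, and both function names are hypothetical. Ties and zero counts are not handled.

```python
import math

def log2_ratio_to_grand_mean(gene_means):
    """gene_means: mean normalized count of one gene in each mouse genotype."""
    grand_mean = sum(gene_means) / len(gene_means)
    return [math.log2(v / grand_mean) for v in gene_means]

def quantile_normalize(columns):
    """columns: equal-length lists, one per genotype, each holding one value per gene."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across columns
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference value of its rank
        normalized.append(out)
    return normalized

# Hypothetical values for illustration only
print(log2_ratio_to_grand_mean([1.0, 2.0, 4.0, 1.0]))
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```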

To eliminate genes having no meaningful variation across the mouse panel, statistical tests of log2 fold change (LFC) in counts between all possible pairs of mouse genotypes were performed by resampling (DeJesus et al., 2015). 679 ‘significantly varying genes’ were identified whose representation varied significantly (FDR < 5%) in at least two independent comparisons. For relative mutant abundance estimates, LFC in counts between in vitro-grown H37Rv (six replicate libraries) vs libraries from each mouse genotype were determined by resampling as above. LFC, Q-values and modules for TnSeq data across the mouse strains are available in Figure 4—source data 1.
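
The general idea behind a resampling test can be sketched as below. This is a generic exact permutation test on the difference in group means, written only to illustrate the principle; it is not the resampling procedure of DeJesus et al. (2015), which operates on insertion counts and draws random permutations. The function name and the example counts are hypothetical.

```python
from itertools import combinations

def permutation_test_mean_difference(a, b):
    """Exact two-sided permutation p-value for the difference in means of a and b."""
    pooled = list(a) + list(b)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    total = sum(pooled)
    at_least_as_extreme = 0
    n_splits = 0
    # Enumerate every way of relabeling len(a) of the pooled values as group a
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        mean_a = sum_a / len(a)
        mean_b = (total - sum_a) / len(b)
        n_splits += 1
        if abs(mean_a - mean_b) >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_splits

# Hypothetical counts for one gene in two mouse genotypes
print(permutation_test_mean_difference([10, 11, 12], [1, 2, 3]))
```

With three values per group there are only 20 possible relabelings, so the smallest attainable two-sided p-value is 0.1; this illustrates why the number of observations per group limits the resolution of such tests.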

WGCNA analysis

Weighted gene correlation network analysis (WGCNA) was applied to categorize the 679 significantly varying genes into 20 internally-correlated modules (Langfelder and Horvath, 2008). Modules were filtered (intramodular connectivity >0.6) to obtain the most representative genes. First principal component scores of module eigengenes were used as phenotype values for QTL mapping after first winsorizing (q = 0.05) using the R package broman (https://cran.r-project.org/web/packages/broman/index.html).
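
Winsorizing can be illustrated with a short Python sketch. It assumes that the operation clips values below the q quantile and above the 1 - q quantile, with quantiles computed by linear interpolation (the default 'type 7' definition in R); this is an illustration of the technique rather than a reimplementation of the broman package, and the helper names are hypothetical.

```python
def quantile_type7(sorted_x, p):
    """Linear-interpolation quantile of an already sorted list."""
    h = (len(sorted_x) - 1) * p
    lo = int(h)                             # floor, since h >= 0
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])

def winsorize(x, q=0.05):
    """Pull values beyond the q and 1 - q quantiles back to those quantiles."""
    s = sorted(x)
    low, high = quantile_type7(s, q), quantile_type7(s, 1 - q)
    return [min(max(v, low), high) for v in x]

# Hypothetical phenotype values 0..20: only the two extremes are pulled inward
print(winsorize(list(range(21)), q=0.05))
```

Clipping rather than discarding extreme values keeps every strain in the scan while limiting the leverage of outliers.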

In order to perform association analysis between modules of genes and clusters of mice (Figure 5B), the mice were clustered based on the matrix of TnSeq LFCs for significantly varying genes using hclust in R (with the ‘ward.D2’ agglomeration method). Then, for each module of genes, the LFCs in each cluster of mice were pooled and compared to all the other mice using a t-test, identifying modules with a mean LFC in a specific mouse cluster that is significantly higher or lower than the average across all the other mice. The resulting p-values over all combinations of gene modules and mouse clusters were adjusted using Benjamini-Hochberg for an overall FDR < 0.05.
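
The Benjamini-Hochberg adjustment applied to the module-by-cluster p-values can be sketched in Python as follows. The function name and the example p-values are hypothetical; the clustering and t-test steps are not reproduced.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                                   # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min                      # enforce monotonicity from the top down
    return adjusted

# Hypothetical p-values for four module/cluster combinations
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Combinations whose adjusted value falls below 0.05 would be reported at an overall FDR < 0.05.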

Disease-related trait analysis and heritability estimation

For the trait heatmap, trait values were clustered (hclust in R package heatmaply; traits scaled as per default function) and dendrogram nodes colored by 3 k-means. Correlation between disease-related TB traits for both IV and aerosol validation experiments was determined by Pearson’s correlation and visualized using corrplot (ordered by hclust) (Figure 1—figure supplements 1 and 2). Heritability (h2) of the immunological and TB disease-related traits was calculated by estimating the percent of variation attributed to between-strain differences relative to within-strain noise as previously described (Appendix 1) (Noll et al., 2020). This is explicitly: SS(strain)/SS(total) in an ANOVA table (where SS(total) is SS(strain) + SS(error)) (SS; sum of squares). p-Values were calculated by ANOVA and multiple-test corrected using the Benjamini-Hochberg method. Throughout the text, correlations are cited using the following standardized nomenclature: 0–0.19 = very weak, 0.2–0.39 = weak, 0.40–0.59 = moderate, 0.6–0.79 = strong, 0.8–1.0 = very strong correlation.
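
The heritability ratio and the correlation nomenclature can be written out as a small Python sketch. The function names and the example values are hypothetical. The label function applies the bins to the absolute value of the coefficient and treats each bin as extending up to the next lower bound, which is an assumption about how values such as 0.195 would be handled.

```python
def heritability(groups):
    """groups: dict mapping strain -> list of trait values.
    Returns SS(strain) / (SS(strain) + SS(error)) from a one-way ANOVA decomposition."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_strain = 0.0
    ss_error = 0.0
    for vals in groups.values():
        strain_mean = sum(vals) / len(vals)
        ss_strain += len(vals) * (strain_mean - grand_mean) ** 2   # between-strain variation
        ss_error += sum((v - strain_mean) ** 2 for v in vals)      # within-strain variation
    return ss_strain / (ss_strain + ss_error)

def correlation_label(r):
    """Map a correlation coefficient to the nomenclature used in the text."""
    a = abs(r)
    for upper, label in [(0.2, "very weak"), (0.4, "weak"), (0.6, "moderate"), (0.8, "strong")]:
        if a < upper:
            return label
    return "very strong"

# Hypothetical trait values for two strains with two mice each
print(heritability({"A": [1.0, 3.0], "B": [5.0, 7.0]}))
print(correlation_label(0.43))
```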

Genotyping and QTL mapping

A subset of the inbred CC mice used in the analysis were genotyped on the GigaMUGA array (Morgan et al., 2015) available from Neogen Inc. The inbred parents, F1s and F2 mice from the CC030xCC029 cross were genotyped on the MiniMUGA array (Sigmon et al., 2020) at Neogen Inc. For CC030 x CC029 F2 QTL analysis, markers were filtered to 2499 markers that differentiated between CC029 and CC030 haplotypes (Figure 3—source data 1). For QTL mapping in the F2 panel, genotype (Figure 3—source data 1) and lung burden data (Figure 3—source data 2) from 251 Mtb-infected F2 individuals were imported into R (version 3.6.1) and formatted for R/qtl2 (version 0.20) (Broman et al., 2019). QTL mapping incorporated kinship as a covariate using the LOCO (Leave One Chromosome Out) method. Further, sex and infection batch were also considered as covariates for mapping. LOD scores were calculated within R/qtl2 to assess genotypic associations with lung burden at each marker. QTL significance thresholds were established by 10,000 permutations.

For QTL mapping in the CC panel, the Most Recent Common Ancestor (Srivastava et al., 2017) 36-state haplotypes were downloaded from the UNC Systems Genetics Core Facility and simplified to 8-state haplotype probabilities (for the 8 CC founder strains), which is appropriate for additive genetic mapping. We generated 36-state haplotype probabilities from the individual CC mice genotyped on GigaMUGA and combined these data with the MRCA data to obtain a common genome cache.

For CC QTL analysis, genotype and phenotype data were imported into R (version 3.6.1) and reformatted for R/qtl2 (version 0.20) (Broman et al., 2019). Individual TnSeq and clinical trait phenotype values were winsorized (q = 0.006) as above. GigaMUGA annotations were downloaded from the Jackson Laboratory, and markers were thinned to a spacing of 0.1 cM using the reduce_markers function of R/qtl2. The final genetic map contained 10,067 markers. QTL mapping was carried out using a linear mixed model with LOCO (leave one chromosome out) kinship. For clinical trait scans, batch (denoted by ‘block’ in Figure 1—source data 1) was included as an additive covariate. Significance thresholds for QTL were estimated using 10,000 permutations (scan1_perm function). For each trait, the maximum LOD scores from the permutation scans were used to fit generalized extreme value distributions, from which genome-wide permutation p-values were calculated. LOD profiles and effect plots were generated using the plotting functions of the R/qtl2 package. Multiple QTL at similar genetic locations were assessed for independence using qtl2pleio with 400 bootstrap samples (Boehm et al., 2019). The quantile-based permutation thresholding method of Neto et al., 2012 was used to assess the statistical significance of co-mapping traits. The NL-method, which determines the LOD thresholds controlling genome-wide error rate for a given p-value and ‘hotspot’ size, was employed.
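
How permutation maxima translate into a significance threshold and a genome-wide p-value can be sketched in Python. Note that this sketch substitutes simple empirical quantiles for the generalized extreme value fit described above, and it does not perform the genome scans themselves; the function names and the stand-in LOD values are hypothetical.

```python
import statistics

def lod_threshold_5pct(permutation_max_lods):
    """Empirical 95th percentile of the per-permutation maximum LOD scores."""
    # With n=20 the cut points are the 5%, 10%, ..., 95% quantiles; take the last one
    return statistics.quantiles(permutation_max_lods, n=20, method="inclusive")[-1]

def permutation_pvalue(observed_lod, permutation_max_lods):
    """Share of permutation maxima at least as large as the observed LOD,
    with a +1 correction so that the estimate is never exactly zero."""
    exceed = sum(1 for m in permutation_max_lods if m >= observed_lod)
    return (exceed + 1) / (len(permutation_max_lods) + 1)

# Stand-in for permutation output: 100 evenly spaced maxima from 0.1 to 10.0
max_lods = [i / 10 for i in range(1, 101)]
print(lod_threshold_5pct(max_lods))
print(permutation_pvalue(9.55, max_lods))
```

Fitting a parametric extreme value distribution, as done in the analysis above, allows p-values smaller than the reciprocal of the number of permutations to be estimated, which the empirical version cannot provide.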

Candidate gene prioritization

To identify potential candidate genes, we focused on three QTL that were either statistically significant (Tip5, Hip42) or were validated by intercross (Tip8). For each QTL interval (determined by Bayesian interval in qtl2), we identified mouse genes that were in differentially expressed modules between infected lungs of resistant and susceptible mouse strains (Moreira-Teixeira et al., 2020). Of these genes, we next used the Sanger sequence data (Keane et al., 2011) to filter on genetic variants segregating between CC founder haplotypes. Where there were clear causal haplotypes, we further filtered to genes with missense or nonsense variants.

Appendix 1

Appendix 1—table 1
Heritability (h2) estimates for each measured TB-disease associated phenotype (Tuberculosis ImmunoPhenotypes; Tip).

h2 was calculated from the percentage of variation attributed to strain differences in each trait across the CC strains, as previously described (Noll et al., 2020). P-values were calculated by ANOVA and multiple-test corrected using the Benjamini-Hochberg method. Weight change is the percentage of weight (grams), CFU/CEQ is log10 transformed, cytokines are measured in pg/mL lung homogenate and log10 transformed.

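| Trait | h2 (%) | p-value | Adj. p-value |
| --- | --- | --- | --- |
| IFN-γ | 87.70 | 7.90E-20 | 2.21E-18 |
| Lung CFU | 83.30 | 5.83E-15 | 8.16E-14 |
| Lung CEQ | 80.55 | 2.65E-14 | 2.47E-13 |
| CXCL1 | 81.57 | 8.19E-14 | 5.73E-13 |
| MIG | 81.19 | 1.62E-13 | 9.09E-13 |
| MIP-2 | 80.82 | 3.07E-13 | 1.43E-12 |
| IP-10 | 80.13 | 9.73E-13 | 3.89E-12 |
| M-CSF | 79.33 | 3.54E-12 | 1.24E-11 |
| IL-17 | 78.85 | 7.43E-12 | 2.31E-11 |
| MIP-1α | 78.02 | 2.53E-11 | 7.08E-11 |
| G-CSF | 77.71 | 3.98E-11 | 1.01E-10 |
| MCP-1 | 77.05 | 1.00E-10 | 2.34E-10 |
| IL-1α | 75.62 | 6.60E-10 | 1.42E-09 |
| IL-6 | 73.97 | 4.89E-09 | 9.77E-09 |
| RANTES | 73.70 | 6.63E-09 | 1.20E-08 |
| Spleen CFU | 72.94 | 6.83E-09 | 1.20E-08 |
| LIF | 73.48 | 8.50E-09 | 1.40E-08 |
| VEGF | 73.08 | 1.34E-08 | 2.08E-08 |
| IL-1β | 71.66 | 6.11E-08 | 9.00E-08 |
| Weight Change | 67.56 | 8.75E-07 | 1.22E-06 |
| MIP-1β | 68.51 | 1.23E-06 | 1.65E-06 |
| TNF-α | 66.38 | 7.39E-06 | 9.41E-06 |
| LIX | 65.30 | 1.70E-05 | 2.07E-05 |
| Eotaxin | 64.27 | 3.63E-05 | 4.23E-05 |
| IL-10 | 63.97 | 4.48E-05 | 5.02E-05 |
| IL-2 | 62.87 | 9.59E-05 | 1.03E-04 |
| Spleen CEQ | 60.43 | 1.09E-04 | 1.13E-04 |
| IL-9 | 56.79 | 3.19E-03 | 3.19E-03 |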

Data availability

All relevant data to support the findings of this study are located within the paper and supplementary files. Genome sequence data are deposited in the NCBI Gene Expression Omnibus (GEO), accession number GSE164156. All raw phenotype values and QTL mapping objects are located on GitHub (@sassettilab) in the repository https://github.com/sassettilab/Smith_et_al_CC_TnSeq (copy archived at swh:1:rev:2ded9735b23d9780eb7872eb55625cff35090430).

The following data sets were generated
    Smith CM (2021) NCBI Gene Expression Omnibus. ID GSE164156. Host-pathogen genetic interactions underlie tuberculosis susceptibility in genetically diverse mice.

References

    Caruso AM, Serbina N, Klein E, Triebold K, Bloom BR, Flynn JL (1999) Mice deficient in CD4 T cells have only transiently diminished levels of IFN-gamma, yet succumb to tuberculosis. Journal of Immunology 162:5407–5416.

Decision letter

  1. Bavesh D Kana
    Senior and Reviewing Editor; University of the Witwatersrand, South Africa

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Host-pathogen genetic interactions underlie tuberculosis susceptibility" for consideration by eLife. Your article has been reviewed by 5 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The reviewers have opted to remain anonymous.

We are sorry to say that, after consultation with the reviewers, we have decided that your work will not be considered further for publication by eLife.

Specifically, reviewers concur that your manuscript reports some interesting data. The application of the CC to study the host immune response to TB will yield useful findings. However, reviewers raised concerns regarding the statistical approach taken to assess differences in phenotypes/outcomes between CC strains. The primary concern is that the number of animals sampled per group is small. When this is combined with the expected heterogeneity in the data, the statistical comparisons as reported are difficult to interpret. After extensive discussions, we were unable to come to a clear sense of whether the observations are robust. Based on this we have opted to reject the manuscript in its current form. That said, the manuscript does report some interesting data that would constitute a useful resource for the field. Hence, we would be willing to consider it again, if you feel you can address the comments. In this context, reviewers have made some extensive recommendations. Please consider these and the suggestions for statistical analyses. After reconsidering, the aerosol infection experiments suggested by reviewer 2 would not be necessary in a revised version, assuming that you can provide reassurance as to the robustness of the findings by clarifying or revising statistical methods. Acceptance of the revised manuscript is not guaranteed.

Reviewer #1:

In two previous papers (2016 and 2019, mBIO), Smith et al. have used collaborative cross mice to better understand host susceptibility to TB. Those papers used aerosol infection.

In this work, the authors use intravenous infections so as to be able to test a library of bacterial mutants that the lab has previously constructed with the view of identifying the comprehensive set of virulence genes required for infection. The goal of this work would be to understand better the differences in individuals who are more or less resistant to TB and how the pathogen's wider gene set allows it to infect a greater proportion of people successfully.

In this work, the authors use intravenous infections so as to be able to test a library of bacterial mutants that the lab has previously constructed. The ability to use large numbers of bacteria delivered intravenously should theoretically avoid the bottlenecks resulting from aerosol infection. However, possibly because of the small numbers of mice used from each genotype (2-6, average 3), the data are noisy. This is obvious even in Figure 1A where bacterial burdens in only 15/59 groups are statistically different from the B6 mouse used, using relatively nonstringent statistical analyses. This most likely represents experimental variability, which disadvantages the rest of the work and its conclusions.

The problem becomes obvious early in the paper when they look at the correlation between lung and spleen bacterial burdens. An R2 of 0.43 is not a moderate correlation but a poor correlation. Either there is an interesting reason for this or more likely it simply reflects small numbers. The correlations between bacterial burdens and cytokines/chemokines are also weak to nonexistent. Here too, the lung – spleen disparity is a problem. For instance, IL-1b and TNFa are weakly correlated for lung burdens and not at all for spleen. For IFNγ, which the authors make a point of detailing, the data do not support their statements. There is no correlation even if the two or three high (#) strains are excluded. This makes the TipQTL analysis of genetic variants that control TB immunophenotypes also suspect and the statistics are weak. The argument that Tip8 is a strong predictor of lower bacterial burden is also not convincing based on Figure 3B. First, there is strong overlap in the bacterial CFU between samples 2 and 3, one of which has the Tip8 CAST and the other which does not. Second, the last group which also contains the CAST at Tip8 looks not different from the third one. What would be the reason for that? Overall, these data are not statistically robust. This looks like noise.

It is also not clear how these new findings relate to their previous ones in Smith et al. 2019 mBio. In the current paper, the authors follow up by infecting a B6/CAST cross by the aerosol route to conclude that the bacterial control of CAST mice is independent of IFNγ-secreting cells. In the previous paper, they perform a different cross, CC001/CC042 to also dissociate bacterial control from IFNγ production. How do the current findings relate to the previous ones? The previous ones are far from understood and it feels as if the additional data are more confusing than clarifying.

On this weak framework, the authors try to determine if the CC mice can reveal the extended requirement for pathogen genes that are not reflected in single mutant strains. This work begins in Figure 4. (First, a minor point, I compute the genes in B6 to be 159+53=212 whereas on page 15, line 275 states that they identified 234 genes.) Beyond this, I have several questions:

1) The authors state that while the pathogen gene sets were different from mouse strain to mouse strain, the number of genes required in each strain was roughly the same. What is their explanation/model for this constant number, given a 3-log difference in bacterial burdens between the strains reflecting enormous differences in susceptibility? How could this model be tested? This is an important point.

The authors report a common set of ~ 130 genes. Are there other common subgroups defined by common pathogen gene sets? These types of analyses are essential to both validating the system and in so doing may give biological information. What about the additional genes in the CC subset over the immunodeficient KO set? What can be learned from them?

And beyond this, I do not follow their further analyses and am just not convinced that they are meaningful. There seems to be a lot of statistical massaging with seemingly erudite analyses but with the small number of mice used per genotype, the large error and the glossing over of problems, it is hard to see how this paper represents a significant advance.

I hope that the paper is being co-reviewed by individuals expert in mouse genetics and in statistics or in both. If not, this will be essential.

Reviewer #2:

This work takes advantage of the genetic diversity of a panel of mice (called the collaborative cross (CC) mice) to identify mice that produce heterogeneous outcomes after M. tuberculosis (Mtb) infection. The hope is to better recapitulate the diversity of outcomes of Mtb infection seen in humans, a current major limitation of the dominant B6 mouse model. While some important human Mtb infection outcomes (most notably, latency) are apparently not captured by the CC mice, there is nevertheless an impressive variation in infection outcomes in the CC panel (>100-fold range in lung CFU burdens, for example). There is also large variation in host responses. For example, CAST mice control infection but are found to lack an IFNγ response thought previously to be critical for Mtb control based on studies in B6 (but known to be similarly absent from some human controllers). The authors also infect the CC mice with pools of transposon mutants, allowing identification of specific host genotypes that confer fitness effects on specific bacterial mutants. The resulting analyses identify loci that affect quantitative immunological phenotypes (TipQTL) or fitness of bacterial mutants (HipQTL). No causative genetic variants underlying these QTL are validated by independently generated mutation, so in some sense, the "discoveries" here are limited. However, the impact of this work is likely to be great as a resource that can be mined by investigators in the field to identify specific mouse strains in which, for example, a specific bacterial mutant has a phenotype, or a particular host response is observed, greatly facilitating followup work.

Given the lack of a major new biological insight or mechanism, perhaps this should be considered as a "Tools and Resources" paper (https://reviewer.elifesciences.org/author-guide/types) rather than a regular research paper.

The use of a high-dose IV model of infection, while necessary for TnSeq experiments, may limit the relevance of the observations to pulmonary TB. This concern is somewhat ameliorated by the fact that the susceptibility of the parent strains largely recapitulates the expected phenotype based on previous aerosol infections. However, for a few key CC mice with interesting phenotypes (e.g., the exacerbated weight loss in CC030 or CC040) it would be nice to validate in a replication experiment and that the phenotype is consistently seen even with aerosol infections. The experiment in Figure 3 could potentially serve as such validation, but it is not clear whether this experiment was using IV or aerosol infection (please specify this clearly in the legend and text regardless).

Line 143: please specify in the text the dose (CFU) of bacteria that were delivered IV. This is important information and should be in the main text.

Line 159: "Thus, the CC panel encompasses a much greater quantitative range of susceptibility than standard inbred lines." – this is not obviously true. The range of CFU burdens seen from NOD to WSB appears to capture most of the range of the CC panel. Perhaps change the wording here?

Line 208: is it really "none", as in not a single IFNγ+ T cell? The bar graph suggests that there is some, so perhaps revise this sentence.

Line 251: "mice that contained CAST at Tip8, Tip10 or both loci had reduced CFU burden". It appears from the data in Figure 3 that in fact the only statistically significant difference is for the Tip8 locus, and that the presence of the CAST Tip10 allele abolishes this difference. If that is correct, the manuscript needs to be clear that in fact the Tip10 locus did not validate (statistically).

Line 274: "Consistent with our previous work, we identified 234 Mtb genes that are required for growth or survival in Mtb in B6 mice". In fact, how consistent are the data? Of the 234 genes identified, how many were previously identified? How many of the previously identified genes were in the set of 234? Please specify so that the readers can get a sense of the experiment-to-experiment variability of the approach.

Why would more Mtb genes be essential in Rag1-/- mice as compared to WT mice? This seems unexpected to me and perhaps deserves some comment? What are these genes – could this be discussed at least a little? Line 340 mentions that ESX genes are among those preferentially required in Rag1-/- (though obviously they are also required in B6, so presumably this does not account for the 'additional' genes required in Rag1-/-). Could the authors elaborate a bit more on what they think is going on here?

Line 281: "As more CC strains, and presumably more distinct immune states, were included in the analysis, the cumulative number of genes necessary for growth in these animals also increased". It is possible that repeating the experiment with more and more B6 animals would also increase the cumulative number of genes necessary? In other words, might the sensitivity of the TnSeq method be what is limiting here as opposed to host genetic diversity? I presume the authors have a good argument against this concern; it would be helpful for readers to hear it.

Line 295: "the resulting mini-pool was subjected to in vivo selection in the same manner as the TnSeq study" – does this mean that the minipool was used to infect ALL the same mouse strains (all the CC mice) as in Figure 4A? This isn't clear. A related concern regarding Figure 4C: why are selected mouse strains shown in this figure? Are they the strains that showed the biggest signal in the TnSeq experiment? The worry is they were cherry picked. This could be explained better.

Line 298: "relative abundance predicted by TnSeq was reproduced with deletion mutants" – this is difficult to evaluate because the fitness of each of the mutants in each of the strains in Figure 4A is not stated. I realize the full dataset of all the TnSeq data is available as a spreadsheet but it would be nice if the reader didn't have to dig through that. Ideally it would be great to create a searchable website in which investigators could enter their gene of interest and the strains in which the mutant shows a phenotype (and the magnitude of the phenotype) could be displayed.

Line 312: again, it is not clear what the TnSeq predicted. Please state that result here too.

Line 435: which HipQTL are associated with both ahpC and eccD1? Please specify. I don't see this information in the cited figure either (Figure 4E). Please make sure this is correct.
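The concern about Line 281 (whether the cumulative gene count rises because of host diversity or simply because more animals are sampled) can be made concrete with an accumulation curve. The following is a minimal sketch, not the authors' analysis: the function name and the gene labels are hypothetical, and each input set stands for the genes called "required" in one animal or one strain.

```python
def cumulative_required_genes(required_by_sample):
    """Return the running count of distinct required genes as samples are added in order."""
    seen = set()
    counts = []
    for genes in required_by_sample:
        seen |= set(genes)          # add any gene not seen in earlier samples
        counts.append(len(seen))    # size of the union so far
    return counts


# Hypothetical gene labels, for illustration only.
replicates_of_one_strain = [{"geneA", "geneB"}, {"geneA", "geneB"}, {"geneA", "geneB", "geneC"}]
diverse_strains = [{"geneA", "geneB"}, {"geneB", "geneD", "geneE"}, {"geneA", "geneF", "geneG"}]

replicate_curve = cumulative_required_genes(replicates_of_one_strain)
diverse_curve = cumulative_required_genes(diverse_strains)
```

If the curve built from repeated animals of a single genotype flattens quickly while the curve built from distinct genotypes keeps rising, limited sensitivity alone is a less likely explanation for the increase.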

Reviewer #3:

In this elegant and expansive study, the authors have studied bacterial adaptive response in the context of variable immune pressure exerted by diverse hosts. They used the Collaborative Cross (CC) mouse panel in conjunction with a library of Mtb transposon mutants to associate M. tuberculosis genetic requirements with host genetics and immunity. The authors report several new findings. For example, they report that several Mtb virulence pathways become unmasked only under specific host microenvironments. They also found that a number of genotypes were able to control bacterial replication but had very low levels of IFNγ. Also, dependent on the genetic composition of the host, related disease traits such as lung bacterial burden and weight loss could be dissociated. Another significant outcome was identifying candidate genes underlying the QTLs and showing that the QTLs were generally distinct from CC loci that control immunity to other pathogens.

Overall, I find this to be a complete and thorough study.

Reviewer #4:

I reviewed this study only to evaluate the robustness of the results and the statistical evaluations used in the study. No attention was paid to novelty or scientific interest. Some of the statistical approaches used by the authors were beyond my expertise and require further analysis by a professional statistician, which I do not claim to be. This study used multiple simultaneous comparisons of both animal strains and bacterial strains, with what appear to be generally quite small sample sizes and non-replicated experiments. Data files that are presented either omit some experiments entirely or provide only summary data of mean values without indicating numbers of subjects or measures of variability such as standard deviation; no information on the results of replicate studies, if any were performed, is in these data files. Therefore I find it impossible to determine if many of the "significant" associations are robust. I give some examples below:

1) Data for Figure 1. The data set for figure 1 contains only summary data. There is no indication of the number of animals studied in each group and no indication of standard deviation. Also, there is no indication in the data set or in the text of whether any of the experimental results for 1A to 1D were based on replicate studies. The underlying data for figures 1E to 1H are not available in the figure 1 data file; the text gives the range of subject sizes and states that two independent experiments were performed, but it is unclear whether the data presented in the figures represent both experiments or just one experiment, and, if both are included, whether the results were simply combined or some normalization was performed. The statistical analysis method does not say whether the data were log-transformed before analysis. All of the authors' conclusions may be absolutely correct but it is difficult to know this without more detailed information.

2) Data for Figure 4. The data set for this figure only includes the underlying data for 4A. No underlying data are given for 4C, D and E. No indication is given of the number of repeated experiments, and if these were performed what the results were. No method is given for the statistical analysis used in 4D. The authors conclude that there are significant differences between mouse strains for several Mtb strains, but give no information on the numbers of mice studied, variation within groups, repeat experiments, etc.; also no statistical analysis is performed. Again, the authors' conclusions may be absolutely correct but it is difficult to know this without seeing the underlying data.

While the authors may believe that their answers to the transparent reporting form were complete and accurate, based on what I have found I'm not certain that this is correct.

Reviewer #5:

Sample size and reproducibility. The groups were small (average of 3 mice, sometimes only 2), which could result in a high variance per group. It would be important to evaluate how reproducible are the phenotypic measurements per strain. The authors mention reproducibility per genotype (Page 8 line 177), but they do not clearly show it. Perhaps a dimension reduction approach (PCA or UMAP) could be used to evaluate the distance between replicates. As specific examples: How many animals were used per group in outlier strains (Figure 1 C and D)? What was the variance for the y-axis measurements? Could variance explain the outlier nature of the strain?

QTL analysis. The CC QTL analysis faces similar challenges to genetic studies in humans. Overlooking complex population structure can lead to inflated false positives but can also prevent harvesting the full potential of the samples. Limitations of the QTL analysis are briefly introduced (Page 11 line 241) but the explanation in methods is convoluted and difficult to follow. The study could benefit from using their SNP data for genome-wide association testing using all strains, correcting for population structure. This strategy was shown to be robust when applied to a study with a similar framework using different inbred strains and their crosses (PMID 32365090). This method could address sample size and QTL as the animals would be treated as unique samples (not per group), correcting for population structure (similarities between samples in the dataset). If correct, this would be a more straightforward genome-wide analysis.

Detailed comments:

In methods, it is stated that TipQTL significance was assessed using permutations. However, in the Results section and in table 1 it is not clearly stated if the p values are raw or corrected via the permutation.

Line 63: Co-existence of Mtb and Homo sapiens is now widely accepted, perhaps 2000-6000 years. Has co-evolution been demonstrated or is this hypothesized? Same point with the role of genetic variation presented as established.

When describing correlations, suggest a standard terminology: Strong, Moderate, Weak. Highly correlated is of uncertain meaning. On line 182 the R2 was 0.57 but the text said correlated. Some use the following: 0-0.19 = very weak, 0.2-0.39 = weak, 0.40-0.59 = moderate, 0.6-0.79 = strong, 0.8-1 = very strong correlation

Line 151. What does it mean when the susceptibility is largely consistent with other studies? Can the authors spell out the inconsistencies somehow?

Line 220. What is reasonable statistical confidence?

Line 272. At some point, it would be useful to read in the text that the combined studies of WT Bl6, RAG-/-, IFN-/-, Nos2-/- and Cybb-/- mice produced x essential genes, and that the CC raised it from x to y. The paragraph tells us the 234 genes were identified in Bl6 and that the plateau was ~750. How far from 234 to 750 was reached with the 4 specified knockout mice?

Line 321. I did not understand what was complex about the kinetics. Perhaps this can be clarified?
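The correlation terminology proposed in the detailed comments above can be written as a small lookup. This is a minimal sketch of that suggested scale only; the function name is hypothetical, and whether the scale should be applied to r or to R2 is left to the caller, since the comment does not say.

```python
def describe_correlation(coefficient):
    """Map a correlation coefficient to the verbal scale suggested in the review."""
    strength = abs(coefficient)  # the scale describes magnitude, not direction
    if strength > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if strength < 0.2:
        return "very weak"
    if strength < 0.4:
        return "weak"
    if strength < 0.6:
        return "moderate"
    if strength < 0.8:
        return "strong"
    return "very strong"


# The value quoted for line 182 in the comment above.
label_for_line_182 = describe_correlation(0.57)
```

Under this scale, the 0.57 quoted for line 182 would be described as "moderate" rather than simply "correlated".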

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Host-pathogen genetic interactions underlie tuberculosis susceptibility" for consideration by eLife. Your article has been reviewed by 5 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Bavesh Kana as the Senior Editor. The reviewers have opted to remain anonymous.

Reviewers and I concur that your manuscript provides value to the field as a "Tools and Resource" paper for generating new hypotheses. The genotypic diversity explored in the study, together with an assessment of disease pathology/outcomes, provides a useful framework to pose important questions that are currently topical for tuberculosis. That said, it would be important to frame the study as a resource rather than trying to probe specific questions with the current dataset, which can be limited by the depth of data available currently in the work. I encourage you to consider carefully how best to present this to the field in a manner that acknowledges the limitations but also creates the space for innovative and explorative future work.

To facilitate this, please address the following concerns:

1. For a paper reporting a potentially useful resource, it is imperative that all data be publicly available for further interrogation and exploration. Reviewers expressed concern that this is not the case with your submission. For the GEO link, data appear available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164156. However, for the GitHub link, the Smith repository was not located. There was only one repository (Bellerose): https://github.com/sassettilab?tab=repositories. Please address this.

2. In lines 145-147, it is stated: Groups of 3-6 male mice per genotype were infected and TB disease-related traits were quantified at one-month post-infection. Data from all surviving animals are provided in Figure 1-sourcedata1. However, the source data file indicates that the N is 1-3 for most of the data, except for two strains, Bl6 and CC010, which are reported to have 6 mice. There are asterisks for significance in only a minority of the bars (~15) in Figure 1A. What about the remainder? What n's did they have? Too few to determine significance? This should be clarified.

3. Explain why an R value of 0.43 would be considered a moderate correlation, when elsewhere you suggest that there could be biological differences that may make this correlation weak. Similarly, it is often the case that in inflammatory processes, TNF and IFNγ track together, even if made by different cells, which is not even strictly the case. These are two examples of continued disagreement. Please reconsider carefully how you present these data.

4. Line 184-188. The "significant" associations described in this section are for non-replicate studies with N=2 (CC30, 40 and A/J). Perhaps reconsider the use of "significant" in this context. In Line 253, this experiment used a large N and hence likely produced robust results.

5. Figure 1 A. Giving results of an ANOVA with post-hoc analysis comparison of means to a single mouse strain (C57/BL6) is difficult to interpret because of the potential bias from post-hoc decisions about which strain serves as a comparator. For example, using AJ as a comparator shows no significant differences with other strains. For this reason, excluding this analysis is warranted.

6. Line 199: "A number of genotypes were able to control bacterial replication yet had very low levels of IFNγ (Figure 1D)" -- for the benefit of the reader, please specify which genotypes fall into this category. Ideally this would be specifically mentioned in the text and in addition the genotypes of interest would be labelled in the figure. Similarly for Figure 1C, it would be helpful if the significant "outlier" genotypes were labelled in the figure so the reader can know which these are.

7. Line 487: "Two TipQTL overlapped with HipQTL (Figure 7)…". Please specify in the text which overlapping QTL are being referred to here. It seems evident from the figure but specifying would assist the reader here.

8. Some of the type on the figures is really tiny, e.g., 1A, 1B, most of figure 4, parts of figure 6, etc. Enlarge the panels with the tiniest type so that they are more readable.

Reviewer #1:

The main criticism of the initial submission was the small sample size per CC genotype. Admittedly, and as expected, a number of animals within the susceptible CC genotypes died during the study. For this reason, 23 out of the 60 CC genotypes had fewer than three mice per group, including 3 out of the 4 top CC strains (CC032, CC037, and CC027). In this revised version of the manuscript, the authors addressed the criticism by replicating the difference in lung CFU, weight, and selected cytokines between B6 and the top four CC strains. Moreover, in the response to reviewer 1 they have cited robust literature supporting that sample size per group did not impact the statistical power of their analyses. As far as I am concerned, they have addressed most of the reviewer's comments properly. Some of the analyses suggested in the first round of review were not formally evaluated. However, given that the paper is now submitted as a "Tools and Resources" paper, it should be possible for independent groups to explore the data produced by Smith, C. et al. in detail or using different analytical approaches. I have no further comments.

Reviewer #2:

Overall I remain enthusiastic about this manuscript. It has been a longstanding question whether the diversity of immune responses seen in humans could be modeled in mice and this manuscript represents the most substantial (by far) attempt to address this question. Although some key human phenotypes (e.g., "latency") do not appear to be recapitulated in the CC mouse panel, there are definitely several interesting phenotypes reported that are not seen in the typically used B6 strain. For example, the identification of strains of mice that control TB despite low levels of IFNγ production is an interesting phenotype that may phenocopy the "resister" phenotype observed in some humans.

I think the quality of the work is overall high and that, given experimental constraints, sufficient effort was expended to assure the rigor and reproducibility of the datasets.

Originally my main concern was that the paper reports little new in the way of biological insight -- for example, the host genes underlying the TipQTL or HipQTL were not identified, and the underlying mechanisms responsible for the observed immune phenotypes were not dissected. Despite this, I think the manuscript is an important contribution if published as a "Tools and Resource" paper, which is an idea the authors appear to have embraced in their response to reviews. In particular, identification of CC strains which give phenotypes for hundreds of specific bacterial mutants (that give no phenotype in the standard B6 strain) is likely to be extremely helpful to investigators in the field.

1) Please confirm that this manuscript is now submitted as a "Tools and Resource" paper.

2) Line 199: "A number of genotypes were able to control bacterial replication yet had very low levels of [IFNγ] (Figure 1D)" -- for the benefit of the reader, please specify which genotypes fall into this category. Ideally this would be specifically mentioned in the text and in addition the genotypes of interest would be labelled in the figure. Similarly for Figure 1C, it would be helpful if the significant "outlier" genotypes were labelled in the figure so the reader can know which these are.

3) Line 487: "Two TipQTL overlapped with HipQTL (Figure 7)…". Please specify in the text which overlapping QTL are being referred to here. I think it is evident from the figure but I would be reassured I am interpreting the figure correctly if you could be specific here.

4) Some of the type on the figures is absurdly tiny, e.g., 1A, 1B, most of figure 4, parts of figure 6, etc. I would suggest enlarging the panels with the tiniest type so that they are more readable.

Reviewer #4:

The sole purpose of my review was to determine if the described experiments were robust, in terms of sample sizes and replication.

The data provided in the manuscript are not robust, with a few exceptions. The experiment studying 60 different mouse strains analyzed for 21 different outcomes (1260 total possible comparisons) utilized a single experiment with mouse strain numbers ranging from 1 to 6 (not 3 to 6 as specified in the manuscript), where the N was 1 for one mouse strain, 2 for 22 strains, 3 for 35 strains and 6 for 2 strains. No replicate studies were performed for the entire group. The chances of analytic variability and assignment of significance to interactions between mouse strain and measured outcome are far too great to regard these data as anything other than hypothesis generating data requiring confirmation. The authors say that the experiment represented in figures 1E-H validate their global conclusions for this data set, whereas they really only validate results for five different mouse strains studied for four different outcomes, two of which are highly correlated (lung cfu and weight loss). This validation study was not repeated, to demonstrate reproducibility, which would bolster those results.

There continues to be a lack of raw data availability supporting some of the various experiments presented in the manuscript. The Github file specified in the manuscript is either not yet posted or not available publicly (and therefore not to me). The data underlying the experiments presented in figure 1 are available only in summary form (mean, N, SD) but not for each mouse studied. Neither the raw nor the summary data for figure 4D are found in any of the data files, and specifically not in the Figure 4 supplemental data 2 file.

Specific comments

1) 184-188. The "significant" associations described in this section are for non-replicate studies with N=2 (CC30, 40 and A/J).

2) 253. This experiment used a large N and hence likely produced robust results.

3) Figure 1 A. Giving results of ANOVA with post-hoc analysis comparison of means to a single mouse strain (C57/BL6) is difficult to interpret because of the potential bias from post-hoc decisions about which strain serves as a comparator. For example, using AJ as a comparator shows no significant differences with other strains. For this reason, leaving out that analysis is warranted.

4) Figure 4 C data. Multiple comparisons are performed without apparent adjustment for such, for small N (3 to 5) non-replicate studies. Were these comparisons preplanned?

5) Figure 4D data. No raw data are available.
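The scale of the multiple-comparison concern raised above (60 mouse strains analyzed for 21 outcomes) can be illustrated with simple arithmetic. This is a sketch under the reviewer's own framing, in which every strain-outcome pair counts as one test; the 0.05 significance level and the Bonferroni correction are assumptions chosen for illustration, not procedures described in the manuscript.

```python
def multiple_testing_summary(n_strains, n_outcomes, alpha=0.05):
    """Summarize the testing burden when every strain-outcome pair is one comparison."""
    n_comparisons = n_strains * n_outcomes
    return {
        "n_comparisons": n_comparisons,
        # False positives expected at level alpha if no true effect existed anywhere.
        "expected_false_positives": alpha * n_comparisons,
        # Per-test threshold that keeps the family-wise error rate at alpha (Bonferroni).
        "bonferroni_alpha": alpha / n_comparisons,
    }


summary = multiple_testing_summary(n_strains=60, n_outcomes=21)
```

With these inputs there are 1260 comparisons, roughly 63 of which would be expected to reach an uncorrected 0.05 threshold by chance alone.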

Reviewer #5:

1. In lines 145-147, the authors state: Groups of 3-6 male mice per genotype were infected and TB disease-related traits were quantified at one-month post-infection. Data from all surviving animals are provided in Figure1-sourcedata1.

I am puzzled by the 3-6 mice reported because my reading of the source data file is that the N is 1-3 except for two strains at most, Bl6 and CC010 which are reported to have 6 mice.

There are asterisks for significance in only a minority of the bars (~15) in Figure 1A. What about the remainder? What n's did they have? Too few to determine significance?

2. It is puzzling why the authors would continue to defend their characterization of an R value of 0.43 as a moderate correlation, when they themselves point out that there could be biological differences that may make this correlation weak. Similarly, it is often the case that in inflammatory processes, TNF and IFNγ track together, even if made by different cells, which is not even strictly the case. These are two examples of continued disagreement.

3. The authors seem to be interested in converting this paper to a Tool/Resource. However, the narrative remains the same as when they wrote it as a Research Paper. For the paper to be considered as a Tool/Resource, the paper needs to be written as such, backing off from strong claims and rewritten as a hypothesis generating screen.

https://doi.org/10.7554/eLife.74419.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

In two previous papers (2016 and 2019, mBIO), Smith et al. have used collaborative cross mice to better understand host susceptibility to TB. Those papers used aerosol infection.

In this work, the authors use intravenous infections so as to be able to test a library of bacterial mutants that the lab has previously constructed with the view of identifying the comprehensive set of virulence genes required for infection. The goal of this work would be to understand better the differences in individuals who are more or less resistant to TB and how the pathogen's wider gene set allows it to infect a greater proportion of people successfully.

In this work, the authors use intravenous infections so as to be able to test a library of bacterial mutants that the lab has previously constructed. The ability to use large numbers of bacteria delivered intravenously should theoretically avoid the bottlenecks resulting from aerosol infection. However, possibly because of the small numbers of mice used from each genotype (2-6, average 3), the data are noisy.

While this characterization of our group size is correct, we edited the text to make it clear that groups of 3-5 mice per genotype were infected (Line 145). Of the 60 strains, a small number of animals within susceptible CC genotype groups died during the study and were not phenotyped. Data from all surviving animals are provided in Figure1-SourceData1.

This is obvious even in Figure 1A where bacterial burdens in only 15/59 groups are statistically different from the B6 mouse used, using relatively nonstringent statistical analyses. This most likely represents experimental variability, which disadvantages the rest of the work and its conclusions.

We disagree that our data are “noisy”. In general, our ability to detect statistically significant differences in lung CFU between genotypes that vary by ~1 log is comparable to many other studies. Regardless, this is not a point to belabor, as this critique seems to reflect expectations that our study was never designed to fulfill. A classical “mouse study” is powered to detect even small differences in pairwise comparisons between groups. Our study was not designed for the pairwise comparison of genotypes. Instead, this is primarily a genetic mapping study, where the statistical power is derived from the inclusion of multiple genotypes that share a trait-associated haplotype (seven on average in our population). The proper estimate of “noise” in this type of study is based on the heritability of the variation (i.e. the proportion of the trait variation that can be attributed to genetics). Heritability estimates were provided in Figure1-FigureSupp3 of the original submission and show that greater than 80% of the variation in lung CFU and CEQ are due to genetic variation – not “noise”. We also note that the majority of similar genetic mapping studies include only a single animal per genotype. Our strategy to include multiple animals per genotype improves trait value estimation and the overall rigor and reproducibility of our work, relative to almost any other comparable published genetic study. This design is based on systematic studies by members of our group showing that the inclusion of additional CC strains improved mapping power more than additional replicates (Keele, Crouse et al. 2019) and that individual outlier CC strains were identifiable and statistically significant using as few as 2 mice per genotype. Using only 2 mice per strain, CC studies are powered equivalently to a ~400 outbred mouse study per trait (Keele, Zhang et al. 2021). 
Thus, far from being a limitation of our work, the inclusion of 3-5 animals per genotype makes our study among the most rigorous to date.
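The heritability argument in this response (the share of trait variation attributable to genotype) can be illustrated with a one-way ANOVA intraclass correlation. This is a minimal sketch and an assumption on our part: the estimator used for Figure1-FigureSupp3 is not described in this response, and the strain labels and trait values below are invented for illustration.

```python
from statistics import mean


def broad_sense_heritability(trait_by_strain):
    """Estimate the fraction of trait variance explained by strain (one-way ANOVA ICC)."""
    groups = [values for values in trait_by_strain.values() if values]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two strains and some within-strain replication")
    grand_mean = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size, which handles unequal numbers of animals per strain.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    denominator = ms_between + (n0 - 1) * ms_within
    if denominator == 0:
        return 0.0
    # Negative estimates are truncated at zero.
    return max(0.0, (ms_between - ms_within) / denominator)


# Hypothetical log10 lung CFU values; group sizes of 2-3 mirror the design discussed above.
example_trait = {
    "strain_1": [6.0, 6.2, 6.1],
    "strain_2": [7.9, 8.1],
    "strain_3": [4.9, 5.1, 5.0],
}
example_h2 = broad_sense_heritability(example_trait)
```

In this toy example the strains differ by far more than the animals within a strain, so the estimate is close to 1 even though each group contains only two or three animals.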

The problem becomes obvious early in the paper when they look at the correlation between lung and spleen bacterial burdens. An R2 of 0.43 is not a moderate correlation but a poor correlation. Either there is an interesting reason for this or more likely it simply reflects small numbers. The correlations between bacterial burdens and cytokines/chemokines are also weak to nonexistent. Here too, the lung – spleen disparity is a problem. For instance, IL-1b and TNFa are weakly correlated for lung burdens and not at all for spleen.

This comment appears to be based on the reviewer’s own expectations for correlated traits. We fundamentally disagree with these expectations and suggest that the reviewer did not consider correlated traits for which there is a much more solid rationale. For example, the reviewer ignores the strong correlation between CFU and CEQ in both lung and spleen (r=0.8 and 0.88, respectively). Based on the criteria suggested by Reviewer 5, these represent “very strong” correlations that support the accuracy of our bacterial burden estimates and our heritability measures. Thus, the more modest correlation between lung and spleen burdens that the reviewer criticizes can be attributed to an important biological difference that is completely consistent with our current understanding of the differential importance of specific immune mechanisms in lung and spleen (e.g., Sakai, Kauffman et al. 2016). Similarly, it is unclear why the reviewer expects IL-1b and TNFa to correlate, as they are produced by different cells as a result of different signaling pathways. Much more reasonable expectations would be a correlation between IL-1a and IL-1b (r=0.84), or between the monocyte/granulocyte chemoattractants, MIP-1b, MIP-1a, MCP1, G-CSF, MIP-2, CXCL1 (r=0.89-0.95). These very strong correlations support the accuracy of our cytokine measurements. We ask the reviewer to reconsider these data with an open mind. If they are still concerned, we also provide additional aerosol validation data in the new Figure 1 E-H.

In sum, we strongly disagree with this critique of our experimental design, group sizes, and data quality. However, we concede that our study was not designed for robust pairwise comparisons between strains and agree that additional experimental validation/replication strengthens the work. As a result, we have extensively revised Figure 1 to remove conclusions based on pairwise comparisons. These data have been replaced with a simple replication study, which demonstrates the reproducibility of our data in an aerosol infection model. We hope the reviewer agrees that this improves the rigor of the work, and supports subsequent analyses of the dataset.

For IFNγ, which the authors make a point of detailing, the data do not support their statements. There is no correlation even if the two or three high (#) strains are excluded.

This represents a misunderstanding, as the text clearly stated that IFNγ and CFU were not correlated. Regardless, these data have been removed to simplify the conclusions of Figure 1.

This makes the TipQTL analysis of genetic variants that control TB immunophenotypes also suspect and the statistics are weak.

We again note that QTL mapping is not based on simple pairwise comparisons but draws its statistical power from the presence of multiple strains that share a haplotype at a trait associated locus (on average, 7 strains per haplotype at any specified locus). The majority of previous QTL mapping studies use single animals per genotype, and our inclusion of multiple animals per genotype significantly increases the accuracy of our trait value estimations. Finally, this version of the manuscript, like the first, includes a heritability estimate for every trait. The heritability of most traits is >75%, and we are completely transparent in reporting the traits that are more prone to technical variation, which represent cytokines that are produced at relatively low levels. The text referring to this Figure1-FigureSupp3 table has been edited to accentuate its importance.
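The point that mapping power comes from strains sharing a haplotype, with significance judged against permutations, can be sketched as follows. This is a deliberately simplified illustration and not the authors' pipeline: it assumes a two-allele (0/1) coding per locus instead of the eight CC founder haplotypes, uses a plain difference in strain means as the test statistic, and all names and values are hypothetical.

```python
import random
from statistics import mean


def locus_statistic(phenotypes, genotypes):
    """Absolute difference in mean phenotype between strains coded 1 and strains coded 0."""
    carriers = [p for p, g in zip(phenotypes, genotypes) if g == 1]
    others = [p for p, g in zip(phenotypes, genotypes) if g == 0]
    if not carriers or not others:
        return 0.0
    return abs(mean(carriers) - mean(others))


def permutation_threshold(phenotypes, genotype_matrix, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold: upper-alpha quantile of the per-permutation maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between strain and phenotype
        max_stats.append(max(locus_statistic(shuffled, locus) for locus in genotype_matrix))
    max_stats.sort()
    index = min(int((1 - alpha) * n_perm), n_perm - 1)
    return max_stats[index]


# Hypothetical strain means (one value per strain) and three hypothetical loci.
phenotypes = [5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6]
genotype_matrix = [
    [0] * 7 + [1] * 7,                           # seven strains share the allele tied to the trait
    [0, 1] * 7,                                  # unrelated locus
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0],  # unrelated locus
]
observed = locus_statistic(phenotypes, genotype_matrix[0])
threshold = permutation_threshold(phenotypes, genotype_matrix)
```

Because the maximum is taken across all loci within each permutation, the resulting threshold accounts for testing many loci at once, which is why a locus exceeding it can be called genome-wide significant.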

The argument that Tip8 is a strong predictor of lower bacterial burden is also not convincing based on Figure 3B. First, there is strong overlap in the bacterial CFU between samples 2 and 3, one of which has the Tip8 CAST and the other which does not. Second, the last group which also contains the CAST at Tip8 looks not different from the third one. What would be the reason for that? Overall, these data are not statistically robust. This looks like noise.

The previous Figure 3B compares the F2 offspring of genetically diverse CC strains. The observed variation in susceptibility between individuals in this population is not “noise”, but the result of genetic variation. This is both expected and necessary. In no situation would it be reasonable to expect that the distributions of CFU values between these populations would not “overlap”. Instead, we asked if the effect of the Tip8 locus could be discerned, even in the presence of other genetic variants segregating in the CC. The statistically significant difference identified between these populations represents rigorous support for the Tip8 QTL, akin to a replication cohort in a human GWAS study. To take this one step further, we have since genotyped a total of 261 F2 intercross mice that we infected with Mtb and quantified lung burden. We conducted QTL mapping and validated the chromosome 7 locus as a genome-wide significant QTL underlying lung CFU. We present this further validation in the new Figure 3C and text. We note that this kind of rigorous independent validation of a QTL is almost never performed before publication.

It is also not clear how these new findings relate to their previous ones in Smith et al. 2019 mBio. In the current paper, the authors follow up by infecting a B6/CAST cross by the aerosol route to conclude that the bacterial control of CAST mice is independent of IFNγ-secreting cells. In the previous paper, they perform a different cross, CC001/CC042, to also dissociate bacterial control from IFNγ production. How do the current findings relate to the previous ones? The previous ones are far from understood and it feels as if the additional data are more confusing than clarifying.

On this weak framework, the authors try to determine if the CC mice can reveal the extended requirement for pathogen genes that are not reflected in single mutant strains. This work begins in Figure 4. (First, a minor point, I compute the genes in B6 to be 159+53=212 whereas on page 15, line 275 states that they identified 234 genes.)

We thank the reviewer for noting this mistake. It has been fixed (159+53+1+1 = 214 in Figure 4A).

Beyond this, I have several questions:

1) The authors state that while the pathogen gene sets were different from mouse strain to mouse strain, the number of genes required in each strain was roughly the same. What is their explanation/model for this constant number, given a 3-log difference in bacterial burdens between the strains reflecting enormous differences in susceptibility? How could this model be tested? This is an important point.

This is a very interesting effect that has been noted in previous publications. While it is not unreasonable to expect more genes to be necessary for fitness in stressful situations, this expectation is not supported by experimental data. Previous TnSeq studies find that while different gene sets are necessary for optimal growth/survival in resistant and susceptible animals, the sizes of these gene sets are roughly similar (Mishra, Lovewell et al. 2017). This observation is consistent with in vitro studies, where similarly sized gene sets are necessary for growth in different media conditions (Minato, Gohl et al. 2019). One must conclude that while some genes become more important in resistant animals, others become less so. While it is difficult to speculate further, we have added this concept to the discussion of the revised manuscript (lines 592-599).

The authors report a common set of ~ 130 genes. Are there other common subgroups defined by common pathogen gene sets? These types of analyses are essential to both validating the system and in so doing may give biological information. What about the additional genes in the CC subset over the immunodeficient KO set? What can be learned from them?

Identifying “common subgroups defined by pathogen gene sets” is the sole purpose of Figure 5 and the corresponding 600 words of text (lines 422-481). Validation of the system is confirmed by the identification of known functional pathways in TnSeq-derived gene modules (Figure 5A), and the association of these modules with mouse subgroups (Figure 5B). “Additional genes in the CC subset over the immunodeficient KO set” are presented in Figure 5B, along with functional annotation. While we appreciate that this comment was meant as a constructive suggestion, we do not know how to respond, except to note that this analysis was performed and included.

And beyond this, I do not follow their further analyses and am just not convinced that they are meaningful. There seems to be a lot of statistical massaging with seemingly erudite analyses but with the small number of mice used per genotype, the large error and the glossing over of problems, it is hard to see how this paper represents a significant advance.

I hope that the paper is being co-reviewed by individuals expert in mouse genetics and in statistics or in both. If not, this will be essential.

We can only hope the reviewer will reconsider their expectations. Our study was never designed to rely on single pairwise comparisons between genotypes, as a typical “mouse study” would. Instead, our study exceeds the rigor of many comparable genetic mapping studies by including multiple animals per genotype and including independent validation of an important QTL (Figure 3). Our “statistical massaging and seemingly erudite analyses” represent rigorous analytical strategies that are appropriate for our study design, and as cited, represent standards in the field (Churchill, Airey et al. 2004, Ferris, Aylor et al. 2013, Gralinski, Ferris et al. 2015, Graham, Swarts et al. 2017, Lorè, Sipione et al. 2020, Noll, Whitmore et al. 2020). Our mouse group size was entirely appropriate for our goals. We demonstrate that our data are not prone to “large error”. The implication that we are “glossing over problems” is unwarranted and vague. We trust that the reviewer will defer to other referees with expertise in mouse genetics and statistics to critique these aspects of the work.

Reviewer #2:

This work takes advantage of the genetic diversity of a panel of mice (called the collaborative cross (CC) mice) to identify mice that produce heterogeneous outcomes after M. tuberculosis (Mtb) infection. The hope is to better recapitulate the diversity of outcomes of Mtb infection seen in humans, a current major limitation of the dominant B6 mouse model. While some important human Mtb infection outcomes (most notably, latency) are apparently not captured by the CC mice, there is nevertheless an impressive variation in infection outcomes in the CC panel (>100-fold range in lung CFU burdens, for example). There is also large variation in host responses. For example, CAST mice control infection but are found to lack an IFNγ response thought previously to be critical for Mtb control based on studies in B6 (but known to be similarly absent from some human controllers). The authors also infect the CC mice with pools of transposon mutants, allowing identification of specific host genotypes that confer fitness effects on specific bacterial mutants. The resulting analyses identify loci that affect quantitative immunological phenotypes (TipQTL) or fitness of bacterial mutants (HipQTL). No causative genetic variants underlying these QTL are validated by independently generated mutation, so in some sense, the "discoveries" here are limited. However, the impact of this work is likely to be great as a resource that can be mined by investigators in the field to identify specific mouse strains in which, for example, a specific bacterial mutant has a phenotype, or a particular host response is observed, greatly facilitating followup work.

Given the lack of a major new biological insight or mechanism, perhaps this should be considered as a "Tools and Resources" paper (https://reviewer.elifesciences.org/author-guide/types) rather than a regular research paper.

This is a great idea. We agree completely and have submitted the revised version as a “Tools and Resources” paper.

The use of a high-dose IV model of infection, while necessary for TnSeq experiments, may limit the relevance of the observations to pulmonary TB. This concern is somewhat ameliorated by the fact that the susceptibility of the parent strains largely recapitulates the expected phenotype based on previous aerosol infections. However, for a few key CC mice with interesting phenotypes (e.g., the exacerbated weight loss in CC030 or CC040) it would be nice to validate in a replication experiment and that the phenotype is consistently seen even with aerosol infections. The experiment in Figure 3 could potentially serve as such validation, but it is not clear whether this experiment was using IV or aerosol infection (please specify this clearly in the legend and text regardless).

In response to this comment, as well as those of Reviewer 1, we have extensively revised Figure 1 to focus on experimental replication in an aerosol model and increase the value of the larger dataset. The new validation data using an aerosol infection model are presented in Figure 1 E-H. The legend to Figure 3 was edited as requested.

Line 143: please specify in the text the dose (CFU) of bacteria that were delivered IV. This is important information and should be in the main text.

This has been added (line 143 in main text; also expanded in methods sections, lines 701-708).

Line 159: "Thus, the CC panel encompasses a much greater quantitative range of susceptibility than standard inbred lines." – this is not obviously true. The range of CFU burdens seen from NOD to WSB appears to capture most of the range of the CC panel. Perhaps change the wording here?

Absolutely correct. The text has been edited (lines 151-155).

Line 208: is it really "none", as in not a single IFNγ+ T cell? The bar graph suggests that there is some, so perhaps revise this sentence.

These data were removed in response to Reviewer 1.

Line 251: "mice that contained CAST at Tip8, Tip10 or both loci had reduced CFU burden". It appears from the data in Figure 3 that in fact the only statistically significant difference is for the Tip8 locus, and that the presence of the CAST Tip10 allele abolishes this difference. If that is correct, the manuscript needs to be clear that in fact the Tip10 locus did not validate (statistically).

This was a very helpful comment. To build on this validation study of lung CFU, we have now generated a total of 251 F2 intercross mice between informative CC strains, genotyped them, infected them with Mtb, quantified lung burden at one month post-infection, and performed genome-wide QTL mapping. In this rigorous validation study, we identified Tip8 as significantly associated with lung CFU, thus validating the Chr7 locus as the main QTL. We expand discussion of Tip10 as suggested (lines 255-271).

Line 274: "Consistent with our previous work, we identified 234 Mtb genes that are required for growth or survival in Mtb in B6 mice". In fact, how consistent are the data? Of the 234 genes identified, how many were previously identified? How many of the previously identified genes were in the set of 234? Please specify so that the readers can get a sense of the experiment-to-experiment variability of the approach.

The most comparable previous dataset was generated in BALB/c mice. The text now reports that 87% of the genes we report in B6 were also found in BALB/c, and nearly 100% of these genes were consistently identified in the larger CC panel (lines 290-296). These observations support both the sensitivity and specificity of our TnSeq analysis.

Why would more Mtb genes be essential in Rag1-/- mice as compared to WT mice? This seems unexpected to me and perhaps deserves some comment? What are these genes – could this be discussed at least a little? Line 340 mentions that ESX genes are among those preferentially required in Rag1-/- (though obviously they are also required in B6, so presumably this does not account for the 'additional' genes required in Rag1-/-). Could the authors elaborate a bit more on what they think is going on here?

Reviewer 1 also requested additional discussion of gene set sizes and functional insights. While we hesitate to speculate in too much detail, additional details are now included in the discussion (lines 592-599).

Line 281: "As more CC strains, and presumably more distinct immune states, were included in the analysis, the cumulative number of genes necessary for growth in these animals also increased". It is possible that repeating the experiment with more and more B6 animals would also increase the cumulative number of genes necessary? In other words, might the sensitivity of the TnSeq method be what is limiting here as opposed to host genetic diversity? I presume the authors have a good argument against this concern; it would be helpful for readers to hear it.

This was a very helpful suggestion. The revised manuscript includes a study verifying that the inclusion of additional B6 libraries does not substantially increase the estimate of genes necessary for growth. For this study, we randomly paired six replicate TnSeq libraries from B6 mice at 4 weeks post-infection into three runs, and identified the genes required for growth in each run and cumulatively across the runs. The process was repeated 10 times. The results, shown in Figure4-FigureSupplement 1, indicate that a single “run”, which is equivalent to the data generated from each genotype in our larger study, was sufficient to identify the majority of genes, and that adding further datasets did not substantially increase the cumulative gene count.
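For illustration only, the resampling logic described in this response can be sketched as follows. This is not the analysis code used in the study: the function names, the toy gene sets, and in particular the rule used to call a gene "required" in a run (flagged in both paired libraries) are assumptions made to keep the example self-contained.

```python
import random

def call_required(library_pair):
    """Hypothetical stand-in for the real TnSeq call: a gene counts as
    required in a run only if it is flagged in both paired libraries."""
    first, second = library_pair
    return first & second

def resample_runs(libraries, n_repeats=10, seed=0):
    """Randomly pair the libraries into runs and record, per repeat,
    the gene count of each run and the cumulative count across runs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        shuffled = libraries[:]
        rng.shuffle(shuffled)
        # Consecutive pairs of the shuffled list form the runs.
        runs = [call_required((shuffled[i], shuffled[i + 1]))
                for i in range(0, len(shuffled) - 1, 2)]
        cumulative = set().union(*runs)
        results.append({
            "per_run": [len(r) for r in runs],
            "cumulative": len(cumulative),
        })
    return results

# Toy data (invented): six libraries sharing a core of required genes,
# each with one library-specific spurious call.
core = {"geneA", "geneB", "geneC"}
toy_libraries = [core | {f"noise{i}"} for i in range(6)]
summary = resample_runs(toy_libraries)
```

If the cumulative count stays close to the per-run counts across repeats, as in this toy case, adding libraries is not uncovering many new genes.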

Line 295: "the resulting mini-pool was subjected to in vivo selection in the same manner as the TnSeq study" – does this mean that the minipool was used to infect ALL the same mouse strains (all the CC mice) as in Figure 4A? This isn't clear. A related concern regarding Figure 4C: why are selected mouse strains shown in this figure? Are they the strains that showed the biggest signal in the TnSeq experiment? The worry is they were cherry picked. This could be explained better.

A small focused pool of bacterial mutants was used to infect a subset of CC mice, which were chosen based on TnSeq data. Thus, this experiment was designed to test a number of individual observations from the TnSeq dataset, and these are shown in the figure. We have added additional explanation of our rationale (lines 309-319). Importantly, we note that, “data from all reliably-detected strains is shown in Figure 4C” to alleviate the reviewer’s concern that the presented data were “cherry picked”.

Line 298: "relative abundance predicted by TnSeq was reproduced with deletion mutants" – this is difficult to evaluate because the fitness of each of the mutants in each of the strains in Figure 4A is not stated. I realize the full dataset of all the TnSeq data is available as a spreadsheet but it would be nice if the reader didn't have to dig through that. Ideally it would be great to create a searchable website in which investigators could enter their gene of interest and the strains in which the mutant shows a phenotype (and the magnitude of the phenotype) could be displayed.

Both the TnSeq data and the corresponding data from the minipool infection are provided in Figure 4C, so it should not be necessary for the reader to refer to the supplementary tables. The suggested searchable database including these data and many other publicly available datasets has been constructed, and will be reported in a separate publication.

Line 312: again, it is not clear what the TnSeq predicted. Please state that result here too.

Done (lines 342-344).

Line 435: which HipQTL are associated with both ahpC and eccD1? Please specify. I don't see this information in the cited figure either (Figure 4E). Please make sure this is correct.

Done (lines 457-458).

Reviewer #4:

I reviewed this study only to evaluate the robustness of the results and the statistical evaluations used in the study. No attention was paid to novelty or scientific interest. Some of the statistical approaches used by the authors were beyond my expertise and require further analysis by a professional statistician to which I lay no claim to being. This study used multiple simultaneous comparisons of both animal strains and bacterial strains, with what appears to be generally quite small sample size, non-replicated experiments.

The revised manuscript contains independent validation of the susceptibility and cytokine production traits in Figure 1 (including Figure1-SourceData2), independent validation of an important QTL in Figure 3 (including Figure3-SourceData1 and Figure3-SourceData2), and independent validation of the TnSeq predictions in Figure 4 (including Figure4-SourceData2). In all cases, our larger datasets have proven to be robust and reproducible.

Data files that are presented either omit entirely some experiments or provide only summary data of mean values without indicating numbers of subjects, measures of variability such as standard deviation; no information on the results of replicate studies, if any were performed, are in these data files. Therefore I find it impossible to determine if many of the "significant" associations are robust.

Several of the reviewer’s specific examples were quite helpful, and we respond to each below.

1) Data for Figure 1. The data set for figure 1 includes only summary data. There is no indication of the number of animals studied in each group and no indication of standard deviation. Also, there is no indication in the data set or in the text if any of the experimental results for 1A to 1D were based on replicate studies. The underlying data for figures E to H are not available in the figure 1 data file; the text gives the range of subject sizes and states that two independent experiments were performed, but it is unclear if the data presented in the figures represent both experiments or just one experiment and, if both, whether the results were simply combined or some normalization was performed. The statistical analysis method does not say whether the data were log-transformed before analysis. All of the authors' conclusions may be absolutely correct but it is difficult to know this without more detailed information.

The reviewer highlights a significant oversight – the lack of details in our supplementary table. This has been remedied. We now provide the group size, mean, and standard deviation for each measurement. We have deposited all raw data in our Sassetti Lab GitHub repository, which is publicly available. The other comments refer to data that has been removed in response to Reviewer 1.

2) Data for Figure 4. The data set for this figure only includes the underlying data for 4A. No underlying data are given for 4C, D and E. No indication is given of the number of repeated experiments, and if these were performed what the results were. No method is given for the statistical analysis used in 4D. The authors conclude that there are significant differences between mouse strains for several Mtb strains, but give no information on the numbers of mice studied, variation within groups, repeat experiments, etc; also no statistical analysis is performed. Again, the author's conclusions may be absolutely correct but it is difficult to know this without seeing the underlying data.

The underlying data for 4A were all included in the previous submission (Figure4-SourceData1). This is also the source data for Figure 4E, as mentioned in the figure legend (line 967). For 4C, all mouse numbers were in the legend (line 955), and we additionally include a source data file of counts for each Mtb mutant in individual mice (Figure4-SourceData2). For 4D, the number of mice was in the legend (line 960), and we have added text to specify the statistical testing conducted between the lung burden of B6 and CAST mice at each specified timepoint (lines 960-961).

While the authors may believe that their answers to the transparent reporting form were complete and accurate, based on what I have found I'm not certain that this is correct.

We verify that the revised manuscript complies with the transparent reporting requirements.

Reviewer #5:

Sample size and reproducibility. The groups were small (average of 3 mice, sometimes only 2), which could result in a high variance per group. It would be important to evaluate how reproducible are the phenotypic measurements per strain. The authors mention reproducibility per genotype (Page 8 line 177), but they do not clearly show it.

The reviewer refers to our heritability analysis, which was provided in Figure1-FigureSupp3. Heritability is the fraction of variation attributable to between-strain differences relative to within-strain noise. Explicitly, this is SS(strain)/SS(total) in an ANOVA table, where SS denotes the sum of squares and SS(total) = SS(strain) + SS(error). Our heritability analysis shows that >75% of variation could be attributed to genetic differences for the majority of traits. In the revised manuscript, we also provide p-values for each trait, showing that genotype has a significant effect on all traits.
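As a minimal sketch of the ratio defined above, the following computes SS(strain)/SS(total) from a one-way layout of measurements grouped by strain. The function name and the toy values are invented for illustration and are not taken from the study data.

```python
def heritability(groups):
    """Fraction of total variation attributable to strain:
    SS(strain) / (SS(strain) + SS(error)) from a one-way ANOVA.

    groups: dict mapping strain name -> list of measurements.
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_strain = 0.0
    ss_error = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        # Between-strain component: group size times squared mean offset.
        ss_strain += len(vals) * (mean - grand_mean) ** 2
        # Within-strain component: squared deviations from the group mean.
        ss_error += sum((v - mean) ** 2 for v in vals)
    return ss_strain / (ss_strain + ss_error)

# Toy log10 CFU values (invented) for three hypothetical strains.
toy = {
    "strainX": [5.0, 5.2, 4.8],
    "strainY": [7.0, 7.1, 6.9],
    "strainZ": [6.0, 6.3],
}
h2 = heritability(toy)
```

Because the toy strains differ far more between groups than within them, the resulting fraction is close to 1. Unequal group sizes, as with small surviving cohorts, are handled by weighting each strain by its own count.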

In addition, the revised manuscript includes group size, mean, and standard deviation for each trait and genotype in Figure1-SourceData1, validation of susceptibility and cytokine abundance traits in Figure 1 (Figure 1 E-H and Figure1-SourceData2), validation of an important QTL in Figure 3C (Figure3-SourceData1 and 2), and validation of TnSeq measurements in Figure 4 (Figure4-SourceData2). This provides a transparent assessment of technical variability, and demonstrates the reproducibility of our dataset.

Perhaps a dimension reduction approach (PCA or UMAP) could be used to evaluate the distance between replicates.

This is a very interesting idea, and we appreciate the suggestion. However, it is difficult to envision how PCA or UMAP could produce a quantitative assessment of genotype-dependent variation that is more robust than the heritability analysis we provide in Table S2.

As specific examples: How many animals were used per group in outlier strains (Figure 1 C and D)? What was the variance for the y-axis measurements? Could variance explain the outlier nature of the strain?

Outliers were identified by linear regression using studentized residuals, which accounts for intra-genotype variation. This is now explained more clearly in the main text (lines 179-183) and figure legend.
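A minimal sketch of outlier detection with studentized residuals is given below for readers unfamiliar with the approach. It assumes a simple linear regression of one trait on another, uses internally studentized residuals, and applies a cutoff of 2. The cutoff, the function names, and the toy data are assumptions, and the sketch does not reproduce how intra-genotype variation was handled in the manuscript.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals from a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    out = []
    for xi, e in zip(x, residuals):
        # Leverage of each point in a one-predictor regression.
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        out.append(e / (s * math.sqrt(1.0 - leverage)))
    return out

def flag_outliers(labels, x, y, threshold=2.0):
    """Return labels whose |studentized residual| exceeds the assumed threshold."""
    r = studentized_residuals(x, y)
    return [lab for lab, ri in zip(labels, r) if abs(ri) > threshold]

# Toy data (invented): eight hypothetical strains, one off the trend line.
toy_labels = ["a", "b", "c", "d", "e", "f", "g", "h"]
toy_x = [1, 2, 3, 4, 5, 6, 7, 8]
toy_y = [1.1, 2.0, 2.9, 4.1, 5.0, 9.0, 7.1, 8.0]
flagged = flag_outliers(toy_labels, toy_x, toy_y)
```

Scaling each residual by its own standard error means a point is judged against the scatter of the whole fit rather than by its raw distance from the line.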

QTL analysis. The CC QTL analysis faces similar challenges to genetic studies in humans. Overlooking complex population structure can lead to inflated false positives but can also prevent harvesting the full potential of the samples. Limitations of the QTL analysis are briefly introduced (Page 11 line 241) but the explanation in methods is convoluted and difficult to follow. The study could benefit from using their SNPs data for genome-wide association testing using all strains correcting by population structure. This strategy was shown to be robust when performed to a study of similar framework using different inbred strains and their crosses (PMID 32365090). This method could address sample size and QTL as the animals would be treated as unique samples (not per group) correcting by population structure (similarities between samples in the dataset). If correct, this would be a more straightforward genome-wide analysis.

We appreciate the reviewer’s constructive suggestions. While the reviewer is entirely correct that in certain situations (such as the cited study, PMID 32365090), direct SNP-based approaches can allow for improved resolution of causal variants, we would argue that in many cases these SNP approaches would in fact further cloud the analyses.

The field standard for CC studies (Aylor, Valdar et al. 2011, Ferris, Aylor et al. 2013, Gralinski, Ferris et al. 2015, Keele, Crouse et al. 2019) has been haplotype-based genetic mapping, which also includes a kinship (or overall relatedness) correction (Churchill, Gatti et al. 2012, Broman, Gatti et al. 2019). This is largely because, in a finite population such as the CC, the strain distribution patterns of single SNPs can lead to false associations due to long-range linkage disequilibrium (Aylor, Valdar et al. 2011). In the cited case, a simple monogenic trait (the white spot) was assessed, and the problem should only be exacerbated for multi-genic traits.

Furthermore, we have strong evidence presented here that several of our discovered QTL are due to multiple alleles (i.e. we find haplotypes associated with high, medium and low phenotypes). Members of our team have shown this causally before (Ferris, Aylor et al. 2013). As any given SNP cannot fully capture 3 (or more) states, the field has acknowledged that haplotype mapping is the state of the art.

We agree that in the future, designs like the DO studies (Chick, Munger et al. 2016, Keele, Zhang et al. 2021) or F2 crosses (Smith, Proulx et al. 2019) will facilitate improved SNP (or copy + structural variant) mapping, and we look forward to these.

Detailed comments:

In methods, it is stated that TipQTL significance was assessed using permutations. However, in the Results section and in table 1 it is not clearly stated if the p values are raw or corrected via the permutation.

This has been corrected.

Line 63: Co-existence of Mtb and Homo sapiens is now widely accepted, perhaps 2000-6000 years. Has co-evolution been demonstrated or is this hypothesized? Same point with the role of genetic variation presented as established.

Additional citations have been added (lines 63-64).

When describing correlations, suggest a standard terminology: Strong, Moderate, Weak. Highly correlated is of uncertain meaning. On line 182 the R2 was 0.57 but the text said correlated. Some use the following: 0-0.19 = very weak, 0.2-0.39 = weak, 0.40-0.59 = moderate, 0.6-0.79 = strong, 0.8-1 = very strong correlation

This is an excellent idea, and we edited the text accordingly.
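The bands proposed in the reviewer's comment can be written as a small lookup, sketched below. The function name is hypothetical. Applying the bands to the absolute value of the coefficient and closing the gaps between bands (for example between 0.19 and 0.2) with half-open intervals are assumptions of this sketch.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the bands quoted by the reviewer."""
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("correlation magnitude cannot exceed 1")
    if magnitude < 0.2:
        return "very weak"
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.6:
        return "moderate"
    if magnitude < 0.8:
        return "strong"
    return "very strong"

# The value of 0.57 mentioned in the comment falls in the "moderate" band.
label = correlation_strength(0.57)
```

A fixed lookup of this kind makes the wording reproducible, although the choice of band edges remains a convention rather than a statistical result.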

Line 151. What does it mean when the susceptibility is largely consistent with other studies. Can the authors spell out the inconsistencies somehow?

This text was edited to be more explicit (lines 151-155).

Line 220. What is reasonable statistical confidence?

The text was edited to remove this confusing reference.

Line 272. At some point, it would be useful to read in the text that the combined studies of WT Bl6, RAG-/-, IFN-/-, Nos2-/- and CYbb-/- mice produced x essential genes, and that the CC raised it from x to y. The paragraph tells us the 234 genes were identified in Bl6 and that the plateau was ~750. How far from 234 to 750 was reached with the 4 specified knockout mice?

The requested statement was added (lines 302-306).

Line 321. I did not understand what was complex about the kinetics. Perhaps this can be clarified?

This text was clarified (lines 342-344).

References:

Aylor, D. L., W. Valdar, W. Foulds-Mathes, R. J. Buus, R. A. Verdugo, R. S. Baric, M. T. Ferris, J. A. Frelinger, M. Heise, M. B. Frieman, L. E. Gralinski, T. A. Bell, J. D. Didion, K. Hua, D. L. Nehrenberg, C. L. Powell, J. Steigerwalt, Y. Xie, S. N. Kelada, F. S. Collins, I. V. Yang, D. A. Schwartz, L. A. Branstetter, E. J. Chesler, D. R. Miller, J. Spence, E. Y. Liu, L. McMillan, A. Sarkar, J. Wang, W. Wang, Q. Zhang, K. W. Broman, R. Korstanje, C. Durrant, R. Mott, F. A. Iraqi, D. Pomp, D. Threadgill, F. P. de Villena and G. A. Churchill (2011). "Genetic analysis of complex traits in the emerging Collaborative Cross." Genome Res 21(8): 1213-1222.

Broman, K. W., D. M. Gatti, P. Simecek, N. A. Furlotte, P. Prins, Ś. Sen, B. S. Yandell and G. A. Churchill (2019). "R/qtl2: Software for Mapping Quantitative Trait Loci with High-Dimensional Data and Multiparent Populations." Genetics 211(2): 495-502.

Chick, J. M., S. C. Munger, P. Simecek, E. L. Huttlin, K. Choi, D. M. Gatti, N. Raghupathy, K. L. Svenson, G. A. Churchill and S. P. Gygi (2016). "Defining the consequences of genetic variation on a proteome-wide scale." Nature 534(7608): 500-505.

Churchill, G. A., D. C. Airey, H. Allayee, J. M. Angel, A. D. Attie, J. Beatty, W. D. Beavis, J. K. Belknap, B. Bennett, W. Berrettini, A. Bleich, M. Bogue, K. W. Broman, K. J. Buck, E. Buckler, M. Burmeister, E. J. Chesler, J. M. Cheverud, S. Clapcote, M. N. Cook, R. D. Cox, J. C. Crabbe, W. E. Crusio, A. Darvasi, C. F. Deschepper, R. W. Doerge, C. R. Farber, J. Forejt, D. Gaile, S. J. Garlow, H. Geiger, H. Gershenfeld, T. Gordon, J. Gu, W. Gu, G. de Haan, N. L. Hayes, C. Heller, H. Himmelbauer, R. Hitzemann, K. Hunter, H. C. Hsu, F. A. Iraqi, B. Ivandic, H. J. Jacob, R. C. Jansen, K. J. Jepsen, D. K. Johnson, T. E. Johnson, G. Kempermann, C. Kendziorski, M. Kotb, R. F. Kooy, B. Llamas, F. Lammert, J. M. Lassalle, P. R. Lowenstein, L. Lu, A. Lusis, K. F. Manly, R. Marcucio, D. Matthews, J. F. Medrano, D. R. Miller, G. Mittleman, B. A. Mock, J. S. Mogil, X. Montagutelli, G. Morahan, D. G. Morris, R. Mott, J. H. Nadeau, H. Nagase, R. S. Nowakowski, B. F. O'Hara, A. V. Osadchuk, G. P. Page, B. Paigen, K. Paigen, A. A. Palmer, H. J. Pan, L. Peltonen-Palotie, J. Peirce, D. Pomp, M. Pravenec, D. R. Prows, Z. Qi, R. H. Reeves, J. Roder, G. D. Rosen, E. E. Schadt, L. C. Schalkwyk, Z. Seltzer, K. Shimomura, S. Shou, M. J. Sillanpää, L. D. Siracusa, H. W. Snoeck, J. L. Spearow, K. Svenson, L. M. Tarantino, D. Threadgill, L. A. Toth, W. Valdar, F. P. de Villena, C. Warden, S. Whatley, R. W. Williams, T. Wiltshire, N. Yi, D. Zhang, M. Zhang and F. Zou (2004). "The Collaborative Cross, a community resource for the genetic analysis of complex traits." Nat Genet 36(11): 1133-1137.

Churchill, G. A., D. M. Gatti, S. C. Munger and K. L. Svenson (2012). "The Diversity Outbred mouse population." Mamm Genome 23(9-10): 713-718.

Ferris, M. T., D. L. Aylor, D. Bottomly, A. C. Whitmore, L. D. Aicher, T. A. Bell, B. Bradel-Tretheway, J. T. Bryan, R. J. Buus, L. E. Gralinski, B. L. Haagmans, L. McMillan, D. R. Miller, E. Rosenzweig, W. Valdar, J. Wang, G. A. Churchill, D. W. Threadgill, S. K. McWeeney, M. G. Katze, F. Pardo-Manuel de Villena, R. S. Baric and M. T. Heise (2013). "Modeling host genetic regulation of influenza pathogenesis in the collaborative cross." PLoS Pathog 9(2): e1003196.

Graham, J. B., J. L. Swarts, M. Mooney, G. Choonoo, S. Jeng, D. R. Miller, M. T. Ferris, S. McWeeney and J. M. Lund (2017). "Extensive Homeostatic T Cell Phenotypic Variation within the Collaborative Cross." Cell Rep 21(8): 2313-2325.

Gralinski, L. E., M. T. Ferris, D. L. Aylor, A. C. Whitmore, R. Green, M. B. Frieman, D. Deming, V. D. Menachery, D. R. Miller, R. J. Buus, T. A. Bell, G. A. Churchill, D. W. Threadgill, M. G. Katze, L. McMillan, W. Valdar, M. T. Heise, F. Pardo-Manuel de Villena and R. S. Baric (2015). "Genome Wide Identification of SARS-CoV Susceptibility Loci Using the Collaborative Cross." PLoS Genet 11(10): e1005504.

Keele, G. R., W. L. Crouse, S. N. P. Kelada and W. Valdar (2019). "Determinants of QTL Mapping Power in the Realized Collaborative Cross." G3 (Bethesda) 9(5): 1707-1727.

Keele, G. R., T. Zhang, D. T. Pham, M. Vincent, T. A. Bell, P. Hock, G. D. Shaw, J. A. Paulo, S. C. Munger, F. Pardo-Manuel de Villena, M. T. Ferris, S. P. Gygi and G. A. Churchill (2021). "Regulation of protein abundance in genetically diverse mouse populations." Cell Genomics: 100003.

Lorè, N. I., B. Sipione, G. He, L. J. Strug, H. J. Atamni, A. Dorman, R. Mott, F. A. Iraqi and A. Bragonzi (2020). "Collaborative Cross Mice Yield Genetic Modifiers for Pseudomonas aeruginosa Infection in Human Lung Disease." mBio 11(2).

Minato, Y., D. M. Gohl, J. M. Thiede, J. M. Chacon, W. R. Harcombe, F. Maruyama and A. D. Baughn (2019). "Genomewide Assessment of Mycobacterium tuberculosis Conditionally Essential Metabolic Pathways." mSystems 4(4).

Mishra, B. B., R. R. Lovewell, A. J. Olive, G. Zhang, W. Wang, E. Eugenin, C. M. Smith, J. Y. Phuah, J. E. Long, M. L. Dubuke, S. G. Palace, J. D. Goguen, R. E. Baker, S. Nambi, R. Mishra, M. G. Booty, C. E. Baer, S. A. Shaffer, V. Dartois, B. A. McCormick, X. Chen and C. M. Sassetti (2017). "Nitric oxide prevents a pathogen-permissive granulocytic inflammation during tuberculosis." Nat Microbiol 2: 17072.

Noll, K. E., A. C. Whitmore, A. West, M. K. McCarthy, C. R. Morrison, K. S. Plante, B. K. Hampton, H. Kollmus, C. Pilzner, S. R. Leist, L. E. Gralinski, V. D. Menachery, A. Schäfer, D. Miller, G. Shaw, M. Mooney, S. McWeeney, F. Pardo-Manuel de Villena, K. Schughart, T. E. Morrison, R. S. Baric, M. T. Ferris and M. T. Heise (2020). "Complex Genetic Architecture Underlies Regulation of Influenza-A-Virus-Specific Antibody Responses in the Collaborative Cross." Cell Rep 31(4): 107587.

Sakai, S., K. D. Kauffman, M. A. Sallin, A. H. Sharpe, H. A. Young, V. V. Ganusov and D. L. Barber (2016). "CD4 T Cell-Derived IFN-γ Plays a Minimal Role in Control of Pulmonary Mycobacterium tuberculosis Infection and Must Be Actively Repressed by PD-1 to Prevent Lethal Disease." PLoS Pathog 12(5): e1005667.

Smith, C. M., M. K. Proulx, R. Lai, M. C. Kiritsy, T. A. Bell, P. Hock, F. Pardo-Manuel de Villena, M. T. Ferris, R. E. Baker, S. M. Behar and C. M. Sassetti (2019). "Functionally Overlapping Variants Control Tuberculosis Susceptibility in Collaborative Cross Mice." mBio 10(6).

[Editors’ note: what follows is the authors’ response to the second round of review.]

Reviewers and I concur that your manuscript provides value to the field as a "Tools and Resource" paper for generating new hypotheses. The genotypic diversity explored in the study, together with an assessment of disease pathology/outcomes, provides a useful framework to pose important questions that are currently topical for tuberculosis. That said, it would be important to frame the study as a resource rather than trying to probe specific questions with the current dataset, which can be limited by the depth of data available currently in the work. I encourage you to consider carefully how best to present this to the field in a manner that acknowledges the limitations but also creates the space for innovative and explorative future work.

We thank the reviewers and editor for their review of our work. To make the paper more useful as a “Tools and Resources” paper, we have added text throughout the manuscript highlighting the specific Tools and Resources aspects of each section, detailed the limitations, and discussed the avenues for follow-up and new lines of enquiry that this work has opened.

To facilitate this, please address the following concerns:

1. For a paper reporting a potentially useful resource, it is imperative that all data be publicly available for further interrogation and exploration. Reviewers expressed concern that this is not the case with your submission. For the GEO link, data appear available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164156. However, for the GitHub link, the Smith repository was not located. There was only one repository (Bellerose): https://github.com/sassettilab?tab=repositories. Please address this.

This GitHub repo has been made public.

2. In lines 145-147, it is stated: Groups of 3-6 male mice per genotype were infected and TB disease-related traits were quantified at one-month post-infection. Data from all surviving animals are provided in Figure 1-sourcedata1. However, the source data file indicates that the N is 1-3 for most of the data, except for two strains, Bl6 and CC010, which are reported to have 6 mice. There are asterisks for significance only in a minority of the bars (~15) in Figure 1A. What about the remainder? What n's did they have? Too few to determine significance? This should be clarified.

We clarified this after the previous review to indicate that while at least 3 mice of each genotype were infected, some individual mice within a group died early and hence were not phenotyped. To make this even more explicit, we have added a column to Figure1-SourceData1_CCTB_Phenotypes stating “N of mice infected” and have renamed the “N” column to “N of surviving phenotyped animals”.

3. Explain why an R value of 0.43 would be considered a moderate correlation, when elsewhere you suggest that there could be biological differences that may make this correlation weak. Similarly, it is often the case that in inflammatory processes, TNF and IFNγ track together, even if made by different cells, which is not even strictly the case. These are two examples of continued disagreement. Please reconsider carefully how you present these data.

This standardized nomenclature is defined in the text and was used in response to the previous review. Since the interpretation of this terminology is subjective, we use a systematic nomenclature.

4. Line 184-188. The "significant" associations described in this section are for non-replicate studies with N=2 (CC30, 40 and A/J). Perhaps reconsider the use of "significant" in this context. In Line 253, this experiment used a large N and hence likely produced robust results.

The word “significant” has been removed.

5. Figure 1 A. Giving results of an ANOVA with post-hoc analysis comparison of means to a single mouse strain (C57/BL6) is difficult to interpret because of the potential bias from post-hoc decisions about which strain serves as a comparator. For example, using AJ as a comparator shows no significant differences with other strains. For this reason, excluding this analysis is warranted.

C57BL/6J was used as the comparator because it is the standard mouse strain used in the TB field. For a Tools and Resources paper, comparing this new mouse panel to the accepted standard strain, C57BL/6J, is the comparison most researchers would immediately look for (and have enquired about since the preprint), and is thus the most beneficial for researchers in the field. A sentence has been added to the Figure 1 legend to explicitly point researchers to the most useful comparisons.

6. Line 199: "A number of genotypes were able to control bacterial replication yet had very low levels of IFNγ (Figure 1D)" -- for the benefit of the reader, please specify which genotypes fall into this category. Ideally this would be specifically mentioned in the text and in addition the genotypes of interest would be labelled in the figure. Similarly for Figure 1C, it would be helpful if the significant "outlier" genotypes were labelled in the figure so the reader can know which these are.

These genotypes have been specified again in the text and figure legend.

7. Line 487: “Two TipQTL overlapped with HipQTL (Figure 7)…”. Please specify in the text which overlapping QTL are being referred to here. It seems evident from the figure but specifying would assist the reader here.

This has been specified in the text (line 492 and highlighted in the marked up manuscript).

8. Some of the type on the figures is really tiny, e.g., 1A, 1B, most of figure 4, parts of figure 6, etc. Enlarge the panels with the tiniest type so that they are more readable.

All figures have been enlarged to the maximum permissible size.

https://doi.org/10.7554/eLife.74419.sa2

Article and author information

Author details

  1. Clare M Smith

    1. Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    2. Department of Molecular Genetics and Microbiology, Duke University, Durham, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing – original draft
    For correspondence
    clare.m.smith@duke.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-2601-0955
  2. Richard E Baker

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Data curation, Formal analysis, Methodology, Visualization, Writing – original draft
    Competing interests
    No competing interests declared
  3. Megan K Proulx

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Formal analysis, Investigation, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-9524-8302
  4. Bibhuti B Mishra

    1. Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    2. Department of Immunology and Microbial Disease, Albany Medical College, Albany, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-7203-1653
  5. Jarukit E Long

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Sae Woong Park

    Department of Microbiology and Immunology, Weill Cornell Medical College, New York, United States
    Contribution
    Investigation, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-4649-4566
  7. Ha-Na Lee

    Department of Microbiology and Immunology, Weill Cornell Medical College, New York, United States
    Contribution
    Investigation, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-4136-0128
  8. Michael C Kiritsy

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Investigation, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-8364-8088
  9. Michelle M Bellerose

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-0232-9953
  10. Andrew J Olive

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-3441-3113
  11. Kenan C Murphy

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Methodology, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-3677-2876
  12. Kadamba Papavinasasundaram

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Supervision, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-7837-1344
  13. Frederick J Boehm

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-1644-5931
  14. Charlotte J Reames

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-5579-5881
  15. Rachel K Meade

    1. Department of Molecular Genetics and Microbiology, Duke University, Durham, United States
    2. University Program in Genetics and Genomics, Duke University, Durham, United States
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-8322-0257
  16. Brea K Hampton

    1. Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
    2. Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-7167-5652
  17. Colton L Linnertz

    Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-2969-8193
  18. Ginger D Shaw

    1. Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
    2. Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-2590-4973
  19. Pablo Hock

    Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  20. Timothy A Bell

    Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-9546-6334
  21. Sabine Ehrt

    Department of Microbiology and Immunology, Weill Cornell Medical College, New York, United States
    Contribution
    Supervision, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-7951-2310
  22. Dirk Schnappinger

    Department of Microbiology and Immunology, Weill Cornell Medical College, New York, United States
    Contribution
    Supervision, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
  23. Fernando Pardo-Manuel de Villena

    1. Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
    2. Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Funding acquisition, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-5738-5795
  24. Martin T Ferris

    Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, United States
    Contribution
    Funding acquisition, Investigation, Methodology, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-1241-6268
  25. Thomas R Ioerger

    Department of Computer Science and Engineering, Texas A&M University, College Station, United States
    Contribution
    Investigation, Methodology, Resources, Supervision, Writing – original draft
    Competing interests
    No competing interests declared
  26. Christopher M Sassetti

    Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, United States
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Project administration, Supervision, Writing – original draft
    For correspondence
    christopher.sassetti@umassmed.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-6178-4329

Funding

National Institute of Allergy and Infectious Diseases (AI132130)

  • Fernando Pardo-Manuel de Villena
  • Christopher M Sassetti

National Institute of Allergy and Infectious Diseases (U19AI100625)

  • Fernando Pardo-Manuel de Villena
  • Martin T Ferris

Howard Hughes Medical Institute (A20-0146)

  • Brea K Hampton

National Human Genome Research Institute (U24HG010100)

  • Fernando Pardo-Manuel de Villena

Bank of America (Charles H King Postdoctoral Fellowship)

  • Clare M Smith

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank all members of the Sassetti Lab, past and present, for technical help and discussions; Dr. Nathan Hicks and Dr. Sarah Fortune for kindly providing the rnaseJ mutant; Dr. David Tobin for insightful manuscript comments; Dr. Dennis Ko for QTL acronym creativity; and the Systems Genetic Core at UNC for their help in procuring CC mice in a timely fashion. This work was supported by NIH grants AI132130 to C M Sassetti and FPMV; U19AI100625 to FPMV and MF; a fellowship from the Charles H King Foundation to C M Smith; and an HHMI Gilliam Fellowship A20-0146 to BKH. The genetic characterization of the CC strains was supported in part by NIH grant U24HG010100 to FPMV.

Ethics

Mouse studies were performed in strict accordance with the recommendations from the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health and the Office of Laboratory Animal Welfare. Mouse studies at the University of Massachusetts Medical School (UMASS) were performed using protocols approved by the UMASS Institutional Animal Care and Use Committee (IACUC) (Animal Welfare Assurance Number A3306-01) in a manner designed to minimize pain and suffering in Mtb-infected animals. Any animal that exhibited severe disease signs was immediately euthanized in accordance with IACUC-approved endpoints. All mouse studies at UNC (Animal Welfare Assurance #A3410-01) were performed using protocols approved by the UNC Institutional Animal Care and Use Committee (IACUC).

Senior and Reviewing Editor

  1. Bavesh D Kana, University of the Witwatersrand, South Africa

Publication history

  1. Preprint posted: December 1, 2020
  2. Received: October 4, 2021
  3. Accepted: January 27, 2022
  4. Accepted Manuscript published: February 3, 2022 (version 1)
  5. Version of Record published: February 15, 2022 (version 2)

Copyright

© 2022, Smith et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Clare M Smith
  2. Richard E Baker
  3. Megan K Proulx
  4. Bibhuti B Mishra
  5. Jarukit E Long
  6. Sae Woong Park
  7. Ha-Na Lee
  8. Michael C Kiritsy
  9. Michelle M Bellerose
  10. Andrew J Olive
  11. Kenan C Murphy
  12. Kadamba Papavinasasundaram
  13. Frederick J Boehm
  14. Charlotte J Reames
  15. Rachel K Meade
  16. Brea K Hampton
  17. Colton L Linnertz
  18. Ginger D Shaw
  19. Pablo Hock
  20. Timothy A Bell
  21. Sabine Ehrt
  22. Dirk Schnappinger
  23. Fernando Pardo-Manuel de Villena
  24. Martin T Ferris
  25. Thomas R Ioerger
  26. Christopher M Sassetti
(2022)
Host-pathogen genetic interactions underlie tuberculosis susceptibility in genetically diverse mice
eLife 11:e74419.
https://doi.org/10.7554/eLife.74419
