Interactions between strains govern the eco-evolutionary dynamics of microbial communities

Abstract
Editor's evaluation
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Genomic data has revealed that genotypic variants of the same species, that is, strains, coexist and are abundant in natural microbial communities. However, it is not clear if strains are ecologically equivalent, and at what characteristic genetic distance they might exhibit distinct interactions and dynamics. Here, we address this problem by tracking 10 taxonomically diverse microbial communities from the pitcher plant Sarracenia purpurea in the laboratory for more than 300 generations. Using metagenomic sequencing, we reconstruct their dynamics over time and across scales, from distant phyla to closely related genotypes. We find that most strains are not ecologically equivalent and exhibit distinct dynamical patterns, often being significantly more correlated with strains from another species than their own. Although even a single mutation can affect laboratory strains, on average, natural strains typically decouple in their dynamics beyond a genetic distance of 100 base pairs. Using mathematical consumer-resource models, we show that these taxonomic patterns emerge naturally from ecological interactions between community members, but only if the interactions are coarse-grained at the level of strains, not species. Finally, by analyzing genomic differences between strains, we identify major functional hubs such as transporters, regulators, and carbohydrate-catabolizing enzymes, which might be the basis for strain-specific interactions. Our work suggests that fine-scale genetic differences in natural communities could be created and stabilized via the rapid diversification of ecological interactions between strains.

Editor's evaluation

How easily is one species replaced by another system in an ecosystem, and what does it take so that two species are no longer equivalent? This is a central issue of ecology, which has been addressed in this elegant study. The rule of thumb the authors come up with, that genetic differences between two bacterial strains greater than about 100 bp are a good predictor of these strains being no longer ecologically equivalent, is likely to be one that will be highly cited in future.

https://doi.org/10.7554/eLife.74987.sa0

Introduction

In nature, microbial communities contain individuals on a continuum of phylogenetic diversity, where both evolutionarily distant and proximate members coexist (Ding et al., 2016; Fierer et al., 2007). Members in the same communities can belong to different domains of life, such as archaea, bacteria, and fungi, and at the same time, can stably exist with extremely closely related relatives, a few single-nucleotide polymorphisms (SNPs) apart (Schloissnig et al., 2013; Han et al., 2014; Faith et al., 2013; Kashtan et al., 2014). Collectively, closely and distantly related community members perform many vital ecological functions such as the regulation of biogeochemical cycles and fiber digestion in animal guts (Rousk and Bengtson, 2014; David et al., 2014). However, identifying how these different levels of diversity interact and co-evolve in complex communities remains an elusive and challenging problem.

A common strategy to study this problem is to analyze the dynamics of complex communities at different levels of diversity with metagenomic sequence data. Such data have been successfully leveraged to reconstruct the linkage between polymorphic sites in natural microbial populations, effectively allowing one to resolve both species and strain abundances (Good et al., 2017; Garud et al., 2019; Plucain et al., 2014; Frenkel et al., 2015). With these data, we can ask the important question: which level of diversity—strains, species, genera, families (or even broader taxonomic units)—have the strongest interactions and influence on community dynamics? In other words, what is the appropriate level of coarse-graining to describe and predict microbial community dynamics? To illustrate, if interactions at the species level are conserved (similar) at lower levels (like strains), but not at higher levels (like genera or families), then a species-level description would be appropriate. However, if strains within species display distinct dynamics and interactions with strains of other species, then a coarse-grained, species-level description would not be appropriate to describe long-term dynamics. This question is hard to answer using natural systems such as mammalian guts since they are often far from equilibrium, where extrinsic factors like host control can be much stronger drivers of community dynamics than interactions between the community members themselves (Spor et al., 2011). By domesticating natural communities in controlled laboratory conditions, however, we can study community dynamics near equilibrium (Frazão et al., 2019). In such domesticated communities, we expect changes in community composition to be induced primarily by intrinsic factors such as biological interactions. Further, by studying multiple communities domesticated in parallel in the same abiotic environment, we can disentangle which observations about community dynamics are repeatable and general, and which ones are chance and context-specific.

Here, we show that strains—which are genetic variants of the same species—exhibit the strongest dynamical correlations across all taxonomic levels in microbial communities. These results stem from tracking 10 domesticated microbial communities from the pitcher plant Sarracenia purpurea for more than 300 generations. By analyzing dynamics at several phylogenetic levels, we show that community composition varies most strongly at the level of strains. Further, interactions within each community were strain-specific: at times decoupling the dynamics of one strain from its close relative. Remarkably, as few as 100 base pair differences across strain genomes (~99.99% similarity) were sufficient for strain dynamics to diverge from each other. Finally, we show that strains can differentiate by fine-tuning only a handful of functional categories in their genomes, such as transcriptional regulators, metabolite transporters, and tricarboxylic acid (TCA) cycle enzymes. Together, our results highlight that while broader taxonomic units like families may help predict community dynamics at short timescales, strains may be the relevant unit of taxonomic coarse-graining at which to study interactions and dynamics in microbiomes at longer timescales, not merely a descriptive detail.

Results

Naturally occurring strains coexist for hundreds of generations in laboratory microcosms

We followed the eco-evolutionary dynamics of 10 replicate microbial communities derived from the carnivorous pitcher plant S. purpurea (Figure 1a). We began by sampling communities from 10 distinct pitchers from plants belonging to the same bog, and after sampling, filtered out particles larger than 3 μm (to focus on bacteria). We then transferred and propagated each ecological replicate community separately through serial passaging in a medium consisting solely of acidified water and ground cricket powder as the nutrient source (see Materials and methods). This medium mimics, in part, the ecological conditions of the pitchers from which we derived these microbes.

Figure 1 with 10 supplements see all

Download asset Open asset

Closely related strains coexist for hundreds of generations in pitcher plant-derived microbial communities.

(a) Diagram illustrating our experimental protocol. Stacked bar plots show the composition of one community (M06) at the amplicon sequence variant (ASV) (species) level sampled at each transfer; each color corresponds to a unique ASV that we tracked further using metagenomic sequencing, with their genera in the legend. (Illustration credit: Michelle Oraa Ali.) (b) Relative abundances of ASV27 (blue) and ASV14 (pink) with the approximate number of generations (see Materials and methods); the eight time points shown correspond to those for which we collected metagenomic data. (c) Relative abundances of strains identified in ASV27 and ASV14 using metagenomes. The shaded colors correspond to the abundance of each of the two strains. (d) Phylogenetic tree of the dominant strain-containing taxa across all 10 communities, with the identified genus names labeled. Brackets indicate the number of detected strains belonging to each genus. (e) Distribution of the genetic distance (divergence) between strains belonging to the same species, measured in the number of detected single-nucleotide polymorphisms (SNPs) differentiating them. (f) Bar plot showing the probability with which two members of the same taxonomic group (family, genus, species, etc.) co-occurred in a sample, normalized with a null model where all members were distributed randomly across communities. Dashed line indicates the null expectation.

To stably maintain high diversity, we initially propagated communities at a low dilution factor (1:2) every 3 days for 21 transfers, during which they reached distinct, rich, and stable equilibria (Figure 1a; ecological dynamics studied in a previous publication [Bittleston et al., 2020]). To study community dynamics over evolutionary timescales, we followed them using a similar protocol, but at a higher dilution factor (1:100) and less frequent transfer rate (every 7 days) for more than 300 generations (46 additional transfers). An increased dilution factor allowed a higher number of generations to pass between successive passages, and a weekly transfer rate made community propagation experimentally manageable over a full year. To resolve both coarse- (species) and fine (genotype)-scale dynamics, we performed 16S rRNA and deep metagenomic sequencing, respectively. Specifically, we used 16S rRNA sequencing to follow species composition (with species defined as having identical 16S rRNA); this gave us the ‘ecological’ dynamics of each community. Metagenomic sequencing, which we performed at eight evenly spaced time points between transfers 23 and 67, allowed us to follow ‘evolutionary,’ or genotype-level, changes in each community (Figure 1c). Together, this setup enabled us to track the eco-evolutionary dynamics of all 10 communities.

At the species level (variants with the same 16S rRNA sequence), the composition of each community fluctuated dynamically, but remained around the same equilibrium state for >300 generations (Figure 1a; median ~31% temporal coefficient of variation in the abundance of a species, Figure 1—figure supplement 1; and tightly clustered low-dimensional dynamics, Figure 1—figure supplement 2). We hypothesized that the genetic variation occurring during the experiment, through mutations and recombination, might be responsible for these fluctuations. To test this, we mapped metagenomic reads from each sample to a database of 33 reference genomes belonging to isolates from our communities (see Materials and methods). To avoid sequencing-related artifacts, we ensured that read mapping was competitive, that is, we only used those sequenced reads that mapped unambiguously to one genome. Importantly, because our reference genomes belonged to isolates from our communities, any detected genetic variation indicated the presence of variants relative to a resident of these communities.

Most (97%) genetic changes were in the form of SNPs, an overwhelming majority (~98%) of which were detected in the first sequenced time point (~23 generations). This suggested that the communities had significant preexisting, or standing, genetic variation. Given the large degree of divergence (~1% genome-wide divergence for some pairs; Figure 1e) and estimated divergence time based on bacterial mutation rates (~100 generations to diverge by one SNP; see Materials and methods), the preexisting variation likely came from genetic variants in the plants rather than from variants arising during the first 23 generations of lab propagation. Interestingly, the large divergence times of the genetic variants (~100 s of years) were much larger than the age of the pitchers we sampled (~3 months; see Materials and methods). This suggested that the preexisting variants represented different isolates (or strains) that had arrived in the pitchers through separate colonization events rather than having diversified within a pitcher during its lifetime. Therefore, we rejected our original hypothesis that mutations and recombination occurring during the experiment were the main driver of community dynamics. Instead, we focused on the dynamics of preexisting variants. Almost all (~99%) SNPs were biallelic (had only two variants; Figure 1—figure supplement 3), which allowed us to track variant dynamics in terms of the temporal trajectory of each SNP.

The temporal trajectories of SNPs in the same reference genome (species) were highly correlated, with their allele frequencies increasing and decreasing together (mean correlation coefficient 0.8; p<0.01; Figure 1—figure supplement 4). Such highly correlated allele frequency trajectories are hallmarks of genetic linkage (Good et al., 2017; Roodgar et al., 2009) and suggest that the large number of SNPs in each species were co-localized in a small number of genotypes or strains. We clustered the allele frequency trajectories and could statistically detect at most two strains for each species (see Materials and methods and Figure 1—figure supplement 5). For each cluster (strain), we estimated its frequency within the whole community as the fraction of the species’ abundance that corresponded to the strain (see Materials and methods). We calculated the relative abundance of each strain by multiplying its frequency with the relative abundance of the species it belonged to. Together, we concluded that within each community there was a pair of strains underlying most taxa, including the phylogenetically diverse Elizabethkingia, Aquitalea, and Delftia (Figure 1d). Within each species, strain dynamics displayed vastly different patterns. For some species, only one of the strains changed appreciably in abundance over the experiment (Figure 1c, top; Xylophilus sp.). For other species, both strains fluctuated constantly throughout the experiment (Figure 1c, bottom; Pseudomonas sp.). From an evolutionary standpoint, we observed a roughly bimodal distribution of the overall change in strain frequencies across different species in our communities (Figure 1—figure supplement 8, Figure 1—figure supplement 9). In some cases, the abundance of a species stayed relatively constant while the strains comprising it fluctuated relatively, similar to what has been observed in other systems such as the human gut (Garud et al., 2019; Roodgar et al., 2009; Zhao et al., 2019).

Since we observed different conspecific strains in different communities, we next asked at which phylogenetic level the 10 replicate communities were most variable at, that is, strains, species, genera, or families. For this, we calculated the probability that two members of the same taxonomic group (family, genus, species, etc.) co-occurred in a sample, normalized against a null model where all members were distributed randomly across communities (see Materials and methods). We found that community composition across all 10 replicates was much more variable at the level of strains than species (Figure 1f). This result is consistent with previous work that natural microbial communities are taxonomically variable at finer phylogenetic levels (strains and species), but more similar at coarser levels (families) (Goldford et al., 2018; Louca et al., 2016). Together, we concluded that natural strains of the same species can coexist in microbial communities over hundreds of generations, maintaining the communities in distinct stable states.

Highly related strains can decouple in their dynamics

Motivated by the observation that closely related strains persisted in communities, we asked whether the dynamics of each strain were similar, or coupled, to that of their closest relatives. To answer this question, we measured a strain–strain coupling for strains of the same species, defined as the correlation between their abundance trajectories (Figure 2a and Materials and methods). Strain dynamics could either be highly correlated (‘coupled strains,’ Figure 2a) or be uncorrelated (‘decoupled strains’). Measuring strain–strain coupling for each species across all 10 communities revealed a bimodal distribution, whose two modes were occupied by highly correlated and uncorrelated strain pairs, respectively (Figure 2b). The shape of this distribution allowed us to reliably classify strain pairs as either decoupled (coupling <0.4, the inflection point of the distribution; see Materials and methods) or coupled (coupling >0.4). A null model with no ecological differences between conspecific strains instead showed a unimodal distribution of coupling (mean coupling 0.95, Figure 2—figure supplement 1 and Materials and methods). This control suggested that the observed incidences of strain decoupling were much higher than expected simply by chance (p<10^–3; Kolmogorov–Smirnov test). We thus concluded that ~20% of conspecific strains, belonging to the same species, displayed rather dissimilar or decoupled dynamics.

Figure 2 with 3 supplements see all

Download asset Open asset

Even highly related strains (~100 single-nucleotide polymorphisms [SNPs] apart) can decouple in their dynamics.

(a) Schematic showing examples of strain–strain coupling. We defined strain–strain coupling as the temporal correlation between strain abundances belonging to the same species in a community; coupled strains (left) had highly correlated abundances while decoupled strains (right) were uncorrelated. (b) Distribution of the strain–strain coupling across all species and communities, smoothed using Gaussian kernel density estimation; dashed line shows the threshold coupling used to classify strains as coupled and decoupled (see Materials and methods). (c) Strain–strain coupling as a function of the genetic distance between strains. Each gray point represents a conspecific strain pair. The solid red line shows a moving average (LOESS fit). (d) Relative abundance of the species labeled ASV14 of the genus *Pseudomonas* in community M03, and its underlying strains over time; the blue shaded region represents strain 1 while the green region represents strain 2. The solid blue line shows the total species abundance. (e) Bar plot showing the number and genetic location of SNPs in the core genome of ASV31 from community M04, whose strains were decoupled and differed by 186 SNPs. Each bar shows the number of SNPs in one gene along with its annotation. Only SNPs belonging to annotated genes are shown.

To test if the genetic distance between strains influenced their strain–strain coupling, we plotted the average coupling as a function of the genetic distance between a conspecific strain pair (see Materials and methods). Remarkably, the average strain–strain coupling decreased sharply beyond a genetic distance of just about 100 SNPs (Figure 2c). This suggested that changes in as few as 100 bps, corresponding to roughly 0.01% of these genomes, were sufficient to decouple strain dynamics. A consequence of such decoupling was that in species with decoupled strains drastic changes in the species’ abundance were driven only by changes in one of the strains (Figure 1c and Figure 2d, top).

To understand the genomic signatures of strain decoupling, we examined the location and putative function of SNPs in a pair of highly related but decoupled strains (see Materials and methods). To serve as an illustration, we chose the Delftia genome in community M04, which had the fewest (186) SNPs among its decoupled strains, which shed light on which functions may be associated with decoupling. The SNPs differentiating these strains were scattered throughout the core genome, with 62% in the coding regions of genes with known functional annotations. Broadly, these genes corresponded to transcriptional regulators such as gbdR, transmembrane proteins such as the nitrate transporter nitT, and enzymes implicated in central carbon metabolism, such as aceE, murA, and ppsA (Figure 2e). The gbdR protein is a known transcriptional regulator of amino acid metabolism, glycine betaine catabolism, as well as phosphatase activity in bacteria, and may play a role in differentiating metabolic flux balance between strains (Wargo et al., 2008). The nitT protein is a member of the major facilitator superfamily (MFS) and a transporter that controls the uptake of nitrate available in the environment (Maeda and Omata, 1997). Finally, ppsA and murA are enzymes that metabolize pyruvate and its derivatives, and may determine how strains catabolize metabolic intermediates before they enter the TCA cycle (Mikkelsen et al., 2004; Marquardt et al., 1992). Together, this evidence suggests that diversifying in a handful of functional categories is sufficient for strains to decouple in their ecological dynamics.

Dynamical correlations suggest that interactions are strain-specific

Equipped with the understanding that conspecific strains could display decoupled dynamics, we asked which ecological factors were responsible for the observed differences in strain dynamics. Our experimental setup explicitly controlled for abiotic ecological factors since all communities were grown in the same environmental conditions. Therefore, any dynamical differences between strains must have emerged as a result of biotic factors, such as ecological interactions. To detect and measure interactions between community members, we exploited the fact that all 10 communities were at equilibrium (median ~31% temporal coefficient of variation in the abundance of a species; Figure 1a, Figure 1—figure supplement 1, Figure 1—figure supplement 2). In a controlled environment, interactions between community members at equilibrium are expected to induce temporal correlations between their abundances (Faust et al., 2015; Tikhonov et al., 2015; Faust and Raes, 2012). In contrast, we expect almost no correlations if all abundance fluctuations are purely stochastic (due to neutral drift) (Hubbell, 2001; Descheemaeker and de Buyl, 2020; Grilli, 2020; Ramsayer et al., 2012). Motivated by this logic, we measured dynamical correlations between the abundance trajectories of pairs of community members.

To determine which level of taxonomy (strains or species) had the strongest interactions, we compared the strength of correlations between the two levels of taxonomic grouping. Specifically, at the species level, we measured a species–species correlation between each co-occurring species pair as the magnitude of the correlation between the relative abundances that pair (Figure 3a). At the strain level, we only measured correlations between strain pairs belonging to different species, not between conspecifics (which we had already measured in Figure 2a–c). To compare strain correlations with their corresponding species correlations, we measured a strain–strain correlation between each species pair; we used the correlation with the highest magnitude among all four possible pairs as the strongest strain pair for a given species pair (Figure 3a and Materials and methods).

Figure 3 with 8 supplements see all

Download asset Open asset

Community interactions are strain-specific.

(a) Schematic showing how we measured dynamical correlations at the strain and species level for a species pair A and B. For any species pair, we defined strain–strain correlations as the highest magnitude of correlation among all four strain pairs from different species (left), never the same species. The species–species correlation for the same pair was simply the correlation between the two species (right). (b) Scatter plot of the dynamical correlation between species in a community and the highest correlation between their corresponding strain pairs. Each point represents one species in one of the 10 communities. The shaded region indicates strain–strain correlations higher than species–species correlations. Inset shows a pie chart of the fraction of points supporting higher strain-level interactions (76%) versus species-level interactions (24%). Triangle indicates a pair of *Achromobacter* (red) and *Delftia* (blue) species, shown in (c). Top: relative abundance plots of two uncorrelated species measured over the experiment. Bottom: relative abundances of the minor strains for the same species, which are strongly negatively correlated. (**d, e**) Schematics of our models showing how species are split into strains, with tunable differences in their consumption rates for each resource, as well as the serial dilution protocol that we simulate, where we slightly change the growth medium from transfer to transfer. (**f, g**) Scatter plots of the expected dynamical correlations using our models, (f) where strains are ecologically distinct (hypothesis 1) and (g) identical (hypothesis 2), similar to (b). Schematics of the consumption rate matrices for both models (hypotheses) are also shown.

Remarkably, we found that in the majority of comparisons (76%), at least one pair of strains from two different species was more strongly correlated than the corresponding species themselves (Figure 3b, Figure 3—figure supplement 2, Figure 3—figure supplement 3; p<10^–6, Wilcoxon signed-rank test adjusted for multiple comparisons). In most cases, even when both strain- and species-level correlations were high, the values of at least one pair of strain correlations were often higher, on average being larger by 0.17 ± 0.01 (one-sample Student’s t-test, p<10^–3). To test if this result could arise purely because we compared four pairs of strains for each pair of species, we used a null model in which strains were randomly sampled from the dataset and combined into new mock ‘species’ (see Materials and methods). This shuffling coalesced unrelated pairs of strains, but preserved the bias in the number of comparisons performed and the abundance data themselves, and revealed that the observed high fraction of strain-dominant interactions was not a statistical artifact (p<10^–3, permutation test; Figure 3—figure supplement 1, Figure 3—figure supplement 6). Additionally, interaction networks were, on average, 30% denser at the strain level than the species level (Figure 3—figure supplement 5 and supplementary text). Finally, in 7% of cases, our analysis revealed so-called ‘hidden’ interactions, masked at the species level but visible with strains. In these cases, while there was virtually no correlation between species dynamics (Figure 3c; Pearson correlation 0.02, p=0.9), the dynamics of their minor strains were strongly correlated (Figure 3c; Pearson correlation – 0.71, p<0.01, after a Benjamini–Hochberg correction). These results suggest that coarse-graining communities beyond strains, say into species, decreases dynamical correlations. Strains of the same species often correlate more strongly with strains of another species than with their own conspecifics. One reason for this may be that conspecific strains interact differently with the same community members, making community interactions strain-specific.

Minimal models support strain-specific, not species-specific, interactions

Two different hypotheses could potentially explain the observation that strain correlations are typically higher than the corresponding species correlations. First, conspecific strains have different interactions, and the increased strain correlations point to systematic differences in, say, the metabolic abilities of conspecific strains. Second, interactions are conserved at the species level (strains are ecologically equivalent, i.e., phenotypically identical), but the increased strain correlations arise from a combination of stochastic effects, such as a higher number of comparisons, measurement noise, and abundance fluctuations within species.

To compare the dynamical patterns expected under both hypotheses, we simulated community dynamics under two scenarios using two minimal consumer-resource models where interactions were mediated by resource competition. In the first model, strains were phenotypically distinct (hypothesis 1; Figure 3f), while in the second, strains were phenotypically identical (hypothesis 2; Figure 3g). In both models, we simulated the assembly of 10 independent communities under serial propagation in the same environment (see Materials and methods). To model abundance fluctuations, we introduced minor variations in the proportion of resources supplied during each transfer (Figure 3e). We encoded competitive interactions through a matrix of resource consumption rates; the difference between the rates of two taxa (hereafter referred to as D) inversely controlled the strength of competition between them (Figure 3d). In the first model, each conspecific pair had a different set of consumption rates, tuned by the ‘competitive distances’ D between them (Figure 3f). In the second model, interactions were instead species-specific, that is, both conspecific strains had identical consumption rates (D = 0), with stochastic fluctuations being the only way to alter their frequencies relative to each other (Figure 3g).

Remarkably, only one of the models, where conspecific strains have different interactions, produced patterns that were consistent with the data (78% strain correlations greater than species correlations; Figure 3f). The alternate model, with phenotypically identical strain interactions, produced strain correlations that were greater in only 19% of simulated species (Figure 3g). Encouragingly, strain–strain correlations produced by the first model could also recapitulate the decoupling between conspecifics beyond a characteristic phenotypic distance (Figure 3—figure supplement 7), consistent with our experimental observation of strain decoupling beyond a characteristic genetic distance (Figure 2c). We will provide a geometric intuition for this later in the article (Figure 5; Figure 3—figure supplement 8). Together, our results show that the patterns observed in the data are consistent with a model in which resource-mediated interactions are stronger between strains, not species.

Genetic variation in transporters, regulators, and pseudogenes differentiates strains

We finally asked what differentiated strains at the genomic level—in particular, at the level of the core genome, where SNPs can be reliably identified. We had already explored a specific example while studying decoupled strains (Figure 2e); we now asked for more general features that differentiated coexisting strains. Most (97%) of the differences between strains in their core genomes were in the form of SNPs; the remaining 3% constituted insertions and deletions (average 4 bp per event; see Materials and methods). We had only one reference genome for each ASV, thus we could not detect changes in the flexible genome, such as gene gains in a specific strain. For all 163 strains, there was no significant bias in the distribution of SNPs between coding and non-coding regions, implying that most (>80%) of the SNPs were found in the coding regions (Figure 4—figure supplement 1).

We next asked which functional categories of genes were enriched in strain-differentiating SNPs. For this, we performed a broad categorical analysis, giving us a bird’s eye view of the cellular functions likely to be variable in strains (we used annotations and categories from the KEGG database, respectively; see Materials and methods) (Kanehisa et al., 2017). Four major functional categories differentiating strains emerged: (1) two-component systems, (2) enzymes involved in carbon metabolism, (3) transcription factors, and (4) transporters (Figure 4a). To get a closer look at specific examples of pathways affected by each of these four categories, we examined ASV11, belonging to the genus Aquitalea in M04, which contained SNPs in genes from all four categories (Figure 4b; Supplementary file 2). For two-component systems, we found the dctD protein, which, along with dctB, regulates the uptake of C4-dicarboxylates such as aspartate, malate, fumarate, and succinate (Park et al., 2002). The enzyme korB is an oxidoreductase, known to decarboxylate 2-oxoglutarate, a key intermediate in the TCA cycle (Tersteegen et al., 1997). The transcription factor phnF represses the phnCDE operon and regulates the biosynthesis of amino acids such as glutamate (Aravind and Anantharaman, 2003). Finally, we found many diverse sugar and amino acid transporters (of the ABC and MFS families), ion transporters like corA (Papp-Wallace and Maguire, 2007), as well as iutA, which mediates siderophore uptake and competition for iron in bacteria (Figure 4b; Krewulak and Vogel, 2008). Many of these functional categories were specific to the coexisting strains in our communities, and not generic differences between members of the same taxa (Figure 4—figure supplement 1 and Materials and methods). These examples typify some key functional signatures that differentiate strains and suggest that variation in transporters, regulators, and enzymes in central carbon metabolism may be the basis for strain-specific ecological interactions, for example, by separating the metabolic niches of strains, as in our models.

Figure 4 with 4 supplements see all

Download asset Open asset

Genetic variation in regulators, transporters, and pseudogenes differentiates strains.

(a) Bar plot showing the four functional categories of genes most enriched in strain-differentiating single-nucleotide polymorphisms (SNPs). The x-axis represents the mean number of SNPs belonging to the category in a strain pair. (b) Table showing an example of a gene in each functional category identified in (a); the middle column shows a schematic of the gene with its name (italics). (c) The average number of pseudogenes detected in strains per strain as a function of time. The solid line shows a linear regression, whose intercept shows the number of pseudogenes detected at the first sequenced time point; the shaded region represents the standard error of the mean (s.e.m.). (d) Bar plot showing the two functional categories most enriched in strain-differentiating pseudogenes. The x-axis represents the median number of genes belonging to the category in a strain pair. (e) Bar plot showing the mean pN/pS of mutations detected in pseudogenes (red) and all other strain-differentiating genes (gray). Dashed line represents the expected pN/pS under a neutral model. All error bars represent s.e.m.

Another key genomic signature was the presence of several strain-specific pseudogenes in strain genomes. Pseudogenes are genes that contain a premature stop codon and are not expected to produce functional proteins (Balakirev and Ayala, 2003). Conspecific strains differed from each other not just by mutations, but also by differential pseudogenization. Not only did one of the two conspecific strains in each pair have several (~5) pseudogenes at the first time point (~23 generations), but we could detect new pseudogenes being generated throughout the experiment (Figure 4c, Figure 4—figure supplement 4; see Materials and methods for how we detected pseudogenes and assigned them to strains). This suggested that pseudogenization was both extensive and rampant during community evolution. Pseudogenes that differentiated conspecific strains were enriched in two functional categories: (1) motility proteins such as cheA, responsible for chemotaxis (Karatan et al., 2001), and (2) phage proteins such as fii and gpdD, known to constitute viral tails (Figure 4d; Temple et al., 1991). Functions such as chemotaxis and structural viral proteins are not expected to be advantageous for bacterial growth in the environmental conditions of our experiment, and are thus likely to face weak, or no, selection. Indeed, while pseudogenes did not accumulate mutations faster than other genes (Figure 4—figure supplement 2), their mutational profile had a pN/pS ~ 1, consistent with neutral evolution (Figure 4e). In contrast, other genes showed signatures of purifying selection, with a pN/pS ~ 0.2, significantly lower than 1 (Figure 4e). The rampant pseudogenization observed in our communities suggests that evolution tends to deactivate genes that do not contribute to fitness, such as chemotaxis and viral genes in stable, well-mixed environments. The end products of such nonfunctional pseudogenization and functional genomic variation may lead to further phenotypic differences between strains, allowing even highly related strains to show decoupled dynamics.

Discussion

Understanding the role of strains in microbiomes is crucial to the study of microbial ecology and evolution (Van Rossum et al., 2020). Here, we addressed this question by propagating microbial communities from pitcher plants for more than 300 generations. We found that strains that were preexisting genetic variants belonging to the same species were the key determinants of long-term community dynamics. Differing by between 1 and ~10,000 SNPs, strains imparted both compositional and dynamical variability to communities. Compositionally, we found that communities were most variable in terms of strains, with each community carrying a unique set of strains even when they carried the same species. This variability was likely the result of two chance events: stochastic sampling and laboratory selection. Dynamically, once the communities settled into unique equilibria, we found that the dynamics of even extremely closely related strains, with as few as 100 SNPs, could be decoupled from each other. Even when a species’ abundance appeared relatively stable over time, strains comprising it could be highly dynamic. Such strains were temporally correlated with other strains belonging to different species. Mathematical consumer-resource models capturing our experiment suggested that these observations and patterns were consistent with resource-mediated competitive interactions between strains, not species.

Our observation that correlated fluctuations among existing strains dominate evolutionary dynamics is perhaps somewhat different from one’s naïve expectation, anticipating new mutations and recombination events to be the dominant forces driving evolution within communities (Plucain et al., 2014). Once our communities reached a stable equilibrium, their evolution was marked by the dynamics of the preexisting strains, which could stably coexist for hundreds of generations. Thus, in both cases, either by mutation and recombination (in single-species laboratory evolution) or by assembly (in complex communities as in this article), the dynamics of populations are determined by the ecological interactions between fine-scale genetic variants, or strains (Good et al., 2017; Roodgar et al., 2009). Moreover, in both cases, the interactions between strains are likely resource-mediated since the genetic differences between ecologically divergent strains are concentrated in metabolic genes (Plucain et al., 2014). Taken together, these results suggest a striking parallel between the eco-evolutionary dynamics of both isogenic populations as well as complex communities.

The patterns we observed, such as conspecific strain decoupling and higher strain correlations beyond species boundaries, underscore the emerging view that strains are the most dynamic and interactive units of microbiomes (Roodgar et al., 2009; Leventhal et al., 2018; Goyal, 2018). Our study has one major advantage over previous work where natural communities, such as human gut microbiomes, were sampled over time without environmental control. That is, that, by experimentally propagating natural samples, we shielded the communities from external host-induced perturbations such as the migration of new genetic variants or immune system control. Therefore, any observed community dynamics were a result of intrinsic causes, such as strain-specific interactions, not extrinsic causes, such as host-induced shifts. The reproducibility of these patterns across 10 independent replicate communities further strengthens our findings.

Finally, our results raise a question about the appropriate level at which to monitor microbial community composition. On one hand, our results show that strains as few as 100 SNPs apart can have independent ecological dynamics and interactions, and as such, represent distinct ecological variables. On the other hand, consistent with previous work, our results also show that the presence of strains is highly variable across communities, even in the same abiotic environmental conditions (Leventhal et al., 2018). Therefore, coarse-grained assemblages (e.g., members of the same taxonomic family) should better represent the relationship between metabolic niches and community composition, especially for short-term community assembly dynamics (Goldford et al., 2018; Louca et al., 2016). But it is not clear how such assemblages should be defined in general, and what type of functional redundancy they capture. This is because there are numerous definitions of what constitutes ‘function.’ In the study that preceded this article, we found that the substrate consumption profile of the community was strongly correlated with community composition (at the 16S rRNA level), indicating thus that there is, at best, weak functional redundancy in the community if substrate preference is what we define as ‘function’ (Bittleston et al., 2020). However, if what we care about is the conversion of organic carbon to CO₂, for instance, community composition is indeed highly redundant (most species in the community could perform aerobic respiration and yield similar amounts of CO₂). More work is needed in this arena to clarify how relevant ecological functions diverge across the microbial phylogeny. However, at the very least, we can say that if our questions pertain long-term dynamics Figure 5, or (eco)-evolution, strains are the appropriate variables with which to study the system.

Figure 5

Download asset Open asset

Conceptual model of strain-dominated long-term dynamics.

(Diversity pre-assembly) Schematic showing the phenotypes (e.g., resource consumption rates) of strains belonging to two different species, A (green) and B (red), in a large strain pool in nature. Each point represents the consumption rate of a particular strain on two hypothetical resources. (Diversity post-assembly) When assembled in a community (e.g., domesticated in the lab), a subset of the strains from each species (here, two from each species) may survive and coexist once the community reaches equilibrium. (Dynamics around equilibrium) The strains from each species influence each other’s long-term dynamics around equilibrium. Strains from different species that are closer in phenotype space (A₁ and B₁) will display strongly correlated dynamics while phenotypically distant strains (A₂ and B₂) will be weakly correlated.

Share this article

Cite this article

Closely related strains coexist for hundreds of generations in pitcher plant-derived microbial communities.

Even highly related strains (~100 single-nucleotide polymorphisms [SNPs] apart) can decouple in their dynamics.

Community interactions are strain-specific.

Genetic variation in regulators, transporters, and pseudogenes differentiates strains.

Conceptual model of strain-dominated long-term dynamics.

Author details

Akshit Goyal

Contribution

Competing interests

Leonora S Bittleston

Contribution

Competing interests

Gabriel E Leventhal

Contribution

Competing interests

Lu Lu

Contribution

Competing interests

Otto X Cordero

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags