Macroecological patterns in coarse-grained microbial communities

William R. Shoemaker; Jacopo Grilli

doi:10.7554/eLife.89650.1

eLife assessment

How macro-ecological patterns of microbiomes depend on the taxonomic level across a wide range of taxa and ecosystems, and that correlations in richness across taxonomic scales are largely created by variation in sample size, are valuable findings. The authors present convincing evidence that a stochastic logistic growth model is a more appropriate choice as null model than one that is based on the neutral theory of biodiversity. The work will be of interest to microbial ecologists and those interested in general ecological patterns.

https://doi.org/10.7554/eLife.89650.1.sa2

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence

convincing: Appropriate and validated methodology in line with current state-of-the-art

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

The structure and diversity of microbial communities are intrinsically hierarchical due to the shared evolutionary history of community members. This history is typically captured through taxonomic assignment and phylogenetic reconstruction, sources of information that are frequently used to group microbes into higher levels of organization in experimental and natural communities. Connecting community diversity to the joint ecological dynamics of the abundances of these groups is a central problem of community ecology. However, how diversity and dynamics depend on the scale of observation at which groups are defined has never been systematically examined. Here, we used a macroecological approach to quantitatively characterize the structure and diversity of microbial communities among disparate environments across taxonomic and phylogenetic scales. We found that measures of biodiversity at a given scale can be consistently predicted using a minimal model of ecology, the Stochastic Logistic Model of growth (SLM). Extending these within-scale results, we examined the relationship between measures of biodiversity calculated at different scales (e.g., genus vs. family), an empirical pattern predicted by the Diversity Begets Diversity (DBD) hypothesis. We found that the relationship between richness estimates at different scales can be quantitatively predicted assuming independence among community members. Contrastingly, only by including correlations between species abundances (e.g., as a consequence of interactions) can we predict the relationship between estimates of diversity at different scales. The results of this study characterize novel microbial patterns across scales of organization and establish a sharp demarcation between recently proposed macroecological patterns that are not affected by ecological interactions and those that are.

Introduction

An essential feature of microbial communities is their heterogeneous composition. A single environmental sample typically has a high richness, harboring hundreds to thousands of community members [1–3]. This high level of richness reaches an astronomical quantity at the global level, as scaling relationships and models of biodiversity predict upwards of one trillion (∼ 10¹²) species on Earth [4, 5]. Even experimental communities in laboratory settings with a single carbon source can harbor ≥ 40 community members, culminating in a total richness numbering in the hundreds among replicate communities (e.g., [6]). This richness contributes to the sheer diversity of microbial communities, a challenge for researchers attempting to identify the general principles that govern their dynamics and composition.

While richness estimates of microbial communities are undoubtedly high, the choice of assigning a community member to a given taxon remains intrinsically arbitrary. This arbitrariness remains regardless of whether the definition of a taxon is based on physiological attributes measured in the laboratory, entire genomes (i.e., metagenomics), or single-gene amplicon-based methods (i.e., 16S rRNA annotation). Despite their methodological differences, these approaches can all be viewed as different ways to cluster individuals within a community into groups. To contend with the sheer richness of microbial communities, researchers frequently rely on annotation-based approaches, i.e., by summing the abundances of species that belong to the same group at a given taxonomic scale (e.g., genus, family, etc.). This approach pares down communities to a size that is amenable for the visualization of individual groups and allows for questions of scale-dependent community reproducibility to be addressed [6–13].

The trend of performing analyses at a given taxonomic scale raises the question of how the composition of a community at one scale relates to that at another. To address these questions, researchers have examined the relationship between biodiversity measures at different scales in order to pare down the set of plausible ecological mechanisms that govern community composition. Specifically, recent efforts have found that microbial richness/diversity within a given taxonomic group (e.g., genus) is typically positively correlated with the richness/diversity among the remaining groups (e.g., family) [9, 14], an empirical pattern that aligns with the predictions of the Diversity Begets Diversity hypothesis (DBD) [15–17]. Evidence of the DBD hypothesis has historically been attributed to the construction of novel niches within a community through member interactions [15, 18], with similar mechanisms having been proposed to explain the existence of a positive relationship in microbial communities [14]. However, we still lack a quantitative understanding of how community composition at one scale should relate to that of another. Proceeding towards this goal requires two elements: 1) a systematic approach to grouping community members and 2) an appropriate null model for the composition of communities.

The operation of grouping the components of a system into a smaller number (e.g., merging read counts of OTUs to the family level in a community) is known in the physical sciences as coarse-graining. This formalism defines our systematic approach to grouping community members. While it is often not explicitly acknowledged as such, coarse-graining is a core concept in the microbial life sciences [19]. By smoothing over microscopic details at a lower level of biological organization in order to make progress at a higher level, the concept of coarse-graining has contributed towards the development of effective models of physiological growth [20, 21], evolutionary dynamics [22, 23], and the dependence of ecosystem properties on the diversity of underlying communities [24]. Coarse-graining has even been used to glean insight into the question of whether "species" as a unit has meaning for microorganisms, as modeling efforts have found that the operation permits the delimitation of species when the resource preferences of community members are structured [25]. These theoretical and empirical efforts suggest that coarse-graining may provide an appropriate framework for investigating patterns of diversity and abundance within and between taxonomic scales of observation.

When evaluating the novelty of an empirical pattern it is useful to identify an appropriate null model for comparison [26–28]. Prior research efforts have demonstrated the novelty of the fine vs. coarse-grained relationship by contrasting inferences from empirical data with predictions obtained from the Unified Neutral Theory of Biodiversity (UNTB) [14, 29–33]. These predictions generally failed to reproduce slopes inferred from empirical data [14], implying that the fine vs. coarse-grained relationship represents a novel macroecological pattern that cannot be quantitatively explained by existing ecological models. However, the task of identifying an appropriate null model for comparison is not straightforward. Rather, the question of what constitutes an appropriate null model remains a persistent topic of discussion in community ecology [26, 34–37]. Here we take the view that a null model is appropriate for examining the relationship between two observables (e.g., community diversity at different scales) if it is capable of quantitatively predicting each observable (e.g., community diversity at one scale). By this standard, the UNTB is an unsuitable choice as a null as it generally fails to capture basic patterns of microbial diversity and abundance at any scale [38–40]. One relevant example is that the UNTB predicts that the distribution of mean abundances of species across sites is extremely narrow (i.e., converging to a delta distribution as the number of samples increases), whereas analyses suggest that empirical data tends to follow a broad lognormal distribution [40].

Contrastingly, recent efforts have determined that a model of self-limiting growth with environmental noise, the Stochastic Logistic Model (SLM), is capable of quantitatively capturing multiple empirical macroecological patterns in microbial communities [40–44]. The stationary solution of this model predicts that the abundance of a given community member across sites follows a gamma distribution [40], a result that provides the foundation necessary to predict macroecological patterns among and between different taxonomic and phylogenetic scales.

In this study, we evaluated macroecological patterns of microbial communities across scales of evolutionary resolution. To limit potential biases that may result due to taxonomic annotation errors and to use all available data, we investigated the macroecological consequences of coarse-graining by developing a procedure that groups community members using the underlying phylogeny in addition to relying on taxonomic assignment. To ensure the generality of our findings and their commensurability with past research efforts, we used data from the Earth Microbiome Project (EMP), a public catalogue of microbial community barcode data. First, we assessed the extent that microbial diversity varies as the abundances of community members are coarse-grained by phylogenetic distance and taxonomic rank. The results of these analyses lead us to consider whether the predictive capacity of the gamma remains robust in the face of coarse-graining, a prediction that we quantitatively evaluated among community members and then extended to predict overall community richness and diversity. The accuracy of the gamma provided motivation to test whether the gamma was capable of predicting the relationship between fine and coarse-grained estimates, the empirical pattern that has been interpreted as evidence for the DBD hypothesis. Together, these analyses present evidence of the scale invariance of macroecological patterns in microbial communities as well as the applicability of the gamma distribution as a minimal model for evaluating macroecological patterns of microbial biodiversity.

Results

The macroecological consequences of phylogenetic and taxonomic coarse-graining

While microbial communities are often coarse-grained into higher taxonomic scales, the effect of this coarse-graining on measures of biodiversity and the underlying phylogeny is rarely examined. Before proceeding with the full analysis using public 16S rRNA amplicon data from the EMP, we elected to quantify the fraction of remaining community members across coarse-graining thresholds, a reflection of the extent to which coarse-graining reduces global richness and of the relation between taxonomic and phylogenetic coarse-graining. We first defined a coarse-grained group g as the set of OTUs that have the same assigned label in a given taxonomic rank out of G groups (e.g., Pseudomonas at the genus level) or are collapsed when the phylogeny is truncated by a given root-to-tip distance (Figs. 1, S1). The relative abundance of group g in site j is defined as the sum of the relative abundances of its constituent OTUs, $x_{g,j} = \sum_{i \in g} x_{i,j}$.

The process of coarse-graining abundances using the phylogeny.
Taxonomic assignment in 16S rRNA amplicon sequence data provides the opportunity to investigate how properties of communities vary at different taxonomic scales. The most straightforward means of coarse-graining here is to sum the abundances of OTUs/ASVs that belong to the same taxonomic group. Amplicon data-based studies provide information about the shared evolutionary history of community members, information that can be leveraged by the construction of phylogenetic trees. A coarse-graining procedure can be defined that is analogous to one based on taxonomy, where a phylogenetic distance is chosen, and terminal nodes are collapsed if their distance to a common ancestor is less than the prescribed distance.
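
As a minimal sketch of the taxonomic version of this procedure, the snippet below sums the relative abundances of OTUs that share a group label within a single site. The function name `coarse_grain`, the OTU identifiers, and the toy abundances are hypothetical and chosen only for illustration; they are not taken from the EMP data or from the authors' code.

```python
from collections import defaultdict


def coarse_grain(abundances, labels):
    """Sum relative abundances of OTUs that share a group label in one site.

    abundances: dict mapping OTU id -> relative abundance in a single site
    labels:     dict mapping OTU id -> group label (e.g., a genus name)
    """
    grouped = defaultdict(float)
    for otu, x in abundances.items():
        grouped[labels[otu]] += x
    return dict(grouped)


# Hypothetical toy site with four OTUs assigned to two genera.
site = {"otu1": 0.5, "otu2": 0.25, "otu3": 0.125, "otu4": 0.125}
genus = {
    "otu1": "Pseudomonas",
    "otu2": "Pseudomonas",
    "otu3": "Bacteroides",
    "otu4": "Bacteroides",
}
coarse = coarse_grain(site, genus)
```

Phylogenetic coarse-graining could reuse the same function, provided the labels are replaced by identifiers of the clades obtained after collapsing terminal nodes at the chosen distance.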

We found that coarse-graining had a drastic effect on the total number of community members within an environment, reducing global richness by ∼90% even at just the genus level (Fig. S2a). By coarse-graining over a range of phylogenetic distances, we saw that a fraction of coarse-grained community members comparable to that of genus-level coarse-graining occurs at a phylogenetic distance of ∼0.1 (Fig. S2b). This distance translates to only ∼3% of the total distance of the tree, meaning that the majority of OTUs are coarse-grained over a minority of the tree. This pattern is likely driven by the underlying structure of microbial phylogenetic trees, where most community members have short branch lengths [45]. This result suggests that while coarse-graining communities to the genus or family level substantially reduces global richness, it does so without coarse-graining the majority of the evolutionary history that is captured by the phylogeny. Assuming that phylogenies capture ecological changes that occur over evolutionary time, this detail implies that ecological divergence that is captured by the phylogeny should be retained even when communities are considerably coarse-grained.

With our coarse-graining procedures established, we proceeded with our macroecological investigation. Recent efforts have found that the distribution of abundances of a given ASV/OTU maintains a statistically similar form across independent sites and time, a pattern known as the Abundance Fluctuation Distribution (AFD) [40–42, 46, 47]. By coarse-graining empirical AFDs and rescaling them by their mean and variance (i.e., standard score), we saw that AFDs from the human gut microbiome retain their shape across phylogenetic scales (Fig. 2a). This pattern of invariance holds across environments for both phylogenetic and taxonomic coarse-graining (Figs. S3, S4), suggesting that AFDs can likely be described by a single probability distribution.

The AFD remains invariant under coarse-graining.
We can see that despite phylogenetic coarse-graining, a) the general shape of the AFD for OTUs that were present in all samples (i.e., an occupancy of one) and b) the relationship between the mean coarse-grained abundance and occupancy across sites remains invariant. Furthermore, predictions obtained from the gamma distribution are capable of capturing the relationship between the mean abundance and occupancy, suggesting that the gamma remains a useful quantitative null model under coarse-graining. All data in this plot is from the human gut microbiome.

It has been previously demonstrated that microbial AFDs are well-described by a gamma distribution that is parameterized by the mean relative abundance and the shape parameter (i.e., equal to the squared inverse of the coefficient of variation [40]). This distribution can be viewed as the stationary distribution of a Stochastic Logistic Model (SLM) of growth, a mathematical model that successfully captures macroecological patterns of microbial communities across both sites and time [40, 41, 44] (Eq. 5 in Materials and Methods).

Using this result, we determined whether the gamma sufficiently characterized coarse-grained AFDs. To do so, it is worth noting that we do not directly observe x_i. Rather, our ability to observe a community member depends on our sampling effort (i.e., the total number of reads for a given site). To account for sampling, one can derive a form of the gamma distribution that explicitly accounts for the sampling process, obtaining the probability $\Pr[n \mid N, \bar{x}_i, \beta_i]$ of obtaining n reads out of N total reads for a community member with mean relative abundance $\bar{x}_i$ and shape parameter $\beta_i$ (Materials and Methods, [40]). Given that n = 0 for a species we do not observe, we can define the expected fraction of M sites where a species was observed (i.e., occupancy, o_i) as

$$\langle o_i \rangle = 1 - \frac{1}{M} \sum_{m=1}^{M} \Pr[0 \mid N_m, \bar{x}_i, \beta_i] \qquad (1)$$

where N_m is the total number of reads in site m.

We can then compare this prediction to observed estimates of occupancy to assess the accuracy of the gamma distribution across coarse-grained thresholds. We found that Eq. 1 generally succeeded in predicting observed occupancy across phylogenetic and taxonomic scales for all environments (Figs. S5, S6). We then determined whether the gamma is capable of predicting the relationship between macroecological quantities. One such relationship is that the occupancy of a species should increase with its mean abundance, known as the abundance-occupancy relationship [48]. This pattern has been found across microbial systems [49–51] and can be quantitatively predicted using the gamma [40]. We see that this relationship is broadly captured across taxonomic and phylogenetic scales for all environments (Figs. 2b, S7, S8). This result implies that the ability to observe a given taxonomic group is primarily determined by its mean abundance across sites and the sampling effort within a site [40].
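
The occupancy prediction can be sketched numerically under one explicit assumption that is not stated in this section: that reads are drawn by Poisson sampling from a gamma-distributed relative abundance with mean x̄ and shape β. Under that assumption the probability of obtaining zero reads out of N is (1 + x̄N/β)^(−β). The function names and parameter values below are hypothetical and serve only to illustrate the calculation.

```python
def prob_absent(mean_abundance, shape, n_reads):
    """Probability of zero reads, assuming Poisson sampling of a
    gamma-distributed relative abundance (an assumption of this sketch)."""
    return (1.0 + mean_abundance * n_reads / shape) ** (-shape)


def predicted_occupancy(mean_abundance, shape, read_depths):
    """Expected fraction of sites in which the community member is observed."""
    absent = sum(prob_absent(mean_abundance, shape, n) for n in read_depths)
    return 1.0 - absent / len(read_depths)


# Hypothetical member: mean relative abundance 1e-3, shape 1,
# sampled at three sites with 1000 reads each.
occ = predicted_occupancy(1e-3, 1.0, [1000, 1000, 1000])

# A tenfold more abundant member at the same depth.
occ_abundant = predicted_occupancy(1e-2, 1.0, [1000, 1000, 1000])
```

In this sketch occupancy rises with both mean abundance and read depth, which is the qualitative content of the abundance-occupancy relationship described above.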

To quantitatively assess the gamma we calculated the relative error of our occupancy predictions (Eq. 10) for all coarse-graining thresholds. We found that the mean logarithm of the error only slightly increased for the initial taxonomic and phylogenetic scales, where it then exhibited a sharp decrease across environments (Fig. S9a,b). The error then only began to decrease once the community became highly coarse-grained, harboring a global richness (union of all community members in all sites for a given environment) < 20. This result means that, if anything, the accuracy of the gamma distribution only improves with coarse-graining.

Reconciling coarse-graining and the predictions of the gamma distribution

The consistent predictive success of the gamma distribution in the face of coarse-graining raises the question of why it remains a sufficient null model. The sum of independent gamma distributed random variables only returns a gamma if all variables have identical rate parameters (here the shape parameter divided by the mean abundance, $\beta_i / \bar{x}_i$), a requirement that microbial communities clearly do not meet since they typically harbor broad mean abundance distributions. Given that a gamma AFD cannot predict the distribution of correlations between AFDs [40], it is first worth examining whether the degree of dependence between AFDs shapes coarse-grained variables. We first consider the relation between the variance of the sum and the sum of variances, which differ by the total covariance between members of a group:

$$\mathrm{Var}\Big(\sum_{i \in g} x_i\Big) = \sum_{i \in g} \mathrm{Var}(x_i) + 2 \sum_{i < j} \mathrm{Cov}(x_i, x_j)$$

By plotting the variance of each coarse-grained group against the sum of the variances of its constituent members across coarse-grained thresholds, we found that the contribution of covariance was weak, meaning that the statistical moments at higher scales can be approximated by those at lower scales (Figs. S10, S11). This result is consistent with previous efforts demonstrating that the strongest correlations between AFDs are typically concentrated among pairs of closely related community members (i.e., low phylogenetic distance) [52], implying that the effect of correlation should dissipate when communities are coarse-grained by their taxonomy or phylogeny. Given that the variance of the sum can be approximated by the sum of the variances and that, by definition, the mean of a sum is the sum of the means, it is reasonable to propose that the statistical moments of coarse-grained AFDs are sufficient to characterize the distribution.
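
The comparison between the variance of the sum and the sum of the variances can be illustrated with a short sketch. The variance of a coarse-grained abundance equals the sum of the member variances plus twice the sum of the pairwise covariances, so the size of the covariance term measures how much dependence between members matters. The helper names and the toy abundances below are hypothetical.

```python
from statistics import mean


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def variance_decomposition(members):
    """Return (variance of the sum, sum of variances, covariance term).

    members: list of per-member abundance sequences, one value per site.
    """
    total = [sum(values) for values in zip(*members)]
    var_of_sum = covariance(total, total)
    sum_of_vars = sum(covariance(m, m) for m in members)
    cov_term = 2 * sum(
        covariance(members[i], members[j])
        for i in range(len(members))
        for j in range(i + 1, len(members))
    )
    return var_of_sum, sum_of_vars, cov_term


# Hypothetical abundances of two OTUs from one group across four sites.
otus = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.1, 0.3, 0.2]]
var_of_sum, sum_of_vars, cov_term = variance_decomposition(otus)
```

When `cov_term` is small relative to `sum_of_vars`, the variance at the coarse scale is well approximated by the variances at the fine scale.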

Finally, while there is no general closed-form solution for the sum of independent gamma distributed random variables with different rate parameters (equivalent to considering the convolution of many AFDs with different carrying capacities), continued progress has been made towards obtaining suitable approximations [53–57]. This body of work includes an analysis demonstrating that a single gamma distribution can provide a suitable approximation to the distribution of the sum of many gamma random variables with different rate parameters [58]. In summary, the gamma distribution successfully captures patterns of biodiversity under taxonomic and phylogenetic coarse-graining because the correlation between community members is typically low and the sum of multiple gamma distributions can be approximated by a single gamma distribution.
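
One simple way to build such a single-gamma approximation is moment matching: choose the shape and rate of the approximating gamma so that its mean and variance equal those of the sum. The sketch below assumes independent members and is an illustrative choice, not necessarily the approximation analyzed in [58]. The function name and example values are hypothetical.

```python
def moment_matched_gamma(means, shapes):
    """Shape and rate of a single gamma matching the mean and variance of a
    sum of independent gamma variables with the given means and shapes."""
    total_mean = sum(means)
    # A gamma with mean m and shape k has variance m**2 / k.
    total_var = sum(m * m / k for m, k in zip(means, shapes))
    shape = total_mean ** 2 / total_var
    rate = total_mean / total_var
    return shape, rate


# Sanity check: two members with identical rates (shape / mean = 10),
# for which the sum is exactly gamma with shape 4 and rate 10.
shape_equal, rate_equal = moment_matched_gamma([0.2, 0.2], [2.0, 2.0])

# Hypothetical heterogeneous group, where the result is only an approximation.
shape_mixed, rate_mixed = moment_matched_gamma([0.3, 0.01], [1.0, 0.5])
```

When all members share one rate parameter the matched gamma coincides with the exact distribution of the sum; otherwise it reproduces only the first two moments.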

Predicting measures of richness and diversity within a coarse-grained scale

Given that the presence or absence of a community member is used to estimate community richness, a measure previously used to make claims about patterns of microbial diversity across taxonomic scales [14], we can visualize the sufficiency of the gamma by predicting the mean richness within an environment at a given coarse-grained threshold (Eq. 11). Likewise, we can use the entirety of the distribution of read counts to predict the diversity within a site, a measure that reflects richness as well as the distribution of abundances within a community (Eq. 12), analytic predictions that we validated through simulations (Fig. S12).

Focusing on the human gut microbiome as an example, we found that we can predict the typical richness of a community across phylogenetic scales using the gamma distribution (Fig. 3a). This success is repeated when we examined the predicted diversity (Fig. 3b). By examining all nine environments, we see that despite the dissimilarity in environments we were able to predict mean richness and diversity in the face of coarse-graining (Fig. 3c,d). The results of this analysis suggest that the composition of microbial communities remains largely invariant in the face of coarse-graining and that the gamma distribution remains a suitable null model for predicting average community measures across coarse-grained scales. Identical results were obtained for taxonomic coarse-graining (Fig. S13).

The gamma distribution successfully predicts mean richness and diversity under phylogenetic coarse-graining.
a) The expected richness predicted by the gamma (Eq. 11) was capable of predicting richness across phylogenetic coarse-graining scales, as illustrated by data from the human gut. b) Predictions remained successful across all environments. c,d) Similarly, predictions of expected diversity (Eq. 12) succeeded across coarse-graining scales among all environments.

Turning to higher-order moments, we examined the variance of richness and diversity across sites. Using a similar approach that was applied to the mean, we derived analytic predictions for the variance (Eq. 15). With the human gut as an example, we see that analytic predictions typically fail to capture estimates of variance obtained from empirical data for phylogenetic coarse-graining (Fig. 4a,b). This lack of predictive success was consistent across environments (Fig. 4c,d), implying that a model of independent species with gamma distributed abundances is insufficient to capture the variance of measures of biodiversity. A major assumption made in our derivation was that species are independent, an assumption that is almost certainly wrong given that the gamma distribution has been previously demonstrated to be unable to capture the empirical distribution of correlations in the AFDs of community members [40]. To attempt to remedy this failed prediction, we again turned to the law of total variance by estimating the covariance of richness and diversity from empirical data and adding the covariance to the predicted variance for each measure. We found that the addition of this empirical estimate was sufficient to predict the observed variance in the human gut (Fig. 4a,b) as well as across environments (Fig. 4e,f), implying that the underlying model is fundamentally correct for predicting the first moment of measures of biodiversity but cannot capture the correlations necessary to explain higher statistical moments such as the variance. Identical results were again obtained with taxonomic coarse-graining (Fig. S14).

The gamma can only predict the variance of richness and diversity under phylogenetic coarse-graining when covariance is included.
a,b) In contrast with the mean, the variance of richness and diversity estimates predicted by the gamma (Eq. 15) assuming independence among species fails to capture empirical estimates from the human gut. Predictions are only comparable when empirical estimates of covariance are included in the predictions of the gamma distribution. c,d) This lack of predictive success is constant across environments, e,f) though the addition of covariance consistently improves our analytic predictions.

Predicting patterns of richness and diversity between coarse-grained scales

Our predictions of the statistical moments of richness and diversity using the gamma distribution provided the foundation necessary to investigate macroecological patterns between different coarse-grained scales. One such prominent pattern is the relationship between the fine-grain richness/diversity within a given coarse-grained group vs. the coarse-grained richness/diversity among all remaining groups (e.g., the number of classes within Firmicutes vs. the number of phyla excluding the phylum Firmicutes), a pattern that has been purported to reflect the existence of DBD in microbial systems. Before continuing, we note that the acronym DBD technically refers to the hypothesis that such positive relationships reflect the existence of ecological interactions through which coarse-grained diversity bolsters the accumulation of fine-grained diversity (e.g., niche construction [16, 59]). Since we are primarily interested in the predictive power of an empirically-validated null model of biodiversity, we distinguish between DBD as a hypothesis and DBD as an empirical pattern by referring to the slope as the fine vs. coarse-grained relationship throughout the remainder of this manuscript.

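The fine vs. coarse-grained relationship can be quantified as the slope of the relationship between the fine-grained richness within a given coarse-grained group g (S_g,m) and the richness in the remaining G − 1 coarse-grained groups: S_g,m ∝ αS_G\g,m, where G \ g denotes the exclusion of group g and α is the slope of the relationship. This formulation was proposed in Madi et al. and to ensure commensurability we adopted it here [14]. Using Eq. 1, we can define the expectation of each of these estimators in terms of the sampling form of the gamma

$$\langle S_{g,m} \rangle = \sum_{i \in g} \left(1 - \Pr[0 \mid N_m, \bar{x}_i, \beta_i]\right), \qquad \langle S_{G \setminus g, m} \rangle = \sum_{g' \in G \setminus g} \left(1 - \Pr[0 \mid N_m, \bar{x}_{g'}, \beta_{g'}]\right) \qquad (3)$$

where $\Pr[0 \mid N_m, \bar{x}, \beta]$ is the probability of obtaining zero reads out of N_m total reads in site m for a community member (or coarse-grained group) with mean relative abundance $\bar{x}$ and shape parameter $\beta$.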

Similarly, we can use Eq. 12 to derive predictions for fine and coarse-grained diversity.

By repeating this calculation for all M sites, we obtained vectors of coarse and fine-grained richness estimates for group g from which we can infer the fine vs. coarse-grained slope through ordinary least squares regression. By repeating this process for all G groups we obtained a distribution of slopes that can be directly compared to those obtained from empirical data.
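
The slope inference can be sketched as follows. For illustration the sketch starts from observed presences rather than from predicted probabilities: for each site it counts the fine-grained members of a focal group and the number of other coarse-grained groups present, then regresses the former on the latter by ordinary least squares. All names and the toy data are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def fine_vs_coarse_slope(presence, otu_group, focal_group):
    """Slope of fine-grained richness within focal_group against
    coarse-grained richness among the remaining groups.

    presence:  list of sites, each a set of observed OTU ids
    otu_group: dict mapping OTU id -> coarse-grained group label
    """
    fine, coarse = [], []
    for observed in presence:
        # Fine-grained richness: members of the focal group that are present.
        fine.append(sum(1 for o in observed if otu_group[o] == focal_group))
        # Coarse-grained richness: other groups with at least one member present.
        other_groups = {otu_group[o] for o in observed} - {focal_group}
        coarse.append(len(other_groups))
    return ols_slope(coarse, fine)


# Hypothetical data: three sites, focal group "A" with three OTUs.
otu_group = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "c1": "C", "d1": "D"}
sites = [
    {"a1", "b1"},
    {"a1", "a2", "b1", "c1"},
    {"a1", "a2", "a3", "b1", "c1", "d1"},
]
slope = fine_vs_coarse_slope(sites, otu_group, "A")
```

Repeating the call for every group yields a distribution of slopes. Replacing the observed counts with expected richness values under the gamma would give the corresponding null distribution.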

Before performing a direct comparison, we first note the features of the empirical slopes and how they pertain to the predictions we obtained. By examining the distribution of empirical slopes pooled over all coarse-graining thresholds for each environment, we see that they are rarely less than zero (Figs. 5a, S19a). The few negative slopes inferred from empirical data were extremely small, having absolute values < 10⁻⁴ and can be treated as zeros. Furthermore, the distribution of slopes follows the same form across environments, suggesting that the fine vs. coarse-grained slope reflects a general feature of community sequence data rather than the ecology of specific environments. Like the empirical slopes, the gamma distribution virtually always predicted a positive slope for all environments for both taxonomic and phylogenetic coarse-graining. This paucity of negative slopes suggests that the prediction of the alternative to the DBD hypothesis, the Ecological Controls hypothesis [60], is virtually absent in empirical data and cannot be generated from an empirically validated null model of microbial biodiversity.

The gamma distribution as a tool for investigating the novelty of fine vs. coarse-grained slopes.
a,b) The predictions of the gamma distribution (Eq. 3) successfully reproduce observed fine vs. coarse-grained richness slopes across scales of phylogenetic coarse-graining. c,d) In contrast, the predictions of the gamma distribution fail to capture diversity slopes (Eq. 4).

However, only observing positive slopes does not necessarily provide support to the DBD hypothesis. A direct comparison of slopes predicted from the gamma distribution to those inferred from empirical data is necessary to determine whether the predictions of DBD lie outside what can be reasonably captured by an interaction-free model such as the SLM. To evaluate the novelty of the fine vs. coarse-grained slopes we compared the values of observed slopes to those obtained from the interaction-free SLM. We found that the predictions of the gamma distribution closely matched the observed slopes across environments for both taxonomic and phylogenetic coarse-graining (Figs. S15, S16). We can consolidate these results by taking the average slope for a given coarse-grained level, from which we see that the mean slope predicted by the gamma distribution does a reasonable job capturing empirical slopes across environments (Figs. 5b, S19b). These results indicate that we should expect to see a positive relationship between richness estimates at different scales and that the relationships we observe can be quantitatively captured by a gamma distributed AFD. It is worth noting that the fine vs. coarse-grained slope could be sufficiently predicted even though the gamma distribution only succeeded at predicting average richness, suggesting that higher order statistical moments, and by extension interactions between community members, are unnecessary to quantitatively capture the positive relationship observed between fine and coarse-grained estimates of richness.

While richness is a widespread and versatile estimator that is commonly used in community ecology, it can neglect a considerable amount of information by focusing on presences and absences instead of the entirety of the distribution of abundances. To rigorously test the predictive power of the gamma distribution it is necessary to evaluate the fine vs. coarse-grained relationship for diversity. We again found that disparate environments had similar distributions of slopes from empirical data (Figs. 5c, S19c), suggesting that the slope of the relationship is likely a general property of microbial communities rather than an environment-specific pattern. However, unlike richness, diversity predictions obtained from the gamma distribution generally failed to capture observed slopes, as the squared correlation between observed and predicted slopes can be less than that of richness by over an order of magnitude (Figs. 5d, S17, S18, S19d). Here we see where the predictions of an interaction-free SLM succeeded and failed to predict observed macroecological patterns.

Given that the gamma distribution failed to predict the observed diversity slope, it is worth evaluating whether additional features could be incorporated to generate successful predictions. A notable omission is that there is an absence of interactions between community members in the SLM, meaning that we were unable to predict correlations between community member abundances. However, while considerable progress has been made (e.g., [11]), predicting the observed distribution of correlation coefficients between community members while accounting for sampling remains a non-trivial task. Given that the gamma distribution succeeded at predicting other macroecological patterns, we elected to perform a simulation where a collection of sites was modeled as an ensemble of communities with correlated gamma-distributed AFDs with the means, variances, correlations, and total depth of sampling set by estimates from empirical data (Materials and methods). By including correlations between AFDs into the simulations, the statistical outcome of ecological interactions between community members, we were able to largely capture the observed fine vs. coarse-grained diversity slopes (Figs. 6, S20, S21, S22). These results suggest that rather than diversity at a fine-scale begetting diversity at a coarse-scale, the correlations that exist at a fine-scale do not dissipate under coarse-graining, resulting in a positive relationship between measures of diversity at different scales.

Gamma distribution simulations with correlations capture observed diversity slopes.
Observed fine vs. coarse-grained diversity slopes can be quantitatively reproduced under phylogenetic coarse-graining by simulating correlated gamma distributed AFDs at the OTU-level.

Discussion

The results of this study demonstrate that macroecological patterns in microbial communities remain largely invariant across taxonomic and phylogenetic scales. By focusing on the predictions of the SLM, an interaction-free model of microbial growth under environmental fluctuations, we were able to evaluate the extent to which measures of biodiversity can be predicted under coarse-graining. We were largely able to predict said measures using the same model with zero free parameters (i.e., no statistical fitting) across scales, implying that certain macroecological patterns of microbial communities remain self-similar across taxonomic and phylogenetic scales. Building on this result, we investigated the dependence of community measures between different degrees of coarse-graining, a pattern that has been formalized as the Diversity Begets Diversity hypothesis [14, 15]. The prediction derived from the sampling form of the gamma distribution quantitatively captured the observed slopes of the fine vs. coarse-grained relationship for richness, while it failed to capture the slope of diversity. We showed that introducing correlations between abundance fluctuations in the SLM allows the observed slopes of diversity to be recovered.

Our richness results complement past work which demonstrated that occupancy, the constituent of richness, is highly dependent on sampling depth (i.e., total read count) and the mean abundance of a community member [40]. This past work and the relationships between the mean abundance and occupancy evaluated in this manuscript demonstrate that occupancy alone is unlikely to contain ecological information that is not already captured by the distribution of abundances across hosts (i.e., the AFD). Our analyses of the relationship between fine and coarse-grained richness support this conclusion, as predictions derived from a gamma distribution quantitatively captured the observed slope. The success of an interaction-free model in predicting fine vs. coarse-grained slopes is an indictment of the appropriateness of estimators that rely solely on the presence of a community member for identifying novel macroecological patterns, a measure that has been used to bolster support for the DBD hypothesis at the level of 16S rRNA amplicons as well as strains [14, 61]. Rather, estimates of richness harbor little information about the dynamics of a community across taxonomic and phylogenetic scales that is not already captured by the sampling form of the gamma. Contrasting with richness, the predictions of diversity from the gamma sampling distribution were unable to capture fine vs. coarse-grained relationships from empirical data. Given that measures of diversity incorporate information about the richness and evenness of a community [62], the comparative deficiency of our predictions for fine vs. coarse-grained diversity suggests that forms of the SLM that neglect interactions between community members cannot capture coarse-graining relationships that depend on the evenness of the distribution of abundances.

Macroecological patterns are not imbued with mechanistic explanation [63]. Rather, the onus is on the investigator to identify plausible mechanisms. Often in ecology this task is made easier by eliminating mechanisms that are incapable of producing the observed pattern, that is, identifying an appropriate null. The novelty of the fine vs. coarse-grained relationship was previously assessed using a null model which assumed that community dynamics were primarily driven by demographic noise (i.e., the UNTB) [14, 32]. Empirical patterns of microbial abundance cannot be reasonably captured by such models, making predictions obtained from the UNTB invalid for evaluating the novelty of microbial macroecological patterns. In contrast, models that combine self-limiting growth with environmental noise reproduce several empirical patterns, making the SLM an appropriate choice for evaluating the novelty of fine vs. coarse-grained relationships [40, 44]. This is not a trivial detail, as there is historical precedent for the need to identify an appropriate null in order to investigate how fine and coarse-grained measures of biodiversity relate to one another. Indeed, one of the earliest adoptions of null model analysis in ecology was done to investigate the ratio of species to genera in a community [64, 65].

In this study, the predictions of the sampling form of the gamma considerably improved when correlations between community members were included. This result suggests that rather than exclusively pointing to niche construction as previously suggested [14], any ecological mechanism that can capture the observed distribution of correlation coefficients is a plausible candidate. Given that models of consumer-resource dynamics have succeeded in capturing macroecological patterns [66, 67], including quantitatively predicting the distribution of correlation coefficients [11], it is reasonable to suggest that such mechanisms are ultimately responsible for the relationship between fine and coarse-grained measures of diversity and can be reduced to phenomenological models such as the SLM. Indeed, experimental investigations of the slope evaluated here have found that positive slopes exist in artificial communities maintained in a laboratory setting, where the strength of the correlation between fine and coarse-grained scales is driven by the secretion of secondary metabolites [9]. This phenomenon, known as cross-feeding, can be viewed as a mechanism that is compatible with the concept of niche construction [16] and the original interpretation of Madi et al. [14].

In the interest of providing macroecological insight into the DBD hypothesis, we solely focused on coarse-graining procedures that relied on phylogenetic reconstruction and taxonomic assignment. However, it is worth noting that it is also possible to coarse-grain community members by the strength of their correlations (i.e., sum the abundances of each pair of community members with the strongest correlation in AFDs). This procedure has been named the phenomenological renormalization group method due to its ability to identify if and where a system is stable despite knowing little about the system’s dynamics (i.e., fixed points in nonlinear systems) [68, 69].

However, given that the AFD correlation between two community members is often inversely related to their phylogenetic distance, such an analysis would likely be redundant, as coarse-graining based on the strength of correlation would effectively coarse-grain the most closely related community members [52].

A major goal of this study was to evaluate the novelty of macroecological patterns that were used to bolster support for the DBD hypothesis. We used the same dataset in order to ensure generality and commensurability with past research efforts. However, it is worth inspecting how the use of a global survey dataset constrains the inferences one can make. Throughout this study we implicitly assumed that an ensemble approach is valid, meaning that we viewed different sites/hosts as virtual copies of a given environment. This assumption is valid for time-series studies where the distribution of microbial abundances remains stationary with respect to time [70], as the stationary solution of the SLM has successfully characterized microbial community time-series at both the level of OTUs [40] and strains [47]. So, in communities that are stationary with respect to time, our fine vs. coarse-grained relationship results will remain valid.

Materials and methods

Data acquisition and processing

To ensure that our analyses were generalizable across ecosystems and commensurate with prior DBD investigations, we used amplicon sequence data from the V4 region of the 16S rRNA gene generated and curated by the Earth Microbiome Project [1, 14]. We restricted our analysis to the quality control (QC)-filtered subset of the EMP, which was annotated using the closed-reference database SILVA [71] and consists of 96 studies comprising 23,828 total samples, with each processed sample having ≥10,000 reads. We downloaded the public SILVA reference tree for OTUs with 97% similarity, 97_otus.tre, from the EMP database. We identified nine heavily sampled environments in the metadata file emp_qiime_mapping_qc_filtered.tsv and selected 100 random samples from each environment.

We briefly note that our occupancy and richness predictions depend on the form of the gamma distribution that explicitly accounts for sampling as a multinomial process. The multinomial distribution describes the probability of sampling n reads given a relative abundance of x and total read count N with replacement, a process we can model as the Poisson limit of a binomial sampling process for individual community members. Given this choice and the past success of the gamma distribution, we deviated from past analyses by electing to not sub-sample read counts to the same depth across samples, as the process of sampling without replacement would bias the sampling distribution for rare community members [14].

Coarse-graining protocol

Taxonomic coarse-graining was performed as the summation of the abundances of all OTUs within a given taxonomic group. We removed taxa with indeterminate labels (e.g., “uncultured”, “ambiguous taxa”, “candidatus”, “unclassified”, etc.) to prevent potential biases due to taxonomic misassignment. Manual inspection of EMP taxonomic annotations revealed a low number of OTUs that had been assigned the taxonomic label of their host (e.g., Arachis hypogaea (peanut)). These marked OTUs were removed from all downstream analyses.
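The summation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `coarse_grain`, the dictionary layout, the toy counts, and the substring matching against indeterminate labels are all assumptions made for the example.

```python
from collections import defaultdict

# Labels treated as indeterminate; the text lists these as examples ("etc." is not expanded here).
INDETERMINATE = ("uncultured", "ambiguous taxa", "candidatus", "unclassified")


def coarse_grain(otu_counts, otu_to_group):
    """Sum OTU read counts within each taxonomic group, skipping indeterminate labels."""
    grouped = defaultdict(int)
    for otu, count in otu_counts.items():
        group = otu_to_group.get(otu)
        if group is None or any(tag in group.lower() for tag in INDETERMINATE):
            continue  # drop OTUs without a usable label at this rank
        grouped[group] += count
    return dict(grouped)


# Toy sample: read counts per OTU and a genus-level annotation (made-up values).
counts = {"otu1": 40, "otu2": 10, "otu3": 25, "otu4": 5}
genus = {
    "otu1": "Bacillus",
    "otu2": "Bacillus",
    "otu3": "Pseudomonas",
    "otu4": "uncultured bacterium",
}
genus_counts = coarse_grain(counts, genus)
```

Applying the same function with a family-level or order-level mapping would give the next coarse-grained scale from the same OTU table.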

Phylogenetic coarse-graining was performed using the phylogenetic tree provided by SILVA, SILVA_123_97_otus.tre, in the EMP release. Each internal node of the phylogenetic tree was collapsed if the mean branch length of its descendants was less than a given distance. All phylogenetic operations were performed using the Python package ETE3 [72].
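One way to read the collapsing rule is sketched below on a toy tree, written in plain Python rather than ETE3 so that the example is self-contained. The `Node` class, the top-down traversal, and the interpretation of the criterion as the mean distance from a node to the leaves beneath it are assumptions of this sketch; the authors' implementation may differ.

```python
class Node:
    def __init__(self, name=None, dist=0.0, children=()):
        self.name = name              # leaf label (OTU identifier); None for internal nodes
        self.dist = dist              # branch length to the parent
        self.children = list(children)


def leaf_distances(node):
    """Return (leaf name, distance from `node`) for every leaf below `node`."""
    if not node.children:
        return [(node.name, 0.0)]
    out = []
    for child in node.children:
        out.extend((name, d + child.dist) for name, d in leaf_distances(child))
    return out


def collapse(node, threshold):
    """Top-down pass: group the leaves under the first node whose mean leaf distance < threshold."""
    leaves = leaf_distances(node)
    mean_dist = sum(d for _, d in leaves) / len(leaves)
    if not node.children or mean_dist < threshold:
        return [sorted(name for name, _ in leaves)]
    groups = []
    for child in node.children:
        groups.extend(collapse(child, threshold))
    return groups


# Toy tree ((A:0.01,B:0.02):0.2,(C:0.3,D:0.4):0.1); branch lengths are made up.
tree = Node(children=[
    Node(dist=0.2, children=[Node("A", 0.01), Node("B", 0.02)]),
    Node(dist=0.1, children=[Node("C", 0.3), Node("D", 0.4)]),
])
fine_groups = collapse(tree, threshold=0.05)
coarse_groups = collapse(tree, threshold=0.5)
```

Each returned group would then be treated as a single coarse-grained community member whose abundance is the sum of the abundances of its OTUs.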

Deriving biodiversity measure predictions

While the gamma distribution as the stationary solution of the SLM and the sampling form of the gamma have been previously derived [40], we briefly outline relevant derivations here for the convenience of the reader before deriving the predicted richness and diversity of a community. We define the SLM as the following Langevin equation

$$\frac{dx_i}{dt} = \frac{x_i}{\tau_i}\left(1 - \frac{x_i}{K_i}\right) + \sqrt{\frac{\sigma_i}{\tau_i}}\, x_i\, \eta(t) \tag{5}$$

Here τ_i, K_i, and σ_i represent the timescale of growth, the carrying capacity, and the coefficient of variation of growth rate fluctuations, respectively. Multiplicative environmental noise is captured by the product of a linear frequency term, the coefficient of variation of growth rate fluctuations, and a Brownian noise term η(t) that introduces stochasticity into the equation. The expected value of η(t) is ⟨η(t)⟩ = 0 [73].

The correlation between the noise at time t and at a different time t′ is defined as ⟨η(t)η(t′)⟩ = δ(t − t′) [73]. This standard definition means that if the noise term is shifted in time, it has zero correlation with itself. We briefly note that because DBD patterns were originally investigated in Madi et al. using an ensemble of sites that belong to the same type of ecosystem rather than the time series of a single site [14], the gamma distribution alone does not prove the validity of the SLM, nor that of alternatively formulated stochastic differential equations of ecology that also predict a gamma distribution (e.g., [74]). However, given that the SLM has successfully characterized the temporal dynamics of microbial communities, we believe that this model is an appropriate formulation for investigating DBD patterns [40, 44, 47].

In contrast to the SLM, macroecological predictions can be derived from the UNTB. There are many forms of the UNTB, but the novelty of observed fine vs. coarse-grained relationships was assessed using a form of the UNTB that predicts that the distribution of community member abundances within a given site follows a zero-sum multinomial distribution [14, 32]. Formulations of the UNTB generally seek to capture ecological dynamics in a community without fixed carrying capacities that are driven by demographic noise, so for the purpose of this study the SLM can be viewed in contrast to the following Langevin model of neutral ecology: .

The stationary distribution of the SLM can be derived using the Itô ↔ Fokker-Planck equivalence and taking the stationary solution [40, 75], resulting in the gamma distributed AFD. Through the SLM, we can define the mean relative abundance and its squared inverse coefficient of variation as x̄_i = K_i(1 − σ_i/2) and β_i = 2/σ_i − 1, respectively. These are parameters that are estimated from the empirical data and will be used below to obtain predictions. Using these definitions and the stationary distribution of Eq. 5, we obtain the gamma distribution

$$P(x_i \mid \bar{x}_i, \beta_i) = \frac{1}{\Gamma(\beta_i)} \left(\frac{\beta_i}{\bar{x}_i}\right)^{\beta_i} x_i^{\beta_i - 1} \exp\left(-\frac{\beta_i}{\bar{x}_i}\, x_i\right) \tag{6}$$
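The link between the Langevin equation and the gamma AFD can be checked numerically. The sketch below integrates a single community member with the Euler-Maruyama scheme, assuming the Itô form dx/dt = (x/τ)(1 − x/K) + √(σ/τ) x η(t) used in [40], and compares the time-averaged abundance with the stationary mean K(1 − σ/2) of the corresponding gamma distribution. The parameter values, step size, and function name are illustrative assumptions, not values from the study.

```python
import math
import random


def simulate_slm(tau, K, sigma, x0, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dx/dt = (x/tau)(1 - x/K) + sqrt(sigma/tau) * x * eta(t)."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        drift = (x / tau) * (1.0 - x / K)
        # White noise with <eta(t) eta(t')> = delta(t - t') integrates to N(0, dt) over one step.
        noise = math.sqrt(sigma / tau) * x * rng.gauss(0.0, math.sqrt(dt))
        x = x + drift * dt + noise
        path.append(x)
    return path


tau, K, sigma = 1.0, 1.0, 0.5            # illustrative values only
path = simulate_slm(tau, K, sigma, x0=0.5, dt=0.01, n_steps=200_000)
stationary = path[20_000:]                # discard the initial transient
time_mean = sum(stationary) / len(stationary)
expected_mean = K * (1.0 - sigma / 2.0)   # mean of the stationary gamma AFD
```

Because the noise is multiplicative, the abundance stays positive for small step sizes, and the long-run average should settle near `expected_mean` rather than near the carrying capacity K.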

When we sequence microbial communities, we do not directly observe abundances. Rather, we obtain read counts from sequencing. Therefore, it is necessary to account for the reality of sampling when we apply Eq. 6 to empirical data. We can account for sampling by first assuming that the probability of observing a single community member can be modeled as a binomial sampling process. Given that the total number of reads is typically large (N ≫ 1) and the typical relative abundance of a community member is much smaller than one (x_i ≪ 1), the binomial can be approximated as a Poisson sampling process with the following probability of sampling n reads

$$P(n \mid N, x_i) = \frac{(x_i N)^{n}}{n!}\, e^{-x_i N} \tag{7}$$
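The quality of the Poisson approximation in the regime N ≫ 1, x_i ≪ 1 can be illustrated directly. The following sketch compares the binomial and Poisson probabilities of observing n reads; the sampling depth and relative abundance are made-up values chosen only to sit in that regime.

```python
import math


def binomial_pmf(n, N, x):
    """Probability of n reads out of N for a community member with relative abundance x."""
    return math.comb(N, n) * x**n * (1.0 - x) ** (N - n)


def poisson_pmf(n, N, x):
    """Poisson limit of the binomial, appropriate when N >> 1 and x << 1."""
    lam = x * N
    return math.exp(-lam) * lam**n / math.factorial(n)


N, x = 10_000, 2e-4   # illustrative depth and relative abundance
max_abs_diff = max(abs(binomial_pmf(n, N, x) - poisson_pmf(n, N, x)) for n in range(20))
poisson_total = sum(poisson_pmf(n, N, x) for n in range(20))
```

For these values the two distributions differ by far less than one part in a thousand at every read count, which is why the Poisson form is a convenient stand-in for the binomial.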

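This formulation of the sampling process is convenient, as it can be used to obtain an analytic solution for the probability of observing n reads given x̄_i and β_i, the parameters we estimate from the data. This distribution can be obtained by solving the convolution of the Poisson and the gamma [40]. The result allows us to calculate the probability of obtaining n reads out of a total sampling depth of N for the ith OTU as

$$P(n \mid N, \bar{x}_i, \beta_i) = \frac{\Gamma(\beta_i + n)}{n!\,\Gamma(\beta_i)} \left(\frac{\bar{x}_i N}{\beta_i + \bar{x}_i N}\right)^{n} \left(\frac{\beta_i}{\beta_i + \bar{x}_i N}\right)^{\beta_i} \tag{8}$$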
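A gamma-distributed abundance observed through Poisson sampling yields a negative binomial distribution of read counts, which is the standard result assumed in the sketch below. The code evaluates that distribution in log space and checks two properties one would expect of the sampling form of the gamma: it is normalized, and its mean equals x̄N. The function name and parameter values are illustrative assumptions.

```python
import math


def sampling_gamma_pmf(n, N, mean_x, beta):
    """P(n | N, mean_x, beta) for a gamma-distributed abundance under Poisson sampling."""
    log_p = (
        math.lgamma(beta + n) - math.lgamma(n + 1) - math.lgamma(beta)
        + n * math.log(mean_x * N / (beta + mean_x * N))
        + beta * math.log(beta / (beta + mean_x * N))
    )
    return math.exp(log_p)


N, mean_x, beta = 10_000, 1e-3, 2.0   # illustrative values only
pmf = [sampling_gamma_pmf(n, N, mean_x, beta) for n in range(2_000)]
total = sum(pmf)
mean_reads = sum(n * p for n, p in enumerate(pmf))
prob_zero_reads = pmf[0]
```

The zero class, `prob_zero_reads`, reduces to (1 + x̄N/β)^(−β), which is the quantity that enters predictions of occupancy and richness.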

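We will be using this equation to obtain predictions of measures of biodiversity. First, noticing that the probability of a community member’s absence is the complement of its presence, we can define the expected occupancy of a community member across M sites as

$$\langle o_i \rangle = \frac{1}{M} \sum_{m=1}^{M} \left[1 - P(0 \mid N_m, \bar{x}_i, \beta_i)\right] = 1 - \frac{1}{M} \sum_{m=1}^{M} \left(1 + \frac{\bar{x}_i N_m}{\beta_i}\right)^{-\beta_i} \tag{9}$$

where N_m is the total read count of the mth site.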

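The success of our predictions was assessed using the relative error

$$\epsilon = \frac{\left|x_{\mathrm{obs}} - x_{\mathrm{pred}}\right|}{\left|x_{\mathrm{obs}}\right|} \tag{10}$$

where x_obs and x_pred are the observed and predicted values of a given measure.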

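Using the definition of occupancy from the sampling form of the gamma, we derive the expected richness of a community with a total read count of N_m as

$$\langle S(N_m) \rangle = \sum_{i=1}^{S_{\mathrm{tot}}} \left[1 - P(0 \mid N_m, \bar{x}_i, \beta_i)\right] \tag{11}$$

where S_tot is the total number of community members observed across sites.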
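Both occupancy and richness depend only on the probability of drawing zero reads, so they can be computed with very little code. The sketch below assumes the zero class of the sampling form of the gamma, (1 + x̄N/β)^(−β), and uses made-up parameters for three community members and two sites; the function names are hypothetical.

```python
def prob_absent(N, mean_x, beta):
    """Zero class of the sampling gamma: (1 + mean_x * N / beta) ** (-beta)."""
    return (1.0 + mean_x * N / beta) ** (-beta)


def expected_occupancy(depths, mean_x, beta):
    """Average presence probability of one community member across sites with read depths `depths`."""
    return sum(1.0 - prob_absent(N, mean_x, beta) for N in depths) / len(depths)


def expected_richness(N, mean_xs, betas):
    """Sum of presence probabilities over community members at a single site of depth N."""
    return sum(1.0 - prob_absent(N, m, b) for m, b in zip(mean_xs, betas))


# Made-up parameters for three community members and two sites.
mean_xs = [1e-2, 1e-3, 1e-5]
betas = [1.0, 2.0, 0.5]
depths = [10_000, 50_000]
occupancy_rare = expected_occupancy(depths, mean_xs[2], betas[2])
richness_shallow = expected_richness(depths[0], mean_xs, betas)
richness_deep = expected_richness(depths[1], mean_xs, betas)
```

In this toy case the predicted richness rises with sampling depth even though the underlying abundances are unchanged, which illustrates why variation in read depth alone can generate relationships between richness estimates.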

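Similarly, we can derive the expected diversity as

$$\langle H(N_m) \rangle = -\sum_{i=1}^{S_{\mathrm{tot}}} \int_{0}^{N_m} \frac{n}{N_m} \ln\!\left(\frac{n}{N_m}\right) P(n \mid N_m, \bar{x}_i, \beta_i)\, dn \tag{12}$$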

These predictions neglect interactions between community members and, in physics parlance, are known as mean-field predictions. We can calculate the mean-field prediction of Eq. 11 from empirical data. However, there is no analytic solution for the integral inside the sum of Eq. 12. To calculate ⟨H⟩, we performed numerical integration on each integral for each taxon in each sample at a given coarse-grained resolution using the quad() function from SciPy.

To predict the variance of each measure we derived the expected value of the second moment, assuming independence among community members. We derive the second moments of richness and diversity

where δ_i,j is the Kronecker delta.
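As a sketch of how the independence assumption enters the second moment of richness, write p_i = 1 − P(0 | N_m, x̄_i, β_i) for the probability that the ith community member is present at a site (a shorthand introduced here for brevity, not the authors' notation). Presence is an indicator variable, so its square equals itself, and under independence the cross terms factorize. One would then expect a second moment of the following form, from which the variance follows:

```latex
\begin{align}
\langle S^2 \rangle
  &= \sum_{i=1}^{S_{\mathrm{tot}}} \sum_{j=1}^{S_{\mathrm{tot}}}
     \left[ (1 - \delta_{i,j})\, p_i\, p_j + \delta_{i,j}\, p_i \right], \\
\langle S^2 \rangle - \langle S \rangle^2
  &= \sum_{i=1}^{S_{\mathrm{tot}}} p_i \left( 1 - p_i \right).
\end{align}
```

The second line is the variance of a sum of independent Bernoulli variables, so any excess variance in observed richness relative to this form would point to dependence among community members.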

By performing an analogous series of operations, we obtain the expected value of the second moment for diversity.

Where the expected value of the second moment of the diversity term is defined as . From which we obtain the expected value of the variance

Fine vs. coarse-grained slope inference

In order to predict the relationship between the measures within a coarse-grained group and that among all remaining groups, we calculated a vector of predicted richness or diversity estimates for all sites using Eq. 3a or Eq. 4a within a given coarse-grained group and Eq. 3b or Eq. 4b among the remaining groups. This “leave-one-out” procedure was originally implemented in Madi et al., where the authors examined the slope of fine vs. coarse-grained measures of diversity as a sliding window across taxonomic ranks with both the fine and coarse scales increasing with each rank (e.g., genus:family, family:order, etc.) [14]. To maintain consistency, we used the same definition for our predictions. We also extended the definition to the case of phylogenetic coarse-graining, where we compared fine and coarse scales using different phylogenetic distances while retaining the same ratio (e.g., 0.1:0.3, 0.3:0.5, etc.).

Slopes were estimated using ordinary least squares regression with SciPy. We only inferred the slope if a fine-grained group had at least five members, and we only examined the slopes of a given coarse-grained threshold if at least three slopes could be inferred. Throughout the manuscript the success of a prediction was evaluated by calculating its relative error, the absolute difference between the observed and predicted values divided by the observed value.
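The slope inference and its two filters can be sketched as follows. The study used SciPy for the regression; the example below substitutes a hand-written ordinary least squares slope so that it runs without dependencies. The data layout (one dictionary per coarse-grained group), the choice of the fine-grained measure as the response variable, and all numerical values are assumptions made for illustration.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def relative_error(observed, predicted):
    """Absolute difference between observation and prediction, scaled by the observation."""
    return abs(observed - predicted) / abs(observed)


def slopes_for_threshold(groups, min_members=5, min_slopes=3):
    """Apply the two filters from the text; return None if too few slopes can be inferred."""
    slopes = [ols_slope(g["coarse"], g["fine"]) for g in groups if g["n_members"] >= min_members]
    return slopes if len(slopes) >= min_slopes else None


# Toy per-site measures for four groups at one coarse-grained threshold (made-up values).
groups = [
    {"n_members": 6, "coarse": [1, 2, 3, 4], "fine": [1.0, 1.5, 2.0, 2.5]},
    {"n_members": 5, "coarse": [1, 2, 3, 4], "fine": [2.0, 2.2, 2.4, 2.6]},
    {"n_members": 9, "coarse": [1, 2, 3, 4], "fine": [0.5, 1.5, 2.5, 3.5]},
    {"n_members": 4, "coarse": [1, 2, 3, 4], "fine": [3.0, 3.0, 3.0, 3.0]},  # too few members
]
slopes = slopes_for_threshold(groups)
too_few = slopes_for_threshold(groups[:2])
```

Running the same function on observed and predicted measures gives paired slopes whose agreement can then be summarized with `relative_error`.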

Simulating communities of correlated gamma-distributed AFDs

Correlated gamma-distributed AFDs were simulated by performing inverse transform sampling. For each environment with M sites, an M × S_obs matrix Z was drawn from a multivariate Gaussian distribution with standard Gaussian marginals and the empirical S_obs × S_obs correlation matrix of relative abundances. The matrix of cumulative probabilities U = Φ_Gaus.(Z) was obtained by applying the standard Gaussian cumulative distribution function to Z, and a matrix of SADs was obtained by applying the percent point function (i.e., the inverse cumulative distribution function) of the gamma distribution to U, using the empirical mean relative abundances and squared inverse coefficients of variation of abundances: x̄ = (x̄_1, x̄_2, ⋯, x̄_{S_obs}) and β = (β_1, β_2, ⋯, β_{S_obs}). To simulate the process of sampling, each SAD (i.e., each row) of the resulting M × S_obs matrix of true relative abundances was sampled using a multinomial distribution with the empirical distribution of total read counts.
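The inverse transform procedure can be sketched with the Python standard library alone. The example below is not the authors' implementation: it uses a hand-written Cholesky factorization to correlate the Gaussian draws, a bisection search in place of a library gamma percent point function, a fixed read depth instead of the empirical distribution of read counts, and made-up means, β values, and correlations.

```python
import math
import random
from statistics import NormalDist


def gamma_cdf(x, shape):
    """Regularized lower incomplete gamma function, evaluated with its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / shape
    k = 1
    while term > 1e-16 * total:
        term *= x / (shape + k)
        total += term
        k += 1
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))


def gamma_ppf(u, shape, scale):
    """Gamma percent point function by bisection (a stand-in for a library routine)."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < u:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < u:
            lo = mid
        else:
            hi = mid
    return scale * 0.5 * (lo + hi)


def cholesky(a):
    """Lower-triangular factor L with L L^T = a, for a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def simulate_sites(n_sites, corr, mean_xs, betas, depth, seed=0):
    """Return (true abundances, read counts) for n_sites correlated gamma AFDs."""
    rng = random.Random(seed)
    low, n_taxa, phi = cholesky(corr), len(mean_xs), NormalDist().cdf
    abundances, reads = [], []
    for _ in range(n_sites):
        z0 = [rng.gauss(0.0, 1.0) for _ in range(n_taxa)]
        z = [sum(low[i][k] * z0[k] for k in range(i + 1)) for i in range(n_taxa)]
        # Gaussian CDF gives uniforms; the gamma ppf (shape beta, scale mean/beta) gives abundances.
        x = [gamma_ppf(phi(zi), b, m / b) for zi, m, b in zip(z, mean_xs, betas)]
        counts = [0] * n_taxa
        for taxon in rng.choices(range(n_taxa), weights=x, k=depth):
            counts[taxon] += 1  # multinomial draw; weights are normalized internally
        abundances.append(x)
        reads.append(counts)
    return abundances, reads


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


corr = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]  # made-up correlations
abund, reads = simulate_sites(300, corr, mean_xs=[0.3, 0.2, 0.5], betas=[2.0, 2.0, 2.0], depth=1000)
r01 = pearson([row[0] for row in abund], [row[1] for row in abund])
r02 = pearson([row[0] for row in abund], [row[2] for row in abund])
mean0 = sum(row[0] for row in abund) / len(abund)
```

Note that the correlation imposed on the Gaussian draws is not exactly the correlation of the resulting gamma-distributed abundances, so `r01` is expected to be somewhat below 0.8; the marginal means and β values are preserved by construction.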

Supporting information

Supplemental Information

Data and code availability

All sequencing data used in this study were obtained from the EMP (URL: ftp://ftp.microbio.me/emp/release1/). Processed data are available on Zenodo, DOI: 10.5281/zenodo.7692046. All code written for this study is available on GitHub under a GNU General Public License: macroeco_phylo

Acknowledgements

This work was supported by the NSF Postdoctoral Research Fellowships in Biology Program under Grant No. 2010885 (W.R.S.).

Author contributions

W.R.S. and J.G. conceptualized the project, completed the derivations, and wrote the manuscript. W.R.S. performed all analyses.

References

1. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, et al. (2017) A communal catalogue reveals Earth’s multiscale microbial diversity. Nature (Nature Publishing Group), pp. 457–63. https://www.nature.com/articles/nature24621
2. Shoemaker WR, Locey KJ, Lennon JT (2017) A macroecological theory of microbial biodiversity. Nature Ecology & Evolution 1:107.
3. Barberán A, Casamayor EO, Fierer N (2014) The microbial contribution to macroecology. Frontiers in Microbiology 5. https://www.frontiersin.org/articles/10.3389/fmicb.2014.00203
4. Locey KJ, Lennon JT (2016) Scaling laws predict global microbial diversity. Proceedings of the National Academy of Sciences, pp. 5970–5. https://www.pnas.org/doi/10.1073/pnas.1521291113
5. Lennon JT, Locey KJ (2020) More support for Earth’s massive microbiome. Biology Direct 15:5. https://doi.org/10.1186/s13062-020-00261-8
6. Dal Bello M, Lee H, Goyal A, Gore J (2021) Resource–diversity relationships in bacterial communities reflect the network structure of microbial metabolism. Nature Ecology & Evolution 5:1424–34. https://www.nature.com/articles/s41559-021-01535-8
7. Louca S, Parfrey LW, Doebeli M (2016) Decoupling function and taxonomy in the global ocean microbiome. Science (New York, NY) 353:1272–7.
8. Goldford JE, Lu N, Bajić D, Estrela S, Tikhonov M, Sanchez-Gorostiaga A, et al. (2018) Emergent simplicity in microbial community assembly. Science (New York, NY) 361:469–74.
9. Estrela S, Diaz-Colunga J, Vila JCC, Sanchez-Gorostiaga A, Sanchez A (2022) Diversity begets diversity under microbial niche construction. bioRxiv. https://www.biorxiv.org/content/10.1101/2022.02.13.480281v1
10. Estrela S, Sanchez-Gorostiaga A, Vila JC, Sanchez A (2021) Nutrient dominance governs the assembly of microbial communities in mixed nutrient environments. eLife (eLife Sciences Publications, Ltd), e65948. https://doi.org/10.7554/eLife.65948
11. Ho PY, Good BH, Huang KC (2022) Competition for fluctuating resources reproduces statistics of species abundance over time across wide-ranging microbiotas. eLife (eLife Sciences Publications, Ltd), e75168. https://doi.org/10.7554/eLife.75168
12. Good BH, Rosenfeld LB (2022) Eco-evolutionary feedbacks in the human gut microbiome. bioRxiv. https://www.biorxiv.org/content/10.1101/2022.01.26.477953v1
13. Tian L, Wang XW, Wu AK, Fan Y, Friedman J, Dahlin A, et al. (2020) Deciphering functional redundancy in the human microbiome. Nature Communications (Nature Publishing Group), 6217. https://www.nature.com/articles/s41467-020-19940-1
14. Madi N, Vos M, Murall CL, Legendre P, Shapiro BJ (2020) Does diversity beget diversity in microbiomes? eLife (eLife Sciences Publications, Ltd), e58999. https://doi.org/10.7554/eLife.58999
15. Whittaker RH (1972) Evolution and Measurement of Species Diversity. Taxon (International Association for Plant Taxonomy (IAPT)), pp. 213–51. https://www.jstor.org/stable/1218190
16. Roman MS, Wagner A (2018) An enormous potential for niche construction through bacterial cross-feeding in a homogeneous environment. PLOS Computational Biology (Public Library of Science), e1006340. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006340
17. Maynard DS, Bradford MA, Lindner DL, van Diepen LTA, Frey SD, Glaeser JA, et al. (2017) Diversity begets diversity in competition for space. Nature Ecology & Evolution (Nature Publishing Group), pp. 1–8. https://www.nature.com/articles/s41559-017-0156
18. Calcagno V, Jarne P, Loreau M, Mouquet N, David P (2017) Diversity spurs diversification in ecological communities. Nature Communications (Nature Publishing Group), 15810. https://www.nature.com/articles/ncomms15810
19. Good BH, Hallatschek O (2018) Effective models and the search for quantitative principles in microbial evolution. Current Opinion in Microbiology 45:203–12.
20. Scott M, Klumpp S, Mateescu EM, Hwa T (2014) Emergence of robust growth laws from optimal regulation of ribosome synthesis. Molecular Systems Biology 10:747. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4299513/
21. Jun S, Si F, Pugatch R, Scott M (2018) Fundamental principles in bacterial physiology—history, recent progress, and the future with focus on cell size control: a review. Reports on Progress in Physics (IOP Publishing), 056601. https://doi.org/10.1088/1361-6633/aaa628
22. Schweinsberg J (2003) Coalescent processes obtained from supercritical Galton–Watson processes. Stochastic Processes and their Applications 106:107–39. https://www.sciencedirect.com/science/article/pii/S0304414903000280
23. Desai MM, Walczak AM, Fisher DS (2013) Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations. Genetics 193:565–85. https://doi.org/10.1534/genetics.112.147157
24. Moran J, Tikhonov M (2022) Defining Coarse-Grainability in a Model of Structured Microbial Ecosystems. Physical Review X (American Physical Society), 021038. https://link.aps.org/doi/10.1103/PhysRevX.12.021038
25. Tikhonov M (2017) Theoretical microbial ecology without species. Physical Review E 96:032410.
26. O’Dwyer JP, Rominger A, Xiao X (2017) Reinterpreting maximum entropy in ecology: a null hypothesis constrained by ecological mechanism. Ecology Letters 20:832–41. https://onlinelibrary.wiley.com/doi/pdf/10.1111/ele.12788
27. McGill BJ (2010) Towards a unification of unified theories of biodiversity. Ecology Letters 13:627–42. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1461-0248.2010.01449.x
28. Harte J (2011) Maximum entropy and ecology: a theory of abundance, distribution, and energetics. Oxford series in ecology and evolution. Oxford; New York: Oxford University Press.
29. Hubbell SP (2011) The Unified Neutral Theory of Biodiversity and Biogeography (MPB-32). Princeton University Press. https://www.degruyter.com/document/doi/10.1515/9781400837526/html
30. Volkov I, Banavar JR, Hubbell SP, Maritan A (2003) Neutral theory and relative species abundance in ecology. Nature (Nature Publishing Group), pp. 1035–7. https://www.nature.com/articles/nature01883
31. Azaele S, Suweis S, Grilli J, Volkov I, Banavar JR, Maritan A (2016) Statistical mechanics of ecological systems: Neutral theory and beyond. Reviews of Modern Physics (American Physical Society), 035003. https://link.aps.org/doi/10.1103/RevModPhys.88.035003
32. Alonso D, McKane AJ (2004) Sampling Hubbell’s neutral theory of biodiversity. Ecology Letters 7:901–10. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1461-0248.2004.00640.x
33. Azaele S, Pigolotti S, Banavar JR, Maritan A (2006) Dynamical evolution of ecosystems. Nature (Nature Publishing Group), pp. 926–8. https://www.nature.com/articles/nature05320
34. Simberloff D (1983) Competition Theory, Hypothesis-Testing, and Other Community Ecological Buzzwords. The American Naturalist (The University of Chicago Press), pp. 626–35. https://www.journals.uchicago.edu/doi/10.1086/284163
35. Harvey PH, Colwell RK, Silvertown JW, May RM (1983) Null Models in Ecology. Annual Review of Ecology and Systematics 14:189–211. https://doi.org/10.1146/annurev.es.14.110183.001201
36. Gotelli NJ, Graves GR (1996) Null Models in Ecology. http://repository.si.edu/xmlui/handle/10088/7782
37. Gotelli NJ, Ulrich W (2012) Statistical challenges in null model analysis. Oikos 121:171–80. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1600-0706.2011.20301.x
38. Li L, Ma ZS (2016) Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets. Scientific Reports (Nature Publishing Group), 31448. https://www.nature.com/articles/srep31448
39. Harris K, Parsons TL, Ijaz UZ, Lahti L, Holmes I, Quince C (2017) Linking Statistical and Ecological Theory: Hubbell’s Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process. Proceedings of the IEEE, pp. 516–29.
40. Grilli J (2020) Macroecological laws describe variation and diversity in microbial communities. Nature Communications (Nature Publishing Group), 4743. https://www.nature.com/articles/s41467-020-18529-y
41. Zaoli S, Grilli J (2021) A macroecological description of alternative stable states reproduces intra- and inter-host variability of gut microbiome. Science Advances 7:eabj2882.
42. Zaoli S, Grilli J (2022) The stochastic logistic model with correlated carrying capacities reproduces beta-diversity metrics of microbial communities. PLOS Computational Biology (Public Library of Science), e1010043. https://journals.plos.org/ploscompbiol/
43. Descheemaeker L, Grilli J, de Buyl S (2021) Heavy-tailed abundance distributions from stochastic Lotka-Volterra models. Physical Review E (American Physical Society), 034404. https://link.aps.org/doi/10.1103/PhysRevE.104.034404
44. Descheemaeker L, de Buyl S (2020) Stochastic logistic models reproduce experimental time series of microbial communities. eLife (eLife Sciences Publications, Ltd), e55650. https://doi.org/10.7554/eLife.55650
45. O’Dwyer JP, Kembel SW, Sharpton TJ (2015) Backbones of evolutionary history test biodiversity theory for microbes. Proceedings of the National Academy of Sciences, pp. 8356–61. https://www.pnas.org/doi/10.1073/pnas.1419341112
46. Shoemaker WR (2022) A macroecological perspective on genetic diversity in the human gut microbiome. bioRxiv. https://www.biorxiv.org/content/10.1101/2022.04.07.487434v1
47. Wolff R, Shoemaker W, Garud N (2023) Ecological Stability Emerges at the Level of Strains in the Human Gut Microbiome. mBio (American Society for Microbiology), e02502-22. https://journals.asm.org/doi/10.1128/mbio.02502-22
48. Gaston KJ, Blackburn TM, Greenwood JJD, Gregory RD, Quinn RM, Lawton JH (2000) Abundance–occupancy relationships. Journal of Applied Ecology 37:39–59. https://onlinelibrary.wiley.com/doi/pdf/10.1046/j.1365-2664.2000.00485.x
49. Shade A, Dunn RR, Blowes SA, Keil P, Bohannan BJM, Herrmann M, et al. (2018) Macroecology to Unite All Life, Large and Small. Trends in Ecology & Evolution 33:731–44.
50. Sloan WT, Woodcock S, Lunn M, Head IM, Curtis TP (2007) Modeling Taxa-Abundance Distributions in Microbial Communities using Environmental Sequence Data. Microbial Ecology 53:443–55. https://doi.org/10.1007/s00248-006-9141-x
51.
1. Burns AR
2. Stephens WZ
3. Stagaman K
4. Wong S
5. Rawls JF
6. Guillemin K
7. et al.
2016Contribution of neutral processes to the assembly of gut microbial communities in the zebrafish over host developmentIn: The ISME Journal Nature Publishing Group pp. 655–64https://www.nature.com/articles/ismej2015142 Google Scholar
52.
1. Sireci M
2. Muñoz MA
3. Grilli J.
2022Environmental fluctuations explain the universal decay of species-abundance correlations with phylogenetic distancebioRxiv https://www.biorxiv.org/content/10.1101/2022.07.12.499693v2
53.
1. Stewart T
2. Strijbosch LWG
3. Moors H
4. Pv Batenburg
2007A Simple Approximation to the Convolution of Gamma DistributionsSSRN Electronic Journal http://www.ssrn.com/abstract=900109
54.
1. Murakami H.
2015Approximations to the distribution of sum of independent non-identically gamma random variablesMathematical Sciences 9:205–13http://link.springer.com/10.1007/s40096-015-0169-2 Google Scholar
55.
1. Hu C
2. Pozdnyakov V
3. Yan J.
2020Density and distribution evaluation for convolution of independent gamma variablesComputational Statistics 35:327–42http://link.springer.com/10.1007/s00180-019-00924-9 Google Scholar
56.
1. Behme A
2. Bondesson L.
2017A class of scale mixtures of $\operatorname {Gamma} (k)$-distributions that are generalized gamma convolutionsBernoulli 23https://projecteuclid.org/journals/bernoulli/volume-23/issue-1/Google Scholar
57.
1. Barnabani M.
2017An approximation to the convolution of gamma distributionsCommunications in Statistics - Simulation and Computation 46:331–43https://www.tandfonline.com/doi/full/10.1080/03610918.2014.963612 Google Scholar
58.
1. Covo S
2. Elalouf A.
2014A novel single-gamma approximation to the sum of independent gamma variables, and a generalization to infinitely divisible distributionsElectronic Journal of Statistics 8https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-8/issue-1/A-novel-single-gamma-approximation-to-the-sum-of-independent/10.1214/14-EJS914.full Google Scholar
59.
1. Laland KN
2. Odling-Smee FJ
3. Feldman MW
1999Evolutionary consequences of niche construction and their implications for ecologyIn: Proceedings of the National Academy of Sciences Proceedings of the National Academy of Sciences pp. 10242–7https://www.pnas.org/doi/full/10.1073/pnas.96.18.10242 Google Scholar
60.
1. Schluter D
2. Pennell MW
2017. Speciation gradients and the distribution of biodiversity. Nature (Nature Publishing Group), pp. 48–55. https://www.nature.com/articles/nature22897
61.
1. Madi NJ
2. Chen D
3. Wolff R
4. Shapiro BJ
5. Garud NR
2023. Community diversity is associated with intra-species genetic diversity and gene loss in the human gut microbiome. eLife (eLife Sciences Publications, Ltd), e78530. https://doi.org/10.7554/eLife.78530
62.
1. Magurran AE
2004. Measuring biological diversity. Malden, MA: Blackwell Pub.
63.
1. Warren RJ
2. Costa JT
3. Bradford MA
2022. Seeing shapes in clouds: the fallacy of deriving ecological hypotheses from statistical distributions. Oikos. https://onlinelibrary.wiley.com/doi/10.1111/oik.09315
64.
1. Williams CB
1947. The Generic Relations of Species in Small Ecological Communities. Journal of Animal Ecology [Wiley, British Ecological Society], pp. 11–8. https://www.jstor.org/stable/1502
65.
1. Smith FA
2. Gittleman JL
3. Brown JH
2014. Foundations of macroecology: classic papers with commentaries. Chicago: University of Chicago Press.
66.
1. Chesson P.
1990. MacArthur’s consumer-resource model. Theoretical Population Biology 37:26–38. https://www.sciencedirect.com/science/article/pii/004058099090025Q
67.
1. Cui W
2. Marsland R
3. Mehta P.
2021. Diverse communities behave like typical random ecosystems. Physical Review E (American Physical Society), 034416. https://link.aps.org/doi/10.1103/PhysRevE.104.034416
68.
1. Nicoletti G
2. Suweis S
3. Maritan A.
2020. Scaling and criticality in a phenomenological renormalization group. Physical Review Research (American Physical Society), 023144. https://link.aps.org/doi/10.1103/PhysRevResearch.2.023144
69.
1. Meshulam L
2. Gauthier JL
3. Brody CD
4. Tank DW
5. Bialek W.
2019. Coarse Graining, Fixed Points, and Scaling in a Large Population of Neurons. Physical Review Letters (American Physical Society), 178103. https://link.aps.org/doi/10.1103/PhysRevLett.123.178103
70.
1. Faith JJ
2. Guruge JL
3. Charbonneau M
4. Subramanian S
5. Seedorf H
6. Goodman AL
7. et al.
2013. The long-term stability of the human gut microbiota. Science (New York, NY) 341:1237439.
71.
1. Quast C
2. Pruesse E
3. Yilmaz P
4. Gerken J
5. Schweer T
6. Yarza P
7. et al.
2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 41:D590–6.
72.
1. Huerta-Cepas J
2. Serra F
3. Bork P.
2016. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Molecular Biology and Evolution 33:1635–8.
73.
1. Gardiner CW
2009. Stochastic methods: a handbook for the natural and social sciences. 4th ed. Springer series in synergetics, no. 13. Berlin Heidelberg: Springer.
74.
1. George AB
2. O’Dwyer J.
2022. Universal abundance fluctuations across microbial communities, tropical forests, and urban populations. bioRxiv. https://www.biorxiv.org/content/10.1101/2022.09.14.508016v1
75.
1. Engen S
2. Lande R.
1996. Population Dynamic Models Generating Species Abundance Distributions of the Gamma Type. Journal of Theoretical Biology 178:325–31. https://www.sciencedirect.com/science/article/pii/S0022519396900284

Article and author information

Author information

William R. Shoemaker
Quantitative Life Sciences, The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, 34151, Italy
ORCID iD: 0000-0003-0111-4838
- Corresponding author; email: williamrshoemaker@gmail.com
- Contact: williamrshoemaker@gmail.com, grilli.jacopo@gmail.com
Jacopo Grilli
Quantitative Life Sciences, The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, 34151, Italy
ORCID iD: 0000-0002-8235-5803
- Contact: williamrshoemaker@gmail.com, grilli.jacopo@gmail.com

Version history

Preprint posted: May 24, 2023
Sent for peer review: June 19, 2023
Reviewed Preprint version 1: August 21, 2023
Reviewed Preprint version 2: December 21, 2023
Version of Record published: January 22, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.89650. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

views: 1,807
downloads: 155
citations: 14

Views, downloads and citations are aggregated across all versions of this paper published by eLife.