1 Introduction

Bacterial communities are a fundamental reservoir of ecological functions and biological diversity. They are relevant for any environmental and host-associated ecosystem, ranging from soil to the human gut. In particular, the human gut microbiome, by interacting with host metabolism and immune system Levy et al. [2016], Barroso-Batista et al. [2015], Barreto and Gordo [2023], is a key regulator of human health. Dysbioses are defined as alterations of the healthy microbiome composition associated with the gastrointestinal tract Lloyd-Price et al. [2019], Das and Nair [2019], Nishida et al. [2018]. Some recent studies have shown that gut dysbioses are associated with changes in microbial species interaction patterns Bashan et al. [2016], Seppi et al. [2023], while emergent patterns focusing on single species properties Azaele et al. [2016], Grilli [2020], such as species abundance distribution (SAD), fail to characterize altered states of microbiome Seppi et al. [2023], Pasqualini et al. [2023]. Microbiomes host various interactions, measured for in-vitro communities Kehe et al. [2021], Venturelli et al. [2018], and are expected to govern natural ones.

Recently, stochastic non-interacting neutral Zeng et al. [2015], Venkataraman et al. [2015], Sala et al. [2016], Sieber et al. [2019] and logistic models Descheemaeker and De Buyl [2020], Grilli [2020], Zaoli and Grilli [2022], have been applied to give a satisfactory description of empirical patterns in different types of communities, ranging from human to soil microbiomes. However, they completely neglect species interactions in the population dynamics, and therefore they do not seem to have the right modelling tools to obtain novel insights from dysbioses models.

For these reasons, several approaches have been proposed in recent years to infer microbial interactions Faust and Raes [2012], Xiao et al. [2017], Camacho-Mateu et al. [2024]. However, such inference remains very challenging and problematic in many respects Faisal et al. [2010], Angulo et al. [2017], Tu et al. [2019], Armitage and Jones [2019], Holt [2020]: a) the very high dimension of typical microbiome datasets and the lack of longitudinal (long-term) experiments; b) the time-dependent nature of microbial species interactions Pacciani-Mori et al. [2021]; c) abiotic factors such as resources, temperature, and pH, can introduce environmental filtering effects Sireci et al. [2022] and induce effective interactions; d) experimental and technical challenges that introduce sampling effects, false positive species Tovo et al. [2020], spurious correlation, and bias in the data Weiss et al. [2016], Gloor et al. [2017], Dohlman and Shen [2019]. Therefore, even in a scenario where metagenomic samples are noise- and bias-free, reconstructing species-interaction networks of the underlying microbial dynamics remains a hard task.

An alternative, theoretically more elegant approach was pioneered by May May [1972], proposing to use random matrices to model species interaction networks, i.e., each entry of the adjacency matrix is extracted at random from a given distribution. Given the impossibility of empirically measuring the interaction strengths, the advantage of such an approach is the reduction from ~ S2 (if S is the number of species) to just a few parameters (e.g., 2) used to parameterize the distribution Allesina and Tang [2012]. In recent years several investigations have applied this framework to the study of population dynamics through generalized Lotka-Volterra (gLV) equations, employing quenched disordered system techniques (also known as “glassy”), mutated from Statistical Physics Bunin [2017], Galla [2018], Biroli et al. [2018], Altieri et al. [2021], Lorenzana and Altieri [2022]. The main idea is thus to focus on the ensemble properties of the gLV, which will depend only on a few relevant control parameters (e.g., the mean μ and the variance σ2 of the random interactions strengths) Barbier et al. [2018]. However, despite a range of very interesting results, this approach has remained confined mainly to purely theoretical domains (but see Hatton et al. [2024]) and has only recently been used to give an interpretation of controlled experiments with microbial communities Hu et al. [2022].

In this work, we therefore provide a proof of concept for the applicability of this framework to human microbiomes. In particular, we will infer the parameters of the quenched dgLV model from healthy and unhealthy cohorts of gut microbiomes. We will show that the different physiological states of the human gut microbiome correspond to different interaction patterns of the dgLV model. We will then introduce quantitative metrics to quantify the role of stability and the contribution of ecological forces to the dynamics.

2 Results

2.1 Disordered generalized Lotka-Volterra model

The dgLV model describes the evolution in time of the concentration abundances of a local pool of S interacting species, i.e.,

where Ni is the populations of species i-th, Ki is its carrying capacity, and ri the growth rate where sets the timescale of species dynamics, which we will assume will not depend on the species, i.e. ρi = ρ. The coefficients αij are random i.i.d random variables with and . We also include an immigration species-independent rate λ, which will be treated as a reflecting wall mathematically regularizing the problem, and a demographic noise term Altieri et al. [2021] ηi, with , and where δ denotes the Dirac delta and T sets the scale of noise intensity. For notational purposes, we also introduce its inverse β = T-1. Finally, we require that the interactions are symmetric, i.e., αij = αji. This way, we can map this problem into an equilibrium thermodynamic one that is exactly solvable. As shown in the Supplementary Information (SI), section A, it is possible to justify this symmetric assumption and a linear dependence between the growth rate and the carrying capacity, ri = ρKi, by considering the quasi-stationary approximation of the MacArthur consumer-resource model. In particular, we set ρ = 1 Biroli et al. [2018]. The presence of a noise term allows us to write the Fokker-Planck equation of the system and study its stationary solution (see Altieri et al. [2021], Altieri and Biroli [2022] for a detailed derivation in a similar Hamiltonian formalism). Here, we adopt the Itô prescription for the stochastic dynamics in such a way as to prevent species resurgence by noise Biroli et al. [2018].

The replica formalism, a well-known technique in disordered systems, comes into play and allows us to derive a non-interacting Hamiltonian corresponding to the dgLV symmetric interactions αij (see Methods and the SI, Section B for the complete derivation). In simple words, instead of trying to solve the problem for one random setup, the formalism considers many replicas of the system and averages their behaviors. This approach helps to smooth out the randomness and reveals the typical behavior of the system. We also assume a single equilibrium scenario, known as replica-symmetric (RS) regime. Although the SADs of empirical microbial communities display fat tails, a feature more compatible with the multi-attractors phase of the asymmetric 1RSB Mallmin et al. [2024] case or with the GLV with annealed disorder Suweis et al. [2023], we consider this simplification as it is the regime in which we can obtain explicit analytical relations between the model parameters and the data and where model inversion is feasible.

On the other hand, by using a cavity argument Mezard and Montanari [2009], Altieri and Baity-Jesi [2024], one can analytically derive the SAD of the model (see Methods and SI, Section D):

where the auxiliary variable takes the disorder interactions into account; z is a standardized Gaussian variable, , , , and m =1 — βσ2(qdq0) that should be greater than zero for the theory to be consistent Biroli et al. [2018].

The RS ansatz can also be characterized by its stability to external perturbations, such as the external temperature, the immigration rate, or the interaction heterogeneity. To investigate the stability of the RS phase, we consider the Hessian matrix of the theory by performing a harmonic expansion of the replicated free energy de Almeida and Thouless [1978]. When the leading eigenvalue of the Hessian matrix – suitably diagonalized on the so-called replicon sector – goes below zero, unstable equilibria appear: either the system moves towards a Replica Symmetric Breaking (RSB) phase with multiple locally stable minima, or it develops a marginal full-RSB phase, leading to a hierarchical structure of states and a very complex landscape (also known as glass phase). Therefore, in the latter cases, the RS ansatz no longer represents a thermodynamically stable phase. Based on standard calculations in disordered systems, this leading eigenvalue of the Hessian matrix, called replicon R, can be analytically evaluated as

where is the species response to an external perturbation, keeping track of non-extinct species only. For more details, see SI Section C.

2.2 Data Through the Glass

We now consider cross-sectionally sampled gut microbiomes of two different cohorts: one of healthy and another one of diseased individuals. Here we focus on chronic inflammation syndromes, but, in principle, what we present holds for groups of phenotypically distinct populations. We can idealize the samples of each group as generated from the stationary distribution of the dgLV Eq.(1), with different realizations of disorder α, but with shared μ and ρ (see Methods). The approach we propose therefore considers all samples in the same group as coming from the same statistical ensemble but, on the other hand, it distinguishes ecological regimes characterized by distinct phenotypes through different statistics (e.g., dissimilar μ, and σ) of the random species interactions α. We aim to test the hypothesis that each cohort will be described by a different set of ecological parameters (θ = (μ, σ, T, λ)).

Inference of the dgLV generative model is performed by a moment matching optimization procedure. We aim to infer the free parameters θ = (μ, σ, T, λ) so that to match the order parameters and the average carrying capacity (h, q0, qd, K). This procedure allows us to extract ecological dynamics information for cross-sectional data of healthy (blue) and diseased (red) microbiomes, which are conceptualized as independent disorder realizations.

To calculate the order parameters (h, qd, q0, K) from the data, we thus need to specify how we perform the ensemble and the disorder averages from the data (see Figure 1). Given that we typically lack time series data, and leveraging on an effective mean-field description, we consider estimating the ensemble average as the average among species (that is, different species are realizations of the same underlying stochastic process Azaele et al. [2016], known as neutral hypothesis), i.e., . On the other hand, we assume that the average over the disorder can be computed as a sample average. In other words, within a given phenotype (healthy/unhealthy) each measured microbiome configuration is a sample from the stationary distribution of the dgLV model, with a given realization of the disorder, i.e., , where R is the number of samples and typically R ≫ 1. Moreover, due to our limited knowledge of the fine details governing the interactions in each microbiome, it is reasonable to assume that all communities with a given macrostate - a point in the (h, q0, qd, K) space - experience the same demographic noise T, immigration rate λ, and disorder parameters μ and σ. Eventually, we can compute the order parameters from the data as:

where Na,j gives the population density of species j in sample a.

To evaluate the carrying capacities Ki from the data, as done in most of the works using dGLV Bunin [2017], Biroli et al. [2018], Altieri et al. [2021], Suweis et al. [2023], Mallmin et al. [2024], we assume that in each cohort, all species are characterized by the same carrying capacity Ki = K. Here, we assign the value K as the average maximum relative abundance of species among the available samples for each cohort, thus .

Order parameters inferred from the data using Eqs. (2.2). Panel A) h B) q0 and C) qd in healthy (blue) and unhealthy (red) cohorts. The gray points show the values for the corresponding null models where sample labels have been randomized. Standard deviations have been computed over 5000 realizations.

Note also that, from metagenomics, we only have access to compositional data for species abundances Pasqualini et al. [2023]. This is crucial to properly treat different samples and end up with a consistent analysis. Moreover, this also allows us to give an ecological interpretation of some of the order parameters. In particular, thanks to the compositionality of the data, we have that and (where is known as the average local diversity), while qd provides information on pairwise products between species abundances within each replica. Figure 2 shows the values of such order parameters between healthy and unhealthy cohorts with respect to two randomized cohorts. In the data, we observe a systematic difference between healthy and unhealthy cohorts, pointing to a higher average local diversity in healthy samples (panels A, B), as also observed in Pasqualini et al. [2023]. Panel C highlights the higher qd in unhealthy patients, a signature of the weakening of the species interactions in those samples. We will investigate further this aspect in the following section, by inferring the species interactions in both cohorts.

From the modelling point of view, it can be rigorously shown that constraining the abundances to be normalized to one corresponds to introducing an additional Lagrange multiplier in the expression of the free energy. However, this only acts on the linear term contributing to shifting the mean of the random interactions, μ, but does not have any influence on the heterogeneity σ, therefore on the phase diagram. For more details, we refer the interested reader to the SI (Section B1) along with Altieri et al. [2021]. In Altieri et al. [2021] the authors also showed how a global normalizing constraint on species abundances reflects in a one-to-one mapping with the random replicant model Biscari and Parisi [1995]. (The replicator equations originally introduced by Diederich and Opper [1989] and recast within the replica formalism Biscari and Parisi [1995], Altieri et al. [2021] describe the dynamics of an ensemble of replicants that evolve via random couplings.)

2.3 Species interactions patterns characterize the state of microbiomes

We thus collect all the parameters estimated from the data in a vector π = (h, qd, q0, K). As we will better detail in the Methods, we develop a moment matching inference algorithm to infer the model parameters θ. We therefore generated order parameters from our model that are as close as possible to the data, as depicted in Figure 1. The idea of the method is to introduce a cost function C(θ|π), representing a total relative error for some self-consistent equations. If the parameters θ are such that the right part of the self-consistent equation equals the left part, the problem can be considered solved. Since the landscape of this cost function presents several minima, we performed multiple optimization procedures to collect an ensemble of possible solutions, from which we retained the top 30.

First, we find that different solutions θ* of the optimization problem provides ecological insights into the underlying microbiome populations. As Figure 3A shows, the inferred demographic noise strength (T) and the heterogeneity of the interactions (σ) for healthy and diseased microbiomes are clustered in the T - σ plane. In particular, the SAD for the healthy cohort is robust among the different solutions of the inference procedure, as shown by the superposition of the different curves in the inset of Figure 3A. On the other hand, SADs inferred from unhealthy patients have high sensitivity to different solutions. In particular, some of them display a mode for high-abundance species (light red lines in Figure 3A), a signature of dominant strain in the gut. Consistently, the distribution of the interactions P(αi,j) generated through the inferred parameters μ and σ is different between healthy and diseased cohorts, giving a distinct pattern of interactions (see Figure 3B), a result that is compatible with that found by Bashan and collaborators Bashan et al. [2016]. In particular, we find that dysbiosis reduces the heterogeneity of interaction strengths, a result also observed when taking correlations as a proxy for interactions Seppi et al. [2023].

Panel a: inferred T (demographic noise strength) and σ (interactions heterogeneity) for healthy (blue) and diseased (red) microbiomes are clustered. Darker dots correspond to better solutions (i.e., solutions with a lower value of the cost function C), while the two points with hexagonal markers correspond to the best two (healthy and diseased, respectively) solutions. In the first panel inset, we also show (in log-log scale) the SADs corresponding to each solution. To have a more concise representation, we present each SAD fixing the disorder to its average . Panel b: the probability density function of the inferred interactions αi,j for healthy (blue) and diseased (red) microbiomes. Dysbiosis reduces the heterogeneity of the interaction strengths.

We then assess how close the inferred σ and β = 1/T (the inverse temperature in a statistical physics approach) are to the critical replica symmetry-breaking (RSB) line of the dgLV (R = 0), evaluated being all the other parameters constant (see Methods). We find again that the replicon values R corresponding to each solution of our optimization protocol are significantly different for the two investigated microbiome phenotypes (see Figure 4A). In particular, diseased microbiomes are closer to marginal stability within the replica-symmetric ansatz Altieri et al. [2021], Mézard et al. [1987], de Almeida and Thouless [1978]. Furthermore, by investigating the shape of the SAD given by Eq. (2), we can estimate the ratio between niche (represented by species interaction) and neutral (represented by birth/death and immigration) ecological forces, which can be captured by the quantity ψ Wu et al. [2021]. It detects the emergence of internal modes in the species abundance distribution as a hallmark of niche processes (see SI, Section E).

Inspired by field-theory arguments (see Methods section Free-energy landscape exploration: replica formulation), we can call mass of the theory the m parameter as defined above in Eq. (2). When m → 0 the theoretical order parameters diverge and the system enters a pathological regime of unbounded growth. As such, the mass term, can be considered as a complementary stability measure, capable of capturing the transition to the unbounded growth regime.

In the model, two kinds of effects compete to shape the community structure. On the one hand, we have niche effects, encoded in disordered interactions and thus tracked by the parameters μ, σ, and K. Their overall effect is selective and tends to concentrate the SAD around the typical abundance value. On the other hand, we have neutral effects encoded into the stochastic dynamics and immigration, governing the low abundance regime of the SAD. When the demographic noise amplitude is stronger than immigration (ν < 1, as in our case) the SAD exhibits a low-abundance integrable divergence. In the opposite scenario, ν > 1, there is no divergence and the SAD is modal. Since interactions are random, the probability of observing an internal mode can be estimated as the fraction of SADs realization having non-trivial solutions to the stationary point equation. Such quantity, dubbed as the niche-neutral ratio, can be analytically evaluated and reads:

where and . When ψ ≈ 1, niche and neutral forces give comparable contributions to the dynamics, as both low abundance divergence and a finite abundance mode coexist in the SAD. Finally, if the typical abundance diverges, we enter into the unbounded growth phase, which means that the mass m and the niche-neutral ratio ψ are not independent, as suggested by the analytical expression on ψ. For an exhaustive derivation of this result, see SI (Section E). With the model parameters obtained, we were able to evaluate m and ψ for healthy and diseased microbiomes. Also, in this case, healthy and diseased microbiomes are clearly clustered, as shown in Figure (4). Unhealthy microbiomes turn out to be closer to the unbounded growth phase, and the nice-neutral ratio is larger by five orders of magnitude compared to the healthy case ψU ≈ 105ψH. This leads us to argue that selective pressure is way larger in diseased states, while in the healthy one, birth and death effects are the key drivers of the dynamics. These results are also confirmed by the SADs shape in the inset of Figure (3, panel a).

The replicon eigenvalue corresponding to each solution of our optimization procedure (shaded dots). The solid hexagon represents the replicon corresponding to the best solutions that minimize the error in predicting the order parameters of the theory (minimum C). The two investigated microbiome phenotypes (healthy in blue, diseased in red) are significantly different. In particular, diseased microbiomes are closer to the marginal stability of replica-symmetric ansatz (grey horizontal line). Panel b: Solutions of the moment-matching objective function are shown as a function of ψ and m, which in turn depend on the SAD parameters (see main text). Healthy (blue) and diseased (red) microbiomes appear to be clustered. Therefore, distinct ecological organization scenarios (strong neutrality/emergent neutrality) emerge. Darker dots correspond to solutions with lower values of the cost function, while hexagonal markers correspond to the two best solutions.

Discussion and Conclusions

In our exploration of the gut microbiome through the lens of disordered systems and random matrix theory, we have embarked on an ambitious journey to connect intricate theoretical concepts to practical analyses of environmental microbiome data. In particular, we have characterized healthy and unhealthy gut microbiomes using the disordered generalized Lotka-Volterra model (dgLV) for population dynamics. This attempt to apply advanced theoretical methodologies from disordered systems to real-world biological data represents a significant stride in understanding the complex interplay within microbial communities.

One delicate aspect of our study lies in addressing the inherent limitations and the potential scope for improvement in future research. A notable challenge is the discrepancy between the empirical Species Abundance Distribution (SAD), which is observed in the data, and the theoretical SAD predicted by the quenched dgLV model Suweis et al. [2023]. While empirical data showcase a diverse range of species abundance, following a power-law distribution, the model predictions tend to exhibit exponential decay in SAD tails. This mismatch underscores the need for further refinement of the model to accurately capture the nuanced patterns observed in real-world data. For example, recently it has been proposed that introducing an annealed disorder can generate SADs that look closer to the empirical ones Suweis et al. [2023]. Another possibility is to set the dgLV model parameters in such a way as to reproduce the multi-attractor phase. In fact, in this region, the SADs display a more heterogeneous shape Arnoulx de Pirey and Bunin [2024]. We already intend to explore this follow-up direction by combining one-step replica-symmetry-breaking (1RSB) computations in the multiple-attractor phase with Dynamical Mean-Field Theory analysis for the asymmetric interaction case.

Another related limitation is the challenge of generating species abundance samples from the dgLV model that mirrors the statistical properties of the observed empirical data. In our current framework, each microbiome sample could be extracted from p(N|ζ), where ζ is a realization of the disorder. However, p(N|ζ) near N → 0 presents a power-law exponent ν - 1, with νH,D ≈ 10-3. This generates numerical instabilities and dominates the sampling process, posing difficulties in generating representative synthetic samples. Moreover, while microbiome data are inherently compositional Pasqualini et al. [2023], the dgLV model species populations Ni are positive real numbers. However, as already noted, it can be shown that introducing a normalization in such equations does not change the structure of the proposed solutions Altieri et al. [2021] and therefore should not affect the conclusions of our work. The study also sheds light on the role of demographic noise within the context of the dgLV model. In fact, in the limit T → 0, the SAD transitions to a truncated Gaussian, with a prominent peak at zero reflecting the fraction of extinct species. In contrast to scenarios at zero noise, where species extinctions are observed, the inclusion of demographic noise in the dgLV model suggests a scenario where no species goes extinct, supporting the everything is everywhere hypothesis Grilli [2020], Pigani et al. [2024]. In other words, within this framework, zeros in the data are due to sampling effects and not to local species extinctions Grilli [2020], Pasqualini et al. [2023].

In conclusion, our work opens up new avenues for future research, particularly in refining the theoretical models to better align with empirical observations and in exploring the nuances of species abundance distributions within the microbiome context. Moreover, the integration of other forms of environmental variability and species-specific traits could provide a more holistic view of ecosystem dynamics as also proposed by the One Health-One Microbiome framework Tomasulo et al. [2024]. The insights gained from this study not only enhance our understanding of the complex dynamics in microbial ecosystems but also pave the way for more sophisticated analytical tools in ecological modelling, contributing significantly to our ability to interpret and manage these vital biological systems.

Acknowledgements

S.S. acknowledges Iniziativa PNC0000002-DARE - Digital Lifelong Prevention. A.M. acknowledges financial support under the National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.1, Call for tender No. 104 by the Italian Ministry of University and Research (MUR), funded by the European Union - NextGenerationEU – Project Title “Emergent Dynamical Patterns of Disordered Systems with Applications to Natural Communities” – CUP 2022WPHMXK - Grant Assignment Decree No. 2022WPHMXK adopted on 19/09/2023 by the Italian Ministry of University and Research (MUR). A.A. acknowledges the support received from the Agence Nationale de la Recherche (ANR) of the French government, under the grant ANR-23-CE30-0012-01 (project “SIDECAR”). We thank Silvia De Monte for insightful discussions.