Abstract
The rapid advancement of environmental sequencing technologies, such as metagenomics, has significantly enhanced our ability to study microbial communities. The eubiotic composition of these communities is crucial for maintaining ecological functions and host health. Species diversity is only one facet of a healthy community’s organization; together with abundance distributions and interaction structures, it shapes reproducible macroecological states, i.e., joint statistical fingerprints that summarize whole-community behavior. Despite recent developments, a theoretical framework connecting empirical data with ecosystem modeling is still in its infancy, particularly in the context of disordered systems. Here, we present a novel framework that couples statistical-physics tools for disordered systems with metagenomic data, explicitly linking diversity, interactions, and stability to define and compare these macroecological states. By employing the generalized Lotka-Volterra model with random interactions, we reveal two different emergent patterns of species-interaction networks and species abundance distributions for healthy and diseased microbiomes. On the one hand, healthy microbiomes have similar community structures across individuals, characterized by strong species interactions and abundance diversity consistent with neutral stochastic fluctuations. On the other hand, diseased microbiomes show greater variability driven by deterministic factors, thus resulting in less ecologically stable and more divergent communities. Our findings suggest the potential of disordered system theory to characterize microbiomes and to capture the role of ecological interactions on stability and functioning.
1 Introduction
Microbial communities are a fundamental reservoir of ecological functions and biological diversity. They are relevant for any environmental and host-associated ecosystem, ranging from soil to the human gut. In particular, the human gut microbiome, by interacting with the host’s metabolism and immune system Levy et al. [2016], Barroso-Batista et al. [2015], Barreto and Gordo [2023], is a key regulator of human health. Dysbiosis is defined as an alteration in the composition of healthy microbiomes associated with the gastrointestinal tract Lloyd-Price et al. [2019], Das and Nair [2019], Nishida et al. [2018].
Recently, stochastic non-interacting neutral Zeng et al. [2015], Venkataraman et al. [2015], Sala et al. [2016], Sieber et al. [2019], Descheemaeker and De Buyl [2020], Grilli [2020], Zaoli and Grilli [2022], and logistic models have been successfully used to describe empirical patterns such as species abundance distribution (SAD) and species presence/absence statistics in different types of communities. However, focusing on single-species properties Azaele et al. [2016], Grilli [2020] fails to characterize altered states of microbiome Seppi et al. [2023], Pasqualini et al. [2024]. In particular, recent studies indicate that gut dysbiosis is associated with shifts in microbial species interaction patterns. Network-level properties of species interactions – such as the balance of positive and negative interactions, average interaction strength, and connectivity of the inferred interaction matrix – have been shown to systematically differ between healthy and dysbiotic gut communities (see, for instance, Bashan et al. [2016], Seppi et al. [2023]). Pairwise species interactions have been quantified in simplified in-vitro consortia Kehe et al. [2021], Venturelli et al. [2018]; we assume that the same classes of interactions also operate – albeit in a more complex form – in the native gut microbiome.
For these reasons, several approaches have been proposed in recent years to infer microbial interactions Faust and Raes [2012], Xiao et al. [2017], Camacho-Mateu et al. [2024]. However, such inference protocols remain very challenging and problematic in many respects Faisal et al. [2010], Angulo et al. [2017], Tu et al. [2019], Armitage and Jones [2019], Holt [2020]: a) the very high dimension of typical microbiome datasets and the lack of longitudinal (long-term) experiments; b) the time-dependent nature of microbial species interactions Pacciani-Mori et al. [2021]; c) abiotic factors such as resources, temperature, and pH, can introduce environmental filtering effects Sireci et al. [2022] and induce effective interactions; d) experimental and technical challenges that introduce sampling effects, false positive species Tovo et al. [2020], spurious correlation, and bias in the data Weiss et al. [2016], Gloor et al. [2017], Dohlman and Shen [2019]. Therefore, even in a scenario where metagenomic samples are noise- and bias-free, reconstructing species-interaction networks of the underlying microbial dynamics remains a hard task.
An alternative, theoretically more elegant approach was pioneered by May May [1972], proposing to use random matrices to model species interaction networks, i.e., each entry of the adjacency matrix is extracted at random from a given distribution. Given the impossibility of empirically measuring the interaction strengths, the advantage of such an approach is the reduction from ∼ S2 (if S is the number of species) to just a few parameters (e.g., 2) used to parameterize the distribution Allesina and Tang [2012].
1.1 Insights from Disordered Systems Theory
Although previous studies have largely focused on neutral theory and logistic models – including variants like the Stochastic Logistic Model that account for environmental fluctuations – they often neglect inter-species interactions, limiting their ability to reproduce macroecological patterns at a global scale. Moreover, higher-order correlations among species are known to generate non-trivial effects, such as the emergence of persistent fluctuations or chaotic dynamics due to the non-reciprocity of interactions.
In recent years, increasing attention has been devoted to studying population dynamics through the generalized Lotka-Volterra (gLV) equations, employing disordered systems techniques (also known as glassy), such as replica and cavity methods and Dynamical Mean-Field Theory approaches, originally developed in the context of statistical physics Bunin [2017], Galla [2018], Biroli et al. [2018], Altieri et al. [2021], Lorenzana and Altieri [2022]. Indeed, due to the inherently high dimensionality of microbial datasets, random matrix theory and methods from disordered systems turn out to be particularly well suited. A striking feature of their application to ecological dynamics is that the resulting properties and dynamical regimes do not depend on species-specific details, provided that species are statistically equivalent under relabeling or time-averaging. Phase diagrams can thus be established in terms of a few effective parameters (e.g., the mean μ and the variance σ2 of the random interactions strengths) Barbier et al. [2018]. However, despite a range of very interesting results, this approach has remained confined mainly to purely theoretical domains (but see Hatton et al. [2024]) and has only recently been used to give an interpretation of controlled experiments with synthetic microbial communities Hu et al. [2022].
2 Results
We now provide a proof of concept of the applicability of a high-dimensional disordered setting to human microbiomes. Specifically in the following section, we introduce the disordered generalized Lokta-Volterra (dgLV) model, infer its parameters from healthy and unhealthy cohorts of gut microbiomes, and test whether the resulting interaction patterns and stability metrics discriminate healthy from diseased macro-ecological states. Finally, we will introduce quantitative metrics to define microbiome stability and estimate the contribution of distinct ecological forces to the dynamics.
2.1 Disordered generalized Lotka-Volterra model
The dgLV model describes the time evolution of the concentration abundances of a local pool of S interacting species, i.e.,
where Ni is the populations of species i-th, Ki is its carrying capacity, and ri the growth rate where
The presence of a noise term in Eq. (1) allows us to write the Fokker-Planck equation of the system and study its stationary solution (see Altieri et al. [2021], Altieri and Biroli [2022] for a detailed derivation in a similar Hamiltonian formalism). Here, we adopt the Itô prescription for the stochastic dynamics in such a way as to prevent species resurgence by noise.
The replica formalism, a well-known technique in disordered systems, comes into play and allows us to derive a non-interacting Hamiltonian corresponding to the dgLV symmetric interactions αij (see Methods and Appendix, Section S2, for a complete derivation). In simple words, instead of trying to solve the problem for one random setup, the formalism considers many replicas of the system and averages their behaviors. This approach helps to smooth out the randomness and reveals the typical behavior of the system. We also assume a single equilibrium scenario, known as replica-symmetric (RS) regime. Although the SADs of empirical microbial communities display fat tails, a feature more compatible with the multi-attractors phase of the asymmetric 1RSB Mallmin et al. [2024] case or with the gLV with time-dependent disordered interactions Suweis et al. [2024] (annealed version), we consider this simplification as it is the regime in which we can obtain explicit analytical relations between the model parameters and the data and where model inversion is feasible. By employing a cavity argument Mezard and Montanari [2009], Altieri and Baity-Jesi [2024], one can indeed analytically derive the SAD of the model (see Methods and Appendix, Section S4):
where the auxiliary variable
The RS ansatz can also be characterized by its stability to external perturbations, such as the external temperature, the immigration rate, or the interaction heterogeneity. To investigate the stability of the RS phase, we consider the Hessian matrix of the theory by performing a harmonic expansion of the replicated free energy, as originally pointed out in de Almeida and Thouless [1978]. When the leading eigenvalue of the Hessian matrix, the so-called replicon mode, goes below zero, unstable equilibria appear: either the system moves towards a One-step Replica Symmetric Breaking (1-RSB) phase with multiple locally stable minima, or it develops a marginal full-RSB phase, leading to a hierarchical structure of states and therefore to an extremely rough landscape (also known as Gardner amorphous-like phase). In other words, the RS ansatz no longer represents a thermodynamically stable phase in the latter cases.
Based on standard calculations in disordered systems, the replicon mode ℜ can be analytically evaluated and reads:
where
(The replicon eigenvalue refers to a particular type of fluctuation around the saddle-point (mean-field) solution in the replica framework. When diagonalizing the Hessian matrix of the replicated free energy, fluctuations split into three sectors: longitudinal, anomalous, and replicon. The replicon mode is the most sensitive to criticality signaling – by its vanishing trend – the emergence of many nearly-degenerate states. It essentially describes how ‘soft’ the system is to microscopic rearrangements in configuration space.)
2.2 Data Through the Glass
We now consider cross-sectionally sampled gut microbiomes of two different cohorts: one of healthy and another of diseased individuals. Here we focus on chronic inflammation syndromes, but, in principle, what we present holds for groups of phenotypically distinct populations. We can idealize the samples of each group as generated from the stationary distribution of the dgLV Eq. (1), with different realizations of the disorder α, but with shared μ and σ (see Material and Methods). The approach we propose considers all samples in the same group as coming from the same statistical ensemble. On the other hand, it distinguishes ecological regimes characterized by distinct phenotypes through different statistics (e.g., dissimilar μ and σ) of the random species interactions α. We aim to test the hypothesis that each cohort will be described by a different set of ecological parameters, precisely with θ = (μ, σ, T, λ).
To calculate the order parameters (h, qd, q0, K) from the data, we thus need to specify how we perform the ensemble and the disorder averages empirically (see Figure 1). Since time series data are rarely available, we rely on an effective mean-field description and estimate the ensemble average by averaging over species (that is, different species are realizations of the same underlying stochastic process Azaele et al. [2016], known as neutral hypothesis), i.e.,
where Nj,a represents the population density of species j in sample a.

The inference protocol of the dgLV generative model is performed by a moment matching optimization procedure.
We aim to infer the free parameters θ = (μ, σ, T, λ) – shown on the right – so that to match the mean abundance, higher-order correlations between species abundances, and the average carrying capacity for the two cohorts, i.e. (h, q0, qd, K) – on the left. This procedure allows us to extract ecological dynamics information for cross-sectional data of healthy (blue) and diseased (red) microbiomes, which are conceptualized as independent disorder realizations.
To evaluate the carrying capacities Ki from the data, as done in most of the works using dgLV equations Bunin [2017], Biroli et al. [2018], Altieri et al. [2021], Suweis et al. [2024], Mallmin et al. [2024], we assume that in each cohort all species are characterized by the same carrying capacity Ki = K. Here, we define K as the average of the maximum relative abundances of species across the available samples for each cohort, so that
Note also that, from metagenomics, we only have access to compositional data for species abundances Pasqualini et al. [2024]. This is crucial to properly treat different samples and end up with a consistent analysis. Moreover, this approach allows us to give an ecological interpretation of some of the order parameters. In particular, because of the compositionality of the data, we have that

Order parameters inferred from the data using Eqs. (4).
Panel a) shows h, b) qd, and panel c) q0 in healthy (blue) and unhealthy (red) cohorts. The gray points show the values for the corresponding null models where sample labels have been randomized. Standard deviations have been computed over 5000 realizations.
From the modeling point of view, it can be rigorously shown that constraining the abundances to be normalized to one corresponds to introducing a Lagrange multiplier in the expression of the free energy. However, this only acts on the linear term, contributing to shifting the mean of the random interactions, μ, but does not affect the heterogeneity σ, and therefore the phase diagram overall. For more details, we refer the interested reader to the Appendix (end of Section S2) along with Altieri et al. [2021]. In Altieri et al. [2021] the authors also showed how a global normalizing constraint on species abundances reflects in a one-to-one mapping with the random replicant model Biscari and Parisi [1995] (the replicator equations originally introduced by Diederich and Opper [1989] and recast within the replica formalism Biscari and Parisi [1995], Altieri et al. [2021] describe the dynamics of an ensemble of replicants that evolve via random couplings).
2.3 Species interactions patterns characterize the state of microbiomes
We thus collect all the parameters estimated from the data in a vector π = (h, qd, q0, K). As we will better detail in the Methods, we develop a moment matching inference algorithm to infer the model parameters θ, as depicted in Figure 1. The idea of the method is to introduce a cost function 𝒞(θ|π), representing a total relative error for some self-consistent equations. If the parameters θ are such that the right part of the self-consistent equation equals the left part, the problem is considered solved. Because the landscape associated with this cost function presents several minima, we perform multiple optimization procedures to collect an ensemble of possible solutions, from which we retain the top 30. First, we find that different solutions θ* of the optimization problem provide ecological insights into the underlying microbiome populations.
As originally predicted in Altieri et al. [2021], among all the parameters that define Eq. (1), the only ones relevant for reproducing the theoretical phase diagram are the amplitude of demographic noise and the heterogeneity of interactions. The mean interaction strength, provided it is sufficiently positive, does not play a significant role. This prediction is fully confirmed by the inference procedure applied to the two microbiome datasets, allowing us to identify a universal signature that distinguishes healthy from unhealthy states. Figure 3a shows, indeed, that inferred noise (T) and interaction heterogeneity strength (σ) for healthy and diseased microbiomes are clustered in the two-dimensional plane.
In particular, the SAD for the healthy cohort is robust among the different solutions of the inference procedure, as depicted by the superposition of the different curves in the inset of Figure 3A. On the other hand, SADs inferred from unhealthy patients have high sensitivity to different solutions. In particular, some of them display a mode for high-abundance species (light red lines in Figure 3a), a signature of dominant strain in the gut. Consistently, the distribution of the interactions P (αi,j) generated through the inferred parameters μ and σ is different between healthy and diseased cohorts, giving a distinct pattern of interactions (see Figure 3b), a result that is compatible with that found by Bashan and collaborators Bashan et al. [2016]. Remarkably, we find that dysbiosis reduces the heterogeneity of interaction strengths, a result also observed when taking correlations as a proxy for interactions Seppi et al. [2023].

Panel a: inferred T (demographic noise strength) and σ (interactions heterogeneity) for healthy (blue) and diseased (red) microbiomes are clustered. Darker dots correspond to better solutions (i.e., solutions with a lower value of the cost function 𝒜), while the two points with hexagonal markers correspond to the best two (healthy and diseased, respectively) solutions. In the first panel inset, we also show (in log-log scale) the SADs corresponding to each solution. To have a more concise representation, we present each SAD fixing the disorder to its average
We then assess how close the inferred σ and β = 1/T (a.k.a. inverse temperature in a statistical physics approach) are to the critical replica symmetry-breaking (RSB) line of the dgLV (ℜ = 0), evaluated by keeping all the other parameters constant (see Methods). We find again that the replicon values ℜ corresponding to each solution of our optimization protocol are significantly different for the two investigated microbiome phenotypes (see Figure 4a). In particular, diseased microbiomes are closer to marginal stability within the replica-symmetric ansatz Altieri et al. [2021], Mézard et al. [1987], de Almeida and Thouless [1978]. Furthermore, by investigating the shape of the SAD given by Eq. (2), we can estimate the ratio between niche (represented by species interaction) and neutral (represented by birth/death and immigration) ecological forces, which can be captured by the quantity ψ Wu et al. [2021]. It detects the emergence of peaks in the species abundance distribution as a hallmark of niche processes (see Appendix 2).

Panel a: The replicon eigenvalue corresponding to each solution of our optimization procedure (shaded dots). The solid hexagon represents the replicon corresponding to the best solutions that minimize the error in predicting the order parameters of the theory (minimum 𝒞). The two investigated microbiome phenotypes (healthy in blue, diseased in red) are significantly different. In particular, diseased microbiomes are closer to the marginal stability of replica-symmetric ansatz (grey horizontal line). Panel b: Solutions of the moment-matching objective function are shown as a function of ψ and m, which in turn depend on the SAD parameters (see main text). Healthy (blue) and diseased (red) microbiomes appear to be clustered. Therefore, distinct ecological organization scenarios (strong neutrality/emergent neutrality) emerge. Darker dots correspond to solutions with lower values of the cost function, while hexagonal markers correspond to the two best solutions.
Inspired by field-theory arguments (see Methods and Appendix, Section S2), we can call mass of the theory the m parameter, as defined above in Eq. (2). In classical and quantum field theory, the particle-particle interaction embedded in the quadratic term is typically referred to as a mass source. In our context, m = 1 − βσ2 (qd − q0) captures quadratic fluctuations of species abundances, as also appearing in the expression of the leading eigenvalue of the stability matrix. When m → 0, the analytical order parameters diverge and the system enters the unphysical regime of unbounded growth. As such, the mass term can be considered a complementary stability measure, capable of capturing the transition to the unbounded growth regime.
In the model, two kinds of effects compete to shape the community structure. On the one hand, we have niche effects, encoded in disordered interactions and thus tracked by the parameters μ, σ, and K. Their overall effect is selective and tends to concentrate the SAD around the typical abundance value. On the other hand, we have neutral effects encoded in the stochastic dynamics and immigration, governing the low abundance regime of the SAD. When the demographic noise amplitude is stronger than immigration (ν < 1, as in our case), the SAD exhibits a low-abundance integrable divergence. In the opposite scenario, for ν > 1, there is no divergence and the SAD is modal. Since interactions are random, the probability of observing an internal mode can be estimated as the fraction of SADs realizations having non-trivial solutions to the stationary point equation. Such a quantity, dubbed as the niche-neutral ratio, can be analytically evaluated:
where
In summary, in the Result section we demonstrate that: (i) the inference pipeline robustly recovers demographic noise and interaction heterogeneity by calculating h, q0 and qd from the data; (ii) These parameters cluster by health status and (iii) diseased microbiomes lie closer to the replica-symmetry-breaking threshold, indicating reduced ecological resilience.
3 Discussion and Conclusions
In our exploration of the gut microbiome through the lens of disordered systems and random matrix theory, we have proposed a connection between the theoretical framework of disordered systems and practical analyses of environmental microbiome data. In particular, we have characterized healthy and unhealthy gut microbiomes using the disordered generalized Lotka-Volterra model (dgLV) for population dynamics. We now interpret our theoretical findings in a biological context, contrast them with previous work, and outline limitations and future directions.
The first major result of our work suggests a different role for the various ecological forces shaping the human gut microbial community. In this sense, the niche-neutral ratio ψ, highlights the different roles of interactions in healthy and diseased microbiomes. In the healthy case, neutral forces, such as random birth and death of individuals, characterize the dynamics, making configurations corresponding to this state alike. On the contrary, in diseased microbiomes, disordered, sample-specific interactions are the dominant ecological force, making individual realizations differ significantly from one another. An ecological interpretation of our findings suggests that healthy microbiomes are governed primarily by demographic stochasticity, reflecting a quasi-neutral regime characterized by similar community structures across individuals. Conversely, microbiomes from diseased patients exhibit significantly greater variability, suggesting that deterministic ecological factors – such as weakened species interactions – override neutrality, leading to structural instability and distinct microbial compositions. This observation aligns with the “Anna Karenina principle” Pasqualini et al. [2024], Ma [2020] holding for gut microbiomes, which can be phrased as: “All healthy gut microbiomes are alike; each unhealthy gut microbiome is unhealthy in its own way”. Supporting this interpretation, our analysis of the replicon eigenvalue ℜ shows that the healthy state is associated with pronounced stability to external perturbations. The unhealthy state, instead, being closer to the RSB line, exhibits diminished stability and, consequently, reduced robustness against external perturbations.
Our study also sheds light on the role of demographic noise within the context of the dgLV model. In the limit T → 0, the SAD transitions to a truncated Gaussian, with a prominent peak at zero reflecting the fraction of extinct species. In contrast to a zero-noise scenario, where species extinctions are observed, the inclusion of demographic noise in the dgLV model suggests a picture where no species goes extinct, supporting the everything is everywhere hypothesis Grilli [2020], Pigani et al. [2024]. In other words, within this framework, zeros in the data are due to sampling effects and not due to local species extinctions Grilli [2020], Pasqualini et al. [2024].
A notable limitation of our study lies in the discrepancy between the empirical Species Abundance Distribution (SAD) observed in the data and the theoretical distribution predicted by the quenched dgLV model Suweis et al. [2024]. While empirical data showcase a diverse range of species abundance, following a power-law distribution, the model predictions tend to exhibit exponential decay in SAD tails. This mismatch underscores the need for further refinement of the model to accurately capture the nuanced patterns observed in real-world data. For example, it has recently been proposed that introducing an annealed disorder can generate SADs that more closely resemble the empirical ones Suweis et al. [2024] (unlike the quenched approximation, the annealed version assumes that random couplings are not fixed but rather fluctuate over time, with their covariance governed by independent Ornstein-Uhlenbeck processes). Another possibility is to set the dgLV model parameters in such a way as to reproduce the multi-attractor phase. In fact, in this region, the SADs display a more heterogeneous shape Arnoulx de Pirey and Bunin [2024]. We plan to explore this follow-up direction by combining one-step replica-symmetry-breaking (1RSB) computations in the multiple-attractor phase with Dynamical Mean-Field Theory analysis for the asymmetric interaction case. This latter approach is particularly well suited for studying inherently non-equilibrium dynamics and for extending the framework to systems subject to environmental fluctuations in addition to demographic noise.
Another related limitation is the challenge of generating species abundance samples from the dgLV model that mirrors the statistical properties of the observed empirical data. In our current framework, each microbiome sample could be extracted from p(N|ζ), where ζ is a realization of the disorder. However, p(N|ζ) near N → 0 presents a power-law exponent ν − 1, with νH,D ≈ 10−3. This results in numerical instabilities and dominates the sampling process, posing difficulties in generating representative synthetic samples. Moreover, while microbiome data are inherently compositional Pasqualini et al. [2024], the dgLV model species populations Ni are positive real numbers. However, as already noted, it can be shown that introducing a normalization
In conclusion, our work proposes a bridge between theory and data, particularly in refining the theoretical models to better align with empirical observations and in exploring the nuances of species abundance distributions within the microbiome context. Moreover, the integration of other forms of environmental variability and species-specific traits could provide a more holistic view of ecological dynamics, as also proposed by the One Health-One Microbiome framework Tomasulo et al. [2024].
Overall, our study builds a quantitative link between metagenomic data and the disordered gLV framework, revealing how dysbiosis alters gut species interaction networks. By doing so, it lays the groundwork for more advanced, mechanistically informed models to better interpret and ultimately manage complex microbial ecosystems.
Material and Methods
Microbiome dataset and code
We have selected gut microbiome data from three studies Franzosa et al. [2019], Lloyd-Price et al. [2019], Mars et al. [2020], focusing on inflammatory syndromes of the gastrointestinal tract (Crohn’s Disease, Ulcerative Colitis, and Irritable Bowel Syndromes). Considering all the available metadata, we have selected the patients less affected by possible perturbing factors, such as drugs. Finally, our dataset consists of RHealthy = 91 shotgun metagenomic samples from healthy control individuals and RDiseased = 202 shotgun metagenomic samples. All metagenomic preprocessing and reads classification are extensively described in Pasqualini et al. [2024]. Species abundance profiles from metagenomic data and the parameters values obtained from the moment matching inference are available at the associated Zenodo page.
All the scripts implementing the moment matching inference and the jupyter notebooks to generate the figures are available at the associated GitHub page.
Free-energy landscape exploration: replica formalism
In the case of symmetric interactions – corresponding to conservative forces in the dynamics – a one-to-one mapping between the multi-species dynamics and a thermodynamic formalism can be safely worked out. The first step consists of writing the Fokker-Planck equation in the presence of a white Gaussian noise defined by a zero mean and a variance of amplitude 2T. All technical details, leveraging a Fokker-Planck derivation, can be found in Altieri and Biroli [2022] for a similar pairwise interacting model, but a different self-regulation term accounting for non-logistic behavior (1).
Once the (quenched) disordered Hamiltonian of the model is obtained, we can resort to techniques known in statistical physics of disordered systems, such as the replica and cavity methods Mezard and Montanari [2009], Mézard et al. [1987], Zamponi [2010]. The replica trick, in particular, allows us to handle disordered quantities, such as the free energy and the partition function, which would be otherwise unaffordable (see Appendix below).
We summarize the main findings here along with the expressions of the (RS) order parameters of the model, (h, qd, q0). The three expressions below, originally obtained in Altieri et al. [2021], have offered the starting point of this work, allowing for a thorough comparison with the same order parameters measured from metagenomic data. Their analytical expressions are self-consistently determined by the system of equations:
where the calligraphic notation stands for the Gaussian integration
As long as the system of Eqs. (6) admits physically reasonable solutions, we might claim that the RS ansatz safely holds. This condition is nevertheless necessary but not sufficient because the stability of the RS solution must also be checked. It therefore requires studying the Hessian matrix of free energy and diagonalizing it on a suitable subspace, called replicon, ℜ. The main outcome of this computation is captured by Eq. (3) of the main text. The averaged difference describes the fluctuations between the first and second moments of the species abundances within one state, namely between the diagonal value qd and the off-diagonal contribution q0 of the overlap matrix. Detecting a vanishing value of ℜ corresponds to the appearance of a marginally stable RS solution (see section s3 of the Appendix).
Moment matching inference
The parameters h, q0, and qd can be self-consistently determined through the saddle point of the dgLV free energy in Eqs. (6) (see also Altieri et al. [2021]). We thus aim to estimate which set of model parameters (i.e. θ = (μ, σ, T, λ)) will generate values of the order parameters (h, qd, q0) matching those directly estimated from the data. The solution of such an inference problem may not be unique or exact. We have thus developed an optimization algorithm so to find a pool of possible solutions that minimize the difference between the order parameters estimated by the model and those directly obtained from the data. To infer the parameters θ, we can thus use the self-consistent equations for the order parameters and solve the inverse problem to find the dgLV parameters that match the empirical observations. For each self- consistent equation, we can introduce a relative error δH(θ|π) = (H(θ|π) − h)/h, δQd(θ|π) = (Qd(θ|π) − qd)/qd and δQ0(θ|π) = (Q0(θ|π) − q0)/q0. By summing the square of each of these contributions, we introduce the cost function C for our moment matching problem, i.e.,
As already observed, the cost function has multiple local minima. To explore the rich structure of minima, we adopt a greedy search optimization strategy. First, we generate a vector θ0 so that m0 = m(θ0|π) = 1/2 max(m) = 1/2. This condition ensures that the starting point of the optimization is far from the unbounded growth phase. In particular, it allows us to randomly generate an initial value of the interactions heterogeneity from a broad range σ ∼ Uniform(0,10) and to get the corresponding initial value of the demographic noise by mean of the relation
Cavity method for the species abundance distribution
Another powerful technique rooted in disordered systems is the cavity method, which turns out to be particularly convenient for deriving the species abundance distribution at equilibrium. Without demographic noise, the SAD in the single equilibrium phase is typically captured by a truncated Gaussian distribution Yoshino et al. [2008], Bunin [2017], Altieri and Franz [2019]. In the presence of noise and finite migration, the computation gets more involved but is still doable within the cavity approach Mezard and Montanari [2009]. The basic idea consists of adding a new species to the ecosystem and investigating the resulting joint probability distribution of the typical species. In the thermodynamic limit, the difference between a system composed of S species and the corresponding one with S + 1 species is negligible. Therefore, one can write:
By gathering all relevant information about the so-called cavity field,
For compactness, we skip all technical details at this stage. Proceeding step-by-step – the full derivation can nevertheless be found in the Appendix – we end up writing the expression for the marginal probability distribution:
where Nc has been replaced by N denoting the typical species abundance, m = [1 − σ2β(qd − q0)] denotes the mass term borrowing field-theory terminology, and ζ is an auxiliary Gaussian variable.
Acknowledgements
We thank Silvia De Monte for insightful discussions. S.S. acknowledges Iniziativa PNC0000002-DARE – Digital Lifelong Prevention. A.M. acknowledges financial support under the National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.1, Call for tender No. 104 by the Italian Ministry of University and Research (MUR), funded by the European Union – NextGenerationEU – Project Title “Emergent Dynamical Patterns of Disordered Systems with Applications to Natural Communities” – CUP 2022WPHMXK - Grant Assignment Decree No. 2022WPHMXK adopted on 19/09/2023 by the Italian Ministry of Ministry of University and Research (MUR). A.A. acknowledges the support received from the Agence Nationale de la Recherche (ANR), under the grant ANR-23-CE30-0012-01 (project “SIDECAR”).
Additional information
Author contributions
S.S. and A.A. designed and supervised research, J.P. performed research, E.V.S. and S.F. provided data filtering criteria; all the authors wrote the paper.
Appendices
S1. Mapping between MacArthur and Lotka-Volterra Models
We aim to investigate the mapping between the Lokta-Volterra model and the consumer-resource (CR) or MacArthur (MA) dynamics, as analyzed in Tikhonov and Monasson [2017], Altieri and Franz [2019] for a high-dimensional version. We thus consider Eq. (1) of the main text, which describes the evolution of S randomly interacting species. The self-interaction term parameters are the growth rate ri, and the carrying capacity Ki, whereas all the inter-species interaction parameters are collected into the matrix α. For this argument, we will consider a simplified model version, where stochasticity (a.k.a. effective temperature) and immigration rate are set to zero, leading to a deterministic version, as in Biroli et al. [2018]. This model describes the dynamics of the relative species abundances Ni – previously normalized with respect to the total number of individuals in the pool – according to:
where we have introduced
where
By plugging the obtained resource abundance in the equation for species dynamics, we precisely recover the Generalized Lotka-Volterra model
(16)
Note that, between the second and third passage, we decomposed the sum over j in the diagonal and off-diagonal terms, respectively j = i and j ≠ i, then swapped the summation over resources and species. Finally, we have introduced the quantities
Since the interactions are expressed as a scalar product between uptake profiles, the derived equations correspond to a gLV model with symmetric interactions.
S2. Replica formalism for the disordered Generalized Lotka-Volterra model
To obtain a full characterization of the different emergent phases in the disordered Generalized Lotka-Volterra model, we can take advantage of the replica method. Using the replica identity Mézard et al. [1987], Altieri and Baity-Jesi [2024], i.e.
where β = 1/T is the inverse of the demographic noise amplitude. The overline
The analytical treatment requires the introduction of the overlap matrix, Qab – whose diagonal value is
which remains, at this level, as general as possible. The action 𝒜 reads
to be eventually evaluated by the Laplace method, or saddle-point approximation, in the large S limit.
By resorting to the replica-symmetric (RS) ansatz – according to which the overlap matrix is parametrized by two values only, qd and q0, the diagonal value or self-overlap, and the off-diagonal value, respectively – the last piece in Eq. (20) can be expressed as a function of an effective Hamiltonian
The free energy is thus re-written as
with the associated replica symmetric Hamiltonian:
Note that, although in the computation above we have considered a species-dependent Ki to be as general as possible, in our work we have assumed Ki = K for all species. The full expressions of the parameters h, q0, and qd, to be obtained by a saddle-point approximation of the RS free energy have been reported in the Methods. They can be solved iteratively as also explained in Altieri et al. [2021], Lorenzana and Altieri [2022].
Comment on compositional abundances
For purely theoretical purposes, dealing with absolute or relative abundances makes little difference. However, for the metagenomics data available to us, it is essential to perform the analysis as a function of relative abundances. Forcing the abundances to be normalized is formally equivalent to adding a global constraint in the Hamiltonian such that
The procedure explained above remains exactly the same along with the introduction of n replicas of the reference system. Optimizing over Qab, Ha, and the parameter γa things get easier within the RS approximation. In fact, γa = γ, satisfying a uniformity condition for all replicas. The only visible difference is in the linear term of (23) which now incorporates the explicit dependence on the multiplier γ leading to
We have already discussed in the text how the μ parameter does not play a significant role in determining the different phases, confirming prior results by Bunin and collaborators Bunin [2017], Lorenzana and Altieri [2022]. Once more, we can safely claim that the actual differences and emerging data clustering between healthy and sick samples are driven by the noise amplitude T and the interaction heterogeneity σ.
S3. Stability analysis: Hessian matrix and zero modes
To properly understand the stability of the single equilibrium phase and highlight possibly emergent multiple attractor regimes, we need to perform a stability computation. Computing the stability against external perturbations requires first the definition of the replicated partition function:
then the study the harmonic fluctuation of the free energy around the RS solution. For the latter, the Hessian matrix of the action 𝒜 is needed
where the subscript
Three different elements appear in the computation depending on the constraint/equality condition of their replica indices
Sticking to a single-equilibrium regime, namely in the RS approximation, the expression for the replicon eigenvalue can be further simplified
which precisely corresponds to Eq. (3) of the main text. A strictly positive or a vanishing value of the replicon are associated with a stable or marginally stable phase, respectively. In the second case, the RS ansatz is no longer valid and requires the use of a one-step or multiple-step replica-symmetry breaking.
S4. Cavity argument for the species abundance distribution
In the absence of demographic fluctuations and immigration, it is well-known that the species abundance distribution reflects a truncated Gaussian Yoshino et al. [2008], Bunin [2017], Tikhonov and Monasson [2017], Lorenzana and Altieri [2022]. Albeit the computation gets more complicated in the presence of demographic noise, we can nevertheless employ the cavity method Mezard and Montanari [2009], Zamponi [2010], Altieri and Baity-Jesi [2024] to obtain the single-species marginal probability distribution. Based on a cavity argument, we pretend to add a new species to the ecosystem and eventually investigate the resulting joint probability distribution. In the thermodynamic limit, for S ≫ 1, it reads
where, for the sake of simplicity, we have assumed the carrying capacities to be species-independent, i.e. Ki = K.
The last piece in parentheses denotes the cavity field,
The probability distribution conditioned to hc then reads:
where the thermal and the disordered averages of the cavity field, hc, must eventually be evaluated. Furthermore, to go beyond a naïve mean-field approximation, we need to subtract the background effect exerted by all other species on the picked one through an Onsager reaction term.
In the above expression,
The next step requires the computation of the disordered average. By doing so, we end up with the distribution of the second field, which is defined in terms of the two moments:
A similar derivation was obtained in the case of a generalized Lotka-Volterra model with noise to be compared with numerical distributions in different regimes Wu et al. [2021]. The resulting species abundance distribution typically exhibits a divergence at N = 0 and a secondary maximum, therefore closer to what is referred to as niche scenario.
Then, by introducing a rescaled random variable ζ and integrating over it, the marginal probability distribution becomes
where Nc is just replaced by N denoting the typical species abundance and T → 1/β. The expression for the normalization is roughly similar to the one in Eq. (34) with the only difference being that the mean and the variance of the cavity field must be replaced by the quenched average of
S5. Estimation of Neutral and Niche processes
Following the results of the cavity method in Section 5, we can describe the species abundance distribution of the system (see Eq. (37)) via the quantities ν = βλ > 0, the quadratic coefficient m = 1 − σ2β(qd − q0) and the disorder variable
As we specified in the main text, we refer to the quadratic coefficient m as a mass. When the mass becomes negative, the self-consistent equations (6) in the Methods section are no longer well-defined, and the order parameters diverge. This signals a transition to an unbounded growth regime.
So far, the obtained SAD is flexible enough to accommodate different qualitative shapes frequently appearing in theoretical ecology Azaele et al. [2016]. When ν > 1, the migration strength turns out to be stronger than demographic fluctuations and the distribution resembles a modal log-normal. On the other hand, when 0 < ν < 1 demographic noise overcomes immigration, and a low-abundance (integrable) divergence appears. In our case, ν ≈ 10−3, so demographic noise is much stronger than immigration.
When the divergence in zero appears, two further shapes are possible. If there exists a secondary extremum for the species abundance distribution, then there is an coexistence of niche (driven by species interactions) and neutral (driven by demographic noise) effects. Conversely, if no secondary extrema exist, birth and death effects are the key drivers of the dynamics. In our interpretation of the model, the species abundance distribution is the stationary distribution of a single realization of the Lotka-Volterra dynamics. Given a set of parameters θ = (μ, σ, β, λ), it is possible to evaluate how many realizations of the Lotka-Volterra dynamics will display a secondary extremum. Taking the form of a fraction, we introduce a quantity ψ that captures the chance of displaying such an extremum and measures the relative contribution between selective and neutral effects. Namely, we can evaluate the probability that ℋRS has a minimum different from zero:
Being an algebraic equation of degree two, it has two solutions
where
where
References
- Metabolites: messengers between the microbiota and the immune systemGenes & development 30:1589–1597Google Scholar
- Adaptive immunity increases the pace and predictability of evolutionary change in commensal gut bacteriaNature communications 6:8945Google Scholar
- Intrahost evolution of the gut microbiotaNature Reviews Microbiology 21:590–603Google Scholar
- Multi-omics of the gut microbial ecosystem in inflammatory bowel diseasesNature 569:655–662Google Scholar
- Homeostasis and dysbiosis of the gut microbiome in health and diseaseJournal of biosciences 44:1–8Google Scholar
- Gut microbiota in the pathogenesis of inflammatory bowel diseaseClinical journal of gastroenterology 11:1–10Google Scholar
- Neutral models of microbiome evolutionPLoS computational biology 11:e1004365Google Scholar
- Application of a neutral community model to assess structuring of the human lung microbiomeMBio 6:10–1128Google Scholar
- Stochastic neutral modelling of the gut microbiota’s relative species abundance from next generation sequencing dataBMC bioinformatics 17:179–188Google Scholar
- Neutrality in the metaorganismPLoS Biology 17:e3000298Google Scholar
- Stochastic logistic models reproduce experimental time series of microbial communitieseLife 9:e55650https://doi.org/10.7554/eLife.55650Google Scholar
- Macroecological laws describe variation and diversity in microbial communitiesNature communications 11:4743Google Scholar
- The stochastic logistic model with correlated carrying capacities reproduces beta-diversity metrics of microbial communitiesPLOS Computational Biology 18:e1010043Google Scholar
- Statistical mechanics of ecological systems: Neutral theory and beyondReviews of Modern Physics 88:035003Google Scholar
- Emergent functional organization of gut microbiomes in health and diseasesBiomolecules 14:5Google Scholar
- Emergent ecological patterns and modelling of gut microbiomes in health and in diseasePLOS Computational Biology 20:e1012482Google Scholar
- Universality of human microbial dynamicsNature 534:259–262Google Scholar
- Positive interactions are common among culturable bacteriaScience advances 7:eabi7159Google Scholar
- Deciphering microbial interactions in synthetic human gut microbiome communitiesMolecular systems biology 14:e8157Google Scholar
- Microbial interactions: from networks to modelsNature Reviews Microbiology 10:538–550Google Scholar
- Mapping the ecological networks of microbial communitiesNature communications 8:2042Google Scholar
- Sparse species interactions reproduce abundance correlation patterns in microbial communitiesProceedings of the National Academy of Sciences 121:e2309575121Google Scholar
- Inferring species interaction networks from species abundance data: A comparative evaluation of various statistical and machine learning methodsEcological Informatics 5:451–464Google Scholar
- Fundamental limitations of network reconstruction from temporal dataJournal of the Royal Society Interface 14:20160966Google Scholar
- Reconciling cooperation, biodiversity and stability in complex ecological communitiesScientific reports 9:5580Google Scholar
- How sample heterogeneity can obscure the signal of microbial interactionsThe ISME journal 13:2639–2646Google Scholar
- Some thoughts about the challenge of inferring ecological interactions from spatial dataBiodiversity Informatics 15:61–66Google Scholar
- Constrained proteome allocation affects coexistence in models of competitive microbial communitiesThe ISME Journal 15:1458–1477Google Scholar
- Environmental fluctuations explain the universal decay of species-abundance correlations with phylogenetic distancebioRxiv Google Scholar
- Taxonomic classification method for metagenomics based on core protein families with core-kaijuNucleic Acids Research 48:e93–e93Google Scholar
- Correlation detection strategies in microbial data sets vary widely in sensitivity and precisionThe ISME journal 10:1669–1681Google Scholar
- Microbiome datasets are compositional: and this is not optionalFrontiers in microbiology 8:2224Google Scholar
- Mapping the microbial interactome: Statistical and experimental approaches for microbiome network inferenceExperimental Biology and Medicine 244:445–458Google Scholar
- Will a large complex system be stable?Nature 238:413–414Google Scholar
- Stability criteria for complex ecosystemsNature 483:205–208Google Scholar
- Ecological communities with lotka-volterra dynamicsPhysical Review E 95:042414Google Scholar
- Dynamically evolved community size and stability of random lotka-volterra ecosystems (a)Europhysics Letters 123:48004Google Scholar
- Marginally stable equilibria in critical ecosystemsNew Journal of Physics 20:083051Google Scholar
- Properties of equilibria and glassy phases of the random lotka-volterra model with demographic noisePhysical Review Letters 126:258301Google Scholar
- Well-mixed lotka-volterra model with random strongly competitive interactionsPhysical Review E 105:024307Google Scholar
- Generic assembly patterns in complex ecological communitiesProceedings of the National Academy of Sciences 115:2156–2161Google Scholar
- Diversity begets stability: Sublinear growth and competitive coexistence across ecosystemsScience 383:eadg8488https://doi.org/10.1126/science.adg8488Google Scholar
- Emergent phases of ecological diversity and dynamics mapped in microcosmsScience 378:85–89Google Scholar
- Effects of intraspecific cooperative interactions in large ecosystemsSciPost Physics 12:013Google Scholar
- Chaotic turnover of rare and abundant species in a strongly interacting model communityProceedings of the National Academy of Sciences 121:e2312822121Google Scholar
- Generalized lotka-volterra systems with time correlated stochastic interactionsPhysical Review Letters 133:167101Google Scholar
- Information, physics, and computationOxford University Press Google Scholar
- An introduction to the theory of spin glassesEncyclopedia of Condensed Matter Physics :361–370Google Scholar
- Stability of the sherrington-kirkpatrick solution of a spin glass modelJournal of Physics A: Mathematical and General 11:983Google Scholar
- Replica symmetry breaking in the random replicant modelJournal of Physics A: Mathematical and General 28:4697Google Scholar
- Replicators with random interactions: A solvable modelPhysical Review A 39:4333Google Scholar
- Spin glass theory and beyond: An Introduction to the Replica Method and Its ApplicationsWorld Scientific Publishing Company Google Scholar
- Understanding species abundance distributions in complex ecosystems of interacting speciesarXiv preprint arXiv:2103.02081 Google Scholar
- Testing the anna karenina principle in human microbiome-associated diseasesIscience 23Google Scholar
- Deviation from neutral species abundance distributions unveils geographical differences in the structure of diatom communitiesScience Advances 10:eadh0477Google Scholar
- Many-species ecological fluctuations as a jump process from the brink of extinctionPhysical Review X 14:011037Google Scholar
- Microbiome one health model for a healthy ecosystemScience in One Health :100065Google Scholar
- Gut microbiome structure and metabolic activity in inflammatory bowel diseaseNature microbiology 4:293–305Google Scholar
- Longitudinal multi-omics reveals subset-specific mechanisms underlying irritable bowel syndromeCell 182:1460–1473Google Scholar
- Mean field theory of spin glassesarXivpreprint arXiv:1008.4844 Google Scholar
- Scipy 1.0: fundamental algorithms for scientific computing in pythonNature methods 17:261–272Google Scholar
- Rank abundance relations in evolutionary dynamics of random replicatorsPhysical Review E 78:031924Google Scholar
- Constraint satisfaction mechanisms for marginal stability and criticality in large ecosystemsPhysical Review E 99:010401Google Scholar
- Collective phase in resource competition in a highly diverse ecosystemPhysical review letters 118:048103Google Scholar
- Quadratic replica coupling in the sherrington-kirkpatrick mean field spin glass modelJournal of Mathematical Physics 43:3704–3716Google Scholar
- On the overlap in the multiple spherical sk models
- Mutual information for symmetric rank-one matrix estimation: A proof of the replica formulaAdvances in Neural Information Processing Systems 29Google Scholar
- Random fields and spin glasses: a field theory approachCambridge University Press Google Scholar
- Composite operators in cubic field theories and link-overlap fluctuations in spin-glass modelsPhysical Review B 93:024422Google Scholar
Article and author information
Author information
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.105948. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Pasqualini et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- views
- 1,326
- downloads
- 30
- citation
- 1
Views, downloads and citations are aggregated across all versions of this paper published by eLife.