The process of coarse-graining abundances using the phylogeny.

Taxonomic assignment in 16S rRNA amplicon sequence data provides the opportunity to investigate how properties of communities vary at different taxonomic scales. The most straightforward means of coarse-graining here is to sum the abundances of OTUs/ASVs that belong to the same taxonomic group. Amplicon data also provide information about the shared evolutionary history of community members, information that can be leveraged by constructing phylogenetic trees. A coarse-graining procedure analogous to the one based on taxonomy can then be defined: a phylogenetic distance is chosen, and terminal nodes are collapsed if their distance to a common ancestor is less than that distance.
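The collapsing step can be illustrated with a minimal sketch. It assumes the tree is stored as child-to-parent and child-to-branch-length dictionaries, and it assigns each terminal node to the most ancient ancestor it can reach without exceeding the chosen distance. The function names, the toy tree, and the strict "less than" rule are illustrative assumptions, not the implementation used for the figures.

```python
from collections import defaultdict


def assign_groups(parent, branch_length, leaves, max_distance):
    """Map each leaf to the most ancient ancestor reachable within max_distance."""
    groups = {}
    for leaf in leaves:
        node, travelled = leaf, 0.0
        # Climb toward the root while the next branch keeps us under the threshold.
        while node in parent and travelled + branch_length[node] < max_distance:
            travelled += branch_length[node]
            node = parent[node]
        groups[leaf] = node
    return groups


def coarse_grain(abundances, groups):
    """Sum per-site abundances over all leaves assigned to the same group."""
    n_sites = len(next(iter(abundances.values())))
    coarse = defaultdict(lambda: [0] * n_sites)
    for leaf, counts in abundances.items():
        for site, count in enumerate(counts):
            coarse[groups[leaf]][site] += count
    return dict(coarse)


# Hypothetical toy tree: root -> A -> {otu1, otu2}, and root -> otu3.
parent = {"otu1": "A", "otu2": "A", "A": "root", "otu3": "root"}
branch_length = {"otu1": 0.1, "otu2": 0.1, "A": 0.5, "otu3": 0.6}
abundances = {"otu1": [3, 0], "otu2": [1, 2], "otu3": [5, 5]}  # reads per site

groups = assign_groups(parent, branch_length, list(abundances), max_distance=0.2)
coarse = coarse_grain(abundances, groups)
```

With a distance of 0.2, otu1 and otu2 collapse into their shared ancestor A and their reads are summed site by site, while otu3 stays on its own because its branch is longer than the threshold. On trees that are far from ultrametric, this per-leaf rule can assign related leaves to nested ancestors, so a production implementation would need an explicit rule for that case.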

The shape of the AFD remained qualitatively invariant under coarse-graining.

a) Under phylogenetic coarse-graining, the general shape of the AFD for OTUs that were present in all sites (i.e., an occupancy of one) remained qualitatively invariant. b) Similarly, the shape of the relationship between the mean coarse-grained abundance across hosts and occupancy across sites did not tend to vary. Predictions obtained from the gamma distribution captured the relationship between the mean abundance and occupancy, suggesting that the gamma distribution remains a useful quantitative null model under coarse-graining. All data in this plot are from the human gut microbiome.
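As an illustration of how a gamma AFD yields an occupancy prediction, the sketch below assumes that reads are Poisson-sampled from a gamma-distributed relative abundance, so that read counts are negative binomial and the probability of observing zero reads at depth N is (1 + mean·N/shape)^(−shape). The function and parameter names are hypothetical, and the exact expression used for the figure may differ in detail.

```python
def predicted_occupancy(mean_abundance, shape, read_depths):
    """Expected fraction of sites in which a taxon is observed at least once.

    mean_abundance: mean relative abundance of the taxon across sites
    shape:          gamma shape parameter (squared inverse coefficient of variation)
    read_depths:    total number of reads in each site
    """
    # Zero-read probability of a gamma-Poisson (negative binomial) count at depth n.
    p_absent = [(1.0 + mean_abundance * n / shape) ** (-shape) for n in read_depths]
    return 1.0 - sum(p_absent) / len(p_absent)


# Hypothetical taxon: mean relative abundance 1e-4, shape 1, two sites of 10,000 reads.
occupancy = predicted_occupancy(1e-4, 1.0, [10_000, 10_000])
```

Because the zero-read probability decreases as the mean abundance grows, the predicted occupancy rises monotonically with mean abundance, which is the relationship shown in panel b.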

The gamma distribution successfully predicted mean richness and diversity under phylogenetic coarse-graining.

a) The expected richness derived from the gamma distribution (Eq. 13) was capable of predicting richness across phylogenetic coarse-graining scales, as illustrated by data from the human gut. b) Predictions remained successful across all environments, suggesting that a minimal model of zero interactions was sufficient to predict observed properties of community composition. c,d) Similarly, predictions of expected diversity (Eq. 14) also succeeded across coarse-graining scales for all environments. The shade of a given data point represents the phylogenetic distance used for coarse-graining, with lighter colors representing finer scales and darker colors representing coarser scales.

The gamma distribution only predicts the variance of richness and diversity under phylogenetic coarse-graining when covariance is included.

a,b) In contrast with the mean, the variance of richness and diversity estimates predicted by the gamma distribution (Eq. 17) failed to capture empirical estimates from the human gut. Predictions were only comparable when empirical estimates of covariance were included in the predictions of the gamma distribution, meaning that dependence among community members is essential to describe the variation in measures of biodiversity across communities. c,d) This lack of predictive success was consistent across environments, e,f) though the addition of covariance consistently improved our analytic predictions. The colorscale used here is identical to the colorscale used in Fig. 3.
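The role of covariance can be made concrete for richness, which is a sum of presence indicators: its variance equals the sum of the per-taxon terms p(1 − p) plus twice the sum of pairwise covariances between indicators. The sketch below is a simplified illustration that takes the presence probabilities directly from a toy presence/absence table rather than from the gamma distribution; the function name and data are hypothetical.

```python
def richness_variance(presence):
    """Return (variance assuming independence, variance including covariance).

    presence[s][i] is 1 if taxon i was observed in site s, else 0.
    """
    n_sites = len(presence)
    n_taxa = len(presence[0])
    # Fraction of sites in which each taxon is present.
    p = [sum(site[i] for site in presence) / n_sites for i in range(n_taxa)]
    # Independent taxa: variances of the Bernoulli indicators simply add.
    independent = sum(p_i * (1.0 - p_i) for p_i in p)
    # Dependence among taxa: add every pairwise covariance of the indicators.
    covariance_sum = 0.0
    for i in range(n_taxa):
        for j in range(i + 1, n_taxa):
            joint = sum(site[i] * site[j] for site in presence) / n_sites
            covariance_sum += joint - p[i] * p[j]
    return independent, independent + 2.0 * covariance_sum


# Hypothetical table: taxa 0 and 1 always co-occur, taxon 2 is unrelated to them.
presence = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
var_independent, var_with_cov = richness_variance(presence)

# Variance of richness computed directly from the table (population convention).
richness = [sum(site) for site in presence]
mean_richness = sum(richness) / len(richness)
var_observed = sum((r - mean_richness) ** 2 for r in richness) / len(richness)
```

In this toy table the independence-only variance underestimates the observed variance of richness, and adding the covariance terms recovers it exactly, mirroring the qualitative pattern described in panels a to f.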

The slope of the fine vs. coarse-grained relationship for richness could be predicted by the gamma distribution, but was novel for estimates of diversity.

a,b) The predictions of the gamma distribution (Eq. 3) successfully reproduced observed fine vs. coarse-grained richness slopes across scales of phylogenetic coarse-graining. c,d) In contrast, the predictions of the gamma distribution failed to capture diversity slopes (Eq. 4). The colorscale used here is identical to the colorscale used in Fig. 3. Squared Pearson correlation coefficients (ρ²) are computed over all slopes for all taxa across all coarse-graining scales.

Including correlations allows the gamma distribution to capture observed diversity slopes.

Observed fine vs. coarse-grained diversity slopes could be quantitatively reproduced under phylogenetic coarse-graining by simulating correlated gamma distributed AFDs at the OTU-level. The colorscale used here is identical to the colorscale used in Fig. 3. Squared Pearson correlation coefficients (ρ²) are computed over all slopes for all taxa across all coarse-graining scales.
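The caption does not say how the correlated gamma AFDs were generated. One simple construction, shown here purely as an assumption-laden sketch, adds a shared gamma component to a taxon-specific gamma component: the sum is again gamma distributed, and two taxa with shapes a_i and a_j have correlation a_0 / sqrt(a_i · a_j), where a_0 is the shape of the shared component. This only produces positive correlations, and every name below is hypothetical.

```python
import random


def correlated_gamma_afds(means, shapes, shared_shape, n_sites, seed=0):
    """Draw n_sites abundance vectors with gamma marginals and positive correlation."""
    if shared_shape >= min(shapes):
        raise ValueError("shared_shape must be smaller than every taxon's shape")
    rng = random.Random(seed)
    samples = []
    for _ in range(n_sites):
        shared = rng.gammavariate(shared_shape, 1.0)  # component common to all taxa
        site = []
        for mean, shape in zip(means, shapes):
            own = rng.gammavariate(shape - shared_shape, 1.0)  # taxon-specific part
            # shared + own is Gamma(shape, 1); rescale so the mean equals `mean`.
            site.append(mean / shape * (shared + own))
        samples.append(site)
    return samples


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Two hypothetical OTUs with shape 2 sharing a component of shape 1,
# so the correlation expected from the construction is 1 / sqrt(2 * 2) = 0.5.
draws = correlated_gamma_afds([1.0, 2.0], [2.0, 2.0], 1.0, n_sites=20_000, seed=1)
otu_a = [site[0] for site in draws]
otu_b = [site[1] for site in draws]
rho = pearson(otu_a, otu_b)
```

Summing such simulated OTU-level abundances within each coarse-grained group would then give correlated coarse-grained AFDs from which diversity slopes can be computed. Reproducing an arbitrary empirical correlation matrix, including negative correlations, would require a different method such as a Gaussian copula.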