The process of coarse-graining abundances using the phylogeny.

Taxonomic assignment in 16S rRNA amplicon sequence data provides the opportunity to investigate how properties of communities vary at different taxonomic scales. The most straightforward means of coarse-graining here is to sum the abundances of OTUs/ASVs that belong to the same taxonomic group. Amplicon-based studies also provide information about the shared evolutionary history of community members, which can be leveraged by constructing phylogenetic trees. An analogous coarse-graining procedure can then be defined on the phylogeny: a phylogenetic distance is chosen, and terminal nodes are collapsed if their distance to a common ancestor is less than the prescribed distance.
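The summation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `coarse_grain`, the dictionary layout, and the toy counts are all hypothetical, and the group labels are assumed to have been obtained already (from a taxonomy table, or by cutting a tree at the chosen phylogenetic distance).

```python
def coarse_grain(abundances, group_of):
    """Sum per-sample read counts of OTUs that share a group label.

    abundances: dict mapping OTU id -> list of counts, one per sample.
    group_of:   dict mapping OTU id -> group label (hypothetical labels,
                e.g. a genus name or a clade id from a collapsed tree).
    """
    grouped = {}
    for otu, counts in abundances.items():
        label = group_of[otu]
        if label not in grouped:
            grouped[label] = [0] * len(counts)
        # Element-wise addition keeps the sample order intact.
        grouped[label] = [a + b for a, b in zip(grouped[label], counts)]
    return grouped


# Hypothetical toy data: three OTUs observed in two samples.
abundances = {"otu1": [5, 0], "otu2": [3, 2], "otu3": [0, 7]}
group_of = {"otu1": "cladeA", "otu2": "cladeA", "otu3": "cladeB"}
coarse = coarse_grain(abundances, group_of)
```

Here `otu1` and `otu2` are merged into `cladeA`, while `otu3` is left alone in `cladeB`. The same function serves both taxonomic and phylogenetic coarse-graining, since only the labels differ.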

The AFD remains invariant under coarse-graining.

Despite phylogenetic coarse-graining, a) the general shape of the AFD for OTUs that were present in all samples (i.e., an occupancy of one) and b) the relationship between the mean coarse-grained abundance and occupancy across sites remain invariant. Furthermore, predictions obtained from the gamma distribution capture the relationship between the mean abundance and occupancy, suggesting that the gamma remains a useful quantitative null model under coarse-graining. All data in this plot are from the human gut microbiome.
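One way a gamma AFD can yield an occupancy prediction is sketched below. It rests on assumptions that are not stated in this caption: relative abundance is gamma distributed with a given mean and shape, and reads are Poisson sampled at each sample's depth, so the probability of observing zero reads is (1 + mean × depth / shape)^(−shape). The function names and the toy numbers are hypothetical, and the exact form used in the paper's equations may differ.

```python
import statistics


def gamma_shape(rel_abundances):
    """Method-of-moments shape parameter: squared mean over variance."""
    m = statistics.fmean(rel_abundances)
    v = statistics.variance(rel_abundances)  # sample variance
    return m * m / v


def predicted_occupancy(mean_rel_abundance, shape, read_depths):
    """Expected fraction of samples in which a taxon is detected.

    Assumes gamma-distributed relative abundance and Poisson sampling
    of reads, so that the number of reads is negative binomial.
    """
    absent = [
        (1.0 + mean_rel_abundance * n / shape) ** (-shape)
        for n in read_depths
    ]
    return 1.0 - sum(absent) / len(absent)


# Hypothetical taxon: mean relative abundance 0.001, shape 1,
# observed in two samples of 1000 reads each.
occ = predicted_occupancy(0.001, 1.0, [1000, 1000])
```

Because the prediction depends only on the mean, the shape, and the read depths, it can be recomputed unchanged after OTUs are summed into coarser groups.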

The gamma distribution successfully predicts mean richness and diversity under phylogenetic coarse-graining.

a) The expected richness predicted by the gamma (Eq. 11) matched observed richness across phylogenetic coarse-graining scales, as illustrated by data from the human gut. b) Predictions remained successful across all environments. c,d) Similarly, predictions of expected diversity (Eq. 12) succeeded across coarse-graining scales in all environments.

The gamma can only predict the variance of richness and diversity under phylogenetic coarse-graining when covariance is included.

a,b) In contrast with the mean, the variance of richness and diversity predicted by the gamma (Eq. 15) under the assumption of independence among species fails to capture empirical estimates from the human gut. Predictions become comparable to observations only when empirical estimates of covariance are included in the predictions of the gamma distribution. c,d) This lack of predictive success is consistent across environments, e,f) though the addition of covariance consistently improves our analytic predictions.
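The role of covariance follows from a general identity: the variance of a sum of presence indicators equals the sum of their variances plus twice the sum of their pairwise covariances. The sketch below illustrates this for richness using an empirical presence/absence table rather than gamma-derived terms, which is a simplification made for illustration. The function name and toy table are hypothetical.

```python
def richness_variance(presence, include_covariance=True):
    """Variance of richness across samples from a presence/absence table.

    presence: list of rows, one per taxon, each a list of 0/1 per sample.
    With include_covariance=False, taxa are treated as independent and
    only the per-taxon variances are summed.
    """
    n_samples = len(presence[0])
    means = [sum(row) / n_samples for row in presence]

    def cov(a, b, mean_a, mean_b):
        # Population covariance (divides by the number of samples).
        return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n_samples

    total = sum(cov(row, row, m, m) for row, m in zip(presence, means))
    if include_covariance:
        for i in range(len(presence)):
            for j in range(i + 1, len(presence)):
                total += 2.0 * cov(presence[i], presence[j], means[i], means[j])
    return total


# Hypothetical table: two taxa that always co-occur across four samples.
presence = [[1, 0, 1, 0], [1, 0, 1, 0]]
var_independent = richness_variance(presence, include_covariance=False)
var_with_cov = richness_variance(presence, include_covariance=True)
```

In this toy case richness alternates between 2 and 0, so its true variance is 1. The independent calculation recovers only half of that, and the covariance term supplies the rest.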

The gamma distribution as a tool for investigating the novelty of fine vs. coarse-grained slopes.

a,b) The predictions of the gamma distribution (Eq. 3) successfully reproduce observed fine vs. coarse-grained richness slopes across scales of phylogenetic coarse-graining. c,d) In contrast, the predictions of the gamma distribution fail to capture diversity slopes (Eq. 4).

Gamma distribution simulations with correlations capture observed diversity slopes.

Observed fine vs. coarse-grained diversity slopes can be quantitatively reproduced under phylogenetic coarse-graining by simulating correlated, gamma-distributed AFDs at the OTU level.
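Correlated gamma-distributed abundances can be generated in several ways. The sketch below uses a shared-component construction, which is a swapped-in technique chosen for simplicity and is not necessarily the simulation scheme behind this figure: each taxon's abundance is a rescaled sum of one gamma variate shared by all taxa and one private gamma variate. It assumes a common shape parameter and a single positive correlation `rho` for every pair of taxa. All names and parameter values are hypothetical.

```python
import random


def correlated_gamma_sample(means, shape, rho, rng):
    """Draw one sample of correlated, gamma-distributed abundances.

    Each taxon gets X_i = (means[i] / shape) * (G0 + G_i), where
    G0 ~ Gamma(rho * shape) is shared and G_i ~ Gamma((1 - rho) * shape)
    is private. G0 + G_i is Gamma(shape), so X_i has mean means[i] and
    every pair of taxa has correlation rho (requires 0 < rho < 1).
    """
    shared = rng.gammavariate(rho * shape, 1.0)
    sample = []
    for m in means:
        private = rng.gammavariate((1.0 - rho) * shape, 1.0)
        sample.append((m / shape) * (shared + private))
    return sample


# Hypothetical community of two OTUs simulated over many samples.
rng = random.Random(0)
draws = [correlated_gamma_sample([0.1, 0.2], 2.0, 0.5, rng) for _ in range(20000)]
```

Summing simulated OTUs within groups then gives coarse-grained abundances whose diversity can be compared with the fine-grained values. A copula-based scheme would be needed to impose an arbitrary correlation matrix, including negative correlations.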