
Does diversity beget diversity in microbiomes?

  1. Naïma Madi
  2. Michiel Vos
  3. Carmen Lia Murall
  4. Pierre Legendre
  5. B Jesse Shapiro (corresponding author)
  1. Département de sciences biologiques, Université de Montréal, Canada
  2. European Centre for Environment and Human Health, University of Exeter, United Kingdom
  3. Department of Microbiology and Immunology, McGill University, Canada
  4. McGill Genome Centre, McGill University, Canada
Research Article
Cite this article as: eLife 2020;9:e58999 doi: 10.7554/eLife.58999
5 figures, 5 tables, 2 data sets and 7 additional files

Figures

Figure 1
Contrasting the Diversity Begets Diversity (DBD) and Ecological Controls (EC) models.

(A) In this hypothetical scenario, microbiome sample 1 contains one non-focal genus, and two amplicon sequence variants (ASVs) within the focal genus (point at x = 1, y = 2 in the plot). Sample 2 contains three non-focal genera, and four ASVs within the focal genus (point at x = 3, y = 4). Tracing a line through these points yields a positive diversity slope, supporting the DBD model (red). (B) Alternatively, a negative slope would support the Ecological Controls (EC) model (blue line). In the middle panel, we consider a community assembly model to explain the hypothetical data of the top panel, in which standing diversity (black points) in a community selects (for or against) new types (referred to here as ASVs) which arrive via migration (purple points and arrows). In the bottom panel, we consider an evolutionary diversification model of a focal lineage (genus) into ASVs as a function of initial genus-level community diversity present at the time of diversification.
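
As a worked illustration of panel A, the diversity slope through the two hypothetical samples can be computed directly. This is only a minimal sketch: the function name `diversity_slope` is our own and is not taken from the authors' code, and an ordinary least-squares slope is assumed here for illustration.

```python
def diversity_slope(samples):
    """Least-squares slope of focal-lineage diversity (y) on community diversity (x).

    `samples` is a list of (non_focal_genera, asvs_in_focal_genus) pairs.
    The name and input layout are illustrative assumptions.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    return sxy / sxx


# Sample 1: one non-focal genus, two focal ASVs; sample 2: three and four.
slope = diversity_slope([(1, 2), (3, 4)])
print(slope)  # 1.0, a positive slope as expected under DBD
```

Reversing the y-values, as in `diversity_slope([(1, 4), (3, 2)])`, gives a negative slope of the kind that would support the EC model.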

Figure 2 with 26 supplements
Focal-lineage diversity as a function of community diversity in the top two most prevalent taxa at each taxonomic level.

As in Figure 1, the x-axes show community diversity in units of the number of non-focal taxa (e.g. the number of non-Proteobacteria phyla for the left-most column), and the y-axes show the taxonomic ratio within the focal taxon (e.g. the number of classes within Proteobacteria). Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey. Note that linear models are distinct from GLMMs, and are for illustrative purposes only. Four representative environments are shown (see Figure 2—figure supplements 2–16 for plots in all 17 environments).
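
The colour coding described above can be sketched as a small decision rule. This is a hedged illustration: the function name `classify_slope` is hypothetical, and the Bonferroni correction is assumed to be applied by multiplying each raw P-value by the number of tests (17), which is equivalent to comparing the raw P-value with 0.05/17.

```python
def classify_slope(slope, p_raw, n_tests=17, alpha=0.05):
    """Return the plotting colour for one linear model.

    Assumes a Bonferroni correction of the raw P-value for `n_tests` tests;
    the function name and signature are illustrative only.
    """
    p_adjusted = min(1.0, p_raw * n_tests)
    if p_adjusted >= alpha:
        return "grey"  # non-significant
    return "red" if slope > 0 else "blue"


print(classify_slope(0.5, 0.001))   # significant and positive
print(classify_slope(-0.5, 0.001))  # significant and negative
print(classify_slope(0.5, 0.01))    # not significant after correction
```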

Figure 2—figure supplement 1
Distributions of diversity slope estimates across different random effects, from the GLMMs predicting focal lineage diversity as a function of community diversity.

(A) Class:Phylum, (B) Order:Class, (C) Family:Order, (D) Genus:Family, and (E) ASV:Genus. Estimation of random effect coefficients from the GLMMs (Table S1) shows that the effects of diversity on focal lineage diversity (slope estimates) are generally positive but can be negative in some lineages or in some combinations of environment with lineage (Environment*Lineage) or with the laboratory that submitted the dataset (Environment*Lab). Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.

Figure 2—figure supplement 2
Focal-lineage diversity as a function of community diversity across biomes in Proteobacteria.

Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.

Figure 2—figure supplement 3
Focal-lineage diversity as a function of community diversity across biomes in Bacteroidetes.

Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.

Figure 2—figure supplement 4
Focal-lineage diversity as a function of community diversity across biomes in Actinobacteria.

Linear models are shown for the number of classes per phylum (y-axis) as a function of community diversity (number of non-focal phyla, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. P-values are Bonferroni corrected for 17 tests. Significant (p<0.05) models are shown with red trend lines.

Figure 2—figure supplement 5
Focal-lineage diversity as a function of community diversity across biomes in Gammaproteobacteria.

Linear models are shown for the number of orders per class (y-axis) as a function of community diversity (non-focal classes, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 6
Focal-lineage diversity as a function of community diversity across biomes in Alphaproteobacteria.

Linear models are shown for the number of orders per class (y-axis) as a function of community diversity (non-focal classes, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 7
Focal-lineage diversity as a function of community diversity across biomes in Actinobacteria.

Linear models are shown for the number of orders per class (y-axis) as a function of community diversity (non-focal classes, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 8
Focal-lineage diversity as a function of community diversity across biomes in Actinomycetales.

Linear models are shown for the number of families per order (y-axis) as a function of community diversity (non-focal orders, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 9
Focal-lineage diversity as a function of community diversity across biomes in Flavobacteriales.

Linear models are shown for the number of families per order (y-axis) as a function of community diversity (non-focal orders, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 10
Focal-lineage diversity as a function of community diversity across biomes in Rhizobiales.

Linear models are shown for the number of families per order (y-axis) as a function of community diversity (non-focal orders, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 11
Focal-lineage diversity as a function of community diversity across biomes in Flavobacteriaceae.

Linear models are shown for genera per family (y-axis) as a function of community diversity (non-focal families, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 12
Focal-lineage diversity as a function of community diversity across biomes in Sphingomonadaceae.

Linear models are shown for genera per family (y-axis) as a function of community diversity (non-focal families, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 13
Focal-lineage diversity as a function of community diversity across biomes in Verrucomicrobiaceae.

Linear models are shown for genera per family (y-axis) as a function of community diversity (non-focal families, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 14
Focal-lineage diversity as a function of community diversity across biomes in Pseudomonas.

Linear models are shown for ASVs per genus (y-axis) as a function of community diversity (non-focal genera, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 15
Focal-lineage diversity as a function of community diversity across biomes in Planctomyces.

Linear models are shown for ASVs per genus (y-axis) as a function of community diversity (non-focal genera, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 16
Focal-lineage diversity as a function of community diversity across biomes in Clostridium.

Linear models are shown for ASVs per genus (y-axis) as a function of community diversity (non-focal genera, x-axis) in each of the 17 environments (EMPO3 biomes). Only environments containing the focal lineage are shown. Significant positive diversity slopes are shown in red, negative in blue (linear models, p<0.05, Bonferroni corrected for 17 tests), and non-significant in grey.

Figure 2—figure supplement 17
Null models based on Neutral Theory.

Results are shown from data simulated under (A) neutral Model 1, (B) neutral Model 2, or (C) neutral Model 3. Model 1 is sampled from the zero-sum multinomial distribution with a single distribution for the whole dataset, while Model 2 includes a separate distribution for each of the 17 different environments (EMPO 3 biomes). In Model 3 (C), the effects of DBD (top rows) or EC (bottom rows) are ‘spiked in’ at different levels, ranging from 0 to 100% of ASVs in a sample. Blue lines show a linear fit, with slopes (m) estimated by GLMM in selected panels. See Methods for model details, and Table 2 and Supplementary file 3, Section 1.2 for full GLMM results.

Figure 2—figure supplement 18
Lineage diversity (mean ASV:Genus ratio among all lineages) as a function of community diversity (number of genera) in the EMP data.

Samples from different environments (EMPO level 3) are shown in different colours, each with their corresponding linear model fit.

Figure 2—figure supplement 19
Taxonomic ratios estimated from simulated rarefied sequence data.

Each panel simulates a set of microbiome samples that differ in their diversity (number of genera in left panels A and B, number of phyla in right panels C and D) while maintaining a set true taxonomic ratio (horizontal black line). (A) True ratio set to 2 ASVs/genus, close to the per-sample mean and median in the real EMP data, in a range of samples between 1 and 1128 named genera, as observed in the real EMP data. (B) True ratio set to 20 ASVs/genus, equal to the overall mean of 22,014 named ASVs in 1128 named genera, and close to the maximum ratios observed in individual samples (Figure 2—figure supplement 5). Insets show the ranges of 1–50 and 51–150 genera, approximating observations from lower- or higher-diversity samples such as gut and soil, respectively (Figure 2—figure supplement 5). The insets only show the rarefaction to 5000 sequences, as used in the real EMP dataset. (C) True ratio set to three classes/phylum, close to the per-sample mean and median in the real EMP data, in a range of samples between 1 and 84 named phyla, as observed in the real EMP data. (D) True ratio set to 10 classes/phylum, close to the maximum ratios observed in individual samples (Figure 2—figure supplements 2–4). Different rarefaction levels are shown as different coloured lines.
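
The idea behind this simulation can be sketched in a few lines: hold the true taxonomic ratio fixed, rarefy to a set depth, and see what ratio is observed. This is a simplified illustration and not the authors' simulation. It assumes every ASV is equally abundant, and the function name `observed_ratio` and its parameters are hypothetical.

```python
import random


def observed_ratio(n_genera, asvs_per_genus, depth, seed=0):
    """Rarefy an idealised community and return observed ASVs per observed genus.

    Assumption (ours): all ASVs are equally abundant, so each read is a
    uniform draw over all (genus, ASV) pairs.
    """
    rng = random.Random(seed)
    seen_asvs = set()
    for _ in range(depth):
        genus = rng.randrange(n_genera)
        asv = rng.randrange(asvs_per_genus)
        seen_asvs.add((genus, asv))
    seen_genera = {genus for genus, _ in seen_asvs}
    return len(seen_asvs) / len(seen_genera)


# True ratio fixed at 20 ASVs/genus, rarefied to 5000 sequences.
for n_genera in (10, 100, 1000):
    print(n_genera, round(observed_ratio(n_genera, 20, depth=5000), 2))
```

Under these assumptions the observed ratio can never exceed the true ratio, and it falls further below it as the number of genera grows relative to the sequencing depth.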

Figure 2—figure supplement 20
Linear, quadratic, and cubic models for the relationship between focal-lineage diversity and community diversity for varying levels of % nucleotide identity.

Community diversity was estimated as the number of clusters at a focal level (d_i) and focal-lineage diversity as the mean number of clusters at the rank above (d_{i+1}/d_i). All P-values are <0.001. Linear (grey), quadratic (blue) and cubic (red) fits are shown, with the associated adjusted R-squared values in the same colours. The x-axis (diversity) shows d_i and the y-axis (diversification) shows d_{i+1}/d_i.
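
The adjusted R-squared used to compare the linear, quadratic and cubic fits penalises each additional polynomial term. The sketch below uses the standard textbook definition; the function name is our own and the numbers in the usage lines are made up for illustration, not taken from the study.

```python
def adjusted_r_squared(observed, fitted, n_predictors):
    """Adjusted R-squared for a fitted model.

    `n_predictors` excludes the intercept: 1 for a linear fit,
    2 for quadratic, 3 for cubic.
    """
    n = len(observed)
    mean_y = sum(observed) / n
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    r_squared = 1 - ss_resid / ss_total
    return 1 - (1 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Illustrative (made-up) values: identical fitted values score lower
# when they are attributed to a model with more terms.
y = [1, 2, 3, 4, 5]
fit = [1.1, 1.9, 3.2, 3.9, 4.9]
print(adjusted_r_squared(y, fit, n_predictors=1))
print(adjusted_r_squared(y, fit, n_predictors=3))
```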

Figure 2—figure supplement 21
Focal clusters at 75% nucleotide identity.

Community diversity was estimated as the number of clusters at a focal level (d_i) and focal lineage diversity as the mean number of clusters at the rank above (d_{i+1}/d_i). Linear (grey), quadratic (blue) and cubic (red) fits are shown, with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant fits (p<0.05) are drawn as solid lines and non-significant fits as dashed lines. The x-axis shows d_i and the y-axis shows d_{i+1}/d_i.

Figure 2—figure supplement 22
Focal clusters at 80% nucleotide identity.

Community diversity was estimated as the number of clusters at a focal level (d_i) and focal lineage diversity as the mean number of clusters at the rank above (d_{i+1}/d_i). Linear (grey), quadratic (blue) and cubic (red) fits are shown, with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant fits (p<0.05) are drawn as solid lines and non-significant fits as dashed lines. The x-axis shows d_i and the y-axis shows d_{i+1}/d_i.

Figure 2—figure supplement 23
Focal clusters at 85% nucleotide identity.

Community diversity was estimated as the number of clusters at a focal level (d_i) and focal lineage diversity as the mean number of clusters at the rank above (d_{i+1}/d_i). Linear (grey), quadratic (blue) and cubic (red) fits are shown, with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant fits (p<0.05) are drawn as solid lines and non-significant fits as dashed lines. The x-axis shows d_i and the y-axis shows d_{i+1}/d_i.

Figure 2—figure supplement 24
Focal clusters at 90% nucleotide identity.

Community diversity was estimated as the number of clusters at a focal level (d_i) and focal lineage diversity as the mean number of clusters at the rank above (d_{i+1}/d_i). Linear (grey), quadratic (blue) and cubic (red) fits are shown, with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant fits (p<0.05) are drawn as solid lines and non-significant fits as dashed lines. The x-axis shows d_i and the y-axis shows d_{i+1}/d_i.

Figure 2—figure supplement 25
Focal clusters at 95% nucleotide identity.

Community diversity was estimated as the number of clusters at a focal level (d_i) and focal lineage diversity as the mean number of clusters at the rank above (d_{i+1}/d_i). Linear (grey), quadratic (blue) and cubic (red) fits are shown, with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant fits (p<0.05) are drawn as solid lines and non-significant fits as dashed lines. The x-axis shows d_i and the y-axis shows d_{i+1}/d_i.

Figure 2—figure supplement 26
Focal clusters at 97% nucleotide identity.

Community diversity was estimated as the number of clusters at a focal level (d_i) and focal lineage diversity as the mean number of clusters at the rank above (d_{i+1}/d_i). Linear (grey), quadratic (blue) and cubic (red) fits are shown, with corresponding adjusted R-squared values in the same colour. P-values are Bonferroni corrected for 17 tests. Significant fits (p<0.05) are drawn as solid lines and non-significant fits as dashed lines. The x-axis shows d_i and the y-axis shows d_{i+1}/d_i.

Figure 3
The diversity slope of focal taxa is higher in low-diversity (often host-associated) microbiomes.

The x-axis shows the mean number of non-focal taxa: (A) phyla, (B) classes, and (C) orders in each biome. On the y-axis, the diversity slope was estimated by a GLMM predicting focal lineage diversity as a function of the interaction between community diversity and environment type at the level of (A) Class:Phylum, (B) Order:Class, and (C) Family:Order ratios (Supplementary file 1 Section 3). The line represents a linear regression; the shaded area depicts 95% confidence limits of the fitted values. Adjusted R-squared and P-values from the linear fits are shown at the top right of each panel. See Supplementary file 2 for model goodness of fit. Slopes not significantly different from zero are shown as empty circles. Estimates of bacterial cell density from the literature are indicated in grey text, in units of bacteria/mm^3. For animal (skin) and plant surface, units of bacteria/mm^2 were converted to bacteria/mm^3 assuming layers of bacteria one micron thick. For rhizosphere samples we assume a density of 1–2 g/cm^3 (Kennedy and de Luna, 2005).
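
The surface-to-volume conversion mentioned above is a single division: a layer one micron (0.001 mm) thick over 1 square millimetre occupies 0.001 cubic millimetres. The sketch below shows the arithmetic. The function name and the input value are hypothetical and are not estimates from the cited literature.

```python
MICRON_IN_MM = 1e-3  # one micron expressed in millimetres


def surface_to_volume_density(cells_per_mm2, layer_thickness_mm=MICRON_IN_MM):
    """Convert cells per square millimetre to cells per cubic millimetre.

    Assumes the cells form a uniform layer of the given thickness.
    """
    return cells_per_mm2 / layer_thickness_mm


# Hypothetical surface density of 100 cells per square millimetre.
print(surface_to_volume_density(100))  # about 100000 cells per cubic millimetre
```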

Figure 4 with 1 supplement
The DBD relationship varies between resident and non-resident genera.

(A) Ordination showing genera clustering into their preferred environment clusters. The matrix of 17 environments (rows) by 1128 genera (columns), with the matrix entries indicating the percentage of samples from a given environment in which each genus is present, was subjected to principal components analysis (PCA). Circles indicate genera and triangles indicate environments (EMPO 3 biomes). Coloured circles are genera inferred by indicator species analysis to be residents of a certain environmental cluster, and grey circles are generalist genera. The three environment clusters identified by fuzzy k-means clustering are: non-saline (NS, blue), saline (S, green) and animal-associated (purple). Triangles of the same colour indicate EMPO 3 biomes clustered into the same environmental cluster. (B) DBD in resident versus non-resident genera across environment clusters. Results are shown from GLMMs modelling focal lineage diversity as a function of the interaction between community diversity and resident/migrant/generalist status. The x-axis shows the standardized number of non-focal resident genera (community diversity); the y-axis shows the number of ASVs per focal genus. Resident focal genera are shown in orange, migrant focal genera in red, and generalist focal genera in black. Red stars indicate a significantly positive or negative slope (Wald test, p<0.005). See Supplementary file 2 for model goodness of fit.
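
The environment-by-genus matrix described in panel A can be sketched as follows. This is an illustration only: the input layout (a list of environment and genus-set pairs), the function name `prevalence_matrix` and the toy data are assumptions, and the PCA and clustering steps are not shown.

```python
from collections import defaultdict


def prevalence_matrix(samples):
    """Percentage of samples per environment in which each genus is present.

    `samples` is a list of (environment, set_of_genera) pairs; the result is
    a nested dict {environment: {genus: percent_of_samples}}.
    """
    n_samples = defaultdict(int)
    n_present = defaultdict(lambda: defaultdict(int))
    for env, genera in samples:
        n_samples[env] += 1
        for genus in genera:
            n_present[env][genus] += 1
    all_genera = sorted({g for _, genera in samples for g in genera})
    return {
        env: {g: 100.0 * n_present[env][g] / n_samples[env] for g in all_genera}
        for env in n_samples
    }


# Toy data, not from the EMP dataset.
toy = [
    ("soil", {"Pseudomonas", "Clostridium"}),
    ("soil", {"Pseudomonas"}),
    ("gut", {"Clostridium"}),
]
matrix = prevalence_matrix(toy)
print(matrix["soil"]["Clostridium"])  # 50.0
```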

Figure 4—figure supplement 1
Resident genera of environment clusters.

Results from indicator species analysis illustrated as a heatmap. Only the 25 resident genera with the highest indval indices and p<0.05 (permutation test) are shown for every environment cluster (animal-associated, non-saline and saline free). For the full results see Supplementary file 5.

Figure 5
Positive effect of genome size on DBD.

Results are shown from a GLMM predicting focal lineage diversity as a function of the interaction between community diversity and genome size at the ASV:Genus ratio (Supplementary file 1 Section 6). The x-axis shows the standardized number of non-focal genera (community diversity); the y-axis shows the number of ASVs per focal genus. Variable diversity slopes corresponding to different genome sizes are shown in a blue colour gradient; the shaded area depicts 95% confidence limits of the fitted values. See Supplementary file 2 for model goodness of fit.

Tables

Table 1
Effects of community diversity on focal lineage diversity across taxonomic ratios.

The GLMMs show a statistically significant positive effect of community diversity on focal lineage diversity. Each row reports the effect of community diversity (Div) on focal lineage diversity, as well as its standard error, Wald z-statistic for its effect size and the corresponding P-value (left section), or standard deviation on the slope for the significant random effects (right section). SE = standard error, Env = environment type, Lin = lineage type, Lab = Principal Investigator ID, Sample = EMP Sample ID. Interactions are denoted as ‘*’. n.s. = not significant (likelihood-ratio test). All models provide a significantly better fit than null models without fixed effects (∆AIC > 10 and p<0.05; Supplementary file 2).

              Slope (fixed effects)          Standard deviation on the slope (random effects)
Ratio         Div    SE     z      P         Env    Lin    Lin*Env  Env*Lab  Sample
ASV:Genus     0.091  0.016  5.792  6.95e-09  n.s.   0.074  0.142    0.114    0.067
Genus:Family  0.047  0.008  5.911  3.41e-09  n.s.   0.071  0.07     0.039    n.s.
Family:Order  0.119  0.017  7.001  2.54e-12  0.023  0.094  0.092    0.106    n.s.
Order:Class   0.109  0.020  5.447  5.13e-08  0.05   0.141  0.078    0.051    n.s.
Class:Phylum  0.272  0.043  6.341  2.29e-10  0.119  0.174  0.119    0.114    n.s.
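
The Wald P-values in Table 1 can be checked against the reported z-statistics. The sketch below assumes the usual two-sided test against a standard normal distribution, which is our reading of "Wald test" rather than something the table states explicitly; the function name is illustrative.

```python
import math


def wald_p_value(z):
    """Two-sided P-value for a Wald z-statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# z-statistic reported for the ASV:Genus ratio in Table 1.
print(f"{wald_p_value(5.792):.2e}")
```

For z = 5.792 this gives a value of roughly 7e-09, in line with the 6.95e-09 reported for the ASV:Genus ratio.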
Table 2
GLMMs applied to data simulated under null models.

Null models 1 and 2 were generated under the ZSM distribution, with a single distribution for the whole dataset (Model 1) or one distribution per environment (Model 2). Model 3 is similar to Model 1, except with a single Poisson distribution for the whole dataset, and +DBD or +EC refer to adding these effects to all ASVs in each sample (see Materials and methods and Figure 2—figure supplement 17). Each row reports the effect of community diversity (Div) on focal lineage diversity, as well as its standard error, Wald z-statistic for its effect size and the corresponding P-value (Wald test) (left section), or standard deviation on the slope for the significant random effects (right section). SE = standard error, Env = environment type, Lin = lineage type, Sample = EMP Sample ID. n.s. = not significant (likelihood-ratio test), n.t. = not tested, because separate environments were not included in Models 1 or 3.

               Slope (fixed effects)            Standard deviation on the slope (random effects)
Model          Div     SE     z       P         Env   Lin    Lin*Env  Sample
Model 1        −0.005  0.000  −9.807  <2e-16    n.t.  0.639  n.t.     n.s.
Model 2        n.s.
Model 3        −0.012  0.002  −6.552  5.69e-11  n.t.  0.021  n.t.     n.s.
Model 3 + DBD  0.016   0.001  11.48   <2e-16    n.t.  0.008  n.t.     n.s.
Model 3 + EC   −0.011  0.002  −6.14   8.26e-10  n.t.  n.s.   n.t.     n.s.
Table 3
GLMMs with community diversity measured using Shannon diversity.

Results are shown from GLMMs with Shannon diversity of non-focal taxa (Div) as a predictor of ASV richness of focal taxa. Each row reports the estimate (Div), as well as its standard error, Wald z-statistic for its effect size and the corresponding P-value (Wald test) (left section), or standard deviation on the slope for the significant random effects (right section). SE = standard error, Env = environment type, Lin = lineage type, Lab = Principal Investigator ID, Sample = EMP Sample ID. n.s. = not significant (likelihood-ratio test).

         Fixed effects                    Random effects
Level    Div    SE      z      p          Env    Lin    Env*Lin  Env*Lab  Sample
Genus    0.055  0.013   4.33   1.49e-05   n.s.   0.08   0.15     0.085    0.054
Family   0.148  0.0227  6.491  8.51e-11   n.s.   0.184  0.268    0.16     0.134
Order    0.378  0.038   9.864  <2e-16     n.s.   0.34   0.417    0.258    0.202
Class    0.398  0.05    7.973  1.54e-15   n.s.   0.369  0.46     0.326    0.262
Phylum   0.319  0.088   3.614  0.0003     0.169  0.316  0.5      0.495    0.378
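
For reference, the Shannon diversity used as the predictor in Table 3 can be computed from per-taxon counts as shown below. This is a generic sketch of the standard index. The natural logarithm is an assumption, since the table does not state the base, and the function name and counts are illustrative.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts.

    The natural-log base is assumed here.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Four equally abundant non-focal taxa give H = ln(4).
print(shannon_diversity([10, 10, 10, 10]))
```

Unlike a simple count of taxa, this index also reflects how evenly reads are spread across taxa: a sample dominated by one taxon scores lower than an even sample with the same number of taxa.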
Table 4
Community diversity has a stronger effect than abiotic factors on focal lineage diversity (EMP dataset).

Results are shown from GLMMs with community diversity (Div), four abiotic factors (temperature, elevation, pH, and latitude), and their interactions with community diversity, as predictors of focal lineage diversity. Random effects on the intercept included environment, lineage, lab ID and sample ID. Each row reports the taxonomic ratio, the predictors used in the GLMM (fixed effects only), their slope estimate (Est), standard error (SE) and P-value (P) (Wald test). Interactions are denoted as ‘*’. Random effects are not shown.

Ratio         Predictor        Est     SE     P
ASV:Genus     Div              0.128   0.013  <2e-16
              Temperature      0.04    0.014  0.00479
              Div*Temperature  0.043   0.014  0.00175
              Div*Latitude     0.031   0.013  0.02119
              Div*Elevation    −0.031  0.014  0.02829
Genus:Family  Div              0.094   0.009  <2e-16
              Temperature      0.026   0.009  0.00268
              pH               −0.042  0.009  5.88e-06
Family:Order  Div              0.131   0.01   <2e-16
Order:Class   Div              0.184   0.01   <2e-16
              Div*Temperature  0.032   0.009  0.000827
              Div*Latitude     0.023   0.008  0.005403
Class:Phylum  Div              0.236   0.011  <2e-16
              Div*Temperature  0.059   0.014  2.15e-05
              Div*Latitude     0.03    0.011  0.00884
Table 5
GLMMs applied to a soil dataset.

Each row reports the taxonomic ratio, the predictors used in the GLMM (fixed effects only), their estimate (Est), standard error (SE) and P-value (P) (Wald test). Left columns: GLMM with community diversity (Div) and all abiotic variables considered separately, as predictors of focal lineage diversity. Right columns: GLMM with community diversity (Div) and the first three principal components (PCs) representing abiotic variables, as predictors of focal lineage diversity. n.s. = not significant (likelihood-ratio test). All models provide a significantly better fit than null models without fixed effects (∆AIC > 10 and p<0.05; Supplementary file 2), except for the GLMM with abiotic factors at the Family:Order level, where latitude has a significant effect on focal lineage diversity but its effect is nearly null, with a ∆AIC between full and null model of 4 and a null marginal R-squared.

              GLMMs with abiotic variables                 GLMMs with the first 3 PCs
Ratio         Predictor         Est      SE      P         Predictor  Est     SE     P
ASV:Genus     Div               n.s.                       Div        0.064   0.016  9.47e-05
              Latitude          0.294    0.025   <2e-16    PC1        −0.065  0.007  <2e-16
              UV_light          −0.177   0.016   <2e-16    PC2        −0.03   0.006  1.98e-05
              MDR               0.028    0.006   7.12e-06
              NPP2003_2015      −0.066   0.005   <2e-16
              Latitude^2        −0.3     0.029   <2e-16
              Clay_silt^2       −0.012   0.004   0.003
              Soil_N^2          −0.007   0.001   1.66e-06
              Soil_C_N_ratio^2  0.003    0.001   0.004
              PSEA^2            0.01     0.002   4.84e-06
              MDR^2             0.017    0.003   2.40e-08
              NPP2003_2015^2    −0.016   0.004   0.0001
Genus:Family  Div               0.032    0.01    0.0011    Div        0.033   0.01   0.001
              Latitude          −0.035   0.006   2.04e-09  PC1        −0.016  0.006  0.02
                                                           PC2        0.02    0.006  0.00089
Family:Order  Div               n.s.                       Div        n.s.
              Latitude          −0.0005  0.0002  0.0105    PC1        −0.026  0.007  0.00032
                                                           Div*PC1    0.04    0.006  2.14e-12
                                                           Div*PC3    0.023   0.005  1.68e-06
Order:Class   Null model with no predictor was significant
Class:Phylum  Div               0.032    0.01    0.00174   Div        0.032   0.01   0.003
              pH                0.074    0.01    4.37e-13  PC1        −0.051  0.01   3.54e-07
                                                           PC2        −0.028  0.01   0.006

Data availability

All data are available from the Earth Microbiome Project (ftp.microbio.me), as detailed in the Methods. All computer code used for analysis is available at https://github.com/Naima16/dbd.git (copy archived at https://archive.softwareheritage.org/swh:1:rev:ecb4f844264b72eaa8cbd708244ecd32d414c7dd/).

The following previously published data sets were used

Additional files

Supplementary file 1

Full GLMM outputs for the EMP data.

https://cdn.elifesciences.org/articles/58999/elife-58999-supp1-v3.pdf
Supplementary file 2

Goodness of fit for the GLMMs.

https://cdn.elifesciences.org/articles/58999/elife-58999-supp2-v3.docx
Supplementary file 3

Full GLMM output for simulated data under Neutral Theory models.

https://cdn.elifesciences.org/articles/58999/elife-58999-supp3-v3.pdf
Supplementary file 4

Full GLMM output for soil data (Delgado-Baquerizo et al., 2018).

https://cdn.elifesciences.org/articles/58999/elife-58999-supp4-v3.pdf
Supplementary file 5

Indicator species analysis.

The table shows the assignment of each genus to one of three environment types.

https://cdn.elifesciences.org/articles/58999/elife-58999-supp5-v3.xlsx
Supplementary file 6

Genome size assignment.

The table shows genome sizes assigned to each genus.

https://cdn.elifesciences.org/articles/58999/elife-58999-supp6-v3.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/58999/elife-58999-transrepform-v3.pdf
