Conceptual framework.

A sedimentary archive spanning 100 years was sampled from Lake Ring, Denmark, and dated using radioisotopes. Both biotic and abiotic changes were empirically quantified through time: 1) community-level biodiversity was reconstructed by applying multilocus metabarcoding to environmental DNA isolated from sediment layers (biological fingerprinting); 2) chemical signatures were quantified from the same sediment layers using mass spectrometry analysis (chemical fingerprinting); 3) climate data were collected from publicly available databases. Explainable network models with multimodal learning were applied to identify significant correlations between system-level biodiversity, chemical fingerprinting, and climate variables. Taxonomic units (families) impacted by environmental factors were identified, and environmental factors were ranked based on their effects on community biodiversity. This approach enables the prioritisation of conservation and mitigation interventions.

PERMANOVA on beta diversity.

Permutational Multivariate Analysis of Variance (PERMANOVA) on weighted UniFrac distance matrices computed from ASVs, testing for pairwise differences between lake phases across the five barcodes used in the study (16SV1, 16SV4, 18S, COI, rbcL) with 999 permutations. Significant terms (p-values < 0.05 after applying the Benjamini & Hochberg correction for multiple testing) are in bold. The lake phases are as follows: SP - semi-pristine; E - eutrophic; P - pesticides; R - recovery.
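The Benjamini & Hochberg correction named in this caption can be illustrated with a minimal sketch. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study; the sketch only shows the standard step-up adjustment, not the authors' actual analysis code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise phase comparisons.
example_raw = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_raw)
significant = [p < 0.05 for p in example_adjusted]
```

A term would be reported as significant (bold in the table) when its adjusted p-value falls below 0.05.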

Biodiversity compositional changes.

(A) Weighted UniFrac beta diversity heatmaps between each pair of sediment layers spanning a century (1916-2016) for the five barcodes used in this study (18S, rbcL, COI, 16SV1 and 16SV4). The PERMANOVA statistics in Table 1 support these plots. The scale may differ among the heatmaps. (B) Taxonomic bar plots of the 10 most abundant families identified across the five barcodes (18S, rbcL, COI, 16SV1 and 16SV4), shown per lake phase: SP - semi-pristine; E - eutrophic; P - pesticides; R - recovery.

Functional analysis.

Functional pathways that are significantly differentially enriched between lake phases are shown for the 16SV1 and the 16SV4 barcodes. The lake phases are as in Figure 2: SP - semi-pristine; E - eutrophic; P - pesticides; R - recovery. Odds ratios indicate the representation of each pathway in the pairwise comparisons.

sCCA 3D plots.

Sparse canonical correlation analysis (sCCA) 3D plots for the five barcodes used (18S, rbcL, COI, 16SV1 and 16SV4), showing the proportion of biodiversity variance explained by the biocides and climate variables. As biocides were introduced around the 1960s, this analysis spans the three most recent lake phases (eutrophic, pesticides and recovery). Interactive version available: https://environmental-omics-group.github.io/Biodiversity_Monitoring/

Joint effects of environmental variables on biodiversity.

A) Heatmap showing the frequency of joint effects of biocides and climate variables in eukaryotes (data from the 18S barcode) and prokaryotes (combined data from the 16SV1 and 16SV4 barcodes). The biocides are ranked based on their correlation coefficient with taxonomic units and climate variables. Ranking of biocide types is provided in Table S3; B) temporal correlation between the family Isochrysidales, summer precipitation and insecticides. The joint effect of summer precipitation and insecticides is also shown; C) temporal correlation between Pleosporales, insecticides and mean minimum temperature. The joint effect of insecticides and mean minimum temperature is also shown. The relative abundances of the families over time in plots B and C are standardized values.

Alpha diversity.

Alpha diversity, measured as Shannon entropy, is shown for the five barcodes used in this study (16SV1, 16SV4, 18S, COI and rbcL) between 1916 and 2016. The four lake phases are colour-coded as follows: Black - Semi-pristine; blue - Eutrophic; green - Pesticides; red - Recovery. Kruskal-Wallis test across all phases: 18S: H = 4.199, Pval = 0.241; rbcL: H = 21.677, Pval < 0.001; COI: H = 16.958, Pval = 0.001; 16SV1: H = 7.001, Pval = 0.072; 16SV4: H = 2.220, Pval = 0.528.
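As an illustration of the alpha diversity metric named here, the following minimal sketch computes Shannon entropy from the ASV counts of a single sample. The function name, the example counts and the choice of logarithm base are assumptions; the caption does not state which base was used.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Shannon entropy H = -sum(p_i * log(p_i)) over non-zero counts.

    `base` is an assumption; change it to math.e for natural-log units.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


# Hypothetical ASV counts for two sediment layers.
even_layer = [10, 10, 10, 10]    # four equally abundant ASVs
dominated_layer = [37, 1, 1, 1]  # one ASV dominates
h_even = shannon_entropy(even_layer)
h_dominated = shannon_entropy(dominated_layer)
```

An evenly distributed community yields a higher entropy than one dominated by a single ASV, which is the property the per-phase comparison relies on.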

Principal Coordinate Analysis.

PCoA visualization of weighted UniFrac distances between samples. Positive controls for PCR consist of duplicates of up to three samples from the sedimentary archive for each of the five barcodes used in the study (16SV1, 16SV4, 18S, rbcL and COI). Replicated samples are circled. The four lake phases are colour-coded as follows: Black - Semi-pristine; blue - Eutrophic; green - Pesticides; red - Recovery.

Trophic Diatom Index.

LTDI2 calculated between 1915 and 2015 from the diatom species identified in our study with the rbcL barcode, using the “DARLEQ3” (Diatoms for Assessing River and Lake Ecological Quality) tool. Mean value: 67.59; standard deviation: 6.3. The four lake phases are colour-coded as follows: Black - Semi-pristine; blue - Eutrophic; green - Pesticides; red - Recovery.

Biocides records.

A) Records of physico-chemical parameters measured in Lake Ring. Dotted lines indicate missing data points. Summer and annual mean temperatures were recorded at a weather station 80 km from Lake Ring. B) Record of biocide sales in Denmark (million tons/year) between 1950 and 2016, downloaded from the Danish national archives; C) empirical record of DDT measured in the sediment layers of Lake Ring using mass spectrometry analysis (ng/g; blue), plotted against the sales record in Denmark (million tons/year; orange). DDT was banned in Denmark in 1986.

AI pipeline.

The analytical pipeline consists of six main steps: Step 1 is the preparation of the input data matrices (ASVs, biocides and climate variables) to be used in the sCCA analysis. The type of environmental data may vary with the study; Step 2 is the matrix-on-matrix regression between the ASVs and an environmental data matrix (biocides or climate in this study). Following the sCCA analysis, the ASVs are assigned to family level (or another relevant taxonomic rank); Step 3 consists of a Sliding Window (Pearson) Correlation (SWC) analysis, used to identify significant temporal correlations between families and the environmental variables from the sCCA analysis; Step 4 identifies the families that co-vary with either biocides or climate variables independently; Step 5 performs an intersection analysis among multiple matrices (families, biocides and climate variables); Step 6 applies a Sliding Window (Pearson) Correlation (SWC) analysis to identify families whose relative abundance changes with both biocides and climate variables over time. The pipeline enables the ranking of environmental variables, or combinations thereof, that are inversely correlated with the relative abundance of families over time.
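The Sliding Window (Pearson) Correlation used in Steps 3 and 6 can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function names, the window length, the one-point step and the example series are all assumptions, and the significance testing applied in the study is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def sliding_window_correlation(x, y, window):
    """Correlate x and y inside each window, moving one time point per step."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the series length")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


# Hypothetical standardized series: a family's relative abundance and one
# environmental variable measured on the same sediment layers.
family_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
environmental_variable = [2.0, 4.0, 6.0, 5.0, 4.0]
swc = sliding_window_correlation(family_abundance, environmental_variable, 3)
```

In this toy example the correlation moves from positive to negative across the windows, the kind of temporal change in association that a sliding-window analysis is meant to expose.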

sCCA analysis.

CCA loadings calculated with sparse canonical correlation analysis for biocides (A) and climate variables (B). The categories of biocides are insecticides, fungicides, pesticides and herbicides. The environmental variables are mean minimum temperature, maximum daily precipitation, highest recorded temperature, mean summer temperature, summer precipitation, annual total precipitation, summer atmospheric pressure and lowest recorded temperature.