Patient demographics and notable parameters of seminal quality and function for controls and study subjects.

Fisher’s exact test was used for all variables except age, for which a chi-square test was used (n=223).

Characterisation of semen microbiota composition at genus level.

A) Heatmap of log10-transformed read counts of the 10 most abundant genera identified in semen samples. Samples clustered into three major microbiota groups based mainly on dominance by Streptococcus (Cluster 1), Prevotella (Cluster 2), or Lactobacillus and Gardnerella (Cluster 3) (n=223, Ward’s linkage). B) Silhouette scores of individual samples within each cluster. C) Relative abundance of the 6 most abundant genera within each cluster. D) Species richness (p<0.0001; Kruskal-Wallis test) and E) alpha diversity (p<0.0001; Kruskal-Wallis test) differed significantly across clusters. F) Assessment of bacterial load using qPCR showed that Clusters 2 and 3 have significantly higher bacterial loads than Cluster 1. Dunn’s multiple comparison test was used as a post-hoc test for between-group comparisons (*p<0.05, ****p<0.0001).

Co-occurrence network estimated with SparCC from 16S sequencing counts at species level.

Network representing co-occurrence patterns (edges) between taxonomic units assigned at species level (nodes). Edges are colored by their estimated SparCC correlation coefficient (ρ). Edges with a SparCC bootstrapped p-value < 0.05, ρ < 0.25, and singleton nodes are not shown. Node color represents network community membership.

Relative abundance and prevalence matrices of Flavobacterium in relation to semen quality and morphology.

A) Relative abundance of Flavobacterium was significantly higher in samples with abnormal semen quality (p=0.0002, q=0.02). B) Detection of Flavobacterium was significantly more prevalent in samples with abnormal semen quality (p=0.0003). C) Flavobacterium relative abundance was significantly higher in samples with <4% morphologically normal forms (p=0.0002, q=0.01). D) Flavobacterium was also significantly more prevalent in samples with a low percentage of morphologically normal sperm (p=0.0009).

Differential abundance analysis for bacterial genera with seminal quality and functional parameters.

Positive t-values indicate a positive relationship, and negative t-values a negative relationship, between the relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent Benjamini-Hochberg false discovery rate corrected p-values for multiple comparisons.
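As an illustration of how q-values relate to p-values, the Benjamini-Hochberg adjustment can be sketched in a few lines of Python. This is a minimal sketch, not the software used in the study, which these captions do not name. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the reported results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    Illustrative sketch only; the function name and inputs are assumptions,
    not part of the study's analysis pipeline.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that each
    # q-value is never larger than the q-value of the next-ranked test.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        adjusted = p_values[idx] * m / rank
        running_min = min(running_min, adjusted)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four taxa (not data from the study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```

Each q-value is the raw p-value scaled by the number of tests divided by its rank, with a running minimum applied so that the adjusted values stay monotone in the raw p-values.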

Differential abundance analysis for bacterial species with seminal quality and functional parameters.

Positive t-values indicate a positive relationship, and negative t-values a negative relationship, between the relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent Benjamini-Hochberg false discovery rate corrected p-values for multiple comparisons.

Differential abundance analysis for specific taxa at genus level for controls and cases with male factor infertility.

Positive t-values indicate a positive relationship, and negative t-values a negative relationship, between the relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent Benjamini-Hochberg false discovery rate corrected p-values for multiple comparisons.

Differential abundance analysis for specific taxa at species level for controls and cases with male factor infertility.

Positive t-values indicate a positive relationship, and negative t-values a negative relationship, between the relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent Benjamini-Hochberg false discovery rate corrected p-values for multiple comparisons.