Sperm counts reported in published studies have declined by 55% since the 1970s, which may reflect the rising health burdens of obesity and/or environmental pollution (1). Male factor accounts for approximately half of all cases of infertility, yet few interventions are available to improve sperm quality. Understanding the pathogenesis of male infertility may reveal novel therapeutic approaches for treating affected couples.

Symptomatic genitourinary infection is an established cause of male infertility; it is detected by semen culture and treated with antibiotics (2, 3). Bacteria provoke seminal leukocytes to release bactericidal reactive oxygen species (ROS), which may paradoxically damage sperm DNA and impair semen quality (4). Semen culture has limited scope for studying the seminal microbiota, but next-generation sequencing (NGS) analysis of the semen microbiome (5, 6, 7, 8, 9, 10, 11) has revealed associations between the microbiome and semen parameters in relatively small numbers of men with infertility. We and others have reported that asymptomatic men affected by recurrent pregnancy loss (RPL) have increased risks of high seminal ROS and sperm DNA fragmentation, which are also associated with male infertility (12, 13, 14, 15, 16, 17). It is therefore plausible that asymptomatic seminal infection may predispose men to RPL in addition to infertility. Furthermore, common seminal microbial signatures may be shared by male infertility and RPL. Elucidation of such an association would have wide clinical application, with therapeutic potential for couples with reproductive disorders.

We explored relationships between metataxonomic profiles of bacteria, bacterial copy number and key parameters of sperm function and quality in semen samples prospectively collected from 223 men, including men diagnosed with male factor infertility, male partners of couples with unexplained infertility, male partners of women affected by recurrent miscarriage, and paternity-proven controls.


Ethical approval was granted by the West London and Gene Therapy Advisory Committee (GTAC) Research Ethics Committee (14/LO/1038) and by the Internal Review Board at the Centre for Reproductive and Genetic Health (CRGH) (IRB-0003C07.10.19). Participants were recruited, following informed consent, from clinics at Imperial College London NHS Trust and CRGH. Further details of the methods used in this study are provided in the Supplementary Material.

Semen samples were produced by masturbation after 3-7 days of abstinence. All samples were collected into sterile containers after the penis had been cleaned with a sterile wipe. Samples were incubated at 37°C for a minimum of 20 min prior to analysis, and an aliquot was collected in a sterile cryovial and stored at -80°C.

Diagnostic semen analysis was carried out according to WHO 2010 guidelines and in accordance with UK NEQAS accreditation (18) (19). Semen analysis was performed in the Andrology Departments of Hammersmith Hospital and CRGH. Microscopic and macroscopic semen characteristics were assessed within 60 min of sample production. Semen volume, sperm concentration, total sperm count, progressive motility, total motile count, sperm morphology, anti-sperm antibodies and leucocyte count were determined.

ROS were measured using an in-house chemiluminescence assay validated by Vessey et al (20), and results are reported as relative light units per second per million sperm (RLU/sec/10⁶ sperm). The upper limit of optimal ROS was determined internally as 3.77 RLU/sec/10⁶ sperm (95% CI) (21).
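To show how the reporting unit and the cut-off fit together, the following is a minimal Python sketch that normalises a chemiluminescence reading to RLU/sec/10⁶ sperm and flags values above 3.77. It assumes the luminometer output is already a background-corrected mean RLU/sec for the assay aliquot and that sperm concentration and aliquot volume are known; the function names and this arithmetic are illustrative assumptions, not the in-house protocol of Vessey et al.

```python
# Hypothetical sketch: normalise a chemiluminescence ROS reading to
# relative light units per second per million sperm (RLU/sec/10^6 sperm)
# and compare it with the internally determined cut-off stated in the text.

ROS_UPPER_LIMIT = 3.77  # RLU/sec/10^6 sperm, upper limit of optimal ROS


def normalised_ros(rlu_per_sec: float, sperm_conc_millions_per_ml: float,
                   aliquot_volume_ml: float) -> float:
    """Divide the light signal by the number of sperm (in millions) in the aliquot.

    Assumption: rlu_per_sec is the background-corrected mean signal for the aliquot.
    """
    sperm_millions = sperm_conc_millions_per_ml * aliquot_volume_ml
    if sperm_millions <= 0:
        raise ValueError("aliquot must contain sperm to normalise ROS")
    return rlu_per_sec / sperm_millions


def ros_elevated(value: float, limit: float = ROS_UPPER_LIMIT) -> bool:
    """True if the normalised ROS value exceeds the upper limit of optimal ROS."""
    return value > limit


if __name__ == "__main__":
    # Toy numbers only: 40 RLU/sec from 0.4 mL of a 20 million/mL sample.
    value = normalised_ros(rlu_per_sec=40.0, sperm_conc_millions_per_ml=20.0,
                           aliquot_volume_ml=0.4)
    print(round(value, 2), ros_elevated(value))
```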

Sperm DNA fragmentation was assessed by the TUNEL (terminal deoxynucleotidyl transferase biotin-dUTP nick end labelling) assay, with elevated sperm DNA fragmentation defined as >20% (22). Samples for the COMET assay were sent to the Examen Lab (Belfast, UK) for analysis, with elevated sperm DNA fragmentation defined as >27% (23).
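Because the two fragmentation assays use different cut-offs, the same percentage can be classified differently depending on which assay produced it. The hypothetical helper below encodes only the thresholds stated above (>20% for TUNEL, >27% for COMET); the function and dictionary names are illustrative.

```python
# Hypothetical helper: classify sperm DNA fragmentation (SDF) as elevated
# using the assay-specific cut-offs stated in the text.
SDF_CUTOFFS = {
    "TUNEL": 20.0,  # elevated if > 20%
    "COMET": 27.0,  # elevated if > 27% (COMET analysis at Examen Lab)
}


def sdf_elevated(assay: str, percent_fragmented: float) -> bool:
    """Return True if fragmentation exceeds the cut-off for the given assay."""
    try:
        cutoff = SDF_CUTOFFS[assay.upper()]
    except KeyError:
        raise ValueError(f"unknown assay: {assay!r}") from None
    if not 0.0 <= percent_fragmented <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return percent_fragmented > cutoff


if __name__ == "__main__":
    print(sdf_elevated("TUNEL", 24.0))  # True: above the 20% TUNEL cut-off
    print(sdf_elevated("COMET", 24.0))  # False: not above the 27% COMET cut-off
```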

DNA was extracted from 200 µL of semen using enzymatic lysis and mechanical disruption. Bacterial load was estimated as the total number of 16S rRNA gene copies per sample, determined with the BactQuant assay (24).
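BactQuant is a 16S rRNA gene qPCR assay, and absolute copy numbers from qPCR are conventionally read off a standard curve relating cycle threshold (Ct) to log10 copies of a known standard. The sketch below illustrates that generic calculation under stated assumptions: a dilution series of standards is available, the curve is fitted by ordinary least squares, and per-reaction copies are scaled to the whole extract by a hypothetical elution/template volume ratio. It is not the published BactQuant analysis procedure, and all names are illustrative.

```python
import math


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n != len(ct_values) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate 16S rRNA gene copies per reaction."""
    return math.pow(10.0, (ct - intercept) / slope)


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; close to 1.0 (100%) for a slope of about -3.32."""
    return math.pow(10.0, -1.0 / slope) - 1.0


def copies_per_sample(copies_per_reaction, elution_volume_ul, template_volume_ul):
    """Hypothetical scaling from one qPCR reaction to the whole DNA extract."""
    return copies_per_reaction * elution_volume_ul / template_volume_ul


if __name__ == "__main__":
    standards = [2, 3, 4, 5, 6]                 # toy log10 copies per reaction
    cts = [40.0 - 3.32 * x for x in standards]  # idealised Ct values
    slope, intercept = fit_standard_curve(standards, cts)
    print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.1%}")
    print(f"{copies_from_ct(28.0, slope, intercept):.0f} copies per reaction")
```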

Metataxonomic profiling of semen microbiota was performed by MiSeq sequencing of the V1-V2 hypervariable regions of bacterial 16S rRNA gene amplicons, generated using a mixed forward primer set (28F-YM GAGTTTGATYMTGGCTCAG, 28F-Borrellia GAGTTTGATCCTGGCTTAG, 28F-Chloroflex GAATTTGATCTTGGTTCAG and 28F-Bifdo GGGTTCGATTCTGGCTCAG, combined at a ratio of 4:1:1:1) with the 388R reverse primer. Sequencing was performed on the Illumina MiSeq platform (Illumina, Inc., San Diego, California). Following primer trimming and assessment of read quality, reads were denoised and amplicon sequence variant (ASV) counts per sample were calculated using the Qiime2 pipeline (25) and the DADA2 algorithm (26). ASVs were taxonomically classified to species level using a naive Bayes classifier trained on all sequences from the V1-V2 region of the bacterial 16S rRNA gene in the SILVA reference database (release 138.1) (27) (28).
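The 28F-YM primer contains the IUPAC ambiguity codes Y (C or T) and M (A or C), so it is itself a mixture of four sequences. The sketch below expands degenerate primers and, assuming the 4:1:1:1 ratio refers to molar proportions, computes each forward primer's share of the pool. The dictionary layout and function names are illustrative.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Forward primers and their mixing ratio (4:1:1:1) as stated in the text.
FORWARD_PRIMERS = {
    "28F-YM": ("GAGTTTGATYMTGGCTCAG", 4),
    "28F-Borrellia": ("GAGTTTGATCCTGGCTTAG", 1),
    "28F-Chloroflex": ("GAATTTGATCTTGGTTCAG", 1),
    "28F-Bifdo": ("GGGTTCGATTCTGGCTCAG", 1),
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def pool_fractions(primers):
    """Fraction of the forward-primer pool contributed by each primer (assumes molar ratios)."""
    total = sum(ratio for _, ratio in primers.values())
    return {name: ratio / total for name, (_, ratio) in primers.items()}


if __name__ == "__main__":
    for name, (seq, _) in FORWARD_PRIMERS.items():
        print(name, expand_degenerate(seq))
    print(pool_fractions(FORWARD_PRIMERS))
```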

Controls and contamination. Three negative kit/environmental control swabs were included to identify and eliminate potential sources of contamination and false positives in the 16S metataxonomic profiles. These swabs were removed from the manufacturer's packaging, waved in the air and then processed through the same entire DNA extraction protocol as the samples. Data were decontaminated at ASV level using the decontam package (v1.9.0) in R, applying both the “frequency” and “prevalence” contaminant identification methods with the threshold set to 0.1 (28). For the “frequency” method, the total 16S rRNA gene copies measured per sample were supplied as the conc parameter. For the “prevalence” method, all 3 blank swabs were used as negative controls and compared against all semen samples. ASVs classified as contaminants by either method (n = 94) were excluded.
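decontam's “prevalence” method flags sequences that are more prevalent in negative controls than in true samples. As a simplified, hypothetical illustration of that idea (not a re-implementation of decontam, whose scoring differs in detail), the sketch below applies a one-sided Fisher's exact test to presence/absence counts with the 0.1 threshold stated above, then applies the rule that an ASV flagged by either method is excluded. Frequency-method flags are assumed to be supplied from elsewhere, and all names are illustrative.

```python
from math import comb


def prevalence_p_value(present_neg, n_neg, present_samp, n_samp):
    """One-sided Fisher's exact test: is the ASV over-represented in negative controls?

    Computes P(X >= present_neg) under the hypergeometric distribution of
    presences among the n_neg controls, given the total number of presences.
    """
    total = n_neg + n_samp
    present = present_neg + present_samp
    upper = min(present, n_neg)
    tail = sum(comb(present, k) * comb(total - present, n_neg - k)
               for k in range(present_neg, upper + 1))
    return tail / comb(total, n_neg)


def flag_contaminants(prevalence, frequency_flags, n_neg, n_samp, threshold=0.1):
    """Return ASV ids flagged by the prevalence test OR by a frequency method.

    prevalence: dict asv_id -> (presences in negatives, presences in samples)
    frequency_flags: set of ASV ids already flagged by a frequency-based method
    """
    flagged = set(frequency_flags)
    for asv, (pos_neg, pos_samp) in prevalence.items():
        if prevalence_p_value(pos_neg, n_neg, pos_samp, n_samp) < threshold:
            flagged.add(asv)
    return flagged


if __name__ == "__main__":
    # Toy presence/absence counts for 3 blank swabs and 223 semen samples.
    toy = {"asv1": (3, 5), "asv2": (0, 100), "asv3": (1, 200)}
    print(sorted(flag_contaminants(toy, {"asv9"}, n_neg=3, n_samp=223)))
```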

Statistical analysis. Samples were assigned to putative community state types by hierarchical clustering with Ward linkage and Jensen-Shannon distance, with the number of clusters chosen to maximise the mean silhouette score. Linear regression models of microbiome features against semen quality parameters and other clinical and demographic variables were fitted with the base R lm function (R v4.2.0). The Benjamini-Hochberg procedure was used to control the false discovery rate (FDR) independently for each covariate signature (e.g., ROS, DNA fragmentation or semen quality), with a cut-off of q < 0.05 (5%) in both regression and chi-squared analyses. Detailed information on statistical modelling is presented in the Supplementary Methods.
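The Benjamini-Hochberg adjustment described above can be written in a few lines. The following minimal, standard-library Python sketch converts p-values to q-values and, mirroring the description of correcting each covariate signature independently, applies the adjustment separately to each group of p-values. The function names and the dictionary-of-lists input are illustrative assumptions; in R the same adjustment is available as p.adjust(p, method = "BH").

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down, keeping q-values monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def bh_by_signature(p_values_by_signature):
    """Adjust each covariate signature (e.g. ROS, DNA fragmentation) separately."""
    return {name: benjamini_hochberg(ps) for name, ps in p_values_by_signature.items()}


if __name__ == "__main__":
    toy = {"ROS": [0.01, 0.04, 0.03, 0.005], "DNA fragmentation": [0.2, 0.001]}
    for signature, qs in bh_by_signature(toy).items():
        significant = [q < 0.05 for q in qs]
        print(signature, [round(q, 3) for q in qs], significant)
```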


Study population

Semen samples were collected from a total of 223 men: a control group (n=63) and a study group (n=160) comprising men diagnosed with male factor infertility (MFI) (n=58), male partners of women with recurrent pregnancy loss (RPL) (n=46) and male partners of couples diagnosed with unexplained infertility (UI) (n=26). The overall mean age of the cohort was 38.1 ± 6 years (mean ± SD); the mean age was 40.1 ± 8 years for controls and 37 ± 4.8 years for patients undergoing fertility investigations. Ethnic composition did not differ significantly between the recruited cohorts (p=0.38, chi-square test; Supplementary Table 1).

Semen quality assessment: High sperm DNA fragmentation, elevated ROS and oligospermia were more prevalent in the study group than in controls (Table 1). The study group accounted for 85% of samples with high sperm DNA fragmentation, 85% of samples with elevated ROS and 79% of samples with oligospermia. Abnormal seminal parameters, including low sperm concentration, reduced progressive motility and elevated ROS, were most frequent in the MFI group (Supplementary Figure 1).

Patient demographics and notable parameters of seminal quality and function for controls and study subjects

The prevalence of high sperm DNA fragmentation and elevated ROS was higher in patients undergoing fertility investigations than in controls, and oligospermia was also significantly more prevalent in study subjects. Fisher's exact tests were used for all comparisons except age, for which a chi-square test was used (n=223).

The seminal microbiota: Following decontamination, a total of 7,998,565 high-quality sequencing reads were identified and analysed. Hierarchical clustering (Ward linkage) of relative abundance data resolved to genus level identified three major clusters among all samples, with the number of clusters determined by the average silhouette score (Figure 1, Supplementary Figure 2). These clusters were compositionally characterised by high relative abundance of 1. Streptococcus, 2. Prevotella, or 3. Lactobacillus and Gardnerella. Assessment of bacterial load by qPCR showed that Clusters 2 and 3 had significantly higher bacterial loads than Cluster 1. Similar analyses were performed using sequencing data mapped to species level; however, individual sample silhouette scores within the resulting clusters indicated poor fit, suggesting a lack of robust species-level clusters (Supplementary Figure 3).
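The cluster-selection procedure (Ward linkage on Jensen-Shannon distances, with the number of clusters chosen by the highest mean silhouette score) can be sketched in Python with NumPy and SciPy as follows. This is an illustrative assumption about how such an analysis could be implemented, not the authors' script: the input is assumed to be a samples × genera relative abundance matrix, the candidate range of 2-6 clusters is arbitrary, and the silhouette is computed directly from the distance matrix. SciPy documents Ward linkage as formally valid only for Euclidean distances, so applying it to Jensen-Shannon distances is a common heuristic rather than an exact Ward criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import jensenshannon, pdist, squareform


def silhouette_scores(dist_sq, labels):
    """Per-sample silhouette values from a square distance matrix."""
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: silhouette defined as 0
        a = dist_sq[i, same].mean()  # mean distance within own cluster
        b = min(dist_sq[i, labels == k].mean()  # nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores


def choose_clusters(rel_abund, k_range=range(2, 7)):
    """Ward clustering on Jensen-Shannon distances; pick k with best mean silhouette."""
    condensed = pdist(rel_abund, metric=jensenshannon)
    tree = linkage(condensed, method="ward")
    dist_sq = squareform(condensed)
    best = None
    for k in k_range:
        labels = fcluster(tree, t=k, criterion="maxclust")
        mean_sil = silhouette_scores(dist_sq, labels).mean()
        if best is None or mean_sil > best[1]:
            best = (k, mean_sil, labels)
    return best  # (number of clusters, mean silhouette, cluster labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centres = np.array([[0.85, 0.05, 0.05, 0.05],
                        [0.05, 0.85, 0.05, 0.05],
                        [0.05, 0.05, 0.45, 0.45]])
    toy = np.vstack([rng.dirichlet(c * 200, size=10) for c in centres])
    k, score, labels = choose_clusters(toy)
    print(k, round(float(score), 2), labels)
```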

Characterisation of semen microbiota composition at genus level. A) Heatmap of log10-transformed read counts of the 10 most abundant genera identified in semen samples. Samples clustered into three major microbiota groups based mainly on dominance by Streptococcus (Cluster 1), Prevotella (Cluster 2), or Lactobacillus and Gardnerella (Cluster 3) (n=223, Ward's linkage). B) Silhouette scores of individual samples within each cluster. C) Relative abundance of the 6 most abundant genera within each cluster. D) Species richness (p<0.0001; Kruskal-Wallis test) and E) alpha diversity (p<0.0001; Kruskal-Wallis test) differed significantly across clusters. F) Assessment of bacterial load by qPCR showed that Clusters 2 and 3 had significantly higher bacterial loads than Cluster 1. Dunn's multiple comparison test was used as a post-hoc test for between-group comparisons (*p<0.05, ****p<0.0001).
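Panels D and E report richness and alpha diversity, but the alpha diversity index is not named here. Assuming it is the Shannon index, a minimal standard-library sketch of both measures computed from per-sample read counts is shown below; the function names are illustrative.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read in a sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    even = [25, 25, 25, 25]  # four equally abundant taxa
    dominated = [97, 1, 1, 1]  # one dominant taxon
    for sample in (even, dominated):
        print(observed_richness(sample), round(shannon_index(sample), 3))
```

Both toy samples have the same richness, but the dominated sample has much lower Shannon diversity, which is why richness and diversity are usually reported separately.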

Bacterial richness, diversity and load were similar across all patient groups examined in the study (Supplementary Figure 4). Similarly, no significant associations of bacterial clusters, richness, diversity or load with seminal parameters, sperm DNA fragmentation or semen ROS were observed (Supplementary Tables 2-3). Several genera variably identified in the literature as causes of genitourinary infection, including Chlamydia, Ureaplasma, Neisseria, Mycoplasma and Escherichia, were represented by ASVs that did not meet the prevalence criterion (present in at least 25% of samples) required for inclusion in regression modelling (29) (30) (21). However, several associations (p<0.05) between the relative abundance of specific bacterial genera and key sperm parameters were observed (Table 2). For example, sperm DNA fragmentation was positively associated with the relative abundance of Porphyromonas and Varibaculum and inversely associated with Cutibacterium and Finegoldia. ROS was positively associated with the relative abundance of Lactobacillus, and species-level analyses indicated that this relationship was largely driven by L. iners (p=0.04; Table 3). In contrast, Corynebacterium was inversely associated with ROS and positively associated with semen volume. Of note, the genus Flavobacterium was positively associated with both abnormal semen quality and abnormal sperm morphology, and in both cases the association withstood FDR correction for multiple testing (q=0.02 and q=0.01, respectively) (Table 2) (Figure 2). Consistent with this, a positive association between an unidentified species of Flavobacterium and abnormal semen quality was also observed (q=0.01, Table 3).
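The prevalence criterion above (an ASV must be present in at least 25% of samples to enter regression modelling) amounts to a simple filter on the count table. The sketch below is a hypothetical, standard-library illustration that treats any non-zero read count as presence; the input layout (a dictionary mapping each taxon to its per-sample counts) and the function name are assumptions.

```python
def prevalence_filter(count_table, min_prevalence=0.25):
    """Keep taxa detected (count > 0) in at least `min_prevalence` of samples.

    count_table: dict mapping taxon -> list of read counts, one per sample
    """
    kept = {}
    for taxon, counts in count_table.items():
        if not counts:
            continue  # no samples: nothing to assess
        prevalence = sum(1 for c in counts if c > 0) / len(counts)
        if prevalence >= min_prevalence:
            kept[taxon] = counts
    return kept


if __name__ == "__main__":
    # Toy table with 8 samples; names are placeholders, not study data.
    toy = {
        "taxon_a": [1, 2, 0, 0, 0, 0, 0, 0],  # 2/8 = 25%: kept
        "taxon_b": [0, 0, 0, 0, 0, 0, 0, 5],  # 1/8 = 12.5%: dropped
    }
    print(sorted(prevalence_filter(toy)))
```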

Relative abundance and prevalence matrices of Flavobacterium in relation to semen quality and morphology.

A) Relative abundance of Flavobacterium was significantly higher in samples with abnormal semen quality (p=0.0002, q=0.02). B) Flavobacterium was detected significantly more often in samples with abnormal semen quality (p=0.0003). C) Flavobacterium relative abundance was significantly higher in samples with <4% morphologically normal forms (p=0.0002, q=0.01). D) Flavobacterium was also significantly more prevalent in samples with a low percentage of morphologically normal sperm (p=0.0009).

Differential abundance analysis for bacterial genera with seminal quality and functional parameters.

Positive t-values indicate a positive relationship, and a negative t-value describes a negative relationship between relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent adjusted p-values for multiple comparisons.

Differential abundance analysis for bacterial species with seminal quality and functional parameters.

Positive t-values indicate a positive relationship and a negative t-value describes a negative relationship between relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent adjusted p-values for multiple comparisons.

To focus the analysis on the most extreme phenotype of poor semen quality, a sub-analysis comparing controls with MFI cases was performed (Table 4). Non-parametric differential abundance analysis again identified a robust relationship between Flavobacterium and abnormal sperm morphology (q=0.01, Table 4), which at species level mapped to an unidentified species of Flavobacterium (q=0.01, Table 5). As observed for all samples, sperm DNA fragmentation was inversely associated with the relative abundance of Cutibacterium and positively associated with Porphyromonas and Varibaculum.

Differential abundance analysis for specific taxa at genus level for controls and cases with male factor infertility.

Positive t-values indicate a positive relationship, and a negative t-value describes a negative relationship between relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent adjusted p-values for multiple comparisons.

Differential abundance analysis for specific taxa at species level for controls and cases with male factor infertility.

Positive t-values indicate a positive relationship and a negative t-value describes a negative relationship between relative abundance of taxa and seminal quality and function parameters. Significant relationships are indicated using p-values. q-values represent adjusted p-values for multiple comparisons.


To our knowledge, this is the largest study to date investigating the seminal microbiome in men. Herein we comprehensively report relationships of semen microbial diversity, load and compositional structure with both molecular and classical seminal parameters, allowing us to describe seminal microbiome clusters common to both healthy men and those with infertility or RPL. We also suggest that microbiota perturbation may be a sign of poor semen quality, irrespective of whether a man has yet been identified as having a reproductive disorder.

Recent studies have characterised the semen microbiota in healthy men and in those with infertility (5, 6, 8, 9, 10, 11, 31). We have extended these findings by analysing a larger sample of men subclassified into different reproductive disorders likely to arise from poor semen quality. Unlike most prior studies (5, 6, 8, 9, 10, 11, 31), we phenotyped men using not only classical seminal parameters but also key functional parameters, including molecular markers of reproductive function such as seminal ROS and sperm DNA fragmentation, which are known to damage sperm. Furthermore, we incorporated stringent negative controls to permit removal of sequences likely originating from extraction kits and reagents, which are known to contaminate low-biomass samples such as semen (6, 8, 31). This is important because Molina et al report that 50%-70% of bacterial reads detected in samples of testicular spermatozoa may be contaminants (32); with the addition of accessory gland secretions and passage along the urethra, contamination of ejaculated semen is likely to be even higher.

Clustering of genus-level relative abundance data categorised semen samples into 3 major clusters characterised by differing relative abundance of Streptococcus, Prevotella, Lactobacillus and Gardnerella. Unlike previous studies, we used an objective statistical approach (i.e. silhouette methods) to determine the optimal number of microbial clusters supported by the data. These findings are largely consistent with earlier semen metataxonomic profiling studies reporting clusters enriched for Streptococcus, Lactobacillus and Prevotella (5, 6, 8). Moreover, Baud et al reported increased bacterial richness in the Prevotella-enriched cluster, which we also observed (8). This may suggest that certain compositional characteristics of seminal microbiota are conserved across populations. However, similar modelling of species-level data failed to identify statistically robust clusters. This contrasts with other niches such as the vagina, where reproducible clusters based on species-level metataxonomic profiles have been demonstrated, reflecting mutualistic relationships between specific species and the host that have coevolved over long periods of time (33, 34). It is therefore possible that microbiota detected in semen result from transient colonisation events. Consistent with this, several genera known to be commensal to the penile skin, including Streptococcus, Corynebacterium and Staphylococcus, or to the female genital tract, including Gardnerella and Lactobacillus, were observed in semen samples (35). This is in keeping with data suggesting microbiota transference during sexual intercourse (36). It also remains possible that a proportion of bacteria detected in semen reflects contamination acquired during sample collection.
Studies assessing female partner microbiota profiles, as well as temporal profiling of semen microbiota, would improve understanding of potential dynamic restructuring of semen microbiota composition. This has been done in part by Baud et al, who studied the subfertile couple as a unit to establish whether there is a ‘couple microbiota’ (37). They sampled 65 couples with a range of pathologies including idiopathic infertility, taking vaginal swabs and follicular fluid samples from each woman and a semen sample and penile swabs from each man. They undertook an extensive negative control series and stringent in silico elimination of possible contaminants. They found the male microbiota to be much more diverse than the female, with 90% of female samples being Lactobacillus-dominant. Intra-personal male samples (i.e. semen and penile swabs from the same man) were more similar to each other than inter-personal samples of the same type (i.e. semen or penile swabs compared between men) (37). They also found that the male microbiota had very little impact on the microbiota of the female sexual partner (37). The lack of information on the sexual activity of the enrolled couples somewhat limits this study.

Several previous studies have described semen microbiota composition to genus level, and some have reported associations between specific genera and parameters of semen quality and function (5, 6, 8, 9, 10, 11, 31). However, many of these studies did not correct for multiple comparisons, likely leading to the reporting of spurious associations. We did not observe any significant associations of bacterial clusters, richness, diversity or load with traditional seminal parameters, sperm DNA fragmentation or semen ROS. This contrasts with Veneruso et al, who reported that semen bacterial diversity and richness were decreased in infertile patients, whereas Lundy et al reported that diversity was increased in infertile patients (9, 31). Further, Lundy et al reported Prevotella abundance to be inversely associated with sperm concentration; this was not replicated in our study (9). Several factors may account for the high heterogeneity in results, including differences in the methodology used to assess the microbial component of semen and differences in study design (38). For example, the duration of sexual abstinence prior to sample production and the sample processing time often differ between studies, which has been shown to affect the microbiological composition of semen (39).

The only association between bacterial taxa and semen parameters in our study to withstand false discovery rate correction for multiple comparisons was between Flavobacterium and abnormal semen quality and sperm morphology (q=0.02 and q=0.01, respectively). Flavobacterium are Gram-negative, physiologically diverse aerobes, some of which are pathogenic (40). Flavobacterium was recently identified as a dominant genus in immature sperm cells retrieved from testicular biopsies of infertile men in a study by Molina et al (32). In contrast, a recent smaller study investigating semen collected from 14 sperm donors and 42 patients with idiopathic infertility reported an association between Flavobacterium and increased sperm motility, and a negative correlation with sperm DNA fragmentation (10). Although they did not withstand correction for multiple testing, we also observed several other associations between specific bacterial taxa and semen parameters. For example, samples enriched with Lactobacillus had a lower incidence of elevated seminal ROS, a relationship which could largely be accounted for by Lactobacillus iners, a common member of the cervicovaginal niche (41). Various studies have also found Lactobacillus enrichment in semen to be associated with normal seminal parameters, especially morphology (6, 8). were Lactobacillus-predominant (6). However, an association between Lactobacillus-enriched samples and asthenospermia or oligoasthenospermia has also been described (11). We also observed an association between increased sperm DNA fragmentation and samples enriched with Varibaculum, which is consistent with previous reports of increased relative abundance of Varibaculum in the semen of infertile men (31).

This and previous studies have used single sample collections, so temporal variations in semen microbiota remain unknown. As in other studies, we sampled a single geographical population; ethnic diversity and geographical factors such as environment or dietary habits may have affected our results. The primers used for NGS in our study may not be universal and may therefore anneal variably to specific bacteria, resulting in over-detection, under-detection or non-detection of some taxa (42) (43). A further limitation of this study, and of others, is the lack of reciprocal genital tract microbiome testing of the female partners.

In summary, our study reveals shared features of microbial composition across all men, including those with male infertility and RPL. Furthermore, we conclude that the presence of specific bacterial genera within semen may indicate poor semen quality in all men, including those with RPL. This suggests that the human seminal microbiome may broadly reflect sperm function in the male population, although the direction of and mechanisms underlying this relationship require further elucidation.

Authors' roles

Mowla, Farahani, Tharakan, Jayasena and MacIntyre made substantial contributions to the study design, acquisition of data, analysis and interpretation of data and critical revision of the article for important intellectual content. Davies and Goncalo made substantial contributions to the analysis and interpretation of data and drafting the article. Lee, Kundu, Khanjani, Sindi and Khalifa made substantial contributions to the acquisition of data and critical revision of the article for important intellectual content. Rai, Regan, Henkel, Minhas, Dhillo, Ben Nagi and Bennett made substantial contributions to the study design and critical revision of the article for important intellectual content. All authors approved the final version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.


We would like to thank the patients and participants for their involvement in the study. We would also like to express our gratitude to the Imperial London Open Access Fund for covering the open access cost.


The Section of Endocrinology and Investigative Medicine is funded by grants from the MRC and NIHR and is supported by the NIHR Biomedical Research Centre Funding Scheme and the NIHR/Imperial Clinical Research Facility. The views expressed are those of the author(s) and not necessarily those of Tommy's, the NHS, the NIHR or the Department of Health. Individual authors are also funded as follows: NIHR Research Professorship (WSD), NIHR Post-Doctoral Fellowship (CNJ). This project was supported by a research grant from Charm Foundation UK as well as funding by Tommy's National Centre for Miscarriage Research (grant P62774).

Conflict of interest


Data Availability Statement

The 16S rRNA metataxonomic dataset and the data analysis scripts are publicly available at the European Nucleotide Archive (Project accession PRJEB57401) and GitHub (repository link), respectively.


This reviewed preprint has been updated to correct a corresponding author's email address.