Overview of eccDNA formation during mouse spermatogenesis. (A) Schematic representation of Circle-seq in human sperm cells and mouse SPA, RST, EST and sperm cells validated with immunochemistry. SYCP3: a component of the synaptonemal complex; γH2AX: a marker for double-strand breaks; SP65: a marker for the acrosome; TUBULIN: a structural component of the manchette in EST and of the flagellar axoneme in sperm cells. (B) Number of eccDNAs detected in different cell types. *two-sided t-test p-value < 0.05; **two-sided t-test p-value < 0.01. (C) Size distribution of eccDNAs during mouse spermatogenesis. Dotted lines indicate multiples of 180 bp. (D) A representative genomic locus showing the gene annotation, Circle-seq signals, detected eccDNAs, and SINE and DNA repeat elements. The red rectangle highlights a large eccDNA. (E) Enrichment of eccDNAs at given genomic regions relative to randomly-selected control regions. (F) Enrichment of eccDNAs at given repeat elements relative to randomly-selected control regions.

Association between sperm eccDNAs and nucleosome positioning. (A) GC contents of sperm eccDNAs, regions upstream and downstream of eccDNAs, and randomly-selected length-matched control regions. ***two-sided Wilcoxon test p-value < 0.001. (B) Predicted probability of nucleosome occupancy for eccDNAs and randomly-selected length-matched control regions (highlighted by the red-shaded area), and for surrounding regions. Boxplots show the probability distributions of individual eccDNAs and control regions. ***two-sided Wilcoxon test p-value < 0.001. (C) Enrichment of eccDNAs at different histones and histone modifications. (D and E) ChIP-seq signal distribution within [-1.8 kb, +1.8 kb] of the centers of ~180 bp (D) and ~360 bp (E) eccDNAs. ChIP-seq signals, quantified as read density, are color-coded; the color scales are shown below the heatmaps.

Large-sized eccDNAs are preferentially generated from heterochromatin regions. (A) Distribution at H3K27ac- and H3K9me3-marked regions for eccDNAs of different sizes. (B) Distribution at different genomic regions for eccDNAs of different sizes. (C) Number of small (<3 kb) vs large (≥3 kb) eccDNAs per Mb as a function of gene number per Mb. Pearson correlation coefficients and two-sided t-test p-values are indicated. (D) Number of small (<3 kb) vs large (≥3 kb) eccDNAs per Mb as a function of Alu number per Mb. Pearson correlation coefficients and two-sided t-test p-values are indicated.

Microhomology-directed ligation accounts for the emergence of most eccDNAs. (A) Numbers of eccDNAs or randomly-selected control regions overlapping recombination hotspots in mouse. (B) Numbers of mouse sperm eccDNAs or randomly-selected control regions having 95% reciprocal overlap with different types of structural variations. (C) Illustration of how an eccDNA with homologous sequences (CGA) at its two ends is identified from short-read sequencing data by our method versus other methods. (D) Percentages of homologous sequences of different lengths (coded by different color saturation levels) for eccDNAs and randomly-selected control regions. (E) GC content of homologous sequences and randomly-selected control regions. (F) Percentages of homologous sequences of different lengths (coded by different color saturation levels) for small-sized (<3 kb) and large-sized (≥3 kb) eccDNAs. (G) Length of microhomologous sequences as a function of eccDNA size. Data points are shown as the median with lower (25%) and upper (75%) quartiles. The shaded area is the 95% confidence interval of the linear regression line. The Pearson correlation coefficient and two-sided t-test p-value are indicated. (H) Sequence motif analysis for ±10 bp around the leftmost left ends and ±10 bp around the leftmost right ends of eccDNAs with no perfectly matched homologous sequences observed. (I) Model for MMEJ-directed eccDNA biogenesis.
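
To make the idea of homologous sequences at the two ends of an eccDNA (panel C) concrete, the following is a minimal sketch, not the detection pipeline used in the study. It assumes a plain string for the reference sequence and 0-based, half-open circle coordinates; the function name `microhomology_at_ends` and its `max_len` parameter are hypothetical. When a short direct repeat is shared by both breakpoints, the junction can be placed at several equivalent positions, so the sketch also reports the leftmost equivalent coordinates.

```python
from typing import Tuple


def microhomology_at_ends(genome: str, start: int, end: int,
                          max_len: int = 50) -> Tuple[int, int, int]:
    """Return (repeat_length, leftmost_start, leftmost_end) for a circle.

    The circle is assumed to cover genome[start:end] (0-based, half-open).
    The repeat length is the number of bases by which both breakpoints can
    be shifted together without changing the circular sequence.
    """
    size = end - start

    # Shift right: compare bases at the circle start with bases just past its end.
    right = 0
    while (right < min(max_len, size) and end + right < len(genome)
           and genome[start + right] == genome[end + right]):
        right += 1

    # Shift left: compare bases just before the start with the last bases of the circle.
    left = 0
    while (left < min(max_len, size) and start - left - 1 >= 0
           and genome[start - left - 1] == genome[end - left - 1]):
        left += 1

    return left + right, start - left, end - left


# Toy reference: flank "GG", repeat "CGA", body, repeat "CGA", flank "AC".
toy = "GG" + "CGA" + "TTGGCCAT" + "CGA" + "AC"
result_exact = microhomology_at_ends(toy, 2, 13)
result_shifted = microhomology_at_ends(toy, 3, 14)
print(result_exact, result_shifted)
```

Both calls describe the same circle, so they report the same repeat length and the same leftmost coordinates even though the input breakpoints differ by one base.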

The biogenesis mechanism of germline eccDNAs is conserved between human and mouse. (A) Size distribution of sperm eccDNAs in two biological replicates. (B) GC contents of sperm eccDNAs, regions upstream and downstream of eccDNAs, and randomly-selected length-matched control regions. ***two-sided Wilcoxon test p-value < 0.001. (C) Predicted probability of nucleosome occupancy for eccDNAs and randomly-selected length-matched control regions (highlighted by the red-shaded area), and for surrounding regions. Boxplots show the probability distributions of individual eccDNAs and control regions. ***two-sided Wilcoxon test p-value < 0.001. (D) Numbers of eccDNAs or randomly-selected control regions overlapping recombination hotspots in human. EccDNAs located completely within a hotspot (Intra-) and those whose two ends overlap two different hotspots (Inter-) are shown separately. (E) Numbers of human sperm eccDNAs or randomly-selected control regions having 95% reciprocal overlap with different types of structural variations. (F) Percentages of homologous sequences of different lengths (coded by different color saturation levels) for eccDNAs and randomly-selected control regions. (G) Length of microhomologous sequences as a function of eccDNA size. Data points are shown as the median with lower (25%) and upper (75%) quartiles. The shaded area is the 95% confidence interval of the linear regression line. The Pearson correlation coefficient and two-sided t-test p-value are indicated.
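
The 95% reciprocal overlap criterion in panel E can be illustrated with a short sketch. It assumes both intervals lie on the same chromosome and use 0-based, half-open coordinates; the function name `has_reciprocal_overlap` is hypothetical and this is not necessarily how the comparison was implemented in the study. Two intervals pass only if their shared span covers at least the given fraction of each interval.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open, same chromosome assumed


def has_reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.95) -> bool:
    """True if the overlap covers at least min_frac of BOTH intervals."""
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False  # disjoint or merely adjacent intervals
    return (overlap >= min_frac * (a_end - a_start)
            and overlap >= min_frac * (b_end - b_start))


near_match = has_reciprocal_overlap((100, 200), (102, 201))  # 98 bp shared
nested = has_reciprocal_overlap((100, 200), (100, 300))      # covers only half of b
disjoint = has_reciprocal_overlap((0, 10), (20, 30))
print(near_match, nested, disjoint)
```

Requiring the threshold in both directions prevents a small eccDNA nested inside a much larger structural variant from counting as a match.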

Detection and characterization of eccDNAs. (A) Genome-wide distribution of eccDNAs detected in two human sperm samples. (B) Histograms showing distributions of reproducibility rates between any two replicates of the same biological sample. Left: data generated in this study; Right: public datasets [Moller et al, Nat Commun (2018) 9:1069; Henriksen et al, Mol Cell (2022) 82: 209-217.e7]. (C) Number of detected eccDNAs as a function of the number of non-redundant uniquely-mapped reads (useful reads) randomly sampled from the indicated Circle-seq sequencing libraries. (D) Principal component analysis of biological replicates using Jaccard indexes with all other replicates as features. 75% confidence ellipses are indicated. (E-G) EccDNA number per 2 Mb as a function of gene density (E) and the density of SINE (F) and LINE (G) elements.
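
As an illustration of the features behind panel D, the sketch below computes pairwise Jaccard indexes between replicates. It assumes each replicate is a set of `(chromosome, start, end)` tuples and that two eccDNAs count as shared only when their coordinates match exactly; the legend does not state the matching rule, so this is an assumption, and the names `jaccard` and `jaccard_matrix` are hypothetical. Each row of the resulting matrix could serve as the feature vector of one replicate for a principal component analysis.

```python
from typing import Dict, List, Set, Tuple

EccDNA = Tuple[str, int, int]  # (chromosome, start, end)


def jaccard(a: Set[EccDNA], b: Set[EccDNA]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(samples: Dict[str, Set[EccDNA]]) -> Dict[str, List[float]]:
    """One row per replicate; columns follow the sorted replicate names."""
    names = sorted(samples)
    return {x: [jaccard(samples[x], samples[y]) for y in names] for x in names}


# Toy replicates sharing one of three distinct eccDNAs.
replicates = {
    "rep1": {("chr1", 10, 190), ("chr1", 500, 860)},
    "rep2": {("chr1", 10, 190), ("chr2", 5, 185)},
}
matrix = jaccard_matrix(replicates)
print(matrix)
```

The diagonal entries are 1.0 by construction; they can be dropped if only comparisons with other replicates are wanted as features.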

Quality control of the eccDNA isolation procedure. (A) qPCR quantification of the ratio of an exogenous circular DNA (pUC19) to a linear DNA locus (H19 gene) before and after the eccDNA isolation procedure. The ratio before eccDNA isolation is normalized to 1 (n=4 independent experiments, two-tailed Student’s t-test p-value=0.0327). Data are presented as means ± SEM. *two-tailed Student’s t-test p-value < 0.05. (B) Relative abundance of mtDNA before and after the eccDNA isolation procedure. The abundance of mtDNA before eccDNA isolation is normalized to 1 (n=3 independent experiments, two-tailed Student’s t-test p-value=0.0068). Data are presented as means ± SEM. **two-tailed Student’s t-test p-value < 0.01. (C) Gel image showing PCR validation of three eccDNAs using inward and outward PCR primers.

Length distribution of eccDNAs in different cells. Dotted lines indicate multiples of 180 bp.

Correlation between the density of small-sized (A) vs large-sized (B) eccDNAs and the meiotic recombination rate [Jensen-Seaman et al., Genome Res (2004) 14, 528-538].

High similarity between sperm eccDNAs detected in this study and those from apoptotic DNA fragmentation reported previously. (A) Enrichment of eccDNAs reported in Nature (2021) 599: 308-314 at given genomic regions relative to randomly-selected control regions. (B) Enrichment of eccDNAs reported in Nature (2021) 599: 308-314 at given repeat elements relative to randomly-selected control regions. (C) EccDNA numbers per 2 Mb region in this study as a function of eccDNA numbers per 2 Mb region reported in Nature (2021) 599: 308-314.

Evaluation of our nucleotide-resolution eccDNA detection method. (A) A representative genomic locus showing Circle-seq signals and eccDNAs detected by individual methods. The first eccDNA and its sequence are shown enlarged. Homologous sequences (TAC) at the eccDNA ends are highlighted in red. (B) Schematic representation of our eccDNA detection pipeline. (C) Percentages of simulated eccDNA regions correctly detected by different methods. (D) Percentages of eccDNA breakpoints correctly assigned by different methods. (E) Percentages of eccDNAs detected by individual methods (x-axis) that were co-detected by other methods (y-axis). (F) Numbers and percentages of eccDNA breakpoints misassigned by different methods.

Biogenesis mechanism of eccDNAs in mouse somatic tissues and human sperm. (A) Predicted probabilities of nucleosome occupancy for ~180 bp and ~360 bp eccDNAs in various mouse tissues. (B) Percentages of homologous sequences of different lengths (coded by different color saturation levels) for eccDNAs and randomly-selected control regions. (C) Sequence motif analysis for ±10 bp around the leftmost left ends and ±10 bp around the leftmost right ends of eccDNAs with no perfectly matched homologous sequences observed. (D) EccDNA numbers per 2 Mb region in this study as a function of eccDNA numbers per 2 Mb region reported in Nature (2021) 599: 308-314. (E and F) Sequence motif analysis for ±10 bp around the leftmost left ends and ±10 bp around the leftmost right ends of eccDNAs reported in Mol Cell (2022) 82(1): 209-217 (E) and PNAS (2021) 118(47) (F).