Rooted inter-domain Tree of Life used for molecular dating analyses.

Maximum likelihood tree constructed with IQ-TREE v1.6.12 from a concatenation of 30 RNAP subunit and ribosomal protein sequences under the LG+R10 substitution model. Blue labels mark the marine clades dated in our study. Dark gray dots show the temporal calibrations used in our molecular dating analyses (Table 1). The marine clades shown are classified in the GTDB as follows: MGII, Poseidoniales; MGI, Nitrososphaerales; SAR202, SAR202; Crocosphaera, Crocosphaera; Prochlorococcus, Prochlorococcus; Synechococcus, Synechococcus; Ca. Marinimicrobia, Marinisomatia; SAR324, SAR324; SAR86, Oceanospirillales; SAR92, Porticoccaceae; SAR11, Pelagibacterales; Roseobacter, Rhodobacteraceae; SAR116, Puniceispirillaceae.

Temporal calibrations used as priors for the molecular dating of the main marine microbial clades. See Methods for a detailed explanation of each calibration.

Dates of the diversification of marine microbial clades and their relationship with the redox history of Earth’s atmosphere, surface ocean, and deep ocean.

A) Ridges represent the distribution of 100 Bayesian dates estimated using a relaxed molecular clock and an autocorrelated model (see Methods). Ridges of marine clades are colored by diversification date: green, Late-branching Phototrophs; orange, Late-branching Clades; blue, Early-branching Clades. Gray ridges show the timing of the diversification of the major bacterial and archaeal superphyla. Molecular dating estimates from the uncorrelated UGAM model and the autocorrelated CIR model are shown in Supplemental File 8. B) Oxygenation events and redox changes across Earth's history. Panel adapted from previous work (Alcott et al., 2019). Abbreviations: POE, Paleozoic Oxidation Event; NOE, Neoproterozoic Oxidation Event; GOE, Great Oxidation Event; Pha, Phanerozoic; NP, Neoproterozoic; MP, Mesoproterozoic; PP, Paleoproterozoic.
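Each ridge condenses 100 Bayesian dates for one clade. The sketch below is a hedged illustration, not the study's pipeline. It shows how such samples could be reduced to a median and a central 95% percentile interval and then ordered from oldest to youngest, as in panel A. The input format (a dict mapping a clade name to a list of ages in Ma) and the function name `summarize_dates` are hypothetical.

```python
from statistics import median, quantiles


def summarize_dates(samples_by_clade):
    """Summarize posterior age samples per clade (hypothetical input format).

    samples_by_clade: dict mapping clade name -> list of ages in Ma.
    Returns (clade, median, lower_2.5%, upper_97.5%) tuples sorted from
    the oldest to the youngest median age.
    """
    rows = []
    for clade, dates in samples_by_clade.items():
        # n=40 yields 39 cut points at 2.5% steps; first = 2.5%, last = 97.5%
        cuts = quantiles(dates, n=40, method="inclusive")
        rows.append((clade, median(dates), cuts[0], cuts[-1]))
    # Older clades (larger ages in Ma) first, mirroring a top-to-bottom ridge plot
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For example, `summarize_dates({"clade_A": ages_a, "clade_B": ages_b})` with two lists of 100 sampled ages returns the older clade first. The percentile interval is used here only for simplicity; a highest posterior density interval is a common alternative.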

KEGG categories enriched at the crown node of each marine microbial clade.

Clades were classified based on the diversification timing shown in Fig. 2. Enriched categories were identified by statistically comparing a stochastic mapping distribution obtained with an all-rates-different model against a null distribution obtained with a constant-rate model that was not conditioned on the presence/absence data at the tips. Each dot represents one replicate (see Methods). The x-axis shows the number of KOs gained at each crown node for each KEGG category. Stochastic mapping and null distributions were sorted for visualization purposes. The complete list of enriched KEGG categories is provided in Supplemental File 7.
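The caption does not name the statistical test used to compare the two distributions. As a hedged sketch only, one simple option is a one-sided permutation test on the difference in mean KO gains per replicate. It asks whether a KEGG category gains more KOs at a crown node under stochastic mapping than under the unconditioned null model. The choice of test, the function name `permutation_pvalue`, and its parameters are assumptions made for illustration, not the method described in the Methods.

```python
import random


def permutation_pvalue(mapped, null, n_perm=2000, seed=0):
    """One-sided permutation test (an assumed test, for illustration only).

    mapped: KO gains per replicate under the all-rates-different mapping.
    null:   KO gains per replicate under the constant-rate null model.
    Returns an empirical p-value for mean(mapped) > mean(null).
    """
    rng = random.Random(seed)
    observed = sum(mapped) / len(mapped) - sum(null) / len(null)
    pooled = list(mapped) + list(null)
    k = len(mapped)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign replicates to the two groups
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly above zero
    return (hits + 1) / (n_perm + 1)
```

When many KEGG categories are tested, the resulting p-values would usually also need a multiple-testing correction before calling a category enriched.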

Link between the timing of the diversification of the main marine microbial clades and major geological and biological events.

The timing of the geological and biological events potentially involved in the diversification of marine clades is based on previously published data: the "Boring Billion" (Brasier and Lindsay, 1998; Hodgskiss et al., 2019), red algae fossils (Butterfield, 2000), increased ocean productivity (Butterfield, 2000; Och and Shields-Zhou, 2012), green algae fossils (Butterfield et al., 2006), and oxygenation of the deep ocean (Lenton et al., 2016). The length of each bar represents the Bayesian age estimate for each marine clade. The timing of the main oxygenation events is based on previous work (Alcott et al., 2019).