The two main hypotheses for the origin of virophages and NCLDVs. In the virophage-first hypothesis, NCLDVs diverge early, with their sister lineage evolving into protovirophages. In the nuclear-escape hypothesis, NCLDVs descend from endogenous elements (encoding an integrase) that became exogenous; virophages then evolved to become their parasites.

Rooted Bayesian maximum clade credibility tree of the major lineages of eukaryotic viruses in the kingdom Bamfordvirae. The tree is based on the concatenated alignment of four core proteins involved in virion morphogenesis (major and minor capsid proteins, ATPase and protease). The tree was computed in BEAST 2 (Bouckaert et al., 2019) under a relaxed molecular clock, with 140 million generations (relative burn-in of 25%). Black: reference viral genomes; blue: endogenous elements; orange: metagenomic sequences.

Outgroup-rooted maximum-likelihood tree of the major lineages of eukaryotic viruses in the kingdom Bamfordvirae. The tree is based on the concatenated alignment of four core proteins involved in virion morphogenesis (major and minor capsid proteins, ATPase and protease). Enterobacteria phage PRD1 (Tectiviridae) was used as the outgroup for rooting. The tree was computed in RAxML-NG (Kozlov et al., 2019) with 200 random starting trees and 2,200 bootstrap replicates. Black: reference viral genomes; blue: endogenous elements; orange: metagenomic sequences.

Comparison between the nuclear-escape and alternative maximum-likelihood models based on the Akaike information criterion (AIC). The log-likelihood of each model was taken from the best maximum-likelihood tree consistent with the corresponding hypothesis found in RAxML-NG (Kozlov et al., 2019). Results are shown for the concatenated data set of four core proteins (ATPase, protease, major capsid and minor capsid) and for the protease, major capsid and minor capsid proteins individually. The size-corrected AIC was used because the alignment (sample) size was small relative to the number of free parameters, i.e. n/k < 40 (Posada and Buckley, 2004; Symonds and Moussalli, 2011). The best model is highlighted in boldface. The “alternatives” refer to non-“nuclear-escape” scenarios, that is, models which are not consistent with the predictions made by the nuclear-escape hypothesis.
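The size correction mentioned above can be illustrated with a minimal Python sketch. It uses the standard formulas AIC = 2k − 2 ln L and AICc = AIC + 2k(k + 1)/(n − k − 1); the function names are hypothetical, and treating n as the alignment length is an assumption based on the caption's wording, not a description of the exact pipeline used.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Size-corrected AIC.

    log_likelihood: maximised log-likelihood of the model (e.g. of the best
                    constrained ML tree).
    k: number of free parameters.
    n: sample size (assumed here to be the alignment length).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def delta_aicc(scores: dict) -> dict:
    """Difference of each model's AICc from the best (lowest) AICc."""
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}
```

The model with a difference of zero is the preferred one; larger differences indicate weaker support relative to that model.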

Posterior model odds of the nuclear-escape and alternative hypotheses using concatenated and single-protein data sets. Tree topologies consistent with each hypothesis were filtered and counted from the Bayesian MCMC sample following the method of Bergsten et al. (2013). All ratios favour the alternative scenarios over the nuclear-escape hypothesis. The best model is highlighted in boldface. The “alternatives” refer to non-“nuclear-escape” scenarios, that is, models which are not consistent with the predictions made by the nuclear-escape hypothesis.
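The counting step can be sketched as follows, assuming each post-burn-in tree in the MCMC sample has already been classified as consistent or inconsistent with the nuclear-escape topology. The function name and the boolean-flag input are hypothetical; the classification of trees itself is not shown.

```python
def posterior_odds_alt_vs_ne(is_nuclear_escape: list) -> float:
    """Posterior odds of the alternatives over the nuclear-escape hypothesis.

    is_nuclear_escape: one boolean per sampled tree, True if the tree is
    consistent with the nuclear-escape topology. Every other tree counts
    towards the alternatives.
    """
    n_ne = sum(1 for flag in is_nuclear_escape if flag)
    n_alt = len(is_nuclear_escape) - n_ne
    if n_ne == 0:
        # No sampled tree supports nuclear-escape: the odds are unbounded
        # at the resolution of this sample.
        return float("inf")
    return n_alt / n_ne
```

A ratio above 1 means the sampled trees favour the alternatives. Converting posterior odds into a Bayes factor would additionally require dividing by the prior odds of the two sets of topologies.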

Bayesian stepping-stone analysis of the nuclear-escape and alternative hypotheses. Each scenario was run on the concatenated data set in MrBayes 3 (Ronquist and Huelsenbeck, 2003) for 20 million generations (average standard deviation of split frequencies < 0.01). The Bayes factor strongly rejects a sister relationship between adenoviruses and NCLDVs (the nuclear-escape hypothesis). The best model is highlighted in boldface. The “alternatives” refer to non-“nuclear-escape” scenarios, that is, models which are not consistent with the predictions made by the nuclear-escape hypothesis.
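For reference, a Bayes factor is conventionally obtained from the two marginal log-likelihoods estimated by stepping-stone sampling, often reported as 2 ln BF. The sketch below shows only that arithmetic; the function name is hypothetical and the inputs stand for whatever marginal log-likelihood estimates the two constrained runs report.

```python
def two_ln_bayes_factor(ln_ml_alt: float, ln_ml_ne: float) -> float:
    """Return 2 ln BF for the alternatives against nuclear-escape.

    ln_ml_alt: marginal log-likelihood of the alternative (unconstrained
               or negatively constrained) analysis.
    ln_ml_ne:  marginal log-likelihood of the nuclear-escape (constrained)
               analysis.
    Positive values favour the alternatives.
    """
    return 2.0 * (ln_ml_alt - ln_ml_ne)
```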

Evolutionary model for the origin of the major lineages of eukaryotic viruses in the kingdom Bamfordvirae. The viral ancestor is inferred to have been an exogenous virus, while the rve-integrase was captured independently by the clade of Mavericks + Polinton-like virus BS_13 and by Mavirus, possibly through horizontal gene transfer. Virophages evolved from an autonomous virus that became specialised to parasitise the ancestor of NCLDVs. Vertical cross-hatching indicates that the trait is found in some but not all members of the group. Acronyms refer to genes and genomic features present in the viral genomes: pPOLB (protein-primed DNA polymerase B), MCP (major capsid protein), mCP (minor capsid protein), int (rve-type integrase), pro (adenoviral-like protease), atp (FtsK/HerA DNA packaging ATPase), TIRs (terminal inverted repeats).