The widespread nature of intact GAG-POL-ENV elements across metazoan genomes.

(A) Open reading frame arrangement of intact errantivirus copies. (B) Summary of errantiviruses found in metazoan phyla, with the type of ENV protein indicated. (C) Copy number of intact errantiviruses in metazoans. Each column represents a genome assembly, each tile represents an annotated errantivirus, and the colour of the tile represents its copy number.

Phylogeny of Reverse Transcriptase (RT) from all intact metazoan errantiviruses.

Maximum likelihood tree constructed on representative pol RT proteins of intact errantiviruses (see methods) and rooted at the midpoint. The “ancient errantiviruses” and “insect errantiviruses” phylogenetic groups are highlighted in green and white, respectively. Positions of known dipteran errantiviruses, Gypsy_ZAM (1), Gypsy_Gypsy5_Dgri (2), Gypsy_Gypsy (3), Gypsy_Idefix (4), and Gypsy_DM176 (5), are indicated. Host classifications of errantiviruses are annotated in colour: non-arthropod Protostomia (yellow), Deuterostomia (red), Diploblasta (orange), non-insect Arthropoda (green) and Insecta (blue). One cnidarian element within the group of insect errantiviruses is marked by an asterisk.

env classifications of intact metazoan errantiviruses.

(A) Shown is a pol RT maximum likelihood tree of the representative errantiviruses (see methods) with host (*) and type of ENV protein (**) indicated. Colours represent non-insect animals (*) (yellow), insects (*) (blue), glycoprotein F (F) (**) (orange), and glycoprotein B (HSV/gB) (**) (purple). Clades containing hymenopteran elements in “insect errantiviruses” that carry either F or HSV/gB env are annotated as 1 and 2. (B) Close-up of clades 1 and 2 in “insect errantiviruses”. F-type and HSV/gB-type errantiviruses that are both found in Lasioglossum lativentre are marked by asterisks.

Structure and phylogeny of glycoprotein F-type ENV found in intact errantiviruses.

(A) Domain arrangement of the glycoprotein F ectodomain in errantiviruses. (B) A Sankey plot of pol RT and glycoprotein F-type ENV ectodomain maximum likelihood trees for representative errantiviruses, showing phylogenetic congruence between them. Bootstrap values for nodes are given. The “ancient errantivirus” group is highlighted in green. (C) Monomeric and trimeric structures of the “pre-fusion configuration” and the “post-fusion configuration”. Schematic cartoons of F-type ENV ectodomain domain 4 are shown in (C’) and (C’’), respectively. Structures in (C) are taken from Diptera_Sciomyzidae_Cmar_errantivirus_19 and Branchipoda_Diplostraca_Dcar_errantivirus_3 for the “pre-fusion” and “post-fusion” configurations, respectively.

Structure of glycoprotein HSV/gB-type ENV found in intact errantiviruses.

(A) Structural arrangements of ectodomains, transmembrane domains and signal peptides found in HSV/gB-type errantiviruses. (B) Arrangement of domains of the HSV/gB-type ENV ectodomain, and AlphaFold 2-predicted 3D monomer and trimer structures of the ectodomain. The structure is taken from Hemiptera_Pemphigidae_Elan_errantivirus_1. (C) Fusion loops and conserved cysteine bridges on the structure of the errantivirus HSV/gB-type ENV ectodomain. Shown are variable cysteine bridges in domains I and II of the errantivirus HSV/gB-type ENV ectodomains (indicated with an orange bar). The proximity of the cysteine bridge to the fusion loops of domain I is indicated for structures within C1, C2 and C3. Structures are taken from Cnidaria_Hydrozoa_Chem_errantivirus_5, Bryozoa_Gymnolaemata_Cpal_errantivirus_2, and Tunicata_Ascidiacea_Clep_errantivirus_1 for the C1, C2 and C3 configurations, respectively. (D) Summary of the cysteine bridge configurations found in HSV/gB-type ENV of different metazoan orders. Positions of cysteine bridge configurations within Nematoda and Platyhelminthes are shown on a simple schematic. Exact positions of all cysteine bridges can be found in Supplementary Figure S6.

RNase H domains found in the bridge region of intact errantiviruses.

(A) AlphaFold 2-predicted structures of domains found in the bridge regions of errantiviruses, with topology maps of α helices (grey) and β sheets (red). Structures are taken from Annelida_Polychaeta_Apac_errantivirus_7, Bryozoa_Gymnolaemata_Cpal_errantivirus_3, Coleoptera_Coccinellidae_Npum_errantivirus_1, Annelida_Polychaeta_Slim_errantivirus_9, Diptera_Cecidomyiidae_Smos_errantivirus_1 and Hymenoptera_Apoidea_Obic_errantivirus_2 for RNase H/Tether, “mini” RNase H, “pseudo” RNase H, “mini” domain 2, “pseudo” domain 2 and “pseudo” domain 3, respectively. (B) Shown is the pol RT maximum likelihood tree of representative errantiviruses, with the RNase H domain architecture in the bridge region of pol highlighted: “insect errantiviruses” with RNase H + pseudo-RNase H (blue), “insect errantiviruses” with bridge region outliers (dark blue), “ancient errantiviruses” with RNase H + mini domain (green), and “ancient errantiviruses” with bridge region outliers (yellow). (C) The structure of the bridge region of the yeast Schizosaccharomyces pombe Ty3 element Tf1, predicted by AlphaFold 2.

Trees of known Ty3/gypsy elements and newly identified Errantiviruses.

Midpoint-rooted pol RT maximum likelihood tree of representative errantiviruses, inclusive of non-env Ty3/gypsy pol RT. Host order and env type are shown for each clade, as well as for solitary elements on the tree. Errantiviruses containing chromodomains are marked in yellow. The “ancient errantivirus” group is coloured in green and “insect errantiviruses” in black. Known Ty3/gypsy elements include Gypsy Group 1 OSIRIS (Gypsy-21_DAn), Gypsy Group 1 OSVALDO (Dypsy-3_DEl), Gypsy Group 2 BICA (Gypsy-6_DFi), Gypsy Group 2 BLASTOPIA (Gypsy-32_DWil), Gypsy Group 2 MDG3 (INVADER3, MDG3_DM), Gypsy Group 2 MICROPIA (Gypsy-7_DSe), Gypsy Group 2 SACCO (Gypsy-10_DAn), Gypsy Group 3 412/MDG1 (BLOOD, Gypsy-28_Dan), Gypsy Group 3 CHIMPO (Gypsy-7_DAn), Gypsy Group 3 17.6 (DM176, IDEFIX, ZAM), Gypsy Group 3 GYPSY (Gypsy-5_Dgri, Gypsy-Gypsy), Mag (Mag_As, MAG_Bmor), Cer (Cer1, Cer4), Tor (Tor1, Tor2, Tor4b), Athila/Tat (ATHILA0p1), and Chromoviridae (ATHILA0p1, CRM_AAM94350.1, REINA, Saccharomyces_Ty3, SCL213_Galadriel, Skipper*, Spombe_Tf1_AAA35339.1, SUSHI).

Identification of errantiviruses in metazoan genomes.

(A) Computational approach to identify and annotate env-containing retroelements in metazoan genomes. (B) A pruned maximum likelihood tree of pol RT from errantiviruses, with the protein baits used for consecutive tBLASTn searches annotated in colour, showing that the baits cover the tree comprehensively. (C) pol RT maximum likelihood trees with the representative elements taken for AlphaFold 2 analysis highlighted in red for bridge domains, F-type and HSV/gB-type env ectodomains.

Complete errantivirus tree and annotations of tRNA primer binding sites of insect errantiviruses.

(A) Complete maximum likelihood tree constructed on pol RT proteins of intact errantiviruses, with discrete phylogenetic groups within “insect errantiviruses” indicated (I1-I6). The “ancient errantivirus” group is highlighted in green and the “insect errantivirus” group in white. Host phylogeny (*) is annotated for insects (blue), non-insect arthropods (green), non-arthropod Protostomia (yellow), Deuterostomia (red) and Diploblasta (orange). ENV type (**) is annotated for glycoprotein F (F) (orange) and glycoprotein B (HSV/gB) (purple). (B) A Sankey plot of pol RT and pol IN maximum likelihood trees for errantiviruses, showing an overall phylogenetic congruence between them. (C) Annotations of tRNA primer binding sites (PBS) of “insect errantiviruses” within phylogenetic groups of pol RT, demonstrating that the types of tRNAs used reflect the separation of pol RT. The number of elements is indicated on the Y axis. Sequence information of the tRNA PBS can be found in Table S1.

Insect errantivirus clades.

A maximum likelihood tree of pol RT for the “insect errantivirus” group in isolation, with host phylogeny, env type, and discrete phylogenetic groups (I1-I6) indicated. Host orders (column 1) are annotated for Diptera (red), Lepidoptera (orange), Coleoptera (yellow), Hymenoptera (green), Hemiptera (cyan), Polyneoptera (pink) and Palaeoptera (purple). ENV types (column 2) are annotated for glycoprotein F (F) (orange) and glycoprotein B (HSV/gB) (purple).

Ancient errantivirus clades.

(A) Maximum likelihood tree of pol RT for the “ancient errantivirus” group in isolation, with host phylogeny, env type, and discrete phylogenetic groups (A1-A11) indicated. Host phylogeny (*) is annotated for insects (blue), non-insect arthropods (green), non-arthropod Protostomia (yellow), Deuterostomia (red) and Diploblasta (orange). ENV type (**) is annotated for glycoprotein F (F) (orange) and glycoprotein B (HSV/gB) (purple). (B) Detailed view of phylogenetic groupings in “ancient errantiviruses” with host (*) and env (**) annotated. Bryozoa elements that are found in distinct clades of the tree are highlighted. Each element is named for the host species and simplified to Phylum/Class/Order, Superfamily, Genus and Species. For example, elements found in Diptera Drosophilidae Drosophila melanogaster are named as Diptera Drosophilidae Dmel. (C) tRNA annotations of “ancient errantiviruses” within phylogenetic groups of pol RT. Insect orders in the groups A1, A2 and A7 are marked by a hash. The number of elements is indicated on the Y axis.
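The naming scheme described in (B) can be illustrated with a minimal Python sketch. It assumes the species abbreviation is the first letter of the genus followed by the first three letters of the species epithet, which is inferred only from the “Dmel” example; the function names are hypothetical and are not taken from the authors' pipeline.

```python
def abbreviate_species(genus: str, species: str) -> str:
    """Abbreviate a binomial name, e.g. ('Drosophila', 'melanogaster') -> 'Dmel'.

    Assumption: first letter of the genus plus the first three letters of the
    species epithet, as suggested by the 'Dmel' example in the legend.
    """
    return genus[0].upper() + species[:3].lower()


def element_label(higher_taxon: str, family: str, genus: str, species: str) -> str:
    """Build the simplified host label used to name an element."""
    return " ".join([higher_taxon, family, abbreviate_species(genus, species)])


print(element_label("Diptera", "Drosophilidae", "Drosophila", "melanogaster"))
# prints: Diptera Drosophilidae Dmel
```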

Conserved predicted Furin cleavage sites and hydrophobic fusion peptides of glycoprotein F.

Shown is a multiple sequence alignment of the middle part of the ectodomains of representative glycoprotein F. Arginine (R) residues in the putative Furin cleavage site and hydrophobic residues (Leu, Ile, Val, Phe, and Trp) in the fusion peptides are highlighted in red and green, respectively. Furin cleavage sites were predicted using ProP-1.0 (https://services.healthtech.dtu.dk/services/ProP-1.0/). Conserved α helix structures are shown upstream and downstream of the Furin cleavage site.
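The residue highlighting described above can be approximated with a short Python sketch. It only marks residue classes in a single sequence; it does not predict Furin cleavage sites (ProP-1.0 was used for that), and it marks every arginine rather than only those within a predicted site. The function name and the toy peptide are hypothetical and are not taken from the alignment.

```python
# Hydrophobic residues highlighted in the legend: Leu, Ile, Val, Phe and Trp.
HYDROPHOBIC = set("LIVFW")


def marker_track(seq: str) -> str:
    """Return a track of the same length as seq.

    'R' marks arginine, 'h' marks a hydrophobic residue (L, I, V, F, W),
    and '.' marks any other character, including alignment gaps.
    """
    track = []
    for aa in seq.upper():
        if aa == "R":
            track.append("R")
        elif aa in HYDROPHOBIC:
            track.append("h")
        else:
            track.append(".")
    return "".join(track)


toy_peptide = "RKRRGLIVA"  # made-up sequence, for illustration only
print(toy_peptide)
print(marker_track(toy_peptide))
```

Printing the track under each aligned sequence gives a plain-text stand-in for the red and green highlighting.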

Conservation of cysteine bridges in errantivirus HSV/gB ectodomains.

Shown are alignments of domains I and II of all errantivirus HSV/gB ectodomains, with cysteine residues highlighted in red and hydrophobic residues in the fusion loop peptides highlighted in green. Sequences from different metazoan orders are aligned separately. Conserved cysteine bridges within and across the alignments are numbered, and the tree positions of ERVs are shown, with “insect errantiviruses” and “ancient errantiviruses” highlighted in white and green, respectively, as in Fig 2A. ERVs that were used for the structural analysis are marked by asterisks. The structure of Hymenoptera_Formicoidea_Tbic_errantivirus_1 is shown to highlight the position of the cysteine bridge in domain I of the insect errantivirus HSV/gB ectodomain and the absence of fusion loop II. Note that the single HSV/gB-type ENV found in Annelida errantiviruses was not predicted to form a trimer and is therefore not included in the structure classification.

Presence of chromodomains in “ancient” errantiviruses.

Maximum likelihood tree of RT from representative errantiviruses along with known Ty3/gypsy elements. ENV type (*) is annotated for glycoprotein F (F) (orange) and glycoprotein B (HSV/gB) (purple). Chromodomain (**) is annotated in purple.