Figures and data

Illustration of the reconstruction process of HERV Env proteins.
First, env genes of both endogenous and exogenous retroviruses (complete env sequences as well as defective env genes) were extracted from the literature, Dfam, NCBI, UniProt, and PDB. Second, we performed group-specific alignments of all ERV sequences with MAFFT and further divided the sequences into subtypes based on the alignments, as in the case of HERV3 and HML3. Third, we translated the nucleotide sequences in the alignment in the three forward frames and extracted only the translated portions that contained no stop codons. Lastly, the extracted amino acid portions were aligned to the reference sequence of each group, yielding a reconstructed Env sequence for each of the HERV groups listed in Table 1.
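The translation step can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the `min_len` cutoff, and the removal of alignment gaps before translation are assumptions made for illustration only. The sketch translates a nucleotide sequence in the three forward frames with the standard genetic code and keeps the stretches that lie between stop codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt, frame):
    """Translate one forward frame (0, 1 or 2); '*' marks a stop codon.

    Assumption: alignment gaps ('-') are removed before translation.
    Codons containing ambiguous bases are translated as 'X'.
    """
    seq = nt.upper().replace("-", "")[frame:]
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def stop_free_segments(nt, min_len=20):
    """Return (frame, peptide) pairs for stop-free stretches in all three frames.

    min_len is a hypothetical cutoff that discards very short fragments.
    """
    segments = []
    for frame in range(3):
        for piece in translate(nt, frame).split("*"):
            if len(piece) >= min_len:
                segments.append((frame, piece))
    return segments
```

For example, `stop_free_segments("ATGGCCTAAGGG", min_len=2)` keeps the peptide `MA` from the first frame, because the stop codon `TAA` ends the stretch, and keeps the uninterrupted translations of the other two frames. In the workflow described above, such fragments would then be aligned to the reference sequence of each group.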

Dataset of the reconstructed HERV Envelope proteins

Phylogeny of the reconstructed Envs with the envelope proteins of exogenous retroviruses.
The phylogenetic tree was generated for the reconstructed HERV Env proteins together with the 69 exogenous envelope proteins of Class I, Class II, and Class III retroviruses. Class I exogenous and endogenous retroviruses are highlighted in green, Class II in yellow, and Class III in orange. The reconstructed Envs are shown in red. The tree also contains retroviruses (RVs) of other classes with which no ERVs clustered; these are not highlighted.



Widespread Distribution of ERV env genes across primate genomes


Coordinates of HML supergroup in Platyrrhini (Squirrel Monkey Genome)

Phylogenetic analysis of the envelope hits obtained from the tBLASTn analysis for the three classes, i.e., Class I (gamma-like env), Class II (beta-like env), and Class III (spuma-like env).
The trees were generated using IQ-TREE with 1000 bootstrap replicates. (a) Phylogeny of all hits obtained for Class I ERVs, with the best-fit model TVM+F+R4. (b) Phylogeny of all hits obtained for Class II ERVs, with the best-fit model TVM+F+I+R3. (c) Phylogeny of all hits obtained for Class III ERVs, with the best-fit model K3Pu+F+R3. All groups obtained from the tBLASTn hits and clustered together in the phylogenetic analysis are labelled in the trees. Corresponding trees, with tip labels carrying all annotations and accession numbers, are provided as supplementary files for the three classes (Supplementary files 2, 3, and 4).

Detection of recombination events in ERV env genes.
(a) Schematic representation of env recombination events in ERVs. The diagram indicates the location of the LTRs in the recombinant env gene, with the coordinates provided in Supplementary file 5. (b) Phylogenetic tree illustrating the initiation of recombination events in the primate lineage. The tree was generated with the TimeTree server from the Catarrhini genomes available in the Genome Browser. Because the genome used by the server was Gorilla gorilla gorilla, it is marked with an asterisk (*). The presence of recombination events was tested with BLAT searches in the Genome Browser to trace the timing of their initiation, and the events were annotated on the phylogenetic tree according to the genomes in which the recombinants were detected.


Summary of Recombination Events in Gamma-like, Beta-like, and Spuma-like Env Genes Across Primate Genomes
