Dataset of the reconstructed HERV Envelope proteins.

Illustration of the reconstruction process of HERV Env proteins.

First, env genes of both endogenous and exogenous retroviruses (complete env sequences as well as defective env genes) were extracted from the literature, Dfam, NCBI, UniProt, and PDB. Second, we performed group-specific alignments of all the ERV sequences with MAFFT and further divided the sequences into subtypes based on the alignments, as in the case of HERV3 and HML3. Third, we translated the sequences in the alignment in the three forward frames and extracted only the translated portions that did not contain stop codons. Lastly, the extracted amino acid portions were aligned to the reference sequence of each group to generate a reconstructed Env sequence for each of the HERV groups listed in Table 1.
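The translation step can be illustrated with a minimal Python sketch. It is not the pipeline used in this work. It assumes the standard genetic code, strips alignment gaps before translating, and uses hypothetical names (`translate`, `stop_free_segments`) and a hypothetical `min_len` cutoff that are not taken from the source.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate one forward frame (0, 1 or 2) of a nucleotide sequence.

    Assumption: alignment gaps ('-') are removed before translation.
    Codons with ambiguous bases are translated as 'X'.
    """
    seq = seq.upper().replace("-", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_free_segments(seq, min_len=10):
    """Return (frame, peptide) pairs for stop-free stretches in all
    three forward frames. min_len is a hypothetical length cutoff."""
    segments = []
    for frame in range(3):
        for peptide in translate(seq, frame).split("*"):
            if len(peptide) >= min_len:
                segments.append((frame, peptide))
    return segments


if __name__ == "__main__":
    # Toy sequence with a stop codon (TAA) in frame 0.
    for frame, peptide in stop_free_segments("ATGGCCTAAGGGTTT", min_len=2):
        print(frame, peptide)
```

Each stop-free peptide returned this way would then be aligned to the group reference sequence, as described above.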

Phylogeny of the reconstructed Envs with the envelope proteins of exogenous retroviruses.

The phylogenetic tree was generated from the reconstructed HERV Env proteins together with the 69 exogenous envelope proteins from Class I, Class II and Class III retroviruses. Class I exogenous and endogenous retroviruses are highlighted in green, Class II in yellow and Class III in orange. The reconstructed Envs are shown in red. The tree also contains RVs of other classes with which no ERVs clustered; these are not highlighted.

Phylogenetic analysis of the envelope hits obtained from the tBLASTn analysis for the three classes, i.e., Class I (gamma-like env), Class II (beta-like env) and Class III (spuma-like env).

The trees were generated using IQ-TREE with 1000 bootstrap replicates. (a) Phylogeny of all the hits obtained for Class I ERVs, with the best-fit model TVM+F+R4. (b) Phylogeny of all the hits obtained for Class II ERVs, with the best-fit model TVM+F+I+R3. (c) Phylogeny of all the hits obtained for Class III ERVs, with the best-fit model K3Pu+F+R3. All the groups obtained from the tBLASTn hits and clustered together in the phylogenetic analysis are labelled in the trees. Similar trees, with tip labels carrying all the annotations and accession numbers, are provided as supplementary files for the three classes (Supplementary files 2, 3 and 4).

Coordinates of HML supergroup in Platyrrhini (Squirrel Monkey Genome).

Detection of recombination events in the ERV env genes.

(a) Schematic representation of the env recombination events. The schematic indicates the presence of recombination in the ERV env sequences as well as its position within env. The coordinates of these recombination events are provided in Supplementary file 5. (b) Phylogenetic representation of the initiation of the recombination events in the primate lineage. The phylogenetic tree was built using TimeTree in MEGA with only the Catarrhini genomes available in the Genome Browser. The recombination events were tested with BLAT searches in the Genome Browser to detect their presence and to trace back the time at which each recombination was initiated. The Genome Browser contains the genome of Gorilla gorilla, whereas TimeTree used the Gorilla gorilla gorilla genome to build the tree; this species is therefore marked with *.

Detection of the conserved domains in the reference ERV sequences as well as recombinant sequences.