Endoparasitoid lifestyle promotes endogenization and domestication of dsDNA viruses

  1. Benjamin Guinet (corresponding author)
  2. David Lepetit
  3. Sylvain Charlat
  4. Peter N Buhl
  5. David G Notton
  6. Astrid Cruaud
  7. Jean-Yves Rasplus
  8. Julia Stigenberg
  9. Damien M de Vienne
  10. Bastien Boussau
  11. Julien Varaldi (corresponding author)
  1. Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622, France
  2. Zoological Museum, Department of Entomology, University of Copenhagen, Universitetsparken, Denmark
  3. Natural Sciences Department, National Museums Collection Centre, United Kingdom
  4. INRAE, UMR 1062 CBGP, 755 avenue 11 du campus Agropolis CS 30016, 34988, France
  5. Department of Zoology, Swedish Museum of Natural History, Sweden
5 figures and 13 additional files

Figures

Figure 1 with 9 supplements
Endogenous viral elements (EVEs) and their domestication status in Hymenoptera.

Lifestyles are displayed next to species names (blue: free-living, green: endoparasitoid, yellow: ectoparasitoid, gray: unknown). The number of EVEs and domesticated EVEs (dEVEs) found in each …

Figure 1—source data 1

File including all aligned cluster sequences scored from A to X.

https://cdn.elifesciences.org/articles/85993/elife-85993-fig1-data1-v1.zip
Figure 1—figure supplement 1
Example of endogenization events.

The phylogeny of cluster21304 corresponds to the clustering of a set of homologous viral genes and candidate viral insertion genes. Loci of viral origin are shown in red, and in blue are …

Figure 1—figure supplement 2
Simplified summary of the bioinformatics pipeline for the detection and validation of candidates for endogenization and domestication.
Figure 1—figure supplement 3
Canonical examples of endogenization events inferred by our pipeline.

The column ‘Sp names’ contains the species name, followed by the name of the scaffold in which the endogenous viral element (EVE) has been identified. The ‘Viral family’ column refers to the …

Figure 1—figure supplement 4
IVSPER genes identified in the Campopleginae genome.

The figure compares the synteny of the IVSPER between Hyposoter didymator ichnovirus (HdIV) and the Campopleginae of our dataset. Homologous genes with synteny between the two species are indicated …

Figure 1—figure supplement 5
Cladogram of the Ophioniformes group, illustrating the two independent endogenization events of two unknown viruses in Banchinae and Campopleginae lineages.

The phylogeny includes 12 subfamilies of the Ophioniformes group within the superfamily Ichneumonoidea. Several species of these subfamilies have been examined for the presence of ichnovirus-like …

Figure 1—figure supplement 6
Ultraconserved element (UCE) trees built to assign species names to the unknown Chalcidoidea sequenced with the pool of P. orseoliae.

(A) Phylogeny of early diverging families of Chalcidoidea (423 UCEs and 127,979 bp were analyzed to get the tree, best-fit model = GTR + F + R10). (B) Phylogeny of the family Eulophidae to which the …

Figure 1—figure supplement 7
Source of the datasets and availability of the reads.

Phylogeny of 124 Hymenoptera species. Two Coleoptera species were used to root the tree. The aLRT bootstrap scores are represented along the nodes. The sources refer to the platform or laboratory in …

Figure 1—figure supplement 8
Representation of the score distribution among different virus genome types.

(A) Distribution of the number of endogenization Events and (B) of the number of endogenous viral elements (EVEs). The percentage of Events or EVEs is shown next to the bars.

Figure 1—figure supplement 9
Heatmap representing the viral genes known to be domesticated by Hymenoptera.

Panel (A) refers to the four known cases (Venturia canescens, Fopius arisanus, Cotesia congregata, and Microplitis demolitor) involving nudivirus donors, while panel (B) refers to the known …

Figure 2
Endogenization involves all types of viral genomic structures.

In all three panels, events inferred as corresponding to domestications are displayed in orange, while events not inferred as domestications are displayed in yellow. (A) Distribution of the number …

Figure 3
Endogenization and domestication of double-stranded DNA (dsDNA) viruses are most prevalent in endoparasitoid species.

(A) Distribution of viral endogenization events (Event) and (B) of domestication events (dEVEs) across Hymenoptera lifestyles. Crosses indicate the expected proportion of events associated with the …

Figure 4 with 3 supplements
Endogenization and domestication of double-stranded DNA (dsDNA) viruses are more frequent in endoparasitoid species.

Violin plots represent the posterior distribution of the coefficients obtained under the different GLM models (after exponential transformation to obtain a rate relative to free-living species). The …
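
The exponential transformation mentioned in this caption can be illustrated with a minimal sketch. It assumes a log link, so that exponentiating a lifestyle coefficient gives a rate relative to the free-living baseline. The function name and the posterior draws below are hypothetical and are not taken from the study.

```python
import math
import statistics


def rate_ratios(log_coefficients):
    """Exponentiate log-scale coefficient draws into rates relative to the baseline."""
    return [math.exp(beta) for beta in log_coefficients]


# Hypothetical posterior draws for one lifestyle coefficient (log scale).
# A draw of 0.0 means "same rate as the free-living baseline".
posterior_draws = [0.0, math.log(2.0), math.log(4.0)]

ratios = rate_ratios(posterior_draws)
median_ratio = statistics.median(ratios)  # summary of the relative rate
```

Under this assumption, a relative rate above 1 indicates more events than in free-living species, and a value below 1 indicates fewer.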

Figure 4—figure supplement 1
Violin plots of the posterior distribution of GLM coefficients in relation to Hymenoptera lifestyle.

The ectoparasitoid lifestyle is in yellow, the endoparasitoid lifestyle is in green, and the free-living lifestyle is in blue. A zero-inflated negative binomial GLM was used, with free-living …

Figure 4—figure supplement 2
Violin plots of the posterior distribution of dEvents GLM coefficients in relation to wasp lifestyle (corrected for Events rates).

The ectoparasitoid lifestyle is in yellow, the endoparasitoid lifestyle is in green, and the free-living lifestyle is in blue (the intercept). Coefficients have been exponentiated and …

Figure 4—figure supplement 3
Violin plots of the posterior distribution of GLM coefficients in relation to Hymenoptera lifestyle.

The ectoparasitoid lifestyle is in yellow, the endoparasitoid lifestyle is in green, the free-living lifestyle is in blue, and the eusocial lifestyle is in purple. A zero-inflated negative binomial …

Figure 5 with 4 supplements
Phylogenetic relationships among endogenized and ‘free-living’ double-stranded DNA (dsDNA) viruses.

Specifically, this figure shows the relationships between Naldaviricetes double-stranded DNA viruses and endogenous viral elements (EVEs) from hymenopteran species, where at least three …

Figure 5—source data 1

File including all aligned cluster sequences included in the Naldaviricetes phylogenetic analysis.

https://cdn.elifesciences.org/articles/85993/elife-85993-fig5-data1-v1.zip
Figure 5—figure supplement 1
GC content (%) and coverage distribution of scaffolds containing multiple endogenous viral element (EVE) Events.

The size of the dots corresponds to the number of candidate EVEs inside the scaffold. The color represents the genomic entity from which the EVE probably originated (brown: Nudiviridae, blue: LbFV, …

Figure 5—figure supplement 2
Genomic environment for the endogenous viral elements (EVEs) detected in Platygaster orseoliae.

The plot shows regions homologous to viral ORFs in the Platygaster orseoliae filamentous virus (PoFV) genome (A). The colored regions correspond to the predicted ORFs in the PoFV genome (gray ORFs in …

Figure 5—figure supplement 3
Phylogenies of LbFV-like proteins under purifying selection in Platygaster orseoliae genome.

Panel A represents Cluster_25710, which corresponds to the integrase protein. Panel B represents Cluster_26675, which corresponds to the ac81 protein. Taxa in red correspond to …

Figure 5—figure supplement 4
Candidate endogenous viral elements (EVEs) in two ant species.

Eukaryotic genes predicted by Augustus are displayed in gray, with dark shading for exons and light shading for introns. Displayed in black are the transposable elements predicted by sequence …

Additional files

Supplementary file 1

Summary statistics table for candidate EVEs.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp1-v1.xlsx
Supplementary file 2

General information regarding the species used in this study.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp2-v1.xlsx
Supplementary file 3

Summary statistics table for control endogenous viral elements (EVEs).

https://cdn.elifesciences.org/articles/85993/elife-85993-supp3-v1.xlsx
Supplementary file 4

Information for individual loci.

Endogenized loci (scored from A to D) are displayed in the first sheet, whereas exogenous loci (scored from E to X) are displayed in the second sheet.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp4-v1.xlsx
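
The two-sheet split described for this file can be sketched as follows. This is only an illustration of the A to D versus E to X scoring rule stated above: the function names and the example loci are hypothetical and do not reflect the actual column layout of the spreadsheet.

```python
# Scores A-D mark endogenized loci; scores E-X mark exogenous loci.
ENDOGENIZED_SCORES = set("ABCD")


def is_endogenized(score):
    """Return True for scores A-D and False for E-X; reject anything else."""
    letter = score.strip().upper()
    if len(letter) != 1 or not ("A" <= letter <= "X"):
        raise ValueError(f"score must be a single letter from A to X, got {score!r}")
    return letter in ENDOGENIZED_SCORES


def split_loci(loci):
    """Split (name, score) pairs into endogenized and exogenous locus names."""
    endogenized, exogenous = [], []
    for name, score in loci:
        (endogenized if is_endogenized(score) else exogenous).append(name)
    return endogenized, exogenous


# Hypothetical loci, for illustration only.
example_loci = [("locus_1", "A"), ("locus_2", "E"), ("locus_3", "D"), ("locus_4", "X")]
endogenized, exogenous = split_loci(example_loci)
```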
Supplementary file 5

Table listing the names of virus species known or presumed to interact with insects.

The data are taken from the virushostdb database (Mihara et al., 2016) (version of 24/03/2023), which lists a wide variety of virus species together with their presumed hosts. We also added two important exploratory studies of RNA viruses (Shi et al., 2016; Wu et al., 2020). The genomic structure of each virus was retrieved from the ICTV website (V2022_MSL38). Each column contains the information retrieved for each virus species from one of the three sources listed. The Hostdb_Host_lineage column gives the insect host observed interacting with the virus of interest. If a column with the suffix ‘Shi’ or ‘Haoming’ contains information for a virus species, that RNA virus species was found in an insect in the corresponding dataset.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp5-v1.xlsx
Supplementary file 6

Datasets and detailed statistics presented in the manuscript.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp6-v1.xlsx
Supplementary file 7

Additional information from the double-stranded DNA (dsDNA) Naldaviricetes phylogenetic analysis in Figure 5, including the best partition models chosen for each partition and the number of genes used for each species of the tree.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp7-v1.xlsx
Supplementary file 8

Biosample information regarding the 34 Hymenoptera species sequenced for this study.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp8-v1.xlsx
Supplementary file 9

Table representing the overlap between transposable elements and the clusters of homologous endogenous viral elements (EVEs).

The transposable elements were inferred using the RepeatModeler RepeatPeps database.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp9-v1.xlsx
Supplementary file 10

Information on the RNAseq datasets used in this study.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp10-v1.xlsx
Supplementary file 11

Details of the tblastn analysis for Platygaster orseoliae endogenous viral elements (PoEFV) and complementary information.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp11-v1.xlsx
Supplementary file 12

File including all the cluster phylogenies.

Leaves highlighted in green represent endogenous viral elements (EVEs) (scored from A to D), while leaves highlighted in red represent free-living viruses or loci annotated as putative free-living sequences (scored from E to X). The letter at the end of the taxon label represents the endogenization score assigned to the candidate. The assigned viral family of the free-living genes appears next to the pipe. The number next to ‘Event’ refers to the assigned Event number. If EVEs were found to be under selection (either by RNAseq or dN/dS analysis), the leaf name ends with ‘Selective_pressure_YES’; otherwise, it ends with ‘Selective_pressure_NO’. Ultrafast bootstrap values are shown next to the nodes of the phylogenies. For each phylogeny, the putative consensus protein name and the putative viral family are given at the top of the figure.

https://cdn.elifesciences.org/articles/85993/elife-85993-supp12-v1.pdf
MDAR checklist
https://cdn.elifesciences.org/articles/85993/elife-85993-mdarchecklist1-v1.pdf
