Peer review process
Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.
Editors
- Reviewing Editor: George Perry, Pennsylvania State University, University Park, United States of America
- Senior Editor: George Perry, Pennsylvania State University, University Park, United States of America
Reviewer #3 (Public review):
Summary:
Retroviruses have been endogenized into the genomes of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations, and hence can be used to monitor viral evolution. Since they are incorporated into the host genome, they also reflect the evolution of the hosts. In this manuscript the authors have focused their analyses on the env genes of endogenous retroviruses in primates. Important observations include the extensive, previously unknown recombination events between these retroviruses and the discovery of HML species in genomes predating the split of Old and New World monkeys.
Strengths:
They explored a number of databases and built phylogenetic trees to examine the distribution of retroviral species in primates. The authors provide a strong rationale for their study design and a clear description of the techniques and bioinformatics tools used.
Weaknesses:
The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species, so the analyses likely underestimate the amount of diversity in the retroviruses. Further experimental verification will be needed to confirm the observations.
Not sure which databases were used, but if not already analyzed, ERVmap.com and RepeatMasker are resources that contain many ERVs not present in the reference genomes. Also, long-range sequencing of the human genome has recently become available, which may also be worth studying for this purpose.
Comments on revisions:
All comments have been adequately addressed.
Author response:
The following is the authors’ response to the original reviews
Public Reviews:
Reviewer #1 (Public review):
Summary
Chabukswar et al. analysed endogenous retrovirus (ERV) Env variation in a set of primate genomes, using consensus Env sequences from ERVs known to be present in hominoids as queries in a BLAST homology search, with the aim of characterising env gene changes over time. The retrieved sequences were analysed phylogenetically and showed that some of the integrations are LTR-env recombinants.
Strengths
The strength of the manuscript is that such an analysis has not been performed yet for the subset of ERV Env genes selected and most of the publicly available primate genomes.
Weaknesses
Unfortunately, the weaknesses of the manuscript outnumber its strengths. In particular, the Methods section does not contain sufficient information to appreciate or interpret the results. The Results section contains methodological information that should be moved, while the presentation of the data is often substandard. For instance, the long lists of genomes in which a certain Env was found would be better shown in tables. Furthermore, there is no overview of the primate genomes, or accession numbers, used. It is unclear whether the analyses, such as the phylogenetic trees, are based on nucleotide or amino acid sequences, since this is not stated. tBLASTn was used in the homology searches, so one would suppose that aa sequences were retrieved. In the Discussion, both env (nt?) and Env (aa?) are used.
For the non-hominoids, the assembly of publicly available genomes is not always optimal, and this may require BLASTing a second genome from the same species. This should, for instance, be done for the HML2 sequences found in the Saimiri boliviensis genome but not in the related Callithrix jacchus genome. Finally, the authors propose to analyse recombination in Env sequences but only retrieve env-LTR recombinant Envs, which should likely not have passed the quality check.
Since the Methods section does not contain sufficient information to understand or reproduce the results, while the Results are described in a messy way, it is unclear whether or not the aims have been achieved. I believe not, as characterisation of env gene changes over time is only shown for a few aberrant integrations containing part of the LTR in the env ORF.
We thank the reviewer for the critiques of the manuscript and their constructive suggestions to improve the clarity, methodological rigor, and data presentation.
(1) The concern regarding insufficient information in the Methods has been addressed in the revised manuscript by adding a supplementary file listing the genome assemblies that were searched by tBLASTn with the reconstructed Env sequences. The requested accession numbers are available for all sequences in the supplementary phylogenetic figures.
(2) We have also moved a portion of the Results section to the Methods section, in particular the full methodological description of the Env reconstruction (Lines 197-231).
(3) As suggested, the long lists of genomes in the Results section in which Env tBLASTn hits were obtained are now provided in table form (Table 2) as an overall summary of the distribution of ERV Env in the genomes, and the genome assemblies are listed in Supplementary file 2.
(4) As for the point regarding the use of tBLASTn in the homology searches, we used the reconstructed Env amino acid sequences as queries for a tBLASTn similarity search in the primate genomes. The tBLASTn algorithm compares the amino acid query against the nucleotide database translated in all six frames, and hence the hits obtained are nucleotide sequences (Lines 381-383). These nt sequences were used for all further analyses, such as sequence alignment, phylogenetic analysis and recombination analysis. For better clarity, we now specify the use of env nt alignments in the Methods section, to avoid the confusion raised regarding the Discussion.
(5) For the HML supergroup characterization in the squirrel monkey genome (Saimiri boliviensis), we used the tBLASTn hits obtained in S. boliviensis in the initial analysis to perform comparative genomics in two Platyrrhini genomes available on the UCSC Genome Browser. In particular, this analysis was performed to confirm the presence in squirrel monkey genomes of specific members of the HML supergroup that had not been previously reported. We used these genome assemblies because of the annotations available on the Genome Browser, especially the RepeatMasker tracks and the comparative genomics tools, which allow the human genome to be used as a reference. We reported the coordinates of the HML supergroup members retrieved from the comparative genomic assemblies by applying the RepeatMasker custom track, which includes many ERVs that are not present in NCBI reference genomes.
(6) The concern regarding only retrieving env-LTR recombinant Envs has been addressed in the revised results section (Lines 747-758). As also mentioned in the methods section, the RDP software detects the recombinant sequences and a breakpoint position for the recombinant signals and hence we confirmed only those sequences that were predicted as potential recombinant sequences by the RDP software through comparative genomics. All the sequences predicted by the software were env-LTR recombinant and hence we confirmed and reported only those recombinant sequences in the manuscript.
Reviewer #1 (Recommendations for the authors):
The paper could be strengthened by:
- a rigorous rewriting and shortening of the manuscript, thereby eliminating all textbook-like paragraphs, and all biological misinterpretations and confusions. Distinguish between retroviral replication as an exogenous virus, and host genome remodeling affecting ERVs. Rewrite the sections on template switching by RT being the basis for the observed recombinations, while host genome recombinations are far more likely. ERVs with such aberrant env/LTR gene recombination are unlikely to be fit for cross-species transmission. Likely, such a recombinant was generated in a common ancestor. Also, host RNA polymerase II transcribes retroviral RNA (line 79), not RT.
- check lines 89-90 as pro is part of the pol gene in gamma- and lentiviruses.
We thank the reviewer for the suggestion. We have revised the manuscript by shortening the Introduction, eliminating the textbook-like paragraphs, and clarifying the recombination mechanism. The revised Introduction is at Lines 102-111, and the clarification of the recombination mechanism is provided at Lines 1668-1675.
- adding much more information to the Methods section. Such as which genomes were searched, were nt or aa have been retrieved and analysed, were multiple genomes of a species searched, a list of databases used ('various databases' in line 164 does not suffice), etc.
We thank the reviewer for the observation. As mentioned above, in the revised manuscript we have provided more detailed methods, including a supplementary file listing the genome assemblies used for the tBLASTn analysis and comparative genomics. For sequence alignment, phylogenetic analysis and recombination analysis we used nt sequences, as is now stated in the revised version. Lastly, all the databases used are now listed in the Methods section.
- more information is needed on the alignments and phylogenetic trees. For instance, how were indels treated? How long were the alignments on average regarding informative sites?
We thank the reviewer for the questions. To answer them, we have added a paragraph (Lines 359-362) describing the reconstruction process in more detail.
- confirm the findings about the presence or absence of an ERV, such as for the squirrel monkey genome, using additional genomes of the species
As mentioned above, we only used the genome assemblies available on the Genome Browser because of the annotations provided there. BLASTing a second NCBI RefSeq genome does not provide information and annotations as accurate as those of the Genome Browser. Hence, we reported the coordinates of the HML supergroup members retrieved from the comparative genomic assemblies by applying the RepeatMasker custom track, which includes many ERVs that are not present in NCBI reference genomes.
- present the lists of findings in primate genomes on pages 9 and 10 in tables
We thank the reviewer for the suggestion. We have provided a new table (Table 2) in the revised version summarizing the ERV Env distribution results.
- a significant limitation of the study is that only env ERVs found in hominoids have been searched in OWM and NWM, not ones specific for monkeys. This should be mentioned somewhere.
As the reviewer pointed out, the study was designed to explore ERVs’ Env sequences in hominoids which were then searched in the OWM and NWM genomes, this is now better stated in the introduction at Lines 57-60.
- define abbreviations at first use (e.g. HML in abstract)
We thank the reviewer for the suggestion. We now define the abbreviations at first use in the abstract, including HML (Line 65).
- explain 'pathological domestication' (line 42). Domestication implies usefulness to the host. And over time, deleterious insertions would have been likely purged from a population.
We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERV env (Lines 52-57).
Furthermore:
- why begin the discussion with a lengthy description of domestication and syncytins, which is not part of the current study?
We thank the reviewer for the critique. Accordingly, we have now modified the discussion section by shortening the part about domestication of syncytins, and just mentioned them as an example at lines 942-944.
- how can 96 hits have been retrieved for spuma-like envs (line 506), while it was earlier reported (line 333), that the most hits were gamma-like?
We thank the reviewer for the observation. We have explained how 96 hits were retrieved for spuma-like envs in Lines 670-677 of the Discussion section.
English grammar should be improved throughout the manuscript.
Also, I could not open half of the supplementary files.
As suggested, we have revised the English and checked that all files can be opened correctly.
Reviewer #2 (Public Review):
Summary:
The manuscript by Chabukswar et al. describes a comprehensive attempt to identify and describe the diversity of retroviral envelope (env) gene sequences present in primate genomes in the form of ancient endogenous retrovirus (ERV) sequences.
Strengths:
The focus on env can be justified because of the role the Env proteins likely played in determining viral tropism and host range of the viruses that gave rise to the ERV insertions, and to a lesser extent, because of the potential for env ORFs to be coopted for cellular functions (in the rare cases where the ORF is still intact and capable of encoding a functional Env protein). In particular, these analyses can reveal the potential roles of recombination in giving rise to novel combinations of env sequences. The authors began by compiling env sequences from the human genome (from human endogenous retrovirus loci, or "HERVs") to build consensus Env protein sequences, and then they use these as queries to screen other primate genomes for group-specific envs by tBLASTn. The "groups" referred to here are previously described, as unofficial classifications of endogenous retrovirus sequences into three very broad categories - Class I, Class II and Class III. These are not yet formally recognized in retroviral taxonomy, but they each comprise representatives of multiple genera, and so would fall somewhere between the Family and Genus levels. The retrieved sequences are subject to various analyses, most notably they are screened for evidence of recombination. The recombinant forms appear to include cases that were probably viral dead-ends (i.e. inactivating the env gene) even if they were propagated in the germline.
The availability of the consensus sequences (supplement) is also potentially useful to others working in this area.
Weaknesses:
The weaknesses are largely in presentation. Discussions of ERVs are always complicated by the lack of a formal and consistent nomenclature and the confusion between ERVs as loci and ERVs as indirect information about the viruses that produced them. For this reason, additional attention needs to be paid to precise wording in the text and/or the use of illustrative figures.
We thank the reviewer for the general observation. We have paid additional attention to the wording in the text and figures, and hope to have improved the clarity of the manuscript.
Reviewer #2 (Recommendations for the authors):
Reviewing the manuscript was a challenge because figures were difficult to read. As provided, the fonts were sometimes too small to read in a standard layout and had to be expanded on screen.
The tree in Figure 3 could also be made easier to read, for example if the authors collapsed related branches and gave the clusters a single, clear label (this is not necessary, just a suggestion) - especially if the supplementary trees have all the labelled branches for any readers who want specific details.
I also recommend asking a third party (perhaps a scientific colleague) with fluency in English grammar and familiarity with English scientific idiom to provide some editorial feedback on the text.
Figure 4 legend is confusing. From the description it sounds like the tree in 4B is a host phylogeny, but it's not clearly stated. And if so, how was the tree generated? Is it based on entire genomes? Include at least enough methodological detail or citations that someone could recreate it, if necessary. The details and how it was done should be briefly mentioned here and in detail in the Methods section.
We thank the reviewer for the observation. As for Figure 4 we have modified its legend and more clearly stated how the phylogenetic tree of the primate genomes was generated using TimeTree. We have also provided further details in the methods section (Lines 475-489).
As suggested, we have revised the English.
Line 42 - what is "pathological domestication"? It sounds like a contradiction in terms.
We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERV env (Lines 52-57).
Lines 166-167 - the authors use the word "classes" but then use a list of terms that correspond to genera within the Retroviridae. The authors should be cautious here, as "class" and "genus" are both official taxonomic terms with different meanings. Do they mean genus? Or, if a more informal term is needed, perhaps "group"?
Thank you for the observation. ERVs have been classified into three classes (Class I, II and III) based on their relatedness to the exogenous retrovirus genera Gammaretrovirus, Betaretrovirus and Spumaretrovirus, respectively, and are therefore referred to in the manuscript according to the nomenclature proposed by Gifford et al., 2018, which is cited at Lines 122-125.
Line 221- "defferent" should be "different"
Corrected
Lines 233-234 - what is meant by "canonical" and "non-canonical" forms? Can the authors please define these two terms?
Thank you for the question. "Canonical" refers to sequences that are well preserved and match the structural and functional features of complete env genes, whereas "non-canonical" refers to sequences with significant structural alterations or truncations that deviate from this typical form. This explanation has been added to the revised version at Lines 475-479.
Line 252 - if/is
Corrected
Lines 274-276 needs a citation to the paper(s) that reported this.
Corrected
Line 283-285 - this was confusing. How could the authors have noted distinct occurrences and clusters of these if they were excluded from the BLAST analysis? It says the consensus sequences were effectively representing these, but doesn't this raise the possibility that the consensus sequences are not specific enough? Could this also then lead to false identification? Perhaps a few more words to explain should be added.
We thank the reviewer for the observation. While performing the tBlastn search we did obtain the hits for HERV15, HERVR, ERVV1, ERVV2 and PABL, and we have mentioned the detailed explanation about this observation in the revised manuscript at lines 619-627.
Line 298 - missing comma
Corrected
Lines 348-351- this list is not a list of recombination mechanisms. Template switching is a mechanism of recombination, but "acquisition" is simply a generic term, "degradation" is not a mechanism, and "cross-species transmission" might be a driver or a result of recombination, but it is not a mechanism of recombination.
We thank the reviewer for the observation. We have revised the explanation of the recombination events in the Discussion section, as some parts of the Results have been moved to the Discussion (Lines 1058-1065).
Lines 369-372. It's not clear why this means the event was a "very recent occurrence". Do the authors mean that there were shared integration sites between some of the species, and that these sites lacked the insertions in other species (e.g. gibbon, orangutan, monkeys)?
For the long section on recombination events involving an env sequence with an LTR in it, can the authors explain how they know when it's a recombination event versus integration of one provirus into another one, followed by recombination between LTRs to generate a solo-LTR?
We thank the reviewer for the observation. Regarding the very recent occurrence of the recombination event, we have explained it in the revised manuscript at Lines 769-824, writing “In fact, the recombinant sequences were shared only between 4 species of Catarrhini parvorder and were absent in more distantly related primates (such as gibbons, orangutans, etc.). This with the presence of shared recombination sites suggests that the insertion occurred after the divergence of these species, while its absence in others indicate that it is a recombination event.”
Regarding the env-LTR recombination events, the recombinants were first detected by the RDP software and then validated through BLAT searches in the genomes available on the Genome Browser. How we obtained these env-LTR recombination events is now explained in Lines 746-763 of the revised manuscript.
Methods Lines 151-168 and Figure 1 legend Lines 689-690 - how did the authors distinguish between "translated regions" corresponding to the actual Env protein sequence from translation of the other two reading frames? That is, there must have been substantial "translatable" stretches of sequence in the two incorrect reading frames as well as the reading frame corresponding to Env, so the question is how were the correct ones identified for the reconstruction?
We thank the reviewer for the observation. We have provided the detailed explanation to the observation in the methods section (Lines 335-359).
Line 495 - "previously reported" should include citation(s) of the prior report(s).
We thank the reviewer for the observation. We have provided the appropriate citations.
Line 525 - the authors propose that the mechanism "is the co-packaging of different ERVs in a virus particle". First, I assume they meant to say that RNA from different ERVs is co-packaged. Second, isn't it also possible or likely that these could arise from co-packaging of exogenous retrovirus RNAs and recombination, especially if the related exogenous forms were still circulating at the time these things arose?
We thank the reviewer for the observation. In the revised manuscript, we have modified the proposed mechanism to also include the possibility of co-packaging of exogenous retrovirus RNAs followed by recombination (Lines 1082-1099).
Line 686 - env should either be italicized (gene) or capitalized (protein), depending on what the authors intended here.
We thank the reviewer for the observation. We have corrected the typographical error in the new version of the manuscript.
Reviewer #3 (Public review):
Summary:
Retroviruses have been endogenized into the genomes of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations, and hence can be used to monitor viral evolution. Since they are incorporated into the host genome, they also reflect the evolution of the hosts. In this manuscript the authors have focused their analyses on the env genes of endogenous retroviruses in primates. Important observations include the extensive, previously unknown recombination events between these retroviruses and the discovery of HML species in genomes predating the split of Old and New World monkeys.
Strengths:
They explored a number of databases and built phylogenetic trees to examine the distribution of retroviral species in primates. The authors provide a strong rationale for their study design and a clear description of the techniques and bioinformatics tools used.
Weaknesses:
The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species, so the analyses likely underestimate the amount of diversity in the retroviruses. Further experimental verification will be needed to confirm the observations.
Not sure which databases were used, but if not already analyzed, ERVmap.com and RepeatMasker are resources that contain many ERVs not present in the reference genomes. Also, long-range sequencing of the human genome has recently become available, which may also be worth studying for this purpose.
We thank the reviewer for the observations and comments. We would like to clarify that the intent of the work was to perform a bioinformatic analysis, so wet-lab experimental verification of the observations is beyond the scope of the present manuscript. For the aims of the manuscript we used the NCBI reference genomes, while for reporting the coordinates of the HML supergroup in the squirrel monkey genome and the coordinates of the recombination events found through BLAT searches, we used the genome assemblies available on the Genome Browser with the RepeatMasker custom track, since it provides well-represented ERV annotations.
The suggestion to use long-range sequencing of the human genome is an interesting perspective, and in future work we will try to incorporate it into our analysis as well as perform experimental verification; again, the present work does not include a wet-lab component.
Reviewer #3 (Recommendations for the authors):
In a few places the term HERV has been used when describing ERVs in non-human primates. This needs to be corrected.
We thank the reviewer for the observation. We have checked and accordingly modified the terms in the manuscript wherever necessary.