
Retrocopying expands the functional repertoire of APOBEC3 antiviral proteins in primates

  1. Lei Yang
  2. Michael Emerman
  3. Harmit S Malik (corresponding author)
  4. Richard N McLaughlin Jnr (corresponding author)
  1. Pacific Northwest Research Institute, United States
  2. Division of Human Biology, Fred Hutchinson Cancer Research Center, United States
  3. Division of Basic Sciences, Fred Hutchinson Cancer Research Center, United States
  4. Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, United States
Research Article
Cite this article as: eLife 2020;9:e58436 doi: 10.7554/eLife.58436


Host-virus arms races are inherently asymmetric; viruses evolve much more rapidly than host genomes. Thus, there is high interest in discovering mechanisms by which host genomes keep pace with rapidly evolving viruses. One family of restriction factors, the APOBEC3 (A3) cytidine deaminases, has undergone positive selection and expansion via segmental gene duplication and recombination. Here, we show that new copies of A3 genes have also been created in primates by reverse transcriptase-encoding elements like LINE-1 or endogenous retroviruses via a process termed retrocopying. First, we discovered that all simian primate genomes retain the remnants of an ancient A3 retrocopy: A3I. Furthermore, we found that some New World monkeys encode up to ten additional APOBEC3G (A3G) retrocopies. Some of these A3G retrocopies are transcribed in a variety of tissues and able to restrict retroviruses. Our findings suggest that host genomes co-opt retroelement activity in the germline to create new host restriction factors as another means to keep pace with the rapid evolution of viruses.


Host genomes have an ancient history of coevolution with selfish genetic elements. One type of these selfish elements, called endogenous retroelements, created a substantial fraction of most animal genomes (Canapa et al., 2016; de Koning et al., 2011; Lander et al., 2001; Smit et al., 2015; Sotero-Caio et al., 2017). Endogenous retroelements such as endogenous retroviruses (ERVs) and Long Interspersed Element-1s (LINE-1s) reside in host genomes where they ‘copy-and-paste’ themselves via the action of their reverse transcriptase. These retroelements can negatively impact host fitness by disrupting genes or regulatory regions, and by increasing the likelihood of ectopic recombination (Boissinot et al., 2001; Hancks and Kazazian, 2012; Kaer and Speek, 2013; Petrov et al., 2003; Song and Boissinot, 2007).

In addition to acting on their own RNA to ensure duplication, the reverse transcription/integration functions encoded by LINE-1s and ERVs also occasionally act on host mRNAs. This ‘off-target’ activity, termed retrocopying, entails the duplication of a host gene via the reverse transcription and integration of an mRNA. These ‘retrocopies’ are intronless and removed from the chromosomal location of the parental intron-containing gene. Previous studies estimated that 3,700–18,000 retrocopies are present in the human genome (Casola and Betrán, 2017; Navarro and Galante, 2015; Potrzebowski et al., 2008).

Two features distinguish retrocopies from other types of gene duplications. First, ‘DNA-based’ mechanisms of duplication (e.g., segmental gene duplications) result in a new copy of the gene including its promoter and distal regulatory elements. In contrast, retrocopying typically duplicates only the exons, leading to the moniker ‘processed pseudogene’. Thus, transcription of a new retrocopy depends on the genomic neighborhood into which it integrates (Carelli et al., 2016). Second, retrocopying relies on the machinery encoded by endogenous retroelements like LINE-1, which are highly active in germline and early embryo (Friedli et al., 2014; Garcia-Perez et al., 2007; Klawitter et al., 2016; Muotri, 2016; Wissing et al., 2012). Therefore, unlike DNA-based duplications, retrocopying is almost exclusively limited to RNAs expressed in germline or early embryonic tissues. It follows that the level of germline expression of host mRNAs should be highly correlated to their probability of generating retrocopies. For example, ribosomal proteins that are highly expressed in germline tissues represent the most abundant class of processed pseudogenes in the human genome (Balasubramanian et al., 2009). While germline expression of host mRNAs predicates the generation of retrocopies, the vast majority of these retrocopies show characteristic signatures of pseudogenization (Casola and Betrán, 2017; Navarro and Galante, 2015; Potrzebowski et al., 2008).

While most retrocopies do not increase the genic capacity of the host due to inactivating mutations, a subset of retrocopies escaped mutational abrasion, presumably because they provide a selective advantage to the host. Indeed, evidence of functional retention in retrocopied sequences has been found in diverse organisms and includes functions such as novel subcellular localization of proteins (Rosso et al., 2008), neurotransmitter metabolism (Burki and Kaessmann, 2004), courtship (Wang et al., 2002), fertility (Kalamegham et al., 2007), and pathogen restriction (Malfavon-Borja et al., 2013; Sayah et al., 2004). Such functional retention may be particularly beneficial in the case of host defense genes, whose functional diversification is necessary for host genomes to keep pace with pathogens. For example, retrocopying of the CypA gene between coding exons of the TRIM5 gene has created novel TRIMCyp fusion genes that can potently restrict retroviruses including HIV-1 (Malfavon-Borja et al., 2013; Newman et al., 2008; Nisole et al., 2004; Sayah et al., 2004; Virgen et al., 2008; Wilson et al., 2008). In a remarkable case of convergent evolution, retrocopying has created TRIMCyp fusion genes multiple times during primate evolution, further expanding and diversifying the TRIM gene family for retroviral defense (Brennan et al., 2008; Virgen et al., 2008). In other examples, mobile element and viral genes themselves have been retrocopied and domesticated for various functions including antiviral defense (Best et al., 1996; Fujino et al., 2014; Malik and Henikoff, 2005; McLaughlin et al., 2014; Ito et al., 2013; Yan et al., 2009).

Here, we investigated whether retrocopying may have similarly diversified another family of host defense genes: the APOBEC3 (A3) cytidine deaminases. Although the common ancestor of placental mammals likely encoded three A3 genes (Münk et al., 2012), this locus has recurrently expanded and contracted throughout mammalian evolution (Ito et al., 2020), including a dramatic expansion to seven paralogous genes in catarrhine primates (Old World monkeys and hominoids) followed by recurrent positive selection of this expanded gene set (Bulliard et al., 2009; Compton et al., 2012; Duggal et al., 2011; Henry et al., 2012; McLaughlin et al., 2016; OhAinle et al., 2006). We found that ancient and recent retrocopying has further diversified the already expansive A3 gene repertoire in primates, adding as many as ten new A3s outside the well-studied ‘A3 locus’. Our work uncovered an ancient A3 born via retrocopying in the common ancestor of simian primates and a dramatic, ongoing history of A3 retrocopying in New World monkeys (NWMs). Many of these NWM-specific A3 retrocopies are expressed and some retain the capability to restrict retroviruses. Thus, retrocopied A3s have continually expanded host defense repertoires in primate genomes.


A3I: an ancient A3 retrocopy in simian primates

The A3 locus of human, other hominoids, and Old World monkeys comprises seven clustered genes, A3A-A3H (Jarmuz et al., 2002; OhAinle et al., 2006; Silvas and Schiffer, 2019; Figure 1A). We undertook an analysis to search for any variation in this gene structure in other primate genomes. We used BLASTn with each of the seven human A3 nucleotide sequences to query all sequenced primate genomes on NCBI. As expected, each genome contained a series of proximal hits, presumably comprising the A3 locus. However, in every simian primate genome examined, we also found exactly one shared syntenic region, distinct from the A3 locus, with high similarity to A3G. This sequence match spanned the exonic sequences of A3G in a single contiguous region of around 1,100 bp (Figure 1B). Based on the absence of introns, we concluded that this sequence represents a retrocopy of an A3 gene. We found no evidence of the syntenic copy of this A3 retrocopy in the genomes of prosimians including the tarsier, bushbaby, and mouse lemur. We, therefore, conclude that this retrocopy was born in the common ancestor of simian primates, and hence propose the name A3I for this retrocopy, extending the nomenclature scheme for other human A3 genes.

Figure 1 (with 2 supplements)
Identification and phylogenetic distribution of A3I.

(A) A3I is located away from the A3 locus at a distant but highly conserved syntenic locus in all simian primates. The human genome is shown as an example. (B) ORF structure of A3I in various primate species. Purple boxes represent sequences that can be aligned to the intron-containing A3 copies, whereas yellow boxes represent the longest ORF of the A3I in corresponding species. Stars (*) indicate the position of stop codons. (C) Maximum likelihood phylogeny of A3Is and the intron-containing A3Bs and A3Gs. Clusters of A3Is, A3Gs, and A3Bs are highlighted by their respective color, and bootstrap values leading to these clusters are shown on the nodes. (D) Expansion of A3 retrocopies along the primate phylogeny. The number of retrocopies of each A3 is shown in color boxes at the inferred point of retrocopy birth in the primate phylogeny. The white ‘A3’ box represents a sequence that could not be assigned to a particular ortholog.

To understand the relationship between A3I and other A3 genes, we created a maximum likelihood tree using the nucleotide sequences from simian primate A3B, A3G, and A3I genes. A3I could be aligned to primate A3Gs and A3Bs, since A3I shares the deaminase domain organization of these A3s (A3Z2-A3Z1; Figure 1A). With high bootstrap support, we found that all A3Is share a common ancestor to the exclusion of A3B and A3G genes. This pattern held with a tree of individual deaminase domains from all human A3s (Figure 1—figure supplement 1). Our analyses found the A3G genes to be the closest phylogenetic neighbors of the A3Is, suggesting a common ancestry of these two genes (Figure 1C). Since A3G is predicted to have been born soon after the simian-prosimian split (Münk et al., 2012), we propose that A3I arose via retrocopying of A3G in the common ancestor of simian primates, approximately 43 million years ago (MYA) (Perelman et al., 2011).

In all species analyzed, A3I has acquired potentially inactivating mutations relative to A3G. We found that all A3I retrocopies share a nonsense mutation at codon position 261 (Supplementary file 3), and thus, in most species, A3I encodes only a short putative open reading frame (ORF) of 153 codons (compared to the 384 codon ORF of A3G) which spans the N-terminal deaminase domain. Following this initial truncation, there were additional lineage-specific disruptions of the A3I ORF during the diversification of simian primates. These results suggest that A3I was born in the simian ancestor either as a truncated retrocopy or acquired a truncating mutation shortly following birth. Nonetheless, it is possible that the ancient A3Is encoded functional A3 proteins before becoming disabled by mutation.

Multiple A3G retrocopies in New World monkeys

A3I was not the only hit uncovered by our search for A3 retrocopies. Our analyses also revealed A3F retrocopies in three Old World monkeys within the Colobinae subfamily, A3H retrocopies in two New World monkeys, and a single A3 retrocopy that cannot be assigned to a specific A3 parent in the greater galago (Otolemur garnettii) (Figure 1D). However, our most striking finding was that every sequenced NWM genome contained numerous A3G retrocopies. This abundance of A3 retrocopies motivated a deeper investigation of the evolution and function of these NWM A3G retrogenes.

Initially focusing on the common marmoset (Callithrix jacchus) genome, we found three intron-containing A3 genes on chromosome 1 (likely orthologous to human A3A, A3G, and A3H). We also found nine loci outside of the A3 locus with high sequence similarity to human and marmoset A3G genes (Figure 2A). In contrast to the marmoset A3G gene, each of these additional hits lacked introns, suggesting they were retrocopies. Seven A3G retrocopies spanned more than 1,100 bp and most of the coding exons of the marmoset A3G. In addition, one retrocopy spanned 811 bp, and one shorter retrocopy spanned 260 bp (Figure 2A). These shorter retrocopies showed a marked 3’ bias, consistent with the 5’-truncation-prone, target-primed reverse transcription mechanism of LINE-1 and related retrotransposons (Cost et al., 2002; Luan et al., 1993). Seven A3G retrocopies possessed premature stop codons, deletions, or early truncations (Figure 2A). Two of these retrocopies encode ORFs that span a single cytidine deaminase domain (‘C1’ and ‘C2’). However, two A3G retrocopies (‘SS1’ and ‘SS2’, Figure 2A) were predicted to encode a 382 amino acid protein (comparable to the full-length 384 amino acid protein encoded by the intron-containing A3G gene). Thus, the marmoset genome contains nine A3G retrocopies, which may encode 2–4 additional A3G-like proteins.

Figure 2 (with 2 supplements)
Discovery and phylogenetic analysis of A3G retrocopies in New World monkeys.

(A) The common marmoset (Callithrix jacchus) genome encodes a single A3G and nine retrocopies of A3G (orange boxes). A3G resides on four coding exons at the A3 locus on chromosome 1, while the retrocopies are intronless and found throughout the genome. Some retrocopies contain putative protein-coding ORFs (yellow boxes) of varying lengths that retain alignable sequence similarity to the A3G protein (gray within yellow boxes, regions of poor alignment caused by frame-shifting mutations). (B) PhyML tree of A3G and A3G retrocopies from the four sequenced and assembled New World monkey genomes suggests that six retrocopies are orthologous and conserved in all four species (clusters C1-C5 and A3I). The genome of each species (colors correspond to species) contains an intron-containing A3G (dotted line) as well as retrocopies that are closely related to A3G. These more recent copies are found in only one genome, without identifiable orthologs in the other three species. Some retrocopies retain a putative protein coding ORF (indicated by a circle at the branch tip).

Next, we expanded our search for A3G retrocopies to the other assembled NWM genomes on NCBI: the Bolivian squirrel monkey (Saimiri boliviensis), the white-faced capuchin (Cebus capucinus), and Ma’s night monkey (Aotus nancymaae). Like marmoset, we found multiple A3G retrocopies in each of these genomes – eight in capuchin, seven in night monkey, and ten in squirrel monkey (Figure 2A). All but one of the NWM A3G retrocopies aligned (without large gaps) to each other and intron-containing A3Gs, and a phylogenetic tree confirmed these retrocopies cluster with NWM A3Gs (Figure 1—figure supplement 2). The exception, one squirrel monkey A3G retrocopy (GenBank: JH378161), contains a 286 bp insertion of another shorter A3G retrocopy, most likely the result of a nested insertion of one retrocopy into another.

A maximum likelihood phylogenetic tree (Figure 2B) using the alignable NWM A3Gs and A3G retrocopies revealed six bootstrap-supported ‘clusters’ of retrocopies with representatives from all four analyzed NWM species (Figure 2B and Supplementary file 1 clusters C1-C5 and A3I). One of these clusters contains the NWM A3I sequences which date back to at least the last common ancestor of simian primates. Our findings suggest that the other five clusters represent orthologs of A3G retrocopies born in or before the last common ancestor of these four NWM species. This orthology was further supported by shared synteny in two NWM genomes (marmoset and squirrel monkey) for all clusters (Figure 2—figure supplement 1). We, therefore, conclude that five orthologous NWM A3G retrocopies were likely born via independent retrotransposition events in or prior to the most recent common ancestor of these four species analyzed.

To more precisely date the origins and species distribution of these A3G retrocopies, we investigated their presence in additional NWM species lacking publicly available genome sequences. For each retrocopy, we used UCSC MultiZ alignments (Blanchette et al., 2004) to find flanking sequence conservation in shared syntenic locations of marmoset, human, and mouse genomes to design oligos specific to a single retrogene-containing locus in marmoset. We were able to do so for seven of nine retrocopies (all but SS3 and C4, Figure 2A). Confirming the specificity of these oligos, six of these seven oligo pairs reproducibly amplified a single locus from marmoset genomic DNA with touchdown PCR whereas only A3I was amplified from human genomic DNA (Figure 2—figure supplement 2). Using these oligos and genomic DNA from other species, we observed that many retrocopies were present in the shared syntenic loci in other NWMs. Specifically, retrocopies C1, C2, C3, and C5 of marmoset were also present in titi (Plecturocebus moloch) and saki monkeys (Pithecia pithecia), two species in the basal family of the NWM phylogeny, suggesting these retrocopies were born in or prior to the most recent common ancestor of all NWMs (~25 MYA).

In addition to the six orthologous ‘clusters’ of retrocopies found in all or many NWMs, our phylogenetic analysis also reveals ‘species-specific’ retrocopies (SS) with no apparent ortholog in the other three species with genome assemblies (Figure 2B). These A3G retrocopies instead share a recent common ancestor with the intron-containing A3G gene from the same species, suggesting that they were born recently. Our PCR analysis revealed that some ‘marmoset-specific’ retrocopies were also present in the closely related tamarin (Saguinus oedipus). Thus, A3G retrocopies vary in age, from being found in only one species, in a few closely related species, in all NWM species, or in all simian primates. The different branch lengths leading to each of the NWM A3G retrocopies or retrocopy ortholog clusters also reflect their variable ages (Figure 2B). Our findings suggest that rather than a single burst, A3G retrocopies have been continually born throughout the evolutionary history of NWMs.

Retention of putatively functional NWM A3G retrogenes

Retrocopies are often assumed to be nonfunctional at birth since they usually consist of only the sequence within the mRNA of the parent gene and therefore lack promoters and enhancers. However, there are well-documented examples of retrogenes that have been retained for their functionality (Casola and Betrán, 2017). To investigate whether any of the A3 retrocopies might be functional, we used several criteria to eliminate retrocopies likely to be non-functional. We assumed that functional retrocopies should be transcribed and should have evolved under selective constraint on an intact open reading frame with an intact cytidine deaminase motif. To be conservative, we narrowed our focus to the A3G retrocopies detectable in the four sequenced, publicly available New World monkey genomes – nine copies in common marmoset, eight copies in white-faced capuchin, ten copies in Bolivian squirrel monkey, and seven copies in Nancy Ma’s night monkey.

Many A3 genes are expressed in the germline and early development (Friedli et al., 2014; Marchetto et al., 2013; Refsland et al., 2010) where they protect against a diverse range of infectious and endogenous elements including retroviruses and LINE-1 (Arias et al., 2012; Harris and Dudley, 2015). In order to similarly assess expression of A3G retrocopies in vivo, we queried publicly available NWM RNA-seq datasets (Supplementary file 2) using all reference A3G retrocopy sequences. We organized all perfect-matching, uniquely-mapping read counts by species (Figure 3, x-axis) and specific retrocopy (Figure 3, y-axis). The intron-containing A3G gene itself showed detectable expression in most datasets in all four species (organized by tissue source of each dataset, colored bars down columns denote tissues). We also found that several A3G retrocopies showed expression in each species (dark bars within each species). For example, C2 retrocopies (Figure 3) are expressed in three of the four analyzed NWM species. In marmoset, the C2 retrocopy is expressed in stem cells and induced pluripotent cells, similar to the intron-containing A3G gene. We also found that each species expressed at least one species-specific retrocopy. In several species, these younger, species-specific A3G retrocopies were expressed in ovaries and testes just like the intron-containing A3G. Overall, our data suggest that a large subset of NWM A3G retrocopies are expressed in vivo, including in tissues relevant for defense against pathogens.
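The read-counting criterion described above (perfect-matching, uniquely-mapping reads) can be illustrated with a minimal sketch. It assumes reads and reference sequences are plain DNA strings in the same orientation; a real RNA-seq analysis would use an aligner and handle both strands. The function and variable names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter


def count_unique_perfect_reads(reads, references):
    """Count reads that match exactly one reference at 100% identity.

    reads      -- iterable of read sequences (strings)
    references -- dict mapping a reference name to its sequence
    A read is counted only if it occurs, in full, in a single reference.
    """
    counts = Counter({name: 0 for name in references})
    for read in reads:
        read = read.upper()
        # Full-length exact match = 100% identity and 100% coverage of the read.
        hits = [name for name, ref in references.items() if read in ref.upper()]
        if len(hits) == 1:  # discard multi-mapping and unmapped reads
            counts[hits[0]] += 1
    return counts
```

Reads shared between a retrocopy and the intron-containing gene are dropped by this filter, so only positions that distinguish the copies contribute to the counts.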

Figure 3
A3G retrocopies are transcribed in New World monkey tissues.

A heat map shows the counts of RNA-seq reads (log10 of read count + 1) that map uniquely at 100% identity and coverage. Each pixel represents the average read counts of available data for the corresponding tissue type and A3G retrocopy. Tissue types are marked by the colored lines behind the pixels. Green represents germline tissues including iPSC, ESC, testis and ovary; orange represents brain tissues of various regions; red represents blood samples including whole blood and lymphocytes. Retrocopies which retain a putative protein coding ORF are labeled with ‘ORF’.
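The heat map transform in this caption can be written as a one-line calculation. The sketch below assumes the average across datasets is taken before the log transform, which the caption does not state explicitly; the function name is hypothetical.

```python
import math
from statistics import mean


def heatmap_value(read_counts):
    """Return log10(mean read count + 1) for one tissue type and one retrocopy.

    read_counts -- per-dataset read counts for that tissue/retrocopy pair.
    Adding 1 keeps the value defined (and equal to 0) when no reads map.
    """
    return math.log10(mean(read_counts) + 1)
```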

Second, we evaluated the A3G retrocopies for their predicted ORFs. Full-length NWM A3G genes encode a ~384 amino acid protein. In contrast, most A3G retrocopies encode only short putative ORFs, typically less than 100 codons (Figure 2A, yellow boxes). However, a subset of A3G retrocopies have retained a predicted ORF of at least 250 codons (Figure 2A, yellow boxes; Figure 2B, empty circles on branches) which encompasses one or two of the core deaminase domains. Most of these retrocopies conserve the core amino acids within these domains required for antiviral or anti-retroelement activities of A3G (Figure 4; Huthoff and Malim, 2007; Navarro et al., 2005).
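As an illustration of the ORF criterion, the following sketch finds the longest forward-strand ORF in a nucleotide sequence and could be combined with the 250-codon cutoff mentioned above. It assumes ORFs start at ATG and end at a stop codon in the same frame; the article does not specify how ORFs were predicted, so the function name and these rules are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Length, in codons, of the longest ATG-to-stop ORF on the forward strand.

    The stop codon is not counted. ORFs that run off the end of the
    sequence without reaching a stop codon are ignored in this sketch.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the ORF being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best
```

A retrocopy would pass the length filter described in the text when `longest_orf_codons(sequence) >= 250`.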

Figure 4
A3G retrocopies retain core deaminase motifs.

An amino acid alignment of the core deaminase motifs shows that A3Gs of various primates have conserved HxE-CxxC motifs in both the N- and C-terminal domains. The putative ORF-encoding retrocopies all retain a conserved C-terminal motif, and most retain an N-terminal motif.
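A motif check of this kind can be sketched with a regular expression. The pattern below uses the commonly cited zinc-coordinating consensus H-x-E-x(23-28)-P-C-x(2-4)-C; the spacer lengths and the proline are an assumption taken from that general consensus, not from this article, which names only the HxE and CxxC elements. The function name is hypothetical.

```python
import re

# Assumed consensus: H-x-E, a 23-28 residue spacer, then P-C-x(2-4)-C.
DEAMINASE_MOTIF = re.compile(r"H.E.{23,28}PC.{2,4}C")


def deaminase_motif_positions(protein):
    """Return 0-based start positions of each motif match in a protein sequence."""
    return [m.start() for m in DEAMINASE_MOTIF.finditer(protein.upper())]
```

Under this pattern, a double-domain protein such as A3G would be expected to yield two positions and a single-domain ORF one.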

Third, we evaluated the A3G retrocopies for evidence of selective retention. Although A3G retrocopies are expected to lack stop codons upon their birth from an intact A3G gene, absence of stop codons in older A3G retrocopies could indicate functional retention. We adapted a previously published approach (Young et al., 2018) to simulate the rate of decay of ORFs in the absence of selection based on ORF length and conservative and liberal bounds on NWM background mutation frequency and generation time (Campbell and Eichler, 2013; Tacutu et al., 2018; Thomas et al., 2018). We found that less than 5% of A3G ORFs were expected to remain intact after 20 million years (less than 1% after 40 million years) (Figure 5). In contrast to this expectation, we found two A3G retrocopies have remained intact despite being at least 20 million years old (Bininda-Emonds et al., 2007). These include one C1 A3G retrocopy (with a preserved ORF in capuchin monkey) and a C2 retrocopy (with a preserved ORF in marmoset and night monkey). Based on these findings, we hypothesize that some NWM A3G retrocopies have been retained for their function.

Figure 5 (with 1 supplement)
Simulation and evolution suggest selection to retain ORFs in A3G retrocopies.

(A) A simulation of ORF retention suggests most are lost within 10–20 million years in the absence of any selection to retain the ORF. Dots indicate the proportion of simulated ORFs (10,000 total) that were still intact after a given time. Colors represent three sets of parameters intended to match New World monkeys (green) or provide liberal (orange, mouse-like) and conservative (blue, human-like) bounds on the parameter sets of indel rate and generation time. The substitution rate of Ma’s night monkey was used for all three sets of simulations. Horizontal red lines indicate the 1st and 5th percentiles of intact ORFs. Vertical red lines mark the key time points of last common ancestors (LCA) among New World monkeys.
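The logic of a neutral ORF-decay simulation can be sketched as follows. This is a deliberately simplified model, not the authors' implementation or the published approach they adapted: it treats disrupting mutations as a Poisson process, assumes every indel destroys the ORF, and assumes a fixed probability (`p_stop`, an illustrative value) that a substitution creates a stop codon. All names and default values are hypothetical, and rates are taken per site per year rather than per generation.

```python
import math
import random


def disrupt_rate_per_year(orf_codons, sub_rate, indel_rate, p_stop=0.04):
    """Rate of ORF-disrupting events per year for the whole ORF (assumed model)."""
    sites = orf_codons * 3
    return sites * (sub_rate * p_stop + indel_rate)


def expected_intact_fraction(orf_codons, time_myr, sub_rate, indel_rate, p_stop=0.04):
    """Analytic probability that no disrupting event occurred by time_myr."""
    rate = disrupt_rate_per_year(orf_codons, sub_rate, indel_rate, p_stop)
    return math.exp(-rate * time_myr * 1e6)


def simulate_intact_fraction(orf_codons, times_myr, sub_rate, indel_rate,
                             p_stop=0.04, n_sims=10_000, seed=0):
    """Monte Carlo fraction of simulated ORFs still intact at each time point.

    Each simulated ORF draws the waiting time (years) to its first
    disrupting mutation; it is intact at time t if that time exceeds t.
    """
    rng = random.Random(seed)
    rate = disrupt_rate_per_year(orf_codons, sub_rate, indel_rate, p_stop)
    first_hit = [rng.expovariate(rate) for _ in range(n_sims)]
    return {t: sum(h > t * 1e6 for h in first_hit) / n_sims for t in times_myr}
```

Running the simulation with separate liberal and conservative rate choices gives bounds analogous in spirit to the three parameter sets in the figure; the specific rates must come from the cited sources and are not supplied here.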

To further evaluate the selective constraint acting on A3G retrocopies, we used computational models to test whether their evolution more closely resembles a functional gene or a pseudogene. We first used the RELAX method (Wertheim et al., 2015) to test whether the A3G retrocopies show relaxed selection relative to intron-containing A3G genes. Significant relaxed selection was not detected in the putatively intact retrocopies relative to the intron-containing A3Gs (Figure 5—figure supplement 1). Instead, RELAX suggests that the A3G retrocopies have evolved more rapidly than the intron-containing A3G genes in the same set of species (Figure 5—figure supplement 1). Next, using a branch model of PAML (Yang, 2007), we observed that two retrocopies (capuchin-C1 and marmoset-SS1) had elevated dN/dS (2.6 and 2.9 respectively, significantly greater than the neutral expectation of 1), while the rest of the branches were suggestive of neutral evolution or purifying selection (Figure 5—figure supplement 1). These analyses suggested that, overall, the retrocopies evolved at a similar or accelerated rate compared to intron-containing A3Gs. Further, capuchin-C1 and marmoset-SS1 show evidence of accelerated evolution. Overall, our three lines of evidence suggest that at least a subset of the A3G retrocopies are likely to have been retained for their function.
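Comparisons of this kind (for example, a branch with dN/dS free to vary versus fixed at 1) are commonly evaluated with a likelihood ratio test. The sketch below assumes that approach with one degree of freedom; the article does not state the exact test used, and the function name and inputs are hypothetical.

```python
import math


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    lnl_null -- log-likelihood of the constrained model (e.g. dN/dS fixed at 1)
    lnl_alt  -- log-likelihood of the model with the parameter free
    Returns (test statistic, p-value from a chi-square with 1 d.f.).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 d.f., P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```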

Antiviral activity of NWM A3G retrocopies

We reasoned that A3G retrocopies could have a role in innate immunity/genome defense similar to intron-containing A3 genes. To test this possibility, we cloned and assayed intron-containing A3Gs and each A3G retrocopy encoding an intact near-full-length ORF for its ability to restrict the endogenous retroelement LINE-1 using established in vitro retrotransposition assays (Dewannieux et al., 2003; Moran et al., 1996). These assays require that a LINE-1 sequence be transcribed, spliced, and reverse transcribed back into the genome. As controls, we tested the anti-LINE-1 restriction of human A3A and human A3G. Consistent with previous reports (Bogerd et al., 2006; Chen et al., 2006; Muckenfuss et al., 2006; Niewiadomska et al., 2007), we observed potent restriction of LINE-1 by human A3A, and no restriction by human A3G. In contrast to human A3G, we found that the intron-containing A3Gs from marmoset and squirrel monkey restricted LINE-1 more than 10-fold, comparable to A3A. However, we observed no appreciable restriction of LINE-1 by any of the A3G retrocopies (Figure 6; Figure 6—figure supplement 1). Thus, despite potent anti-LINE-1 restriction by NWM A3Gs, it appears that this activity is not retained by any of the retrocopies tested.

Figure 6 (with 1 supplement)
A3G retrocopies restrict HIV-1 but not LINE-1.

Bar charts of measured restriction of LINE-1 (retrotransposition assays) and HIV-1ΔVif (single cycle infectivity assays) show that NWM A3Gs and some A3G retrocopies restrict retrovirus. Only NWM A3Gs, but not retrocopies, restrict LINE-1.

Next, we investigated the antiviral restriction by NWM A3G genes and retrocopies. Using single-cycle infectivity assays, we measured the ability of NWM A3G genes and retrocopies to block infectivity of HIV-1ΔVif, which lacks Vif, a known antagonist of APOBEC3 proteins. Consistent with previous results, we found that human A3G potently restricts HIV-1ΔVif but human A3A is a poor restrictor; this restriction pattern is the opposite of that observed for LINE-1 restriction here and in previous findings (Bogerd et al., 2006; Chen et al., 2006; Turelli et al., 2004; Figure 6). We also observed 100-fold or greater restriction of HIV-1ΔVif infectivity by intron-containing A3G genes from marmoset and squirrel monkey (Figure 6; Figure 6—figure supplement 1), consistent with a previous report of restriction by NWM A3Gs (Wong et al., 2009). Finally, we observed that two retrocopies – marmoset-SS1 and capuchin-C1 – restrict HIV-1ΔVif at least 10-fold, suggesting that these two A3G retrocopies encode bona fide A3G-like anti-retroviral activity. Thus, retrocopying has expanded the functional repertoire of A3 antiviral genes in NWMs. At least two of these genes are expressed at the RNA level in at least some tissues and encode a functional protein with antiviral activity.
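Fold restriction values like those quoted above are conventionally a ratio of the assay readout without the restriction factor to the readout with it. The sketch below assumes that definition; the article does not spell out the normalization, and the function name is hypothetical.

```python
def fold_restriction(signal_without_a3, signal_with_a3):
    """Ratio of infectivity (or retrotransposition) without A3 to that with A3.

    Both inputs are assay readouts in the same units. A value of 10 means
    the A3 protein reduced the readout to one tenth of the no-A3 control.
    """
    if signal_with_a3 <= 0:
        raise ValueError("signal_with_a3 must be positive to compute a ratio")
    return signal_without_a3 / signal_with_a3
```

Under this definition, the "at least 10-fold" criterion corresponds to `fold_restriction(control, sample) >= 10`.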


Replicating retrotransposons inflict deleterious consequences on host genomes via insertional mutagenesis, ectopic recombination, and dysregulation of proximal genes (Beck et al., 2011). Despite these negative consequences, retrotransposons can bring about innovation in host genomes via the birth of new exons or genes (Mi et al., 2000; Schmitz and Brosius, 2011), or novel regulatory mechanisms and gene-regulatory networks (Chuong et al., 2016; Kunarso et al., 2010; Wang et al., 2007). In this work, we show that retrotransposon-mediated gene birth can lead to continual evolution of new innate immune genes. We show that all simian primate genomes contain the remnants of A3I, an ancient A3 retrocopy. We further find that NWM genomes have continually acquired A3G-derived retrocopies, a subset of which are transcribed, retain intact ORFs and functional motifs, and are capable of restricting retroviruses.

This history of ancient and young retrocopies provides a valuable resource in understanding how antiviral genes coevolve with pathogens, including changes in Vif-interacting residues or viral restriction profile (Krupp et al., 2013). Although numerous methods exist for reconstruction of ancestral sequences, rapidly evolving genes like the A3s violate assumptions and often limit the utility of these methods, thereby preventing reliable reconstruction of ancestral sequences. However, retrocopies are molecular fossils, an evolutionary snapshot of the ancient parental gene sequence which presumably evolved neutrally after inserting into the genome. A3I provides such a record of an A3G-like gene from 40 MYA, which was present in the common ancestor of simian primates. Given the rapid gene turnover of the A3 locus in mammals, it is possible that the parent of A3I no longer exists in modern primates. In this scenario, the A3I retrocopy may be all that remains of this ancient A3 gene which predates simian primate diversification.

Recent computational analysis corroborates the presence of A3 retrocopies in two of the genomes we analyzed (Hayward et al., 2018; Ito et al., 2020) and adds to a growing literature suggesting the A3 content of mammalian genomes may be even more variable and dynamic than previously appreciated (Hayward et al., 2018; Ito et al., 2020). Our data suggest that A3 retrocopying is more prevalent in NWM genomes compared even to other simian primates. This abundance is consistent with a previous study that reported an increased number of retrocopies of all genes in marmoset and squirrel monkey genomes, correlated with an increase in the activity of two LINE-1 subfamilies L1PA7 and L1PA3 (Navarro and Galante, 2015). It is unclear whether increased LINE-1 activity is sufficient to explain our observations since some NWMs like the Ateles lineage may have low or no retroelement activity (Boissinot et al., 2004). Even if NWM LINE-1 activity is high, it would not necessarily explain why A3G rather than the other NWM A3 genes is subject to recurrent retrocopying. Although duplication of some nuclear A3 proteins like human A3A or A3B is likely to be more toxic due to increased genomic mutation (Hultquist et al., 2011; McLaughlin et al., 2016), we favor the alternate hypothesis that A3G expression in the germline/early embryos of NWMs is unusually high, rendering it a more likely substrate for retrocopying relative to other NWM A3 genes. Following their insertion into a new genomic location, these retrocopies could be expressed by exaptation of a neighboring transposable element, promoter piggybacking, or recruitment of a novel promoter (Carelli et al., 2016). Recent work suggests that most of the mouse genome is transcribed over relatively short evolutionary timescales (Neme and Tautz, 2016). Such ‘genome-wide’ transcription could be the first step in exposing an advantageous function of a retrogene (Jaganathan et al., 2019).

We showed that intron-containing NWM A3G genes restricted both LINE-1 and HIV-1. Thus, it is likely that A3G retrocopies retained both of these functions immediately following birth. Yet, over time, all retrocopies that restrict HIV-1 have lost the ability to restrict LINE-1 (Figure 6). Although this could reflect idiosyncratic events, our finding that anti-LINE-1 activity, but not anti-retroviral activity, was repeatedly lost, suggests otherwise. It is possible that the anti-LINE-1 function is simply more sensitive to random mutation, such that mutations are more likely to result in loss of LINE-1 restriction; we also cannot rule out the possibility that certain A3G retrocopies retain the capacity to restrict NWM-specific LINE-1 lineages. Alternatively, A3G retrocopies may have been absolved of selection for LINE-1 restriction, perhaps due to sufficient silencing by A3G and other restriction factors. Nevertheless, the retrocopies present a natural ‘separation of function’ event that can delineate the requirements for A3G proteins to restrict LINE-1 versus retroviruses.

Although we used the lentivirus HIV-1ΔVif to measure the anti-retroviral activity of NWM A3G retrocopies, lentiviruses have not yet been found in NWMs. Even apart from lentiviruses, few active retroviruses in general have been found in NWMs; those that have been found likely represent the tip of an understudied aspect of monkey and virus biology (Colcher et al., 1977; Muniz et al., 2013). Thus, HIV-1ΔVif only serves as a proxy for the activity of A3G retrocopies towards some relevant viral pathogen in the natural environment of these monkeys.

While the NWM A3G retrocopies did not restrict LINE-1, such a mechanism of gene duplication could, in theory, function as a feedback mechanism on excess retroelement activity in the germline/early embryo. Retroelement restriction factors expressed in these tissues could be retrocopied and increase dosage or diversity of anti-retroelement restriction factors (Kondrashov et al., 2002). In this way, the retrocopies may represent a ‘revolving door’ of new gene substrates for neo- or sub-functionalization; the needs of the genome would dictate which functions persist.

In conclusion, our findings suggest retrocopied gene sequences represent a prevalent, recurrent, and rapid mechanism in primates and other organisms to evolve new genome defense functions including restriction of viruses. Although the presence of endogenous retroelements is probably net deleterious to the host, retrogene birth represents a mechanism whereby host genomes could nevertheless take advantage of the activities of these genomic pathogens to protect themselves against endogenous and infectious pathogens.

Materials and methods

Identification of A3 retrocopies

A3G retrocopies were identified using BLAT of UCSC genome databases for marmoset (Callithrix jacchus draft assembly, WUGSC 3.2, GCA_000004665.1) and squirrel monkey (Saimiri boliviensis, saiBol1, GCA_00023585.1) with marmoset A3G (NM_001267742) as a query sequence. Additional copies were identified using BLASTn of the NCBI genome assemblies of Ma’s night monkey (Aotus nancymaae, Anan_2.0) and capuchin (Cebus capucinus imitator, Cebus_imitator-1.0). The spider monkey retrocopy was identified using BLAST to query the NCBI HTGS database for reads from New World monkeys. See Supplementary file 1 for detailed coordinates of each sequence.

Mapping inactivating mutations in retrocopies

A3I sequences were queried using the codon-based and indel-sensitive alignment program LAST (http://last.cbrc.jp). The translated A3G sequence of Callithrix jacchus (NC_013914.1) was used as the reference sequence and indexed using the setting of ‘lastdb -p -cR01’ of the LAST aligner, and then the A3I sequences were queried using the setting of ‘lastal -F15’ to output in ‘maf’ format. The longest indel-sensitive translation of each A3I was then manually extracted from the maf output and aligned with mafft (https://mafft.cbrc.jp/alignment/software/) using the setting of ‘--anysymbol’ to allow stop codons and frame shifting changes to be shown.

Analysis of syntenic A3G retrocopies

Synteny of A3G retrocopies in marmoset and squirrel monkey was analyzed using the UCSC Table Browser to download gene names within 1 Mbp on either side of the retrocopy. Synteny was confirmed if the same gene was adjacent to the retrocopy in both species. For five pairs of sequences that the tree suggested should be orthologous, we found shared genes on both sides of the retrocopies. For one retrocopy (C5), we found a shared gene on only one side of the retrocopy (Figure 2—figure supplement 1).
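
The synteny criterion described above can be illustrated with a minimal sketch. It is not the authors' workflow: the function names, the `left`/`right` keys, and the gene names are hypothetical, and it assumes the flanking gene lists have already been downloaded and oriented the same way in both species.

```python
def shared_flanking_genes(flank_a, flank_b):
    """Return the genes found on the same side of a retrocopy in both species.

    flank_a / flank_b: dicts with 'left' and 'right' lists of gene names
    within the chosen window (1 Mbp in the text) around the retrocopy.
    """
    return {
        side: sorted(set(flank_a[side]) & set(flank_b[side]))
        for side in ("left", "right")
    }


def synteny_support(flank_a, flank_b):
    """Classify support as 'both sides', 'one side', or 'none'."""
    shared = shared_flanking_genes(flank_a, flank_b)
    n_sides = sum(1 for genes in shared.values() if genes)
    return {0: "none", 1: "one side", 2: "both sides"}[n_sides]


if __name__ == "__main__":
    # Hypothetical gene names, for illustration only.
    marmoset = {"left": ["GENE_A", "GENE_B"], "right": ["GENE_C"]}
    squirrel_monkey = {"left": ["GENE_B"], "right": ["GENE_D"]}
    print(synteny_support(marmoset, squirrel_monkey))
```

In this toy input a gene is shared on only one side, which corresponds to the weaker level of support reported for retrocopy C5.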

Construction of A3 phylogeny

A3G and A3G retrocopies were aligned using MAFFT (Katoh and Standley, 2013) with auto algorithm parameters within Geneious version 11.1.4 (Kearse et al., 2012). All retrocopies (both ORF-containing and retropseudogenes) were aligned using the complete alignable region defined by BLASTn. Trees were constructed using PHYML (Guindon et al., 2010) with NNIs topology search, BioNJ initial tree, HKY85 nucleotide substitution model, and 100 bootstraps.

ORF retention simulation

To simulate the decay of retro A3G ORFs, we used the 'mutator' and 'orf_scanner' scripts developed by Young et al., 2018. The ORF of Callithrix jacchus A3G (identified from GenBank accession NC_013914.1, 1,150 bp) was used as the starting ORF. Several combinations of substitution, insertion, and deletion rates and sexual maturation times were used for the simulation. We used substitution, insertion, and deletion rates of 1.16 × 10⁻⁸, 2 × 10⁻¹⁰, and 5.5 × 10⁻¹⁰ per site per generation for human (Campbell and Eichler, 2013); substitution, insertion, and deletion rates of 5.4 × 10⁻⁹, 1.55 × 10⁻¹⁰, and 1.55 × 10⁻¹⁰ per site per generation for mouse (Uchimura et al., 2015); and a substitution rate of 8.1 × 10⁻⁹ for night monkey (Aotus nancymaae) (Thomas et al., 2018). Sexual maturation times of human, mouse, and New World monkeys were estimated to be 25, 0.3, and 1–9 years, respectively (http://genomics.senescence.info; https://animaldiversity.org). Each run simulated the mutation of the starting ORF for 50 million years, and the simulations with each set of parameters were repeated 10,000 times. The number of ORFs that were still open and of the same length as the starting ORF was counted every 50,000 years of each simulation.
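
The logic of such a simulation can be sketched in a few lines of Python. This is a simplified illustration, not the 'mutator' and 'orf_scanner' scripts of Young et al.: it assumes that any indel changes the ORF length and therefore counts as loss, that loss is permanent (no reversion), and that mutations per time step are Poisson-distributed. The toy ORF and all function names are hypothetical.

```python
import math
import random

STOPS = {"TAA", "TAG", "TGA"}


def orf_is_intact(seq, start_len):
    """True if seq is still an ORF of the starting length (ATG ... stop, no internal stop)."""
    if len(seq) != start_len or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] == "ATG" and codons[-1] in STOPS
            and not any(c in STOPS for c in codons[1:-1]))


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_once(orf, sub_rate, indel_rate, gen_time, total_years, step_years, rng):
    """Return the year at which the ORF was lost, or None if it was retained."""
    seq, n = list(orf), len(orf)
    gens = step_years / gen_time  # generations per time step
    for step in range(1, int(total_years // step_years) + 1):
        # Simplification: any insertion or deletion alters ORF length.
        if poisson(indel_rate * n * gens, rng) > 0:
            return step * step_years
        for _ in range(poisson(sub_rate * n * gens, rng)):
            pos = rng.randrange(n)
            seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
        if not orf_is_intact("".join(seq), n):
            return step * step_years
    return None


def retention_fraction(orf, n_reps, seed=0, **params):
    """Fraction of replicate simulations in which the ORF survives to the end."""
    rng = random.Random(seed)
    kept = sum(simulate_once(orf, rng=rng, **params) is None for _ in range(n_reps))
    return kept / n_reps


if __name__ == "__main__":
    toy_orf = "ATG" + "GCT" * 20 + "TAA"  # hypothetical 66-bp ORF, not A3G
    # Human-like rates from the text; indel rate = insertion + deletion rate.
    print(retention_fraction(toy_orf, 200, sub_rate=1.16e-8, indel_rate=7.5e-10,
                             gen_time=25, total_years=50_000_000, step_years=50_000))
```

Shorter generation times mean more generations, and therefore more mutations, per 50,000-year step, which is why the sexual maturation time matters as much as the per-generation rates.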

Analysis of selective constraints in A3G retrocopies

RELAX (Wertheim et al., 2015) was carried out using the Datamonkey webserver (Weaver et al., 2018) and a PhyML (Guindon et al., 2010) tree of the MAFFT (Katoh and Standley, 2013) aligned nucleotide sequences of the subset of retrocopies that encode an ORF longer than 250 amino acids in addition to the New World monkey A3Gs with or without human A3G. We defined the branches leading to the A3Gs as reference branches and all of the other branches as test branches. The above nucleotide alignment and PhyML tree were input into the CODEML NSsites model of PAML (Yang, 2007). To test for selection along branches, these same input files were input into the branch model of PAML. To test for significance of branches with apparent dN/dS < 1, we fixed that branch at dN/dS = 1 and calculated the likelihood of this tree.
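
The final comparison, a branch with freely estimated dN/dS versus the same branch fixed at dN/dS = 1, is a likelihood ratio test between nested models. The sketch below shows only the arithmetic, assuming one degree of freedom (one constrained parameter) and using hypothetical log-likelihood values; it does not run PAML.

```python
import math


def lrt_pvalue_1df(lnl_free, lnl_fixed):
    """Likelihood ratio test for nested models differing by one parameter.

    lnl_free:  log-likelihood with dN/dS estimated on the branch of interest
    lnl_fixed: log-likelihood with dN/dS fixed at 1 on that branch
    Returns (test statistic, p-value from a chi-square with 1 df).
    """
    stat = max(2.0 * (lnl_free - lnl_fixed), 0.0)
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = lrt_pvalue_1df(-1000.0, -1001.920729)
    print(f"2*deltaLnL = {stat:.3f}, p = {p:.3f}")
```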

RNA-seq analysis for retrocopy and A3G expression

We searched the NCBI GEO and SRA databases (October 2018) with the keywords ‘Callithrix’, ‘Aotus’, ‘Saimiri’ and ‘Cebus’ to find existing RNA-seq datasets from these species. Callithrix jacchus, Aotus nancymaae, and Cebus capucinus datasets were used, matching the species in which retrocopies of A3G were identified. For Saimiri, we used RNA-seq data from Saimiri sciureus, for which no genome sequence has been published, whereas the retrocopy analysis in the rest of the text is based on Saimiri boliviensis. All RNA-seq datasets (Supplementary file 2) were queried using the default parameters of the ‘blastn_vdb’ tool of the SRA toolkit (Leinonen et al., 2011), with the A3Gs and A3G retrocopies identified in this work as query sequences. RNA-seq reads hit by blastn_vdb were then processed with a custom Perl (https://www.perl.org) script to keep only the reads that matched a query sequence at 100% identity across the entire RNA-seq read and mapped uniquely to only one of the queried retrocopies or A3G. Reads that passed these filters were tallied and organized by species, tissue type, and the retrocopy or A3G copy they matched.
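
The read filter described above can be sketched as follows. This is an illustration in Python rather than the authors' Perl script; the record fields (`read_id`, `target`, `pident`, `align_len`, `read_len`) are hypothetical names for values parsed from tabular BLAST output, and "unique" is interpreted here as having a perfect, full-length match to exactly one query sequence.

```python
from collections import Counter, defaultdict


def unique_perfect_hits(hits):
    """Map each read to its target if it matches exactly one target perfectly.

    A hit is 'perfect' when identity is 100% and the alignment spans the
    entire read. Reads with perfect hits to more than one target are dropped.
    """
    perfect = defaultdict(set)
    for hit in hits:
        if hit["pident"] == 100.0 and hit["align_len"] == hit["read_len"]:
            perfect[hit["read_id"]].add(hit["target"])
    return {read: next(iter(targets))
            for read, targets in perfect.items() if len(targets) == 1}


def tally_by_target(assignments):
    """Count uniquely assigned reads per A3G copy or retrocopy."""
    return Counter(assignments.values())


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    hits = [
        {"read_id": "r1", "target": "A3G", "pident": 100.0, "align_len": 100, "read_len": 100},
        {"read_id": "r2", "target": "retrocopy_1", "pident": 100.0, "align_len": 100, "read_len": 100},
        {"read_id": "r2", "target": "retrocopy_2", "pident": 100.0, "align_len": 100, "read_len": 100},
        {"read_id": "r3", "target": "retrocopy_1", "pident": 99.0, "align_len": 100, "read_len": 100},
    ]
    print(tally_by_target(unique_perfect_hits(hits)))
```

Requiring a perfect, full-length, unique match is conservative: it discards reads from regions that are identical between paralogs, so counts for closely related copies are likely underestimates.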

LINE-1 retrotransposition assays

LINE-1 retrotransposition assays were carried out as previously described (Xie et al., 2011). For the mouse ORFeus luciferase assays 25,000 HEK293T cells (ATCC Cat# CRL-3216, RRID:CVCL_0063) were seeded into each well of a 96-well clear bottom, white-wall plate. 24 hr later, each well was transfected with 200 ng pYX016 (CAG promoter driving mouse ORFeus LINE-1 with globin intron and luciferase reporter) or pYX015 (Xie et al., 2011) (JM111 inactive human LINE-1 construct which contains loss-of-function mutations in ORF1p of LINE-1) and pCMV-HA-A3 or pCMV-HA-empty (RRID:Addgene_32530). 24 hr post-transfection, transfected cells were selected with 2.5 μg/ml puromycin for 72 hr. Cells were lysed and luciferase substrates provided using the Dual-Glo Luciferase Assay System (Promega E2920). Renilla and firefly luciferase activity were measured using the LUMIstar Omega luminometer. Retrotransposition is reported as firefly/renilla activity to control for toxicity.
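
The firefly/renilla readout can be illustrated with a short calculation. The ratio itself follows the text; expressing each condition relative to the empty-vector control is an assumption added here for illustration, and the well readings and names are hypothetical.

```python
from statistics import mean


def firefly_renilla_ratio(firefly, renilla):
    """Retrotransposition signal for one well, normalized to renilla activity."""
    if renilla <= 0:
        raise ValueError("renilla signal must be positive")
    return firefly / renilla


def relative_retrotransposition(wells, control="empty"):
    """Mean firefly/renilla ratio per condition, relative to the control.

    wells: dict mapping condition name -> list of (firefly, renilla) readings.
    Normalizing to the empty-vector control is an assumption of this sketch.
    """
    means = {name: mean(firefly_renilla_ratio(f, r) for f, r in readings)
             for name, readings in wells.items()}
    return {name: value / means[control] for name, value in means.items()}


if __name__ == "__main__":
    # Hypothetical luminometer readings, for illustration only.
    wells = {
        "empty": [(1000, 100), (1200, 100)],
        "A3G": [(200, 100), (240, 100)],
    }
    print(relative_retrotransposition(wells))
```

Dividing by renilla activity means that a construct which simply kills or slows cells lowers both signals together, so it is not mistaken for one that restricts LINE-1.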

Virus infectivity assays

Single-round HIV-1 infectivity assays were performed as described previously (OhAinle et al., 2006; Yamashita and Emerman, 2004). To produce VSV-G-pseudotyped HIV-1, 50,000 HEK293T cells (ATCC Cat# CRL-3216, RRID:CVCL_0063) were plated in a 24-well plate, and 24 hr later, co-transfected with 0.3 μg lentiviral vector encoding luciferase in the place of the nef gene (pLai3ΔenvLuc2 (Yamashita and Emerman, 2004) or pLai3ΔenvLuc2ΔVif (OhAinle et al., 2006)), 50 ng L-VSV-G, and 300 ng pCMV-HA-A3G or pCMV-HA-empty. All viruses were harvested 48 hr after transfection and filtered through a 0.2 μm filter. p24 gag in viral supernatants was quantified using an HIV-1 p24 Antigen Capture Assay (ABL Inc). Virus equivalent to 2 ng of p24 gag was used to infect 50,000 SupT1 cells (ATCC Cat# CRL-1942, RRID:CVCL_1714) per well in a 96-well plate in the presence of 20 μg/ml DEAE-dextran. Forty-eight hours after infection, cells from triplicate infections were lysed in 100 μl Bright-Glo luciferase assay reagent (Promega) and read on a LUMIstar Omega luminometer (BMG Labtech). A3A Western blots were carried out using Covance mouse HA.11 Clone 16B12 anti-HA monoclonal antibody (Covance Cat# MMS-101P-200, RRID:AB_10064068).

Data availability

All data generated or analyzed during this study are included in the manuscript, supporting files, or publicly available databases as listed in the Supplementary files 1 and 2. Raw data files have been provided for Figure 3.


Lander ES, Linton LM, Birren B, et al.; International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921.

Decision letter

  1. Karla Kirkegaard
    Senior and Reviewing Editor; Stanford University School of Medicine, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

Acceptance summary:

We appreciate the well-supported notion that retrocopying rather than de novo integration has contributed to the evolution of the APOBEC family, and judge that this manuscript represents a significant addition to the field.


Author response

We thank the reviewers and the editor for the insightful and thorough assessment of our manuscript. In this response to review letter, we have listed the original review and responded to each critique after it.

Reviewer #1 (Evidence, reproducibility and clarity):

Yang et al. submitted a manuscript describing the detection of pseudogenes ("retrocopies") of APOBEC3 (A3) genes in primates. The evolutionary history and relationship to specific A3s was analyzed and speculated that the maintained A3 retrocopies had a functional role at least early in the evolution and some may have still now. Functional data on some of the expressed retrocopies are presented on L1 and HIV.

The authors claim that "retrocopying expands the functional repertoire of A3 antiviral proteins in primates". While almost all of the genetic findings were published recently (Ito et al., 2020), the authors should more clearly describe how their data differ from or confirm the data of Ito et al.

We thank the reviewer for their helpful comments which have guided revisions to our manuscript. We have taken steps to clarify the dramatic differences between our work and the recent publication from Ito, Gifford, and Sato.

Foremost, we respectfully disagree with the reviewer that the genetic findings in our work were contained within the Ito et al. manuscript. Using a computational screen of assembled mammalian genomes, the Sato group catalogued the gain and loss of APOBEC3 genes during the evolution of mammals. They found a fascinating correlation between the dynamics of A3s and ERVs that formed the precis of the paper. From their genome-wide search for A3s, Ito et al. describe several retrocopies of A3s in two New World monkey species, one of which retains a full-length open reading frame, leading to the statement that this gene may be functional.

We note that the retrocopies found in the Ito et al. paper span only two of the more than 20 species in which we identify A3 retrocopies. Further, as a result of the breadth of our search for A3s, we find additional retrocopies in the same two New World monkey species that were examined in the Ito et al. paper. Finally, our study also examined functional capabilities of these additional A3s. These differences are highlighted by reviewer #3 who writes that relative to Ito et al., our manuscript studies the phenomenon of A3 retrocopies “more deeply both by in silico analyses and cell culture experiments.” Reviewer #3 also summarizes the most important difference in our studies – our work presents a “conceptual advance that the antiviral gene expansion has achieved not only via tandem gene duplication but also via gene retrocopying”.

Lastly, we want to point out that the findings of our manuscript and Ito et al., 2020 were made concurrently. Indeed, throughout the preparation process of this manuscript, we were both aware of each other’s findings and shared preprints with each other. Most of the participating journals in Review Commons have “scoop protection” mechanisms that typically extend 6 months after the publication of the first article (Ito et al. was published Jan 2020), and our article was first submitted to Review Commons on February 14, 2020. Therefore, we feel confident that the “no scoop” policy applies to the minimal overlap between our paper and that of Ito et al.

Nevertheless, we have modified the text to more clearly acknowledge the parallel finding of some New World monkey retrogenes in the Ito, et al. paper.

The functional data (Figure 6) are interesting, but in the current form not complete. The authors have to show protein expression in the transfected cells (A3, L1, HIV) and level of encapsidation into viral particles. In addition, please analyze if the retrocopies express cytidine deaminase active enzymes.

We thank the reviewer for this comment, and we have added a Western blot of the six long-ORF-containing retrocopies as Figure 6—figure supplement 1. In this blot (from early in the project), we detected protein production in 293T cells for 3/6 retrocopies. In later optimizations of subsets of this blot, we were able to detect expression of the marmoset A3G and the other two marmoset retrocopies (marmoset-2 and marmoset-4). Despite optimization attempts, we were unable to detect protein for one of the retrocopies that restricts HIV-1ΔVif (capuchin-C1). Unfortunately, at this time the included blot is the only one we have in which all 6 constructs are included on a single blot. Optimally, all 6 constructs would be side-by-side in a single blot with optimized conditions, and we are happy to complete this experiment as soon as we are able to return to our lab after the SARS-2 quarantine is lifted. However, we think the added blot shows that some of the retrocopies produce protein and the absence of detectable protein from capuchin-C1 could suggest that this retrocopy is especially potent in its restriction function or an idiosyncratic problem with detecting this protein using Western blot analyses.

We have not previously tested our lentiviral particles for levels of encapsidation of protein from each retrocopy. The value we see in this experiment is in explaining why some of the retrocopies that are expressed in producer cells may not restrict in target cells. While we note that precedent in the literature suggests that A3 proteins which restrict HIV-1ΔVif are invariably encapsidated, we would be happy to carry out this experiment when our lab reopens.

In response to the reviewer’s request to test deaminase activity for each retrocopy, we note that Figure 4 shows the intactness of the deaminase motif in each retrocopy. However, we feel that a description of the mechanisms of restriction of these retrocopies is not a major point of this paper and is beyond the scope of the current investigation.

Reviewer #1 (Significance):

Minor advance compared to Ito et al., 2020.

We respectfully and rigorously disagree with this assessment. Please refer back to the reviewer’s first comment. We defer, again, to reviewer 3’s assessment that our work presents a “conceptual advance that the antiviral gene expansion has achieved not only via tandem gene duplication but also via gene retrocopying”. Moreover, we must point out that the Ito et al., 2020 paper was entirely computational; indeed, several retrogenes that could computationally be predicted to be “dead” were confirmed by us as having antiviral activity.

Reviewer #2 (Evidence, reproducibility and clarity):


Yang et al. study the expansion of APOBEC3 (A3) cytidine deaminases genes in primates. Authors find A3 retrocopies in several lineages in primates using Blast searches. Some are old and some are species specific. Some have disablements and some have intact ORFs. Authors study their mode of evolution, expression and functionality. Authors have performed detailed analyses including functional analyses. Some A3 retrocopies are broadly expressed and some have retained ability to restrain retroelements. I agree with the authors that their data supports that retrocopying has contributed the turnover in the repertoire of host retroelement restriction factors. Authors show that some retrocopies have remained active for long periods of time and they still show that they can restrict retroelements/retrovirus. This work provides an interesting example of immune system diversification. This study of the A3 family of proteins that are part of the vertebrate innate immune system and the data supporting turnover of these kind of immune system genes is strong. The work underscores that this is a way immunity genes evolve and it has parallels in the evolution of the TRIM gene family of immune genes. I just have a few comments. I think the work can gain from analyzing some aspects of the data in more detail and presenting the big picture in a summary table, even if it is just supplementary.

Major comments:

1) A3I is in many species. Does this mean it was preserved (i.e., functional for a while)? For how long have disabling mutations been accumulating? Can we get a sense of that? Even for other retrocopies, do we have a sense of how recent has the pseudogenization been? If it is very recent that means that the gene was active until not long ago.

Our analyses suggest that A3I was born in the common ancestor of simian primates and pseudogenized before the Catarrhini/Platyrrhini split. It is possible that A3I was functional within this extended period (~12-15 million years), but the presence of a shared truncating stop codon amongst all simian A3Is suggests the gene was no longer full-length at the time of diversification of the simians. Instead, the simian LCA likely encoded an A3I with a predicted ORF of 261 codons; if this truncated ORF were functional, it was then further truncated/pseudogenized with additional frame-breaking mutations which follow the phylogeny of primates.

We estimated the timeline of pseudogenization of each retrocopy using the species distribution of each syntenic retrocopy. We also note that we find full-length ORFs in three retrocopies which have been retained for a period of time at least as long as the age of the last common ancestor of the four New World monkeys. These old but intact retrocopies motivated our simulations of ORF retention rates (Figure 5).

2) In the PAML analyses test could be performed to test if the rate of evolution that are higher or lower than 1 for particular genes are actually significantly higher or lower than 1 for the particular gene comparing the likelihoods of the modes with the given rate with the one with the rate fixed to 1. Is there enough power to do this?

We thank the reviewer for pointing out this omission in our analysis. We performed these tests and found a significant p-value for two of the nodes (p=0.058 and p=0.025, respectively). We have updated the legend for Figure 5—figure supplement 1 to incorporate these p-values.

3) It seems to me that the synteny data Figure 2—figure supplement 1 reveals they are derived from independent retroposition events and not duplications of segments because those would include flanking genes. Is this correct? Authors could comment on that.

Yes, we think that each retrocopy we show in Figure 2—figure supplement 1 is likely created via an independent retrotransposition event. We have clarified in the text that Figure 2—figure supplement 1 shows the genes used to establish synteny to support orthology of the retrocopies shared amongst multiple species and that each of these ortholog groups presumably originated via distinct retrotransposition events.

4) In Figure 5—figure supplement 1, I am not sure why orthologous genes are not grouped together in the phylogeny and why p is smaller than 0.05. How should that figure, and the probability be interpreted?

We thank the reviewer for their comments on this figure. First, the reviewer identified an error in the tree in which the branch labels for “night monkey-C2” and “night monkey-SS1” were inadvertently switched. The corrected tree now follows the pattern expected by the reviewer. Second, we employed RELAX to “determine whether selective strength was relaxed or intensified in one of these subsets relative to the other” (Wertheim et al., 2015). In this case, the p-value corresponds to the finding that the retrocopies (test branches) show intensification of selection relative to the intron-containing A3Gs (reference branches).

We have modified Figure 5—figure supplement 1 and the associated text to more clearly explain the specific hypothesis test we report.

5) It would be good to have a summary table that summarizes what genes have support for past or current functionality (preservation for long time or recent pseudogenization, expression, purifying or positive selection, ability to restrict retroelements) and in what lineages.

We agree with this reviewer suggestion. We have added the additional information including the number of frame disrupting mutations as a measure of age, intactness, and ability to restrict retroelements to Supplementary file 1. Thanks to this suggestion, Supplementary file 1 now serves as the master table to summarize the analyses of each retrocopy.

Reviewer #2 (Significance):

This work provides an interesting example of immune system diversification. Authors study the APOBEC3 family of proteins that is part of the vertebrate innate immune system and the data supporting turnover of these kind of immune system genes. The work underscores that this is a way immunity genes evolve and it has parallels in the evolution of the TRIM gene family of immune genes revealing patterns in the mode of evolution of immunity genes. The audience of this work will be people interested in evolution of immunity, arms races and gene diversification and all evolutionary biologists interested in adaptation. I work in the field of comparative genomics and molecular evolution.

Reviewer #3 (Evidence, reproducibility and clarity):


This manuscript by Yang et al. is a well-written, intriguing paper highlighting the evolutionary significance of gene creation via "retrocopying". The authors investigated the expansion of antiviral A3 genes via retrocopy in primates and found that A3G-like retrocopies have been generated repeatedly during primate evolution. Some of the A3 retrocopies found in New World monkeys retained full-length open reading frames and anti-lentiviral capacities. Interestingly, the spectrum of anti-retroelement activity of A3 retrocopies was different from the original (i.e., intron-containing) A3G gene in these species, suggesting the occurrence of functional differentiation following gene amplification. However, one of the main findings, that many A3 retrocopies are present in New World monkeys, is in line with a previous report (i.e., Ito et al., 2020), and the experimental validations were based on human (not New World monkey) retroelements. Nevertheless, this study deeply investigated the possible importance of A3 retrocopies for the evolution of the host defense system by both in silico analyses and cell culture experiments. This study provides findings that can potentially expand our knowledge of the evolutionary arms races between retroelements and their hosts.


To strengthen the impact of this work, it would be better to increase the numbers of retroviruses in which the anti-retroviral capacities are investigated. I understand that it is difficult to examine retroviruses or L1s that are colonized naturally with New World monkeys, but I suppose it is not so difficult to investigate a variety of representative retroviruses such as murine leukemia virus (MLV) or the reconstructed human endogenous retrovirus K (HERV-Kcon). This additional experiment would be helpful to highlight that the spectrum of anti-retroviral activity of A3 retrocopies is divergent from the original A3G gene in these species and strengthen the concept to be proposed by this study.

The reviewer raises a fascinating question about whether retrocopies might have different restriction abilities relative to the other A3s in a given species. First, we feel that showing activity against one pathogen is sufficient for our claim that some of the A3 retrocopies have antiviral potential. Second, we discuss in the paper the idea that HIV-1 is not the actual target of these (or any) innate immune genes in New World primates. We argue that any other targets we might test would also be surrogates for the “true” target of these genes.


1) Since the authors found the expansion of the "functional" repertoire of A3 retrocopies specifically in New World monkeys, it would be better to rephrase the title as

"Retrocopying expands the functional repertoire of APOBEC3 antiviral proteins in New World monkeys".

We thank the reviewer for this comment but point out that a large portion of our manuscript presents our work on primates outside the New World monkeys. The reviewer is correct to note that our finding of restriction activity is limited to New World monkey retrocopies, but we feel that the current title will attract a broader audience and reflects the broader relevance of this work.

2) It might be better to add a figure summarizing which A3 retrocopies in which species retain nearly full length ORFs. For example, how about making a figure like Figure 2A for all the four representative New World monkey species?

We agree. We have added the length of the longest ORF for each retrocopy to Supplementary file 1.
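
As an illustration of the quantity reported here, the following is a minimal sketch of how a longest-ORF length could be computed for a retrocopy sequence. It rests on assumptions not stated in the response: an ORF is taken to run from an ATG to the first in-frame stop codon, only the three forward-strand frames are scanned, and the function name `longest_orf_length` is hypothetical. It is not the authors' pipeline.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Return the length, in codons excluding the stop codon, of the
    longest ATG-to-stop ORF in the three forward reading frames.

    Assumption: ORFs that run off the end of the sequence without
    reaching a stop codon are not counted.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # look for the next ATG in this frame
    return best


if __name__ == "__main__":
    # Toy sequence: ATG AAA TAG in frame 0, i.e. two codons before the stop.
    print(longest_orf_length("ATGAAATAG"))
```

Applying such a function to every retrocopy and tabulating the result beside the parental A3G coding length would give the kind of per-retrocopy summary the reviewer asked for; scanning the reverse complement as well would be a natural extension if strand is not known in advance.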

3) Figure 3. It would be helpful to clarify which cells of the heatmap correspond to the intact A3 retrocopies.

We have added labels to indicate the intact A3 retrocopies and adjusted the legend accordingly.

4) Introduction, paragraph four. It would be better to replace the word "protected" with "escaped" because this retrocopy subset should include the ones that are intact but not functional.

Changed as suggested.

5) Introduction, final paragraph. It would be better to rephrase "the common ancestor of mammals" as "the common ancestor of placental mammals" because A3 genes are absent in marsupials.

Changed as suggested.

6) Introduction, final paragraph. Please rephrase "ongoing" as "recently-occurred".

Changed as suggested.

7) Results, paragraph three. I checked the multiple sequence alignment in Supplementary file 3 and suspect that the codon (alignment) position of the shared premature stop codon is 261 (not 264).

We thank the reviewer for pointing out this discrepancy. We have revised the text to reflect the correct position of the shared stop.

8) Results paragraph three. I could not understand the meaning of the sentence "Intriguingly, one lineage-specific mutation…". Please specify the position of mutation which the authors mentioned (in Supplementary file 3 or Figure 1B).

This portion of the text refers to a reversion of a stop codon in the orangutan A3I; specifically, the stop codon shared in all simians acquired a second mutation that created a longer ORF in only this species. We have removed this sentence from the text for the sake of clarity.

9) Subsection “Retention of putatively functional NWM A3G retrogenes” paragraph five

Please refer to Figure 5—figure supplement 1 here.

Changed as suggested.

10) Subsection “Retention of putatively functional NWM A3G retrogenes” paragraph five

Please say "Significant relaxed selection was not detected" rather than "Our analysis detected no relaxation…".

Changed as suggested.

11) Figure 5—figure supplement 1 indicates "p=0.015", but the authors regard it as "not significant"?

We thank the reviewer for pointing out this confusing wording. We employ RELAX to “determine whether selective strength was relaxed or intensified in one of these subsets relative to the other” (Wertheim et al., 2014). In this case, the p-value corresponds to the finding that the retrocopies show intensification of selection.

We have modified Figure 5—figure supplement 1 to more clearly explain the specific hypothesis test for this p-value. We have also modified the text to clarify this point.

12) Subsection “Retention of putatively functional NWM A3G retrogenes” paragraph five

Please refer here to the data supporting the claim "Instead, these A3G retrocopies have evolved more rapidly than…".

Changed as suggested; see previous point.

13) Did the authors perform the statistical test on the dN/dS ratio analysis? If so, please mention the result of the test.

Yes, we did. Please refer to reviewer #2’s major point 3.

14) It would be better to modify the phrase "show evidence of recurrent selection for functional innovation".

Changed as suggested.

Reviewer #3 (Significance):

This study provides a conceptual advance that the antiviral gene expansion has achieved not only via tandem gene duplication but also via gene retrocopying.

Compare to existing published knowledge.

Although one of the main findings, that many A3 retrocopies are present in New World monkeys, is in line with a previous report (i.e., Ito et al., 2020), this study investigated that finding more deeply, both by in silico analyses and cell culture experiments.


Evolutionary biologists and researchers in the field of viruses (particularly retroviruses including HIV-1) and transposable elements would be interested in this work.

Your expertise.

Bioinformatics, genome biology, viruses, and transposable elements.


Article and author information

Author details

  1. Lei Yang

    Pacific Northwest Research Institute, Seattle, United States
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9284-1744
  2. Michael Emerman

    Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, United States
    Conceptualization, Supervision, Funding acquisition, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Harmit S Malik

    1. Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, United States
    2. Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, United States
    Conceptualization, Supervision, Funding acquisition, Methodology, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
  4. Richard N McLaughlin Jnr

    1. Pacific Northwest Research Institute, Seattle, United States
    2. Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, United States
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0950-2253


National Institute of General Medical Sciences (GM112941)

  • Richard N McLaughlin Jr

Helen Hay Whitney Foundation

  • Richard N McLaughlin Jr

National Institute of Allergy and Infectious Diseases (AI3092)

  • Michael Emerman

G. Harold and Leila Y. Mathers Foundation

  • Harmit S Malik

National Institute of General Medical Sciences (GM074108)

  • Harmit S Malik

Howard Hughes Medical Institute

  • Harmit S Malik

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


We thank members of the Malik, Emerman, and McLaughlin labs for valuable discussions. We especially thank Janet Young for comments and suggestions critical for the generation of this manuscript, and Lily Wu for technical training and consultation on virus restriction assays. This work was supported by a Howard Hughes Medical Institute postdoctoral fellowship of the Helen Hay Whitney Foundation, a National Institute of General Medical Sciences (NIGMS) at the National Institutes of Health (NIH) K99/R00 Pathway to Independence Award (grant number GM112941) to RNM; National Institute of Allergy and Infectious Diseases at the NIH R01 (grant number AI3092) to ME; grants from the Mathers Foundation and an NIGMS at the NIH R01 (grant number GM074108) to HSM. HSM is an Investigator of the Howard Hughes Medical Institute.

Senior and Reviewing Editor

  1. Karla Kirkegaard, Stanford University School of Medicine, United States

Publication history

  1. Received: May 1, 2020
  2. Accepted: May 13, 2020
  3. Version of Record published: June 1, 2020 (version 1)


© 2020, Yang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


