1. Cancer Biology
  2. Evolutionary Biology

Pervasive duplication of tumor suppressors in Afrotherians during the evolution of large bodies and reduced cancer risk

  1. Juan M Vazquez
  2. Vincent J Lynch (corresponding author)
  1. Department of Human Genetics, The University of Chicago, United States
  2. Department of Biological Sciences, University at Buffalo, United States
Research Article
Cite this article as: eLife 2021;10:e65041 doi: 10.7554/eLife.65041
4 figures, 2 tables and 5 additional files


Figure 1
Large-bodied Afrotherians are nested within species with smaller body sizes (Tacutu et al., 2013; Puttick and Thomas, 2015).

(A) Phylogenetic relationships between Eutherian orders; examples of each order are given in parentheses. Horizontal branch lengths are proportional to time since divergence between lineages (see scale, millions of years ago [MYA]). The clades Atlantogenata and Boreoeutheria are indicated; the order Proboscidea is colored blue, Sirenia orange, and Hyracoidea red. (B) Phylogenetic relationships of extant and recently extinct Atlantogenatans with available genomes, shown with clade names and maximum body sizes. Horizontal branch lengths are arbitrary, species marked with a skull and crossbones are extinct, and species in parentheses do not have genomes. The order Proboscidea is colored blue, Sirenia orange, and Hyracoidea red.

Figure 2
Convergent evolution of large-bodied, cancer-resistant Afrotherians.

(A) Atlantogenatan phylogeny with branch lengths scaled by log2 change in body size (left) or log2 change in intrinsic cancer risk (right). Branches are colored according to ancestral state reconstructions of body mass or estimated intrinsic cancer risk. Clades and lineages leading to extant Proboscideans and dwarf elephants are labeled. (B) Reconstructions of extant and ancestral body size (left), lifespan (middle), and estimated intrinsic cancer risk (right); data are shown as mean (dot) and 95% confidence interval (CI, whiskers).
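Scaling a branch by "log2 change" means its length is the log2 fold change between the reconstructed ancestral value and the descendant value. The sketch below illustrates this in Python. For the cancer-risk panel it assumes, for illustration only, that intrinsic cancer risk scales with body mass D times lifespan L raised to the sixth power (a simple multistage model); that exponent and all numbers are assumptions made here, not values given in the caption.

```python
import math


def log2_change(ancestral: float, descendant: float) -> float:
    """Branch length as the log2 fold change from ancestor to descendant."""
    return math.log2(descendant / ancestral)


def relative_cancer_risk(body_mass: float, lifespan: float, stages: int = 6) -> float:
    """Intrinsic cancer risk up to a constant factor.

    Assumes risk ~ body_mass * lifespan**stages (an illustrative multistage
    model; the exponent is an assumption, not taken from the figure).
    """
    return body_mass * lifespan ** stages


# Hypothetical ancestral and descendant values, not data from this study.
ancestor = {"mass_kg": 100.0, "lifespan_yr": 40.0}
descendant = {"mass_kg": 400.0, "lifespan_yr": 80.0}

mass_branch = log2_change(ancestor["mass_kg"], descendant["mass_kg"])
risk_branch = log2_change(
    relative_cancer_risk(ancestor["mass_kg"], ancestor["lifespan_yr"]),
    relative_cancer_risk(descendant["mass_kg"], descendant["lifespan_yr"]),
)
print(f"log2 change in body mass: {mass_branch:.2f}")   # 2.00
print(f"log2 change in cancer risk: {risk_branch:.2f}")  # 8.00
```

Under this assumed model, quadrupling body mass contributes 2 to the risk branch while doubling lifespan contributes 6, which is why the two trees in panel A can have very different shapes even when body size and lifespan change together.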

Figure 3 with 2 supplements
Pervasive duplication of tumor suppressors in Atlantogenata.

(A) Afrotherian phylogeny indicating the number of genes duplicated in each lineage, inferred by maximum likelihood with Bayesian posterior probability (BPP) ≥0.80. Branches are colored according to log2 change in body size. Inset, phylogeny with branch lengths proportional to gene expression changes per gene. (B) UpSet plot of cancer-related Reactome pathways enriched in each Afrotherian lineage; lineages in which the percentage of enriched cancer pathways is less than background are shown in gray. Note that UpSet plots show intersections between sets in a matrix layout, as an alternative to Venn or Euler diagrams; lines indicate intersections in pathway terms between the lineages they connect (for example, the line connecting the points for Aardvark and Tenrec indicates pathways shared by those two lineages), and empty sets are not shown. (C) Word cloud of pathways enriched exclusively in the Proboscidean stem lineage (purple), shared between Proboscidea and Tethytheria (blue), or shared between Proboscidea and any other lineage (green).
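The intersections drawn in an UpSet plot come from a simple computation: each element (here, a pathway) is assigned to the exact combination of sets (lineages) that contain it. The Python sketch below shows that grouping; the lineage and pathway names are hypothetical placeholders, not data from the figure.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def upset_intersections(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Assign each element to the exact combination of sets containing it."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for set_name, members in sets.items():
        for item in members:
            owners[item].add(set_name)
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for item, names in owners.items():
        groups[frozenset(names)].add(item)
    # Only non-empty combinations are ever created, matching the
    # convention that empty intersections are not drawn.
    return dict(groups)


# Hypothetical lineage -> enriched-pathway sets, for illustration only.
enriched = {
    "Aardvark": {"pathway_A", "pathway_B"},
    "Tenrec": {"pathway_B", "pathway_C"},
    "Proboscidea": {"pathway_C", "pathway_D"},
}

for combo, pathways in sorted(upset_intersections(enriched).items(),
                              key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(combo)), "->", sorted(pathways))
```

In this toy input, "pathway_B" is the only member of the Aardvark and Tenrec combination, which corresponds to a line joining those two lineages. A pathway found only in the Proboscidea combination would fall in the "exclusive" category used for the word cloud in panel C.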

Figure 3—figure supplement 1
Estimated Copy Number by Coverage (ECNC) consolidates fragmented genes while accounting for missing domains in homologs.

(A) A single, contiguous gene homolog in a target genome with 100% query length coverage has an ECNC of 1.0. (B) Two contiguous gene homologs, each with 100% query length coverage, have an ECNC of 2.0. (C) A single gene homolog split across multiple scaffolds and contigs in a fragmented target genome; BLAT identifies each fragment as a separate hit. Per nucleotide of query sequence, there is only one corresponding nucleotide over all the hits, so the ECNC is 1.0. (D) Two gene homologs, one fragmented and one contiguous. 100% of nucleotides in the query sequence are represented among all hits; however, every nucleotide in the query has two matching nucleotides in the target genome, so the ECNC is 2.0. (E) One true gene homolog in the target genome, plus multiple hits of a conserved domain that span 20% of the query sequence. While 100% of the query sequence is represented in total, 20% of the nucleotides have four hits. Thus, the ECNC for this gene is 1.45. (F) Two true gene homologs; one hit is contiguous, one hit is fragmented in two, and the tail end of both sequences was not identified by BLAT because of sequence divergence. Only 75% of the query sequence is covered in total by the hits, but over that 75%, each nucleotide has two hits. As such, the ECNC for this gene is 2.0.
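The panels describe ECNC as the number of hits per query nucleotide, counted only over the nucleotides that some hit covers. The Python sketch below follows that reading. It assumes each hit is given as a half-open (start, end) interval in query coordinates, which is a hypothetical input format rather than the study's actual BLAT parsing. Under this definition it reproduces the values stated for panels (A) to (D) and (F).

```python
from typing import Iterable, Tuple


def ecnc(query_length: int, hits: Iterable[Tuple[int, int]]) -> float:
    """Estimated Copy Number by Coverage for one query gene.

    hits: half-open (start, end) query coordinates of each aligned
    fragment (a hypothetical input format). Returns the mean number of
    hits per query nucleotide, averaged over covered nucleotides only.
    """
    depth = [0] * query_length
    for start, end in hits:
        # Clip to the query and add one to every nucleotide this hit covers.
        for pos in range(max(0, start), min(query_length, end)):
            depth[pos] += 1
    covered = [d for d in depth if d > 0]
    if not covered:
        return 0.0  # no homolog detected
    return sum(covered) / len(covered)


# Hypothetical 100-nt query, mirroring the panel layouts.
panels = {
    "A": [(0, 100)],
    "B": [(0, 100), (0, 100)],
    "C": [(0, 40), (40, 70), (70, 100)],
    "D": [(0, 100), (0, 50), (50, 100)],
    "F": [(0, 75), (0, 40), (40, 75)],
}
for name, hits in panels.items():
    print(name, ecnc(100, hits))
```

Averaging only over covered positions is what lets a fragmented or partially detected gene (panels C and F) still count as whole copies rather than fractions.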

Figure 3—figure supplement 2
Correlations between genome quality metrics and ECNC metrics.

Gene copy number metrics, and the genome quality metrics most strongly associated with them, are highlighted in red.

Figure 4
Duplications in the African savannah elephant (Loxodonta africana) are enriched for TP53-related and other tumor suppressor processes.

(A) UpSet plot of cancer-related Reactome pathways in the African savannah elephant, highlighting shared genes in each set and the pathway class represented by each combination (see Figure 3 for a description of UpSet plots). (B) Inverted UpSet plot from A showing the pathways shared by the genes highlighted by WebGestalt in each pathway. (C) Cladogram of Afrotheria with sequenced genomes. Exemplar tumor suppressor duplicates are mapped onto the lineages in which those genes are duplicated; dots represent a duplication event of the color-coded genes. Note that we are unable to determine the duplication status of some genes in Proboscideans because of assembly gaps in ancient genomes (indicated with skull and crossbones); these genes appear to be independently duplicated in extant species (African forest, African savannah, and Asian elephants) because they are missing from the ancient genomes, which biases ancestral reconstructions of duplication status. (D) Gene expression levels of genes from panel C that have two or more expressed duplicates.

Figure 4—source data 1

Data set used for manual coding of gene potential, associated with Figure 4C,D.



Table 1
Genomes used in this study.
| Species | Common name | Genomes | Highest-quality genome | Reference(s) |
|---|---|---|---|---|
| Choloepus hoffmanni | Hoffmann's two-toed sloth | choHof1, choHof-C_hoffmanni-2.0.1_HiC | choHof-C_hoffmanni-2.0.1_HiC | Dudchenko et al., 2017 |
| Chrysochloris asiatica | Cape golden mole | chrAsi1m | chrAsi1m | GCA_000296735.1 |
| Dasypus novemcinctus | Nine-banded armadillo | dasNov3 | dasNov3 | GCA_000208655.2 |
| Echinops telfairi | Lesser hedgehog tenrec | echTel2 | echTel2 | GCA_000313985.1 |
| Elephantulus edwardii | Cape elephant shrew | eleEdw1m | eleEdw1m | GCA_000299155.1 |
| Elephas maximus | Asian elephant | eleMaxD | eleMaxD | Palkopoulou et al., 2018 |
| Loxodonta africana | African savanna elephant | loxAfr3, loxAfr4 | loxAfr4 | ftp://ftp.broadinstitute.org/pub/assemblies/mammals/elephant/loxAfr4 |
| Loxodonta cyclotis | African forest elephant | loxCycF | loxCycF | Palkopoulou et al., 2018 |
| Mammut americanum | American mastodon | mamAmeI | mamAmeI | Palkopoulou et al., 2018 |
| Mammuthus columbi | Columbian mammoth | mamColU | mamColU | Palkopoulou et al., 2018 |
| Mammuthus primigenius | Woolly mammoth | mamPriV | mamPriV | Palkopoulou et al., 2015 |
| Orycteropus afer | Aardvark | oryAfe1, oryAfe2 | oryAfe2 | Dudchenko et al., 2017 |
| Palaeoloxodon antiquus | Straight-tusked elephant | palAntN | palAntN | Palkopoulou et al., 2018 |
| Procavia capensis | Rock hyrax | proCap1, proCap2, proCap-Pcap_2.0_HiC | proCap-Pcap_2.0_HiC | Dudchenko et al., 2017; Lindblad-Toh et al., 2011 |
| Trichechus manatus latirostris | Manatee | triMan1, triManLat2 | triManLat2 | Dudchenko et al., 2017; Foote et al., 2015 |
Table 2
Summary of Reactome pathways in Atlantogenata.

| Species | Number of genes | Number of pathways | Cancer pathways (%) | Simulated cancer pathways (%) | Cancer pathways greater than simulated? |
|---|---|---|---|---|---|
| Chrysochloris asiatica | 1591 | 100 | 27.00 | 15.42 | Yes |
| Echinops telfairi | 587 | 100 | 22.00 | 15.42 | Yes |
| Elephantulus edwardii | 2103 | 100 | 22.00 | 15.42 | Yes |
| Elephas maximus | 94 | 32 | 40.63 | 17.73 | Yes |
| Loxodonta africana | 100 | 47 | 53.19 | 15.42 | Yes |
| Loxodonta cyclotis | 76 | 35 | 34.29 | 16.11 | Yes |
| Mammut americanum | 52 | 16 | 0.00 | 12.91 | No |
| Mammuthus columbi | 28 | 26 | 26.92 | 12.88 | Yes |
| Mammuthus primigenius | 35 | 16 | 0.00 | 12.28 | No |
| Orycteropus afer | 504 | 100 | 38.00 | 15.42 | Yes |
| Palaeoloxodon antiquus | 35 | 8 | 0.00 | 12.28 | No |
| Procavia capensis | 383 | 35 | 2.86 | 15.42 | No |
| Trichechus manatus | 484 | 47 | 21.28 | 15.42 | Yes |
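The "Cancer pathways (%)" column is the share of a lineage's enriched pathways that are cancer-related, and the last column compares it with a simulated background. The Python sketch below shows that arithmetic using the Chrysochloris asiatica row (27.00% of 100 pathways is 27 cancer pathways). The table does not describe how the simulated percentages were produced, so the resampling background here (random pathway sets of the same size drawn from a labeled pathway universe) is a hypothetical stand-in, and the universe is invented for illustration.

```python
import random
from typing import List


def cancer_percentage(n_cancer: int, n_pathways: int) -> float:
    """Share of enriched pathways that are cancer-related, in percent."""
    return 100.0 * n_cancer / n_pathways


def simulated_percentage(universe: List[bool], n_pathways: int,
                         n_draws: int = 1000, seed: int = 0) -> float:
    """Mean cancer percentage over random pathway sets of the same size.

    universe: one flag per candidate pathway, True if cancer-related
    (a hypothetical background, not the study's simulation procedure).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        draw = rng.sample(universe, n_pathways)
        total += cancer_percentage(sum(draw), n_pathways)
    return total / n_draws


# Chrysochloris asiatica row: 27.00% of 100 pathways is 27 cancer pathways.
observed = cancer_percentage(27, 100)
print(f"{observed:.2f}% vs 15.42% simulated:", "Yes" if observed > 15.42 else "No")

# Hypothetical universe of 200 pathways, 30 of them cancer-related.
universe = [True] * 30 + [False] * 170
print(f"simulated background: {simulated_percentage(universe, 100):.2f}%")
```

Because the background depends on how many pathways each lineage has, a per-lineage simulated value (as in the table) is a fairer comparison than a single global rate.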

Additional files

Source data 1

All necessary data sets and scripts to reproduce results presented in this manuscript.

Supplementary file 1

Summary of duplications in Atlantogenata.

Supplementary file 2

RNA-Seq data sets used in this study, along with key biological and genome information.

Supplementary file 3

Summary of PGLS model used to estimate lifespan.

Transparent reporting form
