Dynamics of genomic innovation in the unicellular ancestry of animals

  1. Xavier Grau-Bové  Is a corresponding author
  2. Guifré Torruella
  3. Stuart Donachie
  4. Hiroshi Suga
  5. Guy Leonard
  6. Thomas A Richards
  7. Iñaki Ruiz-Trillo  Is a corresponding author
  1. Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Catalonia, Spain
  2. Universitat de Barcelona, Catalonia, Spain
  3. Université Paris-Sud/Paris-Saclay, AgroParisTech, France
  4. University of Hawai'i at Mānoa, United States
  5. Prefectural University of Hiroshima, Japan
  6. University of Exeter, United Kingdom
  7. ICREA, Passeig Lluís Companys, Catalonia, Spain
10 figures

Figures

Figure 1 with 1 supplement
Evolutionary framework and genome statistics of the study.

(A) Schematic phylogenetic tree of eukaryotes, with a focus on the Holozoa. The adjacent table summarizes genome assembly/annotation statistics. Data sources: red asterisks denote Teretosporea …

https://doi.org/10.7554/eLife.26036.003
Figure 1—source data 1

Table of genome structure statistics, from the dataset of eukaryotic genomes used in the study.

Includes genome size and portion of the genome covered by genes, exons, introns and intergenic regions. Used in Figures 1, 3 and 4.

https://doi.org/10.7554/eLife.26036.004
Figure 1—source data 2

List of genome and transcriptome assemblies and annotations, including abbreviations, taxonomic classification and data sources.

Used in Figure 1.

https://doi.org/10.7554/eLife.26036.005
Figure 1—figure supplement 1
Comparisons of gene length of one-to-one orthologs from pair-wise comparisons of all 10 unicellular Holozoa.

Dots around the diagonal lines indicate that orthologs from both organisms have identical lengths. Note that Abeoforma and Pirum have abundant incomplete orthologous sequences.

https://doi.org/10.7554/eLife.26036.006
Figure 2 with 1 supplement
Phylogenomic tree of Unikonta/Amorphea.

Phylogenomic analysis of the BVD57 taxa matrix. Tree topology is the consensus of two Markov chain Monte Carlo chains run for 1231 generations, saving every 20 trees and after a burn-in of 32%. …

https://doi.org/10.7554/eLife.26036.007
Figure 2—source data 1

BVD57 phylogenomic dataset (Torruella et al., 2015) including 87 unaligned protein domains (with PFAM accession number) per species.

Used in Figure 2.

https://doi.org/10.7554/eLife.26036.008
Figure 2—figure supplement 1
Phylogenomic analysis of the BVD57 matrix using (A) IQ-TREE maximum likelihood and the LG+R7+C60 model (supports are SH-like approximate likelihood ratio test/UFBS, respectively); (B) IQ-TREE maximum likelihood and the LG+R7+PMSF model (fast CAT approximation; non-parametric bootstrap supports); and (C) PhyloBayes Bayesian inference under the LG+Γ7+CAT model (BPP supports).
https://doi.org/10.7554/eLife.26036.009
Figure 3 with 3 supplements
Patterns of genome evolution across unicellular Holozoa.

(A) Genome size and composition in terms of coding exonic, intronic and intergenic sequences of unicellular holozoan and selected metazoans. Percentage of repetitive sequences shown as black bars. …

https://doi.org/10.7554/eLife.26036.010
Figure 3—source data 1

Annotated repetitive sequences from 10 unicellular Holozoa genomes.

Includes transposable elements, simple repeats, low complexity regions and small RNAs. Used in Figure 3.

https://doi.org/10.7554/eLife.26036.011
Figure 3—source data 2

List of annotated transposable element families in 10 unicellular Holozoa genomes, with copy counts.

Used in Figure 3.

https://doi.org/10.7554/eLife.26036.012
Figure 3—source data 3

List of annotated transposable element families shared between the genomes of 10 unicellular holozoans and 11 animals, including the number of species where the TE family is present.

Three lists are included: all TE families present in any holozoan; a list restricted to the most abundant TE families, which together account for 75% of all copies in each holozoan (P75f statistic; see Figure 3B); and the equivalent list for 25% of copies (P25f statistic). Used in Figure 3.

https://doi.org/10.7554/eLife.26036.013
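
The P75f and P25f lists described above can be illustrated with a minimal sketch. It assumes that the statistic is obtained by ranking TE families by copy number and keeping the most abundant ones until their cumulative copies reach the chosen share of the total; the function name, the tie-breaking rule and the example counts are hypothetical and are not taken from the study.

```python
def top_families_by_copy_share(copy_counts, share):
    """Return the most abundant families whose cumulative copy count
    first reaches `share` (0 < share <= 1) of all copies.

    Assumption: this cumulative-share rule is one plausible reading of the
    P75f/P25f statistics; the study's exact procedure may differ.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    total = sum(copy_counts.values())
    if total == 0:
        return []
    # Rank by decreasing copy number; break ties alphabetically (arbitrary choice).
    ranked = sorted(copy_counts.items(), key=lambda item: (-item[1], item[0]))
    selected = []
    cumulative = 0
    for family, copies in ranked:
        selected.append(family)
        cumulative += copies
        if cumulative >= share * total:
            break
    return selected


# Hypothetical copy counts for a single genome (illustrative values only).
example_counts = {"Gypsy": 50, "Copia": 25, "hAT": 15, "Mariner": 10}
p75_families = top_families_by_copy_share(example_counts, 0.75)
p25_families = top_families_by_copy_share(example_counts, 0.25)
```

Under this reading, a genome dominated by a few families yields a short list, whereas a genome with an even TE complement yields a long one.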
Figure 3—figure supplement 1
Profile of TE composition of unicellular Holozoa.

(A-J) Profile of transposable element (TE) composition of 10 unicellular Holozoa, including (i) distribution of sequence similarity frequencies within the TE complement obtained from BLAST …

https://doi.org/10.7554/eLife.26036.014
Figure 3—figure supplement 2
Shared TEs between unicellular Holozoa and animal genomes.

(A) Pattern of presence/absence of TE families across Holozoa (11 animals and 10 unicellular holozoans). Dendrogram at the left represents the sorting of TE families by Euclidean distance and Ward …

https://doi.org/10.7554/eLife.26036.015
Figure 3—figure supplement 3
Heatmap of pairwise ratios of ortholog collinearity between 10 unicellular holozoan genomes.

Species are manually ordered by taxonomic classification (no clustering).

https://doi.org/10.7554/eLife.26036.016
Figure 4
Intron abundance in eukaryotes.

(A) Distribution of intron lengths and number of introns per gene in selected eukaryote genomes. Dots represent median intron lengths and vertical lines delimit the first and third quartiles. Color …

https://doi.org/10.7554/eLife.26036.017
Figure 5 with 3 supplements
Intron evolution.

(A) Rates of intron gain and loss per lineage, including extant genomes and ancestral reconstructed nodes. Diameter and color of circles denote the number of introns per kbp of coding sequence at …

https://doi.org/10.7554/eLife.26036.018
Figure 5—source data 1

Rates of gain and loss of intron sites for extant and ancestral eukaryotes, calculated for a rates-across-sites Markov model for intron evolution with branch-specific gain and loss rates (Csurös, 2008).

Used in Figure 5.

https://doi.org/10.7554/eLife.26036.019
Figure 5—source data 2

Reconstruction of intron site evolutionary histories, using a rates-across-sites Markov model for intron evolution, with branch-specific gain and loss rates (Csurös, 2008).

Used in Figures 5 and 6.

https://doi.org/10.7554/eLife.26036.020
Figure 5—source data 3

Reconstruction of the evolution of the NMD machinery (He and Jacobson, 2015) and key SR splicing factors (Plass et al., 2008).

Used in Figure 5.

https://doi.org/10.7554/eLife.26036.021
Figure 5—figure supplement 1
Classification of intron sites by conservation in protein alignments, as used in Csűrös and Miklós (2006) and Csurös (2008).

Grey boxes denote aligned amino acids with gaps (dashed lines). Intron sites (vertical lines) are conserved if they are present in various organisms at the same alignment position and codon phase. …

https://doi.org/10.7554/eLife.26036.022
Figure 5—figure supplement 2
Phylogenetic distribution of the NMD machinery, SR splicing factors and RNA-binding domains in eukaryotes.

(A) Phylogenetic distribution of the NMD molecular toolkit across eukaryotes, as defined in Whelan et al. (2015), with a focus on unicellular holozoans and selected metazoans. The analysis includes …

https://doi.org/10.7554/eLife.26036.023
Figure 5—figure supplement 3
Phylogenetic analysis of (A) eIF4A3, (B) Smg5/6/7, and (C) eRF3, using maximum likelihood in IQ-TREE (supports are SH-like approximate likelihood ratio test/UFBS, respectively), including Bayesian inference supports for the orthologous groups of interest (BPP statistical supports, in red).
https://doi.org/10.7554/eLife.26036.024
Figure 6
Profile of intron site presence across eukaryotes.

(A) Heatmap representing presence/absence of 4312 intron sites (columns) from extant and ancestral holozoan genomes, plus the line of ascent to the LECA (rows). Intron sites and genomes have been …

https://doi.org/10.7554/eLife.26036.025
Figure 7 with 1 supplement
Evolution of protein domain architectures.

(A) Protein domain combination gain and loss per lineage, including extant genomes and ancestral reconstructed nodes. Diameter and color of circles denote the number of different domain combinations …

https://doi.org/10.7554/eLife.26036.026
Figure 7—source data 1

Rates of gain and loss of protein domain pairs within a given orthogroup for extant and ancestral eukaryotes, calculated for a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications (Csurös, 2010).

Used in Figures 7, 8 and 9.

https://doi.org/10.7554/eLife.26036.027
Figure 7—source data 2

Reconstruction of the evolutionary histories of protein domain pair gains within orthogroups, using a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications (Csurös, 2010).

Used in Figures 7, 8, 9 and 10.

https://doi.org/10.7554/eLife.26036.028
Figure 7—source data 3

Reconstruction of the evolutionary histories of individual protein domains, using Dollo parsimony and accounting for gains and losses (Csurös, 2010).

Used in Figures 7, 8, 9 and 10.

https://doi.org/10.7554/eLife.26036.029
Figure 7—source data 4

Rates of gain and loss of orthogroups for extant and ancestral eukaryotes, using a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications.

Used in Figure 1.

https://doi.org/10.7554/eLife.26036.030
Figure 7—figure supplement 1
Gains and losses of individual protein domains across eukaryotes.

(A) Ancestral reconstruction of gains (green) and losses (red) of protein domains per lineage, based on Dollo parsimony. Note that, in contrast with the evolution of protein domain combinations here …

https://doi.org/10.7554/eLife.26036.031
Figure 8 with 1 supplement
Protein domain architecture networks.

(A and B) Modularity and community size of the global network of domain pairs (upper panels) and the TF subnetwork (lower panels), with ≥90% probability. The modularity parameter measures the …

https://doi.org/10.7554/eLife.26036.032
Figure 8—figure supplement 1
Modularity of protein domain co-occurrence networks of multicellularity-related gene sets across eukaryotes.

(A–D) Modularity and community size of the functional sub-networks based on domains related to signaling (Richter and King, 2013), ubiquitination (Grau-Bové et al., 2015), ECM (Richter and King, 2013

https://doi.org/10.7554/eLife.26036.033
Figure 9 with 1 supplement
Phylogenetic analysis of the premetazoan gene families LIM Homeobox, CBP/p300 and type IV collagen.

(A and B) Protein domain co-occurrence matrices of transcription factor (TF) (A) or extracellular matrix (ECM)-related gene families (B), inferred at the LCA of Metazoa (≥90% probability). …

https://doi.org/10.7554/eLife.26036.034
Figure 9—figure supplement 1
Phylogenetic analysis of (A) LIM-Homeobox, (B) p300/CBP, and (C) Collagen Type IV, using maximum likelihood in IQ-TREE (supports are SH-like approximate likelihood ratio test/UFBS, respectively) and Bayesian inference in MrBayes (BPP statistical supports).
https://doi.org/10.7554/eLife.26036.035
Figure 10
Domain combinations that appear in transcription factor (TF) families in unicellular premetazoans, from the LCA of Unikonta/Amorphea to the LCA of Metazoa.

First and second columns indicate the TF family and its inferred evolutionary origin, respectively (from de Mendoza et al., 2013). Subsequent columns list (i) the p-value of a Fisher's exact test …

https://doi.org/10.7554/eLife.26036.036
Figure 10—source data 1

Probability of emergence of protein domain combinations present in the LCA of Metazoa in previous ancestral nodes (from LCA of Metazoa to LCA of Unikonta/Amorphea).

Only protein domain combinations with >90% presence probability in the LCA of Metazoa were included. Protein domain combinations that are not gained with >90% probability in any of the surveyed nodes have been associated with LECA or pre-LECA origins (‘1’ values in the ‘LECA or before’ field). Used in Figure 10.

https://doi.org/10.7554/eLife.26036.037
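
The origin-assignment rule described for this table can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that each domain combination comes with one gain probability per surveyed ancestral node, ordered from the LCA of Metazoa back to the LCA of Unikonta/Amorphea, and that the first node exceeding the threshold is reported. The function name, node labels and probabilities are hypothetical.

```python
def assign_origin(gain_probabilities, threshold=0.9):
    """Return the first surveyed node whose gain probability exceeds
    `threshold`, or 'LECA or before' if no node does.

    Assumption: `gain_probabilities` maps node names to probabilities and is
    ordered from the LCA of Metazoa to the LCA of Unikonta/Amorphea
    (dicts preserve insertion order in Python 3.7+).
    """
    for node, probability in gain_probabilities.items():
        if probability > threshold:
            return node
    return "LECA or before"


# Hypothetical probabilities for two domain combinations (illustrative only).
combination_a = {
    "Metazoa": 0.02,
    "Choanozoa": 0.01,
    "Filozoa": 0.95,
    "Holozoa": 0.01,
    "Opisthokonta": 0.00,
    "Unikonta/Amorphea": 0.00,
}
combination_b = {node: 0.05 for node in combination_a}

origin_a = assign_origin(combination_a)
origin_b = assign_origin(combination_b)
```

A combination with no high-confidence gain among the surveyed nodes falls through to the 'LECA or before' category, matching the '1' values in the corresponding field of the table.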
