(A) Schematic phylogenetic tree of eukaryotes, with a focus on the Holozoa. The adjacent table summarizes genome assembly/annotation statistics. Data sources: red asterisks denote Teretosporea …
Table of genome structure statistics for the dataset of eukaryotic genomes used in this study.
List of genome and transcriptome assemblies and annotations, including abbreviations, taxonomic classification and data sources.
Used in Figure 1.
Dots along the diagonal lines indicate that orthologs from both organisms have identical lengths. Note that Abeoforma and Pirum have many incomplete orthologous sequences.
Phylogenomic analysis of the BVD57 taxa matrix. Tree topology is the consensus of two Markov chain Monte Carlo (MCMC) chains run for 1231 generations, sampling every 20th tree after a burn-in of 32%. …
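The burn-in and thinning arithmetic can be illustrated with a short Python sketch. It assumes, hypothetically, that one tree is saved per generation and that the 32% burn-in is rounded down; the helper `retained_samples` is illustrative and is not part of any MCMC sampler's API.

```python
import math


def retained_samples(n_generations: int, burnin_fraction: float, every: int) -> list[int]:
    """Indices of trees kept after discarding a burn-in and thinning the chain.

    Assumption (hypothetical): one tree per generation, burn-in rounded down.
    """
    start = math.floor(n_generations * burnin_fraction)  # first post-burn-in tree
    return list(range(start, n_generations, every))     # keep every `every`-th tree


kept = retained_samples(1231, 0.32, 20)
print(kept[0], len(kept))  # 393 42
```

Under these assumptions each chain contributes 42 trees, so pooling two chains gives 84 trees for the consensus.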
BVD57 phylogenomic dataset (Torruella et al., 2015), comprising 87 unaligned protein domains (with Pfam accession numbers) per species.
Used in Figure 2.
(A) Genome size and composition in terms of coding exonic, intronic and intergenic sequences of unicellular holozoans and selected metazoans. Percentage of repetitive sequences shown as black bars. …
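A minimal sketch of how a genome can be partitioned into coding exonic, intronic and intergenic percentages. It assumes, as a simplification, that these three categories exhaust the assembly (so any non-coding exonic sequence, such as UTRs, falls into the intergenic remainder); the function name and the toy sizes are hypothetical.

```python
def genome_composition(genome_bp: int, coding_exon_bp: int, intron_bp: int) -> dict[str, float]:
    """Percent of the genome in coding exons, introns and the intergenic remainder.

    Simplifying assumption: the three categories partition the assembly.
    """
    intergenic_bp = genome_bp - coding_exon_bp - intron_bp
    if intergenic_bp < 0:
        raise ValueError("exonic + intronic length exceeds genome size")
    return {
        "coding_exonic": 100 * coding_exon_bp / genome_bp,
        "intronic": 100 * intron_bp / genome_bp,
        "intergenic": 100 * intergenic_bp / genome_bp,
    }


# Toy numbers, not taken from any genome in the study.
print(genome_composition(1_000_000, 250_000, 150_000))
```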
Annotated repetitive sequences from 10 unicellular Holozoa genomes.
Includes transposable elements, simple repeats, low complexity regions and small RNAs. Used in Figure 3.
List of annotated transposable element families in 10 unicellular Holozoa genomes, with copy counts.
Used in Figure 3.
List of annotated transposable element families shared between the genomes of 10 unicellular holozoans and 11 animals, including the number of species where the TE family is present.
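Counting, for each TE family, the number of species in which it is present, and listing the families shared between the two groups, reduces to set operations. A minimal sketch; the species labels and the per-species family sets are toy data, and the helper names are hypothetical.

```python
from collections import Counter


def species_per_family(families_by_species: dict[str, set[str]]) -> Counter:
    """Number of species in which each TE family is present."""
    counts: Counter = Counter()
    for families in families_by_species.values():
        counts.update(families)  # a set counts each family at most once per species
    return counts


def shared_families(unicellular: dict[str, set[str]], animals: dict[str, set[str]]) -> set[str]:
    """TE families found in at least one unicellular holozoan and at least one animal."""
    uni = set().union(*unicellular.values())
    ani = set().union(*animals.values())
    return uni & ani


uni = {"sp1": {"Gypsy", "Copia"}, "sp2": {"Gypsy"}}
ani = {"sp3": {"Gypsy", "Tc1/mariner"}}
print(species_per_family({**uni, **ani}))
print(shared_families(uni, ani))
```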
(A-J) Profile of transposable element (TE) composition of 10 unicellular Holozoa, including (i) distribution of sequence similarity frequencies within the TE complement obtained from BLAST …
(A) Pattern of presence/absence of TE families across Holozoa (11 animals and 10 unicellular holozoans). Dendrogram on the left represents the sorting of TE families by Euclidean distance and Ward …
Species are manually ordered by taxonomic classification (no clustering).
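The row dendrogram groups TE families by their presence/absence profiles using Euclidean distance and Ward linkage. A naive pure-Python sketch of Ward agglomeration follows, merging at each step the pair of clusters whose union least increases the within-cluster sum of squares. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be used; this version only makes the merge criterion explicit, and the family names and 0/1 profiles are toy data.

```python
from itertools import combinations


def ward_merges(rows: dict[str, list[float]]) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering with Ward's criterion (Euclidean space).

    Returns (cluster_a, cluster_b, cost) per merge, where cost is the increase
    in within-cluster sum of squares caused by merging a and b.
    """
    # Each cluster is stored as (size, centroid).
    clusters = {frozenset([name]): (1, list(vec)) for name, vec in rows.items()}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            na, ca = clusters[a]
            nb, cb = clusters[b]
            sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
            cost = na * nb / (na + nb) * sq
            if best is None or cost < best[2]:
                best = (a, b, cost)
        a, b, cost = best
        (na, ca), (nb, cb) = clusters.pop(a), clusters.pop(b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a | b] = (na + nb, centroid)
        merges.append((a, b, cost))
    return merges


# Rows: TE families; columns: presence (1) / absence (0) in four toy species.
profiles = {"famA": [1, 1, 0, 0], "famB": [1, 1, 0, 1], "famC": [0, 0, 1, 1], "famD": [0, 0, 1, 1]}
for a, b, cost in ward_merges(profiles):
    print(sorted(a), sorted(b), cost)
```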
(A) Distribution of intron lengths and number of introns per gene in selected eukaryote genomes. Dots represent median intron lengths and vertical lines delimit the first and third quartiles. Color …
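The dots and vertical lines summarize each genome's intron-length distribution by its median and first/third quartiles. A minimal sketch using Python's `statistics` module; the caption does not state which quantile interpolation was used, so `method="inclusive"` here is an assumption, and the input lists are toy values.

```python
import statistics


def intron_summary(intron_lengths: list[int], introns_per_gene: list[int]) -> dict[str, float]:
    """Median and quartiles of intron length, plus median introns per gene.

    Assumption: 'inclusive' quantile interpolation (the study's choice is not stated).
    """
    q1, median, q3 = statistics.quantiles(intron_lengths, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "median_introns_per_gene": statistics.median(introns_per_gene),
    }


print(intron_summary([10, 20, 30, 40, 50], [0, 1, 2, 5]))
```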
(A) Rates of intron gain and loss per lineage, including extant genomes and ancestral reconstructed nodes. Diameter and color of circles denote the number of introns per kbp of coding sequence at …
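Intron density, as encoded by circle diameter and color, is the number of introns per kbp of coding sequence. A minimal sketch that pools all genes of a genome; the per-gene input format and the function name are hypothetical.

```python
def introns_per_kbp(genes: list[tuple[int, int]]) -> float:
    """Introns per kbp of coding sequence, pooled over genes.

    Each gene is (number_of_introns, coding_sequence_length_bp).
    """
    total_introns = sum(n for n, _ in genes)
    total_cds_bp = sum(bp for _, bp in genes)
    return total_introns / (total_cds_bp / 1000)


print(introns_per_kbp([(3, 1500), (1, 500)]))  # 2.0
```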
Rates of gain and loss of intron sites for extant and ancestral eukaryotes, calculated for a rates-across-sites Markov model for intron evolution with branch-specific gain and loss rates (Csurös, 2008).
Used in Figure 5.
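At the core of a branch-specific gain/loss model, each intron site can be treated as a two-state continuous-time Markov chain (0 = absent, 1 = present) with gain rate λ and loss rate μ on a branch of length t. The sketch below gives the standard closed-form transition probabilities for that chain; it is only this building block, not the full Csurös (2008) model, which additionally lets rates vary across intron sites and estimates them by likelihood. Function and parameter names are hypothetical.

```python
import math


def transition_matrix(gain: float, loss: float, t: float) -> list[list[float]]:
    """Two-state CTMC transition probabilities along one branch.

    P[i][j] = Pr(state j at the end | state i at the start), with
    state 0 = intron absent and state 1 = intron present.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    p_gain = gain / total * (1 - decay)   # absent -> present
    p_loss = loss / total * (1 - decay)   # present -> absent
    return [[1 - p_gain, p_gain], [p_loss, 1 - p_loss]]


print(transition_matrix(0.2, 0.5, 1.0))
```

As t grows, each row approaches the stationary distribution (μ/(λ+μ), λ/(λ+μ)).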
Reconstruction of intron site evolutionary histories, using a rates-across-sites Markov model for intron evolution, with branch-specific gain and loss rates (Csurös, 2008).
Reconstruction of the evolution of the NMD machinery (He and Jacobson, 2015) and key SR splicing factors (Plass et al., 2008).
Used in Figure 5.
Grey boxes denote aligned amino acids with gaps (dashed lines). Intron sites (vertical lines) are conserved if they are present in multiple organisms at the same alignment position and codon phase. …
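The conservation criterion above (same alignment position and same codon phase) can be sketched by keying each intron on an (alignment column, phase) pair and collecting the organisms that share it. A minimal sketch; the input layout, the `min_species` threshold and the toy data are assumptions for illustration.

```python
from collections import defaultdict


def conserved_sites(intron_sites: dict[str, set[tuple[int, int]]],
                    min_species: int = 2) -> dict[tuple[int, int], set[str]]:
    """Intron sites shared by at least `min_species` organisms.

    intron_sites maps organism -> {(alignment_column, phase)}, phase in {0, 1, 2}.
    Two introns count as the same site only if both column and phase match.
    """
    carriers: dict[tuple[int, int], set[str]] = defaultdict(set)
    for organism, sites in intron_sites.items():
        for site in sites:
            carriers[site].add(organism)
    return {site: orgs for site, orgs in carriers.items() if len(orgs) >= min_species}


toy = {"A": {(120, 0), (300, 2)}, "B": {(120, 0), (300, 1)}, "C": {(120, 0)}}
print(conserved_sites(toy))  # column 300 is not conserved: phases differ
```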
(A) Phylogenetic distribution of the NMD molecular toolkit across eukaryotes, as defined in Whelan et al. (2015), with a focus on unicellular holozoans and selected metazoans. The analysis includes …
(A) Heatmap representing presence/absence of 4312 intron sites (columns) from extant and ancestral holozoan genomes, plus the line of ascent to the LECA (rows). Intron sites and genomes have been …
(A) Protein domain combination gain and loss per lineage, including extant genomes and ancestral reconstructed nodes. Diameter and color of circles denote the number of different domain combinations …
Rates of gain and loss of protein domain pairs within a given orthogroup for extant and ancestral eukaryotes, calculated for a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications (Csurös, 2010).
Reconstruction of the evolutionary histories of gains of protein domain pairs within orthogroups, using a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications (Csurös, 2010).
Reconstruction of the evolutionary histories of individual protein domains, using Dollo parsimony and accounting for gains and losses (Csurös, 2010).
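Dollo parsimony assumes that each domain is gained at most once and may then be lost independently on any number of branches. A minimal sketch: the single gain is placed at the last common ancestor of all taxa carrying the domain, the domain is inferred present at every descendant node that still leads to a carrier, and a loss is placed on each branch leading to a subtree with no carriers. The tree encoding, node names and presence data are hypothetical, and this is not the interface of any particular software.

```python
def dollo(children: dict[str, list[str]], root: str, present: set[str]):
    """Dollo parsimony for one character on a rooted tree.

    Returns (gain_node, nodes_with_domain, nodes_whose_incoming_branch_has_a_loss).
    """
    below: dict[str, int] = {}  # number of carrier leaves in each subtree

    def count(node: str) -> int:
        kids = children.get(node, [])
        n = int(node in present) if not kids else sum(count(k) for k in kids)
        below[node] = n
        return n

    total = count(root)
    if total == 0:
        return None, set(), set()
    # Single gain: walk down while one child subtree still holds every carrier.
    gain = root
    while True:
        full = [k for k in children.get(gain, []) if below[k] == total]
        if not full:
            break
        gain = full[0]
    present_nodes, losses = set(), set()
    stack = [gain]
    while stack:
        node = stack.pop()
        if below[node] > 0:
            present_nodes.add(node)
            stack.extend(children.get(node, []))
        else:
            losses.add(node)  # domain lost on the branch leading to this node
    return gain, present_nodes, losses


tree = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
print(dollo(tree, "root", {"A", "C"}))
```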
Rates of gain and loss of orthogroups for extant and ancestral eukaryotes, using a phylogenetic birth-and-death probabilistic model that accounts for gains, losses and duplications.
Used in Figure 1.
(A) Ancestral reconstruction of gains (green) and losses (red) of protein domains per lineage, based on Dollo parsimony. Note that, in contrast with the evolution of protein domain combinations here …
(A and B) Modularity and community size of the global network of domain pairs (upper panels) and the TF subnetwork (lower panels), with ≥90% probability. The modularity parameter measures the …
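Modularity compares the fraction of edges falling inside communities with the fraction expected at random given node degrees: Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the edges inside community c and d_c the summed degree of its nodes. The sketch below keeps only domain pairs at or above the 90% probability threshold and then scores a given partition. The community assignment is supplied by hand here; the caption does not say which detection algorithm was used, and the node labels and probabilities are toy data.

```python
from collections import Counter, defaultdict


def confident_edges(weighted_edges, min_prob=0.9):
    """Keep domain pairs whose inferred probability is >= min_prob."""
    return [(u, v) for u, v, p in weighted_edges if p >= min_prob]


def modularity(edges, community_of) -> float:
    """Newman-Girvan modularity for an undirected, unweighted graph without self-loops."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # summed degree of nodes in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def community_sizes(community_of) -> Counter:
    """Number of nodes (domains) per community."""
    return Counter(community_of.values())


pairs = [("a", "b", 0.99), ("b", "c", 0.95), ("a", "c", 0.97), ("d", "e", 0.92),
         ("e", "f", 0.91), ("d", "f", 0.99), ("c", "d", 0.90), ("a", "f", 0.40)]
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(confident_edges(pairs), groups), community_sizes(groups))
```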
(A–D) Modularity and community size of the functional sub-networks based on domains related to signaling (Richter and King, 2013), ubiquitination (Grau-Bové et al., 2015), ECM (Richter and King, 2013…
(A and B) Protein domain co-occurrence matrices of transcription factor (TF) (A) or extracellular matrix (ECM)-related gene families (B), inferred at the LCA of Metazoa (≥90% probability). …
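A domain co-occurrence matrix records how often two domains appear together in the same protein. The sketch below counts unordered domain pairs across a list of domain architectures. It works on extant architectures only, whereas the matrices above are inferred at the LCA of Metazoa with a probability cutoff; the architectures are toy examples built from real Pfam domain names.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(architectures: list[list[str]]) -> Counter:
    """Count unordered domain pairs co-occurring within the same protein.

    Each protein contributes at most once per pair, even with repeated domains.
    """
    pairs: Counter = Counter()
    for domains in architectures:
        for a, b in combinations(sorted(set(domains)), 2):
            pairs[(a, b)] += 1
    return pairs


proteins = [["Homeobox", "PAX"], ["Homeobox", "PAX", "zf-C2H2"], ["zf-C2H2", "zf-C2H2"]]
print(cooccurrence(proteins))
```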
First and second columns indicate the TF family and its inferred evolutionary origin, respectively (from de Mendoza et al., 2013). Subsequent columns list (i) the p-value of a Fisher's exact test …
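For reference, a two-sided Fisher's exact test on a 2×2 contingency table can be computed directly from hypergeometric probabilities: sum the probabilities of all tables with the same margins that are no more likely than the observed one. The exact table tested here is not recoverable from the truncated caption, so the input below is generic; in practice `scipy.stats.fisher_exact` would normally be used, and this stdlib version only shows the computation.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    p = sum(px for px in (prob(x) for x in range(lo, hi + 1)) if px <= observed * (1 + 1e-7))
    return min(1.0, p)


print(fisher_exact_two_sided(3, 0, 0, 3))
```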
Probability of emergence of protein domain combinations present in the LCA of Metazoa in previous ancestral nodes (from LCA of Metazoa to LCA of Unikonta/Amorphea).
Only protein domain combinations with >90% presence probability in the LCA of Metazoa were included. Protein domain combinations that are not gained with >90% probability in any of the surveyed nodes have been associated with LECA or pre-LECA origins (‘1’ values in the ‘LECA or before’ field). Used in Figure 10.
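The filtering and origin-assignment rule described above can be sketched as follows: keep combinations with >90% presence probability in the LCA of Metazoa, assign each to a node where it is gained with >90% probability, and otherwise label it "LECA or before". The node list is an illustrative subset ordered from oldest to youngest (the study's exact set of surveyed nodes is not given here), choosing the oldest qualifying node when several qualify is an assumption, and the combination names and probabilities are toy data.

```python
# Illustrative subset of ancestral nodes, oldest first (assumption).
NODES = ["Amorphea", "Opisthokonta", "Holozoa", "Metazoa"]


def assign_origin(gain_prob: dict[str, float], nodes=NODES, threshold=0.9) -> str:
    """First (oldest) node with gain probability > threshold, else 'LECA or before'."""
    for node in nodes:
        if gain_prob.get(node, 0.0) > threshold:
            return node
    return "LECA or before"


def origins(combos: dict[str, tuple[float, dict[str, float]]], threshold=0.9) -> dict[str, str]:
    """combos maps name -> (presence probability at LCA of Metazoa, {node: gain probability})."""
    return {
        name: assign_origin(gains, threshold=threshold)
        for name, (presence, gains) in combos.items()
        if presence > threshold  # only combinations confidently present in the LCA of Metazoa
    }


toy = {"X": (0.95, {"Holozoa": 0.97}), "Y": (0.99, {"Metazoa": 0.5}), "Z": (0.80, {"Metazoa": 0.99})}
print(origins(toy))
```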