Choanoflagellates, the closest living relatives of animals, can provide unique insights into the changes in gene content that preceded the origin of animals. However, only two choanoflagellate genomes are currently available, providing poor coverage of their diversity. We sequenced transcriptomes of 19 additional choanoflagellate species to produce a comprehensive reconstruction of the gains and losses that shaped the ancestral animal gene repertoire. We identified ~1,944 gene families that originated on the animal stem lineage, of which only 39 are conserved across all animals in our study. In addition, ~372 gene families previously thought to be animal-specific, including Notch, Delta, and homologs of the animal Toll-like receptor genes, instead evolved prior to the animal-choanoflagellate divergence. Our findings contribute to an increasingly detailed portrait of the gene families that defined the biology of the Urmetazoan and that may underpin core features of extant animals.
Raw sequencing reads have been deposited at the NCBI SRA under BioProject PRJNA419411 (19 choanoflagellate transcriptomes) and PRJNA420352 (S. rosetta polyA selection test). Transcriptome assemblies, annotations, and gene families are available on FigShare at DOI: 10.6084/m9.figshare.5686984. Transcriptome assemblies have also been submitted to the NCBI Transcriptome Shotgun Assembly database under BioProject PRJNA419411. Protocols have been deposited to protocols.io and are accessible at DOI: 10.17504/protocols.io.kwscxee.
Details on the datasets available via FigShare:
Dataset 1. Final sets of contigs from choanoflagellate transcriptome assemblies. There is one FASTA file per sequenced choanoflagellate. We assembled contigs de novo with Trinity, followed by removal of cross-contamination that occurred within multiplexed Illumina sequencing lanes, removal of contigs encoding strictly redundant protein sequences, and elimination of noise contigs with extremely low (FPKM < 0.01) expression levels.
Dataset 2. Final sets of proteins from choanoflagellate transcriptome assemblies. There is one FASTA file per sequenced choanoflagellate. We assembled contigs de novo with Trinity, followed by removal of cross-contamination that occurred within multiplexed Illumina sequencing lanes, removal of strictly redundant protein sequences, and elimination of proteins encoded on noise contigs with extremely low (FPKM < 0.01) expression levels.
Dataset 3. Expression levels of assembled choanoflagellate contigs. Expression levels are shown in FPKM, as calculated by eXpress. Percentile expression rank is calculated separately for each choanoflagellate.
Dataset 4. Protein sequences for all members of each gene family. This includes sequences from all species within the data set (i.e., it is not limited to the choanoflagellates we sequenced).
Dataset 5. Gene families, group presences, and species probabilities. For each gene family, the protein members are listed. Subsequent columns contain inferred gene family presences in different groups of species, followed by probabilities of presence in individual species in the data set.
Dataset 6. List of gene families present, gained and lost in last common ancestors of interest. A value of 1 indicates that the gene family was present, gained or lost; a value of 0 indicates that it was not. The six last common ancestors are: Ureukaryote, Uropisthokont, Urholozoan, Urchoanozoan, Urchoanoflagellate and Urmetazoan. Gains and losses are not shown for the Ureukaryote, as our data set only contained eukaryote species and was thus not appropriate to quantify changes occurring on the eukaryotic stem lineage.
Dataset 7. Pfam, transmembrane, signal peptide, PANTHER and Gene Ontology annotations for all proteins. Annotations are listed for all proteins in the data set, including those not part of any gene family. Pfam domains are delimited by a tilde (~) and Gene Ontology terms by a semicolon (;). Transmembrane domains and signal peptides are indicated by the number present in the protein, followed by their coordinates in the protein sequence.
Dataset 8. Pfam, transmembrane, signal peptide, PANTHER and Gene Ontology annotations aggregated by gene family. The proportion of proteins within the gene family that were assigned an annotation is followed by the name of the annotation. Multiple annotations are delimited by a semicolon (;).
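As an illustration of the delimiters described for Dataset 7, the following minimal Python sketch splits a tilde-delimited Pfam field and a semicolon-delimited Gene Ontology field into lists. The function names and the example values are hypothetical and are not taken from the deposited files. The actual column layout of the dataset is not specified here, so the sketch assumes the two fields have already been read as strings.

```python
def split_field(value, delimiter):
    """Split a delimited annotation field, dropping empty entries."""
    return [item.strip() for item in value.split(delimiter) if item.strip()]


def parse_annotations(pfam_field, go_field):
    """Parse one protein's annotations (hypothetical helper).

    Pfam domains are assumed to be delimited by '~' and Gene Ontology
    terms by ';', as described for Dataset 7.
    """
    return {
        "pfam": split_field(pfam_field, "~"),
        "go": split_field(go_field, ";"),
    }


# Illustrative values only; not copied from the dataset.
example = parse_annotations("Pkinase~SH2", "GO:0004672;GO:0005524")
print(example["pfam"])
print(example["go"])
```

An empty field yields an empty list, so proteins without annotations can be handled without special cases.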
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
© 2018, Richter et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.