Gene family innovation, conservation and loss on the animal stem lineage

  1. Daniel J Richter
  2. Parinaz Fozouni
  3. Michael Eisen
  4. Nicole King (corresponding author)

Affiliation: Howard Hughes Medical Institute, University of California, Berkeley, United States

Abstract

Choanoflagellates, the closest living relatives of animals, can provide unique insights into the changes in gene content that preceded the origin of animals. However, only two choanoflagellate genomes are currently available, providing poor coverage of their diversity. We sequenced transcriptomes of 19 additional choanoflagellate species to produce a comprehensive reconstruction of the gains and losses that shaped the ancestral animal gene repertoire. We identified ~1,944 gene families that originated on the animal stem lineage, of which only 39 are conserved across all animals in our study. In addition, ~372 gene families previously thought to be animal-specific, including Notch, Delta, and homologs of the animal Toll-like receptor genes, instead evolved prior to the animal-choanoflagellate divergence. Our findings contribute to an increasingly detailed portrait of the gene families that defined the biology of the Urmetazoan and that may underpin core features of extant animals.

Data availability

Raw sequencing reads have been deposited at the NCBI SRA under BioProject PRJNA419411 (19 choanoflagellate transcriptomes) and PRJNA420352 (S. rosetta polyA selection test). Transcriptome assemblies, annotations, and gene families are available on FigShare at DOI: 10.6084/m9.figshare.5686984. Transcriptome assemblies have also been submitted to the NCBI Transcriptome Shotgun Assembly database under BioProject PRJNA419411. Protocols have been deposited to protocols.io and are accessible at DOI: 10.17504/protocols.io.kwscxee.

Details on the datasets available via figshare:

Dataset 1. Final sets of contigs from choanoflagellate transcriptome assemblies. There is one FASTA file per sequenced choanoflagellate. We assembled contigs de novo with Trinity, followed by removal of cross-contamination that occurred within multiplexed Illumina sequencing lanes, removal of contigs encoding strictly redundant protein sequences, and elimination of noise contigs with extremely low (FPKM < 0.01) expression levels.

Dataset 2. Final sets of proteins from choanoflagellate transcriptome assemblies. There is one FASTA file per sequenced choanoflagellate. We assembled contigs de novo with Trinity, followed by removal of cross-contamination that occurred within multiplexed Illumina sequencing lanes, removal of strictly redundant protein sequences, and elimination of proteins encoded on noise contigs with extremely low (FPKM < 0.01) expression levels.

Dataset 3. Expression levels of assembled choanoflagellate contigs. Expression levels are shown in FPKM, as calculated by eXpress. Percentile expression rank is calculated separately for each choanoflagellate.

Dataset 4. Protein sequences for all members of each gene family. This includes sequences from all species within the data set (i.e., it is not limited to the choanoflagellates we sequenced).

Dataset 5. Gene families, group presences, and species probabilities. For each gene family, the protein members are listed. Subsequent columns contain inferred gene family presences in different groups of species, followed by probabilities of presence in individual species in the data set.

Dataset 6. List of gene families present, gained and lost in last common ancestors of interest. A value of 1 indicates that the gene family was present, gained or lost; a value of 0 indicates that it was not. The six last common ancestors are: Ureukaryote, Uropisthokont, Urholozoan, Urchoanozoan, Urchoanoflagellate and Urmetazoan. Gains and losses are not shown for the Ureukaryote, as our data set only contained eukaryote species and was thus not appropriate to quantify changes occurring on the eukaryotic stem lineage.

Dataset 7. Pfam, transmembrane, signal peptide, PANTHER and Gene Ontology annotations for all proteins. Annotations are listed for all proteins in the data set, including those not part of any gene family. Pfam domains are delimited by a tilde (~) and Gene Ontology terms by a semicolon (;). Transmembrane domains and signal peptides are indicated by the number present in the protein, followed by their coordinates in the protein sequence.

Dataset 8. Pfam, transmembrane, signal peptide, PANTHER and Gene Ontology annotations aggregated by gene family. The proportion of proteins within the gene family that were assigned an annotation is followed by the name of the annotation. Multiple annotations are delimited by a semicolon (;).
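
The delimiter conventions and the expression cutoff described for these datasets can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the in-memory dictionaries, and the example identifiers are assumptions made here, and the actual column layout of the figshare files is not specified above. Only the tilde delimiter for Pfam domains, the semicolon delimiter for Gene Ontology terms, and the FPKM < 0.01 noise cutoff come from the descriptions.

```python
from typing import Dict, List

# Cutoff stated in the dataset descriptions: contigs with FPKM < 0.01 are noise.
FPKM_NOISE_THRESHOLD = 0.01


def split_field(value: str, delimiter: str) -> List[str]:
    """Split a delimited annotation field, dropping empty entries."""
    return [item.strip() for item in value.split(delimiter) if item.strip()]


def parse_annotations(pfam_field: str, go_field: str) -> Dict[str, List[str]]:
    """Parse one protein's annotation fields (hypothetical helper).

    Pfam domains are tilde-delimited and Gene Ontology terms are
    semicolon-delimited, as described for Dataset 7.
    """
    return {
        "pfam": split_field(pfam_field, "~"),
        "go": split_field(go_field, ";"),
    }


def filter_noise(
    fpkm_by_contig: Dict[str, float],
    threshold: float = FPKM_NOISE_THRESHOLD,
) -> Dict[str, float]:
    """Keep contigs whose FPKM is at or above the noise threshold."""
    return {
        contig: fpkm
        for contig, fpkm in fpkm_by_contig.items()
        if fpkm >= threshold
    }


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the datasets.
    print(parse_annotations("PF00069~PF07714", "GO:0004672;GO:0005524"))
    print(filter_noise({"contig_a": 0.005, "contig_b": 0.01, "contig_c": 3.2}))
```

Keeping the delimiters as explicit arguments makes it easy to adapt the same helper if a file turns out to use a different separator than assumed here.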

Article and author information

Author details

  1. Daniel J Richter

    Department of Molecular and Cell Biology, Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-9238-5571
  2. Parinaz Fozouni

    Department of Molecular and Cell Biology, Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Michael Eisen

    Department of Molecular and Cell Biology, Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-7528-738X
  4. Nicole King

    Department of Molecular and Cell Biology, Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, United States
    For correspondence
    nking@berkeley.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-6409-1111

Funding

Howard Hughes Medical Institute

  • Michael Eisen
  • Nicole King

National Institutes of Health

  • Nicole King

U.S. Department of Defense (National Defense Science and Engineering Graduate Fellowship)

  • Daniel J Richter

National Science Foundation (Central Europe Summer Research Institute Fellowship)

  • Daniel J Richter

Chang-Lin Tien Fellowship in Environmental Sciences and Biodiversity

  • Daniel J Richter

Conseil Régional de Bretagne (Postdoctoral Fellowship)

  • Daniel J Richter

Investissements d'Avenir (ANR-11-BTBR-0008)

  • Daniel J Richter

National Science Foundation (955517)

  • Parinaz Fozouni

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Maximilian J Telford, University College London, United Kingdom

Publication history

  1. Received: December 10, 2017
  2. Accepted: May 26, 2018
  3. Accepted Manuscript published: May 31, 2018 (version 1)
  4. Accepted Manuscript updated: June 15, 2018 (version 2)
  5. Version of Record published: July 3, 2018 (version 3)

Copyright

© 2018, Richter et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 7,416
  • Downloads: 1,029
  • Citations: 103

Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.

Cite this article

Daniel J Richter, Parinaz Fozouni, Michael Eisen, Nicole King (2018)
Gene family innovation, conservation and loss on the animal stem lineage
eLife 7:e34226.
https://doi.org/10.7554/eLife.34226

Further reading

    1. Cancer Biology
    2. Evolutionary Biology
    Juan Manuel Vazquez, Maria T Pena ... Vincent J Lynch
    Research Advance

    The risk of developing cancer is correlated with body size and lifespan within species, but there is no correlation between cancer and either body size or lifespan between species, indicating that large, long-lived species have evolved enhanced cancer protection mechanisms. Previously we showed that several large-bodied Afrotherian lineages evolved reduced intrinsic cancer risk, particularly elephants and their extinct relatives (Proboscideans), coincident with pervasive duplication of tumor suppressor genes (Vazquez and Lynch 2021). Unexpectedly, we also found that Xenarthrans (sloths, armadillos, and anteaters) evolved very low intrinsic cancer risk. Here, we show that: 1) several Xenarthran lineages independently evolved large bodies, long lifespans, and reduced intrinsic cancer risk; 2) the reduced cancer risk in the stem lineages of Xenarthra and Pilosa coincided with bursts of tumor suppressor gene duplications; 3) cells from sloths proliferate extremely slowly while Xenarthran cells induce apoptosis at very low doses of DNA-damaging agents; and 4) the prevalence of cancer is extremely low in Xenarthrans, and cancer is nearly absent from armadillos. These data implicate the duplication of tumor suppressor genes in the evolution of remarkably large body sizes and decreased cancer risk in Xenarthrans and suggest they are a remarkably cancer-resistant group of mammals.

    1. Ecology
    2. Evolutionary Biology
    Zinan Wang, Joseph P Receveur ... Henry Chung
    Research Article

    Maintaining water balance is a universal challenge for organisms living in terrestrial environments, especially for insects, which have essential roles in our ecosystem. Although the high surface area to volume ratio in insects makes them vulnerable to water loss, insects have evolved different levels of desiccation resistance to adapt to diverse environments. To withstand desiccation, insects use a lipid layer called cuticular hydrocarbons (CHCs) to reduce water evaporation from the body surface. It has long been hypothesized that the waterproofing capability of this CHC layer, which can confer different levels of desiccation resistance, depends on its chemical composition. However, it is unknown which CHC components are important contributors to desiccation resistance and how these components can determine differences in desiccation resistance. In this study, we used machine learning algorithms, correlation analyses, and synthetic CHCs to investigate how different CHC components affect desiccation resistance in 50 Drosophila and related species. We showed that desiccation resistance differences across these species can be largely explained by variation in CHC composition. In particular, length variation in a subset of CHCs, the methyl-branched CHCs (mbCHCs), is a key determinant of desiccation resistance. There is also a significant correlation between the evolution of longer mbCHCs and higher desiccation resistance in these species. Given that CHCs are almost ubiquitous in insects, we suggest that evolutionary changes in insect CHC components can be a general mechanism for the evolution of desiccation resistance and adaptation to diverse and changing environments.