Research Article

The genome sequence of the colonial chordate, Botryllus schlosseri

Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, United States
Stanford University, United States
Howard Hughes Medical Institute, Stanford University, United States
Università degli Studi di Milano, Italy
University of Haifa-Oranim, Israel
Stanford University School of Medicine, United States

Jul 2, 2013

Open access

Abstract

Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development after sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly comprises nearly 14,000 intron-containing predicted genes and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration.

https://doi.org/10.7554/eLife.00569.001

eLife digest

The tunicates are an evolutionary group that includes species such as sea squirts and sea tulips. Their name comes from the structure known as a ‘tunic’ that surrounds their sac-like bodies. As marine filter feeders, tunicates obtain nutrients by straining food particles from water, and they can live either alone or in colonies depending on the species. Charles Darwin suggested that tunicates may be the key to understanding the evolution of vertebrates, and indeed today they are regarded as the closest living relatives of this group.

Colonial tunicates can reproduce either sexually, or asexually by budding. Compatible colonies have the ability to recognize one another and to fuse their blood vessels to form a single organism, whereas incompatible colonies reject one another and remain separate. This recognition process bears some resemblance to the rejection of foreign organ transplants in mammals.

Here, Voskoboynik and co-workers present the first genome sequence of a colonial tunicate, Botryllus schlosseri. They used a novel sequencing approach that significantly increased the length of a DNA molecule that can be determined by next-generation sequencing, and allowed large DNA repeat regions to be easily resolved. In total, they sequenced 580 million base pairs of DNA, which they estimate contains roughly 27,000 genes.

By comparing the B. schlosseri genome with those of a number of vertebrates, Voskoboynik et al. identified multiple B. schlosseri genes that also participate in the development and functioning of the vertebrate eye, heart, and auditory system, as well as others that may have contributed to the evolution of the immune system and of blood cells. The genome of B. schlosseri thus provides an important new tool for studying the genetic basis of the evolution of vertebrates.

https://doi.org/10.7554/eLife.00569.002

Introduction

In 1866, Russian embryologist Alexander Kowalevsky wrote to Charles Darwin about the extensive developmental and morphological similarities between ascidian larvae and vertebrates, leading Darwin to hypothesize that ascidians (belonging to urochordates or tunicates) might be crucial to understanding the origin of the vertebrate phylum (Darwin, 1874). Indeed, tunicates are the closest extant relatives of vertebrates (Delsuc et al., 2006), and represent an investigative model for evolutionary events leading to adaptive immunity (Sabbadin, 1962; Scofield et al., 1982) and vertebrate-specific organ/tissue complexity (Dehal et al., 2002; Jeffery et al., 2004; Abitua et al., 2012). The colonial tunicate species, Botryllus schlosseri, represents an important model organism for studying unique aspects of a pre-vertebrate colonial lifestyle, such as self recognition (Sabbadin, 1962; Scofield et al., 1982), vasculature and blood development (Schlumpberger et al., 1984; Gasparini et al., 2008; Tiozzo et al., 2008), apoptosis (Lauzon et al., 1993; Cima et al., 2009), and alternative reproduction pathways (Sabbadin et al., 1975; Manni and Burighel, 2006; Voskoboynik et al., 2007; Lemaire, 2011), including stem cell-mediated regeneration of complete individuals within a colony unit (Laird et al., 2005; Voskoboynik et al., 2008; Rinkevich et al., 2013).

Botryllus schlosseri is an invasive colonial urochordate, living in large communities consisting of multiple colonies organized into expansive mats that coat a variety of marine surfaces, such as rocks, molluscs, multicellular algae, and ship hulls (Stoner et al., 2002). Communities develop among compatible colonies, governed by a genetically encoded histocompatibility system (Sabbadin, 1962; Scofield et al., 1982). The progeny of each colony usually represents a clone of the vascularly connected, asexually reproducing individuals (zooids) derived from a single planktonic larva (Manni and Burighel, 2006; Figure 1A–D). Compatible colonies fuse their blood vessels to generate a chimera, while incompatible colonies reject one another, maintaining individuality (Sabbadin, 1962; Scofield et al., 1982; Voskoboynik et al., 2013). Following the fusion of blood vessels between colonies, the circulating stem cells of one partner colony can compete and replace the germline and/or the soma of the other partner (Stoner and Weissman, 1996; Stoner et al., 1999; Laird et al., 2005; Voskoboynik et al., 2008; Rinkevich et al., 2013), a phenomenon analogous to allogeneic transplantation.

Figure 1

*Botryllus schlosseri* anatomy, life cycle, and phylogeny.

*B. schlosseri* reproduces through both sexual and asexual (budding) pathways, giving rise to virtually identical adult body plans. Upon settlement, the tadpole phase of the *B. schlosseri* lifecycle (A) will metamorphose into a founder individual (oozooid) (B), which, through asexual budding, generates a colony. The colony includes three overlapping generations: an adult zooid, a primary bud, and a secondary bud, all of which are connected via a vascular network (bv) embedded within a gelatinous matrix (termed tunic). The common vasculature terminates in finger-like protrusions (termed ampullae; B–D). Bud development commences in stage A (C). Through budding, *B. schlosseri* generates its entire body, including digestive (ds) and respiratory (brs) systems, a simple tube-like heart (h), an endostyle (en) that harbors a stem cell niche, a primitive neural complex, and siphons used for feeding, waste, and releasing larvae (B–D). Each week, successive buds grow large (D) and complete replication of all zooids in the colony, ultimately replacing the previous generation’s zooids, which die through a massive apoptosis. (E) A phylogenomic tree produced from analysis of 521 nuclear genes (40,798 aligned amino acids) from 15 species, including *B. schlosseri*. Scale bar: 1 mm.

https://doi.org/10.7554/eLife.00569.003

Tunicates are classified as chordates because their planktonic larval stage (Figure 1A) shares structural characteristics with all chordates: a notochord, dorsal neural tube, segmented musculature, and gill slits (Darwin, 1874; Dehal et al., 2002). Larvae settle in response to light and metamorphose into sessile individuals (Figure 1B), which lose most of their chordate phenotypes (Darwin, 1874; Dehal et al., 2002). Tunicates reproduce either sexually (solitary tunicates; Dehal et al., 2002; Lemaire, 2011), or sexually and asexually (colonial tunicates; Manni and Burighel, 2006; Lemaire, 2011). These two reproductive modes give rise to nearly identical complex adult body plans, including digestive and respiratory systems, a simple tube-like heart, siphons, an endostyle, a neural complex, ovary and testis (Manni and Burighel, 2006; Figure 1A–D).

The ability to reproduce asexually renders colonial tunicates robust survivors, capable of rapid proliferation and whole body regeneration. These unique features of colonial tunicates coupled with their key evolutionary position and long history of scientific study prompted us to sequence the B. schlosseri genome.

Results and discussion

A novel genome sequencing method for deciphering repeat-rich genomes

The B. schlosseri genome was previously estimated to be 725 Mb based on flow cytometry analysis (De Tomaso et al., 1998), and metaphase spreads suggested that it is organized into 16 chromosomes (Colombera, 1963). To assemble this relatively large genome, we developed a novel method to accurately sequence many large fragments in parallel. This long read sequencing approach (LRseq) effectively increases the read length of a next generation sequencer 50-fold, while decreasing the error rate by orders of magnitude (Figure 2; ‘Materials and methods’ under ‘Genome sequencing and assembly’). Our approach began with genomic DNA sheared to 6–8 kb fragments. Limiting dilution was used to create aliquots of a few hundred to a few thousand DNA molecules. Each aliquot was amplified with PCR, fragmented (600–800 bp), barcoded, and sequenced by Illumina HiSeq 2000 (Figure 2). The Velvet assembler (Zerbino, 2010) was used to assemble short paired-end reads from each barcode (i.e., well) separately, thus simplifying the assembly problem and creating effective read lengths corresponding to the original large fragment sizes (Figure 2B; Supplementary file 2A, Supplementary file 2B). Limiting the number of DNA molecules per well greatly reduces or eliminates the chance of having a repeated or duplicate sequence within a defined partition. Furthermore, since each well was over-sequenced, the error rate is reduced by the coverage and is substantially improved from the intrinsic error rate of the sequencer (Supplementary file 2C). This procedure is amenable to automation in multiwell plates, and we obtained data from twelve 96-well plates (Supplementary file 2A, Supplementary file 2B). We validated this method on human genomic DNA, for which an independent reference is available (Figure 2—figure supplement 1).
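
The effect of limiting dilution can be illustrated with a back-of-the-envelope estimate. The sketch below is not part of the study's pipeline: it assumes that fragments are placed uniformly and independently along the genome, and it asks how likely it is that a given fragment shares sequence with any other fragment in the same well. The function name, the uniform-placement model, and the example values (7 kb fragments, a 725 Mb genome) are illustrative assumptions, and the model deliberately ignores dispersed repeat copies.

```python
def prob_fragment_overlaps_another(n_molecules, fragment_len, genome_size):
    """Chance that one fragment overlaps at least one other fragment in its well.

    Assumption: fragment start positions are uniform and independent, so two
    fragments of length L overlap with probability of roughly 2L / G.
    """
    if n_molecules < 2:
        return 0.0
    p_pair = min(1.0, 2.0 * fragment_len / genome_size)
    # Complement of "none of the other n - 1 fragments overlaps this one".
    return 1.0 - (1.0 - p_pair) ** (n_molecules - 1)


if __name__ == "__main__":
    # Illustrative well occupancies; only the trend matters here.
    for n in (100, 1_000, 10_000):
        p = prob_fragment_overlaps_another(n, 7_000, 725_000_000)
        print(f"{n:>6} molecules per well -> P(overlap) = {p:.4f}")
```

Under these assumptions the overlap probability grows with the number of molecules per well, which is the intuition behind keeping each well sparse before assembling it separately.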

Figure 2

A novel short read genome sequencing and assembly method for complex, repeat-rich genomes.

(A) Genomic DNA is sheared into 6–8 kb fragments, partitioned into twelve 96-well plates, further fragmented to 600–800 bp, barcoded and sequenced separately for each well (Illumina HiSeq 2000, 2×100 bp), and assembled by Velvet. (B) Size distribution of contigs assembled from a representative library preparation (BL5). (C) Limiting the number of amplifiable molecules per well (barcode) to the level that almost 100% of all amplifiable molecules are present as single copies (<1000 gDNA molecules) greatly reduces the chance of having a repeated or homologous sequence within a well. Thus, sample complexity is significantly reduced, which reduces ambiguity in the reconstruction of a consensus sequence. As an example, two different predicted repeat-containing genes (g2001, 1189 bp; and g2002, 688 bp) were assembled from two different wells (005 and 145, respectively). Although they contain highly homologous repeats (represented as a Dot Matrix plot; D), these repetitive genes were resolved and reconstructed properly in the final assembly.

https://doi.org/10.7554/eLife.00569.005

Genomic DNA (gDNA) was extracted from tissue from two long-lived B. schlosseri colonies (Sc6a-b and 356a) raised in our mariculture facility (‘Materials and methods’ under ‘Animals and genomic DNA sample collection’). Microsatellite heterogeneity confirmed clonality (Figure 2—figure supplement 2). Each colony was sequenced and assembled separately. We first attempted conventional sequencing and assembly from colony Sc6a-b DNA using Roche 454 Titanium (Branford, USA) and Illumina GAII (San Diego, USA) sequences (Supplementary file 2C, ‘Materials and methods’ under ‘Genome sequencing and assembly’). This Sc6a-b assembly achieved an average N50 of 1 kb, yielding short contigs that were insufficient for whole genome assembly (Supplementary file 2D). By contrast, when we applied LRseq to the 356a clone, we obtained a 566 Mbp assembly with a dramatically improved N50 of 7 kb (Supplementary file 2D; Figure 2—figure supplement 3). This approach not only simplified the assembly of a complex eukaryotic genome, but also reduced the confounding impact of repetitive DNA on contig assembly (Figure 2C–D; Figure 2—figure supplement 3).
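
Because the comparison between the two assemblies rests on N50, a minimal sketch of how that statistic is commonly computed may help. The function below follows the standard definition (the contig length at which the cumulative length of the longest contigs first reaches half of the assembly); the function name and the toy contig lengths are illustrative and are not taken from the study's data.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together contain
    at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, for illustration only.
    toy_contigs = [7000, 5000, 3000, 1000, 1000]
    print("N50 =", n50(toy_contigs))
```

A higher N50 at a similar total assembly size means that more of the genome sits in long contigs, which is why the change from about 1 kb to 7 kb is a meaningful improvement.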

Chromosome assignments, repeats, and gene content

We sought to determine the chromosomal organization of the B. schlosseri genome. Using embryos from a wild B. schlosseri colony from Monterey Bay, we loaded a dilute solution of dispersed metaphase chromosomes into a microfluidic device as previously described (Fan et al., 2011). The isolated metaphase chromosome mixtures from 21 individual wells were amplified, barcoded, and sequenced separately (‘Materials and methods’ under ‘Chromosome sequencing, assignment and assembly’; Fan et al., 2011; Xu et al., 2011). Using the 21 chromosome mixtures, containing between 1 and 4 chromosomes each, 356a genomic contigs larger than 7 kb were aligned to the chromosome reads using BWA. Then, scaffolds were assigned to chromosome clusters by iterative K-means clustering on the correlation matrix between each scaffold (Figure 3; ‘Materials and methods’ under ‘Chromosome sequencing, assignment and assembly’). Assuming that B. schlosseri carries 16x2 chromosomes (Colombera, 1963), this approach clearly resolves 13 chromosomes with a mean chromosome meta-scaffold size of 16,234 kb and a mean N50 of 38 kb (Figure 3; Figure 3—figure supplement 1; Supplementary file 2D). Finally, we attempted to improve our genomic assembly by incorporating the additional 21 chromosome assemblies into a hybrid assembly (‘Materials and methods’ under ‘Chromosome sequencing, assignment and assembly’; Figure 3—figure supplement 2; Figure 3—figure supplement 1). An overall improvement in N50 was achieved, yielding a final 580 Mbp draft assembly (Supplementary file 2D).
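
The clustering step can be sketched in a few lines. The example below is a simplified stand-in, not the authors' implementation: it builds a Pearson correlation matrix from per-well read counts for each scaffold, clusters the rows of that matrix with a plain K-means, and reports the within-cluster error so that different cluster numbers can be compared. All function names, the deterministic farthest-point initialization, and the toy read counts are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length read-count profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def correlation_matrix(profiles):
    """Scaffold-by-scaffold correlation of read counts across wells."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(rows, k, n_iter=50):
    """Plain K-means; returns (labels, within-cluster sum of squares)."""
    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    error = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, error


if __name__ == "__main__":
    # Toy read counts: six scaffolds (rows) across four wells (columns).
    toy_profiles = [
        [10, 0, 9, 0], [12, 1, 11, 0], [9, 0, 10, 1],
        [0, 8, 0, 9], [1, 10, 0, 11], [0, 9, 1, 10],
    ]
    corr = correlation_matrix(toy_profiles)
    for k in (1, 2, 3):
        labels, error = kmeans(corr, k)
        print(k, labels, round(error, 4))
```

Scanning the error across increasing values of `k` and looking for the point where it stops dropping appreciably mirrors the reasoning used to settle on 13 clusters.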

Figure 3

Clustering and assignment of *B. schlosseri* chromosomes.

(A) We isolated and sequenced 21 metaphase chromosome mixtures using a microfluidic device. Each chromosome mixture was amplified, barcoded and sequenced separately (Illumina HiSeq). Genomic contigs larger than 7 kb were aligned to the chromosome reads using BWA. Subsequently, scaffolds were assigned to chromosome clusters using iterative K-means clustering on the correlation matrix between each scaffold. In addition, to find the number of clusters/chromosomes, we performed K-means clustering iteratively across different cluster numbers. This plot demonstrates that increasing beyond 13 clusters does little to reduce the error; therefore 13 chromosomes were successfully resolved. (B) To estimate the configuration after the clustering step, 17 out of the 21 wells were deduced to contain information that is used in the clustering process. The average number of normalized read counts from each metaphase chromosome mixture (well) that align to each scaffold in a cluster group was calculated and plotted. Each peak can be inferred to denote the presence of a specific chromosome in the well. Examples of four representative wells are presented; metaphase chromosome mixtures contained between 1 and 4 chromosomes (see also Figure 3—figure supplement 1).

https://doi.org/10.7554/eLife.00569.012

Repetitive elements can confound traditional genome assembly methods (Salzberg et al., 2012), and are often removed to avoid assembly errors (e.g., Dehal et al., 2002; Putnam et al., 2007; Shinzato et al., 2011; Supplementary file 2H). However, because LRseq was designed to explicitly resolve long read sequences even in the presence of repeats, we further evaluated LRseq performance by enumerating two major repeat classes in the assembly, interspersed repeats and tandem repeats. We used RepeatScout for de novo identification of interspersed elements (Price et al., 2005), and RepeatMasker (Smit et al., 1996–2010) for analysis of genome-wide repeat demographics. We identified 6601 interspersed repeat families, each present in at least three copies, that together cover ∼65% of the B. schlosseri genome assembly (Supplementary file 2E). We also identified 1400 large repeat families, defined as interspersed repeats with genomic alignments of at least 1 kb. Notably, large interspersed repeats are found in a median of four chromosomes (of 13 chromosome assignments), and >10% of large interspersed repeat families occur in over 100 copies (Supplementary file 2E; Figure 2—figure supplement 4A). Despite considerable repetitive content, we observed a strong concordance between genomic contigs and Sanger fosmid sequences, supporting the effectiveness of the LRseq approach (e.g., see Figure 2—figure supplement 5). As a further validation, we interrogated our former Sc6a-b 380 Mb assembly for the same interspersed repeat elements, with the expectation of recovering fewer repeats. Indeed, only 52.27% of Sc6a-b base pairs were masked using the same repeat library. These results validate the repeat families and support their widespread presence in the B. schlosseri genome. Finally, we analyzed the assembly for perfect (100% sequence identity) and degenerate tandem repeat content using XSTREAM (Newman and Cooper, 2007). In all, ∼3.2 million tandem repeats were identified, with periods ranging from 1–6525 bp and copy numbers ranging from 2–1096x (Figure 2—figure supplement 4B). By comparison, the human genome was assembled to a very high standard using conventional Sanger technology and later Illumina technology, and was found to contain over 50% repeats (de Koning et al., 2011). The considerable repeat content and diversity in the B. schlosseri genome indicate that LRseq may have general utility for resolving repeat architectures in diverse eukaryotic genomes.
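
To make the terms "period" and "copy number" concrete, the sketch below scans a sequence for perfect tandem repeats. It is a naive illustration and not a substitute for XSTREAM: it handles only perfect (100% identity) repeats, it limits the period to a small maximum, and the function name, parameters, and toy sequence are all hypothetical.

```python
def find_perfect_tandem_repeats(seq, max_period=6, min_copies=2):
    """Return (start, period, copies) tuples for perfect tandem repeats.

    Naive left-to-right scan: at each position, try every period up to
    max_period, keep the candidate covering the most bases (ties go to the
    shorter period), then jump past it.
    """
    results = []
    n = len(seq)
    i = 0
    while i < n:
        best = None
        for period in range(1, max_period + 1):
            unit = seq[i:i + period]
            if len(unit) < period:
                break  # not enough sequence left for this period
            copies = 1
            while seq[i + copies * period:i + (copies + 1) * period] == unit:
                copies += 1
            if copies >= min_copies:
                if best is None or copies * period > best[1] * best[2]:
                    best = (i, period, copies)
        if best:
            results.append(best)
            i += best[1] * best[2]
        else:
            i += 1
    return results


if __name__ == "__main__":
    # Toy sequence: an "AC" unit repeated four times, flanked by short runs.
    for start, period, copies in find_perfect_tandem_repeats("GGACACACACTT"):
        print(f"start={start} period={period} copies={copies}")
```

Degenerate repeats, which tolerate mismatches between copies, need an alignment-based approach and are outside the scope of this sketch.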

We further validated the assembly by comparison to a variety of independently generated B. schlosseri sequence data. All B. schlosseri genes (n = 66), fosmid sequences (n = 11) and most of the 98,611 expressed sequence tags (ESTs) available from NCBI aligned with the B. schlosseri draft assembly (Supplementary file 2F, Figure 2—figure supplement 5; ‘Materials and methods’ under ‘Evaluation of 356a-chromosomes hybrid assembly’). Moreover, nearly all of the independently sequenced and assembled Roche 454 Sc6a-b contigs (93%) were successfully mapped to the assembly (Supplementary file 2F; ‘Materials and methods’ under ‘Evaluation of 356a-chromosomes hybrid assembly’). Taken together, these data represent independent validation of the quality and integrity of the B. schlosseri draft assembly, which compares favorably to, and in some cases exceeds, existing wild type genomes with respect to ungapped chromosome contig N50, chromosome assignments, and repeat sequence integration (e.g., see Supplementary file 2G).

Next, to identify protein-encoding genes, we generated RNA-Seq data (88 Gb; Supplementary file 2C) from 19 different colonies to guide the gene prediction program Augustus (Stanke et al., 2008). In total, 38,730 putative protein-coding loci were identified, all of which have at least 30% transcript support (‘Materials and methods’ under ‘RNA sequencing’, ‘Gene prediction’, ‘Gene annotation’; Supplementary file 2I). Among these predicted genes, 27,463 include a start and stop codon, 13,910 genes have at least one intron, and 13,553 are intron-less (Supplementary file 2H). Moreover, for each of the B. schlosseri chromosomes 55% of genes have at least one intron while ∼45% are intron-less (Figure 3—figure supplement 4). In addition, the mean B. schlosseri gene length is predicted to be 3.6 kbp with a mean exon length of 170 bp (Supplementary file 2H). We tested a set of 145 genes by PCR and Sanger-sequencing, and were able to confirm 144 of them (99.3%), further validating the genome assembly (Figure 2—figure supplement 6, ‘Materials and methods’ under ‘Evaluation of genes’).

Using these predicted genes, we investigated the evolutionary position of B. schlosseri. Phylogenomic analysis of 425 conserved homologous genes across 15 diverse species, and mitogenomic analysis of 65 species both support the phylogenetic position of tunicates within Chordata (Delsuc et al., 2006; Figure 1E; Figure 1—figure supplement 1, Supplementary file 1; ‘Materials and methods’ under ‘Mitochondrial phylogeny’, ‘Phylogenomic analyses’), and provide strong evidence that colonial and solitary tunicates represent the closest living relatives of vertebrates.

B. schlosseri and the emergence of vertebrate phenotypes

We investigated the B. schlosseri genome for molecular events underlying the emergence and early diversification of vertebrates. Protein-encoding genes in B. schlosseri were compared to a diverse sampling of 18 well-annotated genomes from other species, and for each genome, we assessed the presence or absence of significant homology to human or mouse proteins (Figure 4—source data 1A; ‘Materials and methods’ under ‘Evolution analysis’). All proteomes were combined into a single data set (of constant size) for blast analysis. As such, differences in the number of genes per genome would not have impacted our results. An e-value cutoff of 1e⁻¹⁰ was selected to strike a balance between statistical significance and the detection of remote homology (‘Materials and methods’ under ‘Evolution analysis’). Among the analyzed species, we found that 77% of human genes could be traced back to protochordates with at least some homology (e-value ≤ 1e⁻¹⁰), roughly 8–9 percentage points lower than for the chicken (85%) and frog (86%) genomes, indicating that the common ancestral genome of tunicates and vertebrates had homology to at least 77% of the human gene repertoire. This list includes about 660 genes present in the common ancestor, but absent in non-chordate species (Figure 4—source data 1B).
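
The presence/absence logic described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it takes blast hits already reduced to (gene, species, e-value) triples, marks a gene as present in a species when a hit meets the 1e-10 cutoff, and then keeps genes found in at least one protochordate but in no non-chordate species. The function names, the input layout, and the placeholder gene labels and e-values are hypothetical.

```python
E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def presence_table(hits, cutoff=E_VALUE_CUTOFF):
    """Map each human/mouse gene to the set of species with a qualifying hit.

    `hits` is an iterable of (gene, species, e_value) triples; this layout is
    an assumption made for the example.
    """
    table = {}
    for gene, species, e_value in hits:
        if e_value <= cutoff:
            table.setdefault(gene, set()).add(species)
    return table


def chordate_restricted(table, protochordates, non_chordates):
    """Genes with homology in a protochordate but in no non-chordate species."""
    return sorted(
        gene for gene, species in table.items()
        if species & protochordates and not species & non_chordates
    )


if __name__ == "__main__":
    # Placeholder gene labels and made-up e-values, for illustration only.
    toy_hits = [
        ("geneA", "B. schlosseri", 1e-40),
        ("geneA", "D. melanogaster", 1e-3),   # too weak to count as present
        ("geneB", "B. schlosseri", 1e-25),
        ("geneB", "D. melanogaster", 1e-30),  # also found outside chordates
        ("geneC", "C. intestinalis", 1e-12),
    ]
    table = presence_table(toy_hits)
    print(chordate_restricted(table, {"B. schlosseri", "C. intestinalis"},
                              {"D. melanogaster"}))
```

Because the cutoff is applied per hit, moving it changes which genes count as present; the sketch makes that dependence explicit through the `cutoff` parameter.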

Among the genes found in B. schlosseri (either alone or in combination with other protochordate species) and vertebrates (Figure 4—source data 1B, Figure 4—source data 1C), we found genes that are critical to the development and function of the vertebrate heart (e.g., ALPK3, TNNT2; Hosoda et al., 2001; Frey et al., 2012), and eye (gamma and beta crystallins; Sun et al., 2011), and the ability to hear (GJB2/3/6 CLDN; Rabionet et al., 2000; Wilcox et al., 2001; Figure 4, Figure 4—source data 1C). Mutations in these genes are implicated in a variety of human diseases and disorders, including heart diseases (Frey et al., 2012), cataracts (Sun et al., 2011), deafness (Rabionet et al., 2000; Wilcox et al., 2001), and nemaline myopathy (Johnston et al., 2000; Figure 4, Figure 4—source data 1C). In addition, B. schlosseri was the only protochordate in our analysis with proteins homologous to pregnancy-specific glycoproteins (PSGs). PSGs are the major placental polypeptides, and complications in pregnancies and spontaneous abortions have been associated with abnormally low levels of PSGs in the maternal blood (Camolotto et al., 2010). Analogous to mammalian pregnancies, a common blood supply among kin is established and tolerated in B. schlosseri chimeras (Voskoboynik et al., 2009). Thus, by studying PSG-like proteins in B. schlosseri, new insights might be gained into maternal and fetal medicine.

Figure 4


Innovations underlying the emergence and early diversification of vertebrates.

Protein-encoding genes in *B. schlosseri* were compared to a diverse sampling of 18 well-annotated genomes from other species, and for each genome, the presence or absence of homology to human or mouse proteins was assessed (all vs all blastp e-value threshold of 1e⁻¹⁰; Figure 4—source data 1A). Our data indicate that homologs of ∼660 human/mouse genes were present in the common ancestor of tunicates and vertebrates, but not in non-chordate species (Figure 4—source data 1B). Among them are genes associated with the development, function, and pathology of vertebrate features, including heart, eye, hearing, immunity, pregnancy and cancer (Figure 4—source data 1C). Gray box = no homology; Yellow box = homology.

https://doi.org/10.7554/eLife.00569.017

Figure 4—source data 1. Vertebrate evolution. (A) Innovations that underlie the emergence and early diversification of vertebrates. We compared protein-encoding genes in B. schlosseri to a diverse sampling of 18 well-annotated genomes from other species. All protein sequences were compared by blastp against all other protein sequences. Based on this data set, a list was generated of genes known from human and mouse and their presence (1) or absence (0) in the tested species (e-value < 1e⁻¹⁰). (B) The 660 putative genes present in protochordates, human and mouse, but absent in non-chordate species. This list was generated from Figure 4—source data 1A. For each species or species group, we filtered for genes that were present in the tested species/species group and in human or mouse, but were absent in non-chordate species. (C) Innovations that underlie the emergence and early diversification of vertebrates. This table is based on data gathered in Figure 4—source data 1B and is focused on the genes that are present in B. schlosseri and vertebrates (either alone or in combination with other protochordate species) but are absent in non-chordate species. A ToppGene analysis of these gene sets is presented, summarizing their molecular functions, biological processes, human and mouse phenotypes, pathways, gene families, drugs and human diseases. https://doi.org/10.7554/eLife.00569.018

Numerous genes predicted to have evolved in a common ancestor of B. schlosseri and vertebrates are essential to the immune system and hematopoiesis (Figure 4, Figure 4—source data 1B, Figure 4—source data 1C). Six genes unique to B. schlosseri and vertebrates (ZBTB1, MEFV, DSG3, NQO1, NQO2 and BHLHE40) are associated with increased leukocyte and hematopoietic cell numbers (Figure 4—source data 1C; Chen et al., 2009). In our analysis, these genes are absent in cephalochordates and solitary urochordates, which all lack a defined vascular system (Moller and Philpott, 2005; Lemaire, 2011). In contrast, the heart in each individual zooid in a B. schlosseri colony beats synchronously with the hearts of other zooids in the colony, driving a bidirectional blood flow throughout an interconnected vasculature (Video 1). Moreover, this blood system carries at least ten morphologically different cell types (Schlumpberger et al., 1984; Ballarin et al., 2008). Because of the anatomy of B. schlosseri, coupled with its hematopoietic-related gene repertoire, we hypothesize that colonial ascidians may have retained and elaborated many components of the ancestral hematopoietic program, much of which has been lost in extant solitary urochordates and cephalochordates.

Video 1


*B. schlosseri* blood circulation.

(A) Time-lapse acquisition of blood flow in the blood vessels (bv) and ampullae of a chimeric *B. schlosseri* colony, generated from a fusion between a mother and its offspring (fused). (B) Ampullae contract, buds develop, and a colony prepares to replace the old generation. (C) Old generation zooids are resorbed (res. z) and replaced by the new generation (buds). (D) A heart beating and pumping blood in the primary bud of a different colony. (E) Blood flow through a common blood vessel between two allogeneic/compatible colonies, creating a natural chimera.

https://doi.org/10.7554/eLife.00569.019

Evolution of hematopoiesis

We next attempted to identify potential precursors of human hematopoietic populations in B. schlosseri and 17 other diverse species, including fungal, plant, and mammalian species. We analyzed gene expression microarray data from 26 different human blood cell populations, and additional non-blood human tissue samples. For each of the 26 hematopoietic populations, we identified a set of twenty signature genes that were highly expressed in that population (Benita et al., 2010; Seita et al., 2012; ‘Materials and methods’ under ‘Evolution analysis’). For each blood-related gene set, we identified homologous gene sequences in B. schlosseri and 17 other species (Figure 5, Supplementary file 3). Among B. schlosseri homologs, we found high enrichment for gene sets predominantly expressed in human hematopoietic stem cells (HSCs; i.e., 14 of 20 cord blood HSC genes), myeloid populations (i.e., 14 of 20 early erythroid CD71+ genes), and early but not mature lymphoid populations. Consistent with previous studies (Bartl et al., 1994; Laird et al., 2000; Dishaw and Litman, 2009; Guo et al., 2009; Flajnik and Kasahara, 2010; Bajoghli et al., 2011; Hirano et al., 2011), this analysis indicates that the evolution of adaptive immunity progressed rapidly beginning with jawless vertebrates, with much of the genetic repertoire in place by the emergence of jawed vertebrates (Figure 5). However, homologs of human genes with specific expression in HSC and blood progenitor populations, including T and B progenitor cells, appear early in metazoan evolution (Figure 5; Supplementary file 3).

Figure 5


Analysis of blood and immune cell type-specific genes across evolution reveals evidence for hematopoietic precursors in *B. schlosseri*.

We analyzed gene expression microarray data from 26 different human blood cell populations, organized into four cell lineages (HSC, Lymphoid Progenitors, Myeloid, and Lymphoid Lineage), and identified a set of twenty signature genes with highly enriched expression profiles for each population (Supplementary file 3). For each blood-related gene set, we identified homologous gene sequences in *B. schlosseri* and 17 other species; the fraction of genes (out of 20) found for each species is displayed as a heat map. Within each major lineage, cell populations are sorted in decreasing order by a conservation index, calculated as the average number of genes found across the 18 species (indicated by a blue bar graph).

https://doi.org/10.7554/eLife.00569.020
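
The two quantities in the Figure 5 legend, the per-species fraction of signature genes with a homolog and the conservation index, reduce to simple counting. The sketch below shows one way to compute them; it is a minimal illustration, and the function names, data layout, placeholder gene labels, and species labels are assumptions rather than the study's actual data or code.

```python
def fraction_found(signature_genes, homologs_in_species):
    """Fraction of a population's signature genes with a homolog in one species."""
    found = sum(1 for gene in signature_genes if gene in homologs_in_species)
    return found / len(signature_genes)


def conservation_index(signature_genes, homologs_by_species):
    """Average number of signature genes found, taken across all species."""
    counts = [
        sum(1 for gene in signature_genes if gene in homologs)
        for homologs in homologs_by_species.values()
    ]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Placeholder signature set of twenty genes and three hypothetical species.
    signature = [f"g{i}" for i in range(1, 21)]
    homologs_by_species = {
        "species_1": set(signature[:14]),  # 14 of 20 found
        "species_2": set(signature[:5]),   # 5 of 20 found
        "species_3": set(signature),       # all 20 found
    }
    for name, homologs in homologs_by_species.items():
        print(name, fraction_found(signature, homologs))
    print("conservation index:", conservation_index(signature, homologs_by_species))
```

Sorting populations by this index within each lineage gives the ordering used for the heat map rows.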

Unlike solitary tunicates (e.g., Ciona), B. schlosseri has a defined vasculature with circulating blood cells (including cells with lymphocyte-like and macrophage-like morphology; Schlumpberger et al., 1984; Ballarin et al., 2008; Video 1). As such, we further investigated by PCR and re-sequencing the expression of all 28 B. schlosseri genes with homology to human HSCs (n = 14) and early erythroid CD71+ blood cell (n = 14) gene sets. Strikingly, we found evidence for expression of 13 HSC homologs in the B. schlosseri endostyle stem cell niche (Voskoboynik et al., 2008), and 7 in the vasculature. We also confirmed expression of all 14 early erythroid CD71+ genes in the vasculature and endostyle (Supplementary file 3). Thus, our analysis not only identified B. schlosseri genes that may define evolutionary precursor cells of human hematopoietic lineages, but also indicates that the evolution of hematopoiesis proceeded from stem cells to myeloid populations to lymphoid populations, leading to the eventual emergence, absent in B. schlosseri, of T/B-cell based adaptive immunity in vertebrates (Figure 5; Supplementary file 3).

Not surprisingly, the B. schlosseri genome lacks significant homology to most genes known to play an important role in the vertebrate adaptive immune system. For instance, no evidence for the following immune-related genes could be found: (i) assembled major histocompatibility genes, (ii) genes with homology to RAG1/RAG2, which are involved in immunoglobulin and T-cell receptor rearrangements, (iii) terminal deoxynucleotidyl transferase, which adds nucleotides to the rearranging VDJ elements to create receptor diversity, (iv) V region subgenic elements encoding T cell and immunoglobulin antigen receptor domains, or (v) VLR-like immune receptor elements found in lampreys (Weigert et al., 1970; Davis et al., 1984; Oettinger et al., 1990; Fagan and Weissman, 1998; Laird et al., 2000; Muramatsu et al., 2000; Pancer et al., 2004; Rogozin et al., 2007; Dishaw and Litman, 2009; Flajnik and Kasahara, 2010; Hirano et al., 2011). We identified a large fraction (∼50%; Supplementary file 2H; Figure 3—figure supplement 4) of intron-less genes in the B. schlosseri draft genome, including retroviral genes such as Gag, Pol, Env and LTRs, which are used by viruses to insert their genetic sequences into host genomes. As adaptive immunity genes like RAG1/RAG2 are intron-less and first appear in jawed vertebrates, it has been suggested that they may have originated via horizontal infections by primitive retroviral-like agents, and/or gene transfer (Bartl et al., 1994). In addition, the B. schlosseri genome encodes homologues of Foxn1, the thymus epithelial gene mutated in the immunodeficient nude mouse (nu/nu), a marker of the thymopoietic microenvironment in vertebrates (Nehls et al., 1996; Bajoghli et al., 2011). These data indicate that at least some genetic circuitry relevant for vertebrate adaptive immunity was already in place in the common ancestor of the protochordate B. schlosseri and vertebrates. This leaves open the question of whether Ig or TCR genes, and the MHC proteins that capture and present intracellular peptides to T cells expressing these TCR proteins, existed in antecedents to B. schlosseri but were lost, or were somehow introduced after the line leading to organisms with an adaptive immune system split from colonial tunicates. As omnis DNA e DNA, this question is perhaps the most puzzling of our findings.

In conclusion, using a novel method for deciphering eukaryotic genomes, we assembled and analyzed the genome of B. schlosseri, the first colonial tunicate to be sequenced. One of the great challenges in evolutionary biology is to understand how differences in DNA sequences between species underlie distinct phenotypes. The B. schlosseri genome provides an important new resource for unraveling the genes and regulatory logic that led to the emergence of vertebrates and lymphoid-mediated immunity. Moreover, the many important features encoded by the B. schlosseri genome will facilitate new insights into complex vasculature, chimerism among kin, whole-body stem cell-mediated regeneration, and a colonial lifestyle.

Scaffold name	Chromosome preparation 1	…	Chromosome preparation n
Scaffold 1	Number of reads associated with the scaffold and preparation	…	…
…	…	…	…
Scaffold n	…	…	…

Author details

Ayelet Voskoboynik (for correspondence)
Norma F Neff (contributed equally)
Debashis Sahoo (contributed equally)
Aaron M Newman (contributed equally)
Dmitry Pushkarev (contributed equally)
Winston Koh (contributed equally)
Benedetto Passarelli
H Christina Fan
Gary L Mantalas
Karla J Palmeri
Katherine J Ishizuka
Carmela Gissi
Francesca Griggio
Rachel Ben-Shlomo
Daniel M Corey
Lolita Penland
Richard A White III
Irving L Weissman (for correspondence)
Stephen R Quake (for correspondence)
