1. Developmental Biology and Stem Cells
  2. Genomics and Evolutionary Biology
Download icon

Complex transcriptional regulation and independent evolution of fungal-like traits in a relative of animals

  1. Alex de Mendoza Is a corresponding author
  2. Hiroshi Suga
  3. Jon Permanyer
  4. Manuel Irimia
  5. Iñaki Ruiz-Trillo Is a corresponding author
  1. Universitat Pompeu Fabra, Spain
  2. Prefectural University of Hiroshima, Japan
  3. Centre for Genomic Regulation, Spain
  4. Institució Catalana de Recerca i Estudis Avançats, Spain
Research Article
Cited
10
Views
3,355
Comments
0
Cite as: eLife 2015;4:e08904 doi: 10.7554/eLife.08904

Abstract

Cell-type specification through differential genome regulation is a hallmark of complex multicellularity. However, it remains unclear how this process evolved during the transition from unicellular to multicellular organisms. To address this question, we investigated transcriptional dynamics in the ichthyosporean Creolimax fragrantissima, a relative of animals that undergoes coenocytic development. We find that Creolimax utilizes dynamic regulation of alternative splicing, long inter-genic non-coding RNAs and co-regulated gene modules associated with animal multicellularity in a cell-type specific manner. Moreover, our study suggests that the different cell types of the three closest animal relatives (ichthyosporeans, filastereans and choanoflagellates) are the product of lineage-specific innovations. Additionally, a proteomic survey of the secretome reveals adaptations to a fungal-like lifestyle. In summary, the diversity of cell types among protistan relatives of animals and their complex genome regulation demonstrates that the last unicellular ancestor of animals was already capable of elaborate specification of cell types.

https://doi.org/10.7554/eLife.08904.001

eLife digest

All living animals are descended from a single-celled ancestor, and understanding how these ancestors became the first multicellular animals remains a major challenge in the field of evolutionary biology. An early breakthrough towards this goal was the realization that, even though they’re mostly single-celled organisms, the closest living relatives of animals share most of the basic gene toolkit that animals use to support their multicellular lifestyles.

This shared toolkit also includes the genes that allow each specialized cell type in an animal (for example, a skin cell or liver cell) to express the subset of genes that it needs to fulfil its specific role. Discovering how the single-celled relatives of animals regulate these and other “multicellularity-related” genes during their life cycles is the next crucial step towards understanding how animals became multicellular.

Creolimax fragrantissima is a single-celled relative of animals. One stage in this organism’s life cycle involves its nucleus (which contains its genetic material) replicating multiple times without the cell itself dividing. After this stage of development, new cells are formed, each receiving with a single nucleus, and released to live freely in the environment. Characterizing how C. fragrantissima regulates which genes are expressed during these two very different stages of development could shed new light on how multicellular animals evolved to regulate their genes in specific cell types. However, little is known about these processes in C. fragrantissima.

Now, de Mendoza et al. have both sequenced C. fragrantissima’s genome and analysed which genes are expressed during the stages of its life cycle. This analysis reveals that this organism regulates its gene expression in several ways that are more commonly associated with gene regulation in multicellular animals. Furthermore, when compared to two other living relatives of animals that have brief multicellular stages in their life cycles, de Mendoza et al. found that the three organisms expressed similar genes during these similar life cycle stages.

Furthermore, like fungi, C. fragrantissima digests its food externally and then absorbs the nutrients. Using a range of techniques, de Mendoza et al. identified the proteins involved in these processes and discovered that many had evolved independently from their counterparts in fungi. Furthermore, in some cases, the genes for these proteins had actually been acquired from bacteria via a process called lateral gene transfer.

Together these findings suggest that it was likely that the last single-celled ancestor of multicellular animals already had the biological ability to create different cell types. Understanding if the cell types found in single-celled species resemble cell types from simple animals, such as sponges and comb jellies, at a molecular level is the next step towards determining what the ancestor of living animals looked like.

https://doi.org/10.7554/eLife.08904.002

Introduction

The process by which multicellular animals develop from a unicellular zygote is believed to mirror the first evolutionary steps that led to the origins of animal multicellularity from a unicellular species (King, 2004). As the development of complex multicellularity is dependent upon differential genome regulation, the evolutionary onset of animal multicellularity must likewise have involved the appearance of differential genome regulatory capacities leading to distinct cell types. Many of the genes involved in the control of animal development and cell type identity, including signaling pathways and transcription factors (TFs), pre-date animal origins (Sebe-Pedros et al., 2011; Sebé-Pedrós et al., 2012; Suga et al., 2012; Richter and King, 2013). As these genes are known to be present in the genomes of the protistan relatives of animals, the unicellular holozoans (King et al., 2008; Suga et al., 2013; Fairclough et al., 2013), the evolution of complex multicellularity, with cell type-specific transcriptional programs, must have involved changes in gene regulation. Therefore, a key step in understanding the evolution of multicellularity will be to infer the regulatory complexity of the last common ancestor of all living animals.

To address ancestral regulatory complexity, it will be necessary to elucidate the molecular control of cell differentiation through development in the unicellular relatives of animals. Three distinct developmental modes that lead to transient simple multicellular forms have been described in the protistan relatives of animals (Figure 1) (Sebé-Pedrós et al., 2013; Suga and Ruiz-Trillo, 2013; Dayel et al., 2011). Colonial clonal development has been shown to involve differential regulation of a few multicellularity-related genes in the choanoflagellate Salpingoeca rosetta (Fairclough et al., 2013). On the other hand, the filasterean amoeba Capsaspora owczarzaki exhibits up-regulation of adhesion-related genes in its aggregative stage (Sebé-Pedrós et al., 2013). To date, however, there has been no molecular characterization of coenocytic development, a third and completely distinct mode of development observed in the ichthyosporeans. Ichthyosporeans are the earliest branching holozoan lineage (Torruella et al., 2012; Paps et al., 2013; Torruella et al., 2015), and coenocytic development is a shared feature within the group (Glockling et al., 2013; Mendoza et al., 2002). This developmental mode comprises a growth stage in which nuclei divide synchronously within a common cytoplasm before undergoing cellularization, followed by release of motile ameboid zoospores (Suga and Ruiz-Trillo, 2013; Marshall et al., 2008). These stages have distinct physiological and structural characteristics, the amoeboid stage is mono-nucleated, non-diving, and motile (Figure 1B), while the multinucleate stage has a cell wall, a big central vacuole, and does not move (Figure 1C). Despite being quite distinct from canonic animal development, coenocytic development and multinucleate cell types are found in some animal lineages, such as in Drosophila syncytial blastoderm (Suga and Ruiz-Trillo, 2013). Thus, in order to infer ancestral regulatory complexity by comparison of premetazoan developmental modes, it will be essential to obtain a detailed molecular characterization of ichthyosporean coenocytic development.

Figure 1 with 1 supplement see all
Evolution of developmental and feeding modes across holozoans.

(A) The cladogram represents known phylogenetic relationships among holozoans (Torruella et al., 2012; Torruella et al., 2015). Each lineage is represented by the species proposed as a model system with a schema of its developmental mode on the right. The evolution of specialized osmotrophy is shown as a blue triangle in the cladogram, while the putative ancestral phagotrophic feeding mode of opisthokonta is shown as an orange circle (Cavalier-Smith, 2012). Divergence times of the lineages shown in this figure range between 700 Mya (considered the latest estimates of animal origins) and 1200 Mya (earliest estimates of Opisthokont origins) (Sharpe* et al., 2015). Micrographs depicting the (B) amoeboid stage and (C) multinucleate stage of Creolimax fragrantissima are shown. Scale bars = 10 μm. Choanoflagellate adapted from Mark Dayel (CC BY-SA 3.0) www.dayel.com/blog/2010/10/07/ choanoflagellate-illustration.

https://doi.org/10.7554/eLife.08904.003

Here we describe the transcriptome dynamics of Creolimax fragrantissima, which has been proposed as a model system for ichthyosporeans (Suga and Ruiz-Trillo, 2013; Marshall et al., 2008). We show that Creolimax employs complex gene regulation including alternative splicing and the use of long intergenic non-coding RNAs (lincRNA). Through analysis of the Creolimax secretome in silico and by proteomics, we also provide evidence of secondary adaptation to a specialized osmotrophic feeding mode through lateral gene transfer (LGT) and gene duplication. Taken together, our results suggest that the last common unicellular ancestor of animals was already capable of implementing elaborate, cell type-specific differentiation programs.

Results

Assembly and annotation of a reference genome

As a reference genome for mapping RNA sequencing (RNA-seq) data, we assembled the 45 Mb draft genome of Creolimax from a combination of 454 reads and mate-paired end Illumina reads corresponding to a 75× coverage. The draft assembly comprises 82 scaffolds, with an N50 of 1.5 Mb (see Materials and methods). The genome is twice as large as that of Capsaspora but in line with sequenced choanoflagellate genomes. We annotated 8695 genes, 92% of which are supported by transcriptional evidence. A diverse array of annotation pipelines rendered functional information on 78% of the predicted protein coding genes (see Material and methods). Among those genes, many belong to gene families involved in multicellularity and development in animals, such as TFs and signaling pathways, as previously reported in targeted gene family analysis (Suga et al., 2014; de Mendoza et al., 2013; de Mendoza et al., 2014).

Transcriptional dynamics reveal differences between Creolimax multinucleate stage and animal development

We isolated two different stages of the Creolimax lifecycle by taking advantage of the cell size difference between the amoeboid stage and the multinucleate growth stages (Marshall et al., 2008). After filtering with a 5 μm mesh, we obtained a highly enriched culture of amoebae. The amoebae encysted and grew for at least 24 hr (Figure 1—figure supplement 1), and then multinucleate coenocytic cysts matured asynchronously, releasing new amoeboid zoospores with different cell sizes to form a heterogeneous culture. Given the drastic morphological difference between the non-motile mitotic multinucleate stage pre-dominant in the 24 hr culture and the motile non-dividing amoeba isolated after filtration, we decided to investigate their transcriptomic differences through RNA-seq (see Material and methods).

The amoeboid and the multinucleate stage showed distinct transcriptional profiles, consistent among replicates (Figure 2A), from which we identified 956 genes as significantly differentially expressed. Functional enrichment analyses of Gene Ontologies (GOs) and PFAM domains (p<0.01, Fisher’s exact test) revealed that the multinucleate stage shows up-regulation of genes associated with cell growth, including ribosome, translation, DNA replication, amino acid and RNA metabolism activities (Figure 2, Figure 2—figure supplement 1). Conversely, in the amoeboid stage, we found enrichment for signaling activities (GTPases and kinases) and an up-regulation of the actin cytoskeleton, most likely involved in the motile behavior of the amoeba (see raw GOs in Figure 2—source data 1). Strikingly, single-celled amoebas also showed up-regulation of extracellular matrix (ECM) adhesion, including the up-regulation of the integrin pathway components.

Figure 2 with 2 supplements see all
Differential gene expression in Creolimax.

(A) Diagram of the amoeboid and multinucleate stages, and heatmap showing the significantly differentially expressed genes across biological replicates in the pair-wise stage comparison. (B) Gene set enrichment analysis for the two stages. Orange represents enrichment in the amoeboid stage and blue represents enrichment in the multinucleate stage, color intensity depicts level of significance (p value). Node size represents the total number of genes in each GO, and edge width represents the total number of genes shared between each enriched GO category. Functionally related GOs are manually circled in gray shade according to functional and genic redundancy established by network connectivity. Complete list of GOs and inclusive groupings are found in Figure 2—source data 1. GOs, Gene Ontologies.

https://doi.org/10.7554/eLife.08904.005

Besides signaling and adhesion activities, regulation through TFs is also considered to be a defining characteristic of animal development (de Mendoza et al., 2013; Levine and Tjian, 2003). To address whether that is the case for the multinucleate stage in Creolimax, we compared the stage-specific expression level of all the TFs encoded in the genome. Contrary to our expectation, the comparison revealed that TF expression is higher in the amoeboid stage (p<0.05, Wilcoxon signed rank test, Figure 2—figure supplement 2A). Among the 127 putative TFs found in the genome, only 5 are significantly up-regulated in the multinucleate stage. These five genes have DNA binding domains with ambiguous TF activity and include genes containing Myb_DNA-binding and cold-shock domains (Figure 2—figure supplement 2B). In the amoeboid stage, we found significant up-regulation of 11 TFs, including T-box, Runx and hemophagocytic lymphohistiocytosis TFs, and TF classes that have important roles in animal development (Sebe-Pedros et al., 2011). Altogether, these data indicate that, unlike in animal development, the multinucleate stage of Creolimax is not characterized by a tight regulation of adhesive, signaling, and TF activities.

Differential alternative splicing down-regulates specific pathways in Creolimax

Alternative splicing is a widespread mechanism for post-transcriptional gene regulation in animals and many other eukaryotes. To evaluate the extent to which alternative splicing expands transcriptome complexity in the intron-rich genome of Creolimax (6.5 introns per gene), we undertook a comprehensive analysis of intron retention and exon skipping (ES) events, and quantified their inclusion levels using previously developed pipelines (Braunschweig et al., 2014; Irimia et al., 2014) (see Material and methods). We observed 3927 intron-retention events that affected 2172 genes. In contrast, we found only 211 genes affected by ES. Despite having a two-fold higher intron density and longer introns compared to Capsaspora, both holozoan species show comparable alternative splicing profiles that are highly dominated by intron retention (Sebé-Pedrós et al., 2013). This provides further support for the hypothesis that ES-dominated alternative splicing is a unique feature of animal transcriptomes (Sebé-Pedrós et al., 2013; Mcguire et al., 2008; Irimia and Roy, 2014).

Next, we evaluated the extent to which intron retention is differentially regulated across Creolimax stages. We found 865 introns with differential retention between stages (differential percent intron retention (ΔPIR) >15), the majority of which were more retained in the amoeboid stage (Figure 3A). Reverse transcription-polymerase chain reaction (RT-PCR) assays confirmed differential retention for all 10 tested introns (Figure 3D and Figure 3—figure supplement 2). Despite these marked differences in intron retention between stages, the size distribution of retained introns was similar in both stages, and comparable to that of constitutively spliced introns (PIR<2% in all samples) (Figure 3—figure supplement 1A,B). GO enrichment analysis of genes with amoeboid-specific intron retention revealed enrichment in spindle pole formation and other mitosis-related activities (Figure 3C), supporting an active role of IR in down-regulating these functions in this stage. Indeed, consistent with recent reports in vertebrates (Braunschweig et al., 2014), genes with amoeboid-specific intron retention showed significantly lower steady-state mRNA levels compared to the multinucleate stage (Figure 3—figure supplement 1C; p<0.0001, Wilcoxon signed rank test; the converse was true for genes with multinucleate-specific intron retention). Therefore, intron retention is a conserved mechanism for reducing transcript levels in pathway-specific genes from unicellular holozoans to vertebrates.

Figure 3 with 2 supplements see all
Regulated alternative splicing modes in Creolimax.

(A) Heatmap showing PIR inclusion levels of differentially retained introns. (B) Heatmap showing the PSI levels of differentially skipped exons. (C) GO enrichment activities of the genes showing differential IR. Bar length indicates the significance of the enrichment, orange indicates those with higher inclusion levels in the amoeboid stage and blue those with higher inclusion levels in the multinucleate stage. (D–E) RT-PCR validations of selected IR and ES events. The values correspond to relative intensity of the alternative isoform (retained intron or skipped exon) bands in the RT-PCR and the proportions observed for the inclusion values in the RNA-seq. (F) GOs enrichment of genes with differential ES, in blue those with higher exon inclusion levels in the multinucleate stage. ES, exon skipping; GO, Gene Ontology; IR, intron retention; PIR, percent intron retention; PSI, percent spliced in; RT-PCR, reverse transcription-polymerase chain reaction; RNA-seq, RNA sequencing.

https://doi.org/10.7554/eLife.08904.009

In the case of ES, we found 63 exons with differential inclusion levels across stages (Figure 3B). RT-PCR assays were used to validate the differential inclusion for all 12 tested cases (Figure 3E and Figure 3—figure supplement 2). Only a minority of these exons (Torruella et al., 2015) keep the reading frame upon skipping, and only two of these overlapped with functional Pfam domains, suggesting that, similar to intron retention, ES in Creolimax predominantly contributed to functionally down-regulating specific genes. GO enrichment analysis revealed that differentially spliced exons belong to genes involved in various biological processes, including channel activity and histone modifications (Figure 3F).

A population of lincRNAs with regulated expression in Creolimax

Another layer of transcriptional complexity is provided by long non-coding RNAs, which are increasingly recognized as important players in animal development and cell type-specific genome regulation (Ulitsky and Bartel, 2013; Morris and Mattick, 2014; Sauvageau et al., 2013). To characterize the repertoire of long non-coding RNAs in Creolimax, we assembled a de novo transcriptome from the RNA-seq data. We filtered out transcripts shorter than 200 bp and mapped the rest to the genome. We then applied a pipeline to identify putative non-coding RNAs, including searches for coding potential, homology, untranslated region (UTR) mis-annotation, and low expression (see Material and methods). Restricting our search to transcripts that did not overlap with protein-coding genes, known as long-intergenic RNAs (lincRNAs ), we identified 692 putative lincRNA loci in Creolimax. In comparison to protein-coding transcripts, lincRNAs in Creolimax were shorter in length, harbored fewer exons, had longer exons, and had a lower GC content (Figure 4—figure supplement 1). Moreover, overall transcription levels of lincRNAs were significantly lower than those of protein-coding genes (p<0.01, Wilcoxon rank sum test). Interestingly, all those characteristics have been reported for animal lincRNAs (Hezroni et al., 2015; Pauli et al., 2012; Gaiti et al., 2015).

To infer possible functions of Creolimax lincRNAs, we first looked at the functional annotations of the closest neighboring protein-coding genes, as animal lincRNAs are enriched near developmental genes and TFs (Ulitsky and Bartel, 2013). In Creolimax, the only significantly enriched GOs of lincRNA closest neighboring genes were metabolic activities (p<0.01). However, when we analyzed the vicinity of all the TFs, we found that 23.6% had at least one neighboring lincRNA, a significant enrichment compared to the rest of the genome (14.5%, p=0.0007, Fisher’s exact test). Next, we searched for putative homologs of the lincRNAs in the transcriptomes of closely related species and other unicellular holozoans, but we did not retrieve any positive hits. This pattern of rapid evolutionary sequence turnover of lincRNAs has also been described for animals (Gaiti et al., 2015; Kapusta and Feschotte, 2014), where homology detection based on sequence similarity is restricted to short evolutionary distances.

In animals, expression of lincRNAs is generally restricted to specific tissues and organs (Ulitsky and Bartel, 2013; Necsulea and Kaessmann, 2014). In Creolimax we detected 51 lincRNA loci that were differentially expressed between the amoeba and the multinucleate stage (Figure 4). Overall, only 7% of the total detected lincRNAs are differentially expressed, compared to 10% of coding genes (p=0.0059, Fisher’s exact test). Thus, in stark contrast to animals, the lincRNAs in Creolimax appear to be less cell type-specific than coding genes.

Figure 4 with 2 supplements see all
Transcriptional and post-transcriptional regulation of lincRNAs in Creolimax.

(A) Heatmap showing transcriptional levels of significantly differentially expressed lincRNAs across biological replicates of amoeboid and multinucleate stages. (B) Example of genomic region where two lincRNA loci are found in tail-to-tail orientation surrounded by two protein-coding genes. (C) RT-PCR validations of the lincRNA loci. (D) Barplot depicting the average gene expression of those lincRNA in each stage. (E) Alternative splicing isoforms of lincRNAs showing various degrees of IR. IR, intron retention; lincRNA, long intergenic non-coding RNAs; RT-PCR, reverse transcription-polymerase chain reaction.

https://doi.org/10.7554/eLife.08904.012

To test whether the transcription of lincRNAs is linked to their upstream genes, we first subdivided lincRNAs into two categories: head-to-head, where the lincRNA and the adjacent protein-coding gene have opposite orientation and may share the same promoter, and head-to-tail, where the lincRNA promoter is downstream of the adjacent protein-coding gene (Figure 4—figure supplement 2). Head-to-head orientation was significantly over-represented in the genome (485/692, 70% of the total vs the 50% expected by chance, p<3.6e-14, χ2 test). Furthermore, pairs in this head-to-head orientation showed a higher proportion of positively correlated expression levels, although this was similar to pairs of protein-coding genes in the head-to-tail orientation. Moreover, 80% of differentially regulated lincRNAs are in head-to-tail orientation or display uncoupled transcriptional profiles in relation to their neighbor genes (p<0.05 Pearson’s correlation coefficient). Although the possibility that bidirectional promoters are the main source of lincRNAs in Creolimax cannot be excluded, differential regulation of lincRNA is largely independent of their neighboring protein-coding genes.

Finally, we analyzed the impact of alternative splicing in lincRNAs. Although the majority of lincRNAs have a single exon, 132 (19.1%) loci have two or more exons. Among those multi-exon transcripts, we detected 183 intron-retention events (Figure 4E). Of these, eight events differed between samples (PIR>15), suggesting differential regulation of lincRNAs also at the level of splicing. Therefore, both transcriptional regulation and alternative splicing may play a role in the active regulation of lincRNAs in Creolimax.

Evidence of species-specific cell type evolution in Holozoa

Direct comparison of steady-state transcript abundance is useful to reveal cross-species molecular homologies in animals, as homologous tissues in different species show more similarity in the transcriptional profiles than do non-homologous tissues in a single species (Barbosa-Morais et al., 2012; Chan et al., 2009; Brawand et al., 2011). Using a similar rationale, we sought to compare the different cell stages of unicellular holozoans. From normalized transcriptional levels of a set of 2177 one-to-one protein-coding orthologs between Salpingoeca, Capsaspora, and Creolimax, we calculated pairwise Spearman correlation distances, which were then used to perform hierarchical clustering and neighbor-joining tree reconstructions (Figure 5). Both approaches showed consistent trees with species-specific clustering for the different stages. The signal obtained from those 2177 genes was enough to cluster samples consistent with their whole transcriptome patterns (except one sample from Salpingoeca).

Holozoan cross-species comparison of transcriptional profiles.

(A) Symmetrical heatmap of the pair-wise Spearman correlation coefficients for the gene expression profiles of each cell stage. For each sample, log2(cRPKM+1) expression levels were obtained for 2177 one-to-one orthologs in the three species analyzed (see Materials and methods). Dashed-line squares highlight the direct comparisons for 1) Cfra multinucleate stage replicates against Cowc cystic stage replicates and 2) Cfra amoeboid stage replicates against Cowc aggregate and filopodial stage replicates. (B) Neighbor-joining tree of the species cell stages based on the aforementioned Spearman correlation distances matrix. Filled circles represent >95% bootstrap replicate nodal support. (C) The cell types plotted according to the values of the principal components 2 and 3 from a PCA of a dataset of 3030 1-to-1 orthologs between Capsaspora and Creolimax. (D) The significant GO enrichments for the top positive loading genes (>0.03) of the principal component 2 and 3. Cfra, Creolimax fragrantissima; Cowc, Capsaspora owczarzaki; GO, Gene Ontology; Sros, Salpingoeca rosetta; PCA, principal component analysis.

https://doi.org/10.7554/eLife.08904.015

Direct cell type comparisons between species revealed that Creolimax multinucleate stage and Capsaspora cystic stage are the most dissimilar among all samples, whereas the amoeboid stage in Creolimax has a stronger correlation with the filopodial and aggregative stages of Capsaspora (Figure 5A). In order to characterize gene sets responsible for those cross-species similarities, we used a principal component analysis (PCA) to distinguish between the species-specific factors and the underlying biological similarities. Species-specific variability was captured by PC1 (47.96%); however, when we plotted PC2 and PC3 we obtained different groupings of cell types independent of their species of origin (Figure 5C). PC2 grouped the multinucleate stages of Creolimax with the aggregative and the filopodial stages of Capsaspora. This grouping was explained by top-loading genes involved in DNA replication and spliceosomal activities (Figure 5D), a pattern consistent with the down-regulation of mitotic activity and cell growth in the Capsaspora cystic stage (Sebé-Pedrós et al., 2013). On the other hand, PC3 grouped the amoeboid stage of Creolimax and the filopodial and aggregative stages of Capsaspora. PC3 was loaded with genes involved in the integrin adhesome and amoeboid actin-based motility and signaling activities (Figure 5D). Therefore, despite the vast phylogenetic divergence among holozoan species, some functional patterns can be recovered through direct comparison of transcriptional profiles.

We used the same approach to compare RNA-seq data from a wide range of human cell types and tissues with those of Creolimax (Figure 6A). Certain human cell types showed differential grades of similarity in the expression profiles with each of the Creolimax stages. The human cell types with a higher positive correlation with the multinucleate stage were those with high proliferation rates, including embryonic stem cells, induced pluripotent stem cells and transformed cell lines (293T, HeLA, K562) (Figure 6B). Similar patterns were obtained using PCA on the normalized steady-state transcriptional levels from both species. PC1 explained most of the variability due to species-specific transcriptomic profiles, whereas PC2 distinguished between highly proliferative cell types and the rest in both species (Figure 6C). Consistent with this observation, GO enrichment analysis of the top 125 genes that most contributed to PC2 showed highly significant enrichment for genes involved in genome replication and proliferation (Figure 6D). These results suggest that a signal from the evolutionarily conserved machinery for cell proliferation in eukaryotes (Harashima et al., 2013; Cross et al., 2011) can be detected from direct expression pattern comparisons between the multinucleate stage of Creolimax and highly proliferative human cell types.

Comparison of human and Creolimax cell types and tissues.

(A) Symmetrical heatmap of the pair-wise Spearman correlation coefficients for the gene expression profiles of each cell type or tissue. For each sample log2(FPKM+1) expression levels were obtained for 2272 one-to-one orthologs between Creolimax and human (see Materials and methods). (B) The human cell types sorted by the difference of Spearman correlation between the amoeboid and the multinucleate cell stages. Highlighted in gray are those that displayed the major differences (>0.05). (C) The cell types plotted according to values of the principal components 1 and 2 from a PCA of the same transcriptional dataset of 2272 orthologs. The dots in gray are the human cell lines highlighted in the previous section. (D) The significant GO enrichments for the top positive loading genes of the principal component 2 (>0.04). Sampled human cell types described in Figure 6—source data 1. GO, Gene Ontology; PCA, principal component analysis.

https://doi.org/10.7554/eLife.08904.016

Conserved ancient co-regulated gene modules: the Integrin adhesome

To obtain more specific insights into the evolution of co-regulated gene programs across holozoans, we investigated several gene modules with key roles in animal multicellularity. Despite gene co-regulation providing indirect evidence for shared functionality in a given species (Gerstein et al., 2014; Stuart, 2003), comparative analysis of transcriptional co-regulated gene modules in different species may offer additional insights into the functional evolution of the animal multicellular toolkit.

First, we focused on the integrin adhesome, which is crucial for ECM adhesion in animals. The core components of the integrin adhesome have been identified in Capsaspora (Sebe-Pedros et al., 2010). Interestingly, these components are significantly up-regulated in the aggregative stage (Sebé-Pedrós et al., 2013). In contrast, most integrin adhesome components have been lost in choanoflagellates (Sebe-Pedros et al., 2010). In the case of Creolimax, we found a nearly complete repertoire of the integrin adhesome components (Figure 7D), and most of them are significantly up-regulated in the amoeboid stage (Figure 7A). To test whether the integrin adhesome components constitute a co-regulated gene module in the three unicellular holozoans (ichthyosporeans, filastereans, and choanoflagellates), we calculated the transcriptional Pearson correlation coefficients between all of the genes encoding the integrin adhesome for each species (Figure 7A–C). In the case of Creolimax, we observed a remarkable co-regulation among all of the components. The only exception is the Src tyrosine kinase (Figure 7A), suggesting that tyrosine kinase signaling is not connected to the integrin signaling pathway in Creolimax, consistent with the absence of a focal adhesion kinase in the ichthyosporean lineage (Suga et al., 2014). In the filasterean Capsaspora, we identified a more complex pattern of co-regulation, grouped in two submodules that are associated with specific paralogs of both alpha and beta integrins (Figure 7B). This complex pattern of co-regulation could be affected by the number of Capsaspora’s samples, higher than those analysed in Creolimax. In contrast, we did not detect any co-regulation among the few conserved integrin adhesome components in the choanoflagellate Salpingoeca (Figure 7C), suggesting that gene loss in the choanoflagellate lineage was accompanied by dismantling of this ancient co-regulatory module.

Figure 7 with 2 supplements see all
Co-regulation of the integrin adhesome in holozoans.

Heatmaps depicting expression levels of integrin adhesome orthologs (red–green) and their pair-wise Pearson correlation coefficients (white–blue) obtained from genome-wide FPKM transcriptional levels in the ichthyosporean Creolimax (A), the filasterean Capsaspora (B), and the choanoflagellate Salpingoeca (C). (D) Diagram of integrin adhesome components, those in green are found in Creolimax and those in white are absent. In gray, a tyrosine kinase receptor with extracellular EGF domains encoded Creolimax genome that could be interacting with an ECM component. (E) Repertoire of animal ECM domains in the three unicellular holozoan genomes; green = presence, white = absence. (F) Pfam domain architectures of fibronectin-domain containing genes in Creolimax. ECM, extracellular matrix; FPKM, fragments per kilobase of exon per million fragments mapped.

https://doi.org/10.7554/eLife.08904.018

In animals, integrins mediate cell-adhesion through binding to ECM proteins, such as laminins and fibronectins (Cromar et al., 2014). In Capsaspora, laminin-containing genes with predicted secretion signals are up-regulated in the aggregative stage (Sebé-Pedrós et al., 2013). However, despite showing a co-regulated integrin adhesome, we did not find any laminin-containing genes in Creolimax (Figure 7E). The only ECM-related domains that we found in the genome were fibronectins, but those are fused to metabolic domains unrelated to the ECM and none had a secretion signal peptide (Figure 7F). Therefore, it seems unlikely that integrins are involved in adhesion to an endogenous animal-like ECM in Creolimax.

Next, we investigated the filopodial machinery (Sebé-Pedrós et al., 2013). We observed strong positive co-regulation of a core set of components in all unicellular holozoans (Figure 7—figure supplement 1), consistent with the use of an ancestral cellular machinery for filopodial formation. In stark contrast to the situation for cell replication machinery, we did not observe a single co-regulated module in any unicellular species for genes involved in animal neuronal pre-synaptic and post-synaptic processes, including most pairs of genes that are known to directly interact in animals (with the exception of syntaxin, synaptobrevin, and synaptogryin, which are involved in secretory vesicle formation (Burkhardt et al., 2011); Figure 7—figure supplement 2). In fact, a lack of interaction has been shown for some of the core proteins involved in the post-synaptic scaffold in Salpingoeca (Burkhardt et al., 2014). Thus, our results suggest that some molecular complexes directly involved in cell morphology and behavior already formed co-regulated gene modules in unicellular holozoans, whereas other complexes involved in unique animal cell types were assembled later, despite having conserved orthologs.

The secretome of Creolimax shows convergent adaptations to a specialized osmotrophic lifestyle

Ichthyosporeans are an interesting case of convergent evolution of fungal-like traits; in fact, they were once thought to be fungi based on their lifestyle and morphology (Glockling et al., 2013; Mendoza et al., 2002). For instance, both groups have a cell wall, similar parasitic lifestyles, and a specialized osmotrophic feeding mode, unlike any of the other holozoan lineages (Figure 1). Specialized osmotrophs are characterized by their highly adapted secretomes that are key to the external digestion of complex compounds. Therefore, we analyzed the secretome of Creolimax to investigate the convergent evolution of this specialized osmotrophic feeding mode in the ichthyosporeans. In addition, secretome analysis should provide further insights into the production and modification of ECM and signaling ligands.

To characterize the secretome, we performed a combination of in vivo and in silico approaches. First, using high-throughput proteomics of an in vivo secretome sample in culture conditions, we identified 91 proteins. Next, we applied an in silico approach to predict 453 proteins that are likely to be secreted (see Material and methods). Interestingly, only 48 proteins were common to both datasets, indicating that proteins without a canonical signal peptide or with additional transmembrane domains could also be found in the in vivo secretome (Figure 8A). Among the 43 non-canonically secreted proteins, we detected some with transmembrane domains that could be the product of shedding, as proposed for other species (Meijer et al., 2014). For instance, we found peptides corresponding to the extracellular region of both alpha and beta integrins. Although shedding of integrins has been observed during the inflammatory response in animals (Gomez et al., 2012), the functional implications of integrin shedding in Creolimax remain elusive.

Functional enrichments of Creolimax secretome.

(A) Venn diagram showing the number of genes identified in the Creolimax secretome through an in silico approach (see Material and methods) and an in vivo proteomics approach. A circle diagram describes the features of genes only identified in the in vivo approach, lacking a signalP or having TM domains . (B) GO categories and (C) PFAM domains enriched in the secretome; in dark blue are those enriched in the in vivo dataset; in pale blue are those enriched in the in silico dataset; in gray are the total amount of PFAM-domain containing genes in the genome. GO, Gene Ontology.

https://doi.org/10.7554/eLife.08904.021

Both secretome datasets were highly enriched in peptidase and proteolytic activities (Figure 8B), whereas only the in silico dataset showed further enrichments in other catabolic functions such as lipase and hydrolase activities. Analysis of Pfam-domain enrichment consistently revealed proteolytic and peptidase domains in both datasets (Figure 8C). In contrast, an in-depth search for candidate ECM-related proteins and diffusible ligands based on Pfam domains (using known animal domains as EGF, DSL, or laminins) retrieved no positive hits. Thus, the proteolytic activity of the Creolimax secretome does not seem to be related to modifying the endogenous ECM, but possibly to the adaptation to a specialized osmotrophic feeding mode.

Osmotrophic lifestyles are characterized by the external digestion of complex polymers (Talbot, 2013). Therefore, the enrichment in proteolytic activities observed in the secretome strongly suggests that proteins and peptides are the main food source of Creolimax, at least under culture conditions. Such external digestion requires a coupled mechanism for nutrient uptake (Talbot, 2013). Consistent with this requirement, we found 38 genes with four distinct Pfam domains (PF00324, PF01490, PF03169 and PF00854) that are predicted to be involved in the amino acid and oligopeptide transporter activity.

The most-abundant gene family in Creolimax secretome was the trypsins (15 proteins in the in vivo proteomic dataset out of the 31 genes found in the genome). Trypsins are usually found in multiple copies in animal genomes, and they have important roles in the digestive system as serine proteases and also as ECM remodellers. To further complement these observations, we profiled a wide range of eukaryotes and found independent expansions of trypsins in other osmotrophic lineages (Figure 9A). Among these, two fungal species (Coemansia reversa and Conidiolobus coronatus) and an oomycete (Saprolegnia parasitica) are also animal-dwelling parasites, thus sharing a similar lifestyle to the known ichthyosporeans. Phylogenetic analyses of Creolimax trypsins (Figure 9B) revealed that most are the product of rapid lineage-specific gene duplications, a common source of molecular adaptation (Kondrashov, 2012). Moreover, we found very distinct patterns of transcript and protein abundance (measured by the number of unique peptides identified) across the trypsin paralogs (Figure 9B), suggesting a rapid process of functional diversification. Thus, trypsins seem to be a recurrently used effector gene family in osmotrophic eukaryotes, and have evolved rapidly in Creolimax.

Trypsin evolution.

(A) Barplot showing the total number of Trypsin proteins (PF00089) found in the genomes of diverse eukaryotes. Branches are color coded according to the taxonomy shown in the legend. (B) Maximum-likelihood phylogenetic tree based on the amino acid sequence of the Trypsin domain from Creolimax fragrantissima and Sphaeroforma arctica. Expression levels obtained from genome-wide FPKM calculation are shown. Number of unique peptides obtained from the in vivo secretome proteomic dataset is also shown. In red are those genes that do not present a signal peptide according to SignalP. FPKM, fragments per kilobase of exon per million fragments mapped.

https://doi.org/10.7554/eLife.08904.023

Recurrent prokaryotic LGT events shaped the Creolimax secretome

LGT is a rare but recurrent mechanism for the acquisition of osmotrophy-related genes in both fungi and oomycetes (Talbot, 2013). We therefore investigated LGT in Creolimax by building automatic taxon-rich phylogenies. Manual inspection of these trees yielded a list of 163 genes that confidently branched within bacterial clades, supporting bacterial origins. Of these, 35 (21.5%) were found only in Creolimax among all eukaryotes sampled, 50 (30.7%) were shared with the close relative Sphaeroforma arctica, and 71 (43.6%) were shared with other ichthyosporeans (Figure 10—figure supplement 2). Although most LGT genes (102 out of 163) were intron-less, the vast majority (143, 87.7%) showed transcriptional support from our polyA-selected RNA-seq, minimizing the possibility of bacterial contaminations. LGT from bacteria accounts for 1.8% of the total proteome, which is slightly higher than in other eukaryotic genomes (Alsmark et al., 2013). Importantly, we found 6 genes acquired by LGT in the in vivo secretome and 18 more in the in silico secretome (Figure 10—figure supplement 2). Therefore, similar to other eukaryotic lineages (Richards et al., 2011; Richards et al., 2011), ichthyosporeans have enriched their osmotrophy-related gene complement through prokaryotic LGT.

Among the six genes of prokaryotic origin that we found in the in vivo secretome, three belonged to the spore coat homology (CotH) family (PF08757). CotH proteins were first described as being fundamental for spore coat formation in the bacteria Bacillus subtilis (Naclerio et al., 1996), but they have recently been characterized as critical factors for host invasion in the fungus Rhizopus oryzae (Gebremariam et al., 2014). Our phylogenetic reconstruction of CotH family evolution revealed that the presence of CotH homologs in R. oryzae and other fungi (only belonging to the class mucorales) originated from an LGT event that was independent of those found in ichthyosporeans (Figure 10 and Figure 10—figure supplement 1). Interestingly, both lineages have expanded their CotH family members after LGT acquisition. Moreover, CotH genes were also found in other eukaryotic and archaeal lineages, suggesting a complex history of interdomain LGT similar to that recently described for other gene families (Funkhouser-Jones et al., 2014; Chou et al., 2014). Active gene duplication and domain shuffling characterized the evolutionary history of CotH in the ichthyosporea, where we could observe variable transcriptional levels and peptides among distinct paralogs, as well as the acquisition of an N-terminal LTD domain (PF00932) (Figure 10B). Active transcription and secretion of Creolimax CotH genes in axenic culture conditions underscore their putative role in host invasion and highlight the importance of LGT in effector gene acquisition across different osmotrophic lineages.

Figure 10 with 2 supplements see all
CotH evolution.

(A) Maximum-likelihood phylogenetic tree of the CotH domain (PF08757). Nodal support is shown in key branches (100 maximum likelihood replicates bootstrap values and Bayesian posterior probabilities). Color code indicates taxon distribution in each clade as depicted in the legend; for a detailed tree, see Figure 10—figure supplement 1. (B) Detail of the phylogenetic tree depicting ichthyosporean CotH sequences, covering Creolimax, Sphaeroforma arctica, and Amoebidium parasiticum. Expression levels obtained from genome-wide FPKM calculation and the number of unique peptides obtained from the in vivo secretome proteomic dataset are shown. Domain configurations obtained from a PfamScan analysis. Gene identifiers in red are those that do not present a signal peptide according to SignalP. FPKM, fragments per kilobase of exon per million fragments mapped.

https://doi.org/10.7554/eLife.08904.024

Discussion

In this study, we showed how different layers of genome regulation shape the coenocytic development and lifecycle of the ichthyosporean Creolimax, a unicellular relative of animals. These regulatory layers include complex mechanisms that also play a role in animal development and cellular differentiation. In Creolimax, differential genome regulation is not only limited to the transcriptional control of protein coding genes, but also includes cell-stage specific alternative splicing and lincRNA expression. Despite some global similarities to animal regulation, we observed significant differences in how these regulatory layers are deployed in Creolimax and animals. For example, the population of lincRNAs that we described in Creolimax does not show a major enrichment near developmental genes nor a highly specific cell type-dependent expression, both hallmarks of metazoan lincRNAs (Ulitsky and Bartel, 2013; Gaiti et al., 2015). The differential processing of the intron-rich genes during alternative splicing in Creolimax is dominated by IR and not ES. Although this is similar to most other eukaryotic groups including sponges (Fernandez-Valverde et al., 2015), it contrasts with alternative splicing in more complex animals, which predominantly involves ES (Sebé-Pedrós et al., 2013; Mcguire et al., 2008). Therefore, while alternative splicing in Creolimax is likely to contribute to the regulation of gene expression, it does not provide a greater expansion of the proteome by generating multiple protein isoforms from individual genes as in animals (Irimia and Roy, 2014).

Intriguingly, while this complex genome regulation in Creolimax often involves similar gene toolkits to those employed in animal development, it is the unicellular amoeboid stage and not the multinucleate stage that is defined by multicellularity-related activities. This is the case for most differentially expressed TFs, signaling pathways, and adhesion molecules, which are characteristically associated with animal development and multicellularity. In fact, our results indicate that the transcriptional profile of the multinucleate stage of Creolimax is more similar to those of highly proliferative cell types in humans, despite the obvious differences in cell morphology and cell division strategy (cell divison vs. coenocytic nuclear division). Thus, we consider that the multinucleate stage of Creolimax can be regarded as a highly specialized proliferative cell type. This cell type can be considered to function in a manner analogous to stem cells, with the undifferentiated nuclei dividing in the multinucleate cell before differentiation into amoeboid cells closes the lifecycle. This would suggest that separation of functions as crucial as self-renewal and differentiation can occur in a unicellular context in a temporal manner, which pre-dates the exclusive ability of multicellular organisms to engulf both functionally distinct cell types within a single entity (Arendt, 2008; Hemmrich et al., 2012). Drastic differences in cell structure, morphology, function, and molecular signatures found between stages in protistan relatives of animals indicate that cell stages can be considered cell types according to established definition (57 and see author’s response section for an in-depth discussion on this topic).

In addition to assessing the impact of genome regulation in Creolimax, our multiple genome-wide approaches reveal other novel aspects of ichthyosporean biology. For example, we show that Creolimax has undergone secondary adaptation to a specialized osmotrophic feeding mode, shaping its secretome and genome through both gene duplication and acquisition of bacterial genes by LGT. These observations highlight the uniqueness of the different holozoan lineages, each representing derived specializations from an ancestral state. However, these specializations of unicellular holozoans are achieved through a largely common genetic toolkit shared with animals, including several signaling pathways, TFs and adhesion molecules (Richter and King, 2013; Suga et al., 2013). Moreover, many of the components of this toolkit are assembled in co-regulated gene modules preserved since the common ancestor of all holozoans, suggesting that recurrent recruitment of full co-regulated gene programs underlies the evolution of lineage-specific cell types and developmental modes (Newman, 2012). Additional complementary insights on the evolution of developmental modes will be provided by studying the immediate out-group of the Holozoa and Holomycota (fungi + Discicristoidea) (Torruella et al., 2015). The Holomycota show a wide variety of developmental modes, ranging from aggregative fruiting body formation to several modes of coenocytic development (Stajich et al., 2009; Brown et al., 2009).

We have shown that the diversity of cellular behaviors and morphologies observed in holozoan lifecycles is likely to have evolved from lineage-specific specializations. A widely accepted hypothesis states that the origin of animals involved a clonally dividing organism, similar to choanoflagellate colonies, that subsequently evolved specialized cellular differentiation (King, 2004; Nielsen, 2008). A competing hypothesis, however, suggests that pre-existing cell types and their associated molecular mechanisms were integrated in the spatiotemporal developmental dynamics of the last common ancestor of all animals (Mikhailov et al., 2009). Our data suggests a third mixed model, in which the capacity to build differentiated cell types and transient multicellular entities was not a rare feature in pre-metazoan evolution. Nevertheless, the rise of animal multicellularity is not directly homologous to any ancestral developmental mode, and it may be seen as another derived specialization involving the integration of ancestral molecular modules and their associated cell behaviors into one single multicellular entity. Consequently, our results reveal the importance of obtaining complementary data from multiple lineages before significant insights can be gained into the organism that took the first steps on the road towards complex animal multicellularity.

Materials and methods

Cell culture and nucleic acid extraction

Creolimax fragrantissima cells (available from the Canadian Centre for the Culture of Microorganisms under accession numbers CCCM 101 – 107) were grown axenically in liquid medium (marine broth Difco 2216) at 12ºC. To obtain biological replicates for the RNA extraction, three independent cell lines were isolated from distinct colonies grown on a marine agar plate (marine agar Difco 2216). Those initial isolates were then grown in liquid medium (marine broth Difco 2216). After one pass, new 1/10 subcultures (10 ml) were initiated and grown statically for 5 days in 25 ml flasks. When the cells became confluent on the fifth day, they were scratched and passed into 50 ml flasks with an additional 25 ml of fresh medium. These 50 ml flasks were then grown for 48 hr with gentle agitation (150 rpm), allowing the mature coenocytes to form clumps. Then, the 50 ml flasks were filtered using a 5.0 μm Isopore membrane filter (Millipore ) and collected into a 50 ml Falcon tube . As only amoebas pass through the 5 μm filter step, filtered cells were then immediately pelleted by centrifugation at 1500 rcf for 3 min and harvested to get the RNA from the amoeboid stage. For the multinucleate-stage RNA, filtered cells were re-cultured in a new 50 ml flask, grown for 24 hr and harvested by centrifugation at 3000 rcf for 3 min. For all cell lines and stages, the RNA was extracted using Trizol reagent (Life Technologies, Carlsbad, CA) with a further step of DNAseI (Roche) to avoid gDNA contamination, and then purified using RNeasy columns (Qiagen).

Genome sequencing, assembly and annotation

We generated 1.7 million 454 single reads and 34 million Illumina 5kb mate-pair reads (both after trimming, totaling 3.4G bp). Those were combined and preassembled with a Newbler 2.7 assembler (Roche). The mitochondrial DNA sequence was removed before the assembly. Using the pairing information of the Illumina mate-pair reads, the 846 pre-scaffolds were broken at unreliable positions found by REAPR 1 (Hunt et al., 2013) and re-assembled by SSPACE 2 (Boetzer et al., 2011). Some of the N-stretches within the scaffolds were filled by Gapfiller (Boetzer and Pirovano, 2012). This array of improvement tools assembled the pre-scaffolds into 196 sequence pieces. We then used the pairing information of the mate-pair reads for further assembly improvement, breaking and re-connecting the scaffolds. Finally, we obtained an assembly with 82 final scaffolds, of which 29 were short (<1000 bp) fragments.

To predict the protein-coding genes from the whole genome sequence, we used Augustus 2.7 (Stanke et al., 2008) combined with RNA-seq data (see details on this data in ‘RNA-seq and differential expression analysis’ section). We followed the protocol described here: http://bioinf.uni-greifswald.de/bioinf/wiki/pmwiki.php?n=IncorporatingRNAseq.Tophat. Briefly, we pooled all the RNA-seq samples and mapped them to the genome using Tophat2 (Kim et al., 2013), using the resulting introns to train Augustus ab initio predictions in an iterative process. The resulting predictions were manually screened in a genome browser and compared to the spliced-aligned reads resulting from Tophat2. We further validated our predicted annotation, comparing the data to a set of genes that we had previously cloned by RT-PCR and rapid amplification of cDNA ends PCR, including highly expressed genes (e.g. Histone 2B, Tubulin beta) and lowly expressed genes (e.g. Myc, Grainyhead, p53, Src Tyrosine Kinase).

Moreover, we used the mapped transcriptome to perform a genome-guided Trinity assembly (Haas et al., 2013). Those transcripts were then used to annotate the UTRs of the protein-coding genes resulting from the Augustus annotation step. The elongation of the UTRs was done using PASA (Haas, 2003). The transcripts that did not overlap with Augustus annotations were then searched against the NCBI non-redundant protein database using tBlastX. Those that retrieved significant hits (e-value <10e-3) and had clear open reading frames were then manually annotated as protein-coding transcripts. The resulting annotation retrieved 8695 genes, which are available here: http://dx.doi.org/10.6084/m9.figshare.1403592.

To functionally annotate the genes, we used Blast2GO (Gotz et al., 2008), searching the protein sequences against the NCBI non-redundant database using a BLASTP threshold of 10e-6 and the default InterProScan settings. We also performed a PfamScan search with PFAM A database version 26 using the default gathering threshold parameters (Punta et al., 2012). As a result, 6814 genes were functionally annotated.

RNA-seq and differential expression analysis

100-base paired-end libraries were constructed using the TruSeq Stranded mRNA Sample Prep kit (Illumina, San Diego, CA, USA). The libraries were sequenced in two lanes of an Illumina HiSeq2000 instrument at the CRG genomics unit (Barcelona, Spain). We obtained 417 million reads that were then mapped to the genome using Tophat2 (Kim et al., 2013), resulting in an average mapping of 82%. Raw gene counts and FPKM (fragments per kilobase of exon per million fragments mapped) values were obtained using Cufflinks2 (Trapnell et al., 2013). Differential expression analysis was performed by comparing the three replicates from each stage using DEseq (Anders and Huber, 2010) (threshold 5e-5), EdgeR (Robinson et al., 2010) (threshold 5e-5), Cuffdiff2 (Trapnell et al., 2013) (threshold 5e-5), and NOIseq (Tarazona et al., 2011) (threshold 0.8). Only the genes that were identified as differentially expressed with at least three methods were taken, resulting in 956 genes. Data can be downloaded from GEO GSE68616.

GO enrichments were obtained using the Topology-Weighted method in Ontologizer (Bauer et al., 2008) taking a P value lower than 0.01 as a threshold (see full list in Figure 2—source data 1). The resulting Gene Ontology (GO) enrichments were then visualized as a network in Cytoscape using the Enrichment map plug-in (Merico et al., 2010). Enrichment map plug-in connects GO terms according to gene annotation; therefore connected GOs belong to the same set of genes with multiple associated GOs. We used network connectivity between enriched GO terms as a criterion to collapse GO redundancy, shading general GO groups in more inclusive categories as seen in Figure 2. Inclusive categories complementary relied on the functional similarity between GOs based on GO definitions (e.g. distinct unconnected aminoacid metabolic pathways are collapsed in ‘Aminoacid metabolism’ inclusive category). Additionally, PFAM domain enrichments were calculated using a Fisher’s exact test implemented in R, taking a threshold of 0.01. PFAM enrichments were also visualized as a network using the Enrichment map plug-in, where connected nodes reflect domain presence in the same genes.

Genome-wide analysis of alternative splicing

Identification and quantification of ES (including events with single or multiple cassette exons and microexons of 3–15 nucleotides [nt]) and IR were performed as previously described (Braunschweig et al., 2014; Irimia et al., 2014). For ES, we used two complementary approaches. First, we implemented a ‘splice site-based module’, which utilizes the joining of all hypothetically possible exon–exon junction (EEJ) forward combinations from annotated and de novo splice sites (as described in [Han et al., 2013]). To identify splice sites de novo, for each annotated splice site donor/acceptor, we scanned two downstream/upstream introns for potential splice-site acceptors/donors that would constitute a novel EEJ. Next, we mapped our RNA-seq data to this library of all potential EEJs, and considered ‘novel splice sites’ those supported by at least five reads mapped to multiple positions of the EEJ. Then, we implemented our recently described ‘microexon module’ (Irimia et al., 2014), which also includes de novo searching of pairs of donor and acceptor splice sites in intronic sequences to detect novel, very short (3–15 nt) microexons. For IR, we used our recently described pipeline (Braunschweig et al., 2014), which employs a comprehensive set of reference sequences comprising exon–intron junctions (EIJs), intron sequences (if introns were longer than 200 nt, only a mid-intron segment of 200 nt was used), and EEJs formed by intron removal. Introns were classified as retained when there was a balanced accumulation of reads mapping to 5´ and 3´ EIJs and the intron body sequence, relative to the EEJ sequence. The level of retention was calculated based on PIR, which is the percentage of transcripts from a given gene in which the intron sequence is present.

In all modules, quantification of alternative sequence inclusion in the transcripts is derived only from junction reads (either EEJs or EIJs). To increase the fraction of mapping junction reads within each RNA-Seq sample, each read was first split into 50 nt read groups using a sliding window of 25 nt. Therefore, each 100 nt (replicates a and b) and 125 nt (replicate c) read produces 3 and 4 overlapping reads, respectively. In addition, both read mates from the paired-end sequencing were pooled. These 50 nt split reads were then mapped to the genome using Bowtie (Langmead et al., 2009) with –m 1 –v 2 parameters (unique mapping with no more than two mismatches). Reads that mapped to the genome were discarded for ES quantifications. For quantification, only one random count per read group (i.e. all sub-reads coming from the same original read) was considered to avoid multiple counting of the same original sequenced molecule. In addition, for all modules and alternative splicing types, final read counts were corrected for the number mappable positions in each EEJ or EIJ following the formula:

Corrected_EEJcount=EEJcount*MaximummappabilityEEJmappability

where EEJcount is the number of read groups mapped to the EEJ, Maximummappability the maximum number of mapping positions that any EEJ can have for reads of length 50 nt (i.e. 35 positions), and EEJmappability the number of positions that can be mapped uniquely to the EEJ using specific bowtie parameters (–m 1 –v 2), and thus EEJmappability  Maximummappability (see [Barbosa-Morais et al., 2012; Han et al., 2013] for details).

The different modules to detect and quantify AS have been integrated into vast-tools (https://github.com/vastgroup/vast-tools; species key “Cfr”). Associated files can be downloaded at http://vastdb.crg.eu/libs/vastdb.cfr.31.1.15.tar.gz.

We used a threshold of ≥20 PIR in at least one stage for positive intron retention events. As a threshold for ES events we used skipping rates below 90% (measured using the metric Percent Spliced In, PSI) in at least one stage. In both cases, the minimum coverage allowed was 20 reads per splice junction. To evaluate differential alternative splicing, we used differences over 15 PSI or PIR between stages, allowing a standard deviation of <10 between replicates.

lincRNA annotation

From the genome-guided Trinity assembly (see Genome sequencing, assembly and annotation) we further analyzed the transcripts that did not retrieve any significant TBLASTX hit against the NCBI non-redundant database (e-value > e-3) and were more than 200 bp long. To avoid lineage-specific protein-coding genes, we performed an additional TBLASTX search against six frame translations of the de novo assembled transcriptomes of several closely related species (S. arctica, Ichthyophonus hoferi, Pirum gemmata, Amoebidium parasiticum, Abeoforma whisleri, Corallochytrium limacisporum, C. owczarzaki, S. rosetta, Monosiga brevicollis) and the protein coding genes of Creolimax, filtering out the positive hits (e-value < e-3). From the remaining transcripts, we performed a RfamScan_2 search against RFAM 11.0 (Burge et al., 2013) to annotate all known ncRNAs (e.g. U6, 18S, 28S). Additionally, we used the coding potential calculator (Kong et al., 2007) to discard all those transcripts with putative coding potential (coding potential score < −0.5). With the final list of transcripts, we selected those that did not overlap with gene+UTRs annotations or were close to uncertain assembly regions (multi-N stretches). For all those transcripts that were in a head-to-tail orientation regarding protein-coding genes, we manually inspected those that had an intergenic distance shorter than 1000 bp to the nearby gene to filter out misannotated UTRs. Additionally, we discarded transcripts overlapping repetitive regions of the genome. Finally, we collapsed all the remaining transcripts into single loci and quantified their expression level using Cuffcompare and Cufflinks (Trapnell et al., 2013). From the resulting 2661 loci, to avoid noisy transcription, we filtered out all those that did not have at least 5 FPKM in at least one sample, and over 1 FPKM in any other sample.

To detect putative homology, lincRNAs were searched using BLASTN against the same list of closely related species described above. Differential expression analysis of the lincRNAs was done using the same parameters as for the coding genes (see RNA-seq and differential expression analysis). Consequently, we only accepted lincRNA loci as differentially expressed when they were identified by at least 3 out of 4 methods. We validated 6 out of 6 lincRNA using RT-PCR (see below). Finally, we used the same pipeline to detetect alternative splicing events in coding genes, using a minimum coverage of 10 reads for each splice junction.

Reverse transcription-polymerase chain reaction

To validate lincRNAs and alternative splicing events, RNA samples obtained as described in Cell culture, gDNA and RNA extraction were reverse transcribed to cDNA using SMARTer cDNA kit (CloneTech). For both stages, the same amount of initial purified RNA was used (1 μg). Pairs of primers with melting temperatures close to 60ºC were designed to capture the lincRNA and the alternative splicing events, and the PCR was performed using Expand high-fidelity Taq polymerase (Roche). Validations of IR and ES events were preformed using primer pairs spanning the neighboring constitutive exons. Quantification of alternative sequence inclusion levels from gel band intensity was done using ImageJ software (Schneider et al., 2012).

Comparative transcriptomic cross-species clustering

For the cross-species comparison, we first identified one-to-one orthologs in the proteomes of Creolimax, Capsaspora, and Salpingoeca using the Multiparanoid pipeline (Alexeyenko et al., 2006). We trimmed all RNA-seq datasets into the same length (50 bp) and only mapped the left reads when paired-end data was available. We then obtained cRPKM (corrected by mappability) values only for the subset of orthologs, transformed them to log2(cRPKM +1) and further normalized the expression data using quantile normalization. Hierarchical clustering (‘complete’ method) of samples was obtained by comparing pairwise distances based on Spearman correlation coefficients in R. To obtain the neighbor-joining trees and bootstrap supports across the samples, we used the ‘ape’ package in R (Paradis et al., 2004). For the Creolimax/human comparison we followed the same methodology, using the data detailed in Figure 6—source data 1. The PCA of the expression data was performed as implemented in R ‘prcomp’ function. GO enrichments were obtained as described in RNA-seq and differential expression analysis section.

Secretome proteomics and in silico prediction

To obtain the secretome sample for proteomics, we cultured Creolimax cells in liquid medium (marine broth Difco 2216) at 12ºC for 5 days and allowed the cells to attach to the bottom of the flask. The medium was then replaced with artificial seawater to avoid excessive protein contamination, and the culture was incubated for another 24 hr. Then, we collected the medium by gently tilting the flask to avoid collecting attached cells. The medium was immediately centrifuged at 10,000 rcf for 2 min, and the supernatant was collected and filtered twice through a 0.2 μm filter. The filtered medium was concentrated by ultrafiltration using a molecular weight cut-off membrane (Vivaspin 6, 3000 MW; Sartorius, Gottingen, Germany) and quantified using the BCA Protein Quantification Kit (Thermo Fisher Scientific, San Jose, CA). The resulting protein extract was digested with 5 µg of trypsin (cat # V5113, Promega ) (overnight, 37ºC). Finally, 2 μg of the sample was analyzed using an LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific) coupled to an EasyLC (Thermo Fisher Scientific (Proxeon), Odense, Denmark) at the CRG proteomics unit (Barcelona, Spain). All data were acquired with Xcalibur software v2.2. Proteome Discoverer software suite (v1.4, Thermo Fisher Scientific) and the Mascot search engine (v2.5, Matrix Science (Perkins et al., 1999)) were used for peptide identification and quantification. The data were searched against a database containing Creolimax proteome, a list of common contaminants, and all the corresponding decoy entries. Resulting data files were filtered for false discovery rate (FDR)<0.05. Finally, we discarded the contaminants and all the proteins that were identified by less than two unique peptides, resulting in a list of 91 proteins.

To obtain the list of in silico secretome components, we performed an initial search step using SignalP 3.0 (Dyrløv Bendtsen et al., 2004) (D-cutoff = 0.450) to identify all the proteins with a canonical signal peptide. We then performed a search step with TMHMM v.2.0 (Krogh et al., 2001) to discard all those proteins with a transmembrane domain downstream of the first 60 amino acids. We also filtered out proteins tagged to the mitochondria using TargetP v.1.1 (Emanuelsson et al., 2007), and proteins with endoplasmic reticulum retention signal with c-terminal motifs KDEL or HDE[LF] and GPI-anchored proteins using PredGPI (Pierleoni et al., 2008). The resulting list comprised 453 proteins. PFAM and GO enrichment analyses were performed as described in RNA-seq and differential expression analysis.

Ortholog identification, phylogeny and LGT detection

In order to obtain orthology assignments for the various gene families analyzed in this study, we used a phylogeny-based pipeline. First, we used PFAM domain information to obtain all the members of a gene family across a database comprising 108 eukaryotic proteomes. Then, proteins were aligned using MAFFT software with L-INS-i parameters (Katoh and Standley, 2013). The resulting alignments were automatically trimmed using trimAl v1.2 (Capella-Gutierrez et al., 2009) (-gt 0.7) and phylogenies were obtained using RAxML v8.0 (Stamatakis, 2014) (LG model, gamma distribution, 100 bootstrap supports) and Phylobayes 3 (Lartillot et al., 2009) (LG model, ran until two chains converged). Tree visualization and annotation was performed using iTol v2 (Letunic and Bork, 2011).

This pipeline was slightly modified to detect LGT cases. Instead of using PFAM, we gathered close orthologs of all the proteins in the genome by performing a BLASTP search against the NCBI non-redundant database plus the 108 eukaryotic genomes. Only those proteins that retrieved lower e-values for bacterial/archaeal hits were selected for downstream analysis. Those proteins were then separately searched using BLASTP against all bacteria in nr, all Archaea in nr, and the 108 eukaryotic proteomes. We selected those sequences that had at least 25 hits with an e-value under e-10, selecting a maximum of 50 proteins for bacteria, and 25 for archaea and eukaryotes. We got rid of redundancy using CD-HIT (Li and Godzik, 2006) by filtering for 0.95 identity and then performed alignment, trimming, and phylogeny as described in the general pipeline above. The resulting trees were analyzed manually, taking LGT positives when Creolimax (and other ichthyosporean) sequences branched within bacterial clades with nodal bootstrap supports over 70%. Finally, we manually checked that the resulting LGT genes were found in distinct parts of the genome and not in genomic singletons. For those LGT candidates without introns and not found in any other ichthyosporean (therefore, Creolimax specific) we further checked if they were located in scaffolds with other genes containing introns. To further discard bacterial contaminations, the neighboring genes to LGT candidates were blasted against NCBI nr to verify their eukaryotic origins. When the immediate neighbor did not retrieve any hit, the following gene was searched.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    Evolution of networks and sequences in eukaryotic cell cycle control
    1. FR Cross
    2. NE Buchler
    3. JM Skotheim
    (2011)
    Philosophical Transactions of the Royal Society B: Biological Sciences 366:3532–3544.
    https://doi.org/10.1098/rstb.2011.0078
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
    Antibacterial gene transfer across the tree of life
    1. JA Metcalf
    2. LJ Funkhouser-Jones
    3. K Brileya
    4. A-L Reysenbach
    5. SR Bordenstein
    (2014)
    eLife, 3, 10.7554/eLife.04266.
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
    Bacillus subtilis spore coat assembly requires cotH gene expression
    1. G Naclerio
    2. L Baccigalupi
    3. R Zilhao
    4. M de Felice
    5. E Ricca
    (1996)
    Journal of Bacteriology 178:4375–4380.
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97
  98. 98
  99. 99

Decision letter

  1. Alejandro Sánchez Alvarado
    Reviewing Editor; Stowers Institute for Medical Research, United States

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for submitting your work entitled "Complex transcriptional regulation and independent evolution of fungal traits in a close relative of animals" for peer review at eLife. Your submission has been favorably evaluated by Diethard Tautz (Senior Editor), Alejandro Sánchez Alvarado (Reviewing Editor), and three reviewers.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

This manuscript by de Mendoza et al. presents a comprehensive analysis of the genome and stage-specific transcriptomes of a member of the earliest holozoan lineage: the ichthyosporean Creolimax frangrantissima. As such, this study puts forward intriguing inferences about the evolution of animal and fungal attributes. The extensive suite of analyses incorporated into this manuscript could have led the authors to simply produce a catalog of observations; however, by focusing the interpretation of the results on biological inferences, the manuscript strays minimally. An example is provided by the range of analyses presented ranging from differences in expression levels, lincRNAs, and alternative splicing between the amoeboid and multinucleate stages, to detection of coregulation of various biological pathways, to analyzing the secretome.

Perhaps the most interesting findings pertain to the analysis of the integrin adhesome, filopodial, and postsynaptic machinery. There's been much written about the presence/absence of the components of these cellular structures and the authors present a creative analysis that gives meaningful insight about the functions of these pathways in holozoan lineages that are not easily amenable to functional studies.

In sum, from the analyses reported by de Mendoza et al., two main themes emerge: (i) Creolimax possesses some genomic and transcriptomic (co-expression) features that suggest regulatory features associated with metazoan multicellularity have a more ancient origin; and (ii) that Creolimax displays a range of lineage-specific innovations that separate it from other opisthokonts. From these, it is inferred that although each major lineage of holozoan has evolved a unique manner of being "muliti-celled", their last common ancestor possessed the genomic foundation to allow for a rudimentary multi-phased life cycle with a colonial stage.

Essential revisions:

1) The authors seek to understand the regulatory differences between the amoeboid and multinucleate stages in C. frangrantissima and how they relate to cell type specification in metazoans. It appears that the multinucleate stage shows higher expression of genes for cell proliferation, which are in turn enriched for intron retention in the amoeboid stage, confirming that the multinucleate stage has highly proliferative nuclei (since it was already known that these cells are diving cells). The authors should provide more information on how the GO categories and PFAM domains were assigned to larger groupings (grey circles in Figures 2 and Figure 2—figure supplement 1). The authors rest their conclusions on these categories, so this should be done in a more objective manner, e.g. the LIM domain does not have to have an actin cytoskeleton function.

2) The authors should be careful not to conflate stages with cell types. In metazoan biology, different cell types have different biological functions that go beyond the capacity to divide or not (e.g., intestinal stem cells and hematopoetic stem cells have different epigenomic landscapes that facilitate the different functions of these cells, yet both cells are able to divide and though both cells will show differential expression of genes when they are quiescent vs. proliferating, those phases are considered to be just different stages of the same cell type). The statement in the second paragraph of the Discussion is not supported by these data – C. frangrantissima cells clearly go through a proliferative stage, but these functions are not "segregated" because the amoeboid and proliferative cells don't exist in one entity, they are separated temporally.

3) The analyses throughout the manuscript are very clear in showing the different biological functionalities of various stages in C. fragrantissima and C. owczarzaki, specifically, in a PCA of expression levels of orthologous genes in the two species, PC2 and PC3 separate cells based on proliferation and motility/adhesion. The authors' focus on thinking of these stages as "types" leads to some overstatements and reconsidering these statements in view of the biological functions would be more meaningful, e.g., the title of the section "Evidence of species-specific cell type evolution in Holozoa". Also the statement at the end of this subsection is stronger than needed – don't we know from the yeast cell cycle that genes for human cell proliferation are conserved eukaryotic genes?

4) Throughout the article, the authors should clearly state what we already knew before this study. e.g., C. fragrantissima multinucleate cells may use similar machinery to human proliferating cells, but we know that the cell cycle is a highly conserved phenomenon, well-studied in yeast (the outgroup).

5) The authors use word the "fungal" referring to features of Creolimax which are similar to these of fungi (in the title, Abstract and in the beginning of the subsection “The secretome of C. fragrantissima shows convergent adaptations to a specialized osmotrophic lifestyle”). It makes me a bit uneasy. I would suggest using terms "fungal-like" or perhaps simply "osmotrophic".

6) In the second paragraph of the Introduction the syncytial stage of Creolimax is compared to Drosophila's syncytial blastoderm and to glass sponge tissues. I feel the similarity to Drosophila is a much better one, as in glass sponges the syncytial organization is the final step of development, rather than transient stage.

In the subsections “Recurrent prokaryotic lateral gene transfer events shaped the C. fragrantissima secretome” and”Ortholog identification, phylogeny and LGT detection”, the authors address the issue of possible bacterial contaminations presenting a lateral gene transfer. Perhaps the most convincing way would be to check whether the genes which are likely results of LGT are present on the same genomic scaffolds as unquestionably eukaryotic genes?

7) In the subsection “Genome sequencing, assembly and annotation” the authors describe a combination of Augustus and RNA-Seq data to generate protein predictions. This section is rather brief, and could be made more valuable to the readers if we could learn some more details of the efficiency/correctness of their pipeline. For example, given that PASA can easily miss a gene if its transcripts do not align perfectly on the genome, how big was the fraction of Trinity transcripts which could be aligned to the genome but were rejected by PASA? How were the Augustus models verified not only during the training phase, but also the final gene models? It would be recommended to at least manually compare a sub-set of Augustus models with transcripts from Trinity, or computationally compare of e.g. intron-exon boundaries and UTRs predicted by Augustus with those in Trinity dataset.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Complex transcriptional regulation and independent evolution of fungal traits in a close relative of animals" for further consideration at eLife. Your revised article has been favorably evaluated by Diethard Tautz (Senior Editor), a Reviewing Editor, and three reviewers. The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below.

Two reviewers have still some comments that need to be addressed before the final version of the manuscript can be accepted. Please look particularly carefully into the comments concerning the GO/PFAM analysis. When resubmitting the revised version, please add a detailed reply to the remaining comments.

Reviewer #1:In this revised version of the paper, de Mendoza et al. have addressed most of my previous comments, but two points remain:

I think the GO/PFAM analysis needs some more clarification or trimming. The authors have now provided a table with p-values and GO categories that the differentially expressed genes were assigned to. This is very helpful, but I still am not sure how the authors came up with the "more inclusive functions". For example, GO:0060026 (name: convergent extension) does not have "actin cytoskeleton" as a parent category in the GO database. Of the thousands of human genes that fall in this GO:0060026, some are sfrp and wnt, which by no means have molecular functions directly related to actin cytoskeleton. So how was this GO term placed under the grey cloud of "actin cytoskeleton"? Are these classifications based on subjective decisions made by the authors? If so, I think that this should be made clear in the Methods section. Furthermore, I think this undercuts this point of this analysis – GO assignments and PFAM annotations provide unbiased ways of looking at large genomic datasets. If the authors then make subjective decisions on the inclusive categories, then they may be introducing bias. The authors also gave the example of the gene of the integrin adhesome that did not obtain the correct GO terms, which the authors corrected manually. However, they could do this for a set of genes they are very familiar with, but what about the many other genes that they are not as familiar with? This manual curation is certainly valuable when studying a specific set of genes, but, in analyzing large datasets, where an unbiased method is supposed to give you biological insight, this seems arbitrary.

The authors have presented good reasons to think of the multinucleate and amoeba stages as different cell types in their rebuttal. However, this nuanced discussion is missing from the manuscript itself. I think the authors should add a very explicit description of these two cell types with their differing features before they start talking about the different "cell types". Also, though they have edited the second paragraph of the Discussion, I think a more in depth discussion of cell-types vs. cell-stages, that explores the ideas in the authors’ rebuttal, would greatly enhance the manuscript.

Reviewer #3: For the most part, I am happy with the changes made in this revision.

However the general statement that ichthyosporeans are close relatives of animals, which is stated a number of times in the manuscript, is still misleading. If the metazoans and their closest unicellular relatives, the choanoflagellates, are estimated to have diverged 150-200 million years before the emergence of the crown Metazoa some 7-800 Mya and the ichthyosporeans diverged well before this (before but probably closer to the metazoan-fungal divergence time), it seems stating they are close relatives of animals is very misleading.

It would benefit the reader if the Opisthokont relationships can be more clearly articulated, and if the authors can provide accurate estimates of divergence times of the lineages shown in Figure 1. When discussing the various 'developmental' modes of holozoans it would be appropriate to include some discussion of fungi, as they are the outgroup to the Holozoa. This, in turn, should lead to a more balanced discussion about the evolution of development and multicellularity in the Holozoa.

https://doi.org/10.7554/eLife.08904.035

Author response

Essential revisions: 1) The authors seek to understand the regulatory differences between the amoeboid and multinucleate stages in C. frangrantissima and how they relate to cell type specification in metazoans. It appears that the multinucleate stage shows higher expression of genes for cell proliferation, which are in turn enriched for intron retention in the amoeboid stage, confirming that the multinucleate stage has highly proliferative nuclei (since it was already known that these cells are diving cells). The authors should provide more information on how the GO categories and PFAM domains were assigned to larger groupings (grey circles in Figures 2 and Figure 2—figure supplement 1). The authors rest their conclusions on these categories, so this should be done in a more objective manner, e.g. the LIM domain does not have to have an actin cytoskeleton function.

The groupings were done using the network connectivity between different nodes as an indicator of redundancy using the EnrichmentMap plugin (Merico et al., 2010, PLoS ONE) in Cytoscape. In that software the edges between the nodes are the genes shared by the enriched GOs, therefore clusters of connected GOs belong to a similar set of genes. From those clusters of GOs we obtained a more inclusive function by assessing which genes and functions were more redundant. For example, in the case of Protein Kinase activity or GTPase function, many GOs were associated to those more inclusive functions. Some unconnected nodes were grouped in the inclusive categories by similarity of the function. As a case example, Focal Adhesion GO appears unconnected to the rest of GOs related to the Integrin pathway. This is because one gene of the Integrin Adhesome did not retrieve the integrin GOs in the Blast2GO search, but it had a Focal Adhesion GO in the annotation step. Both GOs should share the same genes. It should be noted that Blast2GO performance in distant non-model organisms such as Creolimax has some limits of detection if compared to its performance in a species with close relatives sequenced and annotated in the databases (e.g. vertebrates or insects). Thus, we preferred to group the resulting GOs in inclusive categories rather than curating the GOs for each gene manually. To address the reviewers’ concerns, we now include the GO categories depicted in Figure 2 in Figure 2—source data 1, with the respective p-values and group classification.

In the case of the domains, we used the information available in each of those PFAM domain webpage descriptions to curate inclusive categories that summarize the data in meaningful sets. In the cases where a domain has unspecific functions (e.g. LIM or Ank_2), we used three criteria to assign it to an inclusive category: 1) we checked if the unspecific PFAM domain was found in the same gene with other specific domains, 2) we searched for the GOs of the differentially expressed genes containing the unspecific domain (e.g. GOs of LIM-containing genes enriched in the amoeboid stage) 3) we used a network connectivity criteria, checking redundancy of genes between GO and PFAM categories in a mixed Enrichment map network. This “guilty-by-association” procedure produced the final categories, but as reviewers suggest we now exclude LIM from “Actin cytoskeleton”. Despite LIM domain was associated with “Cell migration associated with gastrulation” GO in the combined network, only 2 of 9 genes had that GO associated. We now also explain these criteria in Figure 2 and Figure 2—figure supplement 1 figure legends.

2) The authors should be careful not to conflate stages with cell types. In metazoan biology, different cell types have different biological functions that go beyond the capacity to divide or not (e.g., intestinal stem cells and hematopoetic stem cells have different epigenomic landscapes that facilitate the different functions of these cells, yet both cells are able to divide and though both cells will show differential expression of genes when they are quiescent vs. proliferating, those phases are considered to be just different stages of the same cell type). The statement in the second paragraph of the Discussion is not supported by these data – C. frangrantissima cells clearly go through a proliferative stage, but these functions are not "segregated" because the amoeboid and proliferative cells don't exist in one entity, they are separated temporally.

This is, of course an interesting open conceptual debate. In our opinion, the cell type concept can potentially be used within a unicellular context, and specifically to C. fragrantissima life cycle. Both under the reviewer’s definition and under the definition by Detlev Arendt in his 2008 Nature Review Genetics: “What is a cell type and how can cell types be compared? By definition, any cell type has special physiological or structural characteristics. The aim of comparative study of cell type characteristics is to elucidate the evolutionary diversification of cell types (cell typogenesis) by detecting the similarities and differences between them. Physiological and structural characteristics will be reflected by cell type-specific gene expression at some point, and therefore comparisons can be based on differential expression profiling data”. In this case we observe two completely different cell types, not only restricted to their ability to proliferate. The multinucleate stage is not only a coenocyte (many nuclei sharing a single cytoplasm) but also has a thick cell wall reminiscent of those of fungi and has unique ultra-structural features (see Marshall et al., 2008, Protist). Moreover it has different biological functions to the amoeboid stage, not only division but also encystation, as multinucleate cells are able to stop growing under non-favourable circumstances. On the other hand, the amoeba stage lacks a cell wall, it is mono-nucleated and has a dispersal function.

Another line of evidence towards the segregation of cell types in this species is that in some rare cases, a mature multinucleate stage can bypass the amoeboid stage, just entering into the multinucleate stage again (as reported in Marshall et al., 2008, Protist and Suga and Ruiz-Trillo, 2013, Developmental Biology). Moreover, the closely related species from the Sphaeroforma genus, do not have this amoeboid stage (see Marshall and Berbee, 2013, Protist). Thus, both the molecular and morphological differences are huge; enough – in our opinion – to merit the designation of different cell types separated temporally. We suggest keeping the denomination of “cell type”, but clarifying the ideas further. For example, we have rephrased the second paragraph of the Discussion which now reads as: “This would suggest that separation of functions as crucial as self-renewal and differentiation can occur in a unicellular context in a temporal manner, which pre-dates the exclusive ability of multicellular organisms to engulf both functionally distinct cell types within a single entity”.

3) The analyses throughout the manuscript are very clear in showing the different biological functionalities of various stages in C. fragrantissima and C. owczarzaki, specifically, in a PCA of expression levels of orthologous genes in the two species, PC2 and PC3 separate cells based on proliferation and motility/adhesion. The authors' focus on thinking of these stages as "types" leads to some overstatements and reconsidering these statements in view of the biological functions would be more meaningful, e.g., the title of the section "Evidence of species-specific cell type evolution in Holozoa". Also the statement at the end of this subsection is stronger than needed – don't we know from the yeast cell cycle that genes for human cell proliferation are conserved eukaryotic genes?

Following the reasoning above, we think the cell type nomenclature could be used in this context, although we have toned down the text significantly. Specifically, a choanoflagellate cell, a filopodiated amoeba, a cystic coenoctye and a crawling amoeba are completely distinct in function, physiology and, obviously, gene expression profiles. In an evolutionary perspective, it is conceivable that the ancestors of some of those species might have had more cell types than those currently displayed. In fact, some extant species might have some cell types still unknown because they are not seen in culture conditions. However, if the definition of cell type is restricted to cells co-existing in one multicellular entity with some defined cell lineages, then we are open to include the changes. Our point is to broaden the definition of cell type, as we think that the frontiers between multicellular and unicellular species are blurrier than commonly thought.

We now have rephrased the end of the subsection “Evidence of species-specific cell type evolution in Holozoa” according to reviewers’ suggestion. Now it reads “These results suggest that a signal from the evolutionarily conserved machinery for cell proliferation in eukaryotes can be detected from direct expression pattern comparisons between the multinucleate stage of C. frangrantissima and highly proliferative human cell types.”

Accordingly, we have also rephrased the sentence in the Discussion: “In fact, our results indicate that the transcriptional profile of the multinucleate stage of C. fragrantissima is more similar to those of highly proliferative cell types in humans, despite the obvious differences in cell morphology and cell division strategy (cell divison vs. coenocytic nuclear division).”

4) Throughout the article, the authors should clearly state what we already knew before this study. e.g., C. fragrantissima multinucleate cells may use similar machinery to human proliferating cells, but we know that the cell cycle is a highly conserved phenomenon, well-studied in yeast (the outgroup).

We agree. We think that the previously modified sentences now include awareness about the conservation of some general features of the cell cycle throughout eukaryotes. In that same sentence we have included two citations to recent reviews on the topic: “Cell cycle control across the eukaryotic kingdom” by Harashima et al. (2013, Trends in Cell Biology) and “Evolution of networks and sequences in eukaryotic cell cycle control” by Cross et al. (2011, Philos Trans R Soc Lond B Biol Sci.) However, we would like to note that recent data (covered in both those reviews) suggest that the cell cycle described in yeast (and in fact, in most fungi) is only analogous to the cell cycle of animals and not homologous. In fact, gene content analysis on Cyclins and CDKs suggest that unicellular holozoans might be better out-groups to understand the ancestral cell cycle of animals than yeast in terms of gene orthology (see Suga et al., 2012, Nature Communications). However, discussing this issue is out of the scope of this paper.

5) The authors use word the "fungal" referring to features of Creolimax which are similar to these of fungi (in the title, Abstract and in the beginning of the subsection “The secretome of C. fragrantissima shows convergent adaptations to a specialized osmotrophic lifestyle”). It makes me a bit uneasy. I would suggest using terms "fungal-like" or perhaps simply "osmotrophic".

We now use “fungal-like” as suggested.

6) In the second paragraph of the Introduction the syncytial stage of Creolimax is compared to Drosophila's syncytial blastoderm and to glass sponge tissues. I feel the similarity to Drosophila is a much better one, as in glass sponges the syncytial organization is the final step of development, rather than transient stage.

In the subsections “Recurrent prokaryotic lateral gene transfer events shaped the C. fragrantissima secretome “and”Ortholog identification, phylogeny and LGT detection”, the authors address the issue of possible bacterial contaminations presenting a lateral gene transfer. Perhaps the most convincing way would be to check whether the genes which are likely results of LGT are present on the same genomic scaffolds as unquestionably eukaryotic genes?

We have now deleted the reference to the glass sponge tissues. Regarding the LGT genes, we have many sources of evidence to rule out putative bacterial contamination. First of all, Creolimax grows in axenic conditions. Therefore, we are pretty sure about the purity of our gDNA. Moreover, in most cases we are detecting the LGT events not only in Creolimax, but also in other ichthyosporean species (135 of the genes, in front of 35 only observed in Creolimax, as shown in Figure 10—figure supplement 2). The other ichthyosporeans sequences have been generated in the same axenic conditions, and have been sequenced in different platforms in different institutes (Broad Institute, BGI and CRG). Therefore those sequences are pretty unlikely to come from contamination from other material during library construction. In the case of the 35 LGTs only detected in Creolimax, 17 have at least one intron supported by RNA-seq data, discarding bacterial origin. The other 18 intron-less genes are distributed in 17 scaffolds, that range from 29 to 843 genes and all those scaffolds contain genes with introns (in fact, genes belonging to genomic singletons were discarded for the analysis). Following the reviewers’ suggestion, we now have checked the top blast hit against the non-redundant database of NCBI of the neighbouring genes. In all the cases, genes with eukaryotic best hits were found surrounding the LGT candidates. We now have included this extra criterions exposed here in the “Materials and methods – Ortholog identification, phylogeny and LGT detection” subsection.

7) In the subsection “Genome sequencing, assembly and annotation” the authors describe a combination of Augustus and RNA-Seq data to generate protein predictions. This section is rather brief, and could be made more valuable to the readers if we could learn some more details of the efficiency/correctness of their pipeline. For example, given that PASA can easily miss a gene if its transcripts do not align perfectly on the genome, how big was the fraction of Trinity transcripts which could be aligned to the genome but were rejected by PASA? How were the Augustus models verified not only during the training phase, but also the final gene models? It would be recommended to at least manually compare a sub-set of Augustus models with transcripts from Trinity, or computationally compare of e.g. intron-exon boundaries and UTRs predicted by Augustus with those in Trinity dataset.

Yes, we are aware that PASA might miss some genes, but as we used a “genome-guided” Trinity assembly, the resulting transcripts mapped perfectly to the genome (citing Trinity’s manual, “reads are first aligned to the genome, partitioned according to locus, followed by de novo transcriptome assembly at each locus.”). Anyhow, we used Trinity+PASA only to annotate the UTRs, feeding PASA with the annotation of CDS from Augustus and elongating the UTRs. Moreover, the information from UTRs is only used for the analysis of lincRNA, to avoid identifying as a lincRNA a miss-annotated UTR.

In the case of Augustus models for the Coding Sequence (we did not use it to predict the UTRs), we went through several rounds of manually checking the genes in the IGV genome browser and comparing different parameters outputs with the spliced-alignments resulting from Tophat2 read mapping. Not only that, but we further used as a quality measure the recovery of a manually curated set of genes that we have amplified by RACE-PCR or RT-PCR from previous experiments, which covered both highly expressed genes such as Histone2B or Tubulin-β and lowly expressed genes such as Transcription Factors, including Grainyhead, Myc, NF-κ B or P53, or Tyrosine Kinases as Src. In fact, the gene models presented here are currently used in the lab to clone more genes of interest, usually retrieving positive results (e.g. Integrin Adhesome components).

To clarify this points for future readers, we have now extended the “Materials and methods – Genome sequencing, assembly and annotation” subsection in the manuscript.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Reviewer #1:

In this revised version of the paper, de Mendoza et al. have addressed most of my previous comments, but two points remain: I think the GO/PFAM analysis needs some more clarification or trimming. The authors have now provided a table with p-values and GO categories that the differentially expressed genes were assigned to. This is very helpful, but I still am not sure how the authors came up with the "more inclusive functions". For example, GO:0060026 (name: convergent extension) does not have "actin cytoskeleton" as a parent category in the GO database. Of the thousands of human genes that fall in this GO:0060026, some are sfrp and wnt, which by no means have molecular functions directly related to actin cytoskeleton. So how was this GO term placed under the grey cloud of "actin cytoskeleton"? Are these classifications based on subjective decisions made by the authors? If so, I think that this should be made clear in the Methods section. Furthermore, I think this undercuts this point of this analysis – GO assignments and PFAM annotations provide unbiased ways of looking at large genomic datasets. If the authors then make subjective decisions on the inclusive categories, then they may be introducing bias. The authors also gave the example of the gene of the integrin adhesome that did not obtain the correct GO terms, which the authors corrected manually. However, they could do this for a set of genes they are very familiar with, but what about the many other genes that they are not as familiar with? This manual curation is certainly valuable when studying a specific set of genes, but, in analyzing large datasets, where an unbiased method is supposed to give you biological insight, this seems arbitrary.

We agree this is an important point and we have tried to further clarify it in the text. As the reviewer states, it is very crucial for genome-wide studies that GO/PFAM analyses are done in an unbiased and objective manner. We have done so in our work, relying only on Blast2GO gene-to-term associations (Götz et al., 2008, Nucleic Acids Research) and Enrichment map GO term connections (Merico et al., 2010, Plos ONE). We believe the confusion may come from the fact that many genes have multiple associated GO terms. For example, in the case mentioned by the reviewer, the genes that have “convergent extension” as an associated GO also have “Actin binding” and “Rho GTPase binding” as associated categories in Creolimax Blast2GO annotation. Therefore, the signal from all those enriched GOs comes from the same set of genes, and this is why those GOs are connected in the network and shaded in an inclusive category. We would like to restate that we have not forced these connections – they are built by the “Enrichment map” plug-in for Cytoscape, which connects GOs through common genes. These GO networks simply allow visualization of GO redundancy, which is helpful when showing a very long list of GOs, as in this case (89 enriched categories). In other cases where the list of GOs is shorter we have shown all the significant categories in a bar graph plot (see Figures 3C and 3F, Figure 5D, Figure 6D, Figures 8B and 8C). Finally, in the case of Figure 2, we provide the full list of enriched GOs (as well as all enriched PFAM in Figure 2—figure supplement 1) so that the reader can take a look at our raw results. We have thus simply made an effort to show the results in the most informative manner. In that sense, we would like to clarify that none of the extended list of enriched GO categories contradicts our interpretation of the data.

We would also like to explain the case of the integrin adhesome cited by the reviewer. We did not manually correct the annotation of any gene. In fact, we wanted to specifically avoid this, as mentioned above, as we agree with the reviewer that unbiased methods are preferable in large dataset analysis. The GO focal adhesion and the group of GOs integrin-mediated signaling pathway/integrin complex/cell-matrix adhesion/lamellipodium/positive regulation of podosome assembly/cleavage furrow formation appear unconnected in the network and that is how we show it in Figure 2. But consulting the GO term definition and previous literature, all these categories belong to the same set of genes involved in Cell-ECM adhesion. We strongly believe that this kind of inclusion is not an “arbitrary” and “biased” decision, just reflects research and logical decisions made in order to summarize the data and present it in a meaningful way to the reader.

We have now provided more explicit and detailed explanations of these points in the corresponding Methods sections. Moreover, in the Results section we refer to the raw list of GOs (in the subsection “Transcriptional dynamics reveal differences between Creolimax multinucleate stage and animal development”) to draw attention to the reader of the availability of this data:

“Gene Ontology enrichments were obtained using the Topology-Weighted method in Ontologizer (Bauer et al., 2008) taking a P value lower than 0.01 as a threshold (see full list in Figure 2—source data 1). […] Additionally, PFAM domain enrichments were calculated using a Fisher’s exact test implemented in R, taking a threshold of 0.01. PFAM enrichments were also visualized as a network using the Enrichment map plug-in, where connected nodes reflect domain presence in the same genes.”

The authors have presented good reasons to think of the multinucleate and amoeba stages as different cell types in their rebuttal. However, this nuanced discussion is missing from the manuscript itself. I think the authors should add a very explicit description of these two cell types with their differing features before they start talking about the different "cell types". Also, though they have edited the second paragraph of the Discussion, I think a more in depth discussion of cell-types vs. cell-stages, that explores the ideas in the authors’ rebuttal, would greatly enhance the manuscript.

Now we include explicit sentences about cell stages in Creolimax in the Introduction (with referrals to Figure 1B and 1C). It reads: These stages have distinct physiological and structural characteristics, the amoeboid stage is mono-nucleated, non-diving and motile (Figure 1B), while the multinucleate stage has a cell wall, a big central vacuole and does not move (Figure 1C).” We also have expanded this argument in the Discussion section: “Drastic differences in cell structure, morphology, function and molecular signatures found between stages in protistan relatives of animals indicate that cell stages can be considered cell types according to established definitions (Fernandez-Valverde, Calcino and Degnan, 2015 and see Author’s response section for an in depth discussion on this topic).”

Reviewer #3:

For the most part, I am happy with the changes made in this revision. However the general statement that ichthyosporeans are close relatives of animals, which is stated a number of times in the manuscript, is still misleading. If the metazoans and their closest unicellular relatives, the choanoflagellates, are estimated to have diverged 150-200 million years before the emergence of the crown Metazoa some 7-800 Mya and the ichthyosporeans diverged well before this (before but probably closer to the metazoan-fungal divergence time), it seems stating they are close relatives of animals is very misleading.

The level of relatedness to the animal lineage is debatable. If we take the millions of years of divergence as a measure of relatedness, Ichthyosporeans and animals seem distantly related. But if we look at the whole array of known cellular life or eukaryotic diversity (more than 100 different lineages – see del Campo et al. Trends Ecol Evol, 29(5), 252–259), it is clear that only filastereans and choanoflagellates appear closer to the animal group than ichthyosporeans. Therefore, we think that “close relative” of animals is a valid term under the macro-evolutionary perspective used in this manuscript. Moreover, the major genomic similarities found between all the extant holozoan groups and stark differences with their sister group, the Holomycota, points towards this “close” relatedness from a genomic level, which is the main topic of this manuscript (how lineages with similar genomic complements achieve different forms and developmental modes). Nevertheless, we now accommodate the reviewer’s suggestions and we have deleted “close” from all the sentences where “close relative” appeared, including the title of the manuscript. As an exception, in the Abstract we now refer to the “the three closest animal relatives (ichthyosporeans, filastereans and choanoflagellates)”, which truly reflects the phylogenetic position of this groups, as there is no other know group branching closer to animals than those three.

It would benefit the reader if the Opisthokont relationships can be more clearly articulated, and if the authors can provide accurate estimates of divergence times of the lineages shown in Figure 1. When discussing the various 'developmental' modes of holozoans it would be appropriate to include some discussion of fungi, as they are the outgroup to the Holozoa. This, in turn, should lead to a more balanced discussion about the evolution of development and multicellularity in the Holozoa.

We are a bit confused about the first part of this reviewer's concern as the opisthokont relationships are clearly depicted in Figure 1, where there are no ambiguities about the placement of Fungi, Ichthyosporeans, Filastereans and Choanoflagellates in relation to animals. This phylogeny is depicted according to the latest and more comprehensive phylogenetic analysis designed to unravel Opisthokont relationships (Shalchian-Tabrizi et al., 2008, Plos ONE; Torruella et al., 2012, Molecular Biology and Evolution and Torruella et al., 2015, Current Biology).

Regarding the “accurate dating” of the divergence times, we agree that it is an important topic and there is plenty of literature on molecular clocks and eukaryotic phylogeny, but there is currently no consensus divergence time for the groups treated in this manuscript, or even for the origin of animals (published divergence times range from 698 Mya to 1200 Mya, nicely reviewed in Sharpe SC, Eme L, Brown MW and Roger AJ in the book chapter “Timing the Origins of Multicellular Eukaryotes Through Phylogenomics and Relaxed Molecular Clock Analyses”). Depending on the method, the parameters, the evolutionary models and the backbone of the phylogeny used to calculate those estimates, results keep changing and giving very disparate ages. Given that this manuscript’s focus does not include molecular clocks and there is not a consensus and “accurate” divergence time for the splitting of every holozoan group, we think including this information is detrimental to the manuscript, as it would attract criticism and controversy for a topic out of its focus. We have nonetheless mentioned this point in the figure legend and provide a reference to the review article mentioned above. It reads “Divergence times of the lineages shown in this figure range between 700 Mya (considered the latest estimates of animal origins) and 1200 Mya (earliest estimates of Opisthokont origins) (Li and Godzik, 2006)”.

Regarding discussing the fungal developmental modes, we would first like to highlight that not only the Fungi are the outgroup of the Holozoa, but all the Holomycota. Within the Holomycota, we find lots of different developmental mechanisms, from aggregative fruiting body formation in Fonticula alba, coenocytic development seen in some Chytrids to a huge variety of truly multicellular and multinucleate forms found in the fungal kingdom. We now mention this in the manuscript: “Further complementary insights on the evolution of developmental modes will be provided by studying the immediate out-group of the Holozoa, the Holomycota (Fungi + Discicristoidea) (Torruella et al., 2015). The Holomycota show a wide variety of developmental modes, ranging from aggregative fruiting body formation to several modes of coenocytic development (Stajich et al., 2009; Brown, Spiegel and Silberman, 2009).”

https://doi.org/10.7554/eLife.08904.036

Article and author information

Author details

  1. Alex de Mendoza

    Institut de Biologia Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
    Present address
    ARC CoE Plant Energy Biology, University of Western Australia, Crawley, Australia
    Contribution
    AdeM, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    alexmendozasoler@gmail.com
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0002-6441-6529
  2. Hiroshi Suga

    1. Institut de Biologia Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
    2. Prefectural University of Hiroshima, Shobara, Japan
    Contribution
    HS, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  3. Jon Permanyer

    1. EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation, Barcelona, Spain
    2. Universitat Pompeu Fabra, Barcelona, Spain
    Contribution
    JP, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  4. Manuel Irimia

    1. EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation, Barcelona, Spain
    2. Universitat Pompeu Fabra, Barcelona, Spain
    Contribution
    MI, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  5. Iñaki Ruiz-Trillo

    1. Institut de Biologia Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
    2. Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
    Contribution
    IRT, Conception and design, Drafting or revising the article
    For correspondence
    inaki.ruiz@ibe.upf-csic.es
    Competing interests
    The authors declare that no competing interests exist.

Funding

European Research Council (ERC-2007-StG-206883)

  • Alex de Mendoza
  • Hiroshi Suga
  • Iñaki Ruiz-Trillo

Ministerio de Economía y Competitividad (BFU-2011-23434)

  • Alex de Mendoza

European Research Council (ERC-2012-Co -616960)

  • Alex de Mendoza
  • Hiroshi Suga
  • Iñaki Ruiz-Trillo

Japan Society for the Promotion of Science (Research Activity)

  • Hiroshi Suga

Center for Genomic Regulation (Core funding)

  • Jon Permanyer
  • Manuel Irimia

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to thank the proteomics unit in the CRG for help in the analysis of the secretome, especially Eduard Sabidó and Guadalupe Iglesias, and the genomics unit of CRG for help in the RNA-seq library preparation and sequencing. We also thank Arnau Sebé-Pedrós, Xavi Grau-Bové and Ozren Bogdanovic for critical reading and discussion of the manuscript. Finally, we are grateful to Meritxell Antó for the technical support provided throughout the project.

Reviewing Editor

  1. Alejandro Sánchez Alvarado, Reviewing Editor, Stowers Institute for Medical Research, United States

Publication history

  1. Received: May 21, 2015
  2. Accepted: October 13, 2015
  3. Accepted Manuscript published: October 14, 2015 (version 1)
  4. Version of Record published: January 21, 2016 (version 2)

Copyright

© 2015, de Mendoza et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 3,355
    Page views
  • 700
    Downloads
  • 10
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Biophysics and Structural Biology
    2. Genes and Chromosomes
    Ramon A van der Valk et al.
    Research Article Updated