1. Evolutionary Biology

Support for a clade of Placozoa and Cnidaria in genes with minimal compositional bias

  1. Christopher E Laumer (corresponding author)
  2. Harald Gruber-Vodicka
  3. Michael G Hadfield
  4. Vicki B Pearse
  5. Ana Riesgo
  6. John C Marioni
  7. Gonzalo Giribet
  1. Wellcome Trust Sanger Institute, United Kingdom
  2. European Molecular Biology Laboratories-European Bioinformatics Institute, United Kingdom
  3. Max Planck Institute for Marine Microbiology, Germany
  4. Pacific Biosciences Research Center and the University of Hawaii-Manoa, United States
  5. University of California, United States
  6. The Natural History Museum, United Kingdom
  7. University of Cambridge, United Kingdom
  8. Harvard University, United States
Short Report
Cite this article as: eLife 2018;7:e36278 doi: 10.7554/eLife.36278

Abstract

The phylogenetic placement of the morphologically simple placozoans is crucial to understanding the evolution of complex animal traits. Here, we examine the influence of adding new genomes from placozoans to a large dataset designed to study the deepest splits in the animal phylogeny. Using site-heterogeneous substitution models, we show that it is possible to obtain strong support, in both amino acid and reduced-alphabet matrices, for either a sister-group relationship between Cnidaria and Placozoa, or for Cnidaria and Bilateria as seen in most published work to date, depending on the orthologues selected to construct the matrix. We demonstrate that a majority of genes show evidence of compositional heterogeneity, and that support for the Cnidaria + Bilateria clade can be attributed to this source of systematic error. In interpreting these results, we caution against a peremptory reading of placozoans as secondarily reduced forms of little relevance to broader discussions of early animal evolution.

https://doi.org/10.7554/eLife.36278.001

eLife digest

Filter-feeding sponges and tiny gliding, pancake-like animals called placozoans are the only two major groups of animals that lack muscles, nerves and an internal gut. Sponges have historically been seen as the first to have branched off in animal phylogeny – the family tree of living organisms that shows how species are related. This is because it is assumed that they split from the other animals before features including muscles, nerves and internal guts evolved.

Sequences of their genetic material (the genome) support this view, although some argue that jellyfish-like animals called ctenophores branched first. One explanation for this disagreement is that ctenophores use different proportions of amino acids in their proteins, a phenomenon known as compositional heterogeneity. Computer algorithms that assume amino acid usage stays the same across all lineages throughout evolution may therefore place ctenophores incorrectly. Meanwhile, the only placozoan genome sequenced so far suggests that placozoans are equally closely related to jellyfish and corals (cnidarians) and to bilaterians, a group that includes worms, insects and vertebrates.

To test whether this view of the first branches of the animal tree of life is correct, Laumer et al. included the genomes from several undescribed species of placozoans in a phylogenetic analysis. These analyses revealed a relationship that had not previously been seen: placozoans were the closest living relatives of cnidarians. However, when looking at the level of genes rather than whole genomes, the more usual relationship of placozoans being equally related to cnidarians and bilaterians re-emerged. To resolve this conflict, Laumer et al. focused on the genes that had the least compositional heterogeneity. In these genes, the newly identified relationship, with placozoans most closely related to cnidarians, appeared again.

Researchers studying cnidarians often hope to find some clues as to how the complex features they seem to share with bilaterians originated. The findings of Laumer et al. may suggest that the ancestors of the placozoans did in fact have muscles, nerves and guts, but they lost these traits in favor of a simpler lifestyle. An alternative, but controversial possibility is that the ancestor of cnidarians and bilaterians was a simple organism like a placozoan, and the two evolved their complex traits independently. The findings show a complex picture of early animal evolution. Further study of placozoans may well clarify this picture.

https://doi.org/10.7554/eLife.36278.002

Introduction

The discovery (Schulze, 1883) and mid-20th century rediscovery (Grell and Benwitz, 1971) of the enigmatic, amoeba-like placozoan Trichoplax adhaerens did much to ignite the imagination of zoologists interested in early animal evolution (Bütschli, 1884). As microscopic animals adapted to extracellular grazing on the biofilms over which they creep (Wenderoth, 1986), placozoans have a simple anatomy suited to exploiting passive diffusion for many physiological needs, with only six morphological cell types discernible even under intensive microscopical scrutiny (Grell and Ruthmann, 1991; Smith et al., 2014), although single-cell RNA-seq reveals a greater diversity of cell types (Sebé-Pedrós, 2018a). They have no conventional muscular, digestive, or nervous systems, yet show tightly coordinated behaviour regulated by peptidergic signaling (Smith et al., 2015; Senatore et al., 2017; Varoqueaux, 2018; Armon et al., 2018). In laboratory conditions, they proliferate through fission and somatic growth. Evidence for sexual reproduction remains elusive, despite genetic evidence of recombination (Srivastava et al., 2008) and descriptions of early abortive embryogenesis (Eitel et al., 2011; Grell, 1972); sexual phases of the life cycle may occur only under poorly understood field conditions (Pearse and Voigt, 2007; McFall-Ngai et al., 2013).

Given their simple, puzzling morphology and dearth of embryological clues, molecular data are crucial in placing placozoans phylogenetically. The position of Placozoa in the animal tree proved recalcitrant to early standard-marker analyses (Kim et al., 1999; Silva et al., 2007; Wallberg et al., 2004), although this paradigm did reveal a large degree of molecular diversity in placozoan isolates from around the globe, clearly indicating the existence of many cryptic species (Pearse and Voigt, 2007; Eitel et al., 2013; Signorovitch et al., 2007) with up to 27% genetic distance in 16S rRNA alignments (Eitel and Schierwater, 2010). An apparent answer to the question of placozoan affinities was provided by analysis of a nuclear genome assembly (Srivastava et al., 2008), which strongly supported a position as the sister group of a clade of Cnidaria + Bilateria (sometimes called Planulozoa). However, this effort also revealed a surprisingly bilaterian-like (Dunn et al., 2015) developmental gene toolkit in placozoans, a paradox for such a simple animal.

As metazoan phylogenetics has pressed onward into the genomic era, perhaps the largest controversy has been the debate over the identity of the sister group to the remaining metazoans, traditionally thought to be Porifera but proposed to be Ctenophora by Dunn et al. (2008) and subsequently by additional studies (Hejnol et al., 2009; Moroz et al., 2014; NISC Comparative Sequencing Program et al., 2013; Whelan et al., 2015; Whelan et al., 2017). Others have suggested that this result arises from artifacts with potentially additive effects, such as inadequate taxon sampling, flawed matrix husbandry (undetected paralogy or contamination), and the use of poorly fitting substitution models (Philippe et al., 2009; Pick et al., 2010; Pisani et al., 2015; Simion et al., 2017; Feuda et al., 2017). A third view emphasizes that different sets of genes can lead to different conclusions, with a small number of genes sometimes sufficient to drive one result or another (Nosenko et al., 2013; Shen et al., 2017). This controversy, regardless of its eventual resolution, has spurred serious contemplation of possibly independent origins of several hallmark traits such as striated muscles, digestive systems, and nervous systems (Moroz et al., 2014; Dayraud et al., 2012; Hejnol and Martín-Durán, 2015; Liebeskind et al., 2017; Moroz and Kohn, 2016; Presnell et al., 2016; Steinmetz et al., 2012).

Driven by this controversy, new genomic and transcriptomic data from sponges, ctenophores, and metazoan outgroups have accrued, while new sequences and analyses focusing on the position of Placozoa have been slow to emerge. Here, we provide a novel test of the phylogenetic position of placozoans, adding draft genomes from three putative species that span the root of this clade’s known diversity (Eitel et al., 2013) (Table 1), and critically assessing the role of systematic error in placing these enigmatic organisms (Laumer, 2018).

Table 1
Summary statistics describing the contiguity and completeness of the draft host metagenome bins from the three clade A placozoan isolates utilized in this paper, presented in comparison to the reference H1 strain.
https://doi.org/10.7554/eLife.36278.003
                                              H11      H4       H6       H1
assembly span (Mbp)                           56.63    83.39    76.7     98.06
scaffold number                               5813     5337     8310     1415
scaffold N50 (kbp)                            12.738   25.97    12.84    5790
GC%                                           30.76    30.84    29.9     29.37
BUSCO2 Eukaryota complete (of 303)            220      276      239      294
BUSCO2 Eukaryota complete + partial (of 303)  246      282      265      298
Average # of hits per BUSCO                   1.00     1.04     1.00     1.00
% of BUSCOs with more than one match          0.45     3.99     0.42     0.34

Results and discussion

Orthology assignment on predicted proteomes derived from 59 genome and transcriptome assemblies yielded 4294 gene trees with at least 20 sequences each, sampling all five major metazoan clades and outgroups, from which we obtained 1388 well-aligned orthologues. For each of these orthologues, we inferred a maximum-likelihood (ML) gene tree, and we selected the 430 most informative orthologues on the basis of tree-likeness scores (Misof et al., 2013). This yielded an amino-acid matrix of 73,547 residues with 37.55% gaps or missing data, with cnidarian and placozoan terminals represented by an average of 371.92 and 332.75 orthologues, respectively (with a maximum of 383 orthologues present for the newly sequenced placozoan H4 clade representative; Figure 1).

Figure 1 with 2 supplements
Consensus phylogram showing deep metazoan interrelationships under Bayesian phylogenetic inference of the 430-orthologue amino acid matrix, using the CAT + GTR + Γ4 mixture model.

All nodes received full posterior probability. Numerical annotations of given nodes represent Extended Quadripartition Internode Certainty (EQP-IC) scores, describing among-gene-tree agreement for both the monophyly of the five major metazoan clades and the given relationships between them in this reference tree. A bar chart on the right depicts the proportion of the total orthologue set each terminal taxon is represented by in the concatenated matrix. ‘Placozoa H1’ in this and all other figures refers to the GRELL isolate sequenced in Srivastava et al., 2008, which has there and elsewhere been referred to as Trichoplax adhaerens, despite the absence of type material linking this name to any modern isolate. Line drawings of clade representatives are taken from the BIODIDAC database (http://biodidac.bio.uottawa.ca/).

https://doi.org/10.7554/eLife.36278.004

Our Bayesian analyses of this matrix place Cnidaria and Placozoa as sister groups with full posterior probability under the general site-heterogeneous CAT + GTR + Γ4 model (Figure 1). Under ML inference with the C60 + LG + FO + R4 profile mixture model (Wang et al., 2018) (Figure 1—figure supplement 1), we again recover Cnidaria + Placozoa, albeit with more marginal resampling support. Both Bayesian and ML analyses show little internal branch diversity within Placozoa. Accordingly, deleting all newly added placozoan genomes from our analysis has no effect on topology and only a marginal effect on support in ML analysis (Figure 1—figure supplement 2). Quartet-based concordance analyses (Zhou, 2017) show no evidence of strong phylogenetic conflicts among ML gene trees in this 430-gene set (Figure 1), although internode certainty metrics are close to 0 for many key clades including Cnidaria + Placozoa, indicating that support for some ancient relationships may be masked by gene-tree estimation errors and emerge only in combined analysis (Gatesy and Baker, 2005).

Compositional heterogeneity of amino-acid frequencies along the tree is a source of phylogenetic error not modelled even by complex site-heterogeneous substitution models such as CAT + GTR (Blanquart and Lartillot, 2008; Foster, 2004; Lartillot and Philippe, 2004; Lartillot et al., 2013). Furthermore, previous analyses (Nosenko et al., 2013) have shown that placozoans and choanoflagellates in particular, both of which our matrix samples intensively, deviate strongly from the mean amino-acid composition of Metazoa, perhaps as a result of differences in genomic GC content. As a measure to at least partially ameliorate such nonstationary substitution, we recoded the amino-acid matrix into the six ‘Dayhoff’ categories, a common strategy previously shown to reduce the effect of compositional variation among taxa; however, the Dayhoff-6 groups represent only one of many plausible recoding strategies, all of which sacrifice information (Feuda et al., 2017; Nesnidal et al., 2010; Rota-Stabelli et al., 2013; Susko and Roger, 2007). Analysis of this recoded matrix under the CAT + GTR model again recovered full support (pp = 1) for Cnidaria + Placozoa (Figure 2). Indeed, under Dayhoff-6 recoding, the only major change is in the relative positions of Ctenophora and Porifera, with the latter here constituting the sister group to all other animals with full support. Similar recoding-driven effects on the relative positions of Porifera and Ctenophora have also been seen in other recent work (Feuda et al., 2017), and have been interpreted as indicating a role for compositional bias in misplacing Ctenophora as the sister group to all other animals.
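A minimal Python sketch of Dayhoff-6 recoding follows, assuming the standard six groups AGPST, DENQ, HKR, ILMV, FWY and C. The digit labels 0 to 5 and the pass-through of gaps, missing data and ambiguity codes are illustrative assumptions, not necessarily the encoding used by PhyloBayes for the analyses reported here.

```python
# Dayhoff-6 groups; the digit labels 0-5 are arbitrary (an assumption for illustration).
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]

# Lookup table: residue -> group label.
DAYHOFF6 = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}


def recode_dayhoff6(seq: str) -> str:
    """Recode an amino-acid sequence into Dayhoff-6 labels.

    Gaps, missing data and ambiguity codes (e.g. '-', '?', 'X') are passed
    through unchanged; this handling is an assumption of the sketch.
    """
    return "".join(DAYHOFF6.get(res, res) for res in seq.upper())


def recode_alignment(aln: dict[str, str]) -> dict[str, str]:
    """Recode every sequence in a {taxon: sequence} alignment."""
    return {taxon: recode_dayhoff6(seq) for taxon, seq in aln.items()}


if __name__ == "__main__":
    aln = {"taxonA": "MKLV-AGD", "taxonB": "MRIV-SGE"}
    print(recode_alignment(aln))
    # {'taxonA': '3233-001', 'taxonB': '3233-001'}
```

The two toy sequences differ at four amino-acid positions (K/R, L/I, A/S, D/E), but every difference falls within a Dayhoff group, so the recoded sequences are identical. This is the information loss that recoding trades for reduced sensitivity to among-taxon compositional variation.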

Figure 2
Consensus phylogram from Bayesian phylogenetic inference under the CAT + GTR + Γ4 mixture model, on the 430-orthologue concatenated amino acid matrix, recoded into 6 Dayhoff groups.

Nodes annotated with posterior probability; unannotated nodes received full support.

https://doi.org/10.7554/eLife.36278.007

Many research groups, using good taxon sampling and genome-scale datasets, and even recently including data from a new divergent placozoan species (Whelan et al., 2017; Feuda et al., 2017; Eitel, 2017), have consistently reported strong support for Planulozoa under the CAT + GTR model. Indeed, when we construct a supermatrix from our predicted peptide catalogues using a different strategy, relying on complete sequences of 303 pan-eukaryote ‘Benchmarking Universal Single-Copy Orthologs’ (BUSCOs) (Simão et al., 2015), we also see full support for Planulozoa in a CAT + GTR + Γ analysis, in both amino-acid (Figure 3A) and Dayhoff-6 recoded alphabets (Figure 3B). Which phylogeny is correct, and what process drives support for the incorrect topology? Posterior predictive tests, which compare the observed among-taxon variation in amino-acid frequencies with distributions simulated from the sampled posterior under a single composition vector, may provide insight (Feuda et al., 2017; Lartillot and Philippe, 2004). Both the initial 430-gene matrix and the 303-gene BUSCO matrix fail these tests, but the BUSCO matrix fails them more profoundly, with z-scores (measuring mean-squared across-taxon heterogeneity) in the range of 330–340, in contrast to 176–187 for the 430-gene matrix (Table 2). Furthermore, inspecting z-scores for individual taxa in representative chains from both matrices shows that much of this global difference can be attributed to placozoans, with additional contributions from choanoflagellates and select isolated representatives of other clades (Figure 3C).
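The logic of a posterior predictive test of compositional homogeneity can be sketched as follows. This Python example is an illustration, not the PhyloBayes implementation: it computes, for each taxon, the summed absolute difference between its amino-acid frequencies and the global frequencies of the alignment (the per-taxon statistic described in the Figure 3 legend), takes the maximum and mean over taxa as global statistics, and expresses an observed value as a z-score against the same statistic computed on replicate alignments. The simulated replicates are assumed to be supplied by the caller; generating them requires the fitted model and posterior samples, which this sketch does not cover.

```python
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(seqs):
    """Amino-acid frequencies pooled over seqs (gaps/ambiguity codes ignored)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for seq in seqs:
        for res in seq.upper():
            if res in counts:
                counts[res] += 1
    total = sum(counts.values())
    return {aa: (n / total if total else 0.0) for aa, n in counts.items()}


def taxon_deviations(aln):
    """Summed absolute distance between each taxon's and the global frequencies."""
    glob = frequencies(aln.values())
    devs = {}
    for taxon, seq in aln.items():
        own = frequencies([seq])
        devs[taxon] = sum(abs(own[aa] - glob[aa]) for aa in AMINO_ACIDS)
    return devs


def comp_statistics(aln):
    """Global heterogeneity statistics: (maximum, mean) of per-taxon deviations."""
    devs = list(taxon_deviations(aln).values())
    return max(devs), mean(devs)


def z_score(observed, simulated):
    """Standardize an observed statistic against posterior predictive replicates.

    `simulated` holds the same statistic computed on replicate alignments
    simulated from posterior samples (assumed to be supplied by the caller).
    """
    return (observed - mean(simulated)) / stdev(simulated)
```

A large positive z-score indicates more among-taxon compositional heterogeneity in the real data than the model predicts. Per-taxon z-scores, of the kind plotted in Figure 3C, can be obtained the same way by standardizing each taxon's observed deviation against its simulated values.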

Figure 3
Posterior consensus trees from CAT + GTR + Γ4 mixture model analysis of a 94,444 amino acid supermatrix derived from the 303 single-copy conserved eukaryotic BUSCO orthologs, analysed in (A) amino acid space or (B) the Dayhoff-6 reduced alphabet space.

Nodal support values comprise posterior probabilities; nodes with full support are not annotated. Taxon colourings as in previous figures. (C) Plot of z-scores (summed absolute distance between taxon-specific and global empirical frequencies) from representative posterior predictive tests of amino acid compositional bias, from both the BUSCO 303-orthologue matrix (red) and the initial 430-orthologue matrix (blue). Placozoan taxon abbreviations are shown in blue font.

https://doi.org/10.7554/eLife.36278.008
Table 2
Mean (and standard deviation of) z-scores from posterior predictive tests of per-site amino acid diversity and among-lineage compositional homogeneity, computed for amino-acid alignments using the PhyloBayes-MPI v1.8 readpb_mpi -div and -comp options, respectively, with burn-ins selected as per the posterior consensus summaries shown elsewhere.

Except for the diversity statistic in the test-passing matrix, all tests reject (at p = 0.05) the adequacy of the inferred CAT + GTR + Γ4 model to describe the data.

https://doi.org/10.7554/eLife.36278.009
                        Diversity     Composition (mean)   Composition (maximum)
430 matrix              1.94 (0.09)   181.35 (7.50)        105.04 (3.13)
BUSCO 303-gene matrix   11.27 (0.73)  334.98 (4.56)        107.56 (6.17)
comp-failed matrix      2.51 (0.19)   270.16 (12.03)       173.87 (9.15)
comp-passed matrix      0.81 (0.18)   107.67 (10.10)       63.19 (6.95)

As a final measure to describe the influence of compositional heterogeneity in this dataset, we applied a null-simulation test for compositional bias to each alignment in our set of 1388 orthologues. This test, which compares the real data to a null distribution of amino-acid frequencies simulated along assumed gene trees with a substitution model using a single composition vector, is less prone to Type II errors than the more conventional χ² test (Foster, 2004). Remarkably, at a conservative significance threshold of α = 0.10, the majority (764 genes, or ~55%) of this gene set is identified as compositionally biased by this test, highlighting the importance of using appropriate statistical tests to control this source of systematic error, rather than applying arbitrary heuristic cutoffs (Kück and Struck, 2014). Building informative matrices from gene sets on either side of this significance threshold, and again applying both CAT + GTR mixture models and ML profile mixtures, we see strong support for Cnidaria + Placozoa in the test-passing supermatrix and, conversely, strong support for Cnidaria + Bilateria in the test-failing supermatrix (Figure 4, Figure 4—figure supplement 1, Figure 4—figure supplement 2). Interestingly, in trees built through CAT + GTR + Γ4 analysis of the test-failing supermatrix (Figure 4A,C), in both amino-acid and Dayhoff-6 alphabets, we also observe full support for Porifera as sister to all other animals.
In contrast, analysis of this amino acid matrix under a profile mixture model recovers support for Ctenophora in this position (Figure 4—figure supplement 1), indicating that, at least for this alignment, compositional heterogeneity need not be invoked to explain why outcomes differ among analyses, as some have argued (Feuda et al., 2017): both CAT + GTR and the C60 + LG + FO + R4 profile mixture model assume a single composition vector over time, but the CAT + GTR model is better able to accommodate site-heterogeneous substitution patterns (Lartillot et al., 2013; Quang et al., 2008). In the context of this experiment, Dayhoff-6 recoding appears impactful only for the test-passing supermatrix (Figure 4B,D), where it obviates support for Ctenophora-sister (Figure 4B, Figure 4—figure supplement 2) in favour of (albeit with marginal support) Porifera-sister (Figure 4D), and also diminishes support for Placozoa + Cnidaria (in contrast to the 430-gene matrix; Figure 2), perhaps reflecting the inherent information loss of using a reduced amino-acid alphabet for this relatively short matrix.
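The gene-partitioning logic of the null-simulation test can be sketched in Python as below. The sketch computes a chi-square statistic on an alignment's taxon-by-amino-acid count table, converts it to an empirical p-value against the same statistic computed on alignments simulated under a single composition vector, and splits genes at α = 0.10. The simulated statistics are assumed inputs: producing them requires simulating along each gene tree under a fitted stationary model, as in Foster (2004), which is not shown. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_chi2(aln):
    """Chi-square statistic on the taxon x amino-acid count table of one alignment.

    Gaps and non-standard symbols are ignored; cells with zero expected
    count (amino acids absent from the whole alignment) are skipped.
    """
    table = {t: [s.upper().count(aa) for aa in AMINO_ACIDS] for t, s in aln.items()}
    row_tot = {t: sum(row) for t, row in table.items()}
    col_tot = [sum(row[j] for row in table.values()) for j in range(len(AMINO_ACIDS))]
    grand = sum(row_tot.values())
    if grand == 0:
        return 0.0
    stat = 0.0
    for t, row in table.items():
        for j, obs in enumerate(row):
            exp = row_tot[t] * col_tot[j] / grand
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat


def empirical_p(observed, simulated):
    """One-sided Monte Carlo p-value against statistics from null simulations."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


def split_genes(pvalues, alpha=0.10):
    """Partition genes into (passing, failing) lists at significance level alpha."""
    passing = [g for g, p in pvalues.items() if p >= alpha]
    failing = [g for g, p in pvalues.items() if p < alpha]
    return passing, failing
```

With n null replicates, the smallest attainable p-value is 1/(n + 1), so enough simulations are needed per gene for a threshold of 0.10 to be meaningful.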

Figure 4 with 2 supplements
Schematic depiction of deep metazoan interrelationships in posterior consensus trees from CAT + GTR + Γ4 mixture model analyses of matrices made from subsets of genes passing or failing a sensitive null-simulation test of compositional heterogeneity.

Panels correspond to (A) the amino acid matrix made within the failing set; (B) the amino acid matrix derived from the passing set; (C) the Dayhoff-6 recoded matrix from the failing set; (D) the Dayhoff-6 recoded matrix from the passing set. Only nodes with posterior probability less than 1.00 are annotated numerically.

https://doi.org/10.7554/eLife.36278.010

A possible hidden variable related to the phylogenetic discordance we describe, the precise significance of which remains unclear, is mean trimmed alignment length: both the test-passing and the original 430-gene matrices are composed of considerably shorter alignments than the test-failing and the 303-gene BUSCO matrices (see Materials and methods). Indeed, alignment length has been previously shown to be predictive of a number of other metrics of phylogenetic relevance (Shen et al., 2016); the generality and directionality of such relationships in empirical datasets at varying scales of divergence is clearly worthy of further investigation.

The previously cryptic phylogenetic link between cnidarians and placozoans seen in gene sets less influenced by compositional bias will require further testing with other analyses and data modalities, such as rare genomic changes, which should become ever more visible as highly contiguous assemblies continue to be reported from non-bilaterian animals (Eitel et al., 2018; Kamm et al., 2018; Jiang, 2018; Leclère, 2018). However, if validated, this relationship will raise further questions about the homology of certain traits across non-bilaterians. Many workers, citing the incompletely known development (Eitel et al., 2011; Pearse and Voigt, 2007) and relatively bilaterian-like gene content of placozoans (Srivastava et al., 2008; Eitel, 2017), presume that these organisms must have a still-unobserved, more typical development and life cycle (DuBuc et al., 2018), or else are merely oddities that have undergone wholesale secondary simplification and have scant significance to any evolutionary path outside their own. Indeed, it is tempting to interpret this new phylogenetic position as further bolstering such hypotheses, as much work on cnidarian models in the evo-devo paradigm is predicated on the notion that cnidarians and bilaterians share, more or less, many homologous morphological features, viz. axial organization (Genikhovich and Technau, 2017; DuBuc et al., 2018), nervous systems (Liebeskind et al., 2017; Moroz and Kohn, 2016; Kelava et al., 2015; Kristan, 2016; Arendt et al., 2016), basement-membrane lined epithelia (Fidler et al., 2017; Leys and Riesgo, 2012), musculature (Steinmetz et al., 2012), embryonic germ-layer organisation (Steinmetz et al., 2017), and internal digestion (Presnell et al., 2016; Putnam et al., 2007; Hejnol and Martindale, 2008; Martindale and Hejnol, 2009).
While we do not argue, as some have done (Schierwater, 2005; Syed and Schierwater, 2002), that placozoans resemble hypothetical metazoan ancestors, we hesitate to dismiss them a priori as irrelevant to understanding early bilaterian evolution in particular: although apparently simpler and less diverse, placozoans nonetheless have equal status to cnidarians as an immediate extant outgroup. Rather, we see value in testing assumed hypotheses of homology, character by character, by extending pairwise comparisons between bilaterians and cnidarians to include placozoans. This agenda demands reducing the large disparity in embryological, physiological, and molecular genetic knowledge between these taxa, and recent progress towards this goal has been made using both established methods such as in situ hybridization (DuBuc et al., 2018) and image analysis (Varoqueaux, 2018), and new technologies such as single-cell RNA-seq (Sebé-Pedrós, 2018a; Sebé-Pedrós et al., 2018b). Conversely, we emphasize another implication of this phylogeny: characters that can be validated as homologous at any level between Bilateria and Cnidaria must have originated earlier in animal evolution than previously appreciated, and should either occur cryptically in modern placozoans or have been lost at some point in their ancestry. In this light, paleobiological scenarios of early animal evolution founded on inherently phylogenetically informed interpretations of Ediacaran fossil forms (Cavalier-Smith, 2018; Cavalier-Smith, 2017; Dufour and McIlroy, 2018; Sperling and Vinther, 2010; Evans et al., 2017) and molecular clock estimates (Cunningham et al., 2017; dos Reis et al., 2015; Dohrmann and Wörheide, 2017; Erwin et al., 2011) may require re-examination.

Materials and methods

Sampling, sequencing, and assembling reference genomes from previously unsampled placozoans

Haplotype H4 and H6 placozoans were collected from water tables at the Kewalo Marine Laboratory, University of Hawaii-Manoa, Honolulu, Hawaii in October 2016. Haplotype H11 placozoans were collected from the Mediterranean ‘Anthias’ show tank in the Palma de Mallorca Aquarium, Mallorca, Spain in June 2016. All placozoans were sampled by placing glass slides, either suspended freely or mounted in cut-open plastic slide holders, into the tanks for 10 days (Pearse and Voigt, 2007). Placozoans were identified under a dissection microscope, and single individuals were transferred to 500 µl of RNAlater and stored according to the manufacturer’s recommendations.

DNA was extracted from 3 individuals of haplotype H11 and 5 individuals of haplotype H6 using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany). DNA and RNA from three haplotype H4 individuals were extracted using the AllPrep DNA/RNA Micro Kit (Qiagen); both kits were used according to the manufacturer’s protocols.

Illumina library preparation and sequencing were performed by the Max Planck Genome Centre, Cologne, Germany, as part of an ongoing metagenomics project in marine symbiosis. In brief, DNA/RNA quality was assessed with the Agilent 2100 Bioanalyzer (Agilent, Santa Clara, USA), and the genomic DNA was fragmented to an average fragment size of 500 bp. For the DNA samples, the concentration was increased (MinElute PCR purification kit; Qiagen, Hilden, Germany) and an Illumina-compatible library was prepared using the Ovation Ultralow Library Systems kit (NuGEN, Leek, The Netherlands) according to the manufacturer’s protocol. For the haplotype H4 RNA samples, the Ovation RNA-seq System V2 (NuGEN, San Carlos, CA, USA) was used to synthesize cDNA, and sequencing libraries were then generated with the DNA library prep kit for Illumina (BioLABS, Frankfurt am Main, Germany). All libraries were size-selected by agarose gel electrophoresis, and the recovered fragments were quality-assessed and quantified by fluorometry. For each DNA library, 14–75 million 100 bp or 150 bp paired-end reads were sequenced on Illumina HiSeq 2500 or 4000 machines (Illumina, San Diego, USA); for the haplotype H4 RNA libraries, 32–37 million single 150 bp reads were obtained.

For assembly, adapters and low-quality reads were removed with bbduk (https://sourceforge.net/projects/bbmap/) using a minimum quality value of two and a minimum length of 36, and single reads were excluded from the analysis. Each library was error-corrected using BayesHammer (Nikolenko et al., 2013). A combined assembly of all libraries for each haplotype was performed using SPAdes 3.6.2 (Bankevich et al., 2012). Haplotype H4 and H11 data were assembled from the full read set with standard parameters and k-mer sizes of 21, 33, 55, 77 and 99. The haplotype H6 data were preprocessed with bbnorm to remove all reads with an average k-mer coverage below 5, and then assembled with k-mer sizes of 21, 33, 55 and 77.
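The coverage-based read filter applied to the H6 data can be illustrated with the Python sketch below. It counts canonical k-mers exactly across all reads, computes each read's average k-mer coverage, and drops reads whose average falls below a threshold (5, as above). This is a simplified illustration of the idea, not bbnorm's algorithm, which relies on probabilistic counting and its own depth criteria; the function names, the exact counting, and the toy reads are assumptions.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def read_kmers(read, k):
    """Canonical k-mers of a read (reads shorter than k yield none)."""
    return [canonical(read[i:i + k]) for i in range(len(read) - k + 1)]


def filter_low_coverage(reads, k=21, min_avg_cov=5):
    """Keep reads whose average k-mer count over the whole read set is >= min_avg_cov."""
    counts = Counter(km for read in reads for km in read_kmers(read, k))
    kept = []
    for read in reads:
        covs = [counts[km] for km in read_kmers(read, k)]
        if covs and sum(covs) / len(covs) >= min_avg_cov:
            kept.append(read)
    return kept
```

For example, with five copies of `ACGTACGT` and one `TTTTGGGG` at k = 4, the repeated read has an average k-mer coverage of 9 and is kept, while the singleton (average coverage 1) is discarded.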

Reads from each library were mapped back to the assembled scaffolds using bbmap (https://sourceforge.net/projects/bbmap/) with the option fast=t. Scaffolds were binned based on the mapped read data using MetaBAT (Kang et al., 2015) with default settings and the ensemble binning option activated (switch -B 20). The Trichoplax host bins were evaluated using metawatt (Strous et al., 2012) based on coding density and sequence similarity to the Trichoplax H1 reference assembly (NZ_ABGP00000000.1). Bin quality metrics were computed with BUSCO2 (Simão et al., 2015) (Table 1) and QUAST (Gurevich et al., 2013). Both the stringent metagenomic binning procedure (a procedure also expedient for other holobiont organisms (Celis et al., 2018)) and the very low proportion of multiple orthologue hits in the BUSCO2 assessment (Table 1) indicate that there is no evidence of residual non-placozoan contamination within the scaffolds used for gene prediction.
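The two duplication-related metrics in Table 1 (average number of hits per BUSCO and percentage of BUSCOs with more than one match) reduce to simple arithmetic over per-BUSCO hit counts. The Python sketch below assumes a hypothetical mapping from each recovered BUSCO identifier to its number of matching gene models; it does not parse BUSCO's own output files, and the choice of which BUSCOs to include in the denominator is left to the caller.

```python
def busco_duplication_metrics(hits_per_busco):
    """Return (average hits per BUSCO, % of BUSCOs with more than one match).

    hits_per_busco: hypothetical {busco_id: number_of_matches} for recovered BUSCOs.
    """
    n = len(hits_per_busco)
    if n == 0:
        raise ValueError("no BUSCOs recovered")
    avg_hits = sum(hits_per_busco.values()) / n
    pct_multi = 100.0 * sum(1 for h in hits_per_busco.values() if h > 1) / n
    return avg_hits, pct_multi
```

Values near 1.00 hits per BUSCO with very few multi-copy matches, as in Table 1, are what one would expect if few single-copy orthologues are duplicated by contaminating genomes.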

Predicting proteomes from transcriptome and genome assemblies

Predicted proteomes from species with published draft genome assemblies were downloaded from the NCBI Genome portal or Ensembl Metazoa in June 2017. For Clade A placozoans, host metagenomic bins were used directly for gene annotation. For the H6 and H11 representatives, annotation was entirely ab initio, performed with GeneMark-ES (Ter-Hovhannisyan et al., 2008); for the H4 representative, total RNA-seq libraries obtained from three separate isolates (BioProject PRJNA505163) were mapped to genomic contigs with STAR v2.5.3a (Dobin et al., 2013) under default settings, and the merged BAM files were then used to annotate genomic contigs and derive predicted peptides with BRAKER v1.9 (Hoff et al., 2016) under default settings. Choanoflagellate proteome predictions (Simion et al., 2017) were provided as unpublished data by Dan Richter. Peptides from a Calvadosia (previously Leucosolenia) complicata transcriptome assembly were downloaded from compagen.org. Peptide predictions from Nemertoderma westbladi and Xenoturbella bocki, as used in a previous study (Cannon et al., 2016), were provided directly by the authors. The transcriptome assembly (raw reads unpublished) of Euplectella aspergillum was made available by the Satoh group and downloaded from http://marinegenomics.oist.jp/kairou/viewer/info?project_id=62. Predicted peptides were derived from Trinity RNA-seq assemblies (multiple versions released 2012–2016), as described previously (Laumer et al., 2015), for the following sources/SRA accessions:
Porifera: Petrosia ficiformis: SRR504688, Cliona varians: SRR1391011, Crella elegans: SRR648558, Corticium candelabrum: SRR504694-SRR499820-SRR499817, Spongilla lacustris: SRR1168575, Clathrina coriacea: SRR3417192, Sycon coactum: SRR504689-SRR504690, Sycon ciliatum: ERR466762, Ircinia fasciculata, Chondrilla caribensis (originally misidentified as Chondrilla nucula) and Pseudospongosorites suberitoides from (https://dataverse.harvard.edu/dataverse/spotranscriptomes); Cnidaria: Abylopsis tetragona: SRR871525, Stomolophus meleagris: SRR1168418, Craspedacusta sowerbyi: SRR923472, Gorgonia ventalina: SRR935083; Ctenophora: Vallicula multiformis: SRR786489, Pleurobrachia bachei: SRR777663, Beroe abyssicola: SRR777787; Bilateria: Limnognathia maerski: SRR2131287. All other peptide predictions were derived through transcriptome assembly of paired-end, unstranded libraries with Trinity v2.4.0 (Haas et al., 2013), run with the --trimmomatic flag enabled (and all other parameters at default values), with peptides extracted from assembled transcripts using TransDecoder v4.0.1 with default settings. For these species, no ad hoc isoform selection was performed: any redundant isoforms were removed during tree pruning in the orthologue determination pipeline (see below).

Orthologue identification and alignment

Predicted proteomes were grouped into top-level orthogroups with OrthoFinder v1.0.6 (Emms and Kelly, 2015), run with 200 threads, directed to stop after orthogroup assignment, and set to write grouped, unaligned sequences as FASTA files with the ‘-os’ flag. A custom python script (‘renamer.py’) was used to rename all headers in each orthogroup FASTA file following the convention [taxon abbreviation] + ‘@’ + [sequence number as assigned in the OrthoFinder SequenceIDs.txt file], and to select only those orthogroups containing at least one representative of each of the five major metazoan clades plus outgroups; exactly 4300 of an initial 46,895 orthogroups were retained. Scripts from the Phylogenomic Dataset Construction pipeline (Yang and Smith, 2014) were used for successive data-grooming stages, as follows. Gene trees for top-level orthogroups were derived by calling the fasta_to_tree.py script as a job array, without bootstrap replicates; six very large orthogroups did not finish this step. In the same directory, the trim_tips.py, mask_tips_by_taxonID_transcripts.py, and cut_long_internal_branches.py scripts were called in succession, with ‘./ .tre 10 10’, ‘./ ./ y’, and ‘./ .mm 1 20 ./’ passed as arguments, respectively. The 4267 resulting subtrees were concatenated into a single Newick file, from which 1419 orthologues were extracted with UPhO (Ballesteros and Hormiga, 2016). Orthologues were aligned with the MAFFT v7.271 ‘E-INS-i’ algorithm, and probabilistic masking scores were assigned with ZORRO (Wu et al., 2012); all sites scoring below five were removed from each alignment, as described previously (Laumer et al., 2015). Thirty-one orthologues with retained lengths of less than 50 amino acids were discarded, leaving 1388 well-aligned orthologues.
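The renaming-and-filtering step can be sketched as follows. This is not the authors' `renamer.py`: it assumes SequenceIDs.txt lines of the form `0_17: original_header`, and the header-to-taxon lookup and clade labels are hypothetical inputs supplied by the user. An orthogroup is kept only if every required group is represented at least once.

```python
# Hypothetical group labels: the five major metazoan clades plus outgroups.
REQUIRED_GROUPS = {"Porifera", "Ctenophora", "Placozoa", "Cnidaria", "Bilateria", "Outgroup"}

def parse_sequence_ids(lines):
    """Map original FASTA header -> OrthoFinder ID from SequenceIDs.txt-style lines
    (format assumed to be '0_17: original_header')."""
    ids = {}
    for line in lines:
        if line.strip():
            seq_id, header = line.rstrip("\n").split(": ", 1)
            ids[header] = seq_id
    return ids

def rename_and_filter(records, header_taxon, seq_ids, taxon_group):
    """records: [(original_header, sequence), ...] for one orthogroup.
    header_taxon: original header -> taxon abbreviation (hypothetical lookup).
    taxon_group: taxon abbreviation -> one of REQUIRED_GROUPS.
    Returns [('taxon@id', sequence), ...], or None if any required group is absent."""
    renamed, groups = [], set()
    for header, seq in records:
        taxon = header_taxon[header]
        renamed.append((f"{taxon}@{seq_ids[header]}", seq))
        groups.add(taxon_group[taxon])
    return renamed if REQUIRED_GROUPS <= groups else None
```

The `taxon@id` form keeps the taxon recoverable from any tip label, which is what downstream tree-pruning scripts rely on when masking same-taxon tips.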

Matrix assembly

A full concatenation of all 1388 retained orthologues was performed with the ‘geneStitcher.py’ script distributed with UPhO (available at https://github.com/ballesterus/PhyloUtensils). However, such a matrix would be too large for tractable phylogenetic inference under well-fitting mixture models such as CAT + GTR; we therefore used MARE v0.1.2 (Misof et al., 2013) to extract an informative subset of genes based on tree-likeness scores, running with ‘-t 100’ to retain all taxa and ‘-d 1’ as a tuning parameter on alignment length. This yielded our 430-orthologue, 73,547-site matrix, with a mean partition length of 202.24 (s.d. 116.96) residues.
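The concatenation step alone (not MARE's tree-likeness-based gene selection, which is not reproduced here) can be sketched as below. It is a generic illustration rather than `geneStitcher.py` itself: taxa missing from a gene are padded with a missing-data character, and each gene's 1-based, inclusive coordinates in the supermatrix are recorded as a partition.

```python
def concatenate(alignments, taxa, missing="?"):
    """Concatenate per-gene alignments into a supermatrix.

    alignments: {gene: {taxon: aligned_sequence}}. Taxa absent from a gene are
    padded with `missing`; taxa not listed in `taxa` are ignored.
    Returns ({taxon: row}, [(gene, start, end), ...]) with 1-based inclusive coordinates.
    """
    rows = {t: [] for t in taxa}
    partitions, start = [], 1
    for gene, aln in alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length (not aligned?)")
        length = lengths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions
```

For example, `concatenate({"g1": {"A": "MK-", "B": "MRL"}, "g2": {"A": "WW"}}, ["A", "B"])` pads taxon B with `??` in the second gene and reports partitions `("g1", 1, 3)` and `("g2", 4, 5)`.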

As a check on the above procedure, which is agnostic to the identity of the genes assigned to orthologue groups, we also constructed a matrix from complete, single-copy sequences identified by BUSCO v3.0.1 (Simão et al., 2015), using the 303-gene eukaryota_odb9 orthologue set. BUSCO was run independently on each peptide FASTA file used as input to OrthoFinder, and a custom python script (‘extract.py’) was used to parse the full output table for each species, selecting only those entries identified as complete, single-copy representatives of each BUSCO orthologue and grouping them by orthologue into separate directories to facilitate downstream alignment, probabilistic masking, and concatenation, as described for the OrthoFinder matrix. This 303-gene BUSCO matrix had a total length of 94,444 amino acids, with 39.6% of sites representing gaps or missing data, and a mean partition length of 311.70 (s.d. 202.78).
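The selection step can be sketched as follows. This is not the authors' `extract.py`; it assumes the BUSCO v3 full-table layout (tab-separated, `#` comment lines, columns Busco id, Status, Sequence, Score, Length), in which multi-copy hits carry the status `Duplicated`, so keeping only `Complete` rows yields complete, single-copy entries.

```python
from collections import defaultdict

def complete_single_copy(table_lines):
    """Return {busco_id: sequence_id} for rows whose Status is 'Complete'.

    'Duplicated' (multi-copy), 'Fragmented' and 'Missing' rows are skipped.
    """
    hits = {}
    for line in table_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[1] == "Complete":
            hits[fields[0]] = fields[2]
    return hits

def group_by_orthologue(tables):
    """tables: {species: full-table lines}. Returns {busco_id: {species: sequence_id}},
    one bin per BUSCO orthologue, ready for sequence extraction and alignment."""
    bins = defaultdict(dict)
    for species, lines in tables.items():
        for busco_id, seq_id in complete_single_copy(lines).items():
            bins[busco_id][species] = seq_id
    return dict(bins)
```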

Within the gene bins defined by the test of compositional heterogeneity (see below), matrices were again constructed by concatenation and reduction with MARE, using ‘-t 100’ to retain all taxa and setting ‘-d 0.5’ to yield matrices of a size tractable for inference under the CAT + GTR model. This procedure gave a 349-gene matrix of 80,153 amino acids (mean partition length 228.67 ± s.d. 136.19; 41.64% gaps) within the test-failing gene set, and a 348-gene matrix of 55,426 amino acids (mean partition length 158.27 ± s.d. 79.06; 38.92% gaps) within the test-passing set (Figure 4).
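The summary statistics reported for each matrix (number of genes, total sites, mean and s.d. of partition length, percentage of gaps or missing data) can be computed with a sketch like the one below. Two conventions are assumptions, not taken from the study: `-`, `?` and `X` are all counted as missing, and the sample (not population) standard deviation is used.

```python
from statistics import mean, stdev

def matrix_summary(partitions, supermatrix, missing_chars="-?X"):
    """partitions: [(gene, start, end), ...] with 1-based inclusive coordinates.
    supermatrix: {taxon: row}. Characters in `missing_chars` count as gaps/missing."""
    lengths = [end - start + 1 for _, start, end in partitions]
    cells = sum(len(row) for row in supermatrix.values())
    missing = sum(row.count(c) for row in supermatrix.values() for c in missing_chars)
    return {
        "genes": len(lengths),
        "sites": sum(lengths),
        "mean_length": mean(lengths),
        # Sample standard deviation (assumption; the convention used is not stated).
        "sd_length": stdev(lengths) if len(lengths) > 1 else 0.0,
        "percent_missing": 100.0 * missing / cells,
    }
```

When the partitions tile the whole matrix, the number of genes multiplied by the mean partition length equals the total number of sites, so this summary also works as a quick consistency check on reported matrix dimensions.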

Phylogenetic inference

Individual ML gene trees were constructed on all 1388 orthologues in IQ-tree v1.6beta, with ‘-m MFP -b 100’ passed as parameters to perform automatic model selection and 100 standard nonparametric bootstraps on each gene tree.

For inference on the initial 430-gene matrix, we proceeded as follows. ML inference on the concatenated matrix (Figure 1—figure supplement 1) was performed with IQ-tree v1.6beta, passing ‘-m C60+LG+FO+R4 -bb 1000’ as parameters to specify a profile mixture model and retain 1000 trees for ultrafast bootstrapping; the ‘-bnni’ flag was used to apply NNI correction during ultrafast bootstrapping, an approach shown to reduce inflated support arising from model misspecification (Hoang et al., 2018). ML inference using only the H1 haplotype as a representative of Placozoa (Figure 1—figure supplement 2) was undertaken similarly, albeit under a marginally less complex profile mixture model (C20+LG+FO+R4). Bayesian inference under the CAT + GTR + Γ4 model was performed in PhyloBayes MPI v1.6j (Lartillot et al., 2013), running four separate chains on 20 cores each for 2885–3222 generations, with the ‘-dc’ flag applied to remove constant sites from the analysis and a starting tree derived from FastTree2 (Price et al., 2010). The two chains used to generate the posterior consensus tree summarized in Figure 1 converged on exactly the same tree in all MCMC samples after the first 2000 generations were removed as burn-in. Analysis of Dayhoff-6-state recoded matrices under CAT + GTR + Γ4 was performed with the serial PhyloBayes program v4.1c, with ‘-dc -recode dayhoff6’ passed as flags. Six chains on the 430-gene matrix were run for 1441 to 1995 generations; two chains showed a maximum bipartition discrepancy (maxdiff) of 0.042 after the first 1000 generations were removed as burn-in (Figure 2). QuartetScores (Zhou, 2017) was used to measure internode certainty metrics, including the reported EQP-IC, using the 430 gene trees from the orthologues used to derive the matrix as evaluation trees and the amino-acid CAT + GTR + Γ4 tree as the reference tree to be annotated (Figure 1).

For inference on the 303-gene BUSCO matrix, we ran four chains under the CAT + GTR + Γ4 mixture model with PhyloBayes MPI v1.7a, again applying the -dc flag to remove constant sites but without specifying a starting tree; chains were run for 1873 to 2361 generations. Unfortunately, no pair of chains reached strict convergence on the amino-acid version of this matrix (all pairs showed maxdiff = 1 at every burn-in proportion examined), perhaps indicating poor mixing among the four chains. However, all chains showed full posterior support for identical relationships among the five major animal groups, with differences among chains attributable to minor differences in internal relationships within Choanoflagellata and Bilateria. Accordingly, the posterior consensus tree in Figure 3A is summarized from all four chains, with a burn-in of 1000 generations and sampling every 10 generations. For the Dayhoff-recoded version of this matrix, we again ran six separate chains under CAT + GTR + Γ4 with the -dc flag, for 5433–6010 generations; two chains were judged to have converged, giving a maxdiff of 0.141157 in the posterior consensus summary with a burn-in of 2500 generations, sampling every 10 generations (Figure 3B).

For inference on the 348- and 349-gene matrices produced within gene bins defined by the null-simulation test of compositional bias (see below), we ran six chains each for the amino-acid and recoded versions of each matrix, under CAT + GTR + Γ4 with constant sites removed. For the amino-acid matrices, chains ran for 2709 to 3457 and 1423 to 1475 generations for the test-failing and test-passing matrices, respectively. For the recoded matrices, chains ran for 3893 to 4480 and 4350 to 4812 generations for the test-failing and test-passing matrices, respectively. In selecting chains for the posterior consensus summary trees (Figure 4A–D), we chose the pairs of chains and burn-ins that yielded the lowest possible maxdiff values (all <0.1 with the first 500 generations discarded as burn-in, except for the amino-acid test-failing matrix, whose most similar pair of chains gave a maxdiff of 0.202 with 1000 generations discarded as burn-in). We emphasize that the topologies and supports displayed in Figure 4A–D are similar when all chains (and conservative burn-in values) are used to generate consensus trees. For ML trees under profile mixture models for the test-failing (Figure 4—figure supplement 1) and test-passing (Figure 4—figure supplement 2) gene matrices, we used IQ-tree 1.6rc, called in the same manner (with C60+LG+FO+R4) as for our 430-gene matrix (see above).
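The maxdiff criterion used above to judge convergence and to pick the most similar pair of chains can be illustrated with the simplified sketch below; it is not PhyloBayes' `bpcomp` code. Assumptions: each chain is a list of saved samples (one per generation), and each sample is a set of hashable bipartition labels (for example, frozensets of the taxa on one side of a split, canonicalized so the same split always gets the same label). For two chains, maxdiff is the largest absolute difference in posterior bipartition frequency.

```python
from itertools import combinations

def split_frequencies(samples, burnin, every=1):
    """Posterior frequency of each bipartition after discarding `burnin` samples
    and keeping every `every`-th sample thereafter."""
    kept = samples[burnin::every]
    if not kept:
        raise ValueError("burn-in removes all samples")
    counts = {}
    for splits in kept:
        for split in splits:
            counts[split] = counts.get(split, 0) + 1
    return {split: n / len(kept) for split, n in counts.items()}

def maxdiff(chain_a, chain_b, burnin, every=1):
    """Largest absolute difference in bipartition frequency between two chains."""
    fa = split_frequencies(chain_a, burnin, every)
    fb = split_frequencies(chain_b, burnin, every)
    return max((abs(fa.get(s, 0.0) - fb.get(s, 0.0)) for s in fa.keys() | fb.keys()),
               default=0.0)

def most_similar_pair(chains, burnin, every=1):
    """Indices of the pair of chains with the lowest maxdiff."""
    return min(combinations(range(len(chains)), 2),
               key=lambda pair: maxdiff(chains[pair[0]], chains[pair[1]], burnin, every))
```

A maxdiff of 1 means at least one bipartition is present in every retained sample of one chain and absent from every retained sample of the other, which is why it signals a complete failure to converge.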

Tests of compositional heterogeneity

For posterior predictive tests of compositional heterogeneity and residue diversity using MCMC samples under CAT + GTR (Table 2), we used PhyloBayes MPI v1.8 to test two chains from the initial 430-gene matrix, three chains from the 303-gene BUSCO matrix, and six chains each from the 349-gene (test-failing) and 348-gene (test-passing) matrices, removing 2000 generations from the first matrix and 1000 from the others as burn-in. Results from tests on representative chains were selected for plotting in Figure 3C and for summary in Table 2; results from all chains tested are deposited in the Dryad accession.
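The logic of a posterior predictive test can be sketched as follows: a statistic is computed on the real alignment and on alignments simulated from posterior samples, and the observed value is then expressed as a z-score, (observed − mean of simulated) / s.d. of simulated, together with an upper-tail p-value. The two statistics below, mean number of distinct residues per site and the largest per-taxon L1 distance from pooled composition, are illustrative choices only; PhyloBayes computes its own statistics and performs the simulations internally, and neither is reproduced exactly here.

```python
from statistics import mean, stdev

def site_diversity(alignment, missing="-?X"):
    """Mean number of distinct residues per alignment column (gaps/missing ignored)."""
    columns = list(zip(*alignment.values()))
    return sum(len({c for c in col if c not in missing}) for col in columns) / len(columns)

def max_compositional_deviation(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Largest per-taxon L1 distance between taxon and pooled residue frequencies."""
    def freqs(seq):
        counts = [seq.count(a) for a in alphabet]
        total = sum(counts)
        return [c / total for c in counts] if total else [0.0] * len(alphabet)
    pooled = freqs("".join(alignment.values()))
    return max(sum(abs(f - g) for f, g in zip(freqs(s), pooled))
               for s in alignment.values())

def posterior_predictive(observed, simulated):
    """z-score and upper-tail p-value of an observed statistic against replicates."""
    sd = stdev(simulated)
    if sd == 0:
        raise ValueError("simulated statistics have zero variance")
    z = (observed - mean(simulated)) / sd
    p = sum(s >= observed for s in simulated) / len(simulated)
    return z, p
```

For instance, `posterior_predictive(10, [1, 2, 3])` gives z = 8.0 and p = 0.0: the observed value lies far outside what the fitted model reproduces.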

For the per-gene null simulation tests of compositional bias (Foster, 2004), we used the p4 package (https://github.com/pgfoster/p4-phylogenetics), inputting the ML trees inferred by IQ-tree for each of the 1388 alignments and assuming an LG+Γ4 substitution model with a single empirical frequency vector for each gene; this test was implemented with a simple wrapper script (‘p4_compo_test_multiproc.py’) using the python multiprocessing module. We opted not to model-test each gene individually in p4, both because the range of models implemented in p4 is more limited than that tested in IQ-tree, and because, as a practical matter, LG (usually with a variant of the FreeRate model of rate heterogeneity) was chosen as the best-fitting model in the IQ-tree model tests for a large majority of genes, suggesting that LG+Γ4 would be a reasonable approximation for the purposes of this test. We selected an α-threshold of 0.10 for dividing genes into test-passing and test-failing bins as a conservative measure; however, we emphasize that even at a less conservative α = 0.05, 47% of genes would still be detected as falling outside the null expectation.
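The core of this null-simulation test can be sketched as follows. The compositional statistic is the X² statistic of the taxon × amino-acid contingency table; its p-value comes not from the χ² distribution but from the fraction of datasets, simulated on the gene's tree under a compositionally homogeneous model, whose X² is at least as large as the observed one. In the study the simulation is done by p4 under LG+Γ4 and is not reproduced here: the simulated values are simply passed in. Treating p = α as passing is an assumption of this sketch.

```python
def composition_chi_square(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """X^2 statistic of the taxon x amino-acid count table (gaps ignored)."""
    rows = [[seq.count(a) for a in alphabet] for seq in alignment.values()]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(col_totals)
    x2 = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            if expected > 0:
                x2 += (observed - expected) ** 2 / expected
    return x2

def null_p_value(observed_x2, simulated_x2):
    """Fraction of homogeneous-model simulations at least as extreme as observed."""
    return sum(x >= observed_x2 for x in simulated_x2) / len(simulated_x2)

def bin_genes(p_values, alpha=0.10):
    """Split genes into (test-passing, test-failing) lists at threshold alpha."""
    passing = [g for g, p in p_values.items() if p >= alpha]
    failing = [g for g, p in p_values.items() if p < alpha]
    return passing, failing
```

Because each gene is tested independently, the per-gene calls parallelize trivially, for example with a `multiprocessing.Pool` mapping over alignment files.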

Source data availability

SRA accession codes, where used, and all alternative sources of sequence data (e.g. individually hosted websites, personal communications) are listed above in the Materials and methods section. A Dryad accession is available at https://doi.org/10.5061/dryad.6cm1166, containing all helper scripts, orthogroups, multiple sequence alignments, phylogenetic program output, and raw host proteomes input to OrthoFinder. Metagenomic bins containing placozoan host contigs, and gene annotations from the H4, H6 and H11 isolates, are also provided in this accession. PhyloBayes chain files, owing to their large size, are deposited separately in Zenodo at https://doi.org/10.5281/zenodo.1197272.

References

  6. Bütschli O (1884) Bemerkungen zur Gastraea-Theorie. Morphologisches Jahrbuch 9:415–427.
  8. Cavalier-Smith T (2017) Origin of animal multicellularity: precursors, causes, consequences – the choanoflagellate/sponge transition, neurogenesis and the Cambrian explosion. Philosophical Transactions of the Royal Society B: Biological Sciences 372:20150476.
  10. Celis JS, Wibberg D, Ramírez-Portilla C, Rupp O, Sczyrba A, Winkler A, Kalinowski J, Wilke T (2018) Binning enables efficient host genome reconstruction in cnidarian holobionts. GigaScience 7:giy075. https://doi.org/10.1093/gigascience/giy075. PMID: 29917104.
  14. Dohrmann M, Wörheide G (2017) Dating early animal evolution using phylogenomic data. Scientific Reports 7. https://doi.org/10.1038/s41598-017-03791-w. PMID: 28620233.
  33. Grell KG, Benwitz B (1971) Die Ultrastruktur von Trichoplax adhaerens F. E. Schulze. Cytobiologie 4:216–240.
  35. Grell KG, Ruthmann A (1991) Placozoa. In: Microscopic Anatomy of Invertebrates. New York: Wiley-Liss.
  38. Hejnol A, Martín-Durán JM (2015) Getting to the bottom of anal evolution. Zoologischer Anzeiger - A Journal of Comparative Zoology 256:61–74. https://doi.org/10.1016/j.jcz.2015.02.006
  46. Kelava I, Rentzsch F, Technau U (2015) Evolution of eumetazoan nervous systems: insights from cnidarians. Philosophical Transactions of the Royal Society B: Biological Sciences 370:20150065. https://doi.org/10.1098/rstb.2015.0065
  55. Leys SP, Riesgo A (2012) Epithelia, an evolutionary novelty of metazoans. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 318:438–447. https://doi.org/10.1002/jez.b.21442
  61. Moroz LL, Kohn AB (2016) Independent origins of neurons and synapses: insights from ctenophores. Philosophical Transactions of the Royal Society B: Biological Sciences 371:20150041. https://doi.org/10.1098/rstb.2015.0041
  76. Schulze FE (1883) Trichoplax adhaerens, nov. gen., nov. spec. Zoologischer Anzeiger 6:92–97.
  77. Sebé-Pedrós A, et al. (2018a) Early metazoan cell type diversity and the evolution of multicellular gene regulation. Nature Ecology and Evolution 2:1176–1188.
    Trichoplax adhaerens: discovered as a missing link, forgotten as a Hydrozoan, re-discovered as a key to metazoan evolution
    1. T Syed
    2. B Schierwater
    (2002)
    Vie Et Milieu 52:177–187.

Decision letter

  1. Antonis Rokas
    Reviewing Editor; Vanderbilt University, United States
  2. Diethard Tautz
    Senior Editor; Max-Planck Institute for Evolutionary Biology, Germany

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Placozoa and Cnidaria are sister taxa" for consideration by eLife. Your article has been reviewed by Diethard Tautz as the Senior Editor, a Reviewing Editor and three reviewers. The following individuals involved in review of your submission have agreed to reveal their identity: Davide Pisani (Reviewer #2); David C Plachetzki (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The manuscript by Laumer and colleagues is the latest chapter in the quest to understand (and question) relationships at the base of the animal tree. For quite some time, the controversy has focused on whether poriferans or ctenophores are the sister lineage to the rest of the metazoan phyla, whereas the rest of the relationships appear to have settled somewhat stably on a grouping of bilaterians + cnidarians, with placozoans as the sister to that lineage. This study questions the validity of this latter set of accepted relationships and also sheds additional light on the controversy about sponges-sister vs. ctenophores-sister. The central claim made by Laumer et al. is that among-lineage compositional heterogeneity has plagued most previous analyses of similar phylogenetic depth and that the finding of Bilateria + Cnidaria to the exclusion of Placozoa is the result of such error. Such a conclusion would force a reevaluation of the significance of placozoans in our understanding of early metazoan evolution and would additionally place our understanding of placozoan biology in a new context.

Essential revisions:

1) New placozoan genomes were sequenced and assembled in this paper, but there is very little description of the properties of these new assemblies. Several questions related to the contiguity of the assemblies produced, how the new assemblies compare with the existing assembly and gene models for Trichoplax adhaerens, the degree of overlap among the new genome assemblies reported here, BUSCO scores, etc., of the protein models produced from these analyses, are not addressed. Similarly, the phylogenomic data matrix lacks descriptions of key attributes such as global taxon occupancy and a distribution of partition lengths. Finally, one challenge when sequencing and assembling genomes and, to a lesser extent transcriptomes, from marine meiofauna is contamination. It was not clear how contamination was dealt with in the preparation of genome assemblies, or in the phylogenomic dataset construction procedure. The central claim of the paper is that compositional heterogeneity drives a certain analytical outcome in this and in previous analyses, but given the limited description of the data that underlie this result, one could imagine other artifacts, unrelated to compositional heterogeneity, that could also be at play.

2) The paper leverages new genome-scale datasets for Placozoa, but it is not clear how the existing genome for Trichoplax adhaerens was utilized. Our reading is that a sample of Haplotype H1, the same haplotype that has been previously sequenced, was assembled here and data from this new H1 assembly were utilized, together with separate assemblies of the other haplotypes. If this is correct, it is not clear why the existing high-quality protein models for Trichoplax were not utilized in the production of the phylogenetic matrix. This simple inclusion could help allay concerns over the nature of the new data.

3) Laumer et al. propose that the findings of the majority of previous analyses were influenced by compositional heterogeneity, as uncovered in their analyses. We note that the effects of compositional heterogeneity in the context of deep metazoan phylogenomics were explored at least once previously (Borowiec et al., 2015) using Dayhoff-6 recoding; however, the Borowiec et al. study did not find the result reported here using the existing Trichoplax data. Moreover, the effect of compositional heterogeneity in previous studies was not directly tested by Laumer et al. Taxon sampling and extensive differences among the datasets used in previous analyses are additional variables that need to be accounted for. At least a subset of previously published phylogenomic datasets that bear on the position of Placozoa should be reanalyzed under the procedure used here, and cases where this has already been done should be addressed.

4) Results and Discussion section. The reported PPA scores are very poor, indicating that both datasets are extremely heterogeneous and that, at the AA level, both datasets are unreliable. The PPA do suggest that BUSCO is worse, but that does not make your new AA dataset more reliable, as Z-scores greater than one hundred indicate an utter failure of the model to describe the data (see Feuda et al., 2017 for a discussion of the interpretation of Z-scores). This should be pointed out in your discussion. Note, however, that the fact that a dataset is highly heterogeneous does not mean that the tree it supports is incorrect; that depends on the specifics of the dataset (taxon-specific distribution of heterogeneity, presence of other forms of noise, etc.). High heterogeneity means that we should be cautious when interpreting the results, as the tree might include tree-reconstruction artefacts. So, our reading of these two analyses is that both datasets are heterogeneous and that tests (like analyses under Dayhoff-6) need to be carried out to validate the clades in these trees. You did these tests, and they clearly indicate that Placo-Cni and Pori-sister are unlikely to be compositional artefacts, while Cteno-sister is likely a compositional artefact.

5) Results and Discussion section: You need to provide more information about heterogeneity. Are you talking of site-specific or lineage-specific heterogeneity? These are two different components of a dataset’s heterogeneity that are modeled differently, and both need to be considered. Site-specific heterogeneity is generally modeled relatively efficiently using CAT-GTR, while lineage-specific heterogeneity cannot be modeled using the standard models you used in this paper, as you would need BreakpointCAT, NDCH or similar to model it. D6 recoding reduces both forms of heterogeneity and, in combination with CAT-GTR, generally allows for an adequate modeling of site-specific heterogeneity and an improvement in the modeling of lineage-specific heterogeneity (the latter not necessarily reaching adequacy; see Feuda et al., 2017). Hence it is key that you report statistics for all your datasets (Z-scores for both site- and lineage-specific heterogeneity for all your amino-acid and Dayhoff datasets). A table with the diversity and compositional PPA (these can be derived using PhyloBayes) will do and should be easy to produce, as you should already have all the chains (since you already ran other PPA). A table presenting all Z-scores for all your datasets would be key to interpreting your results (Figure 1 to Figure 4D included) and deciding which tree is less likely to be incorrect/more likely to be accurate.

6) Results and Discussion section: This section reads like an ad hoc attempt to justify Ctenophora-sister, a result that is tangential to your paper (which is about Placozoa) and that is not supported by your analyses. Specifically:

6A) Figure 3 presents the BUSCO trees under CAT-GTR+G: (3A) is AA and (3B) is Dayhoff. Figure 3A (your worst overall dataset in terms of heterogeneity) clearly supports Ctenophora-sister, while Figure 3B shows an unorthodox PORIFERA+CTENOPHORA (albeit with relatively low support, ~75%). Neither of these two figures supports Porifera-sister. According to your arguments, BUSCO is the worst dataset. When considered as an AA alignment, the BUSCO dataset maximizes its compositional heterogeneity and supports Cteno-sister. In the text you state that Figure 4A (the dataset of the heterogeneous genes in your dataset, which we will discuss further below) provides evidence that Pori-sister might be linked to heterogeneity. However, even if that were actually the case, then Figure 3A must be evidence that Cteno-sister is a compositional artefact (equally linked to heterogeneity). Yet this important fact is not even mentioned in the text. This is surprising and unfair given that you use the heterogeneity of the dataset used to derive Figure 4A to make an argument against Porifera-sister, and the heterogeneity of Figure 3A to suggest Placozoa + Bilateria is likely to be an artefact, but you do not discuss the fact that Cteno-sister also emerges from the "bad" BUSCO dataset and should thus be considered dubious. The text needs to be modified to reflect this.

6B) The experiment in Figure 4 is nicely designed, but some aspects of its implementation and the interpretation of its results are problematic. Specifically, we think that the analyses reported in Figure 4A and 4C are misleading for two reasons. First, the data matrix cannot be shown to be composed only of heterogeneous genes, because 111 of the 764 genes in the dataset used to derive this figure (~14% of the superalignment) are not, strictly speaking (i.e., at the 0.05 cutoff level), heterogeneous. Second, even if this dataset were composed purely of heterogeneous genes, it could not be used to claim, for example, that Porifera-sister is linked to heterogeneity. This is because, on this dataset, Dayhoff recoding did not change the position of the sponges. This indicates that the signal for this node in this dataset (irrespective of its heterogeneity) is not driven by the within-D6-class changes that are known to be associated with compositional heterogeneity. Irrespective of the heterogeneity of the dataset, the fact that Porifera-sister is unchanged and does not lose support upon recoding indicates that it is supported by between-D6-class changes, which are not silenced by recoding and are more likely to be associated with phylogenetic signal. To conclude, you do not have evidence in Figure 4A and 4C to claim that the assertion of Feuda (that Cteno-sister is likely to be associated with compositional heterogeneity while Porifera-sister is not likely to be a compositional artifact) might be incorrect. The Dayhoff test rejects your conclusion. So, what to do with the analysis of Figure 4A and 4C? Our thought is that this dataset confounds your experiments because it is composed of a mixture of more and less heterogeneous genes, and it is best excluded. We suggest that you just report the results in Figure 4B and 4D, which are actually very clear and say all that there is to say on Placo-Cni and Pori-sister vs. Cteno-sister.

6C) The fact that the genes in your "homogeneous dataset" are individually homogeneous does not mean that your homogeneous superalignment is itself homogeneous. This needs to be tested (using PPA under CAT-GTR for both the Dayhoff-6 and AA datasets), because heterogeneity in a multigene dataset adds up, and superalignments composed of individually homogeneous genes customarily fail heterogeneity tests. My expectation is that the dataset in Figure 4B will be shown to be still heterogeneous and that of Figure 4D will be shown to be much more homogeneous (as is always the case with D6-recoded datasets). Accordingly, the analyses of Figure 4BD indicate that there is evidence for Cteno-sister being a compositional artefact (driven by within-D6-class changes only present in Figure 4B), while there is no evidence for Cnidaria+Placozoa and Pori-sister being compositional artifacts (as they are both present in Figure 4D, where only between-class changes are evident). In addition, this analysis confirms the existence of two signals in the data with reference to ctenophores, one linked to unreliable within-D6-class changes and one linked to the more reliable between-class changes, while Placo-Cnidaria seems to be the only signal in the dataset and is more strongly represented in the more reliable between-class changes.

7) Results and Discussion section: Here you seem to assume that Dayhoff recoding always has to change the topology and that, if the topology does not change, the recoding has in some way failed. The application of D6 was "not significant" (in your words) for the dataset of Figure 4AC. The problem with this statement is that a tree does not need to be incorrect simply because the data are heterogeneous; hence D6 need not invariably cause topological changes. So its application was not "not significant"; it simply did not lead to a topological change. As said above, D6 reduces the influence of compositional heterogeneity by silencing within-category changes. The analysis in Figure 4AC indicates that Porifera-sister is driven by more reliable among-category changes, which are generally not compositionally driven; otherwise it would have disappeared in the Dayhoff analysis. Note that we are not saying that Pori-sister is correct; rather, we are saying that Figure 4AC does not provide evidence that this clade is compositionally driven, hence you cannot conclude that Porifera-sister might be associated with compositional attraction. Note also that, with reference to the Ctenophora debate, the application of recoding has invariably produced strongly directional changes from Ctenophora to Porifera. This is a highly non-random directional change, and at this time there is no known dataset that, while supporting Porifera-sister at the AA level (under CAT-GTR), was found to support Cteno-sister after recoding. This is true of all the datasets of Whelan when Choanoflagellata are the only outgroups used (Pori-sister as AA and as Dayhoff), and of your dataset in Figure 4AC, for example. This can only be interpreted as suggesting that in all available datasets Porifera-sister is never driven by within-category changes; accordingly, when it emerges at the root, it cannot be because of a compositional attraction, irrespective of the heterogeneity of the dataset itself.
By contrast, D6 experiments invariably indicate that when ctenophores emerge at the base, compositional heterogeneity can never be ruled out (all datasets tested to date, including yours, behave in exactly this way), strongly suggesting Cteno-sister to be a compositional artifact.

8) Title: Given how controversial the relationships at the base of the animal tree have been, perhaps a more moderate title would be more fitting (e.g., Phylogenomic support for a sister relationship of Placozoa and Cnidaria).

9) The scripts used for data processing in genome assembly, phylogenomics matrix construction and analysis are not linked in the paper. Sharing these scripts on a publicly available repository like Github would enhance the reproducibility of the present study.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for submitting your article "Support for a clade of Placozoa and Cnidaria in genes with minimal compositional bias" for consideration by eLife and for your patience during the second round of review. Your article has been re-reviewed and the evaluation has been overseen by a Reviewing Editor and Diethard Tautz as the Senior Editor. The following individual involved in review of your submission has agreed to reveal his identity: David C Plachetzki (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The manuscript was markedly improved and the authors' efforts are greatly appreciated. However, the concern remains that something more than compositional bias lies behind Cnidaria+Placozoa and that this finding is unstable at best. Three critical pieces of information emerge from the revision and the authors’ response: (1) Compositionally unbiased partitions that support Cnidaria+Placozoa are systematically shorter than partitions that do not support this clade. (2) A reanalysis of the Simion et al. dataset, which is much larger, did not support Cnidaria+Placozoa under a phylogenetic approach similar to that used by Laumer et al. (3) In unpublished data, Cnidaria+Placozoa is only recovered in analyses that lack specific outgroups. Together, these findings suggest that something other than the removal of compositional bias, particular to the Laumer et al. dataset, is driving Cnidaria+Placozoa.

Essential revisions:

1) The authors allay some concerns over the description of the genome data, and the point that the new genome assemblies produce BUSCO orthologue occupancies similar to those of other transcriptome assemblies is appreciated. One does wonder why short-read genome assemblies, which are highly fragmented, were used in the first place instead of transcriptomes. If the purpose of doing so was to enhance placozoan genome resources, these resources are not described here in any detail because they are too fragmented: "we opted not [sic] to omit a detailed characterisation of these genomes, partially out of spatial constraints (this article was submitted as a Short Report format), and partially since the genomes themselves are Illumina short-insert-only assemblies and therefore relatively fragmented". Please provide this information.

2) The manuscript now includes the global gap percentage and the means and standard deviations of partition lengths. This information is important for evaluating the possible causes of the phylogenetic discrepancies (e.g. Cnidaria+Placozoa vs. Cnidaria+Bilateria) observed among datasets. We now see that the datasets that support Cnidaria+Placozoa are systematically composed of shorter partitions. Thus, compositional bias is not the only difference between the two datasets, yet no further explanation is given. Please report these data and explicitly discuss the finding that compositional bias is not the only difference between the two datasets.

3) The paragraph describing metaBAT and metawatt does not explicitly mention contamination, which would help the reader interpret the purpose of these steps. Please rectify this.

4) We agree that Dayhoff-6 recoding is not a silver bullet for compositional bias, and that previous analyses that did not filter data for compositionally unbiased partitions are not direct comparisons. However, the authors' reanalysis of the Simion dataset, in which unbiased partitions were selected and analyzed in both amino-acid and Dayhoff-6-recoded space, does represent a more or less direct comparison, and its results strongly contradict the topology favored by Laumer et al. Differences in taxon sampling and data handling between the Simion reanalysis and that presented by Laumer et al. are to be expected, but this is true of the vast majority of previous phylogenomic analyses at this scale, which nonetheless confidently resolve other nodes in the animal tree of life. The explanation offered in the authors' response, "this published dataset does not contain any signal for a Cnidaria-Placozoa relationship", seems quite correct, but a phylogenomic dataset as large as Simion's (which is much larger than that presented by Laumer et al.) should contain such signal if it were robust. Therefore, this reanalysis of the Simion dataset, which is not included in the revision, calls the central phylogenetic finding of Laumer et al. into question and strongly suggests that some other feature of the Laumer et al. dataset is contributing to Cnidaria+Placozoa. As stated above, the unbiased partitions are significantly shorter than the biased partitions. Could this observation be related, perhaps indirectly, to the central finding?

5) Again, while Ctenophora-vs.-Porifera is tangential to the central claims of the paper, I did find it relevant that, in the authors’ response (unpublished data), Cnidaria+Placozoa was recovered in "one" analysis, but only when certain outgroups were removed and the data were D6-recoded. This finding, like the Simion reanalysis, speaks strongly to the lability of Cnidaria+Placozoa. Please comment on this issue.

https://doi.org/10.7554/eLife.36278.021

Author response

Essential revisions:

1) New placozoan genomes were sequenced and assembled in this paper, but there is very little description of the properties of these new assemblies. Several questions related to the contiguity of the assemblies produced, how the new assemblies compare with the existing assembly and gene models for Trichoplax adhaerens, the degree of overlap among the new genome assemblies reported here, BUSCO scores, etc., of the protein models produced from these analyses, are not addressed. Similarly, the phylogenomic data matrix lacks descriptions of key attributes such as global taxon occupancy and a distribution of partition lengths. Finally, one challenge when sequencing and assembling genomes and, to a lesser extent transcriptomes, from marine meiofauna is contamination. It was not clear how contamination was dealt with in the preparation of genome assemblies, or in the phylogenomic dataset construction procedure. The central claim of the paper is that compositional heterogeneity drives a certain analytical outcome in this and in previous analyses, but given the limited description of the data that underlie this result, one could imagine other artifacts, unrelated to compositional heterogeneity, that could also be at play.

Although this article does indeed introduce new draft genomic assemblies from a hitherto-unsequenced lineage of Placozoa, we opted to omit a detailed characterisation of these genomes, partly because of space constraints (this article was submitted in the Short Report format) and partly because the genomes themselves are Illumina short-insert-only assemblies and therefore relatively fragmented. However, because of the generally small apparent intron size in Placozoa, even such fragmentary assemblies show an acceptable degree of completeness (in terms of gene content), similar or superior to many contemporary transcriptome assemblies, judging by BUSCO orthologue occupancy. The reviewers make a good point that some basic summary statistics on these new draft assemblies would improve the paper; we therefore now include Table 1 to this end.

We somewhat contest the comment that the matrix contents are poorly described; see e.g. the first paragraph of the Results and Discussion section (and the bar plot in Figure 1 depicting matrix occupancy for each species). Nonetheless, following this critique, we now include in the Materials and Methods a few more general summary statistics for each matrix described in the paper (global gap percentage, and means and standard deviations of partition lengths). One curious trend, also apparent from total supermatrix length, is that the supermatrices in which we have found support for Cnidaria+Placozoa seem to be systematically composed of shorter partitions, possibly suggesting that longer orthologues are more susceptible to compositional variation over large evolutionary timescales.
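As an illustration of the per-matrix summary statistics mentioned above (global gap percentage, and mean and standard deviation of partition lengths), the following Python sketch computes them for a toy supermatrix. It is not the authors' pipeline: the function name `matrix_summary`, the treatment of `-` and `?` as gap symbols, and the convention that a taxon absent from a partition counts as a fully gapped row are assumptions made for illustration.

```python
from statistics import mean, stdev

GAP_CHARS = set("-?")  # assumed gap / missing-data symbols

def matrix_summary(partitions):
    """Summarise a supermatrix given as a list of partitions (one per gene).

    Each partition is a dict mapping taxon name -> aligned sequence; all
    sequences within a partition are assumed to have the same length.
    Returns (global gap percentage, mean partition length, SD of partition lengths).
    """
    taxa = set()
    for part in partitions:
        taxa.update(part)

    lengths = []
    gap_cells = 0
    total_cells = 0
    for part in partitions:
        length = len(next(iter(part.values())))
        lengths.append(length)
        for taxon in taxa:
            # Assumption: a taxon missing from this gene contributes a fully gapped row.
            seq = part.get(taxon, "-" * length)
            gap_cells += sum(ch in GAP_CHARS for ch in seq)
            total_cells += length

    gap_pct = 100.0 * gap_cells / total_cells
    # stdev() is the sample standard deviation and needs at least two partitions.
    return gap_pct, mean(lengths), stdev(lengths)


genes = [
    {"taxonA": "MKV-", "taxonB": "MKVL"},
    {"taxonA": "MKVLAS", "taxonC": "MK--AS"},
]
gap_pct, mean_len, sd_len = matrix_summary(genes)
print(f"gaps: {gap_pct:.1f}%  mean length: {mean_len}  SD: {sd_len:.2f}")
```

Reporting the mean and SD alongside total length makes the trend described above visible directly: two matrices with similar gene counts can still differ substantially in how long their constituent partitions are.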

We were surprised by the comment that “It was not clear how contamination was dealt with in the preparation of genome assemblies”, and by the apparent insinuation that our results may be explained by contamination. In the last paragraph of the first Materials and Methods section, on sampling, we clearly describe a metagenome binning procedure (metaBAT) used to separate putative host bins from contaminating symbiotic bacterial genomes. The low BUSCO duplication rate in each assembly also provides no evidence that contaminating eukaryotic contigs are present in our draft metagenome assembly bins. We cannot comment specifically on the extent to which contamination may have been present in other libraries used in our orthology analysis, although we do observe that if contamination were indeed the explanation for our major result, one would generally expect Placozoa to form the sister group of one specific contaminating cnidarian lineage, rather than the sister-group relationship between two mutually monophyletic clades that we have recovered.

2) The paper leverages new genome-scale datasets for Placozoa, but it is not clear how the existing genome for Trichoplax adhaerens was utilized. Our reading is that a sample of Haplotype H1, the same haplotype that has been previously sequenced, was assembled here, and that data from this new H1 assembly were used together with separate assemblies of the other haplotypes. If this is correct, it is not clear why the existing high-quality protein models for Trichoplax were not used in constructing the phylogenetic matrix. This simple inclusion could help allay concerns over the nature of the new data.

As explained in the caption for Figure 1 in the reviewed submission, the terminal taxon we label haplotype H1 is the reference assembly from Srivastava et al., 2008.

3) Laumer et al. propose that the findings of the majority of previous analyses were influenced by compositional heterogeneity, as uncovered in their analyses. We note that the effects of compositional heterogeneity in the context of deep metazoan phylogenomics have been explored at least once previously (in Borowiec et al., 2015) using Dayhoff-6 recoding; however, the Borowiec et al. study did not find the result reported here using the existing Trichoplax data. Moreover, the effects of compositional heterogeneity in previous studies were not directly tested by Laumer et al. Taxon sampling and extensive differences in the datasets used in previous analyses are additional variables that need to be accounted for. At least a subset of previously published phylogenomic datasets that bear on the position of Placozoa should be reanalyzed under the procedure used here, and cases where this has already been done should be addressed.

Our Figure 3 and Figure 4A/4C show that Dayhoff-6 recoding is not sufficient to rescue Placozoa+Cnidaria in at least some compositionally biased datasets. Dayhoff-6 recoding is able to mask only a subset of potential compositionally driven changes; for a more detailed discussion of this, see the recent commentary paper by Laumer in Integrative & Comparative Biology. Therefore, if the matrix constructed for the Borowiec et al., 2015 study was compromised by compositional bias in a similar manner as the matrices we analysed in Figure 3 and Figure 4A/4C, it's not surprising that the authors of this paper might not have recovered Cnidaria+Placozoa either, even with Dayhoff-6 recoding.

It is not clear to us what is meant by the comment “Moreover, the effects of compositional heterogeneity in previous studies were not directly tested by Laumer et al.,” – a more specific rephrasing of this point would be required for us to respond meaningfully.

The reviewer’s request to reanalyse existing datasets with sufficient taxon sampling to detect Cnidaria+Placozoa is interesting and intuitive: it is somewhat of a mystery that this signal has not been reported previously in an area of such intensive scrutiny (deep metazoan phylogenetics), if it is indeed legitimate. The point that there may be “taxon sampling and extensive differences in the datasets used in previous analyses” is very important in this context. We could locate no published datasets that satisfied the following criteria, which would be needed for strict comparability with the present dataset:

1) Had adequate taxon sampling (10+ species) of all non-placozoan, pre-Bilaterian taxa and a good sampling of non-Metazoan outgroups (particularly Choanoflagellata).

2) Included Placozoa in orthology analysis.

3) Considered large numbers of orthologs (e.g. >2000 alignments), such as would result from global orthology analysis (typically, MCL-based approaches).

4) Made full, un-groomed (e.g. through alignment trimming) orthology groups publicly available.

One dataset that came close to satisfying these criteria was that of Simion et al., (2017). Unfortunately, they do not make their entire dataset available online as untrimmed orthologue groups (there would be 4,002 in this case), but they have made available a set of 1,719 trimmed alignments with adequate taxon sampling to search for a signal of Placozoa+Cnidaria. On these alignments, we employed the p4 null simulation test exactly as described in the Materials and methods section for our own 1,388 genes. This yielded 834 test-passing and 883 test-failing alignments (two genes caused errors in the p4 simulation test, likely because of gap-rich sequences remaining after alignment). Using MARE to extract information-rich submatrices, we obtained a 561-gene, 91,116-residue alignment from the test-passing set (again indicating a shorter average length in compositionally less biased genes) and a 382-gene, 119,650-residue alignment from the test-failing set. For both matrices, we started six chains under the CAT+GTR model in PhyloBayes MPI, in both amino-acid and Dayhoff-6-recoded space.
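The bookkeeping in the paragraph above can be checked with a few lines of arithmetic, using only the numbers quoted there. Dividing total alignment length by gene count gives roughly 162 aligned positions per gene in the test-passing submatrix versus roughly 313 in the test-failing one, which is the sense in which the compositionally less biased genes are shorter on average. The helper name `mean_gene_length` is ours, for illustration only.

```python
def mean_gene_length(n_genes, n_residues):
    """Average number of aligned positions per gene in a concatenated matrix."""
    return n_residues / n_genes


# Counts quoted for the Simion et al. (2017) alignments
passing, failing, errored = 834, 883, 2
assert passing + failing + errored == 1719  # every trimmed alignment is accounted for

# MARE-selected submatrices: (number of genes, total alignment length)
submatrices = {"test-passing": (561, 91_116), "test-failing": (382, 119_650)}
for name, (n_genes, n_residues) in submatrices.items():
    print(f"{name}: {mean_gene_length(n_genes, n_residues):.1f} positions per gene")
```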

Unfortunately, we find that achieving convergence has been challenging in both of these matrices in the limited time we have had while preparing this revision (although we continue to run chains from both matrices). However, qualitatively, the independent chains within each dataset are largely identical in the relationships they show among the 5 major animal clades. The general conclusions appear to be:

1) In the test-failing matrices at the amino-acid level, Ctenophora are recovered with full posterior probability as the sister group to the remaining animals, whereas under Dayhoff recoding support for this position becomes low (e.g. pp 0.87) or, in some chains, Porifera are instead recovered with full support as this sister group. In the test-passing matrices, at both the amino-acid and the recoded levels, Porifera are consistently recovered with full posterior probability as the sister group to the remaining Metazoa.

2) In no chain was Cnidaria+Placozoa recovered, in either amino-acid or Dayhoff-recoded space (instead the conventional Cnidaria+Bilateria relationship was seen).

A straightforward interpretation of this result would be that a.) as suggested later in the Feuda et al. (2017) manuscript, Ctenophora-sister is a compositionally driven artefact, and b.) this published dataset does not contain any signal for a Cnidaria-Placozoa relationship. In our opinion, full confidence in the latter conclusion is difficult to muster, although these analyses are at least consistent with that interpretation. However, given the substantial differences in methodology (orthology, alignment parameters, alignment trimming) and in taxon sampling (e.g. this Simion et al. dataset lacks the additional clade A placozoan genomes but includes many other divergent outgroups with distinctive compositional properties, which may have affected the results of the p4 simulation tests), a complete comparison of these results with our own is difficult. A more consistent comparison would require the public availability of the full orthology dataset.

Nonetheless, this preliminary experiment does show that signal for the Placozoa–Cnidaria clade may indeed be variable among different taxon and orthologue sets. We have therefore chosen to somewhat temper the language in the manuscript, although in our opinion the consistent direction of the analyses we have shown of our own dataset is to favour this clade.

4) Results and Discussion section. The reported PPA scores are very bad, indicating that both datasets are extremely heterogeneous and that at the AA level both datasets are unreliable. The PPA scores do suggest that the BUSCO dataset is worse, but that does not make your new AA dataset more reliable, as Z-scores greater than one hundred indicate an utter failure of the model to describe the data (see Feuda et al., 2017 for a discussion of the interpretation of Z-scores). This should be pointed out in your discussion. Note, however, that the fact that a dataset is highly heterogeneous does not mean that the tree it supports is incorrect; that depends on the specifics of the dataset (taxon-specific distribution of heterogeneity, presence of other forms of noise, etc.). High heterogeneity means that we should be cautious when interpreting the results, as the tree might include tree-reconstruction artefacts. So, our reading of these two analyses is that both datasets are heterogeneous and that tests (such as analyses under Dayhoff-6) need to be carried out to validate the clades in these trees. You did these tests, and they clearly indicate that Placo-Cni and Pori-sister are unlikely to be compositional artefacts, while Cteno-sister is likely a compositional artefact.

We have directly acknowledged ("Both the initial 430-gene matrix and the 303-gene BUSCO matrix fail these tests") that both matrices suffer from compositional heterogeneity, judging from the posterior predictive tests. We also agree that while these scores indicate the BUSCO test fails to a greater degree, the fact that both matrices are so heterogeneous indicates that neither is strictly reliable – indeed, this was what motivated our per-gene simulation test. We agree with the reviewer’s assertion that the resilience of Cnidaria+Placozoa to Dayhoff-6 recoding indicates that this clade is less likely to be a compositional artefact.

5) Results and Discussion section: You need to provide more information about heterogeneity. Are you referring to site-specific or lineage-specific heterogeneity? These are two different components of a dataset’s heterogeneity, which are modelled differently, and both need to be considered. Site-specific heterogeneity is generally modelled relatively efficiently using CAT-GTR, whereas lineage-specific heterogeneity cannot be modelled using the standard models you used in this paper; you would need BreakpointCAT, NDCH or similar to model it. D6 recoding reduces both forms of heterogeneity and, in combination with CAT-GTR, generally allows adequate modelling of site-specific heterogeneity and an improvement in the modelling of lineage-specific heterogeneity (the latter not necessarily reaching adequacy; see Feuda et al., 2017). Hence it is key that you report statistics for all your datasets (Z-scores for both site- and lineage-specific heterogeneity for all your AA and Dayhoff datasets). A table with the Diversity and Comp PPA (these can be derived using PhyloBayes) will do, and should be easy to produce, as you should already have all the chains (since you have already run other PPAs). A table presenting all Z-scores for all your datasets would be key to interpreting your results (Figure 1 to Figure 4D included) and to deciding which tree is less likely to be incorrect or more likely to be accurate.

In context we believe it was clear that our usage of “heterogeneity” was referring specifically to time heterogeneity in among-lineage residue frequency, rather than site heterogeneity. We very much agree with the reviewer’s point that even the site-heterogeneous CAT+GTR model fails to adequately model compositional drift over time, and would further add that the other models cited, BP-CAT and NDCH, do not have sufficiently modern implementations (e.g. with MPI) to use on contemporary large-scale datasets, and/or have other major limitations (e.g. the NDCH implementation in p4 not being compatible with site-heterogeneous mixture modelling). Indeed, we used Dayhoff recoding specifically to help mitigate the influence of compositional bias in our CAT+GTR analyses – but we are in agreement with Feuda et al., that “mitigate” is indeed the proper term, since even Dayhoff recoding cannot fully remove all non-stationary signal in a dataset. We have added a citation in this paragraph to the Feuda et al., manuscript at the first mention of posterior predictive tests and also to a recent commentary piece published by the first author in Integrative and Comparative Biology, which contains an extended discussion of the potential value and limitations of recoding, which we hope may help further contextualize these results. We also agree that there is value in providing z-scores for diversity and composition posterior predictive analyses for the various matrices we have employed; for all 4 supermatrices discussed in the manuscript, we now summarize z-scores in Supplementary file 2. Unfortunately, such posterior predictive tests can only be presented for amino-acid level analyses – we ran the Dayhoff-recoded analyses in Phylobayes 4.1c (serial version), whose default run behaviour does not record the full chain file needed to undertake a posterior predictive test. Total reanalysis in D6-space would therefore be required to present these scores.
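For orientation, a posterior predictive z-score compares a test statistic computed on the observed alignment (for example, an across-taxon compositional heterogeneity statistic of the kind summarized in Supplementary file 2) with the distribution of the same statistic over alignments simulated from posterior samples of the model. The sketch below assumes the observed value and the simulated replicates have already been computed; it does not reproduce PhyloBayes' own implementation, and the function names are illustrative. Larger z-scores mean that the observed data lie further outside what the fitted model predicts.

```python
from statistics import mean, stdev

def ppred_zscore(observed, simulated):
    """Standardized distance of the observed statistic from the posterior predictive replicates."""
    return (observed - mean(simulated)) / stdev(simulated)

def ppred_pvalue(observed, simulated):
    """Fraction of posterior predictive replicates at least as large as the observed value."""
    return sum(s >= observed for s in simulated) / len(simulated)


replicates = [1.0, 2.0, 3.0]  # toy values standing in for simulated statistics
print(ppred_zscore(5.0, replicates), ppred_pvalue(5.0, replicates))
```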

6) Results and Discussion section: This section reads like an ad hoc attempt to justify Ctenophora-sister, a result that is tangential to your paper (which is about Placozoa) and that is not supported by your analyses. Specifically:

We agree that the debate over whether Ctenophora or Porifera is the sister group of all other metazoans is indeed tangential to this paper, and hope that this discussion does not derail the evaluation of the paper at large. However, we would like to respond to this criticism in clear terms: our specific claim is that the factors influencing whether Porifera or Ctenophora is recovered as the sister group to the remaining metazoans may vary among datasets, as they do in virtually all published datasets (including many that claim to resolve this question). While we certainly agree that compositional heterogeneity is likely to be the main driver in some supermatrices, this cannot be the case for other datasets.

6A) Figure 3 presents the BUSCO trees under CAT-GTR+G: (3A) is the AA analysis and (3B) the Dayhoff analysis. Figure 3A (your worst overall dataset in terms of heterogeneity) clearly supports Ctenophora-sister, while Figure 3B shows an unorthodox Porifera+Ctenophora clade (but with relatively low support, ~75%). Neither of these two figures supports Porifera-sister. According to your arguments, BUSCO is the worst dataset. When considered as an AA alignment, the BUSCO dataset maximizes its compositional heterogeneity and supports Cteno-sister. In the text you state that Figure 4A (the dataset of the heterogeneous genes in your dataset, which we discuss further below) provides evidence that Pori-sister might be linked to heterogeneity. However, even if that were actually the case, Figure 3A would then have to be evidence that Cteno-sister is a compositional artefact (equally linked to heterogeneity). Yet this important fact is not even mentioned in the text. This is surprising and unfair, given that you use the heterogeneity of the dataset used to derive Figure 4A to make an argument against Porifera-sister, and the heterogeneity of Figure 3A to suggest that Placozoa + Bilateria is likely to be an artefact, but you do not discuss the fact that Cteno-sister also emerges from the "bad" BUSCO dataset and should thus be considered dubious. The text needs to be modified to reflect this.

Firstly, we assume that “Placozoa + Bilateria” in this comment was a typo, and that the reviewer meant to write “Placozoa + Cnidaria”; in no analysis did we recover such a clade. Secondly, we are in full agreement that the Ctenophora-sister result recovered in our BUSCO supermatrix is likely to be a compositional artefact in this specific dataset: both the posterior predictive analyses (which show higher z-scores for most ctenophores, as well as some calcisponges, in the BUSCO dataset) and the erosion of support for Ctenophora-sister under Dayhoff recoding are highly suggestive of this conclusion, as the reviewer states.

6B) The experiment in Figure 4 is nicely designed, but some aspects of its implementation and the interpretation of its results are problematic. Specifically, we think that the analyses reported in Figure 4A and 4C are misleading for two reasons. First, the data matrix cannot be proved to be composed only by heterogeneous genes. This is because 111 of the 764 genes in the dataset used to derive this figure (~14% of the superalignment) are not, strictly speaking (i.e. at the 0.05 cutoff level), heterogeneous. Second, even if this dataset were composed purely of heterogeneous genes, it could not be used to claim, for example, that Porifera-sister is linked to heterogeneity. This is because on this dataset Dayhoff recoding did not change the position of the sponges. This indicates that the signal for this node in this dataset (irrespective of its heterogeneity) is not driven by the within-D6-class changes that are known to be associated with compositional heterogeneity. Irrespective of the heterogeneity of the dataset, the fact that Porifera-sister is unchanged and does not lose support upon recoding indicates that it is supported by between-D6-class changes, which are not silenced by recoding and are more likely to be associated with phylogenetic signal. To conclude, you do not have evidence in Figure 4A and 4C to claim that the assertion of Feuda et al. (that Cteno-sister is likely to be associated with compositional heterogeneity, while Porifera-sister is not likely to be a compositional artefact) might be incorrect. The Dayhoff test rejects your conclusion. So, what should be done with the analysis of Figure 4A and 4C? Our view is that this dataset is confounding your experiments, because it is composed of a mixture of more and less heterogeneous genes, and that it is best excluded. We suggest that you report only the results in Figure 4B and 4D, which are actually very clear and say all that there is to say on Placo-Cni and Pori-sister vs. Cteno-sister.

Firstly, we must emphasize that we have never claimed that this experiment tells us that Porifera-sister is likely to be a compositional artefact (“that Porifera-sister is linked to heterogeneity”, in the reviewer’s words). Our specific claim was only that, in the context of this experiment, we found it difficult to interpret Ctenophora-sister as the result of compositional non-stationarity.

Indeed, we would agree that since Dayhoff recoding seems to diminish support for Ctenophora-sister in the test-passing matrix (Figure 4D), while Porifera-sister remains unchanged by recoding in the test-failing matrix (Figure 4C), Ctenophora-sister is probably the more likely artefact. What we are instead saying is that our results question the assertion of Feuda et al., 2017 that the converse is true in general: we see no evidence that Ctenophora-sister can be interpreted as the result of compositional heterogeneity in this dataset (Figure 4). If it were, we would expect to see Ctenophora-sister in Figure 4A even in a CAT+GTR analysis, because CAT+GTR is also a model that assumes compositional stationarity. We find it very helpful to consider that we actually do recover strong support for Ctenophora-sister in one amino-acid analysis of our test-failing 349-gene matrix, namely the ML tree inferred from this dataset under a profile mixture model (Figure 4—figure supplement 1). It is only when we analyse this same matrix under CAT+GTR that we recover Porifera-sister. Because both the ML model and CAT+GTR assume stationarity, it cannot logically be that Ctenophora-sister is recovered under ML alone because of compositional heterogeneity. Therefore, in this matrix, this result is more likely to be driven by some other artefact, for instance saturation due to site-heterogeneous substitution, which is better modelled by CAT+GTR than by the profile mixture model.

Moreover, while we generally agree that Dayhoff-6 (and other forms of) recoding will mitigate compositional heterogeneity, we would also remark that a.) some compositionally driven changes will still be apparent in between-Dayhoff-group substitutions, and b.) compositional bias may not be the only form of systematic error against which recoding might be protective. It may well be that the recoding-driven erosion of support we see for Ctenophora-sister when comparing e.g. Figure 4A and 4C (and elsewhere) is actually due, at least in part, to a different class of systematic error, e.g. heteropecilly or heterotachy, which also happens to be diminished under Dayhoff recoding. This is discussed at greater length in the recent ICB article by the first author. The statistical properties of recoded datasets are not yet completely understood, and we simply remark that it is somewhat far-reaching to assert that the sole effect of recoding is to control compositional bias.
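To make the distinction between within-group and between-group substitutions concrete, the sketch below recodes amino-acid sequences into the six Dayhoff groups (AGPST, DENQ, HKR, FWY, ILMV, C) and counts the differences that remain visible. The numeric labels 0 to 5 and the helper names are arbitrary choices for illustration. In the toy pair, every amino-acid difference is a swap within a Dayhoff group, so a compositional shift between, say, K and R or I and V disappears after recoding, whereas a swap between groups (I to F) survives.

```python
# Six Dayhoff groups; the digit labels are arbitrary.
DAYHOFF6 = {
    **dict.fromkeys("AGPST", "0"),
    **dict.fromkeys("DENQ", "1"),
    **dict.fromkeys("HKR", "2"),
    **dict.fromkeys("FWY", "3"),
    **dict.fromkeys("ILMV", "4"),
    "C": "5",
}

def recode(seq):
    """Recode an amino-acid string into Dayhoff-6 states.

    Gaps ('-') and missing data ('?') are kept; any other symbol becomes '?'.
    """
    return "".join(DAYHOFF6.get(aa, aa if aa in "-?" else "?") for aa in seq.upper())

def count_differences(seq1, seq2):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2))


s1, s2 = "IKDA", "VRES"   # all four differences are within-group swaps
print(count_differences(s1, s2), count_differences(recode(s1), recode(s2)))
s3 = "FKDA"               # I -> F crosses from the ILMV group to the FWY group
print(count_differences(recode(s1), recode(s3)))
```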

The other comment the reviewer makes here, “that the analyses […] are misleading” because “the data matrix cannot be proved to be composed only by heterogeneous genes” is difficult to respond to. No statistical test guarantees completely accurate results, and it is true that due to some type I error (inappropriate rejection of the null hypothesis), some proportion of genes in the test-failing matrix actually fit the stationarity assumption well. It is also true that some heterogeneous genes may have been included in the test-passing set due to type II error (although we have tried to minimize this by selecting an α-threshold of 0.10).
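The logic of a per-gene, simulation-based test of compositional homogeneity, and the role of the α-threshold, can be illustrated with a deliberately simplified sketch. As we understand it, the p4 test used in the paper simulates data on a tree under a fitted homogeneous model; this toy version instead redraws each taxon's residues independently from the pooled composition, ignoring phylogeny, so it is only a stand-in for the real procedure. The function names and the 200-replicate default are illustrative assumptions. A gene "passes" when homogeneity is not rejected at level α.

```python
import random
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def composition_chi2(alignment):
    """Chi-square statistic for compositional homogeneity across taxa (gaps ignored)."""
    counts = {t: Counter(c for c in seq if c in AMINO_ACIDS) for t, seq in alignment.items()}
    pooled = Counter()
    for c in counts.values():
        pooled.update(c)
    grand_total = sum(pooled.values())
    stat = 0.0
    for c in counts.values():
        n_taxon = sum(c.values())
        if n_taxon == 0:
            continue  # an all-gap taxon carries no compositional information
        for aa, n_aa in pooled.items():
            expected = n_taxon * n_aa / grand_total
            stat += (c[aa] - expected) ** 2 / expected
    return stat

def simulated_pvalue(alignment, n_sims=200, seed=0):
    """Monte Carlo p-value under a crude homogeneous null (no tree, i.i.d. residues)."""
    rng = random.Random(seed)
    observed = composition_chi2(alignment)
    residues = {t: [c for c in seq if c in AMINO_ACIDS] for t, seq in alignment.items()}
    pooled = Counter()
    for r in residues.values():
        pooled.update(r)
    letters = sorted(pooled)
    weights = [pooled[a] for a in letters]
    exceed = 0
    for _ in range(n_sims):
        sim = {t: "".join(rng.choices(letters, weights=weights, k=len(r)))
               for t, r in residues.items()}
        if composition_chi2(sim) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_sims + 1)

def passes_test(alignment, alpha=0.10):
    """True if homogeneity is not rejected at level alpha (a 'test-passing' gene)."""
    return simulated_pvalue(alignment) >= alpha
```

Raising α from 0.05 to 0.10 rejects homogeneity more readily: more genes are assigned to the test-failing set (more type I errors) in exchange for fewer heterogeneous genes slipping into the test-passing set (fewer type II errors).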

We believe that readers with a background in basic statistics will be able to critically assess the results displayed in Figure 4 themselves, and we respectfully disagree with the reviewer’s suggestion that Figures 4A and 4C are “best excluded”: it is only in comparison with Figures 4B and 4D that this experiment shows recovery of the Cnidaria–Placozoa clade to be linked to the compositional properties of the genes analysed, which is the central point of this paper.

6C) The fact that the genes in your "homogeneous dataset" are individually homogeneous does not mean that your homogeneous superalignment is itself homogeneous. This needs to be tested (using PPA under CAT-GTR for both the Dayhoff-6 and AA datasets). This is because heterogeneity in a multigene dataset adds up, and superalignments composed of individually homogeneous genes customarily fail heterogeneity tests. My expectation is that the dataset in Figure 4B will be shown to be still heterogeneous and that of Figure 4D will be shown to be much more homogeneous (as is always the case with D6-recoded datasets). Accordingly, the analyses of Figure 4B and 4D indicate that there is evidence for Cteno-sister being a compositional artefact (driven by within-D6-class changes present only in Figure 4B), while there is no evidence for Cnidaria+Placozoa and Pori-sister being compositional artefacts (as they are both present in Figure 4D, where only between-class changes are evident). In addition, this analysis confirms the existence of two signals in the data with reference to ctenophores (one linked to unreliable within-D6-class changes and one linked to more reliable between-class changes), while Placo-Cnidaria seems to be the only signal in the dataset, and it is more strongly represented in the more reliable between-class changes.

We certainly agree with the reviewer’s prediction that a concatenated matrix composed of genes that individually pass a test of compositional heterogeneity might still, in aggregate, show evidence of compositional non-stationarity. Indeed, this is what the posterior predictive analysis presented in the new Table 2 demonstrates: all amino-acid supermatrices presented in this paper (and in others, see e.g. Feuda et al., 2017) fail to meet the stationarity assumption of the CAT+GTR model. However, as predicted, the matrix from the test-passing set has the lowest mean z-score. Presumably, our recoded analyses of all matrices will have further mitigated (but not eliminated) such non-stationary compositional variation; however, as described above, we are unable to report the results of posterior predictive tests under Dayhoff recoding without fully redoing these analyses.

7) Results and Discussion section: Here you seem to assume that Dayhoff recoding always have to change the topology and that if the topology does not change the recoding, in some way, it failed. The application of D6 was "not significant" (in your words) for the dataset of Figure 4A/4C. The problem with your statement is that a tree need not be incorrect simply because the data are heterogeneous; hence D6 need not invariably cause topological changes. So, its application was not "not significant"; it simply did not lead to a topological change. As said above, D6 reduces the influence of compositional heterogeneity by silencing within-category changes. The analysis in Figure 4A/4C indicates that Porifera-sister is driven by more reliable among-category changes, which are generally not compositionally driven; otherwise it would have disappeared in the Dayhoff analysis. Note that we are not saying that Pori-sister is correct; rather, we are saying that Figure 4A/4C does not provide evidence that this clade is compositionally driven, hence you cannot conclude that Porifera-sister might be associated with compositional attraction. Note also that, with reference to the Ctenophora debate, the application of recoding has invariably produced strongly directional changes from Ctenophora-sister to Porifera-sister. This is a highly non-random directional change and at this time there is no known Dataset that while supporting Porifera-sister at the AA level (under CAT-GTR) was found to support Cteno-sister after recoding. This is true, for example, of all the datasets of Whelan when Choanoflagellata are the only outgroups used (Pori-sister both as AA and as Dayhoff), and of your dataset in Figure 4A/4C. This can only be interpreted as suggesting that in all available datasets Porifera-sister is never driven by within-category changes; accordingly, when it emerges at the root, it cannot be because of a compositional attraction, irrespective of the heterogeneity of the dataset itself.
In contrast, D6 experiments invariably indicate that when ctenophores emerge at the base, compositional heterogeneity can never be ruled out (all datasets tested to date, including yours, behave in exactly this same way), strongly suggesting that Cteno-sister is a compositional artefact.

This reviewer again reads too much into the text when he remarks that “you seem to assume that Dayhoff recoding always have to change the topology and that if the topology does not change the recoding, in some way, it failed”. Our use of the phrase “not significant” in the context of Figure 4A and 4C was purely descriptive: we were only remarking that the topologies and support values do not change under D6 recoding. Indeed, we agree that this probably indicates that Porifera-sister is, in this dataset, the result least likely to reflect systematic error, precisely because it is robust to Dayhoff recoding.

As an aside, although we do not see its relevance per se to this manuscript, the reviewer’s comment that “this is a highly non-random directional change and at this time there is no known Dataset that while supporting Porifera-sister at the AA level (under CAT-GTR) was found to support Cteno-sister after recoding” is not entirely true. In unpublished data, we have seen one large supermatrix with poor (0.87 pp) support for Ctenophora-sister under amino-acid CAT+GTR analysis, which actually increases to full posterior probability under CAT+GTR analysis of the Dayhoff-recoded version of the same matrix. This shows that the “directionality” of results from recoding is not necessarily consistent among datasets, and in our view highlights that the theoretical properties of recoded alphabets are not yet completely understood. Interestingly, in a separate analysis of this matrix (using only the H1 reference strain to represent Placozoa), we also recover Cnidaria+Placozoa, but only when all non-choanoflagellate outgroups are removed and the data are recoded into D6 categories. This experiment was inspired by one of the reviewers’ past papers (Pisani et al., 2015), and it also suggests, from an independent dataset, that Cnidaria+Bilateria may be a compositionally driven artefact.
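The posterior probabilities (pp) quoted in this exchange, such as 0.87 for Ctenophora-sister, are conventionally estimated as the fraction of trees sampled from the posterior (after discarding burn-in) that contain the clade in question; in PhyloBayes, for instance, such bipartition frequencies are summarised by bpcomp. The sketch below illustrates only that calculation, assuming a deliberately simplified, hypothetical tree representation (each sampled tree as a set of clades, each clade a frozenset of taxon names) rather than parsing real MCMC output.

```python
def clade_posterior(sampled_trees: list[set[frozenset[str]]], clade: set[str]) -> float:
    """Fraction of post-burn-in sampled trees that contain `clade`."""
    target = frozenset(clade)
    hits = sum(target in tree for tree in sampled_trees)
    return hits / len(sampled_trees)


# Hypothetical sample of 100 trees: in 87, the non-ctenophore animals form a
# clade (i.e. Ctenophora is sister to all other animals); in 13, they do not.
non_cteno = frozenset({"Porifera", "Placozoa", "Cnidaria", "Bilateria"})
non_sponge = frozenset({"Ctenophora", "Placozoa", "Cnidaria", "Bilateria"})
trees = [{non_cteno}] * 87 + [{non_sponge}] * 13
print(clade_posterior(trees, set(non_cteno)))  # 0.87
```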

8) Title: Given how controversial the relationships at the base of the animal tree have been, perhaps a more moderate title would be more fitting (e.g., Phylogenomic support for a sister relationship of Placozoa and Cnidaria).

The suggestion is a good one; we use a more nuanced title in the resubmission.

9) The scripts used for data processing in genome assembly, phylogenomic matrix construction and analysis are not linked in the paper. Sharing these scripts in a public repository such as GitHub would enhance the reproducibility of the present study.

All Python scripts used in producing and curating these datasets were in fact included in this submission with the supplementary data deposited in the DataDryad accession, as stated in the “Source Data Availability” section. As these were for the most part one-off helper scripts (sometimes hard-coded), they are most interpretable and reusable when bundled alongside the data they were used to manipulate. We understand that this DataDryad accession was available to reviewers and should have been open for inspection during the review process. If editorially required, we are happy to also make these scripts available on GitHub or elsewhere.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Essential revisions:

1) The authors allay some concerns over the description of the genome data, and the point that the new genome assemblies yield BUSCO orthologue occupancies similar to those of other transcriptome assemblies is appreciated. One does wonder why short-read genome assemblies, which are highly fragmented, were used in the first place instead of transcriptomes. If the purpose was to enhance placozoan genome resources, these resources are not described here in any detail because they are too fragmented: "we opted not [sic] to omit a detailed characterisation of these genomes, partially out of spatial constraints (this article was submitted as a Short Report format), and partially since the genomes themselves are Illumina short-insert-only assemblies and therefore relatively fragmented". Please provide this information.

Our purpose in this paper was explicitly not to provide new placozoan genomic resources; indeed, this seems to have already been done reasonably adequately during this article's review process (Eitel et al., 2018), although the authors of that paper regrettably did not take the opportunity to use their data to independently test our Cnidaria+Placozoa hypothesis, which had been expressed in preprint form and was known to them. In fact, the sequencing data from these short-insert libraries were originally generated by Gruber-Vodicka and colleagues as part of another ongoing project. However, as is standard in the field, genomic data are highly reusable, and because placozoans seem to have relatively simple repeat landscapes and short introns, we found that these fragmented assemblies gave sufficient gene representation to perform a phylogenetic analysis (Table 1). Indeed, the high occupancy of all placozoans sampled in our matrices (see the bar plot on the right of Figure 1) seems to have validated this intuition. We have altered the text to make the provenance of these data more transparent to the reader.
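One simple way to quantify the gene occupancy referred to above is the fraction of partitions in which a taxon has at least one non-missing residue. The sketch below assumes that definition and a hypothetical in-memory representation (a list of partitions, each a dict mapping taxon names to aligned sequences); the occupancy shown in Figure 1 may have been computed differently, for example per site, and the taxon labels are illustrative only.

```python
def taxon_occupancy(partitions: list[dict[str, str]], taxon: str) -> float:
    """Fraction of partitions in which `taxon` has any non-missing residue.

    Gaps ('-'), '?' and 'X' are treated as missing data; a taxon absent
    from a partition counts as missing for that partition.
    """
    missing = set("-?X")
    present = sum(
        1
        for partition in partitions
        if any(c not in missing for c in partition.get(taxon, "").upper())
    )
    return present / len(partitions)


# Hypothetical three-gene matrix.
genes = [
    {"Trichoplax_H1": "MKV-", "Nematostella": "MKVL"},
    {"Trichoplax_H1": "----", "Nematostella": "ARND"},
    {"Trichoplax_H1": "QXEG", "Nematostella": "QEEG"},
]
print(round(taxon_occupancy(genes, "Trichoplax_H1"), 3))  # 0.667
```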

2) The manuscript now includes information on the percentages, means and standard deviations of partition lengths. This information is important for evaluating the possible causes of the phylogenetic discrepancies (e.g. Cnidaria+Placozoa vs. Cnidaria+Bilateria) observed in different datasets. We now see that the datasets supporting Cnidaria+Placozoa are systematically composed of shorter partitions. Thus, compositional bias is not the only difference between the two datasets, yet no further explanation is given. Please report these data and explicitly discuss the finding that compositional bias is not the only difference between the two datasets.

We agree with the reviewers' assertion that it would benefit the transparency of this paper to emphasize this observation more explicitly, and in this resubmission we draw attention to it in the text. However, "no further explanation is given" because, in truth, how to explain this correlation remains unclear to us. In assessing these results, it is nonetheless worthwhile to consider recent literature seeking to understand the influence of intrinsic gene properties on phylogenetic inference. Relatively few papers appear to systematically assess correlations between different sequence-based metrics and their influence on phylogenetic inference, but we were particularly struck by one recent paper for which the reviewing editor was the senior author (Xing-Xing et al., 2016). This paper attempted to systematically investigate the correlation structure, and the relative importance to phylogenetic inference, of 52 different quantitative properties of multiple sequence alignments. Among other results, they found that only two sequence-based properties, alignment length (with or without gaps) and Relative Composition Variability (RCV), a measure of compositional heterogeneity, were strong predictors of phylogenetic signal and of consistency with assumed “true” trees (eMRC trees in yeast and mammal datasets). These authors also found, as is evident in their Supplemental Tables S4 and S5, that these two metrics are negatively correlated (-0.4388518 for yeast and -0.1354731 for mammals). A reader might therefore conclude that longer genes are more phylogenetically informative, less likely to be compositionally biased, and more consistent with the true tree, and in light of this conclusion might reasonably question the Cnidaria+Placozoa result found in our supermatrices composed of systematically shorter genes (Figure 1 and Figure 2, Figure 4B and 4D).

To make a fair comparison with the results of Xing-Xing et al., 2016 and these unpublished results, we also plotted the RCFV score (Zhong et al., 2011), a refinement of the RCV metric designed to accommodate gapped alignments, against gene length in our set of 1,388 orthologues.

Author response image 1
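For reference, RCFV as we understand its definition in Zhong et al. (2011) sums, over taxa and character states, the absolute deviation of each taxon's state frequency from the across-taxon mean frequency of that state, and divides by the number of taxa; frequencies are computed per taxon after excluding gaps and missing data. A minimal sketch under those assumptions follows. The function name is ours, and skipping taxa that have no residues at all is a choice of this sketch that other implementations may handle differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rcfv(alignment: dict[str, str], alphabet: str = AMINO_ACIDS) -> float:
    """Relative composition frequency variability of one alignment.

    Per-taxon state frequencies ignore gaps and any symbol not in
    `alphabet`; taxa without any residues are skipped.
    """
    freqs = []
    for seq in alignment.values():
        residues = [c for c in seq.upper() if c in alphabet]
        if residues:
            freqs.append({s: residues.count(s) / len(residues) for s in alphabet})
    if not freqs:
        raise ValueError("alignment contains no residues")
    n = len(freqs)
    # Mean frequency of each state across the retained taxa.
    mean = {s: sum(f[s] for f in freqs) / n for s in alphabet}
    # Sum of absolute deviations over taxa and states, divided by taxon count.
    return sum(abs(f[s] - mean[s]) for f in freqs for s in alphabet) / n


print(rcfv({"t1": "AAAA", "t2": "CCCC"}))  # 1.0
print(rcfv({"t1": "AC--", "t2": "ACAC"}))  # 0.0 (gaps are ignored)
```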

Intriguingly, using RCFV as a quantitative index of compositional heterogeneity, we see no evidence of a trend with gene length (linear regression: R² = 0.000128562, adjusted R² = -0.000592846), contrary to both of the above-cited results. Nonetheless, our previous observation that it is, in general, shorter genes that pass the p4 compositional heterogeneity null simulation test (p > 0.10) is clearly evident in the above plot, even though the test-passing genes have overall RCFV scores similar to those of the test-failing population.
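The two fit statistics reported above are related by the standard adjustment for the number of predictors. The sketch below computes R² for a simple one-predictor least-squares regression and the adjusted R². Assuming n = 1,388 orthologues and a single predictor, it reproduces the reported adjusted value from the reported R². The function names are ours.

```python
def simple_r2(x: list[float], y: list[float]) -> float:
    """R^2 of an ordinary least-squares fit y = a + b*x (one predictor).

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def adjusted_r2(r2: float, n: int, p: int = 1) -> float:
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(simple_r2([1, 2, 3], [2, 4, 6]))             # 1.0
print(f"{adjusted_r2(0.000128562, n=1388):.9f}")  # -0.000592846
```

With R² this close to zero, the adjustment pushes the value slightly below zero, which simply signals that the regression explains essentially none of the variance.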

Our conclusion at the moment, considering trends within our own data and in the published literature, is that no general statements can yet be made regarding the relationship between alignment length and the degree of compositional bias. As the above plot indicates, even within one dataset, results may differ depending on which metric of non-stationarity is used (although we expect that the p4 test, as it explicitly attempts to incorporate gene tree topology, is more sensitive than simple, expectation-agnostic thresholding of the RCFV distribution). Furthermore, it is unclear to us that any general relationship exists between compositional bias and gene length, whether constant or varying with the scale of evolutionary divergence. This is clearly an area in need of further empirical and theoretical investigation, ideally incorporating multiple metrics of bias. We hope that the publication of this work, which does not aim to be a systematic comparison of phylogenetic methods, might nonetheless motivate such systematic studies. At present, we see no particular reason, either from first principles or from a consensus of empirical studies, to devalue results from supermatrices composed of shorter partitions.
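As we understand it, the p4 test mentioned above is a parametric-bootstrap version of a compositional homogeneity test: a chi-square statistic measuring how unevenly character states are distributed across taxa is computed on the real alignment and compared with the same statistic computed on alignments simulated on the tree under a compositionally homogeneous model. The sketch below shows only the statistic and the final p-value step, with the p > 0.10 criterion used here. The simulated statistics are a hypothetical input (p4 generates them itself), and none of the function names below are p4's API.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_chi2(alignment: dict[str, str], alphabet: str = AMINO_ACIDS) -> float:
    """Chi-square statistic on the taxa x character-state count table."""
    counts = [[seq.upper().count(s) for s in alphabet] for seq in alignment.values()]
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:  # skip states (or taxa) with no residues at all
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def simulation_p_value(observed_stat: float, simulated_stats: list[float]) -> float:
    """Proportion of simulated statistics at least as large as the observed one."""
    return sum(s >= observed_stat for s in simulated_stats) / len(simulated_stats)


def passes_composition_test(observed_stat: float, simulated_stats: list[float],
                            alpha: float = 0.10) -> bool:
    """True when compositional homogeneity is not rejected (p > alpha)."""
    return simulation_p_value(observed_stat, simulated_stats) > alpha


obs = composition_chi2({"t1": "AAAA", "t2": "CCCC"})
print(obs)  # 8.0
print(passes_composition_test(obs, [1.2, 3.4, 9.1, 2.2, 8.5]))  # True (p = 0.4)
```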

3) The paragraph describing MetaBAT and Metawatt does not explicitly mention contamination, which would help the reader interpret the purpose of these steps. Please rectify this.

We agree this would help clarify the purpose of these procedures in the context of this phylogenetic paper (phylogenetics being a discipline in which metagenomic tools are not often used). We have altered the text accordingly.

4) We agree that Dayhoff-6 recoding is not a silver bullet for compositional bias and that previous analyses that did not filter data for compositionally unbiased partitions are not direct comparisons. However, the re-analysis of the Simion dataset that the authors conducted, in which unbiased partitions were selected and analyzed in both amino-acid and Dayhoff-6 recoded space, does represent a more or less direct comparison, and these results strongly contradict the topology favored by Laumer et al. There are, expectedly, differences in taxon sampling and data handling between the Simion re-analysis and that presented by Laumer et al., but this is true of the vast majority of previous phylogenomic analyses at this scale, which confidently resolve other nodes in the animal tree of life. The explanation for this in the authors' response, "this published dataset does not contain any signal for a Cnidaria-Placozoa relationship", seems quite correct, but a phylogenomic dataset as large as Simion's (which is much larger than that presented by Laumer et al.) should possess such phylogenetic signal if it were robust. Therefore, this re-analysis of the Simion dataset, which is not included in the revision, calls the central phylogenetic finding of Laumer et al. into question and strongly suggests that some other feature of the Laumer et al. dataset is contributing to Cnidaria+Placozoa. As stated above, the unbiased partitions are significantly shorter than the biased partitions. Is it possible that this finding is somehow, perhaps indirectly, related to the central finding?

As we also attempted to convey in our previous response, we disagree that our reanalysis of the Simion et al. dataset "does represent a more or less direct comparison" with our own analyses. Differences in taxon sampling and data handling matter, and there are multiple plausible explanations for why we failed to find support for Placozoa+Cnidaria in our reanalysis. Those at the top of our list include:

a) The Simion et al. dataset included more distant, non-choanoflagellate outgroups, among them exceptionally compositionally biased taxa (e.g. Filasterea such as Capsaspora; see z-scores from PPA in Nosenko et al., 2013). These might influence not only phylogenetic inference (in particular by skewing the mean amino-acid frequency vector) but also the p4 composition test.

b) The Simion et al. genes were provided to the public pre-aligned and pre-trimmed (https://github.com/psimion/SuppData_Metazoa_2017). The effect of multiple sequence alignment algorithms (and alignment masking) on phylogenetic inference was extensively explored in the pre-genomic era and shown to be highly significant in driving results, but it has largely been ignored by authors working with large-scale datasets. Of particular note, Simion et al. used a different MAFFT algorithm from ours (L-INS-i vs. our E-INS-i), and masked their alignments with BMGE, presumably using default parameters (which notably do not include its option to trim compositionally biased sites). This represents more stringent alignment masking than our use of ZORRO, as BMGE incorporates a measure of “entropy” in its trimming.

c) The Simion et al. dataset naturally did not include any clade-A placozoan representatives. While we give one anecdotal analysis (Figure 1—figure supplement 2) suggesting that increased taxon sampling of Placozoa was not driving the Placozoa+Cnidaria clade in our dataset, it is not yet clear to us that including the total extant diversity of Placozoa has no general influence on phylogenetic inference (especially with respect to compositional bias). We would be particularly interested in how taxon sampling (of compositionally deviant taxa such as non-choanoflagellate outgroups, and within Placozoa) might influence the outcome of a compositional bias test.
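Point b) above contrasts ZORRO with BMGE, whose trimming is based on a per-column entropy measure. Purely to illustrate the general idea of entropy-based masking, the sketch below drops alignment columns whose plain Shannon entropy exceeds a threshold. This is a deliberate simplification: as we understand it, BMGE's score is weighted by a substitution matrix and smoothed over a sliding window, and neither feature is reproduced here. The threshold and function names are hypothetical.

```python
import math
from collections import Counter

MISSING = set("-?X")


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column.

    Missing data are ignored; an all-missing column gets infinite entropy
    so that it is always removed.
    """
    residues = [c for c in column.upper() if c not in MISSING]
    if not residues:
        return math.inf
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def mask_by_entropy(alignment: dict[str, str], max_entropy: float) -> dict[str, str]:
    """Keep only columns whose entropy is at or below `max_entropy`."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        i for i in range(length)
        if column_entropy("".join(alignment[t][i] for t in names)) <= max_entropy
    ]
    return {t: "".join(alignment[t][i] for i in keep) for t in names}


aln = {"a": "AAC-", "b": "ACC-", "c": "AGC-"}
print(mask_by_entropy(aln, max_entropy=1.0))  # {'a': 'AC', 'b': 'AC', 'c': 'AC'}
```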

For these reasons, and also because these reanalyses did not achieve convergence under the CAT+GTR model in a reasonable time (note that the original Simion et al. paper reports only results from the simpler and computationally more tractable CAT model, not CAT+GTR), we chose not to report them explicitly in the revised version of this Short Report, as we did not wish to invite direct comparisons between these results and our own.

5) Again, while Ctenophora-vs.-Porifera is tangential to the central claims of the paper, I did find it relevant that, in the authors’ response (unpublished data), Cnidaria+Placozoa was recovered in "one" analysis, but only when certain outgroups were removed and the data were D6 recoded. This finding, like the Simion reanalysis, speaks strongly to the lability of Cnidaria+Placozoa. Please comment on this issue.

We are somewhat hesitant to comment more exhaustively on this, since in our opinion an article under review should be evaluated on its own merits, not in relation to other work, however related, that is being considered for publication elsewhere. However, we raised it explicitly because it seemed a straightforward, evidence-driven counter-argument to the review of Pisani.

In summarily describing our unpublished data in this forum, we omitted to mention a second CAT+GTR analysis showing strong support for Cnidaria+Placozoa, displayed in a Supplemental Figure. The analysis in question, based on an unrecoded amino-acid matrix (in contrast to the experiment described in our initial review response), comes from a 43K-site submatrix of an originally ~100K-residue supermatrix, created by leaving non-choanoflagellate outgroups (plus a few phylogenetically redundant and data-poor metazoan taxa) in the matrix prior to BMGE trimming of saturated and non-stationary sites. In contrast, deleting these outgroups from the alignment and then applying the BMGE algorithm yields a ~56K-site matrix, which shows strong support for Cnidaria+Bilateria.

The reviewers are quite correct to point out "the lability of Cnidaria+Placozoa", a fact which we tried to emphasize in this manuscript as written, and which is also apparent within the analyses of our unpublished data. What matters to us is the directionality of this lability: Placozoa+Cnidaria does not appear randomly as one compares different analyses, but instead shows up only in those analyses that were constructed specifically to minimize the influence of compositional bias, whether by testing and excluding biased genes individually, trimming biased individual sites from a concatenated matrix, removing compositionally biased distant outgroups whose inclusion might skew the inferred residue frequency vector, recoding data into Dayhoff groups, or some combination of these. In the absence of phylogenetic inference software able to efficiently handle both site-heterogeneous and time-heterogeneous substitution, such matrix curation choices are our best current defence against systematic error. The fact that Placozoa+Cnidaria (a clade which, we emphasize, has not even been reported in recent molecular phylogenetic literature) shows up only when such stringent measures are taken is indeed at the heart of our argument that this clade is the more credible of the two we have observed, despite its lability under different analytical conditions. We consider that this information will be extremely helpful for evaluating future phylogenetic hypotheses, not just in relation to Placozoa and Cnidaria.

https://doi.org/10.7554/eLife.36278.022

Article and author information

Author details

  1. Christopher E Laumer

    1. Wellcome Trust Sanger Institute, Hinxton, United Kingdom
    2. European Molecular Biology Laboratories-European Bioinformatics Institute, Hinxton, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    claumer@ebi.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8097-8516
  2. Harald Gruber-Vodicka

    Max Planck Institute for Marine Microbiology, Bremen, Germany
    Contribution
    Resources, Formal analysis, Supervision, Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Michael G Hadfield

    Kewalo Marine Laboratory, Pacific Biosciences Research Center and the University of Hawaii-Manoa, Honolulu, United States
    Contribution
    Resources, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Vicki B Pearse

    Institute of Marine Sciences, University of California, Santa Cruz, United States
    Contribution
    Conceptualization, Resources, Writing—review and editing
    Competing interests
    No competing interests declared
  5. Ana Riesgo

    Invertebrate Division, Life Sciences Department, The Natural History Museum, London, United Kingdom
    Contribution
    Resources, Data curation, Writing—review and editing
    Competing interests
    No competing interests declared
  6. John C Marioni

    1. Wellcome Trust Sanger Institute, Hinxton, United Kingdom
    2. European Molecular Biology Laboratories-European Bioinformatics Institute, Hinxton, United Kingdom
    3. Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Resources, Supervision, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9092-0852
  7. Gonzalo Giribet

    Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Resources, Funding acquisition, Writing—review and editing
    Competing interests
    No competing interests declared

Funding

Max-Planck-Institut für Marine Mikrobiologie

  • Harald Gruber-Vodicka

European Bioinformatics Institute

  • John C Marioni

Harvard University (Faculty of Arts and Sciences)

  • Gonzalo Giribet

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Nicole Dubilier (Max Planck Institute for Marine Microbiology) contributed resources that permitted the collection and assembly of draft Trichoplax genomes, which were amplified and sequenced at the Max Planck-Genome-Centre Cologne. Dan Richter (King lab) and Kanako Hisata (Satoh lab) provided access to unpublished transcriptomes and peptide predictions. The EMBL-EBI Systems Infrastructure team provided essential support on the EBI compute cluster. Allen Collins, Scott Nichols, and particularly Andreas Hejnol provided useful comments on an earlier version of this manuscript.

Senior Editor

  1. Diethard Tautz, Max-Planck Institute for Evolutionary Biology, Germany

Reviewing Editor

  1. Antonis Rokas, Vanderbilt University, United States

Publication history

  1. Received: February 27, 2018
  2. Accepted: October 11, 2018
  3. Accepted Manuscript published: October 30, 2018 (version 1)
  4. Version of Record published: December 3, 2018 (version 2)

Copyright

© 2018, Laumer et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


