The genome of the crustacean Parhyale hawaiensis, a model for animal development, regeneration, immunity and lignocellulose digestion

  1. Damian Kao
  2. Alvina G Lai
  3. Evangelia Stamataki
  4. Silvana Rosic
  5. Nikolaos Konstantinides
  6. Erin Jarvis
  7. Alessia Di Donfrancesco
  8. Natalia Pouchkina-Stancheva
  9. Marie Semon
  10. Marco Grillo
  11. Heather Bruce
  12. Suyash Kumar
  13. Igor Siwanowicz
  14. Andy Le
  15. Andrew Lemire
  16. Michael B Eisen
  17. Cassandra Extavour
  18. William E Browne
  19. Carsten Wolff
  20. Michalis Averof
  21. Nipam H Patel
  22. Peter Sarkies
  23. Anastasios Pavlopoulos (corresponding author)
  24. Aziz Aboobaker (corresponding author)
  1. University of Oxford, United Kingdom
  2. Janelia Farm Research Campus, United States
  3. Imperial College London, United Kingdom
  4. Centre National de la Recherche Scientifique (CNRS) and École Normale Supérieure de Lyon, France
  5. University of California, United States
  6. Howard Hughes Medical Institute, University of California, United States
  7. Harvard University, United States
  8. Smithsonian National Museum of Natural History, United States
  9. Institut für Biologie, Humboldt-Universität zu Berlin, Germany
16 figures, 4 tables and 6 additional files

Figures

Introduction.

(A) Phylogenetic relationship of Arthropods showing the Chelicerata as an outgroup to Mandibulata and the Pancrustacea clade which includes crustaceans and insects. Species listed for each clade …

https://doi.org/10.7554/eLife.20062.003
Parhyale karyotype.

(A) Frequency of the number of chromosomes observed in 42 mitotic spreads. Forty-six chromosomes were observed in more than half of all preparations. (B) Representative image of Hoechst-stained …

https://doi.org/10.7554/eLife.20062.005
Parhyale genome assembly metrics.

(A) K-mer frequency spectra of all reads for k-lengths ranging from 20 to 50. (B) K-mer branching analysis showing the frequency of k-mer branches classified as variants compared to Homo sapiens

https://doi.org/10.7554/eLife.20062.006
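As an illustration of what a k-mer frequency spectrum (panel A above) represents, here is a minimal Python sketch. It is not the authors' pipeline: the function name `kmer_spectrum` and the toy reads are hypothetical, and the sketch ignores reverse complements and sequencing errors, which a real k-mer counter would handle.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a Counter mapping multiplicity -> number of distinct k-mers.

    Hypothetical helper for illustration only; k-mers containing 'N'
    are skipped and reverse complements are NOT merged.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    # Histogram of how many distinct k-mers occur once, twice, and so on.
    return Counter(counts.values())


# Toy input (made up): real analyses use millions of reads and k of 20 to 50.
toy_reads = ["ACGTACGT", "CGTA"]
spectrum = kmer_spectrum(toy_reads, k=4)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

Repeating the call for several values of k gives one spectrum per k-length, which is the kind of family of curves the panel describes.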
Figure 4 with 1 supplement
Workflows of assembly, annotation, and proteome generation.

(A) Flowchart of the genome assembly. Two shotgun libraries and four mate-pair libraries with the indicated average sizes were prepared from a single male animal and sequenced to a predicted depth …

https://doi.org/10.7554/eLife.20062.007
Figure 4—source data 1

Catalog of repeat elements in Parhyale genome assembly.

Description of repeat content in the Parhyale genome.

https://doi.org/10.7554/eLife.20062.008
Figure 4—source data 2

Software and Data.

List of programs and bioinformatic tools and publicly available sequence data used in this study.

https://doi.org/10.7554/eLife.20062.009
Figure 4—figure supplement 1
CEGMA assessment of Parhyale transcriptome and genome.

(A) CEGMA genes present in the transcriptome assembly scored by BLAST identity (y axis) and proportion of coverage (relative length, x axis) (B) CEGMA genes present in the genome assembly scored by …

https://doi.org/10.7554/eLife.20062.010
Figure 5 with 1 supplement
Parhyale genome comparisons.

(A) Box plots comparing gene sizes between Parhyale and humans (H. sapiens), water fleas (D. pulex), flies (D. melanogaster) and nematodes (C. elegans). Ratios were calculated by dividing the size …

https://doi.org/10.7554/eLife.20062.012
Figure 5—source data 1

List of proteins currently unique to Parhyale.

List of proteins in Parhyale without identity to other species.

https://doi.org/10.7554/eLife.20062.013
Figure 5—source data 2

List of genes likely to be specific to the Malacostraca.

List of genes likely to be specific to the Malacostraca.

https://doi.org/10.7554/eLife.20062.014
Figure 5—source data 3

Orthofinder analysis.

Orthofinder analysis using the Parhyale predicted proteome.

https://doi.org/10.7554/eLife.20062.015
Figure 5—figure supplement 1
Expanded gene families in Parhyale.

Histograms showing the number of paralogs in each listed species for (A) sidestep, (B) lachesin, (C) neurotrimin/DPR, (D) APN and (E) cathepsin genes, for gene families overrepresented in Parhyale.

https://doi.org/10.7554/eLife.20062.016
Figure 6 with 1 supplement
Variation analyses of predicted genes.

(A) A read coverage histogram of predicted genes. Reads were first mapped to the genome, then coverage was calculated for transcribed regions of each defined locus. (B) A coverage distribution plot …

https://doi.org/10.7554/eLife.20062.017
Figure 6—source data 1

Polymorphism in Parhyale developmental genes.

Description of polymorphism in previously identified Parhyale developmental genes.

https://doi.org/10.7554/eLife.20062.018
Figure 6—figure supplement 1
Confirmation of polymorphisms in the wider laboratory population of Parhyale.

(A) An example of laboratory population polymorphism in exon 1 of the gene aristalless. As well as heterozygosity in the single Chicago-F male sequenced (pink and purple bases) there is additional …

https://doi.org/10.7554/eLife.20062.019
Variation observed in contiguous BAC sequences.

(A) Schematic diagram of the contiguous BAC clones tiling across the HOX cluster and their % sequence identities. 'Overlap length' refers to the lengths (bp) of the overlapping regions between two …

https://doi.org/10.7554/eLife.20062.021
Figure 8 with 2 supplements
Comparison of Wnt family members across Metazoa.

Comparison of Wnt genes across Metazoa. Tree on the left illustrates the phylogenetic relationships of species used. Dotted lines in the phylogenetic tree illustrate the alternative hypothesis of …

https://doi.org/10.7554/eLife.20062.022
Figure 8—source data 1

List of Parhyale transcription factors by family.

List of Parhyale transcript IDs for all transcription factors in the proteome, grouped by transcription factor family.

https://doi.org/10.7554/eLife.20062.023
Figure 8—source data 2

Wnt, TGFβ and FGF signaling pathways.

Parhyale transcript IDs for Wnt, Wnt ligand, FGF, FGFR and TGFβ pathway genes.

https://doi.org/10.7554/eLife.20062.024
Figure 8—source data 3

Homeobox transcription factors.

Annotation of homeobox transcription factor genes in Parhyale.

https://doi.org/10.7554/eLife.20062.025
Figure 8—figure supplement 1
Phylogenetic tree of FGF and FGFR molecules.

(A) Phylogenetic tree of arthropod and vertebrate FGFs, including two FGFs from Parhyale (B) Phylogenetic tree of arthropod and vertebrate FGFRs, including a single FGFR in Parhyale.

https://doi.org/10.7554/eLife.20062.026
Figure 8—figure supplement 2
Phylogenetic tree of CERS homeobox family genes.

A phylogenetic tree highlighting an expansion of CERS homeobox family genes in Parhyale.

https://doi.org/10.7554/eLife.20062.027
Homeodomain protein family tree.

The overview of homeodomain radiation and phylogenetic relationships among homeodomain proteins from Arthropoda (P. hawaiensis, D. melanogaster and A. mellifera), Chordata (H. sapiens and B. floridae

https://doi.org/10.7554/eLife.20062.028
Evidence for an intact Hox cluster in Parhyale.

(A–F’’) Double fluorescent in situ hybridizations (FISH) for nascent transcripts of genes. (A–A’’) Deformed (Dfd) and Sex combs reduced (Scr), (B-B’’) engrailed 1 (en1) and Ultrabithorax (Ubx), …

https://doi.org/10.7554/eLife.20062.029
Lignocellulose digestion overview.

(A) Simplified drawing of lignocellulose structure. The main component of lignocellulose is cellulose, which is a β-1,4-linked chain of glucose monosaccharides. Cellulose and lignin are organized in …

https://doi.org/10.7554/eLife.20062.030
Figure 12 with 1 supplement
Phylogenetic analysis of GH7 and GH9 family proteins.

(A) Phylogenetic tree showing the relationship between GH7 family proteins of Parhyale, other crustaceans (Malacostraca, Branchiopoda, Copepoda), fungi and symbiotic protists (root). UniProt and …

https://doi.org/10.7554/eLife.20062.031
Figure 12—source data 1

Catalog of GH family genes in Parhyale.

IDs of all Parhyale GH genes and analysis of GH family membership across available malacostracan data sets.

https://doi.org/10.7554/eLife.20062.032
Figure 12—figure supplement 1
Alignment of GH7 family genes.

Alignment of GH7 family genes in Parhyale with those from Chelura terebans and Limnoria quadripunctata.

https://doi.org/10.7554/eLife.20062.033
Figure 13 with 1 supplement
Comparison of innate immunity genes.

(A) Phylogenetic tree of peptidoglycan recognition proteins (PGRPs). With the exception of Remipedes, PGRPs were not found in Crustaceans. PGRPs have been found in Arthropods, including insects, …

https://doi.org/10.7554/eLife.20062.034
Figure 13—source data 1

Catalog of innate immunity related genes in Parhyale.

Parhyale IDs and numbers of immune related genes in comparison to other species.

https://doi.org/10.7554/eLife.20062.035
Figure 13—figure supplement 1
Overview of Parhyale Dscam structure and hypervariable regions

(A) Overview of domain structure of Parhyale Dscam protein and position of primers used to assess use of exons in 3 hypervariable regions. (B) Sequence alignments of cloned hypervariable regions in …

https://doi.org/10.7554/eLife.20062.036
Figure 14 with 2 supplements
Evolution of miRNA families in Eumetazoans.

Phylogenetic tree showing the gains (in green) and losses (in red) of miRNA families at various taxonomic levels of the Eumetazoan tree leading to Parhyale. miRNAs marked with plain characters were …

https://doi.org/10.7554/eLife.20062.038
Figure 14—source data 1

RFAM based annotation of the Parhyale genome.

RFAM annotation of the Parhyale genome.

https://doi.org/10.7554/eLife.20062.039
Figure 14—figure supplement 1
Phylogenetic trees of Dicer and PIWI/AGO genes.

(A) Phylogenetic tree of Dicer family genes, including two Dicer genes from Parhyale. (B) Phylogenetic tree of PIWI/AGO genes, including several Parhyale genes.

https://doi.org/10.7554/eLife.20062.040
Figure 14—figure supplement 2
Examples of miRNAs in the Parhyale genome.

(A) Parhyale mir-100 and let-7 are clustered together in the intron of a putative lncRNA (B) A Parhyale mir-71/mir-2 family cluster (C) Parhyale mir-10 is in a conserved position in the genome …

https://doi.org/10.7554/eLife.20062.041
Analysis of Parhyale genome methylation.

(A) Phylogenetic tree showing the families and numbers of DNA methyltransferases (DNMTs) present in the genomes of indicated species. Parhyale has one copy from each DNMT family. (B) Amounts of …

https://doi.org/10.7554/eLife.20062.042
Figure 15—source data 1

Genes involved with epigenetic modification.

Catalog of Parhyale genes involved in DNA methylation and histone modifications.

https://doi.org/10.7554/eLife.20062.043
Figure 16 with 1 supplement
CRISPR/Cas9-based genome editing in Parhyale.

(A) Wild-type morphology. (B) Mutant Parhyale with truncated limbs after CRISPR-mediated knock-out (DllKO) of the limb patterning gene Distal-less (PhDll-e). Panels show ventral views of juveniles …

https://doi.org/10.7554/eLife.20062.044
Figure 16—figure supplement 1
CRISPR experiments targeting the Distalless locus.

CRISPR/Cas-based targeted genome editing in Parhyale. (A) Summary of gene knock-out experiments. (B) Illustration of the targeted PhDll-e (Dll) cDNA showing the 5’ and 3’ untranslated regions …

https://doi.org/10.7554/eLife.20062.045

Tables

Table 1

Experimental resources. Available experimental resources in Parhyale and corresponding references.

https://doi.org/10.7554/eLife.20062.004
| Experimental Resources | References |
| --- | --- |
| Embryological manipulations: Cell microinjection, isolation, ablation | (Gerberding et al., 2002; Extavour, 2005; Price et al., 2010; Alwes et al., 2011; Hannibal et al., 2012; Rehm et al., 2009; Rehm et al., 2009; Kontarakis and Pavlopoulos, 2014; Nast and Extavour, 2014) |
| Gene expression studies: In situ hybridization, antibody staining | (Rehm et al., 2009; Rehm et al., 2009) |
| Gene knock-down: RNA interference, morpholinos | (Liubicich et al., 2009; Ozhan-Kizil et al., 2009) |
| Transgenesis: Transposon-based, integrase-based | (Pavlopoulos and Averof, 2005; Kontarakis et al., 2011; Kontarakis and Pavlopoulos, 2014) |
| Gene trapping: Exon/enhancer trapping, iTRAC (trap conversion) | (Kontarakis et al., 2011) |
| Gene misexpression: Heat-inducible | (Pavlopoulos et al., 2009) |
| Gene knock-out: CRISPR/Cas | (Martin et al., 2015) |
| Gene knock-in: CRISPR/Cas homology-dependent or homology-independent | (Serano et al., 2015) |
| Live imaging: Bright-field, confocal, light-sheet microscopy | (Alwes et al., 2011; Hannibal et al., 2012; Chaw and Patel, 2012; Alwes et al., 2016) |
Table 2

Assembly statistics. Length metrics of assembled scaffolds and contigs.

https://doi.org/10.7554/eLife.20062.011
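| | # sequences | N90 | N50 | N10 | Sum length | Max length | # Ns |
| --- | --- | --- | --- | --- | --- | --- | --- |
| scaffolds | 133,035 | 14,799 | 81,190 | 289,705 | 3.63 GB | 1,285,385 | 1.10 GB |
| unplaced contigs | 259,343 | 304 | 627 | 1,779 | 146 MB | 40,222 | 23,431 |
| hetero. contigs | 584,392 | 265 | 402 | 1,038 | 240 MB | 24,461 | 627 |
| genic scaffolds | 15,160 | 52,952 | 161,819 | 433,836 | 1.49 GB | 1,285,385 | 323 MB |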
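The N90, N50 and N10 columns are standard Nx length metrics. As a minimal sketch of how such a value is usually computed (the function name `nx` and the toy lengths are hypothetical, and this is not taken from the authors' notebooks): sort the sequences from longest to shortest, accumulate their lengths, and report the length at which the running total first reaches x% of the assembly.

```python
def nx(lengths, x):
    """Return the Nx length for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least x% of the total assembled bases.
    Hypothetical helper for illustration only.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Toy scaffold lengths (made up), total = 300
toy = [100, 80, 60, 40, 20]
print(nx(toy, 10), nx(toy, 50), nx(toy, 90))  # 100 80 40
```

Under this definition N10 >= N50 >= N90 always holds, which matches the ordering of the values in every row of the table.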
Table 3

BAC variant statistics. Level of heterozygosity of each BAC sequence determined by mapping genomic reads to each BAC individually. Population variance rate represents additional alleles found (i.e. m…

https://doi.org/10.7554/eLife.20062.020
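| BAC ID | Length | Heterozygosity | Pop. Variance |
| --- | --- | --- | --- |
| PA81-D11 | 140,264 | 1.654 | 0.568 |
| PA40-O15 | 129,957 | 2.446 | 0.647 |
| PA76-H18 | 141,844 | 1.824 | 0.199 |
| PA120-H17 | 126,766 | 2.673 | 1.120 |
| PA222-D11 | 128,542 | 1.344 | 1.404 |
| PA31-H15 | 140,143 | 2.793 | 0.051 |
| PA284-I07 | 141,390 | 2.046 | 0.450 |
| PA221-A05 | 148,703 | 1.862 | 1.427 |
| PA93-L04 | 139,955 | 2.177 | 0.742 |
| PA272-M04 | 134,744 | 1.925 | 0.982 |
| PA179-K23 | 137,239 | 2.671 | 0.990 |
| PA92-D22 | 126,848 | 2.650 | 0.802 |
| PA268-E13 | 135,334 | 1.678 | 1.322 |
| PA264-B19 | 108,571 | 1.575 | 0.157 |
| PA24-C06 | 141,446 | 1.946 | 1.488 |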
Table 4

Small RNA processing pathway members. The Parhyale orthologs of small RNA processing pathway members.

https://doi.org/10.7554/eLife.20062.037
| Gene | Counts | Gene ID |
| --- | --- | --- |
| Armitage | 2 | phaw_30_tra_m.006391 |
| | | phaw_30_tra_m.007425 |
| Spindle_E | 3 | phaw_30_tra_m.000091 |
| | | phaw_30_tra_m.020806 |
| | | phaw_30_tra_m.018110 |
| rm62 | 7 | phaw_30_tra_m.014329 |
| | | phaw_30_tra_m.012297 |
| | | phaw_30_tra_m.004444 |
| | | phaw_30_tra_m.012605 |
| | | phaw_30_tra_m.001849 |
| | | phaw_30_tra_m.006468 |
| | | phaw_30_tra_m.023485 |
| Piwi/aubergine | 2 | phaw_30_tra_m.011247 |
| | | phaw_30_tra_m.016012 |
| Dicer 1 | 1 | phaw_30_tra_m.001257 |
| Dicer 2 | 1 | phaw_30_tra_m.021619 |
| argonaute 1 | 1 | phaw_30_tra_m.006642 |
| argonaute 2 | 3 | phaw_30_tra_m.021514 |
| | | phaw_30_tra_m.018276 |
| | | phaw_30_tra_m.012367 |
| Loquacious | 2 | phaw_30_tra_m.006389 |
| | | phaw_30_tra_m.000074 |
| Drosha | 1 | phaw_30_tra_m.015433 |

Additional files

Source code 1

iPython Notebook for Parhyale genome assembly.

Includes bioinformatic processing of raw read data, k-mer analysis, contig assembly, scaffolding and CEGMA-based representation analysis.

https://doi.org/10.7554/eLife.20062.046
Source code 2

iPython Notebook for repeat analysis.

Includes repeat analysis of the Parhyale genome using RepeatModeler and RepeatMasker.

https://doi.org/10.7554/eLife.20062.047
Source code 3

iPython Notebook for transcriptome and annotation.

Parhyale transcriptome assembly, genome annotation and generation of canonical proteome dataset.

https://doi.org/10.7554/eLife.20062.048
Source code 4

iPython Notebook for variant analysis.

Analysis of polymorphism in Parhyale using genome reads, transcriptome data and Sanger-sequenced BACs.

https://doi.org/10.7554/eLife.20062.049
Source code 5

iPython Notebook of orthology analysis.

Protein orthology analysis between Parhyale and other species.

https://doi.org/10.7554/eLife.20062.050
Source code 6

iPython Notebook for RNA.

Analysis of microRNAs and putative lncRNAs in Parhyale.

https://doi.org/10.7554/eLife.20062.051
