Introduction

Coleoid cephalopods (octopus, squid, cuttlefish) are a highly derived group of mollusks, characterized by the largest nervous systems among all invertebrates (ca. 500 million neurons in an adult octopus1,2, ca. 200 million neurons in adult cuttlefish) and specializations with a great historical importance for neuroscience (e.g., “giant axons”3 and “giant synapses”46). These animals exhibit very sophisticated behaviors such as dynamic camouflage713, learning1416, social communication1719 and hunting2022 as well as two-stage sleep2325. Because of these characteristics, cephalopods have been the focus of many fundamental studies by biologists, biophysicists and physiologists over the past century2635. Thanks to advances in sequencing technologies, coleoid cephalopods are now also emerging as animal models in molecular neurobiology, evolution and genomics36. Recent studies in particular examined cephalopod biology from the perspectives of single-cell gene expression3740, genome topology and gene regulation41.

Despite the technical progress, the genomes of cephalopods remain challenging to assemble because of their large sizes and high repeat fractions, creating a bottleneck for further molecular studies. While the genomes of Octopodiformes (Octopus, Eledone, Argonauta) are often comparable in size to that of humans (around 3 Gigabases (Gb)42,43) or possibly even smaller44, the typical genomes of Decapodiformes (squids and cuttlefish) are often twice as large, reaching 6 Gb45,46. The biggest contributing factors to this genome expansion are transposable elements (TEs), with different TE classes differentiating squid and cuttlefish genomes from those of Octopodiformes47. Besides such differences in genome sizes, karyotypes in coleoid cephalopods are also unusual when compared to those of other invertebrates, including mollusks: for example, haploid chromosome numbers are around 30 in octopuses and around 46 in squids and cuttlefish46,48, while they are between 8 and 19 in non-cephalopod mollusks4956. Karyotype reconstruction and validation techniques in animals with such large genomes are also not as developed as they are for more familiar invertebrate and vertebrate species.

The common cuttlefish Sepia officinalis has been used as a model organism for studies in neural control and development57,58, behavior7,15, evolution59, environmental studies60,61 and biomechanics62,63. Found in the Mediterranean, the English Channel and the European Atlantic, it can produce large egg clutches containing hundreds of embryos, several times throughout its semelparous reproductive cycle64,65. More recently, this species has emerged as an important model to study the neural basis of cephalopod camouflage66,67 building on extensive behavioral studies713 and neuronal cell type characterization, evolution and development6870.

Despite a widespread interest in this animal, including the recent sequencing of several cephalopod species as part of the Aquatic Symbiosis Genomics project71, a high-quality, annotated genomic resource for this species is still lacking. We describe here the sequencing, assembly, and annotation of the Sepia officinalis genome. Our analysis provides evidence for the existence of 47 chromosomes, with clear homologies to the chromosomes of related species such as Euprymna scolopes and Doryteuthis pealeii. We predict full-length transcripts and provide functional gene annotation that should inform future evolutionary and transcriptomic studies. Taken together, this genome should be an important resource for scientists interested in genome evolution, brain organization, information processing, development and behavior.

Results

We sequenced the genome of a 6-month-old Sepia officinalis individual (probably male, see below), reared from eggs collected in the Portuguese Atlantic. The DNA was extracted from brain tissue and sequenced using long-read (PacBio HiFi) and chromatin conformation (Hi-C) methods (Figure 1A,B).

Sepia officinalis assembly statistics and quality control.

A) Specimen of S. officinalis (credit: Stephan Junek, MPI Brain Research). B) Overview of the assembly workflow. Genome size was estimated using GenomeScope72,73. The primary assembly was generated from long DNA reads (PacBio Sequel II) and HiC reads (Dovetail OmniC) with hifiasm124. Assembly was scaffolded with YAHS77. C) Snail plot of chromosome-scale S. officinalis assembly generated using blobtools2135 showing scaffold statistics, base composition and BUSCO75 completeness against the metazoa_odb10 database. D) HiC heatmap showing 47 chromosome-scale scaffolds with very few sequences in unplaced scaffolds. Heatmap was generated using juicebox78, 0-7039 observed counts (balanced) are shown.

Genome size and heterozygosity

The haploid genome size of S. officinalis was estimated to be ∼5.14 Gb based on k-mer estimation from the long-read data, with a high repeat content of 54 %72,73. The heterozygosity rate was estimated to be 1.03 %, which is higher than in octopus genomes, yet moderate among marine invertebrates42,43,46.

The size of the scaffolded assembly was 5.68 Gb (Figure 1C), thus about 10% greater than our initial GenomeScope estimate. Indeed, the sizes of most published metazoan genome assemblies deviate by more than 10% from estimates, with proportionally greater deviations as assembly sizes increase74.

Completeness of the genome was assessed using BUSCO75 with metazoa_odb10. It was 95.2% [single copy: 93.5%, duplicated: 1.7%], with 3.2% fragmented and 1.6% missing BUSCOs. The BUSCO score with mollusca_odb10 for the final assembly was 77.2% completeness [single copy: 75.9%, duplicated: 1.3%] with 4.3% fragmented and 18.5% missing BUSCOs.

Karyotype

The assembly was generated from PacBio HiFi reads and Hi-C reads with ca. 23-fold and 11-fold coverage, respectively. Scaffolding was performed using Hi-C reads, placing 95.8 % of bases in 47 scaffolds. The Hi-C heatmap in Figure 1D shows the chromatin contacts in clusters corresponding to these 47 scaffolds. It is consistent with the proposal that they each represent a single chromosome.

To increase confidence in our assembled Sepia officinalis karyotype, we used several scaffolding programs and manually curated the scaffolds using two independent approaches. The first used HapHiC76, a tool based on Hi-C data that utilizes allele information from primary assembly programs, and scaffolds assemblies with the constraint of an expected number of chromosomes. The resulting Hi-C contact maps with 46, 47, 48, 49 and 50 chromosomes (Supplementary Figure 1) show support for 47 chromosomes; an input of 46 scaffolds resulted in one clear chromosome merger (blue arrowhead); higher input numbers resulted in false chromosome splits (black arrowheads, 48, 49 and 50 scaffolds).

Second, we performed manual curation of the scaffolding result obtained with YAHS77 using JBAT78. These two approaches, conducted independently and both based on Hi-C signal, converged on 47 chromosomal scaffolds.

Karyotype comparisons with other Decapods

Previous karyotyping studies in decapod cephalopods based on cytogenetic techniques provide widely varying estimates of chromosome numbers, reflecting the difficulty of resolving large chromosome numbers in situ. For example, 1n=46 chromosomes has been reported for two species of cuttlefish (Sepia esculenta and Sepia lycidas) and three loliginid squids79; 1n=34 chromosomes has been reported for Sepia arabica80 and 1n=24 chromosomes in Sepia pharaonis81. In Sepia officinalis, a karyotype of 1n=52 has even been described from testis samples82. More recently, several cephalopod genomes have been scaffolded at chromosome-scale resolution, resolving karyotypes in silico. From those recent studies, a haploid karyotype of 46 chromosomes seems to be common among Decapodiformes46 (Figure 2A).

Syntenic comparison of three decapod species.

A) Taxonomy of selected cephalopod species showing their genome size and (haploid) chromosome numbers. Taxonomy information downloaded from NCBI taxonomy browser, divergence times from85,136. B) Synteny relationship between chromosomes of E. scolopes46 (top), D. pealeii46 (middle) and S. officinalis (bottom). Riparian plot was generated using orthogroups located on chromosomes of the species. Euprymna chromosomes 45 and 46 are not shown because they contain too few orthogroups. C) Synteny plot of Sepia chromosomes 40 (magenta) and 43 (darkblue), that are joined in the other species and cause the different haploid chromosome number in Sepia.

Comparison with another chromosome-scale Sepia officinalis assembly

A chromosome-scale assembly for Sepia officinalis was released recently by the Wellcome Sanger Institute’s Darwin Tree of Life project83 (DToL, GCA_964300435.1). That genome was assembled from a male individual using PacBio Sequel II and Arima2 Hi-C data, with a final assembly size of 5.8 Gb. In that study, the haploid chromosome number was estimated to be 49. We scaffolded their assembly using our Dovetail Omni-C data, and the resulting contact map again supports 47 chromosomes (Supplementary Figure 2A), consistent with our results. We next aligned the two S. officinalis assemblies using minimap284 to assess the differences between them (Supplementary Figure 2B). The “Sanger” assembly appears to split three chromosomes of our assembly (chromosomes 1, 5 and 42) and to merge two (chromosomes 40 + 43). In conclusion, available evidence supports a haploid karyotype of 47 chromosomes in Sepia officinalis.

Comparisons with other chromosome-scale Decapodiformes genomes

To further investigate the possible reasons why our estimate of Sepia officinalis’s chromosome numbers deviates from those in other studies, we compared our assembly with the chromosome-scale genomes of two other decapod cephalopods, Euprymna scolopes41 (a sepiolidae) and Doryteuthis pealeii46 (a loliginidae), both described as having 46 chromosomes. Using orthogroups to perform linkage analysis, we could detect a clear chromosome homology between Doryteuthis and Sepia, and to a lesser extent between Euprymna and Sepia (Figure 2B). Dotplots showing detailed pairwise synteny comparisons are shown in Supplementary Figure 3 (Sepia to Doryteuthis) and Supplementary Figure 4 (Sepia to Euprymna). The inferred species tree as part of this analysis places Euprymna as sister to Sepia and Doryteuthis, matching recent phylogenetic analyses50,85. However, this tree was constructed without the inclusion of outgroup taxa, and therefore lacks a reliable root.

This comparison revealed that chromosomes 40 and 43 in Sepia are merged into one in Euprymna and in Dorytheutis (Figure 2C), as they were in the DToL Sepia officinalis assembly (Supplementary Figure 2B). Our scaffolding approach showed no contact between chromosomes 40 and 43 in our Sepia Hi-C data, providing support for these chromosomes being separate in Sepia officinalis.

We also compared our assembly to a chromosome-scale Sepia esculenta assembly (1n=46, GCA_964036315.1), and recently annotated by genome liftover from Sepia pharaonis86. That assembly, constructed from a female individual, showed a low read coverage for a putative sex chromosome in a ZZ/Z0 sex determination system87. Our syntenic comparison indicated a strong homology between the putative Z chromosome of Sepia esculenta and chromosome 46 of Sepia officinalis (Supplementary Figure 5A). Because we did not record the sex of the Sepia officinalis animal used for genome sequencing and assembly, we tried to infer the sex by replicating the analysis used to identify the putative Z chromosome in Sepia esculenta87. For this, we aligned short-reads (Illumina) from the same Sepia officinalis individual to the assembly and examined the normalized read coverage for each chromosome (Supplementary Figure 5B). In contrast to the low coverage observed in the female Sepia esculenta assembly (Supplementary Figure 5C), we observed no decrease in read coverage for chromosome 46, suggesting that, if the alleged sex chromosome identity is correct, then our material came from a male animal.

Genome repeat landscape

After creating a custom repeat library for Sepia officinalis using RepeatModeler88, we masked the genome using RepeatMasker89, resulting in 71.17% masked bases. The categories of repeats are shown in Figure 3A. Most repeats (39.65% of total bp) were not characterized and presumably represent ancient repeats that diverged beyond recognition90. Retroelements constituted the largest characterized repeat category (17.32%) followed by DNA transposons (5.92%); 5.83% were annotated as simple repeats. As observed in other Decapodiformes, the Sepia genome contained almost no short interspersed nuclear elements (SINEs), supporting the hypothesis that the SINE expansion observed in octopuses47 occurred independently in their lineage.

Genome annotation for Sepia officinalis.

A) Annotation of repeat landscape of the S. officinalis genome, annotated using RepeatModeler88. Full repeat landscape is shown on the left, annotated repeats (excluding unclassified or simple repeats) are shown on the right. B-C) Quality control of gene annotation and comparison to two other cuttlefish species using OMArk109. Results shown for Sepia lycidas (GCA_963932145.1, Ensembl Genebuild), Sepia officinalis (BRAKER, this study) and Sepia pharaonis137 (BRAKER). Lophotrochozoa was used as the ancestral clade. B) Completeness assessed by the presence of genes conserved in the clade, classified as single or multiple copies (duplicated), or missing. C) Consistency assessed by the proportion of proteins placed in the correct lineage (consistent); placement in incorrect lineages randomly (inconsistent) or to specific species (contamination), or no placement in known gene families (unknown).

Gene modeling and annotation

The genome was further annotated using BRAKER75,91106 combining short- and long-read RNA-seq data and publicly available protein data from multiple molluscan species (Doryteuthis pealeii46, Euprymna scolopes107, Octopus bimaculoides46, Octopus vulgaris43, Nautilus pompilius108, and Pecten maximus51). A total of 18,663 gene models and 23,768 proteins were annotated. The gene annotation was evaluated using BUSCO75 v5.5.0 in protein mode, showing high completeness of the annotation (metazoa_odb10: C:98.2%[S:77.3%,D:20.9%], F:1.0%, M:0.8%, n:954 and mollusca_odb10: C:81.5%[S:59.6%,D:21.9%], F:0.9%, M:17.6%, n:5295). We further checked the completeness and consistency of our gene models using OMArk109, and compared them to genome annotations for two other cuttlefish species, Sepia pharaonis and Sepia lycidas (Figure 3B). The completeness is assessed by comparing the annotated genes to conserved orthologs (present in 80% of extant species) in a given taxonomic clade, here the superorder Lophotrochozoa. The S. officinalis annotation misses 181 out of 2373 genes, which is higher than S. lycidas (96 genes) but much lower than S. pharaonis (458 genes).

In a consistency assessment, where the presence of known gene families from the lineage is evaluated (Figure 3C), S. officinalis contains low proportions of taxonomically inconsistent or fragmented proteins (ca. 13%) similar to S. lycidas (9%), giving confidence in the annotation. In contrast, more than 50% of S. pharaonis proteins are labeled as inconsistent or fragmented, which could indicate annotation errors109.

Overall, the annotations contain different numbers of predicted proteins, likely caused by differences in the annotation method and reference data used. S. officinalis contains the least proteins (23,768) compared to S. lycidas (35,949) and S. pharaonis (53,515). The S. officinalis annotation thus represents a conservative yet high-quality gene set. In the future, this annotation could be enhanced by incorporating more long-read RNA sequencing data, for example.

Lastly, we assigned orthology information to the S. officinalis proteome from InterPro110 and eggNOG-mapper111 to aid the interpretability of the resource for future transcriptomic or proteomic studies. Overall, 89% of proteins (21,204 out of 23,768) received an annotation from InterPro from at least one of their databases. 59% of proteins (14,126 out of 23,768) were annotated by eggNOG-mapper, reflecting the more stringent orthology filters and prioritization of full-length matches implemented in the program112. Similar to previous studies of other cephalopod genomes42, we observed a high number of C2H2 zinc finger transcription factors (1,538 proteins, InterPro annotation) and protocadherins (339 proteins, InterPro), suggesting a conserved expansion of these gene families among cephalopods. Further analyses should reveal more expanded gene families in S. officinalis that underlie species-specific adaptations.

Discussion

This study presents a high-quality, chromosome-scale genome assembly and annotation for the common cuttlefish Sepia officinalis, providing an additional important resource for comparative genomics, neurobiology, and evolutionary studies within cephalopods and mollusks. The assembly was derived from a combination of PacBio HiFi long reads, Dovetail Omni-C chromatin conformation capture, and multiple rounds of scaffolding and manual curation, resulting in a genome size of 5.68 Gb and a haploid karyotype of 47 chromosomes. This resource represents a significant step forward for cephalopod genomics, especially within the Decapodiformes, where large and repetitive genomes have hindered previous efforts.

Chromosome Number and Karyotypic Variation

Our results strongly support a haploid chromosome number of 47 in S. officinalis, a conclusion reinforced by two independent scaffolding approaches (YAHS77 and HapHiC76) and supported by Hi-C contact maps. This karyotype contrasts with the 46 chromosomes reported in other decapod cephalopods, such as Euprymna scolopes and Doryteuthis pealeii46. Syntenic comparisons revealed that chromosomes 40 and 43 in S. officinalis correspond to a single chromosome in these other species. The current Hi-C data for Sepia officinalis provides clear evidence that chromosomes 40 and 43 are distinct, with no interaction contact. Because these two chromosomes are merged also in S. esculenta, it is plausible that this difference results from a S. officinalis-specific fission. However, our results show how difficult it still is to assemble large genomes with high karyotype numbers. Therefore, validation and comparisons with other Sepia species’ genomes will be important. This, in turn, will help identify chromosomal fusion-with-mixing events that are clade specific. Early branching order in Decapodiformes has been notoriously unstable50,85,113116; thus, such rare and irreversible phylogenetic characters could be useful in further phylogenetic analysis of this clade48,117.

We also compared our results to the recently published S. officinalis genome from the Darwin Tree of Life project (DToL)83, which proposed 49 chromosomes. The discrepancy is attributable to one merger in our assembly and three apparent chromosome splits in the DToL assembly. While these differences could stem from technical choices such as different Hi-C methods (Arima v2 vs. Dovetail Omni-C) and sequencing depth (83-fold vs. 11-fold coverage), our assembly is strongly supported by Hi-C contact data, especially in regions where the DToL assembly splits chromosomes that appear contiguous in our data. Interestingly, the DToL assembly merges the chromosomes 40 and 43 - supporting a 46-chromosome model as ancestral state. However, this is not supported by re-scaffolding with the current Omni-C dataset, which again shows them as separate. This discrepancy might reflect technical differences, but it could also be hinting at population-level variability in chromosomal architecture; that is, some individuals may carry a fusion while others do not. Intraspecific chromosome number variation is rare but not unprecedented; for instance, chromosomal polymorphism has been described in the butterfly Leptidea sinapis across different populations118,119. Notably, the two S. officinalis individuals used for sequencing originated from different regions - the Portuguese Atlantic (this study) and the French Mediterranean (DToL project), raising the possibility of geographic variation in chromosome structure. Future population-level analyses will be critical to determine whether S. officinalis exhibits karyotypic polymorphism.

Genome Size and Repeat Landscape

The size of the genome of S. officinalis is estimated at 5.68 Gb - comparable to other Decapodiformes and roughly double the size of typical Octopodiformes genomes. Repeat annotation revealed that over 71% of the genome is repetitive, dominated by unclassified elements and retrotransposons. The near absence of SINEs reinforces the idea that their expansion in octopuses was a lineage-specific event47. These results underscore the complexity and evolutionary dynamism of the cephalopod genome, shaped by waves of transposon activity that likely played a role in the evolution of novel traits41,46.

Gene Annotation and Comparative Assessment

We annotated 18,663 gene models and 23,768 proteins using a combination of short- and long-read RNA-seq data and molluscan protein references. Compared with other annotated cuttlefish genomes (S. lycidas, S. pharaonis), our annotation is conservative in gene number but stands out in terms of completeness and taxonomic consistency. OMArk-based evaluations confirmed that our gene models have low levels of fragmentation and contamination, and high lineage-specific coherence. Striking differences across species in the number of predicted genes, duplications, and missing orthologs likely reflect variation in the annotation pipelines and reference data rather than true biological divergence. Still, our dataset provides a high-confidence foundation that can be refined with additional transcriptomic data from long-read techniques and diverse tissues.

As in other coleoid genomes, we observed notable expansions of gene families such as C2H2 zinc finger transcription factors and protocadherins, genes implicated in neural development and plasticity42,45. These conserved expansions across cephalopods point to genomic underpinnings of their cognitive complexity and behavioral sophistication.

Conclusions and Outlook

This study delivers a foundational genomic resource for Sepia officinalis, a model for studying neural control, behavior, and adaptive camouflage. The chromosome-scale assembly and annotation open the door to in-depth studies of gene regulation, neural circuit development, and genome evolution across coleoid cephalopods. Moreover, the identification of possible population-level karyotype variation highlights the importance of incorporating broader sampling strategies in future genomic studies.

As we move toward a more complete picture of cephalopod genome evolution, the integration of chromosomal synteny, regulatory architecture, and transcriptomic diversity across species will be especially important. The S. officinalis genome represents an important step on this path, enabling high-resolution comparative and functional analyses of one of the most enigmatic and evolutionarily successful invertebrate lineages.

Materials & Methods

Animal husbandry

All research and animal care procedures were carried out following the institutional guidelines that comply with national and international laws and policies (DIRECTIVE 2010/63/EU; German animal welfare act; FELASA guidelines). European cuttlefish Sepia officinalis were hatched from eggs collected in the Portuguese Atlantic and reared in a seawater system, at 20°C. The closed system contains 4,000l of artificial seawater (ASW; Instant Ocean) with a salinity of 33‰ and pH of 8–8.5. Water quality was tested daily and adjusted as required. Trace elements and amino acids were supplied weekly. Marine LED lights above each tank provided a 12/12-h light/dark cycle with gradual on- and off-sets at 07:00 and 19:00. The animals were fed live food (either Hemimysis spp. or small Palaemonetes spp.) ad libitum twice per day. The animals were housed together in 120-l glass tanks with a constant water through-flow resulting in five complete water exchanges per hour. Enrichment consisted of natural fine-grained sand substrate, seaweed (Caulerpa prolifera), rocks of different sizes and various natural and man-made three-dimensional objects.

Tissue preparation

Animals underwent terminal anesthesia to prevent animal suffering or distress, following the Guidelines for the Care and Welfare of Cephalopods in Research120,121. Animals were transferred to a bucket containing ethanol 2% (v/v) in ASW. After 5 minutes, animals were gently probed for simple avoidance reflexes, and the ethanol concentration in the ASW was gradually increased to a maximum of 5%. Sufficient depth of anesthesia was reached when the animal no longer reacted even to stronger stimuli (touching and pinching with tweezers), and no reaction was observed when a hand was moved in the visual area and the cornea was touched122. In this deep anesthesia, the animals were decapitated and the tissue was rapidly transferred into ice-cold, calcium-free Ringer solution (460 mM NaCl, 10 mM KCl, 51 mM MgCl2, 10 mM glucose, 2 mM Glutamine, 10 mM HEPES, pH 7.4) bubbled with oxygen. Brain and body tissue was dissected under a stereoscope and flash-frozen in liquid nitrogen before storage at -80°C until further use.

Unless stated otherwise, sequencing libraries were prepared from the same individual (6-month-old adult Sepia officinalis, F1 from eggs collected in Portugal).

Long-read Whole Genome Library Preparation and Sequencing

The long-read library was prepared and sequenced at the MPI for Plant Research in Cologne, Germany. Genomic DNA was extracted from flash-frozen brain tissue with the MagAttract HMW DNA kit (Qiagen, Cat. no. 67563) and the sequencing library was prepared with the SMRTbell express template prep kit 2.0 (PacBio, Cat. no. 100-938-900). The library was sequenced on 5 SMRT cells on the PacBio Sequel II with the Sequel II binding kit 2.2 (102-089-000), Sequel II sequencing kit 2.0 (101-820-200) and SMRT Cell 8M tray (101-389-001).

Long-read RNA library preparation and sequencing

RNA was isolated from 5-10 mg of flash-frozen tissue (brain, mantle/epidermis, arm/tentacle). First, tissue samples were homogenized in TRIzol using a homogenizer (VDI12/S12N-5S, VWR, Germany). RNA was isolated and DNAseI treated according to the Direct-zol RNA Mini prep kit from ZymoResearch (R2050). RNA was quantified by spectrophotometry (Nanodrop, Thermo Scientific, USA) and its quality was assessed with a RNA 6000 Nanochip on Bioanalyzer (Agilent Technologies, Germany).

The Iso-Seq libraries were prepared and sequenced at the Sequencing facility of the MPI for Plant Research, using a method targeting the 5’ cap and poly-A tail of mRNAs, described in detail in123. Briefly, mRNA was pooled from the 8 tissue samples and cDNA was synthesized using the TeloPrime Full-Length cDNA Amplification Kit V2 (Lexogen, Cat Nr: 013.08 or 013.24). The sequencing library was prepared with the SMRTbell express template prep kit 2.0 (PacBio, Cat. no. 100-938-900). The pooled library was sequenced on 1 SMRT cell on the PacBio Sequel II system using the Sequel II binding kit 2.1 (101-843-000), Sequel II sequencing kit 2.0 (101-820-200) and SMRT Cell 8M tray (101-389-001).

Further, a separate library was prepared from optic lobe cDNA and sequenced on a second SMRT cell using the same reagents described above.

Omni-C Library Preparation and Sequencing

An Omni-C library was prepared and sequenced at the MPI for Plant Research in Cologne, Germany. The library was prepared from brain tissue using the DovetailTM Omni-C™ Kit (Dovetail Genomics, Cat. No. 21005) according to the manufacturer’s protocol. Briefly, chromatin was fixed in place in the nucleus. Fixed chromatin was digested with DNase I then extracted. Chromatin ends were repaired and ligated to a biotinylated bridge adapter followed by proximity ligation of adapter-containing ends. After proximity ligation, crosslinks were reversed and the DNA was purified from proteins. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The sequencing library was generated using Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of the library. The library was sequenced on an Illumina NextSeq 2000 platform to generate 400 million 2 x 150 bp read pairs.

Short-read DNA sequencing

High molecular weight genomic DNA was isolated from brain tissue (optic lobe) using NEB Monarch gDNA Purification Kit (T3010S). First, 10 mg of flash-frozen tissue were submerged in 100 µl of tissue lysis buffer, cut into small pieces and transferred into a 1.5 ml tube containing an additional 100 µl of lysis buffer. Then, 3 µl of 20 U/µl Proteinase K was added to the tissue sample and incubated for 2 h at 56°C in a thermal mixer with agitation until tissue pieces were completely dissolved. After incubation, tissue lysate was purified from the debris by centrifugation for 3 min at maximal speed (Eppendorf, Germany). Clean supernatant was transferred to a new 1.5 ml tube and incubated with 3 µl RNase A for 5 min at 56°C with agitation at full speed. After RNase A incubation tissue lysate was column purified and eluted with 80 µl of elution buffer according to the manufacturer protocol. The purity of gDNA was assessed using A260/280 and A260/230 absorbance ratios measured by NanoDrop spectrophotometer (ThermoFisher Scientific). The integrity of the obtained gDNA was checked using 0,75% agarose gel electrophoresis by loading 100 ng of gDNA sample and using Hind III digested Lambda DNA (NEB#3012) as a marker.

DNA sequencing libraries were prepared from 500 ng of high MW gDNA using Illumina DNA PCR-Free Tagmentation Library Prep Kit (20041794) and UD Indexes Set A (20026121). Obtained dual-indexed single-stranded libraries were quantified using the Qubit single-stranded DNA (ssDNA) assay kit (Q10212, ThermoFisher Scientific). 1000 pM of the final library were run on P3 reagents and Illumina NextSeq2000 using Illumina DNA PCR-Free Sequencing and Indexing primer (20041797) with 151 bp paired-end reads.

Short-read RNA library preparation and sequencing

For short-read RNA sequencing, tissue from anoth er animal (8-month-old adult, F0 from eggs collected in Normandie, France) was used. RNA was isolated from 5 mg of flash-frozen tissues (brain, skin and retina). First, tissue samples were homogenized in TRIzol using a homogenizer (VDI12/S12N-5S, VWR, Germany) following DNAseI treatment and column purification according to the Direct-zol RNA Micro prep kit from ZymoResearch (R2062). The RNA integrity and quantity were measured with the Qubit fluorometer (Invitrogen, Q33216) and the 2100 Bioanalyzer (Agilent Technologies, Germany).

RNA-seq libraries were prepared from 300 ng of total RNA, using the Illumina TruSeq stranded mRNA library prep kit (Cat:20020594) and IDT for Illumina xGen UDI-UMI Adapters (Cat:10005903). Libraries were sequenced on an Illumina NextSeq500 (Mid output, 300 cyc) and NextSeq2000 (P3, 300 cyc) with 145 bp paired-end reads.

Nuclear genome assembly

Before assembly, PacBio HiFi reads were subjected to additional trimming to remove residual adapter sequences using NCBI’s VecScreen tool. Genome size was estimated using GenomeScope72 from Illumina short reads and PacBio HiFi data. The primary assembly was generated with hifiasm124 using a combination of HiFi and Hi-C reads.

Scaffolding was performed iteratively with YAHS77 on the phased haplotype 1. We adjusted the run parameters for scaffolding resolutions (-r 1000,2000,5000,10000,20000,30000,50000,70000,100000,150000,200000,250000,300000, 350000,400000,500000,700000,1000000,2000000,5000000,10000000,15000000,20000000,30000000,40000000,50000000,60000000,70000000,80000000,100000000,110000000,120 000000,150000000,170000000,200000000,500000000), repetitions per resolution (-R3), minimum read mapping quality (-q1) and telomeric sequence (--telo-motif “TTAGGG”) from the only experimentially determined telomere motif for cephalopods (Amphioctopus areolatus125, accessed via TeloBase126) to optimize the tool’s performance for a large, fragmented assembly. Finally, the assembly was manually curated using JBAT78 to place residual scaffolds into chromosome scaffolds. Different versions of the assembly were evaluated based on HiC coverage, mRNA alignment and completeness based on BUSCO75 v5.5.0 (metazoa_odb10 and mollusca_odb10).

Mitochondrial genome assembly

To assemble the mitochondrial genome, we aligned a published S. officinalis mitochondrial genome127 (NC_007895.1) to the PacBio Hi-Fi and Omni-C reads using minimap284. The hits were extracted using seqtk128 subseq and assembled using hifiasm124. The resulting contigs contained multiple repeats of the circular mitochondrial genome, and were aligned back to NC_007895.1 to extract the final sequence.

Nuclear genome annotation

Repetitive elements in the genome were softmasked using RepeatMasker v4.1.7-p189 (with rmblast v2.14.1+) after creating a custom repeat library with RepeatModeler v.2.0.688.

Gene models were created using BRAKER375,91106, run via the Docker container, on the softmasked genome using both RNA-seq and protein data. We used long-read Iso-Seq data and short-read Illumina RNA-seq data generated for this study (see above). For protein data, publicly available peptide annotations for Doryteuthis pealeii46, Euprymna scolopes107, Octopus bimaculoides46, Octopus vulgaris43, Nautilus pompilius108, Pecten maximus51 were used for training, as well as a previously generated proteome for Sepia officinalis from StringTie gene models (this study). RNA-seq data was input into BRAKER3 using the —bam option, protein files were specified with the–prot_seq option. The BUSCO completeness of the resulting gene set was maximized using TSEBRA within BRAKER on the BUSCO lineage metazoa_odb10.

For the initial protein set used as input for BRAKER, the short and long RNA reads were aligned to the genome using minimap284. Transcript models were predicted with StringTie v3.0.0129 using the --conservative and --mix options. The resulting GTF files were combined using the transcript merge mode, resulting in a set of non-redundant transcripts. Finally, TransDecoder130 v5.7.0 was run with default parameters to translate coding regions in the transcripts.

The final gene annotations for S. officinalis were assessed for completeness using OMArk v.0.3.0109 with the ancestral clade Lophotrochozoa and without including splice information, accessed via their webserver. Proteins were annotated with orthology information using InterProScan110 v5.73-104 including lookup of annotations and GO terms (options -iprlookup -goterms). Further orthology information was added using the eggNOG-mapper v2.1.12111 webserver with the eggNOG v5.0 database and default parameters.

Whole genome alignment and synteny analysis

For whole genome alignments, the assembly produced for Sepia officinalis in this study and the published assembly from the Darwin Tree of Life project (GCA_964300435.1) were aligned using minimap284 and visualized with a custom script in R v4.4.2131.

For synteny analyses, peptide and gtf files were downloaded for Doryteuthis pealeii and Euprymna scolopes. Annotation files for Sepia esculenta were recently generated by liftover annotation from Sepia pharaonis87 and downloaded from Zenodo86. Synteny analyses between all chromosomes of the compared species were performed using the R package GENESPACE v.1.2.3132. S. officinalis was used as the reference species for all synteny maps.

Coverage analysis

Read coverage across chromosomes was analyzed as described recently87. Short reads were aligned using STAR v2.7.11b133 to our chromosome-scale assembly. For Sepia esculenta, short reads (ERR12945500) and assembly (GCA_964036315.1) were downloaded from NCBI. The sequencing depth was calculated using mosdepth134 by a window size of 500,000 bp and normalized to the median coverage of the first chromosome.

HapHiC scaffolding for different numbers of expected chromosomes show 47 chromosomes as most supported.

Hi-C contact maps from HapHiC76 are shown for 46, 47, 48, 49 and 50 expected chromosome scaffolds. Assembled chromosomes are shown as blue boxes, HiC signal indicating a false (unsupported) merger is shown by cyan arrow, false splits are shown by black arrows.

Comparison of Sepia officinalis chromosome-scale assemblies.

A) HiC heatmap of GCA_964300435.1 scaffolded against Omni-C data collected from a different individual (this study). Assembled chromosomes are shown as blue boxes, HiC signal indicating a false merge is shown by cyan arrow, false splits are shown by black arrows. B) Whole-genome alignment of both S. officinalis assemblies (this study: rows, GCA_964300435.1: columns).

Syntenic relationship between S. officinalis and D. pealeii chromosomes.

Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes, gridlines are shown every 1000 genes.

Syntenic relationship between S. officinalis and E. scolopes chromosomes.

Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes, gridlines are shown every 1000 genes. E. scolopes chromosomes 45 and 46 are not shown because they contain too few orthogroups.

Syntenic comparison of four decapod species hints at a cephalopod sex chromosome.

A) Riparian plot showing synteny relationships of chromosomes from four decapod species, generated using orthogroups. Euprymna chromosomes 45 and 46 are not shown because they contain too few orthogroups. Chromosome split in S. officinalis compared to other species is shown in purple, putative sex chromosome as identified recently87 is shown in cyan. B) Normalized coverage of sequencing data in S. officinalis chromosomes. C) Normalized coverage of short reads to female S. esculenta genome, reproduced from87. Decrease in read coverage for chromosome 46 visible, the putative Z sex chromosome. Read depth was calculated from Illumina gDNA reads in windows of 500,000 bp and normalized to the median coverage of chromosome 1. Box plots showing median divergence (box dividing line), interquartile range (box), and 1.5 times the interquartile range (whiskers). Putative Z chromosome is highlighted in cyan.

Acknowledgements

We thank Bruno Huettel for PacBio and Omni-C sequencing. We thank Xitong Liang, Theodosia Woo and Mathieu Renard for help with tissue dissection. We thank Darrin Schultz, Dalila Destanović and Thea Rogers for helpful discussions and advice on manual curation and gene modeling.

This work was funded by the Max Planck Society (GL) and the European Research Council (GL; ERC grant CAMOUFLAGE, 101141501). O.S. was supported by the ERC’s Horizon 2020: European Union Research and Innovation Programme, grant No. 945026.

All authors declare no conflict of interest. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

Additional information

Data availability

The genome assembly and raw sequencing reads can be found at the BioProject PRJNA1091451 on NCBI. The code for the genome assembly and annotation is available at https://gitlab.mpcdf.mpg.de/mpibr/laur/cuttlefishomics/soffgenome.

Genome annotation files are deposited at https://public.brain.mpg.de/Laurent/sepoff/annotation/.

Author contributions

O.S., and G.L. conceived of the study design. S.R., O.S. and G.L. wrote the first draft of the manuscript. D.H. collected tissue for sequencing. E.C. extracted DNA and RNA from tissue and performed Illumina sequencing. G.T. performed genome assembly. G.T. and S.R. manually curated and annotated the genome. S.R., G.T., and O.S. performed genomic analyses. S.R. created the figures. S.R., O.S. and G.L. coordinated and led the project. G.L. acquired funding for this study. All authors contributed to the review and editing of the manuscript.

Funding

European Research Council

https://doi.org/10.3030/101141501