Figures and data in Chromosome-scale genome assembly of the European common cuttlefish Sepia officinalis

7 figures, 4 tables and 1 additional file

Figures

Figure 1 with 1 supplement

*Sepia officinalis* assembly statistics and quality control.

(A) Specimen of *S. officinalis* (credit: Stephan Junek, MPI for Brain Research). (B) Overview of the genome assembly workflow. Genome size was estimated from short DNA reads (Illumina) using GenomeScope (Ranallo-Benavidez et al., 2020; Vurture et al., 2017). The primary assembly was generated from long DNA reads (PacBio Sequel II) and chromatin conformation capture (Hi-C) reads (Dovetail Omni-C) with hifiasm (Cheng et al., 2021). The assembly was scaffolded with YAHS (Zhou et al., 2023), and residual small scaffolds were manually placed in chromosomes. (C) Snail plot of the chromosome-scale *S. officinalis* assembly generated using blobtools2 (Challis et al., 2020), showing scaffold statistics (e.g. number of scaffolds, scaffold N50 length), base composition, and completeness measured using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Simão et al., 2015) against the *metazoa_odb12* database. (D) Hi-C heatmap showing the 47 chromosome-scale scaffolds, with few sequences remaining in unplaced scaffolds. The x- and y-axes show the genome position in Mbp. The heatmap was generated using juicebox (Dudchenko et al., 2018); 0–7039 observed counts (balanced) are shown.
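
As a reading aid for the N50 value shown in the snail plot: N50 is the scaffold length at which scaffolds of that length or longer contain at least half of all assembled bases. The minimal sketch below computes it. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the assembly or from blobtools2.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (not from the S. officinalis assembly)
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70
```

In this toy example the plain median length is 40 while the N50 is 70, which shows why the two statistics should not be read as interchangeable.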

Figure 1—figure supplement 1

HapHiC scaffolding for different numbers of expected chromosome scaffolds shows 47 chromosomes as most supported.

Hi-C contact maps from HapHiC (Zeng et al., 2024) are shown for 46, 47, 48, 49, and 50 expected chromosome scaffolds. Assembled chromosomes are shown as blue boxes; Hi-C signal indicating a false (unsupported) merger is marked by a cyan arrow, and false splits are marked by black arrows. The contact maps differ from the map shown in Figure 1, which was created using YAHS and manual curation.

Figure 2 with 2 supplements

Comparison of two *Sepia officinalis* chromosome-scale assemblies indicates chromosome number of 1n=46.

Datasets were collected from two *S. officinalis* animals, one as described in this study (MPIBR), the second by the Darwin Tree of Life consortium (DToL) (Blaxter et al., 2022). Both datasets were assembled using a common pipeline (hifiasm and YAHS). (A) Hi-C contact map of the MPIBR primary assembly, scaffolded using YAHS without manual curation. The 47 assembled chromosome scaffolds are shown as blue boxes. (B) Hi-C contact map of the DToL primary assembly, scaffolded using YAHS without manual curation, showing 49 assembled chromosome scaffolds as blue boxes. (C) Whole-genome alignment of both scaffolded assemblies using Winnowmap2 (Jain et al., 2022), showing DToL on the x-axis and MPIBR on the y-axis. The four ‘breakpoints’ of chromosomes in either of the assemblies (three breaks in DToL chromosomes compared to MPIBR, one break in MPIBR compared to DToL) are highlighted in different colors. (D) Ribbon diagram showing the four breakpoints from (C) compared to the chromosome-scale assembly from another cuttlefish, *Acanthosepion esculentum* (1n=46). The colors of the breakpoints are the same in panels C and D.

Figure 2—figure supplement 1

BUSCO completeness results.

(A) Comparison of two *S. officinalis* chromosome-scale assemblies, which were constructed from two independent datasets (this study: MPIBR, Darwin Tree of Life project: DToL), assembled using a common pipeline (hifiasm (Cheng et al., 2021) with PacBio HiFi and Hi-C reads). Results are shown for the *metazoa_odb12* database; the zoom-in shows only the duplicated, fragmented, and missing fractions to improve readability. The DToL assemblies have slightly higher completeness than MPIBR, due to the higher sequencing coverage used as input. In both datasets, compared to the primary assembly (‘.hic’), the phased haplotypes (‘.hic.hap1’ and ‘.hic.hap2’) have fewer duplicated but more missing genes. (B) BUSCO results for the *mollusca_odb12* database, showing the same trend as in (A). (C) Comparison of different BUSCO databases *odb10* and *odb12* on the manually curated assembly (‘sepoff241117’). For the *mollusca* gene sets (top), a strong improvement in completeness was observed between *odb10* and *odb12*, reflecting that the updated gene set is more concise and conserved across species. For the *metazoa* gene sets (bottom), the completeness was marginally increased for *odb12* compared to *odb10*.

Figure 2—figure supplement 2

Analysis of raw data at breakpoints between *S. officinalis* assemblies hints at a technical cause of breakpoints.

(A) Coverage of Hi-C and HiFi data shown for pairs of scaffolds exhibiting breakpoints. Blue shows MPIBR data, orange shows Darwin Tree of Life project (DToL) data. For each breakpoint, trans Hi-C contacts are shown on top across the full scaffold, with terminal 200 kb windows highlighted in yellow. Both terminal windows are shown below with aligned HiFi reads (gray horizontal bars) and normalized HiFi read density. Trans Hi-C contacts are shown as purple dots. Right gray box: same data shown for the complete breakpoint scaffold of the other assembly, with trans Hi-C contacts calculated to a size-matched scaffold. (B) Distribution of normalized trans Hi-C contact rate (pairs per Mb²) for random scaffold pairs (‘background pairs,’ gray) and within scaffolds (‘intra scaffold,’ green) for MPIBR (left) and DToL (right) data. Values for scaffolds with breakpoints are indicated in blue and orange, respectively. (C) Histogram of contact rates from (B) shown for random scaffold pairs and breakpoint pairs. Contact rates and empirical p-values of breakpoint pairs are indicated in blue (left, MPIBR) and orange (right, DToL). The joint p-value for the three rates for DToL breakpoints is indicated in the box (Wilcoxon rank-sum, one-tailed). (D) Repeat analysis of 200 kb scaffold ends at breakpoints and control scaffolds (gray box). Overall repeat content (% of base pairs) and type are shown.
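
The empirical p-values in panel (C) compare the contact rate of a breakpoint pair against the distribution of rates from random scaffold pairs. The caption does not state the exact formula, so the sketch below assumes a common definition: the share of background values at least as large as the observed one, with an add-one correction. The function name and all numbers are hypothetical.

```python
def empirical_p_value(observed, background):
    """One-sided empirical p-value of an observed contact rate.

    Counts how many background rates are >= the observed rate and applies
    an add-one correction so the result is never exactly zero
    (an assumed convention, not confirmed by the source).
    """
    if not background:
        raise ValueError("background must not be empty")
    at_least = sum(1 for value in background if value >= observed)
    return (at_least + 1) / (len(background) + 1)


# Hypothetical background contact rates for random scaffold pairs
background_rates = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.12, 0.18, 0.22]
print(empirical_p_value(0.28, background_rates))  # prints 0.2
```

With only nine background values the smallest attainable p-value is 0.1, which illustrates why a large set of random scaffold pairs is needed for a meaningful comparison.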

Figure 3 with 3 supplements

Syntenic comparison of three decapod species.

(A) Taxonomy of selected cephalopod species showing their genome size (in gigabases, Gb) and haploid chromosome numbers. Taxonomy information was downloaded from the NCBI taxonomy browser; divergence times for Coleoidea and Decapodiformes are from Kröger et al., 2011 and for Sepiidae from López-Córdova et al., 2022. (B) Genome-wide syntenic relationship between chromosomes of *E. scolopes* (Albertin et al., 2022) (top), *D. pealeii* (Albertin et al., 2022) (middle), and *S. officinalis* (bottom). Colored braids connect syntenic regions across genomes, with chromosomes drawn to physical scale. *Euprymna* chromosomes 45 and 46 are not shown because they contain too few orthogroups. (C) Detailed synteny of *Sepia* chromosomes 40 (magenta) and 43 (dark blue), which are joined in the other species and account for the different haploid chromosome number in *Sepia*. Riparian plots were generated using GENESPACE v1.2.3 (Lovell et al., 2022).

Figure 3—figure supplement 1

Syntenic relationship between *S. officinalis* and *D. pealeii* chromosomes.

Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes; only chromosome pairs with a minimum synteny score of 10 and at least 10 syntenic genes are shown. Synteny analysis and visualization were performed using GENESPACE v1.2.3 (Lovell et al., 2022).

Figure 3—figure supplement 2

Syntenic relationship between *S. officinalis* and *E. scolopes* chromosomes.

Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes; only chromosome pairs with a minimum synteny score of 10 and at least 10 syntenic genes are shown. *E. scolopes* chromosomes 45 and 46 are not shown because they contain too few orthogroups. Synteny analysis and visualization were performed using GENESPACE v1.2.3 (Lovell et al., 2022).

Figure 3—figure supplement 3

Syntenic comparison of four decapod species hints at a cephalopod sex chromosome.

(A) Riparian plot showing synteny relationships of chromosomes from four decapod species, generated using GENESPACE (Lovell et al., 2022) with orthogroups. *Euprymna* chromosomes 45 and 46 are not shown because they contain too few orthogroups. The chromosome split in *S. officinalis* compared to the other species is shown in purple; the putative sex chromosome identified recently (Coffing et al., 2025) is shown in cyan. (B) Normalized coverage of sequencing data in *S. officinalis* chromosomes. (C) Normalized coverage of short reads aligned to the female *A. esculentum* genome, reproduced from Coffing et al., 2025. A decrease in read coverage is visible for chromosome 46, the putative Z sex chromosome. Read depth was calculated from Illumina gDNA reads in windows of 500,000 bp and normalized to the median coverage of chromosome 1. Box plots show median divergence (box dividing line), interquartile range (box), and 1.5 times the interquartile range (whiskers). The putative Z chromosome is highlighted in cyan. Chromosomes with significantly reduced read coverage (orange label) were identified by a one-sided Wilcoxon rank-sum test of each chromosome’s normalized depth windows against all remaining chromosomes (Benjamini-Hochberg-corrected, at least 10% decrease in median normalized depth; *p<0.05, **p<0.01, ***p<0.001).
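
The depth normalization and multiple-testing steps described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the rank-sum test itself is omitted, so the raw p-values are placeholders, and the 10% criterion is assumed to be measured against the median of all remaining chromosomes. All names and numbers are hypothetical.

```python
from statistics import median


def normalize_depth(windows_by_chrom, reference="chr1"):
    """Divide every window depth by the median window depth of the reference."""
    ref_median = median(windows_by_chrom[reference])
    return {chrom: [depth / ref_median for depth in windows]
            for chrom, windows in windows_by_chrom.items()}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def median_drop(normalized, chrom):
    """Relative decrease of one chromosome's median versus all other windows."""
    rest = [d for other, windows in normalized.items()
            if other != chrom for d in windows]
    return 1 - median(normalized[chrom]) / median(rest)


# Hypothetical per-window depths (toy values, not real coverage data)
depth = {"chr1": [30, 31, 29, 30],
         "chr2": [30, 30, 31, 29],
         "chr46": [15, 16, 14, 15]}
norm = normalize_depth(depth)
raw_p = [0.9, 0.8, 0.001]  # placeholder p-values, one per chromosome
adj_p = dict(zip(depth, benjamini_hochberg(raw_p)))
flagged = [c for c in depth
           if adj_p[c] < 0.05 and median_drop(norm, c) >= 0.10]
print(flagged)  # prints ['chr46']
```

Combining a significance threshold with a minimum effect size, as the caption does, avoids flagging chromosomes whose coverage differs significantly but only marginally.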

Figure 4 with 1 supplement

Genome annotation for *Sepia officinalis*.

(A) Repeat landscape of the *S. officinalis* genome, annotated using RepeatModeler (Flynn et al., 2020). The full repeat landscape is shown on the left; annotated repeats (excluding unclassified or simple repeats) are shown on the right. (**B–C**) Quality control of gene annotation and comparison to two other cuttlefish species using OMArk (Nevers et al., 2022). Results are shown for *Acanthosepion lycidas* (GCA_963932145.1, Ensembl Genebuild), *Sepia officinalis* (BRAKER, this study), and *Acanthosepion pharaonis* (Song et al., 2021) (BRAKER). Lophotrochozoa was used as the ancestral clade. (B) Completeness assessed by the presence of genes conserved in the clade, classified as *single* or multiple copies (*duplicated*), or *missing*. (C) Consistency assessed by the proportion of proteins placed in the correct lineage (*consistent*); placement in incorrect lineages randomly (*inconsistent*) or to specific species (*contamination*), or no placement in known gene families (*unknown*). (D) Phylogenetic tree of 13 molluscan species used for analysis of gene families with OrthoFinder (Emms et al., 2025). Species are colored by clade: purple = coleoid cephalopods, blue = nautiloid (non-coleoid cephalopod), green = non-cephalopod mollusk. (E) Heatmap of the largest gene families (orthogroups from OrthoFinder, with more than 100 genes in any species), ordered from the largest gene count across all species on the left. Families with at least one gene in *S. officinalis* are depicted. Rows show gene counts for each species (color capped at 500 genes); columns show orthogroups and their annotation by eggNOG mapper (Cantalapiedra et al., 2021; Huerta-Cepas et al., 2017) or InterProScan (Blum et al., 2025), if available. Clade colors match (D).

Figure 4—figure supplement 1

Gene family expansion analysis.

(A) Gene family expansion analysis using CAFE5 (Mendes et al., 2021) with a gamma model (k=3) on all smaller gene families (fewer than 100 genes in any species). The 30 families with the largest change in different categories are shown (expanded only in *S. officinalis* (pink), in all coleoids (orange), in all species (yellow), in non-cephalopod mollusks (green), or overall contraction (blue)). Rows show change (expansion or contraction) of gene families in any species; columns show orthogroups and annotation, if available. Dots show significant change (p<0.05); gene counts are shown for any orthogroup with at least 12 genes in any species. (B) Gene families with differential expression in bulk RNA-seq data. Dot size shows the number of differentially expressed (DE) genes for each tissue. (C) Dotplots of enriched gene ontology (GO) terms for large gene families, identified with clusterProfiler using a hypergeometric test. Dot size shows the number of expressed genes per family with this GO term; the x-axis shows the percentage of expressed genes from all genes with this GO term. Dot color shows the adjusted p-value after Benjamini-Hochberg false discovery rate (FDR) correction. CC: cellular component, MF: molecular function, BP: biological process. (D) Heatmap of z-scored expression of all DE genes from the gene families with enriched GO terms.
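
For readers unfamiliar with z-scored heatmaps as in panel (D): each gene's expression values are centered on the gene's mean and scaled by its standard deviation, so that genes with very different absolute expression can share one color scale. The sketch below assumes scaling per gene (row) across samples with the sample standard deviation; the source does not specify these details, and the values are hypothetical.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Z-score each row (gene) across its columns (samples).

    Rows without variation are returned as zeros to avoid division by zero.
    Each row needs at least two values.
    """
    scored = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        if sd == 0:
            scored.append([0.0] * len(row))
        else:
            scored.append([(x - mu) / sd for x in row])
    return scored


# Hypothetical expression values: two genes measured in three samples
expression = [[1.0, 2.0, 3.0],
              [5.0, 5.0, 5.0]]
print(zscore_rows(expression))  # prints [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```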

Figure 5

Expression of expanded gene families in tissue bulk RNA-seq data.

Bulk RNA-seq data were collected from one adult *S. officinalis* from different brain tissues (optic lobes - yellow, basal lobes - turquoise, vertical and subvertical lobes - orange, posterior subesophageal mass - purple), retina (red), and skin (blue, from the dorsal mantle). The tissue color code is identical throughout the figure. (A) Principal component analysis (PCA) of the data, showing the first 2 PCs, colored by tissue. (B) Barplot showing the number of differentially expressed (DE) genes (i.e. marker genes) for each tissue, calculated against all other tissues using DESeq2 (Love et al., 2014). (C) Largest gene families (orthogroups) with differential expression in bulk RNA-seq data. Dot size shows the number of DE genes for each tissue. Families with enriched gene ontology (GO) terms are highlighted in gray. (**D+E**) Dotplots of enriched gene ontology (GO) terms (Aleksander et al., 2026; Ashburner et al., 2000) for large gene families, identified with clusterProfiler (Xu et al., 2024) using a hypergeometric test. Dot size shows the number of expressed genes per family with this GO term; the x-axis shows the percentage of expressed genes from all genes with this GO term. Dot color shows the adjusted p-value after Benjamini-Hochberg false discovery rate (FDR) correction. CC: cellular component, MF: molecular function, BP: biological process. (F) Heatmap of z-scored expression of all DE genes from the largest gene families with enriched GO terms.
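
The hypergeometric test behind the GO enrichment dotplots asks how likely it is to see at least the observed number of genes with a given GO term in a gene family if the family were a random draw from the background. The following sketch computes that upper-tail probability with the standard library only. It is not clusterProfiler's implementation, and the parameter names and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, family_size, term_size, background_size):
    """Upper-tail hypergeometric probability P(X >= hits).

    hits            -- family genes annotated with the GO term
    family_size     -- genes in the family (the draw)
    term_size       -- background genes annotated with the GO term
    background_size -- all genes in the background
    """
    upper = min(family_size, term_size)
    total = comb(background_size, family_size)
    tail = sum(comb(term_size, i)
               * comb(background_size - term_size, family_size - i)
               for i in range(hits, upper + 1))
    return tail / total


# Hypothetical counts: 2 of 3 family genes carry a term that 4 of 10
# background genes carry.
print(round(enrichment_p_value(2, 3, 4, 10), 4))  # prints 0.3333
```

Because one such test is run per GO term, the resulting p-values still need a correction such as the Benjamini-Hochberg procedure named in the caption.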

Author response image 1

Analysis of Hi-C read pairs from both *S. officinalis* assemblies.

Hi-C reads were aligned to the primary contigs from hifiasm (as is used for scaffolding with YAHS) and analyzed using pairtools. Note the higher fraction of long-range contacts (at least 1 kb cis pairs or trans pairs) in the MPIBR data (top) compared to DToL (bottom). Due to overall higher coverage, the absolute number of read pairs is higher for DToL than for MPIBR data.
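
The notion of long-range contacts used here (cis pairs at least 1 kb apart, or trans pairs) can be made concrete with a small sketch. pairtools reports such categories itself; the code below only illustrates the definition, and the function names, tuple layout, and toy pairs are assumptions made for this example.

```python
from collections import Counter


def classify_pair(chrom1, pos1, chrom2, pos2, min_distance=1000):
    """Label a read pair as trans, long-range cis, or short-range cis."""
    if chrom1 != chrom2:
        return "trans"
    if abs(pos2 - pos1) >= min_distance:
        return "cis_long"
    return "cis_short"


def long_range_fraction(pairs, min_distance=1000):
    """Fraction of pairs that are trans or cis with distance >= min_distance."""
    counts = Counter(classify_pair(*pair, min_distance=min_distance)
                     for pair in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no read pairs given")
    return (counts["cis_long"] + counts["trans"]) / total


# Hypothetical read pairs: (contig 1, position 1, contig 2, position 2)
toy_pairs = [("ctg1", 100, "ctg1", 400),
             ("ctg1", 100, "ctg1", 5000),
             ("ctg1", 100, "ctg2", 50),
             ("ctg2", 10, "ctg2", 900)]
print(long_range_fraction(toy_pairs))  # prints 0.5
```

Comparing fractions rather than absolute counts is what allows the two datasets to be contrasted despite their different sequencing depth.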

Author response image 2

Quantification of repeat content in chromosome scaffolds and unplaced residual scaffolds.

Density plot showing the fraction of repeat-masked bases in total sequence length for chromosome scaffolds (i.e. scaffolds 1-47) in teal and all remaining small scaffolds (1840 scaffolds) in purple. Median repeat fractions are shown as vertical lines.
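
A per-scaffold repeat fraction of this kind can be derived from a soft-masked assembly, in which masked bases are written in lowercase. The sketch below assumes soft-masking and counts every character, including N, toward the total length; both points are assumptions about the calculation rather than details given in the caption, and the sequences are toy examples.

```python
def masked_fraction(sequence):
    """Fraction of lowercase (soft-masked) bases in a sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    masked = sum(1 for base in sequence if base.islower())
    return masked / len(sequence)


# Hypothetical soft-masked scaffolds (toy sequences)
scaffolds = {"scaffold_1": "ACGTacgtNN",
             "scaffold_2": "acgtacgtAC"}
for name, seq in scaffolds.items():
    print(name, masked_fraction(seq))
# prints:
# scaffold_1 0.4
# scaffold_2 0.8
```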

Tables

Table 1

Statistics of S. officinalis assemblies from two independent datasets, assembled using a common pipeline.

	MPIBR reassembly			DToL reassembly
version	p_ctg	hap1	hap2	p_ctg	hap1	hap2
number of contigs	8,289	10,651	10,425	8,783	11,026	11,089
raw length [bp]	6,049,669,443	5,675,386,986	5,662,586,038	6,053,996,452	5,721,157,269	5,950,565,264
N50 length [bp]	1,723,203	1,032,632	1,010,375	1,810,137	1,165,578	1,182,649
average contig length [bp]	729,843	532,850	543,173	689,285	518,878	536,618
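
The rows of Table 1 are linked by simple arithmetic: the average contig length is the raw length divided by the number of contigs. The short check below reproduces the reported averages from the other two rows. It assumes that averages were truncated to whole base pairs, which is an inference from the numbers rather than something the table states.

```python
# (number of contigs, raw length in bp, reported average length in bp),
# copied from Table 1
table_1 = {
    "MPIBR p_ctg": (8289, 6049669443, 729843),
    "MPIBR hap1": (10651, 5675386986, 532850),
    "MPIBR hap2": (10425, 5662586038, 543173),
    "DToL p_ctg": (8783, 6053996452, 689285),
    "DToL hap1": (11026, 5721157269, 518878),
    "DToL hap2": (11089, 5950565264, 536618),
}

# Integer division truncates to whole base pairs (assumed convention)
averages = {name: raw_length // contigs
            for name, (contigs, raw_length, _) in table_1.items()}

for name, (_, _, reported) in table_1.items():
    status = "matches" if averages[name] == reported else "differs"
    print(f"{name}: computed {averages[name]}, reported {reported} ({status})")
```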

Table 2

Overview of gene annotation of 13 molluscan species used for gene family analysis.

Organism Scientific Name	Accession	Source	URL	# of Proteins
Aplysia californica	GCF_000002075.1	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002075.1/	21897
Crassostrea virginica	GCF_053477285.1	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/gene/GCF_053477285.1/	53819
Doryteuthis pealeii	GCA_023376005.1	custom (Albertin et al., 2022)	https://metazoa.csb.univie.ac.at/CephData/dorPea.prot.gz	24931
Euprymna scolopes	GCA_024364805.1	Github (Rogers, 2025)	https://github.com/TheaFrances/E.scolopes-V2.2-BRAKER2-gene-annotation	31908
Gigantopelta aegis	GCF_016097555.1	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_016097555.1/	24904
Lottia gigantea	GCF_000327385.1	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000327385.1/	23822
Magallana gigas	GCF_963853765.1	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_963853765.1/	35231
Nautilus pompilius	GWHBECW00000000	GWH	https://ngdc.cncb.ac.cn/gwh/Assembly/21849/show	16536
Octopus bimaculoides	GCF_001194135.2	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001194135.2/	29037
Octopus vulgaris	GCA_951406725.2	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/gene/GCA_951406725.2/	30134
Pecten maximus	GCF_902652985.1	RefSeq	https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_902652985.1/	28975
Acanthosepion lycidas	GCA_963932145.1	Ensembl genebuild	https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sepia_lycidas/GCA_963932145.1/ensembl/geneset/2024_05/	35949
Sepia officinalis	GCA_050097725.1	this study	https://doi.org/10.17617/1.5n7h-4385	23768

Appendix 1—key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Biological sample (Sepia officinalis)	European cuttlefish, F1 individual	Eggs supplied by Flying Sharks – consultoria e inovação, Lda., Horta, Azores, Portugal		6-month-old adult; F1 from eggs collected in the Portuguese Atlantic; used for long-read DNA, Iso-Seq, Omni-C and short-read DNA library preparation
Biological sample (Sepia officinalis)	European cuttlefish, F0 individual	Eggs supplied by Université de Caen Normandie, France		8-month-old adult; F0 from eggs collected in Normandie, France; used for short-read RNA-seq library preparation
Commercial assay or kit	MagAttract HMW DNA Kit	Qiagen	Cat#67563	Genomic DNA extraction from flash-frozen brain tissue for PacBio HiFi library
Commercial assay or kit	NEB Monarch gDNA Purification Kit	New England Biolabs	Cat#T3010S	Genomic DNA extraction from optic lobe; used both for short-read Illumina library preparation and as template for sex-genotyping qPCR
Commercial assay or kit	Direct-zol RNA Miniprep Kit	Zymo Research	Cat#R2050	RNA isolation and DNase I treatment for Iso-Seq libraries
Commercial assay or kit	Direct-zol RNA Microprep Kit	Zymo Research	Cat#R2062	RNA isolation and DNase I treatment for short-read RNA-seq libraries
Commercial assay or kit	TeloPrime Full-Length cDNA Amplification Kit V2	Lexogen	Cat#013.08; Cat#013.24	Full-length cDNA synthesis targeting 5' cap and poly-A tail for Iso-Seq
Commercial assay or kit	SMRTbell express template prep kit 2.0	PacBio	Cat#100-938-900	Long-read library preparation for both HiFi DNA and Iso-Seq sequencing
Commercial assay or kit	Sequel II binding kit 2.2	PacBio	Cat#102-089-000	Used for HiFi DNA sequencing
Commercial assay or kit	Sequel II binding kit 2.1	PacBio	Cat#101-843-000	Used for Iso-Seq sequencing
Commercial assay or kit	Sequel II sequencing kit 2.0	PacBio	Cat#101-820-200	Used for PacBio Sequel II runs (HiFi DNA and Iso-Seq)
Commercial assay or kit	SMRT Cell 8M tray	PacBio	Cat#101-389-001	5 SMRT cells used for HiFi DNA sequencing; 2 SMRT cells for Iso-Seq (pooled tissues + optic lobe)
Commercial assay or kit	Dovetail Omni-C Kit	Dovetail Genomics	Cat#21005	Omni-C proximity ligation library prepared from brain tissue
Commercial assay or kit	Illumina DNA PCR-Free Tagmentation Library Prep Kit	Illumina	Cat#20041794	Short-read DNA library preparation from 500 ng of high-MW gDNA
Commercial assay or kit	IDT for Illumina DNA/RNA UD Indexes, Set A	Illumina	Cat#20026121	Dual indexes for short-read DNA library
Commercial assay or kit	Illumina DNA PCR-Free Sequencing and Indexing primer	Illumina	Cat#20041797	Used during NextSeq2000 P3 sequencing of short-read DNA library
Commercial assay or kit	Qubit ssDNA Assay Kit	Thermo Fisher Scientific	Cat#Q10212	Quantification of dual-indexed single-stranded short-read DNA libraries
Commercial assay or kit	Illumina TruSeq Stranded mRNA Library Prep Kit	Illumina	Cat#20020594	Short-read RNA-seq libraries prepared from 300 ng total RNA
Commercial assay or kit	IDT for Illumina xGen UDI-UMI Adapters	Integrated DNA Technologies (IDT)	Cat#10005903	Adapters used with TruSeq Stranded mRNA library prep
Commercial assay or kit	Illumina NextSeq500 mid output flow cell (300 cycles)	Illumina	Cat#20024905	Used for short-read RNA sequencing
Commercial assay or kit	Illumina NextSeq2000 P3 flow cell (300 cycles)	Illumina	Cat#20040561	Used for short-read RNA and DNA sequencing
Commercial assay or kit	KAPA SYBR FAST qPCR Master Mix (2×Universal)	Roche / KAPA Biosystems	Cat#KK4600	qPCR master mix used for sex-chromosome genotyping qPCR
Chemical compound, drug	TRIzol Reagent	Invitrogen / Thermo Fisher Scientific	Cat#15596026	Homogenization reagent for RNA isolation from flash-frozen tissues (Iso-Seq and short-read RNA-seq)
Sequence-based reagent	SepOff_chr2_auto_G2_F (qPCR primer)	Rubino et al., 2025; 10.1101/2025.10.28.685099		5'-TTTGCCACTGTGTCCCTTTATAC-3'; forward primer targeting an autosomal locus on chromosome 2; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting
Sequence-based reagent	SepOff_chr2_auto_G2_R (qPCR primer)	Rubino et al., 2025; 10.1101/2025.10.28.685099		5'-ACACACACAGGCTGCTTATTG-3'; reverse primer targeting an autosomal locus on chromosome 2; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting
Sequence-based reagent	SepOff_chr46_sex_H2_F (qPCR primer)	Rubino et al., 2025; 10.1101/2025.10.28.685099		5'-TTTCAACCCATCTGCGTCTATAG-3'; forward primer targeting a sex-chromosomal locus on chromosome 46; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting
Sequence-based reagent	SepOff_chr46_sex_H2_R (qPCR primer)	Rubino et al., 2025; 10.1101/2025.10.28.685099		5'-ACTCCTCTCGTTGCATGATTAC-3'; reverse primer targeting a sex-chromosomal locus on chromosome 46; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting
Other	Lambda DNA-HindIII Digest	New England Biolabs	Cat#3012	Molecular weight ladder; 100 ng loaded alongside gDNA on 0.75% agarose gel to assess DNA integrity
Other	Hard-Shell 96-Well PCR Plates	Bio-Rad	Cat#HSP9601	96-well qPCR plates used for sex-chromosome genotyping
Other	Microseal 'B' PCR Plate Sealing Film	Bio-Rad	Cat#MSB1001	Adhesive sealing film used to seal 96-well qPCR plates for sex-chromosome genotyping
Software, algorithm	VecScreen	NCBI; https://www.ncbi.nlm.nih.gov/tools/vecscreen/	RRID:SCR_016577	Adapter/vector trimming of PacBio HiFi reads prior to assembly
Software, algorithm	Meryl	Rhie et al., 2020; 10.1186/s13059-020-02134-9		k-mer counting (k=21) for k-mer distribution estimation; bundled with Merqury
Software, algorithm	Merfin	Formenti et al., 2022; 10.1038/s41592-022-01445-y		Provides Meryl wrapper used for k-mer distribution estimation
Software, algorithm	GenomeScope 2.0	Ranallo-Benavidez et al., 2020; 10.1038/s41467-020-14998-3	RRID:SCR_017014	Genome size estimation from Illumina short reads and PacBio HiFi data
Software, algorithm	hifiasm	Cheng et al., 2021; 10.1038/s41592-020-01056-5	RRID:SCR_021069	Primary genome assembly from combined HiFi + Hi-C reads; also used for mitochondrial assembly
Software, algorithm	YAHS	Zhou et al., 2023; 10.1093/bioinformatics/btac808	RRID:SCR_022965	Hi-C scaffolding on phased haplotype 1 with custom -r/-R/-q/--telo-motif parameters
Software, algorithm	JBAT (Juicebox Assembly Tools)	Dudchenko et al., 2018; 10.1101/254797		Manual curation of scaffolds into chromosome-scale scaffolds
Software, algorithm	BUSCO v5.5.0	Simão et al., 2015; 10.1093/bioinformatics/btv351	RRID:SCR_015008	Assembly and annotation completeness assessment using metazoa_odb10, metazoa_odb12, mollusca_odb10 and mollusca_odb12 lineages
Software, algorithm	minimap2	Li, 2018; 10.1093/bioinformatics/bty191	RRID:SCR_018550	Used for: aligning mt genome reference NC_007895.1 to long reads; aligning short and long RNA reads to genome; aligning HiFi reads to scaffolded assemblies for coverage
Software, algorithm	seqtk	Li, 2013	RRID:SCR_018927	seqtk subseq used to extract reads matching mt genome reference for mitochondrial assembly
Software, algorithm	RepeatMasker v4.1.7-p1	Smit et al., 2025; http://www.repeatmasker.org	RRID:SCR_012954	Soft-masking of repetitive elements (-xsmall, -gff); also used to characterize repeat content at scaffold junctions; run with rmblast v2.14.1+
Software, algorithm	RepeatModeler v2.0.6	Flynn et al., 2020; 10.1073/pnas.1921046117	RRID:SCR_015027	De novo repeat library construction (without LTRstruct option)
Software, algorithm	BRAKER3 (incl. TSEBRA)	Simão et al., 2015; Hoff et al., 2019; Brůna et al., 2021; Gabriel et al., 2021; Hoff et al., 2016; Stanke et al., 2006; Stanke et al., 2008; Li, 2023; Iwata and Gotoh, 2012; Gotoh, 2008; Buchfink et al., 2015; Kovaka et al., 2019; Huang and Li, 2023; Pertea and Pertea, 2020; Gabriel et al., 2024; 10.1007/978-1-4939-9173-0_5	RRID:SCR_018964	Gene model prediction via Docker container on softmasked genome; used both RNA-seq (--bam) and protein (--prot_seq) input; UTRs added with --addUTR=on; TSEBRA tuned to maximize BUSCO completeness on metazoa_odb10
Software, algorithm	StringTie v3.0.0	Shumate et al., 2022; 10.1371/journal.pcbi.1009730	RRID:SCR_016323	Transcript model prediction with --conservative and --mix options; GTFs merged with transcript merge mode
Software, algorithm	TransDecoder v5.7.0	Haas, 2026; https://github.com/TransDecoder/TransDecoder	RRID:SCR_017647	Translation of coding regions in transcripts (default parameters)
Software, algorithm	OMArk v0.3.0	Nevers et al., 2022; 10.1101/2022.11.25.517970		Annotation completeness assessment; ancestral clade Lophotrochozoa; run on webserver without splice information
Software, algorithm	InterProScan v5.73–104	Blum et al., 2025; 10.1093/nar/gkae1082	RRID:SCR_005829	Protein orthology and GO annotation with options -iprlookup -goterms
Software, algorithm	eggNOG-mapper v2.1.12	Cantalapiedra et al., 2021; 10.1093/molbev/msab293	RRID:SCR_021165	Functional/orthology annotation via webserver with eggNOG v5.0 database, default parameters
Software, algorithm	Winnowmap2	Jain et al., 2022; 10.1038/s41592-022-01457-8	RRID:SCR_025349	Whole-genome pairwise alignments of S. officinalis and A. esculentum (GCA_964036315.1) assemblies
Software, algorithm	R v4.4.2	R Development Core Team, 2024	RRID:SCR_001905	Statistical environment for downstream analyses and visualization (whole-genome alignment plots and other custom scripts)
Software, algorithm	GENESPACE v1.2.3	Lovell et al., 2022; 10.7554/eLife.78526		Pairwise synteny analysis across all chromosomes of compared species with default parameters; riparian plots and pairwise dotplots
Software, algorithm	DIAMOND2	Buchfink et al., 2015; 10.1038/nmeth.3176	RRID:SCR_016071	Protein sequence similarity in fast mode within GENESPACE
Software, algorithm	OrthoFinder v2.5	Emms and Kelly, 2019; 10.1186/s13059-019-1832-y	RRID:SCR_017118	Orthogroup and pairwise orthologue inference with hierarchical orthogroups (HOGs); used within GENESPACE
Software, algorithm	OrthoFinder v3.1.0	Emms et al., 2025; 10.1101/2025.07.15.664860	RRID:SCR_017118	Orthogroup inference across 13 molluscan species for gene family expansion analysis; default parameters; rooted species tree generated via STAG (Ponte et al., 2023) and STRIDE (Andrews et al., 2013)
Software, algorithm	MCScanX	Wang et al., 2012; 10.1093/nar/gkr1293	RRID:SCR_022067	Pairwise syntenic block identification (onlyOgAnchors = TRUE, blkSize = 5, nGaps = 5, blkRadius = 25, synBuff = 100, nSecondaryHits = 0)
Software, algorithm	dbscan (R package)	Hahsler et al., 2019; 10.18637/jss.v091.i01		Density-based clustering of MCScanX anchor hits into syntenic regions
Software, algorithm	bwa-mem2 v2.3	Vasimuddin et al., 2019; 10.1109/IPDPS.2019.00041	RRID:SCR_022192	Alignment of Hi-C reads for breakpoint coverage analysis
Software, algorithm	pairtools v1.1.0	Abdennur et al., 2023; 10.1101/2023.02.13.528389	RRID:SCR_023038	Quantification of Hi-C contacts; extraction of trans pairs from deduplicated read pairs (pair type UU)
Software, algorithm	pysam v0.22.1	pysam-developers, 2026; https://github.com/pysam-developers/pysam	RRID:SCR_021017	HiFi read depth via count_coverage (MAPQ ≥10, 1 kb bins); spanning reads identified by querying split alignments
Software, algorithm	STAR v2.7.11b	Dobin et al., 2013; 10.1093/bioinformatics/bts635	RRID:SCR_004463	Alignment of short reads to chromosome-scale assembly (sex chromosome analysis and RNA-seq)
Software, algorithm	mosdepth	Pedersen and Quinlan, 2018; 10.1093/bioinformatics/btx699	RRID:SCR_018929	Sequencing coverage calculation for sex chromosome analysis
Software, algorithm	ape v5.8.1 (R package)	Paradis and Schliep, 2019; 10.1093/bioinformatics/bty633	RRID:SCR_017343	Conversion of rooted OrthoFinder species tree to ultrametric tree
Software, algorithm	CAFE5 v5.1.1	Mendes et al., 2021; 10.1093/bioinformatics/btaa1022	RRID:SCR_018924	Gene family evolution rate estimation
Software, algorithm	bedtools v2.30	Quinlan and Hall, 2010; 10.1093/bioinformatics/btq033	RRID:SCR_006646	bedtools intersect for CDS–RepeatMasker overlap analysis of expanded gene family members
Software, algorithm	featureCounts (Subread v2.0.8)	Liao et al., 2014; 10.1093/bioinformatics/btt656	RRID:SCR_012919	Gene-level read counting from STAR-aligned RNA-seq BAMs (-t exon, -g gene_id, -p --countReadPairs, -Q 255)
Software, algorithm	DESeq2 v1.42.0	Love et al., 2014; 10.1186/s13059-014-0550-8	RRID:SCR_015687	Tissue marker identification in bulk RNA-seq data
Software, algorithm	apeglm	Zhu et al., 2019; 10.1093/bioinformatics/bty895	RRID:SCR_026951	log2 fold-change shrinkage applied to DESeq2 results
Software, algorithm	clusterProfiler v4.12.6	Yu et al., 2012; 10.1089/omi.2011.0118	RRID:SCR_016884	GO enrichment via enricher() with custom GO annotations from InterProScan

Author response table 1

	soff250801_mpibr			soff250801_dtol
contigs	p_ctg	hap1	hap2	p_ctg	hap1	hap2
records	8,289	10,651	10,425	8,783	11,026	11,089
length.raw	6,049,669,443	5,675,386,986	5,662,586,038	6,053,996,452	5,721,157,269	5,950,565,264
length.min	2,496	2,959	2,959	5,151	5,999	6,915
length.n25	825,171	518,304	518,787	865,207	560,043	580,263
length.n50	1,723,203	1,032,632	1,010,375	1,810,137	1,165,578	1,182,649
length.n75	3,129,788	1,845,042	1,789,755	3,317,923	2,125,467	2,203,556
length.max	14,924,420	7,470,229	10,284,785	15,032,877	9,223,473	11,008,627
length.med	309,429	293,763	314,184	245,676	239,367	250,010
length.avg	729,843	532,850	543,173	689,285	518,878	536,618
length.top46	339,372,830	213,767,346	225,912,626	364,533,811	278,738,760	283,344,104
frac.top46 [%]	5.609774769	3.766568633	3.98956633	6.021374705	4.872069529	4.761633415
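
The last row of this table appears to be the share of the raw assembly length contained in the 46 longest contigs, expressed as a percentage: frac.top46 = 100 × length.top46 / length.raw. The sketch below recomputes it from the two rows above. Reading the row as a percentage is an inference from the numbers, not a statement from the source.

```python
# (length.top46 in bp, length.raw in bp, reported frac.top46),
# copied from the table above
rows = {
    "mpibr p_ctg": (339372830, 6049669443, 5.609774769),
    "mpibr hap1": (213767346, 5675386986, 3.766568633),
    "mpibr hap2": (225912626, 5662586038, 3.98956633),
    "dtol p_ctg": (364533811, 6053996452, 6.021374705),
    "dtol hap1": (278738760, 5721157269, 4.872069529),
    "dtol hap2": (283344104, 5950565264, 4.761633415),
}


def top_fraction_percent(top_length, raw_length):
    """Percentage of the raw assembly length held by the longest contigs."""
    return 100 * top_length / raw_length


computed = {name: top_fraction_percent(top, raw)
            for name, (top, raw, _) in rows.items()}

for name, (_, _, reported) in rows.items():
    print(f"{name}: computed {computed[name]:.4f} %, reported {reported:.4f} %")
```

In every column the 46 longest contigs hold only a few percent of the sequence, which illustrates how far the contig-level assemblies are from chromosome scale before Hi-C scaffolding.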

Additional files

MDAR checklist: https://cdn.elifesciences.org/articles/107393/elife-107393-mdarchecklist1-v1.docx

Cite this article

Simone Daniela Rencken, Georgi Tushev, David Hain, Elena Ciirdaeva, Oleg Simakov, Gilles Laurent (2026) Chromosome-scale genome assembly of the European common cuttlefish *Sepia officinalis*. eLife 14:RP107393. https://doi.org/10.7554/eLife.107393.3
