Chromosome-scale genome assembly of the European common cuttlefish Sepia officinalis

  1. Simone Daniela Rencken
  2. Georgi Tushev
  3. David Hain
  4. Elena Ciirdaeva
  5. Oleg Simakov
  6. Gilles Laurent (corresponding author)
  1. Max Planck Institute for Brain Research, Germany
  2. Radboud University, Donders Institute for Brain, Cognition and Behaviour, Netherlands
  3. Faculty of Biological Sciences, Goethe University, Germany
  4. Department of Neuroscience and Developmental Biology, University of Vienna, Austria
7 figures, 4 tables and 1 additional file

Figures

Figure 1 with 1 supplement
Sepia officinalis assembly statistics and quality control.

(A) Specimen of S. officinalis (credit: Stephan Junek, MPI for Brain Research). (B) Overview of the genome assembly workflow. Genome size was estimated from short DNA reads (Illumina) using GenomeScope (Ranallo-Benavidez et al., 2020; Vurture et al., 2017). The primary assembly was generated from long DNA reads (PacBio Sequel II) and chromatin conformation capture (Hi-C) reads (Dovetail Omni-C) with hifiasm (Cheng et al., 2021). The assembly was scaffolded with YAHS (Zhou et al., 2023) and residual small scaffolds were manually placed in chromosomes. (C) Snail plot of the chromosome-scale S. officinalis assembly generated using blobtools2 (Challis et al., 2020), showing scaffold statistics (e.g. number of scaffolds, scaffold N50 length), base composition, and completeness measured using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Simão et al., 2015) against the metazoa_odb12 database. (D) Hi-C heatmap showing the 47 chromosome-scale scaffolds, with few sequences remaining in unplaced scaffolds. The x- and y-axes show the genome position in Mbp. The heatmap was generated using Juicebox (Dudchenko et al., 2018); 0–7039 observed counts (balanced) are shown.

Figure 1—figure supplement 1
HapHiC scaffolding for different numbers of expected chromosome scaffolds shows 47 chromosomes as most supported.

Hi-C contact maps from HapHiC (Zeng et al., 2024) are shown for 46, 47, 48, 49, and 50 expected chromosome scaffolds. Assembled chromosomes are shown as blue boxes. Hi-C signal indicating a false (unsupported) merger is marked by a cyan arrow; false splits are marked by black arrows. The contact maps differ from the map shown in Figure 1, which was created using YAHS and manual curation.

Figure 2 with 2 supplements
Comparison of two Sepia officinalis chromosome-scale assemblies indicates chromosome number of 1n=46.

Datasets were collected from two S. officinalis animals, one as described in this study (MPIBR), the second by the Darwin Tree of Life consortium (DToL) (Blaxter et al., 2022). Both datasets were assembled using a common pipeline (hifiasm and YAHS). (A) Hi-C contact map of the MPIBR primary assembly, scaffolded using YAHS without manual curation. The 47 assembled chromosome scaffolds are shown as blue boxes. (B) Hi-C contact map of the DToL primary assembly, scaffolded using YAHS without manual curation, showing 49 assembled chromosome scaffolds as blue boxes. (C) Whole-genome alignment of both scaffolded assemblies using Winnowmap2 (Jain et al., 2022), showing DToL on the x-axis and MPIBR on the y-axis. The four ‘breakpoints’ of chromosomes in either of the assemblies (three breaks in DToL chromosomes compared to MPIBR, one break in MPIBR compared to DToL) are highlighted in different colors. (D) Ribbon diagram showing the four breakpoints from (C) compared to the chromosome-scale assembly from another cuttlefish, Acanthosepion esculentum (1n=46). The breakpoint colors are the same in panels C and D.

Figure 2—figure supplement 1
BUSCO completeness results.

(A) Comparison of two S. officinalis chromosome-scale assemblies, which were constructed from two independent datasets (this study: MPIBR, Darwin Tree of Life project: DToL) and assembled using a common pipeline (hifiasm (Cheng et al., 2021) with PacBio HiFi and Hi-C reads). Results are shown for the metazoa_odb12 database; the zoomed-in view shows only the duplicated, fragmented, and missing fractions to improve readability. The DToL assemblies have slightly higher completeness than MPIBR, due to the higher sequencing coverage used as input. In both datasets, compared to the primary assembly (‘.hic’), the phased haplotypes (‘.hic.hap1’ and ‘.hic.hap2’) have fewer duplicated but more missing genes. (B) BUSCO results for the mollusca_odb12 database, showing the same trend as in (A). (C) Comparison of the BUSCO databases odb10 and odb12 on the manually curated assembly (‘sepoff241117’). For the mollusca gene sets (top), a strong improvement in completeness was observed between odb10 and odb12, reflecting that the updated gene set is more concise and conserved across species. For the metazoa gene sets (bottom), completeness was marginally higher for odb12 than for odb10.

Figure 2—figure supplement 2
Analysis of raw data at breakpoints between S. officinalis assemblies hints at a technical cause of breakpoints.

(A) Coverage of Hi-C and HiFi data shown for pairs of scaffolds exhibiting breakpoints. Blue shows MPIBR data, orange shows Darwin Tree of Life project (DToL) data. For each breakpoint, trans Hi-C contacts are shown on top across the full scaffold, with terminal 200 kb windows highlighted in yellow. Both terminal windows are shown below with aligned HiFi reads (gray horizontal bars) and normalized HiFi read density. Trans Hi-C contacts are shown as purple dots. Right gray box: same data shown for the complete breakpoint scaffold of the other assembly, with trans Hi-C contacts calculated to a size-matched scaffold. (B) Distribution of normalized trans Hi-C contact rate (pairs per Mb Young, 1963a) for random scaffold pairs (‘background pairs,’ gray) and within scaffolds (‘intra scaffold,’ green) for MPIBR (left) and DToL (right) data. Values for scaffolds with breakpoints are indicated in blue and orange, respectively. (C) Histogram of contact rates from (B) shown for random scaffold pairs and breakpoint pairs. Contact rates and empirical p-values of breakpoint pairs are indicated in blue (left, MPIBR) and orange (right, DToL). The joint p-value for the three DToL breakpoint rates is indicated in the box (Wilcoxon rank-sum, one-tailed). (D) Repeat analysis of 200 kb scaffold ends at breakpoints and control scaffolds (gray box). Overall repeat content (% of base pairs) and type are shown.
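The comparison of breakpoint pairs against background pairs in panels (B) and (C) can be sketched as follows. This is a minimal illustration, not the authors' script. The caption does not spell out the normalization, so dividing the pair count by the product of the two scaffold lengths in Mb is an assumption, as is the +1 pseudo-count in the empirical p-value. The function names and toy values are hypothetical.

```python
def trans_contact_rate(n_pairs, len_a_bp, len_b_bp):
    """Trans Hi-C pairs between two scaffolds, normalized by scaffold size.

    Assumption: the count is divided by the product of both lengths in Mb.
    """
    return n_pairs / ((len_a_bp / 1e6) * (len_b_bp / 1e6))


def empirical_p_value(observed, background):
    """One-sided empirical p-value: share of background rates >= observed."""
    at_least = sum(1 for rate in background if rate >= observed)
    # Assumption: +1 pseudo-count so a finite background never yields p = 0.
    return (at_least + 1) / (len(background) + 1)


if __name__ == "__main__":
    # Toy background rates for random scaffold pairs (not real data).
    background = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.95, 1.05]
    observed = trans_contact_rate(5000, 50_000_000, 20_000_000)
    print(observed, empirical_p_value(observed, background))
```

A breakpoint pair whose rate exceeds every background rate receives the smallest p-value the background size allows, which is why a larger set of random pairs gives finer resolution.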

Figure 3 with 3 supplements
Syntenic comparison of three decapod species.

(A) Taxonomy of selected cephalopod species showing their genome size (in gigabases, Gb) and haploid chromosome numbers. Taxonomy information was downloaded from the NCBI taxonomy browser; divergence times were taken from Kröger et al., 2011 for Coleoidea and Decapodiformes and from López-Córdova et al., 2022 for Sepiidae. (B) Genome-wide syntenic relationship between chromosomes of E. scolopes (Albertin et al., 2022) (top), D. pealeii (Albertin et al., 2022) (middle), and S. officinalis (bottom). Colored braids connect syntenic regions across genomes, with chromosomes drawn to physical scale. Euprymna chromosomes 45 and 46 are not shown because they contain too few orthogroups. (C) Detailed synteny of Sepia chromosomes 40 (magenta) and 43 (dark blue), which are joined in the other species and account for the different haploid chromosome number in Sepia. Riparian plots were generated using GENESPACE v1.2.3 (Lovell et al., 2022).

Figure 3—figure supplement 1
Syntenic relationship between S. officinalis and D. pealeii chromosomes.

Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes; only chromosome pairs with a minimum synteny score of 10 and at least 10 syntenic genes are shown. Synteny analysis and visualization were performed using GENESPACE v1.2.3 (Lovell et al., 2022).

Figure 3—figure supplement 2
Syntenic relationship between S. officinalis and E. scolopes chromosomes.

Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes; only chromosome pairs with a minimum synteny score of 10 and at least 10 syntenic genes are shown. E. scolopes chromosomes 45 and 46 are not shown because they contain too few orthogroups. Synteny analysis and visualization were performed using GENESPACE v1.2.3 (Lovell et al., 2022).

Figure 3—figure supplement 3
Syntenic comparison of four decapod species hints at a cephalopod sex chromosome.

(A) Riparian plot showing synteny relationships of chromosomes from four decapod species, generated using GENESPACE (Lovell et al., 2022) with orthogroups. Euprymna chromosomes 45 and 46 are not shown because they contain too few orthogroups. The chromosome split in S. officinalis compared to the other species is shown in purple; the putative sex chromosome identified recently (Coffing et al., 2025) is shown in cyan. (B) Normalized coverage of sequencing data in S. officinalis chromosomes. (C) Normalized coverage of short reads aligned to the female A. esculentum genome, reproduced from Coffing et al., 2025. A decrease in read coverage is visible for chromosome 46, the putative Z sex chromosome. Read depth was calculated from Illumina gDNA reads in windows of 500,000 bp and normalized to the median coverage of chromosome 1. Box plots show median divergence (box dividing line), interquartile range (box), and 1.5 times the interquartile range (whiskers). The putative Z chromosome is highlighted in cyan. Chromosomes with significantly reduced read coverage (orange label) were identified by a one-sided Wilcoxon rank-sum test of each chromosome’s normalized depth windows against all remaining chromosomes (Benjamini-Hochberg-corrected, at least 10% decrease in median normalized depth, *p<0.05, **p<0.01, ***p<0.001).
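The depth normalization and the effect-size filter described in this caption can be sketched as below. This is a hedged illustration under stated assumptions: per-window depths (500,000 bp windows) are taken as already computed, the 10% decrease is interpreted as relative to the median of all windows on the remaining chromosomes, and the Wilcoxon rank-sum test with Benjamini-Hochberg correction is left out. The function names and toy depths are hypothetical.

```python
from statistics import median


def normalize_depth(window_depth, reference="chr1"):
    """Divide every window depth by the median window depth of `reference`."""
    ref_median = median(window_depth[reference])
    return {chrom: [d / ref_median for d in depths]
            for chrom, depths in window_depth.items()}


def reduced_coverage_candidates(normalized, min_decrease=0.10):
    """List chromosomes whose median depth is at least `min_decrease` lower
    than the median of all windows on the remaining chromosomes.

    Assumption: this is only the effect-size filter; no significance test.
    """
    flagged = []
    for chrom, depths in normalized.items():
        rest = [d for other, ds in normalized.items()
                if other != chrom for d in ds]
        if median(depths) <= (1 - min_decrease) * median(rest):
            flagged.append(chrom)
    return flagged


if __name__ == "__main__":
    # Toy per-window depths; chr46 mimics a hemizygous sex chromosome.
    toy = {
        "chr1": [30, 31, 29, 30],
        "chr2": [30, 32, 28, 30],
        "chr46": [15, 16, 14, 15],
    }
    print(reduced_coverage_candidates(normalize_depth(toy)))
```

In a ZZ/Z0 or ZZ/ZW system, the heterogametic sex is expected to show roughly half the autosomal depth on the Z chromosome, which is the pattern the toy data imitate.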

Figure 4 with 1 supplement
Genome annotation for Sepia officinalis.

(A) Annotation of the repeat landscape of the S. officinalis genome, annotated using RepeatModeler (Flynn et al., 2020). The full repeat landscape is shown on the left; annotated repeats (excluding unclassified or simple repeats) are shown on the right. (B–C) Quality control of gene annotation and comparison to two other cuttlefish species using OMArk (Nevers et al., 2022). Results are shown for Acanthosepion lycidas (GCA_963932145.1, Ensembl Genebuild), Sepia officinalis (BRAKER, this study), and Acanthosepion pharaonis (Song et al., 2021) (BRAKER). Lophotrochozoa was used as the ancestral clade. (B) Completeness assessed by the presence of genes conserved in the clade, classified as single or multiple copies (duplicated), or missing. (C) Consistency assessed by the proportion of proteins placed in the correct lineage (consistent); placement in incorrect lineages randomly (inconsistent) or to specific species (contamination), or no placement in known gene families (unknown). (D) Phylogenetic tree of 13 molluscan species used for analysis of gene families with OrthoFinder (Emms et al., 2025). Species are colored by clade: purple = coleoid cephalopods, blue = nautiloid (non-coleoid cephalopod), green = non-cephalopod mollusk. (E) Heatmap of the largest gene families (orthogroups from OrthoFinder, with more than 100 genes in any species), ordered from largest gene count across all species on the left. Families with at least one gene in S. officinalis are depicted. Rows show gene counts for each species (color capped at 500 genes), columns show orthogroups and their annotation by eggNOG mapper (Cantalapiedra et al., 2021; Huerta-Cepas et al., 2017) or InterProScan (Blum et al., 2025), if available. Clade colors match (D).

Figure 4—figure supplement 1
Gene family expansion analysis.

(A) Gene family expansion analysis using CAFE5 (Mendes et al., 2021) with a gamma model (k=3) on all smaller gene families (fewer than 100 genes in any species). The 30 families with the most change in different categories are shown (expanded only in S. officinalis (pink), in all coleoids (orange), in all species (yellow), in non-cephalopod mollusks (green), or overall contraction (blue)). Rows show change (expansion or contraction) of gene families in any species, columns show orthogroups and annotation, if available. Dots show significant change (p<0.05); gene counts are shown for any orthogroup with at least 12 genes in any species. (B) Gene families with differential expression in bulk RNA-seq data. Dot size shows the number of differentially expressed (DE) genes for each tissue. (C) Dotplots of enriched gene ontology (GO) terms for large gene families, identified with clusterProfiler using a hypergeometric test. Dot size shows the number of expressed genes per family with this GO term, x-axis shows the percentage of expressed genes from all genes with this GO term. Dot color shows the adjusted p-value after Benjamini-Hochberg false discovery rate (FDR) correction. CC: cellular component, MF: molecular function, BP: biological process. (D) Heatmap of z-scored expression of all DE genes from the gene families with enriched GO terms.

Figure 5
Expression of expanded gene families in tissue bulk RNA-seq data.

Bulk RNA-seq data were collected from one adult S. officinalis from different brain tissues (optic lobes - yellow, basal lobes - turquoise, vertical and subvertical lobes - orange, posterior subesophageal mass - purple), retina (red), and skin (blue, from the dorsal mantle). The tissue color code is identical throughout the figure. (A) Principal component analysis (PCA) of the data, showing the first 2 PCs, colored by tissue. (B) Barplot showing the number of differentially expressed (DE) genes (i.e. marker genes) for each tissue, calculated against all other tissues using DESeq2 (Love et al., 2014). (C) Largest gene families (orthogroups) with differential expression in bulk RNA-seq data. Dot size shows the number of DE genes for each tissue. Families with enriched gene ontology (GO) terms are highlighted in gray. (D+E) Dotplots of enriched gene ontology (GO) terms (Aleksander et al., 2026; Ashburner et al., 2000) for large gene families, identified with clusterProfiler (Xu et al., 2024) using a hypergeometric test. Dot size shows the number of expressed genes per family with this GO term, x-axis shows the percentage of expressed genes from all genes with this GO term. Dot color shows the adjusted p-value after Benjamini-Hochberg false discovery rate (FDR) correction. CC: cellular component, MF: molecular function, BP: biological process. (F) Heatmap of z-scored expression of all DE genes from the largest gene families with enriched GO terms.
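The hypergeometric enrichment test and Benjamini-Hochberg adjustment named in panels (D+E) can be illustrated with a short standard-library sketch. It is not the clusterProfiler implementation and makes no claim about the authors' gene universe; the function names, argument names, and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term.

    k: family genes with the term, n: family size,
    K: background genes with the term, N: background size.
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy example: 8 of 20 family genes carry a term found in 100 of 10,000.
    p = hypergeom_upper_tail(k=8, n=20, K=100, N=10_000)
    print(p, benjamini_hochberg([p, 0.2, 0.04]))
```

The upper tail is used because enrichment asks whether the family contains at least as many annotated genes as observed, not exactly as many.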

Author response image 1
Analysis of Hi-C read pairs from both S. officinalis assemblies.

Hi-C reads were aligned to the primary contigs from hifiasm (as is used for scaffolding with YAHS) and analyzed using pairtools. Note the higher fraction of long-range contacts (at least 1 kb cis pairs or trans pairs) in the MPIBR data (top) compared to DToL (bottom). Due to overall higher coverage, the absolute number of read pairs is higher for DToL than for MPIBR data.

Author response image 2
Quantification of repeat content in chromosome scaffolds and unplaced residual scaffolds.

Density plot showing fraction of repeat masked bases in total sequence length for chromosome scaffolds (i.e. scaffolds 1-47) in teal and all remaining small scaffolds (1840 scaffolds) in purple. Median repeat fraction is shown as vertical lines.
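A per-scaffold repeat fraction of this kind can be derived from a soft-masked FASTA file, in which masked bases are written in lowercase. The sketch below is a minimal illustration under that assumption and is not the authors' script; the function names and the toy records are hypothetical, and lowercase `n` characters are counted as masked for simplicity.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_fraction(sequence):
    """Fraction of bases written in lowercase (soft-masked)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return masked / len(sequence)


if __name__ == "__main__":
    toy = [">scaffold_1 toy", "ACGTacgt", "AAAA", ">scaffold_48", "acgtacgt"]
    for name, seq in read_fasta(toy):
        print(name, round(masked_fraction(seq), 3))
```

Collecting these fractions separately for chromosome scaffolds and for unplaced scaffolds gives the two distributions that a density plot would compare.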

Tables

Table 1
Statistics of S. officinalis assemblies from two independent datasets, assembled using a common pipeline.
version | MPIBR p_ctg | MPIBR hap1 | MPIBR hap2 | DToL p_ctg | DToL hap1 | DToL hap2
number of contigs | 8.289 | 10.651 | 10.425 | 8.783 | 11.026 | 11.089
raw length [bp] | 6.049.669.443 | 5.675.386.986 | 5.662.586.038 | 6.053.996.452 | 5.721.157.269 | 5.950.565.264
N50 length [bp] | 1.723.203 | 1.032.632 | 1.010.375 | 1.810.137 | 1.165.578 | 1.182.649
average contig length [bp] | 729.843 | 532.850 | 543.173 | 689.285 | 518.878 | 536.618
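For readers unfamiliar with the statistics in Table 1, the sketch below shows how N50 and average contig length are conventionally computed from a list of contig lengths. It is a generic illustration with hypothetical function names and toy lengths, not the tool used to produce the table.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


def average_length(lengths):
    """Mean contig length in bp."""
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    toy = [2, 2, 2, 3, 3, 4, 8, 8]  # toy contig lengths in bp
    print(n50(toy), average_length(toy))
    # Consistency check on the MPIBR p_ctg column, reading the dots in the
    # table as thousands separators: raw length / number of contigs.
    print(6_049_669_443 // 8_289)
```

The last line reproduces the tabulated average contig length for the MPIBR primary contigs (729.843 in the table's notation), which supports reading the dots as thousands separators.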
Table 2
Overview of gene annotation of 13 molluscan species used for gene family analysis.
Organism Scientific Name | Accession | Source | URL | # of Proteins
Aplysia californica | GCF_000002075.1 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002075.1/ | 21897
Crassostrea virginica | GCF_053477285.1 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/gene/GCF_053477285.1/ | 53819
Doryteuthis pealeii | GCA_023376005.1 | custom (Albertin et al., 2022) | https://metazoa.csb.univie.ac.at/CephData/dorPea.prot.gz | 24931
Euprymna scolopes | GCA_024364805.1 | Github (Rogers, 2025) | https://github.com/TheaFrances/E.scolopes-V2.2-BRAKER2-gene-annotation | 31908
Gigantopelta aegis | GCF_016097555.1 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_016097555.1/ | 24904
Lottia gigantea | GCF_000327385.1 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000327385.1/ | 23822
Magallana gigas | GCF_963853765.1 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_963853765.1/ | 35231
Nautilus pompilius | GWHBECW00000000 | GWH | https://ngdc.cncb.ac.cn/gwh/Assembly/21849/show | 16536
Octopus bimaculoides | GCF_001194135.2 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001194135.2/ | 29037
Octopus vulgaris | GCA_951406725.2 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/gene/GCA_951406725.2/ | 30134
Pecten maximus | GCF_902652985.1 | RefSeq | https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_902652985.1/ | 28975
Acanthosepion lycidas | GCA_963932145.1 | Ensembl genebuild | https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sepia_lycidas/GCA_963932145.1/ensembl/geneset/2024_05/ | 35949
Sepia officinalis | GCA_050097725.1 | this study | https://doi.org/10.17617/1.5n7h-4385 | 23768
Appendix 1—key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Biological sample (Sepia officinalis) | European cuttlefish, F1 individual | Eggs supplied by Flying Sharks – consultoria e inovação, Lda., Horta, Azores, Portugal |  | 6-month-old adult; F1 from eggs collected in the Portuguese Atlantic; used for long-read DNA, Iso-Seq, Omni-C and short-read DNA library preparation
Biological sample (Sepia officinalis) | European cuttlefish, F0 individual | Eggs supplied by Université de Caen Normandie, France |  | 8-month-old adult; F0 from eggs collected in Normandie, France; used for short-read RNA-seq library preparation
Commercial assay or kit | MagAttract HMW DNA Kit | Qiagen | Cat#67563 | Genomic DNA extraction from flash-frozen brain tissue for PacBio HiFi library
Commercial assay or kit | NEB Monarch gDNA Purification Kit | New England Biolabs | Cat#T3010S | Genomic DNA extraction from optic lobe; used both for short-read Illumina library preparation and as template for sex-genotyping qPCR
Commercial assay or kit | Direct-zol RNA Miniprep Kit | Zymo Research | Cat#R2050 | RNA isolation and DNase I treatment for Iso-Seq libraries
Commercial assay or kit | Direct-zol RNA Microprep Kit | Zymo Research | Cat#R2062 | RNA isolation and DNase I treatment for short-read RNA-seq libraries
Commercial assay or kit | TeloPrime Full-Length cDNA Amplification Kit V2 | Lexogen | Cat#013.08; Cat#013.24 | Full-length cDNA synthesis targeting 5' cap and poly-A tail for Iso-Seq
Commercial assay or kit | SMRTbell express template prep kit 2.0 | PacBio | Cat#100-938-900 | Long-read library preparation for both HiFi DNA and Iso-Seq sequencing
Commercial assay or kit | Sequel II binding kit 2.2 | PacBio | Cat#102-089-000 | Used for HiFi DNA sequencing
Commercial assay or kit | Sequel II binding kit 2.1 | PacBio | Cat#101-843-000 | Used for Iso-Seq sequencing
Commercial assay or kit | Sequel II sequencing kit 2.0 | PacBio | Cat#101-820-200 | Used for PacBio Sequel II runs (HiFi DNA and Iso-Seq)
Commercial assay or kit | SMRT Cell 8M tray | PacBio | Cat#101-389-001 | 5 SMRT cells used for HiFi DNA sequencing; 2 SMRT cells for Iso-Seq (pooled tissues + optic lobe)
Commercial assay or kit | Dovetail Omni-C Kit | Dovetail Genomics | Cat#21005 | Omni-C proximity ligation library prepared from brain tissue
Commercial assay or kit | Illumina DNA PCR-Free Tagmentation Library Prep Kit | Illumina | Cat#20041794 | Short-read DNA library preparation from 500 ng of high-MW gDNA
Commercial assay or kit | IDT for Illumina DNA/RNA UD Indexes, Set A | Illumina | Cat#20026121 | Dual indexes for short-read DNA library
Commercial assay or kit | Illumina DNA PCR-Free Sequencing and Indexing primer | Illumina | Cat#20041797 | Used during NextSeq2000 P3 sequencing of short-read DNA library
Commercial assay or kit | Qubit ssDNA Assay Kit | Thermo Fisher Scientific | Cat#Q10212 | Quantification of dual-indexed single-stranded short-read DNA libraries
Commercial assay or kit | Illumina TruSeq Stranded mRNA Library Prep Kit | Illumina | Cat#20020594 | Short-read RNA-seq libraries prepared from 300 ng total RNA
Commercial assay or kit | IDT for Illumina xGen UDI-UMI Adapters | Integrated DNA Technologies (IDT) | Cat#10005903 | Adapters used with TruSeq Stranded mRNA library prep
Commercial assay or kit | Illumina NextSeq500 mid output flow cell (300 cycles) | Illumina | Cat#20024905 | Used for short-read RNA sequencing
Commercial assay or kit | Illumina NextSeq2000 P3 flow cell (300 cycles) | Illumina | Cat#20040561 | Used for short-read RNA and DNA sequencing
Commercial assay or kit | KAPA SYBR FAST qPCR Master Mix (2× Universal) | Roche / KAPA Biosystems | Cat#KK4600 | qPCR master mix used for sex-chromosome genotyping qPCR
Chemical compound, drug | TRIzol Reagent | Invitrogen / Thermo Fisher Scientific | Cat#15596026 | Homogenization reagent for RNA isolation from flash-frozen tissues (Iso-Seq and short-read RNA-seq)
Sequence-based reagent | SepOff_chr2_auto_G2_F (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 |  | 5'-TTTGCCACTGTGTCCCTTTATAC-3'; forward primer targeting an autosomal locus on chromosome 2; used in qPCR sex genotyping; synthesized from IDT at 100 nmol DNA-oligo scale with standard desalting
Sequence-based reagent | SepOff_chr2_auto_G2_R (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 |  | 5'-ACACACACAGGCTGCTTATTG-3'; reverse primer targeting an autosomal locus on chromosome 2; used in qPCR sex genotyping; synthesized from IDT at 100 nmol DNA-oligo scale with standard desalting
Sequence-based reagent | SepOff_chr46_sex_H2_F (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 |  | 5'-TTTCAACCCATCTGCGTCTATAG-3'; forward primer targeting a sex-chromosomal locus on chromosome 46; used in qPCR sex genotyping; synthesized from IDT at 100 nmol DNA-oligo scale with standard desalting
Sequence-based reagent | SepOff_chr46_sex_H2_R (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 |  | 5'-ACTCCTCTCGTTGCATGATTAC-3'; reverse primer targeting a sex-chromosomal locus on chromosome 46; used in qPCR sex genotyping; synthesized from IDT at 100 nmol DNA-oligo scale with standard desalting
Other | Lambda DNA-HindIII Digest | New England Biolabs | Cat#3012 | Molecular weight ladder; 100 ng loaded alongside gDNA on 0.75% agarose gel to assess DNA integrity
Other | Hard-Shell 96-Well PCR Plates | Bio-Rad | Cat#HSP9601 | 96-well qPCR plates used for sex-chromosome genotyping
Other | Microseal 'B' PCR Plate Sealing Film | Bio-Rad | Cat#MSB1001 | Adhesive sealing film used to seal 96-well qPCR plates for sex-chromosome genotyping
Software, algorithm | VecScreen | NCBI; https://www.ncbi.nlm.nih.gov/tools/vecscreen/ | RRID:SCR_016577 | Adapter/vector trimming of PacBio HiFi reads prior to assembly
Software, algorithm | Meryl | Rhie et al., 2020; 10.1186/s13059-020-02134-9 |  | k-mer counting (k=21) for k-mer distribution estimation; bundled with Merqury
Software, algorithm | Merfin | Formenti et al., 2022; 10.1038/s41592-022-01445-y |  | Provides Meryl wrapper used for k-mer distribution estimation
Software, algorithm | GenomeScope 2.0 | Ranallo-Benavidez et al., 2020; 10.1038/s41467-020-14998-3 | RRID:SCR_017014 | Genome size estimation from Illumina short reads and PacBio HiFi data
Software, algorithm | hifiasm | Cheng et al., 2021; 10.1038/s41592-020-01056-5 | RRID:SCR_021069 | Primary genome assembly from combined HiFi + Hi-C reads; also used for mitochondrial assembly
Software, algorithm | YAHS | Zhou et al., 2023; 10.1093/bioinformatics/btac808 | RRID:SCR_022965 | Hi-C scaffolding on phased haplotype 1 with custom -r/-R/-q/--telo-motif parameters
Software, algorithm | JBAT (Juicebox Assembly Tools) | Dudchenko et al., 2018; 10.1101/254797 |  | Manual curation of scaffolds into chromosome-scale scaffolds
Software, algorithm | BUSCO v5.5.0 | Simão et al., 2015; 10.1093/bioinformatics/btv351 | RRID:SCR_015008 | Assembly and annotation completeness assessment using metazoa_odb10, metazoa_odb12, mollusca_odb10 and mollusca_odb12 lineages
Software, algorithm | minimap2 | Li, 2018; 10.1093/bioinformatics/bty191 | RRID:SCR_018550 | Used for: aligning mt genome reference NC_007895.1 to long reads; aligning short and long RNA reads to genome; aligning HiFi reads to scaffolded assemblies for coverage
Software, algorithm | seqtk | Li, 2013 | RRID:SCR_018927 | seqtk subseq used to extract reads matching mt genome reference for mitochondrial assembly
Software, algorithm | RepeatMasker v4.1.7-p1 | Smit et al., 2025; http://www.repeatmasker.org | RRID:SCR_012954 | Soft-masking of repetitive elements (-xsmall, -gff); also used to characterize repeat content at scaffold junctions; run with rmblast v2.14.1+
Software, algorithm | RepeatModeler v2.0.6 | Flynn et al., 2020; 10.1073/pnas.1921046117 | RRID:SCR_015027 | De novo repeat library construction (without LTRstruct option)
Software, algorithm | BRAKER3 (incl. TSEBRA) | Simão et al., 2015; Hoff et al., 2019; Brůna et al., 2021; Gabriel et al., 2021; Hoff et al., 2016; Stanke et al., 2006; Stanke et al., 2008; Li, 2023; Iwata and Gotoh, 2012; Gotoh, 2008; Buchfink et al., 2015; Kovaka et al., 2019; Huang and Li, 2023; Pertea and Pertea, 2020; Gabriel et al., 2024; 10.1007/978-1-4939-9173-0_5 | RRID:SCR_018964 | Gene model prediction via Docker container on softmasked genome; used both RNA-seq (--bam) and protein (--prot_seq) input; UTRs added with --addUTR=on; TSEBRA tuned to maximize BUSCO completeness on metazoa_odb10
Software, algorithm | StringTie v3.0.0 | Shumate et al., 2022; 10.1371/journal.pcbi.1009730 | RRID:SCR_016323 | Transcript model prediction with --conservative and --mix options; GTFs merged with transcript merge mode
Software, algorithm | TransDecoder v5.7.0 | Haas, 2026; https://github.com/TransDecoder/TransDecoder | RRID:SCR_017647 | Translation of coding regions in transcripts (default parameters)
Software, algorithm | OMArk v0.3.0 | Nevers et al., 2022; 10.1101/2022.11.25.517970 |  | Annotation completeness assessment; ancestral clade Lophotrochozoa; run on webserver without splice information
Software, algorithm | InterProScan v5.73–104 | Blum et al., 2025; 10.1093/nar/gkae1082 | RRID:SCR_005829 | Protein orthology and GO annotation with options -iprlookup -goterms
Software, algorithm | eggNOG-mapper v2.1.12 | Cantalapiedra et al., 2021; 10.1093/molbev/msab293 | RRID:SCR_021165 | Functional/orthology annotation via webserver with eggNOG v5.0 database, default parameters
Software, algorithm | Winnowmap2 | Jain et al., 2022; 10.1038/s41592-022-01457-8 | RRID:SCR_025349 | Whole-genome pairwise alignments of S. officinalis and A. esculentum (GCA_964036315.1) assemblies
Software, algorithm | R v4.4.2 | R Development Core Team, 2024 | RRID:SCR_001905 | Statistical environment for downstream analyses and visualization (whole-genome alignment plots and other custom scripts)
Software, algorithm | GENESPACE v1.2.3 | Lovell et al., 2022; 10.7554/eLife.78526 |  | Pairwise synteny analysis across all chromosomes of compared species with default parameters; riparian plots and pairwise dotplots
Software, algorithm | DIAMOND2 | Buchfink et al., 2015; 10.1038/nmeth.3176 | RRID:SCR_016071 | Protein sequence similarity in fast mode within GENESPACE
Software, algorithm | OrthoFinder v2.5 | Emms and Kelly, 2019; 10.1186/s13059-019-1832-y | RRID:SCR_017118 | Orthogroup and pairwise orthologue inference with hierarchical orthogroups (HOGs); used within GENESPACE
Software, algorithm | OrthoFinder v3.1.0 | Emms et al., 2025; 10.1101/2025.07.15.664860 | RRID:SCR_017118 | Orthogroup inference across 13 molluscan species for gene family expansion analysis; default parameters; rooted species tree generated via STAG (Ponte et al., 2023) and STRIDE (Andrews et al., 2013)
Software, algorithm | MCScanX | Wang et al., 2012; 10.1093/nar/gkr1293 | RRID:SCR_022067 | Pairwise syntenic block identification (onlyOgAnchors = TRUE, blkSize = 5, nGaps = 5, blkRadius = 25, synBuff = 100, nSecondaryHits = 0)
Software, algorithm | dbscan (R package) | Hahsler et al., 2019; 10.18637/jss.v091.i01 |  | Density-based clustering of MCScanX anchor hits into syntenic regions
Software, algorithm | bwa-mem2 v2.3 | Vasimuddin et al., 2019; 10.1109/IPDPS.2019.00041 | RRID:SCR_022192 | Alignment of Hi-C reads for breakpoint coverage analysis
Software, algorithm | pairtools v1.1.0 | Abdennur et al., 2023; 10.1101/2023.02.13.528389 | RRID:SCR_023038 | Quantification of Hi-C contacts; extraction of trans pairs from deduplicated read pairs (pair type UU)
Software, algorithm | pysam v0.22.1 | pysam-developers, 2026; https://github.com/pysam-developers/pysam | RRID:SCR_021017 | HiFi read depth via count_coverage (MAPQ ≥10, 1 kb bins); spanning reads identified by querying split alignments
Software, algorithm | STAR v2.7.11b | Dobin et al., 2013; 10.1093/bioinformatics/bts635 | RRID:SCR_004463 | Alignment of short reads to chromosome-scale assembly (sex chromosome analysis and RNA-seq)
Software, algorithm | mosdepth | Pedersen and Quinlan, 2018; 10.1093/bioinformatics/btx699 | RRID:SCR_018929 | Sequencing coverage calculation for sex chromosome analysis
Software, algorithm | ape v5.8.1 (R package) | Paradis and Schliep, 2019; 10.1093/bioinformatics/bty633 | RRID:SCR_017343 | Conversion of rooted OrthoFinder species tree to ultrametric tree
Software, algorithm | CAFE5 v5.1.1 | Mendes et al., 2021; 10.1093/bioinformatics/btaa1022 | RRID:SCR_018924 | Gene family evolution rate estimation
Software, algorithm | bedtools v2.30 | Quinlan and Hall, 2010; 10.1093/bioinformatics/btq033 | RRID:SCR_006646 | bedtools intersect for CDS–RepeatMasker overlap analysis of expanded gene family members
Software, algorithm | featureCounts (Subread v2.0.8) | Liao et al., 2014; 10.1093/bioinformatics/btt656 | RRID:SCR_012919 | Gene-level read counting from STAR-aligned RNA-seq BAMs (-t exon, -g gene_id, -p --countReadPairs, -Q 255)
Software, algorithm | DESeq2 v1.42.0 | Love et al., 2014; 10.1186/s13059-014-0550-8 | RRID:SCR_015687 | Tissue marker identification in bulk RNA-seq data
Software, algorithm | apeglm | Zhu et al., 2019; 10.1093/bioinformatics/bty895 | RRID:SCR_026951 | log2 fold-change shrinkage applied to DESeq2 results
Software, algorithm | clusterProfiler v4.12.6 | Yu et al., 2012; 10.1089/omi.2011.0118 | RRID:SCR_016884 | GO enrichment via enricher() with custom GO annotations from InterProScan
Author response table 1
contigs | soff250801_mpibr p_ctg | soff250801_mpibr hap1 | soff250801_mpibr hap2 | soff250801_dtol p_ctg | soff250801_dtol hap1 | soff250801_dtol hap2
records | 8.289 | 10.651 | 10.425 | 8.783 | 11.026 | 11.089
length.raw | 6.049.669.443 | 5.675.386.986 | 5.662.586.038 | 6.053.996.452 | 5.721.157.269 | 5.950.565.264
length._min | 2.496 | 2.959 | 2.959 | 5.151 | 5.999 | 6.915
length.n25 | 825.171 | 518.304 | 518.787 | 865.207 | 560.043 | 580.263
length.n50 | 1.723.203 | 1.032.632 | 1.010.375 | 1.810.137 | 1.165.578 | 1.182.649
length.n75 | 3.129.788 | 1.845.042 | 1.789.755 | 3.317.923 | 2.125.467 | 2.203.556
length.max | 14.924.420 | 7.470.229 | 10.284.785 | 15.032.877 | 9.223.473 | 11.008.627
length.med | 309.429 | 293.763 | 314.184 | 245.676 | 239.367 | 250.010
length.avg | 729.843 | 532.850 | 543.173 | 689.285 | 518.878 | 536.618
length.top46 | 339.372.830 | 213.767.346 | 225.912.626 | 364.533.811 | 278.738.760 | 283.344.104
frac.top46 | 5,609774769 | 3,766568633 | 3,98956633 | 6,021374705 | 4,872069529 | 4,761633415
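The last two rows appear to give the summed length of the 46 longest contigs and that sum as a percentage of the raw assembly length. This reading is an assumption, although it matches the tabulated values: 339,372,830 / 6,049,669,443 is about 5.61%. A minimal sketch with a hypothetical function name and toy lengths:

```python
def top_k_fraction(lengths, k=46):
    """Percentage of total assembly length held by the k longest sequences."""
    total = sum(lengths)
    top = sum(sorted(lengths, reverse=True)[:k])
    return 100.0 * top / total


if __name__ == "__main__":
    # Toy contig lengths; the longest one holds half of the total.
    print(top_k_fraction([5, 3, 2], k=1))
    # Check against the MPIBR p_ctg column (length.top46 / length.raw).
    print(100.0 * 339_372_830 / 6_049_669_443)
```

Because the 46 longest contigs hold only a few percent of the assembly in every column, chromosome-scale contiguity in this dataset comes from Hi-C scaffolding rather than from the contigs themselves.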

Cite this article

Simone Daniela Rencken, Georgi Tushev, David Hain, Elena Ciirdaeva, Oleg Simakov, Gilles Laurent (2026)
Chromosome-scale genome assembly of the European common cuttlefish Sepia officinalis
eLife 14:RP107393.
https://doi.org/10.7554/eLife.107393.3